from:"Koen Segers"

Re: [ofa-general] PCI-Express payload

2008-03-31 Thread Koen Segers

Hi Dotan,

You are referring to the MTU of InfiniBand. This parameter defines how
much data a single packet can contain when transferred over an
InfiniBand network.

I want to know the MTU of PCI-Express, i.e. its maximum payload size.
PCI-Express is used to transfer data from memory to the HCA and vice
versa. As far as I know, this value can differ between devices. Our SAS
HBA has a PCI-Express MTU of 512 bytes.
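
To make the distinction concrete, here is a rough sketch of how many PCI-Express transactions are needed to move one full InfiniBand payload between memory and the HCA. The function name is made up for illustration, and the model is simplified: it only divides the IB MTU by the PCI-Express maximum payload and ignores read requests and completion splitting.

```python
import math


def tlps_per_ib_packet(ib_mtu_bytes: int, pcie_max_payload_bytes: int) -> int:
    """Number of PCIe payload-carrying packets needed for one full IB payload.

    Simplified model (an assumption, not from the thread): every PCIe
    packet carries the maximum payload except possibly the last one.
    """
    if ib_mtu_bytes <= 0 or pcie_max_payload_bytes <= 0:
        raise ValueError("sizes must be positive")
    return math.ceil(ib_mtu_bytes / pcie_max_payload_bytes)


# A 2 KB IB MTU over the 128-byte payload reported by lspci,
# compared with the 512-byte payload of the SAS HBA.
for payload in (128, 512):
    print(payload, tlps_per_ib_packet(2048, payload))
```

Under this model the smaller PCI-Express payload means more packets, and therefore more per-packet header overhead, for the same IB data.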

Regards,

Koen


On Mon, 2008-03-31 at 15:05 +0300, Dotan Barak wrote:
> Hi.
> 
> You can check the maximum supported MTU of your HCA using
> ibv_devinfo.
> Most HCAs support a 2KB MTU and some of them a 4KB MTU.
> 
> The MTU is the maximum payload size that can be sent/received
> (the space for the IB headers is not part of the MTU).
> 
> 
> Every connected QP (RC/UC) can define which MTU value it is using
> (according to the path used).
> 
> 
> I hope that is the info you asked for ..
> Dotan
> 
> Koen Segers wrote:
> > Hi,
> >
> > I'm interested in computing the overhead coming from PCI-Express to an 
> > IB HCA. Therefore I need to know the payload size of different types 
> > of HCA.
> >
> > We have InfiniHost III Mellanox cards of 4x SDR and DDR IB. According 
> > to the lspci -vvv command on SLES 10 SP1, the PCI-e payload of these 
> > cards is at most 128 bytes.
> >
> > Can someone give me the payload size for a ConnectX 4x SDR and DDR IB 
> > card?
> >
> > Thank you,
> >
> > Koen
> >
> > This is my output:
> > 41:00.0 InfiniBand: Mellanox Technologies MT25208 InfiniHost III Ex 
> > (rev a0)
> > Subsystem: Mellanox Technologies MT25208 InfiniHost III Ex
> > Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- 
> > ParErr+ Stepping- SERR- FastB2B-
> > Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- 
> > SERR-  > Latency: 0, Cache Line Size: 64 bytes
> > Interrupt: pin A routed to IRQ 209
> > Region 0: Memory at e170 (64-bit, non-prefetchable) [size=1M]
> > Region 2: Memory at e180 (64-bit, prefetchable) [size=8M]
> > Capabilities: [40] Power Management version 2
> > Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA 
> > PME(D0-,D1-,D2-,D3hot-,D3cold-)
> > Status: D0 PME-Enable- DSel=0 DScale=0 PME-
> > Capabilities: [48] Vital Product Data
> > Capabilities: [90] Message Signalled Interrupts: Mask- 64bit+ 
> > Queue=0/5 Enable-
> > Address:   Data: 
> > Capabilities: [84] MSI-X: Enable+ Mask- TabSize=32
> > Vector table: BAR=0 offset=00082000
> > PBA: BAR=0 offset=00082200
> > Capabilities: [60] Express Endpoint IRQ 0
> > *Device: Supported: MaxPayload 128 bytes, PhantFunc 0, 
> > ExtTag+*
> > Device: Latency L0s <64ns, L1 unlimited
> > Device: AtnBtn- AtnInd- PwrInd-
> > Device: Errors: Correctable- Non-Fatal+ Fatal+ 
> > Unsupported-
> > Device: RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
> > Device: MaxPayload 128 bytes, MaxReadReq 4096 bytes
> > Link: Supported Speed 2.5Gb/s, Width x8, ASPM L0s, Port 8
> > Link: Latency L0s unlimited, L1 unlimited
> > Link: ASPM Disabled RCB 128 bytes CommClk- ExtSynch-
> > Link: Speed 2.5Gb/s, Width x8
> >
> >  
> > *** Disclaimer ***
> >
> > Vlaamse Radio- en Televisieomroep
> > Auguste Reyerslaan 52, 1043 Brussel
> >
> > nv van publiek recht
> > BTW BE 0244.142.664
> > RPR Brussel
> > http://www.vrt.be/disclaimer
> > 
> >
> > ___
> > general mailing list
> > general@lists.openfabrics.org
> > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
> >
> > To unsubscribe, please visit 
> > http://openib.org/mailman/listinfo/openib-general
> 

[ofa-general] PCI-Express payload

2008-03-31 Thread Koen Segers

Hi,

I'm interested in computing the overhead coming from PCI-Express to an
IB HCA. Therefore I need to know the payload size of different types of
HCA.

We have InfiniHost III Mellanox cards of 4x SDR and DDR IB. According
to the lspci -vvv command on SLES 10 SP1, the PCI-e payload of these
cards is at most 128 bytes.
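
As a rough sketch of the overhead computation in question, the snippet below estimates the fraction of PCI-Express link bytes that carry payload for a given maximum payload size. The 24-byte per-packet overhead is an assumption, not a figure from this thread: it corresponds to 2 bytes of framing, a 2-byte sequence number, a 16-byte (4DW) header and a 4-byte LCRC on PCI-Express 1.x/2.0, without ECRC. Flow-control and acknowledgement traffic is ignored.

```python
# Assumed per-packet overhead in bytes (not taken from the thread):
# 2 framing + 2 sequence number + 16 header (4DW) + 4 LCRC.
TLP_OVERHEAD_BYTES = 24


def pcie_payload_efficiency(max_payload_bytes: int,
                            overhead_bytes: int = TLP_OVERHEAD_BYTES) -> float:
    """Fraction of transferred bytes that are payload, for full-size packets."""
    if max_payload_bytes <= 0 or overhead_bytes < 0:
        raise ValueError("invalid sizes")
    return max_payload_bytes / (max_payload_bytes + overhead_bytes)


for size in (128, 256, 512):
    print(size, round(pcie_payload_efficiency(size), 3))
```

With these assumptions a 128-byte payload gives roughly 84% efficiency and a 512-byte payload roughly 96%, which is why the payload size of the ConnectX cards matters for the estimate.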

Can someone give me the payload size for a ConnectX 4x SDR and DDR IB
card?

Thank you,

Koen

This is my output:
41:00.0 InfiniBand: Mellanox Technologies MT25208 InfiniHost III Ex (rev
a0)
Subsystem: Mellanox Technologies MT25208 InfiniHost III Ex
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
ParErr+ Stepping- SERR- FastB2B-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
SERR-

Re: [ofa-general] Bonding and hw_csum

2008-01-31 Thread Koen Segers

On Wed, 2008-01-30 at 18:42 +0200, Tziporet Koren wrote:
> Or Gerlitz wrote:
> >
> > This is an interesting report. However, since the hw checksum patch
> > is currently not being submitted to the mainline kernel and is also
> > about to be removed from ofed 1.3 (Tziporet, can you update on that?),
> > I am not going to look into that.
> >
> > Or.
> >
> the hw checksum patch was removed from OFED 1.3

I just saw some patches on the mailing list concerning csum offloading.
Are these applied in RC3? Or are they going to be introduced in the
daily build of tomorrow?

Is it correct to state that these patches replace the hw_csum parameter
by offloading the csum computation to the mthca? This would mean that
the results should be similar also.

Does the new offload patch depend on the type of hca being used?
According to lspci, we have the "InfiniBand: Mellanox Technologies
MT25208 InfiniHost III Ex (rev a0)" card. Do these patches work on a
sles 10 sp1 installed on x3755 and x3655 machines of IBM that have this
card inserted?

Is bonding going to work with this type of offloading?

Kind Regards

Koen

> 
> Tziporet
> 

Re: [ofa-general] Bonding and hw_csum

2008-01-30 Thread Koen Segers


On Wed, 2008-01-30 at 16:26 +0200, Or Gerlitz wrote:
> > Why is hw_checksum not submitted to the mainline kernel (and thus also
> > removed from ofed)? We definitely want to enable hw_checksum as it gives
> > an enormous bandwidth boost with ipoib.
> 
> you should ask that the individual/s that are signed on the patch

Is this Michael S. Tsirkin?

I don't know where else to find this information.

Regards,

Koen

Re: [ofa-general] Bonding and hw_csum

2008-01-30 Thread Koen Segers

On Wed, 2008-01-30 at 15:34 +0200, Or Gerlitz wrote:
> Stijn De Smet wrote:
> > I'm trying to get IPOIB bonding to work with the hw_csum enabled. 
> ...
> > When I disable hw_csums, I can start iperf's, pull and replug all cables
> > and the iperf's run uninterrupted.
> 
> This is an interesting report. However, since the hw checksum patch
> is currently not being submitted to the mainline kernel and is also
> about to be removed from ofed 1.3 (Tziporet, can you update on that?),
> I am not going to look into that.

Do you mean that bonding with hw_csum enabled will never work?

Why is hw_checksum not submitted to the mainline kernel (and thus also
removed from ofed)? We definitely want to enable hw_checksum as it gives
an enormous bandwidth boost with ipoib.

Koen.

> 
> Or.
> 

Re: [ofa-general] RE: ib_macro_model on OMNET++

2008-01-14 Thread Koen Segers

Hi,

Is this simulation package based on ofed-1.2.5 or an older version of
ofed?
Might it be possible to load different ofed versions in omnet++? 
I'm not aware of the internal structure of omnet++, but I thought it was
possible in ns2 to insert different library versions (for instance of
tcp).

Kind regards,

Koen


On Mon, 2008-01-14 at 10:01 +0200, Eitan Zahavi wrote:
> Hi Mahesh
> 
> I suspect the non existing parameter "GenModel" is still accessed by the 
> gen.cc code.
> So there must be a bug in the gen.cc code. 
> 
> As you can guess the code I opened is a stripped down version of our internal 
> model.
> As such I did not do too much of testing after the strip down.
> 
> I will provide a fix later this week.
> If you are able to debug and fix it yourself - please let me know.
> 
> Thanks
> 
> Eitan
> 
> 
> 
> 
> -Original Message-
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Keshetti Mahesh
> Sent: Mon 14 January 2008 06:34
> To: Eitan Zahavi
> Cc: openIB
> Subject: Re: ib_macro_model on OMNET++
> 
> > The model describes IB HCAs and Switches.
> > One can build networks of these models and simulate traffic through 
> > the network.
> > The model accurately describes how credits are flowing through the network.
> > The switches are built out of virtual output queues.
> > It also lets you play with parameters for the switches and HCAs.
> > FDBs are programmable.
> 
> Hi Eitan,
> 
> Thanks for the reply. I have successfully installed OMNET++ package on my 
> machine and I am able to run samples available in that package.
> 
> But when I tried to run a sample network from the ib_macro_model package, the 
> run is getting aborted with the following message.
> 
> [EMAIL PROTECTED] 2h_1s]# ./2h_1s
> OMNeT++/OMNEST Discrete Event Simulation  (C) 1992-2005 Andras Varga
> Release: 3.2, edition: Academic Public License.
> See the license for distribution terms and warranty disclaimer
> Setting up Cmdenv...
> 
> Loading NED file:
> /home/maheshk/softwares/ib_macro_model/networks/2h_1s/../../src/hca.ned
> Loading NED file:
> /home/maheshk/softwares/ib_macro_model/networks/2h_1s/../../src/switch.ned
> Loading NED file:
> /home/maheshk/softwares/ib_macro_model/networks/2h_1s/../../src/gen.ned
> Loading NED file:
> /home/maheshk/softwares/ib_macro_model/networks/2h_1s/../../src/sink.ned
> Loading NED file:
> /home/maheshk/softwares/ib_macro_model/networks/2h_1s/../../src/ibuf.ned
> Loading NED file:
> /home/maheshk/softwares/ib_macro_model/networks/2h_1s/../../src/obuf.ned
> Loading NED file:
> /home/maheshk/softwares/ib_macro_model/networks/2h_1s/../../src/vlarb.ned
> 
> Preparing for Run #1...
> Setting up network `FABRIC'...
> Initializing...
> 
> RUNTIME ERROR. A cRuntimeError exception is about to be thrown, and you 
> requested (by setting debug-on-errors=true in the ini file) that errors abort 
> execution and break into the debugger.
>  - on Linux or Unix-like systems: you should now probably be running the
>simulation under gdb or another debugger. The simulation kernel will now
>raise a SIGABRT signal which will get you into the debugger. If you're not
>running under a debugger, you can still use the core dump for post-mortem
>debugging.
>  - on Windows: your should have a just-in-time debugger (such as
>the Visual C++ IDE) enabled. The simulation kernel will now
>cause a debugger interrupt to get you into the debugger -- press
>the [Debug] button in the dialog that comes up.
> Once in the debugger, use its "view stack trace" command (in gdb: "bt") to 
> see the context of the runtime error. See error text below.
> 
>  Error in module (IBGenerator) FABRIC.H_1.gen: has no parameter called 
> `GenModel'.
> Aborted
> 
> 
> Do you have any idea why it is happening ?
> 
> Thanks and Regards,
> -Mahesh
> 
> >
> > Etc etc
> >
> > Eitan

RE: [ofa-general] IO Size more than 48K

2007-12-03 Thread Koen Segers

Can you give more information on which analyzer you used?


Regards,

Koen
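
For readers following the quoted thread below, the two limits mentioned there can be sketched as simple arithmetic. This is a simplified model, not the ib_srp implementation: it assumes each scatter/gather entry covers one 4K page (as in "12 x 4K = 48K" in the quoted text) and that max_sect is counted in 512-byte sectors, which is an assumption not stated in the thread. The function name is made up for illustration.

```python
PAGE_SIZE = 4096    # bytes per scatter/gather entry, per "12 x 4K = 48K"
SECTOR_SIZE = 512   # assumption: max_sect is counted in 512-byte sectors


def max_srp_request_bytes(srp_sg_tablesize: int, max_sect: int) -> int:
    """Largest single SRP request under the two limits (simplified model)."""
    sg_limit = srp_sg_tablesize * PAGE_SIZE
    sect_limit = max_sect * SECTOR_SIZE
    return min(sg_limit, sect_limit)


# Default srp_sg_tablesize=12 with the max_sect=1024 from the quoted example.
print(max_srp_request_bytes(12, 1024) // 1024, "KiB")
# srp_sg_tablesize=256 with max_sect=4096, as in the quoted modprobe/echo.
print(max_srp_request_bytes(256, 4096) // 1024, "KiB")
```

Under these assumptions srp_sg_tablesize=256 caps a request at 1 MiB even though max_sect=4096 would allow 2 MiB, which is consistent with the 2MB I/O being split as reported in the quoted message.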


On Fri, 2007-11-30 at 10:48 -0700, Batwara, Ashish wrote:
> This is what I did as suggested by Vu and it seems to be working.
> However, when I send 2MB IO, it gets broken into 512K+1MB+512K by SRP
> as seen on analyzer. I am just wondering what the logic is? On the
> other side, when we increase the srp_sg_tablesize beyond 256, we are
> seeing following message in /var/log/messages “Nov 29 21:17:50 p50
> kernel:   REJ reason 0x3” which indicates “IB_CM_REJ_NO_RESOURCES”, so
> not sure how to get around to this problem to send larger IO than 1MB
> in one shot.
> 
>  
> 
>  
> 
> modprobe ib_srp srp_sg_tablesize=256
> 
> echo id_ext=200600A0B81138C9,max_sect=4096,ioc_guid=00a0b81112da0003,dgid=fe8000a0b81112da0001,pkey=,service_id=200600a0b81138c9 > /sys/class/infiniband_srp/srp-mthca0-1/add_target
> 
>  
> 
> -Original Message-
> From: chas williams - CONTRACTOR [mailto:[EMAIL PROTECTED] 
> Sent: Friday, November 30, 2007 11:43 AM
> To: Kevin Harms
> Cc: Vu Pham; [EMAIL PROTECTED]; Batwara, Ashish
> Subject: Re: [ofa-general] IO Size more than 48K 
> 
>  
> 
> additionally, you might need to echo 'blocks' >
> /sys/block/ 
> rdma segments.
> 
> max_hw_segments doesnt exist on all kernels i think.
> 
> In message <[EMAIL PROTECTED]>, Kevin Harms writes:
> > you may also have to go to /sys/block/sdX/queue and echo 1024 > max_sectors_kb
> > if you use the srp_daemon you can also add:
> > a max_sect=2048 to /etc/srp_daemon.conf
> >
> > kevin
> >
> > On Nov 29, 2007, at 11:08 AM, Vu Pham wrote:
> >
> >>> Hi,
> >>> We are using OFED-1.2, and using xdd and some other tools, and trying to
> >>> send 1/2MB IOs, but what we are seeing in analyzer traces, that memory
> >>> descriptor in SRP command shows max. 48K which means 1MB I/Os has broken
> >>> into smaller SRP request from initiator.
> >>> How can I have this I/O directly going to target? What parameter I need
> >>> to change?
> >>
> >> module param srp_sg_tablesize (default is 12 ie. 12 x 4K = 48K)
> >> and/or
> >> max_sect=yyy in echo id_ext=xxx,...,max_sect=1024,service_id= > /sys/class/infiniband_srp/...
> >>
> >> -vu
> >>
> >>> Thanks
> >>> Ashish
> 

Re: [ofa-general] Expected RDMA performance

2007-10-22 Thread Koen Segers

On Fri, 2007-10-19 at 09:09 -0700, Michael Krause wrote:
> At 08:20 AM 10/19/2007, Peter Kjellstrom wrote:
> > On Thursday 18 October 2007, Chuck Hartley wrote:
> > ...
> > > 8388608  5000  1342.12  1342.12
> > > --
> > >
> > > Is this typical RDMA performance?
> > 
> > It's close to what I've seen on similar hw. ~1400 is what you can
> > push through 
> > the 8x pci-e of the intel 5000 chipset (confirmed by trying 4x pci-e
> > which 
> > has shown ~700).
> > 
> > > What is the maximum theoretical BW for 
> > > DDR IB - 1525MB/sec?
> > 
> > No, it's 20 Gbps on the wire and 8/10 encoded so 16 Gbps effective
> > which is 
> > 2000 MB/s (10-base) and 1907 MiB/s (2-base).
> 
> There is also IB protocol overhead combined with driver / device
> control traffic overhead (consumes device as well as PCI resources /
> bandwidth), end-to-end control traffic  which is also a function of
> how the application is constructed.   In general, hitting about 80-85%
> of the theoretical maximum is possible.


I'm very interested in this result. Can you elaborate on this a bit more?

Has anyone documented the IB traffic control mechanism?
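
The arithmetic in the quoted replies can be reproduced with a short sketch. The 20 Gbps signalling rate, the 8/10 encoding and the 80-85% figure come from the quoted text; the split into 4 lanes of 5 Gbps each and the function name are assumptions added for illustration.

```python
def ib_data_rate(lanes: int = 4, lane_gbps: float = 5.0,
                 encoding: float = 8 / 10) -> dict:
    """Signalling rate, data rate and byte rates for an IB link."""
    signal_gbps = lanes * lane_gbps        # 4x DDR: 20 Gbps on the wire
    data_gbps = signal_gbps * encoding     # 8b/10b leaves 16 Gbps of data
    bytes_per_s = data_gbps * 1e9 / 8
    return {
        "signal_gbps": signal_gbps,
        "data_gbps": data_gbps,
        "MB_per_s": bytes_per_s / 1e6,     # base-10 megabytes
        "MiB_per_s": bytes_per_s / 2**20,  # base-2 mebibytes
    }


rate = ib_data_rate()
print(rate)
# Range quoted as achievable in practice: about 80-85% of the maximum.
print(0.80 * rate["MB_per_s"], 0.85 * rate["MB_per_s"])
```

This gives 2000 MB/s (about 1907 MiB/s) for 4x DDR, so 80-85% corresponds to roughly 1600 to 1700 MB/s before any PCI-Express limit is taken into account.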


Regards,

Koen Segers
> 
> > On our system (with a different HCA) we see quite a difference with 
> > snoop-filter off (bios option). With snoop off (our) application
> > performance 
> > goes up (not very suprising) but IB performance goes down (latency
> > 0.4us 
> > worse and bw ~1400->1200).
> 
> Mike

[ofa-general] SDP overhead

2007-08-21 Thread Koen Segers

Hello,

Can somebody tell me what the overhead is of an SDP packet when the
packet is completely filled with data?

The numbers I want to have are:
max data length of 1 packet
total header length of 1 packet

Regards,

Koen Segers

RE: [ofa-general] GPFS node loses IB-connection

2007-05-22 Thread Koen Segers

If I understand it right, the switch is actually polling (i.e. pinging)
the interfaces every 10s. This means that when the interface is busy
handling other traffic, the poll can fail and the port could be
considered out of service. My question is then: "How can the timeout be
reached while packets are being sent/received to/from the interface?"
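
The rule described in the quoted reply below (an HCA that is completely unresponsive for longer than node-timeout is considered out of service) can be sketched as follows. This is only an illustration of that rule, not the Cisco SM algorithm; the function name, the timestamps and the idea of recording response times are all hypothetical.

```python
def first_timeout(response_times_s, node_timeout_s=10.0, end_time_s=None):
    """Return the time at which a node would be declared out of service.

    response_times_s: times (seconds) at which the HCA answered an SM query.
    Returns None if no silent gap ever exceeds node_timeout_s.
    Hypothetical model: the clock starts at 0 and only SM responses count.
    """
    last_seen = 0.0
    for t in sorted(response_times_s):
        if t - last_seen > node_timeout_s:
            return last_seen + node_timeout_s
        last_seen = t
    if end_time_s is not None and end_time_s - last_seen > node_timeout_s:
        return last_seen + node_timeout_s
    return None


# Responses delayed under load: the gap between t=8 and t=25 exceeds 10 s.
print(first_timeout([3, 8, 25], node_timeout_s=10))
# The same responses survive a longer timeout.
print(first_timeout([3, 8, 25], node_timeout_s=30))
```

In this model data traffic does not reset the timer, only answers to the SM's queries do, which would explain how the timeout can be reached on a busy interface if the MAD handling is starved.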

Anyway, what timeout-value would you recommend for us? And why?

To recapitulate: these are the actions I'll take tomorrow
1) change the MAD niceness of the servers
2) change the timeout on the switches

Are these changes sufficient for the HCAs to keep their ports in
PORT_ACTIVE state?

Regards,

Koen

On Tue, 2007-05-22 at 12:59 -0700, Scott Weitzenkamp (sweitzen) wrote:
> Yes, you can tune it.  Here's an example via the switch CLI:
>  
> SFS-7000D(config)# ib sm subnet-prefix fe:80:00:00:00:00:00:00 node-timeout
> 
> The default is 10 seconds, it can be configured up to 2000 seconds.
> If a HCA is completely unresponsive for longer than the node-timeout
> value, then we consider that HCA out of service.
>  
> Scott Weitzenkamp
> SQA and Release Manager
> Server Virtualization Business Unit
> Cisco Systems
>  
> 
> 
> __
> From: Shirley Ma [mailto:[EMAIL PROTECTED] 
> Sent: Tuesday, May 22, 2007 11:30 AM
> To: [EMAIL PROTECTED]
> Cc: Ami Perlmutter; general@lists.openfabrics.org;
> [EMAIL PROTECTED]; Scott Weitzenkamp
> (sweitzen)
> Subject: RE: [ofa-general] GPFS node loses IB-connection
> 
> 
> 
> Koen,
> 
> So it is most likely you hit the same bug as 229 (Scott
> pointed out earlier). The same workaround might work for you
> by renicing ib_mad as Scott suggested.
> 
> I think this should be a SM query timeout tunable value in
> Cisco SM. Am I right, Scott?
> 
> Thanks
> Shirley Ma
> 
> 
> Koen Segers <[EMAIL PROTECTED]>
> 05/22/07 11:14 AM
> Please respond to [EMAIL PROTECTED]
> 
> To: Shirley Ma/Beaverton/[EMAIL PROTECTED]
> cc: Ami Perlmutter <[EMAIL PROTECTED]>, general@lists.openfabrics.org, [EMAIL PROTECTED]
> Subject: RE: [ofa-general] GPFS node loses IB-connection
> 
> 
> Hi,
> 
> It is the Cisco SM. 
> 
> SFS-7000P> show version
> 
> 
> 
> 
>   System Version Information
> 
> 
>   system-version : SFS-7000P TopspinOS 2.9.0 releng
> #147
> 10/25/2006 02:01:32
>  contact : [EMAIL PROTECTED]
> name : SFS-7000P
> location : 170 West Tasman Drive, San Jose, CA
> 95134
>  up-time : 11(d):7(h):49(m):3(s)
>  last-change : none
> last-config-save : none
>   action : none
>   result : none
>oper-mode : normal
> 
> There is also a command that gives the SM version, but I can't
> find it
> right now. 
> 
> On Tue, 2007-05-22 at 09:45 -0700, Shirley Ma wrote:
> > Hello Koen,
> > 
> > From the switch log, it looks a SM issue to me. The node was
> kicked
> > out of the membership. Which SM you are using in your
> fabric? 
> > 
> > Thanks
> > Shirley Ma
> > 

RE: [ofa-general] GPFS node loses IB-connection

2007-05-22 Thread Koen Segers

Hi,

It is the Cisco SM. 

SFS-7000P> show version



   System Version Information

   system-version : SFS-7000P TopspinOS 2.9.0 releng #147
10/25/2006 02:01:32
  contact : [EMAIL PROTECTED]
 name : SFS-7000P
 location : 170 West Tasman Drive, San Jose, CA 95134
  up-time : 11(d):7(h):49(m):3(s)
  last-change : none
 last-config-save : none
   action : none
   result : none
oper-mode : normal

There is also a command that gives the SM version, but I can't find it
right now. 

On Tue, 2007-05-22 at 09:45 -0700, Shirley Ma wrote:
> Hello Koen,
> 
> From the switch log, it looks a SM issue to me. The node was kicked
> out of the membership. Which SM you are using in your fabric? 
> 
> Thanks
> Shirley Ma
> 

RE: [ofa-general] GPFS node loses IB-connection

2007-05-22 Thread Koen Segers

On Tue, 2007-05-22 at 08:34 -0700, Scott Weitzenkamp (sweitzen) wrote:
> What server model and CPU model do you have?

cat /proc/cpuinfo
processor   : 7
vendor_id   : AuthenticAMD
cpu family  : 15
model   : 65
model name  : Dual-Core AMD Opteron(tm) Processor 8218
stepping: 2
cpu MHz : 2600.202
cache size  : 1024 KB
physical id : 3
siblings: 2
core id : 1
cpu cores   : 2
fpu : yes
fpu_exception   : yes
cpuid level : 1
wp  : yes
flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext
fxsr_opt lm 3dnowext 3dnow pni cx16 lahf_lm cmp_legacy svm cr8_legacy
bogomips: 5200.54
TLB size: 1024 4K pages
clflush size: 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp tm stc

>  
> This could be https://bugs.openfabrics.org//show_bug.cgi?id=229.  Try
> setting RENICE_IB_MAD=yes in /etc/infiniband/openibd.conf, then reboot
> or run /etc/init.d/openibd restart, and see if that helps.

AHA, this is interesting. I'll do it tomorrow!

>  
> Scott Weitzenkamp
> SQA and Release Manager
> Server Virtualization Business Unit
> Cisco Systems
>  
> 
> 
> __
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of
> SEGERS Koen
> Sent: Tuesday, May 22, 2007 6:44 AM
> To: Ami Perlmutter; Shirley Ma
> Cc: [EMAIL PROTECTED];
> general@lists.openfabrics.org
> Subject: RE: [ofa-general] GPFS node loses IB-connection
> 
> 
> 
> I did the iperf tests on servers with OFED-1.2-RC3.
> 
>  
> 
> It also gives the same result. Actually, it is even worse:
> when the interface dies, it gets in PORT_INIT state, but it
> doesn’t go to PORT_ACTIVE again. At least not within 10
> minutes.
> 
>  
> 
> I’ll give you the test script I ran:
> 
>  
> 
> ssh 10.224.158.114 LD_PRELOAD=libsdp.so SIMPLE_LIBSDP=OK iperf -s -p 5001 &
> ssh 10.224.158.114 LD_PRELOAD=libsdp.so SIMPLE_LIBSDP=OK iperf -s -p 5002 &
> ssh 10.224.158.114 LD_PRELOAD=libsdp.so SIMPLE_LIBSDP=OK iperf -s -p 5003 &
> ssh 10.224.158.115 LD_PRELOAD=libsdp.so SIMPLE_LIBSDP=OK iperf -s -p 6001 &
> ssh 10.224.158.115 LD_PRELOAD=libsdp.so SIMPLE_LIBSDP=OK iperf -s -p 6002 &
> ssh 10.224.158.115 LD_PRELOAD=libsdp.so SIMPLE_LIBSDP=OK iperf -s -p 6003 &
> ssh 10.224.158.116 LD_PRELOAD=libsdp.so SIMPLE_LIBSDP=OK iperf -s -p 7001 &
> ssh 10.224.158.116 LD_PRELOAD=libsdp.so SIMPLE_LIBSDP=OK iperf -s -p 7002 &
> ssh 10.224.158.116 LD_PRELOAD=libsdp.so SIMPLE_LIBSDP=OK iperf -s -p 7003 &
> ssh 10.224.158.117 LD_PRELOAD=libsdp.so SIMPLE_LIBSDP=OK iperf -s -p 8001 &
> ssh 10.224.158.117 LD_PRELOAD=libsdp.so SIMPLE_LIBSDP=OK iperf -s -p 8002 &
> ssh 10.224.158.117 LD_PRELOAD=libsdp.so SIMPLE_LIBSDP=OK iperf -s -p 8003 &
> 
> sleep 5
> 
> for i in 14 15 16 17
> do
>     ssh 10.224.158.111 LD_PRELOAD=libsdp.so SIMPLE_LIBSDP=OK iperf -c 192.168.2.$i -p $((i-9))001 -t 120 -d -P 5 &
>     ssh 10.224.158.112 LD_PRELOAD=libsdp.so SIMPLE_LIBSDP=OK iperf -c 192.168.2.$i -p $((i-9))002 -t 120 -d -P 5 &
>     ssh 10.224.158.113 LD_PRELOAD=libsdp.so SIMPLE_LIBSDP=OK iperf -c 192.168.2.$i -p $((i-9))003 -t 120 -d -P 5 &
> done
> 
>  
> 
> Any ideas?
> 
>  
> 
> Regards,
> 
>  
> 
> Koen
> 
>
> __
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of SEGERS
> Koen
> Sent: Tuesday, May 22, 2007 10:55
> To: Ami Perlmutter; Shirley Ma
> Cc: [EMAIL PROTECTED];
> general@lists.openfabrics.org
> Subject: RE: [ofa-general] GPFS node loses IB-connection
> 
> 
>  
> 
> GPFS keeps its connection constantly open.
> 
>  
> 
> We did some more tests with iperf:
> 
> If we don’t run bidirectional tests, all connections keep
> running smoothly.

[ofa-general] DDR and SDR

2007-05-07 Thread Koen Segers

A simple question:

Is it possible to connect an SDR HCA to a DDR switch?
If so, what happens with the data that is sent from a DDR HCA to the SDR
HCA?

Regards,

Koen
