[ofa-general] Re: [Bug 465] IPoIB CM HA fails after several hours of failovers

2007-04-11 Thread Michael S. Tsirkin
I've tried this with RHEL4 U3 x86_64 LionMini SDR, SLES10 x86_64 LionCub DDR, and RHEL4 U3 x86_64 LionMini DDR so far. You reported an oops. On which OS/HW/FW did you observe it? -- MST ___ general mailing list general@lists.openfabrics.org

Re: [ofa-general] questions about OFED 1.2 IPoIB bonding

2007-04-11 Thread Michael S. Tsirkin
Quoting Roland Dreier [EMAIL PROTECTED]: Subject: Re: [ofa-general] questions about OFED 1.2 IPoIB bonding Out of curiosity, why does this cause a catastrophic error? I would have thought a work request with a bogus bus address would generate an affiliated error, since you know

Re: [ofa-general] Changed default bugzilla Priority/Severity from P1/Blocker to P3/Normal

2007-04-11 Thread Michael S. Tsirkin
Quoting Scott Weitzenkamp (sweitzen) [EMAIL PROTECTED]: Subject: [ofa-general] Changed default bugzilla Priority/Severity from P1/Blocker to P3/Normal Now there will be no more accidental P1/Blocker bugs. Good idea. -- MST

[ofa-general] Re: multicast join failed for...

2007-04-11 Thread Michael S. Tsirkin
Are all your ports DDR or do you have a mix ? If all are DDR, you can configure the default partition to use this rate. If I get this right, user has to manually configure the rate in a mixed subnet. Is that correct? If yes, I'm actually not too happy with this. Would something like the

Re: [ewg] Re: [ofa-general] OFED-1.2 compiles 32b apps on SLES 10 for IBM PPC

2007-04-11 Thread Stefan Roscher
Hi, On Tuesday 10 April 2007 20:25, Al Chu wrote: On Tue, 2007-04-10 at 12:14 -0500, Steve Wise wrote: I just built the ofed-1.2-rc1 kit on an IBM P5 PPC with SLES 10 and some of the apps got built as 32b. Seems like gcc on this distro defaults to 32b: My opinion is this bug is with Suse.

[ofa-general] ofa_1_2_kernel 20070411-0200 daily build status

2007-04-11 Thread Vladimir Sokolovsky
This email was generated automatically, please do not reply Common build parameters: --with-ipoib-mod --with-sdp-mod --with-srp-mod --with-user_mad-mod --with-user_access-mod --with-mthca-mod --with-core-mod --with-addr_trans-mod --with-rds-mod --with-cxgb3-mod Passed: Passed on i686 with

[ofa-general] Re: re [NET]: Fix neighbour destructor handling

2007-04-11 Thread Michael S. Tsirkin
I did follow most of the discussions between you and MoniS re the ipoib/bonding integration in OFED 1.2 and elsewhere; however, I don't see why bonding is basically broken for ipoib. If you don't mind, please tell me the bottom line from your perspective. Here's a short summary of issues

[ofa-general] Re: multicast join failed for...

2007-04-11 Thread Michael S. Tsirkin
Quoting Hal Rosenstock [EMAIL PROTECTED]: Subject: Re: multicast join failed for... On Mon, 2007-04-09 at 18:47, Egor Tur wrote: Hi folks. ib1: multicast join failed for ff12:601b::::::0001, status -22 ib0: multicast join failed for
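Kernel log lines such as "status -22" report a negated errno value. On Linux, 22 is EINVAL, so the join completed with an invalid-argument error. The following is a minimal sketch for decoding such values. It assumes a Linux host, because errno numbering is platform-specific. The helper name `describe_kernel_status` is hypothetical.

```python
import errno
import os


def describe_kernel_status(status: int) -> str:
    """Translate a negative Linux kernel status (e.g. -22) into its errno name.

    Hypothetical helper; assumes Linux errno numbering.
    """
    if status == 0:
        return "success"
    if status > 0:
        return f"positive value {status}"
    code = -status
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name} ({os.strerror(code)})"


print(describe_kernel_status(-22))  # EINVAL (Invalid argument) on Linux
```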

Re: [ofa-general] OFA server bugzilla product

2007-04-11 Thread Tziporet Koren
Very good idea Tziporet

Re: [ofa-general] iser/lustre memfree issues

2007-04-11 Thread Or Gerlitz
Roland Dreier wrote: 472 Data corruption with Lustre+OFED when using FMR on memfree HCAs We see it also with iser, basically only on scsi --read-- which from IB perspective is RDMA write from the target to the initiator. The env we see it is Sinai (25204) hw_ver=A0 and

Re: [ofa-general] iser/lustre memfree issues

2007-04-11 Thread Michael S. Tsirkin
Quoting Or Gerlitz [EMAIL PROTECTED]: Subject: Re: [ofa-general] iser/lustre memfree issues Roland Dreier wrote: 472 Data corruption with Lustre+OFED when using FMR on memfree HCAs We see it also with iser, basically only on scsi --read-- which from IB perspective is RDMA

Re: [ofa-general] mvapich2 over iwarp DOA - bug520

2007-04-11 Thread Vladimir Sokolovsky
On Wed, 2007-04-04 at 22:13 -0500, Steve Wise wrote: On Wed, 2007-04-04 at 10:57 -0500, Steve Wise wrote: I just built and installed today's daily ofed-1.2 build and mvapich2 doesn't work at all over iwarp. The build is OFED-1.2-20070404-0600.tgz. I've opened bug 520 to track this.

[ofa-general] Re: multicast join failed for...

2007-04-11 Thread Hal Rosenstock
On Wed, 2007-04-11 at 03:22, Michael S. Tsirkin wrote: Are all your ports DDR or do you have a mix ? If all are DDR, you can configure the default partition to use this rate. If I get this right, user has to manually configure the rate in a mixed subnet. Is that correct? I'm not sure;

[ofa-general] Re: multicast join failed for...

2007-04-11 Thread Hal Rosenstock
On Wed, 2007-04-11 at 05:49, Michael S. Tsirkin wrote: Quoting Hal Rosenstock [EMAIL PROTECTED]: Subject: Re: multicast join failed for... On Mon, 2007-04-09 at 18:47, Egor Tur wrote: Hi folks. ib1: multicast join failed for ff12:601b::::::0001, status

[ofa-general] Re: [GIT PULL] OFED-1.2 - Chelsio Bug Fixes

2007-04-11 Thread Vladimir Sokolovsky
On Tue, 2007-04-10 at 17:02 -0500, Steve Wise wrote: Vlad, Please pull these cxgb3 and iw_cxgb3 changes from git://git.openfabrics.org/~swise/ofed_1_2 ofed_1_2 Thanks, Steve. Divy Le Ray: Ensure that the TCAM active region size is at least 16.

[ofa-general] Re: multicast join failed for...

2007-04-11 Thread Michael S. Tsirkin
If yes, I'm actually not too happy with this. Would something like the following heuristic work better? - select the max rate between all participants The issue is that one doesn't know all the participants in a group as they are joined dynamically. (I think we've been over this
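One way to read the proposed heuristic is "pick the highest rate that every current participant can support". That value is the minimum of the participants' port rates. The sketch below is a toy model of that interpretation, not of how the SA works. The class, method, and port names are hypothetical, and rates are given in Gb/s. It also shows the objection raised in reply: members join dynamically, so the value can drop after the group already exists.

```python
class MulticastGroupModel:
    """Toy model: track member port rates (Gb/s) and the rate all can sustain.

    Hypothetical; illustrates one reading of the heuristic, not SA behavior.
    """

    def __init__(self):
        self.members = {}

    def join(self, port, rate_gbps):
        self.members[port] = rate_gbps
        return self.usable_rate()

    def leave(self, port):
        self.members.pop(port, None)
        return self.usable_rate()

    def usable_rate(self):
        # Highest rate every current member can receive at: the slowest link.
        return min(self.members.values()) if self.members else None


group = MulticastGroupModel()
print(group.join("hostA-port1", 20))  # 20
print(group.join("hostB-port1", 20))  # 20
print(group.join("hostC-port1", 10))  # 10: a late SDR joiner lowers the rate
```

The model recomputes the rate on every join. That is what an existing group cannot easily do once its members are already sending at the higher rate.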

[ofa-general] OFA server gitweb internal server error

2007-04-11 Thread Hal Rosenstock
Anyone know what's going on with gitweb on the OFA server ? When I try: http://www.openfabrics.org/gitweb/ I get: Internal Server Error The server encountered an internal error or misconfiguration and was unable to complete your request. Please contact the server administrator, [EMAIL

[ofa-general] Re: multicast join failed for...

2007-04-11 Thread Hal Rosenstock
On Wed, 2007-04-11 at 14:12, Michael S. Tsirkin wrote: If yes, I'm actually not too happy with this. Would something like the following heuristic work better? - select the max rate between all participants The issue is that one doesn't know all the participants in a group as

Re: [ofa-general] OFA server gitweb internal server error

2007-04-11 Thread Jeff Becker
Hi Hal. After a long delay, it seems to work for me. Thanks. -jeff On 11 Apr 2007 14:17:25 -0400, Hal Rosenstock [EMAIL PROTECTED] wrote: Anyone know what's going on with gitweb on the OFA server ? When I try: http://www.openfabrics.org/gitweb/ I get: Internal Server Error The server

[ofa-general] RE: Bugzilla setup for utils component

2007-04-11 Thread Scott Weitzenkamp \(sweitzen\)
The utils component started as a place for installer bugs. We then created an Installer component. I view utils as a place for bugs on tvflash, mstflint, perftest, anything that does not fit in another component. We could create components for ibutils, tvflash, mstflint, etc. if that would be

[ofa-general] RE: Bugzilla setup for utils component

2007-04-11 Thread Hal Rosenstock
On Wed, 2007-04-11 at 14:57, Scott Weitzenkamp (sweitzen) wrote: The utils component started as a place for installer bugs. We then created an Installer component. I view utils as a place for bugs on tvflash, mstflint, perftest, anything that does not fit in another component. We could

[ofa-general] RE: Bugzilla setup for utils component

2007-04-11 Thread Scott Weitzenkamp \(sweitzen\)
I added ibutils. -Original Message- From: Hal Rosenstock [mailto:[EMAIL PROTECTED] Sent: Wednesday, April 11, 2007 11:57 AM To: Scott Weitzenkamp (sweitzen) Cc: OpenFabricsEWG; general@lists.openfabrics.org; Eitan Zahavi Subject: RE: Bugzilla setup for utils component On Wed,

[ofa-general] {PATCH] Documentation/user_mad.txt: Clarify transaction ID usage

2007-04-11 Thread Hal Rosenstock
Documentation/user_mad.txt: Clarify transaction ID usage Signed-off-by: Hal Rosenstock [EMAIL PROTECTED] diff --git a/Documentation/infiniband/user_mad.txt b/Documentation/infiniband/user_mad.txt index 750fe5e..1d2dbf1 100644 --- a/Documentation/infiniband/user_mad.txt +++

[ofa-general] Re: multicast join failed for...

2007-04-11 Thread Michael S. Tsirkin
Quoting Hal Rosenstock [EMAIL PROTECTED]: Subject: Re: multicast join failed for... On Wed, 2007-04-11 at 14:12, Michael S. Tsirkin wrote: If yes, I'm actually not too happy with this. Would something like the following heuristic work better? - select the max rate between all

[ofa-general] How fast to get RDMA_CM_EVENT_DISCONNECTED ?

2007-04-11 Thread Tang, Changqing
Sean: A question about the rdmacm library. I use rdma_connect/accept to wire up the IB connection between A and B. Suppose the IB connection is broken, either because process B dies or because of a bad cable. If process A only receives messages from process B, can process A get an RDMA_CM_EVENT_DISCONNECTED

[ofa-general] Re: {PATCH] Documentation/user_mad.txt: Clarify transaction ID usage

2007-04-11 Thread Roland Dreier
+Transaction IDs + + Clients of the MAD layer can use the lower 32 bits of the + transaction ID field to track mad request/response pairs. The + upper 32 bits are reserved for use by the kernel ib_mad module. This is a good addition. But I think it would be worth saying which half

[ofa-general] Re: {PATCH] Documentation/user_mad.txt: Clarify transaction ID usage

2007-04-11 Thread Hal Rosenstock
On Wed, 2007-04-11 at 17:02, Roland Dreier wrote: +Transaction IDs + + Clients of the MAD layer can use the lower 32 bits of the + transaction ID field to track mad request/response pairs. The + upper 32 bits are reserved for use by the kernel ib_mad module. This is a good

Re: [ofa-general] Re: {PATCH] Documentation/user_mad.txt: Clarify transaction ID usage

2007-04-11 Thread Roland Dreier
This is a good addition. But I think it would be worth saying which half of the TID is the lower half. I would fix it up myself but I don't know off the top of my head which byte order the TID is interpreted with. Should this be described relative to network (rather than host)
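The patch says clients own the lower 32 bits of the transaction ID and the kernel ib_mad module owns the upper 32 bits. The open question here is which bytes on the wire are the "lower half". The sketch below shows the split under one stated assumption: the 64-bit TID is interpreted as a big-endian (network order) value. Under that assumption, the client half is the last four bytes of the field. The helper names are hypothetical.

```python
import struct

# Assumption: the TID is a big-endian 64-bit value; the upper 32 bits belong
# to the kernel ib_mad module and the lower 32 bits to the client.
LOW_MASK = 0xFFFFFFFF


def compose_tid(kernel_hi: int, client_lo: int) -> bytes:
    """Pack the two halves into the 8-byte wire format (hypothetical helper)."""
    value = ((kernel_hi & LOW_MASK) << 32) | (client_lo & LOW_MASK)
    return struct.pack(">Q", value)


def client_part(tid_bytes: bytes) -> int:
    """Recover the client-owned lower 32 bits from a wire-format TID."""
    (value,) = struct.unpack(">Q", tid_bytes)
    return value & LOW_MASK


wire = compose_tid(kernel_hi=0x7, client_lo=0x1234)
print(wire.hex())              # 0000000700001234
print(hex(client_part(wire)))  # 0x1234
```

If the TID were interpreted in host order on a little-endian machine, the byte positions would differ. That is why stating the byte order in the documentation matters.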

Re: [ofa-general] Re: {PATCH] Documentation/user_mad.txt: Clarify transaction ID usage

2007-04-11 Thread Hal Rosenstock
On Wed, 2007-04-11 at 17:13, Roland Dreier wrote: This is a good addition. But I think it would be worth saying which half of the TID is the lower half. I would fix it up myself but I don't know off the top of my head which byte order the TID is interpreted with. Should this

[ofa-general] Re: multicast join failed for...

2007-04-11 Thread Hal Rosenstock
On Wed, 2007-04-11 at 15:47, Michael S. Tsirkin wrote: Quoting Hal Rosenstock [EMAIL PROTECTED]: Subject: Re: multicast join failed for... On Wed, 2007-04-11 at 14:12, Michael S. Tsirkin wrote: If yes, I'm actually not too happy with this. Would something like the

[ofa-general] RE: [ewg] Re: SRP HA dm_multipath testing and questions

2007-04-11 Thread Scott Weitzenkamp \(sweitzen\)
I haven't tried adding or removing storage, just failover. I guess leave 91-srp.rules in for now, it seems benign. Scott -Original Message- From: Ishai Rabinovitz [mailto:[EMAIL PROTECTED] Sent: Tuesday, April 10, 2007 9:46 PM To: Chieng Etta Cc: Scott Weitzenkamp (sweitzen);

Re: [ofa-general] Re: [PATCH 00 of 33] Set of ipath patches for 2.6.22

2007-04-11 Thread Roland Dreier
BTW: any idea how this ever got triggered? The only way I can see is if you're either not using libipathverbs and libibverbs and you just create the CQ some other way, which seems unlikely. Do you know how Jason triggered this bug? Yes, it was because he was using 32-bit userspace and

Re: [ofa-general] Re: re [NET]: Fix neighbour destructor handling

2007-04-11 Thread Or Gerlitz
On 4/11/07, Michael S. Tsirkin [EMAIL PROTECTED] wrote: I did follow most of the discussions between you and MoniS re the ipoib/bonding integration in OFED 1.2 and elsewhere; however, I don't see why bonding is basically broken for ipoib. If you don't mind, please tell me the bottom line

Re: [ewg] Re: [ofa-general] iser/lustre memfree issues

2007-04-11 Thread Or Gerlitz
On 4/12/07, Roland Dreier [EMAIL PROTECTED] wrote: Could you try commenting out just these 2 lines in mthca_cmd.c: if (dev->mthca_flags & MTHCA_FLAG_SINAI_OPT) MTHCA_PUT(inbox, 0x1, INIT_HCA_FLAGS1_OFFSET); (reverting your changes, that is keeping

Re: [ofa-general] Re: [PATCH 00 of 33] Set of ipath patches for 2.6.22

2007-04-11 Thread Robert Walsh
Roland Dreier wrote: BTW: any idea how this ever got triggered? The only way I can see is if you're either not using libipathverbs and libibverbs and you just create the CQ some other way, which seems unlikely. Do you know how Jason triggered this bug? Yes, it was because he was using

[ofa-general] RE: How fast to get RDMA_CM_EVENT_DISCONNECTED ?

2007-04-11 Thread Tang, Changqing
-Original Message- From: Sean Hefty [mailto:[EMAIL PROTECTED] Sent: Wednesday, April 11, 2007 4:50 PM To: Tang, Changqing Cc: general@lists.openfabrics.org Subject: Re: How fast to get RDMA_CM_EVENT_DISCONNECTED ? A question about rdmacm library. I use

RE: [ofa-general] RE: How fast to get RDMA_CM_EVENT_DISCONNECTED ?

2007-04-11 Thread Sean Hefty
Several seconds to detect connection failure is not acceptable for us, so if I use rdmacm, I want to know whether I can detect the connection failure faster than with a heartbeat message. In general, use of the rdma or ib cm will not help detect failures on an active connection any faster. If the remote process
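If the CM does not report failures on an active connection any faster, detection falls to the application, typically through heartbeats. The sketch below is a minimal application-level monitor. The class name and the injectable `clock` parameter are hypothetical, and it does not use any rdmacm API. It declares the peer dead when nothing has arrived within a timeout.

```python
import time


class HeartbeatMonitor:
    """Declare a peer dead if nothing was heard from it within `timeout` seconds.

    Hypothetical application-level helper; `clock` is injectable for testing.
    """

    def __init__(self, timeout: float, clock=time.monotonic):
        self.timeout = timeout
        self.clock = clock
        self.last_heard = clock()

    def on_message(self):
        # Any receive (data or heartbeat) proves the peer is alive.
        self.last_heard = self.clock()

    def peer_alive(self) -> bool:
        return self.clock() - self.last_heard < self.timeout


# Usage with a fake clock so the example is deterministic.
now = [0.0]
mon = HeartbeatMonitor(timeout=0.5, clock=lambda: now[0])
now[0] = 0.4
print(mon.peer_alive())  # True
now[0] = 0.6
print(mon.peer_alive())  # False
```

The worst-case detection time is roughly the heartbeat interval plus the timeout. Shortening either one speeds up detection but raises the risk of declaring a slow peer dead.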

[ofa-general] Re: multicast join failed for...

2007-04-11 Thread Egor Tur
Hi folks. I see that my small problem has attracted some interest. Thanks for your help. Rate 6 is 20 Gb/sec whereas 3 is 10 Gb/sec. So the port is 4x DDR (rate 6) and the group is 4x SDR. The request is for a rate equal to the port rate, so it fails. Are all your ports DDR or do you have a mix? If all
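The failure above can be modeled as a rate-selector check. The numeric rate codes in the sketch follow the Linux `ib_rate` encoding (3 = 10 Gb/s, 6 = 20 Gb/s, and so on). The selector values follow `IB_SA_GT`/`IB_SA_LT`/`IB_SA_EQ` (0/1/2). Comparisons are done in Gb/s because the codes are not monotonic in speed; for example, code 5 is 5 Gb/s. This is a simplified model: `join_rate_ok` is a hypothetical helper, not an SA implementation, and it ignores the other MCMemberRecord fields.

```python
# Linux ib_rate codes mapped to link rate in Gb/s.
IB_RATE_GBPS = {2: 2.5, 5: 5, 3: 10, 6: 20, 4: 30, 7: 40, 8: 60, 9: 80, 10: 120}

# SA selector values (greater than, less than, exactly).
SEL_GT, SEL_LT, SEL_EQ = 0, 1, 2


def join_rate_ok(group_rate: int, selector: int, requested_rate: int) -> bool:
    """Does an existing group's rate satisfy the join's selector/rate pair?"""
    group = IB_RATE_GBPS[group_rate]
    req = IB_RATE_GBPS[requested_rate]
    if selector == SEL_GT:
        return group > req
    if selector == SEL_LT:
        return group < req
    if selector == SEL_EQ:
        return group == req
    raise ValueError(f"unsupported selector {selector}")


# 4x SDR group (code 3) joined from a 4x DDR port asking for exactly code 6:
print(join_rate_ok(3, SEL_EQ, 6))  # False: 10 Gb/s != 20 Gb/s
print(join_rate_ok(3, SEL_LT, 6))  # True: 10 Gb/s < 20 Gb/s
```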

Re: [ofa-general] Re: multicast join failed for...

2007-04-11 Thread Ira Weiny
On 11 Apr 2007 17:45:54 -0400 Hal Rosenstock [EMAIL PROTECTED] wrote: On Wed, 2007-04-11 at 15:47, Michael S. Tsirkin wrote: - previously we had some client failing join which is worse. Maybe not. Maybe that's what the admin wants (to keep the higher rate rather than degrade the group

[ofa-general] Re: multicast join failed for...

2007-04-11 Thread Michael S. Tsirkin
Quoting Hal Rosenstock [EMAIL PROTECTED]: Subject: Re: multicast join failed for... On Wed, 2007-04-11 at 15:47, Michael S. Tsirkin wrote: Quoting Hal Rosenstock [EMAIL PROTECTED]: Subject: Re: multicast join failed for... On Wed, 2007-04-11 at 14:12, Michael S. Tsirkin wrote:

Re: [ofa-general] RE: How fast to get RDMA_CM_EVENT_DISCONNECTED ?

2007-04-11 Thread Roland Dreier
Yes. Internally in A, if the number of receives exceeds lowwater (4), an ACK is sent back. I assume the ACK is not triggered at the moment. When A is trying to receive a message from B and the message never shows up, A actually sends a heartbeat back to B; however, it takes several seconds for
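The low-water ACK scheme described above can be modeled as a receive counter. The sketch takes "exceeds lowwater (4)" literally, meaning an ACK fires on the 5th unacknowledged receive. The class name `LowWaterAcker` is hypothetical. The model shows why ACKs cannot detect a dead peer: if B stops sending, A's counter never advances and no ACK is triggered.

```python
class LowWaterAcker:
    """Hypothetical receiver-side counter: ACK once receives exceed `lowwater`."""

    def __init__(self, lowwater: int = 4):
        self.lowwater = lowwater
        self.unacked = 0
        self.acks_sent = 0

    def on_receive(self) -> bool:
        """Record one receive; return True if an ACK should be sent now."""
        self.unacked += 1
        if self.unacked > self.lowwater:
            self.unacked = 0
            self.acks_sent += 1
            return True
        return False


acker = LowWaterAcker(lowwater=4)
print([acker.on_receive() for _ in range(10)])
# [False, False, False, False, True, False, False, False, False, True]
```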

[ofa-general] Re: [Bug 506] IPoIB IPv4 multicast throughput is poor

2007-04-11 Thread Michael S. Tsirkin
Yes, I get a steady stream of these on the sender. ib0: TX ring full, stopping kernel net queue Aha. (As a note, it's always useful to set debug level when you experience problems). Why am I getting low throughput of IP multicast vs IPoIB UD UDP unicast? It's something in the hardware - it's
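The "TX ring full, stopping kernel net queue" message means sends were posted faster than completions freed ring slots. The sketch below is a simplified model of that flow control. It assumes a stop-when-full and wake-at-half policy; the wake threshold is an assumption and is not taken from this thread. The class name is hypothetical.

```python
class TxRingModel:
    """Simplified send ring with stop-when-full / wake-at-half flow control.

    Hypothetical model; the half-ring wake threshold is an assumption.
    """

    def __init__(self, size: int):
        self.size = size
        self.outstanding = 0
        self.queue_stopped = False

    def post_send(self) -> bool:
        if self.queue_stopped:
            return False  # the stack must hold the packet until the queue wakes
        self.outstanding += 1
        if self.outstanding == self.size:
            self.queue_stopped = True  # "TX ring full, stopping kernel net queue"
        return True

    def complete_send(self):
        if self.outstanding == 0:
            raise ValueError("no sends outstanding")
        self.outstanding -= 1
        if self.queue_stopped and self.outstanding == self.size // 2:
            self.queue_stopped = False  # wake once half the ring has drained


ring = TxRingModel(size=8)
for _ in range(8):
    ring.post_send()
print(ring.queue_stopped)  # True
print(ring.post_send())    # False
for _ in range(4):
    ring.complete_send()
print(ring.queue_stopped)  # False
```

When completions arrive slowly relative to sends, the queue spends much of its time stopped. That would show up as reduced throughput even though no packets are dropped at this layer.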

[ofa-general] RE: [Bug 506] IPoIB IPv4 multicast throughput is poor

2007-04-11 Thread Scott Weitzenkamp \(sweitzen\)
Michael or Roland, could you please try iperf with UDP vs multicast, so I'm not the middleman here? Scott -Original Message- From: Michael S. Tsirkin [mailto:[EMAIL PROTECTED] Sent: Wednesday, April 11, 2007 9:17 PM To: [EMAIL PROTECTED]; Roland Dreier; Scott Weitzenkamp