Re: [openib-general] please pull for 2.6.21: fix + add IB multicast support

2007-02-09 Thread Michael S. Tsirkin
Quoting Roland Dreier [EMAIL PROTECTED]: Subject: Re: please pull for 2.6.21: fix + add IB multicast support I merged the increment port number and remove redundant '_wq' patches from git.openfabrics.org/~shefty/scm/rdma-dev.git for-roland I plan to review the multicast stuff next week and

Re: [openib-general] please pull for 2.6.21: fix + add IB multicast support

2007-02-09 Thread Or Gerlitz
On 2/9/07, Roland Dreier [EMAIL PROTECTED] wrote: I plan to review the multicast stuff next week and I hope to merge it for 2.6.21 thanks, good news! Or, have you or anyone else at Voltaire read over the code in addition to using it? Do you see anything that should be cleaned up? OK, I

Re: [openib-general] please pull for 2.6.21: fix + add IB multicast support

2007-02-09 Thread Michael S. Tsirkin
Or, have you or anyone else at Voltaire read over the code in addition to using it? Do you see anything that should be cleaned up? OK, I... Most of the review I did (and interaction with Sean to add changes) was on the rdma_cm: add multicast communication support patch, and I was less

[openib-general] ofa_1_2_kernel 20070209-0200 daily build status

2007-02-09 Thread vlad
-2.6.12 Passed on ia64 with linux-2.6.16 Passed on ia64 with linux-2.6.14 Passed on ppc64 with linux-2.6.18 Passed on ia64 with linux-2.6.15 Passed on ia64 with linux-2.6.17 Failed: Build failed on ia64 with linux-2.6.16.21-0.8-default Log: /home/vlad/tmp/ofa_1_2_kernel-20070209-0200_linux-2.6.16.21-0.8

Re: [openib-general] Problem is routing CM REQ

2007-02-09 Thread Hal Rosenstock
On Thu, 2007-02-08 at 18:43, Sean Hefty wrote: Looking at the problem more, I think that the issue extends to the remote port LID as well. My expectation with a local path record query is that the SLID is the local port, and the DLID is the local router. This should be

Re: [openib-general] Problem is routing CM REQ

2007-02-09 Thread Hal Rosenstock
On Thu, 2007-02-08 at 23:37, Jason Gunthorpe wrote: On Thu, Feb 08, 2007 at 03:43:24PM -0800, Sean Hefty wrote: Looking at the problem more, I think that the issue extends to the remote port LID as well. My expectation with a local path record query is that the SLID is the

[openib-general] [PATCH ofed-1.2] ofa_user.spec: fix installation path for ehca.driver

2007-02-09 Thread Stefan Roscher
Hi Vladimir, we tested the newest ofed1.2 package and found that the ehca.driver file is not copied into /usr/local/ofed/etc/libibverbs.d/ This patch adds the installation path for ehca.driver to ofa_user.spec. Please ensure you first apply the ofa_user.spec patch I sent yesterday:

Re: [openib-general] [PATCH] RDMA/iwcm: Bugs in cm_conn_req_handler()

2007-02-09 Thread Tom Tucker
Kumar: I _LOVE_ the patch and the fact that you're making this code better. I just want to tweak it a little bit... * Please convince yourself (and me ;-)) that the iw_cm_destroy_id can never block where you've put it. I'll bet that it's fine, but convince yourself too. Your comment scared me a

Re: [openib-general] [PATCH 0/5] iw_cxgb3 - misc cleanup and fixes

2007-02-09 Thread Steve Wise
On Fri, 2007-02-09 at 08:51 +0200, Michael S. Tsirkin wrote: Also I agree with MST, I would like to see the core/ subdirectory die completely. ok ok...I'll kill the subdir... It's not just the directory BTW. Stuff like building completions in t3_cqe format and then reformatting

Re: [openib-general] [PATCH 0/5] iw_cxgb3 - misc cleanup and fixes

2007-02-09 Thread Michael S. Tsirkin
Quoting r. Steve Wise [EMAIL PROTECTED]: Subject: Re: [PATCH 0/5] iw_cxgb3 - misc cleanup and fixes On Fri, 2007-02-09 at 08:51 +0200, Michael S. Tsirkin wrote: Also I agree with MST, I would like to see the core/ subdirectory die completely. ok ok...I'll kill the

Re: [openib-general] dapl broken for iWARP

2007-02-09 Thread Kanevsky, Arkady
Steve, what is the issue with using max_qp_rd_atom and max_qp_init_rd_atom besides the bad name? Thanks, Arkady Kanevsky email: [EMAIL PROTECTED] Network Appliance Inc. phone: 781-768-5395 1601 Trapelo Rd. - Suite 16. Fax: 781-895-1195 Waltham, MA 02451

Re: [openib-general] [PATCH 0/5] iw_cxgb3 - misc cleanup and fixes

2007-02-09 Thread Steve Wise
I understand, I did not get that. But for example create_read_req_cqe builds it in software. It could build ib_wc instead. Reads are handled in a slightly different manner. This is due to the fact that the T3 HW can complete a read out of order. For example: POST READ POST WRITE The

Re: [openib-general] dapl broken for iWARP

2007-02-09 Thread Steve Wise
On Fri, 2007-02-09 at 10:15 -0500, Kanevsky, Arkady wrote: Steve, what is the issue with using max_qp_rd_atom and max_qp_init_rd_atom besides the bad name? It's a hack. But Bob already asked to do this, so I guess I will. We still don't ensure interoperability with DAPL consumers. A global

Re: [openib-general] [PATCH] IPOIB: Use a GRH when appropriate for unicast packets

2007-02-09 Thread Hal Rosenstock
Arkady, On Fri, 2007-02-09 at 10:32, Kanevsky, Arkady wrote: Hal, unfortunately, IBTA punted on this issue. We considered it for IBTA CM IP address annex but at the end could not handle all the cases. Thanks. Any idea if this issue might be addressed (no pun intended) or whether it is left

Re: [openib-general] [PATCH] OpenSM/osm_ucast_lash.c: In osm_get_lash_sl, fix SL when CA ports on same switch

2007-02-09 Thread Dale Purdy
We have successfully tested this bug fix and would like to see it pushed into the 1.2 branch. Dale On Thu, 8 Feb 2007, Hal Rosenstock wrote: OpenSM/osm_ucast_lash.c: In osm_get_lash_sl, fix SL when CA ports on same switch This change resolves an issue with strange SL assignment when two

Re: [openib-general] [PATCH] OpenSM/osm_ucast_lash.c: In osm_get_lash_sl, fix SL when CA ports on same switch

2007-02-09 Thread Hal Rosenstock
On Fri, 2007-02-09 at 11:05, Dale Purdy wrote: We have successfully tested this bug fix Thanks. and would like to see it pushed into the 1.2 branch. Already pushed for ofed_1_2. I will be sending a note to Vlad to pick these up and it should be in alpha. -- Hal Dale On Thu, 8 Feb

[openib-general] [PATCH] for-2.6.21 Declare iwch_ev_dispatch in iwch.h

2007-02-09 Thread Steve Wise
Declare iwch_ev_dispatch in iwch.h Remove the extern declaration from iwch.c and put it in iwch.h Signed-off-by: Steve Wise [EMAIL PROTECTED] --- drivers/infiniband/hw/cxgb3/iwch.c | 2 -- drivers/infiniband/hw/cxgb3/iwch.h | 2 ++ 2 files changed, 2 insertions(+), 2 deletions(-) diff

Re: [openib-general] Problem is routing CM REQ

2007-02-09 Thread Sean Hefty
I have a follow-up question to this. With CM, how is the SL for each side determined? I'm looking through the code here and it looks like the SL of the active side is passed in the REQ to the passive side (i.e. both sides are the same). But cma_query_ib_route does not set the reversible bit when

Re: [openib-general] Unknown SMP Recv

2007-02-09 Thread Michael Arndt
Hi, umad_send takes the timeout in msec. 100 msec is too short. Try something on the order of seconds. Note also that negative 'timeout_ms' value makes the kernel wait for the reply forever. I have tried many values, but sooner or later the umad_send broke down, which is bad because the SM

Re: [openib-general] Problem is routing CM REQ

2007-02-09 Thread Sean Hefty
SLID corresponding to SGID and a DLID for some IB router on the subnet which can route to the remote DGID. This was my assumption as well. An SM is free to choose the SLID and DLID it supplies; if there are multiple LIDs for the ports in question, it can choose alternates. The key here is

Re: [openib-general] please pull for 2.6.21: fix + add IB multicast support

2007-02-09 Thread Sean Hefty
+ member = kzalloc(sizeof *member, gfp_mask); + if (!member) + return ERR_PTR(-ENOMEM); This appears okay to replace with kmalloc. + group = kzalloc(sizeof *group, gfp_mask); + if (!group) + return NULL; + We would need additional
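The two allocations quoted above fail differently: the member path returns ERR_PTR(-ENOMEM) while the group path returns NULL, so each caller has to check the matching convention. A minimal userspace sketch of the kernel's error-pointer idiom follows; `ERR_PTR`, `PTR_ERR` and `IS_ERR` are re-created here for illustration (the real ones live in <linux/err.h>), and `struct mc_member` and `alloc_member()` are hypothetical stand-ins, not code from the patch. calloc() plays the role of kzalloc(), since both zero-fill.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Userspace re-creation of the kernel error-pointer idiom (illustration only). */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error)
{
    return (void *)(intptr_t)error;
}

static inline long PTR_ERR(const void *ptr)
{
    return (long)(intptr_t)ptr;
}

static inline int IS_ERR(const void *ptr)
{
    /* Error codes -1..-MAX_ERRNO map onto the top of the address space. */
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical stand-in for the multicast member structure. */
struct mc_member {
    int state;
};

/* Hypothetical allocator mirroring the quoted member path. */
static struct mc_member *alloc_member(int simulate_oom)
{
    /* calloc zero-fills like kzalloc; plain malloc (like kmalloc) would not,
     * so switching is only safe if every field is set explicitly. */
    struct mc_member *member = simulate_oom ? NULL : calloc(1, sizeof *member);
    if (!member)
        return ERR_PTR(-ENOMEM);
    return member;
}
```

Note that IS_ERR(NULL) is false, so a caller written for error pointers would treat a NULL return as success; that is why the two conventions should not be mixed behind one call site.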

Re: [openib-general] Problem is routing CM REQ

2007-02-09 Thread Hal Rosenstock
On Fri, 2007-02-09 at 12:22, Sean Hefty wrote: SLID corresponding to SGID and a DLID for some IB router on the subnet which can route to the remote DGID. This was my assumption as well. An SM is free to choose the SLID and DLID it supplies; if there are multiple LIDs for the ports in

Re: [openib-general] Unknown SMP Recv

2007-02-09 Thread Hal Rosenstock
On Fri, 2007-02-09 at 12:14, Michael Arndt wrote: Hi, umad_send takes the timeout in msec. 100 msec is too short. Try something on the order of seconds. Note also that negative 'timeout_ms' value makes the kernel wait for the reply forever. I have tried many values, but sooner or

Re: [openib-general] Unknown SMP Recv

2007-02-09 Thread Michael Arndt
Hi, I have no clue; I don't really understand what you have changed so it is hard to know. For example: if I send ten SMPs like: for (i=0;i<10;i++){ umad_send(portid, agentid, msg, len, timeout, repeats); } timeout 0! then only the first one is sent and all other

Re: [openib-general] Open MPI rpmbuild fails in OFED-1.2

2007-02-09 Thread Jeff Squyres
New SRPM on server that munges the %build section into the %install section. Yuck. :-) On Feb 7, 2007, at 11:42 AM, Vladimir Sokolovsky wrote: Hi Jeff, Please remove %build macro from the RPM spec file. On SuSE distros it removes RPM_BUILD_ROOT. Executing(%build): /bin/sh -e

Re: [openib-general] Unknown SMP Recv

2007-02-09 Thread Sasha Khapyorsky
Hi Michael, On Fri, 2007-02-09 at 19:38 +0100, Michael Arndt wrote: Hi, I have no clue; I don't really understand what you have changed so it is hard to know. For example: if I send ten SMPs like: for (i=0;i<10;i++){ umad_send(portid, agentid, msg, len, timeout, repeats);

Re: [openib-general] Unknown SMP Recv

2007-02-09 Thread Hal Rosenstock
On Fri, 2007-02-09 at 13:38, Michael Arndt wrote: Hi, I have no clue; I don't really understand what you have changed so it is hard to know. For example: if I send ten SMPs like: for (i=0;i<10;i++){ umad_send(portid, agentid, msg, len, timeout, repeats); }

Re: [openib-general] Problem is routing CM REQ

2007-02-09 Thread Sean Hefty
the /missing part (right now) is locating the SA on that remote subnet if this is a needed function. Maybe we can expose this to SA clients through a ServiceRecord? This doesn't solve how the two SAs find each other (or any of the other difficult stuff), but with this and the path record

Re: [openib-general] Unknown SMP Recv

2007-02-09 Thread Sasha Khapyorsky
On Fri, 2007-02-09 at 21:19 +0100, Michael Arndt wrote: Hi, It is strange, I did similar thing (you can see in management/diags/src/mcm_rereg_test.c) and it worked fine for me. What location is that? Do git clone git://git.openfabrics.org/~halr/management and find this as

Re: [openib-general] Problem is routing CM REQ

2007-02-09 Thread Hal Rosenstock
On Fri, 2007-02-09 at 14:20, Jason Gunthorpe wrote: On Fri, Feb 09, 2007 at 12:58:51PM -0500, Hal Rosenstock wrote: For simplicity, assume a single path. My assumption in this case was that the SLID/DLID values would be reversed. That is, the LIDs are relative to the local
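A minimal sketch of the single-path assumption quoted above: seen from the far end, the same path has its GIDs and LIDs swapped. `struct path_sketch` and `reverse_path()` are hypothetical, simplified stand-ins (not the kernel's `struct ib_sa_path_rec`), and keeping the SL unchanged is an assumption. The swap only holds within one subnet; once the DLID is a router port, the remote end's LIDs belong to the remote subnet, which is the case this thread is trying to resolve.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified path description (not struct ib_sa_path_rec). */
struct path_sketch {
    uint8_t  sgid[16];
    uint8_t  dgid[16];
    uint16_t slid;   /* LRH source LID on the local subnet */
    uint16_t dlid;   /* LRH destination LID: remote port or a router port */
    uint8_t  sl;
};

/* Path as seen from the other end, assuming a reversible intra-subnet path:
 * source and destination GIDs and LIDs are simply exchanged. */
static struct path_sketch reverse_path(const struct path_sketch *p)
{
    struct path_sketch r = *p;

    memcpy(r.sgid, p->dgid, sizeof r.sgid);
    memcpy(r.dgid, p->sgid, sizeof r.dgid);
    r.slid = p->dlid;
    r.dlid = p->slid;
    return r;   /* SL copied unchanged (an assumption) */
}
```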

Re: [openib-general] Unknown SMP Recv

2007-02-09 Thread Hal Rosenstock
On Fri, 2007-02-09 at 15:19, Michael Arndt wrote: Hi, It is strange, I did similar thing (you can see in management/diags/src/mcm_rereg_test.c) and it worked fine for me. What location is that? Which libibumad version you are using? Also I understand you did some changes in the

Re: [openib-general] Problem is routing CM REQ

2007-02-09 Thread Hal Rosenstock
On Fri, 2007-02-09 at 15:34, Sean Hefty wrote: the /missing part (right now) is locating the SA on that remote subnet if this is a needed function. Maybe we can expose this to SA clients through a ServiceRecord? That might be one way if there were a standardized service name for SA and

[openib-general] MVAPICH 0.9.9-beta release is available

2007-02-09 Thread Dhabaleswar Panda
The MVAPICH team is pleased to announce the availability of MVAPICH 0.9.9-beta with the following NEW features: - Message coalescing support to enable reduction of per Queue-pair send queues for reduction in memory requirement on large scale clusters. This design also increases the small

Re: [openib-general] Problem is routing CM REQ

2007-02-09 Thread Jason Gunthorpe
On Fri, Feb 09, 2007 at 04:45:29PM -0500, Hal Rosenstock wrote: Offhand I don't see that the existing path record query structure has enough information to do this.. Particularly, in cases where each subnet has more than 1 router port there is no real guarantee that querying

Re: [openib-general] [PATCH] RDMA/iwcm: Bugs in cm_conn_req_handler()

2007-02-09 Thread Steve Wise
All 4 above cases were tested by injecting random error in iw_conn_req_handler() and running rdma_bw/krping, they were confirmed. I added the BUG_ON() to confirm the earlier check for id_priv->refcount==0 should always be true (and could be removed). Can you post the test case you're using

Re: [openib-general] Problem is routing CM REQ

2007-02-09 Thread Sean Hefty
Sean: Even if you can query both SA's there isn't enough information to force things to use the same router path in each direction. My assumption is that the remote SA contains the necessary information about how a packet coming from the local SGID to the remote DGID would be routed on the

Re: [openib-general] Unknown SMP Recv

2007-02-09 Thread Michael Arndt
Hi, below are the two missing files, sender.h and helper.c. Thanks Michael # sender.h ## // Includes #include <infiniband/umad.h> #include <string.h> #include <errno.h> #include <sys/select.h>

Re: [openib-general] Problem is routing CM REQ

2007-02-09 Thread Sean Hefty
The hard part is the global distribution of this information. The best idea I can come up with for locating remote SAs is to have the SAs assign themselves a specific Unicast Global GID Assigned Value. So, each SA gives itself a GID similar to: 64-bit subnet prefix :: 1. Hosts on remote
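Sean's proposal gives each SA a GID built from its subnet prefix plus a fixed interface ID (1 in the mail). A minimal C sketch of building and recognizing such a GID, assuming the usual 16-byte GID layout in network byte order (subnet prefix in bytes 0-7, interface ID in bytes 8-15). `make_sa_gid()` and `is_well_known_sa_gid()` are hypothetical names, and the value 1 is only the example from the mail, not an assigned value.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical: form a GID from a 64-bit subnet prefix and interface ID,
 * both written big-endian into the 16-byte GID. */
static void make_sa_gid(uint64_t subnet_prefix, uint64_t interface_id,
                        uint8_t gid[16])
{
    for (int i = 0; i < 8; i++) {
        gid[i]     = (uint8_t)(subnet_prefix >> (56 - 8 * i));
        gid[8 + i] = (uint8_t)(interface_id  >> (56 - 8 * i));
    }
}

/* Hypothetical: does this GID match "<subnet prefix>::1" for the given prefix? */
static int is_well_known_sa_gid(const uint8_t gid[16], uint64_t subnet_prefix)
{
    uint8_t expected[16];

    make_sa_gid(subnet_prefix, 1, expected);
    return memcmp(gid, expected, sizeof expected) == 0;
}
```

This only covers naming; how a host learns a remote subnet's prefix, and how the SAs find each other, remain open in the thread.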

Re: [openib-general] Problem is routing CM REQ

2007-02-09 Thread Jason Gunthorpe
On Fri, Feb 09, 2007 at 03:08:12PM -0800, Sean Hefty wrote: The route itself is determined using the SGID, DGID, TClass, FlowLabel. So, as long as the two queries match on these fields, I would think that it would work. So basically what you are saying is that the TClass and FlowLabel act

Re: [openib-general] Problem is routing CM REQ

2007-02-09 Thread Sean Hefty
So basically what you are saying is that the TClass and FlowLabel act as some kind of global disambiguation that lets all SAs know that the tuple <SGID,DGID,TClass,FlowLabel> MUST be matched with <LRH_A,LRH_B> on each side. Sort of... My reasoning is that if you look at a packet traveling from the