Re: [openib-general] [PATCH] IB_CM: Limit the MRA timeout

2006-10-03 Thread Sean Hefty
Ishai Rabinovitz wrote: There is a bug in the SRP Engenio target that sends a large value as the service timeout. (It gets 30, which means a timeout of (2^(30-8))=4195 sec.) Such a long timeout is not reasonable and it may leave the kernel module waiting on wait_for_completion and may stuck a lot of
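
The arithmetic in the quoted report can be checked with a small sketch. It assumes the usual IB CM encoding, in which a 5-bit exponent n stands for 4.096 us x 2^n, and the coarse 2^(n-8) ms approximation that the quote uses. The function names and the cap of 21 are hypothetical and are not taken from the patch.

```c
#include <stdint.h>

/* Approximate an IB CM timeout exponent in milliseconds as 2^(n-8).
 * This works because 4.096 us is close to 2^-8 ms (3.90625 us).
 * Assumption: exponents above 31 are treated as 31 (5-bit field). */
uint64_t approx_timeout_ms(unsigned int exponent)
{
    if (exponent > 31)
        exponent = 31;
    return 1ULL << (exponent > 8 ? exponent - 8 : 0);
}

/* Hypothetical cap: never accept an exponent larger than max_exponent. */
unsigned int limit_timeout_exponent(unsigned int exponent,
                                    unsigned int max_exponent)
{
    return exponent > max_exponent ? max_exponent : exponent;
}
```

With an exponent of 30 the approximation gives 4194304 ms, roughly 70 minutes, which is in line with the figure in the report. Capping the exponent at 21 in this sketch would bring the wait down to 8192 ms.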

Re: [openib-general] 2.6.18 kernel support in the main trunk.

2006-10-02 Thread Sean Hefty
James wrote: Unfortunately this does happen. Sean has already said he can only access git trees at kernel.org. I think he just said that he can only access git trees via http://. I can access git://git.kernel.org or http://git.somewhere.else. - Sean

Re: [openib-general] Kernel Oops in user-mad, mad

2006-10-02 Thread Sean Hefty
Hal Rosenstock wrote: Is there a possibility that there is a double deletion from a list somewhere? Perhaps but I don't see it. Sean ? Roland ? I looked at this and couldn't find anything obviously wrong. I was waiting to hear back to Michael's question about module unload being

Re: [openib-general] [RFC] [PATCH] ib_cm: send DREP in response to unmatched DREQ

2006-10-01 Thread Sean Hefty
This is correct. Note that the number of DREQ retries was changed to 15 now. do you mean internally changed in the CM or somehow controlled from the outside by uDAPL? I meant the number of retries set by RDMA CM. - Sean ___ openib-general mailing

[openib-general] [PATCH 2/5] 2.6.19 rdma_cm: fix device removal race

2006-09-29 Thread Sean Hefty
in cma_req_handler() so that process A will return instead of doing a rdma_destroy_id(). Signed-off-by: Krishna Kumar [EMAIL PROTECTED] Signed-off-by: Sean Hefty [EMAIL PROTECTED] --- diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c index 69bb089..f383a4f 100644 --- a/drivers

[openib-general] [PATCH 4/5] 2.6.19 rdma_cm: eliminate unnecessary remove list

2006-09-29 Thread Sean Hefty
Eliminate remove_list by using list_del_init instead during device removal handling. Signed-off-by: Krishna Kumar [EMAIL PROTECTED] Signed-off-by: Sean Hefty [EMAIL PROTECTED] --- This removes a stack variable and simplifies the code, but does not fix any bugs. We can defer this to 2.6.20

[openib-general] [PATCH 5/5] 2.6.19 rdma_cm: optimize error handling

2006-09-29 Thread Sean Hefty
Re-organize code relating to cma_get_net_info() and rdma_create_id() to optimize error case handling (no need to alloc memory/etc. as part of rdma_create_id() if input parameters are wrong). Signed-off-by: Krishna Kumar [EMAIL PROTECTED] Signed-off-by: Sean Hefty [EMAIL PROTECTED] --- This does

Re: [openib-general] 2.6.18 kernel support in the main trunk.

2006-09-29 Thread Sean Hefty
Steve Wise wrote: Why? I don't see anything wrong with the git trees that are at www.mellanox.co.il right now. Just trying to simplify things and centralize the technology location... Well, for myself, I have been unable to access the git trees at mellanox. For me to access git

[openib-general] [PATCH] ib_cm: fix module unload race with timewait

2006-09-29 Thread Sean Hefty
their deferred work on module unload. Signed-off-by: Sean Hefty [EMAIL PROTECTED] --- Erez, can you see if this fixes the crash problem that you're seeing? Index: cm.c === --- cm.c(revision 9680) +++ cm.c(working copy

Re: [openib-general] RDMA CM callback status

2006-09-28 Thread Sean Hefty
Can you post a patch pls? This was the patch committed to svn. I'm creating a patch set for review for 2.6.19/2.6.20 to merge the svn code upstream. I will post those patches against the 2.6.19 code tree when they are ready. Signed-off-by: Sean Hefty [EMAIL PROTECTED] Index: core/cma.c

Re: [openib-general] [RFC] [PATCH] ib_cm: send DREP in response to unmatched DREQ

2006-09-28 Thread Sean Hefty
Or Gerlitz wrote: My understanding is that without this patch the side that sends the DREQ would resend it a few times, because the first DREPs were lost and no DREPs are sent once the id at the peer side has left the timewait state, correct? This is correct. Note that the number of DREQ retries was

Re: [openib-general] 2.6.18 kernel support in the main trunk.

2006-09-28 Thread Sean Hefty
Matt Leininger wrote: I'd add one more thing. To make the OFED release process go more smoothly I'd like to see the maintainers for each stack component spin out releases from time to time. Roland has been doing this with libmthca and libibverbs. If we had the development releases for

Re: [openib-general] 2.6.18 kernel support in the main trunk.

2006-09-28 Thread Sean Hefty
Roland Dreier wrote: Not to be difficult -- but I disagree. I think this statement doesn't actually make sense, because: ** what does latest mean?? ** I think this is more a matter of whether there's a single, main development branch somewhere, or if one even needs to exist. Well, I think

Re: [openib-general] oops after rmmod ib_cm when stopping iSER

2006-09-27 Thread Sean Hefty
Erez Zilber wrote: When stopping iSER, we run 'modprobe -r ib_iser'. Then, we see an oops (below). In order to check which module caused that oops, I replaced the 'modprobe -r' call with rmmod for each module: rmmod ib_iser rmmod libiscsi rmmod scsi_transport_iscsi rmmod rdma_cm rmmod

Re: [openib-general] Different byte order between gen1 CM and gen2 CM -RE: How to connect gen2 CM to gen1 IBGD CM?

2006-09-27 Thread Sean Hefty
Sean Hefty wrote: The byte ordering in the kernel APIs are fairly clear about this, but that documentation didn't carry up to userspace everywhere. I will update the userspace documentation, but it may take me a few weeks to get to this. I've added some additional comments next to structure

Re: [openib-general] [RFC] [PATCH] ib_cm: send DREP in response to unmatched DREQ

2006-09-27 Thread Sean Hefty
Sean Hefty wrote: Currently a DREP is only sent in response to a DREQ if a connection has been found matching the DREQ, and it is in the proper state. Once a DREP is sent, the local connection moves into timewait. Duplicate DREQs received while in this state result in re-sending the DREP

Re: [openib-general] [PATCH] ucma : Encapsulate duplicate code to common routine

2006-09-27 Thread Sean Hefty
Krishna Kumar wrote: Encapsulate duplicate code to common routine - avoid checking same errors in multiple places. I went back and forth on this, but ended up committing it, since it does slightly simplify maintenance. - Sean

Re: [openib-general] [PATCH] id_priv_list-list is not initialized sometimes

2006-09-27 Thread Sean Hefty
Krishna Kumar wrote: rdma_listen could be called from a context where id_priv->list is not initialized. Then at a later stage, a cma_cancel_listen does a list_del() which could oops since this element is not on any list. Eg, in rdma_listen(), if id->device is !NULL, it calls cma_ib_listen()

Re: [openib-general] [PATCH] Fix freed mem deref race in cma_process_remove/cma_req_handler

2006-09-27 Thread Sean Hefty
Good catch. Thanks - committed. - Sean ___ openib-general mailing list openib-general@openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general

Re: [openib-general] [PATCH] fix cma_leave_mc_groups

2006-09-27 Thread Sean Hefty
Krishna Kumar wrote: - cma_leave_mc_groups can race with other routines updating or reading the mclist, so use lock. Eg while doing a rdma_destroy_id(), other processes could be looking at this id and de-referencing mclist. I don't think that there's an issue here. The mc_list is only

Re: [openib-general] RDMA CM callback status

2006-09-27 Thread Sean Hefty
Sean Hefty wrote: 1. Should I even be looking at event->status or does the event type tell me everything I need to know? I've had a report that the assertion (event->status != 0) is failing on RDMA_CM_EVENT_ROUTE_ERROR. It sounds like (and looks like from reading the code) that you've hit

Re: [openib-general] [RFC] determining which changes in svn to merge upstream or remove

2006-09-26 Thread Sean Hefty
BTW, there was a set of bugfix patches for CMA posted that didn't get acked or nacked yet. They looked sane and I took them into ofed - could you take the time to review please? Should I repost? It might make sense to put stability fixes in before adding more features. I've actually been on

Re: [openib-general] [RFC] determining which changes in svn to merge upstream or remove

2006-09-26 Thread Sean Hefty
Connections taking 60 sec to create is an issue. Can you please explain how the fact that some connections are used affect the time it takes to send the response? This is in userspace, and IMO, an application issue. Threads using established connections simply begin consuming all processor time.

[openib-general] [RFC] determining which changes in svn to merge upstream or remove

2006-09-25 Thread Sean Hefty
Now that changes from the iWarp branch have been merged upstream, I wanted to get feedback about migrating existing changes in svn upstream, or removing features from svn. Specifically, the following features are in svn only: * RDMA CM: - userspace support - multicast support

Re: [openib-general] RDMA CM callback status

2006-09-21 Thread Sean Hefty
1. Should I even be looking at event->status or does the event type tell me everything I need to know? I've had a report that the assertion (event->status != 0) is failing on RDMA_CM_EVENT_ROUTE_ERROR. The event type is usually sufficient. In the case of an error, the status should provide

Re: [openib-general] Negotiation of Rsponder resource Initiator depth

2006-09-20 Thread Sean Hefty
Erez Zilber wrote: In the IB spec it says in 12.7.29: The recipient of the REQ message shall choose a local Initiator Depth that does not exceed the Responder Resources offered in the REQ. If the recipient of the REQ message is unwilling or unable to do so, it shall send a REJ message to
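
The rule quoted from the spec can be sketched as a small helper: pick an initiator depth no larger than the responder resources offered in the REQ, and reject when that is not acceptable. This is only an illustration; the function name, the notion of a local minimum and maximum, and the -1 return convention are assumptions and do not come from the CM code.

```c
#include <stdint.h>

/* Choose a local initiator depth that does not exceed the responder
 * resources offered in the REQ. Returns the depth to use, or -1 when
 * the recipient cannot accept such a depth and should send a REJ.
 * local_min_depth and local_max_depth are hypothetical local limits. */
int choose_initiator_depth(uint8_t offered_responder_resources,
                           uint8_t local_min_depth,
                           uint8_t local_max_depth)
{
    uint8_t depth = local_max_depth < offered_responder_resources
                        ? local_max_depth
                        : offered_responder_resources;

    if (depth < local_min_depth)
        return -1; /* unwilling or unable: reject the REQ */
    return depth;
}
```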

Re: [openib-general] Different byte order between gen1 CM and gen2 CM -RE: How to connect gen2 CM to gen1 IBGD CM?

2006-09-14 Thread Sean Hefty
Bub Thomas wrote: Do you know any other Verbs or CM parameter that does have a different byte order between gen1 and gen2? I'm not really familiar with the gen1 code. P.S.: Maybe someone should put a big “Warning” sign somewhere so that others don’t stumble into that pit again. ;-) The

Re: [openib-general] [PATCH] IB/Kconfig: add help text and change CMA config name

2006-09-14 Thread Sean Hefty
Or Gerlitz wrote: change INFINIBAND_ADDR_TRANS to INFINIBAND_RDMA_CM and add help text clarifying what the thing does. Adding the help text also has the side effect of the cma config being visible when one does make menuconfig Signed-off-by: Or Gerlitz [EMAIL PROTECTED] Acked-by: Sean Hefty

[openib-general] [PATCH v2] ib_sa: add generic RMPP query interface

2006-09-14 Thread Sean Hefty
a query (e.g. multipath record queries), but it also simplifies a userspace interface. The implementation of existing SA query routines were layered on top of the generic query interface. Signed-off-by: Sean Hefty sean.hefty at intel.com --- Index: include/rdma/ib_sa.h

Re: [openib-general] [PATCH] RDMA/cma: document error flow of rdma_accept

2006-09-14 Thread Sean Hefty
I merge 100 patches every kernel release. If I have to spend an extra 5 minutes creating a patch or pulling it out of svn, then I end up burning an extra day of stupid work. If 20+ people who contribute patches sent me clean patches, then everyone will be happier because I'll be able to merge

[openib-general] [RFC] [PATCH] ib_cm: send DREP in response to unmatched DREQ

2006-09-14 Thread Sean Hefty
increase the timewait state before a QP can be re-used when CM messages are not lost. An alternative is to send a DREP in response to a DREQ, even if a local connection is not found, which is what this patch does. Signed-off-by: Sean Hefty [EMAIL PROTECTED] --- Index: cm.c

Re: [openib-general] [PATCH] RDMA/cma: document error flow of rdma_accept

2006-09-13 Thread Sean Hefty
Committed to svn 9461. Roland, can you also pull into 2.6.19? Signed-off-by: Sean Hefty [EMAIL PROTECTED] Or Gerlitz wrote: Document the reject sending and modifying qp to error done in rdma_accept Signed-off-by: Or Gerlitz [EMAIL PROTECTED] diff --git a/include/rdma/rdma_cm.h b/include

Re: [openib-general] [PATCH for-2.6.18] Re: [PATCH] IB/cma: add rdma_establish

2006-09-13 Thread Sean Hefty
. Signed-off-by: Michael S. Tsirkin [EMAIL PROTECTED] Dropping 3 packets in a row seems likely only under stress testing, so I'm not sure that this is worthy of a change to 2.6.18 at this point (we're at rc7). This seems fine for 19 though. Acked-by: Sean Hefty [EMAIL PROTECTED

Re: [openib-general] [PATCH] Optimize cma_process_remove()

2006-09-13 Thread Sean Hefty
Krishna Kumar wrote: Thanks for the explanation. So a list_del_init() would be the best thing to do. Another option is to add a remove_list to rdma_id_private by which this entry could be added to a local remove_list and traversed without holding a lock, but it doesn't make sense to add that
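
The reason list_del_init() is attractive here can be shown with a userspace imitation of a circular doubly linked list. The sketch below is not the kernel's list.h; the type and function names are made up for illustration, and the plain unlink uses NULL where the kernel uses poison values.

```c
#include <stddef.h>

struct list_node {
    struct list_node *next, *prev;
};

/* An initialized node points at itself, so it reads as an empty list. */
void node_init(struct list_node *n)
{
    n->next = n;
    n->prev = n;
}

void node_add(struct list_node *n, struct list_node *head)
{
    n->next = head->next;
    n->prev = head;
    head->next->prev = n;
    head->next = n;
}

/* Like list_del(): the node's own pointers are left unusable, so
 * unlinking it a second time would dereference an invalid pointer. */
void node_unlink(struct list_node *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
    n->next = NULL;
    n->prev = NULL;
}

/* Like list_del_init(): the node is re-initialized after unlinking,
 * so a repeated unlink only touches the node itself and is harmless. */
void node_unlink_init(struct list_node *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
    node_init(n);
}

int node_is_empty(const struct list_node *head)
{
    return head->next == head;
}
```

In this sketch, calling node_unlink_init() twice on the same node is safe, which is the property that lets a later cleanup path remove the entry without knowing whether it is still on a list.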

Re: [openib-general] How to connect gen2 CM to gen1 IBGD CM?

2006-09-13 Thread Sean Hefty
Bub Thomas wrote: Do you have a cmpost for gen1 IBGD I can use to connect from gen2 to gen1? No - the gen1 code is really the old Topspin code. Topspin is now part of Cisco, so they may have something. Or is there any other trick to play here? I don't think so. I'm pretty sure that this

Re: [openib-general] [PATCH for-2.6.18] IB/cma: option to limitMTU to 1K

2006-09-13 Thread Sean Hefty
Hal Rosenstock wrote: But it only needs the MTU on each local side (once for the REQ and on the remote side for the REP). It would mean that if the local side were capable of larger MTU and the remote side were Tavor, that the REQ would be REJ with MTU too large and need to be retried at a

Re: [openib-general] [PATCH for-2.6.18] IB/cma: option to limitMTU to 1K

2006-09-13 Thread Sean Hefty
Michael S. Tsirkin wrote: I think we can do that without breaking IPoIB. IPoIB needs mtu = 1K. IPoIB sets mtu selector to = 2K. I am talking about users that do not set mtu selector. The ipoib spec requires support for a 2k MTU, but allows support for smaller MTUs. I agree that if the ipoib

Re: [openib-general] [PATCH for-2.6.18] IB/cma: option to limitMTU to 1K

2006-09-13 Thread Sean Hefty
Putting knowledge about hw quirks in all protocols is really horrible. Agreed. MTU should be decided by SA as part of path information. If ULPs have specific limitations wrt MTU they should use mtu selector in path record query. Thinking about this more, the proper place for this does seem to

Re: [openib-general] [PATCH for-2.6.18] Re: [PATCH] IB/cma: add rdma_establish

2006-09-13 Thread Sean Hefty
Michael S. Tsirkin wrote: I don't really understand. The fix is a one-liner. The problem is observed in practice, under stress. Who *wants* systems that fall apart under stress? My view is: is this worth delaying the release of the kernel? And I don't see that it is at this point in the

Re: [openib-general] [PATCH for-2.6.18] IB/cma: option to limitMTU to 1K

2006-09-13 Thread Sean Hefty
Michael S. Tsirkin wrote: Although, I don't like the idea of the CMA changing every path to use an MTU of 1k. Well, that's why it's off by default. So, Ack? I'd like to find a way to support a 1k MTU to tavor HCAs without making the MTU 1k to other HCAs, in case we're dealing with a

Re: [openib-general] [PATCH for-2.6.18] IB/cma: option to limitMTU to 1K

2006-09-13 Thread Sean Hefty
Michael S. Tsirkin wrote: That's the default and not the minimum MTU (for IPoIB). How isn't it? By default, IPoIB reports 2K MTU to linux. So it will get 2K packets, and since IB switches can not fragment packets, they will simply get dropped. I think this is simply the difference between the

Re: [openib-general] [PATCH] IB/cma: add rdma_establish

2006-09-13 Thread Sean Hefty
Michael S. Tsirkin wrote: IB/cma: add rdma_establish Make it possible for ULPs to handle RTU loss by calling rdma_establish. I've committed this patch to svn 9470. It still requires exporting the rdma_establish call to userspace. - Sean

Re: [openib-general] [PATCH v3] ib_sa: require SA registration

2006-09-13 Thread Sean Hefty
Roland Dreier wrote: OK, I added the following to my for-2.6.19 branch. The differences from your patch are: - CMA can have a static variable (good to avoid clashes with a global 'sa_client' variable name too) - IPoIB does not use multicast module upstream, fix ipoib_multicast.c too.

Re: [openib-general] [PATCH] IB/cma: add rdma_establish

2006-09-12 Thread Sean Hefty
Or Gerlitz wrote: Just to make sure, you mean to say that you would merge this patch instead of the one that had the CM track local qp numbers and install a callback for the consumer QP to catch the async event etc? correct Indeed the **patch** by itself is somewhat simpler, but the consumer

Re: [openib-general] [PATCH] Optimize cma_process_remove()

2006-09-12 Thread Sean Hefty
Krishna Kumar2 wrote: mutex_lock(&lock); while (!list_empty(&cma_dev->id_list)) { id_priv = list_entry(cma_dev->id_list.next, struct rdma_id_private, list); if (cma_internal_listen(id_priv)) {

Re: [openib-general] cmpost establisehd connections are very fragile!?

2006-09-12 Thread Sean Hefty
Bub Thomas wrote: What I don’t understand is why the local_cm_response_timeout set to 254 instead of 20 can block IBV_WR_SEND from client to server while the opposite direction from server to client works!? local_cm_response_timeout is a 5-bit value. It's 4.096 x 2 ^ local_cm_response_timeout
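
A short sketch of the point made in the reply: a 5-bit field can only hold 0 through 31, so 254 is out of range, while 20 is a valid exponent. The helper names are hypothetical, and the sketch assumes the formula is expressed in microseconds (4.096 us x 2^value), computed here in nanoseconds to stay in integer arithmetic.

```c
#include <stdint.h>

#define CM_TIMEOUT_FIELD_MAX 31u /* largest value a 5-bit field can hold */

/* Returns 1 when the value fits the 5-bit field, 0 otherwise. */
int cm_timeout_field_valid(unsigned int value)
{
    return value <= CM_TIMEOUT_FIELD_MAX;
}

/* 4.096 us * 2^value, expressed in nanoseconds (4.096 us == 4096 ns).
 * Returns 0 for values that do not fit the field. */
uint64_t cm_timeout_ns(unsigned int value)
{
    if (!cm_timeout_field_valid(value))
        return 0;
    return 4096ULL << value;
}
```

For a value of 20 this gives 4294967296 ns, a little under 4.3 seconds. A caller that passes 254 would be caught by the range check instead of having the oversized value written into the message.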

Re: [openib-general] [PATCH] RDMA/cma: document error flow of rdma_accept

2006-09-12 Thread Sean Hefty
Or Gerlitz wrote: + * In the case of error, a reject message is sent to the remote side and the + * state of the qp associated with the id is modified to error, such that any + * previously posted receive buffers would be flushed. Hmm... this makes me question whether this is what it should be

Re: [openib-general] [PATCH] for-2.6.19 cma: protect against adding device during destruction

2006-09-12 Thread Sean Hefty
Can you queue this for 2.6.19 ? Roland, can you pull this patch in for 2.6.19? It's SVN check-in 9273. --- Clarify that rdma_destroy_id cancels outstanding asynchronous operations on the associated id. Signed-off-by: Or Gerlitz [EMAIL PROTECTED] Signed-off-by: Sean Hefty [EMAIL PROTECTED

Re: [openib-general] [PATCH] cma_connect_ib leaks memory in failure cases.

2006-09-12 Thread Sean Hefty
Michael S. Tsirkin wrote: The ib_cm_id will be cleaned up if the rdma_cm_id is destroyed, as long as a second call is not made to rdma_connect after the first call fails. So we're probably safe deferring this until 2.6.19, unless someone has code which calls rdma_connect twice. SDP can do

Re: [openib-general] an example to use of multicast messages over the verbs exists in the openib svn

2006-09-12 Thread Sean Hefty
This test (for now) doesn't send any join message to the SA, it only attaches (and detaches) the QP to the multicast group. I posted a simple multicast test program that uses the proposed libibsa interface in: http://openib.org/pipermail/openib-general/2006-August/025433.html (See the program at the

Re: [openib-general] RFC: mthca: implement timewait by tracking QPNs

2006-09-12 Thread Sean Hefty
Well, the idea of pushing timewait handling down into the low-level drivers seems strange to me. I don't think any other stack or any other OS does anything like this. I think the Windows IB stack may do something similar. The difficulty is doing this at a higher level is that the QP must be

Re: [openib-general] [PATCH] IB/cma: add rdma_establish

2006-09-12 Thread Sean Hefty
Michael S. Tsirkin wrote: As a side note, reasons for frequent loss of RTU must be investigated. A lost RTU shouldn't be any more likely than a lost REQ or REP. Is the RTU never showing up? Seems like that. I know for sure I do accept after REP but remote side never gets ESTABLISHED. I

Re: [openib-general] CMA issue: bind selects the same port after close

2006-09-12 Thread Sean Hefty
I completely understand that the existing port management services are not exported, but functionally, they support multiple port spaces, show up in netstat, etc... Can someone please explain to me the reluctance to use these services in favor of replicating them? My reluctance to use the

Re: [openib-general] [PATCH v3] ib_sa: require SA registration

2006-09-11 Thread Sean Hefty
Roland Dreier wrote: I haven't really read the later patches but I am planning on merging at least the registration stuff for 2.6.19. I'd like to commit the SA related patches soon. There have been several e-mails recently about using IB multicast and the IB CM directly. - Sean

Re: [openib-general] Wrong byte order in lid of struct ibv_port_attr reported by ibv_query port!?

2006-09-11 Thread Sean Hefty
Bub Thomas wrote: with the help of your modified cmpost.c example I found out that the byte order in the lid your query_for_path in cmpost.c is getting into the ib_sa_path_rec is the opposite to the one reported by ibv_query_port. The path record defines all fields in network-byte order.
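
The mismatch described above is the usual host order versus network order difference. A minimal sketch of the conversion, assuming the port query reports the LID in host order and the path record expects network order as stated in the reply; the helper names are hypothetical and no verbs or SA structures are used.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Convert a host-order LID into the byte order used on the wire. */
uint16_t lid_host_to_wire(uint16_t host_lid)
{
    return htons(host_lid);
}

/* Convert a wire-order LID back into host order. */
uint16_t lid_wire_to_host(uint16_t wire_lid)
{
    return ntohs(wire_lid);
}

/* First byte in memory of a wire-order LID: always the high-order byte,
 * whatever the endianness of the host. */
unsigned char wire_lid_first_byte(uint16_t wire_lid)
{
    unsigned char bytes[2];

    memcpy(bytes, &wire_lid, sizeof bytes);
    return bytes[0];
}
```

On a little-endian host the two representations differ, which matches the swapped LID observed in the report; on a big-endian host the conversions are no-ops.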

Re: [openib-general] RDMA CMA and C++

2006-09-11 Thread Sean Hefty
Dotan Barak wrote: The user-mode cm header files don't have the C++ stuff to identify all the declarations as C. The verbs.h file has it and works fine if you wanted to copy it, but all you really need is ... Sean, please add those definitions to the libibcm header as well. I've updated the
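
The "C++ stuff" being requested is the conventional linkage guard around a C header's declarations. A generic sketch follows; the function name is a placeholder and is not a libibcm symbol.

```c
/* Header part: wrap the declarations so that a C++ compiler gives them
 * C linkage. A C compiler never defines __cplusplus and skips the guard. */
#ifdef __cplusplus
extern "C" {
#endif

int example_cm_abi_version(void); /* placeholder declaration */

#ifdef __cplusplus
}
#endif

/* Implementation part, normally in a separate .c file. */
int example_cm_abi_version(void)
{
    return 1;
}
```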

Re: [openib-general] [PATCH] cma_connect_ib leaks memory in failure cases.

2006-09-11 Thread Sean Hefty
Michael S. Tsirkin wrote: cma_connect_ib leaks a struct ib_cm_id* in failure cases. Signed-off-by: Krishna Kumar [EMAIL PROTECTED] This one looks like it might be good for 2.6.18. Sean? The ib_cm_id will be cleaned up if the rdma_cm_id is destroyed, as long as a second call is not made to

Re: [openib-general] [PATCH] cma_connect_ib leaks memory in failure cases.

2006-09-11 Thread Sean Hefty
Krishna Kumar wrote: cma_connect_ib leaks a struct ib_cm_id* in failure cases. Thanks - committed. - Sean

Re: [openib-general] [PATCH] Modify callers of cma_get_net_info for better error handling.

2006-09-11 Thread Sean Hefty
Krishna Kumar wrote: Re-organize code relating to cma_get_net_info() and rdma_create_id() to optimize error case handling (no need to alloc memory/etc as part of rdma_create_id() if input parameters are wrong). Thanks! Committed with a minor adjustment to rename 'out' label 'err'. - Sean

Re: [openib-general] [PATCH] Optimize cma_process_remove()

2006-09-11 Thread Sean Hefty
Krishna Kumar wrote: static void cma_process_remove(struct cma_device *cma_dev) { struct list_head remove_list; - struct rdma_id_private *id_priv; + struct rdma_id_private *id_priv, *tmp; int ret; INIT_LIST_HEAD(&remove_list); @@ -2344,22 +2344,20 @@ static

Re: [openib-general] [PATCH v3] ib_sa: require SA registration

2006-09-11 Thread Sean Hefty
- CMA can have a static variable (good to avoid clashes with a global 'sa_client' variable name too) Sounds good - that's a goof on my part. - IPoIB does not use multicast module upstream, fix ipoib_multicast.c too. Okay - As an FYI, I will probably submit the multicast module upstream for

Re: [openib-general] [PATCH] IB/cma: add rdma_establish

2006-09-11 Thread Sean Hefty
Michael S. Tsirkin wrote: Sean, did we decide what to do for upstream yet? I would say we need something like the below for 2.6.19 too (probably just need to update node type check). And, I like it that this approach leaves all matters of policy to users (such as whether move QP to RTS after

Re: [openib-general] [librdmacm] execuation of the the test udaddy is failing

2006-09-06 Thread Sean Hefty
# udaddy udaddy: starting server librdmacm: Kernel ABI does not support requested port space. udaddy: listen request failed test complete return status -93 UD QP and multicast support requires kernel ABI version 2. It appears that the kernel version running is 1. - Sean

Re: [openib-general] libibcm can't connect/talk to libicm on other machine.

2006-09-05 Thread Sean Hefty
Bub Thomas wrote: Dotan, the ibv_rc_pingpong example works for me so I can exclude the architecture. I never got the libibcm example compiled. Which is your example and which architecture x86 vs. x86_64 did you compile it for? Can you share your libibcm example code? (if it is not the

Re: [openib-general] [PATCH] for-2.6.19 cma: protect against adding device during destruction

2006-09-04 Thread Sean Hefty
ok, thanks for clarifying that, is cancellation allowed only for address resolution or also for route resolving and/or CM calls? also how about documenting this? Cancellation is allowed for any asynchronous operation. I will pull in your patch when I get back in the office. Thanks. - Sean

Re: [openib-general] rdmacm library

2006-09-04 Thread Sean Hefty
/usr/bin/ld: warning: libibverbs.so.1, needed by /usr/local/lib/librdmacm.so, may conflict with libibverbs.so.2 Does rdmacm use the older version of ibverbs or do I need to install rdmacm differently? I keep the RDMA CM updated with the latest version of verbs. There may be an issue with the

Re: [openib-general] [PATCH] for-2.6.19 cma: protect against adding device during destruction

2006-09-03 Thread Sean Hefty
Does this patch protect against the case where an rdma_cm_id is being destroyed while address resolution related to the **same** id attaches it to a device? If yes, why would someone destroy this id? is it legal to do so? Yes - this protects against the user destroying the id while that same

Re: [openib-general] [PATCH] cma: protect against adding device during destruction

2006-09-01 Thread Sean Hefty
I'll test some, but the problem hasn't reappeared since. The patch looks right, I'd say push it for 2.6.18. We need the following change, which applies on top of the previous patch, as well. Add missing synchronization around acquiring an IB device. Signed-off-by: Sean Hefty [EMAIL PROTECTED

[openib-general] [PATCH] for-2.6.19 cma: protect against adding device during destruction

2006-09-01 Thread Sean Hefty
This closes a window where address resolution can attach an rdma_cm_id to a device during destruction of the rdma_cm_id. This can result in the rdma_cm_id remaining in the device list after its memory has been freed. Signed-off-by: Sean Hefty [EMAIL PROTECTED] --- I generated this patch off

[openib-general] [PATCH] 2.6.19 cma: fix typo

2006-08-31 Thread Sean Hefty
Comma should be semi-colon Signed-off-by: Sean Hefty [EMAIL PROTECTED] --- Please queue for 2.6.19 diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c index d6f99d5..bf20410 100644 --- a/drivers/infiniband/core/cma.c +++ b/drivers/infiniband/core/cma.c @@ -265,7 +265,7

[openib-general] [PATCH] cma: protect against adding device during destruction

2006-08-31 Thread Sean Hefty
Can you see if this patch helps any? This closes a window where address resolution can attach an rdma_cm_id to a device during destruction of the rdma_cm_id. This can result in the rdma_cm_id remaining in the device list after its memory has been freed. Signed-off-by: Sean Hefty [EMAIL

Re: [openib-general] CMA oops

2006-08-30 Thread Sean Hefty
Michael S. Tsirkin wrote: Apparently, list->prev pointer in CMA id_priv structure is NULL which causes a crash in list_del. I note that rdma_destroy_id tests outside the mutex lock. Could that be the problem? The problem is not unfortunately easily reproducible. I think I see one bug, but

Re: [openib-general] CMA oops

2006-08-30 Thread Sean Hefty
Michael S. Tsirkin wrote: I'm trying to come up with a fix for this, but I'm not convinced it's the problem that you're seeing. Could be what you describe leads to a memory corruption. I believe so. If this were the cause of the crash, I would expect to see an issue with list->prev->prev or

Re: [openib-general] [PATCH ] RFC IB/cm do not track remote QPN in timewait state

2006-08-30 Thread Sean Hefty
. In this situation, the DREQ gets dropped repeatedly. We will want to queue this patch for 2.6.19, if you can point Roland to your git tree. Acked-by: Sean Hefty [EMAIL PROTECTED]

Re: [openib-general] [PATCH v5 2/2] iWARP Core Changes.

2006-08-30 Thread Sean Hefty
Roland Dreier wrote: While merging this, I uninlined rdma_node_get_transport, since I don't think there's any reason to make it inline: I've committed the patch to svn to sync as well. - Sean

Re: [openib-general] [PATCH] libibcm: modify API to support multi-threaded event processing

2006-08-29 Thread Sean Hefty
There are compilation errors with this patch when using gcc 4.1.0: Hmmm... I will look into this. - Sean

Re: [openib-general] [PATCH] libibsa: userspace SA query and multicast support

2006-08-29 Thread Sean Hefty
Why SEND ? In general, couldn't it be used like SET/DELETE (in addition to being used like the GET method variants) ? Also, the SA doesn't use the SEND method. The latest version of the patch only allows GET or GET_TABLE for PathRecords, ServiceRecords, and MCMemberRecords, and GET_MULTI for

Re: [openib-general] [PATCH ] RFC IB/cm do not track remote QPN in timewait state

2006-08-29 Thread Sean Hefty
Here's an idea: how about we move the whole timewait thing to low level driver, starting timer automatically upon QP destroy? I've thought about this too, and I think this may end up making the most sense. How would the driver determine how long the QP should remain in timewait, and how would you

Re: [openib-general] [PATCHES] for 2.6.19

2006-08-29 Thread Sean Hefty
I handled it all myself this time, but in the future it is easier for me if each patch is inline in a separate email. A couple of other things that would also make my life easier: That's not a problem. I think in the past I've just referred you to the svn revision numbers. I was just trying to

Re: [openib-general] [PATCH] libibcm: modify API to support multi-threaded event processing

2006-08-29 Thread Sean Hefty
Michael S. Tsirkin wrote: I think offsetof is defined in stddef.h, so you must include that. Dotan, Can you see if adding this include works for you? I just re-tested the build on my system, and it worked fine without it (gcc 3.3.3). Jack posted a patch for this earlier if you need one. -
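
The offsetof macro is provided by stddef.h, so code that uses it should include that header explicitly instead of relying on another header pulling it in. A small sketch of a typical use, recovering a containing structure from a pointer to one of its members; the structure and macro names are invented for illustration and are not taken from cm.c.

```c
#include <stddef.h> /* offsetof */

struct event_sketch {
    int type;
    int id;
};

/* Recover a pointer to the containing structure from a pointer to one
 * of its members by subtracting the member's offset. */
#define CONTAINER_OF_SKETCH(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Given a pointer to the 'id' member, return the enclosing event. */
struct event_sketch *event_from_id(int *id_ptr)
{
    return CONTAINER_OF_SKETCH(id_ptr, struct event_sketch, id);
}
```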

Re: [openib-general] [PATCH ] RFC IB/cm do not track remote QPN in timewait state

2006-08-29 Thread Sean Hefty
Michael S. Tsirkin wrote: I've thought about this too, and I think this may end up making the most sense. How would the driver determine how long the QP should remain in timewait, Need to look into this - likely we can just add a call for that. Roland? The Intel gen1 code passed this into

Re: [openib-general] [PATCH] libibsa: userspace SA query and multicast support

2006-08-29 Thread Sean Hefty
Hal Rosenstock wrote: OK. So shouldn't IBV_SA_METHOD_SEND be removed from sa_net.h ? I was just defining the well known methods. I can remove this. By raw access, do you mean SEND_MAD operation ? How do those applications gain this privilege ? The kernel module exports two files to

Re: [openib-general] [PATCH] libibcm: Need to include stddef.h in cm.c for SLES10 compilations

2006-08-29 Thread Sean Hefty
Jack Morgenstein wrote: Fix compilation on SLES10: cm.c uses offsetof, so it must include stddef.h Thanks - committed in 9150. - Sean

Re: [openib-general] libibcm can't open /dev/infiniband/ucm0

2006-08-29 Thread Sean Hefty
Looked into the openIB kernel sources and found that the minor number seems to be wrong in the README file. With a minor number 224 and the creation like: mknod /dev/infiniband/ucm0 c 231 224 The README file was never updated when the userspace CM added per device handling. I've updated

Re: [openib-general] [PATCH ] RFC IB/cm do not track remote QPN in timewait state

2006-08-29 Thread Sean Hefty
Sean Hefty wrote: How would the driver determine how long the QP should remain in timewait The spec isn't totally clear to me on this, but here's what I can gather: timewait = packet lifetime x 2 + remote ack delay; local_ack_timeout (in CM REQ) = packet lifetime x 2 + local ack delay. Verbs
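The two relations above can be sketched as plain arithmetic. This is only an illustration under stated assumptions: all values are treated as microsecond durations, the function names are hypothetical, and the sketch does not model how the CM or verbs encode these fields. The third helper is simple algebra on the two quoted formulas, not something stated in the thread.

```c
#include <assert.h>
#include <stdint.h>

/* timewait = packet lifetime x 2 + remote ack delay (microseconds assumed). */
static uint64_t demo_timewait_us(uint64_t packet_lifetime_us,
                                 uint64_t remote_ack_delay_us)
{
    return 2 * packet_lifetime_us + remote_ack_delay_us;
}

/* local_ack_timeout = packet lifetime x 2 + local ack delay. */
static uint64_t demo_local_ack_timeout_us(uint64_t packet_lifetime_us,
                                          uint64_t local_ack_delay_us)
{
    return 2 * packet_lifetime_us + local_ack_delay_us;
}

/* Derived by substitution: both formulas share the 2 x packet lifetime term,
 * so timewait = local_ack_timeout - local ack delay + remote ack delay. */
static uint64_t demo_timewait_from_ack_timeout_us(uint64_t local_ack_timeout_us,
                                                  uint64_t local_ack_delay_us,
                                                  uint64_t remote_ack_delay_us)
{
    assert(local_ack_timeout_us >= local_ack_delay_us);
    return local_ack_timeout_us - local_ack_delay_us + remote_ack_delay_us;
}
```

The substitution shows why a driver that already knows local_ack_timeout would need only the two ack delays to estimate timewait, assuming the relations hold as quoted.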

Re: [openib-general] [PATCH ] RFC IB/cm do not track remote QPN in timewait state

2006-08-29 Thread Sean Hefty
Michael S. Tsirkin wrote: Verbs gets local_ack_timeout through qp_attr.timeout when modifying the QP to RTS. Isn't that RTR? It's the transition from RTR to RTS. So it seems we won't need any API changes. This begins to look good. I wonder what Roland and other low level driver

Re: [openib-general] [PATCH ] RFC IB/cm do not track remote QPN in timewait state

2006-08-29 Thread Sean Hefty
Michael S. Tsirkin wrote: Hmm. But you need timewait already after you get to RTR, right? The active side looks fine. The passive side can enter timewait without moving through RTS if it gets an RTU timeout. I'm not sure how much going into timewait really helps in this case though. If we

Re: [openib-general] [PATCH ] RFC IB/cm do not track remote QPN in timewait state

2006-08-29 Thread Sean Hefty
Michael S. Tsirkin wrote: If we completely ignore timewait, what conditions are required to have a problem occur? Outstanding packets whose PSNs and QP numbers coincide between the 2 connections. Look for Stale packet in IB spec. From what I can tell, a QP will receive an incoming packet

Re: [openib-general] CMA oops

2006-08-28 Thread Sean Hefty
Michael S. Tsirkin wrote: Apparently, list->prev pointer in CMA id_priv structure is NULL which causes a crash in list_del. I note that rdma_destroy_id tests outside the mutex lock. Could that be the problem? Unfortunately, the problem is not easily reproducible. I'll see if I see a problem.

Re: [openib-general] [PATCH ] RFC IB/cm do not track remote QPN in timewait state

2006-08-28 Thread Sean Hefty
Michael S. Tsirkin wrote: Comments appreciated. I will look at the spec in more detail, but I thought that timewait was included as part of the life of a connection. I.e. the connection wasn't released until it returned to idle. Also, isn't the purpose behind timewait to prevent

Re: [openib-general] [PATCH ] RFC IB/cm do not track remote QPN in timewait state

2006-08-28 Thread Sean Hefty
Michael S. Tsirkin wrote: IB spec, section 12.4, says: CMs shall maintain enough connection state information to detect an attempt to initiate a connection on a remote QP/EEC that has not been released from a connection with a local QP/EEC, or that is in the TimeWait

Re: [openib-general] [PATCH ] RFC IB/cm do not track remote QPN in timewait state

2006-08-28 Thread Sean Hefty
Michael S. Tsirkin wrote: So, you must somehow detect that the remote QP is in timewait state. I don't see any way to do this, and this is not what the CM currently does. Our CM tracks local QPs in timewait state, which is obviously not what the spec intends since remote QP could be reused

Re: [openib-general] [PATCH ] RFC IB/cm do not track remote QPN in timewait state

2006-08-28 Thread Sean Hefty
Michael S. Tsirkin wrote: Another problem that I see is that CMA currently seems to completely mask timewait exit. This is correct. So there's no way to properly handle timewait on top of cma that I can see. I don't think so, which is what brought up the problem with Arlin. (He's using

Re: [openib-general] [PATCH ] RFC IB/cm do not track remote QPN in timewait state

2006-08-28 Thread Sean Hefty
Michael S. Tsirkin wrote: I believe communication id should be checked to detect duplicates. Right? Can you clarify this? Check the remote comm id of an incoming REQ against a value in timewait? Remote QPN stale connection rule is only to avoid a case where we keep connection in established

Re: [openib-general] drop mthca from svn?

2006-08-28 Thread Sean Hefty
Well, what is an OpenFabrics driver anyway? I'm interested in writing Linux drivers to be honest. It's often ignored, but OpenFabrics does include Windows. My understanding is that the requirement for lower level components is that they must be licensed using dual GPL / BSD. This agreement

[openib-general] [PATCHES] for 2.6.19

2006-08-28 Thread Sean Hefty
- randomize starting local comm id Let me know if you'd prefer these in another format (such as inline). - Sean From d697059a6f69e19c18a50c87df20894d253d3d8f Mon Sep 17 00:00:00 2001 From: Sean Hefty [EMAIL PROTECTED] Date: Mon, 28 Aug 2006 15:15:18 -0700 Subject: [PATCH] Randomize the starting local

Re: [openib-general] [PATCH] libibcm: modify API to support multi-threaded event processing

2006-08-28 Thread Sean Hefty
Sean Hefty wrote: Modify the libibcm API to provide better support for multi-threaded event processing. CM devices are no longer tied to verb devices and hidden from the user. This should allow an application to direct events to specific threads for processing. This patch also removes

Re: [openib-general] basic IB doubt

2006-08-25 Thread Sean Hefty
Thomas: How does an adapter guarantee that no bridges or other intervening devices reorder their writes, or for that matter flush them to memory at all!? That's a good point. The HCA would have to do a read to flush the posted writes, and I'm sure it's not doing that

Re: [openib-general] librdmacm ABI issues with OFED 1.1

2006-08-24 Thread Sean Hefty
Michael S. Tsirkin wrote: Maybe the librdmacm part should be merged to svn? So librdmacm could try to read from misc, then from /sys/class/infiniband/rdma_cm, and then assume latest. It's good to have userspace code portable across distros ... I can go with that. - Sean

Re: [openib-general] [PATCH 0/4] Dispatch communication related events to the IB CM

2006-08-24 Thread Sean Hefty
Michael S. Tsirkin wrote: And even with these proposed changes, there's a race condition where the CM can timeout a connection after data is received over it, but before this event can be processed. Hmm. And what happens then? The connection is aborted by the CM. The CM sends a REJ for the
