Re: [openib-general] [PATCH] [RFC] librdmacm: expose device list to users

2006-07-14 Thread Sean Hefty
Pradipta Kumar Banerjee wrote: Thanks Sean for adding this functionality. This was needed. This was committed to svn 8523. - Sean ___ openib-general mailing list openib-general@openib.org http://openib.org/mailman/listinfo/openib-general To

Re: [openib-general] is wc valid if ib_poll_cq() returns zero

2006-07-14 Thread Sean Hefty
somenath wrote: 1. if ib_poll_cq(cq, 1, wc) returns zero, does wc contain a valid entry? no * Poll a CQ for (possibly multiple) completions. If the return value * is < 0, an error occurred. If the return value is >= 0, it is the * number of completions returned. If the return value is

Re: [openib-general] is wc valid if ib_poll_cq() returns zero

2006-07-14 Thread Sean Hefty
somenath wrote: 2. why is the io completion routine called when ib_poll_cq() returns zero? does this kind of notification contain any information? is there some error happening here? what are some possible problem areas? any wild guess...? Can you clarify what's happening? Are you calling

Re: [openib-general] is wc valid if ib_poll_cq() returns zero

2006-07-14 Thread Sean Hefty
somenath wrote: just to make sure I conveyed the exact thing I meant, if I change the above code as follows: while (ib_poll_cq(cq, 1, wc) > 0) { process completion(); } rearm CQ; then I just get notification once, and don't get any further notifications...so I assume rearm CQ should

Re: [openib-general] is wc valid if ib_poll_cq() returns zero

2006-07-14 Thread Sean Hefty
Can you also post your code, including the completion handler routines and QP creation / initialization sections? - Sean

Re: [openib-general] is wc valid if ib_poll_cq() returns zero

2006-07-14 Thread Sean Hefty
somenath wrote: int io_complete( struct ib_cq *cq, void *passed_arg) { xxx_connection_t *arg = passed_arg; xxx_status_t stat = xxx_st_ok; struct ib_wc wc; int count = 0; if (count = ib_poll_cq(cq, 1, &wc) > 0) { I think this evaluates ib_poll_cq(..)

Re: [openib-general] is wc valid if ib_poll_cq() returns zero

2006-07-14 Thread Sean Hefty
somenath wrote: I think this evaluates ib_poll_cq(..) > 0 before doing the assignment. Since the expression evaluates to false, count is assigned 0. Can you try modifying this to: if ((count = ib_poll_cq(..)) > 0) - Sean I added that stuff, but it didn't make a difference...it still
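The precedence point in this exchange can be checked in isolation. The following is a minimal sketch, not code from the thread: `fake_poll()` is a hypothetical stand-in for `ib_poll_cq()` that simply returns the number of completions it is told to report, so the two forms of the assignment can be compared without any verbs library.

```c
/* Hypothetical stand-in for ib_poll_cq(): reports 'n' completions. */
static int fake_poll(int n)
{
    return n;
}

/* Without extra parentheses, '>' binds tighter than '=', so count
 * receives the result of the comparison (0 or 1), not the number of
 * completions that were polled. */
static int count_without_parens(int n)
{
    int count = 0;
    count = fake_poll(n) > 0;   /* parsed as count = (fake_poll(n) > 0) */
    return count;
}

/* With the parentheses suggested in the reply, count receives the
 * return value itself and the comparison is applied afterwards. */
static int count_with_parens(int n)
{
    int count = 0;
    int positive = (count = fake_poll(n)) > 0;
    (void)positive;             /* only count matters for this sketch */
    return count;
}
```

With three completions reported, the first form leaves count at 1 and the second at 3; with none reported, both leave it at 0, which matches the behaviour described in the reply.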

Re: [openib-general] multicast

2006-07-13 Thread Sean Hefty
Yes - I'm actually talking about a separate issue here. It looks like using the RDMA CM for multicast is going to require using it for all of my connection management, so I'm looking at what that entails. Currently I'm using only ibverbs and Open MPI's runtime environment layer. The RDMA CM is

[openib-general] [PATCH] librdmacm: remove dependency on sysfs

2006-07-13 Thread Sean Hefty
Remove libsysfs usage from librdmacm. Signed-off-by: Sean Hefty [EMAIL PROTECTED] ---Index: configure.in === --- configure.in(revision 8215) +++ configure.in(working copy) @@ -25,8 +25,6 @@ AC_CHECK_SIZEOF(long) dnl

[openib-general] [PATCH] [RFC] librdmacm: expose device list to users

2006-07-13 Thread Sean Hefty
. By exposing the device list to the user, it makes it easier for the user to allocate device specific resources (such as PDs, CQs, etc.) that are shared among multiple rdma_cm_id's. Signed-off-by: Sean Hefty [EMAIL PROTECTED] --- Index: include/rdma/rdma_cma.h

Re: [openib-general] ipoib lockdep warning

2006-07-12 Thread Sean Hefty
Michael S. Tsirkin wrote: Yes, this is true for users that pass GFP_ATOMIC to sa_query, at least. But might not be so for other users: send_mad in sa_query actually gets gfp_flags parameter, but for some reason it does not pass it to idr_pre_get, which means even sa query done with

Re: [openib-general] [PATCH] validate MADs issued from userspace for spec compliance C13-18.1.1

2006-07-12 Thread Sean Hefty
Hal Rosenstock wrote: and running multiple copies of opensm on different systems. Not sure what that would fail. The other SMs should be standbys. I can't think of what would fail in osmtest off the top of my head but haven't tried this yet but am now about to. I was starting / stopping openSM

Re: [openib-general] [PATCH] validate MADs issued from userspace for spec compliance C13-18.1.1

2006-07-12 Thread Sean Hefty
I was starting / stopping openSM on different systems soon before running the tests. Not sure I quite understand the sequencing. I was being somewhat random, just trying to stress things. How quickly will one SM take over for another after one dies? Can you run with -V and send me the output

Re: [openib-general] multicast

2006-07-12 Thread Sean Hefty
Andrew Friedley wrote: I'm trying to understand how the ibverbs multicast API works, but I'm not sure how multicast groups are created. I understand that ibv_attach_mcast() and ibv_detach_mcast() are used to join/leave a particular multicast group, but IB architecture spec indicates a

[openib-general] openSM failover / failback issue?

2006-07-12 Thread Sean Hefty
Hal Rosenstock wrote: With the default sminfo_polling_timeout of 10 seconds and default polling_retry_number of 4, so the total handoff time should be around 40 seconds. I just did that experiment with 2 SMs and saw that as well. Okay - I narrowed down the test case to something reproducible.
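The handoff estimate quoted here is just the polling timeout multiplied by the retry count. A minimal sketch of that arithmetic follows; the macro and function names are made up for illustration and are not OpenSM identifiers, only the two default values come from the message.

```c
/* Defaults quoted in the thread; the names are illustrative only. */
#define EXAMPLE_SMINFO_POLLING_TIMEOUT_SEC 10u
#define EXAMPLE_POLLING_RETRY_NUMBER       4u

/* Approximate time before a standby SM takes over: one polling
 * timeout per retry. */
static unsigned int example_handoff_seconds(unsigned int timeout_sec,
                                            unsigned int retries)
{
    return timeout_sec * retries;
}
```

Calling `example_handoff_seconds()` with the two defaults gives the roughly 40 seconds mentioned above.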

Re: [openib-general] openSM failover / failback issue?

2006-07-12 Thread Sean Hefty
I don't know if this is an HCA firmware issue, switch issue, or openSM issue. I don't think it's related to my changes or osmtest at this point. I'll see if I can reproduce this tomorrow. Also, can you send me the guid2lid files from the 3 SMs ? I'll send this tomorrow. Before reloading

Re: [openib-general] multicast

2006-07-12 Thread Sean Hefty
I'm concerned about how rdma_cm abstracts HCAs. It looks like I can use the src_addr argument to rdma_resolve_addr() to select which IP address/HCA (assuming one IP per HCA), but how can I enumerate the available HCAs? The HCA / RDMA device abstraction is there for device hotplug, but the verb

Re: [openib-general] [PATCH] validate MADs issued from userspace for spec compliance C13-18.1.1

2006-07-11 Thread Sean Hefty
Michael S. Tsirkin wrote: This will be 2.6.18 material, right? I want to get some wider testing of the patch before pushing upstream, but I consider this a bug fix that we should try to push into 2.6.18. - Sean

Re: [openib-general] [RFC] [PATCH 2/7] ibrdmaverbs config files 2

2006-07-11 Thread Sean Hefty
Krishna Kumar wrote: The intention was never to break the existing applications, since I am not suggesting to remove libibverbs immediately. The intention is : when all applications are converted to use the new API, then the libibverbs can be removed. Does that sound reasonable ? Otherwise

Re: [openib-general] [PATCH] validate MADs issued from userspace for spec compliance C13-18.1.1

2006-07-11 Thread Sean Hefty
I consider this a bug fix that we should try to push into 2.6.18. Right. Is this in SVN at the moment? I haven't checked this in yet. I was just about to run some additional tests before doing this. My biggest concern is that modifying the TID for SENDs may cause an issue with some

Re: [openib-general] [PATCH] validate MADs issued from userspace for spec compliance C13-18.1.1

2006-07-11 Thread Sean Hefty
I haven't checked this in yet. I was just about to run some additional tests before doing this. My biggest concern is that modifying the TID for SENDs may cause an issue with some application. How about we check that application has put 0 in high 32 bit, and return an error if it did not?

Re: [openib-general] which patches are needed for 2.6.17 kernel?

2006-07-11 Thread Sean Hefty
Johann George wrote: Still having problems compiling: CC [M] drivers/infiniband/core/addr.o In file included from drivers/infiniband/core/addr.c:38: drivers/infiniband/include/rdma/ib_addr.h:43: error: field 'dev_type' has incomplete type drivers/infiniband/core/addr.c: In function

Re: [openib-general] ipoib lockdep warning

2006-07-11 Thread Sean Hefty
Quoting r. Zach Brown [EMAIL PROTECTED]: BC: query_idr.lock is taken with interrupts enabled and so is implicitly ordered before dev->_xmit_lock which is taken in interrupt context. ipoib_mcast_join_task() ipoib_mcast_join() ib_sa_mcmember_rec_query() send_mad()

Re: [openib-general] [PATCH] validate MADs issued from userspace for spec compliance C13-18.1.1

2006-07-11 Thread Sean Hefty
Sean Hefty wrote: @@ -438,6 +493,11 @@ static ssize_t ib_umad_write(struct file copy_offset = IB_MGMT_RMPP_HDR; rmpp_active = ib_get_rmpp_flags(&rmpp_mad->rmpp_hdr) & IB_MGMT_RMPP_FLAG_ACTIVE; + if (rmpp_active

Re: [openib-general] [PATCH] validate MADs issued from userspace for spec compliance C13-18.1.1

2006-07-11 Thread Sean Hefty
Michael S. Tsirkin wrote: This will be 2.6.18 material, right? I've committed this to svn. Assuming that further testing goes without a hitch, you can pull the actual check-in from revision 8498 to push upstream. - Sean

Re: [openib-general] [Openib-windows] ib_types.h and Win/Linux consolidation

2006-07-10 Thread Sean Hefty
Fabian Tillier wrote: Could you filter these out and send out what the actual changes that matter are? I quickly lost interest here. Can you also use the -up diff format? - Sean

Re: [openib-general] [PATCH upstream] IB/cm: drop REQ when out of memory

2006-07-10 Thread Sean Hefty
Michael S. Tsirkin wrote: I plan to send the following (from SVN trunk rev 8261) upstream to Andrew. Comments? This is fine. - Sean

Re: [openib-general] [PATCH upstream] IB/addr: gid structure alignment fix

2006-07-10 Thread Sean Hefty
Michael S. Tsirkin wrote: I plan to send the following (from SVN r8265) upstream to Andrew. Comments? looks fine - thanks for separating these changes out - Sean

Re: [openib-general] user_mad check question

2006-07-10 Thread Sean Hefty
Rimmer, Todd wrote: We defined a response as: ((R bit set || TRAP_REPRESS) && ! SEND) || (Class=BM && SEND && AttributeModifier & BM Response bit set) At this point, I'm leaning towards setting the upper bits of the TID for all MADs that are not responses. (This is for usermode only, so kernel

Re: [openib-general] user_mad check question

2006-07-10 Thread Sean Hefty
I disagree, this implies a non-symmetric translation of the TID for SENDs (ie. it would be translated on the outbound SEND but not on any corresponding inbound SEND which might be a reply). The CM and BMA established the precedent for a SEND based protocol where TID was important and class

Re: [openib-general] [PATCH] libibmad: Support MFT and Notice/Trap fields

2006-07-10 Thread Sean Hefty
Hal Rosenstock wrote: +enum TRAP_NUM_ID { + IB_TRAP_128, + + IB_TRAP_LAST +}; Should TRAP_128 be defined as 0? - Sean

Re: [openib-general] rdma_cm callback event private data length == 0

2006-07-10 Thread Sean Hefty
Ira Weiny wrote: The problem with the private_data_len being 0 appears to be in the cma_ib_handler function. The following is a patch which simply tells the user the private data length for the REJ message. Lustre, which checks this length, then happily gets its data. Is this a bug which

Re: [openib-general] rdma_cm callback event private data length == 0

2006-07-10 Thread Sean Hefty
Michael S. Tsirkin wrote: + if (ib_event->event == IB_CM_REJ_RECEIVED) + { + printk(KERN_CRIT "REJECT (private_data_len = %d)\n", + private_data_len); + } Not sure why is this KERN_CRIT? Also, pls take a look at Documentation/CodingStyle, Chapter 3:

Re: [openib-general] rdma_cm callback event private data length == 0

2006-07-10 Thread Sean Hefty
Michael S. Tsirkin wrote: I guess we need this upstream, don't we? Yes - I made a note to forward out a patch, but if you just pull it from svn 8483 to merge into your git tree, that would work. - Sean

Re: [openib-general] rdma_cm callback event private data length == 0

2006-07-10 Thread Sean Hefty
Ira Weiny wrote: Do the other events need this as well? ie REQ, REP, RTU, DREQ, DREP? The REQ and REP set the private data size already. Private data is not reported to the user through the RDMA CM for the other events. As a general rule, an application cannot be guaranteed to receive

Re: [openib-general] [ucm] device file of the ucm is not being created

2006-07-10 Thread Sean Hefty
Dotan Barak wrote: On Thursday 06 July 2006 20:14, Sean Hefty wrote: Dotan Barak wrote: KERNEL="ucma", NAME="infiniband/%k", MODE="0666" KERNEL="rdma_cm", NAME="infiniband/%k", MODE="0666" do you know that is the problem? The ucma should be in /sys/class/misc/rdma_cm. To clarify, I believe

[openib-general] [PATCH] validate MADs issued from userspace for spec compliance C13-18.1.1

2006-07-10 Thread Sean Hefty
Enhance validation of MADs sent by userspace clients for spec compliance with C13-18.1.1 (duplicate requests / responses). Also verify that RMPP MADs are data only, to avoid a userspace app causing a kernel crash by sending non-data MADs. Signed-off-by: Sean Hefty [EMAIL PROTECTED] --- NOTE

Re: [openib-general] [Openib-windows] ib_types.h and Win/Linux consolidation

2006-07-09 Thread Sean Hefty
Hi Sean, Hal, Fab, I did the obvious diff... Attached is the results. To me most of the differences seem trivial to merge. Can you please resend as inline text, or at the very least a plain text attachment? - Sean

Re: [openib-general] ucma into kernel.org

2006-07-07 Thread Sean Hefty
Michael S. Tsirkin wrote: Max CM retries is a 4-bit value carried in the REQ indicating the number of times that a REQ, REP, or DREQ can be retried. See 12.7.27. I would expect software to adhere to this value. Hmm. How can SDP implement TCP_SYNCNT then? I would like to retry as much

[openib-general] user_mad check question

2006-07-07 Thread Sean Hefty
The following check in user_mad is done when sending a MAD. /* * If userspace is generating a request that will generate a * response, we need to make sure the high-order part of the * transaction ID matches the agent being used to send the * MAD.
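The check described in this comment concerns the upper half of the 64-bit transaction ID. Below is a rough sketch of that idea, assuming the agent identifier occupies the high-order 32 bits and the application's value the low-order 32 bits. The function names are hypothetical and are not the user_mad implementation, and the values are handled in host byte order for simplicity (a real MAD header carries the TID in network byte order).

```c
#include <stdint.h>

/* Hypothetical helper: build a TID whose high-order 32 bits identify
 * the sending agent and whose low-order 32 bits come from the caller. */
static uint64_t example_make_tid(uint32_t agent_id, uint64_t user_tid)
{
    return ((uint64_t)agent_id << 32) | (user_tid & 0xffffffffu);
}

/* Hypothetical helper: does the high-order part of a TID match the
 * agent that is about to send the MAD? */
static int example_tid_matches_agent(uint32_t agent_id, uint64_t tid)
{
    return (uint32_t)(tid >> 32) == agent_id;
}
```

A sender could either reject a TID for which `example_tid_matches_agent()` is false or rewrite it with `example_make_tid()`; which of the two is appropriate for SENDs is the question debated in the replies.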

Re: [openib-general] user_mad check question

2006-07-07 Thread Sean Hefty
Rimmer, Todd wrote: While the TID can be appropriate for a SEND (it depends on management class, some classes could chose to always use 0), this code fragment cannot be sure if the SEND is a new request or a response to an existing request. Hence it cannot be certain if it should modify the

Re: [openib-general] ucma into kernel.org

2006-07-06 Thread Sean Hefty
Michael S. Tsirkin wrote: What I am saying that giving the application control over the timeouts seems more like a workaround than a solution. The CM timeout depends on both the round trip time, as well as the time it takes the remote service to respond to the connection request. The errors

Re: [openib-general] [ucm] device file of the ucm is not being created

2006-07-06 Thread Sean Hefty
Dotan Barak wrote: KERNEL="ucma", NAME="infiniband/%k", MODE="0666" KERNEL="rdma_cm", NAME="infiniband/%k", MODE="0666" do you know that is the problem? The ucma should be in /sys/class/misc/rdma_cm. - Sean

Re: [openib-general] ucma into kernel.org

2006-07-06 Thread Sean Hefty
Michael S. Tsirkin wrote: TCP sockets just expose this to application through the TCP_SYNCNT option. Which leads again to my suggestion: since both TCP and IB CM have this, let us change max_cm_retries to max_request_retries, and add this in rdma_cm as a generic option. I'm not against

Re: [openib-general] ucma into kernel.org

2006-07-06 Thread Sean Hefty
Michael S. Tsirkin wrote: What limits IB retry count to 15? I would expect it to be arbitrary. Note I am talking about CM REQ retries that we do in software and that are not passed in any message. Max CM retries is a 4-bit value carried in the REQ indicating the number of times that a REQ,
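The limit of 15 follows directly from the field width: a 4-bit unsigned value cannot exceed 2^4 - 1. A small sketch of that bound is shown below; the macro and function names are invented for illustration and do not come from the IB CM code.

```c
#include <stdint.h>

/* Field width as stated in the thread; names are illustrative only. */
#define EXAMPLE_CM_RETRIES_BITS  4
#define EXAMPLE_CM_RETRIES_LIMIT ((1u << EXAMPLE_CM_RETRIES_BITS) - 1u)

/* Hypothetical helper: clamp a requested retry count to the largest
 * value that fits in the 4-bit field. */
static uint8_t example_clamp_cm_retries(unsigned int requested)
{
    if (requested > EXAMPLE_CM_RETRIES_LIMIT)
        return (uint8_t)EXAMPLE_CM_RETRIES_LIMIT;
    return (uint8_t)requested;
}
```

Any software-level retry policy that wants more than 15 attempts would therefore have to be layered on top of the value carried in the REQ, which is the tension the thread goes on to discuss.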

Re: [openib-general] ucma into kernel.org

2006-07-05 Thread Sean Hefty
Sean - looking on the cma/ucma APIs i see that the kernel APIs are not in place yet (eg the equivalent of rdma_set/get_options) or in the kernel the CMA consumer is expected to call directly the APIs exported by rdma_cm_ib.h? Kernel clients are expected to call the APIs directly. - Sean

Re: [openib-general] ucma into kernel.org

2006-07-05 Thread Sean Hefty
But are the options IB specific? Can they be generalized to work for all transports? The options in rdma_cm_ib are IB specific (get/set path records and IB CM timeout values). - Sean

Re: [openib-general] CM and REP handling

2006-06-30 Thread Sean Hefty
Viswanath Krishnamurthy wrote: In the current communication manager (CM) implementation how is the REP MAD getting lost handled. When the REP gets lost, the cm_dup_req_handler gets called which currently enters the default condition and does nothing. The client retries the number of

Re: [openib-general] CM and REP handling

2006-06-30 Thread Sean Hefty
Rimmer, Todd wrote: Shouldn't the cm_dup_req_handler in this case also resend the REP per the IBTA passive side state machine REP Sent state? The REP is already being retried based on a timeout. It could be resent immediately in response to a duplicate REQ as well, but that shouldn't be

Re: [openib-general] CM and REP handling

2006-06-30 Thread Sean Hefty
Rimmer, Todd wrote: I would recommend implementing the state machine as defined in the spec for the following reasons: Technically, I believe that this follows the state machine. After receiving a duplicate REQ, a REP will be resent. The only difference is that there is a delay in resending

[openib-general] [PATCH] RMPP: add Dual-sided RMPP support

2006-06-30 Thread Sean Hefty
ACK to the request.) This is a slight spec deviation, but is necessary to allow communication with nodes that do not generate the DS ACK. It also handles the case when a response is sent after the request state has been discarded. Signed-off-by: Sean Hefty [EMAIL PROTECTED] --- This was tested

Re: [openib-general] ipath patch series a-comin', but no IB maintainer to shepherd them

2006-06-29 Thread Sean Hefty
This currently includes a single patch from Venkatesh Babu: IB/core: Set alternate port number when initializing QP attributes. that has been checked into openib svn by Sean. Thanks Michael. I will assume that you will push this change in through Roland when he's back. - Sean

Re: [openib-general] design for communication established affiliated asynchronous event handling

2006-06-29 Thread Sean Hefty
Rimmer, Todd wrote: The CM would open the CA, provide its async event callback routine and perform a special register_cm() verbs call. Of course most CM traffic would occur on the GSI QP, so this open CA instance was only for this purpose. This special verb was only available in kernel space

Re: [openib-general] thread safety

2006-06-29 Thread Sean Hefty
Andrew Friedley wrote: I'm working with Matt Leininger this summer on developing support for UD in Open MPI, and eventually multicast collectives - he suggested I ask my question here. Is there any documentation available on thread safety (i.e., what is (non-)reentrant) with the openib

Re: [openib-general] design for communication established affiliated asynchronous event handling

2006-06-28 Thread Sean Hefty
Roland Dreier wrote: I suggest the following design: the CMA would replace the event handler provided with the qp_init_attr struct with a callback of its own and keep the original handler/context on a private structure. This is probably fine. There is one further situation where the

Re: [openib-general] [Bug 160] OFED1.0: ib_modify_qp() of RC QP fails with -EINVAL

2006-06-28 Thread Sean Hefty
OFED is tracking 2.6.18 so to get things there they need to be submitted to Roland's for-2.6.18 tree. I downloaded Linus' latest tree today, and will submit a patch tomorrow. - Sean

Re: [openib-general] design for communication established affiliated asynchronous event handling

2006-06-28 Thread Sean Hefty
How about user taking this into account and not arming the CQ / not polling it until the established event? The CQ could be in use by other QPs. - Sean

Re: [openib-general] Kernel Oops related to IPoIB (multicast module?)

2006-06-27 Thread Sean Hefty
Tziporet Koren wrote: Resolving this issue is critical for us since it prevents us from any usage of the new multicast module. An easy way to reproduce it is to use the OFED openibd script. Just run openibd start and then openibd stop and you will see the problem. This script is available

Re: [openib-general] Kernel Oops related to IPoIB (multicast module?)

2006-06-27 Thread Sean Hefty
Jack Morgenstein wrote: Evidently, ipoib was still attempting to connect with an SA, when the ipoib module was unloaded (modprobe -r). After the ipoib module was unloaded (or at least rendered inaccessible), the ib_sa module attempted to invoke ib_sa_mcmember_rec_callback (for a callback

Re: [openib-general] Kernel Oops related to IPoIB (multicast module?)

2006-06-27 Thread Sean Hefty
Sean Hefty wrote: The SA query interface always invokes a callback, regardless if a call succeeds. So if a call to ib_sa_mcmember_rec_set() fails (which happens in this case because the SM is down), the user's callback is still invoked. The multicast module is coded assuming

Re: [openib-general] RFC: CMA backlog (was Re: CMA backlog)

2006-06-27 Thread Sean Hefty
If a user of the IB CM returns -ENOMEM from their connection callback, simply drop the incoming REQ. Do not send a reject, which should allow the sender to retry the request. This is necessary for SDP to support a backlog. Signed-off-by: Michael S. Tsirkin [EMAIL PROTECTED] Signed-off-by: Sean

Re: [openib-general] ucma into kernel.org

2006-06-27 Thread Sean Hefty
Tziporet Koren wrote: These features are needed for uDAPL and were requested by Woody and Arlin for Intel MPI scalability. Since in OFED 1.1 we are going to take CMA from kernel 2.6.18 we need them upstream. Can you drive these enhancements only to 2.6.18. I would like these features in

Re: [openib-general] RFC: CMA backlog (was Re: CMA backlog)

2006-06-27 Thread Sean Hefty
Michael S. Tsirkin wrote: Looks good to me. Please go ahead, then I'll use this in SDP and test this way. Committed in 8261.

Re: [openib-general] ucma into kernel.org

2006-06-27 Thread Sean Hefty
Michael S. Tsirkin wrote: Would you consider making a git repository available with just the CMA code appropriate for OFED 1.1? Mixing git and SVN code to build OFED is really painful for us. Sure, I can consider doing that. There would just be some logistics to work out, like the location

[openib-general] [PATCH] ib_addr: fix get/set gid alignment issues

2006-06-27 Thread Sean Hefty
The device address contains unsigned character arrays, which contain raw GID addresses. The GIDs may not be naturally aligned, so do not cast them to structures or unions. Signed-off-by: Sean Hefty [EMAIL PROTECTED] --- This fixes an alignment issue pointed out by Michael when adding MGID
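The alignment concern in this patch description can be illustrated with a byte-wise copy in place of a pointer cast. The sketch below is not the committed patch: `union example_gid` is an illustrative 16-byte type standing in for the kernel's GID union, and the helper names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Illustrative 16-byte GID type; stands in for the kernel's union. */
union example_gid {
    uint8_t raw[16];
    struct {
        uint64_t subnet_prefix;
        uint64_t interface_id;
    } global;
};

/* Read a GID stored at an arbitrary byte offset inside a raw device
 * address. memcpy makes no alignment assumption, unlike casting
 * (addr + offset) to a union pointer and dereferencing it. */
static void example_get_gid(const unsigned char *addr, size_t offset,
                            union example_gid *gid)
{
    memcpy(gid, addr + offset, sizeof(*gid));
}

/* Write a GID back into the raw device address the same way. */
static void example_set_gid(unsigned char *addr, size_t offset,
                            const union example_gid *gid)
{
    memcpy(addr + offset, gid, sizeof(*gid));
}
```

An offset of 4 bytes, as in the related discussion of the broadcast and source device addresses, is enough to leave the 64-bit members misaligned on architectures that require natural alignment, which is why the copy is done through the byte array.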

Re: [openib-general] ucma into kernel.org

2006-06-26 Thread Sean Hefty
Steve Wise wrote: I agree that it would be nice to get this into 2.6.18. It seems stable enough IMO. It's not a stability issue. We wanted to make sure that the user to kernel interface was correct before pushing anything upstream. At the time the decision was made (a couple of months

Re: [openib-general] RFC: CMA backlog (was Re: CMA backlog)

2006-06-26 Thread Sean Hefty
Michael S. Tsirkin wrote: Here's an untested patch that does this. Comments? Rather than exporting wrapper functions around atomic inc/dec, I would rather the user just maintain the current backlog themselves, with the patch limited to the cm.c file only. Index:

Re: [openib-general] ucma into kernel.org

2006-06-26 Thread Sean Hefty
Michael S. Tsirkin wrote: How about the cma changes required by ucma to get/set options? I think they are not upstream yet. Could these go upstream, to make building ucma out-of-kernel possible, without kernel patches? Wouldn't you have to patch the kernel to include the kernel ucma

Re: [openib-general] ucma into kernel.org

2006-06-26 Thread Sean Hefty
Michael S. Tsirkin wrote: Wouldn't you have to patch the kernel to include the kernel ucma anyway? I would? Why can't it be compiled as an out of kernel module? I understand you now. UD QP and multicast support were also recently added. I don't think that we want to risk pushing them

Re: [openib-general] ucma into kernel.org

2006-06-26 Thread Sean Hefty
Michael S. Tsirkin wrote: UD QP and multicast support were also recently added. These options are slightly different however - kernel ULPs I think will also want to set the number of retries/timeout (SDP needs it). So you can look at it as a kind of fix, not a new feature. And, the change is

Re: [openib-general] RFC: CMA backlog (was Re: CMA backlog)

2006-06-26 Thread Sean Hefty
Michael S. Tsirkin wrote: I'm just saying that we can use exactly the code in ib_destroy_cm_id, but avoid calling ib_send_cm_rej in this one case: Ah... yes, something like that should work. - Sean

Re: [openib-general] Kernel Oops related to IPoIB (multicast module?)

2006-06-26 Thread Sean Hefty
Jack Morgenstein wrote: The following Oops occurred upon unloading the openib driver. I unloaded the driver immediately following a reboot (the driver had been loaded during the boot sequence). I did NOT run opensm before unloading the driver. Evidently, ipoib was still attempting to

Re: [openib-general] Disabling end-to-end flow control

2006-06-23 Thread Sean Hefty
Is there a way to disable end-to-end flow control using any of the APIs? I believe that all of the APIs (verbs, ib_cm, rdma_cm) let the user specify whether flow control is enabled. - Sean

Re: [openib-general] [PATCH][TRIVIAL] librdmacm/examples/udaddy.c: Fix example name in messages

2006-06-23 Thread Sean Hefty
librdmacm/examples/udaddy.c: Fix example name in messages Signed-off-by: Hal Rosenstock [EMAIL PROTECTED] Thanks - if you haven't, can you commit this as well? (My connection is _really_ slow at the moment...) - Sean

Re: [openib-general] uCMA kernel slab corruption and oops

2006-06-23 Thread Sean Hefty
I will look into this next week. - Sean

Re: [openib-general] mckey program

2006-06-23 Thread Sean Hefty
I was checking the mckey.c program for IB. I did some quick check and found that the rdma_resolve_addr function is invoking the cma_handler with erroneous event. mckey: event: 1, error: -19 Is there any easy way to check what might be happening? Try adding a route for 224.0.0.1 to the ipoib

Re: [openib-general] [PATCH v2 2/2] iWARP changes to librdmacm.

2006-06-23 Thread Sean Hefty
Are these changes acceptable? These look fine to commit by me. - Sean

Re: [openib-general] [PATCH] [TRIVIAL] librdmacm/examples/mckey.c: Fix example name in messages

2006-06-21 Thread Sean Hefty
librdmacm/examples/mckey.c: Fix example name in messages Signed-off-by: Hal Rosenstock [EMAIL PROTECTED] Thanks, Hal. Do you mind committing this change? - Sean

Re: [openib-general] ib_gid lookup

2006-06-20 Thread Sean Hefty
i'm trying to find whether i can do a lookup of ib_gid by either node name or node's ip address. is this information available from the subnet manager? A lookup is done from IP address to GID using the address translation module (ib_addr). This functionality is exposed to userspace through the

Re: [openib-general] is there is any SA client in user level?

2006-06-18 Thread Sean Hefty
I want to send a join message to the SA from user space. I know that I can use the umad or the osm_vendor in order to do it. What is the best way to do it? Is there any SA client implementation in the user level (or is it a transparent

Re: [openib-general] ucma into kernel.org

2006-06-16 Thread Sean Hefty
Steve Wise wrote: Will the ucma make it into 2.6.18? I notice it's not in Roland's for-2.6.18 tree right now. The plan is to allow the userspace interface to mature some before trying to merge them upstream. This is why it is not included in 2.6.18. - Sean

Re: [openib-general] design for communication established affiliated asynchronous event handling

2006-06-16 Thread Sean Hefty
James Lentini wrote: As an alternative, I don't think that there's any reason why the QP can't be transitioned to RTS when the CM REP is sent. I like this idea. It simplifies how ULPs handle this issue. Are there any spec. compliance issues with this? There's no spec compliance issues that

Re: [openib-general] design for communication established affiliated asynchronous event handling

2006-06-16 Thread Sean Hefty
Or Gerlitz wrote: This is what i was suspecting, Sean can you confirm that? if it does not emulate RTU reception, then what does it do? Both receiving an RTU and getting a connection established event move the connection into the established state. They generate different events to the user

Re: [openib-general] design for communication established affiliated asynchronous event handling

2006-06-16 Thread Sean Hefty
Hal Rosenstock wrote: IMO, it would violate the CM state machine and the passive CM transition specification in 12.9.7.2 and have the effect of circumventing the retransmission of REP on lost RTU. Data can't fly until either the RTU or the first data message is received from the other

Re: [openib-general] design for communication established affiliated asynchronous event handling

2006-06-16 Thread Sean Hefty
Hal Rosenstock wrote: This moves the QP state to RTS, as opposed to the CEP state to connected. So I don't believe that it violates the spec. Isn't the CEP the QP (see p. 689 line 7) ? Hmm... I was viewing the CEP as moving through the states described in 12.9.5 and 12.9.6. (Idle, REQ

Re: [openib-general] [PATCH 1/5] ib_addr: retrieve MGID from device address

2006-06-16 Thread Sean Hefty
Sean Hefty wrote: dev_addr->broadcast + 4/dev_addr->src_dev_addr + 4 may not be naturally aligned, so casting this pointer to structure type may cause compiler to generate incorrect code. Thanks - I'll update this. An update for this ends up working out better as a separate patch. Fixes

Re: [openib-general] design for communication established affiliated asynchronous event handling

2006-06-16 Thread Sean Hefty
Rimmer, Todd wrote: CM - have a hook so the CM can get the Async Events for all CAs. On getting the Async Event for the first packet received while in RTR (Communication established), the CM should treat this exactly like an RTU (with no private data). The CM will need to cross reference

Re: [openib-general] design for communication established affiliated asynchronous event handling

2006-06-15 Thread Sean Hefty
The cma/verbs consumer can't just ignore the event since its qp state is still RTR which means an attempt to tx replying the rx would fail. In most cases, I would expect that the IB CM will eventually receive the RTU, which will generate an event to the RDMA CM to transition the QP into RTS.

Re: [openib-general] [PATCH] backlog ignored when listening on all devs

2006-06-15 Thread Sean Hefty
Roland, can you pick up this patch for 2.6.18? Thanks - committed in 8057. - Sean If you listen on 0.0.0.0, then the backlog isn't passed down to the devices because it's not stored in the id_priv struct before calling cma_listen_on_all(). See cma_list_on_dev() which uses id_priv->backlog...
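The ordering problem described above can be sketched in isolation. The types and functions below are hypothetical stand-ins, not the real CMA code: the point is only that the per-device listen reads the backlog from the parent id, so the value has to be stored before the fan-out over all devices.

```c
#define EXAMPLE_MAX_DEVS 4

/* Hypothetical stand-in for the private listen id. */
struct example_listen_id {
    int backlog;                        /* requested by the caller */
    int ndevs;                          /* number of devices to listen on */
    int dev_backlog[EXAMPLE_MAX_DEVS];  /* what each device was given */
};

/* Per-device listen: takes the backlog from the parent id. */
static void example_listen_on_dev(struct example_listen_id *id, int dev)
{
    id->dev_backlog[dev] = id->backlog;
}

static void example_listen_on_all(struct example_listen_id *id)
{
    for (int dev = 0; dev < id->ndevs; dev++)
        example_listen_on_dev(id, dev);
}

/* Wildcard listen: store the backlog first, then fan out to the devices. */
static void example_listen_wildcard(struct example_listen_id *id, int backlog)
{
    id->backlog = backlog; /* if this came after the fan-out, devices would see 0 */
    example_listen_on_all(id);
}
```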

Re: [openib-general] RFC: detecting duplicate MAD requests

2006-06-14 Thread Sean Hefty
Michael S. Tsirkin wrote: Would keeping around MADs in the done list consume significant extra memory resources? For kernel clients, it shouldn't consume any additional memory. For userspace clients, it would continue to consume memory until a response were generated. Currently, that memory

Re: [openib-general] communication established affiliated asynchronous event

2006-06-14 Thread Sean Hefty
James Lentini wrote: The IBTA spec (volume 1, version 1.2) describes a communication established affiliated asynchronous event. Is this event supposed to be delivered to the verbs consumer or the IB CM? We've seen this event delivered to our NFS-RDMA server and aren't sure what to do

Re: [openib-general] RFC: detecting duplicate MAD requests

2006-06-14 Thread Sean Hefty
Michael S. Tsirkin wrote: Is that per-agent, or global? If per-agent, can this hurt user that writes scripts using management utilities? These will typically send or receive something and exit. No? This is per agent. The proposal would only affect applications that generate the responses.

Re: [openib-general] oops on trunk

2006-06-14 Thread Sean Hefty
How many nodes were running on the fabric when this happened? This was just caused by executing modprobe -r ib_ipoib, right? I'm still completely stumped on how this is occurring, and haven't been able to reproduce it. - Sean

Re: [openib-general] [PATCH 0/5] multicast abstraction

2006-06-14 Thread Sean Hefty
Sean Hefty wrote: This patch series enhances support for joining and leaving multicast groups, providing the following functionality: I'd like to commit both the multicast and UD QP support change sets. Are there any disagreements with committing these to the trunk? This would provide

Re: [openib-general] RFC: detecting duplicate MAD requests

2006-06-14 Thread Sean Hefty
Michael S. Tsirkin wrote: Another concern with this approach: consider an application that accepts incoming MAD requests and drops some of them. With current code it can do this safely and remote side will retry. With the duplicate tracking in umad module that you propose, MAD will stay in

Re: [openib-general] RFC: detecting duplicate MAD requests

2006-06-14 Thread Sean Hefty
Hal Rosenstock wrote: If everyone is okay with breaking the ABI, then I would add send completion notification to umad, and put the responsibility on callers not to generate duplicate responses. Is this a better architectural solution ? Not sure. It doesn't solve supporting DS RMPP, which

Re: [openib-general] RFC: detecting duplicate MAD requests

2006-06-14 Thread Sean Hefty
Michael S. Tsirkin wrote: Here's an alternative idea: instead of making huge changes all over, how about we delay passing the RMPP transaction up to the user until we have the ACK with the response window, and ask the user to give us back this ACK packet (or just the window?) when he sends

Re: [openib-general] RFC: detecting duplicate MAD requests

2006-06-14 Thread Sean Hefty
Michael S. Tsirkin wrote: We're kind of left with the same issue of trying to determine if a received MAD will generate a response. How do you mean? We have IsDS=1 flag for dual-sided, don't we? Dual-sided transfer always has a response, doesn't it? Unless I completely missed something,

Re: [openib-general] RFC: detecting duplicate MAD requests

2006-06-14 Thread Sean Hefty
Well the ACK for the direction switch is special, isn't it? All I'm saying, let's pass it up to the application. I really don't think that this is the direction that we want to take the interface. A multithreaded application could see the ACK before the request. Multiple ACKs could be received

Re: [openib-general] RFC: detecting duplicate MAD requests

2006-06-13 Thread Sean Hefty
There are architected ways to do that. There's busy for MADs which could be used for some MADs. For RMPP, would the transfer be ABORTed ? I don't think you can switch to BUSY in the middle (but I'm not 100% sure). I don't know how this limit is being used exactly, but it might be best if the RMPP
