Re: [openib-general] failure to create an FMR mapping 1K pages on memfree
On 2/27/07, Roland Dreier <[EMAIL PROTECTED]> wrote:
> Is it really returning -ENOMEM? It seems much more likely that you
> are hitting the code
>
>     /* For Arbel, all MTTs must fit in the same page. */
>     if (mthca_is_memfree(dev) &&
>         mr->attr.max_pages * sizeof *mr->mem.arbel.mtts > PAGE_SIZE)
>             return -EINVAL;
>
> I guess you could call this limit a driver design issue.

Indeed, sorry for the inaccurate description: mthca_fmr_alloc returns -EINVAL, and the fmr pool code returns -ENOMEM. Thanks for the clarification.

Or.
___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general
To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
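To make the limit concrete, here is a minimal sketch of the arithmetic behind the quoted check. It assumes 8-byte MTT entries and a 4 KB PAGE_SIZE; both values, and the helper names, are assumptions for illustration and are not taken from the driver.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed values for illustration only: the real ones come from the kernel
 * configuration and from the element type of mr->mem.arbel.mtts. */
#define SKETCH_PAGE_SIZE      4096u
#define SKETCH_MTT_ENTRY_SIZE sizeof(uint64_t)

/* Hypothetical helper mirroring the quoted check: returns 1 when the MTT
 * entries for max_pages all fit in one page, 0 when the driver would
 * reject the request with -EINVAL. */
int fmr_mtts_fit_in_page(size_t max_pages)
{
    return max_pages * SKETCH_MTT_ENTRY_SIZE <= SKETCH_PAGE_SIZE;
}

/* Largest max_pages that passes the check under the assumptions above. */
size_t fmr_max_pages_limit(void)
{
    return SKETCH_PAGE_SIZE / SKETCH_MTT_ENTRY_SIZE;
}
```

Under these assumptions, 256 pages need 2 KB of MTT entries and pass, 1024 pages need 8 KB and fail, and the ceiling is 512 pages per FMR, which is consistent with the behaviour reported in this thread.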
Re: [openib-general] failure to create an FMR mapping 1K pages on memfree
Oops, I forgot to CC openib-general.

On 2/26/07, Or Gerlitz <[EMAIL PROTECTED]> wrote:
> Hi Roland,
>
> I have got a report of a failure to create an FMR mapping 1K pages (that is
> 4MB) on memfree.
>
> I don't have the exact details (i.e. whether Arbel or Sinai, which FW, etc.),
> nor which exact check fails in mthca_fmr_alloc, but what's clear is that
> the latter function returns -ENOMEM when attr.max_pages is 1024 and works
> fine when attr.max_pages is 256.
>
> Is this failure clear to you? If yes, is a HW or FW limit being
> hit, or is it a driver design issue?
>
> Or.
Re: [openib-general] ipoib & the partial pkey
Hal Rosenstock wrote:
> On Sun, 2007-02-25 at 05:48, Or Gerlitz wrote:
>> Just to have us agree on the quote, it is from section 4 of RFC 4392
>> (page 14), e.g. in http://www.ietf.org/rfc/rfc4392.txt
>>> at the time of creating an IB multicast group, multiple values such as the
>>> P_Key, Q_Key, Service Level, Hop Limit, Flow ID, TClass, MTU, etc. have to be
>>> specified. These values should be such that all potential members of the IB
>>> multicast group are able to communicate with one another when using them.
>> OK, I suggest to remove this spec limitation,

> IMO you would need to get the IB spec changed first in order to do this.

Do you refer to this?

> What about the description of P_Key in MCMemberRecord (table 210 on p.
> 908 which is compliance) which states:
>
> "All members of the multicast group shall have full membership in the
> partition indicated by the partition key."

If yes, then indeed, this also has to be changed.

Or.
Re: [openib-general] ipoib & the partial pkey
Sean Hefty wrote:
> I looked into this more...
> RFC 4391 states (middle of page 5):
> For a node to join a partition, one of its ports must be assigned the relevant
> P_Key by the SM [RFC4392].
> Jumping to RFC 4392 (top of page 4):

Just to have us agree on the quote, it is from section 4 of RFC 4392 (page 14), e.g. in http://www.ietf.org/rfc/rfc4392.txt

> at the time of creating an IB multicast group, multiple values such as the
> P_Key, Q_Key, Service Level, Hop Limit, Flow ID, TClass, MTU, etc. have to be
> specified. These values should be such that all potential members of the IB
> multicast group are able to communicate with one another when using them.

OK, I suggest removing this spec limitation, as it does not allow the use case of a server using a partition in which inter-client communication is not allowed. Actually, since it prevents people from using partial membership partitioning with IPoIB (every ipoib device needs to join the broadcast group), it is probably a spec bug and not a limitation introduced on purpose.

A simple real-life example is an I/O target: the system admin wants IB block and/or file storage traffic to use a partition, but does not want initiators to communicate among themselves on this partition. To achieve that, the SM is configured to assign the partial pkey to the initiator nodes and the full pkey to the target ports. The current implementation of IPoIB and the core perfectly (and transparently...) supports that.

Or.
Re: [openib-general] [PATCH] librdmacm: fix bug causing failure to work with partial membership pkey
On 2/22/07, Sean Hefty <[EMAIL PROTECTED]> wrote:
> >My understanding is that when an IPoIB broadcast domain contains both
> >partial and full members (*) attempts to communicate between two partial
> >members would silently fail, does this silence is something you think we
> >should work to change?
>
> I'm looking at this from a different view than just ipoib multicast groups. For
> example, can two users of the ib_cm successfully establish a connection, but not
> actually be able to transfer data between each other? This seems possible,
> though unlikely. This is the type of silent failure I'm referring to.

I don't think this is possible, since the active CM uses the pkey index of the pkey provided in REQ.path to send the REQ MAD. The same goes for the passive CM: it uses the index in its table of REQ.path.pkey. So if the CMs are able to talk over QP1 using this pkey index, the CM consumers can talk over their RC (REQ) / UD (SIDR REQ) QPs. Both the CM and its consumers would use the same index: the one returned from ib_find_cached_pkey().

Or.
Re: [openib-general] [PATCH] librdmacm: fix bug causing failure to work with partial membership pkey
On 2/22/07, Sean Hefty <[EMAIL PROTECTED]> wrote:
> >An IB multicast group _cannot_ have partial members so this never should
> >get far enough to where two limited members would be unable to
> >communicate.
>
> Can someone help my understanding here? Is ipoib joining a multicast group
> using the full membership PKey, even if the node that it joins from only has the
> limited membership PKey configured? And the code in ib_find_cached_pkey helps
> enable this?

Yep. The ipoib create_child function ORs 0x8000 into the device pkey provided by the user. IPoIB then uses the device pkey when forming MGIDs and when doing modify QP to INIT. Indeed, the way ib_find_cached_pkey() is implemented makes the latter use trivial.

Or.
Re: [openib-general] librdmacm examples not working under OFED 1.2 alpha
Steve Wise wrote:
> What device?

mthca
[openib-general] librdmacm examples not working under OFED 1.2 alpha
I have tested RH4 U4 and, to some extent, also RH5 beta, and see the following:

under RH4 U4
============
- rping: addr and route resolution pass, client gets a reject on the conn req
- udaddy: works fine on both the UDP and IPOIB port spaces
- mckey: not applicable on RH4 U4 until my patch with ip_ib_mc_map is merged

Under both udaddy and rping, librdmacm reports:

librdmacm: couldn't read ABI version.
librdmacm: assuming: 4

under RH5
=========
Basically the same: rping does not work, udaddy works on both port spaces. I was also able to check mckey, and it works fine on both port spaces. The ABI error print is not seen.

The rping client/server logs are below.

Or.

rping client
============
[EMAIL PROTECTED] ~]# rping -c -v -d -a 193.168.80.175
ipaddr (193.168.80.175)
librdmacm: couldn't read ABI version.
librdmacm: assuming: 4
created cm_id 0x505f10
cma_event type 0 cma_id 0x505f10 (parent)
cma_event type 2 cma_id 0x505f10 (parent)
rdma_resolve_addr - rdma_resolve_route successful
created pd 0x507830
created channel 0x506260
created cq 0x507880
created qp 0x507990
rping_setup_buffers called on cb 0x505010
allocated & registered buffers...
cq_thread started.
cq completion failed status 5
wait for CONNECTED state 10
connect error -1
rping_free_buffers called on cb 0x505010
cma_event type 8 cma_id 0x505f10 (parent)
cma event 8, error 0

rping server
============
[EMAIL PROTECTED] ~]# rping -s -d -v -S 100 -C 100
verbose
size 100
count 100
librdmacm: couldn't read ABI version.
librdmacm: assuming: 4
created cm_id 0x505f00
rdma_bind_addr successful
rdma_listen
Re: [openib-general] [PATCH] librdmacm: fix bug causing failure to work with partial membership pkey
Sean Hefty wrote:
>> Note that since the HCA validates the pkey in the incoming packet, no
>> matter what the IB SW would do, partial members of a partition can't
>> talk to each other. So the approach taken by the core/ipoib code was
>> to just ignore the MSb in places where the code looks for the pkey
>> --index-- and use the full member pkey when forming MGIDs. This seems
>> fine to me.
>
> My concern is that ib_find_cached_pkey() returns an index to a pkey that wasn't
> the one in the search. Can this lead to a QP being configured in such a way
> that communication with a remote QP would silently fail?

My understanding is that when an IPoIB broadcast domain contains both partial and full members (*), attempts to communicate between two partial members would silently fail. Is this silence something you think we should work to change?

(*) e.g. when you have a bunch of clients and a server (or a bunch of servers) and you don't want to allow --clients-- to communicate among themselves

> I'm not against this patch, but I want to make sure that I understand the
> issues, so we're not creating a work-around solution. The patch is against the
> librdmacm, yet there's nothing that I see in the librdmacm that makes me think
> it's behaving incorrectly.

My thinking is that if, at the end of this thread, we are willing to move forward without changing ib_find_cached_pkey(), then this patch should be merged.

Or.
Re: [openib-general] [PATCH] librdmacm: fix bug causing failure to work with partial membership pkey
Hal Rosenstock wrote:
> On Wed, 2007-02-21 at 15:45, Or Gerlitz wrote:
>> On 21 Feb 2007 08:20:23 -0500, Hal Rosenstock <[EMAIL PROTECTED]> wrote:
>> If the IPoIB spec does not allow both partial and full members of a
>> partition to share a broadcast domain (eg the IPv4 broadcast group
>> associated with the full membership pkey) or any other multicast
>> group, burn it (or at least the relevant section).

> I was referring to the IB spec, not an IPoIB RFC.

Can you provide a pointer?

>> The OpenIB code supposed to work and as done with the RDMA CM header,
>> the implementation should not wait for spec to be written or changed.

> Really ? Maybe I'm mistaken but I didn't think that OpenIB/OpenFabrics
> wanted to issue code which is not IBA spec compliant.

The code resides in the Linux kernel, period. Linux is not under the control of this or that organization, period, period. Linux uses a hierarchical maintainership structure in which Roland, Sean and yourself are listed as the maintainers, which means you are able to promote and/or block this or that agenda. Go for it!

Or.
Re: [openib-general] [PATCH] librdmacm: fix bug causing failure to work with partial membership pkey
On 2/21/07, Sean Hefty <[EMAIL PROTECTED]> wrote:
> >There is no problem. As i have explained over this thread the ipoib
> >and the core abstract away from the user the actual value of the MSb
> >of the pkey, that is whether it is a full or partial membership pkey.
>
> But *why* does the kernel code do this, and should it?

It does this because it makes life simple and robust. Note that since the HCA validates the pkey in the incoming packet, no matter what the IB SW does, partial members of a partition can't talk to each other. So the approach taken by the core/ipoib code was to just ignore the MSb in places where the code looks for the pkey --index--, and to use the full member pkey when forming MGIDs. This seems fine to me.

Or.
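The pkey validation rule relied on above can be sketched as follows: two P_Keys are compatible when their low 15 bits are equal and at least one of them has the membership bit (0x8000) set. This is a simplified illustration in host byte order; the function and macro names are hypothetical and do not come from any IB library.

```c
#include <stdint.h>

#define SKETCH_PKEY_FULL_BIT  0x8000u  /* MSb: full membership */
#define SKETCH_PKEY_BASE_MASK 0x7fffu  /* low 15 bits: partition number */

/* Hypothetical helper: returns 1 if a packet carrying pkey `a` would be
 * accepted by a port configured with pkey `b`, 0 otherwise. */
int pkeys_can_communicate(uint16_t a, uint16_t b)
{
    /* Different partitions never match. */
    if ((a & SKETCH_PKEY_BASE_MASK) != (b & SKETCH_PKEY_BASE_MASK))
        return 0;

    /* Same partition: at least one side must be a full member. */
    return ((a | b) & SKETCH_PKEY_FULL_BIT) != 0;
}
```

With this rule, a full member (0xffff) and a partial member (0x7fff) of the same partition can talk, while two partial members cannot, regardless of what the software above the HCA does.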
Re: [openib-general] [PATCH] librdmacm: fix bug causing failure to work with partial membership pkey
On 21 Feb 2007 08:20:23 -0500, Hal Rosenstock <[EMAIL PROTECTED]> wrote:
> On Wed, 2007-02-21 at 07:35, Or Gerlitz wrote:
> > > I believe it is a spec (compliance) violation for the port to be a
> > > partial member and join as a full member.
>
> > Since partial members can't talk among themselves, there is no reason to
> > form a multicast group containing --only-- ports that can --not-- talk
> > to each other... So if the spec does not allow this (having a partial
> > member joining with the full member pkey) - it a spec bug...
>
> I think there are two issues here then:
> 1. If this is the case, getting the spec changed to accommodate this use case
> 2. I believe that OpenIB code is supposed to be spec compliant.

If the IPoIB spec does not allow both partial and full members of a partition to share a broadcast domain (e.g. the IPv4 broadcast group associated with the full membership pkey) or any other multicast group, burn it (or at least the relevant section).

The OpenIB code is supposed to work, and, as was done with the RDMA CM header, the implementation should not wait for a spec to be written or changed.

Or.
Re: [openib-general] [PATCH] librdmacm: fix bug causing failure to work with partial membership pkey
On 2/21/07, Sean Hefty <[EMAIL PROTECTED]> wrote:
>>However, no matter what the SM configures, the core & ipoib code act as
>>the full pkey is there. This is nice simplification and it works well.
>
> Is the problem here really in the librdmacm or in the core/ipoib software?

There is no problem. As I have explained in this thread, ipoib and the core abstract away from the user the actual value of the MSb of the pkey, that is, whether it is a full or partial membership pkey.

IPoIB does it by OR-ing 0x8000 into the pkey it uses, and the core does it in ib_find_cached_pkey(), which, when provided a pkey, returns the index of $pkey or of $pkey & 0x7fff, whichever one of them is there.

The only missing piece is for librdmacm to play this game as well, and the patch does this.

> (I looked at the patch, but haven't looked into the full reason why it's
> needed.)

Start by checking me: tell the SM to configure 0x7fff instead of 0x to one of your nodes as the pkey at index 0, then see that ping is working but librdmacm RC utils such as rping or ib_rdma_bw -c are not. Then apply the patch and check again.

Or.
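The lookup behaviour described above can be sketched as a search that compares only the low 15 bits of each table entry. This is an illustration over a plain array in host byte order, not the kernel code; `find_pkey_index` and `ipoib_full_member_pkey` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch of the IPoIB side: always present the pkey with the
 * full-membership bit set, whatever the user supplied. */
uint16_t ipoib_full_member_pkey(uint16_t pkey)
{
    return (uint16_t)(pkey | 0x8000);
}

/* Hypothetical sketch of the lookup side: find the index of `pkey` in a
 * port's P_Key table while ignoring the membership bit, so a search for the
 * full pkey also finds the partial instance. Returns 0 and stores the index
 * on success, -1 if the partition is not in the table. */
int find_pkey_index(const uint16_t *table, size_t len,
                    uint16_t pkey, uint16_t *index)
{
    size_t i;

    for (i = 0; i < len; i++) {
        if ((table[i] & 0x7fff) == (pkey & 0x7fff)) {
            *index = (uint16_t)i;
            return 0;
        }
    }
    return -1;
}
```

For example, on a node whose table holds only the partial pkey 0x7fff at index 0, searching for 0xffff still yields index 0, which is the index that then gets plugged into the QP.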
Re: [openib-general] [PATCH for-2.6.21] IPoIB/cm: improve small message bandwidth
Michael S. Tsirkin wrote:
> Avoid overhead of freeing/reallocating and mapping/unmapping for dma
> for pages that have not been written to by hardware.

> diff --git a/drivers/infiniband/ulp/ipoib/ipoib_cm.c b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
> index 8ee6f06..a23c8e3 100644
> --- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c
> +++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
> @@ -68,14 +68,14 @@ struct ipoib_cm_id {
>  static int ipoib_cm_tx_handler(struct ib_cm_id *cm_id,
>                                 struct ib_cm_event *event);
>
> -static void ipoib_cm_dma_unmap_rx(struct ipoib_dev_priv *priv,
> +static void ipoib_cm_dma_unmap_rx(struct ipoib_dev_priv *priv, int frags,
>                                    u64 mapping[IPOIB_CM_RX_SG])
>  {
>          int i;
>
>          ib_dma_unmap_single(priv->ca, mapping[0], IPOIB_CM_HEAD_SIZE,
>                              DMA_FROM_DEVICE);
>
> -        for (i = 0; i < IPOIB_CM_RX_SG - 1; ++i)
> +        for (i = 0; i < frags; ++i)
>                  ib_dma_unmap_single(priv->ca, mapping[i + 1], PAGE_SIZE,
>                                      DMA_FROM_DEVICE);
>  }

I understand that in ipoib_cm_alloc_rx_skb you call dma_map_page on IPOIB_CM_RX_SG pages, whereas here you call dma_unmap_single only $frags times, correct? Does this mean you are trashing the IOMMU etc. of the system?

Or.
Re: [openib-general] [PATCH] librdmacm: fix bug causing failure to work with partial membership pkey
>> However, no matter what the SM configures, the core & ipoib code act as
>> the full pkey is there. This is nice simplification and it works well.

> I believe it is a spec (compliance) violation for the port to be a
> partial member and join as a full member.

Since partial members can't talk among themselves, there is no reason to form a multicast group containing --only-- ports that can --not-- talk to each other... So if the spec does not allow this (having a partial member join with the full member pkey), it is a spec bug...

Or.
Re: [openib-general] [PATCH] librdmacm: fix bug causing failure to work with partial membership pkey
Hal Rosenstock wrote:
> On Tue, 2007-02-20 at 10:38, Or Gerlitz wrote:
>> Yes. Its a little bit confusing: partial and full members of an IPoIB IB
>> partition use the same MGID. When an IPoIB MGID is constructed, the pkey
>> placed by the driver is --always-- the full membership one. However, on
>> a node with partial membership, what's plugged into the QP is the pkey
>> index of the partial instance...
>
> So in this case, do both the full and partial keys need configuring for
> that port ?

No. The SM configures --either-- the full or the partial pkey. However, no matter what the SM configures, the core & ipoib code act as if the full pkey is there. This is a nice simplification and it works well.

Or.
Re: [openib-general] [PATCH] librdmacm: fix bug causing failure to work with partial membership pkey
Hal Rosenstock wrote:
>> The pkey extracted by the RDMA CM from the IPoIB device hardware address always
>> has the full membership bit set. However, when looking in the pkey table the
>> search must mask out the full membership bit.
>
> Is this true for both RC and UD QPs ? I thought that at least the UD QPs
> were being used for multicast in which case wouldn't full member be
> required for this ?

Yes. It's a little bit confusing: partial and full members of an IPoIB IB partition use the same MGID. When an IPoIB MGID is constructed, the pkey placed by the driver is --always-- the full membership one. However, on a node with partial membership, what's plugged into the QP is the pkey index of the partial instance...

In the kernel, all this is nicely hidden from the IB ULPs in ib_find_cached_pkey().

Or.
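As an illustration of the MGID construction mentioned above, here is a sketch that builds the IPv4 broadcast MGID layout from RFC 4391 (ff1x:401b:P_Key:0000:0000:0000:ffff:ffff) with the full-membership bit forced on. The link-local scope byte 0x12 is an assumption taken from the addresses shown elsewhere on this list, and the function name is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: fill `mgid` with the IPoIB IPv4 broadcast group GID
 * for the partition identified by `pkey` (host byte order). The pkey placed
 * in the MGID always has the full-membership bit set, even if the local
 * port was only given the partial pkey. */
void ipoib_ipv4_broadcast_mgid(uint16_t pkey, uint8_t mgid[16])
{
    uint16_t full = (uint16_t)(pkey | 0x8000);

    memset(mgid, 0, 16);
    mgid[0] = 0xff;                   /* multicast GID prefix */
    mgid[1] = 0x12;                   /* flags/scope; link-local assumed */
    mgid[2] = 0x40;                   /* IPv4 signature 0x401b */
    mgid[3] = 0x1b;
    mgid[4] = (uint8_t)(full >> 8);   /* P_Key, full-membership form */
    mgid[5] = (uint8_t)(full & 0xff);
    memset(mgid + 12, 0xff, 4);       /* 255.255.255.255 in the low 32 bits */
}
```

A node holding 0x7fff and a node holding 0xffff therefore compute the same MGID; only the pkey index plugged into the QP differs between them.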
Re: [openib-general] Fork issues with simple MPI program
Arlin Davis wrote:
> Any insight would be greatly appreciated. It was our assumption that the
> parent process can continue to use IB resources after the fixes went into
> 2.6.16 and OFED 1.1. Is this true?

As was discussed on this list on a few occasions: contrary to popular belief, the fork support was deployed in libibverbs 1.1, whereas OFED 1.1 contains libibverbs 1.0.

Or.
Re: [openib-general] [openfabrics-ewg] OFED 1.2 alpha release
Tziporet Koren wrote:
> Regarding RHEL4 U4 and IPoIB bug - Or just prepared a patch that should
> fix it. We will merge it and test for the beta.

The patch will only fix the bug for RDMA CM multicast consumers: unlike IPoIB, which gets the L2 multicast address (wrong in the RH4 U4 case) from the stack, the RDMA CM has the multicast IP address and is able to compute the correct L2 address itself. This is confusing, I know...
[openib-general] [PATCH] librdmacm: fix bug causing failure to work with partial membership pkey
Hi Sean,

This fixes a bug which did not allow running librdmacm apps on a node which is a partial member of a partition. The patch takes the approach of the kernel ib_find_cached_pkey implementation.

If you approve this, I suggest pushing it also into OFED 1.2 as a bug fix.

Or.

--

The pkey extracted by the RDMA CM from the IPoIB device hardware address always has the full membership bit set. However, when looking in the pkey table the search must mask out the full membership bit.

Signed-off-by: Or Gerlitz <[EMAIL PROTECTED]>
Signed-off-by: Olga Shern <[EMAIL PROTECTED]>

diff --git a/src/cma.c b/src/cma.c
index c5f8cd9..9c24c6a 100644
--- a/src/cma.c
+++ b/src/cma.c
@@ -661,7 +661,7 @@ static int ucma_find_pkey(struct cma_dev
         for (i = 0, ret = 0; !ret; i++) {
                 ret = ibv_query_pkey(cma_dev->verbs, port_num, i, &chk_pkey);
-                if (!ret && pkey == chk_pkey) {
+                if ((!ret && pkey == chk_pkey) || (!ret && htons(ntohs(pkey) & 0x7fff) == chk_pkey)) {
                         *pkey_index = (uint16_t) i;
                         return 0;
                 }
Re: [openib-general] uDAPL: RDMA Write example
Christian Kaiser wrote:
> I'm trying to find a small sample program, that uses RDMA Write instead
> of Send/Recv. In the sources there is no single uDAPL example program
> and on the net neither.
> Could someone please help me to find something useful?

See http://dapl.svn.sourceforge.net/viewvc/dapl/trunk/test/dapltest

Anyway, can you comment on what using uDAPL buys you that you don't get from coding to the verbs (libibverbs) and rdmacm (librdmacm)?

Or.
Re: [openib-general] IPv6 multicast address per NIC
Hal Rosenstock wrote:
> Or,
>
> On Thu, 2007-02-15 at 04:25, Or Gerlitz wrote:
>> Hi,
>>
>> I see that when IPv6 is enabled in the kernel, the stack joins for a
>> --dedicated-- multicast group per each interface. Can anyone here supply
>> me with a pointer to where this is defined, doing a quick look on rfc
>> 3307 did not provide an answer.
>
> You are referring to the solicited-node multicast address (see RFC
> 4291). There have been several different threads on issues relating to
> this on this list over time.

Thanks for the pointer, I will look into that.

Or.
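For reference, the solicited-node multicast address from RFC 4291 is formed by appending the low-order 24 bits of the interface's unicast address to the prefix ff02:0:0:0:0:1:ff00::/104. A minimal sketch follows; the function name is hypothetical and addresses are plain 16-byte arrays in network order.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: derive the solicited-node multicast address
 * (ff02::1:ffXX:XXXX) from a 16-byte IPv6 unicast address. */
void solicited_node_mcast(const uint8_t addr[16], uint8_t mcast[16])
{
    memset(mcast, 0, 16);
    mcast[0]  = 0xff;       /* ff02::/16, link-local multicast */
    mcast[1]  = 0x02;
    mcast[11] = 0x01;       /* ...:0001:ffXX:XXXX */
    mcast[12] = 0xff;
    mcast[13] = addr[13];   /* low-order 24 bits of the unicast address */
    mcast[14] = addr[14];
    mcast[15] = addr[15];
}
```

An interface address ending in 98:006d thus maps to ff02::1:ff98:6d, which is why each interface shows its own group in addition to the all-nodes group ff02::1.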
[openib-general] IPv6 multicast address per NIC
Hi,

I see that when IPv6 is enabled in the kernel, the stack joins a --dedicated-- multicast group for each interface. Can anyone here supply me with a pointer to where this is defined? A quick look at RFC 3307 did not provide an answer.

Or.

Below is the maddr show output on a node with two partitions on ib0. Note that the --pkey-- is not present in the link addresses, since IPoIB fills that in in its own copy (I don't mind sending a patch to fix that if anyone here thinks it is helpful).

$ ip maddr show
> 41: ib0
>     link 00:ff:ff:ff:ff:12:40:1b:00:00:00:00:00:00:00:00:00:00:00:01
>     link 00:ff:ff:ff:ff:12:60:1b:00:00:00:00:00:00:00:01:ff:98:00:6d
>     link 00:ff:ff:ff:ff:12:60:1b:00:00:00:00:00:00:00:00:00:00:00:01
>     inet 224.0.0.1
>     inet6 ff02::1:ff98:6d
>     inet6 ff02::1
> 45: ib0.8001
>     link 00:ff:ff:ff:ff:12:40:1b:00:00:00:00:00:00:00:00:00:00:00:01
>     link 00:ff:ff:ff:ff:12:60:1b:00:00:00:00:00:00:00:01:ff:98:00:6d
>     link 00:ff:ff:ff:ff:12:60:1b:00:00:00:00:00:00:00:00:00:00:00:01
>     inet 224.0.0.1
>     inet6 ff02::1:ff98:6d
>     inet6 ff02::1
> 46: ib0.8003
>     link 00:ff:ff:ff:ff:12:40:1b:00:00:00:00:00:00:00:00:00:00:00:01
>     link 00:ff:ff:ff:ff:12:60:1b:00:00:00:00:00:00:00:01:ff:98:00:6d
>     link 00:ff:ff:ff:ff:12:60:1b:00:00:00:00:00:00:00:00:00:00:00:01
>     inet 224.0.0.1
>     inet6 ff02::1:ff98:6d
>     inet6 ff02::1
Re: [openib-general] please pull for 2.6.21: fix + add IB multicast support
Or Gerlitz wrote:
> Roland Dreier wrote:
>> I merged the "increment port number" and "remove redundant '_wq'"
>> patches from git.openfabrics.org/~shefty/scm/rdma-dev.git for-roland
>>
>> I plan to review to multicast stuff next week and I hope to merge it
>> for 2.6.21. Or, have you or anyone else at Voltaire read over the
>> code in addition to using it? Do you see anything that should be
>> cleaned up?
>
> OK, I spent some time today on reviewing and playing with the ib_sa:
> track multicast join/leave requests patch - and have no special
> comments. I think the two patches are ready for merge, let me know if
> you have any specific question.

Roland - any progress here?

Or.
Re: [openib-general] mvapich2 ofed 1.2 problem
Roland Dreier wrote:
> > How do I tell? Can I tell from the .so files?
>
> ldd on the .so and the app would probably give you good info.
>
> I'm pretty sure that mpicc must be linking against an libibverbs 1.0
> from somewhere.

To be really sure which dynamic libraries were loaded, run

(gdb) info sharedlibrary

within the gdb console.

Or.
Re: [openib-general] please pull for 2.6.21: fix + add IB multicast support
Roland Dreier wrote:
> I merged the "increment port number" and "remove redundant '_wq'"
> patches from git.openfabrics.org/~shefty/scm/rdma-dev.git for-roland
>
> I plan to review to multicast stuff next week and I hope to merge it
> for 2.6.21. Or, have you or anyone else at Voltaire read over the
> code in addition to using it? Do you see anything that should be
> cleaned up?

OK, I spent some time today reviewing and playing with the "ib_sa: track multicast join/leave requests" patch, and have no special comments. I think the two patches are ready for merge; let me know if you have any specific questions.

Or.
Re: [openib-general] please pull for 2.6.21: fix + add IB multicast support
On 2/9/07, Roland Dreier <[EMAIL PROTECTED]> wrote:
> I plan to review to multicast stuff next week and I hope to merge it for
> 2.6.21

Thanks, good news!

> Or, have you or anyone else at Voltaire read over the
> code in addition to using it? Do you see anything that should be
> cleaned up?

OK, most of the review I did (and the interaction with Sean to add changes) was on the "rdma_cm: add multicast communication support" patch, and I was less focused on the "ib_sa: track multicast join/leave requests" patch. However, I recall that there were some discussions between Sean and Michael and that they reached an agreement. I will look at the ib_sa patch on Sunday and let Sean/you know if I have any comments.

Or.
Re: [openib-general] please pull for 2.6.21: fix + add IB multicast support
Or Gerlitz wrote:
> Sean Hefty wrote:
>>> Sean Hefty (3):
>>>    rdma_cm: Increment port number after close to avoid re-use.
>>>    ib_sa: track multicast join/leave requests
>>>    rdma_cm: add multicast communication support
>> Assuming that you haven't look at this yet, I updated the ib_sa patch
>> above to shorten the workqueue name, plus added a fourth patch to
>> shorten the workqueue names for ib_addr and rdma_cm. E.g. "ib_mcast_wq"
>> became "ib_mcast".
>
> Roland,
> We are working (developing and testing) with a userspace rdma cm based
> multicast app over this code during the last two months and are very
> satisfied with it. The testing included IPoIB, the user space app and
> multicast interoperability between them.

Roland,

Can you comment on the status of this merge request?

thanks,

Or.
Re: [openib-general] please pull for 2.6.21: fix + add IB multicast support
Sean Hefty wrote:
>> Sean Hefty (3):
>>    rdma_cm: Increment port number after close to avoid re-use.
>>    ib_sa: track multicast join/leave requests
>>    rdma_cm: add multicast communication support
>
> Assuming that you haven't look at this yet, I updated the ib_sa patch
> above to shorten the workqueue name, plus added a fourth patch to
> shorten the workqueue names for ib_addr and rdma_cm. E.g. "ib_mcast_wq"
> became "ib_mcast".
> Let me know if you need any assistance.

Roland,

Can you comment on the status of merging the multicast changes for 2.6.21?

We have been working (developing and testing) with a userspace RDMA CM based multicast app over this code for the last two months and are very satisfied with it. The testing included IPoIB, the user space app, and multicast interoperability between them.

Or.
Re: [openib-general] Immediate data question
On 2/5/07, Tang, Changqing <[EMAIL PROTECTED]> wrote:
> On sender side:
> opcode = IBV_WR_SEND_WITH_IMM;
> imm_data = my_4_bytes_data;
> Do I still need to specify sg_list and num_sge ?

At the sender side I think you can do well with:

opcode = IBV_WR_SEND
send_flags |= IBV_SEND_INLINE
sge.addr = pointer to the 4 bytes
sge.length = 4
sge.lkey = don't care

Since the 4 bytes are --copied-- by the IB library from sge.addr during the execution of ibv_post_send(), the ownership of sge.addr is yours once the call returns.

> On receiver side, because the immediate data is inside the completion
> structure, do I need to post a receive for above message ?

Yes, I don't see how you can get away from posting a receive WR.

> The reason I ask is that at some point, I can not(or hard) to provide
> registered memory only for 4 bytes data.

What about the mpi impl. header??? Do you have a case where only 4 bytes need to be passed to the other side?

Or.
Re: [openib-general] ip_ib_mc_map?
Or Gerlitz wrote:
> On 2/2/07, Doug Ledford <[EMAIL PROTECTED]> wrote:
>> Yeah, I've got a setup, I just don't have any multicast tests that I
>> run. Any test programs you have for multicast in particular would be
>> helpful.
> This is farely simple to do: have some multicast traffic routed over
> an IPoIB subnet on two nodes, eg using
>
> $ route add -net 224.0.0.0 netmask 255.0.0.0 dev ib0
> $ iperf -usB 224.5.5.5 -i 1

OK, verifying that the problem is gone by running client/server is actually harder, since while the problem persists the data is moved over the broadcast group... So basically, the first thing you want to do is set the routing, then open an iperf server and see whether the netstack has computed a correct IPoIB multicast hw address and instructed the device to use it.

> # iperf -usB 224.5.5.5 &

This is on U3; the stack correctly computed the hw addresses for 224.5.5.5 and 224.0.0.1:

> # ip maddr show ib0
> 5: ib0
> link 00:ff:ff:ff:ff:12:40:1b:00:00:00:00:00:00:00:00:00:05:05:05
> link 00:ff:ff:ff:ff:12:40:1b:00:00:00:00:00:00:00:00:00:00:00:01
> inet 224.5.5.5
> inet 224.0.0.1

This is on U4; the stack did not compute any hw addresses for 224.5.5.5 and 224.0.0.1. The inet addresses are the output of /proc/net/igmp, which means the stack is aware that this node joined these groups, but as we know the ARPHRD_INFINIBAND case was removed from the code computing a multicast link layer address...

> # ip maddr show ib0
> 8: ib0
> inet 224.5.5.5
> inet 224.0.0.1

So basically, if on your U5-staged node you get the same # ip maddr show output as on U3, we have made progress.

Really verifying that this traffic does not go over the broadcast group is a little bit harder: you would need a third active IPoIB device (that is, another node or a second ipoib device on the rx machine, eg ib1), run the iperf multicast test and make sure the --rx counters-- of the third device do not advance, whereas on U4 they would advance since all mcast traffic goes on the broadcast channel.

Please let me know if you need any further clarifications on how to test this, and... thanks! for taking care of it.

Or.
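For reference when reading the ip maddr output above, here is a minimal user-space sketch of the IPv4 group to IPoIB link-layer address mapping. It is inferred from the two "link" lines in the U3 listing and is not a copy of the kernel's ip_ib_mc_map; the helper names ipv4_mcast_to_ipoib_hw and ipoib_hw_to_str are hypothetical, and the P_Key bytes are left as zero only because that is what the listing shows.

```c
#include <stdint.h>
#include <stdio.h>

#define IPOIB_HW_ADDR_LEN 20

/*
 * Hypothetical helper (an illustration, not the kernel function):
 * build the 20-byte IPoIB link-layer address for an IPv4 multicast
 * group. 'group' is in host byte order, e.g. 0xE0050505 for 224.5.5.5.
 */
static void ipv4_mcast_to_ipoib_hw(uint32_t group, uint8_t hw[IPOIB_HW_ADDR_LEN])
{
	int i;

	hw[0] = 0x00;                  /* reserved byte */
	hw[1] = hw[2] = hw[3] = 0xff;  /* multicast QPN 0xffffff */
	hw[4] = 0xff;                  /* MGID starts here: multicast prefix */
	hw[5] = 0x12;                  /* flags/scope byte as seen in the listing */
	hw[6] = 0x40;                  /* IPv4 signature 0x401b */
	hw[7] = 0x1b;
	for (i = 8; i < 16; i++)       /* P_Key and padding: zero, as in the listing */
		hw[i] = 0x00;
	hw[16] = (group >> 24) & 0x0f; /* only the low 28 bits of the group are kept */
	hw[17] = (group >> 16) & 0xff;
	hw[18] = (group >> 8) & 0xff;
	hw[19] = group & 0xff;
}

/* Format the address the way `ip maddr show` prints it: colon-separated hex. */
static void ipoib_hw_to_str(const uint8_t hw[IPOIB_HW_ADDR_LEN],
			    char out[3 * IPOIB_HW_ADDR_LEN])
{
	int i;

	for (i = 0; i < IPOIB_HW_ADDR_LEN; i++)
		sprintf(out + 3 * i, i < IPOIB_HW_ADDR_LEN - 1 ? "%02x:" : "%02x", hw[i]);
}
```

Feeding 224.5.5.5 and 224.0.0.1 through these helpers should reproduce the two "link" lines of the U3 listing, which gives a quick way to check what a fixed node is expected to print.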
Re: [openib-general] Detecting when an RDMA writer process disappears
Mike Heffner wrote:
> Is there any method by which a receiving process that is polling in
> preregistered memory regions for data from a sender performing RDMA
> writes, can detect if the sender is killed? Say by a SIGKILL signal? The
> RC connection is setup using the RDMA CM and there do not appear to be
> any CM events created on the event channel

If you have a process with a connected RDMA CM ID whose associated peer process died, you should get a DISCONNECTED event. How do you verify that there is no rdma cm event present at the polling side?

Or.
Re: [openib-general] ip_ib_mc_map?
On 2/2/07, Doug Ledford <[EMAIL PROTECTED]> wrote:
> > As of the importance for us to have IP multicast working fine with
> > IPoIB over RH4...
> > do you have an IB setup to test that?
>
> Yeah, I've got a setup, I just don't have any multicast tests that I
> run. Any test programs you have for multicast in particular would be
> helpful.

This is fairly simple to do: have some multicast traffic routed over an IPoIB subnet on two nodes, eg using

$ route add -net 224.0.0.0 netmask 255.0.0.0 dev ib0

and then

server $ iperf -usB 224.5.5.5 -i 1
client $ iperf -uc 224.5.5.5 -l 100 -b 50M -t 30 -i 1

Or.
Re: [openib-general] ip_ib_mc_map?
On 2/1/07, Doug Ledford <[EMAIL PROTECTED]> wrote:
> On Thu, 2007-02-01 at 10:58 +0200, Or Gerlitz wrote:
> > From a reason that no one at RH can trace... someone went and removed
> > all the support for ARPHRD_INFINIBAND multicast from u4 where it exists
> > perfectly fine in u3 and hopefully on u5 as well (Doug can you update?),
> > see https://bugs.openfabrics.org/show_bug.cgi?id=2661
> Yes. It's been fixed for U5. It wasn't that the patch got removed,
> it's that between U3 and U4 I did a complete rebase, which means that
> all the patches from U3 were tossed out the window and a complete new
> set made for U4. I just missed re-adding this one in U4.

Thanks for fixing this for U5 (which I understand is not out yet, correct?).

Given how important it is for us to have IP multicast working fine with IPoIB over RH4... do you have an IB setup to test that?

Or.
Re: [openib-general] ip_ib_mc_map?
Michael S. Tsirkin wrote:
>> 1) adding a backport to the rdma_cm containing ip_ib_mc_map, period.
>> 2) having the rdma cm follow the net stack and make its consumer use the
>> broadcast group.
> Correct. Since multicast is broken in other respects on U4
> (sockets can't join multicast groups), I think 2 is the simplest approach.

The situation in U4 is kind of more involved: sockets doing IP_ADD_MEMBERSHIP to some multicast group are actually sending and receiving traffic over the IPoIB broadcast group, which makes IPoIB on this cluster kind of hell.

> Anyone who wants IPoIB milticast should just stay away from U4.

We are still interested in being able to run our multicast app over the RDMA CM, and we want it to be done over the correct multicast group and not over a broadcast group. So option 2 is a real problem for us.

Or.
Re: [openib-general] [mthca] Creation of a SRQ with many WR (> 16K) in kernel level fails
Michael S. Tsirkin wrote:
>> As for the user space sharing of the same limitation, how about adding
>> to the --kernel-- struct ib_device_attr "for user space" buddy fields to
>> max_qp_wr max_srq_wr and max_cqe such that each hw driver set both
>> values: for the "user space" field the actual hw limitation and for
>> "kernel space" field a value which would pass kmalloc.
> We could do that I guess but no one so far used query in kernel,
> and userspace values are currently good.

srp calls ib_query_device but does not care about these fields; as for IPoIB CM, if you see things as in my other email, I guess you don't need to query either.

However, as this is a fairly easy change to implement which does not break the user/kernel ABI and allows kernel consumers to count on the query results they get from the hw driver, longer term I think we do want to have it done.

Or.
Re: [openib-general] [mthca] Creation of a SRQ with many WR (> 16K) in kernel level fails
Roland Dreier wrote:
> > anyway, the solution that comes into my mind is to disable creating a
> > QP/SRQ for which > 128KB allocations are needed. So
> > mthca_query_device() will set the max_qp_wr and max_srq_wr attributes
> > to values whose derived size still allows to use kmalloc.
>
> But that will limit the size of the queues that userspace can create
> too. I guess we could allocate kernel wrid arrays with vmalloc(), but
> I wonder if anyone actually cares about this limit...

mmm, I would avoid vmalloc if possible. Allocating up to 128K bytes for a kernel resource sounds fine.

As for user space sharing the same limitation, how about adding to the --kernel-- struct ib_device_attr "for user space" buddy fields for max_qp_wr, max_srq_wr and max_cqe, such that each hw driver sets both values: the "user space" field gets the actual hw limitation and the "kernel space" field gets a value which would pass kmalloc.

Kernel ULPs calling ib_query_device would use the original fields, so there is no need to patch them. The same goes for user space ULPs: no need to patch them. However, when the call is made from user space, ib_uverbs_query_device copies the "user space" attr to the resp struct.

Or.
Re: [openib-general] [mthca] Creation of a SRQ with many WR (> 16K) in kernel level fails
Dotan Barak wrote:
> I think that now, when implementation of IPoIB CM is available and SRQ
> is being used, one may
> need to use a SRQ with more than 16K WRs.

IPoIB UD uses SRQ by nature (since RX from all peers consumes buffers from the --only-- RQ) and lives fine with 32 buffers (or 64, you can look in the code).

Moreover, my assumption is that pps(RC) <= pps(UC) <= pps(UD). This means that whatever number of RX buffers for UD/2K MTU is "enough" to have no (or close to zero) packet loss under some traffic pattern, the same pattern can be served with IPoIB CM using an SRQ of the same size.

Or.
Re: [openib-general] ip_ib_mc_map?
Steve Wise wrote:
> where can I find this symbol? I can't load rdma_cm on rhel4u4...
> rdma_cm: Unknown symbol ip_ib_mc_map

Sean,

OK, sorry for not mentioning the rh4u4 issue when you did the push to OFED 1.2...

For a reason that no one at RH can trace... someone went and removed all the support for ARPHRD_INFINIBAND multicast from u4, where it exists perfectly fine in u3 and hopefully in u5 as well (Doug can you update?), see https://bugs.openfabrics.org/show_bug.cgi?id=2661

Specifically, without the below snip from the patch, on rh4 u4 all IPv4 ARPHRD_INFINIBAND multicast goes on the broadcast group !!!

> Index: linux-2.6.9/net/ipv4/arp.c
> ===
> --- linux-2.6.9.orig/net/ipv4/arp.c 2004-10-18 23:55:06.0 +0200
> +++ linux-2.6.9/net/ipv4/arp.c 2006-09-20 14:43:59.0 +0300
> @@ -213,6 +213,9 @@
>  	case ARPHRD_IEEE802_TR:
>  		ip_tr_mc_map(addr, haddr);
>  		return 0;
> +	case ARPHRD_INFINIBAND:
> +		ip_ib_mc_map(addr, haddr);
> +		return 0;
>  	default:
>  		if (dir) {
>  			memcpy(haddr, dev->broadcast, dev->addr_len);

Anyway, OFED wise, I see two ways to solve this:

1) adding a backport to the rdma_cm containing ip_ib_mc_map, period. This means that apps offloading multicast traffic through the rdma cm would use the correct group, whereas apps working through the net stack would use the broadcast group.

2) having the rdma cm follow the net stack and make its consumer use the broadcast group.

Or.
Re: [openib-general] [RFC][PATCH] rdma_cm: allow joins to return a unique address
On 1/30/07, Sean Hefty <[EMAIL PROTECTED]> wrote:
> I believe that this patch lets you can do what you're trying to do. The group
> handle would be the returned mgid from the initial join that created the
> group.
> The mgid would need to be passed to other processes as an IPv6 address, who
> issue a join request on that group. (The mgid is available from the
> rdma_cm_event.param.ud.ah_attr.grh.dgid.)

Sean,

I understand that your approach relies on the uniqueness of the MGID being generated. This means that to have different MPI jobs use different MGIDs, the MGIDs must --always-- be generated on the same NODE and be propagated to the other nodes/ranks participating in that MPI job - correct?

Andrew - can you fulfil this demand? That is, having the rank which generates MGIDs always run on the same node of the cluster???

Or.
Re: [openib-general] [mthca] Creation of a SRQ with many WR (> 16K) in kernel level fails
Dotan Barak wrote:
> When one tries to create a SRQ with many WR (> 16K WR), creation of the SRQ
> fails.
> static int mthca_alloc_srq_buf(struct mthca_dev *dev, struct mthca_pd *pd,
>                                struct mthca_srq *srq)
> srq->wrid = kmalloc(srq->max * sizeof (u64), GFP_KERNEL);
> if (!srq->wrid)
> return -ENOMEM;
> which means that creating a SRQ with 16K WRs (or more), the driver will try to
> allocate 16K*8=128K bytes using kmalloc. This is a very high amount of memory
> to be allocated using kmalloc.

mthca_alloc_wqe_buf has the same problem, as it does

qp->wrid = kmalloc((qp->rq.max + qp->sq.max) * sizeof (u64), GFP_KERNEL);

Anyway, the solution that comes to my mind is to disallow creating a QP/SRQ for which > 128KB allocations are needed. So mthca_query_device() would set the max_qp_wr and max_srq_wr attributes to values whose derived size still allows using kmalloc.

Or.
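The arithmetic behind the proposed cap can be sketched as follows. This is a standalone illustration and not mthca code: the 128KB ceiling is simply the figure quoted in this thread, and the helper names (wrid_bytes, max_wr_for_limit, clamp_max_wr) are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed ceiling for one kmalloc() call; 128KB is the figure quoted in this thread. */
#define ASSUMED_KMALLOC_MAX ((size_t)128 * 1024)

/* Bytes needed for a wrid array holding one 64-bit entry per work request. */
static size_t wrid_bytes(size_t num_wr)
{
	return num_wr * sizeof(uint64_t);
}

/* Largest WR count whose wrid array still fits in a single allocation of 'limit' bytes. */
static size_t max_wr_for_limit(size_t limit)
{
	return limit / sizeof(uint64_t);
}

/*
 * Value a driver could report as max_srq_wr (or max_qp_wr, counting
 * rq.max + sq.max together): the hardware limit, clamped to what a
 * single allocation of 'limit' bytes can back.
 */
static size_t clamp_max_wr(size_t hw_max_wr, size_t limit)
{
	size_t cap = max_wr_for_limit(limit);

	return hw_max_wr < cap ? hw_max_wr : cap;
}
```

With an 8-byte wrid entry, 16K WRs need exactly 128KB, so under this assumed ceiling a reported limit of 16K is the largest value for which the allocation can still be attempted with kmalloc.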
Re: [openib-general] [PATCH 1/2] rdma_cm: add support to join IPOIB multicast groups
On 1/25/07, Sean Hefty <[EMAIL PROTECTED]> wrote:
> >The only missing piece here, as we agreed yesterday is to allow using
> >PS_IPOIB IDs for unicast traffic over librdmacm, i guess this should be
> >fairly simple to add.
> I'm adding this now. I would like to include all of these changes as part of
> the multicast code push for OFED/upstream. I hope to test this today.

Cool, just to make sure... the push to OFED should include both the kernel and librdmacm changes... I did not see a commit of the librdmacm patch to your librdmacm git tree.

Thanks for all your help and responsiveness.

Or.
Re: [openib-general] [PATCH 1/2] rdma_cm: add support to join IPOIB multicast groups
Sean Hefty wrote:
> Add to the rdma_cm an IPOIB port space that allows interoperability with
> IPoIB multicast traffic. Use of the RDMA_PS_IPOIB is limited to multicast
> join/leave.
>
> Rename the RDMA_UD_QKEY to RDMA_UDP_QKEY to signify that the qkey is only
> used with the RDMA_PS_UDP port space.
>
> Signed-off-by: Sean Hefty <[EMAIL PROTECTED]>
> ---
> This patch differs from those posted by Or by limiting the ipoib port space
> to multicast traffic only.

OK, Sean, I have tested the two patches and things are working fine. That is, I have changed my multicast app to use RDMA_PS_IPOIB instead of RDMA_PS_UDP and I am now able to run it against itself and against ipoib in all the combinations:

tx-app / rx-ipoib
tx-ipoib / rx-app
tx-app / rx-app

This means that basically (*) you have my OK for pushing the multicast support to OFED 1.2 (again, my thinking is that this is fine for upstream as well).

The only missing piece here, as we agreed yesterday, is to allow using PS_IPOIB IDs for unicast traffic over librdmacm; I guess this should be fairly simple to add. However, as the code freeze deadline gets closer, would you be able to implement and push this by the end of this week?

Basically, my thinking is that if you have the code that allows PS_IPOIB to do unicast and you have both udaddy and mckey working in --both-- PS_IPOIB and PS_UDP modes - push that. How does this sound to you?

Or.
Re: [openib-general] [PATCH 1/2] rdma_cm: add support to join IPOIB multicast groups
On 1/24/07, Sean Hefty <[EMAIL PROTECTED]> wrote:
> > The peer IPoIB would send an ARP and then would assume it can send its
> > packets to the QP number provided in the arp reply, so it would be
> > talking not with the rdma cm consumer but rather with the underlying
> > IPoIB in this node.
>
> Okay - so you want to change the QPN from that given in the ARP? I missed
> that
> you wanted this, and I think I understand better what you're trying to do.

We don't want to use the QPN from the arp reply but rather the sidr exchange etc., as it is implemented in the rdma cm.

> > no! it is broken since the PS_IPOIB ID/QP that joined/attached the
> > multicast group is now using the ipoib broadcast qkey where the PS_UDP
> > ID/QP is using the RDMA_UDP_QKEY
>
> I'm only trying to support communication within the same port space, not
> between
> them. Unicast is supported between different RDMA_PS_IPOIB QPs.

Working only within a port space makes sense. However, your patch does not allow PS_IPOIB IDs to do unicast, since some places in the cma kernel code only care about PS_UDP where they should care about PS_UDP OR PS_IPOIB, as I did in my patch...

> The question
> is how to obtain the IB unicast address (i.e. QPN, etc.) for RDMA_PS_IPOIB.
> My
> assumption was that this capability wasn't needed, but you're saying that it
> is.
> I will update the patches.

Thanks, and again it is fine to obtain the IB unicast address for PS_IPOIB IDs using the sidr exchange; you don't need to worry about the ARP result. Only make sure that PS_IPOIB uses the ipoib broadcast group qkey, and also take care of what I mention above (code branching on PS_UDP where it should do so on PS_UDP or PS_IPOIB).

thanks!

Or.
Re: [openib-general] [PATCH 1/2] rdma_cm: add support to join IPOIB multicast groups
On 1/24/07, Or Gerlitz <[EMAIL PROTECTED]> wrote:
> On 1/24/07, Sean Hefty <[EMAIL PROTECTED]> wrote:
> > > However, it is possible that an RDMA_PS_IPOIB consumer would want to
> > > talk over ---one-- UD QP with two peers:
> > > 1) IPoIB - multicast traffic
> > > 2) --another-- RDMA CM consumer - unicast traffic
> > After the user joins the multicast group, unicast traffic is still
> > supported.
> no! it is broken since the PS_IPOIB ID/QP that joined/attached the
> multicast group is now using the ipoib broadcast qkey where the PS_UDP
> ID/QP is using the RDMA_UDP_QKEY

OK, I have managed to confuse myself... with the patch you have sent, a PS_IPOIB ID does not support unicast traffic, so this whole use scenario is not possible in the first place.

But my preference is not to block RDMA CM use patterns of UD unicast to UD unicast and UD unicast to UD unicast/multicast etc.

Or.
Re: [openib-general] [PATCH 1/2] rdma_cm: add support to join IPOIB multicast groups
On 1/24/07, Sean Hefty <[EMAIL PROTECTED]> wrote:
> > However, it is possible that an RDMA_PS_IPOIB consumer would want to
> > talk over ---one-- UD QP with two peers:
> >
> > 1) IPoIB - multicast traffic
> > 2) --another-- RDMA CM consumer - unicast traffic
> My thinking on this was that path record lookup and SIDR resolution isn't part
> of the ipoib protocol, and I wanted to limit the scope of the patch.

Indeed they are not part of the ipoib protocol, but the reason interop is not possible between a PS_IPOIB ID/QP and a peer node's IPoIB UD is much simpler: it is because of IPoIB address resolution... The peer IPoIB would send an ARP and then would assume it can send its packets to the QP number provided in the arp reply, so it would be talking not with the rdma cm consumer but rather with the underlying IPoIB in this node. In the other direction you are correct, IPoIB does not listen for SIDR requests.

> After the user joins the multicast group, unicast traffic is still supported.

No! It is broken, since the PS_IPOIB ID/QP that joined/attached the multicast group is now using the ipoib broadcast qkey whereas the PS_UDP ID/QP is using the RDMA_UDP_QKEY.

> The issue I see is whether the rdma_cm uses address resolution (which ends up
> being IP ARP), an SA query, and SIDR to resolve the remote QPN, or if it can
> obtain it through some other method.

A possible fallback for an RDMA CM consumer is: issue an ARP, then send SIDR; if there is no response, use the IPoIB QP from the ARP reply and the ipv4 broadcast qkey to talk directly with IPoIB. However, as I mention above, this hack is not possible in the other direction, that is, you can't make IPoIB do unicast talking with a PS_IPOIB consumer.

Or.
Re: [openib-general] [PATCH] ib_addr: Handle Ethernet neighbour updates during route resolution.
On 1/24/07, Steve Wise <[EMAIL PROTECTED]> wrote:
> Handle Ethernet neighbour updates during route resolution.
> The IWCM uses the ib_addr services to do route resolution (neighbour
> discovery in the IP world). The ib_addr netevent callback routine,
> however, currently only acts on Inifininband neighbour updates. It needs
> to act on ethernet neighbour updates as well.
> This patch just removes filtering on device type altogether and
> will trigger on any neighour updates where the nud_type is valid.
> This simplifies the code some.

OK, as I have mentioned in the past, there is a check in the fast path xmit code of IPoIB to verify that the neighbour we are now using to xmit (skb->neigh) has not changed its HA address since the last time IPoIB xmit-ed with it - that is, that the GID in the struct neighbour->ha is the same as the GID in struct ipoib_neigh. Such a diff happens when the kernel acts on a gratuitous ARP - that is, a remote peer has changed its HW address (eg because of fail-over of an IP address from one IPoIB NIC to another IPoIB NIC - eg with bonding).

From this patch I understand that we can register for the neighbour change event in IPoIB and eliminate the run time check !?!?!?

Or.
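To make the run time check discussed above concrete, here is a rough user-space sketch of the comparison. It is not the IPoIB driver code: struct cached_neigh is a hypothetical stand-in for the per-neighbour state IPoIB keeps, and the only layout assumption is that the 16-byte GID follows the first 4 bytes (flags + QPN) of the 20-byte IPoIB hardware address.

```c
#include <stdint.h>
#include <string.h>

#define IPOIB_HW_ADDR_LEN 20
#define IB_GID_LEN        16
#define IPOIB_GID_OFFSET   4  /* 1 reserved/flags byte + 3 QPN bytes precede the GID */

/* Hypothetical stand-in for the GID that IPoIB caches per neighbour. */
struct cached_neigh {
	uint8_t dgid[IB_GID_LEN];
};

/*
 * Returns 1 if the GID carried in the neighbour's current hardware
 * address differs from the cached one, 0 otherwise. Only the GID part
 * of the hardware address is compared.
 */
static int neigh_gid_changed(const struct cached_neigh *cached,
			     const uint8_t ha[IPOIB_HW_ADDR_LEN])
{
	return memcmp(cached->dgid, ha + IPOIB_GID_OFFSET, IB_GID_LEN) != 0;
}
```

Doing this comparison once per neighbour update event instead of once per transmitted packet is the saving the question above is about.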
Re: [openib-general] RDMA CM multicast
On 1/24/07, Sean Hefty <[EMAIL PROTECTED]> wrote:
> What would be needed is a way for the user to indicate that they need a unique
> address. An obvious way to accomplish this is for the user to specify an IP
> address of 0.0.0.0 when calling rdma_join_multicast(). The user would first
> need to bind to a specific device by calling rdma_bind_addr() with a local IP
> address.
> Your code would look something like this:
> rdma_bind_addr(local IP address)
> rdma_join_multicast(0.0.0.0, port 0)<- exchange group info out of band
> rdma_join_multicast(0.0.0.0, port 1)<- exchange group info out of band
> send data to a lot of nodes at once
> rdma_leave_multicast(0.0.0.0, port 0)
> rdma_leave_multicast(0.0.0.0, port 1)

Sean,

This seems to me like a little bit of over engineering... since we do require that consumers of the RDMA CM have a functional IPoIB NIC (so they can call rdma_bind_addr to resolve the device/port/pkey), we can add another requirement: have the sys admin configure their routing such that some multicast IP subnet (eg net 224.0.0.0 mask 255.0.0.0) is routed to the IPoIB NIC.

Once this routing is in place, the only thing they need is to enhance the MPI job starter/etc to allocate to each job (say) two unique multicast --IP-- addresses on the relevant subnet and provide these IP addresses to each rank. Now the rank can use the RDMA CM without any hack.

Or.
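As a rough illustration of the job starter side of this suggestion, the sketch below hands each job two consecutive IPv4 multicast addresses derived from its job id. Everything here is an assumption made for the example: the helper names (job_mcast_addr, mcast_addr_to_str), the idea of a numeric job id, and the base address are hypothetical and are not part of any MPI launcher or of the RDMA CM.

```c
#include <stdint.h>
#include <stdio.h>

#define GROUPS_PER_JOB 2  /* "(say) two unique multicast IP addresses" per job */

/*
 * Hypothetical allocator: job 'job_id' gets GROUPS_PER_JOB consecutive
 * addresses starting at 'base' (host byte order). 'idx' selects which
 * of the job's addresses is wanted. Returns 0 on success, or -1 if idx
 * is out of range or the result leaves the IPv4 multicast range
 * 224.0.0.0 - 239.255.255.255.
 */
static int job_mcast_addr(uint32_t base, uint32_t job_id, uint32_t idx, uint32_t *addr)
{
	uint64_t a;

	if (idx >= GROUPS_PER_JOB)
		return -1;
	a = (uint64_t)base + (uint64_t)job_id * GROUPS_PER_JOB + idx;
	if (a < 0xE0000000u || a > 0xEFFFFFFFu)
		return -1;
	*addr = (uint32_t)a;
	return 0;
}

/* Dotted-quad formatting so the address can be handed to each rank as text. */
static void mcast_addr_to_str(uint32_t addr, char out[16])
{
	sprintf(out, "%u.%u.%u.%u",
		(unsigned)((addr >> 24) & 0xff), (unsigned)((addr >> 16) & 0xff),
		(unsigned)((addr >> 8) & 0xff), (unsigned)(addr & 0xff));
}
```

Each rank would then pass its job's addresses to rdma_join_multicast() as ordinary multicast IP addresses, relying on the route to the IPoIB NIC described above.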
Re: [openib-general] [PATCH 1/2] rdma_cm: add support to join IPOIB multicast groups
On 1/24/07, Sean Hefty <[EMAIL PROTECTED]> wrote:
> > However, it will not support "mixed mode" communication patterns (which
> > you were raising last week) that is one app having a UD QP for both
> > multicast and unicast that talks with two "peers" IPoIB multicast and
> > another app doing only unicast.
> Separating ipoib to its own port space alleviated my concerns on existing
> usage.
> The RDMA_PS_UDP continues operating as before, with mixed mode traffic
> supported. Mixed mode for RDMA_PS_IPOIB is not supported, since it's not
> clear
> to me how that would be used. The IPOIB protocol doesn't use SIDR, so I'm
> hesitant to extend the capabilities until there's a clear need/use.

Indeed, it is not possible to have UDP --unicast-- interop between "IPoIB UD" (ie not IPoIB CM) and an RDMA_PS_IPOIB RDMA CM consumer.

However, it is possible that an RDMA_PS_IPOIB consumer would want to talk over ---one-- UD QP with two peers:

1) IPoIB - multicast traffic
2) --another-- RDMA CM consumer - unicast traffic

Since both conversations go over the same QP, everyone must use the same --QKEY--; now, since RDMA_PS_IPOIB does not support the SIDR exchange, this config is broken. The patch I have sent allows this, and it would be really nice to remove this restriction, with some documentation explaining the remaining limitations.

> > Also, just a clarification - how exactly the patch enforces that an app
> > would not be able to do listen/connect/accept on RDMA_PS_IPOIB ID???
> This is not enforce directly yet. (It just requires an if statement in
> resolve
> route.) I would expect that if it were tried, there would be a failure at
> some
> point.

OK, that (failure at some point) was my thought as well.

Or.
Re: [openib-general] [RFC/PATCH] librdmacm: use the ipoib broadcast group qkey
Sean Hefty wrote:
>> Maybe just ask user to always call rdma_join_multicast after rdma_create_qp?
>> Joins are now properly reference counted, so it shouldn't be a problem
>> to repeat this any number of times. Right?
>
> This is the solution for now, and it should work fine. I don't think it would
> be hard to support creating the QP after joining if someone ever came up with
> the need, but it doesn't seem like a priority at the moment.

Indeed, since to do multicast RX/TX you need an IB UD QP... naturally an IB app (eg IPoIB) would create its QP before doing any join/leave on the group. If there is ever demand for a "crazy" use scheme of 1st join, 2nd create qp, you can enhance librdmacm to support it.

Or.
Re: [openib-general] [PATCH 2/2] librdmacm: add support to join IPOIB multicast groups
Sean Hefty wrote:
> Add to the librdmacm an IPOIB port space that allows interoperability with
> IPoIB multicast traffic. Use of the RDMA_PS_IPOIB is limited to multicast
> join/leave.

The two patches seem fine. However, I will not be able to test them today, being out of the office all day; I will send my testing feedback on Thursday early IL time (late Wed night PST).

Or.
Re: [openib-general] [RFC/PATCH v3] rdma/cma: add RDMA_PS_IPOIB port space
Sean Hefty wrote:
> I was thinking of SIDR, but what about connected mode ipoib? This could
> make the ipoib port space interesting, or require breaking it into two
> separate port spaces, or... I'm only going to worry about multicast for
> now, unless there's a reason to consider other use.

I don't think we need to worry about offloading IPoIB connected mode now, but thanks for bringing up the idea.

Or.
Re: [openib-general] [PATCH 1/2] rdma_cm: add support to join IPOIB multicast groups
Sean Hefty wrote:
> Add to the rdma_cm an IPOIB port space that allows interoperability with
> IPoIB multicast traffic. Use of the RDMA_PS_IPOIB is limited to multicast
> join/leave.

OK, Sean, the patch looks perfectly fine for allowing multicast interoperability with IPoIB.

However, it will not support "mixed mode" communication patterns (which you were raising last week), that is, one app having a UD QP for both multicast and unicast that talks with two "peers": IPoIB multicast and another app doing only unicast.

Such a scenario would have been supported if you allowed unicast apps to use the IPOIB port space as well, similar to my version of the patch.

Also, just a clarification - how exactly does the patch enforce that an app would not be able to do listen/connect/accept on an RDMA_PS_IPOIB ID???

Or.
Re: [openib-general] librdmacm and udapl: Which git branch to use in ofed_1_2 build
Tziporet Koren wrote:
> Sean Hefty wrote:
>>>multicast
>>>
>>
>> This goes with the multicast branch of my rdma-dev git tree.
>>
>> IMO, OFED should determine which features they want and pull in the
>> appropriate branch. I know that Voltaire would like the multicast
>> feature, but require a couple of changes to the code before its usable
>> for them.
> Moni/Or
> Can you update us regarding multicast feature status and testing

I am working with Sean over the list on the changes to the multicast code needed for interoperability with IPoIB. It seems to be converging, and the code should be ready to be merged by the end of this week. Sean owns this and would do the push into OFED and upstream.

I am testing all the time on my systems, but my code bases are either upstream or something I have set up on top of OFED 1.1; I don't have an OFED 1.2 env yet.

Or.
Re: [openib-general] [RFC/PATCH] librdmacm: modify multicast code for RDMA_PS_IPOIB port space
Sean,

With one host being both the client and the server, mckey does not work for me even without the IPOIB PS changes. It used to work between two hosts with the patch below, which forces the sender to generate and poll completions on its TX packets.

Please let me know how it's going with mckey on your system. I am going to test the librdmacm patches I have just sent with our mcast app.

Or.

Index: librdmacm/examples/mckey.c
===
--- librdmacm.orig/examples/mckey.c 2007-01-23 16:24:19.0 +0200
+++ librdmacm/examples/mckey.c 2007-01-23 16:50:13.0 +0200
@@ -452,10 +453,14 @@ static int run(void)
 	if (is_sender) {
 		printf("initiating data transfers\n");
 		for (i = 0; i < connections; i++) {
-			ret = post_sends(&test.nodes[i], 0);
+			ret = post_sends(&test.nodes[i], IBV_SEND_SIGNALED);
 			if (ret)
 				goto out;
-		}
+		}
+		printf("polling data transfers completion\n");
+		ret = poll_cqs();
+		if (ret)
+			goto out;
 	} else {
 		printf("receiving data transfers\n");
 		ret = poll_cqs();
[openib-general] [RFC/PATCH] librdmacm: modify multicast code for RDMA_PS_IPOIB port space
Enhance the mckey test program to work in either of the port spaces. Signed-off-by: Or Gerlitz <[EMAIL PROTECTED]> Index: librdmacm/examples/mckey.c === --- librdmacm.orig/examples/mckey.c 2007-01-23 16:52:16.0 +0200 +++ librdmacm/examples/mckey.c 2007-01-23 17:02:26.0 +0200 @@ -78,6 +78,7 @@ static int message_count = 10; static int is_sender; static char *dst_addr; static char *src_addr; +static enum rdma_port_space port_space = RDMA_PS_UDP; static int create_message(struct cmatest_node *node) { @@ -328,7 +329,7 @@ static int alloc_nodes(void) for (i = 0; i < connections; i++) { test.nodes[i].id = i; ret = rdma_create_id(test.channel, &test.nodes[i].cma_id, -&test.nodes[i], RDMA_PS_UDP); +&test.nodes[i], port_space); if (ret) goto err; } @@ -472,7 +473,7 @@ int main(int argc, char **argv) { int op, ret; - while ((op = getopt(argc, argv, "m:sb:c:C:S:")) != -1) { + while ((op = getopt(argc, argv, "m:sb:c:C:S:p:")) != -1) { switch (op) { case 'm': dst_addr = optarg; @@ -492,6 +493,9 @@ int main(int argc, char **argv) case 'S': message_size = atoi(optarg); break; + case 'p': + port_space = strtol(optarg, NULL, 0); + break; default: printf("usage: %s\n", argv[0]); printf("\t-m multicast_address\n"); @@ -500,6 +504,7 @@ int main(int argc, char **argv) printf("\t[-c connections]\n"); printf("\t[-C message_count]\n"); printf("\t[-S message_size]\n"); + printf("\t[-p port space - %#x for UDP %#x for IPoIB]\n",RDMA_PS_UDP,RDMA_PS_IPOIB); exit(1); } } ___ openib-general mailing list openib-general@openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
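The new `-p` option hands `optarg` to `strtol(optarg, NULL, 0)`, so base 0 lets the user type the port space in hex or decimal, but any number is accepted. A minimal sketch of a stricter parser follows. `parse_port_space` is a hypothetical helper, not part of mckey, and the two constants are the values quoted elsewhere in this thread, taken here as assumptions.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Port space values as quoted in this thread; assumptions in this sketch. */
enum { PS_IPOIB = 0x0002, PS_UDP = 0x0111 };

/*
 * Hypothetical helper: parse a port space given in decimal, octal or hex
 * (strtol base 0) and accept only the two UD port spaces.
 * Returns the port space value, or -1 on bad input.
 */
static long parse_port_space(const char *arg)
{
	char *end;
	long val;

	assert(arg != NULL);
	if (*arg == '\0')
		return -1;

	errno = 0;
	val = strtol(arg, &end, 0);
	if (errno != 0 || *end != '\0')
		return -1;	/* overflow or trailing garbage */

	if (val != PS_UDP && val != PS_IPOIB)
		return -1;	/* not one of the UD port spaces */

	return val;
}
```

With this shape, `-p 0x0111` and `-p 2` would be accepted, while `-p tcp` or `-p 0x0106` would fall through to the usage message.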
[openib-general] [RFC/PATCH v2] librdmacm: add RDMA_PS_IPOIB port space
Add to librdmacm an IPoIB port space (RDMA_PS_IPOIB) whose semantics are similar to those of RDMA_PS_UDP where RDMA_PS_IPOIB IDs allow for inter operability with IPoIB on some traffic patterns. For RDMA_PS_UDP and RDMA_PS_IPOIB IDs, the qkey is provided by the kernel in ADDR_RESOLVED and CONNECT_REQUEST events and is stored by the library in struct cma_id_private. Later the library use the qkey when it is called to create a UD QP. The udaddy test program was enhanced to work in either of the port spaces. Signed-off-by: Or Gerlitz <[EMAIL PROTECTED]> Index: librdmacm/src/cma.c === --- librdmacm.orig/src/cma.c2007-01-22 21:21:37.0 +0200 +++ librdmacm/src/cma.c 2007-01-23 13:57:48.0 +0200 @@ -116,6 +116,7 @@ struct cma_id_private { pthread_mutex_t mut; uint32_t handle; struct cma_multicast *mc_list; + uint32_t qkey; }; struct cma_multicast { @@ -687,7 +688,7 @@ static int ucma_init_ud_qp(struct cma_id qp_attr.port_num = id_priv->id.port_num; qp_attr.qp_state = IBV_QPS_INIT; - qp_attr.qkey = RDMA_UD_QKEY; + qp_attr.qkey = id_priv->qkey; ret = ibv_modify_qp(qp, &qp_attr, IBV_QP_STATE | IBV_QP_PKEY_INDEX | IBV_QP_PORT | IBV_QP_QKEY); if (ret) @@ -718,7 +719,7 @@ int rdma_create_qp(struct rdma_cm_id *id if (!qp) return -ENOMEM; - if (id->ps == RDMA_PS_UDP) + if (id->ps == RDMA_PS_UDP || id->ps == RDMA_PS_IPOIB) ret = ucma_init_ud_qp(id_priv, qp); else ret = ucma_init_ib_qp(id_priv, qp); @@ -809,7 +810,7 @@ int rdma_accept(struct rdma_cm_id *id, s void *msg; int ret, size; - if (id->ps != RDMA_PS_UDP) { + if (id->ps != RDMA_PS_UDP && id->ps != RDMA_PS_IPOIB) { ret = ucma_modify_qp_rtr(id); if (ret) return ret; @@ -1169,6 +1170,7 @@ int rdma_get_cm_event(struct rdma_event_ struct ucma_abi_get_event *cmd; struct cma_event *evt; void *msg; + struct cma_id_private *id_priv; int ret, size; ret = cma_dev_cnt ? 
0 : ucma_init(); @@ -1197,8 +1199,11 @@ retry: evt->id_priv = (void *) (uintptr_t) resp->uid; evt->event.id = &evt->id_priv->id; evt->event.status = ucma_query_route(&evt->id_priv->id); + id_priv = evt->id_priv; if (evt->event.status) evt->event.event = RDMA_CM_EVENT_ADDR_ERROR; + else if (id_priv->id.ps == RDMA_PS_UDP || id_priv->id.ps == RDMA_PS_IPOIB) + id_priv->qkey = resp->param.ud.qkey; break; case RDMA_CM_EVENT_ROUTE_RESOLVED: evt->id_priv = (void *) (uintptr_t) resp->uid; @@ -1211,12 +1216,16 @@ retry: evt->id_priv = (void *) (uintptr_t) resp->uid; if (evt->id_priv->id.ps == RDMA_PS_TCP) ucma_copy_conn_event(evt, &resp->param.conn); - else + else ucma_copy_ud_event(evt, &resp->param.ud); ret = ucma_process_conn_req(evt, resp->id); if (ret) goto retry; + + id_priv = container_of(evt->event.id, struct cma_id_private, id); + if (id_priv->id.ps == RDMA_PS_UDP || id_priv->id.ps == RDMA_PS_IPOIB) + id_priv->qkey = resp->param.ud.qkey; break; case RDMA_CM_EVENT_CONNECT_RESPONSE: evt->id_priv = (void *) (uintptr_t) resp->uid; @@ -1233,7 +1242,8 @@ retry: case RDMA_CM_EVENT_ESTABLISHED: evt->id_priv = (void *) (uintptr_t) resp->uid; evt->event.id = &evt->id_priv->id; - if (evt->id_priv->id.ps == RDMA_PS_UDP) { + id_priv = evt->id_priv; + if (id_priv->id.ps == RDMA_PS_UDP || id_priv->id.ps == RDMA_PS_IPOIB) { ucma_copy_ud_event(evt, &resp->param.ud); break; } Index: librdmacm/examples/udaddy.c === --- librdmacm.orig/examples/udaddy.c2007-01-22 21:19:52.0 +0200 +++ librdmacm/examples/udaddy.c 2007-01-23 15:50:48.0 +0200 @@ -76,6 +76,7 @@ static int message_size = 100; static int message_count = 10; static char *dst_addr; static char *src_addr; +static enum rdma_port_space port_space = RDMA_PS_UDP; static int create_message(struct cmatest_node *node) { @@ -253,7 +254,7 @@ err: return ret; } -static int connect_handler
[openib-general] [RFC/PATCH] rdma/cma: port rdma_cm multicast code to the UDP/IPOIB port space framework
Allow rdma_cm/ipoib multicast interoperability for RDMA_PS_IPOIB IDs. This is implemented by having the rdma cm use the same qkey and multicast gid used by ipoib, whereas for RDMA_PS_UDP IDs the rdma cm uses a qkey of its own and adds a signature byte to the multicast gid.

Signed-off-by: Or Gerlitz <[EMAIL PROTECTED]>

Index: rdma-dev/drivers/infiniband/core/cma.c
===
--- rdma-dev.orig/drivers/infiniband/core/cma.c 2007-01-23 15:56:01.0 +0200
+++ rdma-dev/drivers/infiniband/core/cma.c 2007-01-23 15:56:23.0 +0200
@@ -2473,7 +2473,10 @@ static int cma_join_ib_multicast(struct
 		return ret;
 
 	ip_ib_mc_map(sin->sin_addr.s_addr, mc_map);
-	mc_map[7] = 0x01;	/* Use RDMA CM signature */
+	if (id_priv->id.ps == RDMA_PS_UDP) {
+		rec.qkey = RDMA_UD_QKEY;	/* Use RDMA CM QKEY */
+		mc_map[7] = 0x01;	/* Use RDMA CM signature */
+	}
 	mc_map[8] = ib_addr_get_pkey(dev_addr) >> 8;
 	mc_map[9] = (unsigned char) ib_addr_get_pkey(dev_addr);
 
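To make the signature byte concrete, here is a user-space sketch of the multicast GID that results in each case. It assumes the RFC 4391 IPv4 layout (`ff12:401b:<P_Key>:...:<28-bit group>`) and that `mc_map` carries a 4-byte QPN prefix before the GID, so `mc_map[7]` is byte 3 of the MGID and `mc_map[8..9]` are bytes 4 and 5. `build_ipv4_mgid` is a hypothetical helper, not a kernel function.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MGID_LEN 16

/*
 * Hypothetical helper: build the 16-byte multicast GID for an IPv4 group.
 * ipv4_host is the group address in host byte order. use_cm_signature
 * selects the RDMA CM variant (signature low byte 0x01) instead of the
 * IPoIB one (0x1b).
 */
static void build_ipv4_mgid(uint8_t mgid[MGID_LEN], uint32_t ipv4_host,
			    uint16_t pkey, int use_cm_signature)
{
	assert(mgid != NULL);
	memset(mgid, 0, MGID_LEN);

	mgid[0] = 0xff;				/* multicast prefix */
	mgid[1] = 0x12;				/* flags and link-local scope */
	mgid[2] = 0x40;				/* IPv4 signature, high byte */
	mgid[3] = use_cm_signature ? 0x01 : 0x1b;	/* signature, low byte */
	mgid[4] = (uint8_t) (pkey >> 8);	/* partition key */
	mgid[5] = (uint8_t) (pkey & 0xff);

	/* Only the low 28 bits of the group address are carried. */
	mgid[12] = (ipv4_host >> 24) & 0x0f;
	mgid[13] = (ipv4_host >> 16) & 0xff;
	mgid[14] = (ipv4_host >> 8) & 0xff;
	mgid[15] = ipv4_host & 0xff;
}
```

Under these assumptions an RDMA_PS_IPOIB ID and an ipoib interface joining the same IPv4 group end up with byte-identical MGIDs, while an RDMA_PS_UDP ID differs in byte 3 and therefore lands in a separate group.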
[openib-general] [RFC/PATCH v3] rdma/cma: add RDMA_PS_IPOIB port space
Add to the RDMA CM an IPoIB port space (RDMA_PS_IPOIB) whose semantics are similar to those of RDMA_PS_UDP where RDMA_PS_IPOIB IDs allow for inter operability with IPoIB on some traffic patterns. For RDMA_PS_UDP and RDMA_PS_IPOIB IDs, the qkey is stored in struct rdma_id_private and delivered also in ADDR_RESOLVED and CONNECT_REQUEST events. The user space library learns the qkey from these events and use them when it is called to create UD QP. The IB UD qkey used by RDMA_PS_IPOIB IDs is that of the related ipoib broadcast group where the qkey used by RDMA_PS_UDP IDs is hard defined "rdma cm qkey". Creation of RDMA_PS_IPOIB IDs by proceeses is controlled by the linux kernel capabilities subsystem. Signed-off-by: Or Gerlitz <[EMAIL PROTECTED]> Index: rdma-dev/drivers/infiniband/core/cma.c === --- rdma-dev.orig/drivers/infiniband/core/cma.c 2007-01-21 12:11:16.0 +0200 +++ rdma-dev/drivers/infiniband/core/cma.c 2007-01-23 15:45:52.0 +0200 @@ -71,6 +71,7 @@ static struct workqueue_struct *cma_wq; static DEFINE_IDR(sdp_ps); static DEFINE_IDR(tcp_ps); static DEFINE_IDR(udp_ps); +static DEFINE_IDR(ipoib_ps); struct cma_device { struct list_headlist; @@ -136,6 +137,7 @@ struct rdma_id_private { u32 seq_num; u32 qp_num; u8 srq; + u32 qkey; }; struct cma_multicast { @@ -323,6 +325,10 @@ struct rdma_cm_id *rdma_create_id(rdma_c { struct rdma_id_private *id_priv; + /* XXX - work around this till capabilities work fine for non root users */ + if (ps == RDMA_PS_IPOIB && !capable(CAP_NET_BROADCAST)) + return ERR_PTR(-EACCES); + id_priv = kzalloc(sizeof *id_priv, GFP_KERNEL); if (!id_priv) return ERR_PTR(-ENOMEM); @@ -884,6 +890,31 @@ out: return ret; } +static int cma_set_qkey(struct rdma_id_private *id_priv, struct rdma_cm_event *event) +{ + struct ib_sa_mcmember_rec rec; + struct rdma_dev_addr *dev_addr; + int ret; + + if (id_priv->id.ps == RDMA_PS_IPOIB) { + dev_addr = &id_priv->id.route.addr.dev_addr; + ib_addr_get_mgid(dev_addr, &rec.mgid); + ret = 
ib_sa_get_mcmember_rec(id_priv->id.device, id_priv->id.port_num, +&rec.mgid, &rec); + if (ret) + return -EINVAL; + id_priv->qkey = rec.qkey; + event->param.ud.qkey = rec.qkey; + } + + if (id_priv->id.ps == RDMA_PS_UDP) { + id_priv->qkey = RDMA_UD_QKEY; + event->param.ud.qkey = RDMA_UD_QKEY; + } + + return 0; +} + static struct rdma_id_private *cma_new_conn_id(struct rdma_cm_id *listen_id, struct ib_cm_event *ib_event) { @@ -999,7 +1030,7 @@ static int cma_req_handler(struct ib_cm_ memset(&event, 0, sizeof event); offset = cma_user_data_offset(listen_id->id.ps); event.event = RDMA_CM_EVENT_CONNECT_REQUEST; - if (listen_id->id.ps == RDMA_PS_UDP) { + if (listen_id->id.ps == RDMA_PS_UDP || listen_id->id.ps == RDMA_PS_IPOIB) { conn_id = cma_new_udp_id(&listen_id->id, ib_event); event.param.ud.private_data = ib_event->private_data + offset; event.param.ud.private_data_len = @@ -1020,7 +1051,11 @@ static int cma_req_handler(struct ib_cm_ mutex_unlock(&lock); if (ret) goto release_conn_id; - + + ret = cma_set_qkey(conn_id, &event); + if (ret) + goto release_conn_id; + conn_id->cm_id.ib = cm_id; cm_id->context = conn_id; cm_id->cm_handler = cma_ib_handler; @@ -1600,6 +1635,7 @@ static void addr_handler(int status, str { struct rdma_id_private *id_priv = context; struct rdma_cm_event event; + int ret; memset(&event, 0, sizeof event); atomic_inc(&id_priv->dev_remove); @@ -1627,6 +1663,11 @@ static void addr_handler(int status, str memcpy(&id_priv->id.route.addr.src_addr, src_addr, ip_addr_size(src_addr)); event.event = RDMA_CM_EVENT_ADDR_RESOLVED; + ret = cma_set_qkey(id_priv, &event); + if (ret) { + event.event = RDMA_CM_EVENT_ADDR_ERROR; + event.status = ret; + } } if (id_priv->id.event_handler(&id_priv->id, &event)) { @@ -1822,6 +1863,9 @@ static int cma_get_port(struct rdma_id_p case RDMA_PS_UDP: ps = &udp_ps; break; + case RDMA_PS_IPOIB: +
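The qkey selection done by `cma_set_qkey()` in the patch above can be modelled in user space as follows. This is only a sketch: `select_qkey` and the two example lookups are hypothetical stand-ins, the SA query for the ipoib broadcast group is replaced by a function pointer, and the value returned by `example_lookup_ok` is an arbitrary example, not a claim about any real fabric.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Port space values as quoted in this thread. */
enum port_space { PS_IPOIB = 0x0002, PS_TCP = 0x0106, PS_UDP = 0x0111 };

#define RDMA_UD_QKEY 0x01234567u	/* fixed RDMA CM qkey from the patch */

/* Hypothetical stand-in for the SA lookup of the ipoib broadcast group. */
typedef int (*bcast_qkey_lookup)(uint32_t *qkey);

/* Example lookups, for illustration only. */
static int example_lookup_ok(uint32_t *qkey)
{
	*qkey = 0x0b1bu;	/* arbitrary example value */
	return 0;
}

static int example_lookup_fail(uint32_t *qkey)
{
	(void) qkey;
	return -1;
}

/*
 * Pick the qkey for an ID: the broadcast group's qkey for PS_IPOIB, the
 * fixed RDMA CM qkey for PS_UDP. Other port spaces leave *qkey untouched,
 * mirroring the two if-blocks in cma_set_qkey().
 */
static int select_qkey(enum port_space ps, bcast_qkey_lookup lookup,
		       uint32_t *qkey)
{
	assert(qkey != NULL);

	switch (ps) {
	case PS_IPOIB:
		if (lookup == NULL || lookup(qkey))
			return -EINVAL;
		return 0;
	case PS_UDP:
		*qkey = RDMA_UD_QKEY;
		return 0;
	default:
		return 0;
	}
}
```

The point of routing both event flows (ADDR_RESOLVED and CONNECT_REQUEST) through one such function is that the kernel ID and the event delivered to user space can never disagree about the qkey.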
Re: [openib-general] rdma/cma: use the ipoib broadcast group qkey - linux capabilities
Or Gerlitz wrote:
>> This check prevents applications from using port numbers below 1024
>> unless they possess the net bind service capability. A similar check
>> could just be:
>>
>> if (ps == RDMA_PS_IPOIB && !capable(CAP_NET_BIND_SERVICE))
>>         return -EACCES;
>
> OK, let's see if I got it: your suggestion is that only if the process has
> the net bind service capability it would be able to create RDMA_PS_IPOIB
> IDs. How do processes get possession of this capability?
>
> From talking here, I understand that there are issues with Linux
> capabilities; specifically, capabilities are not passed through
> execve(), see "understanding Linux capabilities brokenness" @
> http://lkml.org/lkml/2005/8/8/248
>
> This means capabilities are practically not usable for "non root processes".

I have now got a pointer to a more recent LKML discussion where a patch was suggested to solve the problem: "patch to make Linux capabilities into something useful (v 0.3.1)" @ http://lkml.org/lkml/2006/9/5/246

This means that unless someone proves that capabilities are not broken, we will allow (e.g. under some module parameter) non-root apps to create RDMA_PS_IPOIB IDs, OK?

Or.
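The policy proposed here (capability check, with an opt-out via a module parameter) could look roughly like the sketch below. It is a user-space model only: `check_create_id`, `struct ps_policy` and `allow_unprivileged_ipoib` are hypothetical names, and `has_capability` stands in for whatever `capable()` would return in the kernel.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Port space values as quoted in this thread. */
enum port_space { PS_IPOIB = 0x0002, PS_TCP = 0x0106, PS_UDP = 0x0111 };

struct ps_policy {
	/* Hypothetical module parameter: let non-root apps use PS_IPOIB. */
	bool allow_unprivileged_ipoib;
};

/*
 * Decide whether an ID may be created in the given port space.
 * Only PS_IPOIB is restricted; everything else is always allowed.
 */
static int check_create_id(const struct ps_policy *policy,
			   enum port_space ps, bool has_capability)
{
	assert(policy != NULL);

	if (ps != PS_IPOIB)
		return 0;
	if (has_capability || policy->allow_unprivileged_ipoib)
		return 0;
	return -EACCES;
}
```

Keeping the decision in one place would make it easy to drop the opt-out later if capabilities become usable for non-root processes.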
Re: [openib-general] [RFC/PATCH v2] rdma/cma: use the ipoib broadcast group qkey
Sean Hefty wrote:
> After more consideration, I think this is the correct approach. I've already
> started working on a patch for this that I should have done by the end of
> the week (hopefully tomorrow).

> This check prevents applications from using port numbers below 1024
> unless they possess the net bind service capability. A similar check
> could just be:
>
> if (ps == RDMA_PS_IPOIB && !capable(CAP_NET_BIND_SERVICE))
>         return -EACCES;

OK, let's see if I got it: your suggestion is that a process would be able to create RDMA_PS_IPOIB IDs only if it has the net bind service capability. How do processes get possession of this capability?

From talking here, I understand that there are issues with Linux capabilities; specifically, capabilities are not passed through execve(), see "understanding Linux capabilities brokenness" @ http://lkml.org/lkml/2005/8/8/248

This means capabilities are practically not usable for "non root processes".

Or.
Re: [openib-general] [RFC/PATCH v2] rdma/cma: use the ipoib broadcast group qkey
On 1/23/07, Sean Hefty <[EMAIL PROTECTED]> wrote:
> Or Gerlitz wrote:
> > Modify the kernel rdma cm use the ipoib broadcast group qkey instead a qkey
> > of its own for its UD IDs/QPs. For RDMA_PS_UDP ID, the qkey is stored in
> > struct rdma_id_private and delivered also in ADDR_RESOLVED and
> > CONNECT_REQUEST events. The user space library learns the qkey from these
> > events and use them when it is called to create UD QP.
>
> Overall, I think this is a reasonable approach. I would just like the framework
> to provide a way to restrict any userspace application from joining an ipoib
> multicast group. What do you think of the idea of creating a new port space
> specific to ipoib, similar to what's provided for SDP?

Basically, I am in favor of this, under the assumption that it will be possible for a non-root user space application to create RDMA_PS_IPOIB IDs and use them as I would have been doing with RDMA_PS_UDP IDs.

> For example, add:
> enum rdma_port_space {
>         RDMA_PS_SDP = 0x0001,
> +       RDMA_PS_IPOIB = 0x0002,
>         RDMA_PS_TCP = 0x0106,
>         RDMA_PS_UDP = 0x0111,
>
> The qkey/MGID would adjust based on the port space, which is specified as part
> of rdma_create_id().

OK

> Use of RDMA_PS_IPOIB could then be restricted using a
> check similar to that used for port assignment (see cma_use_port() -
> capable(CAP_NET_BIND_SERVICE)).

I don't want to lose a day, so if you don't mind, I would ask you for a crash course here. I don't think I fully understand the following lines from cma_use_port():

        sin = (struct sockaddr_in *) &id_priv->id.route.addr.src_addr;
        snum = ntohs(sin->sin_port);
        if (snum < PROT_SOCK && !capable(CAP_NET_BIND_SERVICE))
                return -EACCES;

What would be the equivalent check for RDMA_PS_IPOIB? And would this check be done only at rdma_create_id time?

Or.
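As a reading aid for the `cma_use_port()` lines quoted in the message above: the source port is stored in network byte order, is converted with `ntohs()`, and anything below `PROT_SOCK` counts as a privileged port that needs `CAP_NET_BIND_SERVICE`. A user-space sketch of just that test, assuming `PROT_SOCK` is 1024 as on Linux (`needs_bind_capability` and `PRIV_PORT_LIMIT` are hypothetical names):

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <stddef.h>

#define PRIV_PORT_LIMIT 1024	/* assumed to match PROT_SOCK in the kernel */

/*
 * Return true when binding to the port in *sin would require the
 * net bind service capability. Models only the comparison in the
 * quoted lines; it does not treat a wildcard port specially.
 */
static bool needs_bind_capability(const struct sockaddr_in *sin)
{
	assert(sin != NULL);

	/* sin_port is kept in network byte order. */
	return ntohs(sin->sin_port) < PRIV_PORT_LIMIT;
}
```

For a port space such as RDMA_PS_IPOIB the analogous test would key on the port space rather than on the port number, which is what the `ps == RDMA_PS_IPOIB` check suggested later in the thread does.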
Re: [openib-general] [RFC/PATCH] librdmacm: use the ipoib broadcast group qkey
Sean,

Using the two patches, udaddy works fine except for the packets sent by the passive side, which are filtered out by the active side HCA/QP. This is because the passive side of this test is not really doing RDMA CM UD QP and qkey resolution, but rather uses the immediate data to "exchange" the active side QP number, together with a hard-coded qkey (see below).

I think that in real-life librdmacm apps this sort of design is much less expected, and the passive side would also initiate a qp/qkey/sidr exchange. I need to think about this point a bit to see if my design can be changed slightly to allow for this sort of simplification.

+/*
+ * Global qkey value for all UD QPs and multicast groups created via the
+ * RDMA CM.
+ * XXX FIXME - enhance test to not assume a pre defined qkey
+ */
+#define RDMA_UD_QKEY 0x01234567
+
+static void create_reply_ah(struct cmatest_node *node, struct ibv_wc *wc)
+{
+	node->ah = ibv_create_ah_from_wc(node->pd, wc, node->mem,
+					 node->cma_id->port_num);
+	node->remote_qpn = ntohl(wc->imm_data);
+	node->remote_qkey = RDMA_UD_QKEY;
+}

Or.
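The immediate-data "exchange" mentioned above amounts to the sender putting its 24-bit QP number into the 32-bit immediate field in network byte order, and the receiver undoing that with `ntohl()`. A small sketch of both directions; `qpn_to_imm` and `imm_to_qpn` are hypothetical helper names, and the exact way udaddy fills the field on the sending side is assumed rather than quoted:

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>

#define QPN_MASK 0x00ffffffu	/* QP numbers are 24 bits wide */

/* Sender side: encode the local QP number as immediate data. */
static uint32_t qpn_to_imm(uint32_t qpn)
{
	assert((qpn & ~QPN_MASK) == 0);
	return htonl(qpn);
}

/* Receiver side: recover the peer's QP number from a completion. */
static uint32_t imm_to_qpn(uint32_t imm_data)
{
	return ntohl(imm_data) & QPN_MASK;
}
```

Note that this carries the QP number only. The qkey still has to come from somewhere else, which is why the test falls back to a hard-coded `RDMA_UD_QKEY` and breaks once the active side uses a different qkey.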
[openib-general] [RFC/PATCH] librdmacm: use the ipoib broadcast group qkey
Modify librdmacm use a qkey for its UD IDs/QPs delivered to it by the rdma cm kernel code instead the a hard coded RDMA_UD_QKEY. For RDMA_PS_UDP ID, the qkey is provided by the kernel in ADDR_RESOLVED and CONNECT_REQUEST events and is stored by the library in struct cma_id_private. Later the library use the qkey when it is called to create a UD QP. Signed-off-by: Or Gerlitz <[EMAIL PROTECTED]> Index: librdmacm/src/cma.c === --- librdmacm.orig/src/cma.c2007-01-22 21:21:37.0 +0200 +++ librdmacm/src/cma.c 2007-01-22 21:57:13.0 +0200 @@ -116,6 +116,7 @@ struct cma_id_private { pthread_mutex_t mut; uint32_t handle; struct cma_multicast *mc_list; + uint32_t qkey; }; struct cma_multicast { @@ -687,7 +688,7 @@ static int ucma_init_ud_qp(struct cma_id qp_attr.port_num = id_priv->id.port_num; qp_attr.qp_state = IBV_QPS_INIT; - qp_attr.qkey = RDMA_UD_QKEY; + qp_attr.qkey = id_priv->qkey; ret = ibv_modify_qp(qp, &qp_attr, IBV_QP_STATE | IBV_QP_PKEY_INDEX | IBV_QP_PORT | IBV_QP_QKEY); if (ret) @@ -1169,6 +1170,7 @@ int rdma_get_cm_event(struct rdma_event_ struct ucma_abi_get_event *cmd; struct cma_event *evt; void *msg; + struct cma_id_private *id_priv; int ret, size; ret = cma_dev_cnt ? 
0 : ucma_init(); @@ -1199,6 +1201,9 @@ retry: evt->event.status = ucma_query_route(&evt->id_priv->id); if (evt->event.status) evt->event.event = RDMA_CM_EVENT_ADDR_ERROR; + else if (evt->id_priv->id.ps == RDMA_PS_UDP) { +evt->id_priv->qkey = resp->param.ud.qkey; + } break; case RDMA_CM_EVENT_ROUTE_RESOLVED: evt->id_priv = (void *) (uintptr_t) resp->uid; @@ -1211,12 +1216,16 @@ retry: evt->id_priv = (void *) (uintptr_t) resp->uid; if (evt->id_priv->id.ps == RDMA_PS_TCP) ucma_copy_conn_event(evt, &resp->param.conn); - else + else ucma_copy_ud_event(evt, &resp->param.ud); ret = ucma_process_conn_req(evt, resp->id); if (ret) goto retry; + if (evt->id_priv->id.ps == RDMA_PS_UDP) { + id_priv = container_of(evt->event.id, struct cma_id_private, id); + id_priv->qkey = resp->param.ud.qkey; + } break; case RDMA_CM_EVENT_CONNECT_RESPONSE: evt->id_priv = (void *) (uintptr_t) resp->uid; Index: librdmacm/examples/udaddy.c === --- librdmacm.orig/examples/udaddy.c2007-01-22 21:19:52.0 +0200 +++ librdmacm/examples/udaddy.c 2007-01-22 22:02:07.0 +0200 @@ -415,6 +415,13 @@ static void destroy_nodes(void) free(test.nodes); } +/* + * Global qkey value for all UD QPs and multicast groups created via the + * RDMA CM. + * XXX FIXME - enhance test to not assume a pre defined qkey + */ +#define RDMA_UD_QKEY 0x01234567 + static void create_reply_ah(struct cmatest_node *node, struct ibv_wc *wc) { node->ah = ibv_create_ah_from_wc(node->pd, wc, node->mem, Index: librdmacm/include/rdma/rdma_cma.h === --- librdmacm.orig/include/rdma/rdma_cma.h 2007-01-22 21:56:13.0 +0200 +++ librdmacm/include/rdma/rdma_cma.h 2007-01-22 21:56:32.0 +0200 @@ -65,12 +65,6 @@ enum rdma_port_space { RDMA_PS_UDP = 0x0111, }; -/* - * Global qkey value for all UD QPs and multicast groups created via the - * RDMA CM. 
- */ -#define RDMA_UD_QKEY 0x01234567 - struct ib_addr { union ibv_gid sgid; union ibv_gid dgid; ___ openib-general mailing list openib-general@openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
[openib-general] [RFC/PATCH v2] rdma/cma: use the ipoib broadcast group qkey
Modify the kernel rdma cm use the ipoib broadcast group qkey instead a qkey of its own for its UD IDs/QPs. For RDMA_PS_UDP ID, the qkey is stored in struct rdma_id_private and delivered also in ADDR_RESOLVED and CONNECT_REQUEST events. The user space library learns the qkey from these events and use them when it is called to create UD QP. Signed-off-by: Or Gerlitz <[EMAIL PROTECTED]> Index: rdma-dev/drivers/infiniband/core/cma.c === --- rdma-dev.orig/drivers/infiniband/core/cma.c 2007-01-21 12:11:16.0 +0200 +++ rdma-dev/drivers/infiniband/core/cma.c 2007-01-22 21:52:30.0 +0200 @@ -136,6 +136,7 @@ struct rdma_id_private { u32 seq_num; u32 qp_num; u8 srq; + u32 qkey; }; struct cma_multicast { @@ -884,6 +885,21 @@ out: return ret; } +static int get_broadcast_group_qkey(struct rdma_id_private *id_priv) +{ + struct ib_sa_mcmember_rec rec; + struct rdma_dev_addr *dev_addr = &id_priv->id.route.addr.dev_addr; + int ret; + + ib_addr_get_mgid(dev_addr, &rec.mgid); + ret = ib_sa_get_mcmember_rec(id_priv->id.device, id_priv->id.port_num, +&rec.mgid, &rec); + if (ret) + return -EINVAL; + id_priv->qkey = rec.qkey; + return 0; +} + static struct rdma_id_private *cma_new_conn_id(struct rdma_cm_id *listen_id, struct ib_cm_event *ib_event) { @@ -1020,7 +1036,14 @@ static int cma_req_handler(struct ib_cm_ mutex_unlock(&lock); if (ret) goto release_conn_id; - + + if (conn_id->id.ps == RDMA_PS_UDP) { + ret = get_broadcast_group_qkey(conn_id); + if (ret) + goto release_conn_id; + event.param.ud.qkey = conn_id->qkey; + } + conn_id->cm_id.ib = cm_id; cm_id->context = conn_id; cm_id->cm_handler = cma_ib_handler; @@ -1600,6 +1623,7 @@ static void addr_handler(int status, str { struct rdma_id_private *id_priv = context; struct rdma_cm_event event; + int ret; memset(&event, 0, sizeof event); atomic_inc(&id_priv->dev_remove); @@ -1627,6 +1651,14 @@ static void addr_handler(int status, str memcpy(&id_priv->id.route.addr.src_addr, src_addr, ip_addr_size(src_addr)); event.event = 
RDMA_CM_EVENT_ADDR_RESOLVED; + if (id_priv->id.ps == RDMA_PS_UDP) { + ret = get_broadcast_group_qkey(id_priv); + if (ret) { + event.event = RDMA_CM_EVENT_ADDR_ERROR; + event.status = ret; + } else + event.param.ud.qkey = id_priv->qkey; + } } if (id_priv->id.event_handler(&id_priv->id, &event)) { @@ -1936,7 +1968,9 @@ static int cma_sidr_rep_handler(struct i event.status = ib_event->param.sidr_rep_rcvd.status; break; } - if (rep->qkey != RDMA_UD_QKEY) { + if (rep->qkey != id_priv->qkey) { + printk(KERN_WARNING "qkey mismatch %.8x client qkey %.8x\n", + rep->qkey, id_priv->qkey); event.event = RDMA_CM_EVENT_UNREACHABLE; event.status = -EINVAL; break; @@ -2231,7 +2265,7 @@ static int cma_send_sidr_rep(struct rdma rep.status = status; if (status == IB_SIDR_SUCCESS) { rep.qp_num = id_priv->qp_num; - rep.qkey = RDMA_UD_QKEY; + rep.qkey = id_priv->qkey; } rep.private_data = private_data; rep.private_data_len = private_data_len; Index: rdma-dev/include/rdma/rdma_cm_ib.h === --- rdma-dev.orig/include/rdma/rdma_cm_ib.h 2007-01-18 13:43:37.0 +0200 +++ rdma-dev/include/rdma/rdma_cm_ib.h 2007-01-22 21:59:34.0 +0200 @@ -44,7 +44,4 @@ int rdma_set_ib_paths(struct rdma_cm_id *id, struct ib_sa_path_rec *path_rec, int num_paths); -/* Global qkey for UD QPs and multicast groups. */ -#define RDMA_UD_QKEY 0x01234567 - #endif /* RDMA_CM_IB_H */ ___ openib-general mailing list openib-general@openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
[openib-general] [RFC/PATCH] rdma/cma: use the ipoib broadcast group qkey
Sean,

Please let me know what you think. My intention is to have the group type affect only whether or not to set the rdmacm signature byte on the mgid. As for the qkey, just use the ipoib broadcast group qkey instead of a qkey defined by the rdma cm.

The patch is not complete yet, in the sense that the qkey associated with the rdma cm kernel id should be exported to user space (on the client side in the addr resolve event flow, and on the server side in the conn req event flow) so that librdmacm can set it into the user UD QP at the time rdma_create_qp is called.

Change the kernel rdma cm to use the ipoib broadcast group qkey instead of a qkey of its own.

Signed-off-by: Or Gerlitz <[EMAIL PROTECTED]>

Index: rdma-dev/drivers/infiniband/core/cma.c
===
--- rdma-dev.orig/drivers/infiniband/core/cma.c 2007-01-21 12:11:16.0 +0200
+++ rdma-dev/drivers/infiniband/core/cma.c 2007-01-22 14:05:22.0 +0200
@@ -136,6 +136,7 @@ struct rdma_id_private {
 	u32 seq_num;
 	u32 qp_num;
 	u8 srq;
+	u32 qkey;
 };
 
 struct cma_multicast {
@@ -884,6 +885,21 @@ out:
 	return ret;
 }
 
+static int get_broadcast_group_qkey(struct rdma_id_private *id_priv)
+{
+	struct ib_sa_mcmember_rec rec;
+	struct rdma_dev_addr *dev_addr = &id_priv->id.route.addr.dev_addr;
+	int ret;
+
+	ib_addr_get_mgid(dev_addr, &rec.mgid);
+	ret = ib_sa_get_mcmember_rec(id_priv->id.device, id_priv->id.port_num,
+				     &rec.mgid, &rec);
+	if (ret)
+		return -EINVAL;
+	id_priv->qkey = rec.qkey;
+	return 0;
+}
+
 static struct rdma_id_private *cma_new_conn_id(struct rdma_cm_id *listen_id,
 					       struct ib_cm_event *ib_event)
 {
@@ -1021,6 +1037,10 @@ static int cma_req_handler(struct ib_cm_
 	if (ret)
 		goto release_conn_id;
 
+	ret = get_broadcast_group_qkey(conn_id);
+	if (ret)
+		goto release_conn_id;
+
 	conn_id->cm_id.ib = cm_id;
 	cm_id->context = conn_id;
 	cm_id->cm_handler = cma_ib_handler;
@@ -1600,6 +1620,7 @@ static void addr_handler(int status, str
 {
 	struct rdma_id_private *id_priv = context;
 	struct rdma_cm_event event;
+	int ret;
 
 	memset(&event, 0, sizeof event);
 	atomic_inc(&id_priv->dev_remove);
@@ -1626,6 +1647,11 @@ static void addr_handler(int status, str
 	} else {
 		memcpy(&id_priv->id.route.addr.src_addr, src_addr,
 		       ip_addr_size(src_addr));
+		ret = get_broadcast_group_qkey(id_priv);
+		if (ret) {
+			event.event = RDMA_CM_EVENT_ADDR_ERROR;
+			event.status = ret;
+		}
 		event.event = RDMA_CM_EVENT_ADDR_RESOLVED;
 	}
 
@@ -1936,7 +1962,9 @@ static int cma_sidr_rep_handler(struct i
 			event.status = ib_event->param.sidr_rep_rcvd.status;
 			break;
 		}
-		if (rep->qkey != RDMA_UD_QKEY) {
+		if (rep->qkey != id_priv->qkey) {
+			printk(KERN_WARNING "qkey mismatch %.8x client qkey %.8x\n",
+			       rep->qkey, id_priv->qkey);
 			event.event = RDMA_CM_EVENT_UNREACHABLE;
 			event.status = -EINVAL;
 			break;
@@ -2231,7 +2259,7 @@ static int cma_send_sidr_rep(struct rdma
 	rep.status = status;
 	if (status == IB_SIDR_SUCCESS) {
 		rep.qp_num = id_priv->qp_num;
-		rep.qkey = RDMA_UD_QKEY;
+		rep.qkey = id_priv->qkey;
 	}
 	rep.private_data = private_data;
 	rep.private_data_len = private_data_len;
[openib-general] [PATCH] rdma/cma: remove per multicast group qkey usage
Sean,

Please see the cleanup below. Also, I see now that librdmacm has two functions to init a QP: ucma_init_ud_qp for UD QPs and ucma_init_ib_qp for RC QPs, where the rdmacm kernel code only has ucma_init_ib_qp. I guess something is missing here (is it only setting the QKEY into the UD QP, or also modifying it to RTR and RTS?). Let me know and I can send a patch.

A cleanup of the RDMA CM UD code: remove the per-group qkey usage in the join flow, as this is impossible to achieve in practice with the same UD QP attached to multiple groups.

Signed-off-by: Or Gerlitz <[EMAIL PROTECTED]>

Index: rdma-dev/drivers/infiniband/core/cma.c
===
--- rdma-dev.orig/drivers/infiniband/core/cma.c 2007-01-21 12:08:06.0 +0200
+++ rdma-dev/drivers/infiniband/core/cma.c 2007-01-21 12:11:16.0 +0200
@@ -2434,7 +2434,6 @@ static int cma_join_ib_multicast(struct
 	ib_addr_get_sgid(dev_addr, &rec.port_gid);
 	rec.pkey = cpu_to_be16(ib_addr_get_pkey(dev_addr));
 	rec.join_state = 1;
-	rec.qkey = sin->sin_addr.s_addr;
 
 	comp_mask = IB_SA_MCMEMBER_REC_MGID | IB_SA_MCMEMBER_REC_PORT_GID |
 		    IB_SA_MCMEMBER_REC_PKEY | IB_SA_MCMEMBER_REC_JOIN_STATE |
Re: [openib-general] failure to use libibverbs clone
I have configured libmthca against the install of libibverbs (/usr/local/rdmacm), the configure output is below. Or. dill:/usr/src/libmthca # ./configure --prefix=/usr/local/rdmacm CFLAGS=-I/usr/local/rdmacm/include \ LDFLAGS=-L/usr/local/rdmacm/lib LD_LIBRARY_PATH=/usr/local/rdmacm/lib checking for a BSD-compatible install... /usr/bin/install -c checking whether build environment is sane... yes checking for gawk... gawk checking whether make sets $(MAKE)... yes checking build system type... x86_64-suse-linux checking host system type... x86_64-suse-linux checking for style of include used by make... GNU checking for gcc... gcc checking for C compiler default output file name... a.out checking whether the C compiler works... yes checking whether we are cross compiling... no checking for suffix of executables... checking for suffix of object files... o checking whether we are using the GNU C compiler... yes checking whether gcc accepts -g... yes checking for gcc option to accept ANSI C... none needed checking dependency style of gcc... gcc3 checking for a sed that does not truncate output... /usr/bin/sed checking for egrep... grep -E checking for ld used by gcc... /usr/x86_64-suse-linux/bin/ld checking if the linker (/usr/x86_64-suse-linux/bin/ld) is GNU ld... yes checking for /usr/x86_64-suse-linux/bin/ld option to reload object files... -r checking for BSD-compatible nm... /usr/bin/nm -B checking whether ln -s works... yes checking how to recognise dependent libraries... pass_all checking how to run the C preprocessor... gcc -E checking for ANSI C header files... yes checking for sys/types.h... yes checking for sys/stat.h... yes checking for stdlib.h... yes checking for string.h... yes checking for memory.h... yes checking for strings.h... yes checking for inttypes.h... yes checking for stdint.h... yes checking for unistd.h... yes checking dlfcn.h usability... yes checking dlfcn.h presence... yes checking for dlfcn.h... yes checking for g++... 
g++ checking whether we are using the GNU C++ compiler... yes checking whether g++ accepts -g... yes checking dependency style of g++... gcc3 checking how to run the C++ preprocessor... g++ -E checking for g77... g77 checking whether we are using the GNU Fortran 77 compiler... yes checking whether g77 accepts -g... yes checking the maximum length of command line arguments... 32768 checking command to parse /usr/bin/nm -B output from gcc object... ok checking for objdir... .libs checking for ar... ar checking for ranlib... ranlib checking for strip... strip checking if gcc static flag works... yes checking if gcc supports -fno-rtti -fno-exceptions... no checking for gcc option to produce PIC... -fPIC checking if gcc PIC flag -fPIC works... yes checking if gcc supports -c -o file.o... yes checking whether the gcc linker (/usr/x86_64-suse-linux/bin/ld -m elf_x86_64) supports shared libraries... yes checking whether -lc should be explicitly linked in... no checking dynamic linker characteristics... GNU/Linux ld.so checking how to hardcode library paths into programs... immediate checking whether stripping libraries is possible... yes checking if libtool supports shared libraries... yes checking whether to build shared libraries... yes checking whether to build static libraries... yes configure: creating libtool appending configuration tag "CXX" to libtool checking for ld used by g++... /usr/x86_64-suse-linux/bin/ld -m elf_x86_64 checking if the linker (/usr/x86_64-suse-linux/bin/ld -m elf_x86_64) is GNU ld... yes checking whether the g++ linker (/usr/x86_64-suse-linux/bin/ld -m elf_x86_64) supports shared libraries... yes checking for g++ option to produce PIC... -fPIC checking if g++ PIC flag -fPIC works... yes checking if g++ supports -c -o file.o... yes checking whether the g++ linker (/usr/x86_64-suse-linux/bin/ld -m elf_x86_64) supports shared libraries... yes checking dynamic linker characteristics... 
GNU/Linux ld.so checking how to hardcode library paths into programs... immediate checking whether stripping libraries is possible... yes appending configuration tag "F77" to libtool checking if libtool supports shared libraries... yes checking whether to build shared libraries... yes checking whether to build static libraries... yes checking for g77 option to produce PIC... -fPIC checking if g77 PIC flag -fPIC works... yes checking if g77 supports -c -o file.o... yes checking whether the g77 linker (/usr/x86_64-suse-linux/bin/ld -m elf_x86_64) supports shared libraries... yes checking dynamic linker characteristics... GNU/Linux ld.so checking how to hardcode library paths into programs... immediate checking whether stripping libraries is possible... yes checking for gcc... (cached) gcc checking whether we are using the GNU C compiler... (cached) yes checking whether gcc accepts -g... (cached) yes checking for gcc option to accept ANSI C... (cached) none needed checking dependency style of gcc... (cached) gcc3 checking for ibv_get_device_list in -libverbs... ye
Re: [openib-general] failure to use libibverbs clone
install libmthca does not seem to create the etc directory Or. dill:/usr/src/libmthca # make install make[1]: Entering directory `/usr/src/libmthca' test -z "/usr/local/rdmacm/lib" || mkdir -p -- . "/usr/local/rdmacm/lib" test -z "" || mkdir -p -- . "" test -z "/usr/local/rdmacm/lib/infiniband" || mkdir -p -- . "/usr/local/rdmacm/lib/infiniband" /bin/sh ./libtool --mode=install /usr/bin/install -c 'src/mthca.la' '/usr/local/rdmacm/lib/infiniband/mthca.la' /usr/bin/install -c src/.libs/mthca.so /usr/local/rdmacm/lib/infiniband/mthca.so /usr/bin/install -c src/.libs/mthca.lai /usr/local/rdmacm/lib/infiniband/mthca.la /usr/bin/install -c src/.libs/mthca.a /usr/local/rdmacm/lib/infiniband/mthca.a ranlib /usr/local/rdmacm/lib/infiniband/mthca.a chmod 644 /usr/local/rdmacm/lib/infiniband/mthca.a PATH="$PATH:/sbin" ldconfig -n /usr/local/rdmacm/lib/infiniband -- Libraries have been installed in: /usr/local/rdmacm/lib/infiniband If you ever happen to want to link against installed libraries in a given directory, LIBDIR, you must either use libtool, and specify the full pathname of the library, or use the `-LLIBDIR' flag during linking and do at least one of the following: - add LIBDIR to the `LD_LIBRARY_PATH' environment variable during execution - add LIBDIR to the `LD_RUN_PATH' environment variable during linking - use the `-Wl,--rpath -Wl,LIBDIR' linker flag - have your system administrator add LIBDIR to `/etc/ld.so.conf' See any operating system documentation about shared libraries for more information, such as the ld(1) and ld.so(8) manual pages. -- make[1]: Leaving directory `/usr/src/libmthca' ___ openib-general mailing list openib-general@openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
Re: [openib-general] OFED ipoib_8111_to_2_6_16.patch
Michael S. Tsirkin wrote:
>> OK, thanks for the info. The context here is the bonding support. We had an issue with distro (eg RH4 U3, SLES10) kernels that was not reproduced with upstream kernels and it seems to be related to the change you have pushed to 2.6.17. I will let you know if we need more clarifications.
> Was the issue triggered at ipoib module unload?

No, it's an issue related to the bonding design and the two-layer nature of the ipoib neighbouring scheme: struct neighbour "pointing" to struct ipoib_neigh etc. We are still investigating it and hope to know more by tomorrow.

Or.
Re: [openib-general] failure to use libibverbs clone
Michael S. Tsirkin wrote:
>> libibverbs: Warning: couldn't open config directory '/usr/local/rdmacm/etc/libibverbs.d'.
> Well, do you have /usr/local/rdmacm/etc/libibverbs.d?

No, who should create it? Running $ make install under libibverbs does not do it. I manually created an empty directory and then this warning went away, but the other one and the failure to find devices stayed.

Or.
[openib-general] failure to use libibverbs clone
Roland, Using a fresh clone of libibverbs, libmthca and a kernel based on 2.6.20-rc3 (clone of Sean's rdma-dev git tree at open fabrics) I am getting errors such as # LD_LIBRARY_PATH=/usr/local/rdmacm/lib /usr/local/rdmacm/bin/ibv_devinfo libibverbs: Warning: couldn't open config directory '/usr/local/rdmacm/etc/libibverbs.d'. libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0 No IB devices found the strace traces follow, the system is very much operative (eg with IPoIB) Or. execve("/usr/local/rdmacm/bin/ibv_devinfo", ["/usr/local/rdmacm/bin/ibv_devinfo"], [/* 70 vars */]) = 0 uname({sys="Linux", node="dill", ...}) = 0 brk(0) = 0x503000 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2aeba2f6f000 open("/etc/ld.so.preload", O_RDONLY)= -1 ENOENT (No such file or directory) open("/usr/local/rdmacm/lib/tls/x86_64/libibverbs.so.2", O_RDONLY) = -1 ENOENT (No such file or directory) stat("/usr/local/rdmacm/lib/tls/x86_64", 0x7fff07b4e0f0) = -1 ENOENT (No such file or directory) open("/usr/local/rdmacm/lib/tls/libibverbs.so.2", O_RDONLY) = -1 ENOENT (No such file or directory) stat("/usr/local/rdmacm/lib/tls", 0x7fff07b4e0f0) = -1 ENOENT (No such file or directory) open("/usr/local/rdmacm/lib/x86_64/libibverbs.so.2", O_RDONLY) = -1 ENOENT (No such file or directory) stat("/usr/local/rdmacm/lib/x86_64", 0x7fff07b4e0f0) = -1 ENOENT (No such file or directory) open("/usr/local/rdmacm/lib/libibverbs.so.2", O_RDONLY) = 3 read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\320\'\0"..., 640) = 640 fstat(3, {st_mode=S_IFREG|0755, st_size=164431, ...}) = 0 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2aeba2f7 mmap(NULL, 1085352, PROT_READ|PROT_EXEC, MAP_PRIVATE, 3, 0) = 0x2aeba3071000 madvise(0x2aeba3071000, 1085352, MADV_SEQUENTIAL|0x1) = 0 mprotect(0x2aeba3079000, 1052584, PROT_NONE) = 0 mmap(0x2aeba3171000, 36864, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 3, 0) 
= 0x2aeba3171000 close(3)= 0 open("/usr/local/rdmacm/lib/libpthread.so.0", O_RDONLY) = -1 ENOENT (No such file or directory) open("/usr/local/rdmacm/lib/libpthread.so.0", O_RDONLY) = -1 ENOENT (No such file or directory) open("/etc/ld.so.cache", O_RDONLY) = 3 fstat(3, {st_mode=S_IFREG|0644, st_size=130091, ...}) = 0 mmap(NULL, 130091, PROT_READ, MAP_PRIVATE, 3, 0) = 0x2aeba317a000 close(3)= 0 open("/lib64/tls/libpthread.so.0", O_RDONLY) = 3 read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\340X\0\0"..., 640) = 640 fstat(3, {st_mode=S_IFREG|0755, st_size=99188, ...}) = 0 mmap(NULL, 1129880, PROT_READ|PROT_EXEC, MAP_PRIVATE, 3, 0) = 0x2aeba319a000 madvise(0x2aeba319a000, 1129880, MADV_SEQUENTIAL|0x1) = 0 mprotect(0x2aeba31a8000, 1072536, PROT_NONE) = 0 mmap(0x2aeba329a000, 65536, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 3, 0) = 0x2aeba329a000 mmap(0x2aeba32aa000, 15768, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x2aeba32aa000 close(3)= 0 open("/usr/local/rdmacm/lib/libdl.so.2", O_RDONLY) = -1 ENOENT (No such file or directory) open("/usr/local/rdmacm/lib/libdl.so.2", O_RDONLY) = -1 ENOENT (No such file or directory) open("/lib64/libdl.so.2", O_RDONLY) = 3 read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\320\36\0"..., 640) = 640 fstat(3, {st_mode=S_IFREG|0755, st_size=16807, ...}) = 0 mmap(NULL, 1058904, PROT_READ|PROT_EXEC, MAP_PRIVATE, 3, 0) = 0x2aeba32ae000 madvise(0x2aeba32ae000, 1058904, MADV_SEQUENTIAL|0x1) = 0 mprotect(0x2aeba32b1000, 1046616, PROT_NONE) = 0 mmap(0x2aeba33ae000, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 3, 0) = 0x2aeba33ae000 close(3)= 0 open("/usr/local/rdmacm/lib/libc.so.6", O_RDONLY) = -1 ENOENT (No such file or directory) open("/usr/local/rdmacm/lib/libc.so.6", O_RDONLY) = -1 ENOENT (No such file or directory) open("/lib64/tls/libc.so.6", O_RDONLY) = 3 read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0p\313\1\0"..., 640) = 640 lseek(3, 624, SEEK_SET) = 624 read(3, 
"\4\0\0\0\20\0\0\0\1\0\0\0GNU\0\0\0\0\0\2\0\0\0\6\0\0\0"..., 32) = 32 fstat(3, {st_mode=S_IFREG|0755, st_size=1401317, ...}) = 0 mmap(NULL, 2235432, PROT_READ|PROT_EXEC, MAP_PRIVATE, 3, 0) = 0x2aeba33b1000 madvise(0x2aeba33b1000, 2235432, MADV_SEQUENTIAL|0x1) = 0 mprotect(0x2aeba34b7000, 1162280, PROT_NONE) = 0 mmap(0x2aeba35b1000, 122880, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 3, 0x10) = 0x2aeba35b1000 mmap(0x2aeba35cf000, 15400, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x2aeba35cf000 close(3)= 0 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2aeba35d3000 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2aeba35d4000 arch_prctl(0x
Re: [openib-general] multicast code/merge status
> This is fine, but it may change when the user needs to make this choice. E.g. when creating the QP, versus joining the multicast group, in order to support the valid options. The selection also needs to be conveyed to the kernel somehow. At this point, maybe we just need to start looking at specific implementations.

Indeed. I will send a patch early next week.

Or.
Re: [openib-general] OFED ipoib_8111_to_2_6_16.patch
Michael S. Tsirkin wrote:
>> However, since understanding this patch in detail is important to a peer member individual/company of the community (myself/Voltaire)
> I really would like to help. What is it that you want to know?
> Here's an explanation from an older mail. Does this help?
>
> Work around for neighbour destructor issue for kernels < 2.6.17:
> keep a global list of all ipoib neighbours. Use it in destructor to
> 1. Verify that this neighbour belongs to an ipoib device
> 2. Check that the neighbour is the last one to use the destructor,
>    if so reset the destructor pointer

OK, thanks for the info. The context here is the bonding support. We had an issue with distro (eg RH4 U3, SLES10) kernels that was not reproduced with upstream kernels and it seems to be related to the change you have pushed to 2.6.17. I will let you know if we need more clarifications.

Or.
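To make the two destructor steps above concrete, here is a minimal user-space sketch in C. It is not the OFED backport patch: `struct sketch_neigh`, `sketch_list` and `sketch_hook` are hypothetical stand-ins for the kernel's neighbour entries, the global ipoib neighbour list and the shared destructor pointer, and locking is left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a neighbour entry; not the kernel's struct. */
struct sketch_neigh {
    struct sketch_neigh *next;   /* link in the global list */
};

/* Global list of all tracked (ipoib-owned) neighbours. */
static struct sketch_neigh *sketch_list;

/* Models the shared destructor pointer that must be reset at the end. */
static void (*sketch_hook)(struct sketch_neigh *);

static void sketch_destructor(struct sketch_neigh *n);

/* Register a neighbour as ipoib-owned and install the destructor hook. */
static void sketch_track(struct sketch_neigh *n)
{
    assert(n != NULL);
    n->next = sketch_list;
    sketch_list = n;
    sketch_hook = sketch_destructor;
}

/* Unlink n from the global list; returns false if n was never tracked. */
static bool sketch_untrack(struct sketch_neigh *n)
{
    struct sketch_neigh **pp;

    for (pp = &sketch_list; *pp != NULL; pp = &(*pp)->next) {
        if (*pp == n) {
            *pp = n->next;
            n->next = NULL;
            return true;
        }
    }
    return false;
}

static void sketch_destructor(struct sketch_neigh *n)
{
    /* Step 1: verify that this neighbour belongs to us. */
    if (!sketch_untrack(n))
        return;

    /* Step 2: if it was the last user, reset the destructor pointer. */
    if (sketch_list == NULL)
        sketch_hook = NULL;
}
```

In this model a neighbour that was never tracked is ignored by the destructor, and the hook is cleared only when the list becomes empty.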
Re: [openib-general] [openfabrics-ewg] Minutes for January 15, 2007 teleconference about OFED 1.2 development progress toward code freeze
Michael S. Tsirkin wrote:
> Sounds too risky to me, this is technology preview code so I want to have all this stuff off by default but easily enabled by users who want to demo.

I really don't want us to go again through things like yours (MST, Jack) vs. Sean rdma_establish, ucma versions etc.

Like it or not, as was defined by the founders, OFED is --not-- a framework for development, and unless there is a very specific reason (*) its kernel/user content should be based on code that has --passed through this component maintainer--. As has been said over this list, let's not treat OFED as a framework to shovel in unreviewed code.

If you feel that your mthca and rdmacm QoS changes should be under CONFIG_EXPERIMENTAL, for-mm etc., specify this when you send the patches for review.

Bottom line, let's not hide behind obscure definitions like "technology preview" to escape from normal processes where there -is- an alternative. The point here is not to meet the code freeze deadline by avoiding normal processes; let's use processes and extend the deadline for the QoS merge if needed.

(*) So far, the only case where people felt it makes sense to merge out-of-tree code was the local-sa, and it is done by this component maintainer.

> After I post the rest of the code, if you like you'll be able to post an iser patch to add this stuff to iser as well.

This is irrelevant till we resolve the process.
Re: [openib-general] [openfabrics-ewg] Minutes for January 15, 2007 teleconference about OFED 1.2 development progress toward code freeze
On 1/17/07, Michael S. Tsirkin <[EMAIL PROTECTED]> wrote:
> > Quoting Or Gerlitz <[EMAIL PROTECTED]>:
> > I understand that the change involves letting the rdma cm know the SID when the consumer calls --rdma_resolve_route-- where today it get to know the SID when the consumer calls --rdma_connect-- . So this is not an internal RDMA CM change but rather also changes the API.
> > Same for SRP as the api of ib_sa_path_rec_get (that is the structure it gets as input) changes, the SRP code also changes.
> > Anyway, can you send the mthca and rdmacm/rdmacm-consumers changes as RFC/PATCH over the list before the actual code freeze???
> I didn't start on this code yet, but it does not look like a huge project, I hope to post code by next week.
> To avoid major disruptions all over the stack, my preference for OFED 1.2 would be to add new API calls and a module option (off by default) for cma/srp to use them.

The rdmacm API change is not such a big deal, and if you want to change it only for the kernel portion for OFED 1.2, that makes sense to me. I really don't think --adding-- a special API is the way to go. Do it in an "end in mind" fashion: work on a patch, send it to the rdmacm maintainer/list for RFC and so on.

> For OFED 1.2, I only planned to implement this for SDP and SRP. I do not expect all this to be mergeable in 2.6.21 time frame, so maybe that's enough.

SDP is coded over the RDMA CM and, as I say above, my suggestion is not to add a special API, so just do the same QoS patching you do to SDP to iSER etc.

Or.
Re: [openib-general] multicast code/merge status
On 1/17/07, Sean Hefty <[EMAIL PROTECTED]> wrote:
> > +1 used only for unicast
> > +2 used only for multicast
> > +3 used for both unicast and multicast
> If you view this as the use case for one side only, we also have option 3 communicating with options 1 and 2. I would list these as:

OK

> +4 unicast QP to unicast and multicast QP

I think you mean 3 <--> 1, that is, unicast and multicast QP to unicast QP.

> +5 multicast QP to unicast and multicast QP

I think you mean 3 <--> 2, that is, unicast and multicast QP to multicast QP.

> Today, all of these work. What you're wanting to add is the ability to communicate with an ipoib multicast group. I'd like to do this without breaking any of the existing communications, or treat ipoib separately for security reasons.

Makes sense, so my suggestion is "leave this (using the ipoib qkey) to the user". If you prefer to have two group types, rdmacm and ipoib, that's fine. We would use ipoib type groups, and in the environments where setting the qkey to be the ipoib one would not break our communication (that is, where we do need to interop with IPoIB) we would do it; otherwise we would do nothing.

> > To make things simple, the solution i suggest is that that the RDMA CM would --not-- do this modify QP/QKEY (that is would set the 0x1234567 qkey on the modify qp to init) and rather leave it to the RDMA CM consumer --if-- they wish to do so. However it will use the ipv4 broadcast group qkey for doing mcast joins and report this qkey to the user in the ud param of the event.
> We need to be able to handle options 4 and 5 as well.

Indeed, I have addressed that above.

> > this (what qkey is assigned to the ipv4 broadcast group by different SAs) is orthogonal to the discussion we do here.
> This depends on whether verbs allows, or if it should allow, a user to specify a controlled qkey when configuring their QP.

I don't think there is any limitation today in the verbs layer. Actually, for our testing so far we patched the rdmacm to not set the sig byte and to use the ipoib one (i.e. not override it in core/cma.c), and we manage to interop fine with ipoib.
Re: [openib-general] multicast code/merge status
On 1/17/07, Michael S. Tsirkin <[EMAIL PROTECTED]> wrote:
> > not following you here, how does qkey relates to RC QPs ?
> Currently you can block userspace from creating QPs by unloading uverbs module. Maybe we should make it possible to block creating UD QPs from userspace as a separate security measure.

I don't think this is a valid option for most IB production environments, but if you want to add blocking of UD QP creation to ib_uverbs as a module parameter whose default value is --unset--, I don't really care.

Or.
Re: [openib-general] OFED ipoib_8111_to_2_6_16.patch
Or Gerlitz wrote:
> Michael S. Tsirkin wrote:
>>>> git log -Sneigh_destructor -- include/net/neighbour.h
>>> also, having that at (my) hand does not remove the need that you will set a changelog/signature for the OFED ipoib related backport patch.
>> Feel free to add that.
> Unless i miss something, we want all OFED kernel patches to meet **basic** kernel working conventions, specifically that for each patch there is a change log and an owner.

OK, I realize now that in OFED 1.1, out of 438 .patch files under kernel_patches, only 103 have a Signed-off-by line. Assuming this maps 1:1 to the files that have a change log, I am not asking you to write 335 change-log/signed-off-by sections now.

However, since understanding this patch in detail is important to a peer member individual/company of the community (myself/Voltaire), and since you are this patch's owner and also hold the OFED kernel patches maintainer chair, it makes sense that per our request you put 5 minutes of your time into writing a change log.

Or.
Re: [openib-general] multicast code/merge status
Michael S. Tsirkin wrote:
>> Quoting Sean Hefty <[EMAIL PROTECTED]>:
>> Subject: Re: multicast code/merge status
>>
>>> sure, it can use the rdmacm qkey (0x1234567 etc) when it creates the QP and later --if-- the user joins a multicast group modify the qp state with the group qkey and report it in the cma event such that the consumer of the rdmacm would set this into his IB UD TX WR
>> Changing the qkey would break its existing UD communication.
>>
>>> Bottom line, Looking in the IB SPEC and IPoIB RFC i did not see mentioning of privileged QKEY.
>> From RFC 4391 (ipoib RFC), 4.1:
>>
>> 2. Q_Key
>>
>>    It is RECOMMENDED that a controlled Q_Key be used with the
>>    high-order bit set. This is to prevent non-privileged
>>    software from fabricating and sending out bogus IP datagrams.
>
> BTW, should we be worried that proposed extension (passing qkey in rdma cm param list) seems to expose this qkey to non-privileged software?

As was said over related threads here and elsewhere, multicast is unsafe by nature, and having IB implement broadcast over multicast adds more insecurity to the party. Specifically, as Roland has commented, a user can attach his user space UD QP to the MGID of the ipv4 broadcast group (if ipoib is running on this node it will join the group) and start making this IP subnet go crazy.

We only want interop with IPoIB, and we don't need to join/attach the ipv4 broadcast group. We just need an option for the rdmacm to use its qkey for joins; later either the rdmacm or the consumer will also set this qkey into the QP and the UD TX WR.

> Maybe a mechanism should be in place to control access to this separately from regular rdma cm for RC QPs?

Not following you here, how does the qkey relate to RC QPs?

Or.
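The RFC text quoted above turns on whether the Q_Key has its high-order bit set. As a minimal sketch, assuming a 32-bit Q_Key and using hypothetical names (`SKETCH_CONTROLLED_QKEY_BIT`, `sketch_qkey_is_controlled`) that are not taken from any verbs header, the check is a single mask:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* High-order bit of a 32-bit Q_Key; name is illustrative only. */
#define SKETCH_CONTROLLED_QKEY_BIT 0x80000000u

/* Returns true when the Q_Key has its high-order bit set. */
static bool sketch_qkey_is_controlled(uint32_t qkey)
{
    return (qkey & SKETCH_CONTROLLED_QKEY_BIT) != 0;
}
```

With this helper, the rdmacm qkey mentioned in the thread (0x1234567) is not a controlled Q_Key, while any value with the top bit set is.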
Re: [openib-general] [openfabrics-ewg] Minutes for January 15, 2007 teleconference about OFED 1.2 development progress toward code freeze
Tziporet Koren wrote:
> Or Gerlitz wrote:
>> Tziporet Koren wrote:
>>
>> The bonding package would support: fresh (2.6.20) and some older upstream kernels along with SLES10 and RH4 Ux (x=3 for sure)
>
> OK - please send us all the info once its ready
>
>>> General changes to the package:
>>> * Multicast - we wait for Voltaire and Sean to close all technical details - should be ready by the end of the week
>>
>> I have just sent Sean over the list a clarification email, if needed we would be able to help doing the missing patches and i guess in a combined effort this would be ready for the end of --next-- week
>
> Thanks - please work with MST & Vlad on integration
>
>> what about the host side QoS code? i did not see an newer RFC nor patch other then the RFC that was sent many months ago.
> We are going to update our low level driver (mthca) to support it. Beside there should be a small change in CMA for this, and its specified in the RFC.

I understand that the change involves letting the RDMA CM know the SID when the consumer calls --rdma_resolve_route--, where today it gets to know the SID when the consumer calls --rdma_connect--. So this is not an internal RDMA CM change; it also changes the API. The same goes for SRP: as the API of ib_sa_path_rec_get (that is, the structure it gets as input) changes, the SRP code also changes.

Anyway, can you send the mthca and rdmacm/rdmacm-consumers changes as RFC/PATCH over the list before the actual code freeze?

As for the QoS RFC (http://openib.org/pipermail/openib-general/2006-May/022331.html) sent by Eitan, one design issue I see there is how to deal with IB ULPs which do --not-- have a well-known SID. They call ib_cm_listen with IB_CM_ASSIGN_SERVICE_ID and get from the CM a service ID to use; then they might do some out-of-band exchange of this SID before starting their connection establishment.

From include/rdma/ib_cm.h:
> * @service_id: Service identifier matched against incoming connection
> * and service ID resolution requests. The service ID should be specified
> * network-byte order. If set to IB_CM_ASSIGN_SERVICE_ID, the CM will
> * assign a service ID to the caller.

Typically this happens with MPI, to the extent that different ranks within the same job may get a different SID.

One solution I was thinking of is to:
+1 define a --range-- (e.g. big enough to serve 1024 CM consumers) per ULP
+2 change the CM to support allocating a SID in a range
+3 change the ULPs which use IB_CM_ASSIGN_SERVICE_ID to ask for a SID in the relevant range
+4 change the QoS manager at the SM side to support ranges

Or.
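The range idea in steps +1 to +4 can be sketched as a small per-ULP allocator. This is only an illustration under stated assumptions: `struct sketch_sid_range`, its bitmap layout and the base value used in any caller are hypothetical, and nothing here reflects how the IB CM actually assigns service IDs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Range size taken from the "1024 CM consumers" figure in the proposal. */
#define SKETCH_SIDS_PER_ULP 1024u

/* Hypothetical per-ULP range: a base service ID plus a usage bitmap. */
struct sketch_sid_range {
    uint64_t base;
    uint8_t used[SKETCH_SIDS_PER_ULP / 8];
};

/* True when sid falls inside the ULP's range (what a QoS manager would match on). */
static bool sketch_sid_in_range(const struct sketch_sid_range *r, uint64_t sid)
{
    return sid >= r->base && sid - r->base < SKETCH_SIDS_PER_ULP;
}

/* Hand out the lowest free service ID in the range; false when exhausted. */
static bool sketch_sid_alloc(struct sketch_sid_range *r, uint64_t *sid)
{
    for (uint32_t i = 0; i < SKETCH_SIDS_PER_ULP; i++) {
        uint8_t bit = (uint8_t)(1u << (i % 8));

        if (!(r->used[i / 8] & bit)) {
            r->used[i / 8] |= bit;
            *sid = r->base + i;
            return true;
        }
    }
    return false;
}

/* Return a previously allocated service ID to the range. */
static void sketch_sid_free(struct sketch_sid_range *r, uint64_t sid)
{
    assert(sketch_sid_in_range(r, sid));
    uint32_t i = (uint32_t)(sid - r->base);

    r->used[i / 8] &= (uint8_t)~(1u << (i % 8));
}
```

The point of the design is that a policy entity only needs to know the base and size of each ULP's range, not the individual IDs handed out at run time.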
Re: [openib-general] multicast code/merge status
Sean Hefty wrote:
>> sure, it can use the rdmacm qkey (0x1234567 etc) when it creates the QP and later --if-- the user joins a multicast group modify the qp state with the group qkey and report it in the cma event such that the consumer of the rdmacm would set this into his IB UD TX WR
> Changing the qkey would break its existing UD communication.

OK, so we have three use cases here for a UD QP:
+1 used only for unicast
+2 used only for multicast
+3 used for both unicast and multicast

My suggestion (default qkey; when the join is completed, do a QP modify with the group qkey) would work fine for use case 1, since the user never joins anything, and for use case 2, same as it works in ipoib. So we are left with use case 3.

To make things simple, the solution I suggest is that the RDMA CM would --not-- do this modify QP/QKEY (that is, it would set the 0x1234567 qkey on the modify QP to INIT) and rather leave it to the RDMA CM consumer --if-- they wish to do so. However, it will use the ipv4 broadcast group qkey for doing mcast joins and report this qkey to the user in the ud param of the event.

So users that don't care about their qkey would never bother to do this modify QP, and users who do care would do it and have to take caution if their QP is of type 3 (both unicast and mcast).

If you don't like this direction, there is your idea from below to have two options for group type, rdmacm or ipoib, and have the consumer specify it. For group type ipoib you would use the ipv4 broadcast qkey for both join and modify QP, and for group type rdmacm you would just use the rdmacm qkey and do no modify QP. This is fine for us as well.

>> Bottom line, Looking in the IB SPEC and IPoIB RFC i did not see mentioning of privileged QKEY.
>
> From RFC 4391 (ipoib RFC), 4.1:
>
> 2. Q_Key
>
>    It is RECOMMENDED that a controlled Q_Key be used with the
>    high-order bit set. This is to prevent non-privileged
>    software from fabricating and sending out bogus IP datagrams.
>
> I don't know what qkey is actually assigned, however.

This (what qkey is assigned to the ipv4 broadcast group by different SAs) is orthogonal to the discussion here.

> I have some path forward related tasks that I would like to complete before starting on this. I hope to finish that before the end of this week. I don't want to rush on the multicast support and miss something. For the rdma cm, we may need to let the user set some options when joining a multicast group. Maybe something like: join type (send-only or send-receive), group type (ipoib or rdma defined), etc.

As I see it, the group type (or having no types and being always interoperable with ipoib, as I suggest above) seems easy to add to the current implementation and would put it in an acceptable state for upstream pushing to 2.6.21 and inclusion in OFED 1.2.

As for the join type, as I told you before, I think it should --not-- delay the upstream nor the OFED 1.2 push. If you have the time, add this to the user/kernel ABI and have the ucma kernel code return -EINVAL if someone attempts to do a send-only join. And if you don't have the time for that, it can be added later. Actually, as you can see in the ipoib code, it never does a send-only-non-member join, so my take here is that till the ipoib issue is resolved there is no reason to have this complexity in the rdmacm.

> I do plan on requesting that the core multicast changes to ib_sa and ib_ipoib be pulled into 2.6.21.

This is great news, but again I think the "nobody's perfect" rule applies well here: the current rdmacm multicast support (with the little fixes we discuss over this thread) can be pushed to 2.6.21 and be enhanced later.

Or.
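The two-group-type alternative described above (ipoib groups use the ipv4 broadcast qkey for both the join and the QP, rdmacm groups keep the rdmacm qkey and skip the modify) can be written down as a small decision function. This is a sketch of the proposal as worded in this message, not of any RDMA CM code; the enum, the struct and the function name are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical group types, as proposed in the thread. */
enum sketch_group_type {
    SKETCH_GROUP_RDMACM,
    SKETCH_GROUP_IPOIB
};

/* What the join path would do for a given group type. */
struct sketch_qkey_plan {
    uint32_t join_qkey;          /* qkey used for the SA multicast join */
    uint32_t qp_qkey;            /* qkey the UD QP ends up with */
    bool modify_qp_after_join;   /* whether a modify QP is issued after the join */
};

static struct sketch_qkey_plan
sketch_plan_qkey(enum sketch_group_type type,
                 uint32_t rdmacm_qkey, uint32_t ipoib_bcast_qkey)
{
    struct sketch_qkey_plan plan;

    if (type == SKETCH_GROUP_IPOIB) {
        /* ipoib group: broadcast-group qkey for both join and QP. */
        plan.join_qkey = ipoib_bcast_qkey;
        plan.qp_qkey = ipoib_bcast_qkey;
        plan.modify_qp_after_join = true;
    } else {
        /* rdmacm group: keep the rdmacm qkey, no modify QP. */
        plan.join_qkey = rdmacm_qkey;
        plan.qp_qkey = rdmacm_qkey;
        plan.modify_qp_after_join = false;
    }
    return plan;
}
```

A QP that is also used for unicast (use case 3) is exactly where `modify_qp_after_join` matters, since changing the qkey affects its existing UD peers.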
Re: [openib-general] some IB multicast sendonly thoughts
Eitan Zahavi wrote:
> Or Gerlitz wrote:
>> Eitan Zahavi wrote:
>> So you are saying that the GW **has** to listen on IGMP at the Eth side and **has** to do IB SA join in the only way that forces the SA to create the group --> FullMember ?
> Yes
>> If indeed, this is kind of bad,
> I find it very reasonable

OK, going forward with this approach: the GW got IGMP, so it did a FullMember join and the group is created. What happens when the Eth multicast node stops doing RX? Is there a "leave" IGMP message which the GW can trap and act on?

Or.
Re: [openib-general] multicast code/merge status
Sean Hefty wrote:
>> mimic IPoIB qkey flow:
>> +3 on rdma_create_qp do modify qp with some def qkey (eg zero)
>> +4 on the join completion path before attaching a qp to the associated mgid, do modify qp with this mrec qkey (=ipv4 broadcast one)
> The rdma cm allows UD QP communication, which requires a valid qkey before or without joining a multicast group. I'd like to find a way to continue to support this.

Sure, it can use the rdmacm qkey (0x1234567 etc) when it creates the QP and later, --if-- the user joins a multicast group, modify the QP state with the group qkey and report it in the cma event, such that the consumer of the rdmacm would set this into his IB UD TX WR.

>> +3 on rdma_create_qp do modify qp with some def qkey (eg zero)
>> +4 on the join completion path before attaching a qp to the associated mgid, do modify qp with this mrec qkey (=ipv4 broadcast one)
> Isn't the ipoib qkey a privileged qkey?

Looking in the ipoib code, you can see the following in ipoib_mcast_join_task:

> if (!test_bit(IPOIB_MCAST_FLAG_ATTACHED, &priv->broadcast->flags)) {
>         ipoib_mcast_join(dev, priv->broadcast, 0);
>         return;
> }

So ipoib_mcast_join is called with create=0 for the broadcast group, and this makes it provide a component mask of

> comp_mask =
>         IB_SA_MCMEMBER_REC_MGID |
>         IB_SA_MCMEMBER_REC_PORT_GID |
>         IB_SA_MCMEMBER_REC_PKEY |
>         IB_SA_MCMEMBER_REC_JOIN_STATE;

That is, the SA sets the QKEY, RATE, MTU, SL etc. for the broadcast group, and later any other joins done by ipoib use the priv->broadcast->mcmember fields. So the broadcast qkey is basically what the SA has set when it created the group.

While discussing this here I got a pointer to section 10 in the IPoIB RFC (4391), mentioning something like "some 3rd party --has-- to create the broadcast group":

> 10. Sending and Receiving IP Multicast Packets
>    A node joining an IP multicast group must first construct an MGID
>    according to the rule described in section 4 above. Once the correct
>    MGID is calculated, the node must call the SA of the outbound link to
>    attempt a "FullMember" join of the IB multicast group corresponding
>    to the MGID. If the IB multicast group does not already exist, one
>    must be created first with the IPoIB link MTU. The MGID MUST use the
>    same P_Key, Q_Key, SL, MTU, and HopLimit as those used in the
>    broadcast-GID. The rest of attributes SHOULD follow the values used
>    in the broadcast-GID as well.

Bottom line, looking in the IB spec and the IPoIB RFC I did not see any mention of a privileged QKEY.

Or.
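The component mask argument above can be modelled in a few lines of C. The bit positions and the `SKETCH_` names below are assumptions made for illustration and are not copied from the kernel's ib_sa.h; likewise, the set of fields added for a creating join is limited to the ones this message says the SA otherwise sets (QKEY, RATE, MTU, SL).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative component bits; positions are assumptions, not kernel values. */
#define SKETCH_REC_MGID        (1ull << 0)
#define SKETCH_REC_PORT_GID    (1ull << 1)
#define SKETCH_REC_QKEY        (1ull << 2)
#define SKETCH_REC_MTU         (1ull << 3)
#define SKETCH_REC_PKEY        (1ull << 4)
#define SKETCH_REC_RATE        (1ull << 5)
#define SKETCH_REC_SL          (1ull << 6)
#define SKETCH_REC_JOIN_STATE  (1ull << 7)

/*
 * Build the mask of fields the joiner supplies. With create == false only
 * the identifying fields are given, so the SA's existing group record
 * decides QKEY, RATE, MTU and SL.
 */
static uint64_t sketch_join_comp_mask(bool create)
{
    uint64_t mask = SKETCH_REC_MGID | SKETCH_REC_PORT_GID |
                    SKETCH_REC_PKEY | SKETCH_REC_JOIN_STATE;

    if (create)
        mask |= SKETCH_REC_QKEY | SKETCH_REC_RATE |
                SKETCH_REC_MTU | SKETCH_REC_SL;
    return mask;
}
```

Read this way, a join with create=0 never proposes a qkey of its own, which is why the broadcast group's qkey is whatever was set when the group was created.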
Re: [openib-general] some IB multicast sendonly thoughts
Eitan Zahavi wrote:
>> So you are saying that the GW **has** to listen on IGMP at the Eth side and **has** to do IB SA join in the only way that forces the SA to create the group --> FullMember ?
> Yes
>> If indeed, this is kind of bad,
> I find it very reasonable

OK, let me think about it for a while, thanks for the quick response.

Or.
Re: [openib-general] Minutes for January 15, 2007 teleconference about OFED 1.2 development progress toward code freeze
Tziporet Koren wrote:

Hi Tziporet, thanks for the detailed info, below are a few comments:

> *Abbreviated minutes / summary*
> * Bonding module will be added to OFED 1.2 to support HA on older
> kernels

The bonding package would support fresh (2.6.20) and some older upstream kernels, along with SLES10 and RH4 Ux (x=3 for sure).

> General changes to the package:
> * Multicast - we wait for Voltaire and Sean to close all technical
> details - should be ready by the end of the week

I have just sent Sean a clarification email over the list. If needed, we would be able to help with the missing patches, and I guess that in a combined effort this would be ready by the end of --next-- week.

> Management:
> * OpenSM:
> o QoS - on work; first version will be ready at end of month

What about the host side QoS code? I did not see a newer RFC or patch other than the RFC that was sent many months ago.

Or.
Re: [openib-general] some IB multicast sendonly thoughts
Eitan Zahavi wrote:
> Or Gerlitz wrote:
>> OK, assuming my setup consists of:
>> +1 IB node doing only multicast TX on a group
>> +2 an IB/Ethernet gateway
>> +3 Eth node doing only multicast RX on the equiv mac (forget manytoone)
>> The gateway design is to register for SA MGID IN/OUT traps and when it
>> gets MGID IN it joins the mgroup as **NonMember** etc
> GW needs to listen on IGMP on the Eth port...

This was fast... thanks for jumping on it.

So you are saying that the GW **has** to listen on IGMP at the Eth side and **has** to do IB SA join in the only way that forces the SA to create the group --> FullMember ?

If so, this is kind of bad. I find the approach of the GW being "transparent" to the SA, in the sense that it causes neither mgroup create/destroy nor mgroup ref count inc/dec, much more robust. So you are saying it's not feasible with the IB spec.

Or.
[openib-general] some IB multicast sendonly thoughts
> 15.2.5.17.1 GROUP MEMBERSHIP
> An endport must specify the type of multicast subscription or deletion that
> it wants. The MCMemberRecord:JoinState component indicates the
> membership qualities a port wishes to add (in joining or creating a group)
> or remove (in leaving a group). The meanings of the
> MCMemberRecord:JoinState bits are:
> • FullMember: Group messages are routed both to and from the port.
>   The port is considered a member for purposes of group creation and
>   deletion, i.e.: if no member ports with FullMember=1 remain, the
>   group may be deleted; otherwise it may not.
> • NonMember: Group messages are routed both to and from the port.
>   The port is not considered a member for purposes of group creation/
>   deletion.
> • SendOnlyNonMember: Group messages are only routed from the
>   port; none are routed to the port. The port is not considered a member
>   for purposes of group creation/deletion.
> MCMemberRecord:JoinState.FullMember bit must be set to 1 in the SubnAdmSet()
> request that creates a multicast group. ...

OK, assuming my setup consists of:
+1 IB node doing only multicast TX on a group
+2 an IB/Ethernet gateway
+3 Eth node doing only multicast RX on the equiv mac (forget manytoone)

The gateway design is to register for SA MGID IN/OUT traps, and when it gets MGID IN it joins the mgroup as **NonMember** etc.

Now, since the TX node joins as SendOnlyNonMember, the SA would never create this group --> the TX node would never get an MLID etc to create an AH ---> this setup is broken.

Any thoughts and/or ideas would be welcome.

Or.
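The deadlock described above can be sketched as a tiny model of the SA side. This is only an illustration of the quoted spec text, not OpenSM code: the struct, the function name and the JoinState bit values are assumptions made for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed JoinState bit values (FullMember in the lowest bit). */
#define JOIN_FULL_MEMBER        0x1u
#define JOIN_NON_MEMBER         0x2u
#define JOIN_SEND_ONLY_NON_MEM  0x4u

/* Hypothetical, minimal view of a multicast group as seen by the SA. */
struct mgroup {
	bool     exists;
	unsigned full_members;
};

enum join_result { JOIN_OK, JOIN_NO_GROUP };

/* Model of a join request: a missing group is created only when the
 * request carries FullMember; other join states cannot create it. */
static enum join_result sa_join(struct mgroup *g, uint8_t join_state)
{
	assert(join_state != 0);

	if (!g->exists) {
		if (!(join_state & JOIN_FULL_MEMBER))
			return JOIN_NO_GROUP;
		g->exists = true;
	}
	if (join_state & JOIN_FULL_MEMBER)
		g->full_members++;
	return JOIN_OK;
}
```

With an empty group, a SendOnlyNonMember join from the TX node and a NonMember join from the gateway both fail in this model; only after some port joins as FullMember do the other two join states succeed.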
Re: [openib-general] multicast code/merge status
Sean Hefty wrote:
>> OK, got you at last (sorry but i have somehow ignored the call to
>> ib_addr_get_mgid() at the rdmacm code). So to achieve interop with
>> IPoIB all we need to do is remove the rdmacm signature bit and not to
>> over-write the rdmacm qkey on the qkey of the ipoib ipv4 broadcast
>> group, are you ok with that?

> I believe this would achieve interop with ipoib. However, overwriting
> the qkey may break any existing UD communication that the user may
> have. I just need to think about this more, and see what we can come up
> with.

Hi Sean,

Based on our communication so far, the elements which are missing are:

++ on the rdmacm kernel code: (drivers/infiniband/core/cma.c)

+1 remove the rdmacm signature byte from the mgid
+2 get the qkey used by the ipv4 broadcast group and use it

mimic IPoIB qkey flow:
+3 on rdma_create_qp do modify qp with some def qkey (eg zero)
+4 on the join completion path before attaching a qp to the associated mgid, do modify qp with this mrec qkey (=ipv4 broadcast one)

++ on the rdmacm user space code: (librdmacm/src/cma.c)

+3 on rdma_create_qp do modify qp with some def qkey (eg zero)
+4 on the join completion path before attaching a qp to the associated mgid, do modify qp with this mrec qkey (=ipv4 broadcast one)

With the time frame for 2.6.21 and OFED 1.2 becoming short, can you give an update on the status of the multicast patch series? We really want it in for this time frame, so please let me know if you prefer to get patches that implement the above (eg as reference) or to do it yourself...

thanks,

Or.
Re: [openib-general] OFED ipoib_8111_to_2_6_16.patch
Michael S. Tsirkin wrote:
>>> git log -Sneigh_destructor -- include/net/neighbour.h
>> produced nothing on my net-2.6.20 git however browsing the git log i see
>> this patch, is this the one you refer to?
> Yes.

thanks

>> also, having that at (my) hand does not remove the need that you will
>> set a changelog/signature for the OFED ipoib related backport patch.
> Feel free to add that.

Unless I am missing something, we want all OFED kernel patches to meet **basic** kernel working conventions, specifically that each patch has a change log and an owner.

Or.
Re: [openib-general] OFED ipoib_8111_to_2_6_16.patch
Michael S. Tsirkin wrote:
>>> It's a backport for kernels <= 2.6.16.
>> Can you please send (and add to OFED 1.2) a changelog comment explaining
>> the problem and how it is solved in 2.6.17 and above ?!
>> We are looking on some code around ipoib_neigh_destructor() and friends
>> and the changelog would really be of help to us.
> Try this
> git log -Sneigh_destructor -- include/net/neighbour.h

This produced nothing on my net-2.6.20 git, however browsing the git log I see the patch below. Is this the one you refer to?

Also, having that at (my) hand does not remove the need for you to set a changelog/signature for the OFED ipoib related backport patch.

> commit c5ecd62c25400a3c6856e009f84257d5bd03f03b
> Author: Michael S. Tsirkin <[EMAIL PROTECTED]>
> Date: Mon Mar 20 22:25:41 2006 -0800
>
> [NET]: Move destructor from neigh->ops to neigh_params
>
> struct neigh_ops currently has a destructor field, which no in-kernel
> drivers outside of infiniband use. The infiniband/ulp/ipoib in-tree
> driver stashes some info in the neighbour structure (the results of
> the second-stage lookup from ARP results to real link-level path), and
> it uses neigh->ops->destructor to get a callback so it can clean up
> this extra info when a neighbour is freed. We've run into problems
> with this: since the destructor is in an ops field that is shared
> between neighbours that may belong to different net devices, there's
> no way to set/clear it safely.
>
> The following patch moves this field to neigh_parms where it can be
> safely set, together with its twin neigh_setup. Two additional
> patches in the patch series update ipoib to use this new interface.
>
> Signed-off-by: Michael S. Tsirkin <[EMAIL PROTECTED]>
> Signed-off-by: Roland Dreier <[EMAIL PROTECTED]>
> Signed-off-by: David S. Miller <[EMAIL PROTECTED]>
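The problem the commit message describes (a destructor hook living in an ops structure shared across devices) can be illustrated with a stand-alone model. The struct layouts below are heavily simplified stand-ins, not the kernel's definitions, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct neighbour;

/* Simplified stand-in: ops may be shared by neighbours of many devices. */
struct neigh_ops {
	void (*destructor)(struct neighbour *n);
};

/* Simplified stand-in: parms belong to a single device. */
struct neigh_parms {
	void (*neigh_destructor)(struct neighbour *n);
};

struct neighbour {
	struct neigh_ops   *ops;
	struct neigh_parms *parms;
	int                 cleaned; /* set when an IPoIB-style cleanup ran */
};

/* Stand-in for a driver cleanup callback such as ipoib_neigh_destructor. */
static void ipoib_like_destructor(struct neighbour *n)
{
	n->cleaned = 1;
}

/* Old scheme: the hook is looked up through the shared ops. */
static void neigh_destroy_via_ops(struct neighbour *n)
{
	if (n->ops->destructor)
		n->ops->destructor(n);
}

/* New scheme: the hook is looked up through the per-device parms. */
static void neigh_destroy_via_parms(struct neighbour *n)
{
	if (n->parms->neigh_destructor)
		n->parms->neigh_destructor(n);
}
```

If an IPoIB neighbour and an Ethernet neighbour point at the same ops and the destructor is installed there, neigh_destroy_via_ops() also runs the IPoIB cleanup on the Ethernet neighbour. With the hook in per-device parms, only neighbours of the IPoIB device see it.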
Re: [openib-general] OFED ipoib_8111_to_2_6_16.patch
Michael S. Tsirkin wrote:
>> Quoting Or Gerlitz <[EMAIL PROTECTED]>:
>> Subject: OFED ipoib_8111_to_2_6_16.patch
> Isn't it obvious from the name?

Sure, thanks for the clarification.

> It's a backport for kernels <= 2.6.16.

Can you please send (and add to OFED 1.2) a changelog comment explaining the problem and how it is solved in 2.6.17 and above ?!

We are looking at some code around ipoib_neigh_destructor() and friends, and the changelog would really be of help to us.

Thanks,

Or.
[openib-general] OFED ipoib_8111_to_2_6_16.patch
Hi Michael,

I have just realized that
a) this patch was not pushed upstream and
b) the --same-- instance of it is kept on all backports of both OFED 1.1 & 1.2 staging

It also does not have a changelog comment and Signed-off-by signature...

Can you shed some light on what's going on here?

thanks,

Or.

# pwd
/home/ogerlitz/OFED-1.1/SOURCES/openib-1.1/kernel_patches
# find . -name \*ipoib\* | grep 8111 | xargs ls -l
-rw-r--r-- 1 1078 101 2616 Oct 19 16:21 ./backport/2.6.11_FC4/ipoib_8111_to_2_6_16.patch
-rw-r--r-- 1 1078 101 2616 Oct 19 16:21 ./backport/2.6.11/ipoib_8111_to_2_6_16.patch
-rw-r--r-- 1 1078 101 2616 Oct 19 16:21 ./backport/2.6.12/ipoib_8111_to_2_6_16.patch
-rw-r--r-- 1 1078 101 2616 Oct 19 16:21 ./backport/2.6.13/ipoib_8111_to_2_6_16.patch
-rw-r--r-- 1 1078 101 2616 Oct 19 16:21 ./backport/2.6.13_suse10_0_u/ipoib_8111_to_2_6_16.patch
-rw-r--r-- 1 1078 101 2616 Oct 19 16:21 ./backport/2.6.14/ipoib_8111_to_2_6_16.patch
-rw-r--r-- 1 1078 101 2616 Oct 19 16:21 ./backport/2.6.15/ipoib_8111_to_2_6_16.patch
-rw-r--r-- 1 1078 101 2616 Oct 19 16:21 ./backport/2.6.16/ipoib_8111_to_2_6_16.patch
-rw-r--r-- 1 1078 101 2616 Oct 19 16:21 ./backport/2.6.16_sles10/ipoib_8111_to_2_6_16.patch
-rw-r--r-- 1 1078 101 2616 Oct 19 16:21 ./backport/2.6.9/ipoib_8111_to_2_6_16.patch
-rw-r--r-- 1 1078 101 2616 Oct 19 16:21 ./backport/2.6.9_U3/ipoib_8111_to_2_6_16.patch
-rw-r--r-- 1 1078 101 2616 Oct 19 16:21 ./backport/2.6.9_U4/ipoib_8111_to_2_6_16.patch

Index: openib_branch1.0/drivers/infiniband/ulp/ipoib/ipoib_main.c
===
--- openib_branch1.0.orig/drivers/infiniband/ulp/ipoib/ipoib_main.c
+++ openib_branch1.0/drivers/infiniband/ulp/ipoib/ipoib_main.c
@@ -82,6 +82,9 @@ static const u8 ipv4_bcast_addr[] = {
 
 struct workqueue_struct *ipoib_workqueue;
 
+static DEFINE_SPINLOCK(ipoib_all_neigh_list_lock);
+static LIST_HEAD(ipoib_all_neigh_list);
+
 static void ipoib_add_one(struct ib_device *device);
 static void ipoib_remove_one(struct ib_device *device);
 
@@ -751,6 +754,17 @@ static void ipoib_neigh_destructor(struc
 	unsigned long flags;
 	struct ipoib_ah *ah = NULL;
 
+	struct ipoib_neigh *tn, *nn = NULL;
+	spin_lock(&ipoib_all_neigh_list_lock);
+	list_for_each_entry(tn, &ipoib_all_neigh_list, all_neigh_list)
+		if (tn->neighbour == n) {
+			nn = tn;
+			break;
+		}
+	spin_unlock(&ipoib_all_neigh_list_lock);
+	if (!nn)
+		return;
+
 	ipoib_dbg(priv,
 		  "neigh_destructor for %06x " IPOIB_GID_FMT "\n",
 		  be32_to_cpup((__be32 *) n->ha),
@@ -783,19 +797,33 @@ struct ipoib_neigh *ipoib_neigh_alloc(st
 	neigh->neighbour = neighbour;
 	*to_ipoib_neigh(neighbour) = neigh;
 
+	spin_lock(&ipoib_all_neigh_list_lock);
+	list_add_tail(&neigh->all_neigh_list, &ipoib_all_neigh_list);
+	neigh->neighbour->ops->destructor = ipoib_neigh_destructor;
+	spin_unlock(&ipoib_all_neigh_list_lock);
+
 	return neigh;
 }
 
 void ipoib_neigh_free(struct ipoib_neigh *neigh)
 {
+	struct ipoib_neigh *nn;
+	spin_lock(&ipoib_all_neigh_list_lock);
+	list_del(&neigh->all_neigh_list);
+	list_for_each_entry(nn, &ipoib_all_neigh_list, all_neigh_list)
+		if (nn->neighbour->ops == neigh->neighbour->ops)
+			goto found;
+
+	neigh->neighbour->ops->destructor = NULL;
+found:
+	spin_unlock(&ipoib_all_neigh_list_lock);
+
 	*to_ipoib_neigh(neigh->neighbour) = NULL;
 	kfree(neigh);
 }
 
 static int ipoib_neigh_setup_dev(struct net_device *dev, struct neigh_parms *parms)
 {
-	parms->neigh_destructor = ipoib_neigh_destructor;
-
 	return 0;
 }
 
Index: openib_branch1.0/drivers/infiniband/ulp/ipoib/ipoib.h
===
--- openib_branch1.0.orig/drivers/infiniband/ulp/ipoib/ipoib.h
+++ openib_branch1.0/drivers/infiniband/ulp/ipoib/ipoib.h
@@ -47,6 +47,8 @@
 #include
 #include
+#include
+
 #include
 #include
@@ -217,6 +219,7 @@ struct ipoib_neigh {
 	struct neighbour   *neighbour;
 
+	struct list_head    all_neigh_list;
 	struct list_head    list;
 };
Re: [openib-general] ipoib ipv6 multicast joins, was: multicast code/merge status
On 1/15/07, Sean Hefty <[EMAIL PROTECTED]> wrote:
> > Can you explain how this relates to your multicast changes? the IPoIB
> > send-only-full-member-join hack was there before your patch and stayed
> > there after your patch... and how come a change in the multicast code
> > can cause the error stream to be finite... have you moved the retry
> > mechanism from the ib_sa consumer to the ib_sa mcast engine?
>
> There was a bug in the ib_sa multicast engine handling failed joins, which had
> it retry forever. (Basically, the response was not being matched with the
> request. So the response was discarded, and the request was retried.) I had
> fixed this in svn, but lost the patch moving over to git.

Sure, got you.

Or.
Re: [openib-general] [PATCH] [libibverbs] Adding acks to all of the CQ events in the pingpong examples
Roland Dreier wrote:
> OK, this is correct -- but since the examples don't destroy the CQ, is
> there any point in acking the events?

Yes, people use these examples when learning how to write code for IB, so let's educate them well... (ie the destroy cq should be added later)

Or.
Re: [openib-general] ipoib ipv6 multicast joins, was: multicast code/merge status
Sean Hefty wrote:
>> So, this looks like a work-around for some broken SM, does it not?
>
> Yes - I mentioned it because the resulting error message (wrong
> component mask) is what was filling up the opensm log file.
>
> Jan 11 14:21:36 083844 [40583BB0] -> osm_mcmr_rcv_join_mgrp: ERR 1B11:
> method = SubnAdmSet, scope_state = 0x1, component mask = 0x00010083,
> expected comp mask = 0x000130c7, MGID: 0x : 0x201400020404 from
> port 0x0002c9010ad258f1
>
> I've applied a missing patch to my rdma-dev git tree that should avoid
> filling up the opensm log file. But the error in the opensm log file is
> a result of this work-around.

Sean,

Can you explain how this relates to your multicast changes? The IPoIB send-only-full-member-join hack was there before your patch and stayed there after your patch... and how come a change in the multicast code can cause the error stream to be finite... have you moved the retry mechanism from the ib_sa consumer to the ib_sa mcast engine?

Or.
Re: [openib-general] [RFC] [PATCH V2 0/3] bonding support for operation over IPoIB
Or Gerlitz wrote:
> This patch series is a second version (see below link to V1) of the suggested
> changes to the bonding driver such that it would be able to support non
> ARPHRD_ETHER netdevices for its High-Availability (active-backup) mode.
>
> The motivation is to enable the bonding driver on its HA mode to work with the
> IP over Infiniband (IPoIB) driver. With these patches I was able to enslave
> IPoIB netdevices and run TCP, UDP, IP (UDP) Multicast and ICMP traffic with
> fail-over and fail-back working fine. My working env was the net-2.6.20 git.

> These patches are not enough for configuration of IPoIB bonding through tools
> (eg /sbin/ifenslave and /sbin/ifup) provided by packages such as sysconfig and
> initscripts, specifically since these tools sets the bonding device to be UP
> before enslaving anything. Once this patchset gets positive/feedback the next
> step would be to look how to enhance the tools/packages so it would be possible
> to bond/enslave with the modified code. As suggested by the bonding maintainer,
> this step can potentially involve converting ifenslave to be a script based on
> the bonding sysfs infrastructure rather on the somehow obsoleted
> Documentation/networking/ifenslave.c

Jay,

I would like to move forward and push the V2 patch series upstream through netdev, and then start working on the changes to the configuration tools etc that are needed to support bonding IPoIB devices through scripts that do not use the bonding sysfs directly... are you OK with that?

If you agree to the push, who is doing this nowadays, is it Jeff Garzik or David Miller?

Roland - any other comments/concerns that you might have are very much appreciated.

Or.
Re: [openib-general] [PATCH 7/7] libcxgb3: Update libcxgb3 for new libibverbs driver handling
Michael S. Tsirkin wrote:
>> > So libibverbs 1.1 will be part of ofed 1.2?
>> That's the goal, and I guess you're counting on it for libcxg3
> I guess this means libcxg3 can be made to work with libibverbs 1.0 if
> desired.

Just a reminder of the importance of including libibverbs 1.1 in OFED 1.2 ---> to have the ***fork*** support merged at last into an official release.

Or.
Re: [openib-general] multicast code/merge status
On 10 Jan 2007 14:31:38 -0500, Hal Rosenstock <[EMAIL PROTECTED]> wrote:
> On Wed, 2007-01-10 at 13:47, Or Gerlitz wrote:
> > (*) there are some more issues here which need to be addressed, see
> > for example the "Some SMs don't support send-only yet" weird comment
> > at ipoib_mcast_sendonly_join()
> It's more likely an SA issue but I'm only guessing... It may also be
> historical...

We are not a huge community, how about asking the person who put in this comment to come and say "I did it" and "it was done because of this or that"?

Or.