Re: [openib-general] [PATCH] librdmacm: fix bug causing failure to work with partial membership pkey
On Thu, 2007-02-22 at 17:18, Sean Hefty wrote:
> >>Can someone help my understanding here? Is ipoib joining a multicast
> >>group using the full membership PKey, even if the node that it joins
> >>from only has the limited membership PKey configured? And the code in
> >>ib_find_cached_pkey helps enable this?
> >
> > Yep. The ipoib create_child function ORs 0x8000 into the device pkey
> > which was provided by the user. Now, IPoIB uses the device pkey when
> > forming MGIDs and when doing modify qp to init. Indeed, the way
> > ib_find_cached_pkey() is implemented makes the latter use trivial.
>
> Doesn't this allow ipoib to join a multicast group for which it may not
> be able to communicate with all members?

Yes, if the join were to succeed, which appears to me to be noncompliant
behavior.

> For the broadcast group, this seems like an error to me.

Why for just the broadcast group? Isn't it any IPoIB MC group for which
this would be done? (See below as to what the IBA spec says.)

> Can ipoib work in such a configuration? If all nodes were
> assigned a partial membership PKey, none of them could communicate, but
> no errors would be generated anywhere.
>
> Joining a multicast group requires specifying the full membership PKey.
> I don't see anything in the spec that explicitly prohibits joining the
> group from a node with only a partial membership PKey,

What about the description of P_Key in MCMemberRecord (table 210 on
p. 908, which is compliance), which states: "All members of the multicast
group shall have full membership in the partition indicated by the
partition key."

-- Hal

> but at first glance, this seems like a
> subnet configuration issue. Is there some use of this I'm overlooking?
>
> - Sean

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
Re: [openib-general] [PATCH] librdmacm: fix bug causing failure to work with partial membership pkey
>>Can someone help my understanding here? Is ipoib joining a multicast
>>group using the full membership PKey, even if the node that it joins
>>from only has the limited membership PKey configured? And the code in
>>ib_find_cached_pkey helps enable this?
>
> Yep. The ipoib create_child function ORs 0x8000 into the device pkey
> which was provided by the user. Now, IPoIB uses the device pkey when
> forming MGIDs and when doing modify qp to init. Indeed, the way
> ib_find_cached_pkey() is implemented makes the latter use trivial.

Doesn't this allow ipoib to join a multicast group for which it may not
be able to communicate with all members? For the broadcast group, this
seems like an error to me. Can ipoib work in such a configuration? If all
nodes were assigned a partial membership PKey, none of them could
communicate, but no errors would be generated anywhere.

Joining a multicast group requires specifying the full membership PKey. I
don't see anything in the spec that explicitly prohibits joining the
group from a node with only a partial membership PKey, but at first
glance, this seems like a subnet configuration issue. Is there some use
of this I'm overlooking?

- Sean
Re: [openib-general] [PATCH] librdmacm: fix bug causing failure to work with partial membership pkey
On 2/22/07, Sean Hefty <[EMAIL PROTECTED]> wrote:
> >My understanding is that when an IPoIB broadcast domain contains both
> >partial and full members (*) attempts to communicate between two
> >partial members would silently fail. Is this silence something you
> >think we should work to change?
>
> I'm looking at this from a different view than just ipoib multicast
> groups. For example, can two users of the ib_cm successfully establish
> a connection, but not actually be able to transfer data between each
> other? This seems possible, though unlikely. This is the type of silent
> failure I'm referring to.

I don't think this is possible, since the active CM uses the pkey index
of the pkey provided in REQ.path to send the REQ MAD. The same goes for
the passive CM: it uses the index in its table of REQ.path.pkey. So if
the CMs are able to talk over QP1 using this pkey index, the CM consumers
can talk over their RC (REQ) / UD (SIDR REQ) QPs. And both the CM and its
consumers would use the same index: the one returned from
ib_find_cached_pkey().

Or.
Re: [openib-general] [PATCH] librdmacm: fix bug causing failure to work with partial membership pkey
On 2/22/07, Sean Hefty <[EMAIL PROTECTED]> wrote:
> >An IB multicast group _cannot_ have partial members so this never
> >should get far enough to where two limited members would be unable to
> >communicate.
>
> Can someone help my understanding here? Is ipoib joining a multicast
> group using the full membership PKey, even if the node that it joins
> from only has the limited membership PKey configured? And the code in
> ib_find_cached_pkey helps enable this?

Yep. The ipoib create_child function ORs 0x8000 into the device pkey
which was provided by the user. Now, IPoIB uses the device pkey when
forming MGIDs and when doing modify qp to init. Indeed, the way
ib_find_cached_pkey() is implemented makes the latter use trivial.

Or.
Re: [openib-general] [PATCH] librdmacm: fix bug causing failure to work with partial membership pkey
>An IB multicast group _cannot_ have partial members so this never should
>get far enough to where two limited members would be unable to
>communicate.

Can someone help my understanding here? Is ipoib joining a multicast
group using the full membership PKey, even if the node that it joins from
only has the limited membership PKey configured? And the code in
ib_find_cached_pkey helps enable this?

- Sean
Re: [openib-general] [PATCH] librdmacm: fix bug causing failure to work with partial membership pkey
>My understanding is that when an IPoIB broadcast domain contains both
>partial and full members (*) attempts to communicate between two partial
>members would silently fail. Is this silence something you think we
>should work to change?

I'm looking at this from a different view than just ipoib multicast
groups. For example, can two users of the ib_cm successfully establish a
connection, but not actually be able to transfer data between each other?
This seems possible, though unlikely. This is the type of silent failure
I'm referring to.

Without this patch, two clients that try to connect using the librdmacm
will fail. That failure is reported to the user. With this patch, the
connection would be created, but I don't think that it guarantees that
communication can actually occur. I don't want to mask a configuration
issue.

>My thinking is that if in the end of this thread we are willing to move
>forward without changing ib_find_cached_pkey() then this patch should be
>merged.

I'm still unsure about where the cause of this problem lies. It may be
that the kernel rdma_cm or rdma_ucm needs to change if we decide that
ib_find_cached_pkey is correct.

- Sean
Re: [openib-general] [PATCH] librdmacm: fix bug causing failure to work with partial membership pkey
On Thu, 2007-02-22 at 03:04, Or Gerlitz wrote:
> Sean Hefty wrote:
> >> Note that since the HCA validates the pkey in the incoming packet, no
> >> matter what the IB SW would do, partial members of a partition can't
> >> talk to each other. So the approach taken by the core/ipoib code was
> >> to just ignore the MSb in places where the code looks for the pkey
> >> --index-- and use the full member pkey when forming MGIDs. This seems
> >> fine to me.
>
> > My concern is that ib_find_cached_pkey() returns an index to a pkey
> > that wasn't the one in the search. Can this lead to a QP being
> > configured in such a way that communication with a remote QP would
> > silently fail?
>
> My understanding is that when an IPoIB broadcast domain contains both
> partial and full members (*) attempts to communicate between two partial
> members would silently fail,

An IB multicast group _cannot_ have partial members so this never should
get far enough to where two limited members would be unable to
communicate.

-- Hal

> is this silence something you think we
> should work to change?
>
> (*) e.g. when you have a bunch of clients and a server or a bunch of
> servers and you don't want to allow --clients-- to communicate among
> themselves
>
> > I'm not against this patch, but I want to make sure that I understand
> > the issues, so we're not creating a work-around solution. The patch is
> > against the librdmacm, yet there's nothing that I see in the librdmacm
> > that makes me think it's behaving incorrectly.
>
> My thinking is that if in the end of this thread we are willing to move
> forward without changing ib_find_cached_pkey() then this patch should be
> merged.
>
> Or.
Re: [openib-general] [PATCH] librdmacm: fix bug causing failure to work with partial membership pkey
On Thu, 2007-02-22 at 02:28, Or Gerlitz wrote:
> Hal Rosenstock wrote:
> > On Wed, 2007-02-21 at 15:45, Or Gerlitz wrote:
> >> On 21 Feb 2007 08:20:23 -0500, Hal Rosenstock <[EMAIL PROTECTED]> wrote:
>
> >> If the IPoIB spec does not allow both partial and full members of a
> >> partition to share a broadcast domain (e.g. the IPv4 broadcast group
> >> associated with the full membership pkey) or any other multicast
> >> group, burn it (or at least the relevant section).
>
> > I was referring to the IB spec, not an IPoIB RFC.
>
> Can you provide a pointer?

See the MCMemberRecord:P_Key description in table 210 (p. 908).

> >> The OpenIB code is supposed to work and, as done with the RDMA CM
> >> header, the implementation should not wait for spec to be written or
> >> changed.
>
> > Really? Maybe I'm mistaken but I didn't think that OpenIB/OpenFabrics
> > wanted to issue code which is not IBA spec compliant.
>
> The code resides in the Linux kernel, period. Linux is not under the
> control of this or that organization, period, period. Linux uses a
> hierarchical maintainership structure where Roland, Sean and yourself
> are listed as the maintainers, which means you are able to promote
> and/or block this or that agenda, go for it!

OpenIB claims IBA compliance (currently mostly v1.2). Is there any good
reason that we shouldn't continue to adhere to this?

-- Hal

> Or.
Re: [openib-general] [PATCH] librdmacm: fix bug causing failure to work with partial membership pkey
Sean Hefty wrote:
>> Note that since the HCA validates the pkey in the incoming packet, no
>> matter what the IB SW would do, partial members of a partition can't
>> talk to each other. So the approach taken by the core/ipoib code was
>> to just ignore the MSb in places where the code looks for the pkey
>> --index-- and use the full member pkey when forming MGIDs. This seems
>> fine to me.

> My concern is that ib_find_cached_pkey() returns an index to a pkey that
> wasn't the one in the search. Can this lead to a QP being configured in
> such a way that communication with a remote QP would silently fail?

My understanding is that when an IPoIB broadcast domain contains both
partial and full members (*) attempts to communicate between two partial
members would silently fail. Is this silence something you think we
should work to change?

(*) e.g. when you have a bunch of clients and a server or a bunch of
servers and you don't want to allow --clients-- to communicate among
themselves

> I'm not against this patch, but I want to make sure that I understand
> the issues, so we're not creating a work-around solution. The patch is
> against the librdmacm, yet there's nothing that I see in the librdmacm
> that makes me think it's behaving incorrectly.

My thinking is that if in the end of this thread we are willing to move
forward without changing ib_find_cached_pkey() then this patch should be
merged.

Or.
Re: [openib-general] [PATCH] librdmacm: fix bug causing failure to work with partial membership pkey
Hal Rosenstock wrote:
> On Wed, 2007-02-21 at 15:45, Or Gerlitz wrote:
>> On 21 Feb 2007 08:20:23 -0500, Hal Rosenstock <[EMAIL PROTECTED]> wrote:

>> If the IPoIB spec does not allow both partial and full members of a
>> partition to share a broadcast domain (e.g. the IPv4 broadcast group
>> associated with the full membership pkey) or any other multicast
>> group, burn it (or at least the relevant section).

> I was referring to the IB spec, not an IPoIB RFC.

Can you provide a pointer?

>> The OpenIB code is supposed to work and, as done with the RDMA CM
>> header, the implementation should not wait for spec to be written or
>> changed.

> Really? Maybe I'm mistaken but I didn't think that OpenIB/OpenFabrics
> wanted to issue code which is not IBA spec compliant.

The code resides in the Linux kernel, period. Linux is not under the
control of this or that organization, period, period. Linux uses a
hierarchical maintainership structure where Roland, Sean and yourself are
listed as the maintainers, which means you are able to promote and/or
block this or that agenda, go for it!

Or.
Re: [openib-general] [PATCH] librdmacm: fix bug causing failure to work with partial membership pkey
On Wed, 2007-02-21 at 17:53, Hal Rosenstock wrote:
> On Wed, 2007-02-21 at 17:36, Sean Hefty wrote:
> > > It does this since it makes life simple and robust.
> >
> > Is an SM prevented from loading two PKeys into an HCA's PKey table
> > that differ by only the membership bit?
>
> Nope.
>
> > I can't think of any reason to do such a thing,
>
> Me neither. It would be a configuration error of sorts.

It is vendor dependent whether the SM would allow this. As Sasha points
out, this cannot be done with OpenSM (at least currently).

-- Hal

> > but depending on which index was
> > selected could limit which nodes you could communicate with.
> >
> > > Note that since the HCA validates the pkey in the incoming packet,
> > > no matter what the IB SW would do, partial members of a partition
> > > can't talk to each other. So the approach taken by the core/ipoib
> > > code was to just ignore the MSb in places where the code looks for
> > > the pkey --index-- and use the full member pkey when forming MGIDs.
> > > This seems fine to me.
> >
> > My concern is that ib_find_cached_pkey() returns an index to a pkey
> > that wasn't the one in the search. Can this lead to a QP being
> > configured in such a way that communication with a remote QP would
> > silently fail?
> >
> > I realize that a user could call ib_get_cached_pkey and see if the
> > returned value matches the one in the original search, but this is a
> > non-obvious way to check for a mismatch.
> >
> > I'm not against this patch, but I want to make sure that I understand
> > the issues, so we're not creating a work-around solution. The patch is
> > against the librdmacm, yet there's nothing that I see in the librdmacm
> > that makes me think it's behaving incorrectly.
>
> I'm not sure it's this patch in particular but it appears that there may
> be some noncompliant behavior being exercised IMO.
>
> -- Hal
>
> > - Sean
Re: [openib-general] [PATCH] librdmacm: fix bug causing failure to work with partial membership pkey
On Wed, 2007-02-21 at 17:36, Sean Hefty wrote:
> > It does this since it makes life simple and robust.
>
> Is an SM prevented from loading two PKeys into an HCA's PKey table that
> differ by only the membership bit?

Nope.

> I can't think of any reason to do such a thing,

Me neither. It would be a configuration error of sorts.

> but depending on which index was
> selected could limit which nodes you could communicate with.
>
> > Note that since the HCA validates the pkey in the incoming packet, no
> > matter what the IB SW would do, partial members of a partition can't
> > talk to each other. So the approach taken by the core/ipoib code was
> > to just ignore the MSb in places where the code looks for the pkey
> > --index-- and use the full member pkey when forming MGIDs. This seems
> > fine to me.
>
> My concern is that ib_find_cached_pkey() returns an index to a pkey that
> wasn't the one in the search. Can this lead to a QP being configured in
> such a way that communication with a remote QP would silently fail?
>
> I realize that a user could call ib_get_cached_pkey and see if the
> returned value matches the one in the original search, but this is a
> non-obvious way to check for a mismatch.
>
> I'm not against this patch, but I want to make sure that I understand
> the issues, so we're not creating a work-around solution. The patch is
> against the librdmacm, yet there's nothing that I see in the librdmacm
> that makes me think it's behaving incorrectly.

I'm not sure it's this patch in particular but it appears that there may
be some noncompliant behavior being exercised IMO.

-- Hal

> - Sean
Re: [openib-general] [PATCH] librdmacm: fix bug causing failure to work with partial membership pkey
> It does this since it makes life simple and robust.

Is an SM prevented from loading two PKeys into an HCA's PKey table that
differ by only the membership bit? I can't think of any reason to do such
a thing, but depending on which index was selected, it could limit which
nodes you could communicate with.

> Note that since the HCA validates the pkey in the incoming packet, no
> matter what the IB SW would do, partial members of a partition can't
> talk to each other. So the approach taken by the core/ipoib code was
> to just ignore the MSb in places where the code looks for the pkey
> --index-- and use the full member pkey when forming MGIDs. This seems
> fine to me.

My concern is that ib_find_cached_pkey() returns an index to a pkey that
wasn't the one in the search. Can this lead to a QP being configured in
such a way that communication with a remote QP would silently fail?

I realize that a user could call ib_get_cached_pkey and see if the
returned value matches the one in the original search, but this is a
non-obvious way to check for a mismatch.

I'm not against this patch, but I want to make sure that I understand the
issues, so we're not creating a work-around solution. The patch is
against the librdmacm, yet there's nothing that I see in the librdmacm
that makes me think it's behaving incorrectly.

- Sean
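The mismatch check Sean describes (look up the index, then read the entry back and compare it with the value searched for) might look roughly like the sketch below. It is a plain Python model, not kernel code: `pkey_table` is a hypothetical list standing in for the port's cached PKey table, and `find_pkey_checked` is an illustrative name that does not exist in the IB core.

```python
MEMBERSHIP_BIT = 0x8000  # MSb of the 16-bit pkey: 1 = full, 0 = partial member
BASE_MASK = 0x7FFF       # low 15 bits identify the partition


def find_pkey_checked(pkey_table, pkey):
    """Return (index, exact) for the first entry in the same partition.

    index is None when no entry shares the low 15 bits with pkey.
    exact is True only when the stored entry equals pkey, membership
    bit included, which is the comparison Sean suggests a caller make.
    """
    for index, entry in enumerate(pkey_table):
        if (entry & BASE_MASK) == (pkey & BASE_MASK):
            return index, entry == pkey
    return None, False


# A port that the SM configured as a partial member of the default partition.
table = [0x7FFF]
index, exact = find_pkey_checked(table, 0xFFFF)
print(index, exact)
```

A caller that wants to surface a configuration problem rather than mask it could treat `exact == False` as a warning condition.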
Re: [openib-general] [PATCH] librdmacm: fix bug causing failure to work with partial membership pkey
On Wed, 2007-02-21 at 15:45, Or Gerlitz wrote:
> On 21 Feb 2007 08:20:23 -0500, Hal Rosenstock <[EMAIL PROTECTED]> wrote:
> > On Wed, 2007-02-21 at 07:35, Or Gerlitz wrote:
>
> > > > I believe it is a spec (compliance) violation for the port to be a
> > > > partial member and join as a full member.
>
> > > Since partial members can't talk among themselves, there is no
> > > reason to form a multicast group containing --only-- ports that can
> > > --not-- talk to each other... So if the spec does not allow this
> > > (having a partial member joining with the full member pkey) - it's a
> > > spec bug...
>
> > I think there are two issues here then:
> > 1. If this is the case, getting the spec changed to accommodate this
> >    use case
> > 2. I believe that OpenIB code is supposed to be spec compliant.
>
> If the IPoIB spec does not allow both partial and full members of a
> partition to share a broadcast domain (e.g. the IPv4 broadcast group
> associated with the full membership pkey) or any other multicast
> group, burn it (or at least the relevant section).

I was referring to the IB spec, not an IPoIB RFC.

> The OpenIB code is supposed to work and, as done with the RDMA CM
> header, the implementation should not wait for spec to be written or
> changed.

Really? Maybe I'm mistaken but I didn't think that OpenIB/OpenFabrics
wanted to issue code which is not IBA spec compliant.

-- Hal

> Or.
Re: [openib-general] [PATCH] librdmacm: fix bug causing failure to work with partial membership pkey
On 2/21/07, Sean Hefty <[EMAIL PROTECTED]> wrote:
> >There is no problem. As I have explained over this thread, the ipoib
> >and the core abstract away from the user the actual value of the MSb
> >of the pkey, that is, whether it is a full or partial membership pkey.
>
> But *why* does the kernel code do this, and should it?

It does this since it makes life simple and robust.

Note that since the HCA validates the pkey in the incoming packet, no
matter what the IB SW would do, partial members of a partition can't
talk to each other. So the approach taken by the core/ipoib code was to
just ignore the MSb in places where the code looks for the pkey --index--
and use the full member pkey when forming MGIDs. This seems fine to me.

Or.
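The hardware check Or refers to can be modelled in a few lines. The sketch below assumes the usual IBA matching rule (the low 15 bits must be equal and at least one of the two keys must have the membership bit set) and leaves out the invalid-key case; `pkeys_match` is an illustrative name, not an API from the thread.

```python
def pkeys_match(local_pkey, packet_pkey):
    """Model of the P_Key check an HCA applies to an incoming packet.

    Two keys match when they name the same partition (low 15 bits equal)
    and at least one side is a full member (bit 15 set). Two partial
    members of the same partition therefore never match.
    """
    same_partition = (local_pkey & 0x7FFF) == (packet_pkey & 0x7FFF)
    one_side_full = bool((local_pkey | packet_pkey) & 0x8000)
    return same_partition and one_side_full


print(pkeys_match(0xFFFF, 0x7FFF))  # full member <-> partial member
print(pkeys_match(0x7FFF, 0x7FFF))  # partial member <-> partial member
```

Under this rule a partition whose ports are all partial members passes every software lookup that ignores the MSb, yet no packet is ever accepted, which is the silent failure discussed in the thread.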
Re: [openib-general] [PATCH] librdmacm: fix bug causing failure to work with partial membership pkey
On 21 Feb 2007 08:20:23 -0500, Hal Rosenstock <[EMAIL PROTECTED]> wrote:
> On Wed, 2007-02-21 at 07:35, Or Gerlitz wrote:

> > > I believe it is a spec (compliance) violation for the port to be a
> > > partial member and join as a full member.

> > Since partial members can't talk among themselves, there is no reason
> > to form a multicast group containing --only-- ports that can --not--
> > talk to each other... So if the spec does not allow this (having a
> > partial member joining with the full member pkey) - it's a spec bug...

> I think there are two issues here then:
> 1. If this is the case, getting the spec changed to accommodate this
>    use case
> 2. I believe that OpenIB code is supposed to be spec compliant.

If the IPoIB spec does not allow both partial and full members of a
partition to share a broadcast domain (e.g. the IPv4 broadcast group
associated with the full membership pkey) or any other multicast group,
burn it (or at least the relevant section).

The OpenIB code is supposed to work and, as done with the RDMA CM header,
the implementation should not wait for spec to be written or changed.

Or.
Re: [openib-general] [PATCH] librdmacm: fix bug causing failure to work with partial membership pkey
>There is no problem. As I have explained over this thread, the ipoib
>and the core abstract away from the user the actual value of the MSb
>of the pkey, that is, whether it is a full or partial membership pkey.

But *why* does the kernel code do this, and should it?

- Sean
Re: [openib-general] [PATCH] librdmacm: fix bug causing failure to work with partial membership pkey
On 2/21/07, Sean Hefty <[EMAIL PROTECTED]> wrote:
>>However, no matter what the SM configures, the core & ipoib code act as
>>if the full pkey is there. This is a nice simplification and it works
>>well.

> Is the problem here really in the librdmacm or in the core/ipoib software?

There is no problem. As I have explained over this thread, the ipoib and
the core abstract away from the user the actual value of the MSb of the
pkey, that is, whether it is a full or partial membership pkey.

IPoIB does it by OR-ing 0x8000 into the pkey it uses, and the core does
it in ib_find_cached_pkey() which, when provided a pkey, returns the
index of $pkey or of $pkey & 0x7fff, whichever one of them is there.

The only missing piece is for librdmacm to play this game as well, and
the patch does this.

> (I looked at the patch, but haven't looked into the full reason why it's
> needed.)

Start with checking me... tell the SM to configure 0x7fff instead of 0x
to one of your nodes as the pkey at index 0, then see that ping is
working but librdmacm RC utils such as rping or ib_rdma_bw -c do not.
Then apply the patch and check again.

Or.
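The two halves of the abstraction described above (IPoIB forcing the membership bit on, the core ignoring it during lookup) can be sketched as follows. This is a simplified Python model built only from the description in this message; `pkey_table`, `ipoib_device_pkey` and `find_pkey_index` are hypothetical names, and the real ib_find_cached_pkey() signature and error handling are not reproduced.

```python
FULL_MEMBER = 0x8000  # MSb set means full membership


def ipoib_device_pkey(user_pkey):
    """IPoIB side: always carry the full-membership form of the pkey."""
    return user_pkey | FULL_MEMBER


def find_pkey_index(pkey_table, pkey):
    """Core side: find the table index while ignoring the MSb.

    Returns the index of the first entry whose low 15 bits equal those
    of pkey, or None when the partition is not in the table at all.
    """
    wanted = pkey & 0x7FFF
    for index, entry in enumerate(pkey_table):
        if (entry & 0x7FFF) == wanted:
            return index
    return None


# Node configured by the SM with only the partial key at index 0.
table = [0x7FFF]
device_pkey = ipoib_device_pkey(0x7FFF)
print(hex(device_pkey), find_pkey_index(table, device_pkey))
```

A user-space lookup that insists on an exact 16-bit match would return nothing for `0xffff` against this table, which matches the rping failure Or asks Sean to reproduce.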
Re: [openib-general] [PATCH] librdmacm: fix bug causing failure to work with partial membership pkey
On Wed, 2007-02-21 at 07:35, Or Gerlitz wrote:
> >> However, no matter what the SM configures, the core & ipoib code act
> >> as if the full pkey is there. This is a nice simplification and it
> >> works well.
>
> > I believe it is a spec (compliance) violation for the port to be a
> > partial member and join as a full member.
>
> Since partial members can't talk among themselves, there is no reason to
> form a multicast group containing --only-- ports that can --not-- talk
> to each other... So if the spec does not allow this (having a partial
> member joining with the full member pkey) - it's a spec bug...

I think there are two issues here then:
1. If this is the case, getting the spec changed to accommodate this use
   case.
2. I believe that OpenIB code is supposed to be spec compliant.

-- Hal

> Or.
Re: [openib-general] [PATCH] librdmacm: fix bug causing failure to work with partial membership pkey
>> However, no matter what the SM configures, the core & ipoib code act as
>> if the full pkey is there. This is a nice simplification and it works
>> well.

> I believe it is a spec (compliance) violation for the port to be a
> partial member and join as a full member.

Since partial members can't talk among themselves, there is no reason to
form a multicast group containing --only-- ports that can --not-- talk to
each other... So if the spec does not allow this (having a partial member
joining with the full member pkey) - it's a spec bug...

Or.
Re: [openib-general] [PATCH] librdmacm: fix bug causing failure to work with partial membership pkey
On Wed, 2007-02-21 at 01:43, Or Gerlitz wrote:
> Hal Rosenstock wrote:
> > On Tue, 2007-02-20 at 10:38, Or Gerlitz wrote:
>
> >> Yes. It's a little bit confusing: partial and full members of an
> >> IPoIB IB partition use the same MGID. When an IPoIB MGID is
> >> constructed, the pkey placed by the driver is --always-- the full
> >> membership one. However, on a node with partial membership, what's
> >> plugged into the QP is the pkey index of the partial instance...
>
> > So in this case, do both the full and partial keys need configuring
> > for that port?
>
> No. The SM configures --either-- the full or the partial pkey.

That's what I was afraid of :-(

> However, no matter what the SM configures, the core & ipoib code act as
> if the full pkey is there. This is a nice simplification and it works
> well.

I believe it is a spec (compliance) violation for the port to be a
partial member and join as a full member.

-- Hal

> Or.
Re: [openib-general] [PATCH] librdmacm: fix bug causing failure to work with partial membership pkey
Hal Rosenstock wrote:
> On Tue, 2007-02-20 at 10:38, Or Gerlitz wrote:

>> Yes. Its a little bit confusing: partial and full members of an IPoIB IB
>> partition use the same MGID. When an IPoIB MGID is constructed, the pkey
>> placed by the driver is --always-- the full membership one. However, on
>> a node with partial membership, what's plugged into the QP is the pkey
>> index of the partial instance...

> So in this case, do both the full and partial keys need configuring for
> that port ?

No. The SM configures --either-- the full or the partial pkey.

However, no matter what the SM configures, the core & ipoib code act as
if the full pkey is there. This is a nice simplification and it works well.

Or.
Re: [openib-general] [PATCH] librdmacm: fix bug causing failure to work with partial membership pkey
On Tue, 2007-02-20 at 10:38, Or Gerlitz wrote:
> Hal Rosenstock wrote:
>
> >> The pkey extracted by the RDMA CM from the IPoIB device hardware
> >> address always has the full membership bit set. However, when looking
> >> in the pkey table the search must mask out the full membership bit.
>
> > Is this true for both RC and UD QPs ? I thought that at least the UD QPs
> > were being used for multicast in which case wouldn't full member be
> > required for this ?
>
> Yes. Its a little bit confusing: partial and full members of an IPoIB IB
> partition use the same MGID. When an IPoIB MGID is constructed, the pkey
> placed by the driver is --always-- the full membership one. However, on
> a node with partial membership, what's plugged into the QP is the pkey
> index of the partial instance...

So in this case, do both the full and partial keys need configuring for
that port ?

-- Hal

> In the kernel all this is nicely hidden from the IB ULPs in
> ib_find_cached_pkey().
>
> Or.
Re: [openib-general] [PATCH] librdmacm: fix bug causing failure to work with partial membership pkey
Hal Rosenstock wrote:

>> The pkey extracted by the RDMA CM from the IPoIB device hardware
>> address always has the full membership bit set. However, when looking
>> in the pkey table the search must mask out the full membership bit.

> Is this true for both RC and UD QPs ? I thought that at least the UD QPs
> were being used for multicast in which case wouldn't full member be
> required for this ?

Yes. It's a little bit confusing: partial and full members of an IPoIB IB
partition use the same MGID. When an IPoIB MGID is constructed, the pkey
placed by the driver is --always-- the full membership one. However, on a
node with partial membership, what's plugged into the QP is the pkey index
of the partial instance...

In the kernel all this is nicely hidden from the IB ULPs in
ib_find_cached_pkey().

Or.
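The lookup described above (always searching with the full membership pkey, yet ending up with the index of whichever entry the SM configured) can be sketched as a table search that ignores the membership bit on both sides. This is a minimal sketch over a plain array of host-byte-order values, not the kernel's ib_find_cached_pkey() itself; the function name `find_pkey_index` and its signature are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: search a port's P_Key table for an entry in the
 * same partition as `pkey`, ignoring the membership bit (0x8000) on both
 * sides. On success, stores the entry's index in *index and returns 0;
 * returns -1 when no entry belongs to that partition.
 */
static int find_pkey_index(const uint16_t *table, size_t len,
                           uint16_t pkey, uint16_t *index)
{
    for (size_t i = 0; i < len; i++) {
        if ((table[i] & 0x7fff) == (pkey & 0x7fff)) {
            *index = (uint16_t)i;
            return 0;
        }
    }
    return -1;
}
```

For a table holding only the partial key 0x7001, a search for the full key 0xf001 returns that entry's index, which is the index that would then be plugged into the QP.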
Re: [openib-general] [PATCH] librdmacm: fix bug causing failure to work with partial membership pkey
On Mon, 2007-02-19 at 01:40, Or Gerlitz wrote:
> Hi Sean,
>
> this fixes a bug which did not allow to run librdmacm apps over a node
> which is partial member of a partition. The patch takes the approach of the
> kernel ib_find_cached_pkey implementation.
>
> If you approve this, i suggest pushing it also into OFED 1.2 as a bug fix.
>
> Or.
>
> --
> The pkey extracted by the RDMA CM from the IPoIB device hardware address
> always has the full membership bit set. However, when looking in the pkey
> table the search must mask out the full membership bit.
>
> Signed-off-by: Or Gerlitz <[EMAIL PROTECTED]>
> Signed-off-by: Olga Shern <[EMAIL PROTECTED]>
>
> diff --git a/src/cma.c b/src/cma.c
> index c5f8cd9..9c24c6a 100644
> --- a/src/cma.c
> +++ b/src/cma.c
> @@ -661,7 +661,7 @@ static int ucma_find_pkey(struct cma_dev
>
>      for (i = 0, ret = 0; !ret; i++) {
>          ret = ibv_query_pkey(cma_dev->verbs, port_num, i, &chk_pkey);
> -        if (!ret && pkey == chk_pkey) {
> +        if ((!ret && pkey == chk_pkey) || (!ret && htons(ntohs(pkey) & 0x7fff) == chk_pkey)) {

Is this true for both RC and UD QPs ? I thought that at least the UD QPs
were being used for multicast in which case wouldn't full member be
required for this ?

-- Hal

>              *pkey_index = (uint16_t) i;
>              return 0;
>          }
Re: [openib-general] [PATCH] librdmacm: fix bug causing failure to work with partial membership pkey
Or,

On 2/19/07, Or Gerlitz <[EMAIL PROTECTED]> wrote:
> Hi Sean,
>
> this fixes a bug which did not allow to run librdmacm apps over a node
> which is partial member of a partition. The patch takes the approach of the
> kernel ib_find_cached_pkey implementation.
>
> If you approve this, i suggest pushing it also into OFED 1.2 as a bug fix.
>
> Or.
>
> --
> The pkey extracted by the RDMA CM from the IPoIB device hardware address
> always has the full membership bit set. However, when looking in the pkey
> table the search must mask out the full membership bit.
>
> Signed-off-by: Or Gerlitz <[EMAIL PROTECTED]>
> Signed-off-by: Olga Shern <[EMAIL PROTECTED]>
>
> diff --git a/src/cma.c b/src/cma.c
> index c5f8cd9..9c24c6a 100644
> --- a/src/cma.c
> +++ b/src/cma.c
> @@ -661,7 +661,7 @@ static int ucma_find_pkey(struct cma_dev
>
>      for (i = 0, ret = 0; !ret; i++) {
>          ret = ibv_query_pkey(cma_dev->verbs, port_num, i, &chk_pkey);
> -        if (!ret && pkey == chk_pkey) {
> +        if ((!ret && pkey == chk_pkey) || (!ret && htons(ntohs(pkey) & 0x7fff) == chk_pkey)) {

What about just using:

    if (!ret && pkey | 0x8000 == chk_pkey | 0x8000) {

Even if not, there is no need to check ret twice in the limited
membership case.

-- Moni

>              *pkey_index = (uint16_t) i;
>              return 0;
>          }
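A note on the one-line form suggested above: in C, `==` binds more tightly than `|`, so `pkey | 0x8000 == chk_pkey | 0x8000` parses as `pkey | (0x8000 == chk_pkey) | 0x8000`, which is nonzero for any input. Each side therefore needs its own parentheses. The following is a minimal sketch of the intended comparison, assuming (as the htons()/ntohs() calls in the patch suggest) that both values are in network byte order, so the membership bit is built with htons(). The name `pkey_same_partition` is hypothetical.

```c
#include <arpa/inet.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: compare two network-byte-order P_Keys while
 * ignoring the membership bit. Forcing the bit on in both values leaves
 * only the 15-bit partition number to decide the result.
 */
static bool pkey_same_partition(uint16_t pkey, uint16_t chk_pkey)
{
    const uint16_t full = htons(0x8000); /* membership bit, network order */

    return (pkey | full) == (chk_pkey | full);
}
```

Inside the loop from the patch this would read `if (!ret && pkey_same_partition(pkey, chk_pkey))`, which checks ret only once.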