Re: [openib-general] failure to create an FMR mapping 1K pages on memfree

2007-02-27 Thread Or Gerlitz
On 2/27/07, Roland Dreier <[EMAIL PROTECTED]> wrote:
> Is it really returning -ENOMEM?  It seems much more likely that you
> are hitting the code
>
> /* For Arbel, all MTTs must fit in the same page. */
> if (mthca_is_memfree(dev) &&
> mr->attr.max_pages * sizeof *mr->mem.arbel.mtts > PAGE_SIZE)
> return -EINVAL;
>
> I guess you could call this limit a driver design issue.

Indeed, sorry for the inaccurate description: mthca_fmr_alloc returns
-EINVAL and the fmr pool code returns -ENOMEM. Thanks for the
clarification.
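
For reference, the arithmetic behind the check Roland quoted can be
sketched as below. This is a standalone illustration, not driver code:
the helper names are hypothetical, and it assumes each MTT entry is a
64-bit value and that PAGE_SIZE is 4096, in which case the limit is 512
pages, consistent with 256 working and 1024 failing.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper mirroring the mem-free check quoted above.
 * Assumption: one MTT entry is a 64-bit value (8 bytes).
 */
static bool fmr_mtts_fit_in_page(size_t max_pages, size_t page_size)
{
    return max_pages * sizeof(uint64_t) <= page_size;
}

/* Largest max_pages value that still passes the check. */
static size_t fmr_max_pages_limit(size_t page_size)
{
    return page_size / sizeof(uint64_t);
}
```

With a 4096-byte page, fmr_max_pages_limit() gives 512, i.e. a single
FMR could map at most 2MB of 4K pages under these assumptions.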

Or.

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general



Re: [openib-general] failure to create an FMR mapping 1K pages on memfree

2007-02-26 Thread Or Gerlitz
Oops, I forgot to CC openib-general.

On 2/26/07, Or Gerlitz <[EMAIL PROTECTED]> wrote:
> Hi Roland,
>
> I have got a report on failure to create FMR mapping 1K pages (that is
> 4MB) on memfree.
>
> I don't have the exact details (ie if Arbel/Sinai / what FW  / etc)
> nor which exact check fails in
> mthca_fmr_alloc, but what's clear is that the latter function returns
> -ENOMEM when attr.max_pages is 1024 and it works fine when
> attr.max_pages is 256.
>
> Is this failure clear to you? If yes, is a HW or FW limit being
> hit, or is it a driver design issue?
>
> Or.
>




Re: [openib-general] ipoib & the partial pkey

2007-02-26 Thread Or Gerlitz
Hal Rosenstock wrote:
> On Sun, 2007-02-25 at 05:48, Or Gerlitz wrote:

>> Just to have us agree on the quote, it is from section 4 of rfc 4392 
>> (page 14) eg in http://www.ietf.org/rfc/rfc4392.txt

>>> at the time of creating an IB multicast group, multiple values such as the
>>> P_Key, Q_Key, Service Level, Hop Limit, Flow ID, TClass, MTU, etc.  have to be
>>> specified.  These values should be such that all potential members of the IB
>>> multicast group are able to communicate with one another when using them.

>> OK, I suggest to remove this spec limitation,

> IMO you would need to get the IB spec changed first in order to do this.

Do you refer to this?

> What about the description of P_Key in MCMemberRecord (table 210 on p.
> 908 which is compliance) which states:
> 
> "All members of the multicast group shall have full membership in the
> partition indicated by the partition key."

If yes, then indeed, this also has to be changed.

Or.





Re: [openib-general] ipoib & the partial pkey

2007-02-25 Thread Or Gerlitz
Sean Hefty wrote:
> I looked into this more...
> RFC 4391 states (middle of page 5):
> For a node to join a partition, one of its ports must be assigned the relevant
> P_Key by the SM [RFC4392].

> Jumping to RFC 4392 (top of page 4):

Just so we agree on the quote: it is from section 4 of RFC 4392
(page 14), e.g. in http://www.ietf.org/rfc/rfc4392.txt

> at the time of creating an IB multicast group, multiple values such as the
> P_Key, Q_Key, Service Level, Hop Limit, Flow ID, TClass, MTU, etc.  have to be
> specified.  These values should be such that all potential members of the IB
> multicast group are able to communicate with one another when using them.

OK, I suggest removing this spec limitation, as it does not allow the
use case of a server using a partition in which inter-client
communication is not allowed.

Actually, since it prevents people from using partial membership
partitioning with IPoIB (every ipoib device needs to join the
broadcast group), it is probably a spec bug and not a limitation
introduced on purpose.

A simple real-life example is an I/O target: the system admin wants IB
block and/or file storage traffic to use a partition, but does not
want initiators to communicate among themselves on this partition.

To achieve that, the SM is configured to assign the partial pkey to the
initiator nodes and the full pkey to the target ports.

The current implementation of IPoIB and core perfectly (and 
transparently...) supports that.

Or.





Re: [openib-general] [PATCH] librdmacm: fix bug causing failure to work with partial membership pkey

2007-02-22 Thread Or Gerlitz
On 2/22/07, Sean Hefty <[EMAIL PROTECTED]> wrote:
> >My understanding is that when an IPoIB broadcast domain contains both
> >partial and full members (*) attempts to communicate between two partial
> >members would silently fail, does this silence is something you think we
> >should work to change?
>
> I'm looking at this from a different view than just ipoib multicast groups.  For
> example, can two users of the ib_cm successfully establish a connection, but not
> actually be able to transfer data between each other?  This seems possible,
> though unlikely.  This is the type of silent failure I'm referring to.

I don't think this is possible, since the active CM uses the pkey index
of the pkey provided in REQ.path to send the REQ MAD. The same holds for
the passive CM: it uses the index in its table of REQ.path.pkey. So if
the CMs are able to talk over QP1 using this pkey index, the CM
consumers can talk over their RC (REQ) / UD (SIDR REQ) QPs. Both
the CM and its consumers would use the same index: the one returned
from the ib_get_cached_pkey.

Or.




Re: [openib-general] [PATCH] librdmacm: fix bug causing failure to work with partial membership pkey

2007-02-22 Thread Or Gerlitz
On 2/22/07, Sean Hefty <[EMAIL PROTECTED]> wrote:
> >An IB multicast group _cannot_ have partial members so this never should
> >get far enough to where two limited members would be unable to
> >communicate.

> Can someone help my understanding here?  Is ipoib joining a multicast group
> using the full membership PKey, even if the node that it joins from only has the
> limited membership PKey configured? And the code in ib_find_cached_pkey helps
> enable this?

Yep. The ipoib create_child function ORs 0x8000 into the device pkey
which was provided by the user. IPoIB then uses the device pkey when
forming MGIDs and when doing modify QP to INIT. Indeed, the way
ib_find_cached_pkey() is implemented makes the latter use trivial.

Or.




Re: [openib-general] librdmacm examples not working under OFED 1.2 alpha

2007-02-22 Thread Or Gerlitz
Steve Wise wrote:
> What device?

mthca





[openib-general] librdmacm examples not working under OFED 1.2 alpha

2007-02-22 Thread Or Gerlitz
I have tested RH4 U4 and to some extent also RH5 beta and see the following:

under RH4 U4
============

- rping: addr and route resolution passing, client getting reject on conn req

- udaddy: working fine on both UDP and IPOIB port spaces

- mckey: not applicable on RH4 U4 till my patch with ip_ib_mc_map is merged

Under both udaddy and rping, librdmacm reports:

librdmacm: couldn't read ABI version.
librdmacm: assuming: 4

under RH5
=========

Basically the same: rping does not work, udaddy works on both port spaces.
I was also able to check mckey, and it works fine on both port spaces.
The ABI error print is not seen.

The rping client/server logs are below,

Or.

rping client
============
[EMAIL PROTECTED] ~]# rping -c -v -d -a 193.168.80.175
ipaddr (193.168.80.175)
librdmacm: couldn't read ABI version.
librdmacm: assuming: 4
created cm_id 0x505f10
cma_event type 0 cma_id 0x505f10 (parent)
cma_event type 2 cma_id 0x505f10 (parent)
rdma_resolve_addr - rdma_resolve_route successful
created pd 0x507830
created channel 0x506260
created cq 0x507880
created qp 0x507990
rping_setup_buffers called on cb 0x505010
allocated & registered buffers...
cq_thread started.
cq completion failed status 5
wait for CONNECTED state 10
connect error -1
rping_free_buffers called on cb 0x505010
cma_event type 8 cma_id 0x505f10 (parent)
cma event 8, error 0

rping server
============
[EMAIL PROTECTED] ~]# rping -s -d -v -S 100 -C 100
verbose
size 100
count 100
librdmacm: couldn't read ABI version.
librdmacm: assuming: 4
created cm_id 0x505f00
rdma_bind_addr successful
rdma_listen





Re: [openib-general] [PATCH] librdmacm: fix bug causing failure to work with partial membership pkey

2007-02-22 Thread Or Gerlitz
Sean Hefty wrote:
>> Note that since the HCA validates the pkey in the in coming packet, no
>> matter what the IB SW would do, partial members of a partition can't
>> talk to each other. So the approach taken by the core/ipoib code was
>> to just ignore the MSb in places where the code looks for the pkey
>> --index-- and use the full member pkey when forming MGIDs. This seems
>> fine to me.

> My concern is that ib_find_cached_pkey() returns an index to a pkey that wasn't
> the one in the search.  Can this lead to a QP being configured in such a way
> that communication with a remote QP would silently fail?

My understanding is that when an IPoIB broadcast domain contains both
partial and full members (*), attempts to communicate between two partial
members would silently fail. Is this silence something you think we
should work to change?

(*) e.g. when you have a bunch of clients and a server, or a bunch of
servers, and you don't want to allow --clients-- to communicate among
themselves.

> I'm not against this patch, but I want to make sure that I understand the
> issues, so we're not creating a work-around solution.  The patch is against the
> librdmacm, yet there's nothing that I see in the librdmacm that makes me think
> it's behaving incorrectly.

My thinking is that if at the end of this thread we are willing to move
forward without changing ib_find_cached_pkey(), then this patch should be
merged.

Or.





Re: [openib-general] [PATCH] librdmacm: fix bug causing failure to work with partial membership pkey

2007-02-21 Thread Or Gerlitz
Hal Rosenstock wrote:
> On Wed, 2007-02-21 at 15:45, Or Gerlitz wrote:
>> On 21 Feb 2007 08:20:23 -0500, Hal Rosenstock <[EMAIL PROTECTED]> wrote:

>> If the IPoIB spec does not allow both partial and full members of a
>> partition to share a broadcast domain (eg the IPv4 broadcast group
>> associated with the full membership pkey) or any other multicast
>> group, burn it (or at least the relevant section).

> I was referring to the IB spec, not an IPoIB RFC.

Can you provide a pointer?

>> The OpenIB code supposed to work and as done with the RDMA CM header,
>> the implementation should not wait for spec to be written or changed.

> Really ? Maybe I'm mistaken but I didn't think that OpenIB/OpenFabrics
> wanted to issue code which is not IBA spec compliant.

The code resides in the Linux kernel, period. Linux is not under the
control of this or that organization, period, period. Linux uses a
hierarchical maintainership structure where Roland, Sean and yourself are
listed as the maintainers, which means you are able to promote and/or
block this or that agenda, go for it!

Or.





Re: [openib-general] [PATCH] librdmacm: fix bug causing failure to work with partial membership pkey

2007-02-21 Thread Or Gerlitz
On 2/21/07, Sean Hefty <[EMAIL PROTECTED]> wrote:
> >There is no problem. As i have explained over this thread the ipoib
> >and the core abstract away from the user the actual value of the MSb
> >of the pkey, that is whether it is a full or partial membership pkey.
>
> But *why* does the kernel code do this, and should it?

It does this since it makes life simple and robust.

Note that since the HCA validates the pkey in the incoming packet, no
matter what the IB SW would do, partial members of a partition can't
talk to each other. So the approach taken by the core/ipoib code was
to just ignore the MSb in places where the code looks for the pkey
--index-- and use the full member pkey when forming MGIDs. This seems
fine to me.
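
The membership rule referred to above can be written down as a small
predicate. This is only a sketch of the rule as I understand it from the
IB spec (the low 15 bits must match and at least one side must be a full
member); the function and macro names are made up for illustration and
this is not what any HCA or driver literally runs.

```c
#include <stdbool.h>
#include <stdint.h>

#define PKEY_FULL_MEMBER_BIT 0x8000u  /* MSb set: full membership */
#define PKEY_BASE_MASK       0x7fffu  /* low 15 bits: partition number */

static bool pkey_is_full_member(uint16_t pkey)
{
    return (pkey & PKEY_FULL_MEMBER_BIT) != 0;
}

/*
 * Illustrative predicate: two P_Keys allow communication when they name
 * the same partition and they are not both limited (partial) members.
 */
static bool pkeys_can_communicate(uint16_t a, uint16_t b)
{
    if ((a & PKEY_BASE_MASK) != (b & PKEY_BASE_MASK))
        return false;
    return pkey_is_full_member(a) || pkey_is_full_member(b);
}
```

So an initiator holding 0x0001 can reach a target holding 0x8001, but
two initiators holding 0x0001 cannot reach each other.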

Or.




Re: [openib-general] [PATCH] librdmacm: fix bug causing failure to work with partial membership pkey

2007-02-21 Thread Or Gerlitz
On 21 Feb 2007 08:20:23 -0500, Hal Rosenstock <[EMAIL PROTECTED]> wrote:
> On Wed, 2007-02-21 at 07:35, Or Gerlitz wrote:

> > > I believe it is a spec (compliance) violation for the port to be a
> > > partial member and join as a full member.

> > Since partial members can't talk among themselves, there is no reason to
> > form a multicast group containing --only-- ports that can --not-- talk
> > to each other... So if the spec does not allow this (having a partial
> > member joining with the full member pkey) - it a spec bug...

> I think there are two issues here then:
> 1. If this is the case, getting the spec changed to accommodate this use case
> 2. I believe that OpenIB code is supposed to be spec compliant.

If the IPoIB spec does not allow both partial and full members of a
partition to share a broadcast domain (eg the IPv4 broadcast group
associated with the full membership pkey) or any other multicast
group, burn it (or at least the relevant section).

The OpenIB code is supposed to work and, as was done with the RDMA CM header,
the implementation should not wait for a spec to be written or changed.

Or.




Re: [openib-general] [PATCH] librdmacm: fix bug causing failure to work with partial membership pkey

2007-02-21 Thread Or Gerlitz
On 2/21/07, Sean Hefty <[EMAIL PROTECTED]> wrote:
>>However, no matter what the SM configures, the core & ipoib code act as
>>the full pkey is there. This is nice simplification and it works well.

> Is the problem here really in the librdmacm or in the core/ipoib software?

There is no problem. As I have explained in this thread, the ipoib
and the core abstract away from the user the actual value of the MSb
of the pkey, that is, whether it is a full or partial membership pkey.
IPoIB does it by OR-ing 0x8000 into the pkey it uses, and the core does
it in ib_find_cached_pkey() which, when provided a pkey, returns the
index of $pkey or of $pkey & 0x7fff, whichever one of them is
there. The only missing piece is for librdmacm to play this game as
well, and the patch does this.
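
To make the lookup behaviour described above concrete, here is a
standalone sketch of a pkey table search that ignores the membership
bit. It is not the kernel's ib_find_cached_pkey() nor the librdmacm
code; the function name and the plain array standing in for the port's
pkey table are assumptions for illustration, and values are in host
byte order.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical lookup: find the index of the table entry that names the
 * same partition as 'pkey', regardless of the full/partial membership
 * bit (0x8000) on either side.
 * Returns 0 and stores the index on success, -1 if no entry matches.
 */
static int find_pkey_index_ignoring_membership(const uint16_t *table,
                                               size_t len,
                                               uint16_t pkey,
                                               uint16_t *index)
{
    for (size_t i = 0; i < len; i++) {
        if ((table[i] & 0x7fff) == (pkey & 0x7fff)) {
            *index = (uint16_t) i;
            return 0;
        }
    }
    return -1;
}
```

With a table holding only the partial pkey 0x7fff at index 0, a search
for 0xffff still returns index 0, which is the behaviour the ULPs rely
on.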

> (I looked at the patch, but haven't looked into the full reason why it's
> needed.)

start with checking me... tell the SM to configure 0x7fff instead of
0x to one of your nodes  as the pkey at index 0, then see that
ping is working but librdmacm RC utils such as rping or ib_rdma_bw -c
do not. Then apply the patch and check again.

Or.




Re: [openib-general] [PATCH for-2.6.21] IPoIB/cm: improve small message bandwidth

2007-02-21 Thread Or Gerlitz
Michael S. Tsirkin wrote:
> Avoid overhead of freeing/reallocating and mapping/unmapping for dma
> for pages that have not been written to by hardware.

> diff --git a/drivers/infiniband/ulp/ipoib/ipoib_cm.c b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
> index 8ee6f06..a23c8e3 100644
> --- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c
> +++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
> @@ -68,14 +68,14 @@ struct ipoib_cm_id {
>  static int ipoib_cm_tx_handler(struct ib_cm_id *cm_id,
>  struct ib_cm_event *event);
>  
> -static void ipoib_cm_dma_unmap_rx(struct ipoib_dev_priv *priv,
> +static void ipoib_cm_dma_unmap_rx(struct ipoib_dev_priv *priv, int frags,
> u64 mapping[IPOIB_CM_RX_SG])
>  {
>   int i;
>  
>   ib_dma_unmap_single(priv->ca, mapping[0], IPOIB_CM_HEAD_SIZE, DMA_FROM_DEVICE);
>  
> - for (i = 0; i < IPOIB_CM_RX_SG - 1; ++i)
> + for (i = 0; i < frags; ++i)
>   ib_dma_unmap_single(priv->ca, mapping[i + 1], PAGE_SIZE, DMA_FROM_DEVICE);
>  }

I understand that in ipoib_cm_alloc_rx_skb you call dma_map_page on
IPOIB_CM_RX_SG pages, whereas here you call dma_unmap_single only $frags
times, correct? Does this mean you are trashing the IOMMU etc. of
the system?

Or.





Re: [openib-general] [PATCH] librdmacm: fix bug causing failure to work with partial membership pkey

2007-02-21 Thread Or Gerlitz
>> However, no matter what the SM configures, the core & ipoib code act as 
>> the full pkey is there. This is nice simplification and it works well.

> I believe it is a spec (compliance) violation for the port to be a
> partial member and join as a full member.

Since partial members can't talk among themselves, there is no reason to
form a multicast group containing --only-- ports that can --not-- talk
to each other... So if the spec does not allow this (having a partial
member joining with the full member pkey), it is a spec bug...

Or.






Re: [openib-general] [PATCH] librdmacm: fix bug causing failure to work with partial membership pkey

2007-02-21 Thread Or Gerlitz
>> Yes. Its a little bit confusing: partial and full members of an IPoIB IB 
>> partition use the same MGID. When an IPoIB MGID is constructed, the pkey 
>> placed by the driver is --always-- the full membership one. However, on 
>> a node with partial membership, what's plugged into the QP is the pkey 
>> index of the partial instance...

> So in this case, do both the full and partial keys need configuring for
> that port ?

No. The SM configures --either-- the full or the partial pkey.

However, no matter what the SM configures, the core & ipoib code act as
if the full pkey is there. This is a nice simplification and it works well.

Or.





Re: [openib-general] [PATCH] librdmacm: fix bug causing failure to work with partial membership pkey

2007-02-20 Thread Or Gerlitz
Hal Rosenstock wrote:
> On Tue, 2007-02-20 at 10:38, Or Gerlitz wrote:

>> Yes. Its a little bit confusing: partial and full members of an IPoIB IB 
>> partition use the same MGID. When an IPoIB MGID is constructed, the pkey 
>> placed by the driver is --always-- the full membership one. However, on 
>> a node with partial membership, what's plugged into the QP is the pkey 
>> index of the partial instance...

> So in this case, do both the full and partial keys need configuring for
> that port ?

No. The SM configures --either-- the full or the partial pkey.

However, no matter what the SM configures, the core & ipoib code act as
if the full pkey is there. This is a nice simplification and it works well.

Or.





Re: [openib-general] [PATCH] librdmacm: fix bug causing failure to work with partial membership pkey

2007-02-20 Thread Or Gerlitz
Hal Rosenstock wrote:

>> The pkey extracted by the RDMA CM from the IPoIB device hardware address always
>> has the full membership bit set. However, when looking in the pkey table the
>> search must mask out the full membership bit.

> Is this true for both RC and UD QPs ? I thought that at least the UD QPs
> were being used for multicast in which case  wouldn't full member be
> required for this ?

Yes. It's a little bit confusing: partial and full members of an IPoIB IB
partition use the same MGID. When an IPoIB MGID is constructed, the pkey
placed by the driver is --always-- the full membership one. However, on
a node with partial membership, what's plugged into the QP is the pkey
index of the partial instance...

In the kernel all this is nicely hidden from the IB ULPs in 
ib_find_cached_pkey().
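
As an illustration of the MGID construction mentioned above, here is a
sketch that builds the IPv4 broadcast MGID using the layout from RFC
4391 (ff1<scope>:401b:<P_Key>:0000:0000:0000:ffff:ffff), always placing
the full membership pkey. The function name is hypothetical and this is
not the ipoib driver code.

```c
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: fill 'mgid' with the IPoIB IPv4 broadcast GID for
 * the partition named by 'pkey'. The membership bit is forced on, so
 * partial and full members end up with the same MGID.
 */
static void ipoib_ipv4_bcast_mgid(uint8_t mgid[16], uint16_t pkey,
                                  uint8_t scope)
{
    memset(mgid, 0, 16);
    mgid[0] = 0xff;                    /* multicast prefix */
    mgid[1] = 0x10 | (scope & 0x0f);   /* flags = transient, then scope */
    mgid[2] = 0x40;                    /* IPv4 IPoIB signature 0x401b */
    mgid[3] = 0x1b;
    pkey |= 0x8000;                    /* always the full membership pkey */
    mgid[4] = (uint8_t) (pkey >> 8);
    mgid[5] = (uint8_t) (pkey & 0xff);
    memset(mgid + 12, 0xff, 4);        /* IPv4 broadcast: all ones */
}
```

Calling it with pkey 0x0001 or 0x8001 and link-local scope (2) yields
the same 16 bytes, starting ff:12:40:1b:80:01.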

Or.





Re: [openib-general] Fork issues with simple MPI program

2007-02-20 Thread Or Gerlitz
Arlin Davis wrote:
> Any insight would be greatly appreciated. It was our assumption that the
> parent process can continue to use IB resources after the fixes went into
> 2.6.16 and OFED 1.1. Is this true?

As was discussed on this list on a few occasions: contrary to popular
belief, the fork support was introduced in libibverbs 1.1, whereas OFED 1.1
contains libibverbs 1.0.

Or.





Re: [openib-general] [openfabrics-ewg] OFED 1.2 alpha release

2007-02-19 Thread Or Gerlitz
Tziporet Koren wrote:
> Regarding RHEL4 U4 and IPoIB bug - Or just prepared a patch that should 
> fix it. We will merge it and test for the beta.

The patch will only fix the bug for RDMA CM multicast consumers, since,
unlike IPoIB, which gets the (wrong in the RH4 U4 case) L2 multicast
address from the stack, the RDMA CM has the multicast IP address and is
able to compute the correct L2 address. This is confusing, I know...








[openib-general] [PATCH] librdmacm: fix bug causing failure to work with partial membership pkey

2007-02-18 Thread Or Gerlitz
Hi Sean,

This fixes a bug which prevented librdmacm apps from running on a node
which is a partial member of a partition. The patch takes the approach of the
kernel ib_find_cached_pkey implementation.

If you approve this, I suggest pushing it also into OFED 1.2 as a bug fix.

Or.

--
The pkey extracted by the RDMA CM from the IPoIB device hardware address always
has the full membership bit set. However, when looking in the pkey table the
search must mask out the full membership bit.

Signed-off-by: Or Gerlitz <[EMAIL PROTECTED]>
Signed-off-by: Olga Shern <[EMAIL PROTECTED]>

diff --git a/src/cma.c b/src/cma.c
index c5f8cd9..9c24c6a 100644
--- a/src/cma.c
+++ b/src/cma.c
@@ -661,7 +661,7 @@ static int ucma_find_pkey(struct cma_dev

for (i = 0, ret = 0; !ret; i++) {
ret = ibv_query_pkey(cma_dev->verbs, port_num, i, &chk_pkey);
-   if (!ret && pkey == chk_pkey) {
+   if ((!ret && pkey  == chk_pkey) || (!ret && htons(ntohs(pkey) & 0x7fff)  == chk_pkey)) {
*pkey_index = (uint16_t) i;
return 0;
}




Re: [openib-general] uDAPL: RDMA Write example

2007-02-18 Thread Or Gerlitz
Christian Kaiser wrote:
> I'm trying to find a small sample program, that uses RDMA Write instead 
> of Send/Recv. In the sources there is no single uDAPL example program 
> and on the net neither.
> Could someone please help me to find something useful?

see http://dapl.svn.sourceforge.net/viewvc/dapl/trunk/test/dapltest

Anyway, can you comment on what using uDAPL buys you that you don't get
from coding to the verbs (libibverbs) and rdmacm (librdmacm)?

Or.





Re: [openib-general] IPv6 multicast address per NIC

2007-02-15 Thread Or Gerlitz
Hal Rosenstock wrote:
> Or,
> 
> On Thu, 2007-02-15 at 04:25, Or Gerlitz wrote:
>> Hi,
>>
>> I see that when IPv6 is enabled in the kernel, the stack joins for a 
>> --dedicated-- multicast group per each interface. Can anyone here supply 
>> me with a pointer to where this is defined, doing a quick look on rfc 
>> 3307 did not provide an answer.
> 
> You are referring to the solicited-node multicast address (see RFC
> 4291). There have been several different threads on issues relating to
> this on this list over time.

Thanks for the pointer, I will look into that.
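
For the record, RFC 4291 forms the solicited-node multicast address by
appending the low-order 24 bits of the interface's unicast address to
the prefix ff02:0:0:0:0:1:ff00::/104. A minimal sketch of that mapping
follows; the function name is made up, and the unicast address used in
the note after the code is a hypothetical one chosen only because its
low 24 bits match the ff02::1:ff98:6d group seen on ib0.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>

/*
 * Illustrative helper: derive the solicited-node multicast address
 * (ff02::1:ffXX:XXXX) from the low 24 bits of a unicast address.
 */
static void solicited_node_addr(const struct in6_addr *unicast,
                                struct in6_addr *out)
{
    /* First 13 bytes of ff02:0:0:0:0:1:ff00::/104 */
    static const unsigned char prefix[13] = {
        0xff, 0x02, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0x01, 0xff
    };

    memcpy(out->s6_addr, prefix, sizeof prefix);
    /* Last 3 bytes come from the unicast address. */
    memcpy(out->s6_addr + 13, unicast->s6_addr + 13, 3);
}
```

Feeding it fe80::202:c903:98:6d (hypothetical) gives ff02::1:ff98:6d,
which explains why each interface joins its own group in addition to
ff02::1.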

Or.





[openib-general] IPv6 multicast address per NIC

2007-02-15 Thread Or Gerlitz
Hi,

I see that when IPv6 is enabled in the kernel, the stack joins a
--dedicated-- multicast group for each interface. Can anyone here point
me to where this is defined? A quick look at RFC 3307 did not provide
an answer.

Or.

Below is the maddr show output on a node with two partitions on ib0. Note
that the --pkey-- is not present in the link addresses, since IPoIB fills
that in in its own copy (I don't mind sending a patch to fix that if anyone
here thinks it is helpful).

$ ip maddr show

> 41: ib0
> link  00:ff:ff:ff:ff:12:40:1b:00:00:00:00:00:00:00:00:00:00:00:01
> link  00:ff:ff:ff:ff:12:60:1b:00:00:00:00:00:00:00:01:ff:98:00:6d
> link  00:ff:ff:ff:ff:12:60:1b:00:00:00:00:00:00:00:00:00:00:00:01
> inet  224.0.0.1
> inet6 ff02::1:ff98:6d
> inet6 ff02::1
> 45: ib0.8001
> link  00:ff:ff:ff:ff:12:40:1b:00:00:00:00:00:00:00:00:00:00:00:01
> link  00:ff:ff:ff:ff:12:60:1b:00:00:00:00:00:00:00:01:ff:98:00:6d
> link  00:ff:ff:ff:ff:12:60:1b:00:00:00:00:00:00:00:00:00:00:00:01
> inet  224.0.0.1
> inet6 ff02::1:ff98:6d
> inet6 ff02::1
> 46: ib0.8003
> link  00:ff:ff:ff:ff:12:40:1b:00:00:00:00:00:00:00:00:00:00:00:01
> link  00:ff:ff:ff:ff:12:60:1b:00:00:00:00:00:00:00:01:ff:98:00:6d
> link  00:ff:ff:ff:ff:12:60:1b:00:00:00:00:00:00:00:00:00:00:00:01
> inet  224.0.0.1
> inet6 ff02::1:ff98:6d
> inet6 ff02::1
> 





Re: [openib-general] please pull for 2.6.21: fix + add IB multicast support

2007-02-14 Thread Or Gerlitz
Or Gerlitz wrote:
> Roland Dreier wrote:
>> I merged the "increment port number" and "remove redundant '_wq'"
>> patches from git.openfabrics.org/~shefty/scm/rdma-dev.git for-roland
>>
>> I plan to review to multicast stuff next week and I hope to merge it
>> for 2.6.21.  Or, have you or anyone else at Voltaire read over the
>> code in addition to using it?  Do you see anything that should be
>> cleaned up?
> 
> OK, I spent some time today on reviewing and playing with the ib_sa: 
> track multicast join/leave requests patch - and have no special 
> comments. I think the two patches are ready for merge, let me know if 
> you have any specific question.

Roland - any progress here?

Or.





Re: [openib-general] mvapich2 ofed 1.2 problem

2007-02-13 Thread Or Gerlitz
Roland Dreier wrote:
>  > How do I tell?  Can I tell from the .so files?
> 
> ldd on the .so and the app would probably give you good info.
> 
> I'm pretty sure that mpicc must be linking against an libibverbs 1.0
> from somewhere.

To be really sure which dynamic libraries were loaded, do

(gdb) info sharedlibrary

within the gdb console
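
If attaching gdb is inconvenient, a rough alternative is to have the
process list its own loaded objects with glibc's dl_iterate_phdr(). The
sketch below is an assumption-laden illustration (glibc only, helper
names are made up), not something mvapich2 or libibverbs provides.

```c
#define _GNU_SOURCE
#include <link.h>
#include <stdio.h>

/* Callback invoked once per loaded object; prints its path. */
static int print_loaded_object(struct dl_phdr_info *info, size_t size,
                               void *data)
{
    int *count = data;

    (void) size;
    /* Objects without a name are typically the main program itself. */
    printf("%s\n", info->dlpi_name[0] ? info->dlpi_name : "(unnamed)");
    (*count)++;
    return 0;   /* returning non-zero would stop the iteration */
}

/* Prints every loaded object and returns how many were seen. */
static int list_loaded_objects(void)
{
    int count = 0;

    dl_iterate_phdr(print_loaded_object, &count);
    return count;
}
```

Calling list_loaded_objects() early in main() shows the full path of
whichever libibverbs the dynamic linker actually picked up.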

Or.





Re: [openib-general] please pull for 2.6.21: fix + add IB multicast support

2007-02-11 Thread Or Gerlitz
Roland Dreier wrote:
> I merged the "increment port number" and "remove redundant '_wq'"
> patches from git.openfabrics.org/~shefty/scm/rdma-dev.git for-roland
> 
> I plan to review to multicast stuff next week and I hope to merge it
> for 2.6.21.  Or, have you or anyone else at Voltaire read over the
> code in addition to using it?  Do you see anything that should be
> cleaned up?

OK, I spent some time today reviewing and playing with the ib_sa:
track multicast join/leave requests patch, and have no special
comments. I think the two patches are ready for merge; let me know if
you have any specific questions.

Or.





Re: [openib-general] please pull for 2.6.21: fix + add IB multicast support

2007-02-09 Thread Or Gerlitz
On 2/9/07, Roland Dreier <[EMAIL PROTECTED]> wrote:
> I plan to review to multicast stuff next week and I hope to merge it for 2.6.21

Thanks, good news!

> Or, have you or anyone else at Voltaire read over the
> code in addition to using it?  Do you see anything that should be
> cleaned up?

OK, most of the review I did (and the interaction with Sean to add
changes) was on the rdma_cm: add multicast communication support patch,
and I was less focused on the ib_sa: track multicast join/leave requests
patch. However, I recall that there were some discussions between Sean
and Michael and that they reached an agreement.

I will look at the ib_sa patch on Sunday and let Sean/you know if I
have any comments.

Or.




Re: [openib-general] please pull for 2.6.21: fix + add IB multicast support

2007-02-08 Thread Or Gerlitz
Or Gerlitz wrote:
> Sean Hefty wrote:

>>> Sean Hefty (3):
>>>rdma_cm: Increment port number after close to avoid re-use.
>>>ib_sa: track multicast join/leave requests
>>>rdma_cm: add multicast communication support

>> Assuming that you haven't look at this yet, I updated the ib_sa patch 
>> above to shorten the workqueue name, plus added a fourth patch to 
>> shorten the workqueue names for ib_addr and rdma_cm.  E.g. "ib_mcast_wq" 
>> became "ib_mcast".

> Roland,

> We have been developing and testing a userspace rdma cm based
> multicast app over this code for the last two months and are very
> satisfied with it. The testing included IPoIB, the user space app and
> multicast interoperability between them.

Roland,

Can you comment on the status of this merge request?

thanks,

Or.





Re: [openib-general] please pull for 2.6.21: fix + add IB multicast support

2007-02-06 Thread Or Gerlitz
Sean Hefty wrote:
>> Sean Hefty (3):
>>rdma_cm: Increment port number after close to avoid re-use.
>>ib_sa: track multicast join/leave requests
>>rdma_cm: add multicast communication support
> 
> Assuming that you haven't looked at this yet, I updated the ib_sa patch
> above to shorten the workqueue name, plus added a fourth patch to 
> shorten the workqueue names for ib_addr and rdma_cm.  E.g. "ib_mcast_wq" 
> became "ib_mcast".

> Let me know if you need any assistance.

Roland,

Can you comment on the status of merging the multicast changes for 2.6.21?

We have been developing and testing a userspace rdma cm based
multicast app over this code for the last two months and are very
satisfied with it. The testing included IPoIB, the user space app and
multicast interoperability between them.

Or.






Re: [openib-general] Immediate data question

2007-02-05 Thread Or Gerlitz
On 2/5/07, Tang, Changqing <[EMAIL PROTECTED]> wrote:

> On sender side:
> opcode = IBV_WR_SEND_WITH_IMM;
> imm_data = my_4_bytes_data;
> Do I still need to specify sg_list and num_sge ?

At the sender side i think you can do well with:
opcode = IBV_WR_SEND
send_flags |= IBV_SEND_INLINE
sge.addr   = pointer to the 4 bytes
sge.length = 4
sge.lkey   = don't care

since the 4 bytes are --copied-- by the IB library from sge.addr
during the execution of ibv_post_send(), the ownership of sge.addr is
yours once the call returns.

> On receiver side, because the immediate data is inside the completion
> structure, do I need to post a receive for above message ?

yes, i don't see how you can get away from posting a receive WR

> The reason I ask is that at some point, I cannot (or it is hard to) provide
> registered memory only for 4 bytes data.

what about the mpi impl. header ??? do you have a case where only 4
bytes need to be passed to the other side?

Or.




Re: [openib-general] ip_ib_mc_map?

2007-02-04 Thread Or Gerlitz
Or Gerlitz wrote:
> On 2/2/07, Doug Ledford <[EMAIL PROTECTED]> wrote:
>> Yeah, I've got a setup, I just don't have any multicast tests that I
>> run.  Any test programs you have for multicast in particular would be
>> helpful.

> This is fairly simple to do: have some multicast traffic routed over
> an IPoIB subnet on two nodes, eg using
> 
> $ route add -net 224.0.0.0 netmask 255.0.0.0 dev ib0
> $ iperf -usB 224.5.5.5 -i 1

OK, verifying that the problem is gone by just running client/server is
actually harder, since while the problem persists the data still moves over
the broadcast group... so basically, the first thing you want to do is set
the routing, then open an iperf server and see whether the netstack has
computed a correct IPoIB multicast hw address and instructed the device to
use it.

> # iperf -usB 224.5.5.5 &

this is on U3, the stack correctly computed the hw addresses for 224.5.5.5
and 224.0.0.1

> # ip maddr show ib0
> 5:  ib0
> link  00:ff:ff:ff:ff:12:40:1b:00:00:00:00:00:00:00:00:00:05:05:05
> link  00:ff:ff:ff:ff:12:40:1b:00:00:00:00:00:00:00:00:00:00:00:01
> inet  224.5.5.5
> inet  224.0.0.1
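
The two link-layer addresses in the U3 output above follow the IPv4-over-IB multicast mapping. What follows is a minimal sketch of that mapping in plain C, not the kernel's ip_ib_mc_map itself. The names ipv4_to_ipoib_mcast and format_hw_addr are made up for illustration, and bytes 8-15 are simply left zero because that is what the output in this thread shows.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define IPOIB_HW_ADDR_LEN 20

/* Hypothetical helper: fill haddr with the IPoIB link-layer address for
 * an IPv4 multicast group given in host byte order. */
static void ipv4_to_ipoib_mcast(uint32_t group,
                                unsigned char haddr[IPOIB_HW_ADDR_LEN])
{
    int i;

    assert(haddr != NULL);
    haddr[0] = 0x00;                  /* reserved */
    haddr[1] = 0xff;                  /* multicast QPN 0xffffff */
    haddr[2] = 0xff;
    haddr[3] = 0xff;
    haddr[4] = 0xff;                  /* first byte of the MGID */
    haddr[5] = 0x12;                  /* link-local scope, as in the output */
    haddr[6] = 0x40;                  /* IPv4 signature 0x401b */
    haddr[7] = 0x1b;
    for (i = 8; i < 16; i++)
        haddr[i] = 0x00;              /* left zero in this sketch */
    haddr[16] = (group >> 24) & 0x0f; /* only the low 28 bits are used */
    haddr[17] = (group >> 16) & 0xff;
    haddr[18] = (group >> 8) & 0xff;
    haddr[19] = group & 0xff;
}

/* Hypothetical helper: colon-separated hex, the way ip maddr prints it.
 * out must have room for 60 bytes (20 * 3, including the final NUL). */
static void format_hw_addr(const unsigned char haddr[IPOIB_HW_ADDR_LEN],
                           char out[60])
{
    int i;

    for (i = 0; i < IPOIB_HW_ADDR_LEN; i++)
        sprintf(out + 3 * i, i < IPOIB_HW_ADDR_LEN - 1 ? "%02x:" : "%02x",
                haddr[i]);
}
```

Calling ipv4_to_ipoib_mcast(0xE0050505, ha) for 224.5.5.5 and formatting the result gives the first link line of the U3 output; a node that shows only the inet lines never got this far.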

this is on U4, the stack did not compute any hw addresses for 224.5.5.5 
and 224.0.0.1, the inet addresses are the output of /proc/net/igmp which 
means the stack is aware this node joins these groups but as we know the 
ARPHRD_INFINIBAND case was removed from the code computing a multicast 
link layer address...

> # ip maddr show ib0
> 8:  ib0
> inet  224.5.5.5
> inet  224.0.0.1

So basically, if on your U5-staged node you get the same
# ip maddr show output as on U3, we have made progress. Really verifying
that this traffic does not go over the broadcast group is a little bit
harder: you would need a third active IPoIB device (that is another node
or a second ipoib running device on the rx machine - eg ib1), run the
iperf multicast test and make sure the --rx counters-- of the third
device do not advance, whereas on U4 they would advance since all
mcast traffic goes on the broadcast channel.

Please let me know if you need any further clarifications on how to test 
this, and... thanks! for taking care of it.

Or.





Re: [openib-general] Detecting when an RDMA writer process disappears

2007-02-04 Thread Or Gerlitz
Mike Heffner wrote:
> Is there any method by which a receiving process that is polling in 
> preregistered memory regions for data from a sender performing RDMA 
> writes, can detect if the sender is killed? Say by a SIGKILL signal? The 
> RC connection is setup using the RDMA CM and there do not appear to be 
> any CM events created on the event channel

If you have a process with a connected RDMA CM ID whose associated peer
process died, you should get a DISCONNECTED event. How do you verify that
there is no rdma cm event present at the polling side?

Or.










Re: [openib-general] ip_ib_mc_map?

2007-02-01 Thread Or Gerlitz
On 2/2/07, Doug Ledford <[EMAIL PROTECTED]> wrote:
> > As of the importance for us to have IP multicast working fine with
> > IPoIB over RH4...
> > do you have an IB setup to test that?
>
> Yeah, I've got a setup, I just don't have any multicast tests that I
> run.  Any test programs you have for multicast in particular would be
> helpful.

This is fairly simple to do: have some multicast traffic routed over
an IPoIB subnet on two nodes, eg using

$ route add -net 224.0.0.0 netmask 255.0.0.0 dev ib0

and then

server

$ iperf -usB 224.5.5.5 -i 1

client

$ iperf -uc 224.5.5.5 -l 100 -b 50M -t 30 -i 1

Or.




Re: [openib-general] ip_ib_mc_map?

2007-02-01 Thread Or Gerlitz
On 2/1/07, Doug Ledford <[EMAIL PROTECTED]> wrote:
> On Thu, 2007-02-01 at 10:58 +0200, Or Gerlitz wrote:

> >  For a reason that no one at RH can trace... someone went and removed
> > all the support for ARPHRD_INFINIBAND multicast from u4 where it exists
> > perfectly fine in u3 and hopefully on u5 as well (Doug can you update?),
> > see https://bugs.openfabrics.org/show_bug.cgi?id=2661

> Yes.  It's been fixed for U5.  It wasn't that the patch got removed,
> it's that between U3 and U4 I did a complete rebase, which means that
> all the patches from U3 were tossed out the window and a complete new
> set made for U4.  I just missed re-adding this one in U4.

thanks for fixing this for U5 (which i understand is not out yet, correct?).

As it is important for us to have IP multicast working fine with
IPoIB over RH4...
do you have an IB setup to test that?

Or.




Re: [openib-general] ip_ib_mc_map?

2007-02-01 Thread Or Gerlitz
Michael S. Tsirkin wrote:
>> 1) adding a backport to the rdma_cm containing ip_ib_mc_map, period.

>> 2) having the rdma cm follow the net stack and make its consumer use the 
>> broadcast group.

> Correct. Since multicast is broken in other respects on U4
> (sockets can't join multicast groups), I think 2 is the simplest approach.

The situation in U4 is kind of more involved: sockets doing
IP_ADD_MEMBERSHIP to some multicast group are actually sending and
receiving traffic over the IPoIB broadcast group, which makes IPoIB on
such a cluster kind of hell.

> Anyone who wants IPoIB multicast should just stay away from U4.

We are still interested in being able to run our multicast app over the
RDMA CM and we want it to be done over the correct multicast group and
not over a broadcast group. So option 2 is a real problem for us.

Or.







Re: [openib-general] [mthca] Creation of a SRQ with many WR (> 16K) in kernel level fails

2007-02-01 Thread Or Gerlitz
Michael S. Tsirkin wrote:
>> As for the user space sharing of the same limitation, how about adding 
>> to the --kernel-- struct ib_device_attr "for user space" buddy fields to 
>> max_qp_wr max_srq_wr and max_cqe such that each hw driver set both 
>> values: for the "user space" field the actual hw limitation and for 
>> "kernel space" field a value which would pass kmalloc.

> We could do that I guess but no one so far used query in kernel,
> and userspace values are currently good.

srp calls ib_query_device but does not care for these fields, as for
IPoIB CM if you see things as in my other email, i guess you don't need
to query as well.

However, as this is a kind of easy to implement change which does not
break the user kernel ABI and allows kernel consumers to count on the query
results they got from the hw driver, in the longer term i think we do
want to have it done.

Or.









Re: [openib-general] [mthca] Creation of a SRQ with many WR (> 16K) in kernel level fails

2007-02-01 Thread Or Gerlitz
Roland Dreier wrote:
>  > anyway, the solution that comes into my mind is to disable creating a
>  > QP/SRQ for which > 128KB allocations are needed. So
>  > mthca_query_device() will set the max_qp_wr and max_srq_wr attributes
>  > to values whose derived size still allows to use kmalloc.
> 
> But that will limit the size of the queues that userspace can create
> too.  I guess we could allocate kernel wrid arrays with vmalloc(), but
> I wonder if anyone actually cares about this limit...

mmm, i would avoid vmalloc if possible. Allocating up to 128K bytes for a
kernel resource sounds fine.

As for the user space sharing of the same limitation, how about adding 
to the --kernel-- struct ib_device_attr "for user space" buddy fields to 
max_qp_wr max_srq_wr and max_cqe such that each hw driver set both 
values: for the "user space" field the actual hw limitation and for 
"kernel space" field a value which would pass kmalloc.

kernel ULPs calling ib_query_device would use the original fields, no
need to patch them. Same for user space ULPs, no need to patch them.

However, when the call is made from user space, uverbs_query_device 
copies to the resp struct the "user space" attr.
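
A rough sketch of the buddy-field idea, under stated assumptions: the struct and function names below (sketch_device_attr, sketch_fill_attr, sketch_uverbs_resp) and the user_* fields are hypothetical and do not exist in the real struct ib_device_attr. max_cqe is left out for brevity and would follow the same pattern. The QP cap is halved because, as noted elsewhere in this thread, one wrid array covers both the send and the receive queue.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed-down attributes with "for user space" buddies. */
struct sketch_device_attr {
    int max_qp_wr;        /* kernel value: derived size passes kmalloc */
    int max_srq_wr;
    int user_max_qp_wr;   /* user space value: the actual hw limitation */
    int user_max_srq_wr;
};

/* Hypothetical response structure handed back to user space. */
struct sketch_query_resp {
    int max_qp_wr;
    int max_srq_wr;
};

static int min_int(int a, int b)
{
    return a < b ? a : b;
}

/* Driver side: set both values for each limit. */
static void sketch_fill_attr(struct sketch_device_attr *attr,
                             int hw_qp_wr, int hw_srq_wr, size_t kmalloc_max)
{
    /* one u64 wrid entry per work request */
    int srq_cap = (int)(kmalloc_max / sizeof(uint64_t));
    /* a QP's wrid array holds entries for both rq and sq */
    int qp_cap = srq_cap / 2;

    assert(attr != NULL);
    attr->user_max_qp_wr = hw_qp_wr;
    attr->user_max_srq_wr = hw_srq_wr;
    attr->max_qp_wr = min_int(hw_qp_wr, qp_cap);
    attr->max_srq_wr = min_int(hw_srq_wr, srq_cap);
}

/* uverbs side: copy the "user space" values into the response. */
static void sketch_uverbs_resp(const struct sketch_device_attr *attr,
                               struct sketch_query_resp *resp)
{
    assert(attr != NULL && resp != NULL);
    resp->max_qp_wr = attr->user_max_qp_wr;
    resp->max_srq_wr = attr->user_max_srq_wr;
}
```

With a hardware limit of 64K WRs and an assumed 128KB kmalloc ceiling, kernel consumers would see max_srq_wr = 16384 while the user space response still carries 65536.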

Or.






Re: [openib-general] [mthca] Creation of a SRQ with many WR (> 16K) in kernel level fails

2007-02-01 Thread Or Gerlitz
Dotan Barak wrote:
> I think that now, when implementation of IPoIB CM is available and SRQ 
> is being used, one may
> need to use a SRQ with more than 16K WRs.

IPoIB UD uses SRQ by nature (since RX from all peers consume buffers 
from the --only-- RQ) and lives fine with 32 buffers (or 64 you can look 
in the code). Moreover, my assumption is that

pps(RC) <= pps(UC) <= pps(UD)

this means that whatever number of RX buffers for UD/2K MTU is
"enough" to have no (or close to zero) packet loss under some traffic
pattern, the same pattern can be served with IPoIB CM using an SRQ of the
same size.

Or.





Re: [openib-general] ip_ib_mc_map?

2007-02-01 Thread Or Gerlitz
Steve Wise wrote:
> where can I find this symbol?  I can't load rdma_cm on rhel4u4...
> rdma_cm: Unknown symbol ip_ib_mc_map

Sean, OK, sorry not to mention the rh4u4 issue once you did the push to 
OFED 1.2 ...

For a reason that no one at RH can trace... someone went and removed
all the support for ARPHRD_INFINIBAND multicast from u4 where it exists 
perfectly fine in u3 and hopefully on u5 as well (Doug can you update?), 
see https://bugs.openfabrics.org/show_bug.cgi?id=2661

Specifically, the below snip from the patch means that on rh4 u4 all 
IPv4 ARPHRD_INFINIBAND multicast goes on the broadcast group !!!

> Index: linux-2.6.9/net/ipv4/arp.c
> ===
> --- linux-2.6.9.orig/net/ipv4/arp.c   2004-10-18 23:55:06.0 +0200
> +++ linux-2.6.9/net/ipv4/arp.c2006-09-20 14:43:59.0 +0300
> @@ -213,6 +213,9 @@
>   case ARPHRD_IEEE802_TR:
>   ip_tr_mc_map(addr, haddr);
>   return 0;
> + case ARPHRD_INFINIBAND:
> + ip_ib_mc_map(addr, haddr);
> + return 0;
>   default:
>   if (dir) {
>   memcpy(haddr, dev->broadcast, dev->addr_len);

anyway, OFED wise, i see two ways to solve this:

1) adding a backport to the rdma_cm containing ip_ib_mc_map, period.

This means that apps offloading multicast traffic through the rdma cm 
would use the correct group where apps working through the net stack
use the broadcast group.

2) having the rdma cm follow the net stack and make its consumer use the 
broadcast group.

Or.





Re: [openib-general] [RFC][PATCH] rdma_cm: allow joins to return a unique address

2007-01-31 Thread Or Gerlitz
On 1/30/07, Sean Hefty <[EMAIL PROTECTED]> wrote:
> I believe that this patch lets you do what you're trying to do.  The group
> handle would be the returned mgid from the initial join that created the
> group.  The mgid would need to be passed to other processes as an IPv6
> address, who issue a join request on that group.  (The mgid is available
> from the rdma_cm_event.param.ud.ah_attr.grh.dgid.)

Sean,

I understand that your approach relies on the uniqueness of the MGID
being generated. This means that to have different MPI jobs use
different MGIDs , the MGIDs must be generated --always-- on the same
NODE and be propagated to other nodes/ranks participating in that MPI
job - correct?

Andrew - can you fulfil this demand? that is having the rank which
generated MGIDs always run on the same node of the cluster???

Or.




Re: [openib-general] [mthca] Creation of a SRQ with many WR (> 16K) in kernel level fails

2007-01-30 Thread Or Gerlitz
Dotan Barak wrote:
> When one tries to create a SRQ with many WR (> 16K WR), creation of the SRQ
> fails.

> static int mthca_alloc_srq_buf(struct mthca_dev *dev, struct mthca_pd *pd,
>struct mthca_srq *srq)
> srq->wrid = kmalloc(srq->max * sizeof (u64), GFP_KERNEL);
> if (!srq->wrid)
> return -ENOMEM;
> which means that creating a SRQ with 16K WRs (or more), the driver will try to
> allocate 16K*8=128K bytes using kmalloc. This is a very high amount of memory
> to be allocated using kmalloc.

mthca_alloc_wqe_buf has the same problem, as it does qp->wrid = 
kmalloc((qp->rq.max + qp->sq.max) * sizeof (u64), GFP_KERNEL);

anyway, the solution that comes to my mind is to disallow creating a
QP/SRQ for which > 128KB allocations are needed. So mthca_query_device()
will set the max_qp_wr and max_srq_wr attributes to values whose derived
size still allows using kmalloc.
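
The arithmetic behind those limits can be sketched as below. This is illustration only: the 128KB ceiling is the figure used in this thread and is treated as an assumption (the real kmalloc limit depends on the kernel configuration), and the helper names are made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed kmalloc ceiling, taken from the figure quoted in the thread. */
#define ASSUMED_KMALLOC_MAX ((size_t)128 * 1024)

/* Bytes needed for a wrid array with one u64 entry per work request. */
static size_t wrid_bytes(size_t n_wr)
{
    assert(n_wr <= SIZE_MAX / sizeof(uint64_t)); /* no overflow */
    return n_wr * sizeof(uint64_t);
}

/* Largest WR count whose wrid array still fits within limit bytes. */
static size_t max_wr_for_limit(size_t limit)
{
    return limit / sizeof(uint64_t);
}

/* A QP allocates one array covering rq.max + sq.max entries. */
static int qp_wrid_fits(size_t rq_max, size_t sq_max, size_t limit)
{
    return wrid_bytes(rq_max + sq_max) <= limit;
}
```

So an SRQ of 16K WRs needs exactly 16384 * 8 = 131072 bytes, and for a QP the send and receive depths together would have to stay within 16384 entries under the same assumed ceiling.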

Or.






Re: [openib-general] [PATCH 1/2] rdma_cm: add support to join IPOIB multicast groups

2007-01-25 Thread Or Gerlitz
On 1/25/07, Sean Hefty <[EMAIL PROTECTED]> wrote:

> >The only missing piece here, as we agreed yesterday is to allow using
> >PS_IPOIB IDs for unicast traffic over librdmacm, i guess this should be
> >fairly simple to add.

> I'm adding this now.  I would like to include all of these changes as part of
> the multicast code push for OFED/upstream.  I hope to test this today.

Cool, just to make sure... the push to OFED should include both the
kernel and librdmacm changes... i did not see a commit of the
librdmacm patch to your librdmacm git tree.

thanks for all your help and responsiveness

Or.




Re: [openib-general] [PATCH 1/2] rdma_cm: add support to join IPOIB multicast groups

2007-01-25 Thread Or Gerlitz
Sean Hefty wrote:
> Add to the rdma_cm an IPOIB port space that allows interoperability with
> IPoIB multicast traffic.  Use of the RDMA_PS_IPOIB is limited to multicast
> join/leave.
> 
> Rename the RDMA_UD_QKEY to RDMA_UDP_QKEY to signify that the qkey is only
> used with the RDMA_PS_UDP port space.
> 
> Signed-off-by: Sean Hefty <[EMAIL PROTECTED]>
> ---
> This patch differs from those posted by Or by limiting the ipoib port space
> to multicast traffic only.

OK, Sean i have tested the two patches and things are working fine, that 
is I have changed my multicast app to use RDMA_PS_IPOIB instead of 
RDMA_PS_UDP and  I am now able to run it against itself and against 
ipoib in all the possibilities :
tx-app   / rx-ipoib
tx-ipoib / rx-app
tx-app   / rx-app

this means that basically (*) you have my OK for pushing the multicast
support to OFED 1.2 (again my thinking is that this is fine for upstream
as well).

The only missing piece here, as we agreed yesterday is to allow using 
PS_IPOIB IDs for unicast traffic over librdmacm, i guess this should be
fairly simple to add.

However, as the code freeze deadline becomes closer, would you be able 
to implement and push this by the end of this week?

Basically, my thinking is that if you have the code that allows PS_IPOIB to
do unicast and you have both udaddy and mckey working in --both--
PS_IPOIB and PS_UDP modes - push that.

how does this sound to you?

Or.







Re: [openib-general] [PATCH 1/2] rdma_cm: add support to join IPOIB multicast groups

2007-01-24 Thread Or Gerlitz
On 1/24/07, Sean Hefty <[EMAIL PROTECTED]> wrote:
> > The peer IPoIB would send an ARP and then would assume it can send its
> > packets to the QP number provided in the arp reply, so it would be
> > talking not with the rdma cm consumer but rather with the underlying
> > IPoIB in this node.
>
> Okay - so you want to change the QPN from that given in the ARP?  I missed
> that you wanted this, and I think I understand better what you're trying
> to do.

we don't want to use the QPN from the arp reply but rather the sidr
exchange etc as it is implemented in the rdma cm.


> > no! it is broken since the PS_IPOIB ID/QP that joined/attached the
> > multicast group is now using the ipoib broadcast qkey where the PS_UDP
> > ID/QP is using the RDMA_UDP_QKEY
>
> I'm only trying to support communication within the same port space, not
> between them.  Unicast is supported between different RDMA_PS_IPOIB QPs.

working only within a port space makes sense. However, your patch does
not allow for PS_IPOIB IDs  to do unicast since some places in the cma
kernel code only care for PS_UDP where they should care for PS_UDP OR
PS_IPOIB as i did in my patch...

> The question
> is how to obtain the IB unicast address (i.e. QPN, etc.) for RDMA_PS_IPOIB.
> My assumption was that this capability wasn't needed, but you're saying
> that it is.  I will update the patches.

thanks, and again it's fine to obtain the IB unicast address for
PS_IPOIB IDs using the sidr exchange, you don't need to worry about the
ARP result. Only make sure that PS_IPOIB uses the ipoib broadcast
group qkey, and also take care of what i mentioned above (code branching
on PS_UDP where it should do so on PS_UDP or PS_IPOIB).

thanks!

Or.




Re: [openib-general] [PATCH 1/2] rdma_cm: add support to join IPOIB multicast groups

2007-01-24 Thread Or Gerlitz
On 1/24/07, Or Gerlitz <[EMAIL PROTECTED]> wrote:
> On 1/24/07, Sean Hefty <[EMAIL PROTECTED]> wrote:

> > > However, it is possible that an RDMA_PS_IPOIB consumer would want to
> > > talk over ---one-- UD QP with two peers:
> > > 1) IPoIB - multicast traffic
> > > 2) --another-- RDMA CM consumer - unicast traffic

> > After the user joins the multicast group, unicast traffic is still
> > supported.

> no! it is broken since the PS_IPOIB ID/QP that joined/attached the
> multicast group is now using the ipoib broadcast qkey where the PS_UDP
> ID/QP is using the RDMA_UDP_QKEY

OK, i have managed to confuse myself... with the patch you have sent a
PS_IPOIB ID does not support unicast traffic, so this whole use
scenario is not possible in the first place.

But, my preference is not to block RDMA CM use patterns of UD unicast
to UD unicast and UD unicast to UD unicast/multicast etc.

Or.




Re: [openib-general] [PATCH 1/2] rdma_cm: add support to join IPOIB multicast groups

2007-01-24 Thread Or Gerlitz
On 1/24/07, Sean Hefty <[EMAIL PROTECTED]> wrote:
> > However, it is possible that an RDMA_PS_IPOIB consumer would want to
> > talk over ---one-- UD QP with two peers:
> >
> > 1) IPoIB - multicast traffic
> > 2) --another-- RDMA CM consumer - unicast traffic

> My thinking on this was that path record lookup and SIDR resolution isn't part
> of the ipoib protocol, and I wanted to limit the scope of the patch.

indeed they are not part of the ipoib protocol, but the reason that
interop is not possible between a PS_IPOIB ID/QP and the peer node's IPoIB UD
is much simpler - it is because of IPoIB address resolution...

The peer IPoIB would send an ARP and then would assume it can send its
packets to the QP number provided in the arp reply, so it would be
talking not with the rdma cm consumer but rather with the underlying
IPoIB in this node.

On the other direction you are correct, IPoIB does not listen for SIDR requests.

> After the user joins the multicast group, unicast traffic is still supported.

no! it is broken since the PS_IPOIB ID/QP that joined/attached the
multicast group is now using the ipoib broadcast qkey where the PS_UDP
ID/QP is using the RDMA_UDP_QKEY

> The issue I see is whether the rdma_cm uses address resolution (which ends up
> being IP ARP), an SA query, and SIDR to resolve the remote QPN, or if it can
> obtain it through some other method.

A possible fallback for an RDMA CM consumer is: issue ARP, then send SIDR
- if there is no response use the IPoIB QP from the ARP reply and the
ipv4 broadcast qkey to talk directly with IPoIB. However, as i mentioned
above this hack is not possible in the other direction, that is you
can't make IPoIB do unicast talking with a PS_IPOIB consumer.

Or.




Re: [openib-general] [PATCH] ib_addr: Handle Ethernet neighbour updates during route resolution.

2007-01-24 Thread Or Gerlitz
On 1/24/07, Steve Wise <[EMAIL PROTECTED]> wrote:
> Handle Ethernet neighbour updates during route resolution.

> The IWCM uses the ib_addr services to do route resolution (neighbour
> discovery in the IP world).  The ib_addr netevent callback routine,
> however, currently only acts on InfiniBand neighbour updates.  It needs
> to act on ethernet neighbour updates as well.

> This patch just removes filtering on device type altogether and
> will trigger on any neighbour updates where the nud_type is valid.
> This simplifies the code some.

OK, as I have mentioned in the past there is a check in the fast path
xmit code of IPoIB to verify that the neighbour we are using now to
xmit (skb->neigh) has not changed its HW address since the last time
IPoIB xmit-ed with it - that is that the GID in the struct
neighbour->ha is the same as the GID in struct ipoib_neigh.
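
A simplified, user-space sketch of that comparison, assuming the usual 20-byte IPoIB hardware address layout (4 bytes of flags and QPN followed by the 16-byte GID). The struct and function names are hypothetical stand-ins and not the driver's own code.

```c
#include <assert.h>
#include <string.h>

#define IPOIB_HW_ADDR_LEN 20  /* 4 bytes of flags + QPN, then the GID */
#define IPOIB_GID_OFFSET   4
#define IB_GID_LEN        16

/* Hypothetical stand-in for the per-neighbour state IPoIB caches. */
struct sketch_ipoib_neigh {
    unsigned char dgid[IB_GID_LEN];
};

/* Returns non-zero when the GID inside the neighbour's hardware address
 * no longer matches the GID cached at the last transmit. */
static int sketch_neigh_gid_changed(const struct sketch_ipoib_neigh *cached,
                                    const unsigned char *ha)
{
    assert(cached != NULL && ha != NULL);
    return memcmp(ha + IPOIB_GID_OFFSET, cached->dgid, IB_GID_LEN) != 0;
}

/* Refresh the cached GID from the current hardware address. */
static void sketch_neigh_refresh(struct sketch_ipoib_neigh *cached,
                                 const unsigned char *ha)
{
    assert(cached != NULL && ha != NULL);
    memcpy(cached->dgid, ha + IPOIB_GID_OFFSET, IB_GID_LEN);
}
```

The point of the sketch is the cost: a 16-byte compare on every transmitted packet, which is what an event-driven update would let the fast path drop.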

Such a diff happens when the kernel acts on a gratuitous ARP - that
is a remote peer has changed its HW address (eg because of fail-over of an
IP address from one IPoIB NIC to another IPoIB NIC - eg with bonding).

From this patch i understand that we can register for the neighbour
change event in IPoIB and eliminate the run time check !?!?!?

Or.




Re: [openib-general] RDMA CM multicast

2007-01-24 Thread Or Gerlitz
On 1/24/07, Sean Hefty <[EMAIL PROTECTED]> wrote:

> What would be needed is a way for the user to indicate that they need a unique
> address.  An obvious way to accomplish this is for the user to specify an IP
> address of 0.0.0.0 when calling rdma_join_multicast().  The user would first
> need to bind to a specific device by calling rdma_bind_addr() with a local IP
> address.

> Your code would look something like this:

> rdma_bind_addr(local IP address)
> rdma_join_multicast(0.0.0.0, port 0)<- exchange group info out of band
> rdma_join_multicast(0.0.0.0, port 1)<- exchange group info out of band
> send data to a lot of nodes at once
> rdma_leave_multicast(0.0.0.0, port 0)
> rdma_leave_multicast(0.0.0.0, port 1)

Sean,

This seems to me a little bit of over engineering... since we do
require that to use the RDMA CM the consumers must have a functional
IPoIB NIC (so they can call rdma_bind_addr to resolve the
device/port/pkey) we can add another requirement to have the sys admin
configure their routing such that some multicast IP subnet (eg net
224.0.0.0 mask 255.0.0.0) is routed to the IPoIB NIC.

Once this routing is in place, the only thing they need is to enhance
the MPI job starter/etc to allocate to each job (say) two unique
multicast --IP-- addresses on the relevant subnet and provide these IP
addresses to each rank. Now the rank can use the RDMA CM without any
hack.

Or.




Re: [openib-general] [PATCH 1/2] rdma_cm: add support to join IPOIB multicast groups

2007-01-24 Thread Or Gerlitz
On 1/24/07, Sean Hefty <[EMAIL PROTECTED]> wrote:
> > However, it will not support "mixed mode" communication patterns (which
> > you were raising last week) that is one app having a UD QP for both
> > multicast and unicast that talks with two "peers" IPoIB multicast and
> > another app doing only unicast.

> Separating ipoib to its own port space alleviated my concerns on existing
> usage.  The RDMA_PS_UDP continues operating as before, with mixed mode
> traffic supported.  Mixed mode for RDMA_PS_IPOIB is not supported, since
> it's not clear to me how that would be used.  The IPOIB protocol doesn't
> use SIDR, so I'm hesitant to extend the capabilities until there's a clear
> need/use.

Indeed, it is not possible to have UDP --unicast-- interop between
"IPoIB UD" (ie not IPoIB CM) and an RDMA_PS_IPOIB RDMA CM consumer.
However, it is possible that an RDMA_PS_IPOIB consumer would want to
talk over ---one-- UD QP with two peers:

1) IPoIB - multicast traffic
2) --another-- RDMA CM consumer - unicast traffic

since both talks are over the same QP everyone must use the same
--QKEY--, now since RDMA_PS_IPOIB does not support the SIDR exchange
this config is broken.

The patch i have sent allows this, and it can be really nice to remove
this restriction with some documentation explaining the restrictions.

> > Also, just a clarification - how exactly the patch enforces that an app
> > would not be able to do listen/connect/accept on RDMA_PS_IPOIB ID???

> This is not enforced directly yet.  (It just requires an if statement in
> resolve route.)  I would expect that if it were tried, there would be a
> failure at some point.

OK, that (failure at some point) was my thought as well.

Or.




Re: [openib-general] [RFC/PATCH] librdmacm: use the ipoib broadcast group qkey

2007-01-23 Thread Or Gerlitz
Sean Hefty wrote:
>> Maybe just ask user to always call rdma_join_multicast after rdma_create_qp?
>> Joins are now properly reference counted, so it shouldn't be a problem
>> to repeat this any number of times. Right?
> 
> This is the solution for now, and it should work fine.  I don't think it would
> be hard to support creating the QP after joining if someone ever came up with
> the need, but it doesn't seem like a priority at the moment.

Indeed, since to do multicast RX/TX you need an IB UD QP... naturally an 
IB app (eg IPoIB) would create its QP before doing any join/leave on the 
group, if there would be a demand for a "crazy" use scheme of 1st join 
2nd create qp, you can enhance librdmacm to support it.

Or.





Re: [openib-general] [PATCH 2/2] librdmacm: add support to join IPOIB multicast groups

2007-01-23 Thread Or Gerlitz
Sean Hefty wrote:

> Add to the librdmacm an IPOIB port space that allows interoperability with
> IPoIB multicast traffic.  Use of the RDMA_PS_IPOIB is limited to multicast
> join/leave.

the two patches seem fine, however i will not be able to test them
today, being out of the office all day. I will send my testing feedback
on Thursday early IL time (late Wed night PST)

Or.





Re: [openib-general] [RFC/PATCH v3] rdma/cma: add RDMA_PS_IPOIB port space

2007-01-23 Thread Or Gerlitz
Sean Hefty wrote:


> I was thinking of SIDR, but what about connected mode ipoib?  This could 
> make the ipoib port space interesting, or require breaking it into two 
> separate port spaces, or...  I'm only going to worry about multicast for 
> now, unless there's a reason to consider other use.

I don't think we need to worry about offloading IPoIB connected mode now,
but thanks for bringing up the idea.

Or.





Re: [openib-general] [PATCH 1/2] rdma_cm: add support to join IPOIB multicast groups

2007-01-23 Thread Or Gerlitz
Sean Hefty wrote:
> Add to the rdma_cm an IPOIB port space that allows interoperability with
> IPoIB multicast traffic.  Use of the RDMA_PS_IPOIB is limited to multicast
> join/leave.

OK Sean, the patch looks perfectly fine for allowing multicast
interoperability with IPoIB.

However, it will not support the "mixed mode" communication patterns you
raised last week, that is, one app having a UD QP for both multicast and
unicast that talks with two "peers": IPoIB multicast and another app doing
only unicast.

Such a scenario would be supported if you allowed unicast apps to use the
IPOIB port space as well, similar to my version of the patch.

Also, just a clarification: how exactly does the patch enforce that an app
cannot do listen/connect/accept on an RDMA_PS_IPOIB ID?

Or.





Re: [openib-general] librdmacm and udapl: Which git branch to use in ofed_1_2 build

2007-01-23 Thread Or Gerlitz
Tziporet Koren wrote:
> Sean Hefty wrote:
>>>multicast
>>> 
>>
>> This goes with the multicast branch of my rdma-dev git tree.
>>
>> IMO, OFED should determine which features they want and pull in the 
>> appropriate branch.  I know that Voltaire would like the multicast 
>> feature, but require a couple of changes to the code before its usable 
>> for them.

> Moni/Or

> Can you update us regarding multicast feature status and testing

I am working with Sean over the list on the changes to the multicast code
needed for interoperability with IPoIB. It seems to be converging, and the
code should be ready to be merged by the end of this week. Sean owns this
and will do the push into OFED and upstream.

I am testing all the time on my systems, but my code bases are either
upstream or something I have set up on top of OFED 1.1; I don't have an
OFED 1.2 environment yet.

Or.





Re: [openib-general] [RFC/PATCH] librdmacm: modify multicast code for RDMA_PS_IPOIB port space

2007-01-23 Thread Or Gerlitz
Sean,

With one host being both the client and the server, mckey does not work for
me even without the IPOIB PS changes. It used to work between two hosts with
the patch below, which forces the sender to generate and poll completions on
its TX packets.

Please let me know how it's going with mckey on your system. I am going to
test the librdmacm patches I have just sent with our mcast app.

Or.

Index: librdmacm/examples/mckey.c
===
--- librdmacm.orig/examples/mckey.c 2007-01-23 16:24:19.0 +0200
+++ librdmacm/examples/mckey.c  2007-01-23 16:50:13.0 +0200
@@ -452,10 +453,14 @@ static int run(void)
if (is_sender) {
printf("initiating data transfers\n");
for (i = 0; i < connections; i++) {
-   ret = post_sends(&test.nodes[i], 0);
+   ret = post_sends(&test.nodes[i], IBV_SEND_SIGNALED);
if (ret)
goto out;
-   }
+   }
+   printf("polling data transfers completion\n");
+   ret = poll_cqs();
+   if (ret)
+   goto out;
} else {
printf("receiving data transfers\n");
ret = poll_cqs();





[openib-general] [RFC/PATCH] librdmacm: modify multicast code for RDMA_PS_IPOIB port space

2007-01-23 Thread Or Gerlitz
Enhance the mckey test program to work in either of the port spaces.

Signed-off-by: Or Gerlitz <[EMAIL PROTECTED]>

Index: librdmacm/examples/mckey.c
===
--- librdmacm.orig/examples/mckey.c 2007-01-23 16:52:16.0 +0200
+++ librdmacm/examples/mckey.c  2007-01-23 17:02:26.0 +0200
@@ -78,6 +78,7 @@ static int message_count = 10;
 static int is_sender;
 static char *dst_addr;
 static char *src_addr;
+static enum rdma_port_space port_space = RDMA_PS_UDP;

 static int create_message(struct cmatest_node *node)
 {
@@ -328,7 +329,7 @@ static int alloc_nodes(void)
for (i = 0; i < connections; i++) {
test.nodes[i].id = i;
ret = rdma_create_id(test.channel, &test.nodes[i].cma_id,
-&test.nodes[i], RDMA_PS_UDP);
+&test.nodes[i], port_space);
if (ret)
goto err;
}
@@ -472,7 +473,7 @@ int main(int argc, char **argv)
 {
int op, ret;

-   while ((op = getopt(argc, argv, "m:sb:c:C:S:")) != -1) {
+   while ((op = getopt(argc, argv, "m:sb:c:C:S:p:")) != -1) {
switch (op) {
case 'm':
dst_addr = optarg;
@@ -492,6 +493,9 @@ int main(int argc, char **argv)
case 'S':
message_size = atoi(optarg);
break;
+   case 'p':
+   port_space = strtol(optarg, NULL, 0);
+   break;
default:
printf("usage: %s\n", argv[0]);
printf("\t-m multicast_address\n");
@@ -500,6 +504,7 @@ int main(int argc, char **argv)
printf("\t[-c connections]\n");
printf("\t[-C message_count]\n");
printf("\t[-S message_size]\n");
+   printf("\t[-p port space - %#x for UDP %#x for IPoIB]\n",RDMA_PS_UDP,RDMA_PS_IPOIB);
exit(1);
}
}
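
A side note on the new -p option: the patch passes optarg through strtol()
with base 0, so both hex (0x0002) and decimal spellings are accepted, and any
other number is taken as-is. Below is a minimal sketch of a stricter parser.
parse_port_space() and the SKETCH_* names are hypothetical and not part of
mckey; the two numeric values are the ones quoted for RDMA_PS_IPOIB and
RDMA_PS_UDP elsewhere in this thread.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Port-space values as quoted in this thread, redefined with a SKETCH_
 * prefix so the example builds without librdmacm (assumption). */
enum sketch_port_space {
	SKETCH_PS_IPOIB = 0x0002,
	SKETCH_PS_UDP   = 0x0111,
};

/*
 * Hypothetical helper: parse a -p argument with strtol() base 0, as the
 * patch does, but reject trailing garbage and any value that is not one
 * of the two UD-style port spaces.
 * Returns 0 on success and -EINVAL otherwise.
 */
static int parse_port_space(const char *arg, enum sketch_port_space *ps)
{
	char *end;
	long val;

	assert(arg && ps);

	val = strtol(arg, &end, 0);
	if (end == arg || *end != '\0')
		return -EINVAL;		/* not a number at all */
	if (val != SKETCH_PS_IPOIB && val != SKETCH_PS_UDP)
		return -EINVAL;		/* a number, but not a UD port space */

	*ps = (enum sketch_port_space) val;
	return 0;
}
```

With something like this, "-p 0x0002" and "-p 273" would both be accepted,
while "-p udp" or "-p 0x0106" would fall through to the usage message.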





[openib-general] [RFC/PATCH v2] librdmacm: add RDMA_PS_IPOIB port space

2007-01-23 Thread Or Gerlitz
Add to librdmacm an IPoIB port space (RDMA_PS_IPOIB) whose semantics are
similar to those of RDMA_PS_UDP; RDMA_PS_IPOIB IDs allow interoperability
with IPoIB for some traffic patterns.

For RDMA_PS_UDP and RDMA_PS_IPOIB IDs, the qkey is provided by the kernel in
ADDR_RESOLVED and CONNECT_REQUEST events and is stored by the library in
struct cma_id_private. Later the library uses the qkey when it is called to
create a UD QP.

The udaddy test program was enhanced to work in either of the port spaces.
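
The flow described above (learn the qkey from an event, apply it later at QP
creation) can be sketched in isolation as follows. This is only a model under
stated assumptions: the sketch_* types and functions are hypothetical
stand-ins for struct cma_id_private, the event handling in
rdma_get_cm_event() and ucma_init_ud_qp(); it does not call libibverbs or
librdmacm.

```c
#include <assert.h>
#include <stdint.h>

/* Port-space values as quoted in this thread (assumption). */
enum sketch_ps {
	SKETCH_PS_IPOIB = 0x0002,
	SKETCH_PS_TCP   = 0x0106,
	SKETCH_PS_UDP   = 0x0111,
};

/* Stand-in for the per-ID private state kept by the library. */
struct sketch_id {
	enum sketch_ps ps;
	uint32_t qkey;
};

/* Stand-in for the QP attributes handed to the verbs layer. */
struct sketch_qp_attr {
	uint32_t qkey;
};

/* Both UD-style port spaces carry a qkey; RDMA_PS_TCP does not. */
static int sketch_is_ud(enum sketch_ps ps)
{
	return ps == SKETCH_PS_UDP || ps == SKETCH_PS_IPOIB;
}

/* Step 1: on ADDR_RESOLVED or CONNECT_REQUEST, remember the qkey that
 * the kernel delivered with the event. */
static void sketch_on_event(struct sketch_id *id, uint32_t event_qkey)
{
	assert(id);
	if (sketch_is_ud(id->ps))
		id->qkey = event_qkey;
}

/* Step 2: when the UD QP is created, use the remembered qkey instead of
 * a compile-time constant. */
static void sketch_init_ud_qp(const struct sketch_id *id,
			      struct sketch_qp_attr *attr)
{
	assert(id && attr);
	attr->qkey = id->qkey;
}
```

The point of the split is ordering: the qkey has to be known (step 1) before
rdma_create_qp() runs (step 2), which matches the requirement that the ID is
resolved before the QP is created.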

Signed-off-by: Or Gerlitz <[EMAIL PROTECTED]>

Index: librdmacm/src/cma.c
===
--- librdmacm.orig/src/cma.c    2007-01-22 21:21:37.0 +0200
+++ librdmacm/src/cma.c 2007-01-23 13:57:48.0 +0200
@@ -116,6 +116,7 @@ struct cma_id_private {
pthread_mutex_t   mut;
uint32_t  handle;
struct cma_multicast *mc_list;
+   uint32_t  qkey;
 };

 struct cma_multicast {
@@ -687,7 +688,7 @@ static int ucma_init_ud_qp(struct cma_id

qp_attr.port_num = id_priv->id.port_num;
qp_attr.qp_state = IBV_QPS_INIT;
-   qp_attr.qkey = RDMA_UD_QKEY;
+   qp_attr.qkey = id_priv->qkey;
ret = ibv_modify_qp(qp, &qp_attr, IBV_QP_STATE | IBV_QP_PKEY_INDEX |
  IBV_QP_PORT | IBV_QP_QKEY);
if (ret)
@@ -718,7 +719,7 @@ int rdma_create_qp(struct rdma_cm_id *id
if (!qp)
return -ENOMEM;

-   if (id->ps == RDMA_PS_UDP)
+   if (id->ps == RDMA_PS_UDP || id->ps == RDMA_PS_IPOIB)
ret = ucma_init_ud_qp(id_priv, qp);
else
ret = ucma_init_ib_qp(id_priv, qp);
@@ -809,7 +810,7 @@ int rdma_accept(struct rdma_cm_id *id, s
void *msg;
int ret, size;

-   if (id->ps != RDMA_PS_UDP) {
+   if (id->ps != RDMA_PS_UDP && id->ps != RDMA_PS_IPOIB) {
ret = ucma_modify_qp_rtr(id);
if (ret)
return ret;
@@ -1169,6 +1170,7 @@ int rdma_get_cm_event(struct rdma_event_
struct ucma_abi_get_event *cmd;
struct cma_event *evt;
void *msg;
+   struct cma_id_private *id_priv;
int ret, size;

ret = cma_dev_cnt ? 0 : ucma_init();
@@ -1197,8 +1199,11 @@ retry:
evt->id_priv = (void *) (uintptr_t) resp->uid;
evt->event.id = &evt->id_priv->id;
evt->event.status = ucma_query_route(&evt->id_priv->id);
+   id_priv = evt->id_priv;
if (evt->event.status)
evt->event.event = RDMA_CM_EVENT_ADDR_ERROR;
+   else if (id_priv->id.ps == RDMA_PS_UDP || id_priv->id.ps == RDMA_PS_IPOIB)
+   id_priv->qkey = resp->param.ud.qkey;
break;
case RDMA_CM_EVENT_ROUTE_RESOLVED:
evt->id_priv = (void *) (uintptr_t) resp->uid;
@@ -1211,12 +1216,16 @@ retry:
evt->id_priv = (void *) (uintptr_t) resp->uid;
if (evt->id_priv->id.ps == RDMA_PS_TCP)
ucma_copy_conn_event(evt, &resp->param.conn);
-   else
+   else
ucma_copy_ud_event(evt, &resp->param.ud);

ret = ucma_process_conn_req(evt, resp->id);
if (ret)
goto retry;
+
+   id_priv = container_of(evt->event.id, struct cma_id_private, id);
+   if (id_priv->id.ps == RDMA_PS_UDP || id_priv->id.ps == RDMA_PS_IPOIB)
+   id_priv->qkey = resp->param.ud.qkey;
break;
case RDMA_CM_EVENT_CONNECT_RESPONSE:
evt->id_priv = (void *) (uintptr_t) resp->uid;
@@ -1233,7 +1242,8 @@ retry:
case RDMA_CM_EVENT_ESTABLISHED:
evt->id_priv = (void *) (uintptr_t) resp->uid;
evt->event.id = &evt->id_priv->id;
-   if (evt->id_priv->id.ps == RDMA_PS_UDP) {
+   id_priv = evt->id_priv;
+   if (id_priv->id.ps == RDMA_PS_UDP || id_priv->id.ps == RDMA_PS_IPOIB) {
ucma_copy_ud_event(evt, &resp->param.ud);
break;
}
Index: librdmacm/examples/udaddy.c
===
--- librdmacm.orig/examples/udaddy.c    2007-01-22 21:19:52.0 +0200
+++ librdmacm/examples/udaddy.c 2007-01-23 15:50:48.0 +0200
@@ -76,6 +76,7 @@ static int message_size = 100;
 static int message_count = 10;
 static char *dst_addr;
 static char *src_addr;
+static enum rdma_port_space port_space = RDMA_PS_UDP;

 static int create_message(struct cmatest_node *node)
 {
@@ -253,7 +254,7 @@ err:
return ret;
 }

-static int connect_handler

[openib-general] [RFC/PATCH] rdma/cma: port rdma_cm multicast code to the UDP/IPOIB port space framework

2007-01-23 Thread Or Gerlitz
Allow rdma_cm/ipoib multicast interoperability for RDMA_PS_IPOIB IDs. This is
implemented by having the rdma cm use the --same-- qkey and multicast gid
used by ipoib, whereas for RDMA_PS_UDP IDs the rdma cm uses a qkey of its own
and adds a signature byte to the multicast gid.

Signed-off-by: Or Gerlitz <[EMAIL PROTECTED]>

Index: rdma-dev/drivers/infiniband/core/cma.c
===
--- rdma-dev.orig/drivers/infiniband/core/cma.c 2007-01-23 15:56:01.0 +0200
+++ rdma-dev/drivers/infiniband/core/cma.c  2007-01-23 15:56:23.0 +0200
@@ -2473,7 +2473,10 @@ static int cma_join_ib_multicast(struct
return ret;

ip_ib_mc_map(sin->sin_addr.s_addr, mc_map);
-   mc_map[7] = 0x01;   /* Use RDMA CM signature */
+   if (id_priv->id.ps == RDMA_PS_UDP) {
+   rec.qkey  = RDMA_UD_QKEY;   /* Use RDMA CM QKEY  */
+   mc_map[7] = 0x01;   /* Use RDMA CM signature */
+   }
mc_map[8] = ib_addr_get_pkey(dev_addr) >> 8;
mc_map[9] = (unsigned char) ib_addr_get_pkey(dev_addr);
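
To make the effect of the signature byte concrete, here is a standalone
sketch of the IPv4-to-MGID mapping. It assumes the RFC 4391 layout
(ff1x:401b:<pkey>::<low 28 bits of the group address>) and link-local scope
(0x12); sketch_ipv4_to_mgid() is a hypothetical helper, not the kernel's
ip_ib_mc_map(). The kernel's mc_map buffer carries four extra leading bytes,
so mc_map[7], mc_map[8] and mc_map[9] in the hunk above correspond to
mgid[3], mgid[4] and mgid[5] here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Port-space values as quoted in this thread (assumption). */
enum sketch_ps {
	SKETCH_PS_IPOIB = 0x0002,
	SKETCH_PS_UDP   = 0x0111,
};

/*
 * Build a 16-byte MGID for an IPv4 multicast group.
 * group is the IPv4 group address in host byte order.
 */
static void sketch_ipv4_to_mgid(uint32_t group, uint16_t pkey,
				enum sketch_ps ps, uint8_t mgid[16])
{
	assert(mgid);
	memset(mgid, 0, 16);

	mgid[0] = 0xff;			/* multicast prefix */
	mgid[1] = 0x12;			/* flags and scope (assumed) */
	mgid[2] = 0x40;			/* signature, high byte */
	/* Signature low byte: 0x1b is the IPoIB IPv4 signature; the
	 * patch substitutes 0x01 for RDMA_PS_UDP IDs. */
	mgid[3] = (ps == SKETCH_PS_UDP) ? 0x01 : 0x1b;
	mgid[4] = (uint8_t) (pkey >> 8);
	mgid[5] = (uint8_t) (pkey & 0xff);
	/* Low 28 bits of the group address go into the last four bytes. */
	mgid[12] = (uint8_t) ((group >> 24) & 0x0f);
	mgid[13] = (uint8_t) ((group >> 16) & 0xff);
	mgid[14] = (uint8_t) ((group >> 8) & 0xff);
	mgid[15] = (uint8_t) (group & 0xff);
}
```

Under these assumptions, the same IP group address yields two different MGIDs
depending on the port space, so an RDMA_PS_UDP join never lands in the group
that IPoIB itself uses, while an RDMA_PS_IPOIB join does.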






[openib-general] [RFC/PATCH v3] rdma/cma: add RDMA_PS_IPOIB port space

2007-01-23 Thread Or Gerlitz
Add to the RDMA CM an IPoIB port space (RDMA_PS_IPOIB) whose semantics are
similar to those of RDMA_PS_UDP; RDMA_PS_IPOIB IDs allow interoperability
with IPoIB for some traffic patterns.

For RDMA_PS_UDP and RDMA_PS_IPOIB IDs, the qkey is stored in struct
rdma_id_private and is also delivered in ADDR_RESOLVED and CONNECT_REQUEST
events. The user space library learns the qkey from these events and uses it
when it is called to create a UD QP.

The IB UD qkey used by RDMA_PS_IPOIB IDs is that of the related ipoib
broadcast group, whereas the qkey used by RDMA_PS_UDP IDs is the hard-coded
"rdma cm qkey".

Creation of RDMA_PS_IPOIB IDs by processes is controlled by the Linux kernel
capabilities subsystem.
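
The qkey selection rule stated above can be modeled outside the kernel as
follows. This is a sketch, not the patch itself: sketch_set_qkey() is a
hypothetical stand-in for cma_set_qkey(), and the SA lookup of the broadcast
group record is reduced to two parameters (whether the lookup succeeded, and
the qkey it returned).

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Value of RDMA_UD_QKEY as quoted in this thread. */
#define SKETCH_RDMA_UD_QKEY 0x01234567u

/* Port-space values as quoted in this thread (assumption). */
enum sketch_ps {
	SKETCH_PS_IPOIB = 0x0002,
	SKETCH_PS_TCP   = 0x0106,
	SKETCH_PS_UDP   = 0x0111,
};

/*
 * Pick the qkey for an ID:
 *  - RDMA_PS_IPOIB: the qkey of the ipoib broadcast group; fail with
 *    -EINVAL if the group record could not be found.
 *  - RDMA_PS_UDP:   the fixed rdma cm qkey.
 *  - anything else: no qkey applies, *qkey is left untouched.
 */
static int sketch_set_qkey(enum sketch_ps ps, int have_bcast_rec,
			   uint32_t bcast_qkey, uint32_t *qkey)
{
	assert(qkey);

	switch (ps) {
	case SKETCH_PS_IPOIB:
		if (!have_bcast_rec)
			return -EINVAL;
		*qkey = bcast_qkey;
		return 0;
	case SKETCH_PS_UDP:
		*qkey = SKETCH_RDMA_UD_QKEY;
		return 0;
	default:
		return 0;
	}
}
```

The only failure path is the IPOIB one, which is why the patch turns
ADDR_RESOLVED into ADDR_ERROR when the lookup fails.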

Signed-off-by: Or Gerlitz <[EMAIL PROTECTED]>

Index: rdma-dev/drivers/infiniband/core/cma.c
===
--- rdma-dev.orig/drivers/infiniband/core/cma.c 2007-01-21 12:11:16.0 +0200
+++ rdma-dev/drivers/infiniband/core/cma.c  2007-01-23 15:45:52.0 +0200
@@ -71,6 +71,7 @@ static struct workqueue_struct *cma_wq;
 static DEFINE_IDR(sdp_ps);
 static DEFINE_IDR(tcp_ps);
 static DEFINE_IDR(udp_ps);
+static DEFINE_IDR(ipoib_ps);

 struct cma_device {
struct list_headlist;
@@ -136,6 +137,7 @@ struct rdma_id_private {
u32 seq_num;
u32 qp_num;
u8  srq;
+   u32 qkey;
 };

 struct cma_multicast {
@@ -323,6 +325,10 @@ struct rdma_cm_id *rdma_create_id(rdma_c
 {
struct rdma_id_private *id_priv;

+   /* XXX - work around this till capabilities work fine for non root users */
+   if (ps == RDMA_PS_IPOIB && !capable(CAP_NET_BROADCAST))
+   return ERR_PTR(-EACCES);
+
id_priv = kzalloc(sizeof *id_priv, GFP_KERNEL);
if (!id_priv)
return ERR_PTR(-ENOMEM);
@@ -884,6 +890,31 @@ out:
return ret;
 }

+static int cma_set_qkey(struct rdma_id_private *id_priv, struct rdma_cm_event *event)
+{
+   struct ib_sa_mcmember_rec rec;
+   struct rdma_dev_addr *dev_addr;
+   int ret;
+
+   if (id_priv->id.ps == RDMA_PS_IPOIB) {
+   dev_addr = &id_priv->id.route.addr.dev_addr;
+   ib_addr_get_mgid(dev_addr, &rec.mgid);
+   ret = ib_sa_get_mcmember_rec(id_priv->id.device, id_priv->id.port_num,
+&rec.mgid, &rec);
+   if (ret)
+   return -EINVAL;
+   id_priv->qkey = rec.qkey;
+   event->param.ud.qkey = rec.qkey;
+   }
+
+   if (id_priv->id.ps == RDMA_PS_UDP) {
+   id_priv->qkey = RDMA_UD_QKEY;
+   event->param.ud.qkey = RDMA_UD_QKEY;
+   }
+
+   return 0;
+}
+
 static struct rdma_id_private *cma_new_conn_id(struct rdma_cm_id *listen_id,
   struct ib_cm_event *ib_event)
 {
@@ -999,7 +1030,7 @@ static int cma_req_handler(struct ib_cm_
memset(&event, 0, sizeof event);
offset = cma_user_data_offset(listen_id->id.ps);
event.event = RDMA_CM_EVENT_CONNECT_REQUEST;
-   if (listen_id->id.ps == RDMA_PS_UDP) {
+   if (listen_id->id.ps == RDMA_PS_UDP || listen_id->id.ps == RDMA_PS_IPOIB) {
conn_id = cma_new_udp_id(&listen_id->id, ib_event);
event.param.ud.private_data = ib_event->private_data + offset;
event.param.ud.private_data_len =
@@ -1020,7 +1051,11 @@ static int cma_req_handler(struct ib_cm_
mutex_unlock(&lock);
if (ret)
goto release_conn_id;
-
+
+   ret = cma_set_qkey(conn_id, &event);
+   if (ret)
+   goto release_conn_id;
+
conn_id->cm_id.ib = cm_id;
cm_id->context = conn_id;
cm_id->cm_handler = cma_ib_handler;
@@ -1600,6 +1635,7 @@ static void addr_handler(int status, str
 {
struct rdma_id_private *id_priv = context;
struct rdma_cm_event event;
+   int ret;

memset(&event, 0, sizeof event);
atomic_inc(&id_priv->dev_remove);
@@ -1627,6 +1663,11 @@ static void addr_handler(int status, str
memcpy(&id_priv->id.route.addr.src_addr, src_addr,
   ip_addr_size(src_addr));
event.event = RDMA_CM_EVENT_ADDR_RESOLVED;
+   ret = cma_set_qkey(id_priv, &event);
+   if (ret) {
+   event.event = RDMA_CM_EVENT_ADDR_ERROR;
+   event.status = ret;
+   }
}

if (id_priv->id.event_handler(&id_priv->id, &event)) {
@@ -1822,6 +1863,9 @@ static int cma_get_port(struct rdma_id_p
case RDMA_PS_UDP:
ps = &udp_ps;
break;
+   case RDMA_PS_IPOIB:
+

Re: [openib-general] rdma/cma: use the ipoib broadcast group qkey - linux capabilities

2007-01-23 Thread Or Gerlitz
Or Gerlitz wrote:
>> This checks prevents applications from trying to use port numbers below 1024
>> without unless they possess the net bind service capability.  A similar check
>> could just be:
>>
>> if (ps == RDMA_PS_IPOIB && !capable(CAP_NET_BIND_SERVICE))
>>  return -EACCES;
> 
> OK, lets see i got it: your suggestion is that only if the process has 
> the net bind service capability it would be able to create RDMA_PS_IPOIB 
> IDs. How do processes get a possession of this capability().
> 
> Talking here, I understand that there are issues with Linux 
> capability()-ies , specifically capabilities are not passed through 
> execve() see "understanding Linux capabilities brokenness" @ 
> http://lkml.org/lkml/2005/8/8/248
> 
> This means capabilities are practically not usable for "non root processes".

I have now got a pointer to a more recent LKML discussion where a patch was
suggested to solve the problem: "patch to make Linux capabilities into
something useful (v 0.3.1)" @
http://lkml.org/lkml/2006/9/5/246

This means that unless someone proves that capabilities are not broken,
we will allow (e.g. under some mod param) non-root apps to create
RDMA_PS_IPOIB IDs, OK?
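
The proposal above (a capability check that an administrator can relax) could
look roughly like this. It is a sketch under assumptions: struct sketch_cred
stands in for the kernel's capable() call, and the allow_unprivileged flag
stands in for the suggested mod param, whose name and default nobody has
decided yet.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Value quoted for RDMA_PS_IPOIB in this thread (assumption). */
#define SKETCH_PS_IPOIB 0x0002

/* Stand-in for "does the calling process hold the required capability". */
struct sketch_cred {
	bool has_net_cap;
};

/*
 * Gate creation of an ID in the given port space.
 * Only RDMA_PS_IPOIB is restricted; it is allowed when the caller holds
 * the capability or when the (hypothetical) override is switched on.
 */
static int sketch_check_create(int ps, const struct sketch_cred *cred,
			       bool allow_unprivileged)
{
	assert(cred);

	if (ps != SKETCH_PS_IPOIB)
		return 0;
	if (cred->has_net_cap || allow_unprivileged)
		return 0;
	return -EACCES;
}
```

With the override off this behaves like the capable() check in the patch;
with it on, non-root apps get through, which is the fallback suggested here.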

Or.






Re: [openib-general] [RFC/PATCH v2] rdma/cma: use the ipoib broadcast group qkey

2007-01-22 Thread Or Gerlitz
Sean Hefty wrote:
> After more consideration, I think this is the correct approach.  I've already
> started working on a patch for this that I should have done but by the end of
> the week (hopefully tomorrow).

> This checks prevents applications from trying to use port numbers below 1024
> without unless they possess the net bind service capability.  A similar check
> could just be:
> 
> if (ps == RDMA_PS_IPOIB && !capable(CAP_NET_BIND_SERVICE))
>   return -EACCES;

OK, let's see if I got it: your suggestion is that a process would be able
to create RDMA_PS_IPOIB IDs only if it has the net bind service capability.
How do processes come to possess this capability?

From talking to people here, I understand that there are issues with Linux
capabilities; specifically, capabilities are not passed through execve().
See "understanding Linux capabilities brokenness" @
http://lkml.org/lkml/2005/8/8/248

This means capabilities are practically not usable for "non root processes".

Or.






Re: [openib-general] [RFC/PATCH v2] rdma/cma: use the ipoib broadcast group qkey

2007-01-22 Thread Or Gerlitz
On 1/23/07, Sean Hefty <[EMAIL PROTECTED]> wrote:
> Or Gerlitz wrote:
> > Modify the kernel rdma cm use the ipoib broadcast group qkey instead a qkey
> > of its own for its UD IDs/QPs. For RDMA_PS_UDP ID, the qkey is stored in
> > struct rdma_id_private and delivered also in ADDR_RESOLVED and
> > CONNECT_REQUEST events. The user space library learns the qkey from these
> > events and use them when it is called to create UD QP.
>
> Overall, I think this is a reasonable approach.  I would just like the 
> framework
> to provide a way to restrict any userspace application from joining an ipoib
> multicast group.  What do you think of the idea of creating a new port space
> specific to ipoib, similar to what's provided for SDP?

Basically, I am positive about this, under the assumption that it will be
possible for a --non-- root user space application to create
RDMA_PS_IPOIB IDs and use them as I would have done with
RDMA_PS_UDP IDs.

> For example, add:
> enum rdma_port_space {
> RDMA_PS_SDP   = 0x0001,
> +   RDMA_PS_IPOIB = 0x0002,
> RDMA_PS_TCP   = 0x0106,
> RDMA_PS_UDP   = 0x0111,
>
> The qkey/MGID would adjust based on the port space, which is specified as part
> of rdma_create_id().

OK

> of rdma_create_id().  Use of RDMA_PS_IPOIB could then be restricted using a
> check similar to that used for port assignment (see cma_use_port() -
> capable(CAP_NET_BIND_SERVICE)).

I don't want to lose a day, so if you don't mind, I would ask you for
a crash course here: I don't think I fully understand the
following lines from cma_use_port() ...

sin = (struct sockaddr_in *) &id_priv->id.route.addr.src_addr;
snum = ntohs(sin->sin_port);
if (snum < PROT_SOCK && !capable(CAP_NET_BIND_SERVICE))
        return -EACCES;

What would be the equivalent check for RDMA_PS_IPOIB? And would this
check be done only at rdma_create_id time?
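
For readers following along, the quoted lines can be modeled in user space as
below. This is a sketch: sketch_check_bind() is a hypothetical name,
SKETCH_PROT_SOCK assumes the kernel's PROT_SOCK is 1024 (the "port numbers
below 1024" mentioned in the thread), and the capable() call is replaced by a
plain boolean.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <stdbool.h>

/* First non-reserved port number (assumed value of PROT_SOCK). */
#define SKETCH_PROT_SOCK 1024

/*
 * Refuse to bind to a reserved port unless the caller holds the
 * bind-service capability. sin_port is stored in network byte order,
 * so it must be converted with ntohs() before comparing.
 */
static int sketch_check_bind(const struct sockaddr_in *sin,
			     bool has_bind_cap)
{
	unsigned short snum;

	assert(sin);

	snum = ntohs(sin->sin_port);
	if (snum < SKETCH_PROT_SOCK && !has_bind_cap)
		return -EACCES;
	return 0;
}
```

So the check is about the port number being bound, not about the port space;
an equivalent for RDMA_PS_IPOIB would have to test the port space itself,
which is what the capable() check at ID creation time does in the later v3
patch.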

Or.




Re: [openib-general] [RFC/PATCH] librdmacm: use the ipoib broadcast group qkey

2007-01-22 Thread Or Gerlitz
Sean,

Using the two patches, udaddy works fine except for the packets sent by
the passive side, which are filtered out by the active side HCA/QP.

This is because the passive side of this --test-- does not really do
RDMA CM UD QP and qkey resolution, but rather uses the imm data to
"exchange" (below) the active side QP number and a hard-coded qkey. I think
that in real-life librdmacm apps this sort of design is much less expected,
and the passive side would also initiate qp/qkey/sidr exchange.

I need to think about this point a little to see if my design can be
changed a little to allow for this sort of simplification.

+/*
+ * Global qkey value for all UD QPs and multicast groups created via the
+ * RDMA CM.
+ * XXX FIXME - enhance test to not assume a pre defined qkey
+ */
+#define RDMA_UD_QKEY 0x01234567
+
+static void create_reply_ah(struct cmatest_node *node, struct ibv_wc *wc)
+{
+   node->ah = ibv_create_ah_from_wc(node->pd, wc, node->mem,
+node->cma_id->port_num);
+   node->remote_qpn = ntohl(wc->imm_data);
+   node->remote_qkey = RDMA_UD_QKEY;
+}

Or.




[openib-general] [RFC/PATCH] librdmacm: use the ipoib broadcast group qkey

2007-01-22 Thread Or Gerlitz
Modify librdmacm to use a qkey for its UD IDs/QPs that is delivered to it by
the rdma cm kernel code, instead of the hard-coded RDMA_UD_QKEY. For an
RDMA_PS_UDP ID, the qkey is provided by the kernel in ADDR_RESOLVED and
CONNECT_REQUEST events and is stored by the library in struct cma_id_private.
Later the library uses the qkey when it is called to create a UD QP.

Signed-off-by: Or Gerlitz <[EMAIL PROTECTED]>

Index: librdmacm/src/cma.c
===
--- librdmacm.orig/src/cma.c    2007-01-22 21:21:37.0 +0200
+++ librdmacm/src/cma.c 2007-01-22 21:57:13.0 +0200
@@ -116,6 +116,7 @@ struct cma_id_private {
pthread_mutex_t   mut;
uint32_t  handle;
struct cma_multicast *mc_list;
+   uint32_t  qkey;
 };

 struct cma_multicast {
@@ -687,7 +688,7 @@ static int ucma_init_ud_qp(struct cma_id

qp_attr.port_num = id_priv->id.port_num;
qp_attr.qp_state = IBV_QPS_INIT;
-   qp_attr.qkey = RDMA_UD_QKEY;
+   qp_attr.qkey = id_priv->qkey;
ret = ibv_modify_qp(qp, &qp_attr, IBV_QP_STATE | IBV_QP_PKEY_INDEX |
  IBV_QP_PORT | IBV_QP_QKEY);
if (ret)
@@ -1169,6 +1170,7 @@ int rdma_get_cm_event(struct rdma_event_
struct ucma_abi_get_event *cmd;
struct cma_event *evt;
void *msg;
+   struct cma_id_private *id_priv;
int ret, size;

ret = cma_dev_cnt ? 0 : ucma_init();
@@ -1199,6 +1201,9 @@ retry:
evt->event.status = ucma_query_route(&evt->id_priv->id);
if (evt->event.status)
evt->event.event = RDMA_CM_EVENT_ADDR_ERROR;
+   else if (evt->id_priv->id.ps == RDMA_PS_UDP) {
+evt->id_priv->qkey = resp->param.ud.qkey;
+   }
break;
case RDMA_CM_EVENT_ROUTE_RESOLVED:
evt->id_priv = (void *) (uintptr_t) resp->uid;
@@ -1211,12 +1216,16 @@ retry:
evt->id_priv = (void *) (uintptr_t) resp->uid;
if (evt->id_priv->id.ps == RDMA_PS_TCP)
ucma_copy_conn_event(evt, &resp->param.conn);
-   else
+   else
ucma_copy_ud_event(evt, &resp->param.ud);

ret = ucma_process_conn_req(evt, resp->id);
if (ret)
goto retry;
+   if (evt->id_priv->id.ps == RDMA_PS_UDP) {
+   id_priv = container_of(evt->event.id, struct cma_id_private, id);
+   id_priv->qkey = resp->param.ud.qkey;
+   }
break;
case RDMA_CM_EVENT_CONNECT_RESPONSE:
evt->id_priv = (void *) (uintptr_t) resp->uid;
Index: librdmacm/examples/udaddy.c
===
--- librdmacm.orig/examples/udaddy.c    2007-01-22 21:19:52.0 +0200
+++ librdmacm/examples/udaddy.c 2007-01-22 22:02:07.0 +0200
@@ -415,6 +415,13 @@ static void destroy_nodes(void)
free(test.nodes);
 }

+/*
+ * Global qkey value for all UD QPs and multicast groups created via the
+ * RDMA CM.
+ * XXX FIXME - enhance test to not assume a pre defined qkey
+ */
+#define RDMA_UD_QKEY 0x01234567
+
 static void create_reply_ah(struct cmatest_node *node, struct ibv_wc *wc)
 {
node->ah = ibv_create_ah_from_wc(node->pd, wc, node->mem,
Index: librdmacm/include/rdma/rdma_cma.h
===
--- librdmacm.orig/include/rdma/rdma_cma.h  2007-01-22 21:56:13.0 +0200
+++ librdmacm/include/rdma/rdma_cma.h   2007-01-22 21:56:32.0 +0200
@@ -65,12 +65,6 @@ enum rdma_port_space {
RDMA_PS_UDP  = 0x0111,
 };

-/*
- * Global qkey value for all UD QPs and multicast groups created via the
- * RDMA CM.
- */
-#define RDMA_UD_QKEY 0x01234567
-
 struct ib_addr {
union ibv_gid   sgid;
union ibv_gid   dgid;





[openib-general] [RFC/PATCH v2] rdma/cma: use the ipoib broadcast group qkey

2007-01-22 Thread Or Gerlitz

Modify the kernel rdma cm to use the ipoib broadcast group qkey instead of a
qkey of its own for its UD IDs/QPs. For an RDMA_PS_UDP ID, the qkey is stored
in struct rdma_id_private and is also delivered in ADDR_RESOLVED and
CONNECT_REQUEST events. The user space library learns the qkey from these
events and uses it when it is called to create a UD QP.

Signed-off-by: Or Gerlitz <[EMAIL PROTECTED]>

Index: rdma-dev/drivers/infiniband/core/cma.c
===
--- rdma-dev.orig/drivers/infiniband/core/cma.c 2007-01-21 12:11:16.0 +0200
+++ rdma-dev/drivers/infiniband/core/cma.c  2007-01-22 21:52:30.0 +0200
@@ -136,6 +136,7 @@ struct rdma_id_private {
u32 seq_num;
u32 qp_num;
u8  srq;
+   u32 qkey;
 };

 struct cma_multicast {
@@ -884,6 +885,21 @@ out:
return ret;
 }

+static int get_broadcast_group_qkey(struct rdma_id_private *id_priv)
+{
+   struct ib_sa_mcmember_rec rec;
+   struct rdma_dev_addr *dev_addr = &id_priv->id.route.addr.dev_addr;
+   int ret;
+
+   ib_addr_get_mgid(dev_addr, &rec.mgid);
+   ret = ib_sa_get_mcmember_rec(id_priv->id.device, id_priv->id.port_num,
+&rec.mgid, &rec);
+   if (ret)
+   return -EINVAL;
+   id_priv->qkey = rec.qkey;
+   return 0;
+}
+
 static struct rdma_id_private *cma_new_conn_id(struct rdma_cm_id *listen_id,
   struct ib_cm_event *ib_event)
 {
@@ -1020,7 +1036,14 @@ static int cma_req_handler(struct ib_cm_
mutex_unlock(&lock);
if (ret)
goto release_conn_id;
-
+
+   if (conn_id->id.ps == RDMA_PS_UDP) {
+   ret = get_broadcast_group_qkey(conn_id);
+   if (ret)
+   goto release_conn_id;
+   event.param.ud.qkey = conn_id->qkey;
+   }
+
conn_id->cm_id.ib = cm_id;
cm_id->context = conn_id;
cm_id->cm_handler = cma_ib_handler;
@@ -1600,6 +1623,7 @@ static void addr_handler(int status, str
 {
struct rdma_id_private *id_priv = context;
struct rdma_cm_event event;
+   int ret;

memset(&event, 0, sizeof event);
atomic_inc(&id_priv->dev_remove);
@@ -1627,6 +1651,14 @@ static void addr_handler(int status, str
memcpy(&id_priv->id.route.addr.src_addr, src_addr,
   ip_addr_size(src_addr));
event.event = RDMA_CM_EVENT_ADDR_RESOLVED;
+   if (id_priv->id.ps == RDMA_PS_UDP) {
+   ret = get_broadcast_group_qkey(id_priv);
+   if (ret) {
+   event.event = RDMA_CM_EVENT_ADDR_ERROR;
+   event.status = ret;
+   } else
+   event.param.ud.qkey = id_priv->qkey;
+   }
}

if (id_priv->id.event_handler(&id_priv->id, &event)) {
@@ -1936,7 +1968,9 @@ static int cma_sidr_rep_handler(struct i
event.status = ib_event->param.sidr_rep_rcvd.status;
break;
}
-   if (rep->qkey != RDMA_UD_QKEY) {
+   if (rep->qkey != id_priv->qkey) {
+   printk(KERN_WARNING "qkey mismatch %.8x client qkey %.8x\n",
+   rep->qkey, id_priv->qkey);
event.event = RDMA_CM_EVENT_UNREACHABLE;
event.status = -EINVAL;
break;
@@ -2231,7 +2265,7 @@ static int cma_send_sidr_rep(struct rdma
rep.status = status;
if (status == IB_SIDR_SUCCESS) {
rep.qp_num = id_priv->qp_num;
-   rep.qkey = RDMA_UD_QKEY;
+   rep.qkey = id_priv->qkey;
}
rep.private_data = private_data;
rep.private_data_len = private_data_len;
Index: rdma-dev/include/rdma/rdma_cm_ib.h
===
--- rdma-dev.orig/include/rdma/rdma_cm_ib.h 2007-01-18 13:43:37.0 +0200
+++ rdma-dev/include/rdma/rdma_cm_ib.h  2007-01-22 21:59:34.0 +0200
@@ -44,7 +44,4 @@
 int rdma_set_ib_paths(struct rdma_cm_id *id,
  struct ib_sa_path_rec *path_rec, int num_paths);

-/* Global qkey for UD QPs and multicast groups. */
-#define RDMA_UD_QKEY 0x01234567
-
 #endif /* RDMA_CM_IB_H */





[openib-general] [RFC/PATCH] rdma/cma: use the ipoib broadcast group qkey

2007-01-22 Thread Or Gerlitz
Sean,

Please let me know what you think. My intention is to have the group type
affect only whether or not to set the rdmacm signature byte on the mgid; as
for the qkey, just use the ipoib broadcast group qkey instead of a qkey
defined by the rdma cm. The patch is not complete yet, in the sense that the
qkey associated with the rdma cm kernel id should be exported to user space
(on the client side in the addr resolve event flow, and on the server side
in the conn req event flow) to be set by librdmacm into the user UD QP at
the time rdma_create_qp is called.

Change the kernel rdma cm to use the ipoib broadcast group qkey instead of a
qkey of its own.

Signed-off-by: Or Gerlitz <[EMAIL PROTECTED]>

Index: rdma-dev/drivers/infiniband/core/cma.c
===
--- rdma-dev.orig/drivers/infiniband/core/cma.c 2007-01-21 12:11:16.0 +0200
+++ rdma-dev/drivers/infiniband/core/cma.c  2007-01-22 14:05:22.0 +0200
@@ -136,6 +136,7 @@ struct rdma_id_private {
u32 seq_num;
u32 qp_num;
u8  srq;
+   u32 qkey;
 };

 struct cma_multicast {
@@ -884,6 +885,21 @@ out:
return ret;
 }

+static int get_broadcast_group_qkey(struct rdma_id_private *id_priv)
+{
+   struct ib_sa_mcmember_rec rec;
+   struct rdma_dev_addr *dev_addr = &id_priv->id.route.addr.dev_addr;
+   int ret;
+
+   ib_addr_get_mgid(dev_addr, &rec.mgid);
+   ret = ib_sa_get_mcmember_rec(id_priv->id.device, id_priv->id.port_num,
+&rec.mgid, &rec);
+   if (ret)
+   return -EINVAL;
+   id_priv->qkey = rec.qkey;
+   return 0;
+}
+
 static struct rdma_id_private *cma_new_conn_id(struct rdma_cm_id *listen_id,
   struct ib_cm_event *ib_event)
 {
@@ -1021,6 +1037,10 @@ static int cma_req_handler(struct ib_cm_
if (ret)
goto release_conn_id;

+   ret = get_broadcast_group_qkey(conn_id);
+   if (ret)
+   goto release_conn_id;
+
conn_id->cm_id.ib = cm_id;
cm_id->context = conn_id;
cm_id->cm_handler = cma_ib_handler;
@@ -1600,6 +1620,7 @@ static void addr_handler(int status, str
 {
struct rdma_id_private *id_priv = context;
struct rdma_cm_event event;
+   int ret;

memset(&event, 0, sizeof event);
atomic_inc(&id_priv->dev_remove);
@@ -1626,6 +1647,11 @@ static void addr_handler(int status, str
} else {
memcpy(&id_priv->id.route.addr.src_addr, src_addr,
   ip_addr_size(src_addr));
+   ret = get_broadcast_group_qkey(id_priv);
+   if (ret) {
+   event.event = RDMA_CM_EVENT_ADDR_ERROR;
+   event.status = ret;
+   }
event.event = RDMA_CM_EVENT_ADDR_RESOLVED;
}

@@ -1936,7 +1962,9 @@ static int cma_sidr_rep_handler(struct i
event.status = ib_event->param.sidr_rep_rcvd.status;
break;
}
-   if (rep->qkey != RDMA_UD_QKEY) {
+   if (rep->qkey != id_priv->qkey) {
+   printk(KERN_WARNING "qkey mismatch %.8x client qkey %.8x\n",
+   rep->qkey, id_priv->qkey);
event.event = RDMA_CM_EVENT_UNREACHABLE;
event.status = -EINVAL;
break;
@@ -2231,7 +2259,7 @@ static int cma_send_sidr_rep(struct rdma
rep.status = status;
if (status == IB_SIDR_SUCCESS) {
rep.qp_num = id_priv->qp_num;
-   rep.qkey = RDMA_UD_QKEY;
+   rep.qkey = id_priv->qkey;
}
rep.private_data = private_data;
rep.private_data_len = private_data_len;




[openib-general] [PATCH] rdma/cma: remove per multicast group qkey usage

2007-01-22 Thread Or Gerlitz
Sean,

Please see the cleanup below. Also, I see now that librdmacm has two functions
to init a qp: ucma_init_ud_qp for UD QPs and ucma_init_ib_qp for RC QPs,
where the rdmacm kernel code only has ucma_init_ib_qp. I guess something is
missing here (does it only need to set the QKEY in the UD QP, or also modify
it to RTR and RTS? Let me know and I can send a patch).

A cleanup of the RDMA CM UD code: remove the per-group qkey usage from the
join flow, since it is impossible to achieve in practice when the same UD QP
is attached to multiple groups.

Signed-off-by: Or Gerlitz <[EMAIL PROTECTED]>

Index: rdma-dev/drivers/infiniband/core/cma.c
===
--- rdma-dev.orig/drivers/infiniband/core/cma.c 2007-01-21 12:08:06.0 +0200
+++ rdma-dev/drivers/infiniband/core/cma.c  2007-01-21 12:11:16.0 +0200
@@ -2434,7 +2434,6 @@ static int cma_join_ib_multicast(struct
ib_addr_get_sgid(dev_addr, &rec.port_gid);
rec.pkey = cpu_to_be16(ib_addr_get_pkey(dev_addr));
rec.join_state = 1;
-   rec.qkey = sin->sin_addr.s_addr;

comp_mask = IB_SA_MCMEMBER_REC_MGID | IB_SA_MCMEMBER_REC_PORT_GID |
IB_SA_MCMEMBER_REC_PKEY | IB_SA_MCMEMBER_REC_JOIN_STATE |




Re: [openib-general] failure to use libibverbs clone

2007-01-22 Thread Or Gerlitz
I have configured libmthca against the install of libibverbs
(/usr/local/rdmacm); the configure output is below.

Or.

dill:/usr/src/libmthca # ./configure --prefix=/usr/local/rdmacm \
CFLAGS=-I/usr/local/rdmacm/include \
LDFLAGS=-L/usr/local/rdmacm/lib LD_LIBRARY_PATH=/usr/local/rdmacm/lib

checking for a BSD-compatible install... /usr/bin/install -c
checking whether build environment is sane... yes
checking for gawk... gawk
checking whether make sets $(MAKE)... yes
checking build system type... x86_64-suse-linux
checking host system type... x86_64-suse-linux
checking for style of include used by make... GNU
checking for gcc... gcc
checking for C compiler default output file name... a.out
checking whether the C compiler works... yes
checking whether we are cross compiling... no
checking for suffix of executables...
checking for suffix of object files... o
checking whether we are using the GNU C compiler... yes
checking whether gcc accepts -g... yes
checking for gcc option to accept ANSI C... none needed
checking dependency style of gcc... gcc3
checking for a sed that does not truncate output... /usr/bin/sed
checking for egrep... grep -E
checking for ld used by gcc... /usr/x86_64-suse-linux/bin/ld
checking if the linker (/usr/x86_64-suse-linux/bin/ld) is GNU ld... yes
checking for /usr/x86_64-suse-linux/bin/ld option to reload object files... -r
checking for BSD-compatible nm... /usr/bin/nm -B
checking whether ln -s works... yes
checking how to recognise dependent libraries... pass_all
checking how to run the C preprocessor... gcc -E
checking for ANSI C header files... yes
checking for sys/types.h... yes
checking for sys/stat.h... yes
checking for stdlib.h... yes
checking for string.h... yes
checking for memory.h... yes
checking for strings.h... yes
checking for inttypes.h... yes
checking for stdint.h... yes
checking for unistd.h... yes
checking dlfcn.h usability... yes
checking dlfcn.h presence... yes
checking for dlfcn.h... yes
checking for g++... g++
checking whether we are using the GNU C++ compiler... yes
checking whether g++ accepts -g... yes
checking dependency style of g++... gcc3
checking how to run the C++ preprocessor... g++ -E
checking for g77... g77
checking whether we are using the GNU Fortran 77 compiler... yes
checking whether g77 accepts -g... yes
checking the maximum length of command line arguments... 32768
checking command to parse /usr/bin/nm -B output from gcc object... ok
checking for objdir... .libs
checking for ar... ar
checking for ranlib... ranlib
checking for strip... strip
checking if gcc static flag  works... yes
checking if gcc supports -fno-rtti -fno-exceptions... no
checking for gcc option to produce PIC... -fPIC
checking if gcc PIC flag -fPIC works... yes
checking if gcc supports -c -o file.o... yes
checking whether the gcc linker (/usr/x86_64-suse-linux/bin/ld -m elf_x86_64) supports shared libraries... yes
checking whether -lc should be explicitly linked in... no
checking dynamic linker characteristics... GNU/Linux ld.so
checking how to hardcode library paths into programs... immediate
checking whether stripping libraries is possible... yes
checking if libtool supports shared libraries... yes
checking whether to build shared libraries... yes
checking whether to build static libraries... yes
configure: creating libtool
appending configuration tag "CXX" to libtool
checking for ld used by g++... /usr/x86_64-suse-linux/bin/ld -m elf_x86_64
checking if the linker (/usr/x86_64-suse-linux/bin/ld -m elf_x86_64) is GNU ld... yes
checking whether the g++ linker (/usr/x86_64-suse-linux/bin/ld -m elf_x86_64) supports shared libraries... yes
checking for g++ option to produce PIC... -fPIC
checking if g++ PIC flag -fPIC works... yes
checking if g++ supports -c -o file.o... yes
checking whether the g++ linker (/usr/x86_64-suse-linux/bin/ld -m elf_x86_64) supports shared libraries... yes
checking dynamic linker characteristics... GNU/Linux ld.so
checking how to hardcode library paths into programs... immediate
checking whether stripping libraries is possible... yes
appending configuration tag "F77" to libtool
checking if libtool supports shared libraries... yes
checking whether to build shared libraries... yes
checking whether to build static libraries... yes
checking for g77 option to produce PIC... -fPIC
checking if g77 PIC flag -fPIC works... yes
checking if g77 supports -c -o file.o... yes
checking whether the g77 linker (/usr/x86_64-suse-linux/bin/ld -m elf_x86_64) supports shared libraries... yes
checking dynamic linker characteristics... GNU/Linux ld.so
checking how to hardcode library paths into programs... immediate
checking whether stripping libraries is possible... yes
checking for gcc... (cached) gcc
checking whether we are using the GNU C compiler... (cached) yes
checking whether gcc accepts -g... (cached) yes
checking for gcc option to accept ANSI C... (cached) none needed
checking dependency style of gcc... (cached) gcc3
checking for ibv_get_device_list in -libverbs... ye

Re: [openib-general] failure to use libibverbs clone

2007-01-21 Thread Or Gerlitz
make install of libmthca does not seem to create the etc directory

Or.


dill:/usr/src/libmthca # make install
make[1]: Entering directory `/usr/src/libmthca'
test -z "/usr/local/rdmacm/lib" || mkdir -p -- . "/usr/local/rdmacm/lib"
test -z "" || mkdir -p -- . ""
test -z "/usr/local/rdmacm/lib/infiniband" || mkdir -p -- . "/usr/local/rdmacm/lib/infiniband"
 /bin/sh ./libtool --mode=install /usr/bin/install -c  'src/mthca.la' '/usr/local/rdmacm/lib/infiniband/mthca.la'
/usr/bin/install -c src/.libs/mthca.so /usr/local/rdmacm/lib/infiniband/mthca.so
/usr/bin/install -c src/.libs/mthca.lai /usr/local/rdmacm/lib/infiniband/mthca.la
/usr/bin/install -c src/.libs/mthca.a /usr/local/rdmacm/lib/infiniband/mthca.a
ranlib /usr/local/rdmacm/lib/infiniband/mthca.a
chmod 644 /usr/local/rdmacm/lib/infiniband/mthca.a
PATH="$PATH:/sbin" ldconfig -n /usr/local/rdmacm/lib/infiniband
--
Libraries have been installed in:
   /usr/local/rdmacm/lib/infiniband

If you ever happen to want to link against installed libraries
in a given directory, LIBDIR, you must either use libtool, and
specify the full pathname of the library, or use the `-LLIBDIR'
flag during linking and do at least one of the following:
   - add LIBDIR to the `LD_LIBRARY_PATH' environment variable
 during execution
   - add LIBDIR to the `LD_RUN_PATH' environment variable
 during linking
   - use the `-Wl,--rpath -Wl,LIBDIR' linker flag
   - have your system administrator add LIBDIR to `/etc/ld.so.conf'

See any operating system documentation about shared libraries for
more information, such as the ld(1) and ld.so(8) manual pages.
--
make[1]: Leaving directory `/usr/src/libmthca'






Re: [openib-general] OFED ipoib_8111_to_2_6_16.patch

2007-01-21 Thread Or Gerlitz
Michael S. Tsirkin wrote:
>> OK, thanks for the info. The context here is the bonding support. We had 
>> an issue with distro (eg RH4 U3, SLES10) kernels that was not reproduced 
>> with upstream kernels and it seems to be related to the change you have 
>> pushed to 2.6.17. I will let you know if we need more clarifications.

> Was the issue triggered at ipoib module unload?

No, it's an issue related to the bonding design and the two-layer nature
of the ipoib neighbouring scheme: struct neighbour "pointing" to struct
ipoib_neigh etc. We are still investigating it and hope to know more by
tomorrow.

Or.







Re: [openib-general] failure to use libibverbs clone

2007-01-21 Thread Or Gerlitz
Michael S. Tsirkin wrote:
>>  libibverbs: Warning: couldn't open config directory 
>> '/usr/local/rdmacm/etc/libibverbs.d'.
> 
> Well, do you have /usr/local/rdmacm/etc/libibverbs.d?

No, who should create it? Doing $ make install under libibverbs does not
do it. I manually created an empty directory, and then this warning
went away, but the other one and the failure to find devices remained.

Or.







[openib-general] failure to use libibverbs clone

2007-01-21 Thread Or Gerlitz
Roland,

Using a fresh clone of libibverbs and libmthca, and a kernel based on
2.6.20-rc3 (a clone of Sean's rdma-dev git tree at OpenFabrics), I am
getting errors such as:

# LD_LIBRARY_PATH=/usr/local/rdmacm/lib /usr/local/rdmacm/bin/ibv_devinfo

libibverbs: Warning: couldn't open config directory '/usr/local/rdmacm/etc/libibverbs.d'.
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
No IB devices found

The strace output follows. The system is otherwise fully operational (e.g. with IPoIB).

Or.


execve("/usr/local/rdmacm/bin/ibv_devinfo", ["/usr/local/rdmacm/bin/ibv_devinfo"], [/* 70 vars */]) = 0
uname({sys="Linux", node="dill", ...})  = 0
brk(0)  = 0x503000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2aeba2f6f000
open("/etc/ld.so.preload", O_RDONLY)= -1 ENOENT (No such file or directory)
open("/usr/local/rdmacm/lib/tls/x86_64/libibverbs.so.2", O_RDONLY) = -1 ENOENT (No such file or directory)
stat("/usr/local/rdmacm/lib/tls/x86_64", 0x7fff07b4e0f0) = -1 ENOENT (No such file or directory)
open("/usr/local/rdmacm/lib/tls/libibverbs.so.2", O_RDONLY) = -1 ENOENT (No such file or directory)
stat("/usr/local/rdmacm/lib/tls", 0x7fff07b4e0f0) = -1 ENOENT (No such file or directory)
open("/usr/local/rdmacm/lib/x86_64/libibverbs.so.2", O_RDONLY) = -1 ENOENT (No such file or directory)
stat("/usr/local/rdmacm/lib/x86_64", 0x7fff07b4e0f0) = -1 ENOENT (No such file or directory)
open("/usr/local/rdmacm/lib/libibverbs.so.2", O_RDONLY) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\320\'\0"..., 640) = 640
fstat(3, {st_mode=S_IFREG|0755, st_size=164431, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2aeba2f7
mmap(NULL, 1085352, PROT_READ|PROT_EXEC, MAP_PRIVATE, 3, 0) = 0x2aeba3071000
madvise(0x2aeba3071000, 1085352, MADV_SEQUENTIAL|0x1) = 0
mprotect(0x2aeba3079000, 1052584, PROT_NONE) = 0
mmap(0x2aeba3171000, 36864, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 3, 0) = 0x2aeba3171000
close(3)= 0
open("/usr/local/rdmacm/lib/libpthread.so.0", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/local/rdmacm/lib/libpthread.so.0", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY)  = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=130091, ...}) = 0
mmap(NULL, 130091, PROT_READ, MAP_PRIVATE, 3, 0) = 0x2aeba317a000
close(3)= 0
open("/lib64/tls/libpthread.so.0", O_RDONLY) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\340X\0\0"..., 640) = 640
fstat(3, {st_mode=S_IFREG|0755, st_size=99188, ...}) = 0
mmap(NULL, 1129880, PROT_READ|PROT_EXEC, MAP_PRIVATE, 3, 0) = 0x2aeba319a000
madvise(0x2aeba319a000, 1129880, MADV_SEQUENTIAL|0x1) = 0
mprotect(0x2aeba31a8000, 1072536, PROT_NONE) = 0
mmap(0x2aeba329a000, 65536, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 3, 0) = 0x2aeba329a000
mmap(0x2aeba32aa000, 15768, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x2aeba32aa000
close(3)= 0
open("/usr/local/rdmacm/lib/libdl.so.2", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/local/rdmacm/lib/libdl.so.2", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/lib64/libdl.so.2", O_RDONLY) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\320\36\0"..., 640) = 640
fstat(3, {st_mode=S_IFREG|0755, st_size=16807, ...}) = 0
mmap(NULL, 1058904, PROT_READ|PROT_EXEC, MAP_PRIVATE, 3, 0) = 0x2aeba32ae000
madvise(0x2aeba32ae000, 1058904, MADV_SEQUENTIAL|0x1) = 0
mprotect(0x2aeba32b1000, 1046616, PROT_NONE) = 0
mmap(0x2aeba33ae000, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 3, 0) = 0x2aeba33ae000
close(3)= 0
open("/usr/local/rdmacm/lib/libc.so.6", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/local/rdmacm/lib/libc.so.6", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/lib64/tls/libc.so.6", O_RDONLY)  = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0p\313\1\0"..., 640) = 640
lseek(3, 624, SEEK_SET) = 624
read(3, "\4\0\0\0\20\0\0\0\1\0\0\0GNU\0\0\0\0\0\2\0\0\0\6\0\0\0"..., 32) = 32
fstat(3, {st_mode=S_IFREG|0755, st_size=1401317, ...}) = 0
mmap(NULL, 2235432, PROT_READ|PROT_EXEC, MAP_PRIVATE, 3, 0) = 0x2aeba33b1000
madvise(0x2aeba33b1000, 2235432, MADV_SEQUENTIAL|0x1) = 0
mprotect(0x2aeba34b7000, 1162280, PROT_NONE) = 0
mmap(0x2aeba35b1000, 122880, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 3, 0x10) = 0x2aeba35b1000
mmap(0x2aeba35cf000, 15400, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x2aeba35cf000
close(3)= 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2aeba35d3000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2aeba35d4000
arch_prctl(0x

Re: [openib-general] multicast code/merge status

2007-01-18 Thread Or Gerlitz
> This is fine, but it may change when the user needs to make this 
> choice.  E.g. when creating the QP, versus joining the multicast group, 
> in order to support the valid options.  The selection also needs to be 
> conveyed to the kernel somehow.  At this point, maybe we just need to 
> start looking at specific implementations.

Indeed. I will send a patch early next week.

Or.





Re: [openib-general] OFED ipoib_8111_to_2_6_16.patch

2007-01-18 Thread Or Gerlitz
Michael S. Tsirkin wrote:
>> However, since understanding this patch in detail is important to a peer 
>> member individual/company of the community (myself/Voltaire) [...]
> 
> I really would like to help. What is it that you want to know?
> Here's an explanation from an older mail. Does this help?
> 
>   Work around for neighbour destructor issue for kernels < 2.6.17:
>   keep a global list of all ipoib neighbours. Use it in destructor to
>   1. Verify that this neighbour belongs to an ipoib device
>   2. Check that the neighbour is the last one to use the destructor,
>   if so reset the destructor pointer
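
For readers following the thread, the two steps in the quoted explanation can be sketched in plain user-space C. This is only an illustration under assumptions: the names (sketch_neigh, sketch_neigh_track, sketch_destructor_ptr) are hypothetical, a singly linked list stands in for the global list, and the locking a real kernel patch would need is omitted. It is not the OFED backport code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an ipoib neighbour entry. */
struct sketch_neigh {
    struct sketch_neigh *next;
    int torn_down;              /* set once our destructor handled it */
};

/* Global list of all neighbours we track (no locking in this sketch). */
static struct sketch_neigh *sketch_neigh_list;

/* Stand-in for the shared destructor pointer that must be reset at the end. */
void (*sketch_destructor_ptr)(struct sketch_neigh *);

void sketch_neigh_destructor(struct sketch_neigh *n)
{
    struct sketch_neigh **pp;

    assert(n != NULL);

    /* Step 1: verify that this neighbour is one of ours. */
    for (pp = &sketch_neigh_list; *pp != NULL; pp = &(*pp)->next)
        if (*pp == n)
            break;
    if (*pp == NULL)
        return;                 /* not tracked: leave it alone */

    *pp = n->next;              /* unlink from the global list */
    n->torn_down = 1;

    /* Step 2: if that was the last user, reset the destructor pointer. */
    if (sketch_neigh_list == NULL)
        sketch_destructor_ptr = NULL;
}

void sketch_neigh_track(struct sketch_neigh *n)
{
    assert(n != NULL);
    n->torn_down = 0;
    n->next = sketch_neigh_list;
    sketch_neigh_list = n;
    sketch_destructor_ptr = sketch_neigh_destructor;
}
```

Calling the destructor on an entry that was never tracked is a no-op here, which is the point of step 1; the pointer is cleared only when the list becomes empty.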

OK, thanks for the info. The context here is the bonding support. We had 
an issue with distro (eg RH4 U3, SLES10) kernels that was not reproduced 
with upstream kernels and it seems to be related to the change you have 
pushed to 2.6.17. I will let you know if we need more clarifications.

Or.







Re: [openib-general] [openfabrics-ewg] Minutes for January 15, 2007 teleconference about OFED 1.2 development progress toward code freeze

2007-01-18 Thread Or Gerlitz
Michael S. Tsirkin wrote:
> Sounds too risky to me, this is technology preview code so
> I want to have all this stuff off by default but easily
> enabled by users who want to demo.

I really don't want us to go again through things like yours (MST, Jack)
vs. Sean's rdma_establish, ucma versions etc.

Like it or not, as was defined by the founders, OFED is --not-- a
framework for development, and unless there is a very specific reason (*)
its kernel/user content should be based on code that has --passed
through this component's maintainer--.

As has been said on this list, let's not treat OFED as a framework to
shovel in unreviewed code.

If you feel that your mthca and rdmacm QoS changes should be under
CONFIG_EXPERIMENTAL, for-mm etc., specify this when you send the patches
for review.

Bottom line, let's not hide behind obscure definitions like "technology
preview" to escape from normal processes where there -is- an
alternative. The point here is not to meet the code freeze deadline by
avoiding normal processes; let's use the processes and extend the
deadline for the QoS merge if needed.

(*) So far, the only case where people felt it made sense to merge
out-of-tree code was the local-sa, and it is done by this component's
maintainer.

> After I post the rest of the code, if you like you'll be able to
> post an iser patch to add this stuff to iser as well.

this is irrelevant till we resolve the process.







Re: [openib-general] [openfabrics-ewg] Minutes for January 15, 2007 teleconference about OFED 1.2 development progress toward code freeze

2007-01-17 Thread Or Gerlitz
On 1/17/07, Michael S. Tsirkin <[EMAIL PROTECTED]> wrote:
> > Quoting Or Gerlitz <[EMAIL PROTECTED]>:

> > I understand that the change involves letting the rdma cm know the SID
> > when the consumer calls --rdma_resolve_route-- where today it gets to
> > know the SID when the consumer calls --rdma_connect-- . So this is not
> > an internal RDMA CM change but rather also changes the API.

> > Same for SRP as the api of ib_sa_path_rec_get (that is the structure it
> > gets as input) changes, the SRP code also changes.

> > Anyway, can you send the mthca and rdmacm/rdmacm-consumers changes as
> > RFC/PATCH over the list before the actual code freeze???

> I didn't start on this code yet, but it does not look like a
> huge project, I hope to post code by next week.

> To avoid major disruptions all over the stack, my preference for OFED 1.2
> would be to add new API calls and a module option (off by default) for cma/srp
> to use them.

The rdmacm api change is not such a big deal, and if you want to change
it only for the kernel portion for OFED 1.2 it makes sense to me.
I really don't think --adding-- a special api is the way to go. Do
it in "end in mind" fashion: work on a patch, send it to the rdmacm
maintainer/list for RFC and so on.

> For OFED 1.2, I only planned to implement this for SDP and SRP.
> I do not expect all this to be mergeable in 2.6.21 time frame,
> so maybe that's enough.

SDP is coded over the RDMA CM and, as I say above, my suggestion is not
to add a special API, so just apply the same QoS patching you do for SDP
to iSER etc.

Or.




Re: [openib-general] multicast code/merge status

2007-01-17 Thread Or Gerlitz
On 1/17/07, Sean Hefty <[EMAIL PROTECTED]> wrote:
> > +1 used only for unicast
> > +2 used only for multicast
> > +3 used for both unicast and multicast

> If you view this as the use case for one side only, we also have option 3
> communicating with options 1 and 2.  I would list these as:

OK

> +4 unicast QP to unicast and multicast QP

i think you mean 3 <--> 1 that is unicast and multicast QP to unicast QP

> +5 multicast QP to unicast and multicast QP

i think you mean 3 <--> 2 that is unicast and multicast QP to multicast QP

> Today, all of these work.  What you're wanting to add is the ability to
> communicate with an ipoib multicast group.  I'd like to do this without
> breaking any of the existing communications, or treat ipoib separately for
> security reasons.

Makes sense, so my suggestion is "leave this (using the ipoib qkey) to
the user". If you prefer to have two group types, rdmacm and ipoib,
that's fine. We would use ipoib type groups, and in the environments
where setting the qkey to be the ipoib one would not break our
communication (that is, where we do need to interop with IPoIB) we would
do it; otherwise we would do nothing.

> > To make things simple, the solution i suggest is that the RDMA CM
> > would --not-- do this modify QP/QKEY (that is would set the 0x1234567
> > qkey on the modify qp to init) and rather leave it to the RDMA CM
> > consumer --if-- they wish to do so. However it will use the ipv4
> > broadcast group qkey for doing mcast joins and report this qkey to the
> > user in the ud param of the event.
>
> We need to be able to handle options 4 and 5 as well.

indeed, i have addressed that above.

> > this (what qkey is assigned to the ipv4 broadcast group by different
> > SAs) is orthogonal to the discussion we do here.

> This depends on whether verbs allows, or if it should allow, a user to
> specify a controlled qkey when configuring their QP.

I don't think there is any limitation today in the verbs layer.
Actually, for our testing so far we patched the rdmacm to not set the sig
byte and use the ipoib (i.e. not override it in core/cma.c), and we
managed to interop fine with ipoib.




Re: [openib-general] multicast code/merge status

2007-01-17 Thread Or Gerlitz
On 1/17/07, Michael S. Tsirkin <[EMAIL PROTECTED]> wrote:
> > Not following you here, how does the qkey relate to RC QPs?

> Currently you can block userspace from creating QPs by unloading uverbs
> module.
> Maybe we should make it possible to block creating UD QPs from userspace
> as a separate security measure.

I don't think this is a valid option for most IB production environments,
but if you want to add blocking of UD QP creation to ib_uverbs as a module
param whose default value is --unset--, I don't really care.

Or.




Re: [openib-general] OFED ipoib_8111_to_2_6_16.patch

2007-01-17 Thread Or Gerlitz
Or Gerlitz wrote:
> Michael S. Tsirkin wrote:
>>>> git log -Sneigh_destructor -- include/net/neighbour.h

>>> also, having that at (my) hand does not remove the need that you will 
>>> set a changelog/signature for the OFED ipoib related backport patch.

>> Feel free to add that.

> Unless I am missing something, we want all OFED kernel patches to meet
> **basic** kernel working conventions, specifically that for each patch
> there is a change log and an owner.

OK, I realize now that in OFED 1.1, out of 438 .patch files under
kernel_patches, only 103 have a Signed-off-by line. Assuming this maps
1:1 to the files that have a change log, I am not asking you to write
335 change-log/signed-off-by sections now.

However, since understanding this patch in detail is important to a peer
member individual/company of the community (myself/Voltaire), and since
you are this patch's owner and also hold the OFED kernel patches
maintainer chair, it makes sense that per our request you put 5 minutes
of your time into writing a change log.

Or.





Re: [openib-general] multicast code/merge status

2007-01-17 Thread Or Gerlitz
Michael S. Tsirkin wrote:
>> Quoting Sean Hefty <[EMAIL PROTECTED]>:
>> Subject: Re: multicast code/merge status
>>
>>> sure, it can use the rdmacm qkey (0x1234567 etc) when it creates the QP 
>>> and later --if-- the user joins a multicast group modify the qp state 
>>> with the group qkey and report it in the cma event such that the 
>>> consumer of the rdmacm would set this into his IB UD TX WR
>> Changing the qkey would break its existing UD communication.
>>
>>> Bottom line, Looking in the IB SPEC and IPoIB RFC i did not see 
>>> mentioning of privileged QKEY.
>>  From RFC 4391 (ipoib RFC), 4.1:
>>
>>   2. Q_Key
>>
>>It is RECOMMENDED that a controlled Q_Key be used with the
>>high-order bit set.  This is to prevent non-privileged
>>software from fabricating and sending out bogus IP datagrams.
> 
> BTW, should we be worried that proposed extension (passing qkey in rdma cm
> param list) seems to expose this qkey to non-privileged software?

As was said in related threads here and elsewhere, multicast is unsafe
by nature, and having IB implement broadcast over multicast adds more
insecurity to the party.

Specifically, as Roland has commented, a user can attach his user space
UD QP to the MGID of the ipv4 broadcast group (if ipoib is running on
this node it will join the group) and start making this IP subnet go crazy.

We only want interop with IPoIB, and we don't need to join/attach to the
ipv4 broadcast group, just have an option for the rdmacm to use its qkey
for joins; later either the rdmacm or the consumer will also set this
qkey into the QP and the UD TX WR.

> Maybe a mechanism should be in place to control access to this separately
> from regular rdma cm for RC QPs?

Not following you here, how does the qkey relate to RC QPs?

Or.







Re: [openib-general] [openfabrics-ewg] Minutes for January 15, 2007 teleconference about OFED 1.2 development progress toward code freeze

2007-01-17 Thread Or Gerlitz
Tziporet Koren wrote:
> Or Gerlitz wrote:
>> Tziporet Koren wrote:
>>
>>
>>   The bonding package would support: fresh (2.6.20) and some older 
>> upstream kernels along with SLES10 and RH4 Ux (x=3 for sure)
>>
>>   
> OK - please send us all the info once its ready
>>>   General changes to the package:
>>> * Multicast - we wait for Voltaire and Sean to close all technical
>>>   details - should be ready by the end of the week
>>> 
>>
>> I have just sent Sean over the list a clarification email, if needed 
>> we would be able to help doing the missing patches and i guess in a 
>> combined effort this would be ready for the end of --next-- week
>>
>>   
> Thanks - please work with MST & Vlad on integration
>> what about the host side QoS code? i did not see an newer RFC nor 
>> patch other then the RFC that was sent many months ago.

> We are going to update our low level driver (mthca) to support it. 

> Beside there should be a small change in CMA for this, and its specified 
> in the RFC.

I understand that the change involves letting the rdma cm know the SID
when the consumer calls --rdma_resolve_route--, where today it gets to
know the SID when the consumer calls --rdma_connect--. So this is not
an internal RDMA CM change but rather also changes the API.

Same for SRP: as the api of ib_sa_path_rec_get (that is, the structure it
gets as input) changes, the SRP code also changes.

Anyway, can you send the mthca and rdmacm/rdmacm-consumers changes as
RFC/PATCH over the list before the actual code freeze?

As for the QoS RFC
(http://openib.org/pipermail/openib-general/2006-May/022331.html) sent 
by Eitan, one design issue I see there is how to deal with IB ULPs which 
do --not-- have a well known SID. So they call ib_cm_listen with 
IB_CM_ASSIGN_SERVICE_ID and get from the CM a service id to use, then 
they might do some out of band exchange of this SID before starting 
their connection establishment.

from include/rdma/ib_cm.h

> * @service_id: Service identifier matched against incoming connection
>  *   and service ID resolution requests.  The service ID should be specified
>  *   network-byte order.  If set to IB_CM_ASSIGN_SERVICE_ID, the CM will
>  *   assign a service ID to the caller.

Typically this happens with MPI, to the extent that different ranks
within the same job may get a different SID. One solution I was thinking
of is to

+1 define --range-- (eg big enough to serve 1024 CM consumers) per ULP
+2 change the CM to support allocating SID in a range
+3 change the ULPs which use IB_CM_ASSIGN_SERVICE_ID to ask SID in the 
relevant range
+4 change the QoS manager at the SM side to support ranges
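
The four steps above could look roughly like the following user-space sketch. Everything in it is an assumption made for illustration: the names (sid_range, sid_range_alloc, sid_in_range), the fixed range size of 1024 and the bitmap allocator are hypothetical and are not part of the IB CM API or of the QoS RFC.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SID_RANGE_SIZE 1024u    /* step 1: range size per ULP (assumed) */

/* Hypothetical per-ULP service ID range with a simple allocation bitmap. */
struct sid_range {
    uint64_t base;                      /* first SID of the range */
    uint8_t  used[SID_RANGE_SIZE / 8];  /* one bit per SID in the range */
};

void sid_range_init(struct sid_range *r, uint64_t base)
{
    assert(r != NULL);
    r->base = base;
    memset(r->used, 0, sizeof r->used);
}

/* Steps 2 and 3: hand out the lowest free SID inside the ULP's range.
 * Returns 0 on success, -1 if the range is exhausted. */
int sid_range_alloc(struct sid_range *r, uint64_t *sid)
{
    uint32_t i;

    for (i = 0; i < SID_RANGE_SIZE; i++) {
        if (!(r->used[i / 8] & (1u << (i % 8)))) {
            r->used[i / 8] |= (uint8_t)(1u << (i % 8));
            *sid = r->base + i;
            return 0;
        }
    }
    return -1;
}

/* Step 4: the check a range-aware QoS manager would apply to a SID. */
int sid_in_range(const struct sid_range *r, uint64_t sid)
{
    return sid >= r->base && sid - r->base < SID_RANGE_SIZE;
}

void sid_range_free(struct sid_range *r, uint64_t sid)
{
    uint32_t i;

    if (!sid_in_range(r, sid))
        return;
    i = (uint32_t)(sid - r->base);
    r->used[i / 8] &= (uint8_t)~(1u << (i % 8));
}
```

With this shape, the QoS policy would only need to know the base and size of each ULP's range, not the individual SIDs assigned to each rank.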

Or.


___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general



Re: [openib-general] multicast code/merge status

2007-01-17 Thread Or Gerlitz
Sean Hefty wrote:
>> sure, it can use the rdmacm qkey (0x1234567 etc) when it creates the 
>> QP and later --if-- the user joins a multicast group modify the qp 
>> state with the group qkey and report it in the cma event such that the 
>> consumer of the rdmacm would set this into his IB UD TX WR

> Changing the qkey would break its existing UD communication.

OK, so we have three use cases here for a UD QP

+1 used only for unicast
+2 used only for multicast
+3 used for both unicast and multicast

and my suggestion (default qkey, when join is completed do qp modify 
with the group qkey) would work fine for use cases

1 - since the user never joins to anything and
2 - same as it works in ipoib

so we are left with use case 3.

To make things simple, the solution I suggest is that the RDMA CM
would --not-- do this modify QP/QKEY (that is, it would set the 0x1234567
qkey on the modify qp to init) and rather leave it to the RDMA CM
consumer --if-- they wish to do so. However, it will use the ipv4
broadcast group qkey for doing mcast joins and report this qkey to the
user in the ud param of the event.

So users that don't care about their qkey would never bother to do this 
modify qp and users who do care would do it and have to take caution if 
their QP is of type 3 (both unicast and mcast).

If you don't like this direction, there is your idea from below to have
two options for the group type, rdmacm or ipoib, and have the consumer
specify it. For group type ipoib you will use the ipv4 broadcast qkey for
both join and modify qp, and for group type rdmacm you would just use the
rdmacm qkey and do no modify qp. This is fine for us as well.
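
The group-type alternative can be summarised in a small sketch. The names here (sketch_group_type, sketch_plan_qkey, SKETCH_RDMA_UD_QKEY) are hypothetical and only illustrate the rule described above; the default qkey constant is assumed to match the rdmacm UD qkey mentioned in this thread, and the ipv4 broadcast group qkey is passed in by the caller because its value depends on the SA.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed default rdmacm UD qkey, as quoted in this thread. */
#define SKETCH_RDMA_UD_QKEY 0x01234567u

/* Hypothetical group types a consumer could specify on join. */
enum sketch_group_type {
    SKETCH_GROUP_RDMACM,
    SKETCH_GROUP_IPOIB
};

/* What the RDMA CM would do for a join under each group type. */
struct sketch_qkey_plan {
    uint32_t join_qkey;   /* qkey used in the multicast join */
    int      modify_qp;   /* nonzero: also set join_qkey into the QP */
};

struct sketch_qkey_plan sketch_plan_qkey(enum sketch_group_type type,
                                         uint32_t ipv4_bcast_qkey)
{
    struct sketch_qkey_plan plan;

    assert(type == SKETCH_GROUP_RDMACM || type == SKETCH_GROUP_IPOIB);

    if (type == SKETCH_GROUP_IPOIB) {
        /* ipoib groups: broadcast group qkey for both join and QP */
        plan.join_qkey = ipv4_bcast_qkey;
        plan.modify_qp = 1;
    } else {
        /* rdmacm groups: keep the rdmacm qkey, leave the QP untouched */
        plan.join_qkey = SKETCH_RDMA_UD_QKEY;
        plan.modify_qp = 0;
    }
    return plan;
}
```

A QP used for both unicast and multicast (use case 3) still needs care with the ipoib type, since modifying its qkey affects its existing UD traffic.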

>> Bottom line, Looking in the IB SPEC and IPoIB RFC i did not see 
>> mentioning of privileged QKEY.
> 
>  From RFC 4391 (ipoib RFC), 4.1:
> 
>  2. Q_Key
> 
>   It is RECOMMENDED that a controlled Q_Key be used with the
>   high-order bit set.  This is to prevent non-privileged
>   software from fabricating and sending out bogus IP datagrams.
> 
> I don't know what qkey is actually assigned, however.

This (what qkey is assigned to the ipv4 broadcast group by different 
SAs) is orthogonal to the discussion we are having here.
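
For reference, the "high-order bit set" wording in the RFC text quoted above boils down to a one-line check. This is only a sketch; the macro and function names are made up for illustration:

```c
#include <stdbool.h>
#include <stdint.h>

/* High-order bit of the 32-bit Q_Key, per the RFC 4391 text quoted above. */
#define QKEY_CONTROLLED_BIT 0x80000000u

/* True if the qkey falls in the "controlled" range (high-order bit set). */
static bool qkey_is_controlled(uint32_t qkey)
{
    return (qkey & QKEY_CONTROLLED_BIT) != 0;
}
```

The rdmacm default qkey (0x1234567) has the high-order bit clear, so it is not a controlled qkey in this sense. Whether a given ipv4 broadcast group qkey is controlled depends on what the SA assigned.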

> I have some path forward related tasks that I would like to complete 
> before starting on this.  I hope to finish that before the end of this 
> week.  I don't want to rush on the multicast support and miss 
> something.  For the rdma cm, we may need to let the user set some 
> options when joining a multicast group.  Maybe something like: join type 
> (send-only or send-receive), group type (ipoib or rdma defined), etc.

As I see it, the group type (or having no types and being always 
interoperable with ipoib as i suggest above) seems easy to add to the 
current implementation and would put it in acceptable state for upstream 
pushing to 2.6.21 and inclusion in OFED 1.2 .

As for the join type, as I told you before, I think it should --not-- 
delay the upstream nor the OFED 1.2 push - if you have the time, add this 
to the user/kernel ABI and have the ucma kernel code return -EINVAL if 
someone attempts to do a send-only join. And if you don't have the time 
for that, it can be added later.

Actually, as you can see in the ipoib code, it never does a 
send-only-non-member join, so my take here is that till the ipoib issue 
is resolved there is no reason to have this complexity in the rdmacm.

> I do plan on requesting that the core multicast changes to ib_sa and 
> ib_ipoib be pulled into 2.6.21.

This is great news, but again I think the "nobody is perfect" rule applies 
well here: the current rdmacm multicast support (with the little fixes 
we discuss over this thread) can be pushed to 2.6.21 and be enhanced later.

Or.


___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general



Re: [openib-general] some IB multicast sendonly thoughts

2007-01-16 Thread Or Gerlitz
Eitan Zahavi wrote:
> Or Gerlitz wrote:
>> Eitan Zahavi wrote:

>> So you are saying that the GW **has** to listen on IGMP at the Eth 
>> side and **has** to do IB SA join in the only way that forces the SA 
>> to create the group --> FullMember ?

> Yes

>> If indeed, this is kind of bad, 

> I find it very reasonable

OK, going fwd with this approach: the GW got IGMP --> so it did a FULL 
MEMBER join and the group is created. What happens when the Eth 
multicast node stops doing RX? Is there a "leave" IGMP which the GW can 
trap and act on?

Or.





Re: [openib-general] multicast code/merge status

2007-01-16 Thread Or Gerlitz
Sean Hefty wrote:
>> mimic IPoIB qkey flow:

>> +3 on rdma_create_qp do modify qp with some def qkey (eg zero)
>> +4 on the join completion path before attaching a qp to the associated
>> mgid, do modify qp with this mrec qkey (=ipv4 broadcast one)

> The rdma cm allows UD QP communication, which requires a valid qkey 
> before or without joining a multicast group.  I'd like to find a way to 
> continue to support this.

sure, it can use the rdmacm qkey (0x1234567 etc) when it creates the QP 
and later --if-- the user joins a multicast group modify the qp state 
with the group qkey and report it in the cma event such that the 
consumer of the rdmacm would set this into his IB UD TX WR

>> +3 on rdma_create_qp do modify qp with some def qkey (eg zero)
>> +4 on the join completion path before attaching a qp to the associated
>> mgid, do modify qp with this mrec qkey (=ipv4 broadcast one)

> Isn't the ipoib qkey a privileged qkey?

looking in ipoib code you can see the following code in 
ipoib_mcast_join_task

>   if (!test_bit(IPOIB_MCAST_FLAG_ATTACHED, &priv->broadcast->flags)) {
>   ipoib_mcast_join(dev, priv->broadcast, 0);
>   return;
>   }

so ipoib_mcast_join is called with create=0 for the broadcast group and 
this makes it provide a component mask of

>   comp_mask =
>   IB_SA_MCMEMBER_REC_MGID |
>   IB_SA_MCMEMBER_REC_PORT_GID |
>   IB_SA_MCMEMBER_REC_PKEY |
>   IB_SA_MCMEMBER_REC_JOIN_STATE;

that is, the SA sets the QKEY, RATE, MTU, SL etc etc for the broadcast 
group, and later any other joins done by ipoib use the 
priv->broadcast->mcmember fields.

So the broadcast qkey is basically what the SA has set when it created 
the group.
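
As a quick illustration of what that component mask means on the wire, here is a sketch that builds it from bit positions. The bit numbers are my recollection of the IB_SA_MCMEMBER_REC_* layout in include/rdma/ib_sa.h and should be treated as an assumption; the macro and function names below are local to the example:

```c
#include <stdint.h>

/* One bit per MCMemberRecord component (assumed layout, see above). */
#define SA_COMP_MASK(n)          ((uint64_t)1 << (n))
#define MCMEMBER_REC_MGID        SA_COMP_MASK(0)
#define MCMEMBER_REC_PORT_GID    SA_COMP_MASK(1)
#define MCMEMBER_REC_PKEY        SA_COMP_MASK(7)
#define MCMEMBER_REC_JOIN_STATE  SA_COMP_MASK(16)

/* Mask for joining an already existing group (the create=0 case):
 * only these four fields are supplied, the SA fills in the rest. */
static uint64_t join_existing_comp_mask(void)
{
    return MCMEMBER_REC_MGID |
           MCMEMBER_REC_PORT_GID |
           MCMEMBER_REC_PKEY |
           MCMEMBER_REC_JOIN_STATE;
}
```

Under that assumed layout the mask is 0x10083, and QKEY is not among the supplied fields, which is the point: the joiner learns the qkey from the SA response.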

While discussing this here I got a pointer to section 10 in the IPoIB 
RFC (4391), which says something like "some 3rd party --has-- to create 
the broadcast group":

> 10.  Sending and Receiving IP Multicast Packets

> A node joining an IP multicast group must first construct an MGID
>according to the rule described in section 4 above.  Once the correct
>MGID is calculated, the node must call the SA of the outbound link to
>attempt a "FullMember" join of the IB multicast group corresponding
>to the MGID.  If the IB multicast group does not already exist, one
>must be created first with the IPoIB link MTU.  The MGID MUST use the
>same P_Key, Q_Key, SL, MTU, and HopLimit as those used in the
>broadcast-GID.  The rest of attributes SHOULD follow the values used
>in the broadcast-GID as well.

Bottom line: looking in the IB spec and the IPoIB RFC I did not see any 
mention of a privileged QKEY.

Or.







Re: [openib-general] some IB multicast sendonly thoughts

2007-01-16 Thread Or Gerlitz
Eitan Zahavi wrote:
>> So you are saying that the GW **has** to listen on IGMP at the Eth 
>> side and **has** to do IB SA join in the only way that forces the SA 
>> to create the group --> FullMember ?

> Yes

>> If indeed, this is kind of bad, 

> I find it very reasonable

OK, let me think about it for a while, thanks for the quick response.

Or.





Re: [openib-general] Minutes for January 15, 2007 teleconference about OFED 1.2 development progress toward code freeze

2007-01-16 Thread Or Gerlitz
Tziporet Koren wrote:

Hi Tziporet, thanks for the detailed info, below are a few comments:

> *Abbreviated minutes / summary*
> * Bonding module will be added to OFED 1.2 to support HA on older
>   kernels

The bonding package would support: fresh (2.6.20) and some older 
upstream kernels along with SLES10 and RH4 Ux (x=3 for sure)

>   General changes to the package:
> * Multicast - we wait for Voltaire and Sean to close all technical
>   details - should be ready by the end of the week

I have just sent Sean over the list a clarification email, if needed we 
would be able to help doing the missing patches and i guess in a 
combined effort this would be ready for the end of --next-- week

>   Management:
> * OpenSM:
>   o QoS - on work; first version will be ready at end of month

What about the host side QoS code? I did not see a newer RFC nor patch 
other than the RFC that was sent many months ago.

Or.








Re: [openib-general] some IB multicast sendonly thoughts

2007-01-16 Thread Or Gerlitz
Eitan Zahavi wrote:
> Or Gerlitz wrote:
>> OK, assuming my setup consists of:
>> +1 IB node doing only multicast TX on a group
>> +2 an IB/Ethernet gateway
>> 3+ Eth node doing only multicast RX on the equiv mac (forget manytoone)

>> The gateway design is to register for SA MGID IN/OUT traps and when it 
>> gets MGID IN it joins the the mgroup as ***NonMember** etc

> GW needs to listen on IGMP on the Eth port...

this was fast... thanks for jumping on it.

So you are saying that the GW **has** to listen on IGMP at the Eth side 
and **has** to do IB SA join in the only way that forces the SA to 
create the group --> FullMember ?

If so, this is kind of bad. I find the approach of the GW being 
"transparent" to the SA, in the sense that it does not cause mgroup 
create/destroy nor mgroup ref count inc/dec, much more robust. So you are 
saying it's not feasible with the IB spec?

Or.





[openib-general] some IB multicast sendonly thoughts

2007-01-16 Thread Or Gerlitz
> 15.2.5.17.1 GROUP MEMBERSHIP

> An endport must specify the type of multicast subscription or deletion that
> it wants. The MCMemberRecord:JoinState component indicates the
> membership qualities a port wishes to add (in joining or creating a group)
> or remove (in leaving a group). The meanings of the MCMember-
> Record:JoinState bits are:

> • FullMember: Group messages are routed both to and from the port.
> The port is considered a member for purposes of group creation and
> deletion, i.e.: if no member ports with FullMember=1 remain, the
> group may be deleted; otherwise it may not.

> • NonMember: Group messages are routed both to and from the port.
> The port is not considered a member for purposes of group creation/
> deletion.

> • SendOnlyNonMember: Group messages are only routed from the
> port; none are routed to the port. The port is not considered a member
> for purposes of group creation/deletion.



> MCMemberRecord:JoinState.FullMember bit must be set to 1 in the SubnAdmSet()
> request that creates a multicast group.

...

OK, assuming my setup consists of:

+1 IB node doing only multicast TX on a group

+2 an IB/Ethernet gateway

+3 Eth node doing only multicast RX on the equiv mac (forget manytoone)

The gateway design is to register for SA MGID IN/OUT traps and when it 
gets MGID IN it joins the mgroup as **NonMember** etc

Now, since the TX node joins as SendOnlyNonMember the SA would never 
create this group --> the TX node would never get MLID etc to create AH, 
etc etc

---> this setup is broken.
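
The deadlock can be shown with a toy model of the SA rule quoted above (only a FullMember join may create a group). This is a sketch, not SA code: the struct and function names are hypothetical, and the JoinState bit values (FullMember = bit 0, NonMember = bit 1, SendOnlyNonMember = bit 2) are my reading of the spec layout:

```c
#include <stdbool.h>
#include <stdint.h>

/* JoinState bits (assumed layout, see above). */
#define JOIN_FULL_MEMBER          0x1u
#define JOIN_NON_MEMBER           0x2u
#define JOIN_SEND_ONLY_NON_MEMBER 0x4u

/* Minimal state the toy SA keeps per multicast group. */
struct mgroup {
    bool     exists;
    unsigned full_members;
};

/* Returns true if the SA would accept the join. */
static bool sa_join(struct mgroup *g, uint8_t join_state)
{
    if (!g->exists) {
        /* group creation requires FullMember=1 in the request */
        if (!(join_state & JOIN_FULL_MEMBER))
            return false;
        g->exists = true;
    }
    if (join_state & JOIN_FULL_MEMBER)
        g->full_members++;
    return true;
}
```

In the setup above the TX node sends JOIN_SEND_ONLY_NON_MEMBER and the gateway would send JOIN_NON_MEMBER, so both calls fail on a group that does not exist yet, and the gateway never even sees an MGID IN trap. Once anyone does a JOIN_FULL_MEMBER join, the other two succeed.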

any thoughts and/or ideas would be welcome

Or.



Re: [openib-general] multicast code/merge status

2007-01-16 Thread Or Gerlitz
Sean Hefty wrote:
>> OK, got you at last (sorry but i have somehow ignored the call to
>> ib_addr_get_mgid() at the rdmacm code). So to achieve interop with
>> IPoIB all we need to do is remove the rdmacm signature bit and not to
>> over-write the rdmacm qkey on the the qkey of the ipoib ipv4 broadcast
>> group, are you ok with that?

> I believe this would achieve interop with ipoib.  However, overwriting 
> the qkey may break any existing UD communication that the user may 
> have.  I just need to think about this more, and see what we can come up 
> with.

Hi Sean,

Based on our communication so far, the elements which are missing are

++ on the rdmacm kernel code: (drivers/infiniband/core/cma.c)

+1 remove the rdmacm signature byte from the mgid
+2 get the qkey used by the ipv4 broadcast group and use it

mimic IPoIB qkey flow:

+3 on rdma_create_qp do modify qp with some def qkey (eg zero)
+4 on the join completion path before attaching a qp to the associated
mgid, do modify qp with this mrec qkey (=ipv4 broadcast one)

++ on the rdmacm user space code: (librdmacm/src/cma.c)

+3 on rdma_create_qp do modify qp with some def qkey (eg zero)
+4 on the join completion path before attaching a qp to the associated
mgid, do modify qp with this mrec qkey (=ipv4 broadcast one)

With the time frame for 2.6.21 and OFED 1.2 becoming short, can you 
give an update on the multicast patch series status? We really want it in 
for this time frame, please let me know if you prefer to get patches that 
implement the above (eg as reference) or do it yourself...

thanks,

Or.






Re: [openib-general] OFED ipoib_8111_to_2_6_16.patch

2007-01-16 Thread Or Gerlitz
Michael S. Tsirkin wrote:
>>> git log -Sneigh_destructor -- include/net/neighbour.h
>> produced nothing on my net-2.6.20 git however browsing the git log i see 
>> this patch, is this the one you refer to?

> Yes.

thanks

>> also, having that at (my) hand does not remove the need that you will 
>> set a changelog/signature for the OFED ipoib related backport patch.

> Feel free to add that.

Unless I miss something, we want all OFED kernel patches to meet 
**basic** kernel working conventions, specifically that for each patch 
there is a change log and an owner.

Or.







Re: [openib-general] OFED ipoib_8111_to_2_6_16.patch

2007-01-16 Thread Or Gerlitz
Michael S. Tsirkin wrote:
>>> It's a backport for kernels <= 2.6.16.

>> Can you please send (and add to OFED 1.2) a changelog comment explaining 
>> the problem and how it is solved in 2.6.17 and above ?!

>> We are looking on some code around ipoib_neigh_destructor() and friends 
>> and the changelog would really be of help to us.

> Try this
> git log -Sneigh_destructor -- include/net/neighbour.h

This produced nothing on my net-2.6.20 git; however, browsing the git log 
I see this patch. Is this the one you refer to?

Also, having that at (my) hand does not remove the need for you to 
set a changelog/signature for the OFED ipoib related backport patch.

> commit c5ecd62c25400a3c6856e009f84257d5bd03f03b
> Author: Michael S. Tsirkin <[EMAIL PROTECTED]>
> Date:   Mon Mar 20 22:25:41 2006 -0800
> 
> [NET]: Move destructor from neigh->ops to neigh_params
> 
> struct neigh_ops currently has a destructor field, which no in-kernel
> drivers outside of infiniband use.  The infiniband/ulp/ipoib in-tree
> driver stashes some info in the neighbour structure (the results of
> the second-stage lookup from ARP results to real link-level path), and
> it uses neigh->ops->destructor to get a callback so it can clean up
> this extra info when a neighbour is freed.  We've run into problems
> with this: since the destructor is in an ops field that is shared
> between neighbours that may belong to different net devices, there's
> no way to set/clear it safely.
> 
> The following patch moves this field to neigh_parms where it can be
> safely set, together with its twin neigh_setup.  Two additional
> patches in the patch series update ipoib to use this new interface.
> 
> Signed-off-by: Michael S. Tsirkin <[EMAIL PROTECTED]>
> Signed-off-by: Roland Dreier <[EMAIL PROTECTED]>
> Signed-off-by: David S. Miller <[EMAIL PROTECTED]>
> 










Re: [openib-general] OFED ipoib_8111_to_2_6_16.patch

2007-01-16 Thread Or Gerlitz
Michael S. Tsirkin wrote:
>> Quoting Or Gerlitz <[EMAIL PROTECTED]>:
>> Subject: OFED ipoib_8111_to_2_6_16.patch

> Isn't it obvious from the name?

sure, thanks for the clarification.

> It's a backport for kernels <= 2.6.16.

Can you please send (and add to OFED 1.2) a changelog comment explaining 
the problem and how it is solved in 2.6.17 and above ?!

We are looking at some code around ipoib_neigh_destructor() and friends 
and the changelog would really be of help to us.

Thanks,

Or.









[openib-general] OFED ipoib_8111_to_2_6_16.patch

2007-01-16 Thread Or Gerlitz
Hi Michael,

I have just realized that

a) this patch was not pushed upstream

and

b) the --same-- instance of it is kept on all backports of both OFED 1.1 & 1.2 
staging

It also does not have a changelog comment and Signed-Off-By signature...

Can you shed some light on what's going on here?

thanks,

Or.


# pwd

/home/ogerlitz/OFED-1.1/SOURCES/openib-1.1/kernel_patches

# find . -name \*ipoib\* | grep 8111 | xargs ls -l
-rw-r--r--  1 1078 101 2616 Oct 19 16:21 ./backport/2.6.11_FC4/ipoib_8111_to_2_6_16.patch
-rw-r--r--  1 1078 101 2616 Oct 19 16:21 ./backport/2.6.11/ipoib_8111_to_2_6_16.patch
-rw-r--r--  1 1078 101 2616 Oct 19 16:21 ./backport/2.6.12/ipoib_8111_to_2_6_16.patch
-rw-r--r--  1 1078 101 2616 Oct 19 16:21 ./backport/2.6.13/ipoib_8111_to_2_6_16.patch
-rw-r--r--  1 1078 101 2616 Oct 19 16:21 ./backport/2.6.13_suse10_0_u/ipoib_8111_to_2_6_16.patch
-rw-r--r--  1 1078 101 2616 Oct 19 16:21 ./backport/2.6.14/ipoib_8111_to_2_6_16.patch
-rw-r--r--  1 1078 101 2616 Oct 19 16:21 ./backport/2.6.15/ipoib_8111_to_2_6_16.patch
-rw-r--r--  1 1078 101 2616 Oct 19 16:21 ./backport/2.6.16/ipoib_8111_to_2_6_16.patch
-rw-r--r--  1 1078 101 2616 Oct 19 16:21 ./backport/2.6.16_sles10/ipoib_8111_to_2_6_16.patch
-rw-r--r--  1 1078 101 2616 Oct 19 16:21 ./backport/2.6.9/ipoib_8111_to_2_6_16.patch
-rw-r--r--  1 1078 101 2616 Oct 19 16:21 ./backport/2.6.9_U3/ipoib_8111_to_2_6_16.patch
-rw-r--r--  1 1078 101 2616 Oct 19 16:21 ./backport/2.6.9_U4/ipoib_8111_to_2_6_16.patch

Index: openib_branch1.0/drivers/infiniband/ulp/ipoib/ipoib_main.c
===
--- openib_branch1.0.orig/drivers/infiniband/ulp/ipoib/ipoib_main.c
+++ openib_branch1.0/drivers/infiniband/ulp/ipoib/ipoib_main.c
@@ -82,6 +82,9 @@ static const u8 ipv4_bcast_addr[] = {

 struct workqueue_struct *ipoib_workqueue;

+static DEFINE_SPINLOCK(ipoib_all_neigh_list_lock);
+static LIST_HEAD(ipoib_all_neigh_list);
+
 static void ipoib_add_one(struct ib_device *device);
 static void ipoib_remove_one(struct ib_device *device);

@@ -751,6 +754,17 @@ static void ipoib_neigh_destructor(struc
unsigned long flags;
struct ipoib_ah *ah = NULL;

+   struct ipoib_neigh *tn, *nn = NULL;
+   spin_lock(&ipoib_all_neigh_list_lock);
+   list_for_each_entry(tn, &ipoib_all_neigh_list, all_neigh_list)
+   if (tn->neighbour == n) {
+   nn = tn;
+   break;
+   }
+   spin_unlock(&ipoib_all_neigh_list_lock);
+   if (!nn)
+   return;
+
ipoib_dbg(priv,
  "neigh_destructor for %06x " IPOIB_GID_FMT "\n",
  be32_to_cpup((__be32 *) n->ha),
@@ -783,19 +797,33 @@ struct ipoib_neigh *ipoib_neigh_alloc(st
neigh->neighbour = neighbour;
*to_ipoib_neigh(neighbour) = neigh;

+   spin_lock(&ipoib_all_neigh_list_lock);
+   list_add_tail(&neigh->all_neigh_list, &ipoib_all_neigh_list);
+   neigh->neighbour->ops->destructor = ipoib_neigh_destructor;
+   spin_unlock(&ipoib_all_neigh_list_lock);
+
return neigh;
 }

 void ipoib_neigh_free(struct ipoib_neigh *neigh)
 {
+   struct ipoib_neigh *nn;
+   spin_lock(&ipoib_all_neigh_list_lock);
+   list_del(&neigh->all_neigh_list);
+   list_for_each_entry(nn, &ipoib_all_neigh_list, all_neigh_list)
+   if (nn->neighbour->ops == neigh->neighbour->ops)
+   goto found;
+
+   neigh->neighbour->ops->destructor = NULL;
+found:
+   spin_unlock(&ipoib_all_neigh_list_lock);
+
*to_ipoib_neigh(neigh->neighbour) = NULL;
kfree(neigh);
 }

 static int ipoib_neigh_setup_dev(struct net_device *dev, struct neigh_parms *parms)
 {
-   parms->neigh_destructor = ipoib_neigh_destructor;
-
return 0;
 }

Index: openib_branch1.0/drivers/infiniband/ulp/ipoib/ipoib.h
===
--- openib_branch1.0.orig/drivers/infiniband/ulp/ipoib/ipoib.h
+++ openib_branch1.0/drivers/infiniband/ulp/ipoib/ipoib.h
@@ -47,6 +47,8 @@
 #include 
 #include 

+#include 
+
 #include 

 #include 
@@ -217,6 +219,7 @@ struct ipoib_neigh {

struct neighbour   *neighbour;

+   struct list_head all_neigh_list;
struct list_head list;
 };
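
As far as I can read the patch above, the idea is: the destructor hook lives in an ops structure shared between neighbours, so the hook is set whenever an ipoib neigh is allocated and cleared only when no remaining ipoib neigh uses the same ops. A user-space sketch of just that bookkeeping follows; all names are hypothetical, a fixed array stands in for ipoib_all_neigh_list, and the spinlock is left out:

```c
#include <stddef.h>

#define MAX_TRACKED 8

typedef void (*destructor_fn)(void *);

/* Stand-in for neigh_ops: one instance may be shared by many neighbours. */
struct shared_ops {
    destructor_fn destructor;
};

/* Stand-in for one entry on the global "all neigh" list. */
struct tracked_neigh {
    struct shared_ops *ops;
    int in_use;
};

static struct tracked_neigh tracked[MAX_TRACKED];

static void my_destructor(void *neigh)
{
    (void)neigh; /* real code would release per-neighbour state here */
}

/* Register a neighbour and install the hook; returns a slot or -1. */
static int track_neigh(struct shared_ops *ops)
{
    for (int i = 0; i < MAX_TRACKED; i++) {
        if (!tracked[i].in_use) {
            tracked[i].in_use = 1;
            tracked[i].ops = ops;
            ops->destructor = my_destructor;
            return i;
        }
    }
    return -1;
}

/* Unregister; clear the hook only if nobody else shares these ops. */
static void untrack_neigh(int slot)
{
    struct shared_ops *ops = tracked[slot].ops;

    tracked[slot].in_use = 0;
    for (int i = 0; i < MAX_TRACKED; i++)
        if (tracked[i].in_use && tracked[i].ops == ops)
            return; /* another tracked neighbour still needs the hook */
    ops->destructor = NULL;
}
```

The patch's destructor additionally scans the list and returns early for neighbours it does not know, since the shared hook can fire for neighbours that belong to other net devices.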





Re: [openib-general] ipoib ipv6 multicast joins, was: multicast code/merge status

2007-01-15 Thread Or Gerlitz
On 1/15/07, Sean Hefty <[EMAIL PROTECTED]> wrote:
> > Can you explain how this relates to your multicast changes? the IPoIB
> > send-only-full-member-join hack was there before your patch and stayed
> > there after your patch... and how come a change in the multicast code
> > can cause the error stream to be finite... have you moved the retry
> > mechanism from the ib_sa consumer to the ib_sa mcast engine?
>
> There was a bug in the ib_sa multicast engine handling failed joins, which had
> it retry forever.  (Basically, the response was not being matched with the
> request.  So the response was discarded, and the request was retried.)  I had
> fixed this in svn, but lost the patch moving over to git.

sure, got you.

Or.




Re: [openib-general] [PATCH] [libibverbs] Adding acks to all of the CQ events in the pingpong examples

2007-01-14 Thread Or Gerlitz
Roland Dreier wrote:
> OK, this is correct -- but since the examples don't destroy the CQ, is
> there any point in acking the events?

Yes, people use these examples when learning how to write code for IB, 
let's educate them well ... (ie the destroy cq should be added later)

Or.





Re: [openib-general] ipoib ipv6 multicast joins, was: multicast code/merge status

2007-01-14 Thread Or Gerlitz
Sean Hefty wrote:
>> So, this looks like a work-around for some broken SM, does it not?
> 
> Yes - I mentioned it because the resulting error message (wrong 
> component mask) is what was filling up the opensm log file.
> 
> Jan 11 14:21:36 083844 [40583BB0] -> osm_mcmr_rcv_join_mgrp: ERR 1B11:
> method = SubnAdmSet, scope_state = 0x1, component mask = 0x00010083,
> expected comp mask = 0x000130c7, MGID: 0x : 0x201400020404 from
> port 0x0002c9010ad258f1
> 
> I've applied a missing patch to my rdma-dev git tree that should avoid 
> filling up the opensm log file.  But the error in the opensm log file is 
> a result of this work-around.
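
To read the two masks in that log line, a small decoder helps. This is a sketch under the assumption that the component bits are numbered as in the MCMemberRecord layout of include/rdma/ib_sa.h (as I remember it); the table and function names are local to the example:

```c
#include <stdint.h>
#include <stdio.h>

/* Component names by bit position (assumed layout, see above). */
static const char *const comp_names[] = {
    "MGID", "PORT_GID", "QKEY", "MLID", "MTU_SELECTOR", "MTU",
    "TRAFFIC_CLASS", "PKEY", "RATE_SELECTOR", "RATE",
    "PACKET_LIFE_TIME_SELECTOR", "PACKET_LIFE_TIME", "SL", "FLOW_LABEL",
    "HOP_LIMIT", "SCOPE", "JOIN_STATE", "PROXY_JOIN",
};

/* Components the SM expected but the request did not supply. */
static uint64_t missing_components(uint64_t got, uint64_t expected)
{
    return expected & ~got;
}

/* Print one component name per set bit in the mask. */
static void print_components(uint64_t mask)
{
    size_t n = sizeof(comp_names) / sizeof(comp_names[0]);

    for (size_t bit = 0; bit < n; bit++)
        if (mask & ((uint64_t)1 << bit))
            printf("%s\n", comp_names[bit]);
}
```

Calling missing_components(0x00010083, 0x000130c7) gives 0x3044, which under the assumed layout is QKEY, TRAFFIC_CLASS, SL and FLOW_LABEL. That fits a join that carries only MGID, PORT_GID, PKEY and JOIN_STATE hitting a group that does not exist, where the SM wants the creation fields as well.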

Sean,

Can you explain how this relates to your multicast changes? The IPoIB 
send-only-full-member-join hack was there before your patch and stayed 
there after your patch... and how come a change in the multicast code 
can cause the error stream to be finite... have you moved the retry 
mechanism from the ib_sa consumer to the ib_sa mcast engine?

Or.





Re: [openib-general] [RFC] [PATCH V2 0/3] bonding support for operation over IPoIB

2007-01-11 Thread Or Gerlitz
Or Gerlitz wrote:
> This patch series is a second version (see below link to V1) of the suggested
> changes to the bonding driver such that it would be able to support non
> ARPHRD_ETHER netdevices for its High-Availability (active-backup) mode.
> 
> 
> The motivation is to enable the bonding driver on its HA mode to work with the
> IP over Infiniband (IPoIB) driver. With these patches I was able to enslave
> IPoIB netdevices and run TCP, UDP, IP (UDP) Multicast and ICMP traffic with
> fail-over and fail-back working fine. My working env was the net-2.6.20 git.


> These patches are not enough for configuration of IPoIB bonding through tools
> (eg /sbin/ifenslave and /sbin/ifup) provided by packages such as sysconfig and
> initscripts, specifically since these tools sets the bonding device to be UP
> before enslaving anything. Once this patchset gets positive/feedback the next
> step would be to look how to enhance the tools/packages so it would be possible
> to bond/enslave with the modified code. As suggested by the bonding maintainer,
> this step can potentially involve converting ifenslave to be a script based on
> the bonding sysfs infrastructure rather on the somehow obsoleted
> Documentation/networking/ifenslave.c

Jay,

I would like to move forward and push the V2 patch series upstream 
through netdev and then start working on the configuration tools etc 
changes needed to support bonding IPoIB devices through non direct 
bonding sysfs scripts... are you OK with that?

If you agree to the push, who is doing this nowadays, is it Jeff Garzik 
or David Miller?

Roland - any other comments/concerns that you might have are very much 
appreciated.

Or.





Re: [openib-general] [PATCH 7/7] libcxgb3: Update libcxgb3 for new libibverbs driver handling

2007-01-10 Thread Or Gerlitz
Michael S. Tsirkin wrote:

>>  > So libibverbs 1.1 will be part of ofed 1.2?

>> That's the goal, and I guess you're counting on it for libcxg3

> I guess this means libcxg3 can be made to work with libibverbs 1.0 if
> desired.

Just a reminder of the importance of including libibverbs 1.1 in OFED 
1.2 ---> to have the ***fork*** support merged at last into an official 
release.

Or.







Re: [openib-general] multicast code/merge status

2007-01-10 Thread Or Gerlitz
On 10 Jan 2007 14:31:38 -0500, Hal Rosenstock <[EMAIL PROTECTED]> wrote:
> On Wed, 2007-01-10 at 13:47, Or Gerlitz wrote:

> > (*) there are some more issues here which need to be addressed, see
> > for example the "Some SMs don't support send-only yet" weird comment
> > at ipoib_mcast_sendonly_join()

> It's more likely an SA issue but I'm only guessing... It may also be
> historical...

We are not a huge community, how about asking the person who put this
comment to come and say "i did it" and "it was done b/c this or that"

Or.
