Re: [PATCH IB/core 2/2] IB/cm: Send authentic pkey in REQ msg and check eligibility of the pkeys

2018-05-16 Thread Hal Rosenstock
On 5/16/2018 3:22 PM, Håkon Bugge wrote:
> But, do we need an update to IBTA (that the BTH.PKey shall be that of the 
> VM's Port)?

Nothing in spec mentions shared (port) virtualization so that is an
exercise completely left to the reader...

Annex A19 is silent on this specific point but the virtual port model
has virtual port partition table so there's no ambiguity there.


Re: [PATCH IB/core 2/2] IB/cm: Send authentic pkey in REQ msg and check eligibility of the pkeys

2018-05-16 Thread Hal Rosenstock
On 5/16/2018 3:22 PM, Håkon Bugge wrote:
> But, do we need an update to IBTA (that the BTH.PKey shall be that of the 
> VM's Port)?

Nothing in spec mentions shared (port) virtualization so that is an
exercise completely left to the reader...

Annex A19 is silent on this specific point but the virtual port model
has virtual port partition table so there's no ambiguity there.


Re: [PATCH IB/core 2/2] IB/cm: Send authentic pkey in REQ msg and check eligibility of the pkeys

2018-05-16 Thread Hal Rosenstock
On 5/16/2018 11:12 AM, Jason Gunthorpe wrote:
> On Wed, May 16, 2018 at 08:47:21AM +0200, Håkon Bugge wrote:
> 
>>> This is not a difficult issue.
>>>
>>> If the GMP is properly tagged with the right PKey then it will never
>>> be delivered to the VM if the VM does not have the PKey in the
>>> table.
>>
>> Not quite right. For the shared port model, a GMP will (most
>> probably) be accepted by the physical port, due to:
> 
> Sure, but I am talking about the VM's 'virtual port'.
> 
>> So, if the GMP is destined to VM1, which is a limited member, but we
>> have another VM2 which is a full member of the same partition, the
>> GMP will pass the HCA’s PKey check.
> 
> Sure.
> 
>>> It is up to the hypervisor to block GMPs that have Pkeys not in
>>> the virtual PKey table of the VF.
>>
>> The packet is received by the HCA and stripped from IB headers, in
>> particular the BTH. How can the "hypervisor" block it when its
>> doesn’t have access to the GMP’s BTH.PKey?
> 
> This is wrong too, in Linux the WC's for GMP packets include the pkey
> index that was used to RX the packet. This is a Linux extension
> matching the one on the sending side.
> 
> The hypervisor *must* block these GMPs, it is a hard requirement of
> the pkey security model for VMs.

Aside from the pkey enforcement at the VF, there is also the issue of
the pkey trap - if physical port model is followed for shared virtual
ports, trap should be generated which could look confusing to SM as well
as counting in some error counter (for physical port, it's either
PortCounter:Port[Xmit Rcv]ConstraintErrors.

> AFAIK the Mellanox drivers do this right - do you know differently?
> 
>>> The only time you could need a new REJ code is if the GMP is using a
>>> PKey different from the REQ - which is a pretty goofy thing to do
>>> considering this VM case.
>>
>> Its goofy. In the CX-3 shared port model, the BTH.PKey is the
>> default one and the REQ.PKey is the full one even if the sending
>> VM’s port only is a limited member. This patch series fixes the last
>> issue.
> 
> Again, this is wrong, the BTH.Pkey and REQ.Pkey should be the same -

I do not believe there is anything in the spec that requires this. I
agree it's the simplest use model though.

> please fix that first before trying to change anything else.
> 
>>> Remember the SM doesn't know what Pkeys are in the VM, so it is
>>> basically impossible for the REQ side to reliably select two different
>>> pkeys and know that they will bothmake it to the VM.
>>
>> The active side should use the "authentic" PKey in the REQ
>> message. That is the one that would be used in BTH.PKey when
>> communication has been established. This is implemented by this
>> patch series.
>>
>> Not sure what you mean by "reliably select two different pkeys". The
>> CM REQ message contains one PKey.
> 
> If BTH.Pkey != REQ.PKey then the requestor side has to obviously
> select two PKeys, which is basically impossible.
> 
> The VM should not be part of the default partition, for instance.

I think that the VM is at least a limited member of the default partition.

-- Hal

>> See above, not sure how that could be implemented. And if it is
>> solved by the HCA trapping the GMP due to the PKey check, it doesn’t
>> help me, as the purpose of the series is to avoid (excessive) PKey
>> traps sent to the SM.
> 
> It might not help you, but is shows this fix for the pkey trap issue
> is wrong. You must fix the pkey traps on the sending side, not on the
> responder side..
> 
>>> If I recall there were bugs here in mlx drivers, where the driver
>>> sent with the wrong Pkey. I think this has actually been fixed
>>> now, so please check the upstream kernel to be sure the Pkey is
>>> not what it is supposed to be.
>>
>> Let me get back to you with some ibdumps here.
> 
> Upstream kernel (eg 4.16) and newish Mellanox firmware please.
> 
> And it would be fantastic if you could ID why this is happening, AFAIK
> it should be OK in the kernel.
> 
> Jason
> 


Re: [PATCH IB/core 2/2] IB/cm: Send authentic pkey in REQ msg and check eligibility of the pkeys

2018-05-16 Thread Hal Rosenstock
On 5/16/2018 11:12 AM, Jason Gunthorpe wrote:
> On Wed, May 16, 2018 at 08:47:21AM +0200, Håkon Bugge wrote:
> 
>>> This is not a difficult issue.
>>>
>>> If the GMP is properly tagged with the right PKey then it will never
>>> be delivered to the VM if the VM does not have the PKey in the
>>> table.
>>
>> Not quite right. For the shared port model, a GMP will (most
>> probably) be accepted by the physical port, due to:
> 
> Sure, but I am talking about the VM's 'virtual port'.
> 
>> So, if the GMP is destined to VM1, which is a limited member, but we
>> have another VM2 which is a full member of the same partition, the
>> GMP will pass the HCA’s PKey check.
> 
> Sure.
> 
>>> It is up to the hypervisor to block GMPs that have Pkeys not in
>>> the virtual PKey table of the VF.
>>
>> The packet is received by the HCA and stripped from IB headers, in
>> particular the BTH. How can the "hypervisor" block it when its
>> doesn’t have access to the GMP’s BTH.PKey?
> 
> This is wrong too, in Linux the WC's for GMP packets include the pkey
> index that was used to RX the packet. This is a Linux extension
> matching the one on the sending side.
> 
> The hypervisor *must* block these GMPs, it is a hard requirement of
> the pkey security model for VMs.

Aside from the pkey enforcement at the VF, there is also the issue of
the pkey trap - if physical port model is followed for shared virtual
ports, trap should be generated which could look confusing to SM as well
as counting in some error counter (for physical port, it's either
PortCounter:Port[Xmit Rcv]ConstraintErrors.

> AFAIK the Mellanox drivers do this right - do you know differently?
> 
>>> The only time you could need a new REJ code is if the GMP is using a
>>> PKey different from the REQ - which is a pretty goofy thing to do
>>> considering this VM case.
>>
>> Its goofy. In the CX-3 shared port model, the BTH.PKey is the
>> default one and the REQ.PKey is the full one even if the sending
>> VM’s port only is a limited member. This patch series fixes the last
>> issue.
> 
> Again, this is wrong, the BTH.Pkey and REQ.Pkey should be the same -

I do not believe there is anything in the spec that requires this. I
agree it's the simplest use model though.

> please fix that first before trying to change anything else.
> 
>>> Remember the SM doesn't know what Pkeys are in the VM, so it is
>>> basically impossible for the REQ side to reliably select two different
>>> pkeys and know that they will bothmake it to the VM.
>>
>> The active side should use the "authentic" PKey in the REQ
>> message. That is the one that would be used in BTH.PKey when
>> communication has been established. This is implemented by this
>> patch series.
>>
>> Not sure what you mean by "reliably select two different pkeys". The
>> CM REQ message contains one PKey.
> 
> If BTH.Pkey != REQ.PKey then the requestor side has to obviously
> select two PKeys, which is basically impossible.
> 
> The VM should not be part of the default partition, for instance.

I think that the VM is at least a limited member of the default partition.

-- Hal

>> See above, not sure how that could be implemented. And if it is
>> solved by the HCA trapping the GMP due to the PKey check, it doesn’t
>> help me, as the purpose of the series is to avoid (excessive) PKey
>> traps sent to the SM.
> 
> It might not help you, but is shows this fix for the pkey trap issue
> is wrong. You must fix the pkey traps on the sending side, not on the
> responder side..
> 
>>> If I recall there were bugs here in mlx drivers, where the driver
>>> sent with the wrong Pkey. I think this has actually been fixed
>>> now, so please check the upstream kernel to be sure the Pkey is
>>> not what it is supposed to be.
>>
>> Let me get back to you with some ibdumps here.
> 
> Upstream kernel (eg 4.16) and newish Mellanox firmware please.
> 
> And it would be fantastic if you could ID why this is happening, AFAIK
> it should be OK in the kernel.
> 
> Jason
> 


Re: [PATCH IB/core 2/2] IB/cm: Send authentic pkey in REQ msg and check eligibility of the pkeys

2018-05-14 Thread Hal Rosenstock
On 5/14/2018 5:02 PM, Jason Gunthorpe wrote:
> On Thu, May 10, 2018 at 05:16:28PM +0200, Håkon Bugge wrote:
> 
>> We are talking about two things here. The PKey in the BTH and the
>> PKey in the CM REQ payload. They differ.
>>
>> I am out of office, but if my memory serves me correct, the PKey in
>> the BTH in the MAD packet will be the default PKey. Further, we have
>> per IBTA:
> 
> This sounds like a Linux bug.
> 
> Linux does not do a PR to get a reversible path dedicated to the GMP> so it 
> always uses the data flow path, thus the GMP path paramenters
> and those in the REQ should always exactly match.
>
> Where is Linux setting the BTH.PKey and how did it choose to use the
> default pkey? Lets fix that at least for sure.
> 
> Once that is fixed the rest of the series makes no sense since a REQ
> with invalid PKey will never arrive.
> 
> However...
> 
> This series seems inconsistent with the spec.
> 
> IIRC the spec doesn't say if a full or limited pkey should be placed
> in the REQ (Hal?). 

CM spec for REQ just says partition key without indicating whether this
means P_Key or just the partition (15 bits) so my read is that either
full or limited pkey is allowed in REQ.

> It is designed so that the requestor can get a
> single reversible path and put that results into the REQ without
> additional processing, however the PR returns only one PKey and again,
> it is not really specified if it should be the full or limited pkey
> (Hal?).

Correct; it's not specified.

> Basically this means that any pkey in the REQ could randomly be the
> full or limited value, and that in-of-itself has not bearing on the
> connection.
> 
> So it is quite wrong to insist that the pkey be limited or full when
> processing the REQ. The end port is expected to match against the
> local table.

Note that there is thorny issue with shared (physical) port
virtualization. In shared port virtualization, the VF pkey assignment is
a local matter. Only thing SM knows is the physical port pkeys where
both full and limited membership of same partition is possible. It is
conceivable that CM REQ contains limited partition key between 2 limited
VFs and for that a new REJ reason code appears to be needed.

-- Hal

> The real answer to your trap problem is to fix the SM to not create
> paths that are non-functional, that is just flat out broken SM
> behavior.
>
> Jason
> 


Re: [PATCH IB/core 2/2] IB/cm: Send authentic pkey in REQ msg and check eligibility of the pkeys

2018-05-14 Thread Hal Rosenstock
On 5/14/2018 5:02 PM, Jason Gunthorpe wrote:
> On Thu, May 10, 2018 at 05:16:28PM +0200, Håkon Bugge wrote:
> 
>> We are talking about two things here. The PKey in the BTH and the
>> PKey in the CM REQ payload. They differ.
>>
>> I am out of office, but if my memory serves me correct, the PKey in
>> the BTH in the MAD packet will be the default PKey. Further, we have
>> per IBTA:
> 
> This sounds like a Linux bug.
> 
> Linux does not do a PR to get a reversible path dedicated to the GMP> so it 
> always uses the data flow path, thus the GMP path paramenters
> and those in the REQ should always exactly match.
>
> Where is Linux setting the BTH.PKey and how did it choose to use the
> default pkey? Lets fix that at least for sure.
> 
> Once that is fixed the rest of the series makes no sense since a REQ
> with invalid PKey will never arrive.
> 
> However...
> 
> This series seems inconsistent with the spec.
> 
> IIRC the spec doesn't say if a full or limited pkey should be placed
> in the REQ (Hal?). 

CM spec for REQ just says partition key without indicating whether this
means P_Key or just the partition (15 bits) so my read is that either
full or limited pkey is allowed in REQ.

> It is designed so that the requestor can get a
> single reversible path and put that results into the REQ without
> additional processing, however the PR returns only one PKey and again,
> it is not really specified if it should be the full or limited pkey
> (Hal?).

Correct; it's not specified.

> Basically this means that any pkey in the REQ could randomly be the
> full or limited value, and that in-of-itself has not bearing on the
> connection.
> 
> So it is quite wrong to insist that the pkey be limited or full when
> processing the REQ. The end port is expected to match against the
> local table.

Note that there is thorny issue with shared (physical) port
virtualization. In shared port virtualization, the VF pkey assignment is
a local matter. Only thing SM knows is the physical port pkeys where
both full and limited membership of same partition is possible. It is
conceivable that CM REQ contains limited partition key between 2 limited
VFs and for that a new REJ reason code appears to be needed.

-- Hal

> The real answer to your trap problem is to fix the SM to not create
> paths that are non-functional, that is just flat out broken SM
> behavior.
>
> Jason
> 


Re: [PATCH IB/core 2/2] IB/cm: Send authentic pkey in REQ msg and check eligibility of the pkeys

2018-05-11 Thread Hal Rosenstock
On 5/11/2018 6:55 AM, Håkon Bugge wrote:
> 
> 
>> On 10 May 2018, at 18:54, Hal Rosenstock <h...@dev.mellanox.co.il> wrote:
>>
>> On 5/10/2018 11:16 AM, Håkon Bugge wrote:
>>>
>>>
>>>> On 10 May 2018, at 16:01, Hal Rosenstock <h...@dev.mellanox.co.il> wrote:
>>>>
>>>> On 5/10/2018 5:16 AM, Håkon Bugge wrote:
>>>>>
>>>>>
>>>>>> On 9 May 2018, at 13:28, Hal Rosenstock <h...@dev.mellanox.co.il> wrote:
>>>>>>
>>>>>> On 5/9/2018 5:30 AM, Håkon Bugge wrote:
>>>>>>> There is no point in using RDMA CM to establish a connection between
>>>>>>> two QPs that cannot possible communicate. Particularly, if both the
>>>>>>> active and passive side use limited pkeys, they are not able to
>>>>>>> communicate.
>>>>>>>
>>>>>>> In order to detect this situation, the authentic pkey is used in the
>>>>>>> CM REQ message. The authentic pkey is the one that the HCA inserts
>>>>>>> into the BTH in the IB packets.
>>>>>>>
>>>>>>> When the passive side receives the REQ, commit ("ib_core: A full pkey
>>>>>>> is required to match a limited one") ensures that
>>>>>>> ib_find_matched_cached_pkey() fails unless at least one of the pkeys
>>>>>>> compared has the full-member bit.
>>>>>>>
>>>>>>> In the limited-to-limited case, this will prohibit the connection to
>>>>>>> be formed, and thus, Pkey Violation Traps will not be sent to the SM.
>>>>>>>
>>>>>>> Signed-off-by: Håkon Bugge <haakon.bu...@oracle.com>
>>>>>>> ---
>>>>>>> drivers/infiniband/core/cm.c | 39 
>>>>>>> ---
>>>>>>> include/rdma/ib_cm.h |  4 +++-
>>>>>>> 2 files changed, 35 insertions(+), 8 deletions(-)
>>>>>>>
>>>>>>> diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
>>>>>>> index a92e1a5c202b..52ed51d5bd2a 100644
>>>>>>> --- a/drivers/infiniband/core/cm.c
>>>>>>> +++ b/drivers/infiniband/core/cm.c
>>>>>>> @@ -3,6 +3,7 @@
>>>>>>> * Copyright (c) 2004 Topspin Corporation.  All rights reserved.
>>>>>>> * Copyright (c) 2004, 2005 Voltaire Corporation.  All rights reserved.
>>>>>>> * Copyright (c) 2005 Sun Microsystems, Inc. All rights reserved.
>>>>>>> + * Copyright (c) 2018 Oracle and/or its affiliates. All rights 
>>>>>>> reserved.
>>>>>>> *
>>>>>>> * This software is available to you under a choice of one of two
>>>>>>> * licenses.  You may choose to be licensed under the terms of the GNU
>>>>>>> @@ -91,6 +92,7 @@ static const char * const ibcm_rej_reason_strs[] = {
>>>>>>> [IB_CM_REJ_INVALID_CLASS_VERSION]   = "invalid class 
>>>>>>> version",
>>>>>>> [IB_CM_REJ_INVALID_FLOW_LABEL]  = "invalid flow label",
>>>>>>> [IB_CM_REJ_INVALID_ALT_FLOW_LABEL]  = "invalid alt flow 
>>>>>>> label",
>>>>>>> +   [IB_CM_REJ_INVALID_PKEY]= "invalid PKey",
>>>>>>
>>>>>> If this patch goes ahead, IBA spec for CM should be updated to include 
>>>>>> this.
>>>>>
>>>>> Sure, I see:
>>>>>
>>>>> 33 Invalid Alternate Flow Label
>>>>>
>>>>> as the latest in the spec.
>>>>
>>>> This appears to be an implementation rather than architectural issue. If
>>>> there is limited <-> limited CM, then the MAD would never get up to the
>>>> CM layer as it would be filtered by end port pkey enforcement. This is
>>>> only needed to protect against current code which can send on the full
>>>> rather than limited pkey (in BTH), right ?
>>>
>>> We are talking about two things here. The PKey in the BTH and the PKey in 
>>> the CM REQ payload. They differ.
>>
>> Yes, I understand the difference between the 2 potentially different PKeys.
>>
>>> I am out of office, but if my memory s

Re: [PATCH IB/core 2/2] IB/cm: Send authentic pkey in REQ msg and check eligibility of the pkeys

2018-05-11 Thread Hal Rosenstock
On 5/11/2018 6:55 AM, Håkon Bugge wrote:
> 
> 
>> On 10 May 2018, at 18:54, Hal Rosenstock  wrote:
>>
>> On 5/10/2018 11:16 AM, Håkon Bugge wrote:
>>>
>>>
>>>> On 10 May 2018, at 16:01, Hal Rosenstock  wrote:
>>>>
>>>> On 5/10/2018 5:16 AM, Håkon Bugge wrote:
>>>>>
>>>>>
>>>>>> On 9 May 2018, at 13:28, Hal Rosenstock  wrote:
>>>>>>
>>>>>> On 5/9/2018 5:30 AM, Håkon Bugge wrote:
>>>>>>> There is no point in using RDMA CM to establish a connection between
>>>>>>> two QPs that cannot possible communicate. Particularly, if both the
>>>>>>> active and passive side use limited pkeys, they are not able to
>>>>>>> communicate.
>>>>>>>
>>>>>>> In order to detect this situation, the authentic pkey is used in the
>>>>>>> CM REQ message. The authentic pkey is the one that the HCA inserts
>>>>>>> into the BTH in the IB packets.
>>>>>>>
>>>>>>> When the passive side receives the REQ, commit ("ib_core: A full pkey
>>>>>>> is required to match a limited one") ensures that
>>>>>>> ib_find_matched_cached_pkey() fails unless at least one of the pkeys
>>>>>>> compared has the full-member bit.
>>>>>>>
>>>>>>> In the limited-to-limited case, this will prohibit the connection to
>>>>>>> be formed, and thus, Pkey Violation Traps will not be sent to the SM.
>>>>>>>
>>>>>>> Signed-off-by: Håkon Bugge 
>>>>>>> ---
>>>>>>> drivers/infiniband/core/cm.c | 39 
>>>>>>> ---
>>>>>>> include/rdma/ib_cm.h |  4 +++-
>>>>>>> 2 files changed, 35 insertions(+), 8 deletions(-)
>>>>>>>
>>>>>>> diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
>>>>>>> index a92e1a5c202b..52ed51d5bd2a 100644
>>>>>>> --- a/drivers/infiniband/core/cm.c
>>>>>>> +++ b/drivers/infiniband/core/cm.c
>>>>>>> @@ -3,6 +3,7 @@
>>>>>>> * Copyright (c) 2004 Topspin Corporation.  All rights reserved.
>>>>>>> * Copyright (c) 2004, 2005 Voltaire Corporation.  All rights reserved.
>>>>>>> * Copyright (c) 2005 Sun Microsystems, Inc. All rights reserved.
>>>>>>> + * Copyright (c) 2018 Oracle and/or its affiliates. All rights 
>>>>>>> reserved.
>>>>>>> *
>>>>>>> * This software is available to you under a choice of one of two
>>>>>>> * licenses.  You may choose to be licensed under the terms of the GNU
>>>>>>> @@ -91,6 +92,7 @@ static const char * const ibcm_rej_reason_strs[] = {
>>>>>>> [IB_CM_REJ_INVALID_CLASS_VERSION]   = "invalid class 
>>>>>>> version",
>>>>>>> [IB_CM_REJ_INVALID_FLOW_LABEL]  = "invalid flow label",
>>>>>>> [IB_CM_REJ_INVALID_ALT_FLOW_LABEL]  = "invalid alt flow 
>>>>>>> label",
>>>>>>> +   [IB_CM_REJ_INVALID_PKEY]= "invalid PKey",
>>>>>>
>>>>>> If this patch goes ahead, IBA spec for CM should be updated to include 
>>>>>> this.
>>>>>
>>>>> Sure, I see:
>>>>>
>>>>> 33 Invalid Alternate Flow Label
>>>>>
>>>>> as the latest in the spec.
>>>>
>>>> This appears to be an implementation rather than architectural issue. If
>>>> there is limited <-> limited CM, then the MAD would never get up to the
>>>> CM layer as it would be filtered by end port pkey enforcement. This is
>>>> only needed to protect against current code which can send on the full
>>>> rather than limited pkey (in BTH), right ?
>>>
>>> We are talking about two things here. The PKey in the BTH and the PKey in 
>>> the CM REQ payload. They differ.
>>
>> Yes, I understand the difference between the 2 potentially different PKeys.
>>
>>> I am out of office, but if my memory serves me correct, the PKey in the BTH 
>>> in the MAD packet will be the default PKey. 
>>
>> MADs are se

Re: [PATCH IB/core 2/2] IB/cm: Send authentic pkey in REQ msg and check eligibility of the pkeys

2018-05-10 Thread Hal Rosenstock
On 5/10/2018 11:16 AM, Håkon Bugge wrote:
> 
> 
>> On 10 May 2018, at 16:01, Hal Rosenstock <h...@dev.mellanox.co.il> wrote:
>>
>> On 5/10/2018 5:16 AM, Håkon Bugge wrote:
>>>
>>>
>>>> On 9 May 2018, at 13:28, Hal Rosenstock <h...@dev.mellanox.co.il> wrote:
>>>>
>>>> On 5/9/2018 5:30 AM, Håkon Bugge wrote:
>>>>> There is no point in using RDMA CM to establish a connection between
>>>>> two QPs that cannot possible communicate. Particularly, if both the
>>>>> active and passive side use limited pkeys, they are not able to
>>>>> communicate.
>>>>>
>>>>> In order to detect this situation, the authentic pkey is used in the
>>>>> CM REQ message. The authentic pkey is the one that the HCA inserts
>>>>> into the BTH in the IB packets.
>>>>>
>>>>> When the passive side receives the REQ, commit ("ib_core: A full pkey
>>>>> is required to match a limited one") ensures that
>>>>> ib_find_matched_cached_pkey() fails unless at least one of the pkeys
>>>>> compared has the full-member bit.
>>>>>
>>>>> In the limited-to-limited case, this will prohibit the connection to
>>>>> be formed, and thus, Pkey Violation Traps will not be sent to the SM.
>>>>>
>>>>> Signed-off-by: Håkon Bugge <haakon.bu...@oracle.com>
>>>>> ---
>>>>> drivers/infiniband/core/cm.c | 39 ---
>>>>> include/rdma/ib_cm.h |  4 +++-
>>>>> 2 files changed, 35 insertions(+), 8 deletions(-)
>>>>>
>>>>> diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
>>>>> index a92e1a5c202b..52ed51d5bd2a 100644
>>>>> --- a/drivers/infiniband/core/cm.c
>>>>> +++ b/drivers/infiniband/core/cm.c
>>>>> @@ -3,6 +3,7 @@
>>>>> * Copyright (c) 2004 Topspin Corporation.  All rights reserved.
>>>>> * Copyright (c) 2004, 2005 Voltaire Corporation.  All rights reserved.
>>>>> * Copyright (c) 2005 Sun Microsystems, Inc. All rights reserved.
>>>>> + * Copyright (c) 2018 Oracle and/or its affiliates. All rights reserved.
>>>>> *
>>>>> * This software is available to you under a choice of one of two
>>>>> * licenses.  You may choose to be licensed under the terms of the GNU
>>>>> @@ -91,6 +92,7 @@ static const char * const ibcm_rej_reason_strs[] = {
>>>>>   [IB_CM_REJ_INVALID_CLASS_VERSION]   = "invalid class version",
>>>>>   [IB_CM_REJ_INVALID_FLOW_LABEL]  = "invalid flow label",
>>>>>   [IB_CM_REJ_INVALID_ALT_FLOW_LABEL]  = "invalid alt flow label",
>>>>> + [IB_CM_REJ_INVALID_PKEY]= "invalid PKey",
>>>>
>>>> If this patch goes ahead, IBA spec for CM should be updated to include 
>>>> this.
>>>
>>> Sure, I see:
>>>
>>> 33 Invalid Alternate Flow Label
>>>
>>> as the latest in the spec.
>>
>> This appears to be an implementation rather than architectural issue. If
>> there is limited <-> limited CM, then the MAD would never get up to the
>> CM layer as it would be filtered by end port pkey enforcement. This is
>> only needed to protect against current code which can send on the full
>> rather than limited pkey (in BTH), right ?
> 
> We are talking about two things here. The PKey in the BTH and the PKey in the 
> CM REQ payload. They differ.

Yes, I understand the difference between the 2 potentially different PKeys.

> I am out of office, but if my memory serves me correct, the PKey in the BTH 
> in the MAD packet will be the default PKey. 

MADs are sent on reversible paths (13.5.4.1) where the response can be
constructed from only header info in the request. BTH:pkey doesn't
__have to be__ the default pkey. It just needs to be a common/shared
pkey between the source/destination pair.

For example, there are some configurations where the port is only a
limited member of the default partition (and full member of other non
default partition(s)).

There are also configurations where the port is both full and limited
member on one or more partition(s). This is the allow_both_pkeys TRUE
case for OpenSM which also uses the 'both' syntax in the partitions
config file. This appears to be the case to which you are referring.

CM MAD could be sent on default partition but that may not work if both
source and dest are only lim

Re: [PATCH IB/core 2/2] IB/cm: Send authentic pkey in REQ msg and check eligibility of the pkeys

2018-05-10 Thread Hal Rosenstock
On 5/10/2018 11:16 AM, Håkon Bugge wrote:
> 
> 
>> On 10 May 2018, at 16:01, Hal Rosenstock  wrote:
>>
>> On 5/10/2018 5:16 AM, Håkon Bugge wrote:
>>>
>>>
>>>> On 9 May 2018, at 13:28, Hal Rosenstock  wrote:
>>>>
>>>> On 5/9/2018 5:30 AM, Håkon Bugge wrote:
>>>>> There is no point in using RDMA CM to establish a connection between
>>>>> two QPs that cannot possible communicate. Particularly, if both the
>>>>> active and passive side use limited pkeys, they are not able to
>>>>> communicate.
>>>>>
>>>>> In order to detect this situation, the authentic pkey is used in the
>>>>> CM REQ message. The authentic pkey is the one that the HCA inserts
>>>>> into the BTH in the IB packets.
>>>>>
>>>>> When the passive side receives the REQ, commit ("ib_core: A full pkey
>>>>> is required to match a limited one") ensures that
>>>>> ib_find_matched_cached_pkey() fails unless at least one of the pkeys
>>>>> compared has the full-member bit.
>>>>>
>>>>> In the limited-to-limited case, this will prohibit the connection to
>>>>> be formed, and thus, Pkey Violation Traps will not be sent to the SM.
>>>>>
>>>>> Signed-off-by: Håkon Bugge 
>>>>> ---
>>>>> drivers/infiniband/core/cm.c | 39 ---
>>>>> include/rdma/ib_cm.h |  4 +++-
>>>>> 2 files changed, 35 insertions(+), 8 deletions(-)
>>>>>
>>>>> diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
>>>>> index a92e1a5c202b..52ed51d5bd2a 100644
>>>>> --- a/drivers/infiniband/core/cm.c
>>>>> +++ b/drivers/infiniband/core/cm.c
>>>>> @@ -3,6 +3,7 @@
>>>>> * Copyright (c) 2004 Topspin Corporation.  All rights reserved.
>>>>> * Copyright (c) 2004, 2005 Voltaire Corporation.  All rights reserved.
>>>>> * Copyright (c) 2005 Sun Microsystems, Inc. All rights reserved.
>>>>> + * Copyright (c) 2018 Oracle and/or its affiliates. All rights reserved.
>>>>> *
>>>>> * This software is available to you under a choice of one of two
>>>>> * licenses.  You may choose to be licensed under the terms of the GNU
>>>>> @@ -91,6 +92,7 @@ static const char * const ibcm_rej_reason_strs[] = {
>>>>>   [IB_CM_REJ_INVALID_CLASS_VERSION]   = "invalid class version",
>>>>>   [IB_CM_REJ_INVALID_FLOW_LABEL]  = "invalid flow label",
>>>>>   [IB_CM_REJ_INVALID_ALT_FLOW_LABEL]  = "invalid alt flow label",
>>>>> + [IB_CM_REJ_INVALID_PKEY]= "invalid PKey",
>>>>
>>>> If this patch goes ahead, IBA spec for CM should be updated to include 
>>>> this.
>>>
>>> Sure, I see:
>>>
>>> 33 Invalid Alternate Flow Label
>>>
>>> as the latest in the spec.
>>
>> This appears to be an implementation rather than architectural issue. If
>> there is limited <-> limited CM, then the MAD would never get up to the
>> CM layer as it would be filtered by end port pkey enforcement. This is
>> only needed to protect against current code which can send on the full
>> rather than limited pkey (in BTH), right ?
> 
> We are talking about two things here. The PKey in the BTH and the PKey in the 
> CM REQ payload. They differ.

Yes, I understand the difference between the 2 potentially different PKeys.

> I am out of office, but if my memory serves me correct, the PKey in the BTH 
> in the MAD packet will be the default PKey. 

MADs are sent on reversible paths (13.5.4.1) where the response can be
constructed from only header info in the request. BTH:pkey doesn't
__have to be__ the default pkey. It just needs to be a common/shared
pkey between the source/destination pair.

For example, there are some configurations where the port is only a
limited member of the default partition (and full member of other non
default partition(s)).

There are also configurations where the port is both full and limited
member on one or more partition(s). This is the allow_both_pkeys TRUE
case for OpenSM which also uses the 'both' syntax in the partitions
config file. This appears to be the case to which you are referring.

CM MAD could be sent on default partition but that may not work if both
source and dest are only limited members of the default partition.

CM MAD could be sent on any common/shared

Re: [PATCH IB/core 2/2] IB/cm: Send authentic pkey in REQ msg and check eligibility of the pkeys

2018-05-10 Thread Hal Rosenstock
On 5/10/2018 5:16 AM, Håkon Bugge wrote:
> 
> 
>> On 9 May 2018, at 13:28, Hal Rosenstock <h...@dev.mellanox.co.il> wrote:
>>
>> On 5/9/2018 5:30 AM, Håkon Bugge wrote:
>>> There is no point in using RDMA CM to establish a connection between
>>> two QPs that cannot possible communicate. Particularly, if both the
>>> active and passive side use limited pkeys, they are not able to
>>> communicate.
>>>
>>> In order to detect this situation, the authentic pkey is used in the
>>> CM REQ message. The authentic pkey is the one that the HCA inserts
>>> into the BTH in the IB packets.
>>>
>>> When the passive side receives the REQ, commit ("ib_core: A full pkey
>>> is required to match a limited one") ensures that
>>> ib_find_matched_cached_pkey() fails unless at least one of the pkeys
>>> compared has the full-member bit.
>>>
>>> In the limited-to-limited case, this will prohibit the connection to
>>> be formed, and thus, Pkey Violation Traps will not be sent to the SM.
>>>
>>> Signed-off-by: Håkon Bugge <haakon.bu...@oracle.com>
>>> ---
>>> drivers/infiniband/core/cm.c | 39 ---
>>> include/rdma/ib_cm.h |  4 +++-
>>> 2 files changed, 35 insertions(+), 8 deletions(-)
>>>
>>> diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
>>> index a92e1a5c202b..52ed51d5bd2a 100644
>>> --- a/drivers/infiniband/core/cm.c
>>> +++ b/drivers/infiniband/core/cm.c
>>> @@ -3,6 +3,7 @@
>>>  * Copyright (c) 2004 Topspin Corporation.  All rights reserved.
>>>  * Copyright (c) 2004, 2005 Voltaire Corporation.  All rights reserved.
>>>  * Copyright (c) 2005 Sun Microsystems, Inc. All rights reserved.
>>> + * Copyright (c) 2018 Oracle and/or its affiliates. All rights reserved.
>>>  *
>>>  * This software is available to you under a choice of one of two
>>>  * licenses.  You may choose to be licensed under the terms of the GNU
>>> @@ -91,6 +92,7 @@ static const char * const ibcm_rej_reason_strs[] = {
>>> [IB_CM_REJ_INVALID_CLASS_VERSION]   = "invalid class version",
>>> [IB_CM_REJ_INVALID_FLOW_LABEL]  = "invalid flow label",
>>> [IB_CM_REJ_INVALID_ALT_FLOW_LABEL]  = "invalid alt flow label",
>>> +   [IB_CM_REJ_INVALID_PKEY]= "invalid PKey",
>>
>> If this patch goes ahead, IBA spec for CM should be updated to include this.
> 
> Sure, I see:
> 
>  33 Invalid Alternate Flow Label
> 
> as the latest in the spec.

This appears to be an implementation rather than architectural issue. If
there is limited <-> limited CM, then the MAD would never get up to the
CM layer as it would be filtered by end port pkey enforcement. This is
only needed to protect against current code which can send on the full
rather than limited pkey (in BTH), right ?

>>
>>> };
>>>
>>> const char *__attribute_const__ ibcm_reject_msg(int reason)
>>> @@ -518,8 +520,8 @@ static int cm_init_av_by_path(struct sa_path_rec *path, 
>>> struct cm_av *av,
>>> return -EINVAL;
>>> cm_dev = port->cm_dev;
>>>
>>> -   ret = ib_find_cached_pkey(cm_dev->ib_device, port->port_num,
>>> - be16_to_cpu(path->pkey), >pkey_index);
>>> +   ret = ib_find_matched_cached_pkey(cm_dev->ib_device, port->port_num,
>>> + be16_to_cpu(path->pkey), 
>>> >pkey_index);
>>> if (ret)
>>> return ret;
>>>
>>> @@ -1241,7 +1243,7 @@ static void cm_format_req(struct cm_req_msg *req_msg,
>>> cm_req_set_starting_psn(req_msg, cpu_to_be32(param->starting_psn));
>>> cm_req_set_local_resp_timeout(req_msg,
>>>   param->local_cm_response_timeout);
>>> -   req_msg->pkey = param->primary_path->pkey;
>>> +   req_msg->pkey = cpu_to_be16(cm_id_priv->pkey);
>>> cm_req_set_path_mtu(req_msg, param->primary_path->mtu);
>>> cm_req_set_max_cm_retries(req_msg, param->max_cm_retries);
>>>
>>> @@ -1396,7 +1398,23 @@ int ib_send_cm_req(struct ib_cm_id *cm_id,
>>> cm_id_priv->responder_resources = param->responder_resources;
>>> cm_id_priv->retry_count = param->retry_count;
>>> cm_id_priv->path_mtu = param->primary_path->mtu;
>>> -   cm_id_priv

Re: [PATCH IB/core 2/2] IB/cm: Send authentic pkey in REQ msg and check eligibility of the pkeys

2018-05-10 Thread Hal Rosenstock
On 5/10/2018 5:16 AM, Håkon Bugge wrote:
> 
> 
>> On 9 May 2018, at 13:28, Hal Rosenstock  wrote:
>>
>> On 5/9/2018 5:30 AM, Håkon Bugge wrote:
>>> There is no point in using RDMA CM to establish a connection between
>>> two QPs that cannot possible communicate. Particularly, if both the
>>> active and passive side use limited pkeys, they are not able to
>>> communicate.
>>>
>>> In order to detect this situation, the authentic pkey is used in the
>>> CM REQ message. The authentic pkey is the one that the HCA inserts
>>> into the BTH in the IB packets.
>>>
>>> When the passive side receives the REQ, commit ("ib_core: A full pkey
>>> is required to match a limited one") ensures that
>>> ib_find_matched_cached_pkey() fails unless at least one of the pkeys
>>> compared has the full-member bit.
>>>
>>> In the limited-to-limited case, this will prohibit the connection to
>>> be formed, and thus, Pkey Violation Traps will not be sent to the SM.
>>>
>>> Signed-off-by: Håkon Bugge 
>>> ---
>>> drivers/infiniband/core/cm.c | 39 ---
>>> include/rdma/ib_cm.h |  4 +++-
>>> 2 files changed, 35 insertions(+), 8 deletions(-)
>>>
>>> diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
>>> index a92e1a5c202b..52ed51d5bd2a 100644
>>> --- a/drivers/infiniband/core/cm.c
>>> +++ b/drivers/infiniband/core/cm.c
>>> @@ -3,6 +3,7 @@
>>>  * Copyright (c) 2004 Topspin Corporation.  All rights reserved.
>>>  * Copyright (c) 2004, 2005 Voltaire Corporation.  All rights reserved.
>>>  * Copyright (c) 2005 Sun Microsystems, Inc. All rights reserved.
>>> + * Copyright (c) 2018 Oracle and/or its affiliates. All rights reserved.
>>>  *
>>>  * This software is available to you under a choice of one of two
>>>  * licenses.  You may choose to be licensed under the terms of the GNU
>>> @@ -91,6 +92,7 @@ static const char * const ibcm_rej_reason_strs[] = {
>>> [IB_CM_REJ_INVALID_CLASS_VERSION]   = "invalid class version",
>>> [IB_CM_REJ_INVALID_FLOW_LABEL]  = "invalid flow label",
>>> [IB_CM_REJ_INVALID_ALT_FLOW_LABEL]  = "invalid alt flow label",
>>> +   [IB_CM_REJ_INVALID_PKEY]= "invalid PKey",
>>
>> If this patch goes ahead, IBA spec for CM should be updated to include this.
> 
> Sure, I see:
> 
>  33 Invalid Alternate Flow Label
> 
> as the latest in the spec.

This appears to be an implementation rather than architectural issue. If
there is limited <-> limited CM, then the MAD would never get up to the
CM layer as it would be filtered by end port pkey enforcement. This is
only needed to protect against current code which can send on the full
rather than limited pkey (in BTH), right ?

>>
>>> };
>>>
>>> const char *__attribute_const__ ibcm_reject_msg(int reason)
>>> @@ -518,8 +520,8 @@ static int cm_init_av_by_path(struct sa_path_rec *path, 
>>> struct cm_av *av,
>>> return -EINVAL;
>>> cm_dev = port->cm_dev;
>>>
>>> -   ret = ib_find_cached_pkey(cm_dev->ib_device, port->port_num,
>>> - be16_to_cpu(path->pkey), >pkey_index);
>>> +   ret = ib_find_matched_cached_pkey(cm_dev->ib_device, port->port_num,
>>> + be16_to_cpu(path->pkey), 
>>> >pkey_index);
>>> if (ret)
>>> return ret;
>>>
>>> @@ -1241,7 +1243,7 @@ static void cm_format_req(struct cm_req_msg *req_msg,
>>> cm_req_set_starting_psn(req_msg, cpu_to_be32(param->starting_psn));
>>> cm_req_set_local_resp_timeout(req_msg,
>>>   param->local_cm_response_timeout);
>>> -   req_msg->pkey = param->primary_path->pkey;
>>> +   req_msg->pkey = cpu_to_be16(cm_id_priv->pkey);
>>> cm_req_set_path_mtu(req_msg, param->primary_path->mtu);
>>> cm_req_set_max_cm_retries(req_msg, param->max_cm_retries);
>>>
>>> @@ -1396,7 +1398,23 @@ int ib_send_cm_req(struct ib_cm_id *cm_id,
>>> cm_id_priv->responder_resources = param->responder_resources;
>>> cm_id_priv->retry_count = param->retry_count;
>>> cm_id_priv->path_mtu = param->primary_path->mtu;
>>> -   cm_id_priv->pkey = param->primary_path->pkey;
>>> +
>>> 

Re: [PATCH IB/core 2/2] IB/cm: Send authentic pkey in REQ msg and check eligibility of the pkeys

2018-05-09 Thread Hal Rosenstock
On 5/9/2018 5:30 AM, Håkon Bugge wrote:
> There is no point in using RDMA CM to establish a connection between
> two QPs that cannot possible communicate. Particularly, if both the
> active and passive side use limited pkeys, they are not able to
> communicate.
> 
> In order to detect this situation, the authentic pkey is used in the
> CM REQ message. The authentic pkey is the one that the HCA inserts
> into the BTH in the IB packets.
> 
> When the passive side receives the REQ, commit ("ib_core: A full pkey
> is required to match a limited one") ensures that
> ib_find_matched_cached_pkey() fails unless at least one of the pkeys
> compared has the full-member bit.
> 
> In the limited-to-limited case, this will prohibit the connection to
> be formed, and thus, Pkey Violation Traps will not be sent to the SM.
> 
> Signed-off-by: Håkon Bugge 
> ---
>  drivers/infiniband/core/cm.c | 39 ---
>  include/rdma/ib_cm.h |  4 +++-
>  2 files changed, 35 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
> index a92e1a5c202b..52ed51d5bd2a 100644
> --- a/drivers/infiniband/core/cm.c
> +++ b/drivers/infiniband/core/cm.c
> @@ -3,6 +3,7 @@
>   * Copyright (c) 2004 Topspin Corporation.  All rights reserved.
>   * Copyright (c) 2004, 2005 Voltaire Corporation.  All rights reserved.
>   * Copyright (c) 2005 Sun Microsystems, Inc. All rights reserved.
> + * Copyright (c) 2018 Oracle and/or its affiliates. All rights reserved.
>   *
>   * This software is available to you under a choice of one of two
>   * licenses.  You may choose to be licensed under the terms of the GNU
> @@ -91,6 +92,7 @@ static const char * const ibcm_rej_reason_strs[] = {
>   [IB_CM_REJ_INVALID_CLASS_VERSION]   = "invalid class version",
>   [IB_CM_REJ_INVALID_FLOW_LABEL]  = "invalid flow label",
>   [IB_CM_REJ_INVALID_ALT_FLOW_LABEL]  = "invalid alt flow label",
> + [IB_CM_REJ_INVALID_PKEY]= "invalid PKey",

If this patch goes ahead, IBA spec for CM should be updated to include this.

>  };
>  
>  const char *__attribute_const__ ibcm_reject_msg(int reason)
> @@ -518,8 +520,8 @@ static int cm_init_av_by_path(struct sa_path_rec *path, 
> struct cm_av *av,
>   return -EINVAL;
>   cm_dev = port->cm_dev;
>  
> - ret = ib_find_cached_pkey(cm_dev->ib_device, port->port_num,
> -   be16_to_cpu(path->pkey), >pkey_index);
> + ret = ib_find_matched_cached_pkey(cm_dev->ib_device, port->port_num,
> +   be16_to_cpu(path->pkey), 
> >pkey_index);
>   if (ret)
>   return ret;
>  
> @@ -1241,7 +1243,7 @@ static void cm_format_req(struct cm_req_msg *req_msg,
>   cm_req_set_starting_psn(req_msg, cpu_to_be32(param->starting_psn));
>   cm_req_set_local_resp_timeout(req_msg,
> param->local_cm_response_timeout);
> - req_msg->pkey = param->primary_path->pkey;
> + req_msg->pkey = cpu_to_be16(cm_id_priv->pkey);
>   cm_req_set_path_mtu(req_msg, param->primary_path->mtu);
>   cm_req_set_max_cm_retries(req_msg, param->max_cm_retries);
>  
> @@ -1396,7 +1398,23 @@ int ib_send_cm_req(struct ib_cm_id *cm_id,
>   cm_id_priv->responder_resources = param->responder_resources;
>   cm_id_priv->retry_count = param->retry_count;
>   cm_id_priv->path_mtu = param->primary_path->mtu;
> - cm_id_priv->pkey = param->primary_path->pkey;
> +
> + /*
> +  * We want to send the pkey used in the BTH in packets
> +  * sent. This, in order for the passive side to determine if
> +  * communication is permitted by the respective pkeys.
> +  *
> +  * The pkey in the paths are derived from the MGID, which has
> +  * the full membership bit set. Hence, we retrieve the pkey by
> +  * using the address vector's pkey_index.

The paths usually come from the SM and I don't expect SM to provide path
between ports of only limited members of partition. Default ACM provider
forms path from multicast group parameters including pkey. Is that the
scenario of concern ? If so, I still don't fully understand the scenario
because limited members are not supposed to be part of a multicast
group. There was some work started to extend this for client/server
model but it was never completed. However, there may be hole(s) in
various components of implementation which open(s) this door.

-- Hal

> +  */
> + ret = ib_get_cached_pkey(cm_id_priv->id.device,
> +  cm_id_priv->av.port->port_num,
> +  cm_id_priv->av.pkey_index,
> +  _id_priv->pkey);
> + if (ret)
> + goto error1;
> +
>   cm_id_priv->qp_type = param->qp_type;
>  
>   ret = cm_alloc_msg(cm_id_priv, _id_priv->msg);
> @@ -1956,16 +1974,19 @@ static int cm_req_handler(struct cm_work 

Re: [PATCH IB/core 2/2] IB/cm: Send authentic pkey in REQ msg and check eligibility of the pkeys

2018-05-09 Thread Hal Rosenstock
On 5/9/2018 5:30 AM, Håkon Bugge wrote:
> There is no point in using RDMA CM to establish a connection between
> two QPs that cannot possible communicate. Particularly, if both the
> active and passive side use limited pkeys, they are not able to
> communicate.
> 
> In order to detect this situation, the authentic pkey is used in the
> CM REQ message. The authentic pkey is the one that the HCA inserts
> into the BTH in the IB packets.
> 
> When the passive side receives the REQ, commit ("ib_core: A full pkey
> is required to match a limited one") ensures that
> ib_find_matched_cached_pkey() fails unless at least one of the pkeys
> compared has the full-member bit.
> 
> In the limited-to-limited case, this will prohibit the connection to
> be formed, and thus, Pkey Violation Traps will not be sent to the SM.
> 
> Signed-off-by: Håkon Bugge 
> ---
>  drivers/infiniband/core/cm.c | 39 ---
>  include/rdma/ib_cm.h |  4 +++-
>  2 files changed, 35 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
> index a92e1a5c202b..52ed51d5bd2a 100644
> --- a/drivers/infiniband/core/cm.c
> +++ b/drivers/infiniband/core/cm.c
> @@ -3,6 +3,7 @@
>   * Copyright (c) 2004 Topspin Corporation.  All rights reserved.
>   * Copyright (c) 2004, 2005 Voltaire Corporation.  All rights reserved.
>   * Copyright (c) 2005 Sun Microsystems, Inc. All rights reserved.
> + * Copyright (c) 2018 Oracle and/or its affiliates. All rights reserved.
>   *
>   * This software is available to you under a choice of one of two
>   * licenses.  You may choose to be licensed under the terms of the GNU
> @@ -91,6 +92,7 @@ static const char * const ibcm_rej_reason_strs[] = {
>   [IB_CM_REJ_INVALID_CLASS_VERSION]   = "invalid class version",
>   [IB_CM_REJ_INVALID_FLOW_LABEL]  = "invalid flow label",
>   [IB_CM_REJ_INVALID_ALT_FLOW_LABEL]  = "invalid alt flow label",
> + [IB_CM_REJ_INVALID_PKEY]= "invalid PKey",

If this patch goes ahead, IBA spec for CM should be updated to include this.

>  };
>  
>  const char *__attribute_const__ ibcm_reject_msg(int reason)
> @@ -518,8 +520,8 @@ static int cm_init_av_by_path(struct sa_path_rec *path, 
> struct cm_av *av,
>   return -EINVAL;
>   cm_dev = port->cm_dev;
>  
> - ret = ib_find_cached_pkey(cm_dev->ib_device, port->port_num,
> -   be16_to_cpu(path->pkey), >pkey_index);
> + ret = ib_find_matched_cached_pkey(cm_dev->ib_device, port->port_num,
> +   be16_to_cpu(path->pkey), 
> >pkey_index);
>   if (ret)
>   return ret;
>  
> @@ -1241,7 +1243,7 @@ static void cm_format_req(struct cm_req_msg *req_msg,
>   cm_req_set_starting_psn(req_msg, cpu_to_be32(param->starting_psn));
>   cm_req_set_local_resp_timeout(req_msg,
> param->local_cm_response_timeout);
> - req_msg->pkey = param->primary_path->pkey;
> + req_msg->pkey = cpu_to_be16(cm_id_priv->pkey);
>   cm_req_set_path_mtu(req_msg, param->primary_path->mtu);
>   cm_req_set_max_cm_retries(req_msg, param->max_cm_retries);
>  
> @@ -1396,7 +1398,23 @@ int ib_send_cm_req(struct ib_cm_id *cm_id,
>   cm_id_priv->responder_resources = param->responder_resources;
>   cm_id_priv->retry_count = param->retry_count;
>   cm_id_priv->path_mtu = param->primary_path->mtu;
> - cm_id_priv->pkey = param->primary_path->pkey;
> +
> + /*
> +  * We want to send the pkey used in the BTH in packets
> +  * sent. This, in order for the passive side to determine if
> +  * communication is permitted by the respective pkeys.
> +  *
> +  * The pkey in the paths are derived from the MGID, which has
> +  * the full membership bit set. Hence, we retrieve the pkey by
> +  * using the address vector's pkey_index.

The paths usually come from the SM and I don't expect SM to provide path
between ports of only limited members of partition. Default ACM provider
forms path from multicast group parameters including pkey. Is that the
scenario of concern ? If so, I still don't fully understand the scenario
because limited members are not supposed to be part of a multicast
group. There was some work started to extend this for client/server
model but it was never completed. However, there may be hole(s) in
various components of implementation which open(s) this door.

-- Hal

> +  */
> + ret = ib_get_cached_pkey(cm_id_priv->id.device,
> +  cm_id_priv->av.port->port_num,
> +  cm_id_priv->av.pkey_index,
> +  _id_priv->pkey);
> + if (ret)
> + goto error1;
> +
>   cm_id_priv->qp_type = param->qp_type;
>  
>   ret = cm_alloc_msg(cm_id_priv, _id_priv->msg);
> @@ -1956,16 +1974,19 @@ static int cm_req_handler(struct cm_work *work)
>  

Re: [PATCH] IB/mlx5: Set the default active rate and width to QDR and 4X

2018-03-16 Thread Hal Rosenstock
On 3/15/2018 10:37 PM, Honggang LI wrote:
> From: Honggang Li <ho...@redhat.com>
> 
> commit f1b65df5a232 ("IB/mlx5: Add support for active_width and
> active_speed in RoCE"). Before this patch applied, the mlx5_ib
> driver set default active_width and active_speed to IB_WIDTH_4X
> and IB_SPEED_QDR.
> 
> When the RoCE port is down, the RoCE port did not negotiate the
> active width with remote side. The active width is zero. If run
> ibstat to require the port status, ibstat will panic as it read
> invalid width from sys file.
> 
> This patch restores the original behavior.
> 
> Fixes: f1b65df5a232 ("IB/mlx5: Add support for active_width and active_speed 
> in RoCE").
> Signed-off-by: Honggang Li <ho...@redhat.com>

Reviewed-by: Hal Rosenstock <h...@mellanox.com>

> ---
>  drivers/infiniband/hw/mlx5/main.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/infiniband/hw/mlx5/main.c 
> b/drivers/infiniband/hw/mlx5/main.c
> index 033b6af90de9..a48e9730fab8 100644
> --- a/drivers/infiniband/hw/mlx5/main.c
> +++ b/drivers/infiniband/hw/mlx5/main.c
> @@ -388,6 +388,9 @@ static int mlx5_query_port_roce(struct ib_device *device, 
> u8 port_num,
>   if (err)
>   goto out;
>  
> + props->active_width = IB_WIDTH_4X;
> + props->active_speed = IB_SPEED_QDR;
> +
>   translate_eth_proto_oper(eth_prot_oper, >active_speed,
>>active_width);
>  
> 


Re: [PATCH] IB/mlx5: Set the default active rate and width to QDR and 4X

2018-03-16 Thread Hal Rosenstock
On 3/15/2018 10:37 PM, Honggang LI wrote:
> From: Honggang Li 
> 
> commit f1b65df5a232 ("IB/mlx5: Add support for active_width and
> active_speed in RoCE"). Before this patch applied, the mlx5_ib
> driver set default active_width and active_speed to IB_WIDTH_4X
> and IB_SPEED_QDR.
> 
> When the RoCE port is down, the RoCE port did not negotiate the
> active width with remote side. The active width is zero. If run
> ibstat to require the port status, ibstat will panic as it read
> invalid width from sys file.
> 
> This patch restores the original behavior.
> 
> Fixes: f1b65df5a232 ("IB/mlx5: Add support for active_width and active_speed 
> in RoCE").
> Signed-off-by: Honggang Li 

Reviewed-by: Hal Rosenstock 

> ---
>  drivers/infiniband/hw/mlx5/main.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/infiniband/hw/mlx5/main.c 
> b/drivers/infiniband/hw/mlx5/main.c
> index 033b6af90de9..a48e9730fab8 100644
> --- a/drivers/infiniband/hw/mlx5/main.c
> +++ b/drivers/infiniband/hw/mlx5/main.c
> @@ -388,6 +388,9 @@ static int mlx5_query_port_roce(struct ib_device *device, 
> u8 port_num,
>   if (err)
>   goto out;
>  
> + props->active_width = IB_WIDTH_4X;
> + props->active_speed = IB_SPEED_QDR;
> +
>   translate_eth_proto_oper(eth_prot_oper, >active_speed,
>>active_width);
>  
> 


Re: [PATCH 2/2] IB/core: Set width to 1X for invalid active widths when port is down

2018-03-15 Thread Hal Rosenstock
On 3/15/2018 8:43 AM, Honggang LI wrote:
> On Thu, Mar 15, 2018 at 08:32:02AM -0400, Hal Rosenstock wrote:
>> On 3/15/2018 8:01 AM, Hal Rosenstock wrote:
>>> On 3/15/2018 5:02 AM, Honggang LI wrote:
>>>> From: Honggang Li <ho...@redhat.com>
>>>>
>>>> commit f1b65df5a232 ("IB/mlx5: Add support for active_width and
>>>> active_speed in RoCE"). Before this patch applied, the mlx5_ib
>>>> driver set default active_width and active_speed to IB_WIDTH_4X
>>>> and IB_SPEED_QDR.
>>>>
>>>> When the RoCE port is down, the RoCE port did not negotiate the
>>>> active width with remote side. The active width is zero. If run
>>>> ibstat to require the port status, ibstat will panic as it read
>>>> invalid width from sys file.
>>>>
>>>> Signed-off-by: Honggang Li <ho...@redhat.com>
>>>> ---
>>>>  drivers/infiniband/core/sysfs.c | 15 +++
>>>>  1 file changed, 11 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/drivers/infiniband/core/sysfs.c 
>>>> b/drivers/infiniband/core/sysfs.c
>>>> index cf36ff1f0068..722e4571f4d2 100644
>>>> --- a/drivers/infiniband/core/sysfs.c
>>>> +++ b/drivers/infiniband/core/sysfs.c
>>>> @@ -240,6 +240,7 @@ static ssize_t rate_show(struct ib_port *p, struct 
>>>> port_attribute *unused,
>>>>struct ib_port_attr attr;
>>>>char *speed = "";
>>>>int rate;   /* in deci-Gb/sec */
>>>> +  int width;
>>>>ssize_t ret;
>>>>  
>>>>ret = ib_query_port(p->ibdev, p->port_num, );
>>>> @@ -278,13 +279,19 @@ static ssize_t rate_show(struct ib_port *p, struct 
>>>> port_attribute *unused,
>>>>break;
>>>>}
>>>>  
>>>> -  rate *= ib_width_enum_to_int(attr.active_width);
>>>> -  if (rate < 0)
>>>> -  return -EINVAL;
>>>> +  width = ib_width_enum_to_int(attr.active_width);
>>>> +  if (width < 0) {
>>>> +  if (attr.state != IB_PORT_ACTIVE)
>>>
>>> Link width is valid in any PortState other than Down so I think that
>>> this check should be:
>>> if (attr.state != IB_PORT_DOWN)
>>>
>>> However, I don't think overriding width should be needed for this case
>>> and just returning -EINVAL should be fine regardless of port state.
>>> AFAIK it's the driver responsibility to populate acceptable defaults for
>>> such parameters. What driver(s) have this issue ? Shouldn't it be fixed
>>> there rather than here ?
>>
>> I just noticed that you reference commit f1b65df5a232 ("IB/mlx5: Add
>> support for active_width and active_speed in RoCE"). Before this patch
>> applied, the mlx5_ib driver set default active_width and active_speed to
>> IB_WIDTH_4X and IB_SPEED_QDR.
>>
>> Should the fix be to hw/mlx5/main.c:translate_eth_proto_oper which now has:
>>
>> switch (eth_proto_oper) {
>> ...
>> default:
>> return -EINVAL;
>> }
>>
>> return 0;
>>
>> and change default case to:
>>  *active_width = IB_WIDTH_1X;
>>
>> ?
> 
> I suggest to restore previous behavior before apply f1b65df5a232.
> 
> diff --git a/drivers/infiniband/hw/mlx5/main.c 
> b/drivers/infiniband/hw/mlx5/main.c
> index 033b6af90de9..0d73d2772d9b 100644
> --- a/drivers/infiniband/hw/mlx5/main.c
> +++ b/drivers/infiniband/hw/mlx5/main.c
> @@ -388,6 +388,9 @@ static int mlx5_query_port_roce(struct ib_device *device, 
> u8 port_num,
>   if (err)
>   goto out;
>  
> +props->active_width = IB_WIDTH_4X;
> +props->active_speed = IB_SPEED_QDR;
> +
>   translate_eth_proto_oper(eth_prot_oper, >active_speed,
>>active_width);
>  

Sure; makes sense that it should preserve the original behavior for this
case.

Thanks.

-- Hal



Re: [PATCH 2/2] IB/core: Set width to 1X for invalid active widths when port is down

2018-03-15 Thread Hal Rosenstock
On 3/15/2018 8:43 AM, Honggang LI wrote:
> On Thu, Mar 15, 2018 at 08:32:02AM -0400, Hal Rosenstock wrote:
>> On 3/15/2018 8:01 AM, Hal Rosenstock wrote:
>>> On 3/15/2018 5:02 AM, Honggang LI wrote:
>>>> From: Honggang Li 
>>>>
>>>> commit f1b65df5a232 ("IB/mlx5: Add support for active_width and
>>>> active_speed in RoCE"). Before this patch applied, the mlx5_ib
>>>> driver set default active_width and active_speed to IB_WIDTH_4X
>>>> and IB_SPEED_QDR.
>>>>
>>>> When the RoCE port is down, the RoCE port did not negotiate the
>>>> active width with remote side. The active width is zero. If run
>>>> ibstat to require the port status, ibstat will panic as it read
>>>> invalid width from sys file.
>>>>
>>>> Signed-off-by: Honggang Li 
>>>> ---
>>>>  drivers/infiniband/core/sysfs.c | 15 +++
>>>>  1 file changed, 11 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/drivers/infiniband/core/sysfs.c 
>>>> b/drivers/infiniband/core/sysfs.c
>>>> index cf36ff1f0068..722e4571f4d2 100644
>>>> --- a/drivers/infiniband/core/sysfs.c
>>>> +++ b/drivers/infiniband/core/sysfs.c
>>>> @@ -240,6 +240,7 @@ static ssize_t rate_show(struct ib_port *p, struct 
>>>> port_attribute *unused,
>>>>struct ib_port_attr attr;
>>>>char *speed = "";
>>>>int rate;   /* in deci-Gb/sec */
>>>> +  int width;
>>>>ssize_t ret;
>>>>  
>>>>ret = ib_query_port(p->ibdev, p->port_num, );
>>>> @@ -278,13 +279,19 @@ static ssize_t rate_show(struct ib_port *p, struct 
>>>> port_attribute *unused,
>>>>break;
>>>>}
>>>>  
>>>> -  rate *= ib_width_enum_to_int(attr.active_width);
>>>> -  if (rate < 0)
>>>> -  return -EINVAL;
>>>> +  width = ib_width_enum_to_int(attr.active_width);
>>>> +  if (width < 0) {
>>>> +  if (attr.state != IB_PORT_ACTIVE)
>>>
>>> Link width is valid in any PortState other than Down so I think that
>>> this check should be:
>>> if (attr.state != IB_PORT_DOWN)
>>>
>>> However, I don't think overriding width should be needed for this case
>>> and just returning -EINVAL should be fine regardless of port state.
>>> AFAIK it's the driver responsibility to populate acceptable defaults for
>>> such parameters. What driver(s) have this issue ? Shouldn't it be fixed
>>> there rather than here ?
>>
>> I just noticed that you reference commit f1b65df5a232 ("IB/mlx5: Add
>> support for active_width and active_speed in RoCE"). Before this patch
>> applied, the mlx5_ib driver set default active_width and active_speed to
>> IB_WIDTH_4X and IB_SPEED_QDR.
>>
>> Should the fix be to hw/mlx5/main.c:translate_eth_proto_oper which now has:
>>
>> switch (eth_proto_oper) {
>> ...
>> default:
>> return -EINVAL;
>> }
>>
>> return 0;
>>
>> and change default case to:
>>  *active_width = IB_WIDTH_1X;
>>
>> ?
> 
> I suggest to restore previous behavior before apply f1b65df5a232.
> 
> diff --git a/drivers/infiniband/hw/mlx5/main.c 
> b/drivers/infiniband/hw/mlx5/main.c
> index 033b6af90de9..0d73d2772d9b 100644
> --- a/drivers/infiniband/hw/mlx5/main.c
> +++ b/drivers/infiniband/hw/mlx5/main.c
> @@ -388,6 +388,9 @@ static int mlx5_query_port_roce(struct ib_device *device, 
> u8 port_num,
>   if (err)
>   goto out;
>  
> +props->active_width = IB_WIDTH_4X;
> +props->active_speed = IB_SPEED_QDR;
> +
>   translate_eth_proto_oper(eth_prot_oper, >active_speed,
>>active_width);
>  

Sure; makes sense that it should preserve the original behavior for this
case.

Thanks.

-- Hal



Re: [PATCH 2/2] IB/core: Set width to 1X for invalid active widths when port is down

2018-03-15 Thread Hal Rosenstock
On 3/15/2018 8:01 AM, Hal Rosenstock wrote:
> On 3/15/2018 5:02 AM, Honggang LI wrote:
>> From: Honggang Li <ho...@redhat.com>
>>
>> commit f1b65df5a232 ("IB/mlx5: Add support for active_width and
>> active_speed in RoCE"). Before this patch applied, the mlx5_ib
>> driver set default active_width and active_speed to IB_WIDTH_4X
>> and IB_SPEED_QDR.
>>
>> When the RoCE port is down, the RoCE port did not negotiate the
>> active width with remote side. The active width is zero. If run
>> ibstat to require the port status, ibstat will panic as it read
>> invalid width from sys file.
>>
>> Signed-off-by: Honggang Li <ho...@redhat.com>
>> ---
>>  drivers/infiniband/core/sysfs.c | 15 +++
>>  1 file changed, 11 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/infiniband/core/sysfs.c 
>> b/drivers/infiniband/core/sysfs.c
>> index cf36ff1f0068..722e4571f4d2 100644
>> --- a/drivers/infiniband/core/sysfs.c
>> +++ b/drivers/infiniband/core/sysfs.c
>> @@ -240,6 +240,7 @@ static ssize_t rate_show(struct ib_port *p, struct 
>> port_attribute *unused,
>>  struct ib_port_attr attr;
>>  char *speed = "";
>>  int rate;   /* in deci-Gb/sec */
>> +int width;
>>  ssize_t ret;
>>  
>>  ret = ib_query_port(p->ibdev, p->port_num, );
>> @@ -278,13 +279,19 @@ static ssize_t rate_show(struct ib_port *p, struct 
>> port_attribute *unused,
>>  break;
>>  }
>>  
>> -rate *= ib_width_enum_to_int(attr.active_width);
>> -if (rate < 0)
>> -return -EINVAL;
>> +width = ib_width_enum_to_int(attr.active_width);
>> +if (width < 0) {
>> +if (attr.state != IB_PORT_ACTIVE)
> 
> Link width is valid in any PortState other than Down so I think that
> this check should be:
>   if (attr.state != IB_PORT_DOWN)
> 
> However, I don't think overriding width should be needed for this case
> and just returning -EINVAL should be fine regardless of port state.
> AFAIK it's the driver responsibility to populate acceptable defaults for
> such parameters. What driver(s) have this issue ? Shouldn't it be fixed
> there rather than here ?

I just noticed that you reference commit f1b65df5a232 ("IB/mlx5: Add
support for active_width and active_speed in RoCE"). Before this patch
applied, the mlx5_ib driver set default active_width and active_speed to
IB_WIDTH_4X and IB_SPEED_QDR.

Should the fix be to hw/mlx5/main.c:translate_eth_proto_oper which now has:

switch (eth_proto_oper) {
...
default:
return -EINVAL;
}

return 0;

and change default case to:
*active_width = IB_WIDTH_1X;

?

> -- Hal
> 
>> +width = 1; /* default to 1X for invalid widths */
>> +else
>> +return -EINVAL;
>> +}
>> +
>> +rate *= width;
>>  
>>  return sprintf(buf, "%d%s Gb/sec (%dX%s)\n",
>> rate / 10, rate % 10 ? ".5" : "",
>> -   ib_width_enum_to_int(attr.active_width), speed);
>> +   width, speed);
>>  }
>>  
>>  static ssize_t phys_state_show(struct ib_port *p, struct port_attribute 
>> *unused,
>>


Re: [PATCH 2/2] IB/core: Set width to 1X for invalid active widths when port is down

2018-03-15 Thread Hal Rosenstock
On 3/15/2018 8:01 AM, Hal Rosenstock wrote:
> On 3/15/2018 5:02 AM, Honggang LI wrote:
>> From: Honggang Li 
>>
>> commit f1b65df5a232 ("IB/mlx5: Add support for active_width and
>> active_speed in RoCE"). Before this patch applied, the mlx5_ib
>> driver set default active_width and active_speed to IB_WIDTH_4X
>> and IB_SPEED_QDR.
>>
>> When the RoCE port is down, the RoCE port did not negotiate the
>> active width with remote side. The active width is zero. If run
>> ibstat to require the port status, ibstat will panic as it read
>> invalid width from sys file.
>>
>> Signed-off-by: Honggang Li 
>> ---
>>  drivers/infiniband/core/sysfs.c | 15 +++
>>  1 file changed, 11 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/infiniband/core/sysfs.c 
>> b/drivers/infiniband/core/sysfs.c
>> index cf36ff1f0068..722e4571f4d2 100644
>> --- a/drivers/infiniband/core/sysfs.c
>> +++ b/drivers/infiniband/core/sysfs.c
>> @@ -240,6 +240,7 @@ static ssize_t rate_show(struct ib_port *p, struct 
>> port_attribute *unused,
>>  struct ib_port_attr attr;
>>  char *speed = "";
>>  int rate;   /* in deci-Gb/sec */
>> +int width;
>>  ssize_t ret;
>>  
>>  ret = ib_query_port(p->ibdev, p->port_num, );
>> @@ -278,13 +279,19 @@ static ssize_t rate_show(struct ib_port *p, struct 
>> port_attribute *unused,
>>  break;
>>  }
>>  
>> -rate *= ib_width_enum_to_int(attr.active_width);
>> -if (rate < 0)
>> -return -EINVAL;
>> +width = ib_width_enum_to_int(attr.active_width);
>> +if (width < 0) {
>> +if (attr.state != IB_PORT_ACTIVE)
> 
> Link width is valid in any PortState other than Down so I think that
> this check should be:
>   if (attr.state != IB_PORT_DOWN)
> 
> However, I don't think overriding width should be needed for this case
> and just returning -EINVAL should be fine regardless of port state.
> AFAIK it's the driver responsibility to populate acceptable defaults for
> such parameters. What driver(s) have this issue ? Shouldn't it be fixed
> there rather than here ?

I just noticed that you reference commit f1b65df5a232 ("IB/mlx5: Add
support for active_width and active_speed in RoCE"). Before this patch
applied, the mlx5_ib driver set default active_width and active_speed to
IB_WIDTH_4X and IB_SPEED_QDR.

Should the fix be to hw/mlx5/main.c:translate_eth_proto_oper which now has:

switch (eth_proto_oper) {
...
default:
return -EINVAL;
}

return 0;

and change default case to:
*active_width = IB_WIDTH_1X;

?

> -- Hal
> 
>> +width = 1; /* default to 1X for invalid widths */
>> +else
>> +return -EINVAL;
>> +}
>> +
>> +rate *= width;
>>  
>>  return sprintf(buf, "%d%s Gb/sec (%dX%s)\n",
>> rate / 10, rate % 10 ? ".5" : "",
>> -   ib_width_enum_to_int(attr.active_width), speed);
>> +   width, speed);
>>  }
>>  
>>  static ssize_t phys_state_show(struct ib_port *p, struct port_attribute 
>> *unused,
>>


Re: [PATCH 2/2] IB/core: Set width to 1X for invalid active widths when port is down

2018-03-15 Thread Hal Rosenstock
On 3/15/2018 5:02 AM, Honggang LI wrote:
> From: Honggang Li 
> 
> commit f1b65df5a232 ("IB/mlx5: Add support for active_width and
> active_speed in RoCE"). Before this patch applied, the mlx5_ib
> driver set default active_width and active_speed to IB_WIDTH_4X
> and IB_SPEED_QDR.
> 
> When the RoCE port is down, the RoCE port did not negotiate the
> active width with remote side. The active width is zero. If run
> ibstat to require the port status, ibstat will panic as it read
> invalid width from sys file.
> 
> Signed-off-by: Honggang Li 
> ---
>  drivers/infiniband/core/sysfs.c | 15 +++
>  1 file changed, 11 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/infiniband/core/sysfs.c b/drivers/infiniband/core/sysfs.c
> index cf36ff1f0068..722e4571f4d2 100644
> --- a/drivers/infiniband/core/sysfs.c
> +++ b/drivers/infiniband/core/sysfs.c
> @@ -240,6 +240,7 @@ static ssize_t rate_show(struct ib_port *p, struct 
> port_attribute *unused,
>   struct ib_port_attr attr;
>   char *speed = "";
>   int rate;   /* in deci-Gb/sec */
> + int width;
>   ssize_t ret;
>  
>   ret = ib_query_port(p->ibdev, p->port_num, );
> @@ -278,13 +279,19 @@ static ssize_t rate_show(struct ib_port *p, struct 
> port_attribute *unused,
>   break;
>   }
>  
> - rate *= ib_width_enum_to_int(attr.active_width);
> - if (rate < 0)
> - return -EINVAL;
> + width = ib_width_enum_to_int(attr.active_width);
> + if (width < 0) {
> + if (attr.state != IB_PORT_ACTIVE)

Link width is valid in any PortState other than Down so I think that
this check should be:
if (attr.state != IB_PORT_DOWN)

However, I don't think overriding width should be needed for this case
and just returning -EINVAL should be fine regardless of port state.
AFAIK it's the driver responsibility to populate acceptable defaults for
such parameters. What driver(s) have this issue ? Shouldn't it be fixed
there rather than here ?

-- Hal

> + width = 1; /* default to 1X for invalid widths */
> + else
> + return -EINVAL;
> + }
> +
> + rate *= width;
>  
>   return sprintf(buf, "%d%s Gb/sec (%dX%s)\n",
>  rate / 10, rate % 10 ? ".5" : "",
> -ib_width_enum_to_int(attr.active_width), speed);
> +width, speed);
>  }
>  
>  static ssize_t phys_state_show(struct ib_port *p, struct port_attribute 
> *unused,
> 


Re: [PATCH 2/2] IB/core: Set width to 1X for invalid active widths when port is down

2018-03-15 Thread Hal Rosenstock
On 3/15/2018 5:02 AM, Honggang LI wrote:
> From: Honggang Li 
> 
> commit f1b65df5a232 ("IB/mlx5: Add support for active_width and
> active_speed in RoCE"). Before this patch applied, the mlx5_ib
> driver set default active_width and active_speed to IB_WIDTH_4X
> and IB_SPEED_QDR.
> 
> When the RoCE port is down, the RoCE port did not negotiate the
> active width with remote side. The active width is zero. If run
> ibstat to require the port status, ibstat will panic as it read
> invalid width from sys file.
> 
> Signed-off-by: Honggang Li 
> ---
>  drivers/infiniband/core/sysfs.c | 15 +++
>  1 file changed, 11 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/infiniband/core/sysfs.c b/drivers/infiniband/core/sysfs.c
> index cf36ff1f0068..722e4571f4d2 100644
> --- a/drivers/infiniband/core/sysfs.c
> +++ b/drivers/infiniband/core/sysfs.c
> @@ -240,6 +240,7 @@ static ssize_t rate_show(struct ib_port *p, struct 
> port_attribute *unused,
>   struct ib_port_attr attr;
>   char *speed = "";
>   int rate;   /* in deci-Gb/sec */
> + int width;
>   ssize_t ret;
>  
>   ret = ib_query_port(p->ibdev, p->port_num, );
> @@ -278,13 +279,19 @@ static ssize_t rate_show(struct ib_port *p, struct 
> port_attribute *unused,
>   break;
>   }
>  
> - rate *= ib_width_enum_to_int(attr.active_width);
> - if (rate < 0)
> - return -EINVAL;
> + width = ib_width_enum_to_int(attr.active_width);
> + if (width < 0) {
> + if (attr.state != IB_PORT_ACTIVE)

Link width is valid in any PortState other than Down so I think that
this check should be:
if (attr.state != IB_PORT_DOWN)

However, I don't think overriding width should be needed for this case
and just returning -EINVAL should be fine regardless of port state.
AFAIK it's the driver responsibility to populate acceptable defaults for
such parameters. What driver(s) have this issue ? Shouldn't it be fixed
there rather than here ?

-- Hal

> + width = 1; /* default to 1X for invalid widths */
> + else
> + return -EINVAL;
> + }
> +
> + rate *= width;
>  
>   return sprintf(buf, "%d%s Gb/sec (%dX%s)\n",
>  rate / 10, rate % 10 ? ".5" : "",
> -ib_width_enum_to_int(attr.active_width), speed);
> +width, speed);
>  }
>  
>  static ssize_t phys_state_show(struct ib_port *p, struct port_attribute 
> *unused,
> 


Re: [PATCH v3] Documentation/ABI: update infiniband sysfs interfaces

2018-02-07 Thread Hal Rosenstock
On 2/6/2018 11:59 PM, Aishwarya Pant wrote:
> Add documentation for core and hardware specific infiniband interfaces.
> The descriptions have been collected from git commit logs, reading
> through code and data sheets. Some drivers have incomplete doc and are
> annotated with the comment '[to be documented]'.
> 
> Signed-off-by: Aishwarya Pant <aishp...@gmail.com>

Thanks. Looks good to me (the device neutral parts of this which I
reviewed).

Nice work!

Reviewed-by: Hal Rosenstock <h...@mellanox.com>


Re: [PATCH v3] Documentation/ABI: update infiniband sysfs interfaces

2018-02-07 Thread Hal Rosenstock
On 2/6/2018 11:59 PM, Aishwarya Pant wrote:
> Add documentation for core and hardware specific infiniband interfaces.
> The descriptions have been collected from git commit logs, reading
> through code and data sheets. Some drivers have incomplete doc and are
> annotated with the comment '[to be documented]'.
> 
> Signed-off-by: Aishwarya Pant 

Thanks. Looks good to me (the device neutral parts of this which I
reviewed).

Nice work!

Reviewed-by: Hal Rosenstock 


Re: [PATCH v2] Documentation/ABI: update infiniband sysfs interfaces

2018-02-06 Thread Hal Rosenstock
On 2/6/2018 2:24 AM, Aishwarya Pant wrote:
> Add documentation for core and hardware specific infiniband interfaces.
> The descriptions have been collected from git commit logs, reading
> through code and data sheets. Some drivers have incomplete doc and are
> annotated with the comment '[to be documented]'.
> 
> Signed-off-by: Aishwarya Pant 

Looks good. One nit below.

> ---
> Changes in v2:
> - Move infiniband interface from testing to stable
> - Fix typos
> - Update description of cap_mask, port_xmit_constraint_errors and
>   port_rcv_constraint_errors
> - Add doc for hw_counters
> - Remove old documentation
> 
>  Documentation/ABI/stable/sysfs-class-infiniband  | 818 
> +++
>  Documentation/ABI/testing/sysfs-class-infiniband |  16 -
>  Documentation/infiniband/sysfs.txt   | 129 +---
>  3 files changed, 820 insertions(+), 143 deletions(-)
>  create mode 100644 Documentation/ABI/stable/sysfs-class-infiniband
>  delete mode 100644 Documentation/ABI/testing/sysfs-class-infiniband
> 
> diff --git a/Documentation/ABI/stable/sysfs-class-infiniband 
> b/Documentation/ABI/stable/sysfs-class-infiniband
> new file mode 100644
> index ..f9c709a8d0ab
> --- /dev/null
> +++ b/Documentation/ABI/stable/sysfs-class-infiniband
> @@ -0,0 +1,818 @@
> +sysfs interface common for all infiniband devices
> +-
> +
> +What:/sys/class/infiniband//node_type
> +What:/sys/class/infiniband//node_guid
> +What:/sys/class/infiniband//sys_image_guid
> +Date:Apr, 2005
> +KernelVersion:   v2.6.12
> +Contact: linux-r...@vger.kernel.org
> +Description:
> + node_type:  (RO) Node type (CA, RNIC, usNIC, usNIC UDP,
> + switch or router)
> +
> + node_guid:  (RO) Node GUID
> +
> + sys_image_guid: (RO) System image GUID
> +
> +
> +What:/sys/class/infiniband//node_desc
> +Date:Feb, 2006
> +KernelVersion:   v2.6.17
> +Contact: linux-r...@vger.kernel.org
> +Description:
> + (RW) Update the node description with information such as the
> + node's hostname, so that IB network management software can tie
> + its view to the real world.
> +
> +
> +What:/sys/class/infiniband//fw_ver
> +Date:Jun, 2016
> +KernelVersion:   v4.10
> +Contact: linux-r...@vger.kernel.org
> +Description:
> + (RO) Display firmware version
> +
> +
> +What:/sys/class/infiniband//ports//lid
> +What:/sys/class/infiniband//ports//rate
> +What:
> /sys/class/infiniband//ports//lid_mask_count
> +What:/sys/class/infiniband//ports//sm_sl
> +What:/sys/class/infiniband//ports//sm_lid
> +What:/sys/class/infiniband//ports//state
> +What:
> /sys/class/infiniband//ports//phys_state
> +What:/sys/class/infiniband//ports//cap_mask
> +Date:Apr, 2005
> +KernelVersion:   v2.6.12
> +Contact: linux-r...@vger.kernel.org
> +Description:
> +
> + lid:(RO) Port LID
> +
> + rate:   (RO) Port data rate (active width * active
> + speed)
> +
> + lid_mask_count: (RO) Port LID mask count
> +
> + sm_sl:  (RO) Subnet manager SL for port's subnet
> +
> + sm_lid: (RO) Subnet manager LID for port's subnet
> +
> + state:  (RO) Port state (DOWN, INIT, ARMED, ACTIVE or
> + ACTIVE_DEFER)
> +
> + phys_state: (RO) Port physical state (Sleep, Polling,
> + LinkUp, etc)
> +
> + cap_mask:   (RO) Port capability mask. 2 bits here are
> + settable- IsCommunicationManagementSupported
> + (set when CM module is loaded) and IsSM (set via
> + open of issmN file).
> +
> +
> +What:
> /sys/class/infiniband//ports//link_layer
> +Date:Oct, 2010
> +KernelVersion:   v2.6.37
> +Contact: linux-r...@vger.kernel.org
> +Description:
> + (RO) Link layer type information (Infiniband or Ethernet type)
> +
> +
> +What:
> /sys/class/infiniband//ports//counters/symbol_error
> +What:
> /sys/class/infiniband//ports//counters/port_rcv_errors
> +What:
> /sys/class/infiniband//ports//counters/port_rcv_remote_physical_errors
> +What:
> /sys/class/infiniband//ports//counters/port_rcv_switch_relay_errors
> +What:
> /sys/class/infiniband//ports//counters/link_error_recovery
> +What:
> /sys/class/infiniband//ports//counters/port_xmit_constraint_errors
> +What:
> 

Re: [PATCH v2] Documentation/ABI: update infiniband sysfs interfaces

2018-02-06 Thread Hal Rosenstock
On 2/6/2018 2:24 AM, Aishwarya Pant wrote:
> Add documentation for core and hardware specific infiniband interfaces.
> The descriptions have been collected from git commit logs, reading
> through code and data sheets. Some drivers have incomplete doc and are
> annotated with the comment '[to be documented]'.
> 
> Signed-off-by: Aishwarya Pant 

Looks good. One nit below.

> ---
> Changes in v2:
> - Move infiniband interface from testing to stable
> - Fix typos
> - Update description of cap_mask, port_xmit_constraint_errors and
>   port_rcv_constraint_errors
> - Add doc for hw_counters
> - Remove old documentation
> 
>  Documentation/ABI/stable/sysfs-class-infiniband  | 818 
> +++
>  Documentation/ABI/testing/sysfs-class-infiniband |  16 -
>  Documentation/infiniband/sysfs.txt   | 129 +---
>  3 files changed, 820 insertions(+), 143 deletions(-)
>  create mode 100644 Documentation/ABI/stable/sysfs-class-infiniband
>  delete mode 100644 Documentation/ABI/testing/sysfs-class-infiniband
> 
> diff --git a/Documentation/ABI/stable/sysfs-class-infiniband 
> b/Documentation/ABI/stable/sysfs-class-infiniband
> new file mode 100644
> index ..f9c709a8d0ab
> --- /dev/null
> +++ b/Documentation/ABI/stable/sysfs-class-infiniband
> @@ -0,0 +1,818 @@
> +sysfs interface common for all infiniband devices
> +-
> +
> +What:/sys/class/infiniband//node_type
> +What:/sys/class/infiniband//node_guid
> +What:/sys/class/infiniband//sys_image_guid
> +Date:Apr, 2005
> +KernelVersion:   v2.6.12
> +Contact: linux-r...@vger.kernel.org
> +Description:
> + node_type:  (RO) Node type (CA, RNIC, usNIC, usNIC UDP,
> + switch or router)
> +
> + node_guid:  (RO) Node GUID
> +
> + sys_image_guid: (RO) System image GUID
> +
> +
> +What:/sys/class/infiniband//node_desc
> +Date:Feb, 2006
> +KernelVersion:   v2.6.17
> +Contact: linux-r...@vger.kernel.org
> +Description:
> + (RW) Update the node description with information such as the
> + node's hostname, so that IB network management software can tie
> + its view to the real world.
> +
> +
> +What:/sys/class/infiniband//fw_ver
> +Date:Jun, 2016
> +KernelVersion:   v4.10
> +Contact: linux-r...@vger.kernel.org
> +Description:
> + (RO) Display firmware version
> +
> +
> +What:/sys/class/infiniband//ports//lid
> +What:/sys/class/infiniband//ports//rate
> +What:
> /sys/class/infiniband//ports//lid_mask_count
> +What:/sys/class/infiniband//ports//sm_sl
> +What:/sys/class/infiniband//ports//sm_lid
> +What:/sys/class/infiniband//ports//state
> +What:
> /sys/class/infiniband//ports//phys_state
> +What:/sys/class/infiniband//ports//cap_mask
> +Date:Apr, 2005
> +KernelVersion:   v2.6.12
> +Contact: linux-r...@vger.kernel.org
> +Description:
> +
> + lid:(RO) Port LID
> +
> + rate:   (RO) Port data rate (active width * active
> + speed)
> +
> + lid_mask_count: (RO) Port LID mask count
> +
> + sm_sl:  (RO) Subnet manager SL for port's subnet
> +
> + sm_lid: (RO) Subnet manager LID for port's subnet
> +
> + state:  (RO) Port state (DOWN, INIT, ARMED, ACTIVE or
> + ACTIVE_DEFER)
> +
> + phys_state: (RO) Port physical state (Sleep, Polling,
> + LinkUp, etc)
> +
> + cap_mask:   (RO) Port capability mask. 2 bits here are
> + settable- IsCommunicationManagementSupported
> + (set when CM module is loaded) and IsSM (set via
> + open of issmN file).
> +
> +
> +What:
> /sys/class/infiniband//ports//link_layer
> +Date:Oct, 2010
> +KernelVersion:   v2.6.37
> +Contact: linux-r...@vger.kernel.org
> +Description:
> + (RO) Link layer type information (Infiniband or Ethernet type)
> +
> +
> +What:
> /sys/class/infiniband//ports//counters/symbol_error
> +What:
> /sys/class/infiniband//ports//counters/port_rcv_errors
> +What:
> /sys/class/infiniband//ports//counters/port_rcv_remote_physical_errors
> +What:
> /sys/class/infiniband//ports//counters/port_rcv_switch_relay_errors
> +What:
> /sys/class/infiniband//ports//counters/link_error_recovery
> +What:
> /sys/class/infiniband//ports//counters/port_xmit_constraint_errors
> +What:
> 

Re: [PATCH] Documentation/ABI: update infiniband sysfs interfaces

2018-02-05 Thread Hal Rosenstock
On 2/5/2018 2:19 AM, Aishwarya Pant wrote:
>>> +   cap_mask:   (RO) Port capability mask
>> 2 bits here are settable: IsCommunicationManagementSupported and IsSM.
> Hi
> 
> Sorry, I don't quite understand this. cap_mask is a read only value which
> indicates the supported functions. So the two bits-
> IsCommunicationManagementSupported and IsSM, should not be setttable?

In terms of IB, PortInfo:CapabilityMask is RO from perspective of SM.

In linux, IsSM is settable via open of issmN file. An SM will do this to
set the IsSM bit.

It is similar in terms of IsCommunicationManagementSupported bit where
bit is set when CM module is loaded in kernel (in cm_add_one).

-- Hal

> Aishwarya
> 
> 
> 


Re: [PATCH] Documentation/ABI: update infiniband sysfs interfaces

2018-02-05 Thread Hal Rosenstock
On 2/5/2018 2:19 AM, Aishwarya Pant wrote:
>>> +   cap_mask:   (RO) Port capability mask
>> 2 bits here are settable: IsCommunicationManagementSupported and IsSM.
> Hi
> 
> Sorry, I don't quite understand this. cap_mask is a read only value which
> indicates the supported functions. So the two bits-
> IsCommunicationManagementSupported and IsSM, should not be setttable?

In terms of IB, PortInfo:CapabilityMask is RO from perspective of SM.

In linux, IsSM is settable via open of issmN file. An SM will do this to
set the IsSM bit.

It is similar in terms of IsCommunicationManagementSupported bit where
bit is set when CM module is loaded in kernel (in cm_add_one).

-- Hal

> Aishwarya
> 
> 
> 


Re: [PATCH] Documentation/ABI: update infiniband sysfs interfaces

2018-02-01 Thread Hal Rosenstock
On 2/1/2018 8:32 AM, Aishwarya Pant wrote:
> Add documentation for core and hardware specific infiniband interfaces.
> The descriptions have been collected from git commit logs, reading
> through code and data sheets. Some drivers have incomplete doc and are
> annotated with the comment '[to be documented]'.
> 
> Signed-off-by: Aishwarya Pant 
> ---
>  Documentation/ABI/testing/sysfs-class-infiniband | 755 
> +++
>  1 file changed, 755 insertions(+)
> 
> diff --git a/Documentation/ABI/testing/sysfs-class-infiniband 
> b/Documentation/ABI/testing/sysfs-class-infiniband
> index a86abe66a316..2e150169633b 100644
> --- a/Documentation/ABI/testing/sysfs-class-infiniband
> +++ b/Documentation/ABI/testing/sysfs-class-infiniband
> @@ -1,3 +1,194 @@
> +sysfs interface common for all infiniband devices
> +-
> +
> +What:/sys/class/infiniband//node_type
> +What:/sys/class/infiniband//node_guid
> +What:/sys/class/infiniband//sys_image_guid
> +Date:Apr, 2005
> +KernelVersion:   v2.6.12
> +Contact: linux-r...@vger.kernel.org
> +Description:
> + node_type:  (RO) Node type (CA, RNIC, usNIC, usNIC UDP,
> + switch or router)
> +
> + node_guid:  (RO) Node GUID
> +
> + sys_image_guid: (RO) System image GUID
> +
> +
> +What:/sys/class/infiniband//node_desc
> +Date:Feb, 2006
> +KernelVersion:   v2.6.17
> +Contact: linux-r...@vger.kernel.org
> +Description:
> + (RW) Update the node description with information such as the
> + node's hostname, so that IB network management software can tie
> + its view to the real world.
> +
> +
> +What:/sys/class/infiniband//fw_ver
> +Date:Jun, 2016
> +KernelVersion:   v4.10
> +Contact: linux-r...@vger.kernel.org
> +Description:
> + (RO) Display firmware version
> +
> +
> +What:/sys/class/infiniband//ports//lid
> +What:/sys/class/infiniband//ports//rate
> +What:
> /sys/class/infiniband//ports//lid_mask_count
> +What:/sys/class/infiniband//ports//sm_sl
> +What:/sys/class/infiniband//ports//sm_lid
> +What:/sys/class/infiniband//ports//state
> +What:
> /sys/class/infiniband//ports//phys_state
> +What:/sys/class/infiniband//ports//cap_mask
> +Date:Apr, 2005
> +KernelVersion:   v2.6.12
> +Contact: linux-r...@vger.kernel.org
> +Description:
> +
> + lid:(RO) Port LID
> +
> + rate:   (RO) Port data rate (active width * active
> + speed)
> +
> + lid_mask_count: (RO) Port LID mask count
> +
> + sm_slL  (RO) Subnet manager SL for port's subnet

Typo sm_sl:

> +
> + sm_lid: (RO) Subnet manager LID for port's subnet
> +
> + state:  (RO) Port state (DOWN, INIT, ARMED, ACTIVE or
> + ACTIVE_DEFER> +
> + phys_state: (RO) Port physical state (Sleep, Polling,
> + LinkUp, etc)
> +
> + cap_mask:   (RO) Port capability mask

2 bits here are settable: IsCommunicationManagementSupported and IsSM.

> +
> +
> +What:
> /sys/class/inifiniband//ports//link_layer

Typo: infiniband

> +Date:Oct, 2010
> +KernelVersion:   v2.6.37
> +Contact: linux-r...@vger.kernel.org
> +Description:
> + (RO) Link layer type information (Infiniband or Ethernet type)
> +
> +
> +What:
> /sys/class/infiniband//ports//counters/symbol_error
> +What:
> /sys/class/infiniband//ports//counters/port_rcv_errors
> +What:
> /sys/class/infiniband//ports//counters/port_rcv_remote_physical_errors
> +What:
> /sys/class/infiniband//ports//counters/port_rcv_switch_relay_errors
> +What:
> /sys/class/infiniband//ports//counters/link_error_recovery
> +What:
> /sys/class/infiniband//ports//counters/port_xmit_constraint_errors
> +What:
> /sys/class/infiniband//ports//counters/port_rcv_contraint_errors
> +What:
> /sys/class/infiniband//ports//counters/local_link_integrity_errors
> +What:
> /sys/class/infiniband//ports//counters/ecessive_buffer_overrun_errors

typo: excessive_buffer_overrun_errors

> +What:
> /sys/class/infiniband//ports//counters/port_xmit_data
> +What:
> /sys/class/infiniband//ports//counters/port_rcv_data
> +What:
> /sys/class/infiniband//ports//counters/port_xmit_packets
> +What:
> /sys/class/infiniband//ports//counters/port_rcv_packets
> +What:
> 

Re: [PATCH] Documentation/ABI: update infiniband sysfs interfaces

2018-02-01 Thread Hal Rosenstock
On 2/1/2018 8:32 AM, Aishwarya Pant wrote:
> Add documentation for core and hardware specific infiniband interfaces.
> The descriptions have been collected from git commit logs, reading
> through code and data sheets. Some drivers have incomplete doc and are
> annotated with the comment '[to be documented]'.
> 
> Signed-off-by: Aishwarya Pant 
> ---
>  Documentation/ABI/testing/sysfs-class-infiniband | 755 
> +++
>  1 file changed, 755 insertions(+)
> 
> diff --git a/Documentation/ABI/testing/sysfs-class-infiniband 
> b/Documentation/ABI/testing/sysfs-class-infiniband
> index a86abe66a316..2e150169633b 100644
> --- a/Documentation/ABI/testing/sysfs-class-infiniband
> +++ b/Documentation/ABI/testing/sysfs-class-infiniband
> @@ -1,3 +1,194 @@
> +sysfs interface common for all infiniband devices
> +-
> +
> +What:/sys/class/infiniband//node_type
> +What:/sys/class/infiniband//node_guid
> +What:/sys/class/infiniband//sys_image_guid
> +Date:Apr, 2005
> +KernelVersion:   v2.6.12
> +Contact: linux-r...@vger.kernel.org
> +Description:
> + node_type:  (RO) Node type (CA, RNIC, usNIC, usNIC UDP,
> + switch or router)
> +
> + node_guid:  (RO) Node GUID
> +
> + sys_image_guid: (RO) System image GUID
> +
> +
> +What:/sys/class/infiniband//node_desc
> +Date:Feb, 2006
> +KernelVersion:   v2.6.17
> +Contact: linux-r...@vger.kernel.org
> +Description:
> + (RW) Update the node description with information such as the
> + node's hostname, so that IB network management software can tie
> + its view to the real world.
> +
> +
> +What:/sys/class/infiniband//fw_ver
> +Date:Jun, 2016
> +KernelVersion:   v4.10
> +Contact: linux-r...@vger.kernel.org
> +Description:
> + (RO) Display firmware version
> +
> +
> +What:/sys/class/infiniband//ports//lid
> +What:/sys/class/infiniband//ports//rate
> +What:
> /sys/class/infiniband//ports//lid_mask_count
> +What:/sys/class/infiniband//ports//sm_sl
> +What:/sys/class/infiniband//ports//sm_lid
> +What:/sys/class/infiniband//ports//state
> +What:
> /sys/class/infiniband//ports//phys_state
> +What:/sys/class/infiniband//ports//cap_mask
> +Date:Apr, 2005
> +KernelVersion:   v2.6.12
> +Contact: linux-r...@vger.kernel.org
> +Description:
> +
> + lid:(RO) Port LID
> +
> + rate:   (RO) Port data rate (active width * active
> + speed)
> +
> + lid_mask_count: (RO) Port LID mask count
> +
> + sm_slL  (RO) Subnet manager SL for port's subnet

Typo sm_sl:

> +
> + sm_lid: (RO) Subnet manager LID for port's subnet
> +
> + state:  (RO) Port state (DOWN, INIT, ARMED, ACTIVE or
> + ACTIVE_DEFER> +
> + phys_state: (RO) Port physical state (Sleep, Polling,
> + LinkUp, etc)
> +
> + cap_mask:   (RO) Port capability mask

2 bits here are settable: IsCommunicationManagementSupported and IsSM.

> +
> +
> +What:
> /sys/class/inifiniband//ports//link_layer

Typo: infiniband

> +Date:Oct, 2010
> +KernelVersion:   v2.6.37
> +Contact: linux-r...@vger.kernel.org
> +Description:
> + (RO) Link layer type information (Infiniband or Ethernet type)
> +
> +
> +What:
> /sys/class/infiniband//ports//counters/symbol_error
> +What:
> /sys/class/infiniband//ports//counters/port_rcv_errors
> +What:
> /sys/class/infiniband//ports//counters/port_rcv_remote_physical_errors
> +What:
> /sys/class/infiniband//ports//counters/port_rcv_switch_relay_errors
> +What:
> /sys/class/infiniband//ports//counters/link_error_recovery
> +What:
> /sys/class/infiniband//ports//counters/port_xmit_constraint_errors
> +What:
> /sys/class/infiniband//ports//counters/port_rcv_contraint_errors
> +What:
> /sys/class/infiniband//ports//counters/local_link_integrity_errors
> +What:
> /sys/class/infiniband//ports//counters/ecessive_buffer_overrun_errors

typo: excessive_buffer_overrun_errors

> +What:
> /sys/class/infiniband//ports//counters/port_xmit_data
> +What:
> /sys/class/infiniband//ports//counters/port_rcv_data
> +What:
> /sys/class/infiniband//ports//counters/port_xmit_packets
> +What:
> /sys/class/infiniband//ports//counters/port_rcv_packets
> +What:
> 

Re: [PATCH] IB/mlx5: give back valid speed/width even without plugged in SFP module

2017-10-28 Thread Hal Rosenstock
On 10/28/2017 6:42 AM, Thomas Bogendoerfer wrote:
>> I must be missing something as to what is going on in this scenario.

I think I see what's going on now. The -EINVAL in kernel sysfs/rate_show
causes error to either open or read of file. I had checked that an empty
file is parsed correctly but didn't consider an error on open or read.

>> sysfs.c:rate_show is inconsistent as it paves over an invalid speed
>> setting that to SDR but does not pave over invalid width returning
>> -EINVAL but this comment is in another "direction".
> I thought about going in the other direction by making the speed/width 
> unknown case
> somehow visible in rate_show(). When I checked all other drivers only mlx5 
> stand out
> so I decided to only change mlx5. But I make a change to let rate_show() 
> print out, 
> that is currently has no rate for the port.

I wasn't proposing to make that visible in rate_show() but rather
pointing out the inconsistency in changing unknown speeds to SDR but
unknown widths are error. There were comments on not having to set
"random" numbers for speed/width. If so, shouldn't these be handled
consistently here ? Would setting -EINVAL when not one of recognized
speeds cause an issue (rather than setting to SDR) ?

IMO a more future proof implementation is not to have error for
ib_width_enum_to_int but have unknown widths default to 1x similar to
how unknown speeds default to SDR:

include/rdma/ib_verbs.h:
static inline int ib_width_enum_to_int(enum ib_port_width width)
{
switch (width) {
case IB_WIDTH_1X:  return  1;
case IB_WIDTH_4X:  return  4;
case IB_WIDTH_8X:  return  8;
case IB_WIDTH_12X: return 12;
default:  return 1;
}
}

For example, 2x link widths have been added to IBA spec.

This is alternative approach to libibumad/ibstat patches for invalid
link width as it's not set when QSFP is not plugged into port.

-- Hal




Re: [PATCH] IB/mlx5: give back valid speed/width even without plugged in SFP module

2017-10-28 Thread Hal Rosenstock
On 10/28/2017 6:42 AM, Thomas Bogendoerfer wrote:
>> I must be missing something as to what is going on in this scenario.

I think I see what's going on now. The -EINVAL in kernel sysfs/rate_show
causes error to either open or read of file. I had checked that an empty
file is parsed correctly but didn't consider an error on open or read.

>> sysfs.c:rate_show is inconsistent as it paves over an invalid speed
>> setting that to SDR but does not pave over invalid width returning
>> -EINVAL but this comment is in another "direction".
> I thought about going in the other direction by making the speed/width 
> unknown case
> somehow visible in rate_show(). When I checked all other drivers only mlx5 
> stand out
> so I decided to only change mlx5. But I make a change to let rate_show() 
> print out, 
> that is currently has no rate for the port.

I wasn't proposing to make that visible in rate_show() but rather
pointing out the inconsistency in changing unknown speeds to SDR but
unknown widths are error. There were comments on not having to set
"random" numbers for speed/width. If so, shouldn't these be handled
consistently here ? Would setting -EINVAL when not one of recognized
speeds cause an issue (rather than setting to SDR) ?

IMO a more future proof implementation is not to have error for
ib_width_enum_to_int but have unknown widths default to 1x similar to
how unknown speeds default to SDR:

include/rdma/ib_verbs.h:
static inline int ib_width_enum_to_int(enum ib_port_width width)
{
switch (width) {
case IB_WIDTH_1X:  return  1;
case IB_WIDTH_4X:  return  4;
case IB_WIDTH_8X:  return  8;
case IB_WIDTH_12X: return 12;
default:  return 1;
}
}

For example, 2x link widths have been added to IBA spec.

This is alternative approach to libibumad/ibstat patches for invalid
link width as it's not set when QSFP is not plugged into port.

-- Hal




Re: [PATCH] IB/mlx5: give back valid speed/width even without plugged in SFP module

2017-10-27 Thread Hal Rosenstock
On 10/27/2017 7:19 PM, Hal Rosenstock wrote:
> On 10/27/2017 7:04 PM, Ghazale Hosseinabadi wrote:
>>
>>
>> On 10/27/2017 03:52 PM, Hal Rosenstock wrote:
>>> On 10/27/2017 5:54 PM, Ghazale Hosseinabadi wrote:
>>>> When running ibstat (if transceiver is not connected in adapter):
>>>>
>>>> ibpanic: [7851] main: stat of IB device 'mlx5_1' failed: Invalid
>>>> argument
>>> Any output before that ?
>> no, It only prints this line.
> 
> and setting the width to 1x in the driver so the rate file is properly
> populated fixes this ? I must be missing something as to what is going
> on in this scenario.

[off list...]
Are you using libibumad or rdma-core package ? Which version ?
What version of infiniband-diags are you using ?
Can you build from sources ?
I have patch to libibumad/rdma-core and another patch to ibstat
(infiniband-diags) which I'd like you to try. Is that possible ?

Thanks.

-- Hal

> sysfs.c:rate_show is inconsistent as it paves over an invalid speed
> setting that to SDR but does not pave over invalid width returning
> -EINVAL but this comment is in another "direction".
> 
> -- Hal
> 
>>
>> -- Ghazale
>>>   I'm trying to understand how far it gets. It
>>> looks to me that empty rate file would be parsed as 0 and ibstat would
>>> show that rate. ibpanic would occur if file was not found but I could be
>>> missing something.
>>>
>>
>>


Re: [PATCH] IB/mlx5: give back valid speed/width even without plugged in SFP module

2017-10-27 Thread Hal Rosenstock
On 10/27/2017 7:19 PM, Hal Rosenstock wrote:
> On 10/27/2017 7:04 PM, Ghazale Hosseinabadi wrote:
>>
>>
>> On 10/27/2017 03:52 PM, Hal Rosenstock wrote:
>>> On 10/27/2017 5:54 PM, Ghazale Hosseinabadi wrote:
>>>> When running ibstat (if transceiver is not connected in adapter):
>>>>
>>>> ibpanic: [7851] main: stat of IB device 'mlx5_1' failed: Invalid
>>>> argument
>>> Any output before that ?
>> no, It only prints this line.
> 
> and setting the width to 1x in the driver so the rate file is properly
> populated fixes this ? I must be missing something as to what is going
> on in this scenario.

[off list...]
Are you using libibumad or rdma-core package ? Which version ?
What version of infiniband-diags are you using ?
Can you build from sources ?
I have patch to libibumad/rdma-core and another patch to ibstat
(infiniband-diags) which I'd like you to try. Is that possible ?

Thanks.

-- Hal

> sysfs.c:rate_show is inconsistent as it paves over an invalid speed
> setting that to SDR but does not pave over invalid width returning
> -EINVAL but this comment is in another "direction".
> 
> -- Hal
> 
>>
>> -- Ghazale
>>>   I'm trying to understand how far it gets. It
>>> looks to me that empty rate file would be parsed as 0 and ibstat would
>>> show that rate. ibpanic would occur if file was not found but I could be
>>> missing something.
>>>
>>
>>


Re: [PATCH] IB/mlx5: give back valid speed/width even without plugged in SFP module

2017-10-27 Thread Hal Rosenstock
On 10/27/2017 7:04 PM, Ghazale Hosseinabadi wrote:
> 
> 
> On 10/27/2017 03:52 PM, Hal Rosenstock wrote:
>> On 10/27/2017 5:54 PM, Ghazale Hosseinabadi wrote:
>>> When running ibstat (if transceiver is not connected in adapter):
>>>
>>> ibpanic: [7851] main: stat of IB device 'mlx5_1' failed: Invalid
>>> argument
>> Any output before that ?
> no, It only prints this line.

and setting the width to 1x in the driver so the rate file is properly
populated fixes this ? I must be missing something as to what is going
on in this scenario.

sysfs.c:rate_show is inconsistent as it paves over an invalid speed
setting that to SDR but does not pave over invalid width returning
-EINVAL but this comment is in another "direction".

-- Hal

> 
> -- Ghazale
>>   I'm trying to understand how far it gets. It
>> looks to me that empty rate file would be parsed as 0 and ibstat would
>> show that rate. ibpanic would occur if file was not found but I could be
>> missing something.
>>
> 
> 


Re: [PATCH] IB/mlx5: give back valid speed/width even without plugged in SFP module

2017-10-27 Thread Hal Rosenstock
On 10/27/2017 7:04 PM, Ghazale Hosseinabadi wrote:
> 
> 
> On 10/27/2017 03:52 PM, Hal Rosenstock wrote:
>> On 10/27/2017 5:54 PM, Ghazale Hosseinabadi wrote:
>>> When running ibstat (if transceiver is not connected in adapter):
>>>
>>> ibpanic: [7851] main: stat of IB device 'mlx5_1' failed: Invalid
>>> argument
>> Any output before that ?
> no, It only prints this line.

and setting the width to 1x in the driver so the rate file is properly
populated fixes this ? I must be missing something as to what is going
on in this scenario.

sysfs.c:rate_show is inconsistent as it paves over an invalid speed
setting that to SDR but does not pave over invalid width returning
-EINVAL but this comment is in another "direction".

-- Hal

> 
> -- Ghazale
>>   I'm trying to understand how far it gets. It
>> looks to me that empty rate file would be parsed as 0 and ibstat would
>> show that rate. ibpanic would occur if file was not found but I could be
>> missing something.
>>
> 
> 


Re: [PATCH] IB/mlx5: give back valid speed/width even without plugged in SFP module

2017-10-27 Thread Hal Rosenstock
On 10/27/2017 5:54 PM, Ghazale Hosseinabadi wrote:
> When running ibstat (if transceiver is not connected in adapter):
> 
> ibpanic: [7851] main: stat of IB device 'mlx5_1' failed: Invalid argument

Any output before that ? I'm trying to understand how far it gets. It
looks to me that empty rate file would be parsed as 0 and ibstat would
show that rate. ibpanic would occur if file was not found but I could be
missing something.



Re: [PATCH] IB/mlx5: give back valid speed/width even without plugged in SFP module

2017-10-27 Thread Hal Rosenstock
On 10/27/2017 5:54 PM, Ghazale Hosseinabadi wrote:
> When running ibstat (if transceiver is not connected in adapter):
> 
> ibpanic: [7851] main: stat of IB device 'mlx5_1' failed: Invalid argument

Any output before that ? I'm trying to understand how far it gets. It
looks to me that empty rate file would be parsed as 0 and ibstat would
show that rate. ibpanic would occur if file was not found but I could be
missing something.



Re: [PATCH] IB/mlx5: give back valid speed/width even without plugged in SFP module

2017-10-27 Thread Hal Rosenstock
On 10/27/2017 4:33 PM, Parav Pandit wrote:
> 
> 
>> -Original Message-----
>> From: Hal Rosenstock [mailto:h...@dev.mellanox.co.il]
>> Sent: Friday, October 27, 2017 3:19 PM
>> To: Parav Pandit <pa...@mellanox.com>; Thomas Bogendoerfer
>> <tbogendoer...@suse.de>; Matan Barak <mat...@mellanox.com>; Leon
>> Romanovsky <leo...@mellanox.com>; Doug Ledford <dledf...@redhat.com>;
>> linux-r...@vger.kernel.org; linux-kernel@vger.kernel.org
>> Cc: Ghazale Hosseinabadi <ghazale.hosseinab...@oracle.com>
>> Subject: Re: [PATCH] IB/mlx5: give back valid speed/width even without 
>> plugged
>> in SFP module
>>
>> On 10/27/2017 2:32 PM, Parav Pandit wrote:
>>> However I believe that ibstat tool should be enhanced to report unknown port
>> speed instead of expecting drivers to supply some random number like this.
>>
>> ibstat gets the rate from libibumad via /sys/class/infiniband/> device>/ports//rate file which is supposed to be populated by 
>> the
>> driver. Is there no rate file in this error case ?
>>
> <...>//rate file exist.
> 
> rate_show() has invalid active_width as expected due to nonexistence of SFP.
> So sysfs call return invalid value.
> We don't have invalid_active_width defined right now.
> So ibstat and other applications should not crash on such valid errors.

Agreed. I haven't seen ibstat crash reported though. Can someone provide
the crash details ?



Re: [PATCH] IB/mlx5: give back valid speed/width even without plugged in SFP module

2017-10-27 Thread Hal Rosenstock
On 10/27/2017 4:33 PM, Parav Pandit wrote:
> 
> 
>> -Original Message-----
>> From: Hal Rosenstock [mailto:h...@dev.mellanox.co.il]
>> Sent: Friday, October 27, 2017 3:19 PM
>> To: Parav Pandit ; Thomas Bogendoerfer
>> ; Matan Barak ; Leon
>> Romanovsky ; Doug Ledford ;
>> linux-r...@vger.kernel.org; linux-kernel@vger.kernel.org
>> Cc: Ghazale Hosseinabadi 
>> Subject: Re: [PATCH] IB/mlx5: give back valid speed/width even without 
>> plugged
>> in SFP module
>>
>> On 10/27/2017 2:32 PM, Parav Pandit wrote:
>>> However I believe that ibstat tool should be enhanced to report unknown port
>> speed instead of expecting drivers to supply some random number like this.
>>
>> ibstat gets the rate from libibumad via /sys/class/infiniband/> device>/ports//rate file which is supposed to be populated by 
>> the
>> driver. Is there no rate file in this error case ?
>>
> <...>//rate file exist.
> 
> rate_show() has invalid active_width as expected due to nonexistence of SFP.
> So sysfs call return invalid value.
> We don't have invalid_active_width defined right now.
> So ibstat and other applications should not crash on such valid errors.

Agreed. I haven't seen ibstat crash reported though. Can someone provide
the crash details ?



Re: [PATCH] IB/mlx5: give back valid speed/width even without plugged in SFP module

2017-10-27 Thread Hal Rosenstock
On 10/27/2017 2:32 PM, Parav Pandit wrote:
> However I believe that ibstat tool should be enhanced to report unknown port 
> speed instead of expecting drivers to supply some random number like this.

ibstat gets the rate from libibumad via /sys/class/infiniband//ports//rate file which is supposed to be populated by the 
driver. Is there no rate file in this error case ?

-- Hal


Re: [PATCH] IB/mlx5: give back valid speed/width even without plugged in SFP module

2017-10-27 Thread Hal Rosenstock
On 10/27/2017 2:32 PM, Parav Pandit wrote:
> However I believe that ibstat tool should be enhanced to report unknown port 
> speed instead of expecting drivers to supply some random number like this.

ibstat gets the rate from libibumad via /sys/class/infiniband//ports//rate file which is supposed to be populated by the 
driver. Is there no rate file in this error case ?

-- Hal


Re: [PATCH RFC] IB/mad: remove obsolete snoop interface

2015-09-30 Thread Hal Rosenstock
On 9/30/2015 12:32 PM, Weiny, Ira wrote:
>>
>> On 9/30/2015 2:01 AM, ira.we...@intel.com wrote:
>>> From: Ira Weiny 
>>>
>>> This interface has no current users and is obsolete.
>>
>> There are no in tree users of this but there is Sean's madeye tool (which is 
>> out
>> of tree). This is still a useful debug tool for MADs.
> 
> Where is that?  

It's been out of tree for some time now. For one place, it can be found
on OpenFabrics site in some old and moldy OFED.

> I did not think it was supported any longer?

It still works AFAIK.

> I have a series of about 5 patches which implement tracing in the MAD stack 
> which I was working through and was going to submit 

Does your MAD stack tracing include the dumping and decode of sent and
received MADs ?

-- Hal

> along with this patch (those are still being tested in my spare time).  I 
> just wanted put this out here because of the patch Nicholas posted yesterday.
>>
>>> Signed-off-by: Ira Weiny 
>>> ---
>>>
>>> This has undergone a medium level of testing.  I have run it against
>>> mlx4, qib, and OPA hardware and it does not seem to cause any regressions.
>>>
>>>  drivers/infiniband/core/mad.c  | 226 
>>> +
>>>  drivers/infiniband/core/mad_priv.h |  13 ---
>>>  2 files changed, 5 insertions(+), 234 deletions(-)
>>
>> There are also snoop changes needed in include/rdma/ib_mad.h.
> 
> I'll look into it if there are no other objections to removing the snoop 
> interface.
>
>>
>> Is it correct to assume that OPA_CAP_MASK3_IsSnoopSupported in
>> opa_port_info.h is not related to this ?
> 
> Nope, not related.
> 
> Ira
> 
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH RFC] IB/mad: remove obsolete snoop interface

2015-09-30 Thread Hal Rosenstock
On 9/30/2015 2:01 AM, ira.we...@intel.com wrote:
> From: Ira Weiny 
> 
> This interface has no current users and is obsolete.

There are no in tree users of this but there is Sean's madeye tool
(which is out of tree). This is still a useful debug tool for MADs.

> Signed-off-by: Ira Weiny 
> ---
> 
> This has undergone a medium level of testing.  I have run it against
> mlx4, qib, and OPA hardware and it does not seem to cause any regressions.
> 
>  drivers/infiniband/core/mad.c  | 226 
> +
>  drivers/infiniband/core/mad_priv.h |  13 ---
>  2 files changed, 5 insertions(+), 234 deletions(-)

There are also snoop changes needed in include/rdma/ib_mad.h.

Is it correct to assume that OPA_CAP_MASK3_IsSnoopSupported in
opa_port_info.h is not related to this ?

-- Hal
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH RFC] IB/mad: remove obsolete snoop interface

2015-09-30 Thread Hal Rosenstock
On 9/30/2015 12:32 PM, Weiny, Ira wrote:
>>
>> On 9/30/2015 2:01 AM, ira.we...@intel.com wrote:
>>> From: Ira Weiny 
>>>
>>> This interface has no current users and is obsolete.
>>
>> There are no in tree users of this but there is Sean's madeye tool (which is 
>> out
>> of tree). This is still a useful debug tool for MADs.
> 
> Where is that?  

It's been out of tree for some time now. For one place, it can be found
on OpenFabrics site in some old and moldy OFED.

> I did not think it was supported any longer?

It still works AFAIK.

> I have a series of about 5 patches which implement tracing in the MAD stack 
> which I was working through and was going to submit 

Does your MAD stack tracing include the dumping and decode of sent and
received MADs ?

-- Hal

> along with this patch (those are still being tested in my spare time).  I 
> just wanted put this out here because of the patch Nicholas posted yesterday.
>>
>>> Signed-off-by: Ira Weiny 
>>> ---
>>>
>>> This has undergone a medium level of testing.  I have run it against
>>> mlx4, qib, and OPA hardware and it does not seem to cause any regressions.
>>>
>>>  drivers/infiniband/core/mad.c  | 226 
>>> +
>>>  drivers/infiniband/core/mad_priv.h |  13 ---
>>>  2 files changed, 5 insertions(+), 234 deletions(-)
>>
>> There are also snoop changes needed in include/rdma/ib_mad.h.
> 
> I'll look into it if there are no other objections to removing the snoop 
> interface.
>
>>
>> Is it correct to assume that OPA_CAP_MASK3_IsSnoopSupported in
>> opa_port_info.h is not related to this ?
> 
> Nope, not related.
> 
> Ira
> 
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH RFC] IB/mad: remove obsolete snoop interface

2015-09-30 Thread Hal Rosenstock
On 9/30/2015 2:01 AM, ira.we...@intel.com wrote:
> From: Ira Weiny 
> 
> This interface has no current users and is obsolete.

There are no in tree users of this but there is Sean's madeye tool
(which is out of tree). This is still a useful debug tool for MADs.

> Signed-off-by: Ira Weiny 
> ---
> 
> This has undergone a medium level of testing.  I have run it against
> mlx4, qib, and OPA hardware and it does not seem to cause any regressions.
> 
>  drivers/infiniband/core/mad.c  | 226 
> +
>  drivers/infiniband/core/mad_priv.h |  13 ---
>  2 files changed, 5 insertions(+), 234 deletions(-)

There are also snoop changes needed in include/rdma/ib_mad.h.

Is it correct to assume that OPA_CAP_MASK3_IsSnoopSupported in
opa_port_info.h is not related to this ?

-- Hal
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5 04/27] IB/Verbs: Reform IB-core cm

2015-04-20 Thread Hal Rosenstock
On 4/20/2015 4:33 AM, Michael Wang wrote:
> 
> Use raw management helpers to reform IB-core cm.
> 
> Cc: Hal Rosenstock 
> Cc: Steve Wise 
> Cc: Tom Talpey 
> Cc: Jason Gunthorpe 
> Cc: Doug Ledford 
> Cc: Ira Weiny 
> Cc: Sean Hefty 
> Signed-off-by: Michael Wang 
> ---
>  drivers/infiniband/core/cm.c | 20 +---
>  1 file changed, 17 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
> index e28a494..3c10b75 100644
> --- a/drivers/infiniband/core/cm.c
> +++ b/drivers/infiniband/core/cm.c
> @@ -3761,9 +3761,7 @@ static void cm_add_one(struct ib_device *ib_device)
>   unsigned long flags;
>   int ret;
>   u8 i;
> -
> - if (rdma_node_get_transport(ib_device->node_type) != RDMA_TRANSPORT_IB)
> - return;
> + int count = 0;

Nit: Should the int count line be moved above u8 i declaration so
declarations are naturally aligned ?

-- Hal


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5 04/27] IB/Verbs: Reform IB-core cm

2015-04-20 Thread Hal Rosenstock
On 4/20/2015 4:33 AM, Michael Wang wrote:
 
 Use raw management helpers to reform IB-core cm.
 
 Cc: Hal Rosenstock h...@dev.mellanox.co.il
 Cc: Steve Wise sw...@opengridcomputing.com
 Cc: Tom Talpey t...@talpey.com
 Cc: Jason Gunthorpe jguntho...@obsidianresearch.com
 Cc: Doug Ledford dledf...@redhat.com
 Cc: Ira Weiny ira.we...@intel.com
 Cc: Sean Hefty sean.he...@intel.com
 Signed-off-by: Michael Wang yun.w...@profitbricks.com
 ---
  drivers/infiniband/core/cm.c | 20 +---
  1 file changed, 17 insertions(+), 3 deletions(-)
 
 diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
 index e28a494..3c10b75 100644
 --- a/drivers/infiniband/core/cm.c
 +++ b/drivers/infiniband/core/cm.c
 @@ -3761,9 +3761,7 @@ static void cm_add_one(struct ib_device *ib_device)
   unsigned long flags;
   int ret;
   u8 i;
 -
 - if (rdma_node_get_transport(ib_device-node_type) != RDMA_TRANSPORT_IB)
 - return;
 + int count = 0;

Nit: Should the int count line be moved above u8 i declaration so
declarations are naturally aligned ?

-- Hal

snip...
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v4 10/27] IB/Verbs: Reform cm related part in IB-core cma/ucm

2015-04-16 Thread Hal Rosenstock
On 4/16/2015 11:58 AM, Jason Gunthorpe wrote:
> It also looks like hardwired 1 won't work on switch ports, so it is no-go.

AFAIK enhanced switch port 0 is not supported by CM/RDMA CM in the
current code. There is no need for CM/RDMA CM on base switch port 0.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v3 25/28] IB/Verbs: Use management helper cap_af_ib()

2015-04-16 Thread Hal Rosenstock
On 4/16/2015 9:57 AM, Or Gerlitz wrote:
> On Mon, Apr 13, 2015 at 3:35 PM, Michael Wang  
> wrote:
>>
>> Introduce helper cap_af_ib() to help us check if the port of an
>> IB device support Native Infiniband Address.
>>
>> Cc: Steve Wise 
>> Cc: Tom Talpey 
>> Cc: Jason Gunthorpe 
>> Cc: Doug Ledford 
>> Cc: Ira Weiny 
>> Cc: Sean Hefty 
>> Signed-off-by: Michael Wang 
>> ---
>>  drivers/infiniband/core/cma.c |  2 +-
>>  include/rdma/ib_verbs.h   | 15 +++
>>  2 files changed, 16 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
>> index 65e41f4..7f5815d 100644
>> --- a/drivers/infiniband/core/cma.c
>> +++ b/drivers/infiniband/core/cma.c
>> @@ -470,7 +470,7 @@ static int cma_resolve_ib_dev(struct rdma_id_private 
>> *id_priv)
>>
>> list_for_each_entry(cur_dev, _list, list) {
>> for (p = 1; p <= cur_dev->device->phys_port_cnt; ++p) {
>> -   if (!rdma_ib_or_iboe(cur_dev->device, p))
>> +   if (!cap_af_ib(cur_dev->device, p))
>> continue;
>>
>> if (ib_find_cached_pkey(cur_dev->device, p, pkey, 
>> ))
>> diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
>> index 29ddd14..dfe33f3 100644
>> --- a/include/rdma/ib_verbs.h
>> +++ b/include/rdma/ib_verbs.h
>> @@ -1879,6 +1879,21 @@ static inline int cap_ipoib(struct ib_device *device, 
>> u8 port_num)
>>  }
>>
>>  /**
>> + * cap_af_ib - Check if the port of device has the capability
>> + * Native Infiniband Address.
>> + *
>> + * @device: Device to be checked
>> + * @port_num: Port number of the device
>> + *
>> + * Return 0 when port of the device don't support
>> + * Native Infiniband Address.
>> + */
>> +static inline int cap_af_ib(struct ib_device *device, u8 port_num)
>> +{
>> +   return rdma_ib_or_iboe(device, port_num);
>> +}
> 
> Sean, can you please put a precise writeup what does it take to
> support AF_IB... I am a bit
> confused here and wasn't sure if this can be supported with RoCE.

I think this means IB GID addressing is checked (Native Infiniband
Address) and not AF_IB (which is socket address/protocol family like
INET and INET6).

I think this naming is confusing and maybe cap_ib_gid is better ?

-- Hal

> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v4 14/27] IB/Verbs: Reform cma_acquire_dev()

2015-04-16 Thread Hal Rosenstock
On 4/16/2015 9:35 AM, Michael Wang wrote:
> 
> 
> On 04/16/2015 03:19 PM, Hal Rosenstock wrote:
> [snip]
>>>  
>>> +static inline int cma_validate_port(struct ib_device *device, u8 port,
>>> + union ib_gid *gid, int dev_type)
>>> +{
>>> +   u8 found_port;
>>> +   int ret = -ENODEV;
>>> +
>>> +   if ((dev_type == ARPHRD_INFINIBAND) && !rdma_tech_ib(device, port))
>>> +   return ret;
>>> +
>>> +   if ((dev_type != ARPHRD_INFINIBAND) && rdma_tech_ib(device, port))
>>> +   return ret;
>>> +
>>> +   ret = ib_find_cached_gid(device, gid, _port, NULL);
>>> +
>>> +   if (!ret && (port == found_port))
>>> +   return 0;
>>> +
>>> +   return ret;
>>
>> Should the case where ret = 0 and port != found_port need to be handled
>> the same as currently ? It looks different to me since in this case the
>> port will be saved into id_priv->id.port_num whereas currently it isn't.
> 
> I get your point :-) what about:
> 
>   ret = ib_find_cached_gid(device, gid, _port, NULL);
>   if (port != found_port)
>   return -ENODEV;
> 
>   return ret;

Yes, that looks to me to be consistent with the current implementation.

-- Hal

> Regards,
> Michael Wang
> 
>>
>> -- Hal
>>
>>> +}
>>> +
>>>  static int cma_acquire_dev(struct rdma_id_private *id_priv,
>>>struct rdma_id_private *listen_id_priv)
>>>  {
>>> struct rdma_dev_addr *dev_addr = _priv->id.route.addr.dev_addr;
>>> struct cma_device *cma_dev;
>>> -   union ib_gid gid, iboe_gid;
>>> +   union ib_gid gid, iboe_gid, *gidp;
>>> int ret = -ENODEV;
>>> -   u8 port, found_port;
>>> -   enum rdma_link_layer dev_ll = dev_addr->dev_type == ARPHRD_INFINIBAND ?
>>> -   IB_LINK_LAYER_INFINIBAND : IB_LINK_LAYER_ETHERNET;
>>> +   u8 port;
>>>  
>>> -   if (dev_ll != IB_LINK_LAYER_INFINIBAND &&
>>> +   if (dev_addr->dev_type != ARPHRD_INFINIBAND &&
>>> id_priv->id.ps == RDMA_PS_IPOIB)
>>> return -EINVAL;
>>>  
>>> @@ -391,41 +409,36 @@ static int cma_acquire_dev(struct rdma_id_private 
>>> *id_priv,
>>>  
>>> memcpy(, dev_addr->src_dev_addr +
>>>rdma_addr_gid_offset(dev_addr), sizeof gid);
>>> -   if (listen_id_priv &&
>>> -   rdma_port_get_link_layer(listen_id_priv->id.device,
>>> -listen_id_priv->id.port_num) == dev_ll) {
>>> +
>>> +   if (listen_id_priv) {
>>> cma_dev = listen_id_priv->cma_dev;
>>> port = listen_id_priv->id.port_num;
>>> -   if (rdma_node_get_transport(cma_dev->device->node_type) == 
>>> RDMA_TRANSPORT_IB &&
>>> -   rdma_port_get_link_layer(cma_dev->device, port) == 
>>> IB_LINK_LAYER_ETHERNET)
>>> -   ret = ib_find_cached_gid(cma_dev->device, _gid,
>>> -_port, NULL);
>>> -   else
>>> -   ret = ib_find_cached_gid(cma_dev->device, ,
>>> -_port, NULL);
>>> +   gidp = rdma_tech_iboe(cma_dev->device, port) ?
>>> +  _gid : 
>>>  
>>> -   if (!ret && (port  == found_port)) {
>>> -   id_priv->id.port_num = found_port;
>>> +   ret = cma_validate_port(cma_dev->device, port, gidp,
>>> +   dev_addr->dev_type);
>>> +   if (!ret) {
>>> +   id_priv->id.port_num = port;
>>> goto out;
>>> }
>>> }
>>> +
>>> list_for_each_entry(cma_dev, _list, list) {
>>> for (port = 1; port <= cma_dev->device->phys_port_cnt; ++port) {
>>> if (listen_id_priv &&
>>> listen_id_priv->cma_dev == cma_dev &&
>>> listen_id_priv->id.port_num == port)
>>> continue;
>>> -   if (rdma_port_get_link_layer(cma_dev->device, port) == 
>>> dev_ll) {
>>> -   if 
>>> (rdma_node_get_transport

Re: [PATCH v4 27/27] IB/Verbs: Cleanup rdma_node_get_transport()

2015-04-16 Thread Hal Rosenstock
On 4/16/2015 9:41 AM, Michael Wang wrote:
> 
> 
> On 04/16/2015 03:36 PM, Hal Rosenstock wrote:
> [snip]
>>> -EXPORT_SYMBOL(rdma_node_get_transport);
>>> -
>>>  enum rdma_link_layer rdma_port_get_link_layer(struct ib_device *device, u8 
>>> port_num)
>>>  {
>>> if (device->get_link_layer)
>>> diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
>>> index 262bf44..f9ef479 100644
>>> --- a/include/rdma/ib_verbs.h
>>> +++ b/include/rdma/ib_verbs.h
>>> @@ -84,9 +84,6 @@ enum rdma_transport_type {
>>> RDMA_TRANSPORT_IBOE,
>>>  };
>>>  
>>> -__attribute_const__ enum rdma_transport_type
>>> -rdma_node_get_transport(enum rdma_node_type node_type);
>>> -
>>>  enum rdma_link_layer {
>>> IB_LINK_LAYER_UNSPECIFIED,
>>
>> Is IB_LINK_LAYER_UNSPECIFIED still possible ?
> 
> Actually it's impossible in kernel at first, all those who implemented the 
> callback
> won't return UNSPECIFIED, others all have the correct transport type 
> (otherwise BUG())
> and won't result UNSPECIFIED :-)

Should it be removed from this enum somewhere in this patch series
(perhaps early on) ?

-- Hal

> Regards,
> Michael Wang
> 
>>
>>> IB_LINK_LAYER_INFINIBAND,
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v4 20/27] IB/Verbs: Use management helper cap_ib_sa()

2015-04-16 Thread Hal Rosenstock
On 4/16/2015 4:12 AM, Michael Wang wrote:
> 
> Introduce helper cap_ib_sa() to help us check if the port of an
> IB device support Infiniband Subnet Administrator.

Nit: Administrator -> Administration

> 
> Cc: Steve Wise 
> Cc: Tom Talpey 
> Cc: Jason Gunthorpe 
> Cc: Doug Ledford 
> Cc: Ira Weiny 
> Cc: Sean Hefty 
> Signed-off-by: Michael Wang 
> ---
>  drivers/infiniband/core/cma.c  |  4 ++--
>  drivers/infiniband/core/sa_query.c | 10 +-
>  drivers/infiniband/core/ucma.c |  2 +-
>  include/rdma/ib_verbs.h| 15 +++
>  4 files changed, 23 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
> index a6f1526..094816f 100644
> --- a/drivers/infiniband/core/cma.c
> +++ b/drivers/infiniband/core/cma.c
> @@ -955,7 +955,7 @@ static inline int cma_user_data_offset(struct 
> rdma_id_private *id_priv)
>  
>  static void cma_cancel_route(struct rdma_id_private *id_priv)
>  {
> - if (rdma_tech_ib(id_priv->id.device, id_priv->id.port_num)) {
> + if (cap_ib_sa(id_priv->id.device, id_priv->id.port_num)) {
>   if (id_priv->query)
>   ib_sa_cancel_query(id_priv->query_id, id_priv->query);
>   }
> @@ -1979,7 +1979,7 @@ int rdma_resolve_route(struct rdma_cm_id *id, int 
> timeout_ms)
>   return -EINVAL;
>  
>   atomic_inc(_priv->refcount);
> - if (rdma_tech_ib(id->device, id->port_num))
> + if (cap_ib_sa(id->device, id->port_num))
>   ret = cma_resolve_ib_route(id_priv, timeout_ms);
>   else if (rdma_tech_iboe(id->device, id->port_num))
>   ret = cma_resolve_iboe_route(id_priv);
> diff --git a/drivers/infiniband/core/sa_query.c 
> b/drivers/infiniband/core/sa_query.c
> index 803ccf7..fc7e161 100644
> --- a/drivers/infiniband/core/sa_query.c
> +++ b/drivers/infiniband/core/sa_query.c
> @@ -450,7 +450,7 @@ static void ib_sa_event(struct ib_event_handler *handler, 
> struct ib_event *event
>   struct ib_sa_port *port =
>   _dev->port[event->element.port_num - 
> sa_dev->start_port];
>  
> - if (WARN_ON(!rdma_tech_ib(handler->device, port->port_num)))
> + if (WARN_ON(!cap_ib_sa(handler->device, port->port_num)))
>   return;
>  
>   spin_lock_irqsave(>ah_lock, flags);
> @@ -1173,7 +1173,7 @@ static void ib_sa_add_one(struct ib_device *device)
>  
>   for (i = 0; i <= e - s; ++i) {
>   spin_lock_init(_dev->port[i].ah_lock);
> - if (!rdma_tech_ib(device, i + 1))
> + if (!cap_ib_sa(device, i + 1))
>   continue;
>  
>   sa_dev->port[i].sm_ah= NULL;
> @@ -1210,7 +1210,7 @@ static void ib_sa_add_one(struct ib_device *device)
>   goto err;
>  
>   for (i = 0; i <= e - s; ++i) {
> - if (rdma_tech_ib(device, i + 1))
> + if (cap_ib_sa(device, i + 1))
>   update_sm_ah(_dev->port[i].update_task);
>   }
>  
> @@ -1218,7 +1218,7 @@ static void ib_sa_add_one(struct ib_device *device)
>  
>  err:
>   while (--i >= 0) {
> - if (rdma_tech_ib(device, i + 1))
> + if (cap_ib_sa(device, i + 1))
>   ib_unregister_mad_agent(sa_dev->port[i].agent);
>   }
>  
> @@ -1240,7 +1240,7 @@ static void ib_sa_remove_one(struct ib_device *device)
>   flush_workqueue(ib_wq);
>  
>   for (i = 0; i <= sa_dev->end_port - sa_dev->start_port; ++i) {
> - if (rdma_tech_ib(device, i + 1)) {
> + if (cap_ib_sa(device, i + 1)) {
>   ib_unregister_mad_agent(sa_dev->port[i].agent);
>   if (sa_dev->port[i].sm_ah)
>   kref_put(_dev->port[i].sm_ah->ref, 
> free_sm_ah);
> diff --git a/drivers/infiniband/core/ucma.c b/drivers/infiniband/core/ucma.c
> index 7331c6c..bed7957 100644
> --- a/drivers/infiniband/core/ucma.c
> +++ b/drivers/infiniband/core/ucma.c
> @@ -723,7 +723,7 @@ static ssize_t ucma_query_route(struct ucma_file *file,
>   resp.node_guid = (__force __u64) ctx->cm_id->device->node_guid;
>   resp.port_num = ctx->cm_id->port_num;
>  
> - if (rdma_tech_ib(ctx->cm_id->device, ctx->cm_id->port_num))
> + if (cap_ib_sa(ctx->cm_id->device, ctx->cm_id->port_num))
>   ucma_copy_ib_route(, >cm_id->route);
>   else if (rdma_tech_iboe(ctx->cm_id->device, ctx->cm_id->port_num))
>   ucma_copy_iboe_route(, >cm_id->route);
> diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
> index e4999f6..3bfdf81 100644
> --- a/include/rdma/ib_verbs.h
> +++ b/include/rdma/ib_verbs.h
> @@ -1833,6 +1833,21 @@ static inline int cap_iw_cm(struct ib_device *device, 
> u8 port_num)
>   return rdma_tech_iwarp(device, port_num);
>  }
>  
> +/**
> + * cap_ib_sa - Check if the port of device has the capability Infiniband
> + * Subnet Administrator.

Nit: Administrator -> Administration


Re: [PATCH v4 27/27] IB/Verbs: Cleanup rdma_node_get_transport()

2015-04-16 Thread Hal Rosenstock
On 4/16/2015 4:15 AM, Michael Wang wrote:
> 
> We have get rid of all the scene using legacy rdma_node_get_transport(),
> now clean it up.
> 
> Cc: Steve Wise 
> Cc: Tom Talpey 
> Cc: Jason Gunthorpe 
> Cc: Doug Ledford 
> Cc: Ira Weiny 
> Cc: Sean Hefty 
> Signed-off-by: Michael Wang 
> ---
>  drivers/infiniband/core/verbs.c | 21 -
>  include/rdma/ib_verbs.h |  3 ---
>  2 files changed, 24 deletions(-)
> 
> diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
> index ee4b5cb..bbea0c0 100644
> --- a/drivers/infiniband/core/verbs.c
> +++ b/drivers/infiniband/core/verbs.c
> @@ -107,27 +107,6 @@ __attribute_const__ int ib_rate_to_mbps(enum ib_rate 
> rate)
>  }
>  EXPORT_SYMBOL(ib_rate_to_mbps);
>  
> -__attribute_const__ enum rdma_transport_type
> -rdma_node_get_transport(enum rdma_node_type node_type)
> -{
> - switch (node_type) {
> - case RDMA_NODE_IB_CA:
> - case RDMA_NODE_IB_SWITCH:
> - case RDMA_NODE_IB_ROUTER:
> - return RDMA_TRANSPORT_IB;
> - case RDMA_NODE_RNIC:
> - return RDMA_TRANSPORT_IWARP;
> - case RDMA_NODE_USNIC:
> - return RDMA_TRANSPORT_USNIC;
> - case RDMA_NODE_USNIC_UDP:
> - return RDMA_TRANSPORT_USNIC_UDP;
> - default:
> - BUG();
> - return 0;
> - }
> -}
> -EXPORT_SYMBOL(rdma_node_get_transport);
> -
>  enum rdma_link_layer rdma_port_get_link_layer(struct ib_device *device, u8 
> port_num)
>  {
>   if (device->get_link_layer)
> diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
> index 262bf44..f9ef479 100644
> --- a/include/rdma/ib_verbs.h
> +++ b/include/rdma/ib_verbs.h
> @@ -84,9 +84,6 @@ enum rdma_transport_type {
>   RDMA_TRANSPORT_IBOE,
>  };
>  
> -__attribute_const__ enum rdma_transport_type
> -rdma_node_get_transport(enum rdma_node_type node_type);
> -
>  enum rdma_link_layer {
>   IB_LINK_LAYER_UNSPECIFIED,

Is IB_LINK_LAYER_UNSPECIFIED still possible ?

>   IB_LINK_LAYER_INFINIBAND,
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v4 14/27] IB/Verbs: Reform cma_acquire_dev()

2015-04-16 Thread Hal Rosenstock
On 4/16/2015 4:09 AM, Michael Wang wrote:
> 
> Reform cma_acquire_dev() with management helpers, introduce
> cma_validate_port() to make the code more clean.
> 
> Cc: Steve Wise 
> Cc: Tom Talpey 
> Cc: Jason Gunthorpe 
> Cc: Doug Ledford 
> Cc: Ira Weiny 
> Cc: Sean Hefty 
> Signed-off-by: Michael Wang 
> ---
>  drivers/infiniband/core/cma.c | 69 
> +--
>  1 file changed, 41 insertions(+), 28 deletions(-)
> 
> diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
> index b520882..902cc1a 100644
> --- a/drivers/infiniband/core/cma.c
> +++ b/drivers/infiniband/core/cma.c
> @@ -370,18 +370,36 @@ static int cma_translate_addr(struct sockaddr *addr, 
> struct rdma_dev_addr *dev_a
>   return ret;
>  }
>  
> +static inline int cma_validate_port(struct ib_device *device, u8 port,
> +   union ib_gid *gid, int dev_type)
> +{
> + u8 found_port;
> + int ret = -ENODEV;
> +
> + if ((dev_type == ARPHRD_INFINIBAND) && !rdma_tech_ib(device, port))
> + return ret;
> +
> + if ((dev_type != ARPHRD_INFINIBAND) && rdma_tech_ib(device, port))
> + return ret;
> +
> + ret = ib_find_cached_gid(device, gid, _port, NULL);
> +
> + if (!ret && (port == found_port))
> + return 0;
> +
> + return ret;

Should the case where ret = 0 and port != found_port need to be handled
the same as currently ? It looks different to me since in this case the
port will be saved into id_priv->id.port_num whereas currently it isn't.

-- Hal

> +}
> +
>  static int cma_acquire_dev(struct rdma_id_private *id_priv,
>  struct rdma_id_private *listen_id_priv)
>  {
>   struct rdma_dev_addr *dev_addr = _priv->id.route.addr.dev_addr;
>   struct cma_device *cma_dev;
> - union ib_gid gid, iboe_gid;
> + union ib_gid gid, iboe_gid, *gidp;
>   int ret = -ENODEV;
> - u8 port, found_port;
> - enum rdma_link_layer dev_ll = dev_addr->dev_type == ARPHRD_INFINIBAND ?
> - IB_LINK_LAYER_INFINIBAND : IB_LINK_LAYER_ETHERNET;
> + u8 port;
>  
> - if (dev_ll != IB_LINK_LAYER_INFINIBAND &&
> + if (dev_addr->dev_type != ARPHRD_INFINIBAND &&
>   id_priv->id.ps == RDMA_PS_IPOIB)
>   return -EINVAL;
>  
> @@ -391,41 +409,36 @@ static int cma_acquire_dev(struct rdma_id_private 
> *id_priv,
>  
>   memcpy(, dev_addr->src_dev_addr +
>  rdma_addr_gid_offset(dev_addr), sizeof gid);
> - if (listen_id_priv &&
> - rdma_port_get_link_layer(listen_id_priv->id.device,
> -  listen_id_priv->id.port_num) == dev_ll) {
> +
> + if (listen_id_priv) {
>   cma_dev = listen_id_priv->cma_dev;
>   port = listen_id_priv->id.port_num;
> - if (rdma_node_get_transport(cma_dev->device->node_type) == 
> RDMA_TRANSPORT_IB &&
> - rdma_port_get_link_layer(cma_dev->device, port) == 
> IB_LINK_LAYER_ETHERNET)
> - ret = ib_find_cached_gid(cma_dev->device, _gid,
> -  _port, NULL);
> - else
> - ret = ib_find_cached_gid(cma_dev->device, ,
> -  _port, NULL);
> + gidp = rdma_tech_iboe(cma_dev->device, port) ?
> +_gid : 
>  
> - if (!ret && (port  == found_port)) {
> - id_priv->id.port_num = found_port;
> + ret = cma_validate_port(cma_dev->device, port, gidp,
> + dev_addr->dev_type);
> + if (!ret) {
> + id_priv->id.port_num = port;
>   goto out;
>   }
>   }
> +
>   list_for_each_entry(cma_dev, _list, list) {
>   for (port = 1; port <= cma_dev->device->phys_port_cnt; ++port) {
>   if (listen_id_priv &&
>   listen_id_priv->cma_dev == cma_dev &&
>   listen_id_priv->id.port_num == port)
>   continue;
> - if (rdma_port_get_link_layer(cma_dev->device, port) == 
> dev_ll) {
> - if 
> (rdma_node_get_transport(cma_dev->device->node_type) == RDMA_TRANSPORT_IB &&
> - rdma_port_get_link_layer(cma_dev->device, 
> port) == IB_LINK_LAYER_ETHERNET)
> - ret = 
> ib_find_cached_gid(cma_dev->device, _gid, _port, NULL);
> - else
> - ret = 
> ib_find_cached_gid(cma_dev->device, , _port, NULL);
> -
> - if (!ret && (port == found_port)) {
> - id_priv->id.port_num = found_port;
> - goto out;
> - }
> +
> + gidp = 

Re: [PATCH v4 10/27] IB/Verbs: Reform cm related part in IB-core cma/ucm

2015-04-16 Thread Hal Rosenstock
On 4/16/2015 4:08 AM, Michael Wang wrote:
> 
> Use raw management helpers to reform cm related part in IB-core cma/ucm.
> 
> These checks focus on the device cm type rather than the port capability,
> directly pass port 1 works currently, but can't support mixing cm type
> device in future.

This is equivalent to today where the checks are per node rather than
per port.

Should all checks here be port 1 based or only certain ones like listen
? For example, in connect/reject/disconnect, don't we already have port
? Guess this can be dealt with later as this is not a regression from
the current implementation.

-- Hal

> 
> Cc: Steve Wise 
> Cc: Tom Talpey 
> Cc: Jason Gunthorpe 
> Cc: Doug Ledford 
> Cc: Ira Weiny 
> Cc: Sean Hefty 
> Signed-off-by: Michael Wang 
> ---
>  drivers/infiniband/core/cma.c | 81 
> +--
>  drivers/infiniband/core/ucm.c |  3 +-
>  2 files changed, 26 insertions(+), 58 deletions(-)
> 
> diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
> index d570030..6b8a64d 100644
> --- a/drivers/infiniband/core/cma.c
> +++ b/drivers/infiniband/core/cma.c
> @@ -735,8 +735,7 @@ int rdma_init_qp_attr(struct rdma_cm_id *id, struct 
> ib_qp_attr *qp_attr,
>   int ret = 0;
>  
>   id_priv = container_of(id, struct rdma_id_private, id);
> - switch (rdma_node_get_transport(id_priv->id.device->node_type)) {
> - case RDMA_TRANSPORT_IB:
> + if (rdma_ib_or_iboe(id_priv->id.device, 1)) {
>   if (!id_priv->cm_id.ib || (id_priv->id.qp_type == IB_QPT_UD))
>   ret = cma_ib_init_qp_attr(id_priv, qp_attr, 
> qp_attr_mask);
>   else
> @@ -745,19 +744,15 @@ int rdma_init_qp_attr(struct rdma_cm_id *id, struct 
> ib_qp_attr *qp_attr,
>  
>   if (qp_attr->qp_state == IB_QPS_RTR)
>   qp_attr->rq_psn = id_priv->seq_num;
> - break;
> - case RDMA_TRANSPORT_IWARP:
> + } else if (rdma_tech_iwarp(id_priv->id.device, 1)) {
>   if (!id_priv->cm_id.iw) {
>   qp_attr->qp_access_flags = 0;
>   *qp_attr_mask = IB_QP_STATE | IB_QP_ACCESS_FLAGS;
>   } else
>   ret = iw_cm_init_qp_attr(id_priv->cm_id.iw, qp_attr,
>qp_attr_mask);
> - break;
> - default:
> + } else
>   ret = -ENOSYS;
> - break;
> - }
>  
>   return ret;
>  }
> @@ -1037,17 +1032,12 @@ void rdma_destroy_id(struct rdma_cm_id *id)
>   mutex_unlock(_priv->handler_mutex);
>  
>   if (id_priv->cma_dev) {
> - switch (rdma_node_get_transport(id_priv->id.device->node_type)) 
> {
> - case RDMA_TRANSPORT_IB:
> + if (rdma_ib_or_iboe(id_priv->id.device, 1)) {
>   if (id_priv->cm_id.ib)
>   ib_destroy_cm_id(id_priv->cm_id.ib);
> - break;
> - case RDMA_TRANSPORT_IWARP:
> + } else if (rdma_tech_iwarp(id_priv->id.device, 1)) {
>   if (id_priv->cm_id.iw)
>   iw_destroy_cm_id(id_priv->cm_id.iw);
> - break;
> - default:
> - break;
>   }
>   cma_leave_mc_groups(id_priv);
>   cma_release_dev(id_priv);
> @@ -1626,7 +1616,7 @@ static void cma_listen_on_dev(struct rdma_id_private 
> *id_priv,
>   int ret;
>  
>   if (cma_family(id_priv) == AF_IB &&
> - rdma_node_get_transport(cma_dev->device->node_type) != 
> RDMA_TRANSPORT_IB)
> + !rdma_ib_or_iboe(cma_dev->device, 1))
>   return;
>  
>   id = rdma_create_id(cma_listen_handler, id_priv, id_priv->id.ps,
> @@ -2028,7 +2018,7 @@ static int cma_bind_loopback(struct rdma_id_private 
> *id_priv)
>   mutex_lock();
>   list_for_each_entry(cur_dev, _list, list) {
>   if (cma_family(id_priv) == AF_IB &&
> - rdma_node_get_transport(cur_dev->device->node_type) != 
> RDMA_TRANSPORT_IB)
> + !rdma_ib_or_iboe(cur_dev->device, 1))
>   continue;
>  
>   if (!cma_dev)
> @@ -2060,7 +2050,7 @@ port_found:
>   goto out;
>  
>   id_priv->id.route.addr.dev_addr.dev_type =
> - (rdma_port_get_link_layer(cma_dev->device, p) == 
> IB_LINK_LAYER_INFINIBAND) ?
> + (rdma_tech_ib(cma_dev->device, p)) ?
>   ARPHRD_INFINIBAND : ARPHRD_ETHER;
>  
>   rdma_addr_set_sgid(_priv->id.route.addr.dev_addr, );
> @@ -2537,18 +2527,15 @@ int rdma_listen(struct rdma_cm_id *id, int backlog)
>  
>   id_priv->backlog = backlog;
>   if (id->device) {
> - switch (rdma_node_get_transport(id->device->node_type)) {
> - case RDMA_TRANSPORT_IB:
> + if (rdma_ib_or_iboe(id->device, 1)) {
>   ret = cma_ib_listen(id_priv);
>

Re: [PATCH v4 03/27] IB/Verbs: Reform IB-core mad/agent/user_mad

2015-04-16 Thread Hal Rosenstock
On 4/16/2015 4:05 AM, Michael Wang wrote:
> 
> Use raw management helpers to reform IB-core mad/agent/user_mad.
> 
> Cc: Steve Wise 
> Cc: Tom Talpey 
> Cc: Jason Gunthorpe 
> Cc: Doug Ledford 
> Cc: Ira Weiny 
> Cc: Sean Hefty 
> Signed-off-by: Michael Wang 
> ---
>  drivers/infiniband/core/agent.c|  2 +-
>  drivers/infiniband/core/mad.c  | 20 ++--
>  drivers/infiniband/core/user_mad.c | 26 --
>  3 files changed, 31 insertions(+), 17 deletions(-)
> 
> diff --git a/drivers/infiniband/core/agent.c b/drivers/infiniband/core/agent.c
> index f6d2961..ffdef4d 100644
> --- a/drivers/infiniband/core/agent.c
> +++ b/drivers/infiniband/core/agent.c
> @@ -156,7 +156,7 @@ int ib_agent_port_open(struct ib_device *device, int 
> port_num)
>   goto error1;
>   }
>  
> - if (rdma_port_get_link_layer(device, port_num) == 
> IB_LINK_LAYER_INFINIBAND) {
> + if (rdma_tech_ib(device, port_num)) {
>   /* Obtain send only MAD agent for SMI QP */
>   port_priv->agent[0] = ib_register_mad_agent(device, port_num,
>   IB_QPT_SMI, NULL, 0,
> diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
> index 74c30f4..d451a47 100644
> --- a/drivers/infiniband/core/mad.c
> +++ b/drivers/infiniband/core/mad.c
> @@ -2938,7 +2938,7 @@ static int ib_mad_port_open(struct ib_device *device,
>   init_mad_qp(port_priv, _priv->qp_info[1]);
>  
>   cq_size = mad_sendq_size + mad_recvq_size;
> - has_smi = rdma_port_get_link_layer(device, port_num) == 
> IB_LINK_LAYER_INFINIBAND;
> + has_smi = rdma_tech_ib(device, port_num);
>   if (has_smi)
>   cq_size *= 2;
>  
> @@ -3057,9 +3057,6 @@ static void ib_mad_init_device(struct ib_device *device)
>  {
>   int start, end, i;
>  
> - if (rdma_node_get_transport(device->node_type) != RDMA_TRANSPORT_IB)
> - return;
> -
>   if (device->node_type == RDMA_NODE_IB_SWITCH) {
>   start = 0;
>   end   = 0;
> @@ -3069,6 +3066,9 @@ static void ib_mad_init_device(struct ib_device *device)
>   }
>  
>   for (i = start; i <= end; i++) {
> + if (!rdma_ib_or_iboe(device, i))
> + continue;
> +
>   if (ib_mad_port_open(device, i)) {
>   dev_err(>dev, "Couldn't open port %d\n", i);
>   goto error;
> @@ -3086,15 +3086,15 @@ error_agent:
>   dev_err(>dev, "Couldn't close port %d\n", i);
>  
>  error:
> - i--;
> + while (--i >= start) {
> + if (!rdma_ib_or_iboe(device, i))
> + continue;
>  
> - while (i >= start) {
>   if (ib_agent_port_close(device, i))
>   dev_err(>dev,
>   "Couldn't close port %d for agents\n", i);
>   if (ib_mad_port_close(device, i))
>   dev_err(>dev, "Couldn't close port %d\n", i);
> - i--;
>   }
>  }
>  
> @@ -3102,9 +3102,6 @@ static void ib_mad_remove_device(struct ib_device 
> *device)
>  {
>   int i, num_ports, cur_port;
>  
> - if (rdma_node_get_transport(device->node_type) != RDMA_TRANSPORT_IB)
> - return;
> -
>   if (device->node_type == RDMA_NODE_IB_SWITCH) {
>   num_ports = 1;
>   cur_port = 0;
> @@ -3113,6 +3110,9 @@ static void ib_mad_remove_device(struct ib_device 
> *device)
>   cur_port = 1;
>   }
>   for (i = 0; i < num_ports; i++, cur_port++) {
> + if (!rdma_ib_or_iboe(device, i))

Shouldn't this be:
if (!rdma_ib_or_iboe(device, cur_port))
?

-- Hal

> + continue;
> +
>   if (ib_agent_port_close(device, cur_port))
>   dev_err(>dev,
>   "Couldn't close port %d for agents\n",
> diff --git a/drivers/infiniband/core/user_mad.c 
> b/drivers/infiniband/core/user_mad.c
> index 928cdd2..71fc8ba 100644
> --- a/drivers/infiniband/core/user_mad.c
> +++ b/drivers/infiniband/core/user_mad.c
> @@ -1273,9 +1273,7 @@ static void ib_umad_add_one(struct ib_device *device)
>  {
>   struct ib_umad_device *umad_dev;
>   int s, e, i;
> -
> - if (rdma_node_get_transport(device->node_type) != RDMA_TRANSPORT_IB)
> - return;
> + int count = 0;
>  
>   if (device->node_type == RDMA_NODE_IB_SWITCH)
>   s = e = 0;
> @@ -1296,11 +1294,21 @@ static void ib_umad_add_one(struct ib_device *device)
>   umad_dev->end_port   = e;
>  
>   for (i = s; i <= e; ++i) {
> + if (!rdma_ib_or_iboe(device, i))
> + continue;
> +
>   umad_dev->port[i - s].umad_dev = umad_dev;
>  
>   if (ib_umad_init_port(device, i, umad_dev,
> _dev->port[i - s]))
>   goto err;
> +
> + count++;
> + }
> +
> 

Re: [PATCH v4 04/27] IB/Verbs: Reform IB-core cm

2015-04-16 Thread Hal Rosenstock
On 4/16/2015 4:05 AM, Michael Wang wrote:
> 
> Use raw management helpers to reform IB-core cm.
> 
> Cc: Steve Wise 
> Cc: Tom Talpey 
> Cc: Jason Gunthorpe 
> Cc: Doug Ledford 
> Cc: Ira Weiny 
> Cc: Sean Hefty 
> Signed-off-by: Michael Wang 
> ---
>  drivers/infiniband/core/cm.c | 22 +++---
>  1 file changed, 19 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
> index e28a494..50321fe 100644
> --- a/drivers/infiniband/core/cm.c
> +++ b/drivers/infiniband/core/cm.c
> @@ -3761,9 +3761,7 @@ static void cm_add_one(struct ib_device *ib_device)
>   unsigned long flags;
>   int ret;
>   u8 i;
> -
> - if (rdma_node_get_transport(ib_device->node_type) != RDMA_TRANSPORT_IB)
> - return;
> + int count = 0;
>  
>   cm_dev = kzalloc(sizeof(*cm_dev) + sizeof(*port) *
>ib_device->phys_port_cnt, GFP_KERNEL);
> @@ -3783,6 +3781,9 @@ static void cm_add_one(struct ib_device *ib_device)
>  
>   set_bit(IB_MGMT_METHOD_SEND, reg_req.method_mask);
>   for (i = 1; i <= ib_device->phys_port_cnt; i++) {
> + if (!rdma_ib_or_iboe(ib_device, i))
> + continue;
> +
>   port = kzalloc(sizeof *port, GFP_KERNEL);
>   if (!port)
>   goto error1;
> @@ -3809,7 +3810,16 @@ static void cm_add_one(struct ib_device *ib_device)
>   ret = ib_modify_port(ib_device, i, 0, _modify);
>   if (ret)
>   goto error3;
> +
> + count++;
>   }
> +
> + if (!count) {
> + device_unregister(cm_dev->device);
> + kfree(cm_dev);
> + return;

Nit: alternatively, this could be goto at end of this routine where
there is same 2 calls. I think Jason made this comment on earlier set of
these patches.

-- Hal

> + }
> +
>   ib_set_client_data(ib_device, _client, cm_dev);
>  
>   write_lock_irqsave(_lock, flags);
> @@ -3825,6 +3835,9 @@ error1:
>   port_modify.set_port_cap_mask = 0;
>   port_modify.clr_port_cap_mask = IB_PORT_CM_SUP;
>   while (--i) {
> + if (!rdma_ib_or_iboe(ib_device, i))
> + continue;
> +
>   port = cm_dev->port[i-1];
>   ib_modify_port(ib_device, port->port_num, 0, _modify);
>   ib_unregister_mad_agent(port->mad_agent);
> @@ -3853,6 +3866,9 @@ static void cm_remove_one(struct ib_device *ib_device)
>   write_unlock_irqrestore(_lock, flags);
>  
>   for (i = 1; i <= ib_device->phys_port_cnt; i++) {
> + if (!rdma_ib_or_iboe(ib_device, i))
> + continue;
> +
>   port = cm_dev->port[i-1];
>   ib_modify_port(ib_device, port->port_num, 0, _modify);
>   ib_unregister_mad_agent(port->mad_agent);

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v3 01/28] IB/Verbs: Implement new callback query_transport()

2015-04-16 Thread Hal Rosenstock
On 4/15/2015 4:33 PM, ira.weiny wrote:
> On Wed, Apr 15, 2015 at 02:36:13PM -0400, Hal Rosenstock wrote:
>> On 4/13/2015 8:22 AM, Michael Wang wrote:
>>> diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
>>> index 65994a1..d54f91e 100644
>>> --- a/include/rdma/ib_verbs.h
>>> +++ b/include/rdma/ib_verbs.h
>>> @@ -75,10 +75,13 @@ enum rdma_node_type {
>>>  };
>>>  
>>>  enum rdma_transport_type {
>>> +   /* legacy for users */
>>> RDMA_TRANSPORT_IB,
>>> RDMA_TRANSPORT_IWARP,
>>> RDMA_TRANSPORT_USNIC,
>>> -   RDMA_TRANSPORT_USNIC_UDP
>>> +   RDMA_TRANSPORT_USNIC_UDP,
>>> +   /* new transport */
>>> +   RDMA_TRANSPORT_IBOE,
>>>  };
>>>  
>>>  __attribute_const__ enum rdma_transport_type
>>> @@ -1501,6 +1504,8 @@ struct ib_device {
>>> int(*query_port)(struct ib_device *device,
>>>  u8 port_num,
>>>  struct ib_port_attr 
>>> *port_attr);
>>> +   enum rdma_transport_type   (*query_transport)(struct ib_device *device,
>>> + u8 port_num);
>>> enum rdma_link_layer   (*get_link_layer)(struct ib_device *device,
>>>  u8 port_num);
>>> int(*query_gid)(struct ib_device *device,
>>
>> libibverbs also exposes transport at the device level. Isn't a change to
>> make transport per port rather than per device needed there as well to
>> be consistent with these proposed kernel changes ? If so, would the
>> additional IBoE transport be exposed ? We also need to worry about
>> backward compatibility for existing applications.
> 
> Early on in this thread we agreed that user space would stay the same until we
> get the kernel straightened out.  

I missed that in this very long thread. I just wanted to be sure that
this will be addressed.

-- Hal

> The above enum and function are not exported
> to user space so I believe this is in alignment with that plan.
> 
> Ira
> 
> .
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v3 01/28] IB/Verbs: Implement new callback query_transport()

2015-04-16 Thread Hal Rosenstock
On 4/15/2015 4:33 PM, ira.weiny wrote:
 On Wed, Apr 15, 2015 at 02:36:13PM -0400, Hal Rosenstock wrote:
 On 4/13/2015 8:22 AM, Michael Wang wrote:
 diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
 index 65994a1..d54f91e 100644
 --- a/include/rdma/ib_verbs.h
 +++ b/include/rdma/ib_verbs.h
 @@ -75,10 +75,13 @@ enum rdma_node_type {
  };
  
  enum rdma_transport_type {
 +   /* legacy for users */
 RDMA_TRANSPORT_IB,
 RDMA_TRANSPORT_IWARP,
 RDMA_TRANSPORT_USNIC,
 -   RDMA_TRANSPORT_USNIC_UDP
 +   RDMA_TRANSPORT_USNIC_UDP,
 +   /* new transport */
 +   RDMA_TRANSPORT_IBOE,
  };
  
  __attribute_const__ enum rdma_transport_type
 @@ -1501,6 +1504,8 @@ struct ib_device {
 int(*query_port)(struct ib_device *device,
  u8 port_num,
  struct ib_port_attr 
 *port_attr);
 +   enum rdma_transport_type   (*query_transport)(struct ib_device *device,
 + u8 port_num);
 enum rdma_link_layer   (*get_link_layer)(struct ib_device *device,
  u8 port_num);
 int(*query_gid)(struct ib_device *device,

 libibverbs also exposes transport at the device level. Isn't a change to
 make transport per port rather than per device needed there as well to
 be consistent with these proposed kernel changes ? If so, would the
 additional IBoE transport be exposed ? We also need to worry about
 backward compatibility for existing applications.
 
 Early on in this thread we agreed that user space would stay the same until we
 get the kernel straightened out.  

I missed that in this very long thread. I just wanted to be sure that
this will be addressed.

-- Hal

 The above enum and function are not exported
 to user space so I believe this is in alignment with that plan.
 
 Ira
 
 .
 

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v4 03/27] IB/Verbs: Reform IB-core mad/agent/user_mad

2015-04-16 Thread Hal Rosenstock
On 4/16/2015 4:05 AM, Michael Wang wrote:
 
 Use raw management helpers to reform IB-core mad/agent/user_mad.
 
 Cc: Steve Wise sw...@opengridcomputing.com
 Cc: Tom Talpey t...@talpey.com
 Cc: Jason Gunthorpe jguntho...@obsidianresearch.com
 Cc: Doug Ledford dledf...@redhat.com
 Cc: Ira Weiny ira.we...@intel.com
 Cc: Sean Hefty sean.he...@intel.com
 Signed-off-by: Michael Wang yun.w...@profitbricks.com
 ---
  drivers/infiniband/core/agent.c|  2 +-
  drivers/infiniband/core/mad.c  | 20 ++--
  drivers/infiniband/core/user_mad.c | 26 --
  3 files changed, 31 insertions(+), 17 deletions(-)
 
 diff --git a/drivers/infiniband/core/agent.c b/drivers/infiniband/core/agent.c
 index f6d2961..ffdef4d 100644
 --- a/drivers/infiniband/core/agent.c
 +++ b/drivers/infiniband/core/agent.c
 @@ -156,7 +156,7 @@ int ib_agent_port_open(struct ib_device *device, int 
 port_num)
   goto error1;
   }
  
 - if (rdma_port_get_link_layer(device, port_num) == 
 IB_LINK_LAYER_INFINIBAND) {
 + if (rdma_tech_ib(device, port_num)) {
   /* Obtain send only MAD agent for SMI QP */
   port_priv-agent[0] = ib_register_mad_agent(device, port_num,
   IB_QPT_SMI, NULL, 0,
 diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
 index 74c30f4..d451a47 100644
 --- a/drivers/infiniband/core/mad.c
 +++ b/drivers/infiniband/core/mad.c
 @@ -2938,7 +2938,7 @@ static int ib_mad_port_open(struct ib_device *device,
   init_mad_qp(port_priv, port_priv-qp_info[1]);
  
   cq_size = mad_sendq_size + mad_recvq_size;
 - has_smi = rdma_port_get_link_layer(device, port_num) == 
 IB_LINK_LAYER_INFINIBAND;
 + has_smi = rdma_tech_ib(device, port_num);
   if (has_smi)
   cq_size *= 2;
  
 @@ -3057,9 +3057,6 @@ static void ib_mad_init_device(struct ib_device *device)
  {
   int start, end, i;
  
 - if (rdma_node_get_transport(device-node_type) != RDMA_TRANSPORT_IB)
 - return;
 -
   if (device-node_type == RDMA_NODE_IB_SWITCH) {
   start = 0;
   end   = 0;
 @@ -3069,6 +3066,9 @@ static void ib_mad_init_device(struct ib_device *device)
   }
  
   for (i = start; i = end; i++) {
 + if (!rdma_ib_or_iboe(device, i))
 + continue;
 +
   if (ib_mad_port_open(device, i)) {
   dev_err(device-dev, Couldn't open port %d\n, i);
   goto error;
 @@ -3086,15 +3086,15 @@ error_agent:
   dev_err(device-dev, Couldn't close port %d\n, i);
  
  error:
 - i--;
 + while (--i = start) {
 + if (!rdma_ib_or_iboe(device, i))
 + continue;
  
 - while (i = start) {
   if (ib_agent_port_close(device, i))
   dev_err(device-dev,
   Couldn't close port %d for agents\n, i);
   if (ib_mad_port_close(device, i))
   dev_err(device-dev, Couldn't close port %d\n, i);
 - i--;
   }
  }
  
 @@ -3102,9 +3102,6 @@ static void ib_mad_remove_device(struct ib_device 
 *device)
  {
   int i, num_ports, cur_port;
  
 - if (rdma_node_get_transport(device-node_type) != RDMA_TRANSPORT_IB)
 - return;
 -
   if (device-node_type == RDMA_NODE_IB_SWITCH) {
   num_ports = 1;
   cur_port = 0;
 @@ -3113,6 +3110,9 @@ static void ib_mad_remove_device(struct ib_device 
 *device)
   cur_port = 1;
   }
   for (i = 0; i  num_ports; i++, cur_port++) {
 + if (!rdma_ib_or_iboe(device, i))

Shouldn't this be:
if (!rdma_ib_or_iboe(device, cur_port))
?

-- Hal

 + continue;
 +
   if (ib_agent_port_close(device, cur_port))
   dev_err(device-dev,
   Couldn't close port %d for agents\n,
 diff --git a/drivers/infiniband/core/user_mad.c 
 b/drivers/infiniband/core/user_mad.c
 index 928cdd2..71fc8ba 100644
 --- a/drivers/infiniband/core/user_mad.c
 +++ b/drivers/infiniband/core/user_mad.c
 @@ -1273,9 +1273,7 @@ static void ib_umad_add_one(struct ib_device *device)
  {
   struct ib_umad_device *umad_dev;
   int s, e, i;
 -
 - if (rdma_node_get_transport(device-node_type) != RDMA_TRANSPORT_IB)
 - return;
 + int count = 0;
  
   if (device-node_type == RDMA_NODE_IB_SWITCH)
   s = e = 0;
 @@ -1296,11 +1294,21 @@ static void ib_umad_add_one(struct ib_device *device)
   umad_dev-end_port   = e;
  
   for (i = s; i = e; ++i) {
 + if (!rdma_ib_or_iboe(device, i))
 + continue;
 +
   umad_dev-port[i - s].umad_dev = umad_dev;
  
   if (ib_umad_init_port(device, i, umad_dev,
 umad_dev-port[i - s]))
   goto err;
 +
 +  

Re: [PATCH v4 04/27] IB/Verbs: Reform IB-core cm

2015-04-16 Thread Hal Rosenstock
On 4/16/2015 4:05 AM, Michael Wang wrote:
 
 Use raw management helpers to reform IB-core cm.
 
 Cc: Steve Wise sw...@opengridcomputing.com
 Cc: Tom Talpey t...@talpey.com
 Cc: Jason Gunthorpe jguntho...@obsidianresearch.com
 Cc: Doug Ledford dledf...@redhat.com
 Cc: Ira Weiny ira.we...@intel.com
 Cc: Sean Hefty sean.he...@intel.com
 Signed-off-by: Michael Wang yun.w...@profitbricks.com
 ---
  drivers/infiniband/core/cm.c | 22 +++---
  1 file changed, 19 insertions(+), 3 deletions(-)
 
 diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
 index e28a494..50321fe 100644
 --- a/drivers/infiniband/core/cm.c
 +++ b/drivers/infiniband/core/cm.c
 @@ -3761,9 +3761,7 @@ static void cm_add_one(struct ib_device *ib_device)
   unsigned long flags;
   int ret;
   u8 i;
 -
 - if (rdma_node_get_transport(ib_device-node_type) != RDMA_TRANSPORT_IB)
 - return;
 + int count = 0;
  
   cm_dev = kzalloc(sizeof(*cm_dev) + sizeof(*port) *
ib_device-phys_port_cnt, GFP_KERNEL);
 @@ -3783,6 +3781,9 @@ static void cm_add_one(struct ib_device *ib_device)
  
   set_bit(IB_MGMT_METHOD_SEND, reg_req.method_mask);
   for (i = 1; i = ib_device-phys_port_cnt; i++) {
 + if (!rdma_ib_or_iboe(ib_device, i))
 + continue;
 +
   port = kzalloc(sizeof *port, GFP_KERNEL);
   if (!port)
   goto error1;
 @@ -3809,7 +3810,16 @@ static void cm_add_one(struct ib_device *ib_device)
   ret = ib_modify_port(ib_device, i, 0, port_modify);
   if (ret)
   goto error3;
 +
 + count++;
   }
 +
 + if (!count) {
 + device_unregister(cm_dev-device);
 + kfree(cm_dev);
 + return;

Nit: alternatively, this could be goto at end of this routine where
there is same 2 calls. I think Jason made this comment on earlier set of
these patches.

-- Hal

 + }
 +
   ib_set_client_data(ib_device, cm_client, cm_dev);
  
   write_lock_irqsave(cm.device_lock, flags);
 @@ -3825,6 +3835,9 @@ error1:
   port_modify.set_port_cap_mask = 0;
   port_modify.clr_port_cap_mask = IB_PORT_CM_SUP;
   while (--i) {
 + if (!rdma_ib_or_iboe(ib_device, i))
 + continue;
 +
   port = cm_dev-port[i-1];
   ib_modify_port(ib_device, port-port_num, 0, port_modify);
   ib_unregister_mad_agent(port-mad_agent);
 @@ -3853,6 +3866,9 @@ static void cm_remove_one(struct ib_device *ib_device)
   write_unlock_irqrestore(cm.device_lock, flags);
  
   for (i = 1; i = ib_device-phys_port_cnt; i++) {
 + if (!rdma_ib_or_iboe(ib_device, i))
 + continue;
 +
   port = cm_dev-port[i-1];
   ib_modify_port(ib_device, port-port_num, 0, port_modify);
   ib_unregister_mad_agent(port-mad_agent);

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v4 14/27] IB/Verbs: Reform cma_acquire_dev()

2015-04-16 Thread Hal Rosenstock
On 4/16/2015 9:35 AM, Michael Wang wrote:
 
 
 On 04/16/2015 03:19 PM, Hal Rosenstock wrote:
 [snip]
  
 +static inline int cma_validate_port(struct ib_device *device, u8 port,
 + union ib_gid *gid, int dev_type)
 +{
 +   u8 found_port;
 +   int ret = -ENODEV;
 +
 +   if ((dev_type == ARPHRD_INFINIBAND)  !rdma_tech_ib(device, port))
 +   return ret;
 +
 +   if ((dev_type != ARPHRD_INFINIBAND)  rdma_tech_ib(device, port))
 +   return ret;
 +
 +   ret = ib_find_cached_gid(device, gid, found_port, NULL);
 +
 +   if (!ret  (port == found_port))
 +   return 0;
 +
 +   return ret;

 Should the case where ret = 0 and port != found_port need to be handled
 the same as currently ? It looks different to me since in this case the
 port will be saved into id_priv-id.port_num whereas currently it isn't.
 
 I get your point :-) what about:
 
   ret = ib_find_cached_gid(device, gid, found_port, NULL);
   if (port != found_port)
   return -ENODEV;
 
   return ret;

Yes, that looks to me to be consistent with the current implementation.

-- Hal

 Regards,
 Michael Wang
 

 -- Hal

 +}
 +
  static int cma_acquire_dev(struct rdma_id_private *id_priv,
struct rdma_id_private *listen_id_priv)
  {
 struct rdma_dev_addr *dev_addr = id_priv-id.route.addr.dev_addr;
 struct cma_device *cma_dev;
 -   union ib_gid gid, iboe_gid;
 +   union ib_gid gid, iboe_gid, *gidp;
 int ret = -ENODEV;
 -   u8 port, found_port;
 -   enum rdma_link_layer dev_ll = dev_addr-dev_type == ARPHRD_INFINIBAND ?
 -   IB_LINK_LAYER_INFINIBAND : IB_LINK_LAYER_ETHERNET;
 +   u8 port;
  
 -   if (dev_ll != IB_LINK_LAYER_INFINIBAND 
 +   if (dev_addr-dev_type != ARPHRD_INFINIBAND 
 id_priv-id.ps == RDMA_PS_IPOIB)
 return -EINVAL;
  
 @@ -391,41 +409,36 @@ static int cma_acquire_dev(struct rdma_id_private 
 *id_priv,
  
 memcpy(gid, dev_addr-src_dev_addr +
rdma_addr_gid_offset(dev_addr), sizeof gid);
 -   if (listen_id_priv 
 -   rdma_port_get_link_layer(listen_id_priv-id.device,
 -listen_id_priv-id.port_num) == dev_ll) {
 +
 +   if (listen_id_priv) {
 cma_dev = listen_id_priv-cma_dev;
 port = listen_id_priv-id.port_num;
 -   if (rdma_node_get_transport(cma_dev-device-node_type) == 
 RDMA_TRANSPORT_IB 
 -   rdma_port_get_link_layer(cma_dev-device, port) == 
 IB_LINK_LAYER_ETHERNET)
 -   ret = ib_find_cached_gid(cma_dev-device, iboe_gid,
 -found_port, NULL);
 -   else
 -   ret = ib_find_cached_gid(cma_dev-device, gid,
 -found_port, NULL);
 +   gidp = rdma_tech_iboe(cma_dev-device, port) ?
 +  iboe_gid : gid;
  
 -   if (!ret  (port  == found_port)) {
 -   id_priv-id.port_num = found_port;
 +   ret = cma_validate_port(cma_dev-device, port, gidp,
 +   dev_addr-dev_type);
 +   if (!ret) {
 +   id_priv-id.port_num = port;
 goto out;
 }
 }
 +
 list_for_each_entry(cma_dev, dev_list, list) {
 for (port = 1; port = cma_dev-device-phys_port_cnt; ++port) {
 if (listen_id_priv 
 listen_id_priv-cma_dev == cma_dev 
 listen_id_priv-id.port_num == port)
 continue;
 -   if (rdma_port_get_link_layer(cma_dev-device, port) == 
 dev_ll) {
 -   if 
 (rdma_node_get_transport(cma_dev-device-node_type) == RDMA_TRANSPORT_IB 
 -   rdma_port_get_link_layer(cma_dev-device, 
 port) == IB_LINK_LAYER_ETHERNET)
 -   ret = 
 ib_find_cached_gid(cma_dev-device, iboe_gid, found_port, NULL);
 -   else
 -   ret = 
 ib_find_cached_gid(cma_dev-device, gid, found_port, NULL);
 -
 -   if (!ret  (port == found_port)) {
 -   id_priv-id.port_num = found_port;
 -   goto out;
 -   }
 +
 +   gidp = rdma_tech_iboe(cma_dev-device, port) ?
 +  iboe_gid : gid;
 +
 +   ret = cma_validate_port(cma_dev-device, port, gidp,
 +   dev_addr-dev_type);
 +   if (!ret) {
 +   id_priv-id.port_num = port;
 +   goto out;
 }
 }
 }

 

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http

Re: [PATCH v4 27/27] IB/Verbs: Cleanup rdma_node_get_transport()

2015-04-16 Thread Hal Rosenstock
On 4/16/2015 9:41 AM, Michael Wang wrote:
 
 
 On 04/16/2015 03:36 PM, Hal Rosenstock wrote:
 [snip]
 -EXPORT_SYMBOL(rdma_node_get_transport);
 -
  enum rdma_link_layer rdma_port_get_link_layer(struct ib_device *device, u8 
 port_num)
  {
 if (device-get_link_layer)
 diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
 index 262bf44..f9ef479 100644
 --- a/include/rdma/ib_verbs.h
 +++ b/include/rdma/ib_verbs.h
 @@ -84,9 +84,6 @@ enum rdma_transport_type {
 RDMA_TRANSPORT_IBOE,
  };
  
 -__attribute_const__ enum rdma_transport_type
 -rdma_node_get_transport(enum rdma_node_type node_type);
 -
  enum rdma_link_layer {
 IB_LINK_LAYER_UNSPECIFIED,

 Is IB_LINK_LAYER_UNSPECIFIED still possible ?
 
 Actually it's impossible in kernel at first, all those who implemented the 
 callback
 won't return UNSPECIFIED, others all have the correct transport type 
 (otherwise BUG())
 and won't result UNSPECIFIED :-)

Should it be removed from this enum somewhere in this patch series
(perhaps early on) ?

-- Hal

 Regards,
 Michael Wang
 

 IB_LINK_LAYER_INFINIBAND,
 

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v4 14/27] IB/Verbs: Reform cma_acquire_dev()

2015-04-16 Thread Hal Rosenstock
On 4/16/2015 4:09 AM, Michael Wang wrote:
 
 Reform cma_acquire_dev() with management helpers, introduce
 cma_validate_port() to make the code more clean.
 
 Cc: Steve Wise sw...@opengridcomputing.com
 Cc: Tom Talpey t...@talpey.com
 Cc: Jason Gunthorpe jguntho...@obsidianresearch.com
 Cc: Doug Ledford dledf...@redhat.com
 Cc: Ira Weiny ira.we...@intel.com
 Cc: Sean Hefty sean.he...@intel.com
 Signed-off-by: Michael Wang yun.w...@profitbricks.com
 ---
  drivers/infiniband/core/cma.c | 69 
 +--
  1 file changed, 41 insertions(+), 28 deletions(-)
 
 diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
 index b520882..902cc1a 100644
 --- a/drivers/infiniband/core/cma.c
 +++ b/drivers/infiniband/core/cma.c
 @@ -370,18 +370,36 @@ static int cma_translate_addr(struct sockaddr *addr, 
 struct rdma_dev_addr *dev_a
   return ret;
  }
  
 +static inline int cma_validate_port(struct ib_device *device, u8 port,
 +   union ib_gid *gid, int dev_type)
 +{
 + u8 found_port;
 + int ret = -ENODEV;
 +
 + if ((dev_type == ARPHRD_INFINIBAND)  !rdma_tech_ib(device, port))
 + return ret;
 +
 + if ((dev_type != ARPHRD_INFINIBAND)  rdma_tech_ib(device, port))
 + return ret;
 +
 + ret = ib_find_cached_gid(device, gid, found_port, NULL);
 +
 + if (!ret  (port == found_port))
 + return 0;
 +
 + return ret;

Should the case where ret = 0 and port != found_port need to be handled
the same as currently ? It looks different to me since in this case the
port will be saved into id_priv-id.port_num whereas currently it isn't.

-- Hal

 +}
 +
  static int cma_acquire_dev(struct rdma_id_private *id_priv,
  struct rdma_id_private *listen_id_priv)
  {
   struct rdma_dev_addr *dev_addr = id_priv-id.route.addr.dev_addr;
   struct cma_device *cma_dev;
 - union ib_gid gid, iboe_gid;
 + union ib_gid gid, iboe_gid, *gidp;
   int ret = -ENODEV;
 - u8 port, found_port;
 - enum rdma_link_layer dev_ll = dev_addr-dev_type == ARPHRD_INFINIBAND ?
 - IB_LINK_LAYER_INFINIBAND : IB_LINK_LAYER_ETHERNET;
 + u8 port;
  
 - if (dev_ll != IB_LINK_LAYER_INFINIBAND 
 + if (dev_addr-dev_type != ARPHRD_INFINIBAND 
   id_priv-id.ps == RDMA_PS_IPOIB)
   return -EINVAL;
  
 @@ -391,41 +409,36 @@ static int cma_acquire_dev(struct rdma_id_private 
 *id_priv,
  
   memcpy(gid, dev_addr-src_dev_addr +
  rdma_addr_gid_offset(dev_addr), sizeof gid);
 - if (listen_id_priv 
 - rdma_port_get_link_layer(listen_id_priv-id.device,
 -  listen_id_priv-id.port_num) == dev_ll) {
 +
 + if (listen_id_priv) {
   cma_dev = listen_id_priv-cma_dev;
   port = listen_id_priv-id.port_num;
 - if (rdma_node_get_transport(cma_dev-device-node_type) == 
 RDMA_TRANSPORT_IB 
 - rdma_port_get_link_layer(cma_dev-device, port) == 
 IB_LINK_LAYER_ETHERNET)
 - ret = ib_find_cached_gid(cma_dev-device, iboe_gid,
 -  found_port, NULL);
 - else
 - ret = ib_find_cached_gid(cma_dev-device, gid,
 -  found_port, NULL);
 + gidp = rdma_tech_iboe(cma_dev-device, port) ?
 +iboe_gid : gid;
  
 - if (!ret  (port  == found_port)) {
 - id_priv-id.port_num = found_port;
 + ret = cma_validate_port(cma_dev-device, port, gidp,
 + dev_addr-dev_type);
 + if (!ret) {
 + id_priv-id.port_num = port;
   goto out;
   }
   }
 +
   list_for_each_entry(cma_dev, dev_list, list) {
   for (port = 1; port = cma_dev-device-phys_port_cnt; ++port) {
   if (listen_id_priv 
   listen_id_priv-cma_dev == cma_dev 
   listen_id_priv-id.port_num == port)
   continue;
 - if (rdma_port_get_link_layer(cma_dev-device, port) == 
 dev_ll) {
 - if 
 (rdma_node_get_transport(cma_dev-device-node_type) == RDMA_TRANSPORT_IB 
 - rdma_port_get_link_layer(cma_dev-device, 
 port) == IB_LINK_LAYER_ETHERNET)
 - ret = 
 ib_find_cached_gid(cma_dev-device, iboe_gid, found_port, NULL);
 - else
 - ret = 
 ib_find_cached_gid(cma_dev-device, gid, found_port, NULL);
 -
 - if (!ret  (port == found_port)) {
 - id_priv-id.port_num = found_port;
 - goto out;
 - }
 +
 +   

Re: [PATCH v3 25/28] IB/Verbs: Use management helper cap_af_ib()

2015-04-16 Thread Hal Rosenstock
On 4/16/2015 9:57 AM, Or Gerlitz wrote:
 On Mon, Apr 13, 2015 at 3:35 PM, Michael Wang yun.w...@profitbricks.com 
 wrote:

 Introduce helper cap_af_ib() to help us check if the port of an
 IB device support Native Infiniband Address.

 Cc: Steve Wise sw...@opengridcomputing.com
 Cc: Tom Talpey t...@talpey.com
 Cc: Jason Gunthorpe jguntho...@obsidianresearch.com
 Cc: Doug Ledford dledf...@redhat.com
 Cc: Ira Weiny ira.we...@intel.com
 Cc: Sean Hefty sean.he...@intel.com
 Signed-off-by: Michael Wang yun.w...@profitbricks.com
 ---
  drivers/infiniband/core/cma.c |  2 +-
  include/rdma/ib_verbs.h   | 15 +++
  2 files changed, 16 insertions(+), 1 deletion(-)

 diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
 index 65e41f4..7f5815d 100644
 --- a/drivers/infiniband/core/cma.c
 +++ b/drivers/infiniband/core/cma.c
 @@ -470,7 +470,7 @@ static int cma_resolve_ib_dev(struct rdma_id_private 
 *id_priv)

 list_for_each_entry(cur_dev, dev_list, list) {
 for (p = 1; p = cur_dev-device-phys_port_cnt; ++p) {
 -   if (!rdma_ib_or_iboe(cur_dev-device, p))
 +   if (!cap_af_ib(cur_dev-device, p))
 continue;

 if (ib_find_cached_pkey(cur_dev-device, p, pkey, 
 index))
 diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
 index 29ddd14..dfe33f3 100644
 --- a/include/rdma/ib_verbs.h
 +++ b/include/rdma/ib_verbs.h
 @@ -1879,6 +1879,21 @@ static inline int cap_ipoib(struct ib_device *device, 
 u8 port_num)
  }

  /**
 + * cap_af_ib - Check if the port of device has the capability
 + * Native Infiniband Address.
 + *
 + * @device: Device to be checked
 + * @port_num: Port number of the device
 + *
 + * Return 0 when port of the device don't support
 + * Native Infiniband Address.
 + */
 +static inline int cap_af_ib(struct ib_device *device, u8 port_num)
 +{
 +   return rdma_ib_or_iboe(device, port_num);
 +}
 
 Sean, can you please put a precise writeup what does it take to
 support AF_IB... I am a bit
 confused here and wasn't sure if this can be supported with RoCE.

I think this means IB GID addressing is checked (Native Infiniband
Address) and not AF_IB (which is socket address/protocol family like
INET and INET6).

I think this naming is confusing and maybe cap_ib_gid is better ?

-- Hal

 --
 To unsubscribe from this list: send the line unsubscribe linux-rdma in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v4 10/27] IB/Verbs: Reform cm related part in IB-core cma/ucm

2015-04-16 Thread Hal Rosenstock
On 4/16/2015 4:08 AM, Michael Wang wrote:
 
 Use raw management helpers to reform cm related part in IB-core cma/ucm.
 
 These checks focus on the device cm type rather than the port capability,
 directly pass port 1 works currently, but can't support mixing cm type
 device in future.

This is equivalent to today where the checks are per node rather than
per port.

Should all checks here be port 1 based or only certain ones like listen
? For example, in connect/reject/disconnect, don't we already have port
? Guess this can be dealt with later as this is not a regression from
the current implementation.

-- Hal

 
 Cc: Steve Wise sw...@opengridcomputing.com
 Cc: Tom Talpey t...@talpey.com
 Cc: Jason Gunthorpe jguntho...@obsidianresearch.com
 Cc: Doug Ledford dledf...@redhat.com
 Cc: Ira Weiny ira.we...@intel.com
 Cc: Sean Hefty sean.he...@intel.com
 Signed-off-by: Michael Wang yun.w...@profitbricks.com
 ---
  drivers/infiniband/core/cma.c | 81 
 +--
  drivers/infiniband/core/ucm.c |  3 +-
  2 files changed, 26 insertions(+), 58 deletions(-)
 
 diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
 index d570030..6b8a64d 100644
 --- a/drivers/infiniband/core/cma.c
 +++ b/drivers/infiniband/core/cma.c
 @@ -735,8 +735,7 @@ int rdma_init_qp_attr(struct rdma_cm_id *id, struct 
 ib_qp_attr *qp_attr,
   int ret = 0;
  
   id_priv = container_of(id, struct rdma_id_private, id);
 - switch (rdma_node_get_transport(id_priv-id.device-node_type)) {
 - case RDMA_TRANSPORT_IB:
 + if (rdma_ib_or_iboe(id_priv-id.device, 1)) {
   if (!id_priv-cm_id.ib || (id_priv-id.qp_type == IB_QPT_UD))
   ret = cma_ib_init_qp_attr(id_priv, qp_attr, 
 qp_attr_mask);
   else
 @@ -745,19 +744,15 @@ int rdma_init_qp_attr(struct rdma_cm_id *id, struct 
 ib_qp_attr *qp_attr,
  
   if (qp_attr-qp_state == IB_QPS_RTR)
   qp_attr-rq_psn = id_priv-seq_num;
 - break;
 - case RDMA_TRANSPORT_IWARP:
 + } else if (rdma_tech_iwarp(id_priv-id.device, 1)) {
   if (!id_priv-cm_id.iw) {
   qp_attr-qp_access_flags = 0;
   *qp_attr_mask = IB_QP_STATE | IB_QP_ACCESS_FLAGS;
   } else
   ret = iw_cm_init_qp_attr(id_priv-cm_id.iw, qp_attr,
qp_attr_mask);
 - break;
 - default:
 + } else
   ret = -ENOSYS;
 - break;
 - }
  
   return ret;
  }
 @@ -1037,17 +1032,12 @@ void rdma_destroy_id(struct rdma_cm_id *id)
   mutex_unlock(id_priv-handler_mutex);
  
   if (id_priv-cma_dev) {
 - switch (rdma_node_get_transport(id_priv-id.device-node_type)) 
 {
 - case RDMA_TRANSPORT_IB:
 + if (rdma_ib_or_iboe(id_priv-id.device, 1)) {
   if (id_priv-cm_id.ib)
   ib_destroy_cm_id(id_priv-cm_id.ib);
 - break;
 - case RDMA_TRANSPORT_IWARP:
 + } else if (rdma_tech_iwarp(id_priv-id.device, 1)) {
   if (id_priv-cm_id.iw)
   iw_destroy_cm_id(id_priv-cm_id.iw);
 - break;
 - default:
 - break;
   }
   cma_leave_mc_groups(id_priv);
   cma_release_dev(id_priv);
 @@ -1626,7 +1616,7 @@ static void cma_listen_on_dev(struct rdma_id_private 
 *id_priv,
   int ret;
  
   if (cma_family(id_priv) == AF_IB 
 - rdma_node_get_transport(cma_dev-device-node_type) != 
 RDMA_TRANSPORT_IB)
 + !rdma_ib_or_iboe(cma_dev-device, 1))
   return;
  
   id = rdma_create_id(cma_listen_handler, id_priv, id_priv-id.ps,
 @@ -2028,7 +2018,7 @@ static int cma_bind_loopback(struct rdma_id_private 
 *id_priv)
   mutex_lock(lock);
   list_for_each_entry(cur_dev, dev_list, list) {
   if (cma_family(id_priv) == AF_IB 
 - rdma_node_get_transport(cur_dev-device-node_type) != 
 RDMA_TRANSPORT_IB)
 + !rdma_ib_or_iboe(cur_dev-device, 1))
   continue;
  
   if (!cma_dev)
 @@ -2060,7 +2050,7 @@ port_found:
   goto out;
  
   id_priv-id.route.addr.dev_addr.dev_type =
 - (rdma_port_get_link_layer(cma_dev-device, p) == 
 IB_LINK_LAYER_INFINIBAND) ?
 + (rdma_tech_ib(cma_dev-device, p)) ?
   ARPHRD_INFINIBAND : ARPHRD_ETHER;
  
   rdma_addr_set_sgid(id_priv-id.route.addr.dev_addr, gid);
 @@ -2537,18 +2527,15 @@ int rdma_listen(struct rdma_cm_id *id, int backlog)
  
   id_priv-backlog = backlog;
   if (id-device) {
 - switch (rdma_node_get_transport(id-device-node_type)) {
 - case RDMA_TRANSPORT_IB:
 + if (rdma_ib_or_iboe(id-device, 1)) {
   ret = cma_ib_listen(id_priv);
 

Re: [PATCH v4 20/27] IB/Verbs: Use management helper cap_ib_sa()

2015-04-16 Thread Hal Rosenstock
On 4/16/2015 4:12 AM, Michael Wang wrote:
 
 Introduce helper cap_ib_sa() to help us check if the port of an
 IB device support Infiniband Subnet Administrator.

Nit: Administrator - Administration

 
 Cc: Steve Wise sw...@opengridcomputing.com
 Cc: Tom Talpey t...@talpey.com
 Cc: Jason Gunthorpe jguntho...@obsidianresearch.com
 Cc: Doug Ledford dledf...@redhat.com
 Cc: Ira Weiny ira.we...@intel.com
 Cc: Sean Hefty sean.he...@intel.com
 Signed-off-by: Michael Wang yun.w...@profitbricks.com
 ---
  drivers/infiniband/core/cma.c  |  4 ++--
  drivers/infiniband/core/sa_query.c | 10 +-
  drivers/infiniband/core/ucma.c |  2 +-
  include/rdma/ib_verbs.h| 15 +++
  4 files changed, 23 insertions(+), 8 deletions(-)
 
 diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
 index a6f1526..094816f 100644
 --- a/drivers/infiniband/core/cma.c
 +++ b/drivers/infiniband/core/cma.c
 @@ -955,7 +955,7 @@ static inline int cma_user_data_offset(struct 
 rdma_id_private *id_priv)
  
  static void cma_cancel_route(struct rdma_id_private *id_priv)
  {
 - if (rdma_tech_ib(id_priv-id.device, id_priv-id.port_num)) {
 + if (cap_ib_sa(id_priv-id.device, id_priv-id.port_num)) {
   if (id_priv-query)
   ib_sa_cancel_query(id_priv-query_id, id_priv-query);
   }
 @@ -1979,7 +1979,7 @@ int rdma_resolve_route(struct rdma_cm_id *id, int 
 timeout_ms)
   return -EINVAL;
  
   atomic_inc(id_priv-refcount);
 - if (rdma_tech_ib(id-device, id-port_num))
 + if (cap_ib_sa(id-device, id-port_num))
   ret = cma_resolve_ib_route(id_priv, timeout_ms);
   else if (rdma_tech_iboe(id-device, id-port_num))
   ret = cma_resolve_iboe_route(id_priv);
 diff --git a/drivers/infiniband/core/sa_query.c 
 b/drivers/infiniband/core/sa_query.c
 index 803ccf7..fc7e161 100644
 --- a/drivers/infiniband/core/sa_query.c
 +++ b/drivers/infiniband/core/sa_query.c
 @@ -450,7 +450,7 @@ static void ib_sa_event(struct ib_event_handler *handler, 
 struct ib_event *event
   struct ib_sa_port *port =
   sa_dev-port[event-element.port_num - 
 sa_dev-start_port];
  
 - if (WARN_ON(!rdma_tech_ib(handler-device, port-port_num)))
 + if (WARN_ON(!cap_ib_sa(handler-device, port-port_num)))
   return;
  
   spin_lock_irqsave(port-ah_lock, flags);
 @@ -1173,7 +1173,7 @@ static void ib_sa_add_one(struct ib_device *device)
  
   for (i = 0; i = e - s; ++i) {
   spin_lock_init(sa_dev-port[i].ah_lock);
 - if (!rdma_tech_ib(device, i + 1))
 + if (!cap_ib_sa(device, i + 1))
   continue;
  
   sa_dev-port[i].sm_ah= NULL;
 @@ -1210,7 +1210,7 @@ static void ib_sa_add_one(struct ib_device *device)
   goto err;
  
   for (i = 0; i = e - s; ++i) {
 - if (rdma_tech_ib(device, i + 1))
 + if (cap_ib_sa(device, i + 1))
   update_sm_ah(sa_dev-port[i].update_task);
   }
  
 @@ -1218,7 +1218,7 @@ static void ib_sa_add_one(struct ib_device *device)
  
  err:
   while (--i = 0) {
 - if (rdma_tech_ib(device, i + 1))
 + if (cap_ib_sa(device, i + 1))
   ib_unregister_mad_agent(sa_dev-port[i].agent);
   }
  
 @@ -1240,7 +1240,7 @@ static void ib_sa_remove_one(struct ib_device *device)
   flush_workqueue(ib_wq);
  
   for (i = 0; i = sa_dev-end_port - sa_dev-start_port; ++i) {
 - if (rdma_tech_ib(device, i + 1)) {
 + if (cap_ib_sa(device, i + 1)) {
   ib_unregister_mad_agent(sa_dev-port[i].agent);
   if (sa_dev-port[i].sm_ah)
   kref_put(sa_dev-port[i].sm_ah-ref, 
 free_sm_ah);
 diff --git a/drivers/infiniband/core/ucma.c b/drivers/infiniband/core/ucma.c
 index 7331c6c..bed7957 100644
 --- a/drivers/infiniband/core/ucma.c
 +++ b/drivers/infiniband/core/ucma.c
 @@ -723,7 +723,7 @@ static ssize_t ucma_query_route(struct ucma_file *file,
   resp.node_guid = (__force __u64) ctx-cm_id-device-node_guid;
   resp.port_num = ctx-cm_id-port_num;
  
 - if (rdma_tech_ib(ctx-cm_id-device, ctx-cm_id-port_num))
 + if (cap_ib_sa(ctx-cm_id-device, ctx-cm_id-port_num))
   ucma_copy_ib_route(resp, ctx-cm_id-route);
   else if (rdma_tech_iboe(ctx-cm_id-device, ctx-cm_id-port_num))
   ucma_copy_iboe_route(resp, ctx-cm_id-route);
 diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
 index e4999f6..3bfdf81 100644
 --- a/include/rdma/ib_verbs.h
 +++ b/include/rdma/ib_verbs.h
 @@ -1833,6 +1833,21 @@ static inline int cap_iw_cm(struct ib_device *device, 
 u8 port_num)
   return rdma_tech_iwarp(device, port_num);
  }
  
 +/**
 + * cap_ib_sa - Check if the port of device has the capability Infiniband
 + * Subnet Administrator.

Nit: Administrator - 

Re: [PATCH v4 27/27] IB/Verbs: Cleanup rdma_node_get_transport()

2015-04-16 Thread Hal Rosenstock
On 4/16/2015 4:15 AM, Michael Wang wrote:
 
 We have get rid of all the scene using legacy rdma_node_get_transport(),
 now clean it up.
 
 Cc: Steve Wise sw...@opengridcomputing.com
 Cc: Tom Talpey t...@talpey.com
 Cc: Jason Gunthorpe jguntho...@obsidianresearch.com
 Cc: Doug Ledford dledf...@redhat.com
 Cc: Ira Weiny ira.we...@intel.com
 Cc: Sean Hefty sean.he...@intel.com
 Signed-off-by: Michael Wang yun.w...@profitbricks.com
 ---
  drivers/infiniband/core/verbs.c | 21 -
  include/rdma/ib_verbs.h |  3 ---
  2 files changed, 24 deletions(-)
 
 diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
 index ee4b5cb..bbea0c0 100644
 --- a/drivers/infiniband/core/verbs.c
 +++ b/drivers/infiniband/core/verbs.c
 @@ -107,27 +107,6 @@ __attribute_const__ int ib_rate_to_mbps(enum ib_rate 
 rate)
  }
  EXPORT_SYMBOL(ib_rate_to_mbps);
  
 -__attribute_const__ enum rdma_transport_type
 -rdma_node_get_transport(enum rdma_node_type node_type)
 -{
 - switch (node_type) {
 - case RDMA_NODE_IB_CA:
 - case RDMA_NODE_IB_SWITCH:
 - case RDMA_NODE_IB_ROUTER:
 - return RDMA_TRANSPORT_IB;
 - case RDMA_NODE_RNIC:
 - return RDMA_TRANSPORT_IWARP;
 - case RDMA_NODE_USNIC:
 - return RDMA_TRANSPORT_USNIC;
 - case RDMA_NODE_USNIC_UDP:
 - return RDMA_TRANSPORT_USNIC_UDP;
 - default:
 - BUG();
 - return 0;
 - }
 -}
 -EXPORT_SYMBOL(rdma_node_get_transport);
 -
  enum rdma_link_layer rdma_port_get_link_layer(struct ib_device *device, u8 
 port_num)
  {
   if (device-get_link_layer)
 diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
 index 262bf44..f9ef479 100644
 --- a/include/rdma/ib_verbs.h
 +++ b/include/rdma/ib_verbs.h
 @@ -84,9 +84,6 @@ enum rdma_transport_type {
   RDMA_TRANSPORT_IBOE,
  };
  
 -__attribute_const__ enum rdma_transport_type
 -rdma_node_get_transport(enum rdma_node_type node_type);
 -
  enum rdma_link_layer {
   IB_LINK_LAYER_UNSPECIFIED,

Is IB_LINK_LAYER_UNSPECIFIED still possible ?

   IB_LINK_LAYER_INFINIBAND,
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v4 10/27] IB/Verbs: Reform cm related part in IB-core cma/ucm

2015-04-16 Thread Hal Rosenstock
On 4/16/2015 11:58 AM, Jason Gunthorpe wrote:
 It also looks like hardwired 1 won't work on switch ports, so it is no-go.

AFAIK enhanced switch port 0 is not supported by CM/RDMA CM in the
current code. There is no need for CM/RDMA CM on base switch port 0.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v3 01/28] IB/Verbs: Implement new callback query_transport()

2015-04-15 Thread Hal Rosenstock
On 4/13/2015 8:22 AM, Michael Wang wrote:
> diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
> index 65994a1..d54f91e 100644
> --- a/include/rdma/ib_verbs.h
> +++ b/include/rdma/ib_verbs.h
> @@ -75,10 +75,13 @@ enum rdma_node_type {
>  };
>  
>  enum rdma_transport_type {
> + /* legacy for users */
>   RDMA_TRANSPORT_IB,
>   RDMA_TRANSPORT_IWARP,
>   RDMA_TRANSPORT_USNIC,
> - RDMA_TRANSPORT_USNIC_UDP
> + RDMA_TRANSPORT_USNIC_UDP,
> + /* new transport */
> + RDMA_TRANSPORT_IBOE,
>  };
>  
>  __attribute_const__ enum rdma_transport_type
> @@ -1501,6 +1504,8 @@ struct ib_device {
>   int(*query_port)(struct ib_device *device,
>u8 port_num,
>struct ib_port_attr 
> *port_attr);
> + enum rdma_transport_type   (*query_transport)(struct ib_device *device,
> +   u8 port_num);
>   enum rdma_link_layer   (*get_link_layer)(struct ib_device *device,
>u8 port_num);
>   int(*query_gid)(struct ib_device *device,

libibverbs also exposes transport at the device level. Isn't a change to
make transport per port rather than per device needed there as well to
be consistent with these proposed kernel changes ? If so, would the
additional IBoE transport be exposed ? We also need to worry about
backward compatibility for existing applications.

-- Hal
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v3 10/28] IB/Verbs: Reform cm related part in IB-core cma

2015-04-15 Thread Hal Rosenstock
On 4/13/2015 3:50 PM, Jason Gunthorpe wrote:
> Less clear is how rocee vs ib work within a device... Can you APM
> between those two kinds of ports?

The specs allow this to work but AFAIK it's not implemented.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v3 01/28] IB/Verbs: Implement new callback query_transport()

2015-04-15 Thread Hal Rosenstock
On 4/13/2015 8:22 AM, Michael Wang wrote:
 diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
 index 65994a1..d54f91e 100644
 --- a/include/rdma/ib_verbs.h
 +++ b/include/rdma/ib_verbs.h
 @@ -75,10 +75,13 @@ enum rdma_node_type {
  };
  
  enum rdma_transport_type {
 + /* legacy for users */
   RDMA_TRANSPORT_IB,
   RDMA_TRANSPORT_IWARP,
   RDMA_TRANSPORT_USNIC,
 - RDMA_TRANSPORT_USNIC_UDP
 + RDMA_TRANSPORT_USNIC_UDP,
 + /* new transport */
 + RDMA_TRANSPORT_IBOE,
  };
  
  __attribute_const__ enum rdma_transport_type
 @@ -1501,6 +1504,8 @@ struct ib_device {
   int(*query_port)(struct ib_device *device,
u8 port_num,
struct ib_port_attr 
 *port_attr);
 + enum rdma_transport_type   (*query_transport)(struct ib_device *device,
 +   u8 port_num);
   enum rdma_link_layer   (*get_link_layer)(struct ib_device *device,
u8 port_num);
   int(*query_gid)(struct ib_device *device,

libibverbs also exposes transport at the device level. Isn't a change to
make transport per port rather than per device needed there as well to
be consistent with these proposed kernel changes ? If so, would the
additional IBoE transport be exposed ? We also need to worry about
backward compatibility for existing applications.

-- Hal
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v3 10/28] IB/Verbs: Reform cm related part in IB-core cma

2015-04-15 Thread Hal Rosenstock
On 4/13/2015 3:50 PM, Jason Gunthorpe wrote:
 Less clear is how rocee vs ib work within a device... Can you APM
 between those two kinds of ports?

The specs allow this to work but AFAIK it's not implemented.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [ofa-general] InfiniBand/RDMA merge plans for 2.6.25

2008-01-18 Thread Hal Rosenstock
On Fri, 2008-01-18 at 09:42 -0800, Sean Hefty wrote:
> >> Sean Hefty (6):
> >>   IB/mad: Fix incorrect access to items on local_list
> >
> >It wasn't clear to me that this issue was ever really nailed. Was the
> >loop on this closed ?
> 
> The error that this patches addresses is fairly obvious if you inspect the 
> code.

The bug seems obvious but I'm not sure about the fix. Just my $0.02
worth.

-- Hal

> There's a strong chance that the patch fixes the bug that was reported, but 
> the
> last I remember, they had trouble reproducing the crash to see if the patch
> would indeed make it go away.
> 
> - Sean
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [ofa-general] InfiniBand/RDMA merge plans for 2.6.25

2008-01-18 Thread Hal Rosenstock
On Thu, 2008-01-17 at 16:11 -0800, Roland Dreier wrote:
> Here all the patches I already have in my for-2.6.25 branch:
> Sean Hefty (6):
>   IB/mad: Fix incorrect access to items on local_list

It wasn't clear to me that this issue was ever really nailed. Was the
loop on this closed ?

-- Hal

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [ofa-general] InfiniBand/RDMA merge plans for 2.6.25

2008-01-18 Thread Hal Rosenstock
On Fri, 2008-01-18 at 09:42 -0800, Sean Hefty wrote:
  Sean Hefty (6):
IB/mad: Fix incorrect access to items on local_list
 
 It wasn't clear to me that this issue was ever really nailed. Was the
 loop on this closed ?
 
 The error that this patches addresses is fairly obvious if you inspect the 
 code.

The bug seems obvious but I'm not sure about the fix. Just my $0.02
worth.

-- Hal

 There's a strong chance that the patch fixes the bug that was reported, but 
 the
 last I remember, they had trouble reproducing the crash to see if the patch
 would indeed make it go away.
 
 - Sean
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [ofa-general] InfiniBand/RDMA merge plans for 2.6.25

2008-01-18 Thread Hal Rosenstock
On Thu, 2008-01-17 at 16:11 -0800, Roland Dreier wrote:
 Here all the patches I already have in my for-2.6.25 branch:
 Sean Hefty (6):
   IB/mad: Fix incorrect access to items on local_list

It wasn't clear to me that this issue was ever really nailed. Was the
loop on this closed ?

-- Hal

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [ofa-general] Further 2.6.23 merge plans...

2007-07-12 Thread Hal Rosenstock
On Thu, 2007-07-12 at 19:15, Roland Dreier wrote:
> As you can see, I just sent my first 2.6.23 pull request for Linus.
> There are still a few more things I plan to do in before the merge
> window closes (in ~10 days):
> 
>  - Write a patch to add P_Key handling to user_mad in the way we
>discussed (add an ioctl to enable P_Key mode without breaking old
>apps) -- I hope to do this tomorrow so we can get some review and
>testing before merging it.

Unfortunately, I'll mostly just be able to review it. Not sure how much
testing I will be able to do but we'll see...

-- Hal

>  - Take a look at Sean's local SA caching patches.  I merged
>everything else from Sean's tree, but I'm still undecided about
>these.  I haven't read them carefully yet, but even aside from that
>I don't have a good feeling about whether there's consensus about
>this yet.  Any opinions about merging, for or against, would be
>appreciated here.
> 
>  - Merge up pending hardware driver changes, including the cxgb3 and
>ehca patches I have in my queue, plus Jack's catastrophic error
>patch for mlx4.
> 
>  - Try to get to resolution on the IPoIB "CM without SRQ" solution.
> 
> Also, if there's something I didn't list and didn't already include in
> the tree I asked Linus to pull, please remind me.  I probably dropped it.
> 
>  - R.
> ___
> general mailing list
> [EMAIL PROTECTED]
> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
> 
> To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [ofa-general] Further 2.6.23 merge plans...

2007-07-12 Thread Hal Rosenstock
On Thu, 2007-07-12 at 19:15, Roland Dreier wrote:
 As you can see, I just sent my first 2.6.23 pull request for Linus.
 There are still a few more things I plan to do in before the merge
 window closes (in ~10 days):
 
  - Write a patch to add P_Key handling to user_mad in the way we
discussed (add an ioctl to enable P_Key mode without breaking old
apps) -- I hope to do this tomorrow so we can get some review and
testing before merging it.

Unfortunately, I'll mostly just be able to review it. Not sure how much
testing I will be able to do but we'll see...

-- Hal

  - Take a look at Sean's local SA caching patches.  I merged
everything else from Sean's tree, but I'm still undecided about
these.  I haven't read them carefully yet, but even aside from that
I don't have a good feeling about whether there's consensus about
this yet.  Any opinions about merging, for or against, would be
appreciated here.
 
  - Merge up pending hardware driver changes, including the cxgb3 and
ehca patches I have in my queue, plus Jack's catastrophic error
patch for mlx4.
 
  - Try to get to resolution on the IPoIB CM without SRQ solution.
 
 Also, if there's something I didn't list and didn't already include in
 the tree I asked Linus to pull, please remind me.  I probably dropped it.
 
  - R.
 ___
 general mailing list
 [EMAIL PROTECTED]
 http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
 
 To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [ofa-general] [PATCH] eHCA: Add "Modify Port" verb

2007-04-23 Thread Hal Rosenstock
Hi Joachim,

On Mon, 2007-04-23 at 12:23, Joachim Fenkes wrote:
> Add "Modify Port" verb support to eHCA driver.
> ib_cm needs this to initialize properly.
> 
> 
> Signed-off-by: Joachim Fenkes <[EMAIL PROTECTED]>
> ---
> 
>  ehca_hca.c |   48 ++--
>  hcp_if.c   |   24 
>  hcp_if.h   |4 
>  3 files changed, 74 insertions(+), 2 deletions(-)
> 
> diff -urp a/drivers/infiniband/hw/ehca/ehca_hca.c 
> b/drivers/infiniband/hw/ehca/ehca_hca.c
> --- a/drivers/infiniband/hw/ehca/ehca_hca.c   2007-02-04 19:44:54.0 
> +0100
> +++ b/drivers/infiniband/hw/ehca/ehca_hca.c   2007-04-23 18:09:38.0 
> +0200
> @@ -147,6 +147,7 @@ int ehca_query_port(struct ib_device *ib
>   break;
>   }
>  
> + props->port_cap_flags  = rblock->capability_mask;
>   props->gid_tbl_len = rblock->gid_tbl_len;
>   props->max_msg_sz  = rblock->max_msg_sz;
>   props->bad_pkey_cntr   = rblock->bad_pkey_cntr;
> @@ -233,10 +234,53 @@ query_gid1:
>   return ret;
>  }
>  
> +const u32 allowed_port_caps = (
> + IB_PORT_SM | IB_PORT_LED_INFO_SUP | IB_PORT_CM_SUP |
> + IB_PORT_SNMP_TUNNEL_SUP | IB_PORT_DEVICE_MGMT_SUP |
> + IB_PORT_VENDOR_CLASS_SUP);

I didn't think IB_PORT_SM was allowed (as QP0 is not exposed) or does
this just fail later when it is attempted to be actually set ?

-- Hal

> +
>  int ehca_modify_port(struct ib_device *ibdev,
>u8 port, int port_modify_mask,
>struct ib_port_modify *props)
>  {
> - /* Not implemented yet */
> - return -EFAULT;
> + int ret = 0;
> + struct ehca_shca *shca = container_of(ibdev, struct ehca_shca, 
> ib_device);
> + struct hipz_query_port *rblock;
> + u32 cap;
> + u64 hret;
> +
> + if ((props->set_port_cap_mask | props->clr_port_cap_mask)
> + & ~allowed_port_caps) {
> + ehca_err(>ib_device, "Non-changeable bits set in masks  "
> +  "set=%x  clr=%x  allowed=%x", props->set_port_cap_mask,
> +  props->clr_port_cap_mask, allowed_port_caps);
> + return -EINVAL;
> + }
> +
> + rblock = ehca_alloc_fw_ctrlblock(GFP_KERNEL);
> + if (!rblock) {
> + ehca_err(>ib_device,  "Can't allocate rblock memory.");
> + return -ENOMEM;
> + }
> +
> + if (hipz_h_query_port(shca->ipz_hca_handle, port, rblock) != H_SUCCESS) 
> {
> + ehca_err(>ib_device, "Can't query port properties");
> + ret = -EINVAL;
> + goto modify_port1;
> + }
> +
> + cap = (rblock->capability_mask | props->set_port_cap_mask)
> + & ~props->clr_port_cap_mask;
> +
> + hret = hipz_h_modify_port(shca->ipz_hca_handle, port,
> +   cap, props->init_type, port_modify_mask);
> + if (hret != H_SUCCESS) {
> + ehca_err(>ib_device, "Modify port failed  hret=%lx", 
> hret);
> + ret = -EINVAL;
> + }
> +
> +modify_port1:
> + ehca_free_fw_ctrlblock(rblock);
> +
> + return ret;
>  }
> diff -urp a/drivers/infiniband/hw/ehca/hcp_if.c 
> b/drivers/infiniband/hw/ehca/hcp_if.c
> --- a/drivers/infiniband/hw/ehca/hcp_if.c 2007-02-04 19:44:54.0 
> +0100
> +++ b/drivers/infiniband/hw/ehca/hcp_if.c 2007-04-23 18:06:09.0 
> +0200
> @@ -70,6 +70,10 @@
>  #define H_ALL_RES_QP_SQUEUE_SIZE_PAGES  EHCA_BMASK_IBM(0, 31)
>  #define H_ALL_RES_QP_RQUEUE_SIZE_PAGES  EHCA_BMASK_IBM(32, 63)
>  
> +#define H_MP_INIT_TYPE  EHCA_BMASK_IBM(44, 47)
> +#define H_MP_SHUTDOWN   EHCA_BMASK_IBM(48, 48)
> +#define H_MP_RESET_QKEY_CTR EHCA_BMASK_IBM(49, 49)
> +
>  /* direct access qp controls */
>  #define DAQP_CTRL_ENABLE0x01
>  #define DAQP_CTRL_SEND_COMP 0x20
> @@ -364,6 +368,26 @@ u64 hipz_h_query_port(const struct ipz_a
>   return ret;
>  }
>  
> +u64 hipz_h_modify_port(const struct ipz_adapter_handle adapter_handle,
> +const u8 port_id, const u32 port_cap,
> +const u8 init_type, const int modify_mask)
> +{
> + u64 port_attributes = port_cap;
> +
> + if (modify_mask & IB_PORT_SHUTDOWN)
> + port_attributes |= EHCA_BMASK_SET(H_MP_SHUTDOWN, 1);
> + if (modify_mask & IB_PORT_INIT_TYPE)
> + port_attributes |= EHCA_BMASK_SET(H_MP_INIT_TYPE, init_type);
> + if (modify_mask & IB_PORT_RESET_QKEY_CNTR)
> + port_attributes |= EHCA_BMASK_SET(H_MP_RESET_QKEY_CTR, 1);
> +
> + return ehca_plpar_hcall_norets(H_MODIFY_PORT,
> +adapter_handle.handle, /* r4 */
> +port_id,   /* r5 */
> +port_attributes,   /* r6 */
> +0, 0, 0, 0);
> +}
> +
>  u64 hipz_h_query_hca(const struct ipz_adapter_handle adapter_handle,
>struct hipz_query_hca 

Re: [ofa-general] [PATCH] eHCA: Add Modify Port verb

2007-04-23 Thread Hal Rosenstock
Hi Joachim,

On Mon, 2007-04-23 at 12:23, Joachim Fenkes wrote:
 Add Modify Port verb support to eHCA driver.
 ib_cm needs this to initialize properly.
 
 
 Signed-off-by: Joachim Fenkes [EMAIL PROTECTED]
 ---
 
  ehca_hca.c |   48 ++--
  hcp_if.c   |   24 
  hcp_if.h   |4 
  3 files changed, 74 insertions(+), 2 deletions(-)
 
 diff -urp a/drivers/infiniband/hw/ehca/ehca_hca.c 
 b/drivers/infiniband/hw/ehca/ehca_hca.c
 --- a/drivers/infiniband/hw/ehca/ehca_hca.c   2007-02-04 19:44:54.0 
 +0100
 +++ b/drivers/infiniband/hw/ehca/ehca_hca.c   2007-04-23 18:09:38.0 
 +0200
 @@ -147,6 +147,7 @@ int ehca_query_port(struct ib_device *ib
   break;
   }
  
 + props-port_cap_flags  = rblock-capability_mask;
   props-gid_tbl_len = rblock-gid_tbl_len;
   props-max_msg_sz  = rblock-max_msg_sz;
   props-bad_pkey_cntr   = rblock-bad_pkey_cntr;
 @@ -233,10 +234,53 @@ query_gid1:
   return ret;
  }
  
 +const u32 allowed_port_caps = (
 + IB_PORT_SM | IB_PORT_LED_INFO_SUP | IB_PORT_CM_SUP |
 + IB_PORT_SNMP_TUNNEL_SUP | IB_PORT_DEVICE_MGMT_SUP |
 + IB_PORT_VENDOR_CLASS_SUP);

I didn't think IB_PORT_SM was allowed (as QP0 is not exposed) or does
this just fail later when it is attempted to be actually set ?

-- Hal

 +
  int ehca_modify_port(struct ib_device *ibdev,
u8 port, int port_modify_mask,
struct ib_port_modify *props)
  {
 - /* Not implemented yet */
 - return -EFAULT;
 + int ret = 0;
 + struct ehca_shca *shca = container_of(ibdev, struct ehca_shca, 
 ib_device);
 + struct hipz_query_port *rblock;
 + u32 cap;
 + u64 hret;
 +
 + if ((props-set_port_cap_mask | props-clr_port_cap_mask)
 +  ~allowed_port_caps) {
 + ehca_err(shca-ib_device, Non-changeable bits set in masks  
 +  set=%x  clr=%x  allowed=%x, props-set_port_cap_mask,
 +  props-clr_port_cap_mask, allowed_port_caps);
 + return -EINVAL;
 + }
 +
 + rblock = ehca_alloc_fw_ctrlblock(GFP_KERNEL);
 + if (!rblock) {
 + ehca_err(shca-ib_device,  Can't allocate rblock memory.);
 + return -ENOMEM;
 + }
 +
 + if (hipz_h_query_port(shca-ipz_hca_handle, port, rblock) != H_SUCCESS) 
 {
 + ehca_err(shca-ib_device, Can't query port properties);
 + ret = -EINVAL;
 + goto modify_port1;
 + }
 +
 + cap = (rblock-capability_mask | props-set_port_cap_mask)
 +  ~props-clr_port_cap_mask;
 +
 + hret = hipz_h_modify_port(shca-ipz_hca_handle, port,
 +   cap, props-init_type, port_modify_mask);
 + if (hret != H_SUCCESS) {
 + ehca_err(shca-ib_device, Modify port failed  hret=%lx, 
 hret);
 + ret = -EINVAL;
 + }
 +
 +modify_port1:
 + ehca_free_fw_ctrlblock(rblock);
 +
 + return ret;
  }
 diff -urp a/drivers/infiniband/hw/ehca/hcp_if.c 
 b/drivers/infiniband/hw/ehca/hcp_if.c
 --- a/drivers/infiniband/hw/ehca/hcp_if.c 2007-02-04 19:44:54.0 
 +0100
 +++ b/drivers/infiniband/hw/ehca/hcp_if.c 2007-04-23 18:06:09.0 
 +0200
 @@ -70,6 +70,10 @@
  #define H_ALL_RES_QP_SQUEUE_SIZE_PAGES  EHCA_BMASK_IBM(0, 31)
  #define H_ALL_RES_QP_RQUEUE_SIZE_PAGES  EHCA_BMASK_IBM(32, 63)
  
 +#define H_MP_INIT_TYPE  EHCA_BMASK_IBM(44, 47)
 +#define H_MP_SHUTDOWN   EHCA_BMASK_IBM(48, 48)
 +#define H_MP_RESET_QKEY_CTR EHCA_BMASK_IBM(49, 49)
 +
  /* direct access qp controls */
  #define DAQP_CTRL_ENABLE0x01
  #define DAQP_CTRL_SEND_COMP 0x20
 @@ -364,6 +368,26 @@ u64 hipz_h_query_port(const struct ipz_a
   return ret;
  }
  
 +u64 hipz_h_modify_port(const struct ipz_adapter_handle adapter_handle,
 +const u8 port_id, const u32 port_cap,
 +const u8 init_type, const int modify_mask)
 +{
 + u64 port_attributes = port_cap;
 +
 + if (modify_mask  IB_PORT_SHUTDOWN)
 + port_attributes |= EHCA_BMASK_SET(H_MP_SHUTDOWN, 1);
 + if (modify_mask  IB_PORT_INIT_TYPE)
 + port_attributes |= EHCA_BMASK_SET(H_MP_INIT_TYPE, init_type);
 + if (modify_mask  IB_PORT_RESET_QKEY_CNTR)
 + port_attributes |= EHCA_BMASK_SET(H_MP_RESET_QKEY_CTR, 1);
 +
 + return ehca_plpar_hcall_norets(H_MODIFY_PORT,
 +adapter_handle.handle, /* r4 */
 +port_id,   /* r5 */
 +port_attributes,   /* r6 */
 +0, 0, 0, 0);
 +}
 +
  u64 hipz_h_query_hca(const struct ipz_adapter_handle adapter_handle,
struct hipz_query_hca *query_hca_rblock)
  {
 diff -urp a/drivers/infiniband/hw/ehca/hcp_if.h 
 b/drivers/infiniband/hw/ehca/hcp_if.h
 --- 

Re: [PATCH 0/29v2] InfiniBand core update

2005-07-12 Thread Hal Rosenstock
On Tue, 2005-07-12 at 18:38, Andrew Morton wrote:
> OK, well the timing of a merge is mainly up to you guys, especially as the
> subsystem is pretty raw and you're the only people who use it ;)
> 
> Two things from a quick scan:
> 
> a) In many places the patch does
> 
>   if (p)
>   kfree(p);
> 
>But kfree(0) is permitted.  The cleanup police will be after you at
>some stage - it'd be best to fix those things up immediately.

I'll/We'll work on eradicating these and pushing a patch for this
upstream.

> b) The patch exports a ton of symbols to non-GPL modules:
> 
> +EXPORT_SYMBOL(ib_create_ah_from_wc);
> +EXPORT_SYMBOL(ib_modify_mad);
> +EXPORT_SYMBOL(ib_create_send_mad);
> +EXPORT_SYMBOL(ib_free_send_mad);
> +EXPORT_SYMBOL(ib_coalesce_recv_mad);
> +EXPORT_SYMBOL(ib_sa_service_rec_query);
> +EXPORT_SYMBOL(ib_create_cm_id);
> +EXPORT_SYMBOL(ib_destroy_cm_id);
> +EXPORT_SYMBOL(ib_cm_listen);
> +EXPORT_SYMBOL(ib_send_cm_req);
> +EXPORT_SYMBOL(ib_send_cm_rep);
> +EXPORT_SYMBOL(ib_send_cm_rtu);
> +EXPORT_SYMBOL(ib_send_cm_dreq);
> +EXPORT_SYMBOL(ib_send_cm_drep);
> +EXPORT_SYMBOL(ib_send_cm_rej);
> +EXPORT_SYMBOL(ib_send_cm_mra);
> +EXPORT_SYMBOL(ib_send_cm_lap);
> +EXPORT_SYMBOL(ib_send_cm_apr);
> +EXPORT_SYMBOL(ib_send_cm_sidr_req);
> +EXPORT_SYMBOL(ib_send_cm_sidr_rep);
> +EXPORT_SYMBOL(ib_cm_establish);
> +EXPORT_SYMBOL(ib_cm_init_qp_attr);
> 
>Why?

I think that is because OpenIB has a dual GPL/BSD license.

-- Hal

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/29v2] InfiniBand core update

2005-07-12 Thread Hal Rosenstock
On Mon, 2005-07-11 at 23:11, Andrew Morton wrote:
> Well I was asking.  Do you guys think that this material is appropriate to
> and safe enough for 2.6.13?

I used your versions of the patches (Tom's ucm one is needed and you
added that). I also back ported the trailing whitespace elimination
changes.

I just completed a regression test including uCM, CM, RMPP, OpenSM,
IPoIB and it looks good to me.

I'll check this again when the next -mm comes out.

Thanks.

-- Hal

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/29v2] InfiniBand core update

2005-07-12 Thread Hal Rosenstock
On Mon, 2005-07-11 at 23:11, Andrew Morton wrote:
> Well I was asking.  Do you guys think that this material is appropriate to
> and safe enough for 2.6.13?

We think so.

> What are "user CM" and "kernel CM"?

CM is the InfiniBand Communications Manager. It is (primarily)
responsible for setting up and managing connections between endpoints. 
Connections. IB communication occurs over the queue pairs (QPs) set up
by the CM protocol.

kernel CM is CM functionality in the kernel. user CM is to enable
userspace applications to utilize this functionality.

-- Hal

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/29v2] InfiniBand core update

2005-07-12 Thread Hal Rosenstock
On Mon, 2005-07-11 at 23:12, David S. Miller wrote:
> Please acknowledge that you understand how inappropriate
> and problem causing your huge patch bomb was today to this
> mailing list.
> 
> It is nearly 8 hours later, and vger.kernel.org is still
> trying mightily to spit out all of your patches to the
> 5000+ people subscribed to linux-kernel.
> 
> There is about ~6 hour posting latency as a result of all
> of this.

I understand the problem it caused. The MAD layer of InfiniBand was 
quite a bit behind and we wanted to catch up. I tried to make them in
smaller manageable pieces and that was why there were so many patches in
the series. In the future, I will endeavor to make the changes smaller.
Sorry for the problems this caused.

-- Hal

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/29v2] InfiniBand core update

2005-07-12 Thread Hal Rosenstock
On Mon, 2005-07-11 at 23:11, Andrew Morton wrote:
 Well I was asking.  Do you guys think that this material is appropriate to
 and safe enough for 2.6.13?

I used your versions of the patches (Tom's ucm one is needed and you
added that). I also back ported the trailing whitespace elimination
changes.

I just completed a regression test including uCM, CM, RMPP, OpenSM,
IPoIB and it looks good to me.

I'll check this again when the next -mm comes out.

Thanks.

-- Hal

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/29v2] InfiniBand core update

2005-07-12 Thread Hal Rosenstock
On Tue, 2005-07-12 at 18:38, Andrew Morton wrote:
 OK, well the timing of a merge is mainly up to you guys, especially as the
 subsystem is pretty raw and you're the only people who use it ;)
 
 Two things from a quick scan:
 
 a) In many places the patch does
 
   if (p)
   kfree(p);
 
But kfree(0) is permitted.  The cleanup police will be after you at
some stage - it'd be best to fix those things up immediately.

I'll/We'll work on eradicating these and pushing a patch for this
upstream.

 b) The patch exports a ton of symbols to non-GPL modules:
 
 +EXPORT_SYMBOL(ib_create_ah_from_wc);
 +EXPORT_SYMBOL(ib_modify_mad);
 +EXPORT_SYMBOL(ib_create_send_mad);
 +EXPORT_SYMBOL(ib_free_send_mad);
 +EXPORT_SYMBOL(ib_coalesce_recv_mad);
 +EXPORT_SYMBOL(ib_sa_service_rec_query);
 +EXPORT_SYMBOL(ib_create_cm_id);
 +EXPORT_SYMBOL(ib_destroy_cm_id);
 +EXPORT_SYMBOL(ib_cm_listen);
 +EXPORT_SYMBOL(ib_send_cm_req);
 +EXPORT_SYMBOL(ib_send_cm_rep);
 +EXPORT_SYMBOL(ib_send_cm_rtu);
 +EXPORT_SYMBOL(ib_send_cm_dreq);
 +EXPORT_SYMBOL(ib_send_cm_drep);
 +EXPORT_SYMBOL(ib_send_cm_rej);
 +EXPORT_SYMBOL(ib_send_cm_mra);
 +EXPORT_SYMBOL(ib_send_cm_lap);
 +EXPORT_SYMBOL(ib_send_cm_apr);
 +EXPORT_SYMBOL(ib_send_cm_sidr_req);
 +EXPORT_SYMBOL(ib_send_cm_sidr_rep);
 +EXPORT_SYMBOL(ib_cm_establish);
 +EXPORT_SYMBOL(ib_cm_init_qp_attr);
 
Why?

I think that is because OpenIB has a dual GPL/BSD license.

-- Hal

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/29v2] InfiniBand core update

2005-07-12 Thread Hal Rosenstock
On Mon, 2005-07-11 at 23:12, David S. Miller wrote:
 Please acknowledge that you understand how inappropriate
 and problem causing your huge patch bomb was today to this
 mailing list.
 
 It is nearly 8 hours later, and vger.kernel.org is still
 trying mightily to spit out all of your patches to the
 5000+ people subscribed to linux-kernel.
 
 There is about ~6 hour posting latency as a result of all
 of this.

I understand the problem it caused. The MAD layer of InfiniBand was 
quite a bit behind and we wanted to catch up. I tried to make them in
smaller manageable pieces and that was why there were so many patches in
the series. In the future, I will endeavor to make the changes smaller.
Sorry for the problems this caused.

-- Hal

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/29v2] InfiniBand core update

2005-07-12 Thread Hal Rosenstock
On Mon, 2005-07-11 at 23:11, Andrew Morton wrote:
 Well I was asking.  Do you guys think that this material is appropriate to
 and safe enough for 2.6.13?

We think so.

 What are user CM and kernel CM?

CM is the InfiniBand Communications Manager. It is (primarily)
responsible for setting up and managing connections between endpoints. 
Connections. IB communication occurs over the queue pairs (QPs) set up
by the CM protocol.

kernel CM is CM functionality in the kernel. user CM is to enable
userspace applications to utilize this functionality.

-- Hal

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/29v2] InfiniBand core update

2005-07-11 Thread Hal Rosenstock
On Mon, 2005-07-11 at 20:05, Andrew Morton wrote:
> Hal Rosenstock <[EMAIL PROTECTED]> wrote:
> >
> > This is version 2 of a patch series to get the Infiniband core up to
> > date.
> 
> Well that was interesting.
> 
> - All the patches had mangled headers:
> 
> -- linux-2.6.13-rc2-mm1-16/...
> +++ linux-2.6.13-rc2-mm1-17/...
> 
>   instead of
> 
> --- linux-2.6.13-rc2-mm1-16/...
> +++ linux-2.6.13-rc2-mm1-17/...

Not sure how that happened.

>   Which I fixed up.

Thanks.

> - The second patch didn't apply.  I fixed that too.

Not sure how this was broken. Thanks for fixing this.

> - The patches add lots of trailing whitespace.  I habitually trim that
>   off, figuring that any downstream merging hassle which that causes is
>   just punishment ;)

I'll work on merging this back downstream :-(

> Please check that it all landed OK in the next -mm.

Will do.

> I've hung on to Tom Duffy's patch pending a test compilation.
> 
> I'll tentatively consider this material to be not-for-2.6.13?

Presuming that "this material" refers to the patch to add the kernel CM
implementation, if kernel CM does not make 2.6.13, then user CM should
not either as it is dependent on it.

-- Hal

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 4/27] Combine some MAD routines

2005-07-11 Thread Hal Rosenstock
Combine response_mad() and solicited_mad() routines into a single
function and simplify/encapsulate its usage.

Signed-off-by: Sean Hefty <[EMAIL PROTECTED]>
Signed-off-by: Hal Rosenstock <[EMAIL PROTECTED]>

This patch depends on patch 3/27.

-- 
 mad.c |  105 --
 1 files changed, 27 insertions(+), 78 deletions(-)
diff -uprN linux-2.6.13-rc2-mm1/drivers/infiniband3/core/mad.c 
linux-2.6.13-rc2-mm1/drivers/infiniband4/core/mad.c
-- linux-2.6.13-rc2-mm1/drivers/infiniband3/core/mad.c  2005-07-09 
14:31:49.0 -0400
+++ linux-2.6.13-rc2-mm1/drivers/infiniband4/core/mad.c 2005-07-09 
15:06:29.0 -0400
@@ -58,7 +58,7 @@ static int method_in_use(struct ib_mad_m
 static void remove_mad_reg_req(struct ib_mad_agent_private *priv);
 static struct ib_mad_agent_private *find_mad_agent(
struct ib_mad_port_private *port_priv,
-   struct ib_mad *mad, int solicited);
+   struct ib_mad *mad);
 static int ib_mad_post_receive_mads(struct ib_mad_qp_info *qp_info,
struct ib_mad_private *mad);
 static void cancel_mads(struct ib_mad_agent_private *mad_agent_priv);
@@ -67,7 +67,6 @@ static void ib_mad_complete_send_wr(stru
 static void timeout_sends(void *data);
 static void cancel_sends(void *data);
 static void local_completions(void *data);
-static int solicited_mad(struct ib_mad *mad);
 static int add_nonoui_reg_req(struct ib_mad_reg_req *mad_reg_req,
  struct ib_mad_agent_private *agent_priv,
  u8 mgmt_class);
@@ -558,6 +557,13 @@ int ib_unregister_mad_agent(struct ib_ma
 }
 EXPORT_SYMBOL(ib_unregister_mad_agent);
 
+static inline int response_mad(struct ib_mad *mad)
+{
+   /* Trap represses are responses although response bit is reset */
+   return ((mad->mad_hdr.method == IB_MGMT_METHOD_TRAP_REPRESS) ||
+   (mad->mad_hdr.method & IB_MGMT_METHOD_RESP));
+}
+
 static void dequeue_mad(struct ib_mad_list_head *mad_list)
 {
struct ib_mad_queue *mad_queue;
@@ -650,7 +656,7 @@ static int handle_outgoing_dr_smp(struct
  struct ib_smp *smp,
  struct ib_send_wr *send_wr)
 {
-   int ret, solicited;
+   int ret;
unsigned long flags;
struct ib_mad_local_private *local;
struct ib_mad_private *mad_priv;
@@ -696,11 +702,7 @@ static int handle_outgoing_dr_smp(struct
switch (ret)
{
case IB_MAD_RESULT_SUCCESS | IB_MAD_RESULT_REPLY:
-   /*
-* See if response is solicited and
-* there is a recv handler
-*/
-   if (solicited_mad(_priv->mad.mad) &&
+   if (response_mad(_priv->mad.mad) &&
mad_agent_priv->agent.recv_handler) {
local->mad_priv = mad_priv;
local->recv_mad_agent = mad_agent_priv;
@@ -717,15 +719,13 @@ static int handle_outgoing_dr_smp(struct
break;
case IB_MAD_RESULT_SUCCESS:
/* Treat like an incoming receive MAD */
-   solicited = solicited_mad(_priv->mad.mad);
port_priv = ib_get_mad_port(mad_agent_priv->agent.device,
mad_agent_priv->agent.port_num);
if (port_priv) {
mad_priv->mad.mad.mad_hdr.tid =
((struct ib_mad *)smp)->mad_hdr.tid;
recv_mad_agent = find_mad_agent(port_priv,
-  _priv->mad.mad,
-   solicited);
+   _priv->mad.mad);
}
if (!port_priv || !recv_mad_agent) {
kmem_cache_free(ib_mad_cache, mad_priv);
@@ -1421,42 +1421,15 @@ out:
return;
 }
 
-static int response_mad(struct ib_mad *mad)
-{
-   /* Trap represses are responses although response bit is reset */
-   return ((mad->mad_hdr.method == IB_MGMT_METHOD_TRAP_REPRESS) ||
-   (mad->mad_hdr.method & IB_MGMT_METHOD_RESP));
-}
-
-static int solicited_mad(struct ib_mad *mad)
-{
-   /* CM MADs are never solicited */
-   if (mad->mad_hdr.mgmt_class == IB_MGMT_CLASS_CM) {
-   return 0;
-   }
-
-   /* XXX: Determine whether MAD is using RMPP */
-
-   /* Not using RMPP */
-   /* Is this MAD a response to a previous MAD ? */
-   return response_mad(mad);
-}
-
 static struct ib_mad_agent_private *
 find_mad_agent(struct ib_mad_port_private *port_priv,
-  struct ib_mad *mad,
-  int solicited)
+  struct ib_mad *mad)
 {
struct ib_mad_agent_private *mad

PATCH [6/27] Change ib_mad_send_wr_private struct

2005-07-11 Thread Hal Rosenstock
Have ib_mad_send_wr_private reference the private agent structure
directly, rather than the exposed agent definition.  Remove unneeded
parameters to functions and simplify code were possible from this
change.

Signed-off-by: Sean Hefty <[EMAIL PROTECTED]>
Signed-off-by: Hal Rosenstock <[EMAIL PROTECTED]>

This patch depends on patch 5/27.

-- 
 mad.c  |   22 ++--
 mad_priv.h |4 ++--
 2 files changed, 12 insertions(+), 14 deletions(-)
diff -uprN linux-2.6.13-rc2-mm1/drivers/infiniband5/core/mad.c 
linux-2.6.13-rc2-mm1/drivers/infiniband6/core/mad.c
-- linux-2.6.13-rc2-mm1/drivers/infiniband5/core/mad.c  2005-07-09 
15:08:31.0 -0400
+++ linux-2.6.13-rc2-mm1/drivers/infiniband6/core/mad.c 2005-07-09 
15:12:32.0 -0400
@@ -839,8 +839,7 @@ void ib_free_send_mad(struct ib_mad_send
 }
 EXPORT_SYMBOL(ib_free_send_mad);
 
-static int ib_send_mad(struct ib_mad_agent_private *mad_agent_priv,
-  struct ib_mad_send_wr_private *mad_send_wr)
+static int ib_send_mad(struct ib_mad_send_wr_private *mad_send_wr)
 {
struct ib_mad_qp_info *qp_info;
struct ib_send_wr *bad_send_wr;
@@ -848,7 +847,7 @@ static int ib_send_mad(struct ib_mad_age
int ret;
 
/* Set WR ID to find mad_send_wr upon completion */
-   qp_info = mad_agent_priv->qp_info;
+   qp_info = mad_send_wr->mad_agent_priv->qp_info;
mad_send_wr->send_wr.wr_id = (unsigned long)_send_wr->mad_list;
mad_send_wr->mad_list.mad_queue = _info->send_queue;
 
@@ -857,7 +856,7 @@ static int ib_send_mad(struct ib_mad_age
list_add_tail(_send_wr->mad_list.list,
  _info->send_queue.list);
spin_unlock_irqrestore(_info->send_queue.lock, flags);
-   ret = ib_post_send(mad_agent_priv->agent.qp,
+   ret = ib_post_send(mad_send_wr->mad_agent_priv->agent.qp,
   _send_wr->send_wr, _send_wr);
if (ret) {
printk(KERN_ERR PFX "ib_post_send failed: %d\n", ret);
@@ -950,7 +949,7 @@ int ib_post_send_mad(struct ib_mad_agent
mad_send_wr->wr_id = mad_send_wr->send_wr.wr_id;
mad_send_wr->send_wr.next = NULL;
mad_send_wr->tid = send_wr->wr.ud.mad_hdr->tid;
-   mad_send_wr->agent = mad_agent;
+   mad_send_wr->mad_agent_priv = mad_agent_priv;
/* Timeout will be updated after send completes */
mad_send_wr->timeout = msecs_to_jiffies(send_wr->wr.
ud.timeout_ms);
@@ -966,7 +965,7 @@ int ib_post_send_mad(struct ib_mad_agent
  _agent_priv->send_list);
spin_unlock_irqrestore(_agent_priv->lock, flags);
 
-   ret = ib_send_mad(mad_agent_priv, mad_send_wr);
+   ret = ib_send_mad(mad_send_wr);
if (ret) {
/* Fail send request */
spin_lock_irqsave(_agent_priv->lock, flags);
@@ -1742,13 +1741,14 @@ static void adjust_timeout(struct ib_mad
}
 }
 
-static void wait_for_response(struct ib_mad_agent_private *mad_agent_priv,
- struct ib_mad_send_wr_private *mad_send_wr )
+static void wait_for_response(struct ib_mad_send_wr_private *mad_send_wr)
 {
+   struct ib_mad_agent_private *mad_agent_priv;
struct ib_mad_send_wr_private *temp_mad_send_wr;
struct list_head *list_item;
unsigned long delay;
 
+   mad_agent_priv = mad_send_wr->mad_agent_priv;
list_del(_send_wr->agent_list);
 
delay = mad_send_wr->timeout;
@@ -1781,9 +1781,7 @@ static void ib_mad_complete_send_wr(stru
struct ib_mad_agent_private *mad_agent_priv;
unsigned long   flags;
 
-   mad_agent_priv = container_of(mad_send_wr->agent,
- struct ib_mad_agent_private, agent);
-
+   mad_agent_priv = mad_send_wr->mad_agent_priv;
spin_lock_irqsave(_agent_priv->lock, flags);
if (mad_send_wc->status != IB_WC_SUCCESS &&
mad_send_wr->status == IB_WC_SUCCESS) {
@@ -1794,7 +1792,7 @@ static void ib_mad_complete_send_wr(stru
if (--mad_send_wr->refcount > 0) {
if (mad_send_wr->refcount == 1 && mad_send_wr->timeout &&
mad_send_wr->status == IB_WC_SUCCESS) {
-   wait_for_response(mad_agent_priv, mad_send_wr);
+   wait_for_response(mad_send_wr);
}
spin_unlock_irqrestore(_agent_priv->lock, flags);
return;
diff -uprN linux-2.6.13-rc2-mm1/drivers/infiniband5/core/mad_priv.h 
linux-2.6.13-rc2-mm1/drivers/infiniband6/core/mad_priv.h
-- linux-2.6.13-rc2-mm1/dr

[PATCH 8/27] Minor cleanup during MAD startup and shutdown

2005-07-11 Thread Hal Rosenstock
Minor cleanup during startup and shutdown

Signed-off-by: Hal Rosenstock <[EMAIL PROTECTED]>

This patch depends on patch 7/27.

-- 
 mad.c |   44 +--
 1 files changed, 9 insertions(+), 35 deletions(-)
diff -uprN linux-2.6.13-rc2-mm1/drivers/infiniband7/core/mad.c 
linux-2.6.13-rc2-mm1/drivers/infiniband8/core/mad.c
-- linux-2.6.13-rc2-mm1/drivers/infiniband7/core/mad.c  2005-07-09 
16:48:45.0 -0400
+++ linux-2.6.13-rc2-mm1/drivers/infiniband8/core/mad.c 2005-07-09 
16:51:21.0 -0400
@@ -2487,14 +2487,6 @@ static int ib_mad_port_open(struct ib_de
unsigned long flags;
char name[sizeof "ib_mad123"];
 
-   /* First, check if port already open at MAD layer */
-   port_priv = ib_get_mad_port(device, port_num);
-   if (port_priv) {
-   printk(KERN_DEBUG PFX "%s port %d already open\n",
-  device->name, port_num);
-   return 0;
-   }
-
/* Create new device info */
port_priv = kmalloc(sizeof *port_priv, GFP_KERNEL);
if (!port_priv) {
@@ -2619,7 +2611,7 @@ static int ib_mad_port_close(struct ib_d
 
 static void ib_mad_init_device(struct ib_device *device)
 {
-   int ret, num_ports, cur_port, i, ret2;
+   int num_ports, cur_port, i;
 
if (device->node_type == IB_NODE_SWITCH) {
num_ports = 1;
@@ -2629,47 +2621,37 @@ static void ib_mad_init_device(struct ib
cur_port = 1;
}
for (i = 0; i < num_ports; i++, cur_port++) {
-   ret = ib_mad_port_open(device, cur_port);
-   if (ret) {
+   if (ib_mad_port_open(device, cur_port)) {
printk(KERN_ERR PFX "Couldn't open %s port %d\n",
   device->name, cur_port);
goto error_device_open;
}
-   ret = ib_agent_port_open(device, cur_port);
-   if (ret) {
+   if (ib_agent_port_open(device, cur_port)) {
printk(KERN_ERR PFX "Couldn't open %s port %d "
   "for agents\n",
   device->name, cur_port);
goto error_device_open;
}
}
-
-   goto error_device_query;
+   return;
 
 error_device_open:
while (i > 0) {
cur_port--;
-   ret2 = ib_agent_port_close(device, cur_port);
-   if (ret2) {
+   if (ib_agent_port_close(device, cur_port))
printk(KERN_ERR PFX "Couldn't close %s port %d "
   "for agents\n",
   device->name, cur_port);
-   }
-   ret2 = ib_mad_port_close(device, cur_port);
-   if (ret2) {
+   if (ib_mad_port_close(device, cur_port))
printk(KERN_ERR PFX "Couldn't close %s port %d\n",
   device->name, cur_port);
-   }
i--;
}
-
-error_device_query:
-   return;
 }
 
 static void ib_mad_remove_device(struct ib_device *device)
 {
-   int ret = 0, i, num_ports, cur_port, ret2;
+   int i, num_ports, cur_port;
 
if (device->node_type == IB_NODE_SWITCH) {
num_ports = 1;
@@ -2679,21 +2661,13 @@ static void ib_mad_remove_device(struct 
cur_port = 1;
}
for (i = 0; i < num_ports; i++, cur_port++) {
-   ret2 = ib_agent_port_close(device, cur_port);
-   if (ret2) {
+   if (ib_agent_port_close(device, cur_port))
printk(KERN_ERR PFX "Couldn't close %s port %d "
   "for agents\n",
   device->name, cur_port);
-   if (!ret)
-   ret = ret2;
-   }
-   ret2 = ib_mad_port_close(device, cur_port);
-   if (ret2) {
+   if (ib_mad_port_close(device, cur_port))
printk(KERN_ERR PFX "Couldn't close %s port %d\n",
   device->name, cur_port);
-   if (!ret)
-   ret = ret2;
-   }
}
 }
 


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 10/27] Add automatic retries to MAD layer

2005-07-11 Thread Hal Rosenstock
Add automatic retries to MAD layer. 
Signed-off-by: Sean Hefty <[EMAIL PROTECTED]>
Signed-off-by: Hal Rosenstock <[EMAIL PROTECTED]>

This patch depends on patch 9/27.

-- 
 core/mad.c |   26 +-
 core/mad_priv.h|2 ++
 core/sa_query.c|3 ++-
 core/user_mad.c|1 +
 include/ib_verbs.h |1 +
 5 files changed, 31 insertions(+), 2 deletions(-)
diff -uprN linux-2.6.13-rc2-mm1/drivers/infiniband9/core/mad.c 
linux-2.6.13-rc2-mm1/drivers/infiniband10/core/mad.c
-- linux-2.6.13-rc2-mm1/drivers/infiniband9/core/mad.c  2005-07-09 
17:25:36.0 -0400
+++ linux-2.6.13-rc2-mm1/drivers/infiniband10/core/mad.c2005-07-09 
17:25:28.0 -0400
@@ -954,7 +954,7 @@ int ib_post_send_mad(struct ib_mad_agent
/* Timeout will be updated after send completes */
mad_send_wr->timeout = msecs_to_jiffies(send_wr->wr.
ud.timeout_ms);
-   mad_send_wr->retry = 0;
+   mad_send_wr->retries = mad_send_wr->send_wr.wr.ud.retries;
/* One reference for each work request to QP + response */
mad_send_wr->refcount = 1 + (mad_send_wr->timeout > 0);
mad_send_wr->status = IB_WC_SUCCESS;
@@ -2174,6 +2174,27 @@ local_send_completion:
spin_unlock_irqrestore(_agent_priv->lock, flags);
 }
 
+static int retry_send(struct ib_mad_send_wr_private *mad_send_wr)
+{
+   int ret;
+
+   if (!mad_send_wr->retries--)
+   return -ETIMEDOUT;
+
+   mad_send_wr->timeout = msecs_to_jiffies(mad_send_wr->send_wr.
+   wr.ud.timeout_ms);
+
+   ret = ib_send_mad(mad_send_wr);
+
+   if (!ret) {
+   mad_send_wr->refcount++;
+   list_del(_send_wr->agent_list);
+   list_add_tail(_send_wr->agent_list,
+ _send_wr->mad_agent_priv->send_list);
+   }
+   return ret;
+}
+
 static void timeout_sends(void *data)
 {
struct ib_mad_agent_private *mad_agent_priv;
@@ -2202,6 +2223,9 @@ static void timeout_sends(void *data)
break;
}
 
+   if (!retry_send(mad_send_wr))
+   continue;
+
list_del(_send_wr->agent_list);
spin_unlock_irqrestore(_agent_priv->lock, flags);
 
diff -uprN linux-2.6.13-rc2-mm1/drivers/infiniband9/core/mad_priv.h 
linux-2.6.13-rc2-mm1/drivers/infiniband10/core/mad_priv.h
-- linux-2.6.13-rc2-mm1/drivers/infiniband9/core/mad_priv.h 2005-07-09 
16:48:05.0 -0400
+++ linux-2.6.13-rc2-mm1/drivers/infiniband10/core/mad_priv.h   2005-07-09 
17:15:40.0 -0400
@@ -123,6 +123,7 @@ struct ib_mad_send_wr_private {
u64 wr_id;  /* client WR ID */
u64 tid;
unsigned long timeout;
+   int retries;
int retry;
int refcount;
enum ib_wc_status status;
@@ -136,6 +137,7 @@ struct ib_mad_local_private {
struct ib_sge sg_list[IB_MAD_SEND_REQ_MAX_SG];
u64 wr_id;  /* client WR ID */
u64 tid;
+   int retries;
 };
 
 struct ib_mad_mgmt_method_table {
diff -uprN linux-2.6.13-rc2-mm1/drivers/infiniband9/core/sa_query.c 
linux-2.6.13-rc2-mm1/drivers/infiniband10/core/sa_query.c
-- linux-2.6.13-rc2-mm1/drivers/infiniband9/core/sa_query.c 2005-07-10 
16:21:55.0 -0400
+++ linux-2.6.13-rc2-mm1/drivers/infiniband10/core/sa_query.c   2005-07-10 
16:22:13.0 -0400
@@ -462,7 +462,8 @@ static int send_mad(struct ib_sa_query *
 .mad_hdr = >mad->mad_hdr,
 .remote_qpn  = 1,
 .remote_qkey = IB_QP1_QKEY,
-.timeout_ms  = timeout_ms
+.timeout_ms  = timeout_ms,
+.retries = 0 
 }
 }
};
diff -uprN linux-2.6.13-rc2-mm1/drivers/infiniband9/core/user_mad.c 
linux-2.6.13-rc2-mm1/drivers/infiniband10/core/user_mad.c
-- linux-2.6.13-rc2-mm1/drivers/infiniband9/core/user_mad.c 2005-06-29 
19:00:53.0 -0400
+++ linux-2.6.13-rc2-mm1/drivers/infiniband10/core/user_mad.c   2005-07-09 
17:14:46.0 -0400
@@ -322,6 +322,7 @@ static ssize_t ib_umad_write(struct file
wr.wr.ud.remote_qpn  = be32_to_cpu(packet->mad.qpn);
wr.wr.ud.remote_qkey = be32_to_cpu(packet->mad.qkey);
wr.wr.ud.timeout_ms  = packet->mad.timeout_ms;
+   wr.wr.ud.retries = 0;
 
wr.wr_id= (unsigned long) packet;
 
diff -uprN linux-2.6.13-rc2-mm1/drivers/infiniband9/include/ib_verbs.h 
linux-2.6.13-rc2-mm1/drivers/infiniband10/include/ib_verbs.h
-- linux-2.6.13-rc2-mm1/drivers/infiniband9/include/ib_verbs.h  2005-07-10 
11

[PATCH 15/27] Fix a couple of MAD code paths

2005-07-11 Thread Hal Rosenstock
Fixed locking to handle error posting MAD send work requests. 
Fixed handling canceling a MAD with an active work request.

Signed-off-by: Sean Hefty <[EMAIL PROTECTED]>
Signed-off-by: Hal Rosenstock <[EMAIL PROTECTED]>

This patch depends on patch 14/27.

-- 
 mad.c |   28 ++--
 1 files changed, 14 insertions(+), 14 deletions(-)
diff -uprN linux-2.6.13-rc2-mm1/drivers/infiniband14/core/mad.c 
linux-2.6.13-rc2-mm1/drivers/infiniband15/core/mad.c
-- linux-2.6.13-rc2-mm1/drivers/infiniband14/core/mad.c 2005-07-09 
18:19:51.0 -0400
+++ linux-2.6.13-rc2-mm1/drivers/infiniband15/core/mad.c2005-07-09 
18:24:43.0 -0400
@@ -841,6 +841,7 @@ static int ib_send_mad(struct ib_mad_sen
 {
struct ib_mad_qp_info *qp_info;
struct ib_send_wr *bad_send_wr;
+   struct list_head *list;
unsigned long flags;
int ret;
 
@@ -850,22 +851,20 @@ static int ib_send_mad(struct ib_mad_sen
mad_send_wr->mad_list.mad_queue = _info->send_queue;
 
spin_lock_irqsave(_info->send_queue.lock, flags);
-   if (qp_info->send_queue.count++ < qp_info->send_queue.max_active) {
-   list_add_tail(_send_wr->mad_list.list,
- _info->send_queue.list);
-   spin_unlock_irqrestore(_info->send_queue.lock, flags);
+   if (qp_info->send_queue.count < qp_info->send_queue.max_active) {
ret = ib_post_send(mad_send_wr->mad_agent_priv->agent.qp,
   _send_wr->send_wr, _send_wr);
-   if (ret) {
-   printk(KERN_ERR PFX "ib_post_send failed: %d\n", ret);
-   dequeue_mad(_send_wr->mad_list);
-   }
+   list = _info->send_queue.list;
} else {
-   list_add_tail(_send_wr->mad_list.list,
- _info->overflow_list);
-   spin_unlock_irqrestore(_info->send_queue.lock, flags);
ret = 0;
+   list = _info->overflow_list;
}
+
+   if (!ret) {
+   qp_info->send_queue.count++;
+   list_add_tail(_send_wr->mad_list.list, list);
+   }
+   spin_unlock_irqrestore(_info->send_queue.lock, flags);
return ret;
 }
 
@@ -2023,8 +2022,7 @@ static void cancel_mads(struct ib_mad_ag
 }
 
 static struct ib_mad_send_wr_private*
-find_send_by_wr_id(struct ib_mad_agent_private *mad_agent_priv,
-  u64 wr_id)
+find_send_by_wr_id(struct ib_mad_agent_private *mad_agent_priv, u64 wr_id)
 {
struct ib_mad_send_wr_private *mad_send_wr;
 
@@ -2047,6 +2045,7 @@ int ib_modify_mad(struct ib_mad_agent *m
struct ib_mad_agent_private *mad_agent_priv;
struct ib_mad_send_wr_private *mad_send_wr;
unsigned long flags;
+   int active;
 
mad_agent_priv = container_of(mad_agent, struct ib_mad_agent_private,
  agent);
@@ -2057,13 +2056,14 @@ int ib_modify_mad(struct ib_mad_agent *m
return -EINVAL;
}
 
+   active = (!mad_send_wr->timeout || mad_send_wr->refcount > 1);
if (!timeout_ms) {
mad_send_wr->status = IB_WC_WR_FLUSH_ERR;
mad_send_wr->refcount -= (mad_send_wr->timeout > 0);
}
 
mad_send_wr->send_wr.wr.ud.timeout_ms = timeout_ms;
-   if (!mad_send_wr->timeout || mad_send_wr->refcount > 1)
+   if (active)
mad_send_wr->timeout = msecs_to_jiffies(timeout_ms);
else
ib_reset_mad_timeout(mad_send_wr, timeout_ms);


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 16/27] Add ib_create_ah_from_wc to IB verbs

2005-07-11 Thread Hal Rosenstock
Added new call: ib_create_ah_from_wc. Call will allocate an 
address handle given work completion information, including any
received GRH.

Signed-off-by: Sean Hefty <[EMAIL PROTECTED]>
Signed-off-by: Hal Rosenstock <[EMAIL PROTECTED]>

This patch depends on patch 15/27.

-- 
 core/verbs.c   |   35 +++
 include/ib_mad.h   |9 --
 include/ib_verbs.h |   24 
 3 files changed, 59 insertions(+), 9 deletions(-)
diff -uprN linux-2.6.13-rc2-mm1/drivers/infiniband15/core/verbs.c 
linux-2.6.13-rc2-mm1/drivers/infiniband16/core/verbs.c
-- linux-2.6.13-rc2-mm1/drivers/infiniband15/core/verbs.c   2005-07-10 
11:09:37.0 -0400
+++ linux-2.6.13-rc2-mm1/drivers/infiniband16/core/verbs.c  2005-07-10 
11:43:44.0 -0400
@@ -41,6 +41,7 @@
 #include 
 
 #include 
+#include 
 
 /* Protection domains */
 
@@ -88,6 +89,40 @@ struct ib_ah *ib_create_ah(struct ib_pd 
 }
 EXPORT_SYMBOL(ib_create_ah);
 
+struct ib_ah *ib_create_ah_from_wc(struct ib_pd *pd, struct ib_wc *wc,
+  struct ib_grh *grh, u8 port_num)
+{
+   struct ib_ah_attr ah_attr;
+   u32 flow_class;
+   u16 gid_index;
+   int ret;
+
+   memset(_attr, 0, sizeof ah_attr);
+   ah_attr.dlid = wc->slid;
+   ah_attr.sl = wc->sl;
+   ah_attr.src_path_bits = wc->dlid_path_bits;
+   ah_attr.port_num = port_num;
+   
+   if (wc->wc_flags & IB_WC_GRH) {
+   ah_attr.ah_flags = IB_AH_GRH;
+   ah_attr.grh.dgid = grh->dgid;
+
+   ret = ib_find_cached_gid(pd->device, >sgid, _num,
+_index);
+   if (ret)
+   return ERR_PTR(ret);
+
+   ah_attr.grh.sgid_index = (u8) gid_index;
+   flow_class = be32_to_cpu(>version_tclass_flow);
+   ah_attr.grh.flow_label = flow_class & 0xF;
+   ah_attr.grh.traffic_class = (flow_class >> 20) & 0xFF;
+   ah_attr.grh.hop_limit = grh->hop_limit;
+   }
+
+   return ib_create_ah(pd, _attr);
+}
+EXPORT_SYMBOL(ib_create_ah_from_wc);
+
 int ib_modify_ah(struct ib_ah *ah, struct ib_ah_attr *ah_attr)
 {
return ah->device->modify_ah ?
diff -uprN linux-2.6.13-rc2-mm1/drivers/infiniband15/include/ib_mad.h 
linux-2.6.13-rc2-mm1/drivers/infiniband16/include/ib_mad.h
-- linux-2.6.13-rc2-mm1/drivers/infiniband15/include/ib_mad.h   2005-07-09 
17:57:11.0 -0400
+++ linux-2.6.13-rc2-mm1/drivers/infiniband16/include/ib_mad.h  2005-07-10 
11:43:45.0 -0400
@@ -77,15 +77,6 @@
 #define IB_QP1_QKEY0x8001
 #define IB_QP_SET_QKEY 0x8000
 
-struct ib_grh {
-   u32 version_tclass_flow;
-   u16 paylen;
-   u8  next_hdr;
-   u8  hop_limit;
-   union ib_gidsgid;
-   union ib_giddgid;
-} __attribute__ ((packed));
-
 struct ib_mad_hdr {
u8  base_version;
u8  mgmt_class;
diff -uprN linux-2.6.13-rc2-mm1/drivers/infiniband15/include/ib_verbs.h 
linux-2.6.13-rc2-mm1/drivers/infiniband16/include/ib_verbs.h
-- linux-2.6.13-rc2-mm1/drivers/infiniband15/include/ib_verbs.h 2005-07-10 
10:55:59.0 -0400
+++ linux-2.6.13-rc2-mm1/drivers/infiniband16/include/ib_verbs.h
2005-07-10 11:43:45.0 -0400
@@ -289,6 +289,15 @@ struct ib_global_route {
u8  traffic_class;
 };
 
+struct ib_grh {
+   u32 version_tclass_flow;
+   u16 paylen;
+   u8  next_hdr;
+   u8  hop_limit;
+   union ib_gidsgid;
+   union ib_giddgid;
+};
+
 enum {
IB_MULTICAST_QPN = 0xff
 };
@@ -991,6 +1000,21 @@ int ib_dealloc_pd(struct ib_pd *pd);
 struct ib_ah *ib_create_ah(struct ib_pd *pd, struct ib_ah_attr *ah_attr);
 
 /**
+ * ib_create_ah_from_wc - Creates an address handle associated with the
+ *   sender of the specified work completion.
+ * @pd: The protection domain associated with the address handle.
+ * @wc: Work completion information associated with a received message.
+ * @grh: References the received global route header.  This parameter is
+ *   ignored unless the work completion indicates that the GRH is valid.
+ * @port_num: The outbound port number to associate with the address.
+ *
+ * The address handle is used to reference a local or global destination
+ * in all UD QP post sends.
+ */
+struct ib_ah *ib_create_ah_from_wc(struct ib_pd *pd, struct ib_wc *wc,
+  struct ib_grh *grh, u8 port_num);
+
+/**
  * ib_modify_ah - Modifies the address vector associated with an address
  *   handle.
  * @ah: The address handle to modify.


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 6/29v2] Change ib_mad_send_wr_private struct

2005-07-11 Thread Hal Rosenstock
Have ib_mad_send_wr_private reference the private agent structure 
directly, rather than the exposed agent definition.  Remove unneeded
parameters to functions and simplify code were possible from this
change.

Signed-off-by: Sean Hefty <[EMAIL PROTECTED]>
Signed-off-by: Hal Rosenstock <[EMAIL PROTECTED]>

This patch depends on patch 5/29.

--
 mad.c  |   22 ++--
 mad_priv.h |4 ++--
 2 files changed, 12 insertions(+), 14 deletions(-)
diff -uprN linux-2.6.13-rc2-mm1-5/drivers/infiniband/core/mad.c 
linux-2.6.13-rc2-mm1-6/drivers/infiniband/core/mad.c
-- linux-2.6.13-rc2-mm1-5/drivers/infiniband/core/mad.c 2005-07-11 
13:35:56.0 -0400
+++ linux-2.6.13-rc2-mm1-6/drivers/infiniband/core/mad.c2005-07-11 
13:36:16.0 -0400
@@ -839,8 +839,7 @@ void ib_free_send_mad(struct ib_mad_send
 }
 EXPORT_SYMBOL(ib_free_send_mad);
 
-static int ib_send_mad(struct ib_mad_agent_private *mad_agent_priv,
-  struct ib_mad_send_wr_private *mad_send_wr)
+static int ib_send_mad(struct ib_mad_send_wr_private *mad_send_wr)
 {
struct ib_mad_qp_info *qp_info;
struct ib_send_wr *bad_send_wr;
@@ -848,7 +847,7 @@ static int ib_send_mad(struct ib_mad_age
int ret;
 
/* Set WR ID to find mad_send_wr upon completion */
-   qp_info = mad_agent_priv->qp_info;
+   qp_info = mad_send_wr->mad_agent_priv->qp_info;
mad_send_wr->send_wr.wr_id = (unsigned long)_send_wr->mad_list;
mad_send_wr->mad_list.mad_queue = _info->send_queue;
 
@@ -857,7 +856,7 @@ static int ib_send_mad(struct ib_mad_age
list_add_tail(_send_wr->mad_list.list,
  _info->send_queue.list);
spin_unlock_irqrestore(_info->send_queue.lock, flags);
-   ret = ib_post_send(mad_agent_priv->agent.qp,
+   ret = ib_post_send(mad_send_wr->mad_agent_priv->agent.qp,
   _send_wr->send_wr, _send_wr);
if (ret) {
printk(KERN_ERR PFX "ib_post_send failed: %d\n", ret);
@@ -950,7 +949,7 @@ int ib_post_send_mad(struct ib_mad_agent
mad_send_wr->wr_id = mad_send_wr->send_wr.wr_id;
mad_send_wr->send_wr.next = NULL;
mad_send_wr->tid = send_wr->wr.ud.mad_hdr->tid;
-   mad_send_wr->agent = mad_agent;
+   mad_send_wr->mad_agent_priv = mad_agent_priv;
/* Timeout will be updated after send completes */
mad_send_wr->timeout = msecs_to_jiffies(send_wr->wr.
ud.timeout_ms);
@@ -966,7 +965,7 @@ int ib_post_send_mad(struct ib_mad_agent
  _agent_priv->send_list);
spin_unlock_irqrestore(_agent_priv->lock, flags);
 
-   ret = ib_send_mad(mad_agent_priv, mad_send_wr);
+   ret = ib_send_mad(mad_send_wr);
if (ret) {
/* Fail send request */
spin_lock_irqsave(_agent_priv->lock, flags);
@@ -1742,13 +1741,14 @@ static void adjust_timeout(struct ib_mad
}
 }
 
-static void wait_for_response(struct ib_mad_agent_private *mad_agent_priv,
- struct ib_mad_send_wr_private *mad_send_wr )
+static void wait_for_response(struct ib_mad_send_wr_private *mad_send_wr)
 {
+   struct ib_mad_agent_private *mad_agent_priv;
struct ib_mad_send_wr_private *temp_mad_send_wr;
struct list_head *list_item;
unsigned long delay;
 
+   mad_agent_priv = mad_send_wr->mad_agent_priv;
list_del(_send_wr->agent_list);
 
delay = mad_send_wr->timeout;
@@ -1781,9 +1781,7 @@ static void ib_mad_complete_send_wr(stru
struct ib_mad_agent_private *mad_agent_priv;
unsigned long   flags;
 
-   mad_agent_priv = container_of(mad_send_wr->agent,
- struct ib_mad_agent_private, agent);
-
+   mad_agent_priv = mad_send_wr->mad_agent_priv;
spin_lock_irqsave(_agent_priv->lock, flags);
if (mad_send_wc->status != IB_WC_SUCCESS &&
mad_send_wr->status == IB_WC_SUCCESS) {
@@ -1794,7 +1792,7 @@ static void ib_mad_complete_send_wr(stru
if (--mad_send_wr->refcount > 0) {
if (mad_send_wr->refcount == 1 && mad_send_wr->timeout &&
mad_send_wr->status == IB_WC_SUCCESS) {
-   wait_for_response(mad_agent_priv, mad_send_wr);
+   wait_for_response(mad_send_wr);
}
spin_unlock_irqrestore(_agent_priv->lock, flags);
return;
diff -uprN linux-2.6.13-rc2-mm1-5/drivers/infiniband/core/mad_priv.h 
linux-2.6.13-rc2-mm1-6/drivers/infiniband/core/mad_priv.h
-- linux-2.6.

[PATCH 11/29v2] Simplify calling of list_del in MAD

2005-07-11 Thread Hal Rosenstock
Simplify calling of list_del.


Signed-off-by: Sean Hefty <[EMAIL PROTECTED]>
Signed-off-by: Hal Rosenstock <[EMAIL PROTECTED]>

This patch depends on patch 10/29.

--
 mad.c |3 +--
 1 files changed, 1 insertion(+), 2 deletions(-)
diff -uprN linux-2.6.13-rc2-mm1-10/drivers/infiniband/core/mad.c 
linux-2.6.13-rc2-mm1-11/drivers/infiniband/core/mad.c
-- linux-2.6.13-rc2-mm1-10/drivers/infiniband/core/mad.c2005-07-11 
13:37:54.0 -0400
+++ linux-2.6.13-rc2-mm1-11/drivers/infiniband/core/mad.c   2005-07-11 
13:38:06.0 -0400
@@ -2188,7 +2188,6 @@ static int retry_send(struct ib_mad_send
 
if (!ret) {
mad_send_wr->refcount++;
-   list_del(_send_wr->agent_list);
list_add_tail(_send_wr->agent_list,
  _send_wr->mad_agent_priv->send_list);
}
@@ -2223,10 +,10 @@ static void timeout_sends(void *data)
break;
}
 
+   list_del(_send_wr->agent_list);
if (!retry_send(mad_send_wr))
continue;
 
-   list_del(_send_wr->agent_list);
spin_unlock_irqrestore(_agent_priv->lock, flags);
 
mad_send_wc.wr_id = mad_send_wr->wr_id;


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 10/29v2] Add automatic retries to MAD layer

2005-07-11 Thread Hal Rosenstock
Add automatic retries to MAD layer. 
Signed-off-by: Sean Hefty <[EMAIL PROTECTED]>
Signed-off-by: Hal Rosenstock <[EMAIL PROTECTED]>

This patch depends on patch 9/29.

--
 core/mad.c |   26 +-
 core/mad_priv.h|2 ++
 core/sa_query.c|3 ++-
 core/user_mad.c|1 +
 include/ib_verbs.h |1 +
 5 files changed, 31 insertions(+), 2 deletions(-)
diff -uprN linux-2.6.13-rc2-mm1-9/drivers/infiniband/core/mad.c 
linux-2.6.13-rc2-mm1-10/drivers/infiniband/core/mad.c
-- linux-2.6.13-rc2-mm1-9/drivers/infiniband/core/mad.c 2005-07-11 
13:37:10.0 -0400
+++ linux-2.6.13-rc2-mm1-10/drivers/infiniband/core/mad.c   2005-07-11 
13:37:54.0 -0400
@@ -954,7 +954,7 @@ int ib_post_send_mad(struct ib_mad_agent
/* Timeout will be updated after send completes */
mad_send_wr->timeout = msecs_to_jiffies(send_wr->wr.
ud.timeout_ms);
-   mad_send_wr->retry = 0;
+   mad_send_wr->retries = mad_send_wr->send_wr.wr.ud.retries;
/* One reference for each work request to QP + response */
mad_send_wr->refcount = 1 + (mad_send_wr->timeout > 0);
mad_send_wr->status = IB_WC_SUCCESS;
@@ -2174,6 +2174,27 @@ local_send_completion:
spin_unlock_irqrestore(_agent_priv->lock, flags);
 }
 
+static int retry_send(struct ib_mad_send_wr_private *mad_send_wr)
+{
+   int ret;
+
+   if (!mad_send_wr->retries--)
+   return -ETIMEDOUT;
+
+   mad_send_wr->timeout = msecs_to_jiffies(mad_send_wr->send_wr.
+   wr.ud.timeout_ms);
+
+   ret = ib_send_mad(mad_send_wr);
+
+   if (!ret) {
+   mad_send_wr->refcount++;
+   list_del(_send_wr->agent_list);
+   list_add_tail(_send_wr->agent_list,
+ _send_wr->mad_agent_priv->send_list);
+   }
+   return ret;
+}
+
 static void timeout_sends(void *data)
 {
struct ib_mad_agent_private *mad_agent_priv;
@@ -2202,6 +2223,9 @@ static void timeout_sends(void *data)
break;
}
 
+   if (!retry_send(mad_send_wr))
+   continue;
+
list_del(_send_wr->agent_list);
spin_unlock_irqrestore(_agent_priv->lock, flags);
 
diff -uprN linux-2.6.13-rc2-mm1-9/drivers/infiniband/core/mad_priv.h 
linux-2.6.13-rc2-mm1-10/drivers/infiniband/core/mad_priv.h
-- linux-2.6.13-rc2-mm1-9/drivers/infiniband/core/mad_priv.h2005-07-09 
16:48:05.0 -0400
+++ linux-2.6.13-rc2-mm1-10/drivers/infiniband/core/mad_priv.h  2005-07-09 
17:15:40.0 -0400
@@ -123,6 +123,7 @@ struct ib_mad_send_wr_private {
u64 wr_id;  /* client WR ID */
u64 tid;
unsigned long timeout;
+   int retries;
int retry;
int refcount;
enum ib_wc_status status;
@@ -136,6 +137,7 @@ struct ib_mad_local_private {
struct ib_sge sg_list[IB_MAD_SEND_REQ_MAX_SG];
u64 wr_id;  /* client WR ID */
u64 tid;
+   int retries;
 };
 
 struct ib_mad_mgmt_method_table {
diff -uprN linux-2.6.13-rc2-mm1-9/drivers/infiniband/core/sa_query.c 
linux-2.6.13-rc2-mm1-10/drivers/infiniband/core/sa_query.c
-- linux-2.6.13-rc2-mm1-9/drivers/infiniband/core/sa_query.c2005-07-10 
16:21:55.0 -0400
+++ linux-2.6.13-rc2-mm1-10/drivers/infiniband/core/sa_query.c  2005-07-10 
16:22:13.0 -0400
@@ -462,7 +462,8 @@ static int send_mad(struct ib_sa_query *
 .mad_hdr = >mad->mad_hdr,
 .remote_qpn  = 1,
 .remote_qkey = IB_QP1_QKEY,
-.timeout_ms  = timeout_ms
+.timeout_ms  = timeout_ms,
+.retries = 0 
 }
 }
};
diff -uprN linux-2.6.13-rc2-mm1-9/drivers/infiniband/core/user_mad.c 
linux-2.6.13-rc2-mm1-10/drivers/infiniband/core/user_mad.c
-- linux-2.6.13-rc2-mm1-9/drivers/infiniband/core/user_mad.c2005-06-29 
19:00:53.0 -0400
+++ linux-2.6.13-rc2-mm1-10/drivers/infiniband/core/user_mad.c  2005-07-09 
17:14:46.0 -0400
@@ -322,6 +322,7 @@ static ssize_t ib_umad_write(struct file
wr.wr.ud.remote_qpn  = be32_to_cpu(packet->mad.qpn);
wr.wr.ud.remote_qkey = be32_to_cpu(packet->mad.qkey);
wr.wr.ud.timeout_ms  = packet->mad.timeout_ms;
+   wr.wr.ud.retries = 0;
 
wr.wr_id= (unsigned long) packet;
 
diff -uprN linux-2.6.13-rc2-mm1-9/drivers/infiniband/include/ib_verbs.h 
linux-2.6.13-rc2-mm1-10/drivers/infiniband/include/ib_verbs.h
-- linux-2.6.13-rc2-mm1-9/drivers/infiniband/include/ib_verbs.h 2005

  1   2   3   >