from:"Hal Rosenstock"

[PATCHv4 TRIVIAL] IB/core: Documentation fix to the snoop handler in the MAD header file

2016-01-05 Thread Hal Rosenstock

In ib_mad.h, ib_mad_snoop_handler uses send_buf rather than send_wr

Signed-off-by: Hal Rosenstock 
---
Change since v3:
Fixed title to not include function name

Change since v2:
Changed title to use "higher" language

Change since v1:
Fixed typo in patch description

diff --git a/include/rdma/ib_mad.h b/include/rdma/ib_mad.h
index ec9b44d..2b3573d 100644
--- a/include/rdma/ib_mad.h
+++ b/include/rdma/ib_mad.h
@@ -424,11 +424,11 @@ typedef void (*ib_mad_send_handler)(struct ib_mad_agent 
*mad_agent,
 /**
  * ib_mad_snoop_handler - Callback handler for snooping sent MADs.
  * @mad_agent: MAD agent that snooped the MAD.
- * @send_wr: Work request information on the sent MAD.
+ * @send_buf: send MAD data buffer.
  * @mad_send_wc: Work completion information on the sent MAD.  Valid
  *   only for snooping that occurs on a send completion.
  *
- * Clients snooping MADs should not modify data referenced by the @send_wr
+ * Clients snooping MADs should not modify data referenced by the @send_buf
  * or @mad_send_wc.
  */
 typedef void (*ib_mad_snoop_handler)(struct ib_mad_agent *mad_agent,
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCHv3 TRIVIAL] IB/core: Documentation fix to ib_mad_snoop_handler in the MAD header file

2016-01-05 Thread Hal Rosenstock

On 1/5/2016 12:38 PM, Hefty, Sean wrote:
>> In ib_mad.h, ib_mad_snoop_handler uses send_buf rather than send_wr
> 
> The MAD snooping should be removed from the mad stack.  

This last discussed on linux-rdma list back in late September when Ira
posted a partial RFC patch to do this.

> There are no in tree users and the only attempt at adding one was rejected.

There are no in tree users of this but there is your madeye tool (which
is out of tree). This is still a useful debug tool for MADs and there
are people who still use that.

Ira posted the start of MAD stack tracing but AFAIT it was not
equivalent and that thread died out.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCHv3 TRIVIAL] IB/core: Documentation fix to ib_mad_snoop_handler in the MAD header file

2016-01-05 Thread Hal Rosenstock

In ib_mad.h, ib_mad_snoop_handler uses send_buf rather than send_wr

Signed-off-by: Hal Rosenstock 
---
Change since v2:
Changed title to use "higher" language

Change since v1:
Fixed typo in patch description

diff --git a/include/rdma/ib_mad.h b/include/rdma/ib_mad.h
index ec9b44d..2b3573d 100644
--- a/include/rdma/ib_mad.h
+++ b/include/rdma/ib_mad.h
@@ -424,11 +424,11 @@ typedef void (*ib_mad_send_handler)(struct ib_mad_agent 
*mad_agent,
 /**
  * ib_mad_snoop_handler - Callback handler for snooping sent MADs.
  * @mad_agent: MAD agent that snooped the MAD.
- * @send_wr: Work request information on the sent MAD.
+ * @send_buf: send MAD data buffer.
  * @mad_send_wc: Work completion information on the sent MAD.  Valid
  *   only for snooping that occurs on a send completion.
  *
- * Clients snooping MADs should not modify data referenced by the @send_wr
+ * Clients snooping MADs should not modify data referenced by the @send_buf
  * or @mad_send_wc.
  */
 typedef void (*ib_mad_snoop_handler)(struct ib_mad_agent *mad_agent,
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] IB/sysfs: Fix sparse warning on attr_id

2016-01-04 Thread Hal Rosenstock

On 1/3/2016 10:44 PM, ira.we...@intel.com wrote:
> From: Ira Weiny 
> 
> Attributed ID was declared as an int while the value should really be big
> endian 16.
> 
> Fixes: 35c4cbb17811 ("IB/core: Create get_perf_mad function in sysfs.c")
> 
> Reported-by: Bart Van Assche 
> Signed-off-by: Ira Weiny 

Reviewed-by: Hal Rosenstock 
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCHv2 TRIVIAL] IB/core: ib_mad.h ib_mad_snoop_handler documentation fix

2016-01-04 Thread Hal Rosenstock

ib_mad_snoop_handler uses send_buf rather than send_wr

Signed-off-by: Hal Rosenstock 
---
Change since v1:
Fixed typo in patch description

diff --git a/include/rdma/ib_mad.h b/include/rdma/ib_mad.h
index ec9b44d..2b3573d 100644
--- a/include/rdma/ib_mad.h
+++ b/include/rdma/ib_mad.h
@@ -424,11 +424,11 @@ typedef void (*ib_mad_send_handler)(struct ib_mad_agent 
*mad_agent,
 /**
  * ib_mad_snoop_handler - Callback handler for snooping sent MADs.
  * @mad_agent: MAD agent that snooped the MAD.
- * @send_wr: Work request information on the sent MAD.
+ * @send_buf: send MAD data buffer.
  * @mad_send_wc: Work completion information on the sent MAD.  Valid
  *   only for snooping that occurs on a send completion.
  *
- * Clients snooping MADs should not modify data referenced by the @send_wr
+ * Clients snooping MADs should not modify data referenced by the @send_buf
  * or @mad_send_wc.
  */
 typedef void (*ib_mad_snoop_handler)(struct ib_mad_agent *mad_agent,
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 2/2] IB/mad: use CQ abstraction

2016-01-04 Thread Hal Rosenstock

On 1/4/2016 9:16 AM, Christoph Hellwig wrote:
> Remove the local workqueue to process mad completions and use the CQ API
> instead.
> 
> Signed-off-by: Christoph Hellwig 

Reviewed-by: Hal Rosenstock 
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH TRIVIAL] IB/core: ib_mad.h ib_mad_snoop_handler documentation fix

2016-01-04 Thread Hal Rosenstock

ib_mad_snoop_handler ues send_buf rather than send_wr

Signed-off-by: Hal Rosenstock 
---
diff --git a/include/rdma/ib_mad.h b/include/rdma/ib_mad.h
index ec9b44d..2b3573d 100644
--- a/include/rdma/ib_mad.h
+++ b/include/rdma/ib_mad.h
@@ -424,11 +424,11 @@ typedef void (*ib_mad_send_handler)(struct ib_mad_agent 
*mad_agent,
 /**
  * ib_mad_snoop_handler - Callback handler for snooping sent MADs.
  * @mad_agent: MAD agent that snooped the MAD.
- * @send_wr: Work request information on the sent MAD.
+ * @send_buf: send MAD data buffer.
  * @mad_send_wc: Work completion information on the sent MAD.  Valid
  *   only for snooping that occurs on a send completion.
  *
- * Clients snooping MADs should not modify data referenced by the @send_wr
+ * Clients snooping MADs should not modify data referenced by the @send_buf
  * or @mad_send_wc.
  */
 typedef void (*ib_mad_snoop_handler)(struct ib_mad_agent *mad_agent,
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 1/2] IB/mad: pass ib_mad_send_buf explicitly to the recv_handler

2016-01-04 Thread Hal Rosenstock

On 1/4/2016 8:15 AM, Christoph Hellwig wrote:
> Stop abusing wr_id and just pass the parameter explicitly.
> 
> Signed-off-by: Christoph Hellwig 

Reviewed-by: Hal Rosenstock 

> ---
>  drivers/infiniband/core/cm.c  |  1 +
>  drivers/infiniband/core/mad.c | 18 ++
>  drivers/infiniband/core/sa_query.c|  7 ---
>  drivers/infiniband/core/user_mad.c|  1 +
>  drivers/infiniband/ulp/srpt/ib_srpt.c |  1 +
>  include/rdma/ib_mad.h |  2 ++
>  6 files changed, 19 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
> index e3a95d1..ad3726d 100644
> --- a/drivers/infiniband/core/cm.c
> +++ b/drivers/infiniband/core/cm.c
> @@ -3503,6 +3503,7 @@ int ib_cm_notify(struct ib_cm_id *cm_id, enum 
> ib_event_type event)
>  EXPORT_SYMBOL(ib_cm_notify);
>  
>  static void cm_recv_handler(struct ib_mad_agent *mad_agent,
> + struct ib_mad_send_buf *send_buf,
>   struct ib_mad_recv_wc *mad_recv_wc)
>  {
>   struct cm_port *port = mad_agent->context;
> diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
> index d4d2a61..cbe232a 100644
> --- a/drivers/infiniband/core/mad.c
> +++ b/drivers/infiniband/core/mad.c
> @@ -693,7 +693,7 @@ static void snoop_recv(struct ib_mad_qp_info *qp_info,
>  
>   atomic_inc(&mad_snoop_priv->refcount);
>   spin_unlock_irqrestore(&qp_info->snoop_lock, flags);
> - mad_snoop_priv->agent.recv_handler(&mad_snoop_priv->agent,
> + mad_snoop_priv->agent.recv_handler(&mad_snoop_priv->agent, NULL,
>  mad_recv_wc);
>   deref_snoop_agent(mad_snoop_priv);
>   spin_lock_irqsave(&qp_info->snoop_lock, flags);
> @@ -1994,9 +1994,9 @@ static void ib_mad_complete_recv(struct 
> ib_mad_agent_private *mad_agent_priv,
>   /* user rmpp is in effect
>* and this is an active RMPP MAD
>*/
> - mad_recv_wc->wc->wr_id = 0;
> - 
> mad_agent_priv->agent.recv_handler(&mad_agent_priv->agent,
> -mad_recv_wc);
> + mad_agent_priv->agent.recv_handler(
> + &mad_agent_priv->agent, NULL,
> + mad_recv_wc);
>   atomic_dec(&mad_agent_priv->refcount);
>   } else {
>   /* not user rmpp, revert to normal behavior and
> @@ -2010,9 +2010,10 @@ static void ib_mad_complete_recv(struct 
> ib_mad_agent_private *mad_agent_priv,
>   spin_unlock_irqrestore(&mad_agent_priv->lock, flags);
>  
>   /* Defined behavior is to complete response before 
> request */
> - mad_recv_wc->wc->wr_id = (unsigned long) 
> &mad_send_wr->send_buf;
> - 
> mad_agent_priv->agent.recv_handler(&mad_agent_priv->agent,
> -mad_recv_wc);
> + mad_agent_priv->agent.recv_handler(
> + &mad_agent_priv->agent,
> + &mad_send_wr->send_buf,
> + mad_recv_wc);
>   atomic_dec(&mad_agent_priv->refcount);
>  
>   mad_send_wc.status = IB_WC_SUCCESS;
> @@ -2021,7 +2022,7 @@ static void ib_mad_complete_recv(struct 
> ib_mad_agent_private *mad_agent_priv,
>   ib_mad_complete_send_wr(mad_send_wr, &mad_send_wc);
>   }
>   } else {
> - mad_agent_priv->agent.recv_handler(&mad_agent_priv->agent,
> + mad_agent_priv->agent.recv_handler(&mad_agent_priv->agent, NULL,
>  mad_recv_wc);
>   deref_mad_agent(mad_agent_priv);
>   }
> @@ -2762,6 +2763,7 @@ static void local_completions(struct work_struct *work)
>  IB_MAD_SNOOP_RECVS);
>   recv_mad_agent->agent.recv_handler(
>   &recv_mad_agent->agent,
> + &local->mad_send_wr->send_buf,
>   
> &local->ma

Re: [PATCH v2] IB/core: sysfs.c: Fix PerfMgt ClassPortInfo handling

2015-12-29 Thread Hal Rosenstock

On 12/29/2015 7:21 AM, Or Gerlitz wrote:
> On 12/29/2015 12:43 PM, Hal Rosenstock wrote:
>> This handles issue pointed out by Matan Barak 
>>
>> Fixes: 145d9c541032 ('IB/core: Display extended counter set if
>> available')
>>
>> Signed-off-by: Hal Rosenstock 
> again, remove the blank line after the fixes tag.
> 
> Also,  I am not that the way Doug is setting the branch for pull would
> preserve commit IDs when
> the offending patch landed in Linus tree. If this is the case, we should
> put your patch in 2nd pull
> request and have the right commit ID there. Please check with Doug.

Doug ?
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH v2] IB/core: sysfs.c: Fix PerfMgt ClassPortInfo handling

2015-12-29 Thread Hal Rosenstock


Port number is not part of ClassPortInfo attribute but is
still needed as a parameter when invoking process_mad.

To properly handle this attribute, port_num is added as a
parameter to get_counter_table and get_perf_mad was changed
not to store port_num in the attribute itself when it's
querying the ClassPortInfo attribute.

This handles issue pointed out by Matan Barak 

Fixes: 145d9c541032 ('IB/core: Display extended counter set if available')

Signed-off-by: Hal Rosenstock 
Acked-by: Matan Barak 
Acked-by: Ira Weiny 
---
Change from v1:
Added fixes line to description

diff --git a/drivers/infiniband/core/sysfs.c b/drivers/infiniband/core/sysfs.c
index 539040f..2daf832 100644
--- a/drivers/infiniband/core/sysfs.c
+++ b/drivers/infiniband/core/sysfs.c
@@ -438,7 +438,8 @@ static int get_perf_mad(struct ib_device *dev, int 
port_num, int attr,
in_mad->mad_hdr.method= IB_MGMT_METHOD_GET;
in_mad->mad_hdr.attr_id   = attr;
 
-   in_mad->data[41] = port_num;/* PortSelect field */
+   if (attr != IB_PMA_CLASS_PORT_INFO)
+   in_mad->data[41] = port_num;/* PortSelect field */
 
if ((dev->process_mad(dev, IB_MAD_IGNORE_MKEY,
 port_num, NULL, NULL,
@@ -714,11 +715,12 @@ err:
  * Figure out which counter table to use depending on
  * the device capabilities.
  */
-static struct attribute_group *get_counter_table(struct ib_device *dev)
+static struct attribute_group *get_counter_table(struct ib_device *dev,
+int port_num)
 {
struct ib_class_port_info cpi;
 
-   if (get_perf_mad(dev, 0, IB_PMA_CLASS_PORT_INFO,
+   if (get_perf_mad(dev, port_num, IB_PMA_CLASS_PORT_INFO,
&cpi, 40, sizeof(cpi)) >= 0) {
 
if (cpi.capability_mask && IB_PMA_CLASS_CAP_EXT_WIDTH)
@@ -776,7 +778,7 @@ static int add_port(struct ib_device *device, int port_num,
goto err_put;
}
 
-   p->pma_table = get_counter_table(device);
+   p->pma_table = get_counter_table(device, port_num);
ret = sysfs_create_group(&p->kobj, p->pma_table);
if (ret)
goto err_put_gid_attrs;
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] IB/core: sysfs.c: Fix PerfMgt ClassPortInfo handling

2015-12-28 Thread Hal Rosenstock


Port number is not part of ClassPortInfo attribute but is
still needed as a parameter when invoking process_mad.

To properly handle this attribute, port_num is added as a
parameter to get_counter_table and get_perf_mad was changed
not to store port_num in the attribute itself when it's
querying the ClassPortInfo attribute.

This handles issue pointed out by Matan Barak 

Signed-off-by: Hal Rosenstock 
Acked-by: Matan Barak 
---
diff --git a/drivers/infiniband/core/sysfs.c b/drivers/infiniband/core/sysfs.c
index 539040f..2daf832 100644
--- a/drivers/infiniband/core/sysfs.c
+++ b/drivers/infiniband/core/sysfs.c
@@ -438,7 +438,8 @@ static int get_perf_mad(struct ib_device *dev, int 
port_num, int attr,
in_mad->mad_hdr.method= IB_MGMT_METHOD_GET;
in_mad->mad_hdr.attr_id   = attr;
 
-   in_mad->data[41] = port_num;/* PortSelect field */
+   if (attr != IB_PMA_CLASS_PORT_INFO)
+   in_mad->data[41] = port_num;/* PortSelect field */
 
if ((dev->process_mad(dev, IB_MAD_IGNORE_MKEY,
 port_num, NULL, NULL,
@@ -714,11 +715,12 @@ err:
  * Figure out which counter table to use depending on
  * the device capabilities.
  */
-static struct attribute_group *get_counter_table(struct ib_device *dev)
+static struct attribute_group *get_counter_table(struct ib_device *dev,
+int port_num)
 {
struct ib_class_port_info cpi;
 
-   if (get_perf_mad(dev, 0, IB_PMA_CLASS_PORT_INFO,
+   if (get_perf_mad(dev, port_num, IB_PMA_CLASS_PORT_INFO,
&cpi, 40, sizeof(cpi)) >= 0) {
 
if (cpi.capability_mask && IB_PMA_CLASS_CAP_EXT_WIDTH)
@@ -776,7 +778,7 @@ static int add_port(struct ib_device *device, int port_num,
goto err_put;
}
 
-   p->pma_table = get_counter_table(device);
+   p->pma_table = get_counter_table(device, port_num);
ret = sysfs_create_group(&p->kobj, p->pma_table);
if (ret)
goto err_put_gid_attrs;
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v2 for-next 5/7] IB/mlx4: Add IB counters table

2015-12-25 Thread Hal Rosenstock

On 12/25/2015 9:50 AM, Hal Rosenstock wrote:
> On 12/24/2015 11:09 AM, Matan Barak wrote:
>> On Thu, Dec 24, 2015 at 4:07 PM, Matan Barak  
>> wrote:
>>> On Thu, Dec 24, 2015 at 2:38 PM, Or Gerlitz  wrote:
>>>> On 12/24/2015 12:42 PM, Sagi Grimberg wrote:
>>>>>
>>>>>
>>>>>>> This patch seems to generate a list corruption [1] when I test
>>>>>>> with Doug's for-4.5 tree. Eran, care to take a look at this?
>>>>>>
>>>>>>
>>>>>> This patch is part from a series that was introduced in 4.3-rc1 [1],
>>>>>
>>>>>
>>>>> Then something else broke it. Can people check their patches on doug's
>>>>> tree? At the moment it's unusable...
>>>>
>>>
>>> Leon and I have checked Doug's tree with mlx4_ib disabled and we
>>> didn't encounter any error.
>>> We ran ucmatose over IB connection (in mlx5) and it worked flawlessly.
>>>
>>>>
>>>> Yes, I checked the branch up to commit 882f3b3 "Merge branches
>>>> '4.5/Or-cleanup' and '4.5/rdma-cq' into k.o/for-4.5" and it works (rping,
>>>> ibv_rc_pingpong over top of mlx4 VPI)
>>>>
>>
>> Regarding mlx4, Eran and I analyzed it. We didn't test that, but it
>> seems like the bug is introduced in the 64bit counters test. Here's a
>> proposal:
>>
>> diff --git a/drivers/infiniband/core/sysfs.c 
>> b/drivers/infiniband/core/sysfs.c
>> index 539040f..8da3c83 100644
>> --- a/drivers/infiniband/core/sysfs.c
>> +++ b/drivers/infiniband/core/sysfs.c
>> @@ -714,11 +714,12 @@ err:
>>   * Figure out which counter table to use depending on
>>   * the device capabilities.
>>   */
>> -static struct attribute_group *get_counter_table(struct ib_device *dev)
>> +static struct attribute_group *get_counter_table(struct ib_device *dev,
>> +  int port_num)
>>  {
>> struct ib_class_port_info cpi;
>>
>> -   if (get_perf_mad(dev, 0, IB_PMA_CLASS_PORT_INFO,
>> + if (get_perf_mad(dev, port_num, IB_PMA_CLASS_PORT_INFO,
>> &cpi, 40, sizeof(cpi)) >= 0) {
> 
> Your proposal is similar to earlier version of Christoph's patch but was
> changed since ClassPortInfo attribute does not have PortSelect field
> like other PerfMgt attributes which is where this port num would be
> placed. In ClassPortInfo attribute, that location would be the
> ClassVersion field that would be set to port number in PerfMgt Get query.

In actuality, I don't think it really matters as this is a Get not a Set
and the PMA would do the right thing even if some field in the CPI were
stepped on.

> -- Hal
> 
>>
>> if (cpi.capability_mask && IB_PMA_CLASS_CAP_EXT_WIDTH)
>> @@ -776,7 +777,7 @@ static int add_port(struct ib_device *device, int 
>> port_num,
>> goto err_put;
>> }
>>
>> -   p->pma_table = get_counter_table(device);
>> + p->pma_table = get_counter_table(device, port_num);
>> ret = sysfs_create_group(&p->kobj, p->pma_table);
>> if (ret)
>> goto err_put_gid_attrs;
>>
>>
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
>>>> the body of a message to majord...@vger.kernel.org
>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
>> the body of a message to majord...@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v2 for-next 5/7] IB/mlx4: Add IB counters table

2015-12-25 Thread Hal Rosenstock

On 12/24/2015 11:09 AM, Matan Barak wrote:
> On Thu, Dec 24, 2015 at 4:07 PM, Matan Barak  
> wrote:
>> On Thu, Dec 24, 2015 at 2:38 PM, Or Gerlitz  wrote:
>>> On 12/24/2015 12:42 PM, Sagi Grimberg wrote:


>> This patch seems to generate a list corruption [1] when I test
>> with Doug's for-4.5 tree. Eran, care to take a look at this?
>
>
> This patch is part from a series that was introduced in 4.3-rc1 [1],


 Then something else broke it. Can people check their patches on doug's
 tree? At the moment it's unusable...
>>>
>>
>> Leon and I have checked Doug's tree with mlx4_ib disabled and we
>> didn't encounter any error.
>> We ran ucmatose over IB connection (in mlx5) and it worked flawlessly.
>>
>>>
>>> Yes, I checked the branch up to commit 882f3b3 "Merge branches
>>> '4.5/Or-cleanup' and '4.5/rdma-cq' into k.o/for-4.5" and it works (rping,
>>> ibv_rc_pingpong over top of mlx4 VPI)
>>>
> 
> Regarding mlx4, Eran and I analyzed it. We didn't test that, but it
> seems like the bug is introduced in the 64bit counters test. Here's a
> proposal:
> 
> diff --git a/drivers/infiniband/core/sysfs.c b/drivers/infiniband/core/sysfs.c
> index 539040f..8da3c83 100644
> --- a/drivers/infiniband/core/sysfs.c
> +++ b/drivers/infiniband/core/sysfs.c
> @@ -714,11 +714,12 @@ err:
>   * Figure out which counter table to use depending on
>   * the device capabilities.
>   */
> -static struct attribute_group *get_counter_table(struct ib_device *dev)
> +static struct attribute_group *get_counter_table(struct ib_device *dev,
> +  int port_num)
>  {
> struct ib_class_port_info cpi;
> 
> -   if (get_perf_mad(dev, 0, IB_PMA_CLASS_PORT_INFO,
> + if (get_perf_mad(dev, port_num, IB_PMA_CLASS_PORT_INFO,
> &cpi, 40, sizeof(cpi)) >= 0) {

Your proposal is similar to earlier version of Christoph's patch but was
changed since ClassPortInfo attribute does not have PortSelect field
like other PerfMgt attributes which is where this port num would be
placed. In ClassPortInfo attribute, that location would be the
ClassVersion field that would be set to port number in PerfMgt Get query.

-- Hal

> 
> if (cpi.capability_mask && IB_PMA_CLASS_CAP_EXT_WIDTH)
> @@ -776,7 +777,7 @@ static int add_port(struct ib_device *device, int 
> port_num,
> goto err_put;
> }
> 
> -   p->pma_table = get_counter_table(device);
> + p->pma_table = get_counter_table(device, port_num);
> ret = sysfs_create_group(&p->kobj, p->pma_table);
> if (ret)
> goto err_put_gid_attrs;
> 
> 
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
>>> the body of a message to majord...@vger.kernel.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 3/3] Display extended counter set if available

2015-12-24 Thread Hal Rosenstock

On 12/24/2015 11:22 AM, eran ben elisha wrote:
> On Mon, Dec 21, 2015 at 4:20 PM, Christoph Lameter  wrote:
>> V2->V3: Add check for NOIETF mode and create special table
>>   for that case.
>>
>> Check if the extended counters are available and if so
>> create the proper extended and additional counters.
>>
>> Reviewed-by: Hal Rosenstock 
>> Signed-off-by: Christoph Lameter 
>> ---
>>  drivers/infiniband/core/sysfs.c | 104 
>> +++-
>>  include/rdma/ib_pma.h   |   1 +
>>  2 files changed, 104 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/infiniband/core/sysfs.c 
>> b/drivers/infiniband/core/sysfs.c
>> index 34dcc23..b179fca 100644
>> --- a/drivers/infiniband/core/sysfs.c
>> +++ b/drivers/infiniband/core/sysfs.c
>> @@ -320,6 +320,13 @@ struct port_table_attribute port_pma_attr_##_name = {   
>>\
>> .attr_id = IB_PMA_PORT_COUNTERS ,   \
>>  }
>>
>> +#define PORT_PMA_ATTR_EXT(_name, _width, _offset)  \
>> +struct port_table_attribute port_pma_attr_ext_##_name = {  \
>> +   .attr  = __ATTR(_name, S_IRUGO, show_pma_counter, NULL),\
>> +   .index = (_offset) | ((_width) << 16),  \
>> +   .attr_id = IB_PMA_PORT_COUNTERS_EXT ,   \
>> +}
>> +
>>  /*
>>   * Get a Perfmgmt MAD block of data.
>>   * Returns error code or the number of bytes retrieved.
>> @@ -400,6 +407,11 @@ static ssize_t show_pma_counter(struct ib_port *p, 
>> struct port_attribute *attr,
>> ret = sprintf(buf, "%u\n",
>>   be32_to_cpup((__be32 *)data));
>> break;
>> +   case 64:
>> +   ret = sprintf(buf, "%llu\n",
>> +   be64_to_cpup((__be64 *)data));
>> +   break;
>> +
>> default:
>> ret = 0;
>> }
>> @@ -424,6 +436,18 @@ static PORT_PMA_ATTR(port_rcv_data , 13, 
>> 32, 224);
>>  static PORT_PMA_ATTR(port_xmit_packets , 14, 32, 256);
>>  static PORT_PMA_ATTR(port_rcv_packets  , 15, 32, 288);
>>
>> +/*
>> + * Counters added by extended set
>> + */
>> +static PORT_PMA_ATTR_EXT(port_xmit_data, 64,  64);
>> +static PORT_PMA_ATTR_EXT(port_rcv_data , 64, 128);
>> +static PORT_PMA_ATTR_EXT(port_xmit_packets , 64, 192);
>> +static PORT_PMA_ATTR_EXT(port_rcv_packets  , 64, 256);
>> +static PORT_PMA_ATTR_EXT(unicast_xmit_packets  , 64, 320);
>> +static PORT_PMA_ATTR_EXT(unicast_rcv_packets   , 64, 384);
>> +static PORT_PMA_ATTR_EXT(multicast_xmit_packets, 64, 448);
>> +static PORT_PMA_ATTR_EXT(multicast_rcv_packets , 64, 512);
>> +
>>  static struct attribute *pma_attrs[] = {
>> &port_pma_attr_symbol_error.attr.attr,
>> &port_pma_attr_link_error_recovery.attr.attr,
>> @@ -444,11 +468,65 @@ static struct attribute *pma_attrs[] = {
>> NULL
>>  };
>>
>> +static struct attribute *pma_attrs_ext[] = {
>> +   &port_pma_attr_symbol_error.attr.attr,
>> +   &port_pma_attr_link_error_recovery.attr.attr,
>> +   &port_pma_attr_link_downed.attr.attr,
>> +   &port_pma_attr_port_rcv_errors.attr.attr,
>> +   &port_pma_attr_port_rcv_remote_physical_errors.attr.attr,
>> +   &port_pma_attr_port_rcv_switch_relay_errors.attr.attr,
>> +   &port_pma_attr_port_xmit_discards.attr.attr,
>> +   &port_pma_attr_port_xmit_constraint_errors.attr.attr,
>> +   &port_pma_attr_port_rcv_constraint_errors.attr.attr,
>> +   &port_pma_attr_local_link_integrity_errors.attr.attr,
>> +   &port_pma_attr_excessive_buffer_overrun_errors.attr.attr,
>> +   &port_pma_attr_VL15_dropped.attr.attr,
>> +   &port_pma_attr_ext_port_xmit_data.attr.attr,
>> +   &port_pma_attr_ext_port_rcv_data.attr.attr,
>> +   &port_pma_attr_ext_port_xmit_packets.attr.attr,
>> +   &port_pma_attr_ext_port_rcv_packets.attr.attr,
>> +   &port_pma_attr_ext_unicast_rcv_packets.attr.attr,
>> +   &port_pma_attr_ext_unicast_xmit_packets.attr.attr,
>> +   &port_pma_attr_ext_multicast_rcv_packets.attr.attr,
>> +   &port_pma_attr_ext_multicast_xmit_packets.attr.attr,

Re: [PATCH 3/3] Display extended counter set if available

2015-12-21 Thread Hal Rosenstock

On 12/21/2015 2:47 PM, ira.weiny wrote:
> On Mon, Dec 21, 2015 at 01:31:31PM -0600, Christoph Lameter wrote:
>> On Mon, 21 Dec 2015, Hal Rosenstock wrote:
>>
>>>> Don't we need to change all the sysfs_remove_groups to use 
>>>> get_counter_table as
>>>> well?
>>>
>>> Looks like it to me too. Good catch.
>>
>> Fix follows:
>>
>> From: Christoph Lameter 
>> Subject: Fix sysfs entry removal by storing the table format in  pma_table
>>
>> Store the table being used in the ib_port structure and use it when sysfs
>> entries have to be removed.
>>
>> Signed-off-by: Christoph Lameter 
> 
> Reviewed-by: Ira Weiny 

Reviewed-by: Hal Rosenstock 

> 
>>
>> Index: linux/drivers/infiniband/core/sysfs.c
>> ===
>> --- linux.orig/drivers/infiniband/core/sysfs.c
>> +++ linux/drivers/infiniband/core/sysfs.c
>> @@ -47,6 +47,7 @@ struct ib_port {
>>  struct attribute_group gid_group;
>>  struct attribute_group pkey_group;
>>  u8 port_num;
>> +struct attribute_group *pma_table;
>>  };
>>
>>  struct port_attribute {
>> @@ -651,7 +652,8 @@ static int add_port(struct ib_device *de
>>  return ret;
>>  }
>>
>> -ret = sysfs_create_group(&p->kobj, get_counter_table(device));
>> +p->pma_table = get_counter_table(device);
>> +ret = sysfs_create_group(&p->kobj, p->pma_table);
>>  if (ret)
>>  goto err_put;
>>
>> @@ -710,7 +712,7 @@ err_free_gid:
>>  p->gid_group.attrs = NULL;
>>
>>  err_remove_pma:
>> -sysfs_remove_group(&p->kobj, &pma_group);
>> +sysfs_remove_group(&p->kobj, p->pma_table);
>>
>>  err_put:
>>  kobject_put(&p->kobj);
>> @@ -923,7 +925,7 @@ static void free_port_list_attributes(st
>>  list_for_each_entry_safe(p, t, &device->port_list, entry) {
>>  struct ib_port *port = container_of(p, struct ib_port, kobj);
>>  list_del(&p->entry);
>> -sysfs_remove_group(p, &pma_group);
>> +sysfs_remove_group(p, port->pma_table);
>>  sysfs_remove_group(p, &port->pkey_group);
>>  sysfs_remove_group(p, &port->gid_group);
>>  kobject_put(p);
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 3/3] Display extended counter set if available

2015-12-21 Thread Hal Rosenstock

On 12/21/2015 12:53 PM, ira.weiny wrote:
> On Mon, Dec 21, 2015 at 08:20:29AM -0600, Christoph Lameter wrote:
>> V2->V3: Add check for NOIETF mode and create special table
>>   for that case.
>>
>> Check if the extended counters are available and if so
>> create the proper extended and additional counters.
>>
>> Reviewed-by: Hal Rosenstock 
>> Signed-off-by: Christoph Lameter 
>> ---
>>  drivers/infiniband/core/sysfs.c | 104 
>> +++-
>>  include/rdma/ib_pma.h   |   1 +
>>  2 files changed, 104 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/infiniband/core/sysfs.c 
>> b/drivers/infiniband/core/sysfs.c
>> index 34dcc23..b179fca 100644
>> --- a/drivers/infiniband/core/sysfs.c
>> +++ b/drivers/infiniband/core/sysfs.c
>> @@ -320,6 +320,13 @@ struct port_table_attribute port_pma_attr_##_name = {   
>> \
>>  .attr_id = IB_PMA_PORT_COUNTERS ,   \
>>  }
>>  
>> +#define PORT_PMA_ATTR_EXT(_name, _width, _offset)   \
>> +struct port_table_attribute port_pma_attr_ext_##_name = {   \
>> +.attr  = __ATTR(_name, S_IRUGO, show_pma_counter, NULL),\
>> +.index = (_offset) | ((_width) << 16),  \
>> +.attr_id = IB_PMA_PORT_COUNTERS_EXT ,   \
>> +}
>> +
>>  /*
>>   * Get a Perfmgmt MAD block of data.
>>   * Returns error code or the number of bytes retrieved.
>> @@ -400,6 +407,11 @@ static ssize_t show_pma_counter(struct ib_port *p, 
>> struct port_attribute *attr,
>>  ret = sprintf(buf, "%u\n",
>>be32_to_cpup((__be32 *)data));
>>  break;
>> +case 64:
>> +ret = sprintf(buf, "%llu\n",
>> +be64_to_cpup((__be64 *)data));
>> +break;
>> +
>>  default:
>>  ret = 0;
>>  }
>> @@ -424,6 +436,18 @@ static PORT_PMA_ATTR(port_rcv_data  , 
>> 13, 32, 224);
>>  static PORT_PMA_ATTR(port_xmit_packets  , 14, 32, 256);
>>  static PORT_PMA_ATTR(port_rcv_packets   , 15, 32, 288);
>>  
>> +/*
>> + * Counters added by extended set
>> + */
>> +static PORT_PMA_ATTR_EXT(port_xmit_data , 64,  64);
>> +static PORT_PMA_ATTR_EXT(port_rcv_data  , 64, 128);
>> +static PORT_PMA_ATTR_EXT(port_xmit_packets  , 64, 192);
>> +static PORT_PMA_ATTR_EXT(port_rcv_packets   , 64, 256);
>> +static PORT_PMA_ATTR_EXT(unicast_xmit_packets   , 64, 320);
>> +static PORT_PMA_ATTR_EXT(unicast_rcv_packets, 64, 384);
>> +static PORT_PMA_ATTR_EXT(multicast_xmit_packets , 64, 448);
>> +static PORT_PMA_ATTR_EXT(multicast_rcv_packets  , 64, 512);
>> +
>>  static struct attribute *pma_attrs[] = {
>>  &port_pma_attr_symbol_error.attr.attr,
>>  &port_pma_attr_link_error_recovery.attr.attr,
>> @@ -444,11 +468,65 @@ static struct attribute *pma_attrs[] = {
>>  NULL
>>  };
>>  
>> +static struct attribute *pma_attrs_ext[] = {
>> +&port_pma_attr_symbol_error.attr.attr,
>> +&port_pma_attr_link_error_recovery.attr.attr,
>> +&port_pma_attr_link_downed.attr.attr,
>> +&port_pma_attr_port_rcv_errors.attr.attr,
>> +&port_pma_attr_port_rcv_remote_physical_errors.attr.attr,
>> +&port_pma_attr_port_rcv_switch_relay_errors.attr.attr,
>> +&port_pma_attr_port_xmit_discards.attr.attr,
>> +&port_pma_attr_port_xmit_constraint_errors.attr.attr,
>> +&port_pma_attr_port_rcv_constraint_errors.attr.attr,
>> +&port_pma_attr_local_link_integrity_errors.attr.attr,
>> +&port_pma_attr_excessive_buffer_overrun_errors.attr.attr,
>> +&port_pma_attr_VL15_dropped.attr.attr,
>> +&port_pma_attr_ext_port_xmit_data.attr.attr,
>> +&port_pma_attr_ext_port_rcv_data.attr.attr,
>> +&port_pma_attr_ext_port_xmit_packets.attr.attr,
>> +&port_pma_attr_ext_port_rcv_packets.attr.attr,
>> +&port_pma_attr_ext_unicast_rcv_packets.attr.attr,
>> +&port_pma_attr_ext_unicast_xmit_packets.attr.attr,
>> +&port_pma_attr_ext_multicast_rcv_packets.attr.attr,
>> +&port_pma_attr_ext_multicast_xmit_packets.attr.attr,
>> +NULL
>> +};
>> +
>> +static struct attribute *pma_attrs_noietf[] = {
>> +&port_pma_attr

Re: [PATCH 3/3] IB core: Display 64 bit counters from the extended set

2015-12-20 Thread Hal Rosenstock

On 12/20/2015 5:10 AM, Matan Barak wrote:
> On Mon, Dec 14, 2015 at 6:06 PM, Christoph Lameter  wrote:
>> On Mon, 14 Dec 2015, Matan Barak wrote:
>>
 No idea what the counter is doing. Saw another EXT counter implementation
 use 0 so I thought that was fine.
>>>
>>> It seems like a counter index, but I might be wrong though. If it is,
>>> don't we want to preserve the existing non-EXT schema for the new
>>> counters too?
>>
>> I do not see any use of that field so I am not sure what to put in there.
>> Could it be obsolete?
>>
> 
> I don't see any usage to that field either, but I think 0 is a bit misleading.
> So maybe it's more appropriate to delete this field.

counter number is gone in Christoph's latest patches in terms of the new
port extended counters but still present for the original port counters.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 3/3] bject: IB Core: Display extended counter set if available

2015-12-18 Thread Hal Rosenstock

On 12/18/2015 10:46 AM, Christoph Lameter wrote:
> Subject: IB core counters: Support noietf extended counters V2
> 
> V1-V2: Fix logic to detect when 64 bit counter are available
>   based on Hal's suggestions.
> 
> Detect if we have extended counters but not IETF counters.
> For that we need a special table and create a function that
> returns the table address.
> 
> Signed-off-by: Christoph Lameter 

Reviewed-by: Hal Rosenstock 
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 3/3] bject: IB Core: Display extended counter set if available

2015-12-18 Thread Hal Rosenstock

On 12/18/2015 8:39 AM, Christoph Lameter wrote:
> On Thu, 17 Dec 2015, Hal Rosenstock wrote:
> 
>>> +   if (cpi.capability_mask && IB_PMA_CLASS_CAP_EXT_WIDTH) {
>>> +   /* We have extended counters */
>>> +
>>> +   if (cpi.capability_mask && IB_PMA_CLASS_CAP_EXT_WIDTH_NOIETF)
>>> +   /* But not the IETF ones */
>>> +   return &pma_group_noietf;
>>
>> These 2 capability bits are mutually exclusive so I think it should be:
>>
>>  if (cpi.capability_mask && IB_PMA_CLASS_CAP_EXT_WIDTH) {
>>  /* We have extended counters */
>>  return &pma_group_ext;
>>  }
>>
>>  if (cpi.capability_mask && IB_PMA_CLASS_CAP_EXT_WIDTH_NOIETF)
>>  /* But not the IETF ones */
>>  return &pma_group_noietf;
> 
> This case would then use the 64 bit counters despite of the
> IB_PMA_CLASS_CAP_EXT_WIDTH not being set.

Yes, IB_PMA_CLASS_CAP_EXT_WIDTH means all extended counters including
IETF ones whereas IB_PMA_CLASS_CAP_EXT_WIDTH_NOIETF means extended
counters without IETF ones ([uni multi]cast [rcv xmit] pkts).

> 
>>  }
>>
>>  return &pma_group;
>>
>>> +
> 
> The tables contain all the counters each. So we would need another table
> of counters that has the ietf counters but not the 64 bit extended ones?

I'm not following what you mean. I thought pma_group_noietf and
pma_group_ext take care of the 2 different options for extended counters
(with the error counters needed from the mandatory PortCounters) and
pma_group is when neither of the extended counter capabilities is
indicated.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 3/3] bject: IB Core: Display extended counter set if available

2015-12-17 Thread Hal Rosenstock

On 12/17/2015 5:50 PM, Christoph Lameter wrote:
> On Thu, 17 Dec 2015, Hal Rosenstock wrote:
> 
>> On 12/17/2015 4:28 PM, Christoph Lameter wrote:
>>> So port_check_extended_counters need to return another value for this.
>>> The IETF ones are the uni/mcast xxx counters?
>>
>> Yes
> 
> Ok. Then this patch on top of the last one should give us all of what you
> want:
> 
> 
> 
> Subject: IB core counters: Support noietf extended counters
> 
> Detect if we have extended counters but not IETF counters.
> For that we need a special table and create a function that
> returns the table address.
> 
> Signed-off-by: Christoph Lameter 
> 
> Index: linux/drivers/infiniband/core/sysfs.c
> ===
> --- linux.orig/drivers/infiniband/core/sysfs.c
> +++ linux/drivers/infiniband/core/sysfs.c
> @@ -493,6 +493,26 @@ static struct attribute *pma_attrs_ext[]
>   NULL
>  };
> 
> +static struct attribute *pma_attrs_noietf[] = {
> + &port_pma_attr_symbol_error.attr.attr,
> + &port_pma_attr_link_error_recovery.attr.attr,
> + &port_pma_attr_link_downed.attr.attr,
> + &port_pma_attr_port_rcv_errors.attr.attr,
> + &port_pma_attr_port_rcv_remote_physical_errors.attr.attr,
> + &port_pma_attr_port_rcv_switch_relay_errors.attr.attr,
> + &port_pma_attr_port_xmit_discards.attr.attr,
> + &port_pma_attr_port_xmit_constraint_errors.attr.attr,
> + &port_pma_attr_port_rcv_constraint_errors.attr.attr,
> + &port_pma_attr_local_link_integrity_errors.attr.attr,
> + &port_pma_attr_excessive_buffer_overrun_errors.attr.attr,
> + &port_pma_attr_VL15_dropped.attr.attr,
> + &port_pma_attr_ext_port_xmit_data.attr.attr,
> + &port_pma_attr_ext_port_rcv_data.attr.attr,
> + &port_pma_attr_ext_port_xmit_packets.attr.attr,
> + &port_pma_attr_ext_port_rcv_packets.attr.attr,
> + NULL
> +};
> +
>  static struct attribute_group pma_group = {
>   .name  = "counters",
>   .attrs  = pma_attrs
> @@ -503,6 +523,11 @@ static struct attribute_group pma_group_
>   .attrs  = pma_attrs_ext
>  };
> 
> +static struct attribute_group pma_group_noietf = {
> + .name  = "counters",
> + .attrs  = pma_attrs_noietf
> +};
> +
>  static void ib_port_release(struct kobject *kobj)
>  {
>   struct ib_port *p = container_of(kobj, struct ib_port, kobj);
> @@ -576,23 +601,32 @@ err:
>  }
> 
>  /*
> - * Check if the port supports the Extended Counters.
> - * Return error code of 0 for success
> + * Figure out which counter table to use depending on
> + * the device capabilities.
>   */
> -static int port_check_extended_counters(struct ib_device *dev)
> +static struct attribute_group *get_counter_table(struct ib_device *dev)
>  {
>   int ret = 0;
>   struct ib_class_port_info cpi;
> 
>   ret = get_perf_mad(dev, 0, IB_PMA_CLASS_PORT_INFO, &cpi, 40, 
> sizeof(cpi));
> 
> - if (ret >= 0) {
> - if (!(cpi.capability_mask && IB_PMA_CLASS_CAP_EXT_WIDTH) &&
> - !(cpi.capability_mask && 
> IB_PMA_CLASS_CAP_EXT_WIDTH_NOIETF))
> - ret = -ENOSYS;
> + if (ret < 0)
> + /* Fall back to normal counters */
> + return &pma_group;
> +
> +
> + if (cpi.capability_mask && IB_PMA_CLASS_CAP_EXT_WIDTH) {
> + /* We have extended counters */
> +
> + if (cpi.capability_mask && IB_PMA_CLASS_CAP_EXT_WIDTH_NOIETF)
> + /* But not the IETF ones */
> + return &pma_group_noietf;

These 2 capability bits are mutually exclusive so I think it should be:

if (cpi.capability_mask && IB_PMA_CLASS_CAP_EXT_WIDTH) {
/* We have extended counters */
return &pma_group_ext;
}

if (cpi.capability_mask && IB_PMA_CLASS_CAP_EXT_WIDTH_NOIETF)
/* But not the IETF ones */
return &pma_group_noietf;
}

return &pma_group;

> +
> + return &pma_group_ext;
>   }
> 
> - return ret;
> + return &pma_group;
>  }
> 
>  static int add_port(struct ib_device *device, int port_num,
> @@ -623,11 +657,7 @@ static int add_port(struct ib_device *de
>   return ret;
>   }
> 
> - ret = sysfs_create_group(&p->kobj,
> - port_check_extended_counters(device) ?
> - &pma_group_ext :
> - &pma_group);
> -
> + ret = sysfs_create_group(&p->kobj, get_counter_table(device));
>   if (ret)
>   goto err_put;
> 
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 3/3] bject: IB Core: Display extended counter set if available

2015-12-17 Thread Hal Rosenstock

On 12/17/2015 4:28 PM, Christoph Lameter wrote:
> So port_check_extended_counters need to return another value for this.
> The IETF ones are the uni/mcast xxx counters?

Yes
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 1/3] IB Core: Create get_perf_mad function in sysfs.c

2015-12-17 Thread Hal Rosenstock

On 12/17/2015 2:52 PM, Christoph Lameter wrote:
> Create a new function to retrieve performance management
> data from the existing code in get_pma_counter().
> 
> Signed-off-by: Christoph Lameter 

Reviewed-by: Hal Rosenstock 
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 2/3] IB core counters: Specify attribute_id in port_table_attribute

2015-12-17 Thread Hal Rosenstock

On 12/17/2015 2:52 PM, Christoph Lameter wrote:
> Add the attr_id on port_table_attribute since we will have to add
> a different port_table_attribute for the extended attribute soon.
> 
> Signed-off-by: Christoph Lameter 

Reviewed-by: Hal Rosenstock 
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 3/3] bject: IB Core: Display extended counter set if available

2015-12-17 Thread Hal Rosenstock

On 12/17/2015 2:52 PM, Christoph Lameter wrote:
> Check if the extended counters are available and if so
> create the proper extended and additional counters.
> 
> Signed-off-by: Christoph Lameter 
> 
> Index: linux/drivers/infiniband/core/sysfs.c
> ===
> --- linux.orig/drivers/infiniband/core/sysfs.c
> +++ linux/drivers/infiniband/core/sysfs.c
> @@ -320,6 +320,13 @@ struct port_table_attribute port_pma_att
>   .attr_id = IB_PMA_PORT_COUNTERS ,   \
>  }
>  
> +#define PORT_PMA_ATTR_EXT(_name, _width, _offset)\
> +struct port_table_attribute port_pma_attr_ext_##_name = {\
> + .attr  = __ATTR(_name, S_IRUGO, show_pma_counter, NULL),\
> + .index = (_offset) | ((_width) << 16),  \
> + .attr_id = IB_PMA_PORT_COUNTERS_EXT ,   \
> +}
> +
>  /*
>   * Get a Perfmgmt MAD block of data.
>   * Returns error code or the number of bytes retrieved.
> @@ -349,7 +356,8 @@ static int get_perf_mad(struct ib_device
>   in_mad->mad_hdr.method= IB_MGMT_METHOD_GET;
>   in_mad->mad_hdr.attr_id   = attr;
>  
> - in_mad->data[41] = port_num;/* PortSelect field */
> + if (port_num)
> + in_mad->data[41] = port_num;/* PortSelect field */
>  
>   if ((dev->process_mad(dev, IB_MAD_IGNORE_MKEY,
>port_num, NULL, NULL,
> @@ -400,6 +408,11 @@ static ssize_t show_pma_counter(struct i
>   ret = sprintf(buf, "%u\n",
> be32_to_cpup((__be32 *)data));
>   break;
> + case 64:
> + ret = sprintf(buf, "%llu\n",
> + be64_to_cpup((__be64 *)data));
> + break;
> +
>   default:
>   ret = 0;
>   }
> @@ -424,6 +437,18 @@ static PORT_PMA_ATTR(port_rcv_data
>  static PORT_PMA_ATTR(port_xmit_packets   , 14, 32, 256);
>  static PORT_PMA_ATTR(port_rcv_packets, 15, 32, 288);
>  
> +/*
> + * Counters added by extended set
> + */
> +static PORT_PMA_ATTR_EXT(port_xmit_data  , 64,  64);
> +static PORT_PMA_ATTR_EXT(port_rcv_data   , 64, 128);
> +static PORT_PMA_ATTR_EXT(port_xmit_packets   , 64, 192);
> +static PORT_PMA_ATTR_EXT(port_rcv_packets, 64, 256);
> +static PORT_PMA_ATTR_EXT(unicast_xmit_packets, 64, 320);
> +static PORT_PMA_ATTR_EXT(unicast_rcv_packets , 64, 384);
> +static PORT_PMA_ATTR_EXT(multicast_xmit_packets  , 64, 448);
> +static PORT_PMA_ATTR_EXT(multicast_rcv_packets   , 64, 512);
> +
>  static struct attribute *pma_attrs[] = {
>   &port_pma_attr_symbol_error.attr.attr,
>   &port_pma_attr_link_error_recovery.attr.attr,
> @@ -444,11 +469,40 @@ static struct attribute *pma_attrs[] = {
>   NULL
>  };
>  
> +static struct attribute *pma_attrs_ext[] = {
> + &port_pma_attr_symbol_error.attr.attr,
> + &port_pma_attr_link_error_recovery.attr.attr,
> + &port_pma_attr_link_downed.attr.attr,
> + &port_pma_attr_port_rcv_errors.attr.attr,
> + &port_pma_attr_port_rcv_remote_physical_errors.attr.attr,
> + &port_pma_attr_port_rcv_switch_relay_errors.attr.attr,
> + &port_pma_attr_port_xmit_discards.attr.attr,
> + &port_pma_attr_port_xmit_constraint_errors.attr.attr,
> + &port_pma_attr_port_rcv_constraint_errors.attr.attr,
> + &port_pma_attr_local_link_integrity_errors.attr.attr,
> + &port_pma_attr_excessive_buffer_overrun_errors.attr.attr,
> + &port_pma_attr_VL15_dropped.attr.attr,
> + &port_pma_attr_ext_port_xmit_data.attr.attr,
> + &port_pma_attr_ext_port_rcv_data.attr.attr,
> + &port_pma_attr_ext_port_xmit_packets.attr.attr,
> + &port_pma_attr_ext_port_rcv_packets.attr.attr,
> + &port_pma_attr_ext_unicast_rcv_packets.attr.attr,
> + &port_pma_attr_ext_unicast_xmit_packets.attr.attr,
> + &port_pma_attr_ext_multicast_rcv_packets.attr.attr,
> + &port_pma_attr_ext_multicast_xmit_packets.attr.attr,
> + NULL
> +};
> +
>  static struct attribute_group pma_group = {
>   .name  = "counters",
>   .attrs  = pma_attrs
>  };
>  
> +static struct attribute_group pma_group_ext = {
> + .name  = "counters",
> + .attrs  = pma_attrs_ext
> +};
> +
>  static void ib_port_release(struct kobject *kobj)
>  {
>   struct ib_port *p = container_of(kobj, struct ib_port, kobj);
> @@ -521,6 +575,26 @@ err:
>   return NULL;
>  }
>  
> +/*
> + * Check if the port supports the Extended Counters.
> + * Return error code of 0 for success
> + */
> +static int port_check_extended_counters(struct ib_device *dev)
> +{
> + int ret = 0;
> + struct ib_class_port_info cpi;
> +
> + ret = get_perf_mad(dev, 0, IB_PMA_CLASS_PORT_INFO, &cpi, 40, 
> sizeof(cpi));
> +
> + if (ret >= 0) {
> + if (!(cpi.capability_mask && IB_PMA_CLASS_CAP_EXT_WIDTH) &&
> +

Re: [PATCH 3/3] IB core: Display 64 bit counters from the extended set

2015-12-17 Thread Hal Rosenstock

On 12/17/2015 2:21 PM, Christoph Lameter wrote:
> On Thu, 17 Dec 2015, Hal Rosenstock wrote:
> 
>>> I thought a port is always supplied since we get the info for a particular
>>> port and the directory only exists if there is a port?
>>
>> Yes, but there is no port (PortSelect) field in ClassPortInfo attribute
>> unlike the PortCounters and PortExtendedCounters attributes.
> 
> Ok but its valid for all ports on that class right? Then this does not
> matter?

It would be queried for each end port LID but should provide same
response on each port. Note that switch only has port 0 as end port.

It looks to me that in the query that the port supplied overwrites
ClassVersion field in supplied ClassPortInfo attribute which is MAD data
offset 41 where PortSelect goes. ClassPortInfo.ClassVersion is a RO
field and should be ignored by the PMA in the query and set properly in
the response so yes, you're right that it should not matter.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 3/3] IB core: Display 64 bit counters from the extended set

2015-12-17 Thread Hal Rosenstock



On 12/17/2015 1:54 PM, Christoph Lameter wrote:
> On Thu, 17 Dec 2015, Hal Rosenstock wrote:

>> ClassPortInfo is per class not per class per port so need to indicate to
>> get_mad whether a port is supplied or not or conditionalize based on
>> attr ID.
> 
> I thought a port is always supplied since we get the info for a particular
> port and the directory only exists if there is a port?

Yes, but there is no port (PortSelect) field in ClassPortInfo attribute
unlike the PortCounters and PortExtendedCounters attributes.

>>> -   ret = sysfs_create_group(&p->kobj, &pma_group);
>>> +   ret = sysfs_create_group(&p->kobj,
>>> +   port_check_extended_counters(device, port_num) ?
>>> +   &pma_group_ext :
>>> +   &pma_group);
>>
>> PortExtendedCounters does not have all the error counters in
>> PortCounters so this isn't an either or. When extended port counters are
>> supported should still include the original port counters with the
>> exception of the [xmit rcv] [pkts data] which should come from the
>> extended counters.
> 
> The original port counters are still included. The _ext table refers to
> both extended and regular counters.

Good; I missed that; sorry.

-- Hal
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 3/3] IB core: Display 64 bit counters from the extended set

2015-12-17 Thread Hal Rosenstock

On 12/16/2015 2:34 PM, Christoph Lameter wrote:
> On Wed, 16 Dec 2015, Christoph Lameter wrote:
> 
>> DRAFT: This is missing the check if this device supports
>> extended counters.
> 
> Found some time and here is the patch with the detection of the extended
> attribute through sending a mad request. Untested. Got the info on how
> to do the proper mad request from an earlier patch by Or in 2011.
> 
> 
> Subject: IB Core: Display extended counter set if available V2
> 
> Check if the extended counters are available and if so
> create the proper extended and additional counters.

Looks mostly good to me with some minor comments below.

> Signed-off-by: Christoph Lameter 
> 
> Index: linux/drivers/infiniband/core/sysfs.c
> ===
> --- linux.orig/drivers/infiniband/core/sysfs.c
> +++ linux/drivers/infiniband/core/sysfs.c
> @@ -39,6 +39,7 @@
>  #include 
> 
>  #include 
> +#include 
> 
>  struct ib_port {
>   struct kobject kobj;
> @@ -65,6 +66,7 @@ struct port_table_attribute {
>   struct port_attribute   attr;
>   charname[8];
>   int index;
> + int attr_id;
>  };
> 
>  static ssize_t port_attr_show(struct kobject *kobj,
> @@ -314,24 +316,33 @@ static ssize_t show_port_pkey(struct ib_
>  #define PORT_PMA_ATTR(_name, _counter, _width, _offset)  
> \
>  struct port_table_attribute port_pma_attr_##_name = {
> \
>   .attr  = __ATTR(_name, S_IRUGO, show_pma_counter, NULL),\
> - .index = (_offset) | ((_width) << 16) | ((_counter) << 24)  \
> + .index = (_offset) | ((_width) << 16) | ((_counter) << 24), \
> + .attr_id = IB_PMA_PORT_COUNTERS ,   \
>  }
> 
> -static ssize_t show_pma_counter(struct ib_port *p, struct port_attribute 
> *attr,
> - char *buf)
> +#define PORT_PMA_ATTR_EXT(_name, _width, _offset)\
> +struct port_table_attribute port_pma_attr_ext_##_name = {\
> + .attr  = __ATTR(_name, S_IRUGO, show_pma_counter, NULL),\
> + .index = (_offset) | ((_width) << 16),  \
> + .attr_id = IB_PMA_PORT_COUNTERS_EXT ,   \
> +}
> +
> +
> +/*
> + * Get a MAD block of data.

Nit: Get PerfMgt MAD block of data

> + * Returns error code or the number of bytes retrieved.
> + */
> +static int get_mad(struct ib_device *dev, int port_num, int attr,

Nit: Maybe this is too verbose but better name might be get_perf_mad

> + void *data, int offset, size_t size)
>  {
> - struct port_table_attribute *tab_attr =
> - container_of(attr, struct port_table_attribute, attr);
> - int offset = tab_attr->index & 0x;
> - int width  = (tab_attr->index >> 16) & 0xff;
> - struct ib_mad *in_mad  = NULL;
> - struct ib_mad *out_mad = NULL;
> + struct ib_mad *in_mad;
> + struct ib_mad *out_mad;
>   size_t mad_size = sizeof(*out_mad);
>   u16 out_mad_pkey_index = 0;
>   ssize_t ret;
> 
> - if (!p->ibdev->process_mad)
> - return sprintf(buf, "N/A (no PMA)\n");
> + if (!dev->process_mad)
> + return -ENOSYS;
> 
>   in_mad  = kzalloc(sizeof *in_mad, GFP_KERNEL);
>   out_mad = kmalloc(sizeof *out_mad, GFP_KERNEL);
> @@ -344,12 +355,12 @@ static ssize_t show_pma_counter(struct i
>   in_mad->mad_hdr.mgmt_class= IB_MGMT_CLASS_PERF_MGMT;
>   in_mad->mad_hdr.class_version = 1;
>   in_mad->mad_hdr.method= IB_MGMT_METHOD_GET;
> - in_mad->mad_hdr.attr_id   = cpu_to_be16(0x12); /* PortCounters */
> + in_mad->mad_hdr.attr_id   = attr;
> 
> - in_mad->data[41] = p->port_num; /* PortSelect field */
> + in_mad->data[41] = port_num;/* PortSelect field */
> 
> - if ((p->ibdev->process_mad(p->ibdev, IB_MAD_IGNORE_MKEY,
> -  p->port_num, NULL, NULL,
> + if ((dev->process_mad(dev, IB_MAD_IGNORE_MKEY,
> +  port_num, NULL, NULL,
>(const struct ib_mad_hdr *)in_mad, mad_size,
>(struct ib_mad_hdr *)out_mad, &mad_size,
>&out_mad_pkey_index) &
> @@ -358,31 +369,54 @@ static ssize_t show_pma_counter(struct i
>   ret = -EINVAL;
>   goto out;
>   }
> + memcpy(data, out_mad->data + offset, size);
> + ret = size;
> +out:
> + kfree(in_mad);
> + kfree(out_mad);
> + return ret;
> +}
> +
> +static ssize_t show_pma_counter(struct ib_port *p, struct port_attribute 
> *attr,
> + char *buf)
> +{
> + struct port_table_attribute *tab_attr =
> + container_of(attr, struct port_table_attribute, attr);
> + int offset = tab_attr->index & 0x;
> + int width  = (tab_attr->index >> 16) & 0xff;
> + ssize_t ret;
> + u8 data[8];
> +
> + ret = get_mad(p->ibdev, p->port_num, tab_attr->

Re: [PATCH 3/3] IB core: Display 64 bit counters from the extended set

2015-12-15 Thread Hal Rosenstock

On 12/15/2015 4:20 PM, Jason Gunthorpe wrote:
>> The unicast/multicast extended counters are not always supported -
>> > depends on setting of PerfMgt ClassPortInfo
>> > CapabilityMask.IsExtendedWidthSupportedNoIETF (bit 10).

> Yes.. certainly this proposed patch needs to account for that and
> continue to use the 32 bit ones in that case.

There are no 32 bit equivalents of those 4 "IETF" counters ([uni
multi]cast [xmit rcv] pkts).

When not supported, perhaps it is best not to populate these counters in
sysfs so one can discern between counter not supported and 0 value.

I'm still working on definitive mthca answer but think the attribute is
not supported there. Does anyone out there have an mthca setup where
they can try this ?
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 3/3] IB core: Display 64 bit counters from the extended set

2015-12-15 Thread Hal Rosenstock

On 12/15/2015 2:55 PM, Jason Gunthorpe wrote:
> On Tue, Dec 15, 2015 at 01:51:35PM -0600, Christoph Lameter wrote:
>> On Mon, 14 Dec 2015, Hal Rosenstock wrote:
>>
>>>> Mellanox should really confirm this for their hardware matrix.
>>>
>>> I am trying to get definitive answer to this.
>>
>> I was told today on a conf call with a couple of Mellanox employees that
>> extended counters are always available.
> 
> .. and every field is valid?
> 
> I would agree, from my observation, that the two main byte counters
> are always available.

The extended packet counts work but I thought there was a PMA with one
of the extended byte counts wired to 0.

> But not necessarily the others.

The unicast/multicast extended counters are not always supported -
depends on setting of PerfMgt ClassPortInfo
CapabilityMask.IsExtendedWidthSupportedNoIETF (bit 10).

-- Hal

> Jason 
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 3/3] IB core: Display 64 bit counters from the extended set

2015-12-14 Thread Hal Rosenstock

On 12/11/2015 7:00 PM, Jason Gunthorpe wrote:
>> qib, mlx4 are fine.  mlx5 should be as well I would think (I don't have that
>> hardware.)

I'm not 100% sure but I don't think that mthca supports the
PortCountersExtended attribute.

> I have no specifics to add, but I keep running into systems, even
> today, where the 64 bit counters don't work. The MAD might be there,
> but several counters are wired to 0.

I have seen this too :-(

> Not sure exactly which HW though.
> 
> Mellanox should really confirm this for their hardware matrix.

I am trying to get definitive answer to this.

-- Hal
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 1/1] Ibacm: default pkey for partitioned fabrics

2015-12-09 Thread Hal Rosenstock

On 12/9/2015 11:26 AM, Hefty, Sean wrote:
>>> I mean that the user needs to investigate why the fabric is not working
>> out of box.
>>
>> My point is that an educated admin should _know_ to configure in these
>> cases and that debug is only when things are broken not by default in
>> these more complex cases. This means the limitations of the out of box
>> approach needs to be explained in the docs.
> 
> When IP addresses are used, the corresponding pkey is used.  The issue this 
> patch is addressing is the mapping of 'hostnames' to pkeys, as show in this 
> ibacm_addr.cfg example:
> 
> #Name  device port pkey
> cst-lin0   mlx4_0 1default
> cst-lin0-1 mlx4_0 1default
> cst-lin0-2 mlx4_0 2default
> 
> Currently, 'default' is hard-coded to a pkey of 0x.  The intent is to 
> define a better default value.  Kaike has suggested this be the new default:
> 
> 1. Find the first non-management full-member pkey;

By "non-management full-member pkey", I think you mean "pkey which is
other than a full member of default (0x7fff) partition".

> 2. If it fails, find pkey 0x;

To me, finding pkey 0x is better/safer than assuming it's in index 0
although it's likely there.

> 3. If pkey 0x is not available, use the first pkey.
> 
> Is there better alternative for what default should be?  

Order of 1 and 2 depends on use models for full default partition and
other partitions. Reversing 1 and 2 (full default partition first) would
handle the most common use models and handles Jason's case.

The only common case that I'm aware of where that might fall down is in
the virtualized case. I'm not sure what policy is best there and would
need to think about that scenario some more (and there is more
fundamental issue with ACM in those environments).

> Jason was suggesting use pkey[0], which seems less robust in theory, but is 
> simple and may cover the vast majority of real use cases.

Seems less robust to me too.

> The fewer cases where manual configuration is necessary, the fewer emails I 
> receive, and the better off I am.  :)

Understood.

-- Hal

> - Sean
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 1/1] Ibacm: default pkey for partitioned fabrics

2015-12-09 Thread Hal Rosenstock

On 12/9/2015 10:04 AM, Wan, Kaike wrote:
> I mean that the user needs to investigate why the fabric is not working out 
> of box. 

My point is that an educated admin should _know_ to configure in these
cases and that debug is only when things are broken not by default in
these more complex cases. This means the limitations of the out of box
approach needs to be explained in the docs.

> This patch itself does not make configuration of the fabric harder.

Agreed.

> On the contrary, it relieves the user from having to configure ibacm on each 
> node in those common cases.

Agreed.

-- Hal
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 1/1] Ibacm: default pkey for partitioned fabrics

2015-12-09 Thread Hal Rosenstock

>> Unfortunately not. It comes down to whether the out of box cases outweigh
>> the debug when it's an exception case. The premise of this patch is that
>> that's the case.
> 
> I would argue that it does. Without this patch, ibacm will not work on secure 
> fabric out of box 
> (where "default" is interpret as 0x), and it will be equal or more likely 
> not to work by default 
> in more complicated configurations, where debugging is required anyway. 

Why is debugging required anyway in those more complicated cases ? It's
configuration that's required.

> This patch enables ibacm to work properly in most common configurations out 
> of box.

Agreed.

-- Hal

> Kaike
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 1/1] Ibacm: default pkey for partitioned fabrics

2015-12-09 Thread Hal Rosenstock

On 12/9/2015 8:55 AM, Wan, Kaike wrote:
> This is the best effort and it should work for most common configurations, 
> but may not work for more complicated cases.

Right, there are various scenarios where it will not work. This was one
of them but there are others I can think of.

> Any other suggestions?

Unfortunately not. It comes down to whether the out of box cases
outweigh the debug when it's an exception case. The premise of this
patch is that that's the case.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 1/1] Ibacm: default pkey for partitioned fabrics

2015-12-09 Thread Hal Rosenstock

On 12/9/2015 8:24 AM, Wan, Kaike wrote:
>> From: Hal Rosenstock [mailto:h...@dev.mellanox.co.il]
>> Sent: Wednesday, December 09, 2015 7:50 AM
>> To: Wan, Kaike; Hefty, Sean
>> Cc: linux-rdma@vger.kernel.org
>> Subject: Re: [PATCH 1/1] Ibacm: default pkey for partitioned fabrics
>>
>> On 12/8/2015 12:33 PM, kaike@intel.com wrote:
>>> From: Kaike Wan 
>>>
>>> In an insecure IB fabric, the default pkey in a port is 0x, where
>>> each node is allowed to talk to any other node in the fabric,
>>> including the SA node. However, in a secure fabric, to limit member
>>> access, not all nodes can have the full-member default pkey 0x. A
>>> typical configuration is to let SA node have pkey 0x while all
>>> other nodes have pkey 0x7fff; in addition, each node can be assigned
>>> some other full-member pkeys, such as
>>> 0x8001 and 0x8002, so that it can be assigned to different partitions.
>>> In this case, each node can access SA, and yet limits its other access
>>> to only those nodes in its assigned partitions. In such a secure
>>> fabric, however, ibacm will not work by interpreting "default" in its
>>> default address file as 0x.
>>>
>>> To solve the problem, this patch introduces the following priority to
>>> interpret default pkey:
>>> 1. Find the first non-management full-member pkey; 2. If it fails,
>>> find pkey 0x; 3. If pkey 0x is not available, use the first
>>> pkey.
>>> This approach will work in both securely and insecurely partitions
>>> fabrics.
>>
>> Shouldn't the pkey to be used for such interACM communication be
>> configured ?
> Yes. The purpose of this patch is only to make a secure system work out of 
> box (default configuration). When a specific pkey is given in the 
> ibacm_addr.cfg file, there will be no need to interpret the "default" pkey.
> 
>> First full member pkey is non-deterministic. Isn't it the case that
>> it may not include proper set of ACMs to communicate with ?
> 
> This is only for the default configuration, where a reasonable assumption is 
> that members of an intended 
> partition (group of ports) will all have the same full-member pkey.

Yes, but it may not be first (lowest index) pkey in table of different
ports.

> One could argue that a port could have two or more full-member non-management 
> pkeys because
> it is assigned to multiple partitions. 

Yes, that's a perfectly valid configuration.

> In this case, the port will only join only one multicast group, not all the 
> multicast groups. The reply is 
> that the default ibacm_addr.cfg have only one endpoint with pkey "default" 
> anyway.

In this case, the non default partitions are not useful for ACM and all
ACMs need to share "default" partition.

> To make it really work, one needs to edit ibacm_addr.cfg.

It may work without config depending on a number of factors but can
cause issues to be debugged.

Only sure way is config :-(

-- Hal

> Kaike
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 1/1] Ibacm: default pkey for partitioned fabrics

2015-12-09 Thread Hal Rosenstock

On 12/8/2015 7:26 PM, Hefty, Sean wrote:
>>> +* Determine the default pkey for parsing address file as well.
>>> +*   order of preference: first full-member non-management pkey,
>>> +*   0x, first pkey.
>>> +*/
>>
>> This really should just be the 0 index pkey, which exactly matches how
>> IPoIB determines the default pkey, which is what matters when talking
>> rdmacm..
> 
> Ibacm currently hard-codes the 'default' pkey to 0x for ibacm <-> ibacm 
> communication.   
> If there's no disagreement to switching to pkey[0], I'm fine with that.  

There's no IBA requirement that 0x pkey is always in index 0. It's
only a requirement on device bootup with non volatile storage per C10-123.

Furthermore, there's no requirement that 0x pkey is present in the
table. It may be that there's only 0x7fff pkey.

> I did not realize that ipoib uses this same default.

I think this is a problem in limiting partition use and should be
changed. Rather than assuming index 0, pkey table should be searched for
this pkey.

-- Hal
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 1/1] Ibacm: default pkey for partitioned fabrics

2015-12-09 Thread Hal Rosenstock

On 12/8/2015 4:21 PM, Jason Gunthorpe wrote:
> On Tue, Dec 08, 2015 at 12:33:02PM -0500, kaike@intel.com wrote:
>> From: Kaike Wan 
>>
>> In an insecure IB fabric, the default pkey in a port is 0x, where each
>> node is allowed to talk to any other node in the fabric, including the SA
>> node. However, in a secure fabric, to limit member access, not all nodes
>> can have the full-member default pkey 0x. A typical configuration is
>> to let SA node have pkey 0x while all other nodes have pkey 0x7fff; in
>> addition, each node can be assigned some other full-member pkeys, such as
>> 0x8001 and 0x8002, so that it can be assigned to different partitions.
>> In this case, each node can access SA, and yet limits its other access to
>> only those nodes in its assigned partitions. In such a secure fabric,
>> however, ibacm will not work by interpreting "default" in its default
>> address file as 0x.
> 
> ipoib always uses the 0 pkey index to create the default ipoib
> interface. (see eg, update_parent_pkey)

This is beyond IBA spec and is currently a linux convention for IPoIB.
IMO it should be changed to search for this pkey rather than assume it's
in index 0. There's no requirement that it be in index 0 other than at
bootup with non volatile storage (C10-123).

> When operating securely the SA should place the pkey for default ipoib
> operation in pkey index 0, and place 0x7FFF in another index. I run
> alot of networks exactly like this and it works very well.

Yes, it can run that way but more secure is without the full default
pkey. When full default pkey is in every port, the rest of the
partitioning doesn't really matter...

-- Hal
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 1/1] Ibacm: default pkey for partitioned fabrics

2015-12-09 Thread Hal Rosenstock

On 12/8/2015 12:33 PM, kaike@intel.com wrote:
> From: Kaike Wan 
> 
> In an insecure IB fabric, the default pkey in a port is 0x, where each
> node is allowed to talk to any other node in the fabric, including the SA
> node. However, in a secure fabric, to limit member access, not all nodes
> can have the full-member default pkey 0x. A typical configuration is
> to let SA node have pkey 0x while all other nodes have pkey 0x7fff; in
> addition, each node can be assigned some other full-member pkeys, such as
> 0x8001 and 0x8002, so that it can be assigned to different partitions.
> In this case, each node can access SA, and yet limits its other access to
> only those nodes in its assigned partitions. In such a secure fabric,
> however, ibacm will not work by interpreting "default" in its default
> address file as 0x.
> 
> To solve the problem, this patch introduces the following priority to
> interpret default pkey:
> 1. Find the first non-management full-member pkey;
> 2. If it fails, find pkey 0x;
> 3. If pkey 0x is not available, use the first pkey.
> This approach will work in both securely and insecurely partitions
> fabrics.

Shouldn't the pkey to be used for such interACM communication be
configured ? First full member pkey is non-deterministic. Isn't it the
case that it may not include proper set of ACMs to communicate with ?

-- Hal
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH v2] IB/mad: In validate_mad, validate CM send method for attributes other than ClassPortInfo

2015-11-13 Thread Hal Rosenstock


Receipt of CM MAD with other than the Send method for an attribute
other than the ClassPortInfo attribute is invalid.

CM attributes other than ClassPortInfo only use the send method.

The SRP initiator does not maintain a timeout policy for CM connect
requests relies on the CM layer to do that. The result was that
the SRP initiator hung as the connect request never completed.

A new SRP target has been observed to respond to Send CM REQ
with GetResp of CM REQ with bad status. This is non conformant
with IBA spec but exposes a vulnerability in the current MAD/CM
code which will respond to the incoming GetResp of CM REQ as if
it was a valid incoming Send of CM REQ rather than tossing
this on the floor. It also causes the MAD layer not to
retransmit the original REQ even though it has not received a REP.

Reviewed-by: Sagi Grimberg 
Signed-off-by: Hal Rosenstock 
---
Changes since v1:
Removed ClassPortInfo method validation

 drivers/infiniband/core/mad.c |5 +
 include/rdma/ib_mad.h |2 ++
 2 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
index 8d8af7a..2281de1 100644
--- a/drivers/infiniband/core/mad.c
+++ b/drivers/infiniband/core/mad.c
@@ -1811,6 +1811,11 @@ static int validate_mad(const struct ib_mad_hdr *mad_hdr,
if (qp_num == 0)
valid = 1;
} else {
+   /* CM attributes other than ClassPortInfo only use Send method 
*/
+   if ((mad_hdr->mgmt_class == IB_MGMT_CLASS_CM) &&
+   (mad_hdr->attr_id != IB_MGMT_CLASSPORTINFO_ATTR_ID) &&
+   (mad_hdr->method != IB_MGMT_METHOD_SEND))
+   goto out;
/* Filter GSI packets sent to QP0 */
if (qp_num != 0)
valid = 1;
diff --git a/include/rdma/ib_mad.h b/include/rdma/ib_mad.h
index 188df91..ec9b44d 100644
--- a/include/rdma/ib_mad.h
+++ b/include/rdma/ib_mad.h
@@ -237,6 +237,8 @@ struct ib_vendor_mad {
u8  data[IB_MGMT_VENDOR_DATA];
 };
 
+#define IB_MGMT_CLASSPORTINFO_ATTR_ID  cpu_to_be16(0x0001)
+
 struct ib_class_port_info {
u8  base_version;
u8  class_version;
-- 
1.7.8.2


--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] IB/mad: In validate_mad, validate CM method and attribute

2015-11-13 Thread Hal Rosenstock

On 11/12/2015 3:30 PM, Hefty, Sean wrote:
>>> +   /* CM attributes other than ClassPortInfo only use Send method
>> */
>>> +   if (mad_hdr->mgmt_class == IB_MGMT_CLASS_CM) {
>>> +   if (mad_hdr->attr_id != IB_MGMT_CLASSPORTINFO_ATTR_ID) {
>>> +   if (mad_hdr->method != IB_MGMT_METHOD_SEND)
>>> +   goto out;
>>> +   } else if (mad_hdr->method != IB_MGMT_METHOD_GET_RESP)
>>> +   goto out;
>>> +   }
>>
>> Doesn't this invalidate a CM Get(ClassPortInfo) mad?
> 
> I believe this does.  I think you could remove the else if clause and let the 
> received MAD get passed to the CM.  It would be dropped there as unsupported. 
>  The net result is likely the same.

Right now this is indeed the case as CPI is not supported in CM.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] IB/mad: In validate_mad, validate CM method and attribute

2015-11-13 Thread Hal Rosenstock

On 11/12/2015 5:57 PM, Hefty, Sean wrote:
>> IIRC responding to Get(CPI) is mandatory?
> 
> Maybe the drivers are responding?  I don't believe that the CM does.

Sorry for the confusion. linux kernel CM does not respond to Get(CPI).
I was just trying to add the validation code in for that since it was
being added for the remainder of the CM attributes even though CPI is
not currently supported there.

-- Hal
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] IB/mad: In validate_mad, validate CM method and attribute

2015-11-13 Thread Hal Rosenstock

IOn 11/12/2015 5:13 PM, Jason Gunthorpe wrote:
> On Thu, Nov 12, 2015 at 08:30:55PM +, Hefty, Sean wrote:
 +  /* CM attributes other than ClassPortInfo only use Send method
>>> */
 +  if (mad_hdr->mgmt_class == IB_MGMT_CLASS_CM) {
 +  if (mad_hdr->attr_id != IB_MGMT_CLASSPORTINFO_ATTR_ID) {
 +  if (mad_hdr->method != IB_MGMT_METHOD_SEND)
 +  goto out;
 +  } else if (mad_hdr->method != IB_MGMT_METHOD_GET_RESP)
 +  goto out;
 +  }
>>>
>>> Doesn't this invalidate a CM Get(ClassPortInfo) mad?
>>
>> I believe this does.

My bad. That wasn't my intention.

>> I think you could remove the else if clause
>> and let the received MAD get passed to the CM.  It would be dropped
>> there as unsupported.  The net result is likely the same.

I'm dropping else clause for CM CPI validation in next version of this
patch.

> IIRC responding to Get(CPI) is mandatory?

Yes it's mandatory to support CPI (and get method of CPI) on a class.
However, it's not currently done in CM.

-- Hal

> Jason
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] IB/mad: In validate_mad, validate CM method and attribute

2015-11-13 Thread Hal Rosenstock

On 11/12/2015 1:13 PM, ira.weiny wrote:
> On Thu, Nov 12, 2015 at 01:48:50PM +0200, Hal Rosenstock wrote:
>>
>> Receipt of CM MAD with response method for other than ClassPortInfo
>> attribute is invalid.
>>
>> CM attributes other than ClassPortInfo use send method only
>> and GetResp is valid for ClassPortInfo attribute.
>> Note also that the CM ClassPortInfo is not currently supported.
>>
>> The SRP initiator does not maintain a timeout policy for CM connect
>> requests relies on the CM layer to do that. The result was that
>> the SRP initiator hung as the connect request never completed.
>>
>> A new SRP target has been observed to respond to Send CM REQ
>> with GetResp of CM REQ with bad status. This is non conformant
>> with IBA spec but exposes a vulnerability in the current MAD/CM
>> code which will respond to the incoming GetResp of CM REQ as if
>> it was a valid incoming Send of CM REQ rather than tossing
>> this on the floor. It also causes the MAD layer not to
>> retransmit the original REQ even though it has not received a REP.
>>
>> Reviewed-by: Sagi Grimberg 
>> Signed-off-by: Hal Rosenstock 
>> ---
>>  drivers/infiniband/core/mad.c |8 
>>  include/rdma/ib_mad.h |2 ++
>>  2 files changed, 10 insertions(+), 0 deletions(-)
>>
>> diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
>> index 8d8af7a..e2d425f 100644
>> --- a/drivers/infiniband/core/mad.c
>> +++ b/drivers/infiniband/core/mad.c
>> @@ -1811,6 +1811,14 @@ static int validate_mad(const struct ib_mad_hdr 
>> *mad_hdr,
>>  if (qp_num == 0)
>>  valid = 1;
>>  } else {
>> +/* CM attributes other than ClassPortInfo only use Send method 
>> */
>> +if (mad_hdr->mgmt_class == IB_MGMT_CLASS_CM) {
>> +if (mad_hdr->attr_id != IB_MGMT_CLASSPORTINFO_ATTR_ID) {
>> +if (mad_hdr->method != IB_MGMT_METHOD_SEND)
>> +goto out;
>> +} else if (mad_hdr->method != IB_MGMT_METHOD_GET_RESP)
>> +goto out;
>> +}
> 
> Doesn't this invalidate a CM Get(ClassPortInfo) mad?  Is that the intention
> because it is not supported in the CM?

The intention was to future proof it for when it would be supported but
I misexecuted this as it does not currently handle get as you pointed
out. v2 of patch to follow...

> For the future it seems like these types of checks should be done at the class
> level.  But I'm not advocating that right now.

Yes, architecturally these sort of checks belong at the class level.
I started with the checks at the class level (in CM) but since CM relies
on the MAD retransmission mechanism, I didn't want to reimplement that
there so decided to violate the layering and put these (CM) class
specific checks in the mad module.

-- Hal

> Ira
> 
>>  /* Filter GSI packets sent to QP0 */
>>  if (qp_num != 0)
>>  valid = 1;
>> diff --git a/include/rdma/ib_mad.h b/include/rdma/ib_mad.h
>> index 188df91..ec9b44d 100644
>> --- a/include/rdma/ib_mad.h
>> +++ b/include/rdma/ib_mad.h
>> @@ -237,6 +237,8 @@ struct ib_vendor_mad {
>>  u8  data[IB_MGMT_VENDOR_DATA];
>>  };
>>  
>> +#define IB_MGMT_CLASSPORTINFO_ATTR_ID   cpu_to_be16(0x0001)
>> +
>>  struct ib_class_port_info {
>>  u8  base_version;
>>  u8  class_version;
>> -- 
>> 1.7.8.2
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
>> the body of a message to majord...@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] IB/mad: In validate_mad, validate CM method and attribute

2015-11-12 Thread Hal Rosenstock


Receipt of CM MAD with response method for other than ClassPortInfo
attribute is invalid.

CM attributes other than ClassPortInfo use send method only
and GetResp is valid for ClassPortInfo attribute.
Note also that the CM ClassPortInfo is not currently supported.

The SRP initiator does not maintain a timeout policy for CM connect
requests relies on the CM layer to do that. The result was that
the SRP initiator hung as the connect request never completed.

A new SRP target has been observed to respond to Send CM REQ
with GetResp of CM REQ with bad status. This is non conformant
with IBA spec but exposes a vulnerability in the current MAD/CM
code which will respond to the incoming GetResp of CM REQ as if
it was a valid incoming Send of CM REQ rather than tossing
this on the floor. It also causes the MAD layer not to
retransmit the original REQ even though it has not received a REP.

Reviewed-by: Sagi Grimberg 
Signed-off-by: Hal Rosenstock 
---
 drivers/infiniband/core/mad.c |8 
 include/rdma/ib_mad.h |2 ++
 2 files changed, 10 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
index 8d8af7a..e2d425f 100644
--- a/drivers/infiniband/core/mad.c
+++ b/drivers/infiniband/core/mad.c
@@ -1811,6 +1811,14 @@ static int validate_mad(const struct ib_mad_hdr *mad_hdr,
if (qp_num == 0)
valid = 1;
} else {
+   /* CM attributes other than ClassPortInfo only use Send method 
*/
+   if (mad_hdr->mgmt_class == IB_MGMT_CLASS_CM) {
+   if (mad_hdr->attr_id != IB_MGMT_CLASSPORTINFO_ATTR_ID) {
+   if (mad_hdr->method != IB_MGMT_METHOD_SEND)
+   goto out;
+   } else if (mad_hdr->method != IB_MGMT_METHOD_GET_RESP)
+   goto out;
+   }
/* Filter GSI packets sent to QP0 */
if (qp_num != 0)
valid = 1;
diff --git a/include/rdma/ib_mad.h b/include/rdma/ib_mad.h
index 188df91..ec9b44d 100644
--- a/include/rdma/ib_mad.h
+++ b/include/rdma/ib_mad.h
@@ -237,6 +237,8 @@ struct ib_vendor_mad {
u8  data[IB_MGMT_VENDOR_DATA];
 };
 
+#define IB_MGMT_CLASSPORTINFO_ATTR_ID  cpu_to_be16(0x0001)
+
 struct ib_class_port_info {
u8  base_version;
u8  class_version;
-- 
1.7.8.2

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH infiniband-diags] ibsendtrap.c: Eliminate unused parameter from send_trap

2015-11-10 Thread Hal Rosenstock


src/ibsendtrap.c: In function ?send_trap?:
src/ibsendtrap.c:124: warning: unused parameter ?name?

Signed-off-by: Hal Rosenstock 
---
diff --git a/src/ibsendtrap.c b/src/ibsendtrap.c
index 46fd6e7..659f2d2 100644
--- a/src/ibsendtrap.c
+++ b/src/ibsendtrap.c
@@ -121,8 +121,7 @@ static void build_trap129(ib_mad_notice_attr_t * n, 
ib_portid_t * port)
n->data_details.ntc_129_131.port_num = (uint8_t) error_port;
 }
 
-static int send_trap(const char *name,
-void (*build) (ib_mad_notice_attr_t *, ib_portid_t *))
+static int send_trap(void (*build) (ib_mad_notice_attr_t *, ib_portid_t *))
 {
ib_portid_t sm_port;
ib_portid_t selfportid;
@@ -169,7 +168,7 @@ int process_send_trap(char *trap_name)
 
for (i = 0; traps[i].trap_name; i++)
if (strcmp(traps[i].trap_name, trap_name) == 0)
-   return send_trap(trap_name, traps[i].build_func);
+   return send_trap(traps[i].build_func);
ibdiag_show_usage();
return 1;
 }
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH infiniband-diags] Remove unused pisize parameter from dump_portinfo in ibdiag_common

2015-11-10 Thread Hal Rosenstock


src/ibdiag_common.c: In function ?dump_portinfo?:
src/ibdiag_common.c:856: warning: unused parameter ?pisize?

Signed-off-by: Hal Rosenstock 
---
diff --git a/include/ibdiag_common.h b/include/ibdiag_common.h
index af5b3c4..21b0522 100644
--- a/include/ibdiag_common.h
+++ b/include/ibdiag_common.h
@@ -150,7 +150,7 @@ int vsnprint_field(char *buf, size_t n, enum MAD_FIELDS f, 
int spacing,
   const char *format, va_list va_args);
 int snprint_field(char *buf, size_t n, enum MAD_FIELDS f, int spacing,
  const char *format, ...);
-void dump_portinfo(void *pi, int pisize, int tabs);
+void dump_portinfo(void *pi, int tabs);
 
 /**
  * Some common command line parsing
diff --git a/src/ibdiag_common.c b/src/ibdiag_common.c
index 5ec0167..3ebdbb9 100644
--- a/src/ibdiag_common.c
+++ b/src/ibdiag_common.c
@@ -853,7 +853,7 @@ int snprint_field(char *buf, size_t n, enum MAD_FIELDS f, 
int spacing,
return ret;
 }
 
-void dump_portinfo(void *pi, int pisize, int tabs)
+void dump_portinfo(void *pi, int tabs)
 {
int field, i;
char val[64];
diff --git a/src/saquery.c b/src/saquery.c
index cc8d8dc..bd70b7b 100644
--- a/src/saquery.c
+++ b/src/saquery.c
@@ -304,7 +304,7 @@ static void dump_one_portinfo_record(void *data, struct 
query_params *p)
   "\t\tOptions.0x%x\n"
   "\tPortInfo dump:\n",
   cl_ntoh16(pir->lid), pir->port_num, pir->options);
-   dump_portinfo(pi, sizeof(*pi), 2);
+   dump_portinfo(pi, 2);
 }
 
 static void dump_one_mcmember_record(void *data, struct query_params *p)
diff --git a/src/smpquery.c b/src/smpquery.c
index 187ef61..2bd7132 100644
--- a/src/smpquery.c
+++ b/src/smpquery.c
@@ -161,7 +161,7 @@ static char *port_info(ib_portid_t * dest, char **argv, int 
argc)
return "port info query failed";
 
printf("# Port info: %s port %d\n", portid2str(dest), orig_portnum);
-   dump_portinfo(data, sizeof data, 0);
+   dump_portinfo(data, 0);
return 0;
 }
 
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH infiniband-diags] ibportstate.c: Fix unsigned comparison warnings

2015-11-10 Thread Hal Rosenstock


src/ibportstate.c:450: warning: comparison of unsigned expression < 0 is always 
false
src/ibportstate.c:454: warning: comparison of unsigned expression < 0 is always 
false
src/ibportstate.c:458: warning: comparison of unsigned expression < 0 is always 
false
src/ibportstate.c:462: warning: comparison of unsigned expression < 0 is always 
false
src/ibportstate.c:483: warning: comparison of unsigned expression < 0 is always 
false
src/ibportstate.c:500: warning: comparison of unsigned expression < 0 is always 
false
src/ibportstate.c:504: warning: comparison of unsigned expression < 0 is always 
false

Signed-off-by: Hal Rosenstock 
---
diff --git a/src/ibportstate.c b/src/ibportstate.c
index 47e9133..cb47aa9 100644
--- a/src/ibportstate.c
+++ b/src/ibportstate.c
@@ -447,40 +447,40 @@ int main(int argc, char **argv)
val = strtoull(argv[i], 0, 0);
switch (j) {
case SPEED:
-   if (val < 0 || val > 15)
+   if (val > 15)
IBEXIT("invalid speed value %ld", val);
break;
case ESPEED:
-   if (val < 0 || val > 31)
+   if (val > 31)
IBEXIT("invalid extended speed value 
%ld", val);
break;
case FDR10SPEED:
-   if (val < 0 || val > 1)
+   if (val > 1)
IBEXIT("invalid fdr10 speed value %ld", 
val);
break;
case WIDTH:
-   if (val < 0 || (val > 15 && val != 255))
+   if ((val > 15 && val != 255))
IBEXIT("invalid width value %ld", val);
break;
case VLS:
-   if (val <= 0 || val > 5)
+   if (val == 0 || val > 5)
IBEXIT("invalid vls value %ld", val);
break;
case MTU:
-   if (val <= 0 || val > 5)
+   if (val == 0 || val > 5)
IBEXIT("invalid mtu value %ld", val);
break;
case LID:
-   if (val <= 0 || val >= 0xC000)
+   if (val == 0 || val >= 0xC000)
IBEXIT("invalid lid value 0x%lx", val);
break;
case SMLID:
-   if (val <= 0 || val >= 0xC000)
+   if (val == 0 || val >= 0xC000)
IBEXIT("invalid smlid value 0x%lx",
val);
break;
case LMC:
-   if (val < 0 || val > 7)
+   if (val > 7)
IBEXIT("invalid lmc value %ld", val);
break;
case MKEY:
@@ -497,11 +497,11 @@ int main(int argc, char **argv)
/* All 64-bit values are legal */
break;
case MKEYLEASE:
-   if (val < 0 || val > 0x)
+   if (val > 0x)
IBEXIT("invalid mkey lease time %ld", 
val);
break;
case MKEYPROT:
-   if (val < 0 || val > 3)
+   if (val > 3)
IBEXIT("invalid mkey protection bit 
setting %ld", val);
}
*port_args[j].val = val;
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH libibcm] cmpost.c: Handle ibv_get_device_list returning no IB devices in init()

2015-11-09 Thread Hal Rosenstock

On 11/5/2015 7:46 PM, Hefty, Sean wrote:
> Merged - thanks.
> 
> This is the first patch against the libibcm in over 4 years.  Is there a 
> reason why this is being used instead of the librdmacm?  I ask because I 
> assumed that the libibcm was basically deprecated.  The last release was over 
> 6 years ago.

I started trying to use this test program to try to reproduce a CM bug.
More to follow after some additional verification.

-- Hal

> - Sean
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH libibcm] cmpost.c: Handle ibv_get_device_list returning no IB devices in init()

2015-11-01 Thread Hal Rosenstock


Also, print message when init fails

Signed-off-by: Hal Rosenstock 
---
diff --git a/examples/cmpost.c b/examples/cmpost.c
index 1b0edf8..f7833f5 100644
--- a/examples/cmpost.c
+++ b/examples/cmpost.c
@@ -447,6 +447,8 @@ static int init(void)
test.disconnects_left = connections;
 
dev_list = ibv_get_device_list(NULL);
+   if (!dev_list)
+   return -1;
test.device = dev_list[0];
if (!test.device)
return -1;
@@ -756,8 +758,10 @@ int main(int argc, char **argv)
}
 
is_server = (argc == 1);
-   if (init())
+   if (init()) {
+   printf("init failed\n");
exit(1);
+   }
 
if (is_server)
run_server();
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH rdma-RC] IB/cm: Fix sleeping while atomic when creating AH from WC

2015-10-20 Thread Hal Rosenstock



On 10/20/2015 11:57 AM, Hefty, Sean wrote:
>> Today, cm assumes paths are reversible primary_path->reversible = 1.
> 
> I can't quickly find a link, but I believe all CM MADs are reversible, per 
> the spec.

Spec citation for this is:

C12-5.1.3: All responses generated by the CM protocol shall follow the
rules for response generation that are enumerated in 13.5.4 Response
Generation and Reversible Paths.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] OpenSM: LFT update breaks if IB_SMP_DATA_SIZE changes

2015-10-13 Thread Hal Rosenstock

On 10/13/2015 9:25 AM, Jens Domke wrote:
> This is only a precautionary patch for a theoretical
> bug which would arise if someone redefines IB_SMP_DATA_SIZE
> to a values !=64.
> 
> ucast_mgr_pipeline_fwd_tbl() calculates the max. number of
> blocks to update using 64 explicitly, while set_lft_block()
> uses IB_SMP_DATA_SIZE.
> If IB_SMP_DATA_SIZE != 64 then switches would receive too few
> or too many blocks.
> 
> Signed-off-by: Jens Domke 

Thanks. Applied.

-- Hal
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH infiniband-diags] ibqueryerrors.c: Removed unused passed parameters to print_port_config and query_and_dump

2015-10-13 Thread Hal Rosenstock


src/ibqueryerrors.c: In function ?print_port_config?:
src/ibqueryerrors.c:176: warning: unused parameter ?node_name?
src/ibqueryerrors.c: In function ?query_and_dump?:
src/ibqueryerrors.c:367: warning: unused parameter ?node?

Signed-off-by: Hal Rosenstock 
---
diff --git a/src/ibqueryerrors.c b/src/ibqueryerrors.c
index 40a8ad1..304dc15 100644
--- a/src/ibqueryerrors.c
+++ b/src/ibqueryerrors.c
@@ -173,7 +173,7 @@ static int exceeds_threshold(int field, unsigned val)
return (val > thres);
 }
 
-static void print_port_config(char *node_name, ibnd_node_t * node, int portnum)
+static void print_port_config(ibnd_node_t * node, int portnum)
 {
char width[64], speed[64], state[64], physstate[64];
char remote_str[256];
@@ -364,7 +364,7 @@ Exit:
 }
 
 static int query_and_dump(char *buf, size_t size, ib_portid_t * portid,
- ibnd_node_t * node, char *node_name, int portnum,
+ char *node_name, int portnum,
  const char *attr_name, uint16_t attr_id,
  int start_field, int end_field)
 {
@@ -418,7 +418,7 @@ static int print_results(ib_portid_t * portid, char 
*node_name,
/* If there are PortXmitDiscards, get details (if 
supported) */
if (i == IB_PC_XMT_DISCARDS_F && details) {
n += query_and_dump(str + n, sizeof(buf) - n, 
portid,
-   node, node_name, portnum,
+   node_name, portnum,
"PortXmitDiscardDetails",

IB_GSI_PORT_XMIT_DISCARD_DETAILS,
IB_PC_RCV_LOCAL_PHY_ERR_F,
@@ -426,7 +426,7 @@ static int print_results(ib_portid_t * portid, char 
*node_name,
/* If there are PortRcvErrors, get details (if 
supported) */
} else if (i == IB_PC_ERR_RCV_F && details) {
n += query_and_dump(str + n, sizeof(buf) - n, 
portid,
-   node, node_name, portnum,
+   node_name, portnum,
"PortRcvErrorDetails",

IB_GSI_PORT_RCV_ERROR_DETAILS,
IB_PC_XMT_INACT_DISC_F,
@@ -499,7 +499,7 @@ static int print_results(ib_portid_t * portid, char 
*node_name,
printf("   GUID 0x%" PRIx64 " port %d:%s\n",
   node->ports[portnum]->guid, portnum, str);
if (port_config)
-   print_port_config(node_name, node, portnum);
+   print_port_config(node, portnum);
summary.bad_ports++;
}
}
@@ -596,7 +596,7 @@ static int print_data_cnts(ib_portid_t * portid, uint16_t 
cap_mask,
printf("\n");
 
if (portnum != 0xFF && port_config)
-   print_port_config(node_name, node, portnum);
+   print_port_config(node, portnum);
 
return (0);
 }
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH infiniband-diags] ibcacheedit.c: Eliminate unused parameter passed to update_switchportguids

2015-10-13 Thread Hal Rosenstock


src/ibcacheedit.c: In function ?update_switchportguids?:
src/ibcacheedit.c:168: warning: unused parameter ?guid?

Signed-off-by: Hal Rosenstock 
---
diff --git a/src/ibcacheedit.c b/src/ibcacheedit.c
index 983c0ab..b520839 100644
--- a/src/ibcacheedit.c
+++ b/src/ibcacheedit.c
@@ -165,7 +165,7 @@ static int process_opt(void *context, int ch, char *optarg)
return 0;
 }
 
-static void update_switchportguids(ibnd_node_t *node, uint64_t guid)
+static void update_switchportguids(ibnd_node_t *node)
 {
ibnd_port_t *port;
int p;
@@ -191,7 +191,7 @@ static void replace_node_guid(ibnd_node_t *node, void 
*user_data)
 * switches, so update port guids too
 */
if (node->type == IB_NODE_SWITCH)
-   update_switchportguids(node, guids->after);
+   update_switchportguids(node);
 
guids->found++;
}
@@ -229,7 +229,7 @@ static void replace_portguid(ibnd_node_t *node, void 
*user_data)
 */
if (node->guid == guids->before) {
node->guid = guids->after;
-   update_switchportguids(node, guids->after);
+   update_switchportguids(node);
guids->found++;
}
}
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH TRIVIAL infiniband-diags] ibdiag_common.c: Move static to beginning of get_build_version declaration

2015-10-12 Thread Hal Rosenstock


Eliminate compiler warning:
src/ibdiag_common.c:85: warning: ?static? is not at beginning of declaration

Signed-off-by: Hal Rosenstock 
---
diff --git a/src/ibdiag_common.c b/src/ibdiag_common.c
index e09623d..5424845 100644
--- a/src/ibdiag_common.c
+++ b/src/ibdiag_common.c
@@ -82,7 +82,7 @@ static const char **prog_examples;
 static struct option *long_opts = NULL;
 static const struct ibdiag_opt *opts_map[256];
 
-const static char *get_build_version(void)
+static const char *get_build_version(void)
 {
return "BUILD VERSION: " IBDIAG_VERSION " Build date: " __DATE__ " "
__TIME__;
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH infiniband-diags] perfquery.c: Fix smp_query_via return value checks

2015-10-12 Thread Hal Rosenstock


smp_query_via returns pointer so < 0 comparison is wrong:
src/perfquery.c: In function ?is_rsfec_mode_active?:
src/perfquery.c:481: warning: ordered comparison of pointer with integer zero
src/perfquery.c: In function ?main?:
src/perfquery.c:919: warning: ordered comparison of pointer with integer zero
src/perfquery.c:928: warning: ordered comparison of pointer with integer zero
 
Reported-by: David Binderman  
Signed-off-by: Hal Rosenstock 
---
Fix for OFA Bugzilla #2572

diff --git a/src/perfquery.c b/src/perfquery.c
index 9e3a307..948ce52 100644
--- a/src/perfquery.c
+++ b/src/perfquery.c
@@ -477,8 +477,8 @@ static uint8_t is_rsfec_mode_active(ib_portid_t * portid, 
int port,
return 0;
}
 
-   if (smp_query_via(data, portid, IB_ATTR_PORT_INFO_EXT, port, 0,
- srcport) < 0)
+   if (!smp_query_via(data, portid, IB_ATTR_PORT_INFO_EXT, port, 0,
+  srcport))
IBEXIT("smp query portinfo extended failed");
 
mad_decode_field(data, IB_PORT_EXT_CAPMASK_F, &pie_capmask);
@@ -915,8 +915,8 @@ int main(int argc, char **argv)
 
 
if (all_ports_loop || (loop_ports && (all_ports || port == ALL_PORTS))) 
{
-   if (smp_query_via(data, &portid, IB_ATTR_NODE_INFO, 0, 0,
- srcport) < 0)
+   if (!smp_query_via(data, &portid, IB_ATTR_NODE_INFO, 0, 0,
+  srcport))
IBEXIT("smp query nodeinfo failed");
node_type = mad_get_field(data, 0, IB_NODE_TYPE_F);
mad_decode_field(data, IB_NODE_NPORTS_F, &num_ports);
@@ -924,8 +924,8 @@ int main(int argc, char **argv)
IBEXIT("smp query nodeinfo: num ports invalid");
 
if (node_type == IB_NODE_SWITCH) {
-   if (smp_query_via(data, &portid, IB_ATTR_SWITCH_INFO,
- 0, 0, srcport) < 0)
+   if (!smp_query_via(data, &portid, IB_ATTR_SWITCH_INFO,
+  0, 0, srcport))
IBEXIT("smp query nodeinfo failed");
enhancedport0 =
mad_get_field(data, 0, IB_SW_ENHANCED_PORT0_F);
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/5 RFC] IB/mad: Add kernel trace to umad/mad

2015-10-05 Thread Hal Rosenstock

On 10/1/2015 12:51 AM, ira.we...@intel.com wrote:
> From: Ira Weiny 
> 
> This has mild testing and still has some FIXMEs etc.  I'm submitting to help
> decide if we wish to remove the MAD snooping infrastructure.

I think this has a ways to go before it has equivalent functionality to
madeye.

I'm also not sure that this is a replacement for MAD snooping. There's
some overlap but they are somewhat complementary right now.

IMO, at a minimum, the snooping interface should not be removed until
this has equivalent functionality.

> I believe this to be a more efficient debugging mechanism as it is most always
> available (if debugfs is available)

Snooping could be exposed to user space via user_mad and madeye could be
ported to user space.

> and should have limited performance impact when off.

This is true for snooping too.

-- Hal

> 
> Ira Weiny (5):
>   IB/MAD: add send path trace points
>   IB/MAD: add recv path trace point
>   IB/MAD Add agent trace points
>   IB/mad: Add umad trace points
>   IB/mad: Add SMP tracing
> 
>  drivers/infiniband/core/mad.c  |  72 +++
>  drivers/infiniband/core/user_mad.c |   5 +
>  include/trace/events/ib_mad.h  | 385 
> +
>  include/trace/events/ib_umad.h | 312 ++
>  4 files changed, 774 insertions(+)
>  create mode 100644 include/trace/events/ib_mad.h
>  create mode 100644 include/trace/events/ib_umad.h
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH TRIVIAL] ib_pack.h: Fix commentary IBA reference for CNP in IB opcode enum

2015-09-30 Thread Hal Rosenstock


IBA spec is now 1.3 not 3.1 and vol 1 should be mentioned
as there is also vol 2.

Signed-off-by: Hal Rosenstock 
---
diff --git a/include/rdma/ib_pack.h b/include/rdma/ib_pack.h
index 709a533..e99d8f9 100644
--- a/include/rdma/ib_pack.h
+++ b/include/rdma/ib_pack.h
@@ -76,7 +76,7 @@ enum {
IB_OPCODE_UC= 0x20,
IB_OPCODE_RD= 0x40,
IB_OPCODE_UD= 0x60,
-   /* per IBTA 3.1 Table 38, A10.3.2 */
+   /* per IBTA 1.3 vol 1 Table 38, A10.3.2 */
IB_OPCODE_CNP   = 0x80,
 
/* operations -- just used to define real constants */
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH RFC] IB/mad: remove obsolete snoop interface

2015-09-30 Thread Hal Rosenstock

On 9/30/2015 12:32 PM, Weiny, Ira wrote:
>>
>> On 9/30/2015 2:01 AM, ira.we...@intel.com wrote:
>>> From: Ira Weiny 
>>>
>>> This interface has no current users and is obsolete.
>>
>> There are no in tree users of this but there is Sean's madeye tool (which is 
>> out
>> of tree). This is still a useful debug tool for MADs.
> 
> Where is that?  

It's been out of tree for some time now. For one place, it can be found
on OpenFabrics site in some old and moldy OFED.

> I did not think it was supported any longer?

It still works AFAIK.

> I have a series of about 5 patches which implement tracing in the MAD stack 
> which I was working through and was going to submit 

Does your MAD stack tracing include the dumping and decode of sent and
received MADs ?

-- Hal

> along with this patch (those are still being tested in my spare time).  I 
> just wanted put this out here because of the patch Nicholas posted yesterday.
>>
>>> Signed-off-by: Ira Weiny 
>>> ---
>>>
>>> This has undergone a medium level of testing.  I have run it against
>>> mlx4, qib, and OPA hardware and it does not seem to cause any regressions.
>>>
>>>  drivers/infiniband/core/mad.c  | 226 
>>> +
>>>  drivers/infiniband/core/mad_priv.h |  13 ---
>>>  2 files changed, 5 insertions(+), 234 deletions(-)
>>
>> There are also snoop changes needed in include/rdma/ib_mad.h.
> 
> I'll look into it if there are no other objections to removing the snoop 
> interface.
>
>>
>> Is it correct to assume that OPA_CAP_MASK3_IsSnoopSupported in
>> opa_port_info.h is not related to this ?
> 
> Nope, not related.
> 
> Ira
> 
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH librdmacm] examples/r[io]stream.c: Better handling of rpoll errors during client async rconnect

2015-09-30 Thread Hal Rosenstock


Rather than silently exiting when connection is refused by server,
print error message (Connection refused) during async rconnect at client.

Signed-off-by: Hal Rosenstock 
---
diff --git a/examples/riostream.c b/examples/riostream.c
index c12dd0d..b6c9f86 100644
--- a/examples/riostream.c
+++ b/examples/riostream.c
@@ -470,8 +470,10 @@ static int client_connect(void)
fds.fd = rs;
fds.events = POLLOUT;
ret = do_poll(&fds, poll_timeout);
-   if (ret)
+   if (ret) {
+   perror("rpoll");
goto close;
+   }
 
len = sizeof err;
ret = rgetsockopt(rs, SOL_SOCKET, SO_ERROR, &err, &len);
diff --git a/examples/rstream.c b/examples/rstream.c
index d93e9aa..4b76bbc 100644
--- a/examples/rstream.c
+++ b/examples/rstream.c
@@ -469,8 +469,10 @@ static int client_connect(void)
fds.fd = rs;
fds.events = POLLOUT;
ret = do_poll(&fds, poll_timeout);
-   if (ret)
+   if (ret) {
+   perror("rpoll");
goto close;
+   }
 
len = sizeof err;
ret = rs_getsockopt(rs, SOL_SOCKET, SO_ERROR, &err, &len);
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH librdmacm] rstream.c: Use proper define for ai_flags before getaddrinfo call in client_connect

2015-09-30 Thread Hal Rosenstock


even though both have same value

Signed-off-by: Hal Rosenstock 
---
diff --git a/examples/rstream.c b/examples/rstream.c
index d93e9aa..ba5e449 100644
--- a/examples/rstream.c
+++ b/examples/rstream.c
@@ -421,7 +421,7 @@ static int client_connect(void)
rai_hints.ai_flags |= RAI_PASSIVE;
ret = rdma_getaddrinfo(src_addr, port, &rai_hints, 
&rai_src);
} else {
-   ai_hints.ai_flags |= RAI_PASSIVE;
+   ai_hints.ai_flags |= AI_PASSIVE;
ret = getaddrinfo(src_addr, port, &ai_hints, &ai_src);
}
if (ret) {
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH librdmacm] examples: Use gai_strerror rather than perror for [rdma_]getaddrinfo failures

2015-09-30 Thread Hal Rosenstock


[rdma_]getaddrinfo error codes are decoded by gai_strerror (and not
set in errno) so replace perror calls following these failed calls. 

Signed-off-by: Hal Rosenstock 
---
diff --git a/examples/cmatose.c b/examples/cmatose.c
index ab3e746..d7bd92d 100644
--- a/examples/cmatose.c
+++ b/examples/cmatose.c
@@ -509,7 +509,7 @@ static int run_server(void)
 
ret = get_rdma_addr(src_addr, dst_addr, port, &hints, &test.rai);
if (ret) {
-   perror("cmatose: getrdmaaddr error");
+   printf("cmatose: getrdmaaddr error: %s\n", gai_strerror(ret));
goto out;
}
 
@@ -582,7 +582,7 @@ static int run_client(void)
 
ret = get_rdma_addr(src_addr, dst_addr, port, &hints, &test.rai);
if (ret) {
-   perror("cmatose: getaddrinfo error");
+   printf("cmatose: getaddrinfo error: %s\n", gai_strerror(ret));
return ret;
}
 
diff --git a/examples/cmtime.c b/examples/cmtime.c
index ebc660b..e45980b 100644
--- a/examples/cmtime.c
+++ b/examples/cmtime.c
@@ -479,7 +479,7 @@ static int run_server(void)
 
ret = get_rdma_addr(src_addr, dst_addr, port, &hints, &rai);
if (ret) {
-   perror("getrdmaaddr error");
+   printf("getrdmaaddr error: %s\n", gai_strerror(ret));
goto out;
}
 
@@ -508,7 +508,7 @@ static int run_client(void)
 
ret = get_rdma_addr(src_addr, dst_addr, port, &hints, &rai);
if (ret) {
-   perror("getaddrinfo error");
+   printf("getaddrinfo error: %s\n", gai_strerror(ret));
return ret;
}
 
diff --git a/examples/mckey.c b/examples/mckey.c
index a6b5c4d..2032aa9 100644
--- a/examples/mckey.c
+++ b/examples/mckey.c
@@ -452,7 +452,7 @@ static int get_addr(char *dst, struct sockaddr *addr)
 
ret = getaddrinfo(dst, NULL, NULL, &res);
if (ret) {
-   printf("getaddrinfo failed - invalid hostname or IP address\n");
+   printf("getaddrinfo failed (%s) - invalid hostname or IP 
address\n", gai_strerror(ret));
return ret;
}
 
diff --git a/examples/rcopy.c b/examples/rcopy.c
index 152acef..085b017 100644
--- a/examples/rcopy.c
+++ b/examples/rcopy.c
@@ -186,7 +186,7 @@ static int server_listen(void)
hints.ai_flags = RAI_PASSIVE;
ret = getaddrinfo(NULL, port, &hints, &res);
if (ret) {
-   perror("getaddrinfo failed\n");
+   printf("getaddrinfo failed: %s\n", gai_strerror(ret));
return ret;
}
 
@@ -396,7 +396,7 @@ static int client_connect(void)
 
ret = getaddrinfo(dst_addr, port, NULL, &res);
if (ret) {
-   perror("getaddrinfo failed\n");
+   printf("getaddrinfo failed: %s\n", gai_strerror(ret));
return ret;
}
 
diff --git a/examples/rdma_client.c b/examples/rdma_client.c
index f676b70..d7f65f5 100644
--- a/examples/rdma_client.c
+++ b/examples/rdma_client.c
@@ -30,6 +30,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -55,7 +56,7 @@ static int run(void)
hints.ai_port_space = RDMA_PS_TCP;
ret = rdma_getaddrinfo(server, port, &hints, &res);
if (ret) {
-   perror("rdma_getaddrinfo");
+   printf("rdma_getaddrinfo: %s\n", gai_strerror(ret));
goto out;
}
 
diff --git a/examples/rdma_server.c b/examples/rdma_server.c
index 129cf42..d98d11a 100644
--- a/examples/rdma_server.c
+++ b/examples/rdma_server.c
@@ -57,7 +57,7 @@ static int run(void)
hints.ai_port_space = RDMA_PS_TCP;
ret = rdma_getaddrinfo(NULL, port, &hints, &res);
if (ret) {
-   perror("rdma_getaddrinfo");
+   printf("rdma_getaddrinfo: %s\n", gai_strerror(ret));
return ret;
}
 
diff --git a/examples/rdma_xclient.c b/examples/rdma_xclient.c
index 6510408..8dba266 100644
--- a/examples/rdma_xclient.c
+++ b/examples/rdma_xclient.c
@@ -30,6 +30,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -80,7 +81,7 @@ static int test(void)
 
ret = rdma_getaddrinfo(server, port, &hints, &res);
if (ret) {
-   perror("rdma_getaddrinfo");
+   printf("rdma_getaddrinfo: %s\n", gai_strerror(ret));
return ret;
}
 
diff --git a/examples/rdma_xserver.c b/examples/rdma_xserver.c
index d30c88e..69f170b 100644
--- a/examples/rdma_xserver.c
+++ b/examples/rdma_xserver.c
@@ -78,7 +78,7 @@ static int test(void)
 
ret = rdma_getaddrinfo(NULL, port, &hints, &res);
if (ret) {
-   perror(

Re: [PATCH RFC] IB/mad: remove obsolete snoop interface

2015-09-30 Thread Hal Rosenstock

On 9/30/2015 2:01 AM, ira.we...@intel.com wrote:
> From: Ira Weiny 
> 
> This interface has no current users and is obsolete.

There are no in tree users of this but there is Sean's madeye tool
(which is out of tree). This is still a useful debug tool for MADs.

> Signed-off-by: Ira Weiny 
> ---
> 
> This has undergone a medium level of testing.  I have run it against
> mlx4, qib, and OPA hardware and it does not seem to cause any regressions.
> 
>  drivers/infiniband/core/mad.c  | 226 
> +
>  drivers/infiniband/core/mad_priv.h |  13 ---
>  2 files changed, 5 insertions(+), 234 deletions(-)

There are also snoop changes needed in include/rdma/ib_mad.h.

Is it correct to assume that OPA_CAP_MASK3_IsSnoopSupported in
opa_port_info.h is not related to this ?

-- Hal
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH infiniband-diags] libibnetdisc: Avoid pushing same pointer to the hash table

2015-09-30 Thread Hal Rosenstock

From: Dan Ben Yosef 

This patch avoids pushing the same pointer to the hash table that causes
endless loop during dumping.

Signed-off-by: Dan Ben Yosef 
Signed-off-by: Hal Rosenstock 
---
 libibnetdisc/src/ibnetdisc.c   |   39 ---
 libibnetdisc/src/ibnetdisc_cache.c |   19 ++--
 libibnetdisc/src/internal.h|4 +-
 3 files changed, 53 insertions(+), 9 deletions(-)

diff --git a/libibnetdisc/src/ibnetdisc.c b/libibnetdisc/src/ibnetdisc.c
index e0f2d78..938a516 100644
--- a/libibnetdisc/src/ibnetdisc.c
+++ b/libibnetdisc/src/ibnetdisc.c
@@ -353,7 +353,12 @@ static int recv_port_info(smp_engine_t * engine, 
ibnd_smp_t * smp,
port->lmc = node->smalmc;
}
 
-   add_to_portguid_hash(port, f_int->fabric.portstbl);
+   int rc1 = add_to_portguid_hash(port, f_int->fabric.portstbl);
+   if (rc1)
+   IBND_ERROR("Error Occurred when trying"
+  " to insert new port guid 0x%016" PRIx64 " to DB\n",
+  port->guid);
+
add_to_portlid_hash(port, f_int->lid2guid);
 
if ((scan->cfg->flags & IBND_CONFIG_MLX_EPI)
@@ -458,7 +463,11 @@ static ibnd_node_t *create_node(smp_engine_t * engine, 
ib_portid_t * path,
rc->path_portid = *path;
memcpy(rc->info, node_info, sizeof(rc->info));
 
-   add_to_nodeguid_hash(rc, f_int->fabric.nodestbl);
+   int rc1 = add_to_nodeguid_hash(rc, f_int->fabric.nodestbl);
+   if (rc1)
+   IBND_ERROR("Error Occurred when trying"
+  " to insert new node guid 0x%016" PRIx64 " to DB\n",
+  rc->guid);
 
/* add this to the all nodes list */
rc->next = f_int->fabric.nodes;
@@ -607,20 +616,42 @@ ibnd_node_t *ibnd_find_node_dr(ibnd_fabric_t * fabric, 
char *dr_str)
return rc->node;
 }
 
-void add_to_nodeguid_hash(ibnd_node_t * node, ibnd_node_t * hash[])
+int add_to_nodeguid_hash(ibnd_node_t * node, ibnd_node_t * hash[])
 {
+   int rc = 0;
+   ibnd_node_t *tblnode;
int hash_idx = HASHGUID(node->guid) % HTSZ;
 
+   for (tblnode = hash[hash_idx]; tblnode; tblnode = tblnode->htnext) {
+   if (tblnode == node) {
+   IBND_ERROR("Duplicate Node: Node with guid 0x%016"
+  PRIx64 " already exists in nodes DB\n",
+  node->guid);
+   return 1;
+   }
+   }
node->htnext = hash[hash_idx];
hash[hash_idx] = node;
+   return rc;
 }
 
-void add_to_portguid_hash(ibnd_port_t * port, ibnd_port_t * hash[])
+int add_to_portguid_hash(ibnd_port_t * port, ibnd_port_t * hash[])
 {
+   int rc = 0;
+   ibnd_port_t *tblport;
int hash_idx = HASHGUID(port->guid) % HTSZ;
 
+   for (tblport = hash[hash_idx]; tblport; tblport = tblport->htnext) {
+   if (tblport == port) {
+   IBND_ERROR("Duplicate Port: Port with guid 0x%016"
+  PRIx64 " already exists in ports DB\n",
+  port->guid);
+   return 1;
+   }
+   }
port->htnext = hash[hash_idx];
hash[hash_idx] = port;
+   return rc;
 }
 
 void create_lid2guid(f_internal_t *f_int)
diff --git a/libibnetdisc/src/ibnetdisc_cache.c 
b/libibnetdisc/src/ibnetdisc_cache.c
index d4663bf..94dd004 100644
--- a/libibnetdisc/src/ibnetdisc_cache.c
+++ b/libibnetdisc/src/ibnetdisc_cache.c
@@ -515,7 +515,13 @@ static int _fill_port(ibnd_fabric_cache_t * fabric_cache, 
ibnd_node_t * node,
/* achu: needed if user wishes to re-cache a loaded fabric.
 * Otherwise, mostly unnecessary to do this.
 */
-   add_to_portguid_hash(port_cache->port, 
fabric_cache->f_int->fabric.portstbl);
+   int rc = add_to_portguid_hash(port_cache->port,
+ fabric_cache->f_int->fabric.portstbl);
+   if (rc) {
+   IBND_DEBUG("Error Occurred when trying"
+  " to insert new port guid 0x%016" PRIx64 " to DB\n",
+  port_cache->port->guid);
+   }
return 0;
 }
 
@@ -538,8 +544,15 @@ static int _rebuild_nodes(ibnd_fabric_cache_t * 
fabric_cache)
node->next = fabric_cache->f_int->fabric.nodes;
fabric_cache->f_int->fabric.nodes = node;
 
-   add_to_nodeguid_hash(node_cache->node,
-fabric_cache->f_int->fabric.nodestbl);
+   int rc = add_to_nodeguid_hash(node_cache->node,
+ fabric_cache->
+

[PATCH infiniband-diags] saquery.c: Fix saquery -D option

2015-09-10 Thread Hal Rosenstock

From: Vladimir Koushnir 
Date: Wed, 9 Sep 2015 14:29:38 +0300

-D and -list saquery options are operational only
when -N option is explicitly invoked (default option)

This patch allows using -D or --list options when no other option is
invoked.

Signed-off-by: Vladimir Koushnir 
Signed-off-by: Hal Rosenstock 
---
 src/saquery.c |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/src/saquery.c b/src/saquery.c
index 4343e33..db863f2 100644
--- a/src/saquery.c
+++ b/src/saquery.c
@@ -1591,6 +1591,7 @@ static int process_opt(void *context, int ch, char 
*optarg)
break;
case 'D':
node_print_desc = ALL_DESC;
+   command = SAQUERY_CMD_NODE_RECORD;
break;
case 'c':
command = SAQUERY_CMD_CLASS_PORT_INFO;
-- 
1.7.8.2

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 2/2] IB/core: Move SM class defines from ib_mad.h to ib_smi.h

2015-09-03 Thread Hal Rosenstock

On 9/2/2015 6:45 PM, ira.we...@intel.com wrote:
> From: Ira Weiny 
> 
> When the hfi1 driver was added these definitions were moved from the qib 
> driver
> to ib_mad.h to be used by both qib and hfi1.  They should have been moved to
> ib_smi.h instead.
> 
> Fixes: d4ab347005fb ("IB/core: Add core header changes needed for OPA")
> Signed-off-by: Ira Weiny 

Reviewed-by: Hal Rosenstock 

A couple of comment nits below.

> ---
>  drivers/infiniband/hw/qib/qib_ruc.c |  1 +
>  include/rdma/ib_mad.h   | 45 ---
>  include/rdma/ib_smi.h   | 47 
> +
>  3 files changed, 48 insertions(+), 45 deletions(-)
> 
> diff --git a/drivers/infiniband/hw/qib/qib_ruc.c 
> b/drivers/infiniband/hw/qib/qib_ruc.c
> index f42bd0f47577..22e356ca8058 100644
> --- a/drivers/infiniband/hw/qib/qib_ruc.c
> +++ b/drivers/infiniband/hw/qib/qib_ruc.c
> @@ -32,6 +32,7 @@
>   */
>  
>  #include 
> +#include 
>  
>  #include "qib.h"
>  #include "qib_mad.h"
> diff --git a/include/rdma/ib_mad.h b/include/rdma/ib_mad.h
> index 7f2cf855a0b6..188df91d5851 100644
> --- a/include/rdma/ib_mad.h
> +++ b/include/rdma/ib_mad.h
> @@ -144,31 +144,6 @@
>  #define IB_NOTICE_PROD_ROUTERcpu_to_be16(3)
>  #define IB_NOTICE_PROD_CLASS_MGR cpu_to_be16(4)
>  
> -/*
> - * Generic trap/notice numbers
> - */
> -#define IB_NOTICE_TRAP_LLI_THRESHcpu_to_be16(129)
> -#define IB_NOTICE_TRAP_EBO_THRESHcpu_to_be16(130)
> -#define IB_NOTICE_TRAP_FLOW_UPDATE   cpu_to_be16(131)
> -#define IB_NOTICE_TRAP_CAP_MASK_CHG  cpu_to_be16(144)
> -#define IB_NOTICE_TRAP_SYS_GUID_CHG  cpu_to_be16(145)
> -#define IB_NOTICE_TRAP_BAD_MKEY  cpu_to_be16(256)
> -#define IB_NOTICE_TRAP_BAD_PKEY  cpu_to_be16(257)
> -#define IB_NOTICE_TRAP_BAD_QKEY  cpu_to_be16(258)
> -
> -/*
> - * Generic trap/notice other local changes flags (trap 144).
> - */
> -#define IB_NOTICE_TRAP_LSE_CHG   0x04/* Link Speed Enable 
> changed */
> -#define IB_NOTICE_TRAP_LWE_CHG   0x02/* Link Width Enable 
> changed */
> -#define IB_NOTICE_TRAP_NODE_DESC_CHG 0x01
> -
> -/*
> - * Generic trap/notice M_Key volation flags in dr_trunc_hop (trap 256).
> - */
> -#define IB_NOTICE_TRAP_DR_NOTICE 0x80
> -#define IB_NOTICE_TRAP_DR_TRUNC  0x40
> -
>  enum {
>   IB_MGMT_MAD_HDR = 24,
>   IB_MGMT_MAD_DATA = 232,
> @@ -282,21 +257,6 @@ struct ib_class_port_info {
>   __be32  trap_qkey;
>  };
>  
> -struct ib_node_info {
> - u8 base_version;
> - u8 class_version;
> - u8 node_type;
> - u8 num_ports;
> - __be64 sys_guid;
> - __be64 node_guid;
> - __be64 port_guid;
> - __be16 partition_cap;
> - __be16 device_id;
> - __be32 revision;
> - u8 local_port_num;
> - u8 vendor_id[3];
> -} __packed;
> -
>  struct ib_mad_notice_attr {
>   u8 generic_type;
>   u8 prod_type_msb;
> @@ -361,11 +321,6 @@ struct ib_mad_notice_attr {
>   } details;
>  };
>  
> -struct ib_vl_weight_elem {
> - u8  vl; /* VL is low 5 bits, upper 3 bits reserved */
> - u8  weight;
> -};
> -
>  /**
>   * ib_mad_send_buf - MAD data buffer and work request for sends.
>   * @next: A pointer used to chain together MADs for posting.
> diff --git a/include/rdma/ib_smi.h b/include/rdma/ib_smi.h
> index 98b9086d769a..a39eecca7bec 100644
> --- a/include/rdma/ib_smi.h
> +++ b/include/rdma/ib_smi.h
> @@ -119,10 +119,57 @@ struct ib_port_info {
>   u8 link_roundtrip_latency[3];
>  };
>  
> +struct ib_node_info {
> + u8 base_version;
> + u8 class_version;
> + u8 node_type;
> + u8 num_ports;
> + __be64 sys_guid;
> + __be64 node_guid;
> + __be64 port_guid;
> + __be16 partition_cap;
> + __be16 device_id;
> + __be32 revision;
> + u8 local_port_num;
> + u8 vendor_id[3];
> +} __packed;
> +
> +struct ib_vl_weight_elem {
> + u8  vl; /* IB: VL is low 4 bits, upper 4 bits reserved */
> +/* OPA: VL is low 5 bits, upper 3 bits reserved */
> + u8  weight;
> +};
> +
>  static inline u8
>  ib_get_smp_direction(struct ib_smp *smp)
>  {
>   return ((smp->status & IB_SMP_DIRECTION) == IB_SMP_DIRECTION);
>  }
>  
> +/*
> + * SMI Trap/Notice numbers

Nit: SM class Trap/Notice numbers rather than SMI.

> + */
> +#define IB_NOTICE_TRAP_LLI_THRESHcpu_to_be16(129)
> +#define IB_NOTICE_TRAP_EBO_THRESHcpu_to_be16(130)
> +#define IB_NOTICE_TRAP_FLOW

Re: [PATCH 1/2] IB/core: Remove unnecessary defines from ib_mad.h

2015-09-03 Thread Hal Rosenstock

On 9/2/2015 6:45 PM, ira.we...@intel.com wrote:
> From: Ira Weiny 
> 
> Remove the unused IB_NOTICE_REPRESS_* defines.
> 
> When the hfi1 driver was added these definitions were moved from the qib 
> driver
> to ib_mad.h.  They should have been removed instead.
> 
> Fixes: d4ab347005fb ("IB/core: Add core header changes needed for OPA")
> Signed-off-by: Ira Weiny 

Reviewed-by: Hal Rosenstock 
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH infiniband-diags] perfquery: Fix timeout on nodes supporting RS_FEC capability

2015-09-01 Thread Hal Rosenstock

>From 0e8341980ca7133fe0a472fddc1c622f766b8f8e Mon Sep 17 00:00:00 2001
From: Dan Ben Yosef 
Date: Mon, 31 Aug 2015 10:49:04 +0300

perfquery -T (print Extended Speed Counters) times out on nodes
supporting RS_FEC capability.

Before sending ExtendedSpeedCounters PerfMgt GMP, perfquery sends
PortInfoExtended SMP to get CapMask and FECModeActive fields.
Upon reception of PortInfoExtended SMP, fec_mode_active field is
erroneously decoded to 16 bit buffer instead of 32 bit buffer.
As a result, memory corruption occurs and portid.dlid field is set to 0,
so that ExtendedSpeedCounters GMP is sent to wrong destination.

Signed-off-by: Dan Ben Yosef 
Signed-off-by: Hal Rosenstock 
---
 src/perfquery.c |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/perfquery.c b/src/perfquery.c
index 46451ef..9e3a307 100644
--- a/src/perfquery.c
+++ b/src/perfquery.c
@@ -469,7 +469,7 @@ static uint8_t is_rsfec_mode_active(ib_portid_t * portid, 
int port,
  uint16_t cap_mask)
 {
uint8_t data[IB_SMP_DATA_SIZE] = { 0 };
-   uint16_t fec_mode_active = 0;
+   uint32_t fec_mode_active = 0;
uint32_t pie_capmask = 0;
if (cap_mask & IS_PM_RSFEC_COUNTERS_SUP) {
if (!is_port_info_extended_supported(portid, port, srcport)) {
@@ -486,7 +486,7 @@ static uint8_t is_rsfec_mode_active(ib_portid_t * portid, 
int port,
 &fec_mode_active);
if((pie_capmask &
CL_NTOH32(IB_PORT_EXT_CAP_IS_FEC_MODE_SUPPORTED)) &&
-  (CL_NTOH16(IB_PORT_EXT_RS_FEC_MODE_ACTIVE) == 
fec_mode_active))
+  (CL_NTOH16(IB_PORT_EXT_RS_FEC_MODE_ACTIVE) == 
(fec_mode_active & 0x)))
return 1;
}
 
-- 
1.7.8.2

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 2/2] IB/ipoib: Suppress warning for send only join failures

2015-08-28 Thread Hal Rosenstock

On 8/27/2015 7:34 PM, Jason Gunthorpe wrote:
> On Wed, Aug 26, 2015 at 05:41:08AM -0400, Hal Rosenstock wrote:
>> On 8/25/2015 12:28 PM, Jason Gunthorpe wrote:
>>> On Tue, Aug 25, 2015 at 08:59:13AM -0400, Hal Rosenstock wrote:
>>>>> - if (mcast->logcount++ < 20) {
>>>>> - if (status == -ETIMEDOUT || status == -EAGAIN) {
>>>>> + bool silent_fail =
>>>>> + test_bit(IPOIB_MCAST_FLAG_SENDONLY, &mcast->flags) &&
>>>>> + status == -EINVAL;
>>>>
>>>> Aren't there other reasons that send only join might have EINVAL
>>>> indicated ?
>>>
>>> Not sure, the layers below all eat the detailed error code. Hopefully
>>> EINVAL isn't re-used.
>>
>> AFAIR there are a number of reasons EINVAL could occur here in which
>> case this makes this change overly silent. If so, this particular
>> failure case of send only join failure due to SM rejection (perhaps
>> ERR_REQ_INVALID SA status only)

I meant ERR_REQ_INSUFFICIENT_COMPONENTS here.

>> is best to be made unique and different
>> from the other current EINVAL failures here.
> 
> That is way to much to undertake just to silence this message.
>
> Unless you know the other EINVALs are likely to happen, I'd just
> ignore this imperfection.

That's probably the only reasonable choice in the short run :-(

-- Hal

> 
> Jason
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 2/2] IB/ipoib: Suppress warning for send only join failures

2015-08-26 Thread Hal Rosenstock

On 8/25/2015 12:28 PM, Jason Gunthorpe wrote:
> On Tue, Aug 25, 2015 at 08:59:13AM -0400, Hal Rosenstock wrote:
>>> -   if (mcast->logcount++ < 20) {
>>> -   if (status == -ETIMEDOUT || status == -EAGAIN) {
>>> +   bool silent_fail =
>>> +   test_bit(IPOIB_MCAST_FLAG_SENDONLY, &mcast->flags) &&
>>> +   status == -EINVAL;
>>
>> Aren't there other reasons that send only join might have EINVAL
>> indicated ?
> 
> Not sure, the layers below all eat the detailed error code. Hopefully
> EINVAL isn't re-used.

AFAIR there are a number of reasons EINVAL could occur here in which
case this makes this change overly silent. If so, this particular
failure case of send only join failure due to SM rejection (perhaps
ERR_REQ_INVALID SA status only) is best to be made unique and different
from the other current EINVAL failures here.

> 
>> Maybe it's better to be overly silent rather than overly
>> verbose as to not spam the log but it seems like it would make debug of
>> such cases harder.
> 
> It makes debugging harder to have worthless messages because they
> obscure what is going on. The first time I saw this I assumed there
> was an issue, but it turns out to be an expected failure.
> 
> The other issue is the way the rate limiting works:
> 
>>> +   if (mcast->logcount < 20) {
>>> +   if (status == -ETIMEDOUT || status == -EAGAIN ||
>>> +   silent_fail) {
>>> ipoib_dbg_mcast(priv, "%smulticast join failed 
>>> for %pI6, status %d\n",
>>> 
>>> test_bit(IPOIB_MCAST_FLAG_SENDONLY, &mcast->flags) ? "sendonly " : "",
>>> mcast->mcmember.mgid.raw, 
>>> status);
> 
> So wasting logcount with expected failures just results in eating
> unexpected failures...

Yes, the problem is distinguishing an "expected" failure from the real
ones and only logging the real ones.

-- Hal
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 2/2] IB/ipoib: Suppress warning for send only join failures

2015-08-25 Thread Hal Rosenstock

On 8/21/2015 7:34 PM, Jason Gunthorpe wrote:
> We expect send only joins to fail, it just means there are no listeners
> for the group. The correct thing to do is silently drop the packet
> at source.
> 
> Eg avahi will full join 224.0.0.251 which causes a send only IGMP packet
> to 224.0.0.22, and then a warning level kmessage like this:
> 
>  ib0: sendonly multicast join failed for 
> ff12:401b::::::0016, status -22
> 
> If there is no IP router listening to IGMP.
> 
> Signed-off-by: Jason Gunthorpe 
> ---
>  drivers/infiniband/ulp/ipoib/ipoib_multicast.c | 12 ++--
>  1 file changed, 10 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c 
> b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
> index c0e702c577d5..2d43ec542b63 100644
> --- a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
> +++ b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
> @@ -393,8 +393,13 @@ static int ipoib_mcast_join_complete(int status,
>   goto out_locked;
>   }
>   } else {
> - if (mcast->logcount++ < 20) {
> - if (status == -ETIMEDOUT || status == -EAGAIN) {
> + bool silent_fail =
> + test_bit(IPOIB_MCAST_FLAG_SENDONLY, &mcast->flags) &&
> + status == -EINVAL;

Aren't there other reasons that send only join might have EINVAL
indicated ? Maybe it's better to be overly silent rather than overly
verbose as to not spam the log but it seems like it would make debug of
such cases harder.

> +
> + if (mcast->logcount < 20) {
> + if (status == -ETIMEDOUT || status == -EAGAIN ||
> + silent_fail) {
>   ipoib_dbg_mcast(priv, "%smulticast join failed 
> for %pI6, status %d\n",
>   
> test_bit(IPOIB_MCAST_FLAG_SENDONLY, &mcast->flags) ? "sendonly " : "",
>   mcast->mcmember.mgid.raw, 
> status);

ipoib_dbg_mcast logging is conditionalized on CONFIG_INFINIBAND_IPOIB_DEBUG

> @@ -403,6 +408,9 @@ static int ipoib_mcast_join_complete(int status,
>   
> test_bit(IPOIB_MCAST_FLAG_SENDONLY, &mcast->flags) ? "sendonly " : "",
>  mcast->mcmember.mgid.raw, status);
>   }
> +
> + if (!silent_fail)
> + mcast->logcount++;
>   }
>  
>   if (test_bit(IPOIB_MCAST_FLAG_SENDONLY, &mcast->flags) &&

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 1/2] IB/ipoib: Clean up send-only multicast joins

2015-08-25 Thread Hal Rosenstock

On 8/21/2015 7:34 PM, Jason Gunthorpe wrote:
> Even though we don't expect the group to be created by the SM we
> sill need to provide all the parameters to force the SM to validate
> they are correct.

Out of curiosity, has it been observed that there was inconsistency in
these additional IPoIB parameters between broadcast and non broadcast
groups on client side or is this just to be defensive to make sure this
does not occur ?

-- Hal

> 
> Signed-off-by: Jason Gunthorpe 
> ---
>  drivers/infiniband/ulp/ipoib/ipoib_multicast.c | 47 
> +-
>  1 file changed, 31 insertions(+), 16 deletions(-)
> 
> diff --git a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c 
> b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
> index 0d23e0568deb..c0e702c577d5 100644
> --- a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
> +++ b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
> @@ -448,8 +448,8 @@ out_locked:
>   return status;
>  }
>  
> -static void ipoib_mcast_join(struct net_device *dev, struct ipoib_mcast 
> *mcast,
> -  int create)
> +/* priv->lock must be held when calling */
> +static void ipoib_mcast_join(struct net_device *dev, struct ipoib_mcast 
> *mcast)
>  {
>   struct ipoib_dev_priv *priv = netdev_priv(dev);
>   struct ib_sa_multicast *multicast;
> @@ -471,7 +471,14 @@ static void ipoib_mcast_join(struct net_device *dev, 
> struct ipoib_mcast *mcast,
>   IB_SA_MCMEMBER_REC_PKEY |
>   IB_SA_MCMEMBER_REC_JOIN_STATE;
>  
> - if (create) {
> + if (mcast != priv->broadcast) {
> + /*
> +  * RFC 4391:
> +  *  The MGID MUST use the same P_Key, Q_Key, SL, MTU,
> +  *  and HopLimit as those used in the broadcast-GID.  The rest
> +  *  of attributes SHOULD follow the values used in the
> +  *  broadcast-GID as well.
> +  */
>   comp_mask |=
>   IB_SA_MCMEMBER_REC_QKEY |
>   IB_SA_MCMEMBER_REC_MTU_SELECTOR |
> @@ -492,19 +499,35 @@ static void ipoib_mcast_join(struct net_device *dev, 
> struct ipoib_mcast *mcast,
>   rec.sl= priv->broadcast->mcmember.sl;
>   rec.flow_label= priv->broadcast->mcmember.flow_label;
>   rec.hop_limit = priv->broadcast->mcmember.hop_limit;
> +
> + /*
> +  * Historically Linux IPoIB has never properly supported SEND
> +  * ONLY join. It emulated it by not providing all the required
> +  * attributes, which is enough to prevent group creation and
> +  * detect if there are full members or not. A major problem
> +  * with supporting SEND ONLY is detecting when the group is
> +  * auto-destroyed as IPoIB will cache the MLID..
> +  */
> +#if 1
> + if (test_bit(IPOIB_MCAST_FLAG_SENDONLY, &mcast->flags))
> + comp_mask &= ~IB_SA_MCMEMBER_REC_TRAFFIC_CLASS;
> +#else
> + if (test_bit(IPOIB_MCAST_FLAG_SENDONLY, &mcast->flags))
> + rec.join_state = 4;
> +#endif
>   }
>  
> + spin_unlock_irq(&priv->lock);
>   multicast = ib_sa_join_multicast(&ipoib_sa_client, priv->ca, priv->port,
>&rec, comp_mask, GFP_KERNEL,
>ipoib_mcast_join_complete, mcast);
> + spin_lock_irq(&priv->lock);
>   if (IS_ERR(multicast)) {
>   ret = PTR_ERR(multicast);
>   ipoib_warn(priv, "ib_sa_join_multicast failed, status %d\n", 
> ret);
> - spin_lock_irq(&priv->lock);
>   /* Requeue this join task with a backoff delay */
>   __ipoib_mcast_schedule_join_thread(priv, mcast, 1);
>   clear_bit(IPOIB_MCAST_FLAG_BUSY, &mcast->flags);
> - spin_unlock_irq(&priv->lock);
>   complete(&mcast->done);
>   }
>  }
> @@ -517,7 +540,6 @@ void ipoib_mcast_join_task(struct work_struct *work)
>   struct ib_port_attr port_attr;
>   unsigned long delay_until = 0;
>   struct ipoib_mcast *mcast = NULL;
> - int create = 1;
>  
>   if (!test_bit(IPOIB_FLAG_OPER_UP, &priv->flags))
>   return;
> @@ -566,7 +588,6 @@ void ipoib_mcast_join_task(struct work_struct *work)
>   if (IS_ERR_OR_NULL(priv->broadcast->mc) &&
>   !test_bit(IPOIB_MCAST_FLAG_BUSY, &priv->broadcast->flags)) {
>   mcast = priv->broadcast;
> - create = 0;
>   if (mcast->backoff > 1 &&
>   time_before(jiffies, mcast->delay_until)) {
>   delay_until = mcast->delay_until;
> @@ -590,13 +611,7 @@ void ipoib_mcast_join_task(struct work_struct *work)
>   /* Found the next unjoined group */
>   init_completion(&mcast->don

Re: [PULL REQUEST] Please pull rdma.git

2015-07-16 Thread Hal Rosenstock

On 7/16/2015 8:50 AM, Doug Ledford wrote:
> On 07/16/2015 08:45 AM, Christoph Hellwig wrote:
>> Hi Doug,
>>
>> the point is that there is no driver that even set the is_switch
>> flag in-tree and I can't find a publicly available one elsewhere
>> either.
> 
> Try looking on Mellanox.com for their source code.  I know they use
> upstream linux kernels as the base for their switch OS and in their
> switch-X driver it should set this flag.
>
> However, this is a point that I don't really care about, but I inherited
> (and I know other people care about).

There are 2 vendors that I'm aware of that use linux for their embedded
IB switch platforms: Mellanox and Bay Microsystems and there may be others.

Mellanox has switch drivers for the various different switch ICs that
have been used over time (InfiniScale IV, SwitchX/SwitchX-2, and most
Switch-IB).

-- Hal
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCHv3 infiniband-diags] iblinkinfo.c: Close additional file descriptor in advance

2015-07-15 Thread Hal Rosenstock


Additional file descriptor for SMP MADs should be closed before running
ibnd_discover_fabric() to avoid parallel usage of two SMP file descriptors

Signed-off-by: Vladimir Koushnir 
Signed-off-by: Hal Rosenstock 
---
Change since v2:
Only SMP query for NodeInfo if (!all && dr_path) - same as current code

Change since v1:
Per Ira's comment, moved location of SMP query for NodeInfo
so partial fabric DR query will work.

diff --git a/src/iblinkinfo.c b/src/iblinkinfo.c
index 92ff3c6..5f2b677 100644
--- a/src/iblinkinfo.c
+++ b/src/iblinkinfo.c
@@ -594,6 +594,7 @@ int main(int argc, char **argv)
ibnd_fabric_t *diff_fabric = NULL;
struct ibmad_port *ibmad_port;
ib_portid_t port_id = { 0 };
+   uint8_t ni[IB_SMP_DATA_SIZE] = { 0 };
int mgmt_classes[3] =
{ IB_SMI_CLASS, IB_SMI_DIRECT_CLASS, IB_SA_CLASS };
 
@@ -659,6 +660,7 @@ int main(int argc, char **argv)
node_name_map = open_node_name_map(node_name_map_file);
 
if (dr_path && load_cache_file) {
+   mad_rpc_close_port(ibmad_port);
fprintf(stderr, "Cannot specify cache and direct route path\n");
exit(1);
}
@@ -679,6 +681,16 @@ int main(int argc, char **argv)
   guid_str);
}
 
+   if (!all && dr_path) {
+   if (!smp_query_via(ni, &port_id, IB_ATTR_NODE_INFO, 0,
+  ibd_timeout, ibmad_port)){
+   mad_rpc_close_port(ibmad_port);
+   fprintf(stderr, "Failed to get local Node Info\n");
+   exit(1);
+   }
+   }
+   mad_rpc_close_port(ibmad_port);
+
if (diff_cache_file &&
!(diff_fabric = ibnd_load_fabric(diff_cache_file, 0)))
IBEXIT("loading cached fabric for diff failed\n");
@@ -723,11 +735,6 @@ int main(int argc, char **argv)
fprintf(stderr, "Failed to find port: %s\n", guid_str);
} else if (!all && dr_path) {
ibnd_port_t *p = NULL;
-   uint8_t ni[IB_SMP_DATA_SIZE] = { 0 };
-
-   if (!smp_query_via(ni, &port_id, IB_ATTR_NODE_INFO, 0,
-  ibd_timeout, ibmad_port))
-   return -1;
mad_decode_field(ni, IB_NODE_PORT_GUID_F, &(guid));
 
p = ibnd_find_port_guid(fabric, guid);
@@ -758,6 +765,5 @@ int main(int argc, char **argv)
 
 close_port:
close_node_name_map(node_name_map);
-   mad_rpc_close_port(ibmad_port);
exit(rc);
 }
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCHv2 infiniband-diags] iblinkinfo.c: Close additional file descriptor inadvance

2015-07-15 Thread Hal Rosenstock


Additional file descriptor for SMP MADs should be closed before running
ibnd_discover_fabric() to avoid parallel usage of two SMP file descriptors

Signed-off-by: Vladimir Koushnir 
Signed-off-by: Hal Rosenstock 
---
Change since v1:
Per Ira's comment, moved location of SMP query for NodeInfo
so partial fabric DR query will work.

diff --git a/src/iblinkinfo.c b/src/iblinkinfo.c
index 92ff3c6..677862f 100644
--- a/src/iblinkinfo.c
+++ b/src/iblinkinfo.c
@@ -594,6 +594,7 @@ int main(int argc, char **argv)
ibnd_fabric_t *diff_fabric = NULL;
struct ibmad_port *ibmad_port;
ib_portid_t port_id = { 0 };
+   uint8_t ni[IB_SMP_DATA_SIZE] = { 0 };
int mgmt_classes[3] =
{ IB_SMI_CLASS, IB_SMI_DIRECT_CLASS, IB_SA_CLASS };
 
@@ -659,6 +660,7 @@ int main(int argc, char **argv)
node_name_map = open_node_name_map(node_name_map_file);
 
if (dr_path && load_cache_file) {
+   mad_rpc_close_port(ibmad_port);
fprintf(stderr, "Cannot specify cache and direct route path\n");
exit(1);
}
@@ -679,6 +681,15 @@ int main(int argc, char **argv)
   guid_str);
}
 
+   if (!smp_query_via(ni, &port_id, IB_ATTR_NODE_INFO, 0,
+  ibd_timeout, ibmad_port)){
+   mad_rpc_close_port(ibmad_port);
+   fprintf(stderr,
+   "Failed to get local Node Info\n");
+   exit(1);
+   }
+   mad_rpc_close_port(ibmad_port);
+
if (diff_cache_file &&
!(diff_fabric = ibnd_load_fabric(diff_cache_file, 0)))
IBEXIT("loading cached fabric for diff failed\n");
@@ -723,11 +734,6 @@ int main(int argc, char **argv)
fprintf(stderr, "Failed to find port: %s\n", guid_str);
} else if (!all && dr_path) {
ibnd_port_t *p = NULL;
-   uint8_t ni[IB_SMP_DATA_SIZE] = { 0 };
-
-   if (!smp_query_via(ni, &port_id, IB_ATTR_NODE_INFO, 0,
-  ibd_timeout, ibmad_port))
-   return -1;
mad_decode_field(ni, IB_NODE_PORT_GUID_F, &(guid));
 
p = ibnd_find_port_guid(fabric, guid);
@@ -758,6 +764,5 @@ int main(int argc, char **argv)
 
 close_port:
close_node_name_map(node_name_map);
-   mad_rpc_close_port(ibmad_port);
exit(rc);
 }
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v2 2/4 infiniband-diags] ibqueryerrors: Close global file descriptor before running ibnd_discover_fabric

2015-07-15 Thread Hal Rosenstock

On 7/3/2015 11:24 AM, ira.weiny wrote:
> On Sun, Apr 26, 2015 at 03:33:50PM -0400, Hal Rosenstock wrote:
>> From: Vladimir Koushnir 
>> Date: Sun, 26 Apr 2015 12:24:06 +0300
>>
>> Global file descriptor for SMPs and GMPs should be closed before running
>> ibnd_discover_fabric() to avoid parallel usage of two SMP file descriptors
>>
>> Signed-off-by: Vladimir Koushnir 
>> Signed-off-by: Hal Rosenstock 
> 
> This patch made the goto labels misleading when reading the code.  I changed
> the name of the goto labels.

These label name changes look fine to me. Thanks.

-- Hal

> Please verify I did not break anything on your end.
> 
> Applied with fix ups to the goto labels.
> 
> Ira
> 
>> ---
>> Changes since v1:
>> Fixed direct route (-D) option
>>
>> diff --git a/src/ibqueryerrors.c b/src/ibqueryerrors.c
>> index 06fcbac..80436d3 100644
>> --- a/src/ibqueryerrors.c
>> +++ b/src/ibqueryerrors.c
>> @@ -1000,18 +1002,20 @@ int main(int argc, char **argv)
>>  config.flags = ibd_ibnetdisc_flags;
>>  config.mkey = ibd_mkey;
>>  
>> -node_name_map = open_node_name_map(node_name_map_file);
>> -
>>  if (dr_path && load_cache_file) {
>> +mad_rpc_close_port(ibmad_port);
>>  fprintf(stderr, "Cannot specify cache and direct route path\n");
>>  exit(-1);
>>  }
>>  
>>  if (resolve_self(ibd_ca, ibd_ca_port, &self_portid, &port, 
>> &self_gid.raw) < 0) {
>> +mad_rpc_close_port(ibmad_port);
>>  IBEXIT("can't resolve self port %s", argv[0]);
>>  goto close_port;
>>  }
>>  
>> +node_name_map = open_node_name_map(node_name_map_file);
>> +
>>  /* limit the scan the fabric around the target */
>>  if (dr_path) {
>>  if ((resolved =
>> @@ -1030,10 +1034,13 @@ int main(int argc, char **argv)
>>  lid2sl_table[portid.lid] = portid.sl;
>>  }
>>  
>> +mad_rpc_close_port(ibmad_port);
>> +
>>  if (load_cache_file) {
>>  if ((fabric = ibnd_load_fabric(load_cache_file, 0)) == NULL) {
>>  fprintf(stderr, "loading cached fabric failed\n");
>> -exit(-1);
>> +rc = -1;
>> +goto close_port;
>>  }
>>  } else {
>>  if (resolved >= 0) {
>> @@ -1057,6 +1064,21 @@ int main(int argc, char **argv)
>>  
>>  set_thresholds(threshold_file);
>>  
>> +/* reopen the global ibmad_port */
>> +ibmad_port = mad_rpc_open_port(ibd_ca, ibd_ca_port,
>> +   mgmt_classes, 4);
>> +if (!ibmad_port) {
>> +ibnd_destroy_fabric(fabric);
>> +close_node_name_map(node_name_map);
>> +IBEXIT("Failed to reopen port: %s:%d\n",
>> +ibd_ca, ibd_ca_port);
>> +}
>> +
>> +smp_mkey_set(ibmad_port, ibd_mkey);
>> +
>> +if (ibd_timeout)
>> +mad_rpc_set_timeout(ibmad_port, ibd_timeout);
>> +
>>  if (port_guid_str) {
>>  ibnd_port_t *port = ibnd_find_port_guid(fabric, port_guid);
>>  if (port)
>> @@ -1067,12 +1089,12 @@ int main(int argc, char **argv)
>>  } else if (dr_path) {
>>  ibnd_port_t *port = ibnd_find_port_dr(fabric, dr_path);
>>  uint8_t ni[IB_SMP_DATA_SIZE] = { 0 };
>> -
>>  if (!smp_query_via(ni, &portid, IB_ATTR_NODE_INFO, 0,
>> -   ibd_timeout, ibmad_port)) {
>> -rc = -1;
>> -goto destroy_fabric;
>> +   ibd_timeout, ibmad_port)) {
>> +fprintf(stderr, "Failed to query local Node 
>> Info\n");
>> +goto destroy_fabric;
>>  }
>> +
>>  mad_decode_field(ni, IB_NODE_PORT_GUID_F, &(port_guid));
>>  
>>  port = ibnd_find_port_guid(fabric, port_guid);
>> @@ -1087,6 +1109,7 @@ int main(int argc, char **argv)
>>  if(obtain_sl)
>>  if(path_record_query(self_gid,0))
>>  goto destroy_fabric;
>> +
>>  ibnd_iter_nodes(fabric, print_node, NULL);
>>  }
>>  
>> @@ -1095,10 +1118,10 @@ int main(int argc, char **argv)
>>  rc = 1;
>>  
>>  destroy_fabric:
>> +mad_rpc_close_port(ibmad_port);
>>  ibnd_destroy_fabric(fabric);
>>  
>>  close_port:
>> -mad_rpc_close_port(ibmad_port);
>>  close_node_name_map(node_name_map);
>>  exit(rc);
>>  }
>> -- 
>> 1.7.8.2
>>
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] IB: Add rdma_cap_ib_switch helper and use where appropriate

2015-06-29 Thread Hal Rosenstock

Persuant to Liran's comments on node_type on linux-rdma
mailing list:

In an effort to reform the RDMA core and ULPs to minimize use of
node_type in struct ib_device, an additional bit is added to
struct ib_device for is_switch (IB switch). This is needed 
to be initialized by any IB switch device driver. This is a 
NEW requirement on such device drivers which are all
"out of tree".

In addition, an ib_switch helper was added to ib_verbs.h
based on the is_switch device bit rather than node_type
(although those should be consistent).

The RDMA core (MAD, SMI, agent, sa_query, multicast, sysfs)
as well as (IPoIB and SRP) ULPs are updated where
appropriate to use this new helper. In some cases,
the helper is now used under the covers of using
rdma_[start end]_port rather than the open coding
previously used.

Reviewed-by: Sean Hefty 
Reviewed-By: Jason Gunthorpe 
Reviewed-by: Ira Weiny 
Tested-by: Ira Weiny 
Signed-off-by: Hal Rosenstock 
---
Identical to RFCv2 patch with additional Reviewed and Tested by lines

diff --git a/drivers/infiniband/core/agent.c b/drivers/infiniband/core/agent.c
index c7dcfe4..0429040 100644
--- a/drivers/infiniband/core/agent.c
+++ b/drivers/infiniband/core/agent.c
@@ -88,7 +88,7 @@ void agent_send_response(const struct ib_mad_hdr *mad_hdr, 
const struct ib_grh *
struct ib_ah *ah;
struct ib_mad_send_wr_private *mad_send_wr;
 
-   if (device->node_type == RDMA_NODE_IB_SWITCH)
+   if (rdma_cap_ib_switch(device))
port_priv = ib_get_agent_port(device, 0);
else
port_priv = ib_get_agent_port(device, port_num);
@@ -122,7 +122,7 @@ void agent_send_response(const struct ib_mad_hdr *mad_hdr, 
const struct ib_grh *
memcpy(send_buf->mad, mad_hdr, resp_mad_len);
send_buf->ah = ah;
 
-   if (device->node_type == RDMA_NODE_IB_SWITCH) {
+   if (rdma_cap_ib_switch(device)) {
mad_send_wr = container_of(send_buf,
   struct ib_mad_send_wr_private,
   send_buf);
diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
index a4b1466..c82d751 100644
--- a/drivers/infiniband/core/mad.c
+++ b/drivers/infiniband/core/mad.c
@@ -769,7 +769,7 @@ static int handle_outgoing_dr_smp(struct 
ib_mad_agent_private *mad_agent_priv,
bool opa = rdma_cap_opa_mad(mad_agent_priv->qp_info->port_priv->device,

mad_agent_priv->qp_info->port_priv->port_num);
 
-   if (device->node_type == RDMA_NODE_IB_SWITCH &&
+   if (rdma_cap_ib_switch(device) &&
smp->mgmt_class == IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE)
port_num = send_wr->wr.ud.port_num;
else
@@ -787,7 +787,8 @@ static int handle_outgoing_dr_smp(struct 
ib_mad_agent_private *mad_agent_priv,
if ((opa_get_smp_direction(opa_smp)
 ? opa_smp->route.dr.dr_dlid : opa_smp->route.dr.dr_slid) ==
 OPA_LID_PERMISSIVE &&
-opa_smi_handle_dr_smp_send(opa_smp, device->node_type,
+opa_smi_handle_dr_smp_send(opa_smp,
+   rdma_cap_ib_switch(device),
port_num) == IB_SMI_DISCARD) {
ret = -EINVAL;
dev_err(&device->dev, "OPA Invalid directed route\n");
@@ -810,7 +811,7 @@ static int handle_outgoing_dr_smp(struct 
ib_mad_agent_private *mad_agent_priv,
} else {
if ((ib_get_smp_direction(smp) ? smp->dr_dlid : smp->dr_slid) ==
 IB_LID_PERMISSIVE &&
-smi_handle_dr_smp_send(smp, device->node_type, port_num) ==
+smi_handle_dr_smp_send(smp, rdma_cap_ib_switch(device), 
port_num) ==
 IB_SMI_DISCARD) {
ret = -EINVAL;
dev_err(&device->dev, "Invalid directed route\n");
@@ -2030,7 +2031,7 @@ static enum smi_action handle_ib_smi(const struct 
ib_mad_port_private *port_priv
struct ib_smp *smp = (struct ib_smp *)recv->mad;
 
if (smi_handle_dr_smp_recv(smp,
-  port_priv->device->node_type,
+  rdma_cap_ib_switch(port_priv->device),
   port_num,
   port_priv->device->phys_port_cnt) ==
   IB_SMI_DISCARD)
@@ -2042,13 +2043,13 @@ static enum smi_action handle_ib_smi(const struct 
ib_mad_port_private *port_priv
 
if (retsmi == IB_SMI_SEND) { /* don't forward */
if (smi_handle_dr_smp_send(smp,
-  port_priv->device->node_type,
+

Re: [PATCHv2 RFC] IB: Add rdma_cap_ib_switch helper and use where appropriate

2015-06-22 Thread Hal Rosenstock

On 6/22/2015 6:40 PM, Hefty, Sean wrote:
>> Note that there still remain a couple of node type checks in the kernel
>> that we may want to remove. There's an IB CA check in cma.c:rdma_notify
> 
> This should check for rdma_cap_ib_cm()

IB, RoCE, and OPA ports have the RDMA_CORE_CAP_IB_CM set so this seems OK.

For IB switches, there are 2 cases: enhanced switch port 0s can support CM and 
base switch port 0s would not so this seems better than the IB CA node type 
current check.

> 
>> as well as in rds/ib.c:rds_ib_add_one and rds_ib_laddr_check and
> 
> Not sure what the underlying reason is for these checks.

rds_ib_add_one says:
/* Only handle IB (no iWARP) devices */
but not sure comment is 100% accurate as it checks node type != IB CA.

rds_ib_laddr_check says:
/* rdma_bind_addr will only succeed for IB & iWARP devices */
/* due to this, we will claim to support iWARP devices unless we check 
node_type. */
It's similar to above in that it checks node type != IB CA and does not seem 
100% accurate.

Comments and node type check are from Andy Grover back in 2009 (in the original 
RDS commit).

>> an RNIC check in rds/iw.c:rds_iw_add_one and rds_iw_laddr_check
> 
> rdma_cap_iw_cm()

iWARP ports are the only ones to have the RDMA_CORE_CAP_IW_CM bit so that seems 
right.

> The intent is to clarify why the checks that exist are made and replace them 
> with a check that clearly conveys 
> what the restriction is.  So, yes, the use of node_type should be replaced.

Understood.

-- Hal

> - Sean
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in

Re: [PATCHv2 RFC] IB: Add rdma_cap_ib_switch helper and use where appropriate

2015-06-22 Thread Hal Rosenstock

On 6/22/2015 4:20 PM, Jason Gunthorpe wrote:
> On Thu, Jun 18, 2015 at 07:25:32PM -0400, Hal Rosenstock wrote:
>> Persuant to Liran's comments on node_type on linux-rdma
>> mailing list:
>>
>> In an effort to reform the RDMA core and ULPs to minimize use of
>> node_type in struct ib_device, an additional bit is added to
>> struct ib_device for is_switch (IB switch). This is needed 
>> to be initialized by any IB switch device driver. This is a 
>> NEW requirement on such device drivers which are all
>> "out of tree".
>>
>> In addition, an ib_switch helper was added to ib_verbs.h
>> based on the is_switch device bit rather than node_type
>> (although those should be consistent).
>>
>> The RDMA core (MAD, SMI, agent, sa_query, multicast, sysfs)
>> as well as (IPoIB and SRP) ULPs are updated where
>> appropriate to use this new helper. In some cases,
>> the helper is now used under the covers of using
>> rdma_[start end]_port rather than the open coding
>> previously used.
>>
>> Signed-off-by: Hal Rosenstock 
> 
> Looks pretty good now.
> 
> Reviewed-By: Jason Gunthorpe 
> 
> Although a bitfield isn't my preference:
> 
>> index 986fddb..b0f898e 100644
>> +++ b/include/rdma/ib_verbs.h
>> @@ -1745,6 +1745,7 @@ struct ib_device {
>>  char node_desc[64];
>>  __be64   node_guid;
>>  u32  local_dma_lkey;
>> +u16  is_switch:1;
>>  u8   node_type;
>>  u8   phys_port_cnt;

What would be better/what would be your preference ?

Note that there still remain a couple of node type checks in the kernel
that we may want to remove. There's an IB CA check in cma.c:rdma_notify
as well as in rds/ib.c:rds_ib_add_one and rds_ib_laddr_check and an RNIC
check in rds/iw.c:rds_iw_add_one and rds_iw_laddr_check. Should these be
changed not to use node type ?

-- Hal

> Jason
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in

Re: [PATCH 14/14] IB/mad: Add final OPA MAD processing

2015-06-19 Thread Hal Rosenstock

On 6/18/2015 5:00 PM, Doug Ledford wrote:
> There is *zero* functional difference between node_type == OPA or node_type 
> == IB_CA and link_layer == OPA. 
> An application has *exactly* what they need

We have neither of these things in the kernel today. Also, if I interpreted 
what was written by first Ira and more recently Sean, even if either of these 
things were done, the user space provider library for OPA might/could just 
change these back to the IB types.

-- Hal
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in

[PATCHv2 RFC] IB: Add rdma_cap_ib_switch helper and use where appropriate

2015-06-18 Thread Hal Rosenstock

Persuant to Liran's comments on node_type on linux-rdma
mailing list:

In an effort to reform the RDMA core and ULPs to minimize use of
node_type in struct ib_device, an additional bit is added to
struct ib_device for is_switch (IB switch). This is needed 
to be initialized by any IB switch device driver. This is a 
NEW requirement on such device drivers which are all
"out of tree".

In addition, an ib_switch helper was added to ib_verbs.h
based on the is_switch device bit rather than node_type
(although those should be consistent).

The RDMA core (MAD, SMI, agent, sa_query, multicast, sysfs)
as well as (IPoIB and SRP) ULPs are updated where
appropriate to use this new helper. In some cases,
the helper is now used under the covers of using
rdma_[start end]_port rather than the open coding
previously used.

Signed-off-by: Hal Rosenstock 
---
Changes since v1:
Per Jason and Sean's comments, used Ira's rdma_[start end]_port
routines where needed and made those routines in ib_verbs.h 
use the new is_switch helper.

diff --git a/drivers/infiniband/core/agent.c b/drivers/infiniband/core/agent.c
index c7dcfe4..0429040 100644
--- a/drivers/infiniband/core/agent.c
+++ b/drivers/infiniband/core/agent.c
@@ -88,7 +88,7 @@ void agent_send_response(const struct ib_mad_hdr *mad_hdr, 
const struct ib_grh *
struct ib_ah *ah;
struct ib_mad_send_wr_private *mad_send_wr;
 
-   if (device->node_type == RDMA_NODE_IB_SWITCH)
+   if (rdma_cap_ib_switch(device))
port_priv = ib_get_agent_port(device, 0);
else
port_priv = ib_get_agent_port(device, port_num);
@@ -122,7 +122,7 @@ void agent_send_response(const struct ib_mad_hdr *mad_hdr, 
const struct ib_grh *
memcpy(send_buf->mad, mad_hdr, resp_mad_len);
send_buf->ah = ah;
 
-   if (device->node_type == RDMA_NODE_IB_SWITCH) {
+   if (rdma_cap_ib_switch(device)) {
mad_send_wr = container_of(send_buf,
   struct ib_mad_send_wr_private,
   send_buf);
diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
index a4b1466..c82d751 100644
--- a/drivers/infiniband/core/mad.c
+++ b/drivers/infiniband/core/mad.c
@@ -769,7 +769,7 @@ static int handle_outgoing_dr_smp(struct 
ib_mad_agent_private *mad_agent_priv,
bool opa = rdma_cap_opa_mad(mad_agent_priv->qp_info->port_priv->device,

mad_agent_priv->qp_info->port_priv->port_num);
 
-   if (device->node_type == RDMA_NODE_IB_SWITCH &&
+   if (rdma_cap_ib_switch(device) &&
smp->mgmt_class == IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE)
port_num = send_wr->wr.ud.port_num;
else
@@ -787,7 +787,8 @@ static int handle_outgoing_dr_smp(struct 
ib_mad_agent_private *mad_agent_priv,
if ((opa_get_smp_direction(opa_smp)
 ? opa_smp->route.dr.dr_dlid : opa_smp->route.dr.dr_slid) ==
 OPA_LID_PERMISSIVE &&
-opa_smi_handle_dr_smp_send(opa_smp, device->node_type,
+opa_smi_handle_dr_smp_send(opa_smp,
+   rdma_cap_ib_switch(device),
port_num) == IB_SMI_DISCARD) {
ret = -EINVAL;
dev_err(&device->dev, "OPA Invalid directed route\n");
@@ -810,7 +811,7 @@ static int handle_outgoing_dr_smp(struct 
ib_mad_agent_private *mad_agent_priv,
} else {
if ((ib_get_smp_direction(smp) ? smp->dr_dlid : smp->dr_slid) ==
 IB_LID_PERMISSIVE &&
-smi_handle_dr_smp_send(smp, device->node_type, port_num) ==
+smi_handle_dr_smp_send(smp, rdma_cap_ib_switch(device), 
port_num) ==
 IB_SMI_DISCARD) {
ret = -EINVAL;
dev_err(&device->dev, "Invalid directed route\n");
@@ -2030,7 +2031,7 @@ static enum smi_action handle_ib_smi(const struct 
ib_mad_port_private *port_priv
struct ib_smp *smp = (struct ib_smp *)recv->mad;
 
if (smi_handle_dr_smp_recv(smp,
-  port_priv->device->node_type,
+  rdma_cap_ib_switch(port_priv->device),
   port_num,
   port_priv->device->phys_port_cnt) ==
   IB_SMI_DISCARD)
@@ -2042,13 +2043,13 @@ static enum smi_action handle_ib_smi(const struct 
ib_mad_port_private *port_priv
 
if (retsmi == IB_SMI_SEND) { /* don't forward */
if (smi_handle_dr_smp_send(smp,
-  port_priv->device->node_type,
+

Re: [PATCH RFC] IB: Add rdma_cap_ib_switch helper and use where appropriate

2015-06-18 Thread Hal Rosenstock

On 6/18/2015 4:32 PM, Hefty, Sean wrote:
>> In an effort to reform the RDMA core and ULPs to minimize use of
>> node_type in struct ib_device, an additional bit is added to
>> struct ib_device for is_switch (IB switch). This is needed
>> to be initialized by any IB switch device driver. This is a
>> NEW requirement on such device drivers which are all
>> "out of tree".
> 
> I agree with Jason, only I would add we need to decide if we want to continue 
> supporting the out of tree devices.  
> We have for so long that it seems reasonable to do so.  

Yes, it seems reasonable to me to continue supporting them.

> At the same time, I think we'd reject these changes now without a lower level 
> user of them.

Although this is the usual kernel standard, the RDMA code broke this
precedent from day one where switches were supported so I, for one,
don't think it's appropriate to reverse that decision now.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH RFC] IB: Add rdma_cap_ib_switch helper and use where appropriate

2015-06-18 Thread Hal Rosenstock

On 6/18/2015 4:25 PM, Jason Gunthorpe wrote:
> Ira had patch that made some functions related to this public, not
> sure if it is applied yet..

Which patch from Ira are you referring to ?

-- Hal
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH RFC] IB: Add rdma_cap_ib_switch helper and use where appropriate

2015-06-18 Thread Hal Rosenstock

Persuant to Liran's comments on node_type on linux-rdma
mailing list:

In an effort to reform the RDMA core and ULPs to minimize use of
node_type in struct ib_device, an additional bit is added to
struct ib_device for is_switch (IB switch). This is needed 
to be initialized by any IB switch device driver. This is a 
NEW requirement on such device drivers which are all
"out of tree".

In addition, an ib_switch helper was added to ib_verbs.h
based on the is_switch device bit rather than node_type
(although those should be consistent).

The RDMA core (MAD, SMI, agent, sa_query, multicast, sysfs)
as well as (IPoIB and SRP) ULPs are updated where
appropriate to use this new helper.

Signed-off-by: Hal Rosenstock 
---
diff --git a/drivers/infiniband/core/agent.c b/drivers/infiniband/core/agent.c
index c7dcfe4..0429040 100644
--- a/drivers/infiniband/core/agent.c
+++ b/drivers/infiniband/core/agent.c
@@ -88,7 +88,7 @@ void agent_send_response(const struct ib_mad_hdr *mad_hdr, 
const struct ib_grh *
struct ib_ah *ah;
struct ib_mad_send_wr_private *mad_send_wr;
 
-   if (device->node_type == RDMA_NODE_IB_SWITCH)
+   if (rdma_cap_ib_switch(device))
port_priv = ib_get_agent_port(device, 0);
else
port_priv = ib_get_agent_port(device, port_num);
@@ -122,7 +122,7 @@ void agent_send_response(const struct ib_mad_hdr *mad_hdr, 
const struct ib_grh *
memcpy(send_buf->mad, mad_hdr, resp_mad_len);
send_buf->ah = ah;
 
-   if (device->node_type == RDMA_NODE_IB_SWITCH) {
+   if (rdma_cap_ib_switch(device)) {
mad_send_wr = container_of(send_buf,
   struct ib_mad_send_wr_private,
   send_buf);
diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
index a4b1466..ced077e 100644
--- a/drivers/infiniband/core/mad.c
+++ b/drivers/infiniband/core/mad.c
@@ -769,7 +769,7 @@ static int handle_outgoing_dr_smp(struct 
ib_mad_agent_private *mad_agent_priv,
bool opa = rdma_cap_opa_mad(mad_agent_priv->qp_info->port_priv->device,

mad_agent_priv->qp_info->port_priv->port_num);
 
-   if (device->node_type == RDMA_NODE_IB_SWITCH &&
+   if (rdma_cap_ib_switch(device) &&
smp->mgmt_class == IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE)
port_num = send_wr->wr.ud.port_num;
else
@@ -787,7 +787,8 @@ static int handle_outgoing_dr_smp(struct 
ib_mad_agent_private *mad_agent_priv,
if ((opa_get_smp_direction(opa_smp)
 ? opa_smp->route.dr.dr_dlid : opa_smp->route.dr.dr_slid) ==
 OPA_LID_PERMISSIVE &&
-opa_smi_handle_dr_smp_send(opa_smp, device->node_type,
+opa_smi_handle_dr_smp_send(opa_smp,
+   rdma_cap_ib_switch(device),
port_num) == IB_SMI_DISCARD) {
ret = -EINVAL;
dev_err(&device->dev, "OPA Invalid directed route\n");
@@ -810,7 +811,7 @@ static int handle_outgoing_dr_smp(struct 
ib_mad_agent_private *mad_agent_priv,
} else {
if ((ib_get_smp_direction(smp) ? smp->dr_dlid : smp->dr_slid) ==
 IB_LID_PERMISSIVE &&
-smi_handle_dr_smp_send(smp, device->node_type, port_num) ==
+smi_handle_dr_smp_send(smp, rdma_cap_ib_switch(device), 
port_num) ==
 IB_SMI_DISCARD) {
ret = -EINVAL;
dev_err(&device->dev, "Invalid directed route\n");
@@ -2030,7 +2031,7 @@ static enum smi_action handle_ib_smi(const struct 
ib_mad_port_private *port_priv
struct ib_smp *smp = (struct ib_smp *)recv->mad;
 
if (smi_handle_dr_smp_recv(smp,
-  port_priv->device->node_type,
+  rdma_cap_ib_switch(port_priv->device),
   port_num,
   port_priv->device->phys_port_cnt) ==
   IB_SMI_DISCARD)
@@ -2042,13 +2043,13 @@ static enum smi_action handle_ib_smi(const struct 
ib_mad_port_private *port_priv
 
if (retsmi == IB_SMI_SEND) { /* don't forward */
if (smi_handle_dr_smp_send(smp,
-  port_priv->device->node_type,
+  
rdma_cap_ib_switch(port_priv->device),
   port_num) == IB_SMI_DISCARD)
return IB_SMI_DISCARD;
 
if (smi_check_local_smp(smp, port_priv->device) == 
IB_SMI_DISCARD)
return IB_SMI_D

Re: [PATCH v3 01/49] IB/core: Add header definitions

2015-06-17 Thread Hal Rosenstock

On 6/17/2015 8:28 AM, Mike Marciniszyn wrote:
> From: Ira Weiny 
> 
> Add common OPA header definitions for driver
> build:
> - opa_port_info.h
> - opa_smi.h
> - hfi1_user.sh
> 
> Additionally, ib_mad.h, has additional definitions
> that are common to ib_drivers including:
> - trap support
> - cca support
> 
> The qib driver has the duplication removed in favor
> those in ib_mad.h
> 
> Reviewed-by: Mike Marciniszyn 
> Reviewed-by: John, Jubin 
> Signed-off-by: Ira Weiny 
> ---
>  drivers/infiniband/hw/qib/qib_mad.h |  147 +---
>  include/rdma/ib_mad.h   |  138 +++
>  include/rdma/opa_port_info.h|  433 
> +++

Should opa_port_info.h be in include/rdma or in drivers/infiniband/hw/hfi1 ?

>  include/rdma/opa_smi.h  |   47 
>  include/uapi/rdma/hfi/hfi1_user.h   |  427 
> +++
>  5 files changed, 1053 insertions(+), 139 deletions(-)
>  create mode 100644 include/rdma/opa_port_info.h
>  create mode 100644 include/uapi/rdma/hfi/hfi1_user.h
> 
> diff --git a/drivers/infiniband/hw/qib/qib_mad.h 
> b/drivers/infiniband/hw/qib/qib_mad.h
> index 941d4d5..57e99dc 100644
> --- a/drivers/infiniband/hw/qib/qib_mad.h
> +++ b/drivers/infiniband/hw/qib/qib_mad.h
> @@ -36,148 +36,17 @@
>  
>  #include 
>  
> -#define IB_SMP_UNSUP_VERSIONcpu_to_be16(0x0004)
> -#define IB_SMP_UNSUP_METHOD cpu_to_be16(0x0008)
> -#define IB_SMP_UNSUP_METH_ATTR  cpu_to_be16(0x000C)
> -#define IB_SMP_INVALID_FIELDcpu_to_be16(0x001C)
> +#define IB_SMP_UNSUP_VERSION \
> +cpu_to_be16(IB_MGMT_MAD_STATUS_BAD_VERSION)
>  
> -struct ib_node_info {
> - u8 base_version;
> - u8 class_version;
> - u8 node_type;
> - u8 num_ports;
> - __be64 sys_guid;
> - __be64 node_guid;
> - __be64 port_guid;
> - __be16 partition_cap;
> - __be16 device_id;
> - __be32 revision;
> - u8 local_port_num;
> - u8 vendor_id[3];
> -} __packed;
> -
> -struct ib_mad_notice_attr {
> - u8 generic_type;
> - u8 prod_type_msb;
> - __be16 prod_type_lsb;
> - __be16 trap_num;
> - __be16 issuer_lid;
> - __be16 toggle_count;
> -
> - union {
> - struct {
> - u8  details[54];
> - } raw_data;
> -
> - struct {
> - __be16  reserved;
> - __be16  lid;/* where violation happened */
> - u8  port_num;   /* where violation happened */
> - } __packed ntc_129_131;
> -
> - struct {
> - __be16  reserved;
> - __be16  lid;/* LID where change occurred */
> - u8  reserved2;
> - u8  local_changes;  /* low bit - local changes */
> - __be32  new_cap_mask;   /* new capability mask */
> - u8  reserved3;
> - u8  change_flags;   /* low 3 bits only */
> - } __packed ntc_144;
> -
> - struct {
> - __be16  reserved;
> - __be16  lid;/* lid where sys guid changed */
> - __be16  reserved2;
> - __be64  new_sys_guid;
> - } __packed ntc_145;
> -
> - struct {
> - __be16  reserved;
> - __be16  lid;
> - __be16  dr_slid;
> - u8  method;
> - u8  reserved2;
> - __be16  attr_id;
> - __be32  attr_mod;
> - __be64  mkey;
> - u8  reserved3;
> - u8  dr_trunc_hop;
> - u8  dr_rtn_path[30];
> - } __packed ntc_256;
> -
> - struct {
> - __be16  reserved;
> - __be16  lid1;
> - __be16  lid2;
> - __be32  key;
> - __be32  sl_qp1; /* SL: high 4 bits */
> - __be32  qp2;/* high 8 bits reserved */
> - union ib_gidgid1;
> - union ib_gidgid2;
> - } __packed ntc_257_258;
> -
> - } details;
> -};
> -
> -/*
> - * Generic trap/notice types
> - */
> -#define IB_NOTICE_TYPE_FATAL 0x80
> -#define IB_NOTICE_TYPE_URGENT0x81
> -#define IB_NOTICE_TYPE_SECURITY  0x82
> -#define IB_NOTICE_TYPE_SM0x83
> -#define IB_NOTICE_TYPE_INFO  0x84
> +#define IB_SMP_UNSUP_METHOD \
> +cpu_to_be16(IB_MGMT_MAD_STATUS_UNSUPPORTED_METHOD)
>  
> -/*
> - * Generic trap/notice producers
> - */
> -#define IB_NOTICE_PROD_CAcpu_to_be16(1)
> -#define IB_NOTICE_PROD_SWITCHcpu_to_be16(2)
> -#define IB_NOTICE_PROD_ROUTERcpu_to_be16(3)
> -#define IB_NOTICE_PROD_CLASS_MGR cpu_to_be16

Re: [PATCH v2 01/49] IB/core: Add OPA Port header definitions

2015-06-17 Thread Hal Rosenstock

On 6/17/2015 8:08 AM, Marciniszyn, Mike wrote:
>>
>>>  include/rdma/opa_port_info.h  |  433
>>
>> This matches the current code structure, but is this the best location for 
>> this file?
> 
> Do you have a suggestion?

Couldn't it be a header in the hfi1 driver directory ? Isn't it OPA
specific ?

-- Hal

> 
> Mike
> N�r��y���b�X��ǧv�^�)޺{.n�+{��ٚ�{ay�ʇڙ�,j��f���h���z��w���
> ���j:+v���w�j�mzZ+��ݢj"��!tml=

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH acm] acme.c: Fix IPv6 address handling in inet_any_pton

2015-06-17 Thread Hal Rosenstock

On 6/17/2015 1:50 AM, Hefty, Sean wrote:
> Thanks - applied manually

Thanks.

>>  acm/src/acme.c |2 +-
> 
> Path?
> 
>> diff --git a/acm/src/acme.c b/acm/src/acme.c
>> index 9bf7557..d54c8b9 100644
>> --- a/acm/src/acme.c
>> +++ b/acm/src/acme.c
>> @@ -787,7 +787,7 @@ static int inet_any_pton(char *addr, struct sockaddr
> 
> This was off by a couple hundred lines.  Can you check your branch?

My bad; patch I sent was against SSA ACM. I only compared code to see it
was needed upstream and forgot to port it against upstream tree.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH acm] acme.c: Fix IPv6 address handling in inet_any_pton

2015-06-16 Thread Hal Rosenstock

From: Ilya Nelkenbaum 

In inet_any_pton() routine, src_addr was used by mistake
instead of addr argument.

Signed-off-by: Ilya Nelkenbaum 
Signed-off-by: Hal Rosenstock 
---
 acm/src/acme.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/acm/src/acme.c b/acm/src/acme.c
index 9bf7557..d54c8b9 100644
--- a/acm/src/acme.c
+++ b/acm/src/acme.c
@@ -787,7 +787,7 @@ static int inet_any_pton(char *addr, struct sockaddr *sa)
if (ret <= 0) {
sin6 = (struct sockaddr_in6 *) sa;
sa->sa_family = AF_INET6;
-   ret = inet_pton(AF_INET6, src_addr, &sin6->sin6_addr);
+   ret = inet_pton(AF_INET6, addr, &sin6->sin6_addr);
}
 
return ret;
-- 
1.7.1

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 14/14] IB/mad: Add final OPA MAD processing

2015-06-11 Thread Hal Rosenstock

On 6/11/2015 7:52 PM, Hefty, Sean wrote:
>>> I agree that the node type enum isn't particularly useful and should be
>> retired.
>>
>> Are you referring to kernel space or user space or both ?
> 
> Short term, kernel space.  User space needs to keep something around for 
> backwards compatibility.
> 
> But the in tree code will never expose this value up.
> 
>>> But even if we do, I'm not sure this is the correct approach.  I don't
>> know this for a fact,
>>> but it seems more likely that someone would embed Linux on an IB switch
>> than they would plug an IB switch
>>> into a Linux based system.  The code is designed around the latter.
>> Making this a system wide setting might simplify the code and optimize the
>> code paths.
>>
>> I think we need to discuss how user space would be addressed.
> 
> This is an issue with out of tree drivers.  We're having to guess what things 
> might be doing.  
> Are all devices being exposed up as a 'switch', or is there ever a case where 
> there's a 'switch' device 
> and an HCA device being reported together, or (highly unlikely) a switch 
> device and an RNIC?

Gateways are comprised of switch + HCA devices. There are other more
complex cases of multiple devices.

> If the real use case is to embed Linux on a switch, then we could look at 
> making that a system wide setting, 
> rather than per device.  This could clean up the kernel without impacting the 
> uABI.

I think that system wide is too limiting and it needs to be on a per
device basis.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 14/14] IB/mad: Add final OPA MAD processing

2015-06-11 Thread Hal Rosenstock

On 6/11/2015 5:00 PM, Hefty, Sean wrote:
>>> cap_is_switch_smi would be a nice refinement to let us drop nodetype.
>>
>> Exactly, we need a bit added to the immutable data bits, and a new cap_
>> helper, and then nodetype is ready to be retired.  Add a bit, drop a
>> u8 ;-)
> 
> I agree that the node type enum isn't particularly useful and should be 
> retired.  

Are you referring to kernel space or user space or both ?

> In fact, I don't see where RDMA_NODE_IB_SWITCH is used by any upstream 
> device.  

While not upstream, there are at least 2 vendors with one or more switch
device drivers using the upstream stack.

> So I don't think there's any obligation to keep it.  

In kernel space, we can get rid of it but it's exposed by verbs and
currently relied upon in user space in a number of places.

There's one kernel place that needs more than just cap_is_switch_smi().

> But even if we do, I'm not sure this is the correct approach.  I don't know 
> this for a fact, 
> but it seems more likely that someone would embed Linux on an IB switch than 
> they would plug an IB switch 
> into a Linux based system.  The code is designed around the latter.  Making 
> this a system wide setting might simplify the code and optimize the code 
> paths.

I think we need to discuss how user space would be addressed.

-- Hal

> - Sean
> N�r��y���b�X��ǧv�^�)޺{.n�+{��ٚ�{ay�ʇڙ�,j��f���h���z��w���
> ���j:+v���w�j�mzZ+��ݢj"��!tml=

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v4 1/1] ibacm: Add support for pathrecord query through netlink

2015-06-11 Thread Hal Rosenstock

On 6/11/2015 10:25 AM, Wan, Kaike wrote:

>>> +static void acm_nl_process_resolve(struct acmc_client *client,
>>> +  struct acm_nl_msg *acmnlmsg)
>>> +{
>>> +   struct acm_msg msg;
>>> +   struct nlattr *attr;
>>> +   int payload_len;
>>> +   int rem;
>>> +   int total_attr_len;
>>> +
>>> +   memset(&msg, 0, sizeof(msg));
>>> +   msg.hdr.opcode = ACM_OP_RESOLVE;
>>> +   msg.hdr.version = ACM_VERSION;
>>> +   msg.hdr.length = ACM_MSG_HDR_LENGTH +
>> ACM_MSG_EP_LENGTH;
>>> +   msg.hdr.status = ACM_STATUS_SUCCESS;
>>> +   msg.hdr.tid = (uint64_t) acmnlmsg;
>>> +   msg.resolve_data[0].type = ACM_EP_INFO_PATH;
>>> +
>>> +   payload_len = acmnlmsg->nlmsg_header.nlmsg_len -
>> NLMSG_HDRLEN;
>>> +   attr = NLMSG_DATA(&acmnlmsg->nlmsg_header);
>>> +   rem = payload_len;
>>> +   while (1) {
>>> +   if (rem < (int) sizeof(*attr) ||
>>> +   attr->nla_len < sizeof(*attr) ||
>>> +   attr->nla_len > rem)
>>> +   break;
>>> +
>>> +   acm_nl_parse_path_attr(attr, &msg.resolve_data[0]);
>>> +
>>> +   /* Next attribute */
>>> +   total_attr_len = NLA_ALIGN(attr->nla_len);
>>> +   rem -= total_attr_len;
>>> +   attr = (struct nlattr *) ((char *) attr + total_attr_len);
>>> +   }
>>> +
>>
>> Since ACM does not resolve multicast PRs,
> 
> Why not? The multicast gid will be used as the dgid and ibacm will not ask 
> peers for address resolution. 
> Instead, It will ask SA directly for the multicast pathrecord (route 
> resolution only). 
> I can't see why it can't be done here.

Are you saying the local cache lookup fails so it falls back to ask SA
for multicast PR ? If so, then perhaps there's provider optimization to
fail that style of lookup rather than doing the tree lookup.

>> as an optimization here, some
>> minor check of DGID could be done and if it's multicast DGID, ENODATA
>> could be indicated in NL message.
>>
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v4 1/1] ibacm: Add support for pathrecord query through netlink

2015-06-11 Thread Hal Rosenstock

On 6/9/2015 10:57 AM, kaike@intel.com wrote:
> From: Kaike Wan 
> 
> This patch enables ibacm to process pathrecord queries through netlink.
> Since ibacm can cache pathrecords, this implementation provides an easy
> pathrecord cache for kernel components and therefore offers great
> performance advantage on large fabric systems.
> 
> Signed-off-by: Kaike Wan 
> Signed-off-by: John Fleck 
> Signed-off-by: Ira Weiny 
> Reviewed-by: Sean Hefty 
> ---
>  src/acm.c |  360 
> -
>  1 files changed, 357 insertions(+), 3 deletions(-)
> 
> diff --git a/src/acm.c b/src/acm.c
> index 7649725..d180bd2 100644
> --- a/src/acm.c
> +++ b/src/acm.c
> @@ -46,6 +46,8 @@
>  #include 
>  #include 
>  #include 
> +#include 
> +#include 
>  #include 
>  #include  
>  #include 
> @@ -55,6 +57,8 @@
>  #include 
>  #include 
>  #include 
> +#include 
> +#include 
>  #include 
>  #include "acm_mad.h"
>  #include "acm_util.h"
> @@ -66,6 +70,7 @@
>  #define MAX_EP_ADDR 4
>  #define NL_MSG_BUF_SIZE 4096
>  #define ACM_PROV_NAME_SIZE 64
> +#define NL_CLIENT_INDEX 0
>  
>  struct acmc_subnet {
>   DLIST_ENTRYentry;
> @@ -151,6 +156,26 @@ struct acmc_sa_req {
>   struct acm_sa_mad   mad;
>  };
>  
> +struct acm_nl_status {
> + struct nlattr   attr_hdr;
> + struct rdma_nla_ls_status   status;
> +};
> +
> +struct acm_nl_path {
> + struct nlattr   attr_hdr;
> + struct ib_path_rec_data rec;
> +};
> +
> +struct acm_nl_msg {
> + struct nlmsghdr nlmsg_header;
> + union {
> + uint8_t data[ACM_MSG_DATA_LENGTH];
> + struct nlattr   attr[0];
> + struct acm_nl_statusstatus[0];
> + struct acm_nl_path  path[0];
> + };
> +};
> +
>  static char def_prov_name[ACM_PROV_NAME_SIZE] = "ibacmp";
>  static DLIST_ENTRY provider_list;
>  static struct acmc_prov *def_provider = NULL;
> @@ -172,6 +197,7 @@ static struct acmc_ep *acm_find_ep(struct acmc_port 
> *port, uint16_t pkey);
>  static int acm_ep_insert_addr(struct acmc_ep *ep, const char *name, uint8_t 
> *addr,
> size_t addr_len, uint8_t addr_type);
>  static void acm_event_handler(struct acmc_device *dev);
> +static int acm_nl_send(SOCKET sock, struct acm_msg *msg);
>  
>  static struct sa_data {
>   int timeout;
> @@ -466,7 +492,11 @@ int acm_resolve_response(uint64_t id, struct acm_msg 
> *msg)
>   goto release;
>   }
>  
> - ret = send(client->sock, (char *) msg, msg->hdr.length, 0);
> + if (id == NL_CLIENT_INDEX)
> + ret = acm_nl_send(client->sock, msg);
> + else
> + ret = send(client->sock, (char *) msg, msg->hdr.length, 0);
> +
>   if (ret != msg->hdr.length)
>   acm_log(0, "ERROR - failed to send response\n");
>   else
> @@ -597,6 +627,8 @@ static void acm_svr_accept(void)
>   }
>  
>   for (i = 0; i < FD_SETSIZE - 1; i++) {
> + if (i == NL_CLIENT_INDEX)
> + continue;
>   if (!atomic_get(&client_array[i].refcnt))
>   break;
>   }
> @@ -1346,6 +1378,323 @@ static void acm_ipnl_handler(void)
>   }
>  }
>  
> +static int acm_nl_send(SOCKET sock, struct acm_msg *msg)
> +{
> + struct sockaddr_nl dst_addr;
> + struct acm_nl_msg acmnlmsg;
> + struct acm_nl_msg *orig;
> + int ret;
> + int datalen;
> +
> + orig = (struct acm_nl_msg *) msg->hdr.tid;
> +
> + memset(&dst_addr, 0, sizeof(dst_addr));
> + dst_addr.nl_family = AF_NETLINK;
> + dst_addr.nl_groups = (1 << (RDMA_NL_GROUP_LS - 1));
> +
> + memset(&acmnlmsg, 0, sizeof(acmnlmsg));
> + acmnlmsg.nlmsg_header.nlmsg_len = NLMSG_HDRLEN;
> + acmnlmsg.nlmsg_header.nlmsg_pid = getpid();
> + acmnlmsg.nlmsg_header.nlmsg_type = orig->nlmsg_header.nlmsg_type;
> + acmnlmsg.nlmsg_header.nlmsg_flags = NLM_F_REQUEST;
> + acmnlmsg.nlmsg_header.nlmsg_seq = orig->nlmsg_header.nlmsg_seq;
> +
> + if (msg->hdr.status != ACM_STATUS_SUCCESS) {
> + acm_log(2, "acm status no success = %d\n", msg->hdr.status);
> + acmnlmsg.nlmsg_header.nlmsg_flags |= RDMA_NL_LS_F_ERR;
> + acmnlmsg.nlmsg_header.nlmsg_len +=
> + NLA_ALIGN(sizeof(struct acm_nl_status));
> + acmnlmsg.status[0].attr_hdr.nla_type = LS_NLA_TYPE_STATUS;
> + acmnlmsg.status[0].attr_hdr.nla_len = NLA_HDRLEN +
> + sizeof(struct rdma_nla_ls_status);
> + if (msg->hdr.status == ACM_STATUS_EINVAL)
> + acmnlmsg.status[0].status.status = LS_NLA_STATUS_EINVAL;
> + else
> + acmnlmsg.status[0].status.status =
> + LS_NLA_STATUS_ENODATA;
> + } else {
> + acm_log(2, "acm status success\n

Re: [PATCH v4 4/4] IB/sa: Route SA pathrecord query through netlink

2015-06-11 Thread Hal Rosenstock

On 6/11/2015 8:54 AM, Wan, Kaike wrote:
>> From: Hal Rosenstock [mailto:h...@dev.mellanox.co.il]
>> Sent: Thursday, June 11, 2015 8:24 AM
>> To: Weiny, Ira
>> Cc: Jason Gunthorpe; Wan, Kaike; linux-rdma@vger.kernel.org; Fleck, John
>> Subject: Re: [PATCH v4 4/4] IB/sa: Route SA pathrecord query through netlink
>>
>> On 6/10/2015 5:09 PM, Weiny, Ira wrote:
>>>> On 6/10/2015 3:10 PM, Jason Gunthorpe wrote:
>>>>> On Wed, Jun 10, 2015 at 01:47:36PM -0400, Hal Rosenstock wrote:
>>>>>> On 6/9/2015 10:57 AM, kaike@intel.com wrote:
>>>>>>> From: Kaike Wan 
>>>>>>>
>>>>>>> This patch routes a SA pathrecord query to netlink first
>>>>>>
>>>>>> Should only unicast PRs be done in this manner or should API
>>>>>> support enabling for unicast and/or multicast ?
>>>>>>
>>>>>> AFAIK kernel doesn't query multicast PRs now (queries MCMRs) but
>>>>>> this seems like it would help make it future proof and not have to
>>>>>> take timeout on local query unless app supports it.
>>>>>
>>>>> It is a good question. We can clearly extend toward that, using a
>>>>> MGID as the DGID and adding additional nested netlink fields.
>>>>>
>>>>> However, does it make sense?
>>>>
>>>> If it doesn't make sense, then should MGIDs as DGIDs never be
>>>> requested via the PR netlink API ?
>>>>
>>>
>>> Why should we prevent it?  What does it hurt?
>>
>> It's merely an optimization in that round trip to user space is avoided with
>> user space needing to perform some validation/lookup which would fail in
>> case multicast PRs not supported (which is case for ACM with multicast
>> backend).
> 
> If the kernel client can query SA for mulitcast PRs, why can't ibacm (even 
> backed by acmp)? 

It can.

> Is there any code there that prevents a mgid being used as the destination?

No. It's a matter of optimization only.

>>
>> If user space PR capabilities (unicast, multicast, both) is supported, it 
>> affects
>> the API. 
> How? If we can query SA for multicast PRs without joining the multicast 
> groups, 
> what additional changes in netlink API do we need to support both?

Nothing to support both but if we wanted to disable one or other based
on user space we would but it sounds like we don't need/want this but
would use user space rejection/no data for this.

>> Maybe it's overkill but requires user space to indicate no PR available
>> for multicast DGIDs. I think this is the tradeoff to be made/decided.
>>
> 
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v4 4/4] IB/sa: Route SA pathrecord query through netlink

2015-06-11 Thread Hal Rosenstock

On 6/10/2015 5:09 PM, Weiny, Ira wrote:
>> On 6/10/2015 3:10 PM, Jason Gunthorpe wrote:
>>> On Wed, Jun 10, 2015 at 01:47:36PM -0400, Hal Rosenstock wrote:
>>>> On 6/9/2015 10:57 AM, kaike@intel.com wrote:
>>>>> From: Kaike Wan 
>>>>>
>>>>> This patch routes a SA pathrecord query to netlink first
>>>>
>>>> Should only unicast PRs be done in this manner or should API support
>>>> enabling for unicast and/or multicast ?
>>>>
>>>> AFAIK kernel doesn't query multicast PRs now (queries MCMRs) but this
>>>> seems like it would help make it future proof and not have to take
>>>> timeout on local query unless app supports it.
>>>
>>> It is a good question. We can clearly extend toward that, using a MGID
>>> as the DGID and adding additional nested netlink fields.
>>>
>>> However, does it make sense?
>>
>> If it doesn't make sense, then should MGIDs as DGIDs never be requested via
>> the PR netlink API ?
>>
> 
> Why should we prevent it?  What does it hurt?

It's merely an optimization in that round trip to user space is avoided
with user space needing to perform some validation/lookup which would
fail in case multicast PRs not supported (which is case for ACM with
multicast backend).

If user space PR capabilities (unicast, multicast, both) is supported,
it affects the API. Maybe it's overkill but requires user space to
indicate no PR available for multicast DGIDs. I think this is the
tradeoff to be made/decided.

-- Hal

> Ira
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v4 1/4] IB/netlink: Add defines for local service requests through netlink

2015-06-10 Thread Hal Rosenstock

On 6/10/2015 2:31 PM, Wan, Kaike wrote:
>> From: Hal Rosenstock [mailto:h...@dev.mellanox.co.il]
>> Sent: Wednesday, June 10, 2015 1:47 PM
>>
>> On 6/9/2015 10:57 AM, kaike@intel.com wrote:
>>> From: Kaike Wan 
>>>
>>> This patch adds netlink defines for SA client, local service group,
>>> local service operations, and related attributes.
>>>
>>> Signed-off-by: Kaike Wan 
>>> Signed-off-by: John Fleck 
>>> Signed-off-by: Ira Weiny 
>>> Reviewed-by: Sean Hefty 
>>> ---
>>>  include/uapi/rdma/rdma_netlink.h |   82
>> ++
>>>  1 files changed, 82 insertions(+), 0 deletions(-)
>>>
>>> diff --git a/include/uapi/rdma/rdma_netlink.h
>>> b/include/uapi/rdma/rdma_netlink.h
>>> index 6e4bb42..341e9be 100644
>>> --- a/include/uapi/rdma/rdma_netlink.h
>>> +++ b/include/uapi/rdma/rdma_netlink.h
>>> @@ -7,12 +7,14 @@ enum {
>>> RDMA_NL_RDMA_CM = 1,
>>> RDMA_NL_NES,
>>> RDMA_NL_C4IW,
>>> +   RDMA_NL_SA,
>>> RDMA_NL_NUM_CLIENTS
>>>  };
>>>
>>>  enum {
>>> RDMA_NL_GROUP_CM = 1,
>>> RDMA_NL_GROUP_IWPM,
>>> +   RDMA_NL_GROUP_LS,
>>> RDMA_NL_NUM_GROUPS
>>>  };
>>>
>>> @@ -128,5 +130,85 @@ enum {
>>> IWPM_NLA_ERR_MAX
>>>  };
>>>
>>> +/* Local service group opcodes */
>>> +enum {
>>> +   RDMA_NL_LS_OP_RESOLVE = 0,
>>> +   RDMA_NL_LS_OP_SET_TIMEOUT,
>>> +   RDMA_NL_LS_NUM_OPS
>>> +};
>>> +
>>> +/* Local service netlink message flags */
>>> +#define RDMA_NL_LS_F_OK0x0100  /* Success response */
>>> +#define RDMA_NL_LS_F_ERR   0x0200  /* Failed response */
>>> +
>>> +/* Local service attribute type */
>>> +enum {
>>> +   LS_NLA_TYPE_STATUS = 0,
>>> +   LS_NLA_TYPE_PATH_RECORD,
>>> +   LS_NLA_TYPE_TIMEOUT,
>>> +   LS_NLA_TYPE_SERVICE_ID,
>>> +   LS_NLA_TYPE_DGID,
>>> +   LS_NLA_TYPE_SGID,
>>> +   LS_NLA_TYPE_TCLASS,
>>> +   LS_NLA_TYPE_REVERSIBLE,
>>> +   LS_NLA_TYPE_NUM_PATH,
>>
>> Should this be NUMB_PATH rather than NUM_PATH ?
> 
> Yes. 
> 
>>
>>> +   LS_NLA_TYPE_PKEY,
>>> +   LS_NLA_TYPE_QOS_CLASS,
>>
>> Should this include SL too ?
> 
> It will be another attribute. 

If SL is to be included, maybe it should be SL mask. Maybe this is an
example of future extensibility...

> All the fields are modeled after struct ib_sa_path_rec and struct 
> ib_user_path_rec,  
> not after struct ibv_path_record in the user space. In addition, only those 
> pathrecord components that are currently 
> used by rdma_cm, ipoib, and srp are list here.

Understood.

>>
>>> +   LS_NLA_TYPE_MAX
>>> +};
>>> +
>>> +/* Local service status attribute */
>>> +enum {
>>> +   LS_NLA_STATUS_SUCCESS = 0,
>>> +   LS_NLA_STATUS_EINVAL,
>>> +   LS_NLA_STATUS_ENODATA,
>>> +   LS_NLA_STATUS_MAX
>>> +};
>>> +
>>> +struct rdma_nla_ls_status {
>>> +   __u32   status;
>>> +};
>>> +
>>> +/* Local service pathrecord attribute: struct ib_path_rec_data */
>>> +
>>> +/* Local service timeout attribute */ struct rdma_nla_ls_timeout {
>>> +   __u32   timeout;
>>> +};
>>> +
>>> +/* Local Service ServiceID attribute */ struct rdma_nla_ls_service_id
>>> +{
>>> +   __be64  service_id;
>>> +};
>>> +
>>> +/* Local Service DGID/SGID attribute: big endian */ struct
>>> +rdma_nla_ls_gid {
>>> +   __u8gid[16];
>>> +};
>>> +
>>> +/* Local Service Traffic Class attribute */ struct rdma_nla_ls_tclass
>>> +{
>>> +   __u8tclass;
>>> +};
>>> +
>>> +/* Local Service Reversible attribute */ struct
>>> +rdma_nla_ls_reversible {
>>> +   __u32   reversible;
>>> +};
>>
>> Isn't __u8 sufficient for reversible ?
> Certainly enough. However, reversible is __u32 in struct ib_user_path_rec and 
> int in struct ib_sa_path_rec.

OK; I hadn't double checked there. So it's "inherited" from those
reversible definitions.

>>
>>> +
>>> +/* Local Service numb_path attribute */ struct rdma_nla_ls_numb_path
>>> +{
>>> +   __u8numb_path;
>>> +};
>>> +
>>> +/* Local Service Pkey attribute*/
>>> +struct rdma_nla_ls_pkey {
>>> +   __be16  pkey;
>>> +};
>>> +
>>> +/* Local Service Qos Class attribute */ struct rdma_nla_ls_qos_class
>>> +{
>>> +   __be16  qos_class;
>>> +};
>>>
>>>  #endif /* _UAPI_RDMA_NETLINK_H */
> 
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v4 4/4] IB/sa: Route SA pathrecord query through netlink

2015-06-10 Thread Hal Rosenstock

On 6/10/2015 3:10 PM, Jason Gunthorpe wrote:
> On Wed, Jun 10, 2015 at 01:47:36PM -0400, Hal Rosenstock wrote:
>> On 6/9/2015 10:57 AM, kaike@intel.com wrote:
>>> From: Kaike Wan 
>>>
>>> This patch routes a SA pathrecord query to netlink first
>>
>> Should only unicast PRs be done in this manner or should API support
>> enabling for unicast and/or multicast ?
>>
>> AFAIK kernel doesn't query multicast PRs now (queries MCMRs) but this
>> seems like it would help make it future proof and not have to take
>> timeout on local query unless app supports it.
> 
> It is a good question. We can clearly extend toward that, using a MGID
> as the DGID and adding additional nested netlink fields.
> 
> However, does it make sense? 

If it doesn't make sense, then should MGIDs as DGIDs never be requested
via the PR netlink API ?

> I don't think user space should be
> joining on behalf of the kernel... 

Agreed.

> Do we ever query without a join?

Query is possible prior to join.

The use case I'm aware of is doing query PR for MGID rather than query
MCMR. That case could be viewed just like unicast PRs.

-- Hal

> Jasno
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v4 0/4] Sending kernel pathrecord query to user cache server

2015-06-10 Thread Hal Rosenstock

On 6/10/2015 1:04 PM, Hefty, Sean wrote:
>> Not in the patches themselves but in the general issue when a PR changes.
>>
>> Do you think this needs addressing or are things fine as they are now ?
> 
> IMO, I think it needs addressing in terms of "can the proposed netlink 
> architecture and design accommodate this 
> sort of request in the future?"  We shouldn't design in a limitation up 
> front.  I don't see anything in the current approach that would cause an 
> issue.  There would likely be a need for new messages and attributes.

The current proposal is focused around the PR attributes/styles
currently used in the kernel. The case I can see is if in future a new
attribute is added to the PR netlink API. How is that handled ? Can user
space say it can't service a request ? That seems a little different
from the no PR case.

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v4 1/4] IB/netlink: Add defines for local service requests through netlink

2015-06-10 Thread Hal Rosenstock

On 6/9/2015 10:57 AM, kaike@intel.com wrote:
> From: Kaike Wan 
> 
> This patch adds netlink defines for SA client, local service group, local
> service operations, and related attributes.
> 
> Signed-off-by: Kaike Wan 
> Signed-off-by: John Fleck 
> Signed-off-by: Ira Weiny 
> Reviewed-by: Sean Hefty 
> ---
>  include/uapi/rdma/rdma_netlink.h |   82 
> ++
>  1 files changed, 82 insertions(+), 0 deletions(-)
> 
> diff --git a/include/uapi/rdma/rdma_netlink.h 
> b/include/uapi/rdma/rdma_netlink.h
> index 6e4bb42..341e9be 100644
> --- a/include/uapi/rdma/rdma_netlink.h
> +++ b/include/uapi/rdma/rdma_netlink.h
> @@ -7,12 +7,14 @@ enum {
>   RDMA_NL_RDMA_CM = 1,
>   RDMA_NL_NES,
>   RDMA_NL_C4IW,
> + RDMA_NL_SA,
>   RDMA_NL_NUM_CLIENTS
>  };
>  
>  enum {
>   RDMA_NL_GROUP_CM = 1,
>   RDMA_NL_GROUP_IWPM,
> + RDMA_NL_GROUP_LS,
>   RDMA_NL_NUM_GROUPS
>  };
>  
> @@ -128,5 +130,85 @@ enum {
>   IWPM_NLA_ERR_MAX
>  };
>  
> +/* Local service group opcodes */
> +enum {
> + RDMA_NL_LS_OP_RESOLVE = 0,
> + RDMA_NL_LS_OP_SET_TIMEOUT,
> + RDMA_NL_LS_NUM_OPS
> +};
> +
> +/* Local service netlink message flags */
> +#define RDMA_NL_LS_F_OK  0x0100  /* Success response */
> +#define RDMA_NL_LS_F_ERR 0x0200  /* Failed response */
> +
> +/* Local service attribute type */
> +enum {
> + LS_NLA_TYPE_STATUS = 0,
> + LS_NLA_TYPE_PATH_RECORD,
> + LS_NLA_TYPE_TIMEOUT,
> + LS_NLA_TYPE_SERVICE_ID,
> + LS_NLA_TYPE_DGID,
> + LS_NLA_TYPE_SGID,
> + LS_NLA_TYPE_TCLASS,
> + LS_NLA_TYPE_REVERSIBLE,
> + LS_NLA_TYPE_NUM_PATH,

Should this be NUMB_PATH rather than NUM_PATH ?

> + LS_NLA_TYPE_PKEY,
> + LS_NLA_TYPE_QOS_CLASS,

Should this include SL too ?

> + LS_NLA_TYPE_MAX
> +};
> +
> +/* Local service status attribute */
> +enum {
> + LS_NLA_STATUS_SUCCESS = 0,
> + LS_NLA_STATUS_EINVAL,
> + LS_NLA_STATUS_ENODATA,
> + LS_NLA_STATUS_MAX
> +};
> +
> +struct rdma_nla_ls_status {
> + __u32   status;
> +};
> +
> +/* Local service pathrecord attribute: struct ib_path_rec_data */
> +
> +/* Local service timeout attribute */
> +struct rdma_nla_ls_timeout {
> + __u32   timeout;
> +};
> +
> +/* Local Service ServiceID attribute */
> +struct rdma_nla_ls_service_id {
> + __be64  service_id;
> +};
> +
> +/* Local Service DGID/SGID attribute: big endian */
> +struct rdma_nla_ls_gid {
> + __u8gid[16];
> +};
> +
> +/* Local Service Traffic Class attribute */
> +struct rdma_nla_ls_tclass {
> + __u8tclass;
> +};
> +
> +/* Local Service Reversible attribute */
> +struct rdma_nla_ls_reversible {
> + __u32   reversible;
> +};

Isn't __u8 sufficient for reversible ?

> +
> +/* Local Service numb_path attribute */
> +struct rdma_nla_ls_numb_path {
> + __u8numb_path;
> +};
> +
> +/* Local Service Pkey attribute*/
> +struct rdma_nla_ls_pkey {
> + __be16  pkey;
> +};
> +
> +/* Local Service Qos Class attribute */
> +struct rdma_nla_ls_qos_class {
> + __be16  qos_class;
> +};
>  
>  #endif /* _UAPI_RDMA_NETLINK_H */

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v4 4/4] IB/sa: Route SA pathrecord query through netlink

2015-06-10 Thread Hal Rosenstock

On 6/9/2015 10:57 AM, kaike@intel.com wrote:
> From: Kaike Wan 
> 
> This patch routes a SA pathrecord query to netlink first

Should only unicast PRs be done in this manner or should API support
enabling for unicast and/or multicast ?

AFAIK kernel doesn't query multicast PRs now (queries MCMRs) but this
seems like it would help make it future proof and not have to take
timeout on local query unless app supports it.

> and processes the
> response appropriately. If a failure is returned, the request will be sent
> through IB. The decision whether to route the request to netlink first is
> determined by the presence of a listener for the local service netlink
> multicast group. If the user-space local service netlink multicast group
> listener is not present, the request will be sent through IB, just like
> what is currently being done.
> 
> Signed-off-by: Kaike Wan 
> Signed-off-by: John Fleck 
> Signed-off-by: Ira Weiny 
> Reviewed-by: Sean Hefty 
> ---
>  drivers/infiniband/core/sa_query.c |  515 
> +++-
>  include/uapi/rdma/rdma_netlink.h   |2 +-
>  2 files changed, 515 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/infiniband/core/sa_query.c 
> b/drivers/infiniband/core/sa_query.c
> index 17e1cf7..e063593 100644
> --- a/drivers/infiniband/core/sa_query.c
> +++ b/drivers/infiniband/core/sa_query.c
> @@ -45,12 +45,21 @@
>  #include 
>  #include 
>  #include 
> +#include 
> +#include 
> +#include 
> +#include 
>  #include "sa.h"
>  
>  MODULE_AUTHOR("Roland Dreier");
>  MODULE_DESCRIPTION("InfiniBand subnet administration query support");
>  MODULE_LICENSE("Dual BSD/GPL");
>  
> +#define IB_SA_LOCAL_SVC_TIMEOUT_MIN  100
> +#define IB_SA_LOCAL_SVC_TIMEOUT_DEFAULT  2000
> +#define IB_SA_LOCAL_SVC_TIMEOUT_MAX  20
> +static int sa_local_svc_timeout_ms = IB_SA_LOCAL_SVC_TIMEOUT_DEFAULT;
> +
>  struct ib_sa_sm_ah {
>   struct ib_ah*ah;
>   struct kref  ref;
> @@ -80,8 +89,24 @@ struct ib_sa_query {
>   struct ib_mad_send_buf *mad_buf;
>   struct ib_sa_sm_ah *sm_ah;
>   int id;
> + u32 flags;
>  };
>  
> +#define IB_SA_ENABLE_LOCAL_SERVICE   0x0001
> +#define IB_SA_CANCEL 0x0002
> +
> +#define IB_SA_LOCAL_SVC_ENABLED(query) \
> + ((query)->flags & IB_SA_ENABLE_LOCAL_SERVICE)
> +#define IB_SA_ENABLE_LOCAL_SVC(query) \
> + ((query)->flags |= IB_SA_ENABLE_LOCAL_SERVICE)
> +#define IB_SA_DISABLE_LOCAL_SVC(query) \
> + ((query)->flags &= ~IB_SA_ENABLE_LOCAL_SERVICE)
> +
> +#define IB_SA_QUERY_CANCELLED(query) \
> + ((query)->flags & IB_SA_CANCEL)
> +#define IB_SA_CANCEL_QUERY(query) \
> + ((query)->flags |= IB_SA_CANCEL)
> +
>  struct ib_sa_service_query {
>   void (*callback)(int, struct ib_sa_service_rec *, void *);
>   void *context;
> @@ -106,6 +131,26 @@ struct ib_sa_mcmember_query {
>   struct ib_sa_query sa_query;
>  };
>  
> +struct ib_nl_request_info {
> + struct list_head list;
> + u32 seq;
> + unsigned long timeout;
> + struct ib_sa_query *query;
> +};
> +
> +struct ib_nl_attr_info {
> + u16 len;/* Total attr len: header + payload + padding */
> + ib_sa_comp_mask comp_mask;
> + void *input;
> + void (*set_attrs)(struct sk_buff *skb, struct ib_nl_attr_info *info);
> +};
> +
> +static LIST_HEAD(ib_nl_request_list);
> +static DEFINE_SPINLOCK(ib_nl_request_lock);
> +static atomic_t ib_nl_sa_request_seq;
> +static struct workqueue_struct *ib_nl_wq;
> +static struct delayed_work ib_nl_timed_work;
> +
>  static void ib_sa_add_one(struct ib_device *device);
>  static void ib_sa_remove_one(struct ib_device *device);
>  
> @@ -381,6 +426,433 @@ static const struct ib_field guidinfo_rec_table[] = {
> .size_bits= 512 },
>  };
>  
> +static int ib_nl_send_msg(int opcode, struct ib_nl_attr_info *attrs, u32 seq)
> +{
> + struct sk_buff *skb = NULL;
> + struct nlmsghdr *nlh;
> + void *data;
> + int ret = 0;
> +
> + if (attrs->len <= 0)
> + return -EMSGSIZE;
> +
> + skb = nlmsg_new(attrs->len, GFP_KERNEL);
> + if (!skb) {
> + pr_err("alloc failed ret=%d\n", ret);
> + return -ENOMEM;
> + }
> +
> + /* Put nlmsg header only for now */
> + data = ibnl_put_msg(skb, &nlh, seq, 0, RDMA_NL_SA,
> + opcode, GFP_KERNEL);
> + if (!data) {
> + kfree_skb(skb);
> + return -EMSGSIZE;
> + }
> +
> + /* Add attributes */
> + attrs->set_attrs(skb, attrs);
> +
> + /* Repair the nlmsg header length */
> + nlmsg_end(skb, nlh);
> +
> + ret = ibnl_multicast(skb, nlh, RDMA_NL_GROUP_LS, GFP_KERNEL);
> + if (!ret) {
> + ret = attrs->len;
> + } else {
> + if (ret != -ESRCH)
> + pr_err("ibnl_multicast failed l=%d, r=%d\n",
> +attrs->len, ret);
> +

Re: [PATCH v4 0/4] Sending kernel pathrecord query to user cache server

2015-06-10 Thread Hal Rosenstock

On 6/10/2015 11:21 AM, Hefty, Sean wrote:
>> While this appears to address the current upstream use model for ACM
>> with it's multicast overlay backend where PRs are static, it does not
>> appear to address PR changes.
> 
> Although this ties into ibacm, from the viewpoint of the kernel, there's no 
> requirement on the user space implementation.

True. This is not how it works in the kernel today. It is meant as
future thinking/exploring.

>> Should aging/revalidation of PRs be supported ? If so, would this make
>> PRs similar at "high" level to IP neighbor cache in kernel ?
> 
> This is requesting a new feature not supported by anything in the kernel 
> today, and would seem to fall well
> outside the scope of the suggested changes.  

Outside scope of suggested changes but where does the kernel need to
head in the longer term ?

> Is there a specific issue in the patches that you are seeing related to this?

Not in the patches themselves but in the general issue when a PR changes.

Do you think this needs addressing or are things fine as they are now ?
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v4 0/4] Sending kernel pathrecord query to user cache server

2015-06-10 Thread Hal Rosenstock

On 6/10/2015 11:49 AM, Wan, Kaike wrote:
>>> A SA cache is undeniably critical for fabric scalability and 
>>> performance.
>>> In user space, the ibacm application provides a good example of
>>> pathrecord cache for address and route resolution. With the recent
>>> implementation of the provider architecture, ibacm offers more
>> extensibility as a SA cache.
>>> In kernel, ipoib implements its own small cache for pathrecords,
>>> which is however not available for general use. Furthermore, the
>>> implementation of a SA cache in user space offers better
>>> flexibility, larger capacity, and more robustness for the system.
>>>
>>> In this patch series, a mechanism is implemented to allow ib_sa to
>>> send pathrecord query to a user application (eg ibacm) through netlink.
>>
>> While this appears to address the current upstream use model for
>> ACM with it's multicast overlay backend where PRs are static, it
>> does not appear to address PR changes.
>> Should aging/revalidation of PRs be supported ? If so, would this
>> make PRs similar at "high" level to IP neighbor cache in kernel ?
>
> Even for the default provider acmp, PRs will time out and the length
> of
 timeout is configurable. For other providers (eg ibssa), the PR
 change could be managed correctly and promptly, and this capability
 is beyond ibacm core itself.

 That deals with the update of PR in user space (ACM). Doesn't kernel
 need some way of knowing PR was updated ?
>>>
>>> Maybe. If a kernel client is interested in PR changes, it can register
>>> for notification
>>
>> Wouldn't the notification mechanism be via netlink ? What would the
>> granularity be for the registrations and notifications ?
> 
> Not necessarily. AFAIK, SA notification is not currently supported in kernel. 
> One could register notification through netlink, as long as the user space 
> application (eg: ibacm) supports it. But this would be a new feature, as Sean 
> pointed out in previous e-mail.

Such notifications are not necessarily tied to SA event reporting. In
fact, Todd made argument at 2014 OFA Devcon that such mechanism is
anathema to (exa)scalability.

-- Hal

> Kaike
> 
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

1 2 3 4 5 6 7 8 9 10 >

1 - 100 of 1627 matches

Mail list logo