Re: mlx4 missing completions (was Re: [PATCH-v2] ib_srpt: Initial SRP Target merge for v3.2-rc1)

2011-10-27 Thread Bart Van Assche
On Wed, Oct 26, 2011 at 10:06 PM, Roland Dreier rol...@purestorage.com wrote:
 On Wed, Oct 26, 2011 at 11:05 AM, Bart Van Assche bvanass...@acm.org wrote:
  Can I conclude from your reply that the last WQE event refers to the
  SRQ only and that it does not provide any information about the send
  queue associated with the same QP ?

 That is correct as I understand things.  The last WQE event was added
 to handle the fact that if you have a QP associated to an SRQ, you can't
 know how many receives have been taken from the SRQ and are in flight
 on the receive queue.  The IB spec is a bit vague on what last WQE
 actually means but based on the background, I believe it means that the
 last request on the receive queue has been processed, without saying
 anything about the send queue.

Are you sure about this ? There is a paragraph in the InfiniBand
Architecture Manual that can only be correct if last WQE refers to
both SRQ and send queue and not to the SRQ only. It is mentioned in
that manual that it is safe to reset the QP immediately after the last
WQE event has been received. A quote:

Note, for QPs that are associated with an SRQ, the Consumer should
take the QP through the Error State before invoking a Destroy QP or a
Modify QP to the Reset State. The Consumer may invoke the Destroy QP
without first performing a Modify QP to the Error State and waiting
for the Affiliated Asynchronous Last WQE Reached Event. However, if
the Consumer does not wait for the Affiliated Asynchronous Last WQE
Reached Event, then WQE and Data Segment leakage may occur. Therefore,
it is good programming practice to tear down a QP that is associated
with an SRQ by using the following process:
• Put the QP in the Error State;
• wait for the Affiliated Asynchronous Last WQE Reached Event;
• either:
• drain the CQ by invoking the Poll CQ verb and either wait for CQ to
be empty or the number of Poll CQ operations has exceeded CQ capacity
size; or
• post another WR that completes on the same CQ and wait for this WR
to return as a WC;
• and then invoke a Destroy QP or Reset QP.

Bart.
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


REQ GID enforcement in the CM

2011-10-27 Thread Or Gerlitz

Hi Sean,

Looking on that case, I noted that the CM code (SB) checks that the GID 
in the incoming
REQ is present in at least of one the ports of the relevant device, but 
not specifically on
the port this request arrived to, is that following IBTA? I thought it 
could be problematic
e.g in the case of RC QP being set by this establishment, later the RC 
packets would be dropped
by the device if they have GRH and the GID isn't on that port table, 
isn't that?


Or.



ret = cm_init_av_by_path(work-path[0], cm_id_priv-av);
if (ret) {
ib_get_cached_gid(work-port-cm_dev-ib_device,
  work-port-port_num, 0, 
work-path[0].sgid);

ib_send_cm_rej(cm_id, IB_CM_REJ_INVALID_GID,
work-path[0].sgid, sizeof work-path[0].sgid,
   NULL, 0);
goto rejected;
}



 read_lock_irqsave(cm.device_lock, flags);
list_for_each_entry(cm_dev, cm.device_list, list) {
if (!ib_find_cached_gid(cm_dev-ib_device, path-sgid,
p, NULL)) {
port = cm_dev-port[p-1];
break;
}
}
read_unlock_irqrestore(cm.device_lock, flags);

if (!port)
return -EINVAL;



--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] RDMA/nes: ip address print changes

2011-10-27 Thread Faisal Latif
Fix printing of ip addresses to use %pI4.

Reported-by: Roland Dreier rol...@purestorage.com
Signed-off-by: Tatyana Nikolova tatyana.e.nikol...@intel.com
Signed-off-by: Faisal Latif faisal.la...@intel.com
---
 drivers/infiniband/hw/nes/nes_cm.c |   96 +++-
 drivers/infiniband/hw/nes/nes_hw.c |6 +-
 2 files changed, 54 insertions(+), 48 deletions(-)

diff --git a/drivers/infiniband/hw/nes/nes_cm.c 
b/drivers/infiniband/hw/nes/nes_cm.c
index dfce9ea..1414168 100644
--- a/drivers/infiniband/hw/nes/nes_cm.c
+++ b/drivers/infiniband/hw/nes/nes_cm.c
@@ -173,6 +173,7 @@ static struct nes_cm_event *create_event(struct nes_cm_node 
*   cm_node,
 enum nes_cm_event_type type)
 {
struct nes_cm_event *event;
+   __be32 tmp_loc_addr, tmp_rem_addr;
 
if (!cm_node-cm_id)
return NULL;
@@ -191,11 +192,14 @@ static struct nes_cm_event *create_event(struct 
nes_cm_node * cm_node,
event-cm_info.loc_port = cm_node-loc_port;
event-cm_info.cm_id = cm_node-cm_id;
 
+   tmp_loc_addr = htonl(cm_node-loc_addr);
+   tmp_rem_addr = htonl(cm_node-rem_addr);
+
nes_debug(NES_DBG_CM, cm_node=%p Created event=%p, type=%u, 
- dst_addr=%08x[%x], src_addr=%08x[%x]\n,
- cm_node, event, type, event-cm_info.loc_addr,
- event-cm_info.loc_port, event-cm_info.rem_addr,
- event-cm_info.rem_port);
+ loc_addr=%pI4[%x], rem_addr=%pI4[%x]\n,
+   cm_node, event, type, tmp_loc_addr,
+   event-cm_info.loc_port, tmp_rem_addr,
+   event-cm_info.rem_port);
 
nes_cm_post_event(event);
return event;
@@ -1335,11 +1339,12 @@ static int nes_addr_resolve_neigh(struct nes_vnic 
*nesvnic, u32 dst_ip, int arpi
int rc = arpindex;
struct net_device *netdev;
struct nes_adapter *nesadapter = nesvnic-nesdev-nesadapter;
+   __be32 rem_addr = htonl(dst_ip);
 
-   rt = ip_route_output(init_net, htonl(dst_ip), 0, 0, 0);
+   rt = ip_route_output(init_net, rem_addr, 0, 0, 0);
if (IS_ERR(rt)) {
-   printk(KERN_ERR %s: ip_route_output_key failed for 0x%08X\n,
-  __func__, dst_ip);
+   printk(KERN_ERR %s: ip_route_output_key failed for %pI4\n,
+  __func__, rem_addr);
return rc;
}
 
@@ -1351,9 +1356,9 @@ static int nes_addr_resolve_neigh(struct nes_vnic 
*nesvnic, u32 dst_ip, int arpi
neigh = neigh_lookup(arp_tbl, rt-rt_gateway, netdev);
if (neigh) {
if (neigh-nud_state  NUD_VALID) {
-   nes_debug(NES_DBG_CM, Neighbor MAC address for 0x%08X
-  is %pM, Gateway is 0x%08X \n, dst_ip,
- neigh-ha, ntohl(rt-rt_gateway));
+   nes_debug(NES_DBG_CM, Neighbor MAC address for %pI4
+  is %pM, Gateway is %pI4\n, rem_addr,
+ neigh-ha, rt-rt_gateway);
 
if (arpindex = 0) {
if 
(!memcmp(nesadapter-arp_table[arpindex].mac_addr,
@@ -1397,6 +1402,7 @@ static struct nes_cm_node *make_cm_node(struct 
nes_cm_core *cm_core,
int arpindex = 0;
struct nes_device *nesdev;
struct nes_adapter *nesadapter;
+   __be32 tmp_rem_addr;
 
/* create an hte and cm_node for this instance */
cm_node = kzalloc(sizeof(*cm_node), GFP_ATOMIC);
@@ -1414,9 +1420,11 @@ static struct nes_cm_node *make_cm_node(struct 
nes_cm_core *cm_core,
cm_node-ird_size = IETF_NO_IRD_ORD;
cm_node-ord_size = IETF_NO_IRD_ORD;
 
+   tmp_rem_addr = htonl(cm_node-rem_addr);
nes_debug(NES_DBG_CM, Make node addresses : loc = %pI4:%x, rem = 
%pI4:%x\n,
- cm_node-loc_addr, cm_node-loc_port,
- cm_node-rem_addr, cm_node-rem_port);
+ nesvnic-local_ipaddr, cm_node-loc_port,
+ tmp_rem_addr, cm_node-rem_port);
+
cm_node-listener = listener;
cm_node-netdev = nesvnic-netdev;
cm_node-cm_id = cm_info-cm_id;
@@ -2198,8 +2206,8 @@ static struct nes_cm_listener *mini_cm_listen(struct 
nes_cm_core *cm_core,
struct nes_cm_listener *listener;
unsigned long flags;
 
-   nes_debug(NES_DBG_CM, Search for 0x%08x : 0x%04x\n,
- cm_info-loc_addr, cm_info-loc_port);
+   nes_debug(NES_DBG_CM, Search for %pI4 : 0x%04x\n,
+ cm_info-loc_addr, cm_info-loc_port);
 
/* cannot have multiple matching listeners */
listener = find_listener(cm_core, htonl(cm_info-loc_addr),
@@ -2248,9 +2256,9 @@ static struct nes_cm_listener *mini_cm_listen(struct 
nes_cm_core *cm_core,
atomic_inc(cm_core-listen_node_cnt);
}
 
-   nes_debug(NES_DBG_CM, Api - listen(): 

Re: REQ GID enforcement in the CM

2011-10-27 Thread Jason Gunthorpe
On Thu, Oct 27, 2011 at 04:14:07PM +0200, Or Gerlitz wrote:

 Looking on that case, I noted that the CM code (SB) checks that the
 GID in the incoming REQ is present in at least of one the ports of
 the relevant device, but not specifically on the port this request
 arrived to, is that following IBTA? I thought it could be
 problematic e.g in the case of RC QP being set by this
 establishment, later the RC packets would be dropped by the device
 if they have GRH and the GID isn't on that port table, isn't that?

You can use the CM to establish a connection on another port, eg send
CM GMPs to port 1, specify the primary data path is port 2 and specify
the alternate data path is port 1.

Jason
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] opensm/osm_link_mgr.c: Remove no longer needed memset

2011-10-27 Thread Hal Rosenstock

Since PortInfo attribute is now IB_SMP_DATA_SIZE (64 bytes),
there is no remaining space at the end of the attribute as
it now fills the entire SMP data size.

Found-by: Leonid Keller leo...@mellanox.co.il

Signed-off-by: Hal Rosenstock h...@mellanox.com
---
diff --git a/opensm/osm_link_mgr.c b/opensm/osm_link_mgr.c
index aaefa78..e834d16 100644
--- a/opensm/osm_link_mgr.c
+++ b/opensm/osm_link_mgr.c
@@ -157,8 +157,6 @@ static int link_mgr_set_physp_pi(osm_sm_t * sm, IN 
osm_physp_t * p_physp,
}
 
memcpy(payload, p_old_pi, sizeof(ib_port_info_t));
-   memset(payload + sizeof(ib_port_info_t), 0,
-  IB_SMP_DATA_SIZE - sizeof(ib_port_info_t));
 
/*
   Should never write back a value that is bigger then 3 in
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] opensm: Add the precreation of multicast groups

2011-10-27 Thread Hal Rosenstock
On 10/6/2011 8:13 PM, Ira Weiny wrote:
 
 Allow for the pre-creation of these groups on a partition by partition basis.

This looks good to me and has been needed for some time now; Thanks for
doing this!

Just some minor comments below from scanning through the patch. I
haven't tried it yet...

 P_Key is taken from the partition specification.  Q_Key, TClass, rate, and mtu
 can be specified.

If TClass is added, should FlowLabel also be added ?

 For IP groups, rate and mtu are verified to match the broadcast groups
 parameters.  P_Key can be specified in the mgid itself and is verified to 
 match
 the P_Key of the partition if != 0x.  If pkey == 0x then the P_Key is
 taken from the partition specification.

Do you mean the pkey in the IPoIB MGID here by partition specification ?

 The syntax extends the existing syntax by allowing MC groups to be specified
 one per line, intermixed with the port specifications.
 
 Signed-off-by: Ira Weiny wei...@llnl.gov
 ---
  include/opensm/osm_base.h  |5 +
  include/opensm/osm_partition.h |   12 +-
  man/opensm.8.in|  121 +++--

Similar changes should be made to doc/partition-config.txt which is
where this part of the man page came from originally.

  opensm/osm_prtn.c  |  111 
  opensm/osm_prtn_config.c   |  278 
 
  opensm/osm_qos_policy.c|   48 
  opensm/osm_subnet.c|2 +-
  7 files changed, 454 insertions(+), 123 deletions(-)
 
 diff --git a/include/opensm/osm_base.h b/include/opensm/osm_base.h
 index e558c55..869ba9c 100644
 --- a/include/opensm/osm_base.h
 +++ b/include/opensm/osm_base.h
 @@ -53,6 +53,7 @@
  #endif
  
  #include complib/cl_types.h
 +#include iba/ib_types.h
  
  #ifdef __cplusplus
  #  define BEGIN_C_DECLS extern C {
 @@ -954,6 +955,10 @@ typedef enum _osm_sm_signal {
  #define OSM_VENDOR_ID_HP4   0x00237D
  #define OSM_VENDOR_ID_OPENIB0x001405
  
 +/* IPoIB Broadcast Defaults */
 +#define OSM_IPOIB_BROADCAST_MGRP_QKEY 0x0b1b
 +extern const ib_gid_t osm_ipoib_broadcast_mgid;
 +
  /**/
  
  END_C_DECLS
 diff --git a/include/opensm/osm_partition.h b/include/opensm/osm_partition.h
 index fdb34b9..7277eac 100644
 --- a/include/opensm/osm_partition.h
 +++ b/include/opensm/osm_partition.h
 @@ -94,10 +94,11 @@ typedef struct osm_prtn {
   cl_map_item_t map_item;
   ib_net16_t pkey;
   uint8_t sl;
 - osm_mgrp_t *mgrp;
   cl_map_t full_guid_tbl;
   cl_map_t part_guid_tbl;
   char name[32];
 + osm_mgrp_t **mgrps;
 + int nmgrps;
  } osm_prtn_t;
  /*
  * FIELDS
 @@ -110,9 +111,10 @@ typedef struct osm_prtn {
  *sl
  *The Service Level (SL) associated with this Partiton.
  *
 -*mgrp
 -*The pointer to the well known Multicast Group
 -*that was created for this partition (when configured).
 +*   mgrps
 +*List of well known Multicast Groups
 +*that were created for this partition (when configured).
 +*This includes the IPoIB broadcast group.
  *
  *full_guid_tbl
  *Container of pointers to all Port objects in the Partition
 @@ -139,7 +141,7 @@ typedef struct osm_prtn {
  *
  * SYNOPSIS
  */
 -void osm_prtn_delete(IN OUT osm_prtn_t ** pp_prtn);
 +void osm_prtn_delete(IN OUT osm_prtn_t ** pp_prtn, osm_subn_t * p_subn);

IN osm_subn_t * p_subn

Also, reverse the parameters to be consistent with the coding convention.

  /*
  * PARAMETERS
  *pp_prtn
 diff --git a/man/opensm.8.in b/man/opensm.8.in
 index 042bee3..da0c247 100644
 --- a/man/opensm.8.in
 +++ b/man/opensm.8.in
 @@ -523,45 +523,76 @@ parser.
  
  General file format:
  
 -Partition Definition:PortGUIDs list ;
 -
 -Partition Definition:
 -
 -[PartitionName][=PKey][,flag[=value]][,defmember=full|limited]
 -
 - PartitionName - string, will be used with logging. When omitted
 - empty string will be used.
 - PKey  - P_Key value for this partition. Only low 15 bits will
 - be used. When omitted will be autogenerated.
 - flag  - used to indicate IPoIB capability of this partition.
 - defmember=full|limited - specifies default membership for port guid
 - list. Default is limited.
 -
 -Currently recognized flags are:
 -
 - ipoib   - indicates that this partition may be used for IPoIB, as
 -   result IPoIB capable MC group will be created.
 - rate=val  - specifies rate for this IPoIB MC group
 -   (default is 3 (10GBps))
 - mtu=val   - specifies MTU for this IPoIB MC group
 -   (default is 4 (2048))
 - sl=val- specifies SL for this IPoIB MC group
 -   (default is 0)
 - scope=val - specifies scope for this IPoIB MC group
 -   (default is 2 (link local)).  Multiple scope settings
 -   are permitted for a partition.
 -
 -Note that values for rate, mtu, and scope should