Re: [PATCH v2] IB/mlx4: Increase the timeout for CM cache

2019-02-17 Thread jackm
On Sun, 17 Feb 2019 15:45:12 +0100
Håkon Bugge  wrote:

> Using CX-3 virtual functions, either from a bare-metal machine or
> pass-through from a VM, MAD packets are proxied through the PF driver.
> 
> Since the VF drivers have separate name spaces for MAD Transaction Ids
> (TIDs), the PF driver has to re-map the TIDs and keep the book keeping
> in a cache.
> 
> Following the RDMA Connection Manager (CM) protocol, it is clear when
> an entry has to evicted form the cache. But life is not perfect,
> remote peers may die or be rebooted. Hence, it's a timeout to wipe out
> a cache entry, when the PF driver assumes the remote peer has gone.
> 
> During workloads where a high number of QPs are destroyed
> concurrently, excessive amount of CM DREQ retries has been observed
> 
> The problem can be demonstrated in a bare-metal environment, where two
> nodes have instantiated 8 VFs each. This using dual ported HCAs, so we
> have 16 vPorts per physical server.
> 
> 64 processes are associated with each vPort and creates and destroys
> one QP for each of the remote 64 processes. That is, 1024 QPs per
> vPort, all in all 16K QPs. The QPs are created/destroyed using the
> CM.
> 
> When tearing down these 16K QPs, excessive CM DREQ retries (and
> duplicates) are observed. With some cat/paste/awk wizardry on the
> infiniband_cm sysfs, we observe as sum of the 16 vPorts on one of the
> nodes:
> 
> cm_rx_duplicates:
>   dreq  2102
> cm_rx_msgs:
>   drep  1989
>   dreq  6195
>rep  3968
>req  4224
>rtu  4224
> cm_tx_msgs:
>   drep  4093
>   dreq 27568
>rep  4224
>req  3968
>rtu  3968
> cm_tx_retries:
>   dreq 23469
> 
> Note that the active/passive side is equally distributed between the
> two nodes.
> 
> Enabling pr_debug in cm.c gives tons of:
> 
> [171778.814239]  mlx4_ib_multiplex_cm_handler: id{slave:
> 1,sl_cm_id: 0xd393089f} is NULL!
> 
> By increasing the CM_CLEANUP_CACHE_TIMEOUT from 5 to 30 seconds, the
> tear-down phase of the application is reduced from approximately 90 to
> 50 seconds. Retries/duplicates are also significantly reduced:
> 
> cm_rx_duplicates:
>   dreq  2460
> []
> cm_tx_retries:
>   dreq  3010
>req47
> 
> Increasing the timeout further didn't help, as these duplicates and
> retries stems from a too short CMA timeout, which was 20 (~4 seconds)
> on the systems. By increasing the CMA timeout to 22 (~17 seconds), the
> numbers fell down to about 10 for both of them.
> 
> Adjustment of the CMA timeout is not part of this commit.
> 
> Signed-off-by: Håkon Bugge 
> 
> ---
> 
> v1 -> v2:
>* Reworded commit message to reflect the new test-setup using
>  multiple VFs
> ---
>  drivers/infiniband/hw/mlx4/cm.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Acked-by: Jack Morgenstein 


Re: [PATCH] mlx4_ib: Increase the timeout for CM cache

2019-02-06 Thread jackm
On Wed, 6 Feb 2019 16:40:14 +0100
Håkon Bugge  wrote:

> Jack,
> 
> A major contributor to the long processing time in the PF driver
> proxying QP1 packets is:
> 
> create_pv_resources
>-> ib_create_cq(ctx->ib_dev, mlx4_ib_tunnel_comp_handler,  
>NULL, ctx, cq_size, 0);
> 
> That is, comp_vector is zero.
> 
> Due to commit 6ba1eb776461 ("IB/mlx4: Scatter CQs to different EQs"),
> the zero comp_vector has the intent of let the mlx4_core driver
> select the least used vector.
> 
> But, in mlx4_ib_create_cq(), we have:
> 
> pr_info("eq_table: %p\n", dev->eq_table);
> if (dev->eq_table) {
>   vector = dev->eq_table[mlx4_choose_vector(dev->dev,
> vector, ibdev->num_comp_vectors)];
> }
> 
> cq->vector = vector;
> 
> and dev->eq_table is NULL, so all the CQs for the proxy QPs get
> comp_vector zero.
> 
> I have to make some reservations, as this analysis is based on uek4,
> but I think the code here is equal upstream, but need to double check.
> 
> 
> Thxs, Håkon
> 
Hi Hakon and Jason,
I was ill today (bad cold, took antihistamines all day, which knocked
me out).
I'll get to this tomorrow.

-Jack



Re: [PATCH v3 2/2] IB/mad: Use ID allocator routines to allocate agent number (fwd)

2018-06-17 Thread jackm
On Sat, 16 Jun 2018 18:04:41 +0200 (CEST)
Julia Lawall  wrote:

> ib_mad_client_id is declared as u32, so it will not be < 0 (line 382).
> 
> julia
> 
Julia, your are correct.
However, I was under the impression that this patch set was abandoned
in favor of the one submitted by Matthew Wilcox! (Convert IB/mad to use
an IDR for agent IDs)

Hans, Matthew?

-Jack
> -- Forwarded message --
> Date: Fri, 8 Jun 2018 08:49:57 +0800
> From: kbuild test robot 
> To: kbu...@01.org
> Cc: Julia Lawall 
> Subject: Re: [PATCH v3 2/2] IB/mad: Use ID allocator routines to
> allocate agent number
> 
> Hi Hans,
> 
> Thank you for the patch! Perhaps something to improve:
> 
> [auto build test WARNING on linus/master]
> [also build test WARNING on v4.17]
> [if your patch is applied to the wrong git tree, please drop us a
> note to help improve the system]
> 
> url:
> https://github.com/0day-ci/linux/commits/Hans-Westgaard-Ry/IB-mad-Use-ID-allocator-routines-to-allocate-agent-number/20180608-022348
>  ::
> branch date: 6 hours ago :: commit date: 6 hours ago
> 
> >> drivers/infiniband/core/mad.c:382:5-21: WARNING: Unsigned
> >> expression compared with zero: ib_mad_client_id < 0  
> 
> #
> https://github.com/0day-ci/linux/commit/74b1ba09003c9d132878734bf44f32a62bad31db
> git remote add linux-review https://github.com/0day-ci/linux git
> remote update linux-review git checkout
> 74b1ba09003c9d132878734bf44f32a62bad31db vim +382
> drivers/infiniband/core/mad.c
> 
> 2527e681 Sean Hefty2006-07-20  190
> ^1da177e Linus Torvalds2005-04-16  191  /*
> ^1da177e Linus Torvalds2005-04-16  192   * ib_register_mad_agent
> - Register to send/receive MADs ^1da177e Linus Torvalds
> 2005-04-16  193   */ ^1da177e Linus Torvalds2005-04-16  194
> struct ib_mad_agent *ib_register_mad_agent(struct ib_device *device,
> ^1da177e Linus Torvalds2005-04-16  195
>  u8 port_num, ^1da177e
> Linus Torvalds2005-04-16  196
>  enum ib_qp_type qp_type,
> ^1da177e Linus Torvalds2005-04-16  197
>  struct ib_mad_reg_req
> *mad_reg_req, ^1da177e Linus Torvalds2005-04-16  198
>  u8 rmpp_version, ^1da177e
> Linus Torvalds2005-04-16  199
>  ib_mad_send_handler
> send_handler, ^1da177e Linus Torvalds2005-04-16  200
>  ib_mad_recv_handler
> recv_handler, 0f29b46d Ira Weiny 2014-08-08  201
>  void *context, 0f29b46d
> Ira Weiny 2014-08-08  202
>  u32 registration_flags)
> ^1da177e Linus Torvalds2005-04-16  203  { ^1da177e Linus
> Torvalds2005-04-16  204   struct ib_mad_port_private
> *port_priv; ^1da177e Linus Torvalds2005-04-16  205
> struct ib_mad_agent *ret = ERR_PTR(-EINVAL); ^1da177e Linus
> Torvalds2005-04-16  206   struct ib_mad_agent_private
> *mad_agent_priv; ^1da177e Linus Torvalds2005-04-16  207
> struct ib_mad_reg_req *reg_req = NULL; ^1da177e Linus Torvalds
> 2005-04-16  208   struct ib_mad_mgmt_class_table *class;
> ^1da177e Linus Torvalds2005-04-16  209struct
> ib_mad_mgmt_vendor_class_table *vendor; ^1da177e Linus Torvalds
> 2005-04-16  210   struct ib_mad_mgmt_vendor_class
> *vendor_class; ^1da177e Linus Torvalds2005-04-16  211
> struct ib_mad_mgmt_method_table *method; ^1da177e Linus Torvalds
> 2005-04-16  212   int ret2, qpn; ^1da177e Linus Torvalds
> 2005-04-16  213   unsigned long flags; ^1da177e Linus
> Torvalds2005-04-16  214   u8 mgmt_class, vclass; 74b1ba09
> Hans Westgaard Ry 2018-06-07  215 u32 ib_mad_client_id;
> ^1da177e Linus Torvalds2005-04-16  216/* Validate
> parameters */ ^1da177e Linus Torvalds2005-04-16  217  qpn
> = get_spl_qp_index(qp_type); 9ad13a42 Ira Weiny 2014-08-08
> 218   if (qpn == -1) { 9ad13a42 Ira Weiny 2014-08-08
> 219   dev_notice(>dev, 9ad13a42 Ira
> Weiny 2014-08-08  220
> "ib_register_mad_agent: invalid QP Type %d\n", 9ad13a42 Ira
> Weiny 2014-08-08  221qp_type);
> ^1da177e Linus Torvalds2005-04-16  222goto
> error1; 9ad13a42 Ira Weiny 2014-08-08  223}
> ^1da177e Linus Torvalds2005-04-16  224 9ad13a42 Ira Weiny
> 2014-08-08  225   if (rmpp_version && rmpp_version !=
> IB_MGMT_RMPP_VERSION) { 9ad13a42 Ira Weiny 2014-08-08  226
>   dev_notice(>dev, 9ad13a42 Ira Weiny
> 2014-08-08  227  "ib_register_mad_agent:
> invalid RMPP Version %u\n", 9ad13a42 Ira Weiny 2014-08-08
> 228  rmpp_version); fa619a77 Hal
> Rosenstock2005-07-27  229 goto error1; 9ad13a42
> Ira Weiny 2014-08-08  230 } ^1da177e Linus
> Torvalds2005-04-16  231 ^1da177e Linus Torvalds  

Re: [PATCH v3 2/2] IB/mad: Use ID allocator routines to allocate agent number (fwd)

2018-06-17 Thread jackm
On Sat, 16 Jun 2018 18:04:41 +0200 (CEST)
Julia Lawall  wrote:

> ib_mad_client_id is declared as u32, so it will not be < 0 (line 382).
> 
> julia
> 
Julia, your are correct.
However, I was under the impression that this patch set was abandoned
in favor of the one submitted by Matthew Wilcox! (Convert IB/mad to use
an IDR for agent IDs)

Hans, Matthew?

-Jack
> -- Forwarded message --
> Date: Fri, 8 Jun 2018 08:49:57 +0800
> From: kbuild test robot 
> To: kbu...@01.org
> Cc: Julia Lawall 
> Subject: Re: [PATCH v3 2/2] IB/mad: Use ID allocator routines to
> allocate agent number
> 
> Hi Hans,
> 
> Thank you for the patch! Perhaps something to improve:
> 
> [auto build test WARNING on linus/master]
> [also build test WARNING on v4.17]
> [if your patch is applied to the wrong git tree, please drop us a
> note to help improve the system]
> 
> url:
> https://github.com/0day-ci/linux/commits/Hans-Westgaard-Ry/IB-mad-Use-ID-allocator-routines-to-allocate-agent-number/20180608-022348
>  ::
> branch date: 6 hours ago :: commit date: 6 hours ago
> 
> >> drivers/infiniband/core/mad.c:382:5-21: WARNING: Unsigned
> >> expression compared with zero: ib_mad_client_id < 0  
> 
> #
> https://github.com/0day-ci/linux/commit/74b1ba09003c9d132878734bf44f32a62bad31db
> git remote add linux-review https://github.com/0day-ci/linux git
> remote update linux-review git checkout
> 74b1ba09003c9d132878734bf44f32a62bad31db vim +382
> drivers/infiniband/core/mad.c
> 
> 2527e681 Sean Hefty2006-07-20  190
> ^1da177e Linus Torvalds2005-04-16  191  /*
> ^1da177e Linus Torvalds2005-04-16  192   * ib_register_mad_agent
> - Register to send/receive MADs ^1da177e Linus Torvalds
> 2005-04-16  193   */ ^1da177e Linus Torvalds2005-04-16  194
> struct ib_mad_agent *ib_register_mad_agent(struct ib_device *device,
> ^1da177e Linus Torvalds2005-04-16  195
>  u8 port_num, ^1da177e
> Linus Torvalds2005-04-16  196
>  enum ib_qp_type qp_type,
> ^1da177e Linus Torvalds2005-04-16  197
>  struct ib_mad_reg_req
> *mad_reg_req, ^1da177e Linus Torvalds2005-04-16  198
>  u8 rmpp_version, ^1da177e
> Linus Torvalds2005-04-16  199
>  ib_mad_send_handler
> send_handler, ^1da177e Linus Torvalds2005-04-16  200
>  ib_mad_recv_handler
> recv_handler, 0f29b46d Ira Weiny 2014-08-08  201
>  void *context, 0f29b46d
> Ira Weiny 2014-08-08  202
>  u32 registration_flags)
> ^1da177e Linus Torvalds2005-04-16  203  { ^1da177e Linus
> Torvalds2005-04-16  204   struct ib_mad_port_private
> *port_priv; ^1da177e Linus Torvalds2005-04-16  205
> struct ib_mad_agent *ret = ERR_PTR(-EINVAL); ^1da177e Linus
> Torvalds2005-04-16  206   struct ib_mad_agent_private
> *mad_agent_priv; ^1da177e Linus Torvalds2005-04-16  207
> struct ib_mad_reg_req *reg_req = NULL; ^1da177e Linus Torvalds
> 2005-04-16  208   struct ib_mad_mgmt_class_table *class;
> ^1da177e Linus Torvalds2005-04-16  209struct
> ib_mad_mgmt_vendor_class_table *vendor; ^1da177e Linus Torvalds
> 2005-04-16  210   struct ib_mad_mgmt_vendor_class
> *vendor_class; ^1da177e Linus Torvalds2005-04-16  211
> struct ib_mad_mgmt_method_table *method; ^1da177e Linus Torvalds
> 2005-04-16  212   int ret2, qpn; ^1da177e Linus Torvalds
> 2005-04-16  213   unsigned long flags; ^1da177e Linus
> Torvalds2005-04-16  214   u8 mgmt_class, vclass; 74b1ba09
> Hans Westgaard Ry 2018-06-07  215 u32 ib_mad_client_id;
> ^1da177e Linus Torvalds2005-04-16  216/* Validate
> parameters */ ^1da177e Linus Torvalds2005-04-16  217  qpn
> = get_spl_qp_index(qp_type); 9ad13a42 Ira Weiny 2014-08-08
> 218   if (qpn == -1) { 9ad13a42 Ira Weiny 2014-08-08
> 219   dev_notice(>dev, 9ad13a42 Ira
> Weiny 2014-08-08  220
> "ib_register_mad_agent: invalid QP Type %d\n", 9ad13a42 Ira
> Weiny 2014-08-08  221qp_type);
> ^1da177e Linus Torvalds2005-04-16  222goto
> error1; 9ad13a42 Ira Weiny 2014-08-08  223}
> ^1da177e Linus Torvalds2005-04-16  224 9ad13a42 Ira Weiny
> 2014-08-08  225   if (rmpp_version && rmpp_version !=
> IB_MGMT_RMPP_VERSION) { 9ad13a42 Ira Weiny 2014-08-08  226
>   dev_notice(>dev, 9ad13a42 Ira Weiny
> 2014-08-08  227  "ib_register_mad_agent:
> invalid RMPP Version %u\n", 9ad13a42 Ira Weiny 2014-08-08
> 228  rmpp_version); fa619a77 Hal
> Rosenstock2005-07-27  229 goto error1; 9ad13a42
> Ira Weiny 2014-08-08  230 } ^1da177e Linus
> Torvalds2005-04-16  231 ^1da177e Linus Torvalds  

Re: [PATCH 2/2] IB/mad: Use IDR for agent IDs

2018-06-13 Thread jackm
On Fri,  8 Jun 2018 10:42:18 -0700
Matthew Wilcox  wrote:

> From: Matthew Wilcox 
> 
> Allocate agent IDs from a global IDR instead of an atomic variable.
> This eliminates the possibility of reusing an ID which is already in
> use after 4 billion registrations, and we can also limit the assigned
> ID to be less than 2^24, which fixes a bug in the mlx4 device.
> 
> We look up the agent under protection of the RCU lock, which means we
> have to free the agent using kfree_rcu, and only increment the
> reference counter if it is not 0.
> 
> Signed-off-by: Matthew Wilcox 
> ---
>  drivers/infiniband/core/mad.c  | 78
> ++ drivers/infiniband/core/mad_priv.h |
> 7 +-- include/linux/idr.h|  9 
>  3 files changed, 59 insertions(+), 35 deletions(-)
> 
> diff --git a/drivers/infiniband/core/mad.c
> b/drivers/infiniband/core/mad.c index 68f4dda916c8..62384a3dd3ec
> 100644 --- a/drivers/infiniband/core/mad.c
> +++ b/drivers/infiniband/core/mad.c
> @@ -38,6 +38,7 @@
>  #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
>  
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -58,8 +59,8 @@ MODULE_PARM_DESC(send_queue_size, "Size of send
> queue in number of work requests module_param_named(recv_queue_size,
> mad_recvq_size, int, 0444); MODULE_PARM_DESC(recv_queue_size, "Size
> of receive queue in number of work requests"); 
> +static DEFINE_IDR(ib_mad_clients);
>  static struct list_head ib_mad_port_list;
> -static atomic_t ib_mad_client_id = ATOMIC_INIT(0);
>  
>  /* Port list lock */
>  static DEFINE_SPINLOCK(ib_mad_port_list_lock);
> @@ -377,13 +378,24 @@ struct ib_mad_agent
> *ib_register_mad_agent(struct ib_device *device, goto error4;
>   }
>  
> - spin_lock_irq(_priv->reg_lock);
> - mad_agent_priv->agent.hi_tid =
> atomic_inc_return(_mad_client_id);
> + idr_preload(GFP_KERNEL);
> + idr_lock(_mad_clients);
> + ret2 = idr_alloc_cyclic(_mad_clients, mad_agent_priv, 0,
> + (1 << 24), GFP_ATOMIC);
> + idr_unlock(_mad_clients);
> + idr_preload_end();
> +
> + if (ret2 < 0) {
> + ret = ERR_PTR(ret2);
> + goto error5;
> + }
> + mad_agent_priv->agent.hi_tid = ret2;
>  
>   /*
>* Make sure MAD registration (if supplied)
>* is non overlapping with any existing ones
>*/
> + spin_lock_irq(_priv->reg_lock);
>   if (mad_reg_req) {
>   mgmt_class =
> convert_mgmt_class(mad_reg_req->mgmt_class); if
> (!is_vendor_class(mgmt_class)) { @@ -394,7 +406,7 @@ struct
> ib_mad_agent *ib_register_mad_agent(struct ib_device *device, if
> (method) { if (method_in_use(,
>  mad_reg_req))
> - goto error5;
> + goto error6;
>   }
>   }
>   ret2 = add_nonoui_reg_req(mad_reg_req,
> mad_agent_priv, @@ -410,24 +422,25 @@ struct ib_mad_agent
> *ib_register_mad_agent(struct ib_device *device, if
> (is_vendor_method_in_use( vendor_class,
>   mad_reg_req))
> - goto error5;
> + goto error6;
>   }
>   }
>   ret2 = add_oui_reg_req(mad_reg_req,
> mad_agent_priv); }
>   if (ret2) {
>   ret = ERR_PTR(ret2);
> - goto error5;
> + goto error6;
>   }
>   }
> -
> - /* Add mad agent into port's agent list */
> - list_add_tail(_agent_priv->agent_list,
> _priv->agent_list); spin_unlock_irq(_priv->reg_lock);
>  
>   return _agent_priv->agent;
> -error5:
> +error6:
>   spin_unlock_irq(_priv->reg_lock);
> + idr_lock(_mad_clients);
> + idr_remove(_mad_clients, mad_agent_priv->agent.hi_tid);
> + idr_unlock(_mad_clients);
> +error5:
>   ib_mad_agent_security_cleanup(_agent_priv->agent);
>  error4:
>   kfree(reg_req);
> @@ -589,8 +602,10 @@ static void unregister_mad_agent(struct
> ib_mad_agent_private *mad_agent_priv) 
>   spin_lock_irq(_priv->reg_lock);
>   remove_mad_reg_req(mad_agent_priv);
> - list_del(_agent_priv->agent_list);
>   spin_unlock_irq(_priv->reg_lock);
> + idr_lock(_mad_clients);
> + idr_remove(_mad_clients, mad_agent_priv->agent.hi_tid);
> + idr_unlock(_mad_clients);
>  
>   flush_workqueue(port_priv->wq);
>   ib_cancel_rmpp_recvs(mad_agent_priv);
> @@ -601,7 +616,7 @@ static void unregister_mad_agent(struct
> ib_mad_agent_private *mad_agent_priv)
> ib_mad_agent_security_cleanup(_agent_priv->agent); 
>   kfree(mad_agent_priv->reg_req);
> - kfree(mad_agent_priv);
> + kfree_rcu(mad_agent_priv, rcu);
>  }
>  
>  static void unregister_mad_snoop(struct ib_mad_snoop_private
> 

Re: [PATCH 2/2] IB/mad: Use IDR for agent IDs

2018-06-13 Thread jackm
On Fri,  8 Jun 2018 10:42:18 -0700
Matthew Wilcox  wrote:

> From: Matthew Wilcox 
> 
> Allocate agent IDs from a global IDR instead of an atomic variable.
> This eliminates the possibility of reusing an ID which is already in
> use after 4 billion registrations, and we can also limit the assigned
> ID to be less than 2^24, which fixes a bug in the mlx4 device.
> 
> We look up the agent under protection of the RCU lock, which means we
> have to free the agent using kfree_rcu, and only increment the
> reference counter if it is not 0.
> 
> Signed-off-by: Matthew Wilcox 
> ---
>  drivers/infiniband/core/mad.c  | 78
> ++ drivers/infiniband/core/mad_priv.h |
> 7 +-- include/linux/idr.h|  9 
>  3 files changed, 59 insertions(+), 35 deletions(-)
> 
> diff --git a/drivers/infiniband/core/mad.c
> b/drivers/infiniband/core/mad.c index 68f4dda916c8..62384a3dd3ec
> 100644 --- a/drivers/infiniband/core/mad.c
> +++ b/drivers/infiniband/core/mad.c
> @@ -38,6 +38,7 @@
>  #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
>  
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -58,8 +59,8 @@ MODULE_PARM_DESC(send_queue_size, "Size of send
> queue in number of work requests module_param_named(recv_queue_size,
> mad_recvq_size, int, 0444); MODULE_PARM_DESC(recv_queue_size, "Size
> of receive queue in number of work requests"); 
> +static DEFINE_IDR(ib_mad_clients);
>  static struct list_head ib_mad_port_list;
> -static atomic_t ib_mad_client_id = ATOMIC_INIT(0);
>  
>  /* Port list lock */
>  static DEFINE_SPINLOCK(ib_mad_port_list_lock);
> @@ -377,13 +378,24 @@ struct ib_mad_agent
> *ib_register_mad_agent(struct ib_device *device, goto error4;
>   }
>  
> - spin_lock_irq(_priv->reg_lock);
> - mad_agent_priv->agent.hi_tid =
> atomic_inc_return(_mad_client_id);
> + idr_preload(GFP_KERNEL);
> + idr_lock(_mad_clients);
> + ret2 = idr_alloc_cyclic(_mad_clients, mad_agent_priv, 0,
> + (1 << 24), GFP_ATOMIC);
> + idr_unlock(_mad_clients);
> + idr_preload_end();
> +
> + if (ret2 < 0) {
> + ret = ERR_PTR(ret2);
> + goto error5;
> + }
> + mad_agent_priv->agent.hi_tid = ret2;
>  
>   /*
>* Make sure MAD registration (if supplied)
>* is non overlapping with any existing ones
>*/
> + spin_lock_irq(_priv->reg_lock);
>   if (mad_reg_req) {
>   mgmt_class =
> convert_mgmt_class(mad_reg_req->mgmt_class); if
> (!is_vendor_class(mgmt_class)) { @@ -394,7 +406,7 @@ struct
> ib_mad_agent *ib_register_mad_agent(struct ib_device *device, if
> (method) { if (method_in_use(,
>  mad_reg_req))
> - goto error5;
> + goto error6;
>   }
>   }
>   ret2 = add_nonoui_reg_req(mad_reg_req,
> mad_agent_priv, @@ -410,24 +422,25 @@ struct ib_mad_agent
> *ib_register_mad_agent(struct ib_device *device, if
> (is_vendor_method_in_use( vendor_class,
>   mad_reg_req))
> - goto error5;
> + goto error6;
>   }
>   }
>   ret2 = add_oui_reg_req(mad_reg_req,
> mad_agent_priv); }
>   if (ret2) {
>   ret = ERR_PTR(ret2);
> - goto error5;
> + goto error6;
>   }
>   }
> -
> - /* Add mad agent into port's agent list */
> - list_add_tail(_agent_priv->agent_list,
> _priv->agent_list); spin_unlock_irq(_priv->reg_lock);
>  
>   return _agent_priv->agent;
> -error5:
> +error6:
>   spin_unlock_irq(_priv->reg_lock);
> + idr_lock(_mad_clients);
> + idr_remove(_mad_clients, mad_agent_priv->agent.hi_tid);
> + idr_unlock(_mad_clients);
> +error5:
>   ib_mad_agent_security_cleanup(_agent_priv->agent);
>  error4:
>   kfree(reg_req);
> @@ -589,8 +602,10 @@ static void unregister_mad_agent(struct
> ib_mad_agent_private *mad_agent_priv) 
>   spin_lock_irq(_priv->reg_lock);
>   remove_mad_reg_req(mad_agent_priv);
> - list_del(_agent_priv->agent_list);
>   spin_unlock_irq(_priv->reg_lock);
> + idr_lock(_mad_clients);
> + idr_remove(_mad_clients, mad_agent_priv->agent.hi_tid);
> + idr_unlock(_mad_clients);
>  
>   flush_workqueue(port_priv->wq);
>   ib_cancel_rmpp_recvs(mad_agent_priv);
> @@ -601,7 +616,7 @@ static void unregister_mad_agent(struct
> ib_mad_agent_private *mad_agent_priv)
> ib_mad_agent_security_cleanup(_agent_priv->agent); 
>   kfree(mad_agent_priv->reg_req);
> - kfree(mad_agent_priv);
> + kfree_rcu(mad_agent_priv, rcu);
>  }
>  
>  static void unregister_mad_snoop(struct ib_mad_snoop_private
> 

Re: [PATCH 2/2] IB/mad: Use IDR for agent IDs

2018-06-12 Thread jackm
On Fri,  8 Jun 2018 10:42:18 -0700
Matthew Wilcox  wrote:

> + rcu_read_lock();
> + mad_agent = idr_find(_mad_clients, hi_tid);
> + if (mad_agent
> && !atomic_inc_not_zero(_agent->refcount))
> + mad_agent = NULL;
> + rcu_read_unlock();

Hi Matthew,

I don't see the flow which can explain using atomic_inc_not_zero() here.

The refcount will go to zero only when unregister_mad_agent() is
called (code below, see asterisks):
idr_lock(_mad_clients);
 ***idr_remove(_mad_clients, mad_agent_priv->agent.hi_tid);
idr_unlock(_mad_clients);

flush_workqueue(port_priv->wq);
ib_cancel_rmpp_recvs(mad_agent_priv);

 ***deref_mad_agent(mad_agent_priv);
[JPM] The call to idr_find in the interrupt context
would need to occur here for the refcount to have a
possibility of being zero.
Shouldn't idr_find in the interrupt context fail, since
idr_remove has already been invoked?
wait_for_completion(_agent_priv->comp);

The refcount will be able to go to zero only after deref_mad_agent is
called above.  Before this, however, idr_remove() has been called --
so, if my understanding is correct, the idr_find call in
find_mad_agent() should not succeed since the refcount can get to zero
only AFTER the idr_remove call.

Could you please explain the flow which can result in idr_find
succeeding (in the interrupt context) after idr_remove has been invoked
(in the process context)?  Will idr_find succeed even after
idr_remove, and only fail after kfree_rcu is invoked as well? (or,
maybe after some garbage-collection delay?)

Thx!

-Jack


Re: [PATCH 2/2] IB/mad: Use IDR for agent IDs

2018-06-12 Thread jackm
On Fri,  8 Jun 2018 10:42:18 -0700
Matthew Wilcox  wrote:

> + rcu_read_lock();
> + mad_agent = idr_find(_mad_clients, hi_tid);
> + if (mad_agent
> && !atomic_inc_not_zero(_agent->refcount))
> + mad_agent = NULL;
> + rcu_read_unlock();

Hi Matthew,

I don't see the flow which can explain using atomic_inc_not_zero() here.

The refcount will go to zero only when unregister_mad_agent() is
called (code below, see asterisks):
idr_lock(_mad_clients);
 ***idr_remove(_mad_clients, mad_agent_priv->agent.hi_tid);
idr_unlock(_mad_clients);

flush_workqueue(port_priv->wq);
ib_cancel_rmpp_recvs(mad_agent_priv);

 ***deref_mad_agent(mad_agent_priv);
[JPM] The call to idr_find in the interrupt context
would need to occur here for the refcount to have a
possibility of being zero.
Shouldn't idr_find in the interrupt context fail, since
idr_remove has already been invoked?
wait_for_completion(_agent_priv->comp);

The refcount will be able to go to zero only after deref_mad_agent is
called above.  Before this, however, idr_remove() has been called --
so, if my understanding is correct, the idr_find call in
find_mad_agent() should not succeed since the refcount can get to zero
only AFTER the idr_remove call.

Could you please explain the flow which can result in idr_find
succeeding (in the interrupt context) after idr_remove has been invoked
(in the process context)?  Will idr_find succeed even after
idr_remove, and only fail after kfree_rcu is invoked as well? (or,
maybe after some garbage-collection delay?)

Thx!

-Jack


Re: [PATCH 2/2] IB/mad: Use IDR for agent IDs

2018-06-11 Thread jackm
On Mon, 11 Jun 2018 10:19:18 -0600
Jason Gunthorpe  wrote:

> On Mon, Jun 11, 2018 at 09:19:14AM +0300, jackm wrote:
> > On Sun, 10 Jun 2018 22:42:03 -0600
> > Jason Gunthorpe  wrote:
> >   
> > > Er, the spec has nothing to do with this. In Linux the TID is made
> > > unique because the core code provides 32 bits that are unique and
> > > the user provides another 32 bits that are unique. The driver
> > > cannot change any of those bits without risking non-uniquenes,
> > > which is exactly the bug mlx4 created when it stepped outside its
> > > bounds and improperly overrode bits in the TID for its own
> > > internal use.  
> > 
> > Actually, the opposite is true here.  When SRIOV is active, each VM
> > generates its *own* TIDs -- with 32 bits of agent number and 32 bits
> > of counter.  
> 
> And it does it while re-using the LRH of the host, so all VMs and the
> host are now forced to share a TID space, yes I know.
> 
> > There is a chance that two different VMs can generate the same TID!
> > Encoding the slave (VM) number in the packet actually guarantees
> > uniqueness here.  
> 
> Virtualizing the TID in the driver would be fine, but it must
> virtualize all the TIDs (even those generated by the HOST).

It DOES do so.  The host slave id is 0. Slave numbers start with 1.
If the MS byte contains a zero after paravirtualization, the MAD
was sent by the host.
In fact, ALL mads are paravirtualized -- including those to/from the host.

> 
> Just blindly assuming the host doesn't generate TID's that overlap
> with the virtualization process is a bug.
> 
Not the case, host mads are also paravirtualized.

> > There is nothing wrong with modifying the TID in a reversible way in
> > order to: a. guarantee uniqueness b. identify the VM which should
> > receive the response packet  
> 
> Sure, as long as *all* TID's sharing a LRH are vitalized like this.
> 
> > The problem was created when the agent-id numbers started to use the
> > most-significant byte (thus making the MSB slave-id addition
> > impossible).  
> 
> It hasn't always been this way? What commit?
> 
Commit: 37bfc7c1e83f1 ("IB/mlx4: SR-IOV multiplex and demultiplex MADs"):

Code snippet which replaces the MS byte is below (file 
drivers/infiniband/hw/mlx4/mad.c,
procedure mlx4_ib_multiplex_mad() ).
Just an aside:  Oracle noted the problem on the *host*: The host's message log
contained the warning issued on line 1529, with slave=0 (which is the 
hypervisor).

37bfc7c1e (Jack Morgenstein2012-08-03 08:40:44 + 1519)  switch 
(tunnel->mad.mad_hdr.method) {
37bfc7c1e (Jack Morgenstein2012-08-03 08:40:44 + 1520)  case 
IB_MGMT_METHOD_SET:
37bfc7c1e (Jack Morgenstein2012-08-03 08:40:44 + 1521)  case 
IB_MGMT_METHOD_GET:
37bfc7c1e (Jack Morgenstein2012-08-03 08:40:44 + 1522)  case 
IB_MGMT_METHOD_REPORT:
37bfc7c1e (Jack Morgenstein2012-08-03 08:40:44 + 1523)  case 
IB_SA_METHOD_GET_TABLE:
37bfc7c1e (Jack Morgenstein2012-08-03 08:40:44 + 1524)  case 
IB_SA_METHOD_DELETE:
37bfc7c1e (Jack Morgenstein2012-08-03 08:40:44 + 1525)  case 
IB_SA_METHOD_GET_MULTI:
37bfc7c1e (Jack Morgenstein2012-08-03 08:40:44 + 1526)  case 
IB_SA_METHOD_GET_TRACE_TBL:
37bfc7c1e (Jack Morgenstein2012-08-03 08:40:44 + 1527)  
slave_id = (u8 *) >mad.mad_hdr.tid;
37bfc7c1e (Jack Morgenstein2012-08-03 08:40:44 + 1528)  
if (*slave_id) {
37bfc7c1e (Jack Morgenstein2012-08-03 08:40:44 + 1529)  
mlx4_ib_warn(ctx->ib_dev, "egress mad has non-null tid msb:%d "
37bfc7c1e (Jack Morgenstein2012-08-03 08:40:44 + 1530)  
 "class:%d slave:%d\n", *slave_id,
37bfc7c1e (Jack Morgenstein2012-08-03 08:40:44 + 1531)  
 tunnel->mad.mad_hdr.mgmt_class, slave);
37bfc7c1e (Jack Morgenstein2012-08-03 08:40:44 + 1532)  
return;
37bfc7c1e (Jack Morgenstein2012-08-03 08:40:44 + 1533)  
} else
37bfc7c1e (Jack Morgenstein2012-08-03 08:40:44 + 1534)  
*slave_id = slave;
37bfc7c1e (Jack Morgenstein2012-08-03 08:40:44 + 1535)  default:
37bfc7c1e (Jack Morgenstein2012-08-03 08:40:44 + 1536)  
/* nothing */;
37bfc7c1e (Jack Morgenstein2012-08-03 08:40:44 + 1537)  }

> Jason 



Re: [PATCH 2/2] IB/mad: Use IDR for agent IDs

2018-06-11 Thread jackm
On Mon, 11 Jun 2018 10:19:18 -0600
Jason Gunthorpe  wrote:

> On Mon, Jun 11, 2018 at 09:19:14AM +0300, jackm wrote:
> > On Sun, 10 Jun 2018 22:42:03 -0600
> > Jason Gunthorpe  wrote:
> >   
> > > Er, the spec has nothing to do with this. In Linux the TID is made
> > > unique because the core code provides 32 bits that are unique and
> > > the user provides another 32 bits that are unique. The driver
> > > cannot change any of those bits without risking non-uniquenes,
> > > which is exactly the bug mlx4 created when it stepped outside its
> > > bounds and improperly overrode bits in the TID for its own
> > > internal use.  
> > 
> > Actually, the opposite is true here.  When SRIOV is active, each VM
> > generates its *own* TIDs -- with 32 bits of agent number and 32 bits
> > of counter.  
> 
> And it does it while re-using the LRH of the host, so all VMs and the
> host are now forced to share a TID space, yes I know.
> 
> > There is a chance that two different VMs can generate the same TID!
> > Encoding the slave (VM) number in the packet actually guarantees
> > uniqueness here.  
> 
> Virtualizing the TID in the driver would be fine, but it must
> virtualize all the TIDs (even those generated by the HOST).

It DOES do so.  The host slave id is 0. Slave numbers start with 1.
If the MS byte contains a zero after paravirtualization, the MAD
was sent by the host.
In fact, ALL mads are paravirtualized -- including those to/from the host.

> 
> Just blindly assuming the host doesn't generate TID's that overlap
> with the virtualization process is a bug.
> 
Not the case, host mads are also paravirtualized.

> > There is nothing wrong with modifying the TID in a reversible way in
> > order to: a. guarantee uniqueness b. identify the VM which should
> > receive the response packet  
> 
> Sure, as long as *all* TID's sharing a LRH are vitalized like this.
> 
> > The problem was created when the agent-id numbers started to use the
> > most-significant byte (thus making the MSB slave-id addition
> > impossible).  
> 
> It hasn't always been this way? What commit?
> 
Commit: 37bfc7c1e83f1 ("IB/mlx4: SR-IOV multiplex and demultiplex MADs"):

Code snippet which replaces the MS byte is below (file 
drivers/infiniband/hw/mlx4/mad.c,
procedure mlx4_ib_multiplex_mad() ).
Just an aside:  Oracle noted the problem on the *host*: The host's message log
contained the warning issued on line 1529, with slave=0 (which is the 
hypervisor).

37bfc7c1e (Jack Morgenstein2012-08-03 08:40:44 + 1519)  switch 
(tunnel->mad.mad_hdr.method) {
37bfc7c1e (Jack Morgenstein2012-08-03 08:40:44 + 1520)  case 
IB_MGMT_METHOD_SET:
37bfc7c1e (Jack Morgenstein2012-08-03 08:40:44 + 1521)  case 
IB_MGMT_METHOD_GET:
37bfc7c1e (Jack Morgenstein2012-08-03 08:40:44 + 1522)  case 
IB_MGMT_METHOD_REPORT:
37bfc7c1e (Jack Morgenstein2012-08-03 08:40:44 + 1523)  case 
IB_SA_METHOD_GET_TABLE:
37bfc7c1e (Jack Morgenstein2012-08-03 08:40:44 + 1524)  case 
IB_SA_METHOD_DELETE:
37bfc7c1e (Jack Morgenstein2012-08-03 08:40:44 + 1525)  case 
IB_SA_METHOD_GET_MULTI:
37bfc7c1e (Jack Morgenstein2012-08-03 08:40:44 + 1526)  case 
IB_SA_METHOD_GET_TRACE_TBL:
37bfc7c1e (Jack Morgenstein2012-08-03 08:40:44 + 1527)  
slave_id = (u8 *) >mad.mad_hdr.tid;
37bfc7c1e (Jack Morgenstein2012-08-03 08:40:44 + 1528)  
if (*slave_id) {
37bfc7c1e (Jack Morgenstein2012-08-03 08:40:44 + 1529)  
mlx4_ib_warn(ctx->ib_dev, "egress mad has non-null tid msb:%d "
37bfc7c1e (Jack Morgenstein2012-08-03 08:40:44 + 1530)  
 "class:%d slave:%d\n", *slave_id,
37bfc7c1e (Jack Morgenstein2012-08-03 08:40:44 + 1531)  
 tunnel->mad.mad_hdr.mgmt_class, slave);
37bfc7c1e (Jack Morgenstein2012-08-03 08:40:44 + 1532)  
return;
37bfc7c1e (Jack Morgenstein2012-08-03 08:40:44 + 1533)  
} else
37bfc7c1e (Jack Morgenstein2012-08-03 08:40:44 + 1534)  
*slave_id = slave;
37bfc7c1e (Jack Morgenstein2012-08-03 08:40:44 + 1535)  default:
37bfc7c1e (Jack Morgenstein2012-08-03 08:40:44 + 1536)  
/* nothing */;
37bfc7c1e (Jack Morgenstein2012-08-03 08:40:44 + 1537)  }

> Jason 



Re: [PATCH 2/2] IB/mad: Use IDR for agent IDs

2018-06-11 Thread jackm
On Sun, 10 Jun 2018 22:42:03 -0600
Jason Gunthorpe  wrote:

> Er, the spec has nothing to do with this. In Linux the TID is made
> unique because the core code provides 32 bits that are unique and the
> user provides another 32 bits that are unique. The driver cannot
> change any of those bits without risking non-uniquenes, which is
> exactly the bug mlx4 created when it stepped outside its bounds and
> improperly overrode bits in the TID for its own internal use.

Actually, the opposite is true here.
When SRIOV is active, each VM generates its *own* TIDs -- with 32 bits
of agent number and 32 bits of counter.

There is a  chance that two different VMs can generate the same TID!
Encoding the slave (VM) number in the packet actually guarantees
uniqueness here.
There is nothing wrong with modifying the TID in a reversible way in
order to:
a. guarantee uniqueness
b. identify the VM which should receive the response packet

The problem was created when the agent-id numbers started to use the
most-significant byte (thus making the MSB slave-id addition
impossible).


Re: [PATCH 2/2] IB/mad: Use IDR for agent IDs

2018-06-11 Thread jackm
On Sun, 10 Jun 2018 22:42:03 -0600
Jason Gunthorpe  wrote:

> Er, the spec has nothing to do with this. In Linux the TID is made
> unique because the core code provides 32 bits that are unique and the
> user provides another 32 bits that are unique. The driver cannot
> change any of those bits without risking non-uniquenes, which is
> exactly the bug mlx4 created when it stepped outside its bounds and
> improperly overrode bits in the TID for its own internal use.

Actually, the opposite is true here.
When SRIOV is active, each VM generates its *own* TIDs -- with 32 bits
of agent number and 32 bits of counter.

There is a  chance that two different VMs can generate the same TID!
Encoding the slave (VM) number in the packet actually guarantees
uniqueness here.
There is nothing wrong with modifying the TID in a reversible way in
order to:
a. guarantee uniqueness
b. identify the VM which should receive the response packet

The problem was created when the agent-id numbers started to use the
most-significant byte (thus making the MSB slave-id addition
impossible).


Re: [PATCH] IB/mad: Use ID allocator routines to allocate agent number

2018-05-30 Thread jackm
On Tue, 29 May 2018 12:54:45 +0300
Leon Romanovsky  wrote:

> On Tue, May 29, 2018 at 11:54:59AM +0300, Leon Romanovsky wrote:
> > On Tue, May 29, 2018 at 09:38:08AM +0200, Hans Westgaard Ry wrote:  
> > > The agent TID is a 64 bit value split in two dwords.  The least
> > > significant dword is the TID running counter. The most significant
> > > dword is the agent number. In the CX-3 shared port model, the mlx4
> > > driver uses the most significant byte of the agent number to
> > > store the slave number, making agent numbers greater and equal to
> > > 2^24 (3 bytes) unusable.  The current codebase uses a variable
> > > which is incremented atomically for each new agent number giving
> > > too large agent numbers over time.  The IDA set of functions are
> > > used instead of the simple counter approach. This allows re-use
> > > of agent numbers. A sysctl variable is also introduced, to
> > > control the max agent number.  
> >
> > Why don't you simply limit this number per-driver? By default, any
> > variable is allowed and mlx4_ib will set something else.

It is much simpler to do things here -- the current allocation scheme
does have a wrap-around problem, so there is a good reason for using a
global allocator residing in the ib core.

> >
> > What is the advantage of having sysctl?  
> 
> Anyway, it doesn't pass smoke test.

REASON:  The start argument in ida_simple_get should be 1, not 0:
+   ib_mad_client_id = ida_simple_get(_mad_client_ids,
+ 1,
+ ib_mad_sysctl_client_id_max,
+ GFP_KERNEL);


The agent ID 0 is interpreted as -no agent-, is used by snoop agents,
and is never allocated.

The first allocated agent id is 1:
static atomic_t ib_mad_client_id_min = ATOMIC_INIT(0);
...
mad_agent_priv->agent.hi_tid =atomic_inc_return(_mad_client_id);

If this is fixed, we do not get the panic.

However, please note Jason's feedback and my suggestion -- this
ida-based interface needs to avoid immediate re-use of freed ids.

 -Jack
> 
> [  126.428407] RPC: Unregistered rdma transport module.
> [  126.428513] RPC: Unregistered rdma backchannel transport module.
> [  194.664081] IPv6: ADDRCONF(NETDEV_UP): ib0: link is not ready
> [  209.068702] BUG: unable to handle kernel NULL pointer dereference
> at 0070 [  209.068858] PGD 8000341cf067 P4D
> 8000341cf067 PUD 34188067 PMD 0 [  209.068941] Oops: 0002 [#1]
> SMP PTI [  209.069006] Modules linked in: netconsole nfsv3 nfs fscache
> mlx4_ib(-) mlx4_en mlx4_core devlink ib_ipoib rdma_ucm ib_ucm
> ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_core dm_mirror
> dm_region_hash dm_log dm_mod nfsd pcspkr i2c_piix4 auth_rpcgss
> nfs_acl lockd grace sunrpc ip_tables ata_generic cirrus
> drm_kms_helper pata_acpi syscopyarea sysfillrect sysimgblt
> fb_sys_fops ttm drm e1000 virtio_console i2c_core serio_raw ata_piix
> floppy [last unloaded: mlxfw] [  209.069312] CPU: 4 PID: 11048 Comm:
> modprobe Not tainted
> 4.17.0-rc7-2018-05-29_11-04-56_Hans_Westgaard_Ry__hans_westga #1
> [  209.069413] Hardware name: Red Hat KVM, BIOS Bochs 01/01/2011
> [  209.069486] RIP: 0010:_raw_spin_lock_irqsave+0x1e/0x40
> [  209.069536] RSP: 0018:c9b4fd70 EFLAGS: 00010046
> [  209.069591] RAX:  RBX: 0246 RCX:
> ea0004d7ed00 [  209.069653] RDX: 0001 RSI:
>  RDI: 0070 [  209.069717] RBP:
>  R08: 88013446fc00 R09: 0001801f
> [  209.069778] R10: 0001 R11: 88013446fc00 R12:
> 0070 [  209.069849] R13: 0202 R14:
>  R15:  [  209.069915] FS:
> 7fc34caf7740() GS:88013fd0()
> knlGS: [  209.069987] CS:  0010 DS:  ES: 
> CR0: 80050033 [  209.070043] CR2: 0070 CR3:
> 8853e000 CR4: 06e0 [  209.070128] Call Trace:
> [  209.070189]  ib_unregister_mad_agent+0x2d/0x540 [ib_core]
> [  209.070260]  ? __slab_free+0x9a/0x2d0 [  209.070332]
> ib_agent_port_close+0xad/0xf0 [ib_core] [  209.070396]
> ib_mad_remove_device+0x59/0xb0 [ib_core] [  209.070466]
> ib_unregister_device+0xd4/0x180 [ib_core] [  209.070537]
> mlx4_ib_remove+0x67/0x1f0 [mlx4_ib] [  209.070594]
> mlx4_remove_device+0x93/0xa0 [mlx4_core] [  209.070648]
> mlx4_unregister_interface+0x37/0x90 [mlx4_core] [  209.070705]
> mlx4_ib_cleanup+0xc/0x4db [mlx4_ib] [  209.072113]
> __x64_sys_delete_module+0x15b/0x260 [  209.073567]  ?
> exit_to_usermode_loop+0x7f/0x95 [  209.074945]
> do_syscall_64+0x48/0x100 [  209.076448]
> entry_SYSCALL_64_after_hwframe+0x44/0xa9 [  209.077799] RIP:
> 0033:0x7fc34bfe36b7 [  209.079122] RSP: 002b:7ffc8ffa98b8 EFLAGS:
> 0206 ORIG_RAX: 00b0 [  209.080500] RAX:
> ffda RBX: 013455c0 RCX: 7fc34bfe36b7
> [  209.081875] RDX:  RSI: 0800 RDI:
> 01345628 [  

Re: [PATCH] IB/mad: Use ID allocator routines to allocate agent number

2018-05-30 Thread jackm
On Tue, 29 May 2018 12:54:45 +0300
Leon Romanovsky  wrote:

> On Tue, May 29, 2018 at 11:54:59AM +0300, Leon Romanovsky wrote:
> > On Tue, May 29, 2018 at 09:38:08AM +0200, Hans Westgaard Ry wrote:  
> > > The agent TID is a 64 bit value split in two dwords.  The least
> > > significant dword is the TID running counter. The most significant
> > > dword is the agent number. In the CX-3 shared port model, the mlx4
> > > driver uses the most significant byte of the agent number to
> > > store the slave number, making agent numbers greater and equal to
> > > 2^24 (3 bytes) unusable.  The current codebase uses a variable
> > > which is incremented atomically for each new agent number giving
> > > too large agent numbers over time.  The IDA set of functions are
> > > used instead of the simple counter approach. This allows re-use
> > > of agent numbers. A sysctl variable is also introduced, to
> > > control the max agent number.  
> >
> > Why don't you simply limit this number per-driver? By default, any
> > variable is allowed and mlx4_ib will set something else.

It is much simpler to do things here -- the current allocation scheme
does have a wrap-around problem, so there is a good reason for using a
global allocator residing in the ib core.

> >
> > What is the advantage of having sysctl?  
> 
> Anyway, it doesn't pass smoke test.

REASON:  The start argument in ida_simple_get should be 1, not 0:
+   ib_mad_client_id = ida_simple_get(_mad_client_ids,
+ 1,
+ ib_mad_sysctl_client_id_max,
+ GFP_KERNEL);


The agent ID 0 is interpreted as -no agent-, is used by snoop agents,
and is never allocated.

The first allocated agent id is 1:
static atomic_t ib_mad_client_id_min = ATOMIC_INIT(0);
...
mad_agent_priv->agent.hi_tid =atomic_inc_return(_mad_client_id);

If this is fixed, we do not get the panic.

However, please note Jason's feedback and my suggestion -- this
ida-based interface needs to avoid immediate re-use of freed ids.

 -Jack
> 
> [  126.428407] RPC: Unregistered rdma transport module.
> [  126.428513] RPC: Unregistered rdma backchannel transport module.
> [  194.664081] IPv6: ADDRCONF(NETDEV_UP): ib0: link is not ready
> [  209.068702] BUG: unable to handle kernel NULL pointer dereference
> at 0070 [  209.068858] PGD 8000341cf067 P4D
> 8000341cf067 PUD 34188067 PMD 0 [  209.068941] Oops: 0002 [#1]
> SMP PTI [  209.069006] Modules linked in: netconsole nfsv3 nfs fscache
> mlx4_ib(-) mlx4_en mlx4_core devlink ib_ipoib rdma_ucm ib_ucm
> ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_core dm_mirror
> dm_region_hash dm_log dm_mod nfsd pcspkr i2c_piix4 auth_rpcgss
> nfs_acl lockd grace sunrpc ip_tables ata_generic cirrus
> drm_kms_helper pata_acpi syscopyarea sysfillrect sysimgblt
> fb_sys_fops ttm drm e1000 virtio_console i2c_core serio_raw ata_piix
> floppy [last unloaded: mlxfw] [  209.069312] CPU: 4 PID: 11048 Comm:
> modprobe Not tainted
> 4.17.0-rc7-2018-05-29_11-04-56_Hans_Westgaard_Ry__hans_westga #1
> [  209.069413] Hardware name: Red Hat KVM, BIOS Bochs 01/01/2011
> [  209.069486] RIP: 0010:_raw_spin_lock_irqsave+0x1e/0x40
> [  209.069536] RSP: 0018:c9b4fd70 EFLAGS: 00010046
> [  209.069591] RAX:  RBX: 0246 RCX:
> ea0004d7ed00 [  209.069653] RDX: 0001 RSI:
>  RDI: 0070 [  209.069717] RBP:
>  R08: 88013446fc00 R09: 0001801f
> [  209.069778] R10: 0001 R11: 88013446fc00 R12:
> 0070 [  209.069849] R13: 0202 R14:
>  R15:  [  209.069915] FS:
> 7fc34caf7740() GS:88013fd0()
> knlGS: [  209.069987] CS:  0010 DS:  ES: 
> CR0: 80050033 [  209.070043] CR2: 0070 CR3:
> 8853e000 CR4: 06e0 [  209.070128] Call Trace:
> [  209.070189]  ib_unregister_mad_agent+0x2d/0x540 [ib_core]
> [  209.070260]  ? __slab_free+0x9a/0x2d0 [  209.070332]
> ib_agent_port_close+0xad/0xf0 [ib_core] [  209.070396]
> ib_mad_remove_device+0x59/0xb0 [ib_core] [  209.070466]
> ib_unregister_device+0xd4/0x180 [ib_core] [  209.070537]
> mlx4_ib_remove+0x67/0x1f0 [mlx4_ib] [  209.070594]
> mlx4_remove_device+0x93/0xa0 [mlx4_core] [  209.070648]
> mlx4_unregister_interface+0x37/0x90 [mlx4_core] [  209.070705]
> mlx4_ib_cleanup+0xc/0x4db [mlx4_ib] [  209.072113]
> __x64_sys_delete_module+0x15b/0x260 [  209.073567]  ?
> exit_to_usermode_loop+0x7f/0x95 [  209.074945]
> do_syscall_64+0x48/0x100 [  209.076448]
> entry_SYSCALL_64_after_hwframe+0x44/0xa9 [  209.077799] RIP:
> 0033:0x7fc34bfe36b7 [  209.079122] RSP: 002b:7ffc8ffa98b8 EFLAGS:
> 0206 ORIG_RAX: 00b0 [  209.080500] RAX:
> ffda RBX: 013455c0 RCX: 7fc34bfe36b7
> [  209.081875] RDX:  RSI: 0800 RDI:
> 01345628 [  

Re: [PATCH] IB/mad: Use ID allocator routines to allocate agent number

2018-05-30 Thread jackm
On Tue, 29 May 2018 10:40:32 -0600
Jason Gunthorpe  wrote:

> On Tue, May 29, 2018 at 06:16:14PM +0200, Håkon Bugge wrote:
> >  
> > > On 29 May 2018, at 17:49, Jason Gunthorpe  wrote:
> > > 
> > > On Tue, May 29, 2018 at 09:38:08AM +0200, Hans Westgaard Ry
> > > wrote:  
> > >> The agent TID is a 64 bit value split in two dwords.  The least
> > >> significant dword is the TID running counter. The most
> > >> significant dword is the agent number. In the CX-3 shared port
> > >> model, the mlx4 driver uses the most significant byte of the
> > >> agent number to store the slave number, making agent numbers
> > >> greater and equal to 2^24 (3 bytes) unusable.  
> > > 
> > > There is no reason for this to be an ida, just do something like
> > > 
> > > mad_agent_priv->agent.hi_tid =
> > > atomic_inc_return(_mad_client_id) &
> > > mad_agent_priv->ib_dev->tid_mask;
> > > 
> > > And have the driver set tid_mask to 3 bytes of 0xFF  
> > 
> > The issue is that some of the mad agents are long-lived, so you will
> > wrap and use the same TID twice.  
> 
> We already have that problem, and using ida is problematic because we
> need to maximize the time between TID re-use, which ida isn't doing.
> 
> Preventing re-use seems like a seperate issue from limiting the range
> to be compatible with mlx4.
> 

Preventing immediate re-use can be accomplished by judicious use of the
start argument (second argument) in the call to ida_simple_get (to
introduce hysteresis into the id allocations).

For example, can do something like:

static atomic_t ib_mad_client_id_min = ATOMIC_INIT(1);
...
ib_mad_client_id = ida_simple_get(_mad_client_ids,
  atomic_read(_mad_client_id_min),
  ib_mad_sysctl_client_id_max,
  GFP_KERNEL);

if (!(ib_mad_client_id % 1000) ||
ib_mad_sysctl_client_id_max - ib_mad_client_id <= 1000)
atomic_set(_mad_client_id_min, 1);
else
atomic_set(_mad_client_id_min, ib_mad_client_id + 1);

The above avoids immediate re-use of ids, and only searches for past
freed ids if the last allocated-id is zero mod 1000.

This is just suggestion -- will probably need some variation of the
above to handle what happens over time (i.e., to not depend on the
modulo operation to reset the search start to 1), to properly handle
how we deal with the start value when we are close to the allowed
client_id_max, and also to implement some sort of locking.

-Jack



Re: [PATCH] IB/mad: Use ID allocator routines to allocate agent number

2018-05-30 Thread jackm
On Tue, 29 May 2018 10:40:32 -0600
Jason Gunthorpe  wrote:

> On Tue, May 29, 2018 at 06:16:14PM +0200, Håkon Bugge wrote:
> >  
> > > On 29 May 2018, at 17:49, Jason Gunthorpe  wrote:
> > > 
> > > On Tue, May 29, 2018 at 09:38:08AM +0200, Hans Westgaard Ry
> > > wrote:  
> > >> The agent TID is a 64 bit value split in two dwords.  The least
> > >> significant dword is the TID running counter. The most
> > >> significant dword is the agent number. In the CX-3 shared port
> > >> model, the mlx4 driver uses the most significant byte of the
> > >> agent number to store the slave number, making agent numbers
> > >> greater and equal to 2^24 (3 bytes) unusable.  
> > > 
> > > There is no reason for this to be an ida, just do something like
> > > 
> > > mad_agent_priv->agent.hi_tid =
> > > atomic_inc_return(_mad_client_id) &
> > > mad_agent_priv->ib_dev->tid_mask;
> > > 
> > > And have the driver set tid_mask to 3 bytes of 0xFF  
> > 
> > The issue is that some of the mad agents are long-lived, so you will
> > wrap and use the same TID twice.  
> 
> We already have that problem, and using ida is problematic because we
> need to maximize the time between TID re-use, which ida isn't doing.
> 
> Preventing re-use seems like a seperate issue from limiting the range
> to be compatible with mlx4.
> 

Preventing immediate re-use can be accomplished by judicious use of the
start argument (second argument) in the call to ida_simple_get (to
introduce hysteresis into the id allocations).

For example, can do something like:

static atomic_t ib_mad_client_id_min = ATOMIC_INIT(1);
...
ib_mad_client_id = ida_simple_get(_mad_client_ids,
  atomic_read(_mad_client_id_min),
  ib_mad_sysctl_client_id_max,
  GFP_KERNEL);

if (!(ib_mad_client_id % 1000) ||
ib_mad_sysctl_client_id_max - ib_mad_client_id <= 1000)
atomic_set(_mad_client_id_min, 1);
else
atomic_set(_mad_client_id_min, ib_mad_client_id + 1);

The above avoids immediate re-use of ids, and only searches for past
freed ids if the last allocated-id is zero mod 1000.

This is just suggestion -- will probably need some variation of the
above to handle what happens over time (i.e., to not depend on the
modulo operation to reset the search start to 1), to properly handle
how we deal with the start value when we are close to the allowed
client_id_max, and also to implement some sort of locking.

-Jack



Re: [PATCH] IB/core: Make ib_mad_client_id atomic

2018-04-30 Thread jackm
On Mon, 30 Apr 2018 13:10:49 -0400
Doug Ledford <dledf...@redhat.com> wrote:

Looks good!

-Jack

> On Mon, 2018-04-30 at 08:49 -0600, Jason Gunthorpe wrote:
> > On Mon, Apr 23, 2018 at 10:16:18PM +0300, jackm wrote:
> >   
> > > > > TIDs need to be globally unique on the entire machine.
> > > Jason, that is not exactly correct.  
> > 
> > The expecation for /dev/umad users is that they all receive locally
> > unique TID prefixes. The kernel may be OK to keep things
> > port-specific but it is slightly breaking the API we are presenting
> > to userspace to allow them to alias..
> > 
> > Jason  
> 
> Would people be happier with this commit message then:
> 
> IB/core: Make ib_mad_client_id atomic
>   
> Currently, the kernel protects access to the agent ID allocator on a
> per port basis using a spinlock, so it is impossible for two
> apps/threads on the same port to get the same TID, but it is entirely
> possible for two threads on different ports to end up with the same
> TID.  
> 
> As this can be confusing (regardless of it being legal according to
> the IB Spec 1.3, C13-18.1.1, in section 13.4.6.4 - TransactionID
> usage), and as the rdma-core user space API for /dev/umad devices
> implies unique TIDs even across ports, make the TID an atomic type so
> that no two allocations, regardless of port number, will be the same.
> 
> Signed-off-by: Håkon Bugge <haakon.bu...@oracle.com>
> Reviewed-by: Jack Morgenstein <ja...@dev.mellanox.co.il>
> Reviewed-by: Ira Weiny <ira.we...@intel.com>
> Reviewed-by: Zhu Yanjun <yanjun@oracle.com>
> Signed-off-by: Doug Ledford <dledf...@redhat.com>
> 
> 



Re: [PATCH] IB/core: Make ib_mad_client_id atomic

2018-04-30 Thread jackm
On Mon, 30 Apr 2018 13:10:49 -0400
Doug Ledford  wrote:

Looks good!

-Jack

> On Mon, 2018-04-30 at 08:49 -0600, Jason Gunthorpe wrote:
> > On Mon, Apr 23, 2018 at 10:16:18PM +0300, jackm wrote:
> >   
> > > > > TIDs need to be globally unique on the entire machine.
> > > Jason, that is not exactly correct.  
> > 
> > The expecation for /dev/umad users is that they all receive locally
> > unique TID prefixes. The kernel may be OK to keep things
> > port-specific but it is slightly breaking the API we are presenting
> > to userspace to allow them to alias..
> > 
> > Jason  
> 
> Would people be happier with this commit message then:
> 
> IB/core: Make ib_mad_client_id atomic
>   
> Currently, the kernel protects access to the agent ID allocator on a
> per port basis using a spinlock, so it is impossible for two
> apps/threads on the same port to get the same TID, but it is entirely
> possible for two threads on different ports to end up with the same
> TID.  
> 
> As this can be confusing (regardless of it being legal according to
> the IB Spec 1.3, C13-18.1.1, in section 13.4.6.4 - TransactionID
> usage), and as the rdma-core user space API for /dev/umad devices
> implies unique TIDs even across ports, make the TID an atomic type so
> that no two allocations, regardless of port number, will be the same.
> 
> Signed-off-by: Håkon Bugge 
> Reviewed-by: Jack Morgenstein 
> Reviewed-by: Ira Weiny 
> Reviewed-by: Zhu Yanjun 
> Signed-off-by: Doug Ledford 
> 
> 



Re: [PATCH] IB/core: Make ib_mad_client_id atomic

2018-04-26 Thread jackm
On Thu, 26 Apr 2018 18:06:10 +0200
Håkon Bugge <haakon.bu...@oracle.com> wrote:

> > On 23 Apr 2018, at 21:16, jackm <ja...@dev.mellanox.co.il> wrote:
> > 
> > On Mon, 23 Apr 2018 16:19:57 +0200
> > Håkon Bugge <haakon.bu...@oracle.com> wrote:
> > 
> >   
> >>> 
> >>> This actually looks like a genuine bug, why is it described only
> >>> as 'confusing'? ib_register_mad_agent is callable from userspace,
> >>> so at least two userspace agents can race and get the same
> >>> TID’s.
> >> 
> >> My understanding is that every lookup is using the {port, TID}
> >> tuple. As such, it is not a bug, but, very confusing.  
> > Haakon, you are correct (see snippet from the IB spec, below).
> > 
> > We will NOT have a situation where there are 2 threads/apps
> > with the same agent ID on the *same port* (accessing the agent ID
> > allocator is protected by a per-port spinlock). Having the same
> > agent ID on DIFFERENT ports is OK.
> > Thus, there is NO bug here. (But as Haakon says, IMHO it is more
> > robust to avoid having the same agent ID for 2 agents even if those
> > agents are on different ports).
> >   
> >>   
> >>> TIDs need to be globally unique on the entire machine.
> >>   
> > Jason, that is not exactly correct.
> > 
> > From the IB Spec 1.3, C13-18.1.1 (in section 13.4.6.4 -
> > TransactionID usage):
> > "When initiating a new operation, MADHeader:TransactionID
> > shall be set to such a value that within that MAD the combination of
> > TID, SGID, and MgmtClass is different from that of any other
> > currently executing operation. If the MAD does not have a GRH, its
> > SLID is used in the combination in place of an SGID."
> > 
> > Since the SGID/SLID is different for each port, the per-port
> > guarantee of no 2 agents receiving the same agent-ID value is
> > sufficient.
> > 
> > -Jack  
> 
> Shall I interpret this silence as my commit is good to go or that I
> should add Jack’s tangible information to the commit message?
> 
> 
Haakon, I think Jason is on vacation this week. Best to wait for his
response.

-Jack

> Thxs, Håkon
> 



Re: [PATCH] IB/core: Make ib_mad_client_id atomic

2018-04-26 Thread jackm
On Thu, 26 Apr 2018 18:06:10 +0200
Håkon Bugge  wrote:

> > On 23 Apr 2018, at 21:16, jackm  wrote:
> > 
> > On Mon, 23 Apr 2018 16:19:57 +0200
> > Håkon Bugge  wrote:
> > 
> >   
> >>> 
> >>> This actually looks like a genuine bug, why is it described only
> >>> as 'confusing'? ib_register_mad_agent is callable from userspace,
> >>> so at least two userspace agents can race and get the same
> >>> TID’s.
> >> 
> >> My understanding is that every lookup is using the {port, TID}
> >> tuple. As such, it is not a bug, but, very confusing.  
> > Haakon, you are correct (see snippet from the IB spec, below).
> > 
> > We will NOT have a situation where there are 2 threads/apps
> > with the same agent ID on the *same port* (accessing the agent ID
> > allocator is protected by a per-port spinlock). Having the same
> > agent ID on DIFFERENT ports is OK.
> > Thus, there is NO bug here. (But as Haakon says, IMHO it is more
> > robust to avoid having the same agent ID for 2 agents even if those
> > agents are on different ports).
> >   
> >>   
> >>> TIDs need to be globally unique on the entire machine.
> >>   
> > Jason, that is not exactly correct.
> > 
> > From the IB Spec 1.3, C13-18.1.1 (in section 13.4.6.4 -
> > TransactionID usage):
> > "When initiating a new operation, MADHeader:TransactionID
> > shall be set to such a value that within that MAD the combination of
> > TID, SGID, and MgmtClass is different from that of any other
> > currently executing operation. If the MAD does not have a GRH, its
> > SLID is used in the combination in place of an SGID."
> > 
> > Since the SGID/SLID is different for each port, the per-port
> > guarantee of no 2 agents receiving the same agent-ID value is
> > sufficient.
> > 
> > -Jack  
> 
> Shall I interpret this silence as my commit is good to go or that I
> should add Jack’s tangible information to the commit message?
> 
> 
Haakon, I think Jason is on vacation this week. Best to wait for his
response.

-Jack

> Thxs, Håkon
> 



Re: [PATCH] IB/core: Make ib_mad_client_id atomic

2018-04-23 Thread jackm
On Mon, 23 Apr 2018 16:19:57 +0200
Håkon Bugge  wrote:


> > 
> > This actually looks like a genuine bug, why is it described only as
> > 'confusing'? ib_register_mad_agent is callable from userspace, so at
> > least two userspace agents can race and get the same TID’s.  
> 
> My understanding is that every lookup is using the {port, TID} tuple.
> As such, it is not a bug, but, very confusing.
Haakon, you are correct (see snippet from the IB spec, below).

We will NOT have a situation where there are 2 threads/apps
with the same agent ID on the *same port* (accessing the agent ID
allocator is protected by a per-port spinlock). Having the same agent ID
on DIFFERENT ports is OK.
Thus, there is NO bug here. (But as Haakon says, IMHO it is more robust
to avoid having the same agent ID for 2 agents even if those agents are
on different ports).

> 
> > TIDs need to be globally unique on the entire machine.  
> 
Jason, that is not exactly correct.

From the IB Spec 1.3, C13-18.1.1 (in section 13.4.6.4 - TransactionID
usage):
"When initiating a new operation, MADHeader:TransactionID
shall be set to such a value that within that MAD the combination of
TID, SGID, and MgmtClass is different from that of any other currently
executing operation. If the MAD does not have a GRH, its SLID is used
in the combination in place of an SGID."

Since the SGID/SLID is different for each port, the per-port guarantee
of no 2 agents receiving the same agent-ID value is sufficient.

-Jack




Re: [PATCH] IB/core: Make ib_mad_client_id atomic

2018-04-23 Thread jackm
On Mon, 23 Apr 2018 16:19:57 +0200
Håkon Bugge  wrote:


> > 
> > This actually looks like a genuine bug, why is it described only as
> > 'confusing'? ib_register_mad_agent is callable from userspace, so at
> > least two userspace agents can race and get the same TID’s.  
> 
> My understanding is that every lookup is using the {port, TID} tuple.
> As such, it is not a bug, but, very confusing.
Haakon, you are correct (see snippet from the IB spec, below).

We will NOT have a situation where there are 2 threads/apps
with the same agent ID on the *same port* (accessing the agent ID
allocator is protected by a per-port spinlock). Having the same agent ID
on DIFFERENT ports is OK.
Thus, there is NO bug here. (But as Haakon says, IMHO it is more robust
to avoid having the same agent ID for 2 agents even if those agents are
on different ports).

> 
> > TIDs need to be globally unique on the entire machine.  
> 
Jason, that is not exactly correct.

From the IB Spec 1.3, C13-18.1.1 (in section 13.4.6.4 - TransactionID
usage):
"When initiating a new operation, MADHeader:TransactionID
shall be set to such a value that within that MAD the combination of
TID, SGID, and MgmtClass is different from that of any other currently
executing operation. If the MAD does not have a GRH, its SLID is used
in the combination in place of an SGID."

Since the SGID/SLID is different for each port, the per-port guarantee
of no 2 agents receiving the same agent-ID value is sufficient.

-Jack