Re: [PATCH 3/7] RDMAVT: Fix synchronization around percpu_ref

2018-03-07 Thread Dennis Dalessandro

On 3/6/2018 12:33 PM, Tejun Heo wrote:

rvt_mregion uses percpu_ref for reference counting and RCU to protect
accesses from lkey_table.  When a rvt_mregion needs to be freed, it
first gets unregistered from lkey_table and then rvt_check_refs() is
called to wait for in-flight usages before the rvt_mregion is freed.

rvt_check_refs() seems to have a couple issues.

* It has a fast exit path which tests percpu_ref_is_zero().  However,
   a percpu_ref reading zero doesn't mean that the object can be
   released.  In fact, the ->release() callback might not even have
   started executing yet.  Proceeding with freeing can lead to
   use-after-free.

* lkey_table is RCU protected but there is no RCU grace period in the
   free path.  percpu_ref uses RCU internally but it's sched-RCU whose
   grace periods are different from regular RCU.  Also, it generally
   isn't a good idea to depend on internal behaviors like this.

To address the above issues, this patch removes the the fast exit and


Typo above too many "the".


adds an explicit synchronize_rcu().

Signed-off-by: Tejun Heo 
Cc: Dennis Dalessandro 
Cc: Mike Marciniszyn 
Cc: linux-r...@vger.kernel.org
Cc: Linus Torvalds 
---
Hello, Dennis, Mike.

I don't know RDMA at all and this patch is only compile tested.  Can
you please take a careful look?


Looks good to me, passes my basic sanity tests. Thanks!

Acked-by: Dennis Dalessandro 


Re: [PATCH 3/7] RDMAVT: Fix synchronization around percpu_ref

2018-03-07 Thread Dennis Dalessandro

On 3/6/2018 12:33 PM, Tejun Heo wrote:

rvt_mregion uses percpu_ref for reference counting and RCU to protect
accesses from lkey_table.  When a rvt_mregion needs to be freed, it
first gets unregistered from lkey_table and then rvt_check_refs() is
called to wait for in-flight usages before the rvt_mregion is freed.

rvt_check_refs() seems to have a couple issues.

* It has a fast exit path which tests percpu_ref_is_zero().  However,
   a percpu_ref reading zero doesn't mean that the object can be
   released.  In fact, the ->release() callback might not even have
   started executing yet.  Proceeding with freeing can lead to
   use-after-free.

* lkey_table is RCU protected but there is no RCU grace period in the
   free path.  percpu_ref uses RCU internally but it's sched-RCU whose
   grace periods are different from regular RCU.  Also, it generally
   isn't a good idea to depend on internal behaviors like this.

To address the above issues, this patch removes the the fast exit and


Typo above too many "the".


adds an explicit synchronize_rcu().

Signed-off-by: Tejun Heo 
Cc: Dennis Dalessandro 
Cc: Mike Marciniszyn 
Cc: linux-r...@vger.kernel.org
Cc: Linus Torvalds 
---
Hello, Dennis, Mike.

I don't know RDMA at all and this patch is only compile tested.  Can
you please take a careful look?


Looks good to me, passes my basic sanity tests. Thanks!

Acked-by: Dennis Dalessandro 


[PATCH 3/7] RDMAVT: Fix synchronization around percpu_ref

2018-03-06 Thread Tejun Heo
rvt_mregion uses percpu_ref for reference counting and RCU to protect
accesses from lkey_table.  When a rvt_mregion needs to be freed, it
first gets unregistered from lkey_table and then rvt_check_refs() is
called to wait for in-flight usages before the rvt_mregion is freed.

rvt_check_refs() seems to have a couple issues.

* It has a fast exit path which tests percpu_ref_is_zero().  However,
  a percpu_ref reading zero doesn't mean that the object can be
  released.  In fact, the ->release() callback might not even have
  started executing yet.  Proceeding with freeing can lead to
  use-after-free.

* lkey_table is RCU protected but there is no RCU grace period in the
  free path.  percpu_ref uses RCU internally but it's sched-RCU whose
  grace periods are different from regular RCU.  Also, it generally
  isn't a good idea to depend on internal behaviors like this.

To address the above issues, this patch removes the the fast exit and
adds an explicit synchronize_rcu().

Signed-off-by: Tejun Heo 
Cc: Dennis Dalessandro 
Cc: Mike Marciniszyn 
Cc: linux-r...@vger.kernel.org
Cc: Linus Torvalds 
---
Hello, Dennis, Mike.

I don't know RDMA at all and this patch is only compile tested.  Can
you please take a careful look?

Thanks.

 drivers/infiniband/sw/rdmavt/mr.c | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/drivers/infiniband/sw/rdmavt/mr.c 
b/drivers/infiniband/sw/rdmavt/mr.c
index 1b2e536..cc429b5 100644
--- a/drivers/infiniband/sw/rdmavt/mr.c
+++ b/drivers/infiniband/sw/rdmavt/mr.c
@@ -489,11 +489,13 @@ static int rvt_check_refs(struct rvt_mregion *mr, const 
char *t)
unsigned long timeout;
struct rvt_dev_info *rdi = ib_to_rvt(mr->pd->device);
 
-   if (percpu_ref_is_zero(>refcount))
-   return 0;
-   /* avoid dma mr */
-   if (mr->lkey)
+   if (mr->lkey) {
+   /* avoid dma mr */
rvt_dereg_clean_qps(mr);
+   /* @mr was indexed on rcu protected @lkey_table */
+   synchronize_rcu();
+   }
+
timeout = wait_for_completion_timeout(>comp, 5 * HZ);
if (!timeout) {
rvt_pr_err(rdi,
-- 
2.9.5



[PATCH 3/7] RDMAVT: Fix synchronization around percpu_ref

2018-03-06 Thread Tejun Heo
rvt_mregion uses percpu_ref for reference counting and RCU to protect
accesses from lkey_table.  When a rvt_mregion needs to be freed, it
first gets unregistered from lkey_table and then rvt_check_refs() is
called to wait for in-flight usages before the rvt_mregion is freed.

rvt_check_refs() seems to have a couple issues.

* It has a fast exit path which tests percpu_ref_is_zero().  However,
  a percpu_ref reading zero doesn't mean that the object can be
  released.  In fact, the ->release() callback might not even have
  started executing yet.  Proceeding with freeing can lead to
  use-after-free.

* lkey_table is RCU protected but there is no RCU grace period in the
  free path.  percpu_ref uses RCU internally but it's sched-RCU whose
  grace periods are different from regular RCU.  Also, it generally
  isn't a good idea to depend on internal behaviors like this.

To address the above issues, this patch removes the the fast exit and
adds an explicit synchronize_rcu().

Signed-off-by: Tejun Heo 
Cc: Dennis Dalessandro 
Cc: Mike Marciniszyn 
Cc: linux-r...@vger.kernel.org
Cc: Linus Torvalds 
---
Hello, Dennis, Mike.

I don't know RDMA at all and this patch is only compile tested.  Can
you please take a careful look?

Thanks.

 drivers/infiniband/sw/rdmavt/mr.c | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/drivers/infiniband/sw/rdmavt/mr.c 
b/drivers/infiniband/sw/rdmavt/mr.c
index 1b2e536..cc429b5 100644
--- a/drivers/infiniband/sw/rdmavt/mr.c
+++ b/drivers/infiniband/sw/rdmavt/mr.c
@@ -489,11 +489,13 @@ static int rvt_check_refs(struct rvt_mregion *mr, const 
char *t)
unsigned long timeout;
struct rvt_dev_info *rdi = ib_to_rvt(mr->pd->device);
 
-   if (percpu_ref_is_zero(>refcount))
-   return 0;
-   /* avoid dma mr */
-   if (mr->lkey)
+   if (mr->lkey) {
+   /* avoid dma mr */
rvt_dereg_clean_qps(mr);
+   /* @mr was indexed on rcu protected @lkey_table */
+   synchronize_rcu();
+   }
+
timeout = wait_for_completion_timeout(>comp, 5 * HZ);
if (!timeout) {
rvt_pr_err(rdi,
-- 
2.9.5