Re: [PATCH 3/8] RDMAVT: Fix synchronization around percpu_ref
On Wed, Mar 14, 2018 at 12:45:10PM -0700, Tejun Heo wrote: > rvt_mregion uses percpu_ref for reference counting and RCU to protect > accesses from lkey_table. When a rvt_mregion needs to be freed, it > first gets unregistered from lkey_table and then rvt_check_refs() is > called to wait for in-flight usages before the rvt_mregion is freed. > > rvt_check_refs() seems to have a couple issues. > > * It has a fast exit path which tests percpu_ref_is_zero(). However, > a percpu_ref reading zero doesn't mean that the object can be > released. In fact, the ->release() callback might not even have > started executing yet. Proceeding with freeing can lead to > use-after-free. > > * lkey_table is RCU protected but there is no RCU grace period in the > free path. percpu_ref uses RCU internally but it's sched-RCU whose > grace periods are different from regular RCU. Also, it generally > isn't a good idea to depend on internal behaviors like this. > > To address the above issues, this patch removes the fast exit and adds > an explicit synchronize_rcu(). > > Signed-off-by: Tejun Heo> Acked-by: Dennis Dalessandro > Cc: Mike Marciniszyn > Cc: linux-r...@vger.kernel.org > Cc: Linus Torvalds > drivers/infiniband/sw/rdmavt/mr.c | 10 ++ > 1 file changed, 6 insertions(+), 4 deletions(-) Applied to rdma for-next Thanks, Jason
Re: [PATCH 3/8] RDMAVT: Fix synchronization around percpu_ref
On Wed, Mar 14, 2018 at 12:45:10PM -0700, Tejun Heo wrote: > rvt_mregion uses percpu_ref for reference counting and RCU to protect > accesses from lkey_table. When a rvt_mregion needs to be freed, it > first gets unregistered from lkey_table and then rvt_check_refs() is > called to wait for in-flight usages before the rvt_mregion is freed. > > rvt_check_refs() seems to have a couple issues. > > * It has a fast exit path which tests percpu_ref_is_zero(). However, > a percpu_ref reading zero doesn't mean that the object can be > released. In fact, the ->release() callback might not even have > started executing yet. Proceeding with freeing can lead to > use-after-free. > > * lkey_table is RCU protected but there is no RCU grace period in the > free path. percpu_ref uses RCU internally but it's sched-RCU whose > grace periods are different from regular RCU. Also, it generally > isn't a good idea to depend on internal behaviors like this. > > To address the above issues, this patch removes the fast exit and adds > an explicit synchronize_rcu(). > > Signed-off-by: Tejun Heo > Acked-by: Dennis Dalessandro > Cc: Mike Marciniszyn > Cc: linux-r...@vger.kernel.org > Cc: Linus Torvalds > drivers/infiniband/sw/rdmavt/mr.c | 10 ++ > 1 file changed, 6 insertions(+), 4 deletions(-) Applied to rdma for-next Thanks, Jason
[PATCH 3/8] RDMAVT: Fix synchronization around percpu_ref
rvt_mregion uses percpu_ref for reference counting and RCU to protect accesses from lkey_table. When a rvt_mregion needs to be freed, it first gets unregistered from lkey_table and then rvt_check_refs() is called to wait for in-flight usages before the rvt_mregion is freed. rvt_check_refs() seems to have a couple issues. * It has a fast exit path which tests percpu_ref_is_zero(). However, a percpu_ref reading zero doesn't mean that the object can be released. In fact, the ->release() callback might not even have started executing yet. Proceeding with freeing can lead to use-after-free. * lkey_table is RCU protected but there is no RCU grace period in the free path. percpu_ref uses RCU internally but it's sched-RCU whose grace periods are different from regular RCU. Also, it generally isn't a good idea to depend on internal behaviors like this. To address the above issues, this patch removes the fast exit and adds an explicit synchronize_rcu(). Signed-off-by: Tejun HeoAcked-by: Dennis Dalessandro Cc: Mike Marciniszyn Cc: linux-r...@vger.kernel.org Cc: Linus Torvalds --- drivers/infiniband/sw/rdmavt/mr.c | 10 ++ 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/drivers/infiniband/sw/rdmavt/mr.c b/drivers/infiniband/sw/rdmavt/mr.c index 1b2e536..cc429b5 100644 --- a/drivers/infiniband/sw/rdmavt/mr.c +++ b/drivers/infiniband/sw/rdmavt/mr.c @@ -489,11 +489,13 @@ static int rvt_check_refs(struct rvt_mregion *mr, const char *t) unsigned long timeout; struct rvt_dev_info *rdi = ib_to_rvt(mr->pd->device); - if (percpu_ref_is_zero(>refcount)) - return 0; - /* avoid dma mr */ - if (mr->lkey) + if (mr->lkey) { + /* avoid dma mr */ rvt_dereg_clean_qps(mr); + /* @mr was indexed on rcu protected @lkey_table */ + synchronize_rcu(); + } + timeout = wait_for_completion_timeout(>comp, 5 * HZ); if (!timeout) { rvt_pr_err(rdi, -- 2.9.5
[PATCH 3/8] RDMAVT: Fix synchronization around percpu_ref
rvt_mregion uses percpu_ref for reference counting and RCU to protect accesses from lkey_table. When a rvt_mregion needs to be freed, it first gets unregistered from lkey_table and then rvt_check_refs() is called to wait for in-flight usages before the rvt_mregion is freed. rvt_check_refs() seems to have a couple issues. * It has a fast exit path which tests percpu_ref_is_zero(). However, a percpu_ref reading zero doesn't mean that the object can be released. In fact, the ->release() callback might not even have started executing yet. Proceeding with freeing can lead to use-after-free. * lkey_table is RCU protected but there is no RCU grace period in the free path. percpu_ref uses RCU internally but it's sched-RCU whose grace periods are different from regular RCU. Also, it generally isn't a good idea to depend on internal behaviors like this. To address the above issues, this patch removes the fast exit and adds an explicit synchronize_rcu(). Signed-off-by: Tejun Heo Acked-by: Dennis Dalessandro Cc: Mike Marciniszyn Cc: linux-r...@vger.kernel.org Cc: Linus Torvalds --- drivers/infiniband/sw/rdmavt/mr.c | 10 ++ 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/drivers/infiniband/sw/rdmavt/mr.c b/drivers/infiniband/sw/rdmavt/mr.c index 1b2e536..cc429b5 100644 --- a/drivers/infiniband/sw/rdmavt/mr.c +++ b/drivers/infiniband/sw/rdmavt/mr.c @@ -489,11 +489,13 @@ static int rvt_check_refs(struct rvt_mregion *mr, const char *t) unsigned long timeout; struct rvt_dev_info *rdi = ib_to_rvt(mr->pd->device); - if (percpu_ref_is_zero(>refcount)) - return 0; - /* avoid dma mr */ - if (mr->lkey) + if (mr->lkey) { + /* avoid dma mr */ rvt_dereg_clean_qps(mr); + /* @mr was indexed on rcu protected @lkey_table */ + synchronize_rcu(); + } + timeout = wait_for_completion_timeout(>comp, 5 * HZ); if (!timeout) { rvt_pr_err(rdi, -- 2.9.5