Re: [PATCH 3/7] RDMAVT: Fix synchronization around percpu_ref
On 3/6/2018 12:33 PM, Tejun Heo wrote: rvt_mregion uses percpu_ref for reference counting and RCU to protect accesses from lkey_table. When a rvt_mregion needs to be freed, it first gets unregistered from lkey_table and then rvt_check_refs() is called to wait for in-flight usages before the rvt_mregion is freed. rvt_check_refs() seems to have a couple issues. * It has a fast exit path which tests percpu_ref_is_zero(). However, a percpu_ref reading zero doesn't mean that the object can be released. In fact, the ->release() callback might not even have started executing yet. Proceeding with freeing can lead to use-after-free. * lkey_table is RCU protected but there is no RCU grace period in the free path. percpu_ref uses RCU internally but it's sched-RCU whose grace periods are different from regular RCU. Also, it generally isn't a good idea to depend on internal behaviors like this. To address the above issues, this patch removes the the fast exit and Typo above too many "the". adds an explicit synchronize_rcu(). Signed-off-by: Tejun HeoCc: Dennis Dalessandro Cc: Mike Marciniszyn Cc: linux-r...@vger.kernel.org Cc: Linus Torvalds --- Hello, Dennis, Mike. I don't know RDMA at all and this patch is only compile tested. Can you please take a careful look? Looks good to me, passes my basic sanity tests. Thanks! Acked-by: Dennis Dalessandro
Re: [PATCH 3/7] RDMAVT: Fix synchronization around percpu_ref
On 3/6/2018 12:33 PM, Tejun Heo wrote: rvt_mregion uses percpu_ref for reference counting and RCU to protect accesses from lkey_table. When a rvt_mregion needs to be freed, it first gets unregistered from lkey_table and then rvt_check_refs() is called to wait for in-flight usages before the rvt_mregion is freed. rvt_check_refs() seems to have a couple issues. * It has a fast exit path which tests percpu_ref_is_zero(). However, a percpu_ref reading zero doesn't mean that the object can be released. In fact, the ->release() callback might not even have started executing yet. Proceeding with freeing can lead to use-after-free. * lkey_table is RCU protected but there is no RCU grace period in the free path. percpu_ref uses RCU internally but it's sched-RCU whose grace periods are different from regular RCU. Also, it generally isn't a good idea to depend on internal behaviors like this. To address the above issues, this patch removes the the fast exit and Typo above too many "the". adds an explicit synchronize_rcu(). Signed-off-by: Tejun Heo Cc: Dennis Dalessandro Cc: Mike Marciniszyn Cc: linux-r...@vger.kernel.org Cc: Linus Torvalds --- Hello, Dennis, Mike. I don't know RDMA at all and this patch is only compile tested. Can you please take a careful look? Looks good to me, passes my basic sanity tests. Thanks! Acked-by: Dennis Dalessandro
[PATCH 3/7] RDMAVT: Fix synchronization around percpu_ref
rvt_mregion uses percpu_ref for reference counting and RCU to protect accesses from lkey_table. When a rvt_mregion needs to be freed, it first gets unregistered from lkey_table and then rvt_check_refs() is called to wait for in-flight usages before the rvt_mregion is freed. rvt_check_refs() seems to have a couple issues. * It has a fast exit path which tests percpu_ref_is_zero(). However, a percpu_ref reading zero doesn't mean that the object can be released. In fact, the ->release() callback might not even have started executing yet. Proceeding with freeing can lead to use-after-free. * lkey_table is RCU protected but there is no RCU grace period in the free path. percpu_ref uses RCU internally but it's sched-RCU whose grace periods are different from regular RCU. Also, it generally isn't a good idea to depend on internal behaviors like this. To address the above issues, this patch removes the the fast exit and adds an explicit synchronize_rcu(). Signed-off-by: Tejun HeoCc: Dennis Dalessandro Cc: Mike Marciniszyn Cc: linux-r...@vger.kernel.org Cc: Linus Torvalds --- Hello, Dennis, Mike. I don't know RDMA at all and this patch is only compile tested. Can you please take a careful look? Thanks. drivers/infiniband/sw/rdmavt/mr.c | 10 ++ 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/drivers/infiniband/sw/rdmavt/mr.c b/drivers/infiniband/sw/rdmavt/mr.c index 1b2e536..cc429b5 100644 --- a/drivers/infiniband/sw/rdmavt/mr.c +++ b/drivers/infiniband/sw/rdmavt/mr.c @@ -489,11 +489,13 @@ static int rvt_check_refs(struct rvt_mregion *mr, const char *t) unsigned long timeout; struct rvt_dev_info *rdi = ib_to_rvt(mr->pd->device); - if (percpu_ref_is_zero(>refcount)) - return 0; - /* avoid dma mr */ - if (mr->lkey) + if (mr->lkey) { + /* avoid dma mr */ rvt_dereg_clean_qps(mr); + /* @mr was indexed on rcu protected @lkey_table */ + synchronize_rcu(); + } + timeout = wait_for_completion_timeout(>comp, 5 * HZ); if (!timeout) { rvt_pr_err(rdi, -- 2.9.5
[PATCH 3/7] RDMAVT: Fix synchronization around percpu_ref
rvt_mregion uses percpu_ref for reference counting and RCU to protect accesses from lkey_table. When a rvt_mregion needs to be freed, it first gets unregistered from lkey_table and then rvt_check_refs() is called to wait for in-flight usages before the rvt_mregion is freed. rvt_check_refs() seems to have a couple issues. * It has a fast exit path which tests percpu_ref_is_zero(). However, a percpu_ref reading zero doesn't mean that the object can be released. In fact, the ->release() callback might not even have started executing yet. Proceeding with freeing can lead to use-after-free. * lkey_table is RCU protected but there is no RCU grace period in the free path. percpu_ref uses RCU internally but it's sched-RCU whose grace periods are different from regular RCU. Also, it generally isn't a good idea to depend on internal behaviors like this. To address the above issues, this patch removes the the fast exit and adds an explicit synchronize_rcu(). Signed-off-by: Tejun Heo Cc: Dennis Dalessandro Cc: Mike Marciniszyn Cc: linux-r...@vger.kernel.org Cc: Linus Torvalds --- Hello, Dennis, Mike. I don't know RDMA at all and this patch is only compile tested. Can you please take a careful look? Thanks. drivers/infiniband/sw/rdmavt/mr.c | 10 ++ 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/drivers/infiniband/sw/rdmavt/mr.c b/drivers/infiniband/sw/rdmavt/mr.c index 1b2e536..cc429b5 100644 --- a/drivers/infiniband/sw/rdmavt/mr.c +++ b/drivers/infiniband/sw/rdmavt/mr.c @@ -489,11 +489,13 @@ static int rvt_check_refs(struct rvt_mregion *mr, const char *t) unsigned long timeout; struct rvt_dev_info *rdi = ib_to_rvt(mr->pd->device); - if (percpu_ref_is_zero(>refcount)) - return 0; - /* avoid dma mr */ - if (mr->lkey) + if (mr->lkey) { + /* avoid dma mr */ rvt_dereg_clean_qps(mr); + /* @mr was indexed on rcu protected @lkey_table */ + synchronize_rcu(); + } + timeout = wait_for_completion_timeout(>comp, 5 * HZ); if (!timeout) { rvt_pr_err(rdi, -- 2.9.5