RE: [PATCH for-next v3 1/7] RDMA/rxe: Convert triple tasklets to use workqueue
On Thu, Dec 29, 2022 1:56 AM Bob Pearson wrote: > > On 12/23/22 00:51, Daisuke Matsuda wrote: > > In order to implement On-Demand Paging on the rxe driver, triple tasklets > > (requester, responder, and completer) must be allowed to sleep so that they > > can trigger page fault when pages being accessed are not present. > > > > This patch replaces the tasklets with a workqueue, but still allows direct- > > call of works from softirq context if it is obvious that MRs are not going > > to be accessed and there is no work being processed in the workqueue. > > There are already at least two patch sets that do this waiting to get > upstream. > Bob I wrote my intention at the first part of the cover letter. Cf. https://lore.kernel.org/lkml/cover.1671772917.git.matsuda-dais...@fujitsu.com/ Your patch set introduces a soft lockup issue. It would take much more time to find the root cause than to simply convert the tasklets to a workqueue with this patch. My ODP patches have been stuck for almost 4 months because of this issue, and I am not willing to wait any longer. Daisuke > > > > > As counterparts to tasklet_disable() and tasklet_enable() are missing for > > workqueues, an atomic value is introduced to prevent work items from being > > scheduled while qp reset is in progress. > > > > The way to initialize/destroy workqueue is picked up from the > > implementation of Ian Ziemba and Bob Pearson at HPE. > > > > Link: > > https://lore.kernel.org/all/20221018043345.4033-1-rpearson...@gmail.com/ > > Signed-off-by: Daisuke Matsuda > > --- > > drivers/infiniband/sw/rxe/rxe.c | 9 - > > drivers/infiniband/sw/rxe/rxe_comp.c | 2 +- > > drivers/infiniband/sw/rxe/rxe_param.h | 2 +- > > drivers/infiniband/sw/rxe/rxe_qp.c| 2 +- > > drivers/infiniband/sw/rxe/rxe_req.c | 2 +- > > drivers/infiniband/sw/rxe/rxe_resp.c | 2 +- > > drivers/infiniband/sw/rxe/rxe_task.c | 52 --- > > drivers/infiniband/sw/rxe/rxe_task.h | 15 ++-- > > 8 files changed, 65 insertions(+), 21 deletions(-) > > > > diff --git a/drivers/infiniband/sw/rxe/rxe.c > > b/drivers/infiniband/sw/rxe/rxe.c > > index 136c2efe3466..3c7e42e5b0c7 100644 > > --- a/drivers/infiniband/sw/rxe/rxe.c > > +++ b/drivers/infiniband/sw/rxe/rxe.c > > @@ -210,10 +210,16 @@ static int __init rxe_module_init(void) > > { > > int err; > > > > - err = rxe_net_init(); > > + err = rxe_alloc_wq(); > > if (err) > > return err; > > > > + err = rxe_net_init(); > > + if (err) { > > + rxe_destroy_wq(); > > + return err; > > + } > > + > > rdma_link_register(_link_ops); > > pr_info("loaded\n"); > > return 0; > > @@ -224,6 +230,7 @@ static void __exit rxe_module_exit(void) > > rdma_link_unregister(_link_ops); > > ib_unregister_driver(RDMA_DRIVER_RXE); > > rxe_net_exit(); > > + rxe_destroy_wq(); > > > > pr_info("unloaded\n"); > > } > > diff --git a/drivers/infiniband/sw/rxe/rxe_comp.c > > b/drivers/infiniband/sw/rxe/rxe_comp.c > > index 20737fec392b..046bbacce37c 100644 > > --- a/drivers/infiniband/sw/rxe/rxe_comp.c > > +++ b/drivers/infiniband/sw/rxe/rxe_comp.c > > @@ -773,7 +773,7 @@ int rxe_completer(void *arg) > > } > > > > /* A non-zero return value will cause rxe_do_task to > > -* exit its loop and end the tasklet. A zero return > > +* exit its loop and end the work item. A zero return > > * will continue looping and return to rxe_completer > > */ > > done: > > diff --git a/drivers/infiniband/sw/rxe/rxe_param.h > > b/drivers/infiniband/sw/rxe/rxe_param.h > > index a754fc902e3d..bd8050e99d6b 100644 > > --- a/drivers/infiniband/sw/rxe/rxe_param.h > > +++ b/drivers/infiniband/sw/rxe/rxe_param.h > > @@ -112,7 +112,7 @@ enum rxe_device_param { > > RXE_INFLIGHT_SKBS_PER_QP_HIGH = 64, > > RXE_INFLIGHT_SKBS_PER_QP_LOW= 16, > > > > - /* Max number of interations of each tasklet > > + /* Max number of interations of each work item > > * before yielding the cpu to let other > > * work make progress > > */ > > diff --git a/drivers/infiniband/sw/rxe/rxe_qp.c > > b/drivers/infiniband/sw/rxe/rxe_qp.c > > index ab72db68b58f..e033b2449dfe 100644 > > --- a/drivers/infiniband/sw/rxe/rxe_qp.c > > +++ b/drivers/infiniband/sw/rxe/rxe_qp.c > > @@ -471,7 +471,7 @@ int rxe_qp_chk_attr(struct rxe_dev *rxe, struct rxe_qp > > *qp, > > /* move the qp to the reset state */ > > static void rxe_qp_reset(struct rxe_qp *qp) > > { > > - /* stop tasks from running */ > > + /* flush workqueue and stop tasks from running */ > > rxe_disable_task(>resp.task); > > > > /* stop request/comp */ > > diff --git a/drivers/infiniband/sw/rxe/rxe_req.c > > b/drivers/infiniband/sw/rxe/rxe_req.c > > index 899c8779f800..2bcd287a2c3b 100644 > > --- a/drivers/infiniband/sw/rxe/rxe_req.c > > +++ b/drivers/infiniband/sw/rxe/rxe_req.c > > @@ -830,7 +830,7 @@ int rxe_requester(void *arg) > > update_state(qp, ); >
Re: [PATCH for-next v3 1/7] RDMA/rxe: Convert triple tasklets to use workqueue
On Wed, Dec 28, 2022 at 10:56:11AM -0600, Bob Pearson wrote: > On 12/23/22 00:51, Daisuke Matsuda wrote: > > In order to implement On-Demand Paging on the rxe driver, triple tasklets > > (requester, responder, and completer) must be allowed to sleep so that they > > can trigger page fault when pages being accessed are not present. > > > > This patch replaces the tasklets with a workqueue, but still allows direct- > > call of works from softirq context if it is obvious that MRs are not going > > to be accessed and there is no work being processed in the workqueue. > > There are already at least two patch sets that do this waiting to get upstream I don't see any unhandled RXE series, except of this one patch [1], which is one out larger series. [1] https://patchwork.kernel.org/project/linux-rdma/patch/20221029031331.64518-1-rpearson...@gmail.com/ > Bob
Re: [PATCH for-next v3 1/7] RDMA/rxe: Convert triple tasklets to use workqueue
On Thu, Dec 29, 2022 at 12:56 AM Bob Pearson wrote: > > On 12/23/22 00:51, Daisuke Matsuda wrote: > > In order to implement On-Demand Paging on the rxe driver, triple tasklets > > (requester, responder, and completer) must be allowed to sleep so that they > > can trigger page fault when pages being accessed are not present. > > > > This patch replaces the tasklets with a workqueue, but still allows direct- > > call of works from softirq context if it is obvious that MRs are not going > > to be accessed and there is no work being processed in the workqueue. > > There are already at least two patch sets that do this waiting to get > upstream. RXE accepted several patch sets just now. So it needs time to make tests and check bugs. Zhu Yanjun > Bob > > > > > As counterparts to tasklet_disable() and tasklet_enable() are missing for > > workqueues, an atomic value is introduced to prevent work items from being > > scheduled while qp reset is in progress. > > > > The way to initialize/destroy workqueue is picked up from the > > implementation of Ian Ziemba and Bob Pearson at HPE. > > > > Link: > > https://lore.kernel.org/all/20221018043345.4033-1-rpearson...@gmail.com/ > > Signed-off-by: Daisuke Matsuda > > --- > > drivers/infiniband/sw/rxe/rxe.c | 9 - > > drivers/infiniband/sw/rxe/rxe_comp.c | 2 +- > > drivers/infiniband/sw/rxe/rxe_param.h | 2 +- > > drivers/infiniband/sw/rxe/rxe_qp.c| 2 +- > > drivers/infiniband/sw/rxe/rxe_req.c | 2 +- > > drivers/infiniband/sw/rxe/rxe_resp.c | 2 +- > > drivers/infiniband/sw/rxe/rxe_task.c | 52 --- > > drivers/infiniband/sw/rxe/rxe_task.h | 15 ++-- > > 8 files changed, 65 insertions(+), 21 deletions(-) > > > > diff --git a/drivers/infiniband/sw/rxe/rxe.c > > b/drivers/infiniband/sw/rxe/rxe.c > > index 136c2efe3466..3c7e42e5b0c7 100644 > > --- a/drivers/infiniband/sw/rxe/rxe.c > > +++ b/drivers/infiniband/sw/rxe/rxe.c > > @@ -210,10 +210,16 @@ static int __init rxe_module_init(void) > > { > > int err; > > > > - err = rxe_net_init(); > > + err = rxe_alloc_wq(); > > if (err) > > return err; > > > > + err = rxe_net_init(); > > + if (err) { > > + rxe_destroy_wq(); > > + return err; > > + } > > + > > rdma_link_register(_link_ops); > > pr_info("loaded\n"); > > return 0; > > @@ -224,6 +230,7 @@ static void __exit rxe_module_exit(void) > > rdma_link_unregister(_link_ops); > > ib_unregister_driver(RDMA_DRIVER_RXE); > > rxe_net_exit(); > > + rxe_destroy_wq(); > > > > pr_info("unloaded\n"); > > } > > diff --git a/drivers/infiniband/sw/rxe/rxe_comp.c > > b/drivers/infiniband/sw/rxe/rxe_comp.c > > index 20737fec392b..046bbacce37c 100644 > > --- a/drivers/infiniband/sw/rxe/rxe_comp.c > > +++ b/drivers/infiniband/sw/rxe/rxe_comp.c > > @@ -773,7 +773,7 @@ int rxe_completer(void *arg) > > } > > > > /* A non-zero return value will cause rxe_do_task to > > - * exit its loop and end the tasklet. A zero return > > + * exit its loop and end the work item. A zero return > >* will continue looping and return to rxe_completer > >*/ > > done: > > diff --git a/drivers/infiniband/sw/rxe/rxe_param.h > > b/drivers/infiniband/sw/rxe/rxe_param.h > > index a754fc902e3d..bd8050e99d6b 100644 > > --- a/drivers/infiniband/sw/rxe/rxe_param.h > > +++ b/drivers/infiniband/sw/rxe/rxe_param.h > > @@ -112,7 +112,7 @@ enum rxe_device_param { > > RXE_INFLIGHT_SKBS_PER_QP_HIGH = 64, > > RXE_INFLIGHT_SKBS_PER_QP_LOW= 16, > > > > - /* Max number of interations of each tasklet > > + /* Max number of interations of each work item > >* before yielding the cpu to let other > >* work make progress > >*/ > > diff --git a/drivers/infiniband/sw/rxe/rxe_qp.c > > b/drivers/infiniband/sw/rxe/rxe_qp.c > > index ab72db68b58f..e033b2449dfe 100644 > > --- a/drivers/infiniband/sw/rxe/rxe_qp.c > > +++ b/drivers/infiniband/sw/rxe/rxe_qp.c > > @@ -471,7 +471,7 @@ int rxe_qp_chk_attr(struct rxe_dev *rxe, struct rxe_qp > > *qp, > > /* move the qp to the reset state */ > > static void rxe_qp_reset(struct rxe_qp *qp) > > { > > - /* stop tasks from running */ > > + /* flush workqueue and stop tasks from running */ > > rxe_disable_task(>resp.task); > > > > /* stop request/comp */ > > diff --git a/drivers/infiniband/sw/rxe/rxe_req.c > > b/drivers/infiniband/sw/rxe/rxe_req.c > > index 899c8779f800..2bcd287a2c3b 100644 > > --- a/drivers/infiniband/sw/rxe/rxe_req.c > > +++ b/drivers/infiniband/sw/rxe/rxe_req.c > > @@ -830,7 +830,7 @@ int rxe_requester(void *arg) > > update_state(qp, ); > > > > /* A non-zero return value will cause rxe_do_task to > > - * exit its loop and end the tasklet. A zero return > > + * exit its loop and end the work item. A zero return > >* will continue looping and return to rxe_requester > >
Re: [PATCH for-next v3 1/7] RDMA/rxe: Convert triple tasklets to use workqueue
On 12/23/22 00:51, Daisuke Matsuda wrote: > In order to implement On-Demand Paging on the rxe driver, triple tasklets > (requester, responder, and completer) must be allowed to sleep so that they > can trigger page fault when pages being accessed are not present. > > This patch replaces the tasklets with a workqueue, but still allows direct- > call of works from softirq context if it is obvious that MRs are not going > to be accessed and there is no work being processed in the workqueue. There are already at least two patch sets that do this waiting to get upstream. Bob > > As counterparts to tasklet_disable() and tasklet_enable() are missing for > workqueues, an atomic value is introduced to prevent work items from being > scheduled while qp reset is in progress. > > The way to initialize/destroy workqueue is picked up from the > implementation of Ian Ziemba and Bob Pearson at HPE. > > Link: https://lore.kernel.org/all/20221018043345.4033-1-rpearson...@gmail.com/ > Signed-off-by: Daisuke Matsuda > --- > drivers/infiniband/sw/rxe/rxe.c | 9 - > drivers/infiniband/sw/rxe/rxe_comp.c | 2 +- > drivers/infiniband/sw/rxe/rxe_param.h | 2 +- > drivers/infiniband/sw/rxe/rxe_qp.c| 2 +- > drivers/infiniband/sw/rxe/rxe_req.c | 2 +- > drivers/infiniband/sw/rxe/rxe_resp.c | 2 +- > drivers/infiniband/sw/rxe/rxe_task.c | 52 --- > drivers/infiniband/sw/rxe/rxe_task.h | 15 ++-- > 8 files changed, 65 insertions(+), 21 deletions(-) > > diff --git a/drivers/infiniband/sw/rxe/rxe.c b/drivers/infiniband/sw/rxe/rxe.c > index 136c2efe3466..3c7e42e5b0c7 100644 > --- a/drivers/infiniband/sw/rxe/rxe.c > +++ b/drivers/infiniband/sw/rxe/rxe.c > @@ -210,10 +210,16 @@ static int __init rxe_module_init(void) > { > int err; > > - err = rxe_net_init(); > + err = rxe_alloc_wq(); > if (err) > return err; > > + err = rxe_net_init(); > + if (err) { > + rxe_destroy_wq(); > + return err; > + } > + > rdma_link_register(_link_ops); > pr_info("loaded\n"); > return 0; > @@ -224,6 +230,7 @@ static void __exit rxe_module_exit(void) > rdma_link_unregister(_link_ops); > ib_unregister_driver(RDMA_DRIVER_RXE); > rxe_net_exit(); > + rxe_destroy_wq(); > > pr_info("unloaded\n"); > } > diff --git a/drivers/infiniband/sw/rxe/rxe_comp.c > b/drivers/infiniband/sw/rxe/rxe_comp.c > index 20737fec392b..046bbacce37c 100644 > --- a/drivers/infiniband/sw/rxe/rxe_comp.c > +++ b/drivers/infiniband/sw/rxe/rxe_comp.c > @@ -773,7 +773,7 @@ int rxe_completer(void *arg) > } > > /* A non-zero return value will cause rxe_do_task to > - * exit its loop and end the tasklet. A zero return > + * exit its loop and end the work item. A zero return >* will continue looping and return to rxe_completer >*/ > done: > diff --git a/drivers/infiniband/sw/rxe/rxe_param.h > b/drivers/infiniband/sw/rxe/rxe_param.h > index a754fc902e3d..bd8050e99d6b 100644 > --- a/drivers/infiniband/sw/rxe/rxe_param.h > +++ b/drivers/infiniband/sw/rxe/rxe_param.h > @@ -112,7 +112,7 @@ enum rxe_device_param { > RXE_INFLIGHT_SKBS_PER_QP_HIGH = 64, > RXE_INFLIGHT_SKBS_PER_QP_LOW= 16, > > - /* Max number of interations of each tasklet > + /* Max number of interations of each work item >* before yielding the cpu to let other >* work make progress >*/ > diff --git a/drivers/infiniband/sw/rxe/rxe_qp.c > b/drivers/infiniband/sw/rxe/rxe_qp.c > index ab72db68b58f..e033b2449dfe 100644 > --- a/drivers/infiniband/sw/rxe/rxe_qp.c > +++ b/drivers/infiniband/sw/rxe/rxe_qp.c > @@ -471,7 +471,7 @@ int rxe_qp_chk_attr(struct rxe_dev *rxe, struct rxe_qp > *qp, > /* move the qp to the reset state */ > static void rxe_qp_reset(struct rxe_qp *qp) > { > - /* stop tasks from running */ > + /* flush workqueue and stop tasks from running */ > rxe_disable_task(>resp.task); > > /* stop request/comp */ > diff --git a/drivers/infiniband/sw/rxe/rxe_req.c > b/drivers/infiniband/sw/rxe/rxe_req.c > index 899c8779f800..2bcd287a2c3b 100644 > --- a/drivers/infiniband/sw/rxe/rxe_req.c > +++ b/drivers/infiniband/sw/rxe/rxe_req.c > @@ -830,7 +830,7 @@ int rxe_requester(void *arg) > update_state(qp, ); > > /* A non-zero return value will cause rxe_do_task to > - * exit its loop and end the tasklet. A zero return > + * exit its loop and end the work item. A zero return >* will continue looping and return to rxe_requester >*/ > done: > diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c > b/drivers/infiniband/sw/rxe/rxe_resp.c > index c74972244f08..d9134a00a529 100644 > --- a/drivers/infiniband/sw/rxe/rxe_resp.c > +++ b/drivers/infiniband/sw/rxe/rxe_resp.c > @@ -1691,7 +1691,7 @@ int rxe_responder(void *arg) > } > > /* A non-zero return value will cause rxe_do_task to > - * exit its
Re: [PATCH for-next v3 1/7] RDMA/rxe: Convert triple tasklets to use workqueue
On 23 Dec 2022 15:51:52 +0900 Daisuke Matsuda > @@ -137,15 +153,27 @@ void rxe_sched_task(struct rxe_task *task) > if (task->destroyed) > return; > > - tasklet_schedule(>tasklet); > + /* > + * busy-loop while qp reset is in progress. > + * This may be called from softirq context and thus cannot sleep. > + */ > + while (atomic_read(>suspended)) > + cpu_relax(); > + > + queue_work(task->workq, >work); > } This busy wait particularly in softirq barely makes sense given the flush_workqueue() below. > > void rxe_disable_task(struct rxe_task *task) > { > - tasklet_disable(>tasklet); > + /* Alternative to tasklet_disable() */ > + atomic_inc(>suspended); > + smp_mb__after_atomic(); > + flush_workqueue(task->workq); > } > > void rxe_enable_task(struct rxe_task *task) > { > - tasklet_enable(>tasklet); > + /* Alternative to tasklet_enable() */ > + smp_mb__before_atomic(); > + atomic_dec(>suspended); > } Feel free to add one-line comment for why smp_mb is needed in both cases.