RE: [PATCH for-next v3 1/7] RDMA/rxe: Convert triple tasklets to use workqueue

2023-01-05 Thread Daisuke Matsuda (Fujitsu)
On Thu, Dec 29, 2022 1:56 AM Bob Pearson wrote:
> 
> On 12/23/22 00:51, Daisuke Matsuda wrote:
> > In order to implement On-Demand Paging on the rxe driver, triple tasklets
> > (requester, responder, and completer) must be allowed to sleep so that they
> > can trigger page fault when pages being accessed are not present.
> >
> > This patch replaces the tasklets with a workqueue, but still allows direct-
> > call of works from softirq context if it is obvious that MRs are not going
> > to be accessed and there is no work being processed in the workqueue.
> 
> There are already at least two patch sets that do this waiting to get 
> upstream.
> Bob

I wrote my intention at the first part of the cover letter.
Cf. 
https://lore.kernel.org/lkml/cover.1671772917.git.matsuda-dais...@fujitsu.com/

Your patch set introduces a soft lockup issue. It would take much more time to 
find
the root cause than to simply convert the tasklets to a workqueue with this 
patch.
My ODP patches have been stuck for almost 4 months because of this issue, and I 
am
not willing to wait any longer.

Daisuke

> 
> >
> > As counterparts to tasklet_disable() and tasklet_enable() are missing for
> > workqueues, an atomic value is introduced to prevent work items from being
> > scheduled while qp reset is in progress.
> >
> > The way to initialize/destroy workqueue is picked up from the
> > implementation of Ian Ziemba and Bob Pearson at HPE.
> >
> > Link: 
> > https://lore.kernel.org/all/20221018043345.4033-1-rpearson...@gmail.com/
> > Signed-off-by: Daisuke Matsuda 
> > ---
> >  drivers/infiniband/sw/rxe/rxe.c   |  9 -
> >  drivers/infiniband/sw/rxe/rxe_comp.c  |  2 +-
> >  drivers/infiniband/sw/rxe/rxe_param.h |  2 +-
> >  drivers/infiniband/sw/rxe/rxe_qp.c|  2 +-
> >  drivers/infiniband/sw/rxe/rxe_req.c   |  2 +-
> >  drivers/infiniband/sw/rxe/rxe_resp.c  |  2 +-
> >  drivers/infiniband/sw/rxe/rxe_task.c  | 52 ---
> >  drivers/infiniband/sw/rxe/rxe_task.h  | 15 ++--
> >  8 files changed, 65 insertions(+), 21 deletions(-)
> >
> > diff --git a/drivers/infiniband/sw/rxe/rxe.c 
> > b/drivers/infiniband/sw/rxe/rxe.c
> > index 136c2efe3466..3c7e42e5b0c7 100644
> > --- a/drivers/infiniband/sw/rxe/rxe.c
> > +++ b/drivers/infiniband/sw/rxe/rxe.c
> > @@ -210,10 +210,16 @@ static int __init rxe_module_init(void)
> >  {
> > int err;
> >
> > -   err = rxe_net_init();
> > +   err = rxe_alloc_wq();
> > if (err)
> > return err;
> >
> > +   err = rxe_net_init();
> > +   if (err) {
> > +   rxe_destroy_wq();
> > +   return err;
> > +   }
> > +
> > rdma_link_register(_link_ops);
> > pr_info("loaded\n");
> > return 0;
> > @@ -224,6 +230,7 @@ static void __exit rxe_module_exit(void)
> > rdma_link_unregister(_link_ops);
> > ib_unregister_driver(RDMA_DRIVER_RXE);
> > rxe_net_exit();
> > +   rxe_destroy_wq();
> >
> > pr_info("unloaded\n");
> >  }
> > diff --git a/drivers/infiniband/sw/rxe/rxe_comp.c 
> > b/drivers/infiniband/sw/rxe/rxe_comp.c
> > index 20737fec392b..046bbacce37c 100644
> > --- a/drivers/infiniband/sw/rxe/rxe_comp.c
> > +++ b/drivers/infiniband/sw/rxe/rxe_comp.c
> > @@ -773,7 +773,7 @@ int rxe_completer(void *arg)
> > }
> >
> > /* A non-zero return value will cause rxe_do_task to
> > -* exit its loop and end the tasklet. A zero return
> > +* exit its loop and end the work item. A zero return
> >  * will continue looping and return to rxe_completer
> >  */
> >  done:
> > diff --git a/drivers/infiniband/sw/rxe/rxe_param.h 
> > b/drivers/infiniband/sw/rxe/rxe_param.h
> > index a754fc902e3d..bd8050e99d6b 100644
> > --- a/drivers/infiniband/sw/rxe/rxe_param.h
> > +++ b/drivers/infiniband/sw/rxe/rxe_param.h
> > @@ -112,7 +112,7 @@ enum rxe_device_param {
> > RXE_INFLIGHT_SKBS_PER_QP_HIGH   = 64,
> > RXE_INFLIGHT_SKBS_PER_QP_LOW= 16,
> >
> > -   /* Max number of interations of each tasklet
> > +   /* Max number of interations of each work item
> >  * before yielding the cpu to let other
> >  * work make progress
> >  */
> > diff --git a/drivers/infiniband/sw/rxe/rxe_qp.c 
> > b/drivers/infiniband/sw/rxe/rxe_qp.c
> > index ab72db68b58f..e033b2449dfe 100644
> > --- a/drivers/infiniband/sw/rxe/rxe_qp.c
> > +++ b/drivers/infiniband/sw/rxe/rxe_qp.c
> > @@ -471,7 +471,7 @@ int rxe_qp_chk_attr(struct rxe_dev *rxe, struct rxe_qp 
> > *qp,
> >  /* move the qp to the reset state */
> >  static void rxe_qp_reset(struct rxe_qp *qp)
> >  {
> > -   /* stop tasks from running */
> > +   /* flush workqueue and stop tasks from running */
> > rxe_disable_task(>resp.task);
> >
> > /* stop request/comp */
> > diff --git a/drivers/infiniband/sw/rxe/rxe_req.c 
> > b/drivers/infiniband/sw/rxe/rxe_req.c
> > index 899c8779f800..2bcd287a2c3b 100644
> > --- a/drivers/infiniband/sw/rxe/rxe_req.c
> > +++ b/drivers/infiniband/sw/rxe/rxe_req.c
> > @@ -830,7 +830,7 @@ int rxe_requester(void *arg)
> > update_state(qp, );
> 

Re: [PATCH for-next v3 1/7] RDMA/rxe: Convert triple tasklets to use workqueue

2022-12-28 Thread Leon Romanovsky
On Wed, Dec 28, 2022 at 10:56:11AM -0600, Bob Pearson wrote:
> On 12/23/22 00:51, Daisuke Matsuda wrote:
> > In order to implement On-Demand Paging on the rxe driver, triple tasklets
> > (requester, responder, and completer) must be allowed to sleep so that they
> > can trigger page fault when pages being accessed are not present.
> > 
> > This patch replaces the tasklets with a workqueue, but still allows direct-
> > call of works from softirq context if it is obvious that MRs are not going
> > to be accessed and there is no work being processed in the workqueue.
> 
> There are already at least two patch sets that do this waiting to get upstream

I don't see any unhandled RXE series, except of this one patch [1],
which is one out larger series.

[1] 
https://patchwork.kernel.org/project/linux-rdma/patch/20221029031331.64518-1-rpearson...@gmail.com/

> Bob



Re: [PATCH for-next v3 1/7] RDMA/rxe: Convert triple tasklets to use workqueue

2022-12-28 Thread Zhu Yanjun
On Thu, Dec 29, 2022 at 12:56 AM Bob Pearson  wrote:
>
> On 12/23/22 00:51, Daisuke Matsuda wrote:
> > In order to implement On-Demand Paging on the rxe driver, triple tasklets
> > (requester, responder, and completer) must be allowed to sleep so that they
> > can trigger page fault when pages being accessed are not present.
> >
> > This patch replaces the tasklets with a workqueue, but still allows direct-
> > call of works from softirq context if it is obvious that MRs are not going
> > to be accessed and there is no work being processed in the workqueue.
>
> There are already at least two patch sets that do this waiting to get 
> upstream.

RXE accepted several patch sets just now. So it needs time to make
tests and check bugs.

Zhu Yanjun

> Bob
>
> >
> > As counterparts to tasklet_disable() and tasklet_enable() are missing for
> > workqueues, an atomic value is introduced to prevent work items from being
> > scheduled while qp reset is in progress.
> >
> > The way to initialize/destroy workqueue is picked up from the
> > implementation of Ian Ziemba and Bob Pearson at HPE.
> >
> > Link: 
> > https://lore.kernel.org/all/20221018043345.4033-1-rpearson...@gmail.com/
> > Signed-off-by: Daisuke Matsuda 
> > ---
> >  drivers/infiniband/sw/rxe/rxe.c   |  9 -
> >  drivers/infiniband/sw/rxe/rxe_comp.c  |  2 +-
> >  drivers/infiniband/sw/rxe/rxe_param.h |  2 +-
> >  drivers/infiniband/sw/rxe/rxe_qp.c|  2 +-
> >  drivers/infiniband/sw/rxe/rxe_req.c   |  2 +-
> >  drivers/infiniband/sw/rxe/rxe_resp.c  |  2 +-
> >  drivers/infiniband/sw/rxe/rxe_task.c  | 52 ---
> >  drivers/infiniband/sw/rxe/rxe_task.h  | 15 ++--
> >  8 files changed, 65 insertions(+), 21 deletions(-)
> >
> > diff --git a/drivers/infiniband/sw/rxe/rxe.c 
> > b/drivers/infiniband/sw/rxe/rxe.c
> > index 136c2efe3466..3c7e42e5b0c7 100644
> > --- a/drivers/infiniband/sw/rxe/rxe.c
> > +++ b/drivers/infiniband/sw/rxe/rxe.c
> > @@ -210,10 +210,16 @@ static int __init rxe_module_init(void)
> >  {
> >   int err;
> >
> > - err = rxe_net_init();
> > + err = rxe_alloc_wq();
> >   if (err)
> >   return err;
> >
> > + err = rxe_net_init();
> > + if (err) {
> > + rxe_destroy_wq();
> > + return err;
> > + }
> > +
> >   rdma_link_register(_link_ops);
> >   pr_info("loaded\n");
> >   return 0;
> > @@ -224,6 +230,7 @@ static void __exit rxe_module_exit(void)
> >   rdma_link_unregister(_link_ops);
> >   ib_unregister_driver(RDMA_DRIVER_RXE);
> >   rxe_net_exit();
> > + rxe_destroy_wq();
> >
> >   pr_info("unloaded\n");
> >  }
> > diff --git a/drivers/infiniband/sw/rxe/rxe_comp.c 
> > b/drivers/infiniband/sw/rxe/rxe_comp.c
> > index 20737fec392b..046bbacce37c 100644
> > --- a/drivers/infiniband/sw/rxe/rxe_comp.c
> > +++ b/drivers/infiniband/sw/rxe/rxe_comp.c
> > @@ -773,7 +773,7 @@ int rxe_completer(void *arg)
> >   }
> >
> >   /* A non-zero return value will cause rxe_do_task to
> > -  * exit its loop and end the tasklet. A zero return
> > +  * exit its loop and end the work item. A zero return
> >* will continue looping and return to rxe_completer
> >*/
> >  done:
> > diff --git a/drivers/infiniband/sw/rxe/rxe_param.h 
> > b/drivers/infiniband/sw/rxe/rxe_param.h
> > index a754fc902e3d..bd8050e99d6b 100644
> > --- a/drivers/infiniband/sw/rxe/rxe_param.h
> > +++ b/drivers/infiniband/sw/rxe/rxe_param.h
> > @@ -112,7 +112,7 @@ enum rxe_device_param {
> >   RXE_INFLIGHT_SKBS_PER_QP_HIGH   = 64,
> >   RXE_INFLIGHT_SKBS_PER_QP_LOW= 16,
> >
> > - /* Max number of interations of each tasklet
> > + /* Max number of interations of each work item
> >* before yielding the cpu to let other
> >* work make progress
> >*/
> > diff --git a/drivers/infiniband/sw/rxe/rxe_qp.c 
> > b/drivers/infiniband/sw/rxe/rxe_qp.c
> > index ab72db68b58f..e033b2449dfe 100644
> > --- a/drivers/infiniband/sw/rxe/rxe_qp.c
> > +++ b/drivers/infiniband/sw/rxe/rxe_qp.c
> > @@ -471,7 +471,7 @@ int rxe_qp_chk_attr(struct rxe_dev *rxe, struct rxe_qp 
> > *qp,
> >  /* move the qp to the reset state */
> >  static void rxe_qp_reset(struct rxe_qp *qp)
> >  {
> > - /* stop tasks from running */
> > + /* flush workqueue and stop tasks from running */
> >   rxe_disable_task(>resp.task);
> >
> >   /* stop request/comp */
> > diff --git a/drivers/infiniband/sw/rxe/rxe_req.c 
> > b/drivers/infiniband/sw/rxe/rxe_req.c
> > index 899c8779f800..2bcd287a2c3b 100644
> > --- a/drivers/infiniband/sw/rxe/rxe_req.c
> > +++ b/drivers/infiniband/sw/rxe/rxe_req.c
> > @@ -830,7 +830,7 @@ int rxe_requester(void *arg)
> >   update_state(qp, );
> >
> >   /* A non-zero return value will cause rxe_do_task to
> > -  * exit its loop and end the tasklet. A zero return
> > +  * exit its loop and end the work item. A zero return
> >* will continue looping and return to rxe_requester
> >

Re: [PATCH for-next v3 1/7] RDMA/rxe: Convert triple tasklets to use workqueue

2022-12-28 Thread Bob Pearson
On 12/23/22 00:51, Daisuke Matsuda wrote:
> In order to implement On-Demand Paging on the rxe driver, triple tasklets
> (requester, responder, and completer) must be allowed to sleep so that they
> can trigger page fault when pages being accessed are not present.
> 
> This patch replaces the tasklets with a workqueue, but still allows direct-
> call of works from softirq context if it is obvious that MRs are not going
> to be accessed and there is no work being processed in the workqueue.

There are already at least two patch sets that do this waiting to get upstream.
Bob

> 
> As counterparts to tasklet_disable() and tasklet_enable() are missing for
> workqueues, an atomic value is introduced to prevent work items from being
> scheduled while qp reset is in progress.
> 
> The way to initialize/destroy workqueue is picked up from the
> implementation of Ian Ziemba and Bob Pearson at HPE.
> 
> Link: https://lore.kernel.org/all/20221018043345.4033-1-rpearson...@gmail.com/
> Signed-off-by: Daisuke Matsuda 
> ---
>  drivers/infiniband/sw/rxe/rxe.c   |  9 -
>  drivers/infiniband/sw/rxe/rxe_comp.c  |  2 +-
>  drivers/infiniband/sw/rxe/rxe_param.h |  2 +-
>  drivers/infiniband/sw/rxe/rxe_qp.c|  2 +-
>  drivers/infiniband/sw/rxe/rxe_req.c   |  2 +-
>  drivers/infiniband/sw/rxe/rxe_resp.c  |  2 +-
>  drivers/infiniband/sw/rxe/rxe_task.c  | 52 ---
>  drivers/infiniband/sw/rxe/rxe_task.h  | 15 ++--
>  8 files changed, 65 insertions(+), 21 deletions(-)
> 
> diff --git a/drivers/infiniband/sw/rxe/rxe.c b/drivers/infiniband/sw/rxe/rxe.c
> index 136c2efe3466..3c7e42e5b0c7 100644
> --- a/drivers/infiniband/sw/rxe/rxe.c
> +++ b/drivers/infiniband/sw/rxe/rxe.c
> @@ -210,10 +210,16 @@ static int __init rxe_module_init(void)
>  {
>   int err;
>  
> - err = rxe_net_init();
> + err = rxe_alloc_wq();
>   if (err)
>   return err;
>  
> + err = rxe_net_init();
> + if (err) {
> + rxe_destroy_wq();
> + return err;
> + }
> +
>   rdma_link_register(_link_ops);
>   pr_info("loaded\n");
>   return 0;
> @@ -224,6 +230,7 @@ static void __exit rxe_module_exit(void)
>   rdma_link_unregister(_link_ops);
>   ib_unregister_driver(RDMA_DRIVER_RXE);
>   rxe_net_exit();
> + rxe_destroy_wq();
>  
>   pr_info("unloaded\n");
>  }
> diff --git a/drivers/infiniband/sw/rxe/rxe_comp.c 
> b/drivers/infiniband/sw/rxe/rxe_comp.c
> index 20737fec392b..046bbacce37c 100644
> --- a/drivers/infiniband/sw/rxe/rxe_comp.c
> +++ b/drivers/infiniband/sw/rxe/rxe_comp.c
> @@ -773,7 +773,7 @@ int rxe_completer(void *arg)
>   }
>  
>   /* A non-zero return value will cause rxe_do_task to
> -  * exit its loop and end the tasklet. A zero return
> +  * exit its loop and end the work item. A zero return
>* will continue looping and return to rxe_completer
>*/
>  done:
> diff --git a/drivers/infiniband/sw/rxe/rxe_param.h 
> b/drivers/infiniband/sw/rxe/rxe_param.h
> index a754fc902e3d..bd8050e99d6b 100644
> --- a/drivers/infiniband/sw/rxe/rxe_param.h
> +++ b/drivers/infiniband/sw/rxe/rxe_param.h
> @@ -112,7 +112,7 @@ enum rxe_device_param {
>   RXE_INFLIGHT_SKBS_PER_QP_HIGH   = 64,
>   RXE_INFLIGHT_SKBS_PER_QP_LOW= 16,
>  
> - /* Max number of interations of each tasklet
> + /* Max number of interations of each work item
>* before yielding the cpu to let other
>* work make progress
>*/
> diff --git a/drivers/infiniband/sw/rxe/rxe_qp.c 
> b/drivers/infiniband/sw/rxe/rxe_qp.c
> index ab72db68b58f..e033b2449dfe 100644
> --- a/drivers/infiniband/sw/rxe/rxe_qp.c
> +++ b/drivers/infiniband/sw/rxe/rxe_qp.c
> @@ -471,7 +471,7 @@ int rxe_qp_chk_attr(struct rxe_dev *rxe, struct rxe_qp 
> *qp,
>  /* move the qp to the reset state */
>  static void rxe_qp_reset(struct rxe_qp *qp)
>  {
> - /* stop tasks from running */
> + /* flush workqueue and stop tasks from running */
>   rxe_disable_task(>resp.task);
>  
>   /* stop request/comp */
> diff --git a/drivers/infiniband/sw/rxe/rxe_req.c 
> b/drivers/infiniband/sw/rxe/rxe_req.c
> index 899c8779f800..2bcd287a2c3b 100644
> --- a/drivers/infiniband/sw/rxe/rxe_req.c
> +++ b/drivers/infiniband/sw/rxe/rxe_req.c
> @@ -830,7 +830,7 @@ int rxe_requester(void *arg)
>   update_state(qp, );
>  
>   /* A non-zero return value will cause rxe_do_task to
> -  * exit its loop and end the tasklet. A zero return
> +  * exit its loop and end the work item. A zero return
>* will continue looping and return to rxe_requester
>*/
>  done:
> diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c 
> b/drivers/infiniband/sw/rxe/rxe_resp.c
> index c74972244f08..d9134a00a529 100644
> --- a/drivers/infiniband/sw/rxe/rxe_resp.c
> +++ b/drivers/infiniband/sw/rxe/rxe_resp.c
> @@ -1691,7 +1691,7 @@ int rxe_responder(void *arg)
>   }
>  
>   /* A non-zero return value will cause rxe_do_task to
> -  * exit its 

Re: [PATCH for-next v3 1/7] RDMA/rxe: Convert triple tasklets to use workqueue

2022-12-23 Thread Hillf Danton
On 23 Dec 2022 15:51:52 +0900 Daisuke Matsuda 
> @@ -137,15 +153,27 @@ void rxe_sched_task(struct rxe_task *task)
>   if (task->destroyed)
>   return;
>  
> - tasklet_schedule(>tasklet);
> + /*
> +  * busy-loop while qp reset is in progress.
> +  * This may be called from softirq context and thus cannot sleep.
> +  */
> + while (atomic_read(>suspended))
> + cpu_relax();
> +
> + queue_work(task->workq, >work);
>  }

This busy wait particularly in softirq barely makes sense given the
flush_workqueue() below.
>  
>  void rxe_disable_task(struct rxe_task *task)
>  {
> - tasklet_disable(>tasklet);
> + /* Alternative to tasklet_disable() */
> + atomic_inc(>suspended);
> + smp_mb__after_atomic();
> + flush_workqueue(task->workq);
>  }
>  
>  void rxe_enable_task(struct rxe_task *task)
>  {
> - tasklet_enable(>tasklet);
> + /* Alternative to tasklet_enable() */
> + smp_mb__before_atomic();
> + atomic_dec(>suspended);
>  }

Feel free to add one-line comment for why smp_mb is needed in both
cases.