Re: SRCU: kworker hung in synchronize_srcu

2023-10-07 Thread zhuangel570
On Sun, Oct 1, 2023 at 10:27 AM Neeraj upadhyay wrote: > > On Sun, Oct 1, 2023 at 5:49 AM Joel Fernandes wrote: > > > > On Sat, Sep 30, 2023 at 6:01 AM Neeraj upadhyay > > wrote: > > > > > > On Fri, Sep 29, 2023 at 3:35 AM Joel Fernandes > > > wrote: > > > > > > > > Hello, > > > > Firstly, ku

Re: SRCU: kworker hung in synchronize_srcu

2023-10-07 Thread zhuangel570
On Fri, Sep 29, 2023 at 5:39 AM Joel Fernandes wrote: > > Hello, > Firstly, kudos to the detailed report and analysis. Rare failures are > hard and your usage crash/kdump is awesome to dig deeper into the > issue.. > > On Thu, Sep 28, 2023 at 3:59 AM zhuangel570 wrote: > > > > Hi, > > > > We enco

Re: SRCU: kworker hung in synchronize_srcu

2023-10-03 Thread Neeraj upadhyay
On Tue, Oct 3, 2023 at 5:52 PM Frederic Weisbecker wrote: > > On Mon, Oct 02, 2023 at 11:09:39PM +0200, Frederic Weisbecker wrote: > > > > spin_unlock_rcu_node(sdp); /* Interrupts remain disabled. */ > > > > WRITE_ONCE(ssp->srcu_sup->srcu_gp_start, jiffies); > > > > WRITE_

Re: SRCU: kworker hung in synchronize_srcu

2023-10-03 Thread Frederic Weisbecker
On Mon, Oct 02, 2023 at 11:09:39PM +0200, Frederic Weisbecker wrote: > > > spin_unlock_rcu_node(sdp); /* Interrupts remain disabled. */ > > > WRITE_ONCE(ssp->srcu_sup->srcu_gp_start, jiffies); > > > WRITE_ONCE(ssp->srcu_sup->srcu_n_exp_nodelay, 0); > > > @@ -1245,7 +1243,18

Re: SRCU: kworker hung in synchronize_srcu

2023-10-03 Thread Neeraj upadhyay
On Tue, Oct 3, 2023 at 4:16 AM Frederic Weisbecker wrote: > > On Mon, Oct 02, 2023 at 07:51:10AM +0530, Neeraj upadhyay wrote: > > > And if this works, can we then remove srcu_invoke_callbacks() > > > self-requeue? > > > If queued several times before it actually fires, it will catch the lates

Re: SRCU: kworker hung in synchronize_srcu

2023-10-03 Thread Neeraj upadhyay
On Tue, 3 Oct, 2023, 2:39 am Frederic Weisbecker, wrote: > > On Mon, Oct 02, 2023 at 06:52:27PM +0530, Neeraj upadhyay wrote: > > On Mon, Oct 2, 2023 at 4:11 PM Frederic Weisbecker > > wrote: > > > Also the role of the remaining advance in srcu_gp_start() is unclear to > > > me... > > > > >

Re: SRCU: kworker hung in synchronize_srcu

2023-10-02 Thread Frederic Weisbecker
On Mon, Oct 02, 2023 at 07:51:10AM +0530, Neeraj upadhyay wrote: > > And if this works, can we then remove srcu_invoke_callbacks() self-requeue? > > If queued several times before it actually fires, it will catch the latest > > grace period's end. And if queued while the callback runs, it will r

Re: SRCU: kworker hung in synchronize_srcu

2023-10-02 Thread Frederic Weisbecker
On Mon, Oct 02, 2023 at 06:52:27PM +0530, Neeraj upadhyay wrote: > On Mon, Oct 2, 2023 at 4:11 PM Frederic Weisbecker > wrote: > > Also the role of the remaining advance in srcu_gp_start() is unclear to > > me... > > > > As far as I understand, the advance call before accelerate is to make >

Re: SRCU: kworker hung in synchronize_srcu

2023-10-02 Thread Neeraj upadhyay
On Mon, Oct 2, 2023 at 4:11 PM Frederic Weisbecker wrote: > > On Mon, Oct 02, 2023 at 07:47:55AM +0530, Neeraj upadhyay wrote: > > On Mon, Oct 2, 2023 at 4:02 AM Frederic Weisbecker > > wrote: > > > > > > On Sun, Oct 01, 2023 at 07:57:14AM +0530, Neeraj upadhyay wrote: > > > > > > > > But "mo

Re: SRCU: kworker hung in synchronize_srcu

2023-10-02 Thread Frederic Weisbecker
On Mon, Oct 02, 2023 at 07:51:10AM +0530, Neeraj upadhyay wrote: > On Mon, Oct 2, 2023 at 4:10 AM Frederic Weisbecker > wrote: > > And if this works, can we then remove srcu_invoke_callbacks() self-requeue? > > If queued several times before it actually fires, it will catch the latest > > grace p

Re: SRCU: kworker hung in synchronize_srcu

2023-10-02 Thread Frederic Weisbecker
On Mon, Oct 02, 2023 at 07:47:55AM +0530, Neeraj upadhyay wrote: > On Mon, Oct 2, 2023 at 4:02 AM Frederic Weisbecker > wrote: > > > > On Sun, Oct 01, 2023 at 07:57:14AM +0530, Neeraj upadhyay wrote: > > > > > > But "more" only checks for CBs in DONE tail. The callbacks which have > > > been

Re: SRCU: kworker hung in synchronize_srcu

2023-10-01 Thread Neeraj upadhyay
On Mon, Oct 2, 2023 at 4:10 AM Frederic Weisbecker wrote: > > On Mon, Oct 02, 2023 at 12:32:41AM +0200, Frederic Weisbecker wrote: > > On Sun, Oct 01, 2023 at 07:57:14AM +0530, Neeraj upadhyay wrote: > > > > > > But "more" only checks for CBs in DONE tail. The callbacks which have > > > bee

Re: SRCU: kworker hung in synchronize_srcu

2023-10-01 Thread Neeraj upadhyay
On Mon, Oct 2, 2023 at 4:02 AM Frederic Weisbecker wrote: > > On Sun, Oct 01, 2023 at 07:57:14AM +0530, Neeraj upadhyay wrote: > > > > But "more" only checks for CBs in DONE tail. The callbacks which have been > > just > > accelerated are not advanced to DONE tail. > > > > Having said above, I

Re: SRCU: kworker hung in synchronize_srcu

2023-10-01 Thread Frederic Weisbecker
On Mon, Oct 02, 2023 at 12:32:41AM +0200, Frederic Weisbecker wrote: > On Sun, Oct 01, 2023 at 07:57:14AM +0530, Neeraj upadhyay wrote: > > > > But "more" only checks for CBs in DONE tail. The callbacks which have been > > just > > accelerated are not advanced to DONE tail. > > > > Having

Re: SRCU: kworker hung in synchronize_srcu

2023-10-01 Thread Frederic Weisbecker
On Sun, Oct 01, 2023 at 07:57:14AM +0530, Neeraj upadhyay wrote: > > But "more" only checks for CBs in DONE tail. The callbacks which have been > just > accelerated are not advanced to DONE tail. > > Having said above, I am still trying to figure out, which callbacks > are actually being poin

Re: SRCU: kworker hung in synchronize_srcu

2023-09-30 Thread Neeraj upadhyay
On Sun, Oct 1, 2023 at 5:49 AM Joel Fernandes wrote: > > On Sat, Sep 30, 2023 at 6:01 AM Neeraj upadhyay > wrote: > > > > On Fri, Sep 29, 2023 at 3:35 AM Joel Fernandes > > wrote: > > > > > > Hello, > > > Firstly, kudos to the detailed report and analysis. Rare failures are > > > hard and your

Re: SRCU: kworker hung in synchronize_srcu

2023-09-30 Thread Joel Fernandes
On Sat, Sep 30, 2023 at 6:01 AM Neeraj upadhyay wrote: > > On Fri, Sep 29, 2023 at 3:35 AM Joel Fernandes wrote: > > > > Hello, > > Firstly, kudos to the detailed report and analysis. Rare failures are > > hard and your usage crash/kdump is awesome to dig deeper into the > > issue.. > > > > On Th

Re: SRCU: kworker hung in synchronize_srcu

2023-09-30 Thread Neeraj upadhyay
On Fri, Sep 29, 2023 at 3:04 AM zhuangel570 wrote: > > Hi, > > We encounter SRCU hung issue in stable tree 5.4.203, we are running VM create > and destroy concurrent test, the issue happens after several weeks. Now we > didn't have a way to reproduce this issue, the issue happens randomly, this >

Re: SRCU: kworker hung in synchronize_srcu

2023-09-30 Thread Neeraj upadhyay
On Fri, Sep 29, 2023 at 3:35 AM Joel Fernandes wrote: > > Hello, > Firstly, kudos to the detailed report and analysis. Rare failures are > hard and your usage crash/kdump is awesome to dig deeper into the > issue.. > > On Thu, Sep 28, 2023 at 3:59 AM zhuangel570 wrote: > > > > Hi, > > > > We enco

Re: SRCU: kworker hung in synchronize_srcu

2023-09-30 Thread Neeraj upadhyay
On Sat, Sep 30, 2023 at 2:40 PM Frederic Weisbecker wrote: > > On Sat, Sep 30, 2023 at 08:15:06AM +0530, Neeraj upadhyay wrote: > > On Sat, Sep 30, 2023 at 4:15 AM Frederic Weisbecker > > wrote: > > > > > > On Thu, Sep 28, 2023 at 05:39:17PM -0400, Joel Fernandes wrote: > > > > If srcu_inv

Re: SRCU: kworker hung in synchronize_srcu

2023-09-30 Thread Frederic Weisbecker
On Sat, Sep 30, 2023 at 08:15:06AM +0530, Neeraj upadhyay wrote: > On Sat, Sep 30, 2023 at 4:15 AM Frederic Weisbecker > wrote: > > > > On Thu, Sep 28, 2023 at 05:39:17PM -0400, Joel Fernandes wrote: > > > If srcu_invoke_callbacks() was really called for the rdp, I would have > > > expected

Re: SRCU: kworker hung in synchronize_srcu

2023-09-30 Thread Frederic Weisbecker
On Sat, Sep 30, 2023 at 08:15:06AM +0530, Neeraj upadhyay wrote: > On Sat, Sep 30, 2023 at 4:15 AM Frederic Weisbecker > wrote: > > > > On Thu, Sep 28, 2023 at 05:39:17PM -0400, Joel Fernandes wrote: > > > If srcu_invoke_callbacks() was really called for the rdp, I would have > > > expected rc

Re: SRCU: kworker hung in synchronize_srcu

2023-09-29 Thread Neeraj upadhyay
On Sat, Sep 30, 2023 at 4:15 AM Frederic Weisbecker wrote: > > On Thu, Sep 28, 2023 at 05:39:17PM -0400, Joel Fernandes wrote: > > If srcu_invoke_callbacks() was really called for the rdp, I would have > > expected rcu_segcblist_advance() to advance all those pending > > callbacks to 304. > > >

Re: SRCU: kworker hung in synchronize_srcu

2023-09-29 Thread Frederic Weisbecker
On Thu, Sep 28, 2023 at 05:39:17PM -0400, Joel Fernandes wrote: > If srcu_invoke_callbacks() was really called for the rdp, I would have > expected rcu_segcblist_advance() to advance all those pending > callbacks to 304. > > I posit that probably srcu_invoke_callbacks() is not even being called

Re: SRCU: kworker hung in synchronize_srcu

2023-09-28 Thread Zhouyi Zhou
On Fri, Sep 29, 2023 at 6:05 AM Joel Fernandes wrote: > > Hello, > Firstly, kudos to the detailed report and analysis. Rare failures are > hard and your usage crash/kdump is awesome to dig deeper into the > issue.. > > On Thu, Sep 28, 2023 at 3:59 AM zhuangel570 wrote: > > > > Hi, > > > > We enco

Re: SRCU: kworker hung in synchronize_srcu

2023-09-28 Thread Joel Fernandes
On Thu, Sep 28, 2023 at 5:39 PM Joel Fernandes wrote: > > Hello, > Firstly, kudos to the detailed report and analysis. Rare failures are > hard and your usage crash/kdump is awesome to dig deeper into the > issue.. > > On Thu, Sep 28, 2023 at 3:59 AM zhuangel570 wrote: > > > > Hi, > > > > We enco

Re: SRCU: kworker hung in synchronize_srcu

2023-09-28 Thread Joel Fernandes
Hello, Firstly, kudos for the detailed report and analysis. Rare failures are hard, and your use of crash/kdump to dig deeper into the issue is awesome. On Thu, Sep 28, 2023 at 3:59 AM zhuangel570 wrote: > > Hi, > > We encounter SRCU hung issue in stable tree 5.4.203, we are running VM create > and

SRCU: kworker hung in synchronize_srcu

2023-09-28 Thread zhuangel570
Hi, We encountered an SRCU hang in stable tree 5.4.203 while running a concurrent VM create/destroy test; the hang appeared after several weeks. We do not yet have a way to reproduce it: it happens randomly, and this is the second time we have seen it this year. We did some investi