Re: [PATCH] mailbox: cancel timer before starting it

2020-10-16 Thread Jassi Brar
On Fri, Oct 16, 2020 at 1:54 PM Jerome Brunet  wrote:
>
>
> On Fri 16 Oct 2020 at 19:33, Jassi Brar  wrote:
>
> > On Fri, Oct 16, 2020 at 4:00 AM Jerome Brunet  wrote:
> >>
> >>
> >> On Fri 16 Oct 2020 at 10:52, Ionela Voinescu  
> >> wrote:
> >>
> >> > On Thursday 15 Oct 2020 at 13:45:54 (-0500), Jassi Brar wrote:
> >> > [..]
> >> >> > >> --- a/drivers/mailbox/mailbox.c
> >> >> > >> +++ b/drivers/mailbox/mailbox.c
> >> >> > >> @@ -82,9 +82,13 @@ static void msg_submit(struct mbox_chan *chan)
> >> >> > >>  exit:
> >> >> > >>  spin_unlock_irqrestore(&chan->lock, flags);
> >> >> > >>
> >> >> > >> -if (!err && (chan->txdone_method & TXDONE_BY_POLL))
> >> >> > >> -/* kick start the timer immediately to avoid delays */
> >> >> > >> +if (!err && (chan->txdone_method & TXDONE_BY_POLL)) {
> >> >> > >> +/* Disable the timer if already active ... */
> >> >> > >> +hrtimer_cancel(&chan->mbox->poll_hrt);
> >> >> > >> +
> >> >> > >> +/* ... and kick start it immediately to avoid delays 
> >> >> > >> */
> >> >> > >>  hrtimer_start(&chan->mbox->poll_hrt, 0, 
> >> >> > >> HRTIMER_MODE_REL);
> >> >> > >> +}
> >> >> > >>  }
> >> >> > >>
> >> >> > >>  static void tx_tick(struct mbox_chan *chan, int r)
> >> >> > >
> >> >> > > I've tracked a regression back to this commit. Details to reproduce:
> >> >> >
> >> >> > Hi Ionela,
> >> >> >
> >> >> > I don't have access to your platform and I don't get what is going on
> >> >> > from the log below.
> >> >> >
> >> >> > Could you please give us a bit more details about what is going on ?
> >> >> >
> >> >> > All this patch does is add hrtimer_cancel().
> >> >> > * It is needed if the timer had already been started, which is
> >> >> >   appropriate AFAIU
> >> >> > * It is a NO-OP is the timer is not active.
> >> >> >
> >> >> Can you please try using hrtimer_try_to_cancel() instead ?
> >> >>
> >> >
> >> > Yes, using hrtimer_try_to_cancel() instead works for me. But doesn't
> >> > this limit how effective this change is? AFAIU, this will possibly only
> >> > reduce the chances for the race condition, but not solve it.
> >> >
> >>
> >> It is also my understanding, hrtimer_try_to_cancel() would remove a
> >> timer which as not already started but would return withtout doing
> >> anything if the callback is already running ... which is the original
> >> problem
> >>
> > If we are running in the callback path, hrtimer_try_to_cancel will
> > return -1, in which case we could skip hrtimer_start.
> > Anyways, I think simply checking for hrtimer_active should effect the same.
> > I have submitted a patch, of course not tested.
>
> Yes it sloves this race but ...
>
Thanks for confirmation.

> If a race is possible between a timer callback rescheduling itself (which
> is not that uncommon) and another thread trying to cancel it
>
In our case, we should not be cancelling+restarting the timer in the
first place, because txdone_hrtimer will take care of it via
hrtimer_forward_now.

>, maybe
> there is something worth fixing in hrtimer ? Also, mailbox calls
> hrtimer_cancel() in unregister ... are we confident this would work ?
>
Yes. After unregister() every channel is supposed to die and so must
its resources.

-jassi


Re: [PATCH] mailbox: cancel timer before starting it

2020-10-16 Thread Jerome Brunet


On Fri 16 Oct 2020 at 19:33, Jassi Brar  wrote:

> On Fri, Oct 16, 2020 at 4:00 AM Jerome Brunet  wrote:
>>
>>
>> On Fri 16 Oct 2020 at 10:52, Ionela Voinescu  wrote:
>>
>> > On Thursday 15 Oct 2020 at 13:45:54 (-0500), Jassi Brar wrote:
>> > [..]
>> >> > >> --- a/drivers/mailbox/mailbox.c
>> >> > >> +++ b/drivers/mailbox/mailbox.c
>> >> > >> @@ -82,9 +82,13 @@ static void msg_submit(struct mbox_chan *chan)
>> >> > >>  exit:
>> >> > >>  spin_unlock_irqrestore(&chan->lock, flags);
>> >> > >>
>> >> > >> -if (!err && (chan->txdone_method & TXDONE_BY_POLL))
>> >> > >> -/* kick start the timer immediately to avoid delays */
>> >> > >> +if (!err && (chan->txdone_method & TXDONE_BY_POLL)) {
>> >> > >> +/* Disable the timer if already active ... */
>> >> > >> +hrtimer_cancel(&chan->mbox->poll_hrt);
>> >> > >> +
>> >> > >> +/* ... and kick start it immediately to avoid delays */
>> >> > >>  hrtimer_start(&chan->mbox->poll_hrt, 0, 
>> >> > >> HRTIMER_MODE_REL);
>> >> > >> +}
>> >> > >>  }
>> >> > >>
>> >> > >>  static void tx_tick(struct mbox_chan *chan, int r)
>> >> > >
>> >> > > I've tracked a regression back to this commit. Details to reproduce:
>> >> >
>> >> > Hi Ionela,
>> >> >
>> >> > I don't have access to your platform and I don't get what is going on
>> >> > from the log below.
>> >> >
>> >> > Could you please give us a bit more details about what is going on ?
>> >> >
>> >> > All this patch does is add hrtimer_cancel().
>> >> > * It is needed if the timer had already been started, which is
>> >> >   appropriate AFAIU
>> >> > * It is a NO-OP is the timer is not active.
>> >> >
>> >> Can you please try using hrtimer_try_to_cancel() instead ?
>> >>
>> >
>> > Yes, using hrtimer_try_to_cancel() instead works for me. But doesn't
>> > this limit how effective this change is? AFAIU, this will possibly only
>> > reduce the chances for the race condition, but not solve it.
>> >
>>
>> It is also my understanding, hrtimer_try_to_cancel() would remove a
>> timer which as not already started but would return withtout doing
>> anything if the callback is already running ... which is the original
>> problem
>>
> If we are running in the callback path, hrtimer_try_to_cancel will
> return -1, in which case we could skip hrtimer_start.
> Anyways, I think simply checking for hrtimer_active should effect the same.
> I have submitted a patch, of course not tested.

Yes it sloves this race but ...

If a race is possible between a timer callback rescheduling itself (which
is not that uncommon) and another thread trying to cancel it, maybe
there is something worth fixing in hrtimer ? Also, mailbox calls
hrtimer_cancel() in unregister ... are we confident this would work ?

Any fix is by me - yours avoid killing and restarting the timer :) but
it feels like we are working around an issue that might bite us back
later on.

>
> Thanks



Re: [PATCH] mailbox: cancel timer before starting it

2020-10-16 Thread Jassi Brar
On Fri, Oct 16, 2020 at 4:00 AM Jerome Brunet  wrote:
>
>
> On Fri 16 Oct 2020 at 10:52, Ionela Voinescu  wrote:
>
> > On Thursday 15 Oct 2020 at 13:45:54 (-0500), Jassi Brar wrote:
> > [..]
> >> > >> --- a/drivers/mailbox/mailbox.c
> >> > >> +++ b/drivers/mailbox/mailbox.c
> >> > >> @@ -82,9 +82,13 @@ static void msg_submit(struct mbox_chan *chan)
> >> > >>  exit:
> >> > >>  spin_unlock_irqrestore(&chan->lock, flags);
> >> > >>
> >> > >> -if (!err && (chan->txdone_method & TXDONE_BY_POLL))
> >> > >> -/* kick start the timer immediately to avoid delays */
> >> > >> +if (!err && (chan->txdone_method & TXDONE_BY_POLL)) {
> >> > >> +/* Disable the timer if already active ... */
> >> > >> +hrtimer_cancel(&chan->mbox->poll_hrt);
> >> > >> +
> >> > >> +/* ... and kick start it immediately to avoid delays */
> >> > >>  hrtimer_start(&chan->mbox->poll_hrt, 0, 
> >> > >> HRTIMER_MODE_REL);
> >> > >> +}
> >> > >>  }
> >> > >>
> >> > >>  static void tx_tick(struct mbox_chan *chan, int r)
> >> > >
> >> > > I've tracked a regression back to this commit. Details to reproduce:
> >> >
> >> > Hi Ionela,
> >> >
> >> > I don't have access to your platform and I don't get what is going on
> >> > from the log below.
> >> >
> >> > Could you please give us a bit more details about what is going on ?
> >> >
> >> > All this patch does is add hrtimer_cancel().
> >> > * It is needed if the timer had already been started, which is
> >> >   appropriate AFAIU
> >> > * It is a NO-OP is the timer is not active.
> >> >
> >> Can you please try using hrtimer_try_to_cancel() instead ?
> >>
> >
> > Yes, using hrtimer_try_to_cancel() instead works for me. But doesn't
> > this limit how effective this change is? AFAIU, this will possibly only
> > reduce the chances for the race condition, but not solve it.
> >
>
> It is also my understanding, hrtimer_try_to_cancel() would remove a
> timer which as not already started but would return withtout doing
> anything if the callback is already running ... which is the original
> problem
>
If we are running in the callback path, hrtimer_try_to_cancel will
return -1, in which case we could skip hrtimer_start.
Anyways, I think simply checking for hrtimer_active should effect the same.
I have submitted a patch, of course not tested.

Thanks


Re: [PATCH] mailbox: cancel timer before starting it

2020-10-16 Thread Sudeep Holla
On Fri, Oct 16, 2020 at 1:15 PM Jerome Brunet  wrote:
>
>
> On Fri 16 Oct 2020 at 10:52, Ionela Voinescu  wrote:
>
> > On Thursday 15 Oct 2020 at 13:45:54 (-0500), Jassi Brar wrote:
> > [..]
> >> > >> --- a/drivers/mailbox/mailbox.c
> >> > >> +++ b/drivers/mailbox/mailbox.c
> >> > >> @@ -82,9 +82,13 @@ static void msg_submit(struct mbox_chan *chan)
> >> > >>  exit:
> >> > >>  spin_unlock_irqrestore(&chan->lock, flags);
> >> > >>
> >> > >> -if (!err && (chan->txdone_method & TXDONE_BY_POLL))
> >> > >> -/* kick start the timer immediately to avoid delays */
> >> > >> +if (!err && (chan->txdone_method & TXDONE_BY_POLL)) {
> >> > >> +/* Disable the timer if already active ... */
> >> > >> +hrtimer_cancel(&chan->mbox->poll_hrt);
> >> > >> +
> >> > >> +/* ... and kick start it immediately to avoid delays */
> >> > >>  hrtimer_start(&chan->mbox->poll_hrt, 0, 
> >> > >> HRTIMER_MODE_REL);
> >> > >> +}
> >> > >>  }
> >> > >>
> >> > >>  static void tx_tick(struct mbox_chan *chan, int r)
> >> > >
> >> > > I've tracked a regression back to this commit. Details to reproduce:
> >> >
> >> > Hi Ionela,
> >> >
> >> > I don't have access to your platform and I don't get what is going on
> >> > from the log below.
> >> >
> >> > Could you please give us a bit more details about what is going on ?
> >> >
> >> > All this patch does is add hrtimer_cancel().
> >> > * It is needed if the timer had already been started, which is
> >> >   appropriate AFAIU
> >> > * It is a NO-OP is the timer is not active.
> >> >
> >> Can you please try using hrtimer_try_to_cancel() instead ?
> >>
> >
> > Yes, using hrtimer_try_to_cancel() instead works for me. But doesn't
> > this limit how effective this change is? AFAIU, this will possibly only
> > reduce the chances for the race condition, but not solve it.
> >
>
> It is also my understanding, hrtimer_try_to_cancel() would remove a
> timer which as not already started but would return withtout doing
> anything if the callback is already running ... which is the original
> problem
>


There seem to be some races. It always hangs in the hrtimer_cancel.
Logging some extra messages makes it progress for a while and finally
get stuck in the loop. I wonder if there is a race between cancel
and handler execution.

-- 
Regards,
Sudeep


Re: [PATCH] mailbox: cancel timer before starting it

2020-10-16 Thread Sudeep Holla
On Fri, Oct 16, 2020 at 12:32 PM Ionela Voinescu
 wrote:
>
> On Thursday 15 Oct 2020 at 13:45:54 (-0500), Jassi Brar wrote:
> [..]
> > > >> --- a/drivers/mailbox/mailbox.c
> > > >> +++ b/drivers/mailbox/mailbox.c
> > > >> @@ -82,9 +82,13 @@ static void msg_submit(struct mbox_chan *chan)
> > > >>  exit:
> > > >>  spin_unlock_irqrestore(&chan->lock, flags);
> > > >>
> > > >> -if (!err && (chan->txdone_method & TXDONE_BY_POLL))
> > > >> -/* kick start the timer immediately to avoid delays */
> > > >> +if (!err && (chan->txdone_method & TXDONE_BY_POLL)) {
> > > >> +/* Disable the timer if already active ... */
> > > >> +hrtimer_cancel(&chan->mbox->poll_hrt);
> > > >> +
> > > >> +/* ... and kick start it immediately to avoid delays */
> > > >>  hrtimer_start(&chan->mbox->poll_hrt, 0, HRTIMER_MODE_REL);
> > > >> +}
> > > >>  }
> > > >>
> > > >>  static void tx_tick(struct mbox_chan *chan, int r)
> > > >
> > > > I've tracked a regression back to this commit. Details to reproduce:
> > >
> > > Hi Ionela,
> > >
> > > I don't have access to your platform and I don't get what is going on
> > > from the log below.
> > >
> > > Could you please give us a bit more details about what is going on ?
> > >
> > > All this patch does is add hrtimer_cancel().
> > > * It is needed if the timer had already been started, which is
> > >   appropriate AFAIU
> > > * It is a NO-OP is the timer is not active.
> > >
> > Can you please try using hrtimer_try_to_cancel() instead ?
> >
>
> Yes, using hrtimer_try_to_cancel() instead works for me. But doesn't
> this limit how effective this change is? AFAIU, this will possibly only
> reduce the chances for the race condition, but not solve it.
>

Indeed, I tried the same and got a lot of -ETIME failures. hrtimer_cancel
uses hrtimer_try_to_cancel in loop until it succeeds so that the following
hrtimer_start can be done w/o any pending timers.

-- 
Regards,
Sudeep


Re: [PATCH] mailbox: cancel timer before starting it

2020-10-16 Thread Sudeep Holla
On Fri, Oct 16, 2020 at 10:34:21AM +0100, Sudeep Holla wrote:
> On Fri, Oct 16, 2020 at 11:02:02AM +0200, Jerome Brunet wrote:
> > 
> > On Fri 16 Oct 2020 at 10:44, Sudeep Holla  wrote:
> > 
> > > On Thu, Oct 15, 2020 at 03:29:35PM +0100, Ionela Voinescu wrote:
> > >> Hi Jerome,
> > >> 
> > >> On Thursday 15 Oct 2020 at 15:58:30 (+0200), Jerome Brunet wrote:
> > >> > 
> > >> > On Thu 15 Oct 2020 at 15:46, Ionela Voinescu  
> > >> > wrote:
> > >> > 
> > >> > > Hi guys,
> > >> > >
> > >> > > On Wednesday 23 Sep 2020 at 14:39:16 (+0200), Jerome Brunet wrote:
> > >> > >> If the txdone is done by polling, it is possible for msg_submit() 
> > >> > >> to start
> > >> > >> the timer while txdone_hrtimer() callback is running. If the timer 
> > >> > >> needs
> > >> > >> recheduling, it could already be enqueued by the time 
> > >> > >> hrtimer_forward_now()
> > >> > >> is called, leading hrtimer to loudly complain.
> > >> > >> 
> > >> > >> WARNING: CPU: 3 PID: 74 at kernel/time/hrtimer.c:932 
> > >> > >> hrtimer_forward+0xc4/0x110
> > >> > >> CPU: 3 PID: 74 Comm: kworker/u8:1 Not tainted 
> > >> > >> 5.9.0-rc2-00236-gd3520067d01c-dirty #5
> > >> > >> Hardware name: Libre Computer AML-S805X-AC (DT)
> > >> > >> Workqueue: events_freezable_power_ thermal_zone_device_check
> > >> > >> pstate: 2085 (nzCv daIf -PAN -UAO BTYPE=--)
> > >> > >> pc : hrtimer_forward+0xc4/0x110
> > >> > >> lr : txdone_hrtimer+0xf8/0x118
> > >> > >> [...]
> > >> > >> 
> > >> > >> Canceling the timer before starting it ensure that the timer 
> > >> > >> callback is
> > >> > >> not running when the timer is started, solving this race condition.
> > >> > >> 
> > >> > >> Fixes: 0cc67945ea59 ("mailbox: switch to hrtimer for tx_complete 
> > >> > >> polling")
> > >> > >> Reported-by: Da Xue 
> > >> > >> Signed-off-by: Jerome Brunet 
> > >> > >> ---
> > >> > >>  drivers/mailbox/mailbox.c | 8 ++--
> > >> > >>  1 file changed, 6 insertions(+), 2 deletions(-)
> > >> > >> 
> > >> > >> diff --git a/drivers/mailbox/mailbox.c b/drivers/mailbox/mailbox.c
> > >> > >> index 0b821a5b2db8..34f9ab01caef 100644
> > >> > >> --- a/drivers/mailbox/mailbox.c
> > >> > >> +++ b/drivers/mailbox/mailbox.c
> > >> > >> @@ -82,9 +82,13 @@ static void msg_submit(struct mbox_chan *chan)
> > >> > >>  exit:
> > >> > >> spin_unlock_irqrestore(&chan->lock, flags);
> > >> > >>  
> > >> > >> -   if (!err && (chan->txdone_method & TXDONE_BY_POLL))
> > >> > >> -   /* kick start the timer immediately to avoid delays */
> > >> > >> +   if (!err && (chan->txdone_method & TXDONE_BY_POLL)) {
> > >> > >> +   /* Disable the timer if already active ... */
> > >> > >> +   hrtimer_cancel(&chan->mbox->poll_hrt);
> > >> > >> +
> > >> > >> +   /* ... and kick start it immediately to avoid delays */
> > >> > >> hrtimer_start(&chan->mbox->poll_hrt, 0, 
> > >> > >> HRTIMER_MODE_REL);
> > >> > >> +   }
> > >> > >>  }
> > >> > >>  
> > >> > >>  static void tx_tick(struct mbox_chan *chan, int r)
> > >> > >
> > >> > > I've tracked a regression back to this commit. Details to reproduce:
> > >> > 
> > >> > Hi Ionela,
> > >> > 
> > >> > I don't have access to your platform and I don't get what is going on
> > >> > from the log below.
> > >> > 
> > >> > Could you please give us a bit more details about what is going on ?
> > >> > 
> > >> 
> > >> I'm not familiar with the mailbox subsystem, so the best I can do right
> > >> now is to add Sudeep to Cc, in case this conflicts in some way with the
> > >> ARM MHU patches [1].
> > >>
> > >
> > > Not it can't be doorbell driver as we use SCPI(old firmware) with upstream
> > > MHU driver as is limiting the number of channels to be used.
> > >
> > >> In the meantime I'll get some traces and get more familiar with the
> > >> code.
> > >>
> > >
> > > I will try that too.
> > 
> > BTW, this issue was originally reported on amlogic platforms which also
> > use arm,mhu mailbox driver.
> > 
> 
> OK. Anyway just noticed that hrtimer_cancel uses  hrtimer_try_to_cancel
> and hrtimer_cancel_wait_running. The latter is just cpu_relax() if
> PREEMPT_RT=n, so you may still have issue if the hrtimer is still active
> or restarts in the meantime. 
>

Scratch that, I failed to see the loop in hrtimer_cancel earlier.

-- 
Regards,
Sudeep


Re: [PATCH] mailbox: cancel timer before starting it

2020-10-16 Thread Sudeep Holla
On Fri, Oct 16, 2020 at 11:02:02AM +0200, Jerome Brunet wrote:
> 
> On Fri 16 Oct 2020 at 10:44, Sudeep Holla  wrote:
> 
> > On Thu, Oct 15, 2020 at 03:29:35PM +0100, Ionela Voinescu wrote:
> >> Hi Jerome,
> >> 
> >> On Thursday 15 Oct 2020 at 15:58:30 (+0200), Jerome Brunet wrote:
> >> > 
> >> > On Thu 15 Oct 2020 at 15:46, Ionela Voinescu  
> >> > wrote:
> >> > 
> >> > > Hi guys,
> >> > >
> >> > > On Wednesday 23 Sep 2020 at 14:39:16 (+0200), Jerome Brunet wrote:
> >> > >> If the txdone is done by polling, it is possible for msg_submit() to 
> >> > >> start
> >> > >> the timer while txdone_hrtimer() callback is running. If the timer 
> >> > >> needs
> >> > >> recheduling, it could already be enqueued by the time 
> >> > >> hrtimer_forward_now()
> >> > >> is called, leading hrtimer to loudly complain.
> >> > >> 
> >> > >> WARNING: CPU: 3 PID: 74 at kernel/time/hrtimer.c:932 
> >> > >> hrtimer_forward+0xc4/0x110
> >> > >> CPU: 3 PID: 74 Comm: kworker/u8:1 Not tainted 
> >> > >> 5.9.0-rc2-00236-gd3520067d01c-dirty #5
> >> > >> Hardware name: Libre Computer AML-S805X-AC (DT)
> >> > >> Workqueue: events_freezable_power_ thermal_zone_device_check
> >> > >> pstate: 2085 (nzCv daIf -PAN -UAO BTYPE=--)
> >> > >> pc : hrtimer_forward+0xc4/0x110
> >> > >> lr : txdone_hrtimer+0xf8/0x118
> >> > >> [...]
> >> > >> 
> >> > >> Canceling the timer before starting it ensure that the timer callback 
> >> > >> is
> >> > >> not running when the timer is started, solving this race condition.
> >> > >> 
> >> > >> Fixes: 0cc67945ea59 ("mailbox: switch to hrtimer for tx_complete 
> >> > >> polling")
> >> > >> Reported-by: Da Xue 
> >> > >> Signed-off-by: Jerome Brunet 
> >> > >> ---
> >> > >>  drivers/mailbox/mailbox.c | 8 ++--
> >> > >>  1 file changed, 6 insertions(+), 2 deletions(-)
> >> > >> 
> >> > >> diff --git a/drivers/mailbox/mailbox.c b/drivers/mailbox/mailbox.c
> >> > >> index 0b821a5b2db8..34f9ab01caef 100644
> >> > >> --- a/drivers/mailbox/mailbox.c
> >> > >> +++ b/drivers/mailbox/mailbox.c
> >> > >> @@ -82,9 +82,13 @@ static void msg_submit(struct mbox_chan *chan)
> >> > >>  exit:
> >> > >>   spin_unlock_irqrestore(&chan->lock, flags);
> >> > >>  
> >> > >> - if (!err && (chan->txdone_method & TXDONE_BY_POLL))
> >> > >> - /* kick start the timer immediately to avoid delays */
> >> > >> + if (!err && (chan->txdone_method & TXDONE_BY_POLL)) {
> >> > >> + /* Disable the timer if already active ... */
> >> > >> + hrtimer_cancel(&chan->mbox->poll_hrt);
> >> > >> +
> >> > >> + /* ... and kick start it immediately to avoid delays */
> >> > >>   hrtimer_start(&chan->mbox->poll_hrt, 0, 
> >> > >> HRTIMER_MODE_REL);
> >> > >> + }
> >> > >>  }
> >> > >>  
> >> > >>  static void tx_tick(struct mbox_chan *chan, int r)
> >> > >
> >> > > I've tracked a regression back to this commit. Details to reproduce:
> >> > 
> >> > Hi Ionela,
> >> > 
> >> > I don't have access to your platform and I don't get what is going on
> >> > from the log below.
> >> > 
> >> > Could you please give us a bit more details about what is going on ?
> >> > 
> >> 
> >> I'm not familiar with the mailbox subsystem, so the best I can do right
> >> now is to add Sudeep to Cc, in case this conflicts in some way with the
> >> ARM MHU patches [1].
> >>
> >
> > Not it can't be doorbell driver as we use SCPI(old firmware) with upstream
> > MHU driver as is limiting the number of channels to be used.
> >
> >> In the meantime I'll get some traces and get more familiar with the
> >> code.
> >>
> >
> > I will try that too.
> 
> BTW, this issue was originally reported on amlogic platforms which also
> use arm,mhu mailbox driver.
> 

OK. Anyway just noticed that hrtimer_cancel uses  hrtimer_try_to_cancel
and hrtimer_cancel_wait_running. The latter is just cpu_relax() if
PREEMPT_RT=n, so you may still have issue if the hrtimer is still active
or restarts in the meantime.

-- 
Regards,
Sudeep


Re: [PATCH] mailbox: cancel timer before starting it

2020-10-16 Thread Jerome Brunet


On Fri 16 Oct 2020 at 10:44, Sudeep Holla  wrote:

> On Thu, Oct 15, 2020 at 03:29:35PM +0100, Ionela Voinescu wrote:
>> Hi Jerome,
>> 
>> On Thursday 15 Oct 2020 at 15:58:30 (+0200), Jerome Brunet wrote:
>> > 
>> > On Thu 15 Oct 2020 at 15:46, Ionela Voinescu  
>> > wrote:
>> > 
>> > > Hi guys,
>> > >
>> > > On Wednesday 23 Sep 2020 at 14:39:16 (+0200), Jerome Brunet wrote:
>> > >> If the txdone is done by polling, it is possible for msg_submit() to 
>> > >> start
>> > >> the timer while txdone_hrtimer() callback is running. If the timer needs
>> > >> recheduling, it could already be enqueued by the time 
>> > >> hrtimer_forward_now()
>> > >> is called, leading hrtimer to loudly complain.
>> > >> 
>> > >> WARNING: CPU: 3 PID: 74 at kernel/time/hrtimer.c:932 
>> > >> hrtimer_forward+0xc4/0x110
>> > >> CPU: 3 PID: 74 Comm: kworker/u8:1 Not tainted 
>> > >> 5.9.0-rc2-00236-gd3520067d01c-dirty #5
>> > >> Hardware name: Libre Computer AML-S805X-AC (DT)
>> > >> Workqueue: events_freezable_power_ thermal_zone_device_check
>> > >> pstate: 2085 (nzCv daIf -PAN -UAO BTYPE=--)
>> > >> pc : hrtimer_forward+0xc4/0x110
>> > >> lr : txdone_hrtimer+0xf8/0x118
>> > >> [...]
>> > >> 
>> > >> Canceling the timer before starting it ensure that the timer callback is
>> > >> not running when the timer is started, solving this race condition.
>> > >> 
>> > >> Fixes: 0cc67945ea59 ("mailbox: switch to hrtimer for tx_complete 
>> > >> polling")
>> > >> Reported-by: Da Xue 
>> > >> Signed-off-by: Jerome Brunet 
>> > >> ---
>> > >>  drivers/mailbox/mailbox.c | 8 ++--
>> > >>  1 file changed, 6 insertions(+), 2 deletions(-)
>> > >> 
>> > >> diff --git a/drivers/mailbox/mailbox.c b/drivers/mailbox/mailbox.c
>> > >> index 0b821a5b2db8..34f9ab01caef 100644
>> > >> --- a/drivers/mailbox/mailbox.c
>> > >> +++ b/drivers/mailbox/mailbox.c
>> > >> @@ -82,9 +82,13 @@ static void msg_submit(struct mbox_chan *chan)
>> > >>  exit:
>> > >> spin_unlock_irqrestore(&chan->lock, flags);
>> > >>  
>> > >> -   if (!err && (chan->txdone_method & TXDONE_BY_POLL))
>> > >> -   /* kick start the timer immediately to avoid delays */
>> > >> +   if (!err && (chan->txdone_method & TXDONE_BY_POLL)) {
>> > >> +   /* Disable the timer if already active ... */
>> > >> +   hrtimer_cancel(&chan->mbox->poll_hrt);
>> > >> +
>> > >> +   /* ... and kick start it immediately to avoid delays */
>> > >> hrtimer_start(&chan->mbox->poll_hrt, 0, 
>> > >> HRTIMER_MODE_REL);
>> > >> +   }
>> > >>  }
>> > >>  
>> > >>  static void tx_tick(struct mbox_chan *chan, int r)
>> > >
>> > > I've tracked a regression back to this commit. Details to reproduce:
>> > 
>> > Hi Ionela,
>> > 
>> > I don't have access to your platform and I don't get what is going on
>> > from the log below.
>> > 
>> > Could you please give us a bit more details about what is going on ?
>> > 
>> 
>> I'm not familiar with the mailbox subsystem, so the best I can do right
>> now is to add Sudeep to Cc, in case this conflicts in some way with the
>> ARM MHU patches [1].
>>
>
> Not it can't be doorbell driver as we use SCPI(old firmware) with upstream
> MHU driver as is limiting the number of channels to be used.
>
>> In the meantime I'll get some traces and get more familiar with the
>> code.
>>
>
> I will try that too.

BTW, this issue was originally reported on amlogic platforms which also
use arm,mhu mailbox driver.



Re: [PATCH] mailbox: cancel timer before starting it

2020-10-16 Thread Jerome Brunet


On Fri 16 Oct 2020 at 10:52, Ionela Voinescu  wrote:

> On Thursday 15 Oct 2020 at 13:45:54 (-0500), Jassi Brar wrote:
> [..]
>> > >> --- a/drivers/mailbox/mailbox.c
>> > >> +++ b/drivers/mailbox/mailbox.c
>> > >> @@ -82,9 +82,13 @@ static void msg_submit(struct mbox_chan *chan)
>> > >>  exit:
>> > >>  spin_unlock_irqrestore(&chan->lock, flags);
>> > >>
>> > >> -if (!err && (chan->txdone_method & TXDONE_BY_POLL))
>> > >> -/* kick start the timer immediately to avoid delays */
>> > >> +if (!err && (chan->txdone_method & TXDONE_BY_POLL)) {
>> > >> +/* Disable the timer if already active ... */
>> > >> +hrtimer_cancel(&chan->mbox->poll_hrt);
>> > >> +
>> > >> +/* ... and kick start it immediately to avoid delays */
>> > >>  hrtimer_start(&chan->mbox->poll_hrt, 0, HRTIMER_MODE_REL);
>> > >> +}
>> > >>  }
>> > >>
>> > >>  static void tx_tick(struct mbox_chan *chan, int r)
>> > >
>> > > I've tracked a regression back to this commit. Details to reproduce:
>> >
>> > Hi Ionela,
>> >
>> > I don't have access to your platform and I don't get what is going on
>> > from the log below.
>> >
>> > Could you please give us a bit more details about what is going on ?
>> >
>> > All this patch does is add hrtimer_cancel().
>> > * It is needed if the timer had already been started, which is
>> >   appropriate AFAIU
>> > * It is a NO-OP is the timer is not active.
>> >
>> Can you please try using hrtimer_try_to_cancel() instead ?
>> 
>
> Yes, using hrtimer_try_to_cancel() instead works for me. But doesn't
> this limit how effective this change is? AFAIU, this will possibly only
> reduce the chances for the race condition, but not solve it.
>

It is also my understanding, hrtimer_try_to_cancel() would remove a
timer which as not already started but would return withtout doing
anything if the callback is already running ... which is the original
problem


> Thanks,
> Ionela.
>
>> thanks



Re: [PATCH] mailbox: cancel timer before starting it

2020-10-16 Thread Ionela Voinescu
On Thursday 15 Oct 2020 at 13:45:54 (-0500), Jassi Brar wrote:
[..]
> > >> --- a/drivers/mailbox/mailbox.c
> > >> +++ b/drivers/mailbox/mailbox.c
> > >> @@ -82,9 +82,13 @@ static void msg_submit(struct mbox_chan *chan)
> > >>  exit:
> > >>  spin_unlock_irqrestore(&chan->lock, flags);
> > >>
> > >> -if (!err && (chan->txdone_method & TXDONE_BY_POLL))
> > >> -/* kick start the timer immediately to avoid delays */
> > >> +if (!err && (chan->txdone_method & TXDONE_BY_POLL)) {
> > >> +/* Disable the timer if already active ... */
> > >> +hrtimer_cancel(&chan->mbox->poll_hrt);
> > >> +
> > >> +/* ... and kick start it immediately to avoid delays */
> > >>  hrtimer_start(&chan->mbox->poll_hrt, 0, HRTIMER_MODE_REL);
> > >> +}
> > >>  }
> > >>
> > >>  static void tx_tick(struct mbox_chan *chan, int r)
> > >
> > > I've tracked a regression back to this commit. Details to reproduce:
> >
> > Hi Ionela,
> >
> > I don't have access to your platform and I don't get what is going on
> > from the log below.
> >
> > Could you please give us a bit more details about what is going on ?
> >
> > All this patch does is add hrtimer_cancel().
> > * It is needed if the timer had already been started, which is
> >   appropriate AFAIU
> > * It is a NO-OP is the timer is not active.
> >
> Can you please try using hrtimer_try_to_cancel() instead ?
> 

Yes, using hrtimer_try_to_cancel() instead works for me. But doesn't
this limit how effective this change is? AFAIU, this will possibly only
reduce the chances for the race condition, but not solve it.

Thanks,
Ionela.

> thanks


Re: [PATCH] mailbox: cancel timer before starting it

2020-10-16 Thread Sudeep Holla
On Thu, Oct 15, 2020 at 03:29:35PM +0100, Ionela Voinescu wrote:
> Hi Jerome,
> 
> On Thursday 15 Oct 2020 at 15:58:30 (+0200), Jerome Brunet wrote:
> > 
> > On Thu 15 Oct 2020 at 15:46, Ionela Voinescu  
> > wrote:
> > 
> > > Hi guys,
> > >
> > > On Wednesday 23 Sep 2020 at 14:39:16 (+0200), Jerome Brunet wrote:
> > >> If the txdone is done by polling, it is possible for msg_submit() to 
> > >> start
> > >> the timer while txdone_hrtimer() callback is running. If the timer needs
> > >> recheduling, it could already be enqueued by the time 
> > >> hrtimer_forward_now()
> > >> is called, leading hrtimer to loudly complain.
> > >> 
> > >> WARNING: CPU: 3 PID: 74 at kernel/time/hrtimer.c:932 
> > >> hrtimer_forward+0xc4/0x110
> > >> CPU: 3 PID: 74 Comm: kworker/u8:1 Not tainted 
> > >> 5.9.0-rc2-00236-gd3520067d01c-dirty #5
> > >> Hardware name: Libre Computer AML-S805X-AC (DT)
> > >> Workqueue: events_freezable_power_ thermal_zone_device_check
> > >> pstate: 2085 (nzCv daIf -PAN -UAO BTYPE=--)
> > >> pc : hrtimer_forward+0xc4/0x110
> > >> lr : txdone_hrtimer+0xf8/0x118
> > >> [...]
> > >> 
> > >> Canceling the timer before starting it ensure that the timer callback is
> > >> not running when the timer is started, solving this race condition.
> > >> 
> > >> Fixes: 0cc67945ea59 ("mailbox: switch to hrtimer for tx_complete 
> > >> polling")
> > >> Reported-by: Da Xue 
> > >> Signed-off-by: Jerome Brunet 
> > >> ---
> > >>  drivers/mailbox/mailbox.c | 8 ++--
> > >>  1 file changed, 6 insertions(+), 2 deletions(-)
> > >> 
> > >> diff --git a/drivers/mailbox/mailbox.c b/drivers/mailbox/mailbox.c
> > >> index 0b821a5b2db8..34f9ab01caef 100644
> > >> --- a/drivers/mailbox/mailbox.c
> > >> +++ b/drivers/mailbox/mailbox.c
> > >> @@ -82,9 +82,13 @@ static void msg_submit(struct mbox_chan *chan)
> > >>  exit:
> > >>  spin_unlock_irqrestore(&chan->lock, flags);
> > >>  
> > >> -if (!err && (chan->txdone_method & TXDONE_BY_POLL))
> > >> -/* kick start the timer immediately to avoid delays */
> > >> +if (!err && (chan->txdone_method & TXDONE_BY_POLL)) {
> > >> +/* Disable the timer if already active ... */
> > >> +hrtimer_cancel(&chan->mbox->poll_hrt);
> > >> +
> > >> +/* ... and kick start it immediately to avoid delays */
> > >>  hrtimer_start(&chan->mbox->poll_hrt, 0, 
> > >> HRTIMER_MODE_REL);
> > >> +}
> > >>  }
> > >>  
> > >>  static void tx_tick(struct mbox_chan *chan, int r)
> > >
> > > I've tracked a regression back to this commit. Details to reproduce:
> > 
> > Hi Ionela,
> > 
> > I don't have access to your platform and I don't get what is going on
> > from the log below.
> > 
> > Could you please give us a bit more details about what is going on ?
> > 
> 
> I'm not familiar with the mailbox subsystem, so the best I can do right
> now is to add Sudeep to Cc, in case this conflicts in some way with the
> ARM MHU patches [1].
>

Not it can't be doorbell driver as we use SCPI(old firmware) with upstream
MHU driver as is limiting the number of channels to be used.

> In the meantime I'll get some traces and get more familiar with the
> code.
>

I will try that too.

-- 
Regards,
Sudeep


Re: [PATCH] mailbox: cancel timer before starting it

2020-10-15 Thread Jassi Brar
On Thu, Oct 15, 2020 at 8:58 AM Jerome Brunet  wrote:
>
>
> On Thu 15 Oct 2020 at 15:46, Ionela Voinescu  wrote:
>
> > Hi guys,
> >
> > On Wednesday 23 Sep 2020 at 14:39:16 (+0200), Jerome Brunet wrote:
> >> If the txdone is done by polling, it is possible for msg_submit() to start
> >> the timer while txdone_hrtimer() callback is running. If the timer needs
> >> recheduling, it could already be enqueued by the time hrtimer_forward_now()
> >> is called, leading hrtimer to loudly complain.
> >>
> >> WARNING: CPU: 3 PID: 74 at kernel/time/hrtimer.c:932 
> >> hrtimer_forward+0xc4/0x110
> >> CPU: 3 PID: 74 Comm: kworker/u8:1 Not tainted 
> >> 5.9.0-rc2-00236-gd3520067d01c-dirty #5
> >> Hardware name: Libre Computer AML-S805X-AC (DT)
> >> Workqueue: events_freezable_power_ thermal_zone_device_check
> >> pstate: 2085 (nzCv daIf -PAN -UAO BTYPE=--)
> >> pc : hrtimer_forward+0xc4/0x110
> >> lr : txdone_hrtimer+0xf8/0x118
> >> [...]
> >>
> >> Canceling the timer before starting it ensure that the timer callback is
> >> not running when the timer is started, solving this race condition.
> >>
> >> Fixes: 0cc67945ea59 ("mailbox: switch to hrtimer for tx_complete polling")
> >> Reported-by: Da Xue 
> >> Signed-off-by: Jerome Brunet 
> >> ---
> >>  drivers/mailbox/mailbox.c | 8 ++--
> >>  1 file changed, 6 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/drivers/mailbox/mailbox.c b/drivers/mailbox/mailbox.c
> >> index 0b821a5b2db8..34f9ab01caef 100644
> >> --- a/drivers/mailbox/mailbox.c
> >> +++ b/drivers/mailbox/mailbox.c
> >> @@ -82,9 +82,13 @@ static void msg_submit(struct mbox_chan *chan)
> >>  exit:
> >>  spin_unlock_irqrestore(&chan->lock, flags);
> >>
> >> -if (!err && (chan->txdone_method & TXDONE_BY_POLL))
> >> -/* kick start the timer immediately to avoid delays */
> >> +if (!err && (chan->txdone_method & TXDONE_BY_POLL)) {
> >> +/* Disable the timer if already active ... */
> >> +hrtimer_cancel(&chan->mbox->poll_hrt);
> >> +
> >> +/* ... and kick start it immediately to avoid delays */
> >>  hrtimer_start(&chan->mbox->poll_hrt, 0, HRTIMER_MODE_REL);
> >> +}
> >>  }
> >>
> >>  static void tx_tick(struct mbox_chan *chan, int r)
> >
> > I've tracked a regression back to this commit. Details to reproduce:
>
> Hi Ionela,
>
> I don't have access to your platform and I don't get what is going on
> from the log below.
>
> Could you please give us a bit more details about what is going on ?
>
> All this patch does is add hrtimer_cancel().
> * It is needed if the timer had already been started, which is
>   appropriate AFAIU
> * It is a NO-OP is the timer is not active.
>
Can you please try using hrtimer_try_to_cancel() instead ?

thanks


Re: [PATCH] mailbox: cancel timer before starting it

2020-10-15 Thread Ionela Voinescu
Hi Jerome,

On Thursday 15 Oct 2020 at 15:58:30 (+0200), Jerome Brunet wrote:
> 
> On Thu 15 Oct 2020 at 15:46, Ionela Voinescu  wrote:
> 
> > Hi guys,
> >
> > On Wednesday 23 Sep 2020 at 14:39:16 (+0200), Jerome Brunet wrote:
> >> If the txdone is done by polling, it is possible for msg_submit() to start
> >> the timer while txdone_hrtimer() callback is running. If the timer needs
> >> recheduling, it could already be enqueued by the time hrtimer_forward_now()
> >> is called, leading hrtimer to loudly complain.
> >> 
> >> WARNING: CPU: 3 PID: 74 at kernel/time/hrtimer.c:932 
> >> hrtimer_forward+0xc4/0x110
> >> CPU: 3 PID: 74 Comm: kworker/u8:1 Not tainted 
> >> 5.9.0-rc2-00236-gd3520067d01c-dirty #5
> >> Hardware name: Libre Computer AML-S805X-AC (DT)
> >> Workqueue: events_freezable_power_ thermal_zone_device_check
> >> pstate: 2085 (nzCv daIf -PAN -UAO BTYPE=--)
> >> pc : hrtimer_forward+0xc4/0x110
> >> lr : txdone_hrtimer+0xf8/0x118
> >> [...]
> >> 
> >> Canceling the timer before starting it ensure that the timer callback is
> >> not running when the timer is started, solving this race condition.
> >> 
> >> Fixes: 0cc67945ea59 ("mailbox: switch to hrtimer for tx_complete polling")
> >> Reported-by: Da Xue 
> >> Signed-off-by: Jerome Brunet 
> >> ---
> >>  drivers/mailbox/mailbox.c | 8 ++--
> >>  1 file changed, 6 insertions(+), 2 deletions(-)
> >> 
> >> diff --git a/drivers/mailbox/mailbox.c b/drivers/mailbox/mailbox.c
> >> index 0b821a5b2db8..34f9ab01caef 100644
> >> --- a/drivers/mailbox/mailbox.c
> >> +++ b/drivers/mailbox/mailbox.c
> >> @@ -82,9 +82,13 @@ static void msg_submit(struct mbox_chan *chan)
> >>  exit:
> >>spin_unlock_irqrestore(&chan->lock, flags);
> >>  
> >> -  if (!err && (chan->txdone_method & TXDONE_BY_POLL))
> >> -  /* kick start the timer immediately to avoid delays */
> >> +  if (!err && (chan->txdone_method & TXDONE_BY_POLL)) {
> >> +  /* Disable the timer if already active ... */
> >> +  hrtimer_cancel(&chan->mbox->poll_hrt);
> >> +
> >> +  /* ... and kick start it immediately to avoid delays */
> >>hrtimer_start(&chan->mbox->poll_hrt, 0, HRTIMER_MODE_REL);
> >> +  }
> >>  }
> >>  
> >>  static void tx_tick(struct mbox_chan *chan, int r)
> >
> > I've tracked a regression back to this commit. Details to reproduce:
> 
> Hi Ionela,
> 
> I don't have access to your platform and I don't get what is going on
> from the log below.
> 
> Could you please give us a bit more details about what is going on ?
> 

I'm not familiar with the mailbox subsystem, so the best I can do right
now is to add Sudeep to Cc, in case this conflicts in some way with the
ARM MHU patches [1].

In the meantime I'll get some traces and get more familiar with the
code.

[1]
https://lore.kernel.org/linux-arm-kernel/20201009113129.uqw5zrqztjgw6vga@bogus/

Hope it helps,
Ionela.

> All this patch does is add hrtimer_cancel().
> * It is needed if the timer had already been started, which is
>   appropriate AFAIU
> * It is a NO-OP is the timer is not active.
> 
> >
> >
> >  - HEAD: (linux-next)
> >* 62c04453381e  Jerome Brunet   3 weeks ago mailbox: cancel timer before 
> > starting it
> >
> >  - Platform: arm64 Juno R0 and Juno R2 [1]
> >
> >  - Partial log:
> > [0.00] Booting Linux on physical CPU 0x000100 [0x410fd030]
> > [0.00] Linux version 5.9.0-rc8-01722-g62c04453381e () 
> > (aarch64-none-linux-gnu-gcc (GNU Toolchain for the A-profile Architecture 
> > 9.2-2019.12 (arm-9.10)) 9.2.1 20191025, GNU ld (GNU Toolchain for the 
> > A-profile Architecture 9.2-2019.12 (arm-9.10)) 2.33.1.20191209) #175 SMP 
> > PREEMPT Thu Oct 15 14:17:41 BST 2020
> > [0.00] Machine model: ARM Juno development board (r0)
> > [..]
> > [1.714340] mhu 2b1f.mhu: ARM MHU Mailbox registered
> > [1.722768] NET: Registered protocol family 17
> > [1.727364] 9pnet: Installing 9P2000 support
> > [1.731689] Key type dns_resolver registered
> > [1.735474] usb 1-1: new high-speed USB device number 2 using 
> > ehci-platform
> > [1.736407] registered taskstats version 1
> > [1.747061] Loading compiled-in X.509 certificates
> > [1.755885] scpi_protocol scpi: SCP Protocol 1.2 Firmware 1.21.0 
> > version
> > [1.770484] cpu cpu0: EM: created perf domain
> > [1.778505] cpu cpu1: EM: created perf domain
> > [1.807449] scpi_clocks scpi:clocks: failed to register clock 
> > 'pxlclk'
> > [1.897593] hub 1-1:1.0: USB hub found
> > [1.901656] hub 1-1:1.0: 4 ports detected
> > [2.559453] atkbd serio0: keyboard reset failed on 1c06.kmi
> > [   22.787431] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> > [   22.793536] rcu: 1-...0: (1 ticks this GP) 
> > idle=222/1/0x4002 softirq=63/64 fqs=2626
> > [   22.802421]  (detected by 2, t=5255 jiffies, g=-991, q=9)
> > [   22.807823] Task dump for CPU 1:
> 

Re: [PATCH] mailbox: cancel timer before starting it

2020-10-15 Thread Jerome Brunet


On Thu 15 Oct 2020 at 15:46, Ionela Voinescu  wrote:

> Hi guys,
>
> On Wednesday 23 Sep 2020 at 14:39:16 (+0200), Jerome Brunet wrote:
>> If the txdone is done by polling, it is possible for msg_submit() to start
>> the timer while txdone_hrtimer() callback is running. If the timer needs
>> recheduling, it could already be enqueued by the time hrtimer_forward_now()
>> is called, leading hrtimer to loudly complain.
>> 
>> WARNING: CPU: 3 PID: 74 at kernel/time/hrtimer.c:932 
>> hrtimer_forward+0xc4/0x110
>> CPU: 3 PID: 74 Comm: kworker/u8:1 Not tainted 
>> 5.9.0-rc2-00236-gd3520067d01c-dirty #5
>> Hardware name: Libre Computer AML-S805X-AC (DT)
>> Workqueue: events_freezable_power_ thermal_zone_device_check
>> pstate: 2085 (nzCv daIf -PAN -UAO BTYPE=--)
>> pc : hrtimer_forward+0xc4/0x110
>> lr : txdone_hrtimer+0xf8/0x118
>> [...]
>> 
>> Canceling the timer before starting it ensure that the timer callback is
>> not running when the timer is started, solving this race condition.
>> 
>> Fixes: 0cc67945ea59 ("mailbox: switch to hrtimer for tx_complete polling")
>> Reported-by: Da Xue 
>> Signed-off-by: Jerome Brunet 
>> ---
>>  drivers/mailbox/mailbox.c | 8 ++--
>>  1 file changed, 6 insertions(+), 2 deletions(-)
>> 
>> diff --git a/drivers/mailbox/mailbox.c b/drivers/mailbox/mailbox.c
>> index 0b821a5b2db8..34f9ab01caef 100644
>> --- a/drivers/mailbox/mailbox.c
>> +++ b/drivers/mailbox/mailbox.c
>> @@ -82,9 +82,13 @@ static void msg_submit(struct mbox_chan *chan)
>>  exit:
>>  spin_unlock_irqrestore(&chan->lock, flags);
>>  
>> -if (!err && (chan->txdone_method & TXDONE_BY_POLL))
>> -/* kick start the timer immediately to avoid delays */
>> +if (!err && (chan->txdone_method & TXDONE_BY_POLL)) {
>> +/* Disable the timer if already active ... */
>> +hrtimer_cancel(&chan->mbox->poll_hrt);
>> +
>> +/* ... and kick start it immediately to avoid delays */
>>  hrtimer_start(&chan->mbox->poll_hrt, 0, HRTIMER_MODE_REL);
>> +}
>>  }
>>  
>>  static void tx_tick(struct mbox_chan *chan, int r)
>
> I've tracked a regression back to this commit. Details to reproduce:

Hi Ionela,

I don't have access to your platform and I don't get what is going on
from the log below.

Could you please give us a bit more details about what is going on ?

All this patch does is add hrtimer_cancel().
* It is needed if the timer had already been started, which is
  appropriate AFAIU
* It is a NO-OP is the timer is not active.

>
>
>  - HEAD: (linux-next)
>* 62c04453381e  Jerome Brunet   3 weeks ago mailbox: cancel timer before 
> starting it
>
>  - Platform: arm64 Juno R0 and Juno R2 [1]
>
>  - Partial log:
>   [0.00] Booting Linux on physical CPU 0x000100 [0x410fd030]
>   [0.00] Linux version 5.9.0-rc8-01722-g62c04453381e () 
> (aarch64-none-linux-gnu-gcc (GNU Toolchain for the A-profile Architecture 
> 9.2-2019.12 (arm-9.10)) 9.2.1 20191025, GNU ld (GNU Toolchain for the 
> A-profile Architecture 9.2-2019.12 (arm-9.10)) 2.33.1.20191209) #175 SMP 
> PREEMPT Thu Oct 15 14:17:41 BST 2020
>   [0.00] Machine model: ARM Juno development board (r0)
>   [..]
>   [1.714340] mhu 2b1f.mhu: ARM MHU Mailbox registered
>   [1.722768] NET: Registered protocol family 17
>   [1.727364] 9pnet: Installing 9P2000 support
>   [1.731689] Key type dns_resolver registered
>   [1.735474] usb 1-1: new high-speed USB device number 2 using 
> ehci-platform
>   [1.736407] registered taskstats version 1
>   [1.747061] Loading compiled-in X.509 certificates
>   [1.755885] scpi_protocol scpi: SCP Protocol 1.2 Firmware 1.21.0 
> version
>   [1.770484] cpu cpu0: EM: created perf domain
>   [1.778505] cpu cpu1: EM: created perf domain
>   [1.807449] scpi_clocks scpi:clocks: failed to register clock 
> 'pxlclk'
>   [1.897593] hub 1-1:1.0: USB hub found
>   [1.901656] hub 1-1:1.0: 4 ports detected
>   [2.559453] atkbd serio0: keyboard reset failed on 1c06.kmi
>   [   22.787431] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
>   [   22.793536] rcu: 1-...0: (1 ticks this GP) 
> idle=222/1/0x4002 softirq=63/64 fqs=2626
>   [   22.802421]  (detected by 2, t=5255 jiffies, g=-991, q=9)
>   [   22.807823] Task dump for CPU 1:
>   [   22.811049] task:swapper/1   state:R  running task stack:
> 0 pid:0 ppid: 1 flags:0x002a
>   [   22.820980] Call trace:
>   [   22.823429]  __switch_to+0x138/0x198
>   [   23.583444] rcu: INFO: rcu_preempt detected expedited stalls on 
> CPUs/tasks: { 1-... } 5443 jiffies s: 49 root: 0x2/.
>   [   23.593995] rcu: blocking rcu_node structures:
>   [   23.598449] Task dump for CPU 1:
>   [   23.601680] task:swapper/1   state:R  running task stack:
> 0 pid:0 ppid: 1 flags:0x002a
>   [   23.6

Re: [PATCH] mailbox: cancel timer before starting it

2020-10-15 Thread Ionela Voinescu
Hi guys,

On Wednesday 23 Sep 2020 at 14:39:16 (+0200), Jerome Brunet wrote:
> If the txdone is done by polling, it is possible for msg_submit() to start
> the timer while txdone_hrtimer() callback is running. If the timer needs
> recheduling, it could already be enqueued by the time hrtimer_forward_now()
> is called, leading hrtimer to loudly complain.
> 
> WARNING: CPU: 3 PID: 74 at kernel/time/hrtimer.c:932 
> hrtimer_forward+0xc4/0x110
> CPU: 3 PID: 74 Comm: kworker/u8:1 Not tainted 
> 5.9.0-rc2-00236-gd3520067d01c-dirty #5
> Hardware name: Libre Computer AML-S805X-AC (DT)
> Workqueue: events_freezable_power_ thermal_zone_device_check
> pstate: 2085 (nzCv daIf -PAN -UAO BTYPE=--)
> pc : hrtimer_forward+0xc4/0x110
> lr : txdone_hrtimer+0xf8/0x118
> [...]
> 
> Canceling the timer before starting it ensure that the timer callback is
> not running when the timer is started, solving this race condition.
> 
> Fixes: 0cc67945ea59 ("mailbox: switch to hrtimer for tx_complete polling")
> Reported-by: Da Xue 
> Signed-off-by: Jerome Brunet 
> ---
>  drivers/mailbox/mailbox.c | 8 ++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/mailbox/mailbox.c b/drivers/mailbox/mailbox.c
> index 0b821a5b2db8..34f9ab01caef 100644
> --- a/drivers/mailbox/mailbox.c
> +++ b/drivers/mailbox/mailbox.c
> @@ -82,9 +82,13 @@ static void msg_submit(struct mbox_chan *chan)
>  exit:
>   spin_unlock_irqrestore(&chan->lock, flags);
>  
> - if (!err && (chan->txdone_method & TXDONE_BY_POLL))
> - /* kick start the timer immediately to avoid delays */
> + if (!err && (chan->txdone_method & TXDONE_BY_POLL)) {
> + /* Disable the timer if already active ... */
> + hrtimer_cancel(&chan->mbox->poll_hrt);
> +
> + /* ... and kick start it immediately to avoid delays */
>   hrtimer_start(&chan->mbox->poll_hrt, 0, HRTIMER_MODE_REL);
> + }
>  }
>  
>  static void tx_tick(struct mbox_chan *chan, int r)

I've tracked a regression back to this commit. Details to reproduce:


 - HEAD: (linux-next)
   * 62c04453381e  Jerome Brunet   3 weeks ago mailbox: cancel timer before 
starting it

 - Platform: arm64 Juno R0 and Juno R2 [1]

 - Partial log:
[0.00] Booting Linux on physical CPU 0x000100 [0x410fd030]
[0.00] Linux version 5.9.0-rc8-01722-g62c04453381e () 
(aarch64-none-linux-gnu-gcc (GNU Toolchain for the A-profile Architecture 
9.2-2019.12 (arm-9.10)) 9.2.1 20191025, GNU ld (GNU Toolchain for the A-profile 
Architecture 9.2-2019.12 (arm-9.10)) 2.33.1.20191209) #175 SMP PREEMPT Thu Oct 
15 14:17:41 BST 2020
[0.00] Machine model: ARM Juno development board (r0)
[..]
[1.714340] mhu 2b1f.mhu: ARM MHU Mailbox registered
[1.722768] NET: Registered protocol family 17
[1.727364] 9pnet: Installing 9P2000 support
[1.731689] Key type dns_resolver registered
[1.735474] usb 1-1: new high-speed USB device number 2 using 
ehci-platform
[1.736407] registered taskstats version 1
[1.747061] Loading compiled-in X.509 certificates
[1.755885] scpi_protocol scpi: SCP Protocol 1.2 Firmware 1.21.0 
version
[1.770484] cpu cpu0: EM: created perf domain
[1.778505] cpu cpu1: EM: created perf domain
[1.807449] scpi_clocks scpi:clocks: failed to register clock 
'pxlclk'
[1.897593] hub 1-1:1.0: USB hub found
[1.901656] hub 1-1:1.0: 4 ports detected
[2.559453] atkbd serio0: keyboard reset failed on 1c06.kmi
[   22.787431] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[   22.793536] rcu: 1-...0: (1 ticks this GP) 
idle=222/1/0x4002 softirq=63/64 fqs=2626
[   22.802421]  (detected by 2, t=5255 jiffies, g=-991, q=9)
[   22.807823] Task dump for CPU 1:
[   22.811049] task:swapper/1   state:R  running task stack:
0 pid:0 ppid: 1 flags:0x002a
[   22.820980] Call trace:
[   22.823429]  __switch_to+0x138/0x198
[   23.583444] rcu: INFO: rcu_preempt detected expedited stalls on 
CPUs/tasks: { 1-... } 5443 jiffies s: 49 root: 0x2/.
[   23.593995] rcu: blocking rcu_node structures:
[   23.598449] Task dump for CPU 1:
[   23.601680] task:swapper/1   state:R  running task stack:
0 pid:0 ppid: 1 flags:0x002a
[   23.611619] Call trace:
[   23.614064]  __switch_to+0x138/0x198
[   85.807430] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[   85.813534] rcu: 1-...0: (1 ticks this GP) 
idle=222/1/0x4002 softirq=63/64 fqs=10502
[   85.822506]  (detected by 2, t=21009 jiffies, g=-991, q=9)
[   85.827994] Task dump for CPU 1:
[   85.831220] task:swapper/1   state:R  running task stack:
0 pid:0 ppid: 1 flags:0x

Re: [PATCH] mailbox: cancel timer before starting it

2020-09-24 Thread Kevin Hilman
Jerome Brunet  writes:

> If the txdone is done by polling, it is possible for msg_submit() to start
> the timer while txdone_hrtimer() callback is running. If the timer needs
> recheduling, it could already be enqueued by the time hrtimer_forward_now()
> is called, leading hrtimer to loudly complain.
>
> WARNING: CPU: 3 PID: 74 at kernel/time/hrtimer.c:932 
> hrtimer_forward+0xc4/0x110
> CPU: 3 PID: 74 Comm: kworker/u8:1 Not tainted 
> 5.9.0-rc2-00236-gd3520067d01c-dirty #5
> Hardware name: Libre Computer AML-S805X-AC (DT)
> Workqueue: events_freezable_power_ thermal_zone_device_check
> pstate: 2085 (nzCv daIf -PAN -UAO BTYPE=--)
> pc : hrtimer_forward+0xc4/0x110
> lr : txdone_hrtimer+0xf8/0x118
> [...]
>
> Canceling the timer before starting it ensure that the timer callback is
> not running when the timer is started, solving this race condition.
>
> Fixes: 0cc67945ea59 ("mailbox: switch to hrtimer for tx_complete polling")
> Reported-by: Da Xue 
> Signed-off-by: Jerome Brunet 

Reviewed-by: Kevin Hilman 

Kevin


[PATCH] mailbox: cancel timer before starting it

2020-09-23 Thread Jerome Brunet
If the txdone is done by polling, it is possible for msg_submit() to start
the timer while txdone_hrtimer() callback is running. If the timer needs
recheduling, it could already be enqueued by the time hrtimer_forward_now()
is called, leading hrtimer to loudly complain.

WARNING: CPU: 3 PID: 74 at kernel/time/hrtimer.c:932 hrtimer_forward+0xc4/0x110
CPU: 3 PID: 74 Comm: kworker/u8:1 Not tainted 
5.9.0-rc2-00236-gd3520067d01c-dirty #5
Hardware name: Libre Computer AML-S805X-AC (DT)
Workqueue: events_freezable_power_ thermal_zone_device_check
pstate: 2085 (nzCv daIf -PAN -UAO BTYPE=--)
pc : hrtimer_forward+0xc4/0x110
lr : txdone_hrtimer+0xf8/0x118
[...]

Canceling the timer before starting it ensure that the timer callback is
not running when the timer is started, solving this race condition.

Fixes: 0cc67945ea59 ("mailbox: switch to hrtimer for tx_complete polling")
Reported-by: Da Xue 
Signed-off-by: Jerome Brunet 
---
 drivers/mailbox/mailbox.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/mailbox/mailbox.c b/drivers/mailbox/mailbox.c
index 0b821a5b2db8..34f9ab01caef 100644
--- a/drivers/mailbox/mailbox.c
+++ b/drivers/mailbox/mailbox.c
@@ -82,9 +82,13 @@ static void msg_submit(struct mbox_chan *chan)
 exit:
spin_unlock_irqrestore(&chan->lock, flags);
 
-   if (!err && (chan->txdone_method & TXDONE_BY_POLL))
-   /* kick start the timer immediately to avoid delays */
+   if (!err && (chan->txdone_method & TXDONE_BY_POLL)) {
+   /* Disable the timer if already active ... */
+   hrtimer_cancel(&chan->mbox->poll_hrt);
+
+   /* ... and kick start it immediately to avoid delays */
hrtimer_start(&chan->mbox->poll_hrt, 0, HRTIMER_MODE_REL);
+   }
 }
 
 static void tx_tick(struct mbox_chan *chan, int r)
-- 
2.25.4