Re: [PATCH] blk-mq: Directly schedule q->timeout_work when aborting a request

2018-04-10 Thread Martin Steigerwald
Martin Steigerwald - 10.04.18, 20:43:
> Tejun Heo - 03.04.18, 00:04:
> > Request abortion is performed by overriding deadline to now and
> > scheduling timeout handling immediately.  For the latter part, the
> > code was using mod_timer(timeout, 0) which can't guarantee that the
> > timer runs afterwards.  Let's schedule the underlying work item
> > directly instead.
> > 
> > This fixes the hangs during probing reported by Sitsofe but it isn't
> > yet clear to me how the failure can happen reliably if it's just the
> > above described race condition.
> 
> Compiling a 4.16.1 kernel with that patch to test whether this fixes
> the boot hang I reported in:
> 
> [Possible REGRESSION, 4.16-rc4] Error updating SMART data during
> runtime and boot failures with blk_mq_terminate_expired in backtrace
> https://bugzilla.kernel.org/show_bug.cgi?id=199077

Fails as well, see

https://bugzilla.kernel.org/show_bug.cgi?id=199077#c8

for a photo of (part of) the backtrace.

> The "Error updating SMART data during runtime" thing I reported there
> as well may still be another (independent) issue.
> 
> > Signed-off-by: Tejun Heo 
> > Reported-by: Sitsofe Wheeler 
> > Reported-by: Meelis Roos 
> > Fixes: 358f70da49d7 ("blk-mq: make blk_abort_request() trigger timeout path")
> > Cc: sta...@vger.kernel.org # v4.16
> > Link:
> > http://lkml.kernel.org/r/CALjAwxh-PVYFnYFCJpGOja+m5SzZ8Sa4J7ohxdK=r8NyOF-EM a...@mail.gmail.com
> > Link: http://lkml.kernel.org/r/alpine.lrh.2.21.1802261049140.4...@math.ut.ee
> > ---
> > Hello,
> > 
> > I don't have the full explanation yet but here's a preliminary
> > patch.
> > 
> > Thanks.
> > 
> >  block/blk-timeout.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/block/blk-timeout.c b/block/blk-timeout.c
> > index a05e367..f0e6e41 100644
> > --- a/block/blk-timeout.c
> > +++ b/block/blk-timeout.c
> > @@ -165,7 +165,7 @@ void blk_abort_request(struct request *req)
> >  		 * No need for fancy synchronizations.
> >  		 */
> >  		blk_rq_set_deadline(req, jiffies);
> > -		mod_timer(&req->q->timeout, 0);
> > +		kblockd_schedule_work(&req->q->timeout_work);
> >  	} else {
> >  		if (blk_mark_rq_complete(req))
> >  			return;


-- 
Martin




Re: [PATCH] blk-mq: Directly schedule q->timeout_work when aborting a request

2018-04-10 Thread Martin Steigerwald
Tejun Heo - 03.04.18, 00:04:
> Request abortion is performed by overriding deadline to now and
> scheduling timeout handling immediately.  For the latter part, the
> code was using mod_timer(timeout, 0) which can't guarantee that the
> timer runs afterwards.  Let's schedule the underlying work item
> directly instead.
> 
> This fixes the hangs during probing reported by Sitsofe but it isn't
> yet clear to me how the failure can happen reliably if it's just the
> above described race condition.

Compiling a 4.16.1 kernel with that patch to test whether this fixes the boot 
hang I reported in:

[Possible REGRESSION, 4.16-rc4] Error updating SMART data during runtime and 
boot failures with blk_mq_terminate_expired in backtrace
https://bugzilla.kernel.org/show_bug.cgi?id=199077

The "Error updating SMART data during runtime" thing I reported there as well 
may still be another (independent) issue.

> Signed-off-by: Tejun Heo 
> Reported-by: Sitsofe Wheeler 
> Reported-by: Meelis Roos 
> Fixes: 358f70da49d7 ("blk-mq: make blk_abort_request() trigger timeout path")
> Cc: sta...@vger.kernel.org # v4.16
> Link:
> http://lkml.kernel.org/r/CALjAwxh-PVYFnYFCJpGOja+m5SzZ8Sa4J7ohxdK=r8NyOF-EM a...@mail.gmail.com
> Link: http://lkml.kernel.org/r/alpine.lrh.2.21.1802261049140.4...@math.ut.ee
> ---
> Hello,
> 
> I don't have the full explanation yet but here's a preliminary patch.
> 
> Thanks.
> 
>  block/blk-timeout.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/block/blk-timeout.c b/block/blk-timeout.c
> index a05e367..f0e6e41 100644
> --- a/block/blk-timeout.c
> +++ b/block/blk-timeout.c
> @@ -165,7 +165,7 @@ void blk_abort_request(struct request *req)
>  		 * No need for fancy synchronizations.
>  		 */
>  		blk_rq_set_deadline(req, jiffies);
> -		mod_timer(&req->q->timeout, 0);
> +		kblockd_schedule_work(&req->q->timeout_work);
>  	} else {
>  		if (blk_mark_rq_complete(req))
>  			return;
-- 
Martin




Re: [PATCH] blk-mq: Directly schedule q->timeout_work when aborting a request

2018-04-06 Thread Sasha Levin
Hi,

[This is an automated email]

This commit has been processed because it contains a "Fixes:" tag,
fixing commit: 358f70da49d7 blk-mq: make blk_abort_request() trigger timeout path.

The bot has also determined it's probably a bug fixing patch. (score: 98.7780)

The bot has tested the following trees: v4.16.

v4.16: Build OK!

--
Thanks,
Sasha

Re: [PATCH] blk-mq: Directly schedule q->timeout_work when aborting a request

2018-04-02 Thread Jens Axboe
On 4/2/18 4:04 PM, Tejun Heo wrote:
> Request abortion is performed by overriding deadline to now and
> scheduling timeout handling immediately.  For the latter part, the
> code was using mod_timer(timeout, 0) which can't guarantee that the
> timer runs afterwards.  Let's schedule the underlying work item
> directly instead.
> 
> This fixes the hangs during probing reported by Sitsofe but it isn't
> yet clear to me how the failure can happen reliably if it's just the
> above described race condition.
> 
> Signed-off-by: Tejun Heo 
> Reported-by: Sitsofe Wheeler 
> Reported-by: Meelis Roos 
> Fixes: 358f70da49d7 ("blk-mq: make blk_abort_request() trigger timeout path")
> Cc: sta...@vger.kernel.org # v4.16
> Link: 
> http://lkml.kernel.org/r/CALjAwxh-PVYFnYFCJpGOja+m5SzZ8Sa4J7ohxdK=r8nyof-...@mail.gmail.com
> Link: http://lkml.kernel.org/r/alpine.lrh.2.21.1802261049140.4...@math.ut.ee
> ---
> Hello,
> 
> I don't have the full explanation yet but here's a preliminary patch.
> 
> Thanks.
> 
>  block/blk-timeout.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/block/blk-timeout.c b/block/blk-timeout.c
> index a05e367..f0e6e41 100644
> --- a/block/blk-timeout.c
> +++ b/block/blk-timeout.c
> @@ -165,7 +165,7 @@ void blk_abort_request(struct request *req)
>  		 * No need for fancy synchronizations.
>  		 */
>  		blk_rq_set_deadline(req, jiffies);
> -		mod_timer(&req->q->timeout, 0);
> +		kblockd_schedule_work(&req->q->timeout_work);
>  	} else {
>  		if (blk_mark_rq_complete(req))
>  			return;

In any case, it's cleaner than relying on mod_timer(.., 0). If that
doesn't guarantee that the timer runs again, I can see how a race
with the running timer could prevent us from seeing the timeout
after an abort.
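One hypothetical interleaving that fits this description, modelled in userspace. It assumes, without confirmation from this thread (Tejun says the full explanation is still missing), that a timeout scan which is already running can re-arm the same timer for a later deadline after the abort's mod_timer() call. fake_timer, fake_work and all helpers below are invented for illustration and are not kernel APIs. The work-item model only borrows the general workqueue behaviour that queueing is a no-op while an item is pending and that the pending state is cleared before the work function runs.

```c
#include <assert.h>
#include <stdbool.h>

/* Same wraparound-safe comparison as the kernel's time_after(). */
#define time_after(a, b) ((long)((b) - (a)) < 0)

/* Hypothetical single-slot timer: as with mod_timer(), a later call
 * simply replaces the expiry set by an earlier one (last writer wins). */
struct fake_timer {
	bool pending;
	unsigned long expires;
};

static void fake_mod_timer(struct fake_timer *t, unsigned long expires)
{
	t->pending = true;
	t->expires = expires;
}

/* Old scheme: the abort asks the timer for an immediate run (modelled
 * here as expires = now). Optionally, a scan that is already running
 * then re-arms the same timer for its own next deadline (an assumption).
 * Returns true if the timer still fires immediately. */
static bool abort_via_timer_fires_now(unsigned long now, unsigned long next,
				      bool running_scan_rearms)
{
	struct fake_timer t = { false, 0 };

	fake_mod_timer(&t, now);		/* abort: "scan right away" */
	if (running_scan_rearms)
		fake_mod_timer(&t, next);	/* scan finishes, re-arms later */
	return t.pending && !time_after(t.expires, now);
}

/* Hypothetical work item: queueing is a no-op while pending, and the
 * pending flag is cleared before the work function body starts. */
struct fake_work {
	bool pending;
};

static bool fake_queue_work(struct fake_work *w)
{
	if (w->pending)
		return false;		/* already queued: nothing to do */
	w->pending = true;
	return true;
}

static void fake_work_start(struct fake_work *w)
{
	w->pending = false;		/* cleared before the body runs */
}

/* New scheme: the abort queues the scan work directly while an earlier
 * scan is running. Returns true if one more scan run gets queued. */
static bool abort_via_work_rescans(void)
{
	struct fake_work w = { false };

	fake_queue_work(&w);		/* earlier scan gets queued... */
	fake_work_start(&w);		/* ...and starts running */
	return fake_queue_work(&w);	/* abort: queues one more run */
}
```

In this model the timer variant loses its "fire now" request whenever the running scan re-arms afterwards, while the work variant always gets another scan. If the scan is merely pending rather than running, fake_queue_work() returns false, but that pending run still starts after the deadline was overridden, so either way a scan follows the abort.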

I'll apply this, thanks.

-- 
Jens Axboe



[PATCH] blk-mq: Directly schedule q->timeout_work when aborting a request

2018-04-02 Thread Tejun Heo
Request abortion is performed by overriding deadline to now and
scheduling timeout handling immediately.  For the latter part, the
code was using mod_timer(timeout, 0) which can't guarantee that the
timer runs afterwards.  Let's schedule the underlying work item
directly instead.

This fixes the hangs during probing reported by Sitsofe but it isn't
yet clear to me how the failure can happen reliably if it's just the
above described race condition.

Signed-off-by: Tejun Heo 
Reported-by: Sitsofe Wheeler 
Reported-by: Meelis Roos 
Fixes: 358f70da49d7 ("blk-mq: make blk_abort_request() trigger timeout path")
Cc: sta...@vger.kernel.org # v4.16
Link: 
http://lkml.kernel.org/r/CALjAwxh-PVYFnYFCJpGOja+m5SzZ8Sa4J7ohxdK=r8nyof-...@mail.gmail.com
Link: http://lkml.kernel.org/r/alpine.lrh.2.21.1802261049140.4...@math.ut.ee
---
Hello,

I don't have the full explanation yet but here's a preliminary patch.

Thanks.

 block/blk-timeout.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/blk-timeout.c b/block/blk-timeout.c
index a05e367..f0e6e41 100644
--- a/block/blk-timeout.c
+++ b/block/blk-timeout.c
@@ -165,7 +165,7 @@ void blk_abort_request(struct request *req)
 		 * No need for fancy synchronizations.
 		 */
 		blk_rq_set_deadline(req, jiffies);
-		mod_timer(&req->q->timeout, 0);
+		kblockd_schedule_work(&req->q->timeout_work);
 	} else {
 		if (blk_mark_rq_complete(req))
 			return;
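A minimal userspace sketch of the "override the deadline to now" half of the mechanism described in the patch, assuming a timeout scan that treats a request as expired once time_after_eq(now, deadline) holds. fake_request, fake_rq_expired() and fake_abort() are hypothetical names for illustration, not kernel APIs; only the time_after_eq() comparison mirrors the kernel's macro from include/linux/jiffies.h (minus its typecheck() guards).

```c
#include <assert.h>
#include <stdbool.h>

/* Wraparound-safe "a is at or after b", as in the kernel's macro. */
#define time_after_eq(a, b) ((long)((a) - (b)) >= 0)

/* Hypothetical, heavily simplified request: only the deadline matters. */
struct fake_request {
	unsigned long deadline;
};

/* Hypothetical stand-in for the per-request check a timeout scan makes:
 * the request counts as expired once "now" has reached its deadline. */
static bool fake_rq_expired(const struct fake_request *rq, unsigned long now)
{
	return time_after_eq(now, rq->deadline);
}

/* Models the effect of blk_rq_set_deadline(req, jiffies): pull the
 * deadline in to "now" so that any later scan sees the request expired. */
static void fake_abort(struct fake_request *rq, unsigned long now)
{
	rq->deadline = now;
}
```

For example, with now just below the wrap point ((unsigned long)-10) and a deadline 7500 ticks later (which wraps past zero), fake_rq_expired() is false before fake_abort() and true after it. The deadline is only consulted when a scan actually runs, which is why the patch is concerned with guaranteeing that the scan gets scheduled.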

