Re: [PATCH] blk-mq: fix race between complete and BLK_EH_RESET_TIMER

2018-04-13 Thread Ming Lei
On Thu, Apr 12, 2018 at 06:57:12AM -0700, Tejun Heo wrote:
> On Thu, Apr 12, 2018 at 07:05:13AM +0800, Ming Lei wrote:
> > > Not really, because aborted_gstate currently has no memory barrier
> > > around it, so nothing ensures that the effects of blk_add_timer()
> > > become visible before the aborted_gstate update.  We can either add
> > > matching barriers where aborted_gstate is updated and where it's read
> > > in the normal completion path, or we can wait for the update to be
> > > visible everywhere by waiting for an RCU grace period (because the
> > > reader is RCU-protected).
> > 
> > That seems unnecessary.
> > 
> > Even if it is reordered, the only side effect is that the newly
> > recycled request times out a bit late, which I think we can
> > survive, right?
> 
> At the very least, it can mess up the timeout duration for the next
> recycled instance, because there can be two competing blk_add_timer()
> calls.
> I'm not sure whether there can be other consequences.  When ownership
> isn't clear, it becomes really difficult to reason about these things
> and can lead to subtle failures.  I think it'd be best to always
> establish who owns what.

Please see the code of blk_add_timer() for blk-mq:

	blk_rq_set_deadline(req, jiffies + req->timeout);
	req->rq_flags &= ~RQF_MQ_TIMEOUT_EXPIRED;

	if (!timer_pending(&q->timeout) ||
	    time_before(expiry, q->timeout.expires))
		mod_timer(&q->timeout, expiry);

If the rq has been recycled, blk_add_timer() only touches rq->deadline
and the RQF_MQ_TIMEOUT_EXPIRED flag, so the only effect is that the
timeout may be handled a bit late.  The timeout monitor won't be lost,
because the queue timer is only ever armed or moved earlier, never
later.
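The effect described here, and the competing-writer case Tejun raises, can be illustrated with a small userspace model of the excerpt above. This is a sketch, not kernel code: struct model_queue, struct model_rq, model_time_before() and model_add_timer() are hypothetical names, a plain field stands in for the timer_list, and any rounding the real function applies to the expiry is left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model, NOT kernel code. */
struct model_queue {
	bool timer_pending;		/* stands in for timer_pending(&q->timeout) */
	unsigned long timer_expires;	/* stands in for q->timeout.expires */
};

struct model_rq {
	unsigned long deadline;
	bool timeout_expired;		/* stands in for RQF_MQ_TIMEOUT_EXPIRED */
	unsigned long timeout;
};

/* Wrap-safe comparison in the spirit of the kernel's time_before(). */
static bool model_time_before(unsigned long a, unsigned long b)
{
	return (long)(a - b) < 0;
}

static void model_add_timer(struct model_queue *q, struct model_rq *rq,
			    unsigned long now)
{
	unsigned long expiry;

	assert(rq->timeout > 0);
	rq->deadline = now + rq->timeout;	/* per-request state: last writer wins */
	rq->timeout_expired = false;

	expiry = rq->deadline;			/* no rounding in this model */

	/* The shared queue timer is only ever armed or pulled earlier. */
	if (!q->timer_pending || model_time_before(expiry, q->timer_expires)) {
		q->timer_pending = true;
		q->timer_expires = expiry;
	}
}
```

With a timeout of 30, calling model_add_timer() once for the new instance at time 100 and once more from a delayed, stale call at 105 leaves the queue timer armed at 130 while the request's deadline ends up at 135: the timer is not lost, but the deadline is simply whichever write landed last.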

This is also easy to avoid: as you mentioned, synchronize_rcu() can
be added between blk_add_timer() and resetting aborted_gstate.


thanks,
Ming


Re: [PATCH] blk-mq: fix race between complete and BLK_EH_RESET_TIMER

2018-04-12 Thread Tejun Heo
On Thu, Apr 12, 2018 at 07:05:13AM +0800, Ming Lei wrote:
> > Not really, because aborted_gstate currently has no memory barrier
> > around it, so nothing ensures that the effects of blk_add_timer()
> > become visible before the aborted_gstate update.  We can either add
> > matching barriers where aborted_gstate is updated and where it's read
> > in the normal completion path, or we can wait for the update to be
> > visible everywhere by waiting for an RCU grace period (because the
> > reader is RCU-protected).
> 
> That seems unnecessary.
> 
> Even if it is reordered, the only side effect is that the newly
> recycled request times out a bit late, which I think we can
> survive, right?

At the very least, it can mess up the timeout duration for the next
recycled instance, because there can be two competing blk_add_timer()
calls.
I'm not sure whether there can be other consequences.  When ownership
isn't clear, it becomes really difficult to reason about these things
and can lead to subtle failures.  I think it'd be best to always
establish who owns what.
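One way to make "who owns what" explicit is to let the completion and timeout paths race on a single compare-and-swap of the request state, so that exactly one side wins. The sketch below uses C11 atomics in userspace; the MODEL_* states, struct model_req, model_claim() and model_release() are hypothetical and do not correspond to blk-mq's actual MQ_RQ_* states or helpers.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical states for this sketch only. */
enum model_state {
	MODEL_IDLE,
	MODEL_IN_FLIGHT,
	MODEL_COMPLETING,	/* owned by the normal completion path */
	MODEL_TIMING_OUT,	/* owned by the timeout handler */
};

struct model_req {
	_Atomic int state;
};

/*
 * Try to move the request from IN_FLIGHT to new_state.  The CAS succeeds
 * for exactly one caller.  The loser must not complete or re-arm the
 * request itself; how a lost completion gets handed to the owner is
 * outside this sketch.
 */
static bool model_claim(struct model_req *rq, int new_state)
{
	int expected = MODEL_IN_FLIGHT;

	assert(new_state == MODEL_COMPLETING || new_state == MODEL_TIMING_OUT);
	return atomic_compare_exchange_strong(&rq->state, &expected, new_state);
}

/* Only the current owner may hand the request back, e.g. after re-arming. */
static void model_release(struct model_req *rq, int owned_state)
{
	int expected = owned_state;
	bool ok = atomic_compare_exchange_strong(&rq->state, &expected,
						 MODEL_IN_FLIGHT);

	assert(ok);	/* releasing a request we don't own is a bug */
	(void)ok;
}
```

If the timeout handler wins the claim, a concurrent completion attempt fails and leaves the request alone until model_release() hands it back, so at no point are two paths writing the same request.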

Thanks.

-- 
tejun


Re: [PATCH] blk-mq: fix race between complete and BLK_EH_RESET_TIMER

2018-04-11 Thread Ming Lei
On Wed, Apr 11, 2018 at 10:49:51PM +, Bart Van Assche wrote:
> On Thu, 2018-04-12 at 04:55 +0800, Ming Lei wrote:
> > +again:
> >  	switch (ret) {
> >  	case BLK_EH_HANDLED:
> >  		__blk_mq_complete_request(req);
> >  		break;
> >  	case BLK_EH_RESET_TIMER:
> > [ ... ]
> > +		spin_lock_irqsave(req->q->queue_lock, flags);
> > +		if (blk_mq_rq_state(req) != MQ_RQ_COMPLETE_IN_RESET) {
> > +			blk_mq_rq_update_aborted_gstate(req, 0);
> > +			blk_add_timer(req);
> > +		} else {
> > +			blk_mq_rq_update_state(req, MQ_RQ_IN_FLIGHT);
> > +			ret = BLK_EH_HANDLED;
> > +			goto again;
> > +		}
> > +		spin_unlock_irqrestore(req->q->queue_lock, flags);
> 
> Does the above chunk introduce a backwards goto from inside a region around
> which a spinlock is held to outside that region? Can such a goto result in
> anything other than a deadlock?

Yes, that is fixed in my local V2. :-)

-- 
Ming


Re: [PATCH] blk-mq: fix race between complete and BLK_EH_RESET_TIMER

2018-04-11 Thread Bart Van Assche
On Thu, 2018-04-12 at 04:55 +0800, Ming Lei wrote:
> +again:
>  	switch (ret) {
>  	case BLK_EH_HANDLED:
>  		__blk_mq_complete_request(req);
>  		break;
>  	case BLK_EH_RESET_TIMER:
> [ ... ]
> +		spin_lock_irqsave(req->q->queue_lock, flags);
> +		if (blk_mq_rq_state(req) != MQ_RQ_COMPLETE_IN_RESET) {
> +			blk_mq_rq_update_aborted_gstate(req, 0);
> +			blk_add_timer(req);
> +		} else {
> +			blk_mq_rq_update_state(req, MQ_RQ_IN_FLIGHT);
> +			ret = BLK_EH_HANDLED;
> +			goto again;
> +		}
> +		spin_unlock_irqrestore(req->q->queue_lock, flags);

Does the above chunk introduce a backwards goto from inside a region around
which a spinlock is held to outside that region? Can such a goto result in
anything other than a deadlock?
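A common way to avoid the problem raised here is to only record the decision while the lock is held, drop the lock on every path, and then act on the decision. The userspace sketch below shows that shape. It is not Ming's V2 (which is not shown in this thread); a test-and-set flag stands in for spin_lock_irqsave()/spin_unlock_irqrestore(), and model_spinlock, struct model_rq, enum model_eh and model_timed_out() are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Test-and-set spinlock standing in for spin_lock_irqsave()/restore(). */
typedef struct {
	atomic_flag held;
} model_spinlock;

static void model_lock(model_spinlock *l)
{
	while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
		;	/* spin until the holder clears the flag */
}

static void model_unlock(model_spinlock *l)
{
	atomic_flag_clear_explicit(&l->held, memory_order_release);
}

enum model_eh { MODEL_EH_HANDLED, MODEL_EH_RESET_TIMER };

struct model_rq {
	bool completed_in_reset;	/* stands in for MQ_RQ_COMPLETE_IN_RESET */
	int timer_armed;		/* counts re-arms */
	int completions;		/* counts completions */
};

static void model_timed_out(model_spinlock *lock, struct model_rq *rq,
			    enum model_eh ret)
{
	bool complete_now = false;

	if (ret == MODEL_EH_RESET_TIMER) {
		model_lock(lock);
		if (!rq->completed_in_reset) {
			rq->timer_armed++;		/* re-arm under the lock */
		} else {
			rq->completed_in_reset = false;	/* back to in flight */
			complete_now = true;		/* decide, don't act yet */
		}
		model_unlock(lock);			/* dropped on every path */
		if (complete_now)
			ret = MODEL_EH_HANDLED;
	}

	/* Completion runs only after the lock has been released. */
	if (ret == MODEL_EH_HANDLED)
		rq->completions++;
}
```

Because every path through the locked region ends in model_unlock(), a later acquisition of the same lock cannot spin forever, which is exactly what the backwards goto in the quoted hunk risks.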

Thanks,

Bart.


Re: [PATCH] blk-mq: fix race between complete and BLK_EH_RESET_TIMER

2018-04-11 Thread Tejun Heo
Hello,

On Thu, Apr 12, 2018 at 06:43:45AM +0800, Ming Lei wrote:
> On Wed, Apr 11, 2018 at 02:30:07PM -0700, Tejun Heo wrote:
> > Hello, Ming.
> > 
> > On Thu, Apr 12, 2018 at 04:55:29AM +0800, Ming Lei wrote:
> > ...
> > > +		spin_lock_irqsave(req->q->queue_lock, flags);
> > > +		if (blk_mq_rq_state(req) != MQ_RQ_COMPLETE_IN_RESET) {
> > > +			blk_mq_rq_update_aborted_gstate(req, 0);
> > > +			blk_add_timer(req);
> > 
> > Nothing prevents the above blk_add_timer() from racing against the
> > next recycled instance of the request, so this still leaves a small
> > race window.
> 
> OK.
> 
> But this small race window can be avoided by running blk_add_timer(req)
> before blk_mq_rq_update_aborted_gstate(req, 0), can't it?

Not really, because aborted_gstate currently has no memory barrier
around it, so nothing ensures that the effects of blk_add_timer()
become visible before the aborted_gstate update.  We can either add
matching barriers where aborted_gstate is updated and where it's read
in the normal completion path, or we can wait for the update to be
visible everywhere by waiting for an RCU grace period (because the
reader is RCU-protected).
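The "matching barriers" option can be sketched with C11 release/acquire atomics: the timeout path writes the new deadline and then clears aborted_gstate with a release store, and the completion path reads aborted_gstate with an acquire load. A completion that observes the cleared value is then guaranteed to observe the new deadline as well, and anything it does afterwards (such as recycling the request) is ordered after the re-arm. This is a userspace sketch, not the kernel's primitives (the kernel's counterparts are things like smp_store_release() and smp_load_acquire()); struct model_rq and the helper names are hypothetical, and it assumes the timeout path had already set aborted_gstate equal to a nonzero gstate before re-arming.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

struct model_rq {
	unsigned long deadline;			/* plain field, like rq->deadline */
	_Atomic uint64_t aborted_gstate;	/* published by the timeout path */
	uint64_t gstate;			/* current instance, assumed nonzero */
};

/*
 * Timeout path, RESET_TIMER case: re-arm first, then clear aborted_gstate
 * with a release store so the deadline write cannot become visible after it.
 */
static void model_reset_timer(struct model_rq *rq, unsigned long new_deadline)
{
	rq->deadline = new_deadline;
	atomic_store_explicit(&rq->aborted_gstate, 0, memory_order_release);
}

/*
 * Completion path: the acquire load pairs with the release store above.
 * Returns false while the timeout path still claims this instance;
 * otherwise reports the deadline it is guaranteed to see.
 */
static bool model_may_complete(struct model_rq *rq, unsigned long *deadline_seen)
{
	uint64_t aborted = atomic_load_explicit(&rq->aborted_gstate,
						memory_order_acquire);

	assert(rq->gstate != 0);	/* 0 means "not aborted" in this model */
	if (aborted == rq->gstate)
		return false;
	*deadline_seen = rq->deadline;
	return true;
}
```

Single-threaded use only shows the functional contract (blocked while aborted, then sees the new deadline); the ordering guarantee itself comes from the release/acquire pairing between two threads.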

Thanks.

-- 
tejun


Re: [PATCH] blk-mq: fix race between complete and BLK_EH_RESET_TIMER

2018-04-11 Thread Ming Lei
On Wed, Apr 11, 2018 at 02:30:07PM -0700, Tejun Heo wrote:
> Hello, Ming.
> 
> On Thu, Apr 12, 2018 at 04:55:29AM +0800, Ming Lei wrote:
> ...
> > +		spin_lock_irqsave(req->q->queue_lock, flags);
> > +		if (blk_mq_rq_state(req) != MQ_RQ_COMPLETE_IN_RESET) {
> > +			blk_mq_rq_update_aborted_gstate(req, 0);
> > +			blk_add_timer(req);
> 
> Nothing prevents the above blk_add_timer() from racing against the
> next recycled instance of the request, so this still leaves a small
> race window.

OK.

But this small race window can be avoided by running blk_add_timer(req)
before blk_mq_rq_update_aborted_gstate(req, 0), can't it?

-- 
Ming


Re: [PATCH] blk-mq: fix race between complete and BLK_EH_RESET_TIMER

2018-04-11 Thread Tejun Heo
Hello, Ming.

On Thu, Apr 12, 2018 at 04:55:29AM +0800, Ming Lei wrote:
...
> +		spin_lock_irqsave(req->q->queue_lock, flags);
> +		if (blk_mq_rq_state(req) != MQ_RQ_COMPLETE_IN_RESET) {
> +			blk_mq_rq_update_aborted_gstate(req, 0);
> +			blk_add_timer(req);

Nothing prevents the above blk_add_timer() from racing against the
next recycled instance of the request, so this still leaves a small
race window.
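A "recycled instance" here means the same struct request being reused for a new I/O with a new generation. The sketch below packs a state and a generation counter into one 64-bit word, loosely modeled on how blk-mq tracks gstate and aborted_gstate; the MODEL_* constants and helpers are hypothetical and the real bit layout may differ. A snapshot taken for one instance stops matching once the request is restarted, but a plain store such as the deadline write in blk_add_timer() carries no generation at all, so a delayed write can land on whichever instance currently holds the request.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical layout: the low bits hold the state, the remaining bits
 * are a generation counter bumped each time the request is (re)started.
 */
#define MODEL_STATE_BITS 2
#define MODEL_STATE_MASK ((UINT64_C(1) << MODEL_STATE_BITS) - 1)
#define MODEL_GEN_INC    (UINT64_C(1) << MODEL_STATE_BITS)

enum { MODEL_IDLE = 0, MODEL_IN_FLIGHT = 1, MODEL_COMPLETE = 2 };

/* Start a new instance: bump the generation and mark it in flight. */
static uint64_t model_start(uint64_t gstate)
{
	return ((gstate & ~MODEL_STATE_MASK) + MODEL_GEN_INC) | MODEL_IN_FLIGHT;
}

/* Change only the state bits; the generation is preserved. */
static uint64_t model_set_state(uint64_t gstate, unsigned int state)
{
	assert(state <= MODEL_STATE_MASK);
	return (gstate & ~MODEL_STATE_MASK) | state;
}

/* A snapshot taken by the timeout path only matches the same instance. */
static bool model_same_instance(uint64_t snapshot, uint64_t gstate)
{
	return snapshot == gstate;
}
```

Starting from 0, the first instance is generation 1 in flight (value 5); after completing, going idle and being restarted, the request becomes generation 2 in flight (value 9), so a snapshot of the first instance no longer matches.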

Thanks.

-- 
tejun