Re: [PATCH] blk-mq: fix race between complete and BLK_EH_RESET_TIMER
On Thu, Apr 12, 2018 at 06:57:12AM -0700, Tejun Heo wrote:
> On Thu, Apr 12, 2018 at 07:05:13AM +0800, Ming Lei wrote:
> > > Not really because aborted_gstate right now doesn't have any memory
> > > barrier around it, so nothing ensures blk_add_timer() actually appears
> > > before. We can either add the matching barriers in aborted_gstate
> > > update and when it's read in the normal completion path, or we can
> > > wait for the update to be visible everywhere by waiting for rcu grace
> > > period (because the reader is rcu protected).
> >
> > Seems not necessary.
> >
> > Suppose it is out of order, the only side-effect is that the new
> > recycled request is timed out as a bit late, I think that is what
> > we can survive, right?
>
> It at least can mess up the timeout duration for the next recycle
> instance because there can be two competing blk_add_timer() instances.
> I'm not sure whether there can be other consequences. When ownership
> isn't clear, it becomes really difficult to reason about these things
> and can lead to subtle failures. I think it'd be best to always
> establish who owns what.

Please see the code of blk_add_timer() for blk-mq:

	blk_rq_set_deadline(req, jiffies + req->timeout);
	req->rq_flags &= ~RQF_MQ_TIMEOUT_EXPIRED;

	if (!timer_pending(&q->timeout) ||
	    time_before(expiry, q->timeout.expires))
		mod_timer(&q->timeout, expiry);

If this rq is recycled, blk_add_timer() only touches rq->deadline and the
EXPIRED flags, and the only effect is that the timeout may be handled a bit
late, but the timeout monitor won't be lost.

And this thing shouldn't be difficult to avoid, as you mentioned,
synchronize_rcu() can be added between blk_add_timer() and resetting
aborted gstate for avoiding it.

thanks,
Ming
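The re-arm condition in the snippet quoted above can be tried out in
userspace with the rough sketch below. It is not kernel code: struct
queue_timer and arm_for_deadline() are hypothetical stand-ins for q->timeout
and the tail of blk_add_timer(), and only the time_before() comparison is
meant to mirror the kernel's wraparound-safe jiffies check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Wraparound-safe "a is earlier than b" for unsigned long tick counters,
 * modelled on the kernel's time_before(). */
#define time_before(a, b) ((long)((a) - (b)) < 0)

/* Hypothetical stand-in for the shared per-queue timer (q->timeout). */
struct queue_timer {
	bool pending;
	unsigned long expires;
};

/* Re-arm the shared timer only if it is idle or the new expiry is earlier.
 * Returns true when the timer was (re)armed. */
static bool arm_for_deadline(struct queue_timer *t, unsigned long expiry)
{
	assert(t != NULL);

	if (!t->pending || time_before(expiry, t->expires)) {
		t->pending = true;
		t->expires = expiry;
		return true;
	}
	/* A later expiry leaves the already-armed timer untouched. */
	return false;
}
```

In this model a second caller with a later expiry never pushes the timer
back, which is the "handled a bit late, but not lost" behaviour described
above. The sketch says nothing about the ownership concern raised earlier in
the thread.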
Re: [PATCH] blk-mq: fix race between complete and BLK_EH_RESET_TIMER
On Thu, Apr 12, 2018 at 07:05:13AM +0800, Ming Lei wrote:
> > Not really because aborted_gstate right now doesn't have any memory
> > barrier around it, so nothing ensures blk_add_timer() actually appears
> > before. We can either add the matching barriers in aborted_gstate
> > update and when it's read in the normal completion path, or we can
> > wait for the update to be visible everywhere by waiting for rcu grace
> > period (because the reader is rcu protected).
>
> Seems not necessary.
>
> Suppose it is out of order, the only side-effect is that the new
> recycled request is timed out as a bit late, I think that is what
> we can survive, right?

It at least can mess up the timeout duration for the next recycle
instance because there can be two competing blk_add_timer() instances.
I'm not sure whether there can be other consequences. When ownership
isn't clear, it becomes really difficult to reason about these things
and can lead to subtle failures. I think it'd be best to always
establish who owns what.

Thanks.

--
tejun
Re: [PATCH] blk-mq: fix race between complete and BLK_EH_RESET_TIMER
On Wed, Apr 11, 2018 at 10:49:51PM +, Bart Van Assche wrote:
> On Thu, 2018-04-12 at 04:55 +0800, Ming Lei wrote:
> > +again:
> >  	switch (ret) {
> >  	case BLK_EH_HANDLED:
> >  		__blk_mq_complete_request(req);
> >  		break;
> >  	case BLK_EH_RESET_TIMER:
> > [ ... ]
> > +		spin_lock_irqsave(req->q->queue_lock, flags);
> > +		if (blk_mq_rq_state(req) != MQ_RQ_COMPLETE_IN_RESET) {
> > +			blk_mq_rq_update_aborted_gstate(req, 0);
> > +			blk_add_timer(req);
> > +		} else {
> > +			blk_mq_rq_update_state(req, MQ_RQ_IN_FLIGHT);
> > +			ret = BLK_EH_HANDLED;
> > +			goto again;
> > +		}
> > +		spin_unlock_irqrestore(req->q->queue_lock, flags);
>
> Does the above chunk introduce a backwards goto from inside a region around
> which a spinlock is held to outside that region? Can such a goto result in
> anything else than a deadlock?

Yes, it is being fixed in my local V2, :-)

--
Ming
Re: [PATCH] blk-mq: fix race between complete and BLK_EH_RESET_TIMER
On Thu, 2018-04-12 at 04:55 +0800, Ming Lei wrote:
> +again:
>  	switch (ret) {
>  	case BLK_EH_HANDLED:
>  		__blk_mq_complete_request(req);
>  		break;
>  	case BLK_EH_RESET_TIMER:
> [ ... ]
> +		spin_lock_irqsave(req->q->queue_lock, flags);
> +		if (blk_mq_rq_state(req) != MQ_RQ_COMPLETE_IN_RESET) {
> +			blk_mq_rq_update_aborted_gstate(req, 0);
> +			blk_add_timer(req);
> +		} else {
> +			blk_mq_rq_update_state(req, MQ_RQ_IN_FLIGHT);
> +			ret = BLK_EH_HANDLED;
> +			goto again;
> +		}
> +		spin_unlock_irqrestore(req->q->queue_lock, flags);

Does the above chunk introduce a backwards goto from inside a region around
which a spinlock is held to outside that region? Can such a goto result in
anything else than a deadlock?

Thanks,

Bart.
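The locking problem raised above can be illustrated with a small userspace
model in which every path drops the lock before completion is attempted.
This is only a sketch under stated assumptions, not the V2 patch (which is
not shown in this thread): toy_lock, toy_req, eh_ret and handle_timeout()
are all hypothetical names, and the toy lock merely asserts where a real
spinlock would deadlock or stay held.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy lock that only records whether it is held (hypothetical stand-in
 * for queue_lock). */
struct toy_lock { bool held; };

static void toy_lock_acquire(struct toy_lock *l)
{
	assert(!l->held);	/* re-acquiring here models the deadlock */
	l->held = true;
}

static void toy_lock_release(struct toy_lock *l)
{
	assert(l->held);
	l->held = false;
}

enum eh_ret { EH_HANDLED, EH_RESET_TIMER };

struct toy_req {
	bool completed_during_reset;	/* models MQ_RQ_COMPLETE_IN_RESET */
	bool timer_rearmed;
	bool completed;
};

static void handle_timeout(struct toy_lock *lock, struct toy_req *rq,
			   enum eh_ret ret)
{
	if (ret == EH_RESET_TIMER) {
		bool complete_now;

		toy_lock_acquire(lock);
		complete_now = rq->completed_during_reset;
		if (!complete_now)
			rq->timer_rearmed = true;
		toy_lock_release(lock);	/* both branches release the lock */

		if (!complete_now)
			return;
		/* Fall through to completion, outside the locked region. */
	}
	rq->completed = true;
}
```

Recording the decision in a local variable and acting on it after the unlock
avoids any jump out of the locked region.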
Re: [PATCH] blk-mq: fix race between complete and BLK_EH_RESET_TIMER
Hello,

On Thu, Apr 12, 2018 at 06:43:45AM +0800, Ming Lei wrote:
> On Wed, Apr 11, 2018 at 02:30:07PM -0700, Tejun Heo wrote:
> > Hello, Ming.
> >
> > On Thu, Apr 12, 2018 at 04:55:29AM +0800, Ming Lei wrote:
> > ...
> > > +		spin_lock_irqsave(req->q->queue_lock, flags);
> > > +		if (blk_mq_rq_state(req) != MQ_RQ_COMPLETE_IN_RESET) {
> > > +			blk_mq_rq_update_aborted_gstate(req, 0);
> > > +			blk_add_timer(req);
> >
> > Nothing prevents the above blk_add_timer() racing against the next
> > recycle instance of the request, so this still leaves a small race
> > window.
>
> OK.
>
> But this small race window can be avoided by running blk_add_timer(req)
> before blk_mq_rq_update_aborted_gstate(req, 0), can't it?

Not really because aborted_gstate right now doesn't have any memory
barrier around it, so nothing ensures blk_add_timer() actually appears
before. We can either add the matching barriers in aborted_gstate
update and when it's read in the normal completion path, or we can
wait for the update to be visible everywhere by waiting for rcu grace
period (because the reader is rcu protected).

Thanks.

--
tejun
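The "matching barriers" option mentioned above can be sketched with C11
atomics as a userspace analogue. This is an assumption-laden illustration,
not the blk-mq code: struct toy_req, reset_timer() and
read_deadline_if_cleared() are hypothetical, and kernel code would use its
own primitives (for example smp_store_release() and smp_load_acquire())
rather than <stdatomic.h>.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical request model: deadline is plain data, aborted_gstate is
 * the flag whose update must become visible after the deadline write. */
struct toy_req {
	unsigned long deadline;
	_Atomic unsigned long aborted_gstate;
};

/* Timeout side: write the new deadline first, then publish the cleared
 * state with release semantics so the two stores cannot be reordered. */
static void reset_timer(struct toy_req *rq, unsigned long new_deadline)
{
	rq->deadline = new_deadline;
	atomic_store_explicit(&rq->aborted_gstate, 0, memory_order_release);
}

/* Completion side: an acquire load that observes the cleared state also
 * observes the deadline written before the matching release store. */
static bool read_deadline_if_cleared(struct toy_req *rq,
				     unsigned long *deadline)
{
	if (atomic_load_explicit(&rq->aborted_gstate,
				 memory_order_acquire) != 0)
		return false;
	*deadline = rq->deadline;
	return true;
}
```

Without the release/acquire pair, simply calling the timer setup before the
state update in program order gives no such visibility guarantee on another
CPU, which is the point made above. The alternative named there, waiting for
an RCU grace period, is not modelled here.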
Re: [PATCH] blk-mq: fix race between complete and BLK_EH_RESET_TIMER
On Wed, Apr 11, 2018 at 02:30:07PM -0700, Tejun Heo wrote:
> Hello, Ming.
>
> On Thu, Apr 12, 2018 at 04:55:29AM +0800, Ming Lei wrote:
> ...
> > +		spin_lock_irqsave(req->q->queue_lock, flags);
> > +		if (blk_mq_rq_state(req) != MQ_RQ_COMPLETE_IN_RESET) {
> > +			blk_mq_rq_update_aborted_gstate(req, 0);
> > +			blk_add_timer(req);
>
> Nothing prevents the above blk_add_timer() racing against the next
> recycle instance of the request, so this still leaves a small race
> window.

OK.

But this small race window can be avoided by running blk_add_timer(req)
before blk_mq_rq_update_aborted_gstate(req, 0), can't it?

--
Ming
Re: [PATCH] blk-mq: fix race between complete and BLK_EH_RESET_TIMER
Hello, Ming.

On Thu, Apr 12, 2018 at 04:55:29AM +0800, Ming Lei wrote:
...
> +		spin_lock_irqsave(req->q->queue_lock, flags);
> +		if (blk_mq_rq_state(req) != MQ_RQ_COMPLETE_IN_RESET) {
> +			blk_mq_rq_update_aborted_gstate(req, 0);
> +			blk_add_timer(req);

Nothing prevents the above blk_add_timer() racing against the next
recycle instance of the request, so this still leaves a small race
window.

Thanks.

--
tejun