Re: [PATCH V3] blk-mq: fix race between complete and BLK_EH_RESET_TIMER
On Sat, Apr 14, 2018 at 03:22:07PM +, Bart Van Assche wrote:
> On Fri, 2018-04-13 at 21:06 -0600, Jens Axboe wrote:
> > I like this approach since it keeps the cost outside of the fast
> > path. And it's fine to reuse the queue lock for this, instead of
> > adding a special lock for something we consider a rare occurrence.
> >
> > From a quick look this looks sane, but I'll take a closer look
> > tomorrow and add some testing too.

Jens, please hold on. I will post V4 soon, which improves on V3 a bit.

> Shouldn't we know the root cause of the "RIP: scsi_times_out+0x17" crash
> reported in
> https://bugzilla.kernel.org/show_bug.cgi?id=199077 before we decide how to
> proceed?

I will ask Martin to test V4 once it is posted.

Thanks,
Ming
Re: [PATCH V3] blk-mq: fix race between complete and BLK_EH_RESET_TIMER
On Fri, 2018-04-13 at 21:06 -0600, Jens Axboe wrote:
> I like this approach since it keeps the cost outside of the fast
> path. And it's fine to reuse the queue lock for this, instead of
> adding a special lock for something we consider a rare occurrence.
>
> From a quick look this looks sane, but I'll take a closer look
> tomorrow and add some testing too.

Shouldn't we know the root cause of the "RIP: scsi_times_out+0x17" crash
reported in https://bugzilla.kernel.org/show_bug.cgi?id=199077 before we
decide how to proceed?

Thanks,

Bart.
Re: [PATCH V3] blk-mq: fix race between complete and BLK_EH_RESET_TIMER
On 4/12/18 5:59 AM, Ming Lei wrote:
> The normal request completion can be done before or during handling
> BLK_EH_RESET_TIMER, and this race may cause the request to never be
> completed since driver's .timeout() may always return
> BLK_EH_RESET_TIMER.
>
> This issue can't be fixed completely by driver, since the normal
> completion can be done between returning .timeout() and handling
> BLK_EH_RESET_TIMER.
>
> This patch fixes the race by introducing rq state of
> MQ_RQ_COMPLETE_IN_RESET, and reading/writing rq's state by holding
> queue lock, which can be per-request actually, but just not necessary
> to introduce one lock for so unusual event.
>
> Also when .timeout() returns BLK_EH_HANDLED, sync with normal
> completion path before completing this timed-out rq finally for
> avoiding this rq's state touched by normal completion.

I like this approach since it keeps the cost outside of the fast
path. And it's fine to reuse the queue lock for this, instead of
adding a special lock for something we consider a rare occurrence.

From a quick look this looks sane, but I'll take a closer look
tomorrow and add some testing too.

--
Jens Axboe
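
For readers following the thread, the race described in the quoted patch description can be modelled roughly as below. This is a userspace sketch and not the actual patch: the state names other than the COMPLETE_IN_RESET idea, the struct layout, and the functions timeout_begin(), normal_complete() and timeout_reset_timer() are all hypothetical, and a pthread mutex stands in for the queue lock. Unlike the patch, which keeps the lock out of the fast path, this model takes the lock on every completion to keep the example short.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical request states for this model (not the kernel's enum). */
enum rq_state {
    RQ_IN_FLIGHT,         /* issued to the driver, timer armed */
    RQ_IN_TIMEOUT,        /* timeout path owns the request */
    RQ_COMPLETE_IN_RESET, /* completion arrived while the timeout path owned it */
    RQ_COMPLETE,          /* fully completed */
};

struct request {
    enum rq_state state;
    int timer_rearmed;    /* counts how often the timer was re-armed */
};

/* Stand-in for the queue lock that the patch description reuses. */
pthread_mutex_t queue_lock = PTHREAD_MUTEX_INITIALIZER;

/* Timeout path: claim the request before the driver's .timeout() runs. */
bool timeout_begin(struct request *rq)
{
    bool claimed;

    assert(rq != NULL);
    pthread_mutex_lock(&queue_lock);
    claimed = (rq->state == RQ_IN_FLIGHT);
    if (claimed)
        rq->state = RQ_IN_TIMEOUT;
    pthread_mutex_unlock(&queue_lock);
    return claimed;
}

/* Normal completion: if the timeout path owns the request, only record it. */
void normal_complete(struct request *rq)
{
    assert(rq != NULL);
    pthread_mutex_lock(&queue_lock);
    if (rq->state == RQ_IN_TIMEOUT)
        rq->state = RQ_COMPLETE_IN_RESET;
    else if (rq->state == RQ_IN_FLIGHT)
        rq->state = RQ_COMPLETE;
    pthread_mutex_unlock(&queue_lock);
}

/* Timeout path: .timeout() asked for a timer reset (BLK_EH_RESET_TIMER). */
void timeout_reset_timer(struct request *rq)
{
    assert(rq != NULL);
    pthread_mutex_lock(&queue_lock);
    if (rq->state == RQ_COMPLETE_IN_RESET) {
        /* Completion raced with us: finish the request, do not re-arm. */
        rq->state = RQ_COMPLETE;
    } else if (rq->state == RQ_IN_TIMEOUT) {
        /* No completion seen: hand the request back and re-arm the timer. */
        rq->state = RQ_IN_FLIGHT;
        rq->timer_rearmed++;
    }
    pthread_mutex_unlock(&queue_lock);
}
```

In this model, a completion that lands between timeout_begin() and timeout_reset_timer() is never lost: the reset path sees RQ_COMPLETE_IN_RESET under the lock and completes the request instead of re-arming the timer, which is the behaviour the patch description is aiming for.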