Re: [PATCH v4] blk-mq: Fix race conditions in request timeout handling

2018-04-18 Thread Bart Van Assche
On 04/10/18 14:54, t...@kernel.org wrote: On Tue, Apr 10, 2018 at 09:46:23PM +, Bart Van Assche wrote: On Tue, 2018-04-10 at 14:33 -0700, t...@kernel.org wrote: + else + rq->missed_completion = true; In this patch I found code that sets rq->missed_completion but no

Re: [PATCH v4] blk-mq: Fix race conditions in request timeout handling

2018-04-15 Thread Israel Rukshin
Hi, On 4/12/2018 4:35 PM, t...@kernel.org wrote: Hello, Israel. On Thu, Apr 12, 2018 at 11:59:11AM +0300, Israel Rukshin wrote: On 4/12/2018 12:31 AM, t...@kernel.org wrote: Hey, again. On Wed, Apr 11, 2018 at 10:07:33AM -0700, t...@kernel.org wrote: Hello, Israel. On Wed, Apr 11, 2018 at

Re: [PATCH v4] blk-mq: Fix race conditions in request timeout handling

2018-04-12 Thread t...@kernel.org
Hello, Israel. On Thu, Apr 12, 2018 at 11:59:11AM +0300, Israel Rukshin wrote: > On 4/12/2018 12:31 AM, t...@kernel.org wrote: > >Hey, again. > > > >On Wed, Apr 11, 2018 at 10:07:33AM -0700, t...@kernel.org wrote: > >>Hello, Israel. > >> > >>On Wed, Apr 11, 2018 at 07:16:14PM +0300, Israel

Re: [PATCH v4] blk-mq: Fix race conditions in request timeout handling

2018-04-12 Thread Israel Rukshin
On 4/12/2018 12:31 AM, t...@kernel.org wrote: Hey, again. On Wed, Apr 11, 2018 at 10:07:33AM -0700, t...@kernel.org wrote: Hello, Israel. On Wed, Apr 11, 2018 at 07:16:14PM +0300, Israel Rukshin wrote: Just noticed this one, this looks interesting to me as well. Israel, can you run your test

Re: [PATCH v4] blk-mq: Fix race conditions in request timeout handling

2018-04-11 Thread t...@kernel.org
Hey, again. On Wed, Apr 11, 2018 at 10:07:33AM -0700, t...@kernel.org wrote: > Hello, Israel. > > On Wed, Apr 11, 2018 at 07:16:14PM +0300, Israel Rukshin wrote: > > >Just noticed this one, this looks interesting to me as well. Israel, > > >can you run your test with this patch? > > > > Yes, I

Re: [PATCH v4] blk-mq: Fix race conditions in request timeout handling

2018-04-11 Thread Martin Steigerwald
Bart Van Assche - 11.04.18, 14:50: > On Tue, 2018-04-10 at 14:54 -0700, t...@kernel.org wrote: > > Ah, yeah, I was moving it out of add_timer but forgot to actully add > > it to the issue path. Fixed patch below. > > > > BTW, no matter what we do w/ the request handover between normal and > >

Re: [PATCH v4] blk-mq: Fix race conditions in request timeout handling

2018-04-11 Thread Israel Rukshin
On 4/11/2018 5:24 PM, Sagi Grimberg wrote: Ah, yeah, I was moving it out of add_timer but forgot to actully add it to the issue path.  Fixed patch below. BTW, no matter what we do w/ the request handover between normal and timeout paths, we'd need something similar.  Otherwise, while we can

Re: [PATCH v4] blk-mq: Fix race conditions in request timeout handling

2018-04-11 Thread t...@kernel.org
Hello, On Wed, Apr 11, 2018 at 05:24:49PM +0300, Sagi Grimberg wrote: > > >Ah, yeah, I was moving it out of add_timer but forgot to actully add > >it to the issue path. Fixed patch below. > > > >BTW, no matter what we do w/ the request handover between normal and > >timeout paths, we'd need

Re: [PATCH v4] blk-mq: Fix race conditions in request timeout handling

2018-04-11 Thread Sagi Grimberg
Ah, yeah, I was moving it out of add_timer but forgot to actully add it to the issue path. Fixed patch below. BTW, no matter what we do w/ the request handover between normal and timeout paths, we'd need something similar. Otherwise, while we can reduce the window, we can't get rid of it.

Re: [PATCH v4] blk-mq: Fix race conditions in request timeout handling

2018-04-11 Thread t...@kernel.org
Hello, Bart. On Wed, Apr 11, 2018 at 12:50:51PM +, Bart Van Assche wrote: > Thank you for having shared this patch. It looks interesting to me. What I > know about the blk-mq timeout handling is as follows: > * Nobody who has reviewed the blk-mq timeout handling code with this patch >

Re: [PATCH v4] blk-mq: Fix race conditions in request timeout handling

2018-04-11 Thread Bart Van Assche
On Tue, 2018-04-10 at 14:54 -0700, t...@kernel.org wrote: > Ah, yeah, I was moving it out of add_timer but forgot to actully add > it to the issue path. Fixed patch below. > > BTW, no matter what we do w/ the request handover between normal and > timeout paths, we'd need something similar.

Re: [PATCH v4] blk-mq: Fix race conditions in request timeout handling

2018-04-10 Thread t...@kernel.org
On Tue, Apr 10, 2018 at 09:46:23PM +, Bart Van Assche wrote: > On Tue, 2018-04-10 at 14:33 -0700, t...@kernel.org wrote: > > + else > > + rq->missed_completion = true; > > In this patch I found code that sets rq->missed_completion but no code that > clears it. Did I

Re: [PATCH v4] blk-mq: Fix race conditions in request timeout handling

2018-04-10 Thread Bart Van Assche
On Tue, 2018-04-10 at 14:33 -0700, t...@kernel.org wrote: > + else > + rq->missed_completion = true; In this patch I found code that sets rq->missed_completion but no code that clears it. Did I perhaps miss something? Thanks, Bart.

Re: [PATCH v4] blk-mq: Fix race conditions in request timeout handling

2018-04-10 Thread t...@kernel.org
Hello, On Tue, Apr 10, 2018 at 08:40:43AM -0700, t...@kernel.org wrote: > On Tue, Apr 10, 2018 at 11:38:18PM +0800, Ming Lei wrote: > > I agree, the issue should be in driver's irq handler and .timeout in > > theory. > > > > For example, even though one request has been done by irq handler,

Re: [PATCH v4] blk-mq: Fix race conditions in request timeout handling

2018-04-10 Thread t...@kernel.org
Hello, On Tue, Apr 10, 2018 at 11:38:18PM +0800, Ming Lei wrote: > I agree, the issue should be in driver's irq handler and .timeout in > theory. > > For example, even though one request has been done by irq handler, .timeout > still may return RESET_TIMER. blk-mq can use a separate flag to

Re: [PATCH v4] blk-mq: Fix race conditions in request timeout handling

2018-04-10 Thread Ming Lei
Hi Tejun, On Tue, Apr 10, 2018 at 08:30:31AM -0700, t...@kernel.org wrote: > Hello, Ming. > > On Tue, Apr 10, 2018 at 11:25:54PM +0800, Ming Lei wrote: > > + if (time_after_eq(jiffies, deadline) && > > + blk_mq_change_rq_state(rq, MQ_RQ_IN_FLIGHT, MQ_RQ_COMPLETE)) { > > +

Re: [PATCH v4] blk-mq: Fix race conditions in request timeout handling

2018-04-10 Thread t...@kernel.org
Hello, Ming. On Tue, Apr 10, 2018 at 11:25:54PM +0800, Ming Lei wrote: > + if (time_after_eq(jiffies, deadline) && > + blk_mq_change_rq_state(rq, MQ_RQ_IN_FLIGHT, MQ_RQ_COMPLETE)) { > + blk_mq_rq_timed_out(rq, reserved); > > Normal completion still can happen between

Re: [PATCH v4] blk-mq: Fix race conditions in request timeout handling

2018-04-10 Thread Ming Lei
On Tue, Apr 10, 2018 at 03:02:11PM +, Bart Van Assche wrote: > On Tue, 2018-04-10 at 22:30 +0800, Ming Lei wrote: > > On Tue, Apr 10, 2018 at 02:09:33PM +, Bart Van Assche wrote: > > > Please keep in mind that all synchronize_rcu() does is to wait for pre- > > > existing RCU readers to

Re: [PATCH v4] blk-mq: Fix race conditions in request timeout handling

2018-04-10 Thread Bart Van Assche
On Tue, 2018-04-10 at 22:30 +0800, Ming Lei wrote: > On Tue, Apr 10, 2018 at 02:09:33PM +, Bart Van Assche wrote: > > Please keep in mind that all synchronize_rcu() does is to wait for pre- > > existing RCU readers to finish. synchronize_rcu() does not prevent that new > > rcu_read_lock()

Re: [PATCH v4] blk-mq: Fix race conditions in request timeout handling

2018-04-10 Thread h...@lst.de
On Tue, Apr 10, 2018 at 01:26:39PM +, Bart Van Assche wrote: > Can you explain why you think that using cmpxchg() is safe in this context? > The regular completion path and the timeout code can both execute e.g. the > following code concurrently: > > blk_mq_change_rq_state(rq,

Re: [PATCH v4] blk-mq: Fix race conditions in request timeout handling

2018-04-10 Thread Jens Axboe
On 4/10/18 3:55 AM, Christoph Hellwig wrote: > On Mon, Apr 09, 2018 at 06:34:55PM -0700, Bart Van Assche wrote: >> If a completion occurs after blk_mq_rq_timed_out() has reset >> rq->aborted_gstate and the request is again in flight when the timeout >> expires then a request will be completed

Re: [PATCH v4] blk-mq: Fix race conditions in request timeout handling

2018-04-10 Thread t...@kernel.org
Hello, On Tue, Apr 10, 2018 at 02:30:26PM +, Bart Van Assche wrote: > > Switching to another model might be better but let's please do that > > with the right rationales. A good portion of this seems to be built > > on misunderstandings. > > Which misunderstandings? I'm not aware of any

Re: [PATCH v4] blk-mq: Fix race conditions in request timeout handling

2018-04-10 Thread jianchao.wang
Hi Bart Thanks for your kindly response. On 04/10/2018 09:01 PM, Bart Van Assche wrote: > On Tue, 2018-04-10 at 15:59 +0800, jianchao.wang wrote: >> If yes, how does the timeout handler get the freed request when the tag has >> been freed ? > > Hello Jianchao, > > Have you noticed that the

Re: [PATCH v4] blk-mq: Fix race conditions in request timeout handling

2018-04-10 Thread Ming Lei
On Tue, Apr 10, 2018 at 02:09:33PM +, Bart Van Assche wrote: > On Tue, 2018-04-10 at 21:55 +0800, Ming Lei wrote: > > Then I have same question with Jianchao, what is the actual double > > complete in linus tree between BLK_EH_RESET_TIMER and normal completion? > > > > Follows my

Re: [PATCH v4] blk-mq: Fix race conditions in request timeout handling

2018-04-10 Thread Bart Van Assche
On Tue, 2018-04-10 at 07:20 -0700, Tejun Heo wrote: > On Mon, Apr 09, 2018 at 06:34:55PM -0700, Bart Van Assche wrote: > > Since the request state can be updated from two different contexts, > > namely regular completion and request timeout, this race cannot be > > fixed with RCU synchronization

Re: [PATCH v4] blk-mq: Fix race conditions in request timeout handling

2018-04-10 Thread Tejun Heo
Hello, Bart. On Mon, Apr 09, 2018 at 06:34:55PM -0700, Bart Van Assche wrote: > Since the request state can be updated from two different contexts, > namely regular completion and request timeout, this race cannot be > fixed with RCU synchronization only. Fix this race as follows: Well, it can

Re: [PATCH v4] blk-mq: Fix race conditions in request timeout handling

2018-04-10 Thread Bart Van Assche
On Tue, 2018-04-10 at 21:55 +0800, Ming Lei wrote: > Then I have same question with Jianchao, what is the actual double > complete in linus tree between BLK_EH_RESET_TIMER and normal completion? > > Follows my understanding: > > 1) when timeout is detected on one request, its aborted_gstate is >

Re: [PATCH v4] blk-mq: Fix race conditions in request timeout handling

2018-04-10 Thread Ming Lei
On Tue, Apr 10, 2018 at 12:58:04PM +, Bart Van Assche wrote: > On Tue, 2018-04-10 at 16:41 +0800, Ming Lei wrote: > > On Mon, Apr 09, 2018 at 06:34:55PM -0700, Bart Van Assche wrote: > > > If a completion occurs after blk_mq_rq_timed_out() has reset > > > rq->aborted_gstate and the request is

Re: [PATCH v4] blk-mq: Fix race conditions in request timeout handling

2018-04-10 Thread Bart Van Assche
On Tue, 2018-04-10 at 11:55 +0200, Christoph Hellwig wrote: > I don't think we need the atomic_long_cmpxchg, and can do with a plain > cmpxhg. Also unterminated cmpxchg loops are a bad idea, but I think > both callers are protected from other changes so we can turn that > into a WARN_ON(). Hello

Re: [PATCH v4] blk-mq: Fix race conditions in request timeout handling

2018-04-10 Thread Bart Van Assche
On Tue, 2018-04-10 at 15:59 +0800, jianchao.wang wrote: > If yes, how does the timeout handler get the freed request when the tag has > been freed ? Hello Jianchao, Have you noticed that the timeout handler does not check whether or not the request tag is freed? Additionally, I don't think it

Re: [PATCH v4] blk-mq: Fix race conditions in request timeout handling

2018-04-10 Thread Bart Van Assche
On Tue, 2018-04-10 at 16:41 +0800, Ming Lei wrote: > On Mon, Apr 09, 2018 at 06:34:55PM -0700, Bart Van Assche wrote: > > If a completion occurs after blk_mq_rq_timed_out() has reset > > rq->aborted_gstate and the request is again in flight when the timeout > > Given rq->aborted_gstate is reset

Re: [PATCH v4] blk-mq: Fix race conditions in request timeout handling

2018-04-10 Thread Shan Hai
On 2018年04月10日 18:04, Ming Lei wrote: On Tue, Apr 10, 2018 at 03:59:30PM +0800, jianchao.wang wrote: Hi Bart On 04/10/2018 09:34 AM, Bart Van Assche wrote: If a completion occurs after blk_mq_rq_timed_out() has reset rq->aborted_gstate and the request is again in flight when the timeout

Re: [PATCH v4] blk-mq: Fix race conditions in request timeout handling

2018-04-10 Thread Ming Lei
On Tue, Apr 10, 2018 at 03:59:30PM +0800, jianchao.wang wrote: > Hi Bart > > On 04/10/2018 09:34 AM, Bart Van Assche wrote: > > If a completion occurs after blk_mq_rq_timed_out() has reset > > rq->aborted_gstate and the request is again in flight when the timeout > > expires then a request will

Re: [PATCH v4] blk-mq: Fix race conditions in request timeout handling

2018-04-10 Thread Christoph Hellwig
On Mon, Apr 09, 2018 at 06:34:55PM -0700, Bart Van Assche wrote: > If a completion occurs after blk_mq_rq_timed_out() has reset > rq->aborted_gstate and the request is again in flight when the timeout > expires then a request will be completed twice: a first time by the > timeout handler and a

Re: [PATCH v4] blk-mq: Fix race conditions in request timeout handling

2018-04-10 Thread Ming Lei
On Mon, Apr 09, 2018 at 06:34:55PM -0700, Bart Van Assche wrote: > If a completion occurs after blk_mq_rq_timed_out() has reset > rq->aborted_gstate and the request is again in flight when the timeout Given rq->aborted_gstate is reset only for BLK_EH_RESET_TIMER, I think you are addressing the

Re: [PATCH v4] blk-mq: Fix race conditions in request timeout handling

2018-04-10 Thread jianchao.wang
Hi Bart On 04/10/2018 09:34 AM, Bart Van Assche wrote: > If a completion occurs after blk_mq_rq_timed_out() has reset > rq->aborted_gstate and the request is again in flight when the timeout > expires then a request will be completed twice: a first time by the > timeout handler and a second time