Re: [PATCH 0/3] Enabling ATA Command Priorities

2016-09-29 Thread t...@kernel.org
Hello, On Wed, Sep 28, 2016 at 03:43:32AM +, Adam Manzanares wrote: > I prefer having the feature conditional so you can use the CFQ > scheduler with I/O priorities as is. If you decide to enable the > feature then the priorities will be passed down to the drive in > addition to the work that

Re: [PATCH 1/5] percpu-refcount: Introduce percpu_ref_switch_to_atomic_nowait()

2017-09-11 Thread t...@kernel.org
Hello, On Mon, Sep 11, 2017 at 04:09:12PM +, Bart Van Assche wrote: > On Mon, 2017-09-11 at 06:13 -0700, Tejun Heo wrote: > > On Fri, Sep 08, 2017 at 04:52:22PM -0700, Bart Van Assche wrote: > > > The blk-mq core keeps track of the number of request queue users > > > through q->q_usage_count.

Re: [PATCH 2/6] blk-mq: replace timeout synchronization with a RCU and generation based scheme

2017-12-12 Thread t...@kernel.org
Hello, Bart. On Tue, Dec 12, 2017 at 09:37:11PM +, Bart Van Assche wrote: > Have you considered the following instead of introducing MQ_RQ_IDLE and > MQ_RQ_IN_FLIGHT? I think this could help to limit the number of new atomic > operations introduced in the hot path by this patch series. But

Re: [PATCH 6/6] blk-mq: remove REQ_ATOM_STARTED

2017-12-12 Thread t...@kernel.org
On Tue, Dec 12, 2017 at 10:20:39PM +, Bart Van Assche wrote: > The above code should show all requests owned by the block driver. Patch > "blk-mq-debugfs: Also show requests that have not yet been started" (not yet > in Jens' tree) changes the REQ_ATOM_STARTED test into >

Re: [PATCH 4/6] blk-mq: make blk_abort_request() trigger timeout path

2017-12-14 Thread t...@kernel.org
Hello, On Thu, Dec 14, 2017 at 06:56:55PM +, Bart Van Assche wrote: > On Tue, 2017-12-12 at 11:01 -0800, Tejun Heo wrote: > > void blk_abort_request(struct request *req) > > { > > - if (blk_mark_rq_complete(req)) > > - return; > > > > if (req->q->mq_ops) { > > -

Re: [PATCH 2/6] blk-mq: replace timeout synchronization with a RCU and generation based scheme

2017-12-14 Thread t...@kernel.org
Hello, On Thu, Dec 14, 2017 at 06:51:11PM +, Bart Van Assche wrote: > On Tue, 2017-12-12 at 11:01 -0800, Tejun Heo wrote: > > rules. Unfortunatley, it contains quite a few holes. > ^ > Unfortunately? > > > While this change makes REQ_ATOM_COMPLETE

Re: [PATCH 1/6] blk-mq: protect completion path with RCU

2017-12-14 Thread t...@kernel.org
On Thu, Dec 14, 2017 at 05:01:06PM +, Bart Van Assche wrote: > On Tue, 2017-12-12 at 11:01 -0800, Tejun Heo wrote: > > + } else { > > + srcu_idx = srcu_read_lock(hctx->queue_rq_srcu); > > + if (!blk_mark_rq_complete(rq)) > > +

Re: [PATCH 2/6] blk-mq: replace timeout synchronization with a RCU and generation based scheme

2017-12-15 Thread t...@kernel.org
Hello, Bart. On Thu, Dec 14, 2017 at 09:13:32PM +, Bart Van Assche wrote: ... > however is called before a every use of a request. Sorry but I'm not > enthusiast about the code in blk_rq_init() that reinitializes state > information that should survive request reuse. If it wasn't clear, me

Re: [PATCH 2/6] blk-mq: replace timeout synchronization with a RCU and generation based scheme

2017-12-15 Thread t...@kernel.org
Hello, Peter. On Thu, Dec 14, 2017 at 09:20:42PM +0100, Peter Zijlstra wrote: > But now that I look at this again, TJ, why can't the below happen? > > write_seqlock_begin(); > blk_mq_rq_update_state(rq, IN_FLIGHT); > blk_add_timer(rq); > >

Re: [PATCH 3/8] blk-mq: replace timeout synchronization with a RCU and generation based scheme

2018-01-09 Thread t...@kernel.org
On Mon, Jan 08, 2018 at 09:06:55PM +, Bart Van Assche wrote: > On Mon, 2018-01-08 at 11:15 -0800, Tejun Heo wrote: > > +static void blk_mq_rq_update_aborted_gstate(struct request *rq, u64 gstate) > > +{ > > + unsigned long flags; > > + > > + local_irq_save(flags); > > +

Re: [PATCH 3/8] blk-mq: replace timeout synchronization with a RCU and generation based scheme

2018-01-09 Thread t...@kernel.org
On Mon, Jan 08, 2018 at 11:29:11PM +, Bart Van Assche wrote: > Does "gstate" perhaps stand for "generation number and state"? If so, please > mention this in one of the above comments. Yeah, will do. Thanks. -- tejun

Re: [PATCH 5/8] blk-mq: make blk_abort_request() trigger timeout path

2018-01-09 Thread t...@kernel.org
On Mon, Jan 08, 2018 at 10:10:01PM +, Bart Van Assche wrote: > Other req->deadline writes are protected by preempt_disable(), > write_seqcount_begin(>gstate_seq), write_seqcount_end(>gstate_seq) > and preempt_enable(). I think it's fine that the above req->deadline store > does not have that

Re: [PATCH 2/8] blk-mq: protect completion path with RCU

2018-01-09 Thread t...@kernel.org
Hello, Bart. On Tue, Jan 09, 2018 at 04:12:40PM +, Bart Van Assche wrote: > I'm concerned about the additional CPU cycles needed for the new > blk_mq_map_queue() > call, although I know this call is cheap. Would the timeout code really get > that So, if that is really a concern, let's

Re: [PATCH v2] blk-mq: Fix race between resetting the timer and completion handling

2018-02-07 Thread t...@kernel.org
Hello, On Wed, Feb 07, 2018 at 07:03:56PM +, Bart Van Assche wrote: > I tried the above patch but already during the first iteration of the test I > noticed that the test hung, probably due to the following request that got > stuck: > > $ (cd /sys/kernel/debug/block && grep -aH .

Re: [PATCH v2] blk-mq: Fix race between resetting the timer and completion handling

2018-02-07 Thread t...@kernel.org
Hello, Bart. On Wed, Feb 07, 2018 at 06:14:13PM +, Bart Van Assche wrote: > When I wrote my comment I was not sure whether or not non-reentrancy is > guaranteed for work queue items. However, according to what I found in the > workqueue implementation I think that is guaranteed. So it

Re: [PATCH v2] blk-mq: Fix race between resetting the timer and completion handling

2018-02-08 Thread t...@kernel.org
On Thu, Feb 08, 2018 at 05:48:32PM +, Bart Van Assche wrote: > On Thu, 2018-02-08 at 09:40 -0800, t...@kernel.org wrote: > > Heh, sorry about not being clear. What I'm trying to say is that > > scmd->device != NULL && device->host == NULL. Or was this what

Re: [PATCH v2] blk-mq: Fix race between resetting the timer and completion handling

2018-02-08 Thread t...@kernel.org
On Thu, Feb 08, 2018 at 05:37:46PM +, Bart Van Assche wrote: > On Thu, 2018-02-08 at 09:19 -0800, t...@kernel.org wrote: > > Hello, Bart. > > > > On Thu, Feb 08, 2018 at 05:10:45PM +, Bart Van Assche wrote: > > > I think "dereferencing a pointer

Re: [PATCH v2] blk-mq: Fix race between resetting the timer and completion handling

2018-02-08 Thread t...@kernel.org
Hello, Bart. On Thu, Feb 08, 2018 at 04:31:43PM +, Bart Van Assche wrote: > > That sounds more like a scsi hotplug bug than an issue in the timeout > > code unless we messed up @req pointer to begin with. > > I don't think that this is related to SCSI hotplugging: this crash does not > occur

Re: [PATCH v2] blk-mq: Fix race between resetting the timer and completion handling

2018-02-08 Thread t...@kernel.org
Hello, Bart. On Thu, Feb 08, 2018 at 05:10:45PM +, Bart Van Assche wrote: > I think "dereferencing a pointer" means reading the memory location that > pointer points > at? Anyway, I think we both interpret the crash report in the same way, > namely that it > means that scmd->device == NULL.

Re: [PATCH v2] blk-mq: Fix race between resetting the timer and completion handling

2018-02-07 Thread t...@kernel.org
Hello, On Wed, Feb 07, 2018 at 09:02:21PM +, Bart Van Assche wrote: > The patch that I used in my test had an smp_wmb() call (see also below). > Anyway, > I will see whether I can extract more state information through debugfs. > > diff --git a/block/blk-mq.c b/block/blk-mq.c > index

Re: [PATCH v2] blk-mq: Fix race between resetting the timer and completion handling

2018-02-13 Thread t...@kernel.org
Hello, Bart. Sorry about the delay. On Thu, Feb 08, 2018 at 04:31:43PM +, Bart Van Assche wrote: > The crash is reported at address scsi_times_out+0x17 == scsi_times_out+23. The > instruction at that address tries to dereference scsi_cmnd.device (%rax). The > register dump shows that that

Re: [PATCH] blk-mq: Fix a race between resetting the timer and completion handling

2018-02-05 Thread t...@kernel.org
Hello, Bart. On Mon, Feb 05, 2018 at 09:33:03PM +, Bart Van Assche wrote: > My goal with this patch is to fix the race between resetting the timer and > the completion path. Hence change (3). Changes (1) and (2) are needed to > make the changes in blk_mq_rq_timed_out() work. Ah, I see. That

Re: [PATCH v2] blk-mq: Fix race between resetting the timer and completion handling

2018-02-08 Thread t...@kernel.org
On Thu, Feb 08, 2018 at 01:09:57AM +, Bart Van Assche wrote: > On Wed, 2018-02-07 at 23:48 +, Bart Van Assche wrote: > > With this patch applied I see requests for which it seems like the timeout > > handler > > did not get invoked: [ ... ] > > I just noticed the following in the system

Re: [PATCH v2] blk-mq: Fix race between resetting the timer and completion handling

2018-02-08 Thread t...@kernel.org
On Thu, Feb 08, 2018 at 07:39:40AM -0800, t...@kernel.org wrote: > That sounds more like a scsi hotplug but than an issue in the timeout ^bug > code unless we messed up @req pointer to begin with. -- tejun

Re: [PATCH v2] blk-mq: Fix race between resetting the timer and completion handling

2018-02-07 Thread t...@kernel.org
Hello, Bart. On Wed, Feb 07, 2018 at 05:27:10PM +, Bart Van Assche wrote: > Even with the above change I think that there is still a race between the > code that handles timer resets and the completion handler. Anyway, the test Can you elaborate the scenario a bit further? If you're

Re: [PATCH v2] blk-mq: Fix race between resetting the timer and completion handling

2018-02-21 Thread t...@kernel.org
Hello, Bart. On Wed, Feb 21, 2018 at 06:53:05PM +, Bart Van Assche wrote: > On Sun, 2018-02-18 at 05:11 -0800, t...@kernel.org wrote: > > On Wed, Feb 14, 2018 at 04:58:56PM +, Bart Van Assche wrote: > > > With this patch applied the tests I ran so far pass. > >

Re: [PATCH v2] blk-mq: Fix race between resetting the timer and completion handling

2018-02-18 Thread t...@kernel.org
Hello, Bart. On Wed, Feb 14, 2018 at 04:58:56PM +, Bart Van Assche wrote: > With this patch applied the tests I ran so far pass. Ah, great to hear. Thanks a lot for testing. Can you please verify the following? It's the same approach but with RCU sync batching. Thanks. Index:

Re: [PATCHSET v2] blk-mq: reimplement timeout handling

2017-12-20 Thread t...@kernel.org
On Wed, Dec 20, 2017 at 11:41:02PM +, Bart Van Assche wrote: > On Tue, 2017-12-12 at 11:01 -0800, Tejun Heo wrote: > > Currently, blk-mq timeout path synchronizes against the usual > > issue/completion path using a complex scheme involving atomic > > bitflags, REQ_ATOM_*, memory barriers and

Re: [PATCH v4] blk-mq: Fix race conditions in request timeout handling

2018-04-10 Thread t...@kernel.org
Hello, Ming. On Tue, Apr 10, 2018 at 11:25:54PM +0800, Ming Lei wrote: > + if (time_after_eq(jiffies, deadline) && > + blk_mq_change_rq_state(rq, MQ_RQ_IN_FLIGHT, MQ_RQ_COMPLETE)) { > + blk_mq_rq_timed_out(rq, reserved); > > Normal completion still can happen between

Re: [PATCH v4] blk-mq: Fix race conditions in request timeout handling

2018-04-10 Thread t...@kernel.org
Hello, On Tue, Apr 10, 2018 at 02:30:26PM +, Bart Van Assche wrote: > > Switching to another model might be better but let's please do that > > with the right rationales. A good portion of this seems to be built > > on misunderstandings. > > Which misunderstandings? I'm not aware of any

Re: [PATCH v4] blk-mq: Fix race conditions in request timeout handling

2018-04-10 Thread t...@kernel.org
Hello, On Tue, Apr 10, 2018 at 11:38:18PM +0800, Ming Lei wrote: > I agree, the issue should be in driver's irq handler and .timeout in > theory. > > For example, even though one request has been done by irq handler, .timeout > still may return RESET_TIMER. blk-mq can use a separate flag to

Re: [PATCH v4] blk-mq: Fix race conditions in request timeout handling

2018-04-10 Thread t...@kernel.org
On Tue, Apr 10, 2018 at 09:46:23PM +, Bart Van Assche wrote: > On Tue, 2018-04-10 at 14:33 -0700, t...@kernel.org wrote: > > + else > > + rq->missed_completion = true; > > In this patch I found code that sets rq->missed_completion but no c

Re: [PATCH v4] blk-mq: Fix race conditions in request timeout handling

2018-04-10 Thread t...@kernel.org
Hello, On Tue, Apr 10, 2018 at 08:40:43AM -0700, t...@kernel.org wrote: > On Tue, Apr 10, 2018 at 11:38:18PM +0800, Ming Lei wrote: > > I agree, the issue should be in driver's irq handler and .timeout in > > theory. > > > > For example, even though one request h

Re: [PATCH v4] blk-mq: Fix race conditions in request timeout handling

2018-04-11 Thread t...@kernel.org
Hello, Bart. On Wed, Apr 11, 2018 at 12:50:51PM +, Bart Van Assche wrote: > Thank you for having shared this patch. It looks interesting to me. What I > know about the blk-mq timeout handling is as follows: > * Nobody who has reviewed the blk-mq timeout handling code with this patch >

Re: [PATCH v4] blk-mq: Fix race conditions in request timeout handling

2018-04-11 Thread t...@kernel.org
Hello, On Wed, Apr 11, 2018 at 05:24:49PM +0300, Sagi Grimberg wrote: > > >Ah, yeah, I was moving it out of add_timer but forgot to actully add > >it to the issue path. Fixed patch below. > > > >BTW, no matter what we do w/ the request handover between normal and > >timeout paths, we'd need

Re: [PATCH v2] blk-cgroup: remove entries in blkg_tree before queue release

2018-04-11 Thread t...@kernel.org
Hello, On Wed, Apr 11, 2018 at 04:42:55PM +, Bart Van Assche wrote: > On Wed, 2018-04-11 at 07:56 -0700, Tejun Heo wrote: > > And looking at the change, it looks like the right thing we should > > have done is caching @lock on the print_blkg side and when switching > > locks make sure both

Re: [PATCH v2] blk-cgroup: remove entries in blkg_tree before queue release

2018-04-11 Thread t...@kernel.org
Hello, On Wed, Apr 11, 2018 at 05:06:41PM +, Bart Van Assche wrote: > A simple and effective solution is to dissociate a request queue from the > block cgroup controller before blk_cleanup_queue() returns. This is why commit > a063057d7c73 ("block: Fix a race between request queue removal and

Re: [PATCH v2] blk-cgroup: remove entries in blkg_tree before queue release

2018-04-11 Thread t...@kernel.org
Hello, On Wed, Apr 11, 2018 at 05:26:12PM +, Bart Van Assche wrote: > Please explain what you wrote further. It's not clear to me why you think > that anything is broken nor what a "sever model" is. So, sever (or revoke) model is where you actively disconnect an object and ensuring that

Re: [PATCH v2] blk-cgroup: remove entries in blkg_tree before queue release

2018-04-11 Thread t...@kernel.org
On Wed, Apr 11, 2018 at 08:00:29PM +, Bart Van Assche wrote: > On Wed, 2018-04-11 at 12:57 -0700, t...@kernel.org wrote: > > On Wed, Apr 11, 2018 at 01:55:25PM -0600, Bart Van Assche wrote: > > > On 04/11/18 13:00, Alexandru Moise wrote: > > > > But the root caus

Re: [PATCH v4] blk-mq: Fix race conditions in request timeout handling

2018-04-11 Thread t...@kernel.org
Hey, again. On Wed, Apr 11, 2018 at 10:07:33AM -0700, t...@kernel.org wrote: > Hello, Israel. > > On Wed, Apr 11, 2018 at 07:16:14PM +0300, Israel Rukshin wrote: > > >Just noticed this one, this looks interesting to me as well. Israel, > > >can you run your test with

Re: [PATCH v2] blk-cgroup: remove entries in blkg_tree before queue release

2018-04-11 Thread t...@kernel.org
On Wed, Apr 11, 2018 at 01:55:25PM -0600, Bart Van Assche wrote: > On 04/11/18 13:00, Alexandru Moise wrote: > >But the root cause of it is in blkcg_init_queue() when blkg_create() returns > >an ERR ptr, because it tries to insert into a populated index into > >blkcg->blkg_tree, > >the entry that

Re: [PATCH] block: Ensure that a request queue is dissociated from the cgroup controller

2018-04-12 Thread t...@kernel.org
Hello, On Thu, Apr 12, 2018 at 03:56:51PM +0200, h...@lst.de wrote: > On Thu, Apr 12, 2018 at 06:48:12AM -0700, t...@kernel.org wrote: > > > Which sounds like a very good reason not to use a driver controller > > > lock for internals like blkcq. > > > > &g

Re: [PATCH] block: Ensure that a request queue is dissociated from the cgroup controller

2018-04-12 Thread t...@kernel.org
On Thu, Apr 12, 2018 at 06:58:21AM -0700, t...@kernel.org wrote: > Cool, can we just factor out the queue lock from those drivers? It's > just really messy to be switching locks like we do in the cleanup > path. So, looking at a couple drivers, it looks like all we'd need is a str

Re: [PATCH v4] blk-mq: Fix race conditions in request timeout handling

2018-04-12 Thread t...@kernel.org
Hello, Israel. On Thu, Apr 12, 2018 at 11:59:11AM +0300, Israel Rukshin wrote: > On 4/12/2018 12:31 AM, t...@kernel.org wrote: > >Hey, again. > > > >On Wed, Apr 11, 2018 at 10:07:33AM -0700, t...@kernel.org wrote: > >>Hello, Israel. > >> > >>

Re: [PATCH] block: Ensure that a request queue is dissociated from the cgroup controller

2018-04-12 Thread t...@kernel.org
Hello, On Thu, Apr 12, 2018 at 03:14:40PM +0200, h...@lst.de wrote: > > At least the SCSI ULP drivers drop the last reference to a disk after > > the blk_cleanup_queue() call. As explained in the description of commit > > a063057d7c73, removing a request queue from blkcg must happen before > >

Re: [PATCH] block: Ensure that a request queue is dissociated from the cgroup controller

2018-04-12 Thread t...@kernel.org
Hello, On Thu, Apr 12, 2018 at 04:29:09PM +, Bart Van Assche wrote: > Any code that submits a bio or request needs blk_queue_enter() / > blk_queue_exit() anyway. Please have a look at the following commit - you will > see that that commit reduces the number of blk_queue_enter() / >

Re: [PATCH] block: Ensure that a request queue is dissociated from the cgroup controller

2018-04-12 Thread t...@kernel.org
Hello, On Thu, Apr 12, 2018 at 04:03:52PM +, Bart Van Assche wrote: > On Thu, 2018-04-12 at 08:37 -0700, Tejun Heo wrote: > > On Thu, Apr 12, 2018 at 08:09:17AM -0600, Bart Van Assche wrote: > > > I have retested hotunplugging by rerunning the srp-test software. It > > > seems like you

Re: [PATCH] block: Ensure that a request queue is dissociated from the cgroup controller

2018-04-12 Thread t...@kernel.org
Hello, On Thu, Apr 12, 2018 at 06:56:26PM +, Bart Van Assche wrote: > On Thu, 2018-04-12 at 11:11 -0700, t...@kernel.org wrote: > > * Right now, blk_queue_enter/exit() doesn't have any annotations. > > IOW, there's no way for paths which need enter locked to actual

Re: [PATCH] blk-mq: Fix recently introduced races in the timeout handling code

2018-04-09 Thread t...@kernel.org
Hello, Bart. On Mon, Apr 09, 2018 at 09:30:27PM +, Bart Van Assche wrote: > On Mon, 2018-04-09 at 11:56 -0700, t...@kernel.org wrote: > > On Mon, Apr 09, 2018 at 05:03:05PM +, Bart Van Assche wrote: > > > exist today in the blk-mq timeout handling code cannot b

Re: [PATCH] block: Ensure that a request queue is dissociated from the cgroup controller

2018-04-13 Thread t...@kernel.org
Hello, Bart. On Thu, Apr 12, 2018 at 10:40:52PM +, Bart Van Assche wrote: > Is something like the patch below perhaps what you had in mind? Yeah, exactly. It'd be really great to have the lockdep asserts add to the right places. Thanks. -- tejun

Re: [PATCH 2/2] blk-mq: Fix request handover from timeout path to normal execution

2018-04-02 Thread t...@kernel.org
Hello, On Mon, Apr 02, 2018 at 09:31:34PM +, Bart Van Assche wrote: > > > > +* As nothing prevents from completion happening while > > > > +* ->aborted_gstate is set, this may lead to ignored completions > > > > +* and further spurious timeouts. > > > > +*/ > >

Re: [PATCH 2/2] blk-mq: Fix request handover from timeout path to normal execution

2018-04-02 Thread t...@kernel.org
Hello, On Mon, Apr 02, 2018 at 09:56:41PM +, Bart Van Assche wrote: > This patch increases the time during which .aborted_gstate == .gstate if the > timeout is reset. Does that increase the chance that a completion will be > missed > if the request timeout is reset? It sure does, but we're

Re: [PATCH 2/2] blk-mq: Fix request handover from timeout path to normal execution

2018-04-02 Thread t...@kernel.org
Hello, On Mon, Apr 02, 2018 at 10:09:18PM +, Bart Van Assche wrote: > Please elaborate what your long-term goal is for the blk-mq timeout handler. Hmm... I don't really have any plans beyond what's been posted. > The legacy block layer suspends request state changes while a timeout is >