On Wed, Dec 13, 2017 at 3:24 PM, Byungchul Park wrote:
> Lockdep works, based on the following:
>
> (1) Classifying locks properly
> (2) Checking relationship between the classes
>
> If (1) is not good or (2) is not good, then we
> might get false positives.
>
> For (1), we don't have to cla
On Tue, Dec 12, 2017 at 11:10:32PM +0800, Ming Lei wrote:
> On Tue, Dec 12, 2017 at 03:13:58PM +0100, Christoph Hellwig wrote:
> > On Tue, Dec 12, 2017 at 07:02:32PM +0800, Ming Lei wrote:
> > > It may cause race by setting 'nvmeq' in nvme_init_request()
> > > because .init_request is called inside
Lockdep works, based on the following:
(1) Classifying locks properly
(2) Checking relationship between the classes
If (1) is not good or (2) is not good, then we
might get false positives.
For (1), we don't have to classify locks 100%
properly, but we do need to classify them well
enough for lockdep to work.
For (2)
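As a concrete illustration of step (1): lockdep groups locks into classes by lock_class_key, so every lock initialized at the same call site shares a class, and a lock can be split out into its own class explicitly. The lockdep APIs below are real; the surrounding struct and function names are hypothetical, a sketch rather than code from this thread:

```c
#include <linux/lockdep.h>
#include <linux/mutex.h>

/* One key per logical class: all mutexes tagged with this key form a
 * single lockdep class, separate from every other mutex_init() site. */
static struct lock_class_key demo_lock_key;    /* hypothetical */

struct demo_dev {                              /* hypothetical */
	struct mutex lock;
};

static void demo_dev_init(struct demo_dev *d)
{
	mutex_init(&d->lock);
	/* Reclassify: without this, d->lock shares a class with every
	 * other lock initialized at this same mutex_init() call site,
	 * which is a typical source of the false positives discussed. */
	lockdep_set_class(&d->lock, &demo_lock_key);
}
```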
On 12/13/2017 2:00 AM, Linus Torvalds wrote:
On Mon, Dec 11, 2017 at 9:20 PM, Byungchul Park wrote:
The *problem* is false positives, since locks and waiters in
kernel are not classified properly
So the problem is that those false positives apparently end up being a
big deal for the filesyst
On 12/12/2017 10:03 PM, Theodore Ts'o wrote:
On Tue, Dec 12, 2017 at 02:20:32PM +0900, Byungchul Park wrote:
The *problem* is false positives, since locks and waiters in
kernel are not classified properly, at the moment, which is just
a fact that is not related to cross-release stuff at all. IO
Hi Tejun
On 12/13/2017 03:01 AM, Tejun Heo wrote:
> Currently, blk-mq timeout path synchronizes against the usual
> issue/completion path using a complex scheme involving atomic
> bitflags, REQ_ATOM_*, memory barriers and subtle memory coherence
> rules. Unfortunately, it contains quite a few hol
Hello Tejun,
Sorry for missing the V2; same comment again.
On 12/13/2017 03:01 AM, Tejun Heo wrote:
> Currently, blk-mq protects only the issue path with RCU. This patch
> puts the completion path under the same RCU protection. This will be
> used to synchronize issue/completion against timeout
On Thursday, November 30, 2017 12:23:45 AM CET Luis R. Rodriguez wrote:
> This is a followup from the original RFC which proposed to start
> to kill kthread freezing all together [0]. Instead of going straight
> out to the jugular for kthread freezing this series only addresses
> killing freezer ca
On Tue, Dec 12, 2017 at 10:20:39PM +, Bart Van Assche wrote:
> The above code should show all requests owned by the block driver. Patch
> "blk-mq-debugfs: Also show requests that have not yet been started" (not yet
> in Jens' tree) changes the REQ_ATOM_STARTED test into
> list_empty(&rq->queue
On Tue, 2017-12-12 at 11:01 -0800, Tejun Heo wrote:
> @@ -409,7 +407,7 @@ static void hctx_show_busy_rq(struct request *rq, void
> *data, bool reserved)
> const struct show_busy_params *params = data;
>
> if (blk_mq_map_queue(rq->q, rq->mq_ctx->cpu) == params->hctx &&
> -
Hello, Bart.
On Tue, Dec 12, 2017 at 09:37:11PM +, Bart Van Assche wrote:
> Have you considered the following instead of introducing MQ_RQ_IDLE and
> MQ_RQ_IN_FLIGHT? I think this could help to limit the number of new atomic
> operations introduced in the hot path by this patch series.
But no
Hello, Jens.
On Tue, Dec 12, 2017 at 01:23:01PM -0700, Jens Axboe wrote:
> I like this a lot, it's a lot less fragile and more intuitive/readable
> than what we have now. And apparently less error prone... I'll do
> some testing with this.
Great.
> BTW, since you add a few more BLK_MQ_F_BLOCKING
On Tue, 2017-12-12 at 11:01 -0800, Tejun Heo wrote:
> +/*
> + * Bits for request->gstate. The lower two bits carry MQ_RQ_* state value
> + * and the upper bits the generation number.
> + */
> +enum mq_rq_state {
> + MQ_RQ_IDLE = 0,
> + MQ_RQ_IN_FLIGHT = 1,
> +
> +
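The quoted encoding can be modeled in plain C: the low two bits of gstate carry the MQ_RQ_* state and the upper bits a generation counter, bumped each time a request enters flight so a stale timeout observer can detect recycling. This is an illustrative userspace sketch (the kernel version additionally pairs the field with a seqcount for 32-bit safety), with helper names loosely following the patch:

```c
#include <assert.h>

enum mq_rq_state {
	MQ_RQ_IDLE	= 0,
	MQ_RQ_IN_FLIGHT	= 1,
	MQ_RQ_COMPLETE	= 2,
};

#define MQ_RQ_STATE_BITS	2
#define MQ_RQ_STATE_MASK	((1U << MQ_RQ_STATE_BITS) - 1)
#define MQ_RQ_GEN_INC		(1U << MQ_RQ_STATE_BITS)

/* Extract the MQ_RQ_* state from the combined generation+state word. */
static unsigned int blk_mq_rq_state(unsigned int gstate)
{
	return gstate & MQ_RQ_STATE_MASK;
}

/* Replace the state bits; entering IN_FLIGHT starts a new lifetime, so
 * the generation (upper bits) is bumped at the same time. */
static unsigned int blk_mq_rq_update_state(unsigned int gstate,
					   enum mq_rq_state state)
{
	unsigned int v = (gstate & ~(unsigned int)MQ_RQ_STATE_MASK) | state;

	if (state == MQ_RQ_IN_FLIGHT)
		v += MQ_RQ_GEN_INC;
	return v;
}
```

A timeout handler that cached gstate before sleeping can then compare generations afterwards: a mismatch means the request it was watching was completed and reissued in the meantime.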
On Mon, Dec 04, 2017 at 12:26:32PM +0300, Dan Carpenter wrote:
> On Mon, Dec 04, 2017 at 09:18:05AM +0100, Dmitry Vyukov wrote:
> > On Mon, Dec 4, 2017 at 9:14 AM, Dan Carpenter
> > wrote:
> > > On Sun, Dec 03, 2017 at 12:16:08PM -0800, Eric Biggers wrote:
> > >> Looks like BLKTRACESETUP doesn't
On 12/12/2017 12:01 PM, Tejun Heo wrote:
> Changes from the last version[1]
>
> - BLK_EH_RESET_TIMER handling fixed.
>
> - s/request->gstate_seqc/request->gstate_seq/
>
> - READ_ONCE() added to blk_mq_rq_update_state().
>
> - Removed left over blk_clear_rq_complete() invocation from
> blk_mq_
Changes from the last version[1]
- BLK_EH_RESET_TIMER handling fixed.
- s/request->gstate_seqc/request->gstate_seq/
- READ_ONCE() added to blk_mq_rq_update_state().
- Removed left over blk_clear_rq_complete() invocation from
blk_mq_rq_timed_out().
Currently, blk-mq timeout path synchronizes
Currently, blk-mq protects only the issue path with RCU. This patch
puts the completion path under the same RCU protection. This will be
used to synchronize issue/completion against timeout by later patches,
which will also add the comments.
Signed-off-by: Tejun Heo
---
block/blk-mq.c | 16 +++
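The shape of the change is roughly the following kernel-style sketch, reconstructed from the changelog rather than copied from the patch (the hctx field name is an assumption): completion runs inside an RCU or SRCU read-side section depending on BLK_MQ_F_BLOCKING, so a later timeout patch can wait out concurrent completions with one synchronize_rcu()/synchronize_srcu() instead of per-request atomic flags:

```c
/* illustrative sketch, not the actual patch */
static void blk_mq_complete_request_sketch(struct blk_mq_hw_ctx *hctx,
					   struct request *rq)
{
	int srcu_idx;

	if (!(hctx->flags & BLK_MQ_F_BLOCKING)) {
		rcu_read_lock();
		__blk_mq_complete_request(rq);
		rcu_read_unlock();
	} else {
		/* blocking ->queue_rq drivers use sleepable SRCU instead */
		srcu_idx = srcu_read_lock(hctx->queue_rq_srcu);
		__blk_mq_complete_request(rq);
		srcu_read_unlock(hctx->queue_rq_srcu, srcu_idx);
	}
}
```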
blk_mq_check_inflight() and blk_mq_poll_hybrid_sleep() test
REQ_ATOM_COMPLETE to determine the request state. Both uses are
speculative and we can test REQ_ATOM_STARTED and blk_mq_rq_state() for
equivalent results. Replace the tests. This will allow removing
REQ_ATOM_COMPLETE usages from blk-mq.
After the recent updates to use generation number and state based
synchronization, blk-mq no longer depends on REQ_ATOM_COMPLETE for
anything.
Remove all REQ_ATOM_COMPLETE usages. This removes atomic bitops from
hot paths too.
v2: Removed blk_clear_rq_complete() from blk_mq_rq_timed_out().
Sign
Currently, blk-mq timeout path synchronizes against the usual
issue/completion path using a complex scheme involving atomic
bitflags, REQ_ATOM_*, memory barriers and subtle memory coherence
rules. Unfortunately, it contains quite a few holes.
There's a complex dance around REQ_ATOM_STARTED and
After the recent updates to use generation number and state based
synchronization, we can easily replace REQ_ATOM_STARTED usages by
adding an extra state to distinguish completed but not yet freed
state.
Add MQ_RQ_COMPLETE and replace REQ_ATOM_STARTED usages with
blk_mq_rq_state() tests. REQ_ATOM
With issue/complete and timeout paths now using the generation number
and state based synchronization, blk_abort_request() is the only one
which depends on REQ_ATOM_COMPLETE for arbitrating completion.
There's no reason for blk_abort_request() to be a completely separate
path. This patch makes bl
On Mon, Dec 11, 2017 at 9:20 PM, Byungchul Park wrote:
>
> The *problem* is false positives, since locks and waiters in
> kernel are not classified properly
So the problem is that those false positives apparently end up being a
big deal for the filesystem people.
I personally don't think the cod
On Mon, Dec 4, 2017 at 10:26 AM, Dan Carpenter wrote:
> On Mon, Dec 04, 2017 at 09:18:05AM +0100, Dmitry Vyukov wrote:
>> On Mon, Dec 4, 2017 at 9:14 AM, Dan Carpenter
>> wrote:
>> > On Sun, Dec 03, 2017 at 12:16:08PM -0800, Eric Biggers wrote:
>> >> Looks like BLKTRACESETUP doesn't limit the '.
On Tue, Dec 12, 2017 at 08:03:43AM -0500, Theodore Ts'o wrote:
> On Tue, Dec 12, 2017 at 02:20:32PM +0900, Byungchul Park wrote:
> > The *problem* is false positives, since locks and waiters in
> > kernel are not classified properly, at the moment, which is just
> > a fact that is not related to cr
On Tue, Dec 12, 2017 at 03:12:43PM +0100, Christoph Hellwig wrote:
> On Tue, Dec 12, 2017 at 07:02:30PM +0800, Ming Lei wrote:
> > blk_mq_pci_map_queues() may not map one CPU into any hw queue, but its
> > previous map isn't cleared yet, and may point to one stale hw queue
> > index.
> >
> > This
On Tue, Dec 12, 2017 at 03:13:58PM +0100, Christoph Hellwig wrote:
> On Tue, Dec 12, 2017 at 07:02:32PM +0800, Ming Lei wrote:
> > It may cause race by setting 'nvmeq' in nvme_init_request()
> > because .init_request is called inside switching io scheduler, which
> > may happen when the NVMe device
On Tue, Dec 12, 2017 at 07:02:32PM +0800, Ming Lei wrote:
> It may cause race by setting 'nvmeq' in nvme_init_request()
> because .init_request is called inside switching io scheduler, which
> may happen when the NVMe device is being reset and its nvme queues
> are being freed and created. We do
On Tue, Dec 12, 2017 at 07:02:30PM +0800, Ming Lei wrote:
> blk_mq_pci_map_queues() may not map one CPU into any hw queue, but its
> previous map isn't cleared yet, and may point to one stale hw queue
> index.
>
> This patch fixes the following issue by clearing the mapping table before
> setting
On Tue, Dec 12, 2017 at 02:20:32PM +0900, Byungchul Park wrote:
>
> The *problem* is false positives, since locks and waiters in
> kernel are not classified properly, at the moment, which is just
> a fact that is not related to cross-release stuff at all. IOW,
> that would be useful once all locks
Dispatch may still be in-progress after queue is frozen, so we have to
quiesce queue before switching IO scheduler and updating nr_requests.
Also when switching io schedulers, blk_mq_run_hw_queue() may still be
called somewhere (such as from nvme_reset_work()), and the io scheduler's
per-hctx data may
blk_mq_pci_map_queues() may not map one CPU into any hw queue, but its
previous map isn't cleared yet, and may point to one stale hw queue
index.
This patch fixes the following issue by clearing the mapping table before
setting it up in blk_mq_pci_map_queues().
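The bug class is easy to model in userspace: if the PCI-affinity pass skips a CPU, that CPU's old entry would otherwise keep pointing at a possibly stale hw queue index. The sketch below is illustrative (the function and the affinity encoding are hypothetical, not the kernel code); clearing the table to the always-valid queue 0 first is the fix described above:

```c
#include <assert.h>

/* affinity[cpu] >= 0 means "this CPU is covered by that hw queue's irq
 * vector"; affinity[cpu] < 0 models a CPU no vector covers. */
static void demo_pci_map_queues(unsigned int *mq_map,
				const int *affinity, int nr_cpus)
{
	int cpu;

	/* the fix: reset every CPU to hw queue 0 before mapping */
	for (cpu = 0; cpu < nr_cpus; cpu++)
		mq_map[cpu] = 0;

	for (cpu = 0; cpu < nr_cpus; cpu++)
		if (affinity[cpu] >= 0)
			mq_map[cpu] = (unsigned int)affinity[cpu];
}
```

Without the clearing loop, a CPU dropped from the new affinity masks would silently keep its previous, possibly out-of-range, hw queue index.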
Turns out that blk_mq_freeze_queue() isn't stronger[1] than
blk_mq_quiesce_queue(), because dispatch may still be in progress after
the queue is frozen; in several cases, such as switching the io scheduler
and updating hw queues, we still need to quiesce the queue as a
supplement to freezing it.
As we
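The supplement described above amounts to pairing the two primitives. The block-layer APIs below are real; the wrapper is a hypothetical sketch of the pattern, not code from this series:

```c
static void demo_switch_sched(struct request_queue *q)   /* hypothetical */
{
	blk_mq_freeze_queue(q);    /* no new requests may enter the queue */
	blk_mq_quiesce_queue(q);   /* wait for in-progress dispatch (.queue_rq) */

	/* ... now safe to switch the io scheduler / update nr_requests ... */

	blk_mq_unquiesce_queue(q);
	blk_mq_unfreeze_queue(q);
}
```

Freezing alone only drains outstanding requests; quiescing additionally waits for dispatch code that already entered the queue, which is exactly the window the cover letter says freezing leaves open.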
Setting 'nvmeq' in nvme_init_request() may cause a race,
because .init_request is called while switching the io scheduler, which
may happen when the NVMe device is being reset and its nvme queues
are being freed and created. We don't have any sync between the two
paths.
This patch removes the
In both elevator_switch_mq() and blk_mq_update_nr_hw_queues(), sched tags
can be allocated and q->nr_hw_queues is used, so a race is inevitable; for
example, blk_mq_init_sched() may trigger a use-after-free on hctx, which is
freed in blk_mq_realloc_hw_ctxs() when nr_hw_queues is decreased.
This patch
After the queue is frozen, dispatch may still happen, for example:
1) requests are submitted from several contexts
2) requests from all these contexts are inserted to the queue, but may be
dispatched to the LLD in one of these paths, while the other paths still
need to move on even if all these requests are completed (that m
Hi,
The 1st patch fixes one kernel oops triggered by IOs vs. deleting SCSI
device, and this issue can be triggered in less than 1min on scsi_debug.
The other 5 patches fix Yi Zhang's recent reports about his NVMe stress
tests; most of them are related to switching io sched, NVMe reset or
updati
On Mon, Dec 11, 2017 at 11:57:38PM -0800, Christoph Hellwig wrote:
> Most of this looks sane, but I'd really like to see it in context
> of the actual multipage bvec patches. Do you have an updated branch
> on top of these?
I will post it out soon after addressing some of last comments.
Thanks,
38 matches