Re: About the try to remove cross-release feature entirely by Ingo

2017-12-12 Thread Byungchul Park
On Wed, Dec 13, 2017 at 3:24 PM, Byungchul Park wrote: > Lockdep works, based on the following: > > (1) Classifying locks properly > (2) Checking relationship between the classes > > If (1) is not good or (2) is not good, then we > might get false positives. > > For (1), we don't have to cla

Re: [PATCH 6/6] nvme: remove .init_request callback

2017-12-12 Thread Ming Lei
On Tue, Dec 12, 2017 at 11:10:32PM +0800, Ming Lei wrote: > On Tue, Dec 12, 2017 at 03:13:58PM +0100, Christoph Hellwig wrote: > > On Tue, Dec 12, 2017 at 07:02:32PM +0800, Ming Lei wrote: > > > It may cause a race by setting 'nvmeq' in nvme_init_request() > > > because .init_request is called inside

About the try to remove cross-release feature entirely by Ingo

2017-12-12 Thread Byungchul Park
Lockdep works, based on the following: (1) Classifying locks properly (2) Checking relationship between the classes If (1) is not good or (2) is not good, then we might get false positives. For (1), we don't have to classify locks 100% properly but need to do so well enough for lockdep to work. For (2)

Re: [PATCH] locking/lockdep: Add CONFIG_LOCKDEP_AGGRESSIVE

2017-12-12 Thread Byungchul Park
On 12/13/2017 2:00 AM, Linus Torvalds wrote: > On Mon, Dec 11, 2017 at 9:20 PM, Byungchul Park wrote: > > The *problem* is false positives, since locks and waiters in > > kernel are not classified properly > So the problem is that those false positives apparently end up being a big deal for the filesyst

Re: [PATCH] locking/lockdep: Add CONFIG_LOCKDEP_AGGRESSIVE

2017-12-12 Thread Byungchul Park
On 12/12/2017 10:03 PM, Theodore Ts'o wrote: > On Tue, Dec 12, 2017 at 02:20:32PM +0900, Byungchul Park wrote: > > The *problem* is false positives, since locks and waiters in > > kernel are not classified properly, at the moment, which is just > > a fact that is not related to cross-release stuff at all. IO

Re: [PATCH 2/6] blk-mq: replace timeout synchronization with a RCU and generation based scheme

2017-12-12 Thread jianchao.wang
Hi Tejun, On 12/13/2017 03:01 AM, Tejun Heo wrote: > Currently, blk-mq timeout path synchronizes against the usual > issue/completion path using a complex scheme involving atomic > bitflags, REQ_ATOM_*, memory barriers and subtle memory coherence > rules. Unfortunately, it contains quite a few hol

Re: [PATCH 1/6] blk-mq: protect completion path with RCU

2017-12-12 Thread jianchao.wang
Hello Tejun, Sorry for missing the V2; same comment again. On 12/13/2017 03:01 AM, Tejun Heo wrote: > Currently, blk-mq protects only the issue path with RCU. This patch > puts the completion path under the same RCU protection. This will be > used to synchronize issue/completion against timeout

Re: [PATCH 00/11] fs: use freeze_fs on suspend/hibernate

2017-12-12 Thread Rafael J. Wysocki
On Thursday, November 30, 2017 12:23:45 AM CET Luis R. Rodriguez wrote: > This is a followup from the original RFC which proposed to start > to kill kthread freezing all together [0]. Instead of going straight > out to the jugular for kthread freezing this series only addresses > killing freezer ca

Re: [PATCH 6/6] blk-mq: remove REQ_ATOM_STARTED

2017-12-12 Thread t...@kernel.org
On Tue, Dec 12, 2017 at 10:20:39PM +, Bart Van Assche wrote: > The above code should show all requests owned by the block driver. Patch > "blk-mq-debugfs: Also show requests that have not yet been started" (not yet > in Jens' tree) changes the REQ_ATOM_STARTED test into > list_empty(&rq->queue

Re: [PATCH 6/6] blk-mq: remove REQ_ATOM_STARTED

2017-12-12 Thread Bart Van Assche
On Tue, 2017-12-12 at 11:01 -0800, Tejun Heo wrote: > @@ -409,7 +407,7 @@ static void hctx_show_busy_rq(struct request *rq, void > *data, bool reserved) > const struct show_busy_params *params = data; > > if (blk_mq_map_queue(rq->q, rq->mq_ctx->cpu) == params->hctx && > -

Re: [PATCH 2/6] blk-mq: replace timeout synchronization with a RCU and generation based scheme

2017-12-12 Thread t...@kernel.org
Hello, Bart. On Tue, Dec 12, 2017 at 09:37:11PM +, Bart Van Assche wrote: > Have you considered the following instead of introducing MQ_RQ_IDLE and > MQ_RQ_IN_FLIGHT? I think this could help to limit the number of new atomic > operations introduced in the hot path by this patch series. But no

Re: [PATCHSET v2] blk-mq: reimplement timeout handling

2017-12-12 Thread Tejun Heo
Hello, Jens. On Tue, Dec 12, 2017 at 01:23:01PM -0700, Jens Axboe wrote: > I like this a lot, it's a lot less fragile and more intuitive/readable > than what we have now. And apparently less error prone... I'll do > some testing with this. Great. > BTW, since you add a few more BLK_MQ_F_BLOCKING

Re: [PATCH 2/6] blk-mq: replace timeout synchronization with a RCU and generation based scheme

2017-12-12 Thread Bart Van Assche
On Tue, 2017-12-12 at 11:01 -0800, Tejun Heo wrote: > +/* > + * Bits for request->gstate. The lower two bits carry MQ_RQ_* state value > + * and the upper bits the generation number. > + */ > +enum mq_rq_state { > + MQ_RQ_IDLE = 0, > + MQ_RQ_IN_FLIGHT = 1, > + > +

Re: WARNING in kmalloc_slab (3)

2017-12-12 Thread Eric Biggers
On Mon, Dec 04, 2017 at 12:26:32PM +0300, Dan Carpenter wrote: > On Mon, Dec 04, 2017 at 09:18:05AM +0100, Dmitry Vyukov wrote: > > On Mon, Dec 4, 2017 at 9:14 AM, Dan Carpenter > > wrote: > > > On Sun, Dec 03, 2017 at 12:16:08PM -0800, Eric Biggers wrote: > > >> Looks like BLKTRACESETUP doesn't

Re: [PATCHSET v2] blk-mq: reimplement timeout handling

2017-12-12 Thread Jens Axboe
On 12/12/2017 12:01 PM, Tejun Heo wrote: > Changes from the last version[1] > > - BLK_EH_RESET_TIMER handling fixed. > > - s/request->gstate_seqc/request->gstate_seq/ > > - READ_ONCE() added to blk_mq_rq_update_state(). > > - Removed left over blk_clear_rq_complete() invocation from > blk_mq_

[PATCHSET v2] blk-mq: reimplement timeout handling

2017-12-12 Thread Tejun Heo
Changes from the last version[1] - BLK_EH_RESET_TIMER handling fixed. - s/request->gstate_seqc/request->gstate_seq/ - READ_ONCE() added to blk_mq_rq_update_state(). - Removed left over blk_clear_rq_complete() invocation from blk_mq_rq_timed_out(). Currently, blk-mq timeout path synchronizes

[PATCH 1/6] blk-mq: protect completion path with RCU

2017-12-12 Thread Tejun Heo
Currently, blk-mq protects only the issue path with RCU. This patch puts the completion path under the same RCU protection. This will be used to synchronize issue/completion against timeout by later patches, which will also add the comments. Signed-off-by: Tejun Heo --- block/blk-mq.c | 16 +++

[PATCH 3/6] blk-mq: use blk_mq_rq_state() instead of testing REQ_ATOM_COMPLETE

2017-12-12 Thread Tejun Heo
blk_mq_check_inflight() and blk_mq_poll_hybrid_sleep() test REQ_ATOM_COMPLETE to determine the request state. Both uses are speculative and we can test REQ_ATOM_STARTED and blk_mq_rq_state() for equivalent results. Replace the tests. This will allow removing REQ_ATOM_COMPLETE usages from blk-mq.

[PATCH 5/6] blk-mq: remove REQ_ATOM_COMPLETE usages from blk-mq

2017-12-12 Thread Tejun Heo
After the recent updates to use generation number and state based synchronization, blk-mq no longer depends on REQ_ATOM_COMPLETE for anything. Remove all REQ_ATOM_COMPLETE usages. This removes atomic bitops from hot paths too. v2: Removed blk_clear_rq_complete() from blk_mq_rq_timed_out(). Sign

[PATCH 2/6] blk-mq: replace timeout synchronization with a RCU and generation based scheme

2017-12-12 Thread Tejun Heo
Currently, blk-mq timeout path synchronizes against the usual issue/completion path using a complex scheme involving atomic bitflags, REQ_ATOM_*, memory barriers and subtle memory coherence rules. Unfortunately, it contains quite a few holes. There's a complex dancing around REQ_ATOM_STARTED and

[PATCH 6/6] blk-mq: remove REQ_ATOM_STARTED

2017-12-12 Thread Tejun Heo
After the recent updates to use generation number and state based synchronization, we can easily replace REQ_ATOM_STARTED usages by adding an extra state to distinguish completed but not yet freed state. Add MQ_RQ_COMPLETE and replace REQ_ATOM_STARTED usages with blk_mq_rq_state() tests. REQ_ATOM

[PATCH 4/6] blk-mq: make blk_abort_request() trigger timeout path

2017-12-12 Thread Tejun Heo
With issue/complete and timeout paths now using the generation number and state based synchronization, blk_abort_request() is the only one which depends on REQ_ATOM_COMPLETE for arbitrating completion. There's no reason for blk_abort_request() to be a completely separate path. This patch makes bl

Re: [PATCH] locking/lockdep: Add CONFIG_LOCKDEP_AGGRESSIVE

2017-12-12 Thread Linus Torvalds
On Mon, Dec 11, 2017 at 9:20 PM, Byungchul Park wrote: > > The *problem* is false positives, since locks and waiters in > kernel are not classified properly So the problem is that those false positives apparently end up being a big deal for the filesystem people. I personally don't think the cod

Re: WARNING in kmalloc_slab (3)

2017-12-12 Thread Dmitry Vyukov
On Mon, Dec 4, 2017 at 10:26 AM, Dan Carpenter wrote: > On Mon, Dec 04, 2017 at 09:18:05AM +0100, Dmitry Vyukov wrote: >> On Mon, Dec 4, 2017 at 9:14 AM, Dan Carpenter >> wrote: >> > On Sun, Dec 03, 2017 at 12:16:08PM -0800, Eric Biggers wrote: >> >> Looks like BLKTRACESETUP doesn't limit the '.

Re: [PATCH] locking/lockdep: Add CONFIG_LOCKDEP_AGGRESSIVE

2017-12-12 Thread Matthew Wilcox
On Tue, Dec 12, 2017 at 08:03:43AM -0500, Theodore Ts'o wrote: > On Tue, Dec 12, 2017 at 02:20:32PM +0900, Byungchul Park wrote: > > The *problem* is false positives, since locks and waiters in > > kernel are not classified properly, at the moment, which is just > > a fact that is not related to cr

Re: [PATCH 4/6] blk-mq: avoid to map CPU into stale hw queue

2017-12-12 Thread Ming Lei
On Tue, Dec 12, 2017 at 03:12:43PM +0100, Christoph Hellwig wrote: > On Tue, Dec 12, 2017 at 07:02:30PM +0800, Ming Lei wrote: > > blk_mq_pci_map_queues() may not map one CPU into any hw queue, but its > > previous map isn't cleared yet, and may point to one stale hw queue > > index. > > > > This

Re: [PATCH 6/6] nvme: remove .init_request callback

2017-12-12 Thread Ming Lei
On Tue, Dec 12, 2017 at 03:13:58PM +0100, Christoph Hellwig wrote: > On Tue, Dec 12, 2017 at 07:02:32PM +0800, Ming Lei wrote: > > It may cause race by setting 'nvmeq' in nvme_init_request() > > because .init_request is called inside switching io scheduler, which > > may happen when the NVMe device

Re: [PATCH 6/6] nvme: remove .init_request callback

2017-12-12 Thread Christoph Hellwig
On Tue, Dec 12, 2017 at 07:02:32PM +0800, Ming Lei wrote: > It may cause a race by setting 'nvmeq' in nvme_init_request() > because .init_request is called inside switching io scheduler, which > may happen when the NVMe device is being reset and its nvme queues > are being freed and created. We do

Re: [PATCH 4/6] blk-mq: avoid to map CPU into stale hw queue

2017-12-12 Thread Christoph Hellwig
On Tue, Dec 12, 2017 at 07:02:30PM +0800, Ming Lei wrote: > blk_mq_pci_map_queues() may not map one CPU into any hw queue, but its > previous map isn't cleared yet, and may point to one stale hw queue > index. > > This patch fixes the following issue by clearing the mapping table before > setting

Re: [PATCH] locking/lockdep: Add CONFIG_LOCKDEP_AGGRESSIVE

2017-12-12 Thread Theodore Ts'o
On Tue, Dec 12, 2017 at 02:20:32PM +0900, Byungchul Park wrote: > > The *problem* is false positives, since locks and waiters in > kernel are not classified properly, at the moment, which is just > a fact that is not related to cross-release stuff at all. IOW, > that would be useful once all locks

[PATCH 3/6] blk-mq: quiesce queue during switching io sched and updating nr_requests

2017-12-12 Thread Ming Lei
Dispatch may still be in-progress after queue is frozen, so we have to quiesce the queue before switching IO scheduler and updating nr_requests. Also when switching io schedulers, blk_mq_run_hw_queue() may still be called somewhere (such as from nvme_reset_work()), and io scheduler's per-hctx data may

[PATCH 4/6] blk-mq: avoid to map CPU into stale hw queue

2017-12-12 Thread Ming Lei
blk_mq_pci_map_queues() may not map one CPU into any hw queue, but its previous map isn't cleared yet, and may point to one stale hw queue index. This patch fixes the following issue by clearing the mapping table before setting it up in blk_mq_pci_map_queues(). This patch fixes the following i

[PATCH 2/6] blk-mq: support concurrent blk_mq_quiesce_queue()

2017-12-12 Thread Ming Lei
Turns out that blk_mq_freeze_queue() isn't stronger[1] than blk_mq_quiesce_queue() because dispatch may still be in-progress after queue is frozen, and in several cases, such as switching io scheduler, and updating hw queues, we still need to quiesce queue as a supplement to freezing the queue. As we

[PATCH 6/6] nvme: remove .init_request callback

2017-12-12 Thread Ming Lei
It may cause a race by setting 'nvmeq' in nvme_init_request() because .init_request is called inside switching io scheduler, which may happen when the NVMe device is being reset and its nvme queues are being freed and created. We don't have any sync between the two paths. This patch removes the

[PATCH 5/6] blk-mq: fix race between updating nr_hw_queues and switching io sched

2017-12-12 Thread Ming Lei
In both elevator_switch_mq() and blk_mq_update_nr_hw_queues(), sched tags can be allocated, and q->nr_hw_queues is used, and a race is inevitable, for example: blk_mq_init_sched() may trigger a use-after-free on hctx, which is freed in blk_mq_realloc_hw_ctxs() when nr_hw_queues is decreased. This patch

[PATCH 1/6] blk-mq: quiesce queue before freeing queue

2017-12-12 Thread Ming Lei
After queue is frozen, dispatch still may happen, for example: 1) requests are submitted from several contexts 2) requests from all these contexts are inserted to queue, but may dispatch to LLD in one of these paths, but other paths still need to move on even if all these requests are completed (that m

[PATCH 0/6] blk-mq: fix race related with device deletion/reset/switching sched

2017-12-12 Thread Ming Lei
Hi, The 1st patch fixes one kernel oops triggered by IOs vs. deleting SCSI device, and this issue can be triggered in less than 1min on scsi_debug. The other 5 patches fix Yi Zhang's recent reports about his NVMe stress tests; most of them are related to switching io sched, NVMe reset or updati

Re: [PATCH 00/10] block: cleanup on direct access to bvec table(prepare for multipage bvec)

2017-12-12 Thread Ming Lei
On Mon, Dec 11, 2017 at 11:57:38PM -0800, Christoph Hellwig wrote: > Most of this looks sane, but I'd really like to see it in context > of the actual multipage bvec patches. Do you have an updated branch > on top of these? I will post it out soon after addressing some of the last comments. Thanks,