Re: [PATCH v4 0/3] AIO add per-command iopriority
On 5/17/18 2:38 PM, adam.manzana...@wdc.com wrote:
> From: Adam Manzanares
>
> This is the per-I/O equivalent of the ioprio_set system call.
> See the following link for performance implications on a SATA HDD:
> https://lkml.org/lkml/2016/12/6/495
>
> First patch factors ioprio_check_cap function out of ioprio_set system call to
> also be used by the aio ioprio interface.
>
> Second patch converts kiocb ki_hint field to a u16 to avoid kiocb bloat.
>
> Third patch passes ioprio hint from aio iocb to kiocb and enables block_dev
> usage of the per I/O ioprio feature.
>
> v2: merge patches
>     use IOCB_FLAG_IOPRIO
>     validate intended use with IOCB_IOPRIO
>     add linux-api and linux-block to cc
>
> v3: add ioprio_check_cap function
>     convert kiocb ki_hint to u16
>     use ioprio_check_cap when adding ioprio to kiocb in aio.c
>
> v4: handle IOCB_IOPRIO in aio_prep_rw
>     note patch 3 depends on patch 1 in commit msg
>
> Adam Manzanares (3):
>   block: add ioprio_check_cap function
>   fs: Convert kiocb rw_hint from enum to u16
>   fs: Add aio iopriority support for block_dev
>
>  block/ioprio.c               | 22 ++++++++++++++++------
>  fs/aio.c                     | 16 ++++++++++++++++
>  fs/block_dev.c               |  2 ++
>  include/linux/fs.h           | 17 +++++++++++++++--
>  include/linux/ioprio.h       |  2 ++
>  include/uapi/linux/aio_abi.h |  1 +
>  6 files changed, 52 insertions(+), 8 deletions(-)

This looks fine to me now. I can pick up #1 for 4.18 - and 2+3 as well,
unless someone else wants to take them.

-- 
Jens Axboe
Re: [PATCH V6 11/11] nvme: pci: support nested EH
On Wed, May 16, 2018 at 08:20:31PM -0600, Keith Busch wrote:
> On Thu, May 17, 2018 at 07:10:59AM +0800, Ming Lei wrote:
> > All simulation in block/011 may happen in reality.
>
> If this test actually simulates reality, then the following one line
> patch (plus explanation for why) would be a real "fix" as this is very
> successful in passing block/011. :)
>
> ---
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index 1faa32cd07da..dcc5746304c4 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -2118,6 +2118,12 @@ static int nvme_pci_enable(struct nvme_dev *dev)
>
>  	if (pci_enable_device_mem(pdev))
>  		return result;
> +	/*
> +	 * blktests block/011 disables the device without the driver knowing.
> +	 * We'll just enable the device twice to get the enable_cnt > 1
> +	 * so that the test's disabling does absolutely nothing.
> +	 */
> +	pci_enable_device_mem(pdev);

Heh. But yes, this test and the PCI "enable" interface in sysfs look
horribly wrong. pci_disable_device/pci_enable_device aren't something we
can just do underneath the driver. Even if injecting the lowlevel
config space manipulations done by it might be useful and a good test,
the Linux state ones are just killing the driver.

The enable attribute appears to have been added by Arjan for the
Xorg driver. I think if we have a driver bound to the device we
should not allow it.
[PATCH v2] bcache: return 0 from bch_debug_init() if CONFIG_DEBUG_FS=n
Commit 539d39eb2708 ("bcache: fix wrong return value in bch_debug_init()")
returns the return value of debugfs_create_dir() to bcache_init(). When
CONFIG_DEBUG_FS=n, bch_debug_init() always returns 1 and makes
bcache_init() fail.

This patch makes bch_debug_init() always return 0 if CONFIG_DEBUG_FS=n,
so bcache can continue to work for kernels which don't have debugfs
enabled.

Fixes: 539d39eb2708 ("bcache: fix wrong return value in bch_debug_init()")
Cc: sta...@vger.kernel.org
Signed-off-by: Coly Li
Reported-by: Massimo B.
Reported-by: Kai Krakow
Tested-by: Kai Krakow
Cc: Kent Overstreet
---
 drivers/md/bcache/bcache.h | 5 +++++
 drivers/md/bcache/debug.c  | 8 ++++----
 2 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/drivers/md/bcache/bcache.h b/drivers/md/bcache/bcache.h
index 3a0cfb237af9..5b3fe87f32ee 100644
--- a/drivers/md/bcache/bcache.h
+++ b/drivers/md/bcache/bcache.h
@@ -994,8 +994,13 @@ void bch_open_buckets_free(struct cache_set *);
 
 int bch_cache_allocator_start(struct cache *ca);
 
+#ifdef CONFIG_DEBUG_FS
 void bch_debug_exit(void);
 int bch_debug_init(struct kobject *);
+#else
+static inline void bch_debug_exit(void) {};
+static inline int bch_debug_init(struct kobject *kobj) { return 0; };
+#endif
 void bch_request_exit(void);
 int bch_request_init(void);
 
diff --git a/drivers/md/bcache/debug.c b/drivers/md/bcache/debug.c
index 4e63c6f6c04d..20e5e524e88e 100644
--- a/drivers/md/bcache/debug.c
+++ b/drivers/md/bcache/debug.c
@@ -17,8 +17,6 @@
 #include
 #include
 
-struct dentry *bcache_debug;
-
 #ifdef CONFIG_BCACHE_DEBUG
 
 #define for_each_written_bset(b, start, i)				\
@@ -151,6 +149,8 @@ void bch_data_verify(struct cached_dev *dc, struct bio *bio)
 
 /* XXX: cache set refcounting */
 
+struct dentry *bcache_debug;
+
 struct dump_iterator {
 	char		buf[PAGE_SIZE];
 	size_t		bytes;
@@ -240,8 +240,6 @@ void bch_debug_init_cache_set(struct cache_set *c)
 	}
 }
 
-#endif
-
 void bch_debug_exit(void)
 {
 	if (!IS_ERR_OR_NULL(bcache_debug))
@@ -254,3 +252,5 @@ int __init bch_debug_init(struct kobject *kobj)
 
 	return IS_ERR_OR_NULL(bcache_debug);
 }
+
+#endif
-- 
2.16.3
Re: [PATCH V6 11/11] nvme: pci: support nested EH
On Thu, May 17, 2018 at 08:20:51AM -0600, Keith Busch wrote:
> > Heh. But yes, this test and the PCI "enable" interface in sysfs look
> > horribly wrong. pci_disable_device/pci_enable_device aren't something we
> > can just do underneath the driver. Even if injecting the lowlevel
> > config space manipulations done by it might be useful and a good test,
> > the Linux state ones are just killing the driver.
>
> Yes, I'm totally fine with injecting errors into the config space, but
> for goodness sakes, don't fuck with the internal kernel structures out
> from under drivers using them.
>
> My suggestion is to use 'setpci' to disable the device if you want to
> inject this scenario. That way you get the desired broken device
> scenario you want to test, but struct pci_dev hasn't been altered.
>
> > The enable attribute appears to have been added by Arjan for the
> > Xorg driver. I think if we have a driver bound to the device we
> > should not allow it.
>
> Agreed. Alternatively possibly call the driver's reset_prepare/done
> callbacks.

Exactly, but as long as we can issue the reset via sysfs the test-case
is still valid.
-- 
Johannes Thumshirn                                          Storage
jthumsh...@suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850
Re: [PATCH blktests] Documentation: document prerequisite scriptlets
Omar, ping? -- Johannes Thumshirn Storage jthumsh...@suse.de+49 911 74053 689 SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: Felix Imendörffer, Jane Smithard, Graham Norton HRB 21284 (AG Nürnberg) Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850
Re: [PATCH V6 11/11] nvme: pci: support nested EH
On Thu, May 17, 2018 at 10:41:29AM +0200, Christoph Hellwig wrote:
> On Wed, May 16, 2018 at 08:20:31PM -0600, Keith Busch wrote:
> > On Thu, May 17, 2018 at 07:10:59AM +0800, Ming Lei wrote:
> > > All simulation in block/011 may happen in reality.
> >
> > If this test actually simulates reality, then the following one line
> > patch (plus explanation for why) would be a real "fix" as this is very
> > successful in passing block/011. :)
> >
> > ---
> > diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> > index 1faa32cd07da..dcc5746304c4 100644
> > --- a/drivers/nvme/host/pci.c
> > +++ b/drivers/nvme/host/pci.c
> > @@ -2118,6 +2118,12 @@ static int nvme_pci_enable(struct nvme_dev *dev)
> >
> >  	if (pci_enable_device_mem(pdev))
> >  		return result;
> > +	/*
> > +	 * blktests block/011 disables the device without the driver knowing.
> > +	 * We'll just enable the device twice to get the enable_cnt > 1
> > +	 * so that the test's disabling does absolutely nothing.
> > +	 */
> > +	pci_enable_device_mem(pdev);
>
> Heh. But yes, this test and the PCI "enable" interface in sysfs look
> horribly wrong. pci_disable_device/pci_enable_device aren't something we
> can just do underneath the driver. Even if injecting the lowlevel
> config space manipulations done by it might be useful and a good test,
> the Linux state ones are just killing the driver.

Yes, I'm totally fine with injecting errors into the config space, but
for goodness sakes, don't fuck with the internal kernel structures out
from under drivers using them.

My suggestion is to use 'setpci' to disable the device if you want to
inject this scenario. That way you get the desired broken device
scenario you want to test, but struct pci_dev hasn't been altered.

> The enable attribute appears to have been added by Arjan for the
> Xorg driver. I think if we have a driver bound to the device we
> should not allow it.

Agreed. Alternatively possibly call the driver's reset_prepare/done
callbacks.
Re: [PATCH 04/33] fs: remove the buffer_unwritten check in page_seek_hole_data
2018-05-09 9:48 GMT+02:00 Christoph Hellwig:
> We only call into this function through the iomap iterators, so we already
> know the buffer is unwritten. In addition to that we always require the
> uptodate flag that is ORed with the result anyway.

Please update the page_cache_seek_hole_data description as well:

--- a/fs/iomap.c
+++ b/fs/iomap.c
@@ -647,8 +647,8 @@
  * Seek for SEEK_DATA / SEEK_HOLE in the page cache.
  *
  * Within unwritten extents, the page cache determines which parts are holes
- * and which are data: unwritten and uptodate buffer heads count as data;
- * everything else counts as a hole.
+ * and which are data: uptodate buffer heads count as data; everything else
+ * counts as a hole.
  *
  * Returns the resulting offset on successs, and -ENOENT otherwise.
  */

Thanks,
Andreas
Re: [PATCH v2] bcache: return 0 from bch_debug_init() if CONFIG_DEBUG_FS=n
On Thu, May 17, 2018 at 05:53:48PM +0800, Coly Li wrote:
> Commit 539d39eb2708 ("bcache: fix wrong return value in bch_debug_init()")
> returns the return value of debugfs_create_dir() to bcache_init(). When
> CONFIG_DEBUG_FS=n, bch_debug_init() always returns 1 and makes
> bcache_init() fail.
>
> This patch makes bch_debug_init() always return 0 if CONFIG_DEBUG_FS=n,
> so bcache can continue to work for kernels which don't have debugfs
> enabled.
>
> Fixes: 539d39eb2708 ("bcache: fix wrong return value in bch_debug_init()")
> Cc: sta...@vger.kernel.org
> Signed-off-by: Coly Li
> Reported-by: Massimo B.
> Reported-by: Kai Krakow
> Tested-by: Kai Krakow
> Cc: Kent Overstreet
> ---
>  drivers/md/bcache/bcache.h | 5 +++++
>  drivers/md/bcache/debug.c  | 8 ++++----
>  2 files changed, 9 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/md/bcache/bcache.h b/drivers/md/bcache/bcache.h
> index 3a0cfb237af9..5b3fe87f32ee 100644
> --- a/drivers/md/bcache/bcache.h
> +++ b/drivers/md/bcache/bcache.h
> @@ -994,8 +994,13 @@ void bch_open_buckets_free(struct cache_set *);
>
>  int bch_cache_allocator_start(struct cache *ca);
>
> +#ifdef CONFIG_DEBUG_FS

you could just use if (IS_ENABLED(CONFIG_DEBUG_FS)) in bch_debug_init, that
way you don't need to add an #ifdef

>  void bch_debug_exit(void);
>  int bch_debug_init(struct kobject *);
> +#else
> +static inline void bch_debug_exit(void) {};
> +static inline int bch_debug_init(struct kobject *kobj) { return 0; };
> +#endif
>  void bch_request_exit(void);
>  int bch_request_init(void);
>
> diff --git a/drivers/md/bcache/debug.c b/drivers/md/bcache/debug.c
> index 4e63c6f6c04d..20e5e524e88e 100644
> --- a/drivers/md/bcache/debug.c
> +++ b/drivers/md/bcache/debug.c
> @@ -17,8 +17,6 @@
>  #include
>  #include
>
> -struct dentry *bcache_debug;
> -
>  #ifdef CONFIG_BCACHE_DEBUG
>
>  #define for_each_written_bset(b, start, i)				\
> @@ -151,6 +149,8 @@ void bch_data_verify(struct cached_dev *dc, struct bio *bio)
>
>  /* XXX: cache set refcounting */
>
> +struct dentry *bcache_debug;
> +
>  struct dump_iterator {
>  	char		buf[PAGE_SIZE];
>  	size_t		bytes;
> @@ -240,8 +240,6 @@ void bch_debug_init_cache_set(struct cache_set *c)
>  	}
>  }
>
> -#endif
> -
>  void bch_debug_exit(void)
>  {
>  	if (!IS_ERR_OR_NULL(bcache_debug))
> @@ -254,3 +252,5 @@ int __init bch_debug_init(struct kobject *kobj)
>
>  	return IS_ERR_OR_NULL(bcache_debug);
>  }
> +
> +#endif
> --
> 2.16.3
>
Re: [PATCH v3] bcache: return 0 from bch_debug_init() if CONFIG_DEBUG_FS=n
On Thu, May 17, 2018 at 10:32:54PM +0800, Coly Li wrote:
> Commit 539d39eb2708 ("bcache: fix wrong return value in bch_debug_init()")
> returns the return value of debugfs_create_dir() to bcache_init(). When
> CONFIG_DEBUG_FS=n, bch_debug_init() always returns 1 and makes
> bcache_init() fail.
>
> This patch makes bch_debug_init() always return 0 if CONFIG_DEBUG_FS=n,
> so bcache can continue to work for kernels which don't have debugfs
> enabled.
>
> Fixes: 539d39eb2708 ("bcache: fix wrong return value in bch_debug_init()")
> Cc: sta...@vger.kernel.org
> Signed-off-by: Coly Li
> Reported-by: Massimo B.
> Reported-by: Kai Krakow
> Tested-by: Kai Krakow
> Cc: Kent Overstreet

Acked-by: Kent Overstreet
[PATCH v3] bcache: return 0 from bch_debug_init() if CONFIG_DEBUG_FS=n
Commit 539d39eb2708 ("bcache: fix wrong return value in bch_debug_init()")
returns the return value of debugfs_create_dir() to bcache_init(). When
CONFIG_DEBUG_FS=n, bch_debug_init() always returns 1 and makes
bcache_init() fail.

This patch makes bch_debug_init() always return 0 if CONFIG_DEBUG_FS=n,
so bcache can continue to work for kernels which don't have debugfs
enabled.

Fixes: 539d39eb2708 ("bcache: fix wrong return value in bch_debug_init()")
Cc: sta...@vger.kernel.org
Signed-off-by: Coly Li
Reported-by: Massimo B.
Reported-by: Kai Krakow
Tested-by: Kai Krakow
Cc: Kent Overstreet
---
 drivers/md/bcache/debug.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/md/bcache/debug.c b/drivers/md/bcache/debug.c
index 4e63c6f6c04d..d030ce3025a6 100644
--- a/drivers/md/bcache/debug.c
+++ b/drivers/md/bcache/debug.c
@@ -250,7 +250,9 @@ void bch_debug_exit(void)
 
 int __init bch_debug_init(struct kobject *kobj)
 {
-	bcache_debug = debugfs_create_dir("bcache", NULL);
+	if (!IS_ENABLED(CONFIG_DEBUG_FS))
+		return 0;
 
+	bcache_debug = debugfs_create_dir("bcache", NULL);
 	return IS_ERR_OR_NULL(bcache_debug);
 }
-- 
2.16.3
Re: [PATCH v2] bcache: return 0 from bch_debug_init() if CONFIG_DEBUG_FS=n
On 2018/5/17 7:35 PM, Kent Overstreet wrote: > On Thu, May 17, 2018 at 05:53:48PM +0800, Coly Li wrote: >> Commit 539d39eb2708 ("bcache: fix wrong return value in bch_debug_init()") >> returns the return value of debugfs_create_dir() to bcache_init(). When >> CONFIG_DEBUG_FS=n, bch_debug_init() always returns 1 and makes >> bcache_init() failedi. >> >> This patch makes bch_debug_init() always returns 0 if CONFIG_DEBUG_FS=n, >> so bcache can continue to work for the kernels which don't have debugfs >> enanbled. >> >> Fixes: Commit 539d39eb2708 ("bcache: fix wrong return value in >> bch_debug_init()") >> Cc: sta...@vger.kernel.org >> Signed-off-by: Coly Li>> Reported-by: Massimo B. >> Reported-by: Kai Krakow >> Tested-by: Kai Krakow >> Cc: Kent Overstreet >> --- >> drivers/md/bcache/bcache.h | 5 + >> drivers/md/bcache/debug.c | 8 >> 2 files changed, 9 insertions(+), 4 deletions(-) >> >> diff --git a/drivers/md/bcache/bcache.h b/drivers/md/bcache/bcache.h >> index 3a0cfb237af9..5b3fe87f32ee 100644 >> --- a/drivers/md/bcache/bcache.h >> +++ b/drivers/md/bcache/bcache.h >> @@ -994,8 +994,13 @@ void bch_open_buckets_free(struct cache_set *); >> >> int bch_cache_allocator_start(struct cache *ca); >> >> +#ifdef CONFIG_DEBUG_FS > > you could just use if (IS_ENABLED(CONFIG_DEBUG_FS)) in bch_debug_init, that > way > you don't need to add an #ifdef > Yes, it is much simpler. v3 patch resent for your review. Thanks for the hint. Coly Li
Re: [PATCH 1/1] bcache: return 0 from bch_debug_init() if CONFIG_DEBUG_FS=n
On 5/17/18 9:33 AM, Coly Li wrote:
> Commit 539d39eb2708 ("bcache: fix wrong return value in bch_debug_init()")
> returns the return value of debugfs_create_dir() to bcache_init(). When
> CONFIG_DEBUG_FS=n, bch_debug_init() always returns 1 and makes
> bcache_init() fail.
>
> This patch makes bch_debug_init() always return 0 if CONFIG_DEBUG_FS=n,
> so bcache can continue to work for kernels which don't have debugfs
> enabled.

Applied, thanks.

-- 
Jens Axboe
[PATCH 0/1] bcache fix for 4.17-rc6
Hi Jens,

We have reports that bcache does not work when CONFIG_DEBUG_FS=n; here is
the bug report on bugzilla.kernel.org:
https://bugzilla.kernel.org/show_bug.cgi?id=199683

This problem happens since v4.16 and I believe it still happens in v4.17.

This submission has one patch to fix this issue; the patch is tested by one
of the reporters. If it is not too late for 4.17, could you please pick it
up for 4.17-rc6?

Thanks in advance.

Coly Li
---
Coly Li (1):
  bcache: return 0 from bch_debug_init() if CONFIG_DEBUG_FS=n

 drivers/md/bcache/debug.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

-- 
2.16.3
[PATCH 1/1] bcache: return 0 from bch_debug_init() if CONFIG_DEBUG_FS=n
Commit 539d39eb2708 ("bcache: fix wrong return value in bch_debug_init()")
returns the return value of debugfs_create_dir() to bcache_init(). When
CONFIG_DEBUG_FS=n, bch_debug_init() always returns 1 and makes
bcache_init() fail.

This patch makes bch_debug_init() always return 0 if CONFIG_DEBUG_FS=n,
so bcache can continue to work for kernels which don't have debugfs
enabled.

Changelog:
v4: Add Acked-by from Kent Overstreet.
v3: Use IS_ENABLED(CONFIG_DEBUG_FS) to replace #ifdef DEBUG_FS.
v2: Remove a warning message.
v1: Initial version.

Fixes: 539d39eb2708 ("bcache: fix wrong return value in bch_debug_init()")
Cc: sta...@vger.kernel.org
Signed-off-by: Coly Li
Reported-by: Massimo B.
Reported-by: Kai Krakow
Tested-by: Kai Krakow
Acked-by: Kent Overstreet
---
 drivers/md/bcache/debug.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/md/bcache/debug.c b/drivers/md/bcache/debug.c
index 4e63c6f6c04d..d030ce3025a6 100644
--- a/drivers/md/bcache/debug.c
+++ b/drivers/md/bcache/debug.c
@@ -250,7 +250,9 @@ void bch_debug_exit(void)
 
 int __init bch_debug_init(struct kobject *kobj)
 {
-	bcache_debug = debugfs_create_dir("bcache", NULL);
+	if (!IS_ENABLED(CONFIG_DEBUG_FS))
+		return 0;
 
+	bcache_debug = debugfs_create_dir("bcache", NULL);
 	return IS_ERR_OR_NULL(bcache_debug);
 }
-- 
2.16.3
Re: [PATCH 2/2] nbd: don't start req until after the dead connection logic
On Thu, 2017-10-19 at 16:21 -0400, Josef Bacik wrote:
> +	blk_mq_start_request(req);
>  	if (unlikely(nsock->pending && nsock->pending != req)) {
>  		blk_mq_requeue_request(req, true);
>  		ret = 0;

(replying to an e-mail from seven months ago)

Hello Josef,

Are you aware that the nbd driver is one of the very few block drivers
that calls blk_mq_requeue_request() after a request has been started? I
think that can lead the block layer core to undesired behavior, e.g. the
timeout handler firing concurrently with a request being reinserted. Can
you or a colleague have a look at this? I would like to add the following
code to the block layer core and I think that the nbd driver would trigger
this warning:

void blk_mq_requeue_request(struct request *rq, bool kick_requeue_list)
{
+	WARN_ON_ONCE(old_state != MQ_RQ_COMPLETE);
+
	__blk_mq_requeue_request(rq);

Thanks,

Bart.
Re: [PATCH 00/10] Misc block layer patches for bcachefs
On Tue, 2018-05-08 at 21:33 -0400, Kent Overstreet wrote:
> [ ... ]

Hello Kent,

With Jens' latest for-next branch I hit the kernel warning shown below.
Can you have a look?

Thanks,

Bart.

==
BUG: KASAN: use-after-free in bio_advance+0x110/0x1b0
Read of size 4 at addr 880156c5e6d0 by task ksoftirqd/10/72

CPU: 10 PID: 72 Comm: ksoftirqd/10 Tainted: G W 4.17.0-rc4-dbg+ #5
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014
Call Trace:
 dump_stack+0x9a/0xeb
 print_address_description+0x65/0x270
 kasan_report+0x232/0x350
 bio_advance+0x110/0x1b0
 blk_update_request+0x9d/0x5a0
 scsi_end_request+0x4c/0x300 [scsi_mod]
 scsi_io_completion+0x71e/0xa40 [scsi_mod]
 __blk_mq_complete_request+0x143/0x220
 srp_recv_done+0x454/0x1100 [ib_srp]
 __ib_process_cq+0x9a/0xf0 [ib_core]
 ib_poll_handler+0x2d/0x90 [ib_core]
 irq_poll_softirq+0xe5/0x1e0
 __do_softirq+0x112/0x5f0
 run_ksoftirqd+0x29/0x50
 smpboot_thread_fn+0x30f/0x410
 kthread+0x1b2/0x1d0
 ret_from_fork+0x24/0x30

Allocated by task 1356:
 kasan_kmalloc+0xa0/0xd0
 kmem_cache_alloc+0xed/0x320
 mempool_alloc+0xc6/0x210
 bio_alloc_bioset+0x128/0x2d0
 submit_bh_wbc+0x95/0x2d0
 __block_write_full_page+0x2a6/0x5c0
 __writepage+0x37/0x80
 write_cache_pages+0x305/0x7c0
 generic_writepages+0xb9/0x110
 do_writepages+0x96/0x180
 __filemap_fdatawrite_range+0x162/0x1b0
 file_write_and_wait_range+0x4d/0xb0
 blkdev_fsync+0x3c/0x70
 do_fsync+0x33/0x60
 __x64_sys_fsync+0x18/0x20
 do_syscall_64+0x6d/0x220
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 72:
 __kasan_slab_free+0x130/0x180
 kmem_cache_free+0xcd/0x380
 blk_update_request+0xc4/0x5a0
 blk_update_request+0xc4/0x5a0
 scsi_end_request+0x4c/0x300 [scsi_mod]
 scsi_io_completion+0x71e/0xa40 [scsi_mod]
 __blk_mq_complete_request+0x143/0x220
 srp_recv_done+0x454/0x1100 [ib_srp]
 __ib_process_cq+0x9a/0xf0 [ib_core]
 ib_poll_handler+0x2d/0x90 [ib_core]
 irq_poll_softirq+0xe5/0x1e0
 __do_softirq+0x112/0x5f0

The buggy address belongs to the object at 880156c5e640
 which belongs to the cache bio-0 of size 200
The buggy address is located 144 bytes inside of
 200-byte region [880156c5e640, 880156c5e708)
The buggy address belongs to the page:
page:ea00055b1780 count:1 mapcount:0 mapping: index:0x0 compound_mapcount: 0
ib_srpt:srpt_zerolength_write: ib_srpt 10.196.159.179-24: queued zerolength write
flags: 0x80008100(slab|head)
raw: 80008100 000100190019
raw: ea000543a800 00020002 88015a8f3a00
ib_srpt:srpt_zerolength_write: ib_srpt 10.196.159.179-22: queued zerolength write
page dumped because: kasan: bad access detected
ib_srpt:srpt_zerolength_write: ib_srpt 10.196.159.179-20: queued zerolength write

Memory state around the buggy address:
ib_srpt:srpt_zerolength_write: ib_srpt 10.196.159.179-18: queued zerolength write
 880156c5e580: 00 00 00 00 00 00 00 00 00 fc fc fc fc fc fc fc
ib_srpt:srpt_zerolength_write_done: ib_srpt 10.196.159.179-24 wc->status 5
 880156c5e600: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb
ib_srpt:srpt_zerolength_write_done: ib_srpt 10.196.159.179-22 wc->status 5
>880156c5e680: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ib_srpt:srpt_zerolength_write_done: ib_srpt 10.196.159.179-20 wc->status 5
                                                 ^
 880156c5e700: fb fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ib_srpt:srpt_zerolength_write_done: ib_srpt 10.196.159.179-18 wc->status 5
 880156c5e780: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ib_srpt:srpt_release_channel_work: ib_srpt 10.196.159.179-24
==

(gdb) list *(bio_advance+0x110)
0x81450090 is in bio_advance (./include/linux/bvec.h:82).
77			iter->bi_size = 0;
78			return false;
79		}
80
81		while (bytes) {
82			unsigned iter_len = bvec_iter_len(bv, *iter);
83			unsigned len = min(bytes, iter_len);
84
85			bytes -= len;
86			iter->bi_size -= len;
Re: [PATCH 2/2] nbd: don't start req until after the dead connection logic
On Thu, 2018-05-17 at 14:41 -0400, Josef Bacik wrote:
> Yup I can tell you why, on 4.11 where I originally did this work
> __blk_mq_requeue_request() did this
>
> static void __blk_mq_requeue_request(struct request *rq)
> {
> 	struct request_queue *q = rq->q;
>
> 	trace_block_rq_requeue(q, rq);
> 	rq_qos_requeue(q, >issue_stat);
> 	blk_mq_sched_requeue_request(rq);
>
> 	if (test_and_clear_bit(REQ_ATOM_STARTED, >atomic_flags)) {
> 		if (q->dma_drain_size && blk_rq_bytes(rq))
> 			rq->nr_phys_segments--;
> 	}
> }
>
> So it was clearing the started part when it did the requeue. If that's not
> what I'm supposed to be doing anymore then I can send a patch to fix it.
> What is supposed to be done if I did already do blk_mq_start_request,
> because I can avoid doing the start until after that chunk of code, but
> there's a part further down that needs to have start done before we reach
> it, so I'll have to do whatever the special thing is now there. Thanks,

Hello Josef,

After having had a closer look I think calling blk_mq_start_request()
before blk_mq_requeue_request() is fine anyway. The v4.16 block layer core
namely defers timeout processing until after .queue_rq() has returned. So
the timeout code shouldn't see the requests for which both
blk_mq_start_request() and blk_mq_requeue_request() are called from inside
.queue_rq(). I will make sure this behavior is preserved.

Bart.
[PATCH v4 3/3] fs: Add aio iopriority support for block_dev
From: Adam Manzanares

This is the per-I/O equivalent of the ioprio_set system call.

When IOCB_FLAG_IOPRIO is set on the iocb aio_flags field, then we set the
newly added kiocb ki_ioprio field to the value in the iocb aio_reqprio field.

When a bio is created for an aio request by the block dev we set the
priority value of the bio to the user supplied value.

This patch depends on block: add ioprio_check_cap function.

Signed-off-by: Adam Manzanares
---
 fs/aio.c                     | 16 ++++++++++++++++
 fs/block_dev.c               |  2 ++
 include/linux/fs.h           |  2 ++
 include/uapi/linux/aio_abi.h |  1 +
 4 files changed, 21 insertions(+)

diff --git a/fs/aio.c b/fs/aio.c
index f3eae5d5771b..ff3107aa82d5 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1451,6 +1451,22 @@ static int aio_prep_rw(struct kiocb *req, struct iocb *iocb)
 	if (iocb->aio_flags & IOCB_FLAG_RESFD)
 		req->ki_flags |= IOCB_EVENTFD;
 	req->ki_hint = file_write_hint(req->ki_filp);
+	if (iocb->aio_flags & IOCB_FLAG_IOPRIO) {
+		/*
+		 * If the IOCB_FLAG_IOPRIO flag of aio_flags is set, then
+		 * aio_reqprio is interpreted as an I/O scheduling
+		 * class and priority.
+		 */
+		ret = ioprio_check_cap(iocb->aio_reqprio);
+		if (ret) {
+			pr_debug("aio ioprio check cap error\n");
+			return -EINVAL;
+		}
+
+		req->ki_ioprio = iocb->aio_reqprio;
+		req->ki_flags |= IOCB_IOPRIO;
+	}
+
 	ret = kiocb_set_rw_flags(req, iocb->aio_rw_flags);
 	if (unlikely(ret))
 		fput(req->ki_filp);
diff --git a/fs/block_dev.c b/fs/block_dev.c
index 7ec920e27065..970bef79caa6 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -355,6 +355,8 @@ __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter, int nr_pages)
 		bio->bi_write_hint = iocb->ki_hint;
 		bio->bi_private = dio;
 		bio->bi_end_io = blkdev_bio_end_io;
+		if (iocb->ki_flags & IOCB_IOPRIO)
+			bio->bi_ioprio = iocb->ki_ioprio;
 
 		ret = bio_iov_iter_get_pages(bio, iter);
 		if (unlikely(ret)) {
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 7a90ce387e00..3415e83f6350 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -294,6 +294,7 @@ enum rw_hint {
 #define IOCB_SYNC		(1 << 5)
 #define IOCB_WRITE		(1 << 6)
 #define IOCB_NOWAIT		(1 << 7)
+#define IOCB_IOPRIO		(1 << 8)
 
 struct kiocb {
 	struct file		*ki_filp;
@@ -302,6 +303,7 @@ struct kiocb {
 	void			*private;
 	int			ki_flags;
 	u16			ki_hint;
+	u16			ki_ioprio; /* See linux/ioprio.h */
 } __randomize_layout;
 
 static inline bool is_sync_kiocb(struct kiocb *kiocb)
diff --git a/include/uapi/linux/aio_abi.h b/include/uapi/linux/aio_abi.h
index 2c0a3415beee..d4e768d55d14 100644
--- a/include/uapi/linux/aio_abi.h
+++ b/include/uapi/linux/aio_abi.h
@@ -55,6 +55,7 @@ enum {
  * is valid.
  */
#define IOCB_FLAG_RESFD		(1 << 0)
+#define IOCB_FLAG_IOPRIO	(1 << 1)
 
 /* read() from /dev/aio returns these structures. */
 struct io_event {
-- 
2.15.1
[PATCH v4 1/3] block: add ioprio_check_cap function
From: Adam Manzanares

Aio per command iopriority support introduces a second interface between
userland and the kernel capable of passing iopriority. The aio interface
also needs the ability to verify that the submitting context has sufficient
privileges to submit IOPRIO_RT commands. This patch creates the
ioprio_check_cap function to be used by the ioprio_set system call and also
by the aio interface.

Signed-off-by: Adam Manzanares
---
 block/ioprio.c         | 22 ++++++++++++++++------
 include/linux/ioprio.h |  2 ++
 2 files changed, 18 insertions(+), 6 deletions(-)

diff --git a/block/ioprio.c b/block/ioprio.c
index 6f5d0b6625e3..f9821080c92c 100644
--- a/block/ioprio.c
+++ b/block/ioprio.c
@@ -61,15 +61,10 @@ int set_task_ioprio(struct task_struct *task, int ioprio)
 }
 EXPORT_SYMBOL_GPL(set_task_ioprio);
 
-SYSCALL_DEFINE3(ioprio_set, int, which, int, who, int, ioprio)
+int ioprio_check_cap(int ioprio)
 {
 	int class = IOPRIO_PRIO_CLASS(ioprio);
 	int data = IOPRIO_PRIO_DATA(ioprio);
-	struct task_struct *p, *g;
-	struct user_struct *user;
-	struct pid *pgrp;
-	kuid_t uid;
-	int ret;
 
 	switch (class) {
 	case IOPRIO_CLASS_RT:
@@ -92,6 +87,21 @@ SYSCALL_DEFINE3(ioprio_set, int, which, int, who, int, ioprio)
 		return -EINVAL;
 	}
 
+	return 0;
+}
+
+SYSCALL_DEFINE3(ioprio_set, int, which, int, who, int, ioprio)
+{
+	struct task_struct *p, *g;
+	struct user_struct *user;
+	struct pid *pgrp;
+	kuid_t uid;
+	int ret;
+
+	ret = ioprio_check_cap(ioprio);
+	if (ret)
+		return ret;
+
 	ret = -ESRCH;
 	rcu_read_lock();
 	switch (which) {
diff --git a/include/linux/ioprio.h b/include/linux/ioprio.h
index 627efac73e6d..4a28cec49ec3 100644
--- a/include/linux/ioprio.h
+++ b/include/linux/ioprio.h
@@ -77,4 +77,6 @@ extern int ioprio_best(unsigned short aprio, unsigned short bprio);
 
 extern int set_task_ioprio(struct task_struct *task, int ioprio);
 
+extern int ioprio_check_cap(int ioprio);
+
 #endif
-- 
2.15.1
[PATCH v4 2/3] fs: Convert kiocb rw_hint from enum to u16
From: Adam Manzanares

In order to avoid kiocb bloat for per command iopriority support, rw_hint
is converted from enum to a u16. Added a guard around ki_hint assignment.

Signed-off-by: Adam Manzanares
---
 include/linux/fs.h | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index 760d8da1b6c7..7a90ce387e00 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -284,6 +284,8 @@ enum rw_hint {
 	WRITE_LIFE_EXTREME	= RWH_WRITE_LIFE_EXTREME,
 };
 
+#define MAX_KI_HINT	((1 << 16) - 1) /* ki_hint type is u16 */
+
 #define IOCB_EVENTFD		(1 << 0)
 #define IOCB_APPEND		(1 << 1)
 #define IOCB_DIRECT		(1 << 2)
@@ -299,7 +301,7 @@ struct kiocb {
 	void (*ki_complete)(struct kiocb *iocb, long ret, long ret2);
 	void			*private;
 	int			ki_flags;
-	enum rw_hint		ki_hint;
+	u16			ki_hint;
 } __randomize_layout;
 
 static inline bool is_sync_kiocb(struct kiocb *kiocb)
@@ -1927,12 +1929,21 @@ static inline enum rw_hint file_write_hint(struct file *file)
 
 static inline int iocb_flags(struct file *file);
 
+/* ki_hint changed from enum to u16, make sure rw_hint fits into u16 */
+static inline u16 ki_hint_valid(enum rw_hint hint)
+{
+	if (hint > MAX_KI_HINT)
+		return 0;
+
+	return hint;
+}
+
 static inline void init_sync_kiocb(struct kiocb *kiocb, struct file *filp)
 {
 	*kiocb = (struct kiocb) {
 		.ki_filp = filp,
 		.ki_flags = iocb_flags(filp),
-		.ki_hint = file_write_hint(filp),
+		.ki_hint = ki_hint_valid(file_write_hint(filp)),
 	};
 }
-- 
2.15.1
[PATCH v4 0/3] AIO add per-command iopriority
From: Adam Manzanares

This is the per-I/O equivalent of the ioprio_set system call.
See the following link for performance implications on a SATA HDD:
https://lkml.org/lkml/2016/12/6/495

First patch factors ioprio_check_cap function out of ioprio_set system call
to also be used by the aio ioprio interface.

Second patch converts kiocb ki_hint field to a u16 to avoid kiocb bloat.

Third patch passes ioprio hint from aio iocb to kiocb and enables block_dev
usage of the per I/O ioprio feature.

v2: merge patches
    use IOCB_FLAG_IOPRIO
    validate intended use with IOCB_IOPRIO
    add linux-api and linux-block to cc

v3: add ioprio_check_cap function
    convert kiocb ki_hint to u16
    use ioprio_check_cap when adding ioprio to kiocb in aio.c

v4: handle IOCB_IOPRIO in aio_prep_rw
    note patch 3 depends on patch 1 in commit msg

Adam Manzanares (3):
  block: add ioprio_check_cap function
  fs: Convert kiocb rw_hint from enum to u16
  fs: Add aio iopriority support for block_dev

 block/ioprio.c               | 22 ++++++++++++++++------
 fs/aio.c                     | 16 ++++++++++++++++
 fs/block_dev.c               |  2 ++
 include/linux/fs.h           | 17 +++++++++++++++--
 include/linux/ioprio.h       |  2 ++
 include/uapi/linux/aio_abi.h |  1 +
 6 files changed, 52 insertions(+), 8 deletions(-)

-- 
2.15.1
Re: [PATCH 2/2] nbd: don't start req until after the dead connection logic
On Thu, May 17, 2018 at 06:21:40PM +, Bart Van Assche wrote:
> On Thu, 2017-10-19 at 16:21 -0400, Josef Bacik wrote:
> > +	blk_mq_start_request(req);
> >  	if (unlikely(nsock->pending && nsock->pending != req)) {
> >  		blk_mq_requeue_request(req, true);
> >  		ret = 0;
> 
> (replying to an e-mail from seven months ago)
> 
> Hello Josef,
> 
> Are you aware that the nbd driver is one of the very few block drivers
> that calls blk_mq_requeue_request() after a request has been started? I
> think that can lead the block layer core to undesired behavior, e.g.
> that the timeout handler fires concurrently with a request being
> reinserted. Can you or a colleague have a look at this? I would like to
> add the following code to the block layer core and I think that the nbd
> driver would trigger this warning:
> 
> void blk_mq_requeue_request(struct request *rq, bool kick_requeue_list)
> {
> +	WARN_ON_ONCE(old_state != MQ_RQ_COMPLETE);
> +
> 	__blk_mq_requeue_request(rq);

Yup I can tell you why, on 4.11 where I originally did this work
__blk_mq_requeue_request() did this

static void __blk_mq_requeue_request(struct request *rq)
{
	struct request_queue *q = rq->q;

	trace_block_rq_requeue(q, rq);
	rq_qos_requeue(q, &rq->issue_stat);
	blk_mq_sched_requeue_request(rq);

	if (test_and_clear_bit(REQ_ATOM_STARTED, &rq->atomic_flags)) {
		if (q->dma_drain_size && blk_rq_bytes(rq))
			rq->nr_phys_segments--;
	}
}

So it was clearing the started part when it did the requeue. If that's
not what I'm supposed to be doing anymore then I can send a patch to fix
it. What is supposed to be done if I did already do
blk_mq_start_request()? I can avoid doing the start until after that
chunk of code, but there's a part further down that needs to have start
done before we reach it, so I'll have to do whatever the special thing
is now there. Thanks,

Josef
Re: [PATCH V6 11/11] nvme: pci: support nested EH
On Fri, May 18, 2018 at 08:19:58AM +0800, Ming Lei wrote:
> On Wed, May 16, 2018 at 08:20:31PM -0600, Keith Busch wrote:
> > Hi Ming,
> > 
> > I'm developing the answers in code to the issues you raised. It will
> > just take a moment to complete fleshing those out. In the meantime I
> > just want to point out why I think block/011 isn't a real test.
> > 
> > On Thu, May 17, 2018 at 07:10:59AM +0800, Ming Lei wrote:
> > > All simulation in block/011 may happen in reality.
> > 
> > If this test actually simulates reality, then the following one line
> > patch (plus explanation for why) would be a real "fix" as this is very
> > successful in passing block/011. :)
> > 
> > ---
> > diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> > index 1faa32cd07da..dcc5746304c4 100644
> > --- a/drivers/nvme/host/pci.c
> > +++ b/drivers/nvme/host/pci.c
> > @@ -2118,6 +2118,12 @@ static int nvme_pci_enable(struct nvme_dev *dev)
> > 
> >  	if (pci_enable_device_mem(pdev))
> >  		return result;
> > +	/*
> > +	 * blktests block/011 disables the device without the driver knowing.
> > +	 * We'll just enable the device twice to get the enable_cnt > 1
> > +	 * so that the test's disabling does absolutely nothing.
> > +	 */
> > +	pci_enable_device_mem(pdev);
> 
> What I think makes block/011 helpful is that it can trigger IO timeout
> during reset, which can be triggered in reality too.
> 
> That is one big problem of the NVMe driver, IMO.
> 
> And this patch does fix this issue, together with other timeout related
> races.

BTW, the above patch may not be enough to 'fix' this NVMe issue, I guess
you may have to figure out another one to remove fault injection, or at
least disable io-timeout-fail on NVMe.

Thanks,
Ming
Re: [PATCH V6 11/11] nvme: pci: support nested EH
On Wed, May 16, 2018 at 08:20:31PM -0600, Keith Busch wrote:
> Hi Ming,
> 
> I'm developing the answers in code to the issues you raised. It will
> just take a moment to complete fleshing those out. In the meantime I
> just want to point out why I think block/011 isn't a real test.
> 
> On Thu, May 17, 2018 at 07:10:59AM +0800, Ming Lei wrote:
> > All simulation in block/011 may happen in reality.
> 
> If this test actually simulates reality, then the following one line
> patch (plus explanation for why) would be a real "fix" as this is very
> successful in passing block/011. :)
> 
> ---
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index 1faa32cd07da..dcc5746304c4 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -2118,6 +2118,12 @@ static int nvme_pci_enable(struct nvme_dev *dev)
> 
>  	if (pci_enable_device_mem(pdev))
>  		return result;
> +	/*
> +	 * blktests block/011 disables the device without the driver knowing.
> +	 * We'll just enable the device twice to get the enable_cnt > 1
> +	 * so that the test's disabling does absolutely nothing.
> +	 */
> +	pci_enable_device_mem(pdev);

What I think makes block/011 helpful is that it can trigger IO timeout
during reset, which can be triggered in reality too.

That is one big problem of the NVMe driver, IMO.

And this patch does fix this issue, together with other timeout related
races.

Thanks,
Ming