[PATCH v4] blk-mq: Fix race conditions in request timeout handling

2018-04-09 Thread Bart Van Assche
If a completion occurs after blk_mq_rq_timed_out() has reset rq->aborted_gstate and the request is again in flight when the timeout expires then a request will be completed twice: a first time by the timeout handler and a second time when the regular completion occurs. Additionally, the blk-mq

Re: [block regression] kernel oops triggered by removing scsi device dring IO

2018-04-09 Thread Bart Van Assche
On Tue, 2018-04-10 at 09:30 +0800, Ming Lei wrote: > Also is it possible to see queue freed here? I think the caller should keep a reference on the request queue. Otherwise we have a much bigger problem than a race between submitting a bio and removing a request queue from the cgroup controller

Re: [block regression] kernel oops triggered by removing scsi device dring IO

2018-04-09 Thread Ming Lei
On Mon, Apr 09, 2018 at 10:54:57PM +, Bart Van Assche wrote: > On Mon, 2018-04-09 at 14:54 +0800, Joseph Qi wrote: > > The oops happens during generic_make_request_checks(), in > > blk_throtl_bio() exactly. > > So if we want to bypass dying queue, we have to check this before > >

Re: limits->max_sectors is getting set to 0, why/where? [was: Re: dm: kernel oops by divide error on v4.16+]

2018-04-09 Thread Jens Axboe
On 4/9/18 5:54 PM, Linus Torvalds wrote: > On Mon, Apr 9, 2018 at 3:32 PM, Jens Axboe wrote: >> >> The resulting min/max and friends would have been trivial to test, but >> clearly they weren't. > > Well, the min/max macros themselves actually were tested in user space by me. >

Re: limits->max_sectors is getting set to 0, why/where? [was: Re: dm: kernel oops by divide error on v4.16+]

2018-04-09 Thread Linus Torvalds
On Mon, Apr 9, 2018 at 3:32 PM, Jens Axboe wrote: > > The resulting min/max and friends would have been trivial to test, but > clearly they weren't. Well, the min/max macros themselves actually were tested in user space by me. It was the interaction with the unrelated

Re: 4.15.14 crash with iscsi target and dvd

2018-04-09 Thread Ming Lei
On Mon, Apr 09, 2018 at 07:43:01PM -0400, Wakko Warner wrote: > Ming Lei wrote: > > On Mon, Apr 09, 2018 at 09:30:11PM +, Bart Van Assche wrote: > > > Hello Ming, > > > > > > Can you have a look at this? The start of this e-mail thread is available > > > at > > >

Re: 4.15.14 crash with iscsi target and dvd

2018-04-09 Thread Wakko Warner
Ming Lei wrote: > On Mon, Apr 09, 2018 at 09:30:11PM +, Bart Van Assche wrote: > > Hello Ming, > > > > Can you have a look at this? The start of this e-mail thread is available at > > https://www.mail-archive.com/linux-scsi@vger.kernel.org/msg72574.html. > > Sure, thanks for your sharing. >

Re: 4.15.14 crash with iscsi target and dvd

2018-04-09 Thread Ming Lei
On Mon, Apr 09, 2018 at 09:30:11PM +, Bart Van Assche wrote: > On Sun, 2018-04-08 at 12:02 -0400, Wakko Warner wrote: > > I finished with git bisect. Here's the output: > > 84c8590646d5b35804bac60eb58b145839b5893e is the first bad commit > > commit 84c8590646d5b35804bac60eb58b145839b5893e > >

Re: [block regression] kernel oops triggered by removing scsi device dring IO

2018-04-09 Thread Bart Van Assche
On Mon, 2018-04-09 at 16:58 -0600, Jens Axboe wrote: > This ends up being nutty in the generic_make_request() case, where we > do the exact same enter/exit logic right after. That needs to get unified. > Maybe move the queue enter into generic_make_request_checks(), and exit > in the caller?

Re: usercopy whitelist woe in scsi_sense_cache

2018-04-09 Thread Kees Cook
On Mon, Apr 9, 2018 at 1:30 PM, Kees Cook wrote: > Ah! dm-crypt too. I'll see if I can get that added easily to my tests. Quick update: I added dm-crypt (with XFS on top) and it hung my system almost immediately. I got no warnings at all, though. -Kees -- Kees Cook

Re: [s390x] New regression was found on kernel-4.16

2018-04-09 Thread Ming Lei
On Mon, Apr 09, 2018 at 06:18:04PM +0800, Li Wang wrote: > Hi, > > I got this BUG_ON() on s390x platform with kernel-v4.16.0. > > [1.200196] [ cut here ] > [1.200201] kernel BUG at block/bio.c:1798! > [1.200228] illegal operation: 0001 ilc:1 [#1] SMP > [

Re: limits->max_sectors is getting set to 0, why/where? [was: Re: dm: kernel oops by divide error on v4.16+]

2018-04-09 Thread Jens Axboe
On 4/9/18 4:38 PM, Kees Cook wrote: > On Mon, Apr 9, 2018 at 3:32 PM, Jens Axboe wrote: >> That's bad, for sure, but my worry was bigger than an oops or crash, >> we could have had corruption due to this. >> >> The resulting min/max and friends would have been trivial to test,

Re: [block regression] kernel oops triggered by removing scsi device dring IO

2018-04-09 Thread Jens Axboe
On 4/9/18 4:54 PM, Bart Van Assche wrote: > On Mon, 2018-04-09 at 14:54 +0800, Joseph Qi wrote: >> The oops happens during generic_make_request_checks(), in >> blk_throtl_bio() exactly. >> So if we want to bypass dying queue, we have to check this before >> generic_make_request_checks(), I think.

Re: [block regression] kernel oops triggered by removing scsi device dring IO

2018-04-09 Thread Bart Van Assche
On Mon, 2018-04-09 at 14:54 +0800, Joseph Qi wrote: > The oops happens during generic_make_request_checks(), in > blk_throtl_bio() exactly. > So if we want to bypass dying queue, we have to check this before > generic_make_request_checks(), I think. How about something like the patch below?

Re: limits->max_sectors is getting set to 0, why/where? [was: Re: dm: kernel oops by divide error on v4.16+]

2018-04-09 Thread Kees Cook
On Mon, Apr 9, 2018 at 3:32 PM, Jens Axboe wrote: > That's bad, for sure, but my worry was bigger than an oops or crash, > we could have had corruption due to this. > > The resulting min/max and friends would have been trivial to test, but > clearly they weren't. Yeah, that was

Re: limits->max_sectors is getting set to 0, why/where? [was: Re: dm: kernel oops by divide error on v4.16+]

2018-04-09 Thread Jens Axboe
On 4/9/18 4:27 PM, Ming Lei wrote: > On Mon, Apr 09, 2018 at 04:10:17PM -0600, Jens Axboe wrote: >> On 4/9/18 4:05 PM, Kees Cook wrote: >>> On Mon, Apr 9, 2018 at 2:56 PM, Jens Axboe wrote: On 4/9/18 3:26 PM, Jens Axboe wrote: > On 4/9/18 1:32 PM, Jens Axboe wrote:

Re: limits->max_sectors is getting set to 0, why/where? [was: Re: dm: kernel oops by divide error on v4.16+]

2018-04-09 Thread Ming Lei
On Mon, Apr 09, 2018 at 04:10:17PM -0600, Jens Axboe wrote: > On 4/9/18 4:05 PM, Kees Cook wrote: > > On Mon, Apr 9, 2018 at 2:56 PM, Jens Axboe wrote: > >> On 4/9/18 3:26 PM, Jens Axboe wrote: > >>> On 4/9/18 1:32 PM, Jens Axboe wrote: > On 4/9/18 12:38 PM, Mike Snitzer

Re: limits->max_sectors is getting set to 0, why/where? [was: Re: dm: kernel oops by divide error on v4.16+]

2018-04-09 Thread Jens Axboe
On 4/9/18 4:05 PM, Kees Cook wrote: > On Mon, Apr 9, 2018 at 2:56 PM, Jens Axboe wrote: >> On 4/9/18 3:26 PM, Jens Axboe wrote: >>> On 4/9/18 1:32 PM, Jens Axboe wrote: On 4/9/18 12:38 PM, Mike Snitzer wrote: > On Mon, Apr 09 2018 at 11:51am -0400, > Mike Snitzer

Re: [PATCH v2] blk-cgroup: remove entries in blkg_tree before queue release

2018-04-09 Thread Tejun Heo
(cc'ing Joseph as he worked on the area recently, hi!) Hello, On Sat, Apr 07, 2018 at 12:21:48PM +0200, Alexandru Moise wrote: > The q->id is used as an index within the blkg_tree radix tree. > > If the entry is not released before reclaiming the blk_queue_ida's id > blkcg_init_queue() within a

Re: limits->max_sectors is getting set to 0, why/where? [was: Re: dm: kernel oops by divide error on v4.16+]

2018-04-09 Thread Kees Cook
On Mon, Apr 9, 2018 at 2:56 PM, Jens Axboe wrote: > On 4/9/18 3:26 PM, Jens Axboe wrote: >> On 4/9/18 1:32 PM, Jens Axboe wrote: >>> On 4/9/18 12:38 PM, Mike Snitzer wrote: On Mon, Apr 09 2018 at 11:51am -0400, Mike Snitzer wrote: > On Sun,

Re: limits->max_sectors is getting set to 0, why/where? [was: Re: dm: kernel oops by divide error on v4.16+]

2018-04-09 Thread Jens Axboe
On 4/9/18 3:26 PM, Jens Axboe wrote: > On 4/9/18 1:32 PM, Jens Axboe wrote: >> On 4/9/18 12:38 PM, Mike Snitzer wrote: >>> On Mon, Apr 09 2018 at 11:51am -0400, >>> Mike Snitzer wrote: >>> On Sun, Apr 08 2018 at 12:00am -0400, Ming Lei wrote:

Re: [PATCH] blk-mq: Fix recently introduced races in the timeout handling code

2018-04-09 Thread t...@kernel.org
Hello, Bart. On Mon, Apr 09, 2018 at 09:30:27PM +, Bart Van Assche wrote: > On Mon, 2018-04-09 at 11:56 -0700, t...@kernel.org wrote: > > On Mon, Apr 09, 2018 at 05:03:05PM +, Bart Van Assche wrote: > > > exist today in the blk-mq timeout handling code cannot be fixed completely > > >

Re: [PATCH] blk-mq: Fix recently introduced races in the timeout handling code

2018-04-09 Thread Bart Van Assche
On Mon, 2018-04-09 at 11:56 -0700, t...@kernel.org wrote: > On Mon, Apr 09, 2018 at 05:03:05PM +, Bart Van Assche wrote: > > exist today in the blk-mq timeout handling code cannot be fixed completely > > using RCU only. > > I really don't think that is that complicated. Let's first confirm >

Re: 4.15.14 crash with iscsi target and dvd

2018-04-09 Thread Bart Van Assche
On Sun, 2018-04-08 at 12:02 -0400, Wakko Warner wrote: > I finished with git bisect. Here's the output: > 84c8590646d5b35804bac60eb58b145839b5893e is the first bad commit > commit 84c8590646d5b35804bac60eb58b145839b5893e > Author: Ming Lei > Date: Fri Nov 11 20:05:32

Re: limits->max_sectors is getting set to 0, why/where? [was: Re: dm: kernel oops by divide error on v4.16+]

2018-04-09 Thread Jens Axboe
On 4/9/18 1:32 PM, Jens Axboe wrote: > On 4/9/18 12:38 PM, Mike Snitzer wrote: >> On Mon, Apr 09 2018 at 11:51am -0400, >> Mike Snitzer wrote: >> >>> On Sun, Apr 08 2018 at 12:00am -0400, >>> Ming Lei wrote: >>> Hi, The following kernel

Re: usercopy whitelist woe in scsi_sense_cache

2018-04-09 Thread Kees Cook
On Mon, Apr 9, 2018 at 12:02 PM, Oleksandr Natalenko wrote: > > Hi. > > (fancy details for linux-block and BFQ people go below) > > 09.04.2018 20:32, Kees Cook wrote: >> >> Ah, this detail I didn't have. I've changed my environment to >> >> build with: >> >>

Re: limits->max_sectors is getting set to 0, why/where? [was: Re: dm: kernel oops by divide error on v4.16+]

2018-04-09 Thread Jens Axboe
On 4/9/18 12:38 PM, Mike Snitzer wrote: > On Mon, Apr 09 2018 at 11:51am -0400, > Mike Snitzer wrote: > >> On Sun, Apr 08 2018 at 12:00am -0400, >> Ming Lei wrote: >> >>> Hi, >>> >>> The following kernel oops(divide error) is triggered when running >>>

Re: usercopy whitelist woe in scsi_sense_cache

2018-04-09 Thread Oleksandr Natalenko
Hi. (fancy details for linux-block and BFQ people go below) 09.04.2018 20:32, Kees Cook wrote: Ah, this detail I didn't have. I've changed my environment to build with: CONFIG_BLK_MQ_PCI=y CONFIG_BLK_MQ_VIRTIO=y CONFIG_IOSCHED_BFQ=y boot with scsi_mod.use_blk_mq=1 and select BFQ in the

Re: limits->max_sectors is getting set to 0, why/where? [was: Re: dm: kernel oops by divide error on v4.16+]

2018-04-09 Thread Mike Snitzer
On Mon, Apr 09 2018 at 11:51am -0400, Mike Snitzer wrote: > On Sun, Apr 08 2018 at 12:00am -0400, > Ming Lei wrote: > > > Hi, > > > > The following kernel oops(divide error) is triggered when running > > xfstest(generic/347) on ext4. > > > > [

Re: usercopy whitelist woe in scsi_sense_cache

2018-04-09 Thread Kees Cook
On Sun, Apr 8, 2018 at 12:07 PM, Oleksandr Natalenko wrote: > So far, I wasn't able to trigger this with mq-deadline (or without blk-mq). > Maybe, this has something to do with blk-mq+BFQ re-queuing, or it's just me > not being persistent enough. Ah, this detail I

Re: Block layer use of __GFP flags

2018-04-09 Thread Michal Hocko
On Mon 09-04-18 15:03:45, Bart Van Assche wrote: > On Mon, 2018-04-09 at 11:00 +0200, Michal Hocko wrote: > > On Mon 09-04-18 04:46:22, Bart Van Assche wrote: > > [...] > > [...] > > > diff --git a/drivers/ide/ide-pm.c b/drivers/ide/ide-pm.c > > > index ad8a125defdd..3ddb464b72e6 100644 > > > ---

Re: [PATCH 7/7] block: use GFP_KERNEL for allocations from blk_get_request

2018-04-09 Thread Christoph Hellwig
On Mon, Apr 09, 2018 at 09:52:03AM -0700, Matthew Wilcox wrote: > On Mon, Apr 09, 2018 at 05:39:16PM +0200, Christoph Hellwig wrote: > > blk_get_request is used for pass-through style I/O and thus doesn't need > > GFP_NOIO. > > Obviously GFP_KERNEL is a big improvement over GFP_NOIO! But can we

Re: [PATCH 6/7] block: consistently use GFP_NOIO instead of __GFP_NORECLAIM

2018-04-09 Thread Christoph Hellwig
On Mon, Apr 09, 2018 at 09:03:54AM -0700, Matthew Wilcox wrote: > > @@ -499,7 +499,7 @@ int sg_scsi_ioctl(struct request_queue *q, struct > > gendisk *disk, fmode_t mode, > > break; > > } > > > > - if (bytes && blk_rq_map_kern(q, rq, buffer, bytes, __GFP_RECLAIM)) { > > + if

Re: [PATCH] blk-mq: Fix recently introduced races in the timeout handling code

2018-04-09 Thread Bart Van Assche
On Mon, 2018-04-09 at 09:47 -0700, Tejun Heo wrote: > On Sun, Apr 08, 2018 at 10:20:38PM -0700, Bart Van Assche wrote: > > If a completion occurs after blk_mq_rq_timed_out() has reset > > rq->aborted_gstate and the request is again in flight when the timeout > > expires then a request will be

Re: [PATCH 7/7] block: use GFP_KERNEL for allocations from blk_get_request

2018-04-09 Thread Matthew Wilcox
On Mon, Apr 09, 2018 at 05:39:16PM +0200, Christoph Hellwig wrote: > blk_get_request is used for pass-through style I/O and thus doesn't need > GFP_NOIO. Obviously GFP_KERNEL is a big improvement over GFP_NOIO! But can we take it all the way to GFP_USER, if this is always done in the ioctl path

Re: [PATCH] blk-mq: Fix recently introduced races in the timeout handling code

2018-04-09 Thread Tejun Heo
Hello, Sagi. On Mon, Apr 09, 2018 at 11:37:15AM +0300, Sagi Grimberg wrote: > > >If a completion occurs after blk_mq_rq_timed_out() has reset > >rq->aborted_gstate and the request is again in flight when the timeout > >expires then a request will be completed twice: a first time by the >

Re: [PATCH] blk-mq: Fix recently introduced races in the timeout handling code

2018-04-09 Thread Tejun Heo
Hey, Bart. On Sun, Apr 08, 2018 at 10:20:38PM -0700, Bart Van Assche wrote: > If a completion occurs after blk_mq_rq_timed_out() has reset > rq->aborted_gstate and the request is again in flight when the timeout > expires then a request will be completed twice: a first time by the > timeout

Re: [PATCH 6/7] block: consistently use GFP_NOIO instead of __GFP_NORECLAIM

2018-04-09 Thread Matthew Wilcox
On Mon, Apr 09, 2018 at 05:39:15PM +0200, Christoph Hellwig wrote: > Same numerical value (for now at least), but a much better documentation > of intent. > @@ -499,7 +499,7 @@ int sg_scsi_ioctl(struct request_queue *q, struct gendisk > *disk, fmode_t mode, > break; > } > >

Re: Multi-Actuator SAS HDD First Look

2018-04-09 Thread Douglas Gilbert
On 2018-04-09 02:17 AM, Hannes Reinecke wrote: On 04/09/2018 04:08 AM, Tim Walker wrote: On Fri, Apr 6, 2018 at 11:09 AM, Douglas Gilbert wrote: On 2018-04-06 02:42 AM, Christoph Hellwig wrote: On Fri, Apr 06, 2018 at 08:24:18AM +0200, Hannes Reinecke wrote: Ah.

Re: usercopy whitelist woe in scsi_sense_cache

2018-04-09 Thread Oleksandr Natalenko
Hi. 09.04.2018 11:35, Christoph Hellwig wrote: I really can't make sense of that report. Sorry, I have nothing to add there so far, I just see the symptom of something going wrong in the ioctl code path that is invoked by smartctl, but I have no idea what's the minimal environment to

limits->max_sectors is getting set to 0, why/where? [was: Re: dm: kernel oops by divide error on v4.16+]

2018-04-09 Thread Mike Snitzer
On Sun, Apr 08 2018 at 12:00am -0400, Ming Lei wrote: > Hi, > > The following kernel oops(divide error) is triggered when running > xfstest(generic/347) on ext4. > > [ 442.632954] run fstests generic/347 at 2018-04-07 18:06:44 > [ 443.839480] divide error: [#1]

Re: [PATCH 0/8] blk-mq: fix and improve queue mapping

2018-04-09 Thread Stefan Haberland
On 08.04.2018 11:48, Ming Lei wrote: Hi Jens, The first two patches fix issues about queue mapping. The other 6 patches improve queue mapping for blk-mq. Christian, this patches should fix your issue, so please give a test, and the patches can be found in the following tree:

Re: [PATCH] blk-mq: Fix recently introduced races in the timeout handling code

2018-04-09 Thread Israel Rukshin
On 4/9/2018 11:37 AM, Sagi Grimberg wrote: If a completion occurs after blk_mq_rq_timed_out() has reset rq->aborted_gstate and the request is again in flight when the timeout expires then a request will be completed twice: a first time by the timeout handler and a second time when the regular

[PATCH 7/7] block: use GFP_KERNEL for allocations from blk_get_request

2018-04-09 Thread Christoph Hellwig
blk_get_request is used for pass-through style I/O and thus doesn't need GFP_NOIO. Signed-off-by: Christoph Hellwig --- block/blk-core.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/block/blk-core.c b/block/blk-core.c index 432923751551..253a869558f9 100644

[PATCH 5/7] block: use GFP_NOIO instead of __GFP_DIRECT_RECLAIM

2018-04-09 Thread Christoph Hellwig
We just can't do I/O when doing block layer requests allocations, so use GFP_NOIO instead of the even more limited __GFP_DIRECT_RECLAIM. Signed-off-by: Christoph Hellwig --- block/blk-core.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/block/blk-core.c

[PATCH 4/7] block: pass explicit gfp_t to get_request

2018-04-09 Thread Christoph Hellwig
blk_old_get_request already has it at hand, and in blk_queue_bio, which is the fast path, it is constant. Signed-off-by: Christoph Hellwig --- block/blk-core.c | 14 +++--- drivers/scsi/scsi_error.c | 4 2 files changed, 7 insertions(+), 11 deletions(-) diff

[PATCH 3/7] block: sanitize blk_get_request calling conventions

2018-04-09 Thread Christoph Hellwig
Switch everyone to blk_get_request_flags, and then rename blk_get_request_flags to blk_get_request. Signed-off-by: Christoph Hellwig --- block/blk-core.c | 14 +++--- block/bsg.c| 5 ++--- block/scsi_ioctl.c | 8

[PATCH 6/7] block: consistently use GFP_NOIO instead of __GFP_NORECLAIM

2018-04-09 Thread Christoph Hellwig
Same numerical value (for now at least), but a much better documentation of intent. Signed-off-by: Christoph Hellwig --- block/scsi_ioctl.c | 2 +- drivers/block/drbd/drbd_bitmap.c | 3 ++- drivers/block/pktcdvd.c | 2 +- drivers/ide/ide-tape.c |

[PATCH 1/7] scsi/osd: remove the gfp argument to osd_start_request

2018-04-09 Thread Christoph Hellwig
Always GFP_KERNEL, and keeping it would cause serious complications for the next change. Signed-off-by: Christoph Hellwig --- drivers/scsi/osd/osd_initiator.c | 24 +++- fs/exofs/ore.c | 10 +- fs/exofs/super.c | 2 +-

[RFC] fix confusion around GFP_* flags and blk_get_request

2018-04-09 Thread Christoph Hellwig
Hi all, this series sorts out the mess around how we use gfp flags in the block layer get_request interface.

[PATCH 2/7] block: fix __get_request documentation

2018-04-09 Thread Christoph Hellwig
Signed-off-by: Christoph Hellwig --- block/blk-core.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/block/blk-core.c b/block/blk-core.c index abcb8684ba67..abde22c755ab 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -1517,7 +1517,7 @@ static struct

Re: Block layer use of __GFP flags

2018-04-09 Thread Bart Van Assche
On Mon, 2018-04-09 at 01:26 -0700, Christoph Hellwig wrote: > On Mon, Apr 09, 2018 at 08:53:49AM +0200, Hannes Reinecke wrote: > > Why don't you fold the 'flags' argument into the 'gfp_flags', and drop > > the 'flags' argument completely? > > Looks a bit pointless to me, having two arguments

Re: Block layer use of __GFP flags

2018-04-09 Thread Matthew Wilcox
On Mon, Apr 09, 2018 at 01:26:50AM -0700, Christoph Hellwig wrote: > On Mon, Apr 09, 2018 at 08:53:49AM +0200, Hannes Reinecke wrote: > > Why don't you fold the 'flags' argument into the 'gfp_flags', and drop > > the 'flags' argument completely? > > Looks a bit pointless to me, having two

Re: Block layer use of __GFP flags

2018-04-09 Thread Bart Van Assche
On Mon, 2018-04-09 at 11:00 +0200, Michal Hocko wrote: > On Mon 09-04-18 04:46:22, Bart Van Assche wrote: > [...] > [...] > > diff --git a/drivers/ide/ide-pm.c b/drivers/ide/ide-pm.c > > index ad8a125defdd..3ddb464b72e6 100644 > > --- a/drivers/ide/ide-pm.c > > +++ b/drivers/ide/ide-pm.c > > @@

Re: [PATCH] blk-mq: Fix recently introduced races in the timeout handling code

2018-04-09 Thread Jens Axboe
On 4/9/18 8:58 AM, Bart Van Assche wrote: > On Mon, 2018-04-09 at 11:37 +0200, Christoph Hellwig wrote: >> This looks sensible, but I'm worried about taking a whole spinlock >> for every request completion, including irq disabling. However it seems >> like your new updated pattern would fit use

Re: [PATCH 0/8] blk-mq: fix and improve queue mapping

2018-04-09 Thread Jens Axboe
On 4/8/18 3:48 AM, Ming Lei wrote: > Hi Jens, > > The first two patches fix issues about queue mapping. > > The other 6 patches improve queue mapping for blk-mq. > > Christian, this patches should fix your issue, so please give > a test, and the patches can be found in the following tree: > >

Re: [PATCH] blk-mq: Fix recently introduced races in the timeout handling code

2018-04-09 Thread Bart Van Assche
On Mon, 2018-04-09 at 11:37 +0200, Christoph Hellwig wrote: > This looks sensible, but I'm worried about taking a whole spinlock > for every request completion, including irq disabling. However it seems > like your new updated pattern would fit use of cmpxchg() very nicely. Hello Christoph,

Re: [PATCH] blk-mq: Fix recently introduced races in the timeout handling code

2018-04-09 Thread Bart Van Assche
On Mon, 2018-04-09 at 11:37 +0300, Sagi Grimberg wrote: > > If a completion occurs after blk_mq_rq_timed_out() has reset > > rq->aborted_gstate and the request is again in flight when the timeout > > expires then a request will be completed twice: a first time by the > > timeout handler and a

Re: BUG at IP: blk_mq_get_request+0x23e/0x390 on 4.16.0-rc7

2018-04-09 Thread Ming Lei
On Mon, Apr 09, 2018 at 11:31:37AM +0300, Sagi Grimberg wrote: > > > > My device exposes nr_hw_queues which is not higher than num_online_cpus > > > so I want to connect all hctxs with hope that they will be used. > > > > The issue is that CPU online & offline can happen any time, and after > >

Re: [PATCH 8/8] blk-mq: remove code for dealing with remapping queue

2018-04-09 Thread Sagi Grimberg
Looks good, Reviewed-by: Sagi Grimberg

Re: [PATCH 7/8] blk-mq: reimplement blk_mq_hw_queue_mapped

2018-04-09 Thread Sagi Grimberg
Looks good, Reviewed-by: Sagi Grimberg

Re: [PATCH 6/8] blk-mq: don't check queue mapped in __blk_mq_delay_run_hw_queue()

2018-04-09 Thread Sagi Grimberg
Looks good, Reviewed-by: Sagi Grimberg

Re: [PATCH 5/8] blk-mq: remove blk_mq_delay_queue()

2018-04-09 Thread Sagi Grimberg
Looks good, Reviewed-by: Sagi Grimberg

Re: [PATCH 4/8] blk-mq: introduce blk_mq_hw_queue_first_cpu() to figure out first cpu

2018-04-09 Thread Sagi Grimberg
Looks good, Reviewed-by: Sagi Grimberg

Re: [PATCH 3/8] blk-mq: avoid to write intermediate result to hctx->next_cpu

2018-04-09 Thread Sagi Grimberg
Looks good, Reviewed-by: Sagi Grimberg

Re: [PATCH 2/8] blk-mq: don't keep offline CPUs mapped to hctx 0

2018-04-09 Thread Sagi Grimberg
Looks good, Reviewed-by: Sagi Grimberg

Re: [PATCH 1/8] blk-mq: make sure that correct hctx->next_cpu is set

2018-04-09 Thread Sagi Grimberg
Looks good, Reviewed-by: Sagi Grimberg

Re: [PATCH 0/8] blk-mq: fix and improve queue mapping

2018-04-09 Thread Christian Borntraeger
On 04/08/2018 11:48 AM, Ming Lei wrote: > Hi Jens, > > The first two patches fix issues about queue mapping. > > The other 6 patches improve queue mapping for blk-mq. > > Christian, this patches should fix your issue, so please give > a test, and the patches can be found in the following

Re: [PATCH 8/8] blk-mq: remove code for dealing with remapping queue

2018-04-09 Thread Christoph Hellwig
On Sun, Apr 08, 2018 at 05:48:14PM +0800, Ming Lei wrote: > Firstly, from commit 4b855ad37194 ("blk-mq: Create hctx for each present CPU), > blk-mq doesn't remap queue any more after CPU topo is changed. > > Secondly, set->nr_hw_queues can't be bigger than nr_cpu_ids, and now we map > all

Re: [PATCH 7/8] blk-mq: reimplement blk_mq_hw_queue_mapped

2018-04-09 Thread Christoph Hellwig
On Sun, Apr 08, 2018 at 05:48:13PM +0800, Ming Lei wrote: > Now the actual meaning of queue mapped is that if there is any online > CPU mapped to this hctx, so implement blk_mq_hw_queue_mapped() in this > way. Reviewed-by: Christoph Hellwig

Re: [PATCH 6/8] blk-mq: don't check queue mapped in __blk_mq_delay_run_hw_queue()

2018-04-09 Thread Christoph Hellwig
On Sun, Apr 08, 2018 at 05:48:12PM +0800, Ming Lei wrote: > There are several reasons for removing the check: > > 1) blk_mq_hw_queue_mapped() returns true always now since each hctx > may be mapped by one CPU at least Sounds like we should remove it, then.. The patch looks good: Reviewed-by:

Re: [PATCH 5/8] blk-mq: remove blk_mq_delay_queue()

2018-04-09 Thread Christoph Hellwig
On Sun, Apr 08, 2018 at 05:48:11PM +0800, Ming Lei wrote: > No driver uses this interface any more, so remove it. Looks good, Reviewed-by: Christoph Hellwig

Re: [PATCH 4/8] blk-mq: introduce blk_mq_hw_queue_first_cpu() to figure out first cpu

2018-04-09 Thread Christoph Hellwig
On Sun, Apr 08, 2018 at 05:48:10PM +0800, Ming Lei wrote: > This patch introduces helper of blk_mq_hw_queue_first_cpu() for > figuring out the hctx's first cpu, and code duplication can be > avoided. Looks good, Reviewed-by: Christoph Hellwig

Re: [PATCH 3/8] blk-mq: avoid to write intermediate result to hctx->next_cpu

2018-04-09 Thread Christoph Hellwig
On Sun, Apr 08, 2018 at 05:48:09PM +0800, Ming Lei wrote: > This patch figures out the final selected CPU, then writes > it to hctx->next_cpu once, then we can avoid to intermediate > next cpu observed from other dispatch paths. Looks good, Reviewed-by: Christoph Hellwig

Re: [PATCH 2/8] blk-mq: don't keep offline CPUs mapped to hctx 0

2018-04-09 Thread Christoph Hellwig
On Sun, Apr 08, 2018 at 05:48:08PM +0800, Ming Lei wrote: > >From commit 4b855ad37194 ("blk-mq: Create hctx for each present CPU), > blk-mq doesn't remap queue after CPU topo is changed, that said when > some of these offline CPUs become online, they are still mapped to > hctx 0, then hctx 0 may

Re: [PATCH 1/8] blk-mq: make sure that correct hctx->next_cpu is set

2018-04-09 Thread Christoph Hellwig
On Sun, Apr 08, 2018 at 05:48:07PM +0800, Ming Lei wrote: > >From commit 20e4d81393196 (blk-mq: simplify queue mapping & schedule > with each possisble CPU), one hctx can be mapped from all offline CPUs, > then hctx->next_cpu can be set as wrong. > > This patch fixes this issue by making

Re: [PATCH] blk-mq: Fix recently introduced races in the timeout handling code

2018-04-09 Thread Christoph Hellwig
This looks sensible, but I'm worried about taking a whole spinlock for every request completion, including irq disabling. However it seems like your new updated pattern would fit use of cmpxchg() very nicely.

Re: usercopy whitelist woe in scsi_sense_cache

2018-04-09 Thread Christoph Hellwig
I really can't make sense of that report. And I'm also curious why you think 17cb960f29c2 should change anything for that code path.

Re: Multi-Actuator SAS HDD First Look

2018-04-09 Thread Christoph Hellwig
On Fri, Apr 06, 2018 at 01:09:08PM -0400, Douglas Gilbert wrote: > So you found a document that outlines NVMe's architecture! Could you > share the url (no marketing BS, please)? You can always take a look at the actual spec:

Re: BUG at IP: blk_mq_get_request+0x23e/0x390 on 4.16.0-rc7

2018-04-09 Thread Sagi Grimberg
Hi Sagi Sorry for the late response, bellow patch works, here is the full log: Thanks for testing! Now that we isolated the issue, the question is if this fix is correct given that we are guaranteed that the connect context will run on an online cpu? another reference to the patch (we can

Re: BUG at IP: blk_mq_get_request+0x23e/0x390 on 4.16.0-rc7

2018-04-09 Thread Yi Zhang
On 04/09/2018 04:54 PM, Yi Zhang wrote: On 04/09/2018 04:31 PM, Sagi Grimberg wrote: My device exposes nr_hw_queues which is not higher than num_online_cpus so I want to connect all hctxs with hope that they will be used. The issue is that CPU online & offline can happen any time, and

Re: Block layer use of __GFP flags

2018-04-09 Thread Michal Hocko
On Mon 09-04-18 04:46:22, Bart Van Assche wrote: [...] [...] > diff --git a/drivers/ide/ide-pm.c b/drivers/ide/ide-pm.c > index ad8a125defdd..3ddb464b72e6 100644 > --- a/drivers/ide/ide-pm.c > +++ b/drivers/ide/ide-pm.c > @@ -91,7 +91,7 @@ int generic_ide_resume(struct device *dev) > >

Re: BUG at IP: blk_mq_get_request+0x23e/0x390 on 4.16.0-rc7

2018-04-09 Thread Yi Zhang
On 04/09/2018 04:31 PM, Sagi Grimberg wrote: My device exposes nr_hw_queues which is not higher than num_online_cpus so I want to connect all hctxs with hope that they will be used. The issue is that CPU online & offline can happen any time, and after blk-mq removes CPU hotplug handler,

Re: [PATCH] blk-mq: Fix recently introduced races in the timeout handling code

2018-04-09 Thread Sagi Grimberg
If a completion occurs after blk_mq_rq_timed_out() has reset rq->aborted_gstate and the request is again in flight when the timeout expires then a request will be completed twice: a first time by the timeout handler and a second time when the regular completion occurs. Additionally, the blk-mq

Re: BUG at IP: blk_mq_get_request+0x23e/0x390 on 4.16.0-rc7

2018-04-09 Thread Sagi Grimberg
My device exposes nr_hw_queues which is not higher than num_online_cpus so I want to connect all hctxs with hope that they will be used. The issue is that CPU online & offline can happen any time, and after blk-mq removes CPU hotplug handler, there is no way to remap queue when CPU topo is

Re: Block layer use of __GFP flags

2018-04-09 Thread Christoph Hellwig
On Mon, Apr 09, 2018 at 08:53:49AM +0200, Hannes Reinecke wrote: > Why don't you fold the 'flags' argument into the 'gfp_flags', and drop > the 'flags' argument completely? > Looks a bit pointless to me, having two arguments denoting basically > the same ... Wrong way around. gfp_flags doesn't

Re: [block regression] kernel oops triggered by removing scsi device dring IO

2018-04-09 Thread Joseph Qi
Hi Bart, On 18/4/9 12:47, Bart Van Assche wrote: > On Sun, 2018-04-08 at 12:21 +0800, Ming Lei wrote: >> The following kernel oops is triggered by 'removing scsi device' during >> heavy IO. > > Is the below patch sufficient to fix this? > > Thanks, > > Bart. > > > Subject: blk-mq: Avoid that

Re: Block layer use of __GFP flags

2018-04-09 Thread Hannes Reinecke
On Mon, 9 Apr 2018 04:46:22 + "Bart Van Assche" wrote: > On Sun, 2018-04-08 at 12:08 -0700, Matthew Wilcox wrote: > > On Sun, Apr 08, 2018 at 04:40:59PM +, Bart Van Assche wrote: > > > Do you perhaps want me to prepare a patch that makes > > > blk_get_request()

Re: Multi-Actuator SAS HDD First Look

2018-04-09 Thread Hannes Reinecke
On 04/09/2018 04:08 AM, Tim Walker wrote: > On Fri, Apr 6, 2018 at 11:09 AM, Douglas Gilbert > wrote: >> >> On 2018-04-06 02:42 AM, Christoph Hellwig wrote: >>> >>> On Fri, Apr 06, 2018 at 08:24:18AM +0200, Hannes Reinecke wrote: Ah. Far better. What about

[PATCH] blk-throttle: discard stale last_low_overflow_time

2018-04-09 Thread Jianchao Wang
When there is only one type of traffic, the associated last_low_overflow_time will not be updated, so the value is stale and invalid and we should discard it. Otherwise, __tg_last_low_overflow_time always return the stale value because it is smaller, and then we always get bps/iops has been below

[PATCH] blk-throttle: only update last_low_overflow_time when LIMIT_MAX

2018-04-09 Thread Jianchao Wang
Currently, last_low_overflow_time will be updated whenever bios are throttled and queued. This is true when LIMIT_MAX, but not for LIMIT_LOW. what last_low_overflow_time indicates is dispatching not submitting. When LIMIT_LOW, the dispatch bps/iops should never be above the low limit, we should