Re: [PATCH 4.21 V3] blk-mq: not embed .mq_kobj and ctx->kobj into queue instance

2018-11-19 Thread Greg Kroah-Hartman
On Tue, Nov 20, 2018 at 09:44:35AM +0800, Ming Lei wrote:
> Even though .mq_kobj, ctx->kobj and q->kobj share same lifetime
> from block layer's view, actually they don't because userspace may
> grab one kobject anytime via sysfs.
> 
> This patch fixes the issue by the following approach:
> 
> 1) introduce 'struct blk_mq_ctxs' for holding .mq_kobj and managing
> all ctxs
> 
> 2) free all allocated ctxs and the 'blk_mq_ctxs' instance in release
> handler of .mq_kobj
> 
> 3) grab one ref of .mq_kobj before initializing each ctx->kobj, so that
> .mq_kobj is always released after all ctxs are freed.
> 
> This patch fixes kernel panic issue during booting when DEBUG_KOBJECT_RELEASE
> is enabled.
> 
> Reported-by: Guenter Roeck 
> Cc: "jianchao.wang" 
> Cc: Guenter Roeck 
> Cc: Greg Kroah-Hartman 
> Cc: sta...@vger.kernel.org
> Signed-off-by: Ming Lei 
> ---
> V3:
>   - keep to allocate q->queue_ctx via percpu allocator, so one extra
> pointer reference can be saved for getting ctx
> V2:
>   - allocate 'blk_mq_ctx' inside blk_mq_init_allocated_queue()
>   - allocate q->mq_kobj directly 

Not tested, but seems sane from a kobject point-of-view:

Reviewed-by: Greg Kroah-Hartman 
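
Editorial sketch of the lifetime rule from steps 1) to 3) of the changelog above, written as plain userspace C rather than kernel code. All names (struct kobj, struct ctxs, ctxs_alloc() and so on) are hypothetical stand-ins, not the blk-mq or kobject API. The sketch only assumes what the changelog states: each ctx pins the holder of .mq_kobj, and the holder's release handler frees everything.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a kobject: a refcount plus a release hook. */
struct kobj {
	int refs;
	void (*release)(struct kobj *);
};

struct ctxs;

struct ctx {
	struct kobj kobj;      /* plays the role of ctx->kobj */
	struct ctxs *owner;
};

struct ctxs {
	struct kobj kobj;      /* plays the role of .mq_kobj */
	struct ctx *ctx;       /* all contexts, freed only by ctxs_release() */
	int nr;
};

static int ctxs_freed;         /* lets a caller observe when the final free happens */

static void kobj_get(struct kobj *k)
{
	k->refs++;
}

static void kobj_put(struct kobj *k)
{
	assert(k->refs > 0);
	if (--k->refs == 0)
		k->release(k);
}

/* Releasing a ctx frees nothing; it only drops the ref it held on the owner. */
static void ctx_release(struct kobj *k)
{
	struct ctx *c = (struct ctx *)((char *)k - offsetof(struct ctx, kobj));

	kobj_put(&c->owner->kobj);
}

/* Step 2: the owner's release handler frees every ctx and the owner itself. */
static void ctxs_release(struct kobj *k)
{
	struct ctxs *s = (struct ctxs *)((char *)k - offsetof(struct ctxs, kobj));

	free(s->ctx);
	free(s);
	ctxs_freed++;
}

static struct ctxs *ctxs_alloc(int nr)
{
	struct ctxs *s = calloc(1, sizeof(*s));

	if (!s)
		return NULL;
	s->ctx = calloc((size_t)nr, sizeof(*s->ctx));
	if (!s->ctx) {
		free(s);
		return NULL;
	}
	s->nr = nr;
	s->kobj.refs = 1;
	s->kobj.release = ctxs_release;
	for (int i = 0; i < nr; i++) {
		kobj_get(&s->kobj);          /* step 3: one owner ref per ctx */
		s->ctx[i].kobj.refs = 1;
		s->ctx[i].kobj.release = ctx_release;
		s->ctx[i].owner = s;
	}
	return s;
}
```

In this sketch the memory goes away only after the queue's own reference and every ctx reference have been dropped, in whatever order those puts arrive.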


Re: [PATCH V2 for-4.21 2/2] blk-mq: alloc q->queue_ctx as normal array

2018-11-19 Thread Greg Kroah-Hartman
On Mon, Nov 19, 2018 at 10:04:27AM +0800, Ming Lei wrote:
> On Sat, Nov 17, 2018 at 11:03:42AM +0100, Greg Kroah-Hartman wrote:
> > On Sat, Nov 17, 2018 at 10:34:18AM +0800, Ming Lei wrote:
> > > On Fri, Nov 16, 2018 at 06:06:23AM -0800, Greg Kroah-Hartman wrote:
> > > > On Fri, Nov 16, 2018 at 07:23:11PM +0800, Ming Lei wrote:
> > > > > Now q->queue_ctx is just one read-mostly table for query the
> > > > > 'blk_mq_ctx' instance from one cpu index, it isn't necessary
> > > > > to allocate it as percpu variable. One simple array may be
> > > > > more efficient.
> > > > 
> > > > "may be", have you run benchmarks to be sure?  If so, can you add the
> > > > results of them to this changelog?  If there is no measurable
> > > > difference, then why make this change at all?
> > > 
> > > __blk_mq_get_ctx() is used in fast path, what do you think about which
> > > one is more efficient?
> > > 
> > > - *per_cpu_ptr(q->queue_ctx, cpu);
> > > 
> > > - q->queue_ctx[cpu]
> > 
> > You need to actually test to see which one is faster, you might be
> > surprised :)
> > 
> > In other words, do not just guess.
> 
> No performance difference is observed wrt. this patchset when I
> run the following fio test on null_blk(modprobe null_blk) in my VM:
> 
> fio --direct=1 --size=128G --bsrange=4k-4k --runtime=40 --numjobs=32 \
>   --ioengine=libaio --iodepth=64 --group_reporting=1 --filename=/dev/nullb0 \
>   --name=null_blk-ttest-randread --rw=randread
> 
> Running test is important, but IMO it is more important to understand
> the idea behind is correct, or the approach can be proved as correct.
> 
> Given the count of test cases can be increased exponentially when the related
> factors or settings are covered, obviously we can't run all the test cases.

And what happens when you start to scale the number of queues and cpus
in the system?  Does both options work the same?  Why did the original
code have per-cpu variables?

Anyway, this isn't my subsystem, and it has nothing to do with kobjects,
so I really do not care.  My point is, do not make core changes like
this without really knowing the reason behind the original choice and at
least testing that your change does not break that reasoning.

good luck!

greg k-h
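
Editorial sketch of the two lookups being compared in this thread, in userspace C. It assumes that a percpu-style lookup boils down to "base address plus a per-CPU offset" and models that with a hypothetical offset table; demo_cpu_offset, demo_table and the 64-byte stride are made up for illustration and are not kernel symbols. It shows only that both forms resolve to the same object and says nothing about which is faster.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define NR_CPUS_DEMO 4               /* hypothetical CPU count for the sketch */

struct demo_ctx {
	unsigned int cpu;
};

/* Variant A: percpu-style, one base address plus a per-CPU byte offset. */
static size_t demo_cpu_offset[NR_CPUS_DEMO];
static char *demo_percpu_area;       /* backing storage for all CPUs */

static struct demo_ctx *lookup_percpu(struct demo_ctx *base, unsigned int cpu)
{
	assert(cpu < NR_CPUS_DEMO);
	return (struct demo_ctx *)((char *)base + demo_cpu_offset[cpu]);
}

/* Variant B: a plain table of pointers indexed by CPU number. */
static struct demo_ctx *demo_table[NR_CPUS_DEMO];

static struct demo_ctx *lookup_array(unsigned int cpu)
{
	assert(cpu < NR_CPUS_DEMO);
	return demo_table[cpu];
}

/* Build both views over the same storage; returns the percpu-style base. */
static struct demo_ctx *demo_setup(void)
{
	const size_t stride = 64;    /* assumed per-CPU chunk size */

	demo_percpu_area = calloc(NR_CPUS_DEMO, stride);
	if (!demo_percpu_area)
		return NULL;
	for (unsigned int cpu = 0; cpu < NR_CPUS_DEMO; cpu++) {
		demo_cpu_offset[cpu] = cpu * stride;
		demo_table[cpu] = (struct demo_ctx *)(demo_percpu_area +
						      demo_cpu_offset[cpu]);
		demo_table[cpu]->cpu = cpu;
	}
	return (struct demo_ctx *)demo_percpu_area;
}
```

Each variant performs one indexed load followed by a dereference, so reading the source does not settle which one wins; that is the reason a measurement is requested in the replies.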


Re: [PATCH V2 for-4.21 1/2] blk-mq: not embed .mq_kobj and ctx->kobj into queue instance

2018-11-19 Thread Greg Kroah-Hartman
On Sat, Nov 17, 2018 at 10:26:38AM +0800, Ming Lei wrote:
> On Fri, Nov 16, 2018 at 06:05:21AM -0800, Greg Kroah-Hartman wrote:
> > On Fri, Nov 16, 2018 at 07:23:10PM +0800, Ming Lei wrote:
> > > @@ -456,7 +456,7 @@ struct request_queue {
> > >   /*
> > >* mq queue kobject
> > >*/
> > > - struct kobject mq_kobj;
> > > + struct kobject *mq_kobj;
> > 
> > What is this kobject even used for?  It wasn't obvious at all from this
> > patch, why is it needed if you are not using it to reference count the
> > larger structure here?
> 
> All attributes and kobjects under /sys/block/$DEV/mq are covered by this
> kobject actually, and all are for exposing blk-mq specific information,
> but now there is only blk-mq, and legacy io path is removed.

I am sorry, but I really can not parse this sentence at all.

What Documentation/ABI/ entries are covered by this kobject, that should
help me out more.  And what do you mean by "legacy io"?

> That is why I mentioned we may remove this kobject last time and move all
> under /sys/block/$DEV/queue, however you thought that may break some
> userspace.

Who relies on these sysfs files today?

> If we want to backport them to stable, this patch may be a bit easier to go.

Why do you want to backport any of this to stable?

thanks,

greg k-h


Re: [PATCH V2 for-4.21 2/2] blk-mq: alloc q->queue_ctx as normal array

2018-11-17 Thread Greg Kroah-Hartman
On Sat, Nov 17, 2018 at 10:34:18AM +0800, Ming Lei wrote:
> On Fri, Nov 16, 2018 at 06:06:23AM -0800, Greg Kroah-Hartman wrote:
> > On Fri, Nov 16, 2018 at 07:23:11PM +0800, Ming Lei wrote:
> > > Now q->queue_ctx is just one read-mostly table for query the
> > > 'blk_mq_ctx' instance from one cpu index, it isn't necessary
> > > to allocate it as percpu variable. One simple array may be
> > > more efficient.
> > 
> > "may be", have you run benchmarks to be sure?  If so, can you add the
> > results of them to this changelog?  If there is no measurable
> > difference, then why make this change at all?
> 
> __blk_mq_get_ctx() is used in fast path, what do you think about which
> one is more efficient?
> 
> - *per_cpu_ptr(q->queue_ctx, cpu);
> 
> - q->queue_ctx[cpu]

You need to actually test to see which one is faster, you might be
surprised :)

In other words, do not just guess.

> At least the latter isn't worse than the former.

How do you know?

> Especially q->queue_ctx is just a read-only look-up table, it doesn't
> make sense to make it percpu any more.
> 
> Not mention q->queue_ctx[cpu] is more clean/readable.

Again, please test to verify this.

thanks,

greg k-h


Re: [PATCH V2 for-4.21 2/2] blk-mq: alloc q->queue_ctx as normal array

2018-11-16 Thread Greg Kroah-Hartman
On Fri, Nov 16, 2018 at 07:23:11PM +0800, Ming Lei wrote:
> Now q->queue_ctx is just one read-mostly table for query the
> 'blk_mq_ctx' instance from one cpu index, it isn't necessary
> to allocate it as percpu variable. One simple array may be
> more efficient.

"may be", have you run benchmarks to be sure?  If so, can you add the
results of them to this changelog?  If there is no measurable
difference, then why make this change at all?

thanks,

greg k-h


Re: [PATCH V2 for-4.21 1/2] blk-mq: not embed .mq_kobj and ctx->kobj into queue instance

2018-11-16 Thread Greg Kroah-Hartman
On Fri, Nov 16, 2018 at 07:23:10PM +0800, Ming Lei wrote:
> @@ -456,7 +456,7 @@ struct request_queue {
>   /*
>* mq queue kobject
>*/
> - struct kobject mq_kobj;
> + struct kobject *mq_kobj;

What is this kobject even used for?  It wasn't obvious at all from this
patch, why is it needed if you are not using it to reference count the
larger structure here?

thanks,

greg k-h


Re: [GIT PULL] Block fixes for 4.19-final

2018-10-19 Thread Greg Kroah-Hartman
On Fri, Oct 19, 2018 at 09:52:39AM -0600, Jens Axboe wrote:
> Hi Greg,
> 
> Two small fixes that should go into this release. Please pull!
> 
> 
>   git://git.kernel.dk/linux-block.git tags/for-linus-20181019
> 

Now merged, thanks.

greg k-h


Re: [GIT PULL] Block fix for 4.19-rc

2018-10-13 Thread Greg Kroah-Hartman
On Fri, Oct 12, 2018 at 01:16:28PM -0600, Jens Axboe wrote:
> Hi Greg,
> 
> Just a single fix that should go in, fixing a regression introduced in
> the blk-wbt code.
> 
> Please pull!
> 
> 
>   git://git.kernel.dk/linux-block.git tags/for-linus-20181012

Now merged, thanks.

greg k-h


Re: [GIT PULL] Block fixes for 4.19-rc6

2018-09-29 Thread Greg Kroah-Hartman
On Sat, Sep 29, 2018 at 03:12:25PM -0600, Jens Axboe wrote:
> Hi Greg,
> 
> A set of fixes that should go into this release. This pull request
> contains:
> 
> - A fix (hopefully) for the persistent grants for xen-blkfront. A
>   previous fix from this series wasn't complete, hence reverted, and
>   this one should hopefully be it. (Boris Ostrovsky)
> 
> - Fix for an elevator drain warning with SMR devices, which is triggered
>   when you switch schedulers (Damien)
> 
> - bcache deadlock fix (Guoju Fang)
> 
> - Fix for the block unplug tracepoint, which has had the timer/explicit
>   flag reverted since 4.11 (Ilya)
> 
> - Fix a regression in this series where the blk-mq timeout hook is
>   invoked with the RCU read lock held, hence preventing it from blocking
>   (Keith)
> 
> - NVMe pull from Christoph, with a single multipath fix (Susobhan Dey)
> 
> Please pull!
> 
> 
>   git://git.kernel.dk/linux-block.git tags/for-linus-20180929

Now pulled, thanks.

greg k-h


Re: [GIT PULL] Block fix for 4.19-rc5

2018-09-23 Thread Greg Kroah-Hartman
On Sat, Sep 22, 2018 at 03:21:19PM -0600, Jens Axboe wrote:
> Hi Greg,
> 
> I was going to wait with this one, but then I decided that I think we
> should put it in sooner rather than later. Just a single fix in this
> pull request, fixing a regression in /proc/diskstats caused by the
> unification of timestamps.
> 
> Please pull!

Now pulled, thanks.

greg k-h


Re: [GIT PULL] Storage fixes for 4.19-rc5

2018-09-21 Thread Greg Kroah-Hartman
On Thu, Sep 20, 2018 at 03:50:53PM -0600, Jens Axboe wrote:
> Hi Greg,
> 
> Three fixes that should go into this series. This pull request
> contains:
> 
> - Fix for leaking kernel pointer in floppy ioctl (Andy Whitcroft)
> 
> - NVMe pull request from Christoph, and a single ANA log page fix
>   (Hannes)
> 
> - Regression fix for libata qd32 support, where we trigger an illegal
>   active command transition. This fixes a CD-ROM detection issue that
>   was reported, but could also trigger premature completion of the
>   internal tag (me)
> 
> 
> Please pull!
> 
> 
>   git://git.kernel.dk/linux-block.git tags/for-linus-20180920

Now pulled, thanks.

greg k-h


Re: Fix for 84676c1f (b5b6e8c8) missing in 4.14.y

2018-08-15 Thread Greg Kroah-Hartman
On Wed, Aug 15, 2018 at 02:30:23PM +0000, Felipe Franciosi wrote:
> 
> > On 10 Aug 2018, at 20:43, Felipe Franciosi wrote:
> > 
> > 
> >> On 10 Aug 2018, at 11:32, Greg Kroah-Hartman wrote:
> >> 
> >> On Fri, Aug 10, 2018 at 05:10:52PM +0000, Felipe Franciosi wrote:
> >>> 
> >>>> On 10 Aug 2018, at 03:15, Greg Kroah-Hartman wrote:
> >>>> 
> >>>> On Fri, Aug 10, 2018 at 10:31:29AM +0800, Ming Lei wrote:
> >>>>> On Fri, Aug 10, 2018 at 02:09:01AM +0000, Felipe Franciosi wrote:
> >>>>>> Hi Ming (and all),
> >>>>>> 
> >>>>>> Your series "scsi: virtio_scsi: fix IO hang caused by irq vector 
> >>>>>> automatic affinity" which forces virtio-scsi to use blk-mq fixes an 
> >>>>>> issue introduced by 84676c1f. We noticed that this bug also exists in 
> >>>>>> 4.14.y (as ef86f3a72adb), but your series was not backported to that 
> >>>>>> stable branch.
> >>>>>> 
> >>>>>> Are there any plans to do that? At least CoreOS is using 4.14 and 
> >>>>>> showing issues on AHV (which provides an mq virtio-scsi controller).
> >>>>>> 
> >>>>> 
> >>>>> Hi Felipe,
> >>>>> 
> >>>>> Looks the following 4 patches should have been marked as stable, sorry
> >>>>> for missing that.
> >>>>> 
> >>>>> b5b6e8c8d3b4 scsi: virtio_scsi: fix IO hang caused by automatic irq 
> >>>>> vector affinity
> >>>>> 2f31115e940c scsi: core: introduce force_blk_mq
> >>>>> adbe552349f2 scsi: megaraid_sas: fix selection of reply queue
> >>>>> 8b834bff1b73 scsi: hpsa: fix selection of reply queue
> >>>>> 
> >>>>> Usually this backporting is done by our stable guys, so I will CC stable
> >>>>> and leave them handle it, but I am happy to provide any help for
> >>>>> addressing conflicts or sort of thing.
> >>>> 
> >>>> As the above patches do not apply "cleanly" to the 4.14.y tree at all,
> >>>> can you please provide a set of backported patches that I can apply?
> >>> 
> >>> Actually, adbe552349f2 is already present in 4.14.y. It is commit 
> >>> e58114824fa6.
> >>> 
> >>> If you skip that, all the other three apply cleanly.
> >> 
> >> Ok, that works, but there's another bug report of aacraid having
> >> problems.  Any ideas?
> > 
> > Heya, I actually have no idea which bug you are talking about. TBH I'm only 
> > experiencing the bug fixed by b5b6e8c8d3b4, which only requires 
> > 2f31115e940c. (I tested a 4.14 with both commits which resolves the bug.)
> > 
> > I doubt any of that would have any interference with aacraid and should be 
> > safe backports in that respect.
> 
> Hi Greg, sorry to bother but I didn't hear anything back about this.
> Are you picking up 2f31115e940c and b5b6e8c8d3b4 for 4.14.y or waiting
> for some other action?

They are in 4.14.63-rc1 right now, did you not see them?

thanks,

greg k-h


Re: [PATCH v2 1/2] block: Introduce alloc_disk_node_attr()

2018-08-13 Thread Greg Kroah-Hartman
On Mon, Aug 13, 2018 at 10:25:30AM -0700, Bart Van Assche wrote:
> This patch does not change the behavior of any existing code but
> introduces a function that will be used by the next patch.

"next" is hard to determine in a git log :)

Try being a bit more specific as to why you are doing this.

thanks,

greg k-h


Re: Fix for 84676c1f (b5b6e8c8) missing in 4.14.y

2018-08-10 Thread Greg Kroah-Hartman
On Fri, Aug 10, 2018 at 05:10:52PM +0000, Felipe Franciosi wrote:
> 
> > On 10 Aug 2018, at 03:15, Greg Kroah-Hartman wrote:
> > 
> > On Fri, Aug 10, 2018 at 10:31:29AM +0800, Ming Lei wrote:
> >> On Fri, Aug 10, 2018 at 02:09:01AM +0000, Felipe Franciosi wrote:
> >>> Hi Ming (and all),
> >>> 
> >>> Your series "scsi: virtio_scsi: fix IO hang caused by irq vector 
> >>> automatic affinity" which forces virtio-scsi to use blk-mq fixes an issue 
> >>> introduced by 84676c1f. We noticed that this bug also exists in 4.14.y 
> >>> (as ef86f3a72adb), but your series was not backported to that stable 
> >>> branch.
> >>> 
> >>> Are there any plans to do that? At least CoreOS is using 4.14 and showing 
> >>> issues on AHV (which provides an mq virtio-scsi controller).
> >>> 
> >> 
> >> Hi Felipe,
> >> 
> >> Looks the following 4 patches should have been marked as stable, sorry
> >> for missing that.
> >> 
> >> b5b6e8c8d3b4 scsi: virtio_scsi: fix IO hang caused by automatic irq vector 
> >> affinity
> >> 2f31115e940c scsi: core: introduce force_blk_mq
> >> adbe552349f2 scsi: megaraid_sas: fix selection of reply queue
> >> 8b834bff1b73 scsi: hpsa: fix selection of reply queue
> >> 
> >> Usually this backporting is done by our stable guys, so I will CC stable
> >> and leave them handle it, but I am happy to provide any help for
> >> addressing conflicts or sort of thing.
> > 
> > As the above patches do not apply "cleanly" to the 4.14.y tree at all,
> > can you please provide a set of backported patches that I can apply?
> 
> Actually, adbe552349f2 is already present in 4.14.y. It is commit 
> e58114824fa6.
> 
> If you skip that, all the other three apply cleanly.

Ok, that works, but there's another bug report of aacraid having
problems.  Any ideas?

greg k-h


Re: Fix for 84676c1f (b5b6e8c8) missing in 4.14.y

2018-08-10 Thread Greg Kroah-Hartman
On Fri, Aug 10, 2018 at 10:31:29AM +0800, Ming Lei wrote:
> On Fri, Aug 10, 2018 at 02:09:01AM +0000, Felipe Franciosi wrote:
> > Hi Ming (and all),
> > 
> > Your series "scsi: virtio_scsi: fix IO hang caused by irq vector automatic 
> > affinity" which forces virtio-scsi to use blk-mq fixes an issue introduced 
> > by 84676c1f. We noticed that this bug also exists in 4.14.y (as 
> > ef86f3a72adb), but your series was not backported to that stable branch.
> > 
> > Are there any plans to do that? At least CoreOS is using 4.14 and showing 
> > issues on AHV (which provides an mq virtio-scsi controller).
> > 
> 
> Hi Felipe,
> 
> Looks the following 4 patches should have been marked as stable, sorry
> for missing that.
> 
> b5b6e8c8d3b4 scsi: virtio_scsi: fix IO hang caused by automatic irq vector 
> affinity
> 2f31115e940c scsi: core: introduce force_blk_mq
> adbe552349f2 scsi: megaraid_sas: fix selection of reply queue
> 8b834bff1b73 scsi: hpsa: fix selection of reply queue
> 
> Usually this backporting is done by our stable guys, so I will CC stable
> and leave them handle it, but I am happy to provide any help for
> addressing conflicts or sort of thing.

As the above patches do not apply "cleanly" to the 4.14.y tree at all,
can you please provide a set of backported patches that I can apply?

thanks,

greg k-h


Re: [PATCH 2/2] tracing/events: block: dev_t via driver core for plug and unplug events

2018-04-15 Thread Greg Kroah-Hartman
On Fri, Apr 13, 2018 at 03:07:18PM +0200, Steffen Maier wrote:
> Complements v2.6.31 commit 55782138e47d ("tracing/events: convert block
> trace points to TRACE_EVENT()") to be equivalent to traditional blktrace
> output. Also this allows event filtering to not always get all (un)plug
> events.
> 
> NB: The NULL pointer check for q->kobj.parent is certainly racy and
> I don't have enough experience if it's good enough for a trace event.
> The change did work for my cases (block device read/write I/O on
> zfcp-attached SCSI disks and dm-mpath on top).
> 
> While I haven't seen any prior art using driver core (parent) relations
> for trace events, there are other cases using this when no direct pointer
> exists between objects, such as:
>  #define to_scsi_target(d)container_of(d, struct scsi_target, dev)
>  static inline struct scsi_target *scsi_target(struct scsi_device *sdev)
>  {
>   return to_scsi_target(sdev->sdev_gendev.parent);
>  }

That is because you "know" the parent of a target device is a
scsi_target.
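
Editorial sketch of the point made here, in userspace C: container_of() applied to a parent pointer is pure pointer arithmetic, so the result is only meaningful when the parent really is embedded in the assumed type. Every demo_* name is hypothetical. DEMO_MKDEV assumes the kernel-internal dev_t layout (major shifted left by 20 bits).

```c
#include <assert.h>
#include <stddef.h>

/* Userspace equivalent of the container_of() idiom. */
#define demo_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Assumed encoding: major in the upper bits, 20-bit minor. */
#define DEMO_MKDEV(ma, mi) (((unsigned int)(ma) << 20) | (unsigned int)(mi))

struct demo_kobject {
	struct demo_kobject *parent;
};

struct demo_device {
	unsigned int devt;
	struct demo_kobject kobj;
};

struct demo_queue {
	struct demo_kobject kobj;
};

/*
 * Returns the devt of the device that parents the queue, or 0 when the
 * queue has no parent. Nothing here verifies that the parent kobject is
 * embedded in a demo_device; if it were embedded in some other structure,
 * this would silently read unrelated memory.
 */
static unsigned int demo_queue_devt(const struct demo_queue *q)
{
	if (!q->kobj.parent)
		return 0;
	return demo_container_of(q->kobj.parent, struct demo_device, kobj)->devt;
}
```

The NULL test guards against a missing parent but not against a parent of the wrong type, and it cannot help if the parent pointer changes between the test and the dereference.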

> This is the object model we make use of here:
> 
> struct gendisk {
>         struct hd_struct {
>                 struct device {  /*container_of*/
>                         struct kobject kobj; <--+
>                         dev_t  devt; /*deref*/  |
>                 } __dev;                        |
>         } part0;                                |
>         struct request_queue *queue; ..+        |
> }                                      :        |
>                                        :        |
> struct request_queue {  <..............+        |
>         /* queue kobject */                     |
>         struct kobject {                        |
>                 struct kobject *parent; --------+
Are you sure about this?

>         } kobj;
> }
> 
> The parent pointer comes from:
>  #define disk_to_dev(disk)(&(disk)->part0.__dev)
> int blk_register_queue(struct gendisk *disk)
>   struct device *dev = disk_to_dev(disk);
>   struct request_queue *q = disk->queue;
>   ret = kobject_add(&q->kobj, kobject_get(&dev->kobj), "%s", "queue");
>                               ^^^parent
> 
> $ ls -d /sys/block/sdf/queue
> /sys/block/sdf/queue
> $ cat /sys/block/sdf/dev
> 8:80
> 
> A partition does not have its own request queue:
> 
> $ cat /sys/block/sdf/sdf1/dev
> 8:81
> $ ls -d /sys/block/sdf/sdf1/queue
> ls: cannot access '/sys/block/sdf/sdf1/queue': No such file or directory
> 
> The difference to blktrace parsed output is that block events don't use the
> partition's minor number but the containing block device's minor number:

Why do you want the block device's minor number here?  What is wrong
with the partition's minor number?  I would think you want that instead.

> 
> $ dd if=/dev/sdf1 count=1
> 
> $ cat /sys/kernel/debug/tracing/trace
> block_bio_remap: 8,80 R 2048 + 32 <- (8,81) 0
> block_bio_queue: 8,80 R 2048 + 32 [dd]
> block_getrq: 8,80 R 2048 + 32 [dd]
> block_plug: 8,80 [dd]
> 
> block_rq_insert: 8,80 R 16384 () 2048 + 32 [dd]
> block_unplug: 8,80 [dd] 1 explicit
>   
> block_rq_issue: 8,80 R 16384 () 2048 + 32 [dd]
> block_rq_complete: 8,80 R () 2048 + 32 [0]
> 
> $ btrace /dev/sdf1
>   8,80   11 0.0 240240  A   R 2048 + 32 <- (8,81) 0
>   8,81   12 0.000220890 240240  Q   R 2048 + 32 [dd]
>   8,81   13 0.000229639 240240  G   R 2048 + 32 [dd]
>   8,81   14 0.000231805 240240  P   N [dd]
> ^^
>   8,81   15 0.000234671 240240  I   R 2048 + 32 [dd]
>   8,81   16 0.000236365 240240  U   N [dd] 1
> ^^
>   8,81   17 0.000238527 240240  D   R 2048 + 32 [dd]
>   8,81   22 0.000613741 0  C   R 2048 + 32 [0]
> 
> Signed-off-by: Steffen Maier 
> ---
>  include/trace/events/block.h | 13 +++--
>  1 file changed, 11 insertions(+), 2 deletions(-)
> 
> diff --git a/include/trace/events/block.h b/include/trace/events/block.h
> index a13613d27cee..cffedc26e8a3 100644
> --- a/include/trace/events/block.h
> +++ b/include/trace/events/block.h
> @@ -460,14 +460,18 @@ TRACE_EVENT(block_plug,
>   TP_ARGS(q),
>  
>   TP_STRUCT__entry(
> + __field( dev_t, dev )
>   __array( char,  comm,   TASK_COMM_LEN   )
>   ),
>  
>   TP_fast_assign(
> + __entry->dev = q->kobj.parent ?
> + container_of(q->kobj.parent, struct device, kobj)->devt : 0;

That really really really scares me.  It feels very fragile and messing
with parent pointers is ripe for things breaking in the future in odd
and unexplainable ways.

And how can the parent be NULL?

>   memcpy(__entry->comm, current->comm, TASK_COMM_LEN);
>   ),
>  
> - TP_printk("[%s]", __entry->comm)
> + TP_printk("%d,%d [%s]",
> +   MAJOR(__entry->dev), MINOR(__entry->dev), __entry->comm)
>  );
>  
>  #define 

[PATCH 4.14 28/89] delayacct: Account blkio completion on the correct task

2018-01-22 Thread Greg Kroah-Hartman
4.14-stable review patch.  If anyone has any objections, please let me know.

--

From: Josh Snyder <jo...@netflix.com>

commit c96f5471ce7d2aefd0dda560cc23f08ab00bc65d upstream.

Before commit:

  e33a9bba85a8 ("sched/core: move IO scheduling accounting from io_schedule_timeout() into scheduler")

delayacct_blkio_end() was called after context-switching into the task which
completed I/O.

This resulted in double counting: the task would account a delay both waiting
for I/O and for time spent in the runqueue.

With e33a9bba85a8, delayacct_blkio_end() is called by try_to_wake_up().
In ttwu, we have not yet context-switched. This is more correct, in that
the delay accounting ends when the I/O is complete.

But delayacct_blkio_end() relies on 'get_current()', and we have not yet
context-switched into the task whose I/O completed. This results in the
wrong task having its delay accounting statistics updated.

Instead of doing that, pass the task_struct being woken to
delayacct_blkio_end(), so that it can update the statistics of the
correct task.

Signed-off-by: Josh Snyder <jo...@netflix.com>
Acked-by: Tejun Heo <t...@kernel.org>
Acked-by: Balbir Singh <bsinghar...@gmail.com>
Cc: Brendan Gregg <bgr...@netflix.com>
Cc: Jens Axboe <ax...@kernel.dk>
Cc: Linus Torvalds <torva...@linux-foundation.org>
Cc: Peter Zijlstra <pet...@infradead.org>
Cc: Thomas Gleixner <t...@linutronix.de>
Cc: linux-block@vger.kernel.org
Fixes: e33a9bba85a8 ("sched/core: move IO scheduling accounting from io_schedule_timeout() into scheduler")
Link: http://lkml.kernel.org/r/1513613712-571-1-git-send-email-jo...@netflix.com
Signed-off-by: Ingo Molnar <mi...@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>

---
 include/linux/delayacct.h |8 
 kernel/delayacct.c|   42 ++
 kernel/sched/core.c   |6 +++---
 3 files changed, 33 insertions(+), 23 deletions(-)

--- a/include/linux/delayacct.h
+++ b/include/linux/delayacct.h
@@ -71,7 +71,7 @@ extern void delayacct_init(void);
 extern void __delayacct_tsk_init(struct task_struct *);
 extern void __delayacct_tsk_exit(struct task_struct *);
 extern void __delayacct_blkio_start(void);
-extern void __delayacct_blkio_end(void);
+extern void __delayacct_blkio_end(struct task_struct *);
 extern int __delayacct_add_tsk(struct taskstats *, struct task_struct *);
 extern __u64 __delayacct_blkio_ticks(struct task_struct *);
 extern void __delayacct_freepages_start(void);
@@ -122,10 +122,10 @@ static inline void delayacct_blkio_start
__delayacct_blkio_start();
 }
 
-static inline void delayacct_blkio_end(void)
+static inline void delayacct_blkio_end(struct task_struct *p)
 {
if (current->delays)
-   __delayacct_blkio_end();
+   __delayacct_blkio_end(p);
delayacct_clear_flag(DELAYACCT_PF_BLKIO);
 }
 
@@ -169,7 +169,7 @@ static inline void delayacct_tsk_free(st
 {}
 static inline void delayacct_blkio_start(void)
 {}
-static inline void delayacct_blkio_end(void)
+static inline void delayacct_blkio_end(struct task_struct *p)
 {}
 static inline int delayacct_add_tsk(struct taskstats *d,
struct task_struct *tsk)
--- a/kernel/delayacct.c
+++ b/kernel/delayacct.c
@@ -51,16 +51,16 @@ void __delayacct_tsk_init(struct task_st
  * Finish delay accounting for a statistic using its timestamps (@start),
  * accumalator (@total) and @count
  */
-static void delayacct_end(u64 *start, u64 *total, u32 *count)
+static void delayacct_end(spinlock_t *lock, u64 *start, u64 *total, u32 *count)
 {
s64 ns = ktime_get_ns() - *start;
unsigned long flags;
 
if (ns > 0) {
-   spin_lock_irqsave(&current->delays->lock, flags);
+   spin_lock_irqsave(lock, flags);
*total += ns;
(*count)++;
-   spin_unlock_irqrestore(&current->delays->lock, flags);
+   spin_unlock_irqrestore(lock, flags);
}
 }
 
@@ -69,17 +69,25 @@ void __delayacct_blkio_start(void)
current->delays->blkio_start = ktime_get_ns();
 }
 
-void __delayacct_blkio_end(void)
+/*
+ * We cannot rely on the `current` macro, as we haven't yet switched back to
+ * the process being woken.
+ */
+void __delayacct_blkio_end(struct task_struct *p)
 {
-   if (current->delays->flags & DELAYACCT_PF_SWAPIN)
-   /* Swapin block I/O */
-   delayacct_end(&current->delays->blkio_start,
-   &current->delays->swapin_delay,
-   &current->delays->swapin_count);
-   else/* Other block I/O */
-   delayacct_end(&current->delays->blkio_start,
-   &current->delays->blkio_delay,
-   &current->delays->blkio_count);
+   struct task_delay_info *delays = p->delay
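
Editorial sketch of the bug described in the changelog above, as a userspace model. The demo_* names, the demo_current pointer and the plain integer timestamps are hypothetical; locking and the swapin case are left out. It contrasts charging the delay to whichever task is running with charging it to the task passed in by the waker.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct demo_task {
	uint64_t blkio_start;        /* timestamp taken when the wait began */
	uint64_t blkio_delay;        /* accumulated wait time */
	uint32_t blkio_count;        /* number of completed waits */
};

/* Stand-in for `current`: the task executing right now (the waker). */
static struct demo_task *demo_current;

static void demo_account(struct demo_task *t, uint64_t now)
{
	assert(t != NULL);
	if (now > t->blkio_start) {
		t->blkio_delay += now - t->blkio_start;
		t->blkio_count++;
	}
}

/* Old shape: no argument, so the delay lands on the running task. */
static void demo_blkio_end_current(uint64_t now)
{
	demo_account(demo_current, now);
}

/* Patched shape: the caller names the task whose I/O completed. */
static void demo_blkio_end(struct demo_task *p, uint64_t now)
{
	demo_account(p, now);
}
```

When the wakeup runs in the waker's context, the argument-free variant updates the waker's statistics, while the variant that takes the task updates the sleeper's.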

Re: [PATCH] kernfs: checking for IS_ERR() instead of NULL

2017-08-31 Thread Greg Kroah-Hartman
On Thu, Aug 31, 2017 at 01:56:40PM -0600, Jens Axboe wrote:
> On 08/31/2017 10:52 AM, Greg Kroah-Hartman wrote:
> > On Wed, Aug 30, 2017 at 05:04:56PM +0300, Dan Carpenter wrote:
> >> The kernfs_get_inode() returns NULL on error, it never returns error
> >> pointers.
> >>
> >> Fixes: aa8188253474 ("kernfs: add exportfs operations")
> >> Signed-off-by: Dan Carpenter <dan.carpen...@oracle.com>
> >> Acked-by: Tejun Heo <t...@kernel.org>
> > 
> > Hm, I don't know what tree aa8188253474 is in, but it's not mine, so I
> > can't take this patch :(
> 
> It's in my tree, I'll take it. Can I add your
> acked/reviewed/whatever-by?

Yes:

Acked-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>
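
Editorial sketch of why the check matters, in userspace C. demo_err_ptr() and demo_is_err() imitate the kernel's ERR_PTR()/IS_ERR() convention under the assumption that error pointers occupy the top 4095 addresses; demo_get_inode() is a hypothetical function that, like the one in the patch description, reports failure with NULL.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_ERRNO 4095

/* Encode a negative errno value as a pointer. */
static inline void *demo_err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

/* True only for pointers in the top DEMO_MAX_ERRNO addresses; NULL is not one. */
static inline bool demo_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-DEMO_MAX_ERRNO;
}

static int demo_inode_storage = 1;

/* Hypothetical lookup that signals failure with NULL, never an error pointer. */
static void *demo_get_inode(bool fail)
{
	return fail ? NULL : &demo_inode_storage;
}

/* Wrong check: a NULL result passes as success (returns 0). */
static int demo_check_wrong(void *inode)
{
	return demo_is_err(inode) ? -1 : 0;
}

/* Right check for a NULL-on-error function. */
static int demo_check_right(void *inode)
{
	return inode ? 0 : -1;
}
```

With the wrong check a failed lookup is treated as success and the NULL pointer is handed on to the caller.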


Re: [PATCH] kernfs: checking for IS_ERR() instead of NULL

2017-08-31 Thread Greg Kroah-Hartman
On Wed, Aug 30, 2017 at 05:04:56PM +0300, Dan Carpenter wrote:
> The kernfs_get_inode() returns NULL on error, it never returns error
> pointers.
> 
> Fixes: aa8188253474 ("kernfs: add exportfs operations")
> Signed-off-by: Dan Carpenter 
> Acked-by: Tejun Heo 

Hm, I don't know what tree aa8188253474 is in, but it's not mine, so I
can't take this patch :(

thanks,

greg k-h


[PATCH 4.12 055/106] blk-mq: Create hctx for each present CPU

2017-08-09 Thread Greg Kroah-Hartman
4.12-stable review patch.  If anyone has any objections, please let me know.

--

From: Christoph Hellwig <h...@lst.de>

commit 4b855ad37194f7bdbb200ce7a1c7051fecb56a08 upstream.

Currently we only create hctx for online CPUs, which can lead to a lot
of churn due to frequent soft offline / online operations.  Instead
allocate one for each present CPU to avoid this and dramatically simplify
the code.

Signed-off-by: Christoph Hellwig <h...@lst.de>
Reviewed-by: Jens Axboe <ax...@kernel.dk>
Cc: Keith Busch <keith.bu...@intel.com>
Cc: linux-block@vger.kernel.org
Cc: linux-n...@lists.infradead.org
Link: http://lkml.kernel.org/r/20170626102058.10200-3-...@lst.de
Signed-off-by: Thomas Gleixner <t...@linutronix.de>
Cc: Oleksandr Natalenko <oleksa...@natalenko.name>
Cc: Mike Galbraith <efa...@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>

---
 block/blk-mq.c |  120 -
 block/blk-mq.h |5 -
 include/linux/cpuhotplug.h |1 
 3 files changed, 11 insertions(+), 115 deletions(-)

--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -37,9 +37,6 @@
 #include "blk-wbt.h"
 #include "blk-mq-sched.h"
 
-static DEFINE_MUTEX(all_q_mutex);
-static LIST_HEAD(all_q_list);
-
 static void blk_mq_poll_stats_start(struct request_queue *q);
 static void blk_mq_poll_stats_fn(struct blk_stat_callback *cb);
 static void __blk_mq_stop_hw_queues(struct request_queue *q, bool sync);
@@ -1975,8 +1972,8 @@ static void blk_mq_init_cpu_queues(struc
INIT_LIST_HEAD(&__ctx->rq_list);
__ctx->queue = q;
 
-   /* If the cpu isn't online, the cpu is mapped to first hctx */
-   if (!cpu_online(i))
+   /* If the cpu isn't present, the cpu is mapped to first hctx */
+   if (!cpu_present(i))
continue;
 
hctx = blk_mq_map_queue(q, i);
@@ -2019,8 +2016,7 @@ static void blk_mq_free_map_and_requests
}
 }
 
-static void blk_mq_map_swqueue(struct request_queue *q,
-  const struct cpumask *online_mask)
+static void blk_mq_map_swqueue(struct request_queue *q)
 {
unsigned int i, hctx_idx;
struct blk_mq_hw_ctx *hctx;
@@ -2038,13 +2034,11 @@ static void blk_mq_map_swqueue(struct re
}
 
/*
-* Map software to hardware queues
+* Map software to hardware queues.
+*
+* If the cpu isn't present, the cpu is mapped to first hctx.
 */
-   for_each_possible_cpu(i) {
-   /* If the cpu isn't online, the cpu is mapped to first hctx */
-   if (!cpumask_test_cpu(i, online_mask))
-   continue;
-
+   for_each_present_cpu(i) {
hctx_idx = q->mq_map[i];
/* unmapped hw queue can be remapped after CPU topo changed */
if (!set->tags[hctx_idx] &&
@@ -2340,16 +2334,8 @@ struct request_queue *blk_mq_init_alloca
blk_queue_softirq_done(q, set->ops->complete);
 
blk_mq_init_cpu_queues(q, set->nr_hw_queues);
-
-   get_online_cpus();
-   mutex_lock(&all_q_mutex);
-
-   list_add_tail(&q->all_q_node, &all_q_list);
blk_mq_add_queue_tag_set(set, q);
-   blk_mq_map_swqueue(q, cpu_online_mask);
-
-   mutex_unlock(&all_q_mutex);
-   put_online_cpus();
+   blk_mq_map_swqueue(q);
 
if (!(set->flags & BLK_MQ_F_NO_SCHED)) {
int ret;
@@ -2375,18 +2361,12 @@ void blk_mq_free_queue(struct request_qu
 {
struct blk_mq_tag_set   *set = q->tag_set;
 
-   mutex_lock(&all_q_mutex);
-   list_del_init(&q->all_q_node);
-   mutex_unlock(&all_q_mutex);
-
blk_mq_del_queue_tag_set(q);
-
blk_mq_exit_hw_queues(q, set, set->nr_hw_queues);
 }
 
 /* Basically redo blk_mq_init_queue with queue frozen */
-static void blk_mq_queue_reinit(struct request_queue *q,
-   const struct cpumask *online_mask)
+static void blk_mq_queue_reinit(struct request_queue *q)
 {
WARN_ON_ONCE(!atomic_read(&q->mq_freeze_depth));
 
@@ -2399,76 +2379,12 @@ static void blk_mq_queue_reinit(struct r
 * involves free and re-allocate memory, worthy doing?)
 */
 
-   blk_mq_map_swqueue(q, online_mask);
+   blk_mq_map_swqueue(q);
 
blk_mq_sysfs_register(q);
blk_mq_debugfs_register_hctxs(q);
 }
 
-/*
- * New online cpumask which is going to be set in this hotplug event.
- * Declare this cpumasks as global as cpu-hotplug operation is invoked
- * one-by-one and dynamically allocating this could result in a failure.
- */
-static struct cpumask cpuhp_online_new;
-
-static void blk_mq_queue_reinit_work(void)
-{
-   struct request_queue *q;
-
-   mutex_lock(&all_q_mutex);
-   /*
-* We need to freeze and reinit all existing queues.  Fr

[PATCH 4.12 054/106] blk-mq: Include all present CPUs in the default queue mapping

2017-08-09 Thread Greg Kroah-Hartman
4.12-stable review patch.  If anyone has any objections, please let me know.

--

From: Christoph Hellwig <h...@lst.de>

commit 5f042e7cbd9ebd3580077dcdc21f35e68c2adf5f upstream.

This way we get a nice distribution independent of the current cpu
online / offline state.

Signed-off-by: Christoph Hellwig <h...@lst.de>
Reviewed-by: Jens Axboe <ax...@kernel.dk>
Cc: Keith Busch <keith.bu...@intel.com>
Cc: linux-block@vger.kernel.org
Cc: linux-n...@lists.infradead.org
Link: http://lkml.kernel.org/r/20170626102058.10200-2-...@lst.de
Signed-off-by: Thomas Gleixner <t...@linutronix.de>
Cc: Oleksandr Natalenko <oleksa...@natalenko.name>
Cc: Mike Galbraith <efa...@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>

---
 block/blk-mq-cpumap.c |5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

--- a/block/blk-mq-cpumap.c
+++ b/block/blk-mq-cpumap.c
@@ -35,7 +35,6 @@ int blk_mq_map_queues(struct blk_mq_tag_
 {
unsigned int *map = set->mq_map;
unsigned int nr_queues = set->nr_hw_queues;
-   const struct cpumask *online_mask = cpu_online_mask;
unsigned int i, nr_cpus, nr_uniq_cpus, queue, first_sibling;
cpumask_var_t cpus;
 
@@ -44,7 +43,7 @@ int blk_mq_map_queues(struct blk_mq_tag_
 
cpumask_clear(cpus);
nr_cpus = nr_uniq_cpus = 0;
-   for_each_cpu(i, online_mask) {
+   for_each_present_cpu(i) {
nr_cpus++;
first_sibling = get_first_sibling(i);
if (!cpumask_test_cpu(first_sibling, cpus))
@@ -54,7 +53,7 @@ int blk_mq_map_queues(struct blk_mq_tag_
 
queue = 0;
for_each_possible_cpu(i) {
-   if (!cpumask_test_cpu(i, online_mask)) {
+   if (!cpumask_test_cpu(i, cpu_present_mask)) {
map[i] = 0;
continue;
}




[PATCH 4.4 030/101] block: fix module reference leak on put_disk() call for cgroups throttle

2017-07-03 Thread Greg Kroah-Hartman
4.4-stable review patch.  If anyone has any objections, please let me know.

--

From: Roman Pen <roman.peny...@profitbricks.com>

commit 39a169b62b415390398291080dafe63aec751e0a upstream.

get_disk() and get_gendisk() calls have a non-explicit side effect: they
increase the reference count on the disk owner module.

The following is the correct sequence for getting a disk reference and
putting it:

disk = get_gendisk(...);

/* use disk */

owner = disk->fops->owner;
put_disk(disk);
module_put(owner);

fs/block_dev.c is aware of this required module_put() call, but e.g.
blkg_conf_finish(), which is located in block/blk-cgroup.c, does not put
the module reference.  To see the leak in action, the cgroups throttle
config can be used.  In the following script I'm removing the throttle for
/dev/ram0 (actually this is a no-op, because throttle was never set for
this device):

# lsmod | grep brd
brd 5175  0
# i=100; while [ $i -gt 0 ]; do echo "1:0 0" > \
/sys/fs/cgroup/blkio/blkio.throttle.read_bps_device; i=$(($i - 1)); \
done
# lsmod | grep brd
brd 5175  100

Now the brd module has 100 references.

The issue is fixed by calling module_put() right after put_disk().
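
The pairing described above can be modelled in plain userspace C. This is only a sketch under stated assumptions: struct toy_disk, struct toy_module and the toy_* helpers are hypothetical stand-ins for the kernel objects and for get_gendisk(), put_disk() and module_put(); nothing here is the kernel implementation.

```c
#include <assert.h>

/* Hypothetical stand-ins for the kernel objects; names are illustrative. */
struct toy_module { int refcnt; };
struct toy_disk   { int refcnt; struct toy_module *owner; };

/* Models get_gendisk(): takes a disk reference AND a module reference. */
static struct toy_disk *toy_get_disk(struct toy_disk *disk)
{
    disk->refcnt++;
    disk->owner->refcnt++;
    return disk;
}

/* Models put_disk(): drops only the disk reference. */
static void toy_put_disk(struct toy_disk *disk)
{
    assert(disk->refcnt > 0);
    disk->refcnt--;
}

/* Models module_put(): drops the module reference. */
static void toy_module_put(struct toy_module *mod)
{
    assert(mod->refcnt > 0);
    mod->refcnt--;
}

/* The sequence from the commit message: save owner, put disk, put module. */
static void toy_release(struct toy_disk *disk)
{
    struct toy_module *owner = disk->owner; /* read before dropping the disk */

    toy_put_disk(disk);
    toy_module_put(owner);
}
```

Calling toy_put_disk(toy_get_disk(&disk)) in a loop leaves one module reference behind per iteration, which mirrors the growing lsmod count in the script; toy_release() brings both counters back to where they started.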

Signed-off-by: Roman Pen <roman.peny...@profitbricks.com>
Cc: Gi-Oh Kim <gi-oh@profitbricks.com>
Cc: Tejun Heo <t...@kernel.org>
Cc: Jens Axboe <ax...@kernel.dk>
Cc: linux-block@vger.kernel.org
Cc: linux-ker...@vger.kernel.org
Signed-off-by: Jens Axboe <ax...@fb.com>
Cc: Sumit Semwal <sumit.sem...@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>

---
 block/blk-cgroup.c |9 +
 1 file changed, 9 insertions(+)

--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -788,6 +788,7 @@ int blkg_conf_prep(struct blkcg *blkcg,
 {
struct gendisk *disk;
struct blkcg_gq *blkg;
+   struct module *owner;
unsigned int major, minor;
int key_len, part, ret;
char *body;
@@ -804,7 +805,9 @@ int blkg_conf_prep(struct blkcg *blkcg,
if (!disk)
return -ENODEV;
if (part) {
+   owner = disk->fops->owner;
put_disk(disk);
+   module_put(owner);
return -ENODEV;
}
 
@@ -820,7 +823,9 @@ int blkg_conf_prep(struct blkcg *blkcg,
ret = PTR_ERR(blkg);
rcu_read_unlock();
spin_unlock_irq(disk->queue->queue_lock);
+   owner = disk->fops->owner;
put_disk(disk);
+   module_put(owner);
/*
 * If queue was bypassing, we should retry.  Do so after a
 * short msleep().  It isn't strictly necessary but queue
@@ -851,9 +856,13 @@ EXPORT_SYMBOL_GPL(blkg_conf_prep);
 void blkg_conf_finish(struct blkg_conf_ctx *ctx)
__releases(ctx->disk->queue->queue_lock) __releases(rcu)
 {
+   struct module *owner;
+
spin_unlock_irq(ctx->disk->queue->queue_lock);
rcu_read_unlock();
+   owner = ctx->disk->fops->owner;
put_disk(ctx->disk);
+   module_put(owner);
 }
 EXPORT_SYMBOL_GPL(blkg_conf_finish);
 




[PATCH 6/7] pktcdvd: use class_groups instead of class_attrs

2017-06-08 Thread Greg Kroah-Hartman
The class_attrs pointer is long deprecated, and is about to be finally
removed, so move to use the class_groups pointer instead.

Cc: <linux-block@vger.kernel.org>
Cc: Jens Axboe <ax...@fb.com>
Cc: Hannes Reinecke <h...@suse.com>
Cc: Jan Kara <j...@suse.cz>
Cc: Mike Christie <mchri...@redhat.com>
Cc: Bart Van Assche <bart.vanass...@sandisk.com>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>
---
 drivers/block/pktcdvd.c | 35 +--
 1 file changed, 17 insertions(+), 18 deletions(-)

diff --git a/drivers/block/pktcdvd.c b/drivers/block/pktcdvd.c
index 205b865ebeb9..98939ee97476 100644
--- a/drivers/block/pktcdvd.c
+++ b/drivers/block/pktcdvd.c
@@ -348,9 +348,9 @@ static void class_pktcdvd_release(struct class *cls)
 {
kfree(cls);
 }
-static ssize_t class_pktcdvd_show_map(struct class *c,
-   struct class_attribute *attr,
-   char *data)
+
+static ssize_t device_map_show(struct class *c, struct class_attribute *attr,
+  char *data)
 {
int n = 0;
int idx;
@@ -368,11 +368,10 @@ static ssize_t class_pktcdvd_show_map(struct class *c,
mutex_unlock(&ctl_mutex);
return n;
 }
+static CLASS_ATTR_RO(device_map);
 
-static ssize_t class_pktcdvd_store_add(struct class *c,
-   struct class_attribute *attr,
-   const char *buf,
-   size_t count)
+static ssize_t add_store(struct class *c, struct class_attribute *attr,
+const char *buf, size_t count)
 {
unsigned int major, minor;
 
@@ -390,11 +389,10 @@ static ssize_t class_pktcdvd_store_add(struct class *c,
 
return -EINVAL;
 }
+static CLASS_ATTR_WO(add);
 
-static ssize_t class_pktcdvd_store_remove(struct class *c,
- struct class_attribute *attr,
- const char *buf,
-   size_t count)
+static ssize_t remove_store(struct class *c, struct class_attribute *attr,
+   const char *buf, size_t count)
 {
unsigned int major, minor;
if (sscanf(buf, "%u:%u", &major, &minor) == 2) {
@@ -403,14 +401,15 @@ static ssize_t class_pktcdvd_store_remove(struct class *c,
}
return -EINVAL;
 }
+static CLASS_ATTR_WO(remove);
 
-static struct class_attribute class_pktcdvd_attrs[] = {
- __ATTR(add,0200, NULL, class_pktcdvd_store_add),
- __ATTR(remove, 0200, NULL, class_pktcdvd_store_remove),
- __ATTR(device_map, 0444, class_pktcdvd_show_map, NULL),
- __ATTR_NULL
+static struct attribute *class_pktcdvd_attrs[] = {
+   &class_attr_add.attr,
+   &class_attr_remove.attr,
+   &class_attr_device_map.attr,
+   NULL,
 };
-
+ATTRIBUTE_GROUPS(class_pktcdvd);
 
 static int pkt_sysfs_init(void)
 {
@@ -426,7 +425,7 @@ static int pkt_sysfs_init(void)
class_pktcdvd->name = DRIVER_NAME;
class_pktcdvd->owner = THIS_MODULE;
class_pktcdvd->class_release = class_pktcdvd_release;
-   class_pktcdvd->class_attrs = class_pktcdvd_attrs;
+   class_pktcdvd->class_groups = class_pktcdvd_groups;
ret = class_register(class_pktcdvd);
if (ret) {
kfree(class_pktcdvd);
-- 
2.13.1



[PATCH 4.9 26/31] blk-mq: Avoid memory reclaim when remapping queues

2017-04-16 Thread Greg Kroah-Hartman
4.9-stable review patch.  If anyone has any objections, please let me know.

--

From: Gabriel Krisman Bertazi <kris...@linux.vnet.ibm.com>

commit 36e1f3d107867b25c616c2fd294f5a1c9d4e5d09 upstream.

While stressing memory and IO at the same time we changed SMT settings,
we were able to consistently trigger deadlocks in the mm system, which
froze the entire machine.

I think that under memory stress conditions, the large allocations
performed by blk_mq_init_rq_map may trigger a reclaim, which stalls
waiting on the block layer remapping completion, thus deadlocking the
system.  The trace below was collected after the machine stalled,
waiting for the hotplug event completion.

The simplest fix for this is to make allocations in this path
non-reclaimable, with GFP_NOIO.  With this patch, we couldn't hit the
issue anymore.
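
The reasoning above can be written down as a small decision model. This is a toy sketch only, under loud assumptions: the TOY_GFP_* bits mirror how the kernel composes GFP_KERNEL and GFP_NOIO, but their values, the toy_alloc() function and its three outcomes are invented for illustration, and real reclaim is far more nuanced (a no-I/O allocation can still reclaim clean pages, for example).

```c
#include <assert.h>
#include <stdbool.h>

/* Toy flag bits; names mirror the kernel's, the values are made up. */
#define TOY_GFP_RECLAIM 0x1u
#define TOY_GFP_IO      0x2u
#define TOY_GFP_FS      0x4u
#define TOY_GFP_KERNEL  (TOY_GFP_RECLAIM | TOY_GFP_IO | TOY_GFP_FS)
#define TOY_GFP_NOIO    (TOY_GFP_RECLAIM)

enum toy_result { TOY_ALLOC_OK, TOY_ALLOC_FAILED, TOY_ALLOC_DEADLOCK };

/*
 * Decide what happens to one allocation in this model:
 *  - with enough free memory it simply succeeds;
 *  - under pressure, filesystem reclaim runs only if FS is allowed, and
 *    that reclaim waits on I/O that cannot finish while the block layer
 *    is busy remapping its queues;
 *  - otherwise the allocation fails and the caller handles the error.
 */
static enum toy_result toy_alloc(unsigned int gfp, bool memory_tight,
                                 bool queue_remapping)
{
    assert((gfp & ~TOY_GFP_KERNEL) == 0);   /* only known toy bits */

    if (!memory_tight)
        return TOY_ALLOC_OK;
    if ((gfp & TOY_GFP_RECLAIM) && (gfp & TOY_GFP_FS))
        return queue_remapping ? TOY_ALLOC_DEADLOCK : TOY_ALLOC_OK;
    return TOY_ALLOC_FAILED;
}
```

In this model the remapping path with TOY_GFP_KERNEL under memory pressure ends in TOY_ALLOC_DEADLOCK, while TOY_GFP_NOIO turns the same situation into an ordinary allocation failure that the existing error paths already handle.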

This should apply on top of Jens's for-next branch cleanly.

Changes since v1:
  - Use GFP_NOIO instead of GFP_NOWAIT.

 Call Trace:
[c00f0160aaf0] [c00f0160ab50] 0xc00f0160ab50 (unreliable)
[c00f0160acc0] [c0016624] __switch_to+0x2e4/0x430
[c00f0160ad20] [c0b1a880] __schedule+0x310/0x9b0
[c00f0160ae00] [c0b1af68] schedule+0x48/0xc0
[c00f0160ae30] [c0b1b4b0] schedule_preempt_disabled+0x20/0x30
[c00f0160ae50] [c0b1d4fc] __mutex_lock_slowpath+0xec/0x1f0
[c00f0160aed0] [c0b1d678] mutex_lock+0x78/0xa0
[c00f0160af00] [d00019413cac] xfs_reclaim_inodes_ag+0x33c/0x380 [xfs]
[c00f0160b0b0] [d00019415164] xfs_reclaim_inodes_nr+0x54/0x70 [xfs]
[c00f0160b0f0] [d000194297f8] xfs_fs_free_cached_objects+0x38/0x60 [xfs]
[c00f0160b120] [c03172c8] super_cache_scan+0x1f8/0x210
[c00f0160b190] [c026301c] shrink_slab.part.13+0x21c/0x4c0
[c00f0160b2d0] [c0268088] shrink_zone+0x2d8/0x3c0
[c00f0160b380] [c026834c] do_try_to_free_pages+0x1dc/0x520
[c00f0160b450] [c026876c] try_to_free_pages+0xdc/0x250
[c00f0160b4e0] [c0251978] __alloc_pages_nodemask+0x868/0x10d0
[c00f0160b6f0] [c0567030] blk_mq_init_rq_map+0x160/0x380
[c00f0160b7a0] [c056758c] blk_mq_map_swqueue+0x33c/0x360
[c00f0160b820] [c0567904] blk_mq_queue_reinit+0x64/0xb0
[c00f0160b850] [c056a16c] blk_mq_queue_reinit_notify+0x19c/0x250
[c00f0160b8a0] [c00f5d38] notifier_call_chain+0x98/0x100
[c00f0160b8f0] [c00c5fb0] __cpu_notify+0x70/0xe0
[c00f0160b930] [c00c63c4] notify_prepare+0x44/0xb0
[c00f0160b9b0] [c00c52f4] cpuhp_invoke_callback+0x84/0x250
[c00f0160ba10] [c00c570c] cpuhp_up_callbacks+0x5c/0x120
[c00f0160ba60] [c00c7cb8] _cpu_up+0xf8/0x1d0
[c00f0160bac0] [c00c7eb0] do_cpu_up+0x120/0x150
[c00f0160bb40] [c06fe024] cpu_subsys_online+0x64/0xe0
[c00f0160bb90] [c06f5124] device_online+0xb4/0x120
[c00f0160bbd0] [c06f5244] online_store+0xb4/0xc0
[c00f0160bc20] [c06f0a68] dev_attr_store+0x68/0xa0
[c00f0160bc60] [c03ccc30] sysfs_kf_write+0x80/0xb0
[c00f0160bca0] [c03cbabc] kernfs_fop_write+0x17c/0x250
[c00f0160bcf0] [c030fe6c] __vfs_write+0x6c/0x1e0
[c00f0160bd90] [c0311490] vfs_write+0xd0/0x270
[c00f0160bde0] [c03131fc] SyS_write+0x6c/0x110
[c00f0160be30] [c0009204] system_call+0x38/0xec

Signed-off-by: Gabriel Krisman Bertazi <kris...@linux.vnet.ibm.com>
Cc: Brian King <brk...@linux.vnet.ibm.com>
Cc: Douglas Miller <dougm...@linux.vnet.ibm.com>
Cc: linux-block@vger.kernel.org
Cc: linux-s...@vger.kernel.org
Signed-off-by: Jens Axboe <ax...@fb.com>
Signed-off-by: Sumit Semwal <sumit.sem...@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>

---
 block/blk-mq.c |6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1474,7 +1474,7 @@ static struct blk_mq_tags *blk_mq_init_r
INIT_LIST_HEAD(&tags->page_list);
 
tags->rqs = kzalloc_node(set->queue_depth * sizeof(struct request *),
-GFP_KERNEL | __GFP_NOWARN | __GFP_NORETRY,
+GFP_NOIO | __GFP_NOWARN | __GFP_NORETRY,
 set->numa_node);
if (!tags->rqs) {
blk_mq_free_tags(tags);
@@ -1500,7 +1500,7 @@ static struct blk_mq_tags *blk_mq_init_r
 
do {
page = alloc_pages_node(set->numa_node,
-   GFP_KERNEL | __GFP_NOWARN | __GFP_NORETRY | __GFP_ZERO,
+   GFP_NOIO | __GFP_NOWARN | __GFP_NORETRY | __GFP_ZERO,
this_order);
if (page)
break;
@@ -1521,7 +1521,7 @@ static struct blk_mq_tags *blk_mq_init_r
 *

[PATCH 4.4 13/18] blk-mq: Avoid memory reclaim when remapping queues

2017-04-16 Thread Greg Kroah-Hartman
4.4-stable review patch.  If anyone has any objections, please let me know.

--

From: Gabriel Krisman Bertazi <kris...@linux.vnet.ibm.com>

commit 36e1f3d107867b25c616c2fd294f5a1c9d4e5d09 upstream.

While stressing memory and IO at the same time we changed SMT settings,
we were able to consistently trigger deadlocks in the mm system, which
froze the entire machine.

I think that under memory stress conditions, the large allocations
performed by blk_mq_init_rq_map may trigger a reclaim, which stalls
waiting on the block layer remapping completion, thus deadlocking the
system.  The trace below was collected after the machine stalled,
waiting for the hotplug event completion.

The simplest fix for this is to make allocations in this path
non-reclaimable, with GFP_NOIO.  With this patch, we couldn't hit the
issue anymore.

This should apply on top of Jens's for-next branch cleanly.

Changes since v1:
  - Use GFP_NOIO instead of GFP_NOWAIT.

 Call Trace:
[c00f0160aaf0] [c00f0160ab50] 0xc00f0160ab50 (unreliable)
[c00f0160acc0] [c0016624] __switch_to+0x2e4/0x430
[c00f0160ad20] [c0b1a880] __schedule+0x310/0x9b0
[c00f0160ae00] [c0b1af68] schedule+0x48/0xc0
[c00f0160ae30] [c0b1b4b0] schedule_preempt_disabled+0x20/0x30
[c00f0160ae50] [c0b1d4fc] __mutex_lock_slowpath+0xec/0x1f0
[c00f0160aed0] [c0b1d678] mutex_lock+0x78/0xa0
[c00f0160af00] [d00019413cac] xfs_reclaim_inodes_ag+0x33c/0x380 [xfs]
[c00f0160b0b0] [d00019415164] xfs_reclaim_inodes_nr+0x54/0x70 [xfs]
[c00f0160b0f0] [d000194297f8] xfs_fs_free_cached_objects+0x38/0x60 [xfs]
[c00f0160b120] [c03172c8] super_cache_scan+0x1f8/0x210
[c00f0160b190] [c026301c] shrink_slab.part.13+0x21c/0x4c0
[c00f0160b2d0] [c0268088] shrink_zone+0x2d8/0x3c0
[c00f0160b380] [c026834c] do_try_to_free_pages+0x1dc/0x520
[c00f0160b450] [c026876c] try_to_free_pages+0xdc/0x250
[c00f0160b4e0] [c0251978] __alloc_pages_nodemask+0x868/0x10d0
[c00f0160b6f0] [c0567030] blk_mq_init_rq_map+0x160/0x380
[c00f0160b7a0] [c056758c] blk_mq_map_swqueue+0x33c/0x360
[c00f0160b820] [c0567904] blk_mq_queue_reinit+0x64/0xb0
[c00f0160b850] [c056a16c] blk_mq_queue_reinit_notify+0x19c/0x250
[c00f0160b8a0] [c00f5d38] notifier_call_chain+0x98/0x100
[c00f0160b8f0] [c00c5fb0] __cpu_notify+0x70/0xe0
[c00f0160b930] [c00c63c4] notify_prepare+0x44/0xb0
[c00f0160b9b0] [c00c52f4] cpuhp_invoke_callback+0x84/0x250
[c00f0160ba10] [c00c570c] cpuhp_up_callbacks+0x5c/0x120
[c00f0160ba60] [c00c7cb8] _cpu_up+0xf8/0x1d0
[c00f0160bac0] [c00c7eb0] do_cpu_up+0x120/0x150
[c00f0160bb40] [c06fe024] cpu_subsys_online+0x64/0xe0
[c00f0160bb90] [c06f5124] device_online+0xb4/0x120
[c00f0160bbd0] [c06f5244] online_store+0xb4/0xc0
[c00f0160bc20] [c06f0a68] dev_attr_store+0x68/0xa0
[c00f0160bc60] [c03ccc30] sysfs_kf_write+0x80/0xb0
[c00f0160bca0] [c03cbabc] kernfs_fop_write+0x17c/0x250
[c00f0160bcf0] [c030fe6c] __vfs_write+0x6c/0x1e0
[c00f0160bd90] [c0311490] vfs_write+0xd0/0x270
[c00f0160bde0] [c03131fc] SyS_write+0x6c/0x110
[c00f0160be30] [c0009204] system_call+0x38/0xec

Signed-off-by: Gabriel Krisman Bertazi <kris...@linux.vnet.ibm.com>
Cc: Brian King <brk...@linux.vnet.ibm.com>
Cc: Douglas Miller <dougm...@linux.vnet.ibm.com>
Cc: linux-block@vger.kernel.org
Cc: linux-s...@vger.kernel.org
Signed-off-by: Jens Axboe <ax...@fb.com>
Signed-off-by: Sumit Semwal <sumit.sem...@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>

---
 block/blk-mq.c |6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1470,7 +1470,7 @@ static struct blk_mq_tags *blk_mq_init_r
INIT_LIST_HEAD(&tags->page_list);
 
tags->rqs = kzalloc_node(set->queue_depth * sizeof(struct request *),
-GFP_KERNEL | __GFP_NOWARN | __GFP_NORETRY,
+GFP_NOIO | __GFP_NOWARN | __GFP_NORETRY,
 set->numa_node);
if (!tags->rqs) {
blk_mq_free_tags(tags);
@@ -1496,7 +1496,7 @@ static struct blk_mq_tags *blk_mq_init_r
 
do {
page = alloc_pages_node(set->numa_node,
-   GFP_KERNEL | __GFP_NOWARN | __GFP_NORETRY | __GFP_ZERO,
+   GFP_NOIO | __GFP_NOWARN | __GFP_NORETRY | __GFP_ZERO,
this_order);
if (page)
break;
@@ -1517,7 +1517,7 @@ static struct blk_mq_tags *blk_mq_init_r
 *

Re: [PATCH 0/6] block: fix blk-mq debugfs vs. blktrace

2017-02-02 Thread Greg Kroah-Hartman
On Thu, Feb 02, 2017 at 09:01:45AM -0800, Omar Sandoval wrote:
> On Thu, Feb 02, 2017 at 11:58:53AM +0100, Greg Kroah-Hartman wrote:
> > On Wed, Feb 01, 2017 at 12:31:15AM -0800, Omar Sandoval wrote:
> > > On Wed, Feb 01, 2017 at 09:16:08AM +0100, Greg Kroah-Hartman wrote:
> > > > On Tue, Jan 31, 2017 at 02:53:16PM -0800, Omar Sandoval wrote:
> > > > > From: Omar Sandoval <osan...@fb.com>
> > > > > 
> > > > > When I moved the blk-mq debugging information to debugfs, I didn't
> > > > > realize that blktrace also created directories in debugfs that
> > > > > conflicted with the blk-mq directories. This series fixes that.
> > > > > 
> > > > > Patch 1 adds a new debugfs helper needed for patch 6. Greg, could I get
> > > > > an ack on that if it makes sense? Jens and I went back and forth on this
> > > > > for a little while, but patch 6 has more of the rationale on why we
> > > > > decided that this approach was the cleanest.
> > > > 
> > > > I can't find patch 6, you only cc:ed me on the first patch :(
> > > > 
> > > > Care to bounce them all to me?
> > > > 
> > > > thanks,
> > > > 
> > > > greg k-h
> > > 
> > > Gah, I forgot --cc-cover to git-send-email. The series is all here:
> > > http://marc.info/?l=linux-block&r=1&b=201701&w=2. I can also send the
> > > patches directly to you if you prefer that.
> > 
> > I don't understand the problem here.  How do you not know if you have
> > created the debugfs file or not?  You have the structure, with the
> > correct name, how could it have been created?  Can't you save the dentry
> > to the debugfs file in the structure that has the name?
> > 
> > thanks,
> > 
> > greg k-h
> 
> Hey, Greg,
> 
> So here's the alternative to doing the lookup:
> 
> diff --git a/kernel/trace/blktrace.c b/kernel/trace/blktrace.c
> index 38052f625a0f..79ef6b9d1f96 100644
> --- a/kernel/trace/blktrace.c
> +++ b/kernel/trace/blktrace.c
> @@ -470,12 +470,15 @@ static int do_blk_trace_setup(struct request_queue *q, char *name, dev_t dev,
>   if (!blk_debugfs_root)
>   goto err;
>  
> - dir = debugfs_create_dir(buts->name, blk_debugfs_root);
> -
> +#ifdef CONFIG_BLK_DEBUG_FS
> + if (q->mq_ops && !bdev->bd_part.partno)
> + dir = q->debugfs_dir;
> +#endif

Eeek, no #ifdefs please :)

The lookup patch is fine, please take it.
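
For readers following along, the lookup approach discussed in this thread can be sketched as a small userspace model. Everything here is an assumption for illustration: struct toy_dir and the toy_* functions are hypothetical and stand in for a directory table, not for the debugfs API or the helper from patch 1.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_DIRS 16
#define TOY_NAME_LEN 32

/* Hypothetical stand-in for a directory entry; not the kernel's dentry. */
struct toy_dir {
    char name[TOY_NAME_LEN];
};

static struct toy_dir toy_dirs[TOY_MAX_DIRS];
static size_t toy_nr_dirs;

/* Return the existing directory called 'name', or NULL if there is none. */
static struct toy_dir *toy_lookup(const char *name)
{
    size_t i;

    for (i = 0; i < toy_nr_dirs; i++)
        if (strcmp(toy_dirs[i].name, name) == 0)
            return &toy_dirs[i];
    return NULL;
}

/* Create 'name'; fails (NULL) if it already exists or does not fit. */
static struct toy_dir *toy_create_dir(const char *name)
{
    struct toy_dir *dir;

    assert(name != NULL);
    if (toy_lookup(name) || toy_nr_dirs == TOY_MAX_DIRS ||
        strlen(name) >= TOY_NAME_LEN)
        return NULL;
    dir = &toy_dirs[toy_nr_dirs++];
    strcpy(dir->name, name);
    return dir;
}

/* What the second user does: reuse the directory if it is already there. */
static struct toy_dir *toy_lookup_or_create(const char *name)
{
    struct toy_dir *dir = toy_lookup(name);

    return dir ? dir : toy_create_dir(name);
}
```

In this model the first subsystem creates "sda", a second blind create of "sda" fails, and toy_lookup_or_create("sda") hands back the existing entry. A partition name such as "sda1" has no existing entry, so it is created on demand.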

thanks,

greg k-h
--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/6] block: fix blk-mq debugfs vs. blktrace

2017-02-02 Thread Greg Kroah-Hartman
On Thu, Feb 02, 2017 at 08:17:31AM -0700, Jens Axboe wrote:
> On 02/02/2017 03:58 AM, Greg Kroah-Hartman wrote:
> > On Wed, Feb 01, 2017 at 12:31:15AM -0800, Omar Sandoval wrote:
> >> On Wed, Feb 01, 2017 at 09:16:08AM +0100, Greg Kroah-Hartman wrote:
> >>> On Tue, Jan 31, 2017 at 02:53:16PM -0800, Omar Sandoval wrote:
> >>>> From: Omar Sandoval <osan...@fb.com>
> >>>>
> >>>> When I moved the blk-mq debugging information to debugfs, I didn't
> >>>> realize that blktrace also created directories in debugfs that
> >>>> conflicted with the blk-mq directories. This series fixes that.
> >>>>
> >>>> Patch 1 adds a new debugfs helper needed for patch 6. Greg, could I get
> >>>> an ack on that if it makes sense? Jens and I went back and forth on this
> >>>> for a little while, but patch 6 has more of the rationale on why we
> >>>> decided that this approach was the cleanest.
> >>>
> >>> I can't find patch 6, you only cc:ed me on the first patch :(
> >>>
> >>> Care to bounce them all to me?
> >>>
> >>> thanks,
> >>>
> >>> greg k-h
> >>
> >> Gah, I forgot --cc-cover to git-send-email. The series is all here:
> >> http://marc.info/?l=linux-block&r=1&b=201701&w=2. I can also send the
> >> patches directly to you if you prefer that.
> > 
> > I don't understand the problem here.  How do you not know if you have
> > created the debugfs file or not?  You have the structure, with the
> > correct name, how could it have been created?  Can't you save the dentry
> > to the debugfs file in the structure that has the name?
> 
> The problem is that blktrace registers a trace name directory, which
> can be either whole device or partition, depending on what you trace.
> For the blk-mq debug parts, we always just register the whole device
> name. There's no way to save the partition dentry, and imho, why even
> would you when you can just look it up. It's a file system...

I agree, it is a file system, but usually that debugfs file is
associated with some sort of data you want to keep track of outside of a
filesystem :)

Anyway, if it's such a big pain, then it's fine, add the function, no
objection from me anymore.

Acked-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>



Re: [PATCH 0/6] block: fix blk-mq debugfs vs. blktrace

2017-02-02 Thread Greg Kroah-Hartman
On Wed, Feb 01, 2017 at 12:31:15AM -0800, Omar Sandoval wrote:
> On Wed, Feb 01, 2017 at 09:16:08AM +0100, Greg Kroah-Hartman wrote:
> > On Tue, Jan 31, 2017 at 02:53:16PM -0800, Omar Sandoval wrote:
> > > From: Omar Sandoval <osan...@fb.com>
> > > 
> > > When I moved the blk-mq debugging information to debugfs, I didn't
> > > realize that blktrace also created directories in debugfs that
> > > conflicted with the blk-mq directories. This series fixes that.
> > > 
> > > Patch 1 adds a new debugfs helper needed for patch 6. Greg, could I get
> > > an ack on that if it makes sense? Jens and I went back and forth on this
> > > for a little while, but patch 6 has more of the rationale on why we
> > > decided that this approach was the cleanest.
> > 
> > I can't find patch 6, you only cc:ed me on the first patch :(
> > 
> > Care to bounce them all to me?
> > 
> > thanks,
> > 
> > greg k-h
> 
> Gah, I forgot --cc-cover to git-send-email. The series is all here:
> http://marc.info/?l=linux-block&r=1&b=201701&w=2. I can also send the
> patches directly to you if you prefer that.

I don't understand the problem here.  How do you not know if you have
created the debugfs file or not?  You have the structure, with the
correct name, how could it have been created?  Can't you save the dentry
to the debugfs file in the structure that has the name?

thanks,

greg k-h


Re: [PATCH 0/6] block: fix blk-mq debugfs vs. blktrace

2017-02-01 Thread Greg Kroah-Hartman
On Tue, Jan 31, 2017 at 02:53:16PM -0800, Omar Sandoval wrote:
> From: Omar Sandoval 
> 
> When I moved the blk-mq debugging information to debugfs, I didn't
> realize that blktrace also created directories in debugfs that
> conflicted with the blk-mq directories. This series fixes that.
> 
> Patch 1 adds a new debugfs helper needed for patch 6. Greg, could I get
> an ack on that if it makes sense? Jens and I went back and forth on this
> for a little while, but patch 6 has more of the rationale on why we
> decided that this approach was the cleanest.

I can't find patch 6, you only cc:ed me on the first patch :(

Care to bounce them all to me?

thanks,

greg k-h


[PATCH 4.7 45/45] cfq: fix starvation of asynchronous writes

2016-10-21 Thread Greg Kroah-Hartman
 value |  count
     0 |     11
     1 |    161
     2 |   1966
     4 |     54
     8 |     36
    16 |      7
    32 |      0
    64 |      0
       ~
  1024 |      0
  2048 |      0
  4096 |      1
  8192 |      1
 16384 |      2
 32768 |      0
 65536 |      0
131072 |      1
262144 |      0
524288 |      0

Signed-off-by: Glauber Costa <glau...@scylladb.com>
CC: Jens Axboe <ax...@kernel.dk>
CC: linux-block@vger.kernel.org
CC: linux-ker...@vger.kernel.org
Signed-off-by: Glauber Costa <glau...@scylladb.com>
Signed-off-by: Jens Axboe <ax...@fb.com>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>

---
 block/cfq-iosched.c |   13 ++---
 1 file changed, 10 insertions(+), 3 deletions(-)

--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -3021,7 +3021,6 @@ static struct request *cfq_check_fifo(st
if (time_before(jiffies, rq->fifo_time))
rq = NULL;
 
-   cfq_log_cfqq(cfqq->cfqd, cfqq, "fifo=%p", rq);
return rq;
 }
 
@@ -3395,6 +3394,9 @@ static bool cfq_may_dispatch(struct cfq_
 {
unsigned int max_dispatch;
 
+   if (cfq_cfqq_must_dispatch(cfqq))
+   return true;
+
/*
 * Drain async requests before we start sync IO
 */
@@ -3486,15 +3488,20 @@ static bool cfq_dispatch_request(struct
 
BUG_ON(RB_EMPTY_ROOT(&cfqq->sort_list));
 
+   rq = cfq_check_fifo(cfqq);
+   if (rq)
+   cfq_mark_cfqq_must_dispatch(cfqq);
+
if (!cfq_may_dispatch(cfqd, cfqq))
return false;
 
/*
 * follow expired path, else get first next available
 */
-   rq = cfq_check_fifo(cfqq);
if (!rq)
rq = cfqq->next_rq;
+   else
+   cfq_log_cfqq(cfqq->cfqd, cfqq, "fifo=%p", rq);
 
/*
 * insert request into driver dispatch list
@@ -3962,7 +3969,7 @@ cfq_should_preempt(struct cfq_data *cfqd
 * if the new request is sync, but the currently running queue is
 * not, let the sync request have priority.
 */
-   if (rq_is_sync(rq) && !cfq_cfqq_sync(cfqq))
+   if (rq_is_sync(rq) && !cfq_cfqq_sync(cfqq) && !cfq_cfqq_must_dispatch(cfqq))
return true;
 
/*




[PATCH 4.8 57/57] cfq: fix starvation of asynchronous writes

2016-10-21 Thread Greg Kroah-Hartman
 value |  count
     0 |     11
     1 |    161
     2 |   1966
     4 |     54
     8 |     36
    16 |      7
    32 |      0
    64 |      0
       ~
  1024 |      0
  2048 |      0
  4096 |      1
  8192 |      1
 16384 |      2
 32768 |      0
 65536 |      0
131072 |      1
262144 |      0
524288 |      0

Signed-off-by: Glauber Costa <glau...@scylladb.com>
CC: Jens Axboe <ax...@kernel.dk>
CC: linux-block@vger.kernel.org
CC: linux-ker...@vger.kernel.org
Signed-off-by: Glauber Costa <glau...@scylladb.com>
Signed-off-by: Jens Axboe <ax...@fb.com>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>

---
 block/cfq-iosched.c |   13 ++---
 1 file changed, 10 insertions(+), 3 deletions(-)

--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -3042,7 +3042,6 @@ static struct request *cfq_check_fifo(st
if (ktime_get_ns() < rq->fifo_time)
rq = NULL;
 
-   cfq_log_cfqq(cfqq->cfqd, cfqq, "fifo=%p", rq);
return rq;
 }
 
@@ -3420,6 +3419,9 @@ static bool cfq_may_dispatch(struct cfq_
 {
unsigned int max_dispatch;
 
+   if (cfq_cfqq_must_dispatch(cfqq))
+   return true;
+
/*
 * Drain async requests before we start sync IO
 */
@@ -3511,15 +3513,20 @@ static bool cfq_dispatch_request(struct
 
BUG_ON(RB_EMPTY_ROOT(&cfqq->sort_list));
 
+   rq = cfq_check_fifo(cfqq);
+   if (rq)
+   cfq_mark_cfqq_must_dispatch(cfqq);
+
if (!cfq_may_dispatch(cfqd, cfqq))
return false;
 
/*
 * follow expired path, else get first next available
 */
-   rq = cfq_check_fifo(cfqq);
if (!rq)
rq = cfqq->next_rq;
+   else
+   cfq_log_cfqq(cfqq->cfqd, cfqq, "fifo=%p", rq);
 
/*
 * insert request into driver dispatch list
@@ -3989,7 +3996,7 @@ cfq_should_preempt(struct cfq_data *cfqd
 * if the new request is sync, but the currently running queue is
 * not, let the sync request have priority.
 */
-   if (rq_is_sync(rq) && !cfq_cfqq_sync(cfqq))
+   if (rq_is_sync(rq) && !cfq_cfqq_sync(cfqq) && !cfq_cfqq_must_dispatch(cfqq))
return true;
 
/*




[PATCH 4.4 25/25] cfq: fix starvation of asynchronous writes

2016-10-21 Thread Greg Kroah-Hartman
 value |  count
     0 |     11
     1 |    161
     2 |   1966
     4 |     54
     8 |     36
    16 |      7
    32 |      0
    64 |      0
       ~
  1024 |      0
  2048 |      0
  4096 |      1
  8192 |      1
 16384 |      2
 32768 |      0
 65536 |      0
131072 |      1
262144 |      0
524288 |      0

Signed-off-by: Glauber Costa <glau...@scylladb.com>
CC: Jens Axboe <ax...@kernel.dk>
CC: linux-block@vger.kernel.org
CC: linux-ker...@vger.kernel.org
Signed-off-by: Glauber Costa <glau...@scylladb.com>
Signed-off-by: Jens Axboe <ax...@fb.com>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>

---
 block/cfq-iosched.c |   13 ++---
 1 file changed, 10 insertions(+), 3 deletions(-)

--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -3003,7 +3003,6 @@ static struct request *cfq_check_fifo(st
if (time_before(jiffies, rq->fifo_time))
rq = NULL;
 
-   cfq_log_cfqq(cfqq->cfqd, cfqq, "fifo=%p", rq);
return rq;
 }
 
@@ -3377,6 +3376,9 @@ static bool cfq_may_dispatch(struct cfq_
 {
unsigned int max_dispatch;
 
+   if (cfq_cfqq_must_dispatch(cfqq))
+   return true;
+
/*
 * Drain async requests before we start sync IO
 */
@@ -3468,15 +3470,20 @@ static bool cfq_dispatch_request(struct
 
BUG_ON(RB_EMPTY_ROOT(&cfqq->sort_list));
 
+   rq = cfq_check_fifo(cfqq);
+   if (rq)
+   cfq_mark_cfqq_must_dispatch(cfqq);
+
if (!cfq_may_dispatch(cfqd, cfqq))
return false;
 
/*
 * follow expired path, else get first next available
 */
-   rq = cfq_check_fifo(cfqq);
if (!rq)
rq = cfqq->next_rq;
+   else
+   cfq_log_cfqq(cfqq->cfqd, cfqq, "fifo=%p", rq);
 
/*
 * insert request into driver dispatch list
@@ -3944,7 +3951,7 @@ cfq_should_preempt(struct cfq_data *cfqd
 * if the new request is sync, but the currently running queue is
 * not, let the sync request have priority.
 */
-   if (rq_is_sync(rq) && !cfq_cfqq_sync(cfqq))
+   if (rq_is_sync(rq) && !cfq_cfqq_sync(cfqq) && !cfq_cfqq_must_dispatch(cfqq))
return true;
 
if (new_cfqq->cfqg != cfqq->cfqg)




Re: [4.4-stable] lightnvm: put bio before return

2016-09-20 Thread Greg Kroah-Hartman
On Tue, Sep 20, 2016 at 01:05:38PM +0100, Ben Hutchings wrote:
> Please cherry-pick:
> 
> commit 16c6d048d7b74249a4387700887e8adb13028866
> Author: Wenwei Tao 
> Date:   Thu Feb 4 15:13:23 2016 +0100
> 
> lightnvm: put bio before return
> 
> for 4.4-stable only.  (It is included in 4.5 and no earlier stable
> branch has lightnvm.)  This is a follow-up to commit 3bfbc6adbc50
> "lightnvm: add check after mempool allocation" which you already
> cherry-picked in 4.4.21.

Now applied, thanks.

greg k-h