Re: recent issues with heavy deletes causing soft lockups

2018-11-02 Thread Jens Axboe
On 11/2/18 2:32 PM, Thomas Fjellstrom wrote:
> On Saturday, October 27, 2018 1:20:10 PM MDT Jens Axboe wrote:
>> On Oct 27, 2018, at 12:40 PM, Thomas Fjellstrom  
> [snip]
>>
>> Can you try 4.19? A patch went in since 4.18 that fixes a starvation issue
>> around requeue conditions, which SATA is the most likely to hit.
>>
>> Jens
> 
> I just had to do a clean, and I have the mq kernel options I mentioned in my 
> previous mail enabled (mq should be disabled), and it appears to still be 
> causing issues. The current I/O scheduler appears to be cfq, and it took that 
> "make clean" about 4 minutes; a lot of that time was spent with Plasma, 
> IntelliJ, and Chrome all starved of I/O. 
> 
> I did switch to a terminal and checked iostat -d 1, and it showed very little 
> actual I/O for the time I was looking at it.
> 
> I have no idea what's going on.

If you're using cfq, then it's not using mq at all. Maybe do something like:

# perf record -ag -- sleep 10

while the slowdown is happening, then run perf report -g --no-children and
see if that yields anything interesting. It sounds like time is being spent
elsewhere and you aren't actually waiting on I/O.
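
For reference, the whole sequence, with what the flags do, would be roughly
(nothing here is specific to your setup):

  # system-wide (-a) capture with call graphs (-g) while things are stalled
  perf record -ag -- sleep 10

  # report self overhead rather than accumulated children, so the functions
  # actually burning the time stand out
  perf report -g --no-children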

-- 
Jens Axboe



Re: recent issues with heavy deletes causing soft lockups

2018-11-02 Thread Thomas Fjellstrom
On Saturday, October 27, 2018 1:20:10 PM MDT Jens Axboe wrote:
> On Oct 27, 2018, at 12:40 PM, Thomas Fjellstrom  
[snip]
> 
> Can you try 4.19? A patch went in since 4.18 that fixes a starvation issue
> around requeue conditions, which SATA is the most likely to hit.
> 
> Jens

I just had to do a clean, and I have the mq kernel options I mentioned in my 
previous mail enabled (mq should be disabled), and it appears to still be 
causing issues. The current I/O scheduler appears to be cfq, and it took that 
"make clean" about 4 minutes; a lot of that time was spent with Plasma, 
IntelliJ, and Chrome all starved of I/O. 

I did switch to a terminal and checked iostat -d 1, and it showed very little 
actual I/O for the time I was looking at it.

I have no idea what's going on.
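
For reference, one quick way to see what is actually in effect (sda here is just
an example; substitute whichever disk the build tree lives on):

  # the scheduler in brackets is the active one; seeing cfq/deadline/noop at all
  # means the device is still on the legacy request path rather than blk-mq
  cat /sys/block/sda/queue/scheduler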

-- 
Thomas Fjellstrom
tho...@fjellstrom.ca





Re: [GIT PULL] Final block merge window changes/fixes

2018-11-02 Thread Linus Torvalds
On Fri, Nov 2, 2018 at 10:08 AM Jens Axboe  wrote:
>
> The biggest part of this pull request is the revert of the blkcg cleanup
> series. It had one fix earlier for a stacked device issue, but another
> one was reported. Rather than play whack-a-mole with this, revert the
> entire series and try again for the next kernel release.
>
> Apart from that, only small fixes/changes.

Pulled,

  Linus


Re: recent issues with heavy deletes causing soft lockups

2018-11-02 Thread Thomas Fjellstrom
On Saturday, October 27, 2018 1:20:10 PM MDT Jens Axboe wrote:
> On Oct 27, 2018, at 12:40 PM, Thomas Fjellstrom  
wrote:
> > Hi
[snip explanation of problem]
> 
> Can you try 4.19? A patch went in since 4.18 that fixes a starvation issue
> around requeue conditions, which SATA is the most likely to hit.

Gave it a shot with the vanilla kernel from git linux-stable/v4.19. It was a 
bit of a pain, as the amdgpu driver seems to be broken for my R9 390 on many 
kernels, including 4.19. I had to reconfigure to the radeon driver, which I must 
say seems to work a lot better than it used to.

At any rate, it doesn't seem to have helped a lot so far. I did end up adding 
"scsi_mod.use_blk_mq=0 dm_mod.use_blk_mq=0" to the default kernel boot command 
line in grub. It seems to have helped a little, but I haven't tested fully 
with a full delete of the build directory; I haven't had time to sit and wait 
the 40+ minutes it takes to rebuild the entire thing. And I'm low enough on 
disk space that I can't easily make a copy of the 109GB build folder. I've got 
about 25GB free out of 780GB. I'll try to test some more soon.
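
Roughly what that change looks like on a Debian-style setup, for reference; the
file location and the regenerate command vary by distro:

  # /etc/default/grub: append the two options to whatever is already there
  GRUB_CMDLINE_LINUX_DEFAULT="... scsi_mod.use_blk_mq=0 dm_mod.use_blk_mq=0"

  # regenerate grub.cfg, then reboot
  sudo update-grub

  # after the reboot, this should print N if the parameter took effect
  cat /sys/module/scsi_mod/parameters/use_blk_mq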

> Jens


-- 
Thomas Fjellstrom
tho...@fjellstrom.ca





[GIT PULL] Final block merge window changes/fixes

2018-11-02 Thread Jens Axboe
Hi Linus,

The biggest part of this pull request is the revert of the blkcg cleanup
series. It had one fix earlier for a stacked device issue, but another
one was reported. Rather than play whack-a-mole with this, revert the
entire series and try again for the next kernel release.

Apart from that, only small fixes/changes. This pull request contains:

- Indentation fixup for mtip32xx (Colin Ian King)

- The blkcg cleanup series revert (Dennis Zhou)

- Two NVMe fixes. One fixing a regression in the nvme request
  initialization in this merge window, causing nvme-fc to not work. The
  other is a suspend/resume p2p resource issue (James, Keith)

- Fix sg discard merge, allowing us to merge in cases where we didn't
  before (Jianchao Wang)

- Call rq_qos_exit() after the queue is frozen, preventing a hang (Ming)

- Fix brd queue setup, fixing an oops if we fail setting up all devices
  (Ming)

Please pull!


  git://git.kernel.dk/linux-block.git tags/for-linus-20181102



Colin Ian King (1):
  mtip32xx: clean an indentation issue, remove extraneous tabs

Dennis Zhou (1):
  blkcg: revert blkcg cleanups series

James Smart (1):
  nvme-fc: fix request private initialization

Jianchao Wang (1):
  block: fix the DISCARD request merge

Keith Busch (1):
  nvme-pci: fix conflicting p2p resource adds

Ming Lei (2):
  block: call rq_qos_exit() after queue is frozen
  block: brd: associate with queue until adding disk

 Documentation/admin-guide/cgroup-v2.rst |   8 +-
 block/bfq-cgroup.c  |   4 +-
 block/bfq-iosched.c |   2 +-
 block/bio.c | 174 +---
 block/blk-cgroup.c  | 123 +++---
 block/blk-core.c|   4 +-
 block/blk-iolatency.c   |  26 -
 block/blk-merge.c   |  46 +++--
 block/blk-sysfs.c   |   2 -
 block/blk-throttle.c|  13 ++-
 block/bounce.c  |   4 +-
 block/cfq-iosched.c |   4 +-
 drivers/block/brd.c |  16 ++-
 drivers/block/loop.c|   5 +-
 drivers/block/mtip32xx/mtip32xx.c   |   4 +-
 drivers/md/raid0.c  |   2 +-
 drivers/nvme/host/fc.c  |   2 +-
 drivers/nvme/host/pci.c |   5 +-
 fs/buffer.c |  10 +-
 fs/ext4/page-io.c   |   2 +-
 include/linux/bio.h |  26 ++---
 include/linux/blk-cgroup.h  | 145 +-
 include/linux/blk_types.h   |   1 +
 include/linux/cgroup.h  |   2 -
 include/linux/writeback.h   |   5 +-
 kernel/cgroup/cgroup.c  |  48 ++---
 kernel/trace/blktrace.c |   4 +-
 mm/page_io.c|   2 +-
 28 files changed, 265 insertions(+), 424 deletions(-)

-- 
Jens Axboe



[PATCH 1/4] Revert "irq: add support for allocating (and affinitizing) sets of IRQs"

2018-11-02 Thread Ming Lei
This reverts commit 1d44f6f43e229ca06bf680aa7eb5ad380eaa5d72.
---
 drivers/pci/msi.c | 14 --
 include/linux/interrupt.h |  4 
 kernel/irq/affinity.c | 40 +---
 3 files changed, 9 insertions(+), 49 deletions(-)

diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
index 265ed3e4c920..af24ed50a245 100644
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -1036,13 +1036,6 @@ static int __pci_enable_msi_range(struct pci_dev *dev, int minvec, int maxvec,
if (maxvec < minvec)
return -ERANGE;
 
-   /*
-* If the caller is passing in sets, we can't support a range of
-* vectors. The caller needs to handle that.
-*/
-   if (affd && affd->nr_sets && minvec != maxvec)
-   return -EINVAL;
-
if (WARN_ON_ONCE(dev->msi_enabled))
return -EINVAL;
 
@@ -1094,13 +1087,6 @@ static int __pci_enable_msix_range(struct pci_dev *dev,
if (maxvec < minvec)
return -ERANGE;
 
-   /*
-* If the caller is passing in sets, we can't support a range of
-* supported vectors. The caller needs to handle that.
-*/
-   if (affd && affd->nr_sets && minvec != maxvec)
-   return -EINVAL;
-
if (WARN_ON_ONCE(dev->msix_enabled))
return -EINVAL;
 
diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
index ca397ff40836..1d6711c28271 100644
--- a/include/linux/interrupt.h
+++ b/include/linux/interrupt.h
@@ -247,14 +247,10 @@ struct irq_affinity_notify {
  * the MSI(-X) vector space
  * @post_vectors:  Don't apply affinity to @post_vectors at end of
  * the MSI(-X) vector space
- * @nr_sets:   Length of passed in *sets array
- * @sets:  Number of affinitized sets
  */
 struct irq_affinity {
int pre_vectors;
int post_vectors;
-   int nr_sets;
-   int *sets;
 };
 
 #if defined(CONFIG_SMP)
diff --git a/kernel/irq/affinity.c b/kernel/irq/affinity.c
index 2046a0f0f0f1..f4f29b9d90ee 100644
--- a/kernel/irq/affinity.c
+++ b/kernel/irq/affinity.c
@@ -180,7 +180,6 @@ irq_create_affinity_masks(int nvecs, const struct irq_affinity *affd)
int curvec, usedvecs;
cpumask_var_t nmsk, npresmsk, *node_to_cpumask;
struct cpumask *masks = NULL;
-   int i, nr_sets;
 
/*
 * If there aren't any vectors left after applying the pre/post
@@ -211,23 +210,10 @@ irq_create_affinity_masks(int nvecs, const struct irq_affinity *affd)
get_online_cpus();
build_node_to_cpumask(node_to_cpumask);
 
-   /*
-* Spread on present CPUs starting from affd->pre_vectors. If we
-* have multiple sets, build each sets affinity mask separately.
-*/
-   nr_sets = affd->nr_sets;
-   if (!nr_sets)
-   nr_sets = 1;
-
-   for (i = 0, usedvecs = 0; i < nr_sets; i++) {
-   int this_vecs = affd->sets ? affd->sets[i] : affvecs;
-   int nr;
-
-   nr = irq_build_affinity_masks(affd, curvec, this_vecs,
- node_to_cpumask, cpu_present_mask,
- nmsk, masks + usedvecs);
-   usedvecs += nr;
-   }
+   /* Spread on present CPUs starting from affd->pre_vectors */
+   usedvecs = irq_build_affinity_masks(affd, curvec, affvecs,
+   node_to_cpumask, cpu_present_mask,
+   nmsk, masks);
 
/*
 * Spread on non present CPUs starting from the next vector to be
@@ -272,21 +258,13 @@ int irq_calc_affinity_vectors(int minvec, int maxvec, const struct irq_affinity
 {
int resv = affd->pre_vectors + affd->post_vectors;
int vecs = maxvec - resv;
-   int set_vecs;
+   int ret;
 
if (resv > minvec)
return 0;
 
-   if (affd->nr_sets) {
-   int i;
-
-   for (i = 0, set_vecs = 0;  i < affd->nr_sets; i++)
-   set_vecs += affd->sets[i];
-   } else {
-   get_online_cpus();
-   set_vecs = cpumask_weight(cpu_possible_mask);
-   put_online_cpus();
-   }
-
-   return resv + min(set_vecs, vecs);
+   get_online_cpus();
+   ret = min_t(int, cpumask_weight(cpu_possible_mask), vecs) + resv;
+   put_online_cpus();
+   return ret;
 }
-- 
2.9.5



Re: [GIT PULL] nvme fixes for 4.20

2018-11-02 Thread Jens Axboe
On 11/2/18 12:37 AM, Christoph Hellwig wrote:
> The following changes since commit a5185607787e030fcb0009194d3b12f8bcca59d6:
> 
>   block: brd: associate with queue until adding disk (2018-10-31 08:43:09 -0600)
> 
> are available in the Git repository at:
> 
>   git://git.infradead.org/nvme.git nvme-4.20
> 
> for you to fetch changes up to ae172db3b3f389c363ec7f3683b2cad41091580d:
> 
>   nvme-pci: fix conflicting p2p resource adds (2018-11-01 08:44:47 +0200)
> 
> 
> James Smart (1):
>   nvme-fc: fix request private initialization
> 
> Keith Busch (1):
>   nvme-pci: fix conflicting p2p resource adds
> 
>  drivers/nvme/host/fc.c  | 2 +-
>  drivers/nvme/host/pci.c | 5 ++++-
>  2 files changed, 5 insertions(+), 2 deletions(-)

Applied these manually, since I had rebased for-linus yesterday to drop
a buggy patch from Ming. JFYI.

-- 
Jens Axboe



Re: [PATCH v2] block: BFQ default for single queue devices

2018-11-02 Thread Oleksandr Natalenko

Hi.

On 16.10.2018 19:35, Jens Axboe wrote:

> Do you have anything more recent? All of these predate the current
> code (by a lot), and aren't even mq. I'm mostly just interested in
> plain fast NVMe device, and a big box hardware raid setup with
> a ton of drives.
> 
> I do still think that this should be going through the distros, they
> need to be the ones driving this, as they will ultimately be the
> ones getting customer reports on regressions. The qual/test cycle
> they do is useful for this. In mainline, if we make a change like
> this, we'll figure out if it worked many releases down the line.


Some benchmarks here for a non-RAID setup, obtained with the S suite. This is 
from a Lenovo T460s with a SAMSUNG MZNTY256HDHP-000L7 SSD. The v4.19 kernel is 
running with all recent BFQ patches applied.


# replayed gnome terminal startup throughput [MB/s]
# Workload        bfq       mq-deadline
  0r-raw_seq     13.2617     13.4867
  10r-raw_seq   512.507     539.95

# replayed gnome terminal startup time [s]
# Workload        bfq       mq-deadline
  0r-raw_seq      0.43        0.4
  10r-raw_seq     0.685       4.1625

# replayed lowriter startup throughput [MB/s]
# Workload        bfq       mq-deadline
  0r-raw_seq      9.985      10.375
  10r-raw_seq   516.62      539.61

# replayed lowriter startup time [s]
# Workload        bfq       mq-deadline
  0r-raw_seq      0.4         0.3875
  10r-raw_seq     0.535       2.3875

# replayed xterm startup throughput [MB/s]
# Workload        bfq       mq-deadline
  0r-raw_seq      5.93833     6.10834
  10r-raw_seq   524.447     539.991

# replayed xterm startup time [s]
# Workload        bfq       mq-deadline
  0r-raw_seq      0.23        0.23
  10r-raw_seq     0.38        1.56

# throughput [MB/s]
# Workload        bfq       mq-deadline
  10r-raw_rand  362.446     363.817
  10r-raw_seq   537.646     540.609
  1r-raw_seq    500.733     502.526

Throughput-wise, BFQ is on par with mq-deadline. Latency-wise, BFQ is 
much, much better.
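
For anyone wanting to reproduce this without changing the default, switching a
single device over at runtime is enough (assuming bfq is built or loaded and the
device is on blk-mq; sda is just an example):

  # the active scheduler is the one in brackets, e.g. [mq-deadline] kyber bfq none
  cat /sys/block/sda/queue/scheduler

  # switch this device to bfq
  echo bfq | sudo tee /sys/block/sda/queue/scheduler

  # verify, e.g. mq-deadline kyber [bfq] none
  cat /sys/block/sda/queue/scheduler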


--
  Oleksandr Natalenko (post-factum)


[GIT PULL] nvme fixes for 4.20

2018-11-02 Thread Christoph Hellwig
The following changes since commit a5185607787e030fcb0009194d3b12f8bcca59d6:

  block: brd: associate with queue until adding disk (2018-10-31 08:43:09 -0600)

are available in the Git repository at:

  git://git.infradead.org/nvme.git nvme-4.20

for you to fetch changes up to ae172db3b3f389c363ec7f3683b2cad41091580d:

  nvme-pci: fix conflicting p2p resource adds (2018-11-01 08:44:47 +0200)


James Smart (1):
  nvme-fc: fix request private initialization

Keith Busch (1):
  nvme-pci: fix conflicting p2p resource adds

 drivers/nvme/host/fc.c  | 2 +-
 drivers/nvme/host/pci.c | 5 ++++-
 2 files changed, 5 insertions(+), 2 deletions(-)