Re: recent issues with heavy deletes causing soft lockups
On 11/2/18 2:32 PM, Thomas Fjellstrom wrote:
> On Saturday, October 27, 2018 1:20:10 PM MDT Jens Axboe wrote:
>> On Oct 27, 2018, at 12:40 PM, Thomas Fjellstrom
> [snip]
>>
>> Can you try 4.19? A patch went in since 4.18 that fixes a starvation issue
>> around requeue conditions, which SATA is the one to most often hit.
>>
>> Jens
>
> I just had to do a clean, and I have the mq kernel options I mentioned in my
> previous mail enabled (mq should be disabled), and it appears to still be
> causing issues. The current io scheduler appears to be cfq, and it took that
> "make clean" about 4 minutes; a lot of that time was spent with plasma,
> IntelliJ, and chrome all starved of IO.
>
> I did switch to a terminal and checked iostat -d 1, and it showed very
> little actual io for the time I was looking at it.
>
> I have no idea what's going on.

If you're using cfq, then it's not using mq at all. Maybe do something ala:

# perf record -ag -- sleep 10

while the slowdown is happening, and then do:

# perf report -g --no-children

and see if that yields anything interesting. It sounds like time is being
spent elsewhere and you aren't actually waiting on IO.

--
Jens Axboe
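For readers following along: the scheduler in use is the bracketed entry in a device's sysfs scheduler file. A minimal sketch of pulling it out; the device name `sda` and the example string are assumptions, not from the thread:

```shell
# /sys/block/<dev>/queue/scheduler typically reads like: "noop deadline [cfq]"
# Stand-in for: sched_line=$(cat /sys/block/sda/queue/scheduler)
sched_line="noop deadline [cfq]"

# Extract the active (bracketed) scheduler name.
active=$(printf '%s\n' "$sched_line" | sed -n 's/.*\[\(.*\)\].*/\1/p')
echo "$active"
```

On a live system the stand-in string would be replaced by the actual `cat`, and the `perf record`/`perf report` pair above would be run while the stall is happening.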
Re: recent issues with heavy deletes causing soft lockups
On Saturday, October 27, 2018 1:20:10 PM MDT Jens Axboe wrote:
> On Oct 27, 2018, at 12:40 PM, Thomas Fjellstrom
[snip]
>
> Can you try 4.19? A patch went in since 4.18 that fixes a starvation issue
> around requeue conditions, which SATA is the one to most often hit.
>
> Jens

I just had to do a clean, and I have the mq kernel options I mentioned in my
previous mail enabled (mq should be disabled), and it appears to still be
causing issues. The current io scheduler appears to be cfq, and it took that
"make clean" about 4 minutes; a lot of that time was spent with plasma,
IntelliJ, and chrome all starved of IO.

I did switch to a terminal and checked iostat -d 1, and it showed very little
actual io for the time I was looking at it.

I have no idea what's going on.

--
Thomas Fjellstrom
tho...@fjellstrom.ca
Re: [GIT PULL] Final block merge window changes/fixes
On Fri, Nov 2, 2018 at 10:08 AM Jens Axboe wrote:
>
> The biggest part of this pull request is the revert of the blkcg cleanup
> series. It had one fix earlier for a stacked device issue, but another
> one was reported. Rather than play whack-a-mole with this, revert the
> entire series and try again for the next kernel release.
>
> Apart from that, only small fixes/changes.

Pulled,

           Linus
Re: recent issues with heavy deletes causing soft lockups
On Saturday, October 27, 2018 1:20:10 PM MDT Jens Axboe wrote:
> On Oct 27, 2018, at 12:40 PM, Thomas Fjellstrom wrote:
> > Hi
[snip explanation of problem]
>
> Can you try 4.19? A patch went in since 4.18 that fixes a starvation issue
> around requeue conditions, which SATA is the one to most often hit.

Gave it a shot with the vanilla kernel from git linux-stable/v4.9. It was a
bit of a pain, as the amdgpu driver seems to be broken for my r9 390 on many
kernels, including 4.19. I had to reconfigure to the radeon driver, which I
must say seems to work a lot better than it used to.

At any rate, it doesn't seem to have helped a lot so far. I did end up adding
"scsi_mod.use_blk_mq=0 dm_mod.use_blk_mq=0" to the default kernel boot
command line in grub. It seems to have helped a little, but I haven't tested
fully with a full delete of the build directory. I haven't had time to sit
and wait the 40+ minutes it takes to rebuild the entire thing, and I'm low
enough on disk space that I can't easily make a copy of the 109GB build
folder; I've got about 25GB free out of 780GB.

I'll try and test some more soon.

> Jens

--
Thomas Fjellstrom
tho...@fjellstrom.ca
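For reference, the workaround Thomas describes amounts to an edit like the following in the grub defaults file. This is a sketch, not a recommendation; the file path, variable name, and the `quiet` flag are distro-dependent assumptions:

```shell
# /etc/default/grub -- disable blk-mq for SCSI/SATA and device-mapper
GRUB_CMDLINE_LINUX_DEFAULT="quiet scsi_mod.use_blk_mq=0 dm_mod.use_blk_mq=0"

# Regenerate the grub config afterwards (Debian-style; other distros differ):
#   update-grub
# After reboot, whether blk-mq is actually off can be checked with:
#   cat /sys/module/scsi_mod/parameters/use_blk_mq
```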
[GIT PULL] Final block merge window changes/fixes
Hi Linus,

The biggest part of this pull request is the revert of the blkcg cleanup
series. It had one fix earlier for a stacked device issue, but another
one was reported. Rather than play whack-a-mole with this, revert the
entire series and try again for the next kernel release.

Apart from that, only small fixes/changes. This pull request contains:

- Indentation fixup for mtip32xx (Colin Ian King)

- The blkcg cleanup series revert (Dennis Zhou)

- Two NVMe fixes. One fixing a regression in the nvme request
  initialization in this merge window, causing nvme-fc to not work. The
  other is a suspend/resume p2p resource issue (James, Keith)

- Fix sg discard merge, allowing us to merge in cases where we didn't
  before (Jianchao Wang)

- Call rq_qos_exit() after the queue is frozen, preventing a hang (Ming)

- Fix brd queue setup, fixing an oops if we fail setting up all devices
  (Ming)

Please pull!

  git://git.kernel.dk/linux-block.git tags/for-linus-20181102

Colin Ian King (1):
      mtip32xx: clean an indentation issue, remove extraneous tabs

Dennis Zhou (1):
      blkcg: revert blkcg cleanups series

James Smart (1):
      nvme-fc: fix request private initialization

Jianchao Wang (1):
      block: fix the DISCARD request merge

Keith Busch (1):
      nvme-pci: fix conflicting p2p resource adds

Ming Lei (2):
      block: call rq_qos_exit() after queue is frozen
      block: brd: associate with queue until adding disk

 Documentation/admin-guide/cgroup-v2.rst |   8 +-
 block/bfq-cgroup.c                      |   4 +-
 block/bfq-iosched.c                     |   2 +-
 block/bio.c                             | 174 +---
 block/blk-cgroup.c                      | 123 +++---
 block/blk-core.c                        |   4 +-
 block/blk-iolatency.c                   |  26 -
 block/blk-merge.c                       |  46 +++--
 block/blk-sysfs.c                       |   2 -
 block/blk-throttle.c                    |  13 ++-
 block/bounce.c                          |   4 +-
 block/cfq-iosched.c                     |   4 +-
 drivers/block/brd.c                     |  16 ++-
 drivers/block/loop.c                    |   5 +-
 drivers/block/mtip32xx/mtip32xx.c       |   4 +-
 drivers/md/raid0.c                      |   2 +-
 drivers/nvme/host/fc.c                  |   2 +-
 drivers/nvme/host/pci.c                 |   5 +-
 fs/buffer.c                             |  10 +-
 fs/ext4/page-io.c                       |   2 +-
 include/linux/bio.h                     |  26 ++---
 include/linux/blk-cgroup.h              | 145 +-
 include/linux/blk_types.h               |   1 +
 include/linux/cgroup.h                  |   2 -
 include/linux/writeback.h               |   5 +-
 kernel/cgroup/cgroup.c                  |  48 ++---
 kernel/trace/blktrace.c                 |   4 +-
 mm/page_io.c                            |   2 +-
 28 files changed, 265 insertions(+), 424 deletions(-)

--
Jens Axboe
[PATCH 1/4] Revert "irq: add support for allocating (and affinitizing) sets of IRQs"
This reverts commit 1d44f6f43e229ca06bf680aa7eb5ad380eaa5d72.
---
 drivers/pci/msi.c         | 14 --
 include/linux/interrupt.h |  4 -
 kernel/irq/affinity.c     | 40 +---
 3 files changed, 9 insertions(+), 49 deletions(-)

diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
index 265ed3e4c920..af24ed50a245 100644
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -1036,13 +1036,6 @@ static int __pci_enable_msi_range(struct pci_dev *dev, int minvec, int maxvec,
 	if (maxvec < minvec)
 		return -ERANGE;
 
-	/*
-	 * If the caller is passing in sets, we can't support a range of
-	 * vectors. The caller needs to handle that.
-	 */
-	if (affd && affd->nr_sets && minvec != maxvec)
-		return -EINVAL;
-
 	if (WARN_ON_ONCE(dev->msi_enabled))
 		return -EINVAL;
 
@@ -1094,13 +1087,6 @@ static int __pci_enable_msix_range(struct pci_dev *dev,
 	if (maxvec < minvec)
 		return -ERANGE;
 
-	/*
-	 * If the caller is passing in sets, we can't support a range of
-	 * supported vectors. The caller needs to handle that.
-	 */
-	if (affd && affd->nr_sets && minvec != maxvec)
-		return -EINVAL;
-
 	if (WARN_ON_ONCE(dev->msix_enabled))
 		return -EINVAL;
 
diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
index ca397ff40836..1d6711c28271 100644
--- a/include/linux/interrupt.h
+++ b/include/linux/interrupt.h
@@ -247,14 +247,10 @@ struct irq_affinity_notify {
  *			the MSI(-X) vector space
  * @post_vectors:	Don't apply affinity to @post_vectors at end of
  *			the MSI(-X) vector space
- * @nr_sets:		Length of passed in *sets array
- * @sets:		Number of affinitized sets
  */
 struct irq_affinity {
 	int	pre_vectors;
 	int	post_vectors;
-	int	nr_sets;
-	int	*sets;
 };
 
 #if defined(CONFIG_SMP)
diff --git a/kernel/irq/affinity.c b/kernel/irq/affinity.c
index 2046a0f0f0f1..f4f29b9d90ee 100644
--- a/kernel/irq/affinity.c
+++ b/kernel/irq/affinity.c
@@ -180,7 +180,6 @@ irq_create_affinity_masks(int nvecs, const struct irq_affinity *affd)
 	int curvec, usedvecs;
 	cpumask_var_t nmsk, npresmsk, *node_to_cpumask;
 	struct cpumask *masks = NULL;
-	int i, nr_sets;
 
 	/*
 	 * If there aren't any vectors left after applying the pre/post
@@ -211,23 +210,10 @@ irq_create_affinity_masks(int nvecs, const struct irq_affinity *affd)
 	get_online_cpus();
 	build_node_to_cpumask(node_to_cpumask);
 
-	/*
-	 * Spread on present CPUs starting from affd->pre_vectors. If we
-	 * have multiple sets, build each sets affinity mask separately.
-	 */
-	nr_sets = affd->nr_sets;
-	if (!nr_sets)
-		nr_sets = 1;
-
-	for (i = 0, usedvecs = 0; i < nr_sets; i++) {
-		int this_vecs = affd->sets ? affd->sets[i] : affvecs;
-		int nr;
-
-		nr = irq_build_affinity_masks(affd, curvec, this_vecs,
-					      node_to_cpumask, cpu_present_mask,
-					      nmsk, masks + usedvecs);
-		usedvecs += nr;
-	}
+	/* Spread on present CPUs starting from affd->pre_vectors */
+	usedvecs = irq_build_affinity_masks(affd, curvec, affvecs,
+					    node_to_cpumask, cpu_present_mask,
+					    nmsk, masks);
 
 	/*
 	 * Spread on non present CPUs starting from the next vector to be
@@ -272,21 +258,13 @@ int irq_calc_affinity_vectors(int minvec, int maxvec, const struct irq_affinity
 {
 	int resv = affd->pre_vectors + affd->post_vectors;
 	int vecs = maxvec - resv;
-	int set_vecs;
+	int ret;
 
 	if (resv > minvec)
 		return 0;
 
-	if (affd->nr_sets) {
-		int i;
-
-		for (i = 0, set_vecs = 0; i < affd->nr_sets; i++)
-			set_vecs += affd->sets[i];
-	} else {
-		get_online_cpus();
-		set_vecs = cpumask_weight(cpu_possible_mask);
-		put_online_cpus();
-	}
-
-	return resv + min(set_vecs, vecs);
+	get_online_cpus();
+	ret = min_t(int, cpumask_weight(cpu_possible_mask), vecs) + resv;
+	put_online_cpus();
+	return ret;
 }
-- 
2.9.5
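To make the behavioral change concrete, here is a small Python model of the `irq_calc_affinity_vectors()` logic on both sides of the revert. The function name, the `possible_cpus` stand-in, and the example numbers are mine, not from the patch:

```python
def calc_affinity_vectors(minvec, maxvec, pre_vectors, post_vectors,
                          sets=None, possible_cpus=8):
    """Model of irq_calc_affinity_vectors() before/after the revert.

    With `sets` (the behavior being reverted), the usable vector count is
    capped by the sum of the per-set queue counts; without sets, it is
    capped by the number of possible CPUs, standing in for
    cpumask_weight(cpu_possible_mask).
    """
    resv = pre_vectors + post_vectors
    vecs = maxvec - resv
    if resv > minvec:
        # Not enough room for even the reserved pre/post vectors.
        return 0
    set_vecs = sum(sets) if sets else possible_cpus
    return resv + min(set_vecs, vecs)

# Two sets of 2 and 4 queues, pre/post of 1 each, 16 max vectors:
print(calc_affinity_vectors(2, 16, 1, 1, sets=[2, 4]))  # -> 8  (2 + min(6, 14))
# After the revert, only the CPU count caps the result:
print(calc_affinity_vectors(2, 16, 1, 1))               # -> 10 (2 + min(8, 14))
```

The point of the (reverted) sets mechanism was exactly this cap: a driver could split its vectors into independently affinitized groups rather than spreading them all as one block.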
Re: [GIT PULL] nvme fixes for 4.20
On 11/2/18 12:37 AM, Christoph Hellwig wrote:
> The following changes since commit a5185607787e030fcb0009194d3b12f8bcca59d6:
>
>   block: brd: associate with queue until adding disk (2018-10-31 08:43:09 -0600)
>
> are available in the Git repository at:
>
>   git://git.infradead.org/nvme.git nvme-4.20
>
> for you to fetch changes up to ae172db3b3f389c363ec7f3683b2cad41091580d:
>
>   nvme-pci: fix conflicting p2p resource adds (2018-11-01 08:44:47 +0200)
>
> James Smart (1):
>       nvme-fc: fix request private initialization
>
> Keith Busch (1):
>       nvme-pci: fix conflicting p2p resource adds
>
>  drivers/nvme/host/fc.c  | 2 +-
>  drivers/nvme/host/pci.c | 5 -
>  2 files changed, 5 insertions(+), 2 deletions(-)

Applied these manually, since I had rebased for-linus yesterday to drop a
buggy patch from Ming. JFYI.

--
Jens Axboe
Re: [PATCH v2] block: BFQ default for single queue devices
Hi.

On 16.10.2018 19:35, Jens Axboe wrote:
> Do you have anything more recent? All of these predate the current code
> (by a lot), and isn't even mq. I'm mostly just interested in plain fast
> NVMe device, and a big box hardware raid setup with a ton of drives.
>
> I do still think that this should be going through the distros, they need
> to be the ones driving this, as they will ultimately be the ones getting
> customer reports on regressions. The qual/test cycle they do is useful
> for this. In mainline, if we make a change like this, we'll figure out if
> it worked many releases down the line.

Some benchmarks here for a non-RAID setup, obtained with the S suite. This
is from a Lenovo T460s with a SAMSUNG MZNTY256HDHP-000L7 SSD. The v4.19
kernel is running with all recent BFQ patches applied.

# replayed gnome terminal startup throughput
# Workload          bfq    mq-deadline
  0r-raw_seq    13.2617        13.4867
  10r-raw_seq   512.507         539.95

# replayed gnome terminal startup time
# Workload          bfq    mq-deadline
  0r-raw_seq       0.43            0.4
  10r-raw_seq     0.685         4.1625

# replayed lowriter startup throughput
# Workload          bfq    mq-deadline
  0r-raw_seq      9.985         10.375
  10r-raw_seq    516.62         539.61

# replayed lowriter startup time
# Workload          bfq    mq-deadline
  0r-raw_seq        0.4         0.3875
  10r-raw_seq     0.535         2.3875

# replayed xterm startup throughput
# Workload          bfq    mq-deadline
  0r-raw_seq    5.93833        6.10834
  10r-raw_seq   524.447        539.991

# replayed xterm startup time
# Workload          bfq    mq-deadline
  0r-raw_seq       0.23           0.23
  10r-raw_seq      0.38           1.56

# throughput
# Workload          bfq    mq-deadline
  10r-raw_rand  362.446        363.817
  10r-raw_seq   537.646        540.609
  1r-raw_seq    500.733        502.526

Throughput-wise, BFQ is on par with mq-deadline. Latency-wise, BFQ is
much, much better.

--
Oleksandr Natalenko (post-factum)
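To put a number on "much better", the slowdown ratios under background load (the 10r-raw_seq rows of the startup-time tables) can be computed directly from the figures as posted:

```python
# Application startup times (seconds) with 10 parallel sequential readers,
# taken from the replayed-startup tables above.
times = {
    "gnome-terminal": {"bfq": 0.685, "mq-deadline": 4.1625},
    "lowriter":       {"bfq": 0.535, "mq-deadline": 2.3875},
    "xterm":          {"bfq": 0.38,  "mq-deadline": 1.56},
}

for app, t in times.items():
    ratio = t["mq-deadline"] / t["bfq"]
    print(f"{app}: mq-deadline is {ratio:.1f}x slower than bfq")
# e.g. gnome-terminal: mq-deadline is 6.1x slower than bfq
```

So under this workload the latency gap ranges from roughly 4x to 6x, while the throughput columns differ by only a few percent.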
[GIT PULL] nvme fixes for 4.20
The following changes since commit a5185607787e030fcb0009194d3b12f8bcca59d6:

  block: brd: associate with queue until adding disk (2018-10-31 08:43:09 -0600)

are available in the Git repository at:

  git://git.infradead.org/nvme.git nvme-4.20

for you to fetch changes up to ae172db3b3f389c363ec7f3683b2cad41091580d:

  nvme-pci: fix conflicting p2p resource adds (2018-11-01 08:44:47 +0200)

James Smart (1):
      nvme-fc: fix request private initialization

Keith Busch (1):
      nvme-pci: fix conflicting p2p resource adds

 drivers/nvme/host/fc.c  | 2 +-
 drivers/nvme/host/pci.c | 5 -
 2 files changed, 5 insertions(+), 2 deletions(-)