Re: [PATCH RFC 01/15] MIPS: replace **** with a hug

2018-11-30 Thread Mike Galbraith
On Fri, 2018-11-30 at 11:27 -0800, Jarkko Sakkinen wrote: > In order to comply with the CoC, replace with a hug. > > Signed-off-by: Jarkko Sakkinen > --- > arch/mips/pci/ops-bridge.c | 24 > arch/mips/sgi-ip22/ip22-setup.c | 2 +- > 2 files changed, 13

Re: memcg oops: memcg_kmem_charge_memcg()->try_charge()->page_counter_try_charge()->BOOM

2018-10-29 Thread Mike Galbraith
On Mon, 2018-10-29 at 21:49 +0000, Roman Gushchin wrote: > On Mon, Oct 29, 2018 at 09:46:54PM +0100, Mike Galbraith wrote: > > > Ah, I have cgroup_disable=memory on the command line, which turns out > > to be why your box doesn't explode, while mine does. > > Yeah, here

Re: memcg oops: memcg_kmem_charge_memcg()->try_charge()->page_counter_try_charge()->BOOM

2018-10-29 Thread Mike Galbraith
On Mon, 2018-10-29 at 18:54 +0000, Roman Gushchin wrote: > > Hi Mike! > > Thank you for the report! > > Do you see it reliable every time you boot up the machine? Yeah. > How do you run kvm? My VMs are full SW/data clones of my i7-4790/openSUSE box. > Is there something special about your

Re: memcg oops: memcg_kmem_charge_memcg()->try_charge()->page_counter_try_charge()->BOOM

2018-10-29 Thread Mike Galbraith
On Mon, 2018-10-29 at 14:20 +0100, Michal Hocko wrote: > > > [4.420976] Code: f3 c3 0f 1f 00 0f 1f 44 00 00 48 85 ff 0f 84 a8 00 00 > > 00 41 56 48 89 f8 41 55 49 89 fe 41 54 49 89 d5 55 49 89 f4 53 48 89 f3 > > 48 0f c1 1f 48 01 f3 48 39 5f 18 48 89 fd 73 17 eb 41 48 89 e8 > > [

Re: [PATCH RT 08/22] Revert "x86: UV: raw_spinlock conversion"

2018-09-06 Thread Mike Galbraith
On Thu, 2018-09-06 at 09:35 +0200, Sebastian Andrzej Siewior wrote: > On 2018-09-05 08:28:02 [-0400], Steven Rostedt wrote: > > 4.14.63-rt41-rc1 stable review patch. > > If anyone has any objections, please let me know. > > > > -- > > > > From: Sebastian Andrzej Siewior > > > >

Re: [regression/bisected] 4.19 cycle boot time IO stalls

2018-09-05 Thread Mike Galbraith
On Wed, 2018-09-05 at 07:39 -0600, Jens Axboe wrote: > > I bet it's the host busy change from Ming, which I already > reported as being the culprit for another test failure I had. For > some reason it's not merged yet, nudge nudge Martin. You can test > by reverting: > > commit

[regression/bisected] 4.19 cycle boot time IO stalls

2018-09-05 Thread Mike Galbraith
Greetings, I've been seeing $subject, decided to take the time to try to bisect the little bugger. The hangs are not 100% repeatable, and while bisection with a 5 boot go/nogo threshold seemed to go smoothly, it ended up fingering a merge commit (sigh). Box has an SSD (unused only by windows 10

Re: bisected - arm64 kvm unit test failures

2018-08-22 Thread Mike Galbraith
On Wed, 2018-08-22 at 14:50 +0100, Marc Zyngier wrote: > On 22/08/18 14:38, Mike Galbraith wrote: > > On Tue, 2018-08-21 at 16:34 +0100, Marc Zyngier wrote: > >> Could you give that patchlet[1] a go? It solves a similar issue for me > >> on a different pl

Re: bisected - arm64 kvm unit test failures

2018-08-22 Thread Mike Galbraith
On Tue, 2018-08-21 at 16:34 +0100, Marc Zyngier wrote: > Could you give that patchlet[1] a go? It solves a similar issue for me > on a different platform. > > [1] https://lists.cs.columbia.edu/pipermail/kvmarm/2018-August/032469.html Yup, all better. -Mike

Re: [BUG v4.14-rt] kernel BUG at /work/rt/stable-rt.git/kernel/sched/core.c:1639!

2018-08-19 Thread Mike Galbraith
On Sat, 2018-08-18 at 15:13 +0200, Mike Galbraith wrote: > seems it has be something from the 4.17 cycle that went back to 4.14- > stable after 4.1[56]-stable trees went extinct. See ("sched/core: Require cpu_active() in select_task_rq(), for user tasks") Fix it like so? sch

Re: [BUG v4.14-rt] kernel BUG at /work/rt/stable-rt.git/kernel/sched/core.c:1639!

2018-08-18 Thread Mike Galbraith
On Sat, 2018-08-18 at 12:29 +0200, Mike Galbraith wrote: > On Fri, 2018-08-17 at 16:23 -0400, Steven Rostedt wrote: > > Pulling in stable releases into v4.14-rt I triggered this with my CPU > > hotplug test: > > > > [ cut here ] > > ker

Re: [BUG v4.14-rt] kernel BUG at /work/rt/stable-rt.git/kernel/sched/core.c:1639!

2018-08-18 Thread Mike Galbraith
On Fri, 2018-08-17 at 16:23 -0400, Steven Rostedt wrote: > Pulling in stable releases into v4.14-rt I triggered this with my CPU > hotplug test: > > [ cut here ] > kernel BUG at /work/rt/stable-rt.git/kernel/sched/core.c:1639! > invalid opcode: [#1] PREEMPT SMP PTI >

[PATCH] rcu: Convert rcu_state.ofl_lock to raw_spinlock_t

2018-08-15 Thread Mike Galbraith
1e64b15a4b10 ("rcu: Fix grace-period hangs due to race with CPU offline") added spinlock_t ofl_lock to the rcu_state structure, then takes it with preemption disabled during CPU offline, giving RT sleeping lock heartburn. Convert it to raw_spinlock_t. Signed-off-by: Mike Galbraith -

Re: [PATCH] x86, kdump: Fix efi=noruntime NULL pointer dereference

2018-08-14 Thread Mike Galbraith
On Wed, 2018-08-15 at 11:59 +0800, Dave Young wrote: > > Does this improve things, and plug the no boot hole? > > Would you mind to tune my patch with some acpi_rsdp checking and add > some error message in case kexec load failure? Eg. suggest people to use > append acpi_rsdp for noefi booting

Re: [PATCH] x86, kdump: Fix efi=noruntime NULL pointer dereference

2018-08-10 Thread Mike Galbraith
a when a 1:1 mapping is available. Bail early with -ENODEV if not available, but is required to boot, and acpi_rsdp= was not passed on the command line. 3. Use the proper config dependency to isolate efi setup functions, adding a !EFI_RUNTIME_MAP stub for setup_efi_state(). 4. Change efi functions that

Re: [PATCH] x86, kdump: Fix efi=noruntime NULL pointer dereference

2018-08-10 Thread Mike Galbraith
On Fri, 2018-08-10 at 16:45 +0800, Dave Young wrote: > > BTW, this patch only fix the kexec load phase problem, even if kexec > load successfully with the fix, the 2nd kernel can not boot because efi > memmap info is not correct and usable. Hm. I didn't do anything else with kexec, but did

[PATCH] x86, kdump: Fix efi=noruntime NULL pointer dereference

2018-08-08 Thread Mike Galbraith
When booting with efi=noruntime, we call efi_runtime_map_copy() while loading the kdump kernel, and trip over a NULL efi.memmap.map. Avoid that and a useless allocation when the only mapping we can use (1:1) is not available. Signed-off-by: Mike Galbraith --- arch/x86/kernel/kexec-bzimage64.c

Re: [rt-patch 4/3] arm,KVM: Move phys_timer handling to hard irq context

2018-08-04 Thread Mike Galbraith
On Sat, 2018-08-04 at 14:25 +0200, Mike Galbraith wrote: > > Besides, there are more interesting fish in the arm64 sea than kvm. > > virgin 4.16.18-rt12-rt > > [ 537.236131] ITS queue timeout (65440 65504 4640) > [ 537.236150] ITS cmd its_build_inv_cmd failed

Re: [rt-patch 4/3] arm,KVM: Move phys_timer handling to hard irq context

2018-08-04 Thread Mike Galbraith
On Thu, 2018-08-02 at 19:43 +0200, Mike Galbraith wrote: > On Thu, 2018-08-02 at 18:50 +0200, Mike Galbraith wrote: > > On Thu, 2018-08-02 at 12:31 -0400, Steven Rostedt wrote: > > > On Thu, 02 Aug 2018 08:56:20 +0200 > > > Mike Galbraith wrote: > > > >

Re: [rt-patch 4/3] arm,KVM: Move phys_timer handling to hard irq context

2018-08-02 Thread Mike Galbraith
On Thu, 2018-08-02 at 18:50 +0200, Mike Galbraith wrote: > On Thu, 2018-08-02 at 12:31 -0400, Steven Rostedt wrote: > > On Thu, 02 Aug 2018 08:56:20 +0200 > > Mike Galbraith wrote: > > > > > (arm-land adventures 1/3 take2 will have to wait, my cup runeth over)

Re: [rt-patch 4/3] arm,KVM: Move phys_timer handling to hard irq context

2018-08-02 Thread Mike Galbraith
On Thu, 2018-08-02 at 12:31 -0400, Steven Rostedt wrote: > On Thu, 02 Aug 2018 08:56:20 +0200 > Mike Galbraith wrote: > > > (arm-land adventures 1/3 take2 will have to wait, my cup runeth over) > > > > v4.14..v4.15 timer handling changes including calling kvm_ti

[rt-patch 1/3 v2] arm64/acpi/perf: move pmu allocation to an early CPU up hook

2018-08-02 Thread Mike Galbraith
with the other CPUHP_PERF_{ARCH}_PREPARE stages, where we'll be preemptible, thus no longer requiring a GFP_ATOMIC allocation either. Signed-off-by: Mike Galbraith --- drivers/perf/arm_pmu_acpi.c | 12 ++-- include/linux/cpuhotplug.h |2 +- 2 files changed, 7 insertions(+), 7 deletions

Re: cpu stopper threads and setaffinity leads to deadlock

2018-08-02 Thread Mike Galbraith
On Thu, 2018-08-02 at 10:12 +0200, Peter Zijlstra wrote: > On Wed, Aug 01, 2018 at 06:34:40PM -0700, Sodagudi Prasad wrote: > > diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c > > index e190d1e..f932e1e 100644 > > --- a/kernel/stop_machine.c > > +++ b/kernel/stop_machine.c > > @@ -87,9

[rt-patch 4/3] arm,KVM: Move phys_timer handling to hard irq context

2018-08-02 Thread Mike Galbraith
est-vectors-kernel (2 tests) PASS selftest-vectors-user (2 tests) PASS selftest-smp (65 tests) PASS pci-test (1 tests) PASS pmu (3 tests) PASS gicv2-ipi (3 tests) PASS gicv3-ipi (3 tests) PASS gicv2-active (1 tests) PASS gicv3-active (1 tests) PASS psci (4 tests) PASS timer (8 tests) Signed-off-

Re: bisected - arm64 kvm unit test failures

2018-08-01 Thread Mike Galbraith
On Wed, 2018-08-01 at 08:22 +0100, Marc Zyngier wrote: > On Wed, 01 Aug 2018 07:02:25 +0100, > Mike Galbraith wrote: > > > > [1 ] > > On Wed, 2018-08-01 at 06:35 +0100, Marc Zyngier wrote: > > > > > > Is it something that is repr

Re: bisected - arm64 kvm unit test failures

2018-08-01 Thread Mike Galbraith
On Wed, 2018-08-01 at 08:22 +0100, Marc Zyngier wrote: > > > Box is a 4 node/64 core TaiShan 2280. > > Is that what is also known as D05/HIP07, with 64 Cortex-A72? No idea, our rent-a-box web client shows nothing informative. -Mike

Re: bisected - arm64 kvm unit test failures

2018-08-01 Thread Mike Galbraith
On Wed, 2018-08-01 at 06:35 +0100, Marc Zyngier wrote: > > Is it something that is reproducible with the current mainline (non-RT)? These waters are a bit muddy, it's config dependent. I'm trying to generate a reproducing !RT config for -rc7 as we speak. If I build openSUSE/master-default, it

bisected - arm64 kvm unit test failures

2018-07-31 Thread Mike Galbraith
On Mon, 2018-07-30 at 18:24 +0200, Mike Galbraith wrote: > On Sun, 2018-07-29 at 13:47 +0200, Mike Galbraith wrote: > > FYI, per kvm unit tests, 4.16-rt definitely has more kvm issues. But it's not RT, or rather most of it isn't... > > huawei5:/abuild/mike/kvm-unit-tests # uname

Re: candidates for @devel-rt localversion-rt++

2018-07-30 Thread Mike Galbraith
On Sun, 2018-07-29 at 13:47 +0200, Mike Galbraith wrote: > FYI, per kvm unit tests, 4.16-rt definitely has more kvm issues. > > huawei5:/abuild/mike/kvm-unit-tests # uname -r > 4.16.18-rt11-rt > huawei5:/abuild/mike/kvm-unit-tests # ./run_tests.sh > PASS selftest-setup (2 test

Re: [rt-patch 3/3] arm, KVM: convert vgic_irq.irq_lock to raw_spinlock_t

2018-07-30 Thread Mike Galbraith
On Mon, 2018-07-30 at 11:27 +0200, Peter Zijlstra wrote: > > The thing missing from the Changelog is the analysis that all the work > done under these locks is indeed properly bounded and cannot cause > excessive latencies. True, I have no idea what worst case hold times are. Nothing poked me

Re: candidates for @devel-rt localversion-rt++

2018-07-29 Thread Mike Galbraith
FYI, per kvm unit tests, 4.16-rt definitely has more kvm issues. huawei5:/abuild/mike/kvm-unit-tests # uname -r 4.16.18-rt11-rt huawei5:/abuild/mike/kvm-unit-tests # ./run_tests.sh PASS selftest-setup (2 tests) FAIL selftest-vectors-kernel FAIL selftest-vectors-user PASS selftest-smp (65 tests)

Re: candidates for @devel-rt localversion-rt++

2018-07-29 Thread Mike Galbraith
On Sat, 2018-07-28 at 11:07 +0200, Mike Galbraith wrote: > 1. arm64/acpi/perf: move pmu allocation to an early CPU up hook Nope, it's an ex-candidate. Having found/run kvm unit tests, I discovered that while the above fixes boot time splat, it somehow manages to break kvm pmu tests, so ne

[rt-patch 1/3] arm64/acpi/perf: move pmu allocation to an early CPU up hook

2018-07-28 Thread Mike Galbraith
. Signed-off-by: Mike Galbraith --- drivers/perf/arm_pmu_acpi.c |6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) --- a/drivers/perf/arm_pmu_acpi.c +++ b/drivers/perf/arm_pmu_acpi.c @@ -135,10 +135,10 @@ static struct arm_pmu *arm_pmu_acpi_find return pmu

candidates for @devel-rt localversion-rt++

2018-07-28 Thread Mike Galbraith
1. arm64/acpi/perf: move pmu allocation to an early CPU up hook 2. sched: Introduce raw_cond_resched_lock() 3. arm, KVM: convert vgic_irq.irq_lock to raw_spinlock_t With these applied, 4 socket TaiShan 2280 box boots shiny new -rt11 gripe free, and has been tossed into SUSE's kvm build-bot slave

[rt-patch 2/3] sched: Introduce raw_cond_resched_lock()

2018-07-28 Thread Mike Galbraith
Add raw_cond_resched_lock() infrastructure. Signed-off-by: Mike Galbraith --- include/linux/sched.h | 15 +++ kernel/sched/core.c | 20 2 files changed, 35 insertions(+) --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1779,12 +1779,18

[rt-patch 3/3] arm, KVM: convert vgic_irq.irq_lock to raw_spinlock_t

2018-07-28 Thread Mike Galbraith
>ap_list_lock must be taken with IRQs disabled kvm->lpi_list_lock must be taken with IRQs disabled vgic_irq->irq_lock must be taken with IRQs disabled ...meaning vgic_dist.lpi_list_lock and vgic_cpu.ap_list_lock must be converted as we

Re: [PATCH RT v3] arm64: fpsimd: use preemp_disable in addition to local_bh_disable()

2018-07-26 Thread Mike Galbraith
On Thu, 2018-07-26 at 17:06 +0200, Sebastian Andrzej Siewior wrote: > > @@ -1115,6 +1139,7 @@ void kernel_neon_begin(void) > > BUG_ON(!may_use_simd()); > > + preempt_disable(); > local_bh_disable(); > > __this_cpu_write(kernel_neon_busy, true); > @@ -1131,6 +1156,7 @@

Re: [PATCH RT v2] arm64: fpsimd: use a local_lock() in addition to local_bh_disable()

2018-07-18 Thread Mike Galbraith
See pseudo-patch below. That cures the reported gcc gripeage. On Sun, 2018-07-15 at 09:22 +0200, Mike Galbraith wrote: > On Sat, 2018-07-14 at 00:03 +0200, Mike Galbraith wrote: > > On Fri, 2018-07-13 at 19:49 +0200, Sebastian Andrzej Siewior wrote: > > > In v4.16-RT

Re: [PATCH RT v2] arm64: fpsimd: use a local_lock() in addition to local_bh_disable()

2018-07-18 Thread Mike Galbraith
On Wed, 2018-07-18 at 11:27 +0200, Sebastian Andrzej Siewior wrote: > On 2018-07-14 00:03:44 [+0200], Mike Galbraith wrote: > > > This seems to make work (crypto chacha20-neon + cyclictest). I have no > > > EFI so I have no clue if saving SIMD while calling to EFI works. >

Re: [PATCH RT v2] arm64: fpsimd: use a local_lock() in addition to local_bh_disable()

2018-07-15 Thread Mike Galbraith
On Sat, 2018-07-14 at 00:03 +0200, Mike Galbraith wrote: > On Fri, 2018-07-13 at 19:49 +0200, Sebastian Andrzej Siewior wrote: > > In v4.16-RT I noticed a number of warnings from task_fpsimd_load(). The > > code disables BH and expects that it is not preemptible. On -RT the &

Re: [PATCH RT v2] arm64: fpsimd: use a local_lock() in addition to local_bh_disable()

2018-07-13 Thread Mike Galbraith
On Fri, 2018-07-13 at 19:49 +0200, Sebastian Andrzej Siewior wrote: > In v4.16-RT I noticed a number of warnings from task_fpsimd_load(). The > code disables BH and expects that it is not preemptible. On -RT the > task remains preemptible but remains the same CPU. This may corrupt the > content of

Re: [PATCH 1/7] mm: allocate mm_cpumask dynamically based on nr_cpu_ids

2018-07-09 Thread Mike Galbraith
On Mon, 2018-07-09 at 17:38 -0400, Rik van Riel wrote: > > I added your code, and Signed-off-By in patch > 1 for version 5 of the series. No objection, but no need (like taking credit for fixing a typo:).

Re: [PATCH 1/7] mm: allocate mm_cpumask dynamically based on nr_cpu_ids

2018-07-08 Thread Mike Galbraith
BTW, a second gripe ala the first, but wrt mm_init_cpumask(_mm): In function ‘bitmap_zero’, inlined from ‘cpumask_clear’ at ./include/linux/cpumask.h:378:2, inlined from ‘mm_init_cpumask’ at ./include/linux/mm_types.h:504:2, inlined from ‘efi_alloc_page_tables’ at

Re: [PATCH 1/7] mm: allocate mm_cpumask dynamically based on nr_cpu_ids

2018-07-08 Thread Mike Galbraith
On Sat, 2018-07-07 at 17:25 -0400, Rik van Riel wrote: > > > ./include/linux/bitmap.h:208:3: warning: ‘memset’ writing 64 bytes > > into a region of size 0 overflows the destination [-Wstringop- > > overflow=] > >memset(dst, 0, len); > >^~~ > > I don't understand this

Re: [PATCH 5/7] x86,tlb: only send page table free TLB flush to lazy TLB CPUs

2018-07-07 Thread Mike Galbraith
(bah, I see I replied to wrong patch version, but it's still valid) On Sat, 2018-07-07 at 14:26 +0200, Mike Galbraith wrote: > On Fri, 2018-06-29 at 10:29 -0400, Rik van Riel wrote: > > diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c > > index e59214ec52b1..c4073367219d 100644

Re: [PATCH 5/7] x86,tlb: only send page table free TLB flush to lazy TLB CPUs

2018-07-07 Thread Mike Galbraith
On Fri, 2018-06-29 at 10:29 -0400, Rik van Riel wrote: > diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c > index e59214ec52b1..c4073367219d 100644 > --- a/arch/x86/mm/tlb.c > +++ b/arch/x86/mm/tlb.c > @@ -718,14 +718,47 @@ void tlb_flush_remove_tables_local(void *arg) > } > } > >

Re: [PATCH 1/7] mm: allocate mm_cpumask dynamically based on nr_cpu_ids

2018-07-07 Thread Mike Galbraith
On Fri, 2018-07-06 at 17:56 -0400, Rik van Riel wrote: > The mm_struct always contains a cpumask bitmap, regardless of > CONFIG_CPUMASK_OFFSTACK. That means the first step can be to > simplify things, and simply have one bitmask at the end of the > mm_struct for the mm_cpumask. Otherwise virgin

Re: [tip:x86/pti] x86/asm: Pad assembly functions with INT3 instructions

2018-06-17 Thread Mike Galbraith
On Sun, 2018-06-17 at 21:47 +0200, Borislav Petkov wrote: > On Sun, Jun 17, 2018 at 04:02:58PM +0200, Mike Galbraith wrote: > > (/me does that.. all better) > > > > From 6ac281ee69f4cb5b581d5f49662fb56b6326155a Mon Sep 17 00:00:00 2001 > > From: Borislav Petkov >

Re: [tip:x86/pti] x86/asm: Pad assembly functions with INT3 instructions

2018-06-17 Thread Mike Galbraith
On Sun, 2018-06-17 at 15:38 +0200, Mike Galbraith wrote: > On Sun, 2018-06-17 at 14:00 +0200, Borislav Petkov wrote: > > On Sun, Jun 17, 2018 at 01:40:13PM +0200, Mike Galbraith wrote: > > > On Mon, 2018-05-14 at 05:53 -0700, tip-bot for Alexey Dobriyan wrote:

Re: [tip:x86/pti] x86/asm: Pad assembly functions with INT3 instructions

2018-06-17 Thread Mike Galbraith
On Sun, 2018-06-17 at 14:00 +0200, Borislav Petkov wrote: > On Sun, Jun 17, 2018 at 01:40:13PM +0200, Mike Galbraith wrote: > > On Mon, 2018-05-14 at 05:53 -0700, tip-bot for Alexey Dobriyan wrote: > > > Commit-ID: 51bad67ffbce0aaa44579f84ef5d05597054ec6a > > >

Re: [tip:x86/pti] x86/asm: Pad assembly functions with INT3 instructions

2018-06-17 Thread Mike Galbraith
On Mon, 2018-05-14 at 05:53 -0700, tip-bot for Alexey Dobriyan wrote: > Commit-ID: 51bad67ffbce0aaa44579f84ef5d05597054ec6a > Gitweb: > https://git.kernel.org/tip/51bad67ffbce0aaa44579f84ef5d05597054ec6a > Author: Alexey Dobriyan > AuthorDate: Tue, 8 May 2018 00:37:55 +0300 > Committer:

v4.14.21+: ATOMIC_SLEEP splat bisected to 9428088c90b6 ("drm/qxl: reapply cursor after resetting primary")

2018-06-16 Thread Mike Galbraith
Greetings, Running a kernel with ATOMIC_SLEEP enabled in one of my VMs, I met the splat below. I tracked it back to 4.14-stable, and bisected it there. [ 35.748479] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:239 [ 37.302172] BUG: sleeping function called

[Fwd: avahi-daemon.service startup failure post kernel commit f396922d862a]

2018-06-13 Thread Mike Galbraith
Well, the folks at "To:" below apparently don't want bug reports from non-subscribers (no mediation, simply rejected). Posting here simply because it may save some other busy person a bisection. Forwarded Message ---- From: Mike Galbraith To: av...@lists.freedesktop.o

Re: [PATCH] Revert "debugfs: inode: debugfs_create_dir uses mode permission from parent"

2018-06-11 Thread Mike Galbraith
On Mon, 2018-06-11 at 11:12 -0700, Laura Abbott wrote: > On 06/11/2018 02:28 AM, Thomas Richter wrote: > > This reverts commit 95cde3c59966f6371b6bcd9e4e2da2ba64ee9775. > > It breaks the ioctl(KVM_CREATE_VM) interface. > > > > Can you elaborate a little more on how this breaks? Fedora has >

regression: "95cde3c59966 debugfs: inode: debugfs_create_dir uses mode permission from parent" terminally annoys libvirt

2018-06-08 Thread Mike Galbraith
Greetings, $subject bisected and verified via revert. Box is garden variety i4790, distro is openSUSE Leap 15.0. Error starting domain: internal error: process exited while connecting to monitor: ioctl(KVM_CREATE_VM) failed: 12 Cannot allocate memory 2018-06-08T03:18:00.453006Z

Re: [PATCH] x86,switch_mm: skip atomic operations for init_mm

2018-06-01 Thread Mike Galbraith
On Fri, 2018-06-01 at 13:03 -0700, Andy Lutomirski wrote: > > Mike, you never did say: do you have PCID on your CPU? Yes. > Also, what is > your workload doing to cause so many switches back and forth between > init_mm and a task. pipe-test measures pipe round trip, does nearly nothing but

Re: [PATCH] x86,switch_mm: skip atomic operations for init_mm

2018-06-01 Thread Mike Galbraith
On Fri, 2018-06-01 at 14:22 -0400, Rik van Riel wrote: > On Fri, 2018-06-01 at 08:11 -0700, Andy Lutomirski wrote: > > On Fri, Jun 1, 2018 at 5:28 AM Rik van Riel wrote: > > > > > > Song noticed switch_mm_irqs_off taking a lot of CPU time in recent > > > kernels,using 2.4% of a 48 CPU system

4.13..4.14 scheduling overhead regression (bisected - b956575bed91)

2018-06-01 Thread Mike Galbraith
Greetings, While dusting off regression testing trees, I noticed a substantial pipe-test dent at 4.14, and bisected it to b956575bed91. Log below. skew_tick=1 audit=0 nodelayacct cgroup_disable=memory nopti nospectre_v2 nospec_store_bypass_disable gov performance taskset 0xc pipe-test 1

Re: [PATCH] x86: UV: raw_spinlock conversion

2018-05-22 Thread Mike Galbraith
On Tue, 2018-05-22 at 11:46 +0200, Mike Galbraith wrote: > On Tue, 2018-05-22 at 11:14 +0200, Sebastian Andrzej Siewior wrote: > > > If you suggest that I > > should stop caring about UV than I do so. Please post a patch that adds > > a dependency to UV

Re: [PATCH] x86: UV: raw_spinlock conversion

2018-05-22 Thread Mike Galbraith
On Tue, 2018-05-22 at 11:14 +0200, Sebastian Andrzej Siewior wrote: > On 2018-05-22 10:24:22 [+0200], Mike Galbraith wrote: > > > If I were in your shoes, I think I'd just stop caring about UV until a > > real user appears. AFAIK, I'm the only guy who ever ran RT on UV, and

Re: [PATCH] x86: UV: raw_spinlock conversion

2018-05-22 Thread Mike Galbraith
On Tue, 2018-05-22 at 08:50 +0200, Sebastian Andrzej Siewior wrote: > > Regarding the preempt_disable() in the original patch in uv_read_rtc(): > This looks essential for PREEMPT configs. Is it possible to get this > tested by someone or else get rid of the UV code? It looks broken for >

Re: [PATCH] x86: UV: raw_spinlock conversion

2018-05-19 Thread Mike Galbraith
On Mon, 2018-05-07 at 09:39 +0200, Sebastian Andrzej Siewior wrote: > On 2018-05-06 12:59:19 [+0200], Mike Galbraith wrote: > > On Sun, 2018-05-06 at 12:26 +0200, Thomas Gleixner wrote: > > > On Fri, 4 May 2018, Sebastian Andrzej Siewior wrote: > > > > > >

Re: cpu stopper threads and load balancing leads to deadlock

2018-05-17 Thread Mike Galbraith
On Thu, 2018-05-17 at 07:03 -0700, Paul E. McKenney wrote: > On Tue, May 15, 2018 at 06:30:26AM +0200, Mike Galbraith wrote: > > > > Something like so perhaps? Mike, can you play around with that? Could > > > burn your granny and eat your cookies. > > > >

Re: cpu stopper threads and load balancing leads to deadlock

2018-05-14 Thread Mike Galbraith
On Thu, 2018-05-03 at 18:45 +0200, Peter Zijlstra wrote: > On Thu, May 03, 2018 at 09:12:31AM -0700, Paul E. McKenney wrote: > > On Thu, May 03, 2018 at 04:44:50PM +0200, Peter Zijlstra wrote: > > > On Thu, May 03, 2018 at 04:16:55PM +0200, Mike Galbraith wrote: > > > >

Re: [patch] swiotlb: fix ignored DMA_ATTR_NO_WARN request

2018-05-12 Thread Mike Galbraith
To conclude to this snail like thread (/me=walking wounded), with the v4.16.8 hunk below, traces showing that swiotlb_alloc_coherent() was being asked to not bother warning started showing up after the box had been flogged for a while. Whatever finally happens with swiotlb (seems to be in flux),

[patch] swiotlb: fix ignored DMA_ATTR_NO_WARN request

2018-05-11 Thread Mike Galbraith
170 [006] 963.866917: swiotlb_tbl_map_single+0x29b/0x2d0: swiotlb buffer is full (sz: 2097152 bytes) Signed-off-by: Mike Galbraith <efa...@gmx.de> --- lib/swiotlb.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) --- a/lib/swiotlb.c +++ b/lib/swiotlb.c @@ -714,7 +714,7

Re: kernel spew from nouveau/ swiotlb

2018-05-11 Thread Mike Galbraith
On Thu, 2018-05-10 at 12:28 +0200, Mike Galbraith wrote: > On Thu, 2018-05-10 at 11:10 +0200, Mike Galbraith wrote: > > Greetings, > > > > When box is earning its keep, nouveau/swiotlb grumble.. a LOT. The > > below is from master.today. > > > > [1259

Re: [Nouveau] kernel spew from nouveau/ swiotlb

2018-05-10 Thread Mike Galbraith
On Thu, 2018-05-10 at 17:31 +0200, Mike Galbraith wrote: > On Thu, 2018-05-10 at 10:31 -0400, Jerome Glisse wrote: > > > > Could you bisect ? I would love to point finger upstream to the DMA > > folk who made changes to that API without testing with GPU. > >

Re: [Nouveau] kernel spew from nouveau/ swiotlb

2018-05-10 Thread Mike Galbraith
On Thu, 2018-05-10 at 10:31 -0400, Jerome Glisse wrote: > > Could you bisect ? I would love to point finger upstream to the DMA > folk who made changes to that API without testing with GPU. Rummaging a bit, it might be... nouveau_bo_new() ... ttm_dma_pool_alloc_new_pages() dma_alloc_attrs()

Re: kernel spew from nouveau/ swiotlb

2018-05-10 Thread Mike Galbraith
On Thu, 2018-05-10 at 11:10 +0200, Mike Galbraith wrote: > Greetings, > > When box is earning its keep, nouveau/swiotlb grumble.. a LOT. The > below is from master.today. > > [12594.640959] nouveau 0000:01:00.0: swiotlb buffer is full (sz: 2097152 > bytes) > [12594.6930

kernel spew from nouveau/ swiotlb

2018-05-10 Thread Mike Galbraith
Greetings, When box is earning its keep, nouveau/swiotlb grumble.. a LOT. The below is from master.today. [12594.640959] nouveau 0000:01:00.0: swiotlb buffer is full (sz: 2097152 bytes) [12594.693000] nouveau 0000:01:00.0: swiotlb buffer is full (sz: 2097152 bytes) [12594.713787] nouveau

Re: bug in tag handling in blk-mq?

2018-05-09 Thread Mike Galbraith
On Wed, 2018-05-09 at 13:50 -0600, Jens Axboe wrote: > On 5/9/18 12:31 PM, Mike Galbraith wrote: > > On Wed, 2018-05-09 at 11:01 -0600, Jens Axboe wrote: > >> On 5/9/18 10:57 AM, Mike Galbraith wrote: > >> > >>>>> Confirmed. Impressive high speed bug s

Re: bug in tag handling in blk-mq?

2018-05-09 Thread Mike Galbraith
On Wed, 2018-05-09 at 11:01 -0600, Jens Axboe wrote: > On 5/9/18 10:57 AM, Mike Galbraith wrote: > > >>> Confirmed. Impressive high speed bug stomping. > >> > >> Well, that's good news. Can I get you to try this patch? > > > > Sure thing. T

Re: bug in tag handling in blk-mq?

2018-05-09 Thread Mike Galbraith
On Wed, 2018-05-09 at 09:18 -0600, Jens Axboe wrote: > On 5/8/18 10:11 PM, Mike Galbraith wrote: > > On Tue, 2018-05-08 at 19:09 -0600, Jens Axboe wrote: > >> > >> Alright, I managed to reproduce it. What I think is happening is that > >> BFQ is limiting the

Re: bug in tag handling in blk-mq?

2018-05-08 Thread Mike Galbraith
On Tue, 2018-05-08 at 14:37 -0600, Jens Axboe wrote: > > - sdd has nothing pending, yet has 6 active waitqueues. sdd is where ccache storage lives, which that should have been the only activity on that drive, as I built source in sdb, and was doing nothing else that utilizes sdd. -Mike

Re: bug in tag handling in blk-mq?

2018-05-08 Thread Mike Galbraith
On Tue, 2018-05-08 at 19:09 -0600, Jens Axboe wrote: > > Alright, I managed to reproduce it. What I think is happening is that > BFQ is limiting the inflight case to something less than the wake > batch for sbitmap, which can lead to stalls. I don't have time to test > this tonight, but perhaps

Re: bug in tag handling in blk-mq?

2018-05-08 Thread Mike Galbraith
On Tue, 2018-05-08 at 08:55 -0600, Jens Axboe wrote: > > All the block debug files are empty... Sigh. Take 2, this time cat debug files, having turned block tracing off before doing anything else (so trace bits in dmesg.txt should end AT the stall). -Mike dmesg.xz Description:

Re: bug in tag handling in blk-mq?

2018-05-08 Thread Mike Galbraith
On Tue, 2018-05-08 at 06:51 +0200, Mike Galbraith wrote: > > I'm deadlined ATM, but will get to it. (Bah, even a zombie can type ccache -C; make -j8 and stare...) kbuild again hung on the first go (yay), and post hang data written to sdd1 survived (kernel source lives in sdb3). Full

Re: bug in tag handling in blk-mq?

2018-05-07 Thread Mike Galbraith
On Mon, 2018-05-07 at 20:02 +0200, Paolo Valente wrote: > > > > Is there a reproducer? Just building fat config kernels works for me. It was highly non-deterministic, but reproduced quickly twice in a row with Paolos hack. > Ok Mike, I guess it's your turn now, for at least a stack trace.

Re: [PATCH BUGFIX] block, bfq: postpone rq preparation to insert or merge

2018-05-07 Thread Mike Galbraith
On Mon, 2018-05-07 at 11:27 +0200, Paolo Valente wrote: > > > Where is the bug? Hm, seems potent pain-killers and C don't mix all that well.

Re: [PATCH] x86: UV: raw_spinlock conversion

2018-05-07 Thread Mike Galbraith
On Mon, 2018-05-07 at 09:39 +0200, Sebastian Andrzej Siewior wrote: > On 2018-05-06 12:59:19 [+0200], Mike Galbraith wrote: > > On Sun, 2018-05-06 at 12:26 +0200, Thomas Gleixner wrote: > > > On Fri, 4 May 2018, Sebastian Andrzej Siewior wrote: > > > > > >

Re: [PATCH BUGFIX] block, bfq: postpone rq preparation to insert or merge

2018-05-06 Thread Mike Galbraith
On Sun, 2018-05-06 at 09:42 +0200, Paolo Valente wrote: > > diff --git a/block/bfq-mq-iosched.c b/block/bfq-mq-iosched.c > index 118f319af7c0..6662efe29b69 100644 > --- a/block/bfq-mq-iosched.c > +++ b/block/bfq-mq-iosched.c > @@ -525,8 +525,13 @@ static void bfq_limit_depth(unsigned int op,

Re: [PATCH BUGFIX] block, bfq: postpone rq preparation to insert or merge

2018-05-06 Thread Mike Galbraith
On Mon, 2018-05-07 at 04:43 +0200, Mike Galbraith wrote: > On Sun, 2018-05-06 at 09:42 +0200, Paolo Valente wrote: > > > > I've attached a compressed patch (to avoid possible corruption from my > > mailer). I'm little confident, but no pain, no gain, right? > > >

Re: [PATCH BUGFIX] block, bfq: postpone rq preparation to insert or merge

2018-05-06 Thread Mike Galbraith
On Sun, 2018-05-06 at 09:42 +0200, Paolo Valente wrote: > > I've attached a compressed patch (to avoid possible corruption from my > mailer). I'm little confident, but no pain, no gain, right? > > If possible, apply this patch on top of the fix I proposed in this > thread, just to eliminate

Re: [PATCH] x86: UV: raw_spinlock conversion

2018-05-06 Thread Mike Galbraith
On Sun, 2018-05-06 at 12:26 +0200, Thomas Gleixner wrote: > On Fri, 4 May 2018, Sebastian Andrzej Siewior wrote: > > > From: Mike Galbraith <umgwanakikb...@gmail.com> > > > > Shrug. Lots of hobbyists have a beast in their basement, right? > > This hardly qua

Re: [PATCH BUGFIX] block, bfq: postpone rq preparation to insert or merge

2018-05-05 Thread Mike Galbraith
On Sat, 2018-05-05 at 12:39 +0200, Paolo Valente wrote: > > BTW, if you didn't run out of patience with this permanent issue yet, > I was thinking of two o three changes to retry to trigger your failure > reliably. Sure, fire away, I'll happily give the annoying little bugger opportunities to

Re: [PATCH BUGFIX] block, bfq: postpone rq preparation to insert or merge

2018-05-05 Thread Mike Galbraith
On Fri, 2018-05-04 at 21:46 +0200, Mike Galbraith wrote: > Tentatively, I suspect you've just fixed the nasty stalls I reported a > while back. Oh well, so much for optimism. It took a lot, but just hung.

Re: [PATCH BUGFIX] block, bfq: postpone rq preparation to insert or merge

2018-05-04 Thread Mike Galbraith
Tentatively, I suspect you've just fixed the nasty stalls I reported a while back. Not a hint of stall as yet (should have shown itself by now), spinning rust buckets are being all they can be, box feels good. Later mq-deadline (I hope to eventually forget the module dependency eternities we've

[patch-rt] sched,fair: Fix CFS bandwidth control lockdep DEADLOCK report

2018-05-04 Thread Mike Galbraith
__hrtimer_run_queues+0x10e/0x5f0 hrtimer_run_softirq+0x83/0xc0 do_current_softirqs+0x292/0x660 run_ksoftirqd+0x27/0x70 smpboot_thread_fn+0x27f/0x330 kthread+0x103/0x140 ? smpboot_register_percpu_thread_cpumask+0x100/0x100 ? kthread_delayed_work_timer_fn+0x90/0x90 ret_from_fork+0x3a/0x50 Sig

Re: cpu stopper threads and load balancing leads to deadlock

2018-05-03 Thread Mike Galbraith
On Thu, 2018-05-03 at 18:45 +0200, Peter Zijlstra wrote: > > Something like so perhaps? Mike, can you play around with that? Could > burn your granny and eat your cookies. That worked, and nothing entertaining has happened.. yet. Hm, I could use this kernel to update my backup drive, if there's

Re: cpu stopper threads and load balancing leads to deadlock

2018-05-03 Thread Mike Galbraith
On Thu, 2018-05-03 at 15:56 +0200, Peter Zijlstra wrote: > On Thu, May 03, 2018 at 03:32:39PM +0200, Mike Galbraith wrote: > > > Dang. With $subject fix applied as well.. > > That's a NO then... :-( Could say who cares about oddball offline wakeup stat.

Re: cpu stopper threads and load balancing leads to deadlock

2018-05-03 Thread Mike Galbraith
On Thu, 2018-05-03 at 14:49 +0200, Peter Zijlstra wrote: > On Thu, May 03, 2018 at 02:40:21PM +0200, Mike Galbraith wrote: > > On Thu, 2018-05-03 at 14:28 +0200, Peter Zijlstra wrote: > > > > > > Hurm.. I don't see how this is 'new'. We moved the wakeup out fr

Re: cpu stopper threads and load balancing leads to deadlock

2018-05-03 Thread Mike Galbraith
On Thu, 2018-05-03 at 14:28 +0200, Peter Zijlstra wrote: > > Hurm.. I don't see how this is 'new'. We moved the wakeup out from under > stopper lock, but that should not affect the RCU state. No, not new, just additional woes from the same spot. -Mike

Re: cpu stopper threads and load balancing leads to deadlock

2018-05-03 Thread Mike Galbraith
On Tue, 2018-04-24 at 14:33 +0100, Matt Fleming wrote: > On Fri, 20 Apr, at 11:50:05AM, Peter Zijlstra wrote: > > On Tue, Apr 17, 2018 at 03:21:19PM +0100, Matt Fleming wrote: > > > Hi guys, > > > > > > We've seen a bug in one of our SLE kernels where the cpu stopper > > > thread ("migration/15")

Re: [PATCH v7 2/5] cpuset: Add cpuset.sched_load_balance to v2

2018-05-02 Thread Mike Galbraith
On Wed, 2018-05-02 at 16:02 +0200, Peter Zijlstra wrote: > On Wed, May 02, 2018 at 09:47:00AM -0400, Waiman Long wrote: > > > > I've read half of the next patch that adds the isolation thing. And > > > while that kludges around the whole root cgroup is magic thing, it > > > doesn't help if you

Re: [RFC/RFT patch 0/7] timekeeping: Unify clock MONOTONIC and clock BOOTTIME

2018-04-26 Thread Mike Galbraith
On Wed, 2018-04-25 at 15:03 +0200, Thomas Gleixner wrote: > Right, it does not matter. The real interesting one is d6ed449afdb3. FWIW, three boxen here suspend/resume fine, but repeatably exhibit the below after a very few minute suspend, and a short bisect fingered your suspect. Distro is

Re: DOS by unprivileged user

2018-04-25 Thread Mike Galbraith
On Wed, 2018-04-25 at 15:54 +0100, Alan Cox wrote: > > Classical Unix systems never had this problem because they respond to > thrashing by ensuring that all processes consumed CPU and made some > progress. Linux handles it by thrashing itself to death while BSD always > handled it by moving

Re: DOS by unprivileged user

2018-04-25 Thread Mike Galbraith
On Wed, 2018-04-25 at 15:54 +0100, Alan Cox wrote: > > > I think memory allocation and io waits can't be decoupled from > > > scheduling as they are now. > > > > The scheduler is not decoupled from either, it is intimately involved > > in both. However, none of the decision making smarts for

Re: [PATCH] sched: fix typo in error message

2018-04-25 Thread Mike Galbraith
On Wed, 2018-04-25 at 13:41 +0800, Li Bin wrote: > Signed-off-by: Li Bin > --- > kernel/sched/topology.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c > index 64cc564..cf15c1c 100644 > ---

Re: DOS by unprivileged user

2018-04-23 Thread Mike Galbraith
On Sun, 2018-04-22 at 21:37 +0200, Ferry Toth wrote: > > Yes your memory hog scenario thoroughly wrecks the user experience, but > > the process scheduler is not the source of that wreckage, it's a memory > > management issue. With no constraints in place, anybody can just keep > > on allocating
