Re: BUG_ON(!newowner) in fixup_pi_state_owner()

2020-11-04 Thread Mike Galbraith
On Wed, 2020-11-04 at 14:26 +0100, Thomas Gleixner wrote: > > I'll post that with a proper comment and changelog. Mike, can I add your > Signed-off-by to the thing? I suppose. -Mike

Re: BUG_ON(!newowner) in fixup_pi_state_owner()

2020-11-03 Thread Mike Galbraith
On Wed, 2020-11-04 at 01:56 +0100, Mike Galbraith wrote: > On Tue, 2020-11-03 at 17:31 -0600, Gratian Crisan wrote: > > Hi all, > > > > I apologize for waking up the futex demons (and replying to my own > > email), but ... > > > > Gratian Crisan writ

Re: BUG_ON(!newowner) in fixup_pi_state_owner()

2020-11-03 Thread Mike Galbraith
On Tue, 2020-11-03 at 17:31 -0600, Gratian Crisan wrote: > Hi all, > > I apologize for waking up the futex demons (and replying to my own > email), but ... > > Gratian Crisan writes: > > > > Brandon and I have been debugging a nasty race that leads to > > BUG_ON(!newowner) in

Re: v5.8+ powersave governor breakage?

2020-11-01 Thread Mike Galbraith
On Sun, 2020-11-01 at 17:23 +0100, Mike Galbraith wrote: > Greetings, > > As you can see in the data below, my i4790 box used to default to the > powersave governor despite CONFIG_CPU_FREQ_DEFAULT_GOV_ONDEMAND=y, and > disallowed switching to ondemand. Ok, my HP lappy running mas

v5.8+ powersave governor breakage?

2020-11-01 Thread Mike Galbraith
Greetings, As you can see in the data below, my i4790 box used to default to the powersave governor despite CONFIG_CPU_FREQ_DEFAULT_GOV_ONDEMAND=y, and disallowed switching to ondemand. Post 5.8, powersave locks in at the lowest freq available, and while ondemand becomes the default, it takes a
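
The governor state described in this report can be read back from sysfs. Below is a minimal sketch, assuming the standard cpufreq layout (`/sys/devices/system/cpu/cpuN/cpufreq/scaling_governor`); the function name `read_governors` and its `sysfs_root` parameter are illustrative and do not come from the original message.

```python
from pathlib import Path


def read_governors(sysfs_root="/sys/devices/system/cpu"):
    """Return {cpu_name: active_governor} for every CPU exposing cpufreq.

    sysfs_root is a parameter only so the function can be pointed at a
    fake directory tree for testing (an assumption of this sketch).
    """
    governors = {}
    pattern = "cpu[0-9]*/cpufreq/scaling_governor"
    for gov_file in sorted(Path(sysfs_root).glob(pattern)):
        cpu_name = gov_file.parent.parent.name  # e.g. "cpu0"
        governors[cpu_name] = gov_file.read_text().strip()
    return governors


if __name__ == "__main__":
    # On a box without cpufreq support this simply prints nothing.
    for cpu, governor in read_governors().items():
        print(cpu, governor)
```

Comparing the output before and after a kernel upgrade would show whether the default governor changed, which is the behaviour the report is about.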

ltp::mmap05 --> BUG: using __this_cpu_read() in preemptible

2020-10-30 Thread Mike Galbraith
[ 138.620544] BUG: using __this_cpu_read() in preemptible [] code: mmap05/4858 [ 138.620737] caller is lockdep_hardirqs_on_prepare+0x2f/0x1b0 [ 138.620880] CPU: 2 PID: 4858 Comm: mmap05 Kdump: loaded Tainted: G S E 5.10.0.g07e0887-master #18 [ 138.621097] Hardware name:

Re: kvm+nouveau induced lockdep gripe

2020-10-27 Thread Mike Galbraith
On Tue, 2020-10-27 at 11:18 +0100, Sebastian Andrzej Siewior wrote: > On 2020-10-27 11:14:34 [+0100], Mike Galbraith wrote: > > On Tue, 2020-10-27 at 10:00 +0100, Sebastian Andrzej Siewior wrote: > > > Let me try if I can figure out when this broke. > > > > My mo

Re: kvm+nouveau induced lockdep gripe

2020-10-27 Thread Mike Galbraith
On Tue, 2020-10-27 at 10:00 +0100, Sebastian Andrzej Siewior wrote: > Let me try if I can figure out when this broke. My money is on... 710da3c8ea7df (Juri Lelli 2019-07-19 16:00:00 +0200 5317) if (pi) 710da3c8ea7df (Juri Lelli 2019-07-19 16:00:00 +0200 5318)

Re: kvm+nouveau induced lockdep gripe

2020-10-27 Thread Mike Galbraith
On Tue, 2020-10-27 at 10:00 +0100, Sebastian Andrzej Siewior wrote: > On 2020-10-27 07:03:38 [+0100], Mike Galbraith wrote: > > On Mon, 2020-10-26 at 20:53 +0100, Sebastian Andrzej Siewior wrote: > > > > > > Could you try this, please? > > > > Nogo, fi

5.10-rc1 xhci lockdep splat: pin_fs_lock vs xhci->lock HARDIRQ-safe -> HARDIRQ-unsafe lock order detected

2020-10-27 Thread Mike Galbraith
FYI, $subject is followed by a might sleep splat. Box is an aging HP Spectre X360 lappy. [5.987129] = [5.987133] WARNING: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected [5.987138] 5.10.0.g3650b22-master #15 Tainted: G I

Re: kvm+nouveau induced lockdep gripe

2020-10-27 Thread Mike Galbraith
On Mon, 2020-10-26 at 20:53 +0100, Sebastian Andrzej Siewior wrote: > > Could you try this, please? Nogo, first call of sched_setscheduler() is via kthread_create(). I confirmed that nuking that (gratuitous user foot saving override) call on top of moving sched_set_fifo() does shut it up, but

Re: kvm+nouveau induced lockdep gripe

2020-10-26 Thread Mike Galbraith
On Mon, 2020-10-26 at 18:31 +0100, Sebastian Andrzej Siewior wrote: > On 2020-10-24 13:00:00 [+0800], Hillf Danton wrote: > > > > Hmm...curious how that word went into your mind. And when? > > > [ 30.457363] > > >other info that might help us debug this: > > > [ 30.457369]

Re: kvm+nouveau induced lockdep gripe

2020-10-23 Thread Mike Galbraith
On Sat, 2020-10-24 at 13:00 +0800, Hillf Danton wrote: > On Sat, 24 Oct 2020 05:38:23 +0200 Mike Galbraith wrote: > > On Sat, 2020-10-24 at 10:22 +0800, Hillf Danton wrote: > > > > > > Looks like we can break the lock chain by moving ttm bo's release > > > met

Re: kvm+nouveau induced lockdep gripe

2020-10-23 Thread Mike Galbraith
On Sat, 2020-10-24 at 10:22 +0800, Hillf Danton wrote: > > Looks like we can break the lock chain by moving ttm bo's release > method out of mmap_lock, see diff below. Ah, the perfect complement to morning java, a patchlet to wedge in and see what happens. wedge/build/boot Mmm, box says no

Re: kvm+nouveau induced lockdep gripe

2020-10-23 Thread Mike Galbraith
On Fri, 2020-10-23 at 11:01 +0200, Sebastian Andrzej Siewior wrote: > On 2020-10-22 07:28:20 [+0200], Mike Galbraith wrote: > > I've only as yet seen nouveau lockdep gripage when firing up one of my > > full distro KVM's. > > Could you please check !RT with the `threadirqs' c

kvm+nouveau induced lockdep gripe

2020-10-21 Thread Mike Galbraith
I've only as yet seen nouveau lockdep gripage when firing up one of my full distro KVM's. [ 91.655613] == [ 91.655614] WARNING: possible circular locking dependency detected [ 91.655614] 5.9.1-rt18-rt #5 Tainted: G S E [

ltp or kvm triggerable lockdep alloc_pid() deadlock gripe

2020-10-21 Thread Mike Galbraith
Greetings, The gripe below is repeatable in two ways here, boot with nomodeset so nouveau doesn't steal the lockdep show when I then fire up one of my (oink) full distro VM's, or from an ltp directory ./runltp -f cpuset with the attached subset of controllers file placed in ./runtest dir.

block, bfq: lockdep circular locking dependency gripe

2020-10-20 Thread Mike Galbraith
[ 1917.361401] == [ 1917.361406] WARNING: possible circular locking dependency detected [ 1917.361413] 5.9.0.g7cf726a-master #2 Tainted: G S E [ 1917.361417] -- [ 1917.361422]

Re: [PATCH v3 7/7] zram: Use local lock to protect per-CPU data

2020-10-18 Thread Mike Galbraith
On Sun, 2020-10-18 at 19:52 -0600, Yu Zhao wrote: > On Wed, May 27, 2020 at 10:11:19PM +0200, Sebastian Andrzej Siewior wrote: > > From: Mike Galbraith > > > > The zcomp driver uses per-CPU compression. The per-CPU data pointer is > > acquired with get_cpu_ptr(

Re: [RT] 5.9-rt14 softirq_ctrl.lock vs listening_hash[i].lock lockdep splat

2020-10-12 Thread Mike Galbraith
On Mon, 2020-10-12 at 20:34 +0200, Mike Galbraith wrote: > On Mon, 2020-10-12 at 18:45 +0200, Sebastian Andrzej Siewior wrote: > > On 2020-10-10 06:31:57 [+0200], Mike Galbraith wrote: > > > > so this then. Do you have more of these? > > Nope Well, I do have a grip

Re: [RT] 5.9-rt14 softirq_ctrl.lock vs listening_hash[i].lock lockdep splat

2020-10-12 Thread Mike Galbraith
On Mon, 2020-10-12 at 18:45 +0200, Sebastian Andrzej Siewior wrote: > On 2020-10-10 06:31:57 [+0200], Mike Galbraith wrote: > > so this then. Do you have more of these? Nope, nothing was hiding behind it, all better now.

[RT] 5.9-rt14 softirq_ctrl.lock vs listening_hash[i].lock lockdep splat

2020-10-09 Thread Mike Galbraith
[ 47.844511] == [ 47.844511] WARNING: possible circular locking dependency detected [ 47.844512] 5.9.0.gc85fb28-rt14-rt #1 Tainted: GE [ 47.844513] -- [ 47.844514]

nouveau: BUG: Invalid wait context

2020-09-09 Thread Mike Galbraith
Greetings, Box is an aging generic i4790 + GTX-980 desktop. [ 1143.133663] = [ 1143.133666] [ BUG: Invalid wait context ] [ 1143.133671] 5.9.0.g34d4ddd-preempt #2 Tainted: G S E [ 1143.133675] - [ 1143.133678] X/2015 is trying to

Re: [ANNOUNCE] v5.9-rc3-rt3

2020-09-09 Thread Mike Galbraith
On Wed, 2020-09-09 at 10:20 +0200, Sebastian Andrzej Siewior wrote: > > Do you see the lockdep splat without nouveau? Yeah. Lappy uses i915, but lockdep also shuts itself off. BTW, methinks RT had nothing to do with the nouveau burp. -Mike

Re: [ANNOUNCE] v5.9-rc3-rt3

2020-09-08 Thread Mike Galbraith
On Wed, 2020-09-09 at 05:12 +0200, Mike Galbraith wrote: > On Wed, 2020-09-02 at 17:55 +0200, Sebastian Andrzej Siewior wrote: > > > > Known issues > > - It has been pointed out that due to changes to the printk code the > >internal buffer representation cha

Re: [ANNOUNCE] v5.9-rc3-rt3

2020-09-08 Thread Mike Galbraith
On Wed, 2020-09-09 at 05:12 +0200, Mike Galbraith wrote: > On Wed, 2020-09-02 at 17:55 +0200, Sebastian Andrzej Siewior wrote: > > > > Known issues > > - It has been pointed out that due to changes to the printk code the > >internal buffer representation cha

Re: [ANNOUNCE] v5.9-rc3-rt3

2020-09-08 Thread Mike Galbraith
On Wed, 2020-09-02 at 17:55 +0200, Sebastian Andrzej Siewior wrote: > > Known issues > - It has been pointed out that due to changes to the printk code the >internal buffer representation changed. This is only an issue if tools >like `crash' are used to extract the printk

Re: v5.9-rc3-rt3 boot time networking lockdep splat

2020-09-08 Thread Mike Galbraith
On Tue, 2020-09-08 at 17:06 +0200, Sebastian Andrzej Siewior wrote: > > This should cure it: It did. -Mike

Re: v5.9-rc3-rt3 boot time networking lockdep splat

2020-09-08 Thread Mike Galbraith
On Tue, 2020-09-08 at 14:19 +0200, Sebastian Andrzej Siewior wrote: > > This has nothing to do with the bridge but with the fact that you use a > non standard queue class (something else than pfifo_fast). That must be SUSE, I don't muck about in network land. I downloaded a whole library of RFCs

Re: v5.9-rc3-rt3 boot time networking lockdep splat

2020-09-08 Thread Mike Galbraith
On Tue, 2020-09-08 at 17:12 +0200, Sebastian Andrzej Siewior wrote: > On 2020-09-05 07:19:10 [+0200], Mike Galbraith wrote: > > Lappy, which does not use bridge, boots clean... but lock leakage > > pretty darn quickly inspires lockdep to crap its drawers. > >

Re: v5.9-rc3-rt3 boot time networking lockdep splat

2020-09-08 Thread Mike Galbraith
On Tue, 2020-09-08 at 17:06 +0200, Sebastian Andrzej Siewior wrote: > On 2020-09-08 16:56:20 [+0200], Mike Galbraith wrote: > > On Tue, 2020-09-08 at 14:19 +0200, Sebastian Andrzej Siewior wrote: > > > > > > This has nothing to do with the bridge but with the fact that y

Re: v5.9-rc3-rt3 boot time networking lockdep splat

2020-09-04 Thread Mike Galbraith
Lappy, which does not use bridge, boots clean... but lock leakage pretty darn quickly inspires lockdep to crap its drawers. [ 209.00] BUG: MAX_LOCKDEP_CHAIN_HLOCKS too low! [ 209.001113] turning off the locking correctness validator. [ 209.001114] CPU: 2 PID: 3773 Comm: Socket Thread

v5.9-rc3-rt3 boot time networking lockdep splat

2020-09-04 Thread Mike Galbraith
[ 22.004225] r8169 :03:00.0 eth0: Link is Up - 1Gbps/Full - flow control off [ 22.004450] br0: port 1(eth0) entered blocking state [ 22.004473] br0: port 1(eth0) entered forwarding state [ 22.006411] IPv6: ADDRCONF(NETDEV_CHANGE): br0: link becomes ready [ 22.024936]

[tip: locking/core] zram: Use local lock to protect per-CPU data

2020-06-01 Thread tip-bot2 for Mike Galbraith
The following commit has been merged into the locking/core branch of tip: Commit-ID: 19f545b6e07f753c4dc639c2f0ab52345733b6a8 Gitweb: https://git.kernel.org/tip/19f545b6e07f753c4dc639c2f0ab52345733b6a8 Author:Mike Galbraith AuthorDate:Wed, 27 May 2020 22:11:19 +02:00

[tip: locking/core] connector/cn_proc: Protect send_msg() with a local lock

2020-06-01 Thread tip-bot2 for Mike Galbraith
The following commit has been merged into the locking/core branch of tip: Commit-ID: 3e92fd7bd2b8418b53cb7304855b8b69bedbe2b4 Gitweb: https://git.kernel.org/tip/3e92fd7bd2b8418b53cb7304855b8b69bedbe2b4 Author:Mike Galbraith AuthorDate:Wed, 27 May 2020 22:11:17 +02:00

Re: [PATCH RT 4/2] hrtimer: Don't lose state in cpu_chill()

2019-02-25 Thread Mike Galbraith
On Mon, 2019-02-25 at 17:34 +0100, Sebastian Andrzej Siewior wrote: > On 2019-02-25 15:43:35 [+0100], Mike Galbraith wrote: > > Hi Sebastian, > Hi Mike, > > > My box claims that this patch is busted. It argues its case by IO > > deadlocking any kernel this patch is ap

Re: [PATCH RT 4/2] hrtimer: Don't lose state in cpu_chill()

2019-02-25 Thread Mike Galbraith
Hi Sebastian, My box claims that this patch is busted. It argues its case by IO deadlocking any kernel this patch is applied to when spinning rust is flogged, including virgin 4.19-rt14, said kernel becoming stable again when I whack the accused. On Tue, 2019-02-19 at 17:08 +0100, Sebastian

Re: Kernel panics with recent 4.9 Kernels

2019-02-07 Thread Mike Galbraith
On Fri, 2019-02-08 at 04:09 +0100, Mike Galbraith wrote: > On Thu, 2019-02-07 at 20:13 +0100, Michael Brunnbauer wrote: > > hi, > > > > no replies to my mail. Any suggestions what I should do or where I > > could ask for help? > > I'd suggest trying to capture

Re: Kernel panics with recent 4.9 Kernels

2019-02-07 Thread Mike Galbraith
On Thu, 2019-02-07 at 20:13 +0100, Michael Brunnbauer wrote: > hi, > > no replies to my mail. Any suggestions what I should do or where I > could ask for help? I'd suggest trying to capture events with something better than an incomplete screenshot, ie serial console, netconsole, anything that

Re: 5.0-rc1 KVM inspired "BUG: Bad page state in process" spew

2019-01-14 Thread Mike Galbraith
On Mon, 2019-01-14 at 07:46 -0800, Sean Christopherson wrote: > On Sat, Jan 12, 2019 at 07:43:02AM +0100, Mike Galbraith wrote: > > On Wed, 2019-01-09 at 11:26 -0800, Sean Christopherson wrote: > > > > > > I'll try to bisect. > > > > Good luck with tha

Re: 5.0-rc1 KVM inspired "BUG: Bad page state in process" spew

2019-01-11 Thread Mike Galbraith
On Wed, 2019-01-09 at 11:26 -0800, Sean Christopherson wrote: > > I'll try to bisect. Good luck with that. I gave it a go, but that apparently invalidated the warranty of my vm image :) -Mike

Re: 5.0-rc1 KVM inspired "BUG: Bad page state in process" spew

2019-01-09 Thread Mike Galbraith
On Wed, 2019-01-09 at 15:42 +0100, Adam Borowski wrote: > On Wed, Jan 09, 2019 at 06:38:58AM +0100, Mike Galbraith wrote: > > KVM seems to be busted in master ATM. All I have to do to have host > > start screaming and maybe exploding (if the guest doesn't do so first) > >

5.0-rc1 KVM inspired "BUG: Bad page state in process" spew

2019-01-08 Thread Mike Galbraith
Greetings, KVM seems to be busted in master ATM. All I have to do to have host start screaming and maybe exploding (if the guest doesn't do so first) is to try to install a (obese in this case) kernel over nfs mount of the host in a guest. Kernel producing the spew below is 3bd6e94, config

Re: [PATCH 0/1] RFC: sched/fair: skip select_idle_sibling() in presence of sync wakeups

2019-01-08 Thread Mike Galbraith
On Tue, 2019-01-08 at 22:49 -0500, Andrea Arcangeli wrote: > Hello, > > we noticed some unexpected performance regressions in the scheduler by > switching the guest CPU topology from "-smp 2,sockets=2,cores=1" to > "-smp 2,sockets=1,cores=2". > > With sockets=2,cores=1 localhost message passing

Re: CFS scheduler: spin_lock usage causes dead lock when smp_apic_timer_interrupt occurs

2019-01-07 Thread Mike Galbraith
On Mon, 2019-01-07 at 13:52 +0100, Peter Zijlstra wrote: > On Mon, Jan 07, 2019 at 01:28:34PM +0100, Mike Galbraith wrote: > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c > > index 960ad0ce77d7..420624c49f38 100644 > > --- a/kernel/sched/fair.c > >

Re: CFS scheduler: spin_lock usage causes dead lock when smp_apic_timer_interrupt occurs

2019-01-07 Thread Mike Galbraith
On Mon, 2019-01-07 at 11:26 +0100, Peter Zijlstra wrote: > > I would expect lockdep to also complain about this... And grumble it did. commit df7e8acc0c9a84979a448d215b8ef889efe4ac5a Author: Mike Galbraith Date: Fri May 4 08:14:38 2018 +0200 sched/fair: Fix CFS bandwidth c

Re: [PATCH RFC 01/15] MIPS: replace **** with a hug

2018-11-30 Thread Mike Galbraith
On Fri, 2018-11-30 at 11:27 -0800, Jarkko Sakkinen wrote: > In order to comply with the CoC, replace with a hug. > > Signed-off-by: Jarkko Sakkinen > --- > arch/mips/pci/ops-bridge.c | 24 > arch/mips/sgi-ip22/ip22-setup.c | 2 +- > 2 files changed, 13

Re: memcg oops: memcg_kmem_charge_memcg()->try_charge()->page_counter_try_charge()->BOOM

2018-10-29 Thread Mike Galbraith
On Mon, 2018-10-29 at 21:49 +, Roman Gushchin wrote: > On Mon, Oct 29, 2018 at 09:46:54PM +0100, Mike Galbraith wrote: > > > Ah, I have cgroup_disable=memory on the command line, which turns out > > to be why your box doesn't explode, while mine does. > > Yeah, here

Re: memcg oops: memcg_kmem_charge_memcg()->try_charge()->page_counter_try_charge()->BOOM

2018-10-29 Thread Mike Galbraith
On Mon, 2018-10-29 at 18:54 +, Roman Gushchin wrote: > > Hi Mike! > > Thank you for the report! > > Do you see it reliable every time you boot up the machine? Yeah. > How do you run kvm? My VMs are full SW/data clones of my i7-4790/openSUSE box. > Is there something special about your

Re: memcg oops: memcg_kmem_charge_memcg()->try_charge()->page_counter_try_charge()->BOOM

2018-10-29 Thread Mike Galbraith
On Mon, 2018-10-29 at 14:20 +0100, Michal Hocko wrote: > > > [4.420976] Code: f3 c3 0f 1f 00 0f 1f 44 00 00 48 85 ff 0f 84 a8 00 00 > > 00 41 56 48 89 f8 41 55 49 89 fe 41 54 49 89 d5 55 49 89 f4 53 48 89 f3 > > 48 0f c1 1f 48 01 f3 48 39 5f 18 48 89 fd 73 17 eb 41 48 89 e8 > > [

Re: [PATCH RT 08/22] Revert "x86: UV: raw_spinlock conversion"

2018-09-06 Thread Mike Galbraith
On Thu, 2018-09-06 at 09:35 +0200, Sebastian Andrzej Siewior wrote: > On 2018-09-05 08:28:02 [-0400], Steven Rostedt wrote: > > 4.14.63-rt41-rc1 stable review patch. > > If anyone has any objections, please let me know. > > > > -- > > > > From: Sebastian Andrzej Siewior > > > >

Re: [regression/bisected] 4.19 cycle boot time IO stalls

2018-09-05 Thread Mike Galbraith
On Wed, 2018-09-05 at 07:39 -0600, Jens Axboe wrote: > > I bet it's the host busy change from Ming, which I already > reported as being the culprit for another test failure I had. For > some reason it's not merged yet, nudge nudge Martin. You can test > by reverting: > > commit

[regression/bisected] 4.19 cycle boot time IO stalls

2018-09-05 Thread Mike Galbraith
Greetings, I've been seeing $subject, decided to take the time to try to bisect the little bugger. The hangs are not 100% repeatable, and while bisection with a 5 boot go/nogo threshold seemed to go smoothly, it ended up fingering a merge commit (sigh). Box has an SSD (unused only by windows 10
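
Why a fixed go/nogo threshold can mislead a bisection of a hang that is not 100% repeatable can be shown with a little arithmetic. Below is a minimal sketch, assuming each boot of a bad kernel hangs independently with probability `p_hang`. That probability is not given in the report, so the 0.5 used below is purely illustrative, and the function names are hypothetical.

```python
def p_false_good(p_hang, boots):
    """Probability that a bad kernel survives `boots` clean boots in a row
    and is therefore wrongly marked good."""
    if not 0.0 < p_hang <= 1.0:
        raise ValueError("p_hang must be in (0, 1]")
    return (1.0 - p_hang) ** boots


def boots_needed(p_hang, max_risk):
    """Smallest number of boots per step that keeps the chance of a
    false 'good' verdict at or below max_risk."""
    if max_risk <= 0.0:
        raise ValueError("max_risk must be positive")
    boots = 1
    while p_false_good(p_hang, boots) > max_risk:
        boots += 1
    return boots


if __name__ == "__main__":
    # Illustrative numbers only: hang rate 50%, 5 boots per bisect step.
    print(p_false_good(0.5, 5))
    print(boots_needed(0.5, 0.01))
```

A single false "good" verdict sends the rest of the bisection down the wrong half of history, which is one plausible way to end up at a merge commit.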

Re: bisected - arm64 kvm unit test failures

2018-08-22 Thread Mike Galbraith
On Wed, 2018-08-22 at 14:50 +0100, Marc Zyngier wrote: > On 22/08/18 14:38, Mike Galbraith wrote: > > On Tue, 2018-08-21 at 16:34 +0100, Marc Zyngier wrote: > >> Could you give that patchlet[1] a go? It solves a similar issue for me > >> on a different pl

Re: bisected - arm64 kvm unit test failures

2018-08-22 Thread Mike Galbraith
On Tue, 2018-08-21 at 16:34 +0100, Marc Zyngier wrote: > Could you give that patchlet[1] a go? It solves a similar issue for me > on a different platform. > > [1] https://lists.cs.columbia.edu/pipermail/kvmarm/2018-August/032469.html Yup, all better. -Mike

Re: [BUG v4.14-rt] kernel BUG at /work/rt/stable-rt.git/kernel/sched/core.c:1639!

2018-08-19 Thread Mike Galbraith
On Sat, 2018-08-18 at 15:13 +0200, Mike Galbraith wrote: > seems it has to be something from the 4.17 cycle that went back to 4.14- > stable after 4.1[56]-stable trees went extinct. See ("sched/core: Require cpu_active() in select_task_rq(), for user tasks") Fix it like so? sch

Re: [BUG v4.14-rt] kernel BUG at /work/rt/stable-rt.git/kernel/sched/core.c:1639!

2018-08-18 Thread Mike Galbraith
On Sat, 2018-08-18 at 12:29 +0200, Mike Galbraith wrote: > On Fri, 2018-08-17 at 16:23 -0400, Steven Rostedt wrote: > > Pulling in stable releases into v4.14-rt I triggered this with my CPU > > hotplug test: > > > > [ cut here ] > > ker

Re: [BUG v4.14-rt] kernel BUG at /work/rt/stable-rt.git/kernel/sched/core.c:1639!

2018-08-18 Thread Mike Galbraith
On Fri, 2018-08-17 at 16:23 -0400, Steven Rostedt wrote: > Pulling in stable releases into v4.14-rt I triggered this with my CPU > hotplug test: > > [ cut here ] > kernel BUG at /work/rt/stable-rt.git/kernel/sched/core.c:1639! > invalid opcode: [#1] PREEMPT SMP PTI >

[PATCH] rcu: Convert rcu_state.ofl_lock to raw_spinlock_t

2018-08-15 Thread Mike Galbraith
1e64b15a4b10 ("rcu: Fix grace-period hangs due to race with CPU offline") added spinlock_t ofl_lock to the rcu_state structure, then takes it with preemption disabled during CPU offline, giving RT sleeping lock heartburn. Convert it to raw_spinlock_t. Signed-off-by: Mike Galbraith -

Re: [PATCH] x86, kdump: Fix efi=noruntime NULL pointer dereference

2018-08-14 Thread Mike Galbraith
On Wed, 2018-08-15 at 11:59 +0800, Dave Young wrote: > > Does this improve things, and plug the no boot hole? > > Would you mind to tune my patch with some acpi_rsdp checking and add > some error message in case kexec load failure? Eg. suggest people to use > append acpi_rsdp for noefi booting

Re: [PATCH] x86, kdump: Fix efi=noruntime NULL pointer dereference

2018-08-10 Thread Mike Galbraith
a when a 1:1 mapping is available. Bail early with -ENODEV if not available, but is required to boot, and acpi_rsdp= was not passed on the command line. 3. Use the proper config dependency to isolate efi setup functions, adding a !EFI_RUNTIME_MAP stub for setup_efi_state(). 4. Change efi functions that

Re: [PATCH] x86, kdump: Fix efi=noruntime NULL pointer dereference

2018-08-10 Thread Mike Galbraith
On Fri, 2018-08-10 at 16:45 +0800, Dave Young wrote: > > BTW, this patch only fix the kexec load phase problem, even if kexec > load successfully with the fix, the 2nd kernel can not boot because efi > memmap info is not correct and usable. Hm. I didn't do anything else with kexec, but did

[PATCH] x86, kdump: Fix efi=noruntime NULL pointer dereference

2018-08-08 Thread Mike Galbraith
When booting with efi=noruntime, we call efi_runtime_map_copy() while loading the kdump kernel, and trip over a NULL efi.memmap.map. Avoid that and a useless allocation when the only mapping we can use (1:1) is not available. Signed-off-by: Mike Galbraith --- arch/x86/kernel/kexec-bzimage64.c

Re: [rt-patch 4/3] arm,KVM: Move phys_timer handling to hard irq context

2018-08-04 Thread Mike Galbraith
On Sat, 2018-08-04 at 14:25 +0200, Mike Galbraith wrote: > > Besides, there are more interesting fish in the arm64 sea than kvm. > > virgin 4.16.18-rt12-rt > > [ 537.236131] ITS queue timeout (65440 65504 4640) > [ 537.236150] ITS cmd its_build_inv_cmd failed

Re: [rt-patch 4/3] arm,KVM: Move phys_timer handling to hard irq context

2018-08-04 Thread Mike Galbraith
On Thu, 2018-08-02 at 19:43 +0200, Mike Galbraith wrote: > On Thu, 2018-08-02 at 18:50 +0200, Mike Galbraith wrote: > > On Thu, 2018-08-02 at 12:31 -0400, Steven Rostedt wrote: > > > On Thu, 02 Aug 2018 08:56:20 +0200 > > > Mike Galbraith wrote: > > >

Re: [rt-patch 4/3] arm,KVM: Move phys_timer handling to hard irq context

2018-08-02 Thread Mike Galbraith
On Thu, 2018-08-02 at 18:50 +0200, Mike Galbraith wrote: > On Thu, 2018-08-02 at 12:31 -0400, Steven Rostedt wrote: > > On Thu, 02 Aug 2018 08:56:20 +0200 > > Mike Galbraith wrote: > > > > > (arm-land adventures 1/3 take2 will have to wait, my cup runeth over)

Re: [rt-patch 4/3] arm,KVM: Move phys_timer handling to hard irq context

2018-08-02 Thread Mike Galbraith
On Thu, 2018-08-02 at 12:31 -0400, Steven Rostedt wrote: > On Thu, 02 Aug 2018 08:56:20 +0200 > Mike Galbraith wrote: > > > (arm-land adventures 1/3 take2 will have to wait, my cup runeth over) > > > > v4.14..v4.15 timer handling changes including calling kvm_ti

[rt-patch 1/3 v2] arm64/acpi/perf: move pmu allocation to an early CPU up hook

2018-08-02 Thread Mike Galbraith
with the other CPUHP_PERF_{ARCH}_PREPARE stages, where we'll be preemptible, thus no longer requiring a GFP_ATOMIC allocation either. Signed-off-by: Mike Galbraith --- drivers/perf/arm_pmu_acpi.c | 12 ++-- include/linux/cpuhotplug.h |2 +- 2 files changed, 7 insertions(+), 7 deletions

Re: cpu stopper threads and setaffinity leads to deadlock

2018-08-02 Thread Mike Galbraith
On Thu, 2018-08-02 at 10:12 +0200, Peter Zijlstra wrote: > On Wed, Aug 01, 2018 at 06:34:40PM -0700, Sodagudi Prasad wrote: > > diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c > > index e190d1e..f932e1e 100644 > > --- a/kernel/stop_machine.c > > +++ b/kernel/stop_machine.c > > @@ -87,9

[rt-patch 4/3] arm,KVM: Move phys_timer handling to hard irq context

2018-08-02 Thread Mike Galbraith
est-vectors-kernel (2 tests) PASS selftest-vectors-user (2 tests) PASS selftest-smp (65 tests) PASS pci-test (1 tests) PASS pmu (3 tests) PASS gicv2-ipi (3 tests) PASS gicv3-ipi (3 tests) PASS gicv2-active (1 tests) PASS gicv3-active (1 tests) PASS psci (4 tests) PASS timer (8 tests) Signed-off-

Re: bisected - arm64 kvm unit test failures

2018-08-01 Thread Mike Galbraith
On Wed, 2018-08-01 at 08:22 +0100, Marc Zyngier wrote: > On Wed, 01 Aug 2018 07:02:25 +0100, > Mike Galbraith wrote: > > > > [1 ] > > On Wed, 2018-08-01 at 06:35 +0100, Marc Zyngier wrote: > > > > > > Is it something that is repr

Re: bisected - arm64 kvm unit test failures

2018-08-01 Thread Mike Galbraith
On Wed, 2018-08-01 at 08:22 +0100, Marc Zyngier wrote: > > > Box is a 4 node/64 core TaiShan 2280. > > Is that what is also known as D05/HIP07, with 64 Cortex-A72? No idea, our rent-a-box web client shows nothing informative. -Mike

Re: bisected - arm64 kvm unit test failures

2018-08-01 Thread Mike Galbraith
On Wed, 2018-08-01 at 06:35 +0100, Marc Zyngier wrote: > > Is it something that is reproducible with the current mainline (non-RT)? These waters are a bit muddy, it's config dependent. I'm trying to generate a reproducing !RT config for -rc7 as we speak. If I build openSUSE/master-default, it
