Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")

2017-04-04 Thread Mike Galbraith
On Tue, 2017-04-04 at 22:03 +0300, Michael S. Tsirkin wrote: > since I couldn't reproduce, I decided it's worth trying to see > what happens if we revert back to before 5c34d002dcc7. > > > Could you please test a tag "test" in my tree above? > It should point at

Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")

2017-04-04 Thread Mike Galbraith
On Wed, 2017-04-05 at 00:31 +0300, Michael S. Tsirkin wrote: > On Tue, Apr 04, 2017 at 08:38:35PM +0200, Mike Galbraith wrote: > > On Tue, 2017-04-04 at 21:00 +0300, Michael S. Tsirkin wrote: > > > > > And just making double sure, the 1st version that has the issue >

Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")

2017-04-04 Thread Mike Galbraith
On Tue, 2017-04-04 at 21:00 +0300, Michael S. Tsirkin wrote: > And just making double sure, the 1st version that has the issue > is 5c34d002dcc7, isn't it? I'm asking because subject says so > but then goes on to list subject from another commit. > This one is: > > virtio_pci: remove struct

Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")

2017-04-04 Thread Mike Galbraith
On Tue, 2017-04-04 at 19:40 +0200, Mike Galbraith wrote: > On Tue, 2017-04-04 at 18:30 +0300, Michael S. Tsirkin wrote: > > > I couldn't reproduce it - let's make sure we are using the > > same tree. Could you pls try > > > > git://git.kernel.org/pub/scm/linux/

Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")

2017-04-04 Thread Mike Galbraith
On Tue, 2017-04-04 at 18:30 +0300, Michael S. Tsirkin wrote: > I couldn't reproduce it - let's make sure we are using the > same tree. Could you pls try > > git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git linux-next > > It's currently at cc79d42a7d7e57ff64f406a1fd3740afebac0b44

Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")

2017-04-04 Thread Mike Galbraith
On Tue, 2017-04-04 at 16:38 +0300, Michael S. Tsirkin wrote: > On Tue, Apr 04, 2017 at 06:02:52AM +0200, Mike Galbraith wrote: > > On Mon, 2017-04-03 at 21:11 +0300, Michael S. Tsirkin wrote: > > > On Mon, Apr 03, 2017 at 07:56:32PM +0200, Mike Galbraith wrote: > > > >

Re: [BUG nohz]: wrong user and system time accounting

2017-04-04 Thread Mike Galbraith
On Mon, 2017-04-03 at 16:40 +0200, Frederic Weisbecker wrote: > On Thu, Mar 30, 2017 at 03:35:22PM +0200, Mike Galbraith wrote: > Nohz_full is already bad for powersavings anyway. CPU 0 always ticks :-) OTOH, if a nohz_full set is doing what it was born to do, CPU0 tick spikes

Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")

2017-04-03 Thread Mike Galbraith
On Mon, 2017-04-03 at 21:11 +0300, Michael S. Tsirkin wrote: > On Mon, Apr 03, 2017 at 07:56:32PM +0200, Mike Galbraith wrote: > > On Mon, 2017-04-03 at 16:18 +0200, Christoph Hellwig wrote: > > > Mike, > > > > > > can you try the patch below? > > >

Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")

2017-04-03 Thread Mike Galbraith
On Mon, 2017-04-03 at 16:18 +0200, Christoph Hellwig wrote: > Mike, > > can you try the patch below? No more spinning kworker woes, but I still have a warning on hibernate, threadirqs invariant. I'm also seeing intermittent post hibernate hang funnies in virgin source +- this patch, and without

net/sched: latent livelock in dev_deactivate_many() due to yield() usage

2017-04-01 Thread Mike Galbraith
Greetings network wizards, Quoting kernel/sched/core.c: /** * yield - yield the current processor to other threads. * * Do not ever use this function, there's a 99% chance you're doing it wrong. * * The scheduler is at all times free to pick the calling task as the most * eligible task to

Re: [BUG nohz]: wrong user and system time accounting

2017-03-30 Thread Mike Galbraith
On Thu, 2017-03-30 at 09:02 -0400, Rik van Riel wrote: > On Thu, 2017-03-30 at 14:51 +0200, Frederic Weisbecker wrote: > > Also, why does it raise power consumption issues? > > On a system without either nohz_full or nohz idle > mode, skewed ticks result in CPU cores waking up > at different

Re: [BUG nohz]: wrong user and system time accounting

2017-03-30 Thread Mike Galbraith
On Thu, 2017-03-30 at 14:40 +0200, Frederic Weisbecker wrote: > On Thu, Mar 30, 2017 at 09:58:44AM +0800, Wanpeng Li wrote: > > There is such a feature skew_tick currently, refer to commit > > 5307c9556bc (tick: add tick skew boot option), w/ skew_tick=1 boot > > parameter, the bug disappear,

Re: [BUG nohz]: wrong user and system time accounting

2017-03-30 Thread Mike Galbraith
On Thu, 2017-03-30 at 19:52 +0800, Wanpeng Li wrote: > If we should just add random offset to the cpu in the nohz_full mode? Up to you, whatever works best. I left the regular skew alone, just added some noise to scheduler_tick_max_deferment(). -Mike

Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")

2017-03-30 Thread Mike Galbraith
On Thu, 2017-03-30 at 05:10 +0200, Mike Galbraith wrote: > WRT spin, you should need do nothing more than boot with threadirqs, > that's 100% repeatable here in absolutely virgin source. No idea why virtqueue_get_buf() in __send_control_msg() fails forever with threadirqs, but marking t

Re: [BUG nohz]: wrong user and system time accounting

2017-03-29 Thread Mike Galbraith
On Wed, 2017-03-29 at 16:08 -0400, Rik van Riel wrote: > In other words, the tick on cpu0 is aligned > with the tick on the nohz_full cpus, and > jiffies is advanced while the nohz_full cpus > with an active tick happen to be in kernel > mode? You really want skew_tick=1, especially on big

Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")

2017-03-29 Thread Mike Galbraith
On Wed, 2017-03-29 at 23:19 +0300, Michael S. Tsirkin wrote: > > > > > > > > > > > >max_nr_ports) == 0) { > > @@ -2179,7 +2179,9 @@ static struct virtio_device_id id_table[ > > > > static unsigned int features[] = { > > > >> > VIRTIO_CONSOLE_F_SIZE, > > +#ifndef

Re: [PATCH] virtio_console: fix uninitialized variable use

2017-03-29 Thread Mike Galbraith
On Wed, 2017-03-29 at 23:27 +0300, Michael S. Tsirkin wrote: > Hi Mike > if you like, pls send me your Signed-off-by and I'll > change the patch to make you an author. Nah, it's perfect as it is. While I was pretty darn sure it was generic, I intentionally posted it as diagnostic

Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")

2017-03-29 Thread Mike Galbraith
On Wed, 2017-03-29 at 23:10 +0300, Michael S. Tsirkin wrote: > Poking at this some more, I was able to reproduce at > least some warnings. I still do not see a spin > but is there a chance this helps your case too? Well, it's down to one warning, clean on the way back up. WRT spin, you should

Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")

2017-03-29 Thread Mike Galbraith
On Mon, 2017-03-27 at 20:18 +0200, Mike Galbraith wrote: > BTW, WRT RT woes with $subject, I tried booting a generic kernel with > threadirqs, and bingo, same deal, just a bit more painful than for RT, > where there's no watchdog moaning accompanying the (preemptible) spin. BTW++: the

Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")

2017-03-28 Thread Mike Galbraith
On Tue, 2017-03-28 at 20:27 +0300, Michael S. Tsirkin wrote: > On Tue, Mar 28, 2017 at 06:33:53PM +0200, Mike Galbraith wrote: > > On Tue, 2017-03-28 at 18:37 +0300, Michael S. Tsirkin wrote: > > > > > Anything specific that you do to trigger this? > > > > N

Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")

2017-03-28 Thread Mike Galbraith
On Tue, 2017-03-28 at 18:37 +0300, Michael S. Tsirkin wrote: > Anything specific that you do to trigger this? Nope, all I have to do is to poke kde Power/Session Hibernate button. Not that it should matter, but the vm is a full clone of my 42.1 box, including git server/repos etc, so has all

Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")

2017-03-27 Thread Mike Galbraith
On Tue, 2017-03-28 at 05:35 +0300, Michael S. Tsirkin wrote: > On Tue, Mar 28, 2017 at 03:08:20AM +0200, Mike Galbraith wrote: > > On Mon, 2017-03-27 at 21:16 +0300, Michael S. Tsirkin wrote: > > > > > Mike, could you pls send lspci -vv that shows up after > > >

Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")

2017-03-27 Thread Mike Galbraith
On Mon, 2017-03-27 at 21:16 +0300, Michael S. Tsirkin wrote: > Mike, could you pls send lspci -vv that shows up after > boot? Presuming you mean the virtual box.. 00:00.0 Host bridge: Intel Corporation 440FX - 82441FX PMC [Natoma] (rev 02) Subsystem: Red Hat, Inc Qemu virtual machine

Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")

2017-03-27 Thread Mike Galbraith
On Mon, 2017-03-27 at 19:05 +0200, Christoph Hellwig wrote: > Hi Mike, > > does the patch below fix that issue for you? Nope, warnings are alive and well. > diff --git a/drivers/virtio/virtio_pci_common.c > b/drivers/virtio/virtio_pci_common.c > index df548a6fb844..fd1b06368b1f 100644 > ---

Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")

2017-03-27 Thread Mike Galbraith
On Mon, 2017-03-27 at 19:05 +0200, Christoph Hellwig wrote: > Hi Mike, > > does the patch below fix that issue for you? Thanks, I'll give it a go in the A.M. BTW, WRT RT woes with $subject, I tried booting a generic kernel with threadirqs, and bingo, same deal, just a bit more painful than for

Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")

2017-03-27 Thread Mike Galbraith
On Thu, 2017-03-23 at 15:56 +0100, Christoph Hellwig wrote: > Does the patch from Jason in the > > "[REGRESSION] 07ec51480b5e ("virtio_pci: use shared interrupts for > virtqueues") causes crashes in guest" > > thread fix the issue for you? That seems to eliminate explosions, but not the below.

Re: Splat during resume

2017-03-26 Thread Mike Galbraith
On Sun, 2017-03-26 at 10:41 +0200, Borislav Petkov wrote: > Btw, try the 6 patches here: > https://marc.info/?l=linux-mm&m=148977696117208&w=2 > ontop of tip. Should fix your vaporite too. Yeah, silicon is still happy, vaporite boots gripe free. Trying to hibernate vaporite was a bad idea, but is

Re: Splat during resume

2017-03-26 Thread Mike Galbraith
On Sat, 2017-03-25 at 22:46 +0100, Borislav Petkov wrote: > On Sat, Mar 25, 2017 at 07:58:55PM +0100, Borislav Petkov wrote: > > Hey Rafael, > > > > have you seen this already (partial splat photo attached)? Happens > > during resume from s2d. Judging by the timestamps, this looks like the > >

Re: Still OOM problems with 4.9er/4.10er kernels

2017-03-23 Thread Mike Galbraith
On Thu, 2017-03-23 at 08:16 +0100, Gerhard Wiesinger wrote: > On 21.03.2017 08:13, Mike Galbraith wrote: > > On Tue, 2017-03-21 at 06:59 +0100, Gerhard Wiesinger wrote: > > > > > Is this the correct information? > > Incomplete, but enough to reiterate cgroup

Re: Still OOM problems with 4.9er/4.10er kernels

2017-03-21 Thread Mike Galbraith
On Tue, 2017-03-21 at 06:59 +0100, Gerhard Wiesinger wrote: > Is this the correct information? Incomplete, but enough to reiterate cgroup_disable=memory suggestion. -Mike

Re: Still OOM problems with 4.9er/4.10er kernels

2017-03-19 Thread Mike Galbraith
On Sun, 2017-03-19 at 17:02 +0100, Gerhard Wiesinger wrote: > mount | grep cgroup Just because controllers are mounted doesn't mean they're populated. To check that, you want to look for directories under the mount points with a non-empty 'tasks'. You will find some, but memory cgroup
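
The check described in that reply (a mounted controller is only in use if some directory under its mount point has a non-empty 'tasks' file) could be sketched roughly as below. This is an illustration only, not something posted in the thread: the function name `populated_cgroups` is hypothetical, and the sketch assumes a cgroup v1 layout where each group directory carries a 'tasks' file. It reads each file instead of checking its size, on the assumption that pseudo-filesystem entries may not report a meaningful size.

```python
import os


def populated_cgroups(mount_point):
    """Return directories under mount_point whose 'tasks' file lists at least one PID.

    Hypothetical helper for illustration; assumes a cgroup v1 style hierarchy.
    """
    populated = []
    for dirpath, _dirnames, filenames in os.walk(mount_point):
        if "tasks" not in filenames:
            continue
        try:
            # Read the content: the reported file size is not relied upon.
            with open(os.path.join(dirpath, "tasks")) as f:
                has_tasks = bool(f.read().strip())
        except OSError:
            # Unreadable entries are skipped rather than treated as populated.
            continue
        if has_tasks:
            populated.append(dirpath)
    return sorted(populated)
```

Calling `populated_cgroups("/sys/fs/cgroup/memory")` (the path is an assumption about where the memory controller is mounted) would list the groups that actually hold tasks; a result containing only the mount point itself would suggest no child group of that controller is populated.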

Re: change "tcp: randomize tcp timestamp offsets for each connection" broke networking

2017-03-15 Thread Mike Galbraith
On Wed, 2017-03-15 at 20:48 +0100, Lutz Vieweg wrote: > Dear Linux Developers, > > change set "tcp: randomize tcp timestamp offsets for each connection" > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=95a22caee396cef0bb2ca8fafdd82966a49367bb > broke networking

Re: [PATCHSET for-4.11] cgroup: implement cgroup v2 thread mode

2017-03-14 Thread Mike Galbraith
On Mon, 2017-03-13 at 15:26 -0400, Tejun Heo wrote: > Hello, Mike. > > Sorry about the long delay. > > On Mon, Feb 13, 2017 at 06:45:07AM +0100, Mike Galbraith wrote: > > > > So, as long as the depth stays reasonable (single digit or lower), > > > > what

Re: oops with 4.9.13-rt12 under mild load (and no rt-tasks active)

2017-03-10 Thread Mike Galbraith
On Fri, 2017-03-10 at 19:47 +, Nicholas Mc Guire wrote: > Has anyone seen 4.9.13-rt12 oopses related to ext4 or vfs in general ? FWIW, here it's seen quite a bit of hefty use on boxen large and small with no trouble. That said, @stable has a large pile queued for 4.9, 8 for ext4, some of

Re: kexec, x86/purgatory: Cleanup the unholy mess

2017-03-10 Thread Mike Galbraith
On Fri, 2017-03-10 at 16:31 +0100, Thomas Gleixner wrote: > On Fri, 10 Mar 2017, Mike Galbraith wrote: > > > On Fri, 2017-03-10 at 15:56 +0100, Thomas Gleixner wrote: > > > On Fri, 10 Mar 2017, Mike Galbraith wrote: > > > > Stuffing the lot into .kexec-purgatory

Re: kexec, x86/purgatory: Cleanup the unholy mess

2017-03-10 Thread Mike Galbraith
On Fri, 2017-03-10 at 15:56 +0100, Thomas Gleixner wrote: > On Fri, 10 Mar 2017, Mike Galbraith wrote: > > Stuffing the lot into .kexec-purgatory worked. > > You beat me to it :) That's odd, I'm usually a day late and a dollar short :) -Mike

Re: kexec, x86/purgatory: Cleanup the unholy mess

2017-03-10 Thread Mike Galbraith
On Fri, 2017-03-10 at 14:57 +0100, Thomas Gleixner wrote: > On Fri, 10 Mar 2017, Mike Galbraith wrote: > > On Fri, 2017-03-10 at 13:17 +0100, Thomas Gleixner wrote: > > > The purgatory code defines global variables which are referenced via a > > > symbol lookup in th

Re: kexec, x86/purgatory: Cleanup the unholy mess

2017-03-10 Thread Mike Galbraith
On Fri, 2017-03-10 at 13:17 +0100, Thomas Gleixner wrote: > The purgatory code defines global variables which are referenced via a > symbol lookup in the kexec code (core and arch). > > A recent commit addressing sparse warning made these static and thereby > broke kexec file. > > Why did this

Re: [block] BUG: KASAN: use-after-free in rb_erase+0x1431/0x1970

2017-03-09 Thread Mike Galbraith
On Thu, 2017-03-09 at 08:38 -0700, Jens Axboe wrote: > On 03/09/2017 08:16 AM, Mike Galbraith wrote: > > Greetings, > > > > Building master.today with kasan enabled (because I saw the same when > > trying out kasan on rt), the below fell out. > > > > Confi

Re: [regression] 72042a8c7b01 x86/purgatory: Make functions and variables static

2017-03-09 Thread Mike Galbraith
On Thu, 2017-03-09 at 18:50 +0100, Thomas Gleixner wrote: > On Thu, 9 Mar 2017, Mike Galbraith wrote: > > > Greetings, > > > > I bisected kdump breakage to $subject, and verified the identified > > culprit via revert. Seems kexec needs those variables as they were.

[block] BUG: KASAN: use-after-free in rb_erase+0x1431/0x1970

2017-03-09 Thread Mike Galbraith
Greetings, Building master.today with kasan enabled (because I saw the same when trying out kasan on rt), the below fell out. Config is enterprise based (tune for maximum build time), plus PREEMPT. [5.335444] == [5.337030]

[regression] 72042a8c7b01 x86/purgatory: Make functions and variables static

2017-03-09 Thread Mike Galbraith
Greetings, I bisected kdump breakage to $subject, and verified the identified culprit via revert. Seems kexec needs those variables as they were. -Mike

Re: [PATCH v3] lockdep: Teach lockdep about memalloc_noio_save

2017-03-02 Thread Mike Galbraith
On Wed, 2017-03-01 at 16:46 +0100, Peter Zijlstra wrote: > On Wed, Mar 01, 2017 at 01:29:57PM +0200, Nikolay Borisov wrote: > > Commit 21caf2fc1931 ("mm: teach mm by current context info to not do I/O > > during memory allocation") added the memalloc_noio_(save|restore) functions > > to enable

Re: [cgroups] suspicious rcu_dereference_check() usage!

2017-03-01 Thread Mike Galbraith
On Wed, 2017-03-01 at 12:44 -0500, Tejun Heo wrote: > If you still have the .config around, can you please attach it? I'll > verify the fix and send out the fix. Resurrected (master) and attached. -Mike config.xz Description: application/xz

Re: [GIT pull] x86/timers for 4.10

2017-02-23 Thread Mike Galbraith
On Thu, 2017-02-23 at 11:26 +0100, Borislav Petkov wrote: > On Thu, Feb 23, 2017 at 09:20:06AM +0100, Mike Galbraith wrote: > > --- a/arch/x86/kernel/tsc_sync.c > > +++ b/arch/x86/kernel/tsc_sync.c > > @@ -294,7 +294,7 @@ void check_tsc_sync_source(int cpu) > > >

Re: 9908859acaa9 cpuidle/menu: add per CPU PM QoS resume latency consideration

2017-02-23 Thread Mike Galbraith
On Thu, 2017-02-23 at 13:15 +0100, Rafael J. Wysocki wrote: > On Wednesday, February 22, 2017 10:55:04 PM Alex Shi wrote: > > > > > > Its not hard; spinlock_t ends up being a mutex, and this is ran > > > from the > > > idle thread. What thread do you think we ought to run when we > > > block > >

Re: [GIT pull] x86/timers for 4.10

2017-02-23 Thread Mike Galbraith
On Thu, 2017-02-09 at 16:07 +0100, Thomas Gleixner wrote: > On Wed, 8 Feb 2017, Mike Galbraith wrote: > > On Wed, 2017-02-08 at 12:44 +0100, Thomas Gleixner wrote: > > > On Mon, 6 Feb 2017, Olof Johansson wrote: > > > > [0.177102] [Firmware Bug]: TSC ADJUST d

Re: 9908859acaa9 cpuidle/menu: add per CPU PM QoS resume latency consideration

2017-02-22 Thread Mike Galbraith
On Wed, 2017-02-22 at 23:36 +0800, Alex Shi wrote: > Sorry. Mike. > What you mean of 'took the zero added cycles option'? :) #ifndef CONFIG_PREEMPT_RT_FULL ... #endif I waved my magic ifdef wand, and poof, they disappeared :) -Mike

Re: 9908859acaa9 cpuidle/menu: add per CPU PM QoS resume latency consideration

2017-02-22 Thread Mike Galbraith
On Wed, 2017-02-22 at 22:53 +0800, Alex Shi wrote: > cc Rafael. > > > On 02/22/2017 09:12 PM, Peter Zijlstra wrote: > > On Wed, Feb 22, 2017 at 01:56:37PM +0100, Mike Galbraith wrote: > > > Hi, > > > > > > Do we really need a spinlock for that i

Re: 9908859acaa9 cpuidle/menu: add per CPU PM QoS resume latency consideration

2017-02-22 Thread Mike Galbraith
On Wed, 2017-02-22 at 22:31 +0800, Alex Shi wrote: > > On 02/22/2017 09:19 PM, Mike Galbraith wrote: > > On Wed, 2017-02-22 at 14:12 +0100, Peter Zijlstra wrote: > > > On Wed, Feb 22, 2017 at 01:56:37PM +0100, Mike Galbraith wrote: > > > > Hi, > > &g
