Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")

2017-04-10 Thread Mike Galbraith
On Tue, 2017-04-11 at 00:23 +0300, Michael S. Tsirkin wrote:
> On Sat, Apr 08, 2017 at 07:01:34AM +0200, Mike Galbraith wrote:
> > On Fri, 2017-04-07 at 21:56 +0300, Michael S. Tsirkin wrote:
> > 
> > > OK. test3 and test4 are now pushed: test3 should fix your hang,
> > > test4 is trying to fix a crash reported independently.
> > 
> > test3 does not fix the post hibernate hang business that I can easily
> > reproduce, those are NFS, and at least as old as 4.4.  Host/guest,
> > dunno, put 4.4 on both, guest hangs intermittently.
> 
> OK so IIUC you agree it's a good idea to send test4 to Linus, right?

Well, my box agrees that that is a viable option.

> Hibernation's still broken but that's not a regression.

Yup.

> > [] __rpc_wait_for_completion_task+0x30/0x30 [sunrpc]
> > [] rpc_wait_bit_killable+0x1e/0xb0 [sunrpc]
> > [] __rpc_wait_for_completion_task+0x30/0x30 [sunrpc]
> > [] autoremove_wake_function+0x50/0x50
> > [] call_decode+0x850/0x850 [sunrpc]
> > [] call_decode+0x850/0x850 [sunrpc]
> > [] __rpc_execute+0x14e/0x440 [sunrpc]
> > [] ktime_get+0x35/0xa0
> > [] rpc_run_task+0x120/0x170 [sunrpc]
> > [] nfs4_call_sync_sequence+0x56/0x80 [nfsv4]
> > [] _nfs4_proc_getattr+0xb0/0xc0 [nfsv4]
> > [] path_lookupat+0xd2/0x100
> > [] nfs4_proc_getattr+0x5c/0xe0 [nfsv4]
> > [] __nfs_revalidate_inode+0xa0/0x300 [nfs]
> > [] nfs_getattr+0x95/0x250 [nfs]
> > [] vfs_statx+0x7b/0xc0
> > [] SYSC_newstat+0x20/0x40
> > [] entry_SYSCALL_64_fastpath+0x1a/0xa9
> > [] 0x
> > 
> > I noted no _other_ misbehavior in either kernel, w/wo threadirqs.
> > 
> > > > -Mike
> 
> Interesting. I would guess virtio net does not complete some
> packets. So you were unable to find an old guest where this
> works fine?

I just tried my opensuse 13.2 clone.  It works markedly less fine,
turns into a brick either on the way down or back up in short order.

-Mike


Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")

2017-04-10 Thread Michael S. Tsirkin
On Sat, Apr 08, 2017 at 07:01:34AM +0200, Mike Galbraith wrote:
> On Fri, 2017-04-07 at 21:56 +0300, Michael S. Tsirkin wrote:
> 
> > OK. test3 and test4 are now pushed: test3 should fix your hang,
> > test4 is trying to fix a crash reported independently.
> 
> test3 does not fix the post hibernate hang business that I can easily
> reproduce, those are NFS, and at least as old as 4.4.  Host/guest,
> dunno, put 4.4 on both, guest hangs intermittently.

OK so IIUC you agree it's a good idea to send test4 to Linus, right?
Hibernation's still broken but that's not a regression.

> [] __rpc_wait_for_completion_task+0x30/0x30 [sunrpc]
> [] rpc_wait_bit_killable+0x1e/0xb0 [sunrpc]
> [] __rpc_wait_for_completion_task+0x30/0x30 [sunrpc]
> [] autoremove_wake_function+0x50/0x50
> [] call_decode+0x850/0x850 [sunrpc]
> [] call_decode+0x850/0x850 [sunrpc]
> [] __rpc_execute+0x14e/0x440 [sunrpc]
> [] ktime_get+0x35/0xa0
> [] rpc_run_task+0x120/0x170 [sunrpc]
> [] nfs4_call_sync_sequence+0x56/0x80 [nfsv4]
> [] _nfs4_proc_getattr+0xb0/0xc0 [nfsv4]
> [] path_lookupat+0xd2/0x100
> [] nfs4_proc_getattr+0x5c/0xe0 [nfsv4]
> [] __nfs_revalidate_inode+0xa0/0x300 [nfs]
> [] nfs_getattr+0x95/0x250 [nfs]
> [] vfs_statx+0x7b/0xc0
> [] SYSC_newstat+0x20/0x40
> [] entry_SYSCALL_64_fastpath+0x1a/0xa9
> [] 0x
> 
> I noted no _other_ misbehavior in either kernel, w/wo threadirqs.
> 
>   -Mike

Interesting. I would guess virtio net does not complete some
packets. So you were unable to find an old guest where this
works fine?

-- 
MST


Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")

2017-04-07 Thread Mike Galbraith
On Fri, 2017-04-07 at 21:56 +0300, Michael S. Tsirkin wrote:

> OK. test3 and test4 are now pushed: test3 should fix your hang,
> test4 is trying to fix a crash reported independently.

test3 does not fix the post hibernate hang business that I can easily
reproduce, those are NFS, and at least as old as 4.4.  Host/guest,
dunno, put 4.4 on both, guest hangs intermittently.

[] __rpc_wait_for_completion_task+0x30/0x30 [sunrpc]
[] rpc_wait_bit_killable+0x1e/0xb0 [sunrpc]
[] __rpc_wait_for_completion_task+0x30/0x30 [sunrpc]
[] autoremove_wake_function+0x50/0x50
[] call_decode+0x850/0x850 [sunrpc]
[] call_decode+0x850/0x850 [sunrpc]
[] __rpc_execute+0x14e/0x440 [sunrpc]
[] ktime_get+0x35/0xa0
[] rpc_run_task+0x120/0x170 [sunrpc]
[] nfs4_call_sync_sequence+0x56/0x80 [nfsv4]
[] _nfs4_proc_getattr+0xb0/0xc0 [nfsv4]
[] path_lookupat+0xd2/0x100
[] nfs4_proc_getattr+0x5c/0xe0 [nfsv4]
[] __nfs_revalidate_inode+0xa0/0x300 [nfs]
[] nfs_getattr+0x95/0x250 [nfs]
[] vfs_statx+0x7b/0xc0
[] SYSC_newstat+0x20/0x40
[] entry_SYSCALL_64_fastpath+0x1a/0xa9
[] 0x

I noted no _other_ misbehavior in either kernel, w/wo threadirqs.

-Mike


Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")

2017-04-07 Thread Michael S. Tsirkin
On Fri, Apr 07, 2017 at 04:29:53PM +0200, Mike Galbraith wrote:
> On Fri, 2017-04-07 at 16:35 +0300, Michael S. Tsirkin wrote:
> 
> > Oh wait, I still put the ctx feature patches in there :(
> > Pls ignore, I'll update when I've fixed it up. Sorry about the noise.
> 
> Both worked fine w/wo threadirqs.
> 
>   -Mike

OK. test3 and test4 are now pushed: test3 should fix your hang,
test4 is trying to fix a crash reported independently.

Will push to linux-next once I hear from you.

-- 
MST


Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")

2017-04-07 Thread Mike Galbraith
On Fri, 2017-04-07 at 16:35 +0300, Michael S. Tsirkin wrote:

> Oh wait, I still put the ctx feature patches in there :(
> Pls ignore, I'll update when I've fixed it up. Sorry about the noise.

Both worked fine w/wo threadirqs.

-Mike


Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")

2017-04-07 Thread Michael S. Tsirkin
On Fri, Apr 07, 2017 at 04:20:12PM +0300, Michael S. Tsirkin wrote:
> On Fri, Apr 07, 2017 at 09:22:02AM +0200, Mike Galbraith wrote:
> > On Fri, 2017-04-07 at 09:05 +0200, Mike Galbraith wrote:
> > > On Fri, 2017-04-07 at 08:44 +0200, Mike Galbraith wrote:
> > > > On Fri, 2017-04-07 at 09:24 +0300, Michael S. Tsirkin wrote:
> > > > > On Fri, Apr 07, 2017 at 08:03:19AM +0200, Mike Galbraith wrote:
> > > > 
> > > > > > Test tag works fine here w/wo threadirqs, RT works as well.
> > > > > > 
> > > > > > -Mike
> > > > > 
> > > > > Thanks a lot.
> > > > > OK I pushed out two new tags
> > > > >   test1 with just the cleanup reverts
> > > > >   test2 with a bugfix in this area
> > > > > 
> > > > > 
> > > > > I would very much appreciate your testing report on both -
> > > > > should be ok but better make sure.
> > > > 
> > > > Ok, once it percolates out I'll do that.
> > > 
> > > for_linus-10-g960bef2a6172 contains a -ENOBUILD merge conflict.
> > 
> > But test2 works fine w/wo threadirqs.
> 
> Oops. This is what one gets by pushing at 2am. I fixed that one up
> (still didn't even build as I'm in the middle of a conference).
> Also it's actually the reverse: test2 is just the revert, test1 has
> one more bugfix.
> 
> So I'm inclined to push test2 out to linux-next for now, and will
> add test1 later if it fares well.
> 
> Mike, your testing is very much appreciated!

Oh wait, I still put the ctx feature patches in there :(
Pls ignore, I'll update when I've fixed it up. Sorry about the noise.

> -- 
> MST


Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")

2017-04-07 Thread Michael S. Tsirkin
On Fri, Apr 07, 2017 at 09:22:02AM +0200, Mike Galbraith wrote:
> On Fri, 2017-04-07 at 09:05 +0200, Mike Galbraith wrote:
> > On Fri, 2017-04-07 at 08:44 +0200, Mike Galbraith wrote:
> > > On Fri, 2017-04-07 at 09:24 +0300, Michael S. Tsirkin wrote:
> > > > On Fri, Apr 07, 2017 at 08:03:19AM +0200, Mike Galbraith wrote:
> > > 
> > > > > Test tag works fine here w/wo threadirqs, RT works as well.
> > > > > 
> > > > >   -Mike
> > > > 
> > > > Thanks a lot.
> > > > OK I pushed out two new tags
> > > > test1 with just the cleanup reverts
> > > > test2 with a bugfix in this area
> > > > 
> > > > 
> > > > I would very much appreciate your testing report on both -
> > > > should be ok but better make sure.
> > > 
> > > Ok, once it percolates out I'll do that.
> > 
> > for_linus-10-g960bef2a6172 contains a -ENOBUILD merge conflict.
> 
> But test2 works fine w/wo threadirqs.

Oops. This is what one gets by pushing at 2am. I fixed that one up
(still didn't even build as I'm in the middle of a conference).
Also it's actually the reverse: test2 is just the revert, test1 has
one more bugfix.

So I'm inclined to push test2 out to linux-next for now, and will
add test1 later if it fares well.

Mike, your testing is very much appreciated!

-- 
MST


Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")

2017-04-07 Thread Mike Galbraith
On Fri, 2017-04-07 at 09:22 +0200, Mike Galbraith wrote:
> On Fri, 2017-04-07 at 09:05 +0200, Mike Galbraith wrote:
> > On Fri, 2017-04-07 at 08:44 +0200, Mike Galbraith wrote:
> > > On Fri, 2017-04-07 at 09:24 +0300, Michael S. Tsirkin wrote:
> > > > On Fri, Apr 07, 2017 at 08:03:19AM +0200, Mike Galbraith wrote:
> > > 
> > > > > Test tag works fine here w/wo threadirqs, RT works as well.
> > > > > 
> > > > >   -Mike
> > > > 
> > > > Thanks a lot.
> > > > OK I pushed out two new tags
> > > > test1 with just the cleanup reverts
> > > > test2 with a bugfix in this area
> > > > 
> > > > 
> > > > I would very much appreciate your testing report on both -
> > > > should be ok but better make sure.
> > > 
> > > Ok, once it percolates out I'll do that.
> > 
> > for_linus-10-g960bef2a6172 contains a -ENOBUILD merge conflict.
> 
> But test2 works fine w/wo threadirqs.

(CONFIG_DEBUG_SHIRQ=y as well btw)


Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")

2017-04-07 Thread Mike Galbraith
On Fri, 2017-04-07 at 09:05 +0200, Mike Galbraith wrote:
> On Fri, 2017-04-07 at 08:44 +0200, Mike Galbraith wrote:
> > On Fri, 2017-04-07 at 09:24 +0300, Michael S. Tsirkin wrote:
> > > On Fri, Apr 07, 2017 at 08:03:19AM +0200, Mike Galbraith wrote:
> > 
> > > > Test tag works fine here w/wo threadirqs, RT works as well.
> > > > 
> > > > -Mike
> > > 
> > > Thanks a lot.
> > > OK I pushed out two new tags
> > >   test1 with just the cleanup reverts
> > >   test2 with a bugfix in this area
> > > 
> > > 
> > > I would very much appreciate your testing report on both -
> > > should be ok but better make sure.
> > 
> > Ok, once it percolates out I'll do that.
> 
> for_linus-10-g960bef2a6172 contains a -ENOBUILD merge conflict.

But test2 works fine w/wo threadirqs.


Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")

2017-04-07 Thread Mike Galbraith
On Fri, 2017-04-07 at 08:44 +0200, Mike Galbraith wrote:
> On Fri, 2017-04-07 at 09:24 +0300, Michael S. Tsirkin wrote:
> > On Fri, Apr 07, 2017 at 08:03:19AM +0200, Mike Galbraith wrote:
> 
> > > Test tag works fine here w/wo threadirqs, RT works as well.
> > > 
> > >   -Mike
> > 
> > Thanks a lot.
> > OK I pushed out two new tags
> > test1 with just the cleanup reverts
> > test2 with a bugfix in this area
> > 
> > 
> > I would very much appreciate your testing report on both -
> > should be ok but better make sure.
> 
> Ok, once it percolates out I'll do that.

for_linus-10-g960bef2a6172 contains a -ENOBUILD merge conflict.

-Mike


Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")

2017-04-06 Thread Mike Galbraith
On Fri, 2017-04-07 at 09:24 +0300, Michael S. Tsirkin wrote:
> On Fri, Apr 07, 2017 at 08:03:19AM +0200, Mike Galbraith wrote:

> > Test tag works fine here w/wo threadirqs, RT works as well.
> > 
> > -Mike
> 
> Thanks a lot.
> OK I pushed out two new tags
>   test1 with just the cleanup reverts
>   test2 with a bugfix in this area
> 
> 
> I would very much appreciate your testing report on both -
> should be ok but better make sure.

Ok, once it percolates out I'll do that.

-Mike


Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")

2017-04-06 Thread Michael S. Tsirkin
On Fri, Apr 07, 2017 at 08:03:19AM +0200, Mike Galbraith wrote:
> On Thu, 2017-04-06 at 00:38 +0300, Michael S. Tsirkin wrote:
> 
> > What I did is revert the refactorings while keeping the affinity API -
> > we can safely postpone them until the next release without loss of
> > functionality. But that's on top of my testing tree so it has unrelated
> > stuff as well. I'm rather confident they aren't fixing the issues but
> > I'll prepare a bugfix-only tree now for testing.
> 
> Test tag works fine here w/wo threadirqs, RT works as well.
> 
>   -Mike

Thanks a lot.
OK I pushed out two new tags
test1 with just the cleanup reverts
test2 with a bugfix in this area


I would very much appreciate your testing report on both -
should be ok but better make sure.
Unfortunately it's past 2am here so I don't have the time to
test - and I'm at a conference so not a lot of time during
the day either.

Christoph, I still think your cleanups were a good idea,
but we need to get this release into a stable shape ASAP.
Let's try again for the next release, OK?

-- 
MST


Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")

2017-04-06 Thread Mike Galbraith
On Thu, 2017-04-06 at 00:38 +0300, Michael S. Tsirkin wrote:

> What I did is revert the refactorings while keeping the affinity API -
> we can safely postpone them until the next release without loss of
> functionality. But that's on top of my testing tree so it has unrelated
> stuff as well. I'm rather confident they aren't fixing the issues but
> I'll prepare a bugfix-only tree now for testing.

Test tag works fine here w/wo threadirqs, RT works as well.

-Mike


Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")

2017-04-05 Thread Michael S. Tsirkin
On Wed, Apr 05, 2017 at 08:29:34AM +0200, Christoph Hellwig wrote:
> On Wed, Apr 05, 2017 at 06:24:50AM +0200, Mike Galbraith wrote:
> > On Wed, 2017-04-05 at 06:51 +0300, Michael S. Tsirkin wrote:
> > 
> > > Any issues at all left with this tree?
> > > In particular any regressions?
> > 
> > Nothing blatantly obvious in a testdrive that lasted a couple minutes. 
> >  I'd have to beat on it a bit to look for things beyond the reported,
> > but can't afford to do that right now.
> 
> Can you check where the issues appear?  I'd like to do a pure revert
> of the shared interrupts, but that tree has a lot more in it..

What I did is revert the refactorings while keeping the affinity API -
we can safely postpone them until the next release without loss of
functionality. But that's on top of my testing tree so it has unrelated
stuff as well. I'm rather confident they aren't fixing the issues but
I'll prepare a bugfix-only tree now for testing.

-- 
MST


Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")

2017-04-04 Thread Mike Galbraith
On Wed, 2017-04-05 at 08:29 +0200, Christoph Hellwig wrote:

> Can you check where the issues appear?  I'd like to do a pure revert
> of the shared interrupts, but that tree has a lot more in it..

Not immediately, one of my several pots is emitting black smoke.

-Mike


Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")

2017-04-04 Thread Christoph Hellwig
On Mon, Apr 03, 2017 at 07:14:22PM +0300, Michael S. Tsirkin wrote:
> On Mon, Apr 03, 2017 at 04:18:23PM +0200, Christoph Hellwig wrote:
> > Mike,
> > 
> > can you try the patch below?
> > 
> > ---
> > From fe41a30b54878cc631623b7511267125e0da4b15 Mon Sep 17 00:00:00 2001
> > From: Christoph Hellwig 
> > Date: Mon, 3 Apr 2017 14:51:35 +0200
> > Subject: virtio_pci: don't use shared irq for virtqueues
> > 
> > Reimplement the shared irq feature manually, as we might have a larger
> > number of virtqueues than the core shared interrupt code can handle
> > in threaded interrupt mode.
> > 
> > Signed-off-by: Christoph Hellwig 
> > ---
> >  drivers/virtio/virtio_pci_common.c | 142 
> > +
> >  drivers/virtio/virtio_pci_common.h |   1 +
> >  2 files changed, 83 insertions(+), 60 deletions(-)
> 
> Well the original patch this is trying to fix is
> 07ec51480b5eb1233f8c1b0f5d7a7c8d1247c507 which dropped just 40 lines
> with documentation. It did this by re-using error handling to switch
> from per-vq to non-per-vq mode. Now this has separate flows for errors
> and per-vq non-per-vq switch and (I think, as a result) is adding 140
> lines which doesn't make me very happy.

The above adds 23 lines.  We could entangle both loops again, but I'm
not sure it's going to buy us much.


Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")

2017-04-04 Thread Christoph Hellwig
On Wed, Apr 05, 2017 at 06:24:50AM +0200, Mike Galbraith wrote:
> On Wed, 2017-04-05 at 06:51 +0300, Michael S. Tsirkin wrote:
> 
> > Any issues at all left with this tree?
> > In particular any regressions?
> 
> Nothing blatantly obvious in a testdrive that lasted a couple minutes. 
>  I'd have to beat on it a bit to look for things beyond the reported,
> but can't afford to do that right now.

Can you check where the issues appear?  I'd like to do a pure revert
of the shared interrupts, but that tree has a lot more in it..


Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")

2017-04-04 Thread Mike Galbraith
On Wed, 2017-04-05 at 06:51 +0300, Michael S. Tsirkin wrote:

> Any issues at all left with this tree?
> In particular any regressions?

Nothing blatantly obvious in a testdrive that lasted a couple minutes. 
 I'd have to beat on it a bit to look for things beyond the reported,
but can't afford to do that right now.

-Mike


Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")

2017-04-04 Thread Michael S. Tsirkin
On Wed, Apr 05, 2017 at 05:24:30AM +0200, Mike Galbraith wrote:
> On Wed, 2017-04-05 at 06:13 +0300, Michael S. Tsirkin wrote:
> > On Wed, Apr 05, 2017 at 05:09:09AM +0200, Mike Galbraith wrote:
> > > On Tue, 2017-04-04 at 22:03 +0300, Michael S. Tsirkin wrote:
> > > 
> > > > since I couldn't reproduce, I decided it's worth trying to see
> > > > what happens if we revert back to before 5c34d002dcc7.
> > > > 
> > > > 
> > > > Could you please test a tag "test" in my tree above?
> > > > It should point at 6d88af1bf359417eb821370294ba489bdf7f5ab8
> > > 
> > > Nogo.
> > > 
> > > git@homer:..git/vhost> git remote update
> > > Fetching origin
> > > > git@homer:..git/vhost> git show 6d88af1bf359417eb821370294ba489bdf7f5ab8
> > > fatal: bad object 6d88af1bf359417eb821370294ba489bdf7f5ab8
> > 
> > Maybe because it's a tag not a head. Pls try
> > > git fetch git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git refs/tags/test
> 
> That worked.  Checked out/building.

Thanks a lot for the testing.

-- 
MST


Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")

2017-04-04 Thread Michael S. Tsirkin
On Wed, Apr 05, 2017 at 05:40:06AM +0200, Mike Galbraith wrote:
> On Wed, 2017-04-05 at 05:24 +0200, Mike Galbraith wrote:
> > On Wed, 2017-04-05 at 06:13 +0300, Michael S. Tsirkin wrote:
> > > On Wed, Apr 05, 2017 at 05:09:09AM +0200, Mike Galbraith wrote:
> > > > On Tue, 2017-04-04 at 22:03 +0300, Michael S. Tsirkin wrote:
> > > > 
> > > > > since I couldn't reproduce, I decided it's worth trying to see
> > > > > what happens if we revert back to before 5c34d002dcc7.
> > > > > 
> > > > > 
> > > > > Could you please test a tag "test" in my tree above?
> > > > > It should point at 6d88af1bf359417eb821370294ba489bdf7f5ab8
> > > > 
> > > > Nogo.
> > > > 
> > > > git@homer:..git/vhost> git remote update
> > > > Fetching origin
> > > > > git@homer:..git/vhost> git show 6d88af1bf359417eb821370294ba489bdf7f5ab8
> > > > fatal: bad object 6d88af1bf359417eb821370294ba489bdf7f5ab8
> > > 
> > > Maybe because it's a tag not a head. Pls try
> > > > git fetch git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git refs/tags/test
> > 
> > That worked.  Checked out/building.
> 
> vbox hibernated gripe free, w/wo threadirqs.
> 
>   -Mike

Any issues at all left with this tree?
In particular any regressions?

-- 
MST


Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")

2017-04-04 Thread Mike Galbraith
On Wed, 2017-04-05 at 05:24 +0200, Mike Galbraith wrote:
> On Wed, 2017-04-05 at 06:13 +0300, Michael S. Tsirkin wrote:
> > On Wed, Apr 05, 2017 at 05:09:09AM +0200, Mike Galbraith wrote:
> > > On Tue, 2017-04-04 at 22:03 +0300, Michael S. Tsirkin wrote:
> > > 
> > > > since I couldn't reproduce, I decided it's worth trying to see
> > > > what happens if we revert back to before 5c34d002dcc7.
> > > > 
> > > > 
> > > > Could you please test a tag "test" in my tree above?
> > > > It should point at 6d88af1bf359417eb821370294ba489bdf7f5ab8
> > > 
> > > Nogo.
> > > 
> > > git@homer:..git/vhost> git remote update
> > > Fetching origin
> > > > git@homer:..git/vhost> git show 6d88af1bf359417eb821370294ba489bdf7f5ab8
> > > fatal: bad object 6d88af1bf359417eb821370294ba489bdf7f5ab8
> > 
> > Maybe because it's a tag not a head. Pls try
> > > git fetch git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git refs/tags/test
> 
> That worked.  Checked out/building.

vbox hibernated gripe free, w/wo threadirqs.

-Mike


Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")

2017-04-04 Thread Mike Galbraith
On Wed, 2017-04-05 at 06:13 +0300, Michael S. Tsirkin wrote:
> On Wed, Apr 05, 2017 at 05:09:09AM +0200, Mike Galbraith wrote:
> > On Tue, 2017-04-04 at 22:03 +0300, Michael S. Tsirkin wrote:
> > 
> > > since I couldn't reproduce, I decided it's worth trying to see
> > > what happens if we revert back to before 5c34d002dcc7.
> > > 
> > > 
> > > Could you please test a tag "test" in my tree above?
> > > It should point at 6d88af1bf359417eb821370294ba489bdf7f5ab8
> > 
> > Nogo.
> > 
> > git@homer:..git/vhost> git remote update
> > Fetching origin
> > > git@homer:..git/vhost> git show 6d88af1bf359417eb821370294ba489bdf7f5ab8
> > fatal: bad object 6d88af1bf359417eb821370294ba489bdf7f5ab8
> 
> Maybe because it's a tag not a head. Pls try
> > git fetch git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git refs/tags/test

That worked.  Checked out/building.


Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")

2017-04-04 Thread Michael S. Tsirkin
On Wed, Apr 05, 2017 at 05:09:09AM +0200, Mike Galbraith wrote:
> On Tue, 2017-04-04 at 22:03 +0300, Michael S. Tsirkin wrote:
> 
> > since I couldn't reproduce, I decided it's worth trying to see
> > what happens if we revert back to before 5c34d002dcc7.
> > 
> > 
> > Could you please test a tag "test" in my tree above?
> > It should point at 6d88af1bf359417eb821370294ba489bdf7f5ab8
> 
> Nogo.
> 
> git@homer:..git/vhost> git remote update
> Fetching origin
> git@homer:..git/vhost> git show 6d88af1bf359417eb821370294ba489bdf7f5ab8
> fatal: bad object 6d88af1bf359417eb821370294ba489bdf7f5ab8

Maybe because it's a tag not a head. Pls try
git fetch git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git refs/tags/test

-- 
MST
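
The failure quoted above (`git remote update`, then `bad object`) is consistent with a tag that points at a commit no branch reaches: a plain fetch only follows tags into history it is already downloading. A minimal shell sketch of that behaviour, assuming a throwaway pair of local repositories (the `upstream` and `clone` directories and the toy commits are hypothetical, not the vhost tree):

```shell
#!/bin/sh
# Sketch: a tag on a commit outside every branch is skipped by a plain
# fetch, but an explicit refspec retrieves it.
set -e
work=$(mktemp -d)
cd "$work"

# Hypothetical upstream repository with a single branch commit.
git init -q upstream
cd upstream
git config user.email dev@example.com
git config user.name Dev
echo base > file
git add file
git commit -q -m "base"
cd ..

# Clone it while only the branch commit exists.
git clone -q upstream clone

# Upstream now creates a commit that sits on no branch and tags it "test".
cd upstream
sha=$(git commit-tree -p HEAD -m "test build" "HEAD^{tree}")
git tag test "$sha"
cd ../clone

# A plain update does not bring the tagged commit over.
git remote update >/dev/null 2>&1
if git cat-file -e "$sha^{commit}" 2>/dev/null; then
    before=present
else
    before=missing
fi

# Naming the tag ref explicitly fetches it; the result lands in FETCH_HEAD.
git fetch -q origin refs/tags/test
fetched=$(git rev-parse FETCH_HEAD)

echo "before explicit fetch: $before"
echo "FETCH_HEAD matches tag: $([ "$fetched" = "$sha" ] && echo yes || echo no)"
```

Against the real tree the equivalent is the `git fetch ... refs/tags/test` command given above, followed by `git checkout FETCH_HEAD`.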


Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")

2017-04-04 Thread Mike Galbraith
On Tue, 2017-04-04 at 22:03 +0300, Michael S. Tsirkin wrote:

> since I couldn't reproduce, I decided it's worth trying to see
> what happens if we revert back to before 5c34d002dcc7.
> 
> 
> Could you please test a tag "test" in my tree above?
> It should point at 6d88af1bf359417eb821370294ba489bdf7f5ab8

Nogo.

git@homer:..git/vhost> git remote update
Fetching origin
git@homer:..git/vhost> git show 6d88af1bf359417eb821370294ba489bdf7f5ab8
fatal: bad object 6d88af1bf359417eb821370294ba489bdf7f5ab8


Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")

2017-04-04 Thread Mike Galbraith
On Wed, 2017-04-05 at 00:31 +0300, Michael S. Tsirkin wrote:
> On Tue, Apr 04, 2017 at 08:38:35PM +0200, Mike Galbraith wrote:
> > On Tue, 2017-04-04 at 21:00 +0300, Michael S. Tsirkin wrote:
> > 
> > > And just making double sure, the 1st version that has the issue
> > > is 5c34d002dcc7, isn't it? I'm asking because subject says so
> > > but then goes on to list subject from another commit.
> > > This one is:
> > >   > virtio_pci: remove struct virtio_pci_vq_info
> > 
> > When the hibernation related warnings started I don't know, I wasn't
> > targeting that, those fell out of subsequent testing.  I started out
> > hunting console breakage point w. threaded irqs, which is 5c34d002dcc7.
> 
> OK but 5c34d002dcc7 isn't "virtio_pci: use shared
> interrupts for virtqueues".

Heh, wrong sha.. $subject does however correctly identify in quotes the
origin of the threaded irq woes.
 
> I'm confused at this point. I would appreciate the summary of
> which versions were tested and what did you see. Testing
> a revert might also help.

I already tested full revert.  I went looking for what busted kvm for
RT kernels, extracted the virtio series and quilt bisected that to use
shared interrupts.  I was going to just use my little turn off
multiport hacklet to put spinning kworker on the back burner until the
dust settled, but noticed that there was more going on, and none of it
is RT specific (thus freeing up a back burner).

From there, it's all test what you/Christoph post, as you post it, in
virgin source.

-Mike


Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")

2017-04-04 Thread Michael S. Tsirkin
On Tue, Apr 04, 2017 at 08:38:35PM +0200, Mike Galbraith wrote:
> On Tue, 2017-04-04 at 21:00 +0300, Michael S. Tsirkin wrote:
> 
> > And just making double sure, the 1st version that has the issue
> > is 5c34d002dcc7, isn't it? I'm asking because subject says so
> > but then goes on to list subject from another commit.
> > This one is:
> > > virtio_pci: remove struct virtio_pci_vq_info
> 
> When the hibernation related warnings started I don't know, I wasn't
> targeting that, those fell out of subsequent testing.  I started out
> hunting console breakage point w. threaded irqs, which is 5c34d002dcc7.

OK but 5c34d002dcc7 isn't "virtio_pci: use shared
interrupts for virtqueues".

> 
>   -Mike

I'm confused at this point. I would appreciate the summary of
which versions were tested and what did you see. Testing
a revert might also help.

Thanks a lot for your testing!

-- 
MST


Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")

2017-04-04 Thread Michael S. Tsirkin
On Tue, Apr 04, 2017 at 07:54:36PM +0200, Mike Galbraith wrote:
> On Tue, 2017-04-04 at 19:40 +0200, Mike Galbraith wrote:
> > On Tue, 2017-04-04 at 18:30 +0300, Michael S. Tsirkin wrote:
> > 
> > > I couldn't reproduce it - let's make sure we are using the
> > > same tree. Could you pls try
> > > 
> > > git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git linux-next
> > > 
> > > It's currently at cc79d42a7d7e57ff64f406a1fd3740afebac0b44
> > 
> > Things that make ya go hmm...
> 
> Making double sure we're on the same page...
> 
> git@homer:..git/vhost> git branch
> * linux-next
>   master
> git@homer:..git/vhost> git describe
> warning: tag 'for_linus' is really 'tags_for_linus' here
> for_linus-220128-gcc79d42a7d7e
> git@homer:..git/vhost> git status
> On branch linux-next
> Your branch is up-to-date with 'origin/linux-next'.
> Changes not staged for commit:
>   (use "git add <file>..." to update what will be committed)
>   (use "git checkout -- <file>..." to discard changes in working directory)
> 
> modified:   Makefile
> modified:   scripts/setlocalversion
> 
> no changes added to commit (use "git add" and/or "git commit -a")
> git@homer:..git/vhost>
> 
> Modifications are me whacking '+' sign and -rc5.. I don't do those.


since I couldn't reproduce, I decided it's worth trying to see
what happens if we revert back to before 5c34d002dcc7.


Could you please test a tag "test" in my tree above?
It should point at 6d88af1bf359417eb821370294ba489bdf7f5ab8


That has reverts for code refactorings since 5c34d002dcc7
inclusive. If this finally works, maybe you could
go back and see which of the reverts helps?

The idea is that this only has refactorings nicely isolated,
if all else fails we can even do the reverts without losing
functionality.

-- 
MST


Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")

2017-04-04 Thread Mike Galbraith
On Tue, 2017-04-04 at 21:00 +0300, Michael S. Tsirkin wrote:

> And just making double sure, the 1st version that has the issue
> is 5c34d002dcc7, isn't it? I'm asking because subject says so
> but then goes on to list subject from another commit.
> This one is:
>   > virtio_pci: remove struct virtio_pci_vq_info

When the hibernation related warnings started I don't know, I wasn't
targeting that, those fell out of subsequent testing.  I started out
hunting console breakage point w. threaded irqs, which is 5c34d002dcc7.

-Mike


Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")

2017-04-04 Thread Michael S. Tsirkin
On Tue, Apr 04, 2017 at 07:54:36PM +0200, Mike Galbraith wrote:
> On Tue, 2017-04-04 at 19:40 +0200, Mike Galbraith wrote:
> > On Tue, 2017-04-04 at 18:30 +0300, Michael S. Tsirkin wrote:
> > 
> > > I couldn't reproduce it - let's make sure we are using the
> > > same tree. Could you pls try
> > > 
> > > git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git linux-next
> > > 
> > > It's currently at cc79d42a7d7e57ff64f406a1fd3740afebac0b44
> > 
> > Things that make ya go hmm...
> 
> Making double sure we're on the same page...
> 
> git@homer:..git/vhost> git branch
> * linux-next
>   master
> git@homer:..git/vhost> git describe
> warning: tag 'for_linus' is really 'tags_for_linus' here
> for_linus-220128-gcc79d42a7d7e
> git@homer:..git/vhost> git status
> On branch linux-next
> Your branch is up-to-date with 'origin/linux-next'.
> Changes not staged for commit:
>   (use "git add <file>..." to update what will be committed)
>   (use "git checkout -- <file>..." to discard changes in working directory)
> 
> modified:   Makefile
> modified:   scripts/setlocalversion
> 
> no changes added to commit (use "git add" and/or "git commit -a")
> git@homer:..git/vhost>
> 
> Modifications are me whacking '+' sign and -rc5.. I don't do those.

And just making double sure, the 1st version that has the issue
is 5c34d002dcc7, isn't it? I'm asking because subject says so
but then goes on to list subject from another commit.
This one is:
virtio_pci: remove struct virtio_pci_vq_info

-- 
MST
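
The sha/subject mismatch raised here can be checked mechanically: git can print a commit in the abbreviated-sha-plus-quoted-subject form used in this thread's title. A small shell sketch, assuming a scratch repository with one toy commit (the repository and its subject line are made up for illustration):

```shell
#!/bin/sh
# Sketch: print a commit as <12-char sha> ("<subject>") so the sha and the
# subject quoted in a report can be compared against what git records.
set -e
repo=$(mktemp -d)
cd "$repo"

# Hypothetical scratch repository with a single commit.
git init -q .
git config user.email dev@example.com
git config user.name Dev
echo x > file
git add file
git commit -q -m "example: subject line of a toy commit"

# %h is the abbreviated hash, %s the subject line of the commit.
cite=$(git log -1 --abbrev=12 --pretty='format:%h ("%s")' HEAD)
echo "$cite"
```

Run in a kernel tree with the sha in question in place of `HEAD`, the same command prints whatever subject that commit actually carries.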


Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")

2017-04-04 Thread Mike Galbraith
On Tue, 2017-04-04 at 19:40 +0200, Mike Galbraith wrote:
> On Tue, 2017-04-04 at 18:30 +0300, Michael S. Tsirkin wrote:
> 
> > I couldn't reproduce it - let's make sure we are using the
> > same tree. Could you pls try
> > 
> > git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git linux-next
> > 
> > It's currently at cc79d42a7d7e57ff64f406a1fd3740afebac0b44
> 
> Things that make ya go hmm...

Making double sure we're on the same page...

git@homer:..git/vhost> git branch
* linux-next
  master
git@homer:..git/vhost> git describe
warning: tag 'for_linus' is really 'tags_for_linus' here
for_linus-220128-gcc79d42a7d7e
git@homer:..git/vhost> git status
On branch linux-next
Your branch is up-to-date with 'origin/linux-next'.
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)

modified:   Makefile
modified:   scripts/setlocalversion

no changes added to commit (use "git add" and/or "git commit -a")
git@homer:..git/vhost>

Modifications are me whacking '+' sign and -rc5.. I don't do those.


Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")

2017-04-04 Thread Mike Galbraith
On Tue, 2017-04-04 at 18:30 +0300, Michael S. Tsirkin wrote:

> I couldn't reproduce it - let's make sure we are using the
> same tree. Could you pls try
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git linux-next 
> 
> It's currently at cc79d42a7d7e57ff64f406a1fd3740afebac0b44

Things that make ya go hmm...

[   87.940161] ------------[ cut here ]------------
[   87.940180] WARNING: CPU: 0 PID: 97 at drivers/pci/msi.c:1251 
pci_irq_vector+0xcb/0xe0
[   87.940181] Modules linked in: dm_mod(E) fuse(E) ebtable_filter(E) 
ebtables(E) rpcsec_gss_krb5(E) nfsv4(E) dns_resolver(E) nfs(E) fscache(E) 
nf_log_ipv6(E) xt_pkttype(E) nf_log_ipv4(E) nf_log_common(E) xt_LOG(E) 
xt_limit(E) af_packet(E) iscsi_ibft(E) iscsi_boot_sysfs(E) ip6t_REJECT(E) 
xt_tcpudp(E) nf_conntrack_ipv6(E) nf_defrag_ipv6(E) ip6table_raw(E) 
ipt_REJECT(E) iptable_raw(E) xt_CT(E) iptable_filter(E) ip6table_mangle(E) 
nf_conntrack_netbios_ns(E) nf_conntrack_broadcast(E) nf_conntrack_ipv4(E) 
nf_defrag_ipv4(E) ip_tables(E) xt_conntrack(E) nf_conntrack(E) libcrc32c(E) 
ip6table_filter(E) ip6_tables(E) x_tables(E) snd_hda_codec_generic(E) 
snd_hda_intel(E) snd_hda_codec(E) joydev(E) snd_hda_core(E) snd_hwdep(E) 
snd_pcm(E) snd_timer(E) snd(E) 8139too(E) ppdev(E) soundcore(E) parport_pc(E) 
i2c_piix4(E)
[   87.940206]  parport(E) virtio_balloon(E) crct10dif_pclmul(E) 
crc32_pclmul(E) crc32c_intel(E) ghash_clmulni_intel(E) serio_raw(E) 
acpi_cpufreq(E) pcbc(E) button(E) aesni_intel(E) pcspkr(E) aes_x86_64(E) 
crypto_simd(E) glue_helper(E) cryptd(E) nfsd(E) auth_rpcgss(E) nfs_acl(E) 
lockd(E) grace(E) sunrpc(E) ext4(E) crc16(E) jbd2(E) mbcache(E) hid_generic(E) 
usbhid(E) ata_generic(E) ata_piix(E) sr_mod(E) cdrom(E) virtio_blk(E) 
virtio_rng(E) virtio_console(E) qxl(E) drm_kms_helper(E) syscopyarea(E) 
sysfillrect(E) sysimgblt(E) fb_sys_fops(E) ehci_pci(E) ttm(E) uhci_hcd(E) 
ehci_hcd(E) floppy(E) ahci(E) libahci(E) virtio_pci(E) drm(E) virtio_ring(E) 
virtio(E) usbcore(E) libata(E) 8139cp(E) mii(E) sg(E) scsi_mod(E) autofs4(E)
[   87.940233] CPU: 0 PID: 97 Comm: kworker/u16:1 Tainted: GE   
4.11.0-default #1
[   87.940234] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
rel-1.8.1-0-g4adadbd-20161202_174313-build11a 04/01/2014
[   87.940240] Workqueue: events_unbound async_run_entry_fn
[   87.940241] Call Trace:
[   87.940246]  ? dump_stack+0x5c/0x85
[   87.940255]  ? __warn+0xc4/0xe0
[   87.940258]  ? pci_pm_poweroff+0xf0/0xf0
[   87.940269]  ? pci_irq_vector+0xcb/0xe0
[   87.940272]  ? vp_synchronize_vectors+0x3e/0x50 [virtio_pci]
[   87.940275]  ? virtcons_freeze+0x1a/0xd0 [virtio_console]
[   87.940276]  ? virtio_pci_freeze+0x19/0x40 [virtio_pci]
[   87.940277]  ? pci_pm_freeze+0x59/0xe0
[   87.940281]  ? dpm_run_callback+0x4d/0x170
[   87.940283]  ? __device_suspend+0x11f/0x3b0
[   87.940283]  ? pm_dev_dbg+0x70/0x70
[   87.940284]  ? async_suspend+0x1a/0x90
[   87.940286]  ? async_run_entry_fn+0x34/0x160
[   87.940287]  ? process_one_work+0x164/0x430
[   87.940288]  ? worker_thread+0x135/0x4d0
[   87.940290]  ? kthread+0xff/0x140
[   87.940291]  ? rescuer_thread+0x3c0/0x3c0
[   87.940292]  ? kthread_park+0x80/0x80
[   87.940293]  ? kthread_park+0x80/0x80
[   87.940299]  ? ret_from_fork+0x26/0x40
[   87.940300] ---[ end trace 5d65fe0efc4b61d7 ]---


Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")

2017-04-04 Thread Michael S. Tsirkin
On Tue, Apr 04, 2017 at 04:18:02PM +0200, Mike Galbraith wrote:
> On Tue, 2017-04-04 at 16:38 +0300, Michael S. Tsirkin wrote:
> > On Tue, Apr 04, 2017 at 06:02:52AM +0200, Mike Galbraith wrote:
> > > On Mon, 2017-04-03 at 21:11 +0300, Michael S. Tsirkin wrote:
> > > > On Mon, Apr 03, 2017 at 07:56:32PM +0200, Mike Galbraith wrote:
> > > > > On Mon, 2017-04-03 at 16:18 +0200, Christoph Hellwig wrote:
> > > > > > Mike,
> > > > > > 
> > > > > > can you try the patch below?
> > > > > 
> > > > > No more spinning kworker woes, but I still have a warning on
> > > > > hibernate,
> > > > > threadirqs invariant.  I'm also seeing intermittent post
> > > > > hibernate hang
> > > > > funnies in virgin source +- this patch, and without threadirqs.
> > > > > 
> > > > > [  110.223953] WARNING: CPU: 5 PID: 452 at
> > > > > drivers/pci/msi.c:1261 pci_irq_vector+0xb1/0xe0
> > > > > 
> > > > >   -Mike
> > > > 
> > > > I just sent a patch fixing that.
> > > > However I think we want to print a message when MSI fails to work
> > > > so we
> > > > know guest is falling back on legacy interrupts.
> > > 
> > > The warning persists.
> > > 
> > > [  137.656423] WARNING: CPU: 1 PID: 535 at drivers/pci/msi.c:1261
> > > pci_irq_vector+0xb1/0xe0
> > 
> > Can you post the rest of the backtrace? Is it still in the console?
> 
> This is from a dump of post hibernate loop dying vbox I captured and
> squirreled away, so pid is different.  I'm not absolutely certain that
> I didn't have my local patch set re-applied when I did this, so I'll
> rebuild in the a.m..  My stuff is unrelated, so this should be fine.
> 
> [  328.475988] ------------[ cut here ]------------
> [  328.476002] WARNING: CPU: 6 PID: 313 at drivers/pci/msi.c:1261 
> pci_irq_vector+0xb1/0xe0
> [  328.476003] Modules linked in: fuse(E) ebtable_filter(E) ebtables(E) 
> nf_log_ipv6(E) xt_pkttype(E) nf_log_ipv4(E) nf_log_common(E) xt_LOG(E) 
> xt_limit(E) rpcsec_gss_krb5(E) nfsv4(E) dns_resolver(E) nfs(E) fscache(E) 
> af_packet(E) iscsi_ibft(E) iscsi_boot_sysfs(E) ip6t_REJECT(E) xt_tcpudp(E) 
> nf_conntrack_ipv6(E) nf_defrag_ipv6(E) ip6table_raw(E) ipt_REJECT(E) 
> iptable_raw(E) xt_CT(E) iptable_filter(E) ip6table_mangle(E) 
> nf_conntrack_netbios_ns(E) nf_conntrack_broadcast(E) nf_conntrack_ipv4(E) 
> nf_defrag_ipv4(E) ip_tables(E) xt_conntrack(E) nf_conntrack(E) libcrc32c(E) 
> ip6table_filter(E) ip6_tables(E) x_tables(E) snd_hda_codec_generic(E) 
> snd_hda_intel(E) snd_hda_codec(E) snd_hda_core(E) joydev(E) snd_hwdep(E) 
> snd_pcm(E) snd_timer(E) snd(E) 8139too(E) soundcore(E) i2c_piix4(E) 
> virtio_balloon(E) crct10dif_pclmul(E)
> [  328.476019]  crc32_pclmul(E) ppdev(E) ghash_clmulni_intel(E) parport_pc(E) 
> acpi_cpufreq(E) pcbc(E) button(E) parport(E) aesni_intel(E) aes_x86_64(E) 
> serio_raw(E) pcspkr(E) crypto_simd(E) glue_helper(E) cryptd(E) nfsd(E) 
> auth_rpcgss(E) nfs_acl(E) lockd(E) dm_mod(E) grace(E) sunrpc(E) ext4(E) 
> crc16(E) jbd2(E) mbcache(E) hid_generic(E) usbhid(E) sr_mod(E) cdrom(E) 
> ata_generic(E) virtio_blk(E) virtio_rng(E) virtio_console(E) ata_piix(E) 
> qxl(E) drm_kms_helper(E) syscopyarea(E) uhci_hcd(E) ehci_pci(E) 
> sysfillrect(E) sysimgblt(E) ahci(E) fb_sys_fops(E) ehci_hcd(E) libahci(E) 
> crc32c_intel(E) ttm(E) virtio_pci(E) virtio_ring(E) 8139cp(E) virtio(E) 
> usbcore(E) floppy(E) mii(E) drm(E) libata(E) sg(E) scsi_mod(E) autofs4(E)
> [  328.476037] CPU: 6 PID: 313 Comm: kworker/u16:2 Tainted: GE   
> 4.11.0-default #20
> [  328.476038] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
> rel-1.8.1-0-g4adadbd-20161202_174313-build11a 04/01/2014
> [  328.476041] Workqueue: events_unbound async_run_entry_fn
> [  328.476042] Call Trace:
> [  328.476056]  ? dump_stack+0x5c/0x85
> [  328.476058]  ? __warn+0xc4/0xe0
> [  328.476060]  ? pci_pm_poweroff+0xf0/0xf0
> [  328.476062]  ? pci_irq_vector+0xb1/0xe0
> [  328.476064]  ? vp_del_vqs+0xcb/0x120 [virtio_pci]
> [  328.476066]  ? remove_common+0x60/0x80 [virtio_rng]
> [  328.476067]  ? virtrng_freeze+0xa/0x10 [virtio_rng]
> [  328.476068]  ? virtio_pci_freeze+0x19/0x40 [virtio_pci]
> [  328.476069]  ? pci_pm_freeze+0x59/0xe0
> [  328.476070]  ? dpm_run_callback+0x4d/0x170
> [  328.476071]  ? __device_suspend+0x11f/0x3b0
> [  328.476072]  ? pm_dev_dbg+0x70/0x70
> [  328.476072]  ? async_suspend+0x1a/0x90
> [  328.476082]  ? async_run_entry_fn+0x34/0x160
> [  328.476083]  ? process_one_work+0x164/0x430
> [  328.476084]  ? worker_thread+0x135/0x4d0
> [  328.476085]  ? kthread+0xff/0x140
> [  328.476086]  ? rescuer_thread+0x3c0/0x3c0
> [  328.476087]  ? kthread_park+0x80/0x80
> [  328.476088]  ? do_group_exit+0x39/0xa0
> [  328.476090]  ? ret_from_fork+0x26/0x40
> [  328.476091] ---[ end trace a045c2118936902f ]---


I couldn't reproduce it - let's make sure we are using the
same tree. Could you pls try

git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git linux-next 

It's currently at cc79d42a7d7e57ff64f406a1fd3740afebac0b44
-- 
MST


Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")

2017-04-04 Thread Michael S. Tsirkin
On Tue, Apr 04, 2017 at 04:18:02PM +0200, Mike Galbraith wrote:
> On Tue, 2017-04-04 at 16:38 +0300, Michael S. Tsirkin wrote:
> > On Tue, Apr 04, 2017 at 06:02:52AM +0200, Mike Galbraith wrote:
> > > On Mon, 2017-04-03 at 21:11 +0300, Michael S. Tsirkin wrote:
> > > > On Mon, Apr 03, 2017 at 07:56:32PM +0200, Mike Galbraith wrote:
> > > > > On Mon, 2017-04-03 at 16:18 +0200, Christoph Hellwig wrote:
> > > > > > Mike,
> > > > > > 
> > > > > > can you try the patch below?
> > > > > 
> > > > > No more spinning kworker woes, but I still have a warning on
> > > > > hibernate,
> > > > > threadirqs invariant.  I'm also seeing intermittent post
> > > > > hibernate hang
> > > > > funnies in virgin source +- this patch, and without threadirqs.
> > > > > 
> > > > > [  110.223953] WARNING: CPU: 5 PID: 452 at
> > > > > drivers/pci/msi.c:1261 pci_irq_vector+0xb1/0xe0
> > > > > 
> > > > >   -Mike
> > > > 
> > > > I just sent a patch fixing that.
> > > > However I think we want to print a message when MSI fails to work
> > > > so we
> > > > know guest is falling back on legacy interrupts.
> > > 
> > > The warning persists.
> > > 
> > > [  137.656423] WARNING: CPU: 1 PID: 535 at drivers/pci/msi.c:1261
> > > pci_irq_vector+0xb1/0xe0
> > 
> > Can you post the rest of the backtrace? Is it still in the console?
> 
> This is from a dump of post hibernate loop dying vbox I captured and
> squirreled away, so pid is different.  I'm not absolutely certain that
> I didn't have my local patch set re-applied when I did this, so I'll
> rebuild in the a.m..  My stuff is unrelated, so this should be fine.
> 
> [  328.475988] ------------[ cut here ]------------
> [  328.476002] WARNING: CPU: 6 PID: 313 at drivers/pci/msi.c:1261 
> pci_irq_vector+0xb1/0xe0
> [  328.476003] Modules linked in: fuse(E) ebtable_filter(E) ebtables(E) 
> nf_log_ipv6(E) xt_pkttype(E) nf_log_ipv4(E) nf_log_common(E) xt_LOG(E) 
> xt_limit(E) rpcsec_gss_krb5(E) nfsv4(E) dns_resolver(E) nfs(E) fscache(E) 
> af_packet(E) iscsi_ibft(E) iscsi_boot_sysfs(E) ip6t_REJECT(E) xt_tcpudp(E) 
> nf_conntrack_ipv6(E) nf_defrag_ipv6(E) ip6table_raw(E) ipt_REJECT(E) 
> iptable_raw(E) xt_CT(E) iptable_filter(E) ip6table_mangle(E) 
> nf_conntrack_netbios_ns(E) nf_conntrack_broadcast(E) nf_conntrack_ipv4(E) 
> nf_defrag_ipv4(E) ip_tables(E) xt_conntrack(E) nf_conntrack(E) libcrc32c(E) 
> ip6table_filter(E) ip6_tables(E) x_tables(E) snd_hda_codec_generic(E) 
> snd_hda_intel(E) snd_hda_codec(E) snd_hda_core(E) joydev(E) snd_hwdep(E) 
> snd_pcm(E) snd_timer(E) snd(E) 8139too(E) soundcore(E) i2c_piix4(E) 
> virtio_balloon(E) crct10dif_pclmul(E)
> [  328.476019]  crc32_pclmul(E) ppdev(E) ghash_clmulni_intel(E) parport_pc(E) 
> acpi_cpufreq(E) pcbc(E) button(E) parport(E) aesni_intel(E) aes_x86_64(E) 
> serio_raw(E) pcspkr(E) crypto_simd(E) glue_helper(E) cryptd(E) nfsd(E) 
> auth_rpcgss(E) nfs_acl(E) lockd(E) dm_mod(E) grace(E) sunrpc(E) ext4(E) 
> crc16(E) jbd2(E) mbcache(E) hid_generic(E) usbhid(E) sr_mod(E) cdrom(E) 
> ata_generic(E) virtio_blk(E) virtio_rng(E) virtio_console(E) ata_piix(E) 
> qxl(E) drm_kms_helper(E) syscopyarea(E) uhci_hcd(E) ehci_pci(E) 
> sysfillrect(E) sysimgblt(E) ahci(E) fb_sys_fops(E) ehci_hcd(E) libahci(E) 
> crc32c_intel(E) ttm(E) virtio_pci(E) virtio_ring(E) 8139cp(E) virtio(E) 
> usbcore(E) floppy(E) mii(E) drm(E) libata(E) sg(E) scsi_mod(E) autofs4(E)
> [  328.476037] CPU: 6 PID: 313 Comm: kworker/u16:2 Tainted: GE   
> 4.11.0-default #20
> [  328.476038] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
> rel-1.8.1-0-g4adadbd-20161202_174313-build11a 04/01/2014
> [  328.476041] Workqueue: events_unbound async_run_entry_fn
> [  328.476042] Call Trace:
> [  328.476056]  ? dump_stack+0x5c/0x85
> [  328.476058]  ? __warn+0xc4/0xe0
> [  328.476060]  ? pci_pm_poweroff+0xf0/0xf0
> [  328.476062]  ? pci_irq_vector+0xb1/0xe0
> [  328.476064]  ? vp_del_vqs+0xcb/0x120 [virtio_pci]
> [  328.476066]  ? remove_common+0x60/0x80 [virtio_rng]
> [  328.476067]  ? virtrng_freeze+0xa/0x10 [virtio_rng]
> [  328.476068]  ? virtio_pci_freeze+0x19/0x40 [virtio_pci]
> [  328.476069]  ? pci_pm_freeze+0x59/0xe0
> [  328.476070]  ? dpm_run_callback+0x4d/0x170
> [  328.476071]  ? __device_suspend+0x11f/0x3b0
> [  328.476072]  ? pm_dev_dbg+0x70/0x70
> [  328.476072]  ? async_suspend+0x1a/0x90
> [  328.476082]  ? async_run_entry_fn+0x34/0x160
> [  328.476083]  ? process_one_work+0x164/0x430
> [  328.476084]  ? worker_thread+0x135/0x4d0
> [  328.476085]  ? kthread+0xff/0x140
> [  328.476086]  ? rescuer_thread+0x3c0/0x3c0
> [  328.476087]  ? kthread_park+0x80/0x80
> [  328.476088]  ? do_group_exit+0x39/0xa0
> [  328.476090]  ? ret_from_fork+0x26/0x40
> [  328.476091] ---[ end trace a045c2118936902f ]---

Interesting, it's rng this time. I'll try that.

-- 
MST


Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")

2017-04-04 Thread Mike Galbraith
On Tue, 2017-04-04 at 16:38 +0300, Michael S. Tsirkin wrote:
> On Tue, Apr 04, 2017 at 06:02:52AM +0200, Mike Galbraith wrote:
> > On Mon, 2017-04-03 at 21:11 +0300, Michael S. Tsirkin wrote:
> > > On Mon, Apr 03, 2017 at 07:56:32PM +0200, Mike Galbraith wrote:
> > > > On Mon, 2017-04-03 at 16:18 +0200, Christoph Hellwig wrote:
> > > > > Mike,
> > > > > 
> > > > > can you try the patch below?
> > > > 
> > > > No more spinning kworker woes, but I still have a warning on
> > > > hibernate,
> > > > threadirqs invariant.  I'm also seeing intermittent post
> > > > hibernate hang
> > > > funnies in virgin source +- this patch, and without threadirqs.
> > > > 
> > > > [  110.223953] WARNING: CPU: 5 PID: 452 at
> > > > drivers/pci/msi.c:1261 pci_irq_vector+0xb1/0xe0
> > > > 
> > > >   -Mike
> > > 
> > > I just sent a patch fixing that.
> > > However I think we want to print a message when MSI fails to work
> > > so we
> > > know guest is falling back on legacy interrupts.
> > 
> > The warning persists.
> > 
> > [  137.656423] WARNING: CPU: 1 PID: 535 at drivers/pci/msi.c:1261
> > pci_irq_vector+0xb1/0xe0
> 
> Can you post the rest of the backtrace? Is it still in the console?

This is from a dump of post hibernate loop dying vbox I captured and
squirreled away, so pid is different.  I'm not absolutely certain that
I didn't have my local patch set re-applied when I did this, so I'll
rebuild in the a.m..  My stuff is unrelated, so this should be fine.

[  328.475988] ------------[ cut here ]------------
[  328.476002] WARNING: CPU: 6 PID: 313 at drivers/pci/msi.c:1261 
pci_irq_vector+0xb1/0xe0
[  328.476003] Modules linked in: fuse(E) ebtable_filter(E) ebtables(E) 
nf_log_ipv6(E) xt_pkttype(E) nf_log_ipv4(E) nf_log_common(E) xt_LOG(E) 
xt_limit(E) rpcsec_gss_krb5(E) nfsv4(E) dns_resolver(E) nfs(E) fscache(E) 
af_packet(E) iscsi_ibft(E) iscsi_boot_sysfs(E) ip6t_REJECT(E) xt_tcpudp(E) 
nf_conntrack_ipv6(E) nf_defrag_ipv6(E) ip6table_raw(E) ipt_REJECT(E) 
iptable_raw(E) xt_CT(E) iptable_filter(E) ip6table_mangle(E) 
nf_conntrack_netbios_ns(E) nf_conntrack_broadcast(E) nf_conntrack_ipv4(E) 
nf_defrag_ipv4(E) ip_tables(E) xt_conntrack(E) nf_conntrack(E) libcrc32c(E) 
ip6table_filter(E) ip6_tables(E) x_tables(E) snd_hda_codec_generic(E) 
snd_hda_intel(E) snd_hda_codec(E) snd_hda_core(E) joydev(E) snd_hwdep(E) 
snd_pcm(E) snd_timer(E) snd(E) 8139too(E) soundcore(E) i2c_piix4(E) 
virtio_balloon(E) crct10dif_pclmul(E)
[  328.476019]  crc32_pclmul(E) ppdev(E) ghash_clmulni_intel(E) parport_pc(E) 
acpi_cpufreq(E) pcbc(E) button(E) parport(E) aesni_intel(E) aes_x86_64(E) 
serio_raw(E) pcspkr(E) crypto_simd(E) glue_helper(E) cryptd(E) nfsd(E) 
auth_rpcgss(E) nfs_acl(E) lockd(E) dm_mod(E) grace(E) sunrpc(E) ext4(E) 
crc16(E) jbd2(E) mbcache(E) hid_generic(E) usbhid(E) sr_mod(E) cdrom(E) 
ata_generic(E) virtio_blk(E) virtio_rng(E) virtio_console(E) ata_piix(E) qxl(E) 
drm_kms_helper(E) syscopyarea(E) uhci_hcd(E) ehci_pci(E) sysfillrect(E) 
sysimgblt(E) ahci(E) fb_sys_fops(E) ehci_hcd(E) libahci(E) crc32c_intel(E) 
ttm(E) virtio_pci(E) virtio_ring(E) 8139cp(E) virtio(E) usbcore(E) floppy(E) 
mii(E) drm(E) libata(E) sg(E) scsi_mod(E) autofs4(E)
[  328.476037] CPU: 6 PID: 313 Comm: kworker/u16:2 Tainted: GE   
4.11.0-default #20
[  328.476038] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
rel-1.8.1-0-g4adadbd-20161202_174313-build11a 04/01/2014
[  328.476041] Workqueue: events_unbound async_run_entry_fn
[  328.476042] Call Trace:
[  328.476056]  ? dump_stack+0x5c/0x85
[  328.476058]  ? __warn+0xc4/0xe0
[  328.476060]  ? pci_pm_poweroff+0xf0/0xf0
[  328.476062]  ? pci_irq_vector+0xb1/0xe0
[  328.476064]  ? vp_del_vqs+0xcb/0x120 [virtio_pci]
[  328.476066]  ? remove_common+0x60/0x80 [virtio_rng]
[  328.476067]  ? virtrng_freeze+0xa/0x10 [virtio_rng]
[  328.476068]  ? virtio_pci_freeze+0x19/0x40 [virtio_pci]
[  328.476069]  ? pci_pm_freeze+0x59/0xe0
[  328.476070]  ? dpm_run_callback+0x4d/0x170
[  328.476071]  ? __device_suspend+0x11f/0x3b0
[  328.476072]  ? pm_dev_dbg+0x70/0x70
[  328.476072]  ? async_suspend+0x1a/0x90
[  328.476082]  ? async_run_entry_fn+0x34/0x160
[  328.476083]  ? process_one_work+0x164/0x430
[  328.476084]  ? worker_thread+0x135/0x4d0
[  328.476085]  ? kthread+0xff/0x140
[  328.476086]  ? rescuer_thread+0x3c0/0x3c0
[  328.476087]  ? kthread_park+0x80/0x80
[  328.476088]  ? do_group_exit+0x39/0xa0
[  328.476090]  ? ret_from_fork+0x26/0x40
[  328.476091] ---[ end trace a045c2118936902f ]---


Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")

2017-04-04 Thread Michael S. Tsirkin
On Tue, Apr 04, 2017 at 06:02:52AM +0200, Mike Galbraith wrote:
> On Mon, 2017-04-03 at 21:11 +0300, Michael S. Tsirkin wrote:
> > On Mon, Apr 03, 2017 at 07:56:32PM +0200, Mike Galbraith wrote:
> > > On Mon, 2017-04-03 at 16:18 +0200, Christoph Hellwig wrote:
> > > > Mike,
> > > > 
> > > > can you try the patch below?
> > > 
> > > No more spinning kworker woes, but I still have a warning on hibernate,
> > > threadirqs invariant.  I'm also seeing intermittent post hibernate hang
> > > funnies in virgin source +- this patch, and without threadirqs.
> > > 
> > > [  110.223953] WARNING: CPU: 5 PID: 452 at drivers/pci/msi.c:1261 
> > > pci_irq_vector+0xb1/0xe0
> > > 
> > >   -Mike
> > 
> > I just sent a patch fixing that.
> > However I think we want to print a message when MSI fails to work so we
> > know guest is falling back on legacy interrupts.
> 
> The warning persists.
> 
> [  137.656423] WARNING: CPU: 1 PID: 535 at drivers/pci/msi.c:1261 
> pci_irq_vector+0xb1/0xe0

Can you post the rest of the backtrace? Is it still in the console?

> WRT the post hibernate hang business, that is apparently not part of
> the 4.11 woes (at least not solely), as 4.10.8 did not survive a 10
> hibernate cycle loop.  RT is better at reproducing trouble (shrug, it
> frequently is), but it matters not whether I'm running 4.10, master or
> master-rt, they will all hang.
> 
> WRT gripe, I wedged virtio_pci-fix-msix-vector-tracking-on-cleanup in
> on top, but it wasn't impressed.
> 
>   -Mike


Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")

2017-04-03 Thread Mike Galbraith
On Mon, 2017-04-03 at 21:11 +0300, Michael S. Tsirkin wrote:
> On Mon, Apr 03, 2017 at 07:56:32PM +0200, Mike Galbraith wrote:
> > On Mon, 2017-04-03 at 16:18 +0200, Christoph Hellwig wrote:
> > > Mike,
> > > 
> > > can you try the patch below?
> > 
> > No more spinning kworker woes, but I still have a warning on hibernate,
> > threadirqs invariant.  I'm also seeing intermittent post hibernate hang
> > funnies in virgin source +- this patch, and without threadirqs.
> > 
> > [  110.223953] WARNING: CPU: 5 PID: 452 at drivers/pci/msi.c:1261 
> > pci_irq_vector+0xb1/0xe0
> > 
> >   -Mike
> 
> I just sent a patch fixing that.
> However I think we want to print a message when MSI fails to work so we
> know guest is falling back on legacy interrupts.

The warning persists.

[  137.656423] WARNING: CPU: 1 PID: 535 at drivers/pci/msi.c:1261 
pci_irq_vector+0xb1/0xe0

WRT the post hibernate hang business, that is apparently not part of
the 4.11 woes (at least not solely), as 4.10.8 did not survive a 10
hibernate cycle loop.  RT is better at reproducing trouble (shrug, it
frequently is), but it matters not whether I'm running 4.10, master or
master-rt, they will all hang.

WRT gripe, I wedged virtio_pci-fix-msix-vector-tracking-on-cleanup in
on top, but it wasn't impressed.

-Mike


Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")

2017-04-03 Thread Michael S. Tsirkin
On Mon, Apr 03, 2017 at 07:56:32PM +0200, Mike Galbraith wrote:
> On Mon, 2017-04-03 at 16:18 +0200, Christoph Hellwig wrote:
> > Mike,
> > 
> > can you try the patch below?
> 
> No more spinning kworker woes, but I still have a warning on hibernate,
> threadirqs invariant.  I'm also seeing intermittent post hibernate hang
> funnies in virgin source +- this patch, and without threadirqs.
> 
> [  110.223953] WARNING: CPU: 5 PID: 452 at drivers/pci/msi.c:1261 
> pci_irq_vector+0xb1/0xe0
> 
>   -Mike

I just sent a patch fixing that.
However I think we want to print a message when MSI fails to work so we
know guest is falling back on legacy interrupts.

-- 
MST


Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")

2017-04-03 Thread Mike Galbraith
On Mon, 2017-04-03 at 16:18 +0200, Christoph Hellwig wrote:
> Mike,
> 
> can you try the patch below?

No more spinning kworker woes, but I still have a warning on hibernate,
threadirqs invariant.  I'm also seeing intermittent post hibernate hang
funnies in virgin source +- this patch, and without threadirqs.

[  110.223953] WARNING: CPU: 5 PID: 452 at drivers/pci/msi.c:1261 
pci_irq_vector+0xb1/0xe0

-Mike



Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")

2017-04-03 Thread Michael S. Tsirkin
On Mon, Apr 03, 2017 at 04:18:23PM +0200, Christoph Hellwig wrote:
> Mike,
> 
> can you try the patch below?
> 
> ---
> From fe41a30b54878cc631623b7511267125e0da4b15 Mon Sep 17 00:00:00 2001
> From: Christoph Hellwig 
> Date: Mon, 3 Apr 2017 14:51:35 +0200
> Subject: virtio_pci: don't use shared irq for virtqueues
> 
> Reimplement the shared irq feature manually, as we might have a larger
> number of virtqueues than the core shared interrupt code can handle
> in threaded interrupt mode.
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  drivers/virtio/virtio_pci_common.c | 142 
> +
>  drivers/virtio/virtio_pci_common.h |   1 +
>  2 files changed, 83 insertions(+), 60 deletions(-)

Well the original patch this is trying to fix is
07ec51480b5eb1233f8c1b0f5d7a7c8d1247c507 which dropped just 40 lines
with documentation. It did this by re-using error handling to switch
from per-vq to non-per-vq mode. Now this has separate flows for errors
and per-vq non-per-vq switch and (I think, as a result) is adding 140
lines which doesn't make me very happy.

> diff --git a/drivers/virtio/virtio_pci_common.c 
> b/drivers/virtio/virtio_pci_common.c
> index 590534910dc6..6dd719543410 100644
> --- a/drivers/virtio/virtio_pci_common.c
> +++ b/drivers/virtio/virtio_pci_common.c
> @@ -137,6 +137,9 @@ void vp_del_vqs(struct virtio_device *vdev)
>   kfree(vp_dev->msix_vector_map);
>   }
>  
> + /* free the shared virtuqueue irq if we don't use per-vq irqs */

typo

> + if (vp_dev->shared_vq_vec)
> + free_irq(pci_irq_vector(vp_dev->pci_dev, 1), vp_dev);

So we used to have enums for 1 and 0. I think it was cleaner.
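
For illustration, named vector slots could look like the sketch below. It is
a user-space model, not the driver: the enum names are meant to echo the
constants the earlier code used and should be treated as illustrative, and
teardown_order() is a hypothetical helper that only records which slot would
be released in which order.

```c
#include <assert.h>
#include <stddef.h>

/* Named MSI-X vector slots instead of bare 0 and 1 (illustrative names). */
enum {
	VP_MSIX_CONFIG_VECTOR = 0,	/* configuration-change interrupt */
	VP_MSIX_VQ_VECTOR = 1,		/* shared (or first) virtqueue interrupt */
};

/*
 * Hypothetical model of the teardown in the hunk above: the shared
 * virtqueue vector is released first (only if it is in use), then the
 * config vector. Writes the slots to `order` and returns how many were
 * released.
 */
int teardown_order(int shared_vq_vec, int order[2])
{
	int n = 0;

	assert(order != NULL);
	if (shared_vq_vec)
		order[n++] = VP_MSIX_VQ_VECTOR;
	order[n++] = VP_MSIX_CONFIG_VECTOR;
	return n;
}
```

With names in place, a call such as pci_irq_vector(pdev, VP_MSIX_VQ_VECTOR)
would document itself at the call site.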


>   free_irq(pci_irq_vector(vp_dev->pci_dev, 0), vp_dev);
>   pci_free_irq_vectors(vp_dev->pci_dev);
>  }
> @@ -147,10 +150,10 @@ static int vp_find_vqs_msix(struct virtio_device *vdev, 
> unsigned nvqs,
>  {
>   struct virtio_pci_device *vp_dev = to_vp_device(vdev);
>   const char *name = dev_name(&vp_dev->vdev.dev);
> - int i, j, err = -ENOMEM, allocated_vectors, nvectors;
> + struct pci_dev *pdev = vp_dev->pci_dev;
> + int i, err = -ENOMEM, nvectors;
>   unsigned flags = PCI_IRQ_MSIX;
> - bool shared = false;
> - u16 msix_vec;
> + u16 msix_vec = 0;
>  
>   if (desc) {
>   flags |= PCI_IRQ_AFFINITY;
> @@ -162,19 +165,18 @@ static int vp_find_vqs_msix(struct virtio_device *vdev, 
> unsigned nvqs,
>   if (callbacks[i])
>   nvectors++;
>  
> - /* Try one vector per queue first. */
> - err = pci_alloc_irq_vectors_affinity(vp_dev->pci_dev, nvectors,
> - nvectors, flags, desc);
> + /* Try one vector for config and one per queue first. */
> + err = pci_alloc_irq_vectors_affinity(pdev, nvectors, nvectors, flags,
> + desc);
>   if (err < 0) {
>   /* Fallback to one vector for config, one shared for queues. */
> - shared = true;
> - err = pci_alloc_irq_vectors(vp_dev->pci_dev, 2, 2,
> + nvectors = 2;
> + vp_dev->shared_vq_vec = true;
> + err = pci_alloc_irq_vectors(pdev, nvectors, nvectors,
>   PCI_IRQ_MSIX);
>   if (err < 0)
>   return err;
>   }
> - if (err < 0)
> - return err;
>  
>   vp_dev->msix_vectors = nvectors;
>   vp_dev->msix_names = kmalloc_array(nvectors,
> @@ -194,79 +196,99 @@ static int vp_find_vqs_msix(struct virtio_device *vdev, 
> unsigned nvqs,
>   }
>  
>   /* Set the vector used for configuration */
> - snprintf(vp_dev->msix_names[0], sizeof(*vp_dev->msix_names),
> + snprintf(vp_dev->msix_names[msix_vec], sizeof(*vp_dev->msix_names),
>"%s-config", name);
> - err = request_irq(pci_irq_vector(vp_dev->pci_dev, 0), vp_config_changed,
> - 0, vp_dev->msix_names[0], vp_dev);
> + err = request_irq(pci_irq_vector(pdev, msix_vec), vp_config_changed, 0,
> +   vp_dev->msix_names[msix_vec], vp_dev);
>   if (err)
>   goto out_free_msix_affinity_masks;
>  
>   /* Verify we had enough resources to assign the vector */
> - if (vp_dev->config_vector(vp_dev, 0) == VIRTIO_MSI_NO_VECTOR) {
> + if (vp_dev->config_vector(vp_dev, msix_vec) == VIRTIO_MSI_NO_VECTOR) {
>   err = -EBUSY;
>   goto out_free_config_irq;
>   }
>  
> - vp_dev->msix_vector_map = kmalloc_array(nvqs,
> - sizeof(*vp_dev->msix_vector_map), GFP_KERNEL);
> - if (!vp_dev->msix_vector_map)
> - goto out_disable_config_irq;
> -
> - allocated_vectors = j = 1; /* vector 0 is the config interrupt */
> - for (i = 0; i < nvqs; ++i) {
> - if (!names[i]) {
> - vqs[i] = NULL;
> - continue;
> - }
> -
> - if (callbacks[i])
> -   

Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")

2017-04-03 Thread Michael S. Tsirkin
On Mon, Apr 03, 2017 at 04:18:23PM +0200, Christoph Hellwig wrote:
> Mike,
> 
> can you try the patch below?

It's really easy to test on qemu so I will - just add a dummy
virtio-serial-pci device with -device virtio-serial-pci and
add threadirqs to kernel command line.

However it doesn't look like this will fix the error recovery
for when request irq fails - it will just make the error less likely.

So we still need to look into that - failure should recover
and use the INTx path; ATM it causes hibernation to hang.
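
The fallback order discussed here (one MSI-X vector per virtqueue, then one
shared MSI-X vector for all virtqueues, then legacy INTx) can be sketched as
a small user-space model. This is only an illustration under stated
assumptions, not the driver's code: setup_interrupts(), setup_fn, the
IRQ_MODE_* names and the stub_* helpers are all hypothetical, and a real
driver would also have to release whatever a failed attempt allocated before
trying the next mode.

```c
#include <assert.h>
#include <stdio.h>

/* Interrupt modes, from most to least preferred (hypothetical names). */
enum irq_mode {
	IRQ_MODE_MSIX_PER_VQ,	/* one vector for config, one per virtqueue */
	IRQ_MODE_MSIX_SHARED,	/* one vector for config, one shared by all vqs */
	IRQ_MODE_INTX,		/* legacy line interrupt */
	IRQ_MODE_COUNT
};

/* Hypothetical setup hook: 0 on success, negative errno-style on failure. */
typedef int (*setup_fn)(unsigned int nvqs);

static const char *const mode_names[IRQ_MODE_COUNT] = {
	"per-vq MSI-X", "shared MSI-X", "INTx",
};

/* Demo stubs standing in for real setup attempts. */
int stub_ok(unsigned int nvqs) { (void)nvqs; return 0; }
int stub_fail(unsigned int nvqs) { (void)nvqs; return -16; /* like -EBUSY */ }

/*
 * Try each mode in order of preference. Every failure is logged, and so is
 * any success that is not the preferred mode, so a fallback never happens
 * silently. Returns the chosen mode, or -1 if every mode failed.
 */
int setup_interrupts(const setup_fn try_mode[IRQ_MODE_COUNT], unsigned int nvqs)
{
	for (int m = 0; m < IRQ_MODE_COUNT; m++) {
		int err;

		assert(try_mode[m] != NULL);
		err = try_mode[m](nvqs);
		if (err == 0) {
			if (m != IRQ_MODE_MSIX_PER_VQ)
				fprintf(stderr, "model: fell back to %s interrupts\n",
					mode_names[m]);
			return m;
		}
		fprintf(stderr, "model: %s setup failed (%d)\n",
			mode_names[m], err);
	}
	return -1;
}
```

For example, setup_interrupts((setup_fn[IRQ_MODE_COUNT]){ stub_fail,
stub_fail, stub_ok }, 4) selects IRQ_MODE_INTX and logs both failed attempts,
which is the kind of message that would show a guest is running on legacy
interrupts.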

> ---
> From fe41a30b54878cc631623b7511267125e0da4b15 Mon Sep 17 00:00:00 2001
> From: Christoph Hellwig 
> Date: Mon, 3 Apr 2017 14:51:35 +0200
> Subject: virtio_pci: don't use shared irq for virtqueues
> 
> Reimplement the shared irq feature manually, as we might have a larger
> number of virtqueues than the core shared interrupt code can handle
> in threaded interrupt mode.
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  drivers/virtio/virtio_pci_common.c | 142 
> +
>  drivers/virtio/virtio_pci_common.h |   1 +
>  2 files changed, 83 insertions(+), 60 deletions(-)
> 
> diff --git a/drivers/virtio/virtio_pci_common.c 
> b/drivers/virtio/virtio_pci_common.c
> index 590534910dc6..6dd719543410 100644
> --- a/drivers/virtio/virtio_pci_common.c
> +++ b/drivers/virtio/virtio_pci_common.c
> @@ -137,6 +137,9 @@ void vp_del_vqs(struct virtio_device *vdev)
>   kfree(vp_dev->msix_vector_map);
>   }
>  
> + /* free the shared virtuqueue irq if we don't use per-vq irqs */
> + if (vp_dev->shared_vq_vec)
> + free_irq(pci_irq_vector(vp_dev->pci_dev, 1), vp_dev);
>   free_irq(pci_irq_vector(vp_dev->pci_dev, 0), vp_dev);
>   pci_free_irq_vectors(vp_dev->pci_dev);
>  }
> @@ -147,10 +150,10 @@ static int vp_find_vqs_msix(struct virtio_device *vdev, 
> unsigned nvqs,
>  {
>   struct virtio_pci_device *vp_dev = to_vp_device(vdev);
>   const char *name = dev_name(&vp_dev->vdev.dev);
> - int i, j, err = -ENOMEM, allocated_vectors, nvectors;
> + struct pci_dev *pdev = vp_dev->pci_dev;
> + int i, err = -ENOMEM, nvectors;
>   unsigned flags = PCI_IRQ_MSIX;
> - bool shared = false;
> - u16 msix_vec;
> + u16 msix_vec = 0;
>  
>   if (desc) {
>   flags |= PCI_IRQ_AFFINITY;
> @@ -162,19 +165,18 @@ static int vp_find_vqs_msix(struct virtio_device *vdev, 
> unsigned nvqs,
>   if (callbacks[i])
>   nvectors++;
>  
> - /* Try one vector per queue first. */
> - err = pci_alloc_irq_vectors_affinity(vp_dev->pci_dev, nvectors,
> - nvectors, flags, desc);
> + /* Try one vector for config and one per queue first. */
> + err = pci_alloc_irq_vectors_affinity(pdev, nvectors, nvectors, flags,
> + desc);
>   if (err < 0) {
>   /* Fallback to one vector for config, one shared for queues. */
> - shared = true;
> - err = pci_alloc_irq_vectors(vp_dev->pci_dev, 2, 2,
> + nvectors = 2;
> + vp_dev->shared_vq_vec = true;
> + err = pci_alloc_irq_vectors(pdev, nvectors, nvectors,
>   PCI_IRQ_MSIX);
>   if (err < 0)
>   return err;
>   }
> - if (err < 0)
> - return err;
>  
>   vp_dev->msix_vectors = nvectors;
>   vp_dev->msix_names = kmalloc_array(nvectors,
> @@ -194,79 +196,99 @@ static int vp_find_vqs_msix(struct virtio_device *vdev, 
> unsigned nvqs,
>   }
>  
>   /* Set the vector used for configuration */
> - snprintf(vp_dev->msix_names[0], sizeof(*vp_dev->msix_names),
> + snprintf(vp_dev->msix_names[msix_vec], sizeof(*vp_dev->msix_names),
>"%s-config", name);
> - err = request_irq(pci_irq_vector(vp_dev->pci_dev, 0), vp_config_changed,
> - 0, vp_dev->msix_names[0], vp_dev);
> + err = request_irq(pci_irq_vector(pdev, msix_vec), vp_config_changed, 0,
> +   vp_dev->msix_names[msix_vec], vp_dev);
>   if (err)
>   goto out_free_msix_affinity_masks;
>  
>   /* Verify we had enough resources to assign the vector */
> - if (vp_dev->config_vector(vp_dev, 0) == VIRTIO_MSI_NO_VECTOR) {
> + if (vp_dev->config_vector(vp_dev, msix_vec) == VIRTIO_MSI_NO_VECTOR) {
>   err = -EBUSY;
>   goto out_free_config_irq;
>   }
>  
> - vp_dev->msix_vector_map = kmalloc_array(nvqs,
> - sizeof(*vp_dev->msix_vector_map), GFP_KERNEL);
> - if (!vp_dev->msix_vector_map)
> - goto out_disable_config_irq;
> -
> - allocated_vectors = j = 1; /* vector 0 is the config interrupt */
> - for (i = 0; i < nvqs; ++i) {
> - if (!names[i]) {
> - vqs[i] = NULL;
> - continue;
> - }
> -
> - if (callbacks[i])
> - msix_vec = allocated_vect

Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")

2017-04-03 Thread Christoph Hellwig
Mike,

can you try the patch below?

---
From fe41a30b54878cc631623b7511267125e0da4b15 Mon Sep 17 00:00:00 2001
From: Christoph Hellwig 
Date: Mon, 3 Apr 2017 14:51:35 +0200
Subject: virtio_pci: don't use shared irq for virtqueues

Reimplement the shared irq feature manually, as we might have a larger
number of virtqueues than the core shared interrupt code can handle
in threaded interrupt mode.

Signed-off-by: Christoph Hellwig 
---
 drivers/virtio/virtio_pci_common.c | 142 +
 drivers/virtio/virtio_pci_common.h |   1 +
 2 files changed, 83 insertions(+), 60 deletions(-)

diff --git a/drivers/virtio/virtio_pci_common.c 
b/drivers/virtio/virtio_pci_common.c
index 590534910dc6..6dd719543410 100644
--- a/drivers/virtio/virtio_pci_common.c
+++ b/drivers/virtio/virtio_pci_common.c
@@ -137,6 +137,9 @@ void vp_del_vqs(struct virtio_device *vdev)
kfree(vp_dev->msix_vector_map);
}
 
+   /* free the shared virtqueue irq if we don't use per-vq irqs */
+   if (vp_dev->shared_vq_vec)
+   free_irq(pci_irq_vector(vp_dev->pci_dev, 1), vp_dev);
free_irq(pci_irq_vector(vp_dev->pci_dev, 0), vp_dev);
pci_free_irq_vectors(vp_dev->pci_dev);
 }
@@ -147,10 +150,10 @@ static int vp_find_vqs_msix(struct virtio_device *vdev, 
unsigned nvqs,
 {
struct virtio_pci_device *vp_dev = to_vp_device(vdev);
const char *name = dev_name(&vp_dev->vdev.dev);
-   int i, j, err = -ENOMEM, allocated_vectors, nvectors;
+   struct pci_dev *pdev = vp_dev->pci_dev;
+   int i, err = -ENOMEM, nvectors;
unsigned flags = PCI_IRQ_MSIX;
-   bool shared = false;
-   u16 msix_vec;
+   u16 msix_vec = 0;
 
if (desc) {
flags |= PCI_IRQ_AFFINITY;
@@ -162,19 +165,18 @@ static int vp_find_vqs_msix(struct virtio_device *vdev, 
unsigned nvqs,
if (callbacks[i])
nvectors++;
 
-   /* Try one vector per queue first. */
-   err = pci_alloc_irq_vectors_affinity(vp_dev->pci_dev, nvectors,
-   nvectors, flags, desc);
+   /* Try one vector for config and one per queue first. */
+   err = pci_alloc_irq_vectors_affinity(pdev, nvectors, nvectors, flags,
+   desc);
if (err < 0) {
/* Fallback to one vector for config, one shared for queues. */
-   shared = true;
-   err = pci_alloc_irq_vectors(vp_dev->pci_dev, 2, 2,
+   nvectors = 2;
+   vp_dev->shared_vq_vec = true;
+   err = pci_alloc_irq_vectors(pdev, nvectors, nvectors,
PCI_IRQ_MSIX);
if (err < 0)
return err;
}
-   if (err < 0)
-   return err;
 
vp_dev->msix_vectors = nvectors;
vp_dev->msix_names = kmalloc_array(nvectors,
@@ -194,79 +196,99 @@ static int vp_find_vqs_msix(struct virtio_device *vdev, 
unsigned nvqs,
}
 
/* Set the vector used for configuration */
-   snprintf(vp_dev->msix_names[0], sizeof(*vp_dev->msix_names),
+   snprintf(vp_dev->msix_names[msix_vec], sizeof(*vp_dev->msix_names),
 "%s-config", name);
-   err = request_irq(pci_irq_vector(vp_dev->pci_dev, 0), vp_config_changed,
-   0, vp_dev->msix_names[0], vp_dev);
+   err = request_irq(pci_irq_vector(pdev, msix_vec), vp_config_changed, 0,
+ vp_dev->msix_names[msix_vec], vp_dev);
if (err)
goto out_free_msix_affinity_masks;
 
/* Verify we had enough resources to assign the vector */
-   if (vp_dev->config_vector(vp_dev, 0) == VIRTIO_MSI_NO_VECTOR) {
+   if (vp_dev->config_vector(vp_dev, msix_vec) == VIRTIO_MSI_NO_VECTOR) {
err = -EBUSY;
goto out_free_config_irq;
}
 
-   vp_dev->msix_vector_map = kmalloc_array(nvqs,
-   sizeof(*vp_dev->msix_vector_map), GFP_KERNEL);
-   if (!vp_dev->msix_vector_map)
-   goto out_disable_config_irq;
-
-   allocated_vectors = j = 1; /* vector 0 is the config interrupt */
-   for (i = 0; i < nvqs; ++i) {
-   if (!names[i]) {
-   vqs[i] = NULL;
-   continue;
-   }
-
-   if (callbacks[i])
-   msix_vec = allocated_vectors;
-   else
-   msix_vec = VIRTIO_MSI_NO_VECTOR;
-
-   vqs[i] = vp_dev->setup_vq(vp_dev, i, callbacks[i], names[i],
-   msix_vec);
-   if (IS_ERR(vqs[i])) {
-   err = PTR_ERR(vqs[i]);
-   goto out_remove_vqs;
+   msix_vec++;
+
+   /*
+* Use a different vector for each queue if they are available,
+* else share the same vector for all VQs.
+*/
+   if (vp_dev->shared_vq_vec) {
+   snprintf(vp_

Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")

2017-03-31 Thread Michael S. Tsirkin
On Fri, Mar 31, 2017 at 10:20:49AM +0200, Christoph Hellwig wrote:
> On Fri, Mar 31, 2017 at 06:22:31AM +0300, Michael S. Tsirkin wrote:
> > I'm not sure why it fails after 32 on 64 bit, but as
> > virtio devices aren't limited to 32 vqs it looks like we
> > should go back to requesting the irq only once for all vqs.
> 
> Meh.
> 
> > 
> > Christoph, should I just revert for now, or do you
> > want to look into a smaller patch for this?
> 
> I think we'll need to do a different patch than just a simple revert,
> mostly because so much infrastructure depends on the patch.
> 
> I'll take a look over the weekend.
> 
> > Another question is looking into intx support - that
> > should work but it seems to be broken at the moment.
> 
> Does it?  I'm pretty sure I tested it back when I came up with the
> series by artificially disabling MSI-X in the kernel.  I can try this
> again, though.

I'm not 100% sure. What I see is that we do not handle a failure to
request irqs correctly: we seem to fall back on intx, but
the following freeze then blows up trying to free non-existing
vectors.

It does not seem to trigger with just msix off, so maybe that is
simply a failure to recover from an error correctly.


-- 
MST


Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")

2017-03-31 Thread Christoph Hellwig
On Fri, Mar 31, 2017 at 06:22:31AM +0300, Michael S. Tsirkin wrote:
> I'm not sure why it fails after 32 on 64 bit, but as
> virtio devices aren't limited to 32 vqs it looks like we
> should go back to requesting the irq only once for all vqs.

Meh.

> 
> Christoph, should I just revert for now, or do you
> want to look into a smaller patch for this?

I think we'll need to do a different patch than just a simple revert,
mostly because so much infrastructure depends on the patch.

I'll take a look over the weekend.

> Another question is looking into intx support - that
> should work but it seems to be broken at the moment.

Does it?  I'm pretty sure I tested it back when I came up with the
series by artificially disabling MSI-X in the kernel.  I can try this
again, though.


Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")

2017-03-30 Thread Michael S. Tsirkin
On Fri, Mar 31, 2017 at 04:23:35AM +0300, Michael S. Tsirkin wrote:
> On Thu, Mar 30, 2017 at 09:20:35AM +0200, Mike Galbraith wrote:
> > On Thu, 2017-03-30 at 05:10 +0200, Mike Galbraith wrote:
> > 
> > > WRT spin, you should need do nothing more than boot with threadirqs,
> > > that's 100% repeatable here in absolutely virgin source.
> > 
> > No idea why virtqueue_get_buf() in __send_control_msg() fails forever
> > with threadirqs, but marking that vq as being busted (it clearly is)
> > results in one gripe, and a vbox that seemingly cares not one whit that
> > something went missing.  CONFIG_DEBUG_SHIRQ OTOH notices, mutters
> > something that sounds like "idiot" when I hibernate the thing ;-)
> > 
> > diff --git a/drivers/char/virtio_console.c b/drivers/char/virtio_console.c
> > index e9b7e0b3cabe..831406dae1cb 100644
> > --- a/drivers/char/virtio_console.c
> > +++ b/drivers/char/virtio_console.c
> > @@ -567,6 +567,7 @@ static ssize_t __send_control_msg(struct ports_device 
> > *portdev, u32 port_id,
> > struct scatterlist sg[1];
> > struct virtqueue *vq;
> > unsigned int len;
> > +   unsigned long deadline = jiffies+1;
> >  
> > if (!use_multiport(portdev))
> > return 0;
> > @@ -583,9 +584,13 @@ static ssize_t __send_control_msg(struct ports_device 
> > *portdev, u32 port_id,
> >  
> > if (virtqueue_add_outbuf(vq, sg, 1, &portdev->cpkt, GFP_ATOMIC) == 0) {
> > virtqueue_kick(vq);
> > -   while (!virtqueue_get_buf(vq, &len)
> > -   && !virtqueue_is_broken(vq))
> > +   while (!virtqueue_get_buf(vq, &len) && 
> > !virtqueue_is_broken(vq)) {
> > cpu_relax();
> > +   if (time_after(jiffies, deadline)) {
> > +   trace_printk("Aw crap, I'm stuck.. breaking 
> > device\n");
> > +   virtio_break_device(portdev->vdev);
> > +   }
> > +   }
> > }
> >  
> > spin_unlock(&portdev->c_ovq_lock);
> 
> 
> OK so with your help I was able to reproduce. Surprisingly easy:
> 
> 1. add threadirqs
> 2. add to qemu -device virtio-serial-pci -no-shutdown
> 3. within guest, do echo disk > /sys/power/state
> 
> This produces a warning. Looking deeper into it, I find:
> the device has 64 vqs. This line
> 
>err = request_irq(pci_irq_vector(vp_dev->pci_dev, msix_vec),
>   vring_interrupt, IRQF_SHARED,
>   vp_dev->msix_names[j], vqs[i]);
> 
> fails after assigning interrupts to 33 vqs.
> Is there a limit to how many threaded irqs can share a line?

In fact it fails on the 33rd one, and I see this:

/*
 * Unlikely to have 32 resp 64 irqs sharing one line,
 * but who knows.
 */
if (thread_mask == ~0UL) {
printk(KERN_ERR "%s +%d\n", __FILE__, __LINE__);
ret = -EBUSY;
goto out_mask;
}
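
The check quoted above can be pictured with a small user-space model. This is only a sketch, not the kernel's code: struct shared_line, claim_thread_bit() and count_successful_claims() are made-up names, and the 32-bit mask width is an assumption chosen to mirror the observed limit of 32 handlers. It does not explain why a 64-bit kernel stops at 32.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-line bookkeeping; not the kernel's struct. */
struct shared_line {
	uint32_t thread_mask;	/* one bit per threaded handler on this line */
};

/*
 * Claim the lowest free bit for a new threaded handler.
 * Returns the bit index, or -EBUSY once every bit is taken.
 */
static int claim_thread_bit(struct shared_line *line)
{
	assert(line != NULL);

	if (line->thread_mask == UINT32_MAX)
		return -EBUSY;

	for (int bit = 0; bit < 32; bit++) {
		uint32_t candidate = UINT32_C(1) << bit;

		if (!(line->thread_mask & candidate)) {
			line->thread_mask |= candidate;
			return bit;
		}
	}
	return -EBUSY;	/* not reached: a free bit exists when the mask is not full */
}

/* Count how many handlers can be attached before the line refuses more. */
static int count_successful_claims(int attempts)
{
	struct shared_line line = { 0 };
	int ok = 0;

	for (int i = 0; i < attempts; i++)
		if (claim_thread_bit(&line) >= 0)
			ok++;
	return ok;
}
```

With this model, asking for 64 handlers on one line (one per vq of the device above) gets 32 of them attached and the rest refused.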


I'm not sure why it fails after 32 on 64 bit, but as
virtio devices aren't limited to 32 vqs it looks like we
should go back to requesting the irq only once for all vqs.
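
As a rough illustration of "requesting the irq only once for all vqs": a single handler walks every queue of the device, so the queue count no longer depends on how many handlers may share a line. This is a user-space sketch under that assumption; struct demo_queue, demo_queue_service() and demo_shared_handler() are hypothetical names, not virtio API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical queue type for illustration; not struct virtqueue. */
struct demo_queue {
	bool has_work;		/* set by the "device" when buffers are used */
	unsigned int serviced;	/* how often the per-queue work ran */
};

/* Per-queue work: returns true if this queue had something to do. */
static bool demo_queue_service(struct demo_queue *q)
{
	if (!q->has_work)
		return false;
	q->has_work = false;
	q->serviced++;
	return true;
}

/*
 * One handler for the whole device: registered once, it polls every
 * queue and reports whether any of them had work.
 */
static bool demo_shared_handler(struct demo_queue *queues, size_t nqueues)
{
	bool handled = false;

	assert(queues != NULL || nqueues == 0);
	for (size_t i = 0; i < nqueues; i++)
		if (demo_queue_service(&queues[i]))
			handled = true;
	return handled;
}
```

The cost of this design is that every interrupt touches every queue, which is why it is only attractive as the fallback when per-queue vectors are not available.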

Christoph, should I just revert for now, or do you
want to look into a smaller patch for this?

Another question is looking into intx support - that
should work but it seems to be broken at the moment.


> 
> If so we need to rethink the whole approach.
> 
> Still looking into it.
> 
> Christoph, any idea?
> 
> 
> -- 
> MST


Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")

2017-03-30 Thread Mike Galbraith
On Thu, 2017-03-30 at 05:10 +0200, Mike Galbraith wrote:

> WRT spin, you should need to do nothing more than boot with threadirqs,
> that's 100% repeatable here in absolutely virgin source.

No idea why virtqueue_get_buf() in __send_control_msg() fails forever
with threadirqs, but marking that vq as being busted (it clearly is)
results in one gripe, and a vbox that seemingly cares not one whit that
something went missing.  CONFIG_DEBUG_SHIRQ OTOH notices, mutters
something that sounds like "idiot" when I hibernate the thing ;-)

diff --git a/drivers/char/virtio_console.c b/drivers/char/virtio_console.c
index e9b7e0b3cabe..831406dae1cb 100644
--- a/drivers/char/virtio_console.c
+++ b/drivers/char/virtio_console.c
@@ -567,6 +567,7 @@ static ssize_t __send_control_msg(struct ports_device *portdev, u32 port_id,
 	struct scatterlist sg[1];
 	struct virtqueue *vq;
 	unsigned int len;
+	unsigned long deadline = jiffies+1;
 
 	if (!use_multiport(portdev))
 		return 0;
@@ -583,9 +584,13 @@ static ssize_t __send_control_msg(struct ports_device *portdev, u32 port_id,
 
 	if (virtqueue_add_outbuf(vq, sg, 1, &portdev->cpkt, GFP_ATOMIC) == 0) {
 		virtqueue_kick(vq);
-		while (!virtqueue_get_buf(vq, &len)
-			&& !virtqueue_is_broken(vq))
+		while (!virtqueue_get_buf(vq, &len) && !virtqueue_is_broken(vq)) {
 			cpu_relax();
+			if (time_after(jiffies, deadline)) {
+				trace_printk("Aw crap, I'm stuck.. breaking device\n");
+				virtio_break_device(portdev->vdev);
+			}
+		}
 	}
 
 	spin_unlock(&portdev->c_ovq_lock);
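
The idea of the bandaid, stripped of the kernel specifics, is a spin loop that gives up and marks the device broken once a deadline passes. The sketch below is a user-space model only: it counts polls instead of comparing jiffies, and struct demo_dev, demo_poll_done() and demo_wait_bounded() are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device model: completes after a given number of polls. */
struct demo_dev {
	unsigned long polls_until_done;	/* 0 means "already done" */
	bool broken;
};

static bool demo_poll_done(struct demo_dev *dev)
{
	if (dev->polls_until_done == 0)
		return true;
	dev->polls_until_done--;
	return false;
}

/*
 * Spin until the device completes or is marked broken; mark it broken
 * once more than max_polls polls have failed. Returns true on completion.
 */
static bool demo_wait_bounded(struct demo_dev *dev, unsigned long max_polls)
{
	unsigned long polls = 0;

	assert(dev != NULL);
	while (!dev->broken) {
		if (demo_poll_done(dev))
			return true;
		if (++polls > max_polls)
			dev->broken = true;	/* give up instead of spinning forever */
	}
	return false;
}
```

As in the patch, the "broken" flag is what ends the loop, so the timeout check itself never needs a break statement.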


Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")

2017-03-29 Thread Mike Galbraith
On Wed, 2017-03-29 at 23:19 +0300, Michael S. Tsirkin wrote:
> >    &portdev->max_nr_ports) == 0) {
> > @@ -2179,7 +2179,9 @@ static struct virtio_device_id id_table[
> >  
> >  static unsigned int features[] = {
> >   VIRTIO_CONSOLE_F_SIZE,
> > +#ifndef CONFIG_IRQ_FORCED_THREADING
> >   VIRTIO_CONSOLE_F_MULTIPORT,
> > +#endif
> >  };
> 
> These look kind of questionable.
> Is this part needed?

I would have sworn it was, but double checking, nope, it's not.

Hm, so I could make a prettier bandaid with a runtime check.. but it'd
remain a bandaid, so I'll go do some beans 'n' biscuits work instead.

-Mike


Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")

2017-03-29 Thread Mike Galbraith
On Wed, 2017-03-29 at 23:10 +0300, Michael S. Tsirkin wrote:

> Poking at this some more, I was able to reproduce at
> least some warnings. I still do not see a spin
> but is there a chance this helps your case too?

Well, it's down to one warning, clean on the way back up.

WRT spin, you should need to do nothing more than boot with threadirqs,
that's 100% repeatable here in absolutely virgin source.  Attaching
(obese enterprise-ish) config.

[  174.147626] ------------[ cut here ]------------
[  174.147640] WARNING: CPU: 7 PID: 339 at drivers/pci/msi.c:1251 
pci_irq_vector+0xcb/0xe0
[  174.147640] Modules linked in: dm_mod(E) fuse(E) ebtable_filter(E) 
ebtables(E) nf_log_ipv6(E) rpcsec_gss_krb5(E) xt_pkttype(E) nfsv4(E) 
nf_log_ipv4(E) nf_log_common(E) dns_resolver(E) xt_LOG(E) xt_limit(E) nfs(E) 
fscache(E) af_packet(E) iscsi_ibft(E) iscsi_boot_sysfs(E) ip6t_REJECT(E) 
xt_tcpudp(E) nf_conntrack_ipv6(E) nf_defrag_ipv6(E) ip6table_raw(E) 
ipt_REJECT(E) iptable_raw(E) xt_CT(E) iptable_filter(E) ip6table_mangle(E) 
nf_conntrack_netbios_ns(E) nf_conntrack_broadcast(E) nf_conntrack_ipv4(E) 
nf_defrag_ipv4(E) ip_tables(E) xt_conntrack(E) nf_conntrack(E) libcrc32c(E) 
ip6table_filter(E) ip6_tables(E) x_tables(E) snd_hda_codec_generic(E) 
snd_hda_intel(E) snd_hda_codec(E) joydev(E) snd_hda_core(E) snd_hwdep(E) 
snd_pcm(E) snd_timer(E) crct10dif_pclmul(E) snd(E) crc32_pclmul(E) 
ghash_clmulni_intel(E)
[  174.147664]  pcbc(E) soundcore(E) 8139too(E) aesni_intel(E) i2c_piix4(E) 
ppdev(E) aes_x86_64(E) virtio_balloon(E) crypto_simd(E) parport_pc(E) 
glue_helper(E) serio_raw(E) pcspkr(E) parport(E) cryptd(E) button(E) 
acpi_cpufreq(E) nfsd(E) auth_rpcgss(E) nfs_acl(E) lockd(E) grace(E) sunrpc(E) 
ext4(E) crc16(E) jbd2(E) mbcache(E) hid_generic(E) usbhid(E) sr_mod(E) cdrom(E) 
ata_generic(E) ata_piix(E) virtio_console(E) virtio_blk(E) virtio_rng(E) qxl(E) 
drm_kms_helper(E) syscopyarea(E) sysfillrect(E) ehci_pci(E) sysimgblt(E) 
ahci(E) fb_sys_fops(E) libahci(E) ttm(E) uhci_hcd(E) ehci_hcd(E) virtio_pci(E) 
virtio_ring(E) drm(E) crc32c_intel(E) 8139cp(E) libata(E) usbcore(E) mii(E) 
virtio(E) floppy(E) sg(E) scsi_mod(E) autofs4(E)
[  174.147702] CPU: 7 PID: 339 Comm: kworker/u16:3 Tainted: GE   
4.11.0-default #2
[  174.147702] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
rel-1.8.1-0-g4adadbd-20161202_174313-build11a 04/01/2014
[  174.147707] Workqueue: events_unbound async_run_entry_fn
[  174.147708] Call Trace:
[  174.147713]  ? dump_stack+0x5c/0x85
[  174.147718]  ? __warn+0xc4/0xe0
[  174.147721]  ? pci_pm_poweroff+0xf0/0xf0
[  174.147722]  ? pci_irq_vector+0xcb/0xe0
[  174.147725]  ? vp_synchronize_vectors+0x3e/0x50 [virtio_pci]
[  174.147727]  ? virtcons_freeze+0x1f/0xa0 [virtio_console]
[  174.147729]  ? virtio_pci_freeze+0x19/0x40 [virtio_pci]
[  174.147730]  ? pci_pm_freeze+0x59/0xe0
[  174.147737]  ? dpm_run_callback+0x4d/0x170
[  174.147738]  ? __device_suspend+0x11f/0x3b0
[  174.147739]  ? pm_dev_dbg+0x70/0x70
[  174.147739]  ? async_suspend+0x1a/0x90
[  174.147740]  ? async_run_entry_fn+0x34/0x160
[  174.147742]  ? process_one_work+0x164/0x430
[  174.147743]  ? worker_thread+0x135/0x4d0
[  174.147744]  ? kthread+0xff/0x140
[  174.147745]  ? rescuer_thread+0x3c0/0x3c0
[  174.147746]  ? kthread_park+0x80/0x80
[  174.147753]  ? ret_from_fork+0x26/0x40
[  174.147754] ---[ end trace 02cd3f1b527dc954 ]---

config.xz
Description: application/xz


Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")

2017-03-29 Thread Michael S. Tsirkin
On Wed, Mar 29, 2017 at 08:23:22AM +0200, Mike Galbraith wrote:
> On Mon, 2017-03-27 at 20:18 +0200, Mike Galbraith wrote:
> 
> > BTW, WRT RT woes with $subject, I tried booting a generic kernel with
> > threadirqs, and bingo, same deal, just a bit more painful than for RT,
> > where there's no watchdog moaning accompanying the (preemptible) spin.
> 
> BTW++: the last hunk of this bandaid may be a bug fix.  With only the
> first two, box tried to use uninitialized stuff on hibernate, went
> boom.  Looks like that may be possible without help from me.
> 
> --- a/drivers/char/virtio_console.c
> +++ b/drivers/char/virtio_console.c
> @@ -2058,7 +2058,7 @@ static int virtcons_probe(struct virtio_
>   portdev->max_nr_ports = 1;
>  
>   /* Don't test MULTIPORT at all if we're rproc: not a valid feature! */
> - if (!is_rproc_serial(vdev) &&
> + if (!is_rproc_serial(vdev) && !IS_ENABLED(CONFIG_IRQ_FORCED_THREADING) 
> &&
>   virtio_cread_feature(vdev, VIRTIO_CONSOLE_F_MULTIPORT,
>struct virtio_console_config, max_nr_ports,
>&portdev->max_nr_ports) == 0) {
> @@ -2179,7 +2179,9 @@ static struct virtio_device_id id_table[
>  
>  static unsigned int features[] = {
>   VIRTIO_CONSOLE_F_SIZE,
> +#ifndef CONFIG_IRQ_FORCED_THREADING
>   VIRTIO_CONSOLE_F_MULTIPORT,
> +#endif
>  };

These look kind of questionable.
Is this part needed?

>  static struct virtio_device_id rproc_serial_id_table[] = {
> @@ -2202,14 +2204,16 @@ static int virtcons_freeze(struct virtio
>  
>   vdev->config->reset(vdev);
>  
> - virtqueue_disable_cb(portdev->c_ivq);
> + if (use_multiport(portdev))
> + virtqueue_disable_cb(portdev->c_ivq);
>   cancel_work_sync(&portdev->control_work);
>   cancel_work_sync(&portdev->config_work);
>   /*
>* Once more: if control_work_handler() was running, it would
>* enable the cb as the last step.
>*/
> - virtqueue_disable_cb(portdev->c_ivq);
> + if (use_multiport(portdev))
> + virtqueue_disable_cb(portdev->c_ivq);
>   remove_controlq_data(portdev);
>  
>   list_for_each_entry(port, &portdev->ports, list) {

This looks real. No idea why interrupt sharing would
trigger anything like this, but go figure.
Can you pls submit this separately with
a signature?

-- 
MST


Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")

2017-03-29 Thread Michael S. Tsirkin
On Wed, Mar 29, 2017 at 08:23:22AM +0200, Mike Galbraith wrote:
> On Mon, 2017-03-27 at 20:18 +0200, Mike Galbraith wrote:
> 
> > BTW, WRT RT woes with $subject, I tried booting a generic kernel with
> > threadirqs, and bingo, same deal, just a bit more painful than for RT,
> > where there's no watchdog moaning accompanying the (preemptible) spin.
> 
> BTW++: the last hunk of this bandaid may be a bug fix.  With only the
> first two, box tried to use uninitialized stuff on hibernate, went
> boom.  Looks like that may be possible without help from me.
> 
> --- a/drivers/char/virtio_console.c
> +++ b/drivers/char/virtio_console.c
> @@ -2058,7 +2058,7 @@ static int virtcons_probe(struct virtio_
>   portdev->max_nr_ports = 1;
>  
>   /* Don't test MULTIPORT at all if we're rproc: not a valid feature! */
> - if (!is_rproc_serial(vdev) &&
> + if (!is_rproc_serial(vdev) && !IS_ENABLED(CONFIG_IRQ_FORCED_THREADING) 
> &&
>   virtio_cread_feature(vdev, VIRTIO_CONSOLE_F_MULTIPORT,
>struct virtio_console_config, max_nr_ports,
>&portdev->max_nr_ports) == 0) {
> @@ -2179,7 +2179,9 @@ static struct virtio_device_id id_table[
>  
>  static unsigned int features[] = {
>   VIRTIO_CONSOLE_F_SIZE,
> +#ifndef CONFIG_IRQ_FORCED_THREADING
>   VIRTIO_CONSOLE_F_MULTIPORT,
> +#endif
>  };
>  
>  static struct virtio_device_id rproc_serial_id_table[] = {
> @@ -2202,14 +2204,16 @@ static int virtcons_freeze(struct virtio
>  
>   vdev->config->reset(vdev);
>  
> - virtqueue_disable_cb(portdev->c_ivq);
> + if (use_multiport(portdev))
> + virtqueue_disable_cb(portdev->c_ivq);
>   cancel_work_sync(&portdev->control_work);
>   cancel_work_sync(&portdev->config_work);
>   /*
>* Once more: if control_work_handler() was running, it would
>* enable the cb as the last step.
>*/
> - virtqueue_disable_cb(portdev->c_ivq);
> + if (use_multiport(portdev))
> + virtqueue_disable_cb(portdev->c_ivq);
>   remove_controlq_data(portdev);
>  
>   list_for_each_entry(port, &portdev->ports, list) {


Poking at this some more, I was able to reproduce at
least some warnings. I still do not see a spin
but is there a chance this helps your case too?

commit 85039ca3162295759cf986aa753778043a90012c
Author: Michael S. Tsirkin 
Date:   Wed Mar 29 23:02:28 2017 +0300

virtio_pci: fix msix vector tracking on cleanup

virtio pci tracks allocated vectors in a variable: msix_vectors. This
isn't reset on del_vqs; as a result, if reset is called after vqs are
deleted, we try to synchronize non-existing irqs, producing a (probably
harmless) warning.

Fixes: 07ec51480b5e ("virtio_pci: use shared interrupts for virtqueues")
Signed-off-by: Michael S. Tsirkin 

diff --git a/drivers/virtio/virtio_pci_common.c b/drivers/virtio/virtio_pci_common.c
index baae423..a70bed6 100644
--- a/drivers/virtio/virtio_pci_common.c
+++ b/drivers/virtio/virtio_pci_common.c
@@ -151,6 +151,7 @@ void vp_del_vqs(struct virtio_device *vdev)
 	}
 
 	free_irq(pci_irq_vector(vp_dev->pci_dev, 0), vp_dev);
+	vp_dev->msix_vectors = 0;
 	pci_free_irq_vectors(vp_dev->pci_dev);
 }
 
@@ -294,6 +295,7 @@ static int vp_find_vqs_msix(struct virtio_device *vdev, unsigned nvqs,
 out_free_msix_names:
 	kfree(vp_dev->msix_names);
 out_free_irq_vectors:
+	vp_dev->msix_vectors = 0;
 	pci_free_irq_vectors(vp_dev->pci_dev);
 	return err;
 }
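
The bookkeeping problem the commit message describes can be modelled in a few lines. This is a sketch with made-up names (struct demo_pci_dev and the demo_* helpers), assuming only what the message states: a synchronize pass walks as many vectors as the counter says exist, so the counter has to drop to zero when the vectors are freed.

```c
#include <assert.h>

/* Hypothetical device state; only the vector count matters here. */
struct demo_pci_dev {
	int msix_vectors;	/* number of vectors currently allocated */
	int sync_calls;		/* how many vectors sync passes have touched */
};

static void demo_alloc_vectors(struct demo_pci_dev *dev, int n)
{
	dev->msix_vectors = n;
}

/* Free the vectors and, crucially, forget how many there were. */
static void demo_del_vqs(struct demo_pci_dev *dev)
{
	dev->msix_vectors = 0;
}

/* Walk every vector the device believes it owns. */
static void demo_synchronize_vectors(struct demo_pci_dev *dev)
{
	assert(dev->msix_vectors >= 0);
	for (int i = 0; i < dev->msix_vectors; i++)
		dev->sync_calls++;	/* stands in for synchronizing vector i */
}
```

Without the reset in demo_del_vqs(), a later synchronize pass would still iterate over the stale count and touch vectors that no longer exist.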


Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")

2017-03-28 Thread Mike Galbraith
On Mon, 2017-03-27 at 20:18 +0200, Mike Galbraith wrote:

> BTW, WRT RT woes with $subject, I tried booting a generic kernel with
> threadirqs, and bingo, same deal, just a bit more painful than for RT,
> where there's no watchdog moaning accompanying the (preemptible) spin.

BTW++: the last hunk of this bandaid may be a bug fix.  With only the
first two, box tried to use uninitialized stuff on hibernate, went
boom.  Looks like that may be possible without help from me.

--- a/drivers/char/virtio_console.c
+++ b/drivers/char/virtio_console.c
@@ -2058,7 +2058,7 @@ static int virtcons_probe(struct virtio_
portdev->max_nr_ports = 1;
 
/* Don't test MULTIPORT at all if we're rproc: not a valid feature! */
-   if (!is_rproc_serial(vdev) &&
+   if (!is_rproc_serial(vdev) && !IS_ENABLED(CONFIG_IRQ_FORCED_THREADING) 
&&
virtio_cread_feature(vdev, VIRTIO_CONSOLE_F_MULTIPORT,
 struct virtio_console_config, max_nr_ports,
 &portdev->max_nr_ports) == 0) {
@@ -2179,7 +2179,9 @@ static struct virtio_device_id id_table[
 
 static unsigned int features[] = {
VIRTIO_CONSOLE_F_SIZE,
+#ifndef CONFIG_IRQ_FORCED_THREADING
VIRTIO_CONSOLE_F_MULTIPORT,
+#endif
 };
 
 static struct virtio_device_id rproc_serial_id_table[] = {
@@ -2202,14 +2204,16 @@ static int virtcons_freeze(struct virtio
 
vdev->config->reset(vdev);
 
-   virtqueue_disable_cb(portdev->c_ivq);
+   if (use_multiport(portdev))
+   virtqueue_disable_cb(portdev->c_ivq);
cancel_work_sync(&portdev->control_work);
cancel_work_sync(&portdev->config_work);
/*
 * Once more: if control_work_handler() was running, it would
 * enable the cb as the last step.
 */
-   virtqueue_disable_cb(portdev->c_ivq);
+   if (use_multiport(portdev))
+   virtqueue_disable_cb(portdev->c_ivq);
remove_controlq_data(portdev);
 
list_for_each_entry(port, &portdev->ports, list) {
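
The last hunk boils down to "only touch the control queue if multiport was negotiated". A minimal user-space sketch of that guard follows; struct demo_vq, struct demo_portdev and demo_freeze_control_queue() are hypothetical, and the assumption (taken from the description above) is that the control queue pointer is never set up when multiport is off.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct demo_vq {
	bool cb_enabled;
};

struct demo_portdev {
	bool multiport;		/* was the multiport feature negotiated? */
	struct demo_vq *c_ivq;	/* control queue; only set up with multiport */
};

/* Returns true if the callback was disabled, false if there was no queue to touch. */
static bool demo_freeze_control_queue(struct demo_portdev *pd)
{
	assert(pd != NULL);
	if (!pd->multiport)
		return false;	/* c_ivq was never initialised; leave it alone */
	pd->c_ivq->cb_enabled = false;
	return true;
}
```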


Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")

2017-03-28 Thread Mike Galbraith
On Tue, 2017-03-28 at 20:27 +0300, Michael S. Tsirkin wrote:
> On Tue, Mar 28, 2017 at 06:33:53PM +0200, Mike Galbraith wrote:
> > On Tue, 2017-03-28 at 18:37 +0300, Michael S. Tsirkin wrote:
> > 
> > > Anything specific that you do to trigger this?
> > 
> > Nope, all I have to do is to poke kde Power/Session Hibernate
> > button.  
> 
> Oh so you actually start hypernate? Is this what you mean when
> you say "poke"?

s/hyper/hiber, but yes, and button poking == mouse clicking.

-Mike


Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")

2017-03-28 Thread Michael S. Tsirkin
On Tue, Mar 28, 2017 at 06:33:53PM +0200, Mike Galbraith wrote:
> On Tue, 2017-03-28 at 18:37 +0300, Michael S. Tsirkin wrote:
> 
> > Anything specific that you do to trigger this?
> 
> Nope, all I have to do is to poke kde Power/Session Hibernate button.  

Oh so you actually start hypernate? Is this what you mean when
you say "poke"?

> Not that it should matter, but the vm is a full clone of my 42.1 box,
> including git server/repos etc, so has all whistles/bells/lard.
> 
>   -Mike


Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")

2017-03-28 Thread Mike Galbraith
On Tue, 2017-03-28 at 18:37 +0300, Michael S. Tsirkin wrote:

> Anything specific that you do to trigger this?

Nope, all I have to do is to poke kde Power/Session Hibernate button.  

Not that it should matter, but the vm is a full clone of my 42.1 box,
including git server/repos etc, so has all whistles/bells/lard.

-Mike



Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")

2017-03-28 Thread Michael S. Tsirkin
On Tue, Mar 28, 2017 at 05:16:13AM +0200, Mike Galbraith wrote:
> On Tue, 2017-03-28 at 05:35 +0300, Michael S. Tsirkin wrote:
> > On Tue, Mar 28, 2017 at 03:08:20AM +0200, Mike Galbraith wrote:
> > > On Mon, 2017-03-27 at 21:16 +0300, Michael S. Tsirkin wrote:
> > > 
> > > > Mike, could you pls send lspci -vv that shows up after
> > > > boot?
> > > 
> > > Presuming you mean the virtual box..
> > 
> > Yes. Hmm nothing strange here. Can you pls post your QEMU
> > command line so I can try reproducing?
> 
> I don't start from the command line, I poke buttons in gui tool for
> virt-weenies, below is ps result (hope your monitor is 37 feet wide).
> 
> /usr/bin/qemu-system-x86_64 -name opensuse42.1 -S -machine 
> pc-i440fx-2.3,accel=kvm,usb=off,vmport=off -cpu 
> Haswell-noTSX,+abm,+pdpe1gb,+rdrand,+f16c,+osxsave,+pdcm,+xtpr,+tm2,+est,+smx,+vmx,+ds_cpl,+monitor,+dtes64,+pbe,+tm,+ht,+ss,+acpi,+ds,+vme
>  -m 8192 -realtime mlock=off -smp 8,sockets=1,cores=1,threads=8 -uuid 
> afff4e95-262d-41ca-9189-f40c87c9375b -no-user-config -nodefaults -chardev 
> socket,id=charmonitor,path=/var/lib/libvirt/qemu/opensuse42.1.monitor,server,nowait
>  -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew 
> -global kvm-pit.lost_tick_policy=discard -no-hpet -no-shutdown -global 
> PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -boot strict=on -device 
> ich9-usb-ehci1,id=usb,bus=pci.0,addr=0x6.0x7 -device 
> ich9-usb-uhci1,masterbus=usb.0,firstport=0,bus=pci.0,multifunction=on,addr=0x6
>  -device ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pci.0,addr=0x6.0x1 
> -device ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=pci.0,addr=0x6.0x2 
> -device ahci,id=sata0,bus=pci.0,addr=0x5 -device 
> virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x7 -drive 
> file=/abuild/lib/libvirt/images/opensuse42.1.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,cache=off
>  -device 
> virtio-blk-pci,scsi=off,bus=pci.0,addr=0xa,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1
>  -drive 
> file=/dev/sr0,if=none,media=cdrom,id=drive-sata0-0-0,readonly=on,format=raw 
> -device ide-cd,bus=sata0.0,drive=drive-sata0-0-0,id=sata0-0-0 -netdev 
> tap,fd=22,id=hostnet0 -device 
> rtl8139,netdev=hostnet0,id=net0,mac=52:54:00:be:db:82,bus=pci.0,addr=0x3 
> -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 
> -chardev spicevmc,id=charchannel0,name=vdagent -device 
> virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.spice.0
>  -chardev 
> socket,id=charchannel1,path=/var/lib/libvirt/qemu/channel/target/opensuse42.1.org.qemu.guest_agent.0,server,nowait
>  -device 
> virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=org.qemu.guest_agent.0
>  -device usb-tablet,id=input0 -spice 
> port=5900,addr=127.0.0.1,disable-ticketing,seamless-migration=on -k de 
> -device 
> qxl-vga,id=video0,ram_size=67108864,vram_size=67108864,vgamem_mb=16,bus=pci.0,addr=0x2
>  -device intel-hda,id=sound0,bus=pci.0,addr=0x4 -device 
> hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 -chardev 
> spicevmc,id=charredir0,name=usbredir -device 
> usb-redir,chardev=charredir0,id=redir0 -chardev 
> spicevmc,id=charredir1,name=usbredir -device 
> usb-redir,chardev=charredir1,id=redir1 -device 
> virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x8 -object 
> rng-random,id=objrng0,filename=/dev/random -device 
> virtio-rng-pci,rng=objrng0,id=rng0,bus=pci.0,addr=0x9 -msg timestamp=on
> 
> If you want the xml instead, holler.
> 
>   -Mike

No, that's fine, thanks. Anything specific that you do to trigger this?

-- 
MST


Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")

2017-03-27 Thread Mike Galbraith
On Tue, 2017-03-28 at 05:35 +0300, Michael S. Tsirkin wrote:
> On Tue, Mar 28, 2017 at 03:08:20AM +0200, Mike Galbraith wrote:
> > On Mon, 2017-03-27 at 21:16 +0300, Michael S. Tsirkin wrote:
> > 
> > > Mike, could you pls send lspci -vv that shows up after
> > > boot?
> > 
> > Presuming you mean the virtual box..
> 
> Yes. Hmm nothing strange here. Can you pls post your QEMU
> command line so I can try reproducing?

I don't start from the command line, I poke buttons in gui tool for
virt-weenies, below is ps result (hope your monitor is 37 feet wide).

/usr/bin/qemu-system-x86_64 -name opensuse42.1 -S -machine 
pc-i440fx-2.3,accel=kvm,usb=off,vmport=off -cpu 
Haswell-noTSX,+abm,+pdpe1gb,+rdrand,+f16c,+osxsave,+pdcm,+xtpr,+tm2,+est,+smx,+vmx,+ds_cpl,+monitor,+dtes64,+pbe,+tm,+ht,+ss,+acpi,+ds,+vme
 -m 8192 -realtime mlock=off -smp 8,sockets=1,cores=1,threads=8 -uuid 
afff4e95-262d-41ca-9189-f40c87c9375b -no-user-config -nodefaults -chardev 
socket,id=charmonitor,path=/var/lib/libvirt/qemu/opensuse42.1.monitor,server,nowait
 -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew 
-global kvm-pit.lost_tick_policy=discard -no-hpet -no-shutdown -global 
PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -boot strict=on -device 
ich9-usb-ehci1,id=usb,bus=pci.0,addr=0x6.0x7 -device 
ich9-usb-uhci1,masterbus=usb.0,firstport=0,bus=pci.0,multifunction=on,addr=0x6 
-device ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pci.0,addr=0x6.0x1 
-device ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=pci.0,addr=0x6.0x2 
-device ahci,id=sata0,bus=pci.0,addr=0x5 -device 
virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x7 -drive 
file=/abuild/lib/libvirt/images/opensuse42.1.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,cache=off
 -device 
virtio-blk-pci,scsi=off,bus=pci.0,addr=0xa,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1
 -drive 
file=/dev/sr0,if=none,media=cdrom,id=drive-sata0-0-0,readonly=on,format=raw 
-device ide-cd,bus=sata0.0,drive=drive-sata0-0-0,id=sata0-0-0 -netdev 
tap,fd=22,id=hostnet0 -device 
rtl8139,netdev=hostnet0,id=net0,mac=52:54:00:be:db:82,bus=pci.0,addr=0x3 
-chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 
-chardev spicevmc,id=charchannel0,name=vdagent -device 
virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.spice.0
 -chardev 
socket,id=charchannel1,path=/var/lib/libvirt/qemu/channel/target/opensuse42.1.org.qemu.guest_agent.0,server,nowait
 -device 
virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=org.qemu.guest_agent.0
 -device usb-tablet,id=input0 -spice 
port=5900,addr=127.0.0.1,disable-ticketing,seamless-migration=on -k de -device 
qxl-vga,id=video0,ram_size=67108864,vram_size=67108864,vgamem_mb=16,bus=pci.0,addr=0x2
 -device intel-hda,id=sound0,bus=pci.0,addr=0x4 -device 
hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 -chardev 
spicevmc,id=charredir0,name=usbredir -device 
usb-redir,chardev=charredir0,id=redir0 -chardev 
spicevmc,id=charredir1,name=usbredir -device 
usb-redir,chardev=charredir1,id=redir1 -device 
virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x8 -object 
rng-random,id=objrng0,filename=/dev/random -device 
virtio-rng-pci,rng=objrng0,id=rng0,bus=pci.0,addr=0x9 -msg timestamp=on

If you want the xml instead, holler.

-Mike


Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")

2017-03-27 Thread Michael S. Tsirkin
On Tue, Mar 28, 2017 at 03:08:20AM +0200, Mike Galbraith wrote:
> On Mon, 2017-03-27 at 21:16 +0300, Michael S. Tsirkin wrote:
> 
> > Mike, could you pls send lspci -vv that shows up after
> > boot?
> 
> Presuming you mean the virtual box..

Yes. Hmm nothing strange here. Can you pls post your QEMU
command line so I can try reproducing?


Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")

2017-03-27 Thread Mike Galbraith
On Mon, 2017-03-27 at 21:16 +0300, Michael S. Tsirkin wrote:

> Mike, could you pls send lspci -vv that shows up after
> boot?

Presuming you mean the virtual box..

00:00.0 Host bridge: Intel Corporation 440FX - 82441FX PMC [Natoma] (rev 02)
Subsystem: Red Hat, Inc Qemu virtual machine
Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- 
Stepping- SERR+ FastB2B- DisINTx-
Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- [...]
[remainder of the lspci -vv output was lost in the archive]

Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")

2017-03-27 Thread Mike Galbraith
On Mon, 2017-03-27 at 19:05 +0200, Christoph Hellwig wrote:
> Hi Mike,
> 
> does the patch below fix that issue for you?

Nope, warnings are alive and well.

> diff --git a/drivers/virtio/virtio_pci_common.c
> b/drivers/virtio/virtio_pci_common.c
> index df548a6fb844..fd1b06368b1f 100644
> --- a/drivers/virtio/virtio_pci_common.c
> +++ b/drivers/virtio/virtio_pci_common.c
> @@ -176,7 +176,7 @@ static int vp_find_vqs_msix(struct virtio_device *vdev, unsigned nvqs,
>   if (err < 0)
>   return err;
>  
> - vp_dev->msix_vectors = nvectors;
> + vp_dev->msix_vectors = err; /* number of vectors allocated */
>   vp_dev->msix_names = kmalloc_array(nvectors,
>   sizeof(*vp_dev->msix_names), GFP_KERNEL);
>   if (!vp_dev->msix_names)


Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")

2017-03-27 Thread Mike Galbraith
On Mon, 2017-03-27 at 19:05 +0200, Christoph Hellwig wrote:
> Hi Mike,
> 
> does the patch below fix that issue for you?

Thanks, I'll give it a go in the A.M.

BTW, WRT RT woes with $subject, I tried booting a generic kernel with
threadirqs, and bingo, same deal, just a bit more painful than for RT,
where there's no watchdog moaning accompanying the (preemptible) spin.

[   28.346311] NMI watchdog: BUG: soft lockup - CPU#7 stuck for 22s! 
[kworker/7:1:108]
[   28.347536] Modules linked in: virtio_rng(E) virtio_blk(E) virtio_console(E) 
ata_piix(E) qxl(E) drm_kms_helper(E) syscopyarea(E) sysfillrect(E) sysimgblt(E) 
fb_sys_fops(E) ttm(E) ahci(E) libahci(E) drm(E) ehci_pci(E) uhci_hcd(E) 
ehci_hcd(E) usbcore(E) libata(E) virtio_pci(E) virtio_ring(E) virtio(E) 
8139cp(E) floppy(E) mii(E) sg(E) scsi_mod(E) autofs4(E)
[   28.351160] CPU: 7 PID: 108 Comm: kworker/7:1 Tainted: GE   
4.11.0-default #30
[   28.352085] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
rel-1.8.1-0-g4adadbd-20161202_174313-build11a 04/01/2014
[   28.353547] Workqueue: events control_work_handler [virtio_console]
[   28.354450] task: 8802370d4440 task.stack: c900010d8000
[   28.355281] RIP: 0010:__send_control_msg+0xbd/0xd0 [virtio_console]
[   28.356005] RSP: 0018:c900010dbd20 EFLAGS: 0246 ORIG_RAX: 
ff10
[   28.356987] RAX:  RBX: 880231c31ec8 RCX: 880231cb1000
[   28.357866] RDX: 0001 RSI: c900010dbd2c RDI: 880234f87400
[   28.358738] RBP: c900010dbd78 R08: 01080020 R09: c900010dbd30
[   28.359718] R10: 88023fdddc00 R11: ffc8 R12: 880234f87400
[   28.360653] R13: 880231c31ea8 R14: 0001 R15: 0003
[   28.361510] FS:  () GS:88023fdc() 
knlGS:
[   28.362433] CS:  0010 DS:  ES:  CR0: 80050033
[   28.363177] CR2: 7f4da0f4 CR3: 01c09000 CR4: 001406e0
[   28.363994] Call Trace:
[   28.364420]  add_port+0x23f/0x3d0 [virtio_console]
[   28.365094]  ? _raw_spin_unlock_irqrestore+0x24/0x40
[   28.365765]  handle_control_message.constprop.32+0x2c2/0x2e0 [virtio_console]
[   28.366622]  control_work_handler+0x52/0xb7 [virtio_console]
[   28.367291]  process_one_work+0x15c/0x440
[   28.367869]  worker_thread+0x137/0x4b0
[   28.368426]  kthread+0x10c/0x140
[   28.368921]  ? process_one_work+0x440/0x440
[   28.369477]  ? kthread_create_on_node+0x40/0x40
[   28.370067]  ret_from_fork+0x2c/0x40
[   28.370611] Code: 57 e1 48 83 c4 30 31 c0 5b 41 5c 41 5d 41 5e 41 5f 5d c3 
4c 89 e7 e8 03 93 f7 ff eb 0e 4c 89 e7 e8 89 84 f7 ff 84 c0 75 d1 f3 90 <48> 8d 
75 b4 4c 89 e7 e8 57 91 f7 ff 48 85 c0 74 e1 eb bc 0f 1f 


Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")

2017-03-27 Thread Michael S. Tsirkin
On Mon, Mar 27, 2017 at 07:05:40PM +0200, Christoph Hellwig wrote:
> Hi Mike,
> 
> does the patch below fix that issue for you?
> 
> diff --git a/drivers/virtio/virtio_pci_common.c 
> b/drivers/virtio/virtio_pci_common.c
> index df548a6fb844..fd1b06368b1f 100644
> --- a/drivers/virtio/virtio_pci_common.c
> +++ b/drivers/virtio/virtio_pci_common.c
> @@ -176,7 +176,7 @@ static int vp_find_vqs_msix(struct virtio_device *vdev, 
> unsigned nvqs,
>   if (err < 0)
>   return err;
>  
> - vp_dev->msix_vectors = nvectors;
> + vp_dev->msix_vectors = err; /* number of vectors allocated */
>   vp_dev->msix_names = kmalloc_array(nvectors,
>   sizeof(*vp_dev->msix_names), GFP_KERNEL);
>   if (!vp_dev->msix_names)

Can this sometimes allocate fewer vectors than the min number requested then?
I didn't realize.

In that case we probably should change if (err < 0)
to if (err != nvectors) and similarly for when we try
to get 2 vectors.


Mike, could you pls send lspci -vv that shows up after
boot?

-- 
MST


Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")

2017-03-27 Thread Christoph Hellwig
Hi Mike,

does the patch below fix that issue for you?

diff --git a/drivers/virtio/virtio_pci_common.c b/drivers/virtio/virtio_pci_common.c
index df548a6fb844..fd1b06368b1f 100644
--- a/drivers/virtio/virtio_pci_common.c
+++ b/drivers/virtio/virtio_pci_common.c
@@ -176,7 +176,7 @@ static int vp_find_vqs_msix(struct virtio_device *vdev, unsigned nvqs,
 	if (err < 0)
 		return err;
 
-	vp_dev->msix_vectors = nvectors;
+	vp_dev->msix_vectors = err; /* number of vectors allocated */
 	vp_dev->msix_names = kmalloc_array(nvectors,
 					   sizeof(*vp_dev->msix_names), GFP_KERNEL);
 	if (!vp_dev->msix_names)


Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")

2017-03-27 Thread Mike Galbraith
On Thu, 2017-03-23 at 15:56 +0100, Christoph Hellwig wrote:
> Does the patch from Jason in the
> 
> "[REGRESSION] 07ec51480b5e ("virtio_pci: use shared interrupts for 
> virtqueues") causes crashes in guest"
> 
> thread fix the issue for you?

That seems to eliminate explosions, but not the below.

07ec51480b5e causes me some kworker grief in -rt too (100% CPU), but
that's as yet not been stared at (too darn [busy/lazy], pick one;).

virgin 4.11-rc4+referenced patch, config=enterprise-ish.
...
[  158.400210] PM: Hibernation mode set to 'shutdown'
[  158.607439] PM: Syncing filesystems ... 
[  158.986595] PM: done.
[  158.986771] Freezing user space processes ... (elapsed 0.001 seconds) done.
[  158.988758] PM: Marking nosave pages: [mem 0x-0x0fff]
[  158.989156] PM: Marking nosave pages: [mem 0x0009f000-0x000f]
[  158.989550] PM: Marking nosave pages: [mem 0xbffde000-0x]
[  158.990200] PM: Basic memory bitmaps created
[  158.990468] PM: Preallocating image memory... done (allocated 395798 pages)
[  159.114650] PM: Allocated 1583192 kbytes in 0.12 seconds (13193.26 MB/s)
[  159.115203] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) 
done.
[  159.119378] ------------[ cut here ]------------
[  159.122606] WARNING: CPU: 3 PID: 509 at drivers/pci/msi.c:1251 
pci_irq_vector+0xcf/0xe0
[  159.123194] Modules linked in: fuse(E) ebtable_filter(E) ebtables(E) 
rpcsec_gss_krb5(E) nfsv4(E) dns_resolver(E) nfs(E) fscache(E) nf_log_ipv6(E) 
xt_pkttype(E) nf_log_ipv4(E) nf_log_common(E) xt_LOG(E) xt_limit(E) 
af_packet(E) iscsi_ibft(E) iscsi_boot_sysfs(E) ip6t_REJECT(E) xt_tcpudp(E) 
nf_conntrack_ipv6(E) nf_defrag_ipv6(E) ip6table_raw(E) ipt_REJECT(E) 
iptable_raw(E) xt_CT(E) iptable_filter(E) ip6table_mangle(E) 
nf_conntrack_netbios_ns(E) nf_conntrack_broadcast(E) nf_conntrack_ipv4(E) 
nf_defrag_ipv4(E) ip_tables(E) xt_conntrack(E) nf_conntrack(E) libcrc32c(E) 
ip6table_filter(E) ip6_tables(E) x_tables(E) snd_hda_codec_generic(E) 
snd_hda_intel(E) snd_hda_codec(E) snd_hda_core(E) snd_hwdep(E) joydev(E) 
snd_pcm(E) snd_timer(E) snd(E) crct10dif_pclmul(E) soundcore(E) crc32_pclmul(E) 
8139too(E) ghash_clmulni_intel(E)
[  159.128123]  pcbc(E) aesni_intel(E) ppdev(E) i2c_piix4(E) aes_x86_64(E) 
virtio_balloon(E) crypto_simd(E) parport_pc(E) glue_helper(E) parport(E) 
button(E) pcspkr(E) cryptd(E) serio_raw(E) acpi_cpufreq(E) nfsd(E) dm_mod(E) 
auth_rpcgss(E) nfs_acl(E) lockd(E) grace(E) sunrpc(E) ext4(E) crc16(E) jbd2(E) 
mbcache(E) hid_generic(E) usbhid(E) sr_mod(E) cdrom(E) ata_generic(E) 
virtio_rng(E) virtio_blk(E) virtio_console(E) ata_piix(E) floppy(E) ehci_pci(E) 
qxl(E) drm_kms_helper(E) syscopyarea(E) uhci_hcd(E) ahci(E) ehci_hcd(E) 
sysfillrect(E) crc32c_intel(E) sysimgblt(E) libahci(E) fb_sys_fops(E) ttm(E) 
virtio_pci(E) virtio_ring(E) usbcore(E) virtio(E) 8139cp(E) drm(E) libata(E) 
mii(E) sg(E) scsi_mod(E) autofs4(E)
[  159.132177] CPU: 3 PID: 509 Comm: kworker/u16:6 Tainted: GE   
4.11.0-default #28
[  159.132677] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
rel-1.8.1-0-g4adadbd-20161202_174313-build11a 04/01/2014
[  159.133428] Workqueue: events_unbound async_run_entry_fn
[  159.133768] Call Trace:
[  159.133933]  dump_stack+0x63/0x90
[  159.134161]  __warn+0xd1/0xf0
[  159.134360]  ? pci_pm_poweroff+0x100/0x100
[  159.134627]  warn_slowpath_null+0x1d/0x20
[  159.134889]  pci_irq_vector+0xcf/0xe0
[  159.135134]  vp_synchronize_vectors+0x46/0x60 [virtio_pci]
[  159.135486]  vp_reset+0x37/0x40 [virtio_pci]
[  159.135780]  virtcons_freeze+0x23/0xa0 [virtio_console]
[  159.136116]  virtio_device_freeze+0x6b/0x80 [virtio]
[  159.136431]  virtio_pci_freeze+0x1d/0x40 [virtio_pci]
[  159.136756]  pci_pm_freeze+0x5f/0xe0
[  159.136999]  dpm_run_callback+0x59/0x180
[  159.137252]  __device_suspend+0x127/0x3c0
[  159.137513]  ? pm_dev_dbg+0x80/0x80
[  159.137740]  async_suspend+0x1f/0xa0
[  159.137973]  async_run_entry_fn+0x39/0x170
[  159.138250]  process_one_work+0x16c/0x450
[  159.138514]  worker_thread+0x137/0x4e0
[  159.138761]  kthread+0x10c/0x140
[  159.138970]  ? rescuer_thread+0x3c0/0x3c0
[  159.139235]  ? kthread_park+0x90/0x90
[  159.139476]  ret_from_fork+0x2c/0x40
[  159.139721] ---[ end trace d66daafbe82e66e7 ]---
[  159.728658] PM: freeze of devices complete after 611.743 msecs
[  159.729321] PM: late freeze of devices complete after 0.243 msecs
[  159.730921] PM: noirq freeze of devices complete after 1.145 msecs
[  159.731507] Disabling non-boot CPUs ...
[  159.732004] Unregister pv shared memory for cpu 1
[  159.739017] smpboot: CPU 1 is now offline
[  159.765702] Unregister pv shared memory for cpu 2
[  159.770757] smpboot: CPU 2 is now offline
[  159.797684] Unregister pv shared memory for cpu 3
[  159.799545] smpboot: CPU 3 is now offline
[  159.821934] Unregister pv shared memory for cpu 4
[  159.823759] smpboot: CPU 4 is now offline
[  159.848588] Unregister pv shared memory for cpu 5
[  159.850375] smpboot: CPU 5 is now offline

Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")

2017-03-23 Thread Richard W.M. Jones
On Thu, Mar 23, 2017 at 03:56:22PM +0100, Christoph Hellwig wrote:
> Does the patch from Jason in the
> 
> "[REGRESSION] 07ec51480b5e ("virtio_pci: use shared interrupts for 
> virtqueues") causes crashes in guest"
> 
> thread fix the issue for you?

In brief, yes it does.  I followed up on that thread.

Thanks,

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
Fedora Windows cross-compiler. Compile Windows programs, test, and
build Windows installers. Over 100 libraries supported.
http://fedoraproject.org/wiki/MinGW


Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")

2017-03-23 Thread Thorsten Leemhuis
Hi Christoph! Hi Michael!

(Mail roughly based on text from
https://bugzilla.kernel.org/show_bug.cgi?id=194911 )

I'm seeing random crashes during boot every few boot attempts when
running Linux 4.11-rc/mainline in a Fedora 26 guest under a CentOS7 host
(CPU: Intel(R) Pentium(R) CPU G3220) using KVM. Sometimes, when the guest
actually booted, the network did not work. For an impression of the
crashes I got, see this gallery:
https://plus.google.com/+ThorstenLeemhuis/posts/FjyyGjNtrrG

Richard W.M. Jones and Adam Williamson see the same problems. See above
bug for details. It seems they ran into the problem in the past few
days, so I assume it's still present in mainline (I'm travelling
currently and haven't had time for proper tests since last Friday
(pre-rc3); but I thought it's time to get the problem to the lists).

Long story short: Richard and I did bisections and we both found that
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=07ec51480b5e
("virtio_pci: use shared interrupts for virtqueues") is the first bad
commit. Any idea what might be wrong? Do you need more details from us
to fix this?

Ciao, Thorsten

P.S.: Sorry, I should have written this mail a few days ago after filing
above bug report, but I didn't get around to it :-/


Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")

2017-03-23 Thread Thorsten Leemhuis
On 23.03.2017 15:56, Christoph Hellwig wrote:
> Does the patch from Jason in the
> "[REGRESSION] 07ec51480b5e ("virtio_pci: use shared interrupts for 
> virtqueues") causes crashes in guest"
> thread fix the issue for you?

Ha, sorry, I'm travelling and wasn't aware that Laura earlier today did
what I should have done a few days ago: bring the issue to the proper
mailing lists.

I'll give the patch a try. Thx for pointing it out and sorry for the
noise. Ciao, Thorsten


Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")

2017-03-23 Thread Richard W.M. Jones
On Thu, Mar 23, 2017 at 03:56:22PM +0100, Christoph Hellwig wrote:
> Does the patch from Jason in the
> 
> "[REGRESSION] 07ec51480b5e ("virtio_pci: use shared interrupts for 
> virtqueues") causes crashes in guest"
> 
> thread fix the issue for you?

I didn't see this thread before.  I'll check that out for you now.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-p2v converts physical machines to virtual machines.  Boot with a
live CD or over the network (PXE) and turn machines into KVM guests.
http://libguestfs.org/virt-v2v


Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")

2017-03-23 Thread Richard W.M. Jones
On Thu, Mar 23, 2017 at 03:51:25PM +0100, Thorsten Leemhuis wrote:
> Hi Christoph! Hi Michael!
> 
> (Mail roughly based on text from
> https://bugzilla.kernel.org/show_bug.cgi?id=194911 )
> 
> I'm seeing random crashes during boot every few boot attempts when
> running Linux 4.11-rc/mainline in a Fedora 26 guest under a CentOS7 host
> (CPU: Intel(R) Pentium(R) CPU G3220) using KVM. Sometimes, when the guest
> actually booted, the network did not work. For an impression of the
> crashes I got, see this gallery:
> https://plus.google.com/+ThorstenLeemhuis/posts/FjyyGjNtrrG
> 
> Richard W.M. Jones and Adam Williamson see the same problems. See above
> bug for details. It seems they ran into the problem in the past few
> days, so I assume it's still present in mainline (I'm travelling
> currently and haven't had time for proper tests since last Friday
> (pre-rc3); but I thought it's time to get the problem to the lists).
> 
> Long story short: Richard and I did bisections and we both found that
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=07ec51480b5e
> ("virtio_pci: use shared interrupts for virtqueues") is the first bad
> commit. Any idea what might be wrong? Do you need more details from us
> to fix this?

Laura Abbott posted a kernel RPM which works for me.  She has had to
revert quite a number of commits, which are detailed in this comment:

  https://bugzilla.redhat.com/show_bug.cgi?id=1430297#c7

Her reverting patch is also attached.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-builder quickly builds VMs from scratch
http://libguestfs.org/virt-builder.1.html

From 4d3cba0be27b20516eb765c2913bce93e73fe30e Mon Sep 17 00:00:00 2001
From: Laura Abbott 
Date: Wed, 22 Mar 2017 15:41:27 -0700
Subject: [PATCH] Revert a bunch of virtio commits

07ec51480b5e ("virtio_pci: use shared interrupts for virtqueues")
is linked to a bunch of issues. Unfortunately we can't just revert it by
itself. Revert it and dependency patches as well.

Revert "virtio: provide a method to get the IRQ affinity mask for a virtqueue"

This reverts commit bbaba479563910aaa51e59bb9027a09e396d3a3c.

Revert "virtio-console: avoid DMA from stack"

This reverts commit c4baad50297d84bde1a7ad45e50c73adae4a2192.

Revert "vhost: introduce O(1) vq metadata cache"

This reverts commit f889491380582b4ba2981cf0b0d7d6a40fb30ab7.

 Conflicts:
drivers/vhost/vhost.c

Revert "virtio_scsi: use virtio IRQ affinity"

This reverts commit 0d9f0a52c8b9f7a003fe1650b7d5fb8518efabe0.

Revert "virtio_blk: use virtio IRQ affinity"

This reverts commit ad71473d9c43725c917fc5a86d54ceb7001ee28c.

Revert "blk-mq: provide a default queue mapping for virtio device"

This reverts commit 73473427bb551686e4b68ecd99bfd27e6635286a.

Revert "virtio: allow drivers to request IRQ affinity when creating VQs"

This reverts commit fb5e31d970ce8b4941f03ed765d7dbefc39f22d9.

Revert "virtio_pci: simplify MSI-X setup"

This reverts commit 52a61516125fa9a21b3bdf4f90928308e2e5573f.

Revert "virtio_pci: don't duplicate the msix_enable flag in struct pci_dev"

This reverts commit 53a020c661741f3b87ad3ac6fa545088aaebac9b.

Revert "virtio_pci: use shared interrupts for virtqueues"

This reverts commit 07ec51480b5eb1233f8c1b0f5d7a7c8d1247c507.
---
 block/Kconfig  |   5 -
 block/Makefile |   1 -
 block/blk-mq-virtio.c  |  54 --
 drivers/block/virtio_blk.c |  14 +-
 drivers/char/virtio_console.c  |  14 +-
 drivers/crypto/virtio/virtio_crypto_core.c |   2 +-
 drivers/gpu/drm/virtio/virtgpu_kms.c   |   2 +-
 drivers/misc/mic/vop/vop_main.c|   2 +-
 drivers/net/caif/caif_virtio.c |   3 +-
 drivers/net/virtio_net.c   |   2 +-
 drivers/remoteproc/remoteproc_virtio.c |   3 +-
 drivers/rpmsg/virtio_rpmsg_bus.c   |   2 +-
 drivers/s390/virtio/kvm_virtio.c   |   3 +-
 drivers/s390/virtio/virtio_ccw.c   |   3 +-
 drivers/scsi/virtio_scsi.c | 127 +++--
 drivers/vhost/vhost.c  | 136 +++---
 drivers/vhost/vhost.h  |   8 -
 drivers/virtio/virtio_balloon.c|   3 +-
 drivers/virtio/virtio_input.c  |   3 +-
 drivers/virtio/virtio_mmio.c   |   3 +-
 drivers/virtio/virtio_pci_common.c | 287 +++--
 drivers/virtio/virtio_pci_common.h |  25 ++-
 drivers/virtio/virtio_pci_legacy.c |   3 +-
 drivers/virtio/virtio_pci_modern.c |  11 +-
 include/linux/blk-mq-virtio.h  |  10 -
 include/linux/cpuhotplug.h |   1 +
 include/linux/virtio_config.h  |  12 +-
 include/uapi/linux/virtio_pci.h|   2 +-
 net/vmw_vsock/virtio_transport.c   |   3 +-
 29 files changed, 337

Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")

2017-03-23 Thread Christoph Hellwig
Does the patch from Jason in the

"[REGRESSION] 07ec51480b5e ("virtio_pci: use shared interrupts for virtqueues") 
causes crashes in guest"

thread fix the issue for you?