Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")
On Tue, 2017-04-11 at 00:23 +0300, Michael S. Tsirkin wrote:
> On Sat, Apr 08, 2017 at 07:01:34AM +0200, Mike Galbraith wrote:
> > On Fri, 2017-04-07 at 21:56 +0300, Michael S. Tsirkin wrote:
> >
> > > OK. test3 and test4 are now pushed: test3 should fix your hang,
> > > test4 is trying to fix a crash reported independently.
> >
> > test3 does not fix the post hibernate hang business that I can easily
> > reproduce, those are NFS, and at least as old as 4.4. Host/guest,
> > dunno, put 4.4 on both, guest hangs intermittently.
>
> OK so IIUC you agree it's a good idea to send test4 to Linus, right?

Well, my box agrees that that is a viable option.

> Hibernation's still broken but that's not a regression.

Yup.

> > [] __rpc_wait_for_completion_task+0x30/0x30 [sunrpc]
> > [] rpc_wait_bit_killable+0x1e/0xb0 [sunrpc]
> > [] __rpc_wait_for_completion_task+0x30/0x30 [sunrpc]
> > [] autoremove_wake_function+0x50/0x50
> > [] call_decode+0x850/0x850 [sunrpc]
> > [] call_decode+0x850/0x850 [sunrpc]
> > [] __rpc_execute+0x14e/0x440 [sunrpc]
> > [] ktime_get+0x35/0xa0
> > [] rpc_run_task+0x120/0x170 [sunrpc]
> > [] nfs4_call_sync_sequence+0x56/0x80 [nfsv4]
> > [] _nfs4_proc_getattr+0xb0/0xc0 [nfsv4]
> > [] path_lookupat+0xd2/0x100
> > [] nfs4_proc_getattr+0x5c/0xe0 [nfsv4]
> > [] __nfs_revalidate_inode+0xa0/0x300 [nfs]
> > [] nfs_getattr+0x95/0x250 [nfs]
> > [] vfs_statx+0x7b/0xc0
> > [] SYSC_newstat+0x20/0x40
> > [] entry_SYSCALL_64_fastpath+0x1a/0xa9
> > [] 0x
> >
> > I noted no _other_ misbehavior in either kernel, w/wo threadirqs.
> >
> > -Mike
>
> Interesting. I would guess virtio net does not complete some
> packets. So you were unable to find an old guest where this
> works fine?

I just tried my opensuse 13.2 clone. It works markedly less fine, turns
into a brick either on the way down or back up in short order.

-Mike
Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")
On Sat, Apr 08, 2017 at 07:01:34AM +0200, Mike Galbraith wrote:
> On Fri, 2017-04-07 at 21:56 +0300, Michael S. Tsirkin wrote:
>
> > OK. test3 and test4 are now pushed: test3 should fix your hang,
> > test4 is trying to fix a crash reported independently.
>
> test3 does not fix the post hibernate hang business that I can easily
> reproduce, those are NFS, and at least as old as 4.4. Host/guest,
> dunno, put 4.4 on both, guest hangs intermittently.

OK so IIUC you agree it's a good idea to send test4 to Linus, right?
Hibernation's still broken but that's not a regression.

> [] __rpc_wait_for_completion_task+0x30/0x30 [sunrpc]
> [] rpc_wait_bit_killable+0x1e/0xb0 [sunrpc]
> [] __rpc_wait_for_completion_task+0x30/0x30 [sunrpc]
> [] autoremove_wake_function+0x50/0x50
> [] call_decode+0x850/0x850 [sunrpc]
> [] call_decode+0x850/0x850 [sunrpc]
> [] __rpc_execute+0x14e/0x440 [sunrpc]
> [] ktime_get+0x35/0xa0
> [] rpc_run_task+0x120/0x170 [sunrpc]
> [] nfs4_call_sync_sequence+0x56/0x80 [nfsv4]
> [] _nfs4_proc_getattr+0xb0/0xc0 [nfsv4]
> [] path_lookupat+0xd2/0x100
> [] nfs4_proc_getattr+0x5c/0xe0 [nfsv4]
> [] __nfs_revalidate_inode+0xa0/0x300 [nfs]
> [] nfs_getattr+0x95/0x250 [nfs]
> [] vfs_statx+0x7b/0xc0
> [] SYSC_newstat+0x20/0x40
> [] entry_SYSCALL_64_fastpath+0x1a/0xa9
> [] 0x
>
> I noted no _other_ misbehavior in either kernel, w/wo threadirqs.
>
> -Mike

Interesting. I would guess virtio net does not complete some
packets. So you were unable to find an old guest where this
works fine?

--
MST
Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")
On Fri, 2017-04-07 at 21:56 +0300, Michael S. Tsirkin wrote:

> OK. test3 and test4 are now pushed: test3 should fix your hang,
> test4 is trying to fix a crash reported independently.

test3 does not fix the post hibernate hang business that I can easily
reproduce, those are NFS, and at least as old as 4.4. Host/guest,
dunno, put 4.4 on both, guest hangs intermittently.

[] __rpc_wait_for_completion_task+0x30/0x30 [sunrpc]
[] rpc_wait_bit_killable+0x1e/0xb0 [sunrpc]
[] __rpc_wait_for_completion_task+0x30/0x30 [sunrpc]
[] autoremove_wake_function+0x50/0x50
[] call_decode+0x850/0x850 [sunrpc]
[] call_decode+0x850/0x850 [sunrpc]
[] __rpc_execute+0x14e/0x440 [sunrpc]
[] ktime_get+0x35/0xa0
[] rpc_run_task+0x120/0x170 [sunrpc]
[] nfs4_call_sync_sequence+0x56/0x80 [nfsv4]
[] _nfs4_proc_getattr+0xb0/0xc0 [nfsv4]
[] path_lookupat+0xd2/0x100
[] nfs4_proc_getattr+0x5c/0xe0 [nfsv4]
[] __nfs_revalidate_inode+0xa0/0x300 [nfs]
[] nfs_getattr+0x95/0x250 [nfs]
[] vfs_statx+0x7b/0xc0
[] SYSC_newstat+0x20/0x40
[] entry_SYSCALL_64_fastpath+0x1a/0xa9
[] 0x

I noted no _other_ misbehavior in either kernel, w/wo threadirqs.

-Mike
Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")
On Fri, Apr 07, 2017 at 04:29:53PM +0200, Mike Galbraith wrote:
> On Fri, 2017-04-07 at 16:35 +0300, Michael S. Tsirkin wrote:
>
> > Oh wait, I still put the ctx feature patches in there :(
> > Pls ignore, I'll update when I've fixed it up. Sorry about the noise.
>
> Both worked fine w/wo threadirqs.
>
> -Mike

OK. test3 and test4 are now pushed: test3 should fix your hang,
test4 is trying to fix a crash reported independently.

Will push to linux-next once I hear from you.

--
MST
Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")
On Fri, 2017-04-07 at 16:35 +0300, Michael S. Tsirkin wrote:

> Oh wait, I still put the ctx feature patches in there :(
> Pls ignore, I'll update when I've fixed it up. Sorry about the noise.

Both worked fine w/wo threadirqs.

-Mike
Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")
On Fri, Apr 07, 2017 at 04:20:12PM +0300, Michael S. Tsirkin wrote: > On Fri, Apr 07, 2017 at 09:22:02AM +0200, Mike Galbraith wrote: > > On Fri, 2017-04-07 at 09:05 +0200, Mike Galbraith wrote: > > > On Fri, 2017-04-07 at 08:44 +0200, Mike Galbraith wrote: > > > > On Fri, 2017-04-07 at 09:24 +0300, Michael S. Tsirkin wrote: > > > > > On Fri, Apr 07, 2017 at 08:03:19AM +0200, Mike Galbraith wrote: > > > > > > > > > > Test tag works fine here w/wo threadirqs, RT works as well. > > > > > > > > > > > > -Mike > > > > > > > > > > Thanks a lot. > > > > > OK I pushed out two new tags > > > > > test1 with just the cleanup reverts > > > > > test2 with a bugfix in this area > > > > > > > > > > > > > > > I would very much appreciate your testing report on both - > > > > > should be ok but better make sure. > > > > > > > > Ok, once it percolates out I'll do that. > > > > > > for_linus-10-g960bef2a6172 contains a -ENOBUILD merge conflict. > > > > But test2 works fine w/wo threadirqs. > > Oops. This is what one gets by pushing at 2am. I fixed that one up > (still didn't even build as I'm in the middle of a conference). > Also it's actually the reverse test2 is just the revert test1 has > one more bugfix. > > So I'm inclined to push test2 out to linux-next for now, and will > add test1 later if it fares well. > > Mike, your testing is very much appreciated! Oh wait, I still put the ctx feature patches in there :( Pls ignore, I'll update when I've fixed it up. Sorry about the noise. > -- > MST
Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")
On Fri, Apr 07, 2017 at 09:22:02AM +0200, Mike Galbraith wrote:
> On Fri, 2017-04-07 at 09:05 +0200, Mike Galbraith wrote:
> > On Fri, 2017-04-07 at 08:44 +0200, Mike Galbraith wrote:
> > > On Fri, 2017-04-07 at 09:24 +0300, Michael S. Tsirkin wrote:
> > > > On Fri, Apr 07, 2017 at 08:03:19AM +0200, Mike Galbraith wrote:
> > > >
> > > > > Test tag works fine here w/wo threadirqs, RT works as well.
> > > > >
> > > > > -Mike
> > > >
> > > > Thanks a lot.
> > > > OK I pushed out two new tags
> > > > test1 with just the cleanup reverts
> > > > test2 with a bugfix in this area
> > > >
> > > >
> > > > I would very much appreciate your testing report on both -
> > > > should be ok but better make sure.
> > >
> > > Ok, once it percolates out I'll do that.
> >
> > for_linus-10-g960bef2a6172 contains a -ENOBUILD merge conflict.
>
> But test2 works fine w/wo threadirqs.

Oops. This is what one gets by pushing at 2am. I fixed that one up
(still didn't even build as I'm in the middle of a conference).
Also it's actually the reverse: test2 is just the revert, test1 has
one more bugfix.

So I'm inclined to push test2 out to linux-next for now, and will
add test1 later if it fares well.

Mike, your testing is very much appreciated!

--
MST
Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")
On Fri, 2017-04-07 at 09:22 +0200, Mike Galbraith wrote: > On Fri, 2017-04-07 at 09:05 +0200, Mike Galbraith wrote: > > On Fri, 2017-04-07 at 08:44 +0200, Mike Galbraith wrote: > > > On Fri, 2017-04-07 at 09:24 +0300, Michael S. Tsirkin wrote: > > > > On Fri, Apr 07, 2017 at 08:03:19AM +0200, Mike Galbraith wrote: > > > > > > > > Test tag works fine here w/wo threadirqs, RT works as well. > > > > > > > > > > -Mike > > > > > > > > Thanks a lot. > > > > OK I pushed out two new tags > > > > test1 with just the cleanup reverts > > > > test2 with a bugfix in this area > > > > > > > > > > > > I would very much appreciate your testing report on both - > > > > should be ok but better make sure. > > > > > > Ok, once it percolates out I'll do that. > > > > for_linus-10-g960bef2a6172 contains a -ENOBUILD merge conflict. > > But test2 works fine w/wo threadirqs. (CONFIG_DEBUG_SHIRQ=y as well btw)
Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")
On Fri, 2017-04-07 at 09:05 +0200, Mike Galbraith wrote: > On Fri, 2017-04-07 at 08:44 +0200, Mike Galbraith wrote: > > On Fri, 2017-04-07 at 09:24 +0300, Michael S. Tsirkin wrote: > > > On Fri, Apr 07, 2017 at 08:03:19AM +0200, Mike Galbraith wrote: > > > > > > Test tag works fine here w/wo threadirqs, RT works as well. > > > > > > > > -Mike > > > > > > Thanks a lot. > > > OK I pushed out two new tags > > > test1 with just the cleanup reverts > > > test2 with a bugfix in this area > > > > > > > > > I would very much appreciate your testing report on both - > > > should be ok but better make sure. > > > > Ok, once it percolates out I'll do that. > > for_linus-10-g960bef2a6172 contains a -ENOBUILD merge conflict. But test2 works fine w/wo threadirqs.
Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")
On Fri, 2017-04-07 at 08:44 +0200, Mike Galbraith wrote: > On Fri, 2017-04-07 at 09:24 +0300, Michael S. Tsirkin wrote: > > On Fri, Apr 07, 2017 at 08:03:19AM +0200, Mike Galbraith wrote: > > > > Test tag works fine here w/wo threadirqs, RT works as well. > > > > > > -Mike > > > > Thanks a lot. > > OK I pushed out two new tags > > test1 with just the cleanup reverts > > test2 with a bugfix in this area > > > > > > I would very much appreciate your testing report on both - > > should be ok but better make sure. > > Ok, once it percolates out I'll do that. for_linus-10-g960bef2a6172 contains a -ENOBUILD merge conflict. -Mike
Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")
On Fri, 2017-04-07 at 09:24 +0300, Michael S. Tsirkin wrote: > On Fri, Apr 07, 2017 at 08:03:19AM +0200, Mike Galbraith wrote: > > Test tag works fine here w/wo threadirqs, RT works as well. > > > > -Mike > > Thanks a lot. > OK I pushed out two new tags > test1 with just the cleanup reverts > test2 with a bugfix in this area > > > I would very much appreciate your testing report on both - > should be ok but better make sure. Ok, once it percolates out I'll do that. -Mike
Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")
On Fri, Apr 07, 2017 at 08:03:19AM +0200, Mike Galbraith wrote:
> On Thu, 2017-04-06 at 00:38 +0300, Michael S. Tsirkin wrote:
>
> > What I did is revert the refactorings while keeping the affinity API -
> > we can safely postpone them until the next release without loss of
> > functionality. But that's on top of my testing tree so it has unrelated
> > stuff as well. I'm rather confident they aren't fixing the issues but
> > I'll prepare a bugfix-only tree now for testing.
>
> Test tag works fine here w/wo threadirqs, RT works as well.
>
> -Mike

Thanks a lot.
OK I pushed out two new tags
test1 with just the cleanup reverts
test2 with a bugfix in this area

I would very much appreciate your testing report on both -
should be ok but better make sure.

Unfortunately it's past 2am here so I don't have the time to test -
and I'm at a conference so not a lot of time during the day either.

Christoph, I still think your cleanups were a good idea, but we need
to get this release into a stable shape ASAP. Let's try again for the
next release, OK?

--
MST
Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")
On Thu, 2017-04-06 at 00:38 +0300, Michael S. Tsirkin wrote:

> What I did is revert the refactorings while keeping the affinity API -
> we can safely postpone them until the next release without loss of
> functionality. But that's on top of my testing tree so it has unrelated
> stuff as well. I'm rather confident they aren't fixing the issues but
> I'll prepare a bugfix-only tree now for testing.

Test tag works fine here w/wo threadirqs, RT works as well.

-Mike
Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")
On Wed, Apr 05, 2017 at 08:29:34AM +0200, Christoph Hellwig wrote:
> On Wed, Apr 05, 2017 at 06:24:50AM +0200, Mike Galbraith wrote:
> > On Wed, 2017-04-05 at 06:51 +0300, Michael S. Tsirkin wrote:
> >
> > > Any issues at all left with this tree?
> > > In particular any regressions?
> >
> > Nothing blatantly obvious in a testdrive that lasted a couple minutes.
> > I'd have to beat on it a bit to look for things beyond the reported,
> > but can't afford to do that right now.
>
> Can you check where the issues appear? I'd like to do a pure revert
> of the shared interrupts, but that tree has a lot more in it..

What I did is revert the refactorings while keeping the affinity API -
we can safely postpone them until the next release without loss of
functionality. But that's on top of my testing tree so it has unrelated
stuff as well. I'm rather confident they aren't fixing the issues but
I'll prepare a bugfix-only tree now for testing.

--
MST
Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")
On Wed, 2017-04-05 at 08:29 +0200, Christoph Hellwig wrote:

> Can you check where the issues appear? I'd like to do a pure revert
> of the shared interrupts, but that tree has a lot more in it..

Not immediately, one of my several pots is emitting black smoke.

-Mike
Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")
On Mon, Apr 03, 2017 at 07:14:22PM +0300, Michael S. Tsirkin wrote:
> On Mon, Apr 03, 2017 at 04:18:23PM +0200, Christoph Hellwig wrote:
> > Mike,
> >
> > can you try the patch below?
> >
> > ---
> > From fe41a30b54878cc631623b7511267125e0da4b15 Mon Sep 17 00:00:00 2001
> > From: Christoph Hellwig
> > Date: Mon, 3 Apr 2017 14:51:35 +0200
> > Subject: virtio_pci: don't use shared irq for virtqueues
> >
> > Reimplement the shared irq feature manually, as we might have a larger
> > number of virtqueues than the core shared interrupt code can handle
> > in threaded interrupt mode.
> >
> > Signed-off-by: Christoph Hellwig
> > ---
> > drivers/virtio/virtio_pci_common.c | 142 +
> > drivers/virtio/virtio_pci_common.h | 1 +
> > 2 files changed, 83 insertions(+), 60 deletions(-)
>
> Well the original patch this is trying to fix is
> 07ec51480b5eb1233f8c1b0f5d7a7c8d1247c507 which dropped just 40 lines
> with documentation. It did this by re-using error handling to switch
> from per-vq to non-per-vq mode. Now this has separate flows for errors
> and per-vq non-per-vq switch and (I think, as a result) is adding 140
> lines which doesn't make me very happy.

The above adds 23 lines. We could entangle both loops again, but I'm not
sure it's going to buy us much.
Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")
On Wed, Apr 05, 2017 at 06:24:50AM +0200, Mike Galbraith wrote:
> On Wed, 2017-04-05 at 06:51 +0300, Michael S. Tsirkin wrote:
>
> > Any issues at all left with this tree?
> > In particular any regressions?
>
> Nothing blatantly obvious in a testdrive that lasted a couple minutes.
> I'd have to beat on it a bit to look for things beyond the reported,
> but can't afford to do that right now.

Can you check where the issues appear? I'd like to do a pure revert
of the shared interrupts, but that tree has a lot more in it..
Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")
On Wed, 2017-04-05 at 06:51 +0300, Michael S. Tsirkin wrote:

> Any issues at all left with this tree?
> In particular any regressions?

Nothing blatantly obvious in a testdrive that lasted a couple minutes.
I'd have to beat on it a bit to look for things beyond the reported,
but can't afford to do that right now.

-Mike
Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")
On Wed, Apr 05, 2017 at 05:24:30AM +0200, Mike Galbraith wrote: > On Wed, 2017-04-05 at 06:13 +0300, Michael S. Tsirkin wrote: > > On Wed, Apr 05, 2017 at 05:09:09AM +0200, Mike Galbraith wrote: > > > On Tue, 2017-04-04 at 22:03 +0300, Michael S. Tsirkin wrote: > > > > > > > since I couldn't reproduce, I decided it's worth trying to see > > > > what happens if we revert back to before 5c34d002dcc7. > > > > > > > > > > > > Could you please test a tag "test" in my tree above? > > > > It should point at 6d88af1bf359417eb821370294ba489bdf7f5ab8 > > > > > > Nogo. > > > > > > git@homer:..git/vhost> git remote update > > > Fetching origin > > > git@homer:..git/vhost> git show > > > 6d88af1bf359417eb821370294ba489bdf7f5ab8 > > > fatal: bad object 6d88af1bf359417eb821370294ba489bdf7f5ab8 > > > > Maybe because it's a tag not a head. Pls try > > git fetch git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git > > refs/tags/test > > That worked. Checked out/building. Thanks a lot for the testing. -- MST
Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")
On Wed, Apr 05, 2017 at 05:40:06AM +0200, Mike Galbraith wrote: > On Wed, 2017-04-05 at 05:24 +0200, Mike Galbraith wrote: > > On Wed, 2017-04-05 at 06:13 +0300, Michael S. Tsirkin wrote: > > > On Wed, Apr 05, 2017 at 05:09:09AM +0200, Mike Galbraith wrote: > > > > On Tue, 2017-04-04 at 22:03 +0300, Michael S. Tsirkin wrote: > > > > > > > > > since I couldn't reproduce, I decided it's worth trying to see > > > > > what happens if we revert back to before 5c34d002dcc7. > > > > > > > > > > > > > > > Could you please test a tag "test" in my tree above? > > > > > It should point at 6d88af1bf359417eb821370294ba489bdf7f5ab8 > > > > > > > > Nogo. > > > > > > > > git@homer:..git/vhost> git remote update > > > > Fetching origin > > > > git@homer:..git/vhost> git show > > > > 6d88af1bf359417eb821370294ba489bdf7f5ab8 > > > > fatal: bad object 6d88af1bf359417eb821370294ba489bdf7f5ab8 > > > > > > Maybe because it's a tag not a head. Pls try > > > git fetch > > > git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git > > > refs/tags/test > > > > That worked. Checked out/building. > > vbox hibernated gripe free, w/wo threadirqs. > > -Mike Any issues at all left with this tree? In particular any regressions? -- MST
Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")
On Wed, 2017-04-05 at 05:24 +0200, Mike Galbraith wrote: > On Wed, 2017-04-05 at 06:13 +0300, Michael S. Tsirkin wrote: > > On Wed, Apr 05, 2017 at 05:09:09AM +0200, Mike Galbraith wrote: > > > On Tue, 2017-04-04 at 22:03 +0300, Michael S. Tsirkin wrote: > > > > > > > since I couldn't reproduce, I decided it's worth trying to see > > > > what happens if we revert back to before 5c34d002dcc7. > > > > > > > > > > > > Could you please test a tag "test" in my tree above? > > > > It should point at 6d88af1bf359417eb821370294ba489bdf7f5ab8 > > > > > > Nogo. > > > > > > git@homer:..git/vhost> git remote update > > > Fetching origin > > > git@homer:..git/vhost> git show > > > 6d88af1bf359417eb821370294ba489bdf7f5ab8 > > > fatal: bad object 6d88af1bf359417eb821370294ba489bdf7f5ab8 > > > > Maybe because it's a tag not a head. Pls try > > git fetch > > git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git > > refs/tags/test > > That worked. Checked out/building. vbox hibernated gripe free, w/wo threadirqs. -Mike
Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")
On Wed, 2017-04-05 at 06:13 +0300, Michael S. Tsirkin wrote: > On Wed, Apr 05, 2017 at 05:09:09AM +0200, Mike Galbraith wrote: > > On Tue, 2017-04-04 at 22:03 +0300, Michael S. Tsirkin wrote: > > > > > since I couldn't reproduce, I decided it's worth trying to see > > > what happens if we revert back to before 5c34d002dcc7. > > > > > > > > > Could you please test a tag "test" in my tree above? > > > It should point at 6d88af1bf359417eb821370294ba489bdf7f5ab8 > > > > Nogo. > > > > git@homer:..git/vhost> git remote update > > Fetching origin > > git@homer:..git/vhost> git show > > 6d88af1bf359417eb821370294ba489bdf7f5ab8 > > fatal: bad object 6d88af1bf359417eb821370294ba489bdf7f5ab8 > > Maybe because it's a tag not a head. Pls try > git fetch git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git > refs/tags/test That worked. Checked out/building.
Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")
On Wed, Apr 05, 2017 at 05:09:09AM +0200, Mike Galbraith wrote:
> On Tue, 2017-04-04 at 22:03 +0300, Michael S. Tsirkin wrote:
>
> > since I couldn't reproduce, I decided it's worth trying to see
> > what happens if we revert back to before 5c34d002dcc7.
> >
> >
> > Could you please test a tag "test" in my tree above?
> > It should point at 6d88af1bf359417eb821370294ba489bdf7f5ab8
>
> Nogo.
>
> git@homer:..git/vhost> git remote update
> Fetching origin
> git@homer:..git/vhost> git show 6d88af1bf359417eb821370294ba489bdf7f5ab8
> fatal: bad object 6d88af1bf359417eb821370294ba489bdf7f5ab8

Maybe because it's a tag not a head. Pls try

git fetch git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git refs/tags/test

--
MST
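A minimal local sketch of the situation guessed at above, not a record of what actually happened on either machine. It assumes the tagged commit is not reachable from any branch, in which case a plain "git remote update" does not bring it in and "git show <sha>" reports a bad object, while fetching refs/tags/test by name does. The temporary repositories and the demo identity are made up for illustration; the real remote is the vhost.git URL given above.

```shell
set -eu
work=$(mktemp -d)

# "upstream" stands in for the remote tree (hypothetical local repo).
git init -q "$work/upstream"
git -C "$work/upstream" -c user.name=demo -c user.email=demo@example.invalid \
    commit -q --allow-empty -m "base"
git clone -q "$work/upstream" "$work/clone"

# After the clone exists, create a commit that only the tag "test" points at:
# the commit is made on a detached HEAD, so no branch reaches it.
git -C "$work/upstream" checkout -q --detach
git -C "$work/upstream" -c user.name=demo -c user.email=demo@example.invalid \
    commit -q --allow-empty -m "test candidate"
git -C "$work/upstream" tag test
sha=$(git -C "$work/upstream" rev-parse test)

# A regular update follows branches; tags are only auto-followed when they
# point into history that is being fetched anyway.
git -C "$work/clone" remote update >/dev/null 2>&1
if git -C "$work/clone" cat-file -e "$sha^{commit}" 2>/dev/null; then
    before=present
else
    before=missing
fi

# Asking for the tag ref explicitly downloads the commit into FETCH_HEAD.
git -C "$work/clone" fetch -q "$work/upstream" refs/tags/test
if git -C "$work/clone" cat-file -e "$sha^{commit}" 2>/dev/null; then
    after=present
else
    after=missing
fi
fetched=$(git -C "$work/clone" rev-parse FETCH_HEAD)

echo "before=$before after=$after"
rm -rf "$work"
```

After the explicit fetch, "git checkout FETCH_HEAD" gives a detached checkout of the tagged commit, ready to build.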
Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")
On Tue, 2017-04-04 at 22:03 +0300, Michael S. Tsirkin wrote:

> since I couldn't reproduce, I decided it's worth trying to see
> what happens if we revert back to before 5c34d002dcc7.
>
>
> Could you please test a tag "test" in my tree above?
> It should point at 6d88af1bf359417eb821370294ba489bdf7f5ab8

Nogo.

git@homer:..git/vhost> git remote update
Fetching origin
git@homer:..git/vhost> git show 6d88af1bf359417eb821370294ba489bdf7f5ab8
fatal: bad object 6d88af1bf359417eb821370294ba489bdf7f5ab8
Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")
On Wed, 2017-04-05 at 00:31 +0300, Michael S. Tsirkin wrote:
> On Tue, Apr 04, 2017 at 08:38:35PM +0200, Mike Galbraith wrote:
> > On Tue, 2017-04-04 at 21:00 +0300, Michael S. Tsirkin wrote:
> >
> > > And just making double sure, the 1st version that has the issue
> > > is 5c34d002dcc7, isn't it? I'm asking because subject says so
> > > but then goes on to list subject from another commit.
> > > This one is:
> > >
> > > virtio_pci: remove struct virtio_pci_vq_info
> >
> > When the hibernation related warnings started I don't know, I wasn't
> > targeting that, those fell out of subsequent testing. I started out
> > hunting console breakage point w. threaded irqs, which is 5c34d002dcc7.
>
> OK but 5c34d002dcc7 isn't "virtio_pci: use shared
> interrupts for virtqueues".

Heh, wrong sha.. $subject does however correctly identify in quotes the
origin of the threaded irq woes.

> I'm confused at this point. I would appreciate the summary of
> which versions were tested and what did you see. Testing
> a revert might also help.

I already tested full revert. I went looking for what busted kvm for RT
kernels, extracted the virtio series and quilt bisected that to use
shared interrupts. I was going to just use my little turn off multiport
hacklet to put spinning kworker on the back burner until the dust
settled, but noticed that there was more going on, and none of it is RT
specific (thus freeing up a back burner).

From there, it's all test what you/Christoph post, as you post it, in
virgin source.

-Mike
Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")
On Tue, Apr 04, 2017 at 08:38:35PM +0200, Mike Galbraith wrote:
> On Tue, 2017-04-04 at 21:00 +0300, Michael S. Tsirkin wrote:
>
> > And just making double sure, the 1st version that has the issue
> > is 5c34d002dcc7, isn't it? I'm asking because subject says so
> > but then goes on to list subject from another commit.
> > This one is:
> >
> > virtio_pci: remove struct virtio_pci_vq_info
>
> When the hibernation related warnings started I don't know, I wasn't
> targeting that, those fell out of subsequent testing. I started out
> hunting console breakage point w. threaded irqs, which is 5c34d002dcc7.

OK but 5c34d002dcc7 isn't "virtio_pci: use shared interrupts for
virtqueues".

> -Mike

I'm confused at this point. I would appreciate the summary of
which versions were tested and what did you see. Testing
a revert might also help.

Thanks a lot for your testing!

--
MST
Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")
On Tue, Apr 04, 2017 at 07:54:36PM +0200, Mike Galbraith wrote:
> On Tue, 2017-04-04 at 19:40 +0200, Mike Galbraith wrote:
> > On Tue, 2017-04-04 at 18:30 +0300, Michael S. Tsirkin wrote:
> >
> > > I couldn't reproduce it - let's make sure we are using the
> > > same tree. Could you pls try
> > >
> > > git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git linux-next
> > >
> > > It's currently at cc79d42a7d7e57ff64f406a1fd3740afebac0b44
> >
> > Things that make ya go hmm...
>
> Making double sure we're on the same page...
>
> git@homer:..git/vhost> git branch
> * linux-next
>   master
> git@homer:..git/vhost> git describe
> warning: tag 'for_linus' is really 'tags_for_linus' here
> for_linus-220128-gcc79d42a7d7e
> git@homer:..git/vhost> git status
> On branch linux-next
> Your branch is up-to-date with 'origin/linux-next'.
> Changes not staged for commit:
>   (use "git add <file>..." to update what will be committed)
>   (use "git checkout -- <file>..." to discard changes in working directory)
>
>         modified:   Makefile
>         modified:   scripts/setlocalversion
>
> no changes added to commit (use "git add" and/or "git commit -a")
> git@homer:..git/vhost>
>
> Modifications are me whacking '+' sign and -rc5.. I don't do those.

since I couldn't reproduce, I decided it's worth trying to see
what happens if we revert back to before 5c34d002dcc7.

Could you please test a tag "test" in my tree above?
It should point at 6d88af1bf359417eb821370294ba489bdf7f5ab8

That has reverts for code refactorings since 5c34d002dcc7 inclusive.

If this finally works, maybe you could go back and see which of the
reverts helps? The idea is that this only has refactorings nicely
isolated, if all else fails we can even do the reverts without
losing functionality.

--
MST
Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")
On Tue, 2017-04-04 at 21:00 +0300, Michael S. Tsirkin wrote:

> And just making double sure, the 1st version that has the issue
> is 5c34d002dcc7, isn't it? I'm asking because subject says so
> but then goes on to list subject from another commit.
> This one is:
>
> virtio_pci: remove struct virtio_pci_vq_info

When the hibernation related warnings started I don't know, I wasn't
targeting that, those fell out of subsequent testing. I started out
hunting console breakage point w. threaded irqs, which is 5c34d002dcc7.

-Mike
Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")
On Tue, Apr 04, 2017 at 07:54:36PM +0200, Mike Galbraith wrote: > On Tue, 2017-04-04 at 19:40 +0200, Mike Galbraith wrote: > > On Tue, 2017-04-04 at 18:30 +0300, Michael S. Tsirkin wrote: > > > > > I couldn't reproduce it - let's make sure we are using the > > > same tree. Could you pls try > > > > > > git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git linux > > > -next > > > > > > It's currently at cc79d42a7d7e57ff64f406a1fd3740afebac0b44 > > > > Things that make ya go hmm... > > Making double sure we're on the same page... > > git@homer:..git/vhost> git branch > * linux-next > master > git@homer:..git/vhost> git describe > warning: tag 'for_linus' is really 'tags_for_linus' here > for_linus-220128-gcc79d42a7d7e > git@homer:..git/vhost> git status > On branch linux-next > Your branch is up-to-date with 'origin/linux-next'. > Changes not staged for commit: > (use "git add ..." to update what will be committed) > (use "git checkout -- ..." to discard changes in working directory) > > modified: Makefile > modified: scripts/setlocalversion > > no changes added to commit (use "git add" and/or "git commit -a") > git@homer:..git/vhost> > > Modifications are me whacking '+' sign and -rc5.. I don't do those. And just making double sure, the 1st version that has the issue is 5c34d002dcc7, isn't it? I'm asking because subject says so but then goes on to list subject from another commit. This one is: virtio_pci: remove struct virtio_pci_vq_info -- MST
Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")
On Tue, 2017-04-04 at 19:40 +0200, Mike Galbraith wrote:
> On Tue, 2017-04-04 at 18:30 +0300, Michael S. Tsirkin wrote:
>
> > I couldn't reproduce it - let's make sure we are using the
> > same tree. Could you pls try
> >
> > git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git linux-next
> >
> > It's currently at cc79d42a7d7e57ff64f406a1fd3740afebac0b44
>
> Things that make ya go hmm...

Making double sure we're on the same page...

git@homer:..git/vhost> git branch
* linux-next
  master
git@homer:..git/vhost> git describe
warning: tag 'for_linus' is really 'tags_for_linus' here
for_linus-220128-gcc79d42a7d7e
git@homer:..git/vhost> git status
On branch linux-next
Your branch is up-to-date with 'origin/linux-next'.
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)

        modified:   Makefile
        modified:   scripts/setlocalversion

no changes added to commit (use "git add" and/or "git commit -a")
git@homer:..git/vhost>

Modifications are me whacking '+' sign and -rc5.. I don't do those.
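The "same page" check above amounts to comparing the abbreviated sha at the end of the git describe output with the full sha quoted for linux-next. A small sketch of that comparison in plain shell follows; the two values are copied from the messages above, and the variable names are only illustrative.

```shell
# Values copied from the thread.
expected=cc79d42a7d7e57ff64f406a1fd3740afebac0b44
described=for_linus-220128-gcc79d42a7d7e

# git describe prints <tag>-<commits since tag>-g<abbreviated sha>;
# strip everything up to and including the last "-g".
abbrev=${described##*-g}

# The trees match if the full sha starts with the abbreviated one.
case $expected in
    "$abbrev"*) same_tree=yes ;;
    *)          same_tree=no ;;
esac

echo "abbrev=$abbrev same_tree=$same_tree"
```

In a real checkout, "git rev-parse HEAD" prints the full sha directly, which avoids parsing the describe string at all.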
Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")
On Tue, 2017-04-04 at 18:30 +0300, Michael S. Tsirkin wrote:

> I couldn't reproduce it - let's make sure we are using the
> same tree. Could you pls try
>
> git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git linux-next
>
> It's currently at cc79d42a7d7e57ff64f406a1fd3740afebac0b44

Things that make ya go hmm...

[ 87.940161] ------------[ cut here ]------------
[ 87.940180] WARNING: CPU: 0 PID: 97 at drivers/pci/msi.c:1251 pci_irq_vector+0xcb/0xe0
[ 87.940181] Modules linked in: dm_mod(E) fuse(E) ebtable_filter(E) ebtables(E) rpcsec_gss_krb5(E) nfsv4(E) dns_resolver(E) nfs(E) fscache(E) nf_log_ipv6(E) xt_pkttype(E) nf_log_ipv4(E) nf_log_common(E) xt_LOG(E) xt_limit(E) af_packet(E) iscsi_ibft(E) iscsi_boot_sysfs(E) ip6t_REJECT(E) xt_tcpudp(E) nf_conntrack_ipv6(E) nf_defrag_ipv6(E) ip6table_raw(E) ipt_REJECT(E) iptable_raw(E) xt_CT(E) iptable_filter(E) ip6table_mangle(E) nf_conntrack_netbios_ns(E) nf_conntrack_broadcast(E) nf_conntrack_ipv4(E) nf_defrag_ipv4(E) ip_tables(E) xt_conntrack(E) nf_conntrack(E) libcrc32c(E) ip6table_filter(E) ip6_tables(E) x_tables(E) snd_hda_codec_generic(E) snd_hda_intel(E) snd_hda_codec(E) joydev(E) snd_hda_core(E) snd_hwdep(E) snd_pcm(E) snd_timer(E) snd(E) 8139too(E) ppdev(E) soundcore(E) parport_pc(E) i2c_piix4(E)
[ 87.940206] parport(E) virtio_balloon(E) crct10dif_pclmul(E) crc32_pclmul(E) crc32c_intel(E) ghash_clmulni_intel(E) serio_raw(E) acpi_cpufreq(E) pcbc(E) button(E) aesni_intel(E) pcspkr(E) aes_x86_64(E) crypto_simd(E) glue_helper(E) cryptd(E) nfsd(E) auth_rpcgss(E) nfs_acl(E) lockd(E) grace(E) sunrpc(E) ext4(E) crc16(E) jbd2(E) mbcache(E) hid_generic(E) usbhid(E) ata_generic(E) ata_piix(E) sr_mod(E) cdrom(E) virtio_blk(E) virtio_rng(E) virtio_console(E) qxl(E) drm_kms_helper(E) syscopyarea(E) sysfillrect(E) sysimgblt(E) fb_sys_fops(E) ehci_pci(E) ttm(E) uhci_hcd(E) ehci_hcd(E) floppy(E) ahci(E) libahci(E) virtio_pci(E) drm(E) virtio_ring(E) virtio(E) usbcore(E) libata(E) 8139cp(E) mii(E) sg(E) scsi_mod(E) autofs4(E)
[ 87.940233] CPU: 0 PID: 97 Comm: kworker/u16:1 Tainted: GE 4.11.0-default #1
[ 87.940234] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.1-0-g4adadbd-20161202_174313-build11a 04/01/2014
[ 87.940240] Workqueue: events_unbound async_run_entry_fn
[ 87.940241] Call Trace:
[ 87.940246] ? dump_stack+0x5c/0x85
[ 87.940255] ? __warn+0xc4/0xe0
[ 87.940258] ? pci_pm_poweroff+0xf0/0xf0
[ 87.940269] ? pci_irq_vector+0xcb/0xe0
[ 87.940272] ? vp_synchronize_vectors+0x3e/0x50 [virtio_pci]
[ 87.940275] ? virtcons_freeze+0x1a/0xd0 [virtio_console]
[ 87.940276] ? virtio_pci_freeze+0x19/0x40 [virtio_pci]
[ 87.940277] ? pci_pm_freeze+0x59/0xe0
[ 87.940281] ? dpm_run_callback+0x4d/0x170
[ 87.940283] ? __device_suspend+0x11f/0x3b0
[ 87.940283] ? pm_dev_dbg+0x70/0x70
[ 87.940284] ? async_suspend+0x1a/0x90
[ 87.940286] ? async_run_entry_fn+0x34/0x160
[ 87.940287] ? process_one_work+0x164/0x430
[ 87.940288] ? worker_thread+0x135/0x4d0
[ 87.940290] ? kthread+0xff/0x140
[ 87.940291] ? rescuer_thread+0x3c0/0x3c0
[ 87.940292] ? kthread_park+0x80/0x80
[ 87.940293] ? kthread_park+0x80/0x80
[ 87.940299] ? ret_from_fork+0x26/0x40
[ 87.940300] ---[ end trace 5d65fe0efc4b61d7 ]---
Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")
On Tue, Apr 04, 2017 at 04:18:02PM +0200, Mike Galbraith wrote:
> On Tue, 2017-04-04 at 16:38 +0300, Michael S. Tsirkin wrote:
> > On Tue, Apr 04, 2017 at 06:02:52AM +0200, Mike Galbraith wrote:
> > > On Mon, 2017-04-03 at 21:11 +0300, Michael S. Tsirkin wrote:
> > > > On Mon, Apr 03, 2017 at 07:56:32PM +0200, Mike Galbraith wrote:
> > > > > On Mon, 2017-04-03 at 16:18 +0200, Christoph Hellwig wrote:
> > > > > > Mike,
> > > > > >
> > > > > > can you try the patch below?
> > > > >
> > > > > No more spinning kworker woes, but I still have a warning on
> > > > > hibernate, threadirqs invariant. I'm also seeing intermittent post
> > > > > hibernate hang funnies in virgin source +- this patch, and without
> > > > > threadirqs.
> > > > >
> > > > > [ 110.223953] WARNING: CPU: 5 PID: 452 at
> > > > > drivers/pci/msi.c:1261 pci_irq_vector+0xb1/0xe0
> > > > >
> > > > > -Mike
> > > >
> > > > I just sent a patch fixing that.
> > > > However I think we want to print a message when MSI fails to work
> > > > so we know guest is falling back on legacy interrupts.
> > >
> > > The warning persists.
> > >
> > > [ 137.656423] WARNING: CPU: 1 PID: 535 at drivers/pci/msi.c:1261
> > > pci_irq_vector+0xb1/0xe0
> >
> > Can you post the rest of the backtrace? Is it still in the console?
>
> This is from a dump of post hibernate loop dying vbox I captured and
> squirreled away, so pid is different. I'm not absolutely certain that
> I didn't have my local patch set re-applied when I did this, so I'll
> rebuild in the a.m.. My stuff is unrelated, so this should be fine.
>
> [ 328.475988] ------------[ cut here ]------------
> [ 328.476002] WARNING: CPU: 6 PID: 313 at drivers/pci/msi.c:1261
> pci_irq_vector+0xb1/0xe0
> [ 328.476003] Modules linked in: fuse(E) ebtable_filter(E) ebtables(E)
> nf_log_ipv6(E) xt_pkttype(E) nf_log_ipv4(E) nf_log_common(E) xt_LOG(E)
> xt_limit(E) rpcsec_gss_krb5(E) nfsv4(E) dns_resolver(E) nfs(E) fscache(E)
> af_packet(E) iscsi_ibft(E) iscsi_boot_sysfs(E) ip6t_REJECT(E) xt_tcpudp(E)
> nf_conntrack_ipv6(E) nf_defrag_ipv6(E) ip6table_raw(E) ipt_REJECT(E)
> iptable_raw(E) xt_CT(E) iptable_filter(E) ip6table_mangle(E)
> nf_conntrack_netbios_ns(E) nf_conntrack_broadcast(E) nf_conntrack_ipv4(E)
> nf_defrag_ipv4(E) ip_tables(E) xt_conntrack(E) nf_conntrack(E) libcrc32c(E)
> ip6table_filter(E) ip6_tables(E) x_tables(E) snd_hda_codec_generic(E)
> snd_hda_intel(E) snd_hda_codec(E) snd_hda_core(E) joydev(E) snd_hwdep(E)
> snd_pcm(E) snd_timer(E) snd(E) 8139too(E) soundcore(E) i2c_piix4(E)
> virtio_balloon(E) crct10dif_pclmul(E)
> [ 328.476019] crc32_pclmul(E) ppdev(E) ghash_clmulni_intel(E) parport_pc(E)
> acpi_cpufreq(E) pcbc(E) button(E) parport(E) aesni_intel(E) aes_x86_64(E)
> serio_raw(E) pcspkr(E) crypto_simd(E) glue_helper(E) cryptd(E) nfsd(E)
> auth_rpcgss(E) nfs_acl(E) lockd(E) dm_mod(E) grace(E) sunrpc(E) ext4(E)
> crc16(E) jbd2(E) mbcache(E) hid_generic(E) usbhid(E) sr_mod(E) cdrom(E)
> ata_generic(E) virtio_blk(E) virtio_rng(E) virtio_console(E) ata_piix(E)
> qxl(E) drm_kms_helper(E) syscopyarea(E) uhci_hcd(E) ehci_pci(E)
> sysfillrect(E) sysimgblt(E) ahci(E) fb_sys_fops(E) ehci_hcd(E) libahci(E)
> crc32c_intel(E) ttm(E) virtio_pci(E) virtio_ring(E) 8139cp(E) virtio(E)
> usbcore(E) floppy(E) mii(E) drm(E) libata(E) sg(E) scsi_mod(E) autofs4(E)
> [ 328.476037] CPU: 6 PID: 313 Comm: kworker/u16:2 Tainted: GE
> 4.11.0-default #20
> [ 328.476038] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
> rel-1.8.1-0-g4adadbd-20161202_174313-build11a 04/01/2014
> [ 328.476041] Workqueue: events_unbound async_run_entry_fn
> [ 328.476042] Call Trace:
> [ 328.476056] ? dump_stack+0x5c/0x85
> [ 328.476058] ? __warn+0xc4/0xe0
> [ 328.476060] ? pci_pm_poweroff+0xf0/0xf0
> [ 328.476062] ? pci_irq_vector+0xb1/0xe0
> [ 328.476064] ? vp_del_vqs+0xcb/0x120 [virtio_pci]
> [ 328.476066] ? remove_common+0x60/0x80 [virtio_rng]
> [ 328.476067] ? virtrng_freeze+0xa/0x10 [virtio_rng]
> [ 328.476068] ? virtio_pci_freeze+0x19/0x40 [virtio_pci]
> [ 328.476069] ? pci_pm_freeze+0x59/0xe0
> [ 328.476070] ? dpm_run_callback+0x4d/0x170
> [ 328.476071] ? __device_suspend+0x11f/0x3b0
> [ 328.476072] ? pm_dev_dbg+0x70/0x70
> [ 328.476072] ? async_suspend+0x1a/0x90
> [ 328.476082] ? async_run_entry_fn+0x34/0x160
> [ 328.476083] ? process_one_work+0x164/0x430
> [ 328.476084] ? worker_thread+0x135/0x4d0
> [ 328.476085] ? kthread+0xff/0x140
> [ 328.476086] ? rescuer_thread+0x3c0/0x3c0
> [ 328.476087] ? kthread_park+0x80/0x80
> [ 328.476088] ? do_group_exit+0x39/0xa0
> [ 328.476090] ? ret_from_fork+0x26/0x40
> [ 328.476091] ---[ end trace a045c2118936902f ]---

I couldn't reproduce it - let's make sure we are using the
same tree. Could you pls try

git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git linux-next

It's currently at cc79d42a7d7e57ff64f406a1fd3740afebac0b44

-- 
MST
Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")
On Tue, Apr 04, 2017 at 04:18:02PM +0200, Mike Galbraith wrote:
> On Tue, 2017-04-04 at 16:38 +0300, Michael S. Tsirkin wrote:
> > On Tue, Apr 04, 2017 at 06:02:52AM +0200, Mike Galbraith wrote:
> > > On Mon, 2017-04-03 at 21:11 +0300, Michael S. Tsirkin wrote:
> > > > On Mon, Apr 03, 2017 at 07:56:32PM +0200, Mike Galbraith wrote:
> > > > > On Mon, 2017-04-03 at 16:18 +0200, Christoph Hellwig wrote:
> > > > > > Mike,
> > > > > >
> > > > > > can you try the patch below?
> > > > >
> > > > > No more spinning kworker woes, but I still have a warning on
> > > > > hibernate, threadirqs invariant. I'm also seeing intermittent post
> > > > > hibernate hang funnies in virgin source +- this patch, and without
> > > > > threadirqs.
> > > > >
> > > > > [ 110.223953] WARNING: CPU: 5 PID: 452 at
> > > > > drivers/pci/msi.c:1261 pci_irq_vector+0xb1/0xe0
> > > > >
> > > > > -Mike
> > > >
> > > > I just sent a patch fixing that.
> > > > However I think we want to print a message when MSI fails to work
> > > > so we know guest is falling back on legacy interrupts.
> > >
> > > The warning persists.
> > >
> > > [ 137.656423] WARNING: CPU: 1 PID: 535 at drivers/pci/msi.c:1261
> > > pci_irq_vector+0xb1/0xe0
> >
> > Can you post the rest of the backtrace? Is it still in the console?
>
> This is from a dump of post hibernate loop dying vbox I captured and
> squirreled away, so pid is different. I'm not absolutely certain that
> I didn't have my local patch set re-applied when I did this, so I'll
> rebuild in the a.m.. My stuff is unrelated, so this should be fine.
>
> [ 328.475988] ------------[ cut here ]------------
> [ 328.476002] WARNING: CPU: 6 PID: 313 at drivers/pci/msi.c:1261
> pci_irq_vector+0xb1/0xe0
> [ 328.476003] Modules linked in: fuse(E) ebtable_filter(E) ebtables(E)
> nf_log_ipv6(E) xt_pkttype(E) nf_log_ipv4(E) nf_log_common(E) xt_LOG(E)
> xt_limit(E) rpcsec_gss_krb5(E) nfsv4(E) dns_resolver(E) nfs(E) fscache(E)
> af_packet(E) iscsi_ibft(E) iscsi_boot_sysfs(E) ip6t_REJECT(E) xt_tcpudp(E)
> nf_conntrack_ipv6(E) nf_defrag_ipv6(E) ip6table_raw(E) ipt_REJECT(E)
> iptable_raw(E) xt_CT(E) iptable_filter(E) ip6table_mangle(E)
> nf_conntrack_netbios_ns(E) nf_conntrack_broadcast(E) nf_conntrack_ipv4(E)
> nf_defrag_ipv4(E) ip_tables(E) xt_conntrack(E) nf_conntrack(E) libcrc32c(E)
> ip6table_filter(E) ip6_tables(E) x_tables(E) snd_hda_codec_generic(E)
> snd_hda_intel(E) snd_hda_codec(E) snd_hda_core(E) joydev(E) snd_hwdep(E)
> snd_pcm(E) snd_timer(E) snd(E) 8139too(E) soundcore(E) i2c_piix4(E)
> virtio_balloon(E) crct10dif_pclmul(E)
> [ 328.476019] crc32_pclmul(E) ppdev(E) ghash_clmulni_intel(E) parport_pc(E)
> acpi_cpufreq(E) pcbc(E) button(E) parport(E) aesni_intel(E) aes_x86_64(E)
> serio_raw(E) pcspkr(E) crypto_simd(E) glue_helper(E) cryptd(E) nfsd(E)
> auth_rpcgss(E) nfs_acl(E) lockd(E) dm_mod(E) grace(E) sunrpc(E) ext4(E)
> crc16(E) jbd2(E) mbcache(E) hid_generic(E) usbhid(E) sr_mod(E) cdrom(E)
> ata_generic(E) virtio_blk(E) virtio_rng(E) virtio_console(E) ata_piix(E)
> qxl(E) drm_kms_helper(E) syscopyarea(E) uhci_hcd(E) ehci_pci(E)
> sysfillrect(E) sysimgblt(E) ahci(E) fb_sys_fops(E) ehci_hcd(E) libahci(E)
> crc32c_intel(E) ttm(E) virtio_pci(E) virtio_ring(E) 8139cp(E) virtio(E)
> usbcore(E) floppy(E) mii(E) drm(E) libata(E) sg(E) scsi_mod(E) autofs4(E)
> [ 328.476037] CPU: 6 PID: 313 Comm: kworker/u16:2 Tainted: GE
> 4.11.0-default #20
> [ 328.476038] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
> rel-1.8.1-0-g4adadbd-20161202_174313-build11a 04/01/2014
> [ 328.476041] Workqueue: events_unbound async_run_entry_fn
> [ 328.476042] Call Trace:
> [ 328.476056] ? dump_stack+0x5c/0x85
> [ 328.476058] ? __warn+0xc4/0xe0
> [ 328.476060] ? pci_pm_poweroff+0xf0/0xf0
> [ 328.476062] ? pci_irq_vector+0xb1/0xe0
> [ 328.476064] ? vp_del_vqs+0xcb/0x120 [virtio_pci]
> [ 328.476066] ? remove_common+0x60/0x80 [virtio_rng]
> [ 328.476067] ? virtrng_freeze+0xa/0x10 [virtio_rng]
> [ 328.476068] ? virtio_pci_freeze+0x19/0x40 [virtio_pci]
> [ 328.476069] ? pci_pm_freeze+0x59/0xe0
> [ 328.476070] ? dpm_run_callback+0x4d/0x170
> [ 328.476071] ? __device_suspend+0x11f/0x3b0
> [ 328.476072] ? pm_dev_dbg+0x70/0x70
> [ 328.476072] ? async_suspend+0x1a/0x90
> [ 328.476082] ? async_run_entry_fn+0x34/0x160
> [ 328.476083] ? process_one_work+0x164/0x430
> [ 328.476084] ? worker_thread+0x135/0x4d0
> [ 328.476085] ? kthread+0xff/0x140
> [ 328.476086] ? rescuer_thread+0x3c0/0x3c0
> [ 328.476087] ? kthread_park+0x80/0x80
> [ 328.476088] ? do_group_exit+0x39/0xa0
> [ 328.476090] ? ret_from_fork+0x26/0x40
> [ 328.476091] ---[ end trace a045c2118936902f ]---

Interesting, it's rng this time. I'll try that.

-- 
MST
Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")
On Tue, 2017-04-04 at 16:38 +0300, Michael S. Tsirkin wrote:
> On Tue, Apr 04, 2017 at 06:02:52AM +0200, Mike Galbraith wrote:
> > On Mon, 2017-04-03 at 21:11 +0300, Michael S. Tsirkin wrote:
> > > On Mon, Apr 03, 2017 at 07:56:32PM +0200, Mike Galbraith wrote:
> > > > On Mon, 2017-04-03 at 16:18 +0200, Christoph Hellwig wrote:
> > > > > Mike,
> > > > >
> > > > > can you try the patch below?
> > > >
> > > > No more spinning kworker woes, but I still have a warning on
> > > > hibernate, threadirqs invariant. I'm also seeing intermittent post
> > > > hibernate hang funnies in virgin source +- this patch, and without
> > > > threadirqs.
> > > >
> > > > [ 110.223953] WARNING: CPU: 5 PID: 452 at
> > > > drivers/pci/msi.c:1261 pci_irq_vector+0xb1/0xe0
> > > >
> > > > -Mike
> > >
> > > I just sent a patch fixing that.
> > > However I think we want to print a message when MSI fails to work
> > > so we know guest is falling back on legacy interrupts.
> >
> > The warning persists.
> >
> > [ 137.656423] WARNING: CPU: 1 PID: 535 at drivers/pci/msi.c:1261
> > pci_irq_vector+0xb1/0xe0
>
> Can you post the rest of the backtrace? Is it still in the console?

This is from a dump of post hibernate loop dying vbox I captured and
squirreled away, so pid is different. I'm not absolutely certain that
I didn't have my local patch set re-applied when I did this, so I'll
rebuild in the a.m.. My stuff is unrelated, so this should be fine.

[ 328.475988] ------------[ cut here ]------------
[ 328.476002] WARNING: CPU: 6 PID: 313 at drivers/pci/msi.c:1261 pci_irq_vector+0xb1/0xe0
[ 328.476003] Modules linked in: fuse(E) ebtable_filter(E) ebtables(E) nf_log_ipv6(E) xt_pkttype(E) nf_log_ipv4(E) nf_log_common(E) xt_LOG(E) xt_limit(E) rpcsec_gss_krb5(E) nfsv4(E) dns_resolver(E) nfs(E) fscache(E) af_packet(E) iscsi_ibft(E) iscsi_boot_sysfs(E) ip6t_REJECT(E) xt_tcpudp(E) nf_conntrack_ipv6(E) nf_defrag_ipv6(E) ip6table_raw(E) ipt_REJECT(E) iptable_raw(E) xt_CT(E) iptable_filter(E) ip6table_mangle(E) nf_conntrack_netbios_ns(E) nf_conntrack_broadcast(E) nf_conntrack_ipv4(E) nf_defrag_ipv4(E) ip_tables(E) xt_conntrack(E) nf_conntrack(E) libcrc32c(E) ip6table_filter(E) ip6_tables(E) x_tables(E) snd_hda_codec_generic(E) snd_hda_intel(E) snd_hda_codec(E) snd_hda_core(E) joydev(E) snd_hwdep(E) snd_pcm(E) snd_timer(E) snd(E) 8139too(E) soundcore(E) i2c_piix4(E) virtio_balloon(E) crct10dif_pclmul(E)
[ 328.476019] crc32_pclmul(E) ppdev(E) ghash_clmulni_intel(E) parport_pc(E) acpi_cpufreq(E) pcbc(E) button(E) parport(E) aesni_intel(E) aes_x86_64(E) serio_raw(E) pcspkr(E) crypto_simd(E) glue_helper(E) cryptd(E) nfsd(E) auth_rpcgss(E) nfs_acl(E) lockd(E) dm_mod(E) grace(E) sunrpc(E) ext4(E) crc16(E) jbd2(E) mbcache(E) hid_generic(E) usbhid(E) sr_mod(E) cdrom(E) ata_generic(E) virtio_blk(E) virtio_rng(E) virtio_console(E) ata_piix(E) qxl(E) drm_kms_helper(E) syscopyarea(E) uhci_hcd(E) ehci_pci(E) sysfillrect(E) sysimgblt(E) ahci(E) fb_sys_fops(E) ehci_hcd(E) libahci(E) crc32c_intel(E) ttm(E) virtio_pci(E) virtio_ring(E) 8139cp(E) virtio(E) usbcore(E) floppy(E) mii(E) drm(E) libata(E) sg(E) scsi_mod(E) autofs4(E)
[ 328.476037] CPU: 6 PID: 313 Comm: kworker/u16:2 Tainted: GE 4.11.0-default #20
[ 328.476038] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.1-0-g4adadbd-20161202_174313-build11a 04/01/2014
[ 328.476041] Workqueue: events_unbound async_run_entry_fn
[ 328.476042] Call Trace:
[ 328.476056] ? dump_stack+0x5c/0x85
[ 328.476058] ? __warn+0xc4/0xe0
[ 328.476060] ? pci_pm_poweroff+0xf0/0xf0
[ 328.476062] ? pci_irq_vector+0xb1/0xe0
[ 328.476064] ? vp_del_vqs+0xcb/0x120 [virtio_pci]
[ 328.476066] ? remove_common+0x60/0x80 [virtio_rng]
[ 328.476067] ? virtrng_freeze+0xa/0x10 [virtio_rng]
[ 328.476068] ? virtio_pci_freeze+0x19/0x40 [virtio_pci]
[ 328.476069] ? pci_pm_freeze+0x59/0xe0
[ 328.476070] ? dpm_run_callback+0x4d/0x170
[ 328.476071] ? __device_suspend+0x11f/0x3b0
[ 328.476072] ? pm_dev_dbg+0x70/0x70
[ 328.476072] ? async_suspend+0x1a/0x90
[ 328.476082] ? async_run_entry_fn+0x34/0x160
[ 328.476083] ? process_one_work+0x164/0x430
[ 328.476084] ? worker_thread+0x135/0x4d0
[ 328.476085] ? kthread+0xff/0x140
[ 328.476086] ? rescuer_thread+0x3c0/0x3c0
[ 328.476087] ? kthread_park+0x80/0x80
[ 328.476088] ? do_group_exit+0x39/0xa0
[ 328.476090] ? ret_from_fork+0x26/0x40
[ 328.476091] ---[ end trace a045c2118936902f ]---
Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")
On Tue, Apr 04, 2017 at 06:02:52AM +0200, Mike Galbraith wrote:
> On Mon, 2017-04-03 at 21:11 +0300, Michael S. Tsirkin wrote:
> > On Mon, Apr 03, 2017 at 07:56:32PM +0200, Mike Galbraith wrote:
> > > On Mon, 2017-04-03 at 16:18 +0200, Christoph Hellwig wrote:
> > > > Mike,
> > > >
> > > > can you try the patch below?
> > >
> > > No more spinning kworker woes, but I still have a warning on hibernate,
> > > threadirqs invariant. I'm also seeing intermittent post hibernate hang
> > > funnies in virgin source +- this patch, and without threadirqs.
> > >
> > > [ 110.223953] WARNING: CPU: 5 PID: 452 at drivers/pci/msi.c:1261
> > > pci_irq_vector+0xb1/0xe0
> > >
> > > -Mike
> >
> > I just sent a patch fixing that.
> > However I think we want to print a message when MSI fails to work so we
> > know guest is falling back on legacy interrupts.
>
> The warning persists.
>
> [ 137.656423] WARNING: CPU: 1 PID: 535 at drivers/pci/msi.c:1261
> pci_irq_vector+0xb1/0xe0

Can you post the rest of the backtrace? Is it still in the console?

> WRT the post hibernate hang business, that is apparently not part of
> the 4.11 woes (at least not solely), as 4.10.8 did not survive a 10
> hibernate cycle loop. RT is better at reproducing trouble (shrug, it
> frequently is), but it matters not whether I'm running 4.10, master or
> master-rt, they will all hang.
>
> WRT gripe, I wedged virtio_pci-fix-msix-vector-tracking-on-cleanup in
> on top, but it wasn't impressed.
>
> -Mike
Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")
On Mon, 2017-04-03 at 21:11 +0300, Michael S. Tsirkin wrote:
> On Mon, Apr 03, 2017 at 07:56:32PM +0200, Mike Galbraith wrote:
> > On Mon, 2017-04-03 at 16:18 +0200, Christoph Hellwig wrote:
> > > Mike,
> > >
> > > can you try the patch below?
> >
> > No more spinning kworker woes, but I still have a warning on hibernate,
> > threadirqs invariant. I'm also seeing intermittent post hibernate hang
> > funnies in virgin source +- this patch, and without threadirqs.
> >
> > [ 110.223953] WARNING: CPU: 5 PID: 452 at drivers/pci/msi.c:1261
> > pci_irq_vector+0xb1/0xe0
> >
> > -Mike
>
> I just sent a patch fixing that.
> However I think we want to print a message when MSI fails to work so we
> know guest is falling back on legacy interrupts.

The warning persists.

[ 137.656423] WARNING: CPU: 1 PID: 535 at drivers/pci/msi.c:1261 pci_irq_vector+0xb1/0xe0

WRT the post hibernate hang business, that is apparently not part of
the 4.11 woes (at least not solely), as 4.10.8 did not survive a 10
hibernate cycle loop. RT is better at reproducing trouble (shrug, it
frequently is), but it matters not whether I'm running 4.10, master or
master-rt, they will all hang.

WRT gripe, I wedged virtio_pci-fix-msix-vector-tracking-on-cleanup in
on top, but it wasn't impressed.

-Mike
Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")
On Mon, Apr 03, 2017 at 07:56:32PM +0200, Mike Galbraith wrote:
> On Mon, 2017-04-03 at 16:18 +0200, Christoph Hellwig wrote:
> > Mike,
> >
> > can you try the patch below?
>
> No more spinning kworker woes, but I still have a warning on hibernate,
> threadirqs invariant. I'm also seeing intermittent post hibernate hang
> funnies in virgin source +- this patch, and without threadirqs.
>
> [ 110.223953] WARNING: CPU: 5 PID: 452 at drivers/pci/msi.c:1261
> pci_irq_vector+0xb1/0xe0
>
> -Mike

I just sent a patch fixing that.
However I think we want to print a message when MSI fails to work so we
know guest is falling back on legacy interrupts.

-- 
MST
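
[Editorial sketch, not part of the thread.] The suggestion above, reporting when the guest drops from MSI to legacy interrupts, could look roughly like the user-space model below. Every name in it (`struct probe`, `pick_mode`, the message strings) is hypothetical and chosen for illustration only; it is not the virtio_pci code. In the kernel the message would presumably go through something like dev_warn() rather than into a caller-supplied buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Interrupt modes in order of preference (hypothetical names). */
enum irq_mode { IRQ_MSIX_PER_VQ, IRQ_MSIX_SHARED, IRQ_INTX };

/* Hypothetical probe results: did each setup attempt succeed? */
struct probe {
	int per_vq_ok;   /* one vector per queue could be set up */
	int shared_ok;   /* config vector + one shared queue vector worked */
};

/*
 * Pick the best mode that worked and, whenever we had to fall back,
 * leave a human-readable note in msg so the downgrade is visible.
 * msg is left empty when the preferred mode succeeded.
 */
static enum irq_mode pick_mode(const struct probe *p, char *msg, size_t len)
{
	assert(p != NULL && msg != NULL && len > 0);
	msg[0] = '\0';

	if (p->per_vq_ok)
		return IRQ_MSIX_PER_VQ;

	if (p->shared_ok) {
		snprintf(msg, len,
			 "per-queue MSI-X unavailable, sharing one vector");
		return IRQ_MSIX_SHARED;
	}

	snprintf(msg, len,
		 "MSI-X setup failed, falling back to legacy interrupts");
	return IRQ_INTX;
}
```

The point of the sketch is only that the fallback decision and the diagnostic are made in the same place, so a silent downgrade cannot happen.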
Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")
On Mon, 2017-04-03 at 16:18 +0200, Christoph Hellwig wrote:
> Mike,
>
> can you try the patch below?

No more spinning kworker woes, but I still have a warning on hibernate,
threadirqs invariant. I'm also seeing intermittent post hibernate hang
funnies in virgin source +- this patch, and without threadirqs.

[ 110.223953] WARNING: CPU: 5 PID: 452 at drivers/pci/msi.c:1261 pci_irq_vector+0xb1/0xe0

-Mike
Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")
On Mon, Apr 03, 2017 at 04:18:23PM +0200, Christoph Hellwig wrote:
> Mike,
>
> can you try the patch below?
>
> ---
> From fe41a30b54878cc631623b7511267125e0da4b15 Mon Sep 17 00:00:00 2001
> From: Christoph Hellwig
> Date: Mon, 3 Apr 2017 14:51:35 +0200
> Subject: virtio_pci: don't use shared irq for virtqueues
>
> Reimplement the shared irq feature manually, as we might have a larger
> number of virtqueues than the core shared interrupt code can handle
> in threaded interrupt mode.
>
> Signed-off-by: Christoph Hellwig
> ---
>  drivers/virtio/virtio_pci_common.c | 142 +
>  drivers/virtio/virtio_pci_common.h |   1 +
>  2 files changed, 83 insertions(+), 60 deletions(-)

Well the original patch this is trying to fix is
07ec51480b5eb1233f8c1b0f5d7a7c8d1247c507
which dropped just 40 lines with documentation.

It did this by re-using error handling to switch from per-vq to
non-per-vq mode. Now this has separate flows for errors and
per-vq non-per-vq switch and (I think, as a result) is adding 140 lines
which doesn't make me very happy.

> diff --git a/drivers/virtio/virtio_pci_common.c b/drivers/virtio/virtio_pci_common.c
> index 590534910dc6..6dd719543410 100644
> --- a/drivers/virtio/virtio_pci_common.c
> +++ b/drivers/virtio/virtio_pci_common.c
> @@ -137,6 +137,9 @@ void vp_del_vqs(struct virtio_device *vdev)
>  		kfree(vp_dev->msix_vector_map);
>  	}
>
> +	/* free the shared virtuqueue irq if we don't use per-vq irqs */

typo

> +	if (vp_dev->shared_vq_vec)
> +		free_irq(pci_irq_vector(vp_dev->pci_dev, 1), vp_dev);

So we used to have enums for 1 and 0. I think it was cleaner.

>  	free_irq(pci_irq_vector(vp_dev->pci_dev, 0), vp_dev);
>  	pci_free_irq_vectors(vp_dev->pci_dev);
>  }
> @@ -147,10 +150,10 @@ static int vp_find_vqs_msix(struct virtio_device *vdev, unsigned nvqs,
>  {
>  	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
>  	const char *name = dev_name(&vp_dev->vdev.dev);
> -	int i, j, err = -ENOMEM, allocated_vectors, nvectors;
> +	struct pci_dev *pdev = vp_dev->pci_dev;
> +	int i, err = -ENOMEM, nvectors;
>  	unsigned flags = PCI_IRQ_MSIX;
> -	bool shared = false;
> -	u16 msix_vec;
> +	u16 msix_vec = 0;
>
>  	if (desc) {
>  		flags |= PCI_IRQ_AFFINITY;
> @@ -162,19 +165,18 @@ static int vp_find_vqs_msix(struct virtio_device *vdev, unsigned nvqs,
>  		if (callbacks[i])
>  			nvectors++;
>
> -	/* Try one vector per queue first. */
> -	err = pci_alloc_irq_vectors_affinity(vp_dev->pci_dev, nvectors,
> -			nvectors, flags, desc);
> +	/* Try one vector for config and one per queue first. */
> +	err = pci_alloc_irq_vectors_affinity(pdev, nvectors, nvectors, flags,
> +			desc);
>  	if (err < 0) {
>  		/* Fallback to one vector for config, one shared for queues. */
> -		shared = true;
> -		err = pci_alloc_irq_vectors(vp_dev->pci_dev, 2, 2,
> +		nvectors = 2;
> +		vp_dev->shared_vq_vec = true;
> +		err = pci_alloc_irq_vectors(pdev, nvectors, nvectors,
>  				PCI_IRQ_MSIX);
>  		if (err < 0)
>  			return err;
>  	}
> -	if (err < 0)
> -		return err;
>
>  	vp_dev->msix_vectors = nvectors;
>  	vp_dev->msix_names = kmalloc_array(nvectors,
> @@ -194,79 +196,99 @@ static int vp_find_vqs_msix(struct virtio_device *vdev, unsigned nvqs,
>  	}
>
>  	/* Set the vector used for configuration */
> -	snprintf(vp_dev->msix_names[0], sizeof(*vp_dev->msix_names),
> +	snprintf(vp_dev->msix_names[msix_vec], sizeof(*vp_dev->msix_names),
>  		 "%s-config", name);
> -	err = request_irq(pci_irq_vector(vp_dev->pci_dev, 0), vp_config_changed,
> -			0, vp_dev->msix_names[0], vp_dev);
> +	err = request_irq(pci_irq_vector(pdev, msix_vec), vp_config_changed, 0,
> +			vp_dev->msix_names[msix_vec], vp_dev);
>  	if (err)
>  		goto out_free_msix_affinity_masks;
>
>  	/* Verify we had enough resources to assign the vector */
> -	if (vp_dev->config_vector(vp_dev, 0) == VIRTIO_MSI_NO_VECTOR) {
> +	if (vp_dev->config_vector(vp_dev, msix_vec) == VIRTIO_MSI_NO_VECTOR) {
>  		err = -EBUSY;
>  		goto out_free_config_irq;
>  	}
>
> -	vp_dev->msix_vector_map = kmalloc_array(nvqs,
> -			sizeof(*vp_dev->msix_vector_map), GFP_KERNEL);
> -	if (!vp_dev->msix_vector_map)
> -		goto out_disable_config_irq;
> -
> -	allocated_vectors = j = 1; /* vector 0 is the config interrupt */
> -	for (i = 0; i < nvqs; ++i) {
> -		if (!names[i]) {
> -			vqs[i] = NULL;
> -			continue;
> -		}
> -
> -		if (callbacks[i])
> -
Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")
On Mon, Apr 03, 2017 at 04:18:23PM +0200, Christoph Hellwig wrote:
> Mike,
>
> can you try the patch below?

It's really easy to test on qemu so I will - just add a dummy
virtio-serial-pci device with -device virtio-serial-pci
and add threadirqs to kernel command line.

However it doesn't look like this will fix the error recovery
for when request irq fails - it will just make the error less likely.
So we still need to look into that - failure should recover and use
the intx path, ATM it causes hibernation to hang.

> ---
> From fe41a30b54878cc631623b7511267125e0da4b15 Mon Sep 17 00:00:00 2001
> From: Christoph Hellwig
> Date: Mon, 3 Apr 2017 14:51:35 +0200
> Subject: virtio_pci: don't use shared irq for virtqueues
>
> Reimplement the shared irq feature manually, as we might have a larger
> number of virtqueues than the core shared interrupt code can handle
> in threaded interrupt mode.
>
> Signed-off-by: Christoph Hellwig
> ---
>  drivers/virtio/virtio_pci_common.c | 142 +
>  drivers/virtio/virtio_pci_common.h |   1 +
>  2 files changed, 83 insertions(+), 60 deletions(-)
>
> diff --git a/drivers/virtio/virtio_pci_common.c b/drivers/virtio/virtio_pci_common.c
> index 590534910dc6..6dd719543410 100644
> --- a/drivers/virtio/virtio_pci_common.c
> +++ b/drivers/virtio/virtio_pci_common.c
> @@ -137,6 +137,9 @@ void vp_del_vqs(struct virtio_device *vdev)
>  		kfree(vp_dev->msix_vector_map);
>  	}
>
> +	/* free the shared virtuqueue irq if we don't use per-vq irqs */
> +	if (vp_dev->shared_vq_vec)
> +		free_irq(pci_irq_vector(vp_dev->pci_dev, 1), vp_dev);
>  	free_irq(pci_irq_vector(vp_dev->pci_dev, 0), vp_dev);
>  	pci_free_irq_vectors(vp_dev->pci_dev);
>  }
> @@ -147,10 +150,10 @@ static int vp_find_vqs_msix(struct virtio_device *vdev, unsigned nvqs,
>  {
>  	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
>  	const char *name = dev_name(&vp_dev->vdev.dev);
> -	int i, j, err = -ENOMEM, allocated_vectors, nvectors;
> +	struct pci_dev *pdev = vp_dev->pci_dev;
> +	int i, err = -ENOMEM, nvectors;
>  	unsigned flags = PCI_IRQ_MSIX;
> -	bool shared = false;
> -	u16 msix_vec;
> +	u16 msix_vec = 0;
>
>  	if (desc) {
>  		flags |= PCI_IRQ_AFFINITY;
> @@ -162,19 +165,18 @@ static int vp_find_vqs_msix(struct virtio_device *vdev, unsigned nvqs,
>  		if (callbacks[i])
>  			nvectors++;
>
> -	/* Try one vector per queue first. */
> -	err = pci_alloc_irq_vectors_affinity(vp_dev->pci_dev, nvectors,
> -			nvectors, flags, desc);
> +	/* Try one vector for config and one per queue first. */
> +	err = pci_alloc_irq_vectors_affinity(pdev, nvectors, nvectors, flags,
> +			desc);
>  	if (err < 0) {
>  		/* Fallback to one vector for config, one shared for queues. */
> -		shared = true;
> -		err = pci_alloc_irq_vectors(vp_dev->pci_dev, 2, 2,
> +		nvectors = 2;
> +		vp_dev->shared_vq_vec = true;
> +		err = pci_alloc_irq_vectors(pdev, nvectors, nvectors,
>  				PCI_IRQ_MSIX);
>  		if (err < 0)
>  			return err;
>  	}
> -	if (err < 0)
> -		return err;
>
>  	vp_dev->msix_vectors = nvectors;
>  	vp_dev->msix_names = kmalloc_array(nvectors,
> @@ -194,79 +196,99 @@ static int vp_find_vqs_msix(struct virtio_device *vdev, unsigned nvqs,
>  	}
>
>  	/* Set the vector used for configuration */
> -	snprintf(vp_dev->msix_names[0], sizeof(*vp_dev->msix_names),
> +	snprintf(vp_dev->msix_names[msix_vec], sizeof(*vp_dev->msix_names),
>  		 "%s-config", name);
> -	err = request_irq(pci_irq_vector(vp_dev->pci_dev, 0), vp_config_changed,
> -			0, vp_dev->msix_names[0], vp_dev);
> +	err = request_irq(pci_irq_vector(pdev, msix_vec), vp_config_changed, 0,
> +			vp_dev->msix_names[msix_vec], vp_dev);
>  	if (err)
>  		goto out_free_msix_affinity_masks;
>
>  	/* Verify we had enough resources to assign the vector */
> -	if (vp_dev->config_vector(vp_dev, 0) == VIRTIO_MSI_NO_VECTOR) {
> +	if (vp_dev->config_vector(vp_dev, msix_vec) == VIRTIO_MSI_NO_VECTOR) {
>  		err = -EBUSY;
>  		goto out_free_config_irq;
>  	}
>
> -	vp_dev->msix_vector_map = kmalloc_array(nvqs,
> -			sizeof(*vp_dev->msix_vector_map), GFP_KERNEL);
> -	if (!vp_dev->msix_vector_map)
> -		goto out_disable_config_irq;
> -
> -	allocated_vectors = j = 1; /* vector 0 is the config interrupt */
> -	for (i = 0; i < nvqs; ++i) {
> -		if (!names[i]) {
> -			vqs[i] = NULL;
> -			continue;
> -		}
> -
> -		if (callbacks[i])
> -			msix_vec = allocated_vect
Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")
Mike,

can you try the patch below?

---
From fe41a30b54878cc631623b7511267125e0da4b15 Mon Sep 17 00:00:00 2001
From: Christoph Hellwig
Date: Mon, 3 Apr 2017 14:51:35 +0200
Subject: virtio_pci: don't use shared irq for virtqueues

Reimplement the shared irq feature manually, as we might have a larger
number of virtqueues than the core shared interrupt code can handle
in threaded interrupt mode.

Signed-off-by: Christoph Hellwig
---
 drivers/virtio/virtio_pci_common.c | 142 +
 drivers/virtio/virtio_pci_common.h |   1 +
 2 files changed, 83 insertions(+), 60 deletions(-)

diff --git a/drivers/virtio/virtio_pci_common.c b/drivers/virtio/virtio_pci_common.c
index 590534910dc6..6dd719543410 100644
--- a/drivers/virtio/virtio_pci_common.c
+++ b/drivers/virtio/virtio_pci_common.c
@@ -137,6 +137,9 @@ void vp_del_vqs(struct virtio_device *vdev)
 		kfree(vp_dev->msix_vector_map);
 	}
 
+	/* free the shared virtqueue irq if we don't use per-vq irqs */
+	if (vp_dev->shared_vq_vec)
+		free_irq(pci_irq_vector(vp_dev->pci_dev, 1), vp_dev);
 	free_irq(pci_irq_vector(vp_dev->pci_dev, 0), vp_dev);
 	pci_free_irq_vectors(vp_dev->pci_dev);
 }
@@ -147,10 +150,10 @@ static int vp_find_vqs_msix(struct virtio_device *vdev, unsigned nvqs,
 {
 	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
 	const char *name = dev_name(&vp_dev->vdev.dev);
-	int i, j, err = -ENOMEM, allocated_vectors, nvectors;
+	struct pci_dev *pdev = vp_dev->pci_dev;
+	int i, err = -ENOMEM, nvectors;
 	unsigned flags = PCI_IRQ_MSIX;
-	bool shared = false;
-	u16 msix_vec;
+	u16 msix_vec = 0;
 
 	if (desc) {
 		flags |= PCI_IRQ_AFFINITY;
@@ -162,19 +165,18 @@ static int vp_find_vqs_msix(struct virtio_device *vdev, unsigned nvqs,
 		if (callbacks[i])
 			nvectors++;
 
-	/* Try one vector per queue first. */
-	err = pci_alloc_irq_vectors_affinity(vp_dev->pci_dev, nvectors,
-			nvectors, flags, desc);
+	/* Try one vector for config and one per queue first. */
+	err = pci_alloc_irq_vectors_affinity(pdev, nvectors, nvectors, flags,
+			desc);
 	if (err < 0) {
 		/* Fallback to one vector for config, one shared for queues. */
-		shared = true;
-		err = pci_alloc_irq_vectors(vp_dev->pci_dev, 2, 2,
+		nvectors = 2;
+		vp_dev->shared_vq_vec = true;
+		err = pci_alloc_irq_vectors(pdev, nvectors, nvectors,
 				PCI_IRQ_MSIX);
 		if (err < 0)
 			return err;
 	}
-	if (err < 0)
-		return err;
 
 	vp_dev->msix_vectors = nvectors;
 	vp_dev->msix_names = kmalloc_array(nvectors,
@@ -194,79 +196,99 @@ static int vp_find_vqs_msix(struct virtio_device *vdev, unsigned nvqs,
 	}
 
 	/* Set the vector used for configuration */
-	snprintf(vp_dev->msix_names[0], sizeof(*vp_dev->msix_names),
+	snprintf(vp_dev->msix_names[msix_vec], sizeof(*vp_dev->msix_names),
 		 "%s-config", name);
-	err = request_irq(pci_irq_vector(vp_dev->pci_dev, 0), vp_config_changed,
-			0, vp_dev->msix_names[0], vp_dev);
+	err = request_irq(pci_irq_vector(pdev, msix_vec), vp_config_changed, 0,
+			vp_dev->msix_names[msix_vec], vp_dev);
 	if (err)
 		goto out_free_msix_affinity_masks;
 
 	/* Verify we had enough resources to assign the vector */
-	if (vp_dev->config_vector(vp_dev, 0) == VIRTIO_MSI_NO_VECTOR) {
+	if (vp_dev->config_vector(vp_dev, msix_vec) == VIRTIO_MSI_NO_VECTOR) {
 		err = -EBUSY;
 		goto out_free_config_irq;
 	}
 
-	vp_dev->msix_vector_map = kmalloc_array(nvqs,
-			sizeof(*vp_dev->msix_vector_map), GFP_KERNEL);
-	if (!vp_dev->msix_vector_map)
-		goto out_disable_config_irq;
-
-	allocated_vectors = j = 1; /* vector 0 is the config interrupt */
-	for (i = 0; i < nvqs; ++i) {
-		if (!names[i]) {
-			vqs[i] = NULL;
-			continue;
-		}
-
-		if (callbacks[i])
-			msix_vec = allocated_vectors;
-		else
-			msix_vec = VIRTIO_MSI_NO_VECTOR;
-
-		vqs[i] = vp_dev->setup_vq(vp_dev, i, callbacks[i], names[i],
-				msix_vec);
-		if (IS_ERR(vqs[i])) {
-			err = PTR_ERR(vqs[i]);
-			goto out_remove_vqs;
+	msix_vec++;
+
+	/*
+	 * Use a different vector for each queue if they are available,
+	 * else share the same vector for all VQs.
+	 */
+	if (vp_dev->shared_vq_vec) {
+		snprintf(vp_
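
The commit message above talks about reimplementing the shared irq "manually". As a rough illustration of that idea only (the rest of the patch is cut off here, so this is not taken from it), the following user-space sketch registers a single handler for the shared vector and lets that handler walk every queue itself. All names (`model_vq`, `model_shared_vq_interrupt`, and so on) are hypothetical stand-ins, not kernel or virtio APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a virtqueue: a pending flag and a counter. */
struct model_vq {
	bool pending;		/* device has used buffers for this queue */
	unsigned int handled;	/* how many times the callback did work */
};

enum model_irqreturn { MODEL_IRQ_NONE, MODEL_IRQ_HANDLED };

/* Per-queue handler: does work only if the queue has something pending. */
static enum model_irqreturn model_vq_interrupt(struct model_vq *vq)
{
	if (!vq->pending)
		return MODEL_IRQ_NONE;
	vq->pending = false;
	vq->handled++;
	return MODEL_IRQ_HANDLED;
}

/*
 * One handler, registered once for the shared vector. It polls every
 * queue itself, so the number of queues is not tied to how many
 * separate handlers can share one interrupt line.
 */
enum model_irqreturn model_shared_vq_interrupt(struct model_vq *vqs, size_t nvqs)
{
	enum model_irqreturn ret = MODEL_IRQ_NONE;
	size_t i;

	assert(vqs != NULL || nvqs == 0);
	for (i = 0; i < nvqs; i++)
		if (model_vq_interrupt(&vqs[i]) == MODEL_IRQ_HANDLED)
			ret = MODEL_IRQ_HANDLED;
	return ret;
}
```

The trade-off in this model is that every interrupt costs a walk over all queues, in exchange for needing only one registration no matter how many queues the device exposes.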
Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")
On Fri, Mar 31, 2017 at 10:20:49AM +0200, Christoph Hellwig wrote:
> On Fri, Mar 31, 2017 at 06:22:31AM +0300, Michael S. Tsirkin wrote:
> > I'm not sure why does it fail after 32 on 64 bit, but as
> > virtio devices aren't limited to 32 vqs it looks like we
> > should go back to requesting the irq only once for all vqs.
>
> Meh.
>
> > Christoph, should I just revert for now, or do you
> > want to look into a smaller patch for this?
>
> I think we'll need to do a different patch than just a simple revert,
> mostly because so much infrastructure depends on the patch.
>
> I'll take a look over the weekend.
>
> > Another question is looking into intx support - that
> > should work but it seems to be broken at the moment.
>
> Does it?  I'm pretty sure I tested it back when I came up with the
> series by artificially disabling MSI-X in the kernel.  I can try this
> again, though.

I'm not 100% sure - what I see is that we do not handle failure to
request irqs correctly: we seem to fall back on intx, but the following
freeze then blows up trying to free non-existing vectors. It does not
seem to trigger with just msix off, so maybe that is simply a failure
to recover from an error correctly.

--
MST
Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")
On Fri, Mar 31, 2017 at 06:22:31AM +0300, Michael S. Tsirkin wrote:
> I'm not sure why does it fail after 32 on 64 bit, but as
> virtio devices aren't limited to 32 vqs it looks like we
> should go back to requesting the irq only once for all vqs.

Meh.

> Christoph, should I just revert for now, or do you
> want to look into a smaller patch for this?

I think we'll need to do a different patch than just a simple revert,
mostly because so much infrastructure depends on the patch.

I'll take a look over the weekend.

> Another question is looking into intx support - that
> should work but it seems to be broken at the moment.

Does it?  I'm pretty sure I tested it back when I came up with the
series by artificially disabling MSI-X in the kernel.  I can try this
again, though.
Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")
On Fri, Mar 31, 2017 at 04:23:35AM +0300, Michael S. Tsirkin wrote:
> On Thu, Mar 30, 2017 at 09:20:35AM +0200, Mike Galbraith wrote:
> > On Thu, 2017-03-30 at 05:10 +0200, Mike Galbraith wrote:
> >
> > > WRT spin, you should need do nothing more than boot with threadirqs,
> > > that's 100% repeatable here in absolutely virgin source.
> >
> > No idea why virtqueue_get_buf() in __send_control_msg() fails forever
> > with threadirqs, but marking that vq as being busted (it clearly is)
> > results in one gripe, and a vbox that seemingly cares not one whit that
> > something went missing.  CONFIG_DEBUG_SHIRQ OTOH notices, mutters
> > something that sounds like "idiot" when I hibernate the thing ;-)
> >
> > diff --git a/drivers/char/virtio_console.c b/drivers/char/virtio_console.c
> > index e9b7e0b3cabe..831406dae1cb 100644
> > --- a/drivers/char/virtio_console.c
> > +++ b/drivers/char/virtio_console.c
> > @@ -567,6 +567,7 @@ static ssize_t __send_control_msg(struct ports_device *portdev, u32 port_id,
> >  	struct scatterlist sg[1];
> >  	struct virtqueue *vq;
> >  	unsigned int len;
> > +	unsigned long deadline = jiffies+1;
> >
> >  	if (!use_multiport(portdev))
> >  		return 0;
> > @@ -583,9 +584,13 @@ static ssize_t __send_control_msg(struct ports_device *portdev, u32 port_id,
> >
> >  	if (virtqueue_add_outbuf(vq, sg, 1, &portdev->cpkt, GFP_ATOMIC) == 0) {
> >  		virtqueue_kick(vq);
> > -		while (!virtqueue_get_buf(vq, &len)
> > -			&& !virtqueue_is_broken(vq))
> > +		while (!virtqueue_get_buf(vq, &len) && !virtqueue_is_broken(vq)) {
> >  			cpu_relax();
> > +			if (time_after(jiffies, deadline)) {
> > +				trace_printk("Aw crap, I'm stuck.. breaking device\n");
> > +				virtio_break_device(portdev->vdev);
> > +			}
> > +		}
> >  	}
> >
> >  	spin_unlock(&portdev->c_ovq_lock);
>
> OK so with your help I was able to reproduce. Surprisingly easy:
>
> 1. add threadirqs
> 2. add to qemu -device virtio-serial-pci -no-shutdown
> 3. within guest, do echo disk > /sys/power/state
>
> This produces a warning. Looking deeper into it, I find:
> the device has 64 vqs. This line
>
>	err = request_irq(pci_irq_vector(vp_dev->pci_dev, msix_vec),
>			  vring_interrupt, IRQF_SHARED,
>			  vp_dev->msix_names[j], vqs[i]);
>
> fails after assigning interrupts to 33 vqs.
> Is there a limit to how many threaded irqs can share a line?

In fact it fails on the 33rd one, and I see this:

	/*
	 * Unlikely to have 32 resp 64 irqs sharing one line,
	 * but who knows.
	 */
	if (thread_mask == ~0UL) {
		printk(KERN_ERR "%s +%d\n", __FILE__, __LINE__);
		ret = -EBUSY;
		goto out_mask;
	}

I'm not sure why does it fail after 32 on 64 bit, but as
virtio devices aren't limited to 32 vqs it looks like we
should go back to requesting the irq only once for all vqs.

Christoph, should I just revert for now, or do you
want to look into a smaller patch for this?

Another question is looking into intx support - that
should work but it seems to be broken at the moment.

> If so we need to rethink the whole approach.
>
> Still looking into it.
>
> Christoph, any idea?
>
>
> --
> MST
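
The check quoted above suggests that each threaded handler sharing a line takes one bit out of an `unsigned long` mask, so the number of sharers is capped by the width of that type. The sketch below is a simplified user-space model of that bookkeeping, not the kernel's code; `model_claim_thread_bit` and `model_max_sharers` are made-up names. On a typical 64-bit build the model tops out at 64 sharers, which still leaves open the question raised in this message of why the failure shows up after 32.

```c
#include <assert.h>
#include <limits.h>

#define MODEL_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/*
 * Hand out one bit per sharer from an unsigned long mask.
 * Returns the claimed bit index, or -1 once every bit is taken,
 * which corresponds to the -EBUSY path in the quoted snippet.
 */
int model_claim_thread_bit(unsigned long *thread_mask)
{
	unsigned int bit;

	assert(thread_mask);
	if (*thread_mask == ~0UL)
		return -1;
	for (bit = 0; bit < MODEL_BITS_PER_LONG; bit++) {
		if (!(*thread_mask & (1UL << bit))) {
			*thread_mask |= 1UL << bit;
			return (int)bit;
		}
	}
	return -1; /* not reached: a free bit exists if the mask is not ~0UL */
}

/* Count how many sharers fit before the mask is exhausted. */
unsigned int model_max_sharers(void)
{
	unsigned long mask = 0;
	unsigned int n = 0;

	while (model_claim_thread_bit(&mask) >= 0)
		n++;
	return n;
}
```

Whatever the exact limit turns out to be, a device with 64 virtqueues plus other users of the line sits at or beyond it, which fits the suggestion above to request the irq only once for all vqs.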
Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")
On Thu, 2017-03-30 at 05:10 +0200, Mike Galbraith wrote:

> WRT spin, you should need do nothing more than boot with threadirqs,
> that's 100% repeatable here in absolutely virgin source.

No idea why virtqueue_get_buf() in __send_control_msg() fails forever
with threadirqs, but marking that vq as being busted (it clearly is)
results in one gripe, and a vbox that seemingly cares not one whit that
something went missing.  CONFIG_DEBUG_SHIRQ OTOH notices, mutters
something that sounds like "idiot" when I hibernate the thing ;-)

diff --git a/drivers/char/virtio_console.c b/drivers/char/virtio_console.c
index e9b7e0b3cabe..831406dae1cb 100644
--- a/drivers/char/virtio_console.c
+++ b/drivers/char/virtio_console.c
@@ -567,6 +567,7 @@ static ssize_t __send_control_msg(struct ports_device *portdev, u32 port_id,
 	struct scatterlist sg[1];
 	struct virtqueue *vq;
 	unsigned int len;
+	unsigned long deadline = jiffies+1;
 
 	if (!use_multiport(portdev))
 		return 0;
@@ -583,9 +584,13 @@ static ssize_t __send_control_msg(struct ports_device *portdev, u32 port_id,
 
 	if (virtqueue_add_outbuf(vq, sg, 1, &portdev->cpkt, GFP_ATOMIC) == 0) {
 		virtqueue_kick(vq);
-		while (!virtqueue_get_buf(vq, &len)
-			&& !virtqueue_is_broken(vq))
+		while (!virtqueue_get_buf(vq, &len) && !virtqueue_is_broken(vq)) {
 			cpu_relax();
+			if (time_after(jiffies, deadline)) {
+				trace_printk("Aw crap, I'm stuck.. breaking device\n");
+				virtio_break_device(portdev->vdev);
+			}
+		}
 	}
 
 	spin_unlock(&portdev->c_ovq_lock);
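
The shape of the bandaid above (spin, but give up and mark the device broken once a deadline passes) can be tried outside the kernel with a small model. This is only a sketch under stated assumptions: it uses a poll budget instead of jiffies, and `model_queue`, `model_get_buf` and `model_wait_for_buf` are invented for illustration rather than being virtio functions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical queue model: completes after `ready_after` polls, if ever. */
struct model_queue {
	unsigned int polls;		/* how often we asked for the buffer */
	unsigned int ready_after;	/* 0 means "never completes" */
	bool broken;			/* set once we give up on the queue */
};

/* Stand-in for polling the queue for a returned buffer. */
static bool model_get_buf(struct model_queue *q)
{
	q->polls++;
	return q->ready_after != 0 && q->polls >= q->ready_after;
}

/*
 * Spin until the buffer comes back or the queue is marked broken.
 * When the spin budget is used up, mark the queue broken so the next
 * pass ends the wait instead of spinning forever.
 * Returns true if the buffer was returned, false if we gave up.
 */
bool model_wait_for_buf(struct model_queue *q, unsigned int budget)
{
	unsigned int spins = 0;

	assert(q);
	for (;;) {
		if (model_get_buf(q))
			return true;
		if (q->broken)
			return false;
		if (++spins > budget)
			q->broken = true;
	}
}
```

As in the patch, giving up only bounds the wait; it does nothing about why the completion never arrives.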
Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")
On Wed, 2017-03-29 at 23:19 +0300, Michael S. Tsirkin wrote:

> >  			&portdev->max_nr_ports) == 0) {
> > @@ -2179,7 +2179,9 @@ static struct virtio_device_id id_table[
> >
> >  static unsigned int features[] = {
> >  	VIRTIO_CONSOLE_F_SIZE,
> > +#ifndef CONFIG_IRQ_FORCED_THREADING
> >  	VIRTIO_CONSOLE_F_MULTIPORT,
> > +#endif
> >  };
>
> These look kind of questionable.
> Is this part needed?

I would have sworn it was, but double checking, nope, it's not.

Hm, so I could make a prettier bandaid with a runtime check.. but it'd
remain a bandaid, so I'll go do some beans 'n' biscuits work instead.

	-Mike
Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")
On Wed, 2017-03-29 at 23:10 +0300, Michael S. Tsirkin wrote: > Poking at this some more, I was able to reproduce at > least some warnings. I still do not see a spin > but is there a chance this helps your case too? Well, it's down to one warning, clean on the way back up. WRT spin, you should need do nothing more than boot with threadirqs, that's 100% repeatable here in absolutely virgin source. Attaching (obese enterprise-ish) config. [ 174.147626] [ cut here ] [ 174.147640] WARNING: CPU: 7 PID: 339 at drivers/pci/msi.c:1251 pci_irq_vector+0xcb/0xe0 [ 174.147640] Modules linked in: dm_mod(E) fuse(E) ebtable_filter(E) ebtables(E) nf_log_ipv6(E) rpcsec_gss_krb5(E) xt_pkttype(E) nfsv4(E) nf_log_ipv4(E) nf_log_common(E) dns_resolver(E) xt_LOG(E) xt_limit(E) nfs(E) fscache(E) af_packet(E) iscsi_ibft(E) iscsi_boot_sysfs(E) ip6t_REJECT(E) xt_tcpudp(E) nf_conntrack_ipv6(E) nf_defrag_ipv6(E) ip6table_raw(E) ipt_REJECT(E) iptable_raw(E) xt_CT(E) iptable_filter(E) ip6table_mangle(E) nf_conntrack_netbios_ns(E) nf_conntrack_broadcast(E) nf_conntrack_ipv4(E) nf_defrag_ipv4(E) ip_tables(E) xt_conntrack(E) nf_conntrack(E) libcrc32c(E) ip6table_filter(E) ip6_tables(E) x_tables(E) snd_hda_codec_generic(E) snd_hda_intel(E) snd_hda_codec(E) joydev(E) snd_hda_core(E) snd_hwdep(E) snd_pcm(E) snd_timer(E) crct10dif_pclmul(E) snd(E) crc32_pclmul(E) ghash_clmulni_intel(E) [ 174.147664] pcbc(E) soundcore(E) 8139too(E) aesni_intel(E) i2c_piix4(E) ppdev(E) aes_x86_64(E) virtio_balloon(E) crypto_simd(E) parport_pc(E) glue_helper(E) serio_raw(E) pcspkr(E) parport(E) cryptd(E) button(E) acpi_cpufreq(E) nfsd(E) auth_rpcgss(E) nfs_acl(E) lockd(E) grace(E) sunrpc(E) ext4(E) crc16(E) jbd2(E) mbcache(E) hid_generic(E) usbhid(E) sr_mod(E) cdrom(E) ata_generic(E) ata_piix(E) virtio_console(E) virtio_blk(E) virtio_rng(E) qxl(E) drm_kms_helper(E) syscopyarea(E) sysfillrect(E) ehci_pci(E) sysimgblt(E) ahci(E) fb_sys_fops(E) libahci(E) ttm(E) uhci_hcd(E) ehci_hcd(E) virtio_pci(E) virtio_ring(E) drm(E) 
crc32c_intel(E) 8139cp(E) libata(E) usbcore(E) mii(E) virtio(E) floppy(E) sg(E) scsi_mod(E) autofs4(E) [ 174.147702] CPU: 7 PID: 339 Comm: kworker/u16:3 Tainted: GE 4.11.0-default #2 [ 174.147702] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.1-0-g4adadbd-20161202_174313-build11a 04/01/2014 [ 174.147707] Workqueue: events_unbound async_run_entry_fn [ 174.147708] Call Trace: [ 174.147713] ? dump_stack+0x5c/0x85 [ 174.147718] ? __warn+0xc4/0xe0 [ 174.147721] ? pci_pm_poweroff+0xf0/0xf0 [ 174.147722] ? pci_irq_vector+0xcb/0xe0 [ 174.147725] ? vp_synchronize_vectors+0x3e/0x50 [virtio_pci] [ 174.147727] ? virtcons_freeze+0x1f/0xa0 [virtio_console] [ 174.147729] ? virtio_pci_freeze+0x19/0x40 [virtio_pci] [ 174.147730] ? pci_pm_freeze+0x59/0xe0 [ 174.147737] ? dpm_run_callback+0x4d/0x170 [ 174.147738] ? __device_suspend+0x11f/0x3b0 [ 174.147739] ? pm_dev_dbg+0x70/0x70 [ 174.147739] ? async_suspend+0x1a/0x90 [ 174.147740] ? async_run_entry_fn+0x34/0x160 [ 174.147742] ? process_one_work+0x164/0x430 [ 174.147743] ? worker_thread+0x135/0x4d0 [ 174.147744] ? kthread+0xff/0x140 [ 174.147745] ? rescuer_thread+0x3c0/0x3c0 [ 174.147746] ? kthread_park+0x80/0x80 [ 174.147753] ? ret_from_fork+0x26/0x40 [ 174.147754] ---[ end trace 02cd3f1b527dc954 ]--- config.xz Description: application/xz
Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")
On Wed, Mar 29, 2017 at 08:23:22AM +0200, Mike Galbraith wrote: > On Mon, 2017-03-27 at 20:18 +0200, Mike Galbraith wrote: > > > BTW, WRT RT woes with $subject, I tried booting a generic kernel with > > threadirqs, and bingo, same deal, just a bit more painful than for RT, > > where there's no watchdog moaning accompanying the (preemptible) spin. > > BTW++: the last hunk of this bandaid may be a bug fix. With only the > first two, box tried to use uninitialized stuff on hibernate, went > boom. Looks like that may be possible without help from me. > > --- a/drivers/char/virtio_console.c > +++ b/drivers/char/virtio_console.c > @@ -2058,7 +2058,7 @@ static int virtcons_probe(struct virtio_ > portdev->max_nr_ports = 1; > > /* Don't test MULTIPORT at all if we're rproc: not a valid feature! */ > - if (!is_rproc_serial(vdev) && > + if (!is_rproc_serial(vdev) && !IS_ENABLED(CONFIG_IRQ_FORCED_THREADING) > && > virtio_cread_feature(vdev, VIRTIO_CONSOLE_F_MULTIPORT, >struct virtio_console_config, max_nr_ports, >&portdev->max_nr_ports) == 0) { > @@ -2179,7 +2179,9 @@ static struct virtio_device_id id_table[ > > static unsigned int features[] = { > VIRTIO_CONSOLE_F_SIZE, > +#ifndef CONFIG_IRQ_FORCED_THREADING > VIRTIO_CONSOLE_F_MULTIPORT, > +#endif > }; These look kind of questionable. Is this part needed? > static struct virtio_device_id rproc_serial_id_table[] = { > @@ -2202,14 +2204,16 @@ static int virtcons_freeze(struct virtio > > vdev->config->reset(vdev); > > - virtqueue_disable_cb(portdev->c_ivq); > + if (use_multiport(portdev)) > + virtqueue_disable_cb(portdev->c_ivq); > cancel_work_sync(&portdev->control_work); > cancel_work_sync(&portdev->config_work); > /* >* Once more: if control_work_handler() was running, it would >* enable the cb as the last step. 
>*/ > - virtqueue_disable_cb(portdev->c_ivq); > + if (use_multiport(portdev)) > + virtqueue_disable_cb(portdev->c_ivq); > remove_controlq_data(portdev); > > list_for_each_entry(port, &portdev->ports, list) { This looks real. No idea why would interrupt sharing trigger anything like this but go figure. Can you pls submit this separately with a signature? -- MST
Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")
On Wed, Mar 29, 2017 at 08:23:22AM +0200, Mike Galbraith wrote: > On Mon, 2017-03-27 at 20:18 +0200, Mike Galbraith wrote: > > > BTW, WRT RT woes with $subject, I tried booting a generic kernel with > > threadirqs, and bingo, same deal, just a bit more painful than for RT, > > where there's no watchdog moaning accompanying the (preemptible) spin. > > BTW++: the last hunk of this bandaid may be a bug fix. With only the > first two, box tried to use uninitialized stuff on hibernate, went > boom. Looks like that may be possible without help from me. > > --- a/drivers/char/virtio_console.c > +++ b/drivers/char/virtio_console.c > @@ -2058,7 +2058,7 @@ static int virtcons_probe(struct virtio_ > portdev->max_nr_ports = 1; > > /* Don't test MULTIPORT at all if we're rproc: not a valid feature! */ > - if (!is_rproc_serial(vdev) && > + if (!is_rproc_serial(vdev) && !IS_ENABLED(CONFIG_IRQ_FORCED_THREADING) > && > virtio_cread_feature(vdev, VIRTIO_CONSOLE_F_MULTIPORT, >struct virtio_console_config, max_nr_ports, >&portdev->max_nr_ports) == 0) { > @@ -2179,7 +2179,9 @@ static struct virtio_device_id id_table[ > > static unsigned int features[] = { > VIRTIO_CONSOLE_F_SIZE, > +#ifndef CONFIG_IRQ_FORCED_THREADING > VIRTIO_CONSOLE_F_MULTIPORT, > +#endif > }; > > static struct virtio_device_id rproc_serial_id_table[] = { > @@ -2202,14 +2204,16 @@ static int virtcons_freeze(struct virtio > > vdev->config->reset(vdev); > > - virtqueue_disable_cb(portdev->c_ivq); > + if (use_multiport(portdev)) > + virtqueue_disable_cb(portdev->c_ivq); > cancel_work_sync(&portdev->control_work); > cancel_work_sync(&portdev->config_work); > /* >* Once more: if control_work_handler() was running, it would >* enable the cb as the last step. 
>*/ > - virtqueue_disable_cb(portdev->c_ivq); > + if (use_multiport(portdev)) > + virtqueue_disable_cb(portdev->c_ivq); > remove_controlq_data(portdev); > > list_for_each_entry(port, &portdev->ports, list) { Poking at this some more, I was able to reproduce at least some warnings. I still do not see a spin but is there a chance this helps your case too? commit 85039ca3162295759cf986aa753778043a90012c Author: Michael S. Tsirkin Date: Wed Mar 29 23:02:28 2017 +0300 virtio_pci: fix msix vector tracking on cleanup virtio pci tracks allocated vectors in a variable: msix_vectors. This isn't reset on del_vqs, as a result if reset is called after vqs are deleted we try to synchronize non-existing irqs producing a (probably harmless) warning. Fixes: 07ec51480b5e ("virtio_pci: use shared interrupts for virtqueues") Signed-off-by: Michael S. Tsirkin diff --git a/drivers/virtio/virtio_pci_common.c b/drivers/virtio/virtio_pci_common.c index baae423..a70bed6 100644 --- a/drivers/virtio/virtio_pci_common.c +++ b/drivers/virtio/virtio_pci_common.c @@ -151,6 +151,7 @@ void vp_del_vqs(struct virtio_device *vdev) } free_irq(pci_irq_vector(vp_dev->pci_dev, 0), vp_dev); + vp_dev->msix_vectors = 0; pci_free_irq_vectors(vp_dev->pci_dev); } @@ -294,6 +295,7 @@ static int vp_find_vqs_msix(struct virtio_device *vdev, unsigned nvqs, out_free_msix_names: kfree(vp_dev->msix_names); out_free_irq_vectors: + vp_dev->msix_vectors = 0; pci_free_irq_vectors(vp_dev->pci_dev); return err; }
Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")
On Mon, 2017-03-27 at 20:18 +0200, Mike Galbraith wrote: > BTW, WRT RT woes with $subject, I tried booting a generic kernel with > threadirqs, and bingo, same deal, just a bit more painful than for RT, > where there's no watchdog moaning accompanying the (preemptible) spin. BTW++: the last hunk of this bandaid may be a bug fix. With only the first two, box tried to use uninitialized stuff on hibernate, went boom. Looks like that may be possible without help from me. --- a/drivers/char/virtio_console.c +++ b/drivers/char/virtio_console.c @@ -2058,7 +2058,7 @@ static int virtcons_probe(struct virtio_ portdev->max_nr_ports = 1; /* Don't test MULTIPORT at all if we're rproc: not a valid feature! */ - if (!is_rproc_serial(vdev) && + if (!is_rproc_serial(vdev) && !IS_ENABLED(CONFIG_IRQ_FORCED_THREADING) && virtio_cread_feature(vdev, VIRTIO_CONSOLE_F_MULTIPORT, struct virtio_console_config, max_nr_ports, &portdev->max_nr_ports) == 0) { @@ -2179,7 +2179,9 @@ static struct virtio_device_id id_table[ static unsigned int features[] = { VIRTIO_CONSOLE_F_SIZE, +#ifndef CONFIG_IRQ_FORCED_THREADING VIRTIO_CONSOLE_F_MULTIPORT, +#endif }; static struct virtio_device_id rproc_serial_id_table[] = { @@ -2202,14 +2204,16 @@ static int virtcons_freeze(struct virtio vdev->config->reset(vdev); - virtqueue_disable_cb(portdev->c_ivq); + if (use_multiport(portdev)) + virtqueue_disable_cb(portdev->c_ivq); cancel_work_sync(&portdev->control_work); cancel_work_sync(&portdev->config_work); /* * Once more: if control_work_handler() was running, it would * enable the cb as the last step. */ - virtqueue_disable_cb(portdev->c_ivq); + if (use_multiport(portdev)) + virtqueue_disable_cb(portdev->c_ivq); remove_controlq_data(portdev); list_for_each_entry(port, &portdev->ports, list) {
Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")
On Tue, 2017-03-28 at 20:27 +0300, Michael S. Tsirkin wrote:
> On Tue, Mar 28, 2017 at 06:33:53PM +0200, Mike Galbraith wrote:
> > On Tue, 2017-03-28 at 18:37 +0300, Michael S. Tsirkin wrote:
> >
> > > Anything specific that you do to trigger this?
> >
> > Nope, all I have to do is to poke kde Power/Session Hibernate
> > button.
>
> Oh so you actually start hypernate? Is this what you mean when
> you say "poke"?

s/hyper/hiber, but yes, and button poking == mouse clicking.

	-Mike
Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")
On Tue, Mar 28, 2017 at 06:33:53PM +0200, Mike Galbraith wrote:
> On Tue, 2017-03-28 at 18:37 +0300, Michael S. Tsirkin wrote:
>
> > Anything specific that you do to trigger this?
>
> Nope, all I have to do is to poke kde Power/Session Hibernate button.

Oh so you actually start hypernate? Is this what you mean when
you say "poke"?

> Not that it should matter, but the vm is a full clone of my 42.1 box,
> including git server/repos etc, so has all whistles/bells/lard.
>
> 	-Mike
Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")
On Tue, 2017-03-28 at 18:37 +0300, Michael S. Tsirkin wrote:

> Anything specific that you do to trigger this?

Nope, all I have to do is to poke kde Power/Session Hibernate button.

Not that it should matter, but the vm is a full clone of my 42.1 box,
including git server/repos etc, so has all whistles/bells/lard.

	-Mike
Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")
On Tue, Mar 28, 2017 at 05:16:13AM +0200, Mike Galbraith wrote: > On Tue, 2017-03-28 at 05:35 +0300, Michael S. Tsirkin wrote: > > On Tue, Mar 28, 2017 at 03:08:20AM +0200, Mike Galbraith wrote: > > > On Mon, 2017-03-27 at 21:16 +0300, Michael S. Tsirkin wrote: > > > > > > > Mike, could you pls send lspci -vv that shows up after > > > > boot? > > > > > > Presuming you mean the virtual box.. > > > > Yes. Hmm nothing strange here. Can you pls post your QEMU > > command line so I can try reproducing? > > I don't start from the command line, I poke buttons in gui tool for > virt-weenies, below is ps result (hope your monitor is 37 feet wide). > > /usr/bin/qemu-system-x86_64 -name opensuse42.1 -S -machine > pc-i440fx-2.3,accel=kvm,usb=off,vmport=off -cpu > Haswell-noTSX,+abm,+pdpe1gb,+rdrand,+f16c,+osxsave,+pdcm,+xtpr,+tm2,+est,+smx,+vmx,+ds_cpl,+monitor,+dtes64,+pbe,+tm,+ht,+ss,+acpi,+ds,+vme > -m 8192 -realtime mlock=off -smp 8,sockets=1,cores=1,threads=8 -uuid > afff4e95-262d-41ca-9189-f40c87c9375b -no-user-config -nodefaults -chardev > socket,id=charmonitor,path=/var/lib/libvirt/qemu/opensuse42.1.monitor,server,nowait > -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew > -global kvm-pit.lost_tick_policy=discard -no-hpet -no-shutdown -global > PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -boot strict=on -device > ich9-usb-ehci1,id=usb,bus=pci.0,addr=0x6.0x7 -device > ich9-usb-uhci1,masterbus=usb.0,firstport=0,bus=pci.0,multifunction=on,addr=0x6 > -device ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pci.0,addr=0x6.0x1 > -device ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=pci.0,addr=0x6.0x2 > -device ahci,id=sata0,bus=pci.0,addr=0x5 -device > virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x7 -drive > file=/abuild/lib/libvirt/images/opensuse42.1.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,cache=off > -device > virtio-blk-pci,scsi=off,bus=pci.0,addr=0xa,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 > -drive > 
file=/dev/sr0,if=none,media=cdrom,id=drive-sata0-0-0,readonly=on,format=raw > -device ide-cd,bus=sata0.0,drive=drive-sata0-0-0,id=sata0-0-0 -netdev > tap,fd=22,id=hostnet0 -device > rtl8139,netdev=hostnet0,id=net0,mac=52:54:00:be:db:82,bus=pci.0,addr=0x3 > -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 > -chardev spicevmc,id=charchannel0,name=vdagent -device > virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.spice.0 > -chardev > socket,id=charchannel1,path=/var/lib/libvirt/qemu/channel/target/opensuse42.1.org.qemu.guest_agent.0,server,nowait > -device > virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=org.qemu.guest_agent.0 > -device usb-tablet,id=input0 -spice > port=5900,addr=127.0.0.1,disable-ticketing,seamless-migration=on -k de > -device > qxl-vga,id=video0,ram_size=67108864,vram_size=67108864,vgamem_mb=16,bus=pci.0,addr=0x2 > -device intel-hda,id=sound0,bus=pci.0,addr=0x4 -device > hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 -chardev > spicevmc,id=charredir0,name=usbredir -device > usb-redir,chardev=charredir0,id=redir0 -chardev > spicevmc,id=charredir1,name=usbredir -device > usb-redir,chardev=charredir1,id=redir1 -device > virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x8 -object > rng-random,id=objrng0,filename=/dev/random -device > virtio-rng-pci,rng=objrng0,id=rng0,bus=pci.0,addr=0x9 -msg timestamp=on > > If you want the xml instead, holler. > > -Mike No, that's fine, thanks. Anything specific that you do to trigger this? -- MST
Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")
On Tue, 2017-03-28 at 05:35 +0300, Michael S. Tsirkin wrote: > On Tue, Mar 28, 2017 at 03:08:20AM +0200, Mike Galbraith wrote: > > On Mon, 2017-03-27 at 21:16 +0300, Michael S. Tsirkin wrote: > > > > > Mike, could you pls send lspci -vv that shows up after > > > boot? > > > > Presuming you mean the virtual box.. > > Yes. Hmm nothing strange here. Can you pls post your QEMU > command line so I can try reproducing? I don't start from the command line, I poke buttons in gui tool for virt-weenies, below is ps result (hope your monitor is 37 feet wide). /usr/bin/qemu-system-x86_64 -name opensuse42.1 -S -machine pc-i440fx-2.3,accel=kvm,usb=off,vmport=off -cpu Haswell-noTSX,+abm,+pdpe1gb,+rdrand,+f16c,+osxsave,+pdcm,+xtpr,+tm2,+est,+smx,+vmx,+ds_cpl,+monitor,+dtes64,+pbe,+tm,+ht,+ss,+acpi,+ds,+vme -m 8192 -realtime mlock=off -smp 8,sockets=1,cores=1,threads=8 -uuid afff4e95-262d-41ca-9189-f40c87c9375b -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/opensuse42.1.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=discard -no-hpet -no-shutdown -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -boot strict=on -device ich9-usb-ehci1,id=usb,bus=pci.0,addr=0x6.0x7 -device ich9-usb-uhci1,masterbus=usb.0,firstport=0,bus=pci.0,multifunction=on,addr=0x6 -device ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pci.0,addr=0x6.0x1 -device ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=pci.0,addr=0x6.0x2 -device ahci,id=sata0,bus=pci.0,addr=0x5 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x7 -drive file=/abuild/lib/libvirt/images/opensuse42.1.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,cache=off -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0xa,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -drive file=/dev/sr0,if=none,media=cdrom,id=drive-sata0-0-0,readonly=on,format=raw -device ide-cd,bus=sata0.0,drive=drive-sata0-0-0,id=sata0-0-0 
-netdev tap,fd=22,id=hostnet0 -device rtl8139,netdev=hostnet0,id=net0,mac=52:54:00:be:db:82,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -chardev spicevmc,id=charchannel0,name=vdagent -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.spice.0 -chardev socket,id=charchannel1,path=/var/lib/libvirt/qemu/channel/target/opensuse42.1.org.qemu.guest_agent.0,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=org.qemu.guest_agent.0 -device usb-tablet,id=input0 -spice port=5900,addr=127.0.0.1,disable-ticketing,seamless-migration=on -k de -device qxl-vga,id=video0,ram_size=67108864,vram_size=67108864,vgamem_mb=16,bus=pci.0,addr=0x2 -device intel-hda,id=sound0,bus=pci.0,addr=0x4 -device hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 -chardev spicevmc,id=charredir0,name=usbredir -device usb-redir,chardev=charredir0,id=redir0 -chardev spicevmc,id=charredir1,name=usbredir -device usb-redir,chardev=charredir1,id=redir1 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x8 -object rng-random,id=objrng0,filename=/dev/random -device virtio-rng-pci,rng=objrng0,id=rng0,bus=pci.0,addr=0x9 -msg timestamp=on If you want the xml instead, holler. -Mike
Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")
On Tue, Mar 28, 2017 at 03:08:20AM +0200, Mike Galbraith wrote:
> On Mon, 2017-03-27 at 21:16 +0300, Michael S. Tsirkin wrote:
>
> > Mike, could you pls send lspci -vv that shows up after
> > boot?
>
> Presuming you mean the virtual box..

Yes. Hmm nothing strange here. Can you pls post your QEMU
command line so I can try reproducing?
Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")
On Mon, 2017-03-27 at 21:16 +0300, Michael S. Tsirkin wrote: > Mike, could you pls send lspci -vv that shows up after > boot? Presuming you mean the virtual box.. 00:00.0 Host bridge: Intel Corporation 440FX - 82441FX PMC [Natoma] (rev 02) Subsystem: Red Hat, Inc Qemu virtual machine Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx- Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR-
Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")
On Mon, 2017-03-27 at 19:05 +0200, Christoph Hellwig wrote:
> Hi Mike,
>
> does the patch below fix that issue for you?

Nope, warnings are alive and well.

> diff --git a/drivers/virtio/virtio_pci_common.c b/drivers/virtio/virtio_pci_common.c
> index df548a6fb844..fd1b06368b1f 100644
> --- a/drivers/virtio/virtio_pci_common.c
> +++ b/drivers/virtio/virtio_pci_common.c
> @@ -176,7 +176,7 @@ static int vp_find_vqs_msix(struct virtio_device *vdev, unsigned nvqs,
>  	if (err < 0)
>  		return err;
>
> -	vp_dev->msix_vectors = nvectors;
> +	vp_dev->msix_vectors = err; /* number of vectors allocated */
>  	vp_dev->msix_names = kmalloc_array(nvectors,
>  			sizeof(*vp_dev->msix_names), GFP_KERNEL);
>  	if (!vp_dev->msix_names)
Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")
On Mon, 2017-03-27 at 19:05 +0200, Christoph Hellwig wrote: > Hi Mike, > > does the patch below fix that issue for you? Thanks, I'll give it a go in the A.M. BTW, WRT RT woes with $subject, I tried booting a generic kernel with threadirqs, and bingo, same deal, just a bit more painful than for RT, where there's no watchdog moaning accompanying the (preemptible) spin. [ 28.346311] NMI watchdog: BUG: soft lockup - CPU#7 stuck for 22s! [kworker/7:1:108] [ 28.347536] Modules linked in: virtio_rng(E) virtio_blk(E) virtio_console(E) ata_piix(E) qxl(E) drm_kms_helper(E) syscopyarea(E) sysfillrect(E) sysimgblt(E) fb_sys_fops(E) ttm(E) ahci(E) libahci(E) drm(E) ehci_pci(E) uhci_hcd(E) ehci_hcd(E) usbcore(E) libata(E) virtio_pci(E) virtio_ring(E) virtio(E) 8139cp(E) floppy(E) mii(E) sg(E) scsi_mod(E) autofs4(E) [ 28.351160] CPU: 7 PID: 108 Comm: kworker/7:1 Tainted: GE 4.11.0-default #30 [ 28.352085] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.1-0-g4adadbd-20161202_174313-build11a 04/01/2014 [ 28.353547] Workqueue: events control_work_handler [virtio_console] [ 28.354450] task: 8802370d4440 task.stack: c900010d8000 [ 28.355281] RIP: 0010:__send_control_msg+0xbd/0xd0 [virtio_console] [ 28.356005] RSP: 0018:c900010dbd20 EFLAGS: 0246 ORIG_RAX: ff10 [ 28.356987] RAX: RBX: 880231c31ec8 RCX: 880231cb1000 [ 28.357866] RDX: 0001 RSI: c900010dbd2c RDI: 880234f87400 [ 28.358738] RBP: c900010dbd78 R08: 01080020 R09: c900010dbd30 [ 28.359718] R10: 88023fdddc00 R11: ffc8 R12: 880234f87400 [ 28.360653] R13: 880231c31ea8 R14: 0001 R15: 0003 [ 28.361510] FS: () GS:88023fdc() knlGS: [ 28.362433] CS: 0010 DS: ES: CR0: 80050033 [ 28.363177] CR2: 7f4da0f4 CR3: 01c09000 CR4: 001406e0 [ 28.363994] Call Trace: [ 28.364420] add_port+0x23f/0x3d0 [virtio_console] [ 28.365094] ? 
_raw_spin_unlock_irqrestore+0x24/0x40 [ 28.365765] handle_control_message.constprop.32+0x2c2/0x2e0 [virtio_console] [ 28.366622] control_work_handler+0x52/0xb7 [virtio_console] [ 28.367291] process_one_work+0x15c/0x440 [ 28.367869] worker_thread+0x137/0x4b0 [ 28.368426] kthread+0x10c/0x140 [ 28.368921] ? process_one_work+0x440/0x440 [ 28.369477] ? kthread_create_on_node+0x40/0x40 [ 28.370067] ret_from_fork+0x2c/0x40 [ 28.370611] Code: 57 e1 48 83 c4 30 31 c0 5b 41 5c 41 5d 41 5e 41 5f 5d c3 4c 89 e7 e8 03 93 f7 ff eb 0e 4c 89 e7 e8 89 84 f7 ff 84 c0 75 d1 f3 90 <48> 8d 75 b4 4c 89 e7 e8 57 91 f7 ff 48 85 c0 74 e1 eb bc 0f 1f
Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")
On Mon, Mar 27, 2017 at 07:05:40PM +0200, Christoph Hellwig wrote:
> Hi Mike,
>
> does the patch below fix that issue for you?
>
> diff --git a/drivers/virtio/virtio_pci_common.c b/drivers/virtio/virtio_pci_common.c
> index df548a6fb844..fd1b06368b1f 100644
> --- a/drivers/virtio/virtio_pci_common.c
> +++ b/drivers/virtio/virtio_pci_common.c
> @@ -176,7 +176,7 @@ static int vp_find_vqs_msix(struct virtio_device *vdev, unsigned nvqs,
>  	if (err < 0)
>  		return err;
>
> -	vp_dev->msix_vectors = nvectors;
> +	vp_dev->msix_vectors = err; /* number of vectors allocated */
>  	vp_dev->msix_names = kmalloc_array(nvectors,
>  			sizeof(*vp_dev->msix_names), GFP_KERNEL);
>  	if (!vp_dev->msix_names)

Can this sometimes allocate fewer vectors than the minimum number
requested, then? I didn't realize. In that case we should probably
change

	if (err < 0)

to

	if (err != nvectors)

and similarly for when we try to get 2 vectors.

Mike, could you pls send lspci -vv that shows up after boot?

--
MST
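
To make the suggested check concrete, here is a small user-space model of "ask for exactly nvectors, else fall back to 2". It is a sketch only: `model_alloc_vectors` is a fake allocator written for this example (it is not `pci_alloc_irq_vectors`), and `model_available` is a made-up knob for how many vectors the pretend device can grant. As far as I know the documented contract of the real call is that it never returns fewer than the requested minimum on success, so with min == max the `err < 0` and `err != nvectors` checks should behave the same; the model just shows the stricter form.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical: how many vectors the fake device can hand out. */
unsigned int model_available;

/*
 * Fake allocator: grants between min and max vectors, or fails with -1
 * if fewer than min are available.
 */
int model_alloc_vectors(unsigned int min, unsigned int max)
{
	if (model_available < min)
		return -1;
	return (int)(model_available < max ? model_available : max);
}

/*
 * Try one vector per user first; if we do not get exactly that many,
 * fall back to two vectors (config + one shared for all queues).
 * Returns the number of vectors in use, or -1 on failure.
 */
int model_setup_vectors(unsigned int nvectors, bool *shared)
{
	int got;

	assert(shared);
	*shared = false;
	got = model_alloc_vectors(nvectors, nvectors);
	if (got != (int)nvectors) {
		*shared = true;
		got = model_alloc_vectors(2, 2);
		if (got != 2)
			return -1;
	}
	return got;
}
```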
Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")
Hi Mike,

does the patch below fix that issue for you?

diff --git a/drivers/virtio/virtio_pci_common.c b/drivers/virtio/virtio_pci_common.c
index df548a6fb844..fd1b06368b1f 100644
--- a/drivers/virtio/virtio_pci_common.c
+++ b/drivers/virtio/virtio_pci_common.c
@@ -176,7 +176,7 @@ static int vp_find_vqs_msix(struct virtio_device *vdev, unsigned nvqs,
 	if (err < 0)
 		return err;
 
-	vp_dev->msix_vectors = nvectors;
+	vp_dev->msix_vectors = err; /* number of vectors allocated */
 	vp_dev->msix_names = kmalloc_array(nvectors,
 			sizeof(*vp_dev->msix_names), GFP_KERNEL);
 	if (!vp_dev->msix_names)
Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")
On Thu, 2017-03-23 at 15:56 +0100, Christoph Hellwig wrote:
> Does the patch from Jason in the
>
> "[REGRESSION] 07ec51480b5e ("virtio_pci: use shared interrupts for
> virtqueues") causes crashes in guest"
>
> thread fix the issue for you?

That seems to eliminate explosions, but not the below. 07ec51480b5e causes me some kworker grief in -rt too (100% CPU), but that's as yet not been stared at (too darn [busy/lazy], pick one;).

virgin 4.11-rc4+referenced patch, config=enterprise-ish.

...
[ 158.400210] PM: Hibernation mode set to 'shutdown'
[ 158.607439] PM: Syncing filesystems ...
[ 158.986595] PM: done.
[ 158.986771] Freezing user space processes ... (elapsed 0.001 seconds) done.
[ 158.988758] PM: Marking nosave pages: [mem 0x-0x0fff]
[ 158.989156] PM: Marking nosave pages: [mem 0x0009f000-0x000f]
[ 158.989550] PM: Marking nosave pages: [mem 0xbffde000-0x]
[ 158.990200] PM: Basic memory bitmaps created
[ 158.990468] PM: Preallocating image memory... done (allocated 395798 pages)
[ 159.114650] PM: Allocated 1583192 kbytes in 0.12 seconds (13193.26 MB/s)
[ 159.115203] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
[ 159.119378] ------------[ cut here ]------------
[ 159.122606] WARNING: CPU: 3 PID: 509 at drivers/pci/msi.c:1251 pci_irq_vector+0xcf/0xe0
[ 159.123194] Modules linked in: fuse(E) ebtable_filter(E) ebtables(E) rpcsec_gss_krb5(E) nfsv4(E) dns_resolver(E) nfs(E) fscache(E) nf_log_ipv6(E) xt_pkttype(E) nf_log_ipv4(E) nf_log_common(E) xt_LOG(E) xt_limit(E) af_packet(E) iscsi_ibft(E) iscsi_boot_sysfs(E) ip6t_REJECT(E) xt_tcpudp(E) nf_conntrack_ipv6(E) nf_defrag_ipv6(E) ip6table_raw(E) ipt_REJECT(E) iptable_raw(E) xt_CT(E) iptable_filter(E) ip6table_mangle(E) nf_conntrack_netbios_ns(E) nf_conntrack_broadcast(E) nf_conntrack_ipv4(E) nf_defrag_ipv4(E) ip_tables(E) xt_conntrack(E) nf_conntrack(E) libcrc32c(E) ip6table_filter(E) ip6_tables(E) x_tables(E) snd_hda_codec_generic(E) snd_hda_intel(E) snd_hda_codec(E) snd_hda_core(E) snd_hwdep(E) joydev(E) snd_pcm(E) snd_timer(E) snd(E) crct10dif_pclmul(E) soundcore(E) crc32_pclmul(E) 8139too(E) ghash_clmulni_intel(E)
[ 159.128123] pcbc(E) aesni_intel(E) ppdev(E) i2c_piix4(E) aes_x86_64(E) virtio_balloon(E) crypto_simd(E) parport_pc(E) glue_helper(E) parport(E) button(E) pcspkr(E) cryptd(E) serio_raw(E) acpi_cpufreq(E) nfsd(E) dm_mod(E) auth_rpcgss(E) nfs_acl(E) lockd(E) grace(E) sunrpc(E) ext4(E) crc16(E) jbd2(E) mbcache(E) hid_generic(E) usbhid(E) sr_mod(E) cdrom(E) ata_generic(E) virtio_rng(E) virtio_blk(E) virtio_console(E) ata_piix(E) floppy(E) ehci_pci(E) qxl(E) drm_kms_helper(E) syscopyarea(E) uhci_hcd(E) ahci(E) ehci_hcd(E) sysfillrect(E) crc32c_intel(E) sysimgblt(E) libahci(E) fb_sys_fops(E) ttm(E) virtio_pci(E) virtio_ring(E) usbcore(E) virtio(E) 8139cp(E) drm(E) libata(E) mii(E) sg(E) scsi_mod(E) autofs4(E)
[ 159.132177] CPU: 3 PID: 509 Comm: kworker/u16:6 Tainted: GE 4.11.0-default #28
[ 159.132677] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.1-0-g4adadbd-20161202_174313-build11a 04/01/2014
[ 159.133428] Workqueue: events_unbound async_run_entry_fn
[ 159.133768] Call Trace:
[ 159.133933] dump_stack+0x63/0x90
[ 159.134161] __warn+0xd1/0xf0
[ 159.134360] ? pci_pm_poweroff+0x100/0x100
[ 159.134627] warn_slowpath_null+0x1d/0x20
[ 159.134889] pci_irq_vector+0xcf/0xe0
[ 159.135134] vp_synchronize_vectors+0x46/0x60 [virtio_pci]
[ 159.135486] vp_reset+0x37/0x40 [virtio_pci]
[ 159.135780] virtcons_freeze+0x23/0xa0 [virtio_console]
[ 159.136116] virtio_device_freeze+0x6b/0x80 [virtio]
[ 159.136431] virtio_pci_freeze+0x1d/0x40 [virtio_pci]
[ 159.136756] pci_pm_freeze+0x5f/0xe0
[ 159.136999] dpm_run_callback+0x59/0x180
[ 159.137252] __device_suspend+0x127/0x3c0
[ 159.137513] ? pm_dev_dbg+0x80/0x80
[ 159.137740] async_suspend+0x1f/0xa0
[ 159.137973] async_run_entry_fn+0x39/0x170
[ 159.138250] process_one_work+0x16c/0x450
[ 159.138514] worker_thread+0x137/0x4e0
[ 159.138761] kthread+0x10c/0x140
[ 159.138970] ? rescuer_thread+0x3c0/0x3c0
[ 159.139235] ? kthread_park+0x90/0x90
[ 159.139476] ret_from_fork+0x2c/0x40
[ 159.139721] ---[ end trace d66daafbe82e66e7 ]---
[ 159.728658] PM: freeze of devices complete after 611.743 msecs
[ 159.729321] PM: late freeze of devices complete after 0.243 msecs
[ 159.730921] PM: noirq freeze of devices complete after 1.145 msecs
[ 159.731507] Disabling non-boot CPUs ...
[ 159.732004] Unregister pv shared memory for cpu 1
[ 159.739017] smpboot: CPU 1 is now offline
[ 159.765702] Unregister pv shared memory for cpu 2
[ 159.770757] smpboot: CPU 2 is now offline
[ 159.797684] Unregister pv shared memory for cpu 3
[ 159.799545] smpboot: CPU 3 is now offline
[ 159.821934] Unregister pv shared memory for cpu 4
[ 159.823759] smpboot: CPU 4 is now offline
[ 159.848588] Unregister pv shared memory for cpu 5
[ 159.850375] smpboot: CPU 5 is now offline
Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")
On Thu, Mar 23, 2017 at 03:56:22PM +0100, Christoph Hellwig wrote: > Does the patch from Jason in the > > "[REGRESSION] 07ec51480b5e ("virtio_pci: use shared interrupts for > virtqueues") causes crashes in guest" > > thread fix the issue for you? In brief, yes it does. I followed up on that thread. Thanks, Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com Fedora Windows cross-compiler. Compile Windows programs, test, and build Windows installers. Over 100 libraries supported. http://fedoraproject.org/wiki/MinGW
Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")
Hi Christoph! Hi Michael! (Mail roughly based on text from https://bugzilla.kernel.org/show_bug.cgi?id=194911 ) I'm seeing random crashes during boot every few boot attempts when running Linux 4.11-rc/mainline in a Fedora 26 guest under a CentOS7 host (CPU: Intel(R) Pentium(R) CPU G3220) using KVM. Sometimes when the guest actually booted the network did not work. To get some impressions of the crashes I got see this gallery: https://plus.google.com/+ThorstenLeemhuis/posts/FjyyGjNtrrG Richard W.M. Jones and Adam Williamson see the same problems. See above bug for details. It seems they ran into the problem in the past few days, so I assume it's still present in mainline (I'm travelling currently and haven't had time for proper tests since last last Friday (pre-rc3); but I thought it's time to get the problem to the lists). Long story short: Richard and I did bisections and we both found that https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=07ec51480b5e ("virtio_pci: use shared interrupts for virtqueues") is the first bad commit. Any idea what might be wrong? Do you need more details from us to fix this? Ciao, Thorsten P.S.: Sorry, I should have written this mail a few days ago after filing above bug report, but I didn't get around to it :-/
Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")
On 23.03.2017 15:56, Christoph Hellwig wrote: > Does the patch from Jason in the > "[REGRESSION] 07ec51480b5e ("virtio_pci: use shared interrupts for > virtqueues") causes crashes in guest" > thread fix the issue for you? Ha, sorry, I'm travelling and wasn't aware that Laura earlier today did what I should have done a few days ago: bring the issue to the proper mailing lists. I'll give the patch a try. Thx for pointing it out and sorry for the noise. Ciao, Thorsten
Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")
On Thu, Mar 23, 2017 at 03:56:22PM +0100, Christoph Hellwig wrote: > Does the patch from Jason in the > > "[REGRESSION] 07ec51480b5e ("virtio_pci: use shared interrupts for > virtqueues") causes crashes in guest" > > thread fix the issue for you? I didn't see this thread before. I'll check that out for you now. Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com virt-p2v converts physical machines to virtual machines. Boot with a live CD or over the network (PXE) and turn machines into KVM guests. http://libguestfs.org/virt-v2v
Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")
On Thu, Mar 23, 2017 at 03:51:25PM +0100, Thorsten Leemhuis wrote: > Hi Christoph! Hi Michael! > > (Mail roughly based on text from > https://bugzilla.kernel.org/show_bug.cgi?id=194911 ) > > I'm seeing random crashes during boot every few boot attempts when > running Linux 4.11-rc/mainline in a Fedora 26 guest under a CentOS7 host > (CPU: Intel(R) Pentium(R) CPU G3220) using KVM. Sometimes when the guest > actually booted the network did not work. To get some impressions of the > crashes I got see this gallery: > https://plus.google.com/+ThorstenLeemhuis/posts/FjyyGjNtrrG > > Richard W.M. Jones and Adam Williamson see the same problems. See above > bug for details. It seems they ran into the problem in the past few > days, so I assume it's still present in mainline (I'm travelling > currently and haven't had time for proper tests since last last Friday > (pre-rc3); but I thought it's time to get the problem to the lists). > > Long story short: Richard and I did bisections and we both found that > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=07ec51480b5e > ("virtio_pci: use shared interrupts for virtqueues") is the first bad > commit. Any idea what might be wrong? Do you need more details from us > to fix this? Laura Abbott posted a kernel RPM which works for me. She has had to revert quite a number of commits, which are detailed in this comment: https://bugzilla.redhat.com/show_bug.cgi?id=1430297#c7 Her reverting patch is also attached. Rich. 
--
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-builder quickly builds VMs from scratch
http://libguestfs.org/virt-builder.1.html

From 4d3cba0be27b20516eb765c2913bce93e73fe30e Mon Sep 17 00:00:00 2001
From: Laura Abbott
Date: Wed, 22 Mar 2017 15:41:27 -0700
Subject: [PATCH] Revert a bunch of virtio commits

07ec51480b5e ("virtio_pci: use shared interrupts for virtqueues") is linked to a bunch of issues. Unfortunately we can't just revert it by itself. Revert it and dependency patches as well.

Revert "virtio: provide a method to get the IRQ affinity mask for a virtqueue"
This reverts commit bbaba479563910aaa51e59bb9027a09e396d3a3c.

Revert "virtio-console: avoid DMA from stack"
This reverts commit c4baad50297d84bde1a7ad45e50c73adae4a2192.

Revert "vhost: introduce O(1) vq metadata cache"
This reverts commit f889491380582b4ba2981cf0b0d7d6a40fb30ab7.
Conflicts: drivers/vhost/vhost.c

Revert "virtio_scsi: use virtio IRQ affinity"
This reverts commit 0d9f0a52c8b9f7a003fe1650b7d5fb8518efabe0.

Revert "virtio_blk: use virtio IRQ affinity"
This reverts commit ad71473d9c43725c917fc5a86d54ceb7001ee28c.

Revert "blk-mq: provide a default queue mapping for virtio device"
This reverts commit 73473427bb551686e4b68ecd99bfd27e6635286a.

Revert "virtio: allow drivers to request IRQ affinity when creating VQs"
This reverts commit fb5e31d970ce8b4941f03ed765d7dbefc39f22d9.

Revert "virtio_pci: simplify MSI-X setup"
This reverts commit 52a61516125fa9a21b3bdf4f90928308e2e5573f.

Revert "virtio_pci: don't duplicate the msix_enable flag in struct pci_dev"
This reverts commit 53a020c661741f3b87ad3ac6fa545088aaebac9b.

Revert "virtio_pci: use shared interrupts for virtqueues"
This reverts commit 07ec51480b5eb1233f8c1b0f5d7a7c8d1247c507.
---
 block/Kconfig | 5 -
 block/Makefile | 1 -
 block/blk-mq-virtio.c | 54 --
 drivers/block/virtio_blk.c | 14 +-
 drivers/char/virtio_console.c | 14 +-
 drivers/crypto/virtio/virtio_crypto_core.c | 2 +-
 drivers/gpu/drm/virtio/virtgpu_kms.c | 2 +-
 drivers/misc/mic/vop/vop_main.c | 2 +-
 drivers/net/caif/caif_virtio.c | 3 +-
 drivers/net/virtio_net.c | 2 +-
 drivers/remoteproc/remoteproc_virtio.c | 3 +-
 drivers/rpmsg/virtio_rpmsg_bus.c | 2 +-
 drivers/s390/virtio/kvm_virtio.c | 3 +-
 drivers/s390/virtio/virtio_ccw.c | 3 +-
 drivers/scsi/virtio_scsi.c | 127 +++--
 drivers/vhost/vhost.c | 136 +++---
 drivers/vhost/vhost.h | 8 -
 drivers/virtio/virtio_balloon.c | 3 +-
 drivers/virtio/virtio_input.c | 3 +-
 drivers/virtio/virtio_mmio.c | 3 +-
 drivers/virtio/virtio_pci_common.c | 287 +++--
 drivers/virtio/virtio_pci_common.h | 25 ++-
 drivers/virtio/virtio_pci_legacy.c | 3 +-
 drivers/virtio/virtio_pci_modern.c | 11 +-
 include/linux/blk-mq-virtio.h | 10 -
 include/linux/cpuhotplug.h | 1 +
 include/linux/virtio_config.h | 12 +-
 include/uapi/linux/virtio_pci.h | 2 +-
 net/vmw_vsock/virtio_transport.c | 3 +-
 29 files changed, 337
Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")
Does the patch from Jason in the "[REGRESSION] 07ec51480b5e ("virtio_pci: use shared interrupts for virtqueues") causes crashes in guest" thread fix the issue for you?