> It took longer than expected, but a definite crash happened yesterday.
> Sadly, it seems that MSI was not a fix for the in-use crashes.
>
> At this point I'm worried that it's some sort of weird hardware-specific
> interaction that is unlikely to be fixed. If anybody experiences similar
> symptoms or can suggest any debugging techniques, I'd greatly appreciate
> any suggestions.

I do experience something similar, and have since June. I get about 1-2
crashes per month, and the symptoms are very similar to yours.

After the last crash I went ahead and set up netconsole logging. That way
all kernel messages are sent to another machine and survive the crash:

https://www.kernel.org/doc/Documentation/networking/netconsole.txt

It's easy to set up using the "Dynamic reconfiguration" method, but you'll
need another machine to log the messages.

Today I finally got another crash, and it looks identical to this:

https://lkml.org/lkml/2016/9/14/527

It's a problem with fuse that's only triggered under memory pressure. I
always assumed the crashes were related to kvm because they usually happen
soon after starting a VM, but perhaps the VM only introduced the memory
pressure needed to trigger the fuse crash. Do you also use fuse?

The patches to fix it are marked <sta...@vger.kernel.org> [3.15+], but so
far only 4.8.0 and above have received the fix. I upgraded to 4.8.2 and
hopefully that'll fix the crashes for me.

After some googling I even found this:

https://github.com/trapexit/mergerfs#mergerfs-under-heavy-load-and-memory-preasure-leads-to-kernel-panic

mergerfs is what I use fuse for.
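
For the receiving side of netconsole, here is a minimal sketch in Python of
a logger that could run on the second machine. It is not taken from the
netconsole documentation; any UDP listener (netcat, a syslog daemon) should
do the same job. It assumes the crashing host sends to UDP port 6666, which
is netconsole's default target port. The file name netconsole.log and the
function names are made up for illustration.

```python
import socket
from datetime import datetime, timezone


def open_listener(host="0.0.0.0", port=6666):
    """Bind a UDP socket; 6666 is assumed to match the sender's target port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def log_one(sock, log):
    """Receive one datagram and append it to `log` as a timestamped line."""
    payload, (sender_ip, _sender_port) = sock.recvfrom(65535)
    text = payload.decode("utf-8", errors="replace").rstrip("\r\n")
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    line = f"{stamp} {sender_ip} {text}"
    log.write(line + "\n")
    # Flush after every message so the tail of the log is on disk
    # even if the logging machine is interrupted.
    log.flush()
    return line


def serve(log_path="netconsole.log", host="0.0.0.0", port=6666):
    """Log kernel messages until interrupted (hypothetical file name)."""
    sock = open_listener(host, port)
    with open(log_path, "a", encoding="utf-8") as log:
        while True:
            log_one(sock, log)
```

Calling serve() on the logging machine before the next crash should leave
the last kernel messages in netconsole.log. The port has to match whatever
remote port is configured on the sending side.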