> It took longer than expected, but a definite crash happened yesterday.
> Sadly, it seems that MSI was not a fix for the in-use crashes.
> 
> At this point I'm worried that it's some sort of weird hardware-specific
> interaction that is unlikely to be fixed. If anybody experiences similar
> symptoms or can suggest any debugging techniques, I'd greatly appreciate
> any suggestions.

I do experience something similar and have since June. I get about 1-2
crashes per month and the symptoms are very similar to yours. After the last
crash I went ahead and set up netconsole logging. That way all kernel messages
are sent to another machine and are saved after the crash.

https://www.kernel.org/doc/Documentation/networking/netconsole.txt
It's easy to set up using the "Dynamic reconfiguration" solution but you'll
need another machine to log the messages.
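
In case it helps, here is a rough sketch of what the receiving side could look
like on the logging machine. netconsole sends the kernel messages as plain UDP
packets, and 6666 is the default target port according to the documentation
above. The function names and the log format are made up for illustration;
netcat or a syslog daemon would do the same job.

```python
import socket
import time


def open_listener(host="0.0.0.0", port=6666):
    """Bind a UDP socket; 6666 is netconsole's default target port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def log_packets(sock, out, max_packets=None):
    """Write each received packet to `out`, prefixed with time and sender IP.

    Runs until `max_packets` packets have been logged, or forever if None.
    Returns the number of packets written.
    """
    count = 0
    while max_packets is None or count < max_packets:
        data, (src_ip, _src_port) = sock.recvfrom(65535)
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        text = data.decode("utf-8", errors="replace").rstrip("\n")
        out.write(f"{stamp} {src_ip} {text}\n")
        # Flush after every packet so the last lines before a crash
        # are visible in the log file right away.
        out.flush()
        count += 1
    return count


# Hypothetical usage on the logging machine:
#   with open("netconsole.log", "a", encoding="utf-8") as log_file:
#       log_packets(open_listener(), log_file)
```

The sender's IP is written on every line so that a single log can hold
messages from more than one machine.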

Today I finally got another crash and it looks identical to this:
https://lkml.org/lkml/2016/9/14/527

It's a problem with fuse that's only triggered under memory pressure. I
always assumed the crashes were related to kvm because they usually happen
soon after starting a VM, but perhaps the VM only introduced the memory
pressure needed to trigger the fuse crash. Do you also use fuse?

The patches to fix it are marked <sta...@vger.kernel.org> [3.15+] but so far
only 4.8.0 and above have the fix. I upgraded to 4.8.2 and hopefully that'll
fix the crashes for me.

After some googling I even found this:
https://github.com/trapexit/mergerfs#mergerfs-under-heavy-load-and-memory-preasure-leads-to-kernel-panic
mergerfs is what I use fuse for.

_______________________________________________
vfio-users mailing list
vfio-users@redhat.com
https://www.redhat.com/mailman/listinfo/vfio-users
