Hello everyone. I am not sure this is the right place to ask, but I am also not sure where to start, so this list seemed like a good place. I would be happy for any pointers to a better place to turn for a solution. :)
For quite some time now I have been having random kernel panics on random VMs. I have a two-node cluster, currently running a fairly current PVE version:

    PVE Manager Version: pve-manager/5.0-23/af4267bf

These kernel panics have persisted through several VM kernel upgrades, and they even continue after the 4.x to 5.x Proxmox upgrade several weeks ago. In addition, I have moved VMs from one Proxmox node to the other to no avail, ruling out hardware on one node or the other. It also does not matter whether the VMs have their (QCOW2) disks on a Proxmox node's local hardware RAID storage or on the Synology NFS-connected storage. I am trying to verify this by moving a few VMs that seem to panic more often than others back to some local hardware RAID storage on one node as I write this email...

Typically the kernel panics occur during the nightly backups of the VMs, but I cannot say that this is always when they occur. I _can_ say that the kernel panic always reports the sym53c8xx_2 module as the culprit, though...

I have set up remote kernel logging on one VM, and here is the kernel panic it reported:

----8<----
[138539.201838] Kernel panic - not syncing: assertion "i && sym_get_cam_status(cp->cmd) == DID_SOFT_ERROR" failed: file "drivers/scsi/sym53c8xx_2/sym_hipd.c", line 3399
[138539.201838]
[138539.201838] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.9.34-gentoo #5
[138539.201838] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014
[138539.201838]  ffff88023fd03d90 ffffffff813a2408 ffff8800bb842700 ffffffff81c51450
[138539.201838]  ffff88023fd03e10 ffffffff8111ff3f ffff880200000020 ffff88023fd03e20
[138539.201838]  ffff88023fd03db8 ffffffff813c70f3 ffffffff81c517b0 ffffffff81c51400
[138539.201838] Call Trace:
[138539.201838]  <IRQ>
[138539.201838]  [<ffffffff813a2408>] dump_stack+0x4d/0x65
[138539.201838]  [<ffffffff8111ff3f>] panic+0xca/0x203
[138539.201838]  [<ffffffff813c70f3>] ? swiotlb_unmap_sg_attrs+0x43/0x60
[138539.201838]  [<ffffffff815ff3af>] sym_interrupt+0x1bff/0x1dd0
[138539.201838]  [<ffffffff8163e888>] ? e1000_clean+0x358/0x880
[138539.201838]  [<ffffffff815f8fc7>] sym53c8xx_intr+0x37/0x80
[138539.201838]  [<ffffffff8109fa78>] __handle_irq_event_percpu+0x38/0x1a0
[138539.201838]  [<ffffffff8109fbfe>] handle_irq_event_percpu+0x1e/0x50
[138539.201838]  [<ffffffff8109fc57>] handle_irq_event+0x27/0x50
[138539.201838]  [<ffffffff810a2b39>] handle_fasteoi_irq+0x89/0x160
[138539.201838]  [<ffffffff8101ea5e>] handle_irq+0x6e/0x120
[138539.201838]  [<ffffffff81079315>] ? atomic_notifier_call_chain+0x15/0x20
[138539.201838]  [<ffffffff8101e346>] do_IRQ+0x46/0xd0
[138539.201838]  [<ffffffff818dafff>] common_interrupt+0x7f/0x7f
[138539.201838]  <EOI>
[138539.201838]  [<ffffffff818d9e5b>] ? default_idle+0x1b/0xd0
[138539.201838]  [<ffffffff81025eea>] arch_cpu_idle+0xa/0x10
[138539.201838]  [<ffffffff818da22e>] default_idle_call+0x1e/0x30
[138539.201838]  [<ffffffff81097105>] cpu_startup_entry+0xd5/0x1c0
[138539.201838]  [<ffffffff8103cd98>] start_secondary+0xe8/0xf0
[138539.201838] Shutting down cpus with NMI
[138539.201838] Kernel Offset: disabled
[138539.201838] ---[ end Kernel panic - not syncing: assertion "i && sym_get_cam_status(cp->cmd) == DID_SOFT_ERROR" failed: file "drivers/scsi/sym53c8xx_2/sym_hipd.c", line 3399
----8<----

The dmesg output on the Proxmox nodes does not show any issues at the times of these VM kernel panics.

I appreciate any comments, questions, or direction on this.

Thank you,

Bill

--
Bill Arlofski
Reverse Polarity, LLC
http://www.revpol.com/blogs/waa

-------------------------------
He picks up scraps of information
He's adept at adaptation

--[ Not responsible for anything below this line ]--

_______________________________________________
pve-user mailing list
[email protected]
https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user
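P.S. In case anyone wants to reproduce the remote kernel logging mentioned above: one common way to do it (not necessarily the only one) is the kernel's netconsole module, which streams printk output over UDP, so the panic is captured even when the VM can no longer write to disk. The IP addresses, ports, and interface name below are placeholders; adjust them for your network:

```shell
# On the panicking VM: forward kernel messages over UDP to a log host.
# Format: netconsole=<src-port>@<src-ip>/<iface>,<dst-port>@<dst-ip>/[dst-mac]
# 192.0.2.10 = this VM, 192.0.2.1 = the log host (placeholder addresses).
modprobe netconsole netconsole=6665@192.0.2.10/eth0,6666@192.0.2.1/

# On the log host: capture the UDP stream to a file.
# (Traditional netcat syntax; OpenBSD netcat drops the -p: "nc -u -l 6666".)
nc -u -l -p 6666 | tee vm-panic.log
```

Alternatively, a syslog daemon such as rsyslog can be pointed at the same UDP port so the panics land in the normal log rotation.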
