Hi all,

We have replaced an old node in our office Proxmox 5.1 cluster with a Ryzen 7 1700 machine with 64 GB of non-ECC RAM, simply moving the disks from the old Intel server to the new AMD machine. So far so good: everything booted OK, the Ceph OSD started OK after adjusting the network, and the replacement went really smoothly.

But we have found _one_ Debian 9 VM that kernel panics shortly after being migrated from an Intel node to the AMD node, or vice versa. Sometimes it takes a matter of seconds, sometimes a few minutes, and, rarely, even one or two hours.

The strange thing is that we have done this kind of migration with other VMs (several Windows VMs of different versions, another CentOS VM, a Debian 8 VM) and it works perfectly.

If we restart this problematic VM after the migration and crash, it works flawlessly (no more crashes until it is migrated to a host with a CPU from the other vendor). Migration between Intel CPUs (with ECC memory) works OK too. We don't have a second AMD machine to test migration between AMD nodes.

VM configuration:
- 1 socket / 2 cores, CPU type kvm64
- 3 GB of RAM
- Standard VGA
- CD-ROM on IDE2
- SCSI controller: VirtIO SCSI
- scsi0: 8 GB on ceph-rbd
- scsi1: 50 GB on ceph-rbd
- Network: VirtIO
- OS type: Linux 4.x
- Hotplug: Disk, Network, USB
- ACPI support: yes
- BIOS: SeaBIOS
- KVM hardware virtualization: yes
- QEMU guest agent: no

We have also tried virtio-block instead of SCSI.
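For reference, the settings listed above should correspond roughly to the following /etc/pve/qemu-server/<vmid>.conf. This is a hand-written reconstruction from the description, not a copy of the real file: the VMID, the storage ID "ceph-rbd", the disk volume names, the MAC address and the bridge vmbr0 are all placeholders/assumptions.

```
# Hypothetical reconstruction, not copied from the cluster.
# <vmid>, storage ID "ceph-rbd", volume names, MAC and vmbr0 are placeholders.
acpi: 1
agent: 0
bios: seabios
cores: 2
cpu: kvm64
hotplug: disk,network,usb
ide2: none,media=cdrom
kvm: 1
memory: 3072
net0: virtio=XX:XX:XX:XX:XX:XX,bridge=vmbr0
ostype: l26
scsi0: ceph-rbd:vm-<vmid>-disk-1,size=8G
scsi1: ceph-rbd:vm-<vmid>-disk-2,size=50G
scsihw: virtio-scsi-pci
sockets: 1
vga: std
```

Running `qm config <vmid>` on any cluster node prints the actual configuration, which would be more reliable than this sketch.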

# pveversion -v
proxmox-ve: 5.1-35 (running kernel: 4.13.13-4-pve)
pve-manager: 5.1-42 (running version: 5.1-42/724a6cb3)
pve-kernel-4.4.83-1-pve: 4.4.83-96
pve-kernel-4.13.4-1-pve: 4.13.4-26
pve-kernel-4.4.76-1-pve: 4.4.76-94
pve-kernel-4.13.13-4-pve: 4.13.13-35
pve-kernel-4.4.67-1-pve: 4.4.67-92
libpve-http-server-perl: 2.0-8
lvm2: 2.02.168-pve6
corosync: 2.4.2-pve3
libqb0: 1.0.1-1
pve-cluster: 5.0-19
qemu-server: 5.0-19
pve-firmware: 2.0-3
libpve-common-perl: 5.0-25
libpve-guest-common-perl: 2.0-14
libpve-access-control: 5.0-7
libpve-storage-perl: 5.0-17
pve-libspice-server1: 0.12.8-3
vncterm: 1.5-3
pve-docs: 5.1-16
pve-qemu-kvm: 2.9.1-5
pve-container: 2.0-18
pve-firewall: 3.0-5
pve-ha-manager: 2.0-4
ksm-control-daemon: 1.2-2
glusterfs-client: 3.8.8-1
lxc-pve: 2.1.1-2
lxcfs: 2.0.8-1
criu: 2.11.1-1~bpo90
novnc-pve: 0.6-4
smartmontools: 6.5+svn4324-1
zfsutils-linux: 0.7.3-pve1~bpo9
ceph: 12.2.2-1~bpo90+1

Any ideas? This is a production VM, but it isn't critical, so we can experiment with it. We could also live with the problem, but I think it would be worthwhile to try to debug it.

Thanks a lot

Technical Director
Binovo IT Human Project, S.L.
Telf. 943569206
Astigarraga bidea 2, 2º izq. oficina 11; 20180 Oiartzun (Gipuzkoa)

pve-user mailing list
