Hi,

this is a continuation of https://www.redhat.com/archives/libvirt-users/2019-June/msg00003.html.

I have an HA cluster with two nodes on which several domains run under libvirt and qemu. Each night I create a snapshot of each domain; the domains are shut down beforehand. Snapshots are created with virsh snapshot-create.

The procedure is:
- shut down the domains
- create the snapshots
- start the domains
- copy the backing raw file to a CIFS server
- blockcommit the snapshot via virsh blockcommit domain /path/to/snapshot --verbose --active --pivot
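For readers who want the sequence in one place, here is a minimal sketch of the steps above as a list of commands for a single domain. It only builds and prints the command lines, it does not run them. The function name, the example paths, the plain cp for the copy step, and the --disk-only flag on snapshot-create are assumptions for illustration; only the blockcommit invocation is taken verbatim from the procedure above.

```python
import shlex
from typing import List


def nightly_backup_commands(domain: str, overlay: str,
                            raw_image: str, backup_dir: str) -> List[List[str]]:
    """Return the argv lists for one domain, in the order of the procedure."""
    return [
        # virsh shutdown only requests the shutdown; a real script would
        # have to wait until the domain is actually off before continuing.
        ["virsh", "shutdown", domain],
        # Assumption: --disk-only (external overlay) is not stated in the mail.
        ["virsh", "snapshot-create", domain, "--disk-only"],
        ["virsh", "start", domain],
        # Assumption: the copy to the CIFS mount is shown as a plain cp.
        ["cp", raw_image, backup_dir],
        # Taken from the procedure: merge the overlay back and pivot.
        ["virsh", "blockcommit", domain, overlay,
         "--verbose", "--active", "--pivot"],
    ]


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    for argv in nightly_backup_commands("severin", "/path/to/snapshot",
                                        "/path/to/image.raw", "/mnt/cifs/backup"):
        print(" ".join(shlex.quote(arg) for arg in argv))
```

Keeping the commands as data makes it easy to log exactly what was run for a domain on the night of a crash.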
Mostly this works quite well, but sometimes one domain crashes during blockcommit. Here are the excerpts from the syslog on the hosts:

> 2019-06-01T03:05:31.620725+02:00 ha-idg-2 systemd-coredump[14253]: Core Dumping has been disabled for process 30590 (qemu-system-x86).
> 2019-06-01T03:05:31.712673+02:00 ha-idg-2 systemd-coredump[14253]: Process 30590 (qemu-system-x86) of user 488 dumped core.
> 2019-06-01T03:05:32.173272+02:00 ha-idg-2 kernel: [294682.387828] br0: port 4(vnet2) entered disabled state
> 2019-06-01T03:05:32.177111+02:00 ha-idg-2 kernel: [294682.388384] device vnet2 left promiscuous mode
> 2019-06-01T03:05:32.177122+02:00 ha-idg-2 kernel: [294682.388391] br0: port 4(vnet2) entered disabled state
> 2019-06-01T03:05:32.208916+02:00 ha-idg-2 wickedd[2954]: error retrieving tap attribute from sysfs
> 2019-06-01T03:05:41.395685+02:00 ha-idg-2 systemd-machined[2824]: Machine qemu-31-severin terminated.
>
> 2019-06-08T05:59:17.502899+02:00 ha-idg-1 systemd-coredump[31089]: Core Dumping has been disabled for process 19489 (qemu-system-x86).
> 2019-06-08T05:59:17.523050+02:00 ha-idg-1 systemd-coredump[31089]: Process 19489 (qemu-system-x86) of user 489 dumped core.
> 2019-06-08T05:59:17.650334+02:00 ha-idg-1 kernel: [999258.577132] br0: port 9(vnet7) entered disabled state
> 2019-06-08T05:59:17.650354+02:00 ha-idg-1 kernel: [999258.578103] device vnet7 left promiscuous mode
> 2019-06-08T05:59:17.650355+02:00 ha-idg-1 kernel: [999258.578108] br0: port 9(vnet7) entered disabled state
> 2019-06-08T05:59:25.983702+02:00 ha-idg-1 systemd-machined[1383]: Machine qemu-205-severin terminated.

The respective logs from libvirtd:

...
2019-05-31 20:31:34.481+0000: 4170: error : qemuMonitorIO:719 : internal error: End of file from qemu monitor
2019-06-01 01:05:32.233+0000: 4170: error : qemuMonitorIO:719 : internal error: End of file from qemu monitor
2019-06-01 01:05:43.804+0000: 22605: warning : qemuGetProcessInfo:1461 : cannot parse process status data
2019-06-01 01:05:43.848+0000: 22596: warning : qemuGetProcessInfo:1461 : cannot parse process status data
2019-06-01 01:06:11.438+0000: 26112: warning : qemuDomainObjBeginJobInternal:4865 : Cannot start job (destroy, none) for domain severin; current job is (modify, none) owned by (5372 remoteDispatchDomainBlockJobAbort, 0 <null>) for (39s, 0s)
2019-06-01 01:06:11.438+0000: 26112: error : qemuDomainObjBeginJobInternal:4877 : Timed out during operation: cannot acquire state change lock (held by remoteDispatchDomainBlockJobAbort)
2019-06-01 01:06:13.976+0000: 5369: warning : qemuGetProcessInfo:1461 : cannot parse process status data
2019-06-01 01:06:14.028+0000: 22596: warning : qemuGetProcessInfo:1461 : cannot parse process status data
2019-06-01 01:06:44.165+0000: 5371: warning : qemuGetProcessInfo:1461 : cannot parse process status data
2019-06-01 01:06:44.218+0000: 22605: warning : qemuGetProcessInfo:1461 : cannot parse process status data
2019-06-01 01:07:14.343+0000: 5369: warning : qemuGetProcessInfo:1461 : cannot parse process status data
2019-06-01 01:07:14.387+0000: 22598: warning : qemuGetProcessInfo:1461 : cannot parse process status data
2019-06-01 01:07:44.495+0000: 22605: warning : qemuGetProcessInfo:1461 : cannot parse process status data
...
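A side note for reading the excerpts together: the syslog timestamps carry a +02:00 offset, while the libvirtd timestamps are written with +0000. The following minimal sketch (the helper names are hypothetical, the two timestamps are copied from the ha-idg-2 excerpts) puts both on the same clock, which suggests that the "dumped core" message and the "End of file from qemu monitor" error belong to the same event, roughly half a second apart.

```python
from datetime import datetime, timezone


def parse_syslog_ts(text: str) -> datetime:
    """Parse an ISO 8601 syslog timestamp such as 2019-06-01T03:05:31.712673+02:00."""
    return datetime.fromisoformat(text)


def parse_libvirtd_ts(text: str) -> datetime:
    """Parse a libvirtd timestamp such as 2019-06-01 01:05:32.233+0000."""
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S.%f%z")


# Timestamps taken from the ha-idg-2 excerpts.
core_dumped = parse_syslog_ts("2019-06-01T03:05:31.712673+02:00")
monitor_eof = parse_libvirtd_ts("2019-06-01 01:05:32.233+0000")

# Both values are timezone-aware, so they can be subtracted directly.
delta_seconds = (monitor_eof - core_dumped).total_seconds()

print("core dump (UTC):  ", core_dumped.astimezone(timezone.utc).isoformat())
print("monitor EOF (UTC):", monitor_eof.astimezone(timezone.utc).isoformat())
print("seconds between:  ", round(delta_seconds, 3))
```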
2019-06-07 20:30:57.170+0000: 30299: error : qemuMonitorIO:719 : internal error: End of file from qemu monitor
2019-06-08 03:59:17.690+0000: 30299: error : qemuMonitorIO:719 : internal error: End of file from qemu monitor
2019-06-08 03:59:26.145+0000: 30300: warning : qemuGetProcessInfo:1461 : cannot parse process status data
2019-06-08 03:59:26.191+0000: 30303: warning : qemuGetProcessInfo:1461 : cannot parse process status data
2019-06-08 03:59:56.095+0000: 27956: warning : qemuDomainObjBeginJobInternal:4865 : Cannot start job (destroy, none) for domain severin; current job is (modify, none) owned by (13061 remoteDispatchDomainBlockJobAbort, 0 <null>) for (38s, 0s)
2019-06-08 03:59:56.095+0000: 27956: error : qemuDomainObjBeginJobInternal:4877 : Timed out during operation: cannot acquire state change lock (held by remoteDispatchDomainBlockJobAbort)
2019-06-08 03:59:56.325+0000: 13060: warning : qemuGetProcessInfo:1461 : cannot parse process status data
2019-06-08 03:59:56.372+0000: 30304: warning : qemuGetProcessInfo:1461 : cannot parse process status data
2019-06-08 04:00:26.503+0000: 13060: warning : qemuGetProcessInfo:1461 : cannot parse process status data

The log from the domain itself (/var/log/libvirt/qemu/domain):

qemu-system-x86_64: block/mirror.c:864: mirror_run: Assertion `((&bs->tracked_requests)->lh_first == ((void *)0))' failed.

I'm running SLES 12 SP4 with qemu-2.11.2-5.8.1.x86_64 and kernel 4.12.14-95.13.

Bernd

--
Bernd Lentes
System Administration
Institut für Entwicklungsgenetik
Helmholtz Zentrum München
bernd.len...@helmholtz-muenchen.de
http://www.helmholtz-muenchen.de/idg