Hi there,

Yesterday I upgraded all three nodes of our cluster (no HA) from 2.2 to 2.3 (pveversion output below). At 23:00 the usual backup of a KVM machine (801) started via vzdump.cron on node 3 and ended with errors (see syslog below).

After this crash, neither the VMs on node 3 nor the web interface were reachable.

We restarted pvedaemon and pvestatd and were then able to reach the web interface again.

We tried to stop the VMs, but the "vzctl stop xxx" processes remained in the process list; even kill -9 did not remove them. A "reboot" via ssh also failed, so we had to run "echo b > /proc/sysrq-trigger" to restart the host.

After the reboot everything was fine and the VMs started again.

The two other nodes (which were not rebooted) still log this in syslog:

Mar 21 12:09:18 promo2 pvestatd[101835]: WARNING: command 'df -P -B 1 /mnt/pve/p3_storage' failed: got timeout

However, running "df -P -B 1 /mnt/pve/p3_storage" manually in bash works fine on all three hosts.
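Since a single manual run may simply miss a short stall, a small Python sketch like the following could be run repeatedly on promo2 to see whether df on the storage is intermittently slow. It assumes a 2-second timeout, which is a guess and not necessarily the value pvestatd uses; time_df is a hypothetical helper name:

```python
import subprocess
import time


def time_df(path, timeout=2.0):
    """Run the same df command pvestatd logged, with a timeout.

    Returns (elapsed_seconds, stdout) on success, or (None, message) when
    df times out or exits non-zero. The 2 s default is an assumption,
    not pvestatd's actual setting.
    """
    cmd = ["df", "-P", "-B", "1", path]
    start = time.monotonic()
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE, text=True)
    try:
        out, err = proc.communicate(timeout=timeout)
    except subprocess.TimeoutExpired:
        # Deliberately no wait() here: a df stuck on a hung mount can sit
        # in uninterruptible sleep (D state) and ignore the kill signal.
        proc.kill()
        return None, f"timeout after {timeout}s"
    if proc.returncode != 0:
        return None, err.strip()
    return time.monotonic() - start, out
```

For example, calling time_df("/mnt/pve/p3_storage") once a second for a few minutes would show whether the pvestatd warnings coincide with slow responses from the storage.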


Has this backup issue been reported before?
Any hints on how to prevent it?

Regards, Martin


pve-manager: 2.3-13 (pve-manager/2.3/7946f1f1)
running kernel: 2.6.32-19-pve
proxmox-ve-2.6.32: 2.3-93
pve-kernel-2.6.32-10-pve: 2.6.32-63
pve-kernel-2.6.32-19-pve: 2.6.32-93
pve-kernel-2.6.32-17-pve: 2.6.32-83
lvm2: 2.02.95-1pve2
clvm: 2.02.95-1pve2
corosync-pve: 1.4.4-4
openais-pve: 1.1.4-2
libqb: 0.10.1-2
redhat-cluster-pve: 3.1.93-2
resource-agents-pve: 3.9.2-3
fence-agents-pve: 3.1.9-1
pve-cluster: 1.0-36
qemu-server: 2.3-18
pve-firmware: 1.0-21
libpve-common-perl: 1.0-49
libpve-access-control: 1.0-26
libpve-storage-perl: 2.3-6
vncterm: 1.0-3
vzctl: 4.0-1pve2
vzprocps: 2.0.11-2
vzquota: 3.1-1
pve-qemu-kvm: 1.4-8
ksm-control-daemon: 1.1-1


Mar 20 23:00:01 promo3 /USR/SBIN/CRON[150583]: (root) CMD (vzdump 801 306 --quiet 1 --mode snapshot --compress lzo --storage p2_storage)
Mar 20 23:00:02 promo3 vzdump[150584]: <root@pam> starting task UPID:promo3:00024C3A:00785E0A:514A3162:vzdump::root@pam:
Mar 20 23:00:02 promo3 vzdump[150586]: INFO: starting new backup job: vzdump 801 306 --quiet 1 --mode snapshot --compress lzo --storage p2_storage
Mar 20 23:00:02 promo3 vzdump[150586]: INFO: Starting Backup of VM 306 (openvz)
Mar 20 23:00:31 promo3 pvestatd[2328]: WARNING: unable to connect to VM 801 socket - timeout after 31 retries
...
Mar 20 23:03:11 promo3 pvestatd[2328]: WARNING: unable to connect to VM 801 socket - timeout after 31 retries
Mar 20 23:03:18 promo3 kernel: INFO: task kvm:2585 blocked for more than 120 seconds.
Mar 20 23:03:18 promo3 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 20 23:03:18 promo3 kernel: kvm D ffff88107a480da0 0 2585 1 0 0x00000000
Mar 20 23:03:18 promo3 kernel: ffff88107a92fd08 0000000000000082 0000000000000000 ffff880879df35c8
Mar 20 23:03:18 promo3 kernel: ffff880878cc08c0 00000000000000db ffff88107c415810 ffff88107a92fab8
Mar 20 23:03:18 promo3 kernel: ffff88107c415800 0000000104af1976 ffff88107a481368 000000000001e9c0
Mar 20 23:03:18 promo3 kernel: Call Trace:
Mar 20 23:03:18 promo3 kernel: [<ffffffff8119ad69>] __sb_start_write+0x169/0x1a0
Mar 20 23:03:18 promo3 kernel: [<ffffffff81097200>] ? autoremove_wake_function+0x0/0x40
Mar 20 23:03:18 promo3 kernel: [<ffffffff81127489>] generic_file_aio_write+0x69/0x100
Mar 20 23:03:18 promo3 kernel: [<ffffffff811e325b>] aio_rw_vect_retry+0xbb/0x220
Mar 20 23:03:18 promo3 kernel: [<ffffffff811e4bc4>] aio_run_iocb+0x64/0x170
Mar 20 23:03:18 promo3 kernel: [<ffffffff811e614c>] do_io_submit+0x2bc/0x670
Mar 20 23:03:18 promo3 kernel: [<ffffffff811e6510>] sys_io_submit+0x10/0x20
Mar 20 23:03:18 promo3 kernel: [<ffffffff8100b102>] system_call_fastpath+0x16/0x1b
Mar 20 23:03:18 promo3 kernel: INFO: task lvcreate:150596 blocked for more than 120 seconds.
Mar 20 23:03:18 promo3 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 20 23:03:18 promo3 kernel: lvcreate D ffff88087aae6d20 0 150596 150595 0 0x00000000
Mar 20 23:03:18 promo3 kernel: ffff8802fc5bbc48 0000000000000082 0000000000000000 00000000000000d2
Mar 20 23:03:18 promo3 kernel: ffffe8ffffffffff ffff88087bec5760 ffffffff81ac37d0 ffffffff8141c110
Mar 20 23:03:18 promo3 kernel: 0000000000000000 0000000104af1b10 ffff88087aae72e8 000000000001e9c0
Mar 20 23:03:18 promo3 kernel: Call Trace:
Mar 20 23:03:18 promo3 kernel: [<ffffffff8141c110>] ? copy_params+0x90/0x110
Mar 20 23:03:18 promo3 kernel: [<ffffffff8119ab6d>] sb_wait_write+0x9d/0xb0
Mar 20 23:03:18 promo3 kernel: [<ffffffff81097200>] ? autoremove_wake_function+0x0/0x40
Mar 20 23:03:18 promo3 kernel: [<ffffffff8119c2d0>] freeze_super+0x60/0x140
Mar 20 23:03:18 promo3 kernel: [<ffffffff811d5ad8>] freeze_bdev+0x98/0xe0
Mar 20 23:03:18 promo3 kernel: [<ffffffff81415697>] dm_suspend+0x97/0x270
Mar 20 23:03:18 promo3 kernel: [<ffffffff8141a1dc>] ? __find_device_hash_cell+0xac/0x170
Mar 20 23:03:18 promo3 kernel: [<ffffffff8141b4a6>] dev_suspend+0x76/0x250
Mar 20 23:03:18 promo3 kernel: [<ffffffff8141c344>] ctl_ioctl+0x1b4/0x270
Mar 20 23:03:18 promo3 kernel: [<ffffffff8141b430>] ? dev_suspend+0x0/0x250
Mar 20 23:03:18 promo3 kernel: [<ffffffff8141c413>] dm_ctl_ioctl+0x13/0x20
Mar 20 23:03:18 promo3 kernel: [<ffffffff811ac622>] vfs_ioctl+0x22/0xa0
Mar 20 23:03:18 promo3 kernel: [<ffffffff81061bcf>] ? pick_next_task_fair+0x16f/0x1f0
Mar 20 23:03:18 promo3 kernel: [<ffffffff8109e52d>] ? sched_clock_cpu+0xcd/0x110
Mar 20 23:03:18 promo3 kernel: [<ffffffff811ac7ca>] do_vfs_ioctl+0x8a/0x590
Mar 20 23:03:18 promo3 kernel: [<ffffffff8151dc50>] ? thread_return+0xbe/0x88e
Mar 20 23:03:18 promo3 kernel: [<ffffffff8108e675>] ? set_one_prio+0x75/0xd0
Mar 20 23:03:18 promo3 kernel: [<ffffffff811acd1f>] sys_ioctl+0x4f/0x80
Mar 20 23:03:18 promo3 kernel: [<ffffffff8100b102>] system_call_fastpath+0x16/0x1b
Mar 20 23:03:21 promo3 pvestatd[2328]: WARNING: unable to connect to VM 801 socket - timeout after 31 retries
...



_______________________________________________
pve-user mailing list
[email protected]
http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user
