Re: [PVE-User] pveproxy dying, node unusable
> On 20/12/2017 12:41 AM, Lindsay Mathieson wrote:
> Is it possible to rollback the last update?

I'd back up the contents of /etc/network/interfaces, /etc/pve/qemu-server (VM config files), VM disk images, and anything else you don't want to lose, and do a full reinstall on the sick nodes. Last time I had to do something similar (changing addresses on corosync, storage, management... it was easier to just rebuild it) it took approx. 15 min/node, not counting waiting for my PE R730s to POST, but YMMV.

There's also this:
https://unix.stackexchange.com/questions/79050/can-i-rollback-an-apt-get-upgrade-if-something-goes-wrong
but I'd be very, very cautious with that.

Either way, plan yourself a nice wide downtime window and prepare for the worst.

Good luck
-Ed
___
pve-user mailing list
pve-user@pve.proxmox.com
https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user
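The backup step described above could be scripted roughly as follows. This is only a sketch: the function name `backup_configs` and the timestamped directory layout are assumptions made for illustration, not anything Proxmox ships, and VM disk images are left out because where they live depends on the storage setup.

```python
import shutil
import time
from pathlib import Path

# The two config locations named in the message above.
DEFAULT_SOURCES = ["/etc/network/interfaces", "/etc/pve/qemu-server"]


def backup_configs(sources, dest_root):
    """Copy each existing file or directory in `sources` into a new
    timestamped directory under `dest_root`.

    Returns (backup_dir, missing), where `missing` lists the sources
    that were not found.
    """
    backup_dir = Path(dest_root) / time.strftime("pve-backup-%Y%m%d-%H%M%S")
    backup_dir.mkdir(parents=True)
    missing = []
    for src in map(Path, sources):
        # Mirror the source path under backup_dir so names cannot collide.
        target = backup_dir / src.relative_to(src.anchor)
        if src.is_dir():
            shutil.copytree(src, target)
        elif src.is_file():
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, target)
        else:
            missing.append(str(src))
    return backup_dir, missing
```

Calling `backup_configs(DEFAULT_SOURCES, "/root/backup")` (a hypothetical destination) would copy both locations. Note that if /etc/pve itself is hung, the copy will block just like any other access to it, so this is best run while the node is still healthy.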
Re: [PVE-User] pveproxy dying, node unusable
nb. This is with Proxmox 4

On 20/12/2017 10:13 AM, Lindsay Mathieson wrote:
> On 20/12/2017 12:41 AM, Lindsay Mathieson wrote:
>> Having to hard reset them as I need them usable again before work starts.
>
> And pveproxy hung on both nodes again this morning, this is becoming quite a problem for us.
>
> [21360.917460] INFO: task pveproxy:18122 blocked for more than 120 seconds.
> [21360.917465] Tainted: P O 4.4.95-1-pve #1
> [21360.917469] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [21360.917473] pveproxy D 8807799cbdf8 0 18122 1 0x0004
> [21360.917476] 8807799cbdf8 880ff114a840 880ff84fc600 880fd9979c00
> [21360.917478] 8807799cc000 880fc30143ac 880fd9979c00
> [21360.917480] 880fc30143b0 8807799cbe10 818643b5 880fc30143a8
> [21360.917482] Call Trace:
> [21360.917485] [] schedule+0x35/0x80
> [21360.917487] [] schedule_preempt_disabled+0xe/0x10
> [21360.917489] [] __mutex_lock_slowpath+0xb9/0x130
> [21360.917491] [] mutex_lock+0x1f/0x30
> [21360.917493] [] filename_create+0x7a/0x160
> [21360.917495] [] SyS_mkdir+0x53/0x100
> [21360.917497] [] entry_SYSCALL_64_fastpath+0x16/0x75
>
> Is it possible to rollback the last update?

-- 
Lindsay Mathieson
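To keep track of which tasks the kernel reports as hung, the "blocked for more than N seconds" lines in the dmesg output above can be picked out with a small parser. A minimal sketch follows; the function name `find_hung_tasks` is made up here, and the pattern is based only on the message format shown in this thread (it assumes the task name contains no whitespace).

```python
import re

# Pattern for the hung-task report line as it appears in the trace above,
# e.g. "INFO: task pveproxy:18122 blocked for more than 120 seconds."
HUNG_TASK_RE = re.compile(
    r"INFO: task (?P<name>\S+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+)\s+seconds"
)


def find_hung_tasks(dmesg_text):
    """Return a list of (task_name, pid, seconds) tuples, one per
    hung-task report found in `dmesg_text`, in order of appearance."""
    return [
        (m["name"], int(m["pid"]), int(m["secs"]))
        for m in HUNG_TASK_RE.finditer(dmesg_text)
    ]
```

Feeding it the saved output of `dmesg` gives one tuple per report, so repeated reports for the same PID show up as repeated entries.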
Re: [PVE-User] pveproxy dying, node unusable
On 12/12/2017 2:14 AM, Emmanuel Kasper wrote:
> Hi Lindsay
>
> As a quick check, is the cluster file system mounted on /etc/pve and can you read files there normally (ie cat /etc/pve/datacenter.cfg working)?
>
> Are the node storages returning their status properly? (ie pvesm status does not hang)

Just had this exact same behaviour: multiple unkillable pveproxy processes with the timeout errors in dmesg. Only on the two nodes I upgraded.

- cluster file system is fine
- pvesm returns all storage ok.
- pvecm status is normal
- qm list and qm migrate just hang.
- can't connect to the webgui on the two nodes in question.

Having to hard reset them as I need them usable again before work starts.

-- 
Lindsay Mathieson
Re: [PVE-User] pveproxy dying, node unusable
On 12/12/2017 2:14 AM, Emmanuel Kasper wrote:
> Hi Lindsay
>
> As a quick check, is the cluster file system mounted on /etc/pve and can you read files there normally (ie cat /etc/pve/datacenter.cfg working)?

Unfortunately I hard reset both nodes as I needed them up. But a pvecm status showed that quorum was ok and the nodes were marked green in the web gui. /etc/pve was mounted and accessible on the unaffected node.

> Are the node storages returning their status properly? (ie pvesm status does not hang)

Yes they were (pvesm status).

nb. Both nodes are running ok after a reset now.

thanks.

-- 
Lindsay Mathieson
Re: [PVE-User] pveproxy dying, node unusable
On 12/11/2017 04:50 PM, Lindsay Mathieson wrote:
> Also I was unable to connect to the VMs on those nodes, not even via RDP
>
> On 12/12/2017 1:46 AM, Lindsay Mathieson wrote:
>>
>> I dist-upgraded two nodes yesterday. Now both those nodes have multiple
>> unkillable pveproxy processes. dmesg has many entries of:
>>
>> [50996.416909] INFO: task pveproxy:6798 blocked for more than 120 seconds.
>> [50996.416914] Tainted: P O 4.4.95-1-pve #1
>> [50996.416918] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> [50996.416922] pveproxy D 8809194e3df8 0 6798 1 0x0004
>> [50996.416925] 8809194e3df8 880ff6f5ed80 880ff84fe200 880fded5e200
>> [50996.416927] 8809194e4000 880fc7fb43ac 880fded5e200
>> [50996.416929] 880fc7fb43b0 8809194e3e10 818643b5 880fc7fb43a8
>>
>> qm list hangs
>>
>> Node vms do not respond in web gui
>>
>> The node I did not upgrade is fine.

Hi Lindsay

As a quick check, is the cluster file system mounted on /etc/pve and can you read files there normally (ie cat /etc/pve/datacenter.cfg working)?

Are the node storages returning their status properly? (ie pvesm status does not hang)
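Since the question with both checks is whether the command returns at all, it can help to run them under a time limit so the diagnosing shell does not get stuck as well. The sketch below assumes a 10 second limit and uses made-up names (`probe`, `run_checks`); only the two commands themselves come from the message above.

```python
import subprocess

# The two checks suggested above.
CHECKS = [
    ["cat", "/etc/pve/datacenter.cfg"],
    ["pvesm", "status"],
]


def probe(cmd, timeout=10.0):
    """Run `cmd` and return 'ok', 'failed' or 'hung'."""
    try:
        proc = subprocess.Popen(
            cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
    except OSError:
        return "failed"  # command missing or not executable
    try:
        returncode = proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        # Deliberately no wait() after kill(): a process stuck in
        # uninterruptible sleep (state D) does not act on SIGKILL until
        # it wakes, so waiting for it could hang this script too.
        proc.kill()
        return "hung"
    return "ok" if returncode == 0 else "failed"


def run_checks(checks=CHECKS, timeout=10.0):
    """Map each command line to its probe result."""
    return {" ".join(cmd): probe(cmd, timeout) for cmd in checks}
```

The checks run as child processes rather than reading /etc/pve directly from Python, because a direct read on a stuck mount would block the script itself with no way to time out.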
Re: [PVE-User] pveproxy dying, node unusable
Also I was unable to connect to the VMs on those nodes, not even via RDP

On 12/12/2017 1:46 AM, Lindsay Mathieson wrote:
> I dist-upgraded two nodes yesterday. Now both those nodes have multiple unkillable pveproxy processes. dmesg has many entries of:
>
> [50996.416909] INFO: task pveproxy:6798 blocked for more than 120 seconds.
> [50996.416914] Tainted: P O 4.4.95-1-pve #1
> [50996.416918] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [50996.416922] pveproxy D 8809194e3df8 0 6798 1 0x0004
> [50996.416925] 8809194e3df8 880ff6f5ed80 880ff84fe200 880fded5e200
> [50996.416927] 8809194e4000 880fc7fb43ac 880fded5e200
> [50996.416929] 880fc7fb43b0 8809194e3e10 818643b5 880fc7fb43a8
>
> qm list hangs
>
> Node vms do not respond in web gui
>
> The node I did not upgrade is fine.

-- 
Lindsay Mathieson