Re: [PVE-User] pveproxy dying, node unusable

2018-01-02 Thread Edwin Pers
> On 20/12/2017 12:41 AM, Lindsay Mathieson wrote:
> Is it possible to rollback the last update?

I'd back up the contents of /etc/network/interfaces, /etc/pve/qemu-server (VM 
config files), the VM disk images, and anything else you don't want to lose, 
then do a full reinstall on the sick nodes. Last time I had to do something 
similar (changing addresses for corosync, storage and management; it was 
easier to just rebuild), it took approx. 15 min per node, not counting the 
wait for my PE R730s to POST, but YMMV.
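
The backup step could be scripted roughly as below. This is a minimal Python sketch, not a Proxmox tool: `backup_paths` and `CONFIG_PATHS` are hypothetical names, only the two paths named above are listed, and VM disk images (whose location depends on your storage setup) would have to be added by hand.

```python
import tarfile
from pathlib import Path

# Paths taken from the message above. Add disk image locations yourself;
# they are not listed here because they depend on the storage configuration.
CONFIG_PATHS = ["/etc/network/interfaces", "/etc/pve/qemu-server"]


def backup_paths(paths, archive):
    """Write every existing path into a gzip tarball; return the paths skipped."""
    missing = []
    with tarfile.open(archive, "w:gz") as tar:
        for path in map(Path, paths):
            if path.exists():
                # Directories are added recursively; tarfile stores the
                # names without the leading "/".
                tar.add(str(path))
            else:
                missing.append(str(path))
    return missing
```

Calling `backup_paths(CONFIG_PATHS, "pve-config-backup.tar.gz")` on the node and copying the archive elsewhere before reinstalling would cover the config files; check the returned list so that a missing path does not go unnoticed.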

There's also this:
https://unix.stackexchange.com/questions/79050/can-i-rollback-an-apt-get-upgrade-if-something-goes-wrong
but I'd be very very cautious with that

Either way plan yourself a nice wide downtime window and prepare for the worst

Good luck

-Ed

___
pve-user mailing list
pve-user@pve.proxmox.com
https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user


Re: [PVE-User] pveproxy dying, node unusable

2017-12-19 Thread Lindsay Mathieson

nb. This is with Proxmox 4

On 20/12/2017 10:13 AM, Lindsay Mathieson wrote:

On 20/12/2017 12:41 AM, Lindsay Mathieson wrote:
Having to hard reset them as I need them usable again before work 
starts.


And pveproxy hung on both nodes again this morning, this is becoming 
quite a problem for us.



[21360.917460] INFO: task pveproxy:18122 blocked for more than 120 seconds.
[21360.917465]   Tainted: P   O    4.4.95-1-pve #1
[21360.917469] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[21360.917473] pveproxy    D 8807799cbdf8 0 18122 1 0x0004
[21360.917476]  8807799cbdf8 880ff114a840 880ff84fc600 880fd9979c00
[21360.917478]  8807799cc000 880fc30143ac 880fd9979c00
[21360.917480]  880fc30143b0 8807799cbe10 818643b5 880fc30143a8

[21360.917482] Call Trace:
[21360.917485]  [] schedule+0x35/0x80
[21360.917487]  [] schedule_preempt_disabled+0xe/0x10
[21360.917489]  [] __mutex_lock_slowpath+0xb9/0x130
[21360.917491]  [] mutex_lock+0x1f/0x30
[21360.917493]  [] filename_create+0x7a/0x160
[21360.917495]  [] SyS_mkdir+0x53/0x100
[21360.917497]  [] entry_SYSCALL_64_fastpath+0x16/0x75
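
For anyone comparing several nodes, the hung-task reports can be pulled out of saved dmesg output with a few lines of Python. This is only a sketch: `hung_tasks` is a hypothetical helper, the pattern is based solely on the "blocked for more than N seconds" line shown above, and task names containing spaces are not handled.

```python
import re

# Matches e.g. "[21360.917460] INFO: task pveproxy:18122 blocked for more than 120 seconds"
HUNG_TASK = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\] INFO: task (?P<name>\S+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds"
)


def hung_tasks(dmesg_text):
    """Return (timestamp, task name, pid, seconds) for each hung-task report."""
    # Mail clients wrap long log lines, so collapse all whitespace first.
    flat = " ".join(dmesg_text.split())
    return [
        (float(m["ts"]), m["name"], int(m["pid"]), int(m["secs"]))
        for m in HUNG_TASK.finditer(flat)
    ]
```

Feeding it the text of `dmesg` from each node gives a quick list of which tasks were blocked and when.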


Is it possible to rollback the last update?



--
Lindsay Mathieson



Re: [PVE-User] pveproxy dying, node unusable

2017-12-19 Thread Lindsay Mathieson

On 12/12/2017 2:14 AM, Emmanuel Kasper wrote:

> Hi Lindsay
> As a quick check, is the cluster file system mounted on /etc/pve and can
> you read files there normally ( ie cat /etc/pve/datacenter.cfg working ) ?
>
> Are the node storages  returning their status properly ?
> (ie pvesm status does not hang)



Just had this exact same behaviour: multiple unkillable pveproxy 
processes with the timeout errors in dmesg, and only on the two nodes I 
upgraded.


- cluster file system is fine
- pvesm returns all storage OK
- pvecm status is normal
- qm list and qm migrate just hang
- can't connect to the web GUI on the two nodes in question
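
The "unkillable" processes are the ones the kernel reports in state D (uninterruptible sleep), as in the dmesg output. A rough Python sketch for listing them by reading /proc on Linux follows; `parse_stat` and `uninterruptible_processes` are hypothetical helper names, not part of any Proxmox tooling.

```python
import os


def parse_stat(stat_line):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ")",
    # so split at the LAST closing parenthesis.
    left, right = stat_line.rsplit(")", 1)
    pid, comm = left.split("(", 1)
    return int(pid), comm, right.split()[0]


def uninterruptible_processes(proc_root="/proc"):
    """List (pid, comm) for every process currently in state D."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as fh:
                pid, comm, state = parse_stat(fh.read())
        except (OSError, ValueError):
            continue  # process exited while scanning, or unreadable entry
        if state == "D":
            found.append((pid, comm))
    return found
```

A process showing up here briefly is normal during I/O; the same pveproxy PIDs appearing on every run would match the hang described in this thread.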


Having to hard reset them as I need them usable again before work starts.


--
Lindsay Mathieson



Re: [PVE-User] pveproxy dying, node unusable

2017-12-11 Thread Lindsay Mathieson

On 12/12/2017 2:14 AM, Emmanuel Kasper wrote:

> Hi Lindsay
> As a quick check, is the cluster file system mounted on /etc/pve and can
> you read files there normally ( ie cat /etc/pve/datacenter.cfg working ) ?


Unfortunately I hard reset both nodes as I needed them up. But a pvecm 
status showed that quorum was ok and the nodes were marked green in the 
web gui.


/etc/pve was mounted and accessible on the unaffected node.



> Are the node storages  returning their status properly ?
> (ie pvesm status does not hang)



Yes they were (pvesm status).


nb. Both nodes are running ok after a reset now.


thanks.

--
Lindsay Mathieson



Re: [PVE-User] pveproxy dying, node unusable

2017-12-11 Thread Emmanuel Kasper

On 12/11/2017 04:50 PM, Lindsay Mathieson wrote:
> Also, I was unable to connect to the VMs on those nodes, not even via RDP.
> 
> On 12/12/2017 1:46 AM, Lindsay Mathieson wrote:
>>
>> I dist-upgraded two nodes yesterday. Now both those nodes have multiple
>> unkillable pveproxy processes. dmesg has many entries of:
>>
>>     [50996.416909] INFO: task pveproxy:6798 blocked for more than 120 seconds.
>>     [50996.416914]   Tainted: P   O 4.4.95-1-pve #1
>>     [50996.416918] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>     [50996.416922] pveproxy    D 8809194e3df8 0  6798  1 0x0004
>>     [50996.416925]  8809194e3df8 880ff6f5ed80 880ff84fe200 880fded5e200
>>     [50996.416927]  8809194e4000 880fc7fb43ac 880fded5e200
>>     [50996.416929]  880fc7fb43b0 8809194e3e10 818643b5 880fc7fb43a8
>>
>>
>> qm list hangs
>>
>> Node vms do not respond in web gui
>>
>> The node I did not upgrade is fine.


Hi Lindsay
As a quick check, is the cluster file system mounted on /etc/pve and can
you read files there normally ( ie cat /etc/pve/datacenter.cfg working ) ?

Are the node storages  returning their status properly ?
(ie pvesm status does not hang)
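
Since the point of these checks is that they may hang, it can help to run them with a timeout. Below is a minimal Python sketch under a few assumptions: `run_check` and `CHECKS` are hypothetical names, the 10-second default is arbitrary, and the child is deliberately not waited for after the timeout, because a process stuck in uninterruptible sleep would block the caller as well.

```python
import subprocess
import time

# The two checks suggested above.
CHECKS = {
    "cluster file system readable": ["cat", "/etc/pve/datacenter.cfg"],
    "storage status": ["pvesm", "status"],
}


def run_check(cmd, timeout=10.0):
    """Return 'ok', 'failed', 'hung' or 'missing' for one command."""
    try:
        proc = subprocess.Popen(
            cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
    except OSError:
        return "missing"  # command not found or not executable
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        rc = proc.poll()
        if rc is not None:
            return "ok" if rc == 0 else "failed"
        time.sleep(0.1)
    proc.kill()  # best effort only; no wait(), see the note above
    return "hung"
```

Looping over `CHECKS.items()` and printing each result gives a quick health summary; a "hung" result for either command points at the same kind of blocking seen with pveproxy.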



Re: [PVE-User] pveproxy dying, node unusable

2017-12-11 Thread Lindsay Mathieson

Also, I was unable to connect to the VMs on those nodes, not even via RDP.

On 12/12/2017 1:46 AM, Lindsay Mathieson wrote:


I dist-upgraded two nodes yesterday. Now both those nodes have multiple 
unkillable pveproxy processes. dmesg has many entries of:


[50996.416909] INFO: task pveproxy:6798 blocked for more than 120 seconds.
[50996.416914]   Tainted: P   O 4.4.95-1-pve #1
[50996.416918] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[50996.416922] pveproxy    D 8809194e3df8 0  6798  1 0x0004
[50996.416925]  8809194e3df8 880ff6f5ed80 880ff84fe200 880fded5e200
[50996.416927]  8809194e4000 880fc7fb43ac 880fded5e200
[50996.416929]  880fc7fb43b0 8809194e3e10 818643b5 880fc7fb43a8


qm list hangs

Node vms do not respond in web gui

The node I did not upgrade is fine.


--
Lindsay Mathieson

