ACS 4.15.2
KVM
Ubuntu 20.04

Hi all

We had a physical host crash on Friday due to a hardware failure. This appears 
to have caused some RVRs to go into an ‘Unknown’ state.

The strange thing was that on any host running an RVR in the Unknown state, we 
could not open a console to any VM on that host, nor could we SSH directly to 
the RVR from the host.
Meanwhile, the UI was showing the agent state of all hosts as ‘Up’.

Only after we restarted the ACS management service did we notice that the agent 
on each host running an RVR in the Unknown state then stayed in a ‘Connecting’ 
state for some time. There were no networking issues either: the host was 
pingable from the management server.

We were then briefly able to open a console to one of the RVRs in the Unknown 
state and discovered that it was indeed corrupt. This is a screenshot of the 
RVR terminal:
[Screenshot: RVR terminal output, image not included]

We then marked the RVR as ‘Stopped’ in the DB and destroyed it directly on the 
host with virsh destroy. After that we were able to restart the VPC with 
cleanup, which re-created the previously corrupt RVR.
Once the corrupt RVR was gone, all the other RVRs in the Unknown state appeared 
to transition to the ‘Backup’ state.

We are wondering whether we have hit a bug where a corrupt RVR crashes the 
CloudStack agent on its host whenever ACS tries to do anything with the RVR, 
such as restarting it.

BR

Gary




Gary Dixon
Quadris Cloud Manager
0161 537 4980 +44 7989717661
gary.di...@quadris.co.uk
www.quadris.com
Innovation House, 12-13 Bredbury Business Park
Bredbury Park Way, Bredbury, Stockport, SK6 2SN
