ACS 4.15.2, KVM, Ubuntu 20.04

Hi all,
We had a physical host crash on Friday due to a hardware failure. This appears to have caused some RVRs to go into an ‘unknown’ state. The strange thing was that on any host running an RVR in an unknown state, we could not open a console to any VM on that host, nor could we SSH directly to the RVR from the host. The UI showed the agent state of all hosts as ‘UP’.

Only when we restarted the ACS management service did we notice that the agent on any host running an RVR in an ‘unknown’ state then stayed in a ‘connecting’ state for some time. There were no networking issues either: the host was pingable from the management server.

We were then briefly able to open a console to one of the RVRs in an unknown state and discovered that the RVR was indeed corrupt. This is the screenshot of the RVR terminal:

[Screenshot: image006.png]

We then marked the RVR as ‘stopped’ in the DB and ran virsh destroy against it directly on the host. We were then able to restart the VPC with cleanup, which re-created the RVR that had been corrupt. Once the corrupt RVR had gone, all the other RVRs in an unknown state appeared to transition to the ‘backup’ state.

We are wondering whether we have encountered a bug in which a corrupt RVR crashes the host's CloudStack agent when ACS tries to do anything with that RVR, such as restarting it.

BR
Gary

Gary Dixon
Quadris Cloud Manager
0161 537 4980
+44 7989717661
gary.di...@quadris.co.uk
www.quadris.com
Innovation House, 12-13 Bredbury Business Park
Bredbury Park Way, Bredbury, Stockport, SK6 2SN