I dropped all the cloud* databases, deleted everything in primary and secondary storage, and reinstalled the management server, following the guide I wrote for myself the last time I built a stable CloudStack system. Then I imported one of my backed-up instances as a template and tried to create a new VM. Same problem as before. How is this possible?
2014-10-10 19:17:44,075 WARN [kvm.resource.LibvirtComputingResource]
(agentRequest-Handler-3:null) Timed out:
/usr/share/cloudstack-common/scripts/vm/hypervisor/kvm/patchviasocket.pl
-n r-4-VM -p
%template=domP%name=r-4-VM%eth0ip=192.168.102.222%eth0mask=255.255.255.0%gateway=192.168.102.1%domain=
lax.ratespecial.com%cidrsize=24%dhcprange=192.168.102.1%eth1ip=169.254.0.33%eth1mask=255.255.0.0%type=dhcpsrvr%disable_rp_filter=true%dns1=192.168.100.2%dns2=192.168.100.3
. Output is:
2014-10-10 19:18:05,078 WARN [kvm.resource.LibvirtComputingResource]
(Script-3:null) Interrupting script.

On Fri, Oct 10, 2014 at 4:33 PM, Ian Young <iyo...@ratespecial.com> wrote:

> I've restarted all the services and restarted the servers too. The SSVM
> and CP start with no trouble. Every time I try to start or create an
> instance, I see repeated messages like these:
>
> /var/log/cloudstack/agent/cloudstack-agent.out:
> 2014-10-10 16:27:21,841{GMT} WARN [kvm.resource.LibvirtComputingResource]
> (Script-8:) Interrupting script.
> 2014-10-10 16:27:21,841{GMT} WARN [kvm.resource.LibvirtComputingResource]
> (agentRequest-Handler-4:) Timed out:
> /usr/share/cloudstack-common/scripts/vm/hypervisor/kvm/patchviasocket.pl
> -n r-19-VM -p
> %template=domP%name=r-19-VM%eth0ip=192.168.102.89%eth0mask=255.255.255.0%gateway=192.168.102.1%domain=
> lax.ratespecial.com%cidrsize=24%dhcprange=192.168.102.1%eth1ip=169.254.2.193%eth1mask=255.255.0.0%type=dhcpsrvr%disable_rp_filter=true%dns1=192.168.100.2%dns2=192.168.100.3
> . Output is:
>
> /var/log/cloudstack/agent/security_group.log:
> 2014-10-10 16:27:33,259 - Failed to get rule logs, better luck next time!
>
> On Fri, Oct 10, 2014 at 3:04 PM, Ian Young <iyo...@ratespecial.com> wrote:
>
>> I tried to restart the network with the "clean up" option, via the web
>> console. After several minutes, it failed to restart the network. The
>> SSVM and CP are still running but the VR no longer exists.
>> Why would these be able to start but not the virtual router?
>>
>> On Fri, Oct 10, 2014 at 2:48 PM, Ian Young <iyo...@ratespecial.com>
>> wrote:
>>
>>> I restarted the libvirtd service and the management service is now fully
>>> started (there are services listening on ports 8250 and 9090). The SSVM
>>> health check script now reports no problems.
>>>
>>> However, I tried starting an instance and both the instance and the
>>> virtual router are in a "starting" state but have been so for almost 10
>>> minutes. In the catalina.out log I see:
>>> INFO [c.c.v.VirtualMachineManagerImpl] (AgentManager-Handler-10:null)
>>> There is pending job or HA tasks working on the VM. vm id: 4, postpone
>>> power-change report by resetting power-change counters
>>> INFO [c.c.v.VirtualMachineManagerImpl] (AgentManager-Handler-10:null)
>>> There is pending job or HA tasks working on the VM. vm id: 13, postpone
>>> power-change report by resetting power-change counters
>>>
>>> I'm also seeing this in the agent.log:
>>> 2014-10-10 14:43:26,833 WARN [kvm.resource.LibvirtComputingResource]
>>> (Script-6:null) Interrupting script.
>>> 2014-10-10 14:43:26,833 WARN [kvm.resource.LibvirtComputingResource]
>>> (agentRequest-Handler-2:null) Timed out:
>>> /usr/share/cloudstack-common/scripts/vm/hypervisor/kvm/patchviasocket.pl
>>> -n r-4-VM -p
>>> %template=domP%name=r-4-VM%eth0ip=192.168.102.110%eth0mask=255.255.255.0%gateway=192.168.102.1%domain=
>>> lax.ratespecial.com%cidrsize=24%dhcprange=192.168.102.1%eth1ip=169.254.2.181%eth1mask=255.255.0.0%type=dhcpsrvr%disable_rp_filter=true%dns1=192.168.100.2%dns2=192.168.100.3
>>> . Output is:
>>>
>>> And in the security_group.log:
>>> 2014-10-10 14:42:41,926 - Failed to get rule logs, better luck next time!
>>> 2014-10-10 14:43:41,926 - Failed to get rule logs, better luck next time!
>>>
>>> What does this mean?
>>>
>>> On Fri, Oct 10, 2014 at 2:11 PM, Ian Young <iyo...@ratespecial.com>
>>> wrote:
>>>
>>>> This morning I was unable to start new instances. I discovered that I
>>>> could SSH into the SSVM and the console proxy but not the virtual router.
>>>> Something strange was happening so I thought it might be a good time to
>>>> gracefully stop all the instances and reboot the hypervisor to see if the
>>>> VR would start working again. I also rebooted the management server (a
>>>> separate machine) to have a clean slate. Now that they've both been
>>>> rebooted, the following symptoms exist:
>>>>
>>>> * On the management server, there are no services listening on 9090 or
>>>> 8250.
>>>> * When I run the SSVM health check script, it says NFS is not currently
>>>> mounted.
>>>> * The management server log is reporting that Zone 1 is not ready to
>>>> launch SSVM/CP yet, even though both of those are running.
>>>>
>>>> The NFS server is running just fine. I can mount it in the management
>>>> server with no problems. I've restarted cloudstack-management and
>>>> cloudstack-agent but the problems persist. The "not ready to launch
>>>> SSVM/CP yet" message sounds like the management server and the hypervisor
>>>> are not communicating or some information about the system state is out of
>>>> sync. How can I confirm this?
>>>>
>>>
>>>
>>
>
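
For anyone following along: the "no services listening on 9090 or 8250" symptom from the first message can be re-checked with a plain TCP connect. The following is only a rough sketch, not anything that ships with CloudStack. The function names are made up for illustration, and it assumes you run it on the management server itself (hence 127.0.0.1). A successful connect only shows that something accepted the connection on that port; it says nothing about whether the agent and the management server are actually in sync.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False


def check_ports(host, ports=(8250, 9090)):
    """Map each port to whether something accepted a TCP connection on it."""
    return {port: port_is_open(host, port) for port in ports}


if __name__ == "__main__":
    # 127.0.0.1 assumes this runs on the management server itself;
    # substitute the server's address to test from the hypervisor instead.
    for port, is_open in check_ports("127.0.0.1").items():
        print(port, "listening" if is_open else "not reachable")
```

Running the same check from the hypervisor against the management server's address would also show whether a firewall sits between the two, which a local check cannot.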