Good going, Ian. Sorry you didn't get any assistance along the way. Did you find a setting that should have a different default? Like the memory in the router's service offering :P Or doesn't that make any sense?

On Sat, Oct 11, 2014 at 5:11 AM, Ian Young <iyo...@ratespecial.com> wrote:

> Aha! I restarted cloudstack-agent, which caused the virtual router to
> change to a "stopped" status in the management console. However, the
> console viewer icon was still visible, so I clicked it. The router had run
> out of memory and caused a kernel panic. I created a new system service
> offering with 500 MB of memory, changed the router's service offering, and
> started it. It booted with no problem. The default memory size of 128 MB
> is not enough. This is the system VM template I was using:
>
> http://cloudstack.apt-get.eu/systemvm/4.4/systemvm64template-4.4.0-6-kvm.qcow2.bz2
>
> On Fri, Oct 10, 2014 at 7:28 PM, Ian Young <iyo...@ratespecial.com> wrote:
>
> > I dropped all the cloud* databases, deleted everything in primary and
> > secondary storage, and reinstalled the management server, following the
> > guide I wrote for myself the last time I built a stable CloudStack system.
> > Then I imported one of my backed up instances as a template and tried to
> > create a new VM. Same problem as before. How is this possible?
> >
> > 2014-10-10 19:17:44,075 WARN [kvm.resource.LibvirtComputingResource]
> > (agentRequest-Handler-3:null) Timed out:
> > /usr/share/cloudstack-common/scripts/vm/hypervisor/kvm/patchviasocket.pl
> > -n r-4-VM -p
> > %template=domP%name=r-4-VM%eth0ip=192.168.102.222%eth0mask=255.255.255.0%gateway=192.168.102.1%domain=lax.ratespecial.com%cidrsize=24%dhcprange=192.168.102.1%eth1ip=169.254.0.33%eth1mask=255.255.0.0%type=dhcpsrvr%disable_rp_filter=true%dns1=192.168.100.2%dns2=192.168.100.3
> > . Output is:
> > 2014-10-10 19:18:05,078 WARN [kvm.resource.LibvirtComputingResource]
> > (Script-3:null) Interrupting script.
> >
> > On Fri, Oct 10, 2014 at 4:33 PM, Ian Young <iyo...@ratespecial.com> wrote:
> >
> >> I've restarted all the services and restarted the servers too. The SSVM
> >> and CP start with no trouble.
> >> Every time I try to start or create an
> >> instance, I see repeated messages like these:
> >>
> >> /var/log/cloudstack/agent/cloudstack-agent.out:
> >> 2014-10-10 16:27:21,841{GMT} WARN
> >> [kvm.resource.LibvirtComputingResource] (Script-8:) Interrupting script.
> >> 2014-10-10 16:27:21,841{GMT} WARN
> >> [kvm.resource.LibvirtComputingResource] (agentRequest-Handler-4:) Timed
> >> out: /usr/share/cloudstack-common/scripts/vm/hypervisor/kvm/patchviasocket.pl
> >> -n r-19-VM -p
> >> %template=domP%name=r-19-VM%eth0ip=192.168.102.89%eth0mask=255.255.255.0%gateway=192.168.102.1%domain=lax.ratespecial.com%cidrsize=24%dhcprange=192.168.102.1%eth1ip=169.254.2.193%eth1mask=255.255.0.0%type=dhcpsrvr%disable_rp_filter=true%dns1=192.168.100.2%dns2=192.168.100.3
> >> . Output is:
> >>
> >> /var/log/cloudstack/agent/security_group.log:
> >> 2014-10-10 16:27:33,259 - Failed to get rule logs, better luck next time!
> >>
> >> On Fri, Oct 10, 2014 at 3:04 PM, Ian Young <iyo...@ratespecial.com> wrote:
> >>
> >>> I tried to restart the network with the "clean up" option, via the web
> >>> console. After several minutes, it failed to restart the network. The
> >>> SSVM and CP are still running but the VR no longer exists. Why would these
> >>> be able to start but not the virtual router?
> >>>
> >>> On Fri, Oct 10, 2014 at 2:48 PM, Ian Young <iyo...@ratespecial.com> wrote:
> >>>
> >>>> I restarted the libvirtd service and the management service is now
> >>>> fully started (there are services listening on ports 8250 and 9090). The
> >>>> SSVM health check script now reports no problems.
> >>>>
> >>>> However, I tried starting an instance and both the instance and the
> >>>> virtual router are in a "starting" state but have been so for almost 10
> >>>> minutes. In the catalina.out log I see:
> >>>> INFO [c.c.v.VirtualMachineManagerImpl] (AgentManager-Handler-10:null)
> >>>> There is pending job or HA tasks working on the VM.
> >>>> vm id: 4, postpone
> >>>> power-change report by resetting power-change counters
> >>>> INFO [c.c.v.VirtualMachineManagerImpl] (AgentManager-Handler-10:null)
> >>>> There is pending job or HA tasks working on the VM. vm id: 13, postpone
> >>>> power-change report by resetting power-change counters
> >>>>
> >>>> I'm also seeing this in the agent.log:
> >>>> 2014-10-10 14:43:26,833 WARN [kvm.resource.LibvirtComputingResource]
> >>>> (Script-6:null) Interrupting script.
> >>>> 2014-10-10 14:43:26,833 WARN [kvm.resource.LibvirtComputingResource]
> >>>> (agentRequest-Handler-2:null) Timed out:
> >>>> /usr/share/cloudstack-common/scripts/vm/hypervisor/kvm/patchviasocket.pl
> >>>> -n r-4-VM -p
> >>>> %template=domP%name=r-4-VM%eth0ip=192.168.102.110%eth0mask=255.255.255.0%gateway=192.168.102.1%domain=lax.ratespecial.com%cidrsize=24%dhcprange=192.168.102.1%eth1ip=169.254.2.181%eth1mask=255.255.0.0%type=dhcpsrvr%disable_rp_filter=true%dns1=192.168.100.2%dns2=192.168.100.3
> >>>> . Output is:
> >>>>
> >>>> And in the security_group.log:
> >>>> 2014-10-10 14:42:41,926 - Failed to get rule logs, better luck next
> >>>> time!
> >>>> 2014-10-10 14:43:41,926 - Failed to get rule logs, better luck next
> >>>> time!
> >>>>
> >>>> What does this mean?
> >>>>
> >>>> On Fri, Oct 10, 2014 at 2:11 PM, Ian Young <iyo...@ratespecial.com> wrote:
> >>>>
> >>>>> This morning I was unable to start new instances. I discovered that I
> >>>>> could SSH into the SSVM and the console proxy but not the virtual router.
> >>>>> Something strange was happening so I thought it might be a good time to
> >>>>> gracefully stop all the instances and reboot the hypervisor to see if the
> >>>>> VR would start working again. I also rebooted the management server (a
> >>>>> separate machine) to have a clean slate.
> >>>>> Now that they've both been
> >>>>> rebooted, the following symptoms exist:
> >>>>>
> >>>>> * On the management server, there are no services listening on 9090 or
> >>>>> 8250.
> >>>>> * When I run the SSVM health check script, it says NFS is not
> >>>>> currently mounted.
> >>>>> * The management server log is reporting that Zone 1 is not ready to
> >>>>> launch SSVM/CP yet, even though both of those are running.
> >>>>>
> >>>>> The NFS server is running just fine. I can mount it in the management
> >>>>> server with no problems. I've restarted cloudstack-management and
> >>>>> cloudstack-agent but the problems persist. The "not ready to launch
> >>>>> SSVM/CP yet" message sounds like the management server and the hypervisor
> >>>>> are not communicating or some information about the system state is out of
> >>>>> sync. How can I confirm this?

-- Daan
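
The agent log excerpts quoted in this thread all show the same patchviasocket.pl call timing out, with the router's boot arguments passed after -p as a %-separated list of key=value pairs. As a rough aid for reading such lines, here is a minimal Python sketch that pulls out the VM name and the arguments. The function names parse_boot_args and parse_timeout_line are hypothetical helpers, not part of CloudStack, and the sample assumes the log entry sits on a single line with the domain value joined back into the argument string.

```python
import re


def parse_boot_args(arg_string):
    """Split a '%key=value%key=value' string into a dict (hypothetical helper)."""
    pairs = {}
    for field in arg_string.strip().split("%"):
        if not field:
            continue  # skip the empty piece before the leading '%'
        key, _, value = field.partition("=")
        pairs[key] = value
    return pairs


def parse_timeout_line(line):
    """Return (vm_name, boot_args) for a patchviasocket.pl log line, else None."""
    match = re.search(r"patchviasocket\.pl\s+-n\s+(\S+)\s+-p\s+(\S+)", line)
    if match is None:
        return None
    return match.group(1), parse_boot_args(match.group(2))


# Sample taken from the 19:17:44 entry in the thread, joined onto one line
# (an assumption about how the agent originally wrote it).
sample = (
    "2014-10-10 19:17:44,075 WARN [kvm.resource.LibvirtComputingResource] "
    "(agentRequest-Handler-3:null) Timed out: "
    "/usr/share/cloudstack-common/scripts/vm/hypervisor/kvm/patchviasocket.pl "
    "-n r-4-VM -p %template=domP%name=r-4-VM%eth0ip=192.168.102.222"
    "%eth0mask=255.255.255.0%gateway=192.168.102.1%domain=lax.ratespecial.com"
    "%cidrsize=24%dhcprange=192.168.102.1%eth1ip=169.254.0.33"
    "%eth1mask=255.255.0.0%type=dhcpsrvr%disable_rp_filter=true"
    "%dns1=192.168.100.2%dns2=192.168.100.3"
)

vm_name, boot_args = parse_timeout_line(sample)
print(vm_name, boot_args["type"], boot_args["eth0ip"])
```

Running the same helper over each "Timed out" line makes it easy to see which router (r-4-VM, r-19-VM) was being patched and with which addresses. It says nothing about why the call timed out; in this thread the cause turned out to be the router running out of memory.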