Re: services not running after reboot

Daan Hoogland Mon, 13 Oct 2014 02:19:02 -0700

Good going Ian, sorry you didn't get any assistance on the way. Did you
find a setting that should have a different default? Like the router
service offering memory :P or doesn't that make any sense?


On Sat, Oct 11, 2014 at 5:11 AM, Ian Young <[email protected]> wrote:

> Aha!  I restarted cloudstack-agent, which caused the virtual router to
> change to a "stopped" status in the management console.  However, the
> console viewer icon was still visible, so I clicked it.  The router had run
> out of memory and caused a kernel panic.  I created a new system service
> offering with 500 MB of memory, changed the router's service offering, and
> started it.  It booted with no problem.  The default memory size of 128 MB
> is not enough.  This is the system VM template I was using:
>
>
> http://cloudstack.apt-get.eu/systemvm/4.4/systemvm64template-4.4.0-6-kvm.qcow2.bz2
>
> On Fri, Oct 10, 2014 at 7:28 PM, Ian Young <[email protected]> wrote:
>
> > I dropped all the cloud* databases, deleted everything in primary and
> > secondary storage, and reinstalled the management server, following the
> > guide I wrote for myself the last time I built a stable CloudStack
> > system.
> > Then I imported one of my backed up instances as a template and tried to
> > create a new VM.  Same problem as before.  How is this possible?
> >
> > 2014-10-10 19:17:44,075 WARN  [kvm.resource.LibvirtComputingResource]
> > (agentRequest-Handler-3:null) Timed out:
> > /usr/share/cloudstack-common/scripts/vm/hypervisor/kvm/patchviasocket.pl
> > -n r-4-VM -p
> > %template=domP%name=r-4-VM%eth0ip=192.168.102.222%eth0mask=255.255.255.0%gateway=192.168.102.1%domain=lax.ratespecial.com%cidrsize=24%dhcprange=192.168.102.1%eth1ip=169.254.0.33%eth1mask=255.255.0.0%type=dhcpsrvr%disable_rp_filter=true%dns1=192.168.100.2%dns2=192.168.100.3
> > .  Output is:
> > 2014-10-10 19:18:05,078 WARN  [kvm.resource.LibvirtComputingResource]
> > (Script-3:null) Interrupting script.
> >
> > On Fri, Oct 10, 2014 at 4:33 PM, Ian Young <[email protected]>
> > wrote:
> >
> >> I've restarted all the services and restarted the servers too.  The SSVM
> >> and CP start with no trouble.  Every time I try to start or create an
> >> instance, I see repeated messages like these:
> >>
> >> /var/log/cloudstack/agent/cloudstack-agent.out:
> >> 2014-10-10 16:27:21,841{GMT} WARN
> >>  [kvm.resource.LibvirtComputingResource] (Script-8:) Interrupting
> >> script.
> >> 2014-10-10 16:27:21,841{GMT} WARN
> >>  [kvm.resource.LibvirtComputingResource] (agentRequest-Handler-4:) Timed
> >> out: /usr/share/cloudstack-common/scripts/vm/hypervisor/kvm/patchviasocket.pl
> >> -n r-19-VM -p
> >> %template=domP%name=r-19-VM%eth0ip=192.168.102.89%eth0mask=255.255.255.0%gateway=192.168.102.1%domain=lax.ratespecial.com%cidrsize=24%dhcprange=192.168.102.1%eth1ip=169.254.2.193%eth1mask=255.255.0.0%type=dhcpsrvr%disable_rp_filter=true%dns1=192.168.100.2%dns2=192.168.100.3
> >> .  Output is:
> >>
> >> /var/log/cloudstack/agent/security_group.log:
> >> 2014-10-10 16:27:33,259 - Failed to get rule logs, better luck next
> >> time!
> >>
> >> On Fri, Oct 10, 2014 at 3:04 PM, Ian Young <[email protected]>
> >> wrote:
> >>
> >>> I tried to restart the network with the "clean up" option, via the web
> >>> console.  After several minutes, it failed to restart the network.  The
> >>> SSVM and CP are still running but the VR no longer exists.  Why would
> >>> these be able to start but not the virtual router?
> >>>
> >>> On Fri, Oct 10, 2014 at 2:48 PM, Ian Young <[email protected]>
> >>> wrote:
> >>>
> >>>> I restarted the libvirtd service and the management service is now
> >>>> fully started (there are services listening on ports 8250 and 9090).
> >>>> The SSVM health check script now reports no problems.
> >>>>
> >>>> However, I tried starting an instance and both the instance and the
> >>>> virtual router are in a "starting" state but have been so for almost
> >>>> 10 minutes.  In the catalina.out log I see:
> >>>> INFO  [c.c.v.VirtualMachineManagerImpl] (AgentManager-Handler-10:null)
> >>>> There is pending job or HA tasks working on the VM. vm id: 4, postpone
> >>>> power-change report by resetting power-change counters
> >>>> INFO  [c.c.v.VirtualMachineManagerImpl] (AgentManager-Handler-10:null)
> >>>> There is pending job or HA tasks working on the VM. vm id: 13,
> >>>> postpone power-change report by resetting power-change counters
> >>>>
> >>>> I'm also seeing this in the agent.log:
> >>>> 2014-10-10 14:43:26,833 WARN  [kvm.resource.LibvirtComputingResource]
> >>>> (Script-6:null) Interrupting script.
> >>>> 2014-10-10 14:43:26,833 WARN  [kvm.resource.LibvirtComputingResource]
> >>>> (agentRequest-Handler-2:null) Timed out:
> >>>> /usr/share/cloudstack-common/scripts/vm/hypervisor/kvm/patchviasocket.pl
> >>>> -n r-4-VM -p
> >>>> %template=domP%name=r-4-VM%eth0ip=192.168.102.110%eth0mask=255.255.255.0%gateway=192.168.102.1%domain=lax.ratespecial.com%cidrsize=24%dhcprange=192.168.102.1%eth1ip=169.254.2.181%eth1mask=255.255.0.0%type=dhcpsrvr%disable_rp_filter=true%dns1=192.168.100.2%dns2=192.168.100.3
> >>>> .  Output is:
> >>>>
> >>>> And in the security_group.log:
> >>>> 2014-10-10 14:42:41,926 - Failed to get rule logs, better luck next
> >>>> time!
> >>>> 2014-10-10 14:43:41,926 - Failed to get rule logs, better luck next
> >>>> time!
> >>>>
> >>>> What does this mean?
> >>>>
> >>>> On Fri, Oct 10, 2014 at 2:11 PM, Ian Young <[email protected]>
> >>>> wrote:
> >>>>
> >>>>> This morning I was unable to start new instances.  I discovered that
> >>>>> I could SSH into the SSVM and the console proxy but not the virtual
> >>>>> router.  Something strange was happening so I thought it might be a
> >>>>> good time to gracefully stop all the instances and reboot the
> >>>>> hypervisor to see if the VR would start working again.  I also
> >>>>> rebooted the management server (a separate machine) to have a clean
> >>>>> slate.  Now that they've both been rebooted, the following symptoms
> >>>>> exist:
> >>>>>
> >>>>> * On the management server, there are no services listening on 9090
> >>>>> or 8250.
> >>>>> * When I run the SSVM health check script, it says NFS is not
> >>>>> currently mounted.
> >>>>> * The management server log is reporting that Zone 1 is not ready to
> >>>>> launch SSVM/CP yet, even though both of those are running.
> >>>>>
> >>>>> The NFS server is running just fine.  I can mount it in the
> >>>>> management server with no problems.  I've restarted
> >>>>> cloudstack-management and cloudstack-agent but the problems persist.
> >>>>> The "not ready to launch SSVM/CP yet" message sounds like the
> >>>>> management server and the hypervisor are not communicating or some
> >>>>> information about the system state is out of sync.  How can I confirm
> >>>>> this?
> >>>>>
> >>>>
> >>>>
> >>>
> >>
> >
>
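
For anyone reading the timed-out patchviasocket.pl commands quoted above: the value passed with -p is a single string of key=value pairs separated by "%". A minimal Python sketch for splitting such a string into a dictionary follows, so the router's addresses and type are easier to compare between attempts. The function name parse_boot_args is hypothetical and is not part of CloudStack; the sample string is copied from the r-4-VM log line in this thread.

```python
def parse_boot_args(raw):
    """Split a '%key=value%key=value' string into a dict.

    Hypothetical helper for reading log lines; not a CloudStack API.
    Fields without '=' are skipped.
    """
    args = {}
    for field in raw.split("%"):
        field = field.strip()
        if not field:
            continue  # the leading '%' yields an empty first field
        key, sep, value = field.partition("=")
        if sep:
            args[key] = value
    return args


# Sample taken from the r-4-VM log line quoted in this thread.
sample = (
    "%template=domP%name=r-4-VM%eth0ip=192.168.102.222"
    "%eth0mask=255.255.255.0%gateway=192.168.102.1"
    "%domain=lax.ratespecial.com%cidrsize=24"
    "%dhcprange=192.168.102.1%eth1ip=169.254.0.33"
    "%eth1mask=255.255.0.0%type=dhcpsrvr%disable_rp_filter=true"
    "%dns1=192.168.100.2%dns2=192.168.100.3"
)

if __name__ == "__main__":
    for key, value in parse_boot_args(sample).items():
        print(key, "=", value)
```

This only makes the arguments readable. It does not explain the timeout itself, which in this thread turned out to be the router running out of memory.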



-- 
Daan
