I dropped all the cloud* databases, deleted everything in primary and
secondary storage, and reinstalled the management server, following the
guide I wrote for myself the last time I built a stable CloudStack system.
Then I imported one of my backed-up instances as a template and tried to
create a new VM.  Same problem as before.  How is this possible?

2014-10-10 19:17:44,075 WARN  [kvm.resource.LibvirtComputingResource] (agentRequest-Handler-3:null) Timed out: /usr/share/cloudstack-common/scripts/vm/hypervisor/kvm/patchviasocket.pl -n r-4-VM -p %template=domP%name=r-4-VM%eth0ip=192.168.102.222%eth0mask=255.255.255.0%gateway=192.168.102.1%domain=lax.ratespecial.com%cidrsize=24%dhcprange=192.168.102.1%eth1ip=169.254.0.33%eth1mask=255.255.0.0%type=dhcpsrvr%disable_rp_filter=true%dns1=192.168.100.2%dns2=192.168.100.3 .  Output is:
2014-10-10 19:18:05,078 WARN  [kvm.resource.LibvirtComputingResource] (Script-3:null) Interrupting script.
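
For anyone reproducing this: the earlier messages quoted below mention that the management server should have services listening on ports 8250 and 9090. A minimal sketch of one way to check that from another machine follows. It only tests whether a TCP connection succeeds; it says nothing about why patchviasocket.pl times out. The helper name port_is_open and the host value are assumptions for illustration, not part of CloudStack.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and attempts a TCP connect.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Hypothetical address; replace with the management server's host name or IP.
    host = "127.0.0.1"
    for port in (8250, 9090):
        state = "listening" if port_is_open(host, port) else "not reachable"
        print(f"{host}:{port} {state}")
```

A refused or timed-out connection here would only confirm that nothing is accepting connections on that port from where the script runs; a firewall between the two machines would produce the same result.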

On Fri, Oct 10, 2014 at 4:33 PM, Ian Young <iyo...@ratespecial.com> wrote:

> I've restarted all the services and restarted the servers too.  The SSVM
> and CP start with no trouble.  Every time I try to start or create an
> instance, I see repeated messages like these:
>
> /var/log/cloudstack/agent/cloudstack-agent.out:
> 2014-10-10 16:27:21,841{GMT} WARN  [kvm.resource.LibvirtComputingResource]
> (Script-8:) Interrupting script.
> 2014-10-10 16:27:21,841{GMT} WARN  [kvm.resource.LibvirtComputingResource]
> (agentRequest-Handler-4:) Timed out:
> /usr/share/cloudstack-common/scripts/vm/hypervisor/kvm/patchviasocket.pl
> -n r-19-VM -p
> %template=domP%name=r-19-VM%eth0ip=192.168.102.89%eth0mask=255.255.255.0%gateway=192.168.102.1%domain=
> lax.ratespecial.com%cidrsize=24%dhcprange=192.168.102.1%eth1ip=169.254.2.193%eth1mask=255.255.0.0%type=dhcpsrvr%disable_rp_filter=true%dns1=192.168.100.2%dns2=192.168.100.3
> .  Output is:
>
> /var/log/cloudstack/agent/security_group.log:
> 2014-10-10 16:27:33,259 - Failed to get rule logs, better luck next time!
>
> On Fri, Oct 10, 2014 at 3:04 PM, Ian Young <iyo...@ratespecial.com> wrote:
>
>> I tried to restart the network with the "clean up" option, via the web
>> console.  After several minutes, it failed to restart the network.  The
>> SSVM and CP are still running but the VR no longer exists.  Why would these
>> be able to start but not the virtual router?
>>
>> On Fri, Oct 10, 2014 at 2:48 PM, Ian Young <iyo...@ratespecial.com>
>> wrote:
>>
>>> I restarted the libvirtd service and the management service is now fully
>>> started (there are services listening on ports 8250 and 9090).  The SSVM
>>> health check script now reports no problems.
>>>
>>> However, I tried starting an instance and both the instance and the
>>> virtual router are in a "starting" state but have been so for almost 10
>>> minutes.  In the catalina.out log I see:
>>> INFO  [c.c.v.VirtualMachineManagerImpl] (AgentManager-Handler-10:null)
>>> There is pending job or HA tasks working on the VM. vm id: 4, postpone
>>> power-change report by resetting power-change counters
>>> INFO  [c.c.v.VirtualMachineManagerImpl] (AgentManager-Handler-10:null)
>>> There is pending job or HA tasks working on the VM. vm id: 13, postpone
>>> power-change report by resetting power-change counters
>>>
>>> I'm also seeing this in the agent.log:
>>> 2014-10-10 14:43:26,833 WARN  [kvm.resource.LibvirtComputingResource]
>>> (Script-6:null) Interrupting script.
>>> 2014-10-10 14:43:26,833 WARN  [kvm.resource.LibvirtComputingResource]
>>> (agentRequest-Handler-2:null) Timed out:
>>> /usr/share/cloudstack-common/scripts/vm/hypervisor/kvm/patchviasocket.pl
>>> -n r-4-VM -p
>>> %template=domP%name=r-4-VM%eth0ip=192.168.102.110%eth0mask=255.255.255.0%gateway=192.168.102.1%domain=
>>> lax.ratespecial.com%cidrsize=24%dhcprange=192.168.102.1%eth1ip=169.254.2.181%eth1mask=255.255.0.0%type=dhcpsrvr%disable_rp_filter=true%dns1=192.168.100.2%dns2=192.168.100.3
>>> .  Output is:
>>>
>>> And in the security_group.log:
>>> 2014-10-10 14:42:41,926 - Failed to get rule logs, better luck next time!
>>> 2014-10-10 14:43:41,926 - Failed to get rule logs, better luck next time!
>>>
>>> What does this mean?
>>>
>>> On Fri, Oct 10, 2014 at 2:11 PM, Ian Young <iyo...@ratespecial.com>
>>> wrote:
>>>
>>>> This morning I was unable to start new instances.  I discovered that I
>>>> could SSH into the SSVM and the console proxy but not the virtual router.
>>>> Something strange was happening so I thought it might be a good time to
>>>> gracefully stop all the instances and reboot the hypervisor to see if the
>>>> VR would start working again.  I also rebooted the management server (a
>>>> separate machine) to have a clean slate.  Now that they've both been
>>>> rebooted, the following symptoms exist:
>>>>
>>>> * On the management server, there are no services listening on 9090 or
>>>> 8250.
>>>> * When I run the SSVM health check script, it says NFS is not currently
>>>> mounted.
>>>> * The management server log is reporting that Zone 1 is not ready to
>>>> launch SSVM/CP yet, even though both of those are running.
>>>>
>>>> The NFS server is running just fine.  I can mount it in the management
>>>> server with no problems.  I've restarted cloudstack-management and
>>>> cloudstack-agent but the problems persist.  The "not ready to launch
>>>> SSVM/CP yet" message sounds like the management server and the hypervisor
>>>> are not communicating or some information about the system state is out of
>>>> sync.  How can I confirm this?
>>>>
>>>
>>>
>>
>
