upgrading 4.5.2 -> 4.6.0 virtualrouter upgrade timeout

Stephan Seitz Tue, 24 Nov 2015 03:31:22 -0800

Hi List!

After upgrading from 4.5.2 to 4.6.0 I ran into a problem with one
virtual router (VR). This particular VR has about 10 IPs with LB and FW
rules defined. During the upgrade process, after about 4-5 minutes, a
watchdog kicks in and kills the VR because it is not responding.


So far I haven't found any timeout value in the global settings.
Temporarily setting network.router.EnableServiceMonitoring to false
doesn't change the behaviour.

Any help on how to mitigate that nasty timeout would be really
appreciated :)

cheers,

Stephan 

From within the VR, the logs show:

2015-11-24 11:24:33,807  CsFile.py search:123 Searching for dhcp-range=interface:eth0,set:interface and replacing with dhcp-range=interface:eth0,set:interface-eth0,10.10.22.1,static
2015-11-24 11:24:33,808  merge.py load:56 Creating data bag type guestnetwork
2015-11-24 11:24:33,808  CsFile.py search:123 Searching for dhcp-option=tag:interface-eth0,15 and replacing with dhcp-option=tag:interface-eth0,15,heinlein.cloudservice
2015-11-24 11:24:33,808  CsFile.py search:123 Searching for dhcp-option=tag:interface-eth0,6 and replacing with dhcp-option=tag:interface-eth0,6,10.10.22.1,195.10.208.2,91.198.250.2
2015-11-24 11:24:33,809  CsFile.py search:123 Searching for dhcp-option=tag:interface-eth0,3, and replacing with dhcp-option=tag:interface-eth0,3,10.10.22.1
2015-11-24 11:24:33,809  CsFile.py search:123 Searching for dhcp-option=tag:interface-eth0,1, and replacing with dhcp-option=tag:interface-eth0,1,255.255.255.0
2015-11-24 11:24:33,810  CsHelper.py execute:160 Executing: service dnsmasq restart

==> /var/log/messages <==
Nov 24 11:24:34 r-504-VM shutdown[6752]: shutting down for system halt

Broadcast message from root@r-504-VM (Tue Nov 24 11:24:34 2015):

The system is going down for system halt NOW!
Nov 24 11:24:35 r-504-VM KVP: KVP starting; pid is:6844

==> /var/log/cloud.log <==
/opt/cloud/bin/vr_cfg.sh: line 60:  6603 Killed                  /opt/cloud/bin/update_config.py vm_dhcp_entry.json

==> /var/log/messages <==
Nov 24 11:24:35 r-504-VM cloud: VR config: executing failed: /opt/cloud/bin/update_config.py vm_dhcp_entry.json

==> /var/log/cloud.log <==
Tue Nov 24 11:24:35 UTC 2015 : VR config: executing failed: /opt/cloud/bin/update_config.py vm_dhcp_entry.json
Connection to 169.254.2.192 closed by remote host.
Connection to 169.254.2.192 closed.


The management-server.log shows:

2015-11-24 12:24:43,015 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (Work-Job-Executor-1:ctx-ad9e4658 job-5163/job-5164) Done executing com.cloud.vm.VmWorkStart for job-5164
2015-11-24 12:24:43,017 INFO  [o.a.c.f.j.i.AsyncJobMonitor] (Work-Job-Executor-1:ctx-ad9e4658 job-5163/job-5164) Remove job-5164 from job monitoring
2015-11-24 12:24:43,114 ERROR [c.c.a.ApiAsyncJobDispatcher] (API-Job-Executor-1:ctx-760da779 job-5163) Unexpected exception while executing org.apache.cloudstack.api.command.admin.router.StartRouterCmd
com.cloud.exception.AgentUnavailableException: Resource [Host:1] is unreachable: Host 1: Unable to start instance due to Unable to start VM[DomainRouter|r-504-VM] due to error in finalizeStart, not retrying
        at com.cloud.vm.VirtualMachineManagerImpl.orchestrateStart(VirtualMachineManagerImpl.java:1121)
        at com.cloud.vm.VirtualMachineManagerImpl.orchestrateStart(VirtualMachineManagerImpl.java:4580)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at com.cloud.vm.VmWorkJobHandlerProxy.handleVmWorkJob(VmWorkJobHandlerProxy.java:107)
        at com.cloud.vm.VirtualMachineManagerImpl.handleVmWorkJob(VirtualMachineManagerImpl.java:4736)
        at com.cloud.vm.VmWorkJobDispatcher.runJob(VmWorkJobDispatcher.java:102)
        at org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.runInContext(AsyncJobManagerImpl.java:537)
        at org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:49)
        at org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56)
        at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103)
        at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53)
        at org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:46)
        at org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.run(AsyncJobManagerImpl.java:494)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
Caused by: com.cloud.utils.exception.ExecutionException: Unable to start VM[DomainRouter|r-504-VM] due to error in finalizeStart, not retrying
        at com.cloud.vm.VirtualMachineManagerImpl.orchestrateStart(VirtualMachineManagerImpl.java:1085)
        at com.cloud.vm.VirtualMachineManagerImpl.orchestrateStart(VirtualMachineManagerImpl.java:4580)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        ... 18 more
2015-11-24 12:24:43,115 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (API-Job-Executor-1:ctx-760da779 job-5163) Complete async job-5163, jobStatus: FAILED, resultCode: 530, result: org.apache.cloudstack.api.response.ExceptionResponse/null/{"uuidList":[],"errorcode":530,"errortext":"Resource [Host:1] is unreachable: Host 1: Unable to start instance due to Unable to start VM[DomainRouter|r-504-VM] due to error in finalizeStart, not retrying"}
