Well, that was the bit of information that finally got us back up and running.
The original router VM failed to start because its template was missing, so I had re-downloaded the template following the directions at https://docs.cloudstack.apache.org/projects/cloudstack-release-notes/en/4.4.1/upgrade/upgrade-4.3.html, and that seems to have been the source of the problem. Among the errors logged when the random VM I picked failed to start was an attempt to start a VR:

2016-09-19 15:29:48,962 DEBUG [c.c.n.r.VirtualNetworkApplianceManagerImpl] (Work-Job-Executor-4:ctx-21dac544 job-10167/job-10169 ctx-d441bcf6) Lock is acquired for network id 204 as a part of router startup in Dest[Zone(Id)-Pod(Id)-Cluster(Id)-Host(Id)-Storage(Volume(Id|Type-->Pool(Id))] : Dest[Zone(1)-Pod(1)-Cluster(7)-Host(25)-Storage()]
2016-09-19 15:29:48,972 DEBUG [c.c.n.r.VirtualNetworkApplianceManagerImpl] (Work-Job-Executor-4:ctx-21dac544 job-10167/job-10169 ctx-d441bcf6) Adding nic for Virtual Router in Guest network Ntwk[448deced-7223-4549-98bd-5acafc811f05|Guest|6]
2016-09-19 15:29:48,977 DEBUG [c.c.n.r.VirtualNetworkApplianceManagerImpl] (Work-Job-Executor-4:ctx-21dac544 job-10167/job-10169 ctx-d441bcf6) Adding nic for Virtual Router in Control network
2016-09-19 15:29:48,980 DEBUG [o.a.c.e.o.NetworkOrchestrator] (Work-Job-Executor-4:ctx-21dac544 job-10167/job-10169 ctx-d441bcf6) Found existing network configuration for offering [Network Offering [3-Control-System-Control-Network]: Ntwk[28d967d8-3b75-4362-9234-b0d029b0d21b|Control|3]
2016-09-19 15:29:48,980 DEBUG [o.a.c.e.o.NetworkOrchestrator] (Work-Job-Executor-4:ctx-21dac544 job-10167/job-10169 ctx-d441bcf6) Releasing lock for Acct[7500fc58-dcf6-11e2-b492-00219b9585d4-system]
2016-09-19 15:29:48,984 DEBUG [c.c.n.r.VirtualNetworkApplianceManagerImpl] (Work-Job-Executor-4:ctx-21dac544 job-10167/job-10169 ctx-d441bcf6) Allocating the VR i=1703 in datacenter com.cloud.dc.DataCenterVO$$EnhancerByCGLIB$$9732e921@1with the hypervisor type KVM
2016-09-19 15:29:48,988 DEBUG [c.c.n.r.VirtualNetworkApplianceManagerImpl] (Work-Job-Executor-4:ctx-21dac544 job-10167/job-10169 ctx-d441bcf6) KVM won't support system vm, skip it
2016-09-19 15:29:48,989 DEBUG [c.c.n.r.VirtualNetworkApplianceManagerImpl] (Work-Job-Executor-4:ctx-21dac544 job-10167/job-10169 ctx-d441bcf6) Lock is released for network id 204 as a part of router startup in Dest[Zone(Id)-Pod(Id)-Cluster(Id)-Host(Id)-Storage(Volume(Id|Type-->Pool(Id))] : Dest[Zone(1)-Pod(1)-Cluster(7)-Host(25)-Storage()]
2016-09-19 15:29:48,989 INFO [c.c.v.VirtualMachineManagerImpl] (Work-Job-Executor-4:ctx-21dac544 job-10167/job-10169 ctx-d441bcf6) Unable to contact resource.
com.cloud.exception.ResourceUnavailableException: Resource [DataCenter:1] is unreachable: Can't find at least one running router!

The line ‘KVM won’t support system vm, skip it’ was the key: googling it led to other cloudstack-users mailing list posts from people who had also failed to add systemvm templates back in. Shutting down the management server, changing the TYPE column in the database for the systemvm template from ‘USER’ to ‘SYSTEM’, and then starting a VM again fixed it, and everything came back up happily.

It seems like the documentation on importing systemVMs (for instance, https://docs.cloudstack.apache.org/projects/cloudstack-release-notes/en/4.9.0/upgrade/upgrade-4.4.html) is incorrect, as it says that the template should be imported as ‘Routing: no’. Or is there some other way a template is supposed to be set to SYSTEM instead of USER?

Thanks!
--Mason

On 9/19/16, 3:22 PM, "Kirk Kosinski" <kirk.kosin...@shapeblue.com> wrote:

Hi, if you start a VM in a network that has no VR, the VR will be recreated. So you can stop/start an existing VM in the network, or deploy a new VM to the network.
Best regards,

kirk.kosin...@shapeblue.com
www.shapeblue.com
53 Chandos Place, Covent Garden, London WC2N 4HS
@shapeblue

-----Original Message-----
From: Mason Donahue [mailto:mdona...@backstopsolutions.com]
Sent: Monday, September 19, 2016 9:56 AM
To: users@cloudstack.apache.org
Subject: Reestablishing a VR when the VR was deleted

Hi there,

We’re in a bit of a pickle with our Cloudstack 4.4.1 install. (Yes, I know it’s outdated; we were hoping to upgrade soon and then this happened.)

We had an issue where the VR for one of our Networks went down, and I mistakenly missed the ‘cleanup’ checkbox in the ‘Restart network’ menu of the UI. We use the VR solely for DNS. We now have no VRs in our setup, and the network can no longer be restarted due to a check that routers are running.

Unfortunately, to delete and re-add the network, the log output states that we’d have to expunge all of our machines, which I am hoping to avoid doing. What are my other options? Can I un-mark the router as deleted in the DB, and will that allow it to limp along to the point where it can at least rebuild the network?
Thanks,
--Mason

(relevant logs below)

2016-09-19 11:46:50,890 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (API-Job-Executor-5:ctx-8a42cc36 job-10141) Executing AsyncJobVO {id:10141, userId: 2, accountId: 2, instanceType: None, instanceId: null, cmd: org.apache.cloudstack.api.command.user.network.RestartNetworkCmd, cmdInfo: {"response":"json","id":"448deced-7223-4549-98bd-5acafc811f05","sessionkey":"985SpsvETZfPpiNyTjNsOgXWVQI\u003d","cleanup":"true","ctxDetails":"{\"com.cloud.network.Network\":\"448deced-7223-4549-98bd-5acafc811f05\"}","cmdEventType":"NETWORK.RESTART","ctxUserId":"2","httpmethod":"GET","_":"1474303610834","uuid":"448deced-7223-4549-98bd-5acafc811f05","ctxAccountId":"2","ctxStartEventId":"21061"}, cmdVersion: 0, status: IN_PROGRESS, processStatus: 0, resultCode: 0, result: null, initMsid: 144343483243, completeMsid: null, lastUpdated: null, lastPolled: null, created: null}
2016-09-19 11:46:50,891 DEBUG [c.c.a.ApiServlet] (catalina-exec-22:ctx-94ea893d ctx-69ffa947) ===END=== 192.168.42.156 -- GET command=restartNetwork&id=448deced-7223-4549-98bd-5acafc811f05&cleanup=true&response=json&sessionkey=985SpsvETZfPpiNyTjNsOgXWVQI%3D&_=1474303610834
2016-09-19 11:46:50,925 DEBUG [o.a.c.e.o.NetworkOrchestrator] (API-Job-Executor-5:ctx-8a42cc36 job-10141 ctx-6dda2ed3) Restarting network 204...
2016-09-19 11:46:50,925 DEBUG [o.a.c.e.o.NetworkOrchestrator] (API-Job-Executor-5:ctx-8a42cc36 job-10141 ctx-6dda2ed3) Shutting down the network id=204 as a part of network restart
2016-09-19 11:46:50,929 DEBUG [o.a.c.e.o.NetworkOrchestrator] (API-Job-Executor-5:ctx-8a42cc36 job-10141 ctx-6dda2ed3) Releasing 0 port forwarding rules for network id=204 as a part of shutdownNetworkRules
2016-09-19 11:46:50,930 DEBUG [c.c.n.f.FirewallManagerImpl] (API-Job-Executor-5:ctx-8a42cc36 job-10141 ctx-6dda2ed3) There are no rules to forward to the network elements
2016-09-19 11:46:50,932 DEBUG [o.a.c.e.o.NetworkOrchestrator] (API-Job-Executor-5:ctx-8a42cc36 job-10141 ctx-6dda2ed3) Releasing 0 static nat rules for network id=204 as a part of shutdownNetworkRules
2016-09-19 11:46:50,932 DEBUG [c.c.n.f.FirewallManagerImpl] (API-Job-Executor-5:ctx-8a42cc36 job-10141 ctx-6dda2ed3) There are no rules to forward to the network elements
2016-09-19 11:46:50,934 DEBUG [c.c.n.l.LoadBalancingRulesManagerImpl] (API-Job-Executor-5:ctx-8a42cc36 job-10141 ctx-6dda2ed3) Revoking 0 Public load balancing rules for network id=204
2016-09-19 11:46:50,934 DEBUG [c.c.n.l.LoadBalancingRulesManagerImpl] (API-Job-Executor-5:ctx-8a42cc36 job-10141 ctx-6dda2ed3) There are no Load Balancing Rules to forward to the network elements
2016-09-19 11:46:50,936 DEBUG [c.c.n.l.LoadBalancingRulesManagerImpl] (API-Job-Executor-5:ctx-8a42cc36 job-10141 ctx-6dda2ed3) Revoking 0 Internal load balancing rules for network id=204
2016-09-19 11:46:50,936 DEBUG [c.c.n.l.LoadBalancingRulesManagerImpl] (API-Job-Executor-5:ctx-8a42cc36 job-10141 ctx-6dda2ed3) There are no Load Balancing Rules to forward to the network elements
2016-09-19 11:46:50,937 DEBUG [o.a.c.e.o.NetworkOrchestrator] (API-Job-Executor-5:ctx-8a42cc36 job-10141 ctx-6dda2ed3) Releasing 0 firewall ingress rules for network id=204 as a part of shutdownNetworkRules
2016-09-19 11:46:50,937 DEBUG [c.c.n.f.FirewallManagerImpl] (API-Job-Executor-5:ctx-8a42cc36 job-10141 ctx-6dda2ed3) There are no rules to forward to the network elements
2016-09-19 11:46:50,938 DEBUG [o.a.c.e.o.NetworkOrchestrator] (API-Job-Executor-5:ctx-8a42cc36 job-10141 ctx-6dda2ed3) Releasing 0 firewall egress rules for network id=204 as a part of shutdownNetworkRules
2016-09-19 11:46:50,939 DEBUG [c.c.n.f.FirewallManagerImpl] (API-Job-Executor-5:ctx-8a42cc36 job-10141 ctx-6dda2ed3) There are no rules to forward to the network elements
2016-09-19 11:46:50,941 DEBUG [c.c.n.r.RulesManagerImpl] (API-Job-Executor-5:ctx-8a42cc36 job-10141 ctx-6dda2ed3) Found 0 static nat rules to apply for network id 204
2016-09-19 11:46:51,029 DEBUG [o.a.c.e.o.NetworkOrchestrator] (API-Job-Executor-5:ctx-8a42cc36 job-10141 ctx-6dda2ed3) Sending network shutdown to SecurityGroupProvider
2016-09-19 11:46:51,032 DEBUG [o.a.c.e.o.NetworkOrchestrator] (API-Job-Executor-5:ctx-8a42cc36 job-10141 ctx-6dda2ed3) Sending network shutdown to VirtualRouter
2016-09-19 11:46:51,034 DEBUG [o.a.c.e.o.NetworkOrchestrator] (API-Job-Executor-5:ctx-8a42cc36 job-10141 ctx-6dda2ed3) Implementing the network Ntwk[448deced-7223-4549-98bd-5acafc811f05|Guest|6] elements and resources as a part of network restart
2016-09-19 11:46:51,040 DEBUG [o.a.c.e.o.NetworkOrchestrator] (API-Job-Executor-5:ctx-8a42cc36 job-10141 ctx-6dda2ed3) Asking SecurityGroupProvider to implemenet Ntwk[448deced-7223-4549-98bd-5acafc811f05|Guest|6]
2016-09-19 11:46:51,043 DEBUG [o.a.c.e.o.NetworkOrchestrator] (API-Job-Executor-5:ctx-8a42cc36 job-10141 ctx-6dda2ed3) Asking VirtualRouter to implemenet Ntwk[448deced-7223-4549-98bd-5acafc811f05|Guest|6]
2016-09-19 11:46:51,049 DEBUG [c.c.n.r.VirtualNetworkApplianceManagerImpl] (API-Job-Executor-5:ctx-8a42cc36 job-10141 ctx-6dda2ed3) Lock is acquired for network id 204 as a part of router startup in Dest[Zone(Id)-Pod(Id)-Cluster(Id)-Host(Id)-Storage(Volume(Id|Type-->Pool(Id))] : Dest[Zone(1)-Pod(null)-Cluster(null)-Host(null)-Storage()]
2016-09-19 11:46:51,051 DEBUG [c.c.n.r.VirtualNetworkApplianceManagerImpl] (API-Job-Executor-5:ctx-8a42cc36 job-10141 ctx-6dda2ed3) Lock is released for network id 204 as a part of router startup in Dest[Zone(Id)-Pod(Id)-Cluster(Id)-Host(Id)-Storage(Volume(Id|Type-->Pool(Id))] : Dest[Zone(1)-Pod(null)-Cluster(null)-Host(null)-Storage()]
2016-09-19 11:46:51,053 WARN [o.a.c.e.o.NetworkOrchestrator] (API-Job-Executor-5:ctx-8a42cc36 job-10141 ctx-6dda2ed3) Failed to implement network Ntwk[448deced-7223-4549-98bd-5acafc811f05|Guest|6] elements and resources as a part of network restart due to
com.cloud.exception.ResourceUnavailableException: Resource [DataCenter:1] is unreachable: Can't find all necessary running routers!
    at com.cloud.network.element.VirtualRouterElement.implement(VirtualRouterElement.java:199)
    at org.apache.cloudstack.engine.orchestration.NetworkOrchestrator.implementNetworkElementsAndResources(NetworkOrchestrator.java:1080)
    at org.apache.cloudstack.engine.orchestration.NetworkOrchestrator.restartNetwork(NetworkOrchestrator.java:2430)
    at com.cloud.network.NetworkServiceImpl.restartNetwork(NetworkServiceImpl.java:1892)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:317)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:183)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:150)
    at org.apache.cloudstack.network.contrail.management.EventUtils$EventInterceptor.invoke(EventUtils.java:106)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:161)
    at com.cloud.event.ActionEventInterceptor.invoke(ActionEventInterceptor.java:51)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:161)
    at org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:91)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
    at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:204)
    at com.sun.proxy.$Proxy156.restartNetwork(Unknown Source)
    at org.apache.cloudstack.api.command.user.network.RestartNetworkCmd.execute(RestartNetworkCmd.java:95)
    at com.cloud.api.ApiDispatcher.dispatch(ApiDispatcher.java:141)
    at com.cloud.api.ApiAsyncJobDispatcher.runJob(ApiAsyncJobDispatcher.java:108)
    at org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.runInContext(AsyncJobManagerImpl.java:503)
    at org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:49)
    at org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56)
    at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103)
    at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53)
    at org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:46)
    at org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.run(AsyncJobManagerImpl.java:460)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
2016-09-19 11:46:51,054 WARN [c.c.n.NetworkServiceImpl] (API-Job-Executor-5:ctx-8a42cc36 job-10141 ctx-6dda2ed3) Network id=204 failed to restart.
2016-09-19 11:46:51,072 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (API-Job-Executor-5:ctx-8a42cc36 job-10141) Complete async job-10141, jobStatus: FAILED, resultCode: 530, result: org.apache.cloudstack.api.response.ExceptionResponse/null/{"uuidList":[],"errorcode":530,"errortext":"Failed to restart network"}
2016-09-19 11:46:51,087 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (API-Job-Executor-5:ctx-8a42cc36 job-10141) Done executing org.apache.cloudstack.api.command.user.network.RestartNetworkCmd for job-10141
2016-09-19 11:46:51,107 INFO [o.a.c.f.j.i.AsyncJobMonitor] (API-Job-Executor-5:ctx-8a42cc36 job-10141) Remove job-10141 from job monitoring
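
For anyone landing on this thread later, the database change described in the top reply (setting the systemvm template's TYPE from ‘USER’ to ‘SYSTEM’ while the management server is stopped) might look roughly like the sketch below. It is only an illustration, not the exact statement that was run: the table name `vm_template`, the column names, and the template name `systemvm-kvm-example` are assumptions, and an in-memory SQLite database stands in for the real CloudStack database so that the snippet runs on its own.

```python
import sqlite3


def mark_template_as_system(conn, template_name):
    """Flip a template from USER to SYSTEM and return how many rows changed.

    The table and column names are assumptions for illustration; check them
    against your own schema before running anything similar for real.
    """
    cur = conn.execute(
        "UPDATE vm_template SET type = 'SYSTEM' "
        "WHERE name = ? AND type = 'USER'",
        (template_name,),
    )
    conn.commit()
    return cur.rowcount


# Stand-in database with two hypothetical templates, both registered as USER.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE vm_template (id INTEGER PRIMARY KEY, name TEXT, type TEXT)"
)
conn.executemany(
    "INSERT INTO vm_template (name, type) VALUES (?, ?)",
    [("systemvm-kvm-example", "USER"), ("centos-example", "USER")],
)

changed = mark_template_as_system(conn, "systemvm-kvm-example")
rows = conn.execute("SELECT name, type FROM vm_template ORDER BY id").fetchall()
print(changed, rows)
```

The `AND type = 'USER'` guard keeps the update from touching a template that is already marked SYSTEM, and the returned row count gives a quick check that exactly one template was changed before the management server is started again.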