Hi ilya,

Funny you brought up debugging the router VM. After responding yesterday, I did just that and found some odd things.

Just to be clear (I think we're on the same page), since I'm not the OP of this thread: the virtual router always gets deployed and starts up just fine; however, CloudStack reports it as permanently stuck in "Starting". VMs that get deployed ultimately fail. CloudStack reports the router version as UNKNOWN.

Before I get to what I found while debugging the router VM, I'll address some of your points.

### FOLLOW-UP QUESTIONS ###

"Another reason would be an issue of hypervisor accessing the NFS mount used for secondary storage."

I don't believe this is an issue. The hypervisor (VMware) mounts the secondary storage via NFS just fine. If this were a problem, I would expect the Secondary Storage and Console VMs not to deploy either.

"Use console of vCenter to see what is happening on router vm. You can login locally with root/password and see the content of /var/log/cloud.out file, paste it on pastebin - if it makes no sense to you..."

It looks to me like /var/log/cloud.out is only written to when $CLOUD_DEBUG is set to a non-empty value in the /etc/init.d/cloud script. As a result, /var/log/cloud.out does not even exist. Even when I set that variable, nothing ever gets logged to /var/log/cloud.out. However, there is a /var/log/cloud.log. Here are its contents: http://pastebin.com/aaTsRKZE

"you can also run /etc/init.d/cloud stop and start.. that will give you a fresh start on logs.."

The service is in a failed state. It's worth noting that this service is in a started state on the Console and Secondary Storage VMs.

"also, confirm that management server can talk to VR on POD IP (management) on port 3922.."

It appears this is not an issue; see below:

    root@r-4-VM:~# telnet 10.70.110.101 8250
    Trying 10.70.110.101...
    Connected to 10.70.110.101.
    Escape character is '^]'.

### ROUTER VM DEBUG ###

Here is what I found when the router VM gets deployed (please tell me if anything seems off):

2 NICs; only one NIC gets an IP address. In CloudStack, NIC1 shows an IP address coming from the defaultGuestNetwork. NIC2 is traffic type Control but has an IP address of 0.0.0.0.

From the CloudStack management server, I cannot SSH into the router VM on NIC1. I've found this is because of iptables rules on the router VM. If I issue /etc/init.d/iptables-persistent flush on the router VM, I can SSH into the router VM using the SSH key on port 3922.

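For anyone who wants to repeat the reachability checks above without telnet, here is a minimal sketch. It assumes bash (with /dev/tcp support) and the coreutils timeout command are available on the machine it runs on; check_port is a made-up helper name, not something shipped with CloudStack, and the router address is left as a placeholder.

```shell
# Sketch only: test whether a TCP port accepts connections.
# check_port is a hypothetical helper, not part of CloudStack.
check_port() {
    local host port
    host=$1
    port=$2
    # bash's /dev/tcp pseudo-device opens a TCP connection;
    # timeout(1) bounds the attempt to 3 seconds.
    if timeout 3 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null; then
        echo "$host:$port open"
    else
        echo "$host:$port closed or filtered"
        return 1
    fi
}

# From the router VM, toward the management server (agent port):
#   check_port 10.70.110.101 8250
# From the management server, toward the router VM (SSH port);
# ROUTER_IP is a placeholder for whatever address the router has:
#   check_port "$ROUTER_IP" 3922
```

A failure only says the connection could not be opened within the timeout; it does not distinguish between a firewall rule (such as the iptables rules mentioned above) and a service that is not listening.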
The service "cloud" is in a failed state. Looking at the cloud init script, I see the following:

    CMDLINE=$(cat /var/cache/cloud/cmdline)
    TYPE="router"
    for i in $CMDLINE
    do
      # search for foo=bar pattern and cut out foo
      FIRSTPATTERN=$(echo $i | cut -d= -f1)
      case $FIRSTPATTERN in
        type)
          TYPE=$(echo $i | cut -d= -f2)
          ;;
      esac
    done

The file /var/cache/cloud/cmdline exists; here are its contents:

     template=domP name=r-4-VM eth0ip=10.70.116.75 eth0mask=255.255.255.0 gateway=10.70.116.1 domain=vit.vertitechit.com cidrsize=24 dhcprange=10.70.116.1 eth1ip=0.0.0.0 eth1mask=0.0.0.0 mgmtcidr=10.70.110.0/24 localgw=10.70.116.1 sshonguest=true type=dhcpsrvr disable_rp_filter=true extra_pubnics=2 dns1=10.70.10.21 baremetalnotificationsecuritykey=nu1HfF_DpC-gK-G_3y1u54Snb9ruROq-qldOvhnHj4EMypguvtfQu0o18eY3gs81iPZMD2Du1QOUAG5KOfMYXQ baremetalnotificationapikey=CKZoOXffpY5ihjvzly3yD_2t2qaDnFglYFDoeep37aH1qy5u67aX51ZsuZpZcphfOxJY52rkTlNOl0nkNSyXjQ host=10.70.110.101 port=8080 nic_macs=06:b1:2e:00:00:10|02:00:14:42:00:03

The code above suggests that TYPE starts out as router but then gets set to dhcpsrvr, as indicated by the contents of /var/cache/cloud/cmdline. Is this normal?

Further down the script, I see the following (the "# <--" comments are my annotations):

    CLOUDSTACK_HOME="/usr/local/cloud"              # <-- Exists
    if [ -f $CLOUDSTACK_HOME/systemvm/utils.sh ];   # <-- Does not exist. Seems odd!
    then
      . $CLOUDSTACK_HOME/systemvm/utils.sh
    else
      _failure
    fi

    # mkdir -p /var/log/vmops

    start() {
      local pid=$(get_pids)
      if [ "$pid" != "" ]; then
        echo "CloudStack cloud sevice is already running, PID = $pid"
        return 0
      fi

      echo -n "Starting CloudStack cloud service (type=$TYPE) "
      if [ -f $CLOUDSTACK_HOME/systemvm/run.sh ];   # <-- Does not exist. Seems odd!
      then
        if [ "$pid" == "" ]
        then
          (cd $CLOUDSTACK_HOME/systemvm; nohup ./run.sh > $LOG_FILE 2>&1 & )
          pid=$(get_pids)
          echo $pid > /var/run/cloud.pid
        fi
        _success
      else
        _failure
      fi
      echo
      echo 'start' > $CLOUDSTACK_HOME/systemvm/user_request
    }

I see that it sets CLOUDSTACK_HOME to /usr/local/cloud. This folder exists; however, the script then looks for the file /usr/local/cloud/systemvm/utils.sh, which doesn't exist. It is also supposed to start the script run.sh, but that doesn't exist either. This seems like a problem to me.

Here is a step-through of what happens when I try to start the cloud service:

    sh -x /etc/init.d/cloud start
    + ENABLED=0
    + [ -e /etc/default/cloud ]
    + . /etc/default/cloud
    + ENABLED=0
    + cat /var/cache/cloud/cmdline
    + CMDLINE= template=domP name=r-4-VM eth0ip=10.70.116.75 eth0mask=255.255.255.0 gateway=10.70.116.1 domain=vit.vertitechit.com cidrsize=24 dhcprange=10.70.116.1 eth1ip=0.0.0.0 eth1mask=0.0.0.0 mgmtcidr=10.70.110.0/24 localgw=10.70.116.1 sshonguest=true type=dhcpsrvr disable_rp_filter=true extra_pubnics=2 dns1=10.70.10.21 baremetalnotificationsecuritykey=nu1HfF_DpC-gK-G_3y1u54Snb9ruROq-qldOvhnHj4EMypguvtfQu0o18eY3gs81iPZMD2Du1QOUAG5KOfMYXQ baremetalnotificationapikey=CKZoOXffpY5ihjvzly3yD_2t2qaDnFglYFDoeep37aH1qy5u67aX51ZsuZpZcphfOxJY52rkTlNOl0nkNSyXjQ host=10.70.110.101 port=8080 nic_macs=06:b1:2e:00:00:10|02:00:14:42:00:03
    + [ ! -z ]
    + LOG_FILE=/dev/null
    + TYPE=router
    + cut -d= -f1
    + echo template=domP
    + FIRSTPATTERN=template
    + cut -d= -f1
    + echo name=r-4-VM
    + FIRSTPATTERN=name
    + cut -d= -f1
    + echo eth0ip=10.70.116.75
    + FIRSTPATTERN=eth0ip
    + cut -d= -f1
    + echo eth0mask=255.255.255.0
    + FIRSTPATTERN=eth0mask
    + cut -d= -f1
    + echo gateway=10.70.116.1
    + FIRSTPATTERN=gateway
    + cut -d= -f1
    + echo domain=vit.vertitechit.com
    + FIRSTPATTERN=domain
    + cut -d= -f1
    + echo cidrsize=24
    + FIRSTPATTERN=cidrsize
    + cut -d= -f1
    + echo dhcprange=10.70.116.1
    + FIRSTPATTERN=dhcprange
    + cut -d= -f1
    + echo eth1ip=0.0.0.0
    + FIRSTPATTERN=eth1ip
    + cut -d= -f1
    + echo eth1mask=0.0.0.0
    + FIRSTPATTERN=eth1mask
    + cut -d= -f1
    + echo mgmtcidr=10.70.110.0/24
    + FIRSTPATTERN=mgmtcidr
    + cut -d= -f1
    + echo localgw=10.70.116.1
    + FIRSTPATTERN=localgw
    + cut -d= -f1
    + echo sshonguest=true
    + FIRSTPATTERN=sshonguest
    + cut -d= -f1
    + echo type=dhcpsrvr
    + FIRSTPATTERN=type
    + cut -d= -f2
    + echo type=dhcpsrvr
    + TYPE=dhcpsrvr
    + cut -d= -f1
    + echo disable_rp_filter=true
    + FIRSTPATTERN=disable_rp_filter
    + cut -d= -f1
    + echo extra_pubnics=2
    + FIRSTPATTERN=extra_pubnics
    + cut -d= -f1
    + echo dns1=10.70.10.21
    + FIRSTPATTERN=dns1
    + cut -d= -f1
    + echo baremetalnotificationsecuritykey=nu1HfF_DpC-gK-G_3y1u54Snb9ruROq-qldOvhnHj4EMypguvtfQu0o18eY3gs81iPZMD2Du1QOUAG5KOfMYXQ
    + FIRSTPATTERN=baremetalnotificationsecuritykey
    + cut -d= -f1
    + echo baremetalnotificationapikey=CKZoOXffpY5ihjvzly3yD_2t2qaDnFglYFDoeep37aH1qy5u67aX51ZsuZpZcphfOxJY52rkTlNOl0nkNSyXjQ
    + FIRSTPATTERN=baremetalnotificationapikey
    + cut -d= -f1
    + echo host=10.70.110.101
    + FIRSTPATTERN=host
    + cut -d= -f1
    + echo port=8080
    + FIRSTPATTERN=port
    + cut -d= -f1
    + echo nic_macs=06:b1:2e:00:00:10|02:00:14:42:00:03
    + FIRSTPATTERN=nic_macs
    + [ -f /etc/init.d/functions ]
    + [ -f ./lib/lsb/init-functions ]
    + RETVAL=0
    + CLOUDSTACK_HOME=/usr/local/cloud
    + [ -f /usr/local/cloud/systemvm/utils.sh ]
    + _failure
    + [ -f /etc/init.d/functions ]
    + echo Failed
    Failed
    + [ 0 != 0 ]
    + exit 0

Thoughts?

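The two checks walked through above (which type the cmdline selects, and whether the files the init script tests for are present) can be repeated quickly with a small helper. This is only a sketch in plain sh: get_cmdline_value and check_systemvm_files are hypothetical names, not part of the CloudStack scripts, and the paths are the ones quoted from /etc/init.d/cloud. Unlike the init script's cut -d= -f2, it keeps everything after the first "=".

```shell
# Sketch only: both function names are made up for illustration.

# Print the value for KEY from a space-separated key=value string.
# As in the init script's loop, the last occurrence wins.
get_cmdline_value() {
    key=$1
    cmdline=$2
    value=
    for pair in $cmdline; do
        case $pair in
            "$key"=*) value=${pair#*=} ;;
        esac
    done
    [ -n "$value" ] && printf '%s\n' "$value"
}

# Report whether the files tested by /etc/init.d/cloud exist.
# Returns non-zero if either one is missing.
check_systemvm_files() {
    home=${1:-/usr/local/cloud}
    missing=0
    for f in utils.sh run.sh; do
        if [ -f "$home/systemvm/$f" ]; then
            echo "present: $home/systemvm/$f"
        else
            echo "MISSING: $home/systemvm/$f"
            missing=1
        fi
    done
    return $missing
}

# Possible usage on the router VM:
#   get_cmdline_value type "$(cat /var/cache/cloud/cmdline)"
#   check_systemvm_files /usr/local/cloud
```

Running the same two calls on the Console or Secondary Storage VM, where the service does start, would give a baseline to compare against.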
Jacob Seeley
Sr. Infrastructure Engineer
VertitechIT
413-268-1631
www.vertitechit.com

-----Original Message-----
From: ilya [mailto:ilya.mailing.li...@gmail.com]
Sent: Wednesday, July 27, 2016 8:43 PM
To: users@cloudstack.apache.org
Subject: Re: CS 4.8 VMware - Virtual Router stuck at starting

Hi Jacob

I gave this a second read - if your issue is Router VM in starting mode - but not started - it means cloudstack agent on routerVM cannot talk to management server on 8250 over POD network.

Another reason would be an issue of hypervisor accessing the NFS mount used for secondary storage.

Use console of vCenter to see what is happening on router vm. You can login locally with root/password and see the content of /var/log/cloud.out file, paste it on pastebin - if it makes no sense to you...

you can also run /etc/init.d/cloud stop and start.. that will give you a fresh start on logs..

also, confirm that management server can talk to VR on POD IP (management) on port 3922..

Regards
ilya

On 7/27/16 9:34 AM, Jacob Seeley wrote:
> ilya,
>
> Here are the contents of the secondary storage:
>
> .
> ./template
> ./template/tmpl
> ./template/tmpl/1
> ./template/tmpl/1/8
> ./template/tmpl/1/8/49a4c4ee-ef06-4474-92c3-1d8efb082266.ova
> ./template/tmpl/1/8/template.properties
> ./template/tmpl/1/8/systemvm64template-4.6.0-RC20151104T1522-4.6.0-vmware.ovf
> ./template/tmpl/1/8/systemvm64template-4.6.0-RC20151104T1522-4.6.0-vmware-disk3.vmdk
> ./template/tmpl/1/7
> ./template/tmpl/1/7/template.properties
> ./template/tmpl/1/7/0098d168-4985-3b33-9840-eb5848d2f385.ova
> ./template/tmpl/1/7/CentOS5.3-x86_64.ovf
> ./template/tmpl/1/7/CentOS5.3-x86_64-disk1.vmdk
> ./template/tmpl/1/7/CentOS5.3-x86_64.mf
> ./systemvm
> ./systemvm/systemvm-4.8.0.1.iso
> ./systemvm/.lck-bf162a0100000000
> ./snapshots
> ./volumes
>
> I've noticed that both the Secondary Storage VM and Console Proxy VM mount
> this ISO and as stated before, they come up just fine.
>
> Regards,
>
> Jacob Seeley
> Sr. Infrastructure Engineer
> VertitechIT
> 413-268-1631
> www.vertitechit.com
>
> -----Original Message-----
> From: ilya [mailto:ilya.mailing.li...@gmail.com]
> Sent: Wednesday, July 27, 2016 3:22 AM
> To: users@cloudstack.apache.org
> Subject: Re: CS 4.8 VMware - Virtual Router stuck at starting
>
> Jacob
>
> The upgrade usually occurs though systemvm.iso - that is generated by
> cloudstack on the first start.
>
> Please show the content of your secondary store specifically
>
> /mnt/[secondary-storage]/systemvm
>
> Regards
> ilya
>
> On 7/25/16 11:19 AM, Jacob Seeley wrote:
>> Here is a pastebin snippet the management-server.log -
>> http://pastebin.com/GCLm53Gz
>>
>> Hopefully the relevant data is in there.
>>
>> I made sure to start from scratch for this example. Everything from the
>> vSphere ESXi to the vCenter to the CentOS 7 with CloudStack install is
>> fresh. I deployed a new instance in CloudStack, a VM internally named
>> i-2-3-VM with an IP address of 192.168.0.78. This prompted CloudStack to
>> deploy a VR. The VR is called r-4-VM with an IP address of 192.168.0.79.
>>
>> Thank you,
>>
>> Jacob Seeley
>> Sr. Infrastructure Engineer
>> VertitechIT
>> 413-268-1631
>> www.vertitechit.com
>>
>> -----Original Message-----
>> From: Suresh Sadhu [mailto:suresh.sa...@accelerite.com]
>> Sent: Monday, July 25, 2016 1:37 AM
>> To: users@cloudstack.apache.org
>> Subject: Re: CS 4.8 VMware - Virtual Router stuck at starting
>>
>> please upload the logs in the issue.
>>> On Jul 5, 2016, at 8:46 AM, Darren Tang <darrentang...@gmail.com> wrote:
>>>
>>> https://issues.apache.org/jira/browse/CLOUDSTACK-9144
>>>
>>> 2016-07-04 19:41 GMT+08:00 Glenn Wagner <glenn.wag...@shapeblue.com>:
>>>
>>>> Hi,
>>>>
>>>> What template are you using to start your first VM? - the default
>>>> vmware template?
>>>> If you look in vcenter , what does the console show you ?
>>>>
>>>> Glenn
>>>>
>>>> glenn.wag...@shapeblue.com
>>>> www.shapeblue.com
>>>> 2nd Floor, Oudehuis Centre, 122 Main Rd, Somerset West, Cape Town
>>>> 7130 South Africa @shapeblue
>>>>
>>>> -----Original Message-----
>>>> From: Pascal R. [mailto:repa...@gmail.com]
>>>> Sent: Monday, 04 July 2016 1:26 PM
>>>> To: users@cloudstack.apache.org
>>>> Subject: CS 4.8 VMware - Virtual Router stuck at starting
>>>>
>>>> hi,
>>>>
>>>> we have a CS4.8 deployment with VMWare 5.5.
>>>>
>>>> When trying to launch the first VM, the VS is created. VS starts
>>>> up, but in CS, it stuck with "starting" state.
>>>>
>>>> i can't find any usefull information in the logs.
>>>>
>>>> any hint?