Re: Router VM: patchviasocket.py timeout issue on 1 out of 4 networks
01:38:38 r-686-VM cloud: VR config: create file success > > Dec 20 01:38:38 r-686-VM cloud: VR config: executing: > > /opt/cloud/bin/update_config.py vm_dhcp_entry.json > > Dec 20 01:39:01 r-686-VM cloud: VR config: execution success > > Dec 20 01:39:01 r-686-VM cloud: VR config: creating file: > > /var/cache/cloud/vm_metadata.json > > Dec 20 01:39:01 r-686-VM cloud: VR config: create file success > > Dec 20 01:39:01 r-686-VM cloud: VR config: executing: > > /opt/cloud/bin/update_config.py vm_metadata.json > > Dec 20 01:39:21 r-686-VM cloud: VR config: execution success > > Dec 20 01:39:21 r-686-VM cloud: VR config: creating file: > > /var/cache/cloud/vm_metadata.json > > Dec 20 01:39:21 r-686-VM cloud: VR config: create file success > > Dec 20 01:39:21 r-686-VM cloud: VR config: executing: > > /opt/cloud/bin/update_config.py vm_metadata.json > > Dec 20 01:39:41 r-686-VM cloud: VR config: execution success > > Dec 20 01:39:41 r-686-VM cloud: VR config: Flushing conntrack table > > Dec 20 01:39:41 r-686-VM cloud: VR config: Flushing conntrack table > completed > > Dec 20 01:39:42 r-686-VM cloud: VR config: configuation format version > 1.0 > > Dec 20 01:39:42 r-686-VM cloud: VR config: Flushing conntrack table > > Dec 20 01:39:42 r-686-VM cloud: VR config: Flushing conntrack table > completed > > > > 2. Non-working network router VM ( http://pastebin.com/jzfGMGQB ):- > > . 
> > > > Dec 20 01:44:21 r-687-VM cloud: Boot up process done > > Dec 20 01:44:22 r-687-VM cloud: VR config: configuation format version > 1.0 > > Dec 20 01:44:22 r-687-VM cloud: VR config: creating file: > > /var/cache/cloud/monitor_service.json > > Dec 20 01:44:22 r-687-VM cloud: VR config: create file success > > Dec 20 01:44:22 r-687-VM cloud: VR config: executing: > > /opt/cloud/bin/update_config.py monitor_service.json > > Dec 20 01:44:42 r-687-VM cloud: VR config: execution success > > Dec 20 01:44:42 r-687-VM cloud: VR config: creating file: > > /var/cache/cloud/vm_dhcp_entry.json > > Dec 20 01:44:42 r-687-VM cloud: VR config: create file success > > Dec 20 01:44:42 r-687-VM cloud: VR config: executing: > > /opt/cloud/bin/update_config.py vm_dhcp_entry.json > > Dec 20 01:45:05 r-687-VM cloud: VR config: execution success > > Dec 20 01:45:05 r-687-VM cloud: VR config: creating file: > > /var/cache/cloud/vm_dhcp_entry.json > > Dec 20 01:45:05 r-687-VM cloud: VR config: create file success > > Dec 20 01:45:05 r-687-VM cloud: VR config: executing: > > /opt/cloud/bin/update_config.py vm_dhcp_entry.json > > Dec 20 01:45:27 r-687-VM cloud: VR config: execution success > > Dec 20 01:45:27 r-687-VM cloud: VR config: creating file: > > /var/cache/cloud/vm_dhcp_entry.json > > Dec 20 01:45:27 r-687-VM cloud: VR config: create file success > > Dec 20 01:45:27 r-687-VM cloud: VR config: executing: > > /opt/cloud/bin/update_config.py vm_dhcp_entry.json > > Dec 20 01:45:49 r-687-VM cloud: VR config: execution success > > Dec 20 01:45:49 r-687-VM cloud: VR config: creating file: > > /var/cache/cloud/vm_dhcp_entry.json > > Dec 20 01:45:49 r-687-VM cloud: VR config: create file success > > Dec 20 01:45:49 r-687-VM cloud: VR config: executing: > > /opt/cloud/bin/update_config.py vm_dhcp_entry.json > > Dec 20 01:46:12 r-687-VM cloud: VR config: execution success > > Dec 20 01:46:12 r-687-VM cloud: VR config: creating file: > > /var/cache/cloud/vm_dhcp_entry.json > > Dec 20 
01:46:12 r-687-VM cloud: VR config: create file success > > Dec 20 01:46:12 r-687-VM cloud: VR config: executing: > > /opt/cloud/bin/update_config.py vm_dhcp_entry.json > > Dec 20 01:46:22 r-687-VM shutdown[3919]: shutting down for system halt > > > > Broadcast message from root@r-687-VM (Tue Dec 20 01:46:22 2016): > > > > The system is going down for system halt NOW! > > Dec 20 01:46:22 r-687-VM shutdown[3962]: shutting down for system halt > > > > Broadcast message from root@r-687-VM (Tue Dec 20 01:46:22 2016): > > > > Power button pressed > > The system is going down for system halt NOW! > > Dec 20 01:46:23 r-687-VM KVP: KVP starting; pid is:4037 > > Dec 20 01:46:23 r-687-VM cloud: VR config: executing failed: > > /opt/cloud/bin/update_config.py vm_dhcp_entry.json > > debug1: channel 0: free: client-session, nchannels 1 > > Connection to 169.254.0.197 closed by remote host. > > Connection to 169.254.0.197 closed. > > Transferred: sent 4336, received 93744 bytes, in 180.3 seconds > > Bytes per second: sent 24.0, received 519.8 > > debug1: Exit status -1 > > > > Looks like t
Re: Router VM: patchviasocket.py timeout issue on 1 out of 4 networks
1 r-686-VM cloud: VR config: execution success > Dec 20 01:39:41 r-686-VM cloud: VR config: Flushing conntrack table > Dec 20 01:39:41 r-686-VM cloud: VR config: Flushing conntrack table completed > Dec 20 01:39:42 r-686-VM cloud: VR config: configuation format version 1.0 > Dec 20 01:39:42 r-686-VM cloud: VR config: Flushing conntrack table > Dec 20 01:39:42 r-686-VM cloud: VR config: Flushing conntrack table completed > > 2. Non-working network router VM ( http://pastebin.com/jzfGMGQB ):- > . > > Dec 20 01:44:21 r-687-VM cloud: Boot up process done > Dec 20 01:44:22 r-687-VM cloud: VR config: configuation format version 1.0 > Dec 20 01:44:22 r-687-VM cloud: VR config: creating file: > /var/cache/cloud/monitor_service.json > Dec 20 01:44:22 r-687-VM cloud: VR config: create file success > Dec 20 01:44:22 r-687-VM cloud: VR config: executing: > /opt/cloud/bin/update_config.py monitor_service.json > Dec 20 01:44:42 r-687-VM cloud: VR config: execution success > Dec 20 01:44:42 r-687-VM cloud: VR config: creating file: > /var/cache/cloud/vm_dhcp_entry.json > Dec 20 01:44:42 r-687-VM cloud: VR config: create file success > Dec 20 01:44:42 r-687-VM cloud: VR config: executing: > /opt/cloud/bin/update_config.py vm_dhcp_entry.json > Dec 20 01:45:05 r-687-VM cloud: VR config: execution success > Dec 20 01:45:05 r-687-VM cloud: VR config: creating file: > /var/cache/cloud/vm_dhcp_entry.json > Dec 20 01:45:05 r-687-VM cloud: VR config: create file success > Dec 20 01:45:05 r-687-VM cloud: VR config: executing: > /opt/cloud/bin/update_config.py vm_dhcp_entry.json > Dec 20 01:45:27 r-687-VM cloud: VR config: execution success > Dec 20 01:45:27 r-687-VM cloud: VR config: creating file: > /var/cache/cloud/vm_dhcp_entry.json > Dec 20 01:45:27 r-687-VM cloud: VR config: create file success > Dec 20 01:45:27 r-687-VM cloud: VR config: executing: > /opt/cloud/bin/update_config.py vm_dhcp_entry.json > Dec 20 01:45:49 r-687-VM cloud: VR config: execution success > Dec 20 01:45:49 
r-687-VM cloud: VR config: creating file: > /var/cache/cloud/vm_dhcp_entry.json > Dec 20 01:45:49 r-687-VM cloud: VR config: create file success > Dec 20 01:45:49 r-687-VM cloud: VR config: executing: > /opt/cloud/bin/update_config.py vm_dhcp_entry.json > Dec 20 01:46:12 r-687-VM cloud: VR config: execution success > Dec 20 01:46:12 r-687-VM cloud: VR config: creating file: > /var/cache/cloud/vm_dhcp_entry.json > Dec 20 01:46:12 r-687-VM cloud: VR config: create file success > Dec 20 01:46:12 r-687-VM cloud: VR config: executing: > /opt/cloud/bin/update_config.py vm_dhcp_entry.json > Dec 20 01:46:22 r-687-VM shutdown[3919]: shutting down for system halt > > Broadcast message from root@r-687-VM (Tue Dec 20 01:46:22 2016): > > The system is going down for system halt NOW! > Dec 20 01:46:22 r-687-VM shutdown[3962]: shutting down for system halt > > Broadcast message from root@r-687-VM (Tue Dec 20 01:46:22 2016): > > Power button pressed > The system is going down for system halt NOW! > Dec 20 01:46:23 r-687-VM KVP: KVP starting; pid is:4037 > Dec 20 01:46:23 r-687-VM cloud: VR config: executing failed: > /opt/cloud/bin/update_config.py vm_dhcp_entry.json > debug1: channel 0: free: client-session, nchannels 1 > Connection to 169.254.0.197 closed by remote host. > Connection to 169.254.0.197 closed. > Transferred: sent 4336, received 93744 bytes, in 180.3 seconds > Bytes per second: sent 24.0, received 519.8 > debug1: Exit status -1 > > Looks like the config script didn't get past vm_dhcp_entry.json ? > > Thanks. 
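The two syslog excerpts above can be compared mechanically. What follows is a minimal Python sketch, not something posted in the thread, that pairs each "executing:" entry with the next "execution success" or "executing failed" entry and reports how long each update_config.py run took. It assumes every syslog entry sits on one line (the mail quoting above wraps them) and that the year is 2016. The name summarize is made up for illustration.

```python
import re
from datetime import datetime

# Matches entries such as:
#   Dec 20 01:44:22 r-687-VM cloud: VR config: executing: /opt/cloud/bin/update_config.py monitor_service.json
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+\S+\s+cloud:\s+VR config:\s+(?P<msg>.*)$"
)


def summarize(lines, year=2016):
    """Return (config_file, seconds, outcome) for every update_config.py run."""
    runs = []
    started = None  # (timestamp, config file) of the run currently in progress
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # not a "VR config" entry (e.g. shutdown messages)
        ts = datetime.strptime(f"{year} {m['ts']}", "%Y %b %d %H:%M:%S")
        msg = m["msg"]
        if msg.startswith("executing:"):
            started = (ts, msg.split()[-1])
        elif started and msg.startswith(("execution success", "executing failed")):
            outcome = "ok" if msg.startswith("execution success") else "failed"
            seconds = int((ts - started[0]).total_seconds())
            runs.append((started[1], seconds, outcome))
            started = None
    return runs
```

Fed the r-687-VM excerpt, this would show the completed runs taking roughly 20 to 23 seconds each, and the last vm_dhcp_entry.json run failing 11 seconds in, right after the shutdown logged at 01:46:22.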
> >> >> >> >> >> From: Syahrul Sazli Shaharir <sa...@pulasan.my> >> Sent: Monday, December 19, 2016 2:09 AM >> To: users@cloudstack.apache.org >> Subject: Re: Router VM: patchviasocket.py timeout issue on 1 out of 4 >> networks >> >> On Tue, Dec 13, 2016 at 7:26 PM, Syahrul Sazli Shaharir >> <sa...@pulasan.my> wrote: >>> Hi Simon, >>> >>> On Tue, Dec 13, 2016 at 10:31 AM, Simon Weller <swel...@ena.com> wrote: >>>> Can you turn on agent debug mode and take a look at the debug level logs? >>>> >>>> >>>> You can do that by running sed -i 's/INFO/DEBUG/g' >>>> /etc/cloudstack/agent/log4j-cloud.xml on the host and then restarting the >>>> agent. >>> >>> Here are the debug logs - patchviasocket.py executed OK but couldn't >>> connect to the router VM's internal IP:- >>> >>> 2016-12-13 19:23:18,627 DEBUG [kvm.resource.LibvirtComputingResource] >>> (agentRequest-Handler-4:null) (logid:0bf9a356) Executing: >>> /usr/share/cloudstack-common/scripts/vm/hy
Re: Router VM: patchviasocket.py timeout issue on 1 out of 4 networks
ting file: /var/cache/cloud/vm_dhcp_entry.json Dec 20 01:44:42 r-687-VM cloud: VR config: create file success Dec 20 01:44:42 r-687-VM cloud: VR config: executing: /opt/cloud/bin/update_config.py vm_dhcp_entry.json Dec 20 01:45:05 r-687-VM cloud: VR config: execution success Dec 20 01:45:05 r-687-VM cloud: VR config: creating file: /var/cache/cloud/vm_dhcp_entry.json Dec 20 01:45:05 r-687-VM cloud: VR config: create file success Dec 20 01:45:05 r-687-VM cloud: VR config: executing: /opt/cloud/bin/update_config.py vm_dhcp_entry.json Dec 20 01:45:27 r-687-VM cloud: VR config: execution success Dec 20 01:45:27 r-687-VM cloud: VR config: creating file: /var/cache/cloud/vm_dhcp_entry.json Dec 20 01:45:27 r-687-VM cloud: VR config: create file success Dec 20 01:45:27 r-687-VM cloud: VR config: executing: /opt/cloud/bin/update_config.py vm_dhcp_entry.json Dec 20 01:45:49 r-687-VM cloud: VR config: execution success Dec 20 01:45:49 r-687-VM cloud: VR config: creating file: /var/cache/cloud/vm_dhcp_entry.json Dec 20 01:45:49 r-687-VM cloud: VR config: create file success Dec 20 01:45:49 r-687-VM cloud: VR config: executing: /opt/cloud/bin/update_config.py vm_dhcp_entry.json Dec 20 01:46:12 r-687-VM cloud: VR config: execution success Dec 20 01:46:12 r-687-VM cloud: VR config: creating file: /var/cache/cloud/vm_dhcp_entry.json Dec 20 01:46:12 r-687-VM cloud: VR config: create file success Dec 20 01:46:12 r-687-VM cloud: VR config: executing: /opt/cloud/bin/update_config.py vm_dhcp_entry.json Dec 20 01:46:22 r-687-VM shutdown[3919]: shutting down for system halt Broadcast message from root@r-687-VM (Tue Dec 20 01:46:22 2016): The system is going down for system halt NOW! Dec 20 01:46:22 r-687-VM shutdown[3962]: shutting down for system halt Broadcast message from root@r-687-VM (Tue Dec 20 01:46:22 2016): Power button pressed The system is going down for system halt NOW! 
Dec 20 01:46:23 r-687-VM KVP: KVP starting; pid is:4037 Dec 20 01:46:23 r-687-VM cloud: VR config: executing failed: /opt/cloud/bin/update_config.py vm_dhcp_entry.json debug1: channel 0: free: client-session, nchannels 1 Connection to 169.254.0.197 closed by remote host. Connection to 169.254.0.197 closed. Transferred: sent 4336, received 93744 bytes, in 180.3 seconds Bytes per second: sent 24.0, received 519.8 debug1: Exit status -1 Looks like the config script didn't get past vm_dhcp_entry.json ? Thanks. > > > > > From: Syahrul Sazli Shaharir <sa...@pulasan.my> > Sent: Monday, December 19, 2016 2:09 AM > To: users@cloudstack.apache.org > Subject: Re: Router VM: patchviasocket.py timeout issue on 1 out of 4 networks > > On Tue, Dec 13, 2016 at 7:26 PM, Syahrul Sazli Shaharir > <sa...@pulasan.my> wrote: >> Hi Simon, >> >> On Tue, Dec 13, 2016 at 10:31 AM, Simon Weller <swel...@ena.com> wrote: >>> Can you turn on agent debug mode and take a look at the debug level logs? >>> >>> >>> You can do that by running sed -i 's/INFO/DEBUG/g' >>> /etc/cloudstack/agent/log4j-cloud.xml on the host and then restarting the >>> agent. 
>> >> Here are the debug logs - patchviasocket.py executed OK but couldn't >> connect to the router VM's internal IP:- >> >> 2016-12-13 19:23:18,627 DEBUG [kvm.resource.LibvirtComputingResource] >> (agentRequest-Handler-4:null) (logid:0bf9a356) Executing: >> /usr/share/cloudstack-common/scripts/vm/hypervisor/kvm/patchviasocket.py >> -n r-669-VM -p >> %template=domP%name=r-669-VM%eth0ip=10.3.28.10%eth0mask=255.255.255.0%gateway=10.3.28.1%domain=nocser.net%cidrsize=24%dhcprange=10.3.28.1%eth1ip=169.254.3.7%eth1mask=255.255.0.0%type=dhcpsrvr%disable_rp_filter=true%dns1=8.8.8.8%dns2=8.8.4.4%ip6dns1=%ip6dns2=%baremetalnotificationsecuritykey=uavJByNGGjNLrELG-qbdN99__1I3tnp8qa0KbcsKokKJcPB43K9s6oQu2nMLqo3YP8p6jqDy5XT3WWOWBA2yNw%baremetalnotificationapikey=8JH4mdkxsEMhgIBgMonkNXAEKjVOeZnG1m5UVekvvo4v_iXQ4ZS7rh6NNS0qphhc7ZrCauiz23tp2-Wa3AASlg%host=10.2.30.11%port=8080 >> 2016-12-13 19:23:18,739 DEBUG [kvm.resource.LibvirtComputingResource] >> (agentRequest-Handler-4:null) (logid:0bf9a356) Execution is >> successful. >> 2016-12-13 19:23:18,742 DEBUG >> [resource.virtualnetwork.VirtualRoutingResource] >> (agentRequest-Handler-4:null) (logid:0bf9a356) Trying to connect to >> 169.254.3.7 >> 2016-12-13 19:23:21,749 DEBUG >> [resource.virtualnetwork.VirtualRoutingResource] >> (agentRequest-Handler-4:null) (logid:0bf9a356) Could not connect to >> 169.254.3.7 >> 2016-12-13 19:23:26,750 DEBUG >> [resource.virtualnetwork.VirtualRoutingResource] >> (agentRequest-Handler-4:null) (logid:0bf9a356) Trying to connect to >> 169.254.3.7 >> 2016-12-13 19:23:29,757 DEBUG >> [resource.virtualnetwork.VirtualR
Re: Router VM: patchviasocket.py timeout issue on 1 out of 4 networks
When you're in the console, can you ping the host IP? What are your iptables rules on this host currently? Can you dump the routing table as well?

Have you tried a restart of one of the working networks to see if it fails on restart?

From: Syahrul Sazli Shaharir <sa...@pulasan.my>
Sent: Monday, December 19, 2016 2:09 AM
To: users@cloudstack.apache.org
Subject: Re: Router VM: patchviasocket.py timeout issue on 1 out of 4 networks

On Tue, Dec 13, 2016 at 7:26 PM, Syahrul Sazli Shaharir <sa...@pulasan.my> wrote:
> Hi Simon,
>
> On Tue, Dec 13, 2016 at 10:31 AM, Simon Weller <swel...@ena.com> wrote:
>> Can you turn on agent debug mode and take a look at the debug level logs?
>>
>> You can do that by running sed -i 's/INFO/DEBUG/g' /etc/cloudstack/agent/log4j-cloud.xml on the host and then restarting the agent.
>
> Here are the debug logs - patchviasocket.py executed OK but couldn't connect to the router VM's internal IP:-
>
> 2016-12-13 19:23:18,627 DEBUG [kvm.resource.LibvirtComputingResource] (agentRequest-Handler-4:null) (logid:0bf9a356) Executing: /usr/share/cloudstack-common/scripts/vm/hypervisor/kvm/patchviasocket.py -n r-669-VM -p %template=domP%name=r-669-VM%eth0ip=10.3.28.10%eth0mask=255.255.255.0%gateway=10.3.28.1%domain=nocser.net%cidrsize=24%dhcprange=10.3.28.1%eth1ip=169.254.3.7%eth1mask=255.255.0.0%type=dhcpsrvr%disable_rp_filter=true%dns1=8.8.8.8%dns2=8.8.4.4%ip6dns1=%ip6dns2=%baremetalnotificationsecuritykey=uavJByNGGjNLrELG-qbdN99__1I3tnp8qa0KbcsKokKJcPB43K9s6oQu2nMLqo3YP8p6jqDy5XT3WWOWBA2yNw%baremetalnotificationapikey=8JH4mdkxsEMhgIBgMonkNXAEKjVOeZnG1m5UVekvvo4v_iXQ4ZS7rh6NNS0qphhc7ZrCauiz23tp2-Wa3AASlg%host=10.2.30.11%port=8080
> 2016-12-13 19:23:18,739 DEBUG [kvm.resource.LibvirtComputingResource] (agentRequest-Handler-4:null) (logid:0bf9a356) Execution is successful.
> 2016-12-13 19:23:18,742 DEBUG [resource.virtualnetwork.VirtualRoutingResource] (agentRequest-Handler-4:null) (logid:0bf9a356) Trying to connect to 169.254.3.7
> 2016-12-13 19:23:21,749 DEBUG [resource.virtualnetwork.VirtualRoutingResource] (agentRequest-Handler-4:null) (logid:0bf9a356) Could not connect to 169.254.3.7
> 2016-12-13 19:23:26,750 DEBUG [resource.virtualnetwork.VirtualRoutingResource] (agentRequest-Handler-4:null) (logid:0bf9a356) Trying to connect to 169.254.3.7
> 2016-12-13 19:23:29,757 DEBUG [resource.virtualnetwork.VirtualRoutingResource] (agentRequest-Handler-4:null) (logid:0bf9a356) Could not connect to 169.254.3.7
> 2016-12-13 19:23:29,869 DEBUG [cloud.agent.Agent] (agentRequest-Handler-5:null) (logid:981a5f6f) Processing command: com.cloud.agent.api.GetHostStatsCommand
> 2016-12-13 19:23:34,759 DEBUG [resource.virtualnetwork.VirtualRoutingResource] (agentRequest-Handler-4:null) (logid:0bf9a356) Unable to logon to 169.254.3.7
>
> virsh console also failed to show anything.

Ok after upgrading to latest qemu-kvm-ev-2.6.0-27.1.el7, this time I got to the console at some stage, but patchviasocket.py still times out. Here is the console output:-

http://pastebin.com/n37aHeSa

Thanks.

>>
>> From: Syahrul Sazli Shaharir <sa...@pulasan.my>
>> Sent: Monday, December 12, 2016 8:21 PM
>> To: users@cloudstack.apache.org
>> Subject: Router VM: patchviasocket.py timeout issue on 1 out of 4 networks
>>
>> Hi,
>>
>> I am running latest Cloudstack 4.9.0.1 on CentOS 7 KVM + ceph environment. After running for some time, I faced with an issue with one out of 4 networks - following a heartbeat-induced reset on all hosts, the associated virtual router would not get recreated and started properly on any of the 3 hosts I have, even after repeated attempts of the following:-
>> - destroy-recreate cycles, via Cloudstack UI
>> - restartNetwork cleanup=true API calls (failed with errorcode = 530).
>> - redownload and reregister system VM template as another entry and assign to router VM in global setting (boots the new template OK, but still same problem)
>> - tweak default system offering for router VM (increased RAM from 256 to 512MB)
>> - created new system offering, with RAM tweak, and use of ceph rbd store, and assigned it to Cloud.Com-SoftwareRouter as per docs - which didnt work for some reason: it kept on using initial default offering and created image on local host storage
>> - upgrade to latest cloudstack (previously was running 4.8)
>>
>> As
Re: Router VM: patchviasocket.py timeout issue on 1 out of 4 networks
On Tue, Dec 13, 2016 at 7:26 PM, Syahrul Sazli Shaharir wrote:
> Hi Simon,
>
> On Tue, Dec 13, 2016 at 10:31 AM, Simon Weller wrote:
>> Can you turn on agent debug mode and take a look at the debug level logs?
>>
>> You can do that by running sed -i 's/INFO/DEBUG/g' /etc/cloudstack/agent/log4j-cloud.xml on the host and then restarting the agent.
>
> Here are the debug logs - patchviasocket.py executed OK but couldn't connect to the router VM's internal IP:-
>
> 2016-12-13 19:23:18,627 DEBUG [kvm.resource.LibvirtComputingResource] (agentRequest-Handler-4:null) (logid:0bf9a356) Executing: /usr/share/cloudstack-common/scripts/vm/hypervisor/kvm/patchviasocket.py -n r-669-VM -p %template=domP%name=r-669-VM%eth0ip=10.3.28.10%eth0mask=255.255.255.0%gateway=10.3.28.1%domain=nocser.net%cidrsize=24%dhcprange=10.3.28.1%eth1ip=169.254.3.7%eth1mask=255.255.0.0%type=dhcpsrvr%disable_rp_filter=true%dns1=8.8.8.8%dns2=8.8.4.4%ip6dns1=%ip6dns2=%baremetalnotificationsecuritykey=uavJByNGGjNLrELG-qbdN99__1I3tnp8qa0KbcsKokKJcPB43K9s6oQu2nMLqo3YP8p6jqDy5XT3WWOWBA2yNw%baremetalnotificationapikey=8JH4mdkxsEMhgIBgMonkNXAEKjVOeZnG1m5UVekvvo4v_iXQ4ZS7rh6NNS0qphhc7ZrCauiz23tp2-Wa3AASlg%host=10.2.30.11%port=8080
> 2016-12-13 19:23:18,739 DEBUG [kvm.resource.LibvirtComputingResource] (agentRequest-Handler-4:null) (logid:0bf9a356) Execution is successful.
> 2016-12-13 19:23:18,742 DEBUG > [resource.virtualnetwork.VirtualRoutingResource] > (agentRequest-Handler-4:null) (logid:0bf9a356) Trying to connect to > 169.254.3.7 > 2016-12-13 19:23:21,749 DEBUG > [resource.virtualnetwork.VirtualRoutingResource] > (agentRequest-Handler-4:null) (logid:0bf9a356) Could not connect to > 169.254.3.7 > 2016-12-13 19:23:26,750 DEBUG > [resource.virtualnetwork.VirtualRoutingResource] > (agentRequest-Handler-4:null) (logid:0bf9a356) Trying to connect to > 169.254.3.7 > 2016-12-13 19:23:29,757 DEBUG > [resource.virtualnetwork.VirtualRoutingResource] > (agentRequest-Handler-4:null) (logid:0bf9a356) Could not connect to > 169.254.3.7 > 2016-12-13 19:23:29,869 DEBUG [cloud.agent.Agent] > (agentRequest-Handler-5:null) (logid:981a5f6f) Processing command: > com.cloud.agent.api.GetHostStatsCommand > 2016-12-13 19:23:34,759 DEBUG > [resource.virtualnetwork.VirtualRoutingResource] > (agentRequest-Handler-4:null) (logid:0bf9a356) Unable to logon to > 169.254.3.7 > > virsh console also failed to show anything. Ok after upgrading to latest qemu-kvm-ev-2.6.0-27.1.el7, this time I got to the console at some stage, but patchviasocket.py still times out. Here are the console output:- http://pastebin.com/n37aHeSa Thanks. >> >> From: Syahrul Sazli Shaharir >> Sent: Monday, December 12, 2016 8:21 PM >> To: users@cloudstack.apache.org >> Subject: Router VM: patchviasocket.py timeout issue on 1 out of 4 networks >> >> Hi, >> >> I am running latest Cloudstack 4.9.0.1 on CentOS 7 KVM + ceph >> environment. After running for some time, I faced with an issue with >> one out of 4 networks - following a heartbeat-induced reset on all >> hosts, the associated virtual router would not get recreated and >> started properly on any of the 3 hosts I have, even after repeated >> attempts of the following:- >> - destroy-recreate cycles, via Cloudstack UI >> - restartNetwork cleanup=true API calls (failed with errorcode = 530). 
>> - redownload and reregister system VM template as another entry and >> assign to router VM in global setting (boots the new template OK, but >> still same problem) >> - tweak default system offering for router VM (increased RAM from 256 to >> 512MB) >> - created new system offering, with RAM tweak, and use of ceph rbd >> store, and assigned it to Cloud.Com-SoftwareRouter as per docs - which >> didnt work for some reason: it kept on using initial default offering >> and created image on local host storage >> - upgrade to latest cloudstack (previously was running 4.8) >> >> As with a handful of others in this list archives, virsh list and >> dumpxml shows the VM created OK but failed soon after booting, as >> found in the following error in agent.log :- >> >> 2016-12-13 10:03:33,894 WARN [kvm.resource.LibvirtComputingResource] >> (agentRequest-Handler-1:null) (logid:633e6e03) Timed out: >> /usr/share/cloudstack-common/scripts/vm/hypervisor/kvm/patchviasocket.py >> -n r-668-VM -p >> %template=domP%name=r-668-VM%eth0ip=10.3.28.10%eth0mask=255.255.255.0%gateway=10.3.28.1%domain=nocser.net%cidrsize=24%dhcprange=10.3.28.1%eth1ip=169.254.0.33%eth1mask=255.255.0.0%type=dhcpsrvr%disable_rp_filter=true%dns1=8.8.8.8%dns2=8.8.4.4%ip6dns1=%ip6dns2=%baremetalnotificationsecuritykey=uavJByNGGjNLrELG-qbdN99__1I3tnp8qa0KbcsKokKJcPB43K9s6oQu2nMLqo3YP8p6jqDy5XT3WWOWBA2yNw%baremetalnotificationapikey=8JH4mdkxsEMhgIBgMonkNXAEKjVOeZnG1m5UVekvvo4v_iXQ4ZS7rh6NNS0qphhc7ZrCauiz23tp2-Wa3AASlg%host=10.2.30.11%port=8080 >> . Output is: >> . >> 2016-12-13 10:05:45,895
Re: Router VM: patchviasocket.py timeout issue on 1 out of 4 networks
On Fri, Dec 16, 2016 at 5:16 PM, Dag Sonstebo wrote:
> Hi Syahrul,
>
> It just struck me we had similar issues with patchviasocket.py and python-argparse with one of our clients a while back, I believe our fix is going into 4.9.1.0:
>
> https://github.com/apache/cloudstack/pull/1634

Hi Dag,

As I'm already running centos 7 with python 2.7, would this still apply?

Thanks.

> Regards,
> Dag Sonstebo
> Cloud Architect
> ShapeBlue
>
> On 15/12/2016, 23:09, "Syahrul Sazli Shaharir" wrote:
>
> Hi Ilya,
>
> I've looked at the patch suggested, looks like it has been committed into qemu 2.4.0, and I can see the modified parts in the latest qemu 2.6.0 code. So I went ahead and installed qemu-kvm-ev-2.6.0-27.1 on one of the hosts. But the problem still persists. Perhaps I should bring this issue to that dev thread.
>
> Thanks for the help! :)
>
> On Thu, Dec 15, 2016 at 11:03 AM, ilya wrote:
> > This will explain a bit more on how this issue came about and how to fix it..
> > https://www.mail-archive.com/dev@cloudstack.apache.org/msg71559.html
> >
> > On 12/12/16 6:31 PM, Simon Weller wrote:
> >> Can you turn on agent debug mode and take a look at the debug level logs?
> >>
> >> You can do that by running sed -i 's/INFO/DEBUG/g' /etc/cloudstack/agent/log4j-cloud.xml on the host and then restarting the agent.
> >>
> >> - Si
> >>
> >> From: Syahrul Sazli Shaharir
> >> Sent: Monday, December 12, 2016 8:21 PM
> >> To: users@cloudstack.apache.org
> >> Subject: Router VM: patchviasocket.py timeout issue on 1 out of 4 networks
> >>
> >> Hi,
> >>
> >> I am running latest Cloudstack 4.9.0.1 on CentOS 7 KVM + ceph
> >> environment.
After running for some time, I faced with an issue with > >> one out of 4 networks - following a heartbeat-induced reset on all > >> hosts, the associated virtual router would not get recreated and > >> started properly on any of the 3 hosts I have, even after repeated > >> attempts of the following:- > >> - destroy-recreate cycles, via Cloudstack UI > >> - restartNetwork cleanup=true API calls (failed with errorcode = 530). > >> - redownload and reregister system VM template as another entry and > >> assign to router VM in global setting (boots the new template OK, but > >> still same problem) > >> - tweak default system offering for router VM (increased RAM from 256 > to 512MB) > >> - created new system offering, with RAM tweak, and use of ceph rbd > >> store, and assigned it to Cloud.Com-SoftwareRouter as per docs - which > >> didnt work for some reason: it kept on using initial default offering > >> and created image on local host storage > >> - upgrade to latest cloudstack (previously was running 4.8) > >> > >> As with a handful of others in this list archives, virsh list and > >> dumpxml shows the VM created OK but failed soon after booting, as > >> found in the following error in agent.log :- > >> > >> 2016-12-13 10:03:33,894 WARN [kvm.resource.LibvirtComputingResource] > >> (agentRequest-Handler-1:null) (logid:633e6e03) Timed out: > >> > /usr/share/cloudstack-common/scripts/vm/hypervisor/kvm/patchviasocket.py > >> -n r-668-VM -p > %template=domP%name=r-668-VM%eth0ip=10.3.28.10%eth0mask=255.255.255.0%gateway=10.3.28.1%domain=nocser.net%cidrsize=24%dhcprange=10.3.28.1%eth1ip=169.254.0.33%eth1mask=255.255.0.0%type=dhcpsrvr%disable_rp_filter=true%dns1=8.8.8.8%dns2=8.8.4.4%ip6dns1=%ip6dns2=%baremetalnotificationsecuritykey=uavJByNGGjNLrELG-qbdN99__1I3tnp8qa0KbcsKokKJcPB43K9s6oQu2nMLqo3YP8p6jqDy5XT3WWOWBA2yNw%baremetalnotificationapikey=8JH4mdkxsEMhgIBgMonkNXAEKjVOeZnG1m5UVekvvo4v_iXQ4ZS7rh6NNS0qphhc7ZrCauiz23tp2-Wa3AASlg%host=10.2.30.11%port=8080 > >> . 
Output is: > >> . > >> 2016-12-13 10:05:45,895 WARN [kvm.resource.LibvirtComputingResource] > >> (agentRequest-Handler-1:null) (logid:633e6e03) Timed out: > >> /usr/share/cloudstack-common/scripts/network/domr/router_proxy.sh > >> vr_cfg.sh 169.254.0.33 -c > >> /var/cache/cloud/VR-48ea8a95-6c02-499f-88d3-eae5bf9f9fbe.cfg . Output > >> is: > >> > >> As mentioned, this only happens with 1 network (always the same > >> network). The other router VMs work OK. Any clues on how to > >> troubleshoot this further, would be greatly appreciated. > >> > >> Thanks. > >> > >> -- > >> --sazli > >> Syahrul Sazli Shaharir > >> > > > > -- > --sazli > Syahrul Sazli Shaharir > Mobile: +6019 385 8301 - YM/Skype: syahrulsazli > System Administrator > TMK Pulasan (002339810-M) http://pulasan.my/ >
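The -p argument in the agent.log entry quoted above is a single string of %key=value pairs. A small Python sketch, assuming values never contain a literal % (nothing in this thread confirms or rules that out), splits it into a dictionary so the addresses handed to the router VM are easier to check. parse_boot_args is a hypothetical helper, and the sample string is abbreviated from the log with the two key fields left out.

```python
import ipaddress


def parse_boot_args(cmdline):
    """Split a '%key=value%key=value' string into a dict (insertion-ordered)."""
    args = {}
    for pair in cmdline.split("%"):
        if not pair:
            continue  # the string starts with '%', so the first piece is empty
        key, _, value = pair.partition("=")
        args[key] = value  # empty values (e.g. ip6dns1=) are kept as ""
    return args


# Abbreviated from the r-668-VM log entry; the two key fields are omitted.
sample = (
    "%template=domP%name=r-668-VM%eth0ip=10.3.28.10%eth0mask=255.255.255.0"
    "%gateway=10.3.28.1%eth1ip=169.254.0.33%eth1mask=255.255.0.0"
    "%type=dhcpsrvr%ip6dns1=%ip6dns2=%host=10.2.30.11%port=8080"
)

args = parse_boot_args(sample)
control_ip = ipaddress.ip_address(args["eth1ip"])
# eth1 carries the link-local address the agent later tries to reach.
print(args["name"], control_ip, control_ip.is_link_local)  # r-668-VM 169.254.0.33 True
```

Comparing the parsed eth1ip against the address in the later router_proxy.sh timeout is a quick way to confirm both steps target the same VM.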
Re: Router VM: patchviasocket.py timeout issue on 1 out of 4 networks
Hi Syahrul,

It just struck me we had similar issues with patchviasocket.py and python-argparse with one of our clients a while back, I believe our fix is going into 4.9.1.0:

https://github.com/apache/cloudstack/pull/1634

Regards,
Dag Sonstebo
Cloud Architect
ShapeBlue

On 15/12/2016, 23:09, "Syahrul Sazli Shaharir" wrote:

Hi Ilya,

I've looked at the patch suggested, looks like it has been committed into qemu 2.4.0, and I can see the modified parts in the latest qemu 2.6.0 code. So I went ahead and installed qemu-kvm-ev-2.6.0-27.1 on one of the hosts. But the problem still persists. Perhaps I should bring this issue to that dev thread.

Thanks for the help! :)

On Thu, Dec 15, 2016 at 11:03 AM, ilya wrote:
> This will explain a bit more on how this issue came about and how to fix it..
> https://www.mail-archive.com/dev@cloudstack.apache.org/msg71559.html
>
> On 12/12/16 6:31 PM, Simon Weller wrote:
>> Can you turn on agent debug mode and take a look at the debug level logs?
>>
>> You can do that by running sed -i 's/INFO/DEBUG/g' /etc/cloudstack/agent/log4j-cloud.xml on the host and then restarting the agent.
>>
>> - Si
>>
>> From: Syahrul Sazli Shaharir
>> Sent: Monday, December 12, 2016 8:21 PM
>> To: users@cloudstack.apache.org
>> Subject: Router VM: patchviasocket.py timeout issue on 1 out of 4 networks
>>
>> Hi,
>>
>> I am running latest Cloudstack 4.9.0.1 on CentOS 7 KVM + ceph environment. After running for some time, I faced with an issue with one out of 4 networks - following a heartbeat-induced reset on all hosts, the associated virtual router would not get recreated and started properly on any of the 3 hosts I have, even after repeated attempts of the following:-
>> - destroy-recreate cycles, via Cloudstack UI
>> - restartNetwork cleanup=true API calls (failed with errorcode = 530).
>> - redownload and reregister system VM template as another entry and >> assign to router VM in global setting (boots the new template OK, but >> still same problem) >> - tweak default system offering for router VM (increased RAM from 256 to 512MB) >> - created new system offering, with RAM tweak, and use of ceph rbd >> store, and assigned it to Cloud.Com-SoftwareRouter as per docs - which >> didnt work for some reason: it kept on using initial default offering >> and created image on local host storage >> - upgrade to latest cloudstack (previously was running 4.8) >> >> As with a handful of others in this list archives, virsh list and >> dumpxml shows the VM created OK but failed soon after booting, as >> found in the following error in agent.log :- >> >> 2016-12-13 10:03:33,894 WARN [kvm.resource.LibvirtComputingResource] >> (agentRequest-Handler-1:null) (logid:633e6e03) Timed out: >> /usr/share/cloudstack-common/scripts/vm/hypervisor/kvm/patchviasocket.py >> -n r-668-VM -p %template=domP%name=r-668-VM%eth0ip=10.3.28.10%eth0mask=255.255.255.0%gateway=10.3.28.1%domain=nocser.net%cidrsize=24%dhcprange=10.3.28.1%eth1ip=169.254.0.33%eth1mask=255.255.0.0%type=dhcpsrvr%disable_rp_filter=true%dns1=8.8.8.8%dns2=8.8.4.4%ip6dns1=%ip6dns2=%baremetalnotificationsecuritykey=uavJByNGGjNLrELG-qbdN99__1I3tnp8qa0KbcsKokKJcPB43K9s6oQu2nMLqo3YP8p6jqDy5XT3WWOWBA2yNw%baremetalnotificationapikey=8JH4mdkxsEMhgIBgMonkNXAEKjVOeZnG1m5UVekvvo4v_iXQ4ZS7rh6NNS0qphhc7ZrCauiz23tp2-Wa3AASlg%host=10.2.30.11%port=8080 >> . Output is: >> . >> 2016-12-13 10:05:45,895 WARN [kvm.resource.LibvirtComputingResource] >> (agentRequest-Handler-1:null) (logid:633e6e03) Timed out: >> /usr/share/cloudstack-common/scripts/network/domr/router_proxy.sh >> vr_cfg.sh 169.254.0.33 -c >> /var/cache/cloud/VR-48ea8a95-6c02-499f-88d3-eae5bf9f9fbe.cfg . Output >> is: >> >> As mentioned, this only happens with 1 network (always the same >> network). The other router VMs work OK. 
Any clues on how to troubleshoot this further, would be greatly appreciated.
>>
>> Thanks.
>>
>> --
>> --sazli
>> Syahrul Sazli Shaharir
>>

--
--sazli
Syahrul Sazli Shaharir
Mobile: +6019 385 8301 - YM/Skype: syahrulsazli
System Administrator
TMK Pulasan (002339810-M) http://pulasan.my/
11 Jalan 3/4, 43650 Bandar Baru Bangi, Selangor, Malaysia.
Tel/Fax: +603 8926 0338

dag.sonst...@shapeblue.com
www.shapeblue.com
53 Chandos Place, Covent Garden, London WC2N 4HS UK
@shapeblue
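On the question of how to troubleshoot further, one quick check from the KVM host is whether the router VM's link-local address accepts TCP connections at all. A hedged Python sketch follows. The 3-second timeout and 5-second pause mirror the spacing of the "Trying to connect" and "Could not connect" entries in the agent debug log elsewhere in this thread. Port 3922 is an assumption (the SSH port commonly used for CloudStack system VMs); adjust it if your setup differs. wait_for_port is an illustrative name, not a CloudStack function.

```python
import socket
import time


def wait_for_port(host, port, attempts=3, timeout=3.0, pause=5.0):
    """Try a TCP connect up to `attempts` times; return True on the first success."""
    for attempt in range(1, attempts + 1):
        try:
            # create_connection raises OSError on refusal, timeout or no route.
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:
            if attempt < attempts:
                time.sleep(pause)  # wait before the next attempt
    return False


# Example (run on the KVM host; address taken from the agent.log above,
# port 3922 is an assumption):
# print(wait_for_port("169.254.0.33", 3922))
```

A False result while the VM is visibly running in virsh list would point at the link-local bridge or the guest's eth1 setup rather than at the config scripts themselves.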
Re: Router VM: patchviasocket.py timeout issue on 1 out of 4 networks
Hi Ilya,

I've looked at the suggested patch. It appears to have been committed in qemu 2.4.0, and I can see the modified parts in the latest qemu 2.6.0 code. So I went ahead and installed qemu-kvm-ev-2.6.0-27.1 on one of the hosts, but the problem still persists. Perhaps I should bring this issue to that dev thread.

Thanks for the help! :)

On Thu, Dec 15, 2016 at 11:03 AM, ilya wrote:
> This will explain a bit more on how this issue came about and how to fix
> it..
> https://www.mail-archive.com/dev@cloudstack.apache.org/msg71559.html
>
> On 12/12/16 6:31 PM, Simon Weller wrote:
>> Can you turn on agent debug mode and take a look at the debug level logs?
>>
>> You can do that by running sed -i 's/INFO/DEBUG/g'
>> /etc/cloudstack/agent/log4j-cloud.xml on the host and then restarting the
>> agent.
>>
>> - Si

--
--sazli
Syahrul Sazli Shaharir
Re: Router VM: patchviasocket.py timeout issue on 1 out of 4 networks
This will explain a bit more on how this issue came about and how to fix it:
https://www.mail-archive.com/dev@cloudstack.apache.org/msg71559.html

On 12/12/16 6:31 PM, Simon Weller wrote:
> Can you turn on agent debug mode and take a look at the debug level logs?
>
> You can do that by running sed -i 's/INFO/DEBUG/g'
> /etc/cloudstack/agent/log4j-cloud.xml on the host and then restarting the
> agent.
>
> - Si
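Once the agent is logging at DEBUG level, the entries of most interest for this problem are the ones whose message starts with "Timed out:". Below is a minimal Python sketch for pulling those out of agent.log. The line layout (timestamp, level, logger, thread, logid, message) is inferred only from the excerpts quoted in this thread, and the function names are hypothetical helpers, not part of CloudStack.

```python
import re

# Layout inferred from the log excerpts in this thread (an assumption):
# <timestamp> <LEVEL> [<logger>] (<thread>) (logid:<id>) <message>
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+"
    r"(?P<level>[A-Z]+)\s+\[(?P<logger>[^\]]+)\]\s+"
    r"\((?P<thread>[^)]*)\)\s+\(logid:(?P<logid>[^)]+)\)\s+"
    r"(?P<msg>.*)$"
)


def parse_line(line):
    """Split one agent.log line into its fields, or return None if it does not match."""
    match = LINE_RE.match(line.rstrip("\n"))
    return match.groupdict() if match else None


def timed_out_scripts(lines):
    """Yield (timestamp, logid, script_path) for every 'Timed out:' entry."""
    prefix = "Timed out:"
    for line in lines:
        record = parse_line(line)
        if record and record["msg"].startswith(prefix):
            parts = record["msg"][len(prefix):].split()
            yield record["ts"], record["logid"], parts[0] if parts else ""
```

To use it, open the agent log (usually /var/log/cloudstack/agent/agent.log, though that path is not stated in this thread) and pass the file object to timed_out_scripts(). The logid it returns can then be used to find the surrounding DEBUG lines for the same request.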
Re: Router VM: patchviasocket.py timeout issue on 1 out of 4 networks
Hi Simon,

On Tue, Dec 13, 2016 at 10:31 AM, Simon Weller wrote:
> Can you turn on agent debug mode and take a look at the debug level logs?
>
> You can do that by running sed -i 's/INFO/DEBUG/g'
> /etc/cloudstack/agent/log4j-cloud.xml on the host and then restarting the
> agent.

Here are the debug logs. patchviasocket.py executed OK, but the agent then couldn't connect to the router VM's internal IP:

2016-12-13 19:23:18,627 DEBUG [kvm.resource.LibvirtComputingResource] (agentRequest-Handler-4:null) (logid:0bf9a356) Executing: /usr/share/cloudstack-common/scripts/vm/hypervisor/kvm/patchviasocket.py -n r-669-VM -p %template=domP%name=r-669-VM%eth0ip=10.3.28.10%eth0mask=255.255.255.0%gateway=10.3.28.1%domain=nocser.net%cidrsize=24%dhcprange=10.3.28.1%eth1ip=169.254.3.7%eth1mask=255.255.0.0%type=dhcpsrvr%disable_rp_filter=true%dns1=8.8.8.8%dns2=8.8.4.4%ip6dns1=%ip6dns2=%baremetalnotificationsecuritykey=uavJByNGGjNLrELG-qbdN99__1I3tnp8qa0KbcsKokKJcPB43K9s6oQu2nMLqo3YP8p6jqDy5XT3WWOWBA2yNw%baremetalnotificationapikey=8JH4mdkxsEMhgIBgMonkNXAEKjVOeZnG1m5UVekvvo4v_iXQ4ZS7rh6NNS0qphhc7ZrCauiz23tp2-Wa3AASlg%host=10.2.30.11%port=8080
2016-12-13 19:23:18,739 DEBUG [kvm.resource.LibvirtComputingResource] (agentRequest-Handler-4:null) (logid:0bf9a356) Execution is successful.
2016-12-13 19:23:18,742 DEBUG [resource.virtualnetwork.VirtualRoutingResource] (agentRequest-Handler-4:null) (logid:0bf9a356) Trying to connect to 169.254.3.7
2016-12-13 19:23:21,749 DEBUG [resource.virtualnetwork.VirtualRoutingResource] (agentRequest-Handler-4:null) (logid:0bf9a356) Could not connect to 169.254.3.7
2016-12-13 19:23:26,750 DEBUG [resource.virtualnetwork.VirtualRoutingResource] (agentRequest-Handler-4:null) (logid:0bf9a356) Trying to connect to 169.254.3.7
2016-12-13 19:23:29,757 DEBUG [resource.virtualnetwork.VirtualRoutingResource] (agentRequest-Handler-4:null) (logid:0bf9a356) Could not connect to 169.254.3.7
2016-12-13 19:23:29,869 DEBUG [cloud.agent.Agent] (agentRequest-Handler-5:null) (logid:981a5f6f) Processing command: com.cloud.agent.api.GetHostStatsCommand
2016-12-13 19:23:34,759 DEBUG [resource.virtualnetwork.VirtualRoutingResource] (agentRequest-Handler-4:null) (logid:0bf9a356) Unable to logon to 169.254.3.7

virsh console also failed to show anything.

Thanks.
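The log sequence in this message suggests that the boot parameters were handed over successfully but nothing answered on the router's link-local address afterwards. The same reachability check can be repeated by hand from the KVM host. Below is a minimal Python sketch, under stated assumptions: can_connect is a hypothetical helper name, and it only tests whether a TCP connection can be opened, which is weaker than the logon attempt the agent performs.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened within timeout seconds."""
    try:
        # create_connection raises OSError (including timeouts and refusals) on failure
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, can_connect("169.254.3.7", 3922) run on the host carrying r-669-VM. The address is the eth1ip from the log above; port 3922 is an assumption (the SSH port CloudStack system VMs normally listen on) and does not appear anywhere in this thread. A False result alongside an empty virsh console would point at the guest not finishing its boot rather than at the agent.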
Re: Router VM: patchviasocket.py timeout issue on 1 out of 4 networks
Can you turn on agent debug mode and take a look at the debug level logs?

You can do that by running sed -i 's/INFO/DEBUG/g' /etc/cloudstack/agent/log4j-cloud.xml on the host and then restarting the agent.

- Si

From: Syahrul Sazli Shaharir
Sent: Monday, December 12, 2016 8:21 PM
To: users@cloudstack.apache.org
Subject: Router VM: patchviasocket.py timeout issue on 1 out of 4 networks

Hi,

I am running the latest CloudStack 4.9.0.1 in a CentOS 7 KVM + ceph environment. After running for some time, I was faced with an issue on one out of 4 networks: following a heartbeat-induced reset on all hosts, the associated virtual router would not get recreated and started properly on any of the 3 hosts I have, even after repeated attempts of the following:

- destroy-recreate cycles, via the CloudStack UI
- restartNetwork cleanup=true API calls (failed with errorcode = 530)
- redownloaded and re-registered the system VM template as another entry and assigned it to the router VM in global settings (boots the new template OK, but still the same problem)
- tweaked the default system offering for the router VM (increased RAM from 256 to 512MB)
- created a new system offering, with the RAM tweak and use of the ceph rbd store, and assigned it to Cloud.Com-SoftwareRouter as per the docs, which didn't work for some reason: it kept on using the initial default offering and created the image on local host storage
- upgraded to the latest CloudStack (previously was running 4.8)

As with a handful of other reports in this list's archives, virsh list and dumpxml show that the VM is created OK but fails soon after booting, as shown by the following errors in agent.log:

2016-12-13 10:03:33,894 WARN [kvm.resource.LibvirtComputingResource] (agentRequest-Handler-1:null) (logid:633e6e03) Timed out: /usr/share/cloudstack-common/scripts/vm/hypervisor/kvm/patchviasocket.py -n r-668-VM -p %template=domP%name=r-668-VM%eth0ip=10.3.28.10%eth0mask=255.255.255.0%gateway=10.3.28.1%domain=nocser.net%cidrsize=24%dhcprange=10.3.28.1%eth1ip=169.254.0.33%eth1mask=255.255.0.0%type=dhcpsrvr%disable_rp_filter=true%dns1=8.8.8.8%dns2=8.8.4.4%ip6dns1=%ip6dns2=%baremetalnotificationsecuritykey=uavJByNGGjNLrELG-qbdN99__1I3tnp8qa0KbcsKokKJcPB43K9s6oQu2nMLqo3YP8p6jqDy5XT3WWOWBA2yNw%baremetalnotificationapikey=8JH4mdkxsEMhgIBgMonkNXAEKjVOeZnG1m5UVekvvo4v_iXQ4ZS7rh6NNS0qphhc7ZrCauiz23tp2-Wa3AASlg%host=10.2.30.11%port=8080 . Output is:
.
2016-12-13 10:05:45,895 WARN [kvm.resource.LibvirtComputingResource] (agentRequest-Handler-1:null) (logid:633e6e03) Timed out: /usr/share/cloudstack-common/scripts/network/domr/router_proxy.sh vr_cfg.sh 169.254.0.33 -c /var/cache/cloud/VR-48ea8a95-6c02-499f-88d3-eae5bf9f9fbe.cfg . Output is:

As mentioned, this only happens with 1 network (always the same network). The other router VMs work OK. Any clues on how to troubleshoot this further would be greatly appreciated.

Thanks.

--
--sazli
Syahrul Sazli Shaharir
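The -p argument in the agent.log entries above looks like a '%'-separated list of key=value pairs. When comparing the failing router against a working one, it can help to split that string into fields. Below is a minimal Python sketch, assuming that format holds for every field; this is inferred from the log line itself, not from the patchviasocket.py source, and parse_boot_args is a hypothetical helper name.

```python
def parse_boot_args(raw):
    """Split a '%key=value%key=value' string into a dict, keeping empty values."""
    args = {}
    for field in raw.split("%"):
        if not field:
            continue  # skip the empty piece before the leading '%'
        # partition splits on the first '=' only, so values may themselves contain '='
        key, _, value = field.partition("=")
        args[key] = value
    return args
```

For instance, parse_boot_args(...)["eth1ip"] on the string logged for r-668-VM gives the link-local address the agent later tries to reach, and diffing the resulting dicts for a working and a non-working router shows at a glance whether anything other than the addresses differs.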