I have noticed that the arping step can sometimes slow down the router
drastically. At least in 4.7, arping is invoked with a hostname of "None"
(the string form of Python's equivalent of null). I'm wondering if this has
been fixed in newer versions.
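
A minimal sketch of the kind of guard that would avoid it (illustrative
only, not the actual VR code):

    import subprocess

    def safe_arping(interface, address, count=1):
        # str(None) == "None" would otherwise be handed to arping as the target
        if not address or address == "None":
            return
        # -c: number of probes, -I: interface (iputils arping flags)
        subprocess.call(["arping", "-c", str(count), "-I", interface, address])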

*Jeff Hair*
Technical Lead and Software Developer

Tel: (+354) 415 0200
j...@greenqloud.com
www.greenqloud.com

On Fri, Feb 17, 2017 at 4:22 AM, Syahrul Sazli Shaharir <sa...@pulasan.my>
wrote:

> Hi,
>
> For the benefit of others, I've worked around this issue as follows (a
> rough command sketch follows the list):-
> - SSH to the router VM via its local IP, through the host running the VM
> - tail /var/log/cloud.log to pinpoint the source of the loop
> - Find the looping script in /opt/cloud, edit it and disable the looping part
> - If you do this quickly (before the timeout), the VM will resume booting
> successfully - if not, try again until it succeeds.
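>
> A rough sketch of those steps (the link-local IP, key path and script name
> are examples only):
>
>   ssh -p 3922 -i /root/.ssh/id_rsa.cloud root@169.254.x.x  # from the host
>   tail -f /var/log/cloud.log                  # pinpoint the looping script
>   vi /opt/cloud/bin/update_config.py          # disable the looping part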
>
> Full notes for my particular case:
> https://pulasan.my/doku.php?id=cs_masalah_router_vm
>
> Thanks.
>
> On Tue, Dec 20, 2016 at 10:09 AM, Syahrul Sazli Shaharir
> <sa...@pulasan.my> wrote:
> > On Mon, Dec 19, 2016 at 8:54 PM, Simon Weller <swel...@ena.com> wrote:
> >> When you're in the console, can you ping the host ip?
> >
> > Yes - some (not all) of the IPs assigned on the host.
> >
> >> What are your ip tables rules on this host currently?
> >
> > Chain INPUT (policy ACCEPT)
> > target     prot opt source               destination
> > ACCEPT     udp  --  0.0.0.0/0            0.0.0.0/0            udp dpt:53
> > ACCEPT     tcp  --  0.0.0.0/0            0.0.0.0/0            tcp dpt:53
> > ACCEPT     udp  --  0.0.0.0/0            0.0.0.0/0            udp dpt:67
> > ACCEPT     tcp  --  0.0.0.0/0            0.0.0.0/0            tcp dpt:67
> >
> > Chain FORWARD (policy ACCEPT)
> > target     prot opt source               destination
> > ACCEPT     all  --  0.0.0.0/0            192.168.122.0/24     ctstate RELATED,ESTABLISHED
> > ACCEPT     all  --  192.168.122.0/24     0.0.0.0/0
> > ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0
> > REJECT     all  --  0.0.0.0/0            0.0.0.0/0            reject-with icmp-port-unreachable
> > REJECT     all  --  0.0.0.0/0            0.0.0.0/0            reject-with icmp-port-unreachable
> >
> > Chain OUTPUT (policy ACCEPT)
> > target     prot opt source               destination
> > ACCEPT     udp  --  0.0.0.0/0            0.0.0.0/0            udp dpt:68
> >
> >> Can you dump the routing table as well?
> >
> > Kernel IP routing table
> > Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
> > 0.0.0.0         172.16.30.33    0.0.0.0         UG        0 0          0 cloudbr2.304
> > 10.1.30.0       0.0.0.0         255.255.255.0   U         0 0          0 bond1
> > 10.2.30.0       0.0.0.0         255.255.255.0   U         0 0          0 cloudbr2.352
> > 10.3.30.0       0.0.0.0         255.255.255.0   U         0 0          0 cloudbr2.353
> > 172.16.30.32    0.0.0.0         255.255.255.224 U         0 0          0 cloudbr2.304
> > 169.254.0.0     0.0.0.0         255.255.0.0     U         0 0          0 cloud0
> > 192.168.122.0   0.0.0.0         255.255.255.0   U         0 0          0 virbr0
> >
> >> Have you tried a restart of one of the working networks to see if it fails on restart?
> >
> > Yes, and they restart OK. I logged on to each network's router VM
> > console from the patchviasocket.py stage onwards, and found the
> > following difference (both VMs were created and booted on the same
> > host):-
> >
> > 1. Working network router VM ( http://pastebin.com/Y6zpDa6M ) :-
> > .....
> >
> > Dec 20 01:37:55 r-686-VM cloud: Boot up process done
> > Dec 20 01:37:55 r-686-VM cloud: VR config: configuation format version 1.0
> > Dec 20 01:37:55 r-686-VM cloud: VR config: creating file:
> > /var/cache/cloud/monitor_service.json
> > Dec 20 01:37:55 r-686-VM cloud: VR config: create file success
> > Dec 20 01:37:55 r-686-VM cloud: VR config: executing:
> > /opt/cloud/bin/update_config.py monitor_service.json
> > Dec 20 01:38:16 r-686-VM cloud: VR config: execution success
> > Dec 20 01:38:16 r-686-VM cloud: VR config: creating file:
> > /var/cache/cloud/vm_dhcp_entry.json
> > Dec 20 01:38:16 r-686-VM cloud: VR config: create file success
> > Dec 20 01:38:16 r-686-VM cloud: VR config: executing:
> > /opt/cloud/bin/update_config.py vm_dhcp_entry.json
> > Dec 20 01:38:38 r-686-VM cloud: VR config: execution success
> > Dec 20 01:38:38 r-686-VM cloud: VR config: creating file:
> > /var/cache/cloud/vm_dhcp_entry.json
> > Dec 20 01:38:38 r-686-VM cloud: VR config: create file success
> > Dec 20 01:38:38 r-686-VM cloud: VR config: executing:
> > /opt/cloud/bin/update_config.py vm_dhcp_entry.json
> > Dec 20 01:39:01 r-686-VM cloud: VR config: execution success
> > Dec 20 01:39:01 r-686-VM cloud: VR config: creating file:
> > /var/cache/cloud/vm_metadata.json
> > Dec 20 01:39:01 r-686-VM cloud: VR config: create file success
> > Dec 20 01:39:01 r-686-VM cloud: VR config: executing:
> > /opt/cloud/bin/update_config.py vm_metadata.json
> > Dec 20 01:39:21 r-686-VM cloud: VR config: execution success
> > Dec 20 01:39:21 r-686-VM cloud: VR config: creating file:
> > /var/cache/cloud/vm_metadata.json
> > Dec 20 01:39:21 r-686-VM cloud: VR config: create file success
> > Dec 20 01:39:21 r-686-VM cloud: VR config: executing:
> > /opt/cloud/bin/update_config.py vm_metadata.json
> > Dec 20 01:39:41 r-686-VM cloud: VR config: execution success
> > Dec 20 01:39:41 r-686-VM cloud: VR config: Flushing conntrack table
> > Dec 20 01:39:41 r-686-VM cloud: VR config: Flushing conntrack table completed
> > Dec 20 01:39:42 r-686-VM cloud: VR config: configuation format version 1.0
> > Dec 20 01:39:42 r-686-VM cloud: VR config: Flushing conntrack table
> > Dec 20 01:39:42 r-686-VM cloud: VR config: Flushing conntrack table completed
> >
> > 2. Non-working network router VM ( http://pastebin.com/jzfGMGQB ):-
> > .....
> >
> > Dec 20 01:44:21 r-687-VM cloud: Boot up process done
> > Dec 20 01:44:22 r-687-VM cloud: VR config: configuation format version 1.0
> > Dec 20 01:44:22 r-687-VM cloud: VR config: creating file:
> > /var/cache/cloud/monitor_service.json
> > Dec 20 01:44:22 r-687-VM cloud: VR config: create file success
> > Dec 20 01:44:22 r-687-VM cloud: VR config: executing:
> > /opt/cloud/bin/update_config.py monitor_service.json
> > Dec 20 01:44:42 r-687-VM cloud: VR config: execution success
> > Dec 20 01:44:42 r-687-VM cloud: VR config: creating file:
> > /var/cache/cloud/vm_dhcp_entry.json
> > Dec 20 01:44:42 r-687-VM cloud: VR config: create file success
> > Dec 20 01:44:42 r-687-VM cloud: VR config: executing:
> > /opt/cloud/bin/update_config.py vm_dhcp_entry.json
> > Dec 20 01:45:05 r-687-VM cloud: VR config: execution success
> > Dec 20 01:45:05 r-687-VM cloud: VR config: creating file:
> > /var/cache/cloud/vm_dhcp_entry.json
> > Dec 20 01:45:05 r-687-VM cloud: VR config: create file success
> > Dec 20 01:45:05 r-687-VM cloud: VR config: executing:
> > /opt/cloud/bin/update_config.py vm_dhcp_entry.json
> > Dec 20 01:45:27 r-687-VM cloud: VR config: execution success
> > Dec 20 01:45:27 r-687-VM cloud: VR config: creating file:
> > /var/cache/cloud/vm_dhcp_entry.json
> > Dec 20 01:45:27 r-687-VM cloud: VR config: create file success
> > Dec 20 01:45:27 r-687-VM cloud: VR config: executing:
> > /opt/cloud/bin/update_config.py vm_dhcp_entry.json
> > Dec 20 01:45:49 r-687-VM cloud: VR config: execution success
> > Dec 20 01:45:49 r-687-VM cloud: VR config: creating file:
> > /var/cache/cloud/vm_dhcp_entry.json
> > Dec 20 01:45:49 r-687-VM cloud: VR config: create file success
> > Dec 20 01:45:49 r-687-VM cloud: VR config: executing:
> > /opt/cloud/bin/update_config.py vm_dhcp_entry.json
> > Dec 20 01:46:12 r-687-VM cloud: VR config: execution success
> > Dec 20 01:46:12 r-687-VM cloud: VR config: creating file:
> > /var/cache/cloud/vm_dhcp_entry.json
> > Dec 20 01:46:12 r-687-VM cloud: VR config: create file success
> > Dec 20 01:46:12 r-687-VM cloud: VR config: executing:
> > /opt/cloud/bin/update_config.py vm_dhcp_entry.json
> > Dec 20 01:46:22 r-687-VM shutdown[3919]: shutting down for system halt
> >
> > Broadcast message from root@r-687-VM (Tue Dec 20 01:46:22 2016):
> >
> > The system is going down for system halt NOW!
> > Dec 20 01:46:22 r-687-VM shutdown[3962]: shutting down for system halt
> >
> > Broadcast message from root@r-687-VM (Tue Dec 20 01:46:22 2016):
> >
> > Power button pressed
> > The system is going down for system halt NOW!
> > Dec 20 01:46:23 r-687-VM KVP: KVP starting; pid is:4037
> > Dec 20 01:46:23 r-687-VM cloud: VR config: executing failed:
> > /opt/cloud/bin/update_config.py vm_dhcp_entry.json
> > debug1: channel 0: free: client-session, nchannels 1
> > Connection to 169.254.0.197 closed by remote host.
> > Connection to 169.254.0.197 closed.
> > Transferred: sent 4336, received 93744 bytes, in 180.3 seconds
> > Bytes per second: sent 24.0, received 519.8
> > debug1: Exit status -1
> >
> > Looks like the config script didn't get past vm_dhcp_entry.json?
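> >
> > A quick way to quantify the repetition from inside the VM, assuming the
> > cloud-tagged messages land in /var/log/cloud.log:
> >
> >   grep -c 'update_config.py vm_dhcp_entry.json' /var/log/cloud.log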
> >
> > Thanks.
> >
> >>
> >>
> >>
> >> ________________________________
> >> From: Syahrul Sazli Shaharir <sa...@pulasan.my>
> >> Sent: Monday, December 19, 2016 2:09 AM
> >> To: users@cloudstack.apache.org
> >> Subject: Re: Router VM: patchviasocket.py timeout issue on 1 out of 4 networks
> >>
> >> On Tue, Dec 13, 2016 at 7:26 PM, Syahrul Sazli Shaharir
> >> <sa...@pulasan.my> wrote:
> >>> Hi Simon,
> >>>
> >>> On Tue, Dec 13, 2016 at 10:31 AM, Simon Weller <swel...@ena.com> wrote:
> >>>> Can you turn on agent debug mode and take a look at the debug level logs?
> >>>>
> >>>>
> >>>> You can do that by running sed -i 's/INFO/DEBUG/g' /etc/cloudstack/agent/log4j-cloud.xml on the host and then restarting the agent.
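> >>>> (On CentOS 7 that restart would typically be: systemctl restart cloudstack-agent)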
> >>>
> >>> Here are the debug logs - patchviasocket.py executed OK but couldn't
> >>> connect to the router VM's internal IP:-
> >>>
> >>> 2016-12-13 19:23:18,627 DEBUG [kvm.resource.LibvirtComputingResource]
> >>> (agentRequest-Handler-4:null) (logid:0bf9a356) Executing:
> >>> /usr/share/cloudstack-common/scripts/vm/hypervisor/kvm/patchviasocket.py
> >>> -n r-669-VM -p %template=domP%name=r-669-VM%eth0ip=10.3.28.10%eth0mask=255.255.255.0%gateway=10.3.28.1%domain=nocser.net%cidrsize=24%dhcprange=10.3.28.1%eth1ip=169.254.3.7%eth1mask=255.255.0.0%type=dhcpsrvr%disable_rp_filter=true%dns1=8.8.8.8%dns2=8.8.4.4%ip6dns1=%ip6dns2=%baremetalnotificationsecuritykey=uavJByNGGjNLrELG-qbdN99__1I3tnp8qa0KbcsKokKJcPB43K9s6oQu2nMLqo3YP8p6jqDy5XT3WWOWBA2yNw%baremetalnotificationapikey=8JH4mdkxsEMhgIBgMonkNXAEKjVOeZnG1m5UVekvvo4v_iXQ4ZS7rh6NNS0qphhc7ZrCauiz23tp2-Wa3AASlg%host=10.2.30.11%port=8080
> >>> 2016-12-13 19:23:18,739 DEBUG [kvm.resource.LibvirtComputingResource]
> >>> (agentRequest-Handler-4:null) (logid:0bf9a356) Execution is
> >>> successful.
> >>> 2016-12-13 19:23:18,742 DEBUG
> >>> [resource.virtualnetwork.VirtualRoutingResource]
> >>> (agentRequest-Handler-4:null) (logid:0bf9a356) Trying to connect to
> >>> 169.254.3.7
> >>> 2016-12-13 19:23:21,749 DEBUG
> >>> [resource.virtualnetwork.VirtualRoutingResource]
> >>> (agentRequest-Handler-4:null) (logid:0bf9a356) Could not connect to
> >>> 169.254.3.7
> >>> 2016-12-13 19:23:26,750 DEBUG
> >>> [resource.virtualnetwork.VirtualRoutingResource]
> >>> (agentRequest-Handler-4:null) (logid:0bf9a356) Trying to connect to
> >>> 169.254.3.7
> >>> 2016-12-13 19:23:29,757 DEBUG
> >>> [resource.virtualnetwork.VirtualRoutingResource]
> >>> (agentRequest-Handler-4:null) (logid:0bf9a356) Could not connect to
> >>> 169.254.3.7
> >>> 2016-12-13 19:23:29,869 DEBUG [cloud.agent.Agent]
> >>> (agentRequest-Handler-5:null) (logid:981a5f6f) Processing command:
> >>> com.cloud.agent.api.GetHostStatsCommand
> >>> 2016-12-13 19:23:34,759 DEBUG
> >>> [resource.virtualnetwork.VirtualRoutingResource]
> >>> (agentRequest-Handler-4:null) (logid:0bf9a356) Unable to logon to
> >>> 169.254.3.7
> >>>
> >>> virsh console also failed to show anything.
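> >>>
> >>> For what it's worth, the agent's logon attempt can be reproduced by hand
> >>> from the host (port 3922 and the id_rsa.cloud key are the usual system-VM
> >>> defaults - treat them as assumptions for your build):
> >>>
> >>>   ssh -p 3922 -i /root/.ssh/id_rsa.cloud root@169.254.3.7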
> >>
> >> OK, after upgrading to the latest qemu-kvm-ev-2.6.0-27.1.el7, this time I
> >> got to the console at some stage, but patchviasocket.py still times
> >> out. Here is the console output:-
> >>
> >> http://pastebin.com/n37aHeSa
> >>
> >> Thanks.
> >>
> >>
> >>>> ________________________________
> >>>> From: Syahrul Sazli Shaharir <sa...@pulasan.my>
> >>>> Sent: Monday, December 12, 2016 8:21 PM
> >>>> To: users@cloudstack.apache.org
> >>>> Subject: Router VM: patchviasocket.py timeout issue on 1 out of 4 networks
> >>>>
> >>>> Hi,
> >>>>
> >>>> I am running the latest Cloudstack 4.9.0.1 in a CentOS 7 KVM + ceph
> >>>> environment. After running for some time, I hit an issue with one
> >>>> out of 4 networks: following a heartbeat-induced reset of all hosts,
> >>>> the associated virtual router would not be recreated and started
> >>>> properly on any of the 3 hosts I have, even after repeated attempts
> >>>> at the following:-
> >>>> - destroy-recreate cycles, via the Cloudstack UI
> >>>> - restartNetwork cleanup=true API calls (failed with errorcode = 530)
> >>>> - re-downloading and re-registering the system VM template as another
> >>>> entry and assigning it to the router VM in global settings (it boots the
> >>>> new template OK, but still the same problem)
> >>>> - tweaking the default system offering for the router VM (increased RAM
> >>>> from 256 to 512MB)
> >>>> - creating a new system offering, with the RAM tweak and use of the ceph
> >>>> rbd store, and assigning it to Cloud.Com-SoftwareRouter as per the docs -
> >>>> which didn't work for some reason: it kept using the initial default
> >>>> offering and created the image on local host storage
> >>>> - upgrading to the latest Cloudstack (previously running 4.8)
> >>>>
> >>>> As with a handful of others in this list's archives, virsh list and
> >>>> dumpxml show the VM created OK, but it fails soon after booting, with
> >>>> the following error in agent.log:-
> >>>>
> >>>> 2016-12-13 10:03:33,894 WARN  [kvm.resource.LibvirtComputingResource]
> >>>> (agentRequest-Handler-1:null) (logid:633e6e03) Timed out:
> >>>> /usr/share/cloudstack-common/scripts/vm/hypervisor/kvm/patchviasocket.py
> >>>> -n r-668-VM -p %template=domP%name=r-668-VM%eth0ip=10.3.28.10%eth0mask=255.255.255.0%gateway=10.3.28.1%domain=nocser.net%cidrsize=24%dhcprange=10.3.28.1%eth1ip=169.254.0.33%eth1mask=255.255.0.0%type=dhcpsrvr%disable_rp_filter=true%dns1=8.8.8.8%dns2=8.8.4.4%ip6dns1=%ip6dns2=%baremetalnotificationsecuritykey=uavJByNGGjNLrELG-qbdN99__1I3tnp8qa0KbcsKokKJcPB43K9s6oQu2nMLqo3YP8p6jqDy5XT3WWOWBA2yNw%baremetalnotificationapikey=8JH4mdkxsEMhgIBgMonkNXAEKjVOeZnG1m5UVekvvo4v_iXQ4ZS7rh6NNS0qphhc7ZrCauiz23tp2-Wa3AASlg%host=10.2.30.11%port=8080
> >>>> .  Output is:
> >>>> .....
> >>>> 2016-12-13 10:05:45,895 WARN  [kvm.resource.LibvirtComputingResource]
> >>>> (agentRequest-Handler-1:null) (logid:633e6e03) Timed out:
> >>>> /usr/share/cloudstack-common/scripts/network/domr/router_proxy.sh
> >>>> vr_cfg.sh 169.254.0.33 -c /var/cache/cloud/VR-48ea8a95-6c02-499f-88d3-eae5bf9f9fbe.cfg .  Output is:
> >>>>
> >>>> As mentioned, this only happens with 1 network (always the same
> >>>> network). The other router VMs work OK. Any clues on how to
> >>>> troubleshoot this further would be greatly appreciated.
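> >>>>
> >>>> For the record, the channel socket that patchviasocket.py writes to can
> >>>> also be probed by hand from the host; the socket path below is an
> >>>> assumption - confirm it against the <channel> element in virsh dumpxml
> >>>> r-668-VM:
> >>>>
> >>>>   echo test | socat - UNIX-CONNECT:/var/lib/libvirt/qemu/r-668-VM.agent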
> >>>>
> >>>> Thanks.
> >>>>
> >>>> --
> >>>> --sazli
> >
> >
> >
> > --
> > --sazli
>
>
>
> --
> --sazli
> Syahrul Sazli Shaharir <sa...@pulasan.my>
> Mobile: +6019 385 8301 - YM/Skype: syahrulsazli
> System Administrator
> TMK Pulasan (002339810-M) http://pulasan.my/
> 11 Jalan 3/4, 43650 Bandar Baru Bangi, Selangor, Malaysia.
> Tel/Fax: +603 8926 0338
>
