Re: Router VM: patchviasocket.py timeout issue on 1 out of 4 networks

2017-02-17 Thread Jeff Hair
 01:38:38 r-686-VM cloud: VR config: create file success
> > Dec 20 01:38:38 r-686-VM cloud: VR config: executing:
> > /opt/cloud/bin/update_config.py vm_dhcp_entry.json
> > Dec 20 01:39:01 r-686-VM cloud: VR config: execution success
> > Dec 20 01:39:01 r-686-VM cloud: VR config: creating file:
> > /var/cache/cloud/vm_metadata.json
> > Dec 20 01:39:01 r-686-VM cloud: VR config: create file success
> > Dec 20 01:39:01 r-686-VM cloud: VR config: executing:
> > /opt/cloud/bin/update_config.py vm_metadata.json
> > Dec 20 01:39:21 r-686-VM cloud: VR config: execution success
> > Dec 20 01:39:21 r-686-VM cloud: VR config: creating file:
> > /var/cache/cloud/vm_metadata.json
> > Dec 20 01:39:21 r-686-VM cloud: VR config: create file success
> > Dec 20 01:39:21 r-686-VM cloud: VR config: executing:
> > /opt/cloud/bin/update_config.py vm_metadata.json
> > Dec 20 01:39:41 r-686-VM cloud: VR config: execution success
> > Dec 20 01:39:41 r-686-VM cloud: VR config: Flushing conntrack table
> > Dec 20 01:39:41 r-686-VM cloud: VR config: Flushing conntrack table
> completed
> > Dec 20 01:39:42 r-686-VM cloud: VR config: configuation format version
> 1.0
> > Dec 20 01:39:42 r-686-VM cloud: VR config: Flushing conntrack table
> > Dec 20 01:39:42 r-686-VM cloud: VR config: Flushing conntrack table
> completed
> >
> > 2. Non-working network router VM ( http://pastebin.com/jzfGMGQB ):-
> > .
> >
> > Dec 20 01:44:21 r-687-VM cloud: Boot up process done
> > Dec 20 01:44:22 r-687-VM cloud: VR config: configuation format version
> 1.0
> > Dec 20 01:44:22 r-687-VM cloud: VR config: creating file:
> > /var/cache/cloud/monitor_service.json
> > Dec 20 01:44:22 r-687-VM cloud: VR config: create file success
> > Dec 20 01:44:22 r-687-VM cloud: VR config: executing:
> > /opt/cloud/bin/update_config.py monitor_service.json
> > Dec 20 01:44:42 r-687-VM cloud: VR config: execution success
> > Dec 20 01:44:42 r-687-VM cloud: VR config: creating file:
> > /var/cache/cloud/vm_dhcp_entry.json
> > Dec 20 01:44:42 r-687-VM cloud: VR config: create file success
> > Dec 20 01:44:42 r-687-VM cloud: VR config: executing:
> > /opt/cloud/bin/update_config.py vm_dhcp_entry.json
> > Dec 20 01:45:05 r-687-VM cloud: VR config: execution success
> > Dec 20 01:45:05 r-687-VM cloud: VR config: creating file:
> > /var/cache/cloud/vm_dhcp_entry.json
> > Dec 20 01:45:05 r-687-VM cloud: VR config: create file success
> > Dec 20 01:45:05 r-687-VM cloud: VR config: executing:
> > /opt/cloud/bin/update_config.py vm_dhcp_entry.json
> > Dec 20 01:45:27 r-687-VM cloud: VR config: execution success
> > Dec 20 01:45:27 r-687-VM cloud: VR config: creating file:
> > /var/cache/cloud/vm_dhcp_entry.json
> > Dec 20 01:45:27 r-687-VM cloud: VR config: create file success
> > Dec 20 01:45:27 r-687-VM cloud: VR config: executing:
> > /opt/cloud/bin/update_config.py vm_dhcp_entry.json
> > Dec 20 01:45:49 r-687-VM cloud: VR config: execution success
> > Dec 20 01:45:49 r-687-VM cloud: VR config: creating file:
> > /var/cache/cloud/vm_dhcp_entry.json
> > Dec 20 01:45:49 r-687-VM cloud: VR config: create file success
> > Dec 20 01:45:49 r-687-VM cloud: VR config: executing:
> > /opt/cloud/bin/update_config.py vm_dhcp_entry.json
> > Dec 20 01:46:12 r-687-VM cloud: VR config: execution success
> > Dec 20 01:46:12 r-687-VM cloud: VR config: creating file:
> > /var/cache/cloud/vm_dhcp_entry.json
> > Dec 20 01:46:12 r-687-VM cloud: VR config: create file success
> > Dec 20 01:46:12 r-687-VM cloud: VR config: executing:
> > /opt/cloud/bin/update_config.py vm_dhcp_entry.json
> > Dec 20 01:46:22 r-687-VM shutdown[3919]: shutting down for system halt
> >
> > Broadcast message from root@r-687-VM (Tue Dec 20 01:46:22 2016):
> >
> > The system is going down for system halt NOW!
> > Dec 20 01:46:22 r-687-VM shutdown[3962]: shutting down for system halt
> >
> > Broadcast message from root@r-687-VM (Tue Dec 20 01:46:22 2016):
> >
> > Power button pressed
> > The system is going down for system halt NOW!
> > Dec 20 01:46:23 r-687-VM KVP: KVP starting; pid is:4037
> > Dec 20 01:46:23 r-687-VM cloud: VR config: executing failed:
> > /opt/cloud/bin/update_config.py vm_dhcp_entry.json
> > debug1: channel 0: free: client-session, nchannels 1
> > Connection to 169.254.0.197 closed by remote host.
> > Connection to 169.254.0.197 closed.
> > Transferred: sent 4336, received 93744 bytes, in 180.3 seconds
> > Bytes per second: sent 24.0, received 519.8
> > debug1: Exit status -1
> >
> > Looks like t

Re: Router VM: patchviasocket.py timeout issue on 1 out of 4 networks

2017-02-16 Thread Syahrul Sazli Shaharir
1 r-686-VM cloud: VR config: execution success
> Dec 20 01:39:41 r-686-VM cloud: VR config: Flushing conntrack table
> Dec 20 01:39:41 r-686-VM cloud: VR config: Flushing conntrack table completed
> Dec 20 01:39:42 r-686-VM cloud: VR config: configuation format version 1.0
> Dec 20 01:39:42 r-686-VM cloud: VR config: Flushing conntrack table
> Dec 20 01:39:42 r-686-VM cloud: VR config: Flushing conntrack table completed
>
> 2. Non-working network router VM ( http://pastebin.com/jzfGMGQB ):-
> .
>
> Dec 20 01:44:21 r-687-VM cloud: Boot up process done
> Dec 20 01:44:22 r-687-VM cloud: VR config: configuation format version 1.0
> Dec 20 01:44:22 r-687-VM cloud: VR config: creating file:
> /var/cache/cloud/monitor_service.json
> Dec 20 01:44:22 r-687-VM cloud: VR config: create file success
> Dec 20 01:44:22 r-687-VM cloud: VR config: executing:
> /opt/cloud/bin/update_config.py monitor_service.json
> Dec 20 01:44:42 r-687-VM cloud: VR config: execution success
> Dec 20 01:44:42 r-687-VM cloud: VR config: creating file:
> /var/cache/cloud/vm_dhcp_entry.json
> Dec 20 01:44:42 r-687-VM cloud: VR config: create file success
> Dec 20 01:44:42 r-687-VM cloud: VR config: executing:
> /opt/cloud/bin/update_config.py vm_dhcp_entry.json
> Dec 20 01:45:05 r-687-VM cloud: VR config: execution success
> Dec 20 01:45:05 r-687-VM cloud: VR config: creating file:
> /var/cache/cloud/vm_dhcp_entry.json
> Dec 20 01:45:05 r-687-VM cloud: VR config: create file success
> Dec 20 01:45:05 r-687-VM cloud: VR config: executing:
> /opt/cloud/bin/update_config.py vm_dhcp_entry.json
> Dec 20 01:45:27 r-687-VM cloud: VR config: execution success
> Dec 20 01:45:27 r-687-VM cloud: VR config: creating file:
> /var/cache/cloud/vm_dhcp_entry.json
> Dec 20 01:45:27 r-687-VM cloud: VR config: create file success
> Dec 20 01:45:27 r-687-VM cloud: VR config: executing:
> /opt/cloud/bin/update_config.py vm_dhcp_entry.json
> Dec 20 01:45:49 r-687-VM cloud: VR config: execution success
> Dec 20 01:45:49 r-687-VM cloud: VR config: creating file:
> /var/cache/cloud/vm_dhcp_entry.json
> Dec 20 01:45:49 r-687-VM cloud: VR config: create file success
> Dec 20 01:45:49 r-687-VM cloud: VR config: executing:
> /opt/cloud/bin/update_config.py vm_dhcp_entry.json
> Dec 20 01:46:12 r-687-VM cloud: VR config: execution success
> Dec 20 01:46:12 r-687-VM cloud: VR config: creating file:
> /var/cache/cloud/vm_dhcp_entry.json
> Dec 20 01:46:12 r-687-VM cloud: VR config: create file success
> Dec 20 01:46:12 r-687-VM cloud: VR config: executing:
> /opt/cloud/bin/update_config.py vm_dhcp_entry.json
> Dec 20 01:46:22 r-687-VM shutdown[3919]: shutting down for system halt
>
> Broadcast message from root@r-687-VM (Tue Dec 20 01:46:22 2016):
>
> The system is going down for system halt NOW!
> Dec 20 01:46:22 r-687-VM shutdown[3962]: shutting down for system halt
>
> Broadcast message from root@r-687-VM (Tue Dec 20 01:46:22 2016):
>
> Power button pressed
> The system is going down for system halt NOW!
> Dec 20 01:46:23 r-687-VM KVP: KVP starting; pid is:4037
> Dec 20 01:46:23 r-687-VM cloud: VR config: executing failed:
> /opt/cloud/bin/update_config.py vm_dhcp_entry.json
> debug1: channel 0: free: client-session, nchannels 1
> Connection to 169.254.0.197 closed by remote host.
> Connection to 169.254.0.197 closed.
> Transferred: sent 4336, received 93744 bytes, in 180.3 seconds
> Bytes per second: sent 24.0, received 519.8
> debug1: Exit status -1
>
> Looks like the config script didn't get past vm_dhcp_entry.json ?
>
> Thanks.
>
>>
>>
>>
>> 
>> From: Syahrul Sazli Shaharir <sa...@pulasan.my>
>> Sent: Monday, December 19, 2016 2:09 AM
>> To: users@cloudstack.apache.org
>> Subject: Re: Router VM: patchviasocket.py timeout issue on 1 out of 4 
>> networks
>>
>> On Tue, Dec 13, 2016 at 7:26 PM, Syahrul Sazli Shaharir
>> <sa...@pulasan.my> wrote:
>>> Hi Simon,
>>>
>>> On Tue, Dec 13, 2016 at 10:31 AM, Simon Weller <swel...@ena.com> wrote:
>>>> Can you turn on agent debug mode and take a look at the debug level logs?
>>>>
>>>>
>>>> You can do that by running sed -i 's/INFO/DEBUG/g' 
>>>> /etc/cloudstack/agent/log4j-cloud.xml on the host and then restarting the 
>>>> agent.
>>>
>>> Here are the debug logs - patchviasocket.py executed OK but couldn't
>>> connect to the router VM's internal IP:-
>>>
>>> 2016-12-13 19:23:18,627 DEBUG [kvm.resource.LibvirtComputingResource]
>>> (agentRequest-Handler-4:null) (logid:0bf9a356) Executing:
>>> /usr/share/cloudstack-common/scripts/vm/hy

Re: Router VM: patchviasocket.py timeout issue on 1 out of 4 networks

2016-12-19 Thread Syahrul Sazli Shaharir
ting file:
/var/cache/cloud/vm_dhcp_entry.json
Dec 20 01:44:42 r-687-VM cloud: VR config: create file success
Dec 20 01:44:42 r-687-VM cloud: VR config: executing:
/opt/cloud/bin/update_config.py vm_dhcp_entry.json
Dec 20 01:45:05 r-687-VM cloud: VR config: execution success
Dec 20 01:45:05 r-687-VM cloud: VR config: creating file:
/var/cache/cloud/vm_dhcp_entry.json
Dec 20 01:45:05 r-687-VM cloud: VR config: create file success
Dec 20 01:45:05 r-687-VM cloud: VR config: executing:
/opt/cloud/bin/update_config.py vm_dhcp_entry.json
Dec 20 01:45:27 r-687-VM cloud: VR config: execution success
Dec 20 01:45:27 r-687-VM cloud: VR config: creating file:
/var/cache/cloud/vm_dhcp_entry.json
Dec 20 01:45:27 r-687-VM cloud: VR config: create file success
Dec 20 01:45:27 r-687-VM cloud: VR config: executing:
/opt/cloud/bin/update_config.py vm_dhcp_entry.json
Dec 20 01:45:49 r-687-VM cloud: VR config: execution success
Dec 20 01:45:49 r-687-VM cloud: VR config: creating file:
/var/cache/cloud/vm_dhcp_entry.json
Dec 20 01:45:49 r-687-VM cloud: VR config: create file success
Dec 20 01:45:49 r-687-VM cloud: VR config: executing:
/opt/cloud/bin/update_config.py vm_dhcp_entry.json
Dec 20 01:46:12 r-687-VM cloud: VR config: execution success
Dec 20 01:46:12 r-687-VM cloud: VR config: creating file:
/var/cache/cloud/vm_dhcp_entry.json
Dec 20 01:46:12 r-687-VM cloud: VR config: create file success
Dec 20 01:46:12 r-687-VM cloud: VR config: executing:
/opt/cloud/bin/update_config.py vm_dhcp_entry.json
Dec 20 01:46:22 r-687-VM shutdown[3919]: shutting down for system halt

Broadcast message from root@r-687-VM (Tue Dec 20 01:46:22 2016):

The system is going down for system halt NOW!
Dec 20 01:46:22 r-687-VM shutdown[3962]: shutting down for system halt

Broadcast message from root@r-687-VM (Tue Dec 20 01:46:22 2016):

Power button pressed
The system is going down for system halt NOW!
Dec 20 01:46:23 r-687-VM KVP: KVP starting; pid is:4037
Dec 20 01:46:23 r-687-VM cloud: VR config: executing failed:
/opt/cloud/bin/update_config.py vm_dhcp_entry.json
debug1: channel 0: free: client-session, nchannels 1
Connection to 169.254.0.197 closed by remote host.
Connection to 169.254.0.197 closed.
Transferred: sent 4336, received 93744 bytes, in 180.3 seconds
Bytes per second: sent 24.0, received 519.8
debug1: Exit status -1

Looks like the config script didn't get past vm_dhcp_entry.json?

Thanks.
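
The timing of each step in the router log can be checked mechanically. Below is a minimal sketch, assuming each syslog entry sits on a single line in the format quoted above (the mail client wrapped them here). The helper name step_durations, the year default, and the sample lines are illustrative only and are not part of CloudStack. It pairs every "executing:" entry with the "execution success" or "executing failed" entry that follows it and reports the elapsed seconds.

```python
import re
from datetime import datetime

# Assumed single-line syslog format, for example:
# "Dec 20 01:44:42 r-687-VM cloud: VR config: executing: /opt/cloud/bin/update_config.py vm_dhcp_entry.json"
LINE = re.compile(
    r"^(?P<ts>\w{3} +\d+ \d\d:\d\d:\d\d) \S+ cloud: VR config: (?P<msg>.*)$"
)


def step_durations(lines, year=2016):
    """Return (config_file, seconds, outcome) for each update_config.py run."""
    results = []
    current = None  # (config_file, start_time) of the run in progress
    for line in lines:
        m = LINE.match(line.strip())
        if not m:
            continue  # not a "VR config" entry
        # syslog timestamps carry no year, so one is supplied by the caller
        ts = datetime.strptime(f"{year} {m['ts']}", "%Y %b %d %H:%M:%S")
        msg = m["msg"]
        if msg.startswith("executing:"):
            current = (msg.split()[-1], ts)
        elif current and msg.startswith("execution success"):
            results.append((current[0], (ts - current[1]).total_seconds(), "success"))
            current = None
        elif current and msg.startswith("executing failed"):
            results.append((current[0], (ts - current[1]).total_seconds(), "failed"))
            current = None
    if current:
        # an "executing:" entry with no matching result line
        results.append((current[0], None, "unfinished"))
    return results


# Sample lines rebuilt from the tail of the r-687-VM log above
sample = [
    "Dec 20 01:45:49 r-687-VM cloud: VR config: executing: /opt/cloud/bin/update_config.py vm_dhcp_entry.json",
    "Dec 20 01:46:12 r-687-VM cloud: VR config: execution success",
    "Dec 20 01:46:12 r-687-VM cloud: VR config: executing: /opt/cloud/bin/update_config.py vm_dhcp_entry.json",
    "Dec 20 01:46:23 r-687-VM cloud: VR config: executing failed: /opt/cloud/bin/update_config.py vm_dhcp_entry.json",
]

for name, seconds, outcome in step_durations(sample):
    print(name, seconds, outcome)
```

On these sample lines the last successful run takes 23 seconds and the failed one ends after 11 seconds, one second after the shutdown message. Run against the full log, the same function should show each update_config.py call taking roughly 20 seconds.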

>
>
>
> 
> From: Syahrul Sazli Shaharir <sa...@pulasan.my>
> Sent: Monday, December 19, 2016 2:09 AM
> To: users@cloudstack.apache.org
> Subject: Re: Router VM: patchviasocket.py timeout issue on 1 out of 4 networks
>
> On Tue, Dec 13, 2016 at 7:26 PM, Syahrul Sazli Shaharir
> <sa...@pulasan.my> wrote:
>> Hi Simon,
>>
>> On Tue, Dec 13, 2016 at 10:31 AM, Simon Weller <swel...@ena.com> wrote:
>>> Can you turn on agent debug mode and take a look at the debug level logs?
>>>
>>>
>>> You can do that by running sed -i 's/INFO/DEBUG/g' 
>>> /etc/cloudstack/agent/log4j-cloud.xml on the host and then restarting the 
>>> agent.
>>
>> Here are the debug logs - patchviasocket.py executed OK but couldn't
>> connect to the router VM's internal IP:-
>>
>> 2016-12-13 19:23:18,627 DEBUG [kvm.resource.LibvirtComputingResource]
>> (agentRequest-Handler-4:null) (logid:0bf9a356) Executing:
>> /usr/share/cloudstack-common/scripts/vm/hypervisor/kvm/patchviasocket.py
>> -n r-669-VM -p 
>> %template=domP%name=r-669-VM%eth0ip=10.3.28.10%eth0mask=255.255.255.0%gateway=10.3.28.1%domain=nocser.net%cidrsize=24%dhcprange=10.3.28.1%eth1ip=169.254.3.7%eth1mask=255.255.0.0%type=dhcpsrvr%disable_rp_filter=true%dns1=8.8.8.8%dns2=8.8.4.4%ip6dns1=%ip6dns2=%baremetalnotificationsecuritykey=uavJByNGGjNLrELG-qbdN99__1I3tnp8qa0KbcsKokKJcPB43K9s6oQu2nMLqo3YP8p6jqDy5XT3WWOWBA2yNw%baremetalnotificationapikey=8JH4mdkxsEMhgIBgMonkNXAEKjVOeZnG1m5UVekvvo4v_iXQ4ZS7rh6NNS0qphhc7ZrCauiz23tp2-Wa3AASlg%host=10.2.30.11%port=8080
>> 2016-12-13 19:23:18,739 DEBUG [kvm.resource.LibvirtComputingResource]
>> (agentRequest-Handler-4:null) (logid:0bf9a356) Execution is
>> successful.
>> 2016-12-13 19:23:18,742 DEBUG
>> [resource.virtualnetwork.VirtualRoutingResource]
>> (agentRequest-Handler-4:null) (logid:0bf9a356) Trying to connect to
>> 169.254.3.7
>> 2016-12-13 19:23:21,749 DEBUG
>> [resource.virtualnetwork.VirtualRoutingResource]
>> (agentRequest-Handler-4:null) (logid:0bf9a356) Could not connect to
>> 169.254.3.7
>> 2016-12-13 19:23:26,750 DEBUG
>> [resource.virtualnetwork.VirtualRoutingResource]
>> (agentRequest-Handler-4:null) (logid:0bf9a356) Trying to connect to
>> 169.254.3.7
>> 2016-12-13 19:23:29,757 DEBUG
>> [resource.virtualnetwork.VirtualR

Re: Router VM: patchviasocket.py timeout issue on 1 out of 4 networks

2016-12-19 Thread Simon Weller
When you're in the console, can you ping the host IP?

What are your iptables rules on this host currently?

Can you dump the routing table as well?


Have you tried restarting one of the working networks to see if it fails on
restart?
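
The three checks above can be gathered in one pass. This is only a sketch, assuming a Linux system with the iputils ping, iptables-save and iproute2 ip commands installed; the function names diagnostic_commands and run_all are hypothetical, and the target address is whatever IP you want to test from the machine where the script runs. Nothing is executed until run_all is called.

```python
import subprocess


def diagnostic_commands(target_ip):
    """Build the argv lists for the three checks; nothing is executed here."""
    return [
        ["ping", "-c", "3", "-W", "2", target_ip],  # reachability of the given IP
        ["iptables-save"],                           # current firewall rules (needs root)
        ["ip", "route", "show"],                     # routing table
    ]


def run_all(commands):
    """Run each command and return {command string: combined stdout/stderr}."""
    report = {}
    for argv in commands:
        key = " ".join(argv)
        try:
            proc = subprocess.run(
                argv,
                stdout=subprocess.PIPE,
                stderr=subprocess.STDOUT,
                universal_newlines=True,
            )
            report[key] = proc.stdout
        except OSError as exc:  # command missing or not executable
            report[key] = "could not run: %s" % exc
    return report
```

A call such as run_all(diagnostic_commands("192.0.2.1")), with the placeholder address replaced by the real one, returns a dictionary whose values can be pasted into a reply.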




From: Syahrul Sazli Shaharir <sa...@pulasan.my>
Sent: Monday, December 19, 2016 2:09 AM
To: users@cloudstack.apache.org
Subject: Re: Router VM: patchviasocket.py timeout issue on 1 out of 4 networks

On Tue, Dec 13, 2016 at 7:26 PM, Syahrul Sazli Shaharir
<sa...@pulasan.my> wrote:
> Hi Simon,
>
> On Tue, Dec 13, 2016 at 10:31 AM, Simon Weller <swel...@ena.com> wrote:
>> Can you turn on agent debug mode and take a look at the debug level logs?
>>
>>
>> You can do that by running sed -i 's/INFO/DEBUG/g' 
>> /etc/cloudstack/agent/log4j-cloud.xml on the host and then restarting the 
>> agent.
>
> Here are the debug logs - patchviasocket.py executed OK but couldn't
> connect to the router VM's internal IP:-
>
> 2016-12-13 19:23:18,627 DEBUG [kvm.resource.LibvirtComputingResource]
> (agentRequest-Handler-4:null) (logid:0bf9a356) Executing:
> /usr/share/cloudstack-common/scripts/vm/hypervisor/kvm/patchviasocket.py
> -n r-669-VM -p 
> %template=domP%name=r-669-VM%eth0ip=10.3.28.10%eth0mask=255.255.255.0%gateway=10.3.28.1%domain=nocser.net%cidrsize=24%dhcprange=10.3.28.1%eth1ip=169.254.3.7%eth1mask=255.255.0.0%type=dhcpsrvr%disable_rp_filter=true%dns1=8.8.8.8%dns2=8.8.4.4%ip6dns1=%ip6dns2=%baremetalnotificationsecuritykey=uavJByNGGjNLrELG-qbdN99__1I3tnp8qa0KbcsKokKJcPB43K9s6oQu2nMLqo3YP8p6jqDy5XT3WWOWBA2yNw%baremetalnotificationapikey=8JH4mdkxsEMhgIBgMonkNXAEKjVOeZnG1m5UVekvvo4v_iXQ4ZS7rh6NNS0qphhc7ZrCauiz23tp2-Wa3AASlg%host=10.2.30.11%port=8080
> 2016-12-13 19:23:18,739 DEBUG [kvm.resource.LibvirtComputingResource]
> (agentRequest-Handler-4:null) (logid:0bf9a356) Execution is
> successful.
> 2016-12-13 19:23:18,742 DEBUG
> [resource.virtualnetwork.VirtualRoutingResource]
> (agentRequest-Handler-4:null) (logid:0bf9a356) Trying to connect to
> 169.254.3.7
> 2016-12-13 19:23:21,749 DEBUG
> [resource.virtualnetwork.VirtualRoutingResource]
> (agentRequest-Handler-4:null) (logid:0bf9a356) Could not connect to
> 169.254.3.7
> 2016-12-13 19:23:26,750 DEBUG
> [resource.virtualnetwork.VirtualRoutingResource]
> (agentRequest-Handler-4:null) (logid:0bf9a356) Trying to connect to
> 169.254.3.7
> 2016-12-13 19:23:29,757 DEBUG
> [resource.virtualnetwork.VirtualRoutingResource]
> (agentRequest-Handler-4:null) (logid:0bf9a356) Could not connect to
> 169.254.3.7
> 2016-12-13 19:23:29,869 DEBUG [cloud.agent.Agent]
> (agentRequest-Handler-5:null) (logid:981a5f6f) Processing command:
> com.cloud.agent.api.GetHostStatsCommand
> 2016-12-13 19:23:34,759 DEBUG
> [resource.virtualnetwork.VirtualRoutingResource]
> (agentRequest-Handler-4:null) (logid:0bf9a356) Unable to logon to
> 169.254.3.7
>
> virsh console also failed to show anything.

OK, after upgrading to the latest qemu-kvm-ev-2.6.0-27.1.el7, this time I
got to the console at some stage, but patchviasocket.py still times
out. Here is the console output:

http://pastebin.com/n37aHeSa



Thanks.


>> 
>> From: Syahrul Sazli Shaharir <sa...@pulasan.my>
>> Sent: Monday, December 12, 2016 8:21 PM
>> To: users@cloudstack.apache.org
>> Subject: Router VM: patchviasocket.py timeout issue on 1 out of 4 networks
>>
>> Hi,
>>
>> I am running latest Cloudstack 4.9.0.1 on CentOS 7 KVM + ceph
>> environment. After running for some time, I faced with an issue with
>> one out of 4 networks - following a heartbeat-induced reset on all
>> hosts, the associated virtual router would not get recreated and
>> started properly on any of the 3 hosts I have, even after repeated
>> attempts of the following:-
>> - destroy-recreate cycles, via Cloudstack UI
>> - restartNetwork cleanup=true API calls (failed with errorcode = 530).
>> - redownload and reregister system VM template as another entry and
>> assign to router VM in global setting (boots the new template OK, but
>> still same problem)
>> - tweak default system offering for router VM (increased RAM from 256 to 
>> 512MB)
>> - created new system offering, with RAM tweak, and use of ceph rbd
>> store, and assigned it to Cloud.Com-SoftwareRouter as per docs - which
>> didnt work for some reason: it kept on using initial default offering
>> and created image on local host storage
>> - upgrade to latest cloudstack (previously was running 4.8)
>>
>> As

Re: Router VM: patchviasocket.py timeout issue on 1 out of 4 networks

2016-12-19 Thread Syahrul Sazli Shaharir
On Tue, Dec 13, 2016 at 7:26 PM, Syahrul Sazli Shaharir wrote:
> Hi Simon,
>
> On Tue, Dec 13, 2016 at 10:31 AM, Simon Weller  wrote:
>> Can you turn on agent debug mode and take a look at the debug level logs?
>>
>>
>> You can do that by running sed -i 's/INFO/DEBUG/g' 
>> /etc/cloudstack/agent/log4j-cloud.xml on the host and then restarting the 
>> agent.
>
> Here are the debug logs - patchviasocket.py executed OK but couldn't
> connect to the router VM's internal IP:-
>
> 2016-12-13 19:23:18,627 DEBUG [kvm.resource.LibvirtComputingResource]
> (agentRequest-Handler-4:null) (logid:0bf9a356) Executing:
> /usr/share/cloudstack-common/scripts/vm/hypervisor/kvm/patchviasocket.py
> -n r-669-VM -p 
> %template=domP%name=r-669-VM%eth0ip=10.3.28.10%eth0mask=255.255.255.0%gateway=10.3.28.1%domain=nocser.net%cidrsize=24%dhcprange=10.3.28.1%eth1ip=169.254.3.7%eth1mask=255.255.0.0%type=dhcpsrvr%disable_rp_filter=true%dns1=8.8.8.8%dns2=8.8.4.4%ip6dns1=%ip6dns2=%baremetalnotificationsecuritykey=uavJByNGGjNLrELG-qbdN99__1I3tnp8qa0KbcsKokKJcPB43K9s6oQu2nMLqo3YP8p6jqDy5XT3WWOWBA2yNw%baremetalnotificationapikey=8JH4mdkxsEMhgIBgMonkNXAEKjVOeZnG1m5UVekvvo4v_iXQ4ZS7rh6NNS0qphhc7ZrCauiz23tp2-Wa3AASlg%host=10.2.30.11%port=8080
> 2016-12-13 19:23:18,739 DEBUG [kvm.resource.LibvirtComputingResource]
> (agentRequest-Handler-4:null) (logid:0bf9a356) Execution is
> successful.
> 2016-12-13 19:23:18,742 DEBUG
> [resource.virtualnetwork.VirtualRoutingResource]
> (agentRequest-Handler-4:null) (logid:0bf9a356) Trying to connect to
> 169.254.3.7
> 2016-12-13 19:23:21,749 DEBUG
> [resource.virtualnetwork.VirtualRoutingResource]
> (agentRequest-Handler-4:null) (logid:0bf9a356) Could not connect to
> 169.254.3.7
> 2016-12-13 19:23:26,750 DEBUG
> [resource.virtualnetwork.VirtualRoutingResource]
> (agentRequest-Handler-4:null) (logid:0bf9a356) Trying to connect to
> 169.254.3.7
> 2016-12-13 19:23:29,757 DEBUG
> [resource.virtualnetwork.VirtualRoutingResource]
> (agentRequest-Handler-4:null) (logid:0bf9a356) Could not connect to
> 169.254.3.7
> 2016-12-13 19:23:29,869 DEBUG [cloud.agent.Agent]
> (agentRequest-Handler-5:null) (logid:981a5f6f) Processing command:
> com.cloud.agent.api.GetHostStatsCommand
> 2016-12-13 19:23:34,759 DEBUG
> [resource.virtualnetwork.VirtualRoutingResource]
> (agentRequest-Handler-4:null) (logid:0bf9a356) Unable to logon to
> 169.254.3.7
>
> virsh console also failed to show anything.

OK, after upgrading to the latest qemu-kvm-ev-2.6.0-27.1.el7, this time I
got to the console at some stage, but patchviasocket.py still times
out. Here is the console output:

http://pastebin.com/n37aHeSa

Thanks.
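
The -p argument in the quoted agent log is one long string of key=value pairs separated by "%". A minimal sketch for splitting it into a dictionary follows, assuming no value ever contains "%" itself; parse_boot_args is a hypothetical helper and not part of patchviasocket.py, and the sample payload is shortened from the log above with the key fields left out.

```python
def parse_boot_args(payload):
    """Split a '%key=value%key=value' string into a dict of strings."""
    args = {}
    for item in payload.split("%"):
        if not item:
            continue  # the payload starts with '%', so the first piece is empty
        key, _, value = item.partition("=")
        args[key] = value  # keys such as ip6dns1 may carry an empty value
    return args


# Shortened copy of the r-669-VM payload
payload = (
    "%template=domP%name=r-669-VM%eth0ip=10.3.28.10%eth0mask=255.255.255.0"
    "%eth1ip=169.254.3.7%eth1mask=255.255.0.0%type=dhcpsrvr%ip6dns1=%ip6dns2="
)
args = parse_boot_args(payload)
print(args["name"], args["eth1ip"])
```

This makes it easy to confirm that eth1ip in the payload (169.254.3.7 here) is the same link-local address the agent then reports under "Trying to connect to".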


>> 
>> From: Syahrul Sazli Shaharir 
>> Sent: Monday, December 12, 2016 8:21 PM
>> To: users@cloudstack.apache.org
>> Subject: Router VM: patchviasocket.py timeout issue on 1 out of 4 networks
>>
>> Hi,
>>
>> I am running latest Cloudstack 4.9.0.1 on CentOS 7 KVM + ceph
>> environment. After running for some time, I faced with an issue with
>> one out of 4 networks - following a heartbeat-induced reset on all
>> hosts, the associated virtual router would not get recreated and
>> started properly on any of the 3 hosts I have, even after repeated
>> attempts of the following:-
>> - destroy-recreate cycles, via Cloudstack UI
>> - restartNetwork cleanup=true API calls (failed with errorcode = 530).
>> - redownload and reregister system VM template as another entry and
>> assign to router VM in global setting (boots the new template OK, but
>> still same problem)
>> - tweak default system offering for router VM (increased RAM from 256 to 
>> 512MB)
>> - created new system offering, with RAM tweak, and use of ceph rbd
>> store, and assigned it to Cloud.Com-SoftwareRouter as per docs - which
>> didnt work for some reason: it kept on using initial default offering
>> and created image on local host storage
>> - upgrade to latest cloudstack (previously was running 4.8)
>>
>> As with a handful of others in this list archives, virsh list and
>> dumpxml shows the VM created OK but failed soon after booting, as
>> found in the following error in agent.log :-
>>
>> 2016-12-13 10:03:33,894 WARN  [kvm.resource.LibvirtComputingResource]
>> (agentRequest-Handler-1:null) (logid:633e6e03) Timed out:
>> /usr/share/cloudstack-common/scripts/vm/hypervisor/kvm/patchviasocket.py
>> -n r-668-VM -p 
>> %template=domP%name=r-668-VM%eth0ip=10.3.28.10%eth0mask=255.255.255.0%gateway=10.3.28.1%domain=nocser.net%cidrsize=24%dhcprange=10.3.28.1%eth1ip=169.254.0.33%eth1mask=255.255.0.0%type=dhcpsrvr%disable_rp_filter=true%dns1=8.8.8.8%dns2=8.8.4.4%ip6dns1=%ip6dns2=%baremetalnotificationsecuritykey=uavJByNGGjNLrELG-qbdN99__1I3tnp8qa0KbcsKokKJcPB43K9s6oQu2nMLqo3YP8p6jqDy5XT3WWOWBA2yNw%baremetalnotificationapikey=8JH4mdkxsEMhgIBgMonkNXAEKjVOeZnG1m5UVekvvo4v_iXQ4ZS7rh6NNS0qphhc7ZrCauiz23tp2-Wa3AASlg%host=10.2.30.11%port=8080
>> .  Output is:
>> .
>> 2016-12-13 10:05:45,895 

Re: Router VM: patchviasocket.py timeout issue on 1 out of 4 networks

2016-12-16 Thread Syahrul Sazli Shaharir
On Fri, Dec 16, 2016 at 5:16 PM, Dag Sonstebo wrote:
> Hi Syahrul,
>
> It just struck me we had similar issues with patchviasocket.py and 
> python-argparse with one of our clients a while back, I believe our fix is 
> going into 4.9.1.0:
>
> https://github.com/apache/cloudstack/pull/1634

Hi Dag,

As I'm already running CentOS 7 with Python 2.7, would this still apply?

Thanks.
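
For reference, argparse has been part of the Python standard library since 2.7, so a quick import test shows whether a given interpreter has it. The sketch below only answers that narrow question; argparse_available is a hypothetical helper, and it should be run with the same interpreter the agent uses to launch patchviasocket.py. It says nothing about whether the rest of the linked pull request is relevant to a setup.

```python
import sys


def argparse_available():
    """Return True if this interpreter can import the argparse module."""
    try:
        import argparse  # noqa: F401  (imported only to test availability)
    except ImportError:
        return False
    return True


print(sys.version_info[:2], argparse_available())
```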


>
> Regards,
> Dag Sonstebo
> Cloud Architect
> ShapeBlue
>
> On 15/12/2016, 23:09, "Syahrul Sazli Shaharir"  wrote:
>
> Hi Ilya,
>
> I've looked at the patch suggested, looks like it has been committed
> into qemu 2.4.0, and I can see the modified parts in the latest qemu
> 2.6.0 code. So I went ahead and installed qemu-kvm-ev-2.6.0-27.1 on
> one of the hosts. But the problem still persists. Perhaps I should
> bring this issue to that dev thread.
>
> Thanks for the help! :)
>
> On Thu, Dec 15, 2016 at 11:03 AM, ilya  
> wrote:
> > This will explain a bit more on how this issue came about and how to fix
> > it..
> > https://www.mail-archive.com/dev@cloudstack.apache.org/msg71559.html
> >
> > On 12/12/16 6:31 PM, Simon Weller wrote:
> >> Can you turn on agent debug mode and take a look at the debug level 
> logs?
> >>
> >>
> >> You can do that by running sed -i 's/INFO/DEBUG/g' 
> /etc/cloudstack/agent/log4j-cloud.xml on the host and then restarting the 
> agent.
> >>
> >>
> >> - Si
> >>
> >>
> >>
> >>
> >> 
> >> From: Syahrul Sazli Shaharir 
> >> Sent: Monday, December 12, 2016 8:21 PM
> >> To: users@cloudstack.apache.org
> >> Subject: Router VM: patchviasocket.py timeout issue on 1 out of 4 
> networks
> >>
> >> Hi,
> >>
> >> I am running latest Cloudstack 4.9.0.1 on CentOS 7 KVM + ceph
> >> environment. After running for some time, I faced with an issue with
> >> one out of 4 networks - following a heartbeat-induced reset on all
> >> hosts, the associated virtual router would not get recreated and
> >> started properly on any of the 3 hosts I have, even after repeated
> >> attempts of the following:-
> >> - destroy-recreate cycles, via Cloudstack UI
> >> - restartNetwork cleanup=true API calls (failed with errorcode = 530).
> >> - redownload and reregister system VM template as another entry and
> >> assign to router VM in global setting (boots the new template OK, but
> >> still same problem)
> >> - tweak default system offering for router VM (increased RAM from 256 
> to 512MB)
> >> - created new system offering, with RAM tweak, and use of ceph rbd
> >> store, and assigned it to Cloud.Com-SoftwareRouter as per docs - which
> >> didnt work for some reason: it kept on using initial default offering
> >> and created image on local host storage
> >> - upgrade to latest cloudstack (previously was running 4.8)
> >>
> >> As with a handful of others in this list archives, virsh list and
> >> dumpxml shows the VM created OK but failed soon after booting, as
> >> found in the following error in agent.log :-
> >>
> >> 2016-12-13 10:03:33,894 WARN  [kvm.resource.LibvirtComputingResource]
> >> (agentRequest-Handler-1:null) (logid:633e6e03) Timed out:
> >> 
> /usr/share/cloudstack-common/scripts/vm/hypervisor/kvm/patchviasocket.py
> >> -n r-668-VM -p 
> %template=domP%name=r-668-VM%eth0ip=10.3.28.10%eth0mask=255.255.255.0%gateway=10.3.28.1%domain=nocser.net%cidrsize=24%dhcprange=10.3.28.1%eth1ip=169.254.0.33%eth1mask=255.255.0.0%type=dhcpsrvr%disable_rp_filter=true%dns1=8.8.8.8%dns2=8.8.4.4%ip6dns1=%ip6dns2=%baremetalnotificationsecuritykey=uavJByNGGjNLrELG-qbdN99__1I3tnp8qa0KbcsKokKJcPB43K9s6oQu2nMLqo3YP8p6jqDy5XT3WWOWBA2yNw%baremetalnotificationapikey=8JH4mdkxsEMhgIBgMonkNXAEKjVOeZnG1m5UVekvvo4v_iXQ4ZS7rh6NNS0qphhc7ZrCauiz23tp2-Wa3AASlg%host=10.2.30.11%port=8080
> >> .  Output is:
> >> .
> >> 2016-12-13 10:05:45,895 WARN  [kvm.resource.LibvirtComputingResource]
> >> (agentRequest-Handler-1:null) (logid:633e6e03) Timed out:
> >> /usr/share/cloudstack-common/scripts/network/domr/router_proxy.sh
> >> vr_cfg.sh 169.254.0.33 -c
> >> /var/cache/cloud/VR-48ea8a95-6c02-499f-88d3-eae5bf9f9fbe.cfg .  Output
> >> is:
> >>
> >> As mentioned, this only happens with 1 network (always the same
> >> network). The other router VMs work OK. Any clues on how to
> >> troubleshoot this further, would be greatly appreciated.
> >>
> >> Thanks.
> >>
> >> --
> >> --sazli
> >> Syahrul Sazli Shaharir 
> >>
>
>
>
> --
> --sazli
> Syahrul Sazli Shaharir 
> Mobile: +6019 385 8301 - YM/Skype: syahrulsazli
> System Administrator
> TMK Pulasan (002339810-M) http://pulasan.my/
>   

Re: Router VM: patchviasocket.py timeout issue on 1 out of 4 networks

2016-12-16 Thread Dag Sonstebo
Hi Syahrul,

It just struck me that we had similar issues with patchviasocket.py and
python-argparse with one of our clients a while back. I believe our fix is
going into 4.9.1.0:

https://github.com/apache/cloudstack/pull/1634 

Regards,
Dag Sonstebo
Cloud Architect
ShapeBlue

On 15/12/2016, 23:09, "Syahrul Sazli Shaharir"  wrote:

Hi Ilya,

I've looked at the patch suggested, looks like it has been committed
into qemu 2.4.0, and I can see the modified parts in the latest qemu
2.6.0 code. So I went ahead and installed qemu-kvm-ev-2.6.0-27.1 on
one of the hosts. But the problem still persists. Perhaps I should
bring this issue to that dev thread.

Thanks for the help! :)

On Thu, Dec 15, 2016 at 11:03 AM, ilya  wrote:
> This will explain a bit more on how this issue came about and how to fix
> it..
> https://www.mail-archive.com/dev@cloudstack.apache.org/msg71559.html
>
> On 12/12/16 6:31 PM, Simon Weller wrote:
>> Can you turn on agent debug mode and take a look at the debug level logs?
>>
>>
>> You can do that by running sed -i 's/INFO/DEBUG/g' 
>> /etc/cloudstack/agent/log4j-cloud.xml on the host and then restarting the agent.
>>
>>
>> - Si
>>
>>
>>
>>
>> 
>> From: Syahrul Sazli Shaharir 
>> Sent: Monday, December 12, 2016 8:21 PM
>> To: users@cloudstack.apache.org
>> Subject: Router VM: patchviasocket.py timeout issue on 1 out of 4 
>> networks
>>
>> Hi,
>>
>> I am running latest Cloudstack 4.9.0.1 on CentOS 7 KVM + ceph
>> environment. After running for some time, I faced with an issue with
>> one out of 4 networks - following a heartbeat-induced reset on all
>> hosts, the associated virtual router would not get recreated and
>> started properly on any of the 3 hosts I have, even after repeated
>> attempts of the following:-
>> - destroy-recreate cycles, via Cloudstack UI
>> - restartNetwork cleanup=true API calls (failed with errorcode = 530).
>> - redownload and reregister system VM template as another entry and
>> assign to router VM in global setting (boots the new template OK, but
>> still same problem)
>> - tweak default system offering for router VM (increased RAM from 256 to 
>> 512MB)
>> - created new system offering, with RAM tweak, and use of ceph rbd
>> store, and assigned it to Cloud.Com-SoftwareRouter as per docs - which
>> didnt work for some reason: it kept on using initial default offering
>> and created image on local host storage
>> - upgrade to latest cloudstack (previously was running 4.8)
>>
>> As with a handful of others in this list archives, virsh list and
>> dumpxml shows the VM created OK but failed soon after booting, as
>> found in the following error in agent.log :-
>>
>> 2016-12-13 10:03:33,894 WARN  [kvm.resource.LibvirtComputingResource]
>> (agentRequest-Handler-1:null) (logid:633e6e03) Timed out:
>> /usr/share/cloudstack-common/scripts/vm/hypervisor/kvm/patchviasocket.py
>> -n r-668-VM -p 
%template=domP%name=r-668-VM%eth0ip=10.3.28.10%eth0mask=255.255.255.0%gateway=10.3.28.1%domain=nocser.net%cidrsize=24%dhcprange=10.3.28.1%eth1ip=169.254.0.33%eth1mask=255.255.0.0%type=dhcpsrvr%disable_rp_filter=true%dns1=8.8.8.8%dns2=8.8.4.4%ip6dns1=%ip6dns2=%baremetalnotificationsecuritykey=uavJByNGGjNLrELG-qbdN99__1I3tnp8qa0KbcsKokKJcPB43K9s6oQu2nMLqo3YP8p6jqDy5XT3WWOWBA2yNw%baremetalnotificationapikey=8JH4mdkxsEMhgIBgMonkNXAEKjVOeZnG1m5UVekvvo4v_iXQ4ZS7rh6NNS0qphhc7ZrCauiz23tp2-Wa3AASlg%host=10.2.30.11%port=8080
>> .  Output is:
>> .
>> 2016-12-13 10:05:45,895 WARN  [kvm.resource.LibvirtComputingResource]
>> (agentRequest-Handler-1:null) (logid:633e6e03) Timed out:
>> /usr/share/cloudstack-common/scripts/network/domr/router_proxy.sh
>> vr_cfg.sh 169.254.0.33 -c
>> /var/cache/cloud/VR-48ea8a95-6c02-499f-88d3-eae5bf9f9fbe.cfg .  Output
>> is:
>>
>> As mentioned, this only happens with 1 network (always the same
>> network). The other router VMs work OK. Any clues on how to
>> troubleshoot this further, would be greatly appreciated.
>>
>> Thanks.
>>
>> --
>> --sazli
>> Syahrul Sazli Shaharir 
>>



-- 
--sazli
Syahrul Sazli Shaharir 
Mobile: +6019 385 8301 - YM/Skype: syahrulsazli
System Administrator
TMK Pulasan (002339810-M) http://pulasan.my/
11 Jalan 3/4, 43650 Bandar Baru Bangi, Selangor, Malaysia.
Tel/Fax: +603 8926 0338



dag.sonst...@shapeblue.com
www.shapeblue.com
53 Chandos Place, Covent Garden, London WC2N 4HS, UK
@shapeblue

Re: Router VM: patchviasocket.py timeout issue on 1 out of 4 networks

2016-12-15 Thread Syahrul Sazli Shaharir
Hi Ilya,

I've looked at the suggested patch. It looks like it was committed
into qemu 2.4.0, and I can see the modified parts in the latest qemu
2.6.0 code. So I went ahead and installed qemu-kvm-ev-2.6.0-27.1 on
one of the hosts, but the problem still persists. Perhaps I should
bring this issue to that dev thread.

Thanks for the help! :)

On Thu, Dec 15, 2016 at 11:03 AM, ilya  wrote:
> This explains a bit more about how this issue came about and how to fix
> it:
> https://www.mail-archive.com/dev@cloudstack.apache.org/msg71559.html
>
> On 12/12/16 6:31 PM, Simon Weller wrote:
>> Can you turn on agent debug mode and take a look at the debug level logs?
>>
>>
>> You can do that by running sed -i 's/INFO/DEBUG/g' 
>> /etc/cloudstack/agent/log4j-cloud.xml on the host and then restarting the 
>> agent.
>>
>>
>> - Si
>>
>>
>>
>>
>> 
>> From: Syahrul Sazli Shaharir 
>> Sent: Monday, December 12, 2016 8:21 PM
>> To: users@cloudstack.apache.org
>> Subject: Router VM: patchviasocket.py timeout issue on 1 out of 4 networks
>>
>> Hi,
>>
>> I am running the latest CloudStack 4.9.0.1 in a CentOS 7 KVM + Ceph
>> environment. After running for some time, I ran into an issue with
>> one out of 4 networks: following a heartbeat-induced reset on all
>> hosts, the associated virtual router would not get recreated and
>> started properly on any of the 3 hosts I have, even after repeated
>> attempts at the following:
>> - destroy-recreate cycles, via Cloudstack UI
>> - restartNetwork cleanup=true API calls (failed with errorcode = 530).
>> - redownload and reregister system VM template as another entry and
>> assign to router VM in global setting (boots the new template OK, but
>> still same problem)
>> - tweak default system offering for router VM (increased RAM from 256 to 
>> 512MB)
>> - created new system offering, with RAM tweak, and use of ceph rbd
>> store, and assigned it to Cloud.Com-SoftwareRouter as per docs - which
>> didn't work for some reason: it kept on using initial default offering
>> and created image on local host storage
>> - upgrade to latest cloudstack (previously was running 4.8)
>>
>> As with a handful of others in this list's archives, virsh list and
>> dumpxml show the VM created OK, but it fails soon after booting, as
>> seen in the following errors in agent.log:
>>
>> 2016-12-13 10:03:33,894 WARN  [kvm.resource.LibvirtComputingResource]
>> (agentRequest-Handler-1:null) (logid:633e6e03) Timed out:
>> /usr/share/cloudstack-common/scripts/vm/hypervisor/kvm/patchviasocket.py
>> -n r-668-VM -p 
>> %template=domP%name=r-668-VM%eth0ip=10.3.28.10%eth0mask=255.255.255.0%gateway=10.3.28.1%domain=nocser.net%cidrsize=24%dhcprange=10.3.28.1%eth1ip=169.254.0.33%eth1mask=255.255.0.0%type=dhcpsrvr%disable_rp_filter=true%dns1=8.8.8.8%dns2=8.8.4.4%ip6dns1=%ip6dns2=%baremetalnotificationsecuritykey=uavJByNGGjNLrELG-qbdN99__1I3tnp8qa0KbcsKokKJcPB43K9s6oQu2nMLqo3YP8p6jqDy5XT3WWOWBA2yNw%baremetalnotificationapikey=8JH4mdkxsEMhgIBgMonkNXAEKjVOeZnG1m5UVekvvo4v_iXQ4ZS7rh6NNS0qphhc7ZrCauiz23tp2-Wa3AASlg%host=10.2.30.11%port=8080
>> .  Output is:
>> .
>> 2016-12-13 10:05:45,895 WARN  [kvm.resource.LibvirtComputingResource]
>> (agentRequest-Handler-1:null) (logid:633e6e03) Timed out:
>> /usr/share/cloudstack-common/scripts/network/domr/router_proxy.sh
>> vr_cfg.sh 169.254.0.33 -c
>> /var/cache/cloud/VR-48ea8a95-6c02-499f-88d3-eae5bf9f9fbe.cfg .  Output
>> is:
>>
>> As mentioned, this only happens with 1 network (always the same
>> network). The other router VMs work OK. Any clues on how to
>> troubleshoot this further, would be greatly appreciated.
>>
>> Thanks.
>>
>> --
>> --sazli
>> Syahrul Sazli Shaharir 
>>



-- 
--sazli
Syahrul Sazli Shaharir 
Mobile: +6019 385 8301 - YM/Skype: syahrulsazli
System Administrator
TMK Pulasan (002339810-M) http://pulasan.my/
11 Jalan 3/4, 43650 Bandar Baru Bangi, Selangor, Malaysia.
Tel/Fax: +603 8926 0338


Re: Router VM: patchviasocket.py timeout issue on 1 out of 4 networks

2016-12-14 Thread ilya
This explains a bit more about how this issue came about and how to fix
it:
https://www.mail-archive.com/dev@cloudstack.apache.org/msg71559.html

On 12/12/16 6:31 PM, Simon Weller wrote:
> Can you turn on agent debug mode and take a look at the debug level logs?
> 
> 
> You can do that by running sed -i 's/INFO/DEBUG/g' 
> /etc/cloudstack/agent/log4j-cloud.xml on the host and then restarting the 
> agent.
> 
> 
> - Si
> 
> 
> 
> 
> 
> From: Syahrul Sazli Shaharir 
> Sent: Monday, December 12, 2016 8:21 PM
> To: users@cloudstack.apache.org
> Subject: Router VM: patchviasocket.py timeout issue on 1 out of 4 networks
> 
> Hi,
> 
> I am running the latest CloudStack 4.9.0.1 in a CentOS 7 KVM + Ceph
> environment. After running for some time, I ran into an issue with
> one out of 4 networks: following a heartbeat-induced reset on all
> hosts, the associated virtual router would not get recreated and
> started properly on any of the 3 hosts I have, even after repeated
> attempts at the following:
> - destroy-recreate cycles, via Cloudstack UI
> - restartNetwork cleanup=true API calls (failed with errorcode = 530).
> - redownload and reregister system VM template as another entry and
> assign to router VM in global setting (boots the new template OK, but
> still same problem)
> - tweak default system offering for router VM (increased RAM from 256 to 
> 512MB)
> - created new system offering, with RAM tweak, and use of ceph rbd
> store, and assigned it to Cloud.Com-SoftwareRouter as per docs - which
> didn't work for some reason: it kept on using initial default offering
> and created image on local host storage
> - upgrade to latest cloudstack (previously was running 4.8)
> 
> As with a handful of others in this list's archives, virsh list and
> dumpxml show the VM created OK, but it fails soon after booting, as
> seen in the following errors in agent.log:
> 
> 2016-12-13 10:03:33,894 WARN  [kvm.resource.LibvirtComputingResource]
> (agentRequest-Handler-1:null) (logid:633e6e03) Timed out:
> /usr/share/cloudstack-common/scripts/vm/hypervisor/kvm/patchviasocket.py
> -n r-668-VM -p 
> %template=domP%name=r-668-VM%eth0ip=10.3.28.10%eth0mask=255.255.255.0%gateway=10.3.28.1%domain=nocser.net%cidrsize=24%dhcprange=10.3.28.1%eth1ip=169.254.0.33%eth1mask=255.255.0.0%type=dhcpsrvr%disable_rp_filter=true%dns1=8.8.8.8%dns2=8.8.4.4%ip6dns1=%ip6dns2=%baremetalnotificationsecuritykey=uavJByNGGjNLrELG-qbdN99__1I3tnp8qa0KbcsKokKJcPB43K9s6oQu2nMLqo3YP8p6jqDy5XT3WWOWBA2yNw%baremetalnotificationapikey=8JH4mdkxsEMhgIBgMonkNXAEKjVOeZnG1m5UVekvvo4v_iXQ4ZS7rh6NNS0qphhc7ZrCauiz23tp2-Wa3AASlg%host=10.2.30.11%port=8080
> .  Output is:
> .
> 2016-12-13 10:05:45,895 WARN  [kvm.resource.LibvirtComputingResource]
> (agentRequest-Handler-1:null) (logid:633e6e03) Timed out:
> /usr/share/cloudstack-common/scripts/network/domr/router_proxy.sh
> vr_cfg.sh 169.254.0.33 -c
> /var/cache/cloud/VR-48ea8a95-6c02-499f-88d3-eae5bf9f9fbe.cfg .  Output
> is:
> 
> As mentioned, this only happens with 1 network (always the same
> network). The other router VMs work OK. Any clues on how to
> troubleshoot this further, would be greatly appreciated.
> 
> Thanks.
> 
> --
> --sazli
> Syahrul Sazli Shaharir 
> 


Re: Router VM: patchviasocket.py timeout issue on 1 out of 4 networks

2016-12-13 Thread Syahrul Sazli Shaharir
Hi Simon,

On Tue, Dec 13, 2016 at 10:31 AM, Simon Weller  wrote:
> Can you turn on agent debug mode and take a look at the debug level logs?
>
>
> You can do that by running sed -i 's/INFO/DEBUG/g' 
> /etc/cloudstack/agent/log4j-cloud.xml on the host and then restarting the 
> agent.

Here are the debug logs. patchviasocket.py executed OK, but the agent
then could not connect to the router VM's link-local IP (169.254.3.7):

2016-12-13 19:23:18,627 DEBUG [kvm.resource.LibvirtComputingResource]
(agentRequest-Handler-4:null) (logid:0bf9a356) Executing:
/usr/share/cloudstack-common/scripts/vm/hypervisor/kvm/patchviasocket.py
-n r-669-VM -p 
%template=domP%name=r-669-VM%eth0ip=10.3.28.10%eth0mask=255.255.255.0%gateway=10.3.28.1%domain=nocser.net%cidrsize=24%dhcprange=10.3.28.1%eth1ip=169.254.3.7%eth1mask=255.255.0.0%type=dhcpsrvr%disable_rp_filter=true%dns1=8.8.8.8%dns2=8.8.4.4%ip6dns1=%ip6dns2=%baremetalnotificationsecuritykey=uavJByNGGjNLrELG-qbdN99__1I3tnp8qa0KbcsKokKJcPB43K9s6oQu2nMLqo3YP8p6jqDy5XT3WWOWBA2yNw%baremetalnotificationapikey=8JH4mdkxsEMhgIBgMonkNXAEKjVOeZnG1m5UVekvvo4v_iXQ4ZS7rh6NNS0qphhc7ZrCauiz23tp2-Wa3AASlg%host=10.2.30.11%port=8080
2016-12-13 19:23:18,739 DEBUG [kvm.resource.LibvirtComputingResource]
(agentRequest-Handler-4:null) (logid:0bf9a356) Execution is
successful.
2016-12-13 19:23:18,742 DEBUG
[resource.virtualnetwork.VirtualRoutingResource]
(agentRequest-Handler-4:null) (logid:0bf9a356) Trying to connect to
169.254.3.7
2016-12-13 19:23:21,749 DEBUG
[resource.virtualnetwork.VirtualRoutingResource]
(agentRequest-Handler-4:null) (logid:0bf9a356) Could not connect to
169.254.3.7
2016-12-13 19:23:26,750 DEBUG
[resource.virtualnetwork.VirtualRoutingResource]
(agentRequest-Handler-4:null) (logid:0bf9a356) Trying to connect to
169.254.3.7
2016-12-13 19:23:29,757 DEBUG
[resource.virtualnetwork.VirtualRoutingResource]
(agentRequest-Handler-4:null) (logid:0bf9a356) Could not connect to
169.254.3.7
2016-12-13 19:23:29,869 DEBUG [cloud.agent.Agent]
(agentRequest-Handler-5:null) (logid:981a5f6f) Processing command:
com.cloud.agent.api.GetHostStatsCommand
2016-12-13 19:23:34,759 DEBUG
[resource.virtualnetwork.VirtualRoutingResource]
(agentRequest-Handler-4:null) (logid:0bf9a356) Unable to logon to
169.254.3.7

virsh console also failed to show anything.
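
To repeat the agent's connection check by hand from the KVM host, a minimal Python sketch such as the following may help. It only tests whether a TCP connection can be opened. The port 3922 default is an assumption (the SSH port CloudStack system VMs normally listen on), and the function names are hypothetical. The 3-second timeout and 5-second pause mirror the gaps between the "Trying to connect" and "Could not connect" lines in the log above.

```python
import socket
import time


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts and timeouts.
        return False


def probe(host, port=3922, attempts=3, timeout=3.0, delay=5.0):
    """Try to connect up to `attempts` times.

    Returns the 1-based number of the attempt that succeeded, or None if
    every attempt failed. Port 3922 is an assumed default, not taken from
    the logs.
    """
    for attempt in range(1, attempts + 1):
        if can_connect(host, port, timeout):
            return attempt
        if attempt < attempts:
            time.sleep(delay)
    return None
```

Calling `probe("169.254.3.7")` on the host that runs r-669-VM returns `None` when the router is unreachable, which would match the "Unable to logon" line above. A successful probe only shows that the port is open, not that the router finished configuring itself.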

Thanks.

>
>
> - Si
>
>
>
>
> 
> From: Syahrul Sazli Shaharir 
> Sent: Monday, December 12, 2016 8:21 PM
> To: users@cloudstack.apache.org
> Subject: Router VM: patchviasocket.py timeout issue on 1 out of 4 networks
>
> Hi,
>
> I am running the latest CloudStack 4.9.0.1 in a CentOS 7 KVM + Ceph
> environment. After running for some time, I ran into an issue with
> one out of 4 networks: following a heartbeat-induced reset on all
> hosts, the associated virtual router would not get recreated and
> started properly on any of the 3 hosts I have, even after repeated
> attempts at the following:
> - destroy-recreate cycles, via Cloudstack UI
> - restartNetwork cleanup=true API calls (failed with errorcode = 530).
> - redownload and reregister system VM template as another entry and
> assign to router VM in global setting (boots the new template OK, but
> still same problem)
> - tweak default system offering for router VM (increased RAM from 256 to 
> 512MB)
> - created new system offering, with RAM tweak, and use of ceph rbd
> store, and assigned it to Cloud.Com-SoftwareRouter as per docs - which
> didn't work for some reason: it kept on using initial default offering
> and created image on local host storage
> - upgrade to latest cloudstack (previously was running 4.8)
>
> As with a handful of others in this list's archives, virsh list and
> dumpxml show the VM created OK, but it fails soon after booting, as
> seen in the following errors in agent.log:
>
> 2016-12-13 10:03:33,894 WARN  [kvm.resource.LibvirtComputingResource]
> (agentRequest-Handler-1:null) (logid:633e6e03) Timed out:
> /usr/share/cloudstack-common/scripts/vm/hypervisor/kvm/patchviasocket.py
> -n r-668-VM -p 
> %template=domP%name=r-668-VM%eth0ip=10.3.28.10%eth0mask=255.255.255.0%gateway=10.3.28.1%domain=nocser.net%cidrsize=24%dhcprange=10.3.28.1%eth1ip=169.254.0.33%eth1mask=255.255.0.0%type=dhcpsrvr%disable_rp_filter=true%dns1=8.8.8.8%dns2=8.8.4.4%ip6dns1=%ip6dns2=%baremetalnotificationsecuritykey=uavJByNGGjNLrELG-qbdN99__1I3tnp8qa0KbcsKokKJcPB43K9s6oQu2nMLqo3YP8p6jqDy5XT3WWOWBA2yNw%baremetalnotificationapikey=8JH4mdkxsEMhgIBgMonkNXAEKjVOeZnG1m5UVekvvo4v_iXQ4ZS7rh6NNS0qphhc7ZrCauiz23tp2-Wa3AASlg%host=10.2.30.11%port=8080
> .  Output is:
> .
> 2016-12-13 10:05:45,895 WARN  [kvm.resource.LibvirtComputingResource]
> (agentRequest-Handler-1:null) (logid:633e6e03) Timed out:
> /usr/share/cloudstack-common/scripts/network/domr/router_proxy.sh
> vr_cfg.sh 169.254.0.33 -c
> /var/cache/cloud/VR-48ea8a95-6c02-499f-88d3-eae5bf9f9fbe.cfg .  Output
> is:
>
> As mentioned, this only happens with 1 network (always the same
> network). The other router VMs work OK. 

Re: Router VM: patchviasocket.py timeout issue on 1 out of 4 networks

2016-12-12 Thread Simon Weller
Can you turn on agent debug mode and take a look at the debug level logs?


You can do that by running sed -i 's/INFO/DEBUG/g' 
/etc/cloudstack/agent/log4j-cloud.xml on the host and then restarting the agent.
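
For anyone who wants the change to be easy to revert, here is a minimal Python sketch of the same substitution that keeps a backup copy first. The function names and the `.bak` suffix are hypothetical. Unlike the sed command, it only replaces `INFO` as a whole word, which is an assumption about what is wanted here.

```python
import re
import shutil
from pathlib import Path


def set_log_level(xml_text, old="INFO", new="DEBUG"):
    """Replace whole-word occurrences of `old` with `new` in the given text."""
    return re.sub(rf"\b{re.escape(old)}\b", new, xml_text)


def enable_agent_debug(path):
    """Back up the log4j config at `path`, then switch INFO to DEBUG in place.

    Returns the path of the backup file (the original name plus '.bak').
    """
    config = Path(path)
    backup = Path(str(config) + ".bak")
    shutil.copy2(config, backup)  # keep the original so it can be restored
    config.write_text(set_log_level(config.read_text()))
    return backup
```

Running `enable_agent_debug("/etc/cloudstack/agent/log4j-cloud.xml")` as root and then restarting the agent should have the same effect as the sed command. Copying the `.bak` file back and restarting again returns the agent to INFO logging.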


- Si





From: Syahrul Sazli Shaharir 
Sent: Monday, December 12, 2016 8:21 PM
To: users@cloudstack.apache.org
Subject: Router VM: patchviasocket.py timeout issue on 1 out of 4 networks

Hi,

I am running the latest CloudStack 4.9.0.1 in a CentOS 7 KVM + Ceph
environment. After running for some time, I ran into an issue with
one out of 4 networks: following a heartbeat-induced reset on all
hosts, the associated virtual router would not get recreated and
started properly on any of the 3 hosts I have, even after repeated
attempts at the following:
- destroy-recreate cycles, via Cloudstack UI
- restartNetwork cleanup=true API calls (failed with errorcode = 530).
- redownload and reregister system VM template as another entry and
assign to router VM in global setting (boots the new template OK, but
still same problem)
- tweak default system offering for router VM (increased RAM from 256 to 512MB)
- created new system offering, with RAM tweak, and use of ceph rbd
store, and assigned it to Cloud.Com-SoftwareRouter as per docs - which
didn't work for some reason: it kept on using initial default offering
and created image on local host storage
- upgrade to latest cloudstack (previously was running 4.8)

As with a handful of others in this list's archives, virsh list and
dumpxml show the VM created OK, but it fails soon after booting, as
seen in the following errors in agent.log:

2016-12-13 10:03:33,894 WARN  [kvm.resource.LibvirtComputingResource]
(agentRequest-Handler-1:null) (logid:633e6e03) Timed out:
/usr/share/cloudstack-common/scripts/vm/hypervisor/kvm/patchviasocket.py
-n r-668-VM -p 
%template=domP%name=r-668-VM%eth0ip=10.3.28.10%eth0mask=255.255.255.0%gateway=10.3.28.1%domain=nocser.net%cidrsize=24%dhcprange=10.3.28.1%eth1ip=169.254.0.33%eth1mask=255.255.0.0%type=dhcpsrvr%disable_rp_filter=true%dns1=8.8.8.8%dns2=8.8.4.4%ip6dns1=%ip6dns2=%baremetalnotificationsecuritykey=uavJByNGGjNLrELG-qbdN99__1I3tnp8qa0KbcsKokKJcPB43K9s6oQu2nMLqo3YP8p6jqDy5XT3WWOWBA2yNw%baremetalnotificationapikey=8JH4mdkxsEMhgIBgMonkNXAEKjVOeZnG1m5UVekvvo4v_iXQ4ZS7rh6NNS0qphhc7ZrCauiz23tp2-Wa3AASlg%host=10.2.30.11%port=8080
.  Output is:
.
2016-12-13 10:05:45,895 WARN  [kvm.resource.LibvirtComputingResource]
(agentRequest-Handler-1:null) (logid:633e6e03) Timed out:
/usr/share/cloudstack-common/scripts/network/domr/router_proxy.sh
vr_cfg.sh 169.254.0.33 -c
/var/cache/cloud/VR-48ea8a95-6c02-499f-88d3-eae5bf9f9fbe.cfg .  Output
is:
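
The `-p` argument in the first log entry appears to be a flat list of `key=value` pairs separated by `%`. A small Python sketch like the one below (the function name is hypothetical, and it assumes values never contain `%`) makes it easier to read off individual settings, such as the `eth1ip` link-local address that the later router_proxy.sh call targets.

```python
def parse_boot_args(payload):
    """Split a '%key=value%key=value' string into a dict.

    Empty values (e.g. 'ip6dns1=') are kept as empty strings. Assumes that
    '%' never appears inside a value.
    """
    args = {}
    for field in payload.split("%"):
        if not field:
            continue  # skip the empty piece before the leading '%'
        key, _, value = field.partition("=")
        args[key] = value
    return args


# Shortened version of the payload from the log above.
sample = "%template=domP%name=r-668-VM%eth1ip=169.254.0.33%type=dhcpsrvr%ip6dns1=%port=8080"
boot_args = parse_boot_args(sample)
print(boot_args["name"], boot_args["eth1ip"], boot_args["type"])
```

Here `boot_args["eth1ip"]` is `169.254.0.33`, the same address that appears in the vr_cfg.sh timeout a couple of minutes later.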

As mentioned, this only happens with 1 network (always the same
network). The other router VMs work OK. Any clues on how to
troubleshoot this further, would be greatly appreciated.

Thanks.

--
--sazli
Syahrul Sazli Shaharir