Re: AW: upgrading 4.5.2 -> 4.6.0 virtualrouter upgrade timeout

2016-01-29 Thread Daniel Mezentsev

 Hi All,

Sorry I didn't participate in the discussion; I hit a pretty big issue with the cloud
yesterday. A very interesting one.
I have 2 networks with 2 dedicated VRs (not VPC). One of the networks carries the
default gateway, the second one is just for internal interconnect between VMs.
Both networks use the IsolatedWithNoSNAT offering, basically only DHCP. Both
networks were started. I started firing up VMs and got a default gateway on the
second NIC (it's configured to be on the first). While troubleshooting it I found
that the default gateway was overwritten when the second interface was activated.
From my understanding, the VR of a network not designated as "default" should not
provide the DHCP "router" option, but it did.
Moreover, it provided the wrong default gateway. In my configuration the default
gateway points to a device in the reserved IP range (an external router), but the VR
pushed its own IP address as the default gateway, so the network was completely down.
Here are some numbers:

network 1 - 10.1.12.0/24, range 10.1.12.1-240, default gw 10.1.12.254
network 2 - 10.1.22.0/24, range 10.1.22.1-240, default gw 10.1.22.254 (I
have to specify it here, but I don't really need it; moreover that IP is
down)

NIC1 on all VMs in network1, NIC1 is default
NIC2 on all VMs in network2

But I got the default gw from network2, and it was 10.1.22.1, which is not in
the configuration at all.
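
In case someone wants to double-check the same thing: here is roughly how I verified
what the network2 VR was actually handing out. The paths are assumptions from my 4.6
routers (the generated dnsmasq config lives under /etc/dnsmasq.d/), and DHCP option
"3" is the router/default-gateway option:

# inside the network2 VR, via its link-local console/ssh (hedged sketch)
grep -r 'dhcp-option=tag:interface' /etc/dnsmasq.d/ /etc/dnsmasq.conf 2>/dev/null | grep ',3,'

# and on a test guest, what dhclient actually received (lease path varies by distro)
grep -i 'option routers' /var/lib/dhclient/dhclient*.leases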

That is very confusing. I wasn't able to fix that issue, so I removed
network2 completely.

The configuration above was working with 4.5 and 4.4, which points to something
definitely wrong with the 4.6 system VM image.

The suggestion to increase the timeout is more a workaround than a real fix for
the problem. There is no way a router should take 4-5 minutes to boot.
 


The routers from Daniel work when we bumped global setting
router.aggregation.command.each.timeout to 20 or higher. You may need to
upgrade to 4.7.1 for that to work, but that's a good thing to do anyway
because it has a load of fixes over 4.6. The upgrade is quick and
painless.

For the record, it has nothing to do with the template. Be happy we
reuse it; it makes upgrading way easier.

Regards, Remi

Sent from my iPhone


On 28 Jan 2016, at 07:12, Daniel Mezentsev  wrote:

4.7.1 also affected.
All versions use 4.6 templates.

P.S.: Would it be useful to move the discussion to the dev list? It looks like
it's a pretty big issue.


Hi!

After upgrading from 4.5.2 to 4.6.2, I also face this issue. We use both
KVM and XenServer. Any stable workaround for this?

Thanx

Martin

-----Original Message-----
From: Andrei Mikhailovsky [mailto:and...@arhont.com]
Sent: Friday, 18 December 2015 02:55
To: users 
Subject: Re: upgrading 4.5.2 -> 4.6.0 virtualrouter upgrade timeout

Hello guys,

I've just upgraded to 4.6.2 and have similar issues with three virtual
routers out of 22 in total. They are all failing exactly the same way as
described here.

Has anyone found a permanent workaround for this issue?

Thanks

Andrei

- Original Message -

From: "Stephan Seitz" 
To: "users" 
Sent: Monday, 30 November, 2015 19:53:57
Subject: Re: upgrading 4.5.2 -> 4.6.0 virtualrouter upgrade timeout



Does anybody else experience problems due to (very) slow deployment of
VRs?

On Tuesday, 24.11.2015, at 16:31 +0100, Stephan Seitz wrote:

Update / FYI:
After faking the particular VRu in SQL, I tried to restart that
network, and it always fails. To me it looks like
update_config.py - which takes almost all CPU resources - runs way
longer than any watchdog will accept.

I'm able to mitigate that with some very nasty workarounds:
a) start the router
b) wait until it's provisioned
c) restart cloudstack-management
d) update vm_instance
set state='Running',
power_state='PowerOn' where name = 'r-XXX-VM';
e) once: update domain_router
set template_version="Cloudstack Release 4.6.0 Wed Nov 4 08:22:47 UTC
2015", scripts_version="546c9e7ac38e0aa16ecc498899dac8e2"
where id=XXX;
f) wait until update_config.py finishes (for me that's about 15
minutes)
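
A rough way to tell when step f) is actually done, from inside the VR (the script
name and log path are the ones that show up in the cloud.log excerpts further down
in this thread):

# hedged sketch; run on the VR via its link-local console/ssh
ps aux | grep '[u]pdate_config.py'   # no output -> configuration has finished
tail -f /var/log/cloud.log           # VR config progress is logged here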

Since I expect the need for VR restarts in the future, this behaviour
is rather unsatisfying. It needs a lot of error-prone manual intervention.

I'm quite unsure whether this was introduced with the update or whether this
particular VR simply had not been restarted after getting configured with
lots of IPs and rules.

On Tuesday, 24.11.2015, at 12:29 +0100, Stephan Seitz wrote:
Hi List!

After upgrading from 4.5.2 to 4.6.0 I faced a problem with one
virtualrouter. This particular VR has about 10 IPs w/ LB and FW
rules
defined. During the upgrade process, and after about 4-5 minutes a
watchdog kicks in and kills the respective VR due to no response.

So far I didn't find any timeout value in the global settings.
Temporarily setting network.router.EnableServiceMonitoring to false
doesn't change the behaviour.

Any help on how to mitigate that nasty timeout would be really
appreciated :)

cheers,

Stephan

From within the VR, the logs 

Re: AW: AW: AW: AW: upgrading 4.5.2 -> 4.6.0 virtualrouter upgrade timeout

2016-01-28 Thread Remi Bergsma
Like I said, it's already much better in 4.7.1. I'm starting to see packages 
appearing, for example here: http://cloudstack.apt-get.eu/centos7/4.7/oss/
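
For those who want to point yum at it already, here is a sketch of a repo definition
(assuming the directory layout of that mirror stays as it is; import the release key
and enable gpgcheck if you prefer):

# contents of a hypothetical /etc/yum.repos.d/cloudstack.repo on a CentOS 7 management server
[cloudstack-4.7]
name=Apache CloudStack 4.7
baseurl=http://cloudstack.apt-get.eu/centos7/4.7/oss/
enabled=1
gpgcheck=0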

With 4.7.1 we tested hundreds of rules and that worked fine.

Regards,
Remi



On 28/01/16 17:24, "Martin Emrich" <martin.emr...@empolis.com> wrote:

>This was/is 4.7.0.
>
>With the KVM problem: I suspect that the KVM hosts somehow got wonky, as the 
>VR has no network connectivity (tried virsh console to it, and had no internet 
>there). Next I'll try rebooting one.
>
>Another network (with 48 firewall/portforwarding rules) took ca. 30min, the 
>first one with 22 rules took 10 minutes, so the time is consumed during 
>configuration of the rules.
>
>Ciao
>
>Martin 
>
>-----Original Message-----
>From: Remi Bergsma [mailto:rberg...@schubergphilis.com] 
>Sent: Thursday, 28 January 2016 17:20
>To: users@cloudstack.apache.org
>Subject: Re: AW: AW: AW: upgrading 4.5.2 -> 4.6.0 virtualrouter upgrade timeout
>
>The 4.7.1 packages will be there soon. I’d advise anyone currently on 4.6 to 
>upgrade to it as it has several speed improvements.
>
>What version was this Martin?
>
>Regards,
>Remi
>
>
>
>
>On 28/01/16 15:46, "Martin Emrich" <martin.emr...@empolis.com> wrote:
>
>>Thanks, that's it.
>>
>>So (at least) I seem to have two issues:
>>
>>1. VirtualRouters now fail to start on KVM (at least when there's already a 
>>VR on the KVM host) , but work on XenServer.
>>
>>2. VirtualRouters take very long to start since 4.6: The VR I just started 
>>belongs to a network with some 20 firewall and port forwarding rules, and the 
>>VR took ca. 10 minutes to configure (eating 100% CPU in the process). Thanks 
>>to setting the timeout high enough, the network finally came up.
>>I took a copy of the VR cloud.log ca. 8 minutes into the process, if anyone 
>>is interested.
>>
>>Ciao
>>
>>Martin
>>
>>-----Original Message-----
>>From: Remi Bergsma [mailto:rberg...@schubergphilis.com] 
>>Sent: Thursday, 28 January 2016 14:09
>>To: users@cloudstack.apache.org
>>Subject: Re: AW: AW: upgrading 4.5.2 -> 4.6.0 virtualrouter upgrade timeout
>>
>>Hi Martin,
>>
>>4.7.1 has a fix for the overall timeout of 120 seconds. I expect the packages 
>>will be ready in a day or two.
>>
>>If XenServer works, try setting system.vm.default.hypervisor to XenServer and 
>>it will not use another hypervisor.
>>
>>Regards,
>>Remi
>>
>>
>>
>>On 28/01/16 13:38, "Martin Emrich" <martin.emr...@empolis.com> wrote:
>>
>>>Hi!
>>>
>>>To follow up:
>>>
>>>- I upgraded to 4.7.0 (there are no 4.7.1 el6 RPMs yet, neither from 
>>>CloudStack nor from Shapeblue)
>>>- The problem still persists
>>>- It seems that VRs can be created on XenServer, but not on KVM. I tried 
>>>forcing new VRs to XenServers only via host tags, but the decision to use 
>>>KVM is being made before the tags are evaluated, so this leaves no hosts 
>>>when the decision for KVM is made.
>>>- I see this in the KVM host agent.log:
>>>2016-01-28 13:29:10,956 WARN  [kvm.resource.LibvirtComputingResource] 
>>>(Script-2:null) (logid:) Interrupting script.
>>>2016-01-28 13:29:10,958 WARN  [kvm.resource.LibvirtComputingResource] 
>>>(agentRequest-Handler-4:null) (logid:7d8981d1) Timed out: 
>>>/usr/share/cloudstack-common/scripts/network/domr/router_proxy.sh vr_cfg.sh 
>>>169.254.3.252 -c 
>>>/var/cache/cloud/VR-0193b95c-96ee-4e0d-a527-964960baa49e.cfg .  Output is:
>>>- router.aggregation.command.each.timeout is set to a whopping 600s, but the 
>>>error appears ca. 1-2 minutes after I click on "restart network" with 
>>>cleanup=true.
>>>
>>>
>>>Any Ideas?
>>>
>>>Thanks
>>>
>>>Martin
>>>


AW: AW: upgrading 4.5.2 -> 4.6.0 virtualrouter upgrade timeout

2016-01-28 Thread Martin Emrich
Hi!

To follow up:

- I upgraded to 4.7.0 (there are no 4.7.1 el6 RPMs yet, neither from CloudStack 
nor from Shapeblue)
- The problem still persists
- It seems that VRs can be created on XenServer, but not on KVM. I tried 
forcing new VRs to XenServers only via host tags, but the decision to use KVM 
is being made before the tags are evaluated, so this leaves no hosts when the 
decision for KVM is made.
- I see this in the KVM host agent.log:
2016-01-28 13:29:10,956 WARN  [kvm.resource.LibvirtComputingResource] 
(Script-2:null) (logid:) Interrupting script.
2016-01-28 13:29:10,958 WARN  [kvm.resource.LibvirtComputingResource] 
(agentRequest-Handler-4:null) (logid:7d8981d1) Timed out: 
/usr/share/cloudstack-common/scripts/network/domr/router_proxy.sh vr_cfg.sh 
169.254.3.252 -c /var/cache/cloud/VR-0193b95c-96ee-4e0d-a527-964960baa49e.cfg . 
 Output is:
- router.aggregation.command.each.timeout is set to a whopping 600s, but the 
error appears ca. 1-2 minutes after I click on "restart network" with 
cleanup=true.
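
As a next step (not done yet) I plan to run the exact command from the agent.log
above by hand on the KVM host and time it, to see whether the command itself is slow
or whether some other limit kicks in; the link-local IP and cfg file name below are
just the ones from my log:

time /usr/share/cloudstack-common/scripts/network/domr/router_proxy.sh vr_cfg.sh \
    169.254.3.252 -c /var/cache/cloud/VR-0193b95c-96ee-4e0d-a527-964960baa49e.cfg

If that completes but only after more than ~2 minutes, then whatever is timing out
is clearly not router.aggregation.command.each.timeout.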


Any Ideas?

Thanks

Martin



Re: AW: AW: upgrading 4.5.2 -> 4.6.0 virtualrouter upgrade timeout

2016-01-28 Thread Remi Bergsma
Hi Martin,

4.7.1 has a fix for the overall timeout of 120 seconds. I expect the packages 
will be ready in a day or two.

If XenServer works, try setting system.vm.default.hypervisor to XenServer and 
it will not use another hypervisor.
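
If you prefer the CLI over the UI for that, something like this should work with
CloudMonkey (hedged, from memory; as with other global settings a management-server
restart may be needed before it takes effect):

list configurations name=system.vm.default.hypervisor
update configuration name=system.vm.default.hypervisor value=XenServer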

Regards,
Remi



On 28/01/16 13:38, "Martin Emrich"  wrote:

>Hi!
>
>To follow up:
>
>- I upgraded to 4.7.0 (there are no 4.7.1 el6 RPMs yet, neither from 
>CloudStack nor from Shapeblue)
>- The problem still persists
>- It seems that VRs can be created on XenServer, but not on KVM. I tried 
>forcing new VRs to XenServers only via host tags, but the decision to use KVM 
>is being made before the tags are evaluated, so this leaves no hosts when the 
>decision for KVM is made.
>- I see this in the KVM host agent.log:
>2016-01-28 13:29:10,956 WARN  [kvm.resource.LibvirtComputingResource] 
>(Script-2:null) (logid:) Interrupting script.
>2016-01-28 13:29:10,958 WARN  [kvm.resource.LibvirtComputingResource] 
>(agentRequest-Handler-4:null) (logid:7d8981d1) Timed out: 
>/usr/share/cloudstack-common/scripts/network/domr/router_proxy.sh vr_cfg.sh 
>169.254.3.252 -c /var/cache/cloud/VR-0193b95c-96ee-4e0d-a527-964960baa49e.cfg 
>.  Output is:
>- router.aggregation.command.each.timeout is set to a whopping 600s, but the 
>error appears ca. 1-2 minutes after I click on "restart network" with 
>cleanup=true.
>
>
>Any Ideas?
>
>Thanks
>
>Martin
>


AW: AW: AW: upgrading 4.5.2 -> 4.6.0 virtualrouter upgrade timeout

2016-01-28 Thread Martin Emrich
Thanks, that's it.

So (at least) I seem to have two issues:

1. VirtualRouters now fail to start on KVM (at least when there's already a VR 
on the KVM host), but work on XenServer.

2. VirtualRouters take very long to start since 4.6: The VR I just started 
belongs to a network with some 20 firewall and port forwarding rules, and the 
VR took ca. 10 minutes to configure (eating 100% CPU in the process). Thanks to 
setting the timeout high enough, the network finally came up.
I took a copy of the VR cloud.log ca. 8 minutes into the process, if anyone is 
interested.
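
(For reference, this is roughly how I pulled it off the router from its KVM host;
the port and key are the usual system-VM access, and the link-local IP is of course
specific to the router in question:

scp -P 3922 -i /root/.ssh/id_rsa.cloud root@169.254.x.x:/var/log/cloud.log .
)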

Ciao

Martin

-----Original Message-----
From: Remi Bergsma [mailto:rberg...@schubergphilis.com] 
Sent: Thursday, 28 January 2016 14:09
To: users@cloudstack.apache.org
Subject: Re: AW: AW: upgrading 4.5.2 -> 4.6.0 virtualrouter upgrade timeout

Hi Martin,

4.7.1 has a fix for the overall timeout of 120 seconds. I expect the packages 
will be ready in a day or two.

If XenServer works, try setting system.vm.default.hypervisor to XenServer and 
it will not use another hypervisor.

Regards,
Remi



On 28/01/16 13:38, "Martin Emrich" <martin.emr...@empolis.com> wrote:

>Hi!
>
>To follow up:
>
>- I upgraded to 4.7.0 (there are no 4.7.1 el6 RPMs yet, neither from 
>CloudStack nor from Shapeblue)
>- The problem still persists
>- It seems that VRs can be created on XenServer, but not on KVM. I tried 
>forcing new VRs to XenServers only via host tags, but the decision to use KVM 
>is being made before the tags are evaluated, so this leaves no hosts when the 
>decision for KVM is made.
>- I see this in the KVM host agent.log:
>2016-01-28 13:29:10,956 WARN  [kvm.resource.LibvirtComputingResource] 
>(Script-2:null) (logid:) Interrupting script.
>2016-01-28 13:29:10,958 WARN  [kvm.resource.LibvirtComputingResource] 
>(agentRequest-Handler-4:null) (logid:7d8981d1) Timed out: 
>/usr/share/cloudstack-common/scripts/network/domr/router_proxy.sh vr_cfg.sh 
>169.254.3.252 -c /var/cache/cloud/VR-0193b95c-96ee-4e0d-a527-964960baa49e.cfg 
>.  Output is:
>- router.aggregation.command.each.timeout is set to a whopping 600s, but the 
>error appears ca. 1-2 minutes after I click on "restart network" with 
>cleanup=true.
>
>
>Any Ideas?
>
>Thanks
>
>Martin
>


AW: AW: AW: AW: upgrading 4.5.2 -> 4.6.0 virtualrouter upgrade timeout

2016-01-28 Thread Martin Emrich
This was/is 4.7.0.

Regarding the KVM problem: I suspect that the KVM hosts somehow got wonky, as the VR 
has no network connectivity (tried virsh console to it, and had no internet 
there). Next I'll try rebooting one.

Another network (with 48 firewall/portforwarding rules) took ca. 30min, the 
first one with 22 rules took 10 minutes, so the time is consumed during 
configuration of the rules.
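
(That works out to roughly 37 seconds per rule for the 48-rule network (~1800 s / 48)
and roughly 27 seconds per rule for the 22-rule one (~600 s / 22), so the cost per
rule is in the same ballpark both times and the total time scales more or less
linearly with the number of rules.)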

Ciao

Martin 

-----Original Message-----
From: Remi Bergsma [mailto:rberg...@schubergphilis.com] 
Sent: Thursday, 28 January 2016 17:20
To: users@cloudstack.apache.org
Subject: Re: AW: AW: AW: upgrading 4.5.2 -> 4.6.0 virtualrouter upgrade timeout

The 4.7.1 packages will be there soon. I’d advise anyone currently on 4.6 to 
upgrade to it as it has several speed improvements.

What version was this Martin?

Regards,
Remi




On 28/01/16 15:46, "Martin Emrich" <martin.emr...@empolis.com> wrote:

>Thanks, that's it.
>
>So (at least) I seem to have two issues:
>
>1. VirtualRouters now fail to start on KVM (at least when there's already a VR 
>on the KVM host) , but work on XenServer.
>
>2. VirtualRouters take very long to start since 4.6: The VR I just started 
>belongs to a network with some 20 firewall and port forwarding rules, and the 
>VR took ca. 10 minutes to configure (eating 100% CPU in the process). Thanks 
>to setting the timeout high enough, the network finally came up.
>I took a copy of the VR cloud.log ca. 8 minutes into the process, if anyone is 
>interested.
>
>Ciao
>
>Martin
>
>-----Original Message-----
>From: Remi Bergsma [mailto:rberg...@schubergphilis.com] 
>Sent: Thursday, 28 January 2016 14:09
>To: users@cloudstack.apache.org
>Subject: Re: AW: AW: upgrading 4.5.2 -> 4.6.0 virtualrouter upgrade timeout
>
>Hi Martin,
>
>4.7.1 has a fix for the overall timeout of 120 seconds. I expect the packages 
>will be ready in a day or two.
>
>If XenServer works, try setting system.vm.default.hypervisor to XenServer and 
>it will not use another hypervisor.
>
>Regards,
>Remi
>
>
>
>On 28/01/16 13:38, "Martin Emrich" <martin.emr...@empolis.com> wrote:
>
>>Hi!
>>
>>To follow up:
>>
>>- I upgraded to 4.7.0 (there are no 4.7.1 el6 RPMs yet, neither from 
>>CloudStack nor from Shapeblue)
>>- The problem still persists
>>- It seems that VRs can be created on XenServer, but not on KVM. I tried 
>>forcing new VRs to XenServers only via host tags, but the decision to use KVM 
>>is being made before the tags are evaluated, so this leaves no hosts when the 
>>decision for KVM is made.
>>- I see this in the KVM host agent.log:
>>2016-01-28 13:29:10,956 WARN  [kvm.resource.LibvirtComputingResource] 
>>(Script-2:null) (logid:) Interrupting script.
>>2016-01-28 13:29:10,958 WARN  [kvm.resource.LibvirtComputingResource] 
>>(agentRequest-Handler-4:null) (logid:7d8981d1) Timed out: 
>>/usr/share/cloudstack-common/scripts/network/domr/router_proxy.sh vr_cfg.sh 
>>169.254.3.252 -c /var/cache/cloud/VR-0193b95c-96ee-4e0d-a527-964960baa49e.cfg 
>>.  Output is:
>>- router.aggregation.command.each.timeout is set to a whopping 600s, but the 
>>error appears ca. 1-2 minutes after I click on "restart network" with 
>>cleanup=true.
>>
>>
>>Any Ideas?
>>
>>Thanks
>>
>>Martin
>>


Re: AW: upgrading 4.5.2 -> 4.6.0 virtualrouter upgrade timeout

2016-01-27 Thread Daniel Mezentsev

4.7.1 also affected.
All versions use 4.6 templates.

P.S.: Would it be useful to move the discussion to the dev list? It looks like
it's a pretty big issue.


Hi!

After upgrading from 4.5.2 to 4.6.2, I also face this issue. We use both
KVM and XenServer. Any stable workaround for this?

Thanx

Martin

-----Original Message-----
From: Andrei Mikhailovsky [mailto:and...@arhont.com]
Sent: Friday, 18 December 2015 02:55
To: users 
Subject: Re: upgrading 4.5.2 -> 4.6.0 virtualrouter upgrade timeout

Hello guys,

I've just upgraded to 4.6.2 and have similar issues with three virtual
routers out of 22 in total. They are all failing exactly the same way as
described here.

Has anyone found a permanent workaround for this issue?

Thanks

Andrei

- Original Message -

From: "Stephan Seitz" 
To: "users" 
Sent: Monday, 30 November, 2015 19:53:57
Subject: Re: upgrading 4.5.2 -> 4.6.0 virtualrouter upgrade timeout



Does anybody else experience problems due to (very) slow deployment of
VRs?

On Tuesday, 24.11.2015, at 16:31 +0100, Stephan Seitz wrote:

Update / FYI:
After faking the particular VRu in SQL, I tried to restart that
network, and it always fails. To me it looks like
update_config.py - which takes almost all CPU resources - runs way
longer than any watchdog will accept.

I'm able to mitigate that with some very nasty workarounds:
a) start the router
b) wait until it's provisioned
c) restart cloudstack-management
d) update vm_instance
set state='Running',
power_state='PowerOn' where name = 'r-XXX-VM';
e) once: update domain_router
set template_version="Cloudstack Release 4.6.0 Wed Nov 4 08:22:47 UTC
2015", scripts_version="546c9e7ac38e0aa16ecc498899dac8e2"
where id=XXX;
f) wait until update_config.py finishes (for me that's about 15
minutes)

Since I expect the need for VR restarts in the future, this behaviour
is rather unsatisfying. It needs a lot of error-prone manual intervention.

I'm quite unsure whether this was introduced with the update or whether this
particular VR simply had not been restarted after getting configured with
lots of IPs and rules.

On Tuesday, 24.11.2015, at 12:29 +0100, Stephan Seitz wrote:
Hi List!

After upgrading from 4.5.2 to 4.6.0 I faced a problem with one
virtualrouter. This particular VR has about 10 IPs w/ LB and FW
rules
defined. During the upgrade process, and after about 4-5 minutes a
watchdog kicks in and kills the respective VR due to no response.

So far I didn't find any timeout value in the global settings.
Temporarily setting network.router.EnableServiceMonitoring to false
doesn't change the behaviour.

Any help on how to mitigate that nasty timeout would be really
appreciated :)

cheers,

Stephan

From within the VR, the logs show

2015-11-24 11:24:33,807  CsFile.py search:123 Searching for
dhcp-range=interface:eth0,set:interface and replacing with
dhcp-range=interface:eth0,set:interface-eth0,10.10.22.1,static
2015-11-24 11:24:33,808  merge.py load:56 Creating data bag type
guestnetwork
2015-11-24 11:24:33,808  CsFile.py search:123 Searching for
dhcp-option=tag:interface-eth0,15 and replacing with
dhcp-option=tag:interface-eth0,15,heinlein.cloudservice
2015-11-24 11:24:33,808  CsFile.py search:123 Searching for
dhcp-option=tag:interface-eth0,6 and replacing with
dhcp-option=tag:interface
-eth0,6,10.10.22.1,195.10.208.2,91.198.250.2
2015-11-24 11:24:33,809  CsFile.py search:123 Searching for
dhcp-option=tag:interface-eth0,3, and replacing with
dhcp-option=tag:interface-eth0,3,10.10.22.1
2015-11-24 11:24:33,809  CsFile.py search:123 Searching for
dhcp-option=tag:interface-eth0,1, and replacing with
dhcp-option=tag:interface-eth0,1,255.255.255.0
2015-11-24 11:24:33,810  CsHelper.py execute:160 Executing: service
dnsmasq restart

==> /var/log/messages <==
Nov 24 11:24:34 r-504-VM shutdown[6752]: shutting down for system
halt

Broadcast message from root@r-504-VM (Tue Nov 24 11:24:34 2015):

The system is going down for system halt NOW!
Nov 24 11:24:35 r-504-VM KVP: KVP starting; pid is:6844

==> /var/log/cloud.log <==
/opt/cloud/bin/vr_cfg.sh: line 60:  6603
Killed                  /opt/cloud/bin/update_config.py
vm_dhcp_entry.json

==> /var/log/messages <==
Nov 24 11:24:35 r-504-VM cloud: VR config: executing
failed: /opt/cloud/bin/update_config.py vm_dhcp_entry.json

==> /var/log/cloud.log <==
Tue Nov 24 11:24:35 UTC 2015 : VR config: executing
failed: /opt/cloud/bin/update_config.py vm_dhcp_entry.json
Connection to 169.254.2.192 closed by remote host.
Connection to 169.254.2.192 closed.

the management-server.log shows

2015-11-24 12:24:43,015 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl]
(Work-Job-Executor-1:ctx-ad9e4658 job-5163/job-5164) Done executing
com.cloud.vm.VmWorkStart for job-5164
2015-11-24 12:24:43,017 INFO  [o.a.c.f.j.i.AsyncJobMonitor]
(Work-Job-Executor-1:ctx-ad9e4658 job-5163/job-5164) Remove job
-5164
from job monitoring
2015-11-24 12:24:43,114 ERROR 

Re: AW: upgrading 4.5.2 -> 4.6.0 virtualrouter upgrade timeout

2016-01-27 Thread Remi Bergsma
The routers from Daniel work when we bumped global setting 
router.aggregation.command.each.timeout to 20 or higher. You may need to 
upgrade to 4.7.1 for that to work, but that's a good thing to do anyway because 
it has a load of fixes over 4.6. The upgrade is quick and painless. 
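
For anyone who prefers the CLI over the UI for that change, something along these
lines should do it with CloudMonkey (hedged, from memory; 20 is just the lower bound
and routers with many rules may need more, and as with other global settings a
management-server restart may be needed before it takes effect):

update configuration name=router.aggregation.command.each.timeout value=20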

For the record, it has nothing to do with the template. Be happy we reuse it; 
it makes upgrading way easier. 

Regards, Remi 

Sent from my iPhone

> On 28 Jan 2016, at 07:12, Daniel Mezentsev  wrote:
> 
> 4.7.1 also affected.
> All versions use 4.6 templates.
> 
> P.S.: Would it be useful to move the discussion to the dev list? It looks like
> it's a pretty big issue.
> 
>> Hi!
>> 
>> After upgrading from 4.5.2 to 4.6.2, I also face this issue. We use both
>> KVM and XenServer. Any stable workaround for this?
>> 
>> Thanx
>> 
>> Martin
>> 
>> -----Original Message-----
>> From: Andrei Mikhailovsky [mailto:and...@arhont.com]
>> Sent: Friday, 18 December 2015 02:55
>> To: users 
>> Subject: Re: upgrading 4.5.2 -> 4.6.0 virtualrouter upgrade timeout
>> 
>> Hello guys,
>> 
>> I've just upgraded to 4.6.2 and have similar issues with three virtual
>> routers out of 22 in total. They are all failing exactly the same way as
>> described here.
>> 
>> Has anyone found a permanent workaround for this issue?
>> 
>> Thanks
>> 
>> Andrei
>> 
>> - Original Message -
>>> From: "Stephan Seitz" 
>>> To: "users" 
>>> Sent: Monday, 30 November, 2015 19:53:57
>>> Subject: Re: upgrading 4.5.2 -> 4.6.0 virtualrouter upgrade timeout
>> 
>>> Does anybody else experience problems due to (very) slow deployment of
>>> VRs?
>>> 
On Tuesday, 24.11.2015, at 16:31 +0100, Stephan Seitz wrote:
 Update / FYI:
After faking the particular VRu in SQL, I tried to restart that
network, and it always fails. To me it looks like
update_config.py - which takes almost all CPU resources - runs way
longer than any watchdog will accept.
 
 I'm able to mitigate that by very nasty workarounds:
 a) start the router
b) wait until it's provisioned
 c) restart cloudstack-management
 d)  update vm_instance
 set state='Running',
 power_state='PowerOn' where name = 'r-XXX-VM';
 e) once: update domain_router
 set template_version="Cloudstack Release 4.6.0 Wed Nov 4 08:22:47 UTC
 2015", scripts_version="546c9e7ac38e0aa16ecc498899dac8e2"
 where id=XXX;
f) wait until update_config.py finishes (for me that's about 15
 minutes)
 
Since I expect the need for VR restarts in the future, this behaviour
is rather unsatisfying. It needs a lot of error-prone manual intervention.
 
 I'm quite unsure if it's introduced with the update or the particular
 VR just has simply not been restarted after getting configured with
 lots of ips and rules.
 
On Tuesday, 24.11.2015, at 12:29 +0100, Stephan Seitz wrote:
 Hi List!
 
 After upgrading from 4.5.2 to 4.6.0 I faced a problem with one
 virtualrouter. This particular VR has about 10 IPs w/ LB and FW
 rules
 defined. During the upgrade process, and after about 4-5 minutes a
 watchdog kicks in and kills the respective VR due to no response.
 
 So far I didn't find any timeout value in the global settings.
 Temporarily setting network.router.EnableServiceMonitoring to false
 doesn't change the behaviour.
 
Any help on how to mitigate that nasty timeout would be really
appreciated :)
 
 cheers,
 
 Stephan
 
 From within the VR, the logs show
 
 2015-11-24 11:24:33,807  CsFile.py search:123 Searching for
 dhcp-range=interface:eth0,set:interface and replacing with
 dhcp-range=interface:eth0,set:interface-eth0,10.10.22.1,static
 2015-11-24 11:24:33,808  merge.py load:56 Creating data bag type
 guestnetwork
 2015-11-24 11:24:33,808  CsFile.py search:123 Searching for
 dhcp-option=tag:interface-eth0,15 and replacing with
 dhcp-option=tag:interface-eth0,15,heinlein.cloudservice
 2015-11-24 11:24:33,808  CsFile.py search:123 Searching for
 dhcp-option=tag:interface-eth0,6 and replacing with
 dhcp-option=tag:interface
 -eth0,6,10.10.22.1,195.10.208.2,91.198.250.2
 2015-11-24 11:24:33,809  CsFile.py search:123 Searching for
 dhcp-option=tag:interface-eth0,3, and replacing with
 dhcp-option=tag:interface-eth0,3,10.10.22.1
 2015-11-24 11:24:33,809  CsFile.py search:123 Searching for
 dhcp-option=tag:interface-eth0,1, and replacing with
 dhcp-option=tag:interface-eth0,1,255.255.255.0
 2015-11-24 11:24:33,810  CsHelper.py execute:160 Executing: service
 dnsmasq restart
 
 ==> /var/log/messages <==
 Nov 24 11:24:34 r-504-VM shutdown[6752]: shutting down for system
 halt
 
 Broadcast message from root@r-504-VM (Tue Nov 24 11:24:34 2015):
 
 

AW: upgrading 4.5.2 -> 4.6.0 virtualrouter upgrade timeout

2016-01-27 Thread Martin Emrich
Hi!

After upgrading from 4.5.2 to 4.6.2, I also face this issue. We use both KVM 
and XenServer. Any stable workaround for this?

Thanx

Martin



-----Original Message-----
From: Andrei Mikhailovsky [mailto:and...@arhont.com] 
Sent: Friday, 18 December 2015 02:55
To: users 
Subject: Re: upgrading 4.5.2 -> 4.6.0 virtualrouter upgrade timeout

Hello guys,

I've just upgraded to 4.6.2 and have similar issues with three virtual routers 
out of 22 in total. They are all failing exactly the same way as described here.

Has anyone found a permanent workaround for this issue?

Thanks

Andrei

- Original Message -
> From: "Stephan Seitz" 
> To: "users" 
> Sent: Monday, 30 November, 2015 19:53:57
> Subject: Re: upgrading 4.5.2 -> 4.6.0 virtualrouter upgrade timeout

> Does anybody else experience problems due to (very) slow deployment of 
> VRs?
> 
> 
> On Tuesday, 24.11.2015, at 16:31 +0100, Stephan Seitz wrote:
>> Update / FYI:
>> After faking the particular VRu in SQL, I tried to restart that 
>> network, and it always fails. To me it looks like 
>> update_config.py - which takes almost all CPU resources - runs way 
>> longer than any watchdog will accept.
>> 
>> I'm able to mitigate that by very nasty workarounds:
>> a) start the router
>> b) wait until it's provisioned
>> c) restart cloudstack-management
>> d)  update vm_instance
>> set state='Running',
>> power_state='PowerOn' where name = 'r-XXX-VM';
>> e) once: update domain_router
>> set template_version="Cloudstack Release 4.6.0 Wed Nov 4 08:22:47 UTC 
>> 2015", scripts_version="546c9e7ac38e0aa16ecc498899dac8e2"
>> where id=XXX;
>> f) wait until update_config.py finishes (for me that's about 15
>> minutes)
>> 
>> Since I expect the need for VR restarts in the future, this behaviour 
>> is rather unsatisfying. It needs a lot of error-prone manual intervention.
>> 
>> I'm quite unsure if it's introduced with the update or the particular 
>> VR just has simply not been restarted after getting configured with 
>> lots of ips and rules.
>> 
>> 
>> On Tuesday, 24.11.2015, at 12:29 +0100, Stephan Seitz wrote:
>> > Hi List!
>> > 
>> > After upgrading from 4.5.2 to 4.6.0 I faced a problem with one
>> > virtualrouter. This particular VR has about 10 IPs w/ LB and FW
>> > rules
>> > defined. During the upgrade process, and after about 4-5 minutes a
>> > watchdog kicks in and kills the respective VR due to no response.
>> > 
>> > So far I didn't find any timeout value in the global settings.
>> > Temporarily setting network.router.EnableServiceMonitoring to false
>> > doesn't change the behaviour.
>> > 
>> > Any help on how to mitigate that nasty timeout would be really
>> > appreciated :)
>> > 
>> > cheers,
>> > 
>> > Stephan
>> > 
>> > From within the VR, the logs show
>> > 
>> > 2015-11-24 11:24:33,807  CsFile.py search:123 Searching for
>> > dhcp-range=interface:eth0,set:interface and replacing with
>> > dhcp-range=interface:eth0,set:interface-eth0,10.10.22.1,static
>> > 2015-11-24 11:24:33,808  merge.py load:56 Creating data bag type
>> > guestnetwork
>> > 2015-11-24 11:24:33,808  CsFile.py search:123 Searching for
>> > dhcp-option=tag:interface-eth0,15 and replacing with
>> > dhcp-option=tag:interface-eth0,15,heinlein.cloudservice
>> > 2015-11-24 11:24:33,808  CsFile.py search:123 Searching for
>> > dhcp-option=tag:interface-eth0,6 and replacing with
>> > dhcp-option=tag:interface
>> > -eth0,6,10.10.22.1,195.10.208.2,91.198.250.2
>> > 2015-11-24 11:24:33,809  CsFile.py search:123 Searching for
>> > dhcp-option=tag:interface-eth0,3, and replacing with
>> > dhcp-option=tag:interface-eth0,3,10.10.22.1
>> > 2015-11-24 11:24:33,809  CsFile.py search:123 Searching for
>> > dhcp-option=tag:interface-eth0,1, and replacing with
>> > dhcp-option=tag:interface-eth0,1,255.255.255.0
>> > 2015-11-24 11:24:33,810  CsHelper.py execute:160 Executing: service
>> > dnsmasq restart
>> > 
>> > ==> /var/log/messages <==
>> > Nov 24 11:24:34 r-504-VM shutdown[6752]: shutting down for system
>> > halt
>> > 
>> > Broadcast message from root@r-504-VM (Tue Nov 24 11:24:34 2015):
>> > 
>> > The system is going down for system halt NOW!
>> > Nov 24 11:24:35 r-504-VM KVP: KVP starting; pid is:6844
>> > 
>> > ==> /var/log/cloud.log <==
>> > /opt/cloud/bin/vr_cfg.sh: line 60:  6603
>> > Killed  /opt/cloud/bin/update_config.py
>> > vm_dhcp_entry.json
>> > 
>> > ==> /var/log/messages <==
>> > Nov 24 11:24:35 r-504-VM cloud: VR config: executing
>> > failed: /opt/cloud/bin/update_config.py vm_dhcp_entry.json
>> > 
>> > ==> /var/log/cloud.log <==
>> > Tue Nov 24 11:24:35 UTC 2015 : VR config: executing
>> > failed: /opt/cloud/bin/update_config.py vm_dhcp_entry.json
>> > Connection to 169.254.2.192 closed by remote host.
>> > Connection to 169.254.2.192 closed.
>> > 
>> > 
>> > the management-server.log shows
>> > 
>> > 2015-11-24 12:24:43,015 DEBUG