Re: Very slow Virtual Router provisioning with 4.9.2.0
Hi,

To conclude: after a lot of debugging work we were able to reduce the deployment time of our VRs from ~2 hours to ~5 MINUTES.

Two PRs are open for this against the 4.9 branch:
- https://github.com/apache/cloudstack/pull/2077
- https://github.com/apache/cloudstack/pull/2089

The problem was that in Basic Networking each VR would get ALL DHCP information instead of just the information for its POD (PR 2077). The other issue was that for each entry dnsmasq and Apache would be restarted, along with some other work. By delaying this until the end of the router's provisioning we save a lot of time.

Both PRs are running in production on our cloud in Basic Networking with a few thousand instances behind them. No problems found so far.

Wido

> On 2 May 2017 at 19:57, Wido den Hollander wrote:
>
> Hi,
>
> Last night I upgraded a CloudStack 4.5.2 setup to 4.9.2.0. All went well, but the VR provisioning is terribly slow which causes all kinds of problems.
>
> The vr_cfg.sh and update_config.py scripts start to run. Restart dnsmasq, add metadata, etc.
>
> But for just 1800 hosts this can take up to 2 hours and that causes timeouts in the management server and other problems.
>
> 2 hours is just very, very slow. So I am starting to wonder if something is wrong here.
>
> Did anybody else see this?
>
> Running Basic Networking with CloudStack 4.9.2.0
>
> Wido
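The "delay restarts to the end of provisioning" idea behind PR 2089 can be sketched roughly as follows. This is an illustrative sketch only, not the actual PR code; the names (DeferredRestarts, request, flush, runner) are made up for the example.

```python
import subprocess


class DeferredRestarts:
    """Collect service restart requests and run each service once at the end."""

    def __init__(self):
        self._pending = []  # ordered, de-duplicated service names

    def request(self, service):
        """Note that `service` needs a restart; do not restart it yet."""
        if service not in self._pending:
            self._pending.append(service)

    def flush(self, runner=None):
        """Restart every pending service exactly once; returns the list handled.

        `runner` is injectable for testing; by default it shells out to
        `service <name> restart`.
        """
        runner = runner or (lambda svc: subprocess.call(["service", svc, "restart"]))
        handled = []
        for svc in self._pending:
            runner(svc)
            handled.append(svc)
        self._pending = []
        return handled
```

During provisioning, each DHCP entry would call request("dnsmasq") and request("apache2"); a single flush() runs at the very end, so a router with thousands of entries triggers each restart once instead of thousands of times.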
Re: Very slow Virtual Router provisioning with 4.9.2.0
As far as I can tell there was only a missing license header and no real license issue at all. On the other hand, both didn't pass Jenkins or Travis, and I did not analyse the code fully enough, so both could do with intensive review before being merged.

https://github.com/apache/cloudstack/pull/2083
https://github.com/apache/cloudstack/pull/2084

On 04/05/17 13:33, "Wido den Hollander" <w...@widodh.nl> wrote:

Here we go: https://github.com/apache/cloudstack/pull/2077

That works really well for us. 70% fewer DHCP entries in our VR in Basic Networking, since only the entries for that POD are sent to the VR.

Wido

> On 4 May 2017 at 12:12, Wido den Hollander <w...@widodh.nl> wrote:
>
> > On 4 May 2017 at 11:11, Wei ZHOU <ustcweiz...@gmail.com> wrote:
> >
> > > 2017-05-04 10:48 GMT+02:00 Wido den Hollander <w...@widodh.nl>:
> > >
> > > > On 3 May 2017 at 14:49, Daan Hoogland <daan.hoogl...@gmail.com> wrote:
> > > >
> > > > Happy to pick this up, Remi. I'm travelling now but will look at both on Friday.
> > > >
> > > > On 3 May 2017 2:25 pm, "Remi Bergsma" <rberg...@schubergphilis.com>
> > > > > On 03/05/2017, 13:44, "Rohit Yadav" <rohit.ya...@shapeblue.com> wrote:
> > > > > From: Remi Bergsma <rberg...@schubergphilis.com>
> > > > > Sent: 03 May 2017 16:58:18
> > > > > To: dev@cloudstack.apache.org
> > > > > Subject: Re: Very slow Virtual Router provisioning with 4.9.2.0
> > > > >
> > > > > Hi,
> > > > >
> > > > > The patches I talked about:
> > > > >
> > > > > 1) Iptables speed improvement
> > > > > https://github.com/apache/cloudstack/pull/1482
> > > > > Was reverted due to a licensing issue.
> > > > >
> > > > > 2) Passwd speed improvement
> > > > > https://github.com/MissionCriticalCloudOldRepos/cosmic-core/pull/138
> > > > >
> > > > > By now, these are rather old patches so they need some work before they apply to CloudStack again.

daan.hoogl...@shapeblue.com www.shapeblue.com 53 Chandos Place, Covent Garden, London WC2N 4HSUK @shapeblue
Re: Very slow Virtual Router provisioning with 4.9.2.0
Here we go: https://github.com/apache/cloudstack/pull/2077

That works really well for us. 70% fewer DHCP entries in our VR in Basic Networking, since only the entries for that POD are sent to the VR.

Wido

> On 4 May 2017 at 12:12, Wido den Hollander <w...@widodh.nl> wrote:
>
> Hi,
>
> Yes, we are working on a few low-hanging-fruit fixes, like checking if the last restart of dnsmasq was < 10 sec ago. If so, skip the restart.
>
> Will report back once we have anything.
>
> Wido
>
> > On 4 May 2017 at 11:11, Wei ZHOU <ustcweiz...@gmail.com> wrote:
> >
> > Hi Wido,
> >
> > A simple improvement is to not wait while restarting the dnsmasq service in the VR.
> >
> > '''
> > diff --git a/systemvm/patches/debian/config/opt/cloud/bin/cs/CsDhcp.py b/systemvm/patches/debian/config/opt/cloud/bin/cs/CsDhcp.py
> > index 95d2eff..999be8f 100755
> > --- a/systemvm/patches/debian/config/opt/cloud/bin/cs/CsDhcp.py
> > +++ b/systemvm/patches/debian/config/opt/cloud/bin/cs/CsDhcp.py
> > @@ -59,7 +59,7 @@ class CsDhcp(CsDataBag):
> >
> >          # We restart DNSMASQ every time the configure.py is called in order to avoid lease problems.
> >          if not self.cl.is_redundant() or self.cl.is_master():
> > -            CsHelper.service("dnsmasq", "restart")
> > +            CsHelper.execute3("service dnsmasq restart")
> >
> >      def configure_server(self):
> >          # self.conf.addeq("dhcp-hostsfile=%s" % DHCP_HOSTS)
> > diff --git a/systemvm/patches/debian/config/opt/cloud/bin/cs/CsHelper.py b/systemvm/patches/debian/config/opt/cloud/bin/cs/CsHelper.py
> > index a8ccea2..b06bde3 100755
> > --- a/systemvm/patches/debian/config/opt/cloud/bin/cs/CsHelper.py
> > +++ b/systemvm/patches/debian/config/opt/cloud/bin/cs/CsHelper.py
> > @@ -191,6 +191,11 @@ def execute2(command):
> >      p.wait()
> >      return p
> >
> > +def execute3(command):
> > +    """ Execute command """
> > +    logging.debug("Executing: %s" % command)
> > +    p = subprocess.Popen(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE, shell=True)
> > +    return p
> >
> >  def service(name, op):
> >      execute("service %s %s" % (name, op))
> > '''
> >
> > -Wei
> >
> > 2017-05-04 10:48 GMT+02:00 Wido den Hollander <w...@widodh.nl>:
> > >
> > > Thanks Daan, Remi.
> > >
> > > I found an additional bug where it seems that 'network.dns.basiczone.updates' isn't read when sending DHCP settings in Basic Networking.
> > >
> > > This means that the VR gets all DHCP settings for the whole zone instead of just for that POD.
> > >
> > > In this case some VRs we have get ~2k of DHCP entries sent to them, which causes a large slowdown.
> > >
> > > Wido
> > >
> > > > On 3 May 2017 at 14:49, Daan Hoogland <daan.hoogl...@gmail.com> wrote:
> > > >
> > > > Happy to pick this up, Remi. I'm travelling now but will look at both on Friday.
> > > >
> > > > Bilingual autocorrect in use. Read at your own risk.
> > > >
> > > > On 3 May 2017 2:25 pm, "Remi Bergsma" <rberg...@schubergphilis.com> wrote:
> > > > >
> > > > > Always happy to share, but I won’t have time to work on porting this to CloudStack any time soon.
> > > > > > > > > > Regards, Remi > > > > > > > > > > > > > > > On 03/05/2017, 13:44, "Rohit Yadav" <rohit.ya...@shapeblue.com> wrote: > > > > > > > > > > Hi Remi, thanks for sharing. We would love to have those changes > > > (for > > > > > 4.9+), looking forward to your pull requests. > > > > > > > > > > > > > > > Regards. > > > > > > > > > > > > > > > From: Remi Bergsma <rberg...@schubergphilis.com> > > > > > Sent: 03 May 2017 16:58:18 > > > > > To: dev@cloudstack.apache.org > > > > > Subject: Re: Very slow Virtual Router provisioning with 4.9.2.0 > > > > > > > > > > Hi, > > > > > > > > > > The patc
Re: Very slow Virtual Router provisioning with 4.9.2.0
Hi,

Yes, we are working on a few low-hanging-fruit fixes, like checking if the last restart of dnsmasq was < 10 sec ago. If so, skip the restart.

Will report back once we have anything.

Wido

> On 4 May 2017 at 11:11, Wei ZHOU <ustcweiz...@gmail.com> wrote:
>
> Hi Wido,
>
> A simple improvement is to not wait while restarting the dnsmasq service in the VR.
>
> '''
> diff --git a/systemvm/patches/debian/config/opt/cloud/bin/cs/CsDhcp.py b/systemvm/patches/debian/config/opt/cloud/bin/cs/CsDhcp.py
> index 95d2eff..999be8f 100755
> --- a/systemvm/patches/debian/config/opt/cloud/bin/cs/CsDhcp.py
> +++ b/systemvm/patches/debian/config/opt/cloud/bin/cs/CsDhcp.py
> @@ -59,7 +59,7 @@ class CsDhcp(CsDataBag):
>
>          # We restart DNSMASQ every time the configure.py is called in order to avoid lease problems.
>          if not self.cl.is_redundant() or self.cl.is_master():
> -            CsHelper.service("dnsmasq", "restart")
> +            CsHelper.execute3("service dnsmasq restart")
>
>      def configure_server(self):
>          # self.conf.addeq("dhcp-hostsfile=%s" % DHCP_HOSTS)
> diff --git a/systemvm/patches/debian/config/opt/cloud/bin/cs/CsHelper.py b/systemvm/patches/debian/config/opt/cloud/bin/cs/CsHelper.py
> index a8ccea2..b06bde3 100755
> --- a/systemvm/patches/debian/config/opt/cloud/bin/cs/CsHelper.py
> +++ b/systemvm/patches/debian/config/opt/cloud/bin/cs/CsHelper.py
> @@ -191,6 +191,11 @@ def execute2(command):
>      p.wait()
>      return p
>
> +def execute3(command):
> +    """ Execute command """
> +    logging.debug("Executing: %s" % command)
> +    p = subprocess.Popen(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE, shell=True)
> +    return p
>
>  def service(name, op):
>      execute("service %s %s" % (name, op))
> '''
>
> -Wei
>
> 2017-05-04 10:48 GMT+02:00 Wido den Hollander <w...@widodh.nl>:
> >
> > Thanks Daan, Remi.
> >
> > I found an additional bug where it seems that 'network.dns.basiczone.updates' isn't read when sending DHCP settings in Basic Networking.
> >
> > This means that the VR gets all DHCP settings for the whole zone instead of just for that POD.
> >
> > In this case some VRs we have get ~2k of DHCP entries sent to them, which causes a large slowdown.
> >
> > Wido
> >
> > > On 3 May 2017 at 14:49, Daan Hoogland <daan.hoogl...@gmail.com> wrote:
> > >
> > > Happy to pick this up, Remi. I'm travelling now but will look at both on Friday.
> > >
> > > Bilingual autocorrect in use. Read at your own risk.
> > >
> > > On 3 May 2017 2:25 pm, "Remi Bergsma" <rberg...@schubergphilis.com> wrote:
> > > >
> > > > Always happy to share, but I won’t have time to work on porting this to CloudStack any time soon.
> > > >
> > > > Regards, Remi
> > > >
> > > > On 03/05/2017, 13:44, "Rohit Yadav" <rohit.ya...@shapeblue.com> wrote:
> > > >
> > > > Hi Remi, thanks for sharing. We would love to have those changes (for 4.9+), looking forward to your pull requests.
> > > >
> > > > Regards.
> > > >
> > > > From: Remi Bergsma <rberg...@schubergphilis.com>
> > > > Sent: 03 May 2017 16:58:18
> > > > To: dev@cloudstack.apache.org
> > > > Subject: Re: Very slow Virtual Router provisioning with 4.9.2.0
> > > >
> > > > Hi,
> > > >
> > > > The patches I talked about:
> > > >
> > > > 1) Iptables speed improvement
> > > > https://github.com/apache/cloudstack/pull/1482
> > > > Was reverted due to a licensing issue.
> > > >
> > > > 2) Passwd speed improvement
> > > > https://github.com/MissionCriticalCloudOldRepos/cosmic-core/pull/138
> > > >
> > > > By now, these are rather old patches so they need some work before they apply to CloudStack again.
> > > >
> > > > Regards, Remi
> > > >
> > > > On 03/05/2017, 12:49, "Jeff Hair" <j...@greenqloud.com> wrote:
> > > >
> > > > Hi Remi,
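The "skip the restart if the last one was < 10 sec ago" check mentioned above can be sketched like this. Illustrative only; restart_if_due and RESTART_INTERVAL are made-up names, and the actual fix may differ.

```python
import time

RESTART_INTERVAL = 10  # seconds; matches the "< 10 sec ago" heuristic above

_last_restart = {}  # service name -> monotonic timestamp of last restart


def restart_if_due(service, runner, now=None):
    """Restart `service` unless it was restarted less than RESTART_INTERVAL ago.

    `runner` performs the actual restart (e.g. a subprocess call); it is a
    parameter here so the rate-limit logic can be tested without touching
    real services. Returns True if a restart ran, False if it was skipped.
    """
    now = time.monotonic() if now is None else now
    last = _last_restart.get(service)
    if last is not None and now - last < RESTART_INTERVAL:
        return False  # restarted recently; skip
    runner(service)
    _last_restart[service] = now
    return True
```

Using a monotonic clock avoids miscounting the interval if the VR's wall clock is adjusted (e.g. by NTP) mid-provisioning.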
Re: Very slow Virtual Router provisioning with 4.9.2.0
Hi Wido,

A simple improvement is to not wait while restarting the dnsmasq service in the VR.

'''
diff --git a/systemvm/patches/debian/config/opt/cloud/bin/cs/CsDhcp.py b/systemvm/patches/debian/config/opt/cloud/bin/cs/CsDhcp.py
index 95d2eff..999be8f 100755
--- a/systemvm/patches/debian/config/opt/cloud/bin/cs/CsDhcp.py
+++ b/systemvm/patches/debian/config/opt/cloud/bin/cs/CsDhcp.py
@@ -59,7 +59,7 @@ class CsDhcp(CsDataBag):

         # We restart DNSMASQ every time the configure.py is called in order to avoid lease problems.
         if not self.cl.is_redundant() or self.cl.is_master():
-            CsHelper.service("dnsmasq", "restart")
+            CsHelper.execute3("service dnsmasq restart")

     def configure_server(self):
         # self.conf.addeq("dhcp-hostsfile=%s" % DHCP_HOSTS)
diff --git a/systemvm/patches/debian/config/opt/cloud/bin/cs/CsHelper.py b/systemvm/patches/debian/config/opt/cloud/bin/cs/CsHelper.py
index a8ccea2..b06bde3 100755
--- a/systemvm/patches/debian/config/opt/cloud/bin/cs/CsHelper.py
+++ b/systemvm/patches/debian/config/opt/cloud/bin/cs/CsHelper.py
@@ -191,6 +191,11 @@ def execute2(command):
     p.wait()
     return p

+def execute3(command):
+    """ Execute command """
+    logging.debug("Executing: %s" % command)
+    p = subprocess.Popen(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE, shell=True)
+    return p

 def service(name, op):
     execute("service %s %s" % (name, op))
'''

-Wei

2017-05-04 10:48 GMT+02:00 Wido den Hollander <w...@widodh.nl>:

> Thanks Daan, Remi.
>
> I found an additional bug where it seems that 'network.dns.basiczone.updates' isn't read when sending DHCP settings in Basic Networking.
>
> This means that the VR gets all DHCP settings for the whole zone instead of just for that POD.
>
> In this case some VRs we have get ~2k of DHCP entries sent to them, which causes a large slowdown.
>
> Wido
>
> > On 3 May 2017 at 14:49, Daan Hoogland <daan.hoogl...@gmail.com> wrote:
> >
> > Happy to pick this up, Remi. I'm travelling now but will look at both on Friday.
> >
> > Bilingual autocorrect in use. Read at your own risk.
> >
> > On 3 May 2017 2:25 pm, "Remi Bergsma" <rberg...@schubergphilis.com> wrote:
> >
> > > Always happy to share, but I won’t have time to work on porting this to CloudStack any time soon.
> > >
> > > Regards, Remi
> > >
> > > On 03/05/2017, 13:44, "Rohit Yadav" <rohit.ya...@shapeblue.com> wrote:
> > >
> > > Hi Remi, thanks for sharing. We would love to have those changes (for 4.9+), looking forward to your pull requests.
> > >
> > > Regards.
> > >
> > > From: Remi Bergsma <rberg...@schubergphilis.com>
> > > Sent: 03 May 2017 16:58:18
> > > To: dev@cloudstack.apache.org
> > > Subject: Re: Very slow Virtual Router provisioning with 4.9.2.0
> > >
> > > Hi,
> > >
> > > The patches I talked about:
> > >
> > > 1) Iptables speed improvement
> > > https://github.com/apache/cloudstack/pull/1482
> > > Was reverted due to a licensing issue.
> > >
> > > 2) Passwd speed improvement
> > > https://github.com/MissionCriticalCloudOldRepos/cosmic-core/pull/138
> > >
> > > By now, these are rather old patches so they need some work before they apply to CloudStack again.
> > >
> > > Regards, Remi
> > >
> > > On 03/05/2017, 12:49, "Jeff Hair" <j...@greenqloud.com> wrote:
> > >
> > > Hi Remi,
> > >
> > > Do you have a link to the PR that was reverted? And also possibly the code that makes the password updating more efficient?
> > >
> > > Jeff
> > >
> > > On Wed, May 3, 2017 at 10:36 AM, Remi Bergsma <rberg...@schubergphilis.com> wrote:
> > >
> > > > Hi Wido,
> > > >
> > > > When we had similar issues last year, we found that for example comparing the iptables rules one-by-one is 1000x slower than simply loading them all at once. Boris rewrote this part in our Cosmic fork, may be worth looking
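One caveat with an execute3-style fire-and-forget helper as sketched in the diff above: handing the child a PIPE for stdout/stderr and never reading it can block the child once the pipe buffer fills. A variant that discards output instead is sketched below; execute_nowait is an illustrative name, not CloudStack code.

```python
import subprocess


def execute_nowait(command):
    """Start `command` without blocking the caller.

    Output goes to /dev/null rather than a PIPE: a PIPE that nobody reads
    can fill up and stall the child. start_new_session detaches the child
    from our process group so it survives independently; the caller may
    still .wait() on the returned Popen if it wants the exit code.
    """
    return subprocess.Popen(
        command,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        shell=True,
        start_new_session=True,
    )
```

Note that a short-lived caller like configure.py exits before the restart finishes, which is the whole point of the non-blocking variant; a long-lived caller should eventually wait() or poll() so the child does not linger as a zombie.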
Re: Very slow Virtual Router provisioning with 4.9.2.0
Thanks Daan, Remi.

I found an additional bug where it seems that 'network.dns.basiczone.updates' isn't read when sending DHCP settings in Basic Networking.

This means that the VR gets all DHCP settings for the whole zone instead of just for that POD.

In this case some VRs we have get ~2k of DHCP entries sent to them, which causes a large slowdown.

Wido

> On 3 May 2017 at 14:49, Daan Hoogland <daan.hoogl...@gmail.com> wrote:
>
> Happy to pick this up, Remi. I'm travelling now but will look at both on Friday.
>
> Bilingual autocorrect in use. Read at your own risk.
>
> On 3 May 2017 2:25 pm, "Remi Bergsma" <rberg...@schubergphilis.com> wrote:
>
> > Always happy to share, but I won’t have time to work on porting this to CloudStack any time soon.
> >
> > Regards, Remi
> >
> > On 03/05/2017, 13:44, "Rohit Yadav" <rohit.ya...@shapeblue.com> wrote:
> >
> > Hi Remi, thanks for sharing. We would love to have those changes (for 4.9+), looking forward to your pull requests.
> >
> > Regards.
> >
> > From: Remi Bergsma <rberg...@schubergphilis.com>
> > Sent: 03 May 2017 16:58:18
> > To: dev@cloudstack.apache.org
> > Subject: Re: Very slow Virtual Router provisioning with 4.9.2.0
> >
> > Hi,
> >
> > The patches I talked about:
> >
> > 1) Iptables speed improvement
> > https://github.com/apache/cloudstack/pull/1482
> > Was reverted due to a licensing issue.
> >
> > 2) Passwd speed improvement
> > https://github.com/MissionCriticalCloudOldRepos/cosmic-core/pull/138
> >
> > By now, these are rather old patches so they need some work before they apply to CloudStack again.
> >
> > Regards, Remi
> >
> > On 03/05/2017, 12:49, "Jeff Hair" <j...@greenqloud.com> wrote:
> >
> > Hi Remi,
> >
> > Do you have a link to the PR that was reverted? And also possibly the code that makes the password updating more efficient?
> > > > Jeff > > > > On Wed, May 3, 2017 at 10:36 AM, Remi Bergsma < > > rberg...@schubergphilis.com> > > wrote: > > > > > Hi Wido, > > > > > > When we had similar issues last year, we found that for example > > comparing > > > the iptables rules one-by-one is 1000x slower than simply > > loading them all > > > at once. Boris rewrote this part in our Cosmic fork, may be > > worth looking > > > into this again. The PR to CloudStack was merged, but reverted > > later, can't > > > remember why. We run it in production ever since. Also feeding > > passwords to > > > the passwd server is very inefficient (it operates like a > > snowball and gets > > > slower once you have more VMs). That we also fixed in Cosmic, > > not sure if > > > that patch made it upstream. Wrote it about a year ago already. > > > > > > We tested applying 10K iptables rules in just a couple of > > seconds. 1000 > > > VMs takes a few minutes to deploy. > > > > > > Generally speaking I'd suggest looking at the logs to find what > > takes long > > > or is executed a lot of times. Iptables and passwd are two to > > look at. > > > > > > If you want I can lookup the patches. Not handy on my phone now > > ;-) > > > > > > Regards, Remi > > > > > > From: Wido den Hollander <w...@widodh.nl> > > > Sent: Tuesday, May 2, 2017 7:57:08 PM > > > To: dev@cloudstack.apache.org > > > Subject: Very slow Virtual Router provisioning with 4.9.2.0 > > > > > > Hi, > > > > > > Last night I upgraded a CloudStack 4.5.2 setup to 4.9.2.0. All > > went well, > > > but the VR provisioning is terribly slow which causes all kinds > > of problems. > > > > > > The vr_cfg.sh and update_config.py scripts start to run. Restart > > dnsmasq, > > > add metadata, etc. > > > > > > But for just 1800 hosts this can take up to 2 hours and that > > causes > > > timeouts in the management server and other problems. > > > > > > 2 hours is just very, very slow. So I am starting to wonder if > > something > > > is wrong here. 
> > > > > > Did anybody else see this? > > > > > > Running Basic Networking with CloudStack 4.9.2.0 > > > > > > Wido > > > > > > > > > > > > > rohit.ya...@shapeblue.com > > www.shapeblue.com > > 53 Chandos Place, Covent Garden, London WC2N 4HSUK > > @shapeblue > > > > > > > > > > > >
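The pod-scoping fix described in this thread (send a VR only its own pod's DHCP entries rather than the whole zone's) boils down to a filter like the toy sketch below. The dict shape and function name are assumptions for illustration, not CloudStack's actual data model.

```python
def entries_for_pod(dhcp_entries, pod_id):
    """Return only the DHCP entries that belong to the given pod.

    `dhcp_entries` is assumed to be a list of dicts with a 'pod_id' key;
    this shape is illustrative, not the real CloudStack representation.
    """
    return [e for e in dhcp_entries if e["pod_id"] == pod_id]
```

With thousands of zone-wide entries spread over several pods, each VR then only receives and processes its own pod's share, which is where reductions like the ~70% mentioned elsewhere in the thread come from.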
Re: Very slow Virtual Router provisioning with 4.9.2.0
Thanks Remi for the hint and Daan for picking it up! That is why I like open source software development and this project ;)

On 05/03/2017 02:49 PM, Daan Hoogland wrote:
> Happy to pick this up, Remi. I'm travelling now but will look at both on Friday.
Re: Very slow Virtual Router provisioning with 4.9.2.0
Happy to pick this up, Remi. I'm travelling now but will look at both on Friday.

Bilingual autocorrect in use. Read at your own risk.

On 3 May 2017 2:25 pm, "Remi Bergsma" <rberg...@schubergphilis.com> wrote:

> Always happy to share, but I won’t have time to work on porting this to CloudStack any time soon.
>
> Regards, Remi
>
> On 03/05/2017, 13:44, "Rohit Yadav" <rohit.ya...@shapeblue.com> wrote:
>
> Hi Remi, thanks for sharing. We would love to have those changes (for 4.9+), looking forward to your pull requests.
>
> Regards.
>
> From: Remi Bergsma <rberg...@schubergphilis.com>
> Sent: 03 May 2017 16:58:18
> To: dev@cloudstack.apache.org
> Subject: Re: Very slow Virtual Router provisioning with 4.9.2.0
>
> Hi,
>
> The patches I talked about:
>
> 1) Iptables speed improvement
> https://github.com/apache/cloudstack/pull/1482
> Was reverted due to a licensing issue.
>
> 2) Passwd speed improvement
> https://github.com/MissionCriticalCloudOldRepos/cosmic-core/pull/138
>
> By now, these are rather old patches so they need some work before they apply to CloudStack again.
>
> Regards, Remi
>
> On 03/05/2017, 12:49, "Jeff Hair" <j...@greenqloud.com> wrote:
>
> Hi Remi,
>
> Do you have a link to the PR that was reverted? And also possibly the code that makes the password updating more efficient?
>
> Jeff
>
> On Wed, May 3, 2017 at 10:36 AM, Remi Bergsma <rberg...@schubergphilis.com> wrote:
>
> > Hi Wido,
> >
> > When we had similar issues last year, we found that for example comparing the iptables rules one-by-one is 1000x slower than simply loading them all at once. Boris rewrote this part in our Cosmic fork, may be worth looking into this again. The PR to CloudStack was merged, but reverted later, can't remember why. We run it in production ever since. Also feeding passwords to the passwd server is very inefficient (it operates like a snowball and gets slower once you have more VMs). That we also fixed in Cosmic, not sure if that patch made it upstream. Wrote it about a year ago already.
> >
> > We tested applying 10K iptables rules in just a couple of seconds. 1000 VMs takes a few minutes to deploy.
> >
> > Generally speaking I'd suggest looking at the logs to find what takes long or is executed a lot of times. Iptables and passwd are two to look at.
> >
> > If you want I can look up the patches. Not handy on my phone now ;-)
> >
> > Regards, Remi
> >
> > From: Wido den Hollander <w...@widodh.nl>
> > Sent: Tuesday, May 2, 2017 7:57:08 PM
> > To: dev@cloudstack.apache.org
> > Subject: Very slow Virtual Router provisioning with 4.9.2.0
> >
> > Hi,
> >
> > Last night I upgraded a CloudStack 4.5.2 setup to 4.9.2.0. All went well, but the VR provisioning is terribly slow which causes all kinds of problems.
> >
> > The vr_cfg.sh and update_config.py scripts start to run. Restart dnsmasq, add metadata, etc.
> >
> > But for just 1800 hosts this can take up to 2 hours and that causes timeouts in the management server and other problems.
> >
> > 2 hours is just very, very slow. So I am starting to wonder if something is wrong here.
> >
> > Did anybody else see this?
> >
> > Running Basic Networking with CloudStack 4.9.2.0
> >
> > Wido
Re: Very slow Virtual Router provisioning with 4.9.2.0
Always happy to share, but I won’t have time to work on porting this to CloudStack any time soon. Regards, Remi On 03/05/2017, 13:44, "Rohit Yadav" <rohit.ya...@shapeblue.com> wrote: Hi Remi, thanks for sharing. We would love to have those changes (for 4.9+), looking forward to your pull requests. Regards. From: Remi Bergsma <rberg...@schubergphilis.com> Sent: 03 May 2017 16:58:18 To: dev@cloudstack.apache.org Subject: Re: Very slow Virtual Router provisioning with 4.9.2.0 Hi, The patches I talked about: 1) Iptables speed improvement https://github.com/apache/cloudstack/pull/1482 Was reverted due to a licensing issue. 2) Passwd speed improvement https://github.com/MissionCriticalCloudOldRepos/cosmic-core/pull/138 By now, these are rather old patches so they need some work before they apply to CloudStack again. Regards, Remi On 03/05/2017, 12:49, "Jeff Hair" <j...@greenqloud.com> wrote: Hi Remi, Do you have a link to the PR that was reverted? And also possibly the code that makes the password updating more efficient? Jeff On Wed, May 3, 2017 at 10:36 AM, Remi Bergsma <rberg...@schubergphilis.com> wrote: > Hi Wido, > > When we had similar issues last year, we found that for example comparing > the iptables rules one-by-one is 1000x slower than simply loading them all > at once. Boris rewrote this part in our Cosmic fork, may be worth looking > into this again. The PR to CloudStack was merged, but reverted later, can't > remember why. We run it in production ever since. Also feeding passwords to > the passwd server is very inefficient (it operates like a snowball and gets > slower once you have more VMs). That we also fixed in Cosmic, not sure if > that patch made it upstream. Wrote it about a year ago already. > > We tested applying 10K iptables rules in just a couple of seconds. 1000 > VMs takes a few minutes to deploy. > > Generally speaking I'd suggest looking at the logs to find what takes long > or is executed a lot of times. 
Iptables and passwd are two to look at. > > If you want I can lookup the patches. Not handy on my phone now ;-) > > Regards, Remi > > From: Wido den Hollander <w...@widodh.nl> > Sent: Tuesday, May 2, 2017 7:57:08 PM > To: dev@cloudstack.apache.org > Subject: Very slow Virtual Router provisioning with 4.9.2.0 > > Hi, > > Last night I upgraded a CloudStack 4.5.2 setup to 4.9.2.0. All went well, > but the VR provisioning is terribly slow which causes all kinds of problems. > > The vr_cfg.sh and update_config.py scripts start to run. Restart dnsmasq, > add metadata, etc. > > But for just 1800 hosts this can take up to 2 hours and that causes > timeouts in the management server and other problems. > > 2 hours is just very, very slow. So I am starting to wonder if something > is wrong here. > > Did anybody else see this? > > Running Basic Networking with CloudStack 4.9.2.0 > > Wido >
Re: Very slow Virtual Router provisioning with 4.9.2.0
Hi Remi, thanks for sharing. We would love to have those changes (for 4.9+), looking forward to your pull requests. Regards. From: Remi Bergsma <rberg...@schubergphilis.com> Sent: 03 May 2017 16:58:18 To: dev@cloudstack.apache.org Subject: Re: Very slow Virtual Router provisioning with 4.9.2.0 Hi, The patches I talked about: 1) Iptables speed improvement https://github.com/apache/cloudstack/pull/1482 Was reverted due to a licensing issue. 2) Passwd speed improvement https://github.com/MissionCriticalCloudOldRepos/cosmic-core/pull/138 By now, these are rather old patches so they need some work before they apply to CloudStack again. Regards, Remi On 03/05/2017, 12:49, "Jeff Hair" <j...@greenqloud.com> wrote: Hi Remi, Do you have a link to the PR that was reverted? And also possibly the code that makes the password updating more efficient? Jeff On Wed, May 3, 2017 at 10:36 AM, Remi Bergsma <rberg...@schubergphilis.com> wrote: > Hi Wido, > > When we had similar issues last year, we found that for example comparing > the iptables rules one-by-one is 1000x slower than simply loading them all > at once. Boris rewrote this part in our Cosmic fork, may be worth looking > into this again. The PR to CloudStack was merged, but reverted later, can't > remember why. We run it in production ever since. Also feeding passwords to > the passwd server is very inefficient (it operates like a snowball and gets > slower once you have more VMs). That we also fixed in Cosmic, not sure if > that patch made it upstream. Wrote it about a year ago already. > > We tested applying 10K iptables rules in just a couple of seconds. 1000 > VMs takes a few minutes to deploy. > > Generally speaking I'd suggest looking at the logs to find what takes long > or is executed a lot of times. Iptables and passwd are two to look at. > > If you want I can lookup the patches. 
Not handy on my phone now ;-) > > Regards, Remi > > From: Wido den Hollander <w...@widodh.nl> > Sent: Tuesday, May 2, 2017 7:57:08 PM > To: dev@cloudstack.apache.org > Subject: Very slow Virtual Router provisioning with 4.9.2.0 > > Hi, > > Last night I upgraded a CloudStack 4.5.2 setup to 4.9.2.0. All went well, > but the VR provisioning is terribly slow which causes all kinds of problems. > > The vr_cfg.sh and update_config.py scripts start to run. Restart dnsmasq, > add metadata, etc. > > But for just 1800 hosts this can take up to 2 hours and that causes > timeouts in the management server and other problems. > > 2 hours is just very, very slow. So I am starting to wonder if something > is wrong here. > > Did anybody else see this? > > Running Basic Networking with CloudStack 4.9.2.0 > > Wido >
Re: Very slow Virtual Router provisioning with 4.9.2.0
Hi, The patches I talked about: 1) Iptables speed improvement https://github.com/apache/cloudstack/pull/1482 Was reverted due to a licensing issue. 2) Passwd speed improvement https://github.com/MissionCriticalCloudOldRepos/cosmic-core/pull/138 By now, these are rather old patches so they need some work before they apply to CloudStack again. Regards, Remi On 03/05/2017, 12:49, "Jeff Hair" <j...@greenqloud.com> wrote: Hi Remi, Do you have a link to the PR that was reverted? And also possibly the code that makes the password updating more efficient? Jeff On Wed, May 3, 2017 at 10:36 AM, Remi Bergsma <rberg...@schubergphilis.com> wrote: > Hi Wido, > > When we had similar issues last year, we found that for example comparing > the iptables rules one-by-one is 1000x slower than simply loading them all > at once. Boris rewrote this part in our Cosmic fork, may be worth looking > into this again. The PR to CloudStack was merged, but reverted later, can't > remember why. We run it in production ever since. Also feeding passwords to > the passwd server is very inefficient (it operates like a snowball and gets > slower once you have more VMs). That we also fixed in Cosmic, not sure if > that patch made it upstream. Wrote it about a year ago already. > > We tested applying 10K iptables rules in just a couple of seconds. 1000 > VMs takes a few minutes to deploy. > > Generally speaking I'd suggest looking at the logs to find what takes long > or is executed a lot of times. Iptables and passwd are two to look at. > > If you want I can lookup the patches. Not handy on my phone now ;-) > > Regards, Remi > > From: Wido den Hollander <w...@widodh.nl> > Sent: Tuesday, May 2, 2017 7:57:08 PM > To: dev@cloudstack.apache.org > Subject: Very slow Virtual Router provisioning with 4.9.2.0 > > Hi, > > Last night I upgraded a CloudStack 4.5.2 setup to 4.9.2.0. All went well, > but the VR provisioning is terribly slow which causes all kinds of problems. 
> > The vr_cfg.sh and update_config.py scripts start to run. Restart dnsmasq, > add metadata, etc. > > But for just 1800 hosts this can take up to 2 hours and that causes > timeouts in the management server and other problems. > > 2 hours is just very, very slow. So I am starting to wonder if something > is wrong here. > > Did anybody else see this? > > Running Basic Networking with CloudStack 4.9.2.0 > > Wido >
Re: Very slow Virtual Router provisioning with 4.9.2.0
Hi Wido,

When we had similar issues last year, we found that, for example, comparing the iptables rules one-by-one is 1000x slower than simply loading them all at once. Boris rewrote this part in our Cosmic fork; it may be worth looking into this again. The PR to CloudStack was merged, but reverted later, can't remember why. We have run it in production ever since. Also, feeding passwords to the passwd server is very inefficient (it operates like a snowball and gets slower once you have more VMs). That we also fixed in Cosmic, not sure if that patch made it upstream. Wrote it about a year ago already.

We tested applying 10K iptables rules in just a couple of seconds. 1000 VMs takes a few minutes to deploy.

Generally speaking I'd suggest looking at the logs to find what takes long or is executed a lot of times. Iptables and passwd are two to look at.

If you want I can look up the patches. Not handy on my phone now ;-)

Regards, Remi

From: Wido den Hollander <w...@widodh.nl>
Sent: Tuesday, May 2, 2017 7:57:08 PM
To: dev@cloudstack.apache.org
Subject: Very slow Virtual Router provisioning with 4.9.2.0

Hi,

Last night I upgraded a CloudStack 4.5.2 setup to 4.9.2.0. All went well, but the VR provisioning is terribly slow which causes all kinds of problems.

The vr_cfg.sh and update_config.py scripts start to run. Restart dnsmasq, add metadata, etc.

But for just 1800 hosts this can take up to 2 hours and that causes timeouts in the management server and other problems.

2 hours is just very, very slow. So I am starting to wonder if something is wrong here.

Did anybody else see this?

Running Basic Networking with CloudStack 4.9.2.0

Wido
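The bulk-loading approach Remi describes (one iptables-restore invocation instead of thousands of individual iptables calls) can be sketched as below. The helper names are illustrative and the payload covers only the filter table; the actual Cosmic/CloudStack change referenced in the thread is more involved.

```python
import subprocess


def build_restore_payload(rules):
    """Render a list of filter-table rule strings as iptables-restore input."""
    return "*filter\n" + "\n".join(rules) + "\nCOMMIT\n"


def apply_rules_atomically(rules):
    """Load the whole ruleset in a single iptables-restore call.

    One process invocation replaces N separate `iptables -A ...` calls,
    and the kernel commits the table in one go, which is where the large
    speedup comes from. Returns True on success.
    """
    proc = subprocess.run(
        ["iptables-restore"],
        input=build_restore_payload(rules),
        text=True,
        capture_output=True,
    )
    return proc.returncode == 0
```

Note that iptables-restore replaces the named table's contents wholesale (unless invoked with --noflush), so the rule list must be the complete desired state rather than a delta.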
Re: Very slow Virtual Router provisioning with 4.9.2.0
Another reason for the slowness can be the VR configuration (the persistent VR configuration design). When the config for a single component is applied, the whole VR configuration is re-applied. Because of this, the VR boot-up time increases. Thanks, Jayapal > On May 3, 2017, at 1:55 PM, Marc-Aurèle Brothier wrote: > > Hi Wido, > > Well for us, it's not a version problem, it's simply a design problem. This > VR is very problematic during any upgrade of cloudstack (which I perform > every week almost on our platform), same goes for the secondary storage VMs > which scans all templates. We've planned on our roadmap to get rid of the > system vms. The VR is really a SPoF. > > On Tue, May 2, 2017 at 7:57 PM, Wido den Hollander wrote: > >> Hi, >> >> Last night I upgraded a CloudStack 4.5.2 setup to 4.9.2.0. All went well, >> but the VR provisioning is terribly slow which causes all kinds of problems. >> >> The vr_cfg.sh and update_config.py scripts start to run. Restart dnsmasq, >> add metadata, etc. >> >> But for just 1800 hosts this can take up to 2 hours and that causes >> timeouts in the management server and other problems. >> >> 2 hours is just very, very slow. So I am starting to wonder if something >> is wrong here. >> >> Did anybody else see this? >> >> Running Basic Networking with CloudStack 4.9.2.0 >> >> Wido >>
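The re-apply behaviour Jayapal describes can be sketched like this. The names and structure are hypothetical, not CloudStack's actual config-apply code: with the persistent-config design, a change to one component triggers a re-apply of every component; an incremental apply would only touch what changed.

```python
# Hypothetical sketch of full vs. incremental VR config apply.
# With persistent VR configuration, any single change re-applies
# every component; incremental apply touches only the changed ones.

applied = []  # records which components were (re)configured

def apply_component(name, config):
    applied.append(name)  # stand-in for reconfiguring/restarting a service

def full_apply(vr_config):
    # Current behaviour: every component is re-applied on any change.
    for name, config in vr_config.items():
        apply_component(name, config)

def incremental_apply(vr_config, changed):
    # Alternative: re-apply only the components that actually changed.
    for name in changed:
        apply_component(name, vr_config[name])

vr_config = {"dhcp": {}, "firewall": {}, "metadata": {}}
full_apply(vr_config)                    # re-applies all 3 components
incremental_apply(vr_config, ["dhcp"])   # re-applies only 1 component
```

With many components, the full re-apply makes every small change (and every boot) pay the cost of configuring the entire router.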
Re: Very slow Virtual Router provisioning with 4.9.2.0
Hi Wido, Well for us, it's not a version problem, it's simply a design problem. This VR is very problematic during any upgrade of cloudstack (which I perform every week almost on our platform), same goes for the secondary storage VMs which scans all templates. We've planned on our roadmap to get rid of the system vms. The VR is really a SPoF. On Tue, May 2, 2017 at 7:57 PM, Wido den Hollander wrote: > Hi, > > Last night I upgraded a CloudStack 4.5.2 setup to 4.9.2.0. All went well, > but the VR provisioning is terribly slow which causes all kinds of problems. > > The vr_cfg.sh and update_config.py scripts start to run. Restart dnsmasq, > add metadata, etc. > > But for just 1800 hosts this can take up to 2 hours and that causes > timeouts in the management server and other problems. > > 2 hours is just very, very slow. So I am starting to wonder if something > is wrong here. > > Did anybody else see this? > > Running Basic Networking with CloudStack 4.9.2.0 > > Wido >
Very slow Virtual Router provisioning with 4.9.2.0
Hi, Last night I upgraded a CloudStack 4.5.2 setup to 4.9.2.0. All went well, but the VR provisioning is terribly slow which causes all kinds of problems. The vr_cfg.sh and update_config.py scripts start to run. Restart dnsmasq, add metadata, etc. But for just 1800 hosts this can take up to 2 hours and that causes timeouts in the management server and other problems. 2 hours is just very, very slow. So I am starting to wonder if something is wrong here. Did anybody else see this? Running Basic Networking with CloudStack 4.9.2.0 Wido
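The pattern of the fix that eventually resolved this (per the conclusion of the thread: PR 2089 delays the dnsmasq/apache restarts to the end of provisioning) can be sketched as follows. The function and variable names are hypothetical, not the actual VR script code: instead of restarting dnsmasq after every DHCP entry, all entries are written first and the service is restarted once.

```python
# Minimal sketch of the restart-batching fix: write all DHCP entries
# first, restart dnsmasq once at the end, instead of restarting after
# every single entry. Names are hypothetical, not the VR script's own.

restarts = 0

def restart_dnsmasq():
    global restarts
    restarts += 1  # stand-in for "service dnsmasq restart"

def provision_naive(entries):
    hosts = []
    for entry in entries:
        hosts.append(entry)    # write one DHCP entry
        restart_dnsmasq()      # one restart per entry: O(n) restarts

def provision_batched(entries):
    hosts = list(entries)      # write all DHCP entries first
    restart_dnsmasq()          # then a single restart

entries = ["vm-%d" % i for i in range(1800)]
provision_naive(entries)       # 1800 restarts
provision_batched(entries)     # 1 more restart
```

With 1800 hosts, the per-entry restart is exactly the kind of multiplier that turns a few minutes of work into hours.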