Re: [openstack-dev] [tripleo][ci] Temporary increase for the OVB undercloud instance memory
On Thu, Sep 22, 2016 at 1:40 PM, Steven Hardy wrote:
> On Thu, Sep 22, 2016 at 04:36:30PM +0200, Gabriele Cerami wrote:
>> Hi,
>>
>> As reported on this bug
>>
>> https://bugs.launchpad.net/tripleo/+bug/1626483
>>
>> HA gate and periodic jobs for master, and sometimes newton, started
>> to fail with errors related to memory shortage. Memory on the
>> undercloud instance was increased to 8G less than a month ago, so
>> the problem needs a different approach to be solved.
>>
>> We have some solutions in store. However, with the release date so
>> close, I don't think it's time for this kind of change. So I thought
>> it could be a good compromise to temporarily increase the undercloud
>> instance memory to 12G, just for this week, unless there's a rapid
>> way to reduce the memory footprint of heat-engine (usually the
>> biggest memory consumer on the undercloud instance).
>
> If we can avoid it, I'd rather we avoided increasing the RAM again -
> I suspect there is a heat regression, as I'm seeing much higher
> memory usage in my local test environment too.
>
> I did a quick re-test of some local monitoring I did earlier in the
> cycle, when we experienced some high memory usage:
>
> http://people.redhat.com/~shardy/heat/plots/heat_before_after_end_newton.png
>
> There are three plots there: one from early in the cycle, one after
> some fixes which reduced memory usage a lot, and then the highest,
> leaky plot is the one I just did today.
>
> So I'm pretty sure we have another heat memory leak to track down.
>
> If anyone has any historical data on memory usage, e.g. from periodic
> CI runs, that would be helpful; otherwise we'll have to bisect by
> testing locally, or derive it by scraping the dstat data from CI run
> logs.
>
> Steve.

Steve,

I dropped a comment in your Heat bug report that might be related to
our CI problem: https://bugs.launchpad.net/heat/+bug/1626675/comments/1

I hope it helps,
--
Emilien Macchi

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [tripleo][ci] Temporary increase for the OVB undercloud instance memory
On Thu, Sep 22, 2016 at 04:36:30PM +0200, Gabriele Cerami wrote:
> Hi,
>
> As reported on this bug
>
> https://bugs.launchpad.net/tripleo/+bug/1626483
>
> HA gate and periodic jobs for master, and sometimes newton, started
> to fail with errors related to memory shortage. Memory on the
> undercloud instance was increased to 8G less than a month ago, so the
> problem needs a different approach to be solved.
>
> We have some solutions in store. However, with the release date so
> close, I don't think it's time for this kind of change. So I thought
> it could be a good compromise to temporarily increase the undercloud
> instance memory to 12G, just for this week, unless there's a rapid
> way to reduce the memory footprint of heat-engine (usually the
> biggest memory consumer on the undercloud instance).

If we can avoid it, I'd rather we avoided increasing the RAM again - I
suspect there is a heat regression, as I'm seeing much higher memory
usage in my local test environment too.

I did a quick re-test of some local monitoring I did earlier in the
cycle, when we experienced some high memory usage:

http://people.redhat.com/~shardy/heat/plots/heat_before_after_end_newton.png

There are three plots there: one from early in the cycle, one after
some fixes which reduced memory usage a lot, and then the highest,
leaky plot is the one I just did today.

So I'm pretty sure we have another heat memory leak to track down.

If anyone has any historical data on memory usage, e.g. from periodic
CI runs, that would be helpful; otherwise we'll have to bisect by
testing locally, or derive it by scraping the dstat data from CI run
logs.

Steve.
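For anyone wanting to derive that history from the CI logs, a minimal
sketch of extracting the peak memory usage from dstat CSV output might
look like the following (this assumes dstat was run with `--output` and
that the CSV header names a "used" memory column; the sample data here
is hypothetical, not from an actual CI run):

```python
import csv
import io

def max_used_memory(dstat_csv):
    """Return the peak value of the 'used' memory column from dstat
    CSV output (e.g. produced by `dstat -m --output mem.csv`)."""
    reader = csv.reader(io.StringIO(dstat_csv))
    used_idx = None
    peak = 0.0
    for row in reader:
        if used_idx is None:
            if "used" in row:          # locate the column header row
                used_idx = row.index("used")
            continue
        if len(row) > used_idx and row[used_idx]:
            try:
                peak = max(peak, float(row[used_idx]))
            except ValueError:
                pass                    # skip repeated header/junk rows
    return peak

# Hypothetical excerpt of a dstat CSV from a CI run (values in bytes):
sample = '''"used","buff","cach","free"
"5100000000","200","300","400"
"6200000000","210","310","390"
"5900000000","205","305","395"
'''
print(max_used_memory(sample))  # prints the peak 'used' value
```

Running the same extraction over a series of periodic runs would give a
rough timeline to bisect the leak against.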
Re: [openstack-dev] [tripleo][ci] Temporary increase for the OVB undercloud instance memory
Hi,

On Thu, Sep 22, 2016 at 1:48 PM, James Slagle wrote:
> On Thu, Sep 22, 2016 at 10:36 AM, Gabriele Cerami wrote:
>> Hi,
>>
>> As reported on this bug
>>
>> https://bugs.launchpad.net/tripleo/+bug/1626483
>>
>> HA gate and periodic jobs for master, and sometimes newton, started
>> to fail with errors related to memory shortage. Memory on the
>> undercloud instance was increased to 8G less than a month ago, so
>> the problem needs a different approach to be solved.
>>
>> We have some solutions in store. However, with the release date so
>> close, I don't think it's time for this kind of change. So I thought
>> it could be a good compromise to temporarily increase the undercloud
>> instance memory to 12G, just for this week, unless there's a rapid
>> way to reduce the memory footprint of heat-engine (usually the
>> biggest memory consumer on the undercloud instance).
>>
>> Any other ideas ?
>
> The OOM error in the bug is from overcloud-controller-0, not the
> undercloud. The overcloud nodes in OVB are still at 6GB. I think it
> would be reasonable to increase those to 8GB as well.
>
> I also noticed that there are 4 neutron-server processes despite
> having NeutronWorkers: 1 in
> https://github.com/openstack-infra/tripleo-ci/blob/master/test-environments/worker-config.yaml.
> If we can get that down to 1, it looks like that might save around
> 270MB.
>
> It also looks like there are 2 nova-api workers despite having
> NovaWorkers: 1. Is that normal? Getting rid of one of them would save
> around another 140MB.
>
> --
> -- James Slagle

In the case of neutron and some of the other worker settings, we can
try using 0 instead of 1. This should prevent the spawning of separate
worker processes entirely. IIRC, neutron server itself spawns separate
API and RPC workers.

FWIW, I don't think '0' works for nova though. I think if you do that
it leaves it unset and it will use the service default.

Cheers,
Brent
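For concreteness, the kind of change being discussed for tripleo-ci's
test-environments/worker-config.yaml might look like the sketch below.
The parameter names match those mentioned in the thread; the
`parameter_defaults` wrapper is an assumption based on the usual
tripleo-heat-templates environment-file layout, and whether 0 is
honoured varies per service, as noted above:

```yaml
# Sketch of a worker-config.yaml environment file (hypothetical,
# not the actual contents of the tripleo-ci file).
parameter_defaults:
  # 0 asks the service not to fork separate worker processes at all;
  # neutron honours this, but nova reportedly treats 0 as "unset" and
  # falls back to the service default, so NovaWorkers stays at 1.
  NeutronWorkers: 0
  NovaWorkers: 1
  HeatWorkers: 1
```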
Re: [openstack-dev] [tripleo][ci] Temporary increase for the OVB undercloud instance memory
On Thu, Sep 22, 2016 at 10:36 AM, Gabriele Cerami wrote:
> Hi,
>
> As reported on this bug
>
> https://bugs.launchpad.net/tripleo/+bug/1626483
>
> HA gate and periodic jobs for master, and sometimes newton, started
> to fail with errors related to memory shortage. Memory on the
> undercloud instance was increased to 8G less than a month ago, so the
> problem needs a different approach to be solved.
>
> We have some solutions in store. However, with the release date so
> close, I don't think it's time for this kind of change. So I thought
> it could be a good compromise to temporarily increase the undercloud
> instance memory to 12G, just for this week, unless there's a rapid
> way to reduce the memory footprint of heat-engine (usually the
> biggest memory consumer on the undercloud instance).
>
> Any other ideas ?

The OOM error in the bug is from overcloud-controller-0, not the
undercloud. The overcloud nodes in OVB are still at 6GB. I think it
would be reasonable to increase those to 8GB as well.

I also noticed that there are 4 neutron-server processes despite
having NeutronWorkers: 1 in
https://github.com/openstack-infra/tripleo-ci/blob/master/test-environments/worker-config.yaml.
If we can get that down to 1, it looks like that might save around
270MB.

It also looks like there are 2 nova-api workers despite having
NovaWorkers: 1. Is that normal? Getting rid of one of them would save
around another 140MB.

--
-- James Slagle
Re: [openstack-dev] [tripleo][ci] Temporary increase for the OVB undercloud instance memory
On 09/22/2016 09:36 AM, Gabriele Cerami wrote:
> Hi,
>
> As reported on this bug
>
> https://bugs.launchpad.net/tripleo/+bug/1626483
>
> HA gate and periodic jobs for master, and sometimes newton, started
> to fail with errors related to memory shortage. Memory on the
> undercloud instance was increased to 8G less than a month ago, so the
> problem needs a different approach to be solved.

Which was already a pretty significant jump from the 6 GB before that.
Part of the motivation for going to 8 was to move us in line with the
rest of the infra Jenkins instances, so it would not be ideal to
change it again.

> We have some solutions in store. However, with the release date so
> close, I don't think it's time for this kind of change. So I thought
> it could be a good compromise to temporarily increase the undercloud
> instance memory to 12G, just for this week, unless there's a rapid
> way to reduce the memory footprint of heat-engine (usually the
> biggest memory consumer on the undercloud instance).

This is fine for CI and the handful of us who have beefy development
machines, but are we really at a point now where our memory usage
_requires_ 12 GB on the undercloud and somewhere north of 6 GB on the
overcloud nodes (we're also getting quite a few OOMs on overcloud
nodes in HA deployments lately, with 6 GB instances)? For an HA
deployment, that means 40 GB of memory just for the VMs, assuming 7 GB
overcloud nodes. And _that's_ without ceph, or the ability to test
scale-up, or... you get the idea.

Our developer hardware situation is bad enough as it is. Requiring a
64 GB box just to do one of the most common deploy types feels
untenable to me.

Would providing a worker config that reduces the number of worker
processes be sufficient to keep us at 8 GB? We just added a similar
thing to tripleo-heat-templates for the overcloud, so I think that
would be reasonable.

Mostly, we have to stop bloating the memory usage of even basic
deployments. It took us less than a month to use up the extra 2 GB we
gave ourselves last time. That's not a good trend. :-/

> Any other ideas ?
>
> thanks.
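For concreteness, the 40 GB figure works out as follows, assuming a
12 GB undercloud plus an HA overcloud of 3 controllers and 1 compute
at 7 GB each (the node count is an assumption about the typical HA CI
layout, not stated in the thread):

```python
UNDERCLOUD_GB = 12
OVERCLOUD_NODE_GB = 7
HA_NODES = 3 + 1  # 3 controllers + 1 compute

# Total memory consumed by the VMs alone, before ceph nodes,
# scale-up testing, or headroom for the host itself.
total = UNDERCLOUD_GB + HA_NODES * OVERCLOUD_NODE_GB
print(total)  # 40
```

On a 48 GB box that leaves only 8 GB for the host OS and everything
else, which is why the 64 GB requirement follows.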