Re: [openstack-dev] [tripleo][ci] Temporary increase for the OVB undercloud instance memory

2016-09-22 Thread Emilien Macchi
On Thu, Sep 22, 2016 at 1:40 PM, Steven Hardy wrote:
> On Thu, Sep 22, 2016 at 04:36:30PM +0200, Gabriele Cerami wrote:
>> Hi,
>>
>> As reported on this bug
>>
>> https://bugs.launchpad.net/tripleo/+bug/1626483
>>
>> HA gate and periodic jobs for master and sometimes newton started to
>> fail for errors related to memory shortage. Memory on undercloud
>> instance was increased to 8G less than a month ago, so the problem
>> needs a different approach to be solved.
>>
>> We have some solutions in store. However, with the release date so
>> close, I don't think it's time for this kind of changes. So I thought
>> it could be a good compromise to temporarily increase the undercloud
>> instance memory to 12G, just for this week, unless there's a rapid way
>> to reduce memory footprint for heat-engine (usually the biggest memory
>> consumer on the undercloud instance)
>
> If we can avoid it, I'd rather we avoided increasing the ram again - I
> suspect there is an issue with a heat regression as I'm seeing much higher
> memory usage in my local test environment too.
>
> I did a quick re-test of some local monitoring I did earlier in the cycle
> when we experienced some high memory usage:
>
> http://people.redhat.com/~shardy/heat/plots/heat_before_after_end_newton.png
>
> There are three plots there, one early in the cycle, one after some fixes
> which reduced memory usage a lot, then the highest leaky plot is the one I
> just did today.
>
> So I'm pretty sure we have another heat memory leak to track down.
>
> If anyone has any historical data of memory usage, e.g. from periodic CI
> runs, that would be helpful; otherwise we'll have to bisect by testing locally
> or derive it by scraping the dstat data from our CI run logs.
>
> Steve.

Steve, I dropped a comment in your Heat bug report, that might be
related to our CI problem:
https://bugs.launchpad.net/heat/+bug/1626675/comments/1

I hope it helps,
-- 
Emilien Macchi

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tripleo][ci] Temporary increase for the OVB undercloud instance memory

2016-09-22 Thread Steven Hardy
On Thu, Sep 22, 2016 at 04:36:30PM +0200, Gabriele Cerami wrote:
> Hi,
> 
> As reported on this bug
> 
> https://bugs.launchpad.net/tripleo/+bug/1626483
> 
> HA gate and periodic jobs for master and sometimes newton started to
> fail for errors related to memory shortage. Memory on undercloud
> instance was increased to 8G less than a month ago, so the problem
> needs a different approach to be solved. 
> 
> We have some solutions in store. However, with the release date so
> close, I don't think it's time for this kind of changes. So I thought
> it could be a good compromise to temporarily increase the undercloud
> instance memory to 12G, just for this week, unless there's a rapid way
> to reduce memory footprint for heat-engine (usually the biggest memory
> consumer on the undercloud instance)

If we can avoid it, I'd rather we avoided increasing the ram again - I
suspect there is an issue with a heat regression as I'm seeing much higher
memory usage in my local test environment too.

I did a quick re-test of some local monitoring I did earlier in the cycle
when we experienced some high memory usage:

http://people.redhat.com/~shardy/heat/plots/heat_before_after_end_newton.png

There are three plots there, one early in the cycle, one after some fixes
which reduced memory usage a lot, then the highest leaky plot is the one I
just did today.

So I'm pretty sure we have another heat memory leak to track down.

If anyone has any historical data of memory usage, e.g. from periodic CI
runs, that would be helpful; otherwise we'll have to bisect by testing locally
or derive it by scraping the dstat data from our CI run logs.
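
A rough sketch of how scraped dstat data could be turned into a leak signal, assuming the samples have already been reduced to a two-column CSV. The column names `epoch` and `used`, the function name, and the sample values are all hypothetical; real dstat output has a different layout and would need its own parsing step. The idea is only to fit a straight line through used memory over time and compare the slope between CI runs.

```python
import csv
import io


def memory_growth_per_hour(csv_text):
    """Least-squares slope of used memory against time, in bytes per hour.

    Expects CSV text with a header row naming the hypothetical columns
    'epoch' (seconds) and 'used' (bytes).
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if len(rows) < 2:
        raise ValueError("need at least two samples")
    times = [float(row["epoch"]) for row in rows]
    used = [float(row["used"]) for row in rows]
    mean_t = sum(times) / len(times)
    mean_u = sum(used) / len(used)
    var_t = sum((t - mean_t) ** 2 for t in times)
    if var_t == 0:
        raise ValueError("all samples share one timestamp")
    cov = sum((t - mean_t) * (u - mean_u) for t, u in zip(times, used))
    # Slope is in bytes per second; scale to bytes per hour.
    return cov / var_t * 3600.0


# Made-up samples, for illustration only (not taken from any CI run).
SAMPLE = """epoch,used
0,1000
1800,1500
3600,2000
"""

if __name__ == "__main__":
    print(memory_growth_per_hour(SAMPLE))
```

A slope that stays clearly positive across a whole deployment, and that is larger than in older runs, would point at the commit range worth bisecting.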

Steve.



Re: [openstack-dev] [tripleo][ci] Temporary increase for the OVB undercloud instance memory

2016-09-22 Thread Brent Eagles
Hi,

On Thu, Sep 22, 2016 at 1:48 PM, James Slagle wrote:

> On Thu, Sep 22, 2016 at 10:36 AM, Gabriele Cerami wrote:
> > Hi,
> >
> > As reported on this bug
> >
> > https://bugs.launchpad.net/tripleo/+bug/1626483
> >
> > HA gate and periodic jobs for master and sometimes newton started to
> > fail for errors related to memory shortage. Memory on undercloud
> > instance was increased to 8G less than a month ago, so the problem
> > needs a different approach to be solved.
> >
> > We have some solutions in store. However, with the release date so
> > close, I don't think it's time for this kind of changes. So I thought
> > it could be a good compromise to temporarily increase the undercloud
> > instance memory to 12G, just for this week, unless there's a rapid way
> > to reduce memory footprint for heat-engine (usually the biggest memory
> > consumer on the undercloud instance)
> >
> > Any other ideas ?
>
> The OOM error in the bug is from overcloud-controller-0, not the
> undercloud. The overcloud nodes in OVB are still at 6GB. I think it
> would be reasonable to increase those to 8GB as well.
>
> I also noticed that there are 4 neutron-server processes despite
> having NeutronWorkers: 1 in
> https://github.com/openstack-infra/tripleo-ci/blob/master/test-environments/worker-config.yaml.
> If we can get that down to 1, looks like that might save around 270MB.
>
> It also looks like there are 2 nova-api workers despite having
> NovaWorkers: 1. Is that normal? Getting rid of one of them would save
> around another 140MB.
>
> --
> -- James Slagle
> --


In the case of neutron and some of the other worker settings, we can try
using 0 instead of 1.

This should prevent the spawning of separate worker processes for
anything that is going on. IIRC, neutron server itself spawns separate API
and RPC workers.

FWIW, I don't think '0' works for nova though. I think if you do that it
leaves it unset and it will use the service default.

Cheers,

Brent


Re: [openstack-dev] [tripleo][ci] Temporary increase for the OVB undercloud instance memory

2016-09-22 Thread James Slagle
On Thu, Sep 22, 2016 at 10:36 AM, Gabriele Cerami wrote:
> Hi,
>
> As reported on this bug
>
> https://bugs.launchpad.net/tripleo/+bug/1626483
>
> HA gate and periodic jobs for master and sometimes newton started to
> fail for errors related to memory shortage. Memory on undercloud
> instance was increased to 8G less than a month ago, so the problem
> needs a different approach to be solved.
>
> We have some solutions in store. However, with the release date so
> close, I don't think it's time for this kind of changes. So I thought
> it could be a good compromise to temporarily increase the undercloud
> instance memory to 12G, just for this week, unless there's a rapid way
> to reduce memory footprint for heat-engine (usually the biggest memory
> consumer on the undercloud instance)
>
> Any other ideas ?

The OOM error in the bug is from overcloud-controller-0, not the
undercloud. The overcloud nodes in OVB are still at 6GB. I think it
would be reasonable to increase those to 8GB as well.

I also noticed that there are 4 neutron-server processes despite
having NeutronWorkers: 1 in
https://github.com/openstack-infra/tripleo-ci/blob/master/test-environments/worker-config.yaml.
If we can get that down to 1, looks like that might save around 270MB.

It also looks like there are 2 nova-api workers despite having
NovaWorkers: 1. Is that normal? Getting rid of one of them would save
around another 140MB.
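
A minimal sketch of how the process counts above could be checked automatically, assuming two-column text with a command name and resident memory in KiB (the kind of listing `ps -eo comm,rss` prints). The function name, the target counts, and the sample listing are hypothetical; the RSS numbers are invented for illustration and are not measurements from the CI nodes.

```python
def summarize(ps_text, targets):
    """Return {name: (count, total_rss_kib, extra)} for each name in targets.

    'extra' is how many processes exceed the target count for that name.
    """
    counts = {name: 0 for name in targets}
    rss = {name: 0 for name in targets}
    for line in ps_text.splitlines():
        fields = line.split()
        if len(fields) != 2 or not fields[1].isdigit():
            continue  # skip the header row and anything unexpected
        name, rss_kib = fields[0], int(fields[1])
        if name in counts:
            counts[name] += 1
            rss[name] += rss_kib
    return {
        name: (counts[name], rss[name], max(0, counts[name] - targets[name]))
        for name in targets
    }


# Invented listing, for illustration only.
SAMPLE_PS = """COMMAND RSS
neutron-server 90000
neutron-server 90000
neutron-server 90000
neutron-server 90000
nova-api 140000
nova-api 140000
httpd 50000
"""

if __name__ == "__main__":
    report = summarize(SAMPLE_PS, {"neutron-server": 1, "nova-api": 1})
    for name, (count, total_kib, extra) in report.items():
        print(name, count, total_kib, extra)
```

The target of 1 per service is only the number named in the worker settings; whether a service legitimately runs a parent process or separate API and RPC workers on top of that has to be judged per service.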

-- 
-- James Slagle
--



Re: [openstack-dev] [tripleo][ci] Temporary increase for the OVB undercloud instance memory

2016-09-22 Thread Ben Nemec



On 09/22/2016 09:36 AM, Gabriele Cerami wrote:

> Hi,
>
> As reported on this bug
>
> https://bugs.launchpad.net/tripleo/+bug/1626483
>
> HA gate and periodic jobs for master and sometimes newton started to
> fail for errors related to memory shortage. Memory on undercloud
> instance was increased to 8G less than a month ago, so the problem
> needs a different approach to be solved.


Which was a pretty significant jump from 6 GB before that.  Part of the
motivation for going to 8 was to move us in line with the rest of the
infra Jenkins instances, so it would not be ideal to change it again.




> We have some solutions in store. However, with the release date so
> close, I don't think it's time for this kind of changes. So I thought
> it could be a good compromise to temporarily increase the undercloud
> instance memory to 12G, just for this week, unless there's a rapid way
> to reduce memory footprint for heat-engine (usually the biggest memory
> consumer on the undercloud instance)


This is fine for CI and the handful of us who have beefy development 
machines, but are we really at a point now where our memory usage 
_requires_ 12 GB on the undercloud and somewhere north of 6 GB on the 
overcloud nodes (we're also getting quite a few OOMs on overcloud nodes 
in HA deployments lately, with 6 GB instances)?  For an HA deployment, 
that means 40 GB of memory just for the VMs, assuming 7 GB overcloud 
nodes.  And _that's_ without ceph or the ability to test scaleup 
or...you get the idea.
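
The 40 GB figure can be reproduced with a few lines of arithmetic. This is a sketch under one assumption that the message does not spell out: that the HA deployment has four overcloud nodes (for example three controllers and one compute). The function and constant names are hypothetical.

```python
def vm_memory_total_gb(undercloud_gb, overcloud_gb, overcloud_nodes):
    """Total memory needed for the undercloud VM plus all overcloud VMs."""
    return undercloud_gb + overcloud_gb * overcloud_nodes


# Assumption: an HA deployment with 3 controllers + 1 compute node.
HA_OVERCLOUD_NODES = 4

# Proposed sizing: 12 GB undercloud, 7 GB per overcloud node.
proposed = vm_memory_total_gb(12, 7, HA_OVERCLOUD_NODES)

# Sizing at the time of the thread: 8 GB undercloud, 6 GB per overcloud node.
current = vm_memory_total_gb(8, 6, HA_OVERCLOUD_NODES)

if __name__ == "__main__":
    print(proposed, current, proposed - current)
```

Under that assumption the proposal adds 8 GB to the VM footprint of a single HA environment, before ceph or extra nodes for scale-up testing are counted.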


Our developer hardware situation is bad enough as it is.  Requiring a 64 
GB box just to do one of the most common deploy types feels untenable to 
me.  Would providing a worker config that reduces the number of worker 
processes be sufficient to keep us at 8 GB?  We just added a similar 
thing to tripleo-heat-templates for the overcloud, so I think that would 
be reasonable.


Mostly we have to stop bloating the memory usage of even basic 
deployments.  It took us less than a month to use up the extra 2 GB we 
gave ourselves last time.  That's not a good trend. :-/




> Any other ideas ?
>
> thanks.



[openstack-dev] [tripleo][ci] Temporary increase for the OVB undercloud instance memory

2016-09-22 Thread Gabriele Cerami
Hi,

As reported on this bug

https://bugs.launchpad.net/tripleo/+bug/1626483

HA gate and periodic jobs for master and sometimes newton started to
fail for errors related to memory shortage. Memory on undercloud
instance was increased to 8G less than a month ago, so the problem
needs a different approach to be solved. 

We have some solutions in store. However, with the release date so
close, I don't think it's time for this kind of changes. So I thought
it could be a good compromise to temporarily increase the undercloud
instance memory to 12G, just for this week, unless there's a rapid way
to reduce memory footprint for heat-engine (usually the biggest memory
consumer on the undercloud instance)

Any other ideas ?

thanks.
