On Thu, Sep 22, 2016 at 04:36:30PM +0200, Gabriele Cerami wrote:
> Hi,
> As reported on this bug
> https://bugs.launchpad.net/tripleo/+bug/1626483
> HA gate and periodic jobs for master and sometimes newton started to
> fail for errors related to memory shortage. Memory on undercloud
> instance was increased to 8G less than a month ago, so the problem
> needs a different approach to be solved. 
> We have some solutions in store. However, with the release date so
> close, I don't think it's time for this kind of changes. So I thought
> it could be a good compromise to temporarily increase the undercloud
> instance memory to 12G, just for this week, unless there's a rapid way
> to reduce memory footprint for heat-engine (usually the biggest memory
> consumer on the undercloud instance)

If we can avoid it, I'd rather not increase the RAM again. I suspect
there is a Heat regression, as I'm seeing much higher memory usage in my
local test environment too.

I quickly re-ran some local monitoring that I first did earlier in the
cycle, when we experienced some high memory usage:


There are three plots there: one from early in the cycle, one from after
some fixes which reduced memory usage a lot, and the highest one, which
looks leaky, from the run I just did today.

So I'm pretty sure we have another Heat memory leak to track down.

If anyone has any historical data on memory usage, e.g. from periodic CI
runs, that would be helpful. Otherwise we'll have to bisect by testing
locally, or derive it by scraping our dstat data from CI run logs.
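
As a rough illustration of the dstat scraping idea, here is a minimal
Python sketch that pulls the peak "used" memory value out of dstat CSV
output. It assumes the CI logs contain a file written with dstat's
--output option, laid out as a few banner lines, then a row of column
names, then numeric samples. The function name, the sample data and the
choice of the first column called "used" are all illustrative
assumptions, not taken from the actual CI jobs.

```python
import csv
import io


def peak_used_memory(dstat_csv_text, column="used"):
    """Return the largest value seen in the first column named `column`.

    Assumption: the text is dstat CSV output, i.e. some banner lines,
    then a row of column names, then one numeric row per sample.
    Returns None if no samples are found.
    """
    index = None
    peak = None
    for row in csv.reader(io.StringIO(dstat_csv_text)):
        if index is None:
            # Still looking for the row of column names.
            if column in row:
                index = row.index(column)
            continue
        try:
            value = float(row[index])
        except (IndexError, ValueError):
            continue  # blank or malformed line, skip it
        if peak is None or value > peak:
            peak = value
    return peak


# Hypothetical sample, not taken from a real CI log.
sample = '''"Dstat CSV output"
"memory usage",,,
"used","buff","cach","free"
1000.0,10.0,20.0,7000.0
3500.0,10.0,20.0,4500.0
2000.0,10.0,20.0,6000.0
'''

print(peak_used_memory(sample))
```

Running this over the dstat file from each periodic run and comparing the
peaks by date would give a coarse history to narrow the bisect range. If
swap statistics are also recorded, a second "used" column may be present,
so the column lookup would need adjusting.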

