Re: [openstack-dev] [heat][tripleo] Heat memory usage in the TripleO gate during Ocata

2017-01-17 Thread Zane Bitter

On 11/01/17 09:21, Zane Bitter wrote:


From that run, total memory usage by Heat was 2.32GiB. That's a little
lower than the peak that occurred near the end of Newton development for
the legacy path, but still more than double the current legacy path
usage (0.90GiB on the job that ran for that same review). So we have
work to do.

I still expect storing output values in the database at the time
resources are created/updated, rather than generating them on the fly,
will create the biggest savings. There may be other infelicities we can
iron out to get some more wins as well.


Crag and I discovered that we were accidentally loading all of the 
resources from the database when doing a check on one resource 
(basically meaning we had to read O(n^2) resources on each traversal - 
ouch). The patch https://review.openstack.org/#/c/420971/ brings the 
memory usage down to 2.10GiB (10% saving) and has given us a few other 
ideas for further improvements too.
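For the curious, here's a hypothetical sketch (made-up names, not the actual
Heat code) of why loading everything while checking a single resource turns a
traversal into O(n^2) database reads:

    class FakeResourceDB(object):
        """Stand-in for the resource table; counts rows read."""
        def __init__(self, num_resources):
            self.rows = list(range(num_resources))
            self.rows_read = 0

        def load_all(self, stack_id):
            self.rows_read += len(self.rows)
            return self.rows

        def load_one(self, stack_id, index):
            self.rows_read += 1
            return self.rows[index]

    n = 100
    buggy, fixed = FakeResourceDB(n), FakeResourceDB(n)
    for i in range(n):
        buggy.load_all('stack')[i]     # check one resource, but load them all
        fixed.load_one('stack', i)     # check one resource, load only that one
    print(buggy.rows_read, fixed.rows_read)   # 10000 vs 100: O(n^2) vs O(n)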


- ZB



Re: [openstack-dev] [heat][tripleo] Heat memory usage in the TripleO gate during Ocata

2017-01-11 Thread Zane Bitter

On 06/01/17 16:58, Emilien Macchi wrote:

It's worth reiterating that TripleO still disables convergence in the
undercloud, so these are all tests of the legacy code path. It would be
great if we could set up a non-voting job on t-h-t with convergence enabled
and start tracking memory use over time there too. As a first step, maybe we
could at least add an experimental job on Heat to give us a baseline?

+1. We haven't made any huge changes in that direction, but having
some info would be great.

+1 too. I volunteer to do it.


Emilien kindly set up the experimental job for us, so we now have a 
baseline: https://review.openstack.org/#/c/418583/


From that run, total memory usage by Heat was 2.32GiB. That's a little 
lower than the peak that occurred near the end of Newton development for 
the legacy path, but still more than double the current legacy path 
usage (0.90GiB on the job that ran for that same review). So we have 
work to do.
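(Spelling out that comparison: 2.32GiB / 0.90GiB is roughly 2.6, so the
convergence run used about two and a half times the memory of the legacy run
on the same review.)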


I still expect storing output values in the database at the time 
resources are created/updated, rather than generating them on the fly, 
will create the biggest savings. There may be other infelicities we can 
iron out to get some more wins as well.
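To illustrate the idea with a hypothetical sketch (made-up names, not Heat's
actual data model): resolve the attribute/output values once, at the point
where the resource reaches a steady state, persist them, and have later
output requests read the stored copy rather than re-resolving everything:

    resource_attributes = {}   # stands in for a column on the resource table

    def store_attributes(resource_name, resolved_attrs):
        # Called once, at the end of create/update, while the live data is
        # already in hand.
        resource_attributes[resource_name] = dict(resolved_attrs)

    def get_output(resource_name, attr_name):
        # Later stack-output requests just read the stored values; no nested
        # stacks need to be loaded and nothing gets recomputed.
        return resource_attributes[resource_name][attr_name]

    store_attributes('my_server', {'first_address': '192.0.2.10'})
    print(get_output('my_server', 'first_address'))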


It's worth noting for the record that convergence is an architecture 
designed to allow arbitrary scale-out, even at the cost of CPU/memory 
performance (a common trade-off). Thus TripleO, which combines an 
enormous number of stacks and resources with running on a single 
undercloud server, represents the worst case.


cheers,
Zane.



Re: [openstack-dev] [heat][tripleo] Heat memory usage in the TripleO gate during Ocata

2017-01-07 Thread Emilien Macchi
On Fri, Jan 6, 2017 at 5:41 PM, Zane Bitter  wrote:
> On 06/01/17 16:58, Emilien Macchi wrote:
>>
>> On Fri, Jan 6, 2017 at 4:35 PM, Thomas Herve  wrote:
>>>
>>> On Fri, Jan 6, 2017 at 6:12 PM, Zane Bitter  wrote:

 It's worth reiterating that TripleO still disables convergence in the
 undercloud, so these are all tests of the legacy code path. It would be
 great if we could set up a non-voting job on t-h-t with convergence
 enabled
 and start tracking memory use over time there too. As a first step,
 maybe we
 could at least add an experimental job on Heat to give us a baseline?
>>>
>>>
>>> +1. We haven't made any huge changes in that direction, but having
>>> some info would be great.
>>
>>
>> +1 too. I volunteer to do it.
>>
>> Quick question: to enable it, is it just a matter of setting
>> convergence_engine to true in heat.conf (on the undercloud)?
>
>
> Yep! Actually, it's even simpler than that: now that true is the default
> (Newton onwards), it's just a matter of _not_ setting it to false :)

done: https://review.openstack.org/#/q/topic:tripleo/heat/convergence

> - ZB
>
>> If not, what else is needed?
>
>
>



-- 
Emilien Macchi



Re: [openstack-dev] [heat][tripleo] Heat memory usage in the TripleO gate during Ocata

2017-01-06 Thread Zane Bitter

On 06/01/17 16:35, Thomas Herve wrote:

Thanks a lot for the analysis. It's great that things haven't gotten off track.


I tracked down most of the step changes to identifiable patches:

2016-10-07: 2.44GiB -> 1.64GiB
 - https://review.openstack.org/382068/ merged, making ResourceInfo classes
more memory-efficient. Judging by the stable branch (where this and the
following patch were merged at different times), this was responsible for
dropping the memory usage from 2.44GiB -> 1.83GiB. (Which seems like a
disproportionately large change?)

Without wanting to take the credit, I believe
https://review.openstack.org/377061/ is more likely the reason here.


It *is* possible (and I had, indeed, forgotten about that patch), since 
those two backports merged at around the same time. However, that patch 
merged to master on 30 September, a week before the other two patches, 
and there was no (downwards) change in the memory usage on master until 
the day after the other two merged. So the evidence is definitely not as 
clear-cut as with some of the others.



 - https://review.openstack.org/#/c/382377/ merged, so we no longer create
multiple yaql contexts. (This was responsible for the drop from 1.83GiB ->
1.64GiB.)

2016-10-17: 1.62GiB -> 0.93GiB
 - https://review.openstack.org/#/c/386696/ merged, reducing the number of
engine workers on the undercloud to 2.

2016-10-19: 0.93GiB -> 0.73GiB (variance also seemed to drop after this)
 - https://review.openstack.org/#/c/386247/ merged (on 2016-10-16), avoiding
loading all nested stacks in a single process simultaneously much of the
time.
 - https://review.openstack.org/#/c/383839/ merged (on 2016-10-16),
switching output calculations to RPC to avoid almost all simultaneous
loading of all nested stacks.

2016-11-08: 0.76GiB -> 0.70GiB
 - This one is a bit of a mystery???

Possibly https://review.openstack.org/390064/ ? Reducing the
environment size could have an effect.


Unlikely; stable/newton fell too (a few days later), but that patch was 
never backported. (Also, it merged on master almost a week before the 
change in memory use.)


It's likely a change in another repo, but I checked the obvious 
candidates (heatclient, tripleoclient) without luck.



2016-11-22: 0.69GiB -> 0.50GiB
 - https://review.openstack.org/#/c/398476/ merged, improving the efficiency
of resource listing?

2016-12-01: 0.49GiB -> 0.88GiB
 - https://review.openstack.org/#/c/399619/ merged, returning the number of
engine workers on the undercloud to 4.





Re: [openstack-dev] [heat][tripleo] Heat memory usage in the TripleO gate during Ocata

2017-01-06 Thread Zane Bitter

On 06/01/17 16:40, Hugh Brock wrote:

Why would TripleO not move to convergence at the earliest possible point?


We'll need some data to decide when the earliest possible point is :)

Last time Steve (Hardy) tested it I believe convergence was looking far 
worse than legacy in memory usage, at a time when legacy was already 
through the roof. Clearly a lot has changed since then, so now would be 
a good time to retest and re-evaluate where we stand.


cheers,
Zane.



Re: [openstack-dev] [heat][tripleo] Heat memory usage in the TripleO gate during Ocata

2017-01-06 Thread Zane Bitter

On 06/01/17 16:58, Emilien Macchi wrote:

On Fri, Jan 6, 2017 at 4:35 PM, Thomas Herve  wrote:

On Fri, Jan 6, 2017 at 6:12 PM, Zane Bitter  wrote:

It's worth reiterating that TripleO still disables convergence in the
undercloud, so these are all tests of the legacy code path. It would be
great if we could set up a non-voting job on t-h-t with convergence enabled
and start tracking memory use over time there too. As a first step, maybe we
could at least add an experimental job on Heat to give us a baseline?


+1. We haven't made any huge changes in that direction, but having
some info would be great.


+1 too. I volunteer to do it.

Quick question: to enable it, is it just a matter of setting
convergence_engine to true in heat.conf (on the undercloud)?


Yep! Actually, it's even simpler than that: now that true is the default 
(Newton onwards), it's just a matter of _not_ setting it to false :)
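
For anyone following along at home, that's the convergence_engine option in
heat.conf's [DEFAULT] section; a minimal sketch (worth double-checking
against the config reference for your release):

    [DEFAULT]
    # Already defaults to true from Newton onwards, so an undercloud enables
    # convergence simply by not overriding this to false.
    convergence_engine = true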


- ZB


If not, what else is needed?





Re: [openstack-dev] [heat][tripleo] Heat memory usage in the TripleO gate during Ocata

2017-01-06 Thread Emilien Macchi
On Fri, Jan 6, 2017 at 4:35 PM, Thomas Herve  wrote:
> On Fri, Jan 6, 2017 at 6:12 PM, Zane Bitter  wrote:
>> tl;dr everything looks great, and memory usage has dropped by about 64%
>> since the initial Newton release of Heat.
>>
>> I re-ran my analysis of Heat memory usage in the tripleo-heat-templates
>> gate. (This is based on the gate-tripleo-ci-centos-7-ovb-nonha job.) Here's
>> a pretty picture:
>>
>> https://fedorapeople.org/~zaneb/tripleo-memory/20170105/heat_memused.png
>>
>> There is one major caveat here: for the period marked in grey where it says
>> "Only 2 engine workers", the job was configured to use only 2 heat-enginer
>> worker processes instead of 4, so this is not an apples-to-apples
>> comparison. The inital drop at the beginning and the subsequent bounce at
>> the end are artifacts of this change. Note that the stable/newton branch is
>> _still_ using only 2 engine workers.
>>
>> The rapidly increasing usage on the left is due to increases in the
>> complexity of the templates during the Newton cycle. It's clear that if
>> there has been any similar complexity growth during Ocata, it has had a tiny
>> effect on memory consumption in comparison.
>
> Thanks a lot for the analysis. It's great that things haven't gotten off 
> track.
>
>> I tracked down most of the step changes to identifiable patches:
>>
>> 2016-10-07: 2.44GiB -> 1.64GiB
>>  - https://review.openstack.org/382068/ merged, making ResourceInfo classes
>> more memory-efficient. Judging by the stable branch (where this and the
>> following patch were merged at different times), this was responsible for
>> dropping the memory usage from 2.44GiB -> 1.83GiB. (Which seems like a
>> disproportionately large change?)
>
> Without wanting to take the credit, I believe
> https://review.openstack.org/377061/ is more likely the reason here.
>
>>  - https://review.openstack.org/#/c/382377/ merged, so we no longer create
>> multiple yaql contexts. (This was responsible for the drop from 1.83GiB ->
>> 1.64GiB.)
>>
>> 2016-10-17: 1.62GiB -> 0.93GiB
>>  - https://review.openstack.org/#/c/386696/ merged, reducing the number of
>> engine workers on the undercloud to 2.
>>
>> 2016-10-19: 0.93GiB -> 0.73GiB (variance also seemed to drop after this)
>>  - https://review.openstack.org/#/c/386247/ merged (on 2016-10-16), avoiding
>> loading all nested stacks in a single process simultaneously much of the
>> time.
>>  - https://review.openstack.org/#/c/383839/ merged (on 2016-10-16),
>> switching output calculations to RPC to avoid almost all simultaneous
>> loading of all nested stacks.
>>
>> 2016-11-08: 0.76GiB -> 0.70GiB
>>  - This one is a bit of a mystery???
>
> Possibly https://review.openstack.org/390064/ ? Reducing the
> environment size could have an effect.
>
>> 2016-11-22: 0.69GiB -> 0.50GiB
>>  - https://review.openstack.org/#/c/398476/ merged, improving the efficiency
>> of resource listing?
>>
>> 2016-12-01: 0.49GiB -> 0.88GiB
>>  - https://review.openstack.org/#/c/399619/ merged, returning the number of
>> engine workers on the undercloud to 4.
>>
>> It's not an exact science because IIUC there's a delay between a patch
>> merging in Heat and it being used in subsequent t-h-t gate jobs. e.g. the
>> change to getting outputs over RPC landed the day before the
>> instack-undercloud patch that cut the number of engine workers, but the
>> effects don't show up until 2 days after. I'd love to figure out what
>> happened on the 8th of November, but I can't correlate it to anything
>> obvious. The attribution of the change on the 22nd also seems dubious, but
>> the timing adds up (including on stable/newton).
>>
>> It's fair to say that none of the other patches we merged in an attempt to
>> reduce memory usage had any discernible effect :D
>>
>> It's worth reiterating that TripleO still disables convergence in the
>> undercloud, so these are all tests of the legacy code path. It would be
>> great if we could set up a non-voting job on t-h-t with convergence enabled
>> and start tracking memory use over time there too. As a first step, maybe we
>> could at least add an experimental job on Heat to give us a baseline?
>
> +1. We haven't made any huge changes in that direction, but having
> some info would be great.

+1 too. I volunteer to do it.

Quick question: to enable it, is it just a matter of setting
convergence_engine to true in heat.conf (on the undercloud)?
If not, what else is needed?

> --
> Thomas
>



-- 
Emilien Macchi


Re: [openstack-dev] [heat][tripleo] Heat memory usage in the TripleO gate during Ocata

2017-01-06 Thread Hugh Brock
Why would TripleO not move to convergence at the earliest possible point?

On Jan 6, 2017 10:37 PM, "Thomas Herve"  wrote:

> On Fri, Jan 6, 2017 at 6:12 PM, Zane Bitter  wrote:
> > tl;dr everything looks great, and memory usage has dropped by about 64%
> > since the initial Newton release of Heat.
> >
> > I re-ran my analysis of Heat memory usage in the tripleo-heat-templates
> > gate. (This is based on the gate-tripleo-ci-centos-7-ovb-nonha job.)
> Here's
> > a pretty picture:
> >
> > https://fedorapeople.org/~zaneb/tripleo-memory/20170105/heat_memused.png
> >
> > There is one major caveat here: for the period marked in grey where it
> says
> > "Only 2 engine workers", the job was configured to use only 2
> heat-engine
> > worker processes instead of 4, so this is not an apples-to-apples
> > comparison. The initial drop at the beginning and the subsequent bounce at
> > the end are artifacts of this change. Note that the stable/newton branch
> is
> > _still_ using only 2 engine workers.
> >
> > The rapidly increasing usage on the left is due to increases in the
> > complexity of the templates during the Newton cycle. It's clear that if
> > there has been any similar complexity growth during Ocata, it has had a
> tiny
> > effect on memory consumption in comparison.
>
> Thanks a lot for the analysis. It's great that things haven't gotten off
> track.
>
> > I tracked down most of the step changes to identifiable patches:
> >
> > 2016-10-07: 2.44GiB -> 1.64GiB
> >  - https://review.openstack.org/382068/ merged, making ResourceInfo
> classes
> > more memory-efficient. Judging by the stable branch (where this and the
> > following patch were merged at different times), this was responsible for
> > dropping the memory usage from 2.44GiB -> 1.83GiB. (Which seems like a
> > disproportionately large change?)
>
> Without wanting to take the credit, I believe
> https://review.openstack.org/377061/ is more likely the reason here.
>
> >  - https://review.openstack.org/#/c/382377/ merged, so we no longer
> create
> > multiple yaql contexts. (This was responsible for the drop from 1.83GiB
> ->
> > 1.64GiB.)
> >
> > 2016-10-17: 1.62GiB -> 0.93GiB
> >  - https://review.openstack.org/#/c/386696/ merged, reducing the number
> of
> > engine workers on the undercloud to 2.
> >
> > 2016-10-19: 0.93GiB -> 0.73GiB (variance also seemed to drop after this)
> >  - https://review.openstack.org/#/c/386247/ merged (on 2016-10-16),
> avoiding
> > loading all nested stacks in a single process simultaneously much of the
> > time.
> >  - https://review.openstack.org/#/c/383839/ merged (on 2016-10-16),
> > switching output calculations to RPC to avoid almost all simultaneous
> > loading of all nested stacks.
> >
> > 2016-11-08: 0.76GiB -> 0.70GiB
> >  - This one is a bit of a mystery???
>
> Possibly https://review.openstack.org/390064/ ? Reducing the
> environment size could have an effect.
>
> > 2016-11-22: 0.69GiB -> 0.50GiB
> >  - https://review.openstack.org/#/c/398476/ merged, improving the
> efficiency
> > of resource listing?
> >
> > 2016-12-01: 0.49GiB -> 0.88GiB
> >  - https://review.openstack.org/#/c/399619/ merged, returning the
> number of
> > engine workers on the undercloud to 4.
> >
> > It's not an exact science because IIUC there's a delay between a patch
> > merging in Heat and it being used in subsequent t-h-t gate jobs. e.g. the
> > change to getting outputs over RPC landed the day before the
> > instack-undercloud patch that cut the number of engine workers, but the
> > effects don't show up until 2 days after. I'd love to figure out what
> > happened on the 8th of November, but I can't correlate it to anything
> > obvious. The attribution of the change on the 22nd also seems dubious,
> but
> > the timing adds up (including on stable/newton).
> >
> > It's fair to say that none of the other patches we merged in an attempt
> to
> > reduce memory usage had any discernible effect :D
> >
> > It's worth reiterating that TripleO still disables convergence in the
> > undercloud, so these are all tests of the legacy code path. It would be
> > great if we could set up a non-voting job on t-h-t with convergence
> enabled
> > and start tracking memory use over time there too. As a first step,
> maybe we
> > could at least add an experimental job on Heat to give us a baseline?
>
> +1. We haven't made any huge changes in that direction, but having
> some info would be great.
>
> --
> Thomas
>

Re: [openstack-dev] [heat][tripleo] Heat memory usage in the TripleO gate during Ocata

2017-01-06 Thread Thomas Herve
On Fri, Jan 6, 2017 at 6:12 PM, Zane Bitter  wrote:
> tl;dr everything looks great, and memory usage has dropped by about 64%
> since the initial Newton release of Heat.
>
> I re-ran my analysis of Heat memory usage in the tripleo-heat-templates
> gate. (This is based on the gate-tripleo-ci-centos-7-ovb-nonha job.) Here's
> a pretty picture:
>
> https://fedorapeople.org/~zaneb/tripleo-memory/20170105/heat_memused.png
>
> There is one major caveat here: for the period marked in grey where it says
> "Only 2 engine workers", the job was configured to use only 2 heat-enginer
> worker processes instead of 4, so this is not an apples-to-apples
> comparison. The inital drop at the beginning and the subsequent bounce at
> the end are artifacts of this change. Note that the stable/newton branch is
> _still_ using only 2 engine workers.
>
> The rapidly increasing usage on the left is due to increases in the
> complexity of the templates during the Newton cycle. It's clear that if
> there has been any similar complexity growth during Ocata, it has had a tiny
> effect on memory consumption in comparison.

Thanks a lot for the analysis. It's great that things haven't gotten off track.

> I tracked down most of the step changes to identifiable patches:
>
> 2016-10-07: 2.44GiB -> 1.64GiB
>  - https://review.openstack.org/382068/ merged, making ResourceInfo classes
> more memory-efficient. Judging by the stable branch (where this and the
> following patch were merged at different times), this was responsible for
> dropping the memory usage from 2.44GiB -> 1.83GiB. (Which seems like a
> disproportionately large change?)

Without wanting to take the credit, I believe
https://review.openstack.org/377061/ is more likely the reason here.

>  - https://review.openstack.org/#/c/382377/ merged, so we no longer create
> multiple yaql contexts. (This was responsible for the drop from 1.83GiB ->
> 1.64GiB.)
>
> 2016-10-17: 1.62GiB -> 0.93GiB
>  - https://review.openstack.org/#/c/386696/ merged, reducing the number of
> engine workers on the undercloud to 2.
>
> 2016-10-19: 0.93GiB -> 0.73GiB (variance also seemed to drop after this)
>  - https://review.openstack.org/#/c/386247/ merged (on 2016-10-16), avoiding
> loading all nested stacks in a single process simultaneously much of the
> time.
>  - https://review.openstack.org/#/c/383839/ merged (on 2016-10-16),
> switching output calculations to RPC to avoid almost all simultaneous
> loading of all nested stacks.
>
> 2016-11-08: 0.76GiB -> 0.70GiB
>  - This one is a bit of a mystery???

Possibly https://review.openstack.org/390064/ ? Reducing the
environment size could have an effect.

> 2016-11-22: 0.69GiB -> 0.50GiB
>  - https://review.openstack.org/#/c/398476/ merged, improving the efficiency
> of resource listing?
>
> 2016-12-01: 0.49GiB -> 0.88GiB
>  - https://review.openstack.org/#/c/399619/ merged, returning the number of
> engine workers on the undercloud to 4.
>
> It's not an exact science because IIUC there's a delay between a patch
> merging in Heat and it being used in subsequent t-h-t gate jobs. e.g. the
> change to getting outputs over RPC landed the day before the
> instack-undercloud patch that cut the number of engine workers, but the
> effects don't show up until 2 days after. I'd love to figure out what
> happened on the 8th of November, but I can't correlate it to anything
> obvious. The attribution of the change on the 22nd also seems dubious, but
> the timing adds up (including on stable/newton).
>
> It's fair to say that none of the other patches we merged in an attempt to
> reduce memory usage had any discernible effect :D
>
> It's worth reiterating that TripleO still disables convergence in the
> undercloud, so these are all tests of the legacy code path. It would be
> great if we could set up a non-voting job on t-h-t with convergence enabled
> and start tracking memory use over time there too. As a first step, maybe we
> could at least add an experimental job on Heat to give us a baseline?

+1. We haven't made any huge changes in that direction, but having
some info would be great.

-- 
Thomas



[openstack-dev] [heat][tripleo] Heat memory usage in the TripleO gate during Ocata

2017-01-06 Thread Zane Bitter
tl;dr everything looks great, and memory usage has dropped by about 64% 
since the initial Newton release of Heat.
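
For those checking the arithmetic, the endpoints being compared are roughly
the ~2.44GiB peak from just after the Newton release and the final ~0.88GiB
figure, both with 4 engine workers: (2.44 - 0.88) / 2.44 is about 0.64, i.e.
a drop of roughly 64%.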


I re-ran my analysis of Heat memory usage in the tripleo-heat-templates 
gate. (This is based on the gate-tripleo-ci-centos-7-ovb-nonha job.) 
Here's a pretty picture:


https://fedorapeople.org/~zaneb/tripleo-memory/20170105/heat_memused.png

There is one major caveat here: for the period marked in grey where it 
says "Only 2 engine workers", the job was configured to use only 2 
heat-engine worker processes instead of 4, so this is not an
apples-to-apples comparison. The initial drop at the beginning and the
subsequent bounce at the end are artifacts of this change. Note that the 
stable/newton branch is _still_ using only 2 engine workers.


The rapidly increasing usage on the left is due to increases in the 
complexity of the templates during the Newton cycle. It's clear that if 
there has been any similar complexity growth during Ocata, it has had a 
tiny effect on memory consumption in comparison.


I tracked down most of the step changes to identifiable patches:

2016-10-07: 2.44GiB -> 1.64GiB
 - https://review.openstack.org/382068/ merged, making ResourceInfo 
classes more memory-efficient. Judging by the stable branch (where this 
and the following patch were merged at different times), this was 
responsible for dropping the memory usage from 2.44GiB -> 1.83GiB. 
(Which seems like a disproportionately large change?)
 - https://review.openstack.org/#/c/382377/ merged, so we no longer 
create multiple yaql contexts. (This was responsible for the drop from 
1.83GiB -> 1.64GiB.)


2016-10-17: 1.62GiB -> 0.93GiB
 - https://review.openstack.org/#/c/386696/ merged, reducing the number 
of engine workers on the undercloud to 2.


2016-10-19: 0.93GiB -> 0.73GiB (variance also seemed to drop after this)
 - https://review.openstack.org/#/c/386247/ merged (on 2016-10-16), 
avoiding loading all nested stacks in a single process simultaneously 
much of the time.
 - https://review.openstack.org/#/c/383839/ merged (on 2016-10-16), 
switching output calculations to RPC to avoid almost all simultaneous 
loading of all nested stacks.


2016-11-08: 0.76GiB -> 0.70GiB
 - This one is a bit of a mystery???

2016-11-22: 0.69GiB -> 0.50GiB
 - https://review.openstack.org/#/c/398476/ merged, improving the 
efficiency of resource listing?


2016-12-01: 0.49GiB -> 0.88GiB
 - https://review.openstack.org/#/c/399619/ merged, returning the 
number of engine workers on the undercloud to 4.
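
For reference: those two instack-undercloud patches were, as far as I can
tell, toggling what ends up as the num_engine_workers option in the
undercloud's heat.conf. A minimal sketch of the 2-worker configuration used
during the grey period (illustrative only, not copied from the patches):

    [DEFAULT]
    # 2 workers instead of 4; in these gate runs that roughly halved
    # heat-engine's total memory footprint, at the cost of concurrency.
    num_engine_workers = 2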


It's not an exact science because IIUC there's a delay between a patch 
merging in Heat and it being used in subsequent t-h-t gate jobs. e.g. 
the change to getting outputs over RPC landed the day before the 
instack-undercloud patch that cut the number of engine workers, but the 
effects don't show up until 2 days after. I'd love to figure out what 
happened on the 8th of November, but I can't correlate it to anything 
obvious. The attribution of the change on the 22nd also seems dubious, 
but the timing adds up (including on stable/newton).


It's fair to say that none of the other patches we merged in an attempt 
to reduce memory usage had any discernible effect :D


It's worth reiterating that TripleO still disables convergence in the 
undercloud, so these are all tests of the legacy code path. It would be 
great if we could set up a non-voting job on t-h-t with convergence 
enabled and start tracking memory use over time there too. As a first 
step, maybe we could at least add an experimental job on Heat to give us 
a baseline?


The next big improvement to memory use is likely to come from 
https://review.openstack.org/#/c/407326/ or something like it (though I 
don't think we have a firm decision on whether we'd apply this to 
non-convergence stacks). Hopefully that will deliver a nice speed boost 
for convergence too.


cheers,
Zane.
