Has anyone tried:

https://github.com/mgedmin/dozer/blob/master/dozer/leak.py#L72

This piece of middleware creates some nice graphs (using PIL) that may help identify which areas are using what memory (and/or leaking).

https://pypi.python.org/pypi/linesman might also be somewhat useful to have running.
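
As a rough sketch of how the Dozer leak-tracking middleware gets wired in
(assuming a bare WSGI callable here; hooking it into an actual OpenStack
service's WSGI pipeline will look different):

    # Illustrative only: wrap an existing WSGI app with Dozer so that
    # per-type object counts (and the PIL-rendered graphs) are exposed
    # under the /_dozer URL prefix of the wrapped application.
    from dozer import Dozer

    def application(environ, start_response):
        start_response('200 OK', [('Content-Type', 'text/plain')])
        return [b'hello\n']

    application = Dozer(application)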

How any process here takes more than 100MB blows my mind (horizon is
doing nicely, ha); what are people caching in-process to end up with RSS
that large (1.95 GB, woah)?

Armando M. wrote:
Hi,

[TL;DR]: OpenStack services have steadily increased their memory
footprints. We need a concerted way to address the oom-kills experienced
in the OpenStack gate, as we may have reached a ceiling.

Now the longer version:
--------------------------------

We have been experiencing some instability in the gate lately for a
number of reasons. When everything adds up, it becomes rather difficult
to merge anything, and given that we're in feature freeze, that adds to
the stress. One culprit was identified to be [1].

We initially tried to increase the swappiness, but that didn't seem to
help. Then we looked at the resident memory in use. Going back over the
past three releases, we noticed that the aggregated memory footprint of
some OpenStack projects has grown steadily. We have the following:

  * Mitaka
      o neutron: 1.40GB
      o nova: 1.70GB
      o swift: 640MB
      o cinder: 730MB
      o keystone: 760MB
      o horizon: 17MB
      o glance: 538MB
  * Newton
      o neutron: 1.59GB (+13%)
      o nova: 1.67GB (-1%)
      o swift: 779MB (+21%)
      o cinder: 878MB (+20%)
      o keystone: 919MB (+20%)
      o horizon: 21MB (+23%)
      o glance: 721MB (+34%)
  * Ocata
      o neutron: 1.75GB (+10%)
      o nova: 1.95GB (+16%)
      o swift: 703MB (-9%)
      o cinder: 920MB (+4%)
      o keystone: 903MB (-1%)
      o horizon: 25MB (+20%)
      o glance: 740MB (+2%)

Numbers are approximate and I only took a couple of samples, but in a
nutshell, the majority of the services have seen double-digit growth in
the amount of RSS memory they use over the past two cycles.
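
For what it's worth, per-service totals like the above can be
approximated with something along these lines (a rough sketch using
psutil; the name patterns are assumptions, and summing RSS over-counts
pages shared between forked workers):

    # Rough sketch: aggregate RSS per service by matching process
    # command lines. Requires psutil. Note that RSS counts copy-on-write
    # pages shared between a parent and its forked API workers multiple
    # times, so these totals overstate actual physical memory use.
    import collections
    import psutil

    PATTERNS = ('neutron', 'nova', 'swift', 'cinder', 'keystone', 'glance')

    totals = collections.Counter()
    for proc in psutil.process_iter(attrs=['cmdline', 'memory_info']):
        cmdline = ' '.join(proc.info['cmdline'] or [])
        mem = proc.info['memory_info']
        if mem is None:
            continue
        for pattern in PATTERNS:
            if pattern in cmdline:
                totals[pattern] += mem.rss
                break

    for name, rss in totals.most_common():
        print('%-10s %7.0f MB' % (name, rss / (1024.0 * 1024.0)))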

Since [1] has only been observed since Ocata [2], I think it's
reasonable to assume that the memory increase may well be a determining
factor in the oom-kills we see in the gate.

Profiling and surgically reducing the memory used by each component in
each service is a lengthy process, but I'd rather see some gate relief
right away. Reducing the number of API workers helps bring the RSS
memory back down to Mitaka levels:

  * neutron: 1.54GB
  * nova: 1.24GB
  * swift: 694MB
  * cinder: 778MB
  * keystone: 891MB
  * horizon: 24MB
  * glance: 490MB

However, it may have other side effects, like longer execution times or
an increase in timeouts.
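
For illustration, the knobs involved are the per-service worker options,
roughly along these lines (option names are from memory and vary by
service and release; in gate jobs devstack normally sets these rather
than anyone editing the files by hand):

    # neutron.conf
    [DEFAULT]
    api_workers = 2
    rpc_workers = 1

    # nova.conf
    [DEFAULT]
    osapi_compute_workers = 2
    metadata_workers = 2

    # cinder.conf
    [DEFAULT]
    osapi_volume_workers = 2

    # glance-api.conf
    [DEFAULT]
    workers = 2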

Where do we go from here? I am not particularly fond of the stop-gap
[4], but it is the one fix that most broadly addresses the memory
increase we have experienced across the board.

Thanks,
Armando

[1] https://bugs.launchpad.net/neutron/+bug/1656386
[2] http://logstash.openstack.org/#/dashboard/file/logstash.json?query=message:%5C%22oom-killer%5C%22%20AND%20tags:syslog
[3] http://logs.openstack.org/21/427921/1/check/gate-tempest-dsvm-neutron-full-ubuntu-xenial/82084c2/
[4] https://review.openstack.org/#/c/427921

__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
