[ovirt-users] Re: Upgraded to oVirt 4.4.9, still have vdsmd memory leak

2021-12-09 Thread Chris Adams
Once upon a time, Victor Stinner  said:
> Or something somehow prevents these objects from being deleted. For
> example, an exception is stored somewhere which keeps all variables
> alive (in Python 3, an exception stores a traceback object which keeps
> all variables of all frames alive).

I think I found the cause, if not the actual code issue... due to a
long-standing local config typo (how embarrassing), these servers had the
vdsm port (TCP 54321) open to the world.  It appears that something is
leaking memory on bad connections (from port scans, I expect).  I
blocked the outside access, and the vdsmd processes have not grown
since then.

It'd probably be good to handle this better (knowing a probable cause
may help someone track it down), but I think I've solved my immediate
problem.
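
If anyone wants to try to reproduce it, the pattern should be easy to
fake: open TCP connections to 54321 and drop them without speaking the
protocol, then watch vdsmd's RSS.  A rough sketch (the host name is a
placeholder, not one of my hosts):

import socket
import time

HOST = "ovirt-host.example.com"  # placeholder: point at a test host

for _ in range(1000):
    # Connect and immediately disconnect, like a port scan would.
    s = socket.create_connection((HOST, 54321), timeout=5)
    s.close()
    time.sleep(0.01)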

-- 
Chris Adams 
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/LRDQJ4CWL4EJ5R6YEJSWZ2D2AIAIKTS5/


[ovirt-users] Re: Upgraded to oVirt 4.4.9, still have vdsmd memory leak

2021-12-09 Thread Sandro Bonazzola
@Jiri Denemark  @Eduardo Lima  can
you please have a look at the libvirt side?
@Martin Perina  the host/stats part within vdsm was
handled by people who are no longer working on the oVirt project;
perhaps someone from infra can have a look?

On Thu, Dec 9, 2021 at 11:20 AM Victor Stinner wrote:

> On Tue, Dec 7, 2021 at 6:12 PM Chris Adams  wrote:
> > Top differences
> > /usr/lib64/python3.6/site-packages/libvirt.py:442: size=295 MiB (+285 MiB), count=5511282 (+5312311), average=56 B
> > /usr/lib64/python3.6/json/decoder.py:355: size=73.9 MiB (+70.2 MiB), count=736108 (+697450), average=105 B
> > /usr/lib64/python3.6/logging/__init__.py:1630: size=44.2 MiB (+43.8 MiB), count=345704 (+342481), average=134 B
> > /usr/lib64/python3.6/site-packages/libvirt.py:5695: size=30.3 MiB (+30.0 MiB), count=190449 (+188665), average=167 B
> > /usr/lib/python3.6/site-packages/vdsm/host/stats.py:138: size=12.1 MiB (+11.4 MiB), count=75366 (+70991), average=168 B
> > /usr/lib/python3.6/site-packages/vdsm/utils.py:358: size=10.4 MiB (+9968 KiB), count=70204 (+65272), average=156 B
>
> That's quite significant!
>
> > Top block
> > 5511282 memory blocks: 302589.8 KiB
> >   File "/usr/lib64/python3.6/site-packages/libvirt.py", line 442
> >     ret = libvirtmod.virEventRunDefaultImpl()
> >   File "/usr/lib/python3.6/site-packages/vdsm/common/libvirtconnection.py", line 69
> >     libvirt.virEventRunDefaultImpl()
> >   File "/usr/lib/python3.6/site-packages/vdsm/common/concurrent.py", line 260
> >     ret = func(*args, **kwargs)
>
> You should check where these "ret" objects (of libvirt.py:442) are
> stored: 5,511,282 is a lot of small objects (average: 56 bytes)! Maybe
> they are stored in a list and never destroyed.
>
> Maybe it's a reference leak in the libvirtmod.virEventRunDefaultImpl()
> function of "libvirtmod" C extension: missing Py_DECREF() somewhere.
>
> Or something somehow prevents these objects from being deleted. For
> example, an exception is stored somewhere which keeps all variables
> alive (in Python 3, an exception stores a traceback object which keeps
> all variables of all frames alive).
>
> On GitHub and GitLab, I found the following code. Maybe there are
> minor differences in the versions that you are using.
>
> https://gitlab.com/libvirt/libvirt-python
> (I built the code locally to get build/libvirt.py)
>
> build/libvirt.c:
> ---
> PyObject *
> libvirt_intWrap(int val)
> {
>     return PyLong_FromLong((long) val);
> }
>
> PyObject *
> libvirt_virEventRunDefaultImpl(PyObject *self ATTRIBUTE_UNUSED,
>                                PyObject *args ATTRIBUTE_UNUSED)
> {
>     PyObject *py_retval;
>     int c_retval;
>     LIBVIRT_BEGIN_ALLOW_THREADS;
>     c_retval = virEventRunDefaultImpl();
>     LIBVIRT_END_ALLOW_THREADS;
>     py_retval = libvirt_intWrap((int) c_retval);
>     return py_retval;
> }
>
> static PyMethodDef libvirtMethods[] = {
>     { (char *)"virEventRunDefaultImpl",
>       libvirt_virEventRunDefaultImpl, METH_VARARGS, NULL },
>     ...
>     {NULL, NULL, 0, NULL}
> };
> ---
>
> This code looks correct and straightforward. Is it possible that
> internally virEventRunDefaultImpl() calls a Python memory allocator?
>
> build/libvirt.py:
> ---
> def virEventRunDefaultImpl():
>     ret = libvirtmod.virEventRunDefaultImpl()
>     if ret == -1:
>         raise libvirtError('virEventRunDefaultImpl() failed')
>     return ret
> ---
>
> Again, this code looks correct and straightforward.
>
>
> https://github.com/oVirt/vdsm/blob/37ed5c279c2dd9c9bb06329d674882e0f98f34d6/lib/vdsm/common/libvirtconnection.py
>
> vdsm/common/libvirtconnection.py:
> ---
> def __run(self):
>     try:
>         libvirt.virEventRegisterDefaultImpl()
>         while self.run:
>             libvirt.virEventRunDefaultImpl()
>     finally:
>         self.run = False
> ---
>
> The libvirt.virEventRunDefaultImpl() result is ignored, so I don't see
> anything obvious that would explain a leak.
>
>
> Sometimes, looking at the top function is misleading since the
> explanation can be found in one of the caller functions.
>
> For example, which function creates 70.2 MiB of objects from a JSON
> document? What calls json/decoder.py:355?
>
> Victor
> ___
> Users mailing list -- users@ovirt.org
> To unsubscribe send an email to users-le...@ovirt.org
> Privacy Statement: https://www.ovirt.org/privacy-policy.html
> oVirt Code of Conduct:
> https://www.ovirt.org/community/about/community-guidelines/
> List Archives:
> https://lists.ovirt.org/archives/list/users@ovirt.org/message/O5OAA6KNLINLRT2VYKNBI2PPH6UIYR4A/
>


-- 

Sandro Bonazzola

MANAGER, SOFTWARE ENGINEERING, EMEA R&D RHV

Red Hat EMEA 

sbona...@redhat.com


*Red Hat respects your work life balance. Therefore there is no need to
answer this email out of your office hours.*
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org

[ovirt-users] Re: Upgraded to oVirt 4.4.9, still have vdsmd memory leak

2021-12-09 Thread Victor Stinner
On Tue, Dec 7, 2021 at 6:12 PM Chris Adams  wrote:
> Top differences
> /usr/lib64/python3.6/site-packages/libvirt.py:442: size=295 MiB (+285 MiB), count=5511282 (+5312311), average=56 B
> /usr/lib64/python3.6/json/decoder.py:355: size=73.9 MiB (+70.2 MiB), count=736108 (+697450), average=105 B
> /usr/lib64/python3.6/logging/__init__.py:1630: size=44.2 MiB (+43.8 MiB), count=345704 (+342481), average=134 B
> /usr/lib64/python3.6/site-packages/libvirt.py:5695: size=30.3 MiB (+30.0 MiB), count=190449 (+188665), average=167 B
> /usr/lib/python3.6/site-packages/vdsm/host/stats.py:138: size=12.1 MiB (+11.4 MiB), count=75366 (+70991), average=168 B
> /usr/lib/python3.6/site-packages/vdsm/utils.py:358: size=10.4 MiB (+9968 KiB), count=70204 (+65272), average=156 B

That's quite significant!

> Top block
> 5511282 memory blocks: 302589.8 KiB
>   File "/usr/lib64/python3.6/site-packages/libvirt.py", line 442
>     ret = libvirtmod.virEventRunDefaultImpl()
>   File "/usr/lib/python3.6/site-packages/vdsm/common/libvirtconnection.py", line 69
>     libvirt.virEventRunDefaultImpl()
>   File "/usr/lib/python3.6/site-packages/vdsm/common/concurrent.py", line 260
>     ret = func(*args, **kwargs)

You should check where these "ret" objects (of libvirt.py:442) are
stored: 5,511,282 is a lot of small objects (average: 56 bytes)! Maybe
they are stored in a list and never destroyed.
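
If they really are accumulating in some container, one blunt way to find
it is to scan the gc-tracked objects for oversized containers and look
at what refers to them. A rough sketch (plain Python, not vdsm code; the
100000 threshold is arbitrary):

---
import gc

# Small ints themselves are not tracked by the garbage collector, but
# any list/dict/set holding on to them is, so a leak via a container
# shows up as one abnormally large object.
for obj in gc.get_objects():
    if isinstance(obj, (list, dict, set)) and len(obj) > 100000:
        print(type(obj).__name__, len(obj))
        for ref in gc.get_referrers(obj)[:3]:
            print("  referred to by:", type(ref).__name__)
---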

Maybe it's a reference leak in the libvirtmod.virEventRunDefaultImpl()
function of "libvirtmod" C extension: missing Py_DECREF() somewhere.

Or something somehow prevents these objects from being deleted. For
example, an exception is stored somewhere which keeps all variables
alive (in Python 3, an exception stores a traceback object which keeps
all variables of all frames alive).

On GitHub and GitLab, I found the following code. Maybe there are
minor differences in the versions that you are using.

https://gitlab.com/libvirt/libvirt-python
(I built the code locally to get build/libvirt.py)

build/libvirt.c:
---
PyObject *
libvirt_intWrap(int val)
{
    return PyLong_FromLong((long) val);
}

PyObject *
libvirt_virEventRunDefaultImpl(PyObject *self ATTRIBUTE_UNUSED,
                               PyObject *args ATTRIBUTE_UNUSED)
{
    PyObject *py_retval;
    int c_retval;
    LIBVIRT_BEGIN_ALLOW_THREADS;
    c_retval = virEventRunDefaultImpl();
    LIBVIRT_END_ALLOW_THREADS;
    py_retval = libvirt_intWrap((int) c_retval);
    return py_retval;
}

static PyMethodDef libvirtMethods[] = {
    { (char *)"virEventRunDefaultImpl",
      libvirt_virEventRunDefaultImpl, METH_VARARGS, NULL },
    ...
    {NULL, NULL, 0, NULL}
};
---

This code looks correct and straightforward. Is it possible that
internally virEventRunDefaultImpl() calls a Python memory allocator?

build/libvirt.py:
---
def virEventRunDefaultImpl():
    ret = libvirtmod.virEventRunDefaultImpl()
    if ret == -1:
        raise libvirtError('virEventRunDefaultImpl() failed')
    return ret
---

Again, this code looks correct and straightforward.

https://github.com/oVirt/vdsm/blob/37ed5c279c2dd9c9bb06329d674882e0f98f34d6/lib/vdsm/common/libvirtconnection.py

vdsm/common/libvirtconnection.py:
---
def __run(self):
    try:
        libvirt.virEventRegisterDefaultImpl()
        while self.run:
            libvirt.virEventRunDefaultImpl()
    finally:
        self.run = False
---

The libvirt.virEventRunDefaultImpl() result is ignored, so I don't see
anything obvious that would explain a leak.


Sometimes, looking at the top function is misleading since the
explanation can be found in one of the caller functions.

For example, which function creates 70.2 MiB of objects from a JSON
document? What calls json/decoder.py:355?
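
tracemalloc can answer that directly if enough frames were stored (as
with the PYTHONTRACEMALLOC=25 setting used earlier): filter the snapshot
down to that file and group by full traceback instead of by line. A
sketch:

---
import tracemalloc

snapshot = tracemalloc.take_snapshot()
# Keep only allocations made in json/decoder.py.
snapshot = snapshot.filter_traces([
    tracemalloc.Filter(True, "*/json/decoder.py"),
])
# Grouping by 'traceback' shows the whole call chain, i.e. the callers.
for stat in snapshot.statistics('traceback')[:3]:
    print("%d memory blocks: %.1f KiB" % (stat.count, stat.size / 1024))
    for line in stat.traceback.format():
        print(line)
---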

Victor
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/O5OAA6KNLINLRT2VYKNBI2PPH6UIYR4A/


[ovirt-users] Re: Upgraded to oVirt 4.4.9, still have vdsmd memory leak

2021-12-07 Thread Chris Adams
Once upon a time, Victor Stinner  said:
> Then use pickle.load() to reload snapshots from files. You can take
> multiple snapshots and compare snapshot 1 with snapshot 2, compare 1
> with 3, etc. If there is a major memory increase between two
> snapshots, I expect a significant difference between these two
> snapshots.

I tried this approach (I tried Valgrind first, but it made vdsmd run too
slowly, so oVirt saw timeouts and migrated VMs away), and it does show a
pretty big jump overnight.  Below is the output of comparing tracemalloc
dumps from yesterday afternoon and this morning.  The files in the
traceback are from these RPMs:

python3-libvirt-7.6.0-1.el8s.x86_64
vdsm-common-4.40.90.4-1.el8.noarch

Looking at the code, I'm not sure what to make of it though.


Top differences
/usr/lib64/python3.6/site-packages/libvirt.py:442: size=295 MiB (+285 MiB), count=5511282 (+5312311), average=56 B
/usr/lib64/python3.6/json/decoder.py:355: size=73.9 MiB (+70.2 MiB), count=736108 (+697450), average=105 B
/usr/lib64/python3.6/logging/__init__.py:1630: size=44.2 MiB (+43.8 MiB), count=345704 (+342481), average=134 B
/usr/lib64/python3.6/site-packages/libvirt.py:5695: size=30.3 MiB (+30.0 MiB), count=190449 (+188665), average=167 B
/usr/lib/python3.6/site-packages/vdsm/host/stats.py:138: size=12.1 MiB (+11.4 MiB), count=75366 (+70991), average=168 B
/usr/lib/python3.6/site-packages/vdsm/utils.py:358: size=10.4 MiB (+9968 KiB), count=70204 (+65272), average=156 B
/usr/lib64/python3.6/site-packages/libvirt.py:537: size=7676 KiB (+7656 KiB), count=109119 (+108886), average=72 B
/usr/lib/python3.6/site-packages/yajsonrpc/betterAsyncore.py:256: size=7813 KiB (+7505 KiB), count=125015 (+120083), average=64 B
/usr/lib64/python3.6/asyncore.py:173: size=6934 KiB (+6735 KiB), count=110941 (+107755), average=64 B
/usr/lib/python3.6/site-packages/vdsm/virt/vmchannels.py:163: size=5984 KiB (+5631 KiB), count=95744 (+90103), average=64 B

Top block
5511282 memory blocks: 302589.8 KiB
  File "/usr/lib64/python3.6/site-packages/libvirt.py", line 442
    ret = libvirtmod.virEventRunDefaultImpl()
  File "/usr/lib/python3.6/site-packages/vdsm/common/libvirtconnection.py", line 69
    libvirt.virEventRunDefaultImpl()
  File "/usr/lib/python3.6/site-packages/vdsm/common/concurrent.py", line 260
    ret = func(*args, **kwargs)
  File "/usr/lib64/python3.6/threading.py", line 885
    self._target(*self._args, **self._kwargs)
  File "/usr/lib64/python3.6/threading.py", line 937
    self.run()
  File "/usr/lib64/python3.6/threading.py", line 905
    self._bootstrap_inner()


-- 
Chris Adams 
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/PPMN35ROLDKDTLD43NJBDZLPH5Z64XFH/


[ovirt-users] Re: Upgraded to oVirt 4.4.9, still have vdsmd memory leak

2021-12-06 Thread Victor Stinner
Hi,

I like to compute differences between two snapshots (diff), rather
than looking at a single snapshot:
https://docs.python.org/dev/library/tracemalloc.html#compute-differences

You can modify your signal handler to write the snapshot into a file
using pickle.dump():
https://github.com/vstinner/tracemallocqt#usage

Then use pickle.load() to reload snapshots from files. You can take
multiple snapshots and compare snapshot 1 with snapshot 2, compare 1
with 3, etc. If there is a major memory increase between two
snapshots, I expect a significant difference between these two
snapshots.
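
In the signal handler, the dump side can be as small as this (a sketch;
the /tmp file name is just an example):

---
import pickle
import time
import tracemalloc

def dump_snapshot():
    # Write the snapshot to disk instead of logging it, so it can be
    # compared with other snapshots later, outside the process.
    snapshot = tracemalloc.take_snapshot()
    name = '/tmp/tracemalloc-%d.pickle' % time.time()
    with open(name, 'wb') as f:
        pickle.dump(snapshot, f, pickle.HIGHEST_PROTOCOL)

def compare(path1, path2):
    # Offline: reload two snapshots and print what grew between them.
    with open(path1, 'rb') as f:
        snap1 = pickle.load(f)
    with open(path2, 'rb') as f:
        snap2 = pickle.load(f)
    for stat in snap2.compare_to(snap1, 'lineno')[:10]:
        print(stat)
---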

You can configure tracemalloc to decide how many frames per traceback
are stored. See -X tracemalloc=NFRAME, PYTHONTRACEMALLOC=NFRAME and
start() argument:
https://docs.python.org/dev/library/tracemalloc.html#tracemalloc.start
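
For example, to store 25 frames per traceback (the deeper the traceback,
the more memory tracemalloc itself needs):

---
# Any of these is equivalent:
#   python3 -X tracemalloc=25 app.py
#   PYTHONTRACEMALLOC=25 python3 app.py   (e.g. set via systemd)
import tracemalloc
tracemalloc.start(25)  # from code, before the allocations of interest
---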

tracemalloc only "sees" memory allocations made by Python. You can get
the "current size size of memory blocks traced by the tracemalloc
module" with:
https://docs.python.org/dev/library/tracemalloc.html#tracemalloc.get_traced_memory

Note: tracemalloc itself consumes a lot of memory, which can explain
why your application uses more RSS memory when tracemalloc is used.

If there is a huge difference between the RSS memory increase and what
tracemalloc sees (ex: RSS: +100 MB, tracemalloc: +1 MB), maybe you
should use another tool working at the malloc/free level, like
Valgrind.
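
Both numbers can be read from inside the process for a rough comparison
(a sketch; on Linux ru_maxrss is in KiB and is a peak value, not the
current RSS):

---
import resource
import tracemalloc

traced, traced_peak = tracemalloc.get_traced_memory()
rss_peak_kib = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
print("tracemalloc: current=%.1f MiB, peak=%.1f MiB"
      % (traced / 2**20, traced_peak / 2**20))
print("peak RSS: %.1f MiB" % (rss_peak_kib / 1024))
---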

Victor
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/FXZ3LFGSSD4OBLKDGMR5TDKSHZYLHFL2/


[ovirt-users] Re: Upgraded to oVirt 4.4.9, still have vdsmd memory leak

2021-12-03 Thread Chris Adams
Once upon a time, Victor Stinner  said:
> I wrote the tracemalloc module which is easy to use on Python 3.4 and
> newer. If you take tracemalloc snapshots while the memory usage is
> growing, and comparing snapshots doesn't show anything obvious, you can
> maybe suspect memory fragmentation. You're talking about 4 GB of
> memory usage; I don't think that memory fragmentation can explain it.
> Do you need my help to use tracemalloc?

My python is rudimentary at best (my programming has all been in other
languages), but here's what I tried for starters: I added a USR2 signal
handler to log the top memory users, but it doesn't show anything
growing the way the RSS actually is.

I made the following change:

--- /usr/lib/python3.6/site-packages/vdsm/vdsmd.py.dist~  2021-10-25 11:27:46.0 -0500
+++ /usr/lib/python3.6/site-packages/vdsm/vdsmd.py  2021-12-02 13:08:46.0 -0600
@@ -29,6 +29,7 @@
 import syslog
 import resource
 import tempfile
+import tracemalloc
 from logging import config as lconfig
 
 from vdsm import constants
@@ -82,6 +83,14 @@
             irs.spmStop(
                 irs.getConnectedStoragePoolsList()['poollist'][0])
 
+    def sigusr2Handler(signum, frame):
+        snapshot = tracemalloc.take_snapshot()
+        top_stats = snapshot.statistics('lineno')
+        lentry = 'Top memory users:\n'
+        for stat in top_stats[:10]:
+            lentry += '' + str(stat) + '\n'
+        log.info(lentry)
+
     def sigalrmHandler(signum, frame):
         # Used in panic.panic() when shuting down logging, must not log.
         raise RuntimeError("Alarm timeout")
@@ -89,6 +98,7 @@
     sigutils.register()
     signal.signal(signal.SIGTERM, sigtermHandler)
     signal.signal(signal.SIGUSR1, sigusr1Handler)
+    signal.signal(signal.SIGUSR2, sigusr2Handler)
     signal.signal(signal.SIGALRM, sigalrmHandler)
     zombiereaper.registerSignalHandler()
 

I also set a systemd override on vdsmd.service to add
PYTHONTRACEMALLOC=25 to the environment.  That gets log entries like this:

2021-12-03 07:30:37,244-0600 INFO  (MainThread) [vds] Top memory users:
/usr/lib64/python3.6/site-packages/libvirt.py:442: size=34.0 MiB, count=630128, average=57 B
:487: size=16.5 MiB, count=191152, average=90 B
/usr/lib64/python3.6/json/decoder.py:355: size=14.6 MiB, count=142411, average=108 B
/usr/lib/python3.6/site-packages/vdsm/host/stats.py:138: size=3678 KiB, count=22428, average=168 B
:219: size=2027 KiB, count=17555, average=118 B
/usr/lib/python3.6/site-packages/vdsm/api/vdsmapi.py:143: size=1724 KiB, count=23388, average=75 B
/usr/lib/python3.6/site-packages/vdsm/virt/vmchannels.py:163: size=1502 KiB, count=24039, average=64 B
/usr/lib64/python3.6/linecache.py:137: size=1383 KiB, count=13404, average=106 B
/usr/lib/python3.6/site-packages/vdsm/utils.py:358: size=1305 KiB, count=8587, average=156 B
/usr/lib64/python3.6/functools.py:67: size=1134 KiB, count=9624, average=121 B
 (vdsmd:92)


But at the time I generated that, the RSS was over 340MB.
Interestingly, when I sent the signal, the RSS jumped to over 430MB (but
maybe my change did that?).
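
(For reference, the systemd override mentioned above is just a drop-in
file like the following; the file name is arbitrary:)

# /etc/systemd/system/vdsmd.service.d/tracemalloc.conf
[Service]
Environment=PYTHONTRACEMALLOC=25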

-- 
Chris Adams 
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/RYIDSSYRP4GHJTDK5ZY7PPL6XL7QZCFE/


[ovirt-users] Re: Upgraded to oVirt 4.4.9, still have vdsmd memory leak

2021-11-15 Thread Chris Adams
Once upon a time, Victor Stinner  said:
> I wrote the tracemalloc module which is easy to use on Python 3.4 and
> newer. If you take tracemalloc snapshots while the memory usage is
> growing, and comparing snapshots doesn't show anything obvious, you can
> maybe suspect memory fragmentation. You're talking about 4 GB of
> memory usage; I don't think that memory fragmentation can explain it.
> Do you need my help to use tracemalloc?

Any tips on where I should add that to vdsm's code?

It'll probably take me a little time to get this going - I'm only seeing
this on my production cluster (of course), not my dev cluster.  One
difference is that prod uses iSCSI storage with multipath, while dev uses
Gluster (so that may be a clue to the source of the issue).
-- 
Chris Adams 
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/VENMRJEP2JW3F2R3RM53XY5PRR3VWYFX/


[ovirt-users] Re: Upgraded to oVirt 4.4.9, still have vdsmd memory leak

2021-11-15 Thread Victor Stinner
Hi,

I wrote the tracemalloc module which is easy to use on Python 3.4 and
newer. If you take tracemalloc snapshots while the memory usage is
growing, and comparing snapshots doesn't show anything obvious, you can
maybe suspect memory fragmentation. You're talking about 4 GB of
memory usage; I don't think that memory fragmentation can explain it.
Do you need my help to use tracemalloc?

Quick tutorial in the official documentation:
https://docs.python.org/dev/library/tracemalloc.html#compute-differences

Victor

On Fri, Nov 12, 2021 at 3:51 PM David Malcolm  wrote:
>
> On Fri, 2021-11-12 at 09:54 +0100, Sandro Bonazzola wrote:
> > On Fri, Nov 12, 2021 at 9:50 AM Sandro Bonazzola <sbona...@redhat.com> wrote:
> >
> > >
> > >
> > > On Fri, Nov 12, 2021 at 9:47 AM Sandro Bonazzola <sbona...@redhat.com> wrote:
> > >
> > > >
> > > >
> > > > On Wed, Nov 10, 2021 at 3:45 PM Chris Adams wrote:
> > > >
> > > > > I have seen vdsmd leak memory for years (I've been running
> > > > > oVirt since
> > > > > version 3.5), but never been able to nail it down.  I've
> > > > > upgraded a
> > > > > cluster to oVirt 4.4.9 (reloading the hosts with CentOS 8-
> > > > > stream), and I
> > > > > still see it happen.  One host in the cluster, which has been
> > > > > up 8 days,
> > > > > has vdsmd with 4.3 GB resident memory.  On a couple of other
> > > > > hosts, it's
> > > > > around half a gigabyte.
> > > > >
> > > > > In the past, it seemed more likely to happen on the hosted
> > > > > engine hosts
> > > > > and/or the SPM host... but the host with the 4.3 GB vdsmd is
> > > > > not either
> > > > > of those.
> > > > >
> > > > > I'm not sure what I do that would make my setup "special"
> > > > > compared to
> > > > > others; I loaded a pretty minimal install of CentOS 8-stream,
> > > > > with the
> > > > > only extra thing being I add the core parts of the Dell
> > > > > PowerEdge
> > > > > OpenManage tools (so I can get remote SNMP hardware
> > > > > monitoring).
> > > > >
> > > > > When I run "pmap $(pidof -x vdsmd)", the bulk of the RAM use is
> > > > > a single
> > > > > anonymous block (which I'm guessing is just the python general
> > > > > memory
> > > > > allocator).
> > > > >
> > > > > I thought maybe the switch to CentOS 8 and python 3 might clear
> > > > > something up, but obviously not.  Any ideas?
> > > > >
> > > >
> > > > I guess we still have the reproducibility issue (
> > > > https://lists.ovirt.org/archives/list/de...@ovirt.org/thread/KO5SEPAZMLBWSBS6OJZ73YVPLHIAFOLV/
> > > > ).
> > > > But maybe in the meanwhile there's a new way to track things
> > > > down. +Marcin
> > > > Sobczyk  ?
> > > >
> > > >
> > > >
> > > Perhaps https://docs.python.org/3.6/library/tracemalloc.html ?
> > >
> >
> > +David Malcolm  I saw your slides on Python memory
> > leak debugging; maybe you can give some suggestions here.
>
> I haven't worked on Python itself in > 8 years, so my knowledge is
> out-of-date here.
>
> Adding in Victor Stinner, who has worked on the CPython memory
> allocators more recently, and, in particular, implemented the
> tracemalloc library linked to above.
>
> Dave
>
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/XQSW5KABQG4VNGTYS7T66JETMFWCTNSF/


[ovirt-users] Re: Upgraded to oVirt 4.4.9, still have vdsmd memory leak

2021-11-14 Thread Nir Soffer
On Wed, Nov 10, 2021 at 4:46 PM Chris Adams  wrote:
>
> I have seen vdsmd leak memory for years (I've been running oVirt since
> version 3.5), but never been able to nail it down.  I've upgraded a
> cluster to oVirt 4.4.9 (reloading the hosts with CentOS 8-stream), and I
> still see it happen.  One host in the cluster, which has been up 8 days,
> has vdsmd with 4.3 GB resident memory.  On a couple of other hosts, it's
> around half a gigabyte.

Can you share vdsm logs from the time vdsm started?

We have these logs:

2021-11-14 15:16:32,956+0200 DEBUG (health) [health] Checking health (health:93)
2021-11-14 15:16:32,977+0200 DEBUG (health) [health] Collected 5001 objects (health:101)
2021-11-14 15:16:32,977+0200 DEBUG (health) [health] user=2.46%, sys=0.74%, rss=108068 kB (-376), threads=47 (health:126)
2021-11-14 15:16:32,977+0200 INFO  (health) [health] LVM cache hit ratio: 97.64% (hits: 5431 misses: 131) (health:131)

They may provide useful info on the leak.

You need to enable DEBUG logging for the root logger in /etc/vdsm/logger.conf:

[logger_root]
level=DEBUG
handlers=syslog,logthread
propagate=0

and restart the vdsmd service.

Nir
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/JDA34CQF5FTHVFTRXF4OGKEFJIKJL3NL/


[ovirt-users] Re: Upgraded to oVirt 4.4.9, still have vdsmd memory leak

2021-11-12 Thread David Malcolm
On Fri, 2021-11-12 at 09:54 +0100, Sandro Bonazzola wrote:
> On Fri, Nov 12, 2021 at 9:50 AM Sandro Bonazzola <sbona...@redhat.com> wrote:
> 
> > 
> > 
> > On Fri, Nov 12, 2021 at 9:47 AM Sandro Bonazzola <sbona...@redhat.com> wrote:
> > 
> > > 
> > > 
> > > On Wed, Nov 10, 2021 at 3:45 PM Chris Adams wrote:
> > > 
> > > > I have seen vdsmd leak memory for years (I've been running
> > > > oVirt since
> > > > version 3.5), but never been able to nail it down.  I've
> > > > upgraded a
> > > > cluster to oVirt 4.4.9 (reloading the hosts with CentOS 8-
> > > > stream), and I
> > > > still see it happen.  One host in the cluster, which has been
> > > > up 8 days,
> > > > has vdsmd with 4.3 GB resident memory.  On a couple of other
> > > > hosts, it's
> > > > around half a gigabyte.
> > > > 
> > > > In the past, it seemed more likely to happen on the hosted
> > > > engine hosts
> > > > and/or the SPM host... but the host with the 4.3 GB vdsmd is
> > > > not either
> > > > of those.
> > > > 
> > > > I'm not sure what I do that would make my setup "special"
> > > > compared to
> > > > others; I loaded a pretty minimal install of CentOS 8-stream,
> > > > with the
> > > > only extra thing being I add the core parts of the Dell
> > > > PowerEdge
> > > > OpenManage tools (so I can get remote SNMP hardware
> > > > monitoring).
> > > > 
> > > > When I run "pmap $(pidof -x vdsmd)", the bulk of the RAM use is
> > > > a single
> > > > anonymous block (which I'm guessing is just the python general
> > > > memory
> > > > allocator).
> > > > 
> > > > I thought maybe the switch to CentOS 8 and python 3 might clear
> > > > something up, but obviously not.  Any ideas?
> > > > 
> > > 
> > > I guess we still have the reproducibility issue (
> > > https://lists.ovirt.org/archives/list/de...@ovirt.org/thread/KO5SEPAZMLBWSBS6OJZ73YVPLHIAFOLV/
> > > ).
> > > But maybe in the meanwhile there's a new way to track things
> > > down. +Marcin
> > > Sobczyk  ?
> > > 
> > > 
> > > 
> > Perhaps https://docs.python.org/3.6/library/tracemalloc.html ?
> > 
> 
> +David Malcolm  I saw your slides on Python memory
> leak debugging; maybe you can give some suggestions here.

I haven't worked on Python itself in > 8 years, so my knowledge is
out-of-date here.

Adding in Victor Stinner, who has worked on the CPython memory
allocators more recently, and, in particular, implemented the
tracemalloc library linked to above.

Dave
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/GAASNRR7WWCVJTKFKZIJM6MYLAI6VPGU/


[ovirt-users] Re: Upgraded to oVirt 4.4.9, still have vdsmd memory leak

2021-11-12 Thread Sandro Bonazzola
On Fri, Nov 12, 2021 at 9:50 AM Sandro Bonazzola <sbona...@redhat.com> wrote:

>
>
> On Fri, Nov 12, 2021 at 9:47 AM Sandro Bonazzola <sbona...@redhat.com> wrote:
>
>>
>>
>> On Wed, Nov 10, 2021 at 3:45 PM Chris Adams wrote:
>>
>>> I have seen vdsmd leak memory for years (I've been running oVirt since
>>> version 3.5), but never been able to nail it down.  I've upgraded a
>>> cluster to oVirt 4.4.9 (reloading the hosts with CentOS 8-stream), and I
>>> still see it happen.  One host in the cluster, which has been up 8 days,
>>> has vdsmd with 4.3 GB resident memory.  On a couple of other hosts, it's
>>> around half a gigabyte.
>>>
>>> In the past, it seemed more likely to happen on the hosted engine hosts
>>> and/or the SPM host... but the host with the 4.3 GB vdsmd is not either
>>> of those.
>>>
>>> I'm not sure what I do that would make my setup "special" compared to
>>> others; I loaded a pretty minimal install of CentOS 8-stream, with the
>>> only extra thing being I add the core parts of the Dell PowerEdge
>>> OpenManage tools (so I can get remote SNMP hardware monitoring).
>>>
>>> When I run "pmap $(pidof -x vdsmd)", the bulk of the RAM use is a single
>>> anonymous block (which I'm guessing is just the python general memory
>>> allocator).
>>>
>>> I thought maybe the switch to CentOS 8 and python 3 might clear
>>> something up, but obviously not.  Any ideas?
>>>
>>
>> I guess we still have the reproducibility issue (
>> https://lists.ovirt.org/archives/list/de...@ovirt.org/thread/KO5SEPAZMLBWSBS6OJZ73YVPLHIAFOLV/
>> ).
>> But maybe in the meanwhile there's a new way to track things down. +Marcin
>> Sobczyk  ?
>>
>>
>>
> Perhaps https://docs.python.org/3.6/library/tracemalloc.html ?
>

+David Malcolm  I saw your slides on Python memory
leak debugging; maybe you can give some suggestions here.


>
>
>
>>
>>
>>> --
>>> Chris Adams 
>>> ___
>>> Users mailing list -- users@ovirt.org
>>> To unsubscribe send an email to users-le...@ovirt.org
>>> Privacy Statement: https://www.ovirt.org/privacy-policy.html
>>> oVirt Code of Conduct:
>>> https://www.ovirt.org/community/about/community-guidelines/
>>> List Archives:
>>> https://lists.ovirt.org/archives/list/users@ovirt.org/message/3PTE35WMIVGLV2W47YVQUHCVOI6LGIPM/
>>>
>>
>>
>> --
>>
>> Sandro Bonazzola
>>
>> MANAGER, SOFTWARE ENGINEERING, EMEA R&D RHV
>>
>> Red Hat EMEA 
>>
>> sbona...@redhat.com
>> 
>>
>> *Red Hat respects your work life balance. Therefore there is no need to
>> answer this email out of your office hours.*
>>
>>
>>
>
> --
>
> Sandro Bonazzola
>
> MANAGER, SOFTWARE ENGINEERING, EMEA R&D RHV
>
> Red Hat EMEA 
>
> sbona...@redhat.com
> 
>
> *Red Hat respects your work life balance. Therefore there is no need to
> answer this email out of your office hours.*
>
>
>

-- 

Sandro Bonazzola

MANAGER, SOFTWARE ENGINEERING, EMEA R&D RHV

Red Hat EMEA 

sbona...@redhat.com


*Red Hat respects your work life balance. Therefore there is no need to
answer this email out of your office hours.*
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/IZIL7DCUJW6IXNKDH32YGPRRQY5Q5HXC/


[ovirt-users] Re: Upgraded to oVirt 4.4.9, still have vdsmd memory leak

2021-11-12 Thread Sandro Bonazzola
On Fri, Nov 12, 2021 at 9:47 AM Sandro Bonazzola <sbona...@redhat.com> wrote:

>
>
> On Wed, Nov 10, 2021 at 3:45 PM Chris Adams wrote:
>
>> I have seen vdsmd leak memory for years (I've been running oVirt since
>> version 3.5), but never been able to nail it down.  I've upgraded a
>> cluster to oVirt 4.4.9 (reloading the hosts with CentOS 8-stream), and I
>> still see it happen.  One host in the cluster, which has been up 8 days,
>> has vdsmd with 4.3 GB resident memory.  On a couple of other hosts, it's
>> around half a gigabyte.
>>
>> In the past, it seemed more likely to happen on the hosted engine hosts
>> and/or the SPM host... but the host with the 4.3 GB vdsmd is not either
>> of those.
>>
>> I'm not sure what I do that would make my setup "special" compared to
>> others; I loaded a pretty minimal install of CentOS 8-stream, with the
>> only extra thing being I add the core parts of the Dell PowerEdge
>> OpenManage tools (so I can get remote SNMP hardware monitoring).
>>
>> When I run "pmap $(pidof -x vdsmd)", the bulk of the RAM use is a single
>> anonymous block (which I'm guessing is just the python general memory
>> allocator).
>>
>> I thought maybe the switch to CentOS 8 and python 3 might clear
>> something up, but obviously not.  Any ideas?
>>
>
> I guess we still have the reproducibility issue (
> https://lists.ovirt.org/archives/list/de...@ovirt.org/thread/KO5SEPAZMLBWSBS6OJZ73YVPLHIAFOLV/
> ).
> But maybe in the meanwhile there's a new way to track things down. +Marcin
> Sobczyk  ?
>
>
>
Perhaps https://docs.python.org/3.6/library/tracemalloc.html ?



>
>
>> --
>> Chris Adams 
>> ___
>> Users mailing list -- users@ovirt.org
>> To unsubscribe send an email to users-le...@ovirt.org
>> Privacy Statement: https://www.ovirt.org/privacy-policy.html
>> oVirt Code of Conduct:
>> https://www.ovirt.org/community/about/community-guidelines/
>> List Archives:
>> https://lists.ovirt.org/archives/list/users@ovirt.org/message/3PTE35WMIVGLV2W47YVQUHCVOI6LGIPM/
>>
>
>
> --
>
> Sandro Bonazzola
>
> MANAGER, SOFTWARE ENGINEERING, EMEA R&D RHV
>
> Red Hat EMEA 
>
> sbona...@redhat.com
> 
>
> *Red Hat respects your work life balance. Therefore there is no need to
> answer this email out of your office hours.*
>
>
>

-- 

Sandro Bonazzola

MANAGER, SOFTWARE ENGINEERING, EMEA R&D RHV

Red Hat EMEA 

sbona...@redhat.com


*Red Hat respects your work life balance. Therefore there is no need to
answer this email out of your office hours.*
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/PD5T57TRMEUZEX7YMR7A6XNV5NSGCMFB/


[ovirt-users] Re: Upgraded to oVirt 4.4.9, still have vdsmd memory leak

2021-11-12 Thread Sandro Bonazzola
On Wed, Nov 10, 2021 at 3:45 PM Chris Adams wrote:

> I have seen vdsmd leak memory for years (I've been running oVirt since
> version 3.5), but never been able to nail it down.  I've upgraded a
> cluster to oVirt 4.4.9 (reloading the hosts with CentOS 8-stream), and I
> still see it happen.  One host in the cluster, which has been up 8 days,
> has vdsmd with 4.3 GB resident memory.  On a couple of other hosts, it's
> around half a gigabyte.
>
> In the past, it seemed more likely to happen on the hosted engine hosts
> and/or the SPM host... but the host with the 4.3 GB vdsmd is not either
> of those.
>
> I'm not sure what I do that would make my setup "special" compared to
> others; I loaded a pretty minimal install of CentOS 8-stream, with the
> only extra thing being I add the core parts of the Dell PowerEdge
> OpenManage tools (so I can get remote SNMP hardware monitoring).
>
> When I run "pmap $(pidof -x vdsmd)", the bulk of the RAM use is a single
> anonymous block (which I'm guessing is just the python general memory
> allocator).
>
> I thought maybe the switch to CentOS 8 and python 3 might clear
> something up, but obviously not.  Any ideas?
>

I guess we still have the reproducibility issue (
https://lists.ovirt.org/archives/list/de...@ovirt.org/thread/KO5SEPAZMLBWSBS6OJZ73YVPLHIAFOLV/
).
But maybe in the meanwhile there's a new way to track things down. +Marcin
Sobczyk  ?




> --
> Chris Adams 
> ___
> Users mailing list -- users@ovirt.org
> To unsubscribe send an email to users-le...@ovirt.org
> Privacy Statement: https://www.ovirt.org/privacy-policy.html
> oVirt Code of Conduct:
> https://www.ovirt.org/community/about/community-guidelines/
> List Archives:
> https://lists.ovirt.org/archives/list/users@ovirt.org/message/3PTE35WMIVGLV2W47YVQUHCVOI6LGIPM/
>


-- 

Sandro Bonazzola

MANAGER, SOFTWARE ENGINEERING, EMEA R&D RHV

Red Hat EMEA 

sbona...@redhat.com


*Red Hat respects your work life balance. Therefore there is no need to
answer this email out of your office hours.*
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/YWFMABWWTTSBN3HCJFFGTICS7V2WQD3G/