Re: [ovirt-users] vdsm memory consumption (ovirt 4.0)
On Wed, Aug 31, 2016 at 9:06 PM, Nir Soffer wrote:
> On Wed, Aug 31, 2016 at 8:06 PM, Federico Alberto Sayd wrote:
>> Hello Nir:
>>
>> I followed your instructions, added the config file, restarted vdsm, and
>> today I have the vdsm logs from a host:
>>
>> https://drive.google.com/file/d/0ByrwZ1AkYuyeR1hmRm90a1R6MEk/view?usp=sharing
>>
>> Please tell me if you see anything related to the memory issue.
>
> These logs start when vdsm is already using 567640 kB (554 MiB) - very unusual.
>
> The memory usage grew by 18 MiB during one day. No garbage collection
> issues. This smells like we keep some data forever for no reason.
>
>     $ grep rss= vdsm-leak.log | head -n 1
>     Thread-33::DEBUG::2016-08-30 12:01:43,845::health::122::health::(_check_resources)
>     user=1.73%, sys=1.65%, rss=567640 kB (+44), threads=57
>
>     $ grep rss= vdsm-leak.log | tail -n 1
>     Thread-33::DEBUG::2016-08-31 13:00:36,913::health::122::health::(_check_resources)
>     user=4.18%, sys=1.87%, rss=586584 kB (+0), threads=52
>
> I would like to see the logs since vdsm was started - do you have them?
>
> Also, can you describe the workload on this hypervisor?
>
> - how many vms are running at the same time
> - how many vms are started and stopped per hour
> - using default vdsm.conf? if not, please attach your conf

I could reproduce a similar leak on master - it seems that we leak about
1 MiB for each vm started and stopped. I opened this bug:

https://bugzilla.redhat.com/1372205

Please check if this bug matches your issue. If it does, please add your
logs and other info to this bug.

Thanks,
Nir
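For readers who want to chart the growth the health monitor reports, a
minimal sketch that extracts the rss samples and computes the rate (this
assumes the "rss=<N> kB" line format shown above, one entry per line;
vdsm-leak.log is just the file name used in this thread):

    #!/usr/bin/python
    # Estimate vdsm RSS growth from health-monitor log lines.
    # Assumes the "rss=<N> kB" format shown in this thread.
    import re
    import sys
    from datetime import datetime

    samples = []  # (timestamp, rss in kB)
    pattern = re.compile(
        r'::DEBUG::(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d+'
        r'::health.*rss=(\d+) kB')

    with open(sys.argv[1]) as f:  # e.g. vdsm-leak.log
        for line in f:
            m = pattern.search(line)
            if m:
                ts = datetime.strptime(m.group(1), '%Y-%m-%d %H:%M:%S')
                samples.append((ts, int(m.group(2))))

    if len(samples) > 1:
        (t0, r0), (t1, r1) = samples[0], samples[-1]
        hours = (t1 - t0).total_seconds() / 3600.0
        if hours > 0:
            print('rss grew %d kB over %.1f hours (%.0f kB/h)'
                  % (r1 - r0, hours, (r1 - r0) / hours))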
Re: [ovirt-users] vdsm memory consumption (ovirt 4.0)
On Wed, Aug 31, 2016 at 8:06 PM, Federico Alberto Sayd wrote:
> Hello Nir:
>
> I followed your instructions, added the config file, restarted vdsm, and
> today I have the vdsm logs from a host:
>
> https://drive.google.com/file/d/0ByrwZ1AkYuyeR1hmRm90a1R6MEk/view?usp=sharing
>
> Please tell me if you see anything related to the memory issue.

These logs start when vdsm is already using 567640 kB (554 MiB) - very unusual.

The memory usage grew by 18 MiB during one day. No garbage collection
issues. This smells like we keep some data forever for no reason.

    $ grep rss= vdsm-leak.log | head -n 1
    Thread-33::DEBUG::2016-08-30 12:01:43,845::health::122::health::(_check_resources)
    user=1.73%, sys=1.65%, rss=567640 kB (+44), threads=57

    $ grep rss= vdsm-leak.log | tail -n 1
    Thread-33::DEBUG::2016-08-31 13:00:36,913::health::122::health::(_check_resources)
    user=4.18%, sys=1.87%, rss=586584 kB (+0), threads=52

I would like to see the logs since vdsm was started - do you have them?

Also, can you describe the workload on this hypervisor?

- how many vms are running at the same time
- how many vms are started and stopped per hour
- using default vdsm.conf? if not, please attach your conf

Nir

> Thanks
>
> Federico
>
> On 30/08/16 at 03:47, Nir Soffer wrote:
>> On Tue, Aug 30, 2016 at 1:30 AM, Federico Alberto Sayd <fs...@uncu.edu.ar> wrote:
>>> I have issues with my ovirt setup related to memory consumption. After
>>> upgrading to 4.0 I noted considerable growth in vdsm memory
>>> consumption. I suspect that the growth is related to a memory leak.
>>
>> We need more details, see below...
>>
>>> When I boot up the system and activate the host, the memory
>>> consumption is about 600 MB. After 5 days running, with the host in
>>> maintenance mode, the memory consumption is about 1.4 GB.
>>>
>>> I need to put my hosts in maintenance and reboot to free memory.
>>
>> You can restart vdsm (systemctl restart vdsmd) instead; running vms
>> are not affected by this.
>>
>>> Can anyone help me to debug this problem?
>>
>> We had a memory leak in vdsm-4.18.5, fixed in vdsm-4.18.11. Since you
>> are running 4.18.11, there may be another leak.
>>
>> Please enable health monitoring by creating
>> /etc/vdsm/vdsm.conf.d/50-health.conf
>>
>>     [devel]
>>     health_monitor_enable = true
>>
>> and restart vdsm.
>>
>> Please run with this setting for a couple of hours, maybe one day,
>> and then share the vdsm logs from this timeframe.
>>
>> You may disable health monitoring by setting
>>
>>     [devel]
>>     health_monitor_enable = false
>>
>> or by renaming the configuration file out of the way, e.g. to
>>
>>     /etc/vdsm/vdsm.conf.d/50-health.conf.disabled
>>
>> Nir
Re: [ovirt-users] vdsm memory consumption (ovirt 4.0)
Hello Nir:

I followed your instructions, added the config file, restarted vdsm, and
today I have the vdsm logs from a host:

https://drive.google.com/file/d/0ByrwZ1AkYuyeR1hmRm90a1R6MEk/view?usp=sharing

Please tell me if you see anything related to the memory issue.

Thanks

Federico

On 30/08/16 at 03:47, Nir Soffer wrote:
> On Tue, Aug 30, 2016 at 1:30 AM, Federico Alberto Sayd wrote:
>> I have issues with my ovirt setup related to memory consumption. After
>> upgrading to 4.0 I noted considerable growth in vdsm memory
>> consumption. I suspect that the growth is related to a memory leak.
>
> We need more details, see below...
>
>> When I boot up the system and activate the host, the memory consumption
>> is about 600 MB. After 5 days running, with the host in maintenance
>> mode, the memory consumption is about 1.4 GB.
>>
>> I need to put my hosts in maintenance and reboot to free memory.
>
> You can restart vdsm (systemctl restart vdsmd) instead; running vms
> are not affected by this.
>
>> Can anyone help me to debug this problem?
>
> We had a memory leak in vdsm-4.18.5, fixed in vdsm-4.18.11. Since you
> are running 4.18.11, there may be another leak.
>
> Please enable health monitoring by creating
> /etc/vdsm/vdsm.conf.d/50-health.conf
>
>     [devel]
>     health_monitor_enable = true
>
> and restart vdsm.
>
> Please run with this setting for a couple of hours, maybe one day,
> and then share the vdsm logs from this timeframe.
>
> You may disable health monitoring by setting
>
>     [devel]
>     health_monitor_enable = false
>
> or by renaming the configuration file out of the way, e.g. to
>
>     /etc/vdsm/vdsm.conf.d/50-health.conf.disabled
>
> Nir
Re: [ovirt-users] vdsm memory consumption (ovirt 4.0)
On Tue, Aug 30, 2016 at 1:30 AM, Federico Alberto Sayd wrote:
> I have issues with my ovirt setup related to memory consumption. After
> upgrading to 4.0 I noted considerable growth in vdsm memory consumption.
> I suspect that the growth is related to a memory leak.

We need more details, see below...

> When I boot up the system and activate the host, the memory consumption
> is about 600 MB. After 5 days running, with the host in maintenance
> mode, the memory consumption is about 1.4 GB.
>
> I need to put my hosts in maintenance and reboot to free memory.

You can restart vdsm (systemctl restart vdsmd) instead; running vms
are not affected by this.

> Can anyone help me to debug this problem?

We had a memory leak in vdsm-4.18.5, fixed in vdsm-4.18.11. Since you
are running 4.18.11, there may be another leak.

Please enable health monitoring by creating
/etc/vdsm/vdsm.conf.d/50-health.conf

    [devel]
    health_monitor_enable = true

and restart vdsm.

Please run with this setting for a couple of hours, maybe one day,
and then share the vdsm logs from this timeframe.

You may disable health monitoring by setting

    [devel]
    health_monitor_enable = false

or by renaming the configuration file out of the way, e.g. to

    /etc/vdsm/vdsm.conf.d/50-health.conf.disabled

Nir
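Condensed into shell form, the steps above amount to the following (a
sketch: the config path and the [devel] key come from the mail above,
while the /var/log/vdsm/vdsm.log location is an assumption based on a
default vdsm install):

    # Enable the vdsm health monitor via a drop-in config, then restart vdsm.
    cat > /etc/vdsm/vdsm.conf.d/50-health.conf <<'EOF'
    [devel]
    health_monitor_enable = true
    EOF
    systemctl restart vdsmd

    # The monitor periodically logs a resource sample; watch the rss values:
    grep rss= /var/log/vdsm/vdsm.log | tail

    # To disable later, rename the drop-in out of the way and restart again:
    mv /etc/vdsm/vdsm.conf.d/50-health.conf \
       /etc/vdsm/vdsm.conf.d/50-health.conf.disabled
    systemctl restart vdsmd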
[ovirt-users] vdsm memory consumption (ovirt 4.0)
I have issues with my ovirt setup related to memory consumption. After
upgrading to 4.0 I noted considerable growth in vdsm memory consumption.
I suspect that the growth is related to a memory leak.

When I boot up the system and activate the host, the memory consumption
is about 600 MB. After 5 days running, with the host in maintenance mode,
the memory consumption is about 1.4 GB.

I need to put my hosts in maintenance and reboot to free memory.

Can anyone help me to debug this problem?

OS Version: RHEL - 7 - 2.1511.el7.centos.2.10
Kernel Version: 3.10.0 - 327.22.2.el7.x86_64
KVM Version: 2.3.0 - 31.el7.16.1
LIBVIRT Version: libvirt-1.2.17-13.el7_2.5
VDSM Version: vdsm-4.18.11-1.el7.centos

Thank you
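For tracking growth like this between reboots, independent of vdsm's own
health monitor, one can sample the daemon's resident set size from /proc.
A minimal sketch (picking the process with 'pgrep -o -f vdsmd' is an
assumption and may need adjusting per host):

    #!/usr/bin/python
    # Log vdsm's RSS once a minute so slow growth shows up over days.
    import subprocess
    import time

    def vdsm_rss_kb():
        # Oldest process matching 'vdsmd'; adjust the pattern if needed.
        pid = subprocess.check_output(
            ['pgrep', '-o', '-f', 'vdsmd']).split()[0]
        with open('/proc/%s/status' % pid) as f:
            for line in f:
                if line.startswith('VmRSS:'):
                    return int(line.split()[1])  # /proc reports kB

    while True:
        print('%s rss=%s kB' % (time.strftime('%Y-%m-%d %H:%M:%S'),
                                vdsm_rss_kb()))
        time.sleep(60)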
Re: [ovirt-users] VDSM memory consumption
Finally got a chance to implement this, so I am testing it on my centos7
hosts, and it looks good. I'll keep an eye on it for a couple of days,
but after a couple of hours there's no evidence of any leakage.

On Mar 30, 2015, at 4:14 PM, John Taylor <jtt77...@yahoo.com> wrote:
> Dan Kenigsberg <dan...@redhat.com> writes:
>
>> On Sat, Mar 28, 2015 at 10:20:25AM -0400, John Taylor wrote:
>>> Daniel Helgenberger <daniel.helgenber...@m-box.de> writes:
>>>
>>>> Hello Everyone,
>>>>
>>>> I did create the original BZ on this. In the meantime, the lab system
>>>> I used has been dismantled and the production system is yet to be
>>>> deployed.
>>>>
>>>> As I wrote in BZ 1147148 [1], I experienced two different issues: one
>>>> big mem leak of about 15 MiB/h and a smaller one, ~300 KiB/h. These
>>>> seem unrelated. The larger leak was indeed related to SSL in some
>>>> way, not necessarily M2Crypto. However, after disabling SSL this was
>>>> gone, leaving the smaller leak.
>>>>
>>>> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1147148
>>>
>>> I think there are, at least for the purpose of this discussion, 3
>>> leaks:
>>>
>>> 1. the M2Crypto leak
>>> 2. a slower leak
>>> 3. a large leak that's not M2Crypto related and that's part of
>>>    sampling
>>>
>>> My efforts have been around finding the source of my larger leak,
>>> which I think is #3. I had disabled ssl, so I knew that M2Crypto
>>> isn't/shouldn't be the problem as in bz 1147148, and ssl is beside the
>>> point as the leak happens with a deactivated host. It's part of
>>> sampling, which always runs.
>>>
>>> What I've found, after trying to get the smallest reproducer, is that
>>> it's not the netlink.iter_links that I commented on in [1] that is the
>>> problem. But in the _get_interfaces_and_samples loop there is the call
>>> to create an InterfaceSample, and that has getLinkSpeed(), which, for
>>> vlans, ends up calling ipwrapper.getLink, and that calls
>>> netlink.get_link(name).
>>>
>>> netlink.get_link(name) *is* the source of my big leak.
>>>
>>> This is vdsm 4.16.10, so the code is [2]; it has been changed in
>>> master as part of removing support for libnl v1, so it might not be a
>>> problem anymore.
>>>
>>>     def get_link(name):
>>>         """Returns the information dictionary of the name specified link."""
>>>         with _pool.socket() as sock:
>>>             with _nl_link_cache(sock) as cache:
>>>                 link = _rtnl_link_get_by_name(cache, name)
>>>                 if not link:
>>>                     raise IOError(errno.ENODEV,
>>>                                   '%s is not present in the system' % name)
>>>                 return _link_info(cache, link)
>>>
>>> The libnl documentation note at [3] says about the
>>> rtnl_link_get_by_name function:
>>>
>>>     Attention: The reference counter of the returned link object will
>>>     be incremented. Use rtnl_link_put() to release the reference.
>>>
>>> So I took that hint and made a change that does the rtnl_link_put() in
>>> get_link(name), and it looks like it works for me.
>>>
>>>     $ diff oldnetlink.py netlink.py
>>>     67d66
>>>     <                 return _link_info(cache, link)
>>>     68a68,70
>>>     >                 li = _link_info(cache, link)
>>>     >                 _rtnl_link_put(link)
>>>     >                 return li
>>>     333a336,337
>>>     >
>>>     > _rtnl_link_put = _none_proto(('rtnl_link_put', LIBNL_ROUTE))
>>>
>>> Hope that helps. And if someone else could confirm, that would be
>>> great.
>>>
>>> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1158108
>>> [2] https://gerrit.ovirt.org/gitweb?p=vdsm.git;a=blob;f=lib/vdsm/netlink.py;h=afae5cecb5ce701d00fb8f019ec92b3331a39036;hb=5608cfdf43db9186dabac4b2a779f9557e798968
>>> [3] http://www.infradead.org/~tgr/libnl/doc/api/group__link.html#ga1d583e4f0b43c89d854e5e681a529fad
>>
>> Thanks, John, for the great detective work. I'm afraid that even on the
>> master branch we keep calling rtnl_link_get_link() and
>> rtnl_link_get_by_name() without clearing the reference count, so a fix
>> is due there, too. Would you consider posting a fully-fledged fix to
>> gerrit?
>>
>> I still need to understand what the use of that refcount is, so that we
>> do not release it too early.
>>
>> Regards,
>> Dan.
>
> Dan, I'm happy to [1], although I've probably gotten something wrong
> with how it's supposed to be done :) It's for the version I'm using, so
> it's for branch ovirt-3.5.
>
> [1] https://gerrit.ovirt.org/#/c/39372/
>
> Thanks,
> -John
Re: [ovirt-users] VDSM memory consumption
On Sat, Mar 28, 2015 at 10:20:25AM -0400, John Taylor wrote:
> Daniel Helgenberger <daniel.helgenber...@m-box.de> writes:
>
>> Hello Everyone,
>>
>> I did create the original BZ on this. In the meantime, the lab system I
>> used has been dismantled and the production system is yet to be
>> deployed.
>>
>> As I wrote in BZ 1147148 [1], I experienced two different issues: one
>> big mem leak of about 15 MiB/h and a smaller one, ~300 KiB/h. These
>> seem unrelated. The larger leak was indeed related to SSL in some way,
>> not necessarily M2Crypto. However, after disabling SSL this was gone,
>> leaving the smaller leak.
>>
>> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1147148
>
> I think there are, at least for the purpose of this discussion, 3 leaks:
>
> 1. the M2Crypto leak
> 2. a slower leak
> 3. a large leak that's not M2Crypto related and that's part of sampling
>
> My efforts have been around finding the source of my larger leak, which
> I think is #3. I had disabled ssl, so I knew that M2Crypto
> isn't/shouldn't be the problem as in bz 1147148, and ssl is beside the
> point as the leak happens with a deactivated host. It's part of
> sampling, which always runs.
>
> What I've found, after trying to get the smallest reproducer, is that
> it's not the netlink.iter_links that I commented on in [1] that is the
> problem. But in the _get_interfaces_and_samples loop there is the call
> to create an InterfaceSample, and that has getLinkSpeed(), which, for
> vlans, ends up calling ipwrapper.getLink, and that calls
> netlink.get_link(name).
>
> netlink.get_link(name) *is* the source of my big leak.
>
> This is vdsm 4.16.10, so the code is [2]; it has been changed in master
> as part of removing support for libnl v1, so it might not be a problem
> anymore.
>
>     def get_link(name):
>         """Returns the information dictionary of the name specified link."""
>         with _pool.socket() as sock:
>             with _nl_link_cache(sock) as cache:
>                 link = _rtnl_link_get_by_name(cache, name)
>                 if not link:
>                     raise IOError(errno.ENODEV,
>                                   '%s is not present in the system' % name)
>                 return _link_info(cache, link)
>
> The libnl documentation note at [3] says about the rtnl_link_get_by_name
> function:
>
>     Attention: The reference counter of the returned link object will be
>     incremented. Use rtnl_link_put() to release the reference.
>
> So I took that hint and made a change that does the rtnl_link_put() in
> get_link(name), and it looks like it works for me.
>
>     $ diff oldnetlink.py netlink.py
>     67d66
>     <                 return _link_info(cache, link)
>     68a68,70
>     >                 li = _link_info(cache, link)
>     >                 _rtnl_link_put(link)
>     >                 return li
>     333a336,337
>     >
>     > _rtnl_link_put = _none_proto(('rtnl_link_put', LIBNL_ROUTE))
>
> Hope that helps. And if someone else could confirm, that would be great.
>
> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1158108
> [2] https://gerrit.ovirt.org/gitweb?p=vdsm.git;a=blob;f=lib/vdsm/netlink.py;h=afae5cecb5ce701d00fb8f019ec92b3331a39036;hb=5608cfdf43db9186dabac4b2a779f9557e798968
> [3] http://www.infradead.org/~tgr/libnl/doc/api/group__link.html#ga1d583e4f0b43c89d854e5e681a529fad

Thanks, John, for the great detective work. I'm afraid that even on the
master branch we keep calling rtnl_link_get_link() and
rtnl_link_get_by_name() without clearing the reference count, so a fix is
due there, too. Would you consider posting a fully-fledged fix to gerrit?

I still need to understand what the use of that refcount is, so that we do
not release it too early.

Regards,
Dan.
Re: [ovirt-users] VDSM memory consumption
On 26/03/15 18:12, Darrell Budic wrote:
> Yes, this script leaks quickly. Started out at an RSS of 21000ish, was
> already at 26744 a minute in, and about 5 minutes later it's at 39384
> and climbing. Been abusing a production server for those simple tests,
> but didn't want to run valgrind against it right this minute.
>
> Did run it against the test.py script above though, got this (fpaste.org
> didn't like it, too long maybe?):
>
> http://tower.onholyground.com/valgrind-test.log
>
> To comment on some other posts in this thread: I also see leaks on my
> test system, which is running Centos 6.6, but it only has 3 VMs across 2
> servers and 3 configured networks, and it leaks MUCH slower. I suspect
> people don't notice this on test systems because they don't have a lot
> of VMs/interfaces running, and don't leave them up for weeks at a time.
> That's why I was running these tests on my production box, to have more
> VMs up.

I don't think it's related directly to the number of VMs running. Maybe
indirectly, if it's related to the number of network interfaces (so vm
interfaces add to the leak). We've seen the leak on nodes under
maintenance...

G
Re: [ovirt-users] VDSM memory consumption
Just to clarify: I am also affected whatever the host is (el7 or el6), and
I have many vms running on a single host (up to 15) and many networks (up
to 10).

It is always the same: when vdsmd finishes taking up the totality of
memory, the host becomes unreachable and vms begin to migrate. The only
way to stop this is to restart vdsmd.

On 30/03/2015 15:40, Kapetanakis Giannis wrote:
> On 26/03/15 18:12, Darrell Budic wrote:
>> Yes, this script leaks quickly. Started out at an RSS of 21000ish, was
>> already at 26744 a minute in, and about 5 minutes later it's at 39384
>> and climbing. Been abusing a production server for those simple tests,
>> but didn't want to run valgrind against it right this minute.
>>
>> Did run it against the test.py script above though, got this
>> (fpaste.org didn't like it, too long maybe?):
>>
>> http://tower.onholyground.com/valgrind-test.log
>>
>> To comment on some other posts in this thread: I also see leaks on my
>> test system, which is running Centos 6.6, but it only has 3 VMs across
>> 2 servers and 3 configured networks, and it leaks MUCH slower. I
>> suspect people don't notice this on test systems because they don't
>> have a lot of VMs/interfaces running, and don't leave them up for weeks
>> at a time. That's why I was running these tests on my production box,
>> to have more VMs up.
>
> I don't think it's related directly to the number of VMs running. Maybe
> indirectly, if it's related to the number of network interfaces (so vm
> interfaces add to the leak). We've seen the leak on nodes under
> maintenance...
>
> G
Re: [ovirt-users] VDSM memory consumption
Dan Kenigsberg <dan...@redhat.com> writes:

> On Sat, Mar 28, 2015 at 10:20:25AM -0400, John Taylor wrote:
>> Daniel Helgenberger <daniel.helgenber...@m-box.de> writes:
>>
>>> Hello Everyone,
>>>
>>> I did create the original BZ on this. In the meantime, the lab system
>>> I used has been dismantled and the production system is yet to be
>>> deployed.
>>>
>>> As I wrote in BZ 1147148 [1], I experienced two different issues: one
>>> big mem leak of about 15 MiB/h and a smaller one, ~300 KiB/h. These
>>> seem unrelated. The larger leak was indeed related to SSL in some way,
>>> not necessarily M2Crypto. However, after disabling SSL this was gone,
>>> leaving the smaller leak.
>>>
>>> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1147148
>>
>> I think there are, at least for the purpose of this discussion, 3
>> leaks:
>>
>> 1. the M2Crypto leak
>> 2. a slower leak
>> 3. a large leak that's not M2Crypto related and that's part of sampling
>>
>> My efforts have been around finding the source of my larger leak, which
>> I think is #3. I had disabled ssl, so I knew that M2Crypto
>> isn't/shouldn't be the problem as in bz 1147148, and ssl is beside the
>> point as the leak happens with a deactivated host. It's part of
>> sampling, which always runs.
>>
>> What I've found, after trying to get the smallest reproducer, is that
>> it's not the netlink.iter_links that I commented on in [1] that is the
>> problem. But in the _get_interfaces_and_samples loop there is the call
>> to create an InterfaceSample, and that has getLinkSpeed(), which, for
>> vlans, ends up calling ipwrapper.getLink, and that calls
>> netlink.get_link(name).
>>
>> netlink.get_link(name) *is* the source of my big leak.
>>
>> This is vdsm 4.16.10, so the code is [2]; it has been changed in master
>> as part of removing support for libnl v1, so it might not be a problem
>> anymore.
>>
>>     def get_link(name):
>>         """Returns the information dictionary of the name specified link."""
>>         with _pool.socket() as sock:
>>             with _nl_link_cache(sock) as cache:
>>                 link = _rtnl_link_get_by_name(cache, name)
>>                 if not link:
>>                     raise IOError(errno.ENODEV,
>>                                   '%s is not present in the system' % name)
>>                 return _link_info(cache, link)
>>
>> The libnl documentation note at [3] says about the
>> rtnl_link_get_by_name function:
>>
>>     Attention: The reference counter of the returned link object will
>>     be incremented. Use rtnl_link_put() to release the reference.
>>
>> So I took that hint and made a change that does the rtnl_link_put() in
>> get_link(name), and it looks like it works for me.
>>
>>     $ diff oldnetlink.py netlink.py
>>     67d66
>>     <                 return _link_info(cache, link)
>>     68a68,70
>>     >                 li = _link_info(cache, link)
>>     >                 _rtnl_link_put(link)
>>     >                 return li
>>     333a336,337
>>     >
>>     > _rtnl_link_put = _none_proto(('rtnl_link_put', LIBNL_ROUTE))
>>
>> Hope that helps. And if someone else could confirm, that would be
>> great.
>>
>> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1158108
>> [2] https://gerrit.ovirt.org/gitweb?p=vdsm.git;a=blob;f=lib/vdsm/netlink.py;h=afae5cecb5ce701d00fb8f019ec92b3331a39036;hb=5608cfdf43db9186dabac4b2a779f9557e798968
>> [3] http://www.infradead.org/~tgr/libnl/doc/api/group__link.html#ga1d583e4f0b43c89d854e5e681a529fad
>
> Thanks, John, for the great detective work. I'm afraid that even on the
> master branch we keep calling rtnl_link_get_link() and
> rtnl_link_get_by_name() without clearing the reference count, so a fix
> is due there, too. Would you consider posting a fully-fledged fix to
> gerrit?
>
> I still need to understand what the use of that refcount is, so that we
> do not release it too early.
>
> Regards,
> Dan.

Dan, I'm happy to [1], although I've probably gotten something wrong with
how it's supposed to be done :) It's for the version I'm using, so it's
for branch ovirt-3.5.

[1] https://gerrit.ovirt.org/#/c/39372/

Thanks,
-John
Re: [ovirt-users] VDSM memory consumption
Daniel Helgenberger <daniel.helgenber...@m-box.de> writes:

> Hello Everyone,
>
> I did create the original BZ on this. In the meantime, the lab system I
> used has been dismantled and the production system is yet to be
> deployed.
>
> As I wrote in BZ 1147148 [1], I experienced two different issues: one
> big mem leak of about 15 MiB/h and a smaller one, ~300 KiB/h. These seem
> unrelated. The larger leak was indeed related to SSL in some way, not
> necessarily M2Crypto. However, after disabling SSL this was gone,
> leaving the smaller leak.
>
> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1147148

I think there are, at least for the purpose of this discussion, 3 leaks:

1. the M2Crypto leak
2. a slower leak
3. a large leak that's not M2Crypto related and that's part of sampling

My efforts have been around finding the source of my larger leak, which I
think is #3. I had disabled ssl, so I knew that M2Crypto isn't/shouldn't
be the problem as in bz 1147148, and ssl is beside the point as the leak
happens with a deactivated host. It's part of sampling, which always runs.

What I've found, after trying to get the smallest reproducer, is that it's
not the netlink.iter_links that I commented on in [1] that is the problem.
But in the _get_interfaces_and_samples loop there is the call to create an
InterfaceSample, and that has getLinkSpeed(), which, for vlans, ends up
calling ipwrapper.getLink, and that calls netlink.get_link(name).

netlink.get_link(name) *is* the source of my big leak.

This is vdsm 4.16.10, so the code is [2]; it has been changed in master as
part of removing support for libnl v1, so it might not be a problem
anymore.

    def get_link(name):
        """Returns the information dictionary of the name specified link."""
        with _pool.socket() as sock:
            with _nl_link_cache(sock) as cache:
                link = _rtnl_link_get_by_name(cache, name)
                if not link:
                    raise IOError(errno.ENODEV,
                                  '%s is not present in the system' % name)
                return _link_info(cache, link)

The libnl documentation note at [3] says about the rtnl_link_get_by_name
function:

    Attention: The reference counter of the returned link object will be
    incremented. Use rtnl_link_put() to release the reference.

So I took that hint and made a change that does the rtnl_link_put() in
get_link(name), and it looks like it works for me.

    $ diff oldnetlink.py netlink.py
    67d66
    <                 return _link_info(cache, link)
    68a68,70
    >                 li = _link_info(cache, link)
    >                 _rtnl_link_put(link)
    >                 return li
    333a336,337
    >
    > _rtnl_link_put = _none_proto(('rtnl_link_put', LIBNL_ROUTE))

Hope that helps. And if someone else could confirm, that would be great.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1158108
[2] https://gerrit.ovirt.org/gitweb?p=vdsm.git;a=blob;f=lib/vdsm/netlink.py;h=afae5cecb5ce701d00fb8f019ec92b3331a39036;hb=5608cfdf43db9186dabac4b2a779f9557e798968
[3] http://www.infradead.org/~tgr/libnl/doc/api/group__link.html#ga1d583e4f0b43c89d854e5e681a529fad

-John
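Putting John's diff back into context, the patched function would look
roughly like this (a sketch reconstructed from the diff above, not the
literal patch; _pool, _nl_link_cache, _rtnl_link_get_by_name, _link_info,
_none_proto and LIBNL_ROUTE are the pre-existing helpers in vdsm's
lib/vdsm/netlink.py):

    # Reconstructed from the diff in John's mail above.
    _rtnl_link_put = _none_proto(('rtnl_link_put', LIBNL_ROUTE))

    def get_link(name):
        """Returns the information dictionary of the name specified link."""
        with _pool.socket() as sock:
            with _nl_link_cache(sock) as cache:
                link = _rtnl_link_get_by_name(cache, name)
                if not link:
                    raise IOError(errno.ENODEV,
                                  '%s is not present in the system' % name)
                # rtnl_link_get_by_name() increments the returned link's
                # reference counter; drop our reference once the info dict
                # is built, otherwise every call leaks one link object.
                li = _link_info(cache, link)
                _rtnl_link_put(link)
                return li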
Re: [ovirt-users] VDSM memory consumption
On Wed, Mar 25, 2015 at 01:29:25PM -0500, Darrell Budic wrote:
> On Mar 25, 2015, at 5:34 AM, Dan Kenigsberg <dan...@redhat.com> wrote:
>> On Tue, Mar 24, 2015 at 02:01:40PM -0500, Darrell Budic wrote:
>>> On Mar 24, 2015, at 4:33 AM, Dan Kenigsberg <dan...@redhat.com> wrote:
>>>> On Mon, Mar 23, 2015 at 04:00:14PM -0400, John Taylor wrote:
>>>>> Chris Adams <c...@cmadams.net> writes:
>>>>>> Once upon a time, Sven Kieske <s.kie...@mittwald.de> said:
>>>>>>> On 13/03/15 12:29, Kapetanakis Giannis wrote:
>>>>>>>> We also face this problem since 3.5 in two different
>>>>>>>> installations... Hope it's fixed soon
>>>>>>>
>>>>>>> Nothing will get fixed if no one bothers to open BZs and send
>>>>>>> relevant log files to help track down the problems.
>>>>>>
>>>>>> There's already an open BZ:
>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1158108
>>>>>>
>>>>>> I'm not sure if that is exactly the same problem I'm seeing or not;
>>>>>> my vdsm process seems to be growing faster (RSS grew 952K in a
>>>>>> 5 minute period just now; VSZ didn't change).
>>>>>
>>>>> For those following this I've added a comment on the bz [1],
>>>>> although in my case the memory leak is, like Chris Adams's, a lot
>>>>> more than the 300 KiB/h in the original bug report by Daniel
>>>>> Helgenberger.
>>>>>
>>>>> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1158108
>>>>
>>>> That's interesting (and worrying). Could you check your suggestion by
>>>> editing sampling.py so that _get_interfaces_and_samples() returns the
>>>> empty dict immediately? Would this make the leak disappear?
>>>
>>> Looks like you've got something there. Just a quick test for now,
>>> watching RSS in top. I'll let it go this way for a while and see what
>>> it looks like in a few hours.
>>>
>>> System 1: 13 VMs w/ 24 interfaces between them
>>>
>>> 11:47 killed a vdsm @ 9.116G RSS (after maybe a week and a half
>>> running)
>>> 11:47: 97xxx
>>> 11:57: 135544 and climbing
>>> 12:00: 136400 restarted with sampling.py modified to just return an
>>> empty dict:
>>>
>>>     def _get_interfaces_and_samples():
>>>         links_and_samples = {}
>>>         return links_and_samples
>>
>> Thanks for the input. Just to be a little more certain that the culprit
>> is _get_interfaces_and_samples() per se, would you please decorate it
>> with memoized, and add a log line at the end:
>>
>>     @utils.memoized  # add this line
>>     def _get_interfaces_and_samples():
>>         ...
>>         logging.debug('LINKS %s', links_and_samples)  # and this line
>>         return links_and_samples
>>
>> I'd like to see what happens when the function is run only once, and
>> returns a non-empty, reasonable dictionary of links and samples.
>
> Looks similar, I modified my second server for this test:

Thanks again. Would you be so kind as to search further? Does the
following script leak anything on your host, when placed in your
/usr/share/vdsm:

    #!/usr/bin/python
    from time import sleep
    from virt.sampling import _get_interfaces_and_samples

    while True:
        _get_interfaces_and_samples()
        sleep(0.2)

Something that can be a bit harder would be to:

    # service vdsmd stop
    # su - vdsm -s /bin/bash
    # cd /usr/share/vdsm
    # valgrind --leak-check=full --log-file=/tmp/your.log vdsm

as suggested by Thomas on
https://bugzilla.redhat.com/show_bug.cgi?id=1158108#c6

Regards,
Dan.
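To point valgrind at the standalone reproducer rather than at all of vdsm,
one could also run it directly (a sketch; test.py here is Dan's loop above
saved under /usr/share/vdsm, and note that valgrind will mainly expose
native-code leaks such as libnl's, not pure-Python reference leaks):

    # cd /usr/share/vdsm
    # valgrind --leak-check=full --log-file=/tmp/test-leak.log python test.py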
Re: [ovirt-users] VDSM memory consumption
Hello Everyone,

I did create the original BZ on this. In the meantime, the lab system I
used has been dismantled and the production system is yet to be deployed.

As I wrote in BZ 1147148 [1], I experienced two different issues: one big
mem leak of about 15 MiB/h and a smaller one, ~300 KiB/h. These seem
unrelated. The larger leak was indeed related to SSL in some way, not
necessarily M2Crypto. However, after disabling SSL this was gone, leaving
the smaller leak.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1147148

On Mo, 2015-03-09 at 23:49 +0100, Matt . wrote:
> Hi,
>
> I also see this on the latest 3.5 version, I'm thinking about setting up
> a cronjob to restart vdsm every night.

I did the same thing. In general, it seems to be a bad idea, as it
compromised system stability in the long run. While VMs seem to be fine,
engine does not like this very much.

> I cannot believe that people say they don't have this issue.

This was hard for me to accept as well. I know of Markus Stockhausen and
Sven Kieske; both confirmed the small leak. This might also be some other
special service, though I started out with a minimal install of Centos 6.

> Can someone of the devs dive in maybe ?
>
> Thanks!
>
> Matt
>
> 2015-03-09 23:29 GMT+01:00 Dan Kenigsberg <dan...@redhat.com>:
>> On Mon, Mar 09, 2015 at 10:40:51AM -0500, Darrell Budic wrote:
>>> On Mar 9, 2015, at 4:51 AM, Dan Kenigsberg <dan...@redhat.com> wrote:
>>>> On Fri, Mar 06, 2015 at 10:58:53AM -0600, Darrell Budic wrote:
>>>>> I believe the supervdsm leak was fixed, but 3.5.1 versions of vdsmd
>>>>> still leak slowly, ~300k/hr, yes.
>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1158108
>>>>>
>>>>> On Mar 6, 2015, at 10:23 AM, Chris Adams <c...@cmadams.net> wrote:
>>>>>> Once upon a time, Federico Alberto Sayd <fs...@uncu.edu.ar> said:
>>>>>>> I am experiencing troubles with VDSM memory consumption. I am
>>>>>>> running:
>>>>>>>
>>>>>>> Engine: ovirt 3.5.1
>>>>>>> Nodes: Centos 6.6
>>>>>>> VDSM: 4.16.10-8
>>>>>>> Libvirt: libvirt-0.10.2-46
>>>>>>> Kernel: 2.6.32
>>>>>>>
>>>>>>> When the host boots, memory consumption is normal, but after 2 or
>>>>>>> 3 days running, VDSM memory consumption grows and it consumes more
>>>>>>> memory than all vms running in the host. If I restart the vdsm
>>>>>>> service, memory consumption normalizes, but then it starts growing
>>>>>>> again.
>>>>>>>
>>>>>>> I have seen some BZs about vdsm and supervdsm memory leaks, but I
>>>>>>> don't know if VDSM 4.16.10-8 is still affected by a related bug.
>>>>>>
>>>>>> Can't help, but I see the same thing with CentOS 7 nodes and the
>>>>>> same version of vdsm.
>>>>>> --
>>>>>> Chris Adams <c...@cmadams.net>
>>>>
>>>> I'm afraid that we are yet to find a solution for this issue, which
>>>> is completely different from the horrible leak of supervdsm 4.16.7.
>>>>
>>>> Could you corroborate the claim of Bug 1147148 - M2Crypto usage in
>>>> vdsm leaks memory? Does the leak disappear once you start using
>>>> plaintext transport?
>>>>
>>>> Regards,
>>>> Dan.
>>>
>>> I don't think this is crypto related, but I could try that if you
>>> still need some confirmation (and point me at a quick doc on switching
>>> to plaintext?).
>>>
>>> This is from #ovirt around November 18th I think, Saggi thought he'd
>>> found something related:
>>>
>>>     9:58:43 AM saggi: YamakasY: Found the leak
>>>     9:58:48 AM saggi: YamakasY: Or at least the flow
>>>     9:58:57 AM saggi: YamakasY: The good news is that I can reproduce
>>>     9:59:20 AM YamakasY: saggi: that's kewl!
>>>     9:59:25 AM YamakasY: saggi: what happens ?
>>>     9:59:41 AM YamakasY: I know from Telsin (ping ping!) that he sees
>>>     it going faster on gluster usage
>>>     10:01:54 AM saggi: YamakasY: it's in getCapabilities(). Here is
>>>     the RSS graph. The flatlines are when I stopped calling it and
>>>     called other verbs. http://i.imgur.com/CLm0Q75.png
>>
>> I do not recall what issue Saggi and YamakasY were discussing (CCing
>> the pair), or whether it reached fruition as a patch.
>>
>> It is certainly something other than Bug 1158108, as the latter speaks
>> about a leak in a normal working state, with no getCapabilities calls.

--
Daniel Helgenberger
Re: [ovirt-users] VDSM memory consumption
Hi Daniel,

Great! Thanks. I only see this issue happening on CentOS 7; Joop van de
Wege also confirmed he didn't see it on CentOS 6.

Cheers,

Matt

2015-03-26 13:33 GMT+01:00 Daniel Helgenberger <daniel.helgenber...@m-box.de>:
> Hello Everyone,
>
> I did create the original BZ on this. In the meantime, the lab system I
> used has been dismantled and the production system is yet to be
> deployed.
>
> As I wrote in BZ 1147148 [1], I experienced two different issues: one
> big mem leak of about 15 MiB/h and a smaller one, ~300 KiB/h. These seem
> unrelated. The larger leak was indeed related to SSL in some way, not
> necessarily M2Crypto. However, after disabling SSL this was gone,
> leaving the smaller leak.
>
> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1147148
>
> On Mo, 2015-03-09 at 23:49 +0100, Matt . wrote:
>> Hi,
>>
>> I also see this on the latest 3.5 version, I'm thinking about setting
>> up a cronjob to restart vdsm every night.
>
> I did the same thing. In general, it seems to be a bad idea, as it
> compromised system stability in the long run. While VMs seem to be fine,
> engine does not like this very much.
>
>> I cannot believe that people say they don't have this issue.
>
> This was hard for me to accept as well. I know of Markus Stockhausen and
> Sven Kieske; both confirmed the small leak. This might also be some
> other special service, though I started out with a minimal install of
> Centos 6.
>
>> Can someone of the devs dive in maybe ?
>>
>> Thanks!
>>
>> Matt
>>
>> 2015-03-09 23:29 GMT+01:00 Dan Kenigsberg <dan...@redhat.com>:
>>> On Mon, Mar 09, 2015 at 10:40:51AM -0500, Darrell Budic wrote:
>>>> On Mar 9, 2015, at 4:51 AM, Dan Kenigsberg <dan...@redhat.com> wrote:
>>>>> On Fri, Mar 06, 2015 at 10:58:53AM -0600, Darrell Budic wrote:
>>>>>> I believe the supervdsm leak was fixed, but 3.5.1 versions of vdsmd
>>>>>> still leak slowly, ~300k/hr, yes.
>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1158108
>>>>>>
>>>>>> On Mar 6, 2015, at 10:23 AM, Chris Adams <c...@cmadams.net> wrote:
>>>>>>> Once upon a time, Federico Alberto Sayd <fs...@uncu.edu.ar> said:
>>>>>>>> I am experiencing troubles with VDSM memory consumption. I am
>>>>>>>> running:
>>>>>>>>
>>>>>>>> Engine: ovirt 3.5.1
>>>>>>>> Nodes: Centos 6.6
>>>>>>>> VDSM: 4.16.10-8
>>>>>>>> Libvirt: libvirt-0.10.2-46
>>>>>>>> Kernel: 2.6.32
>>>>>>>>
>>>>>>>> When the host boots, memory consumption is normal, but after 2 or
>>>>>>>> 3 days running, VDSM memory consumption grows and it consumes
>>>>>>>> more memory than all vms running in the host. If I restart the
>>>>>>>> vdsm service, memory consumption normalizes, but then it starts
>>>>>>>> growing again.
>>>>>>>>
>>>>>>>> I have seen some BZs about vdsm and supervdsm memory leaks, but I
>>>>>>>> don't know if VDSM 4.16.10-8 is still affected by a related bug.
>>>>>>>
>>>>>>> Can't help, but I see the same thing with CentOS 7 nodes and the
>>>>>>> same version of vdsm.
>>>>>>> --
>>>>>>> Chris Adams <c...@cmadams.net>
>>>>>
>>>>> I'm afraid that we are yet to find a solution for this issue, which
>>>>> is completely different from the horrible leak of supervdsm 4.16.7.
>>>>>
>>>>> Could you corroborate the claim of Bug 1147148 - M2Crypto usage in
>>>>> vdsm leaks memory? Does the leak disappear once you start using
>>>>> plaintext transport?
>>>>>
>>>>> Regards,
>>>>> Dan.
>>>>
>>>> I don't think this is crypto related, but I could try that if you
>>>> still need some confirmation (and point me at a quick doc on
>>>> switching to plaintext?).
>>>>
>>>> This is from #ovirt around November 18th I think, Saggi thought he'd
>>>> found something related:
>>>>
>>>>     9:58:43 AM saggi: YamakasY: Found the leak
>>>>     9:58:48 AM saggi: YamakasY: Or at least the flow
>>>>     9:58:57 AM saggi: YamakasY: The good news is that I can reproduce
>>>>     9:59:20 AM YamakasY: saggi: that's kewl!
>>>>     9:59:25 AM YamakasY: saggi: what happens ?
>>>>     9:59:41 AM YamakasY: I know from Telsin (ping ping!) that he sees
>>>>     it going faster on gluster usage
>>>>     10:01:54 AM saggi: YamakasY: it's in getCapabilities(). Here is
>>>>     the RSS graph. The flatlines are when I stopped calling it and
>>>>     called other verbs. http://i.imgur.com/CLm0Q75.png
>>>
>>> I do not recall what issue Saggi and YamakasY were discussing (CCing
>>> the pair), or whether it reached fruition as a patch.
>>>
>>> It is certainly something other than Bug 1158108, as the latter speaks
>>> about a leak in a normal working state, with no getCapabilities calls.
>
> --
> Daniel Helgenberger
Re: [ovirt-users] VDSM memory consumption
On 26/03/15 09:43, Matt . wrote:
> Hi Daniel,
>
> Great! Thanks. I only see this issue happening on CentOS 7; Joop van de
> Wege also confirmed he didn't see it on CentOS 6.
>
> Cheers,
>
> Matt

I have experienced the same issue on Centos 6.6 and Centos 7, both managed
by the same engine.

Cheers

Federico

> 2015-03-26 13:33 GMT+01:00 Daniel Helgenberger <daniel.helgenber...@m-box.de>:
>> Hello Everyone,
>>
>> I did create the original BZ on this. In the meantime, the lab system I
>> used has been dismantled and the production system is yet to be
>> deployed.
>>
>> As I wrote in BZ 1147148 [1], I experienced two different issues: one
>> big mem leak of about 15 MiB/h and a smaller one, ~300 KiB/h. These
>> seem unrelated. The larger leak was indeed related to SSL in some way,
>> not necessarily M2Crypto. However, after disabling SSL this was gone,
>> leaving the smaller leak.
>>
>> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1147148
>>
>> On Mo, 2015-03-09 at 23:49 +0100, Matt . wrote:
>>> Hi,
>>>
>>> I also see this on the latest 3.5 version, I'm thinking about setting
>>> up a cronjob to restart vdsm every night.
>>
>> I did the same thing. In general, it seems to be a bad idea, as it
>> compromised system stability in the long run. While VMs seem to be
>> fine, engine does not like this very much.
>>
>>> I cannot believe that people say they don't have this issue.
>>
>> This was hard for me to accept as well. I know of Markus Stockhausen
>> and Sven Kieske; both confirmed the small leak. This might also be some
>> other special service, though I started out with a minimal install of
>> Centos 6.
>>
>>> Can someone of the devs dive in maybe ?
>>>
>>> Thanks!
>>>
>>> Matt
>>>
>>> 2015-03-09 23:29 GMT+01:00 Dan Kenigsberg <dan...@redhat.com>:
>>>> On Mon, Mar 09, 2015 at 10:40:51AM -0500, Darrell Budic wrote:
>>>>> On Mar 9, 2015, at 4:51 AM, Dan Kenigsberg <dan...@redhat.com> wrote:
>>>>>> On Fri, Mar 06, 2015 at 10:58:53AM -0600, Darrell Budic wrote:
>>>>>>> I believe the supervdsm leak was fixed, but 3.5.1 versions of
>>>>>>> vdsmd still leak slowly, ~300k/hr, yes.
>>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1158108
>>>>>>>
>>>>>>> On Mar 6, 2015, at 10:23 AM, Chris Adams <c...@cmadams.net> wrote:
>>>>>>>> Once upon a time, Federico Alberto Sayd <fs...@uncu.edu.ar> said:
>>>>>>>>> I am experiencing troubles with VDSM memory consumption. I am
>>>>>>>>> running:
>>>>>>>>>
>>>>>>>>> Engine: ovirt 3.5.1
>>>>>>>>> Nodes: Centos 6.6
>>>>>>>>> VDSM: 4.16.10-8
>>>>>>>>> Libvirt: libvirt-0.10.2-46
>>>>>>>>> Kernel: 2.6.32
>>>>>>>>>
>>>>>>>>> When the host boots, memory consumption is normal, but after 2
>>>>>>>>> or 3 days running, VDSM memory consumption grows and it consumes
>>>>>>>>> more memory than all vms running in the host. If I restart the
>>>>>>>>> vdsm service, memory consumption normalizes, but then it starts
>>>>>>>>> growing again.
>>>>>>>>>
>>>>>>>>> I have seen some BZs about vdsm and supervdsm memory leaks, but
>>>>>>>>> I don't know if VDSM 4.16.10-8 is still affected by a related
>>>>>>>>> bug.
>>>>>>>>
>>>>>>>> Can't help, but I see the same thing with CentOS 7 nodes and the
>>>>>>>> same version of vdsm.
>>>>>>>> --
>>>>>>>> Chris Adams <c...@cmadams.net>
>>>>>>
>>>>>> I'm afraid that we are yet to find a solution for this issue, which
>>>>>> is completely different from the horrible leak of supervdsm 4.16.7.
>>>>>>
>>>>>> Could you corroborate the claim of Bug 1147148 - M2Crypto usage in
>>>>>> vdsm leaks memory? Does the leak disappear once you start using
>>>>>> plaintext transport?
>>>>>>
>>>>>> Regards,
>>>>>> Dan.
>>>>>
>>>>> I don't think this is crypto related, but I could try that if you
>>>>> still need some confirmation (and point me at a quick doc on
>>>>> switching to plaintext?).
>>>>>
>>>>> This is from #ovirt around November 18th I think, Saggi thought he'd
>>>>> found something related:
>>>>>
>>>>>     9:58:43 AM saggi: YamakasY: Found the leak
>>>>>     9:58:48 AM saggi: YamakasY: Or at least the flow
>>>>>     9:58:57 AM saggi: YamakasY: The good news is that I can
>>>>>     reproduce
>>>>>     9:59:20 AM YamakasY: saggi: that's kewl!
>>>>>     9:59:25 AM YamakasY: saggi: what happens ?
>>>>>     9:59:41 AM YamakasY: I know from Telsin (ping ping!) that he
>>>>>     sees it going faster on gluster usage
>>>>>     10:01:54 AM saggi: YamakasY: it's in getCapabilities(). Here is
>>>>>     the RSS graph. The flatlines are when I stopped calling it and
>>>>>     called other verbs. http://i.imgur.com/CLm0Q75.png
>>>>
>>>> I do not recall what issue Saggi and YamakasY were discussing (CCing
>>>> the pair), or whether it reached fruition as a patch.
>>>>
>>>> It is certainly something other than Bug 1158108, as the latter
>>>> speaks about a leak in a normal working state, with no
>>>> getCapabilities calls.
>>
>> --
>> Daniel Helgenberger
Re: [ovirt-users] VDSM memory consumption
On Mar 26, 2015, at 6:42 AM, Dan Kenigsberg <dan...@redhat.com> wrote:
> On Wed, Mar 25, 2015 at 01:29:25PM -0500, Darrell Budic wrote:
>> On Mar 25, 2015, at 5:34 AM, Dan Kenigsberg <dan...@redhat.com> wrote:
>>> On Tue, Mar 24, 2015 at 02:01:40PM -0500, Darrell Budic wrote:
>>>> On Mar 24, 2015, at 4:33 AM, Dan Kenigsberg <dan...@redhat.com> wrote:
>>>>> On Mon, Mar 23, 2015 at 04:00:14PM -0400, John Taylor wrote:
>>>>>> Chris Adams <c...@cmadams.net> writes:
>>>>>>> Once upon a time, Sven Kieske <s.kie...@mittwald.de> said:
>>>>>>>> On 13/03/15 12:29, Kapetanakis Giannis wrote:
>>>>>>>>> We also face this problem since 3.5 in two different
>>>>>>>>> installations... Hope it's fixed soon
>>>>>>>>
>>>>>>>> Nothing will get fixed if no one bothers to open BZs and send
>>>>>>>> relevant log files to help track down the problems.
>>>>>>>
>>>>>>> There's already an open BZ:
>>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1158108
>>>>>>>
>>>>>>> I'm not sure if that is exactly the same problem I'm seeing or
>>>>>>> not; my vdsm process seems to be growing faster (RSS grew 952K in
>>>>>>> a 5 minute period just now; VSZ didn't change).
>>>>>>
>>>>>> For those following this I've added a comment on the bz [1],
>>>>>> although in my case the memory leak is, like Chris Adams's, a lot
>>>>>> more than the 300 KiB/h in the original bug report by Daniel
>>>>>> Helgenberger.
>>>>>>
>>>>>> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1158108
>>>>>
>>>>> That's interesting (and worrying). Could you check your suggestion
>>>>> by editing sampling.py so that _get_interfaces_and_samples() returns
>>>>> the empty dict immediately? Would this make the leak disappear?
>>>>
>>>> Looks like you've got something there. Just a quick test for now,
>>>> watching RSS in top. I'll let it go this way for a while and see what
>>>> it looks like in a few hours.
>>>>
>>>> System 1: 13 VMs w/ 24 interfaces between them
>>>>
>>>> 11:47 killed a vdsm @ 9.116G RSS (after maybe a week and a half
>>>> running)
>>>> 11:47: 97xxx
>>>> 11:57: 135544 and climbing
>>>> 12:00: 136400 restarted with sampling.py modified to just return an
>>>> empty dict:
>>>>
>>>>     def _get_interfaces_and_samples():
>>>>         links_and_samples = {}
>>>>         return links_and_samples
>>>
>>> Thanks for the input. Just to be a little more certain that the
>>> culprit is _get_interfaces_and_samples() per se, would you please
>>> decorate it with memoized, and add a log line at the end:
>>>
>>>     @utils.memoized  # add this line
>>>     def _get_interfaces_and_samples():
>>>         ...
>>>         logging.debug('LINKS %s', links_and_samples)  # and this line
>>>         return links_and_samples
>>>
>>> I'd like to see what happens when the function is run only once, and
>>> returns a non-empty, reasonable dictionary of links and samples.
>>
>> Looks similar, I modified my second server for this test:
>
> Thanks again. Would you be so kind as to search further? Does the
> following script leak anything on your host, when placed in your
> /usr/share/vdsm:
>
>     #!/usr/bin/python
>     from time import sleep
>     from virt.sampling import _get_interfaces_and_samples
>
>     while True:
>         _get_interfaces_and_samples()
>         sleep(0.2)
>
> Something that can be a bit harder would be to:
>
>     # service vdsmd stop
>     # su - vdsm -s /bin/bash
>     # cd /usr/share/vdsm
>     # valgrind --leak-check=full --log-file=/tmp/your.log vdsm
>
> as suggested by Thomas on
> https://bugzilla.redhat.com/show_bug.cgi?id=1158108#c6

Yes, this script leaks quickly. Started out at an RSS of 21000ish, was
already at 26744 a minute in, and about 5 minutes later it's at 39384 and
climbing. Been abusing a production server for those simple tests, but
didn't want to run valgrind against it right this minute.

Did run it against the test.py script above though, got this (fpaste.org
didn't like it, too long maybe?):

http://tower.onholyground.com/valgrind-test.log

To comment on some other posts in this thread: I also see leaks on my test
system, which is running Centos 6.6, but it only has 3 VMs across 2
servers and 3 configured networks, and it leaks MUCH slower. I suspect
people don't notice this on test systems because they don't have a lot of
VMs/interfaces running, and don't leave them up for weeks at a time.
That's why I was running these tests on my production box, to have more
VMs up.
Re: [ovirt-users] VDSM memory consumption
On Mar 25, 2015, at 5:34 AM, Dan Kenigsberg <dan...@redhat.com> wrote:
> On Tue, Mar 24, 2015 at 02:01:40PM -0500, Darrell Budic wrote:
>> On Mar 24, 2015, at 4:33 AM, Dan Kenigsberg <dan...@redhat.com> wrote:
>>> On Mon, Mar 23, 2015 at 04:00:14PM -0400, John Taylor wrote:
>>>> Chris Adams <c...@cmadams.net> writes:
>>>>> Once upon a time, Sven Kieske <s.kie...@mittwald.de> said:
>>>>>> On 13/03/15 12:29, Kapetanakis Giannis wrote:
>>>>>>> We also face this problem since 3.5 in two different
>>>>>>> installations... Hope it's fixed soon
>>>>>>
>>>>>> Nothing will get fixed if no one bothers to open BZs and send
>>>>>> relevant log files to help track down the problems.
>>>>>
>>>>> There's already an open BZ:
>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1158108
>>>>>
>>>>> I'm not sure if that is exactly the same problem I'm seeing or not;
>>>>> my vdsm process seems to be growing faster (RSS grew 952K in a
>>>>> 5 minute period just now; VSZ didn't change).
>>>>
>>>> For those following this I've added a comment on the bz [1], although
>>>> in my case the memory leak is, like Chris Adams's, a lot more than
>>>> the 300 KiB/h in the original bug report by Daniel Helgenberger.
>>>>
>>>> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1158108
>>>
>>> That's interesting (and worrying). Could you check your suggestion by
>>> editing sampling.py so that _get_interfaces_and_samples() returns the
>>> empty dict immediately? Would this make the leak disappear?
>>
>> Looks like you've got something there. Just a quick test for now,
>> watching RSS in top. I'll let it go this way for a while and see what
>> it looks like in a few hours.
>>
>> System 1: 13 VMs w/ 24 interfaces between them
>>
>> 11:47 killed a vdsm @ 9.116G RSS (after maybe a week and a half
>> running)
>> 11:47: 97xxx
>> 11:57: 135544 and climbing
>> 12:00: 136400 restarted with sampling.py modified to just return an
>> empty dict:
>>
>>     def _get_interfaces_and_samples():
>>         links_and_samples = {}
>>         return links_and_samples
>
> Thanks for the input. Just to be a little more certain that the culprit
> is _get_interfaces_and_samples() per se, would you please decorate it
> with memoized, and add a log line at the end:
>
>     @utils.memoized  # add this line
>     def _get_interfaces_and_samples():
>         ...
>         logging.debug('LINKS %s', links_and_samples)  # and this line
>         return links_and_samples
>
> I'd like to see what happens when the function is run only once, and
> returns a non-empty, reasonable dictionary of links and samples.

Looks similar, I modified my second server for this test:

12:25, still growing from yesterday: 544512
restarted with mods for logging and memoize:
stabilized @ 12:32: 114284
1:23: 115300

    Thread-12::DEBUG::2015-03-25 12:28:08,080::sampling::243::root::(_get_interfaces_and_samples)
    LINKS {'vnet18': <virt.sampling.InterfaceSample instance at 0x7f38c03e85f0>,
    'vnet19': <virt.sampling.InterfaceSample instance at 0x7f38b42cbcf8>,
    'bond0': <virt.sampling.InterfaceSample instance at 0x7f38b429afc8>,
    'vnet13': <virt.sampling.InterfaceSample instance at 0x7f38b42c8680>,
    'vnet16': <virt.sampling.InterfaceSample instance at 0x7f38b42cb368>,
    'private': <virt.sampling.InterfaceSample instance at 0x7f38b42b8bd8>,
    'bond0.100': <virt.sampling.InterfaceSample instance at 0x7f38b42bdd88>,
    'vnet0': <virt.sampling.InterfaceSample instance at 0x7f38b42c1f80>,
    'enp3s0': <virt.sampling.InterfaceSample instance at 0x7f38b429cef0>,
    'vnet2': <virt.sampling.InterfaceSample instance at 0x7f38b42bbbd8>,
    'vnet3': <virt.sampling.InterfaceSample instance at 0x7f38b42c37e8>,
    'vnet4': <virt.sampling.InterfaceSample instance at 0x7f38b42c5518>,
    'vnet5': <virt.sampling.InterfaceSample instance at 0x7f38b42c6ab8>,
    'vnet6': <virt.sampling.InterfaceSample instance at 0x7f38b42c7248>,
    'vnet7': <virt.sampling.InterfaceSample instance at 0x7f38c03e7a28>,
    'vnet8': <virt.sampling.InterfaceSample instance at 0x7f38b42c7c20>,
    'bond0.1100': <virt.sampling.InterfaceSample instance at 0x7f38b42be710>,
    'bond0.1103': <virt.sampling.InterfaceSample instance at 0x7f38b429dc68>,
    'ovirtmgmt': <virt.sampling.InterfaceSample instance at 0x7f38b42b16c8>,
    'lo': <virt.sampling.InterfaceSample instance at 0x7f38b429a8c0>,
    'vnet22': <virt.sampling.InterfaceSample instance at 0x7f38c03e7128>,
    'vnet21': <virt.sampling.InterfaceSample instance at 0x7f38b42cd368>,
    'vnet20': <virt.sampling.InterfaceSample instance at 0x7f38b42cc7a0>,
    'internet': <virt.sampling.InterfaceSample instance at 0x7f38b42aa098>,
    'bond0.1203': <virt.sampling.InterfaceSample instance at 0x7f38b42aa8c0>,
    'bond0.1223': <virt.sampling.InterfaceSample instance at 0x7f38b42bb128>,
    'XXX': <virt.sampling.InterfaceSample instance at 0x7f38b42bee60>,
    'XXX': <virt.sampling.InterfaceSample instance at 0x7f38b42beef0>,
    ';vdsmdummy;': <virt.sampling.InterfaceSample instance at 0x7f38b42bdc20>,
    'vnet14': <virt.sampling.InterfaceSample instance at 0x7f38b42ca050>,
    'mgmt': <virt.sampling.InterfaceSample instance at 0x7f38b42be248>,
    'vnet15': <virt.sampling.InterfaceSample instance at 0x7f38b42cab00>,
    'enp2s0': <virt.sampling.InterfaceSample instance at 0x7f38b429c200>,
    'bond0.1110':
Re: [ovirt-users] VDSM memory consumption
On Mon, Mar 23, 2015 at 04:00:14PM -0400, John Taylor wrote:
> Chris Adams <c...@cmadams.net> writes:
>> Once upon a time, Sven Kieske <s.kie...@mittwald.de> said:
>>> On 13/03/15 12:29, Kapetanakis Giannis wrote:
>>>> We also face this problem since 3.5 in two different installations...
>>>> Hope it's fixed soon
>>>
>>> Nothing will get fixed if no one bothers to open BZs and send relevant
>>> log files to help track down the problems.
>>
>> There's already an open BZ:
>> https://bugzilla.redhat.com/show_bug.cgi?id=1158108
>>
>> I'm not sure if that is exactly the same problem I'm seeing or not; my
>> vdsm process seems to be growing faster (RSS grew 952K in a 5 minute
>> period just now; VSZ didn't change).
>
> For those following this I've added a comment on the bz [1], although in
> my case the memory leak is, like Chris Adams's, a lot more than the
> 300 KiB/h in the original bug report by Daniel Helgenberger.
>
> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1158108

That's interesting (and worrying). Could you check your suggestion by
editing sampling.py so that _get_interfaces_and_samples() returns the
empty dict immediately? Would this make the leak disappear?

Dan.
Re: [ovirt-users] VDSM memory consumption
On Mar 24, 2015, at 4:33 AM, Dan Kenigsberg <dan...@redhat.com> wrote:
> On Mon, Mar 23, 2015 at 04:00:14PM -0400, John Taylor wrote:
>> Chris Adams <c...@cmadams.net> writes:
>>> Once upon a time, Sven Kieske <s.kie...@mittwald.de> said:
>>>> On 13/03/15 12:29, Kapetanakis Giannis wrote:
>>>>> We also face this problem since 3.5 in two different
>>>>> installations... Hope it's fixed soon
>>>>
>>>> Nothing will get fixed if no one bothers to open BZs and send
>>>> relevant log files to help track down the problems.
>>>
>>> There's already an open BZ:
>>> https://bugzilla.redhat.com/show_bug.cgi?id=1158108
>>>
>>> I'm not sure if that is exactly the same problem I'm seeing or not; my
>>> vdsm process seems to be growing faster (RSS grew 952K in a 5 minute
>>> period just now; VSZ didn't change).
>>
>> For those following this I've added a comment on the bz [1], although
>> in my case the memory leak is, like Chris Adams's, a lot more than the
>> 300 KiB/h in the original bug report by Daniel Helgenberger.
>>
>> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1158108
>
> That's interesting (and worrying). Could you check your suggestion by
> editing sampling.py so that _get_interfaces_and_samples() returns the
> empty dict immediately? Would this make the leak disappear?

Looks like you've got something there. Just a quick test for now, watching
RSS in top. I'll let it go this way for a while and see what it looks like
in a few hours.

System 1: 13 VMs w/ 24 interfaces between them

11:47 killed a vdsm @ 9.116G RSS (after maybe a week and a half running)
11:47: 97xxx
11:57: 135544 and climbing
12:00: 136400 restarted with sampling.py modified to just return an empty
dict:

    def _get_interfaces_and_samples():
        links_and_samples = {}
        return links_and_samples

12:02: quickly grew to 127694
12:13: 133352
12:20: 132476
12:31: 132732
12:40: 132656
12:50: 132800
1:30: 133928
1:40: 133136
1:50: 133116
2:00: 133128

Interestingly, it looks like overall system load dropped significantly
(from ~40-45% to 10% reported), mostly ksmd getting out of the way after
freeing 9G, but it feels like more than that. (This is a 6 core system; I
usually saw ksmd using ~80% of a single cpu, roughly 15% of the total
available.)

Second system, 10 VMs w/ 17 interfaces, vdsmd @ 5.027G RSS (slightly less
uptime than the previous host). Freeing this ram caused a ~16% utilization
drop as ksmd stopped running as hard.

restarted at 12:10
12:10: 106224
12:20: 111220
12:31: 114616
12:40: 117500
12:50: 120504
1:30: 133040
1:40: 136140
1:50: 139032
2:00: 142292
Re: [ovirt-users] VDSM memory consumption
Chris Adams <c...@cmadams.net> writes:

> Once upon a time, Sven Kieske <s.kie...@mittwald.de> said:
>> On 13/03/15 12:29, Kapetanakis Giannis wrote:
>>> We also face this problem since 3.5 in two different installations...
>>> Hope it's fixed soon
>>
>> Nothing will get fixed if no one bothers to open BZs and send relevant
>> log files to help track down the problems.
>
> There's already an open BZ:
> https://bugzilla.redhat.com/show_bug.cgi?id=1158108
>
> I'm not sure if that is exactly the same problem I'm seeing or not; my
> vdsm process seems to be growing faster (RSS grew 952K in a 5 minute
> period just now; VSZ didn't change).

For those following this I've added a comment on the bz [1], although in
my case the memory leak is, like Chris Adams's, a lot more than the
300 KiB/h in the original bug report by Daniel Helgenberger.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1158108

-John
Re: [ovirt-users] VDSM memory consumption
On 13/03/15 12:29, Kapetanakis Giannis wrote:
> We also face this problem since 3.5 in two different installations...
> Hope it's fixed soon

Nothing will get fixed if no one bothers to open BZs and send relevant log
files to help track down the problems.

--
Sven Kieske
Systemadministrator
Mittwald CM Service GmbH & Co. KG
Re: [ovirt-users] VDSM memory consumption
On 06/03/15 18:12, Federico Alberto Sayd wrote:

Hello: I am experiencing troubles with VDSM memory consumption. I am running:

Engine: oVirt 3.5.1
Nodes: CentOS 6.6
VDSM: 4.16.10-8
Libvirt: libvirt-0.10.2-46
Kernel: 2.6.32

When the host boots, memory consumption is normal, but after 2 or 3 days running, VDSM memory consumption grows until it consumes more memory than all the VMs running on the host. If I restart the vdsm service, memory consumption normalizes, but then it starts growing again.

I have seen some BZs about vdsm and supervdsm memory leaks, but I don't know if VDSM 4.16.10-8 is still affected by a related bug.

Any help? If you need, I can provide more information. Thank you

We also face this problem since 3.5 in two different installations... Hope it's fixed soon

G

___ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
Re: [ovirt-users] VDSM memory consumption
Once upon a time, Sven Kieske s.kie...@mittwald.de said:

On 13/03/15 12:29, Kapetanakis Giannis wrote: We also face this problem since 3.5 in two different installations... Hope it's fixed soon

Nothing will get fixed if no one bothers to open BZs and send relevant log files to help track down the problems.

There's already an open BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1158108

I'm not sure if that is exactly the same problem I'm seeing or not; my vdsm process seems to be growing faster (RSS grew 952K in a 5-minute period just now; VSZ didn't change).

--
Chris Adams c...@cmadams.net

___ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
Re: [ovirt-users] VDSM memory consumption
On Mon, Mar 09, 2015 at 11:49:01PM +0100, Matt . wrote:

Hi, I also see this on the latest 3.5 version; I'm thinking about setting up a cronjob to restart vdsm every night. I cannot believe that people say they don't have this issue. Can one of the devs dive in, maybe?

10:01:54 AM saggi: YamakasY: it's in getCapabilities(). Here is the RSS graph. The flatlines are when I stopped calling it and called other verbs. http://i.imgur.com/CLm0Q75.png

I do ***NOT*** recall what issue Saggi and YamakasY were discussing (CCing the pair), or whether it reached fruition as a patch. It is certainly something other than Bug 1158108, as the latter speaks about a leak in a normal working state, with no getCapabilities calls.

Please notice an important word that fell off my text. Do YOU recall if a fix was posted?

___ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
Re: [ovirt-users] VDSM memory consumption
On 03/10/2015 12:19 AM, Dan Kenigsberg wrote:

On Mon, Mar 09, 2015 at 12:17:00PM -0500, Chris Adams wrote:

Once upon a time, Dan Kenigsberg dan...@redhat.com said: I'm afraid that we are yet to find a solution for this issue, which is completely different from the horrible leak of supervdsm 4.16.7. Could you corroborate the claim of Bug 1147148 - M2Crypto usage in vdsm leaks memory? Does the leak disappear once you start using the plaintext transport?

So, to confirm, it looks like the steps to do that would be:

- In the [vars] section of /etc/vdsm/vdsm.conf, set ssl = false.
- Restart the vdsmd service.

Is that all that is needed?

No. You'd have to reconfigure libvirtd to work in plaintext

vdsm-tool configure --force

and also set your Engine to work in plaintext (unfortunately, I don't recall how that's done; surely Yaniv does).

If the host is already managed by the engine, you can move it to maintenance, then set the options directly in the vdc_options table with a psql client connected to your DB: update the values of the 'EncryptHostCommunication' and 'SSLEnabled' options in vdc_options to False, then restart ovirt-engine. Apart from the engine side, also run the changes on the host (ssl=False and configure --force, as Dan mentions above) and reactivate the host.

Is it safe to restart vdsmd on a node with active VMs?

It's safe in the sense that I have not heard of a single failure to reconnect to already-running VMs in years. However, this is still not recommended for a production environment, and particularly not if one of the VMs is defined as highly available. This can end up with your host being fenced and all your VMs dead.

Dan.

--
Yaniv Bronhaim.

___ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
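Pulling Dan's and Yaniv's steps together, the full plaintext switch would look roughly like this. This is a sketch, not a verified procedure: the vdc_options column names (option_name/option_value) are assumed from the standard engine schema, and the stored values are assumed to be the strings 'false':

    # On the host (in maintenance), /etc/vdsm/vdsm.conf:
    [vars]
    ssl = false

    # then reconfigure libvirtd and restart vdsm:
    vdsm-tool configure --force
    service vdsmd restart

    -- On the engine DB, via psql:
    UPDATE vdc_options
       SET option_value = 'false'
     WHERE option_name IN ('EncryptHostCommunication', 'SSLEnabled');
    -- then restart ovirt-engine and reactivate the host.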
Re: [ovirt-users] VDSM memory consumption
NO! The fix that should have fixed it didn't change a thing... we lost track there as some devs were going to look at it.

2015-03-10 11:47 GMT+01:00 Dan Kenigsberg dan...@redhat.com:

On Mon, Mar 09, 2015 at 11:49:01PM +0100, Matt . wrote: Hi, I also see this on the latest 3.5 version; I'm thinking about setting up a cronjob to restart vdsm every night. I cannot believe that people say they don't have this issue. Can one of the devs dive in, maybe?

10:01:54 AM saggi: YamakasY: it's in getCapabilities(). Here is the RSS graph. The flatlines are when I stopped calling it and called other verbs. http://i.imgur.com/CLm0Q75.png

I do ***NOT*** recall what issue Saggi and YamakasY were discussing (CCing the pair), or whether it reached fruition as a patch. It is certainly something other than Bug 1158108, as the latter speaks about a leak in a normal working state, with no getCapabilities calls.

Please notice an important word that fell off my text. Do YOU recall if a fix was posted?

___ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
Re: [ovirt-users] VDSM memory consumption
On Mar 9, 2015, at 4:51 AM, Dan Kenigsberg dan...@redhat.com wrote:

On Fri, Mar 06, 2015 at 10:58:53AM -0600, Darrell Budic wrote: I believe the supervdsm leak was fixed, but 3.5.1 versions of vdsmd still leak slowly, ~300k/hr, yes. https://bugzilla.redhat.com/show_bug.cgi?id=1158108

On Mar 6, 2015, at 10:23 AM, Chris Adams c...@cmadams.net wrote:

Once upon a time, Federico Alberto Sayd fs...@uncu.edu.ar said: I am experiencing troubles with VDSM memory consumption. I am running Engine: oVirt 3.5.1, Nodes: CentOS 6.6, VDSM 4.16.10-8, Libvirt: libvirt-0.10.2-46, Kernel: 2.6.32. When the host boots, memory consumption is normal, but after 2 or 3 days running, VDSM memory consumption grows until it consumes more memory than all the VMs running on the host. If I restart the vdsm service, memory consumption normalizes, but then it starts growing again. I have seen some BZs about vdsm and supervdsm memory leaks, but I don't know if VDSM 4.16.10-8 is still affected by a related bug.

Can't help, but I see the same thing with CentOS 7 nodes and the same version of vdsm. -- Chris Adams c...@cmadams.net

I'm afraid that we are yet to find a solution for this issue, which is completely different from the horrible leak of supervdsm 4.16.7. Could you corroborate the claim of Bug 1147148 - M2Crypto usage in vdsm leaks memory? Does the leak disappear once you start using the plaintext transport? Regards, Dan.

I don't think this is crypto related, but I could try that if you still need some confirmation (and point me at a quick doc on switching to plaintext?). This is from #ovirt around November 18th, I think; Saggi thought he'd found something related:

9:58:43 AM saggi: YamakasY: Found the leak
9:58:48 AM saggi: YamakasY: Or at least the flow
9:58:57 AM saggi: YamakasY: The good news is that I can reproduce
9:59:20 AM YamakasY: saggi: that's kewl!
9:59:25 AM YamakasY: saggi: what happens ?
9:59:41 AM YamakasY: I know from Telsin (ping ping!) that he sees it going faster on gluster usage
10:01:54 AM saggi: YamakasY: it's in getCapabilities(). Here is the RSS graph. The flatlines are when I stopped calling it and called other verbs. http://i.imgur.com/CLm0Q75.png
10:02:46 AM saggi: YamakasY: horizontal is time since epoch and vertical is RSS in bytes
10:03:52 AM YamakasY: saggi: I have seen that line so much!
10:04:11 AM YamakasY: I think I even made a mailing about it
10:04:18 AM YamakasY: at least asked here
10:04:32 AM YamakasY: no-one knew, but those lines are almost blowing you away
10:04:35 AM YamakasY: can we patch it ?
10:04:59 AM YamakasY: wow, nice one to catch
10:05:28 AM saggi: YamakasY: I now have a smaller part of the code to scan through and a way to reproduce so hopefully I'll have a patch soon

Was that ever followed up on?

___ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
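For anyone who wants to check whether their install shows the same getCapabilities() growth Saggi describes, a rough reproducer sketch follows. It is hypothetical: it assumes vdsClient is installed on the host and that getVdsCaps is the right verb name for your version; watch vdsm's RSS alongside it with top, or with the sampler sketch earlier in this archive.

    #!/usr/bin/env python
    # Hypothetical reproducer: hammer vdsm's capabilities verb in a loop
    # so its RSS graph can be compared against Saggi's flatlines.
    # Assumes vdsClient is installed and that 'getVdsCaps' is the right
    # verb for your version ('vdsClient -s 0' talks TLS to localhost;
    # drop -s if you have switched to the plaintext transport).
    import os
    import subprocess
    import time

    devnull = open(os.devnull, 'w')
    while True:
        subprocess.call(['vdsClient', '-s', '0', 'getVdsCaps'],
                        stdout=devnull, stderr=devnull)
        time.sleep(1)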
Re: [ovirt-users] VDSM memory consumption
Once upon a time, Dan Kenigsberg dan...@redhat.com said: I'm afraid that we are yet to find a solution for this issue, which is completely different from the horrible leak of supervdsm 4.16.7. Could you corroborate the claim of Bug 1147148 - M2Crypto usage in vdsm leaks memory? Does the leak disappear once you start using the plaintext transport?

So, to confirm, it looks like the steps to do that would be:

- In the [vars] section of /etc/vdsm/vdsm.conf, set ssl = false.
- Restart the vdsmd service.

Is that all that is needed? Is it safe to restart vdsmd on a node with active VMs?

--
Chris Adams c...@cmadams.net

___ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
Re: [ovirt-users] VDSM memory consumption
On Mon, Mar 09, 2015 at 10:40:51AM -0500, Darrell Budic wrote:

On Mar 9, 2015, at 4:51 AM, Dan Kenigsberg dan...@redhat.com wrote:

On Fri, Mar 06, 2015 at 10:58:53AM -0600, Darrell Budic wrote: I believe the supervdsm leak was fixed, but 3.5.1 versions of vdsmd still leak slowly, ~300k/hr, yes. https://bugzilla.redhat.com/show_bug.cgi?id=1158108

On Mar 6, 2015, at 10:23 AM, Chris Adams c...@cmadams.net wrote: Once upon a time, Federico Alberto Sayd fs...@uncu.edu.ar said: I am experiencing troubles with VDSM memory consumption. I am running Engine: oVirt 3.5.1, Nodes: CentOS 6.6, VDSM 4.16.10-8, Libvirt: libvirt-0.10.2-46, Kernel: 2.6.32. When the host boots, memory consumption is normal, but after 2 or 3 days running, VDSM memory consumption grows until it consumes more memory than all the VMs running on the host. If I restart the vdsm service, memory consumption normalizes, but then it starts growing again. I have seen some BZs about vdsm and supervdsm memory leaks, but I don't know if VDSM 4.16.10-8 is still affected by a related bug.

Can't help, but I see the same thing with CentOS 7 nodes and the same version of vdsm. -- Chris Adams c...@cmadams.net

I'm afraid that we are yet to find a solution for this issue, which is completely different from the horrible leak of supervdsm 4.16.7. Could you corroborate the claim of Bug 1147148 - M2Crypto usage in vdsm leaks memory? Does the leak disappear once you start using the plaintext transport? Regards, Dan.

I don't think this is crypto related, but I could try that if you still need some confirmation (and point me at a quick doc on switching to plaintext?). This is from #ovirt around November 18th, I think; Saggi thought he'd found something related:

9:58:43 AM saggi: YamakasY: Found the leak
9:58:48 AM saggi: YamakasY: Or at least the flow
9:58:57 AM saggi: YamakasY: The good news is that I can reproduce
9:59:20 AM YamakasY: saggi: that's kewl!
9:59:25 AM YamakasY: saggi: what happens ?
9:59:41 AM YamakasY: I know from Telsin (ping ping!) that he sees it going faster on gluster usage
10:01:54 AM saggi: YamakasY: it's in getCapabilities(). Here is the RSS graph. The flatlines are when I stopped calling it and called other verbs. http://i.imgur.com/CLm0Q75.png

I do recall what issue Saggi and YamakasY were discussing (CCing the pair), or if it reached fruition as a patch. It is certainly something other than Bug 1158108, as the latter speaks about a leak in a normal working state, with no getCapabilities calls.

___ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
Re: [ovirt-users] VDSM memory consumption
On Mon, Mar 09, 2015 at 12:17:00PM -0500, Chris Adams wrote:

Once upon a time, Dan Kenigsberg dan...@redhat.com said: I'm afraid that we are yet to find a solution for this issue, which is completely different from the horrible leak of supervdsm 4.16.7. Could you corroborate the claim of Bug 1147148 - M2Crypto usage in vdsm leaks memory? Does the leak disappear once you start using the plaintext transport?

So, to confirm, it looks like the steps to do that would be:

- In the [vars] section of /etc/vdsm/vdsm.conf, set ssl = false.
- Restart the vdsmd service.

Is that all that is needed?

No. You'd have to reconfigure libvirtd to work in plaintext

vdsm-tool configure --force

and also set your Engine to work in plaintext (unfortunately, I don't recall how that's done; surely Yaniv does).

Is it safe to restart vdsmd on a node with active VMs?

It's safe in the sense that I have not heard of a single failure to reconnect to already-running VMs in years. However, this is still not recommended for a production environment, and particularly not if one of the VMs is defined as highly available. This can end up with your host being fenced and all your VMs dead.

Dan.

___ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
Re: [ovirt-users] VDSM memory consumption
Hi, I also see this on the latest 3.5 version; I'm thinking about setting up a cronjob to restart vdsm every night. I cannot believe that people say they don't have this issue. Can one of the devs dive in, maybe?

Thanks! Matt

2015-03-09 23:29 GMT+01:00 Dan Kenigsberg dan...@redhat.com:

On Mon, Mar 09, 2015 at 10:40:51AM -0500, Darrell Budic wrote:

On Mar 9, 2015, at 4:51 AM, Dan Kenigsberg dan...@redhat.com wrote:

On Fri, Mar 06, 2015 at 10:58:53AM -0600, Darrell Budic wrote: I believe the supervdsm leak was fixed, but 3.5.1 versions of vdsmd still leak slowly, ~300k/hr, yes. https://bugzilla.redhat.com/show_bug.cgi?id=1158108

On Mar 6, 2015, at 10:23 AM, Chris Adams c...@cmadams.net wrote: Once upon a time, Federico Alberto Sayd fs...@uncu.edu.ar said: I am experiencing troubles with VDSM memory consumption. I am running Engine: oVirt 3.5.1, Nodes: CentOS 6.6, VDSM 4.16.10-8, Libvirt: libvirt-0.10.2-46, Kernel: 2.6.32. When the host boots, memory consumption is normal, but after 2 or 3 days running, VDSM memory consumption grows until it consumes more memory than all the VMs running on the host. If I restart the vdsm service, memory consumption normalizes, but then it starts growing again. I have seen some BZs about vdsm and supervdsm memory leaks, but I don't know if VDSM 4.16.10-8 is still affected by a related bug.

Can't help, but I see the same thing with CentOS 7 nodes and the same version of vdsm. -- Chris Adams c...@cmadams.net

I'm afraid that we are yet to find a solution for this issue, which is completely different from the horrible leak of supervdsm 4.16.7. Could you corroborate the claim of Bug 1147148 - M2Crypto usage in vdsm leaks memory? Does the leak disappear once you start using the plaintext transport? Regards, Dan.

I don't think this is crypto related, but I could try that if you still need some confirmation (and point me at a quick doc on switching to plaintext?). This is from #ovirt around November 18th, I think; Saggi thought he'd found something related:

9:58:43 AM saggi: YamakasY: Found the leak
9:58:48 AM saggi: YamakasY: Or at least the flow
9:58:57 AM saggi: YamakasY: The good news is that I can reproduce
9:59:20 AM YamakasY: saggi: that's kewl!
9:59:25 AM YamakasY: saggi: what happens ?
9:59:41 AM YamakasY: I know from Telsin (ping ping!) that he sees it going faster on gluster usage
10:01:54 AM saggi: YamakasY: it's in getCapabilities(). Here is the RSS graph. The flatlines are when I stopped calling it and called other verbs. http://i.imgur.com/CLm0Q75.png

I do recall what issue Saggi and YamakasY were discussing (CCing the pair), or if it reached fruition as a patch. It is certainly something other than Bug 1158108, as the latter speaks about a leak in a normal working state, with no getCapabilities calls.

___ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
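For what it's worth, if you do resort to the nightly-restart stopgap Matt describes, a crontab sketch might look like this (hypothetical, untested; note Dan's warning earlier in this archive that restarting vdsmd is risky when highly-available VMs are running):

    # /etc/cron.d/vdsm-nightly-restart -- stopgap only, not a fix.
    # Restarts vdsmd at 03:30 every night; running VMs are reconnected,
    # but see the fencing caveat for highly-available VMs earlier in
    # this archive. EL6 syntax; on EL7 use systemctl instead of service.
    30 3 * * * root /sbin/service vdsmd restart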
Re: [ovirt-users] VDSM memory consumption
On Fri, Mar 06, 2015 at 10:58:53AM -0600, Darrell Budic wrote: I believe the supervdsm leak was fixed, but 3.5.1 versions of vdsmd still leak slowly, ~300k/hr, yes. https://bugzilla.redhat.com/show_bug.cgi?id=1158108

On Mar 6, 2015, at 10:23 AM, Chris Adams c...@cmadams.net wrote:

Once upon a time, Federico Alberto Sayd fs...@uncu.edu.ar said: I am experiencing troubles with VDSM memory consumption. I am running Engine: oVirt 3.5.1, Nodes: CentOS 6.6, VDSM 4.16.10-8, Libvirt: libvirt-0.10.2-46, Kernel: 2.6.32. When the host boots, memory consumption is normal, but after 2 or 3 days running, VDSM memory consumption grows until it consumes more memory than all the VMs running on the host. If I restart the vdsm service, memory consumption normalizes, but then it starts growing again. I have seen some BZs about vdsm and supervdsm memory leaks, but I don't know if VDSM 4.16.10-8 is still affected by a related bug.

Can't help, but I see the same thing with CentOS 7 nodes and the same version of vdsm. -- Chris Adams c...@cmadams.net

I'm afraid that we are yet to find a solution for this issue, which is completely different from the horrible leak of supervdsm 4.16.7. Could you corroborate the claim of Bug 1147148 - M2Crypto usage in vdsm leaks memory? Does the leak disappear once you start using the plaintext transport?

Regards, Dan.

___ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
[ovirt-users] VDSM memory consumption
Hello:

I am experiencing troubles with VDSM memory consumption. I am running:

Engine: oVirt 3.5.1
Nodes: CentOS 6.6
VDSM: 4.16.10-8
Libvirt: libvirt-0.10.2-46
Kernel: 2.6.32

When the host boots, memory consumption is normal, but after 2 or 3 days running, VDSM memory consumption grows until it consumes more memory than all the VMs running on the host. If I restart the vdsm service, memory consumption normalizes, but then it starts growing again.

I have seen some BZs about vdsm and supervdsm memory leaks, but I don't know if VDSM 4.16.10-8 is still affected by a related bug.

Any help? If you need, I can provide more information.

Thank you

___ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
Re: [ovirt-users] VDSM memory consumption
Once upon a time, Federico Alberto Sayd fs...@uncu.edu.ar said: I am experiencing troubles with VDSM memory consumption. I am running Engine: oVirt 3.5.1, Nodes: CentOS 6.6, VDSM 4.16.10-8, Libvirt: libvirt-0.10.2-46, Kernel: 2.6.32. When the host boots, memory consumption is normal, but after 2 or 3 days running, VDSM memory consumption grows until it consumes more memory than all the VMs running on the host. If I restart the vdsm service, memory consumption normalizes, but then it starts growing again. I have seen some BZs about vdsm and supervdsm memory leaks, but I don't know if VDSM 4.16.10-8 is still affected by a related bug.

Can't help, but I see the same thing with CentOS 7 nodes and the same version of vdsm.

--
Chris Adams c...@cmadams.net

___ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
Re: [ovirt-users] VDSM memory consumption
I believe the supervdsm leak was fixed, but 3.5.1 versions of vdsmd still leak slowly, ~300k/hr, yes. https://bugzilla.redhat.com/show_bug.cgi?id=1158108

On Mar 6, 2015, at 10:23 AM, Chris Adams c...@cmadams.net wrote:

Once upon a time, Federico Alberto Sayd fs...@uncu.edu.ar said: I am experiencing troubles with VDSM memory consumption. I am running Engine: oVirt 3.5.1, Nodes: CentOS 6.6, VDSM 4.16.10-8, Libvirt: libvirt-0.10.2-46, Kernel: 2.6.32. When the host boots, memory consumption is normal, but after 2 or 3 days running, VDSM memory consumption grows until it consumes more memory than all the VMs running on the host. If I restart the vdsm service, memory consumption normalizes, but then it starts growing again. I have seen some BZs about vdsm and supervdsm memory leaks, but I don't know if VDSM 4.16.10-8 is still affected by a related bug.

Can't help, but I see the same thing with CentOS 7 nodes and the same version of vdsm. -- Chris Adams c...@cmadams.net

___ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users