Re: [ovirt-users] vdsm memory consumption (ovirt 4.0)

2016-09-01 Thread Nir Soffer
On Wed, Aug 31, 2016 at 9:06 PM, Nir Soffer  wrote:

> On Wed, Aug 31, 2016 at 8:06 PM, Federico Alberto Sayd 
> wrote:
>
>> Hello Nir:
>>
>>
>> I followed your instructions, added the config file, restarted vdsm, and
>> today I have the vdsm logs from a host:
>>
>> https://drive.google.com/file/d/0ByrwZ1AkYuyeR1hmRm90a1R6MEk/view?usp=sharing
>>
>> Please tell me if you see anything related to the memory issue.
>>
>
> These logs start when vdsm is using 567640 kB (554 MiB), which is very unusual.
>
> The memory usage grew by 18 MiB during one day. There are no garbage collection
> issues. This smells like we keep some data forever for no reason.
>
> $ grep rss= vdsm-leak.log | head -n 1
> Thread-33::DEBUG::2016-08-30 12:01:43,845::health::122::health::(_check_resources) user=1.73%, sys=1.65%, rss=567640 kB (+44), threads=57
>
> $ grep rss= vdsm-leak.log | tail -n 1
> Thread-33::DEBUG::2016-08-31 13:00:36,913::health::122::health::(_check_resources) user=4.18%, sys=1.87%, rss=586584 kB (+0), threads=52
>
> I would like to see the logs since vdsm was started - do you have them?
>
> Also, can you describe the workload on this hypervisor?
>
> - how many vms are running at the same time
> - how many vms are started and stopped per hour
> - using default vdsm.conf? if not, please attach your conf
>

I could reproduce a similar leak in master; it seems that we leak about 1 MiB
for each VM started and stopped.

I opened this bug:
https://bugzilla.redhat.com/1372205

Please check if this bug matches your issue. If it does, please add
your logs and other info to this bug.

Thanks,
Nir
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] vdsm memory consumption (ovirt 4.0)

2016-08-31 Thread Nir Soffer
On Wed, Aug 31, 2016 at 8:06 PM, Federico Alberto Sayd 
wrote:

> Hello Nir:
>
>
> I followed your instructions, added the config file, restarted vdsm, and
> today I have the vdsm logs from a host:
>
> https://drive.google.com/file/d/0ByrwZ1AkYuyeR1hmRm90a1R6MEk/view?usp=sharing
>
> Please tell me if you see anything related to the memory issue.
>

These logs start when vdsm is using 567640 kB (554 MiB), which is very unusual.

The memory usage grew by 18 MiB during one day. There are no garbage collection
issues. This smells like we keep some data forever for no reason.

$ grep rss= vdsm-leak.log | head -n 1
Thread-33::DEBUG::2016-08-30 12:01:43,845::health::122::health::(_check_resources) user=1.73%, sys=1.65%, rss=567640 kB (+44), threads=57

$ grep rss= vdsm-leak.log | tail -n 1
Thread-33::DEBUG::2016-08-31 13:00:36,913::health::122::health::(_check_resources) user=4.18%, sys=1.87%, rss=586584 kB (+0), threads=52

I would like to see the logs since vdsm was started - do you have them?

Also, can you describe the workload on this hypervisor?

- how many vms are running at the same time
- how many vms are started and stopped per hour
- using default vdsm.conf? if not, please attach your conf

Nir

>
> Thanks
>
> Federico
>
> On 30/08/16 at 03:47, Nir Soffer wrote:
>
> On Tue, Aug 30, 2016 at 1:30 AM, Federico Alberto Sayd <
> fs...@uncu.edu.ar> wrote:
>
>> I have issues with my ovirt setup related to memory consumption. After
>> upgrading to 4.0 I noted a considerable growth in vdsm memory consumption.
>> I suspect that the growth is related to a memory leak.
>>
>
> We need more details; see below...
>
>
>>
>> When I boot up the system and activate the host, the memory consumption
>> is about 600MB. After 5 days running, with the host in maintenance mode, the
>> memory consumption is about 1.4 GB.
>>
>> I need to put my hosts in maintenance and reboot to free memory.
>>
>
> You can restart vdsm (systemctl restart vdsmd) instead; running VMs
> are not affected by this.
>
>
>>
>> Can anyone help me to debug this problem?
>>
>
> We had a memory leak in vdsm-4.18.5, fixed in vdsm-4.18.11. Since you
> are running 4.18.11, there may be another leak.
>
> Please enable health monitoring by creating
> /etc/vdsm/vdsm.conf.d/50-health.conf
>
> [devel]
> health_monitor_enable = true
>
> And restart vdsm.
>
> Please run with this setting for a couple of hours, maybe one day,
> and then share the vdsm logs from this timeframe.
>
> You may disable health monitoring by setting
>
> [devel]
> health_monitor_enable = false
>
> Or by deleting this configuration file, or renaming it, for example to:
>
> /etc/vdsm/vdsm.conf.d/50-health.conf.disabled
>
> Nir
>
>
>


Re: [ovirt-users] vdsm memory consumption (ovirt 4.0)

2016-08-31 Thread Federico Alberto Sayd

Hello Nir:


I followed your instructions, added the config file, restarted vdsm,
and today I have the vdsm logs from a host:


https://drive.google.com/file/d/0ByrwZ1AkYuyeR1hmRm90a1R6MEk/view?usp=sharing

Please tell me if you see anything related to the memory issue.


Thanks


Federico

On 30/08/16 at 03:47, Nir Soffer wrote:
> On Tue, Aug 30, 2016 at 1:30 AM, Federico Alberto Sayd
> wrote:
>
>
>> I have issues with my ovirt setup related to memory consumption. After
>> upgrading to 4.0 I noted a considerable growth in vdsm memory
>> consumption.
>> I suspect that the growth is related to a memory leak.
>
>
> We need more details; see below...
>
>
>> When I boot up the system and activate the host, the memory consumption
>> is about 600MB. After 5 days running, with the host in maintenance mode, the
>> memory consumption is about 1.4 GB.
>>
>> I need to put my hosts in maintenance and reboot to free memory.
>
>
> You can restart vdsm (systemctl restart vdsmd) instead; running VMs
> are not affected by this.
>
>
>> Can anyone help me to debug this problem?
>
>
> We had a memory leak in vdsm-4.18.5, fixed in vdsm-4.18.11. Since you
> are running 4.18.11, there may be another leak.
>
> Please enable health monitoring by creating
> /etc/vdsm/vdsm.conf.d/50-health.conf
>
> [devel]
> health_monitor_enable = true
>
> And restart vdsm.
>
> Please run with this setting for a couple of hours, maybe one day,
> and then share the vdsm logs from this timeframe.
>
> You may disable health monitoring by setting
>
> [devel]
> health_monitor_enable = false
>
> Or by deleting this configuration file, or renaming it, for example to:
>
> /etc/vdsm/vdsm.conf.d/50-health.conf.disabled
>
> Nir





Re: [ovirt-users] vdsm memory consumption (ovirt 4.0)

2016-08-30 Thread Nir Soffer
On Tue, Aug 30, 2016 at 1:30 AM, Federico Alberto Sayd 
wrote:

> I have issues with my ovirt setup related to memory consumption. After
> upgrading to 4.0 I noted a considerable growth in vdsm memory consumption.
> I suspect that the growth is related to a memory leak.
>

We need more details; see below...


>
> When I boot up the system and activate the host, the memory consumption
> is about 600MB. After 5 days running, with the host in maintenance mode, the
> memory consumption is about 1.4 GB.
>
> I need to put my hosts in maintenance and reboot to free memory.
>

You can restart vdsm (systemctl restart vdsmd) instead; running VMs
are not affected by this.


>
> Can anyone help me to debug this problem?
>

We had a memory leak in vdsm-4.18.5, fixed in vdsm-4.18.11. Since you
are running 4.18.11, there may be another leak.

Please enable health monitoring by creating
/etc/vdsm/vdsm.conf.d/50-health.conf

[devel]
health_monitor_enable = true

And restart vdsm.

Please run with this setting for a couple of hours, maybe one day,
and then share the vdsm logs from this timeframe.

You may disable health monitoring by setting

[devel]
health_monitor_enable = false

Or by deleting this configuration file, or renaming it, for example to:

/etc/vdsm/vdsm.conf.d/50-health.conf.disabled
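
Creating and checking the drop-in file can also be scripted. A minimal sketch using Python 3's `configparser`, assuming the path and option name given above; `set_health_monitor` and `health_monitor_enabled` are hypothetical helpers, writing under /etc needs root, and vdsm still has to be restarted for any change to take effect:

```python
import configparser
import os

# Drop-in directory named in the instructions above.
CONF_DIR = "/etc/vdsm/vdsm.conf.d"


def set_health_monitor(enabled, conf_dir=CONF_DIR):
    """Write 50-health.conf with [devel] health_monitor_enable set."""
    parser = configparser.ConfigParser()
    parser["devel"] = {"health_monitor_enable": "true" if enabled else "false"}
    path = os.path.join(conf_dir, "50-health.conf")
    with open(path, "w") as conf:
        parser.write(conf)
    return path


def health_monitor_enabled(conf_dir=CONF_DIR):
    """Return True only if 50-health.conf exists and enables the monitor."""
    parser = configparser.ConfigParser()
    # read() silently skips missing files, e.g. after renaming to *.disabled.
    parser.read(os.path.join(conf_dir, "50-health.conf"))
    return parser.getboolean("devel", "health_monitor_enable", fallback=False)
```

Passing a scratch directory as `conf_dir` makes it possible to try the helpers without touching /etc.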

Nir


[ovirt-users] vdsm memory consumption (ovirt 4.0)

2016-08-29 Thread Federico Alberto Sayd
I have issues with my ovirt setup related to memory consumption. After
upgrading to 4.0 I noted a considerable growth in vdsm memory consumption.
I suspect that the growth is related to a memory leak.

When I boot up the system and activate the host, the memory consumption
is about 600MB. After 5 days running, with the host in maintenance mode, the
memory consumption is about 1.4 GB.
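
To track numbers like these on a schedule instead of checking by hand, the resident memory of a process can be read from `/proc/<pid>/status`. A minimal sketch, assuming a Linux host; `vm_rss_kb` and `growth_per_day_mb` are hypothetical helpers, and the vdsm PID has to be looked up separately (for example with `ps`):

```python
import os


def vm_rss_kb(pid="self"):
    """Return the resident set size (VmRSS, in kB) of a process, read from /proc."""
    path = os.path.join("/proc", str(pid), "status")
    with open(path) as status:
        for line in status:
            if line.startswith("VmRSS:"):
                # The line looks like "VmRSS:     567640 kB".
                return int(line.split()[1])
    raise ValueError("no VmRSS line in %s" % path)


def growth_per_day_mb(start_mb, end_mb, days):
    """Average memory growth between two readings, in MB per day."""
    return (end_mb - start_mb) / float(days)


# Using the rough figures above: about 600 MB at boot and about 1.4 GB
# (taken here as 1400 MB) after 5 days.
print(growth_per_day_mb(600, 1400, 5))  # prints 160.0
```

At roughly 160 MB per day, that is about 6.7 MB per hour.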

I need to put my hosts in maintenance and reboot to free memory.

Can anyone help me to debug this problem?

OS Version:
RHEL - 7 - 2.1511.el7.centos.2.10
Kernel Version:
3.10.0 - 327.22.2.el7.x86_64
KVM Version:
2.3.0 - 31.el7.16.1
LIBVIRT Version:
libvirt-1.2.17-13.el7_2.5
VDSM Version:
vdsm-4.18.11-1.el7.centos

Thank you


Re: [ovirt-users] VDSM memory consumption

2015-03-31 Thread Darrell Budic
Finally got a chance to implement this, so I'm testing it on my CentOS 7 hosts,
and it looks good. I’ll keep an eye on it for a couple of days, but after a couple of
hours, there’s no evidence of any leakage.


 On Mar 30, 2015, at 4:14 PM, John Taylor jtt77...@yahoo.com wrote:
 
 Dan Kenigsberg dan...@redhat.com writes:
 
 On Sat, Mar 28, 2015 at 10:20:25AM -0400, John Taylor wrote:
 Daniel Helgenberger daniel.helgenber...@m-box.de writes:
 
 Hello Everyone,
 
 I created the original BZ on this. In the meantime, the lab system I
 used has been dismantled and the production system is yet to be deployed.
 
 As I wrote in BZ1147148 [1], I experienced two different issues: one
 big mem leak of about 15MiB/h and a smaller one of ~300KiB/h. These seem
 unrelated.
 
 The larger leak was indeed related to SSL in some way; not necessarily
 M2Crypto. However, after disabling SSL this was gone leaving the smaller
 leak.
 
 [1] https://bugzilla.redhat.com/show_bug.cgi?id=1147148
 
 
 I think there are, at least for the purpose of this discussion, 3 leaks:
 1. the M2Crypto leak
 2. a slower leak 
 3. a large leak that's not M2Crypto related that's part of sampling
 
 My efforts have been around finding the source of my larger leak, which
 I think is #3. I had disabled ssl so I knew that M2Crypto
 isn't/shouldn't be the problem as in bz1147148, and ssl is beside the
 point as it happens with a deactivated host. It's part of sampling, which
 always runs.
 
 What I've found, after trying to get the smallest reproducer, is that
 it's not the netlink.iter_links that I commented on in [1] that is the
 problem. Rather, the _get_interfaces_and_samples loop creates an
 InterfaceSample, whose getLinkSpeed() ends up, for vlans, calling
 ipwrapper.getLink, which in turn calls netlink.get_link(name).
 
 netlink.get_link(name) *is* the source of my big leak. This is vdsm
 4.16.10, so it is [2] and it's been changed in master for the removal of
 support for libnl v1 so it might not be a problem anymore. 
 
 def get_link(name):
     """Returns the information dictionary of the name specified link."""
     with _pool.socket() as sock:
         with _nl_link_cache(sock) as cache:
             link = _rtnl_link_get_by_name(cache, name)
             if not link:
                 raise IOError(errno.ENODEV, '%s is not present in the system' %
                               name)
             return _link_info(cache, link)
 
 
 The libnl documentation at [3] says, for the rtnl_link_get_by_name
 function:
 
   Attention: The reference counter of the returned link object will be
   incremented. Use rtnl_link_put() to release the reference.
 
 So I took that hint, and made a change that does the rtnl_link_put() in
 get_link(name), and it looks like it works for me.
 
 diff oldnetlink.py netlink.py
 67d66
 <             return _link_info(cache, link)
 68a68,70
 >             li = _link_info(cache, link)
 >             _rtnl_link_put(link)
 >             return li
 333a336,337
 >
 > _rtnl_link_put  = _none_proto(('rtnl_link_put', LIBNL_ROUTE))
 
 Hope that helps. And if someone else could confirm that would be great.
 
 [1] https://bugzilla.redhat.com/show_bug.cgi?id=1158108
 [2]
 https://gerrit.ovirt.org/gitweb?p=vdsm.git;a=blob;f=lib/vdsm/netlink.py;h=afae5cecb5ce701d00fb8f019ec92b3331a39036;hb=5608cfdf43db9186dabac4b2a779f9557e798968
 [3] 
 http://www.infradead.org/~tgr/libnl/doc/api/group__link.html#ga1d583e4f0b43c89d854e5e681a529fad
 
 Thanks, John, for great detective work.
 
 I'm afraid that even on the master branch we keep calling
 rtnl_link_get_link() and rtnl_link_get_by_name() without releasing the
 reference, so a fix is due there, too.
 
 Would you consider posting a fully-fledged fix to gerrit? I still need
 to understand what that refcount is used for, so that we do not
 release it too early.
 
 Regards,
 Dan.
 
 Dan,
 
 I'm happy to [1], although I've probably gotten something wrong with how
 it's supposed to be done :) It's for the version I'm using so it's for
 branch ovirt-3.5.
 
 [1] https://gerrit.ovirt.org/#/c/39372/
 
 Thanks,
 -John

___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] VDSM memory consumption

2015-03-30 Thread Dan Kenigsberg
On Sat, Mar 28, 2015 at 10:20:25AM -0400, John Taylor wrote:
 Daniel Helgenberger daniel.helgenber...@m-box.de writes:
 
  Hello Everyone,
 
  I created the original BZ on this. In the meantime, the lab system I
  used has been dismantled and the production system is yet to be deployed.
 
  As I wrote in BZ1147148 [1], I experienced two different issues: one
  big mem leak of about 15MiB/h and a smaller one of ~300KiB/h. These seem
  unrelated.
 
  The larger leak was indeed related to SSL in some way; not necessarily
  M2Crypto. However, after disabling SSL this was gone leaving the smaller
  leak.
 
  [1] https://bugzilla.redhat.com/show_bug.cgi?id=1147148
 
 
 I think there are, at least for the purpose of this discussion, 3 leaks:
 1. the M2Crypto leak
 2. a slower leak 
 3. a large leak that's not M2Crypto related that's part of sampling
 
 My efforts have been around finding the source of my larger leak, which
 I think is #3. I had disabled ssl so I knew that M2Crypto
 isn't/shouldn't be the problem as in bz1147148, and ssl is beside the
 point as it happens with a deactivated host. It's part of sampling, which
 always runs.
 
 What I've found, after trying to get the smallest reproducer, is that
 it's not the netlink.iter_links that I commented on in [1] that is the
 problem. Rather, the _get_interfaces_and_samples loop creates an
 InterfaceSample, whose getLinkSpeed() ends up, for vlans, calling
 ipwrapper.getLink, which in turn calls netlink.get_link(name).
 
 netlink.get_link(name) *is* the source of my big leak. This is vdsm
 4.16.10, so it is [2] and it's been changed in master for the removal of
 support for libnl v1 so it might not be a problem anymore. 
  
 def get_link(name):
     """Returns the information dictionary of the name specified link."""
     with _pool.socket() as sock:
         with _nl_link_cache(sock) as cache:
             link = _rtnl_link_get_by_name(cache, name)
             if not link:
                 raise IOError(errno.ENODEV, '%s is not present in the system' %
                               name)
             return _link_info(cache, link)
 
 
 The libnl documentation at [3] says, for the rtnl_link_get_by_name
 function:
 
   Attention: The reference counter of the returned link object will be
   incremented. Use rtnl_link_put() to release the reference.
 
 So I took that hint, and made a change that does the rtnl_link_put() in
 get_link(name), and it looks like it works for me.
 
 diff oldnetlink.py netlink.py
 67d66
 <             return _link_info(cache, link)
 68a68,70
 >             li = _link_info(cache, link)
 >             _rtnl_link_put(link)
 >             return li
 333a336,337
 >
 > _rtnl_link_put  = _none_proto(('rtnl_link_put', LIBNL_ROUTE))
 
 Hope that helps. And if someone else could confirm that would be great.
 
 [1] https://bugzilla.redhat.com/show_bug.cgi?id=1158108
 [2] 
 https://gerrit.ovirt.org/gitweb?p=vdsm.git;a=blob;f=lib/vdsm/netlink.py;h=afae5cecb5ce701d00fb8f019ec92b3331a39036;hb=5608cfdf43db9186dabac4b2a779f9557e798968
 [3] 
 http://www.infradead.org/~tgr/libnl/doc/api/group__link.html#ga1d583e4f0b43c89d854e5e681a529fad

Thanks, John, for great detective work.

I'm afraid that even on the master branch we keep calling
rtnl_link_get_link() and rtnl_link_get_by_name() without releasing the
reference, so a fix is due there, too.

Would you consider posting a fully-fledged fix to gerrit? I still need
to understand what that refcount is used for, so that we do not
release it too early.

Regards,
Dan.



Re: [ovirt-users] VDSM memory consumption

2015-03-30 Thread Kapetanakis Giannis

On 26/03/15 18:12, Darrell Budic wrote:

Yes, this script leaks quickly. Started out at an RSS of 21000ish, already at
26744 a minute in, about 5 minutes later it’s at 39384 and climbing.

Been abusing a production server for those simple tests, but didn’t want to run
valgrind against it right this minute. Did run it against the test.py script
above though, got this (fpaste.org didn’t like it, too long maybe?):
http://tower.onholyground.com/valgrind-test.log

To comment on some other posts in this thread, I also see leaks on my test 
system which is running Centos 6.6, but it only has 3 VMs across 2 servers and 
3 configured networks and it leaks MUCH slower. I suspect people don’t notice 
this on test systems because they don’t have a lot of VMs/interfaces running, 
and don’t leave them up for weeks at a time. That’s why I was running these 
tests on my production box, to have more VMs up.


I don't think it's related directly to the number of VMs running.
Maybe indirectly if it's related to the number of network interfaces (so 
vm interfaces add to the leak).


We've seen the leak on nodes under maintenance...

G


Re: [ovirt-users] VDSM memory consumption

2015-03-30 Thread Nathanaël Blanchet
Just to clarify: I'm also affected regardless of the host (el7 or
el6), and I have many VMs running on a single host (up to 15) and many
networks (up to 10).
It is always the same: when vdsmd has taken all of the
memory, the host becomes unreachable and VMs begin to migrate. The only
way to stop this is to restart vdsmd.


On 30/03/2015 15:40, Kapetanakis Giannis wrote:

On 26/03/15 18:12, Darrell Budic wrote:
Yes, this script leaks quickly. Started out at an RSS of 21000ish,
already at 26744 a minute in, about 5 minutes later it’s at 39384 and
climbing.


Been abusing a production server for those simple tests, but didn’t
want to run valgrind against it right this minute. Did run it against
the test.py script above though, got this (fpaste.org didn’t like it,
too long maybe?): http://tower.onholyground.com/valgrind-test.log


To comment on some other posts in this thread, I also see leaks on my 
test system which is running Centos 6.6, but it only has 3 VMs across 
2 servers and 3 configured networks and it leaks MUCH slower. I 
suspect people don’t notice this on test systems because they don’t 
have a lot of VMs/interfaces running, and don’t leave them up for 
weeks at a time. That’s why I was running these tests on my 
production box, to have more VMs up.


I don't think it's related directly to the number of VMs running.
Maybe indirectly if it's related to the number of network interfaces 
(so vm interfaces add to the leak).


We've seen the leak on nodes under maintenance...

G


Re: [ovirt-users] VDSM memory consumption

2015-03-30 Thread John Taylor
Dan Kenigsberg dan...@redhat.com writes:

 On Sat, Mar 28, 2015 at 10:20:25AM -0400, John Taylor wrote:
 Daniel Helgenberger daniel.helgenber...@m-box.de writes:
 
  Hello Everyone,
 
  I created the original BZ on this. In the meantime, the lab system I
  used has been dismantled and the production system is yet to be deployed.
 
  As I wrote in BZ1147148 [1], I experienced two different issues: one
  big mem leak of about 15MiB/h and a smaller one of ~300KiB/h. These seem
  unrelated.
 
  The larger leak was indeed related to SSL in some way; not necessarily
  M2Crypto. However, after disabling SSL this was gone leaving the smaller
  leak.
 
  [1] https://bugzilla.redhat.com/show_bug.cgi?id=1147148
 
 
 I think there are, at least for the purpose of this discussion, 3 leaks:
 1. the M2Crypto leak
 2. a slower leak 
 3. a large leak that's not M2Crypto related that's part of sampling
 
 My efforts have been around finding the source of my larger leak, which
 I think is #3. I had disabled ssl so I knew that M2Crypto
 isn't/shouldn't be the problem as in bz1147148, and ssl is beside the
 point as it happens with a deactivated host. It's part of sampling, which
 always runs.
 
 What I've found, after trying to get the smallest reproducer, is that
 it's not the netlink.iter_links that I commented on in [1] that is the
 problem. Rather, the _get_interfaces_and_samples loop creates an
 InterfaceSample, whose getLinkSpeed() ends up, for vlans, calling
 ipwrapper.getLink, which in turn calls netlink.get_link(name).
 
 netlink.get_link(name) *is* the source of my big leak. This is vdsm
 4.16.10, so it is [2] and it's been changed in master for the removal of
 support for libnl v1 so it might not be a problem anymore. 
  
 def get_link(name):
     """Returns the information dictionary of the name specified link."""
     with _pool.socket() as sock:
         with _nl_link_cache(sock) as cache:
             link = _rtnl_link_get_by_name(cache, name)
             if not link:
                 raise IOError(errno.ENODEV, '%s is not present in the system' %
                               name)
             return _link_info(cache, link)
 
 
 The libnl documentation at [3] says, for the rtnl_link_get_by_name
 function:
 
   Attention: The reference counter of the returned link object will be
   incremented. Use rtnl_link_put() to release the reference.
 
 So I took that hint, and made a change that does the rtnl_link_put() in
 get_link(name), and it looks like it works for me.
 
 diff oldnetlink.py netlink.py
 67d66
 <             return _link_info(cache, link)
 68a68,70
 >             li = _link_info(cache, link)
 >             _rtnl_link_put(link)
 >             return li
 333a336,337
 >
 > _rtnl_link_put  = _none_proto(('rtnl_link_put', LIBNL_ROUTE))
 
 Hope that helps. And if someone else could confirm that would be great.
 
 [1] https://bugzilla.redhat.com/show_bug.cgi?id=1158108
 [2]
 https://gerrit.ovirt.org/gitweb?p=vdsm.git;a=blob;f=lib/vdsm/netlink.py;h=afae5cecb5ce701d00fb8f019ec92b3331a39036;hb=5608cfdf43db9186dabac4b2a779f9557e798968
 [3] 
 http://www.infradead.org/~tgr/libnl/doc/api/group__link.html#ga1d583e4f0b43c89d854e5e681a529fad

 Thanks, John, for great detective work.

 I'm afraid that even on the master branch we keep calling
 rtnl_link_get_link() and rtnl_link_get_by_name() without releasing the
 reference, so a fix is due there, too.

 Would you consider posting a fully-fledged fix to gerrit? I still need
 to understand what that refcount is used for, so that we do not
 release it too early.

 Regards,
 Dan.

Dan,

I'm happy to [1], although I've probably gotten something wrong with how
it's supposed to be done :) It's for the version I'm using so it's for
branch ovirt-3.5.

[1] https://gerrit.ovirt.org/#/c/39372/

Thanks,
-John


Re: [ovirt-users] VDSM memory consumption

2015-03-28 Thread John Taylor
Daniel Helgenberger daniel.helgenber...@m-box.de writes:

 Hello Everyone,

 I created the original BZ on this. In the meantime, the lab system I
 used has been dismantled and the production system is yet to be deployed.

 As I wrote in BZ1147148 [1], I experienced two different issues: one
 big mem leak of about 15MiB/h and a smaller one of ~300KiB/h. These seem
 unrelated.

 The larger leak was indeed related to SSL in some way; not necessarily
 M2Crypto. However, after disabling SSL this was gone leaving the smaller
 leak.

 [1] https://bugzilla.redhat.com/show_bug.cgi?id=1147148


I think there are, at least for the purpose of this discussion, 3 leaks:
1. the M2Crypto leak
2. a slower leak 
3. a large leak that's not M2Crypto related that's part of sampling

My efforts have been around finding the source of my larger leak, which
I think is #3. I had disabled ssl so I knew that M2Crypto
isn't/shouldn't be the problem as in bz1147148, and ssl is beside the
point as it happens with a deactivated host. It's part of sampling, which
always runs.

What I've found, after trying to get the smallest reproducer, is that
it's not the netlink.iter_links that I commented on in [1] that is the
problem. Rather, the _get_interfaces_and_samples loop creates an
InterfaceSample, whose getLinkSpeed() ends up, for vlans, calling
ipwrapper.getLink, which in turn calls netlink.get_link(name).

netlink.get_link(name) *is* the source of my big leak. This is vdsm
4.16.10, so it is [2] and it's been changed in master for the removal of
support for libnl v1 so it might not be a problem anymore. 
 
def get_link(name):
    """Returns the information dictionary of the name specified link."""
    with _pool.socket() as sock:
        with _nl_link_cache(sock) as cache:
            link = _rtnl_link_get_by_name(cache, name)
            if not link:
                raise IOError(errno.ENODEV, '%s is not present in the system' %
                              name)
            return _link_info(cache, link)


The libnl documentation at [3] says, for the rtnl_link_get_by_name
function:

  Attention: The reference counter of the returned link object will be
  incremented. Use rtnl_link_put() to release the reference.

So I took that hint, and made a change that does the rtnl_link_put() in
get_link(name), and it looks like it works for me.

diff oldnetlink.py netlink.py
67d66
<             return _link_info(cache, link)
68a68,70
>             li = _link_info(cache, link)
>             _rtnl_link_put(link)
>             return li
333a336,337
>
> _rtnl_link_put  = _none_proto(('rtnl_link_put', LIBNL_ROUTE))

Hope that helps. And if someone else could confirm that would be great.
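
The reference-counting rule quoted above can be illustrated with a toy model in Python 3. This is not libnl or vdsm code: `LinkObject`, `ToyCache` and the two `get_link_*` functions are hypothetical stand-ins that only mimic the documented rule that `rtnl_link_get_by_name()` increments the reference counter and `rtnl_link_put()` releases it. Since `get_link()` creates a fresh link cache on every call, a link object whose counter never drops back presumably outlives that cache and is never freed, which would fit the steady growth seen during sampling:

```python
class LinkObject(object):
    """Toy stand-in for a libnl link object with a reference counter."""

    def __init__(self, name):
        self.name = name
        self.refcount = 1  # the reference held by the cache itself


class ToyCache(object):
    """Hypothetical model of a link cache; not the real libnl API."""

    def __init__(self, names):
        self.links = {name: LinkObject(name) for name in names}

    def get_by_name(self, name):
        # Mimics rtnl_link_get_by_name(): the returned object has its
        # reference counter incremented, so the caller owns one reference.
        link = self.links.get(name)
        if link is not None:
            link.refcount += 1
        return link

    def put(self, link):
        # Mimics rtnl_link_put(): releases one reference.
        link.refcount -= 1


def get_link_leaky(cache, name):
    link = cache.get_by_name(name)
    if link is None:
        raise IOError("%s is not present in the system" % name)
    return {"name": link.name}  # the caller's reference is never released


def get_link_fixed(cache, name):
    link = cache.get_by_name(name)
    if link is None:
        raise IOError("%s is not present in the system" % name)
    try:
        return {"name": link.name}  # build the info dict first...
    finally:
        cache.put(link)  # ...then always drop the caller's reference


leaky, fixed = ToyCache(["eth0"]), ToyCache(["eth0"])
for _ in range(1000):
    get_link_leaky(leaky, "eth0")
    get_link_fixed(fixed, "eth0")
print(leaky.links["eth0"].refcount, fixed.links["eth0"].refcount)  # prints 1001 1
```

Wrapping the release in `try`/`finally` is one possible design choice: unlike releasing the reference only after a successful `_link_info()` call, it also drops the reference if building the info dictionary raises.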

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1158108
[2] 
https://gerrit.ovirt.org/gitweb?p=vdsm.git;a=blob;f=lib/vdsm/netlink.py;h=afae5cecb5ce701d00fb8f019ec92b3331a39036;hb=5608cfdf43db9186dabac4b2a779f9557e798968
[3] 
http://www.infradead.org/~tgr/libnl/doc/api/group__link.html#ga1d583e4f0b43c89d854e5e681a529fad

-John


Re: [ovirt-users] VDSM memory consumption

2015-03-26 Thread Dan Kenigsberg
On Wed, Mar 25, 2015 at 01:29:25PM -0500, Darrell Budic wrote:
 
  On Mar 25, 2015, at 5:34 AM, Dan Kenigsberg dan...@redhat.com wrote:
  
  On Tue, Mar 24, 2015 at 02:01:40PM -0500, Darrell Budic wrote:
  
  On Mar 24, 2015, at 4:33 AM, Dan Kenigsberg dan...@redhat.com wrote:
  
  On Mon, Mar 23, 2015 at 04:00:14PM -0400, John Taylor wrote:
  Chris Adams c...@cmadams.net writes:
  
  Once upon a time, Sven Kieske s.kie...@mittwald.de said:
  On 13/03/15 12:29, Kapetanakis Giannis wrote:
  We also face this problem since 3.5 in two different installations...
  Hope it's fixed soon
  
  Nothing will get fixed if no one bothers to
  open BZs and send relevant log files to help
  track down the problems.
  
  There's already an open BZ:
  
  https://bugzilla.redhat.com/show_bug.cgi?id=1158108
  
  I'm not sure if that is exactly the same problem I'm seeing or not; my
  vdsm process seems to be growing faster (RSS grew 952K in a 5 minute
  period just now; VSZ didn't change).
  
  For those following this I've added a comment on the bz [1], although in
  my case the memory leak is, like Chris Adams', a lot more than the
  300KiB/h in the original bug report by Daniel Helgenberger.
  
  [1] https://bugzilla.redhat.com/show_bug.cgi?id=1158108
  
  That's interesting (and worrying).
  Could you check your suggestion by editing sampling.py so that
  _get_interfaces_and_samples() returns the empty dict immediately?
  Would this make the leak disappear?
  
  Looks like you’ve got something there. Just a quick test for now, watching
  RSS in top. I’ll let it go this way for a while and see what it looks like
  in a few hours.
  
  System 1: 13 VMs w/ 24 interfaces between them
  
  11:47 killed a vdsm @ 9.116G RSS (after maybe a week and a half running)
  
  11:47: 97xxx
  11:57 135544 and climbing
  12:00 136400
  
  restarted with sampling.py modified to just return an empty dict:
  
  def _get_interfaces_and_samples():
      links_and_samples = {}
      return links_and_samples
  
  Thanks for the input. Just to be a little more certain that the culprit
  is _get_interfaces_and_samples() per se, would you please decorate it
  with memoized, and add a log line at the end
  
  @utils.memoized   # add this line
  def _get_interfaces_and_samples():
      ...
      logging.debug('LINKS %s', links_and_samples)  ## and this line
      return links_and_samples
  
  I'd like to see what happens when the function is run only once, and
  returns a non-empty reasonable dictionary of links and samples.
 
 Looks similar, I modified my second server for this test:

Thanks again. Would you be kind enough to search further?
Does the following script leak anything on your host, when placed in your
/usr/share/vdsm:

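#!/usr/bin/python

from time import sleep
from virt.sampling import _get_interfaces_and_samples

# Call the suspect sampling function in a tight loop; watch this
# process's RSS (e.g. in top) while it runs.
while True:
    _get_interfaces_and_samples()
    sleep(0.2)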

Something that can be a bit harder would be to:
# service vdsmd stop
# su - vdsm -s /bin/bash
# cd /usr/share/vdsm
# valgrind --leak-check=full --log-file=/tmp/your.log vdsm

as suggested by Thomas on
https://bugzilla.redhat.com/show_bug.cgi?id=1158108#c6
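
A growing RSS alone does not say whether the leaked memory holds Python objects or C-level allocations, such as libnl objects reached through ctypes. One way to tell them apart, assuming Python 3.4 or later (the vdsm in this thread ran on Python 2, so this needs a newer interpreter), is the standard `tracemalloc` module: if RSS keeps climbing while the traced Python heap stays flat, the leak is likely below the interpreter, which is where valgrind helps. A minimal sketch; `python_heap_growth` is a hypothetical helper:

```python
import gc
import tracemalloc


def python_heap_growth(fn, iterations=1000, warmup=10):
    """Call fn repeatedly and return how many bytes of Python-level
    allocations (as seen by tracemalloc) are still alive afterwards."""
    for _ in range(warmup):  # let caches and lazy imports settle first
        fn()
    gc.collect()
    tracemalloc.start()
    try:
        before, _peak = tracemalloc.get_traced_memory()
        for _ in range(iterations):
            fn()
        gc.collect()  # drop garbage so only live objects are counted
        after, _peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return after - before


kept = []


def leaks():
    kept.append(bytearray(1024))  # every call keeps 1 KiB alive


def does_not_leak():
    return {"a": 1}  # temporary dict, freed after each call


print(python_heap_growth(leaks) > 1000 * 1024)         # prints True
print(python_heap_growth(does_not_leak) < 100 * 1024)  # prints True
```

The two example functions are purely illustrative; passing the suspect sampling function as `fn` would require an interpreter that can import vdsm's modules.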

Regards,
Dan.


Re: [ovirt-users] VDSM memory consumption

2015-03-26 Thread Daniel Helgenberger
Hello Everyone,

I created the original BZ on this. In the meantime, the lab system I
used has been dismantled and the production system is yet to be deployed.

As I wrote in BZ1147148 [1], I experienced two different issues: one
big mem leak of about 15MiB/h and a smaller one of ~300KiB/h. These seem
unrelated.

The larger leak was indeed related to SSL in some way; not necessarily
M2Crypto. However, after disabling SSL this was gone leaving the smaller
leak.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1147148
On Mon, 2015-03-09 at 23:49 +0100, Matt . wrote:
 Hi,
 
 I also see this on the latest 3.5 version, I'm thinking about setting
 up a cronjob to restart vdsm every night.
I did the same thing. In general, it seems to be a bad idea, as it
compromised system stability in the long run. While VMs seem to be fine,
the engine does not like this very much.

 I cannot believe that people say they don't have this issue.
This was hard for me to accept as well. I know of Markus Stockhausen and
Sven Kieske; both confirmed the small leak. This might also be caused by some
other special service, though I started out with a minimal install of
CentOS 6.
 
 Can one of the devs maybe dive in?
 
 Thanks!
 
 Matt
 
 
 
 2015-03-09 23:29 GMT+01:00 Dan Kenigsberg dan...@redhat.com:
  On Mon, Mar 09, 2015 at 10:40:51AM -0500, Darrell Budic wrote:
   On Mar 9, 2015, at 4:51 AM, Dan Kenigsberg dan...@redhat.com wrote:
  
   On Fri, Mar 06, 2015 at 10:58:53AM -0600, Darrell Budic wrote:
   I believe the supervdsm leak was fixed, but the 3.5.1 versions of vdsmd
   still leak slowly, ~300k/hr, yes.
  
   https://bugzilla.redhat.com/show_bug.cgi?id=1158108
  
  
   On Mar 6, 2015, at 10:23 AM, Chris Adams c...@cmadams.net wrote:
  
   Once upon a time, Federico Alberto Sayd fs...@uncu.edu.ar said:
   I am experiencing trouble with VDSM memory consumption.
  
   I am running
  
   Engine: ovirt 3.5.1
  
   Nodes:
  
   Centos 6.6
   VDSM 4.16.10-8
   Libvirt: libvirt-0.10.2-46
   Kernel: 2.6.32
  
   When the host boots, memory consumption is normal, but after 2 or 3
   days running, VDSM memory consumption grows and it consumes more
   memory than all the VMs running on the host. If I restart the vdsm
   service, memory consumption normalizes, but then it starts growing
   again.
  
   I have seen some BZs about memory leaks in vdsm and supervdsm, but
   I don't know if VDSM 4.16.10-8 is still affected by a related bug.
  
   Can't help, but I see the same thing with CentOS 7 nodes and the same
   version of vdsm.
   --
   Chris Adams c...@cmadams.net
   ___
   Users mailing list
   Users@ovirt.org
   http://lists.ovirt.org/mailman/listinfo/users
  
   I'm afraid that we are yet to find a solution for this issue, which is
   completely different from the horrible leak of supervdsm  4.16.7.
  
   Could you corroborate the claim of
  Bug 1147148 - M2Crypto usage in vdsm leaks memory
   ? Does the leak disappear once you start using plaintext transport?
  
   Regards,
   Dan.
 
  I don’t think this is crypto related, but I could try that if you still 
  need some confirmation (and point me at a quick doc on switching to 
  plaintext?).
 
  This is from #ovirt around November 18th I think, Saggi thought he’d found 
  something related:
 
  9:58:43 AM saggi: YamakasY: Found the leak
  9:58:48 AM saggi: YamakasY: Or at least the flow
  9:58:57 AM saggi: YamakasY: The good news is that I can reproduce
  9:59:20 AM YamakasY: saggi: that's kewl!
  9:59:25 AM YamakasY: saggi: what happens ?
  9:59:41 AM YamakasY: I know from Telsin (ping ping!) that he sees it going 
  faster on gluster usage
  tdosek left the room (quit: Ping timeout: 480 seconds). (10:00:02 AM)
  djasa left the room (quit: Quit: Leaving). (10:00:24 AM)
  mlipchuk left the room (quit: Quit: Leaving.). (10:00:29 AM)
  laravot left the room (quit: Quit: Leaving.). (10:01:19 AM)
  10:01:54 AM saggi: YamakasY: it's in getCapabilities(). Here is the RSS 
  graph. The flatlines are when I stopped calling it and called other verbs. 
  http://i.imgur.com/CLm0Q75.png
 
  I do not recall what issue Saggi and YamakasY were discussing (CCing
  the pair), or if it reached fruition as a patch. It is certainly
  something other than Bug 1158108, as the latter speaks about a leak in a
  normal working state, with no getCapabilities calls.
 
 

-- 
Daniel Helgenberger
m box bewegtbild GmbH

P: +49/30/2408781-22
F: +49/30/2408781-10

ACKERSTR. 19
D-10115 BERLIN


www.m-box.de  www.monkeymen.tv

Geschäftsführer: Martin Retschitzegger / Michaela Göllner
Handelsregister: Amtsgericht Charlottenburg / HRB 112767


Re: [ovirt-users] VDSM memory consumption

2015-03-26 Thread Matt .
Hi Daniel,

Great! Thanks.

I only see this issue happening on CentOS 7; Joop van de Wege also
confirmed he didn't see it on CentOS 6.

Cheers,

Matt

2015-03-26 13:33 GMT+01:00 Daniel Helgenberger daniel.helgenber...@m-box.de:
 Hello Everyone,

 I did create the original BZ on this. In the meantime, the lab system I
 used has been dismantled and the production system is yet to be deployed.

 As I wrote in BZ1147148 [1], I experienced two different issues: one
 big memory leak of about 15MiB/h and a smaller one of ~300KiB/h. These seem
 unrelated.

 The larger leak was indeed related to SSL in some way, though not necessarily
 M2Crypto. After disabling SSL it was gone, leaving only the smaller
 leak.

 [1] https://bugzilla.redhat.com/show_bug.cgi?id=1147148
 On Mo, 2015-03-09 at 23:49 +0100, Matt . wrote:
 Hi,

 I also see this on the latest 3.5 version, I'm thinking about setting
 up a cronjob to restart vdsm every night.
 I did the same thing. In general, it seems to be a bad idea, as it
 compromised system stability in the long run. While the VMs seem to be fine,
 the engine does not like this very much.

 I cannot believe that people say they don't have this issue.
 This was hard for me to accept as well. I know of Markus Stockhausen and
 Sven Kieske, who both confirmed the small leak. This might also be caused by some
 other service, though I started out with a minimal install of
 CentOS 6.

 Can one of the devs dive in, maybe?

 Thanks!

 Matt



 2015-03-09 23:29 GMT+01:00 Dan Kenigsberg dan...@redhat.com:
  On Mon, Mar 09, 2015 at 10:40:51AM -0500, Darrell Budic wrote:
   On Mar 9, 2015, at 4:51 AM, Dan Kenigsberg dan...@redhat.com wrote:
  
   On Fri, Mar 06, 2015 at 10:58:53AM -0600, Darrell Budic wrote:
   I believe the supervdsm leak was fixed, but the 3.5.1 versions of vdsmd 
   still leak slowly, ~300k/hr, yes.
  
   https://bugzilla.redhat.com/show_bug.cgi?id=1158108
  
  
   On Mar 6, 2015, at 10:23 AM, Chris Adams c...@cmadams.net wrote:
  
   Once upon a time, Federico Alberto Sayd fs...@uncu.edu.ar said:
   I am experiencing trouble with VDSM memory consumption.
  
   I am running
  
   Engine: ovirt 3.5.1
  
   Nodes:
  
   Centos 6.6
   VDSM 4.16.10-8
   Libvirt: libvirt-0.10.2-46
   Kernel: 2.6.32
  
   When the host boots, memory consumption is normal, but after 2 or 3
   days running, VDSM memory consumption grows and it consumes more
   memory than all the VMs running on the host. If I restart the vdsm
   service, memory consumption normalizes, but then it starts growing
   again.
  
   I have seen some BZs about vdsm and supervdsm memory leaks, but
   I don't know if VDSM 4.16.10-8 is still affected by a related bug.
  
   Can't help, but I see the same thing with CentOS 7 nodes and the same
   version of vdsm.
   --
   Chris Adams c...@cmadams.net
  
   I'm afraid that we are yet to find a solution for this issue, which is
   completely different from the horrible leak of supervdsm  4.16.7.
  
   Could you corroborate the claim of
  Bug 1147148 - M2Crypto usage in vdsm leaks memory
   ? Does the leak disappear once you start using plaintext transport?
  
   Regards,
   Dan.
 
  I don’t think this is crypto related, but I could try that if you still 
  need some confirmation (and point me at a quick doc on switching to 
  plaintext?).
 
  This is from #ovirt around November 18th I think, Saggi thought he’d 
  found something related:
 
  9:58:43 AM saggi: YamakasY: Found the leak
  9:58:48 AM saggi: YamakasY: Or at least the flow
  9:58:57 AM saggi: YamakasY: The good news is that I can reproduce
  9:59:20 AM YamakasY: saggi: that's kewl!
  9:59:25 AM YamakasY: saggi: what happens ?
  9:59:41 AM YamakasY: I know from Telsin (ping ping!) that he sees it 
  going faster on gluster usage
  tdosek left the room (quit: Ping timeout: 480 seconds). (10:00:02 AM)
  djasa left the room (quit: Quit: Leaving). (10:00:24 AM)
  mlipchuk left the room (quit: Quit: Leaving.). (10:00:29 AM)
  laravot left the room (quit: Quit: Leaving.). (10:01:19 AM)
  10:01:54 AM saggi: YamakasY: it's in getCapabilities(). Here is the RSS 
  graph. The flatlines are when I stopped calling it and called other 
  verbs. http://i.imgur.com/CLm0Q75.png
 
  I do not recall what issue Saggi and YamakasY were discussing (CCing
  the pair), or if it reached fruition as a patch. It is certainly
  something other than Bug 1158108, as the latter speaks about a leak in a
  normal working state, with no getCapabilities calls.
 
 


Re: [ovirt-users] VDSM memory consumption

2015-03-26 Thread Federico Alberto Sayd

On 26/03/15 09:43, Matt . wrote:

Hi Daniel,

Great! Thanks.

I only see this issue happening on CentOS 7; Joop van de Wege also
confirmed he didn't see it on CentOS 6.

Cheers,

Matt
I have experienced the same issue on CentOS 6.6 and CentOS 7 hosts, both 
managed by the same engine.


Cheers

Federico


2015-03-26 13:33 GMT+01:00 Daniel Helgenberger daniel.helgenber...@m-box.de:

Hello Everyone,

I did create the original BZ on this. In the meantime, the lab system I
used has been dismantled and the production system is yet to be deployed.

As I wrote in BZ1147148 [1], I experienced two different issues: one
big memory leak of about 15MiB/h and a smaller one of ~300KiB/h. These seem
unrelated.

The larger leak was indeed related to SSL in some way, though not necessarily
M2Crypto. After disabling SSL it was gone, leaving only the smaller
leak.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1147148
On Mo, 2015-03-09 at 23:49 +0100, Matt . wrote:

Hi,

I also see this on the latest 3.5 version, I'm thinking about setting
up a cronjob to restart vdsm every night.

I did the same thing. In general, it seems to be a bad idea, as it
compromised system stability in the long run. While the VMs seem to be fine,
the engine does not like this very much.


I cannot believe that people say they don't have this issue.

This was hard for me to accept as well. I know of Markus Stockhausen and
Sven Kieske, who both confirmed the small leak. This might also be caused by some
other service, though I started out with a minimal install of
CentOS 6.

Can one of the devs dive in, maybe?

Thanks!

Matt



2015-03-09 23:29 GMT+01:00 Dan Kenigsberg dan...@redhat.com:

On Mon, Mar 09, 2015 at 10:40:51AM -0500, Darrell Budic wrote:

On Mar 9, 2015, at 4:51 AM, Dan Kenigsberg dan...@redhat.com wrote:

On Fri, Mar 06, 2015 at 10:58:53AM -0600, Darrell Budic wrote:

I believe the supervdsm leak was fixed, but the 3.5.1 versions of vdsmd still leak 
slowly, ~300k/hr, yes.

https://bugzilla.redhat.com/show_bug.cgi?id=1158108



On Mar 6, 2015, at 10:23 AM, Chris Adams c...@cmadams.net wrote:

Once upon a time, Federico Alberto Sayd fs...@uncu.edu.ar said:

I am experiencing trouble with VDSM memory consumption.

I am running

Engine: ovirt 3.5.1

Nodes:

Centos 6.6
VDSM 4.16.10-8
Libvirt: libvirt-0.10.2-46
Kernel: 2.6.32

When the host boots, memory consumption is normal, but after 2 or 3
days running, VDSM memory consumption grows and it consumes more
memory than all the VMs running on the host. If I restart the vdsm
service, memory consumption normalizes, but then it starts growing
again.

I have seen some BZs about vdsm and supervdsm memory leaks, but
I don't know if VDSM 4.16.10-8 is still affected by a related bug.

Can't help, but I see the same thing with CentOS 7 nodes and the same
version of vdsm.
--
Chris Adams c...@cmadams.net

I'm afraid that we are yet to find a solution for this issue, which is
completely different from the horrible leak of supervdsm  4.16.7.

Could you corroborate the claim of
Bug 1147148 - M2Crypto usage in vdsm leaks memory
? Does the leak disappear once you start using plaintext transport?

Regards,
Dan.

I don’t think this is crypto related, but I could try that if you still need 
some confirmation (and point me at a quick doc on switching to plaintext?).

This is from #ovirt around November 18th I think, Saggi thought he’d found 
something related:

9:58:43 AM saggi: YamakasY: Found the leak
9:58:48 AM saggi: YamakasY: Or at least the flow
9:58:57 AM saggi: YamakasY: The good news is that I can reproduce
9:59:20 AM YamakasY: saggi: that's kewl!
9:59:25 AM YamakasY: saggi: what happens ?
9:59:41 AM YamakasY: I know from Telsin (ping ping!) that he sees it going 
faster on gluster usage
tdosek left the room (quit: Ping timeout: 480 seconds). (10:00:02 AM)
djasa left the room (quit: Quit: Leaving). (10:00:24 AM)
mlipchuk left the room (quit: Quit: Leaving.). (10:00:29 AM)
laravot left the room (quit: Quit: Leaving.). (10:01:19 AM)
10:01:54 AM saggi: YamakasY: it's in getCapabilities(). Here is the RSS graph. 
The flatlines are when I stopped calling it and called other verbs. 
http://i.imgur.com/CLm0Q75.png

I do not recall what issue Saggi and YamakasY were discussing (CCing
the pair), or if it reached fruition as a patch. It is certainly
something other than Bug 1158108, as the latter speaks about a leak in a
normal working state, with no getCapabilities calls.





Re: [ovirt-users] VDSM memory consumption

2015-03-26 Thread Darrell Budic

 On Mar 26, 2015, at 6:42 AM, Dan Kenigsberg dan...@redhat.com wrote:
 
 On Wed, Mar 25, 2015 at 01:29:25PM -0500, Darrell Budic wrote:
 
 On Mar 25, 2015, at 5:34 AM, Dan Kenigsberg dan...@redhat.com wrote:
 
 On Tue, Mar 24, 2015 at 02:01:40PM -0500, Darrell Budic wrote:
 
 On Mar 24, 2015, at 4:33 AM, Dan Kenigsberg dan...@redhat.com wrote:
 
 On Mon, Mar 23, 2015 at 04:00:14PM -0400, John Taylor wrote:
 Chris Adams c...@cmadams.net writes:
 
 Once upon a time, Sven Kieske s.kie...@mittwald.de said:
 On 13/03/15 12:29, Kapetanakis Giannis wrote:
 We have also faced this problem since 3.5, in two different installations...
 Hope it's fixed soon.
 
 Nothing will get fixed if no one bothers to
 open BZs and send relevant log files to help
 track down the problems.
 
 There's already an open BZ:
 
 https://bugzilla.redhat.com/show_bug.cgi?id=1158108
 
 I'm not sure if that is exactly the same problem I'm seeing or not; my
 vdsm process seems to be growing faster (RSS grew 952K in a 5 minute
 period just now; VSZ didn't change).
 
 For those following this, I've added a comment on the BZ [1], although in
 my case the memory leak is, like Chris Adams's, a lot more than the 300KiB/h
 in the original bug report by Daniel Helgenberger.
 
 [1] https://bugzilla.redhat.com/show_bug.cgi?id=1158108
 
 That's interesting (and worrying).
 Could you check your suggestion by editing sampling.py so that
 _get_interfaces_and_samples() returns the empty dict immediately?
 Would this make the leak disappear?
 
 Looks like you’ve got something there. Just a quick test for now, watching 
 RSS in top. I’ll let it go this way for a while and see what it looks like in a 
 few hours.
 
 System 1: 13 VMs w/ 24 interfaces between them
 
 11:47 killed a vdsm @ 9.116G RSS (after maybe a week and a half running)
 
 11:47: 97xxx
 11:57 135544 and climbing
 12:00 136400
 
 Restarted with sampling.py modified to just return an empty dict:
 
 def _get_interfaces_and_samples():
   links_and_samples = {}
   return links_and_samples
 
 Thanks for the input. Just to be a little more certain that the culprit
 is _get_interfaces_and_samples() per se, would you please decorate it
 with memoized, and add a log line at the end:
 
 @utils.memoized   # add this line
 def _get_interfaces_and_samples():
   ...
   logging.debug('LINKS %s', links_and_samples)  ## and this line
   return links_and_samples
 
 I'd like to see what happens when the function is run only once, and
 returns a non-empty reasonable dictionary of links and samples.
 
 Looks similar, I modified my second server for this test:
 
 Thanks again. Would you be so kind as to dig further?
 Does the following script leak anything on your host, when placed in your
 /usr/share/vdsm:
 
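#!/usr/bin/python

from time import sleep
from virt.sampling import _get_interfaces_and_samples

# Call the suspect function in a tight loop; if this process's RSS
# keeps growing, the leak is reachable from this function alone.
while True:
    _get_interfaces_and_samples()
    sleep(0.2)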
 
 Something that can be a bit harder would be to:
 # service vdsmd stop
 # su - vdsm -s /bin/bash
 # cd /usr/share/vdsm
 # valgrind --leak-check=full --log-file=/tmp/your.log vdsm
 
 as suggested by Thomas on
 https://bugzilla.redhat.com/show_bug.cgi?id=1158108#c6

Yes, this script leaks quickly. It started out at an RSS of 21000ish, was already at 
26744 a minute in, and about 5 minutes later it’s at 39384 and climbing.

I’ve been abusing a production server for these simple tests, but didn’t want to run 
valgrind against it right this minute. I did run it against the test.py script 
above though, and got this (fpaste.org didn’t like it, too long maybe?): 
http://tower.onholyground.com/valgrind-test.log
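Dan's stand-alone loop isolates a single function by watching process RSS. A complementary, Python-level check is sketched below, assuming a modern Python 3 interpreter (the vdsm 4.16 code discussed here ran on Python 2, which has no `tracemalloc` module). The functions `leaky_sample()` and `clean_sample()` are hypothetical stand-ins, not vdsm code. Note that `tracemalloc` only sees memory allocated through Python's own allocators, so a leak inside a C library called from Python would not show up; that is the case valgrind covers.

```python
import tracemalloc

_kept = []  # module-level list that keeps results alive, simulating a leak


def leaky_sample():
    # Hypothetical sampler that accidentally retains every result it builds.
    result = {'vnet%d' % i: bytearray(64) for i in range(24)}
    _kept.append(result)
    return result


def clean_sample():
    # Hypothetical sampler whose result becomes garbage after each call.
    return {'vnet%d' % i: bytearray(64) for i in range(24)}


def traced_growth(func, iterations=2000):
    """Bytes of traced Python-heap growth over `iterations` calls of `func`."""
    func()  # warm-up call, so one-off allocations are not counted
    tracemalloc.start()
    before, _ = tracemalloc.get_traced_memory()
    for _ in range(iterations):
        func()
    after, _ = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return after - before


print('leaky: %d bytes' % traced_growth(leaky_sample))
print('clean: %d bytes' % traced_growth(clean_sample))
```

A retaining function shows growth proportional to the number of calls, while a clean one stays near zero no matter how many iterations run.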

To comment on some other posts in this thread, I also see leaks on my test 
system, which is running CentOS 6.6, but it only has 3 VMs across 2 servers and 
3 configured networks, and it leaks MUCH slower. I suspect people don’t notice 
this on test systems because they don’t have a lot of VMs/interfaces running, 
and don’t leave them up for weeks at a time. That’s why I was running these 
tests on my production box, to have more VMs up.






Re: [ovirt-users] VDSM memory consumption

2015-03-25 Thread Darrell Budic

 On Mar 25, 2015, at 5:34 AM, Dan Kenigsberg dan...@redhat.com wrote:
 
 On Tue, Mar 24, 2015 at 02:01:40PM -0500, Darrell Budic wrote:
 
 On Mar 24, 2015, at 4:33 AM, Dan Kenigsberg dan...@redhat.com wrote:
 
 On Mon, Mar 23, 2015 at 04:00:14PM -0400, John Taylor wrote:
 Chris Adams c...@cmadams.net writes:
 
 Once upon a time, Sven Kieske s.kie...@mittwald.de said:
 On 13/03/15 12:29, Kapetanakis Giannis wrote:
 We have also faced this problem since 3.5, in two different installations...
 Hope it's fixed soon.
 
 Nothing will get fixed if no one bothers to
 open BZs and send relevant log files to help
 track down the problems.
 
 There's already an open BZ:
 
 https://bugzilla.redhat.com/show_bug.cgi?id=1158108
 
 I'm not sure if that is exactly the same problem I'm seeing or not; my
 vdsm process seems to be growing faster (RSS grew 952K in a 5 minute
 period just now; VSZ didn't change).
 
 For those following this, I've added a comment on the BZ [1], although in
 my case the memory leak is, like Chris Adams's, a lot more than the 300KiB/h
 in the original bug report by Daniel Helgenberger.
 
 [1] https://bugzilla.redhat.com/show_bug.cgi?id=1158108
 
 That's interesting (and worrying).
 Could you check your suggestion by editing sampling.py so that
 _get_interfaces_and_samples() returns the empty dict immediately?
 Would this make the leak disappear?
 
 Looks like you’ve got something there. Just a quick test for now, watching 
 RSS in top. I’ll let it go this way for a while and see what it looks like in a 
 few hours.
 
 System 1: 13 VMs w/ 24 interfaces between them
 
 11:47 killed a vdsm @ 9.116G RSS (after maybe a week and a half running)
 
 11:47: 97xxx
 11:57 135544 and climbing
 12:00 136400
 
 Restarted with sampling.py modified to just return an empty dict:
 
 def _get_interfaces_and_samples():
   links_and_samples = {}
   return links_and_samples
 
 Thanks for the input. Just to be a little more certain that the culprit
 is _get_interfaces_and_samples() per se, would you please decorate it
 with memoized, and add a log line at the end:
 
 @utils.memoized   # add this line
 def _get_interfaces_and_samples():
   ...
   logging.debug('LINKS %s', links_and_samples)  ## and this line
   return links_and_samples
 
 I'd like to see what happens when the function is run only once, and
 returns a non-empty reasonable dictionary of links and samples.
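The idea behind the memoize experiment is that a memoized function body runs only once and every later call returns the same cached object, so if RSS stops growing, the leak lies in the per-call work rather than in whoever holds on to the returned dictionary. A minimal sketch of such a decorator follows; it is a hypothetical stand-in for illustration, not vdsm's actual `utils.memoized` implementation, and `get_interfaces_and_samples()` here is a fake.

```python
import functools


def memoized(func):
    """Cache func's result per tuple of positional arguments (hypothetical stand-in)."""
    cache = {}

    @functools.wraps(func)
    def wrapper(*args):
        if args not in cache:
            cache[args] = func(*args)
        return cache[args]

    return wrapper


calls = []


@memoized
def get_interfaces_and_samples():
    # Fake sampler: records each real invocation and returns a fresh dict.
    calls.append(1)
    return {'lo': object(), 'vnet0': object()}


first = get_interfaces_and_samples()
for _ in range(1000):
    get_interfaces_and_samples()

print('real invocations: %d' % len(calls))  # prints "real invocations: 1"
```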

Looks similar, I modified my second server for this test:

12:25, still growing from yesterday: 544512

restarted with mods for logging and memoize:
stabilized @ 12:32: 114284
1:23: 115300

Thread-12::DEBUG::2015-03-25 
12:28:08,080::sampling::243::root::(_get_interfaces_and_samples) LINKS 
{'vnet18': virt.sampling.InterfaceSample instance at 0x7f38c03e85f0, 
'vnet19': virt.sampling.InterfaceSample instance at 0x7f38b42cbcf8, 'bond0': 
virt.sampling.InterfaceSample instance at 0x7f38b429afc8, 'vnet13': 
virt.sampling.InterfaceSample instance at 0x7f38b42c8680, 'vnet16': 
virt.sampling.InterfaceSample instance at 0x7f38b42cb368, 'private': 
virt.sampling.InterfaceSample instance at 0x7f38b42b8bd8, 'bond0.100': 
virt.sampling.InterfaceSample instance at 0x7f38b42bdd88, 'vnet0': 
virt.sampling.InterfaceSample instance at 0x7f38b42c1f80, 'enp3s0': 
virt.sampling.InterfaceSample instance at 0x7f38b429cef0, 'vnet2': 
virt.sampling.InterfaceSample instance at 0x7f38b42bbbd8, 'vnet3': 
virt.sampling.InterfaceSample instance at 0x7f38b42c37e8, 'vnet4': 
virt.sampling.InterfaceSample instance at 0x7f38b42c5518, 'vnet5': 
virt.sampling.InterfaceSample instance at 0x7f38b42c6ab8, 'vnet6': 
virt.sampling.InterfaceSample instance at 0x7f38b42c7248, 'vnet7': 
virt.sampling.InterfaceSample instance at 0x7f38c03e7a28, 'vnet8': 
virt.sampling.InterfaceSample instance at 0x7f38b42c7c20, 'bond0.1100': 
virt.sampling.InterfaceSample instance at 0x7f38b42be710, 'bond0.1103': 
virt.sampling.InterfaceSample instance at 0x7f38b429dc68, 'ovirtmgmt': 
virt.sampling.InterfaceSample instance at 0x7f38b42b16c8, 'lo': 
virt.sampling.InterfaceSample instance at 0x7f38b429a8c0, 'vnet22': 
virt.sampling.InterfaceSample instance at 0x7f38c03e7128, 'vnet21': 
virt.sampling.InterfaceSample instance at 0x7f38b42cd368, 'vnet20': 
virt.sampling.InterfaceSample instance at 0x7f38b42cc7a0, 'internet': 
virt.sampling.InterfaceSample instance at 0x7f38b42aa098, 'bond0.1203': 
virt.sampling.InterfaceSample instance at 0x7f38b42aa8c0, 'bond0.1223': 
virt.sampling.InterfaceSample instance at 0x7f38b42bb128, ‘XXX': 
virt.sampling.InterfaceSample instance at 0x7f38b42bee60, ‘XXX': 
virt.sampling.InterfaceSample instance at 0x7f38b42beef0, ';vdsmdummy;': 
virt.sampling.InterfaceSample instance at 0x7f38b42bdc20, 'vnet14': 
virt.sampling.InterfaceSample instance at 0x7f38b42ca050, 'mgmt': 
virt.sampling.InterfaceSample instance at 0x7f38b42be248, 'vnet15': 
virt.sampling.InterfaceSample instance at 0x7f38b42cab00, 'enp2s0': 
virt.sampling.InterfaceSample instance at 0x7f38b429c200, 'bond0.1110': 

Re: [ovirt-users] VDSM memory consumption

2015-03-24 Thread Dan Kenigsberg
On Mon, Mar 23, 2015 at 04:00:14PM -0400, John Taylor wrote:
 Chris Adams c...@cmadams.net writes:
 
  Once upon a time, Sven Kieske s.kie...@mittwald.de said:
  On 13/03/15 12:29, Kapetanakis Giannis wrote:
   We have also faced this problem since 3.5, in two different installations...
   Hope it's fixed soon.
  
  Nothing will get fixed if no one bothers to
  open BZs and send relevant log files to help
  track down the problems.
 
  There's already an open BZ:
 
  https://bugzilla.redhat.com/show_bug.cgi?id=1158108
 
  I'm not sure if that is exactly the same problem I'm seeing or not; my
  vdsm process seems to be growing faster (RSS grew 952K in a 5 minute
  period just now; VSZ didn't change).
 
 For those following this, I've added a comment on the BZ [1], although in
 my case the memory leak is, like Chris Adams's, a lot more than the 300KiB/h
 in the original bug report by Daniel Helgenberger.
 
 [1] https://bugzilla.redhat.com/show_bug.cgi?id=1158108

That's interesting (and worrying).
Could you check your suggestion by editing sampling.py so that
_get_interfaces_and_samples() returns the empty dict immediately?
Would this make the leak disappear?

Dan.


Re: [ovirt-users] VDSM memory consumption

2015-03-24 Thread Darrell Budic

 On Mar 24, 2015, at 4:33 AM, Dan Kenigsberg dan...@redhat.com wrote:
 
 On Mon, Mar 23, 2015 at 04:00:14PM -0400, John Taylor wrote:
 Chris Adams c...@cmadams.net writes:
 
 Once upon a time, Sven Kieske s.kie...@mittwald.de said:
 On 13/03/15 12:29, Kapetanakis Giannis wrote:
 We have also faced this problem since 3.5, in two different installations...
 Hope it's fixed soon.
 
 Nothing will get fixed if no one bothers to
 open BZs and send relevant log files to help
 track down the problems.
 
 There's already an open BZ:
 
 https://bugzilla.redhat.com/show_bug.cgi?id=1158108
 
 I'm not sure if that is exactly the same problem I'm seeing or not; my
 vdsm process seems to be growing faster (RSS grew 952K in a 5 minute
 period just now; VSZ didn't change).
 
 For those following this, I've added a comment on the BZ [1], although in
 my case the memory leak is, like Chris Adams's, a lot more than the 300KiB/h
 in the original bug report by Daniel Helgenberger.
 
 [1] https://bugzilla.redhat.com/show_bug.cgi?id=1158108
 
 That's interesting (and worrying).
 Could you check your suggestion by editing sampling.py so that
 _get_interfaces_and_samples() returns the empty dict immediately?
 Would this make the leak disappear?

Looks like you’ve got something there. Just a quick test for now, watching RSS 
in top. I’ll let it go this way for a while and see what it looks like in a few 
hours.
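Watching RSS in top works, but for logging a series over hours it is easier to sample the number directly. On Linux, `/proc/<pid>/status` contains a `VmRSS:` line with the resident set size in kB, the same quantity top shows as RES. A minimal Python sketch, assuming a Linux host; the helper names and the sample text are made up for illustration:

```python
def vm_rss_kib(status_text):
    """Return the VmRSS value (kB) from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith('VmRSS:'):
            # The line looks like "VmRSS:\t  133128 kB".
            return int(line.split()[1])
    raise ValueError('no VmRSS line (kernel thread or exited process?)')


def read_vm_rss_kib(pid):
    """Read the current resident set size of process `pid`, in kB."""
    with open('/proc/%d/status' % pid) as f:
        return vm_rss_kib(f.read())


sample = "Name:\texample\nVmRSS:\t  133128 kB\nThreads:\t52\n"
print(vm_rss_kib(sample))  # prints 133128
```

Calling `read_vm_rss_kib()` with the vdsm PID every few minutes yields a time series like the ones in this thread.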

System 1: 13 VMs w/ 24 interfaces between them

11:47 killed a vdsm @ 9.116G RSS (after maybe a week and a half running)

11:47: 97xxx
11:57 135544 and climbing
12:00 136400

Restarted with sampling.py modified to just return an empty dict:

def _get_interfaces_and_samples():
    links_and_samples = {}
    return links_and_samples

12:02 quickly grew to 127694
12:13: 133352
12:20: 132476
12:31: 132732
12:40: 132656
12:50: 132800
1:30: 133928
1:40: 133136
1:50: 133116
2:00: 133128

Interestingly, it looks like overall system load dropped significantly (from 
~40-45% to 10% reported). That is mostly ksmd getting out of the way after freeing 9G, 
but it feels like more than that. (This is a 6-core system; I usually saw ksmd using 
~80% of a single CPU, roughly 15% of the total available.)


Second system, 10 VMs w/ 17 interfaces

vdsmd @ 5.027G RSS (slightly less uptime than the previous host); freeing this RAM 
caused a ~16% utilization drop as ksmd stopped running as hard.

restarted at 12:10

12:10: 106224
12:20: 111220
12:31: 114616
12:40: 117500
12:50: 120504
1:30: 133040
1:40: 136140
1:50: 139032
2:00: 142292
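To put a single number on readings like these, one can fit a least-squares line through the (time, RSS) samples. The sketch below does that for the two series posted above, assuming the RSS values are in KiB as top reports them and converting the afternoon times (1:30, 1:40, ...) to a 24-hour clock; `leak_rate_kib_per_hour()` is a hypothetical helper, not part of vdsm.

```python
from datetime import datetime


def leak_rate_kib_per_hour(samples):
    """Least-squares slope of RSS over time, in KiB per hour.

    `samples` is a list of ("HH:MM", rss_kib) pairs taken on the same day,
    using a 24-hour clock.
    """
    times = [datetime.strptime(t, '%H:%M') for t, _ in samples]
    xs = [(t - times[0]).total_seconds() / 3600.0 for t in times]  # hours
    ys = [float(rss) for _, rss in samples]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# First host, _get_interfaces_and_samples() stubbed to return {}.
host1 = [('12:02', 127694), ('12:13', 133352), ('12:20', 132476),
         ('12:31', 132732), ('12:40', 132656), ('12:50', 132800),
         ('13:30', 133928), ('13:40', 133136), ('13:50', 133116),
         ('14:00', 133128)]

# Second host, unmodified sampling.py.
host2 = [('12:10', 106224), ('12:20', 111220), ('12:31', 114616),
         ('12:40', 117500), ('12:50', 120504), ('13:30', 133040),
         ('13:40', 136140), ('13:50', 139032), ('14:00', 142292)]

print('host 1: %.0f KiB/h' % leak_rate_kib_per_hour(host1))
print('host 2: %.0f KiB/h' % leak_rate_kib_per_hour(host2))
```

On these numbers the second host fits to roughly 19,000 KiB/h (about 18.6 MiB/h), while the host with the stubbed-out function fits to an order of magnitude less, most of which comes from the startup ramp between its first two samples.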





Re: [ovirt-users] VDSM memory consumption

2015-03-23 Thread John Taylor
Chris Adams c...@cmadams.net writes:

 Once upon a time, Sven Kieske s.kie...@mittwald.de said:
 On 13/03/15 12:29, Kapetanakis Giannis wrote:
  We have also faced this problem since 3.5, in two different installations...
  Hope it's fixed soon.
 
 Nothing will get fixed if no one bothers to
 open BZs and send relevant log files to help
 track down the problems.

 There's already an open BZ:

 https://bugzilla.redhat.com/show_bug.cgi?id=1158108

 I'm not sure if that is exactly the same problem I'm seeing or not; my
 vdsm process seems to be growing faster (RSS grew 952K in a 5 minute
 period just now; VSZ didn't change).

For those following this, I've added a comment on the BZ [1], although in
my case the memory leak is, like Chris Adams's, a lot more than the 300KiB/h
in the original bug report by Daniel Helgenberger.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1158108

-John



Re: [ovirt-users] VDSM memory consumption

2015-03-13 Thread Sven Kieske


On 13/03/15 12:29, Kapetanakis Giannis wrote:
 We have also faced this problem since 3.5, in two different installations...
 Hope it's fixed soon.

Nothing will get fixed if no one bothers to
open BZs and send relevant log files to help
track down the problems.

-- 
Mit freundlichen Grüßen / Regards

Sven Kieske

Systemadministrator
Mittwald CM Service GmbH & Co. KG
Königsberger Straße 6
32339 Espelkamp
T: +49-5772-293-100
F: +49-5772-293-333
https://www.mittwald.de
Geschäftsführer: Robert Meyer
St.Nr.: 331/5721/1033, USt-IdNr.: DE814773217, HRA 6640, AG Bad Oeynhausen
Komplementärin: Robert Meyer Verwaltungs GmbH, HRB 13260, AG Bad Oeynhausen


Re: [ovirt-users] VDSM memory consumption

2015-03-13 Thread Kapetanakis Giannis

On 06/03/15 18:12, Federico Alberto Sayd wrote:

Hello:

I am experiencing trouble with VDSM memory consumption.

I am running

Engine: ovirt 3.5.1

Nodes:

Centos 6.6
VDSM 4.16.10-8
Libvirt: libvirt-0.10.2-46
Kernel: 2.6.32

When the host boots, memory consumption is normal, but after 2 or 3 
days running, VDSM memory consumption grows and it consumes more memory 
than all the VMs running on the host. If I restart the vdsm service, 
memory consumption normalizes, but then it starts growing again.


I have seen some BZs about vdsm and supervdsm memory leaks, but I 
don't know if VDSM 4.16.10-8 is still affected by a related bug.


Any help? If needed, I can provide more information.

Thank you


We have also faced this problem since 3.5, in two different installations...
Hope it's fixed soon.

G


Re: [ovirt-users] VDSM memory consumption

2015-03-13 Thread Chris Adams
Once upon a time, Sven Kieske s.kie...@mittwald.de said:
 On 13/03/15 12:29, Kapetanakis Giannis wrote:
  We have also faced this problem since 3.5, in two different installations...
  Hope it's fixed soon.
 
 Nothing will get fixed if no one bothers to
 open BZs and send relevant log files to help
 track down the problems.

There's already an open BZ:

https://bugzilla.redhat.com/show_bug.cgi?id=1158108

I'm not sure if that is exactly the same problem I'm seeing or not; my
vdsm process seems to be growing faster (RSS grew 952K in a 5 minute
period just now; VSZ didn't change).
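For scale, the growth rates quoted in this thread can be compared directly. A quick back-of-the-envelope check, assuming "952K" means 952 KiB; `per_hour()` is a hypothetical helper:

```python
def per_hour(delta_kib, minutes):
    """Scale a growth of `delta_kib` observed over `minutes` to KiB per hour."""
    return delta_kib * 60.0 / minutes


observed = per_hour(952, 5)   # 952K of RSS growth in 5 minutes
reported = 300.0              # ~300 KiB/h from the original bug report
print('%.0f KiB/h, %.0fx the reported rate' % (observed, observed / reported))
# prints "11424 KiB/h, 38x the reported rate"
```

That is roughly 11 MiB/h, far above the ~300 KiB/h leak described in BZ 1158108.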

-- 
Chris Adams c...@cmadams.net


Re: [ovirt-users] VDSM memory consumption

2015-03-10 Thread Dan Kenigsberg
On Mon, Mar 09, 2015 at 11:49:01PM +0100, Matt . wrote:
 Hi,
 
 I also see this on the latest 3.5 version, I'm thinking about setting
 up a cronjob to restart vdsm every night.
 
 I cannot believe that people say they don't have this issue.
 
 Can one of the devs dive in, maybe?

  10:01:54 AM saggi: YamakasY: it's in getCapabilities(). Here is the RSS 
  graph. The flatlines are when I stopped calling it and called other verbs. 
  http://i.imgur.com/CLm0Q75.png
 
  I do ***NOT*** recall what issue Saggi and YamakasY were discussing
  (CCing the pair), or if it reached fruition as a patch. It is certainly
  something other than Bug 1158108, as the latter speaks about a leak in a
  normal working state, with no getCapabilities calls.

Please notice an important word that fell off my text. Do YOU recall if
a fix was posted?


Re: [ovirt-users] VDSM memory consumption

2015-03-10 Thread ybronhei

On 03/10/2015 12:19 AM, Dan Kenigsberg wrote:

On Mon, Mar 09, 2015 at 12:17:00PM -0500, Chris Adams wrote:

Once upon a time, Dan Kenigsberg dan...@redhat.com said:

I'm afraid that we are yet to find a solution for this issue, which is
completely different from the horrible leak of supervdsm  4.16.7.

Could you corroborate the claim of
 Bug 1147148 - M2Crypto usage in vdsm leaks memory
? Does the leak disappear once you start using plaintext transport?


So, to confirm, it looks like to do that, the steps would be:

- In the [vars] section of /etc/vdsm/vdsm.conf, set ssl = false.
- Restart the vdsmd service.

Is that all that is needed?


No. You'd have to reconfigure libvirtd to work in plaintext

 vdsm-tool configure --force

and also set your Engine to work in plaintext (unfortunately, I don't
recall how that's done; surely Yaniv does).
If the host is already managed by the engine, you can move it to 
maintenance and use the psql client on your engine database to set 
the values of the 'EncryptHostCommunication' and 'SSLEnabled' options 
in the vdc_options table to False, then restart ovirt-engine.
Besides the engine side, also apply the changes on the host (ssl=False and 
configure --force, as Dan mentions above) and reactivate the host.
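The host-side vdsm.conf change listed above (ssl = false in the [vars] section) can also be scripted. A minimal sketch using Python 3's standard configparser, assuming the file is plain INI; `disable_vdsm_ssl()` is a hypothetical helper that covers only that one setting, not the libvirtd reconfiguration or the engine-side changes. configparser drops comments when it rewrites a file, so keep a backup of the original.

```python
import configparser
import io


def disable_vdsm_ssl(conf_text):
    """Return vdsm.conf text with ssl = false set in the [vars] section."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep option names exactly as written
    parser.read_string(conf_text)
    if not parser.has_section('vars'):
        parser.add_section('vars')
    parser.set('vars', 'ssl', 'false')
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


print(disable_vdsm_ssl('[vars]\nssl = true\n'))
```

Working on text rather than the file path keeps the sketch easy to check before writing anything back to /etc/vdsm/vdsm.conf.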



Is it safe to restart vdsmd on a node with
active VMs?


It's safe in the sense that I have not heard of a single failure to
reconnect to already-running VMs in years. However, this is still not
recommended for production environments, and particularly not if one of
the VMs is defined as highly available. This can end up with your host
being fenced and all your VMs dead.

Dan.




--
Yaniv Bronhaim.


Re: [ovirt-users] VDSM memory consumption

2015-03-10 Thread Matt .
NO!

The fix that should have fixed it didn't change a thing... we lost
track there as some devs were going to look at it.

2015-03-10 11:47 GMT+01:00 Dan Kenigsberg dan...@redhat.com:
 On Mon, Mar 09, 2015 at 11:49:01PM +0100, Matt . wrote:
 Hi,

 I also see this on the latest 3.5 version, I'm thinking about setting
 up a cronjob to restart vdsm every night.

 I cannot believe that people say they don't have this issue.

 Can one of the devs dive in, maybe?

  10:01:54 AM saggi: YamakasY: it's in getCapabilities(). Here is the RSS 
  graph. The flatlines are when I stopped calling it and called other 
  verbs. http://i.imgur.com/CLm0Q75.png
 
  I do ***NOT*** recall what issue Saggi and YamakasY were discussing
  (CCing the pair), or if it reached fruition as a patch. It is certainly
  something other than Bug 1158108, as the latter speaks about a leak in a
  normal working state, with no getCapabilities calls.

 Please notice an important word that fell off my text. Do YOU recall if
 a fix was posted?


Re: [ovirt-users] VDSM memory consumption

2015-03-09 Thread Darrell Budic
 On Mar 9, 2015, at 4:51 AM, Dan Kenigsberg dan...@redhat.com wrote:
 
 On Fri, Mar 06, 2015 at 10:58:53AM -0600, Darrell Budic wrote:
 I believe the supervdsm leak was fixed, but 3.5.1 versions of vdsmd still 
 leak slowly, ~300k/hr, yes.
 
 https://bugzilla.redhat.com/show_bug.cgi?id=1158108
 
 
 On Mar 6, 2015, at 10:23 AM, Chris Adams c...@cmadams.net wrote:
 
 Once upon a time, Federico Alberto Sayd fs...@uncu.edu.ar said:
 I am experiencing trouble with VDSM memory consumption.
 
 I am running
 
 Engine: ovirt 3.5.1
 
 Nodes:
 
 Centos 6.6
 VDSM 4.16.10-8
 Libvirt: libvirt-0.10.2-46
 Kernel: 2.6.32
 
 When the host boots, memory consumption is normal, but after 2 or 3
 days running, VDSM memory consumption grows and it consumes more
 memory than all the VMs running on the host. If I restart the vdsm
 service, memory consumption normalizes, but then it starts growing
 again.
 
 I have seen some BZs about vdsm and supervdsm memory leaks, but
 I don't know if VDSM 4.16.10-8 is still affected by a related bug.
 
 Can't help, but I see the same thing with CentOS 7 nodes and the same
 version of vdsm.
 -- 
 Chris Adams c...@cmadams.net
 
 I'm afraid that we are yet to find a solution for this issue, which is
 completely different from the horrible leak of supervdsm  4.16.7.
 
 Could you corroborate the claim of
Bug 1147148 - M2Crypto usage in vdsm leaks memory
 ? Does the leak disappear once you start using plaintext transport?
 
 Regards,
 Dan.

I don’t think this is crypto related, but I could try that if you still need 
some confirmation (and point me at a quick doc on switching to plaintext?).

This is from #ovirt around November 18th I think, Saggi thought he’d found 
something related:

9:58:43 AM saggi: YamakasY: Found the leak
9:58:48 AM saggi: YamakasY: Or at least the flow
9:58:57 AM saggi: YamakasY: The good news is that I can reproduce
9:59:20 AM YamakasY: saggi: that's kewl!
9:59:25 AM YamakasY: saggi: what happens ?
9:59:41 AM YamakasY: I know from Telsin (ping ping!) that he sees it going 
faster on gluster usage
tdosek left the room (quit: Ping timeout: 480 seconds). (10:00:02 AM)
djasa left the room (quit: Quit: Leaving). (10:00:24 AM)
mlipchuk left the room (quit: Quit: Leaving.). (10:00:29 AM)
laravot left the room (quit: Quit: Leaving.). (10:01:19 AM)
10:01:54 AM saggi: YamakasY: it's in getCapabilities(). Here is the RSS graph. 
The flatlines are when I stopped calling it and called other verbs. 
http://i.imgur.com/CLm0Q75.png
movciari left the room (quit: Ping timeout: 480 seconds). (10:02:34 AM)
10:02:46 AM saggi: YamakasY: horizontal is time since epoch and vertical is RSS 
in bytes
bobdrad left the room (quit: Quit: Leaving.). (10:03:25 AM)
10:03:52 AM YamakasY: saggi: I have seen that line so much!
10:04:11 AM YamakasY: I think I even made a mailing about it
10:04:18 AM YamakasY: at least asked here
10:04:32 AM YamakasY: no-one knew, but those lines are almost blowing you away
10:04:35 AM YamakasY: can we patch it ?
10:04:59 AM YamakasY: wow, nice one to catch
10:05:28 AM saggi: YamakasY: I now have a smaller part of the code to scan 
through and a way to reproduce so hopefully I'll have a patch soon

was that ever followed up on?
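Saggi's graph plots RSS in bytes against time since the epoch. A similar series can be collected on a Linux host; the sketch below is an illustration under stated assumptions, not the tool Saggi used. It reads the VmRSS line from /proc/<pid>/status (the kernel reports it in kB units of 1024 bytes), and the PID of the vdsm process has to be supplied by the caller. parse_vmrss_bytes and sample_rss are hypothetical names.

```python
import os
import time


def parse_vmrss_bytes(status_text: str) -> int:
    """Return the VmRSS value from /proc/<pid>/status text, in bytes."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # Example line: "VmRSS:\t  567640 kB"
            _, value, unit = line.split()
            if unit != "kB":
                raise ValueError(f"unexpected unit {unit!r}")
            return int(value) * 1024
    raise ValueError("no VmRSS line found (kernel thread or exited process?)")


def sample_rss(pid: int, count: int, interval: float) -> list[tuple[float, int]]:
    """Take `count` (epoch_seconds, rss_bytes) samples, `interval` seconds apart."""
    samples = []
    for i in range(count):
        with open(f"/proc/{pid}/status") as f:
            samples.append((time.time(), parse_vmrss_bytes(f.read())))
        if i + 1 < count:
            time.sleep(interval)
    return samples


if __name__ == "__main__":
    # Samples this script's own process; pass the vdsm PID to watch vdsm instead.
    for ts, rss in sample_rss(os.getpid(), count=3, interval=1.0):
        print(f"{ts:.0f} {rss}")
```

Plotting the two columns gives the same axes as the graph above; flat stretches would show periods with no net growth.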




Re: [ovirt-users] VDSM memory consumption

2015-03-09 Thread Chris Adams
Once upon a time, Dan Kenigsberg dan...@redhat.com said:
 I'm afraid that we are yet to find a solution for this issue, which is
 completely different from the horrible leak of supervdsm  4.16.7.
 
 Could you corroborate the claim of
 Bug 1147148 - M2Crypto usage in vdsm leaks memory
 ? Does the leak disappear once you start using plaintext transport?

So, to confirm, it looks like to do that, the steps would be:

- In the [vars] section of /etc/vdsm/vdsm.conf, set ssl = false.
- Restart the vdsmd service.

Is that all that is needed?  Is it safe to restart vdsmd on a node with
active VMs?

-- 
Chris Adams c...@cmadams.net


Re: [ovirt-users] VDSM memory consumption

2015-03-09 Thread Dan Kenigsberg
On Mon, Mar 09, 2015 at 10:40:51AM -0500, Darrell Budic wrote:
  On Mar 9, 2015, at 4:51 AM, Dan Kenigsberg dan...@redhat.com wrote:
  
  On Fri, Mar 06, 2015 at 10:58:53AM -0600, Darrell Budic wrote:
  I believe the supervdsm leak was fixed, but 3.5.1 versions of vdsmd still 
  leak slowly, ~300k/hr, yes.
  
  https://bugzilla.redhat.com/show_bug.cgi?id=1158108
  
  
  On Mar 6, 2015, at 10:23 AM, Chris Adams c...@cmadams.net wrote:
  
  Once upon a time, Federico Alberto Sayd fs...@uncu.edu.ar said:
  I am experiencing trouble with VDSM memory consumption.
  
  I am running
  
  Engine: ovirt 3.5.1
  
  Nodes:
  
  Centos 6.6
  VDSM 4.16.10-8
  Libvirt: libvirt-0.10.2-46
  Kernel: 2.6.32
  
  When the host boots, memory consumption is normal, but after 2 or 3
  days running, VDSM memory consumption grows and it consumes more
  memory than all the VMs running on the host. If I restart the vdsm
  service, memory consumption normalizes, but then it starts growing
  again.
  
  I have seen some BZs about vdsm and supervdsm memory leaks, but
  I don't know if VDSM 4.16.10-8 is still affected by a related bug.
  
  Can't help, but I see the same thing with CentOS 7 nodes and the same
  version of vdsm.
  -- 
  Chris Adams c...@cmadams.net
  
  I'm afraid that we are yet to find a solution for this issue, which is
  completely different from the horrible leak of supervdsm  4.16.7.
  
  Could you corroborate the claim of
 Bug 1147148 - M2Crypto usage in vdsm leaks memory
  ? Does the leak disappear once you start using plaintext transport?
  
  Regards,
  Dan.
 
 I don’t think this is crypto related, but I could try that if you still need 
 some confirmation (and point me at a quick doc on switching to plaintext?).
 
 This is from #ovirt around November 18th I think, Saggi thought he’d found 
 something related:
 
 9:58:43 AM saggi: YamakasY: Found the leak
 9:58:48 AM saggi: YamakasY: Or at least the flow
 9:58:57 AM saggi: YamakasY: The good news is that I can reproduce
 9:59:20 AM YamakasY: saggi: that's kewl!
 9:59:25 AM YamakasY: saggi: what happens ?
 9:59:41 AM YamakasY: I know from Telsin (ping ping!) that he sees it going 
 faster on gluster usage
 tdosek left the room (quit: Ping timeout: 480 seconds). (10:00:02 AM)
 djasa left the room (quit: Quit: Leaving). (10:00:24 AM)
 mlipchuk left the room (quit: Quit: Leaving.). (10:00:29 AM)
 laravot left the room (quit: Quit: Leaving.). (10:01:19 AM)
 10:01:54 AM saggi: YamakasY: it's in getCapabilities(). Here is the RSS 
 graph. The flatlines are when I stopped calling it and called other verbs. 
 http://i.imgur.com/CLm0Q75.png

I do recall what issue Saggi and YamakasY were discussing (CCing
the pair), or if it reached fruition as a patch. It is certainly
something other than Bug 1158108, as the latter speaks about a leak in a
normal working state, with no getCapabilities calls.




Re: [ovirt-users] VDSM memory consumption

2015-03-09 Thread Dan Kenigsberg
On Mon, Mar 09, 2015 at 12:17:00PM -0500, Chris Adams wrote:
 Once upon a time, Dan Kenigsberg dan...@redhat.com said:
  I'm afraid that we are yet to find a solution for this issue, which is
  completely different from the horrible leak of supervdsm  4.16.7.
  
  Could you corroborate the claim of
  Bug 1147148 - M2Crypto usage in vdsm leaks memory
  ? Does the leak disappear once you start using plaintext transport?
 
 So, to confirm, it looks like to do that, the steps would be:
 
 - In the [vars] section of /etc/vdsm/vdsm.conf, set ssl = false.
 - Restart the vdsmd service.
 
 Is that all that is needed?

No. You'd have to reconfigure libvirtd to work in plaintext

vdsm-tool configure --force

and also set your Engine to work in plaintext (unfortunately, I don't
recall how that's done; surely Yaniv does)

 Is it safe to restart vdsmd on a node with
 active VMs?

It's safe in the sense that I have not heard of a single failure to
reconnect to already-running VMs in years. However, this is still not
recommended for production environments, particularly if one of
the VMs is defined as highly available. This can end with your host
being fenced and all your VMs dead.

Dan.


Re: [ovirt-users] VDSM memory consumption

2015-03-09 Thread Matt .
Hi,

I also see this on the latest 3.5 version, I'm thinking about setting
up a cronjob to restart vdsm every night.

I cannot believe that people say they don't have this issue.

Can one of the devs dive in, maybe?

Thanks!

Matt



2015-03-09 23:29 GMT+01:00 Dan Kenigsberg dan...@redhat.com:
 On Mon, Mar 09, 2015 at 10:40:51AM -0500, Darrell Budic wrote:
  On Mar 9, 2015, at 4:51 AM, Dan Kenigsberg dan...@redhat.com wrote:
 
  On Fri, Mar 06, 2015 at 10:58:53AM -0600, Darrell Budic wrote:
  I believe the supervdsm leak was fixed, but 3.5.1 versions of vdsmd still 
  leak slowly, ~300k/hr, yes.
 
  https://bugzilla.redhat.com/show_bug.cgi?id=1158108
 
 
  On Mar 6, 2015, at 10:23 AM, Chris Adams c...@cmadams.net wrote:
 
  Once upon a time, Federico Alberto Sayd fs...@uncu.edu.ar said:
  I am experiencing trouble with VDSM memory consumption.
 
  I am running
 
  Engine: ovirt 3.5.1
 
  Nodes:
 
  Centos 6.6
  VDSM 4.16.10-8
  Libvirt: libvirt-0.10.2-46
  Kernel: 2.6.32
 
  When the host boots, memory consumption is normal, but after 2 or 3
  days running, VDSM memory consumption grows and it consumes more
  memory than all the VMs running on the host. If I restart the vdsm
  service, memory consumption normalizes, but then it starts growing
  again.
 
  I have seen some BZs about vdsm and supervdsm memory leaks, but
  I don't know if VDSM 4.16.10-8 is still affected by a related bug.
 
  Can't help, but I see the same thing with CentOS 7 nodes and the same
  version of vdsm.
  --
  Chris Adams c...@cmadams.net
 
  I'm afraid that we are yet to find a solution for this issue, which is
  completely different from the horrible leak of supervdsm  4.16.7.
 
  Could you corroborate the claim of
 Bug 1147148 - M2Crypto usage in vdsm leaks memory
  ? Does the leak disappear once you start using plaintext transport?
 
  Regards,
  Dan.

 I don’t think this is crypto related, but I could try that if you still need 
 some confirmation (and point me at a quick doc on switching to plaintext?).

 This is from #ovirt around November 18th I think, Saggi thought he’d found 
 something related:

 9:58:43 AM saggi: YamakasY: Found the leak
 9:58:48 AM saggi: YamakasY: Or at least the flow
 9:58:57 AM saggi: YamakasY: The good news is that I can reproduce
 9:59:20 AM YamakasY: saggi: that's kewl!
 9:59:25 AM YamakasY: saggi: what happens ?
 9:59:41 AM YamakasY: I know from Telsin (ping ping!) that he sees it going 
 faster on gluster usage
 tdosek left the room (quit: Ping timeout: 480 seconds). (10:00:02 AM)
 djasa left the room (quit: Quit: Leaving). (10:00:24 AM)
 mlipchuk left the room (quit: Quit: Leaving.). (10:00:29 AM)
 laravot left the room (quit: Quit: Leaving.). (10:01:19 AM)
 10:01:54 AM saggi: YamakasY: it's in getCapabilities(). Here is the RSS 
 graph. The flatlines are when I stopped calling it and called other verbs. 
 http://i.imgur.com/CLm0Q75.png

 I do recall what issue Saggi and YamakasY were discussing (CCing
 the pair), or if it reached fruition as a patch. It is certainly
 something other than Bug 1158108, as the latter speaks about a leak in a
 normal working state, with no getCapabilities calls.




Re: [ovirt-users] VDSM memory consumption

2015-03-09 Thread Dan Kenigsberg
On Fri, Mar 06, 2015 at 10:58:53AM -0600, Darrell Budic wrote:
 I believe the supervdsm leak was fixed, but 3.5.1 versions of vdsmd still 
 leak slowly, ~300k/hr, yes.
 
 https://bugzilla.redhat.com/show_bug.cgi?id=1158108
 
 
  On Mar 6, 2015, at 10:23 AM, Chris Adams c...@cmadams.net wrote:
  
  Once upon a time, Federico Alberto Sayd fs...@uncu.edu.ar said:
  I am experiencing trouble with VDSM memory consumption.
  
  I am running
  
  Engine: ovirt 3.5.1
  
  Nodes:
  
  Centos 6.6
  VDSM 4.16.10-8
  Libvirt: libvirt-0.10.2-46
  Kernel: 2.6.32
  
  When the host boots, memory consumption is normal, but after 2 or 3
  days running, VDSM memory consumption grows and it consumes more
  memory than all the VMs running on the host. If I restart the vdsm
  service, memory consumption normalizes, but then it starts growing
  again.
  
  I have seen some BZs about vdsm and supervdsm memory leaks, but
  I don't know if VDSM 4.16.10-8 is still affected by a related bug.
  
  Can't help, but I see the same thing with CentOS 7 nodes and the same
  version of vdsm.
  -- 
  Chris Adams c...@cmadams.net

I'm afraid that we are yet to find a solution for this issue, which is
completely different from the horrible leak of supervdsm  4.16.7.

Could you corroborate the claim of
Bug 1147148 - M2Crypto usage in vdsm leaks memory
? Does the leak disappear once you start using plaintext transport?

Regards,
Dan.


[ovirt-users] VDSM memory consumption

2015-03-06 Thread Federico Alberto Sayd

Hello:

I am experiencing trouble with VDSM memory consumption.

I am running

Engine: ovirt 3.5.1

Nodes:

Centos 6.6
VDSM 4.16.10-8
Libvirt: libvirt-0.10.2-46
Kernel: 2.6.32

When the host boots, memory consumption is normal, but after 2 or 3 days 
running, VDSM memory consumption grows and it consumes more memory than 
all the VMs running on the host. If I restart the vdsm service, memory 
consumption normalizes, but then it starts growing again.


I have seen some BZs about vdsm and supervdsm memory leaks, but I 
don't know if VDSM 4.16.10-8 is still affected by a related bug.


Any help? If needed, I can provide more information.

Thank you


Re: [ovirt-users] VDSM memory consumption

2015-03-06 Thread Chris Adams
Once upon a time, Federico Alberto Sayd fs...@uncu.edu.ar said:
 I am experiencing trouble with VDSM memory consumption.
 
 I am running
 
 Engine: ovirt 3.5.1
 
 Nodes:
 
 Centos 6.6
 VDSM 4.16.10-8
 Libvirt: libvirt-0.10.2-46
 Kernel: 2.6.32
 
 When the host boots, memory consumption is normal, but after 2 or 3
 days running, VDSM memory consumption grows and it consumes more
 memory than all the VMs running on the host. If I restart the vdsm
 service, memory consumption normalizes, but then it starts growing
 again.
 
 I have seen some BZs about vdsm and supervdsm memory leaks, but
 I don't know if VDSM 4.16.10-8 is still affected by a related bug.

Can't help, but I see the same thing with CentOS 7 nodes and the same
version of vdsm.
-- 
Chris Adams c...@cmadams.net


Re: [ovirt-users] VDSM memory consumption

2015-03-06 Thread Darrell Budic
I believe the supervdsm leak was fixed, but 3.5.1 versions of vdsmd still leak 
slowly, ~300k/hr, yes.

https://bugzilla.redhat.com/show_bug.cgi?id=1158108
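To put ~300k/hr in perspective, and assuming "k" means kilobytes: 300 kB/h × 24 h = 7,200 kB (about 7 MiB) per day, or 21,600 kB over three days. The hypothetical helpers below compute an average leak rate from two (epoch_seconds, rss_kb) readings taken some hours apart and project it forward. The projection assumes the growth stays linear, which is only an approximation of how a real leak behaves.

```python
def leak_rate_kb_per_hour(first: tuple[float, int], last: tuple[float, int]) -> float:
    """Average RSS growth in kB per hour between two (epoch_seconds, rss_kb) samples."""
    (t0, rss0), (t1, rss1) = first, last
    hours = (t1 - t0) / 3600.0
    if hours <= 0:
        raise ValueError("the second sample must be later than the first")
    return (rss1 - rss0) / hours


def projected_growth_kb(rate_kb_per_hour: float, days: float) -> float:
    """RSS growth after `days` days if the rate stays constant (linear assumption)."""
    return rate_kb_per_hour * 24 * days


if __name__ == "__main__":
    # Illustrative readings three hours apart, not measurements from this thread.
    rate = leak_rate_kb_per_hour((0, 567_640), (3 * 3600, 568_540))
    print(f"{rate:.0f} kB/h, {projected_growth_kb(rate, 3):.0f} kB after 3 days")
```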


 On Mar 6, 2015, at 10:23 AM, Chris Adams c...@cmadams.net wrote:
 
 Once upon a time, Federico Alberto Sayd fs...@uncu.edu.ar said:
 I am experiencing trouble with VDSM memory consumption.
 
 I am running
 
 Engine: ovirt 3.5.1
 
 Nodes:
 
 Centos 6.6
 VDSM 4.16.10-8
 Libvirt: libvirt-0.10.2-46
 Kernel: 2.6.32
 
 When the host boots, memory consumption is normal, but after 2 or 3
 days running, VDSM memory consumption grows and it consumes more
 memory than all the VMs running on the host. If I restart the vdsm
 service, memory consumption normalizes, but then it starts growing
 again.
 
 I have seen some BZs about vdsm and supervdsm memory leaks, but
 I don't know if VDSM 4.16.10-8 is still affected by a related bug.
 
 Can't help, but I see the same thing with CentOS 7 nodes and the same
 version of vdsm.
 -- 
 Chris Adams c...@cmadams.net
