> On Mar 25, 2015, at 5:34 AM, Dan Kenigsberg <dan...@redhat.com> wrote:
> 
> On Tue, Mar 24, 2015 at 02:01:40PM -0500, Darrell Budic wrote:
>> 
>>> On Mar 24, 2015, at 4:33 AM, Dan Kenigsberg <dan...@redhat.com> wrote:
>>> 
>>> On Mon, Mar 23, 2015 at 04:00:14PM -0400, John Taylor wrote:
>>>> Chris Adams <c...@cmadams.net> writes:
>>>> 
>>>>> Once upon a time, Sven Kieske <s.kie...@mittwald.de> said:
>>>>>> On 13/03/15 12:29, Kapetanakis Giannis wrote:
>>>>>>> We also face this problem since 3.5 in two different installations...
>>>>>>> Hope it's fixed soon
>>>>>> 
>>>>>> Nothing will get fixed if no one bothers to
>>>>>> open BZs and send relevants log files to help
>>>>>> track down the problems.
>>>>> 
>>>>> There's already an open BZ:
>>>>> 
>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1158108
>>>>> 
>>>>> I'm not sure if that is exactly the same problem I'm seeing or not; my
>>>>> vdsm process seems to be growing faster (RSS grew 952K in a 5 minute
>>>>> period just now; VSZ didn't change).
>>>> 
>>>> For those following this I've added a comment on the bz [1], although in
>>>> my case the memory leak is, like Chris Adams, a lot more than the 300KiB/h
>>>> in the original bug report by Daniel Helgenberger .
>>>> 
>>>> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1158108
>>> 
>>> That's interesting (and worrying).
>>> Could you check your suggestion by editing sampling.py so that
>>> _get_interfaces_and_samples() returns the empty dict immediately?
>>> Would this make the leak disappear?
>> 
>> Looks like you’ve got something there. Just a quick test for now, watching
>> RSS in top. I’ll let it go this way for a while and see what it looks like
>> in a few hours.
>> 
>> System 1: 13 VMs w/ 24 interfaces between them
>> 
>> 11:47 killed a vdsm @ 9.116G RSS (after maybe a week and a half running)
>> 
>> 11:47: 97xxx
>> 11:57 135544 and climbing
>> 12:00 136400
>> 
>> restarted with sampling.py modified to just return an empty dict:
>> 
>> def _get_interfaces_and_samples():
>>    links_and_samples = {}
>>    return links_and_samples
> 
> Thanks for the input. Just to be a little more certain that the culprit
> is _get_interfaces_and_samples() per se, would you please decorate it
> with memoized, and add a log line at the end:
> 
> @utils.memoized   # add this line
> def _get_interfaces_and_samples():
>    ...
>    logging.debug('LINKS %s', links_and_samples)  ## and this line
>    return links_and_samples
> 
> I'd like to see what happens when the function is run only once, and
> returns a non-empty reasonable dictionary of links and samples.
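
For context, memoizing here just means caching the first return value so that
later calls hand back the same dict instead of re-walking the links. A minimal
sketch of that pattern (generic Python, not vdsm's actual utils.memoized
implementation) would be:

import functools

def memoized(func):
    # Cache the first return value of a no-argument function; every later
    # call reuses it, so the sampling body runs exactly once per process.
    # Generic illustration only -- vdsm ships its own utils.memoized helper.
    cache = []

    @functools.wraps(func)
    def wrapper():
        if not cache:
            cache.append(func())
        return cache[0]
    return wrapper

With the decorator applied, the LINKS debug line should show up once shortly
after vdsm starts and then never again, which matches the single entry in the
log excerpt below.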

Looks similar; I modified my second server for this test:

12:25, still growing from yesterday: 544512

restarted with mods for logging and memoize:
stabilized @ 12:32: 114284
1:23: 115300

Thread-12::DEBUG::2015-03-25 
12:28:08,080::sampling::243::root::(_get_interfaces_and_samples) LINKS 
{'vnet18': <virt.sampling.InterfaceSample instance at 0x7f38c03e85f0>, 
'vnet19': <virt.sampling.InterfaceSample instance at 0x7f38b42cbcf8>, 'bond0': 
<virt.sampling.InterfaceSample instance at 0x7f38b429afc8>, 'vnet13': 
<virt.sampling.InterfaceSample instance at 0x7f38b42c8680>, 'vnet16': 
<virt.sampling.InterfaceSample instance at 0x7f38b42cb368>, 'private': 
<virt.sampling.InterfaceSample instance at 0x7f38b42b8bd8>, 'bond0.100': 
<virt.sampling.InterfaceSample instance at 0x7f38b42bdd88>, 'vnet0': 
<virt.sampling.InterfaceSample instance at 0x7f38b42c1f80>, 'enp3s0': 
<virt.sampling.InterfaceSample instance at 0x7f38b429cef0>, 'vnet2': 
<virt.sampling.InterfaceSample instance at 0x7f38b42bbbd8>, 'vnet3': 
<virt.sampling.InterfaceSample instance at 0x7f38b42c37e8>, 'vnet4': 
<virt.sampling.InterfaceSample instance at 0x7f38b42c5518>, 'vnet5': 
<virt.sampling.InterfaceSample instance at 0x7f38b42c6ab8>, 'vnet6': 
<virt.sampling.InterfaceSample instance at 0x7f38b42c7248>, 'vnet7': 
<virt.sampling.InterfaceSample instance at 0x7f38c03e7a28>, 'vnet8': 
<virt.sampling.InterfaceSample instance at 0x7f38b42c7c20>, 'bond0.1100': 
<virt.sampling.InterfaceSample instance at 0x7f38b42be710>, 'bond0.1103': 
<virt.sampling.InterfaceSample instance at 0x7f38b429dc68>, 'ovirtmgmt': 
<virt.sampling.InterfaceSample instance at 0x7f38b42b16c8>, 'lo': 
<virt.sampling.InterfaceSample instance at 0x7f38b429a8c0>, 'vnet22': 
<virt.sampling.InterfaceSample instance at 0x7f38c03e7128>, 'vnet21': 
<virt.sampling.InterfaceSample instance at 0x7f38b42cd368>, 'vnet20': 
<virt.sampling.InterfaceSample instance at 0x7f38b42cc7a0>, 'internet': 
<virt.sampling.InterfaceSample instance at 0x7f38b42aa098>, 'bond0.1203': 
<virt.sampling.InterfaceSample instance at 0x7f38b42aa8c0>, 'bond0.1223': 
<virt.sampling.InterfaceSample instance at 0x7f38b42bb128>, 'XXXXXXXXXXX': 
<virt.sampling.InterfaceSample instance at 0x7f38b42bee60>, 'XXXXXXX': 
<virt.sampling.InterfaceSample instance at 0x7f38b42beef0>, ';vdsmdummy;': 
<virt.sampling.InterfaceSample instance at 0x7f38b42bdc20>, 'vnet14': 
<virt.sampling.InterfaceSample instance at 0x7f38b42ca050>, 'mgmt': 
<virt.sampling.InterfaceSample instance at 0x7f38b42be248>, 'vnet15': 
<virt.sampling.InterfaceSample instance at 0x7f38b42cab00>, 'enp2s0': 
<virt.sampling.InterfaceSample instance at 0x7f38b429c200>, 'bond0.1110': 
<virt.sampling.InterfaceSample instance at 0x7f38b42bed40>, 'vnet1': 
<virt.sampling.InterfaceSample instance at 0x7f38b42c27e8>, 'bond0.1233': 
<virt.sampling.InterfaceSample instance at 0x7f38b42bedd0>, 'bond0.1213': 
<virt.sampling.InterfaceSample instance at 0x7f38b42b2128>}

Didn’t see a significant CPU use difference on this one, so I’m thinking it was
all ksmd in yesterday’s tests.

Yesterday’s test is still going, and still hovering around 135016 or so.
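
The RSS numbers quoted above were simply read off top at the times shown; for
anyone who wants to capture the same data unattended, a small Python loop
reading VmRSS out of /proc does the job (the pid and interval below are
placeholders, not values from this host):

import time

def rss_kib(pid):
    # Read VmRSS (resident set size, in KiB) from /proc/<pid>/status (Linux).
    with open('/proc/%d/status' % pid) as f:
        for line in f:
            if line.startswith('VmRSS:'):
                return int(line.split()[1])

pid = 12345              # placeholder: substitute the actual vdsm pid
while True:
    print('%s %s' % (time.strftime('%H:%M'), rss_kib(pid)))
    time.sleep(300)      # one sample every five minutes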

