> On Mar 25, 2015, at 5:34 AM, Dan Kenigsberg <[email protected]> wrote:
>
> On Tue, Mar 24, 2015 at 02:01:40PM -0500, Darrell Budic wrote:
>>
>>> On Mar 24, 2015, at 4:33 AM, Dan Kenigsberg <[email protected]> wrote:
>>>
>>> On Mon, Mar 23, 2015 at 04:00:14PM -0400, John Taylor wrote:
>>>> Chris Adams <[email protected]> writes:
>>>>
>>>>> Once upon a time, Sven Kieske <[email protected]> said:
>>>>>> On 13/03/15 12:29, Kapetanakis Giannis wrote:
>>>>>>> We also face this problem since 3.5 in two different installations...
>>>>>>> Hope it's fixed soon
>>>>>>
>>>>>> Nothing will get fixed if no one bothers to
>>>>>> open BZs and send relevant log files to help
>>>>>> track down the problems.
>>>>>
>>>>> There's already an open BZ:
>>>>>
>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1158108
>>>>>
>>>>> I'm not sure if that is exactly the same problem I'm seeing or not; my
>>>>> vdsm process seems to be growing faster (RSS grew 952K in a 5-minute
>>>>> period just now; VSZ didn't change).
>>>>
>>>> For those following this, I've added a comment on the BZ [1], although in
>>>> my case the memory leak is, like Chris Adams's, a lot more than the
>>>> 300 KiB/h in the original bug report by Daniel Helgenberger.
>>>>
>>>> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1158108
>>>
>>> That's interesting (and worrying).
>>> Could you check your suggestion by editing sampling.py so that
>>> _get_interfaces_and_samples() returns the empty dict immediately?
>>> Would this make the leak disappear?
>>
>> Looks like you’ve got something there. Just a quick test for now, watching
>> RSS in top (a /proc-based spot check is sketched at the end of this
>> message). I’ll let it run this way for a while and see what it looks like
>> in a few hours.
>>
>> System 1: 13 VMs w/ 24 interfaces between them
>>
>> 11:47 killed a vdsm @ 9.116G RSS (after maybe a week and a half running)
>>
>> 11:47: 97xxx
>> 11:57: 135544 and climbing
>> 12:00: 136400
>>
>> restarted with sampling.py modified to just return an empty dict:
>>
>> def _get_interfaces_and_samples():
>>     links_and_samples = {}
>>     return links_and_samples
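>>
>> For reference, the RSS numbers here are read from top; the same figure
>> can be spot-checked from /proc. A minimal sketch, where the pgrep-based
>> pid lookup is illustrative rather than exact:
>>
>>     import subprocess
>>
>>     def vdsm_rss_kib():
>>         # Oldest process named exactly 'vdsm' (illustrative lookup)
>>         pid = subprocess.check_output(
>>             ['pgrep', '-ox', 'vdsm']).decode().strip()
>>         with open('/proc/%s/status' % pid) as status:
>>             for line in status:
>>                 if line.startswith('VmRSS:'):
>>                     return int(line.split()[1])  # VmRSS is in KiB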
>
> Thanks for the input. Just to be a little more certain that the culprit
> is _get_interfaces_and_samples() per se, would you please decorate it
> with memoized, and add a log line at the end:
>
> @utils.memoized  # add this line
> def _get_interfaces_and_samples():
>     ...
>     logging.debug('LINKS %s', links_and_samples)  # and this line
>     return links_and_samples
>
> I'd like to see what happens when the function is run only once, and
> returns a non-empty reasonable dictionary of links and samples.
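>
> Roughly, memoized caches the first result and returns it on every later
> call, so the function body runs only once. A simplified sketch of the
> idea (vdsm's actual utils.memoized may differ in detail):
>
>     import functools
>
>     def memoized(f):
>         # Cache keyed by the positional arguments of each call.
>         cache = {}
>         @functools.wraps(f)
>         def wrapper(*args):
>             if args not in cache:
>                 cache[args] = f(*args)  # computed once per args tuple
>             return cache[args]
>         return wrapper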
Looks similar. I modified my second server for this test:
12:25, still growing from yesterday: 544512
restarted with mods for logging and memoize:
stabilized @ 12:32: 114284
1:23: 115300
Thread-12::DEBUG::2015-03-25
12:28:08,080::sampling::243::root::(_get_interfaces_and_samples) LINKS
{'vnet18': <virt.sampling.InterfaceSample instance at 0x7f38c03e85f0>,
'vnet19': <virt.sampling.InterfaceSample instance at 0x7f38b42cbcf8>, 'bond0':
<virt.sampling.InterfaceSample instance at 0x7f38b429afc8>, 'vnet13':
<virt.sampling.InterfaceSample instance at 0x7f38b42c8680>, 'vnet16':
<virt.sampling.InterfaceSample instance at 0x7f38b42cb368>, 'private':
<virt.sampling.InterfaceSample instance at 0x7f38b42b8bd8>, 'bond0.100':
<virt.sampling.InterfaceSample instance at 0x7f38b42bdd88>, 'vnet0':
<virt.sampling.InterfaceSample instance at 0x7f38b42c1f80>, 'enp3s0':
<virt.sampling.InterfaceSample instance at 0x7f38b429cef0>, 'vnet2':
<virt.sampling.InterfaceSample instance at 0x7f38b42bbbd8>, 'vnet3':
<virt.sampling.InterfaceSample instance at 0x7f38b42c37e8>, 'vnet4':
<virt.sampling.InterfaceSample instance at 0x7f38b42c5518>, 'vnet5':
<virt.sampling.InterfaceSample instance at 0x7f38b42c6ab8>, 'vnet6':
<virt.sampling.InterfaceSample instance at 0x7f38b42c7248>, 'vnet7':
<virt.sampling.InterfaceSample instance at 0x7f38c03e7a28>, 'vnet8':
<virt.sampling.InterfaceSample instance at 0x7f38b42c7c20>, 'bond0.1100':
<virt.sampling.InterfaceSample instance at 0x7f38b42be710>, 'bond0.1103':
<virt.sampling.InterfaceSample instance at 0x7f38b429dc68>, 'ovirtmgmt':
<virt.sampling.InterfaceSample instance at 0x7f38b42b16c8>, 'lo':
<virt.sampling.InterfaceSample instance at 0x7f38b429a8c0>, 'vnet22':
<virt.sampling.InterfaceSample instance at 0x7f38c03e7128>, 'vnet21':
<virt.sampling.InterfaceSample instance at 0x7f38b42cd368>, 'vnet20':
<virt.sampling.InterfaceSample instance at 0x7f38b42cc7a0>, 'internet':
<virt.sampling.InterfaceSample instance at 0x7f38b42aa098>, 'bond0.1203':
<virt.sampling.InterfaceSample instance at 0x7f38b42aa8c0>, 'bond0.1223':
<virt.sampling.InterfaceSample instance at 0x7f38b42bb128>, 'XXXXXXXXXXX':
<virt.sampling.InterfaceSample instance at 0x7f38b42bee60>, 'XXXXXXX':
<virt.sampling.InterfaceSample instance at 0x7f38b42beef0>, ';vdsmdummy;':
<virt.sampling.InterfaceSample instance at 0x7f38b42bdc20>, 'vnet14':
<virt.sampling.InterfaceSample instance at 0x7f38b42ca050>, 'mgmt':
<virt.sampling.InterfaceSample instance at 0x7f38b42be248>, 'vnet15':
<virt.sampling.InterfaceSample instance at 0x7f38b42cab00>, 'enp2s0':
<virt.sampling.InterfaceSample instance at 0x7f38b429c200>, 'bond0.1110':
<virt.sampling.InterfaceSample instance at 0x7f38b42bed40>, 'vnet1':
<virt.sampling.InterfaceSample instance at 0x7f38b42c27e8>, 'bond0.1233':
<virt.sampling.InterfaceSample instance at 0x7f38b42bedd0>, 'bond0.1213':
<virt.sampling.InterfaceSample instance at 0x7f38b42b2128>}
Didn’t see the significant CPU use difference on this one, so I’m thinking it
was all ksmd in yesterday’s tests.
Yesterday’s test is still going, and still hovering around 135016 or so.