> On Mar 25, 2015, at 5:34 AM, Dan Kenigsberg <dan...@redhat.com> wrote:
>
> On Tue, Mar 24, 2015 at 02:01:40PM -0500, Darrell Budic wrote:
>>
>>> On Mar 24, 2015, at 4:33 AM, Dan Kenigsberg <dan...@redhat.com> wrote:
>>>
>>> On Mon, Mar 23, 2015 at 04:00:14PM -0400, John Taylor wrote:
>>>> Chris Adams <c...@cmadams.net> writes:
>>>>
>>>>> Once upon a time, Sven Kieske <s.kie...@mittwald.de> said:
>>>>>> On 13/03/15 12:29, Kapetanakis Giannis wrote:
>>>>>>> We also face this problem since 3.5 in two different installations...
>>>>>>> Hope it's fixed soon
>>>>>>
>>>>>> Nothing will get fixed if no one bothers to
>>>>>> open BZs and send relevant log files to help
>>>>>> track down the problems.
>>>>>
>>>>> There's already an open BZ:
>>>>>
>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1158108
>>>>>
>>>>> I'm not sure if that is exactly the same problem I'm seeing or not; my
>>>>> vdsm process seems to be growing faster (RSS grew 952K in a 5-minute
>>>>> period just now; VSZ didn't change).
>>>>
>>>> For those following this, I've added a comment on the bz [1], although in
>>>> my case the memory leak is, like Chris Adams's, a lot more than the
>>>> 300KiB/h in the original bug report by Daniel Helgenberger.
>>>>
>>>> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1158108
>>>
>>> That's interesting (and worrying).
>>> Could you check your suggestion by editing sampling.py so that
>>> _get_interfaces_and_samples() returns the empty dict immediately?
>>> Would this make the leak disappear?
>>
>> Looks like you've got something there. Just a quick test for now, watching
>> RSS in top. I'll let it go this way for a while and see what it looks like
>> in a few hours.
>>
>> System 1: 13 VMs w/ 24 interfaces between them
>>
>> 11:47 killed a vdsm @ 9.116G RSS (after maybe a week and a half running)
>>
>> 11:47: 97xxx
>> 11:57: 135544 and climbing
>> 12:00: 136400
>>
>> restarted with sampling.py modified to just return an empty dict:
>>
>> def _get_interfaces_and_samples():
>>     links_and_samples = {}
>>     return links_and_samples
>
> Thanks for the input. Just to be a little more certain that the culprit
> is _get_interfaces_and_samples() per se, would you please decorate it
> with memoized, and add a log line at the end:
>
> @utils.memoized  # add this line
> def _get_interfaces_and_samples():
>     ...
>     logging.debug('LINKS %s', links_and_samples)  # and this line
>     return links_and_samples
>
> I'd like to see what happens when the function is run only once, and
> returns a non-empty, reasonable dictionary of links and samples.
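[For readers following along: the suggested `utils.memoized` caches the function's result, so after the first call every later call returns the stored dictionary without re-running the body. A minimal sketch of such a decorator; this is illustrative and not vdsm's actual implementation, and the `get_samples` example function is made up:

```python
import functools

def memoized(f):
    """Cache f's results by positional arguments, so each distinct call
    executes the body only once; later calls return the cached value."""
    cache = {}

    @functools.wraps(f)
    def wrapper(*args):
        if args not in cache:
            cache[args] = f(*args)
        return cache[args]
    return wrapper

calls = []  # records each real execution of the body

@memoized
def get_samples():
    calls.append(1)
    return {'lo': 'sample'}

first = get_samples()
second = get_samples()
# the body ran once; the second call returned the same cached dict
```

With the real `_get_interfaces_and_samples()` memoized this way, the sampling loop keeps reusing one frozen dict, which is exactly why the test below isolates whether repeated execution of that function is what leaks.]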
Looks similar; I modified my second server for this test:

12:25, still growing from yesterday: 544512

restarted with mods for logging and memoize:

stabilized @ 12:32: 114284
1:23: 115300

Thread-12::DEBUG::2015-03-25 12:28:08,080::sampling::243::root::(_get_interfaces_and_samples) LINKS
{'vnet18': <virt.sampling.InterfaceSample instance at 0x7f38c03e85f0>,
 'vnet19': <virt.sampling.InterfaceSample instance at 0x7f38b42cbcf8>,
 'bond0': <virt.sampling.InterfaceSample instance at 0x7f38b429afc8>,
 'vnet13': <virt.sampling.InterfaceSample instance at 0x7f38b42c8680>,
 'vnet16': <virt.sampling.InterfaceSample instance at 0x7f38b42cb368>,
 'private': <virt.sampling.InterfaceSample instance at 0x7f38b42b8bd8>,
 'bond0.100': <virt.sampling.InterfaceSample instance at 0x7f38b42bdd88>,
 'vnet0': <virt.sampling.InterfaceSample instance at 0x7f38b42c1f80>,
 'enp3s0': <virt.sampling.InterfaceSample instance at 0x7f38b429cef0>,
 'vnet2': <virt.sampling.InterfaceSample instance at 0x7f38b42bbbd8>,
 'vnet3': <virt.sampling.InterfaceSample instance at 0x7f38b42c37e8>,
 'vnet4': <virt.sampling.InterfaceSample instance at 0x7f38b42c5518>,
 'vnet5': <virt.sampling.InterfaceSample instance at 0x7f38b42c6ab8>,
 'vnet6': <virt.sampling.InterfaceSample instance at 0x7f38b42c7248>,
 'vnet7': <virt.sampling.InterfaceSample instance at 0x7f38c03e7a28>,
 'vnet8': <virt.sampling.InterfaceSample instance at 0x7f38b42c7c20>,
 'bond0.1100': <virt.sampling.InterfaceSample instance at 0x7f38b42be710>,
 'bond0.1103': <virt.sampling.InterfaceSample instance at 0x7f38b429dc68>,
 'ovirtmgmt': <virt.sampling.InterfaceSample instance at 0x7f38b42b16c8>,
 'lo': <virt.sampling.InterfaceSample instance at 0x7f38b429a8c0>,
 'vnet22': <virt.sampling.InterfaceSample instance at 0x7f38c03e7128>,
 'vnet21': <virt.sampling.InterfaceSample instance at 0x7f38b42cd368>,
 'vnet20': <virt.sampling.InterfaceSample instance at 0x7f38b42cc7a0>,
 'internet': <virt.sampling.InterfaceSample instance at 0x7f38b42aa098>,
 'bond0.1203': <virt.sampling.InterfaceSample instance at 0x7f38b42aa8c0>,
 'bond0.1223': <virt.sampling.InterfaceSample instance at 0x7f38b42bb128>,
 'XXXXXXXXXXX': <virt.sampling.InterfaceSample instance at 0x7f38b42bee60>,
 'XXXXXXX': <virt.sampling.InterfaceSample instance at 0x7f38b42beef0>,
 ';vdsmdummy;': <virt.sampling.InterfaceSample instance at 0x7f38b42bdc20>,
 'vnet14': <virt.sampling.InterfaceSample instance at 0x7f38b42ca050>,
 'mgmt': <virt.sampling.InterfaceSample instance at 0x7f38b42be248>,
 'vnet15': <virt.sampling.InterfaceSample instance at 0x7f38b42cab00>,
 'enp2s0': <virt.sampling.InterfaceSample instance at 0x7f38b429c200>,
 'bond0.1110': <virt.sampling.InterfaceSample instance at 0x7f38b42bed40>,
 'vnet1': <virt.sampling.InterfaceSample instance at 0x7f38b42c27e8>,
 'bond0.1233': <virt.sampling.InterfaceSample instance at 0x7f38b42bedd0>,
 'bond0.1213': <virt.sampling.InterfaceSample instance at 0x7f38b42b2128>}

Didn't see the significant CPU use difference on this one, so I'm thinking
it was all ksmd on yesterday's tests. Yesterday's test is still going, and
still hovering around 135016 or so.

_______________________________________________
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users
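[The RSS figures quoted throughout this thread came from watching top by hand. On Linux the same number can be sampled programmatically from /proc/<pid>/status, which makes before/after comparisons like the 11:47/11:57/12:00 readings easy to script. A minimal sketch, not part of vdsm; the helper name is mine and it is Linux-only:

```python
import os

def rss_kib(pid):
    """Return the resident set size of `pid` in KiB, read from
    /proc/<pid>/status (the VmRSS field; Linux only)."""
    with open('/proc/%d/status' % pid) as f:
        for line in f:
            if line.startswith('VmRSS:'):
                # line looks like "VmRSS:    114284 kB"
                return int(line.split()[1])
    raise ValueError('VmRSS not found for pid %d' % pid)

# Sample our own process as a demonstration; for vdsm you would pass
# the vdsm pid instead and log successive samples to spot growth.
current = rss_kib(os.getpid())
```

Taking such a sample every few minutes and diffing consecutive values gives the same kind of growth numbers reported above (e.g. 952K over 5 minutes) without staring at top.]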