Re: [ovirt-devel] [VDSM] health: Introduce Vdsm health monitoring

2016-01-14 Thread Nir Soffer
Here are example logs from starting and stopping one vm:

# tail -f /var/log/vdsm/vdsm.log | grep health
Thread-13::DEBUG::2016-01-14
19:42:04,625::health::84::health::(_check) Checking health
Thread-13::DEBUG::2016-01-14
19:42:04,651::health::86::health::(_check) Collected 25 objects
Thread-13::WARNING::2016-01-14
19:42:04,656::health::100::health::(_check) Found 6085 uncollectible
objects, 0 suspects implementing __del__: []

Vm was started here...

Thread-13::DEBUG::2016-01-14
19:43:04,657::health::84::health::(_check) Checking health
Thread-13::DEBUG::2016-01-14
19:43:04,688::health::86::health::(_check) Collected 1680 objects
Thread-13::WARNING::2016-01-14
19:43:04,697::health::100::health::(_check) Found 11041 uncollectible
objects, 0 suspects implementing __del__: []
Thread-13::DEBUG::2016-01-14
19:44:04,698::health::84::health::(_check) Checking health
Thread-13::DEBUG::2016-01-14
19:44:04,729::health::86::health::(_check) Collected 57 objects
Thread-13::WARNING::2016-01-14
19:44:04,737::health::100::health::(_check) Found 11096 uncollectible
objects, 0 suspects implementing __del__: []
Thread-13::DEBUG::2016-01-14
19:45:04,738::health::84::health::(_check) Checking health
Thread-13::DEBUG::2016-01-14
19:45:04,769::health::86::health::(_check) Collected 69 objects
Thread-13::WARNING::2016-01-14
19:45:04,778::health::100::health::(_check) Found 11163 uncollectible
objects, 0 suspects implementing __del__: []

Vm was shut down here...

Thread-13::DEBUG::2016-01-14
19:46:04,779::health::84::health::(_check) Checking health
Thread-13::DEBUG::2016-01-14
19:46:04,809::health::86::health::(_check) Collected 2272 objects
Thread-13::WARNING::2016-01-14
19:46:04,819::health::100::health::(_check) Found 13464 uncollectible
objects, 28 suspects implementing __del__: [...28 object reprs
stripped by the mail archive...]

Need to test with https://gerrit.ovirt.org/51630, which should eliminate
some of these objects, and maybe also the pthreading objects.

Nir

On Thu, Jan 14, 2016 at 7:38 PM, Nir Soffer  wrote:
> Hi all,
>
> Continuing the leak investigation Francesco and Milan are working on,
> I posted this patch, adding leak health monitoring to vdsm [1]. This patch
> currently monitors collectible objects, and we may add other interesting
> stuff later if needed.
>
> To enable the monitor, you must set:
>
> [devel]
> health_monitor_enable = true
>
> in vdsm.conf, and restart vdsm.
>
> Here is an example log - this was logged 60 seconds after starting vdsm,
> when running as SPM:
>
> Thread-13::DEBUG::2016-01-14
> 18:53:43,239::health::84::health::(_check) Checking health
> Thread-13::DEBUG::2016-01-14
> 18:53:43,272::health::86::health::(_check) Collected 460 objects
> Thread-13::WARNING::2016-01-14
> 18:53:43,277::health::100::health::(_check) Found 5873 uncollectible
> objects, 10 suspects implementing __del__: [...10 object reprs
> stripped by the mail archive...]
>
> We found 5873 uncollectible objects, 10 of them implementing __del__.
> Having such an object in a reference cycle makes the entire cycle
> uncollectible.
>
> The pthreading objects in this log grow as vdsm continues to run, even without
> any interaction with engine. The pthreading objects leak is caused by a cycle
> in storage.misc.RWLock, fixed in patch [2].
>
> After solving the pthreading leak, we still have many uncollectible
> objects in the logs. Listing them, it seems that the suspects are hiding
> inside collections (dict, list, tuple) and do not appear at the top level
> of gc.garbage. I will address this in a later patch, searching the entire
> object graph instead of only the top level of gc.garbage.
>
> See the end of this mail for a list of these objects. I printed them like this:
>
> Enable manhole debugging shell:
>
> [devel]
> manhole_enable = true
>
> Connect to vdsm using manhole:
>
> nc -U /run/vdsm/vdsmd.manhole
>
> Print garbage:
>
> >>> import pprint
> >>> import gc
> >>> with open('/tmp/garbage.txt', 'w') as f:
> ...     pprint.pprint(gc.garbage, stream=f, indent=4)
>
> These objects look related to libvirt - Francesco, can you take a look?
>
> [1] https://gerrit.ovirt.org/51708
> [2] https://gerrit.ovirt.org/51868
>
> Nir
>
> --- garbage after patch [2] ---
>
> [...]

Re: [ovirt-devel] [VDSM] health: Introduce Vdsm health monitoring

2016-01-14 Thread Vinzenz Feenstra

> On Jan 14, 2016, at 6:38 PM, Nir Soffer  wrote:
> 
> Hi all,
> 
> Continuing the leak investigation Francesco and Milan are working on,
> I posted this patch, adding leak health monitoring to vdsm [1]. This patch
> currently monitors collectible objects, and we may add other interesting
> stuff later if needed.
> 
> To enable the monitor, you must set:
> 
> [devel]
> health_monitor_enable = true
> 
> in vdsm.conf, and restart vdsm.
> 
> Here is an example log - this was logged 60 seconds after starting vdsm,
> when running as SPM:
> 
> Thread-13::DEBUG::2016-01-14
> 18:53:43,239::health::84::health::(_check) Checking health
> Thread-13::DEBUG::2016-01-14
> 18:53:43,272::health::86::health::(_check) Collected 460 objects
> Thread-13::WARNING::2016-01-14
> 18:53:43,277::health::100::health::(_check) Found 5873 uncollectible
> objects, 10 suspects implementing __del__: [...10 object reprs
> stripped by the mail archive...]
> 
> We found 5873 uncollectible objects, 10 of them implementing __del__.
> Having such an object in a reference cycle makes the entire cycle
> uncollectible.
> 
> The pthreading objects in this log grow as vdsm continues to run, even without
> any interaction with engine. The pthreading objects leak is caused by a cycle
> in storage.misc.RWLock, fixed in patch [2].
> 
> After solving the pthreading leak, we still have many uncollectible
> objects in the logs. Listing them, it seems that the suspects are hiding
> inside collections (dict, list, tuple) and do not appear at the top level
> of gc.garbage. I will address this in a later patch, searching the entire
> object graph instead of only the top level of gc.garbage.
> 
> See the end of this mail for a list of these objects. I printed them like this:
> 
> Enable manhole debugging shell:
> 
> [devel]
> manhole_enable = true
> 
> Connect to vdsm using manhole:
> 
> nc -U /run/vdsm/vdsmd.manhole
> 
> Print garbage:
> 
> >>> import pprint
> >>> import gc
> >>> with open('/tmp/garbage.txt', 'w') as f:
> ...     pprint.pprint(gc.garbage, stream=f, indent=4)
> 
> These objects look related to libvirt - Francesco, can you take a look?
Since you’re addressing Francesco directly, I am adding him explicitly -
just to point that out, since someone might skip that little part here ;)
> 
> [1] https://gerrit.ovirt.org/51708
> [2] https://gerrit.ovirt.org/51868
> 
> Nir
> 
> --- garbage after patch [2] ---
> 
> [...]

[ovirt-devel] [VDSM] health: Introduce Vdsm health monitoring

2016-01-14 Thread Nir Soffer
Hi all,

Continuing the leak investigation Francesco and Milan are working on,
I posted this patch, adding leak health monitoring to vdsm [1]. This patch
currently monitors collectible objects, and we may add other interesting
stuff later if needed.

To enable the monitor, you must set:

[devel]
health_monitor_enable = true

in vdsm.conf, and restart vdsm.
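
For reference, a minimal sketch of the idea (an assumption about the shape
of the check, not the actual code of [1]): force a periodic garbage
collection and warn whenever gc.garbage is not empty:

import gc
import logging
import threading

def _check():
    logging.debug("Checking health")
    collected = gc.collect()
    logging.debug("Collected %d objects", collected)
    if gc.garbage:
        # Objects that survived collection; under Python 2, cycles
        # anchored by an object with __del__ end up here.
        suspects = [obj for obj in gc.garbage if hasattr(obj, '__del__')]
        logging.warning("Found %d uncollectible objects, %d suspects "
                        "implementing __del__: %s",
                        len(gc.garbage), len(suspects), suspects)

def start(interval=60):
    # Re-arm a timer so the check repeats, like Thread-13 in the logs.
    def run():
        _check()
        timer = threading.Timer(interval, run)
        timer.daemon = True
        timer.start()
    run()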

Here is an example log - this was logged 60 seconds after starting vdsm,
when running as SPM:

Thread-13::DEBUG::2016-01-14
18:53:43,239::health::84::health::(_check) Checking health
Thread-13::DEBUG::2016-01-14
18:53:43,272::health::86::health::(_check) Collected 460 objects
Thread-13::WARNING::2016-01-14
18:53:43,277::health::100::health::(_check) Found 5873 uncollectible
objects, 10 suspects implementing __del__: [...10 object reprs
stripped by the mail archive...]

We found 5873 uncollectible objects, 10 of them implementing __del__.
Having such an object in a reference cycle makes the entire cycle uncollectible.
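
To illustrate the mechanism with a toy example (Python 2 semantics, which
vdsm runs on):

import gc

class Leaky(object):
    def __del__(self):
        pass

a = Leaky()
a.self = a    # reference cycle: a -> a.self -> a
del a         # drop the last external reference

gc.collect()        # the collector finds the cycle but cannot free it
print(gc.garbage)   # under Python 2: [<__main__.Leaky object at 0x...>]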

The pthreading objects in this log grow as vdsm continues to run, even without
any interaction with engine. The pthreading objects leak is caused by a cycle
in storage.misc.RWLock, fixed in patch [2].
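
For context, here is a toy example of the general pattern (a sketch only,
not the actual storage.misc.RWLock code): storing a bound method on the
instance is enough to create a cycle, and anything the instance references
(such as its condition variables) is kept alive along with it:

import threading

class ToyRWLock(object):
    def __init__(self):
        self._cond = threading.Condition()
        # Bound methods hold a reference to self, so this creates a
        # cycle: self -> self.__dict__ -> bound method -> self
        self._releaser = self.release

    def release(self):
        with self._cond:
            self._cond.notify_all()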

After solving the pthreading leak, we still have many uncollectible
objects in the logs. Listing them, it seems that the suspects are hiding
inside collections (dict, list, tuple) and do not appear at the top level
of gc.garbage. I will address this in a later patch, searching the entire
object graph instead of only the top level of gc.garbage.
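
A possible shape for that search (a sketch of the approach, not the actual
follow-up patch): walk the graph below gc.garbage with gc.get_referents
and report anything that defines __del__:

import gc

def find_del_suspects(roots):
    """Return objects defining __del__ that are reachable from roots,
    even when they hide inside dicts, lists or tuples."""
    seen = set()
    stack = list(roots)
    suspects = []
    while stack:
        obj = stack.pop()
        if id(obj) in seen:
            continue
        seen.add(id(obj))
        if hasattr(obj, '__del__') and not isinstance(obj, type):
            suspects.append(obj)
        stack.extend(gc.get_referents(obj))
    return suspects

# From the manhole shell, for example:
# print(find_del_suspects(gc.garbage))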

See the end of this mail for a list of these objects. I printed them like this:

Enable manhole debugging shell:

[devel]
manhole_enable = true

Connect to vdsm using manhole:

nc -U /run/vdsm/vdsmd.manhole

Print garbage:

>>> import pprint
>>> import gc
>>> with open('/tmp/garbage.txt', 'w') as f:
...     pprint.pprint(gc.garbage, stream=f, indent=4)

These objects look related to libvirt - Francesco, can you take a look?

[1] https://gerrit.ovirt.org/51708
[2] https://gerrit.ovirt.org/51868

Nir

--- garbage after patch [2] ---

[   most object reprs in this dump were stripped by the mail archive;
    the recoverable entries include what looks like a ctypes array type
    dict from vdsm.netlink:

    {   '__dict__': ,
        '__doc__': None,
        '__module__': 'vdsm.netlink',
        '__weakref__': ,
        '_length_': 60,
        '_type_': ,
        'raw': ,
        'value': },

    and ElementTree element dicts, apparently from parsed libvirt XML:

    {   '_children': [],
        'attrib': {   'unit': 'KiB'},
        'tag': 'memory',
        'tail': '\n  ',
        'text': '4095476'},

    {   '_children': [],
        'attrib': {   'size': '4', 'unit': 'KiB'},
        'tag': 'pages',
        'tail': '\n  ',
        'text': '1023869'},

    the rest of the dump is truncated]

Re: [ovirt-devel] vdsm_master_unit-tests_merged is failing

2016-01-14 Thread Sandro Bonazzola
On Tue, Dec 22, 2015 at 12:04 PM, Dan Kenigsberg  wrote:

> On Tue, Dec 22, 2015 at 09:59:20AM +0200, Barak Korren wrote:
> > > Please remove it - unless you have plans to revert the
> > > automation/*-based approach.
> >
> > Since I don't know who wrote it, and what useful bits of code may be
> > hiding inside, I'd rather not make any irreversible changes while almost
> > everyone is on PTO.
>
> The job cherry-picks the now-abandoned https://gerrit.ovirt.org/#/c/34888/
> and runs
>
> https://gerrit.ovirt.org/#/c/34888/15/jobs/confs/shell-scripts/vdsm_unit-tests.sh
>
> This code is now placed under vdsm's automation dir, and there's no
> reason to look back.
>


ok, so has the job been deleted?






-- 
Sandro Bonazzola
Better technology. Faster innovation. Powered by community collaboration.
See how it works at redhat.com

[ovirt-devel] oVirt engine 4.0 will require WildFly 10 / EAP 7

2016-01-14 Thread Martin Perina
Hi,

we are going to merge patches which drop EAP 6 support [1] and require
WildFly 10 / EAP 7 [2] for oVirt engine master on Monday, January 18th
at 10:00 CET.

This is a huge step as we will finally be able to use all new features
provided by WildFly 10 (Java EE 7, Java 8, RESTEasy 3, ActiveMQ Artemis,
Hibernate 5, ...).
Upgrade of code base to use those new features will be done incrementally,
for example this patch [3] cleans up our CDI related code and upgrades
to CDI 1.2.

So on Monday please upgrade WildFly packages on your development machines
to:

  ovirt-engine-wildfly-10.0.0
  ovirt-engine-wildfly-overlay-10.0.0

and rebase your patches on top of latest master.

To ease backporting of patches to oVirt 3.6 we have already merged patches
which made oVirt 3.6 compatible with WildFly 10, but please bear in mind
that oVirt 3.6 has to be compatible with EAP 6 and there are no plans
to upgrade it to WildFly 10!


Please contact me or other members of infra team if you have any issues
with this upgrade.

Thanks


Martin Perina


[1] https://gerrit.ovirt.org/48208
[2] https://gerrit.ovirt.org/48209
[3] https://gerrit.ovirt.org/48305


Re: [ovirt-devel] VARIANT_ID usage - with or without oVirt version?

2016-01-14 Thread Moti Asayag
On Wed, Jan 13, 2016 at 5:17 PM, Fabian Deutsch  wrote:

> Hey,
>
> we've now merged a patch [0] to use and populate the VARIANT and
> VARIANT_ID fields on Node.
>
> Currently the value is something like "ovirt-node-$BRANCH", i.e.
> "ovirt-node-master" or "ovirt-node-3.6".
>
> I'd like to question if we should include the oVirt version in the ID,
> or if we should just use "ovirt-node" without the version.
>
> From my POV the variant does not depend on a specific version, which
> is why I'd like to discuss it.
>

+1
I agree the variant-id should not be version specific. It should only
describe the flavour of the host.
I don't see why the engine should be aware of the specific version of it,
especially since we'd like to have a unified process for all host types and
furthermore for the same host type of different versions.


>
> The oVirt version can still be retrieved like on any other host, i.e.
> using rpm or maybe some file(?).
>

Resolving the supported version of the hypervisor should be done the same
way as for any host by monitoring the capabilities as reported by VDSM.


>
> - fabian
>
> --
> [0]
> https://gerrit.ovirt.org/gitweb?p=ovirt-release.git;a=blob;f=ovirt-release-master/ovirt-release-master.spec.in;h=8690d39402221acac402a6f2f0c485571ad838fa;hb=HEAD#l140
> [1] http://www.freedesktop.org/software/systemd/man/os-release.html
>



-- 
Regards,
Moti

Re: [ovirt-devel] VARIANT_ID usage - with or without oVirt version?

2016-01-14 Thread Fabian Deutsch
On Thu, Jan 14, 2016 at 2:26 PM, Moti Asayag  wrote:
>
> On Wed, Jan 13, 2016 at 5:17 PM, Fabian Deutsch  wrote:
>>
>> Hey,
>>
>> we've now merged a patch [0] to use and populate the VARIANT and
>> VARIANT_ID fields on Node.
>>
>> Currently the value is something like "ovirt-node-$BRANCH", i.e.
>> "ovirt-node-master" or "ovirt-node-3.6".
>>
>> I'd like to question if we should include the oVirt version in the ID,
>> or if we should just use "ovirt-node" without the version.
>>
>> From my POV the variant does not depend on a specific version, which
>> is why I'd like to discuss it.
>
>
> +1
> I agree the variant-id should not be version specific. It should only
> describe the flavour of the host.
> I don't see why the engine should be aware of the specific version of it,
> especially since we'd like to have a unified process for all host types and
> furthermore for the same host type of different versions.
>
>>
>>
>> The oVirt version can still be retrieved like on any other host, i.e.
>> using rpm or maybe some file(?).
>
>
> Resolving the supported version of the hypervisor should be done the same
> way as for any host by monitoring the capabilities as reported by VDSM.
>

Perfect, that all makes sense to me as well.

- fabian


-- 
Fabian Deutsch 
RHEV Hypervisor
Red Hat


Re: [ovirt-devel] VARIANT_ID usage - with or without oVirt version?

2016-01-14 Thread Douglas Schilling Landgraf



On 01/14/2016 08:26 AM, Moti Asayag wrote:


> On Wed, Jan 13, 2016 at 5:17 PM, Fabian Deutsch  wrote:
>
>> Hey,
>>
>> we've now merged a patch [0] to use and populate the VARIANT and
>> VARIANT_ID fields on Node.
>>
>> Currently the value is something like "ovirt-node-$BRANCH", i.e.
>> "ovirt-node-master" or "ovirt-node-3.6".
>>
>> I'd like to question if we should include the oVirt version in the ID,
>> or if we should just use "ovirt-node" without the version.
>>
>> From my POV the variant does not depend on a specific version, which
>> is why I'd like to discuss it.

My point of view is that variant_id doesn't depend on a specific version;
it only shows the 'flavor' of the distro and may or may not include
numbers, as the link [1] showed.


One benefit of having the branding/oVirt release in the variant id is that
the new oVirt Node uses Cockpit, which reads /etc/os-release (ID +
VARIANT_ID) to show such data to the users on the login page, i.e.:

"CentOS oVirt Node 3.6"

Username:
Password:
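
For illustration, a hypothetical /etc/os-release along the lines discussed
(made-up values showing a version-less VARIANT_ID next to a versioned
VERSION_ID; not the actual Node file):

NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
VARIANT="oVirt Node"
VARIANT_ID="ovirt-node"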



> +1
> I agree the variant-id should not be version specific. It should
> only describe the flavour of the host.
> I don't see why the engine should be aware of the specific version of
> it, especially since we'd like to have a unified process for all host
> types and furthermore for the same host type of different versions.


I agree that Engine shouldn't care about a specific version at all but 
probably VDSM will be sending /etc/os-release to Engine for displaying 
data to the users.




>> The oVirt version can still be retrieved like on any other host, i.e.
>> using rpm or maybe some file(?).


> Resolving the supported version of the hypervisor should be done the
> same way as for any host by monitoring the capabilities as reported by
> VDSM.



>> - fabian

>> --
>> [0]
>> https://gerrit.ovirt.org/gitweb?p=ovirt-release.git;a=blob;f=ovirt-release-master/ovirt-release-master.spec.in;h=8690d39402221acac402a6f2f0c485571ad838fa;hb=HEAD#l140
>> [1] http://www.freedesktop.org/software/systemd/man/os-release.html




> --
> Regards,
> Moti

