Re: [vdsm] supervdsmServer: Enable logging from multiprocessing library

2013-10-02 Thread Yaniv Bronheim


- Original Message -
> From: "Dan Kenigsberg" 
> To: "David Gibson" 
> Cc: "Tomas Dosek" , vdsm-devel@lists.fedorahosted.org
> Sent: Wednesday, October 2, 2013 11:21:30 AM
> Subject: Re: [vdsm] supervdsmServer: Enable logging from multiprocessing 
> library
> 
> On Wed, Oct 02, 2013 at 03:59:58PM +1000, David Gibson wrote:
> > On Tue, 01 Oct 2013 20:25:21 +0100
> > Lee Yarwood  wrote:
> > 
> > > On 10/01/2013 06:35 PM, Dan Kenigsberg wrote:
> > > > On Tue, Oct 01, 2013 at 02:33:00PM +0100, Lee Yarwood wrote:
> > > >> On 10/01/2013 09:00 AM, Dan Kenigsberg wrote:
> > > >>> It is prefered to post patches to gerrit.ovirt.org.
> > > >>
> > > >> Apologies for jumping in David but I've pushed this here for now :
> > > >>
> > > >> http://gerrit.ovirt.org/19741
> > > > 
> > > > Thanks!
> > 
> > Thanks, I'm new to ovirt development and gerrit, so I'm going to need
> > to work that out.
> > 
> > > >>> On Tue, Oct 01, 2013 at 01:18:25PM +1000, David Gibson wrote:
> > > >>>> At present, if the super vdsm server dies with an exception inside
> > > >>>> Python's multiprocessing module, then it will not usually produce
> > > >>>> any
> > > >>>> useful debugging output.
> > > >>>
> > > >>> For our context - when do you notice such supervdsm deaths?
> > > >>> Is it frequent? What is the cause?
> > > >>
> > > >> BZ#1011661 & BZ#1010030 downstream.
> > > > 
> > > > Ok, I can see them, dig into them and find an answer to my question.
> > > > But
> > > > it's not fair to the wider community of users and partner to cite
> > > > private bugs.
> > > > 
> > > > https://www.berrange.com/posts/2012/06/27/thoughts-on-improving-openstack-git-commit-practicehistory/
> > > 
> > > Apologies Dan,
> > > 
> > > I believe David was referring to the public BZ#1011661. I believe that
> > > has been attributed to the following change merged upstream in May :
> > > 
> > > http://gerrit.ovirt.org/#/c/14998
> > 
> > Uh, the problem's not attributed to that, rather that patch fixes it.
> > The problem was that the process ctimes were changing, leading vdsm to
> > erroneously think that supervdsm had died and restarting it.  That lead
> > to several complications, including the supervdsm servers failing
> > silently due to lack of logging from multiprocessing.
> 
> As much as I hate restarting supervdsmd service, I'm so glad that these
> issues are solved in ovirt-3.3.
> 

agree.. much better practice to work imho.

> > 
> > We don't yet know why the ctimes were changing in this particular
> > customer environment.
> 
> a longshot: maybe a system date change managed to confuse vdsm?
> 

Might.. we saw that only once before when we tried to run load test in ovirt 
with many hosts. For that we used vms as hosts with nested kvm. each fake host 
ran vms.
In this case we saw that the ctime was changed during vdsm run, and we end up 
with many instances of the process. Nothing in the system was changed during 
the run.

Anyhow, ctime shouldn't be used to point on process creation time, the manual 
on stat alerts about it. Maybe literally the inode was changed, and maybe Dan's 
longshot hits the cause. Any new hints?

Yaniv Bronhaim.

> Dan.
> ___
> vdsm-devel mailing list
> vdsm-devel@lists.fedorahosted.org
> https://lists.fedorahosted.org/mailman/listinfo/vdsm-devel
> 
___
vdsm-devel mailing list
vdsm-devel@lists.fedorahosted.org
https://lists.fedorahosted.org/mailman/listinfo/vdsm-devel


Re: [vdsm] supervdsmServer: Enable logging from multiprocessing library

2013-10-02 Thread Dan Kenigsberg
On Wed, Oct 02, 2013 at 03:59:58PM +1000, David Gibson wrote:
> On Tue, 01 Oct 2013 20:25:21 +0100
> Lee Yarwood  wrote:
> 
> > On 10/01/2013 06:35 PM, Dan Kenigsberg wrote:
> > > On Tue, Oct 01, 2013 at 02:33:00PM +0100, Lee Yarwood wrote:
> > >> On 10/01/2013 09:00 AM, Dan Kenigsberg wrote:
> > >>> It is prefered to post patches to gerrit.ovirt.org.
> > >>
> > >> Apologies for jumping in David but I've pushed this here for now :
> > >>
> > >> http://gerrit.ovirt.org/19741
> > > 
> > > Thanks!
> 
> Thanks, I'm new to ovirt development and gerrit, so I'm going to need
> to work that out.
> 
> > >>> On Tue, Oct 01, 2013 at 01:18:25PM +1000, David Gibson wrote:
> >  At present, if the super vdsm server dies with an exception inside
> >  Python's multiprocessing module, then it will not usually produce any
> >  useful debugging output.
> > >>>
> > >>> For our context - when do you notice such supervdsm deaths?
> > >>> Is it frequent? What is the cause?
> > >>
> > >> BZ#1011661 & BZ#1010030 downstream.
> > > 
> > > Ok, I can see them, dig into them and find an answer to my question. But
> > > it's not fair to the wider community of users and partner to cite
> > > private bugs.
> > > 
> > > https://www.berrange.com/posts/2012/06/27/thoughts-on-improving-openstack-git-commit-practicehistory/
> > 
> > Apologies Dan,
> > 
> > I believe David was referring to the public BZ#1011661. I believe that
> > has been attributed to the following change merged upstream in May :
> > 
> > http://gerrit.ovirt.org/#/c/14998
> 
> Uh, the problem's not attributed to that, rather that patch fixes it.
> The problem was that the process ctimes were changing, leading vdsm to
> erroneously think that supervdsm had died and restarting it.  That lead
> to several complications, including the supervdsm servers failing
> silently due to lack of logging from multiprocessing.

As much as I hate restarting supervdsmd service, I'm so glad that these
issues are solved in ovirt-3.3.

> 
> We don't yet know why the ctimes were changing in this particular
> customer environment.

a longshot: maybe a system date change managed to confuse vdsm?

Dan.
___
vdsm-devel mailing list
vdsm-devel@lists.fedorahosted.org
https://lists.fedorahosted.org/mailman/listinfo/vdsm-devel


Re: [vdsm] supervdsmServer: Enable logging from multiprocessing library

2013-10-01 Thread David Gibson
On Tue, 01 Oct 2013 20:25:21 +0100
Lee Yarwood  wrote:

> On 10/01/2013 06:35 PM, Dan Kenigsberg wrote:
> > On Tue, Oct 01, 2013 at 02:33:00PM +0100, Lee Yarwood wrote:
> >> On 10/01/2013 09:00 AM, Dan Kenigsberg wrote:
> >>> It is prefered to post patches to gerrit.ovirt.org.
> >>
> >> Apologies for jumping in David but I've pushed this here for now :
> >>
> >> http://gerrit.ovirt.org/19741
> > 
> > Thanks!

Thanks, I'm new to ovirt development and gerrit, so I'm going to need
to work that out.

> >>> On Tue, Oct 01, 2013 at 01:18:25PM +1000, David Gibson wrote:
>  At present, if the super vdsm server dies with an exception inside
>  Python's multiprocessing module, then it will not usually produce any
>  useful debugging output.
> >>>
> >>> For our context - when do you notice such supervdsm deaths?
> >>> Is it frequent? What is the cause?
> >>
> >> BZ#1011661 & BZ#1010030 downstream.
> > 
> > Ok, I can see them, dig into them and find an answer to my question. But
> > it's not fair to the wider community of users and partner to cite
> > private bugs.
> > 
> > https://www.berrange.com/posts/2012/06/27/thoughts-on-improving-openstack-git-commit-practicehistory/
> 
> Apologies Dan,
> 
> I believe David was referring to the public BZ#1011661. I believe that
> has been attributed to the following change merged upstream in May :
> 
> http://gerrit.ovirt.org/#/c/14998

Uh, the problem's not attributed to that, rather that patch fixes it.
The problem was that the process ctimes were changing, leading vdsm to
erroneously think that supervdsm had died and restarting it.  That lead
to several complications, including the supervdsm servers failing
silently due to lack of logging from multiprocessing.

We don't yet know why the ctimes were changing in this particular
customer environment.

-- 
David Gibson 


pgpcb_e_pFrde.pgp
Description: PGP signature
___
vdsm-devel mailing list
vdsm-devel@lists.fedorahosted.org
https://lists.fedorahosted.org/mailman/listinfo/vdsm-devel


Re: [vdsm] supervdsmServer: Enable logging from multiprocessing library

2013-10-01 Thread Lee Yarwood
On 10/01/2013 06:35 PM, Dan Kenigsberg wrote:
> On Tue, Oct 01, 2013 at 02:33:00PM +0100, Lee Yarwood wrote:
>> On 10/01/2013 09:00 AM, Dan Kenigsberg wrote:
>>> It is prefered to post patches to gerrit.ovirt.org.
>>
>> Apologies for jumping in David but I've pushed this here for now :
>>
>> http://gerrit.ovirt.org/19741
> 
> Thanks!
> 
>>
>>> On Tue, Oct 01, 2013 at 01:18:25PM +1000, David Gibson wrote:
 At present, if the super vdsm server dies with an exception inside
 Python's multiprocessing module, then it will not usually produce any
 useful debugging output.
>>>
>>> For our context - when do you notice such supervdsm deaths?
>>> Is it frequent? What is the cause?
>>
>> BZ#1011661 & BZ#1010030 downstream.
> 
> Ok, I can see them, dig into them and find an answer to my question. But
> it's not fair to the wider community of users and partner to cite
> private bugs.
> 
> https://www.berrange.com/posts/2012/06/27/thoughts-on-improving-openstack-git-commit-practicehistory/

Apologies Dan,

I believe David was referring to the public BZ#1011661. I believe that
has been attributed to the following change merged upstream in May :

http://gerrit.ovirt.org/#/c/14998

Hopefully David or Tomas can confirm this.

The private bug BZ#1010030 that I also referenced is slightly different
and covers the following supervdsm tracebacks seen by customers downstream :

~~~
Thread-1029158::ERROR::2013-09-10
02:46:29,250::domainMonitor::225::Storage.DomainMonitorThread::(_monitorDomain)
Error while collecting domain 977c6c73-5ca2-478c-9ffe-6a72d74a09d4
monitoring information
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/domainMonitor.py", line 201, in
_monitorDomain
  File "/usr/share/vdsm/storage/sdc.py", line 49, in __getattr__
  File "/usr/share/vdsm/storage/sdc.py", line 52, in getRealDomain
  File "/usr/share/vdsm/storage/sdc.py", line 120, in _realProduce
  File "/usr/share/vdsm/storage/misc.py", line 1072, in helper
  File "/usr/share/vdsm/storage/misc.py", line 1057, in __call__
  File "/usr/share/vdsm/storage/sdc.py", line 83, in refreshStorage
  File "/usr/share/vdsm/storage/multipath.py", line 73, in rescan
  File "/usr/share/vdsm/supervdsm.py", line 76, in __call__
  File "/usr/share/vdsm/supervdsm.py", line 67, in 
  File "", line 2, in forceScsiScan
  File "/usr/lib64/python2.6/multiprocessing/managers.py", line 740, in
_callmethod
RemoteError:
---
Traceback (most recent call last):
  File "/usr/lib64/python2.6/multiprocessing/managers.py", line 216, in
serve_client
KeyError: '2baea50'
---
~~~

This BZ is still a WIP and thus any additional debug logging we could
get from the multiprocessing module would help.

Again my apologies for not including these details previously.

Lee
-- 

Lee Yarwood
Senior Software Maintenance Engineer
Red Hat UK Ltd
200 Fowler Avenue, Farnborough Business Park, Farnborough, Hants GU14 7JP

Registered in England and Wales under Company Registration No. 03798903
Directors: Michael Cunningham (US), Paul Hickey (Ireland), Matt Parson
(US), Charles Peters (US)

GPG fingerprint : A5D1 9385 88CB 7E5F BE64  6618 BCA6 6E33 F672 2D76
___
vdsm-devel mailing list
vdsm-devel@lists.fedorahosted.org
https://lists.fedorahosted.org/mailman/listinfo/vdsm-devel


Re: [vdsm] supervdsmServer: Enable logging from multiprocessing library

2013-10-01 Thread Dan Kenigsberg
On Tue, Oct 01, 2013 at 02:33:00PM +0100, Lee Yarwood wrote:
> On 10/01/2013 09:00 AM, Dan Kenigsberg wrote:
> > It is prefered to post patches to gerrit.ovirt.org.
> 
> Apologies for jumping in David but I've pushed this here for now :
> 
> http://gerrit.ovirt.org/19741

Thanks!

> 
> > On Tue, Oct 01, 2013 at 01:18:25PM +1000, David Gibson wrote:
> >> At present, if the super vdsm server dies with an exception inside
> >> Python's multiprocessing module, then it will not usually produce any
> >> useful debugging output.
> > 
> > For our context - when do you notice such supervdsm deaths?
> > Is it frequent? What is the cause?
> 
> BZ#1011661 & BZ#1010030 downstream.

Ok, I can see them, dig into them and find an answer to my question. But
it's not fair to the wider community of users and partner to cite
private bugs.

https://www.berrange.com/posts/2012/06/27/thoughts-on-improving-openstack-git-commit-practicehistory/
___
vdsm-devel mailing list
vdsm-devel@lists.fedorahosted.org
https://lists.fedorahosted.org/mailman/listinfo/vdsm-devel


Re: [vdsm] supervdsmServer: Enable logging from multiprocessing library

2013-10-01 Thread Lee Yarwood
On 10/01/2013 09:00 AM, Dan Kenigsberg wrote:
> It is prefered to post patches to gerrit.ovirt.org.

Apologies for jumping in David but I've pushed this here for now :

http://gerrit.ovirt.org/19741

> On Tue, Oct 01, 2013 at 01:18:25PM +1000, David Gibson wrote:
>> At present, if the super vdsm server dies with an exception inside
>> Python's multiprocessing module, then it will not usually produce any
>> useful debugging output.
> 
> For our context - when do you notice such supervdsm deaths?
> Is it frequent? What is the cause?

BZ#1011661 & BZ#1010030 downstream.

>>
>> The multiprocessing module includes error logging using Python's logging
>> module, but it is not enabled by default.  This patch enables the
>> logging and sets it to propagate to the supervdsmServer's root logger,
>> so it now defaults to producing logging and can be configured easily
>> from /etc/vdsm/svdsm.logger.conf.
>>
>> Signed-off-by: David Gibson 
>> ---
>>  vdsm/supervdsmServer | 4 
>>  1 file changed, 4 insertions(+)
>>
>> diff --git a/vdsm/supervdsmServer b/vdsm/supervdsmServer
>> index 40ec9df..43a408f 100755
>> --- a/vdsm/supervdsmServer
>> +++ b/vdsm/supervdsmServer
>> @@ -42,6 +42,7 @@ except:
>>  log.warn("Could not init proper logging", exc_info=True)
>>  
>>  from storage import fuser
>> +import multiprocessing
>>  from multiprocessing import Pipe, Process
>>  from gluster import listPublicFunctions
>>  import storage.misc as misc
>> @@ -356,6 +357,9 @@ class _SuperVdsm(object):
>>  
>>  def main(sockfile, pidfile=None):
>>  log = logging.getLogger("SuperVdsm.Server")
>> +# Wire up the multiprocessing logger
>> +mlog = multiprocessing.get_logger()
>> +mlog.propagate = 1
> 
> The code seems fine, though I'd avoid the variable
> 
> multiprocessing.get_logger().propagate = 1
> 
> Thanks.
> ___
> vdsm-devel mailing list
> vdsm-devel@lists.fedorahosted.org
> https://lists.fedorahosted.org/mailman/listinfo/vdsm-devel
> 


-- 

Lee Yarwood
Senior Software Maintenance Engineer
Red Hat UK Ltd
200 Fowler Avenue, Farnborough Business Park, Farnborough, Hants GU14 7JP

Registered in England and Wales under Company Registration No. 03798903
Directors: Michael Cunningham (US), Paul Hickey (Ireland), Matt Parson
(US), Charles Peters (US)

GPG fingerprint : A5D1 9385 88CB 7E5F BE64  6618 BCA6 6E33 F672 2D76
___
vdsm-devel mailing list
vdsm-devel@lists.fedorahosted.org
https://lists.fedorahosted.org/mailman/listinfo/vdsm-devel


Re: [vdsm] supervdsmServer: Enable logging from multiprocessing library

2013-10-01 Thread Dan Kenigsberg
It is prefered to post patches to gerrit.ovirt.org.

On Tue, Oct 01, 2013 at 01:18:25PM +1000, David Gibson wrote:
> At present, if the super vdsm server dies with an exception inside
> Python's multiprocessing module, then it will not usually produce any
> useful debugging output.

For our context - when do you notice such supervdsm deaths?
Is it frequent? What is the cause?

> 
> The multiprocessing module includes error logging using Python's logging
> module, but it is not enabled by default.  This patch enables the
> logging and sets it to propagate to the supervdsmServer's root logger,
> so it now defaults to producing logging and can be configured easily
> from /etc/vdsm/svdsm.logger.conf.
> 
> Signed-off-by: David Gibson 
> ---
>  vdsm/supervdsmServer | 4 
>  1 file changed, 4 insertions(+)
> 
> diff --git a/vdsm/supervdsmServer b/vdsm/supervdsmServer
> index 40ec9df..43a408f 100755
> --- a/vdsm/supervdsmServer
> +++ b/vdsm/supervdsmServer
> @@ -42,6 +42,7 @@ except:
>  log.warn("Could not init proper logging", exc_info=True)
>  
>  from storage import fuser
> +import multiprocessing
>  from multiprocessing import Pipe, Process
>  from gluster import listPublicFunctions
>  import storage.misc as misc
> @@ -356,6 +357,9 @@ class _SuperVdsm(object):
>  
>  def main(sockfile, pidfile=None):
>  log = logging.getLogger("SuperVdsm.Server")
> +# Wire up the multiprocessing logger
> +mlog = multiprocessing.get_logger()
> +mlog.propagate = 1

The code seems fine, though I'd avoid the variable

multiprocessing.get_logger().propagate = 1

Thanks.
___
vdsm-devel mailing list
vdsm-devel@lists.fedorahosted.org
https://lists.fedorahosted.org/mailman/listinfo/vdsm-devel