Re: [vdsm] supervdsmServer: Enable logging from multiprocessing library
- Original Message - > From: "Dan Kenigsberg" > To: "David Gibson" > Cc: "Tomas Dosek" , vdsm-devel@lists.fedorahosted.org > Sent: Wednesday, October 2, 2013 11:21:30 AM > Subject: Re: [vdsm] supervdsmServer: Enable logging from multiprocessing > library > > On Wed, Oct 02, 2013 at 03:59:58PM +1000, David Gibson wrote: > > On Tue, 01 Oct 2013 20:25:21 +0100 > > Lee Yarwood wrote: > > > > > On 10/01/2013 06:35 PM, Dan Kenigsberg wrote: > > > > On Tue, Oct 01, 2013 at 02:33:00PM +0100, Lee Yarwood wrote: > > > >> On 10/01/2013 09:00 AM, Dan Kenigsberg wrote: > > > >>> It is prefered to post patches to gerrit.ovirt.org. > > > >> > > > >> Apologies for jumping in David but I've pushed this here for now : > > > >> > > > >> http://gerrit.ovirt.org/19741 > > > > > > > > Thanks! > > > > Thanks, I'm new to ovirt development and gerrit, so I'm going to need > > to work that out. > > > > > >>> On Tue, Oct 01, 2013 at 01:18:25PM +1000, David Gibson wrote: > > > >>>> At present, if the super vdsm server dies with an exception inside > > > >>>> Python's multiprocessing module, then it will not usually produce > > > >>>> any > > > >>>> useful debugging output. > > > >>> > > > >>> For our context - when do you notice such supervdsm deaths? > > > >>> Is it frequent? What is the cause? > > > >> > > > >> BZ#1011661 & BZ#1010030 downstream. > > > > > > > > Ok, I can see them, dig into them and find an answer to my question. > > > > But > > > > it's not fair to the wider community of users and partner to cite > > > > private bugs. > > > > > > > > https://www.berrange.com/posts/2012/06/27/thoughts-on-improving-openstack-git-commit-practicehistory/ > > > > > > Apologies Dan, > > > > > > I believe David was referring to the public BZ#1011661. I believe that > > > has been attributed to the following change merged upstream in May : > > > > > > http://gerrit.ovirt.org/#/c/14998 > > > > Uh, the problem's not attributed to that, rather that patch fixes it. > > The problem was that the process ctimes were changing, leading vdsm to > > erroneously think that supervdsm had died and restarting it. That lead > > to several complications, including the supervdsm servers failing > > silently due to lack of logging from multiprocessing. > > As much as I hate restarting supervdsmd service, I'm so glad that these > issues are solved in ovirt-3.3. > agree.. much better practice to work imho. > > > > We don't yet know why the ctimes were changing in this particular > > customer environment. > > a longshot: maybe a system date change managed to confuse vdsm? > Might.. we saw that only once before when we tried to run load test in ovirt with many hosts. For that we used vms as hosts with nested kvm. each fake host ran vms. In this case we saw that the ctime was changed during vdsm run, and we end up with many instances of the process. Nothing in the system was changed during the run. Anyhow, ctime shouldn't be used to point on process creation time, the manual on stat alerts about it. Maybe literally the inode was changed, and maybe Dan's longshot hits the cause. Any new hints? Yaniv Bronhaim. > Dan. > ___ > vdsm-devel mailing list > vdsm-devel@lists.fedorahosted.org > https://lists.fedorahosted.org/mailman/listinfo/vdsm-devel > ___ vdsm-devel mailing list vdsm-devel@lists.fedorahosted.org https://lists.fedorahosted.org/mailman/listinfo/vdsm-devel
Re: [vdsm] supervdsmServer: Enable logging from multiprocessing library
On Wed, Oct 02, 2013 at 03:59:58PM +1000, David Gibson wrote: > On Tue, 01 Oct 2013 20:25:21 +0100 > Lee Yarwood wrote: > > > On 10/01/2013 06:35 PM, Dan Kenigsberg wrote: > > > On Tue, Oct 01, 2013 at 02:33:00PM +0100, Lee Yarwood wrote: > > >> On 10/01/2013 09:00 AM, Dan Kenigsberg wrote: > > >>> It is prefered to post patches to gerrit.ovirt.org. > > >> > > >> Apologies for jumping in David but I've pushed this here for now : > > >> > > >> http://gerrit.ovirt.org/19741 > > > > > > Thanks! > > Thanks, I'm new to ovirt development and gerrit, so I'm going to need > to work that out. > > > >>> On Tue, Oct 01, 2013 at 01:18:25PM +1000, David Gibson wrote: > > At present, if the super vdsm server dies with an exception inside > > Python's multiprocessing module, then it will not usually produce any > > useful debugging output. > > >>> > > >>> For our context - when do you notice such supervdsm deaths? > > >>> Is it frequent? What is the cause? > > >> > > >> BZ#1011661 & BZ#1010030 downstream. > > > > > > Ok, I can see them, dig into them and find an answer to my question. But > > > it's not fair to the wider community of users and partner to cite > > > private bugs. > > > > > > https://www.berrange.com/posts/2012/06/27/thoughts-on-improving-openstack-git-commit-practicehistory/ > > > > Apologies Dan, > > > > I believe David was referring to the public BZ#1011661. I believe that > > has been attributed to the following change merged upstream in May : > > > > http://gerrit.ovirt.org/#/c/14998 > > Uh, the problem's not attributed to that, rather that patch fixes it. > The problem was that the process ctimes were changing, leading vdsm to > erroneously think that supervdsm had died and restarting it. That lead > to several complications, including the supervdsm servers failing > silently due to lack of logging from multiprocessing. As much as I hate restarting supervdsmd service, I'm so glad that these issues are solved in ovirt-3.3. > > We don't yet know why the ctimes were changing in this particular > customer environment. a longshot: maybe a system date change managed to confuse vdsm? Dan. ___ vdsm-devel mailing list vdsm-devel@lists.fedorahosted.org https://lists.fedorahosted.org/mailman/listinfo/vdsm-devel
Re: [vdsm] supervdsmServer: Enable logging from multiprocessing library
On Tue, 01 Oct 2013 20:25:21 +0100 Lee Yarwood wrote: > On 10/01/2013 06:35 PM, Dan Kenigsberg wrote: > > On Tue, Oct 01, 2013 at 02:33:00PM +0100, Lee Yarwood wrote: > >> On 10/01/2013 09:00 AM, Dan Kenigsberg wrote: > >>> It is prefered to post patches to gerrit.ovirt.org. > >> > >> Apologies for jumping in David but I've pushed this here for now : > >> > >> http://gerrit.ovirt.org/19741 > > > > Thanks! Thanks, I'm new to ovirt development and gerrit, so I'm going to need to work that out. > >>> On Tue, Oct 01, 2013 at 01:18:25PM +1000, David Gibson wrote: > At present, if the super vdsm server dies with an exception inside > Python's multiprocessing module, then it will not usually produce any > useful debugging output. > >>> > >>> For our context - when do you notice such supervdsm deaths? > >>> Is it frequent? What is the cause? > >> > >> BZ#1011661 & BZ#1010030 downstream. > > > > Ok, I can see them, dig into them and find an answer to my question. But > > it's not fair to the wider community of users and partner to cite > > private bugs. > > > > https://www.berrange.com/posts/2012/06/27/thoughts-on-improving-openstack-git-commit-practicehistory/ > > Apologies Dan, > > I believe David was referring to the public BZ#1011661. I believe that > has been attributed to the following change merged upstream in May : > > http://gerrit.ovirt.org/#/c/14998 Uh, the problem's not attributed to that, rather that patch fixes it. The problem was that the process ctimes were changing, leading vdsm to erroneously think that supervdsm had died and restarting it. That lead to several complications, including the supervdsm servers failing silently due to lack of logging from multiprocessing. We don't yet know why the ctimes were changing in this particular customer environment. -- David Gibson pgpcb_e_pFrde.pgp Description: PGP signature ___ vdsm-devel mailing list vdsm-devel@lists.fedorahosted.org https://lists.fedorahosted.org/mailman/listinfo/vdsm-devel
Re: [vdsm] supervdsmServer: Enable logging from multiprocessing library
On 10/01/2013 06:35 PM, Dan Kenigsberg wrote: > On Tue, Oct 01, 2013 at 02:33:00PM +0100, Lee Yarwood wrote: >> On 10/01/2013 09:00 AM, Dan Kenigsberg wrote: >>> It is prefered to post patches to gerrit.ovirt.org. >> >> Apologies for jumping in David but I've pushed this here for now : >> >> http://gerrit.ovirt.org/19741 > > Thanks! > >> >>> On Tue, Oct 01, 2013 at 01:18:25PM +1000, David Gibson wrote: At present, if the super vdsm server dies with an exception inside Python's multiprocessing module, then it will not usually produce any useful debugging output. >>> >>> For our context - when do you notice such supervdsm deaths? >>> Is it frequent? What is the cause? >> >> BZ#1011661 & BZ#1010030 downstream. > > Ok, I can see them, dig into them and find an answer to my question. But > it's not fair to the wider community of users and partner to cite > private bugs. > > https://www.berrange.com/posts/2012/06/27/thoughts-on-improving-openstack-git-commit-practicehistory/ Apologies Dan, I believe David was referring to the public BZ#1011661. I believe that has been attributed to the following change merged upstream in May : http://gerrit.ovirt.org/#/c/14998 Hopefully David or Tomas can confirm this. The private bug BZ#1010030 that I also referenced is slightly different and covers the following supervdsm tracebacks seen by customers downstream : ~~~ Thread-1029158::ERROR::2013-09-10 02:46:29,250::domainMonitor::225::Storage.DomainMonitorThread::(_monitorDomain) Error while collecting domain 977c6c73-5ca2-478c-9ffe-6a72d74a09d4 monitoring information Traceback (most recent call last): File "/usr/share/vdsm/storage/domainMonitor.py", line 201, in _monitorDomain File "/usr/share/vdsm/storage/sdc.py", line 49, in __getattr__ File "/usr/share/vdsm/storage/sdc.py", line 52, in getRealDomain File "/usr/share/vdsm/storage/sdc.py", line 120, in _realProduce File "/usr/share/vdsm/storage/misc.py", line 1072, in helper File "/usr/share/vdsm/storage/misc.py", line 1057, in __call__ File "/usr/share/vdsm/storage/sdc.py", line 83, in refreshStorage File "/usr/share/vdsm/storage/multipath.py", line 73, in rescan File "/usr/share/vdsm/supervdsm.py", line 76, in __call__ File "/usr/share/vdsm/supervdsm.py", line 67, in File "", line 2, in forceScsiScan File "/usr/lib64/python2.6/multiprocessing/managers.py", line 740, in _callmethod RemoteError: --- Traceback (most recent call last): File "/usr/lib64/python2.6/multiprocessing/managers.py", line 216, in serve_client KeyError: '2baea50' --- ~~~ This BZ is still a WIP and thus any additional debug logging we could get from the multiprocessing module would help. Again my apologies for not including these details previously. Lee -- Lee Yarwood Senior Software Maintenance Engineer Red Hat UK Ltd 200 Fowler Avenue, Farnborough Business Park, Farnborough, Hants GU14 7JP Registered in England and Wales under Company Registration No. 03798903 Directors: Michael Cunningham (US), Paul Hickey (Ireland), Matt Parson (US), Charles Peters (US) GPG fingerprint : A5D1 9385 88CB 7E5F BE64 6618 BCA6 6E33 F672 2D76 ___ vdsm-devel mailing list vdsm-devel@lists.fedorahosted.org https://lists.fedorahosted.org/mailman/listinfo/vdsm-devel
Re: [vdsm] supervdsmServer: Enable logging from multiprocessing library
On Tue, Oct 01, 2013 at 02:33:00PM +0100, Lee Yarwood wrote: > On 10/01/2013 09:00 AM, Dan Kenigsberg wrote: > > It is prefered to post patches to gerrit.ovirt.org. > > Apologies for jumping in David but I've pushed this here for now : > > http://gerrit.ovirt.org/19741 Thanks! > > > On Tue, Oct 01, 2013 at 01:18:25PM +1000, David Gibson wrote: > >> At present, if the super vdsm server dies with an exception inside > >> Python's multiprocessing module, then it will not usually produce any > >> useful debugging output. > > > > For our context - when do you notice such supervdsm deaths? > > Is it frequent? What is the cause? > > BZ#1011661 & BZ#1010030 downstream. Ok, I can see them, dig into them and find an answer to my question. But it's not fair to the wider community of users and partner to cite private bugs. https://www.berrange.com/posts/2012/06/27/thoughts-on-improving-openstack-git-commit-practicehistory/ ___ vdsm-devel mailing list vdsm-devel@lists.fedorahosted.org https://lists.fedorahosted.org/mailman/listinfo/vdsm-devel
Re: [vdsm] supervdsmServer: Enable logging from multiprocessing library
On 10/01/2013 09:00 AM, Dan Kenigsberg wrote: > It is prefered to post patches to gerrit.ovirt.org. Apologies for jumping in David but I've pushed this here for now : http://gerrit.ovirt.org/19741 > On Tue, Oct 01, 2013 at 01:18:25PM +1000, David Gibson wrote: >> At present, if the super vdsm server dies with an exception inside >> Python's multiprocessing module, then it will not usually produce any >> useful debugging output. > > For our context - when do you notice such supervdsm deaths? > Is it frequent? What is the cause? BZ#1011661 & BZ#1010030 downstream. >> >> The multiprocessing module includes error logging using Python's logging >> module, but it is not enabled by default. This patch enables the >> logging and sets it to propagate to the supervdsmServer's root logger, >> so it now defaults to producing logging and can be configured easily >> from /etc/vdsm/svdsm.logger.conf. >> >> Signed-off-by: David Gibson >> --- >> vdsm/supervdsmServer | 4 >> 1 file changed, 4 insertions(+) >> >> diff --git a/vdsm/supervdsmServer b/vdsm/supervdsmServer >> index 40ec9df..43a408f 100755 >> --- a/vdsm/supervdsmServer >> +++ b/vdsm/supervdsmServer >> @@ -42,6 +42,7 @@ except: >> log.warn("Could not init proper logging", exc_info=True) >> >> from storage import fuser >> +import multiprocessing >> from multiprocessing import Pipe, Process >> from gluster import listPublicFunctions >> import storage.misc as misc >> @@ -356,6 +357,9 @@ class _SuperVdsm(object): >> >> def main(sockfile, pidfile=None): >> log = logging.getLogger("SuperVdsm.Server") >> +# Wire up the multiprocessing logger >> +mlog = multiprocessing.get_logger() >> +mlog.propagate = 1 > > The code seems fine, though I'd avoid the variable > > multiprocessing.get_logger().propagate = 1 > > Thanks. > ___ > vdsm-devel mailing list > vdsm-devel@lists.fedorahosted.org > https://lists.fedorahosted.org/mailman/listinfo/vdsm-devel > -- Lee Yarwood Senior Software Maintenance Engineer Red Hat UK Ltd 200 Fowler Avenue, Farnborough Business Park, Farnborough, Hants GU14 7JP Registered in England and Wales under Company Registration No. 03798903 Directors: Michael Cunningham (US), Paul Hickey (Ireland), Matt Parson (US), Charles Peters (US) GPG fingerprint : A5D1 9385 88CB 7E5F BE64 6618 BCA6 6E33 F672 2D76 ___ vdsm-devel mailing list vdsm-devel@lists.fedorahosted.org https://lists.fedorahosted.org/mailman/listinfo/vdsm-devel
Re: [vdsm] supervdsmServer: Enable logging from multiprocessing library
It is prefered to post patches to gerrit.ovirt.org. On Tue, Oct 01, 2013 at 01:18:25PM +1000, David Gibson wrote: > At present, if the super vdsm server dies with an exception inside > Python's multiprocessing module, then it will not usually produce any > useful debugging output. For our context - when do you notice such supervdsm deaths? Is it frequent? What is the cause? > > The multiprocessing module includes error logging using Python's logging > module, but it is not enabled by default. This patch enables the > logging and sets it to propagate to the supervdsmServer's root logger, > so it now defaults to producing logging and can be configured easily > from /etc/vdsm/svdsm.logger.conf. > > Signed-off-by: David Gibson > --- > vdsm/supervdsmServer | 4 > 1 file changed, 4 insertions(+) > > diff --git a/vdsm/supervdsmServer b/vdsm/supervdsmServer > index 40ec9df..43a408f 100755 > --- a/vdsm/supervdsmServer > +++ b/vdsm/supervdsmServer > @@ -42,6 +42,7 @@ except: > log.warn("Could not init proper logging", exc_info=True) > > from storage import fuser > +import multiprocessing > from multiprocessing import Pipe, Process > from gluster import listPublicFunctions > import storage.misc as misc > @@ -356,6 +357,9 @@ class _SuperVdsm(object): > > def main(sockfile, pidfile=None): > log = logging.getLogger("SuperVdsm.Server") > +# Wire up the multiprocessing logger > +mlog = multiprocessing.get_logger() > +mlog.propagate = 1 The code seems fine, though I'd avoid the variable multiprocessing.get_logger().propagate = 1 Thanks. ___ vdsm-devel mailing list vdsm-devel@lists.fedorahosted.org https://lists.fedorahosted.org/mailman/listinfo/vdsm-devel