----- Original Message -----
> From: "Dan Kenigsberg" <dan...@redhat.com>
> To: "David Gibson" <dgib...@redhat.com>
> Cc: "Tomas Dosek" <tdo...@redhat.com>, vdsm-devel@lists.fedorahosted.org
> Sent: Wednesday, October 2, 2013 11:21:30 AM
> Subject: Re: [vdsm] supervdsmServer: Enable logging from multiprocessing 
> library
> 
> On Wed, Oct 02, 2013 at 03:59:58PM +1000, David Gibson wrote:
> > On Tue, 01 Oct 2013 20:25:21 +0100
> > Lee Yarwood <lyarw...@redhat.com> wrote:
> > 
> > > On 10/01/2013 06:35 PM, Dan Kenigsberg wrote:
> > > > On Tue, Oct 01, 2013 at 02:33:00PM +0100, Lee Yarwood wrote:
> > > >> On 10/01/2013 09:00 AM, Dan Kenigsberg wrote:
> > > >>> It is preferred to post patches to gerrit.ovirt.org.
> > > >>
> > > >> Apologies for jumping in, David, but I've pushed this here for now:
> > > >>
> > > >> http://gerrit.ovirt.org/19741
> > > > 
> > > > Thanks!
> > 
> > Thanks, I'm new to ovirt development and gerrit, so I'm going to need
> > to work that out.
> > 
> > > >>> On Tue, Oct 01, 2013 at 01:18:25PM +1000, David Gibson wrote:
> > > >>>> At present, if the super vdsm server dies with an exception inside
> > > >>>> Python's multiprocessing module, then it will not usually produce
> > > >>>> any useful debugging output.
> > > >>>
> > > >>> For our context - when do you notice such supervdsm deaths?
> > > >>> Is it frequent? What is the cause?
> > > >>
> > > >> BZ#1011661 & BZ#1010030 downstream.
> > > > 
> > > > Ok, I can see them, dig into them and find an answer to my question.
> > > > But it's not fair to the wider community of users and partners to cite
> > > > private bugs.
> > > > 
> > > > https://www.berrange.com/posts/2012/06/27/thoughts-on-improving-openstack-git-commit-practicehistory/
> > > 
> > > Apologies Dan,
> > > 
> > > I believe David was referring to the public BZ#1011661, which I think
> > > has been attributed to the following change merged upstream in May:
> > > 
> > > http://gerrit.ovirt.org/#/c/14998
> > 
> > Uh, the problem's not attributed to that; rather, that patch fixes it.
> > The problem was that the process ctimes were changing, leading vdsm to
> > erroneously conclude that supervdsm had died and restart it. That led
> > to several complications, including the supervdsm servers failing
> > silently due to the lack of logging from multiprocessing.
> 
> As much as I hate restarting supervdsmd service, I'm so glad that these
> issues are solved in ovirt-3.3.
> 

Agreed, it's a much better way to work, IMHO.

> > 
> > We don't yet know why the ctimes were changing in this particular
> > customer environment.
> 
> A long shot: maybe a system date change managed to confuse vdsm?
> 

Might be. We saw that only once before, when we tried to run a load test in
ovirt with many hosts. For that we used VMs as hosts with nested KVM, and each
fake host ran VMs.
In that case we saw that the ctime changed during the vdsm run, and we ended up
with many instances of the process. Nothing in the system was changed during
the run.

Anyhow, ctime shouldn't be used to identify the process creation time; the
stat(2) man page defines st_ctime as the time of the last status change, not a
creation time. Maybe the inode really was changed, or maybe Dan's long shot
hits the cause. Any new hints?
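For illustration only, here is a minimal sketch of one alternative to ctime. It reads the "starttime" field (field 22) of /proc/<pid>/stat, which proc(5) documents as the time the process started after boot, in clock ticks. That value stays fixed for the life of the process. The helper names process_start_ticks and same_process are hypothetical. This is not what vdsm or change 14998 actually does.

```python
import os


def process_start_ticks(pid):
    """Return the start time of `pid` in clock ticks since boot.

    Reads field 22 ("starttime") of /proc/<pid>/stat (see proc(5)).
    Unlike the ctime of /proc/<pid>, this does not change while the
    process is alive.
    """
    with open("/proc/%d/stat" % pid) as f:
        data = f.read()
    # Field 2 (comm) is parenthesised and may itself contain spaces or
    # ')', so split only what follows the last ')'.
    fields = data[data.rindex(")") + 1:].split()
    # fields[0] is field 3 (state), so field 22 is fields[19].
    return int(fields[19])


def same_process(pid, recorded_start):
    """Hypothetical helper: True if `pid` is still the process whose
    start time was recorded earlier (also guards against PID reuse)."""
    try:
        return process_start_ticks(pid) == recorded_start
    except (IOError, OSError):
        # The process is gone (or /proc is unavailable).
        return False


if __name__ == "__main__":
    me = os.getpid()
    start = process_start_ticks(me)
    hz = os.sysconf("SC_CLK_TCK")
    print("pid %d started %.2f s after boot" % (me, start / float(hz)))
    print("still the same process: %s" % same_process(me, start))
```

Comparing a recorded starttime with a fresh read distinguishes "the same process is still alive" from "the PID was reused". A ctime comparison does not do this reliably, because the ctime can change without the process restarting.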

Yaniv Bronhaim.

> Dan.
> _______________________________________________
> vdsm-devel mailing list
> vdsm-devel@lists.fedorahosted.org
> https://lists.fedorahosted.org/mailman/listinfo/vdsm-devel
> 