----- Original Message -----
> From: "Mark Wu" <wu...@linux.vnet.ibm.com>
> To: "Zhou Zheng Sheng" <zhshz...@linux.vnet.ibm.com>
> Cc: "VDSM Project Development" <vdsm-devel@lists.fedorahosted.org>
> Sent: Wednesday, May 29, 2013 11:38:40 AM
> Subject: Re: [vdsm] Potential Bug in cpopen
> 
> On Wed 29 May 2013 12:02:47 PM CST, Zhou Zheng Sheng wrote:
> >
> > Hi,
> >
> > Recently Jenkins unit test sometimes fails on PidStatTests.test, for
> > example http://gerrit.ovirt.org/#/c/14670/
> >
> > After it execCmd a "sleep" command with sync=False, in
> > /proc/[xxxpid]/stat we should see the name is "sleep", but in this case
> > we get "python", which means there is a possible race condition. The
> > most possible situation is that execCmd returns before the child process
> > execvp the "sleep", then the parent process reads the stat and sees the
> > process name is still "python".
> >
> > cpopen is designed to avoid this kind of race. It uses a pipe to
> > synchronize the child and parent. It sets the FD_CLOEXEC on the child
> > side of the pipe, so that once execvp succeeds, the child pipe is
> > closed. If execvp fails, child writes error code to the pipe. The parent
> > reads the other end of the pipe like follow.
> >
> > if (read(errnofd[0], &childErrno, sizeof(int)) == sizeof(int)) {
> >      PyErr_SetString(PyExc_OSError, strerror(childErrno));
> >      goto fail;
> > }
> >
> > It assumes that when read returns, the return value is either
> > sizeof(int), which indicates an error in the child side, or 0, which
> > indicates the child side of pipe is closed and execvp succeeds. However
> > this assumption may not always be true. If the parent process gets a
> > signal, the read invocation would be interrupt, and the code treats this
> > interruption the same as the case of execvp succeeds. If the system is
> > very busy like our Jenkins slave concurrently executing jobs, it's
> > possible the parent gets interrupted before the child execvp succeeds,
> > so cpopen returns to execCmd and cause the race.
> >
> > To produce this problem, we can add a "sleep(10);" before the exec
> > invocation in cpopen.c, it simulates a busy/slow system. Then in a
> > Python interpreter, register a signal handler and calls execCmd.
> >
> >>>> from vdsm.utils import execCmd
> >>>> from vdsm import utils
> >>>> def handler(signum, frame):
> > ...     print 'Signal handler called with signal', signum
> > ...
> >>>> import signal
> >>>> signal.signal(signal.SIGALRM, handler)
> >>>> signal.signal(signal.SIGCHLD, handler)
> >>>> p = execCmd(['sleep', '3'], sync=False, sudo=False); s =
> > utils.pidStat(p.pid); print s; p.wait()
> >
> > Then kill -SIGCHLD or kill -SIGALRM to this Python interpreter process,
> > we can see the output.
> >
> > Signal handler called with signal 17
> > (9541, 'python', 'S', 9509, 9509, 6843, 34817, 9509, 4218944, 66, 0, 0,
> > 0, 0, 0, 0, 0, 20, 0, 1, 0, 663767, 217919488, 1676,
> > 18446744073709551615L, 4194304, 4197004, 140735665131088,
> > 140735665124456, 268226504400, 0, 0, 16781312, 73730,
> > 18446744071579398713L, 0, 0, 17, 3, 0, 0, 0, 0, 0, 6294952, 6297616,
> > 19152896, 140735665132425, 140735665132432, 140735665132432,
> > 140735665135592, 0)
> > True
> >
> > I have not found other ways to produce it, just found this method. Not
> > sure it is a bug. Is it reasonable to check the read() return value and
> > retry on EAGAIN/EINTR to fix this?
It's surely a bug... 
> 
> Good catch!  If you can't find out how the race is invoked on jenkins
> slave,  you could just add a patch to fix the  EAGAIN/EINTR issue in
> cpopen, and add some fake patches to invoke multiple jenkins jobs.
> Then you could find if the fix of EAGAIN/EINTR helps.
> 

good catch alright.
The returncode of execCmd also None, so you have no way to catch the failure,
Seems like Mark is right, Im checking it and I'll upload a patch to fix it soon 
as possible.

Thanks!! 
> 
> _______________________________________________
> vdsm-devel mailing list
> vdsm-devel@lists.fedorahosted.org
> https://lists.fedorahosted.org/mailman/listinfo/vdsm-devel
> 
_______________________________________________
vdsm-devel mailing list
vdsm-devel@lists.fedorahosted.org
https://lists.fedorahosted.org/mailman/listinfo/vdsm-devel

Reply via email to