Re: [vdsm] About vdsmd init script

2013-05-29 Thread Zhou Zheng Sheng

Hi,

on 05/28/2013 17:26, Yaniv Bronheim wrote:
 Hey,
 
 I think the libvirt_configure part can be an external module (maybe a Python
 module) that vdsm-tool invokes. It should work with a template or conf.default,
 as you mentioned, and we should call it before starting the service. I think it
 should be a module because it should also include everything from the
 libvirtd_sysv2upstart, libvirtd_reconfigure, libvirtd_configure, and
 test_conflicting_conf scripts.
 
 Also, keep in mind that we plan to split vdsm into two services, one for vdsmd
 and one for supervdsmd. Both should be started at boot and should depend on
 each other (http://gerrit.ovirt.org/#/c/11051/).

Yes. Once supervdsm starts as a service, we can add dependency
declarations easily. That does not conflict with refactoring the vdsm init
script. I can help review the supervdsm patch so it gets done faster.

 The other parts that you want to take out of the vdsmd script are:
 shutdown-conflict-srv - could also be part of the tool
 nwfilter, dummybr - both are Python scripts that we run, so why not make them
 part of the tool as well?
 start_needed_srv, load-needed-modules - only SysV and Debian need them, if I
 understand correctly. systemd, upstart, and openrc can use their init script
 parameters, so why take them out? In each start function we'll start and load
 the needed services and modules. systemd, upstart, and openrc don't need a
 custom start function anyway.

Debian ships /lib/lsb/init-functions, while the Red Hat family (such as
CentOS and RHEL 6) ships /etc/init.d/functions. To print error messages
and daemonize the service process, we call different utility functions on
different systems, even though they are all SysV. The service script
boilerplate in Debian also differs from the Red Hat family's. So we want
to provide a dedicated init script for each system. To reuse
start_needed_srv and load-needed-modules across the different SysV init
scripts, I moved them out.
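
As a rough illustration of the split described above, a helper could pick the SysV function library by probing for the two files named in this message. This is only a sketch: pick_sysv_functions and the flavor labels are hypothetical names, not part of vdsm or vdsm-tool, and the Red Hat path is checked first on the assumption that a Red Hat system with LSB support installed may also carry /lib/lsb/init-functions.

```python
import os

# Paths taken from the message above; the flavor labels are illustrative.
REDHAT_FUNCTIONS = '/etc/init.d/functions'
DEBIAN_FUNCTIONS = '/lib/lsb/init-functions'


def pick_sysv_functions(exists=os.path.exists):
    """Return (flavor, path) for the SysV helper library found on this host.

    The Red Hat file is probed first (assumption: the LSB file may also be
    present there). The `exists` parameter is injectable for testing.
    """
    if exists(REDHAT_FUNCTIONS):
        return ('redhat', REDHAT_FUNCTIONS)
    if exists(DEBIAN_FUNCTIONS):
        return ('debian', DEBIAN_FUNCTIONS)
    return (None, None)
```

A dedicated init script per system would not need this probe at runtime; the sketch only makes the difference between the two families explicit.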

 gencerts, syslog_available, tune_system, test_space_and_lo, prepare_dirs - 
 can be scripts that we run before start as you did.
 
 Regards,
 Yaniv Bronhaim.
 

I agree that some of the initialization operations can be moved to
vdsm-tool. I think we can do this in a future patch, after we port the VDSM
init script to Ubuntu. I'd prefer to start small rather than do everything
in one batch. Once VDSM runs on Ubuntu, we can improve it step by step.
-- 
Thanks and best regards!

Zhou Zheng Sheng / 周征晟
E-mail: zhshz...@linux.vnet.ibm.com
Telephone: 86-10-82454397

___
vdsm-devel mailing list
vdsm-devel@lists.fedorahosted.org
https://lists.fedorahosted.org/mailman/listinfo/vdsm-devel


[vdsm] oVirt updates - April 28th, 2013

2013-05-29 Thread Itamar Heim

1. From the Web

- interview with Theron on oVirt (Chinese)
  http://www.infoq.com/cn/news/2013/05/conrey-on-ovirt

- interview with Dave Neary about his work (and oVirt)
  http://www.techradar.com/news/software/what-went-wrong-with-meego-nokia-lost-faith-in-the-project--1147770

- Nagios monitoring plugin check_rhev3 1.2 released
  http://lists.ovirt.org/pipermail/users/2013-May/014389.html

- a blog on how to do HA for engine (written for rhev, should be
  relevant to oVirt as well)
  http://captainkvm.com/2013/05/providing-high-availability-for-rhev-m/


2. Video
- YouTube video available for IBM's session on Connected Communities,
  Innovative Technologies: OpenStack, oVirt, and KVM
  http://www.youtube.com/watch?v=Pg7ShV-HvCE

- fog/foreman by Ohad Levy (fog supports oVirt)
  http://www.youtube.com/watch?v=JgaQ_ekR2JA


3. Conferences

- FOSDEM presentations page uploaded
  http://www.ovirt.org/FOSDEM_2013

- some of the Shanghai presentations uploaded
  http://www.ovirt.org/Intel_Workshop_May_2013

- upcoming - LinuxCon Japan
  oVirt session in LinuxCon Japan (this week)

- upcoming - oVirt Developer days (with KVM Forum)
  Edinburgh, UK - October 21 - 23, 2013


4. Other

- help test the new oVirt installer and developer setup environments
  http://www.ovirt.org/OVirtEngineDevelopmentEnvironment

- RC for oVirt Node 3.0.0 is now available (but not compatible with
  ovirt-engine yet)

Thanks,
   Itamar


Re: [vdsm] Potential Bug in cpopen

2013-05-29 Thread Mark Wu

On Wed 29 May 2013 12:02:47 PM CST, Zhou Zheng Sheng wrote:


Hi,

Recently the Jenkins unit tests sometimes fail on PidStatTests.test, for
example http://gerrit.ovirt.org/#/c/14670/

After the test runs a sleep command through execCmd with sync=False, we
should see the name sleep in /proc/[xxxpid]/stat, but in this case we get
python, which points to a race condition. The most likely scenario is that
execCmd returns before the child process calls execvp on sleep, so the
parent process reads the stat file and sees that the process name is still
python.
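
For reference, the process name checked here is the second field of /proc/<pid>/stat, wrapped in parentheses. A minimal sketch of extracting it follows; parse_stat_comm and read_comm are hypothetical helpers for illustration, not the implementation of utils.pidStat.

```python
def parse_stat_comm(stat_line):
    """Return the command name from one /proc/<pid>/stat line.

    The name is the second field, wrapped in parentheses. It may itself
    contain spaces or parentheses, so split on the first '(' and the
    last ')' rather than on whitespace.
    """
    start = stat_line.index('(') + 1
    end = stat_line.rindex(')')
    return stat_line[start:end]


def read_comm(pid):
    """Read the command name of a live process (Linux only)."""
    with open('/proc/%d/stat' % pid) as f:
        return parse_stat_comm(f.read())
```

If the child has not yet called execvp when read_comm runs, the result is the parent's name (python here) rather than sleep, which is the symptom described above.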

cpopen is designed to avoid this kind of race. It uses a pipe to
synchronize the child and the parent. It sets FD_CLOEXEC on the child
side of the pipe, so that once execvp succeeds, the child's end of the
pipe is closed. If execvp fails, the child writes the error code to the
pipe. The parent reads the other end of the pipe as follows.

if (read(errnofd[0], &childErrno, sizeof(int)) == sizeof(int)) {
    PyErr_SetString(PyExc_OSError, strerror(childErrno));
    goto fail;
}

It assumes that when read returns, the return value is either
sizeof(int), which indicates an error on the child side, or 0, which
indicates that the child side of the pipe is closed and execvp succeeded.
However, this assumption may not always hold. If the parent process gets a
signal, the read call is interrupted and returns -1 with errno set to
EINTR, and the code treats this interruption the same as a successful
execvp. If the system is very busy, like our Jenkins slave executing jobs
concurrently, the parent can be interrupted before the child's execvp
succeeds, so cpopen returns to execCmd and causes the race.

To reproduce this problem, we can add a sleep(10); before the exec
invocation in cpopen.c to simulate a busy or slow system. Then, in a
Python interpreter, register a signal handler and call execCmd.


>>> from vdsm.utils import execCmd
>>> from vdsm import utils
>>> def handler(signum, frame):
...     print 'Signal handler called with signal', signum
...
>>> import signal
>>> signal.signal(signal.SIGALRM, handler)
>>> signal.signal(signal.SIGCHLD, handler)
>>> p = execCmd(['sleep', '3'], sync=False, sudo=False); s = utils.pidStat(p.pid); print s; p.wait()

Then send SIGCHLD or SIGALRM to this Python interpreter process (with
kill -SIGCHLD or kill -SIGALRM), and we see the following output.

Signal handler called with signal 17
(9541, 'python', 'S', 9509, 9509, 6843, 34817, 9509, 4218944, 66, 0, 0,
0, 0, 0, 0, 0, 20, 0, 1, 0, 663767, 217919488, 1676,
18446744073709551615L, 4194304, 4197004, 140735665131088,
140735665124456, 268226504400, 0, 0, 16781312, 73730,
18446744071579398713L, 0, 0, 17, 3, 0, 0, 0, 0, 0, 6294952, 6297616,
19152896, 140735665132425, 140735665132432, 140735665132432,
140735665135592, 0)
True

I have not found any other way to reproduce it, only this method. I am
not sure it is a bug. Is it reasonable to check the read() return value
and retry on EAGAIN/EINTR to fix this?


Good catch! If you can't find out how the race is triggered on the Jenkins
slave, you could just add a patch to fix the EAGAIN/EINTR issue in
cpopen, and add some fake patches to trigger multiple Jenkins jobs.
Then you could see whether the EAGAIN/EINTR fix helps.
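
The retry proposed in this thread can be sketched in Python as below. This is not the cpopen.c fix itself, only an illustration of the same logic under stated assumptions: read_child_errno is a hypothetical name, and the child is assumed to write a native int to the pipe. An interrupted read is retried instead of being mistaken for end-of-file, and only a genuinely empty read counts as "exec succeeded". Python 3.5 and later already retry interrupted system calls internally, so the except branch matters mainly on older interpreters; it is kept to mirror what a C fix would have to do.

```python
import errno
import os
import struct

INT_SIZE = struct.calcsize('i')


def read_child_errno(fd):
    """Return the errno the child wrote, or None if the pipe hit EOF first.

    EOF (an empty read) means the write end was closed, which in the
    protocol above happens when exec succeeds. An interrupted read is
    retried rather than treated as EOF.
    """
    buf = b''
    while len(buf) < INT_SIZE:
        try:
            chunk = os.read(fd, INT_SIZE - len(buf))
        except OSError as e:
            if e.errno == errno.EINTR:
                continue  # interrupted by a signal: try again
            raise
        if not chunk:
            return None  # EOF: child side closed
        buf += chunk
    return struct.unpack('i', buf)[0]
```

For example, after a child writes ENOENT and exits, read_child_errno on the read end returns errno.ENOENT; if the write end is simply closed, it returns None.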





Re: [vdsm] Migration regression on master

2013-05-29 Thread Peter V. Saveliev

On 05/29/2013 07:17 AM, Dan Kenigsberg wrote:

On Tue, May 28, 2013 at 11:54:45AM -0400, Giuseppe Vallarelli wrote:

  | ----- Original Message -----
| | From: Assaf Muller amul...@redhat.com
| | To: Michal Skrivanek michal.skriva...@redhat.com
| | Cc: vdsm-devel@lists.fedorahosted.org Development
| | vdsm-devel@lists.fedorahosted.org
| | Sent: Thursday, May 23, 2013 2:12:47 PM
| | Subject: Re: [vdsm] Migration regression on master
| |
| | As you can see in a previous patch set, I checked whether the alias
| | attribute exists instead of assuming it exists.
| | I then changed my mind, with Dan's blessing, and decided to assume it does
| | exist, exactly for cases like this.
| | Even if we check whether the alias exists, what do we do if we find out it
| | doesn't? We have a problem and need to understand why the alias doesn't
| | exist, because it should, for all devices.
| |
| | We definitely need to deal with this issue. Can you provide the domxml of
| | the VM during creation and during migration?
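
When comparing the two domxml dumps requested above, a small script can list the devices that carry no <alias> element. This is a debugging sketch only, not vdsm code: devices_without_alias and the NO_ALIAS_EXPECTED set are hypothetical, and skipping <emulator> is an assumption (it is a child of <devices> that is not a device with an alias).

```python
import xml.dom.minidom

# Assumption: children of <devices> that are not expected to carry an alias.
NO_ALIAS_EXPECTED = ('emulator',)


def devices_without_alias(domxml):
    """Return the tag names of <devices> children that have no <alias>."""
    dom = xml.dom.minidom.parseString(domxml)
    missing = []
    for devices in dom.getElementsByTagName('devices'):
        for dev in devices.childNodes:
            if dev.nodeType != dev.ELEMENT_NODE:
                continue  # skip whitespace text nodes and comments
            if dev.tagName in NO_ALIAS_EXPECTED:
                continue
            has_alias = any(
                child.nodeType == child.ELEMENT_NODE and
                child.tagName == 'alias'
                for child in dev.childNodes)
            if not has_alias:
                missing.append(dev.tagName)
    return missing
```

Running it on the creation-time and migration-time domxml would show at which point an alias is missing.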

Hi Assaf, Peter reproduced the same error today and gave me the output
log; you can find it here:
http://etherpad.ovirt.org/p/migration-errors


Would you add the original vmCreate line and domxml that were used to
create the VM at the source host?


http://pastebin.test.redhat.com/17
http://pastebin.test.redhat.com/16

--
Peter V. Saveliev


Re: [vdsm] supervdsm broken on master

2013-05-29 Thread Yaniv Bronheim

Hey, I'm happy to see it merged :)

I assumed that we create this directory as part of the spec.
http://gerrit.ovirt.org/15170 fixes the issue.

Also, don't forget to check /var/log/vdsm/supervdsm.log for such errors.

Thanks! 

Yaniv.

----- Original Message -----
 From: Dan Kenigsberg dan...@redhat.com
 To: vdsm-devel@lists.fedorahosted.org
 Cc: Yaniv Bronheim ybron...@redhat.com
 Sent: Wednesday, May 29, 2013 6:31:46 PM
 Subject: supervdsm broken on master
 
 I've just taken Yaniv's http://gerrit.ovirt.org/11051 Supervdsm as
 external service to master, but unfortunately, I decided to test it
 myself only afterwards.
 
 Currently, supervdsm fails to start unless you manually create the
 directory /var/run/vdsm/ as root. The fix should not be complex, but I'm
 on the run. I'm confident that Yaniv would fix it soon.
 
 Regards,
 Dan.
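
The failure described above comes down to a missing runtime directory. As a minimal sketch, independent of how the actual fix was done, a service could make sure its directory exists before using it. ensure_dir is a hypothetical helper, and creating anything under /var/run requires root.

```python
import errno
import os


def ensure_dir(path, mode=0o755):
    """Create path (and parents) if missing; succeed if it already exists."""
    try:
        os.makedirs(path, mode)
    except OSError as e:
        # An already existing directory is fine; anything else is an error.
        if e.errno != errno.EEXIST or not os.path.isdir(path):
            raise
```

A call such as ensure_dir('/var/run/vdsm') at startup would avoid the manual step, assuming the process has the privileges to create it.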
 