On 02/13/18 17:28, Daniel P. Berrangé wrote:
> On Fri, Feb 09, 2018 at 07:12:59PM +0000, Shaun Reitan wrote:
>> QEMU leaves the pidfile behind on a clean exit when using the option
>> -pidfile /var/run/qemu.pid.
>> Should QEMU leave it behind or should it clean up after itself?
>> I'm willing to take a crack at a patch to fix the issue, but before I do, I
>> want to make sure that leaving the pidfile behind was not intentional?
> If QEMU deletes the pidfile on exit then, with the current pidfile
> acquisition logic, there's a race condition possible:
> To acquire we do
> 1. fd = open()
> 2. lockf(fd)
> If the first QEMU that currently owns the pidfile unlinks it, while
> a second qemu is in between steps 1 & 2, the second QEMU will
> acquire the pidfile successfully (which is fine) but the pidfile
> is now unlinked. This is not fine, because a 3rd qemu can now come
> and try to acquire the pidfile (by creating a new one) and succeed,
> despite the second qemu still owning the (now unlinked) pidfile.
> It is possible to deal with this race by making qemu_create_pidfile
> more intelligent. It would have to do
> 1. fd = open(filename)
> 2. fstat(fd)
> 3. lockf(fd)
> 4. stat(filename)
> It must then compare the results of 2 + 4 to ensure the pidfile it
> acquired is the same as the one on disk. With this change, it would
> be safe for QEMU to delete the pidfile on exit.
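For reference, the four steps above could be sketched roughly as follows. This is only an illustration, not QEMU's actual qemu_create_pidfile: the name acquire_pidfile, the non-blocking F_TLOCK, the 0644 mode, and the retry loop are all assumptions of mine.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not QEMU code). Returns a locked file descriptor
 * on success, or -1 if the pidfile is held by someone else or on error.
 */
static int acquire_pidfile(const char *path)
{
    assert(path != NULL);

    for (;;) {
        struct stat fd_st, path_st;
        char buf[32];
        int len;

        /* 1. open, creating the file if it does not exist yet */
        int fd = open(path, O_CREAT | O_WRONLY, 0644);
        if (fd < 0) {
            return -1;
        }

        /* 2. remember which file the descriptor refers to */
        if (fstat(fd, &fd_st) < 0) {
            close(fd);
            return -1;
        }

        /* 3. take the lock without blocking; failure means it is owned */
        if (lockf(fd, F_TLOCK, 0) < 0) {
            close(fd);
            return -1;
        }

        /* 4. look the name up again and compare with step 2 */
        if (stat(path, &path_st) < 0) {
            int saved = errno;
            close(fd);
            if (saved == ENOENT) {
                continue;   /* unlinked between steps 1 and 4: retry */
            }
            return -1;
        }
        if (fd_st.st_dev != path_st.st_dev ||
            fd_st.st_ino != path_st.st_ino) {
            close(fd);      /* we locked a stale, replaced file: retry */
            continue;
        }

        /* The locked file is the one on disk; record our PID in it. */
        len = snprintf(buf, sizeof(buf), "%ld\n", (long)getpid());
        if (ftruncate(fd, 0) < 0 || write(fd, buf, (size_t)len) != len) {
            close(fd);
            return -1;
        }
        return fd;
    }
}
```

The caller would keep the returned descriptor open for the lifetime of the process, since closing it drops the lock.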
Why don't we just open the pidfile with (O_CREAT | O_EXCL)? O_EXCL is
supposed to be atomic.
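What I have in mind is roughly the following. It is a minimal sketch only; the helper name create_pidfile_excl and the 0644 mode are made up for illustration.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not QEMU code). Creates the pidfile only if it
 * does not exist yet. Returns the descriptor on success; returns -1 with
 * errno == EEXIST if another process created the file first.
 */
static int create_pidfile_excl(const char *path)
{
    char buf[32];
    int len, fd;

    assert(path != NULL);

    /* O_CREAT | O_EXCL: existence check and creation are one atomic step */
    fd = open(path, O_CREAT | O_EXCL | O_WRONLY, 0644);
    if (fd < 0) {
        return -1;
    }

    len = snprintf(buf, sizeof(buf), "%ld\n", (long)getpid());
    if (write(fd, buf, (size_t)len) != len) {
        int saved = errno;
        close(fd);
        unlink(path);   /* do not leave a half-written pidfile behind */
        errno = saved;
        return -1;
    }
    return fd;
}
```

One consequence of this approach: a pidfile left behind by a crashed process would make every later start fail with EEXIST until someone removes it.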
... The open(2) manual on Linux says,
On NFS, O_EXCL is supported only when using NFSv3 or
later on kernel 2.6 or later. In NFS environments where
O_EXCL support is not provided, programs that rely on it
for performing locking tasks will contain a race condition.
>  See the equiv libvirt logic for pidfile acquisition in
To my knowledge, "same file" should be checked with:
a.st_dev == b.st_dev && a.st_ino == b.st_ino
- "filename" is "/var/run/qemu.pid"
- "/var/run" is originally a symbolic link to "/mnt/fs1/"
- between steps #1 and #4, "/var/run" is re-created as a symbolic link
to "/mnt/fs2/" -- a different filesystem from fs1
- "/mnt/fs2/qemu.pid" happens to have the same inode number as