Re: [Qemu-devel] QEMU leaves pidfile behind on exit
On Tue, Feb 13, 2018 at 08:35:23PM +0100, Laszlo Ersek wrote:
> On 02/13/18 17:28, Daniel P. Berrangé wrote:
> > On Fri, Feb 09, 2018 at 07:12:59PM +, Shaun Reitan wrote:
> >> QEMU leaves the pidfile behind on a clean exit when using the option
> >> -pidfile /var/run/qemu.pid.
> >>
> >> Should QEMU leave it behind or should it clean up after itself?
> >>
> >> I'm willing to take a crack at a patch to fix the issue, but before I do, I
> >> want to make sure that leaving the pidfile behind was not intentional?
> >
> > If QEMU deletes the pidfile on exit then, with the current pidfile
> > acquisition logic, there's a race condition possible:
> >
> > To acquire we do
> >
> >   1. fd = open()
> >   2. lockf(fd)
> >
> > If the first QEMU that currently owns the pidfile unlinks it, while
> > a second qemu is in between steps 1 & 2, the second QEMU will
> > acquire the pidfile successfully (which is fine) but the pidfile
> > is now unlinked. This is not fine, because a 3rd qemu can now come
> > and try to acquire the pidfile (by creating a new one) and succeed,
> > despite the second qemu still owning the (now unlinked) pidfile.
> >
> > It is possible to deal with this race by making qemu_create_pidfile
> > more intelligent [1]. It would have to do
> >
> >   1. fd = open(filename)
> >   2. fstat(fd)
> >   3. lockf(fd)
> >   4. stat(filename)
> >
> > It must then compare the results of 2 + 4 to ensure the pidfile it
> > acquired is the same as the one on disk. With this change, it would
> > be safe for QEMU to delete the pidfile on exit.
>
> Why don't we just open the pidfile with (O_CREAT | O_EXCL)? O_EXCL is
> supposed to be atomic.

O_EXCL isn't a good idea because if QEMU crashes without cleaning up
you have a stale pidfile and O_EXCL will turn that into a failure to
acquire the pidfile. The key point of using lockf() is to ensure we can
cope reliably with stale pidfiles.

>
> ... The open(2) manual on Linux says,
>
>   On NFS, O_EXCL is supported only when using NFSv3 or
>   later on kernel 2.6 or later. In NFS environments where
>   O_EXCL support is not provided, programs that rely on it
>   for performing locking tasks will contain a race condition.
>   [...]
>
> Sigh.
>
> > [1] See the equiv libvirt logic for pidfile acquisition in
> >
> >   https://libvirt.org/git/?p=libvirt.git;a=blob;f=src/util/virpidfile.c;h=58ab29f77f2cfb8583447112dae77a07446bc627;hb=HEAD#l384
> >
>
> To my knowledge, "same file" should be checked with:
>
>   a.st_dev == b.st_dev && a.st_ino == b.st_ino
>
> Example:
> - "filename" is "/var/run/qemu.pid"
> - "/var/run" is originally a symbolic link to "/mnt/fs1/"
> - between steps #1 and #4, "/var/run" is re-created as a symbolic link
>   to "/mnt/fs2/" -- a different filesystem from fs1
> - "/mnt/fs2/qemu.pid" happens to have the same inode number as
>   "/mnt/fs1/qemu.pid"

I don't really think we need to worry about the admin changing symlinks
like this while QEMU is in the middle of acquiring the PID.

Regards,
Daniel
--
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
Re: [Qemu-devel] QEMU leaves pidfile behind on exit
On 02/13/18 17:28, Daniel P. Berrangé wrote:
> On Fri, Feb 09, 2018 at 07:12:59PM +, Shaun Reitan wrote:
>> QEMU leaves the pidfile behind on a clean exit when using the option
>> -pidfile /var/run/qemu.pid.
>>
>> Should QEMU leave it behind or should it clean up after itself?
>>
>> I'm willing to take a crack at a patch to fix the issue, but before I do, I
>> want to make sure that leaving the pidfile behind was not intentional?
>
> If QEMU deletes the pidfile on exit then, with the current pidfile
> acquisition logic, there's a race condition possible:
>
> To acquire we do
>
>   1. fd = open()
>   2. lockf(fd)
>
> If the first QEMU that currently owns the pidfile unlinks it, while
> a second qemu is in between steps 1 & 2, the second QEMU will
> acquire the pidfile successfully (which is fine) but the pidfile
> is now unlinked. This is not fine, because a 3rd qemu can now come
> and try to acquire the pidfile (by creating a new one) and succeed,
> despite the second qemu still owning the (now unlinked) pidfile.
>
> It is possible to deal with this race by making qemu_create_pidfile
> more intelligent [1]. It would have to do
>
>   1. fd = open(filename)
>   2. fstat(fd)
>   3. lockf(fd)
>   4. stat(filename)
>
> It must then compare the results of 2 + 4 to ensure the pidfile it
> acquired is the same as the one on disk. With this change, it would
> be safe for QEMU to delete the pidfile on exit.

Why don't we just open the pidfile with (O_CREAT | O_EXCL)? O_EXCL is
supposed to be atomic.

... The open(2) manual on Linux says,

  On NFS, O_EXCL is supported only when using NFSv3 or
  later on kernel 2.6 or later. In NFS environments where
  O_EXCL support is not provided, programs that rely on it
  for performing locking tasks will contain a race condition.
  [...]

Sigh.

> [1] See the equiv libvirt logic for pidfile acquisition in
>
>   https://libvirt.org/git/?p=libvirt.git;a=blob;f=src/util/virpidfile.c;h=58ab29f77f2cfb8583447112dae77a07446bc627;hb=HEAD#l384
>

To my knowledge, "same file" should be checked with:

  a.st_dev == b.st_dev && a.st_ino == b.st_ino

Example:
- "filename" is "/var/run/qemu.pid"
- "/var/run" is originally a symbolic link to "/mnt/fs1/"
- between steps #1 and #4, "/var/run" is re-created as a symbolic link
  to "/mnt/fs2/" -- a different filesystem from fs1
- "/mnt/fs2/qemu.pid" happens to have the same inode number as
  "/mnt/fs1/qemu.pid"

Thanks,
Laszlo
Re: [Qemu-devel] QEMU leaves pidfile behind on exit
On Fri, Feb 09, 2018 at 07:12:59PM +, Shaun Reitan wrote:
> QEMU leaves the pidfile behind on a clean exit when using the option
> -pidfile /var/run/qemu.pid.
>
> Should QEMU leave it behind or should it clean up after itself?
>
> I'm willing to take a crack at a patch to fix the issue, but before I do, I
> want to make sure that leaving the pidfile behind was not intentional?

If QEMU deletes the pidfile on exit then, with the current pidfile
acquisition logic, there's a race condition possible:

To acquire we do

  1. fd = open()
  2. lockf(fd)

If the first QEMU that currently owns the pidfile unlinks it, while
a second qemu is in between steps 1 & 2, the second QEMU will
acquire the pidfile successfully (which is fine) but the pidfile
is now unlinked. This is not fine, because a 3rd qemu can now come
and try to acquire the pidfile (by creating a new one) and succeed,
despite the second qemu still owning the (now unlinked) pidfile.

It is possible to deal with this race by making qemu_create_pidfile
more intelligent [1]. It would have to do

  1. fd = open(filename)
  2. fstat(fd)
  3. lockf(fd)
  4. stat(filename)

It must then compare the results of 2 + 4 to ensure the pidfile it
acquired is the same as the one on disk. With this change, it would
be safe for QEMU to delete the pidfile on exit.

Regards,
Daniel

[1] See the equiv libvirt logic for pidfile acquisition in

  https://libvirt.org/git/?p=libvirt.git;a=blob;f=src/util/virpidfile.c;h=58ab29f77f2cfb8583447112dae77a07446bc627;hb=HEAD#l384

--
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
[Qemu-devel] QEMU leaves pidfile behind on exit
QEMU leaves the pidfile behind on a clean exit when using the option
-pidfile /var/run/qemu.pid.

Should QEMU leave it behind or should it clean up after itself?

I'm willing to take a crack at a patch to fix the issue, but before I do, I
want to make sure that leaving the pidfile behind was not intentional?

--
Shaun Reitan
NDCHost.com