Re: [libvirt] [PATCH] virtlockd: acquire locks on re-exec

2018-05-29 Thread Jim Fehlig

On 05/28/2018 04:34 AM, Matthew Richardson wrote:

Locks held by virtlockd are dropped on re-exec.

virtlockd   94306 POSIX 5.4G WRITE 0 0   0 /tmp/test.qcow2
virtlockd   94306 POSIX   5B WRITE 0 0   0 /run/virtlockd.pid
virtlockd   94306 POSIX   5B WRITE 0 0   0 /run/virtlockd.pid

Acquire locks in PostExecRestart code path.

Signed-off-by: Jim Fehlig 


I've just encountered this same unexpected behaviour (currently on RHEL7
box running libvirt 3.2), and found this thread while searching for a
bug report.

Has this patch been accepted, or the issue fixed in a later release? I
notice there was some confusion in the conversation following as to the
cause of this issue/the fix for it.


The problem with losing fcntl locks on execve was worked around in libvirt with 
this patch series (included in libvirt 4.2.0)


https://www.redhat.com/archives/libvir-list/2018-March/msg00298.html


If not, is there a bug open against this which I can follow/comment on?


There's an open bug to track the lost locks

https://bugzilla.redhat.com/show_bug.cgi?id=1552621

Regards,
Jim

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [PATCH] virtlockd: acquire locks on re-exec

2018-05-29 Thread Matthew Richardson
> Locks held by virtlockd are dropped on re-exec.
>
> virtlockd   94306 POSIX 5.4G WRITE 0 0   0 /tmp/test.qcow2
> virtlockd   94306 POSIX   5B WRITE 0 0   0 /run/virtlockd.pid
> virtlockd   94306 POSIX   5B WRITE 0 0   0 /run/virtlockd.pid
>
> Acquire locks in PostExecRestart code path.
>
> Signed-off-by: Jim Fehlig 

I've just encountered this same unexpected behaviour (currently on RHEL7
box running libvirt 3.2), and found this thread while searching for a
bug report.

Has this patch been accepted, or the issue fixed in a later release? I
notice there was some confusion in the conversation following as to the
cause of this issue/the fix for it.

If not, is there a bug open against this which I can follow/comment on?

Cheers,

Matthew



signature.asc
Description: OpenPGP digital signature
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] [PATCH] virtlockd: acquire locks on re-exec

2018-03-05 Thread Daniel P . Berrangé
On Fri, Mar 02, 2018 at 03:24:53PM -0700, Jim Fehlig wrote:
> On 03/02/2018 11:12 AM, Daniel P. Berrangé wrote:
> > On Fri, Mar 02, 2018 at 04:52:23PM +, Daniel P. Berrangé wrote:
> > > On Thu, Mar 01, 2018 at 04:42:36PM -0700, Jim Fehlig wrote:
> > > > Locks held by virtlockd are dropped on re-exec.
> > > > 
> > > > virtlockd   94306 POSIX 5.4G WRITE 0 0   0 /tmp/test.qcow2
> > > > virtlockd   94306 POSIX   5B WRITE 0 0   0 /run/virtlockd.pid
> > > > virtlockd   94306 POSIX   5B WRITE 0 0   0 /run/virtlockd.pid
> > > > 
> > > > Acquire locks in PostExecRestart code path.
> > > 
> > > This is really strange and should *not* be happening.  POSIX locks
> > > are supposed to be preserved across execve() if the FD has CLOEXEC
> > > unset, and you don't fork() before the exec.
> > 
> > [snip]
> > 
> > > So I wonder what we've screwed up to cause the locks to get
> > > released - reaquiring them definitely isn't desirable as we
> > > should not loose them in the first place !
> > 
> > This is really very strange. The problem seems to be the existance of
> > threads at time of execve().
> > 
> > If you spawn a thread and the thread exits, and you execve the locks
> > are preserved.
> > 
> > If you spawn a thread and the thread is still running, and you execve
> > the locks are lost.
> 
> Indeed you are correct. I'm seeing the same behavior with the below
> modifications to your demo. The lock is preserved after execve when
> BREAK_FLOCK is 0, but removed when BREAK_FLOCK is 1.

I tried RHEL7 vintage kernel and saw the same behaviour, so this seems
to be a fairly long standing issue.

In theory we can make virNetServer single-threaded by passing max_workers==0
when creating it, but I just tried that and hit deadlock, so it seems that
code has bit-rotted. If I we can fix that, then I think leaving it single
threaded is the easiest way to deal with this. Failing that, we would have
to arrange to cleanly terminate all threads before re-exec'ing.


Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] [PATCH] virtlockd: acquire locks on re-exec

2018-03-02 Thread Jim Fehlig

On 03/02/2018 11:12 AM, Daniel P. Berrangé wrote:

On Fri, Mar 02, 2018 at 04:52:23PM +, Daniel P. Berrangé wrote:

On Thu, Mar 01, 2018 at 04:42:36PM -0700, Jim Fehlig wrote:

Locks held by virtlockd are dropped on re-exec.

virtlockd   94306 POSIX 5.4G WRITE 0 0   0 /tmp/test.qcow2
virtlockd   94306 POSIX   5B WRITE 0 0   0 /run/virtlockd.pid
virtlockd   94306 POSIX   5B WRITE 0 0   0 /run/virtlockd.pid

Acquire locks in PostExecRestart code path.


This is really strange and should *not* be happening.  POSIX locks
are supposed to be preserved across execve() if the FD has CLOEXEC
unset, and you don't fork() before the exec.


[snip]


So I wonder what we've screwed up to cause the locks to get
released - reaquiring them definitely isn't desirable as we
should not loose them in the first place !


This is really very strange. The problem seems to be the existance of
threads at time of execve().

If you spawn a thread and the thread exits, and you execve the locks
are preserved.

If you spawn a thread and the thread is still running, and you execve
the locks are lost.


Indeed you are correct. I'm seeing the same behavior with the below 
modifications to your demo. The lock is preserved after execve when BREAK_FLOCK 
is 0, but removed when BREAK_FLOCK is 1.


--- lock.c  2018-03-02 15:10:59.200154182 -0700
+++ lock-thr.c  2018-03-02 15:14:30.501441105 -0700
@@ -4,6 +4,15 @@
 #include 
 #include 

+#define BREAK_FLOCK 1
+
+static void *thr_func(void *arg)
+{
+#if BREAK_FLOCK == 1
+while (1)
+#endif
+  sleep(5);
+}

 int main(int argc, char **argv) {

@@ -33,6 +42,13 @@
 sleep(50);

   } else {
+pthread_t thr;
+
+if (pthread_create(&thr, NULL, thr_func, NULL) != 0) {
+  fprintf(stderr, "pthread_create failed\n");
+  abort();
+}
+
 int fd = open("lock.txt", O_WRONLY|O_CREAT|O_TRUNC, 0755);
 if (fd < 0)
   abort();


This behaviour makes no sense at all to time. Why should it matter if
the thread exits itself, or is force exited during execve(). I wonder
if it is even possibly a kernel bug.


I'll attach the reproducer to an internal bug (sorry!), but will report back 
here with any findings.


Regards,
Jim

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] [PATCH] virtlockd: acquire locks on re-exec

2018-03-02 Thread Daniel P . Berrangé
On Fri, Mar 02, 2018 at 04:52:23PM +, Daniel P. Berrangé wrote:
> On Thu, Mar 01, 2018 at 04:42:36PM -0700, Jim Fehlig wrote:
> > Locks held by virtlockd are dropped on re-exec.
> > 
> > virtlockd   94306 POSIX 5.4G WRITE 0 0   0 /tmp/test.qcow2
> > virtlockd   94306 POSIX   5B WRITE 0 0   0 /run/virtlockd.pid
> > virtlockd   94306 POSIX   5B WRITE 0 0   0 /run/virtlockd.pid
> > 
> > Acquire locks in PostExecRestart code path.
> 
> This is really strange and should *not* be happening.  POSIX locks
> are supposed to be preserved across execve() if the FD has CLOEXEC
> unset, and you don't fork() before the exec.

[snip]

> So I wonder what we've screwed up to cause the locks to get
> released - reaquiring them definitely isn't desirable as we
> should not loose them in the first place !

This is really very strange. The problem seems to be the existance of
threads at time of execve().

If you spawn a thread and the thread exits, and you execve the locks
are preserved.

If you spawn a thread and the thread is still running, and you execve
the locks are lost.

This behaviour makes no sense at all to time. Why should it matter if
the thread exits itself, or is force exited during execve(). I wonder
if it is even possibly a kernel bug.

Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] [PATCH] virtlockd: acquire locks on re-exec

2018-03-02 Thread Daniel P . Berrangé
On Thu, Mar 01, 2018 at 04:42:36PM -0700, Jim Fehlig wrote:
> Locks held by virtlockd are dropped on re-exec.
> 
> virtlockd   94306 POSIX 5.4G WRITE 0 0   0 /tmp/test.qcow2
> virtlockd   94306 POSIX   5B WRITE 0 0   0 /run/virtlockd.pid
> virtlockd   94306 POSIX   5B WRITE 0 0   0 /run/virtlockd.pid
> 
> Acquire locks in PostExecRestart code path.

This is really strange and should *not* be happening.  POSIX locks
are supposed to be preserved across execve() if the FD has CLOEXEC
unset, and you don't fork() before the exec.

eg see this demo program:

#include 
#include 
#include 
#include 
#include 


int main(int argc, char **argv) {

  if (argc == 2) {
int fd = atoi(argv[1]);

struct flock fl = {
  .l_type = F_WRLCK,
  .l_whence = SEEK_SET,
  .l_start = 0,
  .l_len = 42,
};

if (fcntl(fd, F_GETLK, &fl) < 0)
  abort();

int flags;
if ((flags = fcntl(fd, F_GETFD)) < 0)
  abort();
flags |= FD_CLOEXEC;

if (fcntl(fd, F_SETFD, flags) < 0)
  abort();

fprintf(stderr, "Owned %d\n", fl.l_pid);
fprintf(stderr, "Execd\n");
sleep(50);

  } else {
int fd = open("lock.txt", O_WRONLY|O_CREAT|O_TRUNC, 0755);
if (fd < 0)
  abort();

struct flock fl = {
  .l_type = F_WRLCK,
  .l_whence = SEEK_SET,
  .l_start = 0,
  .l_len = 42,
};

if (fcntl(fd, F_SETLK, &fl) < 0)
  abort();

int flags;

if ((flags = fcntl(fd, F_GETFD)) < 0)
  abort();
flags &= ~FD_CLOEXEC;

if (fcntl(fd, F_SETFD, flags) < 0)
  abort();

char fdstr[100];
snprintf(fdstr, sizeof(fdstr), "%d", fd);

char *newargv[] = { argv[0], fdstr, NULL };

fprintf(stderr, "Waiting\n");
sleep(10);
execve(argv[0], newargv, NULL);
  }
}



If you run this, you'll see the lock still exists after execveI().

So I wonder what we've screwed up to cause the locks to get
released - reaquiring them definitely isn't desirable as we
should not loose them in the first place !

> 
> Signed-off-by: Jim Fehlig 
> ---
> 
> The CLOEXEC flag is set in virLockSpaceNewPostExecRestart(), so I assume
> it is fine to call virFileLock() here as well.
> 
>  src/util/virlockspace.c | 16 
>  1 file changed, 16 insertions(+)
> 
> diff --git a/src/util/virlockspace.c b/src/util/virlockspace.c
> index 41af0cdb6..420878b0a 100644
> --- a/src/util/virlockspace.c
> +++ b/src/util/virlockspace.c
> @@ -337,6 +337,7 @@ virLockSpacePtr 
> virLockSpaceNewPostExecRestart(virJSONValuePtr object)
>  virJSONValuePtr owners;
>  size_t j;
>  ssize_t m;
> +bool shared = false;
>  
>  if (VIR_ALLOC(res) < 0)
>  goto error;
> @@ -389,6 +390,21 @@ virLockSpacePtr 
> virLockSpaceNewPostExecRestart(virJSONValuePtr object)
>  goto error;
>  }
>  
> +shared = !!(res->flags & VIR_LOCK_SPACE_ACQUIRE_SHARED);
> +if (virFileLock(res->fd, shared, 0, 1, false) < 0) {
> +if (errno == EACCES || errno == EAGAIN) {
> +virReportError(VIR_ERR_RESOURCE_BUSY,
> +   _("Lockspace resource '%s' is locked"),
> +   res->name);
> +} else {
> +virReportSystemError(errno,
> + _("Unable to acquire lock on '%s'"),
> + res->path);
> +}
> +virLockSpaceResourceFree(res);
> +goto error;
> +}
> +
>  if (!(owners = virJSONValueObjectGet(child, "owners"))) {
>  virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
> _("Missing resource owners in JSON document"));
> -- 
> 2.16.2
> 
> --
> libvir-list mailing list
> libvir-list@redhat.com
> https://www.redhat.com/mailman/listinfo/libvir-list

Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [PATCH] virtlockd: acquire locks on re-exec

2018-03-01 Thread Martin Kletzander

On Thu, Mar 01, 2018 at 04:42:36PM -0700, Jim Fehlig wrote:

Locks held by virtlockd are dropped on re-exec.

virtlockd   94306 POSIX 5.4G WRITE 0 0   0 /tmp/test.qcow2
virtlockd   94306 POSIX   5B WRITE 0 0   0 /run/virtlockd.pid
virtlockd   94306 POSIX   5B WRITE 0 0   0 /run/virtlockd.pid

Acquire locks in PostExecRestart code path.

Signed-off-by: Jim Fehlig 
---

The CLOEXEC flag is set in virLockSpaceNewPostExecRestart(), so I assume
it is fine to call virFileLock() here as well.



I'm not that familiar with the internals of virtlockd, but I can't think
of what commands would the daemon run.  So can't we just keep the fd's
without O_CLOEXEC?  If not, then we could at least remove the flag
before re-exec (in PreExec) and then set it again back after (in
PostExec).  But more opinions might be better.

Martin


signature.asc
Description: Digital signature
--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list