Re: [PATCH] fence: introduce a file-based self-fence mechanism

2020-02-05 Thread Felipe Franciosi
Hey, sorry for the delay on following up on this. I picked it up
again. Hopefully we can finalise quickly.

> On Dec 5, 2019, at 2:22 PM, Felipe Franciosi  wrote:
> 
> Heya,
> 
>> On Nov 26, 2019, at 12:18 PM, Marc-André Lureau  
>> wrote:
>> 
>> On Mon, Nov 25, 2019 at 8:14 PM Felipe Franciosi  wrote:
>>> 
>>> This introduces a self-fence mechanism to Qemu, causing it to die if a
>>> heartbeat condition is not met. Currently, a file-based heartbeat is
>>> available and can be configured as follows:
>>> 
>>> -object file-fence,id=ff0,file=/foo,qtimeout=20,ktimeout=25,signal=kill
>>> 
>>> Qemu will watch 'file' for attribute changes. Touching the file works as
>>> a heartbeat. This parameter is mandatory.
>>> 
>>> Fencing happens after 'qtimeout' or 'ktimeout' seconds elapse without a
>>> heartbeat. At least one of these must be specified. Both may be used.
>>> 
>>> When using 'qtimeout', an internal Qemu timer is used. Fencing with this
>>> method gives Qemu a chance to write a log message indicating which file
>>> caused the event. If Qemu's main loop is hung for whatever reason, this
>>> method won't successfully kill Qemu.
>>> 
>>> When using 'ktimeout', a kernel timer is used. In this case, 'signal'
>>> can be 'kill' (for SIGKILL, default) or 'quit' (for SIGQUIT). Using
>>> SIGQUIT may be preferred for obtaining core dumps. If Qemu is hung
>>> (eg. uninterruptable sleep), this method won't successfully kill Qemu.
>>> 
>>> It is worth noting that even successfully killing Qemu may not be
>>> sufficient to completely fence a VM as certain operations like network
>>> packets or block commands may be pending in the kernel. If that is a
>>> concern, systems should consider using further fencing mechanisms like
>>> hardware watchdogs either in addition or in conjunction with this for
>>> additional protection.
>>> 
>>> Signed-off-by: Felipe Franciosi 
>>> ---
>>> Based-on: <20191125153619.39893-2-fel...@nutanix.com>
>>> 
>>> Makefile.objs   |   1 +
>>> fence/Makefile.objs |   1 +
>>> fence/file_fence.c  | 381 
>> 
>> I think it could be under backends/
> 
> I thought about it and couldn't make up my mind. My decision was based on:
> - Doesn't really feel like a "backend".
> - I envision other types of self-fencing heartbeats (eg. network-based),
>  in which case this would probably be split in a "common" file.
> 
> Arguably other objects in backends/ also fall within these categories,
> so I'm happy to move if you think it's better. Let me know.

I changed it to backends/ for v2.

> 
> 
>> And a slight preference for - seperated words in filenames over qemu 
>> codebase.
> 
> Sure, will change.

Done for v2.

> 
>> 
>>> qemu-options.hx |  27 +++-
>>> 4 files changed, 409 insertions(+), 1 deletion(-)
>>> create mode 100644 fence/Makefile.objs
>>> create mode 100644 fence/file_fence.c
>>> 
>>> diff --git a/Makefile.objs b/Makefile.objs
>>> index 11ba1a36bd..998eed4796 100644
>>> --- a/Makefile.objs
>>> +++ b/Makefile.objs
>>> @@ -75,6 +75,7 @@ common-obj-$(CONFIG_TPM) += tpm.o
>>> 
>>> common-obj-y += backends/
>>> common-obj-y += chardev/
>>> +common-obj-y += fence/
>>> 
>>> common-obj-$(CONFIG_SECCOMP) += qemu-seccomp.o
>>> qemu-seccomp.o-cflags := $(SECCOMP_CFLAGS)
>>> diff --git a/fence/Makefile.objs b/fence/Makefile.objs
>>> new file mode 100644
>>> index 00..2ed2092568
>>> --- /dev/null
>>> +++ b/fence/Makefile.objs
>>> @@ -0,0 +1 @@
>>> +common-obj-y += file_fence.o
>>> diff --git a/fence/file_fence.c b/fence/file_fence.c
>>> new file mode 100644
>>> index 00..5b743e69d2
>>> --- /dev/null
>>> +++ b/fence/file_fence.c
>>> @@ -0,0 +1,381 @@
>>> +/*
>>> + * QEMU file-based self-fence mechanism
>>> + *
>>> + * Copyright (c) 2019 Nutanix Inc. All rights reserved.
>>> + *
>>> + * Authors:
>>> + *Felipe Franciosi 
>>> + *
>>> + * This library is free software; you can redistribute it and/or
>>> + * modify it under the terms of the GNU Lesser General Public
>>> + * License as published by the Free Software Foundation; either
>>> + * version 2 of the License, or (at your option) any later version.
>>> + *
>>> + * This library is distributed in the hope that it will be useful,
>>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
>>> + * Lesser General Public License for more details.
>>> + *
>>> + * You should have received a copy of the GNU Lesser General Public
>>> + * License along with this library; if not, see 
>>> >>  >.
>>> + *
>>> + */
>>> +
>>> +#include "qemu/osdep.h"
>>> +#include "qapi/error.h"
>>> +#include "qom/object_interfaces.h"
>>> +#include "qemu/filemonitor.h"
>>> +#include "qemu/timer.h"
>>> +

Re: [PATCH] fence: introduce a file-based self-fence mechanism

2019-12-05 Thread Felipe Franciosi
Heya,

> On Nov 26, 2019, at 12:18 PM, Marc-André Lureau  
> wrote:
> 
> On Mon, Nov 25, 2019 at 8:14 PM Felipe Franciosi  wrote:
>> 
>> This introduces a self-fence mechanism to Qemu, causing it to die if a
>> heartbeat condition is not met. Currently, a file-based heartbeat is
>> available and can be configured as follows:
>> 
>> -object file-fence,id=ff0,file=/foo,qtimeout=20,ktimeout=25,signal=kill
>> 
>> Qemu will watch 'file' for attribute changes. Touching the file works as
>> a heartbeat. This parameter is mandatory.
>> 
>> Fencing happens after 'qtimeout' or 'ktimeout' seconds elapse without a
>> heartbeat. At least one of these must be specified. Both may be used.
>> 
>> When using 'qtimeout', an internal Qemu timer is used. Fencing with this
>> method gives Qemu a chance to write a log message indicating which file
>> caused the event. If Qemu's main loop is hung for whatever reason, this
>> method won't successfully kill Qemu.
>> 
>> When using 'ktimeout', a kernel timer is used. In this case, 'signal'
>> can be 'kill' (for SIGKILL, default) or 'quit' (for SIGQUIT). Using
>> SIGQUIT may be preferred for obtaining core dumps. If Qemu is hung
>> (eg. uninterruptable sleep), this method won't successfully kill Qemu.
>> 
>> It is worth noting that even successfully killing Qemu may not be
>> sufficient to completely fence a VM as certain operations like network
>> packets or block commands may be pending in the kernel. If that is a
>> concern, systems should consider using further fencing mechanisms like
>> hardware watchdogs either in addition or in conjunction with this for
>> additional protection.
>> 
>> Signed-off-by: Felipe Franciosi 
>> ---
>> Based-on: <20191125153619.39893-2-fel...@nutanix.com>
>> 
>> Makefile.objs   |   1 +
>> fence/Makefile.objs |   1 +
>> fence/file_fence.c  | 381 
> 
> I think it could be under backends/

I thought about it and couldn't make up my mind. My decision was based on:
- Doesn't really feel like a "backend".
- I envision other types of self-fencing heartbeats (eg. network-based),
  in which case this would probably be split in a "common" file.

Arguably other objects in backends/ also fall within these categories,
so I'm happy to move if you think it's better. Let me know.


> And a slight preference for - seperated words in filenames over qemu codebase.

Sure, will change.

> 
>> qemu-options.hx |  27 +++-
>> 4 files changed, 409 insertions(+), 1 deletion(-)
>> create mode 100644 fence/Makefile.objs
>> create mode 100644 fence/file_fence.c
>> 
>> diff --git a/Makefile.objs b/Makefile.objs
>> index 11ba1a36bd..998eed4796 100644
>> --- a/Makefile.objs
>> +++ b/Makefile.objs
>> @@ -75,6 +75,7 @@ common-obj-$(CONFIG_TPM) += tpm.o
>> 
>> common-obj-y += backends/
>> common-obj-y += chardev/
>> +common-obj-y += fence/
>> 
>> common-obj-$(CONFIG_SECCOMP) += qemu-seccomp.o
>> qemu-seccomp.o-cflags := $(SECCOMP_CFLAGS)
>> diff --git a/fence/Makefile.objs b/fence/Makefile.objs
>> new file mode 100644
>> index 00..2ed2092568
>> --- /dev/null
>> +++ b/fence/Makefile.objs
>> @@ -0,0 +1 @@
>> +common-obj-y += file_fence.o
>> diff --git a/fence/file_fence.c b/fence/file_fence.c
>> new file mode 100644
>> index 00..5b743e69d2
>> --- /dev/null
>> +++ b/fence/file_fence.c
>> @@ -0,0 +1,381 @@
>> +/*
>> + * QEMU file-based self-fence mechanism
>> + *
>> + * Copyright (c) 2019 Nutanix Inc. All rights reserved.
>> + *
>> + * Authors:
>> + *Felipe Franciosi 
>> + *
>> + * This library is free software; you can redistribute it and/or
>> + * modify it under the terms of the GNU Lesser General Public
>> + * License as published by the Free Software Foundation; either
>> + * version 2 of the License, or (at your option) any later version.
>> + *
>> + * This library is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
>> + * Lesser General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU Lesser General Public
>> + * License along with this library; if not, see 
>> >  >.
>> + *
>> + */
>> +
>> +#include "qemu/osdep.h"
>> +#include "qapi/error.h"
>> +#include "qom/object_interfaces.h"
>> +#include "qemu/filemonitor.h"
>> +#include "qemu/timer.h"
>> +
>> +#include 
>> +
>> +#define TYPE_FILE_FENCE "file-fence"
>> +
>> +typedef struct FileFence {
>> +Object parent_obj;
>> +
>> +gchar *dir;
>> +gchar *file;
>> +uint32_t qtimeout;
>> +uint32_t ktimeout;
>> +int signal;
>> +
>> +timer_t ktimer;
>> +QEMUTimer *qtimer;
>> +
>> +QFileMonitor *fm;
>> +uint64_t id;
>> +} 

Re: [PATCH] fence: introduce a file-based self-fence mechanism

2019-11-26 Thread Marc-André Lureau
On Mon, Nov 25, 2019 at 8:14 PM Felipe Franciosi  wrote:
>
> This introduces a self-fence mechanism to Qemu, causing it to die if a
> heartbeat condition is not met. Currently, a file-based heartbeat is
> available and can be configured as follows:
>
> -object file-fence,id=ff0,file=/foo,qtimeout=20,ktimeout=25,signal=kill
>
> Qemu will watch 'file' for attribute changes. Touching the file works as
> a heartbeat. This parameter is mandatory.
>
> Fencing happens after 'qtimeout' or 'ktimeout' seconds elapse without a
> heartbeat. At least one of these must be specified. Both may be used.
>
> When using 'qtimeout', an internal Qemu timer is used. Fencing with this
> method gives Qemu a chance to write a log message indicating which file
> caused the event. If Qemu's main loop is hung for whatever reason, this
> method won't successfully kill Qemu.
>
> When using 'ktimeout', a kernel timer is used. In this case, 'signal'
> can be 'kill' (for SIGKILL, default) or 'quit' (for SIGQUIT). Using
> SIGQUIT may be preferred for obtaining core dumps. If Qemu is hung
> (eg. uninterruptable sleep), this method won't successfully kill Qemu.
>
> It is worth noting that even successfully killing Qemu may not be
> sufficient to completely fence a VM as certain operations like network
> packets or block commands may be pending in the kernel. If that is a
> concern, systems should consider using further fencing mechanisms like
> hardware watchdogs either in addition or in conjunction with this for
> additional protection.
>
> Signed-off-by: Felipe Franciosi 
> ---
> Based-on: <20191125153619.39893-2-fel...@nutanix.com>
>
>  Makefile.objs   |   1 +
>  fence/Makefile.objs |   1 +
>  fence/file_fence.c  | 381 

I think it could be under backends/
And a slight preference for - seperated words in filenames over qemu codebase.

>  qemu-options.hx |  27 +++-
>  4 files changed, 409 insertions(+), 1 deletion(-)
>  create mode 100644 fence/Makefile.objs
>  create mode 100644 fence/file_fence.c
>
> diff --git a/Makefile.objs b/Makefile.objs
> index 11ba1a36bd..998eed4796 100644
> --- a/Makefile.objs
> +++ b/Makefile.objs
> @@ -75,6 +75,7 @@ common-obj-$(CONFIG_TPM) += tpm.o
>
>  common-obj-y += backends/
>  common-obj-y += chardev/
> +common-obj-y += fence/
>
>  common-obj-$(CONFIG_SECCOMP) += qemu-seccomp.o
>  qemu-seccomp.o-cflags := $(SECCOMP_CFLAGS)
> diff --git a/fence/Makefile.objs b/fence/Makefile.objs
> new file mode 100644
> index 00..2ed2092568
> --- /dev/null
> +++ b/fence/Makefile.objs
> @@ -0,0 +1 @@
> +common-obj-y += file_fence.o
> diff --git a/fence/file_fence.c b/fence/file_fence.c
> new file mode 100644
> index 00..5b743e69d2
> --- /dev/null
> +++ b/fence/file_fence.c
> @@ -0,0 +1,381 @@
> +/*
> + * QEMU file-based self-fence mechanism
> + *
> + * Copyright (c) 2019 Nutanix Inc. All rights reserved.
> + *
> + * Authors:
> + *Felipe Franciosi 
> + *
> + * This library is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU Lesser General Public
> + * License as published by the Free Software Foundation; either
> + * version 2 of the License, or (at your option) any later version.
> + *
> + * This library is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * Lesser General Public License for more details.
> + *
> + * You should have received a copy of the GNU Lesser General Public
> + * License along with this library; if not, see 
> .
> + *
> + */
> +
> +#include "qemu/osdep.h"
> +#include "qapi/error.h"
> +#include "qom/object_interfaces.h"
> +#include "qemu/filemonitor.h"
> +#include "qemu/timer.h"
> +
> +#include 
> +
> +#define TYPE_FILE_FENCE "file-fence"
> +
> +typedef struct FileFence {
> +Object parent_obj;
> +
> +gchar *dir;
> +gchar *file;
> +uint32_t qtimeout;
> +uint32_t ktimeout;
> +int signal;
> +
> +timer_t ktimer;
> +QEMUTimer *qtimer;
> +
> +QFileMonitor *fm;
> +uint64_t id;
> +} FileFence;
> +
> +#define FILE_FENCE(obj) \
> +OBJECT_CHECK(FileFence, (obj), TYPE_FILE_FENCE)
> +
> +static void
> +timer_update(FileFence *ff)
> +{
> +struct itimerspec its = {
> +.it_value.tv_sec = ff->ktimeout,
> +};
> +int err;
> +
> +if (ff->qtimeout) {
> +timer_mod(ff->qtimer, qemu_clock_get_ms(QEMU_CLOCK_REALTIME) +
> +  ff->qtimeout * 1000);
> +}
> +
> +if (ff->ktimeout) {
> +err = timer_settime(ff->ktimer, 0, , NULL);
> +g_assert(err == 0);
> +}
> +}
> +
> +static void
> +file_fence_abort_cb(void *opaque)
> +{
> +FileFence *ff = opaque;
> +printf("Fencing after %u seconds on '%s'\n",
> +   ff->qtimeout, g_strconcat(ff->dir, "/", ff->file, NULL));

May be error_printf() instead.

> +abort();

[PATCH] fence: introduce a file-based self-fence mechanism

2019-11-25 Thread Felipe Franciosi
This introduces a self-fence mechanism to Qemu, causing it to die if a
heartbeat condition is not met. Currently, a file-based heartbeat is
available and can be configured as follows:

-object file-fence,id=ff0,file=/foo,qtimeout=20,ktimeout=25,signal=kill

Qemu will watch 'file' for attribute changes. Touching the file works as
a heartbeat. This parameter is mandatory.

Fencing happens after 'qtimeout' or 'ktimeout' seconds elapse without a
heartbeat. At least one of these must be specified. Both may be used.

When using 'qtimeout', an internal Qemu timer is used. Fencing with this
method gives Qemu a chance to write a log message indicating which file
caused the event. If Qemu's main loop is hung for whatever reason, this
method won't successfully kill Qemu.

When using 'ktimeout', a kernel timer is used. In this case, 'signal'
can be 'kill' (for SIGKILL, default) or 'quit' (for SIGQUIT). Using
SIGQUIT may be preferred for obtaining core dumps. If Qemu is hung
(eg. uninterruptable sleep), this method won't successfully kill Qemu.

It is worth noting that even successfully killing Qemu may not be
sufficient to completely fence a VM as certain operations like network
packets or block commands may be pending in the kernel. If that is a
concern, systems should consider using further fencing mechanisms like
hardware watchdogs either in addition or in conjunction with this for
additional protection.

Signed-off-by: Felipe Franciosi 
---
Based-on: <20191125153619.39893-2-fel...@nutanix.com>

 Makefile.objs   |   1 +
 fence/Makefile.objs |   1 +
 fence/file_fence.c  | 381 
 qemu-options.hx |  27 +++-
 4 files changed, 409 insertions(+), 1 deletion(-)
 create mode 100644 fence/Makefile.objs
 create mode 100644 fence/file_fence.c

diff --git a/Makefile.objs b/Makefile.objs
index 11ba1a36bd..998eed4796 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -75,6 +75,7 @@ common-obj-$(CONFIG_TPM) += tpm.o
 
 common-obj-y += backends/
 common-obj-y += chardev/
+common-obj-y += fence/
 
 common-obj-$(CONFIG_SECCOMP) += qemu-seccomp.o
 qemu-seccomp.o-cflags := $(SECCOMP_CFLAGS)
diff --git a/fence/Makefile.objs b/fence/Makefile.objs
new file mode 100644
index 00..2ed2092568
--- /dev/null
+++ b/fence/Makefile.objs
@@ -0,0 +1 @@
+common-obj-y += file_fence.o
diff --git a/fence/file_fence.c b/fence/file_fence.c
new file mode 100644
index 00..5b743e69d2
--- /dev/null
+++ b/fence/file_fence.c
@@ -0,0 +1,381 @@
+/*
+ * QEMU file-based self-fence mechanism
+ *
+ * Copyright (c) 2019 Nutanix Inc. All rights reserved.
+ *
+ * Authors:
+ *Felipe Franciosi 
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see .
+ *
+ */
+
+#include "qemu/osdep.h"
+#include "qapi/error.h"
+#include "qom/object_interfaces.h"
+#include "qemu/filemonitor.h"
+#include "qemu/timer.h"
+
+#include 
+
+#define TYPE_FILE_FENCE "file-fence"
+
+typedef struct FileFence {
+Object parent_obj;
+
+gchar *dir;
+gchar *file;
+uint32_t qtimeout;
+uint32_t ktimeout;
+int signal;
+
+timer_t ktimer;
+QEMUTimer *qtimer;
+
+QFileMonitor *fm;
+uint64_t id;
+} FileFence;
+
+#define FILE_FENCE(obj) \
+OBJECT_CHECK(FileFence, (obj), TYPE_FILE_FENCE)
+
+static void
+timer_update(FileFence *ff)
+{
+struct itimerspec its = {
+.it_value.tv_sec = ff->ktimeout,
+};
+int err;
+
+if (ff->qtimeout) {
+timer_mod(ff->qtimer, qemu_clock_get_ms(QEMU_CLOCK_REALTIME) +
+  ff->qtimeout * 1000);
+}
+
+if (ff->ktimeout) {
+err = timer_settime(ff->ktimer, 0, , NULL);
+g_assert(err == 0);
+}
+}
+
+static void
+file_fence_abort_cb(void *opaque)
+{
+FileFence *ff = opaque;
+printf("Fencing after %u seconds on '%s'\n",
+   ff->qtimeout, g_strconcat(ff->dir, "/", ff->file, NULL));
+abort();
+}
+
+static void
+file_fence_watch_cb(int64_t id, QFileMonitorEvent ev, const char *file,
+void *opaque)
+{
+FileFence *ff = opaque;
+
+if (ev != QFILE_MONITOR_EVENT_ATTRIBUTES) {
+return;
+}
+
+if (g_strcmp0(file, ff->file) != 0) {
+return;
+}
+
+timer_update(ff);
+}
+
+static void
+ktimer_tear(FileFence *ff)
+{
+int err;
+
+if (ff->ktimer) {
+err = timer_delete(ff->ktimer);
+g_assert(err == 0);
+