Re: [Qemu-devel] [PATCH v3 3/3] hostmem-file: add 'sync' option

2018-01-24 Thread Haozhong Zhang
On 01/24/18 22:23 +0200, Michael S. Tsirkin wrote:
> On Wed, Jan 17, 2018 at 04:13:25PM +0800, Haozhong Zhang wrote:
> > This option controls whether QEMU mmap(2) the memory backend file with
> > MAP_SYNC flag, which can fully guarantee the guest write persistence
> > to the backend, if MAP_SYNC flag is supported by the host kernel
> > (Linux kernel 4.15 and later) and the backend is a file supporting
> > DAX (e.g., file on ext4/xfs file system mounted with '-o dax').
> > 
> > It can take one of following values:
> >  - on:  try to pass MAP_SYNC to mmap(2); if MAP_SYNC is not supported or
> > 'share=off', QEMU will abort
> >  - off: never pass MAP_SYNC to mmap(2)
> >  - auto (default): if MAP_SYNC is supported and 'share=on', work as if
> > 'sync=on'; otherwise, work as if 'sync=off'
> > 
> > Signed-off-by: Haozhong Zhang 
> > Suggested-by: Eduardo Habkost 

[..]
> >  
> >  @table @option
> >  
> > -@item -object 
> > memory-backend-file,id=@var{id},size=@var{size},mem-path=@var{dir},share=@var{on|off},discard-data=@var{on|off},merge=@var{on|off},dump=@var{on|off},prealloc=@var{on|off},host-nodes=@var{host-nodes},policy=@var{default|preferred|bind|interleave},align=@var{align}
> > +@item -object 
> > memory-backend-file,id=@var{id},size=@var{size},mem-path=@var{dir},share=@var{on|off},discard-data=@var{on|off},merge=@var{on|off},dump=@var{on|off},prealloc=@var{on|off},host-nodes=@var{host-nodes},policy=@var{default|preferred|bind|interleave},align=@var{align},sync=@var{on|off|auto}
> >  
> >  Creates a memory file backend object, which can be used to back
> >  the guest RAM with huge pages.
> > @@ -4034,6 +4034,25 @@ requires an alignment different than the default one 
> > used by QEMU, eg
> >  the device DAX /dev/dax0.0 requires 2M alignment rather than 4K. In
> >  such cases, users can specify the required alignment via this option.
> >  
> > +The @option{sync} option specifies whether QEMU mmap(2) @option{mem-path}
> > +with MAP_SYNC flag, which can fully guarantee the guest write
> > +persistence to @option{mem-path}.
> 
> I would add ... even in case of a host power loss.
> Here and wherever you say "fully".

Without MAP_SYNC, QEMU can only guarantee the guest data is written to
the host NVDIMM after, for example, guest clwb+sfence. However, if
some host file system meta data of the mapped file have not been
written back to the host NVDIMM when a host power failure happens, the
mapped file may be broken though all its data may be still there.

Anyway, I'll remove the confusing word "fully" and add your suggestion.

Thanks,
Haozhong

> 
> > MAP_SYNC requires supports from both
> > +the host kernel (since Linux kernel 4.15) and @option{mem-path} (only
> > +files supporting DAX). It can take one of following values:
> > +
> > +@table @option
> > +@item @var{on}
> > +try to pass MAP_SYNC to mmap(2); if MAP_SYNC is not supported or
> > +@option{share}=@var{off}, QEMU will abort
> > +
> > +@item @var{off}
> > +never pass MAP_SYNC to mmap(2)
> > +
> > +@item @var{auto} (default)
> > +if MAP_SYNC is supported and @option{share}=@var{on}, work as if
> > +@option{sync}=@var{on}; otherwise, work as if @option{sync}=@var{off}
> > +@end table
> > +
> >  @item -object 
> > memory-backend-ram,id=@var{id},merge=@var{on|off},dump=@var{on|off},prealloc=@var{on|off},size=@var{size},host-nodes=@var{host-nodes},policy=@var{default|preferred|bind|interleave}
> >  
> >  Creates a memory backend object, which can be used to back the guest RAM.
> > -- 
> > 2.14.1



Re: [Qemu-devel] [PATCH v3 3/3] hostmem-file: add 'sync' option

2018-01-24 Thread Michael S. Tsirkin
On Wed, Jan 17, 2018 at 04:13:25PM +0800, Haozhong Zhang wrote:
> This option controls whether QEMU mmap(2) the memory backend file with
> MAP_SYNC flag, which can fully guarantee the guest write persistence
> to the backend, if MAP_SYNC flag is supported by the host kernel
> (Linux kernel 4.15 and later) and the backend is a file supporting
> DAX (e.g., file on ext4/xfs file system mounted with '-o dax').
> 
> It can take one of following values:
>  - on:  try to pass MAP_SYNC to mmap(2); if MAP_SYNC is not supported or
> 'share=off', QEMU will abort
>  - off: never pass MAP_SYNC to mmap(2)
>  - auto (default): if MAP_SYNC is supported and 'share=on', work as if
> 'sync=on'; otherwise, work as if 'sync=off'
> 
> Signed-off-by: Haozhong Zhang 
> Suggested-by: Eduardo Habkost 
> ---
>  backends/hostmem-file.c | 40 +++-
>  docs/nvdimm.txt | 15 ++-
>  exec.c  | 13 -
>  include/exec/memory.h   |  4 
>  include/exec/ram_addr.h |  6 +++---
>  memory.c|  6 --
>  numa.c  |  2 +-
>  qemu-options.hx | 21 -
>  8 files changed, 93 insertions(+), 14 deletions(-)
> 
> diff --git a/backends/hostmem-file.c b/backends/hostmem-file.c
> index ed7d145365..96dff38619 100644
> --- a/backends/hostmem-file.c
> +++ b/backends/hostmem-file.c
> @@ -15,6 +15,7 @@
>  #include "sysemu/hostmem.h"
>  #include "sysemu/sysemu.h"
>  #include "qom/object_interfaces.h"
> +#include "qapi-visit.h"
>  
>  /* hostmem-file.c */
>  /**
> @@ -35,6 +36,7 @@ struct HostMemoryBackendFile {
>  bool discard_data;
>  char *mem_path;
>  uint64_t align;
> +OnOffAuto sync;
>  };
>  
>  static void
> @@ -60,7 +62,7 @@ file_backend_memory_alloc(HostMemoryBackend *backend, Error 
> **errp)
>  memory_region_init_ram_from_file(>mr, OBJECT(backend),
>   path,
>   backend->size, fb->align, fb->share,
> - fb->mem_path, errp);
> + fb->sync, fb->mem_path, errp);
>  g_free(path);
>  }
>  #endif
> @@ -153,6 +155,39 @@ static void file_memory_backend_set_align(Object *o, 
> Visitor *v,
>  error_propagate(errp, local_err);
>  }
>  
> +static void file_memory_backend_get_sync(
> +Object *obj, Visitor *v, const char *name, void *opaque, Error **errp)
> +{
> +HostMemoryBackendFile *fb = MEMORY_BACKEND_FILE(obj);
> +OnOffAuto value = fb->sync;
> +
> +visit_type_OnOffAuto(v, name, , errp);
> +}
> +
> +static void file_memory_backend_set_sync(
> +Object *obj, Visitor *v, const char *name, void *opaque, Error **errp)
> +{
> +HostMemoryBackend *backend = MEMORY_BACKEND(obj);
> +HostMemoryBackendFile *fb = MEMORY_BACKEND_FILE(obj);
> +Error *local_err = NULL;
> +OnOffAuto value;
> +
> +if (host_memory_backend_mr_inited(backend)) {
> +error_setg(_err, "cannot change property '%s' of %s '%s'",
> +   name, object_get_typename(obj), backend->id);
> +goto out;
> +}
> +
> +visit_type_OnOffAuto(v, name, , _err);
> +if (local_err) {
> +goto out;
> +}
> +fb->sync = value;
> +
> + out:
> +error_propagate(errp, local_err);
> +}
> +
>  static void file_backend_unparent(Object *obj)
>  {
>  HostMemoryBackend *backend = MEMORY_BACKEND(obj);
> @@ -187,6 +222,9 @@ file_backend_class_init(ObjectClass *oc, void *data)
>  file_memory_backend_get_align,
>  file_memory_backend_set_align,
>  NULL, NULL, _abort);
> +object_class_property_add(oc, "sync", "OnOffAuto",
> +file_memory_backend_get_sync, file_memory_backend_set_sync,
> +NULL, NULL, _abort);
>  }
>  
>  static void file_backend_instance_finalize(Object *o)
> diff --git a/docs/nvdimm.txt b/docs/nvdimm.txt
> index e903d8bb09..49b174fe66 100644
> --- a/docs/nvdimm.txt
> +++ b/docs/nvdimm.txt
> @@ -143,10 +143,23 @@ Guest Data Persistence
>  --
>  
>  Though QEMU supports multiple types of vNVDIMM backends on Linux,
> -currently the only one that can guarantee the guest write persistence
> +if MAP_SYNC is not supported by the host kernel and the backends,
> +the only backend that can guarantee the guest write persistence
>  is the device DAX on the real NVDIMM device (e.g., /dev/dax0.0), to
>  which all guest access do not involve any host-side kernel cache.
>  
> +mmap(2) flag MAP_SYNC is added since Linux kernel 4.15. On such
> +systems, QEMU can mmap(2) the backend with MAP_SYNC, which can
> +guarantee the guest write persistence to vNVDIMM. Besides the host
> +kernel support, enabling MAP_SYNC in QEMU also requires:
> +
> + - the backend is a file supporting DAX, e.g., a file on an ext4 or
> +   xfs file system mounted with "-o dax",
> +
> + - 'sync' option of memory-backend-file is not 'off', and

Re: [Qemu-devel] [PATCH v3 3/3] hostmem-file: add 'sync' option

2018-01-24 Thread Michael S. Tsirkin
On Wed, Jan 17, 2018 at 04:13:25PM +0800, Haozhong Zhang wrote:
> This option controls whether QEMU mmap(2) the memory backend file with
> MAP_SYNC flag, which can fully guarantee the guest write persistence
> to the backend, if MAP_SYNC flag is supported by the host kernel
> (Linux kernel 4.15 and later) and the backend is a file supporting
> DAX (e.g., file on ext4/xfs file system mounted with '-o dax').
> 
> It can take one of following values:
>  - on:  try to pass MAP_SYNC to mmap(2); if MAP_SYNC is not supported or
> 'share=off', QEMU will abort
>  - off: never pass MAP_SYNC to mmap(2)
>  - auto (default): if MAP_SYNC is supported and 'share=on', work as if
> 'sync=on'; otherwise, work as if 'sync=off'
> 
> Signed-off-by: Haozhong Zhang 
> Suggested-by: Eduardo Habkost 

Reviewed-by: Michael S. Tsirkin 


> ---
>  backends/hostmem-file.c | 40 +++-
>  docs/nvdimm.txt | 15 ++-
>  exec.c  | 13 -
>  include/exec/memory.h   |  4 
>  include/exec/ram_addr.h |  6 +++---
>  memory.c|  6 --
>  numa.c  |  2 +-
>  qemu-options.hx | 21 -
>  8 files changed, 93 insertions(+), 14 deletions(-)
> 
> diff --git a/backends/hostmem-file.c b/backends/hostmem-file.c
> index ed7d145365..96dff38619 100644
> --- a/backends/hostmem-file.c
> +++ b/backends/hostmem-file.c
> @@ -15,6 +15,7 @@
>  #include "sysemu/hostmem.h"
>  #include "sysemu/sysemu.h"
>  #include "qom/object_interfaces.h"
> +#include "qapi-visit.h"
>  
>  /* hostmem-file.c */
>  /**
> @@ -35,6 +36,7 @@ struct HostMemoryBackendFile {
>  bool discard_data;
>  char *mem_path;
>  uint64_t align;
> +OnOffAuto sync;
>  };
>  
>  static void
> @@ -60,7 +62,7 @@ file_backend_memory_alloc(HostMemoryBackend *backend, Error 
> **errp)
>  memory_region_init_ram_from_file(>mr, OBJECT(backend),
>   path,
>   backend->size, fb->align, fb->share,
> - fb->mem_path, errp);
> + fb->sync, fb->mem_path, errp);
>  g_free(path);
>  }
>  #endif
> @@ -153,6 +155,39 @@ static void file_memory_backend_set_align(Object *o, 
> Visitor *v,
>  error_propagate(errp, local_err);
>  }
>  
> +static void file_memory_backend_get_sync(
> +Object *obj, Visitor *v, const char *name, void *opaque, Error **errp)
> +{
> +HostMemoryBackendFile *fb = MEMORY_BACKEND_FILE(obj);
> +OnOffAuto value = fb->sync;
> +
> +visit_type_OnOffAuto(v, name, , errp);
> +}
> +
> +static void file_memory_backend_set_sync(
> +Object *obj, Visitor *v, const char *name, void *opaque, Error **errp)
> +{
> +HostMemoryBackend *backend = MEMORY_BACKEND(obj);
> +HostMemoryBackendFile *fb = MEMORY_BACKEND_FILE(obj);
> +Error *local_err = NULL;
> +OnOffAuto value;
> +
> +if (host_memory_backend_mr_inited(backend)) {
> +error_setg(_err, "cannot change property '%s' of %s '%s'",
> +   name, object_get_typename(obj), backend->id);
> +goto out;
> +}
> +
> +visit_type_OnOffAuto(v, name, , _err);
> +if (local_err) {
> +goto out;
> +}
> +fb->sync = value;
> +
> + out:
> +error_propagate(errp, local_err);
> +}
> +
>  static void file_backend_unparent(Object *obj)
>  {
>  HostMemoryBackend *backend = MEMORY_BACKEND(obj);
> @@ -187,6 +222,9 @@ file_backend_class_init(ObjectClass *oc, void *data)
>  file_memory_backend_get_align,
>  file_memory_backend_set_align,
>  NULL, NULL, _abort);
> +object_class_property_add(oc, "sync", "OnOffAuto",
> +file_memory_backend_get_sync, file_memory_backend_set_sync,
> +NULL, NULL, _abort);
>  }
>  
>  static void file_backend_instance_finalize(Object *o)
> diff --git a/docs/nvdimm.txt b/docs/nvdimm.txt
> index e903d8bb09..49b174fe66 100644
> --- a/docs/nvdimm.txt
> +++ b/docs/nvdimm.txt
> @@ -143,10 +143,23 @@ Guest Data Persistence
>  --
>  
>  Though QEMU supports multiple types of vNVDIMM backends on Linux,
> -currently the only one that can guarantee the guest write persistence
> +if MAP_SYNC is not supported by the host kernel and the backends,
> +the only backend that can guarantee the guest write persistence
>  is the device DAX on the real NVDIMM device (e.g., /dev/dax0.0), to
>  which all guest access do not involve any host-side kernel cache.
>  
> +mmap(2) flag MAP_SYNC is added since Linux kernel 4.15. On such
> +systems, QEMU can mmap(2) the backend with MAP_SYNC, which can
> +guarantee the guest write persistence to vNVDIMM. Besides the host
> +kernel support, enabling MAP_SYNC in QEMU also requires:
> +
> + - the backend is a file supporting DAX, e.g., a file on an ext4 or
> +   xfs file system mounted with "-o dax",
> +
> + -