[Qemu-devel] [PATCH V13 5/5] docs: Added MAP_SYNC documentation

2019-02-07 Thread Zhang, Yi
From: Zhang Yi 

Signed-off-by: Zhang Yi 
---
 docs/nvdimm.txt | 22 +++---
 qemu-options.hx |  5 +
 2 files changed, 24 insertions(+), 3 deletions(-)

diff --git a/docs/nvdimm.txt b/docs/nvdimm.txt
index 5f158a6..e70f28b 100644
--- a/docs/nvdimm.txt
+++ b/docs/nvdimm.txt
@@ -143,9 +143,25 @@ Guest Data Persistence
 --
 
 Though QEMU supports multiple types of vNVDIMM backends on Linux,
-currently the only one that can guarantee the guest write persistence
-is the device DAX on the real NVDIMM device (e.g., /dev/dax0.0), to
-which all guest access do not involve any host-side kernel cache.
+the only backends that can guarantee guest write persistence are:
+
+A. DAX device (e.g., /dev/dax0.0), or
+B. DAX file (on a file system mounted with the dax option)
+
+When using B (a file supporting direct mapping of persistent memory)
+as a backend, write persistence is guaranteed if the host kernel has
+support for the MAP_SYNC flag in the mmap system call (available
+since Linux 4.15 and on certain distro kernels) and additionally
+both 'pmem' and 'share' flags are set to 'on' on the backend.
+
+If these conditions are not satisfied, i.e. if either 'pmem' or 'share'
+is not set, if the backend file does not support DAX, or if MAP_SYNC
+is not supported by the host kernel, write persistence is not
+guaranteed after a system crash. For compatibility reasons, these
+conditions are silently ignored if not satisfied. Currently, no way
+is provided to test for them.
+For more details, please refer to the mmap(2) man page:
+http://man7.org/linux/man-pages/man2/mmap.2.html.
 
 When using other types of backends, it's suggested to set 'unarmed'
 option of '-device nvdimm' to 'on', which sets the unarmed flag of the
diff --git a/qemu-options.hx b/qemu-options.hx
index 08f8516..ef1da8f 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -4002,6 +4002,11 @@ using the SNIA NVM programming model (e.g. Intel NVDIMM).
 If @option{pmem} is set to 'on', QEMU will take necessary operations to
 guarantee the persistence of its own writes to @option{mem-path}
 (e.g. in vNVDIMM label emulation and live migration).
+In addition, QEMU maps the backend file with the MAP_SYNC flag, which
+ensures the file metadata is in sync for @option{mem-path} in case of a
+host crash or a power failure. MAP_SYNC requires support from both the
+host kernel (since Linux kernel 4.15) and the file system of
+@option{mem-path}, which must be mounted with the DAX option.
 
 @item -object memory-backend-ram,id=@var{id},merge=@var{on|off},dump=@var{on|off},share=@var{on|off},prealloc=@var{on|off},size=@var{size},host-nodes=@var{host-nodes},policy=@var{default|preferred|bind|interleave}
 
-- 
2.7.4




[Qemu-devel] [PATCH V13 5/5] docs: Added MAP_SYNC documentation

2019-02-07 Thread Zhang, Yi
From: Zhang Yi 

Signed-off-by: Zhang Yi 
---
 docs/nvdimm.txt | 25 ++---
 qemu-options.hx |  5 +
 2 files changed, 27 insertions(+), 3 deletions(-)

diff --git a/docs/nvdimm.txt b/docs/nvdimm.txt
index 5f158a6..a168429 100644
--- a/docs/nvdimm.txt
+++ b/docs/nvdimm.txt
@@ -143,9 +143,28 @@ Guest Data Persistence
 --
 
 Though QEMU supports multiple types of vNVDIMM backends on Linux,
-currently the only one that can guarantee the guest write persistence
-is the device DAX on the real NVDIMM device (e.g., /dev/dax0.0), to
-which all guest access do not involve any host-side kernel cache.
+the only backends that can guarantee guest write persistence are:
+
+A. DAX device (e.g., /dev/dax0.0), or
+B. DAX file (on a file system mounted with the dax option)
+
+Both use a real NVDIMM device as the backend, which supports direct
+access for files (no page cache).
+
+When using B (a file supporting direct mapping of persistent memory)
+as a backend, write persistence is guaranteed if the host kernel has
+support for the MAP_SYNC flag in the mmap system call (available
+since Linux 4.15 and on certain distro kernels) and additionally
+both 'pmem' and 'share' flags are set to 'on' on the backend.
+
+If these conditions are not satisfied, i.e. if either 'pmem' or 'share'
+is not set, if the backend file does not support DAX, or if MAP_SYNC
+is not supported by the host kernel, write persistence is not
+guaranteed after a system crash. For compatibility reasons, these
+conditions are silently ignored if not satisfied. Currently, no way
+is provided to test for them.
+For more details, please refer to the mmap(2) man page:
+http://man7.org/linux/man-pages/man2/mmap.2.html.
 
 When using other types of backends, it's suggested to set 'unarmed'
 option of '-device nvdimm' to 'on', which sets the unarmed flag of the
diff --git a/qemu-options.hx b/qemu-options.hx
index 08f8516..ef1da8f 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -4002,6 +4002,11 @@ using the SNIA NVM programming model (e.g. Intel NVDIMM).
 If @option{pmem} is set to 'on', QEMU will take necessary operations to
 guarantee the persistence of its own writes to @option{mem-path}
 (e.g. in vNVDIMM label emulation and live migration).
+In addition, QEMU maps the backend file with the MAP_SYNC flag, which
+ensures the file metadata is in sync for @option{mem-path} in case of a
+host crash or a power failure. MAP_SYNC requires support from both the
+host kernel (since Linux kernel 4.15) and the file system of
+@option{mem-path}, which must be mounted with the DAX option.
 
 @item -object memory-backend-ram,id=@var{id},merge=@var{on|off},dump=@var{on|off},share=@var{on|off},prealloc=@var{on|off},size=@var{size},host-nodes=@var{host-nodes},policy=@var{default|preferred|bind|interleave}
 
-- 
2.7.4




[Qemu-devel] [PATCH V13 4/5] util/mmap-alloc: support MAP_SYNC in qemu_ram_mmap()

2019-02-07 Thread Zhang, Yi
From: Zhang Yi 

When a file supporting DAX is used as a vNVDIMM backend, mmap it with
the MAP_SYNC flag as well. This ensures that file system metadata is
synced on each guest write to the backend file, without other QEMU
actions (e.g., periodic fsync() by QEMU).

Currently, the possible use cases are:

1. pmem=on is set, share=on is set, MAP_SYNC is supported:
   a: the backend is a file that supports DAX.
      - MAP_SYNC will be active.
   b: the backend is not a file that supports DAX.
      - mmap will trigger a warning, then the MAP_SYNC flag will be
        ignored.

2. All other cases:
   - MAP_SYNC is never passed to mmap(2).
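
The flag selection and fallback in the use cases above can be sketched as a
standalone C fragment. This is a minimal sketch, not the code from this patch:
choose_mmap_flags() and map_backend() are hypothetical helper names, the
fallback #define values are the asm-generic ones and are assumed only for
libcs whose <sys/mman.h> lacks them, and the reserved-region/MAP_FIXED
handling of qemu_ram_mmap() is left out.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/mman.h>

/* Assumed fallback values (asm-generic), used only if the libc lacks them. */
#ifndef MAP_SHARED_VALIDATE
#define MAP_SHARED_VALIDATE 0x03
#endif
#ifndef MAP_SYNC
#define MAP_SYNC 0x080000
#endif

/* Hypothetical helper: case 1 requests MAP_SYNC, every other case does not. */
int choose_mmap_flags(bool shared, bool is_pmem)
{
    if (shared && is_pmem) {
        /* MAP_SHARED_VALIDATE makes the kernel reject unknown flags. */
        return MAP_SHARED_VALIDATE | MAP_SYNC;
    }
    return shared ? MAP_SHARED : MAP_PRIVATE;
}

/* Hypothetical helper: map fd, retrying without MAP_SYNC if it is refused. */
void *map_backend(int fd, size_t size, bool shared, bool is_pmem,
                  bool *sync_active)
{
    int anon = (fd == -1) ? MAP_ANONYMOUS : 0;
    int flags = choose_mmap_flags(shared, is_pmem);
    bool want_sync = (flags & MAP_SYNC) != 0;
    void *p;

    assert(sync_active != NULL);
    p = mmap(NULL, size, PROT_READ | PROT_WRITE, flags | anon, fd, 0);
    *sync_active = want_sync && p != MAP_FAILED;

    if (p == MAP_FAILED && want_sync) {
        if (errno == EOPNOTSUPP) {
            /* Case 1b: the file does not support DAX. */
            fprintf(stderr, "warning: MAP_SYNC not supported for fd %d, "
                    "continuing without it\n", fd);
        }
        /* Compatibility path: plain shared mapping, no persistence guarantee. */
        p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED | anon, fd, 0);
    }
    return p;
}
```

A caller would check the returned pointer against MAP_FAILED as usual and can
read *sync_active to learn whether the MAP_SYNC mapping actually took effect.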

Signed-off-by: Haozhong Zhang 
Signed-off-by: Zhang Yi 
---
 util/mmap-alloc.c | 45 -
 1 file changed, 44 insertions(+), 1 deletion(-)

diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c
index 97bbeed..2f21efd 100644
--- a/util/mmap-alloc.c
+++ b/util/mmap-alloc.c
@@ -10,6 +10,13 @@
  * later.  See the COPYING file in the top-level directory.
  */
 
+#ifdef CONFIG_LINUX
+#include 
+#else  /* !CONFIG_LINUX */
+#define MAP_SYNC  0x0
+#define MAP_SHARED_VALIDATE   0x0
+#endif /* CONFIG_LINUX */
+
 #include "qemu/osdep.h"
 #include "qemu/mmap-alloc.h"
 #include "qemu/host-utils.h"
@@ -101,6 +108,8 @@ void *qemu_ram_mmap(int fd,
 #else
     void *ptr = mmap(0, total, PROT_NONE, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
 #endif
+    int mmap_flags;
+    int map_sync_flags = 0;
     size_t offset;
     void *ptr1;
 
@@ -111,13 +120,47 @@ void *qemu_ram_mmap(int fd,
     assert(is_power_of_2(align));
     /* Always align to host page size */
     assert(align >= getpagesize());
+    mmap_flags = shared ? MAP_SHARED : MAP_PRIVATE;
+    if (shared && is_pmem) {
+        map_sync_flags = MAP_SYNC | MAP_SHARED_VALIDATE;
+        mmap_flags |= map_sync_flags;
+    }
 
     offset = QEMU_ALIGN_UP((uintptr_t)ptr, align) - (uintptr_t)ptr;
     ptr1 = mmap(ptr + offset, size, PROT_READ | PROT_WRITE,
                 MAP_FIXED |
                 (fd == -1 ? MAP_ANONYMOUS : 0) |
-                (shared ? MAP_SHARED : MAP_PRIVATE),
+                mmap_flags,
                 fd, 0);
+
+
+    if (ptr1 == MAP_FAILED && map_sync_flags) {
+        if (errno == ENOTSUP) {
+            char *proc_link, *file_name;
+            int len;
+            proc_link = g_strdup_printf("/proc/self/fd/%d", fd);
+            file_name = g_malloc0(PATH_MAX);
+            len = readlink(proc_link, file_name, PATH_MAX - 1);
+            if (len < 0) {
+                len = 0;
+            }
+            file_name[len] = '\0';
+            fprintf(stderr, "Warning: requesting persistence across crashes "
+                    "for backend file %s failed. Proceeding without "
+                    "persistence, data might become corrupted in case of host "
+                    "crash.\n", file_name);
+            g_free(proc_link);
+            g_free(file_name);
+        }
+        /* if map failed with MAP_SHARED_VALIDATE | MAP_SYNC,
+         * we will remove these flags to handle compatibility.
+         */
+        ptr1 = mmap(ptr + offset, size, PROT_READ | PROT_WRITE,
+                    MAP_FIXED |
+                    (fd == -1 ? MAP_ANONYMOUS : 0) |
+                    MAP_SHARED,
+                    fd, 0);
+    }
     if (ptr1 == MAP_FAILED) {
         munmap(ptr, total);
         return MAP_FAILED;
-- 
2.7.4




[Qemu-devel] [PATCH V13 2/5] scripts/update-linux-headers: add linux/mman.h

2019-02-07 Thread Zhang, Yi
From: Zhang Yi 

Add linux/mman.h, asm/mman.h and asm-generic/mman-common.h to
linux-headers, so that we can use more mmap(2) flags.

Signed-off-by: Zhang Yi 
---
 scripts/update-linux-headers.sh | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/scripts/update-linux-headers.sh b/scripts/update-linux-headers.sh
index 0a964fe..57db5d9 100755
--- a/scripts/update-linux-headers.sh
+++ b/scripts/update-linux-headers.sh
@@ -95,7 +95,7 @@ for arch in $ARCHLIST; do
 
 rm -rf "$output/linux-headers/asm-$arch"
 mkdir -p "$output/linux-headers/asm-$arch"
-for header in kvm.h unistd.h bitsperlong.h; do
+for header in kvm.h unistd.h bitsperlong.h mman.h; do
 cp "$tmpdir/include/asm/$header" "$output/linux-headers/asm-$arch"
 done
 
@@ -126,13 +126,13 @@ done
 rm -rf "$output/linux-headers/linux"
 mkdir -p "$output/linux-headers/linux"
 for header in kvm.h vfio.h vfio_ccw.h vhost.h \
-  psci.h psp-sev.h userfaultfd.h; do
+  psci.h psp-sev.h userfaultfd.h mman.h; do
 cp "$tmpdir/include/linux/$header" "$output/linux-headers/linux"
 done
 
 rm -rf "$output/linux-headers/asm-generic"
 mkdir -p "$output/linux-headers/asm-generic"
-for header in unistd.h bitsperlong.h; do
+for header in unistd.h bitsperlong.h mman-common.h mman.h hugetlb_encode.h; do
 cp "$tmpdir/include/asm-generic/$header" "$output/linux-headers/asm-generic"
 done
 
-- 
2.7.4




[Qemu-devel] [PATCH V13 1/5] util/mmap-alloc: Add a 'is_pmem' parameter to qemu_ram_mmap

2019-02-07 Thread Zhang, Yi
From: Zhang Yi 

Besides the existing 'shared' flag, add an 'is_pmem' parameter to
qemu_ram_mmap(), which indicates that the memory backend file is
persistent memory.
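
The call sites in this patch pass masked flag words such as
`block->flags & RAM_PMEM` straight into the bool parameters, which relies on
C99 bool conversion turning any non-zero value into true. A minimal sketch of
that behaviour follows; the bit values are illustrative only and
describe_mapping() is a hypothetical stand-in for qemu_ram_mmap(), neither is
taken from the QEMU tree.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit positions only; the real values live in QEMU's headers. */
#define DEMO_RAM_SHARED (1u << 1)
#define DEMO_RAM_PMEM   (1u << 5)

/* Hypothetical stand-in for qemu_ram_mmap(): encodes what it was asked to do. */
int describe_mapping(bool shared, bool is_pmem)
{
    /* A bool parameter is always 0 or 1, whatever bit the caller masked. */
    assert(shared == 0 || shared == 1);
    assert(is_pmem == 0 || is_pmem == 1);
    return (shared ? 1 : 0) | (is_pmem ? 2 : 0);
}

/* Mirrors the call-site pattern: pass masked flags directly as bools. */
int describe_block(uint32_t flags)
{
    return describe_mapping(flags & DEMO_RAM_SHARED, flags & DEMO_RAM_PMEM);
}
```

If the parameters were a narrower integer type instead of bool, a flag in a
high bit position could be truncated away at the call.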

Signed-off-by: Haozhong Zhang 
Signed-off-by: Zhang Yi 
Reviewed-by: Pankaj Gupta 
---
 exec.c|  2 +-
 include/qemu/mmap-alloc.h | 21 -
 util/mmap-alloc.c |  6 +-
 util/oslib-posix.c|  2 +-
 4 files changed, 27 insertions(+), 4 deletions(-)

diff --git a/exec.c b/exec.c
index bb6170d..27cea52 100644
--- a/exec.c
+++ b/exec.c
@@ -1860,7 +1860,7 @@ static void *file_ram_alloc(RAMBlock *block,
 }
 
 area = qemu_ram_mmap(fd, memory, block->mr->align,
- block->flags & RAM_SHARED);
+ block->flags & RAM_SHARED, block->flags & RAM_PMEM);
 if (area == MAP_FAILED) {
 error_setg_errno(errp, errno,
  "unable to map backing store for guest RAM");
diff --git a/include/qemu/mmap-alloc.h b/include/qemu/mmap-alloc.h
index 50385e3..190688a 100644
--- a/include/qemu/mmap-alloc.h
+++ b/include/qemu/mmap-alloc.h
@@ -7,7 +7,26 @@ size_t qemu_fd_getpagesize(int fd);
 
 size_t qemu_mempath_getpagesize(const char *mem_path);
 
-void *qemu_ram_mmap(int fd, size_t size, size_t align, bool shared);
+/**
+ * qemu_ram_mmap: mmap the specified file or device.
+ *
+ * Parameters:
+ *  @fd: the file or the device to mmap
+ *  @size: the number of bytes to be mmaped
+ *  @align: if not zero, specify the alignment of the starting mapping address;
+ *  otherwise, the alignment in use will be determined by QEMU.
+ *  @shared: map has RAM_SHARED flag.
+ *  @is_pmem: map has RAM_PMEM flag.
+ *
+ * Return:
+ *  On success, return a pointer to the mapped area.
+ *  On failure, return MAP_FAILED.
+ */
+void *qemu_ram_mmap(int fd,
+size_t size,
+size_t align,
+bool shared,
+bool is_pmem);
 
 void qemu_ram_munmap(void *ptr, size_t size);
 
diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c
index fd329ec..97bbeed 100644
--- a/util/mmap-alloc.c
+++ b/util/mmap-alloc.c
@@ -75,7 +75,11 @@ size_t qemu_mempath_getpagesize(const char *mem_path)
 return getpagesize();
 }
 
-void *qemu_ram_mmap(int fd, size_t size, size_t align, bool shared)
+void *qemu_ram_mmap(int fd,
+size_t size,
+size_t align,
+bool shared,
+bool is_pmem)
 {
 /*
  * Note: this always allocates at least one extra page of virtual address
diff --git a/util/oslib-posix.c b/util/oslib-posix.c
index fbd0dc8..040937f 100644
--- a/util/oslib-posix.c
+++ b/util/oslib-posix.c
@@ -203,7 +203,7 @@ void *qemu_memalign(size_t alignment, size_t size)
 void *qemu_anon_ram_alloc(size_t size, uint64_t *alignment, bool shared)
 {
 size_t align = QEMU_VMALLOC_ALIGN;
-void *ptr = qemu_ram_mmap(-1, size, align, shared);
+void *ptr = qemu_ram_mmap(-1, size, align, shared, false);
 
 if (ptr == MAP_FAILED) {
 return NULL;
-- 
2.7.4




[Qemu-devel] [PATCH V13 3/5] linux-headers: add linux/mman.h.

2019-02-07 Thread Zhang, Yi
From: Zhang Yi 

Update it to 4.20-rc1

Signed-off-by: Zhang Yi 
---
 linux-headers/asm-arm/mman.h   |   4 ++
 linux-headers/asm-arm64/mman.h |   1 +
 linux-headers/asm-generic/hugetlb_encode.h |  36 ++
 linux-headers/asm-generic/mman-common.h|  77 
 linux-headers/asm-generic/mman.h   |  24 +++
 linux-headers/asm-mips/mman.h  | 108 +
 linux-headers/asm-powerpc/mman.h   |  39 +++
 linux-headers/asm-s390/mman.h  |   1 +
 linux-headers/asm-x86/mman.h   |  31 +
 linux-headers/linux/mman.h |  38 ++
 10 files changed, 359 insertions(+)
 create mode 100644 linux-headers/asm-arm/mman.h
 create mode 100644 linux-headers/asm-arm64/mman.h
 create mode 100644 linux-headers/asm-generic/hugetlb_encode.h
 create mode 100644 linux-headers/asm-generic/mman-common.h
 create mode 100644 linux-headers/asm-generic/mman.h
 create mode 100644 linux-headers/asm-mips/mman.h
 create mode 100644 linux-headers/asm-powerpc/mman.h
 create mode 100644 linux-headers/asm-s390/mman.h
 create mode 100644 linux-headers/asm-x86/mman.h
 create mode 100644 linux-headers/linux/mman.h

diff --git a/linux-headers/asm-arm/mman.h b/linux-headers/asm-arm/mman.h
new file mode 100644
index 000..41f99c5
--- /dev/null
+++ b/linux-headers/asm-arm/mman.h
@@ -0,0 +1,4 @@
+#include 
+
+#define arch_mmap_check(addr, len, flags) \
+   (((flags) & MAP_FIXED && (addr) < FIRST_USER_ADDRESS) ? -EINVAL : 0)
diff --git a/linux-headers/asm-arm64/mman.h b/linux-headers/asm-arm64/mman.h
new file mode 100644
index 000..8eebf89
--- /dev/null
+++ b/linux-headers/asm-arm64/mman.h
@@ -0,0 +1 @@
+#include 
diff --git a/linux-headers/asm-generic/hugetlb_encode.h b/linux-headers/asm-generic/hugetlb_encode.h
new file mode 100644
index 000..b0f8e87
--- /dev/null
+++ b/linux-headers/asm-generic/hugetlb_encode.h
@@ -0,0 +1,36 @@
+#ifndef _ASM_GENERIC_HUGETLB_ENCODE_H_
+#define _ASM_GENERIC_HUGETLB_ENCODE_H_
+
+/*
+ * Several system calls take a flag to request "hugetlb" huge pages.
+ * Without further specification, these system calls will use the
+ * system's default huge page size.  If a system supports multiple
+ * huge page sizes, the desired huge page size can be specified in
+ * bits [26:31] of the flag arguments.  The value in these 6 bits
+ * will encode the log2 of the huge page size.
+ *
+ * The following definitions are associated with this huge page size
+ * encoding in flag arguments.  System call specific header files
+ * that use this encoding should include this file.  They can then
+ * provide definitions based on these with their own specific prefix.
+ * for example:
+ * #define MAP_HUGE_SHIFT HUGETLB_FLAG_ENCODE_SHIFT
+ */
+
+#define HUGETLB_FLAG_ENCODE_SHIFT  26
+#define HUGETLB_FLAG_ENCODE_MASK   0x3f
+
+#define HUGETLB_FLAG_ENCODE_64KB   (16 << HUGETLB_FLAG_ENCODE_SHIFT)
+#define HUGETLB_FLAG_ENCODE_512KB  (19 << HUGETLB_FLAG_ENCODE_SHIFT)
+#define HUGETLB_FLAG_ENCODE_1MB    (20 << HUGETLB_FLAG_ENCODE_SHIFT)
+#define HUGETLB_FLAG_ENCODE_2MB    (21 << HUGETLB_FLAG_ENCODE_SHIFT)
+#define HUGETLB_FLAG_ENCODE_8MB    (23 << HUGETLB_FLAG_ENCODE_SHIFT)
+#define HUGETLB_FLAG_ENCODE_16MB   (24 << HUGETLB_FLAG_ENCODE_SHIFT)
+#define HUGETLB_FLAG_ENCODE_32MB   (25 << HUGETLB_FLAG_ENCODE_SHIFT)
+#define HUGETLB_FLAG_ENCODE_256MB  (28 << HUGETLB_FLAG_ENCODE_SHIFT)
+#define HUGETLB_FLAG_ENCODE_512MB  (29 << HUGETLB_FLAG_ENCODE_SHIFT)
+#define HUGETLB_FLAG_ENCODE_1GB    (30 << HUGETLB_FLAG_ENCODE_SHIFT)
+#define HUGETLB_FLAG_ENCODE_2GB    (31 << HUGETLB_FLAG_ENCODE_SHIFT)
+#define HUGETLB_FLAG_ENCODE_16GB   (34 << HUGETLB_FLAG_ENCODE_SHIFT)
+
+#endif /* _ASM_GENERIC_HUGETLB_ENCODE_H_ */
diff --git a/linux-headers/asm-generic/mman-common.h b/linux-headers/asm-generic/mman-common.h
new file mode 100644
index 000..e7ee328
--- /dev/null
+++ b/linux-headers/asm-generic/mman-common.h
@@ -0,0 +1,77 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+#ifndef __ASM_GENERIC_MMAN_COMMON_H
+#define __ASM_GENERIC_MMAN_COMMON_H
+
+/*
+ Author: Michael S. Tsirkin , Mellanox Technologies Ltd.
+ Based on: asm-xxx/mman.h
+*/
+
+#define PROT_READ  0x1 /* page can be read */
+#define PROT_WRITE 0x2 /* page can be written */
+#define PROT_EXEC  0x4 /* page can be executed */
+#define PROT_SEM   0x8 /* page may be used for atomic ops */
+#define PROT_NONE  0x0 /* page can not be accessed */
+#define PROT_GROWSDOWN 0x01000000  /* mprotect flag: extend change to start of growsdown vma */
+#define PROT_GROWSUP   0x02000000  /* mprotect flag: extend change to end of growsup vma */
+
+#define MAP_

[Qemu-devel] [PATCH V13 0/5] support MAP_SYNC for memory-backend-file

2019-02-07 Thread Zhang, Yi
mechanism. MAP_SYNC will be ignored
   by Linux kernel 4.15 if MAP_SHARED_VALIDATE is missing.
 * Patch 1: define MAP_SYNC and MAP_SHARED_VALIDATE as 0 on non-Linux
   platforms in order to make qemu_ram_mmap() compile on those platforms.
 * Patch 2&3: include more information in error messages of
   memory-backend in the hope of helping users identify the error.
   (Dr. David Alan Gilbert)
 * Patch 3: fix typo in the commit message. (Dr. David Alan Gilbert)

Changes in v2:
 * Add 'sync' option to control the use of MAP_SYNC. (Eduardo Habkost)
 * Remove the unnecessary set of MAP_SHARED_VALIDATE in some cases and
   the retry mechanism in qemu_ram_mmap(). (Michael S. Tsirkin)
 * Move OS dependent definitions of MAP_SYNC and MAP_SHARED_VALIDATE
   to osdep.h. (Michael S. Tsirkin)
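
Defining both flags as 0 where the platform headers lack them means they can
be OR-ed into the mmap flags unconditionally and simply become no-ops there.
A small sketch of that idea follows; add_sync_flags() and
map_sync_compiled_in() are hypothetical names used only for illustration, and
the #ifndef guards are an assumption of this sketch rather than the exact
preprocessor test used in the series.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdbool.h>
#include <sys/mman.h>

/* Platforms without the flags get harmless zero definitions. */
#ifndef MAP_SYNC
#define MAP_SYNC 0x0
#endif
#ifndef MAP_SHARED_VALIDATE
#define MAP_SHARED_VALIDATE 0x0
#endif

/* Hypothetical helper: true when the build knows a real MAP_SYNC value. */
bool map_sync_compiled_in(void)
{
    return MAP_SYNC != 0;
}

/* Hypothetical helper: OR the flags in; a no-op where both are 0. */
int add_sync_flags(int base_flags)
{
    int flags = base_flags | MAP_SYNC | MAP_SHARED_VALIDATE;

    /* The base flags are never lost by adding the optional ones. */
    assert((flags & base_flags) == base_flags);
    return flags;
}
```

With this arrangement the mapping code needs no platform conditionals of its
own; only the definitions differ per platform.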

Zhang Yi (5):
  util/mmap-alloc: Add a 'is_pmem' parameter to qemu_ram_mmap
  scripts/update-linux-headers: add linux/mman.h
  linux-headers: add linux/mman.h.
  util/mmap-alloc: support MAP_SYNC in qemu_ram_mmap()
  docs: Added MAP_SYNC documentation

 docs/nvdimm.txt|  25 ++-
 exec.c |   2 +-
 include/qemu/mmap-alloc.h  |  21 +-
 linux-headers/asm-arm/mman.h   |   4 ++
 linux-headers/asm-arm64/mman.h |   1 +
 linux-headers/asm-generic/hugetlb_encode.h |  36 ++
 linux-headers/asm-generic/mman-common.h|  77 
 linux-headers/asm-generic/mman.h   |  24 +++
 linux-headers/asm-mips/mman.h  | 108 +
 linux-headers/asm-powerpc/mman.h   |  39 +++
 linux-headers/asm-s390/mman.h  |   1 +
 linux-headers/asm-x86/mman.h   |  31 +
 linux-headers/linux/mman.h |  38 ++
 qemu-options.hx|   5 ++
 scripts/update-linux-headers.sh|   6 +-
 util/mmap-alloc.c  |  51 +-
 util/oslib-posix.c |   2 +-
 17 files changed, 460 insertions(+), 11 deletions(-)
 create mode 100644 linux-headers/asm-arm/mman.h
 create mode 100644 linux-headers/asm-arm64/mman.h
 create mode 100644 linux-headers/asm-generic/hugetlb_encode.h
 create mode 100644 linux-headers/asm-generic/mman-common.h
 create mode 100644 linux-headers/asm-generic/mman.h
 create mode 100644 linux-headers/asm-mips/mman.h
 create mode 100644 linux-headers/asm-powerpc/mman.h
 create mode 100644 linux-headers/asm-s390/mman.h
 create mode 100644 linux-headers/asm-x86/mman.h
 create mode 100644 linux-headers/linux/mman.h

-- 
2.7.4




[Qemu-devel] [PATCH V12 2/5] scripts/update-linux-headers: add linux/mman.h

2019-02-06 Thread Zhang, Yi
From: Zhang Yi 

Add linux/mman.h, asm/mman.h and asm-generic/mman-common.h to
linux-headers, so that we can use more mmap(2) flags.

Signed-off-by: Zhang Yi 
---
 scripts/update-linux-headers.sh | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/scripts/update-linux-headers.sh b/scripts/update-linux-headers.sh
index 0a964fe..57db5d9 100755
--- a/scripts/update-linux-headers.sh
+++ b/scripts/update-linux-headers.sh
@@ -95,7 +95,7 @@ for arch in $ARCHLIST; do
 
 rm -rf "$output/linux-headers/asm-$arch"
 mkdir -p "$output/linux-headers/asm-$arch"
-for header in kvm.h unistd.h bitsperlong.h; do
+for header in kvm.h unistd.h bitsperlong.h mman.h; do
 cp "$tmpdir/include/asm/$header" "$output/linux-headers/asm-$arch"
 done
 
@@ -126,13 +126,13 @@ done
 rm -rf "$output/linux-headers/linux"
 mkdir -p "$output/linux-headers/linux"
 for header in kvm.h vfio.h vfio_ccw.h vhost.h \
-  psci.h psp-sev.h userfaultfd.h; do
+  psci.h psp-sev.h userfaultfd.h mman.h; do
 cp "$tmpdir/include/linux/$header" "$output/linux-headers/linux"
 done
 
 rm -rf "$output/linux-headers/asm-generic"
 mkdir -p "$output/linux-headers/asm-generic"
-for header in unistd.h bitsperlong.h; do
+for header in unistd.h bitsperlong.h mman-common.h mman.h hugetlb_encode.h; do
 cp "$tmpdir/include/asm-generic/$header" "$output/linux-headers/asm-generic"
 done
 
-- 
2.7.4




[Qemu-devel] [PATCH V12 0/5] support MAP_SYNC for memory-backend-file

2019-02-06 Thread Zhang, Yi
Linux
   platforms in order to make qemu_ram_mmap() compile on those platforms.
 * Patch 2&3: include more information in error messages of
   memory-backend in the hope of helping users identify the error.
   (Dr. David Alan Gilbert)
 * Patch 3: fix typo in the commit message. (Dr. David Alan Gilbert)

Changes in v2:
 * Add 'sync' option to control the use of MAP_SYNC. (Eduardo Habkost)
 * Remove the unnecessary set of MAP_SHARED_VALIDATE in some cases and
   the retry mechanism in qemu_ram_mmap(). (Michael S. Tsirkin)
 * Move OS dependent definitions of MAP_SYNC and MAP_SHARED_VALIDATE
   to osdep.h. (Michael S. Tsirkin)

Zhang Yi (5):
  util/mmap-alloc: Add a 'is_pmem' parameter to qemu_ram_mmap
  scripts/update-linux-headers: add linux/mman.h
  linux-headers: add linux/mman.h.
  util/mmap-alloc: support MAP_SYNC in qemu_ram_mmap()
  docs: Added MAP_SYNC documentation

 docs/nvdimm.txt|  25 ++-
 exec.c |   2 +-
 include/qemu/mmap-alloc.h  |  21 +-
 include/qemu/osdep.h   |   7 ++
 linux-headers/asm-arm/mman.h   |   4 ++
 linux-headers/asm-arm64/mman.h |   1 +
 linux-headers/asm-generic/hugetlb_encode.h |  36 ++
 linux-headers/asm-generic/mman-common.h|  77 
 linux-headers/asm-generic/mman.h   |  24 +++
 linux-headers/asm-mips/mman.h  | 108 +
 linux-headers/asm-powerpc/mman.h   |  39 +++
 linux-headers/asm-s390/mman.h  |   1 +
 linux-headers/asm-x86/mman.h   |  31 +
 linux-headers/linux/mman.h |  38 ++
 qemu-options.hx|   4 ++
 scripts/update-linux-headers.sh|   6 +-
 util/mmap-alloc.c  |  30 +++-
 util/oslib-posix.c |   2 +-
 18 files changed, 445 insertions(+), 11 deletions(-)
 create mode 100644 linux-headers/asm-arm/mman.h
 create mode 100644 linux-headers/asm-arm64/mman.h
 create mode 100644 linux-headers/asm-generic/hugetlb_encode.h
 create mode 100644 linux-headers/asm-generic/mman-common.h
 create mode 100644 linux-headers/asm-generic/mman.h
 create mode 100644 linux-headers/asm-mips/mman.h
 create mode 100644 linux-headers/asm-powerpc/mman.h
 create mode 100644 linux-headers/asm-s390/mman.h
 create mode 100644 linux-headers/asm-x86/mman.h
 create mode 100644 linux-headers/linux/mman.h

-- 
2.7.4




[Qemu-devel] [PATCH V12 1/5] util/mmap-alloc: Add a 'is_pmem' parameter to qemu_ram_mmap

2019-02-06 Thread Zhang, Yi
From: Zhang Yi 

Besides the existing 'shared' flag, add an 'is_pmem' parameter to
qemu_ram_mmap(), which indicates that the memory backend file is
persistent memory.

Signed-off-by: Haozhong Zhang 
Signed-off-by: Zhang Yi 
---
 exec.c|  2 +-
 include/qemu/mmap-alloc.h | 21 -
 util/mmap-alloc.c |  6 +-
 util/oslib-posix.c|  2 +-
 4 files changed, 27 insertions(+), 4 deletions(-)

diff --git a/exec.c b/exec.c
index bb6170d..27cea52 100644
--- a/exec.c
+++ b/exec.c
@@ -1860,7 +1860,7 @@ static void *file_ram_alloc(RAMBlock *block,
 }
 
 area = qemu_ram_mmap(fd, memory, block->mr->align,
- block->flags & RAM_SHARED);
+ block->flags & RAM_SHARED, block->flags & RAM_PMEM);
 if (area == MAP_FAILED) {
 error_setg_errno(errp, errno,
  "unable to map backing store for guest RAM");
diff --git a/include/qemu/mmap-alloc.h b/include/qemu/mmap-alloc.h
index 50385e3..190688a 100644
--- a/include/qemu/mmap-alloc.h
+++ b/include/qemu/mmap-alloc.h
@@ -7,7 +7,26 @@ size_t qemu_fd_getpagesize(int fd);
 
 size_t qemu_mempath_getpagesize(const char *mem_path);
 
-void *qemu_ram_mmap(int fd, size_t size, size_t align, bool shared);
+/**
+ * qemu_ram_mmap: mmap the specified file or device.
+ *
+ * Parameters:
+ *  @fd: the file or the device to mmap
+ *  @size: the number of bytes to be mmaped
+ *  @align: if not zero, specify the alignment of the starting mapping address;
+ *  otherwise, the alignment in use will be determined by QEMU.
+ *  @shared: map has RAM_SHARED flag.
+ *  @is_pmem: map has RAM_PMEM flag.
+ *
+ * Return:
+ *  On success, return a pointer to the mapped area.
+ *  On failure, return MAP_FAILED.
+ */
+void *qemu_ram_mmap(int fd,
+size_t size,
+size_t align,
+bool shared,
+bool is_pmem);
 
 void qemu_ram_munmap(void *ptr, size_t size);
 
diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c
index fd329ec..97bbeed 100644
--- a/util/mmap-alloc.c
+++ b/util/mmap-alloc.c
@@ -75,7 +75,11 @@ size_t qemu_mempath_getpagesize(const char *mem_path)
 return getpagesize();
 }
 
-void *qemu_ram_mmap(int fd, size_t size, size_t align, bool shared)
+void *qemu_ram_mmap(int fd,
+size_t size,
+size_t align,
+bool shared,
+bool is_pmem)
 {
 /*
  * Note: this always allocates at least one extra page of virtual address
diff --git a/util/oslib-posix.c b/util/oslib-posix.c
index fbd0dc8..040937f 100644
--- a/util/oslib-posix.c
+++ b/util/oslib-posix.c
@@ -203,7 +203,7 @@ void *qemu_memalign(size_t alignment, size_t size)
 void *qemu_anon_ram_alloc(size_t size, uint64_t *alignment, bool shared)
 {
 size_t align = QEMU_VMALLOC_ALIGN;
-void *ptr = qemu_ram_mmap(-1, size, align, shared);
+void *ptr = qemu_ram_mmap(-1, size, align, shared, false);
 
 if (ptr == MAP_FAILED) {
 return NULL;
-- 
2.7.4




[Qemu-devel] [PATCH V12 3/5] linux-headers: add linux/mman.h.

2019-02-06 Thread Zhang, Yi
From: Zhang Yi 

Update it to 4.20-rc1

Signed-off-by: Zhang Yi 
---
 linux-headers/asm-arm/mman.h   |   4 ++
 linux-headers/asm-arm64/mman.h |   1 +
 linux-headers/asm-generic/hugetlb_encode.h |  36 ++
 linux-headers/asm-generic/mman-common.h|  77 
 linux-headers/asm-generic/mman.h   |  24 +++
 linux-headers/asm-mips/mman.h  | 108 +
 linux-headers/asm-powerpc/mman.h   |  39 +++
 linux-headers/asm-s390/mman.h  |   1 +
 linux-headers/asm-x86/mman.h   |  31 +
 linux-headers/linux/mman.h |  38 ++
 10 files changed, 359 insertions(+)
 create mode 100644 linux-headers/asm-arm/mman.h
 create mode 100644 linux-headers/asm-arm64/mman.h
 create mode 100644 linux-headers/asm-generic/hugetlb_encode.h
 create mode 100644 linux-headers/asm-generic/mman-common.h
 create mode 100644 linux-headers/asm-generic/mman.h
 create mode 100644 linux-headers/asm-mips/mman.h
 create mode 100644 linux-headers/asm-powerpc/mman.h
 create mode 100644 linux-headers/asm-s390/mman.h
 create mode 100644 linux-headers/asm-x86/mman.h
 create mode 100644 linux-headers/linux/mman.h

diff --git a/linux-headers/asm-arm/mman.h b/linux-headers/asm-arm/mman.h
new file mode 100644
index 000..41f99c5
--- /dev/null
+++ b/linux-headers/asm-arm/mman.h
@@ -0,0 +1,4 @@
+#include 
+
+#define arch_mmap_check(addr, len, flags) \
+   (((flags) & MAP_FIXED && (addr) < FIRST_USER_ADDRESS) ? -EINVAL : 0)
diff --git a/linux-headers/asm-arm64/mman.h b/linux-headers/asm-arm64/mman.h
new file mode 100644
index 000..8eebf89
--- /dev/null
+++ b/linux-headers/asm-arm64/mman.h
@@ -0,0 +1 @@
+#include 
diff --git a/linux-headers/asm-generic/hugetlb_encode.h b/linux-headers/asm-generic/hugetlb_encode.h
new file mode 100644
index 000..b0f8e87
--- /dev/null
+++ b/linux-headers/asm-generic/hugetlb_encode.h
@@ -0,0 +1,36 @@
+#ifndef _ASM_GENERIC_HUGETLB_ENCODE_H_
+#define _ASM_GENERIC_HUGETLB_ENCODE_H_
+
+/*
+ * Several system calls take a flag to request "hugetlb" huge pages.
+ * Without further specification, these system calls will use the
+ * system's default huge page size.  If a system supports multiple
+ * huge page sizes, the desired huge page size can be specified in
+ * bits [26:31] of the flag arguments.  The value in these 6 bits
+ * will encode the log2 of the huge page size.
+ *
+ * The following definitions are associated with this huge page size
+ * encoding in flag arguments.  System call specific header files
+ * that use this encoding should include this file.  They can then
+ * provide definitions based on these with their own specific prefix.
+ * for example:
+ * #define MAP_HUGE_SHIFT HUGETLB_FLAG_ENCODE_SHIFT
+ */
+
+#define HUGETLB_FLAG_ENCODE_SHIFT  26
+#define HUGETLB_FLAG_ENCODE_MASK   0x3f
+
+#define HUGETLB_FLAG_ENCODE_64KB   (16 << HUGETLB_FLAG_ENCODE_SHIFT)
+#define HUGETLB_FLAG_ENCODE_512KB  (19 << HUGETLB_FLAG_ENCODE_SHIFT)
+#define HUGETLB_FLAG_ENCODE_1MB    (20 << HUGETLB_FLAG_ENCODE_SHIFT)
+#define HUGETLB_FLAG_ENCODE_2MB    (21 << HUGETLB_FLAG_ENCODE_SHIFT)
+#define HUGETLB_FLAG_ENCODE_8MB    (23 << HUGETLB_FLAG_ENCODE_SHIFT)
+#define HUGETLB_FLAG_ENCODE_16MB   (24 << HUGETLB_FLAG_ENCODE_SHIFT)
+#define HUGETLB_FLAG_ENCODE_32MB   (25 << HUGETLB_FLAG_ENCODE_SHIFT)
+#define HUGETLB_FLAG_ENCODE_256MB  (28 << HUGETLB_FLAG_ENCODE_SHIFT)
+#define HUGETLB_FLAG_ENCODE_512MB  (29 << HUGETLB_FLAG_ENCODE_SHIFT)
+#define HUGETLB_FLAG_ENCODE_1GB    (30 << HUGETLB_FLAG_ENCODE_SHIFT)
+#define HUGETLB_FLAG_ENCODE_2GB    (31 << HUGETLB_FLAG_ENCODE_SHIFT)
+#define HUGETLB_FLAG_ENCODE_16GB   (34 << HUGETLB_FLAG_ENCODE_SHIFT)
+
+#endif /* _ASM_GENERIC_HUGETLB_ENCODE_H_ */
diff --git a/linux-headers/asm-generic/mman-common.h b/linux-headers/asm-generic/mman-common.h
new file mode 100644
index 000..e7ee328
--- /dev/null
+++ b/linux-headers/asm-generic/mman-common.h
@@ -0,0 +1,77 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+#ifndef __ASM_GENERIC_MMAN_COMMON_H
+#define __ASM_GENERIC_MMAN_COMMON_H
+
+/*
+ Author: Michael S. Tsirkin , Mellanox Technologies Ltd.
+ Based on: asm-xxx/mman.h
+*/
+
+#define PROT_READ  0x1 /* page can be read */
+#define PROT_WRITE 0x2 /* page can be written */
+#define PROT_EXEC  0x4 /* page can be executed */
+#define PROT_SEM   0x8 /* page may be used for atomic ops */
+#define PROT_NONE  0x0 /* page can not be accessed */
+#define PROT_GROWSDOWN 0x01000000  /* mprotect flag: extend change to start of growsdown vma */
+#define PROT_GROWSUP   0x02000000  /* mprotect flag: extend change to end of growsup vma */
+
+#define MAP_

[Qemu-devel] [PATCH V12 5/5] docs: Added MAP_SYNC documentation

2019-02-06 Thread Zhang, Yi
From: Zhang Yi 

Signed-off-by: Zhang Yi 
---
 docs/nvdimm.txt | 25 ++---
 qemu-options.hx |  4 
 2 files changed, 26 insertions(+), 3 deletions(-)

diff --git a/docs/nvdimm.txt b/docs/nvdimm.txt
index 5f158a6..e2bf89f 100644
--- a/docs/nvdimm.txt
+++ b/docs/nvdimm.txt
@@ -143,9 +143,28 @@ Guest Data Persistence
 --
 
 Though QEMU supports multiple types of vNVDIMM backends on Linux,
-currently the only one that can guarantee the guest write persistence
-is the device DAX on the real NVDIMM device (e.g., /dev/dax0.0), to
-which all guest access do not involve any host-side kernel cache.
+the only backends that can guarantee guest write persistence are:
+
+A. a DAX device (e.g., /dev/dax0.0), or
+B. a DAX file (on a file system mounted with the dax option)
+
+Both are backed by a real NVDIMM device, so guest accesses do not
+involve any host-side kernel cache.
+
+When using B (a file supporting direct mapping of persistent memory)
+as a backend, write persistence is guaranteed if the host kernel has
+support for the MAP_SYNC flag in the mmap system call (available
+since Linux 4.15 and on certain distro kernels) and additionally
+both 'pmem' and 'share' flags are set to 'on' on the backend.
+
+If these conditions are not satisfied, i.e., if either 'pmem' or 'share'
+is not set, if the backend file does not support DAX, or if MAP_SYNC
+is not supported by the host kernel, write persistence is not
+guaranteed after a system crash. For compatibility reasons, these
+conditions are silently ignored if not satisfied. Currently, no way
+is provided to test for them.
+For more details, please refer to the mmap(2) man page:
+http://man7.org/linux/man-pages/man2/mmap.2.html.
 
 When using other types of backends, it's suggested to set 'unarmed'
 option of '-device nvdimm' to 'on', which sets the unarmed flag of the
diff --git a/qemu-options.hx b/qemu-options.hx
index 08f8516..0cd41f4 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -4002,6 +4002,10 @@ using the SNIA NVM programming model (e.g. Intel NVDIMM).
 If @option{pmem} is set to 'on', QEMU will take necessary operations to
 guarantee the persistence of its own writes to @option{mem-path}
 (e.g. in vNVDIMM label emulation and live migration).
+QEMU will also map the backend file with the MAP_SYNC flag, which ensures
+that the file metadata of @option{mem-path} stays in sync in case of a host
+crash or a power failure. MAP_SYNC requires support from both the host kernel
+(since Linux kernel 4.15) and @option{mem-path} (only files supporting DAX).
 
 @item -object memory-backend-ram,id=@var{id},merge=@var{on|off},dump=@var{on|off},share=@var{on|off},prealloc=@var{on|off},size=@var{size},host-nodes=@var{host-nodes},policy=@var{default|preferred|bind|interleave}
 
-- 
2.7.4
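
The documentation above notes that no way is currently provided to test whether the MAP_SYNC conditions hold. As a rough illustration only, the following standalone sketch probes a file from outside QEMU by attempting the mapping directly. The helper name probe_map_sync() is hypothetical and not part of QEMU, and the fallback flag values are the asm-generic Linux ones, assumed here for libcs that do not yet define them.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Fallback values (asm-generic Linux UAPI) for older libc headers. */
#ifndef MAP_SHARED_VALIDATE
#define MAP_SHARED_VALIDATE 0x03
#endif
#ifndef MAP_SYNC
#define MAP_SYNC 0x080000
#endif

/*
 * Hypothetical helper, not a QEMU function.
 * Returns 1 if `path` can be mapped with MAP_SYNC, 0 if the kernel or
 * the file system rejects the flag, and -1 on any other error.
 */
int probe_map_sync(const char *path, size_t len)
{
    assert(path != NULL);

    int fd = open(path, O_RDWR);
    if (fd < 0) {
        return -1;
    }

    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_SHARED_VALIDATE | MAP_SYNC, fd, 0);
    int saved = errno;
    close(fd);

    if (p != MAP_FAILED) {
        munmap(p, len);
        return 1;
    }

    errno = saved;
    /*
     * EOPNOTSUPP: the file does not support DAX / MAP_SYNC.
     * EINVAL: typically a kernel that predates MAP_SHARED_VALIDATE
     * (it can also indicate a bad length, so treat 0 as "probably not").
     */
    return (saved == EOPNOTSUPP || saved == EINVAL) ? 0 : -1;
}
```

A caller would pass the same path it intends to give to mem-path. A result of 0 only says that this particular mapping attempt was refused; it does not distinguish a non-DAX file from an old kernel.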




[Qemu-devel] [PATCH V12 4/5] util/mmap-alloc: support MAP_SYNC in qemu_ram_mmap()

2019-02-06 Thread Zhang, Yi
From: Zhang Yi 

When a file supporting DAX is used as a vNVDIMM backend, additionally mmap
it with the MAP_SYNC flag. This ensures that file system metadata is synced
on each guest write to the backend file, without other QEMU actions (e.g.,
periodic fsync() by QEMU).

Currently, there are the following possible use cases:

1. pmem=on is set, share=on is set, MAP_SYNC supported:
   a: backend is a file supporting DAX.
      - MAP_SYNC will be active.
   b: backend is not a file supporting DAX.
      - mmap will trigger a warning, then the MAP_SYNC flag will be ignored.

2. All other cases:
   - we will never pass MAP_SYNC to mmap(2).

Signed-off-by: Haozhong Zhang 
Signed-off-by: Zhang Yi 
---
 include/qemu/osdep.h |  7 +++
 util/mmap-alloc.c| 24 +++-
 2 files changed, 30 insertions(+), 1 deletion(-)

diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
index 457d24e..9a94cc3 100644
--- a/include/qemu/osdep.h
+++ b/include/qemu/osdep.h
@@ -115,6 +115,13 @@ extern int daemon(int, int);
 #include "sysemu/os-win32.h"
 #endif
 
+#ifdef CONFIG_LINUX
+#include 
+#else  /* !CONFIG_LINUX */
+#define MAP_SYNC  0x0
+#define MAP_SHARED_VALIDATE   0x0
+#endif /* CONFIG_LINUX */
+
 #ifdef CONFIG_POSIX
 #include "sysemu/os-posix.h"
 #endif
diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c
index 97bbeed..e4e55fc 100644
--- a/util/mmap-alloc.c
+++ b/util/mmap-alloc.c
@@ -15,6 +15,7 @@
 #include "qemu/host-utils.h"
 
 #define HUGETLBFS_MAGIC   0x958458f6
+#define MAP_SYNC_FLAGS (MAP_SYNC | MAP_SHARED_VALIDATE)
 
 #ifdef CONFIG_LINUX
 #include 
@@ -101,6 +102,7 @@ void *qemu_ram_mmap(int fd,
 #else
 void *ptr = mmap(0, total, PROT_NONE, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
 #endif
+int mmap_flags;
 size_t offset;
 void *ptr1;
 
@@ -111,13 +113,33 @@ void *qemu_ram_mmap(int fd,
 assert(is_power_of_2(align));
 /* Always align to host page size */
 assert(align >= getpagesize());
+mmap_flags = shared ? MAP_SHARED : MAP_PRIVATE;
+if (shared && is_pmem) {
+mmap_flags |= MAP_SYNC_FLAGS;
+}
 
 offset = QEMU_ALIGN_UP((uintptr_t)ptr, align) - (uintptr_t)ptr;
 ptr1 = mmap(ptr + offset, size, PROT_READ | PROT_WRITE,
 MAP_FIXED |
 (fd == -1 ? MAP_ANONYMOUS : 0) |
-(shared ? MAP_SHARED : MAP_PRIVATE),
+mmap_flags,
 fd, 0);
+
+
+if (ptr1 == MAP_FAILED &&
+(mmap_flags & MAP_SYNC_FLAGS) == MAP_SYNC_FLAGS) {
+if (errno == ENOTSUP) {
+perror("failed to validate with mapping flags");
+}
+/* if map failed with MAP_SHARED_VALIDATE | MAP_SYNC,
+ * we will remove these flags to handle compatibility.
+ */
+ptr1 = mmap(ptr + offset, size, PROT_READ | PROT_WRITE,
+MAP_FIXED |
+(fd == -1 ? MAP_ANONYMOUS : 0) |
+MAP_SHARED,
+fd, 0);
+}
 if (ptr1 == MAP_FAILED) {
 munmap(ptr, total);
 return MAP_FAILED;
-- 
2.7.4
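
The fallback in this patch (try with MAP_SYNC first, then retry as a plain shared mapping) can be sketched outside QEMU as follows. This is a simplified illustration, not the qemu_ram_mmap() implementation: the helper name map_shared_prefer_sync() and the `synced` out-parameter are assumptions added here so a caller can see which path was taken, and the sketch maps at a kernel-chosen address rather than using MAP_FIXED.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/mman.h>

/* Fallback values (asm-generic Linux UAPI) for older libc headers. */
#ifndef MAP_SHARED_VALIDATE
#define MAP_SHARED_VALIDATE 0x03
#endif
#ifndef MAP_SYNC
#define MAP_SYNC 0x080000
#endif

/*
 * Hypothetical helper: map `fd` shared, preferring MAP_SYNC.
 * On return, *synced tells whether the MAP_SYNC mapping succeeded.
 * Returns MAP_FAILED if both attempts fail.
 */
void *map_shared_prefer_sync(int fd, size_t size, bool *synced)
{
    assert(synced != NULL);

    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_SHARED_VALIDATE | MAP_SYNC, fd, 0);
    if (p != MAP_FAILED) {
        *synced = true;
        return p;
    }

    if (errno == EOPNOTSUPP) {
        /* The file does not support MAP_SYNC (e.g. it is not on DAX). */
        fprintf(stderr, "MAP_SYNC not supported, using a plain mapping\n");
    }

    /* Compatibility path: drop MAP_SYNC and MAP_SHARED_VALIDATE. */
    *synced = false;
    return mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
}
```

Reporting the outcome through `synced` is a design choice of this sketch only; the patch itself prints a message and continues without telling the caller.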




[Qemu-devel] [PATCH v11 2/3] util/mmap-alloc: support MAP_SYNC in qemu_ram_mmap()

2019-01-28 Thread Zhang, Yi
From: Zhang Yi 

When a file supporting DAX is used as a vNVDIMM backend, additionally mmap
it with the MAP_SYNC flag. This ensures that file system metadata is synced
on each guest write to the backend file, without other QEMU actions (e.g.,
periodic fsync() by QEMU).

Currently, there are the following possible use cases:

1. pmem=on is set, share=on is set, MAP_SYNC supported:
   a: backend is a file supporting DAX.
      - MAP_SYNC will be active.
   b: backend is not a file supporting DAX.
      - mmap will trigger a warning, then the MAP_SYNC flag will be ignored.

2. All other cases:
   - we will never pass MAP_SYNC to mmap(2).

Signed-off-by: Haozhong Zhang 
Signed-off-by: Zhang Yi 
---
 include/qemu/osdep.h | 21 +
 util/mmap-alloc.c| 28 +++-
 2 files changed, 48 insertions(+), 1 deletion(-)

diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
index 457d24e..96209bb 100644
--- a/include/qemu/osdep.h
+++ b/include/qemu/osdep.h
@@ -419,6 +419,27 @@ void qemu_anon_ram_free(void *ptr, size_t size);
 #  define QEMU_VMALLOC_ALIGN getpagesize()
 #endif
 
+/*
+ * MAP_SHARED_VALIDATE and MAP_SYNC are introduced in Linux kernel
+ * 4.15, so they may not be defined when compiling on older kernels.
+ */
+#ifdef CONFIG_LINUX
+
+#include 
+
+#ifndef MAP_SYNC
+#define MAP_SYNC 0x80000
+#endif
+
+#ifndef MAP_SHARED_VALIDATE
+#define MAP_SHARED_VALIDATE 0x03
+#endif
+
+#else  /* !CONFIG_LINUX */
+#define MAP_SYNC  0x0
+#define MAP_SHARED_VALIDATE   0x0
+#endif /* CONFIG_LINUX */
+
 #ifdef CONFIG_POSIX
 struct qemu_signalfd_siginfo {
 uint32_t ssi_signo;   /* Signal number */
diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c
index 97bbeed..2c86ad2 100644
--- a/util/mmap-alloc.c
+++ b/util/mmap-alloc.c
@@ -101,6 +101,7 @@ void *qemu_ram_mmap(int fd,
 #else
 void *ptr = mmap(0, total, PROT_NONE, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
 #endif
+int mmap_xflags = 0;
 size_t offset;
 void *ptr1;
 
@@ -111,13 +112,38 @@ void *qemu_ram_mmap(int fd,
 assert(is_power_of_2(align));
 /* Always align to host page size */
 assert(align >= getpagesize());
+if (shared && is_pmem) {
+mmap_xflags = MAP_SYNC | MAP_SHARED_VALIDATE;
+}
 
 offset = QEMU_ALIGN_UP((uintptr_t)ptr, align) - (uintptr_t)ptr;
+retry_mmap:
 ptr1 = mmap(ptr + offset, size, PROT_READ | PROT_WRITE,
 MAP_FIXED |
 (fd == -1 ? MAP_ANONYMOUS : 0) |
-(shared ? MAP_SHARED : MAP_PRIVATE),
+(shared ? MAP_SHARED : MAP_PRIVATE) | mmap_xflags,
 fd, 0);
+
+/* if map failed with MAP_SHARED_VALIDATE | MAP_SYNC,
+ * we try with MAP_SHARED_VALIDATE without MAP_SYNC
+ */
+if (ptr1 == MAP_FAILED &&
+mmap_xflags == (MAP_SYNC | MAP_SHARED_VALIDATE)) {
+if (errno == ENOTSUP) {
+perror("failed to validate with mapping flags");
+}
+mmap_xflags = MAP_SHARED_VALIDATE;
+goto retry_mmap;
+}
+/* MAP_SHARED_VALIDATE flag is available since Linux 4.15
+ * Test only with MAP_SHARED_VALIDATE flag for compatibility.
+ * Then ignore the MAP_SHARED_VALIDATE flag and retry again
+ */
+if (mmap_xflags == MAP_SHARED_VALIDATE &&
+ptr1 == MAP_FAILED) {
+mmap_xflags &= ~MAP_SHARED_VALIDATE;
+goto retry_mmap;
+}
 if (ptr1 == MAP_FAILED) {
 munmap(ptr, total);
 return MAP_FAILED;
-- 
2.7.4
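
The flag selection in this patch depends on MAP_SHARED_VALIDATE being a mapping type whose value (0x03) already contains the MAP_SHARED bit (0x01), so OR-ing it into a MAP_SHARED mapping is well defined. The v3 changelog in this series notes that Linux 4.15 ignores MAP_SYNC when MAP_SHARED_VALIDATE is missing, which is why the two are always set together. A minimal sketch of that selection, with a hypothetical helper name and asm-generic fallback values assumed:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdbool.h>
#include <sys/mman.h>

/* Fallback values (asm-generic Linux UAPI) for older libc headers. */
#ifndef MAP_SHARED_VALIDATE
#define MAP_SHARED_VALIDATE 0x03
#endif
#ifndef MAP_SYNC
#define MAP_SYNC 0x080000
#endif

/* Hypothetical helper mirroring the flag selection discussed above. */
int ram_mmap_flags(bool shared, bool is_pmem)
{
    int flags = shared ? MAP_SHARED : MAP_PRIVATE;

    if (shared && is_pmem) {
        /* MAP_SYNC is only honoured together with MAP_SHARED_VALIDATE. */
        flags |= MAP_SHARED_VALIDATE | MAP_SYNC;
    }

    /* The mapping type occupies the low bits and must never be empty. */
    assert((flags & MAP_SHARED_VALIDATE) != 0);
    return flags;
}
```

A private pmem mapping keeps MAP_PRIVATE unchanged, which matches the "rest of cases" item in the commit message: MAP_SYNC is never passed there.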




[Qemu-devel] [PATCH v11 1/3] util/mmap-alloc: Add a 'is_pmem' parameter to qemu_ram_mmap

2019-01-28 Thread Zhang, Yi
From: Zhang Yi 

Besides the existing 'shared' flag, add an 'is_pmem' parameter to
qemu_ram_mmap(), which indicates that the memory backend file is
persistent memory.

Signed-off-by: Haozhong Zhang 
Signed-off-by: Zhang Yi 
---
 exec.c|  2 +-
 include/qemu/mmap-alloc.h | 21 -
 util/mmap-alloc.c |  6 +-
 util/oslib-posix.c|  2 +-
 4 files changed, 27 insertions(+), 4 deletions(-)

diff --git a/exec.c b/exec.c
index bb6170d..27cea52 100644
--- a/exec.c
+++ b/exec.c
@@ -1860,7 +1860,7 @@ static void *file_ram_alloc(RAMBlock *block,
 }
 
 area = qemu_ram_mmap(fd, memory, block->mr->align,
- block->flags & RAM_SHARED);
+ block->flags & RAM_SHARED, block->flags & RAM_PMEM);
 if (area == MAP_FAILED) {
 error_setg_errno(errp, errno,
  "unable to map backing store for guest RAM");
diff --git a/include/qemu/mmap-alloc.h b/include/qemu/mmap-alloc.h
index 50385e3..190688a 100644
--- a/include/qemu/mmap-alloc.h
+++ b/include/qemu/mmap-alloc.h
@@ -7,7 +7,26 @@ size_t qemu_fd_getpagesize(int fd);
 
 size_t qemu_mempath_getpagesize(const char *mem_path);
 
-void *qemu_ram_mmap(int fd, size_t size, size_t align, bool shared);
+/**
+ * qemu_ram_mmap: mmap the specified file or device.
+ *
+ * Parameters:
+ *  @fd: the file or the device to mmap
+ *  @size: the number of bytes to be mmaped
+ *  @align: if not zero, specify the alignment of the starting mapping address;
+ *  otherwise, the alignment in use will be determined by QEMU.
+ *  @shared: map has RAM_SHARED flag.
+ *  @is_pmem: map has RAM_PMEM flag.
+ *
+ * Return:
+ *  On success, return a pointer to the mapped area.
+ *  On failure, return MAP_FAILED.
+ */
+void *qemu_ram_mmap(int fd,
+size_t size,
+size_t align,
+bool shared,
+bool is_pmem);
 
 void qemu_ram_munmap(void *ptr, size_t size);
 
diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c
index fd329ec..97bbeed 100644
--- a/util/mmap-alloc.c
+++ b/util/mmap-alloc.c
@@ -75,7 +75,11 @@ size_t qemu_mempath_getpagesize(const char *mem_path)
 return getpagesize();
 }
 
-void *qemu_ram_mmap(int fd, size_t size, size_t align, bool shared)
+void *qemu_ram_mmap(int fd,
+size_t size,
+size_t align,
+bool shared,
+bool is_pmem)
 {
 /*
  * Note: this always allocates at least one extra page of virtual address
diff --git a/util/oslib-posix.c b/util/oslib-posix.c
index fbd0dc8..040937f 100644
--- a/util/oslib-posix.c
+++ b/util/oslib-posix.c
@@ -203,7 +203,7 @@ void *qemu_memalign(size_t alignment, size_t size)
 void *qemu_anon_ram_alloc(size_t size, uint64_t *alignment, bool shared)
 {
 size_t align = QEMU_VMALLOC_ALIGN;
-void *ptr = qemu_ram_mmap(-1, size, align, shared);
+void *ptr = qemu_ram_mmap(-1, size, align, shared, false);
 
 if (ptr == MAP_FAILED) {
 return NULL;
-- 
2.7.4




[Qemu-devel] [PATCH v11 3/3] docs: Added MAP_SYNC documentation

2019-01-28 Thread Zhang, Yi
From: Zhang Yi 

Signed-off-by: Zhang Yi 
---
 docs/nvdimm.txt | 29 -
 qemu-options.hx |  4 
 2 files changed, 32 insertions(+), 1 deletion(-)

diff --git a/docs/nvdimm.txt b/docs/nvdimm.txt
index 5f158a6..9da96aa 100644
--- a/docs/nvdimm.txt
+++ b/docs/nvdimm.txt
@@ -142,11 +142,38 @@ backend of vNVDIMM:
 Guest Data Persistence
 ----------------------
 
+vNVDIMM is designed and implemented to guarantee guest data
+persistence on the backends in case of a host crash or a power failure.
+However, there are still some requirements and limitations,
+as explained below.
+
 Though QEMU supports multiple types of vNVDIMM backends on Linux,
-currently the only one that can guarantee the guest write persistence
+if MAP_SYNC is not supported by the host kernel and the backends,
+the only backend that can guarantee the guest write persistence
 is the device DAX on the real NVDIMM device (e.g., /dev/dax0.0), to
 which all guest access do not involve any host-side kernel cache.
 
+The mmap(2) flag MAP_SYNC is available since Linux kernel 4.15. On such
+systems, QEMU can mmap(2) DAX backend files with MAP_SYNC, which
+ensures filesystem metadata consistency in case of a host crash or a power
+failure. Enabling MAP_SYNC in QEMU requires the following conditions:
+
+ - The 'pmem' option of memory-backend-file is 'on':
+   The backend is a file supporting DAX, e.g., a file on an ext4 or
+   xfs file system mounted with '-o dax'. If pmem=on is set but the backend
+   is not a file supporting DAX, mapping with this flag results in an
+   EOPNOTSUPP warning, and MAP_SYNC will then be ignored.
+
+ - The 'share' option of memory-backend-file is 'on':
+   The MAP_SYNC flag is available only with the MAP_SHARED_VALIDATE mapping type.
+
+ - MAP_SYNC is supported by the host Linux kernel (available since Linux 4.15).
+
+Otherwise, QEMU will ignore the MAP_SYNC flag.
+
+For more details, please refer to the mmap(2) man page:
+http://man7.org/linux/man-pages/man2/mmap.2.html.
+
 When using other types of backends, it's suggested to set 'unarmed'
 option of '-device nvdimm' to 'on', which sets the unarmed flag of the
 guest NVDIMM region mapping structure.  This unarmed flag indicates
diff --git a/qemu-options.hx b/qemu-options.hx
index 08f8516..0cd41f4 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -4002,6 +4002,10 @@ using the SNIA NVM programming model (e.g. Intel NVDIMM).
 If @option{pmem} is set to 'on', QEMU will take necessary operations to
 guarantee the persistence of its own writes to @option{mem-path}
 (e.g. in vNVDIMM label emulation and live migration).
+QEMU will also map the backend file with the MAP_SYNC flag, which ensures
+that the file metadata of @option{mem-path} stays in sync in case of a host
+crash or a power failure. MAP_SYNC requires support from both the host kernel
+(since Linux kernel 4.15) and @option{mem-path} (only files supporting DAX).
 
 @item -object memory-backend-ram,id=@var{id},merge=@var{on|off},dump=@var{on|off},share=@var{on|off},prealloc=@var{on|off},size=@var{size},host-nodes=@var{host-nodes},policy=@var{default|preferred|bind|interleave}
 
-- 
2.7.4




[Qemu-devel] [PATCH v11 0/3] support MAP_SYNC for memory-backend-file

2019-01-28 Thread Zhang, Yi
 the use of MAP_SYNC. (Eduardo Habkost)
 * Remove the unnecessary set of MAP_SHARED_VALIDATE in some cases and
   the retry mechanism in qemu_ram_mmap(). (Michael S. Tsirkin)
 * Move OS dependent definitions of MAP_SYNC and MAP_SHARED_VALIDATE
   to osdep.h. (Michael S. Tsirkin)

Zhang Yi (3):
  util/mmap-alloc: Add a 'is_pmem' parameter to qemu_ram_mmap
  util/mmap-alloc: support MAP_SYNC in qemu_ram_mmap()
  docs: Added MAP_SYNC documentation

 docs/nvdimm.txt   | 29 -
 exec.c|  2 +-
 include/qemu/mmap-alloc.h | 21 -
 include/qemu/osdep.h  | 21 +
 qemu-options.hx   |  4 
 util/mmap-alloc.c | 34 --
 util/oslib-posix.c|  2 +-
 7 files changed, 107 insertions(+), 6 deletions(-)

-- 
2.7.4




[Qemu-devel] [PATCH V10 0/4] support MAP_SYNC for memory-backend-file

2019-01-22 Thread Zhang, Yi
From: "Zhang,Yi" 

Linux 4.15 introduces a new mmap flag MAP_SYNC, which can be used to
guarantee the write persistence to mmap'ed files supporting DAX (e.g.,
files on ext4/xfs file system mounted with '-o dax').

A description of MAP_SYNC and MAP_SHARED_VALIDATE can be found at
https://patchwork.kernel.org/patch/10028151/

In order to make sure that the file metadata is in sync after a fault
while writing to a shared backend file that supports DAX, this patch
set enables QEMU to use the MAP_SYNC flag for memory-backend-file.

As the DAX vs. DMA truncate issue has been solved, we refined the code and
sent out this feature for the v5 version.

QEMU passes MAP_SYNC to mmap(2) if MAP_SYNC is supported and both
'share=on' and 'pmem=on' are set. Otherwise, QEMU does not pass this
flag to mmap(2).

Changes in V10:
 * 4/4: refine the document.
 * 3/4: Reviewed-by: Stefano Garzarella 
 * 2/4: refine the commit message, Added MAP_SHARED_VALIDATE.
 * 2/4: Fix the wrong include header

Changes in V9:
 * 1/6: Reviewed-by: Eduardo Habkost 
 * 2/6: Newly added: Michael: use the sparse feature to define RAM_FLAG.
   Since I don't have much knowledge about the sparse feature, @Michael, could
   you add some documentation/commit message to this patch? Thank you very much.
 * 3/6: from 2/5: Eduardo: updated the commit message.
 * 4/6: from 3/5: Michael: don't ignore MAP_SYNC failures silently.
 * 5/6: from 4/5: Eduardo: updated the commit message.
 * 6/6: from 5/5: Michael: drop the sync option, document MAP_SYNC.

Changes in v8:
 * Michael: 3/5, remove the duplicated define in osdep.h.
 * Michael: 2/5, make the type definition safe.
 * Michael: 2/5, fix the incorrect MAP_SHARE define in qemu_anon_ram_alloc.
 * 4/6 removed: we removed the on/off/auto values of sync, as for now
   MAP_SYNC only works with pmem=on.
 * @Michael, I still reuse the RAM_SYNC flag; it is much more straightforward
   to parse all the flags in one parameter.

Changes in v7:
 * Michael: [3,4,6]/6 limited the "sync" flag to nvdimm backends only (pmem=on).

Changes in v6:
 * Pankaj: 3/7 are squashed with 2/7
 * Pankaj: 7/7 update comments to "consistent filesystem metadata".
 * Pankaj, Igor: 1/7 Added Reviewed-by in patch-1/7
 * Stefan, 4/7 move the include header from "/linux/mman.h" to "osdep.h"
 * Stefan, 5/7 Add missing "munmap"
 * Stefan, 2/7 refine the shared/flag.

Changes in v5:
 * Add patch 1 to fix a memory leak issue.
 * Refine the patch 4-6
 * Remove the patch 3 as we already change the parameter from "shared" to
   "flags"

Changes in v4:
 * Add patch 1-3 to switch some functions to a single 'flags'
   parameters. (Michael S. Tsirkin)
 * v3 patch 1-3 become v4 patch 4-6.
 * Patch 4: move definitions of MAP_SYNC and MAP_SHARED_VALIDATE to a
   new header file under include/standard-headers/linux/. (Michael S. Tsirkin)
 * Patch 6: refine the description of the 'sync' option. (Michael S. Tsirkin)

Changes in v3:
 * Patch 1: add MAP_SHARED_VALIDATE in both sync=on and sync=auto
   cases, and add back the retry mechanism. MAP_SYNC will be ignored
   by Linux kernel 4.15 if MAP_SHARED_VALIDATE is missed.
 * Patch 1: define MAP_SYNC and MAP_SHARED_VALIDATE as 0 on non-Linux
   platforms in order to make qemu_ram_mmap() compile on those platforms.
 * Patch 2&3: include more information in error messages of
   memory-backend in hope to help user to identify the error.
   (Dr. David Alan Gilbert)
 * Patch 3: fix typo in the commit message. (Dr. David Alan Gilbert)

Changes in v2:
 * Add 'sync' option to control the use of MAP_SYNC. (Eduardo Habkost)
 * Remove the unnecessary set of MAP_SHARED_VALIDATE in some cases and
   the retry mechanism in qemu_ram_mmap(). (Michael S. Tsirkin)
 * Move OS dependent definitions of MAP_SYNC and MAP_SHARED_VALIDATE
   to osdep.h. (Michael S. Tsirkin)

Zhang Yi (4):
  util/mmap-alloc: switch 'shared' to 'flags' parameter
  util/mmap-alloc: support MAP_SYNC in qemu_ram_mmap()
  hostmem: add more information in error messages
  docs: Added MAP_SYNC documentation

 backends/hostmem-file.c   |  6 --
 backends/hostmem.c|  8 +---
 docs/nvdimm.txt   | 29 -
 exec.c|  7 ---
 include/qemu/mmap-alloc.h | 20 +++-
 include/qemu/osdep.h  | 21 +
 qemu-options.hx   |  4 
 util/mmap-alloc.c | 15 +++
 util/oslib-posix.c|  9 -
 9 files changed, 104 insertions(+), 15 deletions(-)

-- 
2.7.4




[Qemu-devel] [PATCH V10 1/4] util/mmap-alloc: switch 'shared' to 'flags' parameter

2019-01-22 Thread Zhang, Yi
From: Zhang Yi 

As more flag parameters besides the existing 'shared' are going to be
added to qemu_ram_mmap() and qemu_ram_alloc_from_{file,fd}(), let's
switch 'shared' to a 'flags' parameter in advance, so as to ease the
further additions.

Signed-off-by: Haozhong Zhang 
Signed-off-by: Zhang Yi 
---
 exec.c|  7 ---
 include/qemu/mmap-alloc.h | 19 ++-
 util/mmap-alloc.c |  8 +---
 util/oslib-posix.c|  9 -
 4 files changed, 35 insertions(+), 8 deletions(-)

diff --git a/exec.c b/exec.c
index bb6170d..e92a7da 100644
--- a/exec.c
+++ b/exec.c
@@ -1810,6 +1810,7 @@ static void *file_ram_alloc(RAMBlock *block,
 ram_addr_t memory,
 int fd,
 bool truncate,
+uint32_t flags,
 Error **errp)
 {
 void *area;
@@ -1859,8 +1860,7 @@ static void *file_ram_alloc(RAMBlock *block,
 perror("ftruncate");
 }
 
-area = qemu_ram_mmap(fd, memory, block->mr->align,
- block->flags & RAM_SHARED);
+area = qemu_ram_mmap(fd, memory, block->mr->align, flags);
 if (area == MAP_FAILED) {
 error_setg_errno(errp, errno,
  "unable to map backing store for guest RAM");
@@ -2279,7 +2279,8 @@ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr,
 new_block->used_length = size;
 new_block->max_length = size;
 new_block->flags = ram_flags;
-new_block->host = file_ram_alloc(new_block, size, fd, !file_size, errp);
+new_block->host = file_ram_alloc(new_block, size, fd, !file_size,
+ram_flags, errp);
 if (!new_block->host) {
 g_free(new_block);
 return NULL;
diff --git a/include/qemu/mmap-alloc.h b/include/qemu/mmap-alloc.h
index 50385e3..6fe6ed4 100644
--- a/include/qemu/mmap-alloc.h
+++ b/include/qemu/mmap-alloc.h
@@ -7,7 +7,24 @@ size_t qemu_fd_getpagesize(int fd);
 
 size_t qemu_mempath_getpagesize(const char *mem_path);
 
-void *qemu_ram_mmap(int fd, size_t size, size_t align, bool shared);
+/**
+ * qemu_ram_mmap: mmap the specified file or device.
+ *
+ * Parameters:
+ *  @fd: the file or the device to mmap
+ *  @size: the number of bytes to be mmaped
+ *  @align: if not zero, specify the alignment of the starting mapping address;
+ *  otherwise, the alignment in use will be determined by QEMU.
+ *  @flags: specifies additional properties of the mapping, which can be one or
+ *  bit-or of following values
+ *  - RAM_SHARED: mmap with MAP_SHARED flag
+ *  Other bits are ignored.
+ *
+ * Return:
+ *  On success, return a pointer to the mapped area.
+ *  On failure, return MAP_FAILED.
+ */
+void *qemu_ram_mmap(int fd, size_t size, size_t align, uint32_t flags);
 
 void qemu_ram_munmap(void *ptr, size_t size);
 
diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c
index fd329ec..8f0a740 100644
--- a/util/mmap-alloc.c
+++ b/util/mmap-alloc.c
@@ -13,6 +13,7 @@
 #include "qemu/osdep.h"
 #include "qemu/mmap-alloc.h"
 #include "qemu/host-utils.h"
+#include "exec/memory.h"
 
 #define HUGETLBFS_MAGIC   0x958458f6
 
@@ -75,7 +76,7 @@ size_t qemu_mempath_getpagesize(const char *mem_path)
 return getpagesize();
 }
 
-void *qemu_ram_mmap(int fd, size_t size, size_t align, bool shared)
+void *qemu_ram_mmap(int fd, size_t size, size_t align, uint32_t flags)
 {
 /*
  * Note: this always allocates at least one extra page of virtual address
@@ -92,11 +93,12 @@ void *qemu_ram_mmap(int fd, size_t size, size_t align, bool shared)
  * anonymous memory is OK.
  */
 int anonfd = fd == -1 || qemu_fd_getpagesize(fd) == getpagesize() ? -1 : fd;
-int flags = anonfd == -1 ? MAP_ANONYMOUS : MAP_NORESERVE;
-void *ptr = mmap(0, total, PROT_NONE, flags | MAP_PRIVATE, anonfd, 0);
+int mmap_flags = anonfd == -1 ? MAP_ANONYMOUS : MAP_NORESERVE;
+void *ptr = mmap(0, total, PROT_NONE, mmap_flags | MAP_PRIVATE, anonfd, 0);
 #else
 void *ptr = mmap(0, total, PROT_NONE, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
 #endif
+bool shared = flags & RAM_SHARED;
 size_t offset;
 void *ptr1;
 
diff --git a/util/oslib-posix.c b/util/oslib-posix.c
index fbd0dc8..75a0171 100644
--- a/util/oslib-posix.c
+++ b/util/oslib-posix.c
@@ -54,6 +54,7 @@
 #endif
 
 #include "qemu/mmap-alloc.h"
+#include "exec/memory.h"
 
 #ifdef CONFIG_DEBUG_STACK_USAGE
 #include "qemu/error-report.h"
@@ -203,7 +204,13 @@ void *qemu_memalign(size_t alignment, size_t size)
 void *qemu_anon_ram_alloc(size_t size, uint64_t *alignment, bool shared)
 {
 size_t align = QEMU_VMALLOC_ALIGN;
-void *ptr = qemu_ram_mmap(-1, size, align, shared);
+uint32_t flags = 0;
+void *ptr;
+
+if (shared) {
+flags = RAM_SHARED;
+}
+ptr = qemu_ram_mmap(-1, size, align, flags);
 
 if (ptr == MAP_FAILED) {
 return NULL;
-- 
2.7.4
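
The switch from a bool to a flags word described in this patch can be illustrated with a small standalone sketch. The DEMO_* names and their bit positions are placeholders chosen for this example; QEMU defines the real RAM_SHARED and RAM_PMEM values in exec/memory.h, and this sketch does not claim to match them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit values only, standing in for QEMU's RAM_* flags. */
#define DEMO_RAM_SHARED (1u << 1)
#define DEMO_RAM_PMEM   (1u << 5)

/* Build a flags word from individual booleans, as a caller would. */
uint32_t demo_ram_flags(bool shared, bool pmem)
{
    uint32_t flags = 0;

    if (shared) {
        flags |= DEMO_RAM_SHARED;
    }
    if (pmem) {
        flags |= DEMO_RAM_PMEM;
    }
    /* Only the two known bits can be set by this builder. */
    assert((flags & ~(DEMO_RAM_SHARED | DEMO_RAM_PMEM)) == 0);
    return flags;
}

/* Decode on the callee side: other bits are ignored. */
bool demo_wants_map_sync(uint32_t flags)
{
    const uint32_t need = DEMO_RAM_SHARED | DEMO_RAM_PMEM;

    return (flags & need) == need;
}
```

With a single flags parameter, adding a new property later only needs a new bit and a new decode line, without changing every caller's argument list again.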




[Qemu-devel] [PATCH V10 2/4] util/mmap-alloc: support MAP_SYNC in qemu_ram_mmap()

2019-01-22 Thread Zhang, Yi
From: Zhang Yi 

When a file supporting DAX is used as a vNVDIMM backend, additionally mmap
it with the MAP_SYNC flag. This ensures that file system metadata is synced
on each guest write to the backend file, without other QEMU actions (e.g.,
periodic fsync() by QEMU).

Currently, there are the following possible use cases:

1. pmem=on is set, share=on is set, MAP_SYNC supported:
   a: backend is a file supporting DAX.
      - MAP_SYNC will be active.
   b: backend is not a file supporting DAX.
      - mmap will result in an EOPNOTSUPP error.

2. All other cases:
   - we will never pass MAP_SYNC to mmap(2).

Signed-off-by: Haozhong Zhang 
Signed-off-by: Zhang Yi 
---
 include/qemu/mmap-alloc.h |  1 +
 include/qemu/osdep.h  | 21 +
 util/mmap-alloc.c |  7 ++-
 3 files changed, 28 insertions(+), 1 deletion(-)

diff --git a/include/qemu/mmap-alloc.h b/include/qemu/mmap-alloc.h
index 6fe6ed4..a95d91c 100644
--- a/include/qemu/mmap-alloc.h
+++ b/include/qemu/mmap-alloc.h
@@ -18,6 +18,7 @@ size_t qemu_mempath_getpagesize(const char *mem_path);
  *  @flags: specifies additional properties of the mapping, which can be one or
  *  bit-or of following values
  *  - RAM_SHARED: mmap with MAP_SHARED flag
+ *  - RAM_PMEM: mmap with MAP_SYNC flag
  *  Other bits are ignored.
  *
  * Return:
diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
index 457d24e..3bcf155 100644
--- a/include/qemu/osdep.h
+++ b/include/qemu/osdep.h
@@ -419,6 +419,27 @@ void qemu_anon_ram_free(void *ptr, size_t size);
 #  define QEMU_VMALLOC_ALIGN getpagesize()
 #endif
 
+/*
+ * MAP_SHARED_VALIDATE and MAP_SYNC are introduced in Linux kernel
+ * 4.15, so they may not be defined when compiling on older kernels.
+ */
+#ifdef CONFIG_LINUX
+
+#include 
+
+#ifndef MAP_SYNC
+#define MAP_SYNC 0x0
+#endif
+
+#ifndef MAP_SHARED_VALIDATE
+#define MAP_SHARED_VALIDATE 0x0
+#endif
+
+#else  /* !CONFIG_LINUX */
+#define MAP_SYNC  0x0
+#define MAP_SHARED_VALIDATE   0x0
+#endif /* CONFIG_LINUX */
+
 #ifdef CONFIG_POSIX
 struct qemu_signalfd_siginfo {
 uint32_t ssi_signo;   /* Signal number */
diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c
index 8f0a740..a4ce9b5 100644
--- a/util/mmap-alloc.c
+++ b/util/mmap-alloc.c
@@ -99,6 +99,8 @@ void *qemu_ram_mmap(int fd, size_t size, size_t align, uint32_t flags)
 void *ptr = mmap(0, total, PROT_NONE, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
 #endif
 bool shared = flags & RAM_SHARED;
+bool is_pmem = flags & RAM_PMEM;
+int mmap_xflags = 0;
 size_t offset;
 void *ptr1;
 
@@ -109,12 +111,15 @@ void *qemu_ram_mmap(int fd, size_t size, size_t align, uint32_t flags)
 assert(is_power_of_2(align));
 /* Always align to host page size */
 assert(align >= getpagesize());
+if (shared && is_pmem) {
+mmap_xflags |= (MAP_SYNC | MAP_SHARED_VALIDATE);
+}
 
 offset = QEMU_ALIGN_UP((uintptr_t)ptr, align) - (uintptr_t)ptr;
 ptr1 = mmap(ptr + offset, size, PROT_READ | PROT_WRITE,
 MAP_FIXED |
 (fd == -1 ? MAP_ANONYMOUS : 0) |
-(shared ? MAP_SHARED : MAP_PRIVATE),
+(shared ? MAP_SHARED : MAP_PRIVATE) | mmap_xflags,
 fd, 0);
 if (ptr1 == MAP_FAILED) {
 munmap(ptr, total);
-- 
2.7.4




[Qemu-devel] [PATCH V10 4/4] docs: Added MAP_SYNC documentation

2019-01-22 Thread Zhang, Yi
From: Zhang Yi 

Signed-off-by: Zhang Yi 
---
 docs/nvdimm.txt | 29 -
 qemu-options.hx |  4 
 2 files changed, 32 insertions(+), 1 deletion(-)

diff --git a/docs/nvdimm.txt b/docs/nvdimm.txt
index 5f158a6..166c395 100644
--- a/docs/nvdimm.txt
+++ b/docs/nvdimm.txt
@@ -142,11 +142,38 @@ backend of vNVDIMM:
 Guest Data Persistence
 ----------------------
 
+vNVDIMM is designed and implemented to guarantee guest data
+persistence on the backends in case of a host crash or a power failure.
+However, there are still some requirements and limitations,
+as explained below.
+
 Though QEMU supports multiple types of vNVDIMM backends on Linux,
-currently the only one that can guarantee the guest write persistence
+if MAP_SYNC is not supported by the host kernel and the backends,
+the only backend that can guarantee the guest write persistence
 is the device DAX on the real NVDIMM device (e.g., /dev/dax0.0), to
 which all guest access do not involve any host-side kernel cache.
 
+The mmap(2) flag MAP_SYNC is available since Linux kernel 4.15. On such
+systems, QEMU can mmap(2) DAX backend files with MAP_SYNC, which
+ensures filesystem metadata consistency in case of a host crash or a power
+failure. Enabling MAP_SYNC in QEMU requires the following conditions:
+
+ - The 'pmem' option of memory-backend-file is 'on':
+   The backend is a file supporting DAX, e.g., a file on an ext4 or
+   xfs file system mounted with '-o dax'. If pmem=on is set but the backend
+   is not a file supporting DAX, mapping with this flag results in an
+   EOPNOTSUPP error.
+
+ - The 'share' option of memory-backend-file is 'on':
+   The MAP_SYNC flag is available only with the MAP_SHARED_VALIDATE mapping type.
+
+ - MAP_SYNC is supported by the host Linux kernel (available since Linux 4.15).
+
+Otherwise, QEMU will ignore the MAP_SYNC flag.
+
+For more details, please refer to the mmap(2) man page:
+http://man7.org/linux/man-pages/man2/mmap.2.html.
+
 When using other types of backends, it's suggested to set 'unarmed'
 option of '-device nvdimm' to 'on', which sets the unarmed flag of the
 guest NVDIMM region mapping structure.  This unarmed flag indicates
diff --git a/qemu-options.hx b/qemu-options.hx
index 08f8516..0cd41f4 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -4002,6 +4002,10 @@ using the SNIA NVM programming model (e.g. Intel NVDIMM).
 If @option{pmem} is set to 'on', QEMU will take necessary operations to
 guarantee the persistence of its own writes to @option{mem-path}
 (e.g. in vNVDIMM label emulation and live migration).
+QEMU will also map the backend file with the MAP_SYNC flag, which ensures
+that the file metadata of @option{mem-path} stays in sync in case of a host
+crash or a power failure. MAP_SYNC requires support from both the host kernel
+(since Linux kernel 4.15) and @option{mem-path} (only files supporting DAX).
 
 @item -object memory-backend-ram,id=@var{id},merge=@var{on|off},dump=@var{on|off},share=@var{on|off},prealloc=@var{on|off},size=@var{size},host-nodes=@var{host-nodes},policy=@var{default|preferred|bind|interleave}
 
-- 
2.7.4




[Qemu-devel] [PATCH V10 3/4] hostmem: add more information in error messages

2019-01-22 Thread Zhang, Yi
From: Zhang Yi 

When there are multiple memory backends in use, including the object type
name and the property name in the error message can help users to locate
the error.

Signed-off-by: Haozhong Zhang 
Signed-off-by: Zhang Yi 
Reviewed-by: Eduardo Habkost 
Reviewed-by: Stefano Garzarella 
---
 backends/hostmem-file.c | 6 --
 backends/hostmem.c  | 8 +---
 2 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/backends/hostmem-file.c b/backends/hostmem-file.c
index e640749..0dd7a90 100644
--- a/backends/hostmem-file.c
+++ b/backends/hostmem-file.c
@@ -82,7 +82,8 @@ static void set_mem_path(Object *o, const char *str, Error **errp)
 HostMemoryBackendFile *fb = MEMORY_BACKEND_FILE(o);
 
 if (host_memory_backend_mr_inited(backend)) {
-error_setg(errp, "cannot change property value");
+error_setg(errp, "cannot change property 'mem-path' of %s",
+   object_get_typename(o));
 return;
 }
 g_free(fb->mem_path);
@@ -120,7 +121,8 @@ static void file_memory_backend_set_align(Object *o, Visitor *v,
 uint64_t val;
 
 if (host_memory_backend_mr_inited(backend)) {
-error_setg(&local_err, "cannot change property value");
+error_setg(&local_err, "cannot change property '%s' of %s",
+   name, object_get_typename(o));
 goto out;
 }
 
diff --git a/backends/hostmem.c b/backends/hostmem.c
index 1a89342..e2bcf9f 100644
--- a/backends/hostmem.c
+++ b/backends/hostmem.c
@@ -47,7 +47,8 @@ host_memory_backend_set_size(Object *obj, Visitor *v, const char *name,
 uint64_t value;
 
 if (host_memory_backend_mr_inited(backend)) {
-error_setg(&local_err, "cannot change property value");
+error_setg(&local_err, "cannot change property %s of %s ",
+   name, object_get_typename(obj));
 goto out;
 }
 
@@ -56,8 +57,9 @@ host_memory_backend_set_size(Object *obj, Visitor *v, const char *name,
 goto out;
 }
 if (!value) {
-error_setg(&local_err, "Property '%s.%s' doesn't take value '%"
-   PRIu64 "'", object_get_typename(obj), name, value);
+error_setg(&local_err,
+   "property '%s' of %s doesn't take value '%" PRIu64 "'",
+   name, object_get_typename(obj), value);
 goto out;
 }
 backend->size = value;
-- 
2.7.4




[Qemu-devel] [PATCH V9 5/6] hostmem: add more information in error messages

2019-01-16 Thread Zhang Yi
When there are multiple memory backends in use, including the object type
name and the property name in the error message can help users to locate
the error.

Signed-off-by: Haozhong Zhang 
Signed-off-by: Zhang Yi 
Reviewed-by: Eduardo Habkost 
---
 backends/hostmem-file.c | 6 --
 backends/hostmem.c  | 8 +---
 2 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/backends/hostmem-file.c b/backends/hostmem-file.c
index e640749..0dd7a90 100644
--- a/backends/hostmem-file.c
+++ b/backends/hostmem-file.c
@@ -82,7 +82,8 @@ static void set_mem_path(Object *o, const char *str, Error **errp)
 HostMemoryBackendFile *fb = MEMORY_BACKEND_FILE(o);
 
 if (host_memory_backend_mr_inited(backend)) {
-error_setg(errp, "cannot change property value");
+error_setg(errp, "cannot change property 'mem-path' of %s",
+   object_get_typename(o));
 return;
 }
 g_free(fb->mem_path);
@@ -120,7 +121,8 @@ static void file_memory_backend_set_align(Object *o, Visitor *v,
 uint64_t val;
 
 if (host_memory_backend_mr_inited(backend)) {
-error_setg(&local_err, "cannot change property value");
+error_setg(&local_err, "cannot change property '%s' of %s",
+   name, object_get_typename(o));
 goto out;
 }
 
diff --git a/backends/hostmem.c b/backends/hostmem.c
index 1a89342..e2bcf9f 100644
--- a/backends/hostmem.c
+++ b/backends/hostmem.c
@@ -47,7 +47,8 @@ host_memory_backend_set_size(Object *obj, Visitor *v, const char *name,
 uint64_t value;
 
 if (host_memory_backend_mr_inited(backend)) {
-error_setg(&local_err, "cannot change property value");
+error_setg(&local_err, "cannot change property %s of %s ",
+   name, object_get_typename(obj));
 goto out;
 }
 
@@ -56,8 +57,9 @@ host_memory_backend_set_size(Object *obj, Visitor *v, const char *name,
 goto out;
 }
 if (!value) {
-error_setg(&local_err, "Property '%s.%s' doesn't take value '%"
-   PRIu64 "'", object_get_typename(obj), name, value);
+error_setg(&local_err,
+   "property '%s' of %s doesn't take value '%" PRIu64 "'",
+   name, object_get_typename(obj), value);
 goto out;
 }
 backend->size = value;
-- 
2.7.4




[Qemu-devel] [PATCH V9 3/6] util/mmap-alloc: switch 'shared' to 'flags' parameter

2019-01-16 Thread Zhang Yi
As more flag parameters besides the existing 'shared' are going to be
added to qemu_ram_mmap() and qemu_ram_alloc_from_{file,fd}(), let's
switch 'shared' to a 'flags' parameter in advance, so as to ease the
further additions.
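
To illustrate the calling convention (a minimal sketch, not the code in this
patch), the two helpers below show a caller converting a legacy bool into a
flags word and a callee decoding only the bit it understands. The DEMO_*
names and both functions are hypothetical; the bit positions mirror the
RAM_SHARED and RAM_PMEM definitions quoted elsewhere in this series.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative values only; the real RAM_* flags live in exec/memory.h. */
#define DEMO_RAM_SHARED (1u << 1)
#define DEMO_RAM_PMEM   (1u << 5)

/* A caller that still has a bool converts it at the API boundary. */
static uint32_t demo_flags_from_shared(bool shared)
{
    return shared ? DEMO_RAM_SHARED : 0;
}

/* The callee tests the bit it understands and ignores all others. */
static bool demo_wants_shared(uint32_t flags)
{
    return (flags & DEMO_RAM_SHARED) != 0;
}
```

Adding another property later then only needs a new bit in the flags word,
not another parameter on every function in the call chain.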

Signed-off-by: Haozhong Zhang 
Signed-off-by: Zhang Yi 
---
 exec.c|  7 ---
 include/qemu/mmap-alloc.h | 19 ++-
 util/mmap-alloc.c |  8 +---
 util/oslib-posix.c|  9 -
 4 files changed, 35 insertions(+), 8 deletions(-)

diff --git a/exec.c b/exec.c
index bb6170d..e92a7da 100644
--- a/exec.c
+++ b/exec.c
@@ -1810,6 +1810,7 @@ static void *file_ram_alloc(RAMBlock *block,
 ram_addr_t memory,
 int fd,
 bool truncate,
+uint32_t flags,
 Error **errp)
 {
 void *area;
@@ -1859,8 +1860,7 @@ static void *file_ram_alloc(RAMBlock *block,
 perror("ftruncate");
 }
 
-area = qemu_ram_mmap(fd, memory, block->mr->align,
- block->flags & RAM_SHARED);
+area = qemu_ram_mmap(fd, memory, block->mr->align, flags);
 if (area == MAP_FAILED) {
 error_setg_errno(errp, errno,
  "unable to map backing store for guest RAM");
@@ -2279,7 +2279,8 @@ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr,
 new_block->used_length = size;
 new_block->max_length = size;
 new_block->flags = ram_flags;
-new_block->host = file_ram_alloc(new_block, size, fd, !file_size, errp);
+new_block->host = file_ram_alloc(new_block, size, fd, !file_size,
+ram_flags, errp);
 if (!new_block->host) {
 g_free(new_block);
 return NULL;
diff --git a/include/qemu/mmap-alloc.h b/include/qemu/mmap-alloc.h
index 50385e3..6fe6ed4 100644
--- a/include/qemu/mmap-alloc.h
+++ b/include/qemu/mmap-alloc.h
@@ -7,7 +7,24 @@ size_t qemu_fd_getpagesize(int fd);
 
 size_t qemu_mempath_getpagesize(const char *mem_path);
 
-void *qemu_ram_mmap(int fd, size_t size, size_t align, bool shared);
+/**
+ * qemu_ram_mmap: mmap the specified file or device.
+ *
+ * Parameters:
+ *  @fd: the file or the device to mmap
+ *  @size: the number of bytes to be mmaped
+ *  @align: if not zero, specify the alignment of the starting mapping address;
+ *  otherwise, the alignment in use will be determined by QEMU.
+ *  @flags: specifies additional properties of the mapping, which can be one or
+ *  bit-or of following values
+ *  - RAM_SHARED: mmap with MAP_SHARED flag
+ *  Other bits are ignored.
+ *
+ * Return:
+ *  On success, return a pointer to the mapped area.
+ *  On failure, return MAP_FAILED.
+ */
+void *qemu_ram_mmap(int fd, size_t size, size_t align, uint32_t flags);
 
 void qemu_ram_munmap(void *ptr, size_t size);
 
diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c
index fd329ec..8f0a740 100644
--- a/util/mmap-alloc.c
+++ b/util/mmap-alloc.c
@@ -13,6 +13,7 @@
 #include "qemu/osdep.h"
 #include "qemu/mmap-alloc.h"
 #include "qemu/host-utils.h"
+#include "exec/memory.h"
 
 #define HUGETLBFS_MAGIC   0x958458f6
 
@@ -75,7 +76,7 @@ size_t qemu_mempath_getpagesize(const char *mem_path)
 return getpagesize();
 }
 
-void *qemu_ram_mmap(int fd, size_t size, size_t align, bool shared)
+void *qemu_ram_mmap(int fd, size_t size, size_t align, uint32_t flags)
 {
 /*
  * Note: this always allocates at least one extra page of virtual address
@@ -92,11 +93,12 @@ void *qemu_ram_mmap(int fd, size_t size, size_t align, bool shared)
  * anonymous memory is OK.
  */
 int anonfd = fd == -1 || qemu_fd_getpagesize(fd) == getpagesize() ? -1 : fd;
-int flags = anonfd == -1 ? MAP_ANONYMOUS : MAP_NORESERVE;
-void *ptr = mmap(0, total, PROT_NONE, flags | MAP_PRIVATE, anonfd, 0);
+int mmap_flags = anonfd == -1 ? MAP_ANONYMOUS : MAP_NORESERVE;
+void *ptr = mmap(0, total, PROT_NONE, mmap_flags | MAP_PRIVATE, anonfd, 0);
 #else
 void *ptr = mmap(0, total, PROT_NONE, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
 #endif
+bool shared = flags & RAM_SHARED;
 size_t offset;
 void *ptr1;
 
diff --git a/util/oslib-posix.c b/util/oslib-posix.c
index fbd0dc8..75a0171 100644
--- a/util/oslib-posix.c
+++ b/util/oslib-posix.c
@@ -54,6 +54,7 @@
 #endif
 
 #include "qemu/mmap-alloc.h"
+#include "exec/memory.h"
 
 #ifdef CONFIG_DEBUG_STACK_USAGE
 #include "qemu/error-report.h"
@@ -203,7 +204,13 @@ void *qemu_memalign(size_t alignment, size_t size)
 void *qemu_anon_ram_alloc(size_t size, uint64_t *alignment, bool shared)
 {
 size_t align = QEMU_VMALLOC_ALIGN;
-void *ptr = qemu_ram_mmap(-1, size, align, shared);
+uint32_t flags = 0;
+void *ptr;
+
+if (shared) {
+flags = RAM_SHARED;
+}
+ptr = qemu_ram_mmap(-1, size, align, flags);
 
 if (ptr == MAP_FAILED) {
 return NULL;
-- 
2.7.4




[Qemu-devel] [PATCH V9 6/6] docs: Added MAP_SYNC documentation

2019-01-16 Thread Zhang Yi
Signed-off-by: Zhang Yi 
---
 docs/nvdimm.txt | 21 -
 qemu-options.hx |  4 
 2 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/docs/nvdimm.txt b/docs/nvdimm.txt
index 5f158a6..565ba73 100644
--- a/docs/nvdimm.txt
+++ b/docs/nvdimm.txt
@@ -142,11 +142,30 @@ backend of vNVDIMM:
 Guest Data Persistence
 --
 
+vNVDIMM is designed and implemented to guarantee the guest data
+persistence on the backends even on the host crash and power
+failures. However, there are still some requirements and limitations
+as explained below.
+
 Though QEMU supports multiple types of vNVDIMM backends on Linux,
-currently the only one that can guarantee the guest write persistence
+if MAP_SYNC is not supported by the host kernel and the backends,
+the only backend that can guarantee the guest write persistence
 is the device DAX on the real NVDIMM device (e.g., /dev/dax0.0), to
 which all guest access do not involve any host-side kernel cache.
 
+mmap(2) flag MAP_SYNC is added since Linux kernel 4.15. On such
+systems, QEMU can mmap(2) the backend with MAP_SYNC, which can ensure
+filesystem metadata consistent even after a system crash or power
+failure. Besides the host kernel support, enabling MAP_SYNC in QEMU
+also requires:
+
+ - the backend is a file supporting DAX, e.g., a file on an ext4 or
+   xfs file system mounted with '-o dax',
+
+ - 'share' option of memory-backend-file is 'on'.
+
+ - 'pmem' option of memory-backend-file is 'on'
+
 When using other types of backends, it's suggested to set 'unarmed'
 option of '-device nvdimm' to 'on', which sets the unarmed flag of the
 guest NVDIMM region mapping structure.  This unarmed flag indicates
diff --git a/qemu-options.hx b/qemu-options.hx
index 08f8516..545cb8a 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -4002,6 +4002,10 @@ using the SNIA NVM programming model (e.g. Intel NVDIMM).
 If @option{pmem} is set to 'on', QEMU will take necessary operations to
 guarantee the persistence of its own writes to @option{mem-path}
 (e.g. in vNVDIMM label emulation and live migration).
+Also, we will map the backend-file with MAP_SYNC flag, which can ensure
+the file metadata is in sync to @option{mem-path} even on the host crash
+and power failures. MAP_SYNC requires supports from both the host kernel
+(since Linux kernel 4.15) and @option{mem-path} (only files supporting DAX).
 
 @item -object memory-backend-ram,id=@var{id},merge=@var{on|off},dump=@var{on|off},share=@var{on|off},prealloc=@var{on|off},size=@var{size},host-nodes=@var{host-nodes},policy=@var{default|preferred|bind|interleave}
 
-- 
2.7.4




[Qemu-devel] [PATCH V9 4/6] util/mmap-alloc: support MAP_SYNC in qemu_ram_mmap()

2019-01-16 Thread Zhang Yi
When a file supporting DAX is used as a vNVDIMM backend, additionally mmap
it with the MAP_SYNC flag, which ensures that file system metadata is
synced on each guest write to the backend file without other QEMU
actions (e.g., periodic fsync() by QEMU).
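
For readers unfamiliar with the flag, the sketch below shows one way a
process might request MAP_SYNC and fall back to a plain shared mapping. It is
an illustration under assumptions, not the code in this patch:
map_shared_try_sync() is a hypothetical helper, and it pairs MAP_SYNC with
MAP_SHARED_VALIDATE as described in mmap(2), so that a kernel or file that
cannot honour MAP_SYNC rejects the request instead of ignoring the flag.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: map 'size' bytes of 'fd' as a shared mapping,
 * asking for MAP_SYNC when the build headers define it. *used_sync is
 * set to 1 only if the MAP_SYNC mapping succeeded.
 */
static void *map_shared_try_sync(int fd, size_t size, int *used_sync)
{
    void *p;

    *used_sync = 0;
#if defined(MAP_SYNC) && defined(MAP_SHARED_VALIDATE)
    /* MAP_SHARED_VALIDATE makes the kernel reject unsupported flags. */
    p = mmap(NULL, size, PROT_READ | PROT_WRITE,
             MAP_SHARED_VALIDATE | MAP_SYNC, fd, 0);
    if (p != MAP_FAILED) {
        *used_sync = 1;
        return p;
    }
#endif
    /* Fallback: ordinary shared mapping without the MAP_SYNC guarantee. */
    p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    return p;
}
```

On a file that is not on a DAX-capable file system the first mmap() is
expected to fail, so callers that need the persistence guarantee should check
used_sync rather than assume it.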

Signed-off-by: Haozhong Zhang 
Signed-off-by: Zhang Yi 
---
 include/qemu/mmap-alloc.h |  1 +
 include/qemu/osdep.h  | 16 
 util/mmap-alloc.c |  7 ++-
 3 files changed, 23 insertions(+), 1 deletion(-)

diff --git a/include/qemu/mmap-alloc.h b/include/qemu/mmap-alloc.h
index 6fe6ed4..a95d91c 100644
--- a/include/qemu/mmap-alloc.h
+++ b/include/qemu/mmap-alloc.h
@@ -18,6 +18,7 @@ size_t qemu_mempath_getpagesize(const char *mem_path);
  *  @flags: specifies additional properties of the mapping, which can be one or
  *  bit-or of following values
  *  - RAM_SHARED: mmap with MAP_SHARED flag
+ *  - RAM_PMEM: mmap with MAP_SYNC flag
  *  Other bits are ignored.
  *
  * Return:
diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
index 457d24e..27a6bfe 100644
--- a/include/qemu/osdep.h
+++ b/include/qemu/osdep.h
@@ -419,6 +419,22 @@ void qemu_anon_ram_free(void *ptr, size_t size);
 #  define QEMU_VMALLOC_ALIGN getpagesize()
 #endif
 
+/*
+ * MAP_SHARED_VALIDATE and MAP_SYNC are introduced in Linux kernel
+ * 4.15, so they may not be defined when compiling on older kernels.
+ */
+#ifdef CONFIG_LINUX
+
+#include 
+
+#ifndef MAP_SYNC
+#define MAP_SYNC 0x0
+#endif
+
+#else  /* !CONFIG_LINUX */
+#define MAP_SYNC  0x0
+#endif /* CONFIG_LINUX */
+
 #ifdef CONFIG_POSIX
 struct qemu_signalfd_siginfo {
 uint32_t ssi_signo;   /* Signal number */
diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c
index 8f0a740..cba961c 100644
--- a/util/mmap-alloc.c
+++ b/util/mmap-alloc.c
@@ -99,6 +99,8 @@ void *qemu_ram_mmap(int fd, size_t size, size_t align, uint32_t flags)
 void *ptr = mmap(0, total, PROT_NONE, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
 #endif
 bool shared = flags & RAM_SHARED;
+bool is_pmem = flags & RAM_PMEM;
+int mmap_xflags = 0;
 size_t offset;
 void *ptr1;
 
@@ -109,12 +111,15 @@ void *qemu_ram_mmap(int fd, size_t size, size_t align, uint32_t flags)
 assert(is_power_of_2(align));
 /* Always align to host page size */
 assert(align >= getpagesize());
+if (shared && is_pmem) {
+mmap_xflags |= MAP_SYNC;
+}
 
 offset = QEMU_ALIGN_UP((uintptr_t)ptr, align) - (uintptr_t)ptr;
 ptr1 = mmap(ptr + offset, size, PROT_READ | PROT_WRITE,
 MAP_FIXED |
 (fd == -1 ? MAP_ANONYMOUS : 0) |
-(shared ? MAP_SHARED : MAP_PRIVATE),
+(shared ? MAP_SHARED : MAP_PRIVATE) | mmap_xflags,
 fd, 0);
 if (ptr1 == MAP_FAILED) {
 munmap(ptr, total);
-- 
2.7.4




[Qemu-devel] [PATCH V9 2/6] memory: use sparse feature define RAM_FLAG.

2019-01-16 Thread Zhang Yi
Signed-off-by: Zhang Yi 
Signed-off-by: Michael S. Tsirkin 
---
 include/exec/memory.h | 12 ++--
 include/qemu/osdep.h  |  9 +
 2 files changed, 15 insertions(+), 6 deletions(-)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index 667466b..03824d9 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -104,27 +104,27 @@ struct IOMMUNotifier {
 typedef struct IOMMUNotifier IOMMUNotifier;
 
 /* RAM is pre-allocated and passed into qemu_ram_alloc_from_ptr */
-#define RAM_PREALLOC   (1 << 0)
+#define RAM_PREALLOC ((QEMU_FORCE QemuMmapFlags) (1 << 0))
 
 /* RAM is mmap-ed with MAP_SHARED */
-#define RAM_SHARED (1 << 1)
+#define RAM_SHARED ((QEMU_FORCE QemuMmapFlags) (1 << 1))
 
 /* Only a portion of RAM (used_length) is actually used, and migrated.
  * This used_length size can change across reboots.
  */
-#define RAM_RESIZEABLE (1 << 2)
+#define RAM_RESIZEABLE ((QEMU_FORCE QemuMmapFlags) (1 << 2))
 
 /* UFFDIO_ZEROPAGE is available on this RAMBlock to atomically
  * zero the page and wake waiting processes.
  * (Set during postcopy)
  */
-#define RAM_UF_ZEROPAGE (1 << 3)
+#define RAM_UF_ZEROPAGE ((QEMU_FORCE QemuMmapFlags) (1 << 3))
 
 /* RAM can be migrated */
-#define RAM_MIGRATABLE (1 << 4)
+#define RAM_MIGRATABLE ((QEMU_FORCE QemuMmapFlags) (1 << 4))
 
 /* RAM is a persistent kind memory */
-#define RAM_PMEM (1 << 5)
+#define RAM_PMEM ((QEMU_FORCE QemuMmapFlags) (1 << 5))
 
 static inline void iommu_notifier_init(IOMMUNotifier *n, IOMMUNotify fn,
IOMMUNotifierFlag flags,
diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
index 3bf48bc..457d24e 100644
--- a/include/qemu/osdep.h
+++ b/include/qemu/osdep.h
@@ -185,6 +185,15 @@ extern int daemon(int, int);
 #define ESHUTDOWN 4099
 #endif
 
+#ifdef __CHECKER__
+#define QEMU_BITWISE __attribute__((bitwise))
+#define QEMU_FORCE   __attribute__((force))
+#else
+#define QEMU_BITWISE
+#define QEMU_FORCE
+#endif
+
+typedef unsigned QEMU_BITWISE QemuMmapFlags;
 /* time_t may be either 32 or 64 bits depending on the host OS, and
  * can be either signed or unsigned, so we can't just hardcode a
  * specific maximum value. This is not a C preprocessor constant,
-- 
2.7.4




[Qemu-devel] [PATCH V9 1/6] numa: Fixed the memory leak of numa error message

2019-01-16 Thread Zhang Yi
object_get_canonical_path_component() returns a string which
must be freed using g_free().
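
The ownership rule can be sketched without GLib (an analogy only, not the
QEMU code): the hypothetical demo_path_component() below returns a heap copy
that the caller owns, and demo_report_reuse() releases it once the message has
been formatted, which is the same pattern the g_free() call restores.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in: heap copy of the last path component. */
static char *demo_path_component(const char *path)
{
    const char *slash = strrchr(path, '/');
    const char *name = slash ? slash + 1 : path;
    size_t n = strlen(name) + 1;
    char *copy = malloc(n);

    if (copy) {
        memcpy(copy, name, n);
    }
    return copy; /* caller must free() */
}

/* Format the error text, then release the string we were handed. */
static int demo_report_reuse(char *buf, size_t len, const char *path)
{
    char *name = demo_path_component(path);
    int written;

    if (!name) {
        return -1;
    }
    written = snprintf(buf, len,
                       "memory backend %s is used multiple times", name);
    free(name); /* the counterpart of the added g_free(path) */
    return written;
}
```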

Signed-off-by: Zhang Yi 
Reviewed-by: Pankaj gupta 
Reviewed-by: Igor Mammedov 
Reviewed-by: Eduardo Habkost 
---
 numa.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/numa.c b/numa.c
index 50ec016..3875e1e 100644
--- a/numa.c
+++ b/numa.c
@@ -533,6 +533,7 @@ void memory_region_allocate_system_memory(MemoryRegion *mr, Object *owner,
 error_report("memory backend %s is used multiple times. Each "
  "-numa option must use a different memdev value.",
  path);
+g_free(path);
 exit(1);
 }
 
-- 
2.7.4




[Qemu-devel] [PATCH V9 0/6] support MAP_SYNC for memory-backend-file

2019-01-16 Thread Zhang Yi
Linux 4.15 introduces a new mmap flag MAP_SYNC, which can be used to
guarantee the write persistence to mmap'ed files supporting DAX (e.g.,
files on ext4/xfs file system mounted with '-o dax').

A description of MAP_SYNC and MAP_SHARED_VALIDATE can be found at
https://patchwork.kernel.org/patch/10028151/

In order to make sure that the file metadata is in sync after a fault
while we are writing to a shared, DAX-capable backend file, this
patch set enables QEMU to use the MAP_SYNC flag for memory-backend-file.

Now that the DAX vs. DMA truncate issue has been solved, we refined the
code and sent this feature out again as of v5.

QEMU passes MAP_SYNC to mmap(2) if MAP_SYNC is supported and both
'share=on' and 'pmem=on' are set. Otherwise, QEMU does not pass this
flag to mmap(2).

Changes in V9:
 * 1/6: Reviewed-by: Eduardo Habkost 
 * 2/6: Newly added, per Michael: use the sparse feature to define RAM_FLAG.
   Since I don't have much knowledge about the sparse feature, @Michael, could
   you add some documentation/commit message to this patch? Thank you very much.
 * 3/6: from 2/5: Eduardo: updated the commit message.
 * 4/6: from 3/5: Michael: don't ignore MAP_SYNC failures silently.
 * 5/6: from 4/5: Eduardo: updated the commit message.
 * 6/6: from 5/5: Michael: dropped the sync option, documented MAP_SYNC.

Changes in v8:
 * Michael: 3/5, removed the duplicated define in osdep.h.
 * Michael: 2/5, made the flag definitions type-safe.
 * Michael: 2/5, fixed the incorrect MAP_SHARE definition used in
   qemu_anon_ram_alloc.
 * 4/6 removed: the on/off/auto definition of 'sync' is dropped because,
   for now, MAP_SYNC only works with pmem=on.
 * @Michael, I still reuse the RAM_SYNC flag; it is much more straightforward
   to parse all the flags in one parameter.

Changes in v7:
 * Michael: [3,4,6]/6 limited the "sync" flag to nvdimm backends only (pmem=on).

Changes in v6:
 * Pankaj: 3/7 are squashed with 2/7
 * Pankaj: 7/7 update comments to "consistent filesystem metadata".
 * Pankaj, Igor: 1/7 Added Reviewed-by in patch-1/7
 * Stefan, 4/7 move the include header from "/linux/mman.h" to "osdep.h"
 * Stefan, 5/7 Add missing "munmap"
 * Stefan, 2/7 refine the shared/flag.

Changes in v5:
 * Add patch 1 to fix a memory leak issue.
 * Refine the patch 4-6
 * Remove the patch 3 as we already change the parameter from "shared" to
   "flags"

Changes in v4:
 * Add patches 1-3 to switch some functions to a single 'flags'
   parameter. (Michael S. Tsirkin)
 * v3 patch 1-3 become v4 patch 4-6.
 * Patch 4: move definitions of MAP_SYNC and MAP_SHARED_VALIDATE to a
   new header file under include/standard-headers/linux/. (Michael S. Tsirkin)
 * Patch 6: refine the description of the 'sync' option. (Michael S. Tsirkin)

Changes in v3:
 * Patch 1: add MAP_SHARED_VALIDATE in both sync=on and sync=auto
   cases, and add back the retry mechanism. MAP_SYNC will be ignored
   by Linux kernel 4.15 if MAP_SHARED_VALIDATE is missing.
 * Patch 1: define MAP_SYNC and MAP_SHARED_VALIDATE as 0 on non-Linux
   platforms in order to make qemu_ram_mmap() compile on those platforms.
 * Patch 2&3: include more information in error messages of
   memory-backend to help users identify the error.
   (Dr. David Alan Gilbert)
 * Patch 3: fix typo in the commit message. (Dr. David Alan Gilbert)

Changes in v2:
 * Add 'sync' option to control the use of MAP_SYNC. (Eduardo Habkost)
 * Remove the unnecessary set of MAP_SHARED_VALIDATE in some cases and
   the retry mechanism in qemu_ram_mmap(). (Michael S. Tsirkin)
 * Move OS dependent definitions of MAP_SYNC and MAP_SHARED_VALIDATE
   to osdep.h. (Michael S. Tsirkin)

Zhang Yi (6):
  numa: Fixed the memory leak of numa error message
  memory: use sparse feature define RAM_FLAG.
  util/mmap-alloc: switch 'shared' to 'flags' parameter
  util/mmap-alloc: support MAP_SYNC in qemu_ram_mmap()
  hostmem: add more information in error messages
  docs: Added MAP_SYNC documentation

 backends/hostmem-file.c   |  6 --
 backends/hostmem.c|  8 +---
 docs/nvdimm.txt   | 21 -
 exec.c|  7 ---
 include/exec/memory.h | 12 ++--
 include/qemu/mmap-alloc.h | 20 +++-
 include/qemu/osdep.h  | 25 +
 numa.c|  1 +
 qemu-options.hx   |  4 
 util/mmap-alloc.c | 15 +++
 util/oslib-posix.c|  9 -
 11 files changed, 107 insertions(+), 21 deletions(-)

-- 
2.7.4




[Qemu-devel] [PATCH V8 5/5] hostmem-file: add 'sync' option

2019-01-01 Thread Zhang Yi
This option controls whether QEMU mmaps the memory backend file with the
MAP_SYNC flag, which keeps filesystem metadata consistent even after a
system crash or power failure, provided that MAP_SYNC is supported by the
host kernel (Linux kernel 4.15 and later) and the backend is a file
supporting DAX (e.g., a file on an ext4/xfs file system mounted with
'-o dax').

It can take one of the following values:
 - on:  try to pass MAP_SYNC to mmap(2); if MAP_SYNC is not supported, or
        'share=off' or 'pmem!=on', QEMU will not pass this flag to
        mmap(2)
 - off: default, never pass MAP_SYNC to mmap(2)
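
The rule for the two values can be condensed into a single predicate. This is
a sketch of the behaviour described in this message, not code from the patch;
demo_use_map_sync() and its parameter names are hypothetical.

```c
#include <stdbool.h>

/*
 * Hypothetical predicate: MAP_SYNC is requested only when 'sync' is on,
 * the mapping is shared, the backend is marked pmem, and the host
 * supports the flag. With sync=off the answer is always false.
 */
static bool demo_use_map_sync(bool sync_on, bool share_on,
                              bool pmem_on, bool host_supports)
{
    return sync_on && share_on && pmem_on && host_supports;
}
```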

Signed-off-by: Haozhong Zhang 
Signed-off-by: Zhang Yi 
---
 backends/hostmem-file.c   | 28 
 docs/nvdimm.txt   | 23 ++-
 exec.c|  2 +-
 include/exec/memory.h |  4 
 include/exec/ram_addr.h   |  1 +
 include/qemu/mmap-alloc.h |  1 +
 qemu-options.hx   | 19 ++-
 util/mmap-alloc.c |  4 ++--
 8 files changed, 77 insertions(+), 5 deletions(-)

diff --git a/backends/hostmem-file.c b/backends/hostmem-file.c
index 0dd7a90..3d39032 100644
--- a/backends/hostmem-file.c
+++ b/backends/hostmem-file.c
@@ -36,6 +36,7 @@ struct HostMemoryBackendFile {
 uint64_t align;
 bool discard_data;
 bool is_pmem;
+bool sync;
 };
 
 static void
@@ -62,6 +63,7 @@ file_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
  path,
  backend->size, fb->align,
  (backend->share ? RAM_SHARED : 0) |
+ (fb->sync ? RAM_SYNC : 0) |
  (fb->is_pmem ? RAM_PMEM : 0),
  fb->mem_path, errp);
 g_free(path);
@@ -136,6 +138,29 @@ static void file_memory_backend_set_align(Object *o, Visitor *v,
 error_propagate(errp, local_err);
 }
 
+static bool file_memory_backend_get_sync(Object *o, Error **errp)
+{
+return MEMORY_BACKEND_FILE(o)->sync;
+}
+
+static void file_memory_backend_set_sync(
+Object *obj, bool value, Error **errp)
+{
+HostMemoryBackend *backend = MEMORY_BACKEND(obj);
+HostMemoryBackendFile *fb = MEMORY_BACKEND_FILE(obj);
+
+if (host_memory_backend_mr_inited(backend)) {
+error_setg(errp, "cannot change property sync of %s",
+   object_get_typename(obj));
+goto out;
+}
+
+fb->sync = value;
+
+ out:
+return;
+}
+
 static bool file_memory_backend_get_pmem(Object *o, Error **errp)
 {
 return MEMORY_BACKEND_FILE(o)->is_pmem;
@@ -203,6 +228,9 @@ file_backend_class_init(ObjectClass *oc, void *data)
 object_class_property_add_bool(oc, "pmem",
 file_memory_backend_get_pmem, file_memory_backend_set_pmem,
 &error_abort);
+object_class_property_add_bool(oc, "sync",
+file_memory_backend_get_sync, file_memory_backend_set_sync,
+&error_abort);
 }
 
 static void file_backend_instance_finalize(Object *o)
diff --git a/docs/nvdimm.txt b/docs/nvdimm.txt
index 5f158a6..30db458 100644
--- a/docs/nvdimm.txt
+++ b/docs/nvdimm.txt
@@ -142,11 +142,32 @@ backend of vNVDIMM:
 Guest Data Persistence
 --
 
+vNVDIMM is designed and implemented to guarantee the guest data
+persistence on the backends even on the host crash and power
+failures. However, there are still some requirements and limitations
+as explained below.
+
 Though QEMU supports multiple types of vNVDIMM backends on Linux,
-currently the only one that can guarantee the guest write persistence
+if MAP_SYNC is not supported by the host kernel and the backends,
+the only backend that can guarantee the guest write persistence
 is the device DAX on the real NVDIMM device (e.g., /dev/dax0.0), to
 which all guest access do not involve any host-side kernel cache.
 
+mmap(2) flag MAP_SYNC is added since Linux kernel 4.15. On such
+systems, QEMU can mmap(2) the backend with MAP_SYNC, which can ensure
+filesystem metadata consistent even after a system crash or power
+failure. Besides the host kernel support, enabling MAP_SYNC in QEMU
+also requires:
+
+ - the backend is a file supporting DAX, e.g., a file on an ext4 or
+   xfs file system mounted with '-o dax',
+
+ - 'sync' option of memory-backend-file is on, and
+
+ - 'share' option of memory-backend-file is 'on'.
+
+ - 'pmem' option of memory-backend-file is 'on'
+
 When using other types of backends, it's suggested to set 'unarmed'
 option of '-device nvdimm' to 'on', which sets the unarmed flag of the
 guest NVDIMM region mapping structure.  This unarmed flag indicates
diff --git a/exec.c b/exec.c
index e92a7da..dc4d180 100644
--- a/exec.c
+++ b/exec.c
@@ -2241,7 +2241,7 @@ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr,
 int64_t file_size;
 
 /* Just support these ram flags by now. */
-assert((ram_flags & ~(RAM_SHARED | RAM_P

[Qemu-devel] [PATCH V8 2/5] util/mmap-alloc: switch qemu_ram_mmap() to 'flags' parameter

2019-01-01 Thread Zhang Yi
As more flag parameters besides the existing 'shared' are going to be
added to qemu_ram_mmap(), let's switch 'shared' to a 'flags' parameter
in advance, so as to ease the further additions.

Signed-off-by: Haozhong Zhang 
Signed-off-by: Zhang Yi 
---
 exec.c|  7 ---
 include/exec/memory.h | 22 --
 include/qemu/mmap-alloc.h | 19 ++-
 util/mmap-alloc.c |  8 +---
 util/oslib-posix.c|  9 -
 5 files changed, 51 insertions(+), 14 deletions(-)

diff --git a/exec.c b/exec.c
index bb6170d..e92a7da 100644
--- a/exec.c
+++ b/exec.c
@@ -1810,6 +1810,7 @@ static void *file_ram_alloc(RAMBlock *block,
 ram_addr_t memory,
 int fd,
 bool truncate,
+uint32_t flags,
 Error **errp)
 {
 void *area;
@@ -1859,8 +1860,7 @@ static void *file_ram_alloc(RAMBlock *block,
 perror("ftruncate");
 }
 
-area = qemu_ram_mmap(fd, memory, block->mr->align,
- block->flags & RAM_SHARED);
+area = qemu_ram_mmap(fd, memory, block->mr->align, flags);
 if (area == MAP_FAILED) {
 error_setg_errno(errp, errno,
  "unable to map backing store for guest RAM");
@@ -2279,7 +2279,8 @@ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr,
 new_block->used_length = size;
 new_block->max_length = size;
 new_block->flags = ram_flags;
-new_block->host = file_ram_alloc(new_block, size, fd, !file_size, errp);
+new_block->host = file_ram_alloc(new_block, size, fd, !file_size,
+ram_flags, errp);
 if (!new_block->host) {
 g_free(new_block);
 return NULL;
diff --git a/include/exec/memory.h b/include/exec/memory.h
index 667466b..6e30c23 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -103,28 +103,38 @@ struct IOMMUNotifier {
 };
 typedef struct IOMMUNotifier IOMMUNotifier;
 
+#ifdef __CHECKER__
+#define QEMU_BITWISE __attribute__((bitwise))
+#define QEMU_FORCE   __attribute__((force))
+#else
+#define QEMU_BITWISE
+#define QEMU_FORCE
+#endif
+
+typedef unsigned QEMU_BITWISE QemuMmapFlags;
+
 /* RAM is pre-allocated and passed into qemu_ram_alloc_from_ptr */
-#define RAM_PREALLOC   (1 << 0)
+#define RAM_PREALLOC ((QEMU_FORCE QemuMmapFlags) (1 << 0))
 
 /* RAM is mmap-ed with MAP_SHARED */
-#define RAM_SHARED (1 << 1)
+#define RAM_SHARED ((QEMU_FORCE QemuMmapFlags) (1 << 1))
 
 /* Only a portion of RAM (used_length) is actually used, and migrated.
  * This used_length size can change across reboots.
  */
-#define RAM_RESIZEABLE (1 << 2)
+#define RAM_RESIZEABLE ((QEMU_FORCE QemuMmapFlags) (1 << 2))
 
 /* UFFDIO_ZEROPAGE is available on this RAMBlock to atomically
  * zero the page and wake waiting processes.
  * (Set during postcopy)
  */
-#define RAM_UF_ZEROPAGE (1 << 3)
+#define RAM_UF_ZEROPAGE ((QEMU_FORCE QemuMmapFlags) (1 << 3))
 
 /* RAM can be migrated */
-#define RAM_MIGRATABLE (1 << 4)
+#define RAM_MIGRATABLE ((QEMU_FORCE QemuMmapFlags) (1 << 4))
 
 /* RAM is a persistent kind memory */
-#define RAM_PMEM (1 << 5)
+#define RAM_PMEM ((QEMU_FORCE QemuMmapFlags) (1 << 5))
 
 static inline void iommu_notifier_init(IOMMUNotifier *n, IOMMUNotify fn,
IOMMUNotifierFlag flags,
diff --git a/include/qemu/mmap-alloc.h b/include/qemu/mmap-alloc.h
index 50385e3..6fe6ed4 100644
--- a/include/qemu/mmap-alloc.h
+++ b/include/qemu/mmap-alloc.h
@@ -7,7 +7,24 @@ size_t qemu_fd_getpagesize(int fd);
 
 size_t qemu_mempath_getpagesize(const char *mem_path);
 
-void *qemu_ram_mmap(int fd, size_t size, size_t align, bool shared);
+/**
+ * qemu_ram_mmap: mmap the specified file or device.
+ *
+ * Parameters:
+ *  @fd: the file or the device to mmap
+ *  @size: the number of bytes to be mmaped
+ *  @align: if not zero, specify the alignment of the starting mapping address;
+ *  otherwise, the alignment in use will be determined by QEMU.
+ *  @flags: specifies additional properties of the mapping, which can be one or
+ *  bit-or of following values
+ *  - RAM_SHARED: mmap with MAP_SHARED flag
+ *  Other bits are ignored.
+ *
+ * Return:
+ *  On success, return a pointer to the mapped area.
+ *  On failure, return MAP_FAILED.
+ */
+void *qemu_ram_mmap(int fd, size_t size, size_t align, uint32_t flags);
 
 void qemu_ram_munmap(void *ptr, size_t size);
 
diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c
index fd329ec..8f0a740 100644
--- a/util/mmap-alloc.c
+++ b/util/mmap-alloc.c
@@ -13,6 +13,7 @@
 #include "qemu/osdep.h"
 #include "qemu/mmap-alloc.h"
 #include "qemu/host-utils.h"
+#include "exec/memory.h"
 
 #define HUGETLBFS_MAGIC   0x958458f6
 
@@

[Qemu-devel] [PATCH V8 3/5] util/mmap-alloc: support MAP_SYNC in qemu_ram_mmap()

2019-01-01 Thread Zhang Yi
When a file supporting DAX is used as a vNVDIMM backend, additionally mmap
it with the MAP_SYNC flag, which ensures that file system metadata is
synced on each guest write to the backend file without other QEMU
actions (e.g., periodic fsync() by QEMU).

Signed-off-by: Haozhong Zhang 
Signed-off-by: Zhang Yi 
---
 include/qemu/osdep.h | 16 
 util/mmap-alloc.c| 12 +++-
 2 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
index 3bf48bc..bb1eba1 100644
--- a/include/qemu/osdep.h
+++ b/include/qemu/osdep.h
@@ -410,6 +410,22 @@ void qemu_anon_ram_free(void *ptr, size_t size);
 #  define QEMU_VMALLOC_ALIGN getpagesize()
 #endif
 
+/*
+ * MAP_SHARED_VALIDATE and MAP_SYNC are introduced in Linux kernel
+ * 4.15, so they may not be defined when compiling on older kernels.
+ */
+#ifdef CONFIG_LINUX
+
+#include 
+
+#ifndef MAP_SYNC
+#define MAP_SYNC 0x0
+#endif
+
+#else  /* !CONFIG_LINUX */
+#define MAP_SYNC  0x0
+#endif /* CONFIG_LINUX */
+
 #ifdef CONFIG_POSIX
 struct qemu_signalfd_siginfo {
 uint32_t ssi_signo;   /* Signal number */
diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c
index 8f0a740..a9d5e56 100644
--- a/util/mmap-alloc.c
+++ b/util/mmap-alloc.c
@@ -99,6 +99,8 @@ void *qemu_ram_mmap(int fd, size_t size, size_t align, uint32_t flags)
 void *ptr = mmap(0, total, PROT_NONE, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
 #endif
 bool shared = flags & RAM_SHARED;
+bool is_pmem = flags & RAM_PMEM;
+int mmap_xflags = 0;
 size_t offset;
 void *ptr1;
 
@@ -109,13 +111,21 @@ void *qemu_ram_mmap(int fd, size_t size, size_t align, uint32_t flags)
 assert(is_power_of_2(align));
 /* Always align to host page size */
 assert(align >= getpagesize());
+if (shared && is_pmem) {
+mmap_xflags |= MAP_SYNC;
+}
 
 offset = QEMU_ALIGN_UP((uintptr_t)ptr, align) - (uintptr_t)ptr;
+ retry_mmap_fd:
 ptr1 = mmap(ptr + offset, size, PROT_READ | PROT_WRITE,
 MAP_FIXED |
 (fd == -1 ? MAP_ANONYMOUS : 0) |
-(shared ? MAP_SHARED : MAP_PRIVATE),
+(shared ? MAP_SHARED : MAP_PRIVATE) | mmap_xflags,
 fd, 0);
+if ((ptr1 == MAP_FAILED) && (mmap_xflags & MAP_SYNC)) {
+mmap_xflags &= ~MAP_SYNC;
+goto retry_mmap_fd;
+}
 if (ptr1 == MAP_FAILED) {
 munmap(ptr, total);
 return MAP_FAILED;
-- 
2.7.4




[Qemu-devel] [PATCH V8 4/5] hostmem: add more information in error messages

2019-01-01 Thread Zhang Yi
When there are multiple memory backends in use, including the object type
name, ID and the property name in the error message can help users to
locate the error.

Signed-off-by: Haozhong Zhang 
Signed-off-by: Zhang Yi 
---
 backends/hostmem-file.c | 6 --
 backends/hostmem.c  | 8 +---
 2 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/backends/hostmem-file.c b/backends/hostmem-file.c
index e640749..0dd7a90 100644
--- a/backends/hostmem-file.c
+++ b/backends/hostmem-file.c
@@ -82,7 +82,8 @@ static void set_mem_path(Object *o, const char *str, Error **errp)
 HostMemoryBackendFile *fb = MEMORY_BACKEND_FILE(o);
 
 if (host_memory_backend_mr_inited(backend)) {
-error_setg(errp, "cannot change property value");
+error_setg(errp, "cannot change property 'mem-path' of %s",
+   object_get_typename(o));
 return;
 }
 g_free(fb->mem_path);
@@ -120,7 +121,8 @@ static void file_memory_backend_set_align(Object *o, Visitor *v,
 uint64_t val;
 
 if (host_memory_backend_mr_inited(backend)) {
-error_setg(&local_err, "cannot change property value");
+error_setg(&local_err, "cannot change property '%s' of %s",
+   name, object_get_typename(o));
 goto out;
 }
 
diff --git a/backends/hostmem.c b/backends/hostmem.c
index 1a89342..e2bcf9f 100644
--- a/backends/hostmem.c
+++ b/backends/hostmem.c
@@ -47,7 +47,8 @@ host_memory_backend_set_size(Object *obj, Visitor *v, const char *name,
 uint64_t value;
 
 if (host_memory_backend_mr_inited(backend)) {
-error_setg(&local_err, "cannot change property value");
+error_setg(&local_err, "cannot change property %s of %s ",
+   name, object_get_typename(obj));
 goto out;
 }
 
@@ -56,8 +57,9 @@ host_memory_backend_set_size(Object *obj, Visitor *v, const char *name,
 goto out;
 }
 if (!value) {
-error_setg(&local_err, "Property '%s.%s' doesn't take value '%"
-   PRIu64 "'", object_get_typename(obj), name, value);
+error_setg(&local_err,
+   "property '%s' of %s doesn't take value '%" PRIu64 "'",
+   name, object_get_typename(obj), value);
 goto out;
 }
 backend->size = value;
-- 
2.7.4




[Qemu-devel] [PATCH V8 1/5] numa: Fixed the memory leak of numa error message

2019-01-01 Thread Zhang Yi
object_get_canonical_path_component() returns a string which
must be freed using g_free().
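
The ownership rule behind this fix can be sketched in plain C, with
malloc()/free() standing in for the GLib allocator. Both functions below
are hypothetical stand-ins written for this example, not QEMU or GLib
APIs: the first returns a newly allocated string, and the second shows a
caller releasing it before returning.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical stand-in for a function that returns an allocated string:
 * copies the last '/'-separated component of path. The caller owns the
 * result and must release it with free().
 */
char *last_path_component(const char *path)
{
    const char *slash;
    size_t len;
    char *copy;

    assert(path != NULL);
    slash = strrchr(path, '/');
    if (slash) {
        path = slash + 1;
    }
    len = strlen(path) + 1;
    copy = malloc(len);
    if (copy) {
        memcpy(copy, path, len);
    }
    return copy;
}

/* Hypothetical caller: uses the string, then frees it on the way out. */
size_t component_length(const char *path)
{
    char *name = last_path_component(path);
    size_t n = name ? strlen(name) : 0;

    free(name); /* release on every exit path, error paths included */
    return n;
}
```

The patch applies the same rule on the error path, immediately before
exit(1).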

Signed-off-by: Zhang Yi 
Reviewed-by: Pankaj gupta 
Reviewed-by: Igor Mammedov 
---
 numa.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/numa.c b/numa.c
index 50ec016..3875e1e 100644
--- a/numa.c
+++ b/numa.c
@@ -533,6 +533,7 @@ void memory_region_allocate_system_memory(MemoryRegion *mr, Object *owner,
 error_report("memory backend %s is used multiple times. Each "
  "-numa option must use a different memdev value.",
  path);
+g_free(path);
 exit(1);
 }
 
-- 
2.7.4




[Qemu-devel] [PATCH V8 0/5] support MAP_SYNC for memory-backend-file

2019-01-01 Thread Zhang Yi
Linux 4.15 introduces a new mmap flag MAP_SYNC, which can be used to
guarantee the write persistence to mmap'ed files supporting DAX (e.g.,
files on ext4/xfs file system mounted with '-o dax').

A description of MAP_SYNC and MAP_SHARED_VALIDATE can be found at
https://patchwork.kernel.org/patch/10028151/

To keep the file metadata in sync after a fault while the guest is
writing to a shared, DAX-capable backend file, this patch set enables
QEMU to use the MAP_SYNC flag for memory-backend-file.

Now that the DAX vs. DMA truncate issue has been solved, we refined the
code and have been sending this feature out again since v5.

A new on/off option 'sync' is added to memory-backend-file:
 - on:  try to pass MAP_SYNC to mmap(2); if MAP_SYNC is not supported, or
        if 'share=off' or 'pmem=off', QEMU will not pass this flag to mmap(2)
 - off: (default) never pass MAP_SYNC to mmap(2)

Changes in v8:
 * Michael: 3/5, remove the duplicated definition in osdep.h.
 * Michael: 2/5, make the type definitions safe.
 * Michael: 2/5, fix the incorrect use of MAP_SHARED in qemu_anon_ram_alloc.
 * 4/6 removed: the on/off/auto definition of sync is dropped because,
   for now, MAP_SYNC only works with pmem=on.
 * @Michael, I still reuse the RAM_SYNC flag; it is much more
   straightforward to parse all the flags in one parameter.

Changes in v7:
 * Michael: [3,4,6]/6, limit the "sync" flag to nvdimm backends (pmem=on).

Changes in v6:
 * Pankaj: 3/7 is squashed into 2/7.
 * Pankaj: 7/7, update comments to "consistent filesystem metadata".
 * Pankaj, Igor: 1/7, added Reviewed-by.
 * Stefan: 4/7, move the included header from "/linux/mman.h" to "osdep.h".
 * Stefan: 5/7, add the missing "munmap".
 * Stefan: 2/7, refine the shared/flag.

Changes in v5:
 * Add patch 1 to fix a memory leak issue.
 * Refine the patch 4-6
 * Remove the patch 3 as we already change the parameter from "shared" to
   "flags"

Changes in v4:
 * Add patch 1-3 to switch some functions to a single 'flags'
   parameters. (Michael S. Tsirkin)
 * v3 patch 1-3 become v4 patch 4-6.
 * Patch 4: move definitions of MAP_SYNC and MAP_SHARED_VALIDATE to a
   new header file under include/standard-headers/linux/. (Michael S. Tsirkin)
 * Patch 6: refine the description of the 'sync' option. (Michael S. Tsirkin)

Changes in v3:
 * Patch 1: add MAP_SHARED_VALIDATE in both sync=on and sync=auto
   cases, and add back the retry mechanism. MAP_SYNC will be ignored
   by Linux kernel 4.15 if MAP_SHARED_VALIDATE is missing.
 * Patch 1: define MAP_SYNC and MAP_SHARED_VALIDATE as 0 on non-Linux
   platforms in order to make qemu_ram_mmap() compile on those platforms.
 * Patch 2&3: include more information in error messages of
   memory-backend in the hope of helping users identify the error.
   (Dr. David Alan Gilbert)
 * Patch 3: fix typo in the commit message. (Dr. David Alan Gilbert)

Changes in v2:
 * Add 'sync' option to control the use of MAP_SYNC. (Eduardo Habkost)
 * Remove the unnecessary set of MAP_SHARED_VALIDATE in some cases and
   the retry mechanism in qemu_ram_mmap(). (Michael S. Tsirkin)
 * Move OS dependent definitions of MAP_SYNC and MAP_SHARED_VALIDATE
   to osdep.h. (Michael S. Tsirkin)

Zhang Yi (5):
  numa: Fixed the memory leak of numa error message
  util/mmap-alloc: switch qemu_ram_mmap() to 'flags' parameter
  util/mmap-alloc: support MAP_SYNC in qemu_ram_mmap()
  hostmem: add more information in error messages
  hostmem-file: add 'sync' option

 backends/hostmem-file.c   | 34 --
 backends/hostmem.c|  8 +---
 docs/nvdimm.txt   | 23 ++-
 exec.c|  9 +
 include/exec/memory.h | 26 --
 include/exec/ram_addr.h   |  1 +
 include/qemu/mmap-alloc.h | 20 +++-
 include/qemu/osdep.h  | 16 
 numa.c|  1 +
 qemu-options.hx   | 19 ++-
 util/mmap-alloc.c | 20 
 util/oslib-posix.c|  9 -
 12 files changed, 163 insertions(+), 23 deletions(-)

-- 
2.7.4




[Qemu-devel] [PATCH V7 5/6] hostmem: add more information in error messages

2018-12-18 Thread Zhang Yi
When there are multiple memory backends in use, including the object type
name, ID and the property name in the error message can help users to
locate the error.

Signed-off-by: Haozhong Zhang 
Signed-off-by: Zhang Yi 
---
 backends/hostmem-file.c | 6 --
 backends/hostmem.c  | 8 +---
 2 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/backends/hostmem-file.c b/backends/hostmem-file.c
index e640749..0dd7a90 100644
--- a/backends/hostmem-file.c
+++ b/backends/hostmem-file.c
@@ -82,7 +82,8 @@ static void set_mem_path(Object *o, const char *str, Error **errp)
 HostMemoryBackendFile *fb = MEMORY_BACKEND_FILE(o);
 
 if (host_memory_backend_mr_inited(backend)) {
-error_setg(errp, "cannot change property value");
+error_setg(errp, "cannot change property 'mem-path' of %s",
+   object_get_typename(o));
 return;
 }
 g_free(fb->mem_path);
@@ -120,7 +121,8 @@ static void file_memory_backend_set_align(Object *o, Visitor *v,
 uint64_t val;
 
 if (host_memory_backend_mr_inited(backend)) {
-error_setg(&local_err, "cannot change property value");
+error_setg(&local_err, "cannot change property '%s' of %s",
+   name, object_get_typename(o));
 goto out;
 }
 
diff --git a/backends/hostmem.c b/backends/hostmem.c
index 1a89342..e2bcf9f 100644
--- a/backends/hostmem.c
+++ b/backends/hostmem.c
@@ -47,7 +47,8 @@ host_memory_backend_set_size(Object *obj, Visitor *v, const char *name,
 uint64_t value;
 
 if (host_memory_backend_mr_inited(backend)) {
-error_setg(&local_err, "cannot change property value");
+error_setg(&local_err, "cannot change property %s of %s ",
+   name, object_get_typename(obj));
 goto out;
 }
 
@@ -56,8 +57,9 @@ host_memory_backend_set_size(Object *obj, Visitor *v, const char *name,
 goto out;
 }
 if (!value) {
-error_setg(&local_err, "Property '%s.%s' doesn't take value '%"
-   PRIu64 "'", object_get_typename(obj), name, value);
+error_setg(&local_err,
+   "property '%s' of %s doesn't take value '%" PRIu64 "'",
+   name, object_get_typename(obj), value);
 goto out;
 }
 backend->size = value;
-- 
2.7.4




[Qemu-devel] [PATCH V7 6/6] hostmem-file: add 'sync' option

2018-12-18 Thread Zhang Yi
This option controls whether QEMU mmap(2)s the memory backend file with
the MAP_SYNC flag, which keeps the filesystem metadata consistent for each
guest write, provided that MAP_SYNC is supported by the host kernel (Linux
kernel 4.15 and later) and the backend is a file supporting DAX (e.g., a
file on an ext4/xfs file system mounted with '-o dax').

It can take one of the following values:
 - on:  try to pass MAP_SYNC to mmap(2); if MAP_SYNC is not supported, or
        if 'share=off' or 'pmem!=on', QEMU will abort
 - off: never pass MAP_SYNC to mmap(2)
 - auto (default): if MAP_SYNC is supported and both 'share=on' and
        'pmem=on' are set, work as if 'sync=on'; otherwise, work as if
        'sync=off'

Signed-off-by: Haozhong Zhang 
Signed-off-by: Zhang Yi 
---
 backends/hostmem-file.c | 39 +++
 docs/nvdimm.txt | 22 +-
 include/exec/memory.h   |  8 
 qemu-options.hx | 22 +-
 4 files changed, 89 insertions(+), 2 deletions(-)

diff --git a/backends/hostmem-file.c b/backends/hostmem-file.c
index 0dd7a90..73cf181 100644
--- a/backends/hostmem-file.c
+++ b/backends/hostmem-file.c
@@ -16,6 +16,7 @@
 #include "sysemu/hostmem.h"
 #include "sysemu/sysemu.h"
 #include "qom/object_interfaces.h"
+#include "qapi/qapi-visit.h"
 
 /* hostmem-file.c */
 /**
@@ -36,6 +37,7 @@ struct HostMemoryBackendFile {
 uint64_t align;
 bool discard_data;
 bool is_pmem;
+OnOffAuto sync;
 };
 
 static void
@@ -62,6 +64,7 @@ file_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
  path,
  backend->size, fb->align,
  (backend->share ? RAM_SHARED : 0) |
+ qemu_ram_sync_flags(fb->sync) |
  (fb->is_pmem ? RAM_PMEM : 0),
  fb->mem_path, errp);
 g_free(path);
@@ -136,6 +139,39 @@ static void file_memory_backend_set_align(Object *o, Visitor *v,
 error_propagate(errp, local_err);
 }
 
+static void file_memory_backend_get_sync(
+Object *obj, Visitor *v, const char *name, void *opaque, Error **errp)
+{
+HostMemoryBackendFile *fb = MEMORY_BACKEND_FILE(obj);
+OnOffAuto value = fb->sync;
+
+visit_type_OnOffAuto(v, name, &value, errp);
+}
+
+static void file_memory_backend_set_sync(
+Object *obj, Visitor *v, const char *name, void *opaque, Error **errp)
+{
+HostMemoryBackend *backend = MEMORY_BACKEND(obj);
+HostMemoryBackendFile *fb = MEMORY_BACKEND_FILE(obj);
+Error *local_err = NULL;
+OnOffAuto value;
+
+if (host_memory_backend_mr_inited(backend)) {
+error_setg(&local_err, "cannot change property '%s' of %s",
+   name, object_get_typename(obj));
+goto out;
+}
+
+visit_type_OnOffAuto(v, name, &value, &local_err);
+if (local_err) {
+goto out;
+}
+fb->sync = value;
+
+ out:
+error_propagate(errp, local_err);
+}
+
 static bool file_memory_backend_get_pmem(Object *o, Error **errp)
 {
 return MEMORY_BACKEND_FILE(o)->is_pmem;
@@ -203,6 +239,9 @@ file_backend_class_init(ObjectClass *oc, void *data)
 object_class_property_add_bool(oc, "pmem",
 file_memory_backend_get_pmem, file_memory_backend_set_pmem,
 &error_abort);
+object_class_property_add(oc, "sync", "OnOffAuto",
+file_memory_backend_get_sync, file_memory_backend_set_sync,
+NULL, NULL, &error_abort);
 }
 
 static void file_backend_instance_finalize(Object *o)
diff --git a/docs/nvdimm.txt b/docs/nvdimm.txt
index 5f158a6..d86a270 100644
--- a/docs/nvdimm.txt
+++ b/docs/nvdimm.txt
@@ -142,11 +142,31 @@ backend of vNVDIMM:
 Guest Data Persistence
 --
 
+vNVDIMM is designed and implemented to guarantee guest data
+persistence on the backends even across host crashes and power
+failures. However, there are still some requirements and limitations
+as explained below.
+
 Though QEMU supports multiple types of vNVDIMM backends on Linux,
-currently the only one that can guarantee the guest write persistence
+if MAP_SYNC is not supported by the host kernel and the backends,
+the only backend that can guarantee the guest write persistence
 is the device DAX on the real NVDIMM device (e.g., /dev/dax0.0), to
 which all guest access do not involve any host-side kernel cache.
 
+The mmap(2) flag MAP_SYNC was added in Linux kernel 4.15. On such
+systems, QEMU can mmap(2) the backend with MAP_SYNC, which keeps
+filesystem metadata consistent for each guest write. Besides the host
+kernel support, enabling MAP_SYNC in QEMU also requires:
+
+ - the backend is a file supporting DAX, e.g., a file on an ext4 or
+   xfs file system mounted with '-o dax',
+
+ - 'sync' option of memory-backend-file is not 'off', and
+
+ - 'share' option of memory-backend-file is 'on'.
+
+ - 'pmem' option of memory-backend-fil

[Qemu-devel] [PATCH V7 1/6] numa: Fixed the memory leak of numa error message

2018-12-18 Thread Zhang Yi
object_get_canonical_path_component() returns a string which
must be freed using g_free().

Signed-off-by: Zhang Yi 
Reviewed-by: Pankaj gupta 
Reviewed-by: Igor Mammedov 
---
 numa.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/numa.c b/numa.c
index 50ec016..3875e1e 100644
--- a/numa.c
+++ b/numa.c
@@ -533,6 +533,7 @@ void memory_region_allocate_system_memory(MemoryRegion *mr, Object *owner,
 error_report("memory backend %s is used multiple times. Each "
  "-numa option must use a different memdev value.",
  path);
+g_free(path);
 exit(1);
 }
 
-- 
2.7.4




[Qemu-devel] [PATCH V7 4/6] util/mmap-alloc: Switch the RAM_SYNC flags to OnOffAuto

2018-12-18 Thread Zhang Yi
Signed-off-by: Zhang Yi 

A set of RAM_SYNC_ON_OFF_AUTO_{AUTO,ON,OFF} flags is added to
qemu_ram_mmap():

- If RAM_SYNC_ON_OFF_AUTO_ON is present, qemu_ram_mmap() will try to pass
  MAP_SYNC to mmap(). It will then fail if the host OS or the backend
  file does not support MAP_SYNC, or if MAP_SYNC conflicts with other
  flags.

- If RAM_SYNC_ON_OFF_AUTO_OFF is present, qemu_ram_mmap() will never pass
  MAP_SYNC to mmap().

- If RAM_SYNC_ON_OFF_AUTO_AUTO is present, and
  * if the host OS and the backend file support MAP_SYNC, and MAP_SYNC
    does not conflict with other flags, qemu_ram_mmap() will work as if
    RAM_SYNC_ON_OFF_AUTO_ON is present;
  * otherwise, qemu_ram_mmap() will work as if RAM_SYNC_ON_OFF_AUTO_OFF is
    present.
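
The three cases above can be summarized as a small decision function.
This is only a sketch of the described behavior, not the code in the
patch: SyncMode, MapResult and resolve_sync() are hypothetical names, and
sync_capable stands for "the host OS and the backend file support
MAP_SYNC and it does not conflict with the other flags".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for this sketch; they are not the QEMU definitions. */
typedef enum { SYNC_OFF, SYNC_ON, SYNC_AUTO } SyncMode;
typedef enum { MAP_RESULT_FAILED, MAP_RESULT_PLAIN, MAP_RESULT_SYNC } MapResult;

/*
 * Decide the outcome of a mapping request:
 *  - ON:   MAP_SYNC is required, so the request fails without it;
 *  - AUTO: MAP_SYNC is used when possible, otherwise silently dropped;
 *  - OFF:  MAP_SYNC is never used.
 */
MapResult resolve_sync(SyncMode mode, bool sync_capable)
{
    assert(mode == SYNC_OFF || mode == SYNC_ON || mode == SYNC_AUTO);

    switch (mode) {
    case SYNC_ON:
        return sync_capable ? MAP_RESULT_SYNC : MAP_RESULT_FAILED;
    case SYNC_AUTO:
        return sync_capable ? MAP_RESULT_SYNC : MAP_RESULT_PLAIN;
    case SYNC_OFF:
    default:
        return MAP_RESULT_PLAIN;
    }
}
```

Only the ON case can turn an unsupported MAP_SYNC into a failed mapping;
AUTO and OFF always end in a usable mapping.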
---
 include/exec/memory.h |  9 -
 util/mmap-alloc.c | 13 +++--
 2 files changed, 19 insertions(+), 3 deletions(-)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index 33a4e2c..c74c467 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -127,7 +127,14 @@ typedef struct IOMMUNotifier IOMMUNotifier;
 #define RAM_PMEM (1 << 5)
 
 /* RAM can be mmap by a MAP_SYNC flag */
-#define RAM_SYNC (1 << 6)
+#define RAM_SYNC_SHIFT  6
+#define RAM_SYNC_SHIFT_AUTO  7
+
+#define RAM_SYNC_ON_OFF_AUTO_ON   (1UL << RAM_SYNC_SHIFT)
+#define RAM_SYNC_ON_OFF_AUTO_OFF  (0UL << RAM_SYNC_SHIFT)
+#define RAM_SYNC_ON_OFF_AUTO_AUTO (1UL << RAM_SYNC_SHIFT_AUTO)
+
+#define RAM_SYNC (RAM_SYNC_ON_OFF_AUTO_ON | RAM_SYNC_ON_OFF_AUTO_AUTO)
 
 static inline void iommu_notifier_init(IOMMUNotifier *n, IOMMUNotify fn,
IOMMUNotifierFlag flags,
diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c
index 89ae862..2f2fb43 100644
--- a/util/mmap-alloc.c
+++ b/util/mmap-alloc.c
@@ -111,6 +111,11 @@ void *qemu_ram_mmap(int fd, size_t size, size_t align, uint32_t flags)
 assert(is_power_of_2(align));
 /* Always align to host page size */
 assert(align >= getpagesize());
+if ((flags & RAM_SYNC_ON_OFF_AUTO_ON) &&
+(!shared || !MAP_SYNC_FLAGS || !is_pmem)) {
+munmap(ptr, total);
+return MAP_FAILED;
+}
 if ((flags & RAM_SYNC) && shared && is_pmem) {
 mmap_xflags |= MAP_SYNC_FLAGS;
 }
@@ -123,8 +128,12 @@ void *qemu_ram_mmap(int fd, size_t size, size_t align, uint32_t flags)
 (shared ? MAP_SHARED : MAP_PRIVATE) | mmap_xflags,
 fd, 0);
 if ((ptr1 == MAP_FAILED) && (mmap_xflags & MAP_SYNC_FLAGS)) {
-mmap_xflags &= ~MAP_SYNC_FLAGS;
-goto retry_mmap_fd;
+if (flags & RAM_SYNC_ON_OFF_AUTO_AUTO) {
+mmap_xflags &= ~MAP_SYNC_FLAGS;
+goto retry_mmap_fd;
+}
+munmap(ptr, total);
+return MAP_FAILED;
 }
 
 if (offset > 0) {
-- 
2.7.4




[Qemu-devel] [PATCH V7 3/6] util/mmap-alloc: support MAP_SYNC in qemu_ram_mmap()

2018-12-18 Thread Zhang Yi
When a file supporting DAX is used as the vNVDIMM backend, additionally
mmap'ing it with the MAP_SYNC flag can guarantee the persistence of guest
writes to the backend file without other QEMU actions (e.g., periodic
fsync() by QEMU).

A RAM_SYNC flag is added to qemu_ram_mmap() for this purpose; it takes
effect only together with RAM_SHARED and RAM_PMEM.
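
For readers unfamiliar with the flag, the following is a minimal
standalone sketch of the "try MAP_SYNC, then fall back" pattern, assuming
a Linux host. map_shared_prefer_sync() is a hypothetical helper, not the
patched qemu_ram_mmap(), and the fallback constants are the values from
the generic Linux UAPI headers for toolchains that do not define them.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/* Fallback values (generic Linux UAPI) for older toolchain headers. */
#ifndef MAP_SHARED_VALIDATE
#define MAP_SHARED_VALIDATE 0x03
#endif
#ifndef MAP_SYNC
#define MAP_SYNC 0x80000
#endif

/*
 * Hypothetical helper: map fd shared, preferring MAP_SYNC.
 * MAP_SYNC is only honored together with MAP_SHARED_VALIDATE; if the
 * kernel or the file system rejects the combination, retry with a plain
 * MAP_SHARED mapping. *used_sync reports which variant succeeded.
 */
void *map_shared_prefer_sync(int fd, size_t size, int *used_sync)
{
    void *p;

    assert(used_sync != NULL);
    p = mmap(NULL, size, PROT_READ | PROT_WRITE,
             MAP_SHARED_VALIDATE | MAP_SYNC, fd, 0);
    *used_sync = (p != MAP_FAILED);
    if (p == MAP_FAILED) {
        /* No DAX or no kernel support: fall back without MAP_SYNC. */
        p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    }
    return p;
}
```

On a file that does not support DAX the first mmap() is expected to fail,
so the helper typically returns a plain shared mapping with *used_sync
set to 0.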

Signed-off-by: Haozhong Zhang 
Signed-off-by: Zhang Yi 
---
 exec.c|  2 +-
 include/exec/memory.h |  3 +++
 include/exec/ram_addr.h   |  1 +
 include/qemu/mmap-alloc.h |  1 +
 include/qemu/osdep.h  | 29 +
 util/mmap-alloc.c | 14 ++
 6 files changed, 45 insertions(+), 5 deletions(-)

diff --git a/exec.c b/exec.c
index e92a7da..dc4d180 100644
--- a/exec.c
+++ b/exec.c
@@ -2241,7 +2241,7 @@ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr,
 int64_t file_size;
 
 /* Just support these ram flags by now. */
-assert((ram_flags & ~(RAM_SHARED | RAM_PMEM)) == 0);
+assert((ram_flags & ~(RAM_SHARED | RAM_PMEM | RAM_SYNC)) == 0);
 
 if (xen_enabled()) {
 error_setg(errp, "-mem-path not supported with Xen");
diff --git a/include/exec/memory.h b/include/exec/memory.h
index 667466b..33a4e2c 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -126,6 +126,9 @@ typedef struct IOMMUNotifier IOMMUNotifier;
 /* RAM is a persistent kind memory */
 #define RAM_PMEM (1 << 5)
 
+/* RAM can be mmap by a MAP_SYNC flag */
+#define RAM_SYNC (1 << 6)
+
 static inline void iommu_notifier_init(IOMMUNotifier *n, IOMMUNotify fn,
IOMMUNotifierFlag flags,
hwaddr start, hwaddr end,
diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h
index 9ecd911..d239ce7 100644
--- a/include/exec/ram_addr.h
+++ b/include/exec/ram_addr.h
@@ -87,6 +87,7 @@ long qemu_getrampagesize(void);
  *  or bit-or of following values
  *  - RAM_SHARED: mmap the backing file or device with MAP_SHARED
  *  - RAM_PMEM: the backend @mem_path or @fd is persistent memory
+ *  - RAM_SYNC:   mmap with MAP_SYNC flag
  *  Other bits are ignored.
  *  @mem_path or @fd: specify the backing file or device
  *  @errp: pointer to Error*, to store an error if it happens
diff --git a/include/qemu/mmap-alloc.h b/include/qemu/mmap-alloc.h
index 6fe6ed4..1755a8b 100644
--- a/include/qemu/mmap-alloc.h
+++ b/include/qemu/mmap-alloc.h
@@ -18,6 +18,7 @@ size_t qemu_mempath_getpagesize(const char *mem_path);
  *  @flags: specifies additional properties of the mapping, which can be one or
  *  bit-or of following values
  *  - RAM_SHARED: mmap with MAP_SHARED flag
+ *  - RAM_SYNC:   mmap with MAP_SYNC flag
  *  Other bits are ignored.
  *
  * Return:
diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
index 3bf48bc..f94ea68 100644
--- a/include/qemu/osdep.h
+++ b/include/qemu/osdep.h
@@ -410,6 +410,35 @@ void qemu_anon_ram_free(void *ptr, size_t size);
 #  define QEMU_VMALLOC_ALIGN getpagesize()
 #endif
 
+/*
+ * MAP_SHARED_VALIDATE and MAP_SYNC are introduced in Linux kernel
+ * 4.15, so they may not be defined when compiling on older kernels.
+ */
+#ifdef CONFIG_LINUX
+
+#include 
+
+#ifndef MAP_SHARED_VALIDATE
+#define MAP_SHARED_VALIDATE   0x3
+#endif
+
+#ifndef MAP_SYNC
+#define MAP_SYNC  0x80000
+#endif
+
+/* MAP_SYNC is only available with MAP_SHARED_VALIDATE. */
+#define MAP_SYNC_FLAGS (MAP_SYNC | MAP_SHARED_VALIDATE)
+
+#else  /* !CONFIG_LINUX */
+
+#define MAP_SHARED_VALIDATE   0x0
+#define MAP_SYNC  0x0
+
+#define QEMU_HAS_MAP_SYNC false
+#define MAP_SYNC_FLAGS 0
+
+#endif /* CONFIG_LINUX */
+
 #ifdef CONFIG_POSIX
 struct qemu_signalfd_siginfo {
 uint32_t ssi_signo;   /* Signal number */
diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c
index 8f0a740..89ae862 100644
--- a/util/mmap-alloc.c
+++ b/util/mmap-alloc.c
@@ -99,6 +99,8 @@ void *qemu_ram_mmap(int fd, size_t size, size_t align, uint32_t flags)
 void *ptr = mmap(0, total, PROT_NONE, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
 #endif
 bool shared = flags & RAM_SHARED;
+bool is_pmem = flags & RAM_PMEM;
+int mmap_xflags = 0;
 size_t offset;
 void *ptr1;
 
@@ -109,16 +111,20 @@ void *qemu_ram_mmap(int fd, size_t size, size_t align, uint32_t flags)
 assert(is_power_of_2(align));
 /* Always align to host page size */
 assert(align >= getpagesize());
+if ((flags & RAM_SYNC) && shared && is_pmem) {
+mmap_xflags |= MAP_SYNC_FLAGS;
+}
 
 offset = QEMU_ALIGN_UP((uintptr_t)ptr, align) - (uintptr_t)ptr;
+ retry_mmap_fd:
 ptr1 = mmap(ptr + offset, size, PROT_READ | PROT_WRITE,
 MAP_FIXED |
 (fd == -1 ? MAP_ANONYMOUS : 0) |
-(shared ? MAP_SHARED : MAP_PRIVATE),
+(shared ? MAP_SHARED : MAP_PRIVATE) | mmap_xflags,
 fd, 

[Qemu-devel] [PATCH V7 2/6] util/mmap-alloc: switch qemu_ram_mmap() to 'flags' parameter

2018-12-18 Thread Zhang Yi
As more flag parameters besides the existing 'shared' are going to be
added to qemu_ram_mmap(), let's switch 'shared' to a 'flags' parameter
in advance, so as to ease the further additions.
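
The motivation can be sketched as follows: a single bitmask parameter
lets later patches add properties without changing the function
signature again. The EX_RAM_* bits and ex_mmap_flags() are hypothetical
names for this example; QEMU's real RAM_* values are defined in
include/exec/memory.h.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <sys/mman.h>

/* Hypothetical flag bits for this sketch, not QEMU's RAM_* values. */
#define EX_RAM_SHARED (1u << 0)
#define EX_RAM_PMEM   (1u << 1)

/*
 * Translate the caller's property bitmask into mmap(2) flags, in the
 * same way the patched function derives 'shared' from 'flags'.
 * EX_RAM_PMEM is accepted but does not affect the result yet; it shows
 * that a new bit needs no signature change.
 */
int ex_mmap_flags(int fd, uint32_t ram_flags)
{
    int shared = (ram_flags & EX_RAM_SHARED) != 0;

    /* Reject bits this sketch does not know about. */
    assert((ram_flags & ~(EX_RAM_SHARED | EX_RAM_PMEM)) == 0);

    return MAP_FIXED |
           (fd == -1 ? MAP_ANONYMOUS : 0) |
           (shared ? MAP_SHARED : MAP_PRIVATE);
}
```

A caller that previously passed a bool now passes EX_RAM_SHARED or 0, and
can OR in further bits as they are introduced.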

Signed-off-by: Haozhong Zhang 
Signed-off-by: Zhang Yi 
---
 exec.c|  7 ---
 include/qemu/mmap-alloc.h | 19 ++-
 util/mmap-alloc.c |  8 +---
 util/oslib-posix.c|  8 +++-
 4 files changed, 34 insertions(+), 8 deletions(-)

diff --git a/exec.c b/exec.c
index bb6170d..e92a7da 100644
--- a/exec.c
+++ b/exec.c
@@ -1810,6 +1810,7 @@ static void *file_ram_alloc(RAMBlock *block,
 ram_addr_t memory,
 int fd,
 bool truncate,
+uint32_t flags,
 Error **errp)
 {
 void *area;
@@ -1859,8 +1860,7 @@ static void *file_ram_alloc(RAMBlock *block,
 perror("ftruncate");
 }
 
-area = qemu_ram_mmap(fd, memory, block->mr->align,
- block->flags & RAM_SHARED);
+area = qemu_ram_mmap(fd, memory, block->mr->align, flags);
 if (area == MAP_FAILED) {
 error_setg_errno(errp, errno,
  "unable to map backing store for guest RAM");
@@ -2279,7 +2279,8 @@ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr,
 new_block->used_length = size;
 new_block->max_length = size;
 new_block->flags = ram_flags;
-new_block->host = file_ram_alloc(new_block, size, fd, !file_size, errp);
+new_block->host = file_ram_alloc(new_block, size, fd, !file_size,
+ram_flags, errp);
 if (!new_block->host) {
 g_free(new_block);
 return NULL;
diff --git a/include/qemu/mmap-alloc.h b/include/qemu/mmap-alloc.h
index 50385e3..6fe6ed4 100644
--- a/include/qemu/mmap-alloc.h
+++ b/include/qemu/mmap-alloc.h
@@ -7,7 +7,24 @@ size_t qemu_fd_getpagesize(int fd);
 
 size_t qemu_mempath_getpagesize(const char *mem_path);
 
-void *qemu_ram_mmap(int fd, size_t size, size_t align, bool shared);
+/**
+ * qemu_ram_mmap: mmap the specified file or device.
+ *
+ * Parameters:
+ *  @fd: the file or the device to mmap
+ *  @size: the number of bytes to be mmaped
+ *  @align: if not zero, specify the alignment of the starting mapping address;
+ *  otherwise, the alignment in use will be determined by QEMU.
+ *  @flags: specifies additional properties of the mapping, which can be one or
+ *  bit-or of following values
+ *  - RAM_SHARED: mmap with MAP_SHARED flag
+ *  Other bits are ignored.
+ *
+ * Return:
+ *  On success, return a pointer to the mapped area.
+ *  On failure, return MAP_FAILED.
+ */
+void *qemu_ram_mmap(int fd, size_t size, size_t align, uint32_t flags);
 
 void qemu_ram_munmap(void *ptr, size_t size);
 
diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c
index fd329ec..8f0a740 100644
--- a/util/mmap-alloc.c
+++ b/util/mmap-alloc.c
@@ -13,6 +13,7 @@
 #include "qemu/osdep.h"
 #include "qemu/mmap-alloc.h"
 #include "qemu/host-utils.h"
+#include "exec/memory.h"
 
 #define HUGETLBFS_MAGIC   0x958458f6
 
@@ -75,7 +76,7 @@ size_t qemu_mempath_getpagesize(const char *mem_path)
 return getpagesize();
 }
 
-void *qemu_ram_mmap(int fd, size_t size, size_t align, bool shared)
+void *qemu_ram_mmap(int fd, size_t size, size_t align, uint32_t flags)
 {
 /*
  * Note: this always allocates at least one extra page of virtual address
@@ -92,11 +93,12 @@ void *qemu_ram_mmap(int fd, size_t size, size_t align, bool shared)
  * anonymous memory is OK.
  */
 int anonfd = fd == -1 || qemu_fd_getpagesize(fd) == getpagesize() ? -1 : fd;
-int flags = anonfd == -1 ? MAP_ANONYMOUS : MAP_NORESERVE;
-void *ptr = mmap(0, total, PROT_NONE, flags | MAP_PRIVATE, anonfd, 0);
+int mmap_flags = anonfd == -1 ? MAP_ANONYMOUS : MAP_NORESERVE;
+void *ptr = mmap(0, total, PROT_NONE, mmap_flags | MAP_PRIVATE, anonfd, 0);
 #else
 void *ptr = mmap(0, total, PROT_NONE, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
 #endif
+bool shared = flags & RAM_SHARED;
 size_t offset;
 void *ptr1;
 
diff --git a/util/oslib-posix.c b/util/oslib-posix.c
index fbd0dc8..121c31f 100644
--- a/util/oslib-posix.c
+++ b/util/oslib-posix.c
@@ -203,7 +203,13 @@ void *qemu_memalign(size_t alignment, size_t size)
 void *qemu_anon_ram_alloc(size_t size, uint64_t *alignment, bool shared)
 {
 size_t align = QEMU_VMALLOC_ALIGN;
-void *ptr = qemu_ram_mmap(-1, size, align, shared);
+uint32_t flags = 0;
+void *ptr;
+
+if (shared) {
+flags = MAP_SHARED;
+}
+ptr = qemu_ram_mmap(-1, size, align, flags);
 
 if (ptr == MAP_FAILED) {
 return NULL;
-- 
2.7.4




[Qemu-devel] [PATCH V7 0/6] nvdimm: support MAP_SYNC for memory-backend-file

2018-12-18 Thread Zhang Yi
Linux 4.15 introduces a new mmap flag MAP_SYNC, which can be used to
guarantee the write persistence to mmap'ed files supporting DAX (e.g.,
files on ext4/xfs file system mounted with '-o dax').

A description of MAP_SYNC and MAP_SHARED_VALIDATE can be found at
https://patchwork.kernel.org/patch/10028151/

To keep the file metadata in sync after a fault while the guest is
writing to a shared, DAX-capable backend file, this patch set enables
QEMU to use the MAP_SYNC flag for memory-backend-file.

Now that the DAX vs. DMA truncate issue has been solved, we refined the
code and have been sending this feature out again since v5.

A new on/off/auto option 'sync' is added to memory-backend-file:
 - on:  try to pass MAP_SYNC to mmap(2); if MAP_SYNC is not supported, or
        if 'share=off' or 'pmem=off', QEMU will abort
 - off: never pass MAP_SYNC to mmap(2)
 - auto (default): if MAP_SYNC is supported and both 'share=on' and
        'pmem=on' are set, work as if 'sync=on'; otherwise, work as if
        'sync=off'

Changes in v7:
 * Michael: [3,4,6]/6, limit the "sync" flag to nvdimm backends (pmem=on).

Changes in v6:
 * Pankaj: 3/7 is squashed into 2/7.
 * Pankaj: 7/7, update comments to "consistent filesystem metadata".
 * Pankaj, Igor: 1/7, added Reviewed-by.
 * Stefan: 4/7, move the included header from "/linux/mman.h" to "osdep.h".
 * Stefan: 5/7, add the missing "munmap".
 * Stefan: 2/7, refine the shared/flag.

Changes in v5:
 * Add patch 1 to fix a memory leak issue.
 * Refine the patch 4-6
 * Remove the patch 3 as we already change the parameter from "shared" to
   "flags"

Changes in v4:
 * Add patch 1-3 to switch some functions to a single 'flags'
   parameters. (Michael S. Tsirkin)
 * v3 patch 1-3 become v4 patch 4-6.
 * Patch 4: move definitions of MAP_SYNC and MAP_SHARED_VALIDATE to a
   new header file under include/standard-headers/linux/. (Michael S. Tsirkin)
 * Patch 6: refine the description of the 'sync' option. (Michael S. Tsirkin)

Changes in v3:
 * Patch 1: add MAP_SHARED_VALIDATE in both sync=on and sync=auto
   cases, and add back the retry mechanism. MAP_SYNC will be ignored
   by Linux kernel 4.15 if MAP_SHARED_VALIDATE is missing.
 * Patch 1: define MAP_SYNC and MAP_SHARED_VALIDATE as 0 on non-Linux
   platforms in order to make qemu_ram_mmap() compile on those platforms.
 * Patch 2&3: include more information in error messages of
   memory-backend in the hope of helping users identify the error.
   (Dr. David Alan Gilbert)
 * Patch 3: fix typo in the commit message. (Dr. David Alan Gilbert)

Changes in v2:
 * Add 'sync' option to control the use of MAP_SYNC. (Eduardo Habkost)
 * Remove the unnecessary set of MAP_SHARED_VALIDATE in some cases and
   the retry mechanism in qemu_ram_mmap(). (Michael S. Tsirkin)
 * Move OS dependent definitions of MAP_SYNC and MAP_SHARED_VALIDATE
   to osdep.h. (Michael S. Tsirkin)

Zhang Yi (6):
  numa: Fixed the memory leak of numa error message
  util/mmap-alloc: switch qemu_ram_mmap() to 'flags' parameter
  util/mmap-alloc: support MAP_SYNC in qemu_ram_mmap()
  util/mmap-alloc: Switch the RAM_SYNC flags to OnOffAuto
  hostmem: add more information in error messages
  hostmem-file: add 'sync' option

 backends/hostmem-file.c   | 45 +++--
 backends/hostmem.c|  8 +---
 docs/nvdimm.txt   | 22 +-
 exec.c|  9 +
 include/exec/memory.h | 18 ++
 include/exec/ram_addr.h   |  1 +
 include/qemu/mmap-alloc.h | 20 +++-
 include/qemu/osdep.h  | 29 +
 numa.c|  1 +
 qemu-options.hx   | 22 +-
 util/mmap-alloc.c | 27 ++-
 util/oslib-posix.c|  8 +++-
 12 files changed, 192 insertions(+), 18 deletions(-)

-- 
2.7.4




[Qemu-devel] [PATCH V6 6/6] hostmem-file: add 'sync' option

2018-12-12 Thread Zhang Yi
This option controls whether QEMU mmap(2)s the memory backend file with
the MAP_SYNC flag, which keeps the filesystem metadata consistent for each
guest write, provided that MAP_SYNC is supported by the host kernel (Linux
kernel 4.15 and later) and the backend is a file supporting DAX (e.g., a
file on an ext4/xfs file system mounted with '-o dax').

It can take one of the following values:
 - on:  try to pass MAP_SYNC to mmap(2); if MAP_SYNC is not supported or
        'share=off', QEMU will abort
 - off: never pass MAP_SYNC to mmap(2)
 - auto (default): if MAP_SYNC is supported and 'share=on', work as if
        'sync=on'; otherwise, work as if 'sync=off'

Signed-off-by: Haozhong Zhang 
Signed-off-by: Zhang Yi 
---
 backends/hostmem-file.c | 39 +++
 docs/nvdimm.txt | 20 +++-
 include/exec/memory.h   |  8 
 qemu-options.hx | 22 +-
 4 files changed, 87 insertions(+), 2 deletions(-)

diff --git a/backends/hostmem-file.c b/backends/hostmem-file.c
index 0dd7a90..73cf181 100644
--- a/backends/hostmem-file.c
+++ b/backends/hostmem-file.c
@@ -16,6 +16,7 @@
 #include "sysemu/hostmem.h"
 #include "sysemu/sysemu.h"
 #include "qom/object_interfaces.h"
+#include "qapi/qapi-visit.h"
 
 /* hostmem-file.c */
 /**
@@ -36,6 +37,7 @@ struct HostMemoryBackendFile {
 uint64_t align;
 bool discard_data;
 bool is_pmem;
+OnOffAuto sync;
 };
 
 static void
@@ -62,6 +64,7 @@ file_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
  path,
  backend->size, fb->align,
  (backend->share ? RAM_SHARED : 0) |
+ qemu_ram_sync_flags(fb->sync) |
  (fb->is_pmem ? RAM_PMEM : 0),
  fb->mem_path, errp);
 g_free(path);
@@ -136,6 +139,39 @@ static void file_memory_backend_set_align(Object *o, Visitor *v,
 error_propagate(errp, local_err);
 }
 
+static void file_memory_backend_get_sync(
+Object *obj, Visitor *v, const char *name, void *opaque, Error **errp)
+{
+HostMemoryBackendFile *fb = MEMORY_BACKEND_FILE(obj);
+OnOffAuto value = fb->sync;
+
+visit_type_OnOffAuto(v, name, &value, errp);
+}
+
+static void file_memory_backend_set_sync(
+Object *obj, Visitor *v, const char *name, void *opaque, Error **errp)
+{
+HostMemoryBackend *backend = MEMORY_BACKEND(obj);
+HostMemoryBackendFile *fb = MEMORY_BACKEND_FILE(obj);
+Error *local_err = NULL;
+OnOffAuto value;
+
+if (host_memory_backend_mr_inited(backend)) {
+error_setg(&local_err, "cannot change property '%s' of %s",
+   name, object_get_typename(obj));
+goto out;
+}
+
+visit_type_OnOffAuto(v, name, &value, &local_err);
+if (local_err) {
+goto out;
+}
+fb->sync = value;
+
+ out:
+error_propagate(errp, local_err);
+}
+
 static bool file_memory_backend_get_pmem(Object *o, Error **errp)
 {
 return MEMORY_BACKEND_FILE(o)->is_pmem;
@@ -203,6 +239,9 @@ file_backend_class_init(ObjectClass *oc, void *data)
 object_class_property_add_bool(oc, "pmem",
 file_memory_backend_get_pmem, file_memory_backend_set_pmem,
 &error_abort);
+object_class_property_add(oc, "sync", "OnOffAuto",
+file_memory_backend_get_sync, file_memory_backend_set_sync,
+NULL, NULL, &error_abort);
 }
 
 static void file_backend_instance_finalize(Object *o)
diff --git a/docs/nvdimm.txt b/docs/nvdimm.txt
index 5f158a6..3d89174 100644
--- a/docs/nvdimm.txt
+++ b/docs/nvdimm.txt
@@ -142,11 +142,29 @@ backend of vNVDIMM:
 Guest Data Persistence
 --
 
+vNVDIMM is designed and implemented to guarantee guest data
+persistence on the backends even across host crashes and power
+failures. However, there are still some requirements and limitations
+as explained below.
+
 Though QEMU supports multiple types of vNVDIMM backends on Linux,
-currently the only one that can guarantee the guest write persistence
+if MAP_SYNC is not supported by the host kernel and the backends,
+the only backend that can guarantee the guest write persistence
 is the device DAX on the real NVDIMM device (e.g., /dev/dax0.0), to
 which all guest access do not involve any host-side kernel cache.
 
+The mmap(2) flag MAP_SYNC has been available since Linux kernel 4.15.
+On such systems, QEMU can mmap(2) the backend with MAP_SYNC, which
+guarantees the persistence of guest writes to vNVDIMM. Besides host
+kernel support, enabling MAP_SYNC in QEMU also requires that:
+
+ - the backend is a file supporting DAX, e.g., a file on an ext4 or
+   xfs file system mounted with '-o dax',
+
+ - 'sync' option of memory-backend-file is not 'off', and
+
+ - 'share' option of memory-backend-file is 'on'.
+
 When using other types of backends, it's suggested to set 'un

[Qemu-devel] [PATCH V6 5/6] hostmem: add more information in error messages

2018-12-12 Thread Zhang Yi
When there are multiple memory backends in use, including the object type
name, ID and the property name in the error message can help users to
locate the error.
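
As a rough illustration of the message shape this patch moves to, the sketch below formats a message that names both the property and the object type. The helper name format_prop_error() and its signature are hypothetical; QEMU itself reports these errors through error_setg(), not through a helper like this.

```c
#include <stdio.h>

/*
 * Hypothetical helper (not part of QEMU): build an error string that
 * names the property and the object type, so a user with several
 * memory backends can tell which one rejected the change.
 */
static int format_prop_error(char *buf, size_t len,
                             const char *prop, const char *type_name)
{
    return snprintf(buf, len, "cannot change property '%s' of %s",
                    prop, type_name);
}
```

Called with "sync" and "memory-backend-file", the helper produces a message that identifies the offending option directly instead of the former generic "cannot change property value".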

Signed-off-by: Haozhong Zhang 
Signed-off-by: Zhang Yi 
---
 backends/hostmem-file.c | 6 --
 backends/hostmem.c  | 8 +---
 2 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/backends/hostmem-file.c b/backends/hostmem-file.c
index e640749..0dd7a90 100644
--- a/backends/hostmem-file.c
+++ b/backends/hostmem-file.c
@@ -82,7 +82,8 @@ static void set_mem_path(Object *o, const char *str, Error **errp)
 HostMemoryBackendFile *fb = MEMORY_BACKEND_FILE(o);
 
 if (host_memory_backend_mr_inited(backend)) {
-error_setg(errp, "cannot change property value");
+error_setg(errp, "cannot change property 'mem-path' of %s",
+   object_get_typename(o));
 return;
 }
 g_free(fb->mem_path);
@@ -120,7 +121,8 @@ static void file_memory_backend_set_align(Object *o, Visitor *v,
 uint64_t val;
 
 if (host_memory_backend_mr_inited(backend)) {
-error_setg(&local_err, "cannot change property value");
+error_setg(&local_err, "cannot change property '%s' of %s",
+   name, object_get_typename(o));
 goto out;
 }
 
diff --git a/backends/hostmem.c b/backends/hostmem.c
index 1a89342..e2bcf9f 100644
--- a/backends/hostmem.c
+++ b/backends/hostmem.c
@@ -47,7 +47,8 @@ host_memory_backend_set_size(Object *obj, Visitor *v, const char *name,
 uint64_t value;
 
 if (host_memory_backend_mr_inited(backend)) {
-error_setg(&local_err, "cannot change property value");
+error_setg(&local_err, "cannot change property '%s' of %s",
+   name, object_get_typename(obj));
 goto out;
 }
 
@@ -56,8 +57,9 @@ host_memory_backend_set_size(Object *obj, Visitor *v, const char *name,
 goto out;
 }
 if (!value) {
-error_setg(&local_err, "Property '%s.%s' doesn't take value '%"
-   PRIu64 "'", object_get_typename(obj), name, value);
+error_setg(&local_err,
+   "property '%s' of %s doesn't take value '%" PRIu64 "'",
+   name, object_get_typename(obj), value);
 goto out;
 }
 backend->size = value;
-- 
2.7.4




[Qemu-devel] [PATCH V6 4/6] util/mmap-alloc: Switch the RAM_SYNC flags to OnOffAuto

2018-12-12 Thread Zhang Yi
Signed-off-by: Zhang Yi 

A set of RAM_SYNC_ON_OFF_AUTO_{AUTO,ON,OFF} flags is added to
qemu_ram_mmap():

- If RAM_SYNC_ON_OFF_AUTO_ON is present, qemu_ram_mmap() will try to pass
  MAP_SYNC to mmap(). It will then fail if the host OS or the backend
  file does not support MAP_SYNC, or if MAP_SYNC conflicts with other
  flags.

- If RAM_SYNC_ON_OFF_AUTO_OFF is present, qemu_ram_mmap() will never pass
  MAP_SYNC to mmap().

- If RAM_SYNC_ON_OFF_AUTO_AUTO is present, and
  * if the host OS and the backend file support MAP_SYNC, and MAP_SYNC
    does not conflict with other flags, qemu_ram_mmap() will work as if
    RAM_SYNC_ON_OFF_AUTO_ON is present;
  * otherwise, qemu_ram_mmap() will work as if RAM_SYNC_ON_OFF_AUTO_OFF is
    present.
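
The three cases above can be summarized as a small decision function. This is only a sketch of the described semantics: the enum names and resolve_sync() are hypothetical and are not the definitions used in the patch, which encodes the mode in RAM flag bits and discovers MAP_SYNC support by attempting the mmap().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for illustration; not QEMU's actual definitions. */
typedef enum { SYNC_OFF, SYNC_ON, SYNC_AUTO } SyncMode;
typedef enum { USE_PLAIN, USE_MAP_SYNC, USE_FAIL } SyncDecision;

/*
 * Decide how to map the backend, given the requested mode, whether the
 * mapping is shared, and whether the host OS and backend file accept
 * MAP_SYNC.
 */
static SyncDecision resolve_sync(SyncMode mode, bool shared, bool map_sync_ok)
{
    bool usable = shared && map_sync_ok;

    switch (mode) {
    case SYNC_OFF:
        return USE_PLAIN;                         /* never pass MAP_SYNC */
    case SYNC_ON:
        return usable ? USE_MAP_SYNC : USE_FAIL;  /* hard requirement */
    case SYNC_AUTO:
        return usable ? USE_MAP_SYNC : USE_PLAIN; /* best effort */
    }
    assert(0 && "invalid SyncMode");
    return USE_FAIL;
}
```

The only difference between 'on' and 'auto' is what happens when MAP_SYNC is unusable: 'on' fails the mapping, while 'auto' quietly falls back to a mapping without MAP_SYNC.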
---
 include/exec/memory.h |  9 -
 util/mmap-alloc.c | 13 +++--
 2 files changed, 19 insertions(+), 3 deletions(-)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index 33a4e2c..c74c467 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -127,7 +127,14 @@ typedef struct IOMMUNotifier IOMMUNotifier;
 #define RAM_PMEM (1 << 5)
 
 /* RAM can be mmap by a MAP_SYNC flag */
-#define RAM_SYNC (1 << 6)
+#define RAM_SYNC_SHIFT  6
+#define RAM_SYNC_SHIFT_AUTO  7
+
+#define RAM_SYNC_ON_OFF_AUTO_ON   (1UL << RAM_SYNC_SHIFT)
+#define RAM_SYNC_ON_OFF_AUTO_OFF  (0UL << RAM_SYNC_SHIFT)
+#define RAM_SYNC_ON_OFF_AUTO_AUTO (1UL << RAM_SYNC_SHIFT_AUTO)
+
+#define RAM_SYNC (RAM_SYNC_ON_OFF_AUTO_ON | RAM_SYNC_ON_OFF_AUTO_AUTO)
 
 static inline void iommu_notifier_init(IOMMUNotifier *n, IOMMUNotify fn,
IOMMUNotifierFlag flags,
diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c
index 025ab6a..bda6024 100644
--- a/util/mmap-alloc.c
+++ b/util/mmap-alloc.c
@@ -110,6 +110,11 @@ void *qemu_ram_mmap(int fd, size_t size, size_t align, uint32_t flags)
 assert(is_power_of_2(align));
 /* Always align to host page size */
 assert(align >= getpagesize());
+if ((flags & RAM_SYNC_ON_OFF_AUTO_ON) &&
+(!shared || !MAP_SYNC_FLAGS)) {
+munmap(ptr, total);
+return MAP_FAILED;
+}
 if ((flags & RAM_SYNC) && shared) {
 mmap_xflags |= MAP_SYNC_FLAGS;
 }
@@ -122,8 +127,12 @@ void *qemu_ram_mmap(int fd, size_t size, size_t align, uint32_t flags)
 (shared ? MAP_SHARED : MAP_PRIVATE) | mmap_xflags,
 fd, 0);
 if ((ptr1 == MAP_FAILED) && (mmap_xflags & MAP_SYNC_FLAGS)) {
-mmap_xflags &= ~MAP_SYNC_FLAGS;
-goto retry_mmap_fd;
+if (flags & RAM_SYNC_ON_OFF_AUTO_AUTO) {
+mmap_xflags &= ~MAP_SYNC_FLAGS;
+goto retry_mmap_fd;
+}
+munmap(ptr, total);
+return MAP_FAILED;
 }
 
 if (offset > 0) {
-- 
2.7.4




[Qemu-devel] [PATCH V6 2/6] util/mmap-alloc: switch qemu_ram_mmap() to 'flags' parameter

2018-12-12 Thread Zhang Yi
As more flag parameters besides the existing 'shared' are going to be
added to qemu_ram_mmap(), let's switch 'shared' to a 'flags' parameter
in advance, so as to ease the further additions.
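
One pitfall of such a conversion is that a bool cannot be passed through as the flags word: 'true' is the value 1, which is not necessarily the bit that the callee tests. The sketch below illustrates this with hypothetical DEMO_* constants whose values are assumptions chosen for the example, not QEMU's definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative flag bits; the values are assumptions, not QEMU's. */
#define DEMO_RAM_SHARED (1u << 1)
#define DEMO_RAM_PMEM   (1u << 5)

/* Callee side: recover the old boolean from the flags word. */
static bool demo_is_shared(uint32_t flags)
{
    return (flags & DEMO_RAM_SHARED) != 0;
}

/* Caller side: translate the old boolean into the matching flag bit. */
static uint32_t demo_flags_from_bool(bool shared)
{
    return shared ? DEMO_RAM_SHARED : 0;
}
```

With these definitions, demo_is_shared(demo_flags_from_bool(true)) holds, whereas demo_is_shared((uint32_t)true) does not, because bit 0 is not the shared bit. Callers that used to pass a bool therefore need an explicit translation step.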

Signed-off-by: Haozhong Zhang 
Signed-off-by: Zhang Yi 
---
 exec.c|  7 ---
 include/qemu/mmap-alloc.h | 19 ++-
 util/mmap-alloc.c |  8 +---
 util/oslib-posix.c|  8 +++-
 4 files changed, 34 insertions(+), 8 deletions(-)

diff --git a/exec.c b/exec.c
index bb6170d..e92a7da 100644
--- a/exec.c
+++ b/exec.c
@@ -1810,6 +1810,7 @@ static void *file_ram_alloc(RAMBlock *block,
 ram_addr_t memory,
 int fd,
 bool truncate,
+uint32_t flags,
 Error **errp)
 {
 void *area;
@@ -1859,8 +1860,7 @@ static void *file_ram_alloc(RAMBlock *block,
 perror("ftruncate");
 }
 
-area = qemu_ram_mmap(fd, memory, block->mr->align,
- block->flags & RAM_SHARED);
+area = qemu_ram_mmap(fd, memory, block->mr->align, flags);
 if (area == MAP_FAILED) {
 error_setg_errno(errp, errno,
  "unable to map backing store for guest RAM");
@@ -2279,7 +2279,8 @@ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr,
 new_block->used_length = size;
 new_block->max_length = size;
 new_block->flags = ram_flags;
-new_block->host = file_ram_alloc(new_block, size, fd, !file_size, errp);
+new_block->host = file_ram_alloc(new_block, size, fd, !file_size,
+ram_flags, errp);
 if (!new_block->host) {
 g_free(new_block);
 return NULL;
diff --git a/include/qemu/mmap-alloc.h b/include/qemu/mmap-alloc.h
index 50385e3..6fe6ed4 100644
--- a/include/qemu/mmap-alloc.h
+++ b/include/qemu/mmap-alloc.h
@@ -7,7 +7,24 @@ size_t qemu_fd_getpagesize(int fd);
 
 size_t qemu_mempath_getpagesize(const char *mem_path);
 
-void *qemu_ram_mmap(int fd, size_t size, size_t align, bool shared);
+/**
+ * qemu_ram_mmap: mmap the specified file or device.
+ *
+ * Parameters:
+ *  @fd: the file or the device to mmap
+ *  @size: the number of bytes to be mmaped
+ *  @align: if not zero, specify the alignment of the starting mapping address;
+ *  otherwise, the alignment in use will be determined by QEMU.
+ *  @flags: specifies additional properties of the mapping, which can be one or
+ *  bit-or of following values
+ *  - RAM_SHARED: mmap with MAP_SHARED flag
+ *  Other bits are ignored.
+ *
+ * Return:
+ *  On success, return a pointer to the mapped area.
+ *  On failure, return MAP_FAILED.
+ */
+void *qemu_ram_mmap(int fd, size_t size, size_t align, uint32_t flags);
 
 void qemu_ram_munmap(void *ptr, size_t size);
 
diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c
index fd329ec..8f0a740 100644
--- a/util/mmap-alloc.c
+++ b/util/mmap-alloc.c
@@ -13,6 +13,7 @@
 #include "qemu/osdep.h"
 #include "qemu/mmap-alloc.h"
 #include "qemu/host-utils.h"
+#include "exec/memory.h"
 
 #define HUGETLBFS_MAGIC   0x958458f6
 
@@ -75,7 +76,7 @@ size_t qemu_mempath_getpagesize(const char *mem_path)
 return getpagesize();
 }
 
-void *qemu_ram_mmap(int fd, size_t size, size_t align, bool shared)
+void *qemu_ram_mmap(int fd, size_t size, size_t align, uint32_t flags)
 {
 /*
  * Note: this always allocates at least one extra page of virtual address
@@ -92,11 +93,12 @@ void *qemu_ram_mmap(int fd, size_t size, size_t align, bool shared)
  * anonymous memory is OK.
  */
 int anonfd = fd == -1 || qemu_fd_getpagesize(fd) == getpagesize() ? -1 : fd;
-int flags = anonfd == -1 ? MAP_ANONYMOUS : MAP_NORESERVE;
-void *ptr = mmap(0, total, PROT_NONE, flags | MAP_PRIVATE, anonfd, 0);
+int mmap_flags = anonfd == -1 ? MAP_ANONYMOUS : MAP_NORESERVE;
+void *ptr = mmap(0, total, PROT_NONE, mmap_flags | MAP_PRIVATE, anonfd, 0);
 #else
 void *ptr = mmap(0, total, PROT_NONE, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
 #endif
+bool shared = flags & RAM_SHARED;
 size_t offset;
 void *ptr1;
 
diff --git a/util/oslib-posix.c b/util/oslib-posix.c
index fbd0dc8..121c31f 100644
--- a/util/oslib-posix.c
+++ b/util/oslib-posix.c
@@ -203,7 +203,13 @@ void *qemu_memalign(size_t alignment, size_t size)
 void *qemu_anon_ram_alloc(size_t size, uint64_t *alignment, bool shared)
 {
 size_t align = QEMU_VMALLOC_ALIGN;
-void *ptr = qemu_ram_mmap(-1, size, align, shared);
+uint32_t flags = 0;
+void *ptr;
+
+if (shared) {
+flags = RAM_SHARED;
+}
+ptr = qemu_ram_mmap(-1, size, align, flags);
 
 if (ptr == MAP_FAILED) {
 return NULL;
-- 
2.7.4




[Qemu-devel] [PATCH V6 3/6] util/mmap-alloc: support MAP_SYNC in qemu_ram_mmap()

2018-12-12 Thread Zhang Yi
When a file supporting DAX is used as the vNVDIMM backend, additionally
mapping it with the MAP_SYNC flag guarantees the persistence of guest
writes to the backend file without other QEMU actions (e.g., periodic
fsync() by QEMU).
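
The following is a minimal, stand-alone sketch of the "try MAP_SYNC, then fall back" pattern described here, written against plain mmap(2) rather than QEMU's qemu_ram_mmap(). The function name map_shared_try_sync() is hypothetical, and the fallback #defines are assumptions for builds against older libc headers; they use the Linux UAPI values.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/mman.h>

/* Fallback definitions for older libc headers (Linux UAPI values). */
#ifndef MAP_SHARED_VALIDATE
#define MAP_SHARED_VALIDATE 0x03
#endif
#ifndef MAP_SYNC
#define MAP_SYNC 0x80000
#endif

/*
 * Map 'size' bytes of 'fd' shared and writable. If want_sync is set,
 * first try MAP_SHARED_VALIDATE | MAP_SYNC; if that fails, fall back
 * to a plain MAP_SHARED mapping. *got_sync reports whether the
 * returned mapping was created with MAP_SYNC.
 */
static void *map_shared_try_sync(int fd, size_t size, bool want_sync,
                                 bool *got_sync)
{
    void *p;

    *got_sync = false;
    if (want_sync) {
        p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                 MAP_SHARED_VALIDATE | MAP_SYNC, fd, 0);
        if (p != MAP_FAILED) {
            *got_sync = true;
            return p;
        }
    }
    return mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
}
```

MAP_SYNC has to be combined with MAP_SHARED_VALIDATE so that the kernel rejects the request instead of silently ignoring the flag. On a file that does not support DAX the first mmap() is expected to fail, and the caller ends up with an ordinary shared mapping and *got_sync set to false.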

A RAM_SYNC flag is added to qemu_ram_mmap() to request this behavior.

Signed-off-by: Haozhong Zhang 
Signed-off-by: Zhang Yi 
---
 exec.c|  2 +-
 include/exec/memory.h |  3 +++
 include/exec/ram_addr.h   |  1 +
 include/qemu/mmap-alloc.h |  1 +
 include/qemu/osdep.h  | 29 +
 util/mmap-alloc.c | 13 +
 6 files changed, 44 insertions(+), 5 deletions(-)

diff --git a/exec.c b/exec.c
index e92a7da..dc4d180 100644
--- a/exec.c
+++ b/exec.c
@@ -2241,7 +2241,7 @@ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr,
 int64_t file_size;
 
 /* Just support these ram flags by now. */
-assert((ram_flags & ~(RAM_SHARED | RAM_PMEM)) == 0);
+assert((ram_flags & ~(RAM_SHARED | RAM_PMEM | RAM_SYNC)) == 0);
 
 if (xen_enabled()) {
 error_setg(errp, "-mem-path not supported with Xen");
diff --git a/include/exec/memory.h b/include/exec/memory.h
index 667466b..33a4e2c 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -126,6 +126,9 @@ typedef struct IOMMUNotifier IOMMUNotifier;
 /* RAM is a persistent kind memory */
 #define RAM_PMEM (1 << 5)
 
+/* RAM can be mmap by a MAP_SYNC flag */
+#define RAM_SYNC (1 << 6)
+
 static inline void iommu_notifier_init(IOMMUNotifier *n, IOMMUNotify fn,
IOMMUNotifierFlag flags,
hwaddr start, hwaddr end,
diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h
index 9ecd911..d239ce7 100644
--- a/include/exec/ram_addr.h
+++ b/include/exec/ram_addr.h
@@ -87,6 +87,7 @@ long qemu_getrampagesize(void);
  *  or bit-or of following values
  *  - RAM_SHARED: mmap the backing file or device with MAP_SHARED
  *  - RAM_PMEM: the backend @mem_path or @fd is persistent memory
+ *  - RAM_SYNC:   mmap with MAP_SYNC flag
  *  Other bits are ignored.
  *  @mem_path or @fd: specify the backing file or device
  *  @errp: pointer to Error*, to store an error if it happens
diff --git a/include/qemu/mmap-alloc.h b/include/qemu/mmap-alloc.h
index 6fe6ed4..1755a8b 100644
--- a/include/qemu/mmap-alloc.h
+++ b/include/qemu/mmap-alloc.h
@@ -18,6 +18,7 @@ size_t qemu_mempath_getpagesize(const char *mem_path);
  *  @flags: specifies additional properties of the mapping, which can be one or
  *  bit-or of following values
  *  - RAM_SHARED: mmap with MAP_SHARED flag
+ *  - RAM_SYNC:   mmap with MAP_SYNC flag
  *  Other bits are ignored.
  *
  * Return:
diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
index 3bf48bc..f94ea68 100644
--- a/include/qemu/osdep.h
+++ b/include/qemu/osdep.h
@@ -410,6 +410,35 @@ void qemu_anon_ram_free(void *ptr, size_t size);
 #  define QEMU_VMALLOC_ALIGN getpagesize()
 #endif
 
+/*
+ * MAP_SHARED_VALIDATE and MAP_SYNC are introduced in Linux kernel
+ * 4.15, so they may not be defined when compiling on older kernels.
+ */
+#ifdef CONFIG_LINUX
+
+#include <sys/mman.h>
+
+#ifndef MAP_SHARED_VALIDATE
+#define MAP_SHARED_VALIDATE   0x3
+#endif
+
+#ifndef MAP_SYNC
+#define MAP_SYNC  0x80000
+#endif
+
+/* MAP_SYNC is only available with MAP_SHARED_VALIDATE. */
+#define MAP_SYNC_FLAGS (MAP_SYNC | MAP_SHARED_VALIDATE)
+
+#else  /* !CONFIG_LINUX */
+
+#define MAP_SHARED_VALIDATE   0x0
+#define MAP_SYNC  0x0
+
+#define QEMU_HAS_MAP_SYNC false
+#define MAP_SYNC_FLAGS 0
+
+#endif /* CONFIG_LINUX */
+
 #ifdef CONFIG_POSIX
 struct qemu_signalfd_siginfo {
 uint32_t ssi_signo;   /* Signal number */
diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c
index 8f0a740..025ab6a 100644
--- a/util/mmap-alloc.c
+++ b/util/mmap-alloc.c
@@ -99,6 +99,7 @@ void *qemu_ram_mmap(int fd, size_t size, size_t align, uint32_t flags)
 void *ptr = mmap(0, total, PROT_NONE, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
 #endif
 bool shared = flags & RAM_SHARED;
+int mmap_xflags = 0;
 size_t offset;
 void *ptr1;
 
@@ -109,16 +110,20 @@ void *qemu_ram_mmap(int fd, size_t size, size_t align, uint32_t flags)
 assert(is_power_of_2(align));
 /* Always align to host page size */
 assert(align >= getpagesize());
+if ((flags & RAM_SYNC) && shared) {
+mmap_xflags |= MAP_SYNC_FLAGS;
+}
 
 offset = QEMU_ALIGN_UP((uintptr_t)ptr, align) - (uintptr_t)ptr;
+ retry_mmap_fd:
 ptr1 = mmap(ptr + offset, size, PROT_READ | PROT_WRITE,
 MAP_FIXED |
 (fd == -1 ? MAP_ANONYMOUS : 0) |
-(shared ? MAP_SHARED : MAP_PRIVATE),
+(shared ? MAP_SHARED : MAP_PRIVATE) | mmap_xflags,
 fd, 0);
-if (ptr1 == MAP_FAILED) {
-munmap(ptr, total);
-  

[Qemu-devel] [PATCH V6 1/6] numa: Fixed the memory leak of numa error message

2018-12-12 Thread Zhang Yi
object_get_canonical_path_component() returns a string which
must be freed using g_free().

Signed-off-by: Zhang Yi 
Reviewed-by: Pankaj gupta 
Reviewed-by: Igor Mammedov 
---
 numa.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/numa.c b/numa.c
index 50ec016..3875e1e 100644
--- a/numa.c
+++ b/numa.c
@@ -533,6 +533,7 @@ void memory_region_allocate_system_memory(MemoryRegion *mr, Object *owner,
 error_report("memory backend %s is used multiple times. Each "
  "-numa option must use a different memdev value.",
  path);
+g_free(path);
 exit(1);
 }
 
-- 
2.7.4




[Qemu-devel] [PATCH V6 0/6] nvdimm: support MAP_SYNC for memory-backend-file

2018-12-12 Thread Zhang Yi
Linux 4.15 introduces a new mmap flag MAP_SYNC, which can be used to
guarantee the write persistence to mmap'ed files supporting DAX (e.g.,
files on ext4/xfs file system mounted with '-o dax').

A description of MAP_SYNC and MAP_SHARED_VALIDATE can be found at
https://patchwork.kernel.org/patch/10028151/

To make sure that the file metadata is in sync after a fault while
writing to a shared, DAX-capable backend file, this patch set enables
QEMU to use the MAP_SYNC flag for memory-backend-file.

Now that the DAX vs. DMA truncate issue has been solved, we have refined
the code and sent this feature out again starting with v5.

A new on/off/auto option 'sync' is added to memory-backend-file:
 - on:  try to pass MAP_SYNC to mmap(2); if MAP_SYNC is not supported or
'share=off', QEMU will abort
 - off: never pass MAP_SYNC to mmap(2)
 - auto (default): if MAP_SYNC is supported and 'share=on', work as if
'sync=on'; otherwise, work as if 'sync=off'

Changes in v6:
 * Pankaj: 3/7 is squashed into 2/7.
 * Pankaj: 7/7 update comments to "consistent filesystem metadata".
 * Pankaj, Igor: 1/7 add Reviewed-by tags.
 * Stefan: 4/7 move the header contents from "linux/mman.h" to "osdep.h".
 * Stefan: 5/7 add the missing munmap().
 * Stefan: 2/7 refine the shared/flags handling.

Changes in v5:
 * Add patch 1 to fix a memory leak issue.
 * Refine the patch 4-6
 * Remove the patch 3 as we already change the parameter from "shared" to
   "flags"

Changes in v4:
 * Add patch 1-3 to switch some functions to a single 'flags'
   parameter. (Michael S. Tsirkin)
 * v3 patch 1-3 become v4 patch 4-6.
 * Patch 4: move definitions of MAP_SYNC and MAP_SHARED_VALIDATE to a
   new header file under include/standard-headers/linux/. (Michael S. Tsirkin)
 * Patch 6: refine the description of the 'sync' option. (Michael S. Tsirkin)

Changes in v3:
 * Patch 1: add MAP_SHARED_VALIDATE in both sync=on and sync=auto
   cases, and add back the retry mechanism. MAP_SYNC will be ignored
   by Linux kernel 4.15 if MAP_SHARED_VALIDATE is missing.
 * Patch 1: define MAP_SYNC and MAP_SHARED_VALIDATE as 0 on non-Linux
   platforms in order to make qemu_ram_mmap() compile on those platforms.
 * Patch 2&3: include more information in error messages of
   memory-backend in the hope of helping users identify the error.
   (Dr. David Alan Gilbert)
 * Patch 3: fix typo in the commit message. (Dr. David Alan Gilbert)

Changes in v2:
 * Add 'sync' option to control the use of MAP_SYNC. (Eduardo Habkost)
 * Remove the unnecessary set of MAP_SHARED_VALIDATE in some cases and
   the retry mechanism in qemu_ram_mmap(). (Michael S. Tsirkin)
 * Move OS dependent definitions of MAP_SYNC and MAP_SHARED_VALIDATE
   to osdep.h. (Michael S. Tsirkin)

Zhang Yi (6):
  numa: Fixed the memory leak of numa error message
  util/mmap-alloc: switch qemu_ram_mmap() to 'flags' parameter
  util/mmap-alloc: support MAP_SYNC in qemu_ram_mmap()
  util/mmap-alloc: Switch the RAM_SYNC flags to OnOffAuto
  hostmem: add more information in error messages
  hostmem-file: add 'sync' option

 backends/hostmem-file.c   | 45 +++--
 backends/hostmem.c|  8 +---
 docs/nvdimm.txt   | 20 +++-
 exec.c|  9 +
 include/exec/memory.h | 18 ++
 include/exec/ram_addr.h   |  1 +
 include/qemu/mmap-alloc.h | 20 +++-
 include/qemu/osdep.h  | 29 +
 numa.c|  1 +
 qemu-options.hx   | 22 +-
 util/mmap-alloc.c | 26 +-
 util/oslib-posix.c|  8 +++-
 12 files changed, 189 insertions(+), 18 deletions(-)

-- 
2.7.4




[Qemu-devel] [PATCH V5_resend 0/7] nvdimm: support MAP_SYNC for memory-backend-file

2018-11-19 Thread Zhang Yi
Linux 4.15 introduces a new mmap flag MAP_SYNC, which can be used to
guarantee the write persistence to mmap'ed files supporting DAX (e.g.,
files on ext4/xfs file system mounted with '-o dax').

A description of MAP_SYNC and MAP_SHARED_VALIDATE can be found at
https://patchwork.kernel.org/patch/10028151/

To make sure that the file metadata is in sync after a fault while
writing to a shared, DAX-capable backend file, this patch set enables
QEMU to use the MAP_SYNC flag for memory-backend-file.

Now that the DAX vs. DMA truncate issue has been solved, we have refined
the code and are sending this feature out again as v5.

A new on/off/auto option 'sync' is added to memory-backend-file:
 - on:  try to pass MAP_SYNC to mmap(2); if MAP_SYNC is not supported or
'share=off', QEMU will abort
 - off: never pass MAP_SYNC to mmap(2)
 - auto (default): if MAP_SYNC is supported and 'share=on', work as if
'sync=on'; otherwise, work as if 'sync=off'

Zhang Yi (7):
  numa: Fixed the memory leak of numa error message
  util/mmap-alloc: switch qemu_ram_mmap() to 'flags' parameter
  exec: switch qemu_ram_alloc_from_{file, fd} to the 'flags' parameter
  util/mmap-alloc: support MAP_SYNC in qemu_ram_mmap()
  util/mmap-alloc: Switch the RAM_SYNC flags to OnOffAuto
  hostmem: add more information in error messages
  hostmem-file: add 'sync' option

 backends/hostmem-file.c   | 45 +--
 backends/hostmem.c|  8 ---
 docs/nvdimm.txt   | 20 +++-
 exec.c|  9 +++
 include/exec/memory.h | 18 ++
 include/exec/ram_addr.h   |  1 +
 include/qemu/mmap-alloc.h | 20 +++-
 include/standard-headers/linux/mman.h | 44 ++
 numa.c|  1 +
 qemu-options.hx   | 22 -
 util/mmap-alloc.c | 26 
 util/oslib-posix.c|  4 +++-
 12 files changed, 200 insertions(+), 18 deletions(-)
 create mode 100644 include/standard-headers/linux/mman.h

-- 
2.7.4




[Qemu-devel] [PATCH V5_resend 5/7] util/mmap-alloc: Switch the RAM_SYNC flags to OnOffAuto

2018-11-19 Thread Zhang Yi
Signed-off-by: Zhang Yi 

A set of RAM_SYNC_ON_OFF_AUTO_{AUTO,ON,OFF} flags is added to
qemu_ram_mmap():

- If RAM_SYNC_ON_OFF_AUTO_ON is present, qemu_ram_mmap() will try to pass
  MAP_SYNC to mmap(). It will then fail if the host OS or the backend
  file does not support MAP_SYNC, or if MAP_SYNC conflicts with other
  flags.

- If RAM_SYNC_ON_OFF_AUTO_OFF is present, qemu_ram_mmap() will never pass
  MAP_SYNC to mmap().

- If RAM_SYNC_ON_OFF_AUTO_AUTO is present, and
  * if the host OS and the backend file support MAP_SYNC, and MAP_SYNC
    does not conflict with other flags, qemu_ram_mmap() will work as if
    RAM_SYNC_ON_OFF_AUTO_ON is present;
  * otherwise, qemu_ram_mmap() will work as if RAM_SYNC_ON_OFF_AUTO_OFF is
    present.
---
 include/exec/memory.h |  9 -
 util/mmap-alloc.c | 12 ++--
 2 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index 33a4e2c..c74c467 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -127,7 +127,14 @@ typedef struct IOMMUNotifier IOMMUNotifier;
 #define RAM_PMEM (1 << 5)
 
 /* RAM can be mmap by a MAP_SYNC flag */
-#define RAM_SYNC (1 << 6)
+#define RAM_SYNC_SHIFT  6
+#define RAM_SYNC_SHIFT_AUTO  7
+
+#define RAM_SYNC_ON_OFF_AUTO_ON   (1UL << RAM_SYNC_SHIFT)
+#define RAM_SYNC_ON_OFF_AUTO_OFF  (0UL << RAM_SYNC_SHIFT)
+#define RAM_SYNC_ON_OFF_AUTO_AUTO (1UL << RAM_SYNC_SHIFT_AUTO)
+
+#define RAM_SYNC (RAM_SYNC_ON_OFF_AUTO_ON | RAM_SYNC_ON_OFF_AUTO_AUTO)
 
 static inline void iommu_notifier_init(IOMMUNotifier *n, IOMMUNotify fn,
IOMMUNotifierFlag flags,
diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c
index f411df7..fe9303f 100644
--- a/util/mmap-alloc.c
+++ b/util/mmap-alloc.c
@@ -111,6 +111,10 @@ void *qemu_ram_mmap(int fd, size_t size, size_t align, uint32_t flags)
 assert(is_power_of_2(align));
 /* Always align to host page size */
 assert(align >= getpagesize());
+if ((flags & RAM_SYNC_ON_OFF_AUTO_ON) &&
+(!shared || !MAP_SYNC_FLAGS)) {
+return MAP_FAILED;
+}
 if ((flags & RAM_SYNC) && shared) {
 mmap_xflags |= MAP_SYNC_FLAGS;
 }
@@ -123,8 +127,12 @@ void *qemu_ram_mmap(int fd, size_t size, size_t align, uint32_t flags)
 (shared ? MAP_SHARED : MAP_PRIVATE) | mmap_xflags,
 fd, 0);
 if ((ptr1 == MAP_FAILED) && (mmap_xflags & MAP_SYNC_FLAGS)) {
-mmap_xflags &= ~MAP_SYNC_FLAGS;
-goto retry_mmap_fd;
+if (flags & RAM_SYNC_ON_OFF_AUTO_AUTO) {
+mmap_xflags &= ~MAP_SYNC_FLAGS;
+goto retry_mmap_fd;
+}
+munmap(ptr, total);
+return MAP_FAILED;
 }
 
 if (offset > 0) {
-- 
2.7.4




[Qemu-devel] [PATCH V5_resend 6/7] hostmem: add more information in error messages

2018-11-19 Thread Zhang Yi
When there are multiple memory backends in use, including the object type
name, ID and the property name in the error message can help users to
locate the error.

Signed-off-by: Haozhong Zhang 
Signed-off-by: Zhang Yi 
---
 backends/hostmem-file.c | 6 --
 backends/hostmem.c  | 8 +---
 2 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/backends/hostmem-file.c b/backends/hostmem-file.c
index e640749..0dd7a90 100644
--- a/backends/hostmem-file.c
+++ b/backends/hostmem-file.c
@@ -82,7 +82,8 @@ static void set_mem_path(Object *o, const char *str, Error **errp)
 HostMemoryBackendFile *fb = MEMORY_BACKEND_FILE(o);
 
 if (host_memory_backend_mr_inited(backend)) {
-error_setg(errp, "cannot change property value");
+error_setg(errp, "cannot change property 'mem-path' of %s",
+   object_get_typename(o));
 return;
 }
 g_free(fb->mem_path);
@@ -120,7 +121,8 @@ static void file_memory_backend_set_align(Object *o, Visitor *v,
 uint64_t val;
 
 if (host_memory_backend_mr_inited(backend)) {
-error_setg(&local_err, "cannot change property value");
+error_setg(&local_err, "cannot change property '%s' of %s",
+   name, object_get_typename(o));
 goto out;
 }
 
diff --git a/backends/hostmem.c b/backends/hostmem.c
index 1a89342..e2bcf9f 100644
--- a/backends/hostmem.c
+++ b/backends/hostmem.c
@@ -47,7 +47,8 @@ host_memory_backend_set_size(Object *obj, Visitor *v, const char *name,
 uint64_t value;
 
 if (host_memory_backend_mr_inited(backend)) {
-error_setg(&local_err, "cannot change property value");
+error_setg(&local_err, "cannot change property '%s' of %s",
+   name, object_get_typename(obj));
 goto out;
 }
 
@@ -56,8 +57,9 @@ host_memory_backend_set_size(Object *obj, Visitor *v, const char *name,
 goto out;
 }
 if (!value) {
-error_setg(&local_err, "Property '%s.%s' doesn't take value '%"
-   PRIu64 "'", object_get_typename(obj), name, value);
+error_setg(&local_err,
+   "property '%s' of %s doesn't take value '%" PRIu64 "'",
+   name, object_get_typename(obj), value);
 goto out;
 }
 backend->size = value;
-- 
2.7.4




[Qemu-devel] [PATCH V5_resend 2/7] util/mmap-alloc: switch qemu_ram_mmap() to 'flags' parameter

2018-11-19 Thread Zhang Yi
As more flag parameters besides the existing 'shared' are going to be
added to qemu_ram_mmap(), let's switch 'shared' to a 'flags' parameter
in advance, so as to ease the further additions.

Signed-off-by: Haozhong Zhang 
Signed-off-by: Zhang Yi 
---
 exec.c|  3 +--
 include/qemu/mmap-alloc.h | 19 ++-
 util/mmap-alloc.c |  8 +---
 util/oslib-posix.c|  4 +++-
 4 files changed, 27 insertions(+), 7 deletions(-)

diff --git a/exec.c b/exec.c
index bb6170d..273f668 100644
--- a/exec.c
+++ b/exec.c
@@ -1859,8 +1859,7 @@ static void *file_ram_alloc(RAMBlock *block,
 perror("ftruncate");
 }
 
-area = qemu_ram_mmap(fd, memory, block->mr->align,
- block->flags & RAM_SHARED);
+area = qemu_ram_mmap(fd, memory, block->mr->align, block->flags);
 if (area == MAP_FAILED) {
 error_setg_errno(errp, errno,
  "unable to map backing store for guest RAM");
diff --git a/include/qemu/mmap-alloc.h b/include/qemu/mmap-alloc.h
index 50385e3..6fe6ed4 100644
--- a/include/qemu/mmap-alloc.h
+++ b/include/qemu/mmap-alloc.h
@@ -7,7 +7,24 @@ size_t qemu_fd_getpagesize(int fd);
 
 size_t qemu_mempath_getpagesize(const char *mem_path);
 
-void *qemu_ram_mmap(int fd, size_t size, size_t align, bool shared);
+/**
+ * qemu_ram_mmap: mmap the specified file or device.
+ *
+ * Parameters:
+ *  @fd: the file or the device to mmap
+ *  @size: the number of bytes to be mmaped
+ *  @align: if not zero, specify the alignment of the starting mapping address;
+ *  otherwise, the alignment in use will be determined by QEMU.
+ *  @flags: specifies additional properties of the mapping, which can be one or
+ *  bit-or of following values
+ *  - RAM_SHARED: mmap with MAP_SHARED flag
+ *  Other bits are ignored.
+ *
+ * Return:
+ *  On success, return a pointer to the mapped area.
+ *  On failure, return MAP_FAILED.
+ */
+void *qemu_ram_mmap(int fd, size_t size, size_t align, uint32_t flags);
 
 void qemu_ram_munmap(void *ptr, size_t size);
 
diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c
index fd329ec..8f0a740 100644
--- a/util/mmap-alloc.c
+++ b/util/mmap-alloc.c
@@ -13,6 +13,7 @@
 #include "qemu/osdep.h"
 #include "qemu/mmap-alloc.h"
 #include "qemu/host-utils.h"
+#include "exec/memory.h"
 
 #define HUGETLBFS_MAGIC   0x958458f6
 
@@ -75,7 +76,7 @@ size_t qemu_mempath_getpagesize(const char *mem_path)
 return getpagesize();
 }
 
-void *qemu_ram_mmap(int fd, size_t size, size_t align, bool shared)
+void *qemu_ram_mmap(int fd, size_t size, size_t align, uint32_t flags)
 {
 /*
  * Note: this always allocates at least one extra page of virtual address
@@ -92,11 +93,12 @@ void *qemu_ram_mmap(int fd, size_t size, size_t align, bool shared)
  * anonymous memory is OK.
  */
 int anonfd = fd == -1 || qemu_fd_getpagesize(fd) == getpagesize() ? -1 : fd;
-int flags = anonfd == -1 ? MAP_ANONYMOUS : MAP_NORESERVE;
-void *ptr = mmap(0, total, PROT_NONE, flags | MAP_PRIVATE, anonfd, 0);
+int mmap_flags = anonfd == -1 ? MAP_ANONYMOUS : MAP_NORESERVE;
+void *ptr = mmap(0, total, PROT_NONE, mmap_flags | MAP_PRIVATE, anonfd, 0);
 #else
 void *ptr = mmap(0, total, PROT_NONE, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
 #endif
+bool shared = flags & RAM_SHARED;
 size_t offset;
 void *ptr1;
 
diff --git a/util/oslib-posix.c b/util/oslib-posix.c
index fbd0dc8..c28869d 100644
--- a/util/oslib-posix.c
+++ b/util/oslib-posix.c
@@ -203,7 +203,9 @@ void *qemu_memalign(size_t alignment, size_t size)
 void *qemu_anon_ram_alloc(size_t size, uint64_t *alignment, bool shared)
 {
 size_t align = QEMU_VMALLOC_ALIGN;
-void *ptr = qemu_ram_mmap(-1, size, align, shared);
+uint32_t flags = 0;
+flags |= shared ? RAM_SHARED : 0;
+void *ptr = qemu_ram_mmap(-1, size, align, flags);
 
 if (ptr == MAP_FAILED) {
 return NULL;
-- 
2.7.4




[Qemu-devel] [PATCH V5_resend 4/7] util/mmap-alloc: support MAP_SYNC in qemu_ram_mmap()

2018-11-19 Thread Zhang Yi
When a file supporting DAX is used as the vNVDIMM backend, additionally
mapping it with the MAP_SYNC flag guarantees the persistence of guest
writes to the backend file without other QEMU actions (e.g., periodic
fsync() by QEMU).

A RAM_SYNC flag is added to qemu_ram_mmap() to request this behavior.

Signed-off-by: Haozhong Zhang 
Signed-off-by: Zhang Yi 
---
 exec.c|  2 +-
 include/exec/memory.h |  3 +++
 include/exec/ram_addr.h   |  1 +
 include/qemu/mmap-alloc.h |  1 +
 include/standard-headers/linux/mman.h | 44 +++
 util/mmap-alloc.c | 14 +++
 6 files changed, 60 insertions(+), 5 deletions(-)
 create mode 100644 include/standard-headers/linux/mman.h

diff --git a/exec.c b/exec.c
index e92a7da..dc4d180 100644
--- a/exec.c
+++ b/exec.c
@@ -2241,7 +2241,7 @@ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr,
 int64_t file_size;
 
 /* Just support these ram flags by now. */
-assert((ram_flags & ~(RAM_SHARED | RAM_PMEM)) == 0);
+assert((ram_flags & ~(RAM_SHARED | RAM_PMEM | RAM_SYNC)) == 0);
 
 if (xen_enabled()) {
 error_setg(errp, "-mem-path not supported with Xen");
diff --git a/include/exec/memory.h b/include/exec/memory.h
index 667466b..33a4e2c 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -126,6 +126,9 @@ typedef struct IOMMUNotifier IOMMUNotifier;
 /* RAM is a persistent kind memory */
 #define RAM_PMEM (1 << 5)
 
+/* RAM can be mmap by a MAP_SYNC flag */
+#define RAM_SYNC (1 << 6)
+
 static inline void iommu_notifier_init(IOMMUNotifier *n, IOMMUNotify fn,
IOMMUNotifierFlag flags,
hwaddr start, hwaddr end,
diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h
index 9ecd911..d239ce7 100644
--- a/include/exec/ram_addr.h
+++ b/include/exec/ram_addr.h
@@ -87,6 +87,7 @@ long qemu_getrampagesize(void);
  *  or bit-or of following values
  *  - RAM_SHARED: mmap the backing file or device with MAP_SHARED
  *  - RAM_PMEM: the backend @mem_path or @fd is persistent memory
+ *  - RAM_SYNC:   mmap with MAP_SYNC flag
  *  Other bits are ignored.
  *  @mem_path or @fd: specify the backing file or device
  *  @errp: pointer to Error*, to store an error if it happens
diff --git a/include/qemu/mmap-alloc.h b/include/qemu/mmap-alloc.h
index 6fe6ed4..1755a8b 100644
--- a/include/qemu/mmap-alloc.h
+++ b/include/qemu/mmap-alloc.h
@@ -18,6 +18,7 @@ size_t qemu_mempath_getpagesize(const char *mem_path);
  *  @flags: specifies additional properties of the mapping, which can be one or
  *  bit-or of following values
  *  - RAM_SHARED: mmap with MAP_SHARED flag
+ *  - RAM_SYNC:   mmap with MAP_SYNC flag
  *  Other bits are ignored.
  *
  * Return:
diff --git a/include/standard-headers/linux/mman.h b/include/standard-headers/linux/mman.h
new file mode 100644
index 000..ea1fc47
--- /dev/null
+++ b/include/standard-headers/linux/mman.h
@@ -0,0 +1,44 @@
+/*
+ * Definitions of Linux-specific mmap flags.
+ *
+ * Copyright Intel Corporation, 2018
+ *
+ * Author: Haozhong Zhang 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * later. See the COPYING file in the top-level directory.
+ */
+
+#ifndef _LINUX_MMAN_H
+#define _LINUX_MMAN_H
+
+/*
+ * MAP_SHARED_VALIDATE and MAP_SYNC are introduced in Linux kernel
+ * 4.15, so they may not be defined when compiling on older kernels.
+ */
+#ifdef CONFIG_LINUX
+
+#include 
+
+#ifndef MAP_SHARED_VALIDATE
+#define MAP_SHARED_VALIDATE   0x3
+#endif
+
+#ifndef MAP_SYNC
+#define MAP_SYNC  0x80000
+#endif
+
+/* MAP_SYNC is only available with MAP_SHARED_VALIDATE. */
+#define MAP_SYNC_FLAGS (MAP_SYNC | MAP_SHARED_VALIDATE)
+
+#else  /* !CONFIG_LINUX */
+
+#define MAP_SHARED_VALIDATE   0x0
+#define MAP_SYNC  0x0
+
+#define QEMU_HAS_MAP_SYNC false
+#define MAP_SYNC_FLAGS 0
+
+#endif /* CONFIG_LINUX */
+
+#endif /* !_LINUX_MMAN_H */
diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c
index 8f0a740..f411df7 100644
--- a/util/mmap-alloc.c
+++ b/util/mmap-alloc.c
@@ -14,6 +14,7 @@
 #include "qemu/mmap-alloc.h"
 #include "qemu/host-utils.h"
 #include "exec/memory.h"
+#include "standard-headers/linux/mman.h"
 
 #define HUGETLBFS_MAGIC   0x958458f6
 
@@ -99,6 +100,7 @@ void *qemu_ram_mmap(int fd, size_t size, size_t align, uint32_t flags)
 void *ptr = mmap(0, total, PROT_NONE, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
 #endif
 bool shared = flags & RAM_SHARED;
+int mmap_xflags = 0;
 size_t offset;
 void *ptr1;
 
@@ -109,16 +111,20 @@ void *qemu_ram_mmap(int fd, size_t size, size_t align, uint32_t flags)
 assert(is_power_of_2(align));
 /* Always align to host page size */
 assert(align >= 

[Qemu-devel] [PATCH V5_resend 7/7] hostmem-file: add 'sync' option

2018-11-19 Thread Zhang Yi
This option controls whether QEMU mmap(2)s the memory backend file with
the MAP_SYNC flag, which can fully guarantee the persistence of guest
writes to the backend, provided that the MAP_SYNC flag is supported by
the host kernel (Linux kernel 4.15 and later) and the backend is a file
supporting DAX (e.g., a file on an ext4/xfs file system mounted with
'-o dax').

It can take one of the following values:
 - on:  try to pass MAP_SYNC to mmap(2); if MAP_SYNC is not supported or
'share=off', QEMU will abort
 - off: never pass MAP_SYNC to mmap(2)
 - auto (default): if MAP_SYNC is supported and 'share=on', work as if
'sync=on'; otherwise, work as if 'sync=off'
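
For illustration, the three values can be read as a small decision
table. The sketch below is not the code from this series: SyncMode,
SyncDecision and resolve_sync() are hypothetical names, and
map_sync_supported stands for "the host kernel and the backend file
accept MAP_SYNC".

```c
#include <stdbool.h>

/* Hypothetical names for illustration; not QEMU's OnOffAuto type. */
typedef enum { SYNC_AUTO, SYNC_ON, SYNC_OFF } SyncMode;
typedef enum { USE_MAP_SYNC, NO_MAP_SYNC, SYNC_ERROR } SyncDecision;

/* Resolve the user-visible 'sync' value into what mmap(2) should do. */
static SyncDecision resolve_sync(SyncMode mode, bool share,
                                 bool map_sync_supported)
{
    /* MAP_SYNC is only usable on a shared mapping of a capable backend. */
    bool usable = share && map_sync_supported;

    switch (mode) {
    case SYNC_ON:
        /* 'on' is strict: an unusable MAP_SYNC is an error. */
        return usable ? USE_MAP_SYNC : SYNC_ERROR;
    case SYNC_OFF:
        return NO_MAP_SYNC;
    case SYNC_AUTO:
    default:
        /* 'auto' degrades silently when MAP_SYNC cannot be used. */
        return usable ? USE_MAP_SYNC : NO_MAP_SYNC;
    }
}
```

With sync=on and share=off, resolve_sync() reports an error, whereas
sync=auto with the same inputs falls back to a mapping without MAP_SYNC.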

Signed-off-by: Haozhong Zhang 
Signed-off-by: Zhang Yi 
---
 backends/hostmem-file.c | 39 +++
 docs/nvdimm.txt | 20 +++-
 include/exec/memory.h   |  8 
 qemu-options.hx | 22 +-
 4 files changed, 87 insertions(+), 2 deletions(-)

diff --git a/backends/hostmem-file.c b/backends/hostmem-file.c
index 0dd7a90..73cf181 100644
--- a/backends/hostmem-file.c
+++ b/backends/hostmem-file.c
@@ -16,6 +16,7 @@
 #include "sysemu/hostmem.h"
 #include "sysemu/sysemu.h"
 #include "qom/object_interfaces.h"
+#include "qapi/qapi-visit.h"
 
 /* hostmem-file.c */
 /**
@@ -36,6 +37,7 @@ struct HostMemoryBackendFile {
 uint64_t align;
 bool discard_data;
 bool is_pmem;
+OnOffAuto sync;
 };
 
 static void
@@ -62,6 +64,7 @@ file_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
  path,
  backend->size, fb->align,
  (backend->share ? RAM_SHARED : 0) |
+ qemu_ram_sync_flags(fb->sync) |
  (fb->is_pmem ? RAM_PMEM : 0),
  fb->mem_path, errp);
 g_free(path);
@@ -136,6 +139,39 @@ static void file_memory_backend_set_align(Object *o, Visitor *v,
 error_propagate(errp, local_err);
 }
 
+static void file_memory_backend_get_sync(
+Object *obj, Visitor *v, const char *name, void *opaque, Error **errp)
+{
+HostMemoryBackendFile *fb = MEMORY_BACKEND_FILE(obj);
+OnOffAuto value = fb->sync;
+
+visit_type_OnOffAuto(v, name, &value, errp);
+}
+
+static void file_memory_backend_set_sync(
+Object *obj, Visitor *v, const char *name, void *opaque, Error **errp)
+{
+HostMemoryBackend *backend = MEMORY_BACKEND(obj);
+HostMemoryBackendFile *fb = MEMORY_BACKEND_FILE(obj);
+Error *local_err = NULL;
+OnOffAuto value;
+
+if (host_memory_backend_mr_inited(backend)) {
+error_setg(&local_err, "cannot change property '%s' of %s",
+   name, object_get_typename(obj));
+goto out;
+}
+
+visit_type_OnOffAuto(v, name, &value, &local_err);
+if (local_err) {
+goto out;
+}
+fb->sync = value;
+
+ out:
+error_propagate(errp, local_err);
+}
+
 static bool file_memory_backend_get_pmem(Object *o, Error **errp)
 {
 return MEMORY_BACKEND_FILE(o)->is_pmem;
@@ -203,6 +239,9 @@ file_backend_class_init(ObjectClass *oc, void *data)
 object_class_property_add_bool(oc, "pmem",
 file_memory_backend_get_pmem, file_memory_backend_set_pmem,
 &error_abort);
+object_class_property_add(oc, "sync", "OnOffAuto",
+file_memory_backend_get_sync, file_memory_backend_set_sync,
+NULL, NULL, &error_abort);
 }
 
 static void file_backend_instance_finalize(Object *o)
diff --git a/docs/nvdimm.txt b/docs/nvdimm.txt
index 5f158a6..3d89174 100644
--- a/docs/nvdimm.txt
+++ b/docs/nvdimm.txt
@@ -142,11 +142,29 @@ backend of vNVDIMM:
 Guest Data Persistence
 --
 
+vNVDIMM is designed and implemented to guarantee guest data
+persistence on the backends even across host crashes and power
+failures. However, there are still some requirements and limitations,
+as explained below.
+
 Though QEMU supports multiple types of vNVDIMM backends on Linux,
-currently the only one that can guarantee the guest write persistence
+if MAP_SYNC is not supported by the host kernel and the backends,
+the only backend that can guarantee the guest write persistence
 is the device DAX on the real NVDIMM device (e.g., /dev/dax0.0), to
 which all guest access do not involve any host-side kernel cache.
 
+The mmap(2) flag MAP_SYNC has been available since Linux kernel 4.15.
+On such systems, QEMU can mmap(2) the backend with MAP_SYNC, which can
+guarantee the guest write persistence to vNVDIMM. Besides the host
+kernel support, enabling MAP_SYNC in QEMU also requires:
+
+ - the backend is a file supporting DAX, e.g., a file on an ext4 or
+   xfs file system mounted with '-o dax',
+
+ - 'sync' option of memory-backend-file is not 'off', and
+
+ - 'share' option of memory-backend-file is 'on'.
+
 When using other types of backends, it's suggested to set 'un

[Qemu-devel] [PATCH V5_resend 1/7] numa: Fixed the memory leak of numa error message

2018-11-19 Thread Zhang Yi
object_get_canonical_path_component() returns a string which
must be freed using g_free().
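
A minimal sketch of the ownership rule, written in plain C rather than
GLib so that it stands alone: get_path_component() is a hypothetical
stand-in for object_get_canonical_path_component(), and free() plays the
role of g_free(). The point is only that the caller owns the returned
string and must release it on every path, including error paths.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a getter that returns a newly allocated string. */
static char *get_path_component(const char *id)
{
    size_t len = strlen(id) + 1;
    char *copy = malloc(len);

    if (copy) {
        memcpy(copy, id, len);
    }
    return copy;
}

/* Format an error message, releasing the temporary string before returning. */
static int format_reuse_error(char *buf, size_t buflen, const char *id)
{
    char *path = get_path_component(id);
    int n;

    if (!path) {
        return -1;
    }
    n = snprintf(buf, buflen, "memory backend %s is used multiple times", path);
    free(path); /* the caller owns the string, so it is released here */
    return n;
}
```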

Signed-off-by: Zhang Yi 
---
 numa.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/numa.c b/numa.c
index 50ec016..3875e1e 100644
--- a/numa.c
+++ b/numa.c
@@ -533,6 +533,7 @@ void memory_region_allocate_system_memory(MemoryRegion *mr, Object *owner,
 error_report("memory backend %s is used multiple times. Each "
  "-numa option must use a different memdev value.",
  path);
+g_free(path);
 exit(1);
 }
 
-- 
2.7.4




[Qemu-devel] [PATCH V5_resend 3/7] exec: switch qemu_ram_alloc_from_{file, fd} to the 'flags' parameter

2018-11-19 Thread Zhang Yi
As more flag parameters besides the existing 'share' are going to be
added to qemu_ram_alloc_from_{file,fd}(), let's switch 'share' to a
'flags' parameter in advance, so as to ease the further additions.

Signed-off-by: Haozhong Zhang 
Signed-off-by: Zhang Yi 
---
 exec.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/exec.c b/exec.c
index 273f668..e92a7da 100644
--- a/exec.c
+++ b/exec.c
@@ -1810,6 +1810,7 @@ static void *file_ram_alloc(RAMBlock *block,
 ram_addr_t memory,
 int fd,
 bool truncate,
+uint32_t flags,
 Error **errp)
 {
 void *area;
@@ -1859,7 +1860,7 @@ static void *file_ram_alloc(RAMBlock *block,
 perror("ftruncate");
 }
 
-area = qemu_ram_mmap(fd, memory, block->mr->align, block->flags);
+area = qemu_ram_mmap(fd, memory, block->mr->align, flags);
 if (area == MAP_FAILED) {
 error_setg_errno(errp, errno,
  "unable to map backing store for guest RAM");
@@ -2278,7 +2279,8 @@ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr,
 new_block->used_length = size;
 new_block->max_length = size;
 new_block->flags = ram_flags;
-new_block->host = file_ram_alloc(new_block, size, fd, !file_size, errp);
+new_block->host = file_ram_alloc(new_block, size, fd, !file_size,
+ram_flags, errp);
 if (!new_block->host) {
 g_free(new_block);
 return NULL;
-- 
2.7.4




[Qemu-devel] [PATCH v5 6/7] hostmem: add more information in error messages

2018-11-05 Thread Zhang Yi
When there are multiple memory backends in use, including the object type
name, ID and the property name in the error message can help users to
locate the error.
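
As a rough illustration of the message shape this patch moves towards,
the sketch below formats the property name and the object type name
into one string. format_prop_error() is a hypothetical helper written
with snprintf(); the patch itself uses error_setg() and
object_get_typename().

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: build "cannot change property '<prop>' of <type>". */
static int format_prop_error(char *buf, size_t len,
                             const char *prop, const char *type_name)
{
    return snprintf(buf, len, "cannot change property '%s' of %s",
                    prop, type_name);
}
```

Called with "align" and "memory-backend-file", the helper names both the
property and the backend type, which is what lets a user tell several
backends apart.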

Signed-off-by: Haozhong Zhang 
Signed-off-by: Zhang Yi 
---
 backends/hostmem-file.c | 6 --
 backends/hostmem.c  | 8 +---
 2 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/backends/hostmem-file.c b/backends/hostmem-file.c
index e640749..0dd7a90 100644
--- a/backends/hostmem-file.c
+++ b/backends/hostmem-file.c
@@ -82,7 +82,8 @@ static void set_mem_path(Object *o, const char *str, Error **errp)
 HostMemoryBackendFile *fb = MEMORY_BACKEND_FILE(o);
 
 if (host_memory_backend_mr_inited(backend)) {
-error_setg(errp, "cannot change property value");
+error_setg(errp, "cannot change property 'mem-path' of %s",
+   object_get_typename(o));
 return;
 }
 g_free(fb->mem_path);
@@ -120,7 +121,8 @@ static void file_memory_backend_set_align(Object *o, Visitor *v,
 uint64_t val;
 
 if (host_memory_backend_mr_inited(backend)) {
-error_setg(&local_err, "cannot change property value");
+error_setg(&local_err, "cannot change property '%s' of %s",
+   name, object_get_typename(o));
 goto out;
 }
 
diff --git a/backends/hostmem.c b/backends/hostmem.c
index 1a89342..e2bcf9f 100644
--- a/backends/hostmem.c
+++ b/backends/hostmem.c
@@ -47,7 +47,8 @@ host_memory_backend_set_size(Object *obj, Visitor *v, const char *name,
 uint64_t value;
 
 if (host_memory_backend_mr_inited(backend)) {
-error_setg(&local_err, "cannot change property value");
+error_setg(&local_err, "cannot change property %s of %s ",
+   name, object_get_typename(obj));
 goto out;
 }
 
@@ -56,8 +57,9 @@ host_memory_backend_set_size(Object *obj, Visitor *v, const char *name,
 goto out;
 }
 if (!value) {
-error_setg(&local_err, "Property '%s.%s' doesn't take value '%"
-   PRIu64 "'", object_get_typename(obj), name, value);
+error_setg(&local_err,
+   "property '%s' of %s doesn't take value '%" PRIu64 "'",
+   name, object_get_typename(obj), value);
 goto out;
 }
 backend->size = value;
-- 
2.7.4




[Qemu-devel] [PATCH v5 3/7] exec: switch qemu_ram_alloc_from_{file, fd} to the 'flags' parameter

2018-11-05 Thread Zhang Yi
As more flag parameters besides the existing 'share' are going to be
added to qemu_ram_alloc_from_{file,fd}(), let's switch 'share' to a
'flags' parameter in advance, so as to ease the further additions.

Signed-off-by: Haozhong Zhang 
Signed-off-by: Zhang Yi 
---
 exec.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/exec.c b/exec.c
index 273f668..e92a7da 100644
--- a/exec.c
+++ b/exec.c
@@ -1810,6 +1810,7 @@ static void *file_ram_alloc(RAMBlock *block,
 ram_addr_t memory,
 int fd,
 bool truncate,
+uint32_t flags,
 Error **errp)
 {
 void *area;
@@ -1859,7 +1860,7 @@ static void *file_ram_alloc(RAMBlock *block,
 perror("ftruncate");
 }
 
-area = qemu_ram_mmap(fd, memory, block->mr->align, block->flags);
+area = qemu_ram_mmap(fd, memory, block->mr->align, flags);
 if (area == MAP_FAILED) {
 error_setg_errno(errp, errno,
  "unable to map backing store for guest RAM");
@@ -2278,7 +2279,8 @@ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr,
 new_block->used_length = size;
 new_block->max_length = size;
 new_block->flags = ram_flags;
-new_block->host = file_ram_alloc(new_block, size, fd, !file_size, errp);
+new_block->host = file_ram_alloc(new_block, size, fd, !file_size,
+ram_flags, errp);
 if (!new_block->host) {
 g_free(new_block);
 return NULL;
-- 
2.7.4




[Qemu-devel] [PATCH v5 7/7] hostmem-file: add 'sync' option

2018-11-05 Thread Zhang Yi
This option controls whether QEMU mmap(2)s the memory backend file with
the MAP_SYNC flag, which can fully guarantee the persistence of guest
writes to the backend, provided that the MAP_SYNC flag is supported by
the host kernel (Linux kernel 4.15 and later) and the backend is a file
supporting DAX (e.g., a file on an ext4/xfs file system mounted with
'-o dax').

It can take one of the following values:
 - on:  try to pass MAP_SYNC to mmap(2); if MAP_SYNC is not supported or
'share=off', QEMU will abort
 - off: never pass MAP_SYNC to mmap(2)
 - auto (default): if MAP_SYNC is supported and 'share=on', work as if
'sync=on'; otherwise, work as if 'sync=off'

Signed-off-by: Haozhong Zhang 
Signed-off-by: Zhang Yi 
---
 backends/hostmem-file.c | 39 +++
 docs/nvdimm.txt | 20 +++-
 include/exec/memory.h   |  8 
 qemu-options.hx | 22 +-
 4 files changed, 87 insertions(+), 2 deletions(-)

diff --git a/backends/hostmem-file.c b/backends/hostmem-file.c
index 0dd7a90..73cf181 100644
--- a/backends/hostmem-file.c
+++ b/backends/hostmem-file.c
@@ -16,6 +16,7 @@
 #include "sysemu/hostmem.h"
 #include "sysemu/sysemu.h"
 #include "qom/object_interfaces.h"
+#include "qapi/qapi-visit.h"
 
 /* hostmem-file.c */
 /**
@@ -36,6 +37,7 @@ struct HostMemoryBackendFile {
 uint64_t align;
 bool discard_data;
 bool is_pmem;
+OnOffAuto sync;
 };
 
 static void
@@ -62,6 +64,7 @@ file_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
  path,
  backend->size, fb->align,
  (backend->share ? RAM_SHARED : 0) |
+ qemu_ram_sync_flags(fb->sync) |
  (fb->is_pmem ? RAM_PMEM : 0),
  fb->mem_path, errp);
 g_free(path);
@@ -136,6 +139,39 @@ static void file_memory_backend_set_align(Object *o, Visitor *v,
 error_propagate(errp, local_err);
 }
 
+static void file_memory_backend_get_sync(
+Object *obj, Visitor *v, const char *name, void *opaque, Error **errp)
+{
+HostMemoryBackendFile *fb = MEMORY_BACKEND_FILE(obj);
+OnOffAuto value = fb->sync;
+
+visit_type_OnOffAuto(v, name, &value, errp);
+}
+
+static void file_memory_backend_set_sync(
+Object *obj, Visitor *v, const char *name, void *opaque, Error **errp)
+{
+HostMemoryBackend *backend = MEMORY_BACKEND(obj);
+HostMemoryBackendFile *fb = MEMORY_BACKEND_FILE(obj);
+Error *local_err = NULL;
+OnOffAuto value;
+
+if (host_memory_backend_mr_inited(backend)) {
+error_setg(&local_err, "cannot change property '%s' of %s",
+   name, object_get_typename(obj));
+goto out;
+}
+
+visit_type_OnOffAuto(v, name, &value, &local_err);
+if (local_err) {
+goto out;
+}
+fb->sync = value;
+
+ out:
+error_propagate(errp, local_err);
+}
+
 static bool file_memory_backend_get_pmem(Object *o, Error **errp)
 {
 return MEMORY_BACKEND_FILE(o)->is_pmem;
@@ -203,6 +239,9 @@ file_backend_class_init(ObjectClass *oc, void *data)
 object_class_property_add_bool(oc, "pmem",
 file_memory_backend_get_pmem, file_memory_backend_set_pmem,
 &error_abort);
+object_class_property_add(oc, "sync", "OnOffAuto",
+file_memory_backend_get_sync, file_memory_backend_set_sync,
+NULL, NULL, &error_abort);
 }
 
 static void file_backend_instance_finalize(Object *o)
diff --git a/docs/nvdimm.txt b/docs/nvdimm.txt
index 5f158a6..3d89174 100644
--- a/docs/nvdimm.txt
+++ b/docs/nvdimm.txt
@@ -142,11 +142,29 @@ backend of vNVDIMM:
 Guest Data Persistence
 --
 
+vNVDIMM is designed and implemented to guarantee guest data
+persistence on the backends even across host crashes and power
+failures. However, there are still some requirements and limitations,
+as explained below.
+
 Though QEMU supports multiple types of vNVDIMM backends on Linux,
-currently the only one that can guarantee the guest write persistence
+if MAP_SYNC is not supported by the host kernel and the backends,
+the only backend that can guarantee the guest write persistence
 is the device DAX on the real NVDIMM device (e.g., /dev/dax0.0), to
 which all guest access do not involve any host-side kernel cache.
 
+The mmap(2) flag MAP_SYNC has been available since Linux kernel 4.15.
+On such systems, QEMU can mmap(2) the backend with MAP_SYNC, which can
+guarantee the guest write persistence to vNVDIMM. Besides the host
+kernel support, enabling MAP_SYNC in QEMU also requires:
+
+ - the backend is a file supporting DAX, e.g., a file on an ext4 or
+   xfs file system mounted with '-o dax',
+
+ - 'sync' option of memory-backend-file is not 'off', and
+
+ - 'share' option of memory-backend-file is 'on'.
+
 When using other types of backends, it's suggested to set 'un

[Qemu-devel] [PATCH v5 4/7] util/mmap-alloc: support MAP_SYNC in qemu_ram_mmap()

2018-11-05 Thread Zhang Yi
When a file supporting DAX is used as a vNVDIMM backend, additionally
mmap'ing it with the MAP_SYNC flag can guarantee the persistence of guest
writes to the backend file without other QEMU actions (e.g., periodic
fsync() by QEMU).
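
As the header added by this patch notes, MAP_SYNC is only available
together with MAP_SHARED_VALIDATE. A minimal sketch of that flag pairing
follows; sync_mmap_flags() and map_backend() are hypothetical helpers,
not functions from this series, and the fallback values are the ones
used by the generic Linux headers (treat them as an assumption on other
architectures).

```c
#include <stddef.h>
#include <sys/mman.h>

/* Fallbacks for libc headers that predate Linux 4.15 (assumed values). */
#ifndef MAP_SHARED_VALIDATE
#define MAP_SHARED_VALIDATE 0x03
#endif
#ifndef MAP_SYNC
#define MAP_SYNC 0x80000
#endif

/* Hypothetical helper: pick the mmap(2) flags for a shared file mapping. */
static int sync_mmap_flags(int want_sync)
{
    /* MAP_SHARED_VALIDATE makes the kernel reject flags it does not know. */
    return want_sync ? (MAP_SHARED_VALIDATE | MAP_SYNC) : MAP_SHARED;
}

/* Hypothetical helper: map 'size' bytes of 'fd' read-write. */
static void *map_backend(int fd, size_t size, int want_sync)
{
    return mmap(NULL, size, PROT_READ | PROT_WRITE,
                sync_mmap_flags(want_sync), fd, 0);
}
```

On a file that does not support DAX, a map_backend() call with
want_sync set is expected to fail rather than silently drop MAP_SYNC,
which is the reason for pairing it with MAP_SHARED_VALIDATE.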

A RAM_SYNC flag is added to qemu_ram_mmap(); when it is set,
qemu_ram_mmap() tries to mmap the backend with the MAP_SYNC flag.

Signed-off-by: Haozhong Zhang 
Signed-off-by: Zhang Yi 
---
 exec.c|  2 +-
 include/exec/memory.h |  3 +++
 include/exec/ram_addr.h   |  1 +
 include/qemu/mmap-alloc.h |  1 +
 include/standard-headers/linux/mman.h | 44 +++
 util/mmap-alloc.c | 14 +++
 6 files changed, 60 insertions(+), 5 deletions(-)
 create mode 100644 include/standard-headers/linux/mman.h

diff --git a/exec.c b/exec.c
index e92a7da..dc4d180 100644
--- a/exec.c
+++ b/exec.c
@@ -2241,7 +2241,7 @@ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr,
 int64_t file_size;
 
 /* Just support these ram flags by now. */
-assert((ram_flags & ~(RAM_SHARED | RAM_PMEM)) == 0);
+assert((ram_flags & ~(RAM_SHARED | RAM_PMEM | RAM_SYNC)) == 0);
 
 if (xen_enabled()) {
 error_setg(errp, "-mem-path not supported with Xen");
diff --git a/include/exec/memory.h b/include/exec/memory.h
index 667466b..33a4e2c 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -126,6 +126,9 @@ typedef struct IOMMUNotifier IOMMUNotifier;
 /* RAM is a persistent kind memory */
 #define RAM_PMEM (1 << 5)
 
+/* RAM can be mmap by a MAP_SYNC flag */
+#define RAM_SYNC (1 << 6)
+
 static inline void iommu_notifier_init(IOMMUNotifier *n, IOMMUNotify fn,
IOMMUNotifierFlag flags,
hwaddr start, hwaddr end,
diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h
index 9ecd911..d239ce7 100644
--- a/include/exec/ram_addr.h
+++ b/include/exec/ram_addr.h
@@ -87,6 +87,7 @@ long qemu_getrampagesize(void);
  *  or bit-or of following values
  *  - RAM_SHARED: mmap the backing file or device with MAP_SHARED
  *  - RAM_PMEM: the backend @mem_path or @fd is persistent memory
+ *  - RAM_SYNC:   mmap with MAP_SYNC flag
  *  Other bits are ignored.
  *  @mem_path or @fd: specify the backing file or device
  *  @errp: pointer to Error*, to store an error if it happens
diff --git a/include/qemu/mmap-alloc.h b/include/qemu/mmap-alloc.h
index 6fe6ed4..1755a8b 100644
--- a/include/qemu/mmap-alloc.h
+++ b/include/qemu/mmap-alloc.h
@@ -18,6 +18,7 @@ size_t qemu_mempath_getpagesize(const char *mem_path);
  *  @flags: specifies additional properties of the mapping, which can be one or
  *  bit-or of following values
  *  - RAM_SHARED: mmap with MAP_SHARED flag
+ *  - RAM_SYNC:   mmap with MAP_SYNC flag
  *  Other bits are ignored.
  *
  * Return:
diff --git a/include/standard-headers/linux/mman.h b/include/standard-headers/linux/mman.h
new file mode 100644
index 000..ea1fc47
--- /dev/null
+++ b/include/standard-headers/linux/mman.h
@@ -0,0 +1,44 @@
+/*
+ * Definitions of Linux-specific mmap flags.
+ *
+ * Copyright Intel Corporation, 2018
+ *
+ * Author: Haozhong Zhang 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * later. See the COPYING file in the top-level directory.
+ */
+
+#ifndef _LINUX_MMAN_H
+#define _LINUX_MMAN_H
+
+/*
+ * MAP_SHARED_VALIDATE and MAP_SYNC are introduced in Linux kernel
+ * 4.15, so they may not be defined when compiling on older kernels.
+ */
+#ifdef CONFIG_LINUX
+
+#include 
+
+#ifndef MAP_SHARED_VALIDATE
+#define MAP_SHARED_VALIDATE   0x3
+#endif
+
+#ifndef MAP_SYNC
+#define MAP_SYNC  0x80000
+#endif
+
+/* MAP_SYNC is only available with MAP_SHARED_VALIDATE. */
+#define MAP_SYNC_FLAGS (MAP_SYNC | MAP_SHARED_VALIDATE)
+
+#else  /* !CONFIG_LINUX */
+
+#define MAP_SHARED_VALIDATE   0x0
+#define MAP_SYNC  0x0
+
+#define QEMU_HAS_MAP_SYNC false
+#define MAP_SYNC_FLAGS 0
+
+#endif /* CONFIG_LINUX */
+
+#endif /* !_LINUX_MMAN_H */
diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c
index 8f0a740..f411df7 100644
--- a/util/mmap-alloc.c
+++ b/util/mmap-alloc.c
@@ -14,6 +14,7 @@
 #include "qemu/mmap-alloc.h"
 #include "qemu/host-utils.h"
 #include "exec/memory.h"
+#include "standard-headers/linux/mman.h"
 
 #define HUGETLBFS_MAGIC   0x958458f6
 
@@ -99,6 +100,7 @@ void *qemu_ram_mmap(int fd, size_t size, size_t align, uint32_t flags)
 void *ptr = mmap(0, total, PROT_NONE, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
 #endif
 bool shared = flags & RAM_SHARED;
+int mmap_xflags = 0;
 size_t offset;
 void *ptr1;
 
@@ -109,16 +111,20 @@ void *qemu_ram_mmap(int fd, size_t size, size_t align, uint32_t flags)
 assert(is_power_of_2(align));
 /* Always align to host page size */
 assert(align >= 

[Qemu-devel] [PATCH v5 5/7] util/mmap-alloc: Switch the RAM_SYNC flags to OnOffAuto

2018-11-05 Thread Zhang Yi
Signed-off-by: Zhang Yi 

A set of RAM_SYNC_ON_OFF_AUTO_{AUTO,ON,OFF} flags is added to
qemu_ram_mmap():

- If RAM_SYNC_ON_OFF_AUTO_ON is present, qemu_ram_mmap() will try to pass
  MAP_SYNC to mmap(). It will then fail if the host OS or the backend
  file does not support MAP_SYNC, or if MAP_SYNC conflicts with other
  flags.

- If RAM_SYNC_ON_OFF_AUTO_OFF is present, qemu_ram_mmap() will never pass
  MAP_SYNC to mmap().

- If RAM_SYNC_ON_OFF_AUTO_AUTO is present, and
  * if the host OS and the backend file support MAP_SYNC, and MAP_SYNC
    does not conflict with other flags, qemu_ram_mmap() will work as if
    RAM_SYNC_ON_OFF_AUTO_ON is present;
  * otherwise, qemu_ram_mmap() will work as if RAM_SYNC_ON_OFF_AUTO_OFF is
    present.
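
The retry behaviour described above can be sketched roughly as follows.
This is an illustration under stated assumptions, not the patch's code:
sync_policy, mmap_fn, map_with_policy() and stub_mmap_no_sync() are
hypothetical names, the fallback flag values are those of the generic
Linux headers, and the mapping function is passed in so that the logic
can be exercised without DAX hardware.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/mman.h>

/* Fallbacks for libc headers that predate Linux 4.15 (assumed values). */
#ifndef MAP_SHARED_VALIDATE
#define MAP_SHARED_VALIDATE 0x03
#endif
#ifndef MAP_SYNC
#define MAP_SYNC 0x80000
#endif

/* Hypothetical policy names mirroring the on/off/auto semantics. */
enum sync_policy { POLICY_OFF, POLICY_ON, POLICY_AUTO };

/* Same signature as mmap(2), so mmap itself or a stub can be passed. */
typedef void *(*mmap_fn)(void *, size_t, int, int, int, off_t);

static void *map_with_policy(mmap_fn do_mmap, int fd, size_t size,
                             int shared, enum sync_policy policy)
{
    int prot = PROT_READ | PROT_WRITE;
    int base = shared ? MAP_SHARED : MAP_PRIVATE;
    void *p;

    if (policy == POLICY_OFF) {
        return do_mmap(NULL, size, prot, base, fd, 0);
    }
    if (!shared) {
        /* MAP_SYNC needs a shared mapping: "on" fails, "auto" degrades. */
        if (policy == POLICY_ON) {
            return MAP_FAILED;
        }
        return do_mmap(NULL, size, prot, base, fd, 0);
    }

    p = do_mmap(NULL, size, prot, MAP_SHARED_VALIDATE | MAP_SYNC, fd, 0);
    if (p == MAP_FAILED && policy == POLICY_AUTO) {
        /* Only "auto" retries without MAP_SYNC; "on" reports the failure. */
        p = do_mmap(NULL, size, prot, MAP_SHARED, fd, 0);
    }
    return p;
}

/* Stub standing in for a host or backend that rejects MAP_SYNC. */
static char stub_page[1];

static void *stub_mmap_no_sync(void *addr, size_t len, int prot,
                               int flags, int fd, off_t off)
{
    (void)addr; (void)len; (void)prot; (void)fd; (void)off;
    return (flags & MAP_SYNC) ? MAP_FAILED : (void *)stub_page;
}
```

Passing mmap as do_mmap gives the real behaviour; passing
stub_mmap_no_sync shows that POLICY_ON fails while POLICY_AUTO falls
back to a plain shared mapping.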
---
 include/exec/memory.h |  9 -
 util/mmap-alloc.c | 12 ++--
 2 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index 33a4e2c..c74c467 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -127,7 +127,14 @@ typedef struct IOMMUNotifier IOMMUNotifier;
 #define RAM_PMEM (1 << 5)
 
 /* RAM can be mmap by a MAP_SYNC flag */
-#define RAM_SYNC (1 << 6)
+#define RAM_SYNC_SHIFT  6
+#define RAM_SYNC_SHIFT_AUTO  7
+
+#define RAM_SYNC_ON_OFF_AUTO_ON   (1UL << RAM_SYNC_SHIFT)
+#define RAM_SYNC_ON_OFF_AUTO_OFF  (0UL << RAM_SYNC_SHIFT)
+#define RAM_SYNC_ON_OFF_AUTO_AUTO (1UL << RAM_SYNC_SHIFT_AUTO)
+
+#define RAM_SYNC (RAM_SYNC_ON_OFF_AUTO_ON | RAM_SYNC_ON_OFF_AUTO_AUTO)
 
 static inline void iommu_notifier_init(IOMMUNotifier *n, IOMMUNotify fn,
IOMMUNotifierFlag flags,
diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c
index f411df7..fe9303f 100644
--- a/util/mmap-alloc.c
+++ b/util/mmap-alloc.c
@@ -111,6 +111,10 @@ void *qemu_ram_mmap(int fd, size_t size, size_t align, uint32_t flags)
 assert(is_power_of_2(align));
 /* Always align to host page size */
 assert(align >= getpagesize());
+if ((flags & RAM_SYNC_ON_OFF_AUTO_ON) &&
+(!shared || !MAP_SYNC_FLAGS)) {
+return MAP_FAILED;
+}
 if ((flags & RAM_SYNC) && shared) {
 mmap_xflags |= MAP_SYNC_FLAGS;
 }
@@ -123,8 +127,12 @@ void *qemu_ram_mmap(int fd, size_t size, size_t align, uint32_t flags)
 (shared ? MAP_SHARED : MAP_PRIVATE) | mmap_xflags,
 fd, 0);
 if ((ptr1 == MAP_FAILED) && (mmap_xflags & MAP_SYNC_FLAGS)) {
-mmap_xflags &= ~MAP_SYNC_FLAGS;
-goto retry_mmap_fd;
+if (flags & RAM_SYNC_ON_OFF_AUTO_AUTO) {
+mmap_xflags &= ~MAP_SYNC_FLAGS;
+goto retry_mmap_fd;
+}
+munmap(ptr, total);
+return MAP_FAILED;
 }
 
 if (offset > 0) {
-- 
2.7.4




[Qemu-devel] [PATCH v5 0/7] nvdimm: support MAP_SYNC for memory-backend-file

2018-11-05 Thread Zhang Yi
Linux 4.15 introduces a new mmap flag MAP_SYNC, which can be used to
guarantee the write persistence to mmap'ed files supporting DAX (e.g.,
files on ext4/xfs file system mounted with '-o dax').

A description of MAP_SYNC and MAP_SHARED_VALIDATE can be found at
https://patchwork.kernel.org/patch/10028151/

In order to make sure that the file metadata is in sync after a fault
while we are writing to a shared, DAX-capable backend file, this
patch set enables QEMU to use the MAP_SYNC flag for memory-backend-dax-file.

Now that the DAX vs. DMA truncate issue has been solved, we have refined
the code and are sending out v5 of this feature.

A new auto on/off option 'sync' is added to memory-backend-file:
 - on:  try to pass MAP_SYNC to mmap(2); if MAP_SYNC is not supported or
'share=off', QEMU will abort
 - off: never pass MAP_SYNC to mmap(2)
 - auto (default): if MAP_SYNC is supported and 'share=on', work as if
'sync=on'; otherwise, work as if 'sync=off'

Changes in v5:
 * Add patch 1 to fix a memory leak issue.
 * Refine the patch 4-6
 * Remove the patch 3 as we already change the parameter from "shared" to
   "flags"

Changes in v4:
 * Add patch 1-3 to switch some functions to a single 'flags'
   parameter. (Michael S. Tsirkin)
 * v3 patch 1-3 become v4 patch 4-6.
 * Patch 4: move definitions of MAP_SYNC and MAP_SHARED_VALIDATE to a
   new header file under include/standard-headers/linux/. (Michael S. Tsirkin)
 * Patch 6: refine the description of the 'sync' option. (Michael S. Tsirkin)

Changes in v3:
 * Patch 1: add MAP_SHARED_VALIDATE in both sync=on and sync=auto
   cases, and add back the retry mechanism. MAP_SYNC will be ignored
   by Linux kernel 4.15 if MAP_SHARED_VALIDATE is missing.
 * Patch 1: define MAP_SYNC and MAP_SHARED_VALIDATE as 0 on non-Linux
   platforms in order to make qemu_ram_mmap() compile on those platforms.
 * Patch 2&3: include more information in error messages of
   memory-backend in the hope of helping users identify the error.
   (Dr. David Alan Gilbert)
 * Patch 3: fix typo in the commit message. (Dr. David Alan Gilbert)

Changes in v2:
 * Add 'sync' option to control the use of MAP_SYNC. (Eduardo Habkost)
 * Remove the unnecessary set of MAP_SHARED_VALIDATE in some cases and
   the retry mechanism in qemu_ram_mmap(). (Michael S. Tsirkin)
 * Move OS dependent definitions of MAP_SYNC and MAP_SHARED_VALIDATE
   to osdep.h. (Michael S. Tsirkin)

Zhang Yi (7):
  numa: Fixed the memory leak of numa error message
  util/mmap-alloc: switch qemu_ram_mmap() to 'flags' parameter
  exec: switch qemu_ram_alloc_from_{file, fd} to the 'flags' parameter
  util/mmap-alloc: support MAP_SYNC in qemu_ram_mmap()
  util/mmap-alloc: Switch the RAM_SYNC flags to OnOffAuto
  hostmem: add more information in error messages
  hostmem-file: add 'sync' option

 backends/hostmem-file.c   | 45 +--
 backends/hostmem.c|  8 ---
 docs/nvdimm.txt   | 20 +++-
 exec.c|  9 +++
 include/exec/memory.h | 18 ++
 include/exec/ram_addr.h   |  1 +
 include/qemu/mmap-alloc.h | 20 +++-
 include/standard-headers/linux/mman.h | 44 ++
 numa.c|  1 +
 qemu-options.hx   | 22 -
 util/mmap-alloc.c | 26 
 util/oslib-posix.c|  4 +++-
 12 files changed, 200 insertions(+), 18 deletions(-)
 create mode 100644 include/standard-headers/linux/mman.h

-- 
2.7.4




[Qemu-devel] [PATCH v5 2/7] util/mmap-alloc: switch qemu_ram_mmap() to 'flags' parameter

2018-11-05 Thread Zhang Yi
As more flag parameters besides the existing 'shared' are going to be
added to qemu_ram_mmap(), let's switch 'shared' to a 'flags' parameter
in advance, so as to ease the further additions.
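
One detail worth keeping in mind when moving from a bool to a flags
word is that the bool has to be converted explicitly, because the flag
need not be bit 0. The sketch below uses hypothetical EX_* constants
and helpers (not QEMU's definitions) to show the conversion in both
directions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits for illustration; not QEMU's definitions. */
#define EX_RAM_SHARED (1u << 1)
#define EX_RAM_SYNC   (1u << 6)

/* Convert the old boolean argument into the new flags word. */
static uint32_t ex_flags_from_shared(bool shared)
{
    return shared ? EX_RAM_SHARED : 0u;
}

/* Recover the boolean inside the callee. */
static bool ex_is_shared(uint32_t flags)
{
    return (flags & EX_RAM_SHARED) != 0;
}
```

With these definitions, OR-ing the raw bool into the flags word would
set bit 0 and would not be recognised by ex_is_shared(), which is why
the conversion helper is spelled out.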

Signed-off-by: Haozhong Zhang 
Signed-off-by: Zhang Yi 
---
 exec.c|  3 +--
 include/qemu/mmap-alloc.h | 19 ++-
 util/mmap-alloc.c |  8 +---
 util/oslib-posix.c|  4 +++-
 4 files changed, 27 insertions(+), 7 deletions(-)

diff --git a/exec.c b/exec.c
index bb6170d..273f668 100644
--- a/exec.c
+++ b/exec.c
@@ -1859,8 +1859,7 @@ static void *file_ram_alloc(RAMBlock *block,
 perror("ftruncate");
 }
 
-area = qemu_ram_mmap(fd, memory, block->mr->align,
- block->flags & RAM_SHARED);
+area = qemu_ram_mmap(fd, memory, block->mr->align, block->flags);
 if (area == MAP_FAILED) {
 error_setg_errno(errp, errno,
  "unable to map backing store for guest RAM");
diff --git a/include/qemu/mmap-alloc.h b/include/qemu/mmap-alloc.h
index 50385e3..6fe6ed4 100644
--- a/include/qemu/mmap-alloc.h
+++ b/include/qemu/mmap-alloc.h
@@ -7,7 +7,24 @@ size_t qemu_fd_getpagesize(int fd);
 
 size_t qemu_mempath_getpagesize(const char *mem_path);
 
-void *qemu_ram_mmap(int fd, size_t size, size_t align, bool shared);
+/**
+ * qemu_ram_mmap: mmap the specified file or device.
+ *
+ * Parameters:
+ *  @fd: the file or the device to mmap
+ *  @size: the number of bytes to be mmaped
+ *  @align: if not zero, specify the alignment of the starting mapping address;
+ *  otherwise, the alignment in use will be determined by QEMU.
+ *  @flags: specifies additional properties of the mapping, which can be one or
+ *  bit-or of following values
+ *  - RAM_SHARED: mmap with MAP_SHARED flag
+ *  Other bits are ignored.
+ *
+ * Return:
+ *  On success, return a pointer to the mapped area.
+ *  On failure, return MAP_FAILED.
+ */
+void *qemu_ram_mmap(int fd, size_t size, size_t align, uint32_t flags);
 
 void qemu_ram_munmap(void *ptr, size_t size);
 
diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c
index fd329ec..8f0a740 100644
--- a/util/mmap-alloc.c
+++ b/util/mmap-alloc.c
@@ -13,6 +13,7 @@
 #include "qemu/osdep.h"
 #include "qemu/mmap-alloc.h"
 #include "qemu/host-utils.h"
+#include "exec/memory.h"
 
 #define HUGETLBFS_MAGIC   0x958458f6
 
@@ -75,7 +76,7 @@ size_t qemu_mempath_getpagesize(const char *mem_path)
 return getpagesize();
 }
 
-void *qemu_ram_mmap(int fd, size_t size, size_t align, bool shared)
+void *qemu_ram_mmap(int fd, size_t size, size_t align, uint32_t flags)
 {
 /*
  * Note: this always allocates at least one extra page of virtual address
@@ -92,11 +93,12 @@ void *qemu_ram_mmap(int fd, size_t size, size_t align, bool shared)
  * anonymous memory is OK.
  */
 int anonfd = fd == -1 || qemu_fd_getpagesize(fd) == getpagesize() ? -1 : fd;
-int flags = anonfd == -1 ? MAP_ANONYMOUS : MAP_NORESERVE;
-void *ptr = mmap(0, total, PROT_NONE, flags | MAP_PRIVATE, anonfd, 0);
+int mmap_flags = anonfd == -1 ? MAP_ANONYMOUS : MAP_NORESERVE;
+void *ptr = mmap(0, total, PROT_NONE, mmap_flags | MAP_PRIVATE, anonfd, 0);
 #else
 void *ptr = mmap(0, total, PROT_NONE, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
 #endif
+bool shared = flags & RAM_SHARED;
 size_t offset;
 void *ptr1;
 
diff --git a/util/oslib-posix.c b/util/oslib-posix.c
index fbd0dc8..c28869d 100644
--- a/util/oslib-posix.c
+++ b/util/oslib-posix.c
@@ -203,7 +203,9 @@ void *qemu_memalign(size_t alignment, size_t size)
 void *qemu_anon_ram_alloc(size_t size, uint64_t *alignment, bool shared)
 {
 size_t align = QEMU_VMALLOC_ALIGN;
-void *ptr = qemu_ram_mmap(-1, size, align, shared);
+uint32_t flags = 0;
+flags |= shared;
+void *ptr = qemu_ram_mmap(-1, size, align, flags);
 
 if (ptr == MAP_FAILED) {
 return NULL;
-- 
2.7.4




[Qemu-devel] [PATCH v5 1/7] numa: Fixed the memory leak of numa error message

2018-11-05 Thread Zhang Yi
object_get_canonical_path_component() returns a string which
must be freed using g_free().

Signed-off-by: Zhang Yi 
---
 numa.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/numa.c b/numa.c
index 50ec016..3875e1e 100644
--- a/numa.c
+++ b/numa.c
@@ -533,6 +533,7 @@ void memory_region_allocate_system_memory(MemoryRegion *mr, Object *owner,
 error_report("memory backend %s is used multiple times. Each "
  "-numa option must use a different memdev value.",
  path);
+g_free(path);
 exit(1);
 }
 
-- 
2.7.4




[Qemu-devel] [PATCH 1/1] hostmem-file: remove the invalid pmem object id.

2018-10-24 Thread Zhang Yi
We will never get the canonical path from the object
before object_property_add_child.

Signed-off-by: Zhang Yi 
---
 backends/hostmem-file.c | 14 --
 1 file changed, 4 insertions(+), 10 deletions(-)

diff --git a/backends/hostmem-file.c b/backends/hostmem-file.c
index 639c8d4..9691c48 100644
--- a/backends/hostmem-file.c
+++ b/backends/hostmem-file.c
@@ -145,26 +145,20 @@ static void file_memory_backend_set_pmem(Object *o, bool value, Error **errp)
 HostMemoryBackendFile *fb = MEMORY_BACKEND_FILE(o);
 
 if (host_memory_backend_mr_inited(backend)) {
-char *path = object_get_canonical_path_component(o);
 
-error_setg(errp, "cannot change property 'pmem' of %s '%s'",
-   object_get_typename(o),
-   path);
-g_free(path);
+error_setg(errp, "cannot change property 'pmem' of %s.",
+   object_get_typename(o));
 return;
 }
 
 #ifndef CONFIG_LIBPMEM
 if (value) {
 Error *local_err = NULL;
-char *path = object_get_canonical_path_component(o);
 
 error_setg(&local_err,
"Lack of libpmem support while setting the 'pmem=on'"
-   " of %s '%s'. We can't ensure data persistence.",
-   object_get_typename(o),
-   path);
-g_free(path);
+   " of %s. We can't ensure data persistence.",
+   object_get_typename(o));
 error_propagate(errp, local_err);
 return;
 }
-- 
2.7.4




[Qemu-devel] [PATCH 1/1] hostmem-file: fixed the memory leak while get pmem path.

2018-08-28 Thread Zhang Yi
object_get_canonical_path_component() returns a string which
must be freed using g_free().

Reported-by: Peter Maydell 
Signed-off-by: Michael S. Tsirkin 
Signed-off-by: Zhang Yi 
---
 backends/hostmem-file.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/backends/hostmem-file.c b/backends/hostmem-file.c
index 2476dcb..e8831a8 100644
--- a/backends/hostmem-file.c
+++ b/backends/hostmem-file.c
@@ -145,20 +145,26 @@ static void file_memory_backend_set_pmem(Object *o, bool value, Error **errp)
 HostMemoryBackendFile *fb = MEMORY_BACKEND_FILE(o);
 
 if (host_memory_backend_mr_inited(backend)) {
+char *path = object_get_canonical_path_component(o);
+
 error_setg(errp, "cannot change property 'pmem' of %s '%s'",
object_get_typename(o),
-   object_get_canonical_path_component(o));
+   path);
+g_free(path);
 return;
 }
 
 #ifndef CONFIG_LIBPMEM
 if (value) {
 Error *local_err = NULL;
+char *path = object_get_canonical_path_component(o);
+
 error_setg(&local_err,
"Lack of libpmem support while setting the 'pmem=on'"
" of %s '%s'. We can't ensure data persistence.",
object_get_typename(o),
-   object_get_canonical_path_component(o));
+   path);
+g_free(path);
 error_propagate(errp, local_err);
 return;
 }
-- 
2.7.4




Re: [Qemu-devel] [RFC PATCH 1/1] nvdimm: let qemu requiring section alignment of pmem resource.

2018-06-11 Thread Zhang,Yi
On Mon, 2018-06-11 at 19:55 -0700, Dan Williams wrote:
> On Mon, Jun 11, 2018 at 9:26 AM, Stefan Hajnoczi wrote:
> > 
> > On Mon, Jun 11, 2018 at 06:54:25PM +0800, Zhang Yi wrote:
> > > 
> > > Nvdimm driver use Memory hot-plug APIs to map it's pmem resource,
> > > which at a section granularity.
> > > 
> > > When QEMU emulated the vNVDIMM device, decrease the label-storage,
> > > QEMU will put the vNVDIMMs directly next to one another in physical
> > > address space, which means that the boundary between them won't
> > > align to the 128 MB memory section size.
> > I'm having a hard time parsing this.
> > 
> > Where does the "128 MB memory section size" come from?  ACPI?
> > A chipset-specific value?
> > 
> The devm_memremap_pages() implementation uses the memory hotplug core
> to allocate the 'struct page' array/map for persistent memory. Memory
> hotplug can only be performed in terms of sections, 128MB on x86_64.
> There is some limited support for allowing devm_memremap_pages() to
> overlap 'System RAM' within a given section, but it does not currently
> support multiple devm_memremap_pages() calls overlapping within the
> same section. There is currently a kernel bug where we do not handle
> this unsupported configuration gracefully. The fix will cause
> configurations that try to overlap 2 persistent memory ranges in the
> same section to fail.
> 
> The proposed fix is trying to make sure that QEMU does not run afoul
> of this constraint.
> 
> There is currently no line of sight to reduce the minimum memory
> hotplug alignment size to less than 128M. Also, as other architectures
> outside of x86_64 add devm_memremap_pages() support, the minimum
> section alignment constraint might change and is a property of a guest
> OS. My understanding is that some guest OSes might expect an even
> larger persistent memory minimum alignment.
> 
Thanks for the explanation, Dan. I still have a question: why do we
overlap the unaligned area instead of dropping it and aligning to the
next section?
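
To make "align to the next section" concrete, here is a minimal sketch of the arithmetic, assuming the 128 MB x86_64 section size Dan mentions. SECTION_SIZE, align_up() and section_padding() are hypothetical names used only for illustration; they are not QEMU or kernel APIs.

```c
#include <assert.h>
#include <stdint.h>

/* 128 MiB, the x86_64 memory section size mentioned in this thread. */
#define SECTION_SIZE (128ULL << 20)

/* Round addr up to the next multiple of align.
 * align must be a non-zero power of two. */
static uint64_t align_up(uint64_t addr, uint64_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr + align - 1) & ~(align - 1);
}

/* Bytes that would be skipped if the next device started at the
 * following section boundary instead of directly at end_of_prev. */
static uint64_t section_padding(uint64_t end_of_prev)
{
    return align_up(end_of_prev, SECTION_SIZE) - end_of_prev;
}
```

If the previous device ends 4 KiB past a section boundary, section_padding() reports that 128 MiB minus 4 KiB of address space would be left unused before the next device.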




[Qemu-devel] [RFC PATCH 1/1] nvdimm: let qemu requiring section alignment of pmem resource.

2018-06-10 Thread Zhang Yi
The NVDIMM driver uses the memory hot-plug APIs to map its pmem
resource, and memory hot-plug works at section granularity.

When QEMU emulates vNVDIMM devices, with the label storage subtracted
from each device's size, it places the vNVDIMMs directly next to one
another in the physical address space, which means that the boundary
between them won't align to the 128 MB memory section size.

Require an alignment of at least 128 MB (NVDIMM_ALIGN_SIZE) for the
vNVDIMM memory region.

Signed-off-by: Zhang Yi 
---
 hw/mem/nvdimm.c | 2 +-
 include/hw/mem/nvdimm.h | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/hw/mem/nvdimm.c b/hw/mem/nvdimm.c
index 4087aca..ff6e171 100644
--- a/hw/mem/nvdimm.c
+++ b/hw/mem/nvdimm.c
@@ -109,7 +109,7 @@ static void nvdimm_realize(PCDIMMDevice *dimm, Error **errp)
 NVDIMMDevice *nvdimm = NVDIMM(dimm);
 uint64_t align, pmem_size, size = memory_region_size(mr);
 
-align = memory_region_get_alignment(mr);
+align = MAX(memory_region_get_alignment(mr), NVDIMM_ALIGN_SIZE);
 
 pmem_size = size - nvdimm->label_size;
 nvdimm->label_data = memory_region_get_ram_ptr(mr) + pmem_size;
diff --git a/include/hw/mem/nvdimm.h b/include/hw/mem/nvdimm.h
index 3c82751..1d384e4 100644
--- a/include/hw/mem/nvdimm.h
+++ b/include/hw/mem/nvdimm.h
@@ -41,6 +41,7 @@
  *at least 128KB in size, which holds around 1000 labels."
  */
 #define MIN_NAMESPACE_LABEL_SIZE  (128UL << 10)
+#define NVDIMM_ALIGN_SIZE  (128UL << 20)
 
 #define TYPE_NVDIMM  "nvdimm"
 #define NVDIMM(obj)  OBJECT_CHECK(NVDIMMDevice, (obj), TYPE_NVDIMM)
-- 
2.7.4