Re: [Qemu-devel] [RFC QEMU PATCH v4 03/10] hostmem-xen: add a host memory backend for Xen

2018-02-27 Thread Haozhong Zhang
On 02/27/18 16:41 +, Anthony PERARD wrote:
> On Thu, Dec 07, 2017 at 06:18:05PM +0800, Haozhong Zhang wrote:
> > diff --git a/backends/hostmem.c b/backends/hostmem.c
> > index ee2c2d5bfd..ba13a52994 100644
> > --- a/backends/hostmem.c
> > +++ b/backends/hostmem.c
> > @@ -12,6 +12,7 @@
> >  #include "qemu/osdep.h"
> >  #include "sysemu/hostmem.h"
> >  #include "hw/boards.h"
> > +#include "hw/xen/xen.h"
> >  #include "qapi/error.h"
> >  #include "qapi/visitor.h"
> >  #include "qapi-types.h"
> > @@ -277,6 +278,14 @@ host_memory_backend_memory_complete(UserCreatable *uc, 
> > Error **errp)
> >  goto out;
> >  }
> >  
> > +/*
> > + * The backend storage of MEMORY_BACKEND_XEN is managed by Xen,
> > + * so no further work in this function is needed.
> > + */
> > +if (xen_enabled() && !backend->mr.ram_block) {
> > +goto out;
> > +}
> > +
> >  ptr = memory_region_get_ram_ptr(&backend->mr);
> >  sz = memory_region_size(&backend->mr);
> >  
> > diff --git a/hw/mem/pc-dimm.c b/hw/mem/pc-dimm.c
> > index 66eace5a5c..dcbfce33d5 100644
> > --- a/hw/mem/pc-dimm.c
> > +++ b/hw/mem/pc-dimm.c
> > @@ -28,6 +28,7 @@
> >  #include "sysemu/kvm.h"
> >  #include "trace.h"
> >  #include "hw/virtio/vhost.h"
> > +#include "hw/xen/xen.h"
> >  
> >  typedef struct pc_dimms_capacity {
> >   uint64_t size;
> > @@ -108,7 +109,10 @@ void pc_dimm_memory_plug(DeviceState *dev, 
> > MemoryHotplugState *hpms,
> >  }
> >  
> >  memory_region_add_subregion(&hpms->mr, addr - hpms->base, mr);
> > -vmstate_register_ram(vmstate_mr, dev);
> > +/* memory-backend-xen is not backed by RAM. */
> > +if (!xen_enabled()) {
> 
> Is it possible to have the same condition as the one used in
> host_memory_backend_memory_complete? i.e. base on whether the memory
> region is mapped or not (backend->mr.ram_block).

Like "if (!xen_enabled() || backend->mr.ram_block)"? No, it would mute
the abort (vmstate_register_ram --> qemu_ram_set_idstr) triggered when
!backend->mr.ram_block in a non-Xen environment.

Haozhong

> 
> > +vmstate_register_ram(vmstate_mr, dev);
> > +}
> >  numa_set_mem_node_id(addr, memory_region_size(mr), dimm->node);
> >  
> >  out:
> > -- 
> > 2.15.1
> > 
> 
> -- 
> Anthony PERARD



Re: [Qemu-devel] [RFC QEMU PATCH v4 02/10] xen-hvm: create the hotplug memory region on Xen

2018-02-27 Thread Haozhong Zhang
On 02/27/18 16:37 +, Anthony PERARD wrote:
> On Thu, Dec 07, 2017 at 06:18:04PM +0800, Haozhong Zhang wrote:
> > The guest physical address of vNVDIMM is allocated from the hotplug
> > memory region, which is not created when QEMU is used as Xen device
> > model. In order to use vNVDIMM for Xen HVM domains, this commit reuses
> > the code for pc machine type to create the hotplug memory region for
> > Xen HVM domains.
> > 
> > Signed-off-by: Haozhong Zhang 
> > ---
> > Cc: "Michael S. Tsirkin" 
> > Cc: Paolo Bonzini 
> > Cc: Richard Henderson 
> > Cc: Eduardo Habkost 
> > Cc: Stefano Stabellini 
> > Cc: Anthony Perard 
> > ---
> >  hw/i386/pc.c  | 86 
> > ---
> >  hw/i386/xen/xen-hvm.c |  2 ++
> >  include/hw/i386/pc.h  |  1 +
> >  3 files changed, 51 insertions(+), 38 deletions(-)
> > 
> > diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> > index 186545d2a4..9f46c8df79 100644
> > --- a/hw/i386/pc.c
> > +++ b/hw/i386/pc.c
> > @@ -1315,6 +1315,53 @@ void xen_load_linux(PCMachineState *pcms)
> >  pcms->fw_cfg = fw_cfg;
> >  }
> >  
> > +void pc_memory_hotplug_init(PCMachineState *pcms, MemoryRegion 
> > *system_memory)
> 
> It might be better to have a separate patch which moves the code into a
> function.

Will move it to a separate patch.

> 
> > +{
> > +MachineState *machine = MACHINE(pcms);
> > +PCMachineClass *pcmc = PC_MACHINE_GET_CLASS(pcms);
> > +ram_addr_t hotplug_mem_size = machine->maxram_size - machine->ram_size;
> > +
> > +if (!pcmc->has_reserved_memory || machine->ram_size >= 
> > machine->maxram_size)
> > +return;
> > +
> > +if (memory_region_size(&pcms->hotplug_memory.mr)) {
> 
> This new check looks like it catches a programming error rather than a
> user error. Would it be better as an assert instead?

Well, this was a debugging check and I forgot to remove it before
sending the patch. I'll drop it in the next version.

Thanks,
Haozhong

> 
> > +error_report("hotplug memory region has been initialized");
> > +exit(EXIT_FAILURE);
> > +}
> > +
> 
> -- 
> Anthony PERARD
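
For readers following the quoted hunk, the sizing rule it applies can be
sketched in isolation. This is only an illustration: struct ex_machine and
ex_hotplug_mem_size() are hypothetical names standing in for the
MachineState/PCMachineClass fields used in pc_memory_hotplug_init(), and
nothing here creates an actual memory region.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the few machine fields the quoted hunk reads. */
struct ex_machine {
    uint64_t ram_size;            /* RAM present at boot */
    uint64_t maxram_size;         /* upper bound including hotplug memory */
    bool has_reserved_memory;     /* machine type supports hotplug memory */
};

/*
 * Return the size of the hotplug memory region, or 0 when no region
 * should be created (mirrors the early return in the quoted hunk).
 */
static uint64_t ex_hotplug_mem_size(const struct ex_machine *m)
{
    assert(m != NULL);
    if (!m->has_reserved_memory || m->ram_size >= m->maxram_size) {
        return 0;
    }
    return m->maxram_size - m->ram_size;
}
```

With ram_size = 4 GiB and maxram_size = 8 GiB the sketch yields a 4 GiB
region; it yields 0 as soon as either condition of the early return holds.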



[Qemu-devel] [PATCH v4 6/8] migration/ram: ensure write persistence on loading normal pages to PMEM

2018-02-27 Thread Haozhong Zhang
When loading a normal page to persistent memory, load its data with the
libpmem function pmem_memcpy_nodrain() instead of memcpy(). Combined
with a call to pmem_drain() at the end of memory loading, we can
guarantee all those normal pages are persistently loaded to PMEM.

Signed-off-by: Haozhong Zhang 
---
 include/migration/qemu-file-types.h |  2 ++
 include/qemu/pmem.h |  1 +
 migration/qemu-file.c   | 29 +++--
 migration/ram.c |  2 +-
 stubs/pmem.c|  5 +
 tests/Makefile.include  |  2 +-
 6 files changed, 29 insertions(+), 12 deletions(-)

diff --git a/include/migration/qemu-file-types.h 
b/include/migration/qemu-file-types.h
index bd6d7dd7f9..c7c3f665f9 100644
--- a/include/migration/qemu-file-types.h
+++ b/include/migration/qemu-file-types.h
@@ -33,6 +33,8 @@ void qemu_put_byte(QEMUFile *f, int v);
 void qemu_put_be16(QEMUFile *f, unsigned int v);
 void qemu_put_be32(QEMUFile *f, unsigned int v);
 void qemu_put_be64(QEMUFile *f, uint64_t v);
+size_t qemu_get_buffer_common(QEMUFile *f, uint8_t *buf, size_t size,
+  bool is_pmem);
 size_t qemu_get_buffer(QEMUFile *f, uint8_t *buf, size_t size);
 
 int qemu_get_byte(QEMUFile *f);
diff --git a/include/qemu/pmem.h b/include/qemu/pmem.h
index ce96379f3c..127b87c326 100644
--- a/include/qemu/pmem.h
+++ b/include/qemu/pmem.h
@@ -16,6 +16,7 @@
 #include <libpmem.h>
 #else  /* !CONFIG_LIBPMEM */
 
+void *pmem_memcpy_nodrain(void *pmemdest, const void *src, size_t len);
 void *pmem_memcpy_persist(void *pmemdest, const void *src, size_t len);
 void *pmem_memset_nodrain(void *pmemdest, int c, size_t len);
 void pmem_drain(void);
diff --git a/migration/qemu-file.c b/migration/qemu-file.c
index 2ab2bf362d..d19f677796 100644
--- a/migration/qemu-file.c
+++ b/migration/qemu-file.c
@@ -26,6 +26,7 @@
 #include "qemu-common.h"
 #include "qemu/error-report.h"
 #include "qemu/iov.h"
+#include "qemu/pmem.h"
 #include "migration.h"
 #include "qemu-file.h"
 #include "trace.h"
@@ -471,18 +472,13 @@ size_t qemu_peek_buffer(QEMUFile *f, uint8_t **buf, 
size_t size, size_t offset)
 return size;
 }
 
-/*
- * Read 'size' bytes of data from the file into buf.
- * 'size' can be larger than the internal buffer.
- *
- * It will return size bytes unless there was an error, in which case it will
- * return as many as it managed to read (assuming blocking fd's which
- * all current QEMUFile are)
- */
-size_t qemu_get_buffer(QEMUFile *f, uint8_t *buf, size_t size)
+size_t qemu_get_buffer_common(QEMUFile *f, uint8_t *buf, size_t size,
+  bool is_pmem)
 {
 size_t pending = size;
 size_t done = 0;
+void *(*memcpy_func)(void *d, const void *s, size_t n) =
+is_pmem ? pmem_memcpy_nodrain : memcpy;
 
 while (pending > 0) {
 size_t res;
@@ -492,7 +488,7 @@ size_t qemu_get_buffer(QEMUFile *f, uint8_t *buf, size_t 
size)
 if (res == 0) {
 return done;
 }
-memcpy(buf, src, res);
+memcpy_func(buf, src, res);
 qemu_file_skip(f, res);
 buf += res;
 pending -= res;
@@ -501,6 +497,19 @@ size_t qemu_get_buffer(QEMUFile *f, uint8_t *buf, size_t 
size)
 return done;
 }
 
+/*
+ * Read 'size' bytes of data from the file into buf.
+ * 'size' can be larger than the internal buffer.
+ *
+ * It will return size bytes unless there was an error, in which case it will
+ * return as many as it managed to read (assuming blocking fd's which
+ * all current QEMUFile are)
+ */
+size_t qemu_get_buffer(QEMUFile *f, uint8_t *buf, size_t size)
+{
+return qemu_get_buffer_common(f, buf, size, false);
+}
+
 /*
  * Read 'size' bytes of data from the file.
  * 'size' can be larger than the internal buffer.
diff --git a/migration/ram.c b/migration/ram.c
index 3904ceee79..ea2ad7dff0 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -2959,7 +2959,7 @@ static int ram_load(QEMUFile *f, void *opaque, int 
version_id)
 break;
 
 case RAM_SAVE_FLAG_PAGE:
-qemu_get_buffer(f, host, TARGET_PAGE_SIZE);
+qemu_get_buffer_common(f, host, TARGET_PAGE_SIZE, is_pmem);
 break;
 
 case RAM_SAVE_FLAG_COMPRESS_PAGE:
diff --git a/stubs/pmem.c b/stubs/pmem.c
index a65b3bfc6b..e172f31174 100644
--- a/stubs/pmem.c
+++ b/stubs/pmem.c
@@ -26,3 +26,8 @@ void *pmem_memset_nodrain(void *pmemdest, int c, size_t len)
 void pmem_drain(void)
 {
 }
+
+void *pmem_memcpy_nodrain(void *pmemdest, const void *src, size_t len)
+{
+return memcpy(pmemdest, src, len);
+}
diff --git a/tests/Makefile.include b/tests/Makefile.include
index 577eb573a2..37bb85f591 100644
--- a/tests/Makefile.include
+++ b/tests/Makefile.include
@@ -637,7 +637,7 @@ tests/test-qdev-global-props$(EXESUF): 
tests/test-qdev-global-props.o \
$(test-qapi-obj-y)
 tests/test-vmstate$(EXESUF): tests/test-vmstate.o \
migration/vmstate.o migration/vmstate-types.o mi
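
The copy-then-drain pattern described in this commit message can be sketched
without libpmem. The ex_* functions below are hypothetical stand-ins written
for this illustration (the real calls are pmem_memcpy_nodrain() and
pmem_drain()); they only copy and count, and give no durability guarantee.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Counts drains so the effect of batching is observable in this sketch. */
static unsigned ex_drain_calls;

/* Stand-in for pmem_memcpy_nodrain(): copy without waiting for durability. */
static void *ex_memcpy_nodrain(void *dst, const void *src, size_t len)
{
    return memcpy(dst, src, len);
}

/* Stand-in for pmem_drain(): wait for earlier nodrain stores. */
static void ex_drain(void)
{
    ex_drain_calls++;
}

/*
 * Copy 'size' bytes in pieces of at most 'chunk' bytes, choosing the copy
 * routine once up front, as qemu_get_buffer_common() does in the patch.
 */
static size_t ex_copy_chunked(unsigned char *dst, const unsigned char *src,
                              size_t size, size_t chunk, bool is_pmem)
{
    void *(*copy_fn)(void *, const void *, size_t) =
        is_pmem ? ex_memcpy_nodrain : memcpy;
    size_t done = 0;

    assert(chunk > 0);
    while (done < size) {
        size_t n = size - done < chunk ? size - done : chunk;
        copy_fn(dst + done, src + done, n);
        done += n;
    }
    return done;
}

/* Load several pages, then drain once if they went to PMEM. */
static void ex_load_pages(unsigned char *dst, const unsigned char *src,
                          size_t npages, size_t page_size, bool is_pmem)
{
    for (size_t i = 0; i < npages; i++) {
        ex_copy_chunked(dst + i * page_size, src + i * page_size,
                        page_size, 16, is_pmem);
    }
    if (is_pmem) {
        ex_drain();
    }
}
```

The point of the design is that the drain cost is paid once per load rather
than once per page or per chunk.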

Re: [Qemu-devel] [PATCH v4 0/8] nvdimm: guarantee persistence of QEMU writes to persistent memory

2018-02-27 Thread Haozhong Zhang
On 02/28/18 15:25 +0800, Haozhong Zhang wrote:
> QEMU writes to vNVDIMM backends in the vNVDIMM label emulation and
> live migration. If the backend is on persistent memory, QEMU needs
> to take proper operations to ensure its writes are persistent on the
> persistent memory. Otherwise, a host power failure may result in the
> loss of the guest data on the persistent memory.
>


> This v3 patch series is based on Marcel's patch "mem: add share
> parameter to memory-backend-ram" [1] because of the changes in patch 1.
> 
> [1] https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg03858.html

I forgot to remove this part. v4 can be applied on the current master
branch now because above [1] has already been merged.



[Qemu-devel] [PATCH v4 4/8] mem/nvdimm: ensure write persistence to PMEM in label emulation

2018-02-27 Thread Haozhong Zhang
Guest writes to vNVDIMM labels are intercepted and performed on the
backend by QEMU. When the backend is a real persistent memory, QEMU
needs to take proper operations to ensure its write persistence on the
persistent memory. Otherwise, a host power failure may result in the
loss of guest label configurations.

Signed-off-by: Haozhong Zhang 
---
 hw/mem/nvdimm.c |  9 -
 include/qemu/pmem.h | 23 +++
 stubs/Makefile.objs |  1 +
 stubs/pmem.c| 19 +++
 4 files changed, 51 insertions(+), 1 deletion(-)
 create mode 100644 include/qemu/pmem.h
 create mode 100644 stubs/pmem.c

diff --git a/hw/mem/nvdimm.c b/hw/mem/nvdimm.c
index 61e677f92f..18861d1a7a 100644
--- a/hw/mem/nvdimm.c
+++ b/hw/mem/nvdimm.c
@@ -23,6 +23,7 @@
  */
 
 #include "qemu/osdep.h"
+#include "qemu/pmem.h"
 #include "qapi/error.h"
 #include "qapi/visitor.h"
 #include "qapi-visit.h"
@@ -156,11 +157,17 @@ static void nvdimm_write_label_data(NVDIMMDevice *nvdimm, 
const void *buf,
 {
 MemoryRegion *mr;
 PCDIMMDevice *dimm = PC_DIMM(nvdimm);
+bool is_pmem = object_property_get_bool(OBJECT(dimm->hostmem),
+"pmem", NULL);
 uint64_t backend_offset;
 
 nvdimm_validate_rw_label_data(nvdimm, size, offset);
 
-memcpy(nvdimm->label_data + offset, buf, size);
+if (!is_pmem) {
+memcpy(nvdimm->label_data + offset, buf, size);
+} else {
+pmem_memcpy_persist(nvdimm->label_data + offset, buf, size);
+}
 
 mr = host_memory_backend_get_memory(dimm->hostmem, &error_abort);
 backend_offset = memory_region_size(mr) - nvdimm->label_size + offset;
diff --git a/include/qemu/pmem.h b/include/qemu/pmem.h
new file mode 100644
index 00..16f5b2653a
--- /dev/null
+++ b/include/qemu/pmem.h
@@ -0,0 +1,23 @@
+/*
+ * QEMU header file for libpmem.
+ *
+ * Copyright (c) 2018 Intel Corporation.
+ *
+ * Author: Haozhong Zhang 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#ifndef QEMU_PMEM_H
+#define QEMU_PMEM_H
+
+#ifdef CONFIG_LIBPMEM
+#include <libpmem.h>
+#else  /* !CONFIG_LIBPMEM */
+
+void *pmem_memcpy_persist(void *pmemdest, const void *src, size_t len);
+
+#endif /* CONFIG_LIBPMEM */
+
+#endif /* !QEMU_PMEM_H */
diff --git a/stubs/Makefile.objs b/stubs/Makefile.objs
index 2d59d84091..ba944b9739 100644
--- a/stubs/Makefile.objs
+++ b/stubs/Makefile.objs
@@ -43,3 +43,4 @@ stub-obj-y += xen-common.o
 stub-obj-y += xen-hvm.o
 stub-obj-y += pci-host-piix.o
 stub-obj-y += ram-block.o
+stub-obj-$(call lnot,$(CONFIG_LIBPMEM)) += pmem.o
\ No newline at end of file
diff --git a/stubs/pmem.c b/stubs/pmem.c
new file mode 100644
index 00..03d990e571
--- /dev/null
+++ b/stubs/pmem.c
@@ -0,0 +1,19 @@
+/*
+ * Stubs for libpmem.
+ *
+ * Copyright (c) 2018 Intel Corporation.
+ *
+ * Author: Haozhong Zhang 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include 
+
+#include "qemu/pmem.h"
+
+void *pmem_memcpy_persist(void *pmemdest, const void *src, size_t len)
+{
+return memcpy(pmemdest, src, len);
+}
-- 
2.14.1
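
The header-plus-stub arrangement in this patch can be condensed into one
self-contained file for illustration. In the sketch below the fallback is a
static function in the same file, whereas the patch puts it in stubs/pmem.c;
struct ex_label_area and ex_write_label() are hypothetical names, not QEMU
API. Without CONFIG_LIBPMEM the fallback is a plain memcpy() and gives no
persistence guarantee.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#ifdef CONFIG_LIBPMEM
#include <libpmem.h>              /* real pmem_memcpy_persist() */
#else
/* Fallback with the same signature; gives no durability guarantee. */
static void *pmem_memcpy_persist(void *pmemdest, const void *src, size_t len)
{
    return memcpy(pmemdest, src, len);
}
#endif

/* Hypothetical label area; the real one lives in NVDIMMDevice. */
struct ex_label_area {
    unsigned char *data;
    size_t size;
    bool is_pmem;                 /* backend is real persistent memory */
};

/* Write 'len' bytes at 'offset', using the persistent copy when needed. */
static void ex_write_label(struct ex_label_area *la, const void *buf,
                           size_t len, size_t offset)
{
    assert(offset <= la->size && len <= la->size - offset);
    if (la->is_pmem) {
        pmem_memcpy_persist(la->data + offset, buf, len);
    } else {
        memcpy(la->data + offset, buf, len);
    }
}
```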




[Qemu-devel] [PATCH v4 7/8] migration/ram: ensure write persistence on loading compressed pages to PMEM

2018-02-27 Thread Haozhong Zhang
When loading a compressed page to persistent memory, flush CPU cache
after the data is decompressed. Combined with a call to pmem_drain()
at the end of memory loading, we can guarantee those compressed pages
are persistently loaded to PMEM.

Signed-off-by: Haozhong Zhang 
---
 include/qemu/pmem.h |  1 +
 migration/ram.c | 16 +++-
 stubs/pmem.c|  4 
 3 files changed, 16 insertions(+), 5 deletions(-)

diff --git a/include/qemu/pmem.h b/include/qemu/pmem.h
index 127b87c326..120439ecb8 100644
--- a/include/qemu/pmem.h
+++ b/include/qemu/pmem.h
@@ -20,6 +20,7 @@ void *pmem_memcpy_nodrain(void *pmemdest, const void *src, 
size_t len);
 void *pmem_memcpy_persist(void *pmemdest, const void *src, size_t len);
 void *pmem_memset_nodrain(void *pmemdest, int c, size_t len);
 void pmem_drain(void);
+void pmem_flush(const void *addr, size_t len);
 
 #endif /* CONFIG_LIBPMEM */
 
diff --git a/migration/ram.c b/migration/ram.c
index ea2ad7dff0..37f3c39cee 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -276,6 +276,7 @@ struct DecompressParam {
 void *des;
 uint8_t *compbuf;
 int len;
+bool is_pmem;
 };
 typedef struct DecompressParam DecompressParam;
 
@@ -2498,7 +2499,7 @@ static void *do_data_decompress(void *opaque)
 DecompressParam *param = opaque;
 unsigned long pagesize;
 uint8_t *des;
-int len;
+int len, rc;
 
 qemu_mutex_lock(&param->mutex);
 while (!param->quit) {
@@ -2514,8 +2515,11 @@ static void *do_data_decompress(void *opaque)
  * not a problem because the dirty page will be retransferred
  * and uncompress() won't break the data in other pages.
  */
-uncompress((Bytef *)des, &pagesize,
-   (const Bytef *)param->compbuf, len);
+rc = uncompress((Bytef *)des, &pagesize,
+(const Bytef *)param->compbuf, len);
+if (rc == Z_OK && param->is_pmem) {
+pmem_flush(des, len);
+}
 
 qemu_mutex_lock(&decomp_done_lock);
 param->done = true;
@@ -2601,7 +2605,8 @@ static void compress_threads_load_cleanup(void)
 }
 
 static void decompress_data_with_multi_threads(QEMUFile *f,
-   void *host, int len)
+   void *host, int len,
+   bool is_pmem)
 {
 int idx, thread_count;
 
@@ -2615,6 +2620,7 @@ static void decompress_data_with_multi_threads(QEMUFile 
*f,
 qemu_get_buffer(f, decomp_param[idx].compbuf, len);
 decomp_param[idx].des = host;
 decomp_param[idx].len = len;
+decomp_param[idx].is_pmem = is_pmem;
 qemu_cond_signal(&decomp_param[idx].cond);
 qemu_mutex_unlock(&decomp_param[idx].mutex);
 break;
@@ -2969,7 +2975,7 @@ static int ram_load(QEMUFile *f, void *opaque, int 
version_id)
 ret = -EINVAL;
 break;
 }
-decompress_data_with_multi_threads(f, host, len);
+decompress_data_with_multi_threads(f, host, len, is_pmem);
 break;
 
 case RAM_SAVE_FLAG_XBZRLE:
diff --git a/stubs/pmem.c b/stubs/pmem.c
index e172f31174..cfab830131 100644
--- a/stubs/pmem.c
+++ b/stubs/pmem.c
@@ -31,3 +31,7 @@ void *pmem_memcpy_nodrain(void *pmemdest, const void *src, 
size_t len)
 {
 return memcpy(pmemdest, src, len);
 }
+
+void pmem_flush(const void *addr, size_t len)
+{
+}
-- 
2.14.1




[Qemu-devel] [PATCH v4 8/8] migration/ram: ensure write persistence on loading xbzrle pages to PMEM

2018-02-27 Thread Haozhong Zhang
When loading an xbzrle encoded page to persistent memory, load the data
via the libpmem function pmem_memcpy_nodrain() instead of memcpy().
Combined with a call to pmem_drain() at the end of memory loading, we
can guarantee those xbzrle encoded pages are persistently loaded to PMEM.

Signed-off-by: Haozhong Zhang 
---
 migration/ram.c| 6 +++---
 migration/xbzrle.c | 8 ++--
 migration/xbzrle.h | 3 ++-
 tests/Makefile.include | 2 +-
 tests/test-xbzrle.c| 4 ++--
 5 files changed, 14 insertions(+), 9 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index 37f3c39cee..70b196c4f5 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -2391,7 +2391,7 @@ static void ram_save_pending(QEMUFile *f, void *opaque, 
uint64_t max_size,
 }
 }
 
-static int load_xbzrle(QEMUFile *f, ram_addr_t addr, void *host)
+static int load_xbzrle(QEMUFile *f, ram_addr_t addr, void *host, bool is_pmem)
 {
 unsigned int xh_len;
 int xh_flags;
@@ -2417,7 +2417,7 @@ static int load_xbzrle(QEMUFile *f, ram_addr_t addr, void 
*host)
 
 /* decode RLE */
 if (xbzrle_decode_buffer(loaded_data, xh_len, host,
- TARGET_PAGE_SIZE) == -1) {
+ TARGET_PAGE_SIZE, is_pmem) == -1) {
 error_report("Failed to load XBZRLE page - decode error!");
 return -1;
 }
@@ -2979,7 +2979,7 @@ static int ram_load(QEMUFile *f, void *opaque, int 
version_id)
 break;
 
 case RAM_SAVE_FLAG_XBZRLE:
-if (load_xbzrle(f, addr, host) < 0) {
+if (load_xbzrle(f, addr, host, is_pmem) < 0) {
 error_report("Failed to decompress XBZRLE page at "
  RAM_ADDR_FMT, addr);
 ret = -EINVAL;
diff --git a/migration/xbzrle.c b/migration/xbzrle.c
index 1ba482ded9..ca713c3697 100644
--- a/migration/xbzrle.c
+++ b/migration/xbzrle.c
@@ -12,6 +12,7 @@
  */
 #include "qemu/osdep.h"
 #include "qemu/cutils.h"
+#include "qemu/pmem.h"
 #include "xbzrle.h"
 
 /*
@@ -126,11 +127,14 @@ int xbzrle_encode_buffer(uint8_t *old_buf, uint8_t 
*new_buf, int slen,
 return d;
 }
 
-int xbzrle_decode_buffer(uint8_t *src, int slen, uint8_t *dst, int dlen)
+int xbzrle_decode_buffer(uint8_t *src, int slen, uint8_t *dst, int dlen,
+ bool is_pmem)
 {
 int i = 0, d = 0;
 int ret;
 uint32_t count = 0;
+void *(*memcpy_func)(void *d, const void *s, size_t n) =
+is_pmem ? pmem_memcpy_nodrain : memcpy;
 
 while (i < slen) {
 
@@ -167,7 +171,7 @@ int xbzrle_decode_buffer(uint8_t *src, int slen, uint8_t 
*dst, int dlen)
 return -1;
 }
 
-memcpy(dst + d, src + i, count);
+memcpy_func(dst + d, src + i, count);
 d += count;
 i += count;
 }
diff --git a/migration/xbzrle.h b/migration/xbzrle.h
index a0db507b9c..f18f679f47 100644
--- a/migration/xbzrle.h
+++ b/migration/xbzrle.h
@@ -17,5 +17,6 @@
 int xbzrle_encode_buffer(uint8_t *old_buf, uint8_t *new_buf, int slen,
  uint8_t *dst, int dlen);
 
-int xbzrle_decode_buffer(uint8_t *src, int slen, uint8_t *dst, int dlen);
+int xbzrle_decode_buffer(uint8_t *src, int slen, uint8_t *dst, int dlen,
+ bool is_pmem);
 #endif
diff --git a/tests/Makefile.include b/tests/Makefile.include
index 37bb85f591..be5b7e484b 100644
--- a/tests/Makefile.include
+++ b/tests/Makefile.include
@@ -616,7 +616,7 @@ tests/test-thread-pool$(EXESUF): tests/test-thread-pool.o 
$(test-block-obj-y)
 tests/test-iov$(EXESUF): tests/test-iov.o $(test-util-obj-y)
 tests/test-hbitmap$(EXESUF): tests/test-hbitmap.o $(test-util-obj-y) 
$(test-crypto-obj-y)
 tests/test-x86-cpuid$(EXESUF): tests/test-x86-cpuid.o
-tests/test-xbzrle$(EXESUF): tests/test-xbzrle.o migration/xbzrle.o 
migration/page_cache.o $(test-util-obj-y)
+tests/test-xbzrle$(EXESUF): tests/test-xbzrle.o migration/xbzrle.o 
migration/page_cache.o stubs/pmem.o $(test-util-obj-y)
 tests/test-cutils$(EXESUF): tests/test-cutils.o util/cutils.o 
$(test-util-obj-y)
 tests/test-int128$(EXESUF): tests/test-int128.o
 tests/rcutorture$(EXESUF): tests/rcutorture.o $(test-util-obj-y)
diff --git a/tests/test-xbzrle.c b/tests/test-xbzrle.c
index f5e08de91e..9afa0c4bcb 100644
--- a/tests/test-xbzrle.c
+++ b/tests/test-xbzrle.c
@@ -101,7 +101,7 @@ static void test_encode_decode_1_byte(void)
PAGE_SIZE);
 g_assert(dlen == (uleb128_encode_small(&buf[0], 4095) + 2));
 
-rc = xbzrle_decode_buffer(compressed, dlen, buffer, PAGE_SIZE);
+rc = xbzrle_decode_buffer(compressed, dlen, buffer, PAGE_SIZE, false);
 g_assert(rc == PAGE_SIZE);
 g_assert(memcmp(test, buffer, PAGE_SIZE) == 0);
 
@@ -156,7 +156,7 @@ static void encode_decode_range(void)
 dlen = xbzrle_encode_buffer(test, buffer, PAGE_SIZE, compressed,
 PAGE_SIZE);
 
-rc = xbzrle_decode_buffer(compressed, dlen, test, PAGE_SIZE);
+rc = xbzrle_decode_buffer(compres

[Qemu-devel] [PATCH v4 3/8] configure: add libpmem support

2018-02-27 Thread Haozhong Zhang
Add a pair of configure options --{enable,disable}-libpmem to control
whether QEMU is compiled with PMDK libpmem [1].

QEMU may write to the host persistent memory (e.g. in vNVDIMM label
emulation and live migration), so it must take the proper operations
to ensure the persistence of its own writes. Depending on the CPU
models and available instructions, the optimal operation can vary [2].
PMDK libpmem has already implemented those operations on multiple CPU
models (x86 and ARM) and the logic to select the optimal ones, so QEMU
can just use libpmem rather than re-implement them.

[1] PMDK (formerly known as NVML), https://github.com/pmem/pmdk/
[2] https://github.com/pmem/pmdk/blob/38bfa652721a37fd94c0130ce0e3f5d8baa3ed40/src/libpmem/pmem.c#L33

Signed-off-by: Haozhong Zhang 
---
 configure | 35 +++
 1 file changed, 35 insertions(+)

diff --git a/configure b/configure
index 39f3a43001..78e10f6d6d 100755
--- a/configure
+++ b/configure
@@ -450,6 +450,7 @@ jemalloc="no"
 replication="yes"
 vxhs=""
 libxml2=""
+libpmem=""
 
 supported_cpu="no"
 supported_os="no"
@@ -1360,6 +1361,10 @@ for opt do
   ;;
   --disable-git-update) git_update=no
   ;;
+  --enable-libpmem) libpmem=yes
+  ;;
+  --disable-libpmem) libpmem=no
+  ;;
   *)
   echo "ERROR: unknown option $opt"
   echo "Try '$0 --help' for more information"
@@ -1612,6 +1617,7 @@ disabled with --disable-FEATURE, default is enabled if 
available:
   crypto-afalg    Linux AF_ALG crypto backend driver
   vhost-user      vhost-user support
   capstone        capstone disassembler support
+  libpmem         libpmem support
 
 NOTE: The object files are built at the place where configure is launched
 EOF
@@ -5347,6 +5353,30 @@ EOF
   fi
 fi
 
+##
+# check for libpmem
+
+if test "$libpmem" != "no"; then
+  cat > $TMPC <<EOF
+#include <libpmem.h>
+int main(void)
+{
+  pmem_is_pmem(0, 0);
+  return 0;
+}
+EOF
+  libpmem_libs="-lpmem"
+  if compile_prog "" "$libpmem_libs" ; then
+libs_softmmu="$libpmem_libs $libs_softmmu"
+libpmem="yes"
+  else
+if test "$libpmem" = "yes" ; then
+  feature_not_found "libpmem" "Install nvml or pmdk"
+fi
+libpmem="no"
+  fi
+fi
+
 ##
 # End of CC checks
 # After here, no more $cc or $ld runs
@@ -5817,6 +5847,7 @@ echo "avx2 optimization $avx2_opt"
 echo "replication support $replication"
 echo "VxHS block device $vxhs"
 echo "capstone  $capstone"
+echo "libpmem support   $libpmem"
 
 if test "$sdl_too_old" = "yes"; then
 echo "-> Your SDL version is too old - please upgrade to have SDL support"
@@ -6542,6 +6573,10 @@ if test "$vxhs" = "yes" ; then
   echo "VXHS_LIBS=$vxhs_libs" >> $config_host_mak
 fi
 
+if test "$libpmem" = "yes" ; then
+  echo "CONFIG_LIBPMEM=y" >> $config_host_mak
+fi
+
 if test "$tcg_interpreter" = "yes"; then
   QEMU_INCLUDES="-I\$(SRC_PATH)/tcg/tci $QEMU_INCLUDES"
 elif test "$ARCH" = "sparc64" ; then
-- 
2.14.1




[Qemu-devel] [PATCH v4 5/8] migration/ram: ensure write persistence on loading zero pages to PMEM

2018-02-27 Thread Haozhong Zhang
When loading a zero page, check whether it will be loaded to
persistent memory. If yes, load it with the libpmem function
pmem_memset_nodrain().  Combined with a call to pmem_drain() at the
end of RAM loading, we can guarantee all those zero pages are
persistently loaded.

Depending on the host HW/SW configuration, pmem_drain() can be
"sfence".  Therefore, to avoid unnecessary overhead, we neither call
pmem_drain() after each pmem_memset_nodrain() nor use
pmem_memset_persist() (equivalent to pmem_memset_nodrain() +
pmem_drain()).

Signed-off-by: Haozhong Zhang 
---
 include/qemu/pmem.h |  2 ++
 migration/ram.c | 25 +
 migration/ram.h |  2 +-
 migration/rdma.c|  2 +-
 stubs/pmem.c|  9 +
 5 files changed, 34 insertions(+), 6 deletions(-)

diff --git a/include/qemu/pmem.h b/include/qemu/pmem.h
index 16f5b2653a..ce96379f3c 100644
--- a/include/qemu/pmem.h
+++ b/include/qemu/pmem.h
@@ -17,6 +17,8 @@
 #else  /* !CONFIG_LIBPMEM */
 
 void *pmem_memcpy_persist(void *pmemdest, const void *src, size_t len);
+void *pmem_memset_nodrain(void *pmemdest, int c, size_t len);
+void pmem_drain(void);
 
 #endif /* CONFIG_LIBPMEM */
 
diff --git a/migration/ram.c b/migration/ram.c
index 5e33e5cc79..3904ceee79 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -51,6 +51,7 @@
 #include "qemu/rcu_queue.h"
 #include "migration/colo.h"
 #include "migration/block.h"
+#include "qemu/pmem.h"
 
 /***/
 /* ram save/restore */
@@ -2479,11 +2480,16 @@ static inline void *host_from_ram_block_offset(RAMBlock 
*block,
  * @host: host address for the zero page
  * @ch: what the page is filled from.  We only support zero
  * @size: size of the zero page
+ * @is_pmem: whether @host is in the persistent memory
  */
-void ram_handle_compressed(void *host, uint8_t ch, uint64_t size)
+void ram_handle_compressed(void *host, uint8_t ch, uint64_t size, bool is_pmem)
 {
 if (ch != 0 || !is_zero_range(host, size)) {
-memset(host, ch, size);
+if (!is_pmem) {
+memset(host, ch, size);
+} else {
+pmem_memset_nodrain(host, ch, size);
+}
 }
 }
 
@@ -2839,6 +2845,7 @@ static int ram_load(QEMUFile *f, void *opaque, int 
version_id)
 bool postcopy_running = postcopy_is_running();
 /* ADVISE is earlier, it shows the source has the postcopy capability on */
 bool postcopy_advised = postcopy_is_advised();
+bool need_pmem_drain = false;
 
 seq_iter++;
 
@@ -2864,6 +2871,8 @@ static int ram_load(QEMUFile *f, void *opaque, int 
version_id)
 ram_addr_t addr, total_ram_bytes;
 void *host = NULL;
 uint8_t ch;
+RAMBlock *block = NULL;
+bool is_pmem = false;
 
 addr = qemu_get_be64(f);
 flags = addr & ~TARGET_PAGE_MASK;
@@ -2880,7 +2889,7 @@ static int ram_load(QEMUFile *f, void *opaque, int 
version_id)
 
 if (flags & (RAM_SAVE_FLAG_ZERO | RAM_SAVE_FLAG_PAGE |
  RAM_SAVE_FLAG_COMPRESS_PAGE | RAM_SAVE_FLAG_XBZRLE)) {
-RAMBlock *block = ram_block_from_stream(f, flags);
+block = ram_block_from_stream(f, flags);
 
 host = host_from_ram_block_offset(block, addr);
 if (!host) {
@@ -2890,6 +2899,9 @@ static int ram_load(QEMUFile *f, void *opaque, int 
version_id)
 }
 ramblock_recv_bitmap_set(block, host);
 trace_ram_load_loop(block->idstr, (uint64_t)addr, flags, host);
+
+is_pmem = ramblock_is_pmem(block);
+need_pmem_drain = need_pmem_drain || is_pmem;
 }
 
 switch (flags & ~RAM_SAVE_FLAG_CONTINUE) {
@@ -2943,7 +2955,7 @@ static int ram_load(QEMUFile *f, void *opaque, int 
version_id)
 
 case RAM_SAVE_FLAG_ZERO:
 ch = qemu_get_byte(f);
-ram_handle_compressed(host, ch, TARGET_PAGE_SIZE);
+ram_handle_compressed(host, ch, TARGET_PAGE_SIZE, is_pmem);
 break;
 
 case RAM_SAVE_FLAG_PAGE:
@@ -2986,6 +2998,11 @@ static int ram_load(QEMUFile *f, void *opaque, int 
version_id)
 }
 
 wait_for_decompress_done();
+
+if (need_pmem_drain) {
+pmem_drain();
+}
+
 rcu_read_unlock();
 trace_ram_load_complete(ret, seq_iter);
 return ret;
diff --git a/migration/ram.h b/migration/ram.h
index f3a227b4fc..18934ae9e4 100644
--- a/migration/ram.h
+++ b/migration/ram.h
@@ -57,7 +57,7 @@ int ram_postcopy_send_discard_bitmap(MigrationState *ms);
 int ram_discard_range(const char *block_name, uint64_t start, size_t length);
 int ram_postcopy_incoming_init(MigrationIncomingState *mis);
 
-void ram_handle_compressed(void *host, uint8_t ch, uint64_t size);
+void ram_handle_compressed(void *host, uint8_t ch, uint64_t size, bool 
is_pmem);
 
 int ramblock_recv_bitmap_test(RAMBlock *rb, void *host_addr);
 void ramblock_recv_bitmap_set(RAMBlock *rb, void *host_addr);
diff --git a/migration/rdma.c b/migration/rdma.c
index da4
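
The zero-page handling and the deferred drain in this patch can be sketched
as follows. All ex_* names are hypothetical stand-ins written for this
illustration: ex_memset_nodrain() and ex_drain() replace the libpmem calls,
ex_is_zero() replaces QEMU's is_zero_range(), and struct ex_page replaces
the RAMBlock lookup. The sketch shows only the control flow, not real
persistence.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

static unsigned ex_drain_calls;   /* observable only in this sketch */

/* Stand-in for pmem_memset_nodrain(). */
static void *ex_memset_nodrain(void *dst, int c, size_t len)
{
    return memset(dst, c, len);
}

/* Stand-in for pmem_drain(). */
static void ex_drain(void)
{
    ex_drain_calls++;
}

/* Simple replacement for QEMU's is_zero_range(). */
static bool ex_is_zero(const unsigned char *p, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (p[i] != 0) {
            return false;
        }
    }
    return true;
}

/* Fill one page, skipping the write when it is already all zeroes. */
static void ex_fill_page(unsigned char *host, unsigned char ch,
                         size_t size, bool is_pmem)
{
    if (ch != 0 || !ex_is_zero(host, size)) {
        if (is_pmem) {
            ex_memset_nodrain(host, ch, size);
        } else {
            memset(host, ch, size);
        }
    }
}

struct ex_page {
    unsigned char *host;          /* destination of the page */
    bool is_pmem;                 /* destination is persistent memory */
};

/* Zero every page, remember whether any was PMEM, and drain once. */
static void ex_load_zero_pages(const struct ex_page *pages, size_t n,
                               size_t page_size)
{
    bool need_drain = false;

    assert(pages != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        ex_fill_page(pages[i].host, 0, page_size, pages[i].is_pmem);
        need_drain = need_drain || pages[i].is_pmem;
    }
    if (need_drain) {
        ex_drain();
    }
}
```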

[Qemu-devel] [PATCH v4 1/8] memory, exec: switch file ram allocation functions to 'flags' parameters

2018-02-27 Thread Haozhong Zhang
More flag parameters besides the existing 'share' are going to be
added to the following functions:
    memory_region_init_ram_from_file
    qemu_ram_alloc_from_fd
    qemu_ram_alloc_from_file
Switch them to a 'flags' parameter to ease future flag additions.

The existing 'share' flag is converted to the QEMU_RAM_SHARE bit in
flags, and other flag bits are currently ignored by the above functions.

Signed-off-by: Haozhong Zhang 
---
 backends/hostmem-file.c |  3 ++-
 exec.c  |  7 ---
 include/exec/memory.h   | 10 --
 include/exec/ram_addr.h | 25 +++--
 memory.c|  8 +---
 numa.c  |  2 +-
 6 files changed, 43 insertions(+), 12 deletions(-)

diff --git a/backends/hostmem-file.c b/backends/hostmem-file.c
index 134b08d63a..30df843d90 100644
--- a/backends/hostmem-file.c
+++ b/backends/hostmem-file.c
@@ -58,7 +58,8 @@ file_backend_memory_alloc(HostMemoryBackend *backend, Error 
**errp)
 path = object_get_canonical_path(OBJECT(backend));
 memory_region_init_ram_from_file(&backend->mr, OBJECT(backend),
  path,
- backend->size, fb->align, backend->share,
+ backend->size, fb->align,
+ backend->share ? QEMU_RAM_SHARE : 0,
  fb->mem_path, errp);
 g_free(path);
 }
diff --git a/exec.c b/exec.c
index 4d8addb263..537bf12412 100644
--- a/exec.c
+++ b/exec.c
@@ -2000,12 +2000,13 @@ static void ram_block_add(RAMBlock *new_block, Error 
**errp, bool shared)
 
 #ifdef __linux__
 RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr,
- bool share, int fd,
+ uint64_t flags, int fd,
  Error **errp)
 {
 RAMBlock *new_block;
 Error *local_err = NULL;
 int64_t file_size;
+bool share = flags & QEMU_RAM_SHARE;
 
 if (xen_enabled()) {
 error_setg(errp, "-mem-path not supported with Xen");
@@ -2061,7 +2062,7 @@ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, 
MemoryRegion *mr,
 
 
 RAMBlock *qemu_ram_alloc_from_file(ram_addr_t size, MemoryRegion *mr,
-   bool share, const char *mem_path,
+   uint64_t flags, const char *mem_path,
Error **errp)
 {
 int fd;
@@ -2073,7 +2074,7 @@ RAMBlock *qemu_ram_alloc_from_file(ram_addr_t size, 
MemoryRegion *mr,
 return NULL;
 }
 
-block = qemu_ram_alloc_from_fd(size, mr, share, fd, errp);
+block = qemu_ram_alloc_from_fd(size, mr, flags, fd, errp);
 if (!block) {
 if (created) {
 unlink(mem_path);
diff --git a/include/exec/memory.h b/include/exec/memory.h
index 15e81113ba..0fc9d23a48 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -487,6 +487,9 @@ void memory_region_init_resizeable_ram(MemoryRegion *mr,
void *host),
Error **errp);
 #ifdef __linux__
+
+#define QEMU_RAM_SHARE  (1UL << 0)
+
 /**
  * memory_region_init_ram_from_file:  Initialize RAM memory region with a
  *mmap-ed backend.
@@ -498,7 +501,10 @@ void memory_region_init_resizeable_ram(MemoryRegion *mr,
  * @size: size of the region.
  * @align: alignment of the region base address; if 0, the default alignment
  * (getpagesize()) will be used.
- * @share: %true if memory must be mmaped with the MAP_SHARED flag
+ * @flags: specify properties of this memory region, which can be one or bit-or
+ * of following values:
+ * - QEMU_RAM_SHARE: memory must be mmaped with the MAP_SHARED flag
+ * Other bits are ignored.
  * @path: the path in which to allocate the RAM.
  * @errp: pointer to Error*, to store an error if it happens.
  *
@@ -510,7 +516,7 @@ void memory_region_init_ram_from_file(MemoryRegion *mr,
   const char *name,
   uint64_t size,
   uint64_t align,
-  bool share,
+  uint64_t flags,
   const char *path,
   Error **errp);
 
diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h
index cf2446a176..b8b01d1eb9 100644
--- a/include/exec/ram_addr.h
+++ b/include/exec/ram_addr.h
@@ -72,12 +72,33 @@ static inline unsigned long int ramblock_recv_bitmap_offset(void *host_addr,
 
 long qemu_getrampagesize(void);
 unsigned long last_ram_page(void);
+
+/**
+ * qemu_ram_alloc_from_file,
+ * qemu_ram_alloc_from_fd:  Allocate a ram block from the specified back
+ *  file or device
+ *
+ * Parameters:
+ *  @size: the size in bytes 

[Qemu-devel] [PATCH v4 2/8] hostmem-file: add the 'pmem' option

2018-02-27 Thread Haozhong Zhang
When QEMU emulates vNVDIMM labels and migrates vNVDIMM devices, it
needs to know whether the backend storage is real persistent memory,
in order to decide whether special operations should be performed to
ensure data persistence.

This boolean option 'pmem' allows users to specify whether the backend
storage of memory-backend-file is real persistent memory. If
'pmem=on', QEMU will set the flag RAM_PMEM in the RAM block of the
corresponding memory region.
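
The path from the 'pmem' property to the RAM block flag can be sketched
roughly as below. This is only an illustration, not the patch itself:
QEMU_RAM_SHARE and RAM_PMEM use the values shown in this series, while
the bit chosen for QEMU_RAM_PMEM and the value of RAM_SHARED are
assumptions, and both helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Caller-facing flags. QEMU_RAM_SHARE matches patch 1; the bit used
 * for QEMU_RAM_PMEM is an assumption made for this sketch. */
#define QEMU_RAM_SHARE (1UL << 0)
#define QEMU_RAM_PMEM  (1UL << 1)

/* RAMBlock-internal flags. RAM_PMEM matches this patch; the value of
 * RAM_SHARED is assumed. */
#define RAM_SHARED (1 << 1)
#define RAM_PMEM   (1 << 3)

/* Hypothetical helper: what file_backend_memory_alloc() passes down. */
static uint64_t backend_flags(bool share, bool is_pmem)
{
    return (share ? QEMU_RAM_SHARE : 0) | (is_pmem ? QEMU_RAM_PMEM : 0);
}

/* Hypothetical helper: how qemu_ram_alloc_from_fd() derives
 * new_block->flags from the caller-facing flags. */
static uint32_t ram_block_flags(uint64_t flags)
{
    bool share = flags & QEMU_RAM_SHARE;
    bool is_pmem = flags & QEMU_RAM_PMEM;

    return (share ? RAM_SHARED : 0) | (is_pmem ? RAM_PMEM : 0);
}
```

Keeping the two flag namespaces separate lets the internal RAM_* bits
stay private to exec.c while callers only see the QEMU_RAM_* values.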

Signed-off-by: Haozhong Zhang 
---
 backends/hostmem-file.c | 26 +-
 docs/nvdimm.txt | 14 ++
 exec.c  | 13 -
 include/exec/memory.h   |  2 ++
 include/exec/ram_addr.h |  3 +++
 qemu-options.hx |  9 -
 6 files changed, 64 insertions(+), 3 deletions(-)

diff --git a/backends/hostmem-file.c b/backends/hostmem-file.c
index 30df843d90..5d706d471f 100644
--- a/backends/hostmem-file.c
+++ b/backends/hostmem-file.c
@@ -34,6 +34,7 @@ struct HostMemoryBackendFile {
 bool discard_data;
 char *mem_path;
 uint64_t align;
+bool is_pmem;
 };
 
 static void
@@ -59,7 +60,8 @@ file_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
 memory_region_init_ram_from_file(&backend->mr, OBJECT(backend),
  path,
  backend->size, fb->align,
- backend->share ? QEMU_RAM_SHARE : 0,
+ (backend->share ? QEMU_RAM_SHARE : 0) |
+ (fb->is_pmem ? QEMU_RAM_PMEM : 0),
  fb->mem_path, errp);
 g_free(path);
 }
@@ -131,6 +133,25 @@ static void file_memory_backend_set_align(Object *o, Visitor *v,
 error_propagate(errp, local_err);
 }
 
+static bool file_memory_backend_get_pmem(Object *o, Error **errp)
+{
+return MEMORY_BACKEND_FILE(o)->is_pmem;
+}
+
+static void file_memory_backend_set_pmem(Object *o, bool value, Error **errp)
+{
+HostMemoryBackend *backend = MEMORY_BACKEND(o);
+HostMemoryBackendFile *fb = MEMORY_BACKEND_FILE(o);
+
+if (host_memory_backend_mr_inited(backend)) {
+error_setg(errp, "cannot change property 'pmem' of %s '%s'",
+   object_get_typename(o), backend->id);
+return;
+}
+
+fb->is_pmem = value;
+}
+
 static void file_backend_unparent(Object *obj)
 {
 HostMemoryBackend *backend = MEMORY_BACKEND(obj);
@@ -162,6 +183,9 @@ file_backend_class_init(ObjectClass *oc, void *data)
 file_memory_backend_get_align,
 file_memory_backend_set_align,
 NULL, NULL, &error_abort);
+object_class_property_add_bool(oc, "pmem",
+file_memory_backend_get_pmem, file_memory_backend_set_pmem,
+&error_abort);
 }
 
 static void file_backend_instance_finalize(Object *o)
diff --git a/docs/nvdimm.txt b/docs/nvdimm.txt
index e903d8bb09..bcb2032672 100644
--- a/docs/nvdimm.txt
+++ b/docs/nvdimm.txt
@@ -153,3 +153,17 @@ guest NVDIMM region mapping structure.  This unarmed flag indicates
 guest software that this vNVDIMM device contains a region that cannot
 accept persistent writes. In result, for example, the guest Linux
 NVDIMM driver, marks such vNVDIMM device as read-only.
+
+If the vNVDIMM backend is on the host persistent memory that can be
+accessed in SNIA NVM Programming Model [1] (e.g., Intel NVDIMM), it's
+suggested to set the 'pmem' option of memory-backend-file to 'on'. When
+'pmem=on' and QEMU is built with libpmem [2] support (configured with
+--enable-libpmem), QEMU will take necessary operations to guarantee
+the persistence of its own writes to the vNVDIMM backend (e.g., in
+vNVDIMM label emulation and live migration).
+
+References
+----------
+
+[1] SNIA NVM Programming Model: https://www.snia.org/sites/default/files/technical_work/final/NVMProgrammingModel_v1.2.pdf
+[2] PMDK: http://pmem.io/pmdk/
diff --git a/exec.c b/exec.c
index 537bf12412..3f3b61fb0a 100644
--- a/exec.c
+++ b/exec.c
@@ -99,6 +99,9 @@ static MemoryRegion io_mem_unassigned;
  */
 #define RAM_RESIZEABLE (1 << 2)
 
+/* RAM is backed by the persistent memory. */
+#define RAM_PMEM   (1 << 3)
+
 #endif
 
 #ifdef TARGET_PAGE_BITS_VARY
@@ -2007,6 +2010,7 @@ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr,
 Error *local_err = NULL;
 int64_t file_size;
 bool share = flags & QEMU_RAM_SHARE;
+bool is_pmem = flags & QEMU_RAM_PMEM;
 
 if (xen_enabled()) {
 error_setg(errp, "-mem-path not supported with Xen");
@@ -2043,7 +2047,8 @@ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr,
 new_block->mr = mr;
 new_block->used_length = size;
 new_block->max_length = size;
-new_block->flags = share ? RAM_SHARED : 0;
+new_block->flags = (share ? RAM_SHARED : 0) |
+   (is_pmem ? RAM_PMEM : 0);
 new_block->host = file_ram_alloc(new_block, size, fd, !file_size, errp);
 if (!new_block->host) {

[Qemu-devel] [PATCH v4 0/8] nvdimm: guarantee persistence of QEMU writes to persistent memory

2018-02-27 Thread Haozhong Zhang
QEMU writes to vNVDIMM backends in the vNVDIMM label emulation and
live migration. If the backend is on persistent memory, QEMU needs
to take proper operations to ensure that its writes are persistent on
the persistent memory. Otherwise, a host power failure may result in
the loss of the guest data on the persistent memory.

This v4 patch series is based on Marcel's patch "mem: add share
parameter to memory-backend-ram" [1] because of the changes in patch 1.

[1] https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg03858.html

Previous versions can be found at
v3: https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg04365.html
v2: https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg01579.html
v1: https://lists.gnu.org/archive/html/qemu-devel/2017-12/msg05040.html

Changes in v4:
 * (Patch 2) Fix compilation errors found by patchew.

Changes in v3:
 * (Patch 5) Add a is_pmem flag to ram_handle_compressed() and handle
   PMEM writes in it, so we don't need the _common function.
 * (Patch 6) Expose qemu_get_buffer_common so we can remove the
   unnecessary qemu_get_buffer_to_pmem wrapper.
 * (Patch 8) Add a is_pmem flag to xbzrle_decode_buffer() and handle
   PMEM writes in it, so we can remove the unnecessary
   xbzrle_decode_buffer_{common, to_pmem}.
 * Move libpmem stubs to stubs/pmem.c and fix the compilation failures
   of test-{xbzrle,vmstate}.c.

Changes in v2:
 * (Patch 1) Use a flags parameter in file ram allocation functions.
 * (Patch 2) Add a new option 'pmem' to hostmem-file.
 * (Patch 3) Use libpmem to operate on the persistent memory, rather
   than re-implementing those operations in QEMU.
 * (Patch 5-8) Consider the write persistence in the migration path.

Haozhong Zhang (8):
  [1/8] memory, exec: switch file ram allocation functions to 'flags' parameters
  [2/8] hostmem-file: add the 'pmem' option
  [3/8] configure: add libpmem support
  [4/8] mem/nvdimm: ensure write persistence to PMEM in label emulation
  [5/8] migration/ram: ensure write persistence on loading zero pages to PMEM
  [6/8] migration/ram: ensure write persistence on loading normal pages to PMEM
  [7/8] migration/ram: ensure write persistence on loading compressed pages to PMEM
  [8/8] migration/ram: ensure write persistence on loading xbzrle pages to PMEM

 backends/hostmem-file.c | 27 +++-
 configure   | 35 ++
 docs/nvdimm.txt | 14 +++
 exec.c  | 20 ---
 hw/mem/nvdimm.c |  9 ++-
 include/exec/memory.h   | 12 +++--
 include/exec/ram_addr.h | 28 +++--
 include/migration/qemu-file-types.h |  2 ++
 include/qemu/pmem.h | 27 
 memory.c|  8 +++---
 migration/qemu-file.c   | 29 ++
 migration/ram.c | 49 +++--
 migration/ram.h |  2 +-
 migration/rdma.c|  2 +-
 migration/xbzrle.c  |  8 --
 migration/xbzrle.h  |  3 ++-
 numa.c  |  2 +-
 qemu-options.hx |  9 ++-
 stubs/Makefile.objs |  1 +
 stubs/pmem.c| 37 
 tests/Makefile.include  |  4 +--
 tests/test-xbzrle.c |  4 +--
 22 files changed, 285 insertions(+), 47 deletions(-)
 create mode 100644 include/qemu/pmem.h
 create mode 100644 stubs/pmem.c

-- 
2.14.1




Re: [Qemu-devel] Deprecate tilegx ?

2018-02-27 Thread Paolo Bonzini
On 28/02/2018 07:11, Thomas Huth wrote:
> On 27.02.2018 12:51, Peter Maydell wrote:
>> I propose that we deprecate and plan to remove the unicore32 code:
> [...]
>> Essentially, it seems to be a largely-inactive university R&D project,
>> it's costing us in maintenance effort every time we have to touch it,
>> and I don't think it has any real users.
>>
>> Does anybody disagree?
>>
>> If we go ahead with deprecating then we should:
>>  * add a note to Changelog that we're deprecating the target
>>  * ditto qemu-doc.texi's deprecation section
>>  * patch hw/unicore32/puv3.c to warn on startup that it's deprecated
>>  * remove it entirely for the 2.14 release
>>
>> We could also remove linux-user/unicore32 immediately, since
>> the linux-user target has been disabled for some time.
> 
> Sounds reasonable to me, but let's wait a week or two for feedback from
> Guan Xuetao.

Sounds good---though I would consider dropping unicore32 now with no
formal deprecation period...

>> Possibly there are other target architectures we could reasonably
>> deprecate-and-remove (though none of the other ones Linux is dropping
>> in this round are ones we support)...
> 
> I'd vote for marking tilegx as deprecated, too, since we even do not
> have an active maintainer for that CPU core (at least I did not spot one
> in our MAINTAINERS file). Opinions?

Tilegx was last modified in 2015, so it's a little more alive than
unicore32.

Another one is moxie.  Anthony?

Thanks,

Paolo



Re: [Qemu-devel] [PATCH v3 05/29] postcopy: Add vhost-user flag for postcopy and check it

2018-02-27 Thread Peter Xu
On Fri, Feb 16, 2018 at 01:16:01PM +, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" 
> 
> Add a vhost feature flag for postcopy support, and
> use the postcopy notifier to check it before allowing postcopy.
> 
> Signed-off-by: Dr. David Alan Gilbert 

Reviewed-by: Peter Xu 

-- 
Peter Xu



Re: [Qemu-devel] [PATCH] balloon: Fix documentation of the --balloon parameter and deprecate it

2018-02-27 Thread Paolo Bonzini
On 28/02/2018 06:38, Thomas Huth wrote:
> There are two issues with the documentation of the --balloon parameter:
> First, "--balloon none" is simply doing nothing. Even if a machine had a
> balloon device by default, this option is not disabling anything, it is
> simply ignored. Thus let's simply drop this option from the documentation
> to avoid confusing the users (but keep the code in vl.c for backward
> compatibility).
> Second, the documentation claims that "--balloon virtio" is the default
> mode, but this is not true anymore since commit 382f074371f7dc32a34.
> Since that commit, the option also has no real use case anymore, since
> you can simply use "--device virtio-balloon" nowadays instead. Thus to
> simplify our complex parameter zoo a little bit, let's deprecate the
> parameter now and tell the user to use "--device virtio-balloon"
> instead.
> 
> Fixes: 382f074371f7dc32a34c944c845b1698e83d8c36
> Signed-off-by: Thomas Huth 
> ---
>  qemu-doc.texi   |  5 +
>  qemu-options.hx | 11 ---
>  vl.c|  3 +++
>  3 files changed, 12 insertions(+), 7 deletions(-)
> 
> diff --git a/qemu-doc.texi b/qemu-doc.texi
> index 8e35569..29c888d 100644
> --- a/qemu-doc.texi
> +++ b/qemu-doc.texi
> @@ -2725,6 +2725,11 @@ enabled via the ``-machine usb=on'' argument.
>  
>  The ``-nodefconfig`` argument is a synonym for ``-no-user-config``.
>  
> +@subsection -balloon (since 2.12.0)
> +
> +The @option{--balloon virtio} argument has been superseded by
> +@option{--device virtio-balloon}.
> +
>  @subsection -machine s390-squash-mcss=on|off (since 2.12.0)
>  
>  The ``s390-squash-mcss=on`` property has been obsoleted by allowing the
> diff --git a/qemu-options.hx b/qemu-options.hx
> index 8ccd5dc..075eb0a 100644
> --- a/qemu-options.hx
> +++ b/qemu-options.hx
> @@ -462,16 +462,13 @@ modprobe i810_audio clocking=48000
>  ETEXI
>  
>  DEF("balloon", HAS_ARG, QEMU_OPTION_balloon,
> -"-balloon none   disable balloon device\n"
>  "-balloon virtio[,addr=str]\n"
> -"enable virtio balloon device (default)\n", QEMU_ARCH_ALL)
> +"enable virtio balloon device (deprecated)\n", QEMU_ARCH_ALL)
>  STEXI
> -@item -balloon none
> -@findex -balloon
> -Disable balloon device.
>  @item -balloon virtio[,addr=@var{addr}]
> -Enable virtio balloon device (default), optionally with PCI address
> -@var{addr}.
> +@findex -balloon
> +Enable virtio balloon device, optionally with PCI address @var{addr}. This
> +option is deprecated, use @option{--device virtio-balloon} instead.
>  ETEXI
>  
>  DEF("device", HAS_ARG, QEMU_OPTION_device,
> diff --git a/vl.c b/vl.c
> index 9e7235d..2729476 100644
> --- a/vl.c
> +++ b/vl.c
> @@ -2221,6 +2221,9 @@ static int balloon_parse(const char *arg)
>  {
>  QemuOpts *opts;
>  
> +warn_report("This option is deprecated. "
> +"Use '--device virtio-balloon' to enable the balloon device.");
> +
>  if (strcmp(arg, "none") == 0) {
>  return 0;
>  }
> 

Queued, thanks.

Paolo



Re: [Qemu-devel] [PATCH v3 03/29] postcopy: use UFFDIO_ZEROPAGE only when available

2018-02-27 Thread Peter Xu
On Fri, Feb 16, 2018 at 01:15:59PM +, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" 
> 
> Use a flag on the RAMBlock to state whether it has the
> UFFDIO_ZEROPAGE capability, use it when it's available.
> 
> This allows the use of postcopy on tmpfs as well as hugepage
> backed files.
> 
> Signed-off-by: Dr. David Alan Gilbert 
> ---
>  exec.c| 15 +++
>  include/exec/cpu-common.h |  3 +++
>  migration/postcopy-ram.c  | 13 ++---
>  3 files changed, 28 insertions(+), 3 deletions(-)
> 
> diff --git a/exec.c b/exec.c
> index 0ec73bc917..1dc15298c2 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -99,6 +99,11 @@ static MemoryRegion io_mem_unassigned;
>   */
>  #define RAM_RESIZEABLE (1 << 2)
>  
> +/* UFFDIO_ZEROPAGE is available on this RAMBlock to atomically
> + * zero the page and wake waiting processes.
> + * (Set during postcopy)
> + */
> +#define RAM_UF_ZEROPAGE (1 << 3)
>  #endif
>  
>  #ifdef TARGET_PAGE_BITS_VARY
> @@ -1767,6 +1772,16 @@ bool qemu_ram_is_shared(RAMBlock *rb)
>  return rb->flags & RAM_SHARED;
>  }
>  
> +bool qemu_ram_is_uf_zeroable(RAMBlock *rb)
> +{
> +return rb->flags & RAM_UF_ZEROPAGE;
> +}
> +
> +void qemu_ram_set_uf_zeroable(RAMBlock *rb)
> +{
> +rb->flags |= RAM_UF_ZEROPAGE;
> +}
> +
>  /* Called with iothread lock held.  */
>  void qemu_ram_set_idstr(RAMBlock *new_block, const char *name, DeviceState *dev)
>  {
> diff --git a/include/exec/cpu-common.h b/include/exec/cpu-common.h
> index 0d861a6289..24d335f95d 100644
> --- a/include/exec/cpu-common.h
> +++ b/include/exec/cpu-common.h
> @@ -73,6 +73,9 @@ void qemu_ram_set_idstr(RAMBlock *block, const char *name, DeviceState *dev);
>  void qemu_ram_unset_idstr(RAMBlock *block);
>  const char *qemu_ram_get_idstr(RAMBlock *rb);
>  bool qemu_ram_is_shared(RAMBlock *rb);
> +bool qemu_ram_is_uf_zeroable(RAMBlock *rb);
> +void qemu_ram_set_uf_zeroable(RAMBlock *rb);
> +
>  size_t qemu_ram_pagesize(RAMBlock *block);
>  size_t qemu_ram_pagesize_largest(void);
>  
> diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
> index bec6c2c66b..6297979700 100644
> --- a/migration/postcopy-ram.c
> +++ b/migration/postcopy-ram.c
> @@ -490,6 +490,10 @@ static int ram_block_enable_notify(const char *block_name, void *host_addr,
>  error_report("%s userfault: Region doesn't support COPY", __func__);
>  return -1;
>  }
> +if (reg_struct.ioctls & ((__u64)1 << _UFFDIO_ZEROPAGE)) {
> +RAMBlock *rb = qemu_ram_block_by_name(block_name);
> +qemu_ram_set_uf_zeroable(rb);
> +}

So the zeroable flag is only set after the listen operation of a
postcopy migration.  One thing I am a bit worried about is that someone
else who wants to use the flag for a RAMBlock may not notice this.
That is, qemu_ram_is_uf_zeroable() is not valid if there is no such
incoming postcopy migration.

Maybe it is worth adding a comment to the flag definition about this?

Not a big deal (considering that I see no potential QEMU user for
userfaultfd in the short term), so no matter what:

Reviewed-by: Peter Xu 
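
To make the concern concrete, here is a minimal sketch of the flag's
lifecycle. The flag value and the two accessors mirror the quoted hunk;
the RAMBlock stand-in, note_registered_ioctls() and its zeropage_bit
parameter are hypothetical and only model the check done in
ram_block_enable_notify().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RAM_UF_ZEROPAGE (1 << 3)   /* value as in the quoted hunk */

/* Stand-in for QEMU's RAMBlock; only the field used here. */
typedef struct RAMBlock {
    uint32_t flags;
} RAMBlock;

static bool ram_is_uf_zeroable(const RAMBlock *rb)
{
    return rb->flags & RAM_UF_ZEROPAGE;
}

static void ram_set_uf_zeroable(RAMBlock *rb)
{
    rb->flags |= RAM_UF_ZEROPAGE;
}

/* Hypothetical model of the registration step: the flag is set only
 * when the ioctl mask reported for the range has the zeropage bit.
 * zeropage_bit stands in for _UFFDIO_ZEROPAGE. */
static void note_registered_ioctls(RAMBlock *rb, uint64_t ioctls,
                                   unsigned zeropage_bit)
{
    if (ioctls & ((uint64_t)1 << zeropage_bit)) {
        ram_set_uf_zeroable(rb);
    }
}
```

Before any registration happens, ram_is_uf_zeroable() returns false
even for a block that would support zeropage, which is the behaviour a
comment on the flag definition could document.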

>  
>  return 0;
>  }
> @@ -699,11 +703,14 @@ int postcopy_place_page(MigrationIncomingState *mis, void *host, void *from,
>  int postcopy_place_page_zero(MigrationIncomingState *mis, void *host,
>   RAMBlock *rb)
>  {
> +size_t pagesize = qemu_ram_pagesize(rb);
>  trace_postcopy_place_page_zero(host);
>  
> -if (qemu_ram_pagesize(rb) == getpagesize()) {
> -if (qemu_ufd_copy_ioctl(mis->userfault_fd, host, NULL, getpagesize(),
> -rb)) {
> +/* Normal RAMBlocks can zero a page using UFFDIO_ZEROPAGE
> + * but it's not available for everything (e.g. hugetlbpages)
> + */
> +if (qemu_ram_is_uf_zeroable(rb)) {
> +if (qemu_ufd_copy_ioctl(mis->userfault_fd, host, NULL, pagesize, rb)) {
>  int e = errno;
>  error_report("%s: %s zero host: %p",
>   __func__, strerror(e), host);
> -- 
> 2.14.3
> 

-- 
Peter Xu



Re: [Qemu-devel] [PATCH v3 01/29] migrate: Update ram_block_discard_range for shared

2018-02-27 Thread Peter Xu
On Fri, Feb 16, 2018 at 01:15:57PM +, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" 
> 
> The choice of call to discard a block is getting more complicated
> for other cases.   We use fallocate PUNCH_HOLE in any file cases;
> it works for both hugepage and for tmpfs.
> We use the DONTNEED for non-hugepage cases either where they're
> anonymous or where they're private.
> 
> Care should be taken when trying other backing files.
> 
> Signed-off-by: Dr. David Alan Gilbert 
> ---
>  exec.c   | 60 ++--
>  trace-events |  3 ++-
>  2 files changed, 48 insertions(+), 15 deletions(-)
> 
> diff --git a/exec.c b/exec.c
> index e8d7b335b6..b1bb46 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -3702,6 +3702,7 @@ int ram_block_discard_range(RAMBlock *rb, uint64_t start, size_t length)
>  }
>  
>  if ((start + length) <= rb->used_length) {
> +bool need_madvise, need_fallocate;
>  uint8_t *host_endaddr = host_startaddr + length;
>  if ((uintptr_t)host_endaddr & (rb->page_size - 1)) {
>  error_report("ram_block_discard_range: Unaligned end address: %p",
> @@ -3711,29 +3712,60 @@ int ram_block_discard_range(RAMBlock *rb, uint64_t start, size_t length)
>  
>  errno = ENOTSUP; /* If we are missing MADVISE etc */
>  
> -if (rb->page_size == qemu_host_page_size) {
> -#if defined(CONFIG_MADVISE)
> -/* Note: We need the madvise MADV_DONTNEED behaviour of definitely
> - * freeing the page.
> - */
> -ret = madvise(host_startaddr, length, MADV_DONTNEED);
> -#endif
> -} else {
> -/* Huge page case  - unfortunately it can't do DONTNEED, but
> - * it can do the equivalent by FALLOC_FL_PUNCH_HOLE in the
> - * huge page file.
> +/* The logic here is messy;
> + *madvise DONTNEED fails for hugepages
> + *fallocate works on hugepages and shmem
> + */
> +need_madvise = (rb->page_size == qemu_host_page_size);
> +need_fallocate = rb->fd != -1;
> +if (need_fallocate) {
> +/* For a file, this causes the area of the file to be zero'd
> + * if read, and for hugetlbfs also causes it to be unmapped
> + * so a userfault will trigger.
>   */
>  #ifdef CONFIG_FALLOCATE_PUNCH_HOLE
>  ret = fallocate(rb->fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
>  start, length);
> +if (ret) {
> +ret = -errno;
> +error_report("ram_block_discard_range: Failed to fallocate "
> + "%s:%" PRIx64 " +%zx (%d)",
> + rb->idstr, start, length, ret);
> +goto err;
> +}
> +#else
> +ret = -ENOSYS;
> +error_report("ram_block_discard_range: fallocate not available/file"
> + "%s:%" PRIx64 " +%zx (%d)",
> + rb->idstr, start, length, ret);
> +goto err;
>  #endif
>  }
> -if (ret) {
> -ret = -errno;
> -error_report("ram_block_discard_range: Failed to discard range "
> +if (need_madvise) {
> +/* For normal RAM this causes it to be unmapped,
> + * for shared memory it causes the local mapping to disappear
> + * and to fall back on the file contents (which we just
> + * fallocate'd away).
> + */
> +#if defined(CONFIG_MADVISE)
> +ret =  madvise(host_startaddr, length, MADV_DONTNEED);
> +if (ret) {
> +ret = -errno;
> +error_report("ram_block_discard_range: Failed to discard range "
> + "%s:%" PRIx64 " +%zx (%d)",
> + rb->idstr, start, length, ret);
> +goto err;
> +}
> +#else
> +ret = -ENOSYS;
> +error_report("ram_block_discard_range: MADVISE not available"
>   "%s:%" PRIx64 " +%zx (%d)",
>   rb->idstr, start, length, ret);
> +goto err;
> +#endif
>  }
> +trace_ram_block_discard_range(rb->idstr, host_startaddr,
> +  need_madvise, need_fallocate, ret);

Nit: is it worth logging the length too, since it's named "range"?

Either with or without:

Reviewed-by: Peter Xu 
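
For readers skimming the hunk, the selection logic it introduces can be
summarised as a small pure function. This is only a sketch of the two
conditions in the patch; DiscardPlan and plan_discard() are
hypothetical names, and the actual madvise()/fallocate() calls are left
out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Which discard mechanisms a RAMBlock needs (hypothetical type). */
typedef struct DiscardPlan {
    bool need_madvise;    /* madvise(MADV_DONTNEED) on the mapping */
    bool need_fallocate;  /* fallocate(PUNCH_HOLE) on the backing file */
} DiscardPlan;

/* Mirrors the two conditions in the patch:
 *  - madvise only works when the block uses normal host pages;
 *  - fallocate is used whenever the block is backed by a file. */
static DiscardPlan plan_discard(size_t block_page_size,
                                size_t host_page_size, int fd)
{
    DiscardPlan plan = {
        .need_madvise = (block_page_size == host_page_size),
        .need_fallocate = (fd != -1),
    };
    return plan;
}
```

With this split, anonymous RAM gets madvise only, a hugetlbfs file gets
fallocate only, and a normal-page file such as tmpfs gets both.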

-- 
Peter Xu



Re: [Qemu-devel] [PULL 00/12] Ui 20180227 patches

2018-02-27 Thread Gerd Hoffmann
  Hi,

> Hi. This failed to build on my OpenBSD test box:
> 
>   CC  qga/guest-agent-command-state.o
> In file included from /home/qemu/include/qemu/osdep.h:30:0,
>  from /home/qemu/qga/guest-agent-command-state.c:12:
> ./config-host.h:29:0: warning: "CONFIG_SDL" redefined
>  #define CONFIG_SDL m
>  ^
> ./config-host.h:17:0: note: this is the location of the previous definition
>  #define CONFIG_SDL 1
>  ^
> 
> (warning repeated for pretty much every object file)
> 
> and then linking of the final executables failed with
> 
> ../audio/audio.o:(.data.rel.ro+0x0): undefined reference to `sdl_audio_driver'

Oh, right, there is sdl audio, completely forgot about that ...

(this probably triggers on openbsd because sdl audio is the default
there).

I think modularizing SDL isn't that easy then.
Can you just drop the "sdl: build as ui module" patch?

cheers,
  Gerd




[Qemu-devel] Deprecate tilegx ? (was: Proposal: deprecate and remove QEMU's unicore32 target code)

2018-02-27 Thread Thomas Huth
On 27.02.2018 12:51, Peter Maydell wrote:
> I propose that we deprecate and plan to remove the unicore32 code:
[...]
> Essentially, it seems to be a largely-inactive university R&D project,
> it's costing us in maintenance effort every time we have to touch it,
> and I don't think it has any real users.
>
> Does anybody disagree?
> 
> If we go ahead with deprecating then we should:
>  * add a note to Changelog that we're deprecating the target
>  * ditto qemu-doc.texi's deprecation section
>  * patch hw/unicore32/puv3.c to warn on startup that it's deprecated
>  * remove it entirely for the 2.14 release
> 
> We could also remove linux-user/unicore32 immediately, since
> the linux-user target has been disabled for some time.

Sounds reasonable to me, but let's wait a week or two for feedback from
Guan Xuetao.

> Possibly there are other target architectures we could reasonably
> deprecate-and-remove (though none of the other ones Linux is dropping
> in this round are ones we support)...

I'd vote for marking tilegx as deprecated, too, since we even do not
have an active maintainer for that CPU core (at least I did not spot one
in our MAINTAINERS file). Opinions?

 Thomas



[Qemu-devel] [PATCH] balloon: Fix documentation of the --balloon parameter and deprecate it

2018-02-27 Thread Thomas Huth
There are two issues with the documentation of the --balloon parameter:
First, "--balloon none" is simply doing nothing. Even if a machine had a
balloon device by default, this option is not disabling anything, it is
simply ignored. Thus let's simply drop this option from the documentation
to avoid confusing the users (but keep the code in vl.c for backward
compatibility).
Second, the documentation claims that "--balloon virtio" is the default
mode, but this is not true anymore since commit 382f074371f7dc32a34.
Since that commit, the option also has no real use case anymore, since
you can simply use "--device virtio-balloon" nowadays instead. Thus to
simplify our complex parameter zoo a little bit, let's deprecate the
parameter now and tell the user to use "--device virtio-balloon"
instead.
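
The resulting behaviour can be sketched as below: every use of the
option warns, "none" is still accepted but does nothing, and "virtio"
is still accepted. This is a simplified stand-in, not the vl.c code:
balloon_parse_sketch() and its warnings counter are hypothetical, and
the real function goes on to build QemuOpts for the virtio case.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Returns 0 if the argument is accepted, -1 otherwise.
 * *warnings is incremented once per call so callers can observe
 * that the deprecation message was emitted. */
static int balloon_parse_sketch(const char *arg, int *warnings)
{
    fprintf(stderr, "warning: This option is deprecated. "
            "Use '--device virtio-balloon' to enable the balloon device.\n");
    (*warnings)++;

    if (strcmp(arg, "none") == 0) {
        return 0;               /* kept for compatibility, does nothing */
    }
    if (strncmp(arg, "virtio", 6) == 0) {
        return 0;               /* the real code creates the device here */
    }
    return -1;
}
```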

Fixes: 382f074371f7dc32a34c944c845b1698e83d8c36
Signed-off-by: Thomas Huth 
---
 qemu-doc.texi   |  5 +
 qemu-options.hx | 11 ---
 vl.c|  3 +++
 3 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/qemu-doc.texi b/qemu-doc.texi
index 8e35569..29c888d 100644
--- a/qemu-doc.texi
+++ b/qemu-doc.texi
@@ -2725,6 +2725,11 @@ enabled via the ``-machine usb=on'' argument.
 
 The ``-nodefconfig`` argument is a synonym for ``-no-user-config``.
 
+@subsection -balloon (since 2.12.0)
+
+The @option{--balloon virtio} argument has been superseded by
+@option{--device virtio-balloon}.
+
 @subsection -machine s390-squash-mcss=on|off (since 2.12.0)
 
 The ``s390-squash-mcss=on`` property has been obsoleted by allowing the
diff --git a/qemu-options.hx b/qemu-options.hx
index 8ccd5dc..075eb0a 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -462,16 +462,13 @@ modprobe i810_audio clocking=48000
 ETEXI
 
 DEF("balloon", HAS_ARG, QEMU_OPTION_balloon,
-"-balloon none   disable balloon device\n"
 "-balloon virtio[,addr=str]\n"
-"enable virtio balloon device (default)\n", QEMU_ARCH_ALL)
+"enable virtio balloon device (deprecated)\n", QEMU_ARCH_ALL)
 STEXI
-@item -balloon none
-@findex -balloon
-Disable balloon device.
 @item -balloon virtio[,addr=@var{addr}]
-Enable virtio balloon device (default), optionally with PCI address
-@var{addr}.
+@findex -balloon
+Enable virtio balloon device, optionally with PCI address @var{addr}. This
+option is deprecated, use @option{--device virtio-balloon} instead.
 ETEXI
 
 DEF("device", HAS_ARG, QEMU_OPTION_device,
diff --git a/vl.c b/vl.c
index 9e7235d..2729476 100644
--- a/vl.c
+++ b/vl.c
@@ -2221,6 +2221,9 @@ static int balloon_parse(const char *arg)
 {
 QemuOpts *opts;
 
+warn_report("This option is deprecated. "
+"Use '--device virtio-balloon' to enable the balloon device.");
+
 if (strcmp(arg, "none") == 0) {
 return 0;
 }
-- 
1.8.3.1




[Qemu-devel] [PATCH 12/14] qio: move QIOTaskThreadData into QIOTask

2018-02-27 Thread Peter Xu
The main reason to do this is that, once the upper level can cache the
QIOTask, it should also be able to manage the QIOTask further.  It
cannot do that without the information in QIOTaskThreadData, so let's
merge this struct into QIOTask.  Doing so also simplifies the code a
bit.

This will be needed in the next patch, when we want to rebuild the
completion GSource when the GMainContext changes.
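
The lifetime consequence of embedding the thread data can be sketched
as follows. This is a simplified model, not the io/task.c code: Task,
ThreadData, Probe and all function names are hypothetical, and C11
atomics stand in for QEMU's atomic helpers. Because the thread data now
lives inside the task, the result handler has to hold an extra
reference while it still touches that data after completion.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

typedef void (*DestroyFn)(void *);

typedef struct ThreadData {
    void *opaque;
    DestroyFn destroy;
} ThreadData;

typedef struct Task {
    atomic_uint refcount;
    ThreadData thread_data;  /* embedded, so it shares the task's lifetime */
    int *freed;              /* hypothetical hook: set to 1 when freed */
} Task;

static Task *task_new(int *freed)
{
    Task *t = calloc(1, sizeof(*t));
    if (!t) {
        abort();
    }
    atomic_init(&t->refcount, 1);   /* creation reference */
    t->freed = freed;
    return t;
}

static void task_ref(Task *t)
{
    atomic_fetch_add(&t->refcount, 1);
}

static void task_unref(Task *t)
{
    /* fetch_sub returns the old value; 1 means this was the last ref. */
    if (atomic_fetch_sub(&t->refcount, 1) == 1) {
        if (t->freed) {
            *t->freed = 1;
        }
        free(t);
    }
}

/* Completion drops the creation reference, as in patch 10. */
static void task_complete(Task *t)
{
    task_unref(t);
}

static void task_thread_result(Task *t)
{
    ThreadData *data = &t->thread_data;

    task_ref(t);        /* keep "data" valid across task_complete() */
    task_complete(t);
    if (data->destroy) {
        data->destroy(data->opaque);
    }
    task_unref(t);      /* may free the task, and "data" with it */
}

/* Example destroy callback: records whether the task was still alive. */
typedef struct Probe {
    int freed;
    int alive_at_destroy;
} Probe;

static void probe_destroy(void *opaque)
{
    Probe *p = opaque;
    p->alive_at_destroy = !p->freed;
}
```

Without the extra task_ref(), task_complete() would free the task and
the following data->destroy access would read freed memory.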

Signed-off-by: Peter Xu 
---
 io/task.c | 46 ++
 1 file changed, 26 insertions(+), 20 deletions(-)

diff --git a/io/task.c b/io/task.c
index 00d3a5096a..080f9560ea 100644
--- a/io/task.c
+++ b/io/task.c
@@ -24,6 +24,13 @@
 #include "qemu/thread.h"
 #include "trace.h"
 
+struct QIOTaskThreadData {
+QIOTaskWorker worker;
+gpointer opaque;
+GDestroyNotify destroy;
+};
+typedef struct QIOTaskThreadData QIOTaskThreadData;
+
 struct QIOTask {
 Object *source;
 QIOTaskFunc func;
@@ -37,6 +44,7 @@ struct QIOTask {
 /* Threaded QIO task specific fields */
 GSource *idle_source;  /* The idle task to run complete routine */
 GMainContext *context; /* The context that idle task will run with */
+QIOTaskThreadData thread_data;
 };
 
 
@@ -86,26 +94,25 @@ static void qio_task_free(QIOTask *task)
 }
 
 
-struct QIOTaskThreadData {
-QIOTask *task;
-QIOTaskWorker worker;
-gpointer opaque;
-GDestroyNotify destroy;
-};
-
-
 static gboolean qio_task_thread_result(gpointer opaque)
 {
-struct QIOTaskThreadData *data = opaque;
+QIOTask *task = opaque;
+QIOTaskThreadData *data = &task->thread_data;
 
-trace_qio_task_thread_result(data->task);
-qio_task_complete(data->task);
+/*
+ * Take one more refcount since qio_task_complete() may otherwise
+ * release the last refcount and free, then "data" may be invalid.
+ */
+qio_task_ref(task);
+
+trace_qio_task_thread_result(task);
+qio_task_complete(task);
 
 if (data->destroy) {
 data->destroy(data->opaque);
 }
 
-g_free(data);
+qio_task_unref(task);
 
 return FALSE;
 }
@@ -113,19 +120,19 @@ static gboolean qio_task_thread_result(gpointer opaque)
 
 static gpointer qio_task_thread_worker(gpointer opaque)
 {
-struct QIOTaskThreadData *data = opaque;
-QIOTask *task = data->task;
+QIOTask *task = opaque;
+QIOTaskThreadData *data = &task->thread_data;
 GSource *idle;
 
-trace_qio_task_thread_run(data->task);
-data->worker(data->task, data->opaque);
+trace_qio_task_thread_run(task);
+data->worker(task, data->opaque);
 
 /* We're running in the background thread, and must only
  * ever report the task results in the main event loop
  * thread. So we schedule an idle callback to report
  * the worker results
  */
-trace_qio_task_thread_exit(data->task);
+trace_qio_task_thread_exit(task);
 
 idle = g_idle_source_new();
 g_source_set_callback(idle, qio_task_thread_result, data, NULL);
@@ -142,15 +149,14 @@ void qio_task_run_in_thread(QIOTask *task,
 GDestroyNotify destroy,
 GMainContext *context)
 {
-struct QIOTaskThreadData *data = g_new0(struct QIOTaskThreadData, 1);
 QemuThread thread;
+QIOTaskThreadData *data = &task->thread_data;
 
 if (context) {
 g_main_context_ref(context);
 task->context = context;
 }
 
-data->task = task;
 data->worker = worker;
 data->opaque = opaque;
 data->destroy = destroy;
@@ -159,7 +165,7 @@ void qio_task_run_in_thread(QIOTask *task,
 qemu_thread_create(&thread,
"io-task-worker",
qio_task_thread_worker,
-   data,
+   task,
QEMU_THREAD_DETACHED);
 }
 
-- 
2.14.3




[Qemu-devel] [PATCH 10/14] qio: refcount QIOTask

2018-02-27 Thread Peter Xu
QIOTask will be used from multiple threads in follow-up patches, so
give it a reference count.

Signed-off-by: Peter Xu 
---
 include/io/task.h |  3 +++
 io/task.c | 23 ++-
 2 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/include/io/task.h b/include/io/task.h
index 9dbe3758d7..c6acd6489c 100644
--- a/include/io/task.h
+++ b/include/io/task.h
@@ -322,4 +322,7 @@ gpointer qio_task_get_result_pointer(QIOTask *task);
  */
 Object *qio_task_get_source(QIOTask *task);
 
+void qio_task_ref(QIOTask *task);
+void qio_task_unref(QIOTask *task);
+
 #endif /* QIO_TASK_H */
diff --git a/io/task.c b/io/task.c
index 204c0be286..00d3a5096a 100644
--- a/io/task.c
+++ b/io/task.c
@@ -32,6 +32,7 @@ struct QIOTask {
 Error *err;
 gpointer result;
 GDestroyNotify destroyResult;
+uint32_t refcount;
 
 /* Threaded QIO task specific fields */
 GSource *idle_source;  /* The idle task to run complete routine */
@@ -57,6 +58,8 @@ QIOTask *qio_task_new(Object *source,
 
 trace_qio_task_new(task, source, func, opaque);
 
+qio_task_ref(task);
+
 return task;
 }
 
@@ -165,7 +168,7 @@ void qio_task_complete(QIOTask *task)
 {
 task->func(task, task->opaque);
 trace_qio_task_complete(task);
-qio_task_free(task);
+qio_task_unref(task);
 }
 
 
@@ -208,3 +211,21 @@ Object *qio_task_get_source(QIOTask *task)
 {
 return task->source;
 }
+
+void qio_task_ref(QIOTask *task)
+{
+if (!task) {
+return;
+}
+atomic_inc(&task->refcount);
+}
+
+void qio_task_unref(QIOTask *task)
+{
+if (!task) {
+return;
+}
+if (atomic_fetch_dec(&task->refcount) == 1) {
+qio_task_free(task);
+}
+}
-- 
2.14.3




[Qemu-devel] [PATCH 05/14] qio: refactor net listener source operations

2018-02-27 Thread Peter Xu
Three functions are abstracted from the old code:

- qio_net_listener_source_add(): create one source for listener
- qio_net_listener_sources_clear(): unset existing net listener sources
- qio_net_listener_sources_update(): setup all sources for listener

Use them where possible.
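
How the three helpers fit together can be sketched with a toy model.
Nothing below is the io/net-listener.c code: Listener, MAX_SIOC, the
next_tag counter and every function name are hypothetical, and integer
tags stand in for the GSource ids returned by qio_channel_add_watch().

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SIOC 4

typedef struct Listener {
    size_t nsioc;
    unsigned io_tag[MAX_SIOC];  /* 0 means "no source attached" */
    int has_io_func;            /* stands in for io_func != NULL */
    unsigned next_tag;          /* hypothetical tag allocator */
} Listener;

/* Create one source and return its (non-zero) tag. */
static unsigned source_add(Listener *l)
{
    return ++l->next_tag;
}

/* Detach every existing source. */
static void sources_clear(Listener *l)
{
    for (size_t i = 0; i < l->nsioc; i++) {
        if (l->io_tag[i]) {
            l->io_tag[i] = 0;
        }
    }
}

/* Attach a source for every socket, if there is a client function. */
static void sources_update(Listener *l)
{
    if (l->has_io_func) {
        for (size_t i = 0; i < l->nsioc; i++) {
            assert(l->io_tag[i] == 0);
            l->io_tag[i] = source_add(l);
        }
    }
}

/* Changing the client function becomes clear followed by update. */
static void set_client_func(Listener *l, int has_io_func)
{
    l->has_io_func = has_io_func;
    sources_clear(l);
    sources_update(l);
}
```

The assert in sources_update() documents the intended calling order:
update is only valid on a listener whose sources were cleared first.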

Signed-off-by: Peter Xu 
---
 io/net-listener.c | 82 +++
 1 file changed, 41 insertions(+), 41 deletions(-)

diff --git a/io/net-listener.c b/io/net-listener.c
index de38dfae99..3e9ac51b0e 100644
--- a/io/net-listener.c
+++ b/io/net-listener.c
@@ -106,6 +106,39 @@ int qio_net_listener_open_sync(QIONetListener *listener,
 }
 }
 
+static guint qio_net_listener_source_add(QIONetListener *listener,
+ QIOChannelSocket *sioc)
+{
+return qio_channel_add_watch(QIO_CHANNEL(sioc), G_IO_IN,
+ qio_net_listener_channel_func,
+ listener, (GDestroyNotify)object_unref);
+}
+
+static void qio_net_listener_sources_clear(QIONetListener *listener)
+{
+size_t i;
+
+for (i = 0; i < listener->nsioc; i++) {
+if (listener->io_tag[i]) {
+g_source_remove(listener->io_tag[i]);
+listener->io_tag[i] = 0;
+}
+}
+}
+
+static void qio_net_listener_sources_update(QIONetListener *listener)
+{
+size_t i;
+
+if (listener->io_func != NULL) {
+for (i = 0; i < listener->nsioc; i++) {
+assert(listener->io_tag[i] == 0);
+object_ref(OBJECT(listener));
+listener->io_tag[i] = qio_net_listener_source_add(
+listener, listener->sioc[i]);
+}
+}
+}
 
 void qio_net_listener_add(QIONetListener *listener,
   QIOChannelSocket *sioc)
@@ -127,10 +160,8 @@ void qio_net_listener_add(QIONetListener *listener,
 
 if (listener->io_func != NULL) {
 object_ref(OBJECT(listener));
-listener->io_tag[listener->nsioc] = qio_channel_add_watch(
-QIO_CHANNEL(listener->sioc[listener->nsioc]), G_IO_IN,
-qio_net_listener_channel_func,
-listener, (GDestroyNotify)object_unref);
+listener->io_tag[listener->nsioc] = qio_net_listener_source_add(
+listener, listener->sioc[listener->nsioc]);
 }
 
 listener->nsioc++;
@@ -142,8 +173,6 @@ void qio_net_listener_set_client_func(QIONetListener 
*listener,
   gpointer data,
   GDestroyNotify notify)
 {
-size_t i;
-
 if (listener->io_notify) {
 listener->io_notify(listener->io_data);
 }
@@ -151,22 +180,8 @@ void qio_net_listener_set_client_func(QIONetListener 
*listener,
 listener->io_data = data;
 listener->io_notify = notify;
 
-for (i = 0; i < listener->nsioc; i++) {
-if (listener->io_tag[i]) {
-g_source_remove(listener->io_tag[i]);
-listener->io_tag[i] = 0;
-}
-}
-
-if (listener->io_func != NULL) {
-for (i = 0; i < listener->nsioc; i++) {
-object_ref(OBJECT(listener));
-listener->io_tag[i] = qio_channel_add_watch(
-QIO_CHANNEL(listener->sioc[i]), G_IO_IN,
-qio_net_listener_channel_func,
-listener, (GDestroyNotify)object_unref);
-}
-}
+qio_net_listener_sources_clear(listener);
+qio_net_listener_sources_update(listener);
 }
 
 
@@ -210,12 +225,7 @@ QIOChannelSocket 
*qio_net_listener_wait_client(QIONetListener *listener)
 };
 size_t i;
 
-for (i = 0; i < listener->nsioc; i++) {
-if (listener->io_tag[i]) {
-g_source_remove(listener->io_tag[i]);
-listener->io_tag[i] = 0;
-}
-}
+qio_net_listener_sources_clear(listener);
 
 sources = g_new0(GSource *, listener->nsioc);
 for (i = 0; i < listener->nsioc; i++) {
@@ -238,15 +248,7 @@ QIOChannelSocket 
*qio_net_listener_wait_client(QIONetListener *listener)
 g_main_loop_unref(loop);
 g_main_context_unref(ctxt);
 
-if (listener->io_func != NULL) {
-for (i = 0; i < listener->nsioc; i++) {
-object_ref(OBJECT(listener));
-listener->io_tag[i] = qio_channel_add_watch(
-QIO_CHANNEL(listener->sioc[i]), G_IO_IN,
-qio_net_listener_channel_func,
-listener, (GDestroyNotify)object_unref);
-}
-}
+qio_net_listener_sources_update(listener);
 
 return data.sioc;
 }
@@ -259,11 +261,9 @@ void qio_net_listener_disconnect(QIONetListener *listener)
 return;
 }
 
+qio_net_listener_sources_clear(listener);
+
 for (i = 0; i < listener->nsioc; i++) {
-if (listener->io_tag[i]) {
-g_source_remove(listener->io_tag[i]);
-listener->io_tag[i] = 0;
-}
 qio_channel_close(QIO_CHANNEL(listener->sioc[i]), NULL);
 }
 listener->connected = false;
-- 
2.14.3




[Qemu-devel] [PATCH 13/14] qio: allow threaded qiotask to switch contexts

2018-02-27 Thread Peter Xu
This is part of the work to allow a QIOTask to use a gcontext other than
the default main gcontext, by providing the qio_task_context_set() API.

We have done similar work before to add non-default gcontext support.
The general idea is to delete the old GSource from the main context,
then add a new one to the new context when the context changes to a
non-default one.  However, this trick won't work easily for threaded
QIOTasks, since we can't easily stop a real thread and set the whole
thing up again from the very beginning.

But luckily, we don't need to do anything to the thread.  We just need
to keep an eye on the GSource that completes the QIOTask, which is
assigned to gcontext after the sync operation finished.

So when we set up a non-default GMainContext for a threaded QIO task, we
may face two cases:

- the thread is still running the sync task: then we don't need to do
  anything except update QIOTask.context to the new context

- the thread has finished the sync task and queued an idle task to the
  main thread: then we destroy that old idle task, and re-create it on
  the new GMainContext.

Note that whenever we modify either the idle GSource or the context, we
need to take the mutex beforehand, since the thread may be modifying
them at the same time.

Finally, call qio_task_context_set() in the tcp chardev update read
handler hook if QIOTask is running.
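
The two cases above can be modelled with a small plain-C sketch.  It
follows the description in this message rather than the actual
qio_task_context_set() body, and every name in it (ToyTask, ToyContext,
the toy_* functions) is hypothetical.  A counter of queued idle jobs
stands in for the GSource attached to a GMainContext, and a pthread
mutex stands in for QemuMutex.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a GMainContext: counts queued idle jobs. */
typedef struct {
    int idle_jobs;
} ToyContext;

/* Hypothetical stand-in for the threaded fields of QIOTask. */
typedef struct {
    pthread_mutex_t mutex; /* protects context and idle_on */
    ToyContext *context;   /* where the completion job should run */
    ToyContext *idle_on;   /* where it is queued now, NULL if not queued */
} ToyTask;

static void toy_task_init(ToyTask *t, ToyContext *ctx)
{
    pthread_mutex_init(&t->mutex, NULL);
    t->context = ctx;
    t->idle_on = NULL;
}

/* Must be called with t->mutex held. */
static void toy_queue_complete_job(ToyTask *t)
{
    if (t->idle_on) {
        t->idle_on->idle_jobs--; /* destroy the old idle job, if any */
    }
    t->context->idle_jobs++;     /* queue a new one on the current context */
    t->idle_on = t->context;
}

/* What the worker thread does once its blocking work has finished. */
static void toy_worker_finished(ToyTask *t)
{
    pthread_mutex_lock(&t->mutex);
    toy_queue_complete_job(t);
    pthread_mutex_unlock(&t->mutex);
}

/* Model of the context switch described in the commit message. */
static void toy_task_context_set(ToyTask *t, ToyContext *ctx)
{
    pthread_mutex_lock(&t->mutex);
    t->context = ctx;
    if (t->idle_on) {
        /* Case 2: job already queued, move it to the new context. */
        toy_queue_complete_job(t);
    }
    /* Case 1: worker still running, updating the context is enough. */
    pthread_mutex_unlock(&t->mutex);
}
```

In this model the worker and the context setter both go through the same
mutex-protected helper, so it does not matter which of them runs first:
the completion job ends up queued exactly once, on the latest context.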

Signed-off-by: Peter Xu 
---
 chardev/char-socket.c |  4 +++
 include/io/task.h |  1 +
 io/task.c | 70 ++-
 3 files changed, 63 insertions(+), 12 deletions(-)

diff --git a/chardev/char-socket.c b/chardev/char-socket.c
index 9d51b8da07..164a64ff34 100644
--- a/chardev/char-socket.c
+++ b/chardev/char-socket.c
@@ -585,6 +585,10 @@ static void tcp_chr_update_read_handler(Chardev *chr)
 tcp_chr_telnet_init(CHARDEV(s));
 }
 
+if (s->thread_task) {
+qio_task_context_set(s->thread_task, chr->gcontext);
+}
+
 if (!s->connected) {
 return;
 }
diff --git a/include/io/task.h b/include/io/task.h
index c6acd6489c..87e0152d8a 100644
--- a/include/io/task.h
+++ b/include/io/task.h
@@ -324,5 +324,6 @@ Object *qio_task_get_source(QIOTask *task);
 
 void qio_task_ref(QIOTask *task);
 void qio_task_unref(QIOTask *task);
+void qio_task_context_set(QIOTask *task, GMainContext *context);
 
 #endif /* QIO_TASK_H */
diff --git a/io/task.c b/io/task.c
index 080f9560ea..59bc439bdf 100644
--- a/io/task.c
+++ b/io/task.c
@@ -42,6 +42,9 @@ struct QIOTask {
 uint32_t refcount;
 
 /* Threaded QIO task specific fields */
+bool has_thread;
+QemuThread thread;
+QemuMutex mutex;   /* Protects threaded QIO task fields */
 GSource *idle_source;  /* The idle task to run complete routine */
 GMainContext *context; /* The context that idle task will run with */
 QIOTaskThreadData thread_data;
@@ -57,6 +60,8 @@ QIOTask *qio_task_new(Object *source,
 
 task = g_new0(QIOTask, 1);
 
+qemu_mutex_init(&task->mutex);
+
 task->source = source;
 object_ref(source);
 task->func = func;
@@ -88,7 +93,16 @@ static void qio_task_free(QIOTask *task)
 if (task->context) {
 g_main_context_unref(task->context);
 }
+/*
+ * Make sure the thread quitted before we destroy the mutex,
+ * otherwise the thread might still be using it.
+ */
+if (task->has_thread) {
+qemu_thread_join(&task->thread);
+}
+
 object_unref(task->source);
+qemu_mutex_destroy(&task->mutex);
 
 g_free(task);
 }
@@ -117,12 +131,28 @@ static gboolean qio_task_thread_result(gpointer opaque)
 return FALSE;
 }
 
+/* Must be with QIOTask.mutex held. */
+static void qio_task_thread_create_complete_job(QIOTask *task)
+{
+GSource *idle;
+
+/* Remove the old if there is */
+if (task->idle_source) {
+g_source_destroy(task->idle_source);
+g_source_unref(task->idle_source);
+}
+
+idle = g_idle_source_new();
+g_source_set_callback(idle, qio_task_thread_result, task, NULL);
+g_source_attach(idle, task->context);
+
+task->idle_source = idle;
+}
 
 static gpointer qio_task_thread_worker(gpointer opaque)
 {
 QIOTask *task = opaque;
 QIOTaskThreadData *data = &task->thread_data;
-GSource *idle;
 
 trace_qio_task_thread_run(task);
 data->worker(task, data->opaque);
@@ -134,10 +164,9 @@ static gpointer qio_task_thread_worker(gpointer opaque)
  */
 trace_qio_task_thread_exit(task);
 
-idle = g_idle_source_new();
-g_source_set_callback(idle, qio_task_thread_result, data, NULL);
-g_source_attach(idle, task->context);
-task->idle_source = idle;
+qemu_mutex_lock(&task->mutex);
+qio_task_thread_create_complete_job(task);
+qemu_mutex_unlock(&task->mutex);
 
 return NULL;
 }
@@ -149,24 +178,21 @@ void qio_task_run_in_thread(QIOTask *task,
 GDestroyNotify destroy,
 

[Qemu-devel] [PATCH 09/14] qio: basic non-default context support for thread

2018-02-27 Thread Peter Xu
qio_task_run_in_thread() allows the main thread to run blocking
operations in the background.  However, it assumes that it always works
with the default context.  This patch allows the QIO task framework to
run with a non-default gcontext.

There is no functional change so far: the QIOTasks still always run on
the main context.

Signed-off-by: Peter Xu 
---
 include/io/task.h|  6 --
 io/channel-socket.c  |  9 ++---
 io/dns-resolver.c|  3 ++-
 io/task.c| 28 ++--
 tests/test-io-task.c |  2 ++
 5 files changed, 40 insertions(+), 8 deletions(-)

diff --git a/include/io/task.h b/include/io/task.h
index 6021f51336..9dbe3758d7 100644
--- a/include/io/task.h
+++ b/include/io/task.h
@@ -227,15 +227,17 @@ QIOTask *qio_task_new(Object *source,
  * @worker: the function to invoke in a thread
  * @opaque: opaque data to pass to @worker
  * @destroy: function to free @opaque
+ * @context: the context to run the complete hook
  *
  * Run a task in a background thread. When @worker
  * returns it will call qio_task_complete() in
- * the main event thread context.
+ * the event thread context that provided.
  */
 void qio_task_run_in_thread(QIOTask *task,
 QIOTaskWorker worker,
 gpointer opaque,
-GDestroyNotify destroy);
+GDestroyNotify destroy,
+GMainContext *context);
 
 /**
  * qio_task_complete:
diff --git a/io/channel-socket.c b/io/channel-socket.c
index 563e297357..4224ce323a 100644
--- a/io/channel-socket.c
+++ b/io/channel-socket.c
@@ -187,7 +187,8 @@ void qio_channel_socket_connect_async(QIOChannelSocket *ioc,
 qio_task_run_in_thread(task,
qio_channel_socket_connect_worker,
addrCopy,
-   (GDestroyNotify)qapi_free_SocketAddress);
+   (GDestroyNotify)qapi_free_SocketAddress,
+   NULL);
 }
 
 
@@ -245,7 +246,8 @@ void qio_channel_socket_listen_async(QIOChannelSocket *ioc,
 qio_task_run_in_thread(task,
qio_channel_socket_listen_worker,
addrCopy,
-   (GDestroyNotify)qapi_free_SocketAddress);
+   (GDestroyNotify)qapi_free_SocketAddress,
+   NULL);
 }
 
 
@@ -321,7 +323,8 @@ void qio_channel_socket_dgram_async(QIOChannelSocket *ioc,
 qio_task_run_in_thread(task,
qio_channel_socket_dgram_worker,
data,
-   qio_channel_socket_dgram_worker_free);
+   qio_channel_socket_dgram_worker_free,
+   NULL);
 }
 
 
diff --git a/io/dns-resolver.c b/io/dns-resolver.c
index c072d121c3..75c2ca9c4a 100644
--- a/io/dns-resolver.c
+++ b/io/dns-resolver.c
@@ -233,7 +233,8 @@ void qio_dns_resolver_lookup_async(QIODNSResolver *resolver,
 qio_task_run_in_thread(task,
qio_dns_resolver_lookup_worker,
data,
-   qio_dns_resolver_lookup_data_free);
+   qio_dns_resolver_lookup_data_free,
+   NULL);
 }
 
 
diff --git a/io/task.c b/io/task.c
index 1a0a1c7185..204c0be286 100644
--- a/io/task.c
+++ b/io/task.c
@@ -32,6 +32,10 @@ struct QIOTask {
 Error *err;
 gpointer result;
 GDestroyNotify destroyResult;
+
+/* Threaded QIO task specific fields */
+GSource *idle_source;  /* The idle task to run complete routine */
+GMainContext *context; /* The context that idle task will run with */
 };
 
 
@@ -49,6 +53,7 @@ QIOTask *qio_task_new(Object *source,
 task->func = func;
 task->opaque = opaque;
 task->destroy = destroy;
+task->idle_source = NULL;
 
 trace_qio_task_new(task, source, func, opaque);
 
@@ -66,6 +71,12 @@ static void qio_task_free(QIOTask *task)
 if (task->err) {
 error_free(task->err);
 }
+if (task->idle_source) {
+g_source_unref(task->idle_source);
+}
+if (task->context) {
+g_main_context_unref(task->context);
+}
 object_unref(task->source);
 
 g_free(task);
@@ -100,6 +111,8 @@ static gboolean qio_task_thread_result(gpointer opaque)
 static gpointer qio_task_thread_worker(gpointer opaque)
 {
 struct QIOTaskThreadData *data = opaque;
+QIOTask *task = data->task;
+GSource *idle;
 
 trace_qio_task_thread_run(data->task);
 data->worker(data->task, data->opaque);
@@ -110,7 +123,12 @@ static gpointer qio_task_thread_worker(gpointer opaque)
  * the worker results
  */
 trace_qio_task_thread_exit(data->task);
-g_idle_add(qio_task_thread_result, data);
+
+idle = g_idle_source_new();
+g_source_set_callback(idle, qio_task_thread_result, data, NULL);
+g_source_attach(idle, task->con

[Qemu-devel] [PATCH 08/14] chardev: allow telnet gsource to switch gcontext

2018-02-27 Thread Peter Xu
The telnet GSource was originally created by qio_channel_add_watch(), so
it was always attached to the main context.  Now we use the new API
qio_channel_add_watch_full(), so that we get the GSource handle rather
than the tag ID.

Meanwhile, cache the gsource in SocketChardev.telnet_source so that we
can also switch contexts dynamically when the read handlers are updated.

Signed-off-by: Peter Xu 
---
 chardev/char-socket.c | 28 
 1 file changed, 24 insertions(+), 4 deletions(-)

diff --git a/chardev/char-socket.c b/chardev/char-socket.c
index 8f0935cd15..a16d894c40 100644
--- a/chardev/char-socket.c
+++ b/chardev/char-socket.c
@@ -59,6 +59,7 @@ typedef struct {
 bool is_listen;
 bool is_telnet;
 bool is_tn3270;
+GSource *telnet_source;
 
 GSource *reconnect_timer;
 int64_t reconnect_time;
@@ -69,6 +70,7 @@ typedef struct {
 OBJECT_CHECK(SocketChardev, (obj), TYPE_CHARDEV_SOCKET)
 
 static gboolean socket_reconnect_timeout(gpointer opaque);
+static void tcp_chr_telnet_init(Chardev *chr);
 
 static void tcp_chr_reconn_timer_cancel(SocketChardev *s)
 {
@@ -555,6 +557,15 @@ static void tcp_chr_connect(void *opaque)
 qemu_chr_be_event(chr, CHR_EVENT_OPENED);
 }
 
+static void tcp_chr_telnet_destroy(SocketChardev *s)
+{
+if (s->telnet_source) {
+g_source_destroy(s->telnet_source);
+g_source_unref(s->telnet_source);
+s->telnet_source = NULL;
+}
+}
+
 static void tcp_chr_update_read_handler(Chardev *chr)
 {
 SocketChardev *s = SOCKET_CHARDEV(chr);
@@ -568,6 +579,11 @@ static void tcp_chr_update_read_handler(Chardev *chr)
 qio_net_listener_set_context(s->listener, chr->gcontext);
 }
 
+if (s->telnet_source) {
+tcp_chr_telnet_destroy(s);
+tcp_chr_telnet_init(CHARDEV(s));
+}
+
 if (!s->connected) {
 return;
 }
@@ -592,6 +608,7 @@ static gboolean tcp_chr_telnet_init_io(QIOChannel *ioc,
gpointer user_data)
 {
 TCPChardevTelnetInit *init = user_data;
+SocketChardev *s = SOCKET_CHARDEV(init->chr);
 ssize_t ret;
 
 ret = qio_channel_write(ioc, init->buf, init->buflen, NULL);
@@ -616,6 +633,8 @@ static gboolean tcp_chr_telnet_init_io(QIOChannel *ioc,
 
 end:
 g_free(init);
+g_source_unref(s->telnet_source);
+s->telnet_source = NULL;
 return G_SOURCE_REMOVE;
 }
 
@@ -655,10 +674,10 @@ static void tcp_chr_telnet_init(Chardev *chr)
 
 #undef IACSET
 
-qio_channel_add_watch(
-s->ioc, G_IO_OUT,
-tcp_chr_telnet_init_io,
-init, NULL);
+s->telnet_source = qio_channel_add_watch_full(s->ioc, G_IO_OUT,
+  tcp_chr_telnet_init_io,
+  init, NULL,
+  chr->gcontext);
 }
 
 
@@ -831,6 +850,7 @@ static void char_socket_finalize(Object *obj)
 tcp_chr_free_connection(chr);
 tcp_chr_reconn_timer_cancel(s);
 qapi_free_SocketAddress(s->addr);
+tcp_chr_telnet_destroy(s);
 if (s->listener) {
 qio_net_listener_set_client_func(s->listener, NULL, NULL, NULL);
 object_unref(OBJECT(s->listener));
-- 
2.14.3




[Qemu-devel] [PATCH 04/14] migration: let incoming side use thread context

2018-02-27 Thread Peter Xu
The old incoming migration is running in main thread and default
gcontext.  With the new qio_channel_add_watch_full() we can now let it
run in the thread's own gcontext (if there is one).

On its own this patch changes nothing.  But when any incoming migration
runs in another iothread (e.g., with the upcoming migrate-recover
command), this patch will bind the incoming logic to that iothread
instead of the main thread (which may already have page-faulted and
hung).

RDMA is not considered for now since it's not even using the QIO APIs at
all.

CC: Juan Quintela 
CC: Dr. David Alan Gilbert 
CC: Laurent Vivier 
Signed-off-by: Peter Xu 
---
 migration/exec.c   | 11 ++-
 migration/fd.c | 11 ++-
 migration/socket.c | 12 +++-
 3 files changed, 19 insertions(+), 15 deletions(-)

diff --git a/migration/exec.c b/migration/exec.c
index 0bc5a427dd..f401fc005e 100644
--- a/migration/exec.c
+++ b/migration/exec.c
@@ -55,6 +55,7 @@ void exec_start_incoming_migration(const char *command, Error 
**errp)
 {
 QIOChannel *ioc;
 const char *argv[] = { "/bin/sh", "-c", command, NULL };
+GSource *source;
 
 trace_migration_exec_incoming(command);
 ioc = QIO_CHANNEL(qio_channel_command_new_spawn(argv,
@@ -65,9 +66,9 @@ void exec_start_incoming_migration(const char *command, Error 
**errp)
 }
 
 qio_channel_set_name(ioc, "migration-exec-incoming");
-qio_channel_add_watch(ioc,
-  G_IO_IN,
-  exec_accept_incoming_migration,
-  NULL,
-  NULL);
+source = qio_channel_add_watch_full(ioc, G_IO_IN,
+exec_accept_incoming_migration,
+NULL, NULL,
+g_main_context_get_thread_default());
+g_source_unref(source);
 }
diff --git a/migration/fd.c b/migration/fd.c
index cd06182d1e..9c593eb3ff 100644
--- a/migration/fd.c
+++ b/migration/fd.c
@@ -55,6 +55,7 @@ void fd_start_incoming_migration(const char *infd, Error 
**errp)
 {
 QIOChannel *ioc;
 int fd;
+GSource *source;
 
 fd = strtol(infd, NULL, 0);
 trace_migration_fd_incoming(fd);
@@ -66,9 +67,9 @@ void fd_start_incoming_migration(const char *infd, Error 
**errp)
 }
 
 qio_channel_set_name(QIO_CHANNEL(ioc), "migration-fd-incoming");
-qio_channel_add_watch(ioc,
-  G_IO_IN,
-  fd_accept_incoming_migration,
-  NULL,
-  NULL);
+source = qio_channel_add_watch_full(ioc, G_IO_IN,
+fd_accept_incoming_migration,
+NULL, NULL,
+g_main_context_get_thread_default());
+g_source_unref(source);
 }
diff --git a/migration/socket.c b/migration/socket.c
index e090097077..82c330083c 100644
--- a/migration/socket.c
+++ b/migration/socket.c
@@ -164,6 +164,7 @@ static void socket_start_incoming_migration(SocketAddress 
*saddr,
 Error **errp)
 {
 QIOChannelSocket *listen_ioc = qio_channel_socket_new();
+GSource *source;
 
 qio_channel_set_name(QIO_CHANNEL(listen_ioc),
  "migration-socket-listener");
@@ -173,11 +174,12 @@ static void socket_start_incoming_migration(SocketAddress 
*saddr,
 return;
 }
 
-qio_channel_add_watch(QIO_CHANNEL(listen_ioc),
-  G_IO_IN,
-  socket_accept_incoming_migration,
-  listen_ioc,
-  (GDestroyNotify)object_unref);
+source = qio_channel_add_watch_full(QIO_CHANNEL(listen_ioc), G_IO_IN,
+socket_accept_incoming_migration,
+listen_ioc,
+(GDestroyNotify)object_unref,
+g_main_context_get_thread_default());
+g_source_unref(source);
 }
 
 void tcp_start_incoming_migration(const char *host_port, Error **errp)
-- 
2.14.3




[Qemu-devel] [PATCH 07/14] qio/chardev: update net listener gcontext

2018-02-27 Thread Peter Xu
TCP chardevs can be using QIO network listeners working in the
background when in listening mode.  However the network listeners are
always running in main context.  This can race with chardevs that are
running in non-main contexts.

To solve this, first introduce qio_net_listener_set_context() to allow
the caller to set the gcontext for network listeners.  Then call it in
tcp_chr_update_read_handler(), with the newly cached gcontext.

It's fairly straightforward after we have introduced some net listener
helper functions - basically we unregister the GSources and add them
back with the correct context.

Signed-off-by: Peter Xu 
---
 chardev/char-socket.c |  9 +
 include/io/net-listener.h | 12 
 io/net-listener.c |  7 +++
 3 files changed, 28 insertions(+)

diff --git a/chardev/char-socket.c b/chardev/char-socket.c
index 43a2cc2c1c..8f0935cd15 100644
--- a/chardev/char-socket.c
+++ b/chardev/char-socket.c
@@ -559,6 +559,15 @@ static void tcp_chr_update_read_handler(Chardev *chr)
 {
 SocketChardev *s = SOCKET_CHARDEV(chr);
 
+if (s->listener) {
+/*
+ * It's possible that chardev context is changed in
+ * qemu_chr_be_update_read_handlers().  Reset it for QIO net
+ * listener if there is.
+ */
+qio_net_listener_set_context(s->listener, chr->gcontext);
+}
+
 if (!s->connected) {
 return;
 }
diff --git a/include/io/net-listener.h b/include/io/net-listener.h
index 566be283b3..39dede9d6f 100644
--- a/include/io/net-listener.h
+++ b/include/io/net-listener.h
@@ -106,6 +106,18 @@ int qio_net_listener_open_sync(QIONetListener *listener,
SocketAddress *addr,
Error **errp);
 
+/**
+ * qio_net_listener_set_context:
+ * @listener: the net listener object
+ * @context: the context that we'd like to bind the sources to
+ *
+ * This helper does not do anything but moves existing net listener
+ * sources from the old one to the new one.  It can be seen as a
+ * no-operation if there is no listening source at all.
+ */
+void qio_net_listener_set_context(QIONetListener *listener,
+  GMainContext *context);
+
 /**
  * qio_net_listener_add:
  * @listener: the network listener object
diff --git a/io/net-listener.c b/io/net-listener.c
index 7f07a81fed..7ffad72f55 100644
--- a/io/net-listener.c
+++ b/io/net-listener.c
@@ -145,6 +145,13 @@ static void qio_net_listener_sources_update(QIONetListener 
*listener,
 }
 }
 
+void qio_net_listener_set_context(QIONetListener *listener,
+  GMainContext *context)
+{
+qio_net_listener_sources_clear(listener);
+qio_net_listener_sources_update(listener, context);
+}
+
 void qio_net_listener_add(QIONetListener *listener,
   QIOChannelSocket *sioc)
 {
-- 
2.14.3




[Qemu-devel] [PATCH 14/14] qio/chardev: specify gcontext for TLS handshake

2018-02-27 Thread Peter Xu
We allow the TLS code to be run with non-default gcontext by providing a
new qio_channel_tls_handshake_full() API.

With the new API, we can re-setup the TLS handshake GSource by calling
it again with the correct gcontext.  Any call to the function will clean
up existing GSource tasks, and re-setup using the new gcontext.

Signed-off-by: Peter Xu 
---
 chardev/char-socket.c| 30 +---
 include/io/channel-tls.h | 22 +++-
 io/channel-tls.c | 91 
 3 files changed, 123 insertions(+), 20 deletions(-)

diff --git a/chardev/char-socket.c b/chardev/char-socket.c
index 164a64ff34..406d33c04f 100644
--- a/chardev/char-socket.c
+++ b/chardev/char-socket.c
@@ -72,6 +72,9 @@ typedef struct {
 
 static gboolean socket_reconnect_timeout(gpointer opaque);
 static void tcp_chr_telnet_init(Chardev *chr);
+static void tcp_chr_tls_handshake_setup(Chardev *chr,
+QIOChannelTLS *tioc,
+GMainContext *context);
 
 static void tcp_chr_reconn_timer_cancel(SocketChardev *s)
 {
@@ -570,6 +573,7 @@ static void tcp_chr_telnet_destroy(SocketChardev *s)
 static void tcp_chr_update_read_handler(Chardev *chr)
 {
 SocketChardev *s = SOCKET_CHARDEV(chr);
+QIOChannelTLS *tioc;
 
 if (s->listener) {
 /*
@@ -589,6 +593,17 @@ static void tcp_chr_update_read_handler(Chardev *chr)
 qio_task_context_set(s->thread_task, chr->gcontext);
 }
 
+tioc = (QIOChannelTLS *)object_dynamic_cast(OBJECT(s->ioc),
+TYPE_QIO_CHANNEL_TLS);
+if (tioc) {
+/*
+ * TLS session enabled; reconfigure things up.  Note that, if
+ * there is existing handshake task, it'll be cleaned up first
+ * in QIO code.
+ */
+tcp_chr_tls_handshake_setup(chr, tioc, chr->gcontext);
+}
+
 if (!s->connected) {
 return;
 }
@@ -704,6 +719,16 @@ static void tcp_chr_tls_handshake(QIOTask *task,
 }
 }
 
+static void tcp_chr_tls_handshake_setup(Chardev *chr,
+QIOChannelTLS *tioc,
+GMainContext *context)
+{
+qio_channel_tls_handshake_full(tioc,
+   tcp_chr_tls_handshake,
+   chr,
+   NULL,
+   context);
+}
 
 static void tcp_chr_tls_init(Chardev *chr)
 {
@@ -736,10 +761,7 @@ static void tcp_chr_tls_init(Chardev *chr)
 object_unref(OBJECT(s->ioc));
 s->ioc = QIO_CHANNEL(tioc);
 
-qio_channel_tls_handshake(tioc,
-  tcp_chr_tls_handshake,
-  chr,
-  NULL);
+tcp_chr_tls_handshake_setup(chr, tioc, NULL);
 }
 
 
diff --git a/include/io/channel-tls.h b/include/io/channel-tls.h
index d157eb10e8..98b7cd1e51 100644
--- a/include/io/channel-tls.h
+++ b/include/io/channel-tls.h
@@ -48,6 +48,9 @@ struct QIOChannelTLS {
 QIOChannel parent;
 QIOChannel *master;
 QCryptoTLSSession *session;
+GMainContext *context;
+GSource *tls_source;
+QIOTask *task;
 };
 
 /**
@@ -111,11 +114,12 @@ qio_channel_tls_new_client(QIOChannel *master,
Error **errp);
 
 /**
- * qio_channel_tls_handshake:
+ * qio_channel_tls_handshake_full:
  * @ioc: the TLS channel object
  * @func: the callback to invoke when completed
  * @opaque: opaque data to pass to @func
  * @destroy: optional callback to free @opaque
+ * @context: the context that will run the handshake task
  *
  * Perform the TLS session handshake. This method
  * will return immediately and the handshake will
@@ -123,6 +127,22 @@ qio_channel_tls_new_client(QIOChannel *master,
  * loop is running. When the handshake is complete,
  * or fails, the @func callback will be invoked.
  */
+void qio_channel_tls_handshake_full(QIOChannelTLS *ioc,
+QIOTaskFunc func,
+gpointer opaque,
+GDestroyNotify destroy,
+GMainContext *context);
+
+/**
+ * qio_channel_tls_handshake:
+ * @ioc: the TLS channel object
+ * @func: the callback to invoke when completed
+ * @opaque: opaque data to pass to @func
+ * @destroy: optional callback to free @opaque
+ *
+ * Wrapper of qio_channel_tls_handshake_full(), only that we are
+ * running the handshake always on default main context.
+ */
 void qio_channel_tls_handshake(QIOChannelTLS *ioc,
QIOTaskFunc func,
gpointer opaque,
diff --git a/io/channel-tls.c b/io/channel-tls.c
index 6182702dab..b173680526 100644
--- a/io/channel-tls.c
+++ b/io/channel-tls.c
@@ -145,8 +145,12 @@ static gboolean qio_channel_tls_handshake_io(QIOChannel 
*ioc,
  GIO

[Qemu-devel] [PATCH 00/14] qio: general non-default GMainContext support

2018-02-27 Thread Peter Xu
This is more preparation work for the monitor OOB series.

This series tries to allow QIO code to run with non-default
GMainContext.  Note that for most places I kept the old code
untouched, and only modified/introduced new interfaces where there can
be a non-default GMainContext.  The "where" is mostly migration and
chardev submodules, since these two parts of code can be run with
monitor IOThread in the future (which holds a non-default
GMainContext).

These are the known existing issues to be solved, all involving
GSources that are bound to the main thread:

- migration
  - incoming side: still always running on main context, while we need
    to be able to run some command in OOB thread [1]
- tcp chardev (non-tcp chardevs should all support non-NULL context now)
  - server listening mode: QIO net listener used [2]
  - TELNET session: an isolated GSource used (tcp_chr_telnet_init) [3]
  - when "reconnect=N" is used, QIO threaded task is used [4]
  - TLS session: QIO tls handshake is used (tcp_chr_tls_init) [5]

Patches 1-2 are cleanups and fixes.

Patch 3 introduces qio_channel_add_watch_full(), which is the core API
for QIO to support non-default context.

Patch 4 fixes the migration usage of QIO, which is problem [1] above.

Patches 5-7 fix the net listeners to use non-default gcontext, which
solves problem [2] above.

Patch 8 fixes the TELNET GSource, which solves problem [3].

Patches 9-13 fix the threaded QIOTask usage, which is for problem [4].

Patch 14 fixes the last TLS usage, which is problem [5].

The whole series passes "make check", which includes quite a few QIO
tests.  Let's see whether this approach is acceptable before doing more
tests.

Please review.  Thanks.

Peter Xu (14):
  chardev: fix leak in tcp_chr_telnet_init_io()
  qio: rename qio_task_thread_result
  qio: introduce qio_channel_add_watch_full()
  migration: let incoming side use thread context
  qio: refactor net listener source operations
  qio: store gsources for net listeners
  qio/chardev: update net listener gcontext
  chardev: allow telnet gsource to switch gcontext
  qio: basic non-default context support for thread
  qio: refcount QIOTask
  qio/chardev: return QIOTask when connect async
  qio: move QIOTaskThreadData into QIOTask
  qio: allow threaded qiotask to switch contexts
  qio/chardev: specify gcontext for TLS handshake

 chardev/char-socket.c   | 114 ++---
 include/io/channel-socket.h |  14 +++--
 include/io/channel-tls.h|  22 ++-
 include/io/channel.h|  31 -
 include/io/net-listener.h   |  33 +-
 include/io/task.h   |  10 ++-
 io/channel-socket.c |  21 ---
 io/channel-tls.c|  91 ++-
 io/channel.c|  24 +--
 io/dns-resolver.c   |   3 +-
 io/net-listener.c   | 119 +--
 io/task.c   | 149 
 migration/exec.c|  11 ++--
 migration/fd.c  |  11 ++--
 migration/socket.c  |  12 ++--
 tests/test-io-task.c|   2 +
 16 files changed, 514 insertions(+), 153 deletions(-)

-- 
2.14.3




[Qemu-devel] [PATCH 03/14] qio: introduce qio_channel_add_watch_full()

2018-02-27 Thread Peter Xu
It's a more powerful version of qio_channel_add_watch(), which supports
non-default gcontext.  The body is split out of the old function, and
the old function now uses g_source_get_id() to fetch the tag ID, so the
old interface is kept.

Note that the new API returns a gsource and keeps a reference to it, so
callers need to unref it explicitly.
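
The ownership rule can be illustrated with a toy model in plain C.  It
is a sketch under assumptions, not GLib or QEMU code: ToySource,
ToyContext and the toy_* functions are hypothetical, and the model only
assumes what GLib documents, namely that attaching a source to a context
makes the context hold its own reference.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

static int toy_live_sources; /* number of sources not yet freed */

/* Hypothetical stand-in for GSource. */
typedef struct {
    int refcount;
    unsigned int id;
} ToySource;

/* Hypothetical stand-in for GMainContext holding at most one source. */
typedef struct {
    ToySource *attached;
    unsigned int next_id;
} ToyContext;

static void toy_source_unref(ToySource *s)
{
    if (--s->refcount == 0) {
        free(s);
        toy_live_sources--;
    }
}

/* Models the _full variant: returns a source the caller must unref. */
static ToySource *toy_add_watch_full(ToyContext *ctx)
{
    ToySource *s = calloc(1, sizeof(*s));
    assert(s != NULL);
    s->refcount = 1;          /* the reference handed back to the caller */
    toy_live_sources++;
    s->refcount++;            /* attaching: the context takes its own ref */
    s->id = ++ctx->next_id;
    ctx->attached = s;
    return s;
}

/* Models the old interface: return only the tag, drop the caller's ref. */
static unsigned int toy_add_watch(ToyContext *ctx)
{
    ToySource *s = toy_add_watch_full(ctx);
    unsigned int id = s->id;
    toy_source_unref(s);
    return id;
}

/* Models removing the source from the context (context drops its ref). */
static void toy_context_remove(ToyContext *ctx)
{
    if (ctx->attached) {
        toy_source_unref(ctx->attached);
        ctx->attached = NULL;
    }
}
```

With the tag-returning wrapper, the context holds the only reference, so
removing the source frees it.  With the _full variant the source
outlives removal until the caller drops its own reference, which is why
callers that do not need the handle unref it right away.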

Signed-off-by: Peter Xu 
---
 include/io/channel.h | 31 ---
 io/channel.c | 24 +++-
 2 files changed, 47 insertions(+), 8 deletions(-)

diff --git a/include/io/channel.h b/include/io/channel.h
index 3995e243a3..36af5e58ae 100644
--- a/include/io/channel.h
+++ b/include/io/channel.h
@@ -620,20 +620,45 @@ GSource *qio_channel_create_watch(QIOChannel *ioc,
   GIOCondition condition);
 
 /**
- * qio_channel_add_watch:
+ * qio_channel_add_watch_full:
  * @ioc: the channel object
  * @condition: the I/O condition to monitor
  * @func: callback to invoke when the source becomes ready
  * @user_data: opaque data to pass to @func
  * @notify: callback to free @user_data
+ * @context: gcontext to bind the source to
  *
- * Create a new main loop source that is used to watch
+ * Create a new source that is used to watch
  * for the I/O condition @condition. The callback @func
  * will be registered against the source, to be invoked
  * when the source becomes ready. The optional @user_data
  * will be passed to @func when it is invoked. The @notify
  * callback will be used to free @user_data when the
- * watch is deleted
+ * watch is deleted.  The source will be bound to @context if
+ * provided, or main context if it is NULL.
+ *
+ * Note: if a valid source is returned, we need to explicitly unref
+ * the source to destroy it.
+ *
+ * Returns: the source pointer
+ */
+GSource *qio_channel_add_watch_full(QIOChannel *ioc,
+GIOCondition condition,
+QIOChannelFunc func,
+gpointer user_data,
+GDestroyNotify notify,
+GMainContext *context);
+
+/**
+ * qio_channel_add_watch:
+ * @ioc: the channel object
+ * @condition: the I/O condition to monitor
+ * @func: callback to invoke when the source becomes ready
+ * @user_data: opaque data to pass to @func
+ * @notify: callback to free @user_data
+ *
+ * Wrapper of qio_channel_add_watch_full(), but it'll only bind the
+ * source object to default main context.
  *
  * The returned source ID can be used with g_source_remove()
  * to remove and free the source when no longer required.
diff --git a/io/channel.c b/io/channel.c
index ec4b86de7c..3e734cc9a5 100644
--- a/io/channel.c
+++ b/io/channel.c
@@ -299,6 +299,22 @@ void qio_channel_set_aio_fd_handler(QIOChannel *ioc,
 klass->io_set_aio_fd_handler(ioc, ctx, io_read, io_write, opaque);
 }
 
+GSource *qio_channel_add_watch_full(QIOChannel *ioc,
+GIOCondition condition,
+QIOChannelFunc func,
+gpointer user_data,
+GDestroyNotify notify,
+GMainContext *context)
+{
+GSource *source;
+
+source = qio_channel_create_watch(ioc, condition);
+g_source_set_callback(source, (GSourceFunc)func, user_data, notify);
+g_source_attach(source, context);
+
+return source;
+}
+
 guint qio_channel_add_watch(QIOChannel *ioc,
 GIOCondition condition,
 QIOChannelFunc func,
@@ -308,11 +324,9 @@ guint qio_channel_add_watch(QIOChannel *ioc,
 GSource *source;
 guint id;
 
-source = qio_channel_create_watch(ioc, condition);
-
-g_source_set_callback(source, (GSourceFunc)func, user_data, notify);
-
-id = g_source_attach(source, NULL);
+source = qio_channel_add_watch_full(ioc, condition, func,
+user_data, notify, NULL);
+id = g_source_get_id(source);
 g_source_unref(source);
 
 return id;
-- 
2.14.3




[Qemu-devel] [PATCH 11/14] qio/chardev: return QIOTask when connect async

2018-02-27 Thread Peter Xu
Let qio_channel_socket_connect_async() return the created QIOTask object
for the async connection.  In tcp chardev, cache that in SocketChardev
for further use.  With the QIOTask refcount, this is pretty safe.

While at it, factor out tcp_chr_socket_connect_async(), since the
logic is used in both the initial phase and the reconnect timeout.
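
For illustration, the ownership rule described above can be sketched in
plain C.  This is only a minimal sketch: Task, Owner and every function
name here are hypothetical stand-ins, not QEMU's QIOTask API.  The
starter holds one reference, the caller takes its own when it caches
the task, and the completion callback drops the caller's reference.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a refcounted task; not QEMU's QIOTask. */
typedef struct Task {
    int refcount;
} Task;

/* Hypothetical owner that caches the in-flight task. */
typedef struct Owner {
    Task *pending;
} Owner;

static int live_tasks; /* illustration only: tasks not yet freed */

static Task *task_new(void)
{
    Task *t = calloc(1, sizeof(*t));

    assert(t != NULL);
    t->refcount = 1;            /* reference held by the async machinery */
    live_tasks++;
    return t;
}

static void task_ref(Task *t)
{
    t->refcount++;
}

static void task_unref(Task *t)
{
    assert(t->refcount > 0);
    if (--t->refcount == 0) {
        free(t);
        live_tasks--;
    }
}

/* Start a connection and cache the task with a reference of our own. */
static Task *owner_connect_async(Owner *o)
{
    Task *t;

    assert(o->pending == NULL); /* only one attempt in flight at a time */
    t = task_new();
    task_ref(t);
    o->pending = t;
    return t;
}

/* Completion callback: drop the owner's reference and clear the cache. */
static void owner_connected(Owner *o, Task *t)
{
    assert(o->pending == t);
    task_unref(t);
    o->pending = NULL;
}
```

The two asserts on the cached pointer mirror the idea that at most one
connection attempt is outstanding per owner.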

Signed-off-by: Peter Xu 
---
 chardev/char-socket.c   | 33 ++---
 include/io/channel-socket.h | 14 +-
 io/channel-socket.c | 12 +++-
 3 files changed, 38 insertions(+), 21 deletions(-)

diff --git a/chardev/char-socket.c b/chardev/char-socket.c
index a16d894c40..9d51b8da07 100644
--- a/chardev/char-socket.c
+++ b/chardev/char-socket.c
@@ -64,6 +64,7 @@ typedef struct {
 GSource *reconnect_timer;
 int64_t reconnect_time;
 bool connect_err_reported;
+QIOTask *thread_task;
 } SocketChardev;
 
 #define SOCKET_CHARDEV(obj) \
@@ -879,14 +880,32 @@ static void qemu_chr_socket_connected(QIOTask *task, void 
*opaque)
 tcp_chr_new_client(chr, sioc);
 
 cleanup:
+assert(s->thread_task == task);
+qio_task_unref(task);
+s->thread_task = NULL;
 object_unref(OBJECT(sioc));
 }
 
+static void tcp_chr_socket_connect_async(SocketChardev *s)
+{
+QIOChannelSocket *sioc = qio_channel_socket_new();
+Chardev *chr = CHARDEV(s);
+QIOTask *task;
+
+assert(s->thread_task == NULL);
+
+tcp_chr_set_client_ioc_name(chr, sioc);
+task = qio_channel_socket_connect_async(sioc, s->addr,
+qemu_chr_socket_connected,
+chr, NULL);
+qio_task_ref(task);
+s->thread_task = task;
+}
+
 static gboolean socket_reconnect_timeout(gpointer opaque)
 {
 Chardev *chr = CHARDEV(opaque);
 SocketChardev *s = SOCKET_CHARDEV(opaque);
-QIOChannelSocket *sioc;
 
 g_source_unref(s->reconnect_timer);
 s->reconnect_timer = NULL;
@@ -895,11 +914,7 @@ static gboolean socket_reconnect_timeout(gpointer opaque)
 return false;
 }
 
-sioc = qio_channel_socket_new();
-tcp_chr_set_client_ioc_name(chr, sioc);
-qio_channel_socket_connect_async(sioc, s->addr,
- qemu_chr_socket_connected,
- chr, NULL);
+tcp_chr_socket_connect_async(s);
 
 return false;
 }
@@ -979,11 +994,7 @@ static void qmp_chardev_open_socket(Chardev *chr,
 }
 
 if (s->reconnect_time) {
-sioc = qio_channel_socket_new();
-tcp_chr_set_client_ioc_name(chr, sioc);
-qio_channel_socket_connect_async(sioc, s->addr,
- qemu_chr_socket_connected,
- chr, NULL);
+tcp_chr_socket_connect_async(s);
 } else {
 if (s->is_listen) {
 char *name;
diff --git a/include/io/channel-socket.h b/include/io/channel-socket.h
index 53801f6042..5cfa9e2b7c 100644
--- a/include/io/channel-socket.h
+++ b/include/io/channel-socket.h
@@ -108,12 +108,16 @@ int qio_channel_socket_connect_sync(QIOChannelSocket *ioc,
  * will be invoked on completion or failure. The @addr
  * parameter will be copied, so may be freed as soon
  * as this function returns without waiting for completion.
+ *
+ * Returns the IOTask created.  NOTE: if the caller is going to use
+ * the returned QIOTask, the caller is responsible to reference the
+ * task and unref it when it's not needed any more.
  */
-void qio_channel_socket_connect_async(QIOChannelSocket *ioc,
-  SocketAddress *addr,
-  QIOTaskFunc callback,
-  gpointer opaque,
-  GDestroyNotify destroy);
+QIOTask *qio_channel_socket_connect_async(QIOChannelSocket *ioc,
+  SocketAddress *addr,
+  QIOTaskFunc callback,
+  gpointer opaque,
+  GDestroyNotify destroy);
 
 
 /**
diff --git a/io/channel-socket.c b/io/channel-socket.c
index 4224ce323a..f420502290 100644
--- a/io/channel-socket.c
+++ b/io/channel-socket.c
@@ -169,11 +169,11 @@ static void qio_channel_socket_connect_worker(QIOTask 
*task,
 }
 
 
-void qio_channel_socket_connect_async(QIOChannelSocket *ioc,
-  SocketAddress *addr,
-  QIOTaskFunc callback,
-  gpointer opaque,
-  GDestroyNotify destroy)
+QIOTask *qio_channel_socket_connect_async(QIOChannelSocket *ioc,
+  SocketAddress *addr,
+  QIOTaskFunc callback,
+  gpointer opaque,
+ 

[Qemu-devel] [PATCH 06/14] qio: store gsources for net listeners

2018-02-27 Thread Peter Xu
Originally we were storing the GSource tag IDs.  That will not be enough
if we are going to support a non-default gcontext for QIO code.  Switch to
storing GSources, with no functional change.  For now we still always pass
in NULL, which means the default gcontext.

Signed-off-by: Peter Xu 
---
 include/io/net-listener.h | 21 ++--
 io/net-listener.c | 62 +--
 2 files changed, 58 insertions(+), 25 deletions(-)

diff --git a/include/io/net-listener.h b/include/io/net-listener.h
index 56d6da7a76..566be283b3 100644
--- a/include/io/net-listener.h
+++ b/include/io/net-listener.h
@@ -53,7 +53,7 @@ struct QIONetListener {
 
 char *name;
 QIOChannelSocket **sioc;
-gulong *io_tag;
+GSource **io_source;
 size_t nsioc;
 
 bool connected;
@@ -120,17 +120,34 @@ void qio_net_listener_add(QIONetListener *listener,
   QIOChannelSocket *sioc);
 
 /**
- * qio_net_listener_set_client_func:
+ * qio_net_listener_set_client_func_full:
  * @listener: the network listener object
  * @func: the callback function
  * @data: opaque data to pass to @func
  * @notify: callback to free @data
+ * @context: the context that the sources will be bound to
  *
  * Register @func to be invoked whenever a new client
  * connects to the listener. @func will be invoked
  * passing in the QIOChannelSocket instance for the
  * client.
  */
+void qio_net_listener_set_client_func_full(QIONetListener *listener,
+   QIONetListenerClientFunc func,
+   gpointer data,
+   GDestroyNotify notify,
+   GMainContext *context);
+
+/**
+ * qio_net_listener_set_client_func:
+ * @listener: the network listener object
+ * @func: the callback function
+ * @data: opaque data to pass to @func
+ * @notify: callback to free @data
+ *
+ * Wrapper of qio_net_listener_set_client_func_full(), only that the
+ * sources will always be bound to default main context.
+ */
 void qio_net_listener_set_client_func(QIONetListener *listener,
   QIONetListenerClientFunc func,
   gpointer data,
diff --git a/io/net-listener.c b/io/net-listener.c
index 3e9ac51b0e..7f07a81fed 100644
--- a/io/net-listener.c
+++ b/io/net-listener.c
@@ -106,12 +106,15 @@ int qio_net_listener_open_sync(QIONetListener *listener,
 }
 }
 
-static guint qio_net_listener_source_add(QIONetListener *listener,
- QIOChannelSocket *sioc)
+static GSource *qio_net_listener_source_add(QIONetListener *listener,
+QIOChannelSocket *sioc,
+GMainContext *context)
 {
-return qio_channel_add_watch(QIO_CHANNEL(sioc), G_IO_IN,
- qio_net_listener_channel_func,
- listener, (GDestroyNotify)object_unref);
+return qio_channel_add_watch_full(QIO_CHANNEL(sioc), G_IO_IN,
+  qio_net_listener_channel_func,
+  listener,
+  (GDestroyNotify)object_unref,
+  context);
 }
 
 static void qio_net_listener_sources_clear(QIONetListener *listener)
@@ -119,23 +122,25 @@ static void qio_net_listener_sources_clear(QIONetListener 
*listener)
 size_t i;
 
 for (i = 0; i < listener->nsioc; i++) {
-if (listener->io_tag[i]) {
-g_source_remove(listener->io_tag[i]);
-listener->io_tag[i] = 0;
+if (listener->io_source[i]) {
+g_source_destroy(listener->io_source[i]);
+g_source_unref(listener->io_source[i]);
+listener->io_source[i] = NULL;
 }
 }
 }
 
-static void qio_net_listener_sources_update(QIONetListener *listener)
+static void qio_net_listener_sources_update(QIONetListener *listener,
+GMainContext *context)
 {
 size_t i;
 
 if (listener->io_func != NULL) {
 for (i = 0; i < listener->nsioc; i++) {
-assert(listener->io_tag[i] == 0);
+assert(listener->io_source[i] == NULL);
 object_ref(OBJECT(listener));
-listener->io_tag[i] = qio_net_listener_source_add(
-listener, listener->sioc[i]);
+listener->io_source[i] = qio_net_listener_source_add(
+listener, listener->sioc[i], context);
 }
 }
 }
@@ -151,27 +156,30 @@ void qio_net_listener_add(QIONetListener *listener,
 
 listener->sioc = g_renew(QIOChannelSocket *, listener->sioc,
  listener->nsioc + 1);
-listener->io_tag = g_renew(gulong, listener->io_tag, listener->nsioc + 1);
+listener->io_source = g_renew(typeof(listener->io_source[0]),
+  

[Qemu-devel] [PATCH 01/14] chardev: fix leak in tcp_chr_telnet_init_io()

2018-02-27 Thread Peter Xu
Need to free TCPChardevTelnetInit when the session is established.

While at it, switch to using the G_SOURCE_* macros.
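
As a rough illustration of the pattern, here is a plain C sketch of a
watch-style callback that releases its state at a single exit label.
InitState, init_io_step() and the counter are hypothetical names made
up for this sketch, and the error handling is simplified compared to
the real callback.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-session state; not the real TCPChardevTelnetInit. */
typedef struct InitState {
    char buf[16];
    size_t buflen;
} InitState;

static int freed_states; /* illustration only: counts released states */

static void init_state_free(InitState *init)
{
    free(init);
    freed_states++;
}

static InitState *init_state_new(const char *payload)
{
    InitState *init = calloc(1, sizeof(*init));

    assert(init != NULL);
    init->buflen = strlen(payload);
    assert(init->buflen <= sizeof(init->buf));
    memcpy(init->buf, payload, init->buflen);
    return init;
}

/*
 * One invocation of the callback.  'written' is the result of the write
 * attempt: negative on a fatal error, otherwise the bytes consumed.
 * Returns true to keep the watch, false to remove it.
 */
static bool init_io_step(InitState *init, long written)
{
    if (written < 0) {
        goto end;               /* fatal error: the session is torn down */
    }
    assert((size_t)written <= init->buflen);
    init->buflen -= (size_t)written;
    if (init->buflen == 0) {
        goto end;               /* everything sent: session established */
    }
    memmove(init->buf, init->buf + written, init->buflen);
    return true;

end:
    init_state_free(init);      /* the only place that releases the state */
    return false;
}
```

Every path that removes the watch goes through the same label, so the
state cannot be leaked by a newly added early return.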

Signed-off-by: Peter Xu 
---
 chardev/char-socket.c | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/chardev/char-socket.c b/chardev/char-socket.c
index bdd6cff5f6..43a2cc2c1c 100644
--- a/chardev/char-socket.c
+++ b/chardev/char-socket.c
@@ -591,19 +591,23 @@ static gboolean tcp_chr_telnet_init_io(QIOChannel *ioc,
 ret = 0;
 } else {
 tcp_chr_disconnect(init->chr);
-return FALSE;
+goto end;
 }
 }
 init->buflen -= ret;
 
 if (init->buflen == 0) {
 tcp_chr_connect(init->chr);
-return FALSE;
+goto end;
 }
 
 memmove(init->buf, init->buf + ret, init->buflen);
 
-return TRUE;
+return G_SOURCE_CONTINUE;
+
+end:
+g_free(init);
+return G_SOURCE_REMOVE;
 }
 
 static void tcp_chr_telnet_init(Chardev *chr)
-- 
2.14.3




[Qemu-devel] [PATCH 02/14] qio: rename qio_task_thread_result

2018-02-27 Thread Peter Xu
It is strange that it was called gio_task_thread_result.  Rename it to
follow the naming rule of the file.

Signed-off-by: Peter Xu 
---
 io/task.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/io/task.c b/io/task.c
index 3ce556017c..1a0a1c7185 100644
--- a/io/task.c
+++ b/io/task.c
@@ -80,7 +80,7 @@ struct QIOTaskThreadData {
 };
 
 
-static gboolean gio_task_thread_result(gpointer opaque)
+static gboolean qio_task_thread_result(gpointer opaque)
 {
 struct QIOTaskThreadData *data = opaque;
 
@@ -110,7 +110,7 @@ static gpointer qio_task_thread_worker(gpointer opaque)
  * the worker results
  */
 trace_qio_task_thread_exit(data->task);
-g_idle_add(gio_task_thread_result, data);
+g_idle_add(qio_task_thread_result, data);
 return NULL;
 }
 
-- 
2.14.3




Re: [Qemu-devel] [RFC PATCH 0/5] atapi: change unlimited recursion to while loop

2018-02-27 Thread John Snow


On 02/23/2018 10:26 AM, Paolo Bonzini wrote:
> Real hardware doesn't have an unlimited stack, so the unlimited
> recursion in the ATAPI code smells a bit.  In fact, the call to
> ide_transfer_start easily becomes a tail call with a small change
> to the code (patch 4).  The remaining four patches move code around
> so as to turn the call back to ide_atapi_cmd_reply_end into
> another tail call, and then convert the (double) tail recursion into
> a while loop.
> 
> I'm not sure how this can be tested, apart from adding a READ CD
> test to ahci-test (which I don't really have time for now, hence
> the RFC tag).  The existing AHCI tests still pass, so patches 1-3
> aren't complete crap.
> 
> Paolo
> 
> Paolo Bonzini (5):
>   ide: push call to end_transfer_func out of start_transfer callback
>   ide: push end_transfer callback to ide_transfer_halt
>   ide: make ide_transfer_stop idempotent
>   atapi: call ide_set_irq before ide_transfer_start
>   ide: introduce ide_transfer_start_norecurse
> 
>  hw/ide/ahci.c | 12 +++-
>  hw/ide/atapi.c| 37 -
>  hw/ide/core.c | 37 +++--
>  include/hw/ide/internal.h |  3 +++
>  4 files changed, 53 insertions(+), 36 deletions(-)
> 

ACK receipt, I will get to this soon, sorry!



Re: [Qemu-devel] [PATCH] macio: fix NULL pointer dereference when issuing IDE trim

2018-02-27 Thread John Snow


On 02/26/2018 03:56 AM, Anton Nefedov wrote:
> 
> 
> On 23/2/2018 9:47 PM, Mark Cave-Ayland wrote:
>> Commit ef0e64a983 "ide: pass IDEState to trim AIO callback" changed the
>> IDE trim callback from using a BlockBackend to an IDEState but forgot
>> to update
>> the dma_blk_io() call in hw/ide/macio.c accordingly.
>>
> 
> I somehow missed this whole macio part in that series :(
> 

It's my mistake entirely.

>> Without this fix qemu-system-ppc segfaults when issuing an IDE trim
>> command on
>> any of the PPC Mac machines (easily triggered by running the Debian
>> installer).
>>
>> Reported-by: Howard Spoelstra 
>> Signed-off-by: Mark Cave-Ayland 
> 
> Reviewed-by: Anton Nefedov 
> 
> ..but there should also be a fix-up for
> 947858b "ide: abort TRIM operation for invalid range"
> which apparently lacks a few steps on the invalid range errorpath for
> macio. I'll look into that.
> 

I'm unfortunately a little preoccupied right now, please CC me
(hopefully before 2.12 freeze!) and I'll get this squared away for next
release.

>> ---
>>   hw/ide/macio.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/hw/ide/macio.c b/hw/ide/macio.c
>> index 2e043ef1ea..d3a85cba3b 100644
>> --- a/hw/ide/macio.c
>> +++ b/hw/ide/macio.c
>> @@ -187,7 +187,7 @@ static void pmac_ide_transfer_cb(void *opaque, int
>> ret)
>>   break;
>>   case IDE_DMA_TRIM:
>>   s->bus->dma->aiocb = dma_blk_io(blk_get_aio_context(s->blk),
>> &s->sg,
>> -    offset, 0x1, ide_issue_trim,
>> s->blk,
>> +    offset, 0x1, ide_issue_trim, s,
>>   pmac_ide_transfer_cb, io,
>>   DMA_DIRECTION_TO_DEVICE);
>>   break;
>>
> 

In the meantime, I'm going to stage this for tomorrow so Mark doesn't
have to deal with a broken tree.

--js



[Qemu-devel] [PATCH v2 3/3] tests/bios-tables-test: add test cases for DIMM proximity

2018-02-27 Thread Haozhong Zhang
QEMU now builds one SRAT memory affinity structure for each
static-plugged PC-DIMM and NVDIMM device with the proximity domain
specified in the device option 'node', rather than only one SRAT
memory affinity structure covering the entire hotpluggable address
space with the proximity domain of the last node.

Add test cases on PC and Q35 machines with 3 proximity domains, and
one PC-DIMM and one NVDIMM attached to the second proximity domain.
Check whether the QEMU-built SRAT tables match with the expected ones.

Signed-off-by: Haozhong Zhang 
Suggested-by: Igor Mammedov 
---
 tests/acpi-test-data/pc/APIC.dimmpxm  | Bin 0 -> 136 bytes
 tests/acpi-test-data/pc/DSDT.dimmpxm  | Bin 0 -> 6710 bytes
 tests/acpi-test-data/pc/NFIT.dimmpxm  | Bin 0 -> 224 bytes
 tests/acpi-test-data/pc/SRAT.dimmpxm  | Bin 0 -> 416 bytes
 tests/acpi-test-data/pc/SSDT.dimmpxm  | Bin 0 -> 685 bytes
 tests/acpi-test-data/q35/APIC.dimmpxm | Bin 0 -> 136 bytes
 tests/acpi-test-data/q35/DSDT.dimmpxm | Bin 0 -> 9394 bytes
 tests/acpi-test-data/q35/NFIT.dimmpxm | Bin 0 -> 224 bytes
 tests/acpi-test-data/q35/SRAT.dimmpxm | Bin 0 -> 416 bytes
 tests/acpi-test-data/q35/SSDT.dimmpxm | Bin 0 -> 685 bytes
 tests/bios-tables-test.c  |  33 +
 11 files changed, 33 insertions(+)
 create mode 100644 tests/acpi-test-data/pc/APIC.dimmpxm
 create mode 100644 tests/acpi-test-data/pc/DSDT.dimmpxm
 create mode 100644 tests/acpi-test-data/pc/NFIT.dimmpxm
 create mode 100644 tests/acpi-test-data/pc/SRAT.dimmpxm
 create mode 100644 tests/acpi-test-data/pc/SSDT.dimmpxm
 create mode 100644 tests/acpi-test-data/q35/APIC.dimmpxm
 create mode 100644 tests/acpi-test-data/q35/DSDT.dimmpxm
 create mode 100644 tests/acpi-test-data/q35/NFIT.dimmpxm
 create mode 100644 tests/acpi-test-data/q35/SRAT.dimmpxm
 create mode 100644 tests/acpi-test-data/q35/SSDT.dimmpxm

diff --git a/tests/acpi-test-data/pc/APIC.dimmpxm 
b/tests/acpi-test-data/pc/APIC.dimmpxm
new file mode 100644
index 
..658d7e748e37540ff85a02f4391efc7eaae3c8b4
GIT binary patch
literal 136
[base85-encoded binary payload omitted: garbled in the archive]

literal 0
HcmV?d1

diff --git a/tests/acpi-test-data/pc/NFIT.dimmpxm 
b/tests/acpi-test-data/pc/NFIT.dimmpxm
new file mode 100644
index 
..207328ed9357409e4098df64f951a29527ab52de
GIT binary patch
literal 224
zcmeZs^9*^wz`(#5;Ny0aC=#03w0rG8iy0L6|`OtRNOx9x8-HL3Fb)1OQdFH828oB7<-f
NAqGZ>^~k~m*#MLQ69E7K

literal 0
HcmV?d1

diff --git a/tests/acpi-test-data/pc/SRAT.dimmpxm 
b/tests/acpi-test-data/pc/SRAT.dimmpxm
new file mode 100644
index 
000

[Qemu-devel] [PATCH v2 0/3] hw/acpi-build: build SRAT memory affinity structures for DIMM devices

2018-02-27 Thread Haozhong Zhang
ACPI 6.2A Table 5-129 "SPA Range Structure" requires that the proximity
domain of an NVDIMM SPA range match the corresponding entry in the
SRAT table.

The address ranges of vNVDIMM in QEMU are allocated from the
hot-pluggable address space, which is entirely covered by one SRAT
memory affinity structure. However, users can set the vNVDIMM
proximity domain in NFIT SPA range structure by the 'node' property of
'-device nvdimm' to a value different than the one in the above SRAT
memory affinity structure.

In order to solve such proximity domain mismatch, this patch builds
one SRAT memory affinity structure for each static-plugged DIMM device,
including both PC-DIMM and NVDIMM, with the proximity domain specified
in '-device pc-dimm' or '-device nvdimm'.

The remaining hot-pluggable address space is covered by one or multiple
SRAT memory affinity structures with the proximity domain of the last
node as before.
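
To make the resulting layout concrete, here is a minimal sketch in plain
C of how the hot-pluggable space gets split.  It is not the QEMU code:
Dimm, Range and split_hotplug_space() are hypothetical names, the SRAT
flags are left out, and the DIMM list is assumed to be sorted by
address, non-overlapping and fully inside the region.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one plugged DIMM. */
typedef struct Dimm {
    uint64_t addr;
    uint64_t size;
    int node;
} Dimm;

/* Hypothetical record standing in for one SRAT memory affinity entry. */
typedef struct Range {
    uint64_t base;
    uint64_t len;
    int node;
} Range;

/*
 * Split [base, base + len) into one range per DIMM plus filler ranges
 * that carry default_node.  Returns the number of ranges written to out[].
 */
static size_t split_hotplug_space(uint64_t base, uint64_t len,
                                  const Dimm *dimms, size_t ndimms,
                                  int default_node,
                                  Range *out, size_t max_out)
{
    uint64_t end = base + len;
    size_t n = 0;
    size_t i;

    for (i = 0; i < ndimms; i++) {
        if (base < dimms[i].addr) {
            /* Gap before this DIMM keeps the default proximity domain. */
            assert(n < max_out);
            out[n++] = (Range){ base, dimms[i].addr - base, default_node };
        }
        /* The DIMM itself uses the node given on its -device option. */
        assert(n < max_out);
        out[n++] = (Range){ dimms[i].addr, dimms[i].size, dimms[i].node };
        base = dimms[i].addr + dimms[i].size;
    }
    if (base < end) {
        /* Whatever is left after the last DIMM is covered as before. */
        assert(n < max_out);
        out[n++] = (Range){ base, end - base, default_node };
    }
    return n;
}
```

With no DIMM plugged the function emits a single range for the whole
space, which matches the old behaviour.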


Changes in v2:
 * Build SRAT memory affinity structures of PC-DIMM devices as well.
 * Add test cases.


Haozhong Zhang (3):
  hw/acpi-build: build SRAT memory affinity structures for DIMM devices
  tests/bios-tables-test: allow setting extra machine options
  tests/bios-tables-test: add test cases for DIMM proximity

 hw/i386/acpi-build.c  |  50 --
 hw/mem/pc-dimm.c  |   8 
 include/hw/mem/pc-dimm.h  |  10 +
 tests/acpi-test-data/pc/APIC.dimmpxm  | Bin 0 -> 136 bytes
 tests/acpi-test-data/pc/DSDT.dimmpxm  | Bin 0 -> 6710 bytes
 tests/acpi-test-data/pc/NFIT.dimmpxm  | Bin 0 -> 224 bytes
 tests/acpi-test-data/pc/SRAT.dimmpxm  | Bin 0 -> 416 bytes
 tests/acpi-test-data/pc/SSDT.dimmpxm  | Bin 0 -> 685 bytes
 tests/acpi-test-data/q35/APIC.dimmpxm | Bin 0 -> 136 bytes
 tests/acpi-test-data/q35/DSDT.dimmpxm | Bin 0 -> 9394 bytes
 tests/acpi-test-data/q35/NFIT.dimmpxm | Bin 0 -> 224 bytes
 tests/acpi-test-data/q35/SRAT.dimmpxm | Bin 0 -> 416 bytes
 tests/acpi-test-data/q35/SSDT.dimmpxm | Bin 0 -> 685 bytes
 tests/bios-tables-test.c  |  78 +++---
 14 files changed, 126 insertions(+), 20 deletions(-)
 create mode 100644 tests/acpi-test-data/pc/APIC.dimmpxm
 create mode 100644 tests/acpi-test-data/pc/DSDT.dimmpxm
 create mode 100644 tests/acpi-test-data/pc/NFIT.dimmpxm
 create mode 100644 tests/acpi-test-data/pc/SRAT.dimmpxm
 create mode 100644 tests/acpi-test-data/pc/SSDT.dimmpxm
 create mode 100644 tests/acpi-test-data/q35/APIC.dimmpxm
 create mode 100644 tests/acpi-test-data/q35/DSDT.dimmpxm
 create mode 100644 tests/acpi-test-data/q35/NFIT.dimmpxm
 create mode 100644 tests/acpi-test-data/q35/SRAT.dimmpxm
 create mode 100644 tests/acpi-test-data/q35/SSDT.dimmpxm

-- 
2.14.1




[Qemu-devel] [PATCH v2 2/3] tests/bios-tables-test: allow setting extra machine options

2018-02-27 Thread Haozhong Zhang
Some test cases may require machine options beyond those used in the
current test_acpi_one(), e.g., nvdimm test cases require the machine
option 'nvdimm=on'.
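
As an aside, the command line can also be assembled in a single pass.
The following is a sketch in plain C with snprintf() rather than GLib;
build_test_args() is a hypothetical helper, not part of the test suite.
Formatting into one buffer means there are no intermediate strings to
free.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: format the whole command line into buf.
 * extra_machine_opts and params may be NULL.  Returns the length
 * written, or -1 if buf is too small.
 */
static int build_test_args(char *buf, size_t size, const char *machine,
                           const char *extra_machine_opts,
                           const char *params, const char *disk)
{
    int n = snprintf(buf, size,
                     "-machine %s,accel=%s,kernel-irqchip=off%s%s "
                     "-net none -display none %s "
                     "-drive id=hd0,if=none,file=%s,format=raw "
                     "-device ide-hd,drive=hd0 ",
                     machine, "kvm:tcg",
                     /* Only emit the comma when there is something to add. */
                     extra_machine_opts ? "," : "",
                     extra_machine_opts ? extra_machine_opts : "",
                     params ? params : "", disk);

    if (n < 0 || (size_t)n >= size) {
        return -1;              /* encoding error or truncated output */
    }
    return n;
}
```

Passing NULL for the extra options yields the same -machine string as
before the change.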

Signed-off-by: Haozhong Zhang 
---
 tests/bios-tables-test.c | 45 +
 1 file changed, 29 insertions(+), 16 deletions(-)

diff --git a/tests/bios-tables-test.c b/tests/bios-tables-test.c
index 65b271a173..d45181aa51 100644
--- a/tests/bios-tables-test.c
+++ b/tests/bios-tables-test.c
@@ -654,17 +654,22 @@ static void test_smbios_structs(test_data *data)
 }
 }
 
-static void test_acpi_one(const char *params, test_data *data)
+static void test_acpi_one(const char *extra_machine_opts,
+  const char *params, test_data *data)
 {
 char *args;
 
 /* Disable kernel irqchip to be able to override apic irq0. */
-args = g_strdup_printf("-machine %s,accel=%s,kernel-irqchip=off "
+args = g_strdup_printf("-machine %s,accel=%s,kernel-irqchip=off",
+   data->machine, "kvm:tcg");
+if (extra_machine_opts) {
+args = g_strdup_printf("%s,%s", args, extra_machine_opts);
+}
+args = g_strdup_printf("%s "
"-net none -display none %s "
"-drive id=hd0,if=none,file=%s,format=raw "
"-device ide-hd,drive=hd0 ",
-   data->machine, "kvm:tcg",
-   params ? params : "", disk);
+   args, params ? params : "", disk);
 
 qtest_start(args);
 
@@ -711,7 +716,7 @@ static void test_acpi_piix4_tcg(void)
 data.machine = MACHINE_PC;
 data.required_struct_types = base_required_struct_types;
 data.required_struct_types_len = ARRAY_SIZE(base_required_struct_types);
-test_acpi_one(NULL, &data);
+test_acpi_one(NULL, NULL, &data);
 free_test_data(&data);
 }
 
@@ -724,7 +729,7 @@ static void test_acpi_piix4_tcg_bridge(void)
 data.variant = ".bridge";
 data.required_struct_types = base_required_struct_types;
 data.required_struct_types_len = ARRAY_SIZE(base_required_struct_types);
-test_acpi_one("-device pci-bridge,chassis_nr=1", &data);
+test_acpi_one(NULL, "-device pci-bridge,chassis_nr=1", &data);
 free_test_data(&data);
 }
 
@@ -736,7 +741,7 @@ static void test_acpi_q35_tcg(void)
 data.machine = MACHINE_Q35;
 data.required_struct_types = base_required_struct_types;
 data.required_struct_types_len = ARRAY_SIZE(base_required_struct_types);
-test_acpi_one(NULL, &data);
+test_acpi_one(NULL, NULL, &data);
 free_test_data(&data);
 }
 
@@ -749,7 +754,7 @@ static void test_acpi_q35_tcg_bridge(void)
 data.variant = ".bridge";
 data.required_struct_types = base_required_struct_types;
 data.required_struct_types_len = ARRAY_SIZE(base_required_struct_types);
-test_acpi_one("-device pci-bridge,chassis_nr=1",
+test_acpi_one(NULL, "-device pci-bridge,chassis_nr=1",
   &data);
 free_test_data(&data);
 }
@@ -761,7 +766,8 @@ static void test_acpi_piix4_tcg_cphp(void)
 memset(&data, 0, sizeof(data));
 data.machine = MACHINE_PC;
 data.variant = ".cphp";
-test_acpi_one("-smp 2,cores=3,sockets=2,maxcpus=6"
+test_acpi_one(NULL,
+  "-smp 2,cores=3,sockets=2,maxcpus=6"
   " -numa node -numa node"
   " -numa dist,src=0,dst=1,val=21",
   &data);
@@ -775,7 +781,8 @@ static void test_acpi_q35_tcg_cphp(void)
 memset(&data, 0, sizeof(data));
 data.machine = MACHINE_Q35;
 data.variant = ".cphp";
-test_acpi_one(" -smp 2,cores=3,sockets=2,maxcpus=6"
+test_acpi_one(NULL,
+  " -smp 2,cores=3,sockets=2,maxcpus=6"
   " -numa node -numa node"
   " -numa dist,src=0,dst=1,val=21",
   &data);
@@ -795,7 +802,8 @@ static void test_acpi_q35_tcg_ipmi(void)
 data.variant = ".ipmibt";
 data.required_struct_types = ipmi_required_struct_types;
 data.required_struct_types_len = ARRAY_SIZE(ipmi_required_struct_types);
-test_acpi_one("-device ipmi-bmc-sim,id=bmc0"
+test_acpi_one(NULL,
+  "-device ipmi-bmc-sim,id=bmc0"
   " -device isa-ipmi-bt,bmc=bmc0",
   &data);
 free_test_data(&data);
@@ -813,7 +821,8 @@ static void test_acpi_piix4_tcg_ipmi(void)
 data.variant = ".ipmikcs";
 data.required_struct_types = ipmi_required_struct_types;
 data.required_struct_types_len = ARRAY_SIZE(ipmi_required_struct_types);
-test_acpi_one("-device ipmi-bmc-sim,id=bmc0"
+test_acpi_one(NULL,
+  "-device ipmi-bmc-sim,id=bmc0"
   " -device isa-ipmi-kcs,irq=0,bmc=bmc0",
   &data);
 free_test_data(&data);
@@ -826,7 +835,8 @@ static void test_acpi_q35_tcg_memhp(void)
 memset(&data, 0, sizeof(data));
 data.machine = MACHINE_Q35;
 data.variant = ".memhp";

[Qemu-devel] [PATCH v2 1/3] hw/acpi-build: build SRAT memory affinity structures for DIMM devices

2018-02-27 Thread Haozhong Zhang
ACPI 6.2A Table 5-129 "SPA Range Structure" requires that the proximity
domain of an NVDIMM SPA range match the corresponding entry in the
SRAT table.

The address ranges of vNVDIMM in QEMU are allocated from the
hot-pluggable address space, which is entirely covered by one SRAT
memory affinity structure. However, users can set the vNVDIMM
proximity domain in NFIT SPA range structure by the 'node' property of
'-device nvdimm' to a value different than the one in the above SRAT
memory affinity structure.

In order to solve such proximity domain mismatch, this patch builds
one SRAT memory affinity structure for each static-plugged DIMM device,
including both PC-DIMM and NVDIMM, with the proximity domain specified
in '-device pc-dimm' or '-device nvdimm'.

The remaining hot-pluggable address space is covered by one or multiple
SRAT memory affinity structures with the proximity domain of the last
node as before.

Signed-off-by: Haozhong Zhang 
---
 hw/i386/acpi-build.c | 50 
 hw/mem/pc-dimm.c |  8 
 include/hw/mem/pc-dimm.h | 10 ++
 3 files changed, 64 insertions(+), 4 deletions(-)

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index deb440f286..a88de06d8f 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -2323,6 +2323,49 @@ build_tpm2(GArray *table_data, BIOSLinker *linker, 
GArray *tcpalog)
 #define HOLE_640K_START  (640 * 1024)
 #define HOLE_640K_END   (1024 * 1024)
 
+static void build_srat_hotpluggable_memory(GArray *table_data, uint64_t base,
+   uint64_t len, int default_node)
+{
+GSList *dimms = pc_dimm_get_device_list();
+GSList *ent = dimms;
+PCDIMMDevice *dev;
+Object *obj;
+uint64_t end = base + len, addr, size;
+int node;
+AcpiSratMemoryAffinity *numamem;
+
+while (base < end) {
+numamem = acpi_data_push(table_data, sizeof *numamem);
+
+if (!ent) {
+build_srat_memory(numamem, base, end - base, default_node,
+  MEM_AFFINITY_HOTPLUGGABLE | 
MEM_AFFINITY_ENABLED);
+break;
+}
+
+dev = PC_DIMM(ent->data);
+obj = OBJECT(dev);
+addr = object_property_get_uint(obj, PC_DIMM_ADDR_PROP, NULL);
+size = object_property_get_uint(obj, PC_DIMM_SIZE_PROP, NULL);
+node = object_property_get_uint(obj, PC_DIMM_NODE_PROP, NULL);
+
+if (base < addr) {
+build_srat_memory(numamem, base, addr - base, default_node,
+  MEM_AFFINITY_HOTPLUGGABLE | 
MEM_AFFINITY_ENABLED);
+numamem = acpi_data_push(table_data, sizeof *numamem);
+}
+build_srat_memory(numamem, addr, size, node,
+  MEM_AFFINITY_HOTPLUGGABLE | MEM_AFFINITY_ENABLED |
+  (object_dynamic_cast(obj, TYPE_NVDIMM) ?
+   MEM_AFFINITY_NON_VOLATILE : 0));
+
+base = addr + size;
+ent = g_slist_next(ent);
+}
+
+g_slist_free(dimms);
+}
+
 static void
 build_srat(GArray *table_data, BIOSLinker *linker, MachineState *machine)
 {
@@ -2434,10 +2477,9 @@ build_srat(GArray *table_data, BIOSLinker *linker, 
MachineState *machine)
  * providing _PXM method if necessary.
  */
 if (hotplugabble_address_space_size) {
-numamem = acpi_data_push(table_data, sizeof *numamem);
-build_srat_memory(numamem, pcms->hotplug_memory.base,
-  hotplugabble_address_space_size, pcms->numa_nodes - 
1,
-  MEM_AFFINITY_HOTPLUGGABLE | MEM_AFFINITY_ENABLED);
+build_srat_hotpluggable_memory(table_data, pcms->hotplug_memory.base,
+   hotplugabble_address_space_size,
+   pcms->numa_nodes - 1);
 }
 
 build_header(linker, table_data,
diff --git a/hw/mem/pc-dimm.c b/hw/mem/pc-dimm.c
index 6e74b61cb6..9fd901e87a 100644
--- a/hw/mem/pc-dimm.c
+++ b/hw/mem/pc-dimm.c
@@ -276,6 +276,14 @@ static int pc_dimm_built_list(Object *obj, void *opaque)
 return 0;
 }
 
+GSList *pc_dimm_get_device_list(void)
+{
+GSList *list = NULL;
+
+object_child_foreach(qdev_get_machine(), pc_dimm_built_list, &list);
+return list;
+}
+
 uint64_t pc_dimm_get_free_addr(uint64_t address_space_start,
uint64_t address_space_size,
uint64_t *hint, uint64_t align, uint64_t size,
diff --git a/include/hw/mem/pc-dimm.h b/include/hw/mem/pc-dimm.h
index d83b957829..4cf5cc49e9 100644
--- a/include/hw/mem/pc-dimm.h
+++ b/include/hw/mem/pc-dimm.h
@@ -100,4 +100,14 @@ void pc_dimm_memory_plug(DeviceState *dev, 
MemoryHotplugState *hpms,
  MemoryRegion *mr, uint64_t align, Error **errp);
 void pc_dimm_memory_unplug(DeviceState *dev, MemoryHotplugState *hpms,
MemoryRegion *mr);
+
+/*
+ * Return a list of DeviceState of pc-d

[Qemu-devel] [PATCH v3] scripts/checkpatch.pl: add check for `while` and `for`

2018-02-27 Thread Su Hang
Add a check for `while` and `for` statements whose condition spans more
than one line.

Previously, checkpatch.pl only checked whether an `if` statement with a
multi-line condition was missing braces around its block, like this:
'''
if (cond1 ||
cond2)
statement;
'''
But it did not do the same check for `for` and `while` statements.

Use `(?:...)` instead of `(...)` in the regex patterns, because
`(?:...)` is faster and avoids unwanted side effects.

Suggested-by: Stefan Hajnoczi 
Suggested-by: Eric Blake 
Suggested-by: Thomas Huth 
Signed-off-by: Su Hang 
---
 scripts/checkpatch.pl | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index 10c138344fa9..bed1dbbd54d1 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -2352,9 +2352,9 @@ sub process {
}
}
 
-# check for missing bracing round if etc
-   if ($line =~ /(^.*)\b(for|while|if)\b/ &&
-   $line !~ /\#\s*(for|while|if)/) {
+# check for missing bracing around if etc
+   if ($line =~ /(^.*)\b(?:for|while|if)\b/ &&
+   $line !~ /\#\s*(?:for|while|if)/) {
my ($level, $endln, @chunks) =
ctx_statement_full($linenr, $realcnt, 1);
 if ($dbg_adv_apw) {
-- 
2.7.4




Re: [Qemu-devel] [PATCH 00/11] macio: remove legacy macio_init() function

2018-02-27 Thread David Gibson
On Fri, Feb 23, 2018 at 02:51:48PM +, Mark Cave-Ayland wrote:
> On 23/02/18 12:11, no-re...@patchew.org wrote:
> 
> > Hi,
> > 
> > This series failed build test on s390x host. Please find the details below.
> > 
> > Type: series
> > Message-id: 20180219181922.21586-1-mark.cave-ayl...@ilande.co.uk
> > Subject: [Qemu-devel] [PATCH 00/11] macio: remove legacy
> > macio_init() function

[snip]
> >CC  ppc-linux-user/accel/stubs/whpx-stub.o
> >CC  ppc-linux-user/accel/stubs/kvm-stub.o
> >CC  ppc-linux-user/accel/tcg/tcg-runtime.o
> >CC  ppc-linux-user/accel/tcg/tcg-runtime-gvec.o
> >CC  ppc-linux-user/accel/tcg/cpu-exec.o
> >CC  ppc-linux-user/accel/tcg/cpu-exec-common.o
> >CC  ppc-linux-user/accel/tcg/translate-all.o
> >CC  ppc-linux-user/accel/tcg/translator.o
> >CC  ppc-linux-user/accel/tcg/user-exec.o
> >CC  ppc-linux-user/accel/tcg/user-exec-stub.o
> >CC  ppc-linux-user/linux-user/main.o
> >CC  ppc-linux-user/linux-user/syscall.o
> >CC  ppc-linux-user/linux-user/strace.o
> >CC  ppc-linux-user/linux-user/mmap.o
> >CC  ppc-linux-user/linux-user/signal.o
> >CC  ppc-linux-user/linux-user/elfload.o
> >CC  ppc-linux-user/linux-user/linuxload.o
> >CC  ppc-linux-user/linux-user/uaccess.o
> >CC  ppc-linux-user/linux-user/uname.o
> >CCASppc-linux-user/linux-user/safe-syscall.o
> >CC  ppc-linux-user/target/ppc/cpu-models.o
> >CC  ppc-linux-user/target/ppc/cpu.o
> >CC  ppc-linux-user/target/ppc/translate.o
> >CC  ppc-linux-user/target/ppc/kvm-stub.o
> >CC  ppc-linux-user/target/ppc/dfp_helper.o
> > In file included from 
> > /var/tmp/patchew-tester-tmp-ob5ouqpf/src/include/hw/qdev.h:4:0,
> >   from 
> > /var/tmp/patchew-tester-tmp-ob5ouqpf/src/include/hw/sysbus.h:6,
> >   from 
> > /var/tmp/patchew-tester-tmp-ob5ouqpf/src/include/hw/ppc/openpic.h:5,
> >   from 
> > /var/tmp/patchew-tester-tmp-ob5ouqpf/src/target/ppc/kvm-stub.c:15:
> > /var/tmp/patchew-tester-tmp-ob5ouqpf/src/include/hw/hw.h:6:2: error: #error 
> > Cannot include hw/hw.h from user emulation
> >   #error Cannot include hw/hw.h from user emulation
> >^
> > In file included from 
> > /var/tmp/patchew-tester-tmp-ob5ouqpf/src/target/ppc/kvm-stub.c:15:0:
> > /var/tmp/patchew-tester-tmp-ob5ouqpf/src/include/hw/ppc/openpic.h:146:18: 
> > error: field ‘mem’ has incomplete type
> >   MemoryRegion mem;
> >^~~
> > /var/tmp/patchew-tester-tmp-ob5ouqpf/src/include/hw/ppc/openpic.h:163:18: 
> > error: array type has incomplete element type ‘MemoryRegion {aka struct 
> > MemoryRegion}’
> >   MemoryRegion sub_io_mem[6];
> >^~
> > make[1]: *** [/var/tmp/patchew-tester-tmp-ob5ouqpf/src/rules.mak:66: 
> > target/ppc/kvm-stub.o] Error 1
> > make[1]: *** Waiting for unfinished jobs
> > make: *** [Makefile:404: subdir-ppc64-linux-user] Error 2
> > make: *** [Makefile:404: subdir-ppc-linux-user] Error 2
> > === OUTPUT END ===
> > 
> > Test command exited with code: 2
> 
> Oh that's fun - it seems that kvm-stub.c includes hw/ppc/openpic.h in order
> to make use of kvm_openpic_connect_vcpu() which is why this is tripping up.
> 
> David, any idea what the right solution is here?

Not off the top of my head.

> I could perhaps split the
> KVM-specific parts of openpic.h into a separate hw/ppc/openpic_kvm.h file.
> Then again it feels a bit like kvm_openpic_connect_vcpu() doesn't live in
> the right place.

Both of those seem like plausible solutions.
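
A rough sketch of the first option, assuming a separate header (called
hw/ppc/openpic_kvm.h here, as Mark suggests) that only forward-declares the
types it needs, so that the stub no longer has to pull in hw/hw.h. The
typedefs below stand in for QEMU's own, and the prototype is written from
memory, so it should be checked against the tree.

```c
#include <errno.h>
#include <stddef.h>

/* --- would live in the (hypothetical) hw/ppc/openpic_kvm.h --- */
/* Opaque forward declarations stand in for QEMU's typedefs here. */
typedef struct DeviceState DeviceState;
typedef struct CPUState CPUState;

int kvm_openpic_connect_vcpu(DeviceState *d, CPUState *cs);

/* --- would live in target/ppc/kvm-stub.c --- */
/* Stub used when KVM support is not built in: always reports failure. */
int kvm_openpic_connect_vcpu(DeviceState *d, CPUState *cs)
{
    (void)d;
    (void)cs;
    return -EINVAL;
}
```

Because the header never needs the full definition of either type, it can be
included from user-emulation code without triggering the #error above.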

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [Qemu-devel] [Qemu-block] [PATCH] vl: introduce vm_shutdown()

2018-02-27 Thread Fam Zheng
On Tue, 02/27 15:30, Stefan Hajnoczi wrote:
> On Fri, Feb 23, 2018 at 04:20:44PM +0800, Fam Zheng wrote:
> > On Tue, 02/20 13:10, Stefan Hajnoczi wrote:
> > > 1. virtio_scsi_handle_cmd_vq() racing with iothread_stop_all() hits the
> > >virtio_scsi_ctx_check() assertion failure because the BDS AioContext
> > >has been modified by iothread_stop_all().
> > 
> > > Does this patch fix the issue completely? IIUC virtio_scsi_handle_cmd can
> > > already be entered at the time of main thread calling
> > > virtio_scsi_clear_aio(), so this race condition still exists:
> > > 
> > >   main thread                     iothread
> > > --------------------------------------------------------------------
> > >   vm_shutdown
> > >     ...
> > >       virtio_bus_stop_ioeventfd
> > >         virtio_scsi_dataplane_stop
> > >                                   aio_poll()
> > >                                     ...
> > >                                     virtio_scsi_data_plane_handle_cmd()
> > >           aio_context_acquire(s->ctx)
> > >                                       virtio_scsi_acquire(s).enter
> > >           virtio_scsi_clear_aio()
> > >           aio_context_release(s->ctx)
> > >                                       virtio_scsi_acquire(s).return
> > >                                       virtio_scsi_handle_cmd_vq()
> > >                                         ...
> > >                                           virtqueue_pop()
> > > 
> > > Is it possible that the above virtqueue_pop() still returns one element
> > > that was queued before vm_shutdown() was called?
> 
> No, it can't because virtio_scsi_clear_aio() invokes
> virtio_queue_host_notifier_aio_read(&vq->host_notifier) to process the
> virtqueue.  By the time we get back to iothread's
> virtio_scsi_data_plane_handle_cmd() the virtqueue is already empty.
> 
> Vcpus have been paused so no additional elements can slip into the
> virtqueue.

So there is:

static void virtio_queue_host_notifier_aio_read(EventNotifier *n)
{
    VirtQueue *vq = container_of(n, VirtQueue, host_notifier);
    if (event_notifier_test_and_clear(n)) {
        virtio_queue_notify_aio_vq(vq);
    }
}

The guest kicks after adding an element to the VQ, but we check the ioeventfd
before trying virtqueue_pop(). Is that a problem? If VCPUs are paused after
enqueuing but before kicking the VQ, the ioeventfd is not set, so the
virtqueue is not processed here.
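
To make the concern concrete, here is a toy model of the ordering in
question. It is only a sketch: the `toy_` types and functions are invented
for illustration and merely mimic the shape of the function quoted above,
with a boolean standing in for the ioeventfd.

```c
#include <stdbool.h>

/* Toy stand-ins, not QEMU's real structures. */
struct toy_vq {
    int avail;      /* elements the guest has enqueued */
    bool kicked;    /* models the ioeventfd being signalled */
};

/* Guest side: enqueue one element, optionally followed by the kick. */
void toy_guest_enqueue(struct toy_vq *vq, bool kick)
{
    vq->avail++;
    if (kick) {
        vq->kicked = true;
    }
}

/* Models event_notifier_test_and_clear(): report and reset the flag. */
bool toy_test_and_clear(struct toy_vq *vq)
{
    bool was_set = vq->kicked;

    vq->kicked = false;
    return was_set;
}

/* Models the notifier read handler: drain the queue only if kicked. */
int toy_host_notifier_read(struct toy_vq *vq)
{
    int processed = 0;

    if (toy_test_and_clear(vq)) {
        processed = vq->avail;
        vq->avail = 0;
    }
    return processed;
}
```

In this model, an element enqueued without a kick stays in the queue after
toy_host_notifier_read() returns, which is the situation asked about.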

Fam




[Qemu-devel] [PATCH] PowerPC: Add TM bits into msr_mask

2018-02-27 Thread wei . guo . simon
From: Simon Guo 

During migration, cpu_post_load() uses msr_mask to determine which
PPC MSR bits are synced to the target side. Hardware Transactional
Memory (HTM) has been supported since Power8. This patch adds the TM/TS
bits to msr_mask for Power8, so that transactional applications can be
migrated between QEMU instances.

Signed-off-by: Simon Guo 
---
 target/ppc/translate_init.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/target/ppc/translate_init.c b/target/ppc/translate_init.c
index 55c99c9..a438721 100644
--- a/target/ppc/translate_init.c
+++ b/target/ppc/translate_init.c
@@ -8689,6 +8689,9 @@ POWERPC_FAMILY(POWER8)(ObjectClass *oc, void *data)
 (1ull << MSR_DR) |
 (1ull << MSR_PMM) |
 (1ull << MSR_RI) |
+(1ull << MSR_TM) |
+(1ull << MSR_TS0) |
+(1ull << MSR_TS1) |
 (1ull << MSR_LE);
 pcc->mmu_model = POWERPC_MMU_2_07;
 #if defined(CONFIG_SOFTMMU)
-- 
1.8.3.1
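
As a simplified illustration of why the mask matters, the sketch below
treats the mask as a plain filter on the incoming MSR value. The real
cpu_post_load() logic is more involved, and the `TOY_` bit positions are
assumptions made for this example; they should be checked against
target/ppc/cpu.h.

```c
#include <stdint.h>

/* Bit positions assumed for this sketch only. */
#define TOY_MSR_TS0 34
#define TOY_MSR_TS1 33
#define TOY_MSR_TM  32
#define TOY_MSR_LE  0

/* Simplified model: only bits present in the mask survive the load. */
uint64_t toy_filter_msr(uint64_t incoming_msr, uint64_t msr_mask)
{
    return incoming_msr & msr_mask;
}

/* A mask without the transactional memory bits (heavily abbreviated). */
uint64_t toy_mask_without_tm(void)
{
    return 1ull << TOY_MSR_LE;
}

/* The same mask with TM, TS0 and TS1 added, as the patch does. */
uint64_t toy_mask_with_tm(void)
{
    return toy_mask_without_tm() |
           (1ull << TOY_MSR_TM) |
           (1ull << TOY_MSR_TS0) |
           (1ull << TOY_MSR_TS1);
}
```

With the smaller mask, a guest that was inside a transaction loses its TM
and TS state across the filter; with the extended mask the value passes
through unchanged.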




Re: [Qemu-devel] [PATCH 0/2] spapr: fix migration of old guests

2018-02-27 Thread David Gibson
On Tue, Feb 27, 2018 at 04:22:40PM +0100, Greg Kurz wrote:
> Recent VSMT work broke migration of old guests as reported in this BZ:
> 
> https://bugzilla.redhat.com/show_bug.cgi?id=1549087
> 
> Patch 1 fixes the issue, while patch 2 is a tentative code reorg to
> ensure VSMT is set before anyone tries to use spapr->vsmt.

Nice job tracking this down.  Applied to ppc-for-2.12.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [Qemu-devel] [patches] Re: [PULL] RISC-V QEMU Port Submission

2018-02-27 Thread Michael Clark
On Wed, Feb 28, 2018 at 5:00 AM, Igor Mammedov  wrote:

> On Tue, 27 Feb 2018 14:01:05 +
> Peter Maydell  wrote:
>
> > On 27 February 2018 at 00:15, Michael Clark  wrote:
> > > -BEGIN PGP SIGNED MESSAGE-
> > > Hash: SHA1
> > >
> > > The following changes since commit 0a773d55ac76c5aa89ed9187a3bc5af8c5c2a6d0:
> > >
> > >   maintainers: Add myself as a OpenBSD maintainer (2018-02-23 12:05:07 +)
> > >
> > > are available in the git repository at:
> > >
> > >   https://github.com/riscv/riscv-qemu.git tags/riscv-qemu-upstream-v7
> > >
> > > for you to fetch changes up to 170a9d412ca1eb3b7ae6f6c1ff86dcbdff0fd7a8:
> > >
> > >   RISC-V Build Infrastructure (2018-02-27 11:09:43 +1300)
> > >
> > > - 
> > > QEMU RISC-V Emulation Support (RV64GC, RV32GC)
> >
> > Hi; thanks for this pull request. Unfortunately it seems to
> > be missing Signed-off-by: tags. Every commit needs to have
> > the Signed-off-by: tags from the people who contributed code to
> > it, indicating that they're OK with the code going into QEMU.
> > (If the work was done by and copyright a company then you don't
> > need to provide signoffs from every person at the company who
> > worked on the code if you don't want to.)
> >
> > > The spike_v1.9
> > > machine has been renamed to spike_v1.9.1 to match the privileged ISA
> > > version and spike_v1.10 has been made the default machine.
> >
> > I'm confused about this. Generally QEMU boards should model
> > hardware, and the board shouldn't care about the ISA versions.
> > Versioned board names in QEMU generally follow _QEMU_'s versioning,
> > and indicate that a board is identical to whatever we modelled
> > in that earlier QEMU version, for VM migration compatibility.
> > Board renames for minor ISA version bumps sounds like there's going
> > to be a lot of churn and breakage -- is this stuff really ready?
> > (Also, should we really have two different board source files
> > for two different ISA versions? I would have expected these to
> > share a source file to share code.)
> >
> > I did a test build and there are some compile errors:
> >
> > /home/pm215/qemu/linux-user/main.c:38:24: fatal error: target_elf.h:
> > No such file or directory
> >  #include "target_elf.h"
> > ^
> > compilation terminated.
> >
> > This is because your patchset has a clash with commit 542ca4349878a2e
> > which has just merged to master, and refactors out an ifdef ladder,
> > so now all targets supporting linux-user need to provide a
> > linux-user/$ARCH/target_elf.h file. Could you fix that up and rebase,
> > please?
> Also, '[PATCH v7 03/23] RISC-V CPU Core Definition' still hasn't addressed
> comment http://lists.gnu.org/archive/html/qemu-devel/2018-02/msg06412.html
> which hasn't been fixed since it was first pointed out (v4).
>
> I'd prefer that to be fixed before merge, so that other people
> won't have to clean it up after the original authors
> when they try to generalize the cpu_type -> cpu_model conversion.
>

I re-read the email and it doesn't seem clear what you want us to do. I
changed the CPU suffix to a prefix as you requested. The rest of the CPU
initialisation is "modelled" on arm not sh4.

If you want to make a pull request, please use this branch:

- https://github.com/riscv/riscv-qemu/tree/qemu-upstream-v7


Re: [Qemu-devel] [PULL] RISC-V QEMU Port Submission

2018-02-27 Thread Michael Clark
On Wed, Feb 28, 2018 at 3:01 AM, Peter Maydell 
wrote:

> On 27 February 2018 at 00:15, Michael Clark  wrote:
> > -BEGIN PGP SIGNED MESSAGE-
> > Hash: SHA1
> >
> > The following changes since commit 0a773d55ac76c5aa89ed9187a3bc5af8c5c2a6d0:
> >
> >   maintainers: Add myself as a OpenBSD maintainer (2018-02-23 12:05:07 +)
> >
> > are available in the git repository at:
> >
> >   https://github.com/riscv/riscv-qemu.git tags/riscv-qemu-upstream-v7
> >
> > for you to fetch changes up to 170a9d412ca1eb3b7ae6f6c1ff86dcbdff0fd7a8:
> >
> >   RISC-V Build Infrastructure (2018-02-27 11:09:43 +1300)
> >
> > - 
> > QEMU RISC-V Emulation Support (RV64GC, RV32GC)
>
> Hi; thanks for this pull request. Unfortunately it seems to
> be missing Signed-off-by: tags. Every commit needs to have
> the Signed-off-by: tags from the people who contributed code to
> it, indicating that they're OK with the code going into QEMU.
> (If the work was done by and copyright a company then you don't
> need to provide signoffs from every person at the company who
> worked on the code if you don't want to.)
>
> > The spike_v1.9
> > machine has been renamed to spike_v1.9.1 to match the privileged ISA
> > version and spike_v1.10 has been made the default machine.
>
> I'm confused about this. Generally QEMU boards should model
> hardware, and the board shouldn't care about the ISA versions.
> Versioned board names in QEMU generally follow _QEMU_'s versioning,
> and indicate that a board is identical to whatever we modelled
> in that earlier QEMU version, for VM migration compatibility.
> Board renames for minor ISA version bumps sounds like there's going
> to be a lot of churn and breakage -- is this stuff really ready?
> (Also, should we really have two different board source files
> for two different ISA versions? I would have expected these to
> share a source file to share code.)
>
> I did a test build and there are some compile errors:
>
> /home/pm215/qemu/linux-user/main.c:38:24: fatal error: target_elf.h:
> No such file or directory
>  #include "target_elf.h"
> ^
> compilation terminated.
>
> This is because your patchset has a clash with commit 542ca4349878a2e
> which has just merged to master, and refactors out an ifdef ladder,
> so now all targets supporting linux-user need to provide a
> linux-user/$ARCH/target_elf.h file. Could you fix that up and rebase,
> please?
>

No worries. I'll rebase and submit a v8 patch series very soon.
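
For reference, a sketch of what a linux-user/riscv/target_elf.h could look
like after the refactoring Peter mentions. This assumes the per-target
header only has to provide a cpu_get_model() helper that maps ELF flags to a
CPU model name, and that "any" is an acceptable model for RISC-V; both
points should be verified against the other targets on current master.

```c
#include <stdint.h>

#ifndef RISCV_TARGET_ELF_H
#define RISCV_TARGET_ELF_H

/* Assumed interface: pick a CPU model name from the ELF header flags. */
static inline const char *cpu_get_model(uint32_t eflags)
{
    (void)eflags;   /* the flags are not consulted in this sketch */
    return "any";
}

#endif
```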

I've just discussed with SiFive, and they wish to remove a couple of
machines and devices from the v8 patch series. They want to get the chip
model, SOC and board model right before they submit them upstream.

This is fine, as most folk seem to want to use "virt" to run Linux and we
have the "spike" machines that match the RISC-V Foundation ISA Simulator.

SiFive's boards are for customers that are using their MCUs without MMUs.
We can wait until they are fully baked before we submit them. The machines
are quite easy to maintain at the file level, so keeping them in downstream
repos won't cause much trouble. Other RISC-V vendors probably want to
submit their boards and core models at some point too, if they choose to
support QEMU...


Re: [Qemu-devel] [patches] Re: [PULL] RISC-V QEMU Port Submission

2018-02-27 Thread Michael Clark
On Wed, Feb 28, 2018 at 6:50 AM, Peter Maydell 
wrote:

> On 27 February 2018 at 15:50, Stef O'Rear  wrote:
> > On Tue, Feb 27, 2018 at 6:01 AM, Peter Maydell 
> wrote:
> >> On 27 February 2018 at 00:15, Michael Clark  wrote:
> >>> The spike_v1.9
> >>> machine has been renamed to spike_v1.9.1 to match the privileged ISA
> >>> version and spike_v1.10 has been made the default machine.
> >>
> >> I'm confused about this. Generally QEMU boards should model
> >> hardware, and the board shouldn't care about the ISA versions.
> >
> > The spike boards model the Berkeley architectural simulator "spike"
> > (https://github.com/riscv/riscv-isa-sim), which does not have a formal
> > release process or version numbers so we are using the ISA version as
> > a proxy for spike's version.
> >
> > When physical boards are released with full documentation I presume we
> > will be adding board definitions for them, and they will imply
> > specific ISA versions.
> >
> >> Versioned board names in QEMU generally follow _QEMU_'s versioning,
> >> and indicate that a board is identical to whatever we modelled
> >> in that earlier QEMU version, for VM migration compatibility.
> >
> > In this case we're handling two logically distinct boards.  We could
> > combine them and implement a parameter; I was having trouble finding a
> > suitable example to follow earlier but it looks like gic-version in
> > hw/arm/virt.c is one.  This seems like a bad thing to change this late
> > in the review though?
>
> You don't need to make them one board with a command line option
> if that doesn't suit -- for instance hw/arm/vexpress.c defines
> multiple board models that are variants on each other and
> share a lot of code. That said, see below...
>
> >> Board renames for minor ISA version bumps sounds like there's going
> >> to be a lot of churn and breakage -- is this stuff really ready?
> >> (Also, should we really have two different board source files
> >> for two different ISA versions? I would have expected these to
> >> share a source file to share code.)
> >
> > 1.10 is the version we have committed to long term support for; it
> > matches all public hardware and the upstream Linux port, so it seems
> > appropriate to use for QEMU.
> >
> > 1.9.1 was the version supported by riscv-qemu at the time Michael
> > Clark took over maintainership; we have not removed support for it
> > because we cannot prove that there is nobody depending on it, although
> > I do not use it myself and do not know anyone else who does, so I
> > would not personally object to removing it if that were required.
>
> I would rather not have stray legacy old versions in QEMU just
> because we think maybe somebody might be using them. If 1.10
> is the long-term-support committed version, then I think we
> should just have a model of that. Anybody who for some reason is
> still stuck on an older unsupported version gets to find out
> what "unsupported" means; they can always keep using whatever
> old QEMU code base they've been using up til now, presumably.
>

SiFive are happy to support privileged ISA v1.9.1.

I don't think the branch we maintain will easily merge with a branch that
has privileged ISA v1.9.1 torn out (the only version that actually worked 4
months ago).

If we can't submit our port with privileged ISA v1.9.1 support then that's
going to put a big spanner in the works.

We've made a pretty strong choice to not break backwards compatibility
going forward and privileged ISA v1.9.1 is the line in the sand so to
speak. i.e. the QEMU port is still compatible with binaries from the v1.9.1
ISA spec published in November 2016 which has been implemented by many
folk. We have to have a much more reasonable deprecation period. Software
such as GDB and OpenOCD continue to support privileged ISA v1.9.1 and have
specific fallback code paths, as well as the OpenOCD port having support
for two versions of the debug spec.
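
For what it's worth, keeping both versions does not have to mean two board
source files. The "variants sharing code" arrangement Peter describes for
hw/arm/vexpress.c could look roughly like the sketch below for the two spike
machines. The types and the lookup helper are invented for illustration and
are not QEMU's machine API.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical privileged ISA selector; names are illustrative only. */
enum toy_priv_version {
    TOY_PRIV_1_09_1,
    TOY_PRIV_1_10,
};

struct toy_machine {
    const char *name;
    enum toy_priv_version priv;
    int is_default;
};

/* One table in one source file: the variants differ only in their data. */
static const struct toy_machine toy_spike_machines[] = {
    { "spike_v1.9.1", TOY_PRIV_1_09_1, 0 },
    { "spike_v1.10",  TOY_PRIV_1_10,   1 },
};

/* Look a machine up by name; a NULL name selects the default machine. */
const struct toy_machine *toy_find_machine(const char *name)
{
    size_t n = sizeof(toy_spike_machines) / sizeof(toy_spike_machines[0]);
    size_t i;

    for (i = 0; i < n; i++) {
        const struct toy_machine *m = &toy_spike_machines[i];

        if (name ? strcmp(m->name, name) == 0 : m->is_default) {
            return m;
        }
    }
    return NULL;
}
```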


Re: [Qemu-devel] [patches] Re: [PULL] RISC-V QEMU Port Submission

2018-02-27 Thread Michael Clark
On Wed, Feb 28, 2018 at 4:50 AM, Stef O'Rear  wrote:

> On Tue, Feb 27, 2018 at 6:01 AM, Peter Maydell 
> wrote:
> > On 27 February 2018 at 00:15, Michael Clark  wrote:
> >> -BEGIN PGP SIGNED MESSAGE-
> >> Hash: SHA1
> >>
> >> The following changes since commit 0a773d55ac76c5aa89ed9187a3bc5af8c5c2a6d0:
> >>
> >>   maintainers: Add myself as a OpenBSD maintainer (2018-02-23 12:05:07 +)
> >>
> >> are available in the git repository at:
> >>
> >>   https://github.com/riscv/riscv-qemu.git tags/riscv-qemu-upstream-v7
> >>
> >> for you to fetch changes up to 170a9d412ca1eb3b7ae6f6c1ff86dcbdff0fd7a8:
> >>
> >>   RISC-V Build Infrastructure (2018-02-27 11:09:43 +1300)
> >>
> >> - 
> >> QEMU RISC-V Emulation Support (RV64GC, RV32GC)
> >
> > Hi; thanks for this pull request. Unfortunately it seems to
> > be missing Signed-off-by: tags. Every commit needs to have
> > the Signed-off-by: tags from the people who contributed code to
> > it, indicating that they're OK with the code going into QEMU.
> > (If the work was done by and copyright a company then you don't
> > need to provide signoffs from every person at the company who
> > worked on the code if you don't want to.)
>
> I'll add mine.
>
> >> The spike_v1.9
> >> machine has been renamed to spike_v1.9.1 to match the privileged ISA
> >> version and spike_v1.10 has been made the default machine.
> >
> > I'm confused about this. Generally QEMU boards should model
> > hardware, and the board shouldn't care about the ISA versions.
>
> The spike boards model the Berkeley architectural simulator "spike"
> (https://github.com/riscv/riscv-isa-sim), which does not have a formal
> release process or version numbers so we are using the ISA version as
> a proxy for spike's version.
>
> When physical boards are released with full documentation I presume we
> will be adding board definitions for them, and they will imply
> specific ISA versions.
>
> > Versioned board names in QEMU generally follow _QEMU_'s versioning,
> > and indicate that a board is identical to whatever we modelled
> > in that earlier QEMU version, for VM migration compatibility.
>
> In this case we're handling two logically distinct boards.  We could
> combine them and implement a parameter; I was having trouble finding a
> suitable example to follow earlier but it looks like gic-version in
> hw/arm/virt.c is one.  This seems like a bad thing to change this late
> in the review though?
>
> > Board renames for minor ISA version bumps sounds like there's going
> > to be a lot of churn and breakage -- is this stuff really ready?
> > (Also, should we really have two different board source files
> > for two different ISA versions? I would have expected these to
> > share a source file to share code.)
>
> 1.10 is the version we have committed to long term support for; it
> matches all public hardware and the upstream Linux port, so it seems
> appropriate to use for QEMU.
>
> 1.9.1 was the version supported by riscv-qemu at the time Michael
> Clark took over maintainership; we have not removed support for it
> because we cannot prove that there is nobody depending on it, although
> I do not use it myself and do not know anyone else who does, so I
> would not personally object to removing it if that were required.
>
> Combining spike_v1.10 and spike_v1.9.1 would also be an option amenable to
> us.
>

I've just talked to SiFive about this. They have agreed that we can remove
the sifive_e300 and sifive_u500 boards from the patch series that we are
going to submit upstream again later this week or early next week. These
machines and their devices are pretty easy for us to maintain in the riscv
or a sifive repo. This trims the number of machines from 5 to 3 and lets us
remove the SiFiveUART and SiFivePRCI from the next patch series we are
going to submit (i.e. v8).

SiFive have indicated that they would like to keep privileged ISA v1.9.1
support. It's likely the RISC-V foundation would also like us to start
supporting backwards compatibility from this point. Removing support for a
specification version only 4 months after the latest specification has been
implemented is too severe of a deprecation period. They have said they
would like QEMU to support at least 2 specification versions so we won't
consider removing privileged ISA v1.9.1 support until privileged ISA v1.11
has been released and implemented. There are still several OS ports and
private tape-outs and test chips that target privileged ISA v1.9.1. In
fact, someone may very well add privileged ISA v1.9 and privileged ISA v1.7
support, perhaps as a computing history project. The published
specifications are all available; however, the chips implementing these
versions of the spec are mostly test chips. Nevertheless, they are part of
RISC-V history.

With respect to combining them, we could investigate triggering the config
string vs flattened device-tree, based on a restricted set of cpu 

Re: [Qemu-devel] [PATCH] i386: Allow monitor / mwait cpuid override

2018-02-27 Thread Alexander Graf


On 27.02.18 10:52, Gonglei (Arei) wrote:
> Hi all,
> 
> Guests can achieve good performance in 'Message Passing Workloads'
> scenarios when they know about the X86_FEATURE_MWAIT feature presented by
> qemu. The reason is that, once the guest knows about that feature, it can
> use mwait to idle, which saves VMEXITs and achieves high performance in
> latency-sensitive scenarios.
> 
> Is there any plan for this patch? 
> 
> Or may I send an updated version based on yours? @Alex?

Oh, did I drop the ball on this one? If that's the case, sure, go ahead.


Alex



Re: [Qemu-devel] Call for GSoC & Outreachy 2018 mentors & project ideas

2018-02-27 Thread Philippe Mathieu-Daudé
On 02/14/2018 04:00 PM, Alistair Francis wrote:
> On Mon, Jan 15, 2018 at 4:59 AM, Stefan Hajnoczi  wrote:
>> On Thu, Jan 11, 2018 at 03:25:56PM -0800, Alistair Francis wrote:
>>> On Wed, Jan 10, 2018 at 4:52 AM, Stefan Hajnoczi  wrote:
>>>> On Tue, Jan 9, 2018 at 9:45 PM, Alistair Francis  wrote:
>>>>> Can anyone who has done this before chime in.
>>>>>
>>>>> What do you think about getting someone to clean up and improve the GDB
>>>>> support in QEMU? Would that be the right difficulty of task for a GSoC
>>>>> project?
>>>>
>>>> There is not enough information to give feedback on whether this
>>>> project idea is suitable.  What are the specific tasks you'd like the
>>>> student to work on?
>>>>
>>>> In general, I'm sure there are well-defined 12-week project ideas
>>>> around the GDB stub.  New features are easy to propose and are usually
>>>> well-defined (e.g. implement these commands that are documented in the
>>>> GDB protocol documentation).  Cleaning up code is less clear and it
>>>> would depend on exactly what needs to be done.  Interns will not have
>>>> a background in the QEMU codebase and may not be able to make
>>>> judgements about how to structure things, so I would be more careful
>>>> about refactoring/cleanup projects.
>>>>
>>>> Please see my talk about QEMU GSoC for guidelines on project ideas:
>>>> https://www.youtube.com/watch?v=xNVCX7YMUL8&t=19m11s
>>>> http://vmsplice.net/~stefan/stefanha-kvm-forum-2016.pdf
>>>
>>> That helps a lot, thanks for that.
>>>
>>> So for a more concrete solution, how would adding support for multi
>>> CPU support to the GDB server sound?
>>>
>>> This would allow GDB debugging for the A53 and the R5 on the Xilinx
>>> ZynqMP for example. This is something we have in the Xilinx tree, but
>>> it is in no state to go upstream and really needs to be rewritten to be
>>> upstreamable and more generic.
>>
>> Excellent.  Then they'll already have an idea of "how" it can be
>> achieved but have the freedom to write code that is most suitable for
>> upstream.  That is a good starting point for a project.
>>
>> Here is the project idea template:
>>
>> === TITLE ===
>>
>>  '''Summary:''' Short description of the project
>>
>>  Detailed description of the project.
>>
>>  '''Links:'''
>>  * Wiki links to relevant material
>>  * External links to mailing lists or web sites
>>
>>  '''Details:'''
>>  * Skill level: beginner or intermediate or advanced
>>  * Language: C
>>  * Mentor: Email address and IRC nick
>>  * Suggested by: Person who suggested the idea
>>
>> Once you have written down the project idea, please post it under
>> Internships/ProjectIdeas/MultiCPUGDBStub and then add it to the
>> Google_Summer_of_Code_2018 wiki page using the
>> "{{:Internships/ProjectIdeas/MultiCPUGDBStub}}" inlining syntax.
>>
>> Or if you prefer, just reply with the project idea to this email and
>> I'll post it on the wiki for you.
>>
>> Can you think of a co-mentor who would be willing to participate?  It
>> makes internships easier when there are multiple mentors - less stress
>> for mentors, faster communication for students.
> 
> Yep, here is my proposal. I don't have wiki access, so I can't add it myself.
> 
> I think Philippe would be a good co-mentor, if he is happy to. I am
> also happy to mentor other ideas, it doesn't have to be this one.

I'm very happy to co-mentor with Alistair, I can manage at least 2h/w on it.

> 
> === Multi-CPU cluster support for GDB server in QEMU ===
> 
> There are many examples in modern computing where multiple CPU
> clusters are grouped together in a single SoC. This is common in the
> ARM world especially. There are numerous examples such as ARM's
> big.LITTLE implementations and Xilinx's 4xA53s and 2xR5s on the ZynqMP
> SoC. The goal of this task is to add support to the GDB server to
> allow users to debug across these clusters.
> 
> This is another step towards single binary QEMU as well.
> 
>  Detailed description of the project.
> 
> Xilinx has an out of tree implementation that can be used as a
> starting point. Work will need to be done on top of this to prepare it
> for upstream submission and to ensure the implementation is more
> generic.
> 
> This will mostly involve extending GDB server to tell GDB about
> different architectures and then allow the user to swap between them.
> 
> The Xilinx implementation can be seen here:
> https://github.com/Xilinx/qemu/blob/master/gdbstub.c
> There have been some steps in preparing the work to go upstream, which
> can be seen here:
> https://github.com/Xilinx/qemu/tree/mainline/alistair/gdb
> 
>  '''Details:'''
>  * Skill level: advanced
>  * Language: C
>  * Mentor: alistai...@gmail.com, Philippe?
>  * Suggested by: Alistair Francis
> 
>>
>> Stefan



Re: [Qemu-devel] [PATCH v7 2/3] xlnx-zynqmp-rtc: Add basic time support

2018-02-27 Thread Philippe Mathieu-Daudé
On 02/27/2018 03:52 PM, Alistair Francis wrote:
> Allow the guest to determine the time set from the QEMU command line.
> 
> This includes adding a trace event to debug the new time.
> 
> Signed-off-by: Alistair Francis 

Reviewed-by: Philippe Mathieu-Daudé 

> ---
> V7:
>  - Make the current_tm local to init
> V6:
>  - Migrate tick_offset and add a pre_save call
> V5:
>  - Recalculate tick_offset after migration
> V4:
>  - Use the .unimp property
> V3:
>  - Store an offset value
>  - Use mktimegm()
>  - Log unimplemented writes
> V2:
>  - Convert DB_PRINT() macro to trace
> 
>  include/hw/timer/xlnx-zynqmp-rtc.h |  2 ++
>  hw/timer/xlnx-zynqmp-rtc.c | 58 
> ++
>  hw/timer/trace-events  |  3 ++
>  3 files changed, 63 insertions(+)
> 
> diff --git a/include/hw/timer/xlnx-zynqmp-rtc.h 
> b/include/hw/timer/xlnx-zynqmp-rtc.h
> index 87649836cc..5ba4d8bc4a 100644
> --- a/include/hw/timer/xlnx-zynqmp-rtc.h
> +++ b/include/hw/timer/xlnx-zynqmp-rtc.h
> @@ -79,6 +79,8 @@ typedef struct XlnxZynqMPRTC {
>  qemu_irq irq_rtc_int;
>  qemu_irq irq_addr_error_int;
>  
> +uint32_t tick_offset;
> +
>  uint32_t regs[XLNX_ZYNQMP_RTC_R_MAX];
>  RegisterInfo regs_info[XLNX_ZYNQMP_RTC_R_MAX];
>  } XlnxZynqMPRTC;
> diff --git a/hw/timer/xlnx-zynqmp-rtc.c b/hw/timer/xlnx-zynqmp-rtc.c
> index 707f145027..c98dc3d94e 100644
> --- a/hw/timer/xlnx-zynqmp-rtc.c
> +++ b/hw/timer/xlnx-zynqmp-rtc.c
> @@ -29,6 +29,10 @@
>  #include "hw/register.h"
>  #include "qemu/bitops.h"
>  #include "qemu/log.h"
> +#include "hw/ptimer.h"
> +#include "qemu/cutils.h"
> +#include "sysemu/sysemu.h"
> +#include "trace.h"
>  #include "hw/timer/xlnx-zynqmp-rtc.h"
>  
>  #ifndef XLNX_ZYNQMP_RTC_ERR_DEBUG
> @@ -47,6 +51,19 @@ static void addr_error_int_update_irq(XlnxZynqMPRTC *s)
>  qemu_set_irq(s->irq_addr_error_int, pending);
>  }
>  
> +static uint32_t rtc_get_count(XlnxZynqMPRTC *s)
> +{
> +int64_t now = qemu_clock_get_ns(rtc_clock);
> +return s->tick_offset + now / NANOSECONDS_PER_SECOND;
> +}
> +
> +static uint64_t current_time_postr(RegisterInfo *reg, uint64_t val64)
> +{
> +XlnxZynqMPRTC *s = XLNX_ZYNQMP_RTC(reg->opaque);
> +
> +return rtc_get_count(s);
> +}
> +
>  static void rtc_int_status_postw(RegisterInfo *reg, uint64_t val64)
>  {
>  XlnxZynqMPRTC *s = XLNX_ZYNQMP_RTC(reg->opaque);
> @@ -97,13 +114,17 @@ static uint64_t addr_error_int_dis_prew(RegisterInfo 
> *reg, uint64_t val64)
>  
>  static const RegisterAccessInfo rtc_regs_info[] = {
>  {   .name = "SET_TIME_WRITE",  .addr = A_SET_TIME_WRITE,
> +.unimp = MAKE_64BIT_MASK(0, 32),
>  },{ .name = "SET_TIME_READ",  .addr = A_SET_TIME_READ,
>  .ro = 0x,
> +.post_read = current_time_postr,
>  },{ .name = "CALIB_WRITE",  .addr = A_CALIB_WRITE,
> +.unimp = MAKE_64BIT_MASK(0, 32),
>  },{ .name = "CALIB_READ",  .addr = A_CALIB_READ,
>  .ro = 0x1f,
>  },{ .name = "CURRENT_TIME",  .addr = A_CURRENT_TIME,
>  .ro = 0x,
> +.post_read = current_time_postr,
>  },{ .name = "CURRENT_TICK",  .addr = A_CURRENT_TICK,
>  .ro = 0x,
>  },{ .name = "ALARM",  .addr = A_ALARM,
> @@ -162,6 +183,7 @@ static void rtc_init(Object *obj)
>  XlnxZynqMPRTC *s = XLNX_ZYNQMP_RTC(obj);
>  SysBusDevice *sbd = SYS_BUS_DEVICE(obj);
>  RegisterInfoArray *reg_array;
> +struct tm current_tm;
>  
>  memory_region_init(&s->iomem, obj, TYPE_XLNX_ZYNQMP_RTC,
> XLNX_ZYNQMP_RTC_R_MAX * 4);
> @@ -178,14 +200,50 @@ static void rtc_init(Object *obj)
>  sysbus_init_mmio(sbd, &s->iomem);
>  sysbus_init_irq(sbd, &s->irq_rtc_int);
>  sysbus_init_irq(sbd, &s->irq_addr_error_int);
> +
> +qemu_get_timedate(&current_tm, 0);
> +s->tick_offset = mktimegm(&current_tm) -
> +qemu_clock_get_ns(rtc_clock) / NANOSECONDS_PER_SECOND;
> +
> +trace_xlnx_zynqmp_rtc_gettime(current_tm.tm_year, current_tm.tm_mon,
> +  current_tm.tm_mday, current_tm.tm_hour,
> +  current_tm.tm_min, current_tm.tm_sec);
> +}
> +
> +static int rtc_pre_save(void *opaque)
> +{
> +XlnxZynqMPRTC *s = opaque;
> +int64_t now = qemu_clock_get_ns(rtc_clock) / NANOSECONDS_PER_SECOND;
> +
> +/* Add the time at migration */
> +s->tick_offset = s->tick_offset + now;
> +
> +return 0;
> +}
> +
> +static int rtc_post_load(void *opaque, int version_id)
> +{
> +XlnxZynqMPRTC *s = opaque;
> +int64_t now = qemu_clock_get_ns(rtc_clock) / NANOSECONDS_PER_SECOND;
> +
> +/* Subtract the time after migration. This combined with the pre_save
> + * action results in us having subtracted the time that the guest was
> + * stopped from the offset.
> + */
> +s->tick_offset = s->tick_offset - now;
> +
> +return 0;
>  }
>  
>  static const VMStateDescription vmstate_rtc = {
>  .name = TYPE_XLNX_ZYNQMP_RTC,
>  .version_id = 1,
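
The pre_save/post_load pair quoted above can be read as plain offset arithmetic. What follows is a minimal sketch, not the device model itself: RtcSketch and the rtc_sketch_* helpers are hypothetical names, the host clock is passed in as a parameter instead of being read from rtc_clock, and it assumes the guest-visible count is tick_offset plus the current clock value in seconds (the body of rtc_get_count() is not shown in this part of the thread).

```c
#include <stdint.h>

/* Hypothetical stand-in for XlnxZynqMPRTC; only the offset is modelled. */
typedef struct {
    int64_t tick_offset; /* guest RTC seconds minus clock seconds */
} RtcSketch;

/* Assumed guest-visible count: clock value plus the stored offset. */
static int64_t rtc_sketch_count(const RtcSketch *s, int64_t clock_now)
{
    return s->tick_offset + clock_now;
}

/* Before saving: fold the source-side clock value into the offset. */
static void rtc_sketch_pre_save(RtcSketch *s, int64_t clock_now)
{
    s->tick_offset += clock_now;
}

/* After loading: take the destination-side clock value back out. */
static void rtc_sketch_post_load(RtcSketch *s, int64_t clock_now)
{
    s->tick_offset -= clock_now;
}
```

Under these assumptions the count read right after load equals the count read right before save, whatever the clock value is on the destination.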

Re: [Qemu-devel] [PATCH 06/11] macio: move macio related structures and defines into separate macio.h file

2018-02-27 Thread Philippe Mathieu-Daudé
On 02/19/2018 03:19 PM, Mark Cave-Ayland wrote:
> Signed-off-by: Mark Cave-Ayland 

Reviewed-by: Philippe Mathieu-Daudé 

> ---
>  hw/misc/macio/macio.c | 43 +
>  hw/ppc/mac.h  |  3 --
>  hw/ppc/mac_newworld.c |  1 +
>  hw/ppc/mac_oldworld.c |  1 +
>  include/hw/misc/macio/macio.h | 75 
> +++
>  5 files changed, 78 insertions(+), 45 deletions(-)
>  create mode 100644 include/hw/misc/macio/macio.h
> 
> diff --git a/hw/misc/macio/macio.c b/hw/misc/macio/macio.c
> index 1c10d8a1d7..4e502ede2e 100644
> --- a/hw/misc/macio/macio.c
> +++ b/hw/misc/macio/macio.c
> @@ -30,48 +30,7 @@
>  #include "hw/pci/pci.h"
>  #include "hw/ppc/mac_dbdma.h"
>  #include "hw/char/escc.h"
> -
> -#define TYPE_MACIO "macio"
> -#define MACIO(obj) OBJECT_CHECK(MacIOState, (obj), TYPE_MACIO)
> -
> -typedef struct MacIOState
> -{
> -/*< private >*/
> -PCIDevice parent;
> -/*< public >*/
> -
> -MemoryRegion bar;
> -CUDAState cuda;
> -DBDMAState dbdma;
> -ESCCState escc;
> -MemoryRegion *pic_mem;
> -uint64_t frequency;
> -} MacIOState;
> -
> -#define OLDWORLD_MACIO(obj) \
> -OBJECT_CHECK(OldWorldMacIOState, (obj), TYPE_OLDWORLD_MACIO)
> -
> -typedef struct OldWorldMacIOState {
> -/*< private >*/
> -MacIOState parent_obj;
> -/*< public >*/
> -
> -qemu_irq irqs[7];
> -
> -MacIONVRAMState nvram;
> -MACIOIDEState ide[2];
> -} OldWorldMacIOState;
> -
> -#define NEWWORLD_MACIO(obj) \
> -OBJECT_CHECK(NewWorldMacIOState, (obj), TYPE_NEWWORLD_MACIO)
> -
> -typedef struct NewWorldMacIOState {
> -/*< private >*/
> -MacIOState parent_obj;
> -/*< public >*/
> -qemu_irq irqs[7];
> -MACIOIDEState ide[2];
> -} NewWorldMacIOState;
> +#include "hw/misc/macio/macio.h"
>  
>  /*
>   * The mac-io has two interfaces to the ESCC. One is called "escc-legacy",
> diff --git a/hw/ppc/mac.h b/hw/ppc/mac.h
> index 5b5fffdff3..a02f797598 100644
> --- a/hw/ppc/mac.h
> +++ b/hw/ppc/mac.h
> @@ -47,9 +47,6 @@
>  
>  
>  /* MacIO */
> -#define TYPE_OLDWORLD_MACIO "macio-oldworld"
> -#define TYPE_NEWWORLD_MACIO "macio-newworld"
> -
>  #define TYPE_MACIO_IDE "macio-ide"
>  #define MACIO_IDE(obj) OBJECT_CHECK(MACIOIDEState, (obj), TYPE_MACIO_IDE)
>  
> diff --git a/hw/ppc/mac_newworld.c b/hw/ppc/mac_newworld.c
> index 5e82158759..396216954e 100644
> --- a/hw/ppc/mac_newworld.c
> +++ b/hw/ppc/mac_newworld.c
> @@ -60,6 +60,7 @@
>  #include "hw/boards.h"
>  #include "hw/nvram/fw_cfg.h"
>  #include "hw/char/escc.h"
> +#include "hw/misc/macio/macio.h"
>  #include "hw/ppc/openpic.h"
>  #include "hw/ide.h"
>  #include "hw/loader.h"
> diff --git a/hw/ppc/mac_oldworld.c b/hw/ppc/mac_oldworld.c
> index 06a61220cb..5903ff47d3 100644
> --- a/hw/ppc/mac_oldworld.c
> +++ b/hw/ppc/mac_oldworld.c
> @@ -37,6 +37,7 @@
>  #include "hw/boards.h"
>  #include "hw/nvram/fw_cfg.h"
>  #include "hw/char/escc.h"
> +#include "hw/misc/macio/macio.h"
>  #include "hw/ide.h"
>  #include "hw/loader.h"
>  #include "elf.h"
> diff --git a/include/hw/misc/macio/macio.h b/include/hw/misc/macio/macio.h
> new file mode 100644
> index 00..e1e249f898
> --- /dev/null
> +++ b/include/hw/misc/macio/macio.h
> @@ -0,0 +1,75 @@
> +/*
> + * PowerMac MacIO device emulation
> + *
> + * Copyright (c) 2005-2007 Fabrice Bellard
> + * Copyright (c) 2007 Jocelyn Mayer
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a 
> copy
> + * of this software and associated documentation files (the "Software"), to 
> deal
> + * in the Software without restriction, including without limitation the 
> rights
> + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
> + * copies of the Software, and to permit persons to whom the Software is
> + * furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING 
> FROM,
> + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
> + * THE SOFTWARE.
> + */
> +
> +#ifndef MACIO_H
> +#define MACIO_H
> +
> +#include "hw/misc/macio/cuda.h"
> +#include "hw/ppc/mac_dbdma.h"
> +
> +#define TYPE_MACIO "macio"
> +#define MACIO(obj) OBJECT_CHECK(MacIOState, (obj), TYPE_MACIO)
> +
> +typedef struct MacIOState {
> +/*< private >*/
> +PCIDevice parent;
> +/*< public >*/
> +
> +MemoryRegion bar;
> +CUDAState cuda;
> +DBDMAState dbdma;
> +ESCCState escc;
> +MemoryRegion *pic_mem;
>

Re: [Qemu-devel] [Qemu-arm] [PATCH 2/9] pc: replace pm object initialization with one-liner in acpi_get_pm_info()

2018-02-27 Thread Philippe Mathieu-Daudé
On 02/22/2018 09:42 AM, Igor Mammedov wrote:
> The next patch will need it before it gets to the piix4/lpc branches
> that initialize 'obj' now.
> 
> Signed-off-by: Igor Mammedov 

Reviewed-by: Philippe Mathieu-Daudé 

> ---
>  hw/i386/acpi-build.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
> 
> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> index deb440f..b85fefe 100644
> --- a/hw/i386/acpi-build.c
> +++ b/hw/i386/acpi-build.c
> @@ -128,7 +128,7 @@ static void acpi_get_pm_info(AcpiPmInfo *pm)
>  {
>  Object *piix = piix4_pm_find();
>  Object *lpc = ich9_lpc_find();
> -Object *obj = NULL;
> +Object *obj = piix ? piix : lpc;
>  QObject *o;
>  
>  pm->force_rev1_fadt = false;
> @@ -138,7 +138,6 @@ static void acpi_get_pm_info(AcpiPmInfo *pm)
>  if (piix) {
>  /* w2k requires FADT(rev1) or it won't boot, keep PC compatible */
>  pm->force_rev1_fadt = true;
> -obj = piix;
>  pm->cpu_hp_io_base = PIIX4_CPU_HOTPLUG_IO_BASE;
>  pm->pcihp_io_base =
>  object_property_get_uint(obj, ACPI_PCIHP_IO_BASE_PROP, NULL);
> @@ -146,7 +145,6 @@ static void acpi_get_pm_info(AcpiPmInfo *pm)
>  object_property_get_uint(obj, ACPI_PCIHP_IO_LEN_PROP, NULL);
>  }
>  if (lpc) {
> -obj = lpc;
>  pm->cpu_hp_io_base = ICH9_CPU_HOTPLUG_IO_BASE;
>  }
>  assert(obj);
> 



Re: [Qemu-devel] [PATCH 3/5] hw/i2c-ddc: Do not fail writes

2018-02-27 Thread Philippe Mathieu-Daudé
On 02/27/2018 07:49 AM, Linus Walleij wrote:
> The tx function of the DDC I2C slave emulation was returning 1
> on all writes resulting in NACK in the I2C bus. Changing it to
> 0 makes the DDC I2C work fine with bit-banged I2C such as the
> versatile I2C.
> 
> I guess it was not affecting whatever I2C controller this was
> used with until now, but with the Versatile I2C it surely
> does not work.
> 
> Reviewed-by: Peter Maydell 
> Signed-off-by: Linus Walleij 

Reviewed-by: Philippe Mathieu-Daudé 

> ---
>  hw/i2c/i2c-ddc.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/i2c/i2c-ddc.c b/hw/i2c/i2c-ddc.c
> index 199dac9e41c1..bec0c91e2dd0 100644
> --- a/hw/i2c/i2c-ddc.c
> +++ b/hw/i2c/i2c-ddc.c
> @@ -259,12 +259,12 @@ static int i2c_ddc_tx(I2CSlave *i2c, uint8_t data)
>  s->reg = data;
>  s->firstbyte = false;
>  DPRINTF("[EDID] Written new pointer: %u\n", data);
> -return 1;
> +return 0;
>  }
>  
>  /* Ignore all writes */
>  s->reg++;
> -return 1;
> +return 0;
>  }
>  
>  static void i2c_ddc_init(Object *obj)
> 
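
The return-value convention described in the commit message can be sketched as follows. This is a simplified illustration, assuming (as the commit message states) that a non-zero return from the tx callback is reported as a NACK on the bus and zero as an ACK. DdcSketch and ddc_sketch_tx are hypothetical names, and the firstbyte test is inferred from the quoted hunk rather than copied from the full source.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the DDC slave state. */
typedef struct {
    bool firstbyte; /* true until the register pointer byte has arrived */
    uint8_t reg;    /* current EDID register pointer */
} DdcSketch;

/* Returns 0 to ACK the byte; a non-zero value would be seen as a NACK. */
static int ddc_sketch_tx(DdcSketch *s, uint8_t data)
{
    if (s->firstbyte) {
        /* The first byte of a write selects the register pointer. */
        s->reg = data;
        s->firstbyte = false;
        return 0;
    }

    /* The payload of later writes is ignored, but the byte is still ACKed. */
    s->reg++;
    return 0;
}
```

A bit-banged master that checks the ACK bit after every byte would abort the transfer on the old return value of 1, which matches the failure described for the Versatile I2C.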





Re: [Qemu-devel] [PATCH 5/5] arm/vexpress: Add proper display connector emulation

2018-02-27 Thread Philippe Mathieu-Daudé
On 02/27/2018 07:49 AM, Linus Walleij wrote:
> This adds the SiI9022 and EDID I2C devices to the ARM Versatile
> Express machine, and selects the two I2C devices necessary in the
> arm-softmmu.mak configuration so everything will build smoothly.
> 
> I am implementing proper handling of the graphics in the Linux
> kernel and adding proper emulation of SiI9022 and EDID makes the
> driver probe as nicely as before, retrieving the resolutions
> supported by the "QEMU monitor" and overall just working nice.
> 
> The assignment of the SiI9022 at address 0x39 and the EDID
> DDC I2C at address 0x50 is not strictly correct: the DDC I2C
> is there all the time but in the actual component it only
> appears once activated inside the SiI9022, so ideally it should
> be added and removed to the bus by the SiI9022. However for this
> purpose it works fine to just have it around.
> 
> Cc: Peter Maydell 
> Signed-off-by: Linus Walleij 

Reviewed-by: Philippe Mathieu-Daudé 

> ---
> ChangeLog v1->v2:
> - Only add the SII9022 now that it will by itself realize
>   the DDCI2C as part of the bridge.
> ---
>  default-configs/arm-softmmu.mak | 2 ++
>  hw/arm/vexpress.c   | 6 +-
>  2 files changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/default-configs/arm-softmmu.mak b/default-configs/arm-softmmu.mak
> index ca34cf446242..54f855d07206 100644
> --- a/default-configs/arm-softmmu.mak
> +++ b/default-configs/arm-softmmu.mak
> @@ -21,6 +21,8 @@ CONFIG_STELLARIS_INPUT=y
>  CONFIG_STELLARIS_ENET=y
>  CONFIG_SSD0303=y
>  CONFIG_SSD0323=y
> +CONFIG_DDC=y
> +CONFIG_SII9022=y
>  CONFIG_ADS7846=y
>  CONFIG_MAX111X=y
>  CONFIG_SSI=y
> diff --git a/hw/arm/vexpress.c b/hw/arm/vexpress.c
> index dc5928ae1ab5..9fad79177a19 100644
> --- a/hw/arm/vexpress.c
> +++ b/hw/arm/vexpress.c
> @@ -29,6 +29,7 @@
>  #include "hw/arm/arm.h"
>  #include "hw/arm/primecell.h"
>  #include "hw/devices.h"
> +#include "hw/i2c/i2c.h"
>  #include "net/net.h"
>  #include "sysemu/sysemu.h"
>  #include "hw/boards.h"
> @@ -537,6 +538,7 @@ static void vexpress_common_init(MachineState *machine)
>  uint32_t sys_id;
>  DriveInfo *dinfo;
>  pflash_t *pflash0;
> +I2CBus *i2c;
>  ram_addr_t vram_size, sram_size;
>  MemoryRegion *sysmem = get_system_memory();
>  MemoryRegion *vram = g_new(MemoryRegion, 1);
> @@ -628,7 +630,9 @@ static void vexpress_common_init(MachineState *machine)
>  sysbus_create_simple("sp804", map[VE_TIMER01], pic[2]);
>  sysbus_create_simple("sp804", map[VE_TIMER23], pic[3]);
>  
> -/* VE_SERIALDVI: not modelled */
> +dev = sysbus_create_simple("versatile_i2c", map[VE_SERIALDVI], NULL);
> +i2c = (I2CBus *)qdev_get_child_bus(dev, "i2c");
> +i2c_create_slave(i2c, "sii9022", 0x39);
>  
>  sysbus_create_simple("pl031", map[VE_RTC], pic[4]); /* RTC */
>  
> 





[Qemu-devel] [PATCH] decodetree: Propagate return value from translate subroutines

2018-02-27 Thread Richard Henderson
Allow the translate subroutines to return false for invalid insns.

At present we can of course invoke an invalid insn exception from within
the translate subroutine, but in the short term this consolidates code.
In the long term it would allow the decodetree language to support
overlapping patterns for ISA extensions.

Signed-off-by: Richard Henderson 
---

Since this makes an ABI change to the translate functions called by the
decode function, let's make it now before there are any in-tree users.

My SVE branch over-decodes in quite a lot of cases -- e.g. things like
the 2-bit size field must be 1-3 for fp operands, and so size==0 is
unallocated.  Returning false for these cases allows the actual call
to unallocated_encoding to be done in one place instead of hundreds.

Longer term, I'm thinking of how to handle decode of overlapping ISA
extensions.  One could allow (specific) overlapping patterns and
prioritize them in some way (e.g. first in file is first matched).
My thought is that trans_insn_a would check a cpu feature bit and
return false if not enabled.  Then trans_insn_b would be given its
chance to handle the insn.


r~
---
 scripts/decodetree.py | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/scripts/decodetree.py b/scripts/decodetree.py
index 6a33f8f8dd..41301c84aa 100755
--- a/scripts/decodetree.py
+++ b/scripts/decodetree.py
@@ -461,7 +461,7 @@ class Pattern(General):
 global translate_prefix
 output('typedef ', self.base.base.struct_name(),
' arg_', self.name, ';\n')
-output(translate_scope, 'void ', translate_prefix, '_', self.name,
+output(translate_scope, 'bool ', translate_prefix, '_', self.name,
'(DisasContext *ctx, arg_', self.name,
' *a, ', insntype, ' insn);\n')
 
@@ -474,9 +474,8 @@ class Pattern(General):
 output(ind, self.base.extract_name(), '(&u.f_', arg, ', insn);\n')
 for n, f in self.fields.items():
 output(ind, 'u.f_', arg, '.', n, ' = ', f.str_extract(), ';\n')
-output(ind, translate_prefix, '_', self.name,
+output(ind, 'return ', translate_prefix, '_', self.name,
'(ctx, &u.f_', arg, ', insn);\n')
-output(ind, 'return true;\n')
 # end Pattern
 
 
-- 
2.14.3
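
A minimal sketch of what the generated C looks like from the caller's side once the translate subroutines return bool. Everything here is illustrative: the DisasContext fields, arg_fp_op, the bit position of the size field, and the *_sketch function names are assumptions made for the example, and the invalid counter merely stands in for a real call to unallocated_encoding().

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical context: counters stand in for emitted code and exceptions. */
typedef struct {
    int emitted;
    int invalid;
} DisasContext;

/* Hypothetical argument set with a 2-bit size field. */
typedef struct {
    int size;
} arg_fp_op;

/* Translate subroutine: returns false for an unallocated encoding. */
static bool trans_fp_op(DisasContext *ctx, arg_fp_op *a, uint32_t insn)
{
    (void)insn;
    if (a->size == 0) {
        return false; /* size must be 1-3 for fp operands */
    }
    ctx->emitted++;
    return true;
}

/* Shape of the generated decoder after the patch: propagate the result. */
static bool decode_sketch(DisasContext *ctx, uint32_t insn)
{
    arg_fp_op a = { .size = (int)((insn >> 22) & 3) }; /* bit position assumed */
    return trans_fp_op(ctx, &a, insn);
}

/* Caller: the invalid-insn path lives in exactly one place. */
static void translate_one_sketch(DisasContext *ctx, uint32_t insn)
{
    if (!decode_sketch(ctx, insn)) {
        ctx->invalid++; /* stands in for unallocated_encoding() */
    }
}
```

With the previous void signature, each translate subroutine that over-decodes would have had to raise the exception itself; here the decision is made once in the caller.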




Re: [Qemu-devel] [PATCH v5 0/7] Generalize MDIO framework

2018-02-27 Thread Philippe Mathieu-Daudé
Hi Edgar,

On 10/09/2017 10:21 AM, Edgar E. Iglesias wrote:
> On Fri, Sep 22, 2017 at 02:13:16PM -0300, Philippe Mathieu-Daudé wrote:
>> Hi,
>>
>> I have a follow-up series using multiple PHYs on the MDIO bus based on this
>> series.
> 
> Hi Philippe!
> 
> I think this is a good improvement compared to today's state.
> It may make sense to have the generic mdio bus functions in mdio.c
> and specific phy models in separate files, thoughts?

I'm sorry, I missed your mail and only noticed it today, since this thread got
awakened by Alistair's reviews.

I'll raise this series' priority in my TODO and respin with Grant's correct
S-o-b, Alistair's comments addressed, and MDIO/PHY split as you suggested.

After spending 2 months with the SD bus, I now feel more confident to
rework the MDIO bus and think of unit testing and tracing.

Regards,

Phil.

>>
>> Grant's previous work:
>> http://lists.nongnu.org/archive/html/qemu-devel/2013-02/msg00257.html
>>
>> "There is more work to be done, particularly in moving to the common GPIO 
>> api,
>>  but that work can be done as a follow on patch series."
>>
>> Grant Likely (7):
>>   hw/mdio: Generalize etraxfs MDIO bitbanging emulation
>>   hw/mdio: Add PHY register definition
>>   hw/mdio: Generalize phy initialization routine
>>   hw/mdio: Mask out read-only bits.
>>   hw/mdio: Refactor bitbanging state machine
>>   hw/mdio: Add VMState support
>>   hw/mdio: Use bitbang core for smc91c111 network device
>>
>>  include/hw/net/mdio.h   | 124 +
>>  hw/net/etraxfs_eth.c| 291 
>> +---
>>  hw/net/mdio.c   | 280 ++
>>  hw/net/smc91c111.c  |  27 -
>>  hw/net/xilinx_axienet.c | 189 +--
>>  hw/net/Makefile.objs|   2 +
>>  6 files changed, 438 insertions(+), 475 deletions(-)
>>  create mode 100644 include/hw/net/mdio.h
>>  create mode 100644 hw/net/mdio.c
>>
>> -- 
>> 2.14.1
>>



Re: [Qemu-devel] [PATCH 19/19] mps2-an505: New board model: MPS2 with AN505 Cortex-M33 FPGA image

2018-02-27 Thread Richard Henderson
On 02/20/2018 10:03 AM, Peter Maydell wrote:
> Define a new board model for the MPS2 with an AN505 FPGA image
> containing a Cortex-M33. Since the FPGA images for TrustZone
> cores (AN505, and the similar AN519 for Cortex-M23) have a
> significantly different layout of devices to the non-TrustZone
> images, we use a new source file rather than shoehorning them
> into the existing mps2.c.
> 
> Signed-off-by: Peter Maydell 
> ---
>  hw/arm/Makefile.objs |   1 +
>  hw/arm/mps2-tz.c | 504 
> +++
>  2 files changed, 505 insertions(+)
>  create mode 100644 hw/arm/mps2-tz.c

Reviewed-by: Richard Henderson 


r~




Re: [Qemu-devel] [PATCH 18/19] hw/arm/iotkit: Model Arm IOT Kit

2018-02-27 Thread Richard Henderson
On 02/20/2018 10:03 AM, Peter Maydell wrote:
> Model the Arm IoT Kit documented in
> http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ecm0601256/index.html
> 
> The Arm IoT Kit is a subsystem which includes a CPU and some devices,
> and is intended to be extended by adding extra devices to form a
> complete system.  It is used in the MPS2 board's AN505 image for the
> Cortex-M33.
> 
> Signed-off-by: Peter Maydell 

Reviewed-by: Richard Henderson 


r~



Re: [Qemu-devel] [PATCH v5 6/7] hw/mdio: Add VMState support

2018-02-27 Thread Alistair Francis
On Fri, Sep 22, 2017 at 10:13 AM, Philippe Mathieu-Daudé
 wrote:
> From: Grant Likely 
>
> The MDIO model needs to have VMState support before it can be used by
> devices that support VMState. This patch adds VMState macros for both
> qemu_mdio and qemu_phy.
>
> Signed-off-by: Grant Likely 
> Signed-off-by: Philippe Mathieu-Daudé 
> [PMD: just rebased]

Reviewed-by: Alistair Francis 

Alistair

> ---
>  include/hw/net/mdio.h | 22 ++
>  hw/net/mdio.c | 30 ++
>  2 files changed, 52 insertions(+)
>
> diff --git a/include/hw/net/mdio.h b/include/hw/net/mdio.h
> index 7fca19784e..b94e5ec337 100644
> --- a/include/hw/net/mdio.h
> +++ b/include/hw/net/mdio.h
> @@ -25,6 +25,8 @@
>   * THE SOFTWARE.
>   */
>
> +#include "migration/vmstate.h"
> +
>  /* PHY MII Register/Bit Definitions */
>  /* PHY Registers defined by IEEE */
>  #define PHY_CTRL 0x00 /* Control Register */
> @@ -61,6 +63,16 @@ struct qemu_phy {
>  void (*write)(struct qemu_phy *phy, unsigned int req, uint16_t data);
>  };
>
> +extern const VMStateDescription vmstate_mdio_phy;
> +
> +#define VMSTATE_MDIO_PHY(_field, _state) {   \
> +.name   = (stringify(_field)),   \
> +.size   = sizeof(struct qemu_phy),   \
> +.vmsd   = &vmstate_mdio_phy, \
> +.flags  = VMS_STRUCT,\
> +.offset = vmstate_offset_value(_state, _field, struct qemu_phy), \
> +}
> +
>  struct qemu_mdio {
>  /* bitbanging state machine */
>  bool mdc;
> @@ -83,6 +95,16 @@ struct qemu_mdio {
>  struct qemu_phy *devs[32];
>  };
>
> +extern const VMStateDescription vmstate_mdio;
> +
> +#define VMSTATE_MDIO(_field, _state) { \
> +.name   = (stringify(_field)), \
> +.size   = sizeof(struct qemu_mdio),\
> +.vmsd   = &vmstate_mdio,   \
> +.flags  = VMS_STRUCT,  \
> +.offset = vmstate_offset_value(_state, _field, struct qemu_mdio),  \
> +}
> +
>  void mdio_phy_init(struct qemu_phy *phy, uint16_t id1, uint16_t id2);
>  void mdio_attach(struct qemu_mdio *bus, struct qemu_phy *phy,
>   unsigned int addr);
> diff --git a/hw/net/mdio.c b/hw/net/mdio.c
> index 96e10fada0..6c13cc7272 100644
> --- a/hw/net/mdio.c
> +++ b/hw/net/mdio.c
> @@ -248,3 +248,33 @@ void mdio_bitbang_set_clk(struct qemu_mdio *bus, bool 
> mdc)
>  break;
>  }
>  }
> +
> +const VMStateDescription vmstate_mdio = {
> +.name = "mdio",
> +.version_id = 1,
> +.minimum_version_id = 1,
> +.minimum_version_id_old = 1,
> +.fields = (VMStateField[]) {
> +VMSTATE_BOOL(mdc, struct qemu_mdio),
> +VMSTATE_BOOL(mdio, struct qemu_mdio),
> +VMSTATE_UINT32(state, struct qemu_mdio),
> +VMSTATE_UINT16(cnt, struct qemu_mdio),
> +VMSTATE_UINT16(addr, struct qemu_mdio),
> +VMSTATE_UINT16(opc, struct qemu_mdio),
> +VMSTATE_UINT16(req, struct qemu_mdio),
> +VMSTATE_UINT32(shiftreg, struct qemu_mdio),
> +VMSTATE_END_OF_LIST()
> +}
> +};
> +
> +const VMStateDescription vmstate_mdio_phy = {
> +.name = "mdio",
> +.version_id = 1,
> +.minimum_version_id = 1,
> +.minimum_version_id_old = 1,
> +.fields = (VMStateField[]) {
> +VMSTATE_UINT16_ARRAY(regs, struct qemu_phy, 32),
> +VMSTATE_BOOL(link, struct qemu_phy),
> +VMSTATE_END_OF_LIST()
> +}
> +};
> --
> 2.14.1
>
>



Re: [Qemu-devel] [PATCH v5 5/7] hw/mdio: Refactor bitbanging state machine

2018-02-27 Thread Alistair Francis
On Fri, Sep 22, 2017 at 10:13 AM, Philippe Mathieu-Daudé
 wrote:
> From: Grant Likely 
>
> The MDIO state machine has a moderate amount of duplicate code in the
> state processing that can be consolidated. This patch does so and
> reorganizes it a bit so that far less code is required. Most of the
> states simply stream a fixed number of bits in as a single integer and
> can be handled by a common processing function that checks for
> completion of the state and returns the streamed in value.
>
> Changes include:
> - Move clock state change tracking into core code
> - Use a common shift register for clocking data in and out
> - Create separate mdc & mdio accessor functions
>   - will be replaced with GPIO connection in a follow-on patch
>
> Signed-off-by: Grant Likely 
> Signed-off-by: Philippe Mathieu-Daudé 
> [PMD: just rebased]

Acked-by: Alistair Francis 

Alistair

> ---
>  include/hw/net/mdio.h |  41 ---
>  hw/net/etraxfs_eth.c  |  11 ++--
>  hw/net/mdio.c | 140 
> ++
>  3 files changed, 87 insertions(+), 105 deletions(-)
>
> diff --git a/include/hw/net/mdio.h b/include/hw/net/mdio.h
> index ed1879a728..7fca19784e 100644
> --- a/include/hw/net/mdio.h
> +++ b/include/hw/net/mdio.h
> @@ -52,37 +52,33 @@
>  #define PHY_ADVERTISE_100FULL   0x0100  /* Try for 100mbps full-duplex */
>
>  struct qemu_phy {
> -uint32_t regs[NUM_PHY_REGS];
> +uint16_t regs[NUM_PHY_REGS];
>  const uint16_t *regs_readonly_mask; /* 0=writable, 1=read-only */
>
> -int link;
> +bool link;
>
> -unsigned int (*read)(struct qemu_phy *phy, unsigned int req);
> -void (*write)(struct qemu_phy *phy, unsigned int req, unsigned int data);
> +uint16_t (*read)(struct qemu_phy *phy, unsigned int req);
> +void (*write)(struct qemu_phy *phy, unsigned int req, uint16_t data);
>  };
>
>  struct qemu_mdio {
> -/* bus. */
> -int mdc;
> -int mdio;
> -
> -/* decoder.  */
> +/* bitbanging state machine */
> +bool mdc;
> +bool mdio;
>  enum {
>  PREAMBLE,
> -SOF,
>  OPC,
>  ADDR,
>  REQ,
>  TURNAROUND,
>  DATA
>  } state;
> -unsigned int drive;
>
> -unsigned int cnt;
> -unsigned int addr;
> -unsigned int opc;
> -unsigned int req;
> -unsigned int data;
> +uint16_t cnt; /* Bit count for current state */
> +uint16_t addr; /* PHY Address; retrieved during ADDR state */
> +uint16_t opc; /* Operation; 2:read */
> +uint16_t req; /* Register address */
> +uint32_t shiftreg; /* shift register; bits in to or out from PHY */
>
>  struct qemu_phy *devs[32];
>  };
> @@ -91,7 +87,16 @@ void mdio_phy_init(struct qemu_phy *phy, uint16_t id1, 
> uint16_t id2);
>  void mdio_attach(struct qemu_mdio *bus, struct qemu_phy *phy,
>   unsigned int addr);
>  uint16_t mdio_read_req(struct qemu_mdio *bus, uint8_t addr, uint8_t req);
> -void mdio_write_req(struct qemu_mdio *bus, uint8_t addr, uint8_t req, 
> uint16_t data);
> -void mdio_cycle(struct qemu_mdio *bus);
> +void mdio_write_req(struct qemu_mdio *bus, uint8_t addr, uint8_t req,
> +uint16_t data);
> +void mdio_bitbang_set_clk(struct qemu_mdio *bus, bool mdc);
> +static inline void mdio_bitbang_set_data(struct qemu_mdio *bus, bool mdio)
> +{
> +bus->mdio = mdio;
> +}
> +static inline bool mdio_bitbang_get_data(struct qemu_mdio *bus)
> +{
> +return bus->mdio;
> +}
>
>  #endif
> diff --git a/hw/net/etraxfs_eth.c b/hw/net/etraxfs_eth.c
> index 4c5415771f..1b518ea16e 100644
> --- a/hw/net/etraxfs_eth.c
> +++ b/hw/net/etraxfs_eth.c
> @@ -119,7 +119,7 @@ eth_read(void *opaque, hwaddr addr, unsigned int size)
>
>  switch (addr) {
>  case R_STAT:
> -r = eth->mdio_bus.mdio & 1;
> +r = mdio_bitbang_get_data(&eth->mdio_bus);
>  break;
>  default:
>  r = eth->regs[addr];
> @@ -177,13 +177,10 @@ eth_write(void *opaque, hwaddr addr,
>  case RW_MGM_CTRL:
>  /* Attach an MDIO/PHY abstraction.  */
>  if (value & 2) {
> -eth->mdio_bus.mdio = value & 1;
> +mdio_bitbang_set_data(&eth->mdio_bus, value & 1);
>  }
> -if (eth->mdio_bus.mdc != (value & 4)) {
> -mdio_cycle(&eth->mdio_bus);
> -eth_validate_duplex(eth);
> -}
> -eth->mdio_bus.mdc = !!(value & 4);
> +mdio_bitbang_set_clk(&eth->mdio_bus, value & 4);
> +eth_validate_duplex(eth);
>  eth->regs[addr] = value;
>  break;
>
> diff --git a/hw/net/mdio.c b/hw/net/mdio.c
> index 89a6a3a590..96e10fada0 100644
> --- a/hw/net/mdio.c
> +++ b/hw/net/mdio.c
> @@ -43,7 +43,7 @@
>   * linux driver (PHYID and Diagnostics reg).
>   * TODO: Add friendly names for the register nums.
>   */
> -static unsigned int mdio_phy_read(struct qemu_phy *phy, unsigned int req)
> +static uint16_t mdio_phy_read(struct qemu_phy *phy, unsigned int req)
>  {
>  int regnum;
>  

Re: [Qemu-devel] [PATCH v5 4/7] hw/mdio: Mask out read-only bits.

2018-02-27 Thread Alistair Francis
On Fri, Sep 22, 2017 at 10:13 AM, Philippe Mathieu-Daudé
 wrote:
> From: Grant Likely 
>
> The RST and ANEG_RST bits are commands, not settings. An operating
> system will get confused (or at least u-boot does) if those bits remain
> set after writing to them. Therefore, mask them out on write.
>
> Similarly, no bits in the ID1, ID2, and remote capability registers are
> writeable; so mask them out also.
>
> Signed-off-by: Grant Likely 
> Signed-off-by: Philippe Mathieu-Daudé 
> [PMD: just rebased]
> ---
>  include/hw/net/mdio.h |  1 +
>  hw/net/mdio.c | 16 
>  2 files changed, 13 insertions(+), 4 deletions(-)
>
> diff --git a/include/hw/net/mdio.h b/include/hw/net/mdio.h
> index b3b4f497c0..ed1879a728 100644
> --- a/include/hw/net/mdio.h
> +++ b/include/hw/net/mdio.h
> @@ -53,6 +53,7 @@
>
>  struct qemu_phy {
>  uint32_t regs[NUM_PHY_REGS];
> +const uint16_t *regs_readonly_mask; /* 0=writable, 1=read-only */
>
>  int link;
>
> diff --git a/hw/net/mdio.c b/hw/net/mdio.c
> index 33bfbb4623..89a6a3a590 100644
> --- a/hw/net/mdio.c
> +++ b/hw/net/mdio.c
> @@ -109,17 +109,24 @@ static unsigned int mdio_phy_read(struct qemu_phy *phy, 
> unsigned int req)
>
>  static void mdio_phy_write(struct qemu_phy *phy, unsigned int req, unsigned 
> int data)
>  {
> -int regnum;
> +int regnum = req & 0x1f;
> +uint16_t mask = phy->regs_readonly_mask[regnum];
>
> -regnum = req & 0x1f;
> -D(printf("%s reg[%d] = %x\n", __func__, regnum, data));
> +D(printf("%s reg[%d] = %x; mask=%x\n", __func__, regnum, data, mask));
>  switch (regnum) {
>  default:
> -phy->regs[regnum] = data;
> +phy->regs[regnum] = (phy->regs[regnum] & mask) | (data & ~mask);
>  break;
>  }
>  }
>
> +static const uint16_t default_readonly_mask[32] = {
> +[PHY_CTRL] = PHY_CTRL_RST | PHY_CTRL_ANEG_RST,
> +[PHY_ID1] = 0x,
> +[PHY_ID2] = 0x,
> +[PHY_LP_ABILITY] = 0x,
> +};

This is what the register API is really good at :)

Overall this looks fine, can we use a macro for the 32 though and then
protect accesses with an assert() or if()?

Alistair

> +
>  void mdio_phy_init(struct qemu_phy *phy, uint16_t id1, uint16_t id2)
>  {
>  phy->regs[PHY_CTRL] = 0x3100;
> @@ -128,6 +135,7 @@ void mdio_phy_init(struct qemu_phy *phy, uint16_t id1, 
> uint16_t id2)
>  phy->regs[PHY_ID2] = id2;
>  /* Autonegotiation advertisement reg. */
>  phy->regs[PHY_AUTONEG_ADV] = 0x01e1;
> +phy->regs_readonly_mask = default_readonly_mask;
>  phy->link = 1;
>
>  phy->read = mdio_phy_read;
> --
> 2.14.1
>
>
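
One way the review comment above could be addressed is sketched here: size the mask table with the NUM_PHY_REGS macro from patch 2/7 and guard the index with an assert(). This is only an illustration under stated assumptions. readonly_mask_sketch and phy_write_sketch are hypothetical names, the register array is passed directly instead of through struct qemu_phy, and the 0xffff mask for PHY_ID1 is an assumption drawn from the commit message ("no bits ... are writeable") because the literal is truncated in the quoted patch.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_PHY_REGS      0x20   /* 5 bit address bus (0-0x1F) */
#define PHY_CTRL          0x00
#define PHY_ID1           0x02
#define PHY_CTRL_RST      0x8000 /* PHY reset command */
#define PHY_CTRL_ANEG_RST 0x0200 /* Autonegotiation reset command */

/* 1 = read-only bit, 0 = writable bit; unlisted registers are writable. */
static const uint16_t readonly_mask_sketch[NUM_PHY_REGS] = {
    [PHY_CTRL] = PHY_CTRL_RST | PHY_CTRL_ANEG_RST,
    [PHY_ID1]  = 0xffff, /* assumed: the whole ID register is read-only */
};

static void phy_write_sketch(uint16_t *regs, unsigned int req, uint16_t data)
{
    unsigned int regnum = req & (NUM_PHY_REGS - 1);
    uint16_t mask;

    /* Redundant with the masking above, but documents the bound. */
    assert(regnum < NUM_PHY_REGS);
    mask = readonly_mask_sketch[regnum];

    /* Keep the old value of read-only bits, take the rest from data. */
    regs[regnum] = (uint16_t)((regs[regnum] & mask) | (data & ~mask));
}
```

Writing 0xffff to PHY_CTRL on a register that holds 0x3100 leaves the two command bits clear, so a guest that polls for the reset bit to drop sees it cleared immediately.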



Re: [Qemu-devel] [PATCH v5 3/7] hw/mdio: Generalize phy initialization routine

2018-02-27 Thread Alistair Francis
On Fri, Sep 22, 2017 at 10:13 AM, Philippe Mathieu-Daudé
 wrote:
> From: Grant Likely 
>
> There really isn't anything tdk-specific about tdk_init() other than the
> phy id registers. The function should instead be generalized for any
> phy, at least as far as the ID registers are concerned. For the most
> part the read/write behaviour should be very similar across PHYs.
>
> This patch renames tdk_{read,write,init}() to mdio_phy_*() so it can be
> used for any PHY.
>
> More work definitely needs to be done here to make it easy to override
> the default behaviour for specific PHYs, but this at least is a
> reasonable start.
>
> Signed-off-by: Grant Likely 
> Signed-off-by: Philippe Mathieu-Daudé 
> [PMD: just rebased]
> ---
>  include/hw/net/mdio.h   |  2 +-
>  hw/net/etraxfs_eth.c|  2 +-
>  hw/net/mdio.c   | 14 +++---
>  hw/net/xilinx_axienet.c |  2 +-
>  4 files changed, 10 insertions(+), 10 deletions(-)
>
> diff --git a/include/hw/net/mdio.h b/include/hw/net/mdio.h
> index 7ffa4389b9..b3b4f497c0 100644
> --- a/include/hw/net/mdio.h
> +++ b/include/hw/net/mdio.h
> @@ -86,7 +86,7 @@ struct qemu_mdio {
>  struct qemu_phy *devs[32];
>  };
>
> -void tdk_init(struct qemu_phy *phy);
> +void mdio_phy_init(struct qemu_phy *phy, uint16_t id1, uint16_t id2);
>  void mdio_attach(struct qemu_mdio *bus, struct qemu_phy *phy,
>   unsigned int addr);
>  uint16_t mdio_read_req(struct qemu_mdio *bus, uint8_t addr, uint8_t req);
> diff --git a/hw/net/etraxfs_eth.c b/hw/net/etraxfs_eth.c
> index f8d8f8441d..4c5415771f 100644
> --- a/hw/net/etraxfs_eth.c
> +++ b/hw/net/etraxfs_eth.c
> @@ -333,7 +333,7 @@ static int fs_eth_init(SysBusDevice *sbd)
>  qemu_format_nic_info_str(qemu_get_queue(s->nic), s->conf.macaddr.a);
>
>
> -tdk_init(&s->phy);
> +mdio_phy_init(&s->phy, 0x0300, 0xe400);
>  mdio_attach(&s->mdio_bus, &s->phy, s->phyaddr);
>  return 0;
>  }
> diff --git a/hw/net/mdio.c b/hw/net/mdio.c
> index 3d70d99077..33bfbb4623 100644
> --- a/hw/net/mdio.c
> +++ b/hw/net/mdio.c
> @@ -43,7 +43,7 @@
>   * linux driver (PHYID and Diagnostics reg).
>   * TODO: Add friendly names for the register nums.
>   */
> -static unsigned int tdk_read(struct qemu_phy *phy, unsigned int req)
> +static unsigned int mdio_phy_read(struct qemu_phy *phy, unsigned int req)
>  {
>  int regnum;
>  unsigned r = 0;
> @@ -107,7 +107,7 @@ static unsigned int tdk_read(struct qemu_phy *phy, 
> unsigned int req)
>  return r;
>  }
>
> -static void tdk_write(struct qemu_phy *phy, unsigned int req, unsigned int 
> data)
> +static void mdio_phy_write(struct qemu_phy *phy, unsigned int req, unsigned 
> int data)
>  {
>  int regnum;
>
> @@ -120,18 +120,18 @@ static void tdk_write(struct qemu_phy *phy, unsigned 
> int req, unsigned int data)
>  }
>  }
>
> -void tdk_init(struct qemu_phy *phy)
> +void mdio_phy_init(struct qemu_phy *phy, uint16_t id1, uint16_t id2)
>  {
>  phy->regs[PHY_CTRL] = 0x3100;
>  /* PHY Id. */
> -phy->regs[PHY_ID1] = 0x0300;
> -phy->regs[PHY_ID2] = 0xe400;
> +phy->regs[PHY_ID1] = id1;
> +phy->regs[PHY_ID2] = id2;

These should be set by QEMU properties instead of values passed to the init() function.

Alistair

>  /* Autonegotiation advertisement reg. */
>  phy->regs[PHY_AUTONEG_ADV] = 0x01e1;
>  phy->link = 1;
>
> -phy->read = tdk_read;
> -phy->write = tdk_write;
> +phy->read = mdio_phy_read;
> +phy->write = mdio_phy_write;
>  }
>
>  void mdio_attach(struct qemu_mdio *bus, struct qemu_phy *phy, unsigned int 
> addr)
> diff --git a/hw/net/xilinx_axienet.c b/hw/net/xilinx_axienet.c
> index 1e859fdaae..408cd6e675 100644
> --- a/hw/net/xilinx_axienet.c
> +++ b/hw/net/xilinx_axienet.c
> @@ -791,7 +791,7 @@ static void xilinx_enet_realize(DeviceState *dev, Error 
> **errp)
>object_get_typename(OBJECT(dev)), dev->id, s);
>  qemu_format_nic_info_str(qemu_get_queue(s->nic), s->conf.macaddr.a);
>
> -tdk_init(&s->TEMAC.phy);
> +mdio_phy_init(&s->TEMAC.phy, 0x0300, 0xe400);
>  mdio_attach(&s->TEMAC.mdio_bus, &s->TEMAC.phy, s->c_phyaddr);
>
>  s->TEMAC.parent = s;
> --
> 2.14.1
>
>



Re: [Qemu-devel] [PATCH v5 2/7] hw/mdio: Add PHY register definition

2018-02-27 Thread Alistair Francis
On Fri, Sep 22, 2017 at 10:13 AM, Philippe Mathieu-Daudé
 wrote:
> From: Grant Likely 
>
> Trivial patch to add #defines for defined PHY register address and bit fields
>
> Signed-off-by: Grant Likely 
> Signed-off-by: Philippe Mathieu-Daudé 

Reviewed-by: Alistair Francis 

Alistair

> ---
>  include/hw/net/mdio.h | 24 ++--
>  hw/net/mdio.c |  8 
>  2 files changed, 26 insertions(+), 6 deletions(-)
>
> diff --git a/include/hw/net/mdio.h b/include/hw/net/mdio.h
> index ac36aed3c3..7ffa4389b9 100644
> --- a/include/hw/net/mdio.h
> +++ b/include/hw/net/mdio.h
> @@ -25,14 +25,34 @@
>   * THE SOFTWARE.
>   */
>
> -/* PHY Advertisement control register */
> +/* PHY MII Register/Bit Definitions */
> +/* PHY Registers defined by IEEE */
> +#define PHY_CTRL 0x00 /* Control Register */
> +#define PHY_STATUS   0x01 /* Status Regiser */
> +#define PHY_ID1  0x02 /* Phy Id Reg (word 1) */
> +#define PHY_ID2  0x03 /* Phy Id Reg (word 2) */
> +#define PHY_AUTONEG_ADV  0x04 /* Autoneg Advertisement */
> +#define PHY_LP_ABILITY   0x05 /* Link Partner Ability (Base Page) */
> +#define PHY_AUTONEG_EXP  0x06 /* Autoneg Expansion Reg */
> +#define PHY_NEXT_PAGE_TX 0x07 /* Next Page TX */
> +#define PHY_LP_NEXT_PAGE 0x08 /* Link Partner Next Page */
> +#define PHY_1000T_CTRL   0x09 /* 1000Base-T Control Reg */
> +#define PHY_1000T_STATUS 0x0A /* 1000Base-T Status Reg */
> +#define PHY_EXT_STATUS   0x0F /* Extended Status Reg */
> +
> +#define NUM_PHY_REGS 0x20  /* 5 bit address bus (0-0x1F) */
> +
> +#define PHY_CTRL_RST0x8000 /* PHY reset command */
> +#define PHY_CTRL_ANEG_RST   0x0200 /* Autonegotiation reset command */
> +
> +/* PHY Advertisement control and remote capability registers (same bitfields) */
>  #define PHY_ADVERTISE_10HALF0x0020  /* Try for 10mbps half-duplex  */
>  #define PHY_ADVERTISE_10FULL0x0040  /* Try for 10mbps full-duplex  */
>  #define PHY_ADVERTISE_100HALF   0x0080  /* Try for 100mbps half-duplex */
>  #define PHY_ADVERTISE_100FULL   0x0100  /* Try for 100mbps full-duplex */
>
>  struct qemu_phy {
> -uint32_t regs[32];
> +uint32_t regs[NUM_PHY_REGS];
>
>  int link;
>
> diff --git a/hw/net/mdio.c b/hw/net/mdio.c
> index 3763fcc8af..3d70d99077 100644
> --- a/hw/net/mdio.c
> +++ b/hw/net/mdio.c
> @@ -122,12 +122,12 @@ static void tdk_write(struct qemu_phy *phy, unsigned int req, unsigned int data)
>
>  void tdk_init(struct qemu_phy *phy)
>  {
> -phy->regs[0] = 0x3100;
> +phy->regs[PHY_CTRL] = 0x3100;
>  /* PHY Id. */
> -phy->regs[2] = 0x0300;
> -phy->regs[3] = 0xe400;
> +phy->regs[PHY_ID1] = 0x0300;
> +phy->regs[PHY_ID2] = 0xe400;
>  /* Autonegotiation advertisement reg. */
> -phy->regs[4] = 0x01e1;
> +phy->regs[PHY_AUTONEG_ADV] = 0x01e1;
>  phy->link = 1;
>
>  phy->read = tdk_read;
> --
> 2.14.1
>
>



Re: [Qemu-devel] [PATCH v5 1/7] hw/mdio: Generalize etraxfs MDIO bitbanging emulation

2018-02-27 Thread Alistair Francis
On Fri, Sep 22, 2017 at 10:13 AM, Philippe Mathieu-Daudé wrote:
> From: Grant Likely 
>
> The etraxfs and Xilinx axienet Ethernet models implement quite a nice
> MDIO core that supports both bitbanging and direct register access. This
> change factors the common code out into a separate file. There are no
> functional changes here, just movement of code.
>
> The etraxfs and axienet are slightly different. The etraxfs version
> includes the bitbang state processing, but the axienet version has a
> minor enhancement for read/write of phy registers without using bitbang
> state variables.  This patch generalizes the etraxfs version, with the
> axienet change backported in.
>
> Signed-off-by: Grant Likely 
> Signed-off-by: Philippe Mathieu-Daudé 
> [PMD: rebased with a minor checkpatch fix]
> ---
>  include/hw/net/mdio.h   |  76 +
>  hw/net/etraxfs_eth.c| 278 
> +---
>  hw/net/mdio.c   | 262 +
>  hw/net/xilinx_axienet.c | 187 +---
>  hw/net/Makefile.objs|   2 +
>  5 files changed, 344 insertions(+), 461 deletions(-)
>  create mode 100644 include/hw/net/mdio.h
>  create mode 100644 hw/net/mdio.c
>
> diff --git a/include/hw/net/mdio.h b/include/hw/net/mdio.h
> new file mode 100644
> index 00..ac36aed3c3
> --- /dev/null
> +++ b/include/hw/net/mdio.h
> @@ -0,0 +1,76 @@
> +#ifndef BITBANG_MDIO_H
> +#define BITBANG_MDIO_H
> +
> +/*
> + * QEMU Bitbang Ethernet MDIO bus & PHY controllers.
> + *
> + * Copyright (c) 2008 Edgar E. Iglesias, Axis Communications AB.
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a copy
> + * of this software and associated documentation files (the "Software"), to deal
> + * in the Software without restriction, including without limitation the rights
> + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
> + * copies of the Software, and to permit persons to whom the Software is
> + * furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
> + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
> + * THE SOFTWARE.
> + */
> +
> +/* PHY Advertisement control register */
> +#define PHY_ADVERTISE_10HALF0x0020  /* Try for 10mbps half-duplex  */
> +#define PHY_ADVERTISE_10FULL0x0040  /* Try for 10mbps full-duplex  */
> +#define PHY_ADVERTISE_100HALF   0x0080  /* Try for 100mbps half-duplex */
> +#define PHY_ADVERTISE_100FULL   0x0100  /* Try for 100mbps full-duplex */
> +
> +struct qemu_phy {
> +uint32_t regs[32];
> +
> +int link;
> +
> +unsigned int (*read)(struct qemu_phy *phy, unsigned int req);
> +void (*write)(struct qemu_phy *phy, unsigned int req, unsigned int data);
> +};
> +
> +struct qemu_mdio {
> +/* bus. */
> +int mdc;
> +int mdio;
> +
> +/* decoder.  */
> +enum {
> +PREAMBLE,
> +SOF,
> +OPC,
> +ADDR,
> +REQ,
> +TURNAROUND,
> +DATA
> +} state;
> +unsigned int drive;
> +
> +unsigned int cnt;
> +unsigned int addr;
> +unsigned int opc;
> +unsigned int req;
> +unsigned int data;
> +
> +struct qemu_phy *devs[32];
> +};
> +
> +void tdk_init(struct qemu_phy *phy);
> +void mdio_attach(struct qemu_mdio *bus, struct qemu_phy *phy,
> + unsigned int addr);
> +uint16_t mdio_read_req(struct qemu_mdio *bus, uint8_t addr, uint8_t req);
> +void mdio_write_req(struct qemu_mdio *bus, uint8_t addr, uint8_t req, uint16_t data);
> +void mdio_cycle(struct qemu_mdio *bus);
> +
> +#endif
> diff --git a/hw/net/etraxfs_eth.c b/hw/net/etraxfs_eth.c
> index 013c8d0a41..f8d8f8441d 100644
> --- a/hw/net/etraxfs_eth.c
> +++ b/hw/net/etraxfs_eth.c
> @@ -26,287 +26,11 @@
>  #include "hw/sysbus.h"
>  #include "net/net.h"
>  #include "hw/cris/etraxfs.h"
> +#include "hw/net/mdio.h"
>  #include "qemu/error-report.h"
>
>  #define D(x)
>
> -/* Advertisement control register. */
> -#define ADVERTISE_10HALF0x0020  /* Try for 10mbps half-duplex  */
> -#define ADVERTISE_10FULL0x0040  /* Try for 10mbps full-duplex  */
> -#define ADVERTISE_100HALF   0x0080  /* Try for 100mbps half-duplex */
> -#define ADVERTISE_100FULL   0x0100  /* Try for 100mbps full-duplex */
> -
> -/*
> - * The MDIO extensions in the TDK PHY model were reversed engineered from the
> - * linux driver

Re: [Qemu-devel] [RFC v4 19/21] blockjobs: Expose manual property

2018-02-27 Thread Eric Blake

On 02/27/2018 03:57 PM, John Snow wrote:



On 02/27/2018 03:16 PM, Eric Blake wrote:

On 02/23/2018 05:51 PM, John Snow wrote:

Expose the "manual" property via QAPI for the backup-related jobs.
As of this commit, this allows the management API to request the
"concluded" and "dismiss" semantics for backup jobs.



+# @manual: True to use an expanded, more explicit job control flow.
+#  Jobs may transition from a running state to a pending state,
+#  where they must be instructed to complete manually via
+#  block-job-finalize.
+#  Jobs belonging to a transaction must either all or all not use this
+#  setting. Once a transaction reaches a pending state, issuing the
+#  finalize command to any one job in the transaction is sufficient
+#  to finalize the entire transaction.


The previous commit message talked about mixed-manual transactions, but
this seems to imply it is not possible.  I'm fine if we don't support
mixed-manual transactions, but wonder if it means any changes to the
series.

Otherwise looks reasonable from the UI point of view.



More seriously, this documentation I wrote doesn't address the totality
of the expanded flow. I omitted dismiss here by accident as well. This
is at best a partial definition of the 'manual' property.

I'd like to use _this_ patch to ask the question: "What should the
proper noun for the QEMU 2.12+ Expanded Block Job Management Flow
Mechanism be?"


"Manual" actually doesn't sound too bad; I could also see "Explicit job 
flow", as in, "within a transaction, all jobs should have the same 
setting for the choice of Explicit Job Flow" (but then the name 'manual' 
would have to be changed to match).  The idea of a central document, 
that gets referred to from multiple spots in the QAPI docs, rather than 
duplicating information throughout the QAPI docs, is reasonable.




I'm not too sure, but "Manual mode" leaves a lot to be desired.

I keep calling it something like "2.12+ Job Management" but that's not
really descriptive.


That, and if someone ever backports the enhanced state machine to a 2.11 
branch, it becomes a misnomer.



I conceptualize the feature as the addition of a
purposefully more "needy" and less automatic completion mechanism, hence
the "manual"

Anyway, I'd like to figure out a good "documentation name" for it so I
can point all instances of the creation property (for drive-backup,
drive-mirror, and everyone else) to a central location that explains the
STM and what exactly the differences between manual=on/off are. I'd then
like to expose this property via query and link the documentation there
to this description, too.


"Explicit" and "Manual" are the two best options coming to me as I type 
this email.




It'd be nice-- under the same arguments that prompted 'dismiss'-- to say
that if a client crashes it can reconnect and discover what kind of
attention certain jobs will need by asking for the manual property back.

--js



--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



Re: [Qemu-devel] [PATCH v3 9/9] tests: functional tests for QMP command set-numa-node

2018-02-27 Thread Eric Blake

On 02/16/2018 06:37 AM, Igor Mammedov wrote:

  * start QEMU with 2 unmapped cpus,
  * while in preconfig state
 * add 2 numa nodes
 * assign cpus to them
  * exit preconfig and in running state check that cpus
are mapped correctly.

Signed-off-by: Igor Mammedov 
---
  tests/numa-test.c | 71 +++
  1 file changed, 71 insertions(+)

diff --git a/tests/numa-test.c b/tests/numa-test.c
index 68aca9c..11c2842 100644
--- a/tests/numa-test.c
+++ b/tests/numa-test.c
@@ -260,6 +260,76 @@ static void aarch64_numa_cpu(const void *data)
  g_free(cli);
  }
  
+static bool is_err(QDict *response)

+{
+const char *desc = NULL;
+QDict *error = qdict_get_qdict(response, "error");
+if (error) {
+desc = qdict_get_try_str(error, "desc");
+}
+QDECREF(response);
+return !!desc;


Why are we duplicating this helper in more than one .c file?  If it is a 
common task, it should be in a common file for code reuse.  And as 
before, why are you returning false if the reply is an error but merely 
lacked 'desc'?


--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org
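
The concern about is_err() can be illustrated with a minimal sketch. It assumes a hypothetical stand-in `struct reply` in place of QEMU's QDict, so that it compiles on its own; the names `reply`, `is_err_desc_only` and `is_err_any` are illustrative and do not come from the patch. The first function follows the logic of the helper under review, the second treats the mere presence of an "error" member as a failure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a parsed QMP reply; not a QEMU type. */
struct reply {
    bool has_error;    /* the reply carried an "error" member */
    const char *desc;  /* error.desc, or NULL if it was absent */
};

/* Same logic as the helper under review: only a non-NULL desc counts. */
static bool is_err_desc_only(const struct reply *r)
{
    const char *desc = NULL;

    assert(r != NULL);
    if (r->has_error) {
        desc = r->desc;
    }
    return desc != NULL;
}

/* Alternative: an "error" member alone marks the reply as failed. */
static bool is_err_any(const struct reply *r)
{
    assert(r != NULL);
    return r->has_error;
}
```

With an error reply that lacks 'desc', the first variant reports success while the second reports failure, which is the difference being asked about.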



Re: [Qemu-devel] [PATCH v3 8/9] QMP: add set-numa-node command

2018-02-27 Thread Eric Blake

On 02/16/2018 06:37 AM, Igor Mammedov wrote:

Command is allowed to run only in preconfig stage and
will allow to configure numa mapping for CPUs depending
on possible CPUs layout (query-hotpluggable-cpus) for
given machine instance.

Signed-off-by: Igor Mammedov 
---
  numa.c   |  5 +
  qapi-schema.json | 14 ++
  tests/qmp-test.c |  6 ++
  3 files changed, 25 insertions(+)




+++ b/qapi-schema.json
@@ -3201,3 +3201,17 @@
  # Since: 2.11
  ##
  { 'command': 'watchdog-set-action', 'data' : {'action': 'WatchdogAction'} }
+
+##
+# @set-numa-node:
+#
+# Runtime equivalent of '-numa' CLI option, available at
+# preconfigure stage to configure numa mapping before initializing
+# machine.
+#
+# Since 2.12
+##
+{ 'command': 'set-numa-node', 'boxed': true,
+  'data': 'NumaOptions',
+  'runstates': [ 'preconfig' ]
+}


Oh, so you ARE trying to do fine-grained control of which commands are 
valid in which states.  Still, would that be easier through a 
three-state enum (or pair of bools) instead of making every client 
enumerate an array of 'all states', 'all but preconfig', and 'preconfig 
only'?


Also, while preconfig is special (not every command can be made to run 
during preconfig, so having the state rejection logic centralized makes 
some sense), there are a lot fewer commands that are preconfig-only - 
could those commands (just set-numa-node at the moment) be made to 
perform state checks themselves rather than relying on centralized 
logic, and then you still only need a single bool in the QAPI schema 
(safe for preconfig, unsafe for preconfig in central logic; unsafe in 
other states in a per-command code).


--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



Re: [Qemu-devel] [PATCH v3 6/9] tests: extend qmp test with pereconfig checks

2018-02-27 Thread Eric Blake

On 02/16/2018 06:37 AM, Igor Mammedov wrote:

In the subject line: s/pereconfig/preconfig/


Add permission checks for commands in 'preconfig' state.

It should work for all targets, but won't work with
machine 'none' as it's limited to -smp 1 only.
So use PC machine for testing preconfig and 'runstate'
parameter.

Signed-off-by: Igor Mammedov 
---
  tests/qmp-test.c | 49 +
  1 file changed, 49 insertions(+)



  
+static bool is_err(QDict *rsp)

+{
+const char *desc = NULL;
+QDict *error = qdict_get_qdict(rsp, "error");
+if (error) {
+desc = qdict_get_try_str(error, "desc");
+}
+QDECREF(rsp);
+return !!desc;


Wait, so this returns false if this was an error but without a valid desc?


+}
+
+static void test_qmp_preconfig(void)
+{
+QDict *rsp, *ret;
+QTestState *qs = qtest_startf("-nodefaults -preconfig -smp 2");
+
+/* preconfig state */
+/* enabled commands, no error expected  */
+g_assert(!is_err(qtest_qmp(qs, "{ 'execute': 'query-commands' }")));
+
+/* forbidden commands, expected error */
+g_assert(is_err(qtest_qmp(qs, "{ 'execute': 'query-cpus' }")));


Does introspection show which commands are valid in preconfig state? 
That may be useful information for a client to know.


--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



Re: [Qemu-devel] [PATCH v3 5/9] QAPI: allow to specify valid runstates per command

2018-02-27 Thread Eric Blake

On 02/16/2018 06:37 AM, Igor Mammedov wrote:

Add optional 'runstates' parameter in QAPI command definition,
which will permit to specify RunState variations in which
a command could be exectuted via QMP monitor.


s/exectuted/executed/



For compatibility reasons, commands, that don't use


s/commands,/commands/


'runstates' explicitly, are not permited to run in


s/explicitly,/explicitly/
s/permited/permitted/


preconfig state but allowed in all other states.

New option will be used to allow commands, which are
prepared/need to run this early, to run in preconfig state.
It will include query-hotpluggable-cpus and new set-numa-node
commands. Other commands that should be able to run in
preconfig state, should be ammeded to not expect machine


s/ammeded/amended/


in initialized state.

Signed-off-by: Igor Mammedov 
---
  include/qapi/qmp/dispatch.h |  5 +++-
  monitor.c   | 28 +---
  qapi-schema.json| 12 +++--
  qapi/qmp-dispatch.c | 39 
  qapi/qmp-registry.c |  4 ++-
  qapi/run-state.json |  6 -
  scripts/qapi-commands.py| 46 ++---
  scripts/qapi-introspect.py  |  2 +-
  scripts/qapi.py | 15 +++
  scripts/qapi2texi.py|  2 +-
  tests/qapi-schema/doc-good.out  |  4 +--
  tests/qapi-schema/ident-with-escape.out |  2 +-
  tests/qapi-schema/indented-expr.out |  4 +--
  tests/qapi-schema/qapi-schema-test.out  | 18 ++---
  tests/qapi-schema/test-qapi.py  |  6 ++---
  15 files changed, 151 insertions(+), 42 deletions(-)


Missing mention in docs/; among other things, see how the OOB series 
adds a similar per-command witness during QMP on whether the command can 
be used in certain contexts:

https://lists.gnu.org/archive/html/qemu-devel/2018-01/msg05789.html
including edits to docs/devel/qapi-code-gen.txt (definitely needed here) 
and docs/interop/qmp-spec.txt (may or may not be needed here).




diff --git a/include/qapi/qmp/dispatch.h b/include/qapi/qmp/dispatch.h
index 1e694b5..f821781 100644
--- a/include/qapi/qmp/dispatch.h
+++ b/include/qapi/qmp/dispatch.h
@@ -15,6 +15,7 @@
  #define QAPI_QMP_DISPATCH_H
  
  #include "qemu/queue.h"

+#include "qapi-types.h"


Probably conflict with the pending work from Markus to reorganize the 
QAPI header files to be more modular.



+++ b/qapi-schema.json
@@ -219,7 +219,11 @@
  # Note: This example has been shortened as the real response is too long.
  #
  ##
-{ 'command': 'query-commands', 'returns': ['CommandInfo'] }
+{ 'command': 'query-commands', 'returns': ['CommandInfo'],
+  'runstates': [ 'debug', 'inmigrate', 'internal-error', 'io-error', 'paused',
+ 'postmigrate', 'prelaunch', 'finish-migrate', 'restore-vm',
+ 'running', 'save-vm', 'shutdown', 'suspended', 'watchdog',
+ 'guest-panicked', 'colo', 'preconfig' ] }


Wow, that's going to be a lot of states to list for every command that 
is interested in the non-default state.  Would a simple bool flag be any 
easier than a list of states, since it looks like preconfig is the only 
special state?


  
  ##

  # @LostTickPolicy:
@@ -1146,7 +1150,11 @@
  # <- { "return": {} }
  #
  ##
-{ 'command': 'cont' }
+{ 'command': 'cont',
+  'runstates': [ 'debug', 'inmigrate', 'internal-error', 'io-error', 'paused',
+ 'postmigrate', 'prelaunch', 'finish-migrate', 'restore-vm',
+ 'running', 'save-vm', 'shutdown', 'suspended', 'watchdog',
+ 'guest-panicked', 'colo', 'preconfig' ] }


Should 'stop' also be permitted in the preconfig state, to get to the 
state that used to be provided by 'qemu -S'?




@@ -65,6 +66,40 @@ static QDict *qmp_dispatch_check_obj(const QObject *request, Error **errp)
  return dict;
  }
  
+static bool is_cmd_permited(const QmpCommand *cmd, Error **errp)


s/permited/permitted/g


+{
+int i;
+char *list = NULL;
+
+/* Old commands that don't have explicit runstate in schema
+ * are permited to run except of at PRECONFIG stage


including in the comments


+ */
+if (!cmd->valid_runstates) {
+if (runstate_check(RUN_STATE_PRECONFIG)) {
+const char *state = RunState_str(RUN_STATE_PRECONFIG);
+error_setg(errp, "The command '%s' isn't valid in '%s'",
+   cmd->name, state);
+return false;
+}
+return true;
+}
+
+for (i = 0; cmd->valid_runstates[i] != RUN_STATE__MAX; i++) {
+if (runstate_check(cmd->valid_runstates[i])) {
+return true;
+}
+}
+
+for (i = 0; cmd->valid_runstates[i] != RUN_STATE__MAX; i++) {
+const char *state = RunState_str(cmd->valid_runstates[i]);
+list = g_strjoin(", ", state, list, NULL);


This is rather complex compare

Re: [Qemu-devel] [PATCH 17/19] hw/misc/iotkit-secctl: Add remaining simple registers

2018-02-27 Thread Richard Henderson
On 02/20/2018 10:03 AM, Peter Maydell wrote:
> +case A_BRGINTEN:
> +s->brginten = value & 0x;
> +break;

Looks to me like bits 0-15 are read-only 0x,
so, that 0x should be 0x.


r~



Re: [Qemu-devel] [PATCH] hw/s390x/ipl: Bail out if the network bootloader can not be found

2018-02-27 Thread Farhan Ali



On 02/27/2018 02:11 PM, Thomas Huth wrote:

On 27.02.2018 18:26, Farhan Ali wrote:



On 02/27/2018 05:16 AM, Viktor Mihajlovski wrote:

On 27.02.2018 11:05, Thomas Huth wrote:

If QEMU fails to load 's390-netboot.img', the guest firmware currently
loops forever and just floods the console with "Network boot device
detected" messages. The code in ipl.c apparently already tried to stop
the VM with vm_stop() in this case, but this is in vain since the run
state is later reset due to a call to vm_start() from vl.c again.
To avoid the ugly firmware loop, let's simply exit QEMU directly instead
since it just does not make sense to continue if the required firmware
image can not be loaded. While we're at it, also add the file name of
the netboot binary to the error message, so that the user has a better
hint about what is missing.

Signed-off-by: Thomas Huth 
---
   hw/s390x/ipl.c | 5 +++--
   1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/hw/s390x/ipl.c b/hw/s390x/ipl.c
index 0d06fc1..ff8308e 100644
--- a/hw/s390x/ipl.c
+++ b/hw/s390x/ipl.c
@@ -322,7 +322,8 @@ static int load_netboot_image(Error **errp)

   netboot_filename = qemu_find_file(QEMU_FILE_TYPE_BIOS,
ipl->netboot_fw);
   if (netboot_filename == NULL) {
-    error_setg(errp, "Could not find network bootloader");
+    error_setg(errp, "Could not find network bootloader '%s'",
+   ipl->netboot_fw);
   goto unref_mr;
   }

@@ -416,7 +417,7 @@ void s390_ipl_prepare_cpu(S390CPU *cpu)
   if (ipl->netboot) {
   if (load_netboot_image(&err) < 0) {
   error_report_err(err);
-    vm_stop(RUN_STATE_INTERNAL_ERROR);

Should we print something like 'exiting' or 'terminating' here, to make
clear that the situation is terminal? Sometimes errors are reported and
processing continues nonetheless.



I had to go through my old notes to see why I didn't just exit when I
wrote it, and the reason was so we could put the guest in wait state so
we can do some diagnostics

Do we want to change this behavior?


That intended behavior obviously does not work, since the guest is
started anyway here.

And in this case, it does not make much sense to put the guest into wait
state, since the problem is simply that a firmware image could not be
loaded by QEMU, i.e. it's a problem in the host, not in the guest. Or
which kind of guest diagnostics would you expect here? ... Putting the
guest into stopped state only makes sense if the guest crashed /
panicked, so you could analyze the reason for the crash.

  Thomas



You are right and this would be the right thing to do.

Reviewed-by: Farhan Ali 




Re: [Qemu-devel] [RFC v4 19/21] blockjobs: Expose manual property

2018-02-27 Thread John Snow


On 02/27/2018 03:16 PM, Eric Blake wrote:
> On 02/23/2018 05:51 PM, John Snow wrote:
>> Expose the "manual" property via QAPI for the backup-related jobs.
>> As of this commit, this allows the management API to request the
>> "concluded" and "dismiss" semantics for backup jobs.
>>
>> Signed-off-by: John Snow 
>> ---
>>   blockdev.c   | 19 ---
>>   qapi/block-core.json | 32 ++--
>>   2 files changed, 42 insertions(+), 9 deletions(-)
>>
> 
>> +++ b/qapi/block-core.json
>> @@ -1177,6 +1177,16 @@
>>   # @job-id: identifier for the newly-created block job. If
>>   #  omitted, the device name will be used. (Since 2.7)
>>   #
>> +# @manual: True to use an expanded, more explicit job control flow.
>> +#  Jobs may transition from a running state to a pending state,
>> +#  where they must be instructed to complete manually via
>> +#  block-job-finalize.
>> +#  Jobs belonging to a transaction must either all or all not use this
>> +#  setting. Once a transaction reaches a pending state, issuing the
>> +#  finalize command to any one job in the transaction is sufficient
>> +#  to finalize the entire transaction.
> 
> The previous commit message talked about mixed-manual transactions, but
> this seems to imply it is not possible.  I'm fine if we don't support
> mixed-manual transactions, but wonder if it means any changes to the
> series.
> 
> Otherwise looks reasonable from the UI point of view.
> 

More seriously, this documentation I wrote doesn't address the totality
of the expanded flow. I omitted dismiss here by accident as well. This
is at best a partial definition of the 'manual' property.

I'd like to use _this_ patch to ask the question: "What should the
proper noun for the QEMU 2.12+ Expanded Block Job Management Flow
Mechanism be?"

I'm not too sure, but "Manual mode" leaves a lot to be desired.

I keep calling it something like "2.12+ Job Management" but that's not
really descriptive. I conceptualize the feature as the addition of a
purposefully more "needy" and less automatic completion mechanism, hence
the "manual"

Anyway, I'd like to figure out a good "documentation name" for it so I
can point all instances of the creation property (for drive-backup,
drive-mirror, and everyone else) to a central location that explains the
STM and what exactly the differences between manual=on/off are. I'd then
like to expose this property via query and link the documentation there
to this description, too.

It'd be nice-- under the same arguments that prompted 'dismiss'-- to say
that if a client crashes it can reconnect and discover what kind of
attention certain jobs will need by asking for the manual property back.

--js



Re: [Qemu-devel] [PATCH 16/19] hw/misc/iotkit-secctl: Add handling for PPCs

2018-02-27 Thread Richard Henderson
On 02/20/2018 10:03 AM, Peter Maydell wrote:
> The IoTKit Security Controller includes various registers
> that expose to software the controls for the Peripheral
> Protection Controllers in the system. Implement these.
> 
> Signed-off-by: Peter Maydell 
> ---
>  include/hw/misc/iotkit-secctl.h |  64 +-
>  hw/misc/iotkit-secctl.c | 270 
> +---
>  2 files changed, 315 insertions(+), 19 deletions(-)

Reviewed-by: Richard Henderson 


r~




Re: [Qemu-devel] [PATCH 15/19] hw/misc/iotkit-secctl: Arm IoT Kit security controller initial skeleton

2018-02-27 Thread Richard Henderson
On 02/20/2018 10:03 AM, Peter Maydell wrote:
> +r >>= 8 * (addr & 3);
> +r &= (1 << (size * 8)) - 1;

extract32(r, (addr & 3) * 8, size * 8) ?

Otherwise,
Reviewed-by: Richard Henderson 


r~
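
For comparison, here is a small sketch of the two forms side by side. `extract32_sketch` is a local stand-in modelled on QEMU's extract32() from "qemu/bitops.h", defined here only so the example is self-contained; the `subword_*` names are hypothetical. The open-coded variant is widened to 64 bits, because `1 << 32` on a 32-bit int would be undefined if size were ever 4.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-in modelled on QEMU's extract32(); not the real helper. */
static uint32_t extract32_sketch(uint32_t value, int start, int length)
{
    assert(start >= 0 && length > 0 && length <= 32 - start);
    return (value >> start) & (~0U >> (32 - length));
}

/* Open-coded shift and mask, widened so that size == 4 stays defined. */
static uint32_t subword_open_coded(uint32_t r, unsigned addr, unsigned size)
{
    r >>= 8 * (addr & 3);
    r &= (uint32_t)((UINT64_C(1) << (size * 8)) - 1);
    return r;
}

/* The suggested form: one call, with the bounds checked by the helper. */
static uint32_t subword_extract(uint32_t r, unsigned addr, unsigned size)
{
    return extract32_sketch(r, (int)((addr & 3) * 8), (int)(size * 8));
}
```

Both return the same value for accesses that stay inside the 32-bit word. An access that runs past it (for example addr & 3 == 3 with size == 2) trips the assertion in the helper rather than being silently masked.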




Re: [Qemu-devel] [PATCH 14/19] hw/misc/tz-ppc: Model TrustZone peripheral protection controller

2018-02-27 Thread Richard Henderson
On 02/20/2018 10:03 AM, Peter Maydell wrote:
> Add a model of the TrustZone peripheral protection controller (PPC),
> which is used to gate transactions to non-TZ-aware peripherals so
> that secure software can configure them to not be accessible to
> non-secure software.
> 
> Signed-off-by: Peter Maydell 
> ---
>  hw/misc/Makefile.objs   |   2 +
>  include/hw/misc/tz-ppc.h| 101 ++
>  hw/misc/tz-ppc.c| 302 
> 
>  default-configs/arm-softmmu.mak |   2 +
>  hw/misc/trace-events|  11 ++
>  5 files changed, 418 insertions(+)
>  create mode 100644 include/hw/misc/tz-ppc.h
>  create mode 100644 hw/misc/tz-ppc.c

Reviewed-by: Richard Henderson 


r~



[Qemu-devel] [Bug 1673976] Re: linux-user clone() can't handle glibc posix_spawn() (causes locale-gen to assert)

2018-02-27 Thread Peter Maydell
That glibc change has caused the assert to go away, but QEMU's
spawn(CLONE_VFORK) still does not have the "always waits for child"
semantics that glibc has assumed since glibc commit 4b4d4056bb154. The
child and the parent will end up racing each other, and the child will
never be able to write to the parent's address space. I think that the
effect of that race will be that if the child fails (for instance if a
bad filename is passed and exec() fails) the parent will never notice
and will return a success code from the spawn function when it should
not.

So there remains a QEMU bug here; though it is also the case that I
can't see any way we can fix it.
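
A minimal sketch of how the difference could be observed from user space, assuming a POSIX system; the helper name `spawn_result` and the test path are made up for illustration. It calls posix_spawn() on a given path and returns the result code, reaping the child when the spawn succeeds.

```c
#include <assert.h>
#include <errno.h>
#include <spawn.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/*
 * Spawn `path` with only argv[0] set and return posix_spawn()'s result:
 * 0 on success, otherwise an errno value. On success the child is reaped
 * so that no zombie is left behind.
 */
static int spawn_result(const char *path)
{
    char *const argv[] = { (char *)path, NULL };
    pid_t pid;
    int rc;

    assert(path != NULL);
    rc = posix_spawn(&pid, path, NULL, NULL, argv, environ);
    if (rc == 0) {
        int status;

        while (waitpid(pid, &status, 0) < 0 && errno == EINTR) {
            /* retry if interrupted by a signal */
        }
    }
    return rc;
}
```

On a native Linux host with a recent glibc, calling this with a path that does not exist is expected to return ENOENT, because the child reports the exec() failure back through the shared address space. With the race described above, the parent could instead see 0.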

-- 
You received this bug notification because you are a member of
qemu-devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1673976

Title:
  linux-user clone() can't handle glibc posix_spawn() (causes locale-gen
  to assert)

Status in QEMU:
  New

Bug description:
  I'm running a command (locale-gen) inside of an armv7h chroot mounted
  on my x86_64 desktop by putting qemu-arm-static into /usr/bin/ of the
  chroot file system and I get a core dump.

  locale-gen
  Generating locales...
    en_US.UTF-8...localedef: ../sysdeps/unix/sysv/linux/spawni.c:360: __spawnix: Assertion `ec >= 0' failed.
  qemu: uncaught target signal 6 (Aborted) - core dumped
  /usr/bin/locale-gen: line 41:34 Aborted (core dumped) localedef -i $input -c -f $charset -A /usr/share/locale/locale.alias $locale

  I've done this same thing successfully for years, but this breakage
  has appeared some time in the last 3 or so months. Possibly with the
  update to qemu version 2.8.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1673976/+subscriptions



Re: [Qemu-devel] [PATCH 13/19] hw/misc/mps2-fpgaio: FPGA control block for MPS2 AN505

2018-02-27 Thread Richard Henderson
On 02/20/2018 10:03 AM, Peter Maydell wrote:
> The MPS2 AN505 FPGA image includes a "FPGA control block"
> which is a small set of registers handling LEDs, buttons
> and some counters.
> 
> Signed-off-by: Peter Maydell 
> ---
>  hw/misc/Makefile.objs   |   1 +
>  include/hw/misc/mps2-fpgaio.h   |  43 ++
>  hw/misc/mps2-fpgaio.c   | 176 
> 
>  default-configs/arm-softmmu.mak |   1 +
>  hw/misc/trace-events|   6 ++
>  5 files changed, 227 insertions(+)
>  create mode 100644 include/hw/misc/mps2-fpgaio.h
>  create mode 100644 hw/misc/mps2-fpgaio.c

Reviewed-by: Richard Henderson 


r~




Re: [Qemu-devel] [RFC v4 00/21] blockjobs: add explicit job management

2018-02-27 Thread John Snow


On 02/24/2018 09:31 AM, no-re...@patchew.org wrote:
> Hi,
> 
> This series seems to have some coding style problems. See output below for
> more information:
> 
> Type: series
> Message-id: 20180223235142.21501-1-js...@redhat.com
> Subject: [Qemu-devel] [RFC v4 00/21] blockjobs: add explicit job management
> 
> === TEST SCRIPT BEGIN ===
> #!/bin/bash
> 
> BASE=base
> n=1
> total=$(git log --oneline $BASE.. | wc -l)
> failed=0
> 
> git config --local diff.renamelimit 0
> git config --local diff.renames True
> git config --local diff.algorithm histogram
> 
> commits="$(git log --format=%H --reverse $BASE..)"
> for c in $commits; do
> echo "Checking PATCH $n/$total: $(git log -n 1 --format=%s $c)..."
> if ! git show $c --format=email | ./scripts/checkpatch.pl --mailback -; then
> failed=1
> echo
> fi
> n=$((n+1))
> done
> 
> exit $failed
> === TEST SCRIPT END ===
> 
> Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
> Switched to a new branch 'test'
> 230e578fa2 blockjobs: add manual_mgmt option to transactions
> f278a5155a iotests: test manual job dismissal
> 8e473ab4a8 blockjobs: Expose manual property
> 7ad2d0164f blockjobs: add block-job-finalize
> 3857c91315 blockjobs: add PENDING status and event
> 18eb8a4130 blockjobs: add waiting status
> daf9613432 blockjobs: add prepare callback
> 78be501212 blockjobs: add block_job_txn_apply function
> 4b659abe69 blockjobs: add commit, abort, clean helpers
> 4023046d76 blockjobs: ensure abort is called for cancelled jobs
> e9300b122e blockjobs: add block_job_dismiss
> 4fc045eae4 blockjobs: add NULL state
> e6aa454753 blockjobs: add CONCLUDED state
> 78efa2f937 blockjobs: add ABORTING state
> 057ad2472f blockjobs: add block_job_verb permission table
> c62c5b75a3 iotests: add pause_wait
> 4aadb9c38c blockjobs: add state transition table
> afc594c4b0 blockjobs: add status enum
> 434d3811fa blockjobs: add manual property
> fc3e3eebc9 blockjobs: model single jobs as transactions
> 8d32662676 blockjobs: fix set-speed kick
> 
> === OUTPUT BEGIN ===
> Checking PATCH 1/21: blockjobs: fix set-speed kick...
> Checking PATCH 2/21: blockjobs: model single jobs as transactions...
> Checking PATCH 3/21: blockjobs: add manual property...
> Checking PATCH 4/21: blockjobs: add status enum...
> Checking PATCH 5/21: blockjobs: add state transition table...
> ERROR: space prohibited before open square bracket '['
> #81: FILE: blockjob.c:48:
> +/* U: */ [BLOCK_JOB_STATUS_UNDEFINED] = {0, 1, 0, 0, 0, 0},
> 
> ERROR: space prohibited before open square bracket '['
> #82: FILE: blockjob.c:49:
> +/* C: */ [BLOCK_JOB_STATUS_CREATED]   = {0, 0, 1, 0, 0, 0},
> 
> ERROR: space prohibited before open square bracket '['
> #83: FILE: blockjob.c:50:
> +/* R: */ [BLOCK_JOB_STATUS_RUNNING]   = {0, 0, 0, 1, 1, 0},
> 
> ERROR: space prohibited before open square bracket '['
> #84: FILE: blockjob.c:51:
> +/* P: */ [BLOCK_JOB_STATUS_PAUSED]= {0, 0, 1, 0, 0, 0},
> 
> ERROR: space prohibited before open square bracket '['
> #85: FILE: blockjob.c:52:
> +/* Y: */ [BLOCK_JOB_STATUS_READY] = {0, 0, 0, 0, 0, 1},
> 
> ERROR: space prohibited before open square bracket '['
> #86: FILE: blockjob.c:53:
> +/* S: */ [BLOCK_JOB_STATUS_STANDBY]   = {0, 0, 0, 0, 1, 0},
> 
> total: 6 errors, 0 warnings, 90 lines checked
> 
> Your patch has style problems, please review.  If any of these errors
> are false positives report them to the maintainer, see
> CHECKPATCH in MAINTAINERS.
> 

-EWONTFIX unless someone screams louder than me.
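
For readers without the patch at hand: the lines checkpatch objects to are rows of a designated-initializer table, where the leading comment is what trips the "space before '['" rule. A minimal sketch of such a table follows. It assumes the columns follow the same order as the enum (U C R P Y S); the shortened JOB_* names and job_transition_allowed() are hypothetical and not taken from the series.

```c
#include <stdbool.h>

/* Hypothetical names, shortened from BLOCK_JOB_STATUS_* in the quoted patch. */
enum job_status {
    JOB_UNDEFINED, JOB_CREATED, JOB_RUNNING,
    JOB_PAUSED, JOB_READY, JOB_STANDBY,
    JOB_STATUS__MAX
};

/*
 * Rows are the current state; columns are assumed to be the requested
 * next state, in enum order (U C R P Y S).
 */
static const bool job_transitions[JOB_STATUS__MAX][JOB_STATUS__MAX] = {
    /* U: */ [JOB_UNDEFINED] = {0, 1, 0, 0, 0, 0},
    /* C: */ [JOB_CREATED]   = {0, 0, 1, 0, 0, 0},
    /* R: */ [JOB_RUNNING]   = {0, 0, 0, 1, 1, 0},
    /* P: */ [JOB_PAUSED]    = {0, 0, 1, 0, 0, 0},
    /* Y: */ [JOB_READY]     = {0, 0, 0, 0, 0, 1},
    /* S: */ [JOB_STANDBY]   = {0, 0, 0, 0, 1, 0},
};

/* Look up whether moving from one status to another is permitted. */
static bool job_transition_allowed(enum job_status from, enum job_status to)
{
    return job_transitions[from][to];
}
```

The row labels are only there for readability, which is the argument for keeping them despite the style complaint.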

> Checking PATCH 6/21: iotests: add pause_wait...
> Checking PATCH 7/21: blockjobs: add block_job_verb permission table...
> Checking PATCH 8/21: blockjobs: add ABORTING state...
> ERROR: space prohibited before open square bracket '['
> #61: FILE: blockjob.c:48:
> +/* U: */ [BLOCK_JOB_STATUS_UNDEFINED] = {0, 1, 0, 0, 0, 0, 0},
> 
> ERROR: space prohibited before open square bracket '['
> #62: FILE: blockjob.c:49:
> +/* C: */ [BLOCK_JOB_STATUS_CREATED]   = {0, 0, 1, 0, 0, 0, 0},
> 
> ERROR: space prohibited before open square bracket '['
> #63: FILE: blockjob.c:50:
> +/* R: */ [BLOCK_JOB_STATUS_RUNNING]   = {0, 0, 0, 1, 1, 0, 1},
> 
> ERROR: space prohibited before open square bracket '['
> #64: FILE: blockjob.c:51:
> +/* P: */ [BLOCK_JOB_STATUS_PAUSED]= {0, 0, 1, 0, 0, 0, 0},
> 
> ERROR: space prohibited before open square bracket '['
> #65: FILE: blockjob.c:52:
> +/* Y: */ [BLOCK_JOB_STATUS_READY] = {0, 0, 0, 0, 0, 1, 1},
> 
> ERROR: space prohibited before open square bracket '['
> #66: FILE: blockjob.c:53:
> +/* S: */ [BLOCK_JOB_STATUS_STANDBY]   = {0, 0, 0, 0, 1, 0, 0},
> 
> ERROR: space prohibited before open square bracket '['
> #67: FILE: blockjob.c:54:
> +/* X: */ [BLOCK_JOB_STATUS_ABORTING]  = {0, 0, 0, 0, 0, 0, 0},
> 
> total: 7 errors, 0 warnings, 62 lines checked

-EWONTFIX

> 
> Your patch has style problems, please review.  If any of these errors
> are false positives report 

Re: [Qemu-devel] [PATCH 12/19] hw/core/split-irq: Device that splits IRQ lines

2018-02-27 Thread Richard Henderson
On 02/20/2018 10:03 AM, Peter Maydell wrote:
> In some board or SoC models it is necessary to split a qemu_irq line
> so that one input can feed multiple outputs.  We currently have
> qemu_irq_split() for this, but that has several deficiencies:
>  * it can only handle splitting a line into two
>  * it unavoidably leaks memory, so it can't be used
>in a device that can be deleted
> 
> Implement a qdev device that encapsulates splitting of IRQs, with a
> configurable number of outputs.  (This is in some ways the inverse of
> the TYPE_OR_IRQ device.)
> 
> Signed-off-by: Peter Maydell 
> ---
>  hw/core/Makefile.objs   |  1 +
>  include/hw/core/split-irq.h | 57 +
>  include/hw/irq.h|  4 +-
>  hw/core/split-irq.c | 89 +
>  4 files changed, 150 insertions(+), 1 deletion(-)
>  create mode 100644 include/hw/core/split-irq.h
>  create mode 100644 hw/core/split-irq.c

Reviewed-by: Richard Henderson 


r~
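
As a rough illustration of the idea in the commit message (one input feeding a configurable number of outputs, with no per-split allocation that could leak), here is a standalone sketch. It does not use the qdev or qemu_irq APIs; IrqLine, IrqSplitter, MAX_SPLIT_LINES and the function names are all hypothetical stand-ins.

```c
#include <stddef.h>

#define MAX_SPLIT_LINES 16  /* hypothetical upper bound for this sketch */

/* Hypothetical stand-in for an IRQ line: a handler plus its opaque pointer. */
typedef struct IrqLine {
    void (*handler)(void *opaque, int level);
    void *opaque;
} IrqLine;

/* The outputs live inside the splitter, so nothing is allocated per split. */
typedef struct IrqSplitter {
    IrqLine out[MAX_SPLIT_LINES];
    size_t num_lines;   /* configurable number of outputs */
} IrqSplitter;

/* Input handler: forward the new level to every connected output. */
static void irq_splitter_set(IrqSplitter *s, int level)
{
    for (size_t i = 0; i < s->num_lines; i++) {
        if (s->out[i].handler) {
            s->out[i].handler(s->out[i].opaque, level);
        }
    }
}

/* Example output handler that records the last level it saw. */
static void record_level(void *opaque, int level)
{
    *(int *)opaque = level;
}
```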



Re: [Qemu-devel] [RFC v4 00/21] blockjobs: add explicit job management

2018-02-27 Thread John Snow


On 02/25/2018 06:25 PM, no-re...@patchew.org wrote:
>   CC  scsi/qemu-pr-helper.o
> /var/tmp/patchew-tester-tmp-9ukz0kme/src/blockjob.c: In function ‘block_job_txn_apply.isra.8’:
> /var/tmp/patchew-tester-tmp-9ukz0kme/src/blockjob.c:511:5: error: ‘rc’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
>  return rc;
>  ^
>   CC  qemu-bridge-helper.o
>   CC  blockdev.o
>   CC  blockdev-nbd.o
>   CC  bootdevice.o
>   CC  iothread.o
>   CC  qdev-monitor.o
>   CC  device-hotplug.o
>   CC  os-posix.o
>   CC  bt-host.o
>   CC  bt-vhci.o
>   CC  dma-helpers.o
>   CC  vl.o
>   CC  tpm.o
>   CC  device_tree.o
>   CC  qmp-marshal.o
>   CC  qmp.o
>   CC  hmp.o
>   CC  cpus-common.o
>   CC  audio/audio.o
> cc1: all warnings being treated as errors
> make: *** [blockjob.o] Error 1
> make: *** Waiting for unfinished jobs
> === OUTPUT END ===
> 
> Test command exited with code: 2
> 
> 
> ---
> Email generated automatically by Patchew [http://patchew.org/].
> Please send your feedback to patchew-de...@freelists.org

No idea why this would only trigger on ppc/be, but I'll look into it.



Re: [Qemu-devel] [RFC v4 17/21] blockjobs: add PENDING status and event

2018-02-27 Thread John Snow


On 02/27/2018 03:05 PM, Eric Blake wrote:
> On 02/23/2018 05:51 PM, John Snow wrote:
>> For jobs utilizing the new manual workflow, we intend to prohibit
>> them from modifying the block graph until the management layer provides
>> an explicit ACK via block-job-finalize to move the process forward.
>>
>> To distinguish this runstate from "ready" or "waiting," we add a new
>> "pending" event.
>>
>> For now, the transition from PENDING to CONCLUDED/ABORTING is automatic,
>> but a future commit will add the explicit block-job-finalize step.
>>
>> Transitions:
>> Waiting -> Pending:   Normal transition.
>> Pending -> Concluded: Normal transition.
>> Pending -> Aborting:  Late transactional failures and cancellations.
>>
>> Removed Transitions:
>> Waiting -> Concluded: Jobs must go to PENDING first.
>>
>> Verbs:
>> Cancel: Can be applied to a pending job.
>>
> 
>> +##
>> +# @BLOCK_JOB_PENDING:
>> +#
>> +# Emitted when a block job is awaiting explicit authorization to finalize graph
>> +# changes via @block-job-finalize. If this job is part of a transaction, it will
>> +# not emit this event until the transaction has converged first.
> 
> Same question of whether this new event is always emitted (and older
> clients presumably ignore it), or only emitted for clients that
> requested new-style state management.
> 

Old style jobs will skip the broadcast of the event, but will still
transition to the state. However, since transition is synchronous, you
likely won't see this state show up in a query for old style jobs.

That was the intent, anyway.

I wanted to be nonintrusive, and felt that this event was likely not
useful in any way unless we were using the new state management scheme.
In the old style, this event will be fully synchronous with COMPLETED or
CANCELLED, for instance.

--js
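
To make the old-style versus new-style behaviour described above concrete, here is a small standalone sketch. It models the intended end state of the series rather than the code in this patch, and every name in it (Job, job_enter_pending, job_finish_waiting, the pending_events counter standing in for the QMP broadcast) is hypothetical.

```c
#include <stdbool.h>

enum job_state { JOB_WAITING, JOB_PENDING, JOB_CONCLUDED };

typedef struct Job {
    enum job_state state;
    bool manual;          /* true for the new-style management workflow */
    int pending_events;   /* stand-in for broadcasting BLOCK_JOB_PENDING */
} Job;

/* Every job enters PENDING; only manual jobs announce it. */
static void job_enter_pending(Job *job)
{
    job->state = JOB_PENDING;
    if (job->manual) {
        job->pending_events++;
    }
}

/*
 * Old-style jobs move on within the same call, so a query issued
 * between calls never observes PENDING. Manual jobs stay in PENDING
 * until something outside this sketch finalizes them.
 */
static void job_finish_waiting(Job *job)
{
    job_enter_pending(job);
    if (!job->manual) {
        job->state = JOB_CONCLUDED;
    }
}
```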



Re: [Qemu-devel] [PATCH 11/19] qdev: Add new qdev_init_gpio_in_named_with_opaque()

2018-02-27 Thread Richard Henderson
On 02/20/2018 10:03 AM, Peter Maydell wrote:
> The function qdev_init_gpio_in_named() passes the DeviceState pointer
> as the opaque data pointer for the irq handler function.  Usually
> this is what you want, but in some cases it would be helpful to use
> some other data pointer.
> 
> Add a new function qdev_init_gpio_in_named_with_opaque() which allows
> the caller to specify the data pointer they want.
> 
> Signed-off-by: Peter Maydell 
> ---
>  include/hw/qdev-core.h | 30 --
>  hw/core/qdev.c |  8 +---
>  2 files changed, 33 insertions(+), 5 deletions(-)

Reviewed-by: Richard Henderson 


r~
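
The pattern described in the commit message can be sketched outside of qdev as follows. This is one plausible arrangement, not the actual QEMU code: Device, dev_init_gpio_in() and dev_init_gpio_in_with_opaque() are hypothetical names, and expressing the existing function as a wrapper around the new one is an assumption.

```c
typedef void (*irq_handler)(void *opaque, int n, int level);

typedef struct Device {
    irq_handler handler;
    void *handler_opaque;   /* passed back to the handler on every call */
} Device;

/* General form: the caller chooses the opaque pointer. */
static void dev_init_gpio_in_with_opaque(Device *dev, irq_handler h,
                                         void *opaque)
{
    dev->handler = h;
    dev->handler_opaque = opaque;
}

/* Existing behaviour as a special case: the opaque is the device itself. */
static void dev_init_gpio_in(Device *dev, irq_handler h)
{
    dev_init_gpio_in_with_opaque(dev, h, dev);
}

/* Example handler that stores the level through whatever opaque it is given. */
static void store_level(void *opaque, int n, int level)
{
    (void)n;
    *(int *)opaque = level;
}
```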




Re: [Qemu-devel] [PATCH 09/19] hw/misc/unimp: Move struct to header file

2018-02-27 Thread Richard Henderson
On 02/20/2018 10:03 AM, Peter Maydell wrote:
> Move the definition of the struct for the unimplemented-device
> from unimp.c to unimp.h, so that users can embed the struct
> in their own device structs if they prefer.
> 
> Signed-off-by: Peter Maydell 
> ---
>  include/hw/misc/unimp.h | 10 ++
>  hw/misc/unimp.c | 10 --
>  2 files changed, 10 insertions(+), 10 deletions(-)

Reviewed-by: Richard Henderson 


r~




Re: [Qemu-devel] [PATCH 10/19] include/hw/or-irq.h: Add missing include guard

2018-02-27 Thread Richard Henderson
On 02/20/2018 10:03 AM, Peter Maydell wrote:
> The or-irq.h header file is missing the customary guard against
> multiple inclusion, which means compilation fails if it gets
> included twice. Fix the omission.
> 
> Signed-off-by: Peter Maydell 
> ---
>  include/hw/or-irq.h | 5 +
>  1 file changed, 5 insertions(+)

Reviewed-by: Richard Henderson 


r~
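
For anyone unfamiliar with why the missing guard breaks the build: a header that defines a struct cannot be processed twice in one translation unit unless its body is skipped the second time. The sketch below simulates a double inclusion inside a single file. The header name example_irq.h, the EXAMPLE_IRQ_H macro and ExampleIrqState are hypothetical and are not the contents of or-irq.h.

```c
/* First "inclusion" of a hypothetical header, example_irq.h. */
#ifndef EXAMPLE_IRQ_H
#define EXAMPLE_IRQ_H

typedef struct ExampleIrqState {
    int level;
} ExampleIrqState;

#endif /* EXAMPLE_IRQ_H */

/*
 * Second "inclusion": EXAMPLE_IRQ_H is already defined, so the
 * preprocessor skips the body and the struct is not defined again.
 * Without the guard this second copy would be a redefinition error.
 */
#ifndef EXAMPLE_IRQ_H
#define EXAMPLE_IRQ_H

typedef struct ExampleIrqState {
    int level;
} ExampleIrqState;

#endif /* EXAMPLE_IRQ_H */

/* Ordinary code that uses the type declared by the header. */
static int example_irq_level(const ExampleIrqState *s)
{
    return s->level;
}
```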




Re: [Qemu-devel] [RFC v4 16/21] blockjobs: add waiting status

2018-02-27 Thread John Snow


On 02/27/2018 03:00 PM, Eric Blake wrote:
> On 02/23/2018 05:51 PM, John Snow wrote:
>> For jobs that are stuck waiting on others in a transaction, it would
>> be nice to know that they are no longer "running" in that sense, but
>> instead are waiting on other jobs in the transaction.
>>
>> Jobs that are "waiting" in this sense cannot be meaningfully altered
>> any longer as they have left their running loop. The only meaningful
>> user verb for jobs in this state is "cancel," which will cancel the
>> whole transaction, too.
>>
>> Transitions:
>> Running -> Waiting:   Normal transition.
>> Ready   -> Waiting:   Normal transition.
>> Waiting -> Aborting:  Transactional cancellation.
>> Waiting -> Concluded: Normal transition.
>>
>> Removed Transitions:
>> Running -> Concluded: Jobs must go to WAITING first.
>> Ready   -> Concluded: Jobs must go to WAITING fisrt.
> 
> s/fisrt/first/
> 
>> +++ b/blockjob.c
> 
>> @@ -3934,6 +3938,29 @@
>>   'offset': 'int',
>>   'speed' : 'int' } }
>>   +##
>> +# @BLOCK_JOB_WAITING:
>> +#
>> +# Emitted when a block job that is part of a transction has stopped work and is
> 
> s/transction/transaction/
> 
>> +# waiting for other jobs in the transaction to reach the same state.
> 
> Is this event emitted only for 'new-style' transactions (old drivers
> will never see it, because they don't request new style), or always (old
> drivers will see, but presumably ignore, it)?
> 

...! Actually, I meant to remove the WAITING *event* entirely, this is a
mistake.

It's purely an informational state that clients likely cannot make any
real use of, because regardless of old or new style, jobs will
transition automatically to "PENDING."

That said, old or new, the state is visible from query-block-jobs.

--js



Re: [Qemu-devel] [RFC v4 12/21] blockjobs: ensure abort is called for cancelled jobs

2018-02-27 Thread John Snow


On 02/27/2018 02:49 PM, Eric Blake wrote:
> On 02/23/2018 05:51 PM, John Snow wrote:
>> Presently, even if a job is canceled post-completion as a result of
>> a failing peer in a transaction, it will still call .commit because
>> nothing has updated or changed its return code.
>>
>> The reason why this does not cause problems currently is because
>> backup's implementation of .commit checks for cancellation itself.
>>
>> I'd like to simplify this contract:
>>
>> (1) Abort is called if the job/transaction fails
>> (2) Commit is called if the job/transaction succeeds
>>
>> To this end: A job's return code, if 0, will be forcibly set as
>> -ECANCELED if that job has already concluded. Remove the now
>> redundant check in the backup job implementation.
>>
>> We need to check for cancellation in both block_job_completed
>> AND block_job_completed_single, because jobs may be cancelled between
>> those two calls; for instance in transactions.
>>
>> The check in block_job_completed could be removed, but there's no
>> point in starting to attempt to succeed a transaction that we know
>> in advance will fail.
>>
>> This does NOT affect mirror jobs that are "canceled" during their
>> synchronous phase. The mirror job itself forcibly sets the canceled
>> property to false prior to ceding control, so such cases will invoke
>> the "commit" callback.
>>
>> Signed-off-by: John Snow 
>> ---
>>   block/backup.c |  2 +-
>>   block/trace-events |  1 +
>>   blockjob.c | 19 +++
>>   3 files changed, 17 insertions(+), 5 deletions(-)
> 
> More lines of code, but the contract does seem simpler and useful for
> the later patches.
> 

Unfortunately yes, but it would be less overall if more than Backup made
use of commit/abort, which I anticipate in the future when I try to
start switching jobs over to using the .prepare callback.

It was just genuinely shocking to me that we'd call commit(), but backup
secretly knew we wanted abort. That type of logic belongs in the core
dispatch layer.
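
A minimal sketch of that simplified contract, assuming my reading that a zero return code is overridden when the job has been cancelled. The Job struct, the counters standing in for the .commit and .abort callbacks, and the function names are hypothetical and do not mirror blockjob.c.

```c
#include <errno.h>
#include <stdbool.h>

typedef struct Job {
    int ret;          /* 0 on success, negative errno on failure */
    bool cancelled;
    int commits;      /* incremented where .commit would be called */
    int aborts;       /* incremented where .abort would be called */
} Job;

/* A cancelled job must not report success. */
static void job_update_rc(Job *job)
{
    if (job->ret == 0 && job->cancelled) {
        job->ret = -ECANCELED;
    }
}

/* Core dispatch: abort on any failure, commit only on success. */
static void job_completed(Job *job)
{
    job_update_rc(job);
    if (job->ret) {
        job->aborts++;
    } else {
        job->commits++;
    }
}
```

With the decision made in one place, an individual job's commit handler no longer needs its own cancellation check.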

> Reviewed-by: Eric Blake 
> 

Thanks for the reviews!



Re: [Qemu-devel] [PATCH 08/19] target/arm: Add Cortex-M33

2018-02-27 Thread Richard Henderson
On 02/20/2018 10:03 AM, Peter Maydell wrote:
> Add a Cortex-M33 definition. The M33 is an M profile CPU
> which implements the ARM v8M architecture, including the
> M profile Security Extension.
> 
> Signed-off-by: Peter Maydell 
> ---
>  target/arm/cpu.c | 31 +++
>  1 file changed, 31 insertions(+)

Reviewed-by: Richard Henderson 


r~




Re: [Qemu-devel] [RFC v4 15/21] blockjobs: add prepare callback

2018-02-27 Thread John Snow


On 02/27/2018 02:56 PM, Eric Blake wrote:
> On 02/23/2018 05:51 PM, John Snow wrote:
>> Some jobs upon finalization may need to perform some work that can
>> still fail. If these jobs are part of a transaction, it's important
>> that these callbacks fail the entire transaction.
>>
>> We allow for a new callback in addition to commit/abort/clean that
>> allows us the opportunity to have fairly late-breaking failures
>> in the transactional process.
>>
>> The expected flow is:
>>
>> - All jobs in a transaction converge to the WAITING state
>>    (added in a forthcoming commit)
>> - All jobs prepare to call either commit/abort
>> - If any job fails, is canceled, or fails preparation, all jobs
>>    call their .abort callback.
>> - All jobs enter the PENDING state, awaiting manual intervention
>>    (also added in a forthcoming commit)
>> - block-job-finalize is issued by the user/management layer
>> - All jobs call their commit callbacks.
>>
>> Signed-off-by: John Snow 
>> ---
>>   blockjob.c   | 34 +++---
>>   include/block/blockjob_int.h | 10 ++
>>   2 files changed, 41 insertions(+), 3 deletions(-)
>>
> 
>> @@ -467,17 +480,22 @@ static void block_job_cancel_async(BlockJob *job)
>>   job->cancelled = true;
>>   }
>>   -static void block_job_txn_apply(BlockJobTxn *txn, void fn(BlockJob *))
>> +static int block_job_txn_apply(BlockJobTxn *txn, int fn(BlockJob *))
>>   {
>>   AioContext *ctx;
>>   BlockJob *job, *next;
>> +    int rc;
>>     QLIST_FOREACH_SAFE(job, &txn->jobs, txn_list, next) {
>>   ctx = blk_get_aio_context(job->blk);
>>   aio_context_acquire(ctx);
>> -    fn(job);
>> +    rc = fn(job);
>>   aio_context_release(ctx);
>> +    if (rc) {
>> +    break;
>> +    }
> 
> This short-circuits the application of the function to the rest of the
> group.  Is that ever going to be a problem?
> 

With what I've written, I don't think so -- but I can't guarantee
someone won't misunderstand the semantics of it and it will become a
problem. It is a potentially dangerous function in that way.

--js
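
To spell out the semantics being discussed, here is a standalone sketch of an apply helper that stops at the first failure. It uses a plain array instead of QLIST and omits the AioContext handling; Job, visit() and txn_apply() are hypothetical names. Initializing rc also gives a defined result when the list is empty, which is the situation the -Wmaybe-uninitialized report elsewhere in this thread is about.

```c
#include <stddef.h>

typedef struct Job {
    int fail_with;   /* value the callback returns for this job */
    int visited;     /* set once the callback has run on this job */
} Job;

/* Example callback: mark the job and report its preset result. */
static int visit(Job *job)
{
    job->visited = 1;
    return job->fail_with;
}

/* Apply fn to each job in order, stopping at the first nonzero result. */
static int txn_apply(Job *jobs, size_t n, int (*fn)(Job *))
{
    int rc = 0;   /* defined result even if the loop body never runs */

    for (size_t i = 0; i < n; i++) {
        rc = fn(&jobs[i]);
        if (rc) {
            break;   /* later jobs are NOT visited */
        }
    }
    return rc;
}
```

A caller that needs fn to reach every job regardless of failures would want a different helper, which is the hazard mentioned above.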



Re: [Qemu-devel] [RFC v4 20/21] iotests: test manual job dismissal

2018-02-27 Thread John Snow


On 02/27/2018 03:21 PM, Eric Blake wrote:
> On 02/23/2018 05:51 PM, John Snow wrote:
>> Signed-off-by: John Snow 
>> ---
>>   tests/qemu-iotests/056 | 195 +
>>   tests/qemu-iotests/056.out |   4 +-
>>   2 files changed, 197 insertions(+), 2 deletions(-)
>>
> 
> I'm not sure if this covers everything in the series, but it looks like

It definitely doesn't!

> a reasonable expansion and hits a lot of the highlights.  At any rate,
> it's always better to add tests, and the test passing is a good bet that
> the new code will be harder to regress.
> 

More to follow, but I was afraid of wasting time if this series didn't
look on the whole serviceable.

I'll probably focus efforts on expanding blockjob-txn and blockjob unit
tests.

> Reviewed-by: Eric Blake 
> 



Re: [Qemu-devel] [RFC v4 19/21] blockjobs: Expose manual property

2018-02-27 Thread John Snow


On 02/27/2018 03:16 PM, Eric Blake wrote:
> On 02/23/2018 05:51 PM, John Snow wrote:
>> Expose the "manual" property via QAPI for the backup-related jobs.
>> As of this commit, this allows the management API to request the
>> "concluded" and "dismiss" semantics for backup jobs.
>>
>> Signed-off-by: John Snow 
>> ---
>>   blockdev.c   | 19 ---
>>   qapi/block-core.json | 32 ++--
>>   2 files changed, 42 insertions(+), 9 deletions(-)
>>
> 
>> +++ b/qapi/block-core.json
>> @@ -1177,6 +1177,16 @@
>>   # @job-id: identifier for the newly-created block job. If
>>   #  omitted, the device name will be used. (Since 2.7)
>>   #
>> +# @manual: True to use an expanded, more explicit job control flow.
>> +#  Jobs may transition from a running state to a pending state,
>> +#  where they must be instructed to complete manually via
>> +#  block-job-finalize.
>> +#  Jobs belonging to a transaction must either all or all not use this
>> +#  setting. Once a transaction reaches a pending state, issuing the
>> +#  finalize command to any one job in the transaction is sufficient
>> +#  to finalize the entire transaction.
> 
> The previous commit message talked about mixed-manual transactions, but
> this seems to imply it is not possible.  I'm fine if we don't support
> mixed-manual transactions, but wonder if it means any changes to the
> series.
> 
> Otherwise looks reasonable from the UI point of view.
> 

Refactor hell.

The intent (and my belief) is that as of right now you CAN mix them. In
earlier drafts, it was not always the case.



Re: [Qemu-devel] [PATCH v3 3/9] CLI: add -preconfig option

2018-02-27 Thread Eric Blake

On 02/16/2018 06:37 AM, Igor Mammedov wrote:

> Option allows to pause QEMU at new RUN_STATE_PRECONFIG time,
> which would allow to configure QEMU from QMP before machine
> jumps into board initialization code machine_run_board_init().


Grammar suggestion:

This option allows pausing QEMU in the new RUN_STATE_PRECONFIG state, 
allowing the configuration of QEMU from QMP before the machine jumps 
into board initialization code of machine_run_board_init().




> Intent is to allow management to query machine state and
> additionally configure it using previous query results
> within one QEMU instance (i.e. eliminate need to start QEMU
> twice, 1st to query board specific parameters and 2nd for
> actual VM start using query result for additional
> parameters).
>
> Initially it's planned to be used for configuring numa
> topology depending on cpu layout.


It may be worth mentioning in the commit message how this differs from 
-S, and what the QMP client must do to get the guest started in this 
mode to enter the normal lifecycle that it used to have when using -S.




> Signed-off-by: Igor Mammedov 
> ---
>   include/sysemu/sysemu.h |  1 +
>   qapi/run-state.json |  3 ++-
>   qemu-options.hx | 11 +++
>   qmp.c   |  5 +
>   vl.c| 35 ++-
>   5 files changed, 53 insertions(+), 2 deletions(-)




> +++ b/qapi/run-state.json
> @@ -49,12 +49,13 @@
>   # @colo: guest is paused to save/restore VM state under colo checkpoint,
>   #VM can not get into this state unless colo capability is enabled
>   #for migration. (since 2.8)
> +# @preconfig: QEMU is paused before machine is created.


Needs a '(since 2.12)' tag.  Probably also be worth mentioning that this 
state is only visible for clients that pass the new CLI option.



> +++ b/qemu-options.hx
> @@ -3283,6 +3283,17 @@ STEXI
>   Run the emulation in single step mode.
>   ETEXI
>   
> +DEF("preconfig", 0, QEMU_OPTION_preconfig, \
> +"-preconfig  pause QEMU before machine is initialized\n",
> +QEMU_ARCH_ALL)
> +STEXI
> +@item -preconfig
> +@findex -preconfig
> +Pause QEMU for interactive configuration before machine is created,
> +which allows to query and configure properties affecting machine
> +initialization. Use QMP command 'cont' to exit paused state.


Pause QEMU for interactive configuration before the machine is created, 
which allows querying and configuring properties that will affect 
machine initialization.  Use the QMP command 'cont' to exit the 
preconfig state.


Hmm - can you also transition from preconfig to the normal paused state 
via 'stop', which you would do to emit other commands that you used to 
issue between the older 'qemu -S' and the 'cont'?


--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



Re: [Qemu-devel] [PATCH v3 0/9] enable numa configuration before machine_init() from QMP

2018-02-27 Thread Eric Blake

On 02/27/2018 10:36 AM, Igor Mammedov wrote:

> On Fri, 16 Feb 2018 13:37:12 +0100
> Igor Mammedov  wrote:
>
> Eric,
>
> Adding you to CC list (git send-mail somehow haven't noticed you in cover 
> letter).


Actually, I did get CC'd on the original cover letter, but you are 
reading the copy that hit the list, which means we are suffering from 
mailman's bug that it rewrites emails sent through the list to drop cc 
of readers that requested no duplicates (I really detest that mailman 2 
corrupts cc's as a side effect of what is otherwise a potentially useful 
knob, and can only hope that mailman 3 has split its behavior into two 
separate knobs).




> Could you please look at QAPI/QMP parts of this series.


Yes, starting a read-through now.

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



Re: [Qemu-devel] [PATCH 07/19] armv7m: Forward init-svtor property to CPU object

2018-02-27 Thread Richard Henderson
On 02/20/2018 10:03 AM, Peter Maydell wrote:
> Create an "init-svtor" property on the armv7m container
> object which we can forward to the CPU object.
> 
> Signed-off-by: Peter Maydell 
> ---
>  include/hw/arm/armv7m.h | 2 ++
>  hw/arm/armv7m.c | 6 ++
>  2 files changed, 8 insertions(+)
Reviewed-by: Richard Henderson 


r~




Re: [Qemu-devel] [PATCH v3 15/29] vhost+postcopy: Send address back to qemu

2018-02-27 Thread Michael S. Tsirkin
On Tue, Feb 27, 2018 at 07:54:18PM +, Dr. David Alan Gilbert wrote:
> * Michael S. Tsirkin (m...@redhat.com) wrote:
> > On Fri, Feb 16, 2018 at 01:16:11PM +, Dr. David Alan Gilbert (git) 
> > wrote:
> > > From: "Dr. David Alan Gilbert" 
> > > 
> > > We need a better way, but at the moment we need the address of the
> > > mappings sent back to qemu so it can interpret the messages on the
> > > userfaultfd it reads.
> > > 
> > > This is done as a 3 stage set:
> > >QEMU -> client
> > >   set_mem_table
> > > 
> > >mmap stuff, get addresses
> > > 
> > >client -> qemu
> > >here are the addresses
> > > 
> > >qemu -> client
> > >OK - now you can use them
> > > 
> > > That ensures that qemu has registered the new addresses in its
> > > userfault code before the client starts accessing them.
> > > 
> > > Note: We don't ask for the default 'ack' reply since we've got our own.
> > > 
> > > Signed-off-by: Dr. David Alan Gilbert 
> > > ---
> > >  contrib/libvhost-user/libvhost-user.c | 24 -
> > >  docs/interop/vhost-user.txt   |  9 +
> > >  hw/virtio/trace-events|  1 +
> > >  hw/virtio/vhost-user.c| 67 
> > > +--
> > >  4 files changed, 98 insertions(+), 3 deletions(-)
> > > 
> > > diff --git a/contrib/libvhost-user/libvhost-user.c 
> > > b/contrib/libvhost-user/libvhost-user.c
> > > index a18bc74a7c..e02e5d6f46 100644
> > > --- a/contrib/libvhost-user/libvhost-user.c
> > > +++ b/contrib/libvhost-user/libvhost-user.c
> > > @@ -491,10 +491,32 @@ vu_set_mem_table_exec_postcopy(VuDev *dev, 
> > > VhostUserMsg *vmsg)
> > > dev_region->mmap_addr);
> > >  }
> > >  
> > > +/* Return the address to QEMU so that it can translate the ufd
> > > + * fault addresses back.
> > > + */
> > > +msg_region->userspace_addr = (uintptr_t)(mmap_addr +
> > > + dev_region->mmap_offset);
> > >  close(vmsg->fds[i]);
> > >  }
> > >  
> > > -/* TODO: Get address back to QEMU */
> > > +/* Send the message back to qemu with the addresses filled in */
> > > +vmsg->fd_num = 0;
> > > +if (!vu_message_write(dev, dev->sock, vmsg)) {
> > > +vu_panic(dev, "failed to respond to set-mem-table for postcopy");
> > > +return false;
> > > +}
> > > +
> > > +/* Wait for QEMU to confirm that it's registered the handler for the
> > > + * faults.
> > > + */
> > > +if (!vu_message_read(dev, dev->sock, vmsg) ||
> > > +vmsg->size != sizeof(vmsg->payload.u64) ||
> > > +vmsg->payload.u64 != 0) {
> > > +vu_panic(dev, "failed to receive valid ack for postcopy set-mem-table");
> > > +return false;
> > > +}
> > > +
> > > +/* OK, now we can go and register the memory and generate faults */
> > >  for (i = 0; i < dev->nregions; i++) {
> > >  VuDevRegion *dev_region = &dev->regions[i];
> > >  #ifdef UFFDIO_REGISTER
> > > diff --git a/docs/interop/vhost-user.txt b/docs/interop/vhost-user.txt
> > > index bdec9ec0e8..5bbcab2cc4 100644
> > > --- a/docs/interop/vhost-user.txt
> > > +++ b/docs/interop/vhost-user.txt
> > > @@ -454,12 +454,21 @@ Master message types
> > >Id: 5
> > >Equivalent ioctl: VHOST_SET_MEM_TABLE
> > >Master payload: memory regions description
> > > +  Slave payload: (postcopy only) memory regions description
> > >  
> > >    Sets the memory map regions on the slave so it can translate the vring
> > >    addresses. In the ancillary data there is an array of file descriptors
> > >    for each memory mapped region. The size and ordering of the fds matches
> > >    the number and ordering of memory regions.
> > >  
> > > +  When postcopy-listening has been received,
> > 
> > Which message is this?
> 
> VHOST_USER_POSTCOPY_LISTEN
> 
> Do you want me just to change that to, 'When VHOST_USER_POSTCOPY_LISTEN
> has been received' ?

I think it's better this way, yes.

> > > SET_MEM_TABLE replies with
> > > +  the bases of the memory mapped regions to the master.  It must have mmap'd
> > > +  the regions but not yet accessed them and should not yet generate a userfault
> > > +  event. Note NEED_REPLY_MASK is not set in this case.
> > > +  QEMU will then reply back to the list of mappings with an empty
> > > +  VHOST_USER_SET_MEM_TABLE as an acknowledgment; only upon reception of this
> > > +  message may the guest start accessing the memory and generating faults.
> > > +
> > >   * VHOST_USER_SET_LOG_BASE
> > >  
> > >Id: 6
> > 
> > As you say yourself, this is probably the best we can do for now,
> > but it's not ideal. So I think it's a good idea to isolate this
> > behind a separate protocol feature bit. For now it will be required
> > for postcopy, when it's fixed in kernel we can

Re: [Qemu-devel] [RFC v4 21/21] blockjobs: add manual_mgmt option to transactions

2018-02-27 Thread Eric Blake

On 02/23/2018 05:51 PM, John Snow wrote:

> This allows us to easily force the option for all jobs belonging
> to a transaction to ensure consistency with how all those jobs
> will be handled.
>
> This is purely a convenience.
>
> Signed-off-by: John Snow 
> ---



> +++ b/qapi/transaction.json
> @@ -79,7 +79,8 @@
>   ##
>   { 'struct': 'TransactionProperties',
> 'data': {
> -   '*completion-mode': 'ActionCompletionMode'
> +   '*completion-mode': 'ActionCompletionMode',
> +   '*manual-mgmt': 'bool'


Missing QAPI documentation (what you have elsewhere in the C code can 
probably be copied here, though).


The UI aspect makes sense (I can declare one manual at the transaction 
level instead of multiple manual declarations per member level within 
the transaction).


--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



Re: [Qemu-devel] [PATCH v3 00/29] postcopy+vhost-user/shared ram

2018-02-27 Thread Michael S. Tsirkin
On Tue, Feb 27, 2018 at 08:05:25PM +, Dr. David Alan Gilbert wrote:
> * Michael S. Tsirkin (m...@redhat.com) wrote:
> > On Fri, Feb 16, 2018 at 01:15:56PM +, Dr. David Alan Gilbert (git) 
> > wrote:
> > > From: "Dr. David Alan Gilbert" 
> > > 
> > > Hi,
> > >   This is the first non-RFC version of this patch set that
> > > enables postcopy migration with shared memory to a vhost user process.
> > > It's based off current head.
> > > 
> > > I've tested with vhost-user-bridge and a modified dpdk; both very
> > > lightly.
> > > 
> > > Compared to v2 we're now using the just-merged reworks to the vhost
> > > code (suggested by Igor), so that the huge page region merging is now a 
> > > lot simpler
> > > in this series. The handshake between the client and the qemu for the
> > > set-mem-table is now a bit more complex to resolve a previous race where
> > > the client would start sending requests to the qemu prior to the qemu
> > > being ready to accept them.
> > > 
> > > Dave
> > 
> > From vhost-user POV this seems mostly fine to me.
> 
> OK, great - it would be nice to get this merged in the upcoming release
> (Hint: Anyone else please review!)
> 
> > I would like to have dependency of specific messages on the
> > protocol features documented, and the order of messages
> > documented a bit more explicitly.
> 
> Something like the following? (appropriately merged in with the
> individual commits):
> 
> diff --git a/docs/interop/vhost-user.txt b/docs/interop/vhost-user.txt
> index 4bf7d8ef99..7841812766 100644
> --- a/docs/interop/vhost-user.txt
> +++ b/docs/interop/vhost-user.txt
> @@ -461,7 +461,7 @@ Master message types
>for each memory mapped region. The size and ordering of the fds matches
>the number and ordering of memory regions.
>  
> -  When postcopy-listening has been received, SET_MEM_TABLE replies with
> +  When VHOST_USER_POSTCOPY_LISTEN has been received, SET_MEM_TABLE replies with
>    the bases of the memory mapped regions to the master.  It must have mmap'd
>    the regions but not yet accessed them and should not yet generate a userfault
>    event. Note NEED_REPLY_MASK is not set in this case.
> @@ -687,7 +687,8 @@ Master message types
>Master payload: N/A
>Slave payload: userfault fd + u64
>  
> -  Master advises slave that a migration with postcopy enabled is underway,
> +  When VHOST_USER_PROTOCOL_F_PAGEFAULT is supported, the
> +  master advises slave that a migration with postcopy enabled is underway,
>    the slave must open a userfaultfd for later use.
>Note that at this stage the migration is still in precopy mode.
>  
> @@ -696,6 +697,8 @@ Master message types
>Master payload: N/A
>  
>Master advises slave that a transition to postcopy mode has happened.
> +  This is always sent sometime after a VHOST_USER_POSTCOPY_ADVISE, and
> +  thus only when VHOST_USER_PROTOCOL_F_PAGEFAULT is supported.
>  
>   * VHOST_USER_POSTCOPY_END
>Id: 28
> @@ -704,6 +707,8 @@ Master message types
>Master advises that postcopy migration has now completed.  The
>slave must disable the userfaultfd. The response is an acknowledgement
>only.
> +  This message is sent at the end of the migration, after
> +  VHOST_USER_POSTCOPY_LISTEN was previously sent.

And maybe mention VHOST_USER_PROTOCOL_F_PAGEFAULT here too.
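
The ordering rules proposed above (ADVISE only when the feature is negotiated, LISTEN only after ADVISE, END only after LISTEN) could be checked with something like the sketch below. It is only an illustration of the documented ordering: PostcopyTracker, the MSG_* and PHASE_* names and postcopy_msg_allowed() are hypothetical, and the assumption that each message occurs once per migration is mine.

```c
#include <stdbool.h>

/* Hypothetical tags for the three postcopy messages discussed above. */
enum postcopy_msg {
    MSG_POSTCOPY_ADVISE,
    MSG_POSTCOPY_LISTEN,
    MSG_POSTCOPY_END
};

enum postcopy_phase {
    PHASE_IDLE,
    PHASE_ADVISED,
    PHASE_LISTENING,
    PHASE_ENDED
};

typedef struct PostcopyTracker {
    bool pagefault_feature;   /* VHOST_USER_PROTOCOL_F_PAGEFAULT negotiated */
    enum postcopy_phase phase;
} PostcopyTracker;

/* Accept a message only if the feature is on and the order is respected. */
static bool postcopy_msg_allowed(PostcopyTracker *t, enum postcopy_msg msg)
{
    enum postcopy_phase required, next;

    if (!t->pagefault_feature) {
        return false;
    }
    switch (msg) {
    case MSG_POSTCOPY_ADVISE:
        required = PHASE_IDLE;
        next = PHASE_ADVISED;
        break;
    case MSG_POSTCOPY_LISTEN:
        required = PHASE_ADVISED;
        next = PHASE_LISTENING;
        break;
    case MSG_POSTCOPY_END:
        required = PHASE_LISTENING;
        next = PHASE_ENDED;
        break;
    default:
        return false;
    }
    if (t->phase != required) {
        return false;
    }
    t->phase = next;
    return true;
}
```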

>  Slave message types
>  ---
> 
> Dave
> 
> > 
> > 
> > 
> > > Dr. David Alan Gilbert (29):
> > >   migrate: Update ram_block_discard_range for shared
> > >   qemu_ram_block_host_offset
> > >   postcopy: use UFFDIO_ZEROPAGE only when available
> > >   postcopy: Add notifier chain
> > >   postcopy: Add vhost-user flag for postcopy and check it
> > >   vhost-user: Add 'VHOST_USER_POSTCOPY_ADVISE' message
> > >   libvhost-user: Support sending fds back to qemu
> > >   libvhost-user: Open userfaultfd
> > >   postcopy: Allow registering of fd handler
> > >   vhost+postcopy: Register shared ufd with postcopy
> > >   vhost+postcopy: Transmit 'listen' to client
> > >   postcopy+vhost-user: Split set_mem_table for postcopy
> > >   migration/ram: ramblock_recv_bitmap_test_byte_offset
> > >   libvhost-user+postcopy: Register new regions with the ufd
> > >   vhost+postcopy: Send address back to qemu
> > >   vhost+postcopy: Stash RAMBlock and offset
> > >   vhost+postcopy: Send requests to source for shared pages
> > >   vhost+postcopy: Resolve client address
> > >   postcopy: wake shared
> > >   postcopy: postcopy_notify_shared_wake
> > >   vhost+postcopy: Add vhost waker
> > >   vhost+postcopy: Call wakeups
> > >   libvhost-user: mprotect & madvises for postcopy
> > >   vhost-user: Add VHOST_USER_POSTCOPY_END message
> > >   vhost+postcopy: Wire up POSTCOPY_END notify
> > >   vhost: Huge page align and merge
> > >   postcopy: Allow shared memory
> > >   libvhost-user: Claim support for postcopy
> > >   postcopy shared docs
> > > 
> > 

Re: [Qemu-devel] [RFC v4 20/21] iotests: test manual job dismissal

2018-02-27 Thread Eric Blake

On 02/23/2018 05:51 PM, John Snow wrote:

> Signed-off-by: John Snow 
> ---
>   tests/qemu-iotests/056 | 195 +
>   tests/qemu-iotests/056.out |   4 +-
>   2 files changed, 197 insertions(+), 2 deletions(-)



I'm not sure if this covers everything in the series, but it looks like 
a reasonable expansion and hits a lot of the highlights.  At any rate, 
it's always better to add tests, and the test passing is a good bet that 
the new code will be harder to regress.


Reviewed-by: Eric Blake 

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org


