Re: [Qemu-devel] [PATCH v2] tap: close fd conditionally when error occurred

2018-02-01 Thread Jay Zhou

Hi Jason,

On 2018/2/2 11:11, Jason Wang wrote:



On 2018-01-26 11:08, Jay Zhou wrote:

If netdev_add tap,id=net0,...,vhost=on fails in net_init_tap_one(),
the follow-up device_add virtio-net-pci,netdev=net0 will fail
too, printing:

   TUNSETOFFLOAD ioctl() failed: Bad file descriptor
   TUNSETOFFLOAD ioctl() failed: Bad file descriptor

The reason is that the tap fd is closed when an error occurs after
calling net_init_tap_one().

The fd should be closed when net_init_tap_one() fails:
   - if tap_set_sndbuf() failed
   - if tap_set_sndbuf() succeeded but vhost failed to initialize with
 the vhostforce flag on
The fd should not be closed merely because vhost failed to initialize
without the vhostforce flag, so that the follow-up device_add can fall
back to userspace virtio successfully.

Suggested-by: Michael S. Tsirkin 
Suggested-by: Igor Mammedov 
Suggested-by: Jason Wang 
Signed-off-by: Jay Zhou 
---
  net/tap.c | 40 
  1 file changed, 32 insertions(+), 8 deletions(-)

diff --git a/net/tap.c b/net/tap.c
index 979e622..8042c7d 100644
--- a/net/tap.c
+++ b/net/tap.c
@@ -648,12 +648,6 @@ static void net_init_tap_one(const NetdevTapOptions
*tap, NetClientState *peer,
  TAPState *s = net_tap_fd_init(peer, model, name, fd, vnet_hdr);
  int vhostfd;
-tap_set_sndbuf(s->fd, tap, &err);
-if (err) {
-error_propagate(errp, err);
-return;
-}
-
  if (tap->has_fd || tap->has_fds) {
  snprintf(s->nc.info_str, sizeof(s->nc.info_str), "fd=%d", fd);
  } else if (tap->has_helper) {
@@ -781,6 +775,12 @@ int net_init_tap(const Netdev *netdev, const char *name,
  vnet_hdr = tap_probe_vnet_hdr(fd);
+tap_set_sndbuf(fd, tap, &err);
+if (err) {
+error_propagate(errp, err);
+return -1;
+}
+
  net_init_tap_one(tap, peer, "tap", name, NULL,
   script, downscript,
   vhostfdname, vnet_hdr, fd, &err);
@@ -832,6 +832,12 @@ int net_init_tap(const Netdev *netdev, const char *name,
  goto free_fail;
  }
+tap_set_sndbuf(fd, tap, &err);
+if (err) {
+error_propagate(errp, err);
+goto free_fail;
+}
+
  net_init_tap_one(tap, peer, "tap", name, ifname,
   script, downscript,
   tap->has_vhostfds ? vhost_fds[i] : NULL,
@@ -872,12 +878,21 @@ free_fail:
  fcntl(fd, F_SETFL, O_NONBLOCK);
  vnet_hdr = tap_probe_vnet_hdr(fd);
+tap_set_sndbuf(fd, tap, &err);
+if (err) {
+error_propagate(errp, err);
+close(fd);
+return -1;
+}
+
  net_init_tap_one(tap, peer, "bridge", name, ifname,
   script, downscript, vhostfdname,
   vnet_hdr, fd, &err);
  if (err) {
  error_propagate(errp, err);
-close(fd);
+if (tap->has_vhostforce && tap->vhostforce) {
+close(fd);
+}
  return -1;
  }
  } else {
@@ -910,13 +925,22 @@ free_fail:
  }
  }
+tap_set_sndbuf(fd, tap, &err);
+if (err) {
+error_propagate(errp, err);
+close(fd);
+return -1;
+}
+
  net_init_tap_one(tap, peer, "tap", name, ifname,
   i >= 1 ? "no" : script,
   i >= 1 ? "no" : downscript,
   vhostfdname, vnet_hdr, fd, &err);
  if (err) {
  error_propagate(errp, err);
-close(fd);
+if (tap->has_vhostforce && tap->vhostforce) {
+close(fd);
+}
  return -1;
  }
  }


Hi:

I still fail to understand why not just pass the vhostforce flag to
net_init_tap_one(), and let it decide?


I'm a little confused here. As you suggested in version 1:

https://lists.gnu.org/archive/html/qemu-devel/2018-01/msg02933.html

whether or not to close the fd should be left to the caller, so this is
version 2.

If I misunderstood something, please let me know, thanks!

Regards,
Jay



Thanks










Re: [Qemu-devel] [PATCH] spapr: add missing break in h_get_cpu_characteristics()

2018-02-01 Thread Suraj Jitindar Singh
On Thu, 2018-02-01 at 20:47 +0100, Greg Kurz wrote:
> Detected by Coverity (CID 1385702). This fixes the recently added
> hypercall
> to let guests properly apply Spectre and Meltdown workarounds.
> 
> Fixes: c59704b25473 "target/ppc/spapr: Add H-Call
> H_GET_CPU_CHARACTERISTICS"
> Signed-off-by: Greg Kurz 

Reviewed-by: Suraj Jitindar Singh 

> ---
>  hw/ppc/spapr_hcall.c |1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
> index 4d0e6eb0cf1d..596f58378a40 100644
> --- a/hw/ppc/spapr_hcall.c
> +++ b/hw/ppc/spapr_hcall.c
> @@ -1697,6 +1697,7 @@ static target_ulong
> h_get_cpu_characteristics(PowerPCCPU *cpu,
>  switch (safe_indirect_branch) {
>  case SPAPR_CAP_FIXED:
>  characteristics |= H_CPU_CHAR_BCCTRL_SERIALISED;
> +break;
>  default: /* broken */
>  assert(safe_indirect_branch == SPAPR_CAP_BROKEN);
>  break;
> 



Re: [Qemu-devel] [PATCH 3/3] s390x/pci: use the right pal and pba in reg_ioat()

2018-02-01 Thread Yi Min Zhao



On 2018/2/1 8:02 PM, Cornelia Huck wrote:

On Thu, 1 Feb 2018 12:33:01 +0100
Pierre Morel  wrote:


On 31/01/2018 12:44, Cornelia Huck wrote:

On Tue, 30 Jan 2018 10:47:15 +0100
Yi Min Zhao  wrote:
  

When registering the ioat, the pba should be comprised of the leftmost
52 bits followed by 12 binary zeros, and the pal should be comprised of
the leftmost 52 bits followed by 12 binary ones. Let's fix this up.

Reviewed-by: Pierre Morel 
Signed-off-by: Yi Min Zhao 
---
   hw/s390x/s390-pci-inst.c | 2 ++
   1 file changed, 2 insertions(+)

diff --git a/hw/s390x/s390-pci-inst.c b/hw/s390x/s390-pci-inst.c
index 997a9cc2e9..3fcc330fe3 100644
--- a/hw/s390x/s390-pci-inst.c
+++ b/hw/s390x/s390-pci-inst.c
@@ -865,6 +865,8 @@ static int reg_ioat(CPUS390XState *env, S390PCIIOMMU 
*iommu, ZpciFib fib,
   uint8_t dt = (g_iota >> 2) & 0x7;
   uint8_t t = (g_iota >> 11) & 0x1;
   
+pba &= ~0xfff;

+pal |= 0xfff;
   if (pba > pal || pba < ZPCI_SDMA_ADDR || pal > ZPCI_EDMA_ADDR) {
   s390_program_interrupt(env, PGM_OPERAND, 6, ra);
   return -EINVAL;

It seems like pba and pal are part of the fib, which in turn seems to
be provided by the caller. Is that correct? If yes, is it valid for
them to not have the rightmost 12 bits as 0s resp. 1s?

(Probably answered in the architecture, I know. Might make sense to be
a tad more explicit in the description.)
  

Yes it is: only word 6 and bits 0-19 of word 7 are used for the PAL, and
the zPCI facility treats the rightmost 12 bits of the PAL as containing
ones.

For the PBA, word 4 and bits 0-19 of word 5, with 12 zeros appended on
the right, provide the PBA.

The lower 12 bits of words 5 and 7 of the FIB are ignored by the facility.

@Yi Min: maybe add the last sentence to the commit message.

@Conny: Is it clearer?

Yes, adding the last sentence makes it clearer. Thanks!



OK. Thanks!




Re: [Qemu-devel] [PATCH v2] tap: close fd conditionally when error occurred

2018-02-01 Thread Jason Wang



On 2018-01-26 11:08, Jay Zhou wrote:

If netdev_add tap,id=net0,...,vhost=on fails in net_init_tap_one(),
the follow-up device_add virtio-net-pci,netdev=net0 will fail
too, printing:

   TUNSETOFFLOAD ioctl() failed: Bad file descriptor
   TUNSETOFFLOAD ioctl() failed: Bad file descriptor

The reason is that the tap fd is closed when an error occurs after
calling net_init_tap_one().

The fd should be closed when net_init_tap_one() fails:
   - if tap_set_sndbuf() failed
   - if tap_set_sndbuf() succeeded but vhost failed to initialize with
 the vhostforce flag on
The fd should not be closed merely because vhost failed to initialize
without the vhostforce flag, so that the follow-up device_add can fall
back to userspace virtio successfully.

Suggested-by: Michael S. Tsirkin 
Suggested-by: Igor Mammedov 
Suggested-by: Jason Wang 
Signed-off-by: Jay Zhou 
---
  net/tap.c | 40 
  1 file changed, 32 insertions(+), 8 deletions(-)

diff --git a/net/tap.c b/net/tap.c
index 979e622..8042c7d 100644
--- a/net/tap.c
+++ b/net/tap.c
@@ -648,12 +648,6 @@ static void net_init_tap_one(const NetdevTapOptions *tap, 
NetClientState *peer,
  TAPState *s = net_tap_fd_init(peer, model, name, fd, vnet_hdr);
  int vhostfd;
  
-tap_set_sndbuf(s->fd, tap, &err);

-if (err) {
-error_propagate(errp, err);
-return;
-}
-
  if (tap->has_fd || tap->has_fds) {
  snprintf(s->nc.info_str, sizeof(s->nc.info_str), "fd=%d", fd);
  } else if (tap->has_helper) {
@@ -781,6 +775,12 @@ int net_init_tap(const Netdev *netdev, const char *name,
  
  vnet_hdr = tap_probe_vnet_hdr(fd);
  
+tap_set_sndbuf(fd, tap, &err);

+if (err) {
+error_propagate(errp, err);
+return -1;
+}
+
  net_init_tap_one(tap, peer, "tap", name, NULL,
   script, downscript,
   vhostfdname, vnet_hdr, fd, &err);
@@ -832,6 +832,12 @@ int net_init_tap(const Netdev *netdev, const char *name,
  goto free_fail;
  }
  
+tap_set_sndbuf(fd, tap, &err);

+if (err) {
+error_propagate(errp, err);
+goto free_fail;
+}
+
  net_init_tap_one(tap, peer, "tap", name, ifname,
   script, downscript,
   tap->has_vhostfds ? vhost_fds[i] : NULL,
@@ -872,12 +878,21 @@ free_fail:
  fcntl(fd, F_SETFL, O_NONBLOCK);
  vnet_hdr = tap_probe_vnet_hdr(fd);
  
+tap_set_sndbuf(fd, tap, &err);

+if (err) {
+error_propagate(errp, err);
+close(fd);
+return -1;
+}
+
  net_init_tap_one(tap, peer, "bridge", name, ifname,
   script, downscript, vhostfdname,
   vnet_hdr, fd, &err);
  if (err) {
  error_propagate(errp, err);
-close(fd);
+if (tap->has_vhostforce && tap->vhostforce) {
+close(fd);
+}
  return -1;
  }
  } else {
@@ -910,13 +925,22 @@ free_fail:
  }
  }
  
+tap_set_sndbuf(fd, tap, &err);

+if (err) {
+error_propagate(errp, err);
+close(fd);
+return -1;
+}
+
  net_init_tap_one(tap, peer, "tap", name, ifname,
   i >= 1 ? "no" : script,
   i >= 1 ? "no" : downscript,
   vhostfdname, vnet_hdr, fd, &err);
  if (err) {
  error_propagate(errp, err);
-close(fd);
+if (tap->has_vhostforce && tap->vhostforce) {
+close(fd);
+}
  return -1;
  }
  }


Hi:

I still fail to understand why not just pass the vhostforce flag to
net_init_tap_one(), and let it decide?


Thanks






Re: [Qemu-devel] [PATCH] net/socket: change net_socket_listen_init to use qemu-socket functions

2018-02-01 Thread Jason Wang



On 2018-02-01 16:20, Zihan Yang wrote:

net_socket_listen_init directly uses parse_host_port, bind and listen,
change it to use functions in include/qemu/sockets.h

Signed-off-by: Zihan Yang 
---
  net/socket.c | 30 +-
  1 file changed, 9 insertions(+), 21 deletions(-)

diff --git a/net/socket.c b/net/socket.c
index 6917fbc..e4b1f47 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -494,36 +494,24 @@ static int net_socket_listen_init(NetClientState *peer,
  {
  NetClientState *nc;
  NetSocketState *s;
-struct sockaddr_in saddr;
-int fd, ret;
+SocketAddress *saddr;
+Error *local_err = NULL;
+int fd;
  
-if (parse_host_port(&saddr, host_str, errp) < 0) {

+saddr = socket_parse(host_str, &local_err);
+if (NULL != local_err) {
+error_setg_errno(errp, errno, "socket_parse failed");
  return -1;
  }
  
-fd = qemu_socket(PF_INET, SOCK_STREAM, 0);

+fd = socket_listen(saddr, errp);
  if (fd < 0) {
-error_setg_errno(errp, errno, "can't create stream socket");
+error_setg_errno(errp, errno, "can't listen on address");
+qapi_free_SocketAddress(saddr);
  return -1;
  }
  qemu_set_nonblock(fd);
  
-socket_set_fast_reuse(fd);

-
-ret = bind(fd, (struct sockaddr *)&saddr, sizeof(saddr));
-if (ret < 0) {
-error_setg_errno(errp, errno, "can't bind ip=%s to socket",
- inet_ntoa(saddr.sin_addr));
-closesocket(fd);
-return -1;
-}
-ret = listen(fd, 0);
-if (ret < 0) {
-error_setg_errno(errp, errno, "can't listen on socket");
-closesocket(fd);
-return -1;
-}
-
  nc = qemu_new_net_client(&net_socket_info, peer, model, name);
  s = DO_UPCAST(NetSocketState, nc, nc);
  s->fd = -1;


This allows more kinds of socket to be created with listen=, e.g. unix
domain sockets or vsock, but it doesn't allow such kinds of socket to be
created with connect=.


And changing to use the socket_*() functions is tricky, especially the
connect part. You may have a look at:


6701e5514bea Revert "Change net/socket.c to use socket_*() functions" again
616018352c24 Revert "Change net/socket.c to use socket_*() functions"

Thanks




[Qemu-devel] [PATCH v3 2/5] vfio/pci: Add base BAR MemoryRegion

2018-02-01 Thread Alex Williamson
Add one more layer to our stack of MemoryRegions, this base region
allows us to register BARs independently of the vfio region or to
extend the size of BARs which do map to a region.  This will be
useful when we want hypervisor defined BARs or sections of BARs,
for purposes such as relocating MSI-X emulation.  We therefore call
msix_init() based on this new base MemoryRegion, while the quirks,
which only modify regions still operate on those sub-MemoryRegions.

Signed-off-by: Alex Williamson 
---

Pre-pull request testing uncovered an error with this patch, one of my
test VMs assigns USB host controllers with 1K MMIO BARs.  The host
BIOS happens to align these BARs at 4K, so I make use of the sub-page
code that extends these mappings without resizing the BARs.  With this
patch we also need to extend the base BAR if it's undersized, and
remove/re-add the base BAR MemoryRegion from the address space rather
than the vfio region MemoryRegion.  v3 adds the first 4 chunks of the
patch below to update the sub-page mapping support and is otherwise
identical to v2.  I'm only posting this one patch from the series for
v3, the others remain the same.  I dropped R-b/T-b here, but unless
you're testing a sub-page BAR with host alignment to use this support,
this code won't be executed and I'll be happy to re-add those.  Please
review, thanks,

Alex

 hw/vfio/pci.c |   94 ++---
 hw/vfio/pci.h |3 ++
 2 files changed, 72 insertions(+), 25 deletions(-)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 2c7129512563..908b8dffca2b 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -1087,7 +1087,7 @@ static void vfio_sub_page_bar_update_mapping(PCIDevice 
*pdev, int bar)
 {
 VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
 VFIORegion *region = &vdev->bars[bar].region;
-MemoryRegion *mmap_mr, *mr;
+MemoryRegion *mmap_mr, *region_mr, *base_mr;
 PCIIORegion *r;
 pcibus_t bar_addr;
 uint64_t size = region->size;
@@ -1100,7 +1100,8 @@ static void vfio_sub_page_bar_update_mapping(PCIDevice 
*pdev, int bar)
 
 r = &pdev->io_regions[bar];
 bar_addr = r->addr;
-mr = region->mem;
+base_mr = vdev->bars[bar].mr;
+region_mr = region->mem;
 mmap_mr = &region->mmaps[0].mem;
 
 /* If BAR is mapped and page aligned, update to fill PAGE_SIZE */
@@ -1111,12 +1112,15 @@ static void vfio_sub_page_bar_update_mapping(PCIDevice 
*pdev, int bar)
 
 memory_region_transaction_begin();
 
-memory_region_set_size(mr, size);
+if (vdev->bars[bar].size < size) {
+memory_region_set_size(base_mr, size);
+}
+memory_region_set_size(region_mr, size);
 memory_region_set_size(mmap_mr, size);
-if (size != region->size && memory_region_is_mapped(mr)) {
-memory_region_del_subregion(r->address_space, mr);
+if (size != vdev->bars[bar].size && memory_region_is_mapped(base_mr)) {
+memory_region_del_subregion(r->address_space, base_mr);
 memory_region_add_subregion_overlap(r->address_space,
-bar_addr, mr, 0);
+bar_addr, base_mr, 0);
 }
 
 memory_region_transaction_commit();
@@ -1218,8 +1222,8 @@ void vfio_pci_write_config(PCIDevice *pdev,
 
 for (bar = 0; bar < PCI_ROM_SLOT; bar++) {
 if (old_addr[bar] != pdev->io_regions[bar].addr &&
-pdev->io_regions[bar].size > 0 &&
-pdev->io_regions[bar].size < qemu_real_host_page_size) {
+vdev->bars[bar].region.size > 0 &&
+vdev->bars[bar].region.size < qemu_real_host_page_size) {
 vfio_sub_page_bar_update_mapping(pdev, bar);
 }
 }
@@ -1440,9 +1444,9 @@ static int vfio_msix_setup(VFIOPCIDevice *vdev, int pos, 
Error **errp)
 vdev->msix->pending = g_malloc0(BITS_TO_LONGS(vdev->msix->entries) *
 sizeof(unsigned long));
 ret = msix_init(&vdev->pdev, vdev->msix->entries,
-vdev->bars[vdev->msix->table_bar].region.mem,
+vdev->bars[vdev->msix->table_bar].mr,
 vdev->msix->table_bar, vdev->msix->table_offset,
-vdev->bars[vdev->msix->pba_bar].region.mem,
+vdev->bars[vdev->msix->pba_bar].mr,
 vdev->msix->pba_bar, vdev->msix->pba_offset, pos,
                &err);
 if (ret < 0) {
@@ -1482,8 +1486,8 @@ static void vfio_teardown_msi(VFIOPCIDevice *vdev)
 
 if (vdev->msix) {
 msix_uninit(&vdev->pdev,
-vdev->bars[vdev->msix->table_bar].region.mem,
-vdev->bars[vdev->msix->pba_bar].region.mem);
+vdev->bars[vdev->msix->table_bar].mr,
+vdev->bars[vdev->msix->pba_bar].mr);
 g_free(vdev->msix->pending);
 }
 }
@@ -1500,12 +1504,11 @@ static void vfio_mmap_set_enabled(VFIOPCIDevice *vdev, 
bool 

Re: [Qemu-devel] [PATCH 3/3] target/ppc: generalize check on radix when in HV mode

2018-02-01 Thread Suraj Jitindar Singh
On Wed, 2018-01-31 at 09:27 +0100, Cédric Le Goater wrote:
> On a POWER9 processor, the first doubleword of the PTCR indicates
> whether the partition uses HPT or Radix Trees translation. Use that
> bit to check for radix mode on powernv QEMU machines.

The above isn't quite right.

On a POWER9 processor, the first doubleword of the partition table
entry (as pointed to by the PTCR) indicates whether the host uses HPT
or Radix Tree translation for that partition.

> 
> Signed-off-by: Cédric Le Goater 
> ---
>  target/ppc/mmu-book3s-v3.c  | 17 -
>  target/ppc/mmu-book3s-v3.h  |  8 +---
>  target/ppc/mmu-hash64.h |  1 +
>  target/ppc/mmu_helper.c |  4 ++--
>  target/ppc/translate_init.c |  2 +-
>  5 files changed, 21 insertions(+), 11 deletions(-)
> 
> diff --git a/target/ppc/mmu-book3s-v3.c b/target/ppc/mmu-book3s-v3.c
> index e7798b3582b0..50b60fca3445 100644
> --- a/target/ppc/mmu-book3s-v3.c
> +++ b/target/ppc/mmu-book3s-v3.c
> @@ -24,10 +24,25 @@
>  #include "mmu-book3s-v3.h"
>  #include "mmu-radix64.h"
>  
> +bool ppc64_radix(PowerPCCPU *cpu)
> +{
> +CPUPPCState *env = &cpu->env;
> +
> +if (msr_hv) {

I would prefer something like:

uint64_t prtbe0 = ldq_phys(...);
return prtbe0 & HR;

> +return ldq_phys(CPU(cpu)->as, cpu->env.spr[SPR_PTCR] &
> +PTCR_PTAB) & PTCR_PTAB_HR;
> +} else  {
> +PPCVirtualHypervisorClass *vhc =
> +PPC_VIRTUAL_HYPERVISOR_GET_CLASS(cpu->vhyp);
> +
> +return !!(vhc->get_patbe(cpu->vhyp) & PATBE1_GR);
> +}
> +}
> +
>  int ppc64_v3_handle_mmu_fault(PowerPCCPU *cpu, vaddr eaddr, int rwx,
>int mmu_idx)
>  {
> -if (ppc64_radix_guest(cpu)) { /* Guest uses radix */
> +if (ppc64_radix(cpu)) { /* radix mode */
>  return ppc_radix64_handle_mmu_fault(cpu, eaddr, rwx,
> mmu_idx);
>  } else { /* Guest uses hash */
>  return ppc_hash64_handle_mmu_fault(cpu, eaddr, rwx,
> mmu_idx);
> diff --git a/target/ppc/mmu-book3s-v3.h b/target/ppc/mmu-book3s-v3.h
> index 56095dab522c..3876cb51b35c 100644
> --- a/target/ppc/mmu-book3s-v3.h
> +++ b/target/ppc/mmu-book3s-v3.h
> @@ -37,13 +37,7 @@ static inline bool ppc64_use_proc_tbl(PowerPCCPU
> *cpu)
>  return !!(cpu->env.spr[SPR_LPCR] & LPCR_UPRT);
>  }
>  
> -static inline bool ppc64_radix_guest(PowerPCCPU *cpu)
> -{
> -PPCVirtualHypervisorClass *vhc =
> -PPC_VIRTUAL_HYPERVISOR_GET_CLASS(cpu->vhyp);
> -
> -return !!(vhc->get_patbe(cpu->vhyp) & PATBE1_GR);
> -}
> +bool ppc64_radix(PowerPCCPU *cpu);
>  
>  int ppc64_v3_handle_mmu_fault(PowerPCCPU *cpu, vaddr eaddr, int rwx,
>int mmu_idx);
> diff --git a/target/ppc/mmu-hash64.h b/target/ppc/mmu-hash64.h
> index 4dc6b3968ec0..7e2ac64b6eeb 100644
> --- a/target/ppc/mmu-hash64.h
> +++ b/target/ppc/mmu-hash64.h
> @@ -106,6 +106,7 @@ void ppc_hash64_update_rmls(CPUPPCState *env);
>  /*
>   * Partition table definitions
>   */
> +#define PTCR_PTAB_HR   PPC_BIT(0)   /* 1:Host 

This isn't a bit in the partition table register, it is a bit in the
partition table entry. It should be defined in target/ppc/mmu-book3s-
v3.h as part of "/* Partition Table Entry Fields */"

Also to follow the naming, please call it:
#define PATBE0_HR   PPC_BIT(0)

:)

> Radix 0:HPT   */
>  #define PTCR_PTAB   0x0000ULL /* Partition
> Table Base */
>  #define PTCR_PTAS   0x001FULL /* Partition
> Table Size */
>  
> diff --git a/target/ppc/mmu_helper.c b/target/ppc/mmu_helper.c
> index b1e660a4d16a..059863b99b2e 100644
> --- a/target/ppc/mmu_helper.c
> +++ b/target/ppc/mmu_helper.c
> @@ -1286,7 +1286,7 @@ void dump_mmu(FILE *f, fprintf_function
> cpu_fprintf, CPUPPCState *env)
>  dump_slb(f, cpu_fprintf, ppc_env_get_cpu(env));
>  break;
>  case POWERPC_MMU_VER_3_00:
> -if (ppc64_radix_guest(ppc_env_get_cpu(env))) {
> +if (ppc64_radix(ppc_env_get_cpu(env))) {
>  /* TODO - Unsupported */
>  } else {
>  dump_slb(f, cpu_fprintf, ppc_env_get_cpu(env));
> @@ -1432,7 +1432,7 @@ hwaddr ppc_cpu_get_phys_page_debug(CPUState
> *cs, vaddr addr)
>  case POWERPC_MMU_VER_2_07:
>  return ppc_hash64_get_phys_page_debug(cpu, addr);
>  case POWERPC_MMU_VER_3_00:
> -if (ppc64_radix_guest(ppc_env_get_cpu(env))) {
> +if (ppc64_radix(ppc_env_get_cpu(env))) {
>  return ppc_radix64_get_phys_page_debug(cpu, addr);
>  } else {
>  return ppc_hash64_get_phys_page_debug(cpu, addr);
> diff --git a/target/ppc/translate_init.c
> b/target/ppc/translate_init.c
> index a6eaa74244ca..07012ee75e81 100644
> --- a/target/ppc/translate_init.c
> +++ b/target/ppc/translate_init.c
> @@ -8965,7 +8965,7 @@ void cpu_ppc_set_papr(PowerPCCPU *cpu,
> PPCVirtualHypervisor *vhyp)
>   * KVM but not under TCG. Update the default LPCR to keep
> new
>   * CPUs 

Re: [Qemu-devel] [PATCH 1/3] target/ppc: add basic support for PTCR on POWER9

2018-02-01 Thread Suraj Jitindar Singh
On Fri, 2018-02-02 at 13:34 +1100, Suraj Jitindar Singh wrote:
> On Wed, 2018-01-31 at 09:27 +0100, Cédric Le Goater wrote:
> > The Partition Table Control Register (PTCR) is a hypervisor
> > privileged
> > SPR. It contains the host real address of the Partition Table and
> > its
> > size.
> > 
> > Signed-off-by: Cédric Le Goater 
> > ---
> >  target/ppc/cpu.h|  2 ++
> >  target/ppc/helper.h |  1 +
> >  target/ppc/misc_helper.c| 12 
> >  target/ppc/mmu-hash64.h |  6 ++
> >  target/ppc/mmu_helper.c | 28 
> >  target/ppc/translate.c  |  3 +++
> >  target/ppc/translate_init.c | 18 ++
> >  7 files changed, 70 insertions(+)
> > 
> > diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
> > index 9f8cbbe7aa4d..53061229a0a8 100644
> > --- a/target/ppc/cpu.h
> > +++ b/target/ppc/cpu.h
> > @@ -1314,6 +1314,7 @@ int ppc_cpu_handle_mmu_fault(CPUState *cpu,
> > vaddr address, int size, int rw,
> >  
> >  #if !defined(CONFIG_USER_ONLY)
> >  void ppc_store_sdr1 (CPUPPCState *env, target_ulong value);
> > +void ppc_store_ptcr(CPUPPCState *env, target_ulong value);
> >  #endif /* !defined(CONFIG_USER_ONLY) */
> >  void ppc_store_msr (CPUPPCState *env, target_ulong value);
> >  
> > @@ -1605,6 +1606,7 @@ void ppc_compat_add_property(Object *obj,
> > const
> > char *name,
> >  #define SPR_BOOKE_GIVOR13 (0x1BC)
> >  #define SPR_BOOKE_GIVOR14 (0x1BD)
> >  #define SPR_TIR   (0x1BE)
> > +#define SPR_PTCR  (0x1D0)
> >  #define SPR_BOOKE_SPEFSCR (0x200)
> >  #define SPR_Exxx_BBEAR(0x201)
> >  #define SPR_Exxx_BBTAR(0x202)
> > diff --git a/target/ppc/helper.h b/target/ppc/helper.h
> > index 5b739179b8b5..19453c68138a 100644
> > --- a/target/ppc/helper.h
> > +++ b/target/ppc/helper.h
> > @@ -709,6 +709,7 @@ DEF_HELPER_FLAGS_1(load_601_rtcu,
> > TCG_CALL_NO_RWG, tl, env)
> >  #if !defined(CONFIG_USER_ONLY)
> >  #if defined(TARGET_PPC64)
> >  DEF_HELPER_FLAGS_1(load_purr, TCG_CALL_NO_RWG, tl, env)
> > +DEF_HELPER_2(store_ptcr, void, env, tl)
> >  #endif
> >  DEF_HELPER_2(store_sdr1, void, env, tl)
> >  DEF_HELPER_2(store_pidr, void, env, tl)
> > diff --git a/target/ppc/misc_helper.c b/target/ppc/misc_helper.c
> > index 0e4217821b8e..8c8cba5cc6f1 100644
> > --- a/target/ppc/misc_helper.c
> > +++ b/target/ppc/misc_helper.c
> > @@ -88,6 +88,18 @@ void helper_store_sdr1(CPUPPCState *env,
> > target_ulong val)
> >  }
> >  }
> >  
> > +#if defined(TARGET_PPC64)
> > +void helper_store_ptcr(CPUPPCState *env, target_ulong val)
> > +{
> > +PowerPCCPU *cpu = ppc_env_get_cpu(env);
> > +
> > +if (env->spr[SPR_PTCR] != val) {
> > +ppc_store_ptcr(env, val);
> > +tlb_flush(CPU(cpu));
> > +}
> > +}
> > +#endif /* defined(TARGET_PPC64) */
> > +
> >  void helper_store_pidr(CPUPPCState *env, target_ulong val)
> >  {
> >  PowerPCCPU *cpu = ppc_env_get_cpu(env);
> > diff --git a/target/ppc/mmu-hash64.h b/target/ppc/mmu-hash64.h
> > index d297b97d3773..4fb00ac17abb 100644
> > --- a/target/ppc/mmu-hash64.h
> > +++ b/target/ppc/mmu-hash64.h
> > @@ -98,6 +98,12 @@ void ppc_hash64_update_rmls(CPUPPCState *env);
> >  #define HPTE64_V_1TB_SEG0x4000ULL
> >  #define HPTE64_V_VRMA_MASK  0x4001ff00ULL
> >  
> > +/*
> > + * Partition table definitions
> > + */
> > +#define PTCR_PTAB   0x0000ULL /* Partition
> > Table Base */
> > +#define PTCR_PTAS   0x001FULL /* Partition
> > Table Size */
> > +
> 
> s/PTCR_PTAB/PTCR_PATB
> s/PTCR_PTAS/PTCR_PATS
> To match the ISA?

Also these should be in target/ppc/mmu-book3s-v3.h, they're not hash
specific

> 
> >  static inline hwaddr ppc_hash64_hpt_base(PowerPCCPU *cpu)
> >  {
> >  return cpu->env.spr[SPR_SDR1] & SDR_64_HTABORG;
> > diff --git a/target/ppc/mmu_helper.c b/target/ppc/mmu_helper.c
> > index 16ef5acaa28f..b1e660a4d16a 100644
> > --- a/target/ppc/mmu_helper.c
> > +++ b/target/ppc/mmu_helper.c
> > @@ -2029,6 +2029,34 @@ void ppc_store_sdr1(CPUPPCState *env,
> > target_ulong value)
> >  env->spr[SPR_SDR1] = value;
> >  }
> >  
> > +#if defined(TARGET_PPC64)
> > +void ppc_store_ptcr(CPUPPCState *env, target_ulong value)
> > +{
> > +PowerPCCPU *cpu = ppc_env_get_cpu(env);
> > +qemu_log_mask(CPU_LOG_MMU, "%s: " TARGET_FMT_lx "\n",
> > __func__,
> > value);
> > +
> > +assert(!cpu->vhyp);
> > +
> > +if (env->mmu_model & POWERPC_MMU_V3) {
> > +target_ulong ptcr_mask = PTCR_PTAB | PTCR_PTAS;
> > +target_ulong ptas = value & PTCR_PTAS;
> > +
> > +if (value & ~ptcr_mask) {
> > +error_report("Invalid bits 0x"TARGET_FMT_lx" set in
> > PTCR",
> > + value & ~ptcr_mask);
> > +value &= ptcr_mask;
> > +}
> > +if (ptas > 28) {
> > +error_report("Invalid PTAS 0x" TARGET_FMT_lx" stored
> > in
> > PTCR",
> > + ptas);
> > +  

Re: [Qemu-devel] [PATCH qemu v2] machine: Polish -machine xxx,help

2018-02-01 Thread Alexey Kardashevskiy
On 29/01/18 16:03, Alexey Kardashevskiy wrote:
> On 15/12/17 15:47, Alexey Kardashevskiy wrote:
>> On 26/10/17 12:41, Alexey Kardashevskiy wrote:
>>> The "-machine xxx,help" prints kernel-irqchip possible values as
>>> "OnOffSplit", this adds separators to the printed line.
>>>
>>> Also, since only lower case letters are specified in qapi/common.json,
>>> this changes the letter cases too.
>>>
>>> Signed-off-by: Alexey Kardashevskiy 
>>
>> ping?
> 
> 
> anyone?

Let's try trivial list now :)


> 
>>
>>
>>> ---
>>>
>>> aik@fstn1-p1:~$ ./qemu-system-ppc64 -machine pseries,help 2>&1 | grep 
>>> kernel-irqchip
>>>
>>> Was:
>>> pseries-2.11.kernel-irqchip=OnOffSplit (Configure KVM in-kernel irqchip)
>>>
>>> Now:
>>> pseries-2.11.kernel-irqchip=on|off|split (Configure KVM in-kernel irqchip)
>>> ---
>>>  hw/core/machine.c | 2 +-
>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/hw/core/machine.c b/hw/core/machine.c
>>> index 36c2fb069c..bd3db14e12 100644
>>> --- a/hw/core/machine.c
>>> +++ b/hw/core/machine.c
>>> @@ -506,7 +506,7 @@ static void machine_class_init(ObjectClass *oc, void 
>>> *data)
>>>  object_class_property_set_description(oc, "accel",
>>>  "Accelerator list", _abort);
>>>  
>>> -object_class_property_add(oc, "kernel-irqchip", "OnOffSplit",
>>> +object_class_property_add(oc, "kernel-irqchip", "on|off|split",
>>>  NULL, machine_set_kernel_irqchip,
>>>  NULL, NULL, _abort);
>>>  object_class_property_set_description(oc, "kernel-irqchip",
>>>
>>
>>
> 
> 


-- 
Alexey



Re: [Qemu-devel] [PATCH 1/3] target/ppc: add basic support for PTCR on POWER9

2018-02-01 Thread Suraj Jitindar Singh
On Wed, 2018-01-31 at 09:27 +0100, Cédric Le Goater wrote:
> The Partition Table Control Register (PTCR) is a hypervisor
> privileged
> SPR. It contains the host real address of the Partition Table and its
> size.
> 
> Signed-off-by: Cédric Le Goater 
> ---
>  target/ppc/cpu.h|  2 ++
>  target/ppc/helper.h |  1 +
>  target/ppc/misc_helper.c| 12 
>  target/ppc/mmu-hash64.h |  6 ++
>  target/ppc/mmu_helper.c | 28 
>  target/ppc/translate.c  |  3 +++
>  target/ppc/translate_init.c | 18 ++
>  7 files changed, 70 insertions(+)
> 
> diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
> index 9f8cbbe7aa4d..53061229a0a8 100644
> --- a/target/ppc/cpu.h
> +++ b/target/ppc/cpu.h
> @@ -1314,6 +1314,7 @@ int ppc_cpu_handle_mmu_fault(CPUState *cpu,
> vaddr address, int size, int rw,
>  
>  #if !defined(CONFIG_USER_ONLY)
>  void ppc_store_sdr1 (CPUPPCState *env, target_ulong value);
> +void ppc_store_ptcr(CPUPPCState *env, target_ulong value);
>  #endif /* !defined(CONFIG_USER_ONLY) */
>  void ppc_store_msr (CPUPPCState *env, target_ulong value);
>  
> @@ -1605,6 +1606,7 @@ void ppc_compat_add_property(Object *obj, const
> char *name,
>  #define SPR_BOOKE_GIVOR13 (0x1BC)
>  #define SPR_BOOKE_GIVOR14 (0x1BD)
>  #define SPR_TIR   (0x1BE)
> +#define SPR_PTCR  (0x1D0)
>  #define SPR_BOOKE_SPEFSCR (0x200)
>  #define SPR_Exxx_BBEAR(0x201)
>  #define SPR_Exxx_BBTAR(0x202)
> diff --git a/target/ppc/helper.h b/target/ppc/helper.h
> index 5b739179b8b5..19453c68138a 100644
> --- a/target/ppc/helper.h
> +++ b/target/ppc/helper.h
> @@ -709,6 +709,7 @@ DEF_HELPER_FLAGS_1(load_601_rtcu,
> TCG_CALL_NO_RWG, tl, env)
>  #if !defined(CONFIG_USER_ONLY)
>  #if defined(TARGET_PPC64)
>  DEF_HELPER_FLAGS_1(load_purr, TCG_CALL_NO_RWG, tl, env)
> +DEF_HELPER_2(store_ptcr, void, env, tl)
>  #endif
>  DEF_HELPER_2(store_sdr1, void, env, tl)
>  DEF_HELPER_2(store_pidr, void, env, tl)
> diff --git a/target/ppc/misc_helper.c b/target/ppc/misc_helper.c
> index 0e4217821b8e..8c8cba5cc6f1 100644
> --- a/target/ppc/misc_helper.c
> +++ b/target/ppc/misc_helper.c
> @@ -88,6 +88,18 @@ void helper_store_sdr1(CPUPPCState *env,
> target_ulong val)
>  }
>  }
>  
> +#if defined(TARGET_PPC64)
> +void helper_store_ptcr(CPUPPCState *env, target_ulong val)
> +{
> +PowerPCCPU *cpu = ppc_env_get_cpu(env);
> +
> +if (env->spr[SPR_PTCR] != val) {
> +ppc_store_ptcr(env, val);
> +tlb_flush(CPU(cpu));
> +}
> +}
> +#endif /* defined(TARGET_PPC64) */
> +
>  void helper_store_pidr(CPUPPCState *env, target_ulong val)
>  {
>  PowerPCCPU *cpu = ppc_env_get_cpu(env);
> diff --git a/target/ppc/mmu-hash64.h b/target/ppc/mmu-hash64.h
> index d297b97d3773..4fb00ac17abb 100644
> --- a/target/ppc/mmu-hash64.h
> +++ b/target/ppc/mmu-hash64.h
> @@ -98,6 +98,12 @@ void ppc_hash64_update_rmls(CPUPPCState *env);
>  #define HPTE64_V_1TB_SEG0x4000ULL
>  #define HPTE64_V_VRMA_MASK  0x4001ff00ULL
>  
> +/*
> + * Partition table definitions
> + */
> +#define PTCR_PTAB   0x0000ULL /* Partition
> Table Base */
> +#define PTCR_PTAS   0x001FULL /* Partition
> Table Size */
> +

s/PTCR_PTAB/PTCR_PATB
s/PTCR_PTAS/PTCR_PATS
To match the ISA?

>  static inline hwaddr ppc_hash64_hpt_base(PowerPCCPU *cpu)
>  {
>  return cpu->env.spr[SPR_SDR1] & SDR_64_HTABORG;
> diff --git a/target/ppc/mmu_helper.c b/target/ppc/mmu_helper.c
> index 16ef5acaa28f..b1e660a4d16a 100644
> --- a/target/ppc/mmu_helper.c
> +++ b/target/ppc/mmu_helper.c
> @@ -2029,6 +2029,34 @@ void ppc_store_sdr1(CPUPPCState *env,
> target_ulong value)
>  env->spr[SPR_SDR1] = value;
>  }
>  
> +#if defined(TARGET_PPC64)
> +void ppc_store_ptcr(CPUPPCState *env, target_ulong value)
> +{
> +PowerPCCPU *cpu = ppc_env_get_cpu(env);
> +qemu_log_mask(CPU_LOG_MMU, "%s: " TARGET_FMT_lx "\n", __func__,
> value);
> +
> +assert(!cpu->vhyp);
> +
> +if (env->mmu_model & POWERPC_MMU_V3) {
> +target_ulong ptcr_mask = PTCR_PTAB | PTCR_PTAS;
> +target_ulong ptas = value & PTCR_PTAS;
> +
> +if (value & ~ptcr_mask) {
> +error_report("Invalid bits 0x"TARGET_FMT_lx" set in
> PTCR",
> + value & ~ptcr_mask);
> +value &= ptcr_mask;
> +}
> +if (ptas > 28) {
> +error_report("Invalid PTAS 0x" TARGET_FMT_lx" stored in
> PTCR",
> + ptas);
> +return;
> +}
> +}

Should we throw some error if the ptcr is being accessed on a non-
power9 machine?

> +env->spr[SPR_PTCR] = value;
> +}
> +
> +#endif /* defined(TARGET_PPC64) */
> +
>  /* Segment registers load and store */
>  target_ulong helper_load_sr(CPUPPCState *env, target_ulong sr_num)
>  {
> diff --git a/target/ppc/translate.c b/target/ppc/translate.c
> index 

Re: [Qemu-devel] [RFC PATCH qemu] qmp: Add qom-list-properties to list QOM object properties

2018-02-01 Thread Alexey Kardashevskiy
On 01/02/18 04:22, Markus Armbruster wrote:
> Alexey Kardashevskiy  writes:
> 
>> There is already 'device-list-properties' which does most of the job,
>> however it does not handle everything returned by qom-list-types such
>> as machines as they inherit directly from TYPE_OBJECT and not TYPE_DEVICE.
>>
>> This adds a new qom-list-properties command which prints properties
>> of a specific class and its instance. It is pretty much a simplified copy
>> of the device-list-properties handler.
>>
>> Since it creates an object instance, device properties should appear
>> in the output as they are copied to QOM properties at the instance_init
>> hook.
>>
>> Signed-off-by: Alexey Kardashevskiy 
> 
> Related: qom-list, which lists "any properties of a object given a path
> in the object model."  qom-list-properties takes a type name, which
> qom-list takes the path to an instance.  In other words,
> qom-list-properties is like instantiate with default configuration and
> without realizing + qom-list + destroy.


True. Same as device-list-properties.


> We need to instantiate because QOM properties are dynamic: they aren't
> specified by data (which qom-list-properties could simply read), they
> are created by (instantiation) code (which qom-list-properties has to
> run).

Correct.

> Properties created only after instantiation (by realize, perhaps) aren't
> visible in qom-list-properties.  Do such properties exist?

No idea but if they do, then this issue already exists in
device-list-properties.

> Properties created only in non-default configuration aren't visible
> either.  Such properties have to exist, or else dynamic property
> creation would be idiotic.
> 
> Likewise for properties created differently (say with a different type)
> in non-default configuration.  We can hope that no such beasts exist.
> Since properties get created by code, and code can do anything, we're
> reduced to hope.  Data is so much easier to reason about than code.
> 
> Three building blocks: instantiate, qom-list, destroy.  Do we want the
> building blocks, or do we want their combination qom-list-properties?


Building blocks as QEMU internal helpers to split my
qmp_qom_list_properties() into? These are not going to be huge and
"destroy" is literally object_unref(obj) which does not seem very useful.
Or I missed the point here?



>> ---
>>
>> I am missing the point of make_device_property_info().
>> qmp_device_list_properties() creates the instance which copies everything
>> to QOM properties hashtable and commenting out the do{}while() in
>> make_device_property_info() does not seem to change a thing, what case
>> am I missing here?
> 
> git-blame points to Stefan.  Stefan, can you help?
> 
>> ---
>>  qapi-schema.json | 29 +
>>  qmp.c            | 52 
>>  2 files changed, 81 insertions(+)
>>
>> diff --git a/qapi-schema.json b/qapi-schema.json
>> index 5c06745..9d73501 100644
>> --- a/qapi-schema.json
>> +++ b/qapi-schema.json
>> @@ -1455,6 +1455,35 @@
>>'returns': [ 'DevicePropertyInfo' ] }
>>  
>>  ##
>> +# @QOMPropertyInfo:
>> +#
>> +# Information about object properties.
>> +#
>> +# @name: the name of the property
>> +# @type: the typename of the property
>> +# @description: if specified, the description of the property.
>> +#
>> +# Since: 2.12
>> +##
>> +{ 'struct': 'QOMPropertyInfo',
>> +  'data': { 'name': 'str', 'type': 'str', '*description': 'str' } }
>> +
>> +##
>> +# @qom-list-properties:
>> +#
>> +# List properties associated with a QOM object.
>> +#
>> +# @typename: the type name of an object
>> +#
>> +# Returns: a list of QOMPropertyInfo describing object properties
>> +#
>> +# Since: 2.12
>> +##
>> +{ 'command': 'qom-list-properties',
>> +  'data': { 'typename': 'str'},
>> +  'returns': [ 'QOMPropertyInfo' ] }
>> +
>> +##
>>  # @xen-set-global-dirty-log:
>>  #
>>  # Enable or disable the global dirty log mode.
>> diff --git a/qmp.c b/qmp.c
>> index 52cfd2d..20cb662 100644
>> --- a/qmp.c
>> +++ b/qmp.c
>> @@ -574,6 +574,58 @@ DevicePropertyInfoList 
>> *qmp_device_list_properties(const char *typename,
>>  return prop_list;
>>  }
>>  
>> +QOMPropertyInfoList *qmp_qom_list_properties(const char *typename,
>> + Error **errp)
>> +{
>> +ObjectClass *klass;
>> +Object *obj;
>> +ObjectProperty *prop;
>> +ObjectPropertyIterator iter;
>> +QOMPropertyInfoList *prop_list = NULL;
>> +
>> +klass = object_class_by_name(typename);
>> +if (klass == NULL) {
>> +error_set(errp, ERROR_CLASS_DEVICE_NOT_FOUND,
>> +  "Class '%s' not found", typename);
>> +return NULL;
>> +}
>> +
>> +klass = object_class_dynamic_cast(klass, TYPE_OBJECT);
>> +if (klass == NULL) {
>> +error_setg(errp, QERR_INVALID_PARAMETER_VALUE, "typename", 
>> TYPE_OBJECT);
>> +return NULL;
>> +}
>> +
>> +if 

Re: [Qemu-devel] [PATCH 1/2] tpm: Split off tpm_crb_reset function

2018-02-01 Thread Stefan Berger

On 02/01/2018 06:23 PM, Stefan Berger wrote:

Split off the tpm_crb_reset function part from tpm_crb_realize
that we need to run every time the machine resets.

Signed-off-by: Stefan Berger 
---
  hw/tpm/tpm_crb.c | 6 ++
  1 file changed, 6 insertions(+)

diff --git a/hw/tpm/tpm_crb.c b/hw/tpm/tpm_crb.c
index 687d255..624e2e9 100644
--- a/hw/tpm/tpm_crb.c
+++ b/hw/tpm/tpm_crb.c
@@ -232,6 +232,11 @@ static void tpm_crb_realize(DeviceState *dev, Error **errp)
  TPM_CRB_ADDR_BASE, &s->mmio);
  memory_region_add_subregion(get_system_memory(),
  TPM_CRB_ADDR_BASE + sizeof(s->regs), &s->cmdmem);
+}
+
+static void tpm_crb_reset(DeviceState *dev)
+{
+CRBState *s = CRB(dev);

  tpm_backend_reset(s->tpmbe);

@@ -274,6 +279,7 @@ static void tpm_crb_class_init(ObjectClass *klass, void *data)

  dc->realize = tpm_crb_realize;
  dc->props = tpm_crb_properties;
+dc->reset = tpm_crb_reset;
  dc->vmsd  = &vmstate_tpm_crb;
  dc->user_creatable = true;
  tc->model = TPM_MODEL_TPM_CRB;



So this split-off unfortunately causes the tpm-crb-test to fail since the
reset function is not called. I wonder how to trigger that? Do we need
to call tpm_crb_reset now from tpm_crb_realize just to make the tests
work? Or is there a function we can call in the test case to trigger
the device reset?





Re: [Qemu-devel] [PATCH v6 0/4] cryptodev: add vhost support

2018-02-01 Thread Jay Zhou


On 2018/2/2 1:06, Michael S. Tsirkin wrote:

Yes, I plan to merge it in the next pull.
Pls don't assume anything until it's merged upstream though,
some issues surface late.



Okay, I see. Thanks for reviewing!

Regards,
Jay


On Thu, Feb 01, 2018 at 11:29:15AM +, Zhoujian (jay) wrote:

Hi Michael,
I am wondering whether this version is OK for you?
Any comment will be appreciated, thanks.

Regards,
Jay


-Original Message-
From: Zhoujian (jay)
Sent: Sunday, January 21, 2018 8:55 PM
To: qemu-devel@nongnu.org
Cc: m...@redhat.com; pbonz...@redhat.com; Huangweidong (C)
; stefa...@redhat.com; Zhoujian (jay)
; pa...@linux.vnet.ibm.com; longpeng
; xin.z...@intel.com; roy.fan.zh...@intel.com; Gonglei
(Arei) ; wangxin (U) 
Subject: [PATCH v6 0/4] cryptodev: add vhost support

From: Gonglei 

I posted the RFC version a few months ago for the DPDK vhost-crypto
implementation, and now it's time to send the formal version, because we
need a user-space scheme for better performance.

The vhost-user crypto server side patches have been sent to the DPDK
community, please see

[RFC PATCH 0/6] lib/librte_vhost: introduce new vhost_user crypto backend
support http://dpdk.org/ml/archives/dev/2017-November/081048.html

You also can get virtio-crypto polling mode driver from:

[PATCH] virtio: add new driver for crypto devices
http://dpdk.org/ml/archives/dev/2017-November/081985.html

v5 -> v6:
   Fix compile error about backends/cryptodev-vhost-user.o and rebase on
   the master
v4 -> v5:
   squash [PATCH v4 5/5] into previous patches [Michael]
v3 -> v4:
   "[PATCH v4 5/5] cryptodev-vhost-user: depend on CONFIG_VHOST_CRYPTO
   and CONFIG_VHOST_USER" newly added to fix compilation dependency [Michael]
v2 -> v3:
   New added vhost user messages should be sent only when feature
   has been successfully negotiated [Michael]
v1 -> v2:
   Fix compile error on mingw32

Gonglei (4):
   cryptodev: add vhost-user as a new cryptodev backend
   cryptodev: add vhost support
   cryptodev-vhost-user: add crypto session handler
   cryptodev-vhost-user: set the key length

  backends/Makefile.objs|   6 +
  backends/cryptodev-builtin.c  |   1 +
  backends/cryptodev-vhost-user.c   | 379 ++
  backends/cryptodev-vhost.c| 347 +++
  configure |  15 ++
  docs/interop/vhost-user.txt   |  26 +++
  hw/virtio/Makefile.objs   |   2 +-
  hw/virtio/vhost-user.c| 104 ++
  hw/virtio/virtio-crypto.c |  70 +++
  include/hw/virtio/vhost-backend.h |   8 +
  include/hw/virtio/virtio-crypto.h |   1 +
  include/sysemu/cryptodev-vhost-user.h |  47 +
  include/sysemu/cryptodev-vhost.h  | 154 ++
  include/sysemu/cryptodev.h|   8 +
  qemu-options.hx   |  21 ++
  vl.c  |   6 +
  16 files changed, 1194 insertions(+), 1 deletion(-)
  create mode 100644 backends/cryptodev-vhost-user.c
  create mode 100644 backends/cryptodev-vhost.c
  create mode 100644 include/sysemu/cryptodev-vhost-user.h
  create mode 100644 include/sysemu/cryptodev-vhost.h

--
1.8.3.1



.






Re: [Qemu-devel] [PATCH v3 14/18] sdcard: simplify SD_SEND_OP_COND (ACMD41)

2018-02-01 Thread Alistair Francis
On Mon, Jan 22, 2018 at 7:30 PM, Philippe Mathieu-Daudé  wrote:
> replace switch(single case) -> if()
>
> Signed-off-by: Philippe Mathieu-Daudé 

Reviewed-by: Alistair Francis 

Alistair

> ---
>  hw/sd/sd.c | 56 ++--
>  1 file changed, 26 insertions(+), 30 deletions(-)
>
> diff --git a/hw/sd/sd.c b/hw/sd/sd.c
> index 707c294169..6efcacb942 100644
> --- a/hw/sd/sd.c
> +++ b/hw/sd/sd.c
> @@ -1535,45 +1535,41 @@ static sd_rsp_type_t sd_app_command(SDState *sd,
>  sd->state = sd_transfer_state;
>  return sd_r1;
>  }
> -switch (sd->state) {
> -case sd_idle_state:
> -/* If it's the first ACMD41 since reset, we need to decide
> - * whether to power up. If this is not an enquiry ACMD41,
> - * we immediately report power on and proceed below to the
> - * ready state, but if it is, we set a timer to model a
> - * delay for power up. This works around a bug in EDK2
> - * UEFI, which sends an initial enquiry ACMD41, but
> - * assumes that the card is in ready state as soon as it
> - * sees the power up bit set. */
> -if (!FIELD_EX32(sd->ocr, OCR, CARD_POWER_UP)) {
> -if ((req.arg & ACMD41_ENQUIRY_MASK) != 0) {
> -timer_del(sd->ocr_power_timer);
> -sd_ocr_powerup(sd);
> -} else {
> -trace_sdcard_inquiry_cmd41();
> -if (!timer_pending(sd->ocr_power_timer)) {
> -timer_mod_ns(sd->ocr_power_timer,
> - (qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL)
> -  + OCR_POWER_DELAY_NS));
> -}
> +if (sd->state != sd_idle_state) {
> +break;
> +}
> +/* If it's the first ACMD41 since reset, we need to decide
> + * whether to power up. If this is not an enquiry ACMD41,
> + * we immediately report power on and proceed below to the
> + * ready state, but if it is, we set a timer to model a
> + * delay for power up. This works around a bug in EDK2
> + * UEFI, which sends an initial enquiry ACMD41, but
> + * assumes that the card is in ready state as soon as it
> + * sees the power up bit set. */
> +if (!FIELD_EX32(sd->ocr, OCR, CARD_POWER_UP)) {
> +if ((req.arg & ACMD41_ENQUIRY_MASK) != 0) {
> +timer_del(sd->ocr_power_timer);
> +sd_ocr_powerup(sd);
> +} else {
> +trace_sdcard_inquiry_cmd41();
> +if (!timer_pending(sd->ocr_power_timer)) {
> +timer_mod_ns(sd->ocr_power_timer,
> + (qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL)
> +  + OCR_POWER_DELAY_NS));
>  }
>  }
> +}
>
> +if (FIELD_EX32(sd->ocr & req.arg, OCR, VDD_VOLTAGE_WINDOW)) {
>  /* We accept any voltage.  1 V is nothing.
>   *
>   * Once we're powered up, we advance straight to ready state
>   * unless it's an enquiry ACMD41 (bits 23:0 == 0).
>   */
> -if (req.arg & ACMD41_ENQUIRY_MASK) {
> -sd->state = sd_ready_state;
> -}
> -
> -return sd_r3;
> -
> -default:
> -break;
> +sd->state = sd_ready_state;
>  }
> -break;
> +
> +return sd_r3;
>
>  case 42:   /* ACMD42: SET_CLR_CARD_DETECT */
>  switch (sd->state) {
> --
> 2.15.1
>
>



[Qemu-devel] [PATCH v5 5/6] xlnx-zcu102: Specify the valid CPUs

2018-02-01 Thread Alistair Francis
List all possible valid CPU options.

Signed-off-by: Alistair Francis 
Reviewed-by: Eduardo Habkost 
Reviewed-by: Philippe Mathieu-Daudé 
---

An implementation for single CPU machines is still being discussed. A
solution proposed by Eduardo is this:

1) Change the default on TYPE_MACHINE to:
 mc->valid_cpu_types = { TYPE_CPU, NULL };

   This will keep the existing behavior for all boards.

2) mc->valid_cpu_types=NULL be interpreted as "no CPU model
   except the default is accepted" or "-cpu is not accepted" in
   machine_run_board_init() (I prefer the former, but both
   options would be correct)

3) Boards like xlnx_zynqmp could then just do this:

   static void xxx_class_init(...) {
   mc->default_cpu_type = MY_CPU_TYPE;
   /* Reason: XXX_init() is hardcoded to MY_CPU_TYPE */
   mc->valid_cpu_types = NULL;
   }

V5:
 - Use cpu_model names
V4:
 - Remove spaces
V3:
 - Make variable static
V2:
 - Don't use the users -cpu
 - Fixup alignment

 hw/arm/xlnx-zcu102.c | 17 +
 1 file changed, 17 insertions(+)

diff --git a/hw/arm/xlnx-zcu102.c b/hw/arm/xlnx-zcu102.c
index b126cf148b..994b19a36f 100644
--- a/hw/arm/xlnx-zcu102.c
+++ b/hw/arm/xlnx-zcu102.c
@@ -184,6 +184,11 @@ static void xlnx_zynqmp_init(XlnxZCU102 *s, MachineState *machine)
 arm_load_kernel(s->soc.boot_cpu_ptr, &xlnx_zcu102_binfo);
 }
 
+static const char *xlnx_zynqmp_valid_cpus[] = {
+"cortex-a53",
+NULL
+};
+
 static void xlnx_ep108_init(MachineState *machine)
 {
 XlnxZCU102 *s = EP108_MACHINE(machine);
@@ -216,6 +221,12 @@ static void xlnx_ep108_machine_class_init(ObjectClass *oc, void *data)
 mc->ignore_memory_transaction_failures = true;
 mc->max_cpus = XLNX_ZYNQMP_NUM_APU_CPUS + XLNX_ZYNQMP_NUM_RPU_CPUS;
 mc->default_cpus = XLNX_ZYNQMP_NUM_APU_CPUS;
+mc->default_cpu_type = ARM_CPU_TYPE_NAME("cortex-a53");
+/* The ZynqMP SoC is always a Cortex-A53. We add this here to give
+ * users a sane error if they specify a different CPU, but we never
+ * use their CPU choice.
+ */
+mc->valid_cpu_types = xlnx_zynqmp_valid_cpus;
 }
 
 static const TypeInfo xlnx_ep108_machine_init_typeinfo = {
@@ -274,6 +285,12 @@ static void xlnx_zcu102_machine_class_init(ObjectClass *oc, void *data)
 mc->ignore_memory_transaction_failures = true;
 mc->max_cpus = XLNX_ZYNQMP_NUM_APU_CPUS + XLNX_ZYNQMP_NUM_RPU_CPUS;
 mc->default_cpus = XLNX_ZYNQMP_NUM_APU_CPUS;
+mc->default_cpu_type = ARM_CPU_TYPE_NAME("cortex-a53");
+/* The ZynqMP SoC is always a Cortex-A53. We add this here to give
+ * users a sane error if they specify a different CPU, but we never
+ * use their CPU choice.
+ */
+mc->valid_cpu_types = xlnx_zynqmp_valid_cpus;
 }
 
 static const TypeInfo xlnx_zcu102_machine_init_typeinfo = {
-- 
2.14.1




[Qemu-devel] [PATCH v5 2/6] netduino2: Specify the valid CPUs

2018-02-01 Thread Alistair Francis
List all possible valid CPU options.

Although the board only ever has a Cortex-M3 we mark the Cortex-M4 as
supported because the Netduino2 Plus supports the Cortex-M4 and the
Netduino2 Plus is similar to the Netduino2.

Signed-off-by: Alistair Francis 
Reviewed-by: Eduardo Habkost 
Reviewed-by: Philippe Mathieu-Daudé 
---

V5:
 - Use cpu_model names
V3:
 - Add static property
V2:
 - Fixup alignment
RFC v2:
 - Use a NULL terminated list
 - Add the Cortex-M4 for testing


 hw/arm/netduino2.c | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/hw/arm/netduino2.c b/hw/arm/netduino2.c
index f936017d4a..91c925a56c 100644
--- a/hw/arm/netduino2.c
+++ b/hw/arm/netduino2.c
@@ -34,18 +34,26 @@ static void netduino2_init(MachineState *machine)
 DeviceState *dev;
 
 dev = qdev_create(NULL, TYPE_STM32F205_SOC);
-qdev_prop_set_string(dev, "cpu-type", ARM_CPU_TYPE_NAME("cortex-m3"));
+qdev_prop_set_string(dev, "cpu-type", machine->cpu_type);
 object_property_set_bool(OBJECT(dev), true, "realized", &error_fatal);
 
 armv7m_load_kernel(ARM_CPU(first_cpu), machine->kernel_filename,
FLASH_SIZE);
 }
 
+static const char *netduino_valid_cpus[] = {
+"cortex-m3",
+"cortex-m4",
+NULL
+};
+
 static void netduino2_machine_init(MachineClass *mc)
 {
 mc->desc = "Netduino 2 Machine";
 mc->init = netduino2_init;
 mc->ignore_memory_transaction_failures = true;
+mc->default_cpu_type = ARM_CPU_TYPE_NAME("cortex-m3");
+mc->valid_cpu_types = netduino_valid_cpus;
 }
 
 DEFINE_MACHINE("netduino2", netduino2_machine_init)
-- 
2.14.1




[Qemu-devel] [PATCH v5 3/6] bcm2836: Use the Cortex-A7 instead of Cortex-A15

2018-02-01 Thread Alistair Francis
The BCM2836 uses a Cortex-A7 not a Cortex-A15. Update the device to use
the correct CPU.
https://www.raspberrypi.org/documentation/hardware/raspberrypi/bcm2836/QA7_rev3.4.pdf

Signed-off-by: Alistair Francis 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Igor Mammedov 
---
V3:
 - Use ARM_CPU_TYPE_NAME() macro
V2:
 - Fix the BCM2836 CPU

 hw/arm/bcm2836.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/arm/bcm2836.c b/hw/arm/bcm2836.c
index 8c43291112..c42484 100644
--- a/hw/arm/bcm2836.c
+++ b/hw/arm/bcm2836.c
@@ -30,7 +30,7 @@ static void bcm2836_init(Object *obj)
 
 for (n = 0; n < BCM2836_NCPUS; n++) {
>  object_initialize(&s->cpus[n], sizeof(s->cpus[n]),
> -  "cortex-a15-" TYPE_ARM_CPU);
> +  ARM_CPU_TYPE_NAME("cortex-a7"));
>  object_property_add_child(obj, "cpu[*]", OBJECT(&s->cpus[n]),
>    &error_abort);
 }
-- 
2.14.1




[Qemu-devel] [PATCH v5 6/6] xilinx_zynq: Specify the valid CPUs

2018-02-01 Thread Alistair Francis
List all possible valid CPU options.

Signed-off-by: Alistair Francis 
Reviewed-by: Philippe Mathieu-Daudé 
---

V5:
 - Use cpu_model names
V4:
 - Remove spaces
V3:
 - Make variable static
V2:
 - Fixup alignment

 hw/arm/xilinx_zynq.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/hw/arm/xilinx_zynq.c b/hw/arm/xilinx_zynq.c
index 1836a4ed45..822e1889a5 100644
--- a/hw/arm/xilinx_zynq.c
+++ b/hw/arm/xilinx_zynq.c
@@ -313,6 +313,11 @@ static void zynq_init(MachineState *machine)
  arm_load_kernel(ARM_CPU(first_cpu), &zynq_binfo);
 }
 
+static const char *xlnx_zynq_7000_valid_cpus[] = {
+"cortex-a9",
+NULL
+};
+
 static void zynq_machine_init(MachineClass *mc)
 {
 mc->desc = "Xilinx Zynq Platform Baseboard for Cortex-A9";
@@ -321,6 +326,7 @@ static void zynq_machine_init(MachineClass *mc)
 mc->no_sdcard = 1;
 mc->ignore_memory_transaction_failures = true;
 mc->default_cpu_type = ARM_CPU_TYPE_NAME("cortex-a9");
+mc->valid_cpu_types = xlnx_zynq_7000_valid_cpus;
 }
 
 DEFINE_MACHINE("xilinx-zynq-a9", zynq_machine_init)
-- 
2.14.1




[Qemu-devel] [PATCH v5 1/6] machine: Convert the valid cpu types to use cpu_model

2018-02-01 Thread Alistair Francis
As cpu_type is not a user-visible string, let's convert the
valid_cpu_types check to compare against cpu_model instead. This way we
have a user-friendly string to report back.

Once we have a cpu_type to cpu_model conversion this patch should be
reverted and we should use cpu_type instead.

Signed-off-by: Alistair Francis 
---

 hw/core/machine.c | 11 +--
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/hw/core/machine.c b/hw/core/machine.c
index cdc1163dc6..de5bac1c84 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -776,13 +776,12 @@ void machine_run_board_init(MachineState *machine)
 /* If the machine supports the valid_cpu_types check and the user
  * specified a CPU with -cpu check here that the user CPU is supported.
  */
-if (machine_class->valid_cpu_types && machine->cpu_type) {
-ObjectClass *class = object_class_by_name(machine->cpu_type);
+if (machine_class->valid_cpu_types && machine->cpu_model) {
 int i;
 
 for (i = 0; machine_class->valid_cpu_types[i]; i++) {
-if (object_class_dynamic_cast(class,
-  machine_class->valid_cpu_types[i])) {
+if (!strcmp(machine->cpu_model,
+machine_class->valid_cpu_types[i])) {
 /* The user specificed CPU is in the valid field, we are
  * good to go.
  */
@@ -792,8 +791,8 @@ void machine_run_board_init(MachineState *machine)
 
 if (!machine_class->valid_cpu_types[i]) {
 /* The user specified CPU is not valid */
-error_report("Invalid CPU type: %s", machine->cpu_type);
-error_printf("The valid types are: %s",
+error_report("Invalid CPU model: %s", machine->cpu_model);
+error_printf("The valid models are: %s",
  machine_class->valid_cpu_types[0]);
 for (i = 1; machine_class->valid_cpu_types[i]; i++) {
 error_printf(", %s", machine_class->valid_cpu_types[i]);
-- 
2.14.1




[Qemu-devel] [PATCH v5 0/6] Add a valid_cpu_types property

2018-02-01 Thread Alistair Francis

There are numerous QEMU machines that only have a single or a handful of
valid CPU options. To simplify the management of specifying which CPU
is/isn't valid, let's create a property that can be set in the machine
init. We can then check to see if the user-supplied CPU is in that list
or not.

I have added the valid_cpu_types for some ARM machines only at the
moment.

Here is what specifying the CPUs looks like now:

$ aarch64-softmmu/qemu-system-aarch64 -M netduino2 -kernel ./u-boot.elf 
-nographic -cpu "cortex-m3" -S
QEMU 2.10.50 monitor - type 'help' for more information
(qemu) info cpus
* CPU #0: thread_id=24175
(qemu) q

$ aarch64-softmmu/qemu-system-aarch64 -M netduino2 -kernel ./u-boot.elf 
-nographic -cpu "cortex-m4" -S
QEMU 2.10.50 monitor - type 'help' for more information
(qemu) q

$ aarch64-softmmu/qemu-system-aarch64 -M netduino2 -kernel ./u-boot.elf 
-nographic -cpu "cortex-m5" -S
qemu-system-aarch64: unable to find CPU model 'cortex-m5'

$ aarch64-softmmu/qemu-system-aarch64 -M netduino2 -kernel ./u-boot.elf 
-nographic -cpu "cortex-a9" -S
qemu-system-aarch64: Invalid CPU type: cortex-a9-arm-cpu
The valid models are: cortex-m3, cortex-m4

V5:
 - Use cpu_model instead of cpu_type
V4:
 - Rebase
 - Remove spaces
V3:
 - Make the variables static
V2:
 - Rebase
 - Reorder patches
 - Add a Raspberry Pi 2 CPU fix
V1:
 - Small fixes to prepare a series instead of RFC
 - Add commit messages for the commits
 - Expand the machine support to ARM machines
RFC v2:
 - Rebase on Igor's work
 - Use more QEMUisms inside the code
 - List the supported machines in a NULL terminated array



Alistair Francis (6):
  machine: Convert the valid cpu types to use cpu_model
  netduino2: Specify the valid CPUs
  bcm2836: Use the Cortex-A7 instead of Cortex-A15
  raspi: Specify the valid CPUs
  xlnx-zcu102: Specify the valid CPUs
  xilinx_zynq: Specify the valid CPUs

 hw/arm/bcm2836.c |  2 +-
 hw/arm/netduino2.c   | 10 +-
 hw/arm/raspi.c   |  7 +++
 hw/arm/xilinx_zynq.c |  6 ++
 hw/arm/xlnx-zcu102.c | 17 +
 hw/core/machine.c| 11 +--
 6 files changed, 45 insertions(+), 8 deletions(-)

-- 
2.14.1




[Qemu-devel] [PATCH v5 4/6] raspi: Specify the valid CPUs

2018-02-01 Thread Alistair Francis
List all possible valid CPU options.

Signed-off-by: Alistair Francis 
Reviewed-by: Philippe Mathieu-Daudé 
---

V5:
 - Use cpu_model names
V4:
 - Remove spaces
V3:
 - Add static property
V2:
 - Fix the indentation

 hw/arm/raspi.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/hw/arm/raspi.c b/hw/arm/raspi.c
index cd5fa8c3dc..745a880726 100644
--- a/hw/arm/raspi.c
+++ b/hw/arm/raspi.c
@@ -158,6 +158,11 @@ static void raspi2_init(MachineState *machine)
 setup_boot(machine, 2, machine->ram_size - vcram_size);
 }
 
+static const char *raspi2_valid_cpus[] = {
+"cortex-a7",
+NULL
+};
+
 static void raspi2_machine_init(MachineClass *mc)
 {
 mc->desc = "Raspberry Pi 2";
@@ -171,5 +176,7 @@ static void raspi2_machine_init(MachineClass *mc)
 mc->default_cpus = BCM2836_NCPUS;
 mc->default_ram_size = 1024 * 1024 * 1024;
 mc->ignore_memory_transaction_failures = true;
+mc->default_cpu_type = ARM_CPU_TYPE_NAME("cortex-a7");
+mc->valid_cpu_types = raspi2_valid_cpus;
 };
 DEFINE_MACHINE("raspi2", raspi2_machine_init)
-- 
2.14.1




Re: [Qemu-devel] Functional tests (AKA Avocado-based tests)

2018-02-01 Thread Alistair Francis
On Thu, Jan 18, 2018 at 1:25 AM, Lukáš Doktor  wrote:
> On 18.1.2018 at 00:41, Alistair Francis wrote:
>> On Wed, Jan 17, 2018 at 12:05 AM, Cleber Rosa  wrote:
>>> TL;DR
>>> =
>>>
>>> This is about how QEMU developers can get started with functional
>>> tests that are built on top of the Avocado libraries (and meant to be
>>> run with the Avocado test runner).
>>>
>>> The past
>>> 
>>>
>>> The Avocado project[1] has been working, for quite some time now, on a
>>> "set of tools and libraries" with the goal of making writing tests
>>> easier.  It is supposed to be a framework agnostic to the exact
>>> software that will be under test.
>>>
>>> But, at the same time, the Avocado project cannot deny its inheritance
>>> and influences.  Those come from Autotest[2], which had "KVM Autotest"
>>> as its largest and most developed "test".  This large Autotest test
>>> (KVM Autotest) became virt-test[3] and later got integrated into
>>> Avocado and became Avocado-VT[4] which is quite relevant here,
>>> together with its QEMU test provider[5].
>>>
>>> Avocado-VT and the QEMU test provider attempt to provide coverage
>>> across platform and QEMU versions, which increases its complexity.
>>> Also, it's built on a legacy set of principles and tools that makes
>>> some developers steer away from it.
>>>
>>> What's new?
>>> ===
>>>
>>> A few months ago, the Avocado developers returned to its
>>> "virtualization origins", in an attempt to learn from the QEMU
>>> project, and try to help with a way to have more functional tests in
>>> the upstream QEMU repo.
>>>
>>> We believe it's possible to expand the test coverage for QEMU by
>>> facilitating
>>> the creation of more functional tests QEMU.  This is no different than how
>>> other types of tests are already included in the tree itself.
>>>
>>> How
>>> ===
>>>
>>> How we did it (so far)
>>> --
>>>
>>> We're aware that there's a dilemma here: to be able to easily write
>>> more powerful tests, a lot of the complexity has to be moved
>>> elsewhere.  Here, it means moving complexity from the test itself to a
>>> framework.  The QEMU source tree itself has proof of this approach,
>>> "scripts" and "tests/qemu-iotests" being some of the examples.
>>>
>>> Avocado itself[1] provides a lot of the code that should help to
>>> absorb some of the complexities in writing tests, but not exactly
>>> everything that is needed for QEMU.  The approach we believe will have
>>> the best balance is to reuse upstream Avocado libraries whenever they
>>> are useful and generic enough, and on top of that, libraries that are
>>> part of QEMU itself.
>>>
>>> How can you get started with it
>>> ---
>>>
>>> First of all, get Avocado installed.  Besides the Avocado test runner
>>> itself, this will give you the basic libraries on which the other part
>>> of this work was built on.  We want that to be simple and painless, so
>>> here's our best bet for a one-liner installation:
>>>
>>>   pip install --user avocado-framework
>>> avocado-framework-plugin-varianter-yaml-to-mux aexpect
>>>
>>> That will install Avocado within the user's home directory.  If you
>>> give up on it, it can be uninstalled with another simple one-liner:
>>>
>>>   pip uninstall -y avocado-framework
>>> avocado-framework-plugin-varianter-yaml-to-mux aexpect
>>>
>>> Now, suppose you're working on a given feature, and want to try your
>>> luck writing a test using this work.  To avoid having you fetching and
>>> rebasing from our currently in development fork[6] and branch[7], you
>>> can just
>>> add one commit to your tree with:
>>>
>>>   curl
>>> https://patch-diff.githubusercontent.com/raw/apahim/qemu/pull/17.patch |
>>> git am -
>>>
>>> This will get a simple patch from a snapshot branch[8].  You can, of course,
>>> do it "the git way", fetching from that repo[6] and using the
>>> non-snapshotted branch.
>>>
>>> After that, we'd love for you to take a look at some of the existing
>>> tests[9][10] and then attempt to create a test for your own use case.
>>> The basic README[11] file, and the Avocado documentation[12] are also
>>> important resources not to be missed.
>>>
>>> What's next?
>>> 
>>>
>>> Initially, feedback is what we're looking for.  It would be greatly
>>> appreciated to understand if/how this suits (or not) use cases out
>>> there.
>>>
>>> After feedback, further refinements, and more tests are written, the
>>> Avocado developers will follow up with an initial patch series for
>>> upstream QEMU.  In such a proposal, we intend to have further
>>> integration.  Ideally in way that "configure" can be given a
>>> "--with-functional-[avocado-]tests" parameter of sorts, and a "make
>>>  [functional-]check" would seamlessly include them.
>>
>> I have a few thoughts.
>>
>> We use pytest/pexpect internally to kick off QEMU runs and monitor the
>> output (no interaction with the QEMU source tree) and 

Re: [Qemu-devel] Functional tests (AKA Avocado-based tests)

2018-02-01 Thread Alistair Francis
On Wed, Jan 17, 2018 at 4:47 PM, Cleber Rosa  wrote:
>
>
> On 01/17/2018 06:41 PM, Alistair Francis wrote:
>> On Wed, Jan 17, 2018 at 12:05 AM, Cleber Rosa  wrote:
>>> TL;DR
>>> =
>>>
>>> This is about how QEMU developers can get started with functional
>>> tests that are built on top of the Avocado libraries (and meant to be
>>> run with the Avocado test runner).
>>>
>>> The past
>>> 
>>>
>>> The Avocado project[1] has been working, for quite some time now, on a
>>> "set of tools and libraries" with the goal of making writing tests
>>> easier.  It is supposed to be a framework agnostic to the exact
>>> software that will be under test.
>>>
>>> But, at the same time, the Avocado project cannot deny its inheritance
>>> and influences.  Those come from Autotest[2], which had "KVM Autotest"
>>> as its largest and most developed "test".  This large Autotest test
>>> (KVM Autotest) became virt-test[3] and later got integrated into
>>> Avocado and became Avocado-VT[4] which is quite relevant here,
>>> together with its QEMU test provider[5].
>>>
>>> Avocado-VT and the QEMU test provider attempt to provide coverage
>>> across platform and QEMU versions, which increases its complexity.
>>> Also, it's built on a legacy set of principles and tools that makes
>>> some developers steer away from it.
>>>
>>> What's new?
>>> ===
>>>
>>> A few months ago, the Avocado developers returned to its
>>> "virtualization origins", in an attempt to learn from the QEMU
>>> project, and try to help with a way to have more functional tests in
>>> the upstream QEMU repo.
>>>
>>> We believe it's possible to expand the test coverage for QEMU by
>>> facilitating
>>> the creation of more functional tests QEMU.  This is no different than how
>>> other types of tests are already included in the tree itself.
>>>
>>> How
>>> ===
>>>
>>> How we did it (so far)
>>> --
>>>
>>> We're aware that there's a dilemma here: to be able to easily write
>>> more powerful tests, a lot of the complexity has to be moved
>>> elsewhere.  Here, it means moving complexity from the test itself to a
>>> framework.  The QEMU source tree itself has proof of this approach,
>>> "scripts" and "tests/qemu-iotests" being some of the examples.
>>>
>>> Avocado itself[1] provides a lot of the code that should help to
>>> absorb some of the complexities in writing tests, but not exactly
>>> everything that is needed for QEMU.  The approach we believe will have
>>> the best balance is to reuse upstream Avocado libraries whenever they
>>> are useful and generic enough, and on top of that, libraries that are
>>> part of QEMU itself.
>>>
>>> How can you get started with it
>>> ---
>>>
>>> First of all, get Avocado installed.  Besides the Avocado test runner
>>> itself, this will give you the basic libraries on which the other part
>>> of this work was built on.  We want that to be simple and painless, so
>>> here's our best bet for a one-liner installation:
>>>
>>>   pip install --user avocado-framework
>>> avocado-framework-plugin-varianter-yaml-to-mux aexpect
>>>
>>> That will install Avocado within the user's home directory.  If you
>>> give up on it, it can be uninstalled with another simple one-liner:
>>>
>>>   pip uninstall -y avocado-framework
>>> avocado-framework-plugin-varianter-yaml-to-mux aexpect
>>>
>>> Now, suppose you're working on a given feature, and want to try your
>>> luck writing a test using this work.  To avoid having you fetch and
>>> rebase from our currently-in-development fork[6] and branch[7], you can
>>> just add one commit to your tree with:
>>>
>>>   curl
>>> https://patch-diff.githubusercontent.com/raw/apahim/qemu/pull/17.patch |
>>> git am -
>>>
>>> This will get a simple patch from a snapshot branch[8].  You can, of course,
>>> do it "the git way", fetching from that repo[6] and using the
>>> non-snapshotted branch.
>>>
>>> After that, we'd love for you to take a look at some of the existing
>>> tests[9][10] and then attempt to create a test for your own use case.
>>> The basic README[11] file, and the Avocado documentation[12] are also
>>> important resources not to be missed.
>>>
>>> What's next?
>>> ============
>>>
>>> Initially, feedback is what we're looking for.  We would greatly
>>> appreciate learning whether (and how) this suits use cases out
>>> there.
>>>
>>> After feedback, further refinements, and more tests are written, the
>>> Avocado developers will follow up with an initial patch series for
>>> upstream QEMU.  In such a proposal, we intend to have further
>>> integration, ideally in a way that "configure" can be given a
>>> "--with-functional-[avocado-]tests" parameter of sorts, and a "make
>>> [functional-]check" would seamlessly include them.
>>
>> I have a few thoughts.
>>
>> We use pytest/pexpect internally to kick off QEMU runs and monitor the
>> output (no interaction with the QEMU source tree) and I 

Re: [Qemu-devel] [PULL 17/51] readline: add a free function

2018-02-01 Thread Paolo Bonzini
On 01/02/2018 19:00, Alex Williamson wrote:
> On Tue, 16 Jan 2018 15:16:59 +0100
> Paolo Bonzini  wrote:
> 
>> From: Marc-André Lureau 
>>
>> Fixes leaks such as:
>>
>> Direct leak of 2 byte(s) in 1 object(s) allocated from:
>> #0 0x7eff58beb850 in malloc (/lib64/libasan.so.4+0xde850)
>> #1 0x7eff57942f0c in g_malloc ../glib/gmem.c:94
>> #2 0x7eff579431cf in g_malloc_n ../glib/gmem.c:331
>> #3 0x7eff5795f6eb in g_strdup ../glib/gstrfuncs.c:363
>> #4 0x55db720f1d46 in readline_hist_add 
>> /home/elmarco/src/qq/util/readline.c:258
>> #5 0x55db720f2d34 in readline_handle_byte 
>> /home/elmarco/src/qq/util/readline.c:387
>> #6 0x55db71539d00 in monitor_read /home/elmarco/src/qq/monitor.c:3896
>> #7 0x55db71f9be35 in qemu_chr_be_write_impl 
>> /home/elmarco/src/qq/chardev/char.c:167
>> #8 0x55db71f9bed3 in qemu_chr_be_write 
>> /home/elmarco/src/qq/chardev/char.c:179
>> #9 0x55db71fa013c in fd_chr_read 
>> /home/elmarco/src/qq/chardev/char-fd.c:66
>> #10 0x55db71fe18a8 in qio_channel_fd_source_dispatch 
>> /home/elmarco/src/qq/io/channel-watch.c:84
>> #11 0x7eff5793a90b in g_main_dispatch ../glib/gmain.c:3182
>> #12 0x7eff5793b7ac in g_main_context_dispatch ../glib/gmain.c:3847
>> #13 0x55db720af3bd in glib_pollfds_poll 
>> /home/elmarco/src/qq/util/main-loop.c:214
>> #14 0x55db720af505 in os_host_main_loop_wait 
>> /home/elmarco/src/qq/util/main-loop.c:261
>> #15 0x55db720af6d6 in main_loop_wait 
>> /home/elmarco/src/qq/util/main-loop.c:515
>> #16 0x55db7184e0de in main_loop /home/elmarco/src/qq/vl.c:1995
>> #17 0x55db7185e956 in main /home/elmarco/src/qq/vl.c:4914
>> #18 0x7eff4ea17039 in __libc_start_main (/lib64/libc.so.6+0x21039)
>>
>> (while at it, use g_new0(ReadLineState), it's a bit easier to read)
>>
>> Signed-off-by: Marc-André Lureau 
>> Reviewed-by: Dr. David Alan Gilbert 
>> Reviewed-by: Philippe Mathieu-Daudé 
>> Message-Id: <20180104160523.22995-11-marcandre.lur...@redhat.com>
>> Signed-off-by: Paolo Bonzini 
>> ---
> 
> I'm having some trouble with this patch, using b05631954d6d:
> 
> # /usr/local/bin/qemu-system-x86_64 -m 1G -nodefaults -net none -monitor 
> stdio -serial none -parallel none -nographic 
> QEMU 2.11.50 monitor - type 'help' for more information
> (qemu) sys
> system_powerdown  system_reset  system_wakeup 
> (qemu) system_p# resulting in system_powerdown
> (qemu) quit
> Segmentation fault (core dumped)
> 
> gdb shows:
> 
> Thread 1 "qemu-system-x86" received signal SIGSEGV, Segmentation fault.
> 0x7f7d64d82927 in malloc () from /lib64/libc.so.6
> (gdb) bt
> #0  0x7f7d64d82927 in malloc () at /lib64/libc.so.6
> #1  0x7f7d6ef68359 in g_malloc () at /lib64/libglib-2.0.so.0
> #2  0x7f7d6ef83004 in g_strsplit () at /lib64/libglib-2.0.so.0
> #3  0x55e5ac0d549d in container_get (root=0x55e5ad570ee0, 
> path=path@entry=0x55e5ac2fa0f8 "/chardevs") at qom/container.c:34
> #4  0x55e5ac14d102 in get_chardevs_root () at chardev/char.c:43
> #5  0x55e5ac14ec4d in qemu_chr_cleanup () at chardev/char.c:1107
> #6  0x55e5abeff1c4 in main (argc=, argv=, 
> envp=) at vl.c:4780
> 
> Reverting this patch, commit e5dc1a6c6c435, I don't see the issue.
> Thanks,

Yeah, I have a fix queued.  Unfortunately, I don't have the usual setup
to do pre-pull-request tests here, so it will have to wait for next Monday.

Paolo



Re: [Qemu-devel] [PULL 17/51] readline: add a free function

2018-02-01 Thread Alex Williamson
On Tue, 16 Jan 2018 15:16:59 +0100
Paolo Bonzini  wrote:

> From: Marc-André Lureau 
> 
> Fixes leaks such as:
> 
> Direct leak of 2 byte(s) in 1 object(s) allocated from:
> #0 0x7eff58beb850 in malloc (/lib64/libasan.so.4+0xde850)
> #1 0x7eff57942f0c in g_malloc ../glib/gmem.c:94
> #2 0x7eff579431cf in g_malloc_n ../glib/gmem.c:331
> #3 0x7eff5795f6eb in g_strdup ../glib/gstrfuncs.c:363
> #4 0x55db720f1d46 in readline_hist_add 
> /home/elmarco/src/qq/util/readline.c:258
> #5 0x55db720f2d34 in readline_handle_byte 
> /home/elmarco/src/qq/util/readline.c:387
> #6 0x55db71539d00 in monitor_read /home/elmarco/src/qq/monitor.c:3896
> #7 0x55db71f9be35 in qemu_chr_be_write_impl 
> /home/elmarco/src/qq/chardev/char.c:167
> #8 0x55db71f9bed3 in qemu_chr_be_write 
> /home/elmarco/src/qq/chardev/char.c:179
> #9 0x55db71fa013c in fd_chr_read /home/elmarco/src/qq/chardev/char-fd.c:66
> #10 0x55db71fe18a8 in qio_channel_fd_source_dispatch 
> /home/elmarco/src/qq/io/channel-watch.c:84
> #11 0x7eff5793a90b in g_main_dispatch ../glib/gmain.c:3182
> #12 0x7eff5793b7ac in g_main_context_dispatch ../glib/gmain.c:3847
> #13 0x55db720af3bd in glib_pollfds_poll 
> /home/elmarco/src/qq/util/main-loop.c:214
> #14 0x55db720af505 in os_host_main_loop_wait 
> /home/elmarco/src/qq/util/main-loop.c:261
> #15 0x55db720af6d6 in main_loop_wait 
> /home/elmarco/src/qq/util/main-loop.c:515
> #16 0x55db7184e0de in main_loop /home/elmarco/src/qq/vl.c:1995
> #17 0x55db7185e956 in main /home/elmarco/src/qq/vl.c:4914
> #18 0x7eff4ea17039 in __libc_start_main (/lib64/libc.so.6+0x21039)
> 
> (while at it, use g_new0(ReadLineState), it's a bit easier to read)
> 
> Signed-off-by: Marc-André Lureau 
> Reviewed-by: Dr. David Alan Gilbert 
> Reviewed-by: Philippe Mathieu-Daudé 
> Message-Id: <20180104160523.22995-11-marcandre.lur...@redhat.com>
> Signed-off-by: Paolo Bonzini 
> ---

I'm having some trouble with this patch, using b05631954d6d:

# /usr/local/bin/qemu-system-x86_64 -m 1G -nodefaults -net none -monitor stdio 
-serial none -parallel none -nographic 
QEMU 2.11.50 monitor - type 'help' for more information
(qemu) sys
system_powerdown  system_reset  system_wakeup 
(qemu) system_p# resulting in system_powerdown
(qemu) quit
Segmentation fault (core dumped)

gdb shows:

Thread 1 "qemu-system-x86" received signal SIGSEGV, Segmentation fault.
0x7f7d64d82927 in malloc () from /lib64/libc.so.6
(gdb) bt
#0  0x7f7d64d82927 in malloc () at /lib64/libc.so.6
#1  0x7f7d6ef68359 in g_malloc () at /lib64/libglib-2.0.so.0
#2  0x7f7d6ef83004 in g_strsplit () at /lib64/libglib-2.0.so.0
#3  0x55e5ac0d549d in container_get (root=0x55e5ad570ee0, 
path=path@entry=0x55e5ac2fa0f8 "/chardevs") at qom/container.c:34
#4  0x55e5ac14d102 in get_chardevs_root () at chardev/char.c:43
#5  0x55e5ac14ec4d in qemu_chr_cleanup () at chardev/char.c:1107
#6  0x55e5abeff1c4 in main (argc=, argv=, 
envp=) at vl.c:4780

Reverting this patch, commit e5dc1a6c6c435, I don't see the issue.
Thanks,

Alex



Re: [Qemu-devel] [PATCH v3 36/39] qcow2: Allow configuring the L2 slice size

2018-02-01 Thread Max Reitz
On 2018-01-26 16:00, Alberto Garcia wrote:
> Now that the code is ready to handle L2 slices we can finally add an
> option to allow configuring their size.
> 
> An L2 slice is the portion of an L2 table that is read by the qcow2
> cache. Until now the cache was always reading full L2 tables, and
> since the L2 table size is equal to the cluster size this was not very
> efficient with large clusters. Here's a more detailed explanation of
> why it makes sense to have smaller cache entries in order to load L2
> data:
> 
>https://lists.gnu.org/archive/html/qemu-block/2017-09/msg00635.html
> 
> This patch introduces a new command-line option to the qcow2 driver
> named l2-cache-entry-size (cf. l2-cache-size). The cache entry size
> has the same restrictions as the cluster size: it must be a power of
> two and it has the same range of allowed values, with the additional
> requirement that it must not be larger than the cluster size.
> 
> The L2 cache entry size (L2 slice size) remains equal to the cluster
> size for now by default, so this feature must be explicitly enabled.
> Although my tests show that 4KB slices consistently improve
> performance and give the best results, let's wait and make more tests
> with different cluster sizes before deciding on an optimal default.
> 
> Now that the cache entry size is not necessarily equal to the cluster
> size we need to reflect that in the MIN_L2_CACHE_SIZE documentation.
> That minimum value is a requirement of the COW algorithm: we need to
> read two L2 slices (and not two L2 tables) in order to do COW, see
> l2_allocate() for the actual code.
> 
> Signed-off-by: Alberto Garcia 
> ---
>  block/qcow2-cache.c  | 10 --
>  block/qcow2.c| 34 +++---
>  block/qcow2.h|  6 --
>  qapi/block-core.json |  6 ++
>  4 files changed, 45 insertions(+), 11 deletions(-)

Reviewed-by: Max Reitz 



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [PULL 0/8] x86 queue, 2018-01-17

2018-02-01 Thread Michael Roth
Quoting Eduardo Habkost (2018-01-26 12:08:16)
> It looks like there's some confusion here: I don't have
> additional proposed changes; the flag names and MSR code I
> mentioned are already merged on master (through this pull
> request).

Ah, my mistake, I thought the flags were being proposed as a follow-up.
Will pull these in as-is then. Thanks!

> 
> -- 
> Eduardo
> 




[Qemu-devel] [PATCH 1/2] tpm: Split off tpm_crb_reset function

2018-02-01 Thread Stefan Berger
Split off a tpm_crb_reset function from tpm_crb_realize, containing
the part that we need to run every time the machine resets.

Signed-off-by: Stefan Berger 
---
 hw/tpm/tpm_crb.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/hw/tpm/tpm_crb.c b/hw/tpm/tpm_crb.c
index 687d255..624e2e9 100644
--- a/hw/tpm/tpm_crb.c
+++ b/hw/tpm/tpm_crb.c
@@ -232,6 +232,11 @@ static void tpm_crb_realize(DeviceState *dev, Error **errp)
 TPM_CRB_ADDR_BASE, &s->mmio);
 memory_region_add_subregion(get_system_memory(),
 TPM_CRB_ADDR_BASE + sizeof(s->regs), &s->cmdmem);
+}
+
+static void tpm_crb_reset(DeviceState *dev)
+{
+CRBState *s = CRB(dev);
 
 tpm_backend_reset(s->tpmbe);
 
@@ -274,6 +279,7 @@ static void tpm_crb_class_init(ObjectClass *klass, void 
*data)
 
 dc->realize = tpm_crb_realize;
 dc->props = tpm_crb_properties;
+dc->reset = tpm_crb_reset;
 dc->vmsd  = &vmstate_tpm_crb;
 dc->user_creatable = true;
 tc->model = TPM_MODEL_TPM_CRB;
-- 
2.5.5




[Qemu-devel] [PATCH 0/2] tpm: A fix and a cleanup

2018-02-01 Thread Stefan Berger
The following two patches fix the resetting of the CRB interface
and wrap calls to st{w,l}_be_p in tpm_cmd_set_XYZ functions.

   Stefan

Stefan Berger (2):
  tpm: Split off tpm_crb_reset function
  tpm: wrap stX_be_p in tpm_cmd_set_XYZ functions

 hw/tpm/tpm_crb.c  |  6 ++
 hw/tpm/tpm_util.c |  6 +++---
 hw/tpm/tpm_util.h | 15 +++
 3 files changed, 24 insertions(+), 3 deletions(-)

-- 
2.5.5




[Qemu-devel] [PATCH 2/2] tpm: wrap stX_be_p in tpm_cmd_set_XYZ functions

2018-02-01 Thread Stefan Berger
Wrap the calls to stl_be_p and stw_be_p in tpm_cmd_set_XYZ functions
that are similar to existing getters.

Signed-off-by: Stefan Berger 
---
 hw/tpm/tpm_util.c |  6 +++---
 hw/tpm/tpm_util.h | 15 +++
 2 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/hw/tpm/tpm_util.c b/hw/tpm/tpm_util.c
index 8abde59..2de52a0 100644
--- a/hw/tpm/tpm_util.c
+++ b/hw/tpm/tpm_util.c
@@ -106,9 +106,9 @@ const PropertyInfo qdev_prop_tpm = {
 void tpm_util_write_fatal_error_response(uint8_t *out, uint32_t out_len)
 {
 if (out_len >= sizeof(struct tpm_resp_hdr)) {
-stw_be_p(out, TPM_TAG_RSP_COMMAND);
-stl_be_p(out + 2, sizeof(struct tpm_resp_hdr));
-stl_be_p(out + 6, TPM_FAIL);
+tpm_cmd_set_tag(out, TPM_TAG_RSP_COMMAND);
+tpm_cmd_set_size(out, sizeof(struct tpm_resp_hdr));
+tpm_cmd_set_error(out, TPM_FAIL);
 }
 }
 
diff --git a/hw/tpm/tpm_util.h b/hw/tpm/tpm_util.h
index f003d15..f397ac2 100644
--- a/hw/tpm/tpm_util.h
+++ b/hw/tpm/tpm_util.h
@@ -36,11 +36,21 @@ static inline uint16_t tpm_cmd_get_tag(const void *b)
 return lduw_be_p(b);
 }
 
+static inline void tpm_cmd_set_tag(void *b, uint16_t tag)
+{
+stw_be_p(b, tag);
+}
+
 static inline uint32_t tpm_cmd_get_size(const void *b)
 {
 return ldl_be_p(b + 2);
 }
 
+static inline void tpm_cmd_set_size(void *b, uint32_t size)
+{
+stl_be_p(b + 2, size);
+}
+
 static inline uint32_t tpm_cmd_get_ordinal(const void *b)
 {
 return ldl_be_p(b + 6);
@@ -51,6 +61,11 @@ static inline uint32_t tpm_cmd_get_errcode(const void *b)
 return ldl_be_p(b + 6);
 }
 
+static inline void tpm_cmd_set_error(void *b, uint32_t error)
+{
+stl_be_p(b + 6, error);
+}
+
 int tpm_util_get_buffer_size(int tpm_fd, TPMVersion tpm_version,
  size_t *buffersize);
 
-- 
2.5.5




Re: [Qemu-devel] [PATCH v4 0/5] Add a valid_cpu_types property

2018-02-01 Thread Alistair Francis
On Thu, Jan 11, 2018 at 4:58 AM, Eduardo Habkost  wrote:
> On Thu, Jan 11, 2018 at 11:25:08AM +0100, Igor Mammedov wrote:
>> On Wed, 10 Jan 2018 19:48:00 -0200
>> Eduardo Habkost  wrote:
>>
>> > On Wed, Jan 10, 2018 at 01:30:29PM -0800, Alistair Francis wrote:
>> > > On Thu, Dec 28, 2017 at 6:59 AM, Eduardo Habkost  
>> > > wrote:
>> > > > On Thu, Dec 28, 2017 at 02:39:31PM +0100, Igor Mammedov wrote:
>> > > >> On Fri, 22 Dec 2017 11:47:00 -0800
>> > > >> Alistair Francis  wrote:
>> > > >>
>> > > >> > On Fri, Dec 22, 2017 at 10:45 AM, Alistair Francis
>> > > >> >  wrote:
>> > > >> > > On Wed, Dec 20, 2017 at 2:06 PM, Eduardo Habkost 
>> > > >> > >  wrote:
>> > > >> > >> On Tue, Dec 19, 2017 at 05:03:59PM -0800, Alistair Francis wrote:
>> > > >> > >>> On Tue, Dec 19, 2017 at 4:55 PM, Alistair Francis
>> > > >> > >>>  wrote:
>> > > >> > >>> > On Tue, Dec 19, 2017 at 4:43 PM, Peter Maydell 
>> > > >> > >>> >  wrote:
>> > > >> > >>> >> On 20 December 2017 at 00:27, Alistair Francis
>> > > >> > >>> >>  wrote:
>> > > >> > >>> >>> There are numerous QEMU machines that only have a single or
>> > > >> > >>> >>> a handful of
>> > > >> > >>> >>> valid CPU options. To simplify the management of
>> > > >> > >>> >>> specifying which CPU
>> > > >> > >>> >>> is/isn't valid let's create a property that can be set in 
>> > > >> > >>> >>> the machine
>> > > >> > >>> >>> init. We can then check to see if the user supplied CPU is 
>> > > >> > >>> >>> in that list
>> > > >> > >>> >>> or not.
>> > > >> > >>> >>>
>> > > >> > >>> >>> I have added the valid_cpu_types for some ARM machines only 
>> > > >> > >>> >>> at the
>> > > >> > >>> >>> moment.
>> > > >> > >>> >>>
>> > > >> > >>> >>> Here is what specifying the CPUs looks like now:
>> > > >> > >>> >>>
>> > > >> > >>> >>> $ aarch64-softmmu/qemu-system-aarch64 -M netduino2 -kernel 
>> > > >> > >>> >>> ./u-boot.elf -nographic -cpu "cortex-m3" -S
>> > > >> > >>> >>> QEMU 2.10.50 monitor - type 'help' for more information
>> > > >> > >>> >>> (qemu) info cpus
>> > > >> > >>> >>> * CPU #0: thread_id=24175
>> > > >> > >>> >>> (qemu) q
>> > > >> > >>> >>>
>> > > >> > >>> >>> $ aarch64-softmmu/qemu-system-aarch64 -M netduino2 -kernel 
>> > > >> > >>> >>> ./u-boot.elf -nographic -cpu "cortex-m4" -S
>> > > >> > >>> >>> QEMU 2.10.50 monitor - type 'help' for more information
>> > > >> > >>> >>> (qemu) q
>> > > >> > >>> >>>
>> > > >> > >>> >>> $ aarch64-softmmu/qemu-system-aarch64 -M netduino2 -kernel 
>> > > >> > >>> >>> ./u-boot.elf -nographic -cpu "cortex-m5" -S
>> > > >> > >>> >>> qemu-system-aarch64: unable to find CPU model 'cortex-m5'
>> > > >> > >>> >>>
>> > > >> > >>> >>> $ aarch64-softmmu/qemu-system-aarch64 -M netduino2 -kernel 
>> > > >> > >>> >>> ./u-boot.elf -nographic -cpu "cortex-a9" -S
>> > > >> > >>> >>> qemu-system-aarch64: Invalid CPU type: cortex-a9-arm-cpu
>> > > >> > >>> >>> The valid types are: cortex-m3-arm-cpu, cortex-m4-arm-cpu
>> > > >> > >>> >>
>> > > >> > >>> >> Thanks for this; we really should be more strict about
>> > > >> > >>> >> forbidding "won't work" combinations than we have
>> > > >> > >>> >> been in the past.
>> > > >> > >>> >>
>> > > >> > >>> >> In the last of these cases, I think that when we
>> > > >> > >>> >> list the invalid CPU type and the valid types
>> > > >> > >>> >> we should use the same names we want the user to
>> > > >> > >>> >> use on the command line, without the "-arm-cpu"
>> > > >> > >>> >> suffixes.
>> > > >> > >>> >
>> > > >> > >>> > Hmm... That is a good point, it is confusing that they don't 
>> > > >> > >>> > line up.
>> > > >> > >>
>> > > >> > >> Agreed.
>> > > >> > >>
>> > > >> > >>> >
>> > > >> > >>> > The problem is that we are just doing a simple
>> > > >> > >>> > object_class_dynamic_cast() in hw/core/machine.c which I think
>> > > >> > >>> > (untested) requires us to have the full name in the valid cpu 
>> > > >> > >>> > array.
>> > > >> > >> [...]
>> > > >> > >>>
>> > > >> > >>> I think an earlier version of my previous series adding the 
>> > > >> > >>> support to
>> > > >> > >>> machine.c did string comparison, but it was decided to utilise 
>> > > >> > >>> objects
>> > > >> > >>> instead. One option is to make the array 2 wide and have the 
>> > > >> > >>> second
>> > > >> > >>> string be user friendly?
>> > > >> > >>
>> > > >> > >> Making the array 2-column will duplicate information that we can
>> > > >> > >> already find out using other methods, and it won't solve the
>> > > >> > >> problem if an entry has a parent class with multiple subclasses
>> > > >> > >> (the original reason I suggested object_class_dynamic_cast()).
>> > > >> > >>
>> > > >> > >> The main obstacle to fix this easily is that we do have a common
>> > > >> > >>   ObjectClass *cpu_class_by_name(const char *cpu_model)
>> 

Re: [Qemu-devel] Qemu Trace

2018-02-01 Thread Dongli Zhang
Hi Nesrine,

On 02/01/2018 11:30 PM, Nesrine Zouari wrote:
> Hello,
> 
> I am a computer engineering student and I am currently working on my
> graduation project at the Lauterbach company. The project is about QEMU
> tracing, and in the future I would like to contribute this work to the mainline.
> 
> My project is divided into two parts:
> 
> 1/ Collecting the Guest trace data : The trace solution should be able to
> provide:
> 
> a/ Instruction flow Trace
> 
> b/ Memory read/write access

Regarding memory read/write access, there is the PANDA project:

https://github.com/panda-re/panda

> 
> c/ Time Stamps.
> 
> d/ For tracing rich operating systems that use an MMU, we
> additionally need to trace the task switches.
> 
> 2/ Sending the collected data to a third party tool for analysis.
> 
> My question is about the first part. I would like to know which trace
> backend would best fit my use case.
> 
> 
> Regards,
> 
> -
> 
> Nesrine ZOUARI
> Computer Engineering Student
> Department of Computer Engineering and Applied Mathematics
> National Engineering School of Sfax (ENIS)
> University of Sfax-Tunisia
> Tel: +216 52 620 475
> 

Dongli Zhang



Re: [Qemu-devel] [PATCH v4 0/5] coroutine-lock: polymorphic CoQueue

2018-02-01 Thread no-reply
Hi,

This series seems to have some coding style problems. See output below for
more information:

Type: series
Message-id: 20180201212917.18131-1-pbonz...@redhat.com
Subject: [Qemu-devel] [PATCH v4 0/5] coroutine-lock: polymorphic CoQueue

=== TEST SCRIPT BEGIN ===
#!/bin/bash

BASE=base
n=1
total=$(git log --oneline $BASE.. | wc -l)
failed=0

git config --local diff.renamelimit 0
git config --local diff.renames True

commits="$(git log --format=%H --reverse $BASE..)"
for c in $commits; do
echo "Checking PATCH $n/$total: $(git log -n 1 --format=%s $c)..."
if ! git show $c --format=email | ./scripts/checkpatch.pl --mailback -; then
failed=1
echo
fi
n=$((n+1))
done

exit $failed
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
Switched to a new branch 'test'
87da24d9ea curl: convert to CoQueue
a94c2af350 coroutine-lock: make qemu_co_enter_next thread-safe
57eba09708 coroutine-lock: convert CoQueue to use QemuLockable
6efa617c67 lockable: add QemuLockable
1423cb0ab4 test-coroutine: add simple CoMutex test

=== OUTPUT BEGIN ===
Checking PATCH 1/5: test-coroutine: add simple CoMutex test...
ERROR: do not initialise statics to 0 or NULL
#30: FILE: tests/test-coroutine.c:198:
+static bool locked = false;

total: 1 errors, 0 warnings, 74 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

Checking PATCH 2/5: lockable: add QemuLockable...
WARNING: line over 80 characters
#58: FILE: include/qemu/compiler.h:144:
+#define QEMU_GENERIC2(x, a0, ...) QEMU_GENERIC_IF(x, a0, QEMU_GENERIC1(x, 
__VA_ARGS__))

WARNING: line over 80 characters
#59: FILE: include/qemu/compiler.h:145:
+#define QEMU_GENERIC3(x, a0, ...) QEMU_GENERIC_IF(x, a0, QEMU_GENERIC2(x, 
__VA_ARGS__))

WARNING: line over 80 characters
#60: FILE: include/qemu/compiler.h:146:
+#define QEMU_GENERIC4(x, a0, ...) QEMU_GENERIC_IF(x, a0, QEMU_GENERIC3(x, 
__VA_ARGS__))

WARNING: line over 80 characters
#61: FILE: include/qemu/compiler.h:147:
+#define QEMU_GENERIC5(x, a0, ...) QEMU_GENERIC_IF(x, a0, QEMU_GENERIC4(x, 
__VA_ARGS__))

WARNING: line over 80 characters
#62: FILE: include/qemu/compiler.h:148:
+#define QEMU_GENERIC6(x, a0, ...) QEMU_GENERIC_IF(x, a0, QEMU_GENERIC5(x, 
__VA_ARGS__))

WARNING: line over 80 characters
#63: FILE: include/qemu/compiler.h:149:
+#define QEMU_GENERIC7(x, a0, ...) QEMU_GENERIC_IF(x, a0, QEMU_GENERIC6(x, 
__VA_ARGS__))

WARNING: line over 80 characters
#64: FILE: include/qemu/compiler.h:150:
+#define QEMU_GENERIC8(x, a0, ...) QEMU_GENERIC_IF(x, a0, QEMU_GENERIC7(x, 
__VA_ARGS__))

WARNING: line over 80 characters
#65: FILE: include/qemu/compiler.h:151:
+#define QEMU_GENERIC9(x, a0, ...) QEMU_GENERIC_IF(x, a0, QEMU_GENERIC8(x, 
__VA_ARGS__))

WARNING: line over 80 characters
#66: FILE: include/qemu/compiler.h:152:
+#define QEMU_GENERIC10(x, a0, ...) QEMU_GENERIC_IF(x, a0, QEMU_GENERIC9(x, 
__VA_ARGS__))

WARNING: line over 80 characters
#124: FILE: include/qemu/lockable.h:28:
+ * to QEMU_MAKE_LOCKABLE.  For optimized builds, we can rely on dead-code 
elimination

WARNING: architecture specific defines should be avoided
#127: FILE: include/qemu/lockable.h:31:
+#ifdef __OPTIMIZE__

total: 0 errors, 11 warnings, 242 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
Checking PATCH 3/5: coroutine-lock: convert CoQueue to use QemuLockable...
Checking PATCH 4/5: coroutine-lock: make qemu_co_enter_next thread-safe...
Checking PATCH 5/5: curl: convert to CoQueue...
=== OUTPUT END ===

Test command exited with code: 1


---
Email generated automatically by Patchew [http://patchew.org/].
Please send your feedback to patchew-de...@freelists.org

Re: [Qemu-devel] [PATCH 2/4] tests/boot-serial-test: Add support for the aarch64 virt machine

2018-02-01 Thread Thomas Huth
On 01.02.2018 18:28, Wei Huang wrote:
> This patch adds a small binary kernel to test the aarch64 virt machine's
> UART.
> 
> Signed-off-by: Wei Huang 
> ---
>  tests/Makefile.include   | 1 +
>  tests/boot-serial-test.c | 9 +
>  2 files changed, 10 insertions(+)
> 
> diff --git a/tests/Makefile.include b/tests/Makefile.include
> index ca82e0c..ebdb151 100644
> --- a/tests/Makefile.include
> +++ b/tests/Makefile.include
> @@ -367,6 +367,7 @@ gcov-files-arm-y += hw/timer/arm_mptimer.c
>  check-qtest-arm-y += tests/boot-serial-test$(EXESUF)
>  
>  check-qtest-aarch64-y = tests/numa-test$(EXESUF)
> +check-qtest-aarch64-y += tests/boot-serial-test$(EXESUF)
>  
>  check-qtest-microblazeel-y = $(check-qtest-microblaze-y)
>  
> diff --git a/tests/boot-serial-test.c b/tests/boot-serial-test.c
> index 418c5b9..66f7a84 100644
> --- a/tests/boot-serial-test.c
> +++ b/tests/boot-serial-test.c
> @@ -55,6 +55,13 @@ static const uint8_t bios_raspi2[] = {
>  0x00, 0x10, 0x20, 0x3f, /* 0x3f201000 = UART0 base addr 
> */
>  };
>  
> +static const uint8_t kernel_aarch64[] = {
> +0x81, 0x0a, 0x80, 0x52, /* mov w1, #0x54 */
> +0x02, 0x20, 0xa1, 0xd2, /* mov x2, #0x900 */
> > +0x41, 0x00, 0x00, 0x39, /* strb w1, [x2] */
> +0xfd, 0xff, 0xff, 0x17, /* b   -12 (loop) */
> +};
> +
>  typedef struct testdef {
>  const char *arch;   /* Target architecture */
>  const char *machine;/* Name of the machine */
> @@ -87,6 +94,8 @@ static testdef_t tests[] = {
>sizeof(kernel_plml605), kernel_plml605 },
>  { "moxie", "moxiesim", "", "TT", sizeof(bios_moxiesim), 0, bios_moxiesim 
> },
>  { "arm", "raspi2", "", "TT", sizeof(bios_raspi2), 0, bios_raspi2 },
> +{ "aarch64", "virt", "-cpu cortex-a57", "TT", sizeof(kernel_aarch64),
> +  kernel_aarch64 },
>  
>  { NULL }
>  };
> 

Acked-by: Thomas Huth 



[Qemu-devel] [PATCH 2/5] lockable: add QemuLockable

2018-02-01 Thread Paolo Bonzini
QemuLockable is a polymorphic lock type that takes an object and
knows which function to use for locking and unlocking.  The
implementation could use C11 _Generic, but since the support is
not very widespread I am instead using __builtin_choose_expr and
__builtin_types_compatible_p, which are already used by
include/qemu/atomic.h.

QemuLockable can be used to implement lock guards, or to pass around
a lock in such a way that a function can release it and re-acquire it.
The next patch will do this for CoQueue.

Signed-off-by: Paolo Bonzini 
---
 include/qemu/compiler.h  | 40 
 include/qemu/coroutine.h |  4 +-
 include/qemu/lockable.h  | 96 
 include/qemu/thread.h|  5 +--
 include/qemu/typedefs.h  |  4 ++
 tests/test-coroutine.c   | 25 +
 6 files changed, 169 insertions(+), 5 deletions(-)
 create mode 100644 include/qemu/lockable.h

diff --git a/include/qemu/compiler.h b/include/qemu/compiler.h
index 340e5fdc09..5179bedb1e 100644
--- a/include/qemu/compiler.h
+++ b/include/qemu/compiler.h
@@ -111,4 +111,44 @@
 #define GCC_FMT_ATTR(n, m)
 #endif
 
+/* Implement C11 _Generic via GCC builtins.  Example:
+ *
+ *QEMU_GENERIC(x, (float, sinf), (long double, sinl), sin) (x)
+ *
+ * The first argument is the discriminator.  The last is the default value.
+ * The middle ones are tuples in "(type, expansion)" format.
+ */
+
+/* First, find out the number of generic cases.  */
+#define QEMU_GENERIC(x, ...) \
+QEMU_GENERIC_(typeof(x), __VA_ARGS__, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0)
+
+/* There will be extra arguments, but they are not used.  */
+#define QEMU_GENERIC_(x, a0, a1, a2, a3, a4, a5, a6, a7, a8, a9, count, ...) \
+QEMU_GENERIC##count(x, a0, a1, a2, a3, a4, a5, a6, a7, a8, a9)
+
+/* Two more helper macros, this time to extract items from a parenthesized
+ * list.
+ */
+#define QEMU_FIRST_(a, b) a
+#define QEMU_SECOND_(a, b) b
+
+/* ... and a final one for the common part of the "recursion".  */
+#define QEMU_GENERIC_IF(x, type_then, else_)   
\
+__builtin_choose_expr(__builtin_types_compatible_p(x,  
\
+   QEMU_FIRST_ type_then), 
\
+  QEMU_SECOND_ type_then, else_)
+
+/* CPP poor man's "recursion".  */
+#define QEMU_GENERIC1(x, a0, ...) (a0)
+#define QEMU_GENERIC2(x, a0, ...) QEMU_GENERIC_IF(x, a0, QEMU_GENERIC1(x, 
__VA_ARGS__))
+#define QEMU_GENERIC3(x, a0, ...) QEMU_GENERIC_IF(x, a0, QEMU_GENERIC2(x, 
__VA_ARGS__))
+#define QEMU_GENERIC4(x, a0, ...) QEMU_GENERIC_IF(x, a0, QEMU_GENERIC3(x, 
__VA_ARGS__))
+#define QEMU_GENERIC5(x, a0, ...) QEMU_GENERIC_IF(x, a0, QEMU_GENERIC4(x, 
__VA_ARGS__))
+#define QEMU_GENERIC6(x, a0, ...) QEMU_GENERIC_IF(x, a0, QEMU_GENERIC5(x, 
__VA_ARGS__))
+#define QEMU_GENERIC7(x, a0, ...) QEMU_GENERIC_IF(x, a0, QEMU_GENERIC6(x, 
__VA_ARGS__))
+#define QEMU_GENERIC8(x, a0, ...) QEMU_GENERIC_IF(x, a0, QEMU_GENERIC7(x, 
__VA_ARGS__))
+#define QEMU_GENERIC9(x, a0, ...) QEMU_GENERIC_IF(x, a0, QEMU_GENERIC8(x, 
__VA_ARGS__))
+#define QEMU_GENERIC10(x, a0, ...) QEMU_GENERIC_IF(x, a0, QEMU_GENERIC9(x, 
__VA_ARGS__))
+
 #endif /* COMPILER_H */
diff --git a/include/qemu/coroutine.h b/include/qemu/coroutine.h
index ce2eb73670..8a5129741c 100644
--- a/include/qemu/coroutine.h
+++ b/include/qemu/coroutine.h
@@ -121,7 +121,7 @@ bool qemu_coroutine_entered(Coroutine *co);
  * Provides a mutex that can be used to synchronise coroutines
  */
 struct CoWaitRecord;
-typedef struct CoMutex {
+struct CoMutex {
 /* Count of pending lockers; 0 for a free mutex, 1 for an
  * uncontended mutex.
  */
@@ -142,7 +142,7 @@ typedef struct CoMutex {
 unsigned handoff, sequence;
 
 Coroutine *holder;
-} CoMutex;
+};
 
 /**
  * Initialises a CoMutex. This must be called before any other operation is 
used
diff --git a/include/qemu/lockable.h b/include/qemu/lockable.h
new file mode 100644
index 00..826ac3f675
--- /dev/null
+++ b/include/qemu/lockable.h
@@ -0,0 +1,96 @@
+/*
+ * Polymorphic locking functions (aka poor man templates)
+ *
+ * Copyright Red Hat, Inc. 2017, 2018
+ *
+ * Author: Paolo Bonzini 
+ *
+ * This work is licensed under the terms of the GNU LGPL, version 2 or later.
+ * See the COPYING.LIB file in the top-level directory.
+ *
+ */
+
+#ifndef QEMU_LOCKABLE_H
+#define QEMU_LOCKABLE_H
+
+#include "qemu/coroutine.h"
+#include "qemu/thread.h"
+
+typedef void QemuLockUnlockFunc(void *);
+
+struct QemuLockable {
+void *object;
+QemuLockUnlockFunc *lock;
+QemuLockUnlockFunc *unlock;
+};
+
+/* This function gives an error if an invalid, non-NULL pointer type is passed
+ * to QEMU_MAKE_LOCKABLE.  For optimized builds, we can rely on dead-code 
elimination
+ * from the compiler, and give the errors already at link time.
+ */
+#ifdef __OPTIMIZE__
+void unknown_lock_type(void *);
+#else
+static inline 

[Qemu-devel] [PATCH 4/5] coroutine-lock: make qemu_co_enter_next thread-safe

2018-02-01 Thread Paolo Bonzini
qemu_co_queue_next does not need to release and re-acquire the mutex,
because the queued coroutine does not run immediately.  However, this
does not hold for qemu_co_enter_next.  Now that qemu_co_queue_wait
can synchronize (via QemuLockable) with code that is not running in
coroutine context, it's important that code using qemu_co_enter_next
can easily use a standardized locking idiom.

First of all, qemu_co_enter_next must use aio_co_wake to restart the
coroutine.  Second, the function gains a second argument, a QemuLockable*,
and the comments of qemu_co_queue_next and qemu_co_queue_restart_all
are adjusted to clarify the difference.

Signed-off-by: Paolo Bonzini 
---
 fsdev/qemu-fsdev-throttle.c |  4 ++--
 include/qemu/coroutine.h| 19 +--
 util/qemu-coroutine-lock.c  | 10 --
 3 files changed, 23 insertions(+), 10 deletions(-)

diff --git a/fsdev/qemu-fsdev-throttle.c b/fsdev/qemu-fsdev-throttle.c
index 49eebb5412..1dc07fbc12 100644
--- a/fsdev/qemu-fsdev-throttle.c
+++ b/fsdev/qemu-fsdev-throttle.c
@@ -20,13 +20,13 @@
 static void fsdev_throttle_read_timer_cb(void *opaque)
 {
 FsThrottle *fst = opaque;
-qemu_co_enter_next(&fst->throttled_reqs[false]);
+qemu_co_enter_next(&fst->throttled_reqs[false], NULL);
 }
 
 static void fsdev_throttle_write_timer_cb(void *opaque)
 {
 FsThrottle *fst = opaque;
-qemu_co_enter_next(&fst->throttled_reqs[true]);
+qemu_co_enter_next(&fst->throttled_reqs[true], NULL);
 }
 
 void fsdev_throttle_parse_opts(QemuOpts *opts, FsThrottle *fst, Error **errp)
diff --git a/include/qemu/coroutine.h b/include/qemu/coroutine.h
index 1e5f0957e6..3afee7169b 100644
--- a/include/qemu/coroutine.h
+++ b/include/qemu/coroutine.h
@@ -188,21 +188,28 @@ void qemu_co_queue_init(CoQueue *queue);
 void coroutine_fn qemu_co_queue_wait_impl(CoQueue *queue, QemuLockable *lock);
 
 /**
- * Restarts the next coroutine in the CoQueue and removes it from the queue.
- *
- * Returns true if a coroutine was restarted, false if the queue is empty.
+ * Removes the next coroutine from the CoQueue, and wakes it up.
+ * Returns true if a coroutine was removed, false if the queue is empty.
  */
 bool coroutine_fn qemu_co_queue_next(CoQueue *queue);
 
 /**
- * Restarts all coroutines in the CoQueue and leaves the queue empty.
+ * Empties the CoQueue; all coroutines are woken up.
  */
 void coroutine_fn qemu_co_queue_restart_all(CoQueue *queue);
 
 /**
- * Enter the next coroutine in the queue
+ * Removes the next coroutine from the CoQueue, and wakes it up.  Unlike
+ * qemu_co_queue_next, this function releases the lock during aio_co_wake
+ * because it is meant to be used outside coroutine context; in that case, the
+ * coroutine is entered immediately, before qemu_co_enter_next returns.
+ *
+ * If used in coroutine context, qemu_co_enter_next is equivalent to
+ * qemu_co_queue_next.
  */
-bool qemu_co_enter_next(CoQueue *queue);
+#define qemu_co_enter_next(queue, lock) \
+qemu_co_enter_next_impl(queue, QEMU_MAKE_LOCKABLE(lock))
+bool qemu_co_enter_next_impl(CoQueue *queue, QemuLockable *lock);
 
 /**
  * Checks if the CoQueue is empty.
diff --git a/util/qemu-coroutine-lock.c b/util/qemu-coroutine-lock.c
index 2a66fc1467..78fb79acf8 100644
--- a/util/qemu-coroutine-lock.c
+++ b/util/qemu-coroutine-lock.c
@@ -132,7 +132,7 @@ void coroutine_fn qemu_co_queue_restart_all(CoQueue *queue)
 qemu_co_queue_do_restart(queue, false);
 }
 
-bool qemu_co_enter_next(CoQueue *queue)
+bool qemu_co_enter_next_impl(CoQueue *queue, QemuLockable *lock)
 {
 Coroutine *next;
 
@@ -142,7 +142,13 @@ bool qemu_co_enter_next(CoQueue *queue)
 }
 
     QSIMPLEQ_REMOVE_HEAD(&queue->entries, co_queue_next);
-    qemu_coroutine_enter(next);
+    if (lock) {
+        qemu_lockable_unlock(lock);
+    }
+    aio_co_wake(next);
+    if (lock) {
+        qemu_lockable_lock(lock);
+    }
 return true;
 }
 
-- 
2.14.3





[Qemu-devel] [PATCH 5/5] curl: convert to CoQueue

2018-02-01 Thread Paolo Bonzini
Now that CoQueues can use a QemuMutex for thread-safety, there is no
need for curl to roll its own coroutine queue.  Coroutines can be
placed directly on the queue instead of using a list of CURLAIOCBs.

Reviewed-by: Stefan Hajnoczi 
Signed-off-by: Paolo Bonzini 
---
 block/curl.c | 20 
 1 file changed, 4 insertions(+), 16 deletions(-)

diff --git a/block/curl.c b/block/curl.c
index 35cf417f59..cd578d3d14 100644
--- a/block/curl.c
+++ b/block/curl.c
@@ -101,8 +101,6 @@ typedef struct CURLAIOCB {
 
 size_t start;
 size_t end;
-
-QSIMPLEQ_ENTRY(CURLAIOCB) next;
 } CURLAIOCB;
 
 typedef struct CURLSocket {
@@ -138,7 +136,7 @@ typedef struct BDRVCURLState {
 bool accept_range;
 AioContext *aio_context;
 QemuMutex mutex;
-QSIMPLEQ_HEAD(, CURLAIOCB) free_state_waitq;
+CoQueue free_state_waitq;
 char *username;
 char *password;
 char *proxyusername;
@@ -538,7 +536,6 @@ static int curl_init_state(BDRVCURLState *s, CURLState 
*state)
 /* Called with s->mutex held.  */
 static void curl_clean_state(CURLState *s)
 {
-CURLAIOCB *next;
 int j;
 for (j = 0; j < CURL_NUM_ACB; j++) {
 assert(!s->acb[j]);
@@ -556,13 +553,7 @@ static void curl_clean_state(CURLState *s)
 
 s->in_use = 0;
 
-    next = QSIMPLEQ_FIRST(&s->s->free_state_waitq);
-    if (next) {
-        QSIMPLEQ_REMOVE_HEAD(&s->s->free_state_waitq, next);
-        qemu_mutex_unlock(&s->s->mutex);
-        aio_co_wake(next->co);
-        qemu_mutex_lock(&s->s->mutex);
-    }
+    qemu_co_enter_next(&s->s->free_state_waitq, &s->s->mutex);
 }
 
 static void curl_parse_filename(const char *filename, QDict *options,
@@ -784,7 +775,7 @@ static int curl_open(BlockDriverState *bs, QDict *options, 
int flags,
 }
 
 DPRINTF("CURL: Opening %s\n", file);
-    QSIMPLEQ_INIT(&s->free_state_waitq);
+    qemu_co_queue_init(&s->free_state_waitq);
 s->aio_context = bdrv_get_aio_context(bs);
 s->url = g_strdup(file);
 qemu_mutex_lock(>mutex);
@@ -888,10 +879,7 @@ static void curl_setup_preadv(BlockDriverState *bs, 
CURLAIOCB *acb)
 if (state) {
 break;
 }
-        QSIMPLEQ_INSERT_TAIL(&s->free_state_waitq, acb, next);
-        qemu_mutex_unlock(&s->mutex);
-        qemu_coroutine_yield();
-        qemu_mutex_lock(&s->mutex);
+        qemu_co_queue_wait(&s->free_state_waitq, &s->mutex);
 }
 
 if (curl_init_state(s, state) < 0) {
-- 
2.14.3




[Qemu-devel] [PATCH 3/5] coroutine-lock: convert CoQueue to use QemuLockable

2018-02-01 Thread Paolo Bonzini
There are cases in which a queued coroutine must be restarted from
non-coroutine context (with qemu_co_enter_next).  In these cases,
qemu_co_enter_next also needs to be thread-safe, but it cannot use
a CoMutex and so cannot use qemu_co_queue_wait.  Use QemuLockable so
that the CoQueue can interchangeably use CoMutex or QemuMutex.

Reviewed-by: Stefan Hajnoczi 
Signed-off-by: Paolo Bonzini 
---
 include/qemu/coroutine.h   |  6 +-
 util/qemu-coroutine-lock.c | 12 +++-
 2 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/include/qemu/coroutine.h b/include/qemu/coroutine.h
index 8a5129741c..1e5f0957e6 100644
--- a/include/qemu/coroutine.h
+++ b/include/qemu/coroutine.h
@@ -183,7 +183,9 @@ void qemu_co_queue_init(CoQueue *queue);
  * caller of the coroutine.  The mutex is unlocked during the wait and
  * locked again afterwards.
  */
-void coroutine_fn qemu_co_queue_wait(CoQueue *queue, CoMutex *mutex);
+#define qemu_co_queue_wait(queue, lock) \
+qemu_co_queue_wait_impl(queue, QEMU_MAKE_LOCKABLE(lock))
+void coroutine_fn qemu_co_queue_wait_impl(CoQueue *queue, QemuLockable *lock);
 
 /**
  * Restarts the next coroutine in the CoQueue and removes it from the queue.
@@ -271,4 +273,6 @@ void coroutine_fn qemu_co_sleep_ns(QEMUClockType type, 
int64_t ns);
  */
 void coroutine_fn yield_until_fd_readable(int fd);
 
+#include "qemu/lockable.h"
+
 #endif /* QEMU_COROUTINE_H */
diff --git a/util/qemu-coroutine-lock.c b/util/qemu-coroutine-lock.c
index 846ff9167f..2a66fc1467 100644
--- a/util/qemu-coroutine-lock.c
+++ b/util/qemu-coroutine-lock.c
@@ -40,13 +40,13 @@ void qemu_co_queue_init(CoQueue *queue)
     QSIMPLEQ_INIT(&queue->entries);
 }
 
-void coroutine_fn qemu_co_queue_wait(CoQueue *queue, CoMutex *mutex)
+void coroutine_fn qemu_co_queue_wait_impl(CoQueue *queue, QemuLockable *lock)
 {
 Coroutine *self = qemu_coroutine_self();
     QSIMPLEQ_INSERT_TAIL(&queue->entries, self, co_queue_next);
 
-if (mutex) {
-qemu_co_mutex_unlock(mutex);
+if (lock) {
+qemu_lockable_unlock(lock);
 }
 
 /* There is no race condition here.  Other threads will call
@@ -60,9 +60,11 @@ void coroutine_fn qemu_co_queue_wait(CoQueue *queue, CoMutex 
*mutex)
 /* TODO: OSv implements wait morphing here, where the wakeup
  * primitive automatically places the woken coroutine on the
  * mutex's queue.  This avoids the thundering herd effect.
+ * This could be implemented for CoMutexes, but not really for
+ * other cases of QemuLockable.
  */
-if (mutex) {
-qemu_co_mutex_lock(mutex);
+if (lock) {
+qemu_lockable_lock(lock);
 }
 }
 
-- 
2.14.3





[Qemu-devel] [PATCH v4 0/5] coroutine-lock: polymorphic CoQueue

2018-02-01 Thread Paolo Bonzini
There are cases in which a queued coroutine must be restarted from
non-coroutine context (with qemu_co_enter_next).  In these cases,
qemu_co_enter_next also needs to be thread-safe, but it cannot use a
CoMutex and so cannot use qemu_co_queue_wait.  This happens in curl (which
right now is rolling its own list of Coroutines) and will happen in
Fam's NVMe driver as well.

This series extracts the idea of a polymorphic lockable object
from my "scoped lock guard" proposal, and applies it to CoQueue.
The implementation of QemuLockable is similar to C11 _Generic, but
redone using the preprocessor and GCC builtins for compatibility.

In general, while a bit on the esoteric side, the functionality used
to emulate _Generic is fairly old in GCC, and the builtins are already
used by include/qemu/atomic.h; the series was tested with Fedora 27 (boot
Damn Small Linux via http) and CentOS 6 (compiled only).

Paolo

v3->v4: fix -O0 compilation [Fam]
typos and copyright dates [Eric, Fam]
improve CoQueue comments [Stefan]

Paolo Bonzini (5):
  test-coroutine: add simple CoMutex test
  lockable: add QemuLockable
  coroutine-lock: convert CoQueue to use QemuLockable
  coroutine-lock: make qemu_co_enter_next thread-safe
  curl: convert to CoQueue

 block/curl.c| 20 ++
 fsdev/qemu-fsdev-throttle.c |  4 +-
 include/qemu/compiler.h | 40 +++
 include/qemu/coroutine.h| 29 +-
 include/qemu/lockable.h | 96 +
 include/qemu/thread.h   |  5 +--
 include/qemu/typedefs.h |  4 ++
 tests/test-coroutine.c  | 75 ++-
 util/qemu-coroutine-lock.c  | 22 +++
 9 files changed, 256 insertions(+), 39 deletions(-)
 create mode 100644 include/qemu/lockable.h

-- 
2.14.3




[Qemu-devel] [PATCH 1/5] test-coroutine: add simple CoMutex test

2018-02-01 Thread Paolo Bonzini
In preparation for adding a similar test using QemuLockable, add a very
simple testcase that has two interleaved calls to lock and unlock.

Reviewed-by: Stefan Hajnoczi 
Signed-off-by: Paolo Bonzini 
---
 tests/test-coroutine.c | 50 --
 1 file changed, 48 insertions(+), 2 deletions(-)

diff --git a/tests/test-coroutine.c b/tests/test-coroutine.c
index 76c646107e..010cb95ad6 100644
--- a/tests/test-coroutine.c
+++ b/tests/test-coroutine.c
@@ -175,7 +175,7 @@ static void coroutine_fn c1_fn(void *opaque)
 qemu_coroutine_enter(c2);
 }
 
-static void test_co_queue(void)
+static void test_no_dangling_access(void)
 {
 Coroutine *c1;
 Coroutine *c2;
@@ -195,6 +195,51 @@ static void test_co_queue(void)
 *c1 = tmp;
 }
 
+static bool locked = false;
+static int done;
+
+static void coroutine_fn mutex_fn(void *opaque)
+{
+CoMutex *m = opaque;
+qemu_co_mutex_lock(m);
+assert(!locked);
+locked = true;
+qemu_coroutine_yield();
+locked = false;
+qemu_co_mutex_unlock(m);
+done++;
+}
+
+static void do_test_co_mutex(CoroutineEntry *entry, void *opaque)
+{
+Coroutine *c1 = qemu_coroutine_create(entry, opaque);
+Coroutine *c2 = qemu_coroutine_create(entry, opaque);
+
+done = 0;
+qemu_coroutine_enter(c1);
+g_assert(locked);
+qemu_coroutine_enter(c2);
+
+/* Unlock queues c2.  It is then started automatically when c1 yields or
+ * terminates.
+ */
+qemu_coroutine_enter(c1);
+g_assert_cmpint(done, ==, 1);
+g_assert(locked);
+
+qemu_coroutine_enter(c2);
+g_assert_cmpint(done, ==, 2);
+g_assert(!locked);
+}
+
+static void test_co_mutex(void)
+{
+CoMutex m;
+
+    qemu_co_mutex_init(&m);
+    do_test_co_mutex(mutex_fn, &m);
+}
+
 /*
  * Check that creation, enter, and return work
  */
@@ -422,7 +467,7 @@ int main(int argc, char **argv)
  * crash, so skip it.
  */
 if (CONFIG_COROUTINE_POOL) {
-g_test_add_func("/basic/co_queue", test_co_queue);
+g_test_add_func("/basic/no-dangling-access", test_no_dangling_access);
 }
 
 g_test_add_func("/basic/lifecycle", test_lifecycle);
@@ -432,6 +477,7 @@ int main(int argc, char **argv)
 g_test_add_func("/basic/entered", test_entered);
 g_test_add_func("/basic/in_coroutine", test_in_coroutine);
 g_test_add_func("/basic/order", test_order);
+g_test_add_func("/locking/co-mutex", test_co_mutex);
 if (g_test_perf()) {
 g_test_add_func("/perf/lifecycle", perf_lifecycle);
 g_test_add_func("/perf/nesting", perf_nesting);
-- 
2.14.3





Re: [Qemu-devel] [PATCH v3 39/39] iotests: Add l2-cache-entry-size to iotest 137

2018-02-01 Thread Max Reitz
On 2018-01-26 16:00, Alberto Garcia wrote:
> This test tries reopening a qcow2 image with valid and invalid
> options. This patch adds l2-cache-entry-size to the set.
> 
> Signed-off-by: Alberto Garcia 
> ---
>  tests/qemu-iotests/137 | 5 +
>  tests/qemu-iotests/137.out | 2 ++
>  2 files changed, 7 insertions(+)

Reviewed-by: Max Reitz 



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] MTTCG External Halt

2018-02-01 Thread Alistair Francis
On Thu, Feb 1, 2018 at 9:13 AM, Alistair Francis
 wrote:
> On Thu, Feb 1, 2018 at 4:01 AM, Alex Bennée  wrote:
>>
>> Alistair Francis  writes:
>>
>>> On Wed, Jan 31, 2018 at 12:32 PM, Alex Bennée  
>>> wrote:

 Alistair Francis  writes:

> On Tue, Jan 30, 2018 at 8:26 PM, Paolo Bonzini  
> wrote:
>> On 30/01/2018 18:56, Alistair Francis wrote:
>>>
>>> I don't have a good solution though, as setting CPU_INTERRUPT_RESET
>>> doesn't help (that isn't handled while we are halted) and
>>> async_run_on_cpu()/run_on_cpu() doesn't reliably reset the CPU when we
>>> want.
>>>
>>> I've even tried pausing all CPUs before resetting the CPU and then
>>> resuming them all but that doesn't seem to work either.
>>
>> async_safe_run_on_cpu would be like async_run_on_cpu, except that it
>> takes care of stopping all other CPUs while the function runs.
>>
>>> Is there
>>> anything I'm missing? Is there no reliable way to reset a CPU?
>>
>> What do you mean by reliable?  Executing no instruction after the one
>> you were at?
>
> The reset is called by a GPIO line, so I need the reset to be called
> basically as quickly as the GPIO line changes. The async_ and
> async_safe_ functions seem to not run quickly enough, even if I run a
> process_work_queue() function afterwards.
>
> Is there a way to kick the CPU to act on the async_*?

 Define quickly enough? The async_(safe) functions kick the vCPUs so they
 will all exit the run loop as they enter the next TB (even if they loop
 to themselves).
>>>
>>> We have a special power controller CPU that wakes all the CPUs up and
>>> at boot the async_* functions don't wake the CPUs up. If I just use
>>> the cpu_reset() function directly everything starts fine (but then I
>>> hit issues later).
>>>
>>> If I forcefully run process_queued_cpu_work() then I can get the CPUs
>>> up, but I don't think that is the right solution.
>>>

 From an external vCPUs point of view those extra instructions have
 already executed. If the resetting vCPU needs them to have reset by the
 time it executes it's next instruction it should either cpu_loop_exit at
 that point or ensure it is the last instruction in it's TB (which is
 what we do for the MMU flush cases in ARM, they all end the TB at that
 point).
>>>
>>> cpu_loop_exit() sounds like it would help, but as I'm not in the CPU
>>> context it just seg faults.
>>
>> What context are you in? gdb-stub does have to something like this.
>
> gdb-stub just seems to use vm_stop() and vm_start().
>
> That fixes all hangs/asserts, but now Linux only brings up 1 CPU (instead of 4).

Hmmm... Interesting if I do this on reset events:

pause_all_vcpus();
cpu_reset(cpu);
resume_all_vcpus();

it hangs, while if I do this

if (runstate_is_running()) {
vm_stop(RUN_STATE_PAUSED);
}
cpu_reset(cpu);
if (!runstate_needs_reset()) {
vm_start();
}

it doesn't hang but CPU bringup doesn't work.

Alistair

>
> Alistair



[Qemu-devel] [PATCH V9 4/4] MAINTAINERS: add entry for hw/rdma

2018-02-01 Thread Marcel Apfelbaum
Signed-off-by: Marcel Apfelbaum 
Signed-off-by: Yuval Shaia 
---
 MAINTAINERS | 8 
 1 file changed, 8 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index f8deaf638c..23a4fd0b3d 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1986,6 +1986,14 @@ F: block/replication.c
 F: tests/test-replication.c
 F: docs/block-replication.txt
 
+PVRDMA
+M: Yuval Shaia 
+M: Marcel Apfelbaum 
+S: Maintained
+F: hw/rdma/*
+F: hw/rdma/vmw/*
+F: docs/pvrdma.txt
+
 Build and test automation
 -------------------------
 Build and test automation
-- 
2.13.5




[Qemu-devel] [PATCH V9 1/4] mem: add share parameter to memory-backend-ram

2018-02-01 Thread Marcel Apfelbaum
Currently only the file-backed memory backend can
be created with a "share" flag in order to allow
sharing guest RAM with other processes in the host.

Add the "share" flag also to RAM Memory Backend
in order to allow remapping parts of the guest RAM
to different host virtual addresses. This is needed
by the RDMA devices in order to remap non-contiguous
QEMU virtual addresses to a contiguous virtual address range.

Moved the "share" flag to the Host Memory base class,
modified phys_mem_alloc to include the new parameter,
and added a new interface, memory_region_init_ram_shared_nomigrate.

There are no functional changes if the new flag is not used.

Signed-off-by: Marcel Apfelbaum 
---
 backends/hostmem-file.c  | 25 +
 backends/hostmem-ram.c   |  4 ++--
 backends/hostmem.c   | 21 +
 exec.c   | 26 +++---
 include/exec/memory.h| 23 +++
 include/exec/ram_addr.h  |  3 ++-
 include/qemu/osdep.h |  2 +-
 include/sysemu/hostmem.h |  2 +-
 include/sysemu/kvm.h |  2 +-
 memory.c | 16 +---
 qemu-options.hx  | 10 +-
 target/s390x/kvm.c   |  4 ++--
 util/oslib-posix.c   |  4 ++--
 util/oslib-win32.c   |  2 +-
 14 files changed, 94 insertions(+), 50 deletions(-)

diff --git a/backends/hostmem-file.c b/backends/hostmem-file.c
index e319ec1ad8..134b08d63a 100644
--- a/backends/hostmem-file.c
+++ b/backends/hostmem-file.c
@@ -31,7 +31,6 @@ typedef struct HostMemoryBackendFile HostMemoryBackendFile;
 struct HostMemoryBackendFile {
 HostMemoryBackend parent_obj;
 
-bool share;
 bool discard_data;
 char *mem_path;
 uint64_t align;
@@ -59,7 +58,7 @@ file_backend_memory_alloc(HostMemoryBackend *backend, Error 
**errp)
 path = object_get_canonical_path(OBJECT(backend));
     memory_region_init_ram_from_file(&backend->mr, OBJECT(backend),
                                      path,
-                                     backend->size, fb->align, fb->share,
+                                     backend->size, fb->align, backend->share,
                                      fb->mem_path, errp);
 g_free(path);
 }
@@ -86,25 +85,6 @@ static void set_mem_path(Object *o, const char *str, Error 
**errp)
 fb->mem_path = g_strdup(str);
 }
 
-static bool file_memory_backend_get_share(Object *o, Error **errp)
-{
-HostMemoryBackendFile *fb = MEMORY_BACKEND_FILE(o);
-
-return fb->share;
-}
-
-static void file_memory_backend_set_share(Object *o, bool value, Error **errp)
-{
-HostMemoryBackend *backend = MEMORY_BACKEND(o);
-HostMemoryBackendFile *fb = MEMORY_BACKEND_FILE(o);
-
-if (host_memory_backend_mr_inited(backend)) {
-error_setg(errp, "cannot change property value");
-return;
-}
-fb->share = value;
-}
-
 static bool file_memory_backend_get_discard_data(Object *o, Error **errp)
 {
 return MEMORY_BACKEND_FILE(o)->discard_data;
@@ -171,9 +151,6 @@ file_backend_class_init(ObjectClass *oc, void *data)
 bc->alloc = file_backend_memory_alloc;
 oc->unparent = file_backend_unparent;
 
-object_class_property_add_bool(oc, "share",
-file_memory_backend_get_share, file_memory_backend_set_share,
-_abort);
 object_class_property_add_bool(oc, "discard-data",
 file_memory_backend_get_discard_data, 
file_memory_backend_set_discard_data,
 _abort);
diff --git a/backends/hostmem-ram.c b/backends/hostmem-ram.c
index 38977be73e..7ddd08d370 100644
--- a/backends/hostmem-ram.c
+++ b/backends/hostmem-ram.c
@@ -28,8 +28,8 @@ ram_backend_memory_alloc(HostMemoryBackend *backend, Error 
**errp)
 }
 
 path = object_get_canonical_path_component(OBJECT(backend));
-    memory_region_init_ram_nomigrate(&backend->mr, OBJECT(backend), path,
-                                     backend->size, errp);
+    memory_region_init_ram_shared_nomigrate(&backend->mr, OBJECT(backend),
+                                            path, backend->size,
+                                            backend->share, errp);
 g_free(path);
 }
 
diff --git a/backends/hostmem.c b/backends/hostmem.c
index ee2c2d5bfd..1daf13bd2e 100644
--- a/backends/hostmem.c
+++ b/backends/hostmem.c
@@ -369,6 +369,24 @@ static void set_id(Object *o, const char *str, Error 
**errp)
 backend->id = g_strdup(str);
 }
 
+static bool host_memory_backend_get_share(Object *o, Error **errp)
+{
+HostMemoryBackend *backend = MEMORY_BACKEND(o);
+
+return backend->share;
+}
+
+static void host_memory_backend_set_share(Object *o, bool value, Error **errp)
+{
+HostMemoryBackend *backend = MEMORY_BACKEND(o);
+
+if (host_memory_backend_mr_inited(backend)) {
+error_setg(errp, "cannot change property value");
+return;
+}
+backend->share = value;
+}
+
 static void
 host_memory_backend_class_init(ObjectClass *oc, void *data)
 {
@@ -399,6 +417,9 @@ host_memory_backend_class_init(ObjectClass *oc, void *data)
 host_memory_backend_get_policy,
 

[Qemu-devel] [PATCH V9 2/4] docs: add pvrdma device documentation.

2018-02-01 Thread Marcel Apfelbaum
Signed-off-by: Marcel Apfelbaum 
Signed-off-by: Yuval Shaia 
Reviewed-by: Shamir Rabinovitch 
---
 docs/pvrdma.txt | 255 
 1 file changed, 255 insertions(+)
 create mode 100644 docs/pvrdma.txt

diff --git a/docs/pvrdma.txt b/docs/pvrdma.txt
new file mode 100644
index 00..5599318159
--- /dev/null
+++ b/docs/pvrdma.txt
@@ -0,0 +1,255 @@
+Paravirtualized RDMA Device (PVRDMA)
+
+
+
+1. Description
+===
+PVRDMA is the QEMU implementation of VMware's paravirtualized RDMA device.
+It works with its Linux Kernel driver AS IS, no need for any special guest
+modifications.
+
+While it complies with the VMware device, it can also communicate with bare
+metal RDMA-enabled machines and does not require an RDMA HCA in the host, it
+can work with Soft-RoCE (rxe).
+
+It does not require the whole guest RAM to be pinned allowing memory
+over-commit and, even if not implemented yet, migration support will be
+possible with some HW assistance.
+
+A project presentation accompanies this document:
+- http://events.linuxfoundation.org/sites/events/files/slides/lpc-2017-pvrdma-marcel-apfelbaum-yuval-shaia.pdf
+
+
+
+2. Setup
+
+
+
+2.1 Guest setup
+===
+Fedora 27+ kernels work out of the box, older distributions
+require updating the kernel to 4.14 to include the pvrdma driver.
+
+However the libpvrdma library needed by User Level Software is still
+not available as part of the distributions, so the rdma-core library
+needs to be compiled and optionally installed.
+
+Please follow the instructions at:
+  https://github.com/linux-rdma/rdma-core.git
+
+
+2.2 Host Setup
+==
+The pvrdma backend is an ibdevice interface that can be exposed
+either by a Soft-RoCE(rxe) device on machines with no RDMA device,
+or an HCA SRIOV function(VF/PF).
+Note that ibdevice interfaces can't be shared between pvrdma devices,
+each one requiring a separate instance (rxe or SRIOV VF).
+
+
+2.2.1 Soft-RoCE backend(rxe)
+===
+A stable version of rxe is required, Fedora 27+ or a Linux
+Kernel 4.14+ is preferred.
+
+The rdma_rxe module is part of the Linux Kernel but not loaded by default.
+Install the User Level library (librxe) following the instructions from:
+https://github.com/SoftRoCE/rxe-dev/wiki/rxe-dev:-Home
+
+Associate an ETH interface with rxe by running:
+   rxe_cfg add eth0
+An rxe0 ibdevice interface will be created and can be used as pvrdma backend.
+
+
+2.2.2 RDMA device Virtual Function backend
+==
+Nothing special is required, the pvrdma device can work not only with
+Ethernet links, but also with InfiniBand links.
+All that is needed is an ibdevice with an active port; for Mellanox cards
+this will be something like mlx5_6, which can serve as the backend.
+
+
+2.2.3 QEMU setup
+
+Configure QEMU with --enable-rdma flag, installing
+the required RDMA libraries.
+
+
+
+3. Usage
+
+Currently the device works only with memory-backed RAM,
+and it must be marked as "shared":
+   -m 1G \
+   -object memory-backend-ram,id=mb1,size=1G,share \
+   -numa node,memdev=mb1 \
+
+The pvrdma device is composed of two functions:
+ - Function 0 is a vmxnet Ethernet Device which is redundant in Guest
+   but is required to pass the ibdevice GID using its MAC.
+   Examples:
+ For an rxe backend using eth0 interface it will use its mac:
+   -device vmxnet3,addr=.0,multifunction=on,mac=
+ For an SRIOV VF, we take the Ethernet Interface exposed by it:
+   -device vmxnet3,multifunction=on,mac=
+ - Function 1 is the actual device:
+   -device pvrdma,addr=.1,backend-dev=,backend-gid-idx=,backend-port=
+   where the ibdevice can be rxe or RDMA VF (e.g. mlx5_4)
+ Note: Pay special attention that the GID at backend-gid-idx matches vmxnet's MAC.
+ The rules of conversion are part of the RoCE spec, but since manual conversion
+ is not required, spotting problems is not hard:
+Example: GID: fe80::::7efe:90ff:fecb:743a
+ MAC: 7c:fe:90:cb:74:3a
+Note the difference between the first byte of the MAC and the GID.
+
+
+
+4. Implementation details
+=
+
+
+4.1 Overview
+
+The device acts like a proxy between the Guest Driver and the host
+ibdevice interface.
+On configuration path:
+ - For every hardware resource request (PD/QP/CQ/...) the pvrdma will request
+   a resource from the backend interface, maintaining a 1-1 mapping
+   between the guest and host.
+On data path:
+ - Every post_send/receive received from the guest will be converted into
   a post_send/receive for the backend. The buffer data will not be touched
   or copied, resulting in near bare-metal performance for large enough buffers.
+ - Completions from the backend interface will result in completions for
+   the pvrdma device.
+
+
+4.2 

[Qemu-devel] [PATCH V9 0/4] hw/pvrdma: PVRDMA device implementation

2018-02-01 Thread Marcel Apfelbaum
V8 -> V9:
 - Addressed Dotan Barak's (offline) comments:
   - use g_malloc instead of malloc
   - re-arrange structs for better padding
   - some cosmetic changes
   - do not try to fetch CQE when CQ is going down
   - init state of QP changed to RESET
   - modify poll_cq
   - add fix to qkey handling so now qkey=0 is also supported
   - add validation to gid_index
   - fix memory leak with ah_key ref
 - Addressed Eduardo Habkost comments:
   - Add the mem-backed-ram "share" option to qemu-options.hx.
 - Rebased on latest master

V7 -> V8:
 - Addressed Michael S. Tsirkin comments:
   - fail to init the pvrdma device if target page size is different
 from the host size, or if the guest RAM is not backed by memory
 and shared.
   - Update documentation to include a note on huge memory regions
 registration and remove not needed info.
 - Removed "pci/shpc: Move function to generic header file" since it
   appears in latest maintainer pull request
 - Rebased on master

V6 -> V7:
 - Addressed Philippe Mathieu-Daudé comments
   - modified pow2roundup32 signature
   - added his RB tag (thanks)
 - Addressed Corenlia Huck comments:
   - Compiled the pvrdma for all archs and not only x86/arm (thanks)
   - Fixed typo in documentation
 - Rebased on latest master

V5 -> V6:
 - Found a ppc machine and solved ppc compilation issues
 - Tried to fix the s390x issue (still looking of a machine)

V4 -> V5:
 - Fixed (at least tried to) compilation issues

V3 -> V4:
 - Fixed documentation (added more impl details)
 - Fixed compilation errors discovered by patchew.
 - Addressed Michael S. Tsirkin comments:
   - Removed unnecessary typedefs and replace them with
 macros in VMware header files, together with explanations.
   - Moved more code from vmw specific to rdma generic code.
   - Added page size limitations to the documentation.

V2 -> V3:
 - Addressed Michael S. Tsirkin and Philippe Mathieu-Daudé comments:
   - Moved the device to hw/rdma
 - Addressed Michael S. Tsirkin comments:
   - Split the code into generic (hw/rdma) and VMWare
 specific (hw/rdma/vmw)
   - Added more details to documentation - VMware guest-host protocol.
   - Remove mad processing
   - limited the memory the Guest can pin.
 - Addressed Philippe Mathieu-Daudé comment:
   - s/roundup_pow_of_two/pow2roundup32 and move it to qemu/host-utils.h 
 - Added Shamir Rabinovitch's review to documentation
 - Rebased to latest master 

RFC -> V2:
 - Full implementation of the pvrdma device
 - Backend is an ibdevice interface, no need for the KDBR module


General description
===
PVRDMA is the QEMU implementation of VMware's paravirtualized RDMA device.
It works with its Linux Kernel driver AS IS, no need for any special guest
modifications.

While it complies with the VMware device, it can also communicate with bare
metal RDMA-enabled machines and does not require an RDMA HCA in the host, it
can work with Soft-RoCE (rxe).

It does not require the whole guest RAM to be pinned allowing memory
over-commit and, even if not implemented yet, migration support will be
possible with some HW assistance.


 Design
 ==
 - Follows the behavior of VMware's pvrdma device, however is not tightly
   coupled with it and most of the code can be reused if we decide to
   continue to a Virtio based RDMA device.

 - It exposes 3 BARs:
BAR 0 - MSIX, utilize 3 vectors for command ring, async events and
completions
BAR 1 - Configuration of registers
BAR 2 - UAR, used to pass HW commands from driver.

 - The device performs internal management of the RDMA
   resources (PDs, CQs, QPs, ...), meaning the objects
   are not directly coupled to a physical RDMA device resources.

The pvrdma backend is an ibdevice interface that can be exposed
either by a Soft-RoCE(rxe) device on machines with no RDMA device,
or an HCA SRIOV function(VF/PF).
Note that ibdevice interfaces can't be shared between pvrdma devices,
each one requiring a separate instance (rxe or SRIOV VF).


Tests and performance
=
Tested with SoftRoCE backend (rxe)/Mellanox ConnectX3,
and Mellanox ConnectX4 HCAs with:
  - VMs in the same host
  - VMs in different hosts 
  - VMs to bare metal.

The best performance was achieved with ConnectX HCAs and buffer sizes
bigger than 1MB, reaching the line rate of ~50Gb/s.
The conclusion is that using the PVRDMA device there are no
actual performance penalties compared to bare metal for big enough
buffers (which is quite common when using RDMA), while allowing
memory overcommit.

Marcel Apfelbaum (3):
  mem: add share parameter to memory-backend-ram
  docs: add pvrdma device documentation.
  MAINTAINERS: add entry for hw/rdma

Yuval Shaia (1):
  pvrdma: initial implementation

 MAINTAINERS   |   8 +
 Makefile.objs |   2 +
 backends/hostmem-file.c   |  25 +-
 backends/hostmem-ram.c|   4 +-
 backends/hostmem.c|  21 ++
 configure |   9 +-
 

Re: [Qemu-devel] [PATCH v12 0/6] scripts/qemu.py fixes and cleanups

2018-02-01 Thread Eduardo Habkost
Queued on python-next, thanks!

On Mon, Jan 22, 2018 at 09:50:27PM +0100, Amador Pahim wrote:
> Changes v11->v12:
>  - Rebase.
>  - Drop the already queued commit (qemu.py: remove unused import)
>  - Fix code filling the _monitor_address when it's None.
>  - Drop the redundant commit checking whether VM was running on launch()
>  - Improve the exception message when launch() is called twice with no
>shutdown() in between.
> Changes v10->v11:
>  - Fix messed indentation.
>  - Rename self._qemu_log_fd to self._qemu_log_file.
>  - Patch changelog in reverse order.
> Changes v9->v10:
>  - Keep method _remove_if_exists().
> Changes v8->v9:
>  - Some commits were already merged. Rebased the remaining ones.
>  - New commit to remove unused import.
>  - VM files control logic changed to use a temporary directory.
>  - Better naming for variable controlling the need of a shutdown before
>next launch.
> Changes v7->v8:
>  - Rebased.
>  - Reorder commits to avoid break->fix sequence.
>  - Split commits "use poll() instead of 'returncode'" and "refactor
>launch()".
>  - Don't ignore errors in _load_io_log(). Instead, check if we created
>the file before reading it.
>  - Use LOG.warn() instead of LOG.debug() for the negative exit code
>message.
>  - Fix the exception name called in commits "launch vm only if it's not
>running" and "don't launch again before shutdown()".
>  - Minor style fixes.
> Changes v6->v7:
>  - Split commits in self-contained/atomic changes.
>  - Addressed the comments from previous version, basically improving the
>logging messages and the control over created files. See individual
>commit messages for details.
> Changes v5->v6:
>  - Remove the commit to rename self._args.
>  - Fix is_running() return before first call to maunch().
>  - Use python logging system.
>  - Include the full command line on negative exit code error message.
>  - Use os.path.null instead of /dev/null.
>  - Improve the control over the created/deleted files.
> Changes v4->v5:
>  - Break the cleanup commit into logical changes and include in the
>commit messages the rationale for making them.
> Changes v3->v4:
>  - Squash the 2 first commits since they are co-dependant.
>  - Cleanup launch() and shutdown().
>  - Reorder the commits, putting the rename of self._args first.
>  - Rebased.
> Changes v2->v3:
>  - Fix typo in patch 3 ("qemu.py: make 'args' public") commit message.
> Changes v1->v2:
>  - Style fixes to make checkpatch.pl happy.
>  - Rebased.
> 
> *** SUBJECT HERE ***
> 
> *** BLURB HERE ***
> 
> Amador Pahim (6):
>   qemu.py: better control of created files
>   qemu.py: refactor launch()
>   qemu.py: always cleanup on shutdown()
>   qemu.py: use poll() instead of 'returncode'
>   qemu.py: cleanup redundant calls in launch()
>   qemu.py: don't launch again before shutdown()
> 
>  scripts/qemu.py | 93 ++---
>  1 file changed, 62 insertions(+), 31 deletions(-)
> 
> -- 
> 2.14.3
> 
> 

-- 
Eduardo



[Qemu-devel] [PATCH] target/arm/kvm: gic: Prevent creating userspace GICv3 with KVM

2018-02-01 Thread Christoffer Dall
KVM doesn't support emulating a GICv3 in userspace, only GICv2.  We
currently attempt this anyway, and as a result a KVM guest doesn't
receive interrupts and the user is left wondering why.  Report an error
to the user if this particular combination is requested.

Signed-off-by: Christoffer Dall 
---
 target/arm/kvm_arm.h | 4 
 1 file changed, 4 insertions(+)

diff --git a/target/arm/kvm_arm.h b/target/arm/kvm_arm.h
index ff53e9fafb..cfb7e5af72 100644
--- a/target/arm/kvm_arm.h
+++ b/target/arm/kvm_arm.h
@@ -234,6 +234,10 @@ static inline const char *gicv3_class_name(void)
 exit(1);
 #endif
 } else {
+if (kvm_enabled()) {
+error_report("Userspace GICv3 is not supported with KVM");
+exit(1);
+}
 return "arm-gicv3";
 }
 }
-- 
2.14.2




Re: [Qemu-devel] [PATCH v5 11/14] input: add missing JIS keys to virtio input

2018-02-01 Thread Eduardo Habkost
On Tue, Jan 16, 2018 at 01:42:14PM +, Daniel P. Berrange wrote:
> From: Miika S 
> 
> keycodemapdb updated to add the QKeyCodes muhenkan and katakanahiragana
> 
> Signed-off-by: Miika S 

Oops, this conflicts with:

commit ae6b06ab655b21c19b234ce3422f694d11a013e0
Author: Daniel P. Berrange 
Date:   Wed Jan 17 16:41:18 2018 +

hw: convert virtio-input-hid device to keycodemapdb

[...]

Patch 11/14 and 12/14 need to be redone.  I'm removing patches
11-14 from python-next until this is sorted out.

-- 
Eduardo



Re: [Qemu-devel] [PATCH v12 1/6] qemu.py: better control of created files

2018-02-01 Thread Eduardo Habkost
On Mon, Jan 22, 2018 at 09:50:28PM +0100, Amador Pahim wrote:
> To launch a VM, we need to create basically two files: the monitor
> socket (if it's a UNIX socket) and the qemu log file.
> 
> For the qemu log file, we currently just open the path, which will
> create the file if it does not exist or overwrite the file if it does
> exist.
> 
> For the monitor socket, if it already exists, we are currently removing
> it, even if it's not created by us.
> 
> This patch moves to _pre_launch() the responsibility to create a
> temporary directory to host the files so we can remove the whole
> directory on _post_shutdown().
> 
> Signed-off-by: Amador Pahim 
> ---
>  scripts/qemu.py | 39 +--
>  1 file changed, 29 insertions(+), 10 deletions(-)
> 
> diff --git a/scripts/qemu.py b/scripts/qemu.py
> index 9bfdf6d37d..453a67250a 100644
> --- a/scripts/qemu.py
> +++ b/scripts/qemu.py
> @@ -18,6 +18,8 @@ import os
>  import sys
>  import subprocess
>  import qmp.qmp
> +import shutil
> +import tempfile
>  
>  
>  LOG = logging.getLogger(__name__)
> @@ -73,10 +75,11 @@ class QEMUMachine(object):
>  wrapper = []
>  if name is None:
>  name = "qemu-%d" % os.getpid()
> -if monitor_address is None:
> -monitor_address = os.path.join(test_dir, name + "-monitor.sock")
> +self._name = name
>  self._monitor_address = monitor_address
> -self._qemu_log_path = os.path.join(test_dir, name + ".log")
> +self._vm_monitor = None
> +self._qemu_log_path = None
> +self._qemu_log_file = None
>  self._popen = None
>  self._binary = binary
>  self._args = list(args) # Force copy args in case we modify them
> @@ -86,6 +89,8 @@ class QEMUMachine(object):
>  self._socket_scm_helper = socket_scm_helper
>  self._qmp = None
>  self._qemu_full_args = None
> +self._test_dir = test_dir
> +self._temp_dir = None
>  
>  # just in case logging wasn't configured by the main script:
>  logging.basicConfig()
> @@ -168,36 +173,50 @@ class QEMUMachine(object):
>  self._monitor_address[0],
>  self._monitor_address[1])

If _monitor_address now needs to be translated to _vm_monitor,
I'd like to remove all usage of _monitor_address outside
_pre_launch, to avoid confusion between the two attributes.

This can be done in a follow-up patch.

Reviewed-by: Eduardo Habkost 

>  else:
> -moncdev = 'socket,id=mon,path=%s' % self._monitor_address
> +moncdev = 'socket,id=mon,path=%s' % self._vm_monitor
>  return ['-chardev', moncdev,
>  '-mon', 'chardev=mon,mode=control',
>  '-display', 'none', '-vga', 'none']
>  
>  def _pre_launch(self):
> -self._qmp = qmp.qmp.QEMUMonitorProtocol(self._monitor_address,
> +self._temp_dir = tempfile.mkdtemp(dir=self._test_dir)
> +if self._monitor_address is not None:
> +self._vm_monitor = self._monitor_address
> +else:
> +self._vm_monitor = os.path.join(self._temp_dir,
> +self._name + "-monitor.sock")
> +self._qemu_log_path = os.path.join(self._temp_dir, self._name + ".log")
> +self._qemu_log_file = open(self._qemu_log_path, 'wb')
> +
> +self._qmp = qmp.qmp.QEMUMonitorProtocol(self._vm_monitor,
>  server=True)
>  
>  def _post_launch(self):
>  self._qmp.accept()
>  
>  def _post_shutdown(self):
> -if not isinstance(self._monitor_address, tuple):
> -self._remove_if_exists(self._monitor_address)
> -self._remove_if_exists(self._qemu_log_path)
> +if self._qemu_log_file is not None:
> +self._qemu_log_file.close()
> +self._qemu_log_file = None
> +
> +self._qemu_log_path = None
> +
> +if self._temp_dir is not None:
> +shutil.rmtree(self._temp_dir)
> +self._temp_dir = None
>  
>  def launch(self):
>  '''Launch the VM and establish a QMP connection'''
>  self._iolog = None
>  self._qemu_full_args = None
>  devnull = open(os.path.devnull, 'rb')
> -qemulog = open(self._qemu_log_path, 'wb')
>  try:
>  self._pre_launch()
>  self._qemu_full_args = (self._wrapper + [self._binary] +
>  self._base_args() + self._args)
>  self._popen = subprocess.Popen(self._qemu_full_args,
> stdin=devnull,
> -   stdout=qemulog,
> +   stdout=self._qemu_log_file,
> stderr=subprocess.STDOUT,
> 

[Qemu-devel] [PATCH] spapr: add missing break in h_get_cpu_characteristics()

2018-02-01 Thread Greg Kurz
Detected by Coverity (CID 1385702). This fixes the recently added hypercall
to let guests properly apply Spectre and Meltdown workarounds.

Fixes: c59704b25473 "target/ppc/spapr: Add H-Call H_GET_CPU_CHARACTERISTICS"
Signed-off-by: Greg Kurz 
---
 hw/ppc/spapr_hcall.c |1 +
 1 file changed, 1 insertion(+)

diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
index 4d0e6eb0cf1d..596f58378a40 100644
--- a/hw/ppc/spapr_hcall.c
+++ b/hw/ppc/spapr_hcall.c
@@ -1697,6 +1697,7 @@ static target_ulong h_get_cpu_characteristics(PowerPCCPU *cpu,
 switch (safe_indirect_branch) {
 case SPAPR_CAP_FIXED:
 characteristics |= H_CPU_CHAR_BCCTRL_SERIALISED;
+break;
 default: /* broken */
 assert(safe_indirect_branch == SPAPR_CAP_BROKEN);
 break;
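To make the effect of the missing break concrete, here is a small illustrative model in Python (which has no switch fall-through, so it is modeled with an explicit flag). The constants mirror the spapr capability levels but are used here purely for illustration, not taken from QEMU:

```python
# Illustrative model of the Coverity finding: in the C code, the FIXED
# case had no "break", so control fell through into "default", whose
# assertion only tolerates SPAPR_CAP_BROKEN.
SPAPR_CAP_BROKEN = 0
SPAPR_CAP_FIXED = 2
H_CPU_CHAR_BCCTRL_SERIALISED = 1 << 0  # placeholder bit value

def get_characteristics(safe_indirect_branch, with_break=True):
    characteristics = 0
    fell_through = False
    if safe_indirect_branch == SPAPR_CAP_FIXED:
        characteristics |= H_CPU_CHAR_BCCTRL_SERIALISED
        if not with_break:
            fell_through = True  # models the missing C "break"
    if fell_through or safe_indirect_branch != SPAPR_CAP_FIXED:
        # the C default branch: assert(... == SPAPR_CAP_BROKEN)
        assert safe_indirect_branch == SPAPR_CAP_BROKEN
    return characteristics
```

With with_break=False, calling get_characteristics(SPAPR_CAP_FIXED) raises AssertionError, which is exactly the failure mode the one-line patch removes.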




Re: [Qemu-devel] [PATCH v3 38/39] iotests: Test downgrading an image using a small L2 slice size

2018-02-01 Thread Max Reitz
On 2018-01-26 16:00, Alberto Garcia wrote:
> expand_zero_clusters_in_l1() is used when downgrading qcow2 images
> from v3 to v2 (compat=0.10). This is one of the functions that needed
> more changes to support L2 slices, so this patch extends iotest 061 to
> test downgrading a qcow2 image using a smaller slice size.
> 
> Signed-off-by: Alberto Garcia 
> ---
>  tests/qemu-iotests/061 | 16 
>  tests/qemu-iotests/061.out | 61 ++
>  2 files changed, 77 insertions(+)

Well, it would be better if we could just specify this option for all of
the qcow2 tests, but I can see that that's not really feasible...

Reviewed-by: Max Reitz 





Re: [Qemu-devel] [PULL v5 39/43] hw/hppa: Implement DINO system board

2018-02-01 Thread Helge Deller
* Thomas Huth :
> First kudos for the new hppa-softmmu target!

Thanks Thomas!

> But these debug messages now pop up during "make check-qtest":
> 
> Serial port created at 0xfff83800
> Firmware loaded at 0xf000-0xf0023f29, entry at 0xf084.
> 
> That's a little bit annoying. Could you please either remove them or
> silence them in qtest mode by adding a check for !qtest_enabled() ?

The patch below fixes it.
Richard, can you apply it?

Helge

---

target/hppa: Silence debug messages in qtest mode

Reported-by: Thomas Huth 
Signed-off-by: Helge Deller 

diff --git a/hw/hppa/machine.c b/hw/hppa/machine.c
index afd3867313..338aa87724 100644
--- a/hw/hppa/machine.c
+++ b/hw/hppa/machine.c
@@ -12,6 +12,7 @@
 #include "hw/boards.h"
 #include "qemu/error-report.h"
 #include "sysemu/sysemu.h"
+#include "sysemu/qtest.h"
 #include "hw/timer/mc146818rtc.h"
 #include "hw/ide.h"
 #include "hw/timer/i8254.h"
@@ -111,7 +112,9 @@ static void machine_hppa_init(MachineState *machine)
 uint32_t addr = DINO_UART_HPA + 0x800;
 serial_mm_init(addr_space, addr, 0, serial_irq,
115200, serial_hds[0], DEVICE_BIG_ENDIAN);
-fprintf(stderr, "Serial port created at 0x%x\n", addr);
+if (!qtest_enabled()) {
+fprintf(stderr, "Serial port created at 0x%x\n", addr);
+}
 }
 
 /* SCSI disk setup. */
@@ -146,9 +149,11 @@ static void machine_hppa_init(MachineState *machine)
 error_report("could not load firmware '%s'", firmware_filename);
 exit(1);
 }
-fprintf(stderr, "Firmware loaded at 0x%08" PRIx64 "-0x%08" PRIx64
+if (!qtest_enabled()) {
+fprintf(stderr, "Firmware loaded at 0x%08" PRIx64 "-0x%08" PRIx64
 ", entry at 0x%08" PRIx64 ".\n",
 firmware_low, firmware_high, firmware_entry);
+}
 if (firmware_low < ram_size || firmware_high >= FIRMWARE_END) {
 error_report("Firmware overlaps with memory or IO space");
 exit(1);



Re: [Qemu-devel] [PATCH] pcie-root-port: let it has higher migrate priority

2018-02-01 Thread Dr. David Alan Gilbert
* Marcel Apfelbaum (mar...@redhat.com) wrote:
> On 01/02/2018 21:48, Dr. David Alan Gilbert wrote:
> > * Michael S. Tsirkin (m...@redhat.com) wrote:
> >> On Thu, Feb 01, 2018 at 02:24:15PM +0200, Marcel Apfelbaum wrote:
> >>> Hi Peter,
> >>>
> >>> On 01/02/2018 13:20, Peter Xu wrote:
>  In the past, we prioritized IOMMU migration so that we have such a
>  priority order:
> 
>  IOMMU > PCI Devices
> 
>  When migrating a guest with both vIOMMU and pcie-root-port, we'll always
>  migrate vIOMMU first, since pcie-root-port will be seen to have the same
>  priority of general PCI devices.
> 
>  That's problematic.
> 
>  The thing is that PCI bus number information is stored in the root port,
>  and that is needed by vIOMMU during post_load(), e.g., to figure out
>  context entry for a device.  If we don't have correct bus numbers for
>  devices, we won't be able to recover device state of the DMAR memory
>  regions, and things will be messed up.
> 
>  So let's boost the PCIe root ports to be even with higher priority:
> 
> PCIe Root Port > IOMMU > PCI Devices
> 
>  A smoke test shows that this patch fixes bug 1538953.
> 
>  CC: Alex Williamson 
>  CC: Marcel Apfelbaum 
>  CC: Michael S. Tsirkin 
>  CC: Dr. David Alan Gilbert 
>  CC: Juan Quintela 
>  CC: Laurent Vivier 
>  Bug: https://bugzilla.redhat.com/show_bug.cgi?id=1538953
>  Reported-by: Maxime Coquelin 
>  Signed-off-by: Peter Xu 
>  ---
>  Marcel & all,
> 
>  I think it's possible that we need similar thing for other bridge-like
>  devices, but I'm not that familiar.  Would you help confirm?  Thanks,
> >>>
> >>> Is a pity we don't have a way to mark the migration priority
> >>> in a base class. Dave, maybe we do have a way?
> >>>
> >>> In the meantime you would need to add it also to:
> >>> - ioh3420 (Intel root port)
> >>> - xio3130_downstream (Intel switch downstream port)
> >>> - xio3130_upstream (The counterpart of the above, you want the whole
> >>>   switch to be migrated before loading the IOMMU device state)
> >>> - pcie_pci_bridge (for pci devices)
> >>> - pci-pci bridge (if for some reason you have one attached to the pcie_pci_bridge)
> >>> - i82801b11 (dmi-pci bridge, we want to deprecate it but it is there for now)
> >>>
> >>> Thanks,
> >>> Marcel
> >>
> >> It's kind of strange that we need to set the priority manually.
> >> Can't migration figure it out itself?
> > 
> >> I think bus
> >> must always be migrated before the devices behind it ...
> > 
> > I think that's true; but:
> >   a) Is the iommu a child of any of the PCI busses?
> 
> No, is a sysbus device.

OK, so even if we were arguing about whether PCI busses got migrated in
order, that doesn't help Peter's case, because there's nothing that
orders the iommu relative to the PCI.

> >   b) does anything ensure that the bridge that's a parent of a bus
> >  gets migrated before the bus it provides?
> 
> I think this was Michael's question :)
> 
> >   c) What happens with more htan one root port?
> > 
> 
> Root ports can't be nested, anyway, I suppose the migration should
> follow the bus numbering order.
> 
> The question now is what happens if the migration is happening before
> the guest firmware finishes assigning numbers to buses...
> 
> Still, QEMU has enough information to decide the right ordering,
> the question is if the current migration mechanism has some ordering
> or the only one is the "VMStateFlags".

It's ordered on two things:
  a) The priority field that Peter is using
  b) The order of registration of the device.

(b) is of course dangerously unstable.

Dave
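The two ordering rules listed above can be sketched as a stable sort: entries are ordered by descending priority, and registration order is the tie-breaker within a priority. This is a hypothetical model of the rule, not QEMU's actual savevm code:

```python
# Model of migration ordering: (a) the priority field, then (b) the order
# of registration. A stable sort preserves registration order among
# entries with equal priority, which is why (b) is fragile in practice.
MIG_PRI_DEFAULT = 0
MIG_PRI_IOMMU = 1
MIG_PRI_PCIE_ROOT_PORT = 2

def migration_order(entries):
    # entries: list of (name, priority) tuples in registration order
    return [name for name, prio in
            sorted(entries, key=lambda e: -e[1])]  # sorted() is stable

devices = [
    ("virtio-net", MIG_PRI_DEFAULT),      # registered first
    ("intel-iommu", MIG_PRI_IOMMU),
    ("pcie-root-port", MIG_PRI_PCIE_ROOT_PORT),
    ("e1000", MIG_PRI_DEFAULT),           # same priority as virtio-net
]
print(migration_order(devices))
# → ['pcie-root-port', 'intel-iommu', 'virtio-net', 'e1000']
```

Swapping the registration order of the two default-priority devices changes their relative position in the result, illustrating why relying on registration order alone is "dangerously unstable".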

> Thanks,
> Marcel
> 
> > Dave
> > 
> > 
>   hw/pci-bridge/gen_pcie_root_port.c | 1 +
>   include/migration/vmstate.h| 1 +
>   2 files changed, 2 insertions(+)
> 
>  diff --git a/hw/pci-bridge/gen_pcie_root_port.c b/hw/pci-bridge/gen_pcie_root_port.c
>  index 0e2f2e8bf1..e6ff1effd8 100644
>  --- a/hw/pci-bridge/gen_pcie_root_port.c
>  +++ b/hw/pci-bridge/gen_pcie_root_port.c
>  @@ -101,6 +101,7 @@ static void gen_rp_realize(DeviceState *dev, Error **errp)
>   
>   static const VMStateDescription vmstate_rp_dev = {
>   .name = "pcie-root-port",
>  +.priority = MIG_PRI_PCIE_ROOT_PORT,
>   .version_id = 1,
>   .minimum_version_id = 1,
>   .post_load = pcie_cap_slot_post_load,
>  diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
>  index 8c3889433c..491449db9f 100644
>  --- a/include/migration/vmstate.h
>  +++ b/include/migration/vmstate.h
>  @@ -148,6 +148,7 @@ enum VMStateFlags {
>   typedef enum {
>   MIG_PRI_DEFAULT = 0,
> 

Re: [Qemu-devel] [PATCH v6 13/23] hmp: display memory encryption support in 'info kvm'

2018-02-01 Thread Dr. David Alan Gilbert
* Brijesh Singh (brijesh.si...@amd.com) wrote:
> 
> 
> On 2/1/18 11:58 AM, Dr. David Alan Gilbert wrote:
> > * Brijesh Singh (brijesh.si...@amd.com) wrote:
> >> update 'info kvm' to display the memory encryption support.
> >>
> >> (qemu) info kvm
> >> kvm support: enabled
> >> memory encryption: disabled
> > As Markus said, this should be split qmp/hmp; but something else to
> > think about is whether this is a boolean or needs to be an enum;  do
> > you have one version of encryption or are we going to need to flag up
> > versions or the features of the encryption?
> 
> In future I could see us providing encrypted state status when we
> implement SEV-ES support, something like
> 
> (qemu) info kvm
> kvm support: enabled
> memory encryption: enabled
> cpu register state: encrypted
> 
> but so far I do not see need to provide the version string. If user
> wants to know the SEV version then it can open /dev/sev device to get
> platform status and more.

Yes, I was worried a bit more about how general that was going to be
or whether we're collecting a lot of architecture specific fields here.
So I wondered, if it was an enum, whether that would be come:

memory encryption: none

memory encryption: SEV

memory encryption: SEV-ES

(I'm not too sure whether that's better or not, just a suggestion)

Dave

> 
> > Dave
> >
> >> Cc: "Dr. David Alan Gilbert" 
> >> Cc: Eric Blake 
> >> Cc: Markus Armbruster 
> >> Cc: Paolo Bonzini 
> >> Signed-off-by: Brijesh Singh 
> >> ---
> >>  hmp.c| 2 ++
> >>  qapi-schema.json | 5 -
> >>  qmp.c| 1 +
> >>  3 files changed, 7 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/hmp.c b/hmp.c
> >> index 056bf70cf1e2..6ceb6b30af75 100644
> >> --- a/hmp.c
> >> +++ b/hmp.c
> >> @@ -88,6 +88,8 @@ void hmp_info_kvm(Monitor *mon, const QDict *qdict)
> >>  monitor_printf(mon, "kvm support: ");
> >>  if (info->present) {
> >> monitor_printf(mon, "%s\n", info->enabled ? "enabled" : "disabled");
> >> +monitor_printf(mon, "memory encryption: %s\n",
> >> +   info->mem_encryption ? "enabled" : "disabled");
> >>  } else {
> >>  monitor_printf(mon, "not compiled\n");
> >>  }
> >> diff --git a/qapi-schema.json b/qapi-schema.json
> >> index 5c06745c7927..2046c96669bf 100644
> >> --- a/qapi-schema.json
> >> +++ b/qapi-schema.json
> >> @@ -314,9 +314,12 @@
> >>  #
> >>  # @present: true if KVM acceleration is built into this executable
> >>  #
> >> +# @mem-encryption: true if Memory Encryption is active (since 2.12)
> >> +#
> >>  # Since: 0.14.0
> >>  ##
> >> -{ 'struct': 'KvmInfo', 'data': {'enabled': 'bool', 'present': 'bool'} }
> >> +{ 'struct': 'KvmInfo', 'data': {'enabled': 'bool', 'present': 'bool',
> >> +'mem-encryption' : 'bool'} }
> >>  
> >>  ##
> >>  # @query-kvm:
> >> diff --git a/qmp.c b/qmp.c
> >> index 52cfd2d81c0f..3a527bc8c39c 100644
> >> --- a/qmp.c
> >> +++ b/qmp.c
> >> @@ -69,6 +69,7 @@ KvmInfo *qmp_query_kvm(Error **errp)
> >>  
> >>  info->enabled = kvm_enabled();
> >>  info->present = kvm_available();
> >> +info->mem_encryption = kvm_memcrypt_enabled();
> >>  
> >>  return info;
> >>  }
> >> -- 
> >> 2.9.5
> >>
> > --
> > Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
> 
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK



Re: [Qemu-devel] [PATCH v3 37/39] iotests: Test valid values of l2-cache-entry-size

2018-02-01 Thread Max Reitz
On 2018-01-26 16:00, Alberto Garcia wrote:
> The l2-cache-entry-size setting can only contain values that are
> powers of two between 512 and the cluster size.
> 
> Signed-off-by: Alberto Garcia 
> ---
>  tests/qemu-iotests/103 | 17 +
>  tests/qemu-iotests/103.out |  3 +++
>  2 files changed, 20 insertions(+)

Reviewed-by: Max Reitz 





Re: [Qemu-devel] [PATCH] pcie-root-port: let it has higher migrate priority

2018-02-01 Thread Marcel Apfelbaum
On 01/02/2018 21:48, Dr. David Alan Gilbert wrote:
> * Michael S. Tsirkin (m...@redhat.com) wrote:
>> On Thu, Feb 01, 2018 at 02:24:15PM +0200, Marcel Apfelbaum wrote:
>>> Hi Peter,
>>>
>>> On 01/02/2018 13:20, Peter Xu wrote:
 In the past, we prioritized IOMMU migration so that we have such a
 priority order:

 IOMMU > PCI Devices

 When migrating a guest with both vIOMMU and pcie-root-port, we'll always
 migrate vIOMMU first, since pcie-root-port will be seen to have the same
 priority of general PCI devices.

 That's problematic.

 The thing is that PCI bus number information is stored in the root port,
 and that is needed by vIOMMU during post_load(), e.g., to figure out
 context entry for a device.  If we don't have correct bus numbers for
 devices, we won't be able to recover device state of the DMAR memory
 regions, and things will be messed up.

 So let's boost the PCIe root ports to be even with higher priority:

PCIe Root Port > IOMMU > PCI Devices

 A smoke test shows that this patch fixes bug 1538953.

 CC: Alex Williamson 
 CC: Marcel Apfelbaum 
 CC: Michael S. Tsirkin 
 CC: Dr. David Alan Gilbert 
 CC: Juan Quintela 
 CC: Laurent Vivier 
 Bug: https://bugzilla.redhat.com/show_bug.cgi?id=1538953
 Reported-by: Maxime Coquelin 
 Signed-off-by: Peter Xu 
 ---
 Marcel & all,

 I think it's possible that we need similar thing for other bridge-like
 devices, but I'm not that familiar.  Would you help confirm?  Thanks,
>>>
>>> Is a pity we don't have a way to mark the migration priority
>>> in a base class. Dave, maybe we do have a way?
>>>
>>> In the meantime you would need to add it also to:
>>> - ioh3420 (Intel root port)
>>> - xio3130_downstream (Intel switch downstream port)
>>> - xio3130_upstream (The counterpart of the above, you want the whole
>>>   switch to be migrated before loading the IOMMU device state)
>>> - pcie_pci_bridge (for pci devices)
>>> - pci-pci bridge (if for some reason you have one attached to the pcie_pci_bridge)
>>> - i82801b11 (dmi-pci bridge, we want to deprecate it but it is there for now)
>>>
>>> Thanks,
>>> Marcel
>>
>> It's kind of strange that we need to set the priority manually.
>> Can't migration figure it out itself?
> 
>> I think bus
>> must always be migrated before the devices behind it ...
> 
> I think that's true; but:
>   a) Is the iommu a child of any of the PCI busses?

No, is a sysbus device.

>   b) does anything ensure that the bridge that's a parent of a bus
>  gets migrated before the bus it provides?

I think this was Michael's question :)

>   c) What happens with more than one root port?
> 

Root ports can't be nested, anyway, I suppose the migration should
follow the bus numbering order.

The question now is what happens if the migration is happening before
the guest firmware finishes assigning numbers to buses...

Still, QEMU has enough information to decide the right ordering,
the question is if the current migration mechanism has some ordering
or the only one is the "VMStateFlags".

Thanks,
Marcel

> Dave
> 
> 
  hw/pci-bridge/gen_pcie_root_port.c | 1 +
  include/migration/vmstate.h| 1 +
  2 files changed, 2 insertions(+)

 diff --git a/hw/pci-bridge/gen_pcie_root_port.c b/hw/pci-bridge/gen_pcie_root_port.c
 index 0e2f2e8bf1..e6ff1effd8 100644
 --- a/hw/pci-bridge/gen_pcie_root_port.c
 +++ b/hw/pci-bridge/gen_pcie_root_port.c
 @@ -101,6 +101,7 @@ static void gen_rp_realize(DeviceState *dev, Error **errp)
  
  static const VMStateDescription vmstate_rp_dev = {
  .name = "pcie-root-port",
 +.priority = MIG_PRI_PCIE_ROOT_PORT,
  .version_id = 1,
  .minimum_version_id = 1,
  .post_load = pcie_cap_slot_post_load,
 diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
 index 8c3889433c..491449db9f 100644
 --- a/include/migration/vmstate.h
 +++ b/include/migration/vmstate.h
 @@ -148,6 +148,7 @@ enum VMStateFlags {
  typedef enum {
  MIG_PRI_DEFAULT = 0,
  MIG_PRI_IOMMU,  /* Must happen before PCI devices */
 +MIG_PRI_PCIE_ROOT_PORT, /* Must happen before IOMMU */
  MIG_PRI_GICV3_ITS,  /* Must happen before PCI devices */
  MIG_PRI_GICV3,  /* Must happen before the ITS */
  MIG_PRI_MAX,

> --
> Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
> 




Re: [Qemu-devel] [PATCH V1 1/1] tests: Add migration test for aarch64

2018-02-01 Thread Dr. David Alan Gilbert
* Wei Huang (w...@redhat.com) wrote:
> 
> 
> On 01/25/2018 02:05 PM, Dr. David Alan Gilbert wrote:
> > * Wei Huang (w...@redhat.com) wrote:
> >> This patch adds the migration test support for aarch64. The test code,
> >> which implements the same functionality as x86, is compiled into a binary
> >> and booted as a kernel in qemu. Here are the design ideas:
> >>
> >>  * We choose this -kernel design because aarch64 QEMU doesn't provide a
> >>built-in fw like x86 does. So instead of relying on a boot loader, we
> >>use -kernel approach for aarch64.
> >>  * The serial output is sent to PL011 directly.
> >>  * The physical memory base for mach-virt machine is 0x4000. We change
> >>the start_address and end_address for aarch64.
> >>
> >> RFC->V1:
> >>  * aarch64 kernel blob is defined as an uint32_t array
> >>  * The test code is re-written to address a data caching issue under KVM.
> >>Tests passed under both x86 and aarch64.
> >>  * Re-use init_bootfile_x86() for both x86 and aarch64
> >>  * Other minor fixes
> >>
> >> Note that the test code is as the following:
> >>
> >> .section .text
> >>
> >> .globl  start
> >> 
> >> start:
> >> /* disable MMU to use phys mem address */
> >> mrs x0, sctlr_el1
> >> bic x0, x0, #(1<<0)
> >> msr sctlr_el1, x0
> >> isb
> >>
> >> /* output char 'A' to PL011 */
> >> mov w4, #65
> >> mov x5, #0x900
> >> strb w4, [x5]
> >>
> >> /* w6 keeps a counter so we can limit the output speed */
> >> mov w6, #0
> >>
> >> /* phys mem base addr = 0x4000 */
> >> mov x3, #(0x4000 + 100 *1024*1024) /* traverse 1M-100M */
> >> mov x2, #(0x4000 + 1 * 1024*1024)
> >>
> >> /* clean up memory first */
> >> mov w1, #0  
> >> clean:
> >> strb w1, [x2]
> >> add x2, x2, #(4096)
> >> cmp x2, x3
> >> ble clean
> >>
> >> /* main body */
> >> mainloop:
> >> mov x2, #(0x4000 + 1 * 1024*1024)
> >> 
> >> innerloop:
> >> /* clean cache because el2 might still cache guest data under KVM 
> >> */
> >> dc  civac, x2
> > 
> > Can you explain a bit more about that please;  if it's guest
> > visible acorss migration, doesn't that suggest we need the cache clear
> > in KVM or QEMU?
> 
> I think this is ARM specific. This is caused by the inconsistency
> between guest VM's data accesses and userspace's accesses (in
> check_guest_ram) at the destination:
> 
> 1) Because uncacheable (guest) + cacheable (host) ==> uncacheable. So
> the data accesses from guest VM go directly into memory.
> 2) QEMU user space will use the cacheable version. So it is possible
> that data can come from cache, instead of RAM. The difference between
> (1) and (2) obviously can cause check_guest_ram() to fail sometimes.
> 
> x86's EPT has the capability to ignore guest-provided memory type. So it
> is possible to have consistent data views between guest and QEMU user-space.

I'll admit to not quite understanding where this thread ended up;
it seems to have fallen into other ARM consistency questions.
Other than that it looks fine to me, but see the separate patch I posted
yesterday that includes the x86 source.

Peter: What's your view on the right fix for the memory consistency?

Dave

> 
> > 
> > Dave
> > 
> >> ldrb w1, [x2]
> >> add w1, w1, #1
> >> and w1, w1, #(0xff)
> >> strb w1, [x2]
> >>
> >> add x2, x2, #(4096)
> >> cmp x2, x3
> >> blt innerloop
> >>
> >> add w6, w6, #1
> >> and w6, w6, #(0xff)
> >> cmp w6, #0
> >> bne mainloop
> >> 
> >> /* output char 'B' to PL011 */
> >> mov w4, #66
> >> mov x5, #0x900
> >> strb w4, [x5]
> >>
> >> bl  mainloop
> >>
> >> The code is compiled with the following commands:
> >>  > gcc -c -o fill.o fill.s
> >>  > gcc -O2 -o fill.elf -Wl,-T,/tmp/flat.lds,--build-id=none,-Ttext=4008 \
> >>-nostdlib fill.o
> >>  > objcopy -O binary fill.elf fill.flat
> >>  > truncate -c -s 144 ./fill.flat
> >>  > xxd -g 4 -c 24 -e fill.flat | awk '{print "0x"$2 ", " "0x"$3 ", " "0x"C$4 ", " "0x"C$5 ", " "0x"$6 ", " "0x"C$7 "," }'
> >>
> >> The linker file (borrowed from KVM unit test) is defined as:
> >>
> >> SECTIONS
> >> {
> >> .text : { *(.init) *(.text) *(.text.*) }
> >> . = ALIGN(64K);
> >> etext = .;
> >> .data : {
> >> *(.data)
> >> }
> >> . = ALIGN(16);
> >> .rodata : { *(.rodata) }
> >> . = ALIGN(16);
> >> .bss : { *(.bss) }
> >> . = ALIGN(64K);
> >> edata = .;
> >> . += 64K;
> >> . = ALIGN(64K);
> >> /*
> >>  * stack depth is 16K for arm and PAGE_SIZE for arm64, see THREAD_SIZE
> >>  * sp must be 16 byte aligned for 

Re: [Qemu-devel] [PATCH v3 35/39] qcow2: Rename l2_table in count_cow_clusters()

2018-02-01 Thread Max Reitz
On 2018-01-26 16:00, Alberto Garcia wrote:
> This function doesn't need any changes to support L2 slices, but since
> it's now dealing with slices intead of full tables, the l2_table
> variable is renamed for clarity.
> 
> Signed-off-by: Alberto Garcia 
> Reviewed-by: Eric Blake 
> ---
>  block/qcow2-cluster.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)

Reviewed-by: Max Reitz 





Re: [Qemu-devel] [PATCH] pcie-root-port: let it has higher migrate priority

2018-02-01 Thread Dr. David Alan Gilbert
* Michael S. Tsirkin (m...@redhat.com) wrote:
> On Thu, Feb 01, 2018 at 02:24:15PM +0200, Marcel Apfelbaum wrote:
> > Hi Peter,
> > 
> > On 01/02/2018 13:20, Peter Xu wrote:
> > > In the past, we prioritized IOMMU migration so that we have such a
> > > priority order:
> > > 
> > > IOMMU > PCI Devices
> > > 
> > > When migrating a guest with both vIOMMU and pcie-root-port, we'll always
> > > migrate vIOMMU first, since pcie-root-port will be seen to have the same
> > > priority of general PCI devices.
> > > 
> > > That's problematic.
> > > 
> > > The thing is that PCI bus number information is stored in the root port,
> > > and that is needed by vIOMMU during post_load(), e.g., to figure out
> > > context entry for a device.  If we don't have correct bus numbers for
> > > devices, we won't be able to recover device state of the DMAR memory
> > > regions, and things will be messed up.
> > > 
> > > So let's boost the PCIe root ports to be even with higher priority:
> > > 
> > >PCIe Root Port > IOMMU > PCI Devices
> > > 
> > > A smoke test shows that this patch fixes bug 1538953.
> > > 
> > > CC: Alex Williamson 
> > > CC: Marcel Apfelbaum 
> > > CC: Michael S. Tsirkin 
> > > CC: Dr. David Alan Gilbert 
> > > CC: Juan Quintela 
> > > CC: Laurent Vivier 
> > > Bug: https://bugzilla.redhat.com/show_bug.cgi?id=1538953
> > > Reported-by: Maxime Coquelin 
> > > Signed-off-by: Peter Xu 
> > > ---
> > > Marcel & all,
> > > 
> > > I think it's possible that we need similar thing for other bridge-like
> > > devices, but I'm not that familiar.  Would you help confirm?  Thanks,
> > 
> > Is a pity we don't have a way to mark the migration priority
> > in a base class. Dave, maybe we do have a way?
> > 
> > In the meantime you would need to add it also to:
> > - ioh3420 (Intel root port)
> > - xio3130_downstream (Intel switch downstream port)
> > - xio3130_upstream (The counterpart of the above, you want the whole
> >   switch to be migrated before loading the IOMMU device state)
> > - pcie_pci_bridge (for pci devices)
> > - pci-pci bridge (if for some reason you have one attached to the pcie_pci_bridge)
> > - i82801b11 (dmi-pci bridge, we want to deprecate it but it is there for now)
> > 
> > Thanks,
> > Marcel
> 
> It's kind of strange that we need to set the priority manually.
> Can't migration figure it out itself?

> I think bus
> must always be migrated before the devices behind it ...

I think that's true; but:
  a) Is the iommu a child of any of the PCI busses?
  b) does anything ensure that the bridge that's a parent of a bus
 gets migrated before the bus it provides?
  c) What happens with more than one root port?

Dave


> > >  hw/pci-bridge/gen_pcie_root_port.c | 1 +
> > >  include/migration/vmstate.h| 1 +
> > >  2 files changed, 2 insertions(+)
> > > 
> > > diff --git a/hw/pci-bridge/gen_pcie_root_port.c b/hw/pci-bridge/gen_pcie_root_port.c
> > > index 0e2f2e8bf1..e6ff1effd8 100644
> > > --- a/hw/pci-bridge/gen_pcie_root_port.c
> > > +++ b/hw/pci-bridge/gen_pcie_root_port.c
> > > @@ -101,6 +101,7 @@ static void gen_rp_realize(DeviceState *dev, Error **errp)
> > >  
> > >  static const VMStateDescription vmstate_rp_dev = {
> > >  .name = "pcie-root-port",
> > > +.priority = MIG_PRI_PCIE_ROOT_PORT,
> > >  .version_id = 1,
> > >  .minimum_version_id = 1,
> > >  .post_load = pcie_cap_slot_post_load,
> > > diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
> > > index 8c3889433c..491449db9f 100644
> > > --- a/include/migration/vmstate.h
> > > +++ b/include/migration/vmstate.h
> > > @@ -148,6 +148,7 @@ enum VMStateFlags {
> > >  typedef enum {
> > >  MIG_PRI_DEFAULT = 0,
> > >  MIG_PRI_IOMMU,  /* Must happen before PCI devices */
> > > +MIG_PRI_PCIE_ROOT_PORT, /* Must happen before IOMMU */
> > >  MIG_PRI_GICV3_ITS,  /* Must happen before PCI devices */
> > >  MIG_PRI_GICV3,  /* Must happen before the ITS */
> > >  MIG_PRI_MAX,
> > > 
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK



Re: [Qemu-devel] [PATCH v6 13/23] hmp: display memory encryption support in 'info kvm'

2018-02-01 Thread Brijesh Singh


On 2/1/18 11:58 AM, Dr. David Alan Gilbert wrote:
> * Brijesh Singh (brijesh.si...@amd.com) wrote:
>> update 'info kvm' to display the memory encryption support.
>>
>> (qemu) info kvm
>> kvm support: enabled
>> memory encryption: disabled
> As Markus said, this should be split qmp/hmp; but something else to
> think about is whether this is a boolean or needs to be an enum;  do
> you have one version of encryption or are we going to need to flag up
> versions or the features of the encryption?

In the future I could see us providing encrypted-state status when we
implement SEV-ES support, something like

(qemu) info kvm
kvm support: enabled
memory encryption: enabled
cpu register state: encrypted

but so far I do not see a need to provide the version string. If a user
wants to know the SEV version, they can open the /dev/sev device to get
the platform status and more.
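The enum direction suggested in this thread could be sketched like this. It is a toy Python model of the 'info kvm' formatting, with hypothetical state names ("enabled-es" etc.) — none of it is actual QEMU or QAPI code:

```python
# Hypothetical enum of memory-encryption states (the thread's point that
# a bool may not be enough once SEV-ES lands); names are illustrative only.
# Maps each state to the extra "cpu register state" value, if any.
CPU_STATE = {"disabled": None, "enabled": None, "enabled-es": "encrypted"}

def format_info_kvm(present, enabled, mem_encryption):
    """Build the 'info kvm' output lines shown in the thread."""
    if not present:
        return ["kvm support: not compiled"]
    lines = ["kvm support: %s" % ("enabled" if enabled else "disabled")]
    # Collapse the enum to enabled/disabled for the existing line...
    state = "enabled" if mem_encryption.startswith("enabled") else "disabled"
    lines.append("memory encryption: %s" % state)
    # ...and only the ES-style state adds the extra line.
    if CPU_STATE[mem_encryption]:
        lines.append("cpu register state: %s" % CPU_STATE[mem_encryption])
    return lines
```

This keeps the currently proposed two-line output unchanged for plain SEV while leaving room for richer states later.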

> Dave
>
>> Cc: "Dr. David Alan Gilbert" 
>> Cc: Eric Blake 
>> Cc: Markus Armbruster 
>> Cc: Paolo Bonzini 
>> Signed-off-by: Brijesh Singh 
>> ---
>>  hmp.c| 2 ++
>>  qapi-schema.json | 5 -
>>  qmp.c| 1 +
>>  3 files changed, 7 insertions(+), 1 deletion(-)
>>
>> diff --git a/hmp.c b/hmp.c
>> index 056bf70cf1e2..6ceb6b30af75 100644
>> --- a/hmp.c
>> +++ b/hmp.c
>> @@ -88,6 +88,8 @@ void hmp_info_kvm(Monitor *mon, const QDict *qdict)
>>  monitor_printf(mon, "kvm support: ");
>>  if (info->present) {
>>  monitor_printf(mon, "%s\n", info->enabled ? "enabled" : "disabled");
>> +monitor_printf(mon, "memory encryption: %s\n",
>> +   info->mem_encryption ? "enabled" : "disabled");
>>  } else {
>>  monitor_printf(mon, "not compiled\n");
>>  }
>> diff --git a/qapi-schema.json b/qapi-schema.json
>> index 5c06745c7927..2046c96669bf 100644
>> --- a/qapi-schema.json
>> +++ b/qapi-schema.json
>> @@ -314,9 +314,12 @@
>>  #
>>  # @present: true if KVM acceleration is built into this executable
>>  #
>> +# @mem-encryption: true if Memory Encryption is active (since 2.12)
>> +#
>>  # Since: 0.14.0
>>  ##
>> -{ 'struct': 'KvmInfo', 'data': {'enabled': 'bool', 'present': 'bool'} }
>> +{ 'struct': 'KvmInfo', 'data': {'enabled': 'bool', 'present': 'bool',
>> +'mem-encryption' : 'bool'} }
>>  
>>  ##
>>  # @query-kvm:
>> diff --git a/qmp.c b/qmp.c
>> index 52cfd2d81c0f..3a527bc8c39c 100644
>> --- a/qmp.c
>> +++ b/qmp.c
>> @@ -69,6 +69,7 @@ KvmInfo *qmp_query_kvm(Error **errp)
>>  
>>  info->enabled = kvm_enabled();
>>  info->present = kvm_available();
>> +info->mem_encryption = kvm_memcrypt_enabled();
>>  
>>  return info;
>>  }
>> -- 
>> 2.9.5
>>
> --
> Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK




Re: [Qemu-devel] [PATCH v3 31/39] qcow2: Update qcow2_truncate() to support L2 slices

2018-02-01 Thread Max Reitz
On 2018-01-26 15:59, Alberto Garcia wrote:
> The qcow2_truncate() code is mostly independent from whether
> we're using L2 slices or full L2 tables, but in full and
> falloc preallocation modes new L2 tables are allocated using
> qcow2_alloc_cluster_link_l2().  Therefore the code needs to be
> modified to ensure that all nb_clusters that are processed in each
> call can be allocated with just one L2 slice.
> 
> Signed-off-by: Alberto Garcia 
> Reviewed-by: Eric Blake 
> ---
>  block/qcow2.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/block/qcow2.c b/block/qcow2.c
> index 78f067cae7..529becfa30 100644
> --- a/block/qcow2.c
> +++ b/block/qcow2.c
> @@ -3261,8 +3261,9 @@ static int qcow2_truncate(BlockDriverState *bs, int64_t 
> offset,
>  guest_offset = old_length;
>  while (nb_new_data_clusters) {
>  int64_t guest_cluster = guest_offset >> s->cluster_bits;
> -int64_t nb_clusters = MIN(nb_new_data_clusters,
> -  s->l2_size - guest_cluster % 
> s->l2_size);
> +int64_t nb_clusters = MIN(
> +nb_new_data_clusters,
> +s->l2_slice_size - guest_cluster % s->l2_slice_size);

An alternative would be the
"s->l2_slice_size - offset_to_l2_slice_index(s, guest_offset)" we
basically have elsewhere, but that's longer and doesn't really matter:

Reviewed-by: Max Reitz 

>  QCowL2Meta allocation = {
>  .offset   = guest_offset,
>  .alloc_offset = host_offset,
> 
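The clamping logic in the hunk above can be checked with a quick model. This is plain Python (not the qcow2 code itself), using made-up sizes; it only demonstrates that each iteration stays within one L2 slice:

```python
def next_allocation(nb_new_data_clusters, guest_cluster, l2_slice_size):
    # Clamp the allocation so a single call never crosses an L2 slice
    # boundary: at most "distance to the end of the current slice" clusters.
    return min(nb_new_data_clusters,
               l2_slice_size - guest_cluster % l2_slice_size)

def allocate_all(total, guest_cluster, l2_slice_size):
    """Model the truncate loop: return the per-call allocation sizes."""
    sizes = []
    while total:
        n = next_allocation(total, guest_cluster, l2_slice_size)
        sizes.append(n)
        guest_cluster += n
        total -= n
    return sizes

# Start 3 clusters before a slice boundary with a slice of 8 entries:
# the first call is clamped to 3, the rest fits in the next slice.
sizes = allocate_all(10, 5, 8)
```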




signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [PATCH v3 32/39] qcow2: Rename l2_table in qcow2_alloc_compressed_cluster_offset()

2018-02-01 Thread Max Reitz
On 2018-01-26 15:59, Alberto Garcia wrote:
> This function doesn't need any changes to support L2 slices, but since
> it's now dealing with slices instead of full tables, the l2_table
> variable is renamed for clarity.
> 
> Signed-off-by: Alberto Garcia 
> Reviewed-by: Eric Blake 
> ---
>  block/qcow2-cluster.c | 16 
>  1 file changed, 8 insertions(+), 8 deletions(-)

Reviewed-by: Max Reitz 





Re: [Qemu-devel] [PATCH] pcie-root-port: let it has higher migrate priority

2018-02-01 Thread Dr. David Alan Gilbert
* Peter Xu (pet...@redhat.com) wrote:
> In the past, we prioritized IOMMU migration so that we have such a
> priority order:
> 
> IOMMU > PCI Devices
> 
> When migrating a guest with both vIOMMU and pcie-root-port, we'll always
> migrate vIOMMU first, since pcie-root-port will be seen to have the same
> priority of general PCI devices.
> 
> That's problematic.
> 
> The thing is that PCI bus number information is stored in the root port,
> and that is needed by vIOMMU during post_load(), e.g., to figure out
> context entry for a device.  If we don't have correct bus numbers for
> devices, we won't be able to recover device state of the DMAR memory
> regions, and things will be messed up.
> 
> So let's boost the PCIe root ports to be even with higher priority:
> 
>PCIe Root Port > IOMMU > PCI Devices
> 
> A smoke test shows that this patch fixes bug 1538953.

A few questions (partially overlapping with what I replied to Michael):
  a) What happens with multiple IOMMUs?
  b) What happens with multiple root ports?
  c) How correct is this ordering on different implementations 
(e.g. ARM/Power/etc)

Dave

> 
> CC: Alex Williamson 
> CC: Marcel Apfelbaum 
> CC: Michael S. Tsirkin 
> CC: Dr. David Alan Gilbert 
> CC: Juan Quintela 
> CC: Laurent Vivier 
> Bug: https://bugzilla.redhat.com/show_bug.cgi?id=1538953
> Reported-by: Maxime Coquelin 
> Signed-off-by: Peter Xu 
> ---
> Marcel & all,
> 
> I think it's possible that we need similar thing for other bridge-like
> devices, but I'm not that familiar.  Would you help confirm?  Thanks,
> ---
>  hw/pci-bridge/gen_pcie_root_port.c | 1 +
>  include/migration/vmstate.h| 1 +
>  2 files changed, 2 insertions(+)
> 
> diff --git a/hw/pci-bridge/gen_pcie_root_port.c 
> b/hw/pci-bridge/gen_pcie_root_port.c
> index 0e2f2e8bf1..e6ff1effd8 100644
> --- a/hw/pci-bridge/gen_pcie_root_port.c
> +++ b/hw/pci-bridge/gen_pcie_root_port.c
> @@ -101,6 +101,7 @@ static void gen_rp_realize(DeviceState *dev, Error **errp)
>  
>  static const VMStateDescription vmstate_rp_dev = {
>  .name = "pcie-root-port",
> +.priority = MIG_PRI_PCIE_ROOT_PORT,
>  .version_id = 1,
>  .minimum_version_id = 1,
>  .post_load = pcie_cap_slot_post_load,
> diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
> index 8c3889433c..491449db9f 100644
> --- a/include/migration/vmstate.h
> +++ b/include/migration/vmstate.h
> @@ -148,6 +148,7 @@ enum VMStateFlags {
>  typedef enum {
>  MIG_PRI_DEFAULT = 0,
>  MIG_PRI_IOMMU,  /* Must happen before PCI devices */
> +MIG_PRI_PCIE_ROOT_PORT, /* Must happen before IOMMU */
>  MIG_PRI_GICV3_ITS,  /* Must happen before PCI devices */
>  MIG_PRI_GICV3,  /* Must happen before the ITS */
>  MIG_PRI_MAX,
> -- 
> 2.14.3
> 
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
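As background for the MIG_PRI discussion above: in QEMU, a higher MIG_PRI_* value means the section's state is loaded earlier. A minimal Python model of that ordering follows — the names mirror the enum in the patch, but the sort is a simplified stand-in for QEMU's savevm handler ordering, not the real implementation:

```python
# Model of vmstate migration priorities: higher value loads first.
MIG_PRI_DEFAULT = 0
MIG_PRI_IOMMU = 1            # must load before PCI devices
MIG_PRI_PCIE_ROOT_PORT = 2   # must load before the IOMMU

def load_order(devices):
    """Return device names in the order their state would be loaded.

    devices: list of (name, priority) pairs; higher priority loads first,
    ties keep registration order (Python's sort is stable).
    """
    return [name for name, _ in sorted(devices, key=lambda d: -d[1])]

devices = [
    ("virtio-net", MIG_PRI_DEFAULT),
    ("intel-iommu", MIG_PRI_IOMMU),
    ("pcie-root-port", MIG_PRI_PCIE_ROOT_PORT),
]
order = load_order(devices)
```

With the patch applied, the root port's bus numbers are restored before the vIOMMU's post_load() needs them, which is exactly the ordering the model produces.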



Re: [Qemu-devel] [PATCH v3 34/39] qcow2: Rename l2_table in count_contiguous_clusters_unallocated()

2018-02-01 Thread Max Reitz
On 2018-01-26 16:00, Alberto Garcia wrote:
> This function doesn't need any changes to support L2 slices, but since
> it's now dealing with slices instead of full tables, the l2_table
> variable is renamed for clarity.
> 
> Signed-off-by: Alberto Garcia 
> Reviewed-by: Eric Blake 
> ---
>  block/qcow2-cluster.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)

Reviewed-by: Max Reitz 





Re: [Qemu-devel] [PATCH v3 33/39] qcow2: Rename l2_table in count_contiguous_clusters()

2018-02-01 Thread Max Reitz
On 2018-01-26 16:00, Alberto Garcia wrote:
> This function doesn't need any changes to support L2 slices, but since
> it's now dealing with slices instead of full tables, the l2_table
> variable is renamed for clarity.
> 
> Signed-off-by: Alberto Garcia 
> Reviewed-by: Eric Blake 
> ---
>  block/qcow2-cluster.c | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)

Reviewed-by: Max Reitz 





Re: [Qemu-devel] [PATCH] pcie-root-port: let it has higher migrate priority

2018-02-01 Thread Dr. David Alan Gilbert
* Marcel Apfelbaum (mar...@redhat.com) wrote:
> Hi Peter,
> 
> On 01/02/2018 13:20, Peter Xu wrote:
> > In the past, we prioritized IOMMU migration so that we have such a
> > priority order:
> > 
> > IOMMU > PCI Devices
> > 
> > When migrating a guest with both vIOMMU and pcie-root-port, we'll always
> > migrate vIOMMU first, since pcie-root-port will be seen to have the same
> > priority of general PCI devices.
> > 
> > That's problematic.
> > 
> > The thing is that PCI bus number information is stored in the root port,
> > and that is needed by vIOMMU during post_load(), e.g., to figure out
> > context entry for a device.  If we don't have correct bus numbers for
> > devices, we won't be able to recover device state of the DMAR memory
> > regions, and things will be messed up.
> > 
> > So let's boost the PCIe root ports to be even with higher priority:
> > 
> >PCIe Root Port > IOMMU > PCI Devices
> > 
> > A smoke test shows that this patch fixes bug 1538953.
> > 
> > CC: Alex Williamson 
> > CC: Marcel Apfelbaum 
> > CC: Michael S. Tsirkin 
> > CC: Dr. David Alan Gilbert 
> > CC: Juan Quintela 
> > CC: Laurent Vivier 
> > Bug: https://bugzilla.redhat.com/show_bug.cgi?id=1538953
> > Reported-by: Maxime Coquelin 
> > Signed-off-by: Peter Xu 
> > ---
> > Marcel & all,
> > 
> > I think it's possible that we need similar thing for other bridge-like
> > devices, but I'm not that familiar.  Would you help confirm?  Thanks,
> 
> It's a pity we don't have a way to mark the migration priority
> in a base class. Dave, maybe we do have a way?

Not that I'm aware of; the 'priority' field is associated with the VMSD
and it's not really connected to the class hierarchy at all.

Dave

> In the meantime you would need to add it also to:
> - ioh3420 (Intel root port)
> - xio3130_downstream (Intel switch downstream port)
> - xio3130_upstream (The counterpart of the above, you want the whole
>   switch to be migrated before loading the IOMMU device state)
> - pcie_pci_bridge (for pci devices)
> - pci-pci bridge (if for some reason you have one attached to the 
> pcie_pci_bridge)
> - i82801b11 (dmi-pci bridge, we want to deprecate it but it is there for now)
> 
> Thanks,
> Marcel
> 
> >  hw/pci-bridge/gen_pcie_root_port.c | 1 +
> >  include/migration/vmstate.h| 1 +
> >  2 files changed, 2 insertions(+)
> > 
> > diff --git a/hw/pci-bridge/gen_pcie_root_port.c 
> > b/hw/pci-bridge/gen_pcie_root_port.c
> > index 0e2f2e8bf1..e6ff1effd8 100644
> > --- a/hw/pci-bridge/gen_pcie_root_port.c
> > +++ b/hw/pci-bridge/gen_pcie_root_port.c
> > @@ -101,6 +101,7 @@ static void gen_rp_realize(DeviceState *dev, Error 
> > **errp)
> >  
> >  static const VMStateDescription vmstate_rp_dev = {
> >  .name = "pcie-root-port",
> > +.priority = MIG_PRI_PCIE_ROOT_PORT,
> >  .version_id = 1,
> >  .minimum_version_id = 1,
> >  .post_load = pcie_cap_slot_post_load,
> > diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
> > index 8c3889433c..491449db9f 100644
> > --- a/include/migration/vmstate.h
> > +++ b/include/migration/vmstate.h
> > @@ -148,6 +148,7 @@ enum VMStateFlags {
> >  typedef enum {
> >  MIG_PRI_DEFAULT = 0,
> >  MIG_PRI_IOMMU,  /* Must happen before PCI devices */
> > +MIG_PRI_PCIE_ROOT_PORT, /* Must happen before IOMMU */
> >  MIG_PRI_GICV3_ITS,  /* Must happen before PCI devices */
> >  MIG_PRI_GICV3,  /* Must happen before the ITS */
> >  MIG_PRI_MAX,
> > 
> 
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK



Re: [Qemu-devel] [PATCH V8 3/4] pvrdma: initial implementation

2018-02-01 Thread Marcel Apfelbaum
Hi Michael,

On 01/02/2018 21:10, Michael S. Tsirkin wrote:
>> diff --git a/hw/rdma/vmw/vmw_pvrdma-abi.h b/hw/rdma/vmw/vmw_pvrdma-abi.h
>> new file mode 100644
>> index 00..8cfb9d7745
>> --- /dev/null
>> +++ b/hw/rdma/vmw/vmw_pvrdma-abi.h
> 
> Why do you copy this here? Why not import it
> from include/uapi/rdma/vmw_pvrdma-abi.h in Linux into
> standard headers?
> 

Do you mean adding it to QEMU's include/standard-headers/ ?
I am sorry for the possibly dumb question, but is there
a special way to do it, or do we simply copy it?


Regarding why, I am not sure it is worth it (Cornelia asked the same question
some time ago):
We are only using the header because we are too lazy to copy the structs
that are passed between the guest driver and QEMU.
This ABI is against the pvrdma device and we don't want/need to update
it every time it gets updated in the Linux kernel.
If the ABI changes, the command channel version will be updated,
but we will support only the current version.

The bottom line: we don't want to depend on the Linux kernel version,
we want to import it only once. When we want to upgrade, we want
to be the ones to decide, not react.

(We saw the same pattern in other implementations)

Thanks,
Marcel





Re: [Qemu-devel] [PATCH v3 30/39] qcow2: Update expand_zero_clusters_in_l1() to support L2 slices

2018-02-01 Thread Max Reitz
On 2018-01-26 15:59, Alberto Garcia wrote:
> expand_zero_clusters_in_l1() expands zero clusters as a necessary step
> to downgrade qcow2 images to a version that doesn't support metadata
> zero clusters. This function takes an L1 table (which may or may not
> be active) and iterates over all its L2 tables looking for zero
> clusters.
> 
> Since we'll be loading L2 slices instead of full tables we need to add
> an extra loop that iterates over all slices of each L2 table, and we
> should also use the slice size when allocating the buffer used when
> the L1 table is not active.
> 
> This function doesn't need any additional changes so apart from that
> this patch simply updates the variable name from l2_table to l2_slice.
> 
> Finally, and since we have to touch the bdrv_read() / bdrv_write()
> calls anyway, this patch takes the opportunity to replace them with
> the byte-based bdrv_pread() / bdrv_pwrite().
> 
> Signed-off-by: Alberto Garcia 
> ---
>  block/qcow2-cluster.c | 52 
> ---
>  1 file changed, 29 insertions(+), 23 deletions(-)

Aside from the void * casts (and slice_size2 *cough* *cough*):

Reviewed-by: Max Reitz 
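The "extra loop over slices" and the slice_size2 arithmetic discussed in this series can be modeled as below. This is a Python sketch with toy numbers and a fake loader callback, not the qcow2 code; it only shows the loop structure and the bytes-per-slice math:

```python
ENTRY_SIZE = 8  # L2 entries are 64-bit

def iter_l2_entries(l1_table, cluster_size, l2_slice_size, load_slice):
    """Walk every L2 entry, one slice at a time.

    load_slice(l2_offset, slice_index) -> list of entries; it stands in
    for reading one slice of an L2 table from disk or the cache.
    """
    slice_size2 = l2_slice_size * ENTRY_SIZE     # bytes per slice
    n_slices = cluster_size // slice_size2       # slices per L2 table
    for l2_offset in l1_table:
        for s in range(n_slices):                # the extra loop
            for entry in load_slice(l2_offset, s):
                yield l2_offset, s, entry

# Toy backend: 64 KiB clusters, 2048-entry slices -> 4 slices per table.
fake = lambda off, s: [off * 100 + s]            # one fake entry per slice
visited = list(iter_l2_entries([1, 2], 65536, 2048, fake))
```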





Re: [Qemu-devel] [PATCH v3 29/39] qcow2: Prepare expand_zero_clusters_in_l1() for adding L2 slice support

2018-02-01 Thread Max Reitz
On 2018-01-26 15:59, Alberto Garcia wrote:
> Adding support for L2 slices to expand_zero_clusters_in_l1() needs
> (among other things) an extra loop that iterates over all slices of
> each L2 table.
> 
> Putting all changes in one patch would make it hard to read because
> all semantic changes would be mixed with pure indentation changes.
> 
> To make things easier this patch simply creates a new block and
> changes the indentation of all lines of code inside it. Thus, all
> modifications in this patch are cosmetic. There are no semantic
> changes and no variables are renamed yet. The next patch will take
> care of that.
> 
> Signed-off-by: Alberto Garcia 
> ---
>  block/qcow2-cluster.c | 187 
> ++
>  1 file changed, 96 insertions(+), 91 deletions(-)

Reviewed-by: Max Reitz 





Re: [Qemu-devel] [PATCH v3 27/39] qcow2: Update qcow2_update_snapshot_refcount() to support L2 slices

2018-02-01 Thread Max Reitz
On 2018-01-26 15:59, Alberto Garcia wrote:
> qcow2_update_snapshot_refcount() increases the refcount of all
> clusters of a given snapshot. In order to do that it needs to load all
> its L2 tables and iterate over their entries. Since we'll be loading
> L2 slices instead of full tables we need to add an extra loop that
> iterates over all slices of each L2 table.
> 
> This function doesn't need any additional changes so apart from that
> this patch simply updates the variable name from l2_table to l2_slice.
> 
> Signed-off-by: Alberto Garcia 
> ---
>  block/qcow2-refcount.c | 31 +--
>  1 file changed, 17 insertions(+), 14 deletions(-)
> 
> diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
> index dfa28301c4..60b521cb89 100644
> --- a/block/qcow2-refcount.c
> +++ b/block/qcow2-refcount.c
> @@ -1183,17 +1183,20 @@ int qcow2_update_snapshot_refcount(BlockDriverState 
> *bs,
>  int64_t l1_table_offset, int l1_size, int addend)
>  {
>  BDRVQcow2State *s = bs->opaque;
> -uint64_t *l1_table, *l2_table, l2_offset, entry, l1_size2, refcount;
> +uint64_t *l1_table, *l2_slice, l2_offset, entry, l1_size2, refcount;
>  bool l1_allocated = false;
>  int64_t old_entry, old_l2_offset;
> +unsigned slice, slice_size2, n_slices;

Hm, well. Hm.

>  int i, j, l1_modified = 0, nb_csectors;
>  int ret;
>  
>  assert(addend >= -1 && addend <= 1);
>  
> -l2_table = NULL;
> +l2_slice = NULL;
>  l1_table = NULL;
>  l1_size2 = l1_size * sizeof(uint64_t);
> +slice_size2 = s->l2_slice_size * sizeof(uint64_t);
> +n_slices = s->cluster_size / slice_size2;
>  
>  s->cache_discards = true;
>  

[...]

> @@ -1273,12 +1276,13 @@ int qcow2_update_snapshot_refcount(BlockDriverState 
> *bs,
>  case QCOW2_CLUSTER_NORMAL:
>  case QCOW2_CLUSTER_ZERO_ALLOC:
>  if (offset_into_cluster(s, offset)) {
> +int l2_index = slice * s->l2_slice_size + j;
>  qcow2_signal_corruption(
>  bs, true, -1, -1, "Cluster "
>  "allocation offset %#" PRIx64
>  " unaligned (L2 offset: %#"
>  PRIx64 ", L2 index: %#x)",
> -offset, l2_offset, j);
> +offset, l2_offset, l2_index);

This makes it a bit weird that in other patches l2_index is now
generally the L2 slice index...

Oh well.

Reviewed-by: Max Reitz 

>  ret = -EIO;
>  goto fail;
>  }





Re: [Qemu-devel] [PATCH V8 1/4] mem: add share parameter to memory-backend-ram

2018-02-01 Thread Paolo Bonzini
On 01/02/2018 14:21, Eduardo Habkost wrote:
>> The device looks at its own private page tables, and not
>> to the OS ones.
> I'm still confused by your statement that the device builds its
> own [IOVA->PA] page table.  How would the device do that if it
> doesn't have access to the CPU MMU state?  Isn't the IOVA->PA
> translation table built by the OS?

The driver builds a page table for the device, either when it pins the
pages or by using MMU notifiers.

Paolo



Re: [Qemu-devel] [PATCH v3 28/39] qcow2: Read refcount before L2 table in expand_zero_clusters_in_l1()

2018-02-01 Thread Max Reitz
On 2018-01-26 15:59, Alberto Garcia wrote:
> At the moment it doesn't really make a difference whether we call
> qcow2_get_refcount() before of after reading the L2 table, but if we
> want to support L2 slices we'll need to read the refcount first.
> 
> This patch simply changes the order of those two operations to prepare
> for that. The patch with the actual semantic changes will be easier to
> read because of this.
> 
> Signed-off-by: Alberto Garcia 
> ---
>  block/qcow2-cluster.c | 12 ++--
>  1 file changed, 6 insertions(+), 6 deletions(-)

Reviewed-by: Max Reitz 





Re: [Qemu-devel] [PATCH V8 1/4] mem: add share parameter to memory-backend-ram

2018-02-01 Thread Marcel Apfelbaum
On 01/02/2018 21:21, Eduardo Habkost wrote:
> On Thu, Feb 01, 2018 at 08:58:32PM +0200, Marcel Apfelbaum wrote:
>> On 01/02/2018 20:51, Eduardo Habkost wrote:
>>> On Thu, Feb 01, 2018 at 08:31:09PM +0200, Marcel Apfelbaum wrote:
 On 01/02/2018 20:21, Eduardo Habkost wrote:
> On Thu, Feb 01, 2018 at 08:03:53PM +0200, Marcel Apfelbaum wrote:
>> On 01/02/2018 15:53, Eduardo Habkost wrote:
>>> On Thu, Feb 01, 2018 at 02:29:25PM +0200, Marcel Apfelbaum wrote:
 On 01/02/2018 14:10, Eduardo Habkost wrote:
> On Thu, Feb 01, 2018 at 07:36:50AM +0200, Marcel Apfelbaum wrote:
>> On 01/02/2018 4:22, Michael S. Tsirkin wrote:
>>> On Wed, Jan 31, 2018 at 09:34:22PM -0200, Eduardo Habkost wrote:
> [...]
 BTW, what's the root cause for requiring HVAs in the buffer?
>>>
>>> It's a side effect of the kernel/userspace API which always wants
>>> a single HVA/len pair to map memory for the application.
>>>
>>>
>>
>> Hi Eduardo and Michael,
>>
  Can
 this be fixed?
>>>
>>> I think yes.  It'd need to be a kernel patch for the RDMA subsystem
>>> mapping an s/g list with actual memory. The HVA/len pair would then 
>>> just
>>> be used to refer to the region, without creating the two mappings.
>>>
>>> Something like splitting the register mr into
>>>
>>> mr = create mr (va/len) - allocate a handle and record the va/len
>>>
>>> addmemory(mr, offset, hva, len) - pin memory
>>>
>>> register mr - pass it to HW
>>>
>>> As a nice side effect we won't burn so much virtual address space.
>>>
>>
>> We would still need a contiguous virtual address space range (for 
>> post-send)
>> which we don't have since guest contiguous virtual address space
>> will always end up as non-contiguous host virtual address space.
>>
>> I am not sure the RDMA HW can handle a large VA with holes.
>
> I'm confused.  Why would the hardware see and care about virtual
> addresses? 

 The post-send operation bypasses the kernel, and the process
 puts in the work request GVA addresses.

> How exactly does the hardware translates VAs to
> PAs? 

 The HW maintains a page-directory-like structure, different from the MMU
 VA -> phys pages

> What if the process page tables change?
>

 Since the page tables the HW uses are their own, we just need the phys
 page to be pinned.
>>>
>>> So there's no hardware-imposed requirement that the hardware VAs
>>> (mapped by the HW page directory) match the VAs in QEMU
>>> address-space, right? 
>>
>> Actually there is. Today it works exactly as you described.
>
> Are you sure there's such hardware-imposed requirement?
>

 Yes.

> Why would the hardware require VAs to match the ones in the
> userspace address-space, if it doesn't use the CPU MMU at all?
>

 It works like that:

 1. We register a buffer from the process address space
giving its base address and length.
This call goes to kernel which in turn pins the phys pages
and registers them with the device *together* with the base
address (virtual address!)
 2. The device builds its own page tables to be able to translate
the virtual addresses to actual phys pages.
>>>
>>> How would the device be able to do that?  It would require the
>>> device to look at the process page tables, wouldn't it?  Isn't
>>> the HW IOVA->PA translation table built by the OS?
>>>
>>
>> As stated above, these are tables private for the device.
>> (They even have a hw vendor specific layout I think,
>>  since the device holds some cache)
>>
>> The device looks at its own private page tables, and not
>> to the OS ones.
> 
> I'm still confused by your statement that the device builds its
> own [IOVA->PA] page table.  How would the device do that if it
> doesn't have access to the CPU MMU state?  Isn't the IOVA->PA
> translation table built by the OS?
> 

Sorry about the confusion. The device gets a base virtual address,
the memory region length and a list of phys pages.
This is enough information for it to create its own kind of tables
which will tell, for example, that if the IOVA range starts at address
0x1000, then address 0x1001 is in page 0 and address 0x2000 is in page 1.

Be aware this base virtual address can come from any address space, not only
the process's; using the process address space is just the current
software implementation.

Thanks,
Marcel

>>
>>>
 3. The process executes post-send requests directly to hw by-passing
the kernel giving process virtual addresses in work requests.
 4. The device uses 

Re: [Qemu-devel] [PATCH V8 1/4] mem: add share parameter to memory-backend-ram

2018-02-01 Thread Eduardo Habkost
On Thu, Feb 01, 2018 at 08:58:32PM +0200, Marcel Apfelbaum wrote:
> On 01/02/2018 20:51, Eduardo Habkost wrote:
> > On Thu, Feb 01, 2018 at 08:31:09PM +0200, Marcel Apfelbaum wrote:
> >> On 01/02/2018 20:21, Eduardo Habkost wrote:
> >>> On Thu, Feb 01, 2018 at 08:03:53PM +0200, Marcel Apfelbaum wrote:
>  On 01/02/2018 15:53, Eduardo Habkost wrote:
> > On Thu, Feb 01, 2018 at 02:29:25PM +0200, Marcel Apfelbaum wrote:
> >> On 01/02/2018 14:10, Eduardo Habkost wrote:
> >>> On Thu, Feb 01, 2018 at 07:36:50AM +0200, Marcel Apfelbaum wrote:
>  On 01/02/2018 4:22, Michael S. Tsirkin wrote:
> > On Wed, Jan 31, 2018 at 09:34:22PM -0200, Eduardo Habkost wrote:
> >>> [...]
> >> BTW, what's the root cause for requiring HVAs in the buffer?
> >
> > It's a side effect of the kernel/userspace API which always wants
> > a single HVA/len pair to map memory for the application.
> >
> >
> 
>  Hi Eduardo and Michael,
> 
> >>  Can
> >> this be fixed?
> >
> > I think yes.  It'd need to be a kernel patch for the RDMA subsystem
> > mapping an s/g list with actual memory. The HVA/len pair would then 
> > just
> > be used to refer to the region, without creating the two mappings.
> >
> > Something like splitting the register mr into
> >
> > mr = create mr (va/len) - allocate a handle and record the va/len
> >
> > addmemory(mr, offset, hva, len) - pin memory
> >
> > register mr - pass it to HW
> >
> > As a nice side effect we won't burn so much virtual address space.
> >
> 
>  We would still need a contiguous virtual address space range (for 
>  post-send)
>  which we don't have since guest contiguous virtual address space
>  will always end up as non-contiguous host virtual address space.
> 
>  I am not sure the RDMA HW can handle a large VA with holes.
> >>>
> >>> I'm confused.  Why would the hardware see and care about virtual
> >>> addresses? 
> >>
> >> The post-send operation bypasses the kernel, and the process
> >> puts in the work request GVA addresses.
> >>
> >>> How exactly does the hardware translates VAs to
> >>> PAs? 
> >>
> >> The HW maintains a page-directory-like structure, different from the MMU
> >> VA -> phys pages
> >>
> >>> What if the process page tables change?
> >>>
> >>
> >> Since the page tables the HW uses are their own, we just need the phys
> >> page to be pinned.
> >
> > So there's no hardware-imposed requirement that the hardware VAs
> > (mapped by the HW page directory) match the VAs in QEMU
> > address-space, right? 
> 
>  Actually there is. Today it works exactly as you described.
> >>>
> >>> Are you sure there's such hardware-imposed requirement?
> >>>
> >>
> >> Yes.
> >>
> >>> Why would the hardware require VAs to match the ones in the
> >>> userspace address-space, if it doesn't use the CPU MMU at all?
> >>>
> >>
> >> It works like that:
> >>
> >> 1. We register a buffer from the process address space
> >>giving its base address and length.
> >>This call goes to kernel which in turn pins the phys pages
> >>and registers them with the device *together* with the base
> >>address (virtual address!)
> >> 2. The device builds its own page tables to be able to translate
> >>the virtual addresses to actual phys pages.
> > 
> > How would the device be able to do that?  It would require the
> > device to look at the process page tables, wouldn't it?  Isn't
> > the HW IOVA->PA translation table built by the OS?
> > 
> 
> As stated above, these are tables private for the device.
> (They even have a hw vendor specific layout I think,
>  since the device holds some cache)
> 
> The device looks at its own private page tables, and not
> to the OS ones.

I'm still confused by your statement that the device builds its
own [IOVA->PA] page table.  How would the device do that if it
doesn't have access to the CPU MMU state?  Isn't the IOVA->PA
translation table built by the OS?

> 
> > 
> >> 3. The process executes post-send requests directly to hw by-passing
> >>the kernel giving process virtual addresses in work requests.
> >> 4. The device uses its own page tables to translate the virtual
> >>addresses to phys pages and sending them.
> >>
> >> Theoretically it is possible to send any contiguous IOVA instead of
> >> the process's one, but that is not how it works today.
> >>
> >> Makes sense?
> > 
> 

-- 
Eduardo



Re: [Qemu-devel] [PATCH v3 26/39] qcow2: Prepare qcow2_update_snapshot_refcount() for adding L2 slice support

2018-02-01 Thread Max Reitz
On 2018-01-26 15:59, Alberto Garcia wrote:
> Adding support for L2 slices to qcow2_update_snapshot_refcount() needs
> (among other things) an extra loop that iterates over all slices of
> each L2 table.
> 
> Putting all changes in one patch would make it hard to read because
> all semantic changes would be mixed with pure indentation changes.
> 
> To make things easier this patch simply creates a new block and
> changes the indentation of all lines of code inside it. Thus, all
> modifications in this patch are cosmetic. There are no semantic
> changes and no variables are renamed yet. The next patch will take
> care of that.
> 
> Signed-off-by: Alberto Garcia 
> ---
>  block/qcow2-refcount.c | 140 
> ++---
>  1 file changed, 73 insertions(+), 67 deletions(-)

Reviewed-by: Max Reitz 





Re: [Qemu-devel] [PATCH] pcie-root-port: let it has higher migrate priority

2018-02-01 Thread Michael S. Tsirkin
On Thu, Feb 01, 2018 at 02:24:15PM +0200, Marcel Apfelbaum wrote:
> Hi Peter,
> 
> On 01/02/2018 13:20, Peter Xu wrote:
> > In the past, we prioritized IOMMU migration so that we have such a
> > priority order:
> > 
> > IOMMU > PCI Devices
> > 
> > When migrating a guest with both vIOMMU and pcie-root-port, we'll always
> > migrate vIOMMU first, since pcie-root-port will be seen to have the same
> > priority of general PCI devices.
> > 
> > That's problematic.
> > 
> > The thing is that PCI bus number information is stored in the root port,
> > and that is needed by vIOMMU during post_load(), e.g., to figure out
> > context entry for a device.  If we don't have correct bus numbers for
> > devices, we won't be able to recover device state of the DMAR memory
> > regions, and things will be messed up.
> > 
> > So let's boost the PCIe root ports to be even with higher priority:
> > 
> >PCIe Root Port > IOMMU > PCI Devices
> > 
> > A smoke test shows that this patch fixes bug 1538953.
> > 
> > CC: Alex Williamson 
> > CC: Marcel Apfelbaum 
> > CC: Michael S. Tsirkin 
> > CC: Dr. David Alan Gilbert 
> > CC: Juan Quintela 
> > CC: Laurent Vivier 
> > Bug: https://bugzilla.redhat.com/show_bug.cgi?id=1538953
> > Reported-by: Maxime Coquelin 
> > Signed-off-by: Peter Xu 
> > ---
> > Marcel & all,
> > 
> > I think it's possible that we need similar thing for other bridge-like
> > devices, but I'm not that familiar.  Would you help confirm?  Thanks,
> 
> It's a pity we don't have a way to mark the migration priority
> in a base class. Dave, maybe we do have a way?
> 
> In the meantime you would need to add it also to:
> - ioh3420 (Intel root port)
> - xio3130_downstream (Intel switch downstream port)
> - xio3130_upstream (The counterpart of the above, you want the whole
>   switch to be migrated before loading the IOMMU device state)
> - pcie_pci_bridge (for pci devices)
> - pci-pci bridge (if for some reason you have one attached to the 
> pcie_pci_bridge)
> - i82801b11 (dmi-pci bridge, we want to deprecate it but it is there for now)
> 
> Thanks,
> Marcel

It's kind of strange that we need to set the priority manually.
Can't migration figure it out by itself? I think a bus
must always be migrated before the devices behind it ...

> >  hw/pci-bridge/gen_pcie_root_port.c | 1 +
> >  include/migration/vmstate.h| 1 +
> >  2 files changed, 2 insertions(+)
> > 
> > diff --git a/hw/pci-bridge/gen_pcie_root_port.c 
> > b/hw/pci-bridge/gen_pcie_root_port.c
> > index 0e2f2e8bf1..e6ff1effd8 100644
> > --- a/hw/pci-bridge/gen_pcie_root_port.c
> > +++ b/hw/pci-bridge/gen_pcie_root_port.c
> > @@ -101,6 +101,7 @@ static void gen_rp_realize(DeviceState *dev, Error 
> > **errp)
> >  
> >  static const VMStateDescription vmstate_rp_dev = {
> >  .name = "pcie-root-port",
> > +.priority = MIG_PRI_PCIE_ROOT_PORT,
> >  .version_id = 1,
> >  .minimum_version_id = 1,
> >  .post_load = pcie_cap_slot_post_load,
> > diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
> > index 8c3889433c..491449db9f 100644
> > --- a/include/migration/vmstate.h
> > +++ b/include/migration/vmstate.h
> > @@ -148,6 +148,7 @@ enum VMStateFlags {
> >  typedef enum {
> >  MIG_PRI_DEFAULT = 0,
> >  MIG_PRI_IOMMU,  /* Must happen before PCI devices */
> > +MIG_PRI_PCIE_ROOT_PORT, /* Must happen before IOMMU */
> >  MIG_PRI_GICV3_ITS,  /* Must happen before PCI devices */
> >  MIG_PRI_GICV3,  /* Must happen before the ITS */
> >  MIG_PRI_MAX,
> > 



Re: [Qemu-devel] [PATCH] iotests: Fix CID for VMDK afl image

2018-02-01 Thread Eric Blake
On 02/01/2018 11:59 AM, Max Reitz wrote:

>>> Hmm, now this fails again on my 32-bit build. :-(
>>>
>>> The issue there is that you get a "Cannot allocate memory" when trying
>>> to open the file.  My current fix was 2291712c39111a732 which simply
>>> converted that to "Invalid argument", but now it's getting a bit more
>>> complicated...  Should I just continue to play the game and check the
>>> output for "Cannot allocate memory" and print exactly what the reference
>>> output is expecting...?
>>
>> Ahhh. OK, then, with a big comment.
>>
>> I'd say let's just _notrun on 32 bit.
> 
> Sounds OK, but how would we test that? uname? Or just _notrun when we
> see the ENOMEM message?

_notrun when you detect ENOMEM seems reasonable (the equivalent of exit
status 77 for skipping a test in an automake context).
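A rough sketch of that suggestion (hypothetical snippet, not actual iotests
code — the message text, the image name, and the echo output are made up;
"_notrun" would be the real iotests helper):

```shell
# Scan the command output for the ENOMEM message and skip the test,
# analogous to automake's exit status 77 for skipped tests.
output="qemu-img: Could not open 'image.vmdk': Cannot allocate memory"

case "$output" in
*"Cannot allocate memory"*)
    echo "notrun: image cannot be mapped on this host"
    ;;
*)
    echo "ran"
    ;;
esac
```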

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org





Re: [Qemu-devel] [PATCH V8 3/4] pvrdma: initial implementation

2018-02-01 Thread Michael S. Tsirkin
> diff --git a/hw/rdma/vmw/vmw_pvrdma-abi.h b/hw/rdma/vmw/vmw_pvrdma-abi.h
> new file mode 100644
> index 00..8cfb9d7745
> --- /dev/null
> +++ b/hw/rdma/vmw/vmw_pvrdma-abi.h

Why do you copy this here? Why not import it
from include/uapi/rdma/vmw_pvrdma-abi.h in Linux into
standard headers?

-- 
MST



Re: [Qemu-devel] [PATCH v3 25/39] qcow2: Update zero_single_l2() to support L2 slices

2018-02-01 Thread Max Reitz
On 2018-01-26 15:59, Alberto Garcia wrote:
> zero_single_l2() limits the number of clusters to be zeroed to the
> amount that fits inside an L2 table. Since we'll be loading L2 slices
> instead of full tables we need to update that limit.

Same as last patch, maybe we should rename the function now.

> Apart from that, this function doesn't need any additional changes, so
> this patch simply updates the variable name from l2_table to l2_slice.
> 
> Signed-off-by: Alberto Garcia 
> Reviewed-by: Eric Blake 
> ---
>  block/qcow2-cluster.c | 20 ++--
>  1 file changed, 10 insertions(+), 10 deletions(-)

Also again:

Reviewed-by: Max Reitz 





Re: [Qemu-devel] [PATCH v3 24/39] qcow2: Update discard_single_l2() to support L2 slices

2018-02-01 Thread Max Reitz
On 2018-01-26 15:59, Alberto Garcia wrote:
> discard_single_l2() limits the number of clusters to be discarded to
> the amount that fits inside an L2 table. Since we'll be loading L2
> slices instead of full tables we need to update that limit.

Hmm, maybe rename the function to discard_l2_slice() or
discard_in_l2_slice() or discard_all_in_l2_slice() or
discard_single_l2_slice()?

> Apart from that, this function doesn't need any additional changes, so
> this patch simply updates the variable name from l2_table to l2_slice.
> 
> Signed-off-by: Alberto Garcia 
> Reviewed-by: Eric Blake 
> ---
>  block/qcow2-cluster.c | 20 ++--
>  1 file changed, 10 insertions(+), 10 deletions(-)

Anyway:

Reviewed-by: Max Reitz 



signature.asc
Description: OpenPGP digital signature


[Qemu-devel] [PULL 0/7] Small IPMI fixes

2018-02-01 Thread minyard
The following changes since commit 6521130b0a7f699fdb82446d57df5627bfa7ed3c:

  Merge remote-tracking branch 
'remotes/stefanberger/tags/pull-tpm-2018-01-26-2' into staging (2018-01-30 
15:20:01 +)

are available in the git repository at:

  https://github.com/cminyard/qemu.git tags/for-release-20180201

for you to fetch changes up to 20b233641d76cc1812064304798ffeb530dc112d:

  ipmi: Allow BMC device properties to be set (2018-01-30 15:52:53 -0600)


Lots of little miscellaneous fixes for the IPMI code, plus
add me as the IPMI maintainer.


Corey Minyard (7):
  Add maintainer for the IPMI code
  ipmi: Fix SEL get/set time commands
  ipmi: Don't set the timestamp on add events that don't have it
  ipmi: Add the platform event message command
  ipmi: Fix macro issues
  ipmi: disable IRQ and ATN on an external disconnect
  ipmi: Allow BMC device properties to be set

 MAINTAINERS   |  9 
 hw/ipmi/ipmi_bmc_extern.c |  5 
 hw/ipmi/ipmi_bmc_sim.c| 58 ++-
 hw/ipmi/isa_ipmi_bt.c | 12 +-
 4 files changed, 67 insertions(+), 17 deletions(-)




[Qemu-devel] [PATCH 0/2] Fix vmstate for IPMI devices

2018-02-01 Thread minyard
There were several issues with vmstate transfers in the IPMI devices.
This new version is heavily tested and should be much better.  It will
not be possible to migrate BT devices from older systems; the old
format was too broken to handle that.

Just to make sure I got this right, could somebody knowledgeable
take a glance at this?

Thanks,

-corey




[Qemu-devel] [PATCH 1/2] ipmi: Use proper struct reference for KCS vmstate

2018-02-01 Thread minyard
From: Corey Minyard 

The vmstate for isa_ipmi_kcs was referencing into the kcs structure;
instead, create a separate kcs vmstate structure and use that.

There were also some issues in the state transfer.  The inlen field
was not being transferred, so if a transaction was in progress during
the transfer it would be messed up.  And the use_irq field was
transferred, but that should come from the configuration.  This
also fixes those issues and is tested under heavy load.

Signed-off-by: Corey Minyard 
---
 hw/ipmi/isa_ipmi_kcs.c | 75 --
 1 file changed, 60 insertions(+), 15 deletions(-)

diff --git a/hw/ipmi/isa_ipmi_kcs.c b/hw/ipmi/isa_ipmi_kcs.c
index 689587b..3c942d6 100644
--- a/hw/ipmi/isa_ipmi_kcs.c
+++ b/hw/ipmi/isa_ipmi_kcs.c
@@ -422,24 +422,69 @@ static void ipmi_isa_realize(DeviceState *dev, Error 
**errp)
 isa_register_ioport(isadev, &iik->kcs.io, iik->kcs.io_base);
 }
 
-const VMStateDescription vmstate_ISAIPMIKCSDevice = {
-.name = TYPE_IPMI_INTERFACE,
+static const VMStateDescription vmstate_IPMIKCS = {
+.name = TYPE_IPMI_INTERFACE_PREFIX "kcs",
 .version_id = 1,
 .minimum_version_id = 1,
 .fields  = (VMStateField[]) {
-VMSTATE_BOOL(kcs.obf_irq_set, ISAIPMIKCSDevice),
-VMSTATE_BOOL(kcs.atn_irq_set, ISAIPMIKCSDevice),
-VMSTATE_BOOL(kcs.use_irq, ISAIPMIKCSDevice),
-VMSTATE_BOOL(kcs.irqs_enabled, ISAIPMIKCSDevice),
-VMSTATE_UINT32(kcs.outpos, ISAIPMIKCSDevice),
-VMSTATE_UINT8_ARRAY(kcs.outmsg, ISAIPMIKCSDevice, MAX_IPMI_MSG_SIZE),
-VMSTATE_UINT8_ARRAY(kcs.inmsg, ISAIPMIKCSDevice, MAX_IPMI_MSG_SIZE),
-VMSTATE_BOOL(kcs.write_end, ISAIPMIKCSDevice),
-VMSTATE_UINT8(kcs.status_reg, ISAIPMIKCSDevice),
-VMSTATE_UINT8(kcs.data_out_reg, ISAIPMIKCSDevice),
-VMSTATE_INT16(kcs.data_in_reg, ISAIPMIKCSDevice),
-VMSTATE_INT16(kcs.cmd_reg, ISAIPMIKCSDevice),
-VMSTATE_UINT8(kcs.waiting_rsp, ISAIPMIKCSDevice),
+VMSTATE_BOOL(obf_irq_set, IPMIKCS),
+VMSTATE_BOOL(atn_irq_set, IPMIKCS),
+VMSTATE_BOOL(use_irq, IPMIKCS),
+VMSTATE_BOOL(irqs_enabled, IPMIKCS),
+VMSTATE_UINT32(outpos, IPMIKCS),
+VMSTATE_UINT8_ARRAY(outmsg, IPMIKCS, MAX_IPMI_MSG_SIZE),
+VMSTATE_UINT32(inlen, IPMIKCS),
+VMSTATE_UINT8_ARRAY(inmsg, IPMIKCS, MAX_IPMI_MSG_SIZE),
+VMSTATE_BOOL(write_end, IPMIKCS),
+VMSTATE_UINT8(status_reg, IPMIKCS),
+VMSTATE_UINT8(data_out_reg, IPMIKCS),
+VMSTATE_INT16(data_in_reg, IPMIKCS),
+VMSTATE_INT16(cmd_reg, IPMIKCS),
+VMSTATE_UINT8(waiting_rsp, IPMIKCS),
+VMSTATE_END_OF_LIST()
+}
+};
+
+static int isa_ipmi_kcs_load_old(QEMUFile *f, void *opaque, int version_id)
+{
+ISAIPMIKCSDevice *iik = opaque;
+IPMIKCS *k = &iik->kcs;
+unsigned int i;
+
+if (version_id != 1) {
+return -EINVAL;
+}
+
+k->obf_irq_set = qemu_get_byte(f);
+k->atn_irq_set = qemu_get_byte(f);
+qemu_get_byte(f); /* Used to be use_irq, but that's not a good idea. */
+k->irqs_enabled = qemu_get_byte(f);
+k->outpos = qemu_get_be32(f);
+for (i = 0; i < MAX_IPMI_MSG_SIZE; i++) {
+k->outmsg[i] = qemu_get_byte(f);
+}
+k->inlen = 0; /* This was forgotten on version 1, just reset it. */
+for (i = 0; i < MAX_IPMI_MSG_SIZE; i++) {
+k->inmsg[i] = qemu_get_byte(f);
+}
+k->write_end = qemu_get_byte(f);
+k->status_reg = qemu_get_byte(f);
+k->data_out_reg = qemu_get_byte(f);
+k->data_in_reg = qemu_get_be16(f);
+k->cmd_reg = qemu_get_be16(f);
+k->waiting_rsp = qemu_get_byte(f);
+
+return 0;
+}
+
+static const VMStateDescription vmstate_ISAIPMIKCSDevice = {
+.name = TYPE_IPMI_INTERFACE_PREFIX "isa-kcs",
+.version_id = 2,
+.minimum_version_id = 2,
+.minimum_version_id_old = 1,
+.load_state_old = isa_ipmi_kcs_load_old,
+.fields  = (VMStateField[]) {
+VMSTATE_STRUCT(kcs, ISAIPMIKCSDevice, 1, vmstate_IPMIKCS, IPMIKCS),
 VMSTATE_END_OF_LIST()
 }
 };
-- 
2.7.4




Re: [Qemu-devel] [PATCH V8 1/4] mem: add share parameter to memory-backend-ram

2018-02-01 Thread Marcel Apfelbaum
On 01/02/2018 20:51, Eduardo Habkost wrote:
> On Thu, Feb 01, 2018 at 08:31:09PM +0200, Marcel Apfelbaum wrote:
>> On 01/02/2018 20:21, Eduardo Habkost wrote:
>>> On Thu, Feb 01, 2018 at 08:03:53PM +0200, Marcel Apfelbaum wrote:
 On 01/02/2018 15:53, Eduardo Habkost wrote:
> On Thu, Feb 01, 2018 at 02:29:25PM +0200, Marcel Apfelbaum wrote:
>> On 01/02/2018 14:10, Eduardo Habkost wrote:
>>> On Thu, Feb 01, 2018 at 07:36:50AM +0200, Marcel Apfelbaum wrote:
 On 01/02/2018 4:22, Michael S. Tsirkin wrote:
> On Wed, Jan 31, 2018 at 09:34:22PM -0200, Eduardo Habkost wrote:
>>> [...]
>> BTW, what's the root cause for requiring HVAs in the buffer?
>
> It's a side effect of the kernel/userspace API which always wants
> a single HVA/len pair to map memory for the application.
>
>

 Hi Eduardo and Michael,

>>  Can
>> this be fixed?
>
> I think yes.  It'd need to be a kernel patch for the RDMA subsystem
> mapping an s/g list with actual memory. The HVA/len pair would then 
> just
> be used to refer to the region, without creating the two mappings.
>
> Something like splitting the register mr into
>
> mr = create mr (va/len) - allocate a handle and record the va/len
>
> addmemory(mr, offset, hva, len) - pin memory
>
> register mr - pass it to HW
>
> As a nice side effect we won't burn so much virtual address space.
>

 We would still need a contiguous virtual address space range (for 
 post-send)
 which we don't have since guest contiguous virtual address space
 will always end up as non-contiguous host virtual address space.

 I am not sure the RDMA HW can handle a large VA with holes.
>>>
>>> I'm confused.  Why would the hardware see and care about virtual
>>> addresses? 
>>
>> The post-send operations bypasses the kernel, and the process
>> puts in the work request GVA addresses.
>>
>>> How exactly does the hardware translates VAs to
>>> PAs? 
>>
>> The HW maintains a page-directory like structure different form MMU
>> VA -> phys pages
>>
>>> What if the process page tables change?
>>>
>>
>> Since the page tables the HW uses are their own, we just need the phys
>> page to be pinned.
>
> So there's no hardware-imposed requirement that the hardware VAs
> (mapped by the HW page directory) match the VAs in QEMU
> address-space, right? 

 Actually there is. Today it works exactly as you described.
>>>
>>> Are you sure there's such hardware-imposed requirement?
>>>
>>
>> Yes.
>>
>>> Why would the hardware require VAs to match the ones in the
>>> userspace address-space, if it doesn't use the CPU MMU at all?
>>>
>>
>> It works like that:
>>
>> 1. We register a buffer from the process address space
>>giving its base address and length.
>>This call goes to kernel which in turn pins the phys pages
>>and registers them with the device *together* with the base
>>address (virtual address!)
>> 2. The device builds its own page tables to be able to translate
>>the virtual addresses to actual phys pages.
> 
> How would the device be able to do that?  It would require the
> device to look at the process page tables, wouldn't it?  Isn't
> the HW IOVA->PA translation table built by the OS?
> 

As stated above, these are tables private to the device.
(They even have a HW-vendor-specific layout, I think,
 since the device holds some cache.)

The device looks at its own private page tables, not
at the OS ones.

> 
>> 3. The process executes post-send requests directly to hw by-passing
>>the kernel giving process virtual addresses in work requests.
>> 4. The device uses its own page tables to translate the virtual
>>addresses to phys pages and sending them.
>>
>> Theoretically it is possible to send any contiguous IOVA instead of
>> the process's one, but that is not how it works today.
>>
>> Makes sense?
> 




[Qemu-devel] [PATCH 5/7] ipmi: Fix macro issues

2018-02-01 Thread minyard
From: Corey Minyard 

Macro parameters should almost always have () around them when used.
llvm reported an error on this.

Remove redundant parenthesis and put parenthesis around the entire
macros with assignments in case they are used in an expression.

The macros were doing ((v) & 1) for a binary input, but that only works
if v == 0 or if v & 1.  Changed to !!(v) so they work for all values.

Remove some unused macros.

Reported in https://bugs.launchpad.net/bugs/1651167

An audit of these changes found no semantic changes; this is just
cleanups for proper style and to avoid a compiler warning.

Signed-off-by: Corey Minyard 
Reviewed-by: Eric Blake 
---
 hw/ipmi/isa_ipmi_bt.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/hw/ipmi/isa_ipmi_bt.c b/hw/ipmi/isa_ipmi_bt.c
index e098fd5..e946030 100644
--- a/hw/ipmi/isa_ipmi_bt.c
+++ b/hw/ipmi/isa_ipmi_bt.c
@@ -45,21 +45,21 @@
 #define IPMI_BT_B2H_ATN_MASK   (1 << IPMI_BT_B2H_ATN_BIT)
 #define IPMI_BT_GET_B2H_ATN(d) (((d) >> IPMI_BT_B2H_ATN_BIT) & 0x1)
 #define IPMI_BT_SET_B2H_ATN(d, v)  ((d) = (((d) & ~IPMI_BT_B2H_ATN_MASK) | \
-(((v) & 1) << IPMI_BT_B2H_ATN_BIT)))
+(!!(v) << IPMI_BT_B2H_ATN_BIT)))
 
 #define IPMI_BT_SMS_ATN_MASK   (1 << IPMI_BT_SMS_ATN_BIT)
 #define IPMI_BT_GET_SMS_ATN(d) (((d) >> IPMI_BT_SMS_ATN_BIT) & 0x1)
 #define IPMI_BT_SET_SMS_ATN(d, v)  ((d) = (((d) & ~IPMI_BT_SMS_ATN_MASK) | \
-(((v) & 1) << IPMI_BT_SMS_ATN_BIT)))
+(!!(v) << IPMI_BT_SMS_ATN_BIT)))
 
 #define IPMI_BT_HBUSY_MASK (1 << IPMI_BT_HBUSY_BIT)
 #define IPMI_BT_GET_HBUSY(d)   (((d) >> IPMI_BT_HBUSY_BIT) & 0x1)
 #define IPMI_BT_SET_HBUSY(d, v)((d) = (((d) & ~IPMI_BT_HBUSY_MASK) | \
-   (((v) & 1) << IPMI_BT_HBUSY_BIT)))
+   (!!(v) << IPMI_BT_HBUSY_BIT)))
 
 #define IPMI_BT_BBUSY_MASK (1 << IPMI_BT_BBUSY_BIT)
 #define IPMI_BT_SET_BBUSY(d, v)((d) = (((d) & ~IPMI_BT_BBUSY_MASK) | \
-   (((v) & 1) << IPMI_BT_BBUSY_BIT)))
+   (!!(v) << IPMI_BT_BBUSY_BIT)))
 
 
 /* Mask register */
@@ -69,12 +69,12 @@
 #define IPMI_BT_B2H_IRQ_EN_MASK  (1 << IPMI_BT_B2H_IRQ_EN_BIT)
 #define IPMI_BT_GET_B2H_IRQ_EN(d)(((d) >> IPMI_BT_B2H_IRQ_EN_BIT) & 0x1)
 #define IPMI_BT_SET_B2H_IRQ_EN(d, v) ((d) = (((d) & ~IPMI_BT_B2H_IRQ_EN_MASK) 
|\
-(((v) & 1) << IPMI_BT_B2H_IRQ_EN_BIT)))
+(!!(v) << IPMI_BT_B2H_IRQ_EN_BIT)))
 
 #define IPMI_BT_B2H_IRQ_MASK (1 << IPMI_BT_B2H_IRQ_BIT)
 #define IPMI_BT_GET_B2H_IRQ(d)   (((d) >> IPMI_BT_B2H_IRQ_BIT) & 0x1)
 #define IPMI_BT_SET_B2H_IRQ(d, v)((d) = (((d) & ~IPMI_BT_B2H_IRQ_MASK) | \
-(((v) & 1) << IPMI_BT_B2H_IRQ_BIT)))
+(!!(v) << IPMI_BT_B2H_IRQ_BIT)))
 
 typedef struct IPMIBT {
 IPMIBmc *bmc;
-- 
2.7.4




[Qemu-devel] [PATCH 2/2] ipmi: Use proper struct reference for BT vmstate

2018-02-01 Thread minyard
From: Corey Minyard 

The vmstate for isa_ipmi_bt was referencing into the bt structure,
instead create a bt structure separate and use that.

Version 1 of the BT transfer was fairly broken: if a migration
occurred during an IPMI operation, the migration would likely be
corrupted, because I misunderstood the VMSTATE_VBUFFER_UINT32()
handling; I thought it handled transferring the length field,
too.  So I just remove support for that.  I doubt anyone is using
it at this point.

This also removes the transfer of use_irq, since that should come
from configuration.

Signed-off-by: Corey Minyard 
---
 hw/ipmi/isa_ipmi_bt.c | 43 ++-
 1 file changed, 30 insertions(+), 13 deletions(-)

diff --git a/hw/ipmi/isa_ipmi_bt.c b/hw/ipmi/isa_ipmi_bt.c
index e946030..a990ab7 100644
--- a/hw/ipmi/isa_ipmi_bt.c
+++ b/hw/ipmi/isa_ipmi_bt.c
@@ -450,22 +450,39 @@ static void isa_ipmi_bt_realize(DeviceState *dev, Error 
**errp)
 isa_register_ioport(isadev, &iib->bt.io, iib->bt.io_base);
 }
 
-static const VMStateDescription vmstate_ISAIPMIBTDevice = {
-.name = TYPE_IPMI_INTERFACE,
+
+const VMStateDescription vmstate_IPMIBT = {
+.name = TYPE_IPMI_INTERFACE_PREFIX "bt",
 .version_id = 1,
 .minimum_version_id = 1,
 .fields  = (VMStateField[]) {
-VMSTATE_BOOL(bt.obf_irq_set, ISAIPMIBTDevice),
-VMSTATE_BOOL(bt.atn_irq_set, ISAIPMIBTDevice),
-VMSTATE_BOOL(bt.use_irq, ISAIPMIBTDevice),
-VMSTATE_BOOL(bt.irqs_enabled, ISAIPMIBTDevice),
-VMSTATE_UINT32(bt.outpos, ISAIPMIBTDevice),
-VMSTATE_VBUFFER_UINT32(bt.outmsg, ISAIPMIBTDevice, 1, NULL, bt.outlen),
-VMSTATE_VBUFFER_UINT32(bt.inmsg, ISAIPMIBTDevice, 1, NULL, bt.inlen),
-VMSTATE_UINT8(bt.control_reg, ISAIPMIBTDevice),
-VMSTATE_UINT8(bt.mask_reg, ISAIPMIBTDevice),
-VMSTATE_UINT8(bt.waiting_rsp, ISAIPMIBTDevice),
-VMSTATE_UINT8(bt.waiting_seq, ISAIPMIBTDevice),
+VMSTATE_BOOL(obf_irq_set, IPMIBT),
+VMSTATE_BOOL(atn_irq_set, IPMIBT),
+VMSTATE_BOOL(irqs_enabled, IPMIBT),
+VMSTATE_UINT32(outpos, IPMIBT),
+VMSTATE_UINT32(outlen, IPMIBT),
+VMSTATE_UINT8_ARRAY(outmsg, IPMIBT, MAX_IPMI_MSG_SIZE),
+VMSTATE_UINT32(inlen, IPMIBT),
+VMSTATE_UINT8_ARRAY(inmsg, IPMIBT, MAX_IPMI_MSG_SIZE),
+VMSTATE_UINT8(control_reg, IPMIBT),
+VMSTATE_UINT8(mask_reg, IPMIBT),
+VMSTATE_UINT8(waiting_rsp, IPMIBT),
+VMSTATE_UINT8(waiting_seq, IPMIBT),
+VMSTATE_END_OF_LIST()
+}
+};
+
+static const VMStateDescription vmstate_ISAIPMIBTDevice = {
+.name = TYPE_IPMI_INTERFACE_PREFIX "isa-bt",
+.version_id = 2,
+.minimum_version_id = 2,
+/*
+ * Version 1 had messed up the array transfer, it's not even usable
+ * because it used VMSTATE_VBUFFER_UINT32, but it did not transfer
+ * the buffer length, so random things would happen.
+ */
+.fields  = (VMStateField[]) {
+VMSTATE_STRUCT(bt, ISAIPMIBTDevice, 1, vmstate_IPMIBT, IPMIBT),
 VMSTATE_END_OF_LIST()
 }
 };
-- 
2.7.4




Re: [Qemu-devel] [PATCH v3 23/39] qcow2: Update handle_alloc() to support L2 slices

2018-02-01 Thread Max Reitz
On 2018-01-26 15:59, Alberto Garcia wrote:
> handle_alloc() loads an L2 table and limits the number of checked
> clusters to the amount that fits inside that table. Since we'll be
> loading L2 slices instead of full tables we need to update that limit.
> 
> Apart from that, this function doesn't need any additional changes, so
> this patch simply updates the variable name from l2_table to l2_slice.
> 
> Signed-off-by: Alberto Garcia 
> Reviewed-by: Eric Blake 
> ---
>  block/qcow2-cluster.c | 18 +-
>  1 file changed, 9 insertions(+), 9 deletions(-)

Reviewed-by: Max Reitz 





Re: [Qemu-devel] [PATCH v3 22/39] qcow2: Update handle_copied() to support L2 slices

2018-02-01 Thread Max Reitz
On 2018-01-26 15:59, Alberto Garcia wrote:
> handle_copied() loads an L2 table and limits the number of checked
> clusters to the amount that fits inside that table. Since we'll be
> loading L2 slices instead of full tables we need to update that limit.
> 
> Apart from that, this function doesn't need any additional changes, so
> this patch simply updates the variable name from l2_table to l2_slice.
> 
> Signed-off-by: Alberto Garcia 
> Reviewed-by: Eric Blake 
> ---
>  block/qcow2-cluster.c | 16 
>  1 file changed, 8 insertions(+), 8 deletions(-)

Reviewed-by: Max Reitz 





[Qemu-devel] [PATCH 1/7] Add maintainer for the IPMI code

2018-02-01 Thread minyard
From: Corey Minyard 

Signed-off-by: Corey Minyard 
Acked-by: Marc-André Lureau 
---
 MAINTAINERS | 9 +
 1 file changed, 9 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index fe39b30..192d8b8 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -940,6 +940,15 @@ F: tests/ahci-test.c
 F: tests/libqos/ahci*
 T: git git://github.com/jnsnow/qemu.git ide
 
+IPMI
+M: Corey Minyard 
+S: Maintained
+F: include/hw/ipmi/*
+F: hw/ipmi/*
+F: hw/smbios/smbios_type_38.c
+F: tests/ipmi*
+T: git git://github.com/cminyard/qemu.git master-ipmi-rebase
+
 Floppy
 M: John Snow 
 L: qemu-bl...@nongnu.org
-- 
2.7.4




[Qemu-devel] [PATCH 4/7] ipmi: Add the platform event message command

2018-02-01 Thread minyard
From: Corey Minyard 

This lets an event be added to the SEL as if a sensor had generated
it.  The OpenIPMI driver uses it for storing panic event information.

Signed-off-by: Corey Minyard 
Reviewed-by: Cédric Le Goater 
---
 hw/ipmi/ipmi_bmc_sim.c | 24 
 1 file changed, 24 insertions(+)

diff --git a/hw/ipmi/ipmi_bmc_sim.c b/hw/ipmi/ipmi_bmc_sim.c
index a0bbfd5..e84d710 100644
--- a/hw/ipmi/ipmi_bmc_sim.c
+++ b/hw/ipmi/ipmi_bmc_sim.c
@@ -38,6 +38,7 @@
 
 #define IPMI_NETFN_SENSOR_EVENT   0x04
 
+#define IPMI_CMD_PLATFORM_EVENT_MSG   0x02
 #define IPMI_CMD_SET_SENSOR_EVT_ENABLE0x28
 #define IPMI_CMD_GET_SENSOR_EVT_ENABLE0x29
 #define IPMI_CMD_REARM_SENSOR_EVTS0x2a
@@ -1581,6 +1582,28 @@ static void set_sel_time(IPMIBmcSim *ibs,
 ibs->sel.time_offset = now.tv_sec - ((long) val);
 }
 
+static void platform_event_msg(IPMIBmcSim *ibs,
+   uint8_t *cmd, unsigned int cmd_len,
+   RspBuffer *rsp)
+{
+uint8_t event[16];
+
+event[2] = 2; /* System event record */
+event[7] = cmd[2]; /* Generator ID */
+event[8] = 0;
+event[9] = cmd[3]; /* EvMRev */
+event[10] = cmd[4]; /* Sensor type */
+event[11] = cmd[5]; /* Sensor number */
+event[12] = cmd[6]; /* Event dir / Event type */
+event[13] = cmd[7]; /* Event data 1 */
+event[14] = cmd[8]; /* Event data 2 */
+event[15] = cmd[9]; /* Event data 3 */
+
+if (sel_add_event(ibs, event)) {
+rsp_buffer_set_error(rsp, IPMI_CC_OUT_OF_SPACE);
+}
+}
+
 static void set_sensor_evt_enable(IPMIBmcSim *ibs,
   uint8_t *cmd, unsigned int cmd_len,
   RspBuffer *rsp)
@@ -1757,6 +1780,7 @@ static const IPMINetfn chassis_netfn = {
 };
 
 static const IPMICmdHandler sensor_event_cmds[] = {
+[IPMI_CMD_PLATFORM_EVENT_MSG] = { platform_event_msg, 10 },
 [IPMI_CMD_SET_SENSOR_EVT_ENABLE] = { set_sensor_evt_enable, 4 },
 [IPMI_CMD_GET_SENSOR_EVT_ENABLE] = { get_sensor_evt_enable, 3 },
 [IPMI_CMD_REARM_SENSOR_EVTS] = { rearm_sensor_evts, 4 },
-- 
2.7.4




[Qemu-devel] [PATCH 7/7] ipmi: Allow BMC device properties to be set

2018-02-01 Thread minyard
From: Corey Minyard 

Signed-off-by: Corey Minyard 
Reviewed-by: Marc-André Lureau 
---
 hw/ipmi/ipmi_bmc_sim.c | 21 ++---
 1 file changed, 14 insertions(+), 7 deletions(-)

diff --git a/hw/ipmi/ipmi_bmc_sim.c b/hw/ipmi/ipmi_bmc_sim.c
index e84d710..9b509f8 100644
--- a/hw/ipmi/ipmi_bmc_sim.c
+++ b/hw/ipmi/ipmi_bmc_sim.c
@@ -214,8 +214,8 @@ struct IPMIBmcSim {
 uint8_t device_rev;
 uint8_t fwrev1;
 uint8_t fwrev2;
-uint8_t mfg_id[3];
-uint8_t product_id[2];
+uint32_t mfg_id;
+uint16_t product_id;
 
 uint8_t restart_cause;
 
@@ -867,11 +867,11 @@ static void get_device_id(IPMIBmcSim *ibs,
 rsp_buffer_push(rsp, ibs->fwrev2);
 rsp_buffer_push(rsp, ibs->ipmi_version);
 rsp_buffer_push(rsp, 0x07); /* sensor, SDR, and SEL. */
-rsp_buffer_push(rsp, ibs->mfg_id[0]);
-rsp_buffer_push(rsp, ibs->mfg_id[1]);
-rsp_buffer_push(rsp, ibs->mfg_id[2]);
-rsp_buffer_push(rsp, ibs->product_id[0]);
-rsp_buffer_push(rsp, ibs->product_id[1]);
+rsp_buffer_push(rsp, ibs->mfg_id & 0xff);
+rsp_buffer_push(rsp, (ibs->mfg_id >> 8) & 0xff);
+rsp_buffer_push(rsp, (ibs->mfg_id >> 16) & 0xff);
+rsp_buffer_push(rsp, ibs->product_id & 0xff);
+rsp_buffer_push(rsp, (ibs->product_id >> 8) & 0xff);
 }
 
 static void set_global_enables(IPMIBmcSim *ibs, uint8_t val)
@@ -1997,6 +1997,13 @@ static Property ipmi_sim_properties[] = {
 DEFINE_PROP_UINT16("fruareasize", IPMIBmcSim, fru.areasize, 1024),
 DEFINE_PROP_STRING("frudatafile", IPMIBmcSim, fru.filename),
 DEFINE_PROP_STRING("sdrfile", IPMIBmcSim, sdr_filename),
+DEFINE_PROP_UINT8("device_id", IPMIBmcSim, device_id, 0x20),
+DEFINE_PROP_UINT8("ipmi_version", IPMIBmcSim, ipmi_version, 0x02),
+DEFINE_PROP_UINT8("device_rev", IPMIBmcSim, device_rev, 0),
+DEFINE_PROP_UINT8("fwrev1", IPMIBmcSim, fwrev1, 0),
+DEFINE_PROP_UINT8("fwrev2", IPMIBmcSim, fwrev2, 0),
+DEFINE_PROP_UINT32("mfg_id", IPMIBmcSim, mfg_id, 0),
+DEFINE_PROP_UINT16("product_id", IPMIBmcSim, product_id, 0),
 DEFINE_PROP_END_OF_LIST(),
 };
 
-- 
2.7.4




[Qemu-devel] [PATCH 2/7] ipmi: Fix SEL get/set time commands

2018-02-01 Thread minyard
From: Corey Minyard 

The minimum message size was on the wrong commands, for getting
the time it's zero and for setting the time it's 6.

Signed-off-by: Corey Minyard 
Reviewed-by: Cédric Le Goater 
Reviewed-by: Marc-André Lureau 
---
 hw/ipmi/ipmi_bmc_sim.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/hw/ipmi/ipmi_bmc_sim.c b/hw/ipmi/ipmi_bmc_sim.c
index 277c28c..cc068f2 100644
--- a/hw/ipmi/ipmi_bmc_sim.c
+++ b/hw/ipmi/ipmi_bmc_sim.c
@@ -1802,8 +1802,8 @@ static const IPMICmdHandler storage_cmds[] = {
 [IPMI_CMD_GET_SEL_ENTRY] = { get_sel_entry, 8 },
 [IPMI_CMD_ADD_SEL_ENTRY] = { add_sel_entry, 18 },
 [IPMI_CMD_CLEAR_SEL] = { clear_sel, 8 },
-[IPMI_CMD_GET_SEL_TIME] = { get_sel_time, 6 },
-[IPMI_CMD_SET_SEL_TIME] = { set_sel_time },
+[IPMI_CMD_GET_SEL_TIME] = { get_sel_time },
+[IPMI_CMD_SET_SEL_TIME] = { set_sel_time, 6 },
 };
 
 static const IPMINetfn storage_netfn = {
-- 
2.7.4




[Qemu-devel] [PATCH 6/7] ipmi: disable IRQ and ATN on an external disconnect

2018-02-01 Thread minyard
From: Corey Minyard 

Otherwise there's no way to clear them without an external command,
and it could lock the OS in the VM if they were stuck.

Signed-off-by: Corey Minyard 
---
 hw/ipmi/ipmi_bmc_extern.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/hw/ipmi/ipmi_bmc_extern.c b/hw/ipmi/ipmi_bmc_extern.c
index 8c0535d..bf0b7ee 100644
--- a/hw/ipmi/ipmi_bmc_extern.c
+++ b/hw/ipmi/ipmi_bmc_extern.c
@@ -425,6 +425,11 @@ static void chr_event(void *opaque, int event)
 return;
 }
 ibe->connected = false;
+/*
+ * Don't hang the OS trying to handle the ATN bit, other end will
+ * resend on a reconnect.
+ */
+k->set_atn(s, 0, 0);
 if (ibe->waiting_rsp) {
 ibe->waiting_rsp = false;
 ibe->inbuf[1] = ibe->outbuf[1] | 0x04;
-- 
2.7.4




[Qemu-devel] [PATCH 3/7] ipmi: Don't set the timestamp on add events that don't have it

2018-02-01 Thread minyard
From: Corey Minyard 

According to the spec, from section "32.3 OEM SEL Record - Type
E0h-FFh", record types from 0xe0 to 0xff do not have a timestamp.
So don't set it when adding those types.  This required putting
the timestamp in a temporary buffer, since it's still required
to set the last addition time.

Signed-off-by: Corey Minyard 
Reviewed-by: Cédric Le Goater 
---
 hw/ipmi/ipmi_bmc_sim.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/hw/ipmi/ipmi_bmc_sim.c b/hw/ipmi/ipmi_bmc_sim.c
index cc068f2..a0bbfd5 100644
--- a/hw/ipmi/ipmi_bmc_sim.c
+++ b/hw/ipmi/ipmi_bmc_sim.c
@@ -443,16 +443,21 @@ static void sel_inc_reservation(IPMISel *sel)
 /* Returns 1 if the SEL is full and can't hold the event. */
 static int sel_add_event(IPMIBmcSim *ibs, uint8_t *event)
 {
+uint8_t ts[4];
+
 event[0] = 0xff;
 event[1] = 0xff;
-set_timestamp(ibs, event + 3);
+set_timestamp(ibs, ts);
+if (event[2] < 0xe0) { /* Don't set timestamps for type 0xe0-0xff. */
+memcpy(event + 3, ts, 4);
+}
 if (ibs->sel.next_free == MAX_SEL_SIZE) {
 ibs->sel.overflow = 1;
 return 1;
 }
 event[0] = ibs->sel.next_free & 0xff;
 event[1] = (ibs->sel.next_free >> 8) & 0xff;
-memcpy(ibs->sel.last_addition, event + 3, 4);
+memcpy(ibs->sel.last_addition, ts, 4);
 memcpy(ibs->sel.sel[ibs->sel.next_free], event, 16);
 ibs->sel.next_free++;
 sel_inc_reservation(&ibs->sel);
-- 
2.7.4




Re: [Qemu-devel] [PATCH V8 1/4] mem: add share parameter to memory-backend-ram

2018-02-01 Thread Michael S. Tsirkin
On Thu, Feb 01, 2018 at 08:31:09PM +0200, Marcel Apfelbaum wrote:
> Theoretically it is possible to send any contiguous IOVA instead of
> the process's one, but that is not how it works today.

It works this way today in hardware but it's not hardware that limits
it to work this way - it's a software limitation.

-- 
MST



Re: [Qemu-devel] [PATCH V8 1/4] mem: add share parameter to memory-backend-ram

2018-02-01 Thread Eduardo Habkost
On Thu, Feb 01, 2018 at 08:31:09PM +0200, Marcel Apfelbaum wrote:
> On 01/02/2018 20:21, Eduardo Habkost wrote:
> > On Thu, Feb 01, 2018 at 08:03:53PM +0200, Marcel Apfelbaum wrote:
> >> On 01/02/2018 15:53, Eduardo Habkost wrote:
> >>> On Thu, Feb 01, 2018 at 02:29:25PM +0200, Marcel Apfelbaum wrote:
>  On 01/02/2018 14:10, Eduardo Habkost wrote:
> > On Thu, Feb 01, 2018 at 07:36:50AM +0200, Marcel Apfelbaum wrote:
> >> On 01/02/2018 4:22, Michael S. Tsirkin wrote:
> >>> On Wed, Jan 31, 2018 at 09:34:22PM -0200, Eduardo Habkost wrote:
> > [...]
>  BTW, what's the root cause for requiring HVAs in the buffer?
> >>>
> >>> It's a side effect of the kernel/userspace API which always wants
> >>> a single HVA/len pair to map memory for the application.
> >>>
> >>>
> >>
> >> Hi Eduardo and Michael,
> >>
>   Can
>  this be fixed?
> >>>
> >>> I think yes.  It'd need to be a kernel patch for the RDMA subsystem
> >>> mapping an s/g list with actual memory. The HVA/len pair would then 
> >>> just
> >>> be used to refer to the region, without creating the two mappings.
> >>>
> >>> Something like splitting the register mr into
> >>>
> >>> mr = create mr (va/len) - allocate a handle and record the va/len
> >>>
> >>> addmemory(mr, offset, hva, len) - pin memory
> >>>
> >>> register mr - pass it to HW
> >>>
> >>> As a nice side effect we won't burn so much virtual address space.
> >>>
> >>
> >> We would still need a contiguous virtual address space range (for 
> >> post-send)
> >> which we don't have since guest contiguous virtual address space
> >> will always end up as non-contiguous host virtual address space.
> >>
> >> I am not sure the RDMA HW can handle a large VA with holes.
> >
> > I'm confused.  Why would the hardware see and care about virtual
> > addresses? 
> 
>  The post-send operations bypasses the kernel, and the process
>  puts in the work request GVA addresses.
> 
> > How exactly does the hardware translates VAs to
> > PAs? 
> 
>  The HW maintains a page-directory like structure different form MMU
>  VA -> phys pages
> 
> > What if the process page tables change?
> >
> 
>  Since the page tables the HW uses are their own, we just need the phys
>  page to be pinned.
> >>>
> >>> So there's no hardware-imposed requirement that the hardware VAs
> >>> (mapped by the HW page directory) match the VAs in QEMU
> >>> address-space, right? 
> >>
> >> Actually there is. Today it works exactly as you described.
> > 
> > Are you sure there's such hardware-imposed requirement?
> > 
> 
> Yes.
> 
> > Why would the hardware require VAs to match the ones in the
> > userspace address-space, if it doesn't use the CPU MMU at all?
> > 
> 
> It works like that:
> 
> 1. We register a buffer from the process address space
>giving its base address and length.
>This call goes to kernel which in turn pins the phys pages
>and registers them with the device *together* with the base
>address (virtual address!)
> 2. The device builds its own page tables to be able to translate
>the virtual addresses to actual phys pages.

How would the device be able to do that?  It would require the
device to look at the process page tables, wouldn't it?  Isn't
the HW IOVA->PA translation table built by the OS?


> 3. The process executes post-send requests directly to hw by-passing
>the kernel giving process virtual addresses in work requests.
> 4. The device uses its own page tables to translate the virtual
>addresses to phys pages and sending them.
> 
> Theoretically it is possible to use any contiguous IOVA instead of
> the process's VAs, but that is not how it works today.
> 
> Makes sense?

-- 
Eduardo



Re: [Qemu-devel] [PATCH v3 21/39] qcow2: Update qcow2_alloc_cluster_link_l2() to support L2 slices

2018-02-01 Thread Max Reitz
On 2018-01-26 15:59, Alberto Garcia wrote:
> There's a loop in this function that iterates over the L2 entries in a
> table, so now we need to assert that it remains within the limits of
> an L2 slice.
> 
> Apart from that, this function doesn't need any additional changes, so
> this patch simply updates the variable name from l2_table to l2_slice.
> 
> Signed-off-by: Alberto Garcia 
> Reviewed-by: Eric Blake 
> ---
>  block/qcow2-cluster.c | 16 
>  1 file changed, 8 insertions(+), 8 deletions(-)

Hm, well, strictly speaking this patch should not be at this point in
this series -- e.g. handle_alloc() so far only limits its nb_clusters to
the L2 size, not the L2 slice size.

But that's nit picking because the slice size equals the L2 size anyway
(for now), so

Reviewed-by: Max Reitz 





Re: [Qemu-devel] [PATCH V8 1/4] mem: add share parameter to memory-backend-ram

2018-02-01 Thread Marcel Apfelbaum
On 01/02/2018 20:18, Eduardo Habkost wrote:
> On Thu, Feb 01, 2018 at 07:58:10PM +0200, Marcel Apfelbaum wrote:
>> On 01/02/2018 19:36, Eduardo Habkost wrote:
>>> On Thu, Feb 01, 2018 at 07:12:45PM +0200, Michael S. Tsirkin wrote:
 On Thu, Feb 01, 2018 at 03:01:36PM -0200, Eduardo Habkost wrote:
> On Thu, Feb 01, 2018 at 06:59:07PM +0200, Michael S. Tsirkin wrote:
>> On Thu, Feb 01, 2018 at 02:57:39PM -0200, Eduardo Habkost wrote:
>>> On Thu, Feb 01, 2018 at 06:48:54PM +0200, Michael S. Tsirkin wrote:
 On Thu, Feb 01, 2018 at 02:31:32PM -0200, Eduardo Habkost wrote:
> On Thu, Feb 01, 2018 at 04:24:30PM +0200, Michael S. Tsirkin wrote:
> [...]
>> The full fix would be to allow QEMU to map a list of
>> pages to a guest supplied IOVA.
>
> Thanks, that's what I expected.
>
> While this is not possible, the only requests I have for this
> patch is that we clearly document:
> * What's the only purpose of share=on on a host-memory-backend
>   object (due to pvrdma limitations).
> * The potential undesirable side-effects of setting share=on.
> * On the commit message and other comments, clearly distinguish
>   HVAs in the QEMU address-space from IOVAs, to avoid confusion.

 Looking forward, when we do support it, how will management find out
 it no longer needs to pass the share parameter?

 Further, if the side effects of the share parameter go away,
 how will it know these no longer hold?
>>>
>>> A query-host-capabilities or similar QMP command seems necessary
>>> for that.
>>
>> Is anyone working on that?
>
> Not yet.
>
> -- 
> Eduardo

 Do these patches need to wait until we do have that command?
>>>
>>> I don't think so.  The command will be needed only when
>>> support for pvrdma without share=on gets implemented.
>>>
>>> Right now, all we need is clear documentation.
>>>

 I'm thinking it's better to have "share=on required with rdma"
 and "hugetlbfs not supported with rdma"
 than the reverse, this way new hosts do not need to carry
 this stuff around forever.
>>>
>>> What do you mean by "the reverse"?
>>>
>>> IIUC, the requirements/limitations are:
>>>
>>> * share=on required for pvrdma.  Already documented and enforced
>>>   by pvrdma code in this series.
>>
>> Right.
>>
>>> * hugetlbfs not supported with rdma. Is this detected/reported by
>>>   QEMU?  Is it documented?
>>
>> Yes, enforced by the pvrdma device initialization and documented in the
>> corresponding pvrdma doc.
>>
>>> * side-effects of share=on.  This is not detected nor documented,
>>>   and probably already applies to other memory backends.
>>>   * Nice to have: document when share=on is useful (answer:
>>> because of pvrdma), when adding share=on support to
>>> host-memory-backend.
>>>
>>
>> The documentation is part of the pvrdma doc.
>> What are the side-effects of share=on? I missed that.
>> (share=on is new for the memory backed RAM, the file
>> backed RAM already had the share parameter)
>>
>> One can just grep for "share=on" in the docs directory
>> and can easily see the only current usage. But maybe will
>> be more, maybe we don't want to limit it for now.
>>
>> I am planning to re-spin today/tomorrow before sending
>> a pull-request, can you please point me on what documentation
>> to add and what side-effects I should document?
>>
> 
> The full list of side-effects is not clear to me.  For some of
> them, see Documentation/vm/numa_memory_policy.txt on the kernel
> tree.
> 
> The documentation for memory backend options is at
> qemu-options.hx.  Maybe something like this, extending the
> existing paragraph:
> 
>   The @option{share} boolean option determines whether the memory
>   region is marked as private to QEMU, or shared (mapped using
>   the MAP_SHARED flag).  The latter allows a co-operating
>   external process to access the QEMU memory region.
> 
>   @option{share} is also required for pvrdma devices due to
>   limitations in the RDMA API provided by Linux.
> 
>   Setting share=on might affect the ability to configure NUMA
>   bindings for the memory backend under some circumstances, see
>   Documentation/vm/numa_memory_policy.txt on the Linux kernel
>   source tree for additional details.
> 
> I hate to point users to low-level documentation on the kernel
> tree, but it's better than nothing.
> 
> We also need to list "share" as a valid option at the
> "@item -object memory-backend-ram,[...]" line.
> 

Thanks for the help Eduardo, I'll be sure to update the docs as
advised.
Marcel
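
For reference, the option being documented would be used along these lines
(a minimal sketch only; the exact pvrdma device properties are omitted and
the ids are made up):

```
qemu-system-x86_64 \
  -object memory-backend-ram,id=mb0,size=1G,share=on \
  -numa node,nodeid=0,memdev=mb0 \
  -device pvrdma
```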





Re: [Qemu-devel] [PATCH V8 1/4] mem: add share parameter to memory-backend-ram

2018-02-01 Thread Marcel Apfelbaum
On 01/02/2018 20:21, Eduardo Habkost wrote:
> On Thu, Feb 01, 2018 at 08:03:53PM +0200, Marcel Apfelbaum wrote:
>> On 01/02/2018 15:53, Eduardo Habkost wrote:
>>> On Thu, Feb 01, 2018 at 02:29:25PM +0200, Marcel Apfelbaum wrote:
 On 01/02/2018 14:10, Eduardo Habkost wrote:
> On Thu, Feb 01, 2018 at 07:36:50AM +0200, Marcel Apfelbaum wrote:
>> On 01/02/2018 4:22, Michael S. Tsirkin wrote:
>>> On Wed, Jan 31, 2018 at 09:34:22PM -0200, Eduardo Habkost wrote:
> [...]
 BTW, what's the root cause for requiring HVAs in the buffer?
>>>
>>> It's a side effect of the kernel/userspace API which always wants
>>> a single HVA/len pair to map memory for the application.
>>>
>>>
>>
>> Hi Eduardo and Michael,
>>
  Can
 this be fixed?
>>>
>>> I think yes.  It'd need to be a kernel patch for the RDMA subsystem
>>> mapping an s/g list with actual memory. The HVA/len pair would then just
>>> be used to refer to the region, without creating the two mappings.
>>>
>>> Something like splitting the register mr into
>>>
>>> mr = create mr (va/len) - allocate a handle and record the va/len
>>>
>>> addmemory(mr, offset, hva, len) - pin memory
>>>
>>> register mr - pass it to HW
>>>
>>> As a nice side effect we won't burn so much virtual address space.
>>>
>>
>> We would still need a contiguous virtual address space range (for 
>> post-send)
>> which we don't have since guest contiguous virtual address space
>> will always end up as non-contiguous host virtual address space.
>>
>> I am not sure the RDMA HW can handle a large VA with holes.
>
> I'm confused.  Why would the hardware see and care about virtual
> addresses? 

 The post-send operation bypasses the kernel, and the process puts
 GVA addresses in the work requests.

> How exactly does the hardware translates VAs to
> PAs? 

 The HW maintains its own page-directory-like structure, different
 from the MMU: VA -> phys pages

> What if the process page tables change?
>

 Since the page tables the HW uses are their own, we just need the phys
 page to be pinned.
>>>
>>> So there's no hardware-imposed requirement that the hardware VAs
>>> (mapped by the HW page directory) match the VAs in QEMU
>>> address-space, right? 
>>
>> Actually there is. Today it works exactly as you described.
> 
> Are you sure there's such hardware-imposed requirement?
> 

Yes.

> Why would the hardware require VAs to match the ones in the
> userspace address-space, if it doesn't use the CPU MMU at all?
> 

It works like that:

1. We register a buffer from the process address space
   giving its base address and length.
   This call goes to kernel which in turn pins the phys pages
   and registers them with the device *together* with the base
   address (virtual address!)
2. The device builds its own page tables to be able to translate
   the virtual addresses to actual phys pages.
3. The process executes post-send requests directly to hw by-passing
   the kernel giving process virtual addresses in work requests.
4. The device uses its own page tables to translate the virtual
   addresses to phys pages and sending them.

Theoretically it is possible to use any contiguous IOVA instead of
the process's VAs, but that is not how it works today.

Makes sense?

Thanks,
Marcel



