[Qemu-devel] [Bug 994662] Re: QEMU crashes on ioport access

2017-06-30 Thread Launchpad Bug Tracker
[Expired for QEMU because there has been no activity for 60 days.]

** Changed in: qemu
   Status: Incomplete => Expired

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/994662

Title:
  QEMU crashes on ioport access

Status in QEMU:
  Expired

Bug description:
  While running a fuzzer inside the guest, QEMU crashed with the
  following message and dumped the state of all vcpus:

  
  qemu: hardware error: register_ioport_read: invalid opaque for address 0x0Al
  CPU #0:
  RAX=880007a73000 RBX=8800095b6000 RCX=880007a33530 RDX=880007a33530
  RSI=00aa6000 RDI=00aa6000 RBP=880007c13c68 RSP=880007c13c48
  R8 = R9 = R10= R11=0001
  R12=00aa6000 R13=800033556045 R14=00aa6000 R15=8800095b6000
  RIP=8108ae02 RFL=0282 [--S] CPL=0 II=0 A20=1 SMM=0 HLT=0
  ES =   
  CS =0010   00a09b00 DPL=0 CS64 [-RA]
  SS =0018   00c09300 DPL=0 DS   [-WA]
  DS =   
  FS = 7f7de18e8700  
  GS = 88000d80  
  LDT=   
  TR =0040 88000d9d2540 2087 8b00 DPL=0 TSS64-busy
  GDT= 88000d804000 007f
  IDT= 8436d000 0fff
  CR0=8005003b CR2=7f2f25752e9c CR3=07a3d000 CR4=000407f0
  DR0= DR1= DR2= DR3=
  DR6=0ff0 DR7=0400
  EFER=0d01
  FCW=037f FSW= [ST=0] FTW=00 MXCSR=1f80
  FPR0=  FPR1= 
  FPR2=  FPR3= 
  FPR4=  FPR5= 
  FPR6=  FPR7= 
  XMM00=00ff00ff XMM01=25252525252525252525252525252525
  XMM02= XMM03=
  XMM04= XMM05=
  XMM06= XMM07=
  XMM08= XMM09=
  XMM10= XMM11=
  XMM12= XMM13=
  XMM14= XMM15=
  CPU #1:
  RAX=88001b588000 RBX=ea4ab300 RCX=c9304000 RDX=0005
  RSI=c9304000 RDI=005000380028 RBP=880012681c38 RSP=880012681c28
  R8 = R9 = R10= R11=0002
  R12=0004 R13=88001bfd3000 R14=00fef000 R15=88000ed51000
  RIP=811daf87 RFL=0006 [-P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
  ES =   
  CS =0010   00a09b00 DPL=0 CS64 [-RA]
  SS =0018   00c09300 DPL=0 DS   [-WA]
  DS =   
  FS = 7fe38bb99700  
  GS = 88001b80  
  LDT=   
  TR =0040 88001b9d2540 2087 8b00 DPL=0 TSS64-busy
  GDT= 88001b804000 007f
  IDT= 8436d000 0fff
  CR0=8005003b CR2=7f2f25ac4518 CR3=1173e000 CR4=000407e0
  DR0= DR1= DR2= DR3=
  DR6=0ff0 DR7=0400
  EFER=0d01
  FCW=037f FSW= [ST=0] FTW=00 MXCSR=1f80
  FPR0=  FPR1= 
  FPR2=  FPR3= 
  FPR4=  FPR5= 
  FPR6=  FPR7= 
  XMM00=ffff00ff XMM01=25252525252525252525252525252525
  XMM02= XMM03=ff00ff00ff00
  XMM04= XMM05=
  XMM06= XMM07=
  XMM08= XMM09=
  XMM10= XMM11=
  XMM12= XMM13=
  XMM14= XMM15=
  CPU #2:
  RAX=001d RBX=0080 RCX=0080 RDX=0cfc
  RSI= RDI=0086 RBP=8800121f7de8 RSP=8800121f7db8
  R8 =0004 R9 =001d R10= R11=0002
  R12=88001b7b 

[Qemu-devel] [Bug 988128] Re: smbd crashes when called with "smb ports = 0"

2017-06-30 Thread Launchpad Bug Tracker
[Expired for QEMU because there has been no activity for 60 days.]

** Changed in: qemu
   Status: Incomplete => Expired

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/988128

Title:
  smbd crashes when called with "smb ports = 0"

Status in QEMU:
  Expired

Bug description:
  The smb.conf generated by qemu-kvm contains a "smb ports = 0" directive. This
  causes at least version 3.6.4 of Samba to crash with

  [0] vostro:/tmp/qemu-smb.6836-0# smbd -i -s smb.conf
  Unable to setup corepath for smbd: Operation not permitted
  smbd version 3.6.4 started.
  Copyright Andrew Tridgell and the Samba Team 1992-2011
  open_sockets_smbd: No sockets available to bind to.
  ===
  Abnormal server exit: open_sockets_smbd() failed
  ===
  BACKTRACE: 6 stack frames:
   #0 smbd(log_stack_trace+0x1a) [0x7fe50c14f8ba]
   #1 smbd(+0x6a0743) [0x7fe50c3bd743]
   #2 smbd(+0x6a0a41) [0x7fe50c3bda41]
   #3 smbd(main+0xa52) [0x7fe50be26d42]
   #4 /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xfd) [0x7fe508ac0ead]
   #5 smbd(+0x10a6b9) [0x7fe50be276b9]

  Changing "smb ports" to a non-privilileged port works around the
  issue.

  I'd like to help fix this, but I am not sure what qemu-kvm's intention is here.
  Zero is not a valid port, and the smb.conf manpage does not describe any
  special meaning of zero here. I found that previous versions of samba apparently
  did not bind to any port if zero was specified - but in that case, how is
  qemu communicating with samba?

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/988128/+subscriptions



[Qemu-devel] [Bug 965133] Re: Sparc64 crash on start

2017-06-30 Thread Launchpad Bug Tracker
[Expired for QEMU because there has been no activity for 60 days.]

** Changed in: qemu
   Status: Incomplete => Expired

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/965133

Title:
  Sparc64 crash on start

Status in QEMU:
  Expired

Bug description:
  qemu version 1.0.1, compiled on an Ubuntu live system on an HP laptop with
  an x64 architecture.

  With more than 4G of memory, the sparc64 machine crashes on start.

  command line: qemu-system-sparc64 -m 4G

  output:
  VNC server running on `127.0.0.1:5900'
  qemu: fatal: Trap 0x0064 while trap level (5) >= MAXTL (5), Error state
  pc: ffd04c80  npc: ffd04c84
  General Registers:
  %g0-3:    
  %g4-7:    

  Current Register Window:
  %o0-3: ffd0 0008 0008  
  %o4-7:   fff754e1 ffd144d4 
  %l0-3: 0001 fff75c4d   
  %l4-7:     
  %i0-3:   0001 0036 
  %i4-7: ffe87418 ffe87648 fff75591 ffd0bf54 

  Floating Point Registers:
  %f00:    
  %f08:    
  %f16:    
  %f24:    
  %f32:    
  %f40:    
  %f48:    
  %f56:    
  pstate: 0414 ccr: 99 (icc: N--C xcc: N--C) asi: 00 tl: 5 pil: 0
  cansave: 5 canrestore: 1 otherwin: 0 wstate: 0 cleanwin: 6 cwp: 3
  fsr:  y:  fprs: 
  Aborted (core dumped)

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/965133/+subscriptions



Re: [Qemu-devel] [RFC 0/7] tcg: parallel code generation (Work in Progress)

2017-06-30 Thread Emilio G. Cota
On Fri, Jun 30, 2017 at 01:25:54 -0700, Richard Henderson wrote:
> On 06/29/2017 01:28 PM, Emilio G. Cota wrote:
> >- Patches 2-3 remove *tbs[] to use a binary search tree instead.
> >   This removes the assumption in tb_find_pc that *tbs[] are ordered
> >   by tc_ptr, thereby allowing us to generate code regardless of
> >   its location on the host (as we do after patch 6).
> 
> Have you considered a scheme by which the front end translation and tcg
> optimization are done outside the lock, but final code generation is done
> inside the lock?
> 
> It would put at least half of the translation time in the parallel space
> without requiring changes to code_buffer allocation.

I don't think that would save much, because the performance issue comes
from the fact that we have to grab the lock, regardless of how long we hold
it. So even if we did nothing inside the lock, scalability when
translating a lot of code (e.g. booting) would still be quite bad.

So we either get rid of the lock altogether, or use a more scalable lock.
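
To make the point concrete, here is a toy illustration (not QEMU code; all
names are made up) of why even a short critical section taken once per
translated block caps scalability, while per-thread buffers need no lock at
all on the fast path:

#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Shared-buffer variant: every "translation" takes the same global lock,
 * so all vCPU threads serialize on it no matter how briefly it is held. */
static pthread_mutex_t tb_lock = PTHREAD_MUTEX_INITIALIZER;
static uint8_t code_buffer[1 << 20];
static size_t code_buffer_used;

static void *alloc_tb_shared(size_t size)
{
    pthread_mutex_lock(&tb_lock);
    void *ret = &code_buffer[code_buffer_used];
    code_buffer_used += size;
    pthread_mutex_unlock(&tb_lock);
    return ret;
}

/* Per-thread variant: no lock on the fast path, scales with thread count. */
static __thread uint8_t thread_code_buffer[1 << 20];
static __thread size_t thread_code_buffer_used;

static void *alloc_tb_per_thread(size_t size)
{
    void *ret = &thread_code_buffer[thread_code_buffer_used];
    thread_code_buffer_used += size;
    return ret;
}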

E.



Re: [Qemu-devel] [RFC 6/7] [XXX] tcg: make TCGContext thread-local for softmmu

2017-06-30 Thread Emilio G. Cota
On Fri, Jun 30, 2017 at 01:18:58 -0700, Richard Henderson wrote:
> On 06/29/2017 01:28 PM, Emilio G. Cota wrote:
> >This will allow us to generate TCG code in parallel.
> >
> >User-mode is kept out of this: contention due to concurrent translation
> >is more commonly found in full-system mode (e.g. booting a many-core guest).
> >
> >XXX: For now, only convert arm/a64, since these are the only guests that
> >have proper MTTCG support.
> 
> Not true.  TCG_GUEST_DEFAULT_MO suggests we have 4.

Yes, sorry that was outdated. I'm testing arm/a64 because I can boot
many cores on a64.

> >XXX: This is calling prologue_init once per vCPU, i.e. each TCGContext
> >  gets a different prologue/epilogue (all of them with the same
> >  contents though). Far from ideal, but for an experiment it
> >  "should" work, right?
> 
> Um... sure.  But surely there's a better way.  Perhaps copying the main
> thread's tcg_ctx?  That'd be after prologue, and target globals are created.
> That would avoid the need to make all the target cpu_* globals be
> thread-local, for instance.
> 
> Of course, now that I think of it, this implies that my tcg patches to use
> pointers instead of indicies is not the Way Forward...

Didn't consider this, but after looking at the code (sans your patchset)
I don't see why this wouldn't work -- would be great to avoid touching
target code, as you point out below.
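
For what it's worth, a very rough sketch of that idea (all names hypothetical,
not the actual TCG API): the main thread fully initializes one template
context, and each vCPU thread copies it into its private context once, then
repoints only the fields that must be per-thread.

#include <string.h>

typedef struct SketchTCGContext {
    void *prologue_ptr;     /* generated once by the main thread, shared   */
    void *code_gen_ptr;     /* must point into a per-thread code buffer    */
    /* ... temps, labels, op buffers, target globals ...                   */
} SketchTCGContext;

static SketchTCGContext template_ctx;           /* set up in the main thread */
static __thread SketchTCGContext thread_ctx;    /* one per vCPU thread       */

static void thread_ctx_init(void *per_thread_code_buf)
{
    /* Inherit everything that is identical across threads...              */
    memcpy(&thread_ctx, &template_ctx, sizeof(thread_ctx));
    /* ...then redirect only what must be private to this thread.          */
    thread_ctx.code_gen_ptr = per_thread_code_buf;
}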

> >  /* Share the TCG temporaries common between 32 and 64 bit modes.  */
> >-extern TCGv_env cpu_env;
> >-extern TCGv_i32 cpu_NF, cpu_ZF, cpu_CF, cpu_VF;
> >-extern TCGv_i64 cpu_exclusive_addr;
> >-extern TCGv_i64 cpu_exclusive_val;
> >+extern TCG_THREAD TCGv_env cpu_env;
> >+extern TCG_THREAD TCGv_i32 cpu_NF, cpu_ZF, cpu_CF, cpu_VF;
> >+extern TCG_THREAD TCGv_i64 cpu_exclusive_addr;
> >+extern TCG_THREAD TCGv_i64 cpu_exclusive_val;
> 
> It would be Really Good if we could avoid this in the end.
> I see that it's the quickest way to test for now though.
> 
> >@@ -887,7 +893,7 @@ typedef struct TCGOpDef {
> >  #endif
> >  } TCGOpDef;
> >-extern TCGOpDef tcg_op_defs[];
> >+extern TCG_THREAD TCGOpDef tcg_op_defs[];
> 
> Why?  It's constant after startup.

I was just going for a mass conversion first to see whether this
could ever work. Will refine it.

Thanks,

E.



[Qemu-devel] [PATCH] fixup: missed some TLS variables

2017-06-30 Thread Emilio G. Cota
On Thu, Jun 29, 2017 at 16:28:28 -0400, Emilio G. Cota wrote:
> XXX: After allowing tb_gen_code to run in parallel (see next patch),
>  crashes due to races in TCG code are found very quickly with -smp > 1
>  (e.g. "tcg/tcg.c:233: tcg_out_label: Assertion `!l->has_value' failed.")
>  Note that with -smp 1 it works fine; with smp > 1 I can make it
>  fail later with "taskset -c 0", so clearly there is a race going on.

Fixed! I missed a few __thread's -- see below.

I found the missing ones by moving tb_lock around and using
'valgrind --tool=drd' (it's not ThreadSanitizer, but it doesn't
choke on our coroutines and it's reasonably fast).

Preliminary tests show that even with patch 6 we gain almost nothing
when booting 64 cores (on a 64-core host), which is what I expected
(we still have to acquire tb_lock on each translation, even if only
for a few instructions, which is a scalability killer). But I'd
worry about optimisations later; for now I'll focus on cleaning up
the patchset, so my next steps are:

- Apply Richard's feedback so far
- Consolidate TLS variables into TCGContext, so that we have as few TLS
  variables as possible (a rough sketch of this follows below).
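
Rough sketch of what that consolidation would look like (names hypothetical,
not the real TCG types): rather than one __thread variable per file, a single
per-thread state structure owns all of them.

/* Before: one TLS slot per variable, scattered across tcg/ files, e.g.
 *     static __thread struct tcg_temp_info temps[TCG_MAX_TEMPS];
 *     static __thread TCGLabel *exitreq_label;
 * After: a single per-thread structure that owns all of that state. */
typedef struct SketchThreadTCGState {
    struct { long val; int is_const; } temps[512]; /* stand-in for tcg_temp_info[] */
    void *exitreq_label;                           /* stand-in for the TCG label   */
    /* ... every other formerly file-scope piece of translation state ...          */
} SketchThreadTCGState;

static __thread SketchThreadTCGState tcg_thread_state;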

Cheers,

E.

--- 8< 

Signed-off-by: Emilio G. Cota 
---
 include/exec/gen-icount.h | 2 +-
 tcg/optimize.c| 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/include/exec/gen-icount.h b/include/exec/gen-icount.h
index 62d462e..2e2bc7b 100644
--- a/include/exec/gen-icount.h
+++ b/include/exec/gen-icount.h
@@ -6,7 +6,7 @@
 /* Helpers for instruction counting code generation.  */
 
 static int icount_start_insn_idx;
-static TCGLabel *exitreq_label;
+static TCG_THREAD TCGLabel *exitreq_label;
 
 static inline void gen_tb_start(TranslationBlock *tb)
 {
diff --git a/tcg/optimize.c b/tcg/optimize.c
index adfc56c..71af19b 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -40,8 +40,8 @@ struct tcg_temp_info {
 tcg_target_ulong mask;
 };
 
-static struct tcg_temp_info temps[TCG_MAX_TEMPS];
-static TCGTempSet temps_used;
+static TCG_THREAD struct tcg_temp_info temps[TCG_MAX_TEMPS];
+static TCG_THREAD TCGTempSet temps_used;
 
 static inline bool temp_is_const(TCGArg arg)
 {
-- 
2.7.4




Re: [Qemu-devel] [PATCH 4/4] xen: don't use xenstore to save/restore physmap anymore

2017-06-30 Thread Stefano Stabellini
On Fri, 30 Jun 2017, Igor Druzhinin wrote:
> If we have a system with xenforeignmemory_map2() implemented
> we don't need to save/restore physmap on suspend/restore
> anymore. In case we resume a VM without physmap - try to
> recreate the physmap during memory region restore phase and
> remap map cache entries accordingly. The old code is left
> for compatibility reasons.
> 
> Signed-off-by: Igor Druzhinin 
> ---
>  hw/i386/xen/xen-hvm.c   | 45 
> ++---
>  include/hw/xen/xen_common.h |  1 +
>  2 files changed, 35 insertions(+), 11 deletions(-)
> 
> diff --git a/hw/i386/xen/xen-hvm.c b/hw/i386/xen/xen-hvm.c
> index d259cf7..1b6a5ce 100644
> --- a/hw/i386/xen/xen-hvm.c
> +++ b/hw/i386/xen/xen-hvm.c
> @@ -305,6 +305,7 @@ static hwaddr xen_phys_offset_to_gaddr(hwaddr start_addr,
>  return start_addr;
>  }
>  
> +#ifdef XEN_COMPAT_PHYSMAP
>  static int xen_save_physmap(XenIOState *state, XenPhysmap *physmap)
>  {
>  char path[80], value[17];
> @@ -334,6 +335,12 @@ static int xen_save_physmap(XenIOState *state, 
> XenPhysmap *physmap)
>  }
>  return 0;
>  }
> +#else
> +static int xen_save_physmap(XenIOState *state, XenPhysmap *physmap)
> +{
> +return 0;
> +}
> +#endif
>  
>  static int xen_add_to_physmap(XenIOState *state,
>hwaddr start_addr,
> @@ -368,6 +375,26 @@ go_physmap:
>  DPRINTF("mapping vram to %"HWADDR_PRIx" - %"HWADDR_PRIx"\n",
>  start_addr, start_addr + size);
>  
> +mr_name = memory_region_name(mr);
> +
> +physmap = g_malloc(sizeof (XenPhysmap));
> +
> +physmap->start_addr = start_addr;
> +physmap->size = size;
> +physmap->name = mr_name;
> +physmap->phys_offset = phys_offset;
> +
> +QLIST_INSERT_HEAD(&state->physmap, physmap, list);
> +
> +if (runstate_check(RUN_STATE_INMIGRATE)) {
> +/* Now when we have a physmap entry we can remap a dummy mapping and 
> change
> + * it to a real one of guest foreign memory. */
> +uint8_t *p = xen_remap_cache_entry(phys_offset, size);
> +assert(p && p == memory_region_get_ram_ptr(mr));

I would just pass start_addr to xen_remap_cache_entry as an argument. It
would make things easier. With that, I think we should also be able to
#ifdef xen_phys_offset_to_gaddr and the call to phys_offset_to_gaddr in
xen_map_cache_unlocked, right?  It would make things simpler.


> +return 0;
> +}
>  pfn = phys_offset >> TARGET_PAGE_BITS;
>  start_gpfn = start_addr >> TARGET_PAGE_BITS;
>  for (i = 0; i < size >> TARGET_PAGE_BITS; i++) {
> @@ -382,21 +409,11 @@ go_physmap:
>  }
>  }
>  
> -mr_name = memory_region_name(mr);
> -
> -physmap = g_malloc(sizeof (XenPhysmap));
> -
> -physmap->start_addr = start_addr;
> -physmap->size = size;
> -physmap->name = mr_name;
> -physmap->phys_offset = phys_offset;
> -
> -QLIST_INSERT_HEAD(&state->physmap, physmap, list);
> -
>  xc_domain_pin_memory_cacheattr(xen_xc, xen_domid,
> start_addr >> TARGET_PAGE_BITS,
> (start_addr + size - 1) >> 
> TARGET_PAGE_BITS,
> XEN_DOMCTL_MEM_CACHEATTR_WB);
> +

Spurious change


>  return xen_save_physmap(state, physmap);
>  }
>  
> @@ -1158,6 +1175,7 @@ static void xen_exit_notifier(Notifier *n, void *data)
>  xs_daemon_close(state->xenstore);
>  }
>  
> +#ifdef XEN_COMPAT_PHYSMAP
>  static void xen_read_physmap(XenIOState *state)
>  {
>  XenPhysmap *physmap = NULL;
> @@ -1205,6 +1223,11 @@ static void xen_read_physmap(XenIOState *state)
>  }
>  free(entries);
>  }
> +#else
> +static void xen_read_physmap(XenIOState *state)
> +{
> +}
> +#endif
>  
>  static void xen_wakeup_notifier(Notifier *notifier, void *data)
>  {
> diff --git a/include/hw/xen/xen_common.h b/include/hw/xen/xen_common.h
> index 70a5cad..c04c5c9 100644
> --- a/include/hw/xen/xen_common.h
> +++ b/include/hw/xen/xen_common.h
> @@ -80,6 +80,7 @@ extern xenforeignmemory_handle *xen_fmem;
>  
>  #if CONFIG_XEN_CTRL_INTERFACE_VERSION < 41000
>  
> +#define XEN_COMPAT_PHYSMAP
>  #define xenforeignmemory_map2(h, d, a, p, f, ps, ar, e) \
>  xenforeignmemory_map(h, d, p, ps, ar, e)
>  
> -- 
> 2.7.4
> 



Re: [Qemu-devel] [PATCH 3/4] xen/mapcache: introduce xen_remap_cache_entry()

2017-06-30 Thread Stefano Stabellini
On Fri, 30 Jun 2017, Igor Druzhinin wrote:
> This new call is trying to update a requested map cache entry
> according to the changes in the physmap. The call is searching
> for the entry, unmaps it, tries to translate the address and
> maps again at the same place. If the mapping is dummy this call
> will make it real.
> 
> This function makes use of a new xenforeignmemory_map2() call
> with extended interface that was recently introduced in
> libxenforeignmemory [1].
> 
> [1] https://www.mail-archive.com/xen-devel@lists.xen.org/msg113007.html
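
As a generic illustration of the "maps again at the same place" step (this is
not the QEMU/Xen code; the real implementation uses xenforeignmemory_map2()
with a vaddr hint), replacing an anonymous placeholder mapping in place can be
done by handing the old address back to mmap() with MAP_FIXED:

#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>

/* Map real backing memory (here: a plain file descriptor) over an existing
 * placeholder region, so callers holding the old pointer keep working. */
static void *replace_placeholder(void *placeholder, size_t size, int fd)
{
    void *p = mmap(placeholder, size, PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_FIXED, fd, 0);
    if (p == MAP_FAILED) {
        perror("mmap");
        exit(EXIT_FAILURE);
    }
    return p;   /* equals 'placeholder' on success */
}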
> 
> Signed-off-by: Igor Druzhinin 
> ---
>  configure |  18 
>  hw/i386/xen/xen-mapcache.c| 105 
> +++---
>  include/hw/xen/xen_common.h   |   7 +++
>  include/sysemu/xen-mapcache.h |   6 +++
>  4 files changed, 130 insertions(+), 6 deletions(-)
> 
> diff --git a/configure b/configure
> index c571ad1..ad6156b 100755
> --- a/configure
> +++ b/configure
> @@ -2021,6 +2021,24 @@ EOF
>  # Xen unstable
>  elif
>  cat > $TMPC < +#undef XC_WANT_COMPAT_MAP_FOREIGN_API
> +#include 
> +int main(void) {
> +  xenforeignmemory_handle *xfmem;
> +
> +  xfmem = xenforeignmemory_open(0, 0);
> +  xenforeignmemory_map2(xfmem, 0, 0, 0, 0, 0, 0, 0);
> +
> +  return 0;
> +}
> +EOF
> +compile_prog "" "$xen_libs -lxendevicemodel $xen_stable_libs"
> +  then
> +  xen_stable_libs="-lxendevicemodel $xen_stable_libs"
> +  xen_ctrl_version=41000
> +  xen=yes
> +elif
> +cat > $TMPC <  #undef XC_WANT_COMPAT_DEVICEMODEL_API
>  #define __XEN_TOOLS__
>  #include 
> diff --git a/hw/i386/xen/xen-mapcache.c b/hw/i386/xen/xen-mapcache.c
> index 05050de..5d8d990 100644
> --- a/hw/i386/xen/xen-mapcache.c
> +++ b/hw/i386/xen/xen-mapcache.c
> @@ -149,6 +149,7 @@ void xen_map_cache_init(phys_offset_to_gaddr_t f, void 
> *opaque)
>  }
>  
>  static void xen_remap_bucket(MapCacheEntry *entry,
> + void *vaddr,
>   hwaddr size,
>   hwaddr address_index,
>   bool dummy)
> @@ -179,11 +180,11 @@ static void xen_remap_bucket(MapCacheEntry *entry,
>  }
>  
>  if (!dummy) {
> -vaddr_base = xenforeignmemory_map(xen_fmem, xen_domid,
> -   PROT_READ|PROT_WRITE,
> +vaddr_base = xenforeignmemory_map2(xen_fmem, xen_domid, vaddr,
> +   PROT_READ|PROT_WRITE, 0,
> nb_pfn, pfns, err);
>  if (vaddr_base == NULL) {
> -perror("xenforeignmemory_map");
> +perror("xenforeignmemory_map2");
>  exit(-1);
>  }
>  } else {
> @@ -191,7 +192,7 @@ static void xen_remap_bucket(MapCacheEntry *entry,
>   * We create dummy mappings where we are unable to create a foreign
>   * mapping immediately due to certain circumstances (i.e. on resume 
> now)
>   */
> -vaddr_base = mmap(NULL, size, PROT_READ|PROT_WRITE,
> +vaddr_base = mmap(vaddr, size, PROT_READ|PROT_WRITE,
>MAP_ANON|MAP_SHARED, -1, 0);
>  if (vaddr_base == NULL) {
>  perror("mmap");
> @@ -278,14 +279,14 @@ tryagain:
>  if (!entry) {
>  entry = g_malloc0(sizeof (MapCacheEntry));
>  pentry->next = entry;
> -xen_remap_bucket(entry, cache_size, address_index, dummy);
> +xen_remap_bucket(entry, NULL, cache_size, address_index, dummy);
>  } else if (!entry->lock) {
>  if (!entry->vaddr_base || entry->paddr_index != address_index ||
>  entry->size != cache_size ||
>  !test_bits(address_offset >> XC_PAGE_SHIFT,
>  test_bit_size >> XC_PAGE_SHIFT,
>  entry->valid_mapping)) {
> -xen_remap_bucket(entry, cache_size, address_index, dummy);
> +xen_remap_bucket(entry, NULL, cache_size, address_index, dummy);
>  }
>  }
>  
> @@ -482,3 +483,95 @@ void xen_invalidate_map_cache(void)
>  
>  mapcache_unlock();
>  }
> +
> +static uint8_t *xen_remap_cache_entry_unlocked(hwaddr phys_addr, hwaddr size)

I think it's best if we use a more descriptive name, such as
xen_replace_dummy_entry to avoid confusion.


> +{
> +MapCacheEntry *entry, *pentry = NULL;
> +hwaddr address_index;
> +hwaddr address_offset;
> +hwaddr cache_size = size;
> +hwaddr test_bit_size;
> +void *vaddr = NULL;
> +uint8_t lock;
> +
> +address_index  = phys_addr >> MCACHE_BUCKET_SHIFT;
> +address_offset = phys_addr & (MCACHE_BUCKET_SIZE - 1);
> +
> +/* test_bit_size is always a multiple of XC_PAGE_SIZE */
> +if (size) {

There is no need to make xen_remap_cache_entry_unlocked generic: it's
only used with explicitly sized mappings, right? We could assert(!size).


> +test_bit_size = size + 

Re: [Qemu-devel] [PATCH 2/4] xen/mapcache: add an ability to create dummy mappings

2017-06-30 Thread Stefano Stabellini
On Fri, 30 Jun 2017, Igor Druzhinin wrote:
> Dummies are simple anonymous mappings that are placed instead
> of regular foreign mappings in certain situations when we need
> to postpone the actual mapping but still have to give a
> memory region to QEMU to play with.
> 
> This is planned to be used for restore on Xen.
> 
> Signed-off-by: Igor Druzhinin 
>
> ---
>  hw/i386/xen/xen-mapcache.c | 36 
>  1 file changed, 28 insertions(+), 8 deletions(-)
> 
> diff --git a/hw/i386/xen/xen-mapcache.c b/hw/i386/xen/xen-mapcache.c
> index e60156c..05050de 100644
> --- a/hw/i386/xen/xen-mapcache.c
> +++ b/hw/i386/xen/xen-mapcache.c
> @@ -150,7 +150,8 @@ void xen_map_cache_init(phys_offset_to_gaddr_t f, void 
> *opaque)
>  
>  static void xen_remap_bucket(MapCacheEntry *entry,
>   hwaddr size,
> - hwaddr address_index)
> + hwaddr address_index,
> + bool dummy)
>  {
>  uint8_t *vaddr_base;
>  xen_pfn_t *pfns;
> @@ -177,11 +178,25 @@ static void xen_remap_bucket(MapCacheEntry *entry,
>  pfns[i] = (address_index << (MCACHE_BUCKET_SHIFT-XC_PAGE_SHIFT)) + i;
>  }
>  
> -vaddr_base = xenforeignmemory_map(xen_fmem, xen_domid, 
> PROT_READ|PROT_WRITE,
> -  nb_pfn, pfns, err);
> -if (vaddr_base == NULL) {
> -perror("xenforeignmemory_map");
> -exit(-1);
> +if (!dummy) {
> +vaddr_base = xenforeignmemory_map(xen_fmem, xen_domid,
> +   PROT_READ|PROT_WRITE,
> +   nb_pfn, pfns, err);
> +if (vaddr_base == NULL) {
> +perror("xenforeignmemory_map");
> +exit(-1);
> +}
> +} else {
> +/*
> + * We create dummy mappings where we are unable to create a foreign
> + * mapping immediately due to certain circumstances (i.e. on resume 
> now)
> + */
> +vaddr_base = mmap(NULL, size, PROT_READ|PROT_WRITE,
> +  MAP_ANON|MAP_SHARED, -1, 0);
> +if (vaddr_base == NULL) {
> +perror("mmap");
> +exit(-1);
> +}

For our sanity in debugging this in the future, I think it's best if we
mark this mapcache entry as "dummy". Since we are at it, we could turn
the lock field of MapCacheEntry into a flag field and #define LOCK as
(1<<0) and DUMMY as (1<<1). Please do that as a separate patch.


>  }
>  
>  entry->vaddr_base = vaddr_base;
> @@ -211,6 +226,7 @@ static uint8_t *xen_map_cache_unlocked(hwaddr phys_addr, 
> hwaddr size,
>  hwaddr cache_size = size;
>  hwaddr test_bit_size;
>  bool translated = false;
> +bool dummy = false;
>  
>  tryagain:
>  address_index  = phys_addr >> MCACHE_BUCKET_SHIFT;
> @@ -262,14 +278,14 @@ tryagain:
>  if (!entry) {
>  entry = g_malloc0(sizeof (MapCacheEntry));
>  pentry->next = entry;
> -xen_remap_bucket(entry, cache_size, address_index);
> +xen_remap_bucket(entry, cache_size, address_index, dummy);
>  } else if (!entry->lock) {
>  if (!entry->vaddr_base || entry->paddr_index != address_index ||
>  entry->size != cache_size ||
>  !test_bits(address_offset >> XC_PAGE_SHIFT,
>  test_bit_size >> XC_PAGE_SHIFT,
>  entry->valid_mapping)) {
> -xen_remap_bucket(entry, cache_size, address_index);
> +xen_remap_bucket(entry, cache_size, address_index, dummy);
>  }
>  }
>  
> @@ -282,6 +298,10 @@ tryagain:
>  translated = true;
>  goto tryagain;
>  }
> +if (!dummy && runstate_check(RUN_STATE_INMIGRATE)) {
> +dummy = true;
> +goto tryagain;
> +}
>  trace_xen_map_cache_return(NULL);
>  return NULL;
>  }
> -- 
> 2.7.4
> 



Re: [Qemu-devel] [PATCH 1/4] xen: move physmap saving into a separate function

2017-06-30 Thread Stefano Stabellini
On Fri, 30 Jun 2017, Igor Druzhinin wrote:
> Non-functional change.
> 
> Signed-off-by: Igor Druzhinin 

Reviewed-by: Stefano Stabellini 


> ---
>  hw/i386/xen/xen-hvm.c | 57 
> ---
>  1 file changed, 31 insertions(+), 26 deletions(-)
> 
> diff --git a/hw/i386/xen/xen-hvm.c b/hw/i386/xen/xen-hvm.c
> index cffa7e2..d259cf7 100644
> --- a/hw/i386/xen/xen-hvm.c
> +++ b/hw/i386/xen/xen-hvm.c
> @@ -305,6 +305,36 @@ static hwaddr xen_phys_offset_to_gaddr(hwaddr start_addr,
>  return start_addr;
>  }
>  
> +static int xen_save_physmap(XenIOState *state, XenPhysmap *physmap)
> +{
> +char path[80], value[17];
> +
> +snprintf(path, sizeof(path),
> +"/local/domain/0/device-model/%d/physmap/%"PRIx64"/start_addr",
> +xen_domid, (uint64_t)physmap->phys_offset);
> +snprintf(value, sizeof(value), "%"PRIx64, (uint64_t)physmap->start_addr);
> +if (!xs_write(state->xenstore, 0, path, value, strlen(value))) {
> +return -1;
> +}
> +snprintf(path, sizeof(path),
> +"/local/domain/0/device-model/%d/physmap/%"PRIx64"/size",
> +xen_domid, (uint64_t)physmap->phys_offset);
> +snprintf(value, sizeof(value), "%"PRIx64, (uint64_t)physmap->size);
> +if (!xs_write(state->xenstore, 0, path, value, strlen(value))) {
> +return -1;
> +}
> +if (physmap->name) {
> +snprintf(path, sizeof(path),
> +"/local/domain/0/device-model/%d/physmap/%"PRIx64"/name",
> +xen_domid, (uint64_t)physmap->phys_offset);
> +if (!xs_write(state->xenstore, 0, path,
> +  physmap->name, strlen(physmap->name))) {
> +return -1;
> +}
> +}
> +return 0;
> +}
> +
>  static int xen_add_to_physmap(XenIOState *state,
>hwaddr start_addr,
>ram_addr_t size,
> @@ -316,7 +346,6 @@ static int xen_add_to_physmap(XenIOState *state,
>  XenPhysmap *physmap = NULL;
>  hwaddr pfn, start_gpfn;
>  hwaddr phys_offset = memory_region_get_ram_addr(mr);
> -char path[80], value[17];
>  const char *mr_name;
>  
>  if (get_physmapping(state, start_addr, size)) {
> @@ -368,31 +397,7 @@ go_physmap:
> start_addr >> TARGET_PAGE_BITS,
> (start_addr + size - 1) >> 
> TARGET_PAGE_BITS,
> XEN_DOMCTL_MEM_CACHEATTR_WB);
> -
> -snprintf(path, sizeof(path),
> -"/local/domain/0/device-model/%d/physmap/%"PRIx64"/start_addr",
> -xen_domid, (uint64_t)phys_offset);
> -snprintf(value, sizeof(value), "%"PRIx64, (uint64_t)start_addr);
> -if (!xs_write(state->xenstore, 0, path, value, strlen(value))) {
> -return -1;
> -}
> -snprintf(path, sizeof(path),
> -"/local/domain/0/device-model/%d/physmap/%"PRIx64"/size",
> -xen_domid, (uint64_t)phys_offset);
> -snprintf(value, sizeof(value), "%"PRIx64, (uint64_t)size);
> -if (!xs_write(state->xenstore, 0, path, value, strlen(value))) {
> -return -1;
> -}
> -if (mr_name) {
> -snprintf(path, sizeof(path),
> -"/local/domain/0/device-model/%d/physmap/%"PRIx64"/name",
> -xen_domid, (uint64_t)phys_offset);
> -if (!xs_write(state->xenstore, 0, path, mr_name, strlen(mr_name))) {
> -return -1;
> -}
> -}
> -
> -return 0;
> +return xen_save_physmap(state, physmap);
>  }
>  
>  static int xen_remove_from_physmap(XenIOState *state,
> -- 
> 2.7.4
> 



Re: [Qemu-devel] [PATCH] target/ppc: Use tcg_gen_lookup_and_goto_ptr

2017-06-30 Thread Emilio G. Cota
On Fri, Jun 30, 2017 at 11:37:36 -0700, Richard Henderson wrote:
> Cc: qemu-...@nongnu.org
> Signed-off-by: Richard Henderson 
> ---
>  target/ppc/translate.c | 23 ---
>  1 file changed, 8 insertions(+), 15 deletions(-)

Tested-by: Emilio G. Cota 

Just measured the speedup for linux-user:

 SPECint06 (test set), ppc64le-linux-user. Host: IBM POWER8 @ 3.42 GHz

  [ASCII bar chart omitted: per-benchmark speedup for the SPECint06 test set
   (400.perlbench, 401.bzip2, 403.gcc, 429.mcf, 445.gobmk, 456.hmmer,
   458.sjeng, 462.libquantum, 464.h264ref, 471.omnetpp, 473.astar,
   483.xalancbmk) plus the geometric mean; see the png link below.]

png: http://imgur.com/a/twj6V

Thanks,

Emilio

PS. If you have spec06, you can very easily generate these plots. See
  https://github.com/cota/runspec-simple/commit/14211b927



Re: [Qemu-devel] [Qemu-block] [PATCH v2] tests: Avoid non-portable 'echo -ARG'

2017-06-30 Thread Jeff Cody
On Fri, Jun 30, 2017 at 02:58:31PM -0500, Eric Blake wrote:
> POSIX says that backslashes in the arguments to 'echo', as well as
> any use of 'echo -n' and 'echo -e', are non-portable; it recommends
> people should favor 'printf' instead.  This is definitely true where
> we do not control which shell is running (such as in makefile snippets
> or in documentation examples).  But even for scripts where we
> require bash (and therefore, where echo does what we want by default),
> it is still possible to use 'shopt -s xpg_echo' to change bash's
> behavior of echo.  And setting a good example never hurts when we are
> not sure if a snippet will be copied from a bash-only script to a
> general shell script (although I don't change the use of non-portable
> \e for ESC when we know the running shell is bash).
> 
> Replace 'echo -n "..."' with 'printf %s "..."', and 'echo -e "..."'
> with 'printf %b "...\n"', with the optimization that the %s/%b
> argument can be omitted if the string being printed contains no
> substitutions that could result in '%' (trivial if there is no
> $ or `, but also possible when using things like $ret that are
> known to be numeric given the assignment to ret just above).
> 
> In the qemu-iotests check script, fix unusual shell quoting
> that would result in word-splitting if 'date' outputs a space.
> 
> In test 051, take an opportunity to shorten the line.
> 
> In test 068, get rid of a pointless second invocation of bash.
> 
> CC: qemu-triv...@nongnu.org
> Signed-off-by: Eric Blake 
> 
> ---
> v2: be robust to potential % in substitutions [Max], rebase to master,
> shorten some long lines and an odd bash -c use
> 
> Add qemu-trivial in cc, although we may decide this is better through
> Max's block tree since it is mostly iotests related
> 
> ---
>  qemu-options.hx |  4 ++--
>  tests/multiboot/run_test.sh | 10 +-
>  tests/qemu-iotests/051  |  7 ---
>  tests/qemu-iotests/068  |  2 +-
>  tests/qemu-iotests/142  | 48 
> ++---
>  tests/qemu-iotests/171  | 14 ++---
>  tests/qemu-iotests/check| 18 -
>  tests/rocker/all| 10 +-
>  tests/tcg/cris/Makefile |  8 
>  9 files changed, 61 insertions(+), 60 deletions(-)
> 
> diff --git a/qemu-options.hx b/qemu-options.hx
> index 297bd8a..05a0474 100644
> --- a/qemu-options.hx
> +++ b/qemu-options.hx
> @@ -4363,7 +4363,7 @@ The simplest (insecure) usage is to provide the secret 
> inline
> 
>  The simplest secure usage is to provide the secret via a file
> 
> - # echo -n "letmein" > mypasswd.txt
> + # printf "letmein" > mypasswd.txt
>   # $QEMU -object secret,id=sec0,file=mypasswd.txt,format=raw
> 
>  For greater security, AES-256-CBC should be used. To illustrate usage,
> @@ -4391,7 +4391,7 @@ telling openssl to base64 encode the result, but it 
> could be left
>  as raw bytes if desired.
> 
>  @example
> - # SECRET=$(echo -n "letmein" |
> + # SECRET=$(printf "letmein" |
>  openssl enc -aes-256-cbc -a -K $KEY -iv $IV)
>  @end example
> 
> diff --git a/tests/multiboot/run_test.sh b/tests/multiboot/run_test.sh
> index 78d7edf..9a1cd59 100755
> --- a/tests/multiboot/run_test.sh
> +++ b/tests/multiboot/run_test.sh
> @@ -26,7 +26,7 @@ run_qemu() {
>  local kernel=$1
>  shift
> 
> -echo -e "\n\n=== Running test case: $kernel $@ ===\n" >> test.log
> +printf %b "\n\n=== Running test case: $kernel $@ ===\n\n" >> test.log
> 
>  $QEMU \
>  -kernel $kernel \
> @@ -68,21 +68,21 @@ for t in mmap modules; do
>  pass=1
> 
>  if [ $debugexit != 1 ]; then
> -echo -e "\e[31m ?? \e[0m $t (no debugexit used, exit code $ret)"
> +printf "\e[31m ?? \e[0m $t (no debugexit used, exit code $ret)\n"
>  pass=0
>  elif [ $ret != 0 ]; then
> -echo -e "\e[31mFAIL\e[0m $t (exit code $ret)"
> +printf "\e[31mFAIL\e[0m $t (exit code $ret)\n"
>  pass=0
>  fi
> 
>  if ! diff $t.out test.log > /dev/null 2>&1; then
> -echo -e "\e[31mFAIL\e[0m $t (output difference)"
> +printf "\e[31mFAIL\e[0m $t (output difference)\n"
>  diff -u $t.out test.log
>  pass=0
>  fi
> 
>  if [ $pass == 1 ]; then
> -echo -e "\e[32mPASS\e[0m $t"
> +printf "\e[32mPASS\e[0m $t\n"
>  fi
> 
>  done
> diff --git a/tests/qemu-iotests/051 b/tests/qemu-iotests/051
> index 26c29de..c3c08bf 100755
> --- a/tests/qemu-iotests/051
> +++ b/tests/qemu-iotests/051
> @@ -217,7 +217,7 @@ run_qemu -drive driver=null-co,cache=invalid_value
>  # Test 142 checks the direct=on cases
> 
>  for cache in writeback writethrough unsafe invalid_value; do
> -echo -e "info block\ninfo block file\ninfo block backing\ninfo block 
> backing-file" | \
> +printf "info block %s\n" '' file backing backing-file | \
>  run_qemu -drive 
> 

Re: [Qemu-devel] [Qemu-block] [PATCH v4 1/2] iotests: Use absolute paths for executables

2017-06-30 Thread Jeff Cody
On Fri, Jun 30, 2017 at 02:45:46PM -0500, Eric Blake wrote:
> On 06/30/2017 02:41 PM, Eric Blake wrote:
> 
> > +++ 068.out.bad 2017-06-30 14:35:28.720241398 -0500
> > @@ -1,4 +1,5 @@
> >  QA output created by 068
> > +realpath: '': No such file or directory
> > 
> > The culprit? $QEMU_VXHS_PROG is empty for me, which means `set_prog_path
> > qnio_server` found nothing to use.  You'll have to add in a safety valve
> > that only calls 'type' if operating on a non-empty path in the first place.
> 
> I'm using this locally, in the meantime:
> 
> diff --git i/tests/qemu-iotests/common.config
> w/tests/qemu-iotests/common.config
> index c1dc425..6f97331 100644
> --- i/tests/qemu-iotests/common.config
> +++ w/tests/qemu-iotests/common.config
> @@ -107,7 +107,9 @@ export QEMU_PROG=$(realpath -- "$(type -p
> "$QEMU_PROG")")
>  export QEMU_IMG_PROG=$(realpath -- "$(type -p "$QEMU_IMG_PROG")")
>  export QEMU_IO_PROG=$(realpath -- "$(type -p "$QEMU_IO_PROG")")
>  export QEMU_NBD_PROG=$(realpath -- "$(type -p "$QEMU_NBD_PROG")")
> -export QEMU_VXHS_PROG=$(realpath -- "$(type -p "$QEMU_VXHS_PROG")")
> +if [ -n "$QEMU_VXHS_PROG" ]; then
> +export QEMU_VXHS_PROG=$(realpath -- "$(type -p "$QEMU_VXHS_PROG")")
> +fi
> 
>  _qemu_wrapper()
>  {
> 
> 
> Is qnio_server easily available for Fedora?
>

Depends on your definition of "easily".  It needs to be built from the
upstream project's GitHub repository [1].  Most developers will likely not
have it installed.

[1] https://github.com/VeritasHyperScale/libqnio



Re: [Qemu-devel] [PATCH v3 20/20] block: Make bdrv_is_allocated_above() byte-based

2017-06-30 Thread Jeff Cody
On Tue, Jun 27, 2017 at 02:24:58PM -0500, Eric Blake wrote:
> We are gradually moving away from sector-based interfaces, towards
> byte-based.  In the common case, allocation is unlikely to ever use
> values that are not naturally sector-aligned, but it is possible
> that byte-based values will let us be more precise about allocation
> at the end of an unaligned file that can do byte-based access.
> 
> Changing the signature of the function to use int64_t *pnum ensures
> that the compiler enforces that all callers are updated.  For now,
> the io.c layer still assert()s that all callers are sector-aligned,
> but that can be relaxed when a later patch implements byte-based
> block status.  Therefore, for the most part this patch is just the
> addition of scaling at the callers followed by inverse scaling at
> bdrv_is_allocated().  But some code, particularly stream_run(),
> gets a lot simpler because it no longer has to mess with sectors.
> 
> For ease of review, bdrv_is_allocated() was tackled separately.
> 
> Signed-off-by: Eric Blake 
> Reviewed-by: John Snow 
> 

Reviewed-by: Jeff Cody 
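
As a small aside on the "compiler enforces that all callers are updated"
point: switching the out-parameter from int * to int64_t * means a stale
caller that still passes the address of an int fails to compile instead of
silently truncating. A minimal standalone illustration (hypothetical names,
not the real prototypes):

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* new byte-based signature */
static int is_allocated(int64_t offset, int64_t bytes, int64_t *pnum)
{
    *pnum = bytes;              /* pretend the whole range is allocated */
    return 1;
}

int main(void)
{
    int64_t pnum;               /* an old 'int pnum' would no longer compile
                                 * once its address is passed below        */
    int ret = is_allocated(0, 64 * 1024, &pnum);
    printf("ret=%d pnum=%" PRId64 "\n", ret, pnum);
    return 0;
}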

> ---
> v2: tweak function comments, favor bdrv_getlength() over ->total_sectors
> ---
>  include/block/block.h |  2 +-
>  block/commit.c| 20 
>  block/io.c| 42 --
>  block/mirror.c|  5 -
>  block/replication.c   | 17 -
>  block/stream.c| 21 +
>  qemu-img.c| 10 +++---
>  7 files changed, 61 insertions(+), 56 deletions(-)
> 
> diff --git a/include/block/block.h b/include/block/block.h
> index 9b9d87b..13022d5 100644
> --- a/include/block/block.h
> +++ b/include/block/block.h
> @@ -428,7 +428,7 @@ int64_t bdrv_get_block_status_above(BlockDriverState *bs,
>  int bdrv_is_allocated(BlockDriverState *bs, int64_t offset, int64_t bytes,
>int64_t *pnum);
>  int bdrv_is_allocated_above(BlockDriverState *top, BlockDriverState *base,
> -int64_t sector_num, int nb_sectors, int *pnum);
> +int64_t offset, int64_t bytes, int64_t *pnum);
> 
>  bool bdrv_is_read_only(BlockDriverState *bs);
>  bool bdrv_is_writable(BlockDriverState *bs);
> diff --git a/block/commit.c b/block/commit.c
> index 241aa95..774a8a5 100644
> --- a/block/commit.c
> +++ b/block/commit.c
> @@ -146,7 +146,7 @@ static void coroutine_fn commit_run(void *opaque)
>  int64_t offset;
>  uint64_t delay_ns = 0;
>  int ret = 0;
> -int n = 0; /* sectors */
> +int64_t n = 0; /* bytes */
>  void *buf = NULL;
>  int bytes_written = 0;
>  int64_t base_len;
> @@ -171,7 +171,7 @@ static void coroutine_fn commit_run(void *opaque)
> 
>  buf = blk_blockalign(s->top, COMMIT_BUFFER_SIZE);
> 
> -for (offset = 0; offset < s->common.len; offset += n * BDRV_SECTOR_SIZE) 
> {
> +for (offset = 0; offset < s->common.len; offset += n) {
>  bool copy;
> 
>  /* Note that even when no rate limit is applied we need to yield
> @@ -183,15 +183,12 @@ static void coroutine_fn commit_run(void *opaque)
>  }
>  /* Copy if allocated above the base */
>  ret = bdrv_is_allocated_above(blk_bs(s->top), blk_bs(s->base),
> -  offset / BDRV_SECTOR_SIZE,
> -  COMMIT_BUFFER_SIZE / BDRV_SECTOR_SIZE,
> -  &n);
> +  offset, COMMIT_BUFFER_SIZE, &n);
>  copy = (ret == 1);
> -trace_commit_one_iteration(s, offset, n * BDRV_SECTOR_SIZE, ret);
> +trace_commit_one_iteration(s, offset, n, ret);
>  if (copy) {
> -ret = commit_populate(s->top, s->base, offset,
> -  n * BDRV_SECTOR_SIZE, buf);
> -bytes_written += n * BDRV_SECTOR_SIZE;
> +ret = commit_populate(s->top, s->base, offset, n, buf);
> +bytes_written += n;
>  }
>  if (ret < 0) {
>  BlockErrorAction action =
> @@ -204,11 +201,10 @@ static void coroutine_fn commit_run(void *opaque)
>  }
>  }
>  /* Publish progress */
> -s->common.offset += n * BDRV_SECTOR_SIZE;
> +s->common.offset += n;
> 
>  if (copy && s->common.speed) {
> -delay_ns = ratelimit_calculate_delay(&s->limit,
> - n * BDRV_SECTOR_SIZE);
> +delay_ns = ratelimit_calculate_delay(&s->limit, n);
>  }
>  }
> 
> diff --git a/block/io.c b/block/io.c
> index 5bbf153..061a162 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -1903,54 +1903,52 @@ int coroutine_fn bdrv_is_allocated(BlockDriverState 
> *bs, int64_t offset,
>  /*
>   * Given an image chain: ... -> [BASE] -> [INTER1] -> [INTER2] -> [TOP]
>   *
> - * Return true if the given sector is 

Re: [Qemu-devel] [Qemu-block] [PATCH v3 19/20] block: Minimize raw use of bds->total_sectors

2017-06-30 Thread Jeff Cody
On Tue, Jun 27, 2017 at 02:24:57PM -0500, Eric Blake wrote:
> bdrv_is_allocated_above() was relying on intermediate->total_sectors,
> which is a field that can have stale contents depending on the value
> of intermediate->has_variable_length.  An audit shows that we are safe
> (we were first calling through bdrv_co_get_block_status() which in
> turn calls bdrv_nb_sectors() and therefore just refreshed the current
> length), but it's nicer to favor our accessor functions to avoid having
> to repeat such an audit, even if it means refresh_total_sectors() is
> called more frequently.
> 
> Suggested-by: John Snow 
> Signed-off-by: Eric Blake 
> 

Reviewed-by: Jeff Cody 

> ---
> v2: new patch
> ---
>  block/io.c | 12 +++-
>  1 file changed, 7 insertions(+), 5 deletions(-)
> 
> diff --git a/block/io.c b/block/io.c
> index 0545180..5bbf153 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -1924,6 +1924,7 @@ int bdrv_is_allocated_above(BlockDriverState *top,
>  intermediate = top;
>  while (intermediate && intermediate != base) {
>  int64_t pnum_inter;
> +int64_t size_inter;
>  int psectors_inter;
> 
>  ret = bdrv_is_allocated(intermediate, sector_num * BDRV_SECTOR_SIZE,
> @@ -1941,13 +1942,14 @@ int bdrv_is_allocated_above(BlockDriverState *top,
> 
>  /*
>   * [sector_num, nb_sectors] is unallocated on top but intermediate
> - * might have
> - *
> - * [sector_num+x, nr_sectors] allocated.
> + * might have [sector_num+x, nb_sectors-x] allocated.
>   */
> +size_inter = bdrv_nb_sectors(intermediate);
> +if (size_inter < 0) {
> +return size_inter;
> +}
>  if (n > psectors_inter &&
> -(intermediate == top ||
> - sector_num + psectors_inter < intermediate->total_sectors)) {
> +(intermediate == top || sector_num + psectors_inter < 
> size_inter)) {
>  n = psectors_inter;
>  }
> 
> -- 
> 2.9.4
> 
> 



Re: [Qemu-devel] [PATCH v3 18/20] block: Make bdrv_is_allocated() byte-based

2017-06-30 Thread Jeff Cody
On Tue, Jun 27, 2017 at 02:24:56PM -0500, Eric Blake wrote:
> We are gradually moving away from sector-based interfaces, towards
> byte-based.  In the common case, allocation is unlikely to ever use
> values that are not naturally sector-aligned, but it is possible
> that byte-based values will let us be more precise about allocation
> at the end of an unaligned file that can do byte-based access.
> 
> Changing the signature of the function to use int64_t *pnum ensures
> that the compiler enforces that all callers are updated.  For now,
> the io.c layer still assert()s that all callers are sector-aligned
> on input and that *pnum is sector-aligned on return to the caller,
> but that can be relaxed when a later patch implements byte-based
> block status.  Therefore, this code adds usages like
> DIV_ROUND_UP(,BDRV_SECTOR_SIZE) to callers that still want aligned
> values, where the call might reasonbly give non-aligned results
> in the future; on the other hand, no rounding is needed for callers
> that should just continue to work with byte alignment.
> 
> For the most part this patch is just the addition of scaling at the
> callers followed by inverse scaling at bdrv_is_allocated().  But
> some code, particularly bdrv_commit(), gets a lot simpler because it
> no longer has to mess with sectors; also, it is now possible to pass
> NULL if the caller does not care how much of the image is allocated
> beyond the initial offset.
> 
> For ease of review, bdrv_is_allocated_above() will be tackled
> separately.
> 
> Signed-off-by: Eric Blake 
> Reviewed-by: John Snow 
> 

Reviewed-by: Jeff Cody 
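
A minimal sketch of the caller-side scaling this describes (BDRV_SECTOR_SIZE
is 512; everything else here is hypothetical, not the real QEMU prototypes):
sector-based callers multiply on the way in and round up on the way out, so
they keep working while the core interface goes byte-based.

#include <stdint.h>

#define SECTOR_SIZE 512
#define DIV_ROUND_UP(n, d)  (((n) + (d) - 1) / (d))

/* byte-based core interface (stubbed out for the sketch) */
static int is_allocated_bytes(int64_t offset, int64_t bytes, int64_t *pnum)
{
    *pnum = bytes;
    return 1;
}

/* legacy sector-based wrapper */
static int is_allocated_sectors(int64_t sector_num, int nb_sectors, int *pnum)
{
    int64_t pnum_bytes;
    int ret = is_allocated_bytes(sector_num * SECTOR_SIZE,
                                 (int64_t)nb_sectors * SECTOR_SIZE,
                                 &pnum_bytes);
    /* round a possibly unaligned byte count back up to whole sectors */
    *pnum = DIV_ROUND_UP(pnum_bytes, SECTOR_SIZE);
    return ret;
}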

> ---
> v2: rebase to earlier changes, tweak commit message
> ---
>  include/block/block.h |  4 +--
>  block/backup.c| 17 -
>  block/commit.c| 21 +++-
>  block/io.c| 49 +---
>  block/stream.c|  5 ++--
>  block/vvfat.c | 34 ++---
>  migration/block.c |  9 ---
>  qemu-img.c|  5 +++-
>  qemu-io-cmds.c| 70 
> +++
>  9 files changed, 114 insertions(+), 100 deletions(-)
> 
> diff --git a/include/block/block.h b/include/block/block.h
> index 5cdd690..9b9d87b 100644
> --- a/include/block/block.h
> +++ b/include/block/block.h
> @@ -425,8 +425,8 @@ int64_t bdrv_get_block_status_above(BlockDriverState *bs,
>  int64_t sector_num,
>  int nb_sectors, int *pnum,
>  BlockDriverState **file);
> -int bdrv_is_allocated(BlockDriverState *bs, int64_t sector_num, int 
> nb_sectors,
> -  int *pnum);
> +int bdrv_is_allocated(BlockDriverState *bs, int64_t offset, int64_t bytes,
> +  int64_t *pnum);
>  int bdrv_is_allocated_above(BlockDriverState *top, BlockDriverState *base,
>  int64_t sector_num, int nb_sectors, int *pnum);
> 
> diff --git a/block/backup.c b/block/backup.c
> index 04def91..b2048bf 100644
> --- a/block/backup.c
> +++ b/block/backup.c
> @@ -47,12 +47,6 @@ typedef struct BackupBlockJob {
>  QLIST_HEAD(, CowRequest) inflight_reqs;
>  } BackupBlockJob;
> 
> -/* Size of a cluster in sectors, instead of bytes. */
> -static inline int64_t cluster_size_sectors(BackupBlockJob *job)
> -{
> -  return job->cluster_size / BDRV_SECTOR_SIZE;
> -}
> -
>  /* See if in-flight requests overlap and wait for them to complete */
>  static void coroutine_fn wait_for_overlapping_requests(BackupBlockJob *job,
> int64_t offset,
> @@ -433,7 +427,6 @@ static void coroutine_fn backup_run(void *opaque)
>  BackupCompleteData *data;
>  BlockDriverState *bs = blk_bs(job->common.blk);
>  int64_t offset;
> -int64_t sectors_per_cluster = cluster_size_sectors(job);
>  int ret = 0;
> 
>  QLIST_INIT(&job->inflight_reqs);
> @@ -465,12 +458,13 @@ static void coroutine_fn backup_run(void *opaque)
>  }
> 
>  if (job->sync_mode == MIRROR_SYNC_MODE_TOP) {
> -int i, n;
> +int i;
> +int64_t n;
> 
>  /* Check to see if these blocks are already in the
>   * backing file. */
> 
> -for (i = 0; i < sectors_per_cluster;) {
> +for (i = 0; i < job->cluster_size;) {
>  /* bdrv_is_allocated() only returns true/false based
>   * on the first set of sectors it comes across that
>   * are are all in the same state.
> @@ -478,9 +472,8 @@ static void coroutine_fn backup_run(void *opaque)
>   * backup cluster length.  We end up copying more than
>   * needed but at some point that is always the case. */
>  alloced =
> -

Re: [Qemu-devel] [PULL 04/14] migration: let MigrationState be a qdev

2017-06-30 Thread Eric Blake
On 06/30/2017 04:18 PM, Philippe Mathieu-Daudé wrote:
> Hi Peter, Juan,
> 
> On 06/28/2017 08:30 AM, Juan Quintela wrote:
>> From: Peter Xu 
>>
>> Let the old man "MigrationState" join the object family. Direct benefit
>> is that we can start to use all the property features derived from
>> current QDev, like: HW_COMPAT_* bits, command line setup for migration
>> parameters (so will never need to set them up each time using HMP/QMP,
>> this is really, really attractive for test writers), etc.
>>
>> I see no reason to disallow this happen yet. So let's start from this
>> one, to see whether it would be anything good.
>>
>> Now we init the MigrationState struct statically in main() to make sure
>> it's initialized after global properties are applied, since we'll use
>> them during creation of the object.
>>
>> No functional change at all.
>>

> qemu-system-arm: migration/migration.c:127: migrate_get_current:
> Assertion `current_migration' failed.
> 
> I've bisected to this commit using the following script:

Known issue:

https://lists.gnu.org/archive/html/qemu-devel/2017-06/msg06958.html

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [PATCH v3 16/20] backup: Switch backup_do_cow() to byte-based

2017-06-30 Thread Jeff Cody
On Tue, Jun 27, 2017 at 02:24:54PM -0500, Eric Blake wrote:
> We are gradually converting to byte-based interfaces, as they are
> easier to reason about than sector-based.  Convert another internal
> function (no semantic change).
> 
> Signed-off-by: Eric Blake 
> Reviewed-by: John Snow 
> 

Reviewed-by: Jeff Cody 
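
The cluster rounding in the hunk below boils down to aligning the byte range
outward to cluster boundaries; a tiny self-contained sketch (macros inlined
here for illustration; in QEMU they are QEMU_ALIGN_DOWN/QEMU_ALIGN_UP):

#include <stdint.h>
#include <stdio.h>

#define ALIGN_DOWN(n, m) ((n) / (m) * (m))
#define ALIGN_UP(n, m)   ALIGN_DOWN((n) + (m) - 1, (m))

int main(void)
{
    int64_t cluster_size = 64 * 1024;
    int64_t offset = 100000, bytes = 4096;

    /* expand [offset, offset + bytes) outward to whole clusters */
    int64_t start = ALIGN_DOWN(offset, cluster_size);
    int64_t end   = ALIGN_UP(offset + bytes, cluster_size);

    printf("copy range: [%lld, %lld)\n", (long long)start, (long long)end);
    return 0;
}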

> ---
> v2: no change
> ---
>  block/backup.c | 62 
> --
>  1 file changed, 26 insertions(+), 36 deletions(-)
> 
> diff --git a/block/backup.c b/block/backup.c
> index cfbd921..c029d44 100644
> --- a/block/backup.c
> +++ b/block/backup.c
> @@ -91,7 +91,7 @@ static void cow_request_end(CowRequest *req)
>  }
> 
>  static int coroutine_fn backup_do_cow(BackupBlockJob *job,
> -  int64_t sector_num, int nb_sectors,
> +  int64_t offset, uint64_t bytes,
>bool *error_is_read,
>bool is_write_notifier)
>  {
> @@ -101,34 +101,28 @@ static int coroutine_fn backup_do_cow(BackupBlockJob 
> *job,
>  QEMUIOVector bounce_qiov;
>  void *bounce_buffer = NULL;
>  int ret = 0;
> -int64_t sectors_per_cluster = cluster_size_sectors(job);
> -int64_t start, end; /* clusters */
> +int64_t start, end; /* bytes */
>  int n; /* bytes */
> 
>  qemu_co_rwlock_rdlock(&job->flush_rwlock);
> 
> -start = sector_num / sectors_per_cluster;
> -end = DIV_ROUND_UP(sector_num + nb_sectors, sectors_per_cluster);
> +start = QEMU_ALIGN_DOWN(offset, job->cluster_size);
> +end = QEMU_ALIGN_UP(bytes + offset, job->cluster_size);
> 
> -trace_backup_do_cow_enter(job, start * job->cluster_size,
> -  sector_num * BDRV_SECTOR_SIZE,
> -  nb_sectors * BDRV_SECTOR_SIZE);
> +trace_backup_do_cow_enter(job, start, offset, bytes);
> 
> -wait_for_overlapping_requests(job, start * job->cluster_size,
> -  end * job->cluster_size);
> -cow_request_begin(&cow_request, job, start * job->cluster_size,
> -  end * job->cluster_size);
> +wait_for_overlapping_requests(job, start, end);
> +cow_request_begin(&cow_request, job, start, end);
> 
> -for (; start < end; start++) {
> -if (test_bit(start, job->done_bitmap)) {
> -trace_backup_do_cow_skip(job, start * job->cluster_size);
> +for (; start < end; start += job->cluster_size) {
> +if (test_bit(start / job->cluster_size, job->done_bitmap)) {
> +trace_backup_do_cow_skip(job, start);
>  continue; /* already copied */
>  }
> 
> -trace_backup_do_cow_process(job, start * job->cluster_size);
> +trace_backup_do_cow_process(job, start);
> 
> -n = MIN(job->cluster_size,
> -job->common.len - start * job->cluster_size);
> +n = MIN(job->cluster_size, job->common.len - start);
> 
>  if (!bounce_buffer) {
>  bounce_buffer = blk_blockalign(blk, job->cluster_size);
> @@ -137,11 +131,10 @@ static int coroutine_fn backup_do_cow(BackupBlockJob 
> *job,
>  iov.iov_len = n;
>  qemu_iovec_init_external(&bounce_qiov, &iov, 1);
> 
> -ret = blk_co_preadv(blk, start * job->cluster_size,
> -bounce_qiov.size, &bounce_qiov,
> +ret = blk_co_preadv(blk, start, bounce_qiov.size, &bounce_qiov,
>  is_write_notifier ? BDRV_REQ_NO_SERIALISING : 0);
>  if (ret < 0) {
> -trace_backup_do_cow_read_fail(job, start * job->cluster_size, 
> ret);
> +trace_backup_do_cow_read_fail(job, start, ret);
>  if (error_is_read) {
>  *error_is_read = true;
>  }
> @@ -149,22 +142,22 @@ static int coroutine_fn backup_do_cow(BackupBlockJob 
> *job,
>  }
> 
>  if (buffer_is_zero(iov.iov_base, iov.iov_len)) {
> -ret = blk_co_pwrite_zeroes(job->target, start * 
> job->cluster_size,
> +ret = blk_co_pwrite_zeroes(job->target, start,
> bounce_qiov.size, BDRV_REQ_MAY_UNMAP);
>  } else {
> -ret = blk_co_pwritev(job->target, start * job->cluster_size,
> +ret = blk_co_pwritev(job->target, start,
>   bounce_qiov.size, &bounce_qiov,
>   job->compress ? BDRV_REQ_WRITE_COMPRESSED : 
> 0);
>  }
>  if (ret < 0) {
> -trace_backup_do_cow_write_fail(job, start * job->cluster_size, 
> ret);
> +trace_backup_do_cow_write_fail(job, start, ret);
>  if (error_is_read) {
>  *error_is_read = false;
>  }
>  goto out;
>  }
> 
> -set_bit(start, job->done_bitmap);
> +set_bit(start / job->cluster_size, job->done_bitmap);
> 
>  

Re: [Qemu-devel] [PATCH v3 17/20] backup: Switch backup_run() to byte-based

2017-06-30 Thread Jeff Cody
On Tue, Jun 27, 2017 at 02:24:55PM -0500, Eric Blake wrote:
> We are gradually converting to byte-based interfaces, as they are
> easier to reason about than sector-based.  Change the internal
> loop iteration of backups to track by bytes instead of sectors
> (although we are still guaranteed that we iterate by steps that
> are cluster-aligned).
> 
> Signed-off-by: Eric Blake 
> Reviewed-by: John Snow 
> 

Reviewed-by: Jeff Cody 
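
The loop shape this converts to is easy to see in isolation: walk the device
in cluster-sized byte steps and clamp the final, possibly partial chunk (a
standalone sketch, not the patched function itself):

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

#define MIN(a, b) ((a) < (b) ? (a) : (b))

int main(void)
{
    int64_t len = 1000000;          /* stand-in for job->common.len   */
    int64_t cluster_size = 65536;   /* stand-in for job->cluster_size */

    for (int64_t offset = 0; offset < len; offset += cluster_size) {
        int64_t n = MIN(cluster_size, len - offset);
        printf("chunk at %10" PRId64 ", %6" PRId64 " bytes\n", offset, n);
    }
    return 0;
}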

> ---
> v2: no change
> ---
>  block/backup.c | 32 +++-
>  1 file changed, 15 insertions(+), 17 deletions(-)
> 
> diff --git a/block/backup.c b/block/backup.c
> index c029d44..04def91 100644
> --- a/block/backup.c
> +++ b/block/backup.c
> @@ -370,11 +370,10 @@ static int coroutine_fn 
> backup_run_incremental(BackupBlockJob *job)
>  int ret = 0;
>  int clusters_per_iter;
>  uint32_t granularity;
> -int64_t sector;
> +int64_t offset;
>  int64_t cluster;
>  int64_t end;
>  int64_t last_cluster = -1;
> -int64_t sectors_per_cluster = cluster_size_sectors(job);
>  BdrvDirtyBitmapIter *dbi;
> 
>  granularity = bdrv_dirty_bitmap_granularity(job->sync_bitmap);
> @@ -382,8 +381,8 @@ static int coroutine_fn 
> backup_run_incremental(BackupBlockJob *job)
>  dbi = bdrv_dirty_iter_new(job->sync_bitmap, 0);
> 
>  /* Find the next dirty sector(s) */
> -while ((sector = bdrv_dirty_iter_next(dbi)) != -1) {
> -cluster = sector / sectors_per_cluster;
> +while ((offset = bdrv_dirty_iter_next(dbi) * BDRV_SECTOR_SIZE) >= 0) {
> +cluster = offset / job->cluster_size;
> 
>  /* Fake progress updates for any clusters we skipped */
>  if (cluster != last_cluster + 1) {
> @@ -410,7 +409,8 @@ static int coroutine_fn 
> backup_run_incremental(BackupBlockJob *job)
>  /* If the bitmap granularity is smaller than the backup granularity,
>   * we need to advance the iterator pointer to the next cluster. */
>  if (granularity < job->cluster_size) {
> -bdrv_set_dirty_iter(dbi, cluster * sectors_per_cluster);
> +bdrv_set_dirty_iter(dbi,
> +cluster * job->cluster_size / 
> BDRV_SECTOR_SIZE);
>  }
> 
>  last_cluster = cluster - 1;
> @@ -432,17 +432,15 @@ static void coroutine_fn backup_run(void *opaque)
>  BackupBlockJob *job = opaque;
>  BackupCompleteData *data;
>  BlockDriverState *bs = blk_bs(job->common.blk);
> -int64_t start, end;
> +int64_t offset;
>  int64_t sectors_per_cluster = cluster_size_sectors(job);
>  int ret = 0;
> 
>  QLIST_INIT(&job->inflight_reqs);
>  qemu_co_rwlock_init(&job->flush_rwlock);
> 
> -start = 0;
> -end = DIV_ROUND_UP(job->common.len, job->cluster_size);
> -
> -job->done_bitmap = bitmap_new(end);
> +job->done_bitmap = bitmap_new(DIV_ROUND_UP(job->common.len,
> +   job->cluster_size));
> 
>  job->before_write.notify = backup_before_write_notify;
>  bdrv_add_before_write_notifier(bs, >before_write);
> @@ -457,7 +455,8 @@ static void coroutine_fn backup_run(void *opaque)
>  ret = backup_run_incremental(job);
>  } else {
>  /* Both FULL and TOP SYNC_MODE's require copying.. */
> -for (; start < end; start++) {
> +for (offset = 0; offset < job->common.len;
> + offset += job->cluster_size) {
>  bool error_is_read;
>  int alloced = 0;
> 
> @@ -480,8 +479,8 @@ static void coroutine_fn backup_run(void *opaque)
>   * needed but at some point that is always the case. */
>  alloced =
>  bdrv_is_allocated(bs,
> -start * sectors_per_cluster + i,
> -sectors_per_cluster - i, &n);
> +  (offset >> BDRV_SECTOR_BITS) + i,
> +  sectors_per_cluster - i, &n);
>  i += n;
> 
>  if (alloced || n == 0) {
> @@ -499,9 +498,8 @@ static void coroutine_fn backup_run(void *opaque)
>  if (alloced < 0) {
>  ret = alloced;
>  } else {
> -ret = backup_do_cow(job, start * job->cluster_size,
> -job->cluster_size, &error_is_read,
> -false);
> +ret = backup_do_cow(job, offset, job->cluster_size,
> +&error_is_read, false);
>  }
>  if (ret < 0) {
>  /* Depending on error action, fail now or retry cluster */
> @@ -510,7 +508,7 @@ static void coroutine_fn backup_run(void *opaque)
>  if (action == BLOCK_ERROR_ACTION_REPORT) {
>  break;
>  } else {
> -start--;
> +offset 

Re: [Qemu-devel] [PATCH v3 15/20] backup: Switch block_backup.h to byte-based

2017-06-30 Thread Jeff Cody
On Tue, Jun 27, 2017 at 02:24:53PM -0500, Eric Blake wrote:
> We are gradually converting to byte-based interfaces, as they are
> easier to reason about than sector-based.  Continue by converting
> the public interface to backup jobs (no semantic change), including
> a change to CowRequest to track by bytes instead of cluster indices.
> 
> Note that this does not change the difference between the public
> interface (starting point, and size of the subsequent range) and
> the internal interface (starting and end points).
> 
> Signed-off-by: Eric Blake 
> Reviewed-by: John Snow 
> 

Reviewed-by: Jeff Cody 
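
The public/internal difference noted above is just (offset, length) versus
[start, end); a trivial sketch of that boundary (hypothetical names):

#include <stdint.h>

static void internal_wait(int64_t start, int64_t end)  /* works on [start, end) */
{
    (void)start;
    (void)end;
    /* ... scan in-flight requests for overlap with [start, end) ... */
}

/* public entry point keeps the (offset, length) convention */
void public_wait(int64_t offset, uint64_t bytes)
{
    internal_wait(offset, offset + bytes);
}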

> ---
> v2: change a couple more parameter names
> ---
>  include/block/block_backup.h | 11 +--
>  block/backup.c   | 33 -
>  block/replication.c  | 12 
>  3 files changed, 29 insertions(+), 27 deletions(-)
> 
> diff --git a/include/block/block_backup.h b/include/block/block_backup.h
> index 8a75947..994a3bd 100644
> --- a/include/block/block_backup.h
> +++ b/include/block/block_backup.h
> @@ -21,17 +21,16 @@
>  #include "block/block_int.h"
> 
>  typedef struct CowRequest {
> -int64_t start;
> -int64_t end;
> +int64_t start_byte;
> +int64_t end_byte;
>  QLIST_ENTRY(CowRequest) list;
>  CoQueue wait_queue; /* coroutines blocked on this request */
>  } CowRequest;
> 
> -void backup_wait_for_overlapping_requests(BlockJob *job, int64_t sector_num,
> -  int nb_sectors);
> +void backup_wait_for_overlapping_requests(BlockJob *job, int64_t offset,
> +  uint64_t bytes);
>  void backup_cow_request_begin(CowRequest *req, BlockJob *job,
> -  int64_t sector_num,
> -  int nb_sectors);
> +  int64_t offset, uint64_t bytes);
>  void backup_cow_request_end(CowRequest *req);
> 
>  void backup_do_checkpoint(BlockJob *job, Error **errp);
> diff --git a/block/backup.c b/block/backup.c
> index 4e64710..cfbd921 100644
> --- a/block/backup.c
> +++ b/block/backup.c
> @@ -55,7 +55,7 @@ static inline int64_t cluster_size_sectors(BackupBlockJob 
> *job)
> 
>  /* See if in-flight requests overlap and wait for them to complete */
>  static void coroutine_fn wait_for_overlapping_requests(BackupBlockJob *job,
> -   int64_t start,
> +   int64_t offset,
> int64_t end)
>  {
>  CowRequest *req;
> @@ -64,7 +64,7 @@ static void coroutine_fn 
> wait_for_overlapping_requests(BackupBlockJob *job,
>  do {
>  retry = false;
>  QLIST_FOREACH(req, &job->inflight_reqs, list) {
> -if (end > req->start && start < req->end) {
> +if (end > req->start_byte && offset < req->end_byte) {
>  qemu_co_queue_wait(&req->wait_queue, NULL);
>  retry = true;
>  break;
> @@ -75,10 +75,10 @@ static void coroutine_fn 
> wait_for_overlapping_requests(BackupBlockJob *job,
> 
>  /* Keep track of an in-flight request */
>  static void cow_request_begin(CowRequest *req, BackupBlockJob *job,
> - int64_t start, int64_t end)
> +  int64_t offset, int64_t end)
>  {
> -req->start = start;
> -req->end = end;
> +req->start_byte = offset;
> +req->end_byte = end;
>  qemu_co_queue_init(&req->wait_queue);
>  QLIST_INSERT_HEAD(&job->inflight_reqs, req, list);
>  }
> @@ -114,8 +114,10 @@ static int coroutine_fn backup_do_cow(BackupBlockJob 
> *job,
>sector_num * BDRV_SECTOR_SIZE,
>nb_sectors * BDRV_SECTOR_SIZE);
> 
> -wait_for_overlapping_requests(job, start, end);
> -cow_request_begin(&cow_request, job, start, end);
> +wait_for_overlapping_requests(job, start * job->cluster_size,
> +  end * job->cluster_size);
> +cow_request_begin(&cow_request, job, start * job->cluster_size,
> +  end * job->cluster_size);
> 
>  for (; start < end; start++) {
>  if (test_bit(start, job->done_bitmap)) {
> @@ -277,32 +279,29 @@ void backup_do_checkpoint(BlockJob *job, Error **errp)
>  bitmap_zero(backup_job->done_bitmap, len);
>  }
> 
> -void backup_wait_for_overlapping_requests(BlockJob *job, int64_t sector_num,
> -  int nb_sectors)
> +void backup_wait_for_overlapping_requests(BlockJob *job, int64_t offset,
> +  uint64_t bytes)
>  {
>  BackupBlockJob *backup_job = container_of(job, BackupBlockJob, common);
> -int64_t sectors_per_cluster = cluster_size_sectors(backup_job);
>  int64_t start, end;
> 
>  assert(job->driver->job_type == 
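
As an aside on the interface distinction described in the commit message
above: the public entry points take a byte offset plus the size of the
range, while the internal helpers reason about a start and end position.
A minimal sketch of that mapping follows; the helper name and the rounding
to cluster boundaries are assumptions for illustration, not code from the
patch.

#include <stdint.h>

/* Illustrative only: translate (offset, bytes) into a cluster-aligned
 * [start, end) byte range of the kind the internal helpers work with.
 * Plain arithmetic is used so the sketch stays self-contained. */
static void range_to_start_end(int64_t offset, uint64_t bytes,
                               int64_t cluster_size,
                               int64_t *start, int64_t *end)
{
    *start = (offset / cluster_size) * cluster_size;
    *end = ((offset + (int64_t)bytes + cluster_size - 1) / cluster_size)
           * cluster_size;
}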

Re: [Qemu-devel] [PULL 04/14] migration: let MigrationState be a qdev

2017-06-30 Thread Philippe Mathieu-Daudé

Hi Peter, Juan,

On 06/28/2017 08:30 AM, Juan Quintela wrote:

From: Peter Xu 

Let the old man "MigrationState" join the object family. Direct benefit
is that we can start to use all the property features derived from
current QDev, like: HW_COMPAT_* bits, command line setup for migration
parameters (so will never need to set them up each time using HMP/QMP,
this is really, really attractive for test writers), etc.

I see no reason to disallow this from happening yet. So let's start from this
one, to see whether it would be anything good.

Now we init the MigrationState struct statically in main() to make sure
it's initialized after global properties are applied, since we'll use
them during creation of the object.

No functional change at all.

Reviewed-by: Juan Quintela 
Signed-off-by: Peter Xu 
Message-Id: <1498536619-14548-5-git-send-email-pet...@redhat.com>
Reviewed-by: Eduardo Habkost 
Signed-off-by: Juan Quintela 
---
  include/migration/misc.h |  1 +
  migration/migration.c| 78 ++--
  migration/migration.h| 19 
  vl.c |  6 
  4 files changed, 81 insertions(+), 23 deletions(-)

diff --git a/include/migration/misc.h b/include/migration/misc.h
index 65c7070..2d36cf5 100644
--- a/include/migration/misc.h
+++ b/include/migration/misc.h
@@ -45,6 +45,7 @@ void savevm_skip_section_footers(void);
  void savevm_skip_configuration(void);
  
  /* migration/migration.c */

+void migration_object_init(void);
  void qemu_start_incoming_migration(const char *uri, Error **errp);
  bool migration_is_idle(void);
  void add_migration_state_change_notifier(Notifier *notify);
diff --git a/migration/migration.c b/migration/migration.c
index f588329..2c25927 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -98,32 +98,21 @@ enum mig_rp_message_type {
 migrations at once.  For now we don't need to add
 dynamic creation of migration */
  
+static MigrationState *current_migration;

+
+void migration_object_init(void)
+{
+/* This can only be called once. */
+assert(!current_migration);
+current_migration = MIGRATION_OBJ(object_new(TYPE_MIGRATION));
+}
+
  /* For outgoing */
  MigrationState *migrate_get_current(void)
  {
-static bool once;
-static MigrationState current_migration = {
-.state = MIGRATION_STATUS_NONE,
-.xbzrle_cache_size = DEFAULT_MIGRATE_CACHE_SIZE,
-.mbps = -1,
-.parameters = {
-.compress_level = DEFAULT_MIGRATE_COMPRESS_LEVEL,
-.compress_threads = DEFAULT_MIGRATE_COMPRESS_THREAD_COUNT,
-.decompress_threads = DEFAULT_MIGRATE_DECOMPRESS_THREAD_COUNT,
-.cpu_throttle_initial = DEFAULT_MIGRATE_CPU_THROTTLE_INITIAL,
-.cpu_throttle_increment = DEFAULT_MIGRATE_CPU_THROTTLE_INCREMENT,
-.max_bandwidth = MAX_THROTTLE,
-.downtime_limit = DEFAULT_MIGRATE_SET_DOWNTIME,
-.x_checkpoint_delay = DEFAULT_MIGRATE_X_CHECKPOINT_DELAY,
-},
-};
-
-if (!once) {
-current_migration.parameters.tls_creds = g_strdup("");
-current_migration.parameters.tls_hostname = g_strdup("");
-once = true;
-}
-return &current_migration;
+/* This can only be called after the object is created. */
+assert(current_migration);


With this pull I've been unable to run qemu:

qemu-system-arm: migration/migration.c:127: migrate_get_current: 
Assertion `current_migration' failed.


I've bisected to this commit using the following script:

#! /usr/bin/env bash
test -f test.qcow2 || qemu-img create -f qcow test.qcow2 1G
make -C build/system-arm subdir-arm-softmmu -j4 || exit 125
echo q | build/system-arm/arm-softmmu/qemu-system-arm -M virt \
  -drive if=none,file=test.qcow2,format=qcow,id=hd \
  -device virtio-blk-device,drive=hd \
  -nographic -serial null -monitor stdio
test $? -eq 0 || exit 1

Regards,

Phil.


+return current_migration;
  }
  
  MigrationIncomingState *migration_incoming_get_current(void)

@@ -1987,3 +1976,46 @@ void migrate_fd_connect(MigrationState *s)
  s->migration_thread_running = true;
  }
  
+static void migration_class_init(ObjectClass *klass, void *data)

+{
+DeviceClass *dc = DEVICE_CLASS(klass);
+
+dc->user_creatable = false;
+}
+
+static void migration_instance_init(Object *obj)
+{
+MigrationState *ms = MIGRATION_OBJ(obj);
+
+ms->state = MIGRATION_STATUS_NONE;
+ms->xbzrle_cache_size = DEFAULT_MIGRATE_CACHE_SIZE;
+ms->mbps = -1;
+ms->parameters = (MigrationParameters) {
+.compress_level = DEFAULT_MIGRATE_COMPRESS_LEVEL,
+.compress_threads = DEFAULT_MIGRATE_COMPRESS_THREAD_COUNT,
+.decompress_threads = DEFAULT_MIGRATE_DECOMPRESS_THREAD_COUNT,
+.cpu_throttle_initial = DEFAULT_MIGRATE_CPU_THROTTLE_INITIAL,
+.cpu_throttle_increment = 

Re: [Qemu-devel] [PATCH v3 14/20] backup: Switch BackupBlockJob to byte-based

2017-06-30 Thread Jeff Cody
On Tue, Jun 27, 2017 at 02:24:52PM -0500, Eric Blake wrote:
> We are gradually converting to byte-based interfaces, as they are
> easier to reason about than sector-based.  Continue by converting an
> internal structure (no semantic change), and all references to
> tracking progress.  Drop a redundant local variable bytes_per_cluster.
> 
> Signed-off-by: Eric Blake 
> Reviewed-by: John Snow 
> 

Reviewed-by: Jeff Cody 

> ---
> v2: no change
> ---
>  block/backup.c | 33 +++--
>  1 file changed, 15 insertions(+), 18 deletions(-)
> 
> diff --git a/block/backup.c b/block/backup.c
> index 06431ac..4e64710 100644
> --- a/block/backup.c
> +++ b/block/backup.c
> @@ -39,7 +39,7 @@ typedef struct BackupBlockJob {
>  BlockdevOnError on_source_error;
>  BlockdevOnError on_target_error;
>  CoRwlock flush_rwlock;
> -uint64_t sectors_read;
> +uint64_t bytes_read;
>  unsigned long *done_bitmap;
>  int64_t cluster_size;
>  bool compress;
> @@ -102,16 +102,15 @@ static int coroutine_fn backup_do_cow(BackupBlockJob 
> *job,
>  void *bounce_buffer = NULL;
>  int ret = 0;
>  int64_t sectors_per_cluster = cluster_size_sectors(job);
> -int64_t bytes_per_cluster = sectors_per_cluster * BDRV_SECTOR_SIZE;
> -int64_t start, end;
> -int n;
> +int64_t start, end; /* clusters */
> +int n; /* bytes */
> 
>  qemu_co_rwlock_rdlock(&job->flush_rwlock);
> 
>  start = sector_num / sectors_per_cluster;
>  end = DIV_ROUND_UP(sector_num + nb_sectors, sectors_per_cluster);
> 
> -trace_backup_do_cow_enter(job, start * bytes_per_cluster,
> +trace_backup_do_cow_enter(job, start * job->cluster_size,
>sector_num * BDRV_SECTOR_SIZE,
>nb_sectors * BDRV_SECTOR_SIZE);
> 
> @@ -120,28 +119,27 @@ static int coroutine_fn backup_do_cow(BackupBlockJob 
> *job,
> 
>  for (; start < end; start++) {
>  if (test_bit(start, job->done_bitmap)) {
> -trace_backup_do_cow_skip(job, start * bytes_per_cluster);
> +trace_backup_do_cow_skip(job, start * job->cluster_size);
>  continue; /* already copied */
>  }
> 
> -trace_backup_do_cow_process(job, start * bytes_per_cluster);
> +trace_backup_do_cow_process(job, start * job->cluster_size);
> 
> -n = MIN(sectors_per_cluster,
> -job->common.len / BDRV_SECTOR_SIZE -
> -start * sectors_per_cluster);
> +n = MIN(job->cluster_size,
> +job->common.len - start * job->cluster_size);
> 
>  if (!bounce_buffer) {
>  bounce_buffer = blk_blockalign(blk, job->cluster_size);
>  }
>  iov.iov_base = bounce_buffer;
> -iov.iov_len = n * BDRV_SECTOR_SIZE;
> +iov.iov_len = n;
>  qemu_iovec_init_external(&bounce_qiov, &iov, 1);
> 
>  ret = blk_co_preadv(blk, start * job->cluster_size,
>  bounce_qiov.size, &bounce_qiov,
>  is_write_notifier ? BDRV_REQ_NO_SERIALISING : 0);
>  if (ret < 0) {
> -trace_backup_do_cow_read_fail(job, start * bytes_per_cluster, 
> ret);
> +trace_backup_do_cow_read_fail(job, start * job->cluster_size, 
> ret);
>  if (error_is_read) {
>  *error_is_read = true;
>  }
> @@ -157,7 +155,7 @@ static int coroutine_fn backup_do_cow(BackupBlockJob *job,
>   job->compress ? BDRV_REQ_WRITE_COMPRESSED : 
> 0);
>  }
>  if (ret < 0) {
> -trace_backup_do_cow_write_fail(job, start * bytes_per_cluster, 
> ret);
> +trace_backup_do_cow_write_fail(job, start * job->cluster_size, 
> ret);
>  if (error_is_read) {
>  *error_is_read = false;
>  }
> @@ -169,8 +167,8 @@ static int coroutine_fn backup_do_cow(BackupBlockJob *job,
>  /* Publish progress, guest I/O counts as progress too.  Note that the
>   * offset field is an opaque progress value, it is not a disk offset.
>   */
> -job->sectors_read += n;
> -job->common.offset += n * BDRV_SECTOR_SIZE;
> +job->bytes_read += n;
> +job->common.offset += n;
>  }
> 
>  out:
> @@ -363,9 +361,8 @@ static bool coroutine_fn yield_and_check(BackupBlockJob 
> *job)
>   */
>  if (job->common.speed) {
>  uint64_t delay_ns = ratelimit_calculate_delay(&job->limit,
> -  job->sectors_read *
> -  BDRV_SECTOR_SIZE);
> -job->sectors_read = 0;
> +  job->bytes_read);
> +job->bytes_read = 0;
>  block_job_sleep_ns(&job->common, QEMU_CLOCK_REALTIME, delay_ns);
>  } else {
> block_job_sleep_ns(&job->common, QEMU_CLOCK_REALTIME, 0);
> 

Re: [Qemu-devel] [Qemu-block] [PATCH v3 13/20] block: Drop unused bdrv_round_sectors_to_clusters()

2017-06-30 Thread Jeff Cody
On Tue, Jun 27, 2017 at 02:24:51PM -0500, Eric Blake wrote:
> Now that the last user [mirror_iteration()] has converted to using
> bytes, we no longer need a function to round sectors to clusters.
> 
> Signed-off-by: Eric Blake 
> Reviewed-by: John Snow 
> 

Reviewed-by: Jeff Cody 

> ---
> v2: hoist to earlier series, no change
> ---
>  include/block/block.h |  4 
>  block/io.c| 21 -
>  2 files changed, 25 deletions(-)
> 
> diff --git a/include/block/block.h b/include/block/block.h
> index 3e91cac..5cdd690 100644
> --- a/include/block/block.h
> +++ b/include/block/block.h
> @@ -473,10 +473,6 @@ const char *bdrv_get_device_or_node_name(const 
> BlockDriverState *bs);
>  int bdrv_get_flags(BlockDriverState *bs);
>  int bdrv_get_info(BlockDriverState *bs, BlockDriverInfo *bdi);
>  ImageInfoSpecific *bdrv_get_specific_info(BlockDriverState *bs);
> -void bdrv_round_sectors_to_clusters(BlockDriverState *bs,
> -int64_t sector_num, int nb_sectors,
> -int64_t *cluster_sector_num,
> -int *cluster_nb_sectors);
>  void bdrv_round_to_clusters(BlockDriverState *bs,
>  int64_t offset, unsigned int bytes,
>  int64_t *cluster_offset,
> diff --git a/block/io.c b/block/io.c
> index c72d701..d9fec1f 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -419,27 +419,6 @@ static void mark_request_serialising(BdrvTrackedRequest 
> *req, uint64_t align)
>  }
> 
>  /**
> - * Round a region to cluster boundaries (sector-based)
> - */
> -void bdrv_round_sectors_to_clusters(BlockDriverState *bs,
> -int64_t sector_num, int nb_sectors,
> -int64_t *cluster_sector_num,
> -int *cluster_nb_sectors)
> -{
> -BlockDriverInfo bdi;
> -
> -if (bdrv_get_info(bs, &bdi) < 0 || bdi.cluster_size == 0) {
> -*cluster_sector_num = sector_num;
> -*cluster_nb_sectors = nb_sectors;
> -} else {
> -int64_t c = bdi.cluster_size / BDRV_SECTOR_SIZE;
> -*cluster_sector_num = QEMU_ALIGN_DOWN(sector_num, c);
> -*cluster_nb_sectors = QEMU_ALIGN_UP(sector_num - *cluster_sector_num 
> +
> -nb_sectors, c);
> -}
> -}
> -
> -/**
>   * Round a region to cluster boundaries
>   */
>  void bdrv_round_to_clusters(BlockDriverState *bs,
> -- 
> 2.9.4
> 
> 



Re: [Qemu-devel] [PATCH] util/cacheinfo: Fix warning generated by clang

2017-06-30 Thread Emilio G. Cota
On Fri, Jun 30, 2017 at 11:39:46 -0400, Pranith Kumar wrote:
> Clang generates the following warning on aarch64 host:
> 
>   CC  util/cacheinfo.o
> /home/pranith/qemu/util/cacheinfo.c:121:48: warning: value size does not 
> match register size specified by the constraint and modifier 
> [-Wasm-operand-widths]
> asm volatile("mrs\t%0, ctr_el0" : "=r"(ctr));
>^
> /home/pranith/qemu/util/cacheinfo.c:121:28: note: use constraint modifier "w"
> asm volatile("mrs\t%0, ctr_el0" : "=r"(ctr));
>^~
>%w0
> 
> Constraint modifier 'w' is not (yet?) accepted by gcc. Fix this by increasing 
> the ctr size.
> 
> Signed-off-by: Pranith Kumar 

I can reproduce with clang 3.9.1.

Tested-by: Emilio G. Cota 
Reviewed-by: Emilio G. Cota 

Thanks,

Emilio
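
For context, a minimal sketch of the shape of fix under discussion on an
AArch64 host: widening the variable that receives CTR_EL0 so the "=r"
output operand refers to a full 64-bit register, which is what clang's
-Wasm-operand-widths warning is about. The function and variable names
here are illustrative, not taken from the patch.

#include <stdint.h>

/* Sketch only (AArch64): read the cache type register into a 64-bit
 * variable so the operand size matches the system register width. */
static uint64_t read_ctr_el0(void)
{
    uint64_t ctr;

    asm volatile("mrs\t%0, ctr_el0" : "=r"(ctr));
    return ctr;
}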



Re: [Qemu-devel] [PATCH v3 12/20] mirror: Switch mirror_iteration() to byte-based

2017-06-30 Thread Jeff Cody
On Tue, Jun 27, 2017 at 02:24:50PM -0500, Eric Blake wrote:
> We are gradually converting to byte-based interfaces, as they are
> easier to reason about than sector-based.  Change the internal
> loop iteration of mirroring to track by bytes instead of sectors
> (although we are still guaranteed that we iterate by steps that
> are both sector-aligned and multiples of the granularity).  Drop
> the now-unused mirror_clip_sectors().
> 
> Signed-off-by: Eric Blake 
> Reviewed-by: John Snow 
> 

Reviewed-by: Jeff Cody 

> ---
> v3: rebase to Paolo's thread-safety changes, R-b kept
> v2: straightforward rebase to earlier mirror_clip_bytes() change, R-b kept
> ---
>  block/mirror.c | 105 
> +
>  1 file changed, 46 insertions(+), 59 deletions(-)
> 
> diff --git a/block/mirror.c b/block/mirror.c
> index 81ff784..0eb2af4 100644
> --- a/block/mirror.c
> +++ b/block/mirror.c
> @@ -184,15 +184,6 @@ static inline int64_t mirror_clip_bytes(MirrorBlockJob 
> *s,
>  return MIN(bytes, s->bdev_length - offset);
>  }
> 
> -/* Clip nb_sectors relative to sector_num to not exceed end-of-file */
> -static inline int mirror_clip_sectors(MirrorBlockJob *s,
> -  int64_t sector_num,
> -  int nb_sectors)
> -{
> -return MIN(nb_sectors,
> -   s->bdev_length / BDRV_SECTOR_SIZE - sector_num);
> -}
> -
>  /* Round offset and/or bytes to target cluster if COW is needed, and
>   * return the offset of the adjusted tail against original. */
>  static int mirror_cow_align(MirrorBlockJob *s, int64_t *offset,
> @@ -336,30 +327,28 @@ static void mirror_do_zero_or_discard(MirrorBlockJob *s,
>  static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
>  {
>  BlockDriverState *source = s->source;
> -int64_t sector_num, first_chunk;
> +int64_t offset, first_chunk;
>  uint64_t delay_ns = 0;
>  /* At least the first dirty chunk is mirrored in one iteration. */
>  int nb_chunks = 1;
> -int64_t end = s->bdev_length / BDRV_SECTOR_SIZE;
>  int sectors_per_chunk = s->granularity >> BDRV_SECTOR_BITS;
>  bool write_zeroes_ok = 
> bdrv_can_write_zeroes_with_unmap(blk_bs(s->target));
>  int max_io_bytes = MAX(s->buf_size / MAX_IN_FLIGHT, MAX_IO_BYTES);
> 
>  bdrv_dirty_bitmap_lock(s->dirty_bitmap);
> -sector_num = bdrv_dirty_iter_next(s->dbi);
> -if (sector_num < 0) {
> +offset = bdrv_dirty_iter_next(s->dbi) * BDRV_SECTOR_SIZE;
> +if (offset < 0) {
>  bdrv_set_dirty_iter(s->dbi, 0);
> -sector_num = bdrv_dirty_iter_next(s->dbi);
> +offset = bdrv_dirty_iter_next(s->dbi) * BDRV_SECTOR_SIZE;
>  trace_mirror_restart_iter(s, bdrv_get_dirty_count(s->dirty_bitmap) *
>BDRV_SECTOR_SIZE);
> -assert(sector_num >= 0);
> +assert(offset >= 0);
>  }
>  bdrv_dirty_bitmap_unlock(s->dirty_bitmap);
> 
> -first_chunk = sector_num / sectors_per_chunk;
> +first_chunk = offset / s->granularity;
>  while (test_bit(first_chunk, s->in_flight_bitmap)) {
> -trace_mirror_yield_in_flight(s, sector_num * BDRV_SECTOR_SIZE,
> - s->in_flight);
> +trace_mirror_yield_in_flight(s, offset, s->in_flight);
>  mirror_wait_for_io(s);
>  }
> 
> @@ -368,25 +357,26 @@ static uint64_t coroutine_fn 
> mirror_iteration(MirrorBlockJob *s)
>  /* Find the number of consecutive dirty chunks following the first dirty
>   * one, and wait for in flight requests in them. */
>  bdrv_dirty_bitmap_lock(s->dirty_bitmap);
> -while (nb_chunks * sectors_per_chunk < (s->buf_size >> 
> BDRV_SECTOR_BITS)) {
> +while (nb_chunks * s->granularity < s->buf_size) {
>  int64_t next_dirty;
> -int64_t next_sector = sector_num + nb_chunks * sectors_per_chunk;
> -int64_t next_chunk = next_sector / sectors_per_chunk;
> -if (next_sector >= end ||
> -!bdrv_get_dirty_locked(source, s->dirty_bitmap, next_sector)) {
> +int64_t next_offset = offset + nb_chunks * s->granularity;
> +int64_t next_chunk = next_offset / s->granularity;
> +if (next_offset >= s->bdev_length ||
> +!bdrv_get_dirty_locked(source, s->dirty_bitmap,
> +   next_offset >> BDRV_SECTOR_BITS)) {
>  break;
>  }
>  if (test_bit(next_chunk, s->in_flight_bitmap)) {
>  break;
>  }
> 
> -next_dirty = bdrv_dirty_iter_next(s->dbi);
> -if (next_dirty > next_sector || next_dirty < 0) {
> +next_dirty = bdrv_dirty_iter_next(s->dbi) * BDRV_SECTOR_SIZE;
> +if (next_dirty > next_offset || next_dirty < 0) {
>  /* The bitmap iterator's cache is stale, refresh it */
> -bdrv_set_dirty_iter(s->dbi, next_sector);
> -

Re: [Qemu-devel] [PATCH v3 11/20] mirror: Switch mirror_do_read() to byte-based

2017-06-30 Thread Jeff Cody
On Tue, Jun 27, 2017 at 02:24:49PM -0500, Eric Blake wrote:
> We are gradually converting to byte-based interfaces, as they are
> easier to reason about than sector-based.  Convert another internal
> function (no semantic change).
> 
> Signed-off-by: Eric Blake 
> Reviewed-by: John Snow 
> 

Reviewed-by: Jeff Cody 

> ---
> v2: rebase to earlier changes
> ---
>  block/mirror.c | 75 
> ++
>  1 file changed, 33 insertions(+), 42 deletions(-)
> 
> diff --git a/block/mirror.c b/block/mirror.c
> index 1a4602a..81ff784 100644
> --- a/block/mirror.c
> +++ b/block/mirror.c
> @@ -196,7 +196,7 @@ static inline int mirror_clip_sectors(MirrorBlockJob *s,
>  /* Round offset and/or bytes to target cluster if COW is needed, and
>   * return the offset of the adjusted tail against original. */
>  static int mirror_cow_align(MirrorBlockJob *s, int64_t *offset,
> -unsigned int *bytes)
> +uint64_t *bytes)
>  {
>  bool need_cow;
>  int ret = 0;
> @@ -204,6 +204,7 @@ static int mirror_cow_align(MirrorBlockJob *s, int64_t 
> *offset,
>  unsigned int align_bytes = *bytes;
>  int max_bytes = s->granularity * s->max_iov;
> 
> +assert(*bytes < INT_MAX);
>  need_cow = !test_bit(*offset / s->granularity, s->cow_bitmap);
>  need_cow |= !test_bit((*offset + *bytes - 1) / s->granularity,
>s->cow_bitmap);
> @@ -239,59 +240,50 @@ static inline void mirror_wait_for_io(MirrorBlockJob *s)
>  }
> 
>  /* Submit async read while handling COW.
> - * Returns: The number of sectors copied after and including sector_num,
> - *  excluding any sectors copied prior to sector_num due to 
> alignment.
> - *  This will be nb_sectors if no alignment is necessary, or
> - *  (new_end - sector_num) if tail is rounded up or down due to
> + * Returns: The number of bytes copied after and including offset,
> + *  excluding any bytes copied prior to offset due to alignment.
> + *  This will be @bytes if no alignment is necessary, or
> + *  (new_end - offset) if tail is rounded up or down due to
>   *  alignment or buffer limit.
>   */
> -static int mirror_do_read(MirrorBlockJob *s, int64_t sector_num,
> -  int nb_sectors)
> +static uint64_t mirror_do_read(MirrorBlockJob *s, int64_t offset,
> +   uint64_t bytes)
>  {
>  BlockBackend *source = s->common.blk;
> -int sectors_per_chunk, nb_chunks;
> -int ret;
> +int nb_chunks;
> +uint64_t ret;
>  MirrorOp *op;
> -int max_sectors;
> +uint64_t max_bytes;
> 
> -sectors_per_chunk = s->granularity >> BDRV_SECTOR_BITS;
> -max_sectors = sectors_per_chunk * s->max_iov;
> +max_bytes = s->granularity * s->max_iov;
> 
>  /* We can only handle as much as buf_size at a time. */
> -nb_sectors = MIN(s->buf_size >> BDRV_SECTOR_BITS, nb_sectors);
> -nb_sectors = MIN(max_sectors, nb_sectors);
> -assert(nb_sectors);
> -assert(nb_sectors < BDRV_REQUEST_MAX_SECTORS);
> -ret = nb_sectors;
> +bytes = MIN(s->buf_size, MIN(max_bytes, bytes));
> +assert(bytes);
> +assert(bytes < BDRV_REQUEST_MAX_BYTES);
> +ret = bytes;
> 
>  if (s->cow_bitmap) {
> -int64_t offset = sector_num * BDRV_SECTOR_SIZE;
> -unsigned int bytes = nb_sectors * BDRV_SECTOR_SIZE;
> -int gap;
> -
> -gap = mirror_cow_align(s, &offset, &bytes);
> -sector_num = offset / BDRV_SECTOR_SIZE;
> -nb_sectors = bytes / BDRV_SECTOR_SIZE;
> -ret += gap / BDRV_SECTOR_SIZE;
> +ret += mirror_cow_align(s, &offset, &bytes);
>  }
> -assert(nb_sectors << BDRV_SECTOR_BITS <= s->buf_size);
> -/* The sector range must meet granularity because:
> +assert(bytes <= s->buf_size);
> +/* The range will be sector-aligned because:
>   * 1) Caller passes in aligned values;
> - * 2) mirror_cow_align is used only when target cluster is larger. */
> -assert(!(sector_num % sectors_per_chunk));
> -nb_chunks = DIV_ROUND_UP(nb_sectors, sectors_per_chunk);
> + * 2) mirror_cow_align is used only when target cluster is larger.
> + * But it might not be cluster-aligned at end-of-file. */
> +assert(QEMU_IS_ALIGNED(bytes, BDRV_SECTOR_SIZE));
> +nb_chunks = DIV_ROUND_UP(bytes, s->granularity);
> 
>  while (s->buf_free_count < nb_chunks) {
> -trace_mirror_yield_in_flight(s, sector_num * BDRV_SECTOR_SIZE,
> - s->in_flight);
> +trace_mirror_yield_in_flight(s, offset, s->in_flight);
>  mirror_wait_for_io(s);
>  }
> 
>  /* Allocate a MirrorOp that is used as an AIO callback.  */
>  op = g_new(MirrorOp, 1);
>  op->s = s;
> -op->offset = sector_num * BDRV_SECTOR_SIZE;
> -op->bytes = nb_sectors * BDRV_SECTOR_SIZE;
> +op->offset = 

Re: [Qemu-devel] [PATCH v3 10/20] mirror: Switch mirror_cow_align() to byte-based

2017-06-30 Thread Jeff Cody
On Tue, Jun 27, 2017 at 02:24:48PM -0500, Eric Blake wrote:
> We are gradually converting to byte-based interfaces, as they are
> easier to reason about than sector-based.  Convert another internal
> function (no semantic change), and add mirror_clip_bytes() as a
> counterpart to mirror_clip_sectors().  Some of the conversion is
> a bit tricky, requiring temporaries to convert between units; it
> will be cleared up in a following patch.
> 
> Signed-off-by: Eric Blake 
> Reviewed-by: John Snow 
> 

Reviewed-by: Jeff Cody 

> ---
> v2: tweak mirror_clip_bytes() signature to match previous patch
> ---
>  block/mirror.c | 64 
> ++
>  1 file changed, 38 insertions(+), 26 deletions(-)
> 
> diff --git a/block/mirror.c b/block/mirror.c
> index 1a43304..1a4602a 100644
> --- a/block/mirror.c
> +++ b/block/mirror.c
> @@ -176,6 +176,15 @@ static void mirror_read_complete(void *opaque, int ret)
>  aio_context_release(blk_get_aio_context(s->common.blk));
>  }
> 
> +/* Clip bytes relative to offset to not exceed end-of-file */
> +static inline int64_t mirror_clip_bytes(MirrorBlockJob *s,
> +int64_t offset,
> +int64_t bytes)
> +{
> +return MIN(bytes, s->bdev_length - offset);
> +}
> +
> +/* Clip nb_sectors relative to sector_num to not exceed end-of-file */
>  static inline int mirror_clip_sectors(MirrorBlockJob *s,
>int64_t sector_num,
>int nb_sectors)
> @@ -184,44 +193,39 @@ static inline int mirror_clip_sectors(MirrorBlockJob *s,
> s->bdev_length / BDRV_SECTOR_SIZE - sector_num);
>  }
> 
> -/* Round sector_num and/or nb_sectors to target cluster if COW is needed, and
> - * return the offset of the adjusted tail sector against original. */
> -static int mirror_cow_align(MirrorBlockJob *s,
> -int64_t *sector_num,
> -int *nb_sectors)
> +/* Round offset and/or bytes to target cluster if COW is needed, and
> + * return the offset of the adjusted tail against original. */
> +static int mirror_cow_align(MirrorBlockJob *s, int64_t *offset,
> +unsigned int *bytes)
>  {
>  bool need_cow;
>  int ret = 0;
> -int chunk_sectors = s->granularity >> BDRV_SECTOR_BITS;
> -int64_t align_sector_num = *sector_num;
> -int align_nb_sectors = *nb_sectors;
> -int max_sectors = chunk_sectors * s->max_iov;
> +int64_t align_offset = *offset;
> +unsigned int align_bytes = *bytes;
> +int max_bytes = s->granularity * s->max_iov;
> 
> -need_cow = !test_bit(*sector_num / chunk_sectors, s->cow_bitmap);
> -need_cow |= !test_bit((*sector_num + *nb_sectors - 1) / chunk_sectors,
> +need_cow = !test_bit(*offset / s->granularity, s->cow_bitmap);
> +need_cow |= !test_bit((*offset + *bytes - 1) / s->granularity,
>s->cow_bitmap);
>  if (need_cow) {
> -bdrv_round_sectors_to_clusters(blk_bs(s->target), *sector_num,
> -   *nb_sectors, &align_sector_num,
> -   &align_nb_sectors);
> +bdrv_round_to_clusters(blk_bs(s->target), *offset, *bytes,
> +   &align_offset, &align_bytes);
>  }
> 
> -if (align_nb_sectors > max_sectors) {
> -align_nb_sectors = max_sectors;
> +if (align_bytes > max_bytes) {
> +align_bytes = max_bytes;
>  if (need_cow) {
> -align_nb_sectors = QEMU_ALIGN_DOWN(align_nb_sectors,
> -   s->target_cluster_size >>
> -   BDRV_SECTOR_BITS);
> +align_bytes = QEMU_ALIGN_DOWN(align_bytes,
> +  s->target_cluster_size);
>  }
>  }
> -/* Clipping may result in align_nb_sectors unaligned to chunk boundary, 
> but
> +/* Clipping may result in align_bytes unaligned to chunk boundary, but
>   * that doesn't matter because it's already the end of source image. */
> -align_nb_sectors = mirror_clip_sectors(s, align_sector_num,
> -   align_nb_sectors);
> +align_bytes = mirror_clip_bytes(s, align_offset, align_bytes);
> 
> -ret = align_sector_num + align_nb_sectors - (*sector_num + *nb_sectors);
> -*sector_num = align_sector_num;
> -*nb_sectors = align_nb_sectors;
> +ret = align_offset + align_bytes - (*offset + *bytes);
> +*offset = align_offset;
> +*bytes = align_bytes;
>  assert(ret >= 0);
>  return ret;
>  }
> @@ -257,10 +261,18 @@ static int mirror_do_read(MirrorBlockJob *s, int64_t 
> sector_num,
>  nb_sectors = MIN(s->buf_size >> BDRV_SECTOR_BITS, nb_sectors);
>  nb_sectors = MIN(max_sectors, nb_sectors);
>  

Re: [Qemu-devel] [RFC 3/7] translate-all: use a binary search tree to track TBs in TBContext

2017-06-30 Thread Emilio G. Cota
On Fri, Jun 30, 2017 at 00:49:37 -0700, Richard Henderson wrote:
> On 06/30/2017 12:41 AM, Richard Henderson wrote:
> >On 06/29/2017 01:28 PM, Emilio G. Cota wrote:
> >>+/* @key is already in the tree so it's safe to use container_of on it */
> >>+static gint tc_ptr_cmp(gconstpointer candidate, gconstpointer key)
> >>+{
> >>+uintptr_t a = *(uintptr_t *)candidate;
> >>+const TranslationBlock *tb = container_of(key, TranslationBlock, 
> >>tc_ptr);
> >
> >This I'm not keen on.  It'd be one thing if it was our own datastructure,
> >but I see nothing in the GTree documentation that says that the comparison
> >must always be done this way.

I also checked for this. Couldn't find anything in the docs -- the
only guarantee I could find is the implicit one that since the g_tree
module was created, the 2nd pointer ("b" above) has consistently been
the in-node one. I don't think they'd ever change this, but yes
relying on this assumption is a bit risky.

If we prefer using our own we could bring the AVL tree from CCAN:
  http://git.ozlabs.org/?p=ccan;a=tree;f=ccan/avl

> What if we bundle tc_ptr + tc_size into a struct and only reference that?
> We'd embed that struct into the TB.  In tb_find_pc, create that struct on
> the stack, setting tc_size = 0.

This is clean and safe, but we'd add a 4-byte hole for 64-on-64bit.
However, we could bring other fields into the embedded struct to plug
the hole. Also, using an anonymous struct would hide this from the
calling code:

diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index 4b4c143..07f1f50 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -319,16 +319,18 @@ struct TranslationBlock {
 uint16_t size;  /* size of target code for this block (1 <=
size <= TARGET_PAGE_SIZE) */
 uint16_t icount;
-uint32_t cflags;/* compile flags */
+
+struct {
+void *tc_ptr;/* pointer to the translated code */
+int32_t out_size; /* size of host code for this block */
+uint32_t cflags;/* compile flags */
 #define CF_COUNT_MASK  0x7fff
 #define CF_LAST_IO 0x8000 /* Last insn may be an IO access.  */
 #define CF_NOCACHE 0x1 /* To be freed after execution */
 #define CF_USE_ICOUNT  0x2
 #define CF_IGNORE_ICOUNT 0x4 /* Do not generate icount code */
+};
 
-uint16_t invalid;
-
-void *tc_ptr;/* pointer to the translated code */
 uint8_t *tc_search;  /* pointer to search data */
 /* original tb when cflags has CF_NOCACHE */
 struct TranslationBlock *orig_tb;
@@ -365,7 +367,7 @@ struct TranslationBlock {
  */
 uintptr_t jmp_list_next[2];
 uintptr_t jmp_list_first;
-int32_t out_size; /* size of host code for this block */
+uint16_t invalid;
 };

That is 122 bytes, with all 6 bytes of padding at the end.
We also move invalid to the 2nd cache line, which I'm not sure
matters much (I liked having out_size there because it's on a
fairly slow path).

Also I'd rename out_size to tc_size.

E.
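
To make the bundled-struct idea concrete, here is a rough sketch of a
comparator over a {ptr, size} pair, where a stack-allocated key with
size == 0 stands for the host PC being looked up. The struct and function
names are made up for illustration and are not taken from the series; the
point is that neither operand needs to be assumed to be the in-tree node,
which sidesteps the GTree documentation concern above.

#include <glib.h>
#include <stdint.h>

/* Hypothetical pair embedded in TranslationBlock and also built on the
 * stack as a search key in tb_find_pc(). */
struct tb_tc {
    const void *ptr;   /* start of translated code */
    uint32_t size;     /* 0 when the struct is only a search key */
};

static gint tb_tc_cmp(gconstpointer ap, gconstpointer bp)
{
    const struct tb_tc *a = ap;
    const struct tb_tc *b = bp;
    uintptr_t pa = (uintptr_t)a->ptr;
    uintptr_t pb = (uintptr_t)b->ptr;

    /* A size-0 operand is a searched host PC: report a match when it
     * falls inside the other operand's [ptr, ptr + size) range. */
    if (a->size == 0 && pa >= pb && pa < pb + b->size) {
        return 0;
    }
    if (b->size == 0 && pb >= pa && pb < pa + a->size) {
        return 0;
    }
    return pa < pb ? -1 : pa > pb ? 1 : 0;
}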



Re: [Qemu-devel] [PATCH v3 09/20] mirror: Update signature of mirror_clip_sectors()

2017-06-30 Thread Jeff Cody
On Tue, Jun 27, 2017 at 02:24:47PM -0500, Eric Blake wrote:
> Rather than having a void function that modifies its input
> in-place as the output, change the signature to reduce a layer
> of indirection and return the result.
> 
> Suggested-by: John Snow 
> Signed-off-by: Eric Blake 
> Reviewed-by: John Snow 
> 

Reviewed-by: Jeff Cody 

> ---
> v2: new patch
> ---
>  block/mirror.c | 15 ---
>  1 file changed, 8 insertions(+), 7 deletions(-)
> 
> diff --git a/block/mirror.c b/block/mirror.c
> index af27bcc..1a43304 100644
> --- a/block/mirror.c
> +++ b/block/mirror.c
> @@ -176,12 +176,12 @@ static void mirror_read_complete(void *opaque, int ret)
>  aio_context_release(blk_get_aio_context(s->common.blk));
>  }
> 
> -static inline void mirror_clip_sectors(MirrorBlockJob *s,
> -   int64_t sector_num,
> -   int *nb_sectors)
> +static inline int mirror_clip_sectors(MirrorBlockJob *s,
> +  int64_t sector_num,
> +  int nb_sectors)
>  {
> -*nb_sectors = MIN(*nb_sectors,
> -  s->bdev_length / BDRV_SECTOR_SIZE - sector_num);
> +return MIN(nb_sectors,
> +   s->bdev_length / BDRV_SECTOR_SIZE - sector_num);
>  }
> 
>  /* Round sector_num and/or nb_sectors to target cluster if COW is needed, and
> @@ -216,7 +216,8 @@ static int mirror_cow_align(MirrorBlockJob *s,
>  }
>  /* Clipping may result in align_nb_sectors unaligned to chunk boundary, 
> but
>   * that doesn't matter because it's already the end of source image. */
> -mirror_clip_sectors(s, align_sector_num, &align_nb_sectors);
> +align_nb_sectors = mirror_clip_sectors(s, align_sector_num,
> +   align_nb_sectors);
> 
>  ret = align_sector_num + align_nb_sectors - (*sector_num + *nb_sectors);
>  *sector_num = align_sector_num;
> @@ -445,7 +446,7 @@ static uint64_t coroutine_fn 
> mirror_iteration(MirrorBlockJob *s)
>  return 0;
>  }
> 
> -mirror_clip_sectors(s, sector_num, &io_sectors);
> +io_sectors = mirror_clip_sectors(s, sector_num, io_sectors);
>  switch (mirror_method) {
>  case MIRROR_METHOD_COPY:
>  io_sectors = mirror_do_read(s, sector_num, io_sectors);
> -- 
> 2.9.4
> 



Re: [Qemu-devel] [PATCH v3 08/20] mirror: Switch mirror_do_zero_or_discard() to byte-based

2017-06-30 Thread Jeff Cody
On Tue, Jun 27, 2017 at 02:24:46PM -0500, Eric Blake wrote:
> We are gradually converting to byte-based interfaces, as they are
> easier to reason about than sector-based.  Convert another internal
> function (no semantic change).
> 
> Signed-off-by: Eric Blake 
> Reviewed-by: John Snow 
> 

Reviewed-by: Jeff Cody 

> ---
> v2: no change
> ---
>  block/mirror.c | 20 +++-
>  1 file changed, 11 insertions(+), 9 deletions(-)
> 
> diff --git a/block/mirror.c b/block/mirror.c
> index 9e28d59..af27bcc 100644
> --- a/block/mirror.c
> +++ b/block/mirror.c
> @@ -305,8 +305,8 @@ static int mirror_do_read(MirrorBlockJob *s, int64_t 
> sector_num,
>  }
> 
>  static void mirror_do_zero_or_discard(MirrorBlockJob *s,
> -  int64_t sector_num,
> -  int nb_sectors,
> +  int64_t offset,
> +  uint64_t bytes,
>bool is_discard)
>  {
>  MirrorOp *op;
> @@ -315,16 +315,16 @@ static void mirror_do_zero_or_discard(MirrorBlockJob *s,
>   * so the freeing in mirror_iteration_done is nop. */
>  op = g_new0(MirrorOp, 1);
>  op->s = s;
> -op->offset = sector_num * BDRV_SECTOR_SIZE;
> -op->bytes = nb_sectors * BDRV_SECTOR_SIZE;
> +op->offset = offset;
> +op->bytes = bytes;
> 
>  s->in_flight++;
> -s->bytes_in_flight += nb_sectors * BDRV_SECTOR_SIZE;
> +s->bytes_in_flight += bytes;
>  if (is_discard) {
> -blk_aio_pdiscard(s->target, sector_num << BDRV_SECTOR_BITS,
> +blk_aio_pdiscard(s->target, offset,
>   op->bytes, mirror_write_complete, op);
>  } else {
> -blk_aio_pwrite_zeroes(s->target, sector_num * BDRV_SECTOR_SIZE,
> +blk_aio_pwrite_zeroes(s->target, offset,
>op->bytes, s->unmap ? BDRV_REQ_MAY_UNMAP : 0,
>mirror_write_complete, op);
>  }
> @@ -453,7 +453,8 @@ static uint64_t coroutine_fn 
> mirror_iteration(MirrorBlockJob *s)
>  break;
>  case MIRROR_METHOD_ZERO:
>  case MIRROR_METHOD_DISCARD:
> -mirror_do_zero_or_discard(s, sector_num, io_sectors,
> +mirror_do_zero_or_discard(s, sector_num * BDRV_SECTOR_SIZE,
> +  io_sectors * BDRV_SECTOR_SIZE,
>mirror_method == 
> MIRROR_METHOD_DISCARD);
>  if (write_zeroes_ok) {
>  io_bytes_acct = 0;
> @@ -657,7 +658,8 @@ static int coroutine_fn mirror_dirty_init(MirrorBlockJob 
> *s)
>  continue;
>  }
> 
> -mirror_do_zero_or_discard(s, sector_num, nb_sectors, false);
> +mirror_do_zero_or_discard(s, sector_num * BDRV_SECTOR_SIZE,
> +  nb_sectors * BDRV_SECTOR_SIZE, false);
>  sector_num += nb_sectors;
>  }
> 
> -- 
> 2.9.4
> 



Re: [Qemu-devel] [PATCH v3 07/20] mirror: Switch MirrorBlockJob to byte-based

2017-06-30 Thread Jeff Cody
On Tue, Jun 27, 2017 at 02:24:45PM -0500, Eric Blake wrote:
> We are gradually converting to byte-based interfaces, as they are
> easier to reason about than sector-based.  Continue by converting an
> internal structure (no semantic change), and all references to the
> buffer size.
> 
> [checkpatch has a false positive on use of MIN() in this patch]
> 
> Signed-off-by: Eric Blake 
> Reviewed-by: John Snow 
> 

Reviewed-by: Jeff Cody 

> ---
> v2: no change
> ---
>  block/mirror.c | 79 
> --
>  1 file changed, 38 insertions(+), 41 deletions(-)
> 
> diff --git a/block/mirror.c b/block/mirror.c
> index b4dfe95..9e28d59 100644
> --- a/block/mirror.c
> +++ b/block/mirror.c
> @@ -24,9 +24,8 @@
> 
>  #define SLICE_TIME1ULL /* ns */
>  #define MAX_IN_FLIGHT 16
> -#define MAX_IO_SECTORS ((1 << 20) >> BDRV_SECTOR_BITS) /* 1 Mb */
> -#define DEFAULT_MIRROR_BUF_SIZE \
> -(MAX_IN_FLIGHT * MAX_IO_SECTORS * BDRV_SECTOR_SIZE)
> +#define MAX_IO_BYTES (1 << 20) /* 1 Mb */
> +#define DEFAULT_MIRROR_BUF_SIZE (MAX_IN_FLIGHT * MAX_IO_BYTES)
> 
>  /* The mirroring buffer is a list of granularity-sized chunks.
>   * Free chunks are organized in a list.
> @@ -67,11 +66,11 @@ typedef struct MirrorBlockJob {
>  uint64_t last_pause_ns;
>  unsigned long *in_flight_bitmap;
>  int in_flight;
> -int64_t sectors_in_flight;
> +int64_t bytes_in_flight;
>  int ret;
>  bool unmap;
>  bool waiting_for_io;
> -int target_cluster_sectors;
> +int target_cluster_size;
>  int max_iov;
>  bool initial_zeroing_ongoing;
>  } MirrorBlockJob;
> @@ -79,8 +78,8 @@ typedef struct MirrorBlockJob {
>  typedef struct MirrorOp {
>  MirrorBlockJob *s;
>  QEMUIOVector qiov;
> -int64_t sector_num;
> -int nb_sectors;
> +int64_t offset;
> +uint64_t bytes;
>  } MirrorOp;
> 
>  static BlockErrorAction mirror_error_action(MirrorBlockJob *s, bool read,
> @@ -101,13 +100,12 @@ static void mirror_iteration_done(MirrorOp *op, int ret)
>  MirrorBlockJob *s = op->s;
>  struct iovec *iov;
>  int64_t chunk_num;
> -int i, nb_chunks, sectors_per_chunk;
> +int i, nb_chunks;
> 
> -trace_mirror_iteration_done(s, op->sector_num * BDRV_SECTOR_SIZE,
> -op->nb_sectors * BDRV_SECTOR_SIZE, ret);
> +trace_mirror_iteration_done(s, op->offset, op->bytes, ret);
> 
>  s->in_flight--;
> -s->sectors_in_flight -= op->nb_sectors;
> +s->bytes_in_flight -= op->bytes;
>  iov = op->qiov.iov;
>  for (i = 0; i < op->qiov.niov; i++) {
>  MirrorBuffer *buf = (MirrorBuffer *) iov[i].iov_base;
> @@ -115,16 +113,15 @@ static void mirror_iteration_done(MirrorOp *op, int ret)
>  s->buf_free_count++;
>  }
> 
> -sectors_per_chunk = s->granularity >> BDRV_SECTOR_BITS;
> -chunk_num = op->sector_num / sectors_per_chunk;
> -nb_chunks = DIV_ROUND_UP(op->nb_sectors, sectors_per_chunk);
> +chunk_num = op->offset / s->granularity;
> +nb_chunks = DIV_ROUND_UP(op->bytes, s->granularity);
>  bitmap_clear(s->in_flight_bitmap, chunk_num, nb_chunks);
>  if (ret >= 0) {
>  if (s->cow_bitmap) {
>  bitmap_set(s->cow_bitmap, chunk_num, nb_chunks);
>  }
>  if (!s->initial_zeroing_ongoing) {
> -s->common.offset += (uint64_t)op->nb_sectors * BDRV_SECTOR_SIZE;
> +s->common.offset += op->bytes;
>  }
>  }
>  qemu_iovec_destroy(&op->qiov);
> @@ -144,7 +141,8 @@ static void mirror_write_complete(void *opaque, int ret)
>  if (ret < 0) {
>  BlockErrorAction action;
> 
> -bdrv_set_dirty_bitmap(s->dirty_bitmap, op->sector_num, 
> op->nb_sectors);
> +bdrv_set_dirty_bitmap(s->dirty_bitmap, op->offset >> 
> BDRV_SECTOR_BITS,
> +  op->bytes >> BDRV_SECTOR_BITS);
>  action = mirror_error_action(s, false, -ret);
>  if (action == BLOCK_ERROR_ACTION_REPORT && s->ret >= 0) {
>  s->ret = ret;
> @@ -163,7 +161,8 @@ static void mirror_read_complete(void *opaque, int ret)
>  if (ret < 0) {
>  BlockErrorAction action;
> 
> -bdrv_set_dirty_bitmap(s->dirty_bitmap, op->sector_num, 
> op->nb_sectors);
> +bdrv_set_dirty_bitmap(s->dirty_bitmap, op->offset >> 
> BDRV_SECTOR_BITS,
> +  op->bytes >> BDRV_SECTOR_BITS);
>  action = mirror_error_action(s, true, -ret);
>  if (action == BLOCK_ERROR_ACTION_REPORT && s->ret >= 0) {
>  s->ret = ret;
> @@ -171,7 +170,7 @@ static void mirror_read_complete(void *opaque, int ret)
> 
>  mirror_iteration_done(op, ret);
>  } else {
> -blk_aio_pwritev(s->target, op->sector_num * BDRV_SECTOR_SIZE, 
> &op->qiov,
> +blk_aio_pwritev(s->target, op->offset, &op->qiov,
>  0, mirror_write_complete, op);
>  }
>  

Re: [Qemu-devel] [PATCH v3 06/20] commit: Switch commit_run() to byte-based

2017-06-30 Thread Jeff Cody
On Tue, Jun 27, 2017 at 02:24:44PM -0500, Eric Blake wrote:
> We are gradually converting to byte-based interfaces, as they are
> easier to reason about than sector-based.  Change the internal
> loop iteration of committing to track by bytes instead of sectors
> (although we are still guaranteed that we iterate by steps that
> are sector-aligned).
> 
> Signed-off-by: Eric Blake 
> Reviewed-by: John Snow 
> 

Reviewed-by: Jeff Cody 

> ---
> v2: no change
> ---
>  block/commit.c | 16 ++--
>  1 file changed, 6 insertions(+), 10 deletions(-)
> 
> diff --git a/block/commit.c b/block/commit.c
> index 6f67d78..c3a7bca 100644
> --- a/block/commit.c
> +++ b/block/commit.c
> @@ -143,17 +143,16 @@ static void coroutine_fn commit_run(void *opaque)
>  {
>  CommitBlockJob *s = opaque;
>  CommitCompleteData *data;
> -int64_t sector_num, end;
> +int64_t offset;
>  uint64_t delay_ns = 0;
>  int ret = 0;
> -int n = 0;
> +int n = 0; /* sectors */
>  void *buf = NULL;
>  int bytes_written = 0;
>  int64_t base_len;
> 
>  ret = s->common.len = blk_getlength(s->top);
> 
> -
>  if (s->common.len < 0) {
>  goto out;
>  }
> @@ -170,10 +169,9 @@ static void coroutine_fn commit_run(void *opaque)
>  }
>  }
> 
> -end = s->common.len >> BDRV_SECTOR_BITS;
>  buf = blk_blockalign(s->top, COMMIT_BUFFER_SIZE);
> 
> -for (sector_num = 0; sector_num < end; sector_num += n) {
> +for (offset = 0; offset < s->common.len; offset += n * BDRV_SECTOR_SIZE) 
> {
>  bool copy;
> 
>  /* Note that even when no rate limit is applied we need to yield
> @@ -185,15 +183,13 @@ static void coroutine_fn commit_run(void *opaque)
>  }
>  /* Copy if allocated above the base */
>  ret = bdrv_is_allocated_above(blk_bs(s->top), blk_bs(s->base),
> -  sector_num,
> +  offset / BDRV_SECTOR_SIZE,
>COMMIT_BUFFER_SIZE / BDRV_SECTOR_SIZE,
> &n);
>  copy = (ret == 1);
> -trace_commit_one_iteration(s, sector_num * BDRV_SECTOR_SIZE,
> -   n * BDRV_SECTOR_SIZE, ret);
> +trace_commit_one_iteration(s, offset, n * BDRV_SECTOR_SIZE, ret);
>  if (copy) {
> -ret = commit_populate(s->top, s->base,
> -  sector_num * BDRV_SECTOR_SIZE,
> +ret = commit_populate(s->top, s->base, offset,
>n * BDRV_SECTOR_SIZE, buf);
>  bytes_written += n * BDRV_SECTOR_SIZE;
>  }
> -- 
> 2.9.4
> 



Re: [Qemu-devel] [PATCH v3 05/20] commit: Switch commit_populate() to byte-based

2017-06-30 Thread Jeff Cody
On Tue, Jun 27, 2017 at 02:24:43PM -0500, Eric Blake wrote:
> We are gradually converting to byte-based interfaces, as they are
> easier to reason about than sector-based.  Start by converting an
> internal function (no semantic change).
> 
> Signed-off-by: Eric Blake 
> Reviewed-by: John Snow 
> 

Reviewed-by: Jeff Cody 

> ---
> v2: no change
> ---
>  block/commit.c | 15 ---
>  1 file changed, 8 insertions(+), 7 deletions(-)
> 
> diff --git a/block/commit.c b/block/commit.c
> index 4cda7f2..6f67d78 100644
> --- a/block/commit.c
> +++ b/block/commit.c
> @@ -47,26 +47,25 @@ typedef struct CommitBlockJob {
>  } CommitBlockJob;
> 
>  static int coroutine_fn commit_populate(BlockBackend *bs, BlockBackend *base,
> -int64_t sector_num, int nb_sectors,
> +int64_t offset, uint64_t bytes,
>  void *buf)
>  {
>  int ret = 0;
>  QEMUIOVector qiov;
>  struct iovec iov = {
>  .iov_base = buf,
> -.iov_len = nb_sectors * BDRV_SECTOR_SIZE,
> +.iov_len = bytes,
>  };
> 
> +assert(bytes < SIZE_MAX);
>  qemu_iovec_init_external(&qiov, &iov, 1);
> 
> -ret = blk_co_preadv(bs, sector_num * BDRV_SECTOR_SIZE,
> -qiov.size, &qiov, 0);
> +ret = blk_co_preadv(bs, offset, qiov.size, &qiov, 0);
>  if (ret < 0) {
>  return ret;
>  }
> 
> -ret = blk_co_pwritev(base, sector_num * BDRV_SECTOR_SIZE,
> - qiov.size, &qiov, 0);
> +ret = blk_co_pwritev(base, offset, qiov.size, &qiov, 0);
>  if (ret < 0) {
>  return ret;
>  }
> @@ -193,7 +192,9 @@ static void coroutine_fn commit_run(void *opaque)
>  trace_commit_one_iteration(s, sector_num * BDRV_SECTOR_SIZE,
> n * BDRV_SECTOR_SIZE, ret);
>  if (copy) {
> -ret = commit_populate(s->top, s->base, sector_num, n, buf);
> +ret = commit_populate(s->top, s->base,
> +  sector_num * BDRV_SECTOR_SIZE,
> +  n * BDRV_SECTOR_SIZE, buf);
>  bytes_written += n * BDRV_SECTOR_SIZE;
>  }
>  if (ret < 0) {
> -- 
> 2.9.4
> 



Re: [Qemu-devel] [PATCH v3 04/20] stream: Switch stream_run() to byte-based

2017-06-30 Thread Jeff Cody
On Tue, Jun 27, 2017 at 02:24:42PM -0500, Eric Blake wrote:
> We are gradually converting to byte-based interfaces, as they are
> easier to reason about than sector-based.  Change the internal
> loop iteration of streaming to track by bytes instead of sectors
> (although we are still guaranteed that we iterate by steps that
> are sector-aligned).
> 
> Signed-off-by: Eric Blake 
> Reviewed-by: John Snow 
> 

Reviewed-by: Jeff Cody 

> ---
> v2: no change
> ---
>  block/stream.c | 24 ++--
>  1 file changed, 10 insertions(+), 14 deletions(-)
> 
> diff --git a/block/stream.c b/block/stream.c
> index 746d525..2f9618b 100644
> --- a/block/stream.c
> +++ b/block/stream.c
> @@ -108,12 +108,11 @@ static void coroutine_fn stream_run(void *opaque)
>  BlockBackend *blk = s->common.blk;
>  BlockDriverState *bs = blk_bs(blk);
>  BlockDriverState *base = s->base;
> -int64_t sector_num = 0;
> -int64_t end = -1;
> +int64_t offset = 0;
>  uint64_t delay_ns = 0;
>  int error = 0;
>  int ret = 0;
> -int n = 0;
> +int n = 0; /* sectors */
>  void *buf;
> 
>  if (!bs->backing) {
> @@ -126,7 +125,6 @@ static void coroutine_fn stream_run(void *opaque)
>  goto out;
>  }
> 
> -end = s->common.len >> BDRV_SECTOR_BITS;
>  buf = qemu_blockalign(bs, STREAM_BUFFER_SIZE);
> 
>  /* Turn on copy-on-read for the whole block device so that guest read
> @@ -138,7 +136,7 @@ static void coroutine_fn stream_run(void *opaque)
>  bdrv_enable_copy_on_read(bs);
>  }
> 
> -for (sector_num = 0; sector_num < end; sector_num += n) {
> +for ( ; offset < s->common.len; offset += n * BDRV_SECTOR_SIZE) {
>  bool copy;
> 
>  /* Note that even when no rate limit is applied we need to yield
> @@ -151,28 +149,26 @@ static void coroutine_fn stream_run(void *opaque)
> 
>  copy = false;
> 
> -ret = bdrv_is_allocated(bs, sector_num,
> +ret = bdrv_is_allocated(bs, offset / BDRV_SECTOR_SIZE,
>  STREAM_BUFFER_SIZE / BDRV_SECTOR_SIZE, &n);
>  if (ret == 1) {
>  /* Allocated in the top, no need to copy.  */
>  } else if (ret >= 0) {
>  /* Copy if allocated in the intermediate images.  Limit to the
> - * known-unallocated area [sector_num, sector_num+n).  */
> + * known-unallocated area [offset, offset+n*BDRV_SECTOR_SIZE).  
> */
>  ret = bdrv_is_allocated_above(backing_bs(bs), base,
> -  sector_num, n, &n);
> +  offset / BDRV_SECTOR_SIZE, n, &n);
> 
>  /* Finish early if end of backing file has been reached */
>  if (ret == 0 && n == 0) {
> -n = end - sector_num;
> +n = (s->common.len - offset) / BDRV_SECTOR_SIZE;
>  }
> 
>  copy = (ret == 1);
>  }
> -trace_stream_one_iteration(s, sector_num * BDRV_SECTOR_SIZE,
> -   n * BDRV_SECTOR_SIZE, ret);
> +trace_stream_one_iteration(s, offset, n * BDRV_SECTOR_SIZE, ret);
>  if (copy) {
> -ret = stream_populate(blk, sector_num * BDRV_SECTOR_SIZE,
> -  n * BDRV_SECTOR_SIZE, buf);
> +ret = stream_populate(blk, offset, n * BDRV_SECTOR_SIZE, buf);
>  }
>  if (ret < 0) {
>  BlockErrorAction action =
> @@ -211,7 +207,7 @@ out:
>  /* Modify backing chain and close BDSes in main loop */
>  data = g_malloc(sizeof(*data));
>  data->ret = ret;
> -data->reached_end = sector_num == end;
> +data->reached_end = offset == s->common.len;
>  block_job_defer_to_main_loop(&s->common, stream_complete, data);
>  }
> 
> -- 
> 2.9.4
> 



[Qemu-devel] [PATCH v2] tests: Avoid non-portable 'echo -ARG'

2017-06-30 Thread Eric Blake
POSIX says that backslashes in the arguments to 'echo', as well as
any use of 'echo -n' and 'echo -e', are non-portable; it recommends
people should favor 'printf' instead.  This is definitely true where
we do not control which shell is running (such as in makefile snippets
or in documentation examples).  But even for scripts where we
require bash (and therefore, where echo does what we want by default),
it is still possible to use 'shopt -s xpg_echo' to change bash's
behavior of echo.  And setting a good example never hurts when we are
not sure if a snippet will be copied from a bash-only script to a
general shell script (although I don't change the use of non-portable
\e for ESC when we know the running shell is bash).

Replace 'echo -n "..."' with 'printf %s "..."', and 'echo -e "..."'
with 'printf %b "...\n"', with the optimization that the %s/%b
argument can be omitted if the string being printed contains no
substitutions that could result in '%' (trivial if there is no
$ or `, but also possible when using things like $ret that are
known to be numeric given the assignment to ret just above).

In the qemu-iotests check script, fix unusual shell quoting
that would result in word-splitting if 'date' outputs a space.

In test 051, take an opportunity to shorten the line.

In test 068, get rid of a pointless second invocation of bash.

CC: qemu-triv...@nongnu.org
Signed-off-by: Eric Blake 

---
v2: be robust to potential % in substitutions [Max], rebase to master,
shorten some long lines and an odd bash -c use

Add qemu-trivial in cc, although we may decide this is better through
Max's block tree since it is mostly iotests related

---
 qemu-options.hx |  4 ++--
 tests/multiboot/run_test.sh | 10 +-
 tests/qemu-iotests/051  |  7 ---
 tests/qemu-iotests/068  |  2 +-
 tests/qemu-iotests/142  | 48 ++---
 tests/qemu-iotests/171  | 14 ++---
 tests/qemu-iotests/check| 18 -
 tests/rocker/all| 10 +-
 tests/tcg/cris/Makefile |  8 
 9 files changed, 61 insertions(+), 60 deletions(-)

diff --git a/qemu-options.hx b/qemu-options.hx
index 297bd8a..05a0474 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -4363,7 +4363,7 @@ The simplest (insecure) usage is to provide the secret 
inline

 The simplest secure usage is to provide the secret via a file

- # echo -n "letmein" > mypasswd.txt
+ # printf "letmein" > mypasswd.txt
  # $QEMU -object secret,id=sec0,file=mypasswd.txt,format=raw

 For greater security, AES-256-CBC should be used. To illustrate usage,
@@ -4391,7 +4391,7 @@ telling openssl to base64 encode the result, but it could 
be left
 as raw bytes if desired.

 @example
- # SECRET=$(echo -n "letmein" |
+ # SECRET=$(printf "letmein" |
 openssl enc -aes-256-cbc -a -K $KEY -iv $IV)
 @end example

diff --git a/tests/multiboot/run_test.sh b/tests/multiboot/run_test.sh
index 78d7edf..9a1cd59 100755
--- a/tests/multiboot/run_test.sh
+++ b/tests/multiboot/run_test.sh
@@ -26,7 +26,7 @@ run_qemu() {
 local kernel=$1
 shift

-echo -e "\n\n=== Running test case: $kernel $@ ===\n" >> test.log
+printf %b "\n\n=== Running test case: $kernel $@ ===\n\n" >> test.log

 $QEMU \
 -kernel $kernel \
@@ -68,21 +68,21 @@ for t in mmap modules; do
 pass=1

 if [ $debugexit != 1 ]; then
-echo -e "\e[31m ?? \e[0m $t (no debugexit used, exit code $ret)"
+printf "\e[31m ?? \e[0m $t (no debugexit used, exit code $ret)\n"
 pass=0
 elif [ $ret != 0 ]; then
-echo -e "\e[31mFAIL\e[0m $t (exit code $ret)"
+printf "\e[31mFAIL\e[0m $t (exit code $ret)\n"
 pass=0
 fi

 if ! diff $t.out test.log > /dev/null 2>&1; then
-echo -e "\e[31mFAIL\e[0m $t (output difference)"
+printf "\e[31mFAIL\e[0m $t (output difference)\n"
 diff -u $t.out test.log
 pass=0
 fi

 if [ $pass == 1 ]; then
-echo -e "\e[32mPASS\e[0m $t"
+printf "\e[32mPASS\e[0m $t\n"
 fi

 done
diff --git a/tests/qemu-iotests/051 b/tests/qemu-iotests/051
index 26c29de..c3c08bf 100755
--- a/tests/qemu-iotests/051
+++ b/tests/qemu-iotests/051
@@ -217,7 +217,7 @@ run_qemu -drive driver=null-co,cache=invalid_value
 # Test 142 checks the direct=on cases

 for cache in writeback writethrough unsafe invalid_value; do
-echo -e "info block\ninfo block file\ninfo block backing\ninfo block 
backing-file" | \
+printf "info block %s\n" '' file backing backing-file | \
 run_qemu -drive 
file="$TEST_IMG",cache=$cache,backing.file.filename="$TEST_IMG.base",backing.cache.no-flush=on,backing.node-name=backing,backing.file.node-name=backing-file,file.node-name=file,if=none,id=$device_id
 -nodefaults
 done

@@ -325,8 +325,9 @@ echo "qemu-io $device_id \"write -P 0x22 0 4k\"" | run_qemu 
-drive file="$TEST_I

 $QEMU_IO -c "read -P 0x22 0 4k" "$TEST_IMG" | _filter_qemu_io

-echo -e 

Re: [Qemu-devel] [PATCH v3 03/20] stream: Switch stream_populate() to byte-based

2017-06-30 Thread Jeff Cody
On Tue, Jun 27, 2017 at 02:24:41PM -0500, Eric Blake wrote:
> We are gradually converting to byte-based interfaces, as they are
> easier to reason about than sector-based.  Start by converting an
> internal function (no semantic change).
> 
> Signed-off-by: Eric Blake 
> Reviewed-by: John Snow 
> 

Reviewed-by: Jeff Cody 

> ---
> v2: no change
> ---
>  block/stream.c | 11 ++-
>  1 file changed, 6 insertions(+), 5 deletions(-)
> 
> diff --git a/block/stream.c b/block/stream.c
> index 6cb3939..746d525 100644
> --- a/block/stream.c
> +++ b/block/stream.c
> @@ -41,20 +41,20 @@ typedef struct StreamBlockJob {
>  } StreamBlockJob;
> 
>  static int coroutine_fn stream_populate(BlockBackend *blk,
> -int64_t sector_num, int nb_sectors,
> +int64_t offset, uint64_t bytes,
>  void *buf)
>  {
>  struct iovec iov = {
>  .iov_base = buf,
> -.iov_len  = nb_sectors * BDRV_SECTOR_SIZE,
> +.iov_len  = bytes,
>  };
>  QEMUIOVector qiov;
> 
> +assert(bytes < SIZE_MAX);
>  qemu_iovec_init_external(&qiov, &iov, 1);
> 
>  /* Copy-on-read the unallocated clusters */
> -return blk_co_preadv(blk, sector_num * BDRV_SECTOR_SIZE, qiov.size, 
> &qiov,
> - BDRV_REQ_COPY_ON_READ);
> +return blk_co_preadv(blk, offset, qiov.size, &qiov, 
> BDRV_REQ_COPY_ON_READ);
>  }
> 
>  typedef struct {
> @@ -171,7 +171,8 @@ static void coroutine_fn stream_run(void *opaque)
>  trace_stream_one_iteration(s, sector_num * BDRV_SECTOR_SIZE,
> n * BDRV_SECTOR_SIZE, ret);
>  if (copy) {
> -ret = stream_populate(blk, sector_num, n, buf);
> +ret = stream_populate(blk, sector_num * BDRV_SECTOR_SIZE,
> +  n * BDRV_SECTOR_SIZE, buf);
>  }
>  if (ret < 0) {
>  BlockErrorAction action =
> -- 
> 2.9.4
> 



Re: [Qemu-devel] [PATCH v3 02/20] trace: Show blockjob actions via bytes, not sectors

2017-06-30 Thread Jeff Cody
On Tue, Jun 27, 2017 at 02:24:40PM -0500, Eric Blake wrote:
> Upcoming patches are going to switch to byte-based interfaces
> instead of sector-based.  Even worse, trace_backup_do_cow_enter()
> had a weird mix of cluster and sector indices.
> 
> The trace interface is low enough that there are no stability
> guarantees, and therefore nothing wrong with changing our units,
> even in cases like trace_backup_do_cow_skip() where we are not
> changing the trace output.  So make the tracing uniformly use
> bytes.
> 
> Signed-off-by: Eric Blake 
> Reviewed-by: John Snow 
> 

Reviewed-by: Jeff Cody 

> ---
> v2: improve commit message, no code change
> ---
>  block/backup.c | 16 ++--
>  block/commit.c |  3 ++-
>  block/mirror.c | 26 +-
>  block/stream.c |  3 ++-
>  block/trace-events | 14 +++---
>  5 files changed, 38 insertions(+), 24 deletions(-)
> 
> diff --git a/block/backup.c b/block/backup.c
> index 9ca1d8e..06431ac 100644
> --- a/block/backup.c
> +++ b/block/backup.c
> @@ -102,6 +102,7 @@ static int coroutine_fn backup_do_cow(BackupBlockJob *job,
>  void *bounce_buffer = NULL;
>  int ret = 0;
>  int64_t sectors_per_cluster = cluster_size_sectors(job);
> +int64_t bytes_per_cluster = sectors_per_cluster * BDRV_SECTOR_SIZE;
>  int64_t start, end;
>  int n;
> 
> @@ -110,18 +111,20 @@ static int coroutine_fn backup_do_cow(BackupBlockJob 
> *job,
>  start = sector_num / sectors_per_cluster;
>  end = DIV_ROUND_UP(sector_num + nb_sectors, sectors_per_cluster);
> 
> -trace_backup_do_cow_enter(job, start, sector_num, nb_sectors);
> +trace_backup_do_cow_enter(job, start * bytes_per_cluster,
> +  sector_num * BDRV_SECTOR_SIZE,
> +  nb_sectors * BDRV_SECTOR_SIZE);
> 
>  wait_for_overlapping_requests(job, start, end);
>  cow_request_begin(&cow_request, job, start, end);
> 
>  for (; start < end; start++) {
>  if (test_bit(start, job->done_bitmap)) {
> -trace_backup_do_cow_skip(job, start);
> +trace_backup_do_cow_skip(job, start * bytes_per_cluster);
>  continue; /* already copied */
>  }
> 
> -trace_backup_do_cow_process(job, start);
> +trace_backup_do_cow_process(job, start * bytes_per_cluster);
> 
>  n = MIN(sectors_per_cluster,
>  job->common.len / BDRV_SECTOR_SIZE -
> @@ -138,7 +141,7 @@ static int coroutine_fn backup_do_cow(BackupBlockJob *job,
>  bounce_qiov.size, _qiov,
>  is_write_notifier ? BDRV_REQ_NO_SERIALISING : 0);
>  if (ret < 0) {
> -trace_backup_do_cow_read_fail(job, start, ret);
> +trace_backup_do_cow_read_fail(job, start * bytes_per_cluster, 
> ret);
>  if (error_is_read) {
>  *error_is_read = true;
>  }
> @@ -154,7 +157,7 @@ static int coroutine_fn backup_do_cow(BackupBlockJob *job,
>   job->compress ? BDRV_REQ_WRITE_COMPRESSED : 
> 0);
>  }
>  if (ret < 0) {
> -trace_backup_do_cow_write_fail(job, start, ret);
> +trace_backup_do_cow_write_fail(job, start * bytes_per_cluster, 
> ret);
>  if (error_is_read) {
>  *error_is_read = false;
>  }
> @@ -177,7 +180,8 @@ out:
> 
>  cow_request_end(_request);
> 
> -trace_backup_do_cow_return(job, sector_num, nb_sectors, ret);
> +trace_backup_do_cow_return(job, sector_num * BDRV_SECTOR_SIZE,
> +   nb_sectors * BDRV_SECTOR_SIZE, ret);
> 
>  qemu_co_rwlock_unlock(>flush_rwlock);
> 
> diff --git a/block/commit.c b/block/commit.c
> index 6993994..4cda7f2 100644
> --- a/block/commit.c
> +++ b/block/commit.c
> @@ -190,7 +190,8 @@ static void coroutine_fn commit_run(void *opaque)
>COMMIT_BUFFER_SIZE / BDRV_SECTOR_SIZE,
>  &n);
>  copy = (ret == 1);
> -trace_commit_one_iteration(s, sector_num, n, ret);
> +trace_commit_one_iteration(s, sector_num * BDRV_SECTOR_SIZE,
> +   n * BDRV_SECTOR_SIZE, ret);
>  if (copy) {
>  ret = commit_populate(s->top, s->base, sector_num, n, buf);
>  bytes_written += n * BDRV_SECTOR_SIZE;
> diff --git a/block/mirror.c b/block/mirror.c
> index eb27efc..b4dfe95 100644
> --- a/block/mirror.c
> +++ b/block/mirror.c
> @@ -103,7 +103,8 @@ static void mirror_iteration_done(MirrorOp *op, int ret)
>  int64_t chunk_num;
>  int i, nb_chunks, sectors_per_chunk;
> 
> -trace_mirror_iteration_done(s, op->sector_num, op->nb_sectors, ret);
> +trace_mirror_iteration_done(s, op->sector_num * BDRV_SECTOR_SIZE,
> +op->nb_sectors * BDRV_SECTOR_SIZE, ret);
> 
>  

Re: [Qemu-devel] [PATCH v4 1/2] iotests: Use absolute paths for executables

2017-06-30 Thread Eric Blake
On 06/30/2017 02:41 PM, Eric Blake wrote:

> +++ 068.out.bad   2017-06-30 14:35:28.720241398 -0500
> @@ -1,4 +1,5 @@
>  QA output created by 068
> +realpath: '': No such file or directory
> 
> The culprit? $QEMU_VXHS_PROG is empty for me, which means `set_prog_path
> qnio_server` found nothing to use.  You'll have to add in a safety valve
> that only calls 'type' if operating on a non-empty path in the first place.

I'm using this locally, in the meantime:

diff --git i/tests/qemu-iotests/common.config
w/tests/qemu-iotests/common.config
index c1dc425..6f97331 100644
--- i/tests/qemu-iotests/common.config
+++ w/tests/qemu-iotests/common.config
@@ -107,7 +107,9 @@ export QEMU_PROG=$(realpath -- "$(type -p
"$QEMU_PROG")")
 export QEMU_IMG_PROG=$(realpath -- "$(type -p "$QEMU_IMG_PROG")")
 export QEMU_IO_PROG=$(realpath -- "$(type -p "$QEMU_IO_PROG")")
 export QEMU_NBD_PROG=$(realpath -- "$(type -p "$QEMU_NBD_PROG")")
-export QEMU_VXHS_PROG=$(realpath -- "$(type -p "$QEMU_VXHS_PROG")")
+if [ -n "$QEMU_VXHS_PROG" ]; then
+export QEMU_VXHS_PROG=$(realpath -- "$(type -p "$QEMU_VXHS_PROG")")
+fi

 _qemu_wrapper()
 {


Is qnio_server easily available for Fedora?

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [PATCH v3 01/20] blockjob: Track job ratelimits via bytes, not sectors

2017-06-30 Thread Jeff Cody
On Tue, Jun 27, 2017 at 02:24:39PM -0500, Eric Blake wrote:
> The user interface specifies job rate limits in bytes/second.
> It's pointless to have our internal representation track things
> in sectors/second, particularly since we want to move away from
> sector-based interfaces.
> 
> Fix up a doc typo found while verifying that the ratelimit
> code handles the scaling difference.
> 
> Repetition of expressions like 'n * BDRV_SECTOR_SIZE' will be
> cleaned up later when functions are converted to iterate over
> images by bytes rather than by sectors.
> 
> Signed-off-by: Eric Blake 
> Reviewed-by: John Snow 
> 

Reviewed-by: Jeff Cody 

> ---
> v2: adjust commit message based on review; no code change
> ---
>  include/qemu/ratelimit.h |  3 ++-
>  block/backup.c   |  5 +++--
>  block/commit.c   |  5 +++--
>  block/mirror.c   | 13 +++--
>  block/stream.c   |  5 +++--
>  5 files changed, 18 insertions(+), 13 deletions(-)
> 
> diff --git a/include/qemu/ratelimit.h b/include/qemu/ratelimit.h
> index 8da1232..8dece48 100644
> --- a/include/qemu/ratelimit.h
> +++ b/include/qemu/ratelimit.h
> @@ -24,7 +24,8 @@ typedef struct {
> 
>  /** Calculate and return delay for next request in ns
>   *
> - * Record that we sent @p n data units. If we may send more data units
> + * Record that we sent @n data units (where @n matches the scale chosen
> + * during ratelimit_set_speed). If we may send more data units
>   * in the current time slice, return 0 (i.e. no delay). Otherwise
>   * return the amount of time (in ns) until the start of the next time
>   * slice that will permit sending the next chunk of data.
> diff --git a/block/backup.c b/block/backup.c
> index 5387fbd..9ca1d8e 100644
> --- a/block/backup.c
> +++ b/block/backup.c
> @@ -208,7 +208,7 @@ static void backup_set_speed(BlockJob *job, int64_t 
> speed, Error **errp)
>  error_setg(errp, QERR_INVALID_PARAMETER, "speed");
>  return;
>  }
> -ratelimit_set_speed(&s->limit, speed / BDRV_SECTOR_SIZE, SLICE_TIME);
> +ratelimit_set_speed(&s->limit, speed, SLICE_TIME);
>  }
> 
>  static void backup_cleanup_sync_bitmap(BackupBlockJob *job, int ret)
> @@ -359,7 +359,8 @@ static bool coroutine_fn yield_and_check(BackupBlockJob 
> *job)
>   */
>  if (job->common.speed) {
>  uint64_t delay_ns = ratelimit_calculate_delay(&job->limit,
> -  job->sectors_read);
> +  job->sectors_read *
> +  BDRV_SECTOR_SIZE);
>  job->sectors_read = 0;
>  block_job_sleep_ns(&job->common, QEMU_CLOCK_REALTIME, delay_ns);
>  } else {
> diff --git a/block/commit.c b/block/commit.c
> index 524bd54..6993994 100644
> --- a/block/commit.c
> +++ b/block/commit.c
> @@ -209,7 +209,8 @@ static void coroutine_fn commit_run(void *opaque)
>  s->common.offset += n * BDRV_SECTOR_SIZE;
> 
>  if (copy && s->common.speed) {
> -delay_ns = ratelimit_calculate_delay(&s->limit, n);
> +delay_ns = ratelimit_calculate_delay(&s->limit,
> + n * BDRV_SECTOR_SIZE);
>  }
>  }
> 
> @@ -231,7 +232,7 @@ static void commit_set_speed(BlockJob *job, int64_t 
> speed, Error **errp)
>  error_setg(errp, QERR_INVALID_PARAMETER, "speed");
>  return;
>  }
> -ratelimit_set_speed(&s->limit, speed / BDRV_SECTOR_SIZE, SLICE_TIME);
> +ratelimit_set_speed(&s->limit, speed, SLICE_TIME);
>  }
> 
>  static const BlockJobDriver commit_job_driver = {
> diff --git a/block/mirror.c b/block/mirror.c
> index 61a862d..eb27efc 100644
> --- a/block/mirror.c
> +++ b/block/mirror.c
> @@ -396,7 +396,8 @@ static uint64_t coroutine_fn 
> mirror_iteration(MirrorBlockJob *s)
>  bitmap_set(s->in_flight_bitmap, sector_num / sectors_per_chunk, 
> nb_chunks);
>  while (nb_chunks > 0 && sector_num < end) {
>  int64_t ret;
> -int io_sectors, io_sectors_acct;
> +int io_sectors;
> +int64_t io_bytes_acct;
>  BlockDriverState *file;
>  enum MirrorMethod {
>  MIRROR_METHOD_COPY,
> @@ -444,16 +445,16 @@ static uint64_t coroutine_fn 
> mirror_iteration(MirrorBlockJob *s)
>  switch (mirror_method) {
>  case MIRROR_METHOD_COPY:
>  io_sectors = mirror_do_read(s, sector_num, io_sectors);
> -io_sectors_acct = io_sectors;
> +io_bytes_acct = io_sectors * BDRV_SECTOR_SIZE;
>  break;
>  case MIRROR_METHOD_ZERO:
>  case MIRROR_METHOD_DISCARD:
>  mirror_do_zero_or_discard(s, sector_num, io_sectors,
>mirror_method == 
> MIRROR_METHOD_DISCARD);
>  if (write_zeroes_ok) {
> -io_sectors_acct = 0;
> +io_bytes_acct = 0;
>  } else {
> 

Re: [Qemu-devel] [PATCH v4 1/2] iotests: Use absolute paths for executables

2017-06-30 Thread Eric Blake
On 06/29/2017 09:46 PM, Max Reitz wrote:

>>> +++ b/tests/qemu-iotests/common.config
>>> @@ -103,6 +103,12 @@ if [ -z "$QEMU_VXHS_PROG" ]; then
>>>  export QEMU_VXHS_PROG="`set_prog_path qnio_server`"
>>>  fi
>>>  
>>> +export QEMU_PROG=$(realpath -- "$(type -p "$QEMU_PROG")")
>>
>> ...now that you updated per my review to favor 'type' over 'which'?
>> Otherwise, the R-b stands.
> 
> Thanks, will fix and apply then...
> 
> ...and done, applied to my block branch:
> 
> https://github.com/XanClic/qemu/commits/block

Sorry for not noticing sooner, but you'll need to replace v4 (commit
0f0fec82 on your branch) with a fix, because now your branch does the
following on all iotests for me:

068 3s ... - output mismatch (see 068.out.bad)
--- /home/eblake/qemu/tests/qemu-iotests/068.out2017-06-26
22:02:56.057734882 -0500
+++ 068.out.bad 2017-06-30 14:35:28.720241398 -0500
@@ -1,4 +1,5 @@
 QA output created by 068
+realpath: '': No such file or directory

The culprit? $QEMU_VXHS_PROG is empty for me, which means `set_prog_path
qnio_server` found nothing to use.  You'll have to add in a safety valve
that only calls 'type' if operating on a non-empty path in the first place.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [PATCH v1] s390x/cpumodel: allow to enable "idtes" feature for TCG

2017-06-30 Thread Richard Henderson

On 06/29/2017 12:05 AM, Thomas Huth wrote:

However, I'm not sure whether you can simply ignore the clearing-by-ASCE
stuff in this case. For example, according to the PoP:

"When the clearing-by-ASCE-option bit (bit 52 of gen-
  eral register R2 is one), the M4 field is ignored."

And the idte helper function currently always takes the M4 field into
account...


I don't see that quote.  I see only

   Bit 2 of the M4 field is ignored for the
   clearing-by-ASCE operation and when EDAT-2
   does not apply.

We don't actually handle bit 2 at all...


r~



Re: [Qemu-devel] [PATCH RFC 0/6] q35: add acpi pci hotplug support

2017-06-30 Thread Michael S. Tsirkin
On Fri, Jun 30, 2017 at 10:25:05AM +0300, Marcel Apfelbaum wrote:
> On 30/06/2017 2:17, Michael S. Tsirkin wrote:
> > On Fri, Jun 30, 2017 at 12:55:56AM +0300, Aleksandr Bezzubikov wrote:
> > > The series adds hotplug support to legacy PCI buses for Q35 machines.
> > > The ACPI hotplug code is emitted if at least one legacy pci-bridge is 
> > > present.
> > > 
> > > This series is mostly based on past Marcel's series
> > > https://lists.gnu.org/archive/html/qemu-devel/2016-05/msg05681.html,
> > > but rebased on current master with some minor changes according to 
> > > current codebase.
> > > 
> > > ACPI code emission approach used in this series can be called "static",
> > > because it checks whether a bridge exists only at the initial DSDT generation 
> > > moment.
> > > The main goal is to enable AML PCI hotplug-related code to be generated 
> > > dynamically.
> > > In other words, when a bridge is plugged in, a new acpi definition block is
> > > loaded (using LoadTable method).
> > > This is necessary for PCIE-PCI bridge hotplugging feature.
> > 
> 
> Hi Michael,
> Thank you for looking into this.
> 
> > I wonder whether we really need ACPI hotplug.
> > Most modern systems are limited to native PCIE.
> > 
> 
> I was under the impression that ACPI hotplug will still remain
> for memory/cpu hotplug, but I might be wrong. If this
> is the case (ACPI hotplug is used for other subsystems),
> I don't see why modern systems would disable PCI ACPI hotplug.
> (I am not saying PCIe hotplug is not preferred.)
> 
> And if the modern systems are limited to native PCIe
> we might have a bigger problem since the QEMU default
> x86 machine is PCI based and we don't have PCIe obviously...
> maybe is this the right time to switch to Q35 as the default machine?
> (I am aware it should be a different discussion)
> 
> 
> > I do understand the need to add PCI devices sometimes,
> > but maybe we can find a way to do this with native hotplug.
> > 
> > For example, how about the following approach
> > 
> > 
> > - PCIE downstream port - X - PCIE-to-PCI bridge - PCI device
> > 
> 
> It can be a solution, yes, but then we would limit (seriously)
> the number of PCI devices we can add. It will not be a problem
> if we will continue to keep Q35 for "modern systems" only machine
> and keep PC around as the default machine. However adding full support
> for pci-bridge will allow us to switch to Q35 and use it
> as the default machine (sometime sooner) since it would
> be easier to map older configurations.

Part of the value from where I stand is dropping support for
a large matrix of configurations, without hurting users.
So we maintain piix but add features to q35 only.

Where are these pci devices coming from that you need such a large
number of them?  Part of the issue is things like sound, but these
aren't even always hotpluggable.

> > 
> > By hotplugging the bridge+device combination into a downstream
> > port (point X) we can potentially make devices enumerate
> > properly.
> > 
> 
> It may work, yes. Alexandr, will you be willing to try it?
> 
> > This might cause some issues with IO port assignment (uses 4K
> > io port space due to bridge aperture limitations)
> > but maybe we do not need so many legacy PCI devices -
> > people who do can simply use piix.
> > 
> 
> IO port assignment issue can be solved by using non standard
> IO port space, some OSes can actually deal with it (I think),
> however it will not solve the limitation of the number of
> pci devices we can have, since each device (PCIe or not) will be
> under a different bus we will have 256 PCI devices max.
> We can use multi-functions, but then the hot-plug granularity
> goes away.

That's quite a lot. SRIOV systems sometimes go higher because
people expose each VF as a separate device to the guest,
but these almost always
. are pcie
. have no io space

> > 
> > For this to work we need a way to create bridge instance
> > that is invisible to the guest. There is already a
> > way to do this for multifunction devices:
> > 
> > create bridge in function != 0
> > attach device
> > then create a dummy function 0
> > 
> > Inelegant - it would be cleaner to support this for function 0
> > as well - but should allow you to test the idea directly.
> > 
> 
> The benefit of this project is to make Q35 a "wider" machine
> that would include all previous QEMU devices and will facilitate
> the transition from the pc to the q35 machine.

But it's not a given that it's a win. We carry in a bunch of
crappy half-way supported devices. No way to drop them from piix.

> So for the modern systems not supporting PCI ACPI hotplug
> we don't need pci-bridges anyway, but for the older ones
> the ACPI code of the pci-bridge will be loaded into the
> ACPI namespace only if a pci-bridge is actually hot-plugged.
> 
> That being said, if we decide that q35 will be used for
> modern systems only and the pc machine will remain the
> default for the next years, we may try to bundle
> the pci-bridge with the 

[Qemu-devel] [PATCH] Python3 Support for qmp.py

2017-06-30 Thread Ishani Chugh
This patch intends to make qmp.py compatible with both python2 and python3.

Signed-off-by: Ishani Chugh 
---
 scripts/qmp/qmp.py | 66 +++---
 1 file changed, 43 insertions(+), 23 deletions(-)

diff --git a/scripts/qmp/qmp.py b/scripts/qmp/qmp.py
index 62d3651..9926c36 100644
--- a/scripts/qmp/qmp.py
+++ b/scripts/qmp/qmp.py
@@ -13,18 +13,23 @@ import errno
 import socket
 import sys
 
+
 class QMPError(Exception):
 pass
 
+
 class QMPConnectError(QMPError):
 pass
 
+
 class QMPCapabilitiesError(QMPError):
 pass
 
+
 class QMPTimeoutError(QMPError):
 pass
 
+
 class QEMUMonitorProtocol:
 def __init__(self, address, server=False, debug=False):
 """
@@ -42,6 +47,7 @@ class QEMUMonitorProtocol:
 self.__address = address
 self._debug = debug
 self.__sock = self.__get_sock()
+self.data = b""
 if server:
 self.__sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
 self.__sock.bind(self.__address)
@@ -56,23 +62,35 @@ class QEMUMonitorProtocol:
 
 def __negotiate_capabilities(self):
 greeting = self.__json_read()
-if greeting is None or not greeting.has_key('QMP'):
+if greeting is None or 'QMP' not in greeting:
 raise QMPConnectError
-# Greeting seems ok, negotiate capabilities
 resp = self.cmd('qmp_capabilities')
 if "return" in resp:
 return greeting
 raise QMPCapabilitiesError
 
+def __sock_readline(self):
+while True:
+ch = self.__sock.recv(1)
+if ch is None:
+if self.data:
+raise ValueError('socket closed mid-line')
+return None
+self.data += ch
+if ch == b'\n':
+line = self.data.decode('utf-8')
+self.data = b""
+return line
+
 def __json_read(self, only_event=False):
 while True:
-data = self.__sockfile.readline()
+data = self.__sock_readline()
 if not data:
 return
 resp = json.loads(data)
 if 'event' in resp:
 if self._debug:
-print >>sys.stderr, "QMP:<<< %s" % resp
+print("QMP:<<< %s" % resp)
 self.__events.append(resp)
 if not only_event:
 continue
@@ -87,18 +105,18 @@ class QEMUMonitorProtocol:
 @param wait (bool): block until an event is available.
 @param wait (float): If wait is a float, treat it as a timeout value.
 
-@raise QMPTimeoutError: If a timeout float is provided and the timeout
-period elapses.
-@raise QMPConnectError: If wait is True but no events could be 
retrieved
-or if some other error occurred.
+@raise QMPTimeoutError: If a timeout float is provided and the
+timeout period elapses.
+@raise QMPConnectError: If wait is True but no events could be
+retrieved or if some other error occurred.
 """
 
 # Check for new events regardless and pull them into the cache:
 self.__sock.setblocking(0)
 try:
-self.__json_read()
+test = self.__json_read()
 except socket.error as err:
-if err[0] == errno.EAGAIN:
+if err.errno == errno.EAGAIN:
 # No data available
 pass
 self.__sock.setblocking(1)
@@ -128,7 +146,7 @@ class QEMUMonitorProtocol:
 @raise QMPCapabilitiesError if fails to negotiate capabilities
 """
 self.__sock.connect(self.__address)
-self.__sockfile = self.__sock.makefile()
+self.__sockfile = self.__sock.makefile('rb')
 if negotiate:
 return self.__negotiate_capabilities()
 
@@ -143,7 +161,7 @@ class QEMUMonitorProtocol:
 """
 self.__sock.settimeout(15)
 self.__sock, _ = self.__sock.accept()
-self.__sockfile = self.__sock.makefile()
+self.__sockfile = self.__sock.makefile('rb')
 return self.__negotiate_capabilities()
 
 def cmd_obj(self, qmp_cmd):
@@ -155,16 +173,17 @@ class QEMUMonitorProtocol:
 been closed
 """
 if self._debug:
-print >>sys.stderr, "QMP:>>> %s" % qmp_cmd
+print("QMP:>>> %s" % qmp_cmd)
 try:
-self.__sock.sendall(json.dumps(qmp_cmd))
+command = json.dumps(qmp_cmd)
+self.__sock.sendall(command.encode('UTF-8'))
 except socket.error as err:
-if err[0] == errno.EPIPE:
+if err.errno == errno.EPIPE:
 return
-raise socket.error(err)
+raise
 resp = self.__json_read()
 if 

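For reference, a minimal usage sketch of the converted class (not part of
the patch; the QMP socket path is made up, and it assumes QEMU was started
with a matching -qmp unix socket option). It should behave the same under
python2 and python3 once the change above is applied:

import sys
sys.path.append('scripts/qmp')               # adjust to your QEMU source tree
from qmp import QEMUMonitorProtocol

mon = QEMUMonitorProtocol('/tmp/qmp-sock')   # or ('localhost', 4444) for TCP
mon.connect()                                # negotiates qmp_capabilities
print(mon.cmd('query-status'))               # e.g. {'return': {'status': 'running', ...}}
mon.close()
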
[Qemu-devel] [PULL 3/3] tcg: consistently access cpu->tb_jmp_cache atomically

2017-06-30 Thread Richard Henderson
From: "Emilio G. Cota" 

Some code paths can lead to atomic accesses racing with memset()
on cpu->tb_jmp_cache, which can result in torn reads/writes
and is undefined behaviour in C11.

These torn accesses are unlikely to show up as bugs, but from code
inspection they seem possible. For example, tb_phys_invalidate does:
/* remove the TB from the hash list */
h = tb_jmp_cache_hash_func(tb->pc);
CPU_FOREACH(cpu) {
if (atomic_read(&cpu->tb_jmp_cache[h]) == tb) {
atomic_set(&cpu->tb_jmp_cache[h], NULL);
}
}
Here atomic_set might race with a concurrent memset (such as the
ones scheduled via "unsafe" async work, e.g. tlb_flush_page) and
therefore we might end up with a torn pointer (or who knows what,
because we are under undefined behaviour).

This patch converts parallel accesses to cpu->tb_jmp_cache to use
atomic primitives, thereby bringing these accesses back to defined
behaviour. The price to pay is to potentially execute more instructions
when clearing cpu->tb_jmp_cache, but given how infrequently they happen
and the small size of the cache, the performance impact I have measured
is within noise range when booting debian-arm.

Note that under "safe async" work (e.g. do_tb_flush) we could use memset
because no other vcpus are running. However I'm keeping these accesses
atomic as well to keep things simple and to avoid confusing analysis
tools such as ThreadSanitizer.

Reviewed-by: Paolo Bonzini 
Reviewed-by: Richard Henderson 
Signed-off-by: Emilio G. Cota 
Message-Id: <1497486973-25845-1-git-send-email-c...@braap.org>
Signed-off-by: Richard Henderson 
---
 include/qom/cpu.h | 11 ++-
 accel/tcg/cputlb.c|  4 ++--
 accel/tcg/translate-all.c | 26 --
 qom/cpu.c |  5 +
 4 files changed, 25 insertions(+), 21 deletions(-)

diff --git a/include/qom/cpu.h b/include/qom/cpu.h
index 89ddb68..2fe7cff 100644
--- a/include/qom/cpu.h
+++ b/include/qom/cpu.h
@@ -346,7 +346,7 @@ struct CPUState {
 
 void *env_ptr; /* CPUArchState */
 
-/* Writes protected by tb_lock, reads not thread-safe  */
+/* Accessed in parallel; all accesses must be atomic */
 struct TranslationBlock *tb_jmp_cache[TB_JMP_CACHE_SIZE];
 
 struct GDBRegisterState *gdb_regs;
@@ -422,6 +422,15 @@ extern struct CPUTailQ cpus;
 
 extern __thread CPUState *current_cpu;
 
+static inline void cpu_tb_jmp_cache_clear(CPUState *cpu)
+{
+unsigned int i;
+
+for (i = 0; i < TB_JMP_CACHE_SIZE; i++) {
+atomic_set(&cpu->tb_jmp_cache[i], NULL);
+}
+}
+
 /**
  * qemu_tcg_mttcg_enabled:
  * Check whether we are running MultiThread TCG or not.
diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
index 1900936..85635ae 100644
--- a/accel/tcg/cputlb.c
+++ b/accel/tcg/cputlb.c
@@ -118,7 +118,7 @@ static void tlb_flush_nocheck(CPUState *cpu)
 
 memset(env->tlb_table, -1, sizeof(env->tlb_table));
 memset(env->tlb_v_table, -1, sizeof(env->tlb_v_table));
-memset(cpu->tb_jmp_cache, 0, sizeof(cpu->tb_jmp_cache));
+cpu_tb_jmp_cache_clear(cpu);
 
 env->vtlb_index = 0;
 env->tlb_flush_addr = -1;
@@ -183,7 +183,7 @@ static void tlb_flush_by_mmuidx_async_work(CPUState *cpu, 
run_on_cpu_data data)
 }
 }
 
-memset(cpu->tb_jmp_cache, 0, sizeof(cpu->tb_jmp_cache));
+cpu_tb_jmp_cache_clear(cpu);
 
 tlb_debug("done\n");
 
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index f6ad46b..93fb923 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -928,11 +928,7 @@ static void do_tb_flush(CPUState *cpu, run_on_cpu_data 
tb_flush_count)
 }
 
 CPU_FOREACH(cpu) {
-int i;
-
-for (i = 0; i < TB_JMP_CACHE_SIZE; ++i) {
-atomic_set(&cpu->tb_jmp_cache[i], NULL);
-}
+cpu_tb_jmp_cache_clear(cpu);
 }
 
 tcg_ctx.tb_ctx.nb_tbs = 0;
@@ -1813,19 +1809,21 @@ void cpu_io_recompile(CPUState *cpu, uintptr_t retaddr)
 cpu_loop_exit_noexc(cpu);
 }
 
-void tb_flush_jmp_cache(CPUState *cpu, target_ulong addr)
+static void tb_jmp_cache_clear_page(CPUState *cpu, target_ulong page_addr)
 {
-unsigned int i;
+unsigned int i, i0 = tb_jmp_cache_hash_page(page_addr);
 
+for (i = 0; i < TB_JMP_PAGE_SIZE; i++) {
+atomic_set(&cpu->tb_jmp_cache[i0 + i], NULL);
+}
+}
+
+void tb_flush_jmp_cache(CPUState *cpu, target_ulong addr)
+{
 /* Discard jump cache entries for any tb which might potentially
overlap the flushed page.  */
-i = tb_jmp_cache_hash_page(addr - TARGET_PAGE_SIZE);
-memset(&cpu->tb_jmp_cache[i], 0,
-   TB_JMP_PAGE_SIZE * sizeof(TranslationBlock *));
-
-i = tb_jmp_cache_hash_page(addr);
-memset(&cpu->tb_jmp_cache[i], 0,
-   TB_JMP_PAGE_SIZE * sizeof(TranslationBlock *));
+tb_jmp_cache_clear_page(cpu, addr - TARGET_PAGE_SIZE);
+tb_jmp_cache_clear_page(cpu, addr);
 }
 
 static void 

[Qemu-devel] [PULL 2/3] gen-icount: use tcg_ctx.tcg_env instead of cpu_env

2017-06-30 Thread Richard Henderson
From: "Emilio G. Cota" 

We are relying on cpu_env being defined as a global, yet most
targets (i.e. all but arm/a64) have it defined as a local variable.
Luckily all of them use the same "cpu_env" name, but really
compilation shouldn't break if the name of that local variable
changed.

Fix it by using tcg_ctx.tcg_env, which all targets set in their
translate_init function. This change also helps paving the way
for the upcoming "translation loop common to all targets" work.

Reviewed-by: Richard Henderson 
Signed-off-by: Emilio G. Cota 
Message-Id: <1497639397-19453-3-git-send-email-c...@braap.org>
Signed-off-by: Richard Henderson 
---
 include/exec/gen-icount.h | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/include/exec/gen-icount.h b/include/exec/gen-icount.h
index 547c979..9b3cb14 100644
--- a/include/exec/gen-icount.h
+++ b/include/exec/gen-icount.h
@@ -19,7 +19,7 @@ static inline void gen_tb_start(TranslationBlock *tb)
 count = tcg_temp_new_i32();
 }
 
-tcg_gen_ld_i32(count, cpu_env,
+tcg_gen_ld_i32(count, tcg_ctx.tcg_env,
-ENV_OFFSET + offsetof(CPUState, icount_decr.u32));
 
 if (tb->cflags & CF_USE_ICOUNT) {
@@ -37,7 +37,7 @@ static inline void gen_tb_start(TranslationBlock *tb)
 tcg_gen_brcondi_i32(TCG_COND_LT, count, 0, exitreq_label);
 
 if (tb->cflags & CF_USE_ICOUNT) {
-tcg_gen_st16_i32(count, cpu_env,
+tcg_gen_st16_i32(count, tcg_ctx.tcg_env,
  -ENV_OFFSET + offsetof(CPUState, 
icount_decr.u16.low));
 }
 
@@ -62,14 +62,16 @@ static inline void gen_tb_end(TranslationBlock *tb, int 
num_insns)
 static inline void gen_io_start(void)
 {
 TCGv_i32 tmp = tcg_const_i32(1);
-tcg_gen_st_i32(tmp, cpu_env, -ENV_OFFSET + offsetof(CPUState, can_do_io));
+tcg_gen_st_i32(tmp, tcg_ctx.tcg_env,
+   -ENV_OFFSET + offsetof(CPUState, can_do_io));
 tcg_temp_free_i32(tmp);
 }
 
 static inline void gen_io_end(void)
 {
 TCGv_i32 tmp = tcg_const_i32(0);
-tcg_gen_st_i32(tmp, cpu_env, -ENV_OFFSET + offsetof(CPUState, can_do_io));
+tcg_gen_st_i32(tmp, tcg_ctx.tcg_env,
+   -ENV_OFFSET + offsetof(CPUState, can_do_io));
 tcg_temp_free_i32(tmp);
 }
 
-- 
2.9.4




[Qemu-devel] [PULL 1/3] gen-icount: add missing inline to gen_tb_end

2017-06-30 Thread Richard Henderson
From: "Emilio G. Cota" 

Reviewed-by: Richard Henderson 
Signed-off-by: Emilio G. Cota 
Message-Id: <1497639397-19453-2-git-send-email-c...@braap.org>
Signed-off-by: Richard Henderson 
---
 include/exec/gen-icount.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/exec/gen-icount.h b/include/exec/gen-icount.h
index 62d462e..547c979 100644
--- a/include/exec/gen-icount.h
+++ b/include/exec/gen-icount.h
@@ -44,7 +44,7 @@ static inline void gen_tb_start(TranslationBlock *tb)
 tcg_temp_free_i32(count);
 }
 
-static void gen_tb_end(TranslationBlock *tb, int num_insns)
+static inline void gen_tb_end(TranslationBlock *tb, int num_insns)
 {
 if (tb->cflags & CF_USE_ICOUNT) {
 /* Update the num_insn immediate parameter now that we know
-- 
2.9.4




[Qemu-devel] [PULL 0/3] Queued TCG patches

2017-06-30 Thread Richard Henderson
None of my TCGTemp patches for now; I'm still trying to understand
how they might (or might not) conflict with multi-threaded code gen
for TCG.


r~


The following changes since commit 82d76dc7fc19a5eb9f731d7faed1792bb97214e0:

  Merge remote-tracking branch 'remotes/famz/tags/block-pull-request' into 
staging (2017-06-30 16:29:51 +0100)

are available in the git repository at:

  git://github.com/rth7680/qemu.git tags/pull-tcg-20170603

for you to fetch changes up to f3ced3c59287dabc253f83f0c70aa4934470c15e:

  tcg: consistently access cpu->tb_jmp_cache atomically (2017-06-30 11:40:59 
-0700)


Queued TCG patches


Emilio G. Cota (3):
  gen-icount: add missing inline to gen_tb_end
  gen-icount: use tcg_ctx.tcg_env instead of cpu_env
  tcg: consistently access cpu->tb_jmp_cache atomically

 include/exec/gen-icount.h | 12 +++-
 include/qom/cpu.h | 11 ++-
 accel/tcg/cputlb.c|  4 ++--
 accel/tcg/translate-all.c | 26 --
 qom/cpu.c |  5 +
 5 files changed, 32 insertions(+), 26 deletions(-)



Re: [Qemu-devel] [PATCH v11 01/29] Pass generic CPUState to gen_intermediate_code()

2017-06-30 Thread Richard Henderson

On 06/29/2017 03:52 PM, Emilio G. Cota wrote:

On Wed, Jun 28, 2017 at 15:20:42 +0300, Lluís Vilanova wrote:

Needed to implement a target-agnostic gen_intermediate_code() in the
future.

Signed-off-by: Lluís Vilanova 
Reviewed-by: David Gibson 
Reviewed-by: Richard Henderson 
---

(snip)

-void gen_intermediate_code(CPUAlphaState *env, struct TranslationBlock *tb)
+void gen_intermediate_code(CPUState *cpu, struct TranslationBlock *tb)
  {
-AlphaCPU *cpu = alpha_env_get_cpu(env);
-CPUState *cs = CPU(cpu);
+CPUAlphaState *env = cpu->env_ptr;


I'd keep the original variable names, i.e. cs for CPUState in this case,
just like you did for a64:


-void gen_intermediate_code_a64(ARMCPU *cpu, TranslationBlock *tb)
+void gen_intermediate_code_a64(CPUState *cs, TranslationBlock *tb)
  {
-CPUState *cs = CPU(cpu);
-CPUARMState *env = >env;
+CPUARMState *env = cs->env_ptr;
+ARMCPU *cpu = arm_env_get_cpu(env);


This will keep the diff size to a minimum.


I asked for the same thing many revisions ago.


r~



[Qemu-devel] [PATCH] target/ppc: Use tcg_gen_lookup_and_goto_ptr

2017-06-30 Thread Richard Henderson
Cc: qemu-...@nongnu.org
Signed-off-by: Richard Henderson 
---
 target/ppc/translate.c | 23 ---
 1 file changed, 8 insertions(+), 15 deletions(-)

diff --git a/target/ppc/translate.c b/target/ppc/translate.c
index c0cd64d..9aa66f5 100644
--- a/target/ppc/translate.c
+++ b/target/ppc/translate.c
@@ -3419,7 +3419,7 @@ static inline bool use_goto_tb(DisasContext *ctx, 
target_ulong dest)
 }
 
 /***Branch ***/
-static inline void gen_goto_tb(DisasContext *ctx, int n, target_ulong dest)
+static void gen_goto_tb(DisasContext *ctx, int n, target_ulong dest)
 {
 if (NARROW_MODE(ctx)) {
 dest = (uint32_t) dest;
@@ -3441,7 +3441,7 @@ static inline void gen_goto_tb(DisasContext *ctx, int n, 
target_ulong dest)
 gen_debug_exception(ctx);
 }
 }
-tcg_gen_exit_tb(0);
+tcg_gen_lookup_and_goto_ptr(cpu_nip);
 }
 }
 
@@ -3479,7 +3479,7 @@ static void gen_b(DisasContext *ctx)
 #define BCOND_CTR 2
 #define BCOND_TAR 3
 
-static inline void gen_bcond(DisasContext *ctx, int type)
+static void gen_bcond(DisasContext *ctx, int type)
 {
 uint32_t bo = BO(ctx->opcode);
 TCGLabel *l1;
@@ -3543,26 +3543,19 @@ static inline void gen_bcond(DisasContext *ctx, int 
type)
 } else {
 gen_goto_tb(ctx, 0, li);
 }
-if ((bo & 0x14) != 0x14) {
-gen_set_label(l1);
-gen_goto_tb(ctx, 1, ctx->nip);
-}
 } else {
 if (NARROW_MODE(ctx)) {
 tcg_gen_andi_tl(cpu_nip, target, (uint32_t)~3);
 } else {
 tcg_gen_andi_tl(cpu_nip, target, ~3);
 }
-tcg_gen_exit_tb(0);
-if ((bo & 0x14) != 0x14) {
-gen_set_label(l1);
-gen_update_nip(ctx, ctx->nip);
-tcg_gen_exit_tb(0);
-}
-}
-if (type == BCOND_LR || type == BCOND_CTR || type == BCOND_TAR) {
+tcg_gen_lookup_and_goto_ptr(cpu_nip);
 tcg_temp_free(target);
 }
+if ((bo & 0x14) != 0x14) {
+gen_set_label(l1);
+gen_goto_tb(ctx, 1, ctx->nip);
+}
 }
 
 static void gen_bc(DisasContext *ctx)
-- 
2.9.4




Re: [Qemu-devel] [PATCH v22 19/30] qcow2: add persistent dirty bitmaps support

2017-06-30 Thread Eric Blake
On 06/30/2017 12:58 PM, John Snow wrote:

>>
>> "Structure of a bitmap directory entry:
>> ...
>>  8 - 11:bitmap_table_size
>> Number of entries in the bitmap table of the bitmap."
>>
> 
> This is the number of bitmaps stored in the qcow2, not the size of one
> particular bitmap.

No, I was quoting from "Structure of a bitmap directory entry" (the same
struct that also includes a per-bitmap granularity).

However, on re-reading the difference between the bitmap table and the
bitmap data, I see that the bitmap_table_size it formally the number of
cluster mappings that the bitmap occupies - so while it is not a precise
size of the bitmap, it IS an approximation (where scaling has introduced
lost precision for all sizes that map within the final cluster).  But my
point remains - we have some imprecision on whether bitmap_table_size
has to be clamped to the same number as you would get if the current
virtual disk size is converted into bitmaps, or whether it is valid to
have a qcow2 file where a bitmap table contains fewer cluster mappings
than would be required to cover the whole virtual file, or conversely
contains more cluster mappings than what you get even when rounding the
virtual table size up to the bitmap granularity and further scaled by
how many bits fit per cluster.

Concretely, if I'm using 512-byte clusters (the qcow2 minimum), and
qcow2 forces me to use a bitmap granularity no smaller than 9 (each bit
of the bitmap covers 512 bytes), then one cluster of bitmap data covers
2M of virtual size.  If I have an image with a virtual size of exactly
4M, and a bitmap covering that image, then my bitmap table _should_ have
exactly two mappings (bitmap_table_size should be 2, because it required
2 8-byte entries in the table to give the mapping for the 2 clusters
used to contain all the bits of the bitmap; the remaining 496 bytes of
the bitmap table should be 0 because there are no further mappings).
But note that any size in the range (2M, 4M] has the same
bitmap_table_size of 2 for that granularity.
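
To make the arithmetic above easy to double-check, here is a throwaway
sketch (plain Python, not taken from the qcow2 code; the function name and
defaults are made up for illustration) that reproduces the mapping just
described:

def bitmap_table_size(virtual_size, cluster_size=512, granularity_bits=9):
    """Bitmap-table entries needed to cover virtual_size bytes."""
    bytes_per_bit = 1 << granularity_bits            # bytes covered by one bit
    bits_per_cluster = cluster_size * 8              # one cluster of bitmap data
    bytes_per_cluster = bits_per_cluster * bytes_per_bit
    return -(-virtual_size // bytes_per_cluster)     # ceiling division

MiB = 1024 * 1024
print(bitmap_table_size(4 * MiB))                       # -> 2
print(bitmap_table_size(2 * MiB + 1))                   # -> 2, any size in (2M, 4M]
print(bitmap_table_size(4 * MiB, granularity_bits=10))  # -> 1
print(bitmap_table_size(2 * MiB, granularity_bits=10))  # -> 1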

If the bitmap is tied to the image (has the 'auto' and/or 'dirty' bit
set), then I agree that the bitmap size should match the virtual image
size.  But if the bitmap is independent, what technical reason do we
have that prevents us from having a bitmap covering 2M or less of data
(bitmap_table_size of 1), or more than 4M of data (bitmap_table_size of
3 or more), even though it has no relation to the current virtual image
size of 4M?

Meanwhile, if I use a bitmap granularity of 1k instead of 512, in the
same image with 512-byte clusters, a virtual size of 2M is
indistinguishable from a virtual size of 4M (both bitmaps fit within a
single cluster, or bitmap_table_size of 1; although the 2M case only
uses 256 of the 512 bytes in the bitmap data cluster).

I'm worried that we are too strict in stating that ALL bitmaps are tied
to the current virtual image size, but I'm also worried that our spec
might not be strict enough in enforcing that certain bitmaps MUST match
the virtual size (if the bitmap is automatically tracking writes to the
virtual image); and also worried whether we are setting ourselves up for
obscure failures based on the interaction of resize and/or internal
snapshots when bitmaps are in play (or whether we are being conservative
and forbidding those interactions until we've had time to further prove
that we handle their interaction safely).

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [PATCH v22 19/30] qcow2: add persistent dirty bitmaps support

2017-06-30 Thread John Snow


On 06/30/2017 01:47 PM, Eric Blake wrote:
> On 06/29/2017 09:23 PM, Max Reitz wrote:
>> On 2017-06-30 04:18, Eric Blake wrote:
>>> On 06/28/2017 07:05 AM, Vladimir Sementsov-Ogievskiy wrote:
 Store persistent dirty bitmaps in qcow2 image.

 Signed-off-by: Vladimir Sementsov-Ogievskiy 
 Reviewed-by: Max Reitz 
 ---
> 
>>>
>>> This grabs the size (currently in sectors, although I plan to fix it to
>>> be in bytes)...
>>>
 +const char *bm_name = bdrv_dirty_bitmap_name(bitmap);
 +uint8_t *buf = NULL;
 +BdrvDirtyBitmapIter *dbi;
 +uint64_t *tb;
 +uint64_t tb_size =
 +size_to_clusters(s,
 +bdrv_dirty_bitmap_serialization_size(bitmap, 0, bm_size));
>>>
>>> ...then finds out how many bytes are required to serialize the entire
>>> image, where bm_size should be the same as before...
>>>
 +while ((sector = bdrv_dirty_iter_next(dbi)) != -1) {
 +uint64_t cluster = sector / sbc;
 +uint64_t end, write_size;
 +int64_t off;
 +
 +sector = cluster * sbc;
 +end = MIN(bm_size, sector + sbc);
 +write_size =
 +bdrv_dirty_bitmap_serialization_size(bitmap, sector, end - 
 sector);
>>>
>>> But here, rather than tackling the entire image at once, you are
>>> subdividing your queries along arbitrary lines. But nowhere do I see a
>>> call to bdrv_dirty_bitmap_serialization_align() to make sure your
>>> end-sector value is properly aligned; if it is not aligned, you will
>>> trigger an assertion failure here...
>>
>> It's 4:21 am here, so I cannot claim to be right, but if I remember
>> correctly, it will automatically aligned because sbc is the number of
>> bits (and thus sectors) a bitmap cluster covers.
> 
> Okay, I re-read the spec.  First thing I noticed: we have a potential
> conflict if an image is allowed to be resized:
> 
> "All stored bitmaps are related to the virtual disk stored in the same
> image, so each bitmap size is equal to the virtual disk size."
> 
> If you resize an image, does the bitmap size have to be adjusted as
> well?  What if you create one bitmap, then take an internal snapshot,
> then resize?  Or do we declare that (at least for now) the presence of a
> bitmap is incompatible with the use of an internal snapshot?
> 
> Conversely, we state that:
> 
> 
> "Structure of a bitmap directory entry:
> ...
>  8 - 11:bitmap_table_size
> Number of entries in the bitmap table of the bitmap."
> 

This is the number of bitmaps stored in the qcow2, not the size of one
particular bitmap.

> Since a bitmap therefore tracks its own size, I think the earlier
> statement that all bitmap sizes are equal to the virtual disk size is
> too strict (there seems to be no technical reason why a bitmap can't
> have a different size that the image).
> 
> 
> But, having read that, you are correct that we are subdividing our
> bitmaps according to what fits in a qcow2 cluster, and the smallest
> qcow2 cluster we can come up with is 512-bytes (or 4k bits of the
> bitmap).  Checking hbitmap.c, we are merely asserting that our alignment
> is always a multiple of 64 << hb->granularity.  But since we are
> partitioning a cluster at a time, our minimum alignment will be 512 <<
> hb->granularity, which is always aligned.
> 
> So all I need to do is add an assertion.
> 
>> I'm for fixing it later. I would have announced the series "applied" an
>> hour ago if I hadn't had to bisect iotest 055 breakage...
> 
> I'm working it into my v4 posting of dirty-bitmap cleanups.
> 



Re: [Qemu-devel] [PATCH v22 06/30] block/dirty-bitmap: add deserialize_ones func

2017-06-30 Thread John Snow


On 06/29/2017 10:01 PM, Eric Blake wrote:
> On 06/29/2017 08:55 PM, Eric Blake wrote:
>> On 06/28/2017 07:05 AM, Vladimir Sementsov-Ogievskiy wrote:
>>> Add bdrv_dirty_bitmap_deserialize_ones() function, which is needed for
>>> qcow2 bitmap loading, to handle unallocated bitmap parts, marked as
>>> all-ones.
>>>
> 
>>> + * hbitmap_deserialize_ones
>>> + * @hb: HBitmap to operate on.
>>> + * @start: First bit to restore.
>>> + * @count: Number of bits to restore.
>>
>> This part is accurate (the dirty-bitmap is using an underlying bitmap
>> with "one bit per sector" before my series, afterwards it will be "one
>> bit per byte", remembering that hbitmap really stores only one bit per
>> granularity multiple of the underlying unit), if incomplete (the code
>> asserts that things are aligned, but doesn't document that the caller
>> must pass in aligned values); but again, that's matching the
>> pre-existing deserialize_zeroes code.
> 
> Okay, I looked again; the documentation for
> hbitmap_serialization_granularity() has a blanket statement that all
> other hbitmap_serialization_* functions taking a start and count must be
> aligned. Indirect, but at least documented, so I retract my statement
> about the docs being incomplete.
> 

If the docs are confusing, please send patches to amend them; the bitmap
code is definitely very confusing at times and I appreciate any and all
insight to make the names, variable and documentation read better to be
more intuitive.

Thanks!



Re: [Qemu-devel] [PATCH v22 19/30] qcow2: add persistent dirty bitmaps support

2017-06-30 Thread Eric Blake
On 06/29/2017 09:23 PM, Max Reitz wrote:
> On 2017-06-30 04:18, Eric Blake wrote:
>> On 06/28/2017 07:05 AM, Vladimir Sementsov-Ogievskiy wrote:
>>> Store persistent dirty bitmaps in qcow2 image.
>>>
>>> Signed-off-by: Vladimir Sementsov-Ogievskiy 
>>> Reviewed-by: Max Reitz 
>>> ---

>>
>> This grabs the size (currently in sectors, although I plan to fix it to
>> be in bytes)...
>>
>>> +const char *bm_name = bdrv_dirty_bitmap_name(bitmap);
>>> +uint8_t *buf = NULL;
>>> +BdrvDirtyBitmapIter *dbi;
>>> +uint64_t *tb;
>>> +uint64_t tb_size =
>>> +size_to_clusters(s,
>>> +bdrv_dirty_bitmap_serialization_size(bitmap, 0, bm_size));
>>
>> ...then finds out how many bytes are required to serialize the entire
>> image, where bm_size should be the same as before...
>>
>>> +while ((sector = bdrv_dirty_iter_next(dbi)) != -1) {
>>> +uint64_t cluster = sector / sbc;
>>> +uint64_t end, write_size;
>>> +int64_t off;
>>> +
>>> +sector = cluster * sbc;
>>> +end = MIN(bm_size, sector + sbc);
>>> +write_size =
>>> +bdrv_dirty_bitmap_serialization_size(bitmap, sector, end - 
>>> sector);
>>
>> But here, rather than tackling the entire image at once, you are
>> subdividing your queries along arbitrary lines. But nowhere do I see a
>> call to bdrv_dirty_bitmap_serialization_align() to make sure your
>> end-sector value is properly aligned; if it is not aligned, you will
>> trigger an assertion failure here...
> 
> It's 4:21 am here, so I cannot claim to be right, but if I remember
> correctly, it will automatically aligned because sbc is the number of
> bits (and thus sectors) a bitmap cluster covers.

Okay, I re-read the spec.  First thing I noticed: we have a potential
conflict if an image is allowed to be resized:

"All stored bitmaps are related to the virtual disk stored in the same
image, so each bitmap size is equal to the virtual disk size."

If you resize an image, does the bitmap size have to be adjusted as
well?  What if you create one bitmap, then take an internal snapshot,
then resize?  Or do we declare that (at least for now) the presence of a
bitmap is incompatible with the use of an internal snapshot?

Conversely, we state that:


"Structure of a bitmap directory entry:
...
 8 - 11:bitmap_table_size
Number of entries in the bitmap table of the bitmap."

Since a bitmap therefore tracks its own size, I think the earlier
statement that all bitmap sizes are equal to the virtual disk size is
too strict (there seems to be no technical reason why a bitmap can't
have a different size than the image).


But, having read that, you are correct that we are subdividing our
bitmaps according to what fits in a qcow2 cluster, and the smallest
qcow2 cluster we can come up with is 512-bytes (or 4k bits of the
bitmap).  Checking hbitmap.c, we are merely asserting that our alignment
is always a multiple of 64 << hb->granularity.  But since we are
partitioning a cluster at a time, our minimum alignment will be 512 <<
hb->granularity, which is always aligned.

So all I need to do is add an assertion.
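
As a quick cross-check of that claim, a standalone sketch (plain Python,
not QEMU code; the variable names only loosely mirror qcow2-bitmap.c):
serialization chunks that are partitioned one bitmap-data cluster at a
time always start at a multiple of the required 64 << granularity
alignment, for any qcow2 cluster size of 512 bytes or more:

def chunk_is_aligned(cluster_size_bytes, granularity_bits):
    required = 64 << granularity_bits                       # hbitmap alignment rule
    covered = (cluster_size_bytes * 8) << granularity_bits  # items per bitmap cluster
    return covered % required == 0

for cs in (512, 1024, 4096, 65536):
    for g in range(16):
        assert chunk_is_aligned(cs, g)
print("cluster-sized chunks are always aligned")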

> I'm for fixing it later. I would have announced the series "applied" an
> hour ago if I hadn't had to bisect iotest 055 breakage...

I'm working it into my v4 posting of dirty-bitmap cleanups.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [RFC 5/5] vifo: introduce new VFIO ioctl VFIO_DEVICE_PCI_GET_DIRTY_BITMAP

2017-06-30 Thread Alex Williamson
On Fri, 30 Jun 2017 05:14:40 +
"Tian, Kevin"  wrote:

> > From: Alex Williamson [mailto:alex.william...@redhat.com]
> > Sent: Friday, June 30, 2017 4:57 AM
> > 
> > On Thu, 29 Jun 2017 00:10:59 +
> > "Tian, Kevin"  wrote:
> >   
> > > > From: Alex Williamson [mailto:alex.william...@redhat.com]
> > > > Sent: Thursday, June 29, 2017 12:00 AM
> > > > Thanks Kevin.  So really it's not really a dirty bitmap, it's just a
> > > > bitmap of pages that the device has access to and may have dirtied.
> > > > Don't we have this more generally in the vfio type1 IOMMU backend?  For
> > > > a mediated device, we know all the pages that the vendor driver has
> > > > asked to be pinned.  Should we perhaps make this interface on the vfio
> > > > container rather than the device?  Any mediated device can provide this
> > > > level of detail without specific vendor support.  If we had DMA page
> > > > faulting, this would be the natural place to put it as well, so maybe
> > > > we should design the interface there to support everything similarly.
> > > > Thanks,
> > > >  
> > >
> > > That's a nice idea. Just two comments:
> > >
> > > 1) If some mediated device has its own way to construct true dirty
> > > bitmap (not thru DMA page faulting), the interface is better designed
> > > to allow that flexibility. Maybe an optional callback if not registered
> > > then use common type1 IOMMU logic otherwise prefers to vendor
> > > specific callback  
> > 
> > I'm not sure what that looks like, but I agree with the idea.  Could
> > the pages that type1 knows about every be anything other than a
> > superset of the dirty pages?  Perhaps a device ioctl to flush unused
> > mappings would be sufficient.  
> 
> sorry I didn't quite get your idea here. My understanding is that
> type1 is OK as an alternative in case mediated device has no way
> to track dirtied pages (as for Intel GPU), so we can use type1 pinned
> pages as an indirect way to indicate dirtied pages. But if mediated
> device has its own way (e.g. a device private MMU) to track dirty
> pages, then we should allow that device to provide dirty bitmap
> instead of using type1.

My thought was that our current mdev iommu interface allows the vendor
driver to pin specific pages.  In order for the mdev device to dirty a
page, we need it to be pinned.  Therefore at worst, the set of
pages pinned in type1 is the superset of all pages that can potentially
be dirtied by the device.  In the worst case, this devolves to all
pages mapped through the iommu in the case of direct assigned devices.
My assertion is therefore that a device specific dirty page bitmap can
only be a subset of the type1 pinned pages.  Therefore if the mdev
vendor driver can flush any stale pinnings, then the type1 view of
pinned pages should match the device's view of the current working set.
Then we wouldn't need a device specific dirty bitmap, we'd only need a
mechanism to trigger a flush of stale mappings on the device.

Otherwise I'm not sure how we cleanly create an interface where the
dirty bitmap can either come from the device or the container... but
I'd welcome suggestions.  Thanks,

Alex
 
> > > 2) If there could be multiple mediated devices from different vendors
> > > in same container while not all mediated devices support live migration,
> > > would container-level interface impose some limitation?  
> > 
> > Dirty page logging is only one small part of migration, each
> > migrate-able device would still need to provide a device-level
> > interface to save/restore state.  The migration would fail when we get
> > to the device(s) that don't provide that.  Thanks,
> >   
> 
> Agree here. Yulei, can you investigate this direction and report back
> whether it's feasible or anything overlooked?
> 
> Thanks
> Kevin




Re: [Qemu-devel] [PULL 0/2] Block patches

2017-06-30 Thread Peter Maydell
On 30 June 2017 at 15:10, Fam Zheng <f...@redhat.com> wrote:
> The following changes since commit 36f87b4513373b3cd79c87c9197d17face95d4ac:
>
>   Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-2.10-20170630' 
> into staging (2017-06-30 11:58:49 +0100)
>
> are available in the git repository at:
>
>   git://github.com/famz/qemu.git tags/block-pull-request
>
> for you to fetch changes up to c61e684e44272f2acb2bef34cf2aa234582a73a9:
>
>   block: Exploit BDRV_BLOCK_EOF for larger zero blocks (2017-06-30 21:48:06 
> +0800)
>
> 
>
> Hi Peter,
>
> Here are Eric Blake's enhancement to block layer API. Thanks!
>
> 

Applied, thanks.

-- PMM



Re: [Qemu-devel] [PATCH] s390: return unavailable features via query-cpu-definitions

2017-06-30 Thread David Hildenbrand
On 30.06.2017 15:25, Viktor Mihajlovski wrote:
> The response for query-cpu-definitions didn't include the
> unavailable-features field, which is used by libvirt to figure
> out whether a certain cpu model is usable on the host.
> 
> The unavailable features are now computed by obtaining the host CPU
> model and comparing its feature bitmap with the feature bitmaps of
> the known CPU models.
> 
> I.e. the output of virsh domcapabilities would change from
>  ...
>  
>   z10EC-base
>   z9EC-base
>   z196.2-base
>   z900-base
>   z990
>  ...
> to
>  ...
>  
>   z10EC-base
>   z9EC-base
>   z196.2-base
>   z900-base
>   z990
>  ...
> 
> Signed-off-by: Viktor Mihajlovski 
> ---
>  target/s390x/cpu_models.c | 51 
> ++-
>  1 file changed, 46 insertions(+), 5 deletions(-)
> 
> diff --git a/target/s390x/cpu_models.c b/target/s390x/cpu_models.c
> index 63903c2..dc3371f 100644
> --- a/target/s390x/cpu_models.c
> +++ b/target/s390x/cpu_models.c
> @@ -283,14 +283,43 @@ void s390_cpu_list(FILE *f, fprintf_function print)
>  }
>  }
>  
> +static S390CPUModel *get_max_cpu_model(Error **errp);
> +
>  #ifndef CONFIG_USER_ONLY
> +static void list_add_feat(const char *name, void *opaque);
> +
> +static void check_unavailable_features(const S390CPUModel *max_model,
> +   const S390CPUModel *model,
> +   strList **unavailable)
> +{
> +S390FeatBitmap missing;
> +
> +/* detect missing features if any to properly report them */
> +bitmap_andnot(missing, model->features, max_model->features,
> +  S390_FEAT_MAX);
> +if (!bitmap_empty(missing, S390_FEAT_MAX)) {
> +s390_feat_bitmap_to_ascii(missing,
> +  unavailable,
> +  list_add_feat);

I remember discussing this with Eduardo when he introduced this.

There is one additional case to handle here that might render a CPU model
not runnable. This can happen when trying to run a new CPU model on an
old host, where the features are not the problem, but the model itself.
E.g. running a GA2 version on a GA1 is not allowed. But this can happen
more generally also between hardware generations.

Therefore, whenever the model is newer than the host model, we have to
add the special property "type" here as a missing feature. The
documentation partly covers what we had in mind.

qapi-schema.json:
# @unavailable-features is a list of QOM property names that

# represent CPU model attributes that prevent the CPU from running.

# If the QOM property is read-only, that means there's no known

# way to make the CPU model run in the current host. Implementations

# that choose not to provide specific information return the

# property name "type".

We discussed that "type" should always be added if there is no way to
make the model runnable (by simply removing features).

See check_compatibility() for details. For these cases, add "type" to
the list. (you might be able to extend check_compatibility(), making
e.g. the *errp and *unavailable parameters optional).
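
To illustrate what a libvirt-like client would see once unavailable-features
is filled in, here is a rough sketch using scripts/qmp/qmp.py (not part of
this patch; the socket path is made up):

import sys
sys.path.append('scripts/qmp')
from qmp import QEMUMonitorProtocol

mon = QEMUMonitorProtocol('/tmp/qmp-sock')
mon.connect()
for model in mon.cmd('query-cpu-definitions')['return']:
    # An empty list would mean "runnable on this host"; a list containing
    # only "type" would mean the model itself is newer than the host provides.
    print(model['name'], model.get('unavailable-features'))
mon.close()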

-- 

Thanks,

David



Re: [Qemu-devel] [PATCH 1/2] vhost: ensure vhost_ops are set before calling iotlb callback

2017-06-30 Thread Marc-André Lureau


- Original Message -
> This patch fixes a crash that happens when vhost-user iommu
> support is enabled and vhost-user socket is closed.
> 
> When it happens, if an IOTLB invalidation notification is sent
> by the IOMMU, vhost_ops's NULL pointer is dereferenced.
> 
> Signed-off-by: Maxime Coquelin 

looks fine to me,

Reviewed-by: Marc-André Lureau 

> ---
>  hw/virtio/vhost-backend.c | 10 --
>  1 file changed, 8 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/virtio/vhost-backend.c b/hw/virtio/vhost-backend.c
> index 4e31de1..cb055e8 100644
> --- a/hw/virtio/vhost-backend.c
> +++ b/hw/virtio/vhost-backend.c
> @@ -309,7 +309,10 @@ int vhost_backend_update_device_iotlb(struct vhost_dev
> *dev,
>  return -EINVAL;
>  }
>  
> -return dev->vhost_ops->vhost_send_device_iotlb_msg(dev, &imsg);
> +if (dev->vhost_ops && dev->vhost_ops->vhost_send_device_iotlb_msg)
> +return dev->vhost_ops->vhost_send_device_iotlb_msg(dev, &imsg);
> +
> +return -ENODEV;
>  }
>  
>  int vhost_backend_invalidate_device_iotlb(struct vhost_dev *dev,
> @@ -321,7 +324,10 @@ int vhost_backend_invalidate_device_iotlb(struct
> vhost_dev *dev,
>  imsg.size = len;
>  imsg.type = VHOST_IOTLB_INVALIDATE;
>  
> -return dev->vhost_ops->vhost_send_device_iotlb_msg(dev, &imsg);
> +if (dev->vhost_ops && dev->vhost_ops->vhost_send_device_iotlb_msg)
> +return dev->vhost_ops->vhost_send_device_iotlb_msg(dev, &imsg);
> +
> +return -ENODEV;
>  }
>  
>  int vhost_backend_handle_iotlb_msg(struct vhost_dev *dev,
> --
> 2.9.4
> 
> 



Re: [Qemu-devel] postcopy migration hangs while loading virtio state

2017-06-30 Thread Dr. David Alan Gilbert
* Christian Borntraeger (borntrae...@de.ibm.com) wrote:
> On 04/26/2017 01:45 PM, Christian Borntraeger wrote:
> 
> >> Hmm, I have a theory, if the flags field has bit 1 set, i.e. 
> >> RAM_SAVE_FLAG_COMPRESS
> >> then try changing ram_handle_compressed to always do the memset.
> > 
> > FWIW, changing ram_handle_compressed to always memset makes the problem go 
> > away.
> 
> It is still running fine now with the "always memset change"

Did we ever nail down a fix for this; as I remember Andrea said
we shouldn't need to do that memset, but we came to the conclusion
it was something specific to how s390 protection keys worked.

Dave

--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK



Re: [Qemu-devel] [PATCH 2/2] vhost-user: unregister slave req handler at cleanup time

2017-06-30 Thread Marc-André Lureau


- Original Message -
> If the backend sends a request just before closing the socket,
> the aio dispatcher might schedule its reading after the vhost
> device has been cleaned, leading to a NULL pointer dereference
> in slave_read();
> 
> vhost_user_cleanup() already closes the socket but it is not
> enough, the handler has to be unregistered.
> 
> Signed-off-by: Maxime Coquelin 

Reviewed-by: Marc-André Lureau 


> ---
>  hw/virtio/vhost-user.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> index 958ee09..2203011 100644
> --- a/hw/virtio/vhost-user.c
> +++ b/hw/virtio/vhost-user.c
> @@ -779,6 +779,7 @@ static int vhost_user_cleanup(struct vhost_dev *dev)
>  
>  u = dev->opaque;
>  if (u->slave_fd >= 0) {
> +qemu_set_fd_handler(u->slave_fd, NULL, NULL, NULL);
>  close(u->slave_fd);
>  u->slave_fd = -1;
>  }
> --
> 2.9.4
> 
> 



[Qemu-devel] [PATCH] migration/rdma: Fix race on source

2017-06-30 Thread Dr. David Alan Gilbert (git)
From: "Dr. David Alan Gilbert" 

Fix a race where the destination might try and send the source a
WRID_READY before the source has done a post-recv for it.

rdma_post_recv has to happen after the qp exists, and we're
OK since we've already called qemu_rdma_source_init that calls
qemu_alloc_qp.

This corresponds to:
https://bugzilla.redhat.com/show_bug.cgi?id=1285044

The race can be triggered by adding a few ms wait before this
post_recv_control (which was originally due to me turning on loads of
debug).

Signed-off-by: Dr. David Alan Gilbert 
---
 migration/rdma.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index c6bc607a03..6111e10c70 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -2365,6 +2365,12 @@ static int qemu_rdma_connect(RDMAContext *rdma, Error 
**errp)
 
 caps_to_network(&cap);
 
+ret = qemu_rdma_post_recv_control(rdma, RDMA_WRID_READY);
+if (ret) {
+ERROR(errp, "posting second control recv");
+goto err_rdma_source_connect;
+}
+
 ret = rdma_connect(rdma->cm_id, &conn_param);
 if (ret) {
 perror("rdma_connect");
@@ -2405,12 +2411,6 @@ static int qemu_rdma_connect(RDMAContext *rdma, Error 
**errp)
 
 rdma_ack_cm_event(cm_event);
 
-ret = qemu_rdma_post_recv_control(rdma, RDMA_WRID_READY);
-if (ret) {
-ERROR(errp, "posting second control recv!");
-goto err_rdma_source_connect;
-}
-
 rdma->control_ready_expected = 1;
 rdma->nb_sent = 0;
 return 0;
-- 
2.13.0




Re: [Qemu-devel] [RFC PATCH 1/3] vmstate: error hint for failed equal checks

2017-06-30 Thread Halil Pasic


On 06/30/2017 04:54 PM, Eric Blake wrote:
> On 06/30/2017 09:41 AM, Halil Pasic wrote:
 'This' basically boils down to the question and
 'Why aren't hints reported in QMP context?'
>>>
>>> QMP is supposed to be machine-parseable.  Hints are supposed to be
>>> human-readable. If you have a machine managing the monitor, the hint
>>> adds nothing but bandwidth consumption, because machine should not be
>>> parsing the human portion of the error message in the first place (as it
>>> is, libvirt already just logs the human-readable portion of a message,
>>> and bases its actions solely on the machine-stable portions of an error
>>> reply: namely, whether an error was sent at all, and occasionally, what
>>> error class was used for that error - there's no guarantee a human will
>>> be reading the log, though).
>>
>>
>> Seems I've made wrong assumptions about error messages (in QEMU) up until
>> now. If I understand you correctly, in QEMU error messages are part of
>> the API (but hints are not). Thus if one changes a typo in an error
>> message (like here
>> https://lists.gnu.org/archive/html/qemu-devel/2017-06/msg06732.html) then
>> one is strictly speaking breaking API backward compatibility.  Is that
>> really the way we want to have things?
> 
> Quite the opposite. In QMP, the EXISTENCE of an error message is part of
> the API, but the CONTENTS of the message are not (machines are not
> supposed to further parse the message) - anything that the machine would
> want to differentiate between two different possible error messages
> should instead be conveyed via a second field in the same returned
> dictionary (the error class), and not by parsing the message.  

I think we are in agreement, it's just that you call 'error message' what
I would call 'error response' (from docs/qmp-spec.txt). For me an error
response MAY OR MAY NOT or MUST (I don't know; it is not stated in
qmp-spec.txt, and qapi-schema.json did not make me much smarter: I would
guess may or may not -- there is even some comment in qapi-schema pointing
in that direction) contain a 'desc', which is per definition "- The
"desc" member is a human-readable error message. Clients should not
attempt to parse this message.".

So I would call that 'error message'. If the logic (modulo reporting) in
libvirt (I don't know, my focus isn't libvirt) or any other management
software depends on the EXISTENCE of 'desc' (or human-readable portion of
some error API object) I find that weird, but it's a definition thing.


> Most
> often, there is not a strong case for having differentiation, so most
> errors are lumped in the generic class (error_setg() makes this easy to
> do by default).  An example where differentiation matters: look at the
> "Important Note" in blockdev.c:qmp_block_commit().

I think I have seen that. I find the 'strong discouragement' weird, because
if there is a reason to have differentiation the error class is the way
to go. And if there is no reason to -- it should be obvious.

> 
>>
>> From prior experiences I'm more used to think about error messages as
>> something meant for human consumption, and expressing things expected to
>> be relevant for some kind of client code in a different way (optimized
>> for machine consumption).
>>
>> If however the error message ain't part of the machine relevant portion,
>> then the same argument applies as to the 'hint', and I don't see the
>> reason for handling hints differently. Do you agree with my
>> argumentation?
> 
> Indeed, it may not hurt to start passing the hints over the wire (errors
> would then consume more bandwidth, but errors are not the hot path).
> And I'm not necessarily opposed to that change, so much as trying to
> document why it is not currently the case.  At the same time, I probably
> won't be the one writing a path to populate the hint information into
> the QMP error, as I don't have any reason to use the hint when
> controlling libvirt (except maybe for logging, but there, the hint is
> not going to help the end user, because it's not the end-user's fault
> that libvirt used the API wrong to get a hint in the first place).

For me both human-readable things make sense only for error reporting
(effectively logging). Error.msg should IMHO be different from Error.hint.
The existence of an error should be indicated by the Error object.
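
To make the msg/hint split concrete, here is a minimal sketch using the
qapi/error.h API (the failing check and its wording are invented for
illustration; the point is only where each piece ends up -- the hint is
printed by error_report_err() on the human-readable path and is currently
not marshalled into the QMP 'desc'):

#include "qemu/osdep.h"
#include "qapi/error.h"

/* Hypothetical check, for illustration only. */
static int check_chunk_size(size_t found, size_t expected, Error **errp)
{
    if (found != expected) {
        /* Error.msg: what went wrong. */
        error_setg(errp, "chunk size mismatch: expected %zu, got %zu",
                   expected, found);
        /* Error.hint: extra advice for a human reader. */
        error_append_hint(errp, "Check that source and destination use "
                          "compatible device configurations.\n");
        return -1;
    }
    return 0;
}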

> 
> 
>>> If something absolutely must be reported, then it is not a hint, and
>>> shouldn't be using the hint mechanism.
>>>
>>
>> I find it hard to formulate criteria for 'must be reported'. I'm afraid
>> this is backwards logic: since the hint may not be reported everything
>> that needs to be reported is not a hint. This is a valid approach of
>> course, but then I think some modifications to the comments in error.h
>> would not hurt. And maybe something with verbose would be more
>> expressive name.
>>
>> I hope all this makes some sense and ain't pure waste of time...
> 
> No, it never hurts to question whether the design is 

Re: [Qemu-devel] [PATCH v4 2/3] tcg/aarch64: Use ADRP+ADD to compute target address

2017-06-30 Thread Richard Henderson

On 06/30/2017 07:36 AM, Pranith Kumar wrote:

We use ADRP+ADD to compute the target address for goto_tb. This patch
introduces the NOP instruction which is used to align the above
instruction pair so that we can use one atomic instruction to patch
the destination offsets.

CC: Richard Henderson
CC: Alex Bennée
Signed-off-by: Pranith Kumar
---
  accel/tcg/translate-all.c|  2 +-
  tcg/aarch64/tcg-target.inc.c | 36 ++--
  2 files changed, 31 insertions(+), 7 deletions(-)


Reviewed-by: Richard Henderson 


r~



[Qemu-devel] [PATCH 3/4] xen/mapcache: introduce xen_remap_cache_entry()

2017-06-30 Thread Igor Druzhinin
This new call updates a requested map cache entry
according to the changes in the physmap. The call searches
for the entry, unmaps it, tries to translate the address, and
maps it again at the same place. If the mapping is a dummy, this call
will make it real.

This function makes use of the new xenforeignmemory_map2() call,
whose extended interface was recently introduced in
libxenforeignmemory [1].

[1] https://www.mail-archive.com/xen-devel@lists.xen.org/msg113007.html

Signed-off-by: Igor Druzhinin 
---
 configure |  18 
 hw/i386/xen/xen-mapcache.c| 105 +++---
 include/hw/xen/xen_common.h   |   7 +++
 include/sysemu/xen-mapcache.h |   6 +++
 4 files changed, 130 insertions(+), 6 deletions(-)

diff --git a/configure b/configure
index c571ad1..ad6156b 100755
--- a/configure
+++ b/configure
@@ -2021,6 +2021,24 @@ EOF
 # Xen unstable
 elif
 cat > $TMPC <
+int main(void) {
+  xenforeignmemory_handle *xfmem;
+
+  xfmem = xenforeignmemory_open(0, 0);
+  xenforeignmemory_map2(xfmem, 0, 0, 0, 0, 0, 0, 0);
+
+  return 0;
+}
+EOF
+compile_prog "" "$xen_libs -lxendevicemodel $xen_stable_libs"
+  then
+  xen_stable_libs="-lxendevicemodel $xen_stable_libs"
+  xen_ctrl_version=41000
+  xen=yes
+elif
+cat > $TMPC <
diff --git a/hw/i386/xen/xen-mapcache.c b/hw/i386/xen/xen-mapcache.c
index 05050de..5d8d990 100644
--- a/hw/i386/xen/xen-mapcache.c
+++ b/hw/i386/xen/xen-mapcache.c
@@ -149,6 +149,7 @@ void xen_map_cache_init(phys_offset_to_gaddr_t f, void 
*opaque)
 }
 
 static void xen_remap_bucket(MapCacheEntry *entry,
+ void *vaddr,
  hwaddr size,
  hwaddr address_index,
  bool dummy)
@@ -179,11 +180,11 @@ static void xen_remap_bucket(MapCacheEntry *entry,
 }
 
 if (!dummy) {
-vaddr_base = xenforeignmemory_map(xen_fmem, xen_domid,
-   PROT_READ|PROT_WRITE,
+vaddr_base = xenforeignmemory_map2(xen_fmem, xen_domid, vaddr,
+   PROT_READ|PROT_WRITE, 0,
nb_pfn, pfns, err);
 if (vaddr_base == NULL) {
-perror("xenforeignmemory_map");
+perror("xenforeignmemory_map2");
 exit(-1);
 }
 } else {
@@ -191,7 +192,7 @@ static void xen_remap_bucket(MapCacheEntry *entry,
  * We create dummy mappings where we are unable to create a foreign
  * mapping immediately due to certain circumstances (i.e. on resume 
now)
  */
-vaddr_base = mmap(NULL, size, PROT_READ|PROT_WRITE,
+vaddr_base = mmap(vaddr, size, PROT_READ|PROT_WRITE,
   MAP_ANON|MAP_SHARED, -1, 0);
 if (vaddr_base == NULL) {
 perror("mmap");
@@ -278,14 +279,14 @@ tryagain:
 if (!entry) {
 entry = g_malloc0(sizeof (MapCacheEntry));
 pentry->next = entry;
-xen_remap_bucket(entry, cache_size, address_index, dummy);
+xen_remap_bucket(entry, NULL, cache_size, address_index, dummy);
 } else if (!entry->lock) {
 if (!entry->vaddr_base || entry->paddr_index != address_index ||
 entry->size != cache_size ||
 !test_bits(address_offset >> XC_PAGE_SHIFT,
 test_bit_size >> XC_PAGE_SHIFT,
 entry->valid_mapping)) {
-xen_remap_bucket(entry, cache_size, address_index, dummy);
+xen_remap_bucket(entry, NULL, cache_size, address_index, dummy);
 }
 }
 
@@ -482,3 +483,95 @@ void xen_invalidate_map_cache(void)
 
 mapcache_unlock();
 }
+
+static uint8_t *xen_remap_cache_entry_unlocked(hwaddr phys_addr, hwaddr size)
+{
+MapCacheEntry *entry, *pentry = NULL;
+hwaddr address_index;
+hwaddr address_offset;
+hwaddr cache_size = size;
+hwaddr test_bit_size;
+void *vaddr = NULL;
+uint8_t lock;
+
+address_index  = phys_addr >> MCACHE_BUCKET_SHIFT;
+address_offset = phys_addr & (MCACHE_BUCKET_SIZE - 1);
+
+/* test_bit_size is always a multiple of XC_PAGE_SIZE */
+if (size) {
+test_bit_size = size + (phys_addr & (XC_PAGE_SIZE - 1));
+if (test_bit_size % XC_PAGE_SIZE) {
+test_bit_size += XC_PAGE_SIZE - (test_bit_size % XC_PAGE_SIZE);
+}
+cache_size = size + address_offset;
+if (cache_size % MCACHE_BUCKET_SIZE) {
+cache_size += MCACHE_BUCKET_SIZE - (cache_size % 
MCACHE_BUCKET_SIZE);
+}
+} else {
+test_bit_size = XC_PAGE_SIZE;
+cache_size = MCACHE_BUCKET_SIZE;
+}
+
+/* Search for the requested map cache entry to invalidate */
+entry = &mapcache->entry[address_index % mapcache->nr_buckets];
+while (entry && !(entry->paddr_index == address_index && entry->size == 
cache_size)) {

[Qemu-devel] [PATCH 2/4] xen/mapcache: add an ability to create dummy mappings

2017-06-30 Thread Igor Druzhinin
Dummies are simple anonymous mappings that are placed instead
of regular foreign mappings in certain situations when we need
to postpone the actual mapping but still have to give QEMU a
memory region to play with.

This is planned to be used for restore on Xen.

Signed-off-by: Igor Druzhinin 
---
 hw/i386/xen/xen-mapcache.c | 36 
 1 file changed, 28 insertions(+), 8 deletions(-)

diff --git a/hw/i386/xen/xen-mapcache.c b/hw/i386/xen/xen-mapcache.c
index e60156c..05050de 100644
--- a/hw/i386/xen/xen-mapcache.c
+++ b/hw/i386/xen/xen-mapcache.c
@@ -150,7 +150,8 @@ void xen_map_cache_init(phys_offset_to_gaddr_t f, void 
*opaque)
 
 static void xen_remap_bucket(MapCacheEntry *entry,
  hwaddr size,
- hwaddr address_index)
+ hwaddr address_index,
+ bool dummy)
 {
 uint8_t *vaddr_base;
 xen_pfn_t *pfns;
@@ -177,11 +178,25 @@ static void xen_remap_bucket(MapCacheEntry *entry,
 pfns[i] = (address_index << (MCACHE_BUCKET_SHIFT-XC_PAGE_SHIFT)) + i;
 }
 
-vaddr_base = xenforeignmemory_map(xen_fmem, xen_domid, 
PROT_READ|PROT_WRITE,
-  nb_pfn, pfns, err);
-if (vaddr_base == NULL) {
-perror("xenforeignmemory_map");
-exit(-1);
+if (!dummy) {
+vaddr_base = xenforeignmemory_map(xen_fmem, xen_domid,
+   PROT_READ|PROT_WRITE,
+   nb_pfn, pfns, err);
+if (vaddr_base == NULL) {
+perror("xenforeignmemory_map");
+exit(-1);
+}
+} else {
+/*
+ * We create dummy mappings where we are unable to create a foreign
+ * mapping immediately due to certain circumstances (i.e. on resume 
now)
+ */
+vaddr_base = mmap(NULL, size, PROT_READ|PROT_WRITE,
+  MAP_ANON|MAP_SHARED, -1, 0);
+if (vaddr_base == NULL) {
+perror("mmap");
+exit(-1);
+}
 }
 
 entry->vaddr_base = vaddr_base;
@@ -211,6 +226,7 @@ static uint8_t *xen_map_cache_unlocked(hwaddr phys_addr, 
hwaddr size,
 hwaddr cache_size = size;
 hwaddr test_bit_size;
 bool translated = false;
+bool dummy = false;
 
 tryagain:
 address_index  = phys_addr >> MCACHE_BUCKET_SHIFT;
@@ -262,14 +278,14 @@ tryagain:
 if (!entry) {
 entry = g_malloc0(sizeof (MapCacheEntry));
 pentry->next = entry;
-xen_remap_bucket(entry, cache_size, address_index);
+xen_remap_bucket(entry, cache_size, address_index, dummy);
 } else if (!entry->lock) {
 if (!entry->vaddr_base || entry->paddr_index != address_index ||
 entry->size != cache_size ||
 !test_bits(address_offset >> XC_PAGE_SHIFT,
 test_bit_size >> XC_PAGE_SHIFT,
 entry->valid_mapping)) {
-xen_remap_bucket(entry, cache_size, address_index);
+xen_remap_bucket(entry, cache_size, address_index, dummy);
 }
 }
 
@@ -282,6 +298,10 @@ tryagain:
 translated = true;
 goto tryagain;
 }
+if (!dummy && runstate_check(RUN_STATE_INMIGRATE)) {
+dummy = true;
+goto tryagain;
+}
 trace_xen_map_cache_return(NULL);
 return NULL;
 }
-- 
2.7.4




[Qemu-devel] [PATCH 4/4] xen: don't use xenstore to save/restore physmap anymore

2017-06-30 Thread Igor Druzhinin
If we have a system with xenforeignmemory_map2() implemented,
we don't need to save/restore the physmap on suspend/restore
anymore. In case we resume a VM without a physmap, try to
recreate the physmap during the memory region restore phase and
remap map cache entries accordingly. The old code is left
for compatibility reasons.

Signed-off-by: Igor Druzhinin 
---
 hw/i386/xen/xen-hvm.c   | 45 ++---
 include/hw/xen/xen_common.h |  1 +
 2 files changed, 35 insertions(+), 11 deletions(-)

diff --git a/hw/i386/xen/xen-hvm.c b/hw/i386/xen/xen-hvm.c
index d259cf7..1b6a5ce 100644
--- a/hw/i386/xen/xen-hvm.c
+++ b/hw/i386/xen/xen-hvm.c
@@ -305,6 +305,7 @@ static hwaddr xen_phys_offset_to_gaddr(hwaddr start_addr,
 return start_addr;
 }
 
+#ifdef XEN_COMPAT_PHYSMAP
 static int xen_save_physmap(XenIOState *state, XenPhysmap *physmap)
 {
 char path[80], value[17];
@@ -334,6 +335,12 @@ static int xen_save_physmap(XenIOState *state, XenPhysmap 
*physmap)
 }
 return 0;
 }
+#else
+static int xen_save_physmap(XenIOState *state, XenPhysmap *physmap)
+{
+return 0;
+}
+#endif
 
 static int xen_add_to_physmap(XenIOState *state,
   hwaddr start_addr,
@@ -368,6 +375,26 @@ go_physmap:
 DPRINTF("mapping vram to %"HWADDR_PRIx" - %"HWADDR_PRIx"\n",
 start_addr, start_addr + size);
 
+mr_name = memory_region_name(mr);
+
+physmap = g_malloc(sizeof (XenPhysmap));
+
+physmap->start_addr = start_addr;
+physmap->size = size;
+physmap->name = mr_name;
+physmap->phys_offset = phys_offset;
+
+QLIST_INSERT_HEAD(&state->physmap, physmap, list);
+
+if (runstate_check(RUN_STATE_INMIGRATE)) {
+/* Now when we have a physmap entry we can remap a dummy mapping and 
change
+ * it to a real one of guest foreign memory. */
+uint8_t *p = xen_remap_cache_entry(phys_offset, size);
+assert(p && p == memory_region_get_ram_ptr(mr));
+
+return 0;
+}
+
 pfn = phys_offset >> TARGET_PAGE_BITS;
 start_gpfn = start_addr >> TARGET_PAGE_BITS;
 for (i = 0; i < size >> TARGET_PAGE_BITS; i++) {
@@ -382,21 +409,11 @@ go_physmap:
 }
 }
 
-mr_name = memory_region_name(mr);
-
-physmap = g_malloc(sizeof (XenPhysmap));
-
-physmap->start_addr = start_addr;
-physmap->size = size;
-physmap->name = mr_name;
-physmap->phys_offset = phys_offset;
-
-QLIST_INSERT_HEAD(&state->physmap, physmap, list);
-
 xc_domain_pin_memory_cacheattr(xen_xc, xen_domid,
start_addr >> TARGET_PAGE_BITS,
(start_addr + size - 1) >> TARGET_PAGE_BITS,
XEN_DOMCTL_MEM_CACHEATTR_WB);
+
 return xen_save_physmap(state, physmap);
 }
 
@@ -1158,6 +1175,7 @@ static void xen_exit_notifier(Notifier *n, void *data)
 xs_daemon_close(state->xenstore);
 }
 
+#ifdef XEN_COMPAT_PHYSMAP
 static void xen_read_physmap(XenIOState *state)
 {
 XenPhysmap *physmap = NULL;
@@ -1205,6 +1223,11 @@ static void xen_read_physmap(XenIOState *state)
 }
 free(entries);
 }
+#else
+static void xen_read_physmap(XenIOState *state)
+{
+}
+#endif
 
 static void xen_wakeup_notifier(Notifier *notifier, void *data)
 {
diff --git a/include/hw/xen/xen_common.h b/include/hw/xen/xen_common.h
index 70a5cad..c04c5c9 100644
--- a/include/hw/xen/xen_common.h
+++ b/include/hw/xen/xen_common.h
@@ -80,6 +80,7 @@ extern xenforeignmemory_handle *xen_fmem;
 
 #if CONFIG_XEN_CTRL_INTERFACE_VERSION < 41000
 
+#define XEN_COMPAT_PHYSMAP
 #define xenforeignmemory_map2(h, d, a, p, f, ps, ar, e) \
 xenforeignmemory_map(h, d, p, ps, ar, e)
 
-- 
2.7.4




[Qemu-devel] [PATCH 0/4] xen: don't save/restore the physmap on VM save/restore

2017-06-30 Thread Igor Druzhinin
Saving/restoring the physmap to/from xenstore was introduced to
QEMU mainly in order to work around the VRAM region restore issue.
The sequence of restore operations implies that we should know
the effective guest VRAM address *before* we have the VRAM region
restored (which happens later). Unfortunately, in a Xen environment
the VRAM actually belongs to the guest - not to QEMU itself -
which means the position of this region is unknown beforehand and
it can't be mapped into QEMU's address space immediately.

Previously, having the toolstack recreate the xenstore keys holding
the physmap helped to get this information in place at the right
moment, ready to be consumed by QEMU to map the region properly.
But using xenstore for this has certain disadvantages: the toolstack
needs to be aware of these keys and save/restore them accordingly,
and accessing xenstore requires extra privileges, which hinders QEMU
sandboxing.

The previous attempt to get rid of that was to remember all the
VRAM pointers during QEMU initialization phase and then update
them all at once when an actual foreign mapping is established.
Unfortunately, this approach worked only for VRAM and only for
a predefined set of devices - stdvga and cirrus. QXL and other
possible future devices using a moving emulated MMIO region
would be equally broken.

The new approach leverages the xenforeignmemory_map2() call recently
introduced in libxenforeignmemory. It allows QEMU to create a dummy
anonymous mapping during its initialization and change
it to a real one later during machine state restore.
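
The core trick can be sketched roughly like this (outside of QEMU's
mapcache, error handling trimmed; the helper name is made up, and the
xenforeignmemory_map2() argument order mirrors the call used in patch 3/4):

#include <sys/mman.h>
#include <xenforeignmemory.h>

/* Sketch only: reserve address space early, fill it with real foreign
 * pages later, at the same virtual address. */
static void *reserve_then_remap(xenforeignmemory_handle *fmem,
                                uint32_t domid, size_t size,
                                size_t nb_pfn, xen_pfn_t *pfns, int *errs)
{
    /* 1. During init/restore we only know the size, not the guest
     *    physical layout, so hand out a dummy anonymous mapping. */
    void *vaddr = mmap(NULL, size, PROT_READ | PROT_WRITE,
                       MAP_ANON | MAP_SHARED, -1, 0);
    if (vaddr == MAP_FAILED) {
        return NULL;
    }

    /* 2. Once the physmap entry is known, map the real foreign pages
     *    at the very same address, so pointers handed out earlier
     *    stay valid. */
    return xenforeignmemory_map2(fmem, domid, vaddr, PROT_READ | PROT_WRITE,
                                 0, nb_pfn, pfns, errs);
}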

Igor Druzhinin (4):
  xen: move physmap saving into a separate function
  xen/mapcache: add an ability to create dummy mappings
  xen/mapcache: introduce xen_remap_cache_entry()
  xen: don't use xenstore to save/restore physmap anymore

 configure |  18 ++
 hw/i386/xen/xen-hvm.c | 100 
 hw/i386/xen/xen-mapcache.c| 129 +++---
 include/hw/xen/xen_common.h   |   8 +++
 include/sysemu/xen-mapcache.h |   6 ++
 5 files changed, 217 insertions(+), 44 deletions(-)

-- 
2.7.4




[Qemu-devel] [PATCH 1/4] xen: move physmap saving into a separate function

2017-06-30 Thread Igor Druzhinin
Non-functional change.

Signed-off-by: Igor Druzhinin 
---
 hw/i386/xen/xen-hvm.c | 57 ---
 1 file changed, 31 insertions(+), 26 deletions(-)

diff --git a/hw/i386/xen/xen-hvm.c b/hw/i386/xen/xen-hvm.c
index cffa7e2..d259cf7 100644
--- a/hw/i386/xen/xen-hvm.c
+++ b/hw/i386/xen/xen-hvm.c
@@ -305,6 +305,36 @@ static hwaddr xen_phys_offset_to_gaddr(hwaddr start_addr,
 return start_addr;
 }
 
+static int xen_save_physmap(XenIOState *state, XenPhysmap *physmap)
+{
+char path[80], value[17];
+
+snprintf(path, sizeof(path),
+"/local/domain/0/device-model/%d/physmap/%"PRIx64"/start_addr",
+xen_domid, (uint64_t)physmap->phys_offset);
+snprintf(value, sizeof(value), "%"PRIx64, (uint64_t)physmap->start_addr);
+if (!xs_write(state->xenstore, 0, path, value, strlen(value))) {
+return -1;
+}
+snprintf(path, sizeof(path),
+"/local/domain/0/device-model/%d/physmap/%"PRIx64"/size",
+xen_domid, (uint64_t)physmap->phys_offset);
+snprintf(value, sizeof(value), "%"PRIx64, (uint64_t)physmap->size);
+if (!xs_write(state->xenstore, 0, path, value, strlen(value))) {
+return -1;
+}
+if (physmap->name) {
+snprintf(path, sizeof(path),
+"/local/domain/0/device-model/%d/physmap/%"PRIx64"/name",
+xen_domid, (uint64_t)physmap->phys_offset);
+if (!xs_write(state->xenstore, 0, path,
+  physmap->name, strlen(physmap->name))) {
+return -1;
+}
+}
+return 0;
+}
+
 static int xen_add_to_physmap(XenIOState *state,
   hwaddr start_addr,
   ram_addr_t size,
@@ -316,7 +346,6 @@ static int xen_add_to_physmap(XenIOState *state,
 XenPhysmap *physmap = NULL;
 hwaddr pfn, start_gpfn;
 hwaddr phys_offset = memory_region_get_ram_addr(mr);
-char path[80], value[17];
 const char *mr_name;
 
 if (get_physmapping(state, start_addr, size)) {
@@ -368,31 +397,7 @@ go_physmap:
start_addr >> TARGET_PAGE_BITS,
(start_addr + size - 1) >> TARGET_PAGE_BITS,
XEN_DOMCTL_MEM_CACHEATTR_WB);
-
-snprintf(path, sizeof(path),
-"/local/domain/0/device-model/%d/physmap/%"PRIx64"/start_addr",
-xen_domid, (uint64_t)phys_offset);
-snprintf(value, sizeof(value), "%"PRIx64, (uint64_t)start_addr);
-if (!xs_write(state->xenstore, 0, path, value, strlen(value))) {
-return -1;
-}
-snprintf(path, sizeof(path),
-"/local/domain/0/device-model/%d/physmap/%"PRIx64"/size",
-xen_domid, (uint64_t)phys_offset);
-snprintf(value, sizeof(value), "%"PRIx64, (uint64_t)size);
-if (!xs_write(state->xenstore, 0, path, value, strlen(value))) {
-return -1;
-}
-if (mr_name) {
-snprintf(path, sizeof(path),
-"/local/domain/0/device-model/%d/physmap/%"PRIx64"/name",
-xen_domid, (uint64_t)phys_offset);
-if (!xs_write(state->xenstore, 0, path, mr_name, strlen(mr_name))) {
-return -1;
-}
-}
-
-return 0;
+return xen_save_physmap(state, physmap);
 }
 
 static int xen_remove_from_physmap(XenIOState *state,
-- 
2.7.4




[Qemu-devel] [PATCH 1/2] vhost: ensure vhost_ops are set before calling iotlb callback

2017-06-30 Thread Maxime Coquelin
This patch fixes a crash that happens when vhost-user IOMMU
support is enabled and the vhost-user socket is closed.

When that happens, if an IOTLB invalidation notification is sent
by the IOMMU, the NULL vhost_ops pointer is dereferenced.

Signed-off-by: Maxime Coquelin 
---
 hw/virtio/vhost-backend.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/hw/virtio/vhost-backend.c b/hw/virtio/vhost-backend.c
index 4e31de1..cb055e8 100644
--- a/hw/virtio/vhost-backend.c
+++ b/hw/virtio/vhost-backend.c
@@ -309,7 +309,10 @@ int vhost_backend_update_device_iotlb(struct vhost_dev 
*dev,
 return -EINVAL;
 }
 
-return dev->vhost_ops->vhost_send_device_iotlb_msg(dev, &imsg);
+if (dev->vhost_ops && dev->vhost_ops->vhost_send_device_iotlb_msg)
+return dev->vhost_ops->vhost_send_device_iotlb_msg(dev, &imsg);
+
+return -ENODEV;
 }
 
 int vhost_backend_invalidate_device_iotlb(struct vhost_dev *dev,
@@ -321,7 +324,10 @@ int vhost_backend_invalidate_device_iotlb(struct vhost_dev 
*dev,
 imsg.size = len;
 imsg.type = VHOST_IOTLB_INVALIDATE;
 
-return dev->vhost_ops->vhost_send_device_iotlb_msg(dev, &imsg);
+if (dev->vhost_ops && dev->vhost_ops->vhost_send_device_iotlb_msg)
+return dev->vhost_ops->vhost_send_device_iotlb_msg(dev, &imsg);
+
+return -ENODEV;
 }
 
 int vhost_backend_handle_iotlb_msg(struct vhost_dev *dev,
-- 
2.9.4




[Qemu-devel] [PATCH 2/2] vhost-user: unregister slave req handler at cleanup time

2017-06-30 Thread Maxime Coquelin
If the backend sends a request just before closing the socket,
the aio dispatcher might schedule its reading after the vhost
device has been cleaned up, leading to a NULL pointer dereference
in slave_read().

vhost_user_cleanup() already closes the socket, but that is not
enough; the handler has to be unregistered as well.

Signed-off-by: Maxime Coquelin 
---
 hw/virtio/vhost-user.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index 958ee09..2203011 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -779,6 +779,7 @@ static int vhost_user_cleanup(struct vhost_dev *dev)
 
 u = dev->opaque;
 if (u->slave_fd >= 0) {
+qemu_set_fd_handler(u->slave_fd, NULL, NULL, NULL);
 close(u->slave_fd);
 u->slave_fd = -1;
 }
-- 
2.9.4




[Qemu-devel] [PATCH 0/2] vhost-user: fix crashes on disconnect when iommu is on

2017-06-30 Thread Maxime Coquelin
This two-patch series aims at fixing a couple of crashes
that happen when the vhost-user socket is closed and the IOMMU
is enabled.

Maxime Coquelin (2):
  vhost: ensure vhost_ops are set before calling iotlb callback
  vhost-user: unregister slave req handler at cleanup time

 hw/virtio/vhost-backend.c | 10 --
 hw/virtio/vhost-user.c|  1 +
 2 files changed, 9 insertions(+), 2 deletions(-)

-- 
2.9.4




Re: [Qemu-devel] [PATCH v2] Add chardev-send-break monitor command

2017-06-30 Thread Stefan Fritsch
On Thursday, 29 June 2017 16:00:54 CEST Dr. David Alan Gilbert wrote:
> * Eric Blake (ebl...@redhat.com) wrote:
> > On 06/11/2017 02:48 AM, Stefan Fritsch wrote:
> > > Sending a break on a serial console can be useful for debugging the
> > > guest. But not all chardev backends support sending breaks (only telnet
> > > and mux do). The chardev-send-break command allows to send a break even
> > > if using other backends.
> > > 
> > > Signed-off-by: Stefan Fritsch 
> > > Acked-by: Dr. David Alan Gilbert 
> > > ---
> > > v2: added tests and Acked-by line
> > > 
> > > 
> > > +++ b/hmp-commands.hx
> > > @@ -1745,6 +1745,22 @@ Removes the chardev @var{id}.
> > > 
> > >  ETEXI
> > >  
> > >  {
> > > 
> > > +.name   = "chardev-send-break",
> > > +.args_type  = "id:s",
> > > +.params = "id",
> > > +.help   = "send break on chardev",
> > 
> > Compare this wording,...
> > 
> > > +STEXI
> > > +@item chardev-send-break id
> > > +@findex chardev-send-break
> > > +Sends break on the chardev @var{id}.
> > 
> > repeated here,
> > 
> > > +++ b/qapi-schema.json
> > > @@ -5114,6 +5114,26 @@
> > > 
> > >  { 'command': 'chardev-remove', 'data': {'id': 'str'} }
> > >  
> > >  ##
> > > 
> > > +# @chardev-send-break:
> > > +#
> > > +# Send a break to a character device
> > 
> > ...with this wording.  I like 'send a break' better than 'send break',
> > but even better might be 'send a break sequence' or even 'emulate a
> > break sequence' (by definition, a break is NOT a character, but on bare
> > metal character devices it IS a defined electrical sequence distinct
> > from characters to make the recipient aware that the sender is trying to
> > get attention).
> > 
> > Otherwise, the patch looks fine to me.  If all that changes is some
> > word-smithing (and the maintainer may be willing to do that), you can add:
> > Reviewed-by: Eric Blake 
> 
> I'll go with the 'send a break' - the official terminology seems to
> be a 'break condition', but I've never heard anyone say that,
> it's always been 'send a break'.

I also think that "send a break" is best for people who google for it.

Thanks.

Cheers,
Stefan



Re: [Qemu-devel] [PATCH 3/3] msi: Handle remappable format interrupt request

2017-06-30 Thread Anthony PERARD
On Thu, Jun 29, 2017 at 01:49:54AM -0400, Lan Tianyu wrote:
> From: Chao Gao 
> 
> According to VT-d spec Interrupt Remapping and Interrupt Posting ->
> Interrupt Remapping -> Interrupt Request Formats On Intel 64
> Platforms, fields of MSI data register have changed. This patch
> avoids wrongly regarding a remappable format interrupt request as
> an interrupt binded with an event channel.
> 
> Signed-off-by: Chao Gao 
> Signed-off-by: Lan Tianyu 

The patch needs to be rebased on top of QEMU upstream.

Beside that:
Acked-by: Anthony PERARD 

Thanks,

-- 
Anthony PERARD



Re: [Qemu-devel] [PATCH 2/3] xen-pt: bind/unbind interrupt remapping format MSI

2017-06-30 Thread Anthony PERARD
On Thu, Jun 29, 2017 at 01:49:53AM -0400, Lan Tianyu wrote:
> From: Chao Gao 
> 
> If a vIOMMU is exposed to guest, guest will configure the msi to remapping
> format. The original code isn't suitable to the new format. A new pair
> bind/unbind interfaces are added for this usage. This patch recognizes
> this case and uses new interfaces to bind/unbind msi.
> 
> Signed-off-by: Chao Gao 
> Signed-off-by: Lan Tianyu 

Hi,

The patch series is going to need to be rebased on top of QEMU upstream.
For starter, configure have changed a bit.

> ---
>  configure | 54 
> +++
>  hw/xen/xen_pt_msi.c   | 50 ---
>  include/hw/i386/apic-msidef.h |  1 +
>  include/hw/xen/xen_common.h   | 25 
>  4 files changed, 117 insertions(+), 13 deletions(-)
> 
> diff --git a/configure b/configure
> index 476210b..b3ac49f 100755
> --- a/configure
> +++ b/configure
> @@ -1982,6 +1982,60 @@ EOF
>  /*
>   * If we have stable libs the we don't want the libxc compat
>   * layers, regardless of what CFLAGS we may have been given.
> + */
> +#undef XC_WANT_COMPAT_EVTCHN_API
> +#undef XC_WANT_COMPAT_GNTTAB_API
> +#undef XC_WANT_COMPAT_MAP_FOREIGN_API
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#if !defined(HVM_MAX_VCPUS)
> +# error HVM_MAX_VCPUS not defined
> +#endif
> +int main(void) {
> +  xc_interface *xc = NULL;
> +  xenforeignmemory_handle *xfmem;
> +  xenevtchn_handle *xe;
> +  xengnttab_handle *xg;
> +  xen_domain_handle_t handle;
> +  xengnttab_grant_copy_segment_t* seg = NULL;
> +
> +  xs_daemon_open();
> +
> +  xc = xc_interface_open(0, 0, 0);
> +  xc_hvm_set_mem_type(0, 0, HVMMEM_ram_ro, 0, 0);
> +  xc_domain_add_to_physmap(0, 0, XENMAPSPACE_gmfn, 0, 0);
> +  xc_hvm_inject_msi(xc, 0, 0xf000, 0x);
> +  xc_hvm_create_ioreq_server(xc, 0, HVM_IOREQSRV_BUFIOREQ_ATOMIC, NULL);
> +  xc_domain_create(xc, 0, handle, 0, NULL, NULL);
> +
> +  xfmem = xenforeignmemory_open(0, 0);
> +  xenforeignmemory_map(xfmem, 0, 0, 0, 0, 0);
> +
> +  xe = xenevtchn_open(0, 0);
> +  xenevtchn_fd(xe);
> +
> +  xg = xengnttab_open(0, 0);
> +  xengnttab_grant_copy(xg, 0, seg);
> +
> +  xc_domain_update_msi_irq_remapping(xc, 0, 0, 0, 0, 0 ,0);
> +
> +  return 0;
> +}
> +EOF
> +  compile_prog "" "$xen_libs $xen_stable_libs"
> +then
> +xen_ctrl_version=4100
> +xen=yes

There have been some changes/refactoring in configure, so this won't
work. The xen_ctrl_version got one more digit.

Can you try with this patch? Which is also simpler.
diff --git a/configure b/configure
index c571ad14e5..a06f2c0b92 100755
--- a/configure
+++ b/configure
@@ -2021,6 +2021,24 @@ EOF
 # Xen unstable
 elif
 cat > $TMPC <
+int main(void) {
+  xc_interface *xc = NULL;
+
+  xc_domain_update_msi_irq_remapping(xc, 0, 0, 0, 0, 0 ,0);
+
+  return 0;
+}
+EOF
+compile_prog "" "$xen_libs -lxendevicemodel $xen_stable_libs"
+  then
+  xen_stable_libs="-lxendevicemodel $xen_stable_libs"
+  xen_ctrl_version=41000
+  xen=yes
+
+# Xen 4.9
+elif
+cat > $TMPC <


> index 8e1580d..4ba43a8 100644
> --- a/include/hw/xen/xen_common.h
> +++ b/include/hw/xen/xen_common.h
> @@ -438,4 +438,29 @@ static inline int xengnttab_grant_copy(xengnttab_handle 
> *xgt, uint32_t count,
>  }
>  #endif
>  
> +/* Xen before 4.10 */
> +#if CONFIG_XEN_CTRL_INTERFACE_VERSION < 4100

This will need to be
CONFIG_XEN_CTRL_INTERFACE_VERSION < 41000

The rest of the patch is good, Thanks,

-- 
Anthony PERARD



[Qemu-devel] [PATCH] util/cacheinfo: Fix warning generated by clang

2017-06-30 Thread Pranith Kumar
Clang generates the following warning on aarch64 host:

  CC  util/cacheinfo.o
/home/pranith/qemu/util/cacheinfo.c:121:48: warning: value size does not match 
register size specified by the constraint and modifier [-Wasm-operand-widths]
asm volatile("mrs\t%0, ctr_el0" : "=r"(ctr));
   ^
/home/pranith/qemu/util/cacheinfo.c:121:28: note: use constraint modifier "w"
asm volatile("mrs\t%0, ctr_el0" : "=r"(ctr));
   ^~
   %w0

Constraint modifier 'w' is not (yet?) accepted by gcc. Fix this by increasing 
the ctr size.
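
For context, a sketch of the field decode performed around that register
read (helper name invented; CTR_EL0 keeps IminLine in bits [3:0] and
DminLine in bits [19:16], so widening the variable to 64 bits does not
change the decoded values):

#include <stdint.h>

static void ctr_el0_line_sizes(unsigned *isize, unsigned *dsize)
{
#ifdef __aarch64__
    uint64_t ctr;

    asm volatile("mrs\t%0, ctr_el0" : "=r"(ctr));
    *isize = 4 << (ctr & 0xf);          /* icache line size in bytes */
    *dsize = 4 << ((ctr >> 16) & 0xf);  /* dcache line size in bytes */
#else
    *isize = *dsize = 0;
#endif
}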

Signed-off-by: Pranith Kumar 
---
 util/cacheinfo.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/util/cacheinfo.c b/util/cacheinfo.c
index f987522df4..6253049533 100644
--- a/util/cacheinfo.c
+++ b/util/cacheinfo.c
@@ -112,7 +112,7 @@ static void sys_cache_info(int *isize, int *dsize)
 static void arch_cache_info(int *isize, int *dsize)
 {
 if (*isize == 0 || *dsize == 0) {
-unsigned ctr;
+unsigned long ctr;
 
 /* The real cache geometry is in CCSIDR_EL1/CLIDR_EL1/CSSELR_EL1,
but (at least under Linux) these are marked protected by the
-- 
2.13.0




Re: [Qemu-devel] [PULL 0/7] M68k for 2.10 patches

2017-06-30 Thread Peter Maydell
On 30 June 2017 at 13:30, Laurent Vivier  wrote:
> The following changes since commit 4c8c1cc544dbd5e2564868e61c5037258e393832:
>
>   Merge remote-tracking branch 
> 'remotes/vivier/tags/m68k-for-2.10-pull-request' into staging (2017-06-22 
> 19:01:58 +0100)
>
> are available in the git repository at:
>
>   git://github.com/vivier/qemu-m68k.git tags/m68k-for-2.10-pull-request
>
> for you to fetch changes up to a1e58ddcb3eed7ec4a158512b9dae46f90492c1b:
>
>   target/m68k: add fmovem (2017-06-29 20:29:57 +0200)
>
> 
>
> 

Applied, thanks.

-- PMM



Re: [Qemu-devel] [RFC PATCH 1/3] vmstate: error hint for failed equal checks

2017-06-30 Thread Eric Blake
On 06/30/2017 09:41 AM, Halil Pasic wrote:
>>> 'This' basically boils down to the question and
>>> 'Why aren't hints reported in QMP context?'
>>
>> QMP is supposed to be machine-parseable.  Hints are supposed to be
>> human-readable. If you have a machine managing the monitor, the hint
>> adds nothing but bandwidth consumption, because machine should not be
>> parsing the human portion of the error message in the first place (as it
>> is, libvirt already just logs the human-readable portion of a message,
>> and bases its actions solely on the machine-stable portions of an error
>> reply: namely, whether an error was sent at all, and occasionally, what
>> error class was used for that error - there's no guarantee a human will
>> be reading the log, though).
> 
> 
> Seems I've made wrong assumptions about error messages (in QEMU) up until
> now. If I understand you correctly, in QEMU error messages are part of
> the API (but hints are not). Thus if one changes a typo in an error
> message (like here
> https://lists.gnu.org/archive/html/qemu-devel/2017-06/msg06732.html) the
> one is strictly speaking breaking API backward compatibility.  Is that
> really the way we want to have things?

Quite the opposite. In QMP, the EXISTENCE of an error message is part of
the API, but the CONTENTS of the message are not (machines are not
supposed to further parse the message) - anything that the machine would
want to differentiate between two different possible error messages
should instead be conveyed via a second field in the same returned
dictionary (the error class), and not by parsing the message.  Most
often, there is not a strong case for having differentiation, so most
errors are lumped in the generic class (error_setg() makes this easy to
do by default).  An example where differentiation matters: look at the
"Important Note" in blockdev.c:qmp_block_commit().

> 
> From prior experiences I'm more used to think about error messages as
> something meant for human consumption, and expressing things expected to
> be relevant for some kind of client code in a different way (optimized
> for machine consumption).
> 
> If however the error message ain't part of the machine relevant portion,
> then the same argument applies as to the 'hint', and I don't see the
> reason for handling hints differently. Do you agree with my
> argumentation?

Indeed, it may not hurt to start passing the hints over the wire (errors
would then consume more bandwidth, but errors are not the hot path).
And I'm not necessarily opposed to that change, so much as trying to
document why it is not currently the case.  At the same time, I probably
won't be the one writing a patch to populate the hint information into
the QMP error, as I don't have any reason to use the hint when
controlling libvirt (except maybe for logging, but there, the hint is
not going to help the end user, because it's not the end-user's fault
that libvirt used the API wrong to get a hint in the first place).


>> If something absolutely must be reported, then it is not a hint, and
>> shouldn't be using the hint mechanism.
>>
> 
> I find it hard to formulate criteria for 'must be reported'. I'm afraid
> this is backwards logic: since the hint may not be reported everything
> that needs to be reported is not a hint. This is a valid approach of
> course, but then I think some modifications to the comments in error.h
> would not hurt. And maybe something with verbose would be more
> expressive name.
> 
> I hope all this makes some sense and ain't pure waste of time...

No, it never hurts to question whether the design is optimal, and it's
better to question first to know whether it is even worth patching
things to behave differently, rather than spending time patching it only
to have a maintainer clarify that the patch can't be accepted because of
some design constraint.  So I still hope Markus will chime in.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org





Re: [Qemu-devel] [PATCH v1 1/1] xilinx-dp: Add support for the yuy2 video format

2017-06-30 Thread KONRAD Frederic


On 06/30/2017 03:55 PM, Edgar E. Iglesias wrote:

From: "Edgar E. Iglesias" 

Add support for the yuy2 video format.

Signed-off-by: Edgar E. Iglesias 
Acked-by: Sai Pavan Boddu 
---
  hw/display/xlnx_dp.c | 3 +++
  1 file changed, 3 insertions(+)

diff --git a/hw/display/xlnx_dp.c b/hw/display/xlnx_dp.c
index f7b7b80..a77d7db 100644
--- a/hw/display/xlnx_dp.c
+++ b/hw/display/xlnx_dp.c
@@ -624,6 +624,9 @@ static void xlnx_dp_change_graphic_fmt(XlnxDPState *s)
  case 0:
  s->v_plane.format = PIXMAN_x8b8g8r8;
  break;
+case DP_NL_VID_Y0_CB_Y1_CR:
+s->v_plane.format = PIXMAN_yuy2;
+break;
  case DP_NL_VID_RGBA8880:
  s->v_plane.format = PIXMAN_x8b8g8r8;
  break;



Reviewed-by: KONRAD Frederic 

Fred



Re: [Qemu-devel] [PATCH] tests: Avoid non-portable 'echo -ARG'

2017-06-30 Thread Eric Blake
On 06/30/2017 09:38 AM, Max Reitz wrote:
> On 2017-06-28 16:21, Eric Blake wrote:
>> POSIX says that backslashes in the arguments to 'echo', as well as
>> any use of 'echo -n' and 'echo -e', are non-portable; it recommends
>> people should favor 'printf' instead.  This is definitely true where
>> we do not control which shell is running (such as in makefile snippets
>> or in documentation examples).  But even for scripts where we
>> require bash (and therefore, where echo does what we want by default),
>> it is still possible to use 'shopt -s xpg_echo' to change bash's
>> behavior of echo.  And setting a good example never hurts when we are
>> not sure if a snippet will be copied from a bash-only script to a
>> general shell script (although I don't change the use of non-portable
>> \e for ESC when we know the running shell is bash).
>>
>> Replace 'echo -n "..."' with 'printf "..."', and 'echo -e "..."'
>> with 'printf "...\n"'.
>>
>> In the qemu-iotests check script, also fix unusual shell quoting
>> that would result in word-splitting if 'date' outputs a space.
>>
>> Signed-off-by: Eric Blake 
>> ---

> Question 1: Who's going to take this? :-)

It's test-related, touching mostly iotests. But it's also a candidate
for qemu-trivial (I'll cc them on v2).

> 
> Question 2: This breaks 171 if TEST_DIR contains a % (e.g.
> "TEST_DIR=/tmp/foo%% ./check -raw 171"). Is that OK?

No.  A more formal fix is using 'printf %s "..."' in place of 'echo -n
"..."', if the "..." contains any substitutions.  The extra %s is not
needed when there are no risky substitutions.  I'll spin up a v2.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org





Re: [Qemu-devel] [RFC PATCH 1/3] vmstate: error hint for failed equal checks

2017-06-30 Thread Halil Pasic


On 06/29/2017 09:04 PM, Eric Blake wrote:
> On 06/14/2017 08:51 AM, Halil Pasic wrote:
> 
> [apologies for the delayed response, and also adding Markus]
> 

No problem. Many thanks for the effort. I see I've ended up with a
lengthy email. A disclaimer before I start: No strong opinions here.
Things have been working reasonably well for years and I respect that.
Nevertheless I like conceptual clarity, and because of this, I ended up
getting into a discussion without considering the expected cost/benefit ratio. If
I think about it that way it probably ain't worth it. So I'm OK with
concluding the discussion with that argument at any time -- just tell ;).


>>>
>>> One reason I choose error_report_err is to be consistent about hint
>>> reporting (the other one is that was what Connie suggested). I do
>>> not understand why do we omit hints if QMP, but I figured that's
>>> our policy. So the hint I'm adding must not be printed in QMP
>>> context -- because that's our policy. I was pretty sure what I
>>> want to do is add a hint (and not make a very long 'core' error
>>> message).
>>>
>>> Can you (or somebody else)  explain why are hints dropped in QMP
>>> context?
>>>
>>> Don't misunderstand I'm open towards your proposal, it's just
>>> that:
>>> 1) I would like to understand.
>>> 2) I would like to get the very same result as produced by
>>> https://lists.nongnu.org/archive/html/qemu-devel/2017-06/msg01472.html 
>>>
>>> Regards,
>>> Halil
>>>
>>>
>>
>> ping.
>>
>> I would like to do a v2, but I want this sorted out first.
>>
>> 'This' basically boils down to the question and
>> 'Why aren't hints reported in QMP context?'
> 
> QMP is supposed to be machine-parseable.  Hints are supposed to be
> human-readable. If you have a machine managing the monitor, the hint
> adds nothing but bandwidth consumption, because machine should not be
> parsing the human portion of the error message in the first place (as it
> is, libvirt already just logs the human-readable portion of a message,
> and bases its actions solely on the machine-stable portions of an error
> reply: namely, whether an error was sent at all, and occasionally, what
> error class was used for that error - there's no guarantee a human will
> be reading the log, though).


Seems I've made wrong assumptions about error messages (in QEMU) up until
now. If I understand you correctly, in QEMU error messages are part of
the API (but hints are not). Thus if one changes a typo in an error
message (like here
https://lists.gnu.org/archive/html/qemu-devel/2017-06/msg06732.html) the
one is strictly speaking breaking API backward compatibility.  Is that
really the way we want to have things?

From prior experiences I'm more used to think about error messages as
something meant for human consumption, and expressing things expected to
be relevant for some kind of client code in a different way (optimized
for machine consumption).

If however the error message ain't part of the machine relevant portion,
then the same argument applies as to the 'hint', and I don't see the
reason for handling hints differently. Do you agree with my
argumentation?

Let us also examine some comments in qapi/error.h:

/*
 * Just like error_setg(), except you get to specify the error class.
 * Note: use of error classes other than ERROR_CLASS_GENERIC_ERROR is
 * strongly discouraged.
 */
#define error_set(errp, err_class, fmt, ...)

This probably means client code (e.g. libvirt) is in general not meant
to make decisions based on the type of error that occurred (e.g. what
went wrong).

/*
[..]
 * human-readable error message is made from printf-style @fmt, ...
 * The resulting message should be a single phrase, with no newline or
 * trailing punctuation.
[..]
 */
#define error_setg(errp, fmt, ...) 

From this it seems to me the error message is intended for human consumption.

/*
 * Append a printf-style human-readable explanation to an existing error.
 * @errp may be NULL, but not _fatal or _abort.
 * Trivially the case if you call it only after error_setg() or
 * error_propagate().
 * May be called multiple times.  The resulting hint should end with a
 * newline.
 */
void error_append_hint(Error **errp, const char *fmt, ...)

From this, I would say: The 'hint' is about why something went wrong. The
'message' is about what problem in particular or in general was
encountered (in general, the requested operation failed/cannot be
performed; the caller knows what operation was attempted) and should be
considered a debugging aid (along with the bits supposed to answer the
question 'where'). This debugging aid, however, can be very useful to the
end user if seeking a workaround, and the error_class is for providing
client code with additional information beyond 'something went wrong'.

Whether the message is supposed to be only about 'in particular' is a
tricky one, and should probably depend on the contract: if the client
code is supposed to tell us which high-level operation failed then I guess
just 'in 

Re: [Qemu-devel] [PATCH] tests: Avoid non-portable 'echo -ARG'

2017-06-30 Thread Max Reitz
On 2017-06-28 16:21, Eric Blake wrote:
> POSIX says that backslashes in the arguments to 'echo', as well as
> any use of 'echo -n' and 'echo -e', are non-portable; it recommends
> people should favor 'printf' instead.  This is definitely true where
> we do not control which shell is running (such as in makefile snippets
> or in documentation examples).  But even for scripts where we
> require bash (and therefore, where echo does what we want by default),
> it is still possible to use 'shopt -s xpg_echo' to change bash's
> behavior of echo.  And setting a good example never hurts when we are
> not sure if a snippet will be copied from a bash-only script to a
> general shell script (although I don't change the use of non-portable
> \e for ESC when we know the running shell is bash).
> 
> Replace 'echo -n "..."' with 'printf "..."', and 'echo -e "..."'
> with 'printf "...\n"'.
> 
> In the qemu-iotests check script, also fix unusual shell quoting
> that would result in word-splitting if 'date' outputs a space.
> 
> Signed-off-by: Eric Blake 
> ---
> 
> Of course, Stefan's pending patch:
> [PATCH 3/5] qemu-iotests: 068: extract _qemu() function
> also touches 068, so there may be some (obvious) merge conflicts
> to resolve there depending on what goes in first.
> 
>  qemu-options.hx |  4 ++--
>  tests/multiboot/run_test.sh | 10 +-
>  tests/qemu-iotests/051  |  7 ---
>  tests/qemu-iotests/068  |  2 +-
>  tests/qemu-iotests/142  | 48 
> ++---
>  tests/qemu-iotests/171  | 14 ++---
>  tests/qemu-iotests/check| 18 -
>  tests/rocker/all| 10 +-
>  tests/tcg/cris/Makefile |  8 
>  9 files changed, 61 insertions(+), 60 deletions(-)

Question 1: Who's going to take this? :-)

Question 2: This breaks 171 if TEST_DIR contains a % (e.g.
"TEST_DIR=/tmp/foo%% ./check -raw 171"). Is that OK?

Max





[Qemu-devel] [PATCH v4 2/3] tcg/aarch64: Use ADRP+ADD to compute target address

2017-06-30 Thread Pranith Kumar
We use ADRP+ADD to compute the target address for goto_tb. This patch
introduces the NOP instruction which is used to align the above
instruction pair so that we can use one atomic instruction to patch
the destination offsets.
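
For readers not fluent in the encoding, a plain-C sketch of the address
arithmetic the patched ADRP+ADD pair performs (just the value computation
the immediates correspond to, not an encoder; function name invented):

#include <stdint.h>

/* ADRP materialises the 4 KiB page of the target relative to the page of
 * the patched instruction; ADD then supplies the low 12 bits.  Keeping the
 * pair inside one aligned 64-bit word is what allows a single atomic store
 * to retarget both instructions at once. */
static uint64_t adrp_add_target(uint64_t insn_addr, int64_t page_delta,
                                uint32_t lo12)
{
    uint64_t page = (insn_addr & ~(uint64_t)0xfff) +
                    ((uint64_t)page_delta << 12);
    return page + (lo12 & 0xfff);
}

/* The patching side derives the immediates the other way around:
 *   page_delta = (addr >> 12) - (jmp_addr >> 12);
 *   lo12       = addr & 0xfff;
 */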

CC: Richard Henderson 
CC: Alex Bennée 
Signed-off-by: Pranith Kumar 
---
 accel/tcg/translate-all.c|  2 +-
 tcg/aarch64/tcg-target.inc.c | 36 ++--
 2 files changed, 31 insertions(+), 7 deletions(-)

diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index f6ad46b613..65a92dbf67 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -522,7 +522,7 @@ static inline PageDesc *page_find(tb_page_addr_t index)
 #elif defined(__powerpc__)
 # define MAX_CODE_GEN_BUFFER_SIZE  (32u * 1024 * 1024)
 #elif defined(__aarch64__)
-# define MAX_CODE_GEN_BUFFER_SIZE  (128ul * 1024 * 1024)
+# define MAX_CODE_GEN_BUFFER_SIZE  (2ul * 1024 * 1024 * 1024)
 #elif defined(__s390x__)
   /* We have a +- 4GB range on the branches; leave some slop.  */
 # define MAX_CODE_GEN_BUFFER_SIZE  (3ul * 1024 * 1024 * 1024)
diff --git a/tcg/aarch64/tcg-target.inc.c b/tcg/aarch64/tcg-target.inc.c
index 8fce11ace7..a84422d633 100644
--- a/tcg/aarch64/tcg-target.inc.c
+++ b/tcg/aarch64/tcg-target.inc.c
@@ -372,6 +372,7 @@ typedef enum {
 I3510_EON   = 0x4a200000,
 I3510_ANDS  = 0x6a000000,
 
+NOP = 0xd503201f,
 /* System instructions.  */
 DMB_ISH = 0xd50338bf,
 DMB_LD  = 0x0100,
@@ -865,11 +866,27 @@ static inline void tcg_out_call(TCGContext *s, 
tcg_insn_unit *target)
 
 void aarch64_tb_set_jmp_target(uintptr_t jmp_addr, uintptr_t addr)
 {
-tcg_insn_unit *code_ptr = (tcg_insn_unit *)jmp_addr;
-tcg_insn_unit *target = (tcg_insn_unit *)addr;
+tcg_insn_unit i1, i2;
+TCGType rt = TCG_TYPE_I64;
+TCGReg  rd = TCG_REG_TMP;
+uint64_t pair;
 
-reloc_pc26_atomic(code_ptr, target);
-flush_icache_range(jmp_addr, jmp_addr + 4);
+ptrdiff_t offset = addr - jmp_addr;
+
+if (offset == sextract64(offset, 0, 26)) {
+i1 = I3206_B | ((offset >> 2) & 0x3ffffff);
+i2 = NOP;
+} else {
+offset = (addr >> 12) - (jmp_addr >> 12);
+
+/* patch ADRP */
+i1 = I3406_ADRP | (offset & 3) << 29 | (offset & 0x1ffffc) << (5 - 2) 
| rd;
+/* patch ADDI */
+i2 = I3401_ADDI | rt << 31 | (addr & 0xfff) << 10 | rd << 5 | rd;
+}
+pair = (uint64_t)i2 << 32 | i1;
+atomic_set((uint64_t *)jmp_addr, pair);
+flush_icache_range(jmp_addr, jmp_addr + 8);
 }
 
 static inline void tcg_out_goto_label(TCGContext *s, TCGLabel *l)
@@ -1388,10 +1405,17 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
 #endif
 /* consistency for USE_DIRECT_JUMP */
 tcg_debug_assert(s->tb_jmp_insn_offset != NULL);
+/* Ensure that ADRP+ADD are 8-byte aligned so that an atomic
+   write can be used to patch the target address. */
+if ((uintptr_t)s->code_ptr & 7) {
+tcg_out32(s, NOP);
+}
 s->tb_jmp_insn_offset[a0] = tcg_current_code_size(s);
 /* actual branch destination will be patched by
-   aarch64_tb_set_jmp_target later, beware retranslation. */
-tcg_out_goto_noaddr(s);
+   aarch64_tb_set_jmp_target later. */
+tcg_out_insn(s, 3406, ADRP, TCG_REG_TMP, 0);
+tcg_out_insn(s, 3401, ADDI, TCG_TYPE_I64, TCG_REG_TMP, TCG_REG_TMP, 0);
+tcg_out_insn(s, 3207, BR, TCG_REG_TMP);
 s->tb_jmp_reset_offset[a0] = tcg_current_code_size(s);
 break;
 
-- 
2.13.0




[Qemu-devel] [PATCH v4 1/3] tcg/aarch64: Introduce and use long branch to register

2017-06-30 Thread Pranith Kumar
We can use a branch to register instruction for exit_tb for offsets
greater than 128MB.
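
As a quick sanity check of where the 128MB figure comes from (B/BL carry a
signed 26-bit offset counted in 4-byte instruction words):

#include <inttypes.h>
#include <stdio.h>

int main(void)
{
    /* Signed 26-bit word offset: roughly +/- 2^25 words of 4 bytes each. */
    int64_t reach_bytes = (INT64_C(1) << 25) * 4;

    printf("B/BL reach: +/- %" PRId64 " bytes (= %" PRId64 " MiB)\n",
           reach_bytes, reach_bytes >> 20);
    return 0;
}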

CC: Alex Bennée 
Reviewed-by: Richard Henderson 
Signed-off-by: Pranith Kumar 
---
 tcg/aarch64/tcg-target.inc.c | 15 +--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/tcg/aarch64/tcg-target.inc.c b/tcg/aarch64/tcg-target.inc.c
index 1fa3bccc89..8fce11ace7 100644
--- a/tcg/aarch64/tcg-target.inc.c
+++ b/tcg/aarch64/tcg-target.inc.c
@@ -819,6 +819,17 @@ static inline void tcg_out_goto(TCGContext *s, 
tcg_insn_unit *target)
 tcg_out_insn(s, 3206, B, offset);
 }
 
+static inline void tcg_out_goto_long(TCGContext *s, tcg_insn_unit *target)
+{
+ptrdiff_t offset = target - s->code_ptr;
+if (offset == sextract64(offset, 0, 26)) {
+tcg_out_insn(s, 3206, BL, offset);
+} else {
+tcg_out_movi(s, TCG_TYPE_I64, TCG_REG_TMP, (intptr_t)target);
+tcg_out_insn(s, 3207, BR, TCG_REG_TMP);
+}
+}
+
 static inline void tcg_out_goto_noaddr(TCGContext *s)
 {
 /* We pay attention here to not modify the branch target by reading from
@@ -1364,10 +1375,10 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
 case INDEX_op_exit_tb:
 /* Reuse the zeroing that exists for goto_ptr.  */
 if (a0 == 0) {
-tcg_out_goto(s, s->code_gen_epilogue);
+tcg_out_goto_long(s, s->code_gen_epilogue);
 } else {
 tcg_out_movi(s, TCG_TYPE_I64, TCG_REG_X0, a0);
-tcg_out_goto(s, tb_ret_addr);
+tcg_out_goto_long(s, tb_ret_addr);
 }
 break;
 
-- 
2.13.0




[Qemu-devel] [PATCH v4 3/3] tcg/aarch64: Enable indirect jump path using LDR (literal)

2017-06-30 Thread Pranith Kumar
This patch enables the indirect jump path using an LDR (literal)
instruction. It will be interesting to test and see which performs
better among the two paths.

CC: Alex Bennée 
Reviewed-by: Richard Henderson 
Signed-off-by: Pranith Kumar 
---
 tcg/aarch64/tcg-target.inc.c | 42 --
 1 file changed, 28 insertions(+), 14 deletions(-)

diff --git a/tcg/aarch64/tcg-target.inc.c b/tcg/aarch64/tcg-target.inc.c
index a84422d633..04bc369a92 100644
--- a/tcg/aarch64/tcg-target.inc.c
+++ b/tcg/aarch64/tcg-target.inc.c
@@ -269,6 +269,8 @@ typedef enum {
 I3207_BLR   = 0xd63f,
 I3207_RET   = 0xd65f,
 
+/* Load literal for loading the address at pc-relative offset */
+I3305_LDR   = 0x58000000,
 /* Load/store register.  Described here as 3.3.12, but the helper
that emits them can transform to 3.3.10 or 3.3.13.  */
 I3312_STRB  = 0x3800 | LDST_ST << 22 | MO_8 << 30,
@@ -389,6 +391,11 @@ static inline uint32_t tcg_in32(TCGContext *s)
 #define tcg_out_insn(S, FMT, OP, ...) \
 glue(tcg_out_insn_,FMT)(S, glue(glue(glue(I,FMT),_),OP), ## __VA_ARGS__)
 
+static void tcg_out_insn_3305(TCGContext *s, AArch64Insn insn, int imm19, 
TCGReg rt)
+{
+tcg_out32(s, insn | (imm19 & 0x7ffff) << 5 | rt);
+}
+
 static void tcg_out_insn_3201(TCGContext *s, AArch64Insn insn, TCGType ext,
   TCGReg rt, int imm19)
 {
@@ -864,6 +871,8 @@ static inline void tcg_out_call(TCGContext *s, 
tcg_insn_unit *target)
 }
 }
 
+#ifdef USE_DIRECT_JUMP
+
 void aarch64_tb_set_jmp_target(uintptr_t jmp_addr, uintptr_t addr)
 {
 tcg_insn_unit i1, i2;
@@ -889,6 +898,8 @@ void aarch64_tb_set_jmp_target(uintptr_t jmp_addr, 
uintptr_t addr)
 flush_icache_range(jmp_addr, jmp_addr + 8);
 }
 
+#endif
+
 static inline void tcg_out_goto_label(TCGContext *s, TCGLabel *l)
 {
 if (!l->has_value) {
@@ -1400,21 +1411,24 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
 break;
 
 case INDEX_op_goto_tb:
-#ifndef USE_DIRECT_JUMP
-#error "USE_DIRECT_JUMP required for aarch64"
-#endif
-/* consistency for USE_DIRECT_JUMP */
-tcg_debug_assert(s->tb_jmp_insn_offset != NULL);
-/* Ensure that ADRP+ADD are 8-byte aligned so that an atomic
-   write can be used to patch the target address. */
-if ((uintptr_t)s->code_ptr & 7) {
-tcg_out32(s, NOP);
+if (s->tb_jmp_insn_offset != NULL) {
+/* USE_DIRECT_JUMP */
+/* Ensure that ADRP+ADD are 8-byte aligned so that an atomic
+   write can be used to patch the target address. */
+if ((uintptr_t)s->code_ptr & 7) {
+tcg_out32(s, NOP);
+}
+s->tb_jmp_insn_offset[a0] = tcg_current_code_size(s);
+/* actual branch destination will be patched by
+   aarch64_tb_set_jmp_target later. */
+tcg_out_insn(s, 3406, ADRP, TCG_REG_TMP, 0);
+tcg_out_insn(s, 3401, ADDI, TCG_TYPE_I64, TCG_REG_TMP, 
TCG_REG_TMP, 0);
+} else {
+/* !USE_DIRECT_JUMP */
+tcg_debug_assert(s->tb_jmp_target_addr != NULL);
+intptr_t offset = tcg_pcrel_diff(s, (s->tb_jmp_target_addr + a0)) 
>> 2;
+tcg_out_insn(s, 3305, LDR, offset, TCG_REG_TMP);
 }
-s->tb_jmp_insn_offset[a0] = tcg_current_code_size(s);
-/* actual branch destination will be patched by
-   aarch64_tb_set_jmp_target later. */
-tcg_out_insn(s, 3406, ADRP, TCG_REG_TMP, 0);
-tcg_out_insn(s, 3401, ADDI, TCG_TYPE_I64, TCG_REG_TMP, TCG_REG_TMP, 0);
 tcg_out_insn(s, 3207, BR, TCG_REG_TMP);
 s->tb_jmp_reset_offset[a0] = tcg_current_code_size(s);
 break;
-- 
2.13.0




Re: [Qemu-devel] [Qemu-block] [RFC] QMP design: Fixing query-block and friends

2017-06-30 Thread Alberto Garcia
On Fri 30 Jun 2017 04:22:11 PM CEST, Kevin Wolf wrote:
> Am 30.06.2017 um 15:01 hat Alberto Garcia geschrieben:
>> On Tue 27 Jun 2017 06:31:45 PM CEST, Kevin Wolf wrote:
>> > * Speaking of recursion: ImageInfo recursively includes information
>> >   about all images in the backing chain. This is what makes the output
>> >   of query-named-block-nodes so redundant. It is also inconsistent
>> >   because the runtime information (BlockInfo/BlockDeviceInfo) isn't
>> >   recursive.
>> 
>> I've also been told of cases where a query-block command on a VM with a
>> very large amount of images in all backing chains would slow down the
>> guest because of this recursion.
>> 
>> I haven't been able to reproduce this problem myself, but being able to
>> see only the devices seen by the guest (as opposed to the complete
>> graph) seems like a good idea.
>
> I think the information only really becomes redundant if you use
> query-named-block-nodes because then you list every backing file node
> and each of them contains the recursive information.
>
> With query-block, you start at the top and include the information for
> each image in the backing file chain only once. If we did indeed have
> a problem there, that might mean that we do in fact need filtering to
> reduce the overhead. But I don't think any of the information involves
> any I/O to get, so it seems unlikely to me that this would make a big
> difference.

Yes, that was also what I thought, but I haven't been able to reproduce
the problem so I don't know yet. In my own tests with thousands of
backing files I haven't noticed any slowdown caused by query-block.

Berto



Re: [Qemu-devel] [Qemu-block] [RFC] QMP design: Fixing query-block and friends

2017-06-30 Thread Kevin Wolf
Am 30.06.2017 um 15:01 hat Alberto Garcia geschrieben:
> On Tue 27 Jun 2017 06:31:45 PM CEST, Kevin Wolf wrote:
> > * Speaking of recursion: ImageInfo recursively includes information
> >   about all images in the backing chain. This is what makes the output
> >   of query-named-block-nodes so redundant. It is also inconsistent
> >   because the runtime information (BlockInfo/BlockDeviceInfo) isn't
> >   recursive.
> 
> I've also been told of cases where a query-block command on a VM with a
> very large amount of images in all backing chains would slow down the
> guest because of this recursion.
> 
> I haven't been able to reproduce this problem myself, but being able to
> see only the devices seen by the guest (as opposed to the complete
> graph) seems like a good idea.

I think the information only really becomes redundant if you use
query-named-block-nodes because then you list every backing file node
and each of them contains the recursive information.

With query-block, you start at the top and include the information for
each image in the backing file chain only once. If we did indeed have a
problem there, that might mean that we do in fact need filtering to
reduce the overhead. But I don't think any of the information involves
any I/O to get, so it seems unlikely to me that this would make a big
difference.

Kevin



Re: [Qemu-devel] [PATCH] qom: enforce readonly nature of link's check callback

2017-06-30 Thread Fam Zheng
On Fri, 06/30 12:41, Paolo Bonzini wrote:
> 
> 
> On 29/06/2017 13:14, Igor Mammedov wrote:
> > link's check callback is supposed to verify/permit setting it,
> > however currently nothing restricts it from misusing it
> > and modifying target object from within.
> > Make sure that readonly semantics are checked by compiler
> > to prevent callback's misuse.
> > 
> > Signed-off-by: Igor Mammedov 
> > ---
> > Fam,
> >  it probably conflicts with yours DEFINE_PROP_LINK series,
> >  feel free to include this patch if you'll have to respin
> 
> Since there will be a respin, I'm _not_ queuing this patch.

I am going to include this in my next version of DEFINE_PROP_LINK series.

Fam



Re: [Qemu-devel] [PATCH v6 0/5] Improve I/O tests coverage of LUKS driver

2017-06-30 Thread Max Reitz
On 2017-06-26 14:35, Daniel P. Berrange wrote:
> The main goal of this series is to get the I/O tests passing
> 100% with LUKS when run with './check -luks'. It also adds a
> few more combinations to the LUKS/dmcrypt interoperability
> test.
> 
> To make LUKS testing not quite as slow, we drop the PBKDF
> iteration count down to a very small value. This doesn't
> remove all overhead, as formatting the volume will always
> measure PBKDF timing over a 1 second interval.

Thanks, applied to my block branch:

https://github.com/XanClic/qemu/commits/block

Max





[Qemu-devel] [PULL 1/2] block: Add BDRV_BLOCK_EOF to bdrv_get_block_status()

2017-06-30 Thread Fam Zheng
From: Eric Blake 

Just as the block layer already sets BDRV_BLOCK_ALLOCATED as a
shortcut for subsequent operations, there are also some optimizations
that are made easier if we can quickly tell that *pnum will advance
us to the end of a file, via a new BDRV_BLOCK_EOF which gets set
by the block layer.

This just plumbs up the new bit; subsequent patches will make use
of it.

Signed-off-by: Eric Blake 
Message-Id: <20170505021500.19315-2-ebl...@redhat.com>
Reviewed-by: Stefan Hajnoczi 
Signed-off-by: Fam Zheng 
---
 block/io.c| 15 +++
 include/block/block.h |  2 ++
 2 files changed, 13 insertions(+), 4 deletions(-)

diff --git a/block/io.c b/block/io.c
index 9bba730..7e6abef 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1711,15 +1711,16 @@ typedef struct BdrvCoGetBlockStatusData {
  * Drivers not implementing the functionality are assumed to not support
  * backing files, hence all their sectors are reported as allocated.
  *
- * If 'sector_num' is beyond the end of the disk image the return value is 0
- * and 'pnum' is set to 0.
+ * If 'sector_num' is beyond the end of the disk image the return value is
+ * BDRV_BLOCK_EOF and 'pnum' is set to 0.
  *
  * 'pnum' is set to the number of sectors (including and immediately following
  * the specified sector) that are known to be in the same
  * allocated/unallocated state.
  *
  * 'nb_sectors' is the max value 'pnum' should be set to.  If nb_sectors goes
- * beyond the end of the disk image it will be clamped.
+ * beyond the end of the disk image it will be clamped; if 'pnum' is set to
+ * the end of the image, then the returned value will include BDRV_BLOCK_EOF.
  *
  * If returned value is positive and BDRV_BLOCK_OFFSET_VALID bit is set, 'file'
  * points to the BDS which the sector range is allocated in.
@@ -1740,7 +1741,7 @@ static int64_t coroutine_fn bdrv_co_get_block_status(BlockDriverState *bs,
 
 if (sector_num >= total_sectors) {
 *pnum = 0;
-return 0;
+return BDRV_BLOCK_EOF;
 }
 
 n = total_sectors - sector_num;
@@ -1751,6 +1752,9 @@ static int64_t coroutine_fn bdrv_co_get_block_status(BlockDriverState *bs,
 if (!bs->drv->bdrv_co_get_block_status) {
 *pnum = nb_sectors;
 ret = BDRV_BLOCK_DATA | BDRV_BLOCK_ALLOCATED;
+if (sector_num + nb_sectors == total_sectors) {
+ret |= BDRV_BLOCK_EOF;
+}
 if (bs->drv->protocol_name) {
 ret |= BDRV_BLOCK_OFFSET_VALID | (sector_num * BDRV_SECTOR_SIZE);
 }
@@ -1814,6 +1818,9 @@ static int64_t coroutine_fn bdrv_co_get_block_status(BlockDriverState *bs,
 
 out:
 bdrv_dec_in_flight(bs);
+if (ret >= 0 && sector_num + *pnum == total_sectors) {
+ret |= BDRV_BLOCK_EOF;
+}
 return ret;
 }
 
diff --git a/include/block/block.h b/include/block/block.h
index 85e4be7..4c149ad 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -129,6 +129,7 @@ typedef struct HDGeometry {
  * BDRV_BLOCK_OFFSET_VALID: an associated offset exists for accessing raw data
  * BDRV_BLOCK_ALLOCATED: the content of the block is determined by this
  *   layer (short for DATA || ZERO), set by block layer
+ * BDRV_BLOCK_EOF: the returned pnum covers through end of file for this layer
  *
  * Internal flag:
  * BDRV_BLOCK_RAW: used internally to indicate that the request was
@@ -157,6 +158,7 @@ typedef struct HDGeometry {
 #define BDRV_BLOCK_OFFSET_VALID 0x04
 #define BDRV_BLOCK_RAW  0x08
 #define BDRV_BLOCK_ALLOCATED0x10
+#define BDRV_BLOCK_EOF  0x20
 #define BDRV_BLOCK_OFFSET_MASK  BDRV_SECTOR_MASK
 
typedef QSIMPLEQ_HEAD(BlockReopenQueue, BlockReopenQueueEntry) BlockReopenQueue;
-- 
2.9.4
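
As a caller-side illustration of the new flag (not part of this pull
request): a minimal sketch, assuming the current sector-based
bdrv_get_block_status() prototype, of a hypothetical helper that walks an
image's allocation map and stops querying as soon as BDRV_BLOCK_EOF is
reported.

#include "qemu/osdep.h"
#include "block/block.h"

/* Hypothetical helper: print the allocation map of @bs.  Once the block
 * layer reports BDRV_BLOCK_EOF, *pnum already reaches the end of this
 * layer, so no further calls are needed. */
static void dump_block_status(BlockDriverState *bs)
{
    int64_t total_sectors = bdrv_nb_sectors(bs);
    int64_t sector_num = 0;

    while (sector_num < total_sectors) {
        BlockDriverState *file;
        int pnum;
        int64_t ret = bdrv_get_block_status(bs, sector_num,
                                            MIN(total_sectors - sector_num,
                                                BDRV_REQUEST_MAX_SECTORS),
                                            &pnum, &file);
        if (ret < 0 || pnum == 0) {
            break;
        }
        printf("%10" PRId64 " +%d: %s\n", sector_num, pnum,
               ret & BDRV_BLOCK_ZERO ? "zero" :
               ret & BDRV_BLOCK_DATA ? "data" : "unallocated");
        if (ret & BDRV_BLOCK_EOF) {
            break;    /* nothing left in this layer beyond the extent */
        }
        sector_num += pnum;
    }
}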




[Qemu-devel] [PULL 0/2] Block patches

2017-06-30 Thread Fam Zheng
The following changes since commit 36f87b4513373b3cd79c87c9197d17face95d4ac:

  Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-2.10-20170630' 
into staging (2017-06-30 11:58:49 +0100)

are available in the git repository at:

  git://github.com/famz/qemu.git tags/block-pull-request

for you to fetch changes up to c61e684e44272f2acb2bef34cf2aa234582a73a9:

  block: Exploit BDRV_BLOCK_EOF for larger zero blocks (2017-06-30 21:48:06 
+0800)



Hi Peter,

Here are Eric Blake's enhancements to the block layer API. Thanks!



Eric Blake (2):
  block: Add BDRV_BLOCK_EOF to bdrv_get_block_status()
  block: Exploit BDRV_BLOCK_EOF for larger zero blocks

 block/io.c | 42 +-
 include/block/block.h  |  2 ++
 tests/qemu-iotests/154 |  4 
 tests/qemu-iotests/154.out | 12 ++--
 4 files changed, 41 insertions(+), 19 deletions(-)

-- 
2.9.4




[Qemu-devel] [PULL 2/2] block: Exploit BDRV_BLOCK_EOF for larger zero blocks

2017-06-30 Thread Fam Zheng
From: Eric Blake 

When we have a BDS with unallocated clusters, but asking the status
of its underlying bs->file or backing layer encounters an end-of-file
condition, we know that the rest of the unallocated area will read as
zeroes.  However, pre-patch, this required two separate calls to
bdrv_get_block_status(), as the first call stops at the point where
the underlying file ends.  Thanks to BDRV_BLOCK_EOF, we can now widen
the results of the primary status if the secondary status already
includes BDRV_BLOCK_ZERO.

In turn, this fixes a TODO mentioned in iotest 154, where we can now
see that all sectors in a partial cluster at the end of a file read
as zero when coupling the shorter backing file's status along with our
knowledge that the remaining sectors came from an unallocated cluster.

Also, note that the loop in bdrv_co_get_block_status_above() had an
inefficient exit: in cases where the active layer sets BDRV_BLOCK_ZERO
but does NOT set BDRV_BLOCK_ALLOCATED (namely, where we know we read
zeroes merely because our unallocated clusters lie beyond the backing
file's shorter length), we still ended up probing the backing layer
even though we already had a good answer.

Signed-off-by: Eric Blake 
Message-Id: <20170505021500.19315-3-ebl...@redhat.com>
Reviewed-by: Stefan Hajnoczi 
Signed-off-by: Fam Zheng 
---
 block/io.c | 27 ++-
 tests/qemu-iotests/154 |  4 
 tests/qemu-iotests/154.out | 12 ++--
 3 files changed, 28 insertions(+), 15 deletions(-)

diff --git a/block/io.c b/block/io.c
index 7e6abef..2de7c77 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1803,10 +1803,13 @@ static int64_t coroutine_fn bdrv_co_get_block_status(BlockDriverState *bs,
 /* Ignore errors.  This is just providing extra information, it
  * is useful but not necessary.
  */
-if (!file_pnum) {
-/* !file_pnum indicates an offset at or beyond the EOF; it is
- * perfectly valid for the format block driver to point to such
- * offsets, so catch it and mark everything as zero */
+if (ret2 & BDRV_BLOCK_EOF &&
+(!file_pnum || ret2 & BDRV_BLOCK_ZERO)) {
+/*
+ * It is valid for the format block driver to read
+ * beyond the end of the underlying file's current
+ * size; such areas read as zero.
+ */
 ret |= BDRV_BLOCK_ZERO;
 } else {
/* Limit request to the range reported by the protocol driver */
@@ -1833,16 +1836,30 @@ static int64_t coroutine_fn bdrv_co_get_block_status_above(BlockDriverState *bs,
 {
 BlockDriverState *p;
 int64_t ret = 0;
+bool first = true;
 
 assert(bs != base);
 for (p = bs; p != base; p = backing_bs(p)) {
 ret = bdrv_co_get_block_status(p, sector_num, nb_sectors, pnum, file);
-if (ret < 0 || ret & BDRV_BLOCK_ALLOCATED) {
+if (ret < 0) {
+break;
+}
+if (ret & BDRV_BLOCK_ZERO && ret & BDRV_BLOCK_EOF && !first) {
+/*
+ * Reading beyond the end of the file continues to read
+ * zeroes, but we can only widen the result to the
+ * unallocated length we learned from an earlier
+ * iteration.
+ */
+*pnum = nb_sectors;
+}
+if (ret & (BDRV_BLOCK_ZERO | BDRV_BLOCK_DATA)) {
 break;
 }
 /* [sector_num, pnum] unallocated on this layer, which could be only
  * the first part of [sector_num, nb_sectors].  */
 nb_sectors = MIN(nb_sectors, *pnum);
+first = false;
 }
 return ret;
 }
diff --git a/tests/qemu-iotests/154 b/tests/qemu-iotests/154
index dd8a426..fde03b0 100755
--- a/tests/qemu-iotests/154
+++ b/tests/qemu-iotests/154
@@ -334,8 +334,6 @@ $QEMU_IO -c "alloc $size 2048" "$TEST_IMG" | _filter_qemu_io
 $QEMU_IMG map --output=json "$TEST_IMG" | _filter_qemu_img_map
 
 # Repeat with backing file holding unallocated cluster.
-# TODO: Note that this forces an allocation, because we aren't yet able to
-# quickly detect that reads beyond EOF of the backing file are always zero
 CLUSTER_SIZE=2048 TEST_IMG="$TEST_IMG.base" _make_test_img $((size + 1024))
 
 # Write at the front: sector-wise, the request is:
@@ -371,8 +369,6 @@ $QEMU_IO -c "alloc $size 2048" "$TEST_IMG" | _filter_qemu_io
 $QEMU_IMG map --output=json "$TEST_IMG" | _filter_qemu_img_map
 
 # Repeat with backing file holding zero'd cluster
-# TODO: Note that this forces an allocation, because we aren't yet able to
-# quickly detect that reads beyond EOF of the backing file are always zero
 $QEMU_IO -c "write -z $size 512" "$TEST_IMG.base" | _filter_qemu_io
 
 # Write at the front: sector-wise, the request is:
diff --git a/tests/qemu-iotests/154.out 

Re: [Qemu-devel] [PATCH v3] hmp, qmp: introduce "info memory" and "query-memory" commands

2017-06-30 Thread Vadim Galitsyn
Hi Guys,

Thank you for the input. Please find updated patch v4 at
http://lists.nongnu.org/archive/html/qemu-devel/2017-06/msg07065.html.

Vadim

On Tue, Jun 27, 2017 at 4:14 PM, Eric Blake  wrote:

> On 06/27/2017 09:05 AM, Igor Mammedov wrote:
> > On Wed, 14 Jun 2017 17:21:06 +0200
> > Vadim Galitsyn  wrote:
> >
>
> >> +void hmp_info_memory(Monitor *mon, const QDict *qdict)
> >> +{
> >> +Error *err = NULL;
> >> +MemoryInfo *info = qmp_query_memory();
> >> +if (info) {
> >> +monitor_printf(mon, "base-memory: %" PRIu64 "\n",
> >> +   info->base_memory);
> >> +monitor_printf(mon, "hot-plug-memory: %" PRIu64 "\n",
> >> +   info->hot_plug_memory);
> > it shouldn't be printed if hotplug is not enabled
>
> In which case, the QAPI change in the .json file should mark it as
> '*hot-plug-memory', to make it an optional field, and you'll need to
> check/set info->has_hot_plug_memory as appropriate.
>
> >
> >> +monitor_printf(mon, "ballooned-actual-memory: %" PRIu64 "\n",
> >> +   info->ballooned_actual_memory);
> >> +g_free(info);
> >
> > probably there is autogenerated qapi_free_FOO() for MemInfo type
>
> Yes, there is.
>
> > since it's QAPI type, it should be used here instead of g_free() if it
> exists.
>
> Yes, that is correct.
>
>
> >> +#
> >> +# @ballooned-actual-memory: amount of guest memory available after
> ballooning.
> >> +#
> >> +# Since: 2.10.0
> >> +##
> >> +{ 'struct': 'MemoryInfo',
> >> +  'data'  : { 'base-memory': 'int',
> >> 'hot-plug-memory': 'int',
> > should be optional and shown only if hotplug is actually enabled
> >
> >> +  'ballooned-actual-memory': 'int' } }
> > maybe the same for ballooning
>
> Yes, that makes sense to have both of those stats be optional, since
> they are opt-in configurations.
>
> --
> Eric Blake, Principal Software Engineer
> Red Hat, Inc.   +1-919-301-3266
> Virtualization:  qemu.org | libvirt.org
>
>
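
To make the suggested change concrete, a rough sketch of the
optional-member pattern.  The names follow the patch under review;
'*hot-plug-memory', '*ballooned-actual-memory' and the corresponding
has_* flags are assumptions about how v4 might look, and the actual
hotplug/balloon checks are assumed to happen inside qmp_query_memory().

void hmp_info_memory(Monitor *mon, const QDict *qdict)
{
    MemoryInfo *info = qmp_query_memory();

    if (!info) {
        return;
    }
    monitor_printf(mon, "base-memory: %" PRIu64 "\n", info->base_memory);

    /* With '*hot-plug-memory' in the schema, the generated struct gains a
     * has_hot_plug_memory flag, set only when memory hotplug is enabled. */
    if (info->has_hot_plug_memory) {
        monitor_printf(mon, "hot-plug-memory: %" PRIu64 "\n",
                       info->hot_plug_memory);
    }
    if (info->has_ballooned_actual_memory) {
        monitor_printf(mon, "ballooned-actual-memory: %" PRIu64 "\n",
                       info->ballooned_actual_memory);
    }
    qapi_free_MemoryInfo(info);    /* generated free function, not g_free() */
}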


Re: [Qemu-devel] [PULL 0/7] Block patches

2017-06-30 Thread Peter Maydell
On 30 June 2017 at 12:46, Stefan Hajnoczi  wrote:
> The following changes since commit 464588675455afda2899e20a0b120e4075de50c7:
>
>   Merge remote-tracking branch 'remotes/sstabellini/tags/xen-20170627-tag' 
> into staging (2017-06-29 11:45:01 +0100)
>
> are available in the git repository at:
>
>   git://github.com/stefanha/qemu.git tags/block-pull-request
>
> for you to fetch changes up to c324fd0a39cee43c13f6d1fb34f74fc5e2fb007b:
>
>   virtio-pci: use ioeventfd even when KVM is disabled (2017-06-30 11:03:45 
> +0100)
>
> 
>
> 
>
> Stefan Hajnoczi (7):
>   virtio-blk: trace vdev so devices can be distinguished
>   libqos: fix typo in virtio.h QVirtQueue->used comment
>   libqos: add virtio used ring support
>   tests: fix virtio-scsi-test ISR dependence
>   tests: fix virtio-blk-test ISR dependence
>   tests: fix virtio-net-test ISR dependence
>   virtio-pci: use ioeventfd even when KVM is disabled

Applied, thanks.

-- PMM



Re: [Qemu-devel] [PATCH v6 04/10] migration: let MigrationState be a qdev

2017-06-30 Thread Max Reitz
On 2017-06-30 15:05, Eric Blake wrote:
> On 06/30/2017 07:33 AM, Max Reitz wrote:
> 
>>> The assertion is caused by migrate_add_blocker() called before
>>> initialization of migration object. I'll fix it.
>>
>> Thanks!
>>
>>> But even with a fix (so I can pass 055 now), I still cannot pass some
>>> of the other tests. Errors I got:
>>>
>>>   https://pastebin.com/ACqbXAYd
>>>
>>> I am not familiar with iotests. Is above usual? Looks like it still
>>> includes 3 failures, and some output mismatch.
>>
>> Well, not usual. But 068 is just broken on master currently (Stefan has
>> sent "virtio: use ioeventfd in TCG and qtest mode" to fix it, and it's
>> part of his latest pull request). The failure in test 087 is because you
>> don't have aio=native enabled in your build, as the message says. :-)
> 
> We could obviously patch 087 to gracefully skip instead of fail when the
> build doesn't support running it.
> 
>>
>> I'm not sure about 118. Maybe the os.chmod() doesn't work as intended on
>> your machine...? Because it tries to open a read-only image as
>> read/write and wants to see it fail (which it doesn't in your case).
> 
> Maybe a run-as-root issue, where root can write to the file in spite of
> permissions?

That's what I had in mind, too.

>   Ideally, I'm reluctant to run testsuites as root without
> good reason (or at least a good sandbox), for fear that a bug in the
> testsuite will hose my system.

I never do. :-)

There is one test which requires it, but well, that just never gets run
on my machine (and I don't feel very sorry about it).

Max





Re: [Qemu-devel] [PATCH v8 0/9] qemu-img: add measure sub-command

2017-06-30 Thread Max Reitz
On 2017-06-14 17:35, Stefan Hajnoczi wrote:
> v8:
>  * Check 2P file size in qemu-iotests 178 [Berto]
>  * Use QCOW_MAX_L1_SIZE to check maximum virtual disk size [Berto]
> 
> v7:
>  * Check max file size with 7 exabytes [Berto]
>  * Really use UINT64_MAX everywhere instead of ~0ULL [Berto]
> 
> v6:
>  * Change bdrv_measure() return type to BlockMeasureInfo * [Eric]
>  * Clarify that holes in sparse POSIX files are still counted [Eric]
> 
> v5:
>  * Use UINT64_MAX instead of ~0ULL [Berto]
>  * Document qemu-img measure ofmt, fmt, output_fmt, and snapshot_param
>[Berto]
> 
> v4:
>  * Make qcow2 refcount calculation conservative [Maor]
>  * Include actual qemu-img convert image size in test cases
> 
> v3:
>  * Drop RFC, this is ready to go for QEMU 2.10
>  * Use "required size" instead of "required bytes" in qemu-img output for
>consistency [Nir]
>  * Clarify BlockMeasureInfo semantics [Max]
>  * Clarify bdrv_measure() opts argument and error handling [Nir]
>  * Handle -o backing_file= for qcow2 [Max]
>  * Handle snapshot options in qemu-img measure
>  * Probe input image for allocated data clusters for qcow2.  Didn't centralize
>this because there are format-specific aspects such as the cluster_size.  
> It
>may make sense to centralize it later (with a bit more complexity) if
>support is added to more formats.
>  * Add qemu-img(1) man page section for 'measure' sub-command [Max]
>  * Extend test case to cover additional scenarios [Nir]
> 
> RFCv2:
>  * Publishing RFC again to discuss the new user-visible interfaces.  Code has
>changed quite a bit, I have not kept any Reviewed-by tags.
>  * Rename qemu-img sub-command "measure" and API bdrv_measure() [Nir]
>  * Report both "required bytes" and "fully allocated bytes" to handle the 
> empty
>image file and prealloc use cases [Nir and Dan]
>  * Use bdrv_getlength() instead of bdrv_nb_sectors() [Berto]
>  * Rename "err" label "out" in qemu-img-cmds.c [Nir]
>  * Add basic qcow2 support, doesn't support qemu-img convert from existing 
> files yet
> 
> RFCv1:
>  * Publishing patch series with just raw support, no qcow2 yet.  Please review
>the command-line interface and let me know if you are happy with this
>approach.
> 
> Users and management tools sometimes need to know the size required for a new
> disk image so that an LVM volume, SAN LUN, etc can be allocated ahead of time.
> Image formats like qcow2 have non-trivial metadata that makes it hard to
> estimate the exact size without knowledge of file format internals.
> 
> This patch series introduces a new qemu-img sub-command that calculates the
> required size for both image creation and conversion scenarios.
> 
> The conversion scenario is:
> 
>   $ qemu-img measure -f raw -O qcow2 input.img
>   required size: 1327680
>   fully allocated size: 1074069504
> 
> Here an existing image file is taken and the output includes the space 
> required
> for data from the input image file.
> 
> The creation scenario is:
> 
>   $ qemu-img measure -O qcow2 --size 5G
>   required size: 327680
>   fully allocated size: 1074069504
> 
> Stefan Hajnoczi (9):
>   block: add bdrv_measure() API
>   raw-format: add bdrv_measure() support
>   qcow2: extract preallocation calculation function
>   qcow2: make refcount size calculation conservative
>   qcow2: extract image creation option parsing
>   qcow2: add bdrv_measure() support
>   qemu-img: add measure subcommand
>   qemu-iotests: support per-format golden output files
>   iotests: add test 178 for qemu-img measure
> 
>  qapi/block-core.json |  25 +++
>  include/block/block.h|   2 +
>  include/block/block_int.h|   2 +
>  block.c  |  35 
>  block/qcow2.c| 383 +--
>  block/raw-format.c   |  26 +++
>  qemu-img.c   | 234 
>  qemu-img-cmds.hx |   6 +
>  qemu-img.texi|  30 +++
>  tests/qemu-iotests/178   | 170 +
>  tests/qemu-iotests/178.out.qcow2 | 286 +
>  tests/qemu-iotests/178.out.raw   | 158 
>  tests/qemu-iotests/check |   5 +
>  tests/qemu-iotests/group |   1 +
>  14 files changed, 1268 insertions(+), 95 deletions(-)
>  create mode 100755 tests/qemu-iotests/178
>  create mode 100644 tests/qemu-iotests/178.out.qcow2
>  create mode 100644 tests/qemu-iotests/178.out.raw

The series looks good to me, but the output_fmt handling is a bit too weird
for me to take it as-is. A follow-up patch would work, too, though.

Max





[Qemu-devel] [PATCH v1 1/1] xilinx-dp: Add support for the yuy2 video format

2017-06-30 Thread Edgar E. Iglesias
From: "Edgar E. Iglesias" 

Add support for the yuy2 video format.

Signed-off-by: Edgar E. Iglesias 
Acked-by: Sai Pavan Boddu 
---
 hw/display/xlnx_dp.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/hw/display/xlnx_dp.c b/hw/display/xlnx_dp.c
index f7b7b80..a77d7db 100644
--- a/hw/display/xlnx_dp.c
+++ b/hw/display/xlnx_dp.c
@@ -624,6 +624,9 @@ static void xlnx_dp_change_graphic_fmt(XlnxDPState *s)
 case 0:
 s->v_plane.format = PIXMAN_x8b8g8r8;
 break;
+case DP_NL_VID_Y0_CB_Y1_CR:
+s->v_plane.format = PIXMAN_yuy2;
+break;
 case DP_NL_VID_RGBA8880:
 s->v_plane.format = PIXMAN_x8b8g8r8;
 break;
-- 
2.7.4




Re: [Qemu-devel] BIT_WORD(start >> TARGET_PAGE_BITS)

2017-06-30 Thread Stefan Hajnoczi
On Fri, Jun 30, 2017 at 2:02 PM, ali saeedi  wrote:
> Hello
> what does the following code do?
> 'unsigned long page = BIT_WORD(start >> TARGET_PAGE_BITS)' ?
> thanks a lot

Aside from Eric's response, I want to mention that I've noticed your
frequent questions too.

A volunteer project like QEMU can only make progress if people "give"
more than they "take".

Each question you ask takes someone's time.  Often the person
responding searches the code base for you and then replies with the
answer.

If you keep demanding time and attention without giving back to QEMU
then people notice this and they will stop responding.

I'm not trying to attack you, just letting you know what I've often
seen happen when someone wants more support than the volunteers are
willing to give.

This is an opportunity to do two things:

1. Learn to answer your own questions by reading the code, adding
printfs, using a debugger, and other techniques.  This will make you
more independent and you'll be more productive.

2. Start contributing to QEMU by reviewing patches on the mailing
list, helping users on #qemu IRC, etc.

I'd be happy to discuss how to do these things in more detail.

Stefan



Re: [Qemu-devel] [PATCH v3 2/3] tcg/aarch64: Use ADRP+ADD to compute target address

2017-06-30 Thread Pranith Kumar
On Fri, Jun 30, 2017 at 12:47 AM, Richard Henderson  wrote:
> On 06/29/2017 05:40 PM, Pranith Kumar wrote:
>>
>>   void aarch64_tb_set_jmp_target(uintptr_t jmp_addr, uintptr_t addr)
>>   {
>>   tcg_insn_unit *code_ptr = (tcg_insn_unit *)jmp_addr;
>> -tcg_insn_unit *target = (tcg_insn_unit *)addr;
>> +tcg_insn_unit i1, i2;
>> +uint64_t pair;
>>   +ptrdiff_t offset = addr - jmp_addr;
>> +
>> +if (offset == sextract64(offset, 0, 26)) {
>> +i1 = NOP;
>> +i2 = I3206_B | ((offset >> 2) & 0x3ffffff);
>
>
> Branch first, since that's the offset you calculated.
> Also, the nop need not be executed.

This is exactly how I form the instruction pair below (B+NOP, not
NOP+B). But I get your point. It is confusing to use i1 for the second
instruction. I'll change it.

>
>> +} else {
>> +offset = (addr >> 12) - (jmp_addr >> 12);
>> +
>> +/* patch ADRP */
>> +i2 = deposit32(*code_ptr++, 29, 2, offset & 0x3);
>> +i2 = deposit32(i2, 5, 19, offset >> 2);
>> +/* patch ADDI */
>> +i1 = deposit32(*code_ptr, 10, 12, addr & 0xfff);
>
>
> You can't just patch these insns, because they aren't necessarily ADRP+ADD.
> Indeed, they will very likely be B and NOP.  The first address we patch in
> is tb_jmp_reset_offset, which is the following opcode, which is definitely
> in range of the branch above.

Whoops, I totally missed that we patch these out the first time out. I
will explicitly generate the ADRP+ADD pair from here.

Thanks,
-- 
Pranith



Re: [Qemu-devel] [PATCH v8 1/9] block: add bdrv_measure() API

2017-06-30 Thread Max Reitz
On 2017-06-14 17:35, Stefan Hajnoczi wrote:
> bdrv_measure() provides a conservative maximum for the size of a new
> image.  This information is handy if storage needs to be allocated (e.g.
> a SAN or an LVM volume) ahead of time.
> 
> Signed-off-by: Stefan Hajnoczi 
> Reviewed-by: Alberto Garcia 
> ---
> v6:
>  * Change bdrv_measure() return type to BlockMeasureInfo * [Eric]
>  * Clarify that holes in sparse POSIX files are still counted [Eric]
> ---
>  qapi/block-core.json  | 25 +
>  include/block/block.h |  2 ++
>  include/block/block_int.h |  2 ++
>  block.c   | 35 +++
>  4 files changed, 64 insertions(+)

[...]

> diff --git a/block.c b/block.c
> index fa1d06d..056400a 100644
> --- a/block.c
> +++ b/block.c
> @@ -3446,6 +3446,41 @@ int64_t bdrv_get_allocated_file_size(BlockDriverState *bs)
>  return -ENOTSUP;
>  }
>  
> +/*
> + * bdrv_measure:
> + * @drv: Format driver
> + * @opts: Creation options for new image
> + * @in_bs: Existing image containing data for new image (may be NULL)
> + * @errp: Error object
> + * Returns: A #BlockMeasureInfo (free using qapi_free_BlockMeasureInfo())
> + *  or NULL on error
> + *
> + * Calculate file size required to create a new image.
> + *
> + * If @in_bs is given then space for allocated clusters and zero clusters
> + * from that image are included in the calculation.  If @opts contains a
> + * backing file that is shared by @in_bs then backing clusters are omitted
> + * from the calculation.

Well, it's a bit weird that qcow2 just doesn't do this, then. :-)

I'm not really complaining (I think it's best for qcow2 to ignore
this, at least for now), but maybe an s/are/may be/ would be more
appropriate...

(I wouldn't have complained if I hadn't replied to patch 7 anyway.)

Max

> + *
> + * If @in_bs is NULL then the calculation includes no allocated clusters
> + * unless a preallocation option is given in @opts.
> + *
> + * Note that @in_bs may use a different BlockDriver from @drv.
> + *
> + * If an error occurs the @errp pointer is set.
> + */
> +BlockMeasureInfo *bdrv_measure(BlockDriver *drv, QemuOpts *opts,
> +   BlockDriverState *in_bs, Error **errp)
> +{
> +if (!drv->bdrv_measure) {
> +error_setg(errp, "Block driver '%s' does not support size measurement",
> +   drv->format_name);
> +return NULL;
> +}
> +
> +return drv->bdrv_measure(opts, in_bs, errp);
> +}
> +
>  /**
>   * Return number of sectors on success, -errno on error.
>   */
> 
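
For reference, a rough caller-side sketch of the API, based only on the
prototype and doc comment above; the option name, the BlockMeasureInfo
field names and the error handling are assumptions here, and the real
user of this will be the qemu-img measure sub-command later in the series.

/* Needs qemu/osdep.h, block/block_int.h and qemu/error-report.h. */
static void measure_new_qcow2(int64_t virtual_size)
{
    BlockDriver *drv = bdrv_find_format("qcow2");
    QemuOpts *opts = qemu_opts_create(drv->create_opts, NULL, 0, &error_abort);
    Error *local_err = NULL;
    BlockMeasureInfo *info;

    qemu_opt_set_number(opts, BLOCK_OPT_SIZE, virtual_size, &error_abort);
    info = bdrv_measure(drv, opts, NULL /* no existing image */, &local_err);
    if (!info) {
        error_report_err(local_err);
    } else {
        /* Field names assumed from the qemu-img output in the cover letter. */
        printf("required size: %" PRId64 "\n", info->required);
        printf("fully allocated size: %" PRId64 "\n", info->fully_allocated);
        qapi_free_BlockMeasureInfo(info);
    }
    qemu_opts_del(opts);
}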






Re: [Qemu-devel] BIT_WORD(start >> TARGET_PAGE_BITS)

2017-06-30 Thread ali saeedi
Thank you so much, Eric.
Sorry for asking too many questions.
I will certainly follow your guidance.
Thanks a lot.


On Fri, Jun 30, 2017 at 5:49 PM, Eric Blake  wrote:

> On 06/30/2017 08:02 AM, ali saeedi wrote:
> > Hello
> > what does the following code do?
> > 'unsigned long page = BIT_WORD(start >> TARGET_PAGE_BITS)' ?
>
> I've noticed you've been asking a lot of questions (each as a new
> thread, rather than replying to previous answers), that seem like you
> are not trying very hard to read the code and find an answer for
> yourself.  Rather than just answer you, I'm going to try to teach you
> how to read the source and answer the question yourself.  You may also
> get better answers to your future questions if you ask in the form of
> "this code is confusing me, I think it means this, based on what I read
> at xyz, but would like confirmation or correction" (showing WHAT you
> have already researched) rather than just "what does this code do" (and
> making it feel like you are off-loading the research work onto others).
>
> First, figure out what BIT_WORD does:
>
> $ git grep 'define BIT_WORD'
>
> That should have only one hit, in include/qemu/bitops.h.  Reading it in
> context turns up no further comments, but it looks like it is
> computing the number of bits that are available in a word, and looks
> like it is defining a word to be the type most efficiently operated on
> for the current ABI (a long is 32 bits on a 32-bit OS, and 64 bits on a
> 64-bit OS).
>
> It also looks like you are scaling a start address by the number of bits
> in a target page.
>
> So it probably means you are computing the index for which page 'start'
> occurs on (depending on values, it might mean that 'start == 0x0' is
> page 0, 'start == 0x1000' is page 1, and so on), where start is
> initially a byte address and TARGET_PAGE_BITS is the number of bits
> per page.
>
> --
> Eric Blake, Principal Software Engineer
> Red Hat, Inc.   +1-919-301-3266
> Virtualization:  qemu.org | libvirt.org
>
>
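
To make the arithmetic concrete, a small stand-alone sketch; the 4 KiB
page size is an assumption, and BIT_WORD/BITS_PER_LONG simply mirror the
definitions in include/qemu/bitops.h.

#include <stdio.h>

#define TARGET_PAGE_BITS 12                       /* 4 KiB pages (example) */
#define BITS_PER_LONG    (8 * sizeof(unsigned long))
#define BIT_WORD(nr)     ((nr) / BITS_PER_LONG)

int main(void)
{
    unsigned long start = 0x42000;                /* a guest byte address */
    unsigned long page_index = start >> TARGET_PAGE_BITS;   /* page 0x42 */

    /* This is the value computed by the line in question: the index of
     * the 'long' that holds this page's bit in a bitmap. */
    unsigned long word = BIT_WORD(start >> TARGET_PAGE_BITS);

    printf("page %#lx -> bitmap word %lu\n", page_index, word);
    return 0;
}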


Re: [Qemu-devel] [PATCH 1/3] i386/msi: Correct mask of destination ID in MSI address

2017-06-30 Thread Anthony PERARD
On Thu, Jun 29, 2017 at 01:49:52AM -0400, Lan Tianyu wrote:
> From: Chao Gao 
> 
> According to SDM 10.11.1, only bits [19:12] of the MSI address are the
> Destination ID; change the mask to avoid ambiguity, since the VT-d spec
> uses bit 4 to indicate a remappable interrupt request.
> 
> Signed-off-by: Chao Gao 
> Signed-off-by: Lan Tianyu 

Reviewed-by: Anthony PERARD 

> ---
>  include/hw/i386/apic-msidef.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/include/hw/i386/apic-msidef.h b/include/hw/i386/apic-msidef.h
> index 8b4d4cc..420b411 100644
> --- a/include/hw/i386/apic-msidef.h
> +++ b/include/hw/i386/apic-msidef.h
> @@ -26,6 +26,6 @@
>  
>  #define MSI_ADDR_DEST_ID_SHIFT  12
>  #define MSI_ADDR_DEST_IDX_SHIFT 4
> -#define  MSI_ADDR_DEST_ID_MASK  0x000
> +#define  MSI_ADDR_DEST_ID_MASK  0x000ff000
>  
>  #endif /* HW_APIC_MSIDEF_H */

-- 
Anthony PERARD
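
As a small stand-alone illustration of the corrected mask (the sample
address below is made up):

#include <stdint.h>
#include <stdio.h>

#define MSI_ADDR_DEST_ID_SHIFT  12
#define MSI_ADDR_DEST_ID_MASK   0x000ff000

int main(void)
{
    uint32_t msi_addr = 0xfee1f00c;     /* example MSI address */
    uint8_t dest_id = (msi_addr & MSI_ADDR_DEST_ID_MASK)
                      >> MSI_ADDR_DEST_ID_SHIFT;

    /* Only address bits [19:12] form the destination ID (0x1f here);
     * bit 4 is left alone for the VT-d remappable-interrupt format. */
    printf("dest_id = 0x%02x\n", dest_id);
    return 0;
}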


