date:20180517

Re: [Qemu-devel] [PATCH 4/9] target/riscv: Introduce cpu_riscv_get_fcsr

2018-05-17 Thread Michael Clark

On Fri, May 11, 2018 at 3:52 PM, Richard Henderson <
richard.hender...@linaro.org> wrote:

> Cc: Michael Clark 
> Cc: Palmer Dabbelt 
> Cc: Sagar Karandikar 
> Cc: Bastian Koppelmann 
> Signed-off-by: Richard Henderson 
>

I'm not against this change but it conflicts with changes in the riscv
repo. I should post my patch queue to the list...

We have made a medium-sized change that unravels the two monolithic switch
statements in csr_read_helper and csr_write_helper into clearly decomposed
functions for modifying control and status registers, along with an
interface that allows CPUs to hook custom control and status registers.
This was done to support atomic read/modify/write of CSRs, which was not
possible with the current helpers because they are called separately:
csr_read_helper followed by csr_write_helper. Given that the only way to
modify CSRs was via the switch statements, we needed to move them out to
provide a mechanism for CSRs that wish to be truly atomic, e.g. 'mip'. The
CSR instructions are defined in The RISC-V Instruction Set Manual Volume I:
User-Level ISA Document Version 2.2 as "atomic" instructions:

- CSRRW (Atomic Read/Write CSR)
- CSRRS (Atomic Read and Set Bits in CSR)
- CSRRC (Atomic Read and Clear Bits in CSR)

We have thus changed QEMU to allow truly atomic CSR implementations. The
new implementation replaces the compiler-generated switch code (a
compare/branch sequence or a jump table, over a sparse CSR address space)
with a single array of function pointers, i.e. a load followed by an
indirect jump. Along with this change we have also renamed functions in
target/riscv to use the riscv_ prefix and added a public interface to hook
custom CSRs. The CSR changes will allow out-of-tree code to hook custom
CSRs without needing to change target/riscv code.
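
To illustrate the table-driven dispatch described above, here is a minimal
standalone sketch. It is not the code in the riscv tree: DemoEnv,
demo_csr_op_fn, demo_csr_ops and demo_csr_rmw are hypothetical names, and
only the CSR number of 'mip' (0x344) is taken from the privileged spec.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_CSR_COUNT 4096          /* CSR numbers are 12 bits wide */
#define DEMO_CSR_MIP   0x344         /* machine interrupt-pending register */

/* Hypothetical stand-in for the CPU state; only 'mip' is modelled. */
typedef struct DemoEnv {
    uint64_t mip;
} DemoEnv;

/*
 * One combined operation per CSR: return the old value and replace the
 * bits selected by write_mask with the matching bits of new_value.
 * A write_mask of zero makes this a pure read.
 */
typedef int (*demo_csr_op_fn)(DemoEnv *env, int csrno, uint64_t *old_value,
                              uint64_t new_value, uint64_t write_mask);

static int demo_rmw_mip(DemoEnv *env, int csrno, uint64_t *old_value,
                        uint64_t new_value, uint64_t write_mask)
{
    (void)csrno;
    /* A real implementation would perform this step with an atomic primitive. */
    *old_value = env->mip;
    env->mip = (env->mip & ~write_mask) | (new_value & write_mask);
    return 0;
}

/* Sparse table: entries for unimplemented CSRs stay NULL. */
static demo_csr_op_fn demo_csr_ops[DEMO_CSR_COUNT] = {
    [DEMO_CSR_MIP] = demo_rmw_mip,
};

/* Dispatch is a load plus an indirect call; -1 stands in for an illegal access. */
static int demo_csr_rmw(DemoEnv *env, int csrno, uint64_t *old_value,
                        uint64_t new_value, uint64_t write_mask)
{
    assert(old_value != NULL);
    if (csrno < 0 || csrno >= DEMO_CSR_COUNT || demo_csr_ops[csrno] == NULL) {
        return -1;
    }
    return demo_csr_ops[csrno](env, csrno, old_value, new_value, write_mask);
}
```

With a single entry point like this, CSRRW would pass an all-ones mask,
CSRRS would pass rs1 as both value and mask, and CSRRC would pass zero as
the value and rs1 as the mask.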

- riscv_cpu_ won over cpu_riscv_ given the number of functions conforming
with the former riscv_ prefix and the desire for consistency in target/riscv

In the riscv tree we now have riscv_csr_read(env, CSR_FCSR)
and riscv_csr_write(env, CSR_FCSR, fcsr) as the method to read and write
the composite. There is also a user in linux-user/riscv/signal.c that
should probably use the new interface. We could change
linux-user/riscv/signal.c to use your new interface; however, your interface
only provides a read method and no write method, so the write interface
remains in the (current) big CSR switch statement, leaving an inconsistency
between the encapsulation of read and write. We currently have the new fcsr
read and write encapsulated in static functions read_fcsr and write_fcsr in
a new csr module (which should perhaps be called csr_helper.c).
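
As a rough illustration of what a symmetric read/write pair for the
composite looks like, here is a standalone sketch. It assumes the
architectural fcsr layout (fflags in bits 4:0, frm in bits 7:5); DemoFpState
and the demo_* names are hypothetical, and the real code derives fflags from
the softfloat status rather than storing it directly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_FFLAGS_SHIFT 0
#define DEMO_FFLAGS_MASK  0x1fu      /* accrued exception flags, 5 bits */
#define DEMO_FRM_SHIFT    5
#define DEMO_FRM_MASK     0x7u       /* rounding mode, 3 bits */

/* Hypothetical stand-in for the floating-point part of the CPU state. */
typedef struct DemoFpState {
    uint32_t fflags;
    uint32_t frm;
} DemoFpState;

/* Compose the fcsr value from its two fields. */
static uint32_t demo_read_fcsr(const DemoFpState *fp)
{
    assert(fp != NULL);
    return ((fp->fflags & DEMO_FFLAGS_MASK) << DEMO_FFLAGS_SHIFT)
         | ((fp->frm & DEMO_FRM_MASK) << DEMO_FRM_SHIFT);
}

/* Split an fcsr value back into its fields; bits above bit 7 are dropped. */
static void demo_write_fcsr(DemoFpState *fp, uint32_t fcsr)
{
    assert(fp != NULL);
    fp->fflags = (fcsr >> DEMO_FFLAGS_SHIFT) & DEMO_FFLAGS_MASK;
    fp->frm = (fcsr >> DEMO_FRM_SHIFT) & DEMO_FRM_MASK;
}
```

Keeping both directions next to each other is what lets a caller such as
the signal code save and restore the composite through one interface.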

See:

- https://github.com/riscv/riscv-qemu/commits/qemu-2.13-for-upstream
- https://github.com/riscv/riscv-qemu/commit/0783ce5ea580552b1f8e2f16a3e3cc1af19db69b
- https://github.com/riscv/riscv-qemu/commit/fa17549fbc726e83a3c163b1534c7465147c6718

> ---
>  target/riscv/cpu.h| 1 +
>  target/riscv/fpu_helper.c | 6 ++
>  target/riscv/op_helper.c  | 3 +--
>  3 files changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
> index 34abc383e3..f2bc243b95 100644
> --- a/target/riscv/cpu.h
> +++ b/target/riscv/cpu.h
> @@ -265,6 +265,7 @@ void QEMU_NORETURN do_raise_exception_err(CPURISCVState
> *env,
>uint32_t exception, uintptr_t
> pc);
>
>  target_ulong cpu_riscv_get_fflags(CPURISCVState *env);
> +target_ulong cpu_riscv_get_fcsr(CPURISCVState *env);
>  void cpu_riscv_set_fflags(CPURISCVState *env, target_ulong);
>
>  #define TB_FLAGS_MMU_MASK  3
> diff --git a/target/riscv/fpu_helper.c b/target/riscv/fpu_helper.c
> index abbadead5c..41c7352115 100644
> --- a/target/riscv/fpu_helper.c
> +++ b/target/riscv/fpu_helper.c
> @@ -37,6 +37,12 @@ target_ulong cpu_riscv_get_fflags(CPURISCVState *env)
>  return hard;
>  }
>
> +target_ulong cpu_riscv_get_fcsr(CPURISCVState *env)
> +{
> +return (cpu_riscv_get_fflags(env) << FSR_AEXC_SHIFT)
> + | (env->frm << FSR_RD_SHIFT);
> +}
> +
>  void cpu_riscv_set_fflags(CPURISCVState *env, target_ulong hard)
>  {
>  int soft = 0;
> diff --git a/target/riscv/op_helper.c b/target/riscv/op_helper.c
> index 3abf52453c..fd2d8c0a9d 100644
> --- a/target/riscv/op_helper.c
> +++ b/target/riscv/op_helper.c
> @@ -423,8 +423,7 @@ target_ulong csr_read_helper(CPURISCVState *env,
> target_ulong csrno)
>  return env->frm;
>  case CSR_FCSR:
>  validate_mstatus_fs(env, GETPC());
> -return (cpu_riscv_get_fflags(env) << FSR_AEXC_SHIFT)
> -| (env->frm << FSR_RD_SHIFT);
> +return cpu_riscv_get_fcsr(env);
>  /* rdtime/rdtimeh is trapped and emulated by bbl in system mode */
>  #ifdef CONFIG_USER_ONLY
>  case CSR_TIME:
> --
> 2.17.0
>
>

Re: [Qemu-devel] [PATCH] RISC-V: make it possible to alter default reset vector

2018-05-17 Thread Michael Clark

On Tue, May 8, 2018 at 9:08 AM, Antony Pavlov 
wrote:

> The RISC-V Instruction Set Manual, Volume II:
> Privileged Architecture, Version 1.10 states
> that upon reset the pc is set to
> an implementation-defined reset vector
> (see chapter 3.3 Reset).
>
> This patch makes it possible to alter default
> reset vector by setting "rstvec" property
> for TYPE_RISCV_HART_ARRAY.
>

This one needs some thought. We have already made it possible for a CPU
class to override the reset vector, with consideration of this exact use
case.

The idea with the current approach is that you instantiate your specific
CPU model (target/riscv) in the hardware machine model (hw/riscv) and the
reset vector is a property of the CPU model you are using on your machine,
which is how it is now.

RISCVHartArray needs some work. First, it is in 'hw/riscv', not
'target/riscv'. Secondly, RISCVHartArray is commented as "heterogenous"
(nevertheless the current implementation is "homogeneous", so apologies if
this misled you). The intention is that RISCVHartArray can construct a
heterogeneous array of different cpu models with some topology. The current
shortcoming of homogeneity is in the current constructor, which has only one
model property. The SiFive U54-MC for example has 5 cores: an 'e51' no-MMU
monitor core and 4 'u54' application cores, e.g. "e51,u54,u54,u54,u54". The
reset vector should be a property of each CPU, given they can be different
models with different reset vectors. RISCVHartArray is a work in progress.
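
A sketch of what "reset vector as a property of each CPU model" could mean
for a model list such as "e51,u54,u54,u54,u54" follows. This is not QEMU
code: DemoCpuModel, demo_models and demo_build_harts are hypothetical, and
the reset vector values are placeholders, not taken from any datasheet.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical per-model description; resetvec values are placeholders. */
typedef struct DemoCpuModel {
    const char *name;
    uint64_t resetvec;
} DemoCpuModel;

static const DemoCpuModel demo_models[] = {
    { "e51", 0x1000 },
    { "u54", 0x2000 },
};

/* Look up a model by a name that is given as pointer plus length. */
static const DemoCpuModel *demo_find_model(const char *name, size_t len)
{
    for (size_t i = 0; i < sizeof(demo_models) / sizeof(demo_models[0]); i++) {
        if (strlen(demo_models[i].name) == len &&
            memcmp(demo_models[i].name, name, len) == 0) {
            return &demo_models[i];
        }
    }
    return NULL;
}

/*
 * Resolve a comma-separated model list into per-hart model pointers.
 * Returns the number of harts, or -1 on an unknown model or overflow.
 */
static int demo_build_harts(const char *list, const DemoCpuModel **harts,
                            size_t max_harts)
{
    size_t n = 0;
    const char *p = list;

    assert(list != NULL && harts != NULL);
    while (*p != '\0') {
        size_t len = strcspn(p, ",");
        const DemoCpuModel *m = demo_find_model(p, len);

        if (m == NULL || n == max_harts) {
            return -1;
        }
        harts[n++] = m;
        p += len;
        if (*p == ',') {
            p++;
        }
    }
    return (int)n;
}
```

Each hart then reads its reset vector from its own model, so nothing on the
array itself has to carry a single value for all harts.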

Some background on why RISCVHartArray exists: it is a placeholder that will
be expanded to add support for configuration of "heterogeneous" core
complexes. There is some periphery that needs to reflect on the CPU array
properties. The SiFivePLIC memory layout is dependent on the cores in the
"heterogeneous" core complex. We haven't yet wired the PLIC to RISCVHartArray
to get topology information, so it currently uses a mode list property
"M,MS,MS,MS,MS". The idea is that it can eventually reflect on topology
configuration from the RISCVHartArray. Example: the 'e51' coreplex in the
U54-MC does not support S-mode but the 'u54' coreplex application cores do,
and the interrupt controller memory map is dependent on the topology. Our
idea was to re-use RISCVHartArray logic for instantiating "heterogeneous"
core complexes as part of an SOC using some configuration, e.g. an array of
cpu models. It's likely that individual CPUs will have different modes,
different reset vectors, different extensions, etc.
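
To make the "reflect on topology" idea concrete, here is a small sketch
that derives the PLIC mode list from per-hart properties instead of
hardcoding the string. DemoHart and demo_plic_mode_list are hypothetical
names, not an existing QEMU interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-hart description: every hart has M-mode, some have S-mode. */
typedef struct DemoHart {
    const char *model;
    bool has_s_mode;
} DemoHart;

/*
 * Write a mode list such as "M,MS,MS" into buf.
 * Returns 0 on success, -1 if buf is too small.
 */
static int demo_plic_mode_list(const DemoHart *harts, size_t n,
                               char *buf, size_t buflen)
{
    size_t used = 0;

    assert(harts != NULL && buf != NULL);
    for (size_t i = 0; i < n; i++) {
        const char *modes = harts[i].has_s_mode ? "MS" : "M";
        size_t mlen = strlen(modes);
        size_t need = mlen + (i > 0 ? 1 : 0);

        /* Keep one byte in reserve for the terminating NUL. */
        if (used + need + 1 > buflen) {
            return -1;
        }
        if (i > 0) {
            buf[used++] = ',';
        }
        memcpy(buf + used, modes, mlen);
        used += mlen;
    }
    if (used + 1 > buflen) {
        return -1;
    }
    buf[used] = '\0';
    return 0;
}
```

For one 'e51' hart without S-mode followed by four 'u54' harts with S-mode,
this produces the same "M,MS,MS,MS,MS" string that is currently passed as a
property.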

Q1. Which CPU are you actually modelling?
Q2. Can you achieve your goal by defining your own CPU? I believe yes.
Q3. What object should have the property of a reset vector?

I believe the answer to Q3 is a specific cpu model, which is how we have
it, not a heterogenous array where some CPUs may indeed have different
reset vectors.

Apologies if RISCVHartArray gave you the idea the harts were homogeneous.
I believe moving a per-cpu property onto the array is not the right
thing to do.

This leads into an RFC that I need to write on modelling dynamically
reconfigurable hardware models that can be produced by SiFive's core
generator. We want an interface that is kinder to the user than complex
command line options, or than approaches deemed inappropriate such as
inferring topology from a device-tree passed with -dtb, e.g. an SOC class
that instantiates its cpu cores and hardware blocks as defined in the
device-tree at the given memory addresses with the given interrupt routing,
etc. (we understand that this may work for some simple configurations, but
not for more complex configurations). The reset vector is a good example of
a property that is not available in device-tree. In any case the RFC on
configuration models for dynamically reconfigurable hardware will be
another email; this serves as context for it, i.e. where a property
should be located. In fact we should fix the RISCVHartArray constructor or
possibly move the class altogether until we have a good model for
constructing topology without hardcoding it in SOC structures. For
SiFive's use of QEMU, hardcoding would be a combinatorial explosion, given
the combinations of cores, extensions and blocks that can be generated by
their core generator, and that we wish to model in QEMU or some derivative,
if we have to maintain reconfigurable hardware support in a SiFive tree.
I'll leave the RFC proper for another email. This is just an abstract.

BTW - there are plenty of others you can get to accept this patch ;-) See
the Cc list.

> Signed-off-by: Antony Pavlov
> Cc: Michael Clark 
> Cc: Palmer Dabbelt 
> Cc: Sagar Karandikar 
> Cc: Bastian Koppelmann 
> Cc: Peter Crosthwaite 
> Cc: Peter Maydell 
> ---
>  hw/riscv/riscv_hart.c |  3 +++
>  include/hw/riscv/riscv_hart.h |  1 +
>

Re: [Qemu-devel] [PATCH 4/9] target/riscv: Introduce cpu_riscv_get_fcsr

2018-05-17 Thread Richard Henderson

On 05/17/2018 07:46 PM, Michael Clark wrote:
> 
> 
> On Fri, May 11, 2018 at 3:52 PM, Richard Henderson wrote:
> 
> Cc: Michael Clark
> Cc: Palmer Dabbelt
> Cc: Sagar Karandikar
> Cc: Bastian Koppelmann
> Signed-off-by: Richard Henderson
> 
> 
> I'm not against this change but it conflicts with changes in the riscv repo. I
> should post my patch queue to the list...

Ok, I'll drop this for now, and the dump of the FCSR in the next patch.
To be revisited once your csr reorg is upstream...


r~

Re: [Qemu-devel] [PATCH v2 01/10] intel-iommu: send PSI always even if across PDEs

2018-05-17 Thread Peter Xu

On Thu, May 17, 2018 at 04:42:54PM +0200, Auger Eric wrote:
> Hi Peter,
> 
> On 05/04/2018 05:08 AM, Peter Xu wrote:
> > During IOVA page table walking, there is a special case when the PSI
> > covers one whole PDE (Page Directory Entry, which contains 512 Page
> > Table Entries) or more.  In the past, we skip that entry and we don't
> > notify the IOMMU notifiers.  This is not correct.  We should send UNMAP
> > notification to registered UNMAP notifiers in this case.
> > 
> > For UNMAP only notifiers, this might cause IOTLBs cached in the devices
> > even if they were already invalid.  For MAP/UNMAP notifiers like
> > vfio-pci, this will cause stale page mappings.
> > 
> > This special case doesn't trigger often, but it is very easy to be
> > triggered by nested device assignments, since in that case we'll
> > possibly map the whole L2 guest RAM region into the device's IOVA
> > address space (several GBs at least), which is far bigger than normal
> > kernel driver usages of the device (tens of MBs normally).
> > 
> > Without this patch applied to L1 QEMU, nested device assignment to L2
> > guests will dump some errors like:
> > 
> > qemu-system-x86_64: VFIO_MAP_DMA: -17
> > qemu-system-x86_64: vfio_dma_map(0x557305420c30, 0xad000, 0x1000,
> > 0x7f89a920d000) = -17 (File exists)
> > 
> > Acked-by: Jason Wang 
> > [peterx: rewrite the commit message]
> > Signed-off-by: Peter Xu 
> > ---
> >  hw/i386/intel_iommu.c | 42 ++
> >  1 file changed, 30 insertions(+), 12 deletions(-)
> > 
> > diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> > index fb31de9416..b359efd6f9 100644
> > --- a/hw/i386/intel_iommu.c
> > +++ b/hw/i386/intel_iommu.c
> > @@ -722,6 +722,15 @@ static int vtd_iova_to_slpte(VTDContextEntry *ce, 
> > uint64_t iova, bool is_write,
> >  
> >  typedef int (*vtd_page_walk_hook)(IOMMUTLBEntry *entry, void *private);
> >  
> > +static int vtd_page_walk_one(IOMMUTLBEntry *entry, int level,
> > + vtd_page_walk_hook hook_fn, void *private)
> I find the function  name a bit weird as it does not does a ptw but
> rather call a callback on an entry. vtd_callback_wrapper?

It's a hook for the page walk process, and IMHO vtd_callback_wrapper
does not really provide any hint of the page walking.  So even if you
prefer the "callback_wrapper" naming I would still prefer:

  vtd_page_walk_callback[_wrapper]

though if so, I don't see much benefit compared to using the old
vtd_page_walk_hook, which seems fine to me too...

> > +{
> > +assert(hook_fn);
> > +trace_vtd_page_walk_one(level, entry->iova, entry->translated_addr,
> > +entry->addr_mask, entry->perm);
> > +return hook_fn(entry, private);
> > +}
> > +
> >  /**
> >   * vtd_page_walk_level - walk over specific level for IOVA range
> >   *
> > @@ -781,28 +790,37 @@ static int vtd_page_walk_level(dma_addr_t addr, 
> > uint64_t start,
> >   */
> >  entry_valid = read_cur | write_cur;
> >  
> > +entry.target_as = &address_space_memory;
> > +entry.iova = iova & subpage_mask;
> > +entry.perm = IOMMU_ACCESS_FLAG(read_cur, write_cur);
> > +entry.addr_mask = ~subpage_mask;
> > +
> >  if (vtd_is_last_slpte(slpte, level)) {
> > -entry.target_as = &address_space_memory;
> > -entry.iova = iova & subpage_mask;
> >  /* NOTE: this is only meaningful if entry_valid == true */
> >  entry.translated_addr = vtd_get_slpte_addr(slpte, aw);
> > -entry.addr_mask = ~subpage_mask;
> > -entry.perm = IOMMU_ACCESS_FLAG(read_cur, write_cur);
> >  if (!entry_valid && !notify_unmap) {
> >  trace_vtd_page_walk_skip_perm(iova, iova_next);
> >  goto next;
> >  }
> > -trace_vtd_page_walk_one(level, entry.iova, 
> > entry.translated_addr,
> > -entry.addr_mask, entry.perm);
> > -if (hook_fn) {
> > -ret = hook_fn(&entry, private);
> > -if (ret < 0) {
> > -return ret;
> > -}
> > +ret = vtd_page_walk_one(&entry, level, hook_fn, private);
> > +if (ret < 0) {
> > +return ret;
> >  }
> >  } else {
> >  if (!entry_valid) {
> > -trace_vtd_page_walk_skip_perm(iova, iova_next);
> > +if (notify_unmap) {
> > +/*
> > + * The whole entry is invalid; unmap it all.
> > + * Translated address is meaningless, zero it.
> > + */
> > +entry.translated_addr = 0x0;
> do you really need to zero the translated_addr and the related comment.
> As soon as perm is NONE this should not be used?

Yes here we can avoid setting it.  However that'll make sure we

Re: [Qemu-devel] [PATCH v4 29/49] tests/tcg/arm: disable -p 32768 mmap test

2018-05-17 Thread Philippe Mathieu-Daudé

On 05/17/2018 06:34 PM, Richard Henderson wrote:
> On 05/17/2018 02:24 PM, Alex Bennée wrote:
>>
>> Richard Henderson  writes:
>>
>>> On 05/17/2018 10:46 AM, Alex Bennée wrote:
>>>> Broken since I updated to 18.04
>>>>
>>>> Signed-off-by: Alex Bennée
>>>> ---
>>>>  tests/tcg/arm/Makefile.target | 8
>>>>  1 file changed, 8 insertions(+)
>>>
>>> Meh.  Most of these fail for hosts with 64k pages.
>>> So, sure, disable this one, but I don't think that
>>> the others are useful either.
>>
>> I'm not entirely sure what the point of -p is meant to be. Is it just a
>> performance hack for linux-user to have bigger pages? We are not using
>> softmmu but I guess it affects the PageDesc structures?
> 
> I think it was just meant for testing, but I really have no idea.
> 
> If we actually had better support for mismatched host/guest page sizes, then
> one could view -p as a way to choose between legitimate guest page sizes.  
> E.g.
> 8k, 16k, 64k are all legitimate for aarch64.

8k + 16k on aarch64:
Tested-by: Philippe Mathieu-Daudé

Re: [Qemu-devel] [PATCH v2 04/10] intel-iommu: only do page walk for MAP notifiers

2018-05-17 Thread Peter Xu

On Thu, May 17, 2018 at 03:39:50PM +0200, Auger Eric wrote:
> Hi Peter,
> 
> On 05/04/2018 05:08 AM, Peter Xu wrote:
> > For UNMAP-only IOMMU notifiers, we don't really need to walk the page
> s/really// ;-)

Ok.

> > tables.  Fasten that procedure by skipping the page table walk.  That
> > should boost performance for UNMAP-only notifiers like vhost.
> > 
> > Signed-off-by: Peter Xu 
> > ---
> >  include/hw/i386/intel_iommu.h |  2 ++
> >  hw/i386/intel_iommu.c | 43 +++
> >  2 files changed, 40 insertions(+), 5 deletions(-)
> > 
> > diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
> > index ee517704e7..9e0a6c1c6a 100644
> > --- a/include/hw/i386/intel_iommu.h
> > +++ b/include/hw/i386/intel_iommu.h
> > @@ -93,6 +93,8 @@ struct VTDAddressSpace {
> >  IntelIOMMUState *iommu_state;
> >  VTDContextCacheEntry context_cache_entry;
> >  QLIST_ENTRY(VTDAddressSpace) next;
> > +/* Superset of notifier flags that this address space has */
> > +IOMMUNotifierFlag notifier_flags;
> >  };
> >  
> >  struct VTDBus {
> > diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> > index 112971638d..9a418abfb6 100644
> > --- a/hw/i386/intel_iommu.c
> > +++ b/hw/i386/intel_iommu.c
> > @@ -138,6 +138,12 @@ static inline void vtd_iommu_unlock(IntelIOMMUState *s)
> >  qemu_mutex_unlock(&s->iommu_lock);
> >  }
> >  
> > +/* Whether the address space needs to notify new mappings */
> > +static inline gboolean vtd_as_notify_mappings(VTDAddressSpace *as)
> would suggest vtd_as_has_map_notifier()? But tastes & colours ;-)

Yeah it is.  But okay, I can switch to that, especially since it's only
used in this patch and it's new.

> > +{
> > +return as->notifier_flags & IOMMU_NOTIFIER_MAP;
> > +}
> > +
> >  /* GHashTable functions */
> >  static gboolean vtd_uint64_equal(gconstpointer v1, gconstpointer v2)
> >  {
> > @@ -1433,14 +1439,35 @@ static void 
> > vtd_iotlb_page_invalidate_notify(IntelIOMMUState *s,
> >  VTDAddressSpace *vtd_as;
> >  VTDContextEntry ce;
> >  int ret;
> > +hwaddr size = (1 << am) * VTD_PAGE_SIZE;
> >  
> >  QLIST_FOREACH(vtd_as, &(s->notifiers_list), next) {
> >  ret = vtd_dev_to_context_entry(s, pci_bus_num(vtd_as->bus),
> > vtd_as->devfn, &ce);
> >  if (!ret && domain_id == VTD_CONTEXT_ENTRY_DID(ce.hi)) {
> > -vtd_page_walk(&ce, addr, addr + (1 << am) * VTD_PAGE_SIZE,
> > -  vtd_page_invalidate_notify_hook,
> > -  (void *)&vtd_as->iommu, true, s->aw_bits);
> > +if (vtd_as_notify_mappings(vtd_as)) {
> > +/*
> > + * For MAP-inclusive notifiers, we need to walk the
> > + * page table to sync the shadow page table.
> > + */
> Potentially we may have several notifiers attached to the IOMMU MR ~
> vtd_as, each of them having different flags. Those flags are OR'ed in
> memory_region_update_iommu_notify_flags and this is the one you now
> store in the vtd_as. So maybe your comment may rather state:
> as soon as we have at least one MAP notifier, we need to do the PTW?

Actually this is not 100% clear either, since all the "MAP notifiers" are
actually both MAP+UNMAP notifiers...  Maybe:

  As long as we have MAP notifications registered in any of our IOMMU
  notifiers, we need to sync the shadow page table.

> 
> nit: not related to this patch: vtd_page_walk kerneldoc comments misses
> @notify_unmap param comment
> side note: the name of the hook is a bit misleading as it suggests we
> invalidate the entry, whereas we update any valid entry and invalidate
> stale ones (if notify_unmap=true)?
> > +vtd_page_walk(&ce, addr, addr + size,
> > +  vtd_page_invalidate_notify_hook,
> > +  (void *)&vtd_as->iommu, true, s->aw_bits);
> > +} else {
> > +/*
> > + * For UNMAP-only notifiers, we don't need to walk the
> > + * page tables.  We just deliver the PSI down to
> > + * invalidate caches.
> 
> We just unmap the range?

Isn't it the same thing? :)

To be explicit: here we know that only UNMAP notifications are
registered, so it is not really an "unmap"; it is cache invalidation
only.
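
For reference, the arithmetic behind the invalidation range can be sketched
as follows. This is a standalone illustration, not the patch code:
DemoTlbEntry and demo_psi_to_unmap are hypothetical names, and a 4K base
page size is assumed.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_PAGE_SIZE 4096ULL       /* assumed 4K base page */

/* Hypothetical cut-down version of an IOMMU TLB entry. */
typedef struct DemoTlbEntry {
    uint64_t iova;
    uint64_t translated_addr;
    uint64_t addr_mask;
    int perm;                        /* 0 stands in for "no access" */
} DemoTlbEntry;

/*
 * Build the entry for a page-selective invalidation that covers
 * 2^am pages starting at addr, without walking any page table.
 */
static DemoTlbEntry demo_psi_to_unmap(uint64_t addr, unsigned am)
{
    assert(am < 52);                 /* keep the shifted size within 64 bits */
    uint64_t size = (1ULL << am) * DEMO_PAGE_SIZE;
    DemoTlbEntry entry = {
        .iova = addr,
        .translated_addr = 0,        /* meaningless when perm is "none" */
        .addr_mask = size - 1,
        .perm = 0,
    };
    return entry;
}
```

With am = 9 the range is 512 pages, i.e. 2 MiB, so addr_mask is 0x1fffff.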

> > + */
> > +IOMMUTLBEntry entry = {
> > +.target_as = &address_space_memory,
> > +.iova = addr,
> > +.translated_addr = 0,
> > +.addr_mask = size - 1,
> > +.perm = IOMMU_NONE,
> > +};
> > +memory_region_notify_iommu(&vtd_as->iommu, entry);
> > +}
> >  }
> >  }
> >  }
> > @@ -2380,6 +2407,9 @@ static void 
> > vtd_iommu_notify_flag_changed(IOMMUMemoryRegion *iommu,
> >  exit(1);
> >  }
> >  
> > +/* Update

Re: [Qemu-devel] [PATCH v4 32/49] tests/tcg/arm: add fcvt test cases for AArch32/64

2018-05-17 Thread Philippe Mathieu-Daudé

On 05/17/2018 05:30 PM, Richard Henderson wrote:
> On 05/17/2018 10:47 AM, Alex Bennée wrote:
>> This runs through the usual float to float conversions and crucially
>> also runs with ARM Alternative Half Precision Format.
>>
>> Signed-off-by: Alex Bennée 
>> [rth: tweak vcvtb.f16.f32/vctb.f32.f16 code and regen]
>> Signed-off-by: Richard Henderson 
>>
>> ---
>> v4
>>   - add fcvt.ref and check results against it
>>   - fix single_to_half, single_to_double conversions
>>   - properly toggle AHP mode (fpsr->fpcr)
>>   - more values around the AHP margins
>>   - add INF/NAN/SNAN inputs
>>   - build for ARM and AArch64
>>   - fix bug for hex literals
>>   - add float-to-int
>>   - checkpatch fix
>> ---
> 
> Reviewed-by: Richard Henderson 

Thanks Richard, I was not very motivated to review each line of the
fcvt.ref files =)

> diff --git a/tests/tcg/arm/Makefile.target b/tests/tcg/arm/Makefile.target
> index 9d2b551732..7bb777f442 100644
> --- a/tests/tcg/arm/Makefile.target
> +++ b/tests/tcg/arm/Makefile.target
> @@ -8,7 +8,9 @@ ARM_SRC=$(SRC_PATH)/tests/tcg/arm
>  # Set search path for all sources
>  VPATH+= $(ARM_SRC)
>
> -TESTS += hello-arm test-arm-iwmmxt
> +ARM_TESTS=hello-arm test-arm-iwmmxt
> +
> +TESTS += $(ARM_TESTS) fcvt
>
>  hello-arm: CFLAGS+=-marm -ffreestanding
>  hello-arm: LDFLAGS+=-nostdlib
> @@ -24,3 +26,14 @@ run-test-mmap: test-mmap
>   $(call quiet-command, $(QEMU) -p 8192 $< 8192 > test-mmap-8192.out,
"TEST", "$< (8k pages) on $(TARGET_NAME)")
>   $(call quiet-command, $(QEMU) -p 16384 $< 16384 >
test-mmap-16384.out, "TEST", "$< (16k pages) on $(TARGET_NAME)")
>  endif
> +
> +ifeq ($(TARGET_NAME), arm)
> +fcvt: LDFLAGS+=-lm
> +# fcvt: CFLAGS+=-march=armv8.2-a+fp16 -mfpu=neon-fp-armv8

Alex, what is your idea here, enable this later?
Maybe add a TODO comment here...

Except for this nit, for both Makefile.target files:
Reviewed-by: Philippe Mathieu-Daudé 

> +
> +run-fcvt: fcvt
> + $(call quiet-command, \
> + $(QEMU) $< > fcvt.out && \
> + diff -u $(ARM_SRC)/fcvt.ref fcvt.out, \
> + "TEST", "$< (default) on $(TARGET_NAME)")
> +endif

$ make -j1 run-tcg-tests-aarch64-linux-user
  BUILD   debian9
  BUILD   debian-arm64-cross
  CROSS-BUILD aarch64 guest-tests with docker qemu:debian-arm64-cross
  BUILD   debian9
  BUILD   debian-arm64-cross
  CROSS-BUILD aarch64 guest-tests with docker qemu:debian-arm64-cross
  RUN-TESTS for aarch64
  TEST    test-mmap (default) on aarch64
  TEST    test-mmap (8k pages) on aarch64
  TEST    test-mmap (16k pages) on aarch64
  TEST    test-mmap (32k pages) on aarch64
  TEST    sha1 on aarch64
  TEST    linux-test on aarch64
  TEST    testthread on aarch64
  TEST    fcvt (default) on aarch64
  TEST    sysregs on aarch64

\o/

Tested-by: Philippe Mathieu-Daudé 





Re: [Qemu-devel] [PATCH v2 03/10] intel-iommu: add iommu lock

2018-05-17 Thread Peter Xu

On Thu, May 17, 2018 at 04:32:52PM +0200, Auger Eric wrote:
> Hi Peter,
> 
> On 05/04/2018 05:08 AM, Peter Xu wrote:
> > Add a per-iommu big lock to protect IOMMU status.  Currently the only
> > thing to be protected is the IOTLB/context cache, since that can be
> > accessed even without BQL, e.g., in IO dataplane.
> 
> As discussed together, Peter challenged per device mutex in
> "Re: [PATCH v11 15/17] hw/arm/smmuv3: Cache/invalidate config data"
> https://lists.gnu.org/archive/html/qemu-devel/2018-04/msg02403.html
> but at that time I fail to justify the use :-(
> 
> > 
> > Note that we don't need to protect device page tables since that's fully
> > controlled by the guest kernel.  However there is still possibility that
> > malicious drivers will program the device to not obey the rule.  In that
> > case QEMU can't really do anything useful, instead the guest itself will
> > be responsible for all uncertainties.
> > 
> > Reported-by: Fam Zheng 
> > Signed-off-by: Peter Xu 
> > ---
> >  include/hw/i386/intel_iommu.h |  6 +
> >  hw/i386/intel_iommu.c | 43 +++
> >  2 files changed, 44 insertions(+), 5 deletions(-)
> > 
> > diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
> > index 220697253f..ee517704e7 100644
> > --- a/include/hw/i386/intel_iommu.h
> > +++ b/include/hw/i386/intel_iommu.h
> > @@ -300,6 +300,12 @@ struct IntelIOMMUState {
> >  OnOffAuto intr_eim; /* Toggle for EIM cabability */
> >  bool buggy_eim; /* Force buggy EIM unless eim=off */
> >  uint8_t aw_bits;/* Host/IOVA address width (in bits) */
> > +
> > +/*
> > + * Protects IOMMU states in general.  Currently it protects the
> > + * per-IOMMU IOTLB cache, and context entry cache in VTDAddressSpace.
> > + */
> > +QemuMutex iommu_lock;
> >  };
> >  
> >  /* Find the VTD Address space associated with the given bus pointer,
> > diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> > index 5987b48d43..112971638d 100644
> > --- a/hw/i386/intel_iommu.c
> > +++ b/hw/i386/intel_iommu.c
> > @@ -128,6 +128,16 @@ static uint64_t 
> > vtd_set_clear_mask_quad(IntelIOMMUState *s, hwaddr addr,
> >  return new_val;
> >  }
> >  
> > +static inline void vtd_iommu_lock(IntelIOMMUState *s)
> > +{
> > +qemu_mutex_lock(&s->iommu_lock);
> > +}
> > +
> > +static inline void vtd_iommu_unlock(IntelIOMMUState *s)
> > +{
> > +qemu_mutex_unlock(&s->iommu_lock);
> > +}
> > +
> >  /* GHashTable functions */
> >  static gboolean vtd_uint64_equal(gconstpointer v1, gconstpointer v2)
> >  {
> > @@ -172,7 +182,7 @@ static gboolean vtd_hash_remove_by_page(gpointer key, 
> > gpointer value,
> >  }
> >  
> >  /* Reset all the gen of VTDAddressSpace to zero and set the gen of
> > - * IntelIOMMUState to 1.
> > + * IntelIOMMUState to 1.  Must be with IOMMU lock held.
> s/Must be/ Must be called ?

Ok.

> not done in vtd_init()

IMHO we can omit that since it's only used during either realization
or system reset.  But sure I can add it too.

> >   */
> >  static void vtd_reset_context_cache(IntelIOMMUState *s)
> >  {
> > @@ -197,12 +207,19 @@ static void vtd_reset_context_cache(IntelIOMMUState 
> > *s)
> >  s->context_cache_gen = 1;
> >  }
> >  
> > -static void vtd_reset_iotlb(IntelIOMMUState *s)
> > +static void vtd_reset_iotlb_locked(IntelIOMMUState *s)
> add the above comment and keep the original name?

I can add the comment; the original name is defined as another
function below.
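
To spell out the naming convention: the "_locked" variant assumes the
caller already holds the lock, and the plain name is a wrapper that takes
it. A minimal sketch, where DemoLock is a single-threaded stand-in for a
real mutex and all demo_* names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Single-threaded stand-in for a mutex; only tracks whether it is held. */
typedef struct DemoLock {
    bool held;
} DemoLock;

static void demo_lock(DemoLock *l)
{
    assert(!l->held);
    l->held = true;
}

static void demo_unlock(DemoLock *l)
{
    assert(l->held);
    l->held = false;
}

typedef struct DemoIommu {
    DemoLock lock;
    size_t iotlb_entries;
} DemoIommu;

/* Must be called with the lock held. */
static void demo_reset_iotlb_locked(DemoIommu *s)
{
    assert(s->lock.held);
    s->iotlb_entries = 0;
}

/* For callers that do not hold the lock yet. */
static void demo_reset_iotlb(DemoIommu *s)
{
    demo_lock(&s->lock);
    demo_reset_iotlb_locked(s);
    demo_unlock(&s->lock);
}

/* A path that already holds the lock uses the _locked variant directly. */
static void demo_update_iotlb(DemoIommu *s, size_t max_entries)
{
    demo_lock(&s->lock);
    if (s->iotlb_entries >= max_entries) {
        demo_reset_iotlb_locked(s);
    }
    s->iotlb_entries++;
    demo_unlock(&s->lock);
}
```

Calling the plain wrapper from inside the locked path would try to take the
lock twice, which is why both names are needed.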

> >  {
> >  assert(s->iotlb);
> >  g_hash_table_remove_all(s->iotlb);
> >  }
> >  
> > +static void vtd_reset_iotlb(IntelIOMMUState *s)

[1]

> > +{
> > +vtd_iommu_lock(s);
> > +vtd_reset_iotlb_locked(s);
> > +vtd_iommu_unlock(s);
> > +}
> > +
> >  static uint64_t vtd_get_iotlb_key(uint64_t gfn, uint16_t source_id,
> >uint32_t level)
> >  {
> > @@ -215,6 +232,7 @@ static uint64_t vtd_get_iotlb_gfn(hwaddr addr, uint32_t 
> > level)
> >  return (addr & vtd_slpt_level_page_mask(level)) >> VTD_PAGE_SHIFT_4K;
> >  }
> >  
> > +/* Must be with IOMMU lock held */
> >  static VTDIOTLBEntry *vtd_lookup_iotlb(IntelIOMMUState *s, uint16_t 
> > source_id,
> > hwaddr addr)
> >  {
> > @@ -235,6 +253,7 @@ out:
> >  return entry;
> >  }
> >  
> > +/* Must be with IOMMU lock held */
> >  static void vtd_update_iotlb(IntelIOMMUState *s, uint16_t source_id,
> >   uint16_t domain_id, hwaddr addr, uint64_t 
> > slpte,
> >   uint8_t access_flags, uint32_t level)
> > @@ -246,7 +265,7 @@ static void vtd_update_iotlb(IntelIOMMUState *s, 
> > uint16_t source_id,
> >  trace_vtd_iotlb_page_update(source_id, addr, slpte, domain_id);
> >  if (g_hash_table_size(s->iotlb) >= VTD_IOTLB_MAX_SIZE) {
> >  trace_vtd_iotlb_reset("iotlb exceeds size limit");
> > -

Re: [Qemu-devel] [edk2] [PATCH 1/4] ovmf: add and link with Tcg2PhysicalPresenceLibNull when !TPM2_ENABLE

2018-05-17 Thread Laszlo Ersek

On 05/15/18 14:30, marcandre.lur...@redhat.com wrote:
> From: Marc-André Lureau 
> 
> This NULL library will let us call
> Tcg2PhysicalPresenceLibProcessRequest() unconditionally from
> BdsPlatform when building without TPM2_ENABLE.
> 
> Signed-off-by: Marc-André Lureau 
> ---
>  .../DxeTcg2PhysicalPresenceLib.c  | 26 ++
>  .../DxeTcg2PhysicalPresenceLib.inf| 34 +++
>  OvmfPkg/OvmfPkgIa32.dsc   |  2 ++
>  OvmfPkg/OvmfPkgIa32X64.dsc|  2 ++
>  OvmfPkg/OvmfPkgX64.dsc|  2 ++
>  5 files changed, 66 insertions(+)
>  create mode 100644 
> OvmfPkg/Library/Tcg2PhysicalPresenceLibNull/DxeTcg2PhysicalPresenceLib.c
>  create mode 100644 
> OvmfPkg/Library/Tcg2PhysicalPresenceLibNull/DxeTcg2PhysicalPresenceLib.inf
> 
> diff --git 
> a/OvmfPkg/Library/Tcg2PhysicalPresenceLibNull/DxeTcg2PhysicalPresenceLib.c 
> b/OvmfPkg/Library/Tcg2PhysicalPresenceLibNull/DxeTcg2PhysicalPresenceLib.c
> new file mode 100644
> index ..0b8b98410315
> --- /dev/null
> +++ b/OvmfPkg/Library/Tcg2PhysicalPresenceLibNull/DxeTcg2PhysicalPresenceLib.c
> @@ -0,0 +1,26 @@
> +/** @file
> +  NULL Tcg2PhysicalPresenceLib library instance
> +
> +  Copyright (c) 2018, Red Hat, Inc.
> +  Copyright (c) 2013 - 2016, Intel Corporation. All rights reserved.
> +  This program and the accompanying materials
> +  are licensed and made available under the terms and conditions of the BSD 
> License
> +  which accompanies this distribution.  The full text of the license may be 
> found at
> +  http://opensource.org/licenses/bsd-license.php
> +
> +  THE PROGRAM IS DISTRIBUTED UNDER THE BSD LICENSE ON AN "AS IS" BASIS,
> +  WITHOUT WARRANTIES OR REPRESENTATIONS OF ANY KIND, EITHER EXPRESS OR 
> IMPLIED.
> +
> +**/
> +
> +#include "PiDxe.h"

(1) Can you drop this #include?

> +#include 
> +
> +VOID
> +EFIAPI
> +Tcg2PhysicalPresenceLibProcessRequest (
> +  IN  TPM2B_AUTH *PlatformAuth  OPTIONAL
> +  )
> +{
> +return;
> +}

(2) Indentation.

Better yet: please replace the "return" statement with a comment:

  //
  // do nothing
  //

> diff --git 
> a/OvmfPkg/Library/Tcg2PhysicalPresenceLibNull/DxeTcg2PhysicalPresenceLib.inf 
> b/OvmfPkg/Library/Tcg2PhysicalPresenceLibNull/DxeTcg2PhysicalPresenceLib.inf
> new file mode 100644
> index ..e6f6239e1e00
> --- /dev/null
> +++ 
> b/OvmfPkg/Library/Tcg2PhysicalPresenceLibNull/DxeTcg2PhysicalPresenceLib.inf
> @@ -0,0 +1,34 @@
> +## @file
> +#  NULL Tcg2PhysicalPresenceLib library instance
> +#
> +#  In SecurityPkg, this library will check and execute TPM 1.2 request
> +#  from OS or BIOS. The request may ask for user confirmation before
> +#  execution. This Library will also lock TPM physical presence at
> +#  last.

(3) The approach on this comment is generally OK, but the specific text
originates from
"SecurityPkg/Library/DxeTcgPhysicalPresenceLib/DxeTcgPhysicalPresenceLib.inf".
I think we should update the comment from the TPM2 variant, namely
"SecurityPkg/Library/DxeTcg2PhysicalPresenceLib/DxeTcg2PhysicalPresenceLib.inf".

Thus, I suggest the following comment:

"Under SecurityPkg, the corresponding library instance will check and
execute TPM 2.0 request from OS or BIOS; the request may ask for user
confirmation before execution. This Null instance implements a no-op
Tcg2PhysicalPresenceLibProcessRequest(), without user interaction."

> +#
> +# Copyright (C) 2018, Red Hat, Inc.
> +# Copyright (c) 2009 - 2015, Intel Corporation. All rights reserved.

(4) Same comment applies to the Intel copyright notice: from the TCG2
variant, this should come as

"Copyright (c) 2013 - 2016, Intel Corporation. All rights reserved."

> +# This program and the accompanying materials
> +# are licensed and made available under the terms and conditions of the BSD License
> +# which accompanies this distribution. The full text of the license may be found at
> +# http://opensource.org/licenses/bsd-license.php
> +# THE PROGRAM IS DISTRIBUTED UNDER THE BSD LICENSE ON AN "AS IS" BASIS,
> +# WITHOUT WARRANTIES OR REPRESENTATIONS OF ANY KIND, EITHER EXPRESS OR IMPLIED.
> +#
> +##
> +
> +[Defines]
> +  INF_VERSION= 0x00010005
> +  BASE_NAME  = DxeTcg2PhysicalPresenceLibNull
> +  FILE_GUID  = 2A6BA243-DC22-42D8-9C3D-AE3728DC7AFA
> +  MODULE_TYPE= DXE_DRIVER
> +  VERSION_STRING = 1.0
> +  LIBRARY_CLASS  = Tcg2PhysicalPresenceLib|DXE_DRIVER DXE_RUNTIME_DRIVER DXE_SAL_DRIVER UEFI_APPLICATION UEFI_DRIVER
> +
> +[Sources]
> +  DxeTcg2PhysicalPresenceLib.c
> +
> +[Packages]
> +  MdePkg/MdePkg.dec
> +  MdeModulePkg/MdeModulePkg.dec

(5) I think you can drop "MdeModulePkg/MdeModulePkg.dec". (MdePkg.dec is
needed by all modules, and SecurityPkg.dec below is needed for the lib
class header; so those are OK).

> +

Re: [Qemu-devel] [PATCH 1/2] qapi: allow flat unions with empty branches

2018-05-17 Thread Markus Armbruster

Anton Nefedov  writes:

> On 15/5/2018 8:40 PM, Markus Armbruster wrote:
>> Eric Blake  writes:
>>
>>> On 05/15/2018 02:01 AM, Markus Armbruster wrote:
>>>
>> QAPI language design alternatives:
>>
>> 1. Having unions cover all discriminator values explicitly is useful.
>>>
>> 2. Having unions repeat all the discriminator values explicitly is not
>> useful.  All we need is replacing the code enforcing that by code
>> defaulting missing ones to the empty type.
>>
>> 3. We can't decide, so we do both (this patch).
>>
>>>
>>>> I'd prefer a more opinionated design here.
>>>>
>>>> Either we opine making people repeat the tag values is an overall win.
>>>> Then make them repeat them always, don't give them the option to not
>>>> repeat them.  Drop this patch.  To reduce verbosity, we can add a
>>>> suitable way to denote a predefined empty type.
>>>>
>>>> Or we opine it's not.  Then let them not repeat them, don't give them
>>>> the option to make themselves repeat them.  Evolve this patch into one
>>>> that makes flat union variants optional and default to empty.  If we
>>>> later find we still want a way to denote a predefined empty type, we can
>>>> add it then.
>>>>
>>>> My personal opinion is it's not.
>>>
>>> I followed the arguments up to the last sentence, but then I got lost
>>> on whether you meant:
>>>
>>> This patch is not an overall win, so let's drop it and keep status quo
>>> and/or implement a way to write 'branch':{} (option 1 above)
>>>
>>> or
>>>
>>> Forcing repetition is not an overall win, so let's drop that
>>> requirement by using something similar to this patch (option 2 above)
>>> but without adding a 'partial-data' key.
>>
>> Sorry about that.  I meant the latter.
>>
>>> But you've convinced me that option 3 (supporting a compact branch
>>> representation AND supporting missing branches as defaulting to an
>>> empty type) is more of a maintenance burden (any time there is more
>>> than one way to write something, it requires more testing that both
>>> ways continue to work) and thus not worth doing without strong
>>> evidence that we need both ways (which we do not currently have).
>
> I agree that neither option 3 nor this patch are the best way to handle
> this, so it's 1 or 2.
>
> (2) sure brings some prettiness into jsons; I wonder when it might harm;
> e.g. a person adds another block driver: it would be difficult to get
> round BlockdevOptionsFoo, and what can be missed is something
> optional like BlockdevStatsFoo, which is harmless if left empty and
> probably would be made an empty branch anyway. The difference is that
> an empty branch is something one might notice during a review and
> object.

Yes, forcing reviewer attention is the one real advantage of 1 I can
see.  I agree with you that reviewers missing the "main" union (such as
BlockdevOptions) is unlikely.  Missing "secondary" unions is more
plausible.  Let's see how many unions sharing a tag enumeration type
we have.  A quick hack to introspect.py coughs up:

Flat union type                   Tag enum type
---------------------------------------------------------------
BlockdevCreateOptions             BlockdevDriver
BlockdevOptions                   BlockdevDriver
BlockdevQcow2Encryption           BlockdevQcow2EncryptionFormat
ImageInfoSpecificQCow2Encryption  BlockdevQcow2EncryptionFormat
BlockdevQcowEncryption            BlockdevQcowEncryptionFormat
CpuInfo                           CpuInfoArch
GuestPanicInformation             GuestPanicInformationType
QCryptoBlockCreateOptions         QCryptoBlockFormat
SchemaInfo                        SchemaMetaType
SheepdogRedundancy                SheepdogRedundancyType
SocketAddress                     SocketAddressType
SshHostKeyCheck                   SshHostKeyCheckMode
CpuInfoFast                       SysEmuTarget
UsernetConnection                 UsernetType

Two pairs.  We'll live.

> I think I'd vote for 2 (never enforce all-branches coverage) as well.

Eric, what do you think?

One more thought: if we ever get around to provide more convenient flat
union syntax so users don't have to enumerate the variant names twice,
we'll need a way to say "empty branch".  Let's worry about that problem
when we have it.
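
To make option 2 concrete, here is a minimal sketch in C of "default missing
branches to the empty type".  It is not the QAPI generator (which is written
in Python) and not code from this patch; Driver, Branch, branch_type() and
EMPTY_TYPE_NAME are hypothetical names made up for the illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical name for the predefined empty type. */
#define EMPTY_TYPE_NAME "Empty"

/* Hypothetical tag enumeration, standing in for a QAPI enum type. */
typedef enum { DRV_FILE, DRV_NULL_CO, DRV_QCOW2, DRV__MAX } Driver;

/* One branch that the schema author wrote down explicitly. */
typedef struct {
    Driver tag;
    const char *type_name;
} Branch;

/*
 * Resolve the type of a branch: the explicitly listed type if there is
 * one, otherwise the empty type.  Nothing enforces full coverage.
 */
static const char *branch_type(const Branch *branches, size_t n, Driver tag)
{
    assert(branches != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (branches[i].tag == tag) {
            return branches[i].type_name;
        }
    }
    return EMPTY_TYPE_NAME;
}

/* True if the tag value was left out and therefore defaulted to empty. */
static int branch_is_empty(const Branch *branches, size_t n, Driver tag)
{
    return strcmp(branch_type(branches, n, tag), EMPTY_TYPE_NAME) == 0;
}
```

With only DRV_FILE and DRV_QCOW2 listed, looking up DRV_NULL_CO yields the
empty type instead of a schema error, which is exactly the review-visibility
trade-off discussed above.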

[Qemu-devel] [PATCH v4 08/14] spapr: handle pc-dimm unplug via hotplug handler chain

2018-05-17 Thread David Hildenbrand

Let's handle it via hotplug_handler_unplug(). E.g. necessary to hotplug/
unplug memory devices (which a pc-dimm is) later.

Signed-off-by: David Hildenbrand 
---
 hw/ppc/spapr.c | 23 +++
 1 file changed, 19 insertions(+), 4 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 2f315f963b..286c38c842 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -3291,7 +3291,8 @@ static sPAPRDIMMState *spapr_recover_pending_dimm_state(sPAPRMachineState *ms,
 /* Callback to be called during DRC release. */
 void spapr_lmb_release(DeviceState *dev)
 {
-    sPAPRMachineState *spapr = SPAPR_MACHINE(qdev_get_hotplug_handler(dev));
+    HotplugHandler *hotplug_ctrl = qdev_get_hotplug_handler(dev);
+    sPAPRMachineState *spapr = SPAPR_MACHINE(hotplug_ctrl);
     sPAPRDIMMState *ds = spapr_pending_dimm_unplugs_find(spapr, PC_DIMM(dev));
 
     /* This information will get lost if a migration occurs
@@ -3309,9 +3310,21 @@ void spapr_lmb_release(DeviceState *dev)
 
     /*
      * Now that all the LMBs have been removed by the guest, call the
-     * pc-dimm unplug handler to cleanup up the pc-dimm device.
+     * unplug handler chain. This can never fail.
      */
-    pc_dimm_memory_unplug(dev, MACHINE(spapr));
+    hotplug_ctrl = qdev_get_hotplug_handler(dev);
+    hotplug_handler_unplug(hotplug_ctrl, dev, &error_abort);
+}
+
+static void spapr_memory_unplug(HotplugHandler *hotplug_dev, DeviceState *dev,
+                                Error **errp)
+{
+    sPAPRMachineState *spapr = SPAPR_MACHINE(hotplug_dev);
+    sPAPRDIMMState *ds = spapr_pending_dimm_unplugs_find(spapr, PC_DIMM(dev));
+
+    g_assert(ds);
+    g_assert(!ds->nr_lmbs);
+    pc_dimm_memory_unplug(dev, MACHINE(hotplug_dev));
     object_unparent(OBJECT(dev));
     spapr_pending_dimm_unplugs_remove(spapr, ds);
 }
@@ -3608,7 +3621,9 @@ static void spapr_machine_device_unplug(HotplugHandler *hotplug_dev,
     Error *local_err = NULL;
 
     /* final stage hotplug handler */
-    if (dev->parent_bus && dev->parent_bus->hotplug_handler) {
+    if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
+        spapr_memory_unplug(hotplug_dev, dev, &local_err);
+    } else if (dev->parent_bus && dev->parent_bus->hotplug_handler) {
         hotplug_handler_unplug(dev->parent_bus->hotplug_handler, dev,
                                &local_err);
     }
-- 
2.14.3
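
For readers who want the dispatch idea in isolation, the following is a
stand-alone sketch, not QEMU code.  It only mirrors the shape of the last
hunk above: the machine-level unplug handler takes the memory path for
DIMM-like devices and otherwise falls through to the parent bus handler.
Device, DevType and all function names here are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical device classification, standing in for QOM type checks. */
typedef enum { DEV_PC_DIMM, DEV_OTHER } DevType;

typedef struct Device Device;
typedef void (*UnplugFn)(Device *dev);

/* Records which stage of the chain finally handled the unplug. */
typedef enum { BY_NOBODY, BY_MACHINE_MEMORY, BY_PARENT_BUS } HandledBy;

struct Device {
    DevType type;
    UnplugFn bus_unplug;    /* handler of the parent bus, may be NULL */
    HandledBy handled_by;
};

static void memory_unplug(Device *dev)
{
    dev->handled_by = BY_MACHINE_MEMORY;
}

static void parent_bus_unplug(Device *dev)
{
    dev->handled_by = BY_PARENT_BUS;
}

/* Final stage of the chain: memory devices first, then the bus handler. */
static void machine_device_unplug(Device *dev)
{
    assert(dev != NULL);
    if (dev->type == DEV_PC_DIMM) {
        memory_unplug(dev);
    } else if (dev->bus_unplug) {
        dev->bus_unplug(dev);
    }
}
```

A device of another type without a bus handler is left untouched, matching
the "if / else if" structure of the patch.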

[Qemu-devel] [PATCH v4 11/14] pc-dimm: implement new memory device functions

2018-05-17 Thread David Hildenbrand

Implement the new functions. We don't have to care about alignment for
these DIMMs right now, so leave that function unimplemented.

Signed-off-by: David Hildenbrand 
---
 hw/mem/pc-dimm.c | 17 +
 1 file changed, 17 insertions(+)

diff --git a/hw/mem/pc-dimm.c b/hw/mem/pc-dimm.c
index 12da89d562..5e2e3263ab 100644
--- a/hw/mem/pc-dimm.c
+++ b/hw/mem/pc-dimm.c
@@ -244,6 +244,21 @@ static uint64_t pc_dimm_md_get_addr(const MemoryDeviceState *md)
     return dimm->addr;
 }
 
+static void pc_dimm_md_set_addr(MemoryDeviceState *md, uint64_t addr)
+{
+    PCDIMMDevice *dimm = PC_DIMM(md);
+
+    dimm->addr = addr;
+}
+
+static MemoryRegion *pc_dimm_md_get_memory_region(MemoryDeviceState *md)
+{
+    const PCDIMMDeviceClass *ddc = PC_DIMM_GET_CLASS(md);
+    PCDIMMDevice *dimm = PC_DIMM(md);
+
+    return ddc->get_memory_region(dimm, &error_abort);
+}
+
 static uint64_t pc_dimm_md_get_region_size(const MemoryDeviceState *md)
 {
     /* dropping const here is fine as we don't touch the memory region */
@@ -304,6 +319,8 @@ static void pc_dimm_class_init(ObjectClass *oc, void *data)
     ddc->get_vmstate_memory_region = pc_dimm_get_vmstate_memory_region;
 
     mdc->get_addr = pc_dimm_md_get_addr;
+    mdc->set_addr = pc_dimm_md_set_addr;
+    mdc->get_memory_region = pc_dimm_md_get_memory_region;
     /* for a dimm plugged_size == region_size */
     mdc->get_plugged_size = pc_dimm_md_get_region_size;
     mdc->get_region_size = pc_dimm_md_get_region_size;
-- 
2.14.3

[Qemu-devel] [PATCH v3 12/12] intel-iommu: new sync_shadow_page_table

2018-05-17 Thread Peter Xu

First, introduce the sync_shadow_page_table() helper to resync the
whole shadow page table of an IOMMU address space.  Then, when we
receive domain invalidation or similar requests (for example, context
entry invalidations, global invalidations, ...), we should not
run the replay logic; instead, we can now use the new sync shadow page
table API to resync the whole shadow page table.

There will be two major differences:

1. We don't unmap-all before walking the page table, we just sync.  The
   global unmap-all can create a very small window during which the page
   table is invalid or incomplete.

2. We only walk the page table once now (while replay can be triggered
   multiple times depending on how many notifiers there are).

Signed-off-by: Peter Xu 
---
 hw/i386/intel_iommu.c | 14 +-
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index a1a2a009c1..fbb2f763f0 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -1065,6 +1065,11 @@ static int vtd_sync_shadow_page_table_range(VTDAddressSpace *vtd_as,
     return vtd_page_walk(_cache, addr, addr + size, );
 }
 
+static int vtd_sync_shadow_page_table(VTDAddressSpace *vtd_as)
+{
+    return vtd_sync_shadow_page_table_range(vtd_as, NULL, 0, UINT64_MAX);
+}
+
 /*
  * Fetch translation type for specific device. Returns <0 if error
  * happens, otherwise return the shifted type to check against
@@ -1397,7 +1402,7 @@ static void vtd_iommu_replay_all(IntelIOMMUState *s)
     VTDAddressSpace *vtd_as;
 
     QLIST_FOREACH(vtd_as, &s->notifiers_list, next) {
-        memory_region_iommu_replay_all(&vtd_as->iommu);
+        vtd_sync_shadow_page_table(vtd_as);
     }
 }
 
@@ -1470,14 +1475,13 @@ static void vtd_context_device_invalidate(IntelIOMMUState *s,
                 vtd_switch_address_space(vtd_as);
                 /*
                  * So a device is moving out of (or moving into) a
-                 * domain, a replay() suites here to notify all the
-                 * IOMMU_NOTIFIER_MAP registers about this change.
+                 * domain, resync the shadow page table.
                  * This won't bring bad even if we have no such
                  * notifier registered - the IOMMU notification
                  * framework will skip MAP notifications if that
                  * happened.
                  */
-                memory_region_iommu_replay_all(&vtd_as->iommu);
+                vtd_sync_shadow_page_table(vtd_as);
             }
         }
     }
@@ -1535,7 +1539,7 @@ static void vtd_iotlb_domain_invalidate(IntelIOMMUState *s, uint16_t domain_id)
         if (!vtd_dev_to_context_entry(s, pci_bus_num(vtd_as->bus),
                                       vtd_as->devfn, &ce) &&
             domain_id == VTD_CONTEXT_ENTRY_DID(ce.hi)) {
-            memory_region_iommu_replay_all(&vtd_as->iommu);
+            vtd_sync_shadow_page_table(vtd_as);
         }
     }
 }
-- 
2.17.0
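
The difference between "unmap-all then replay" and "sync" described in the
commit message can be illustrated with a toy model.  This is only a sketch
under simplifying assumptions: the address space is NPAGES single pages, the
shadow and guest page tables are plain bool arrays, and replay_all(),
sync_shadow() and Notifications are hypothetical names, not the intel-iommu
functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NPAGES 8   /* toy address space size, an assumption of the sketch */

/* Counts of notifications sent down to the registered notifiers. */
typedef struct {
    int maps;
    int unmaps;
} Notifications;

/* Replay style: drop every shadow mapping, then map what the guest has. */
static Notifications replay_all(bool shadow[NPAGES], const bool guest[NPAGES])
{
    Notifications n = { 0, 0 };

    assert(shadow != NULL && guest != NULL);
    for (size_t i = 0; i < NPAGES; i++) {
        if (shadow[i]) {
            shadow[i] = false;   /* still-valid pages are briefly unmapped */
            n.unmaps++;
        }
    }
    for (size_t i = 0; i < NPAGES; i++) {
        if (guest[i]) {
            shadow[i] = true;
            n.maps++;
        }
    }
    return n;
}

/* Sync style: one walk, touching only the pages that actually changed. */
static Notifications sync_shadow(bool shadow[NPAGES], const bool guest[NPAGES])
{
    Notifications n = { 0, 0 };

    assert(shadow != NULL && guest != NULL);
    for (size_t i = 0; i < NPAGES; i++) {
        if (guest[i] && !shadow[i]) {
            shadow[i] = true;
            n.maps++;
        } else if (!guest[i] && shadow[i]) {
            shadow[i] = false;
            n.unmaps++;
        }
    }
    return n;
}
```

Both functions leave the shadow table equal to the guest table; the sync
variant never unmaps a page that stays valid, so there is no window in which
a still-mapped page is missing from the shadow table.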

[Qemu-devel] [PATCH v3 10/12] intel-iommu: simplify page walk logic

2018-05-17 Thread Peter Xu

Let's move the notify_unmap check into the new vtd_page_walk_one()
function so that we can greatly simplify the vtd_page_walk_level()
logic.

No functional change at all.

Signed-off-by: Peter Xu 
---
 hw/i386/intel_iommu.c | 66 ---
 hw/i386/trace-events  |  1 -
 2 files changed, 30 insertions(+), 37 deletions(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 5a5175a4ed..272e49ff66 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -779,6 +779,11 @@ static int vtd_page_walk_one(IOMMUTLBEntry *entry, vtd_page_walk_info *info)
     };
     DMAMap *mapped = iova_tree_find(as->iova_tree, );
 
+    if (entry->perm == IOMMU_NONE && !info->notify_unmap) {
+        trace_vtd_page_walk_one_skip_unmap(entry->iova, entry->addr_mask);
+        return 0;
+    }
+
     assert(hook_fn);
 
     /* Update local IOVA mapped ranges */
@@ -894,45 +899,34 @@ static int vtd_page_walk_level(dma_addr_t addr, uint64_t start,
          */
         entry_valid = read_cur | write_cur;
 
-        entry.target_as = &address_space_memory;
-        entry.iova = iova & subpage_mask;
-        entry.perm = IOMMU_ACCESS_FLAG(read_cur, write_cur);
-        entry.addr_mask = ~subpage_mask;
-
-        if (vtd_is_last_slpte(slpte, level)) {
-            /* NOTE: this is only meaningful if entry_valid == true */
-            entry.translated_addr = vtd_get_slpte_addr(slpte, info->aw);
-            if (!entry_valid && !info->notify_unmap) {
-                trace_vtd_page_walk_skip_perm(iova, iova_next);
-                goto next;
-            }
-            ret = vtd_page_walk_one(&entry, info);
-            if (ret < 0) {
-                return ret;
-            }
-        } else {
-            if (!entry_valid) {
-                if (info->notify_unmap) {
-                    /*
-                     * The whole entry is invalid; unmap it all.
-                     * Translated address is meaningless, zero it.
-                     */
-                    entry.translated_addr = 0x0;
-                    ret = vtd_page_walk_one(&entry, info);
-                    if (ret < 0) {
-                        return ret;
-                    }
-                } else {
-                    trace_vtd_page_walk_skip_perm(iova, iova_next);
-                }
-                goto next;
-            }
+        if (!vtd_is_last_slpte(slpte, level) && entry_valid) {
+            /*
+             * This is a valid PDE (or even bigger than PDE).  We need
+             * to walk one further level.
+             */
             ret = vtd_page_walk_level(vtd_get_slpte_addr(slpte, info->aw),
                                       iova, MIN(iova_next, end), level - 1,
                                       read_cur, write_cur, info);
-            if (ret < 0) {
-                return ret;
-            }
+        } else {
+            /*
+             * This means we are either:
+             *
+             * (1) the real page entry (either 4K page, or huge page)
+             * (2) the whole range is invalid
+             *
+             * In either case, we send an IOTLB notification down.
+             */
+            entry.target_as = &address_space_memory;
+            entry.iova = iova & subpage_mask;
+            entry.perm = IOMMU_ACCESS_FLAG(read_cur, write_cur);
+            entry.addr_mask = ~subpage_mask;
+            /* NOTE: this is only meaningful if entry_valid == true */
+            entry.translated_addr = vtd_get_slpte_addr(slpte, info->aw);
+            ret = vtd_page_walk_one(&entry, info);
+        }
+
+        if (ret < 0) {
+            return ret;
         }
 
 next:
diff --git a/hw/i386/trace-events b/hw/i386/trace-events
index d8194b80e3..e14d06ec83 100644
--- a/hw/i386/trace-events
+++ b/hw/i386/trace-events
@@ -43,7 +43,6 @@ vtd_page_walk_one(uint16_t domain, uint64_t iova, uint64_t gpa, uint64_t mask, i
 vtd_page_walk_one_skip_map(uint64_t iova, uint64_t mask, uint64_t translated) "iova 0x%"PRIx64" mask 0x%"PRIx64" translated 0x%"PRIx64
 vtd_page_walk_one_skip_unmap(uint64_t iova, uint64_t mask) "iova 0x%"PRIx64" mask 0x%"PRIx64
 vtd_page_walk_skip_read(uint64_t iova, uint64_t next) "Page walk skip iova 0x%"PRIx64" - 0x%"PRIx64" due to unable to read"
-vtd_page_walk_skip_perm(uint64_t iova, uint64_t next) "Page walk skip iova 0x%"PRIx64" - 0x%"PRIx64" due to perm empty"
 vtd_page_walk_skip_reserve(uint64_t iova, uint64_t next) "Page walk skip iova 0x%"PRIx64" - 0x%"PRIx64" due to rsrv set"
 vtd_switch_address_space(uint8_t bus, uint8_t slot, uint8_t fn, bool on) "Device %02x:%02x.%x switching address space (iommu enabled=%d)"
 vtd_as_unmap_whole(uint8_t bus, uint8_t slot, uint8_t fn, uint64_t iova, uint64_t size) "Device %02x:%02x.%x start 0x%"PRIx64" size 0x%"PRIx64
-- 
2.17.0

Re: [Qemu-devel] [PATCH v2 5/8] linux-user: move ppc socket.h definitions to ppc/sockbits.h

2018-05-17 Thread Laurent Vivier

On 17/05/2018 at 01:17, Philippe Mathieu-Daudé wrote:
> On 05/16/2018 05:55 PM, Laurent Vivier wrote:
>> No code change.
>>
>> Signed-off-by: Laurent Vivier 
>> ---
>>  linux-user/generic/sockbits.h |  9 +
>>  linux-user/ppc/sockbits.h | 19 +++
>>  2 files changed, 20 insertions(+), 8 deletions(-)
>>
>> diff --git a/linux-user/generic/sockbits.h b/linux-user/generic/sockbits.h
>> index 093faf0a48..5ad43eb0c8 100644
>> --- a/linux-user/generic/sockbits.h
>> +++ b/linux-user/generic/sockbits.h
>> @@ -30,14 +30,7 @@
>>  #define TARGET_SO_LINGER   13
>>  #define TARGET_SO_BSDCOMPAT14
>>  /* To add :#define TARGET_SO_REUSEPORT 15 */
>> -#if defined(TARGET_PPC)
>> -#define TARGET_SO_RCVLOWAT 16
>> -#define TARGET_SO_SNDLOWAT 17
>> -#define TARGET_SO_RCVTIMEO 18
>> -#define TARGET_SO_SNDTIMEO 19
>> -#define TARGET_SO_PASSCRED 20
>> -#define TARGET_SO_PEERCRED 21
>> -#else
>> +#ifndef TARGET_SO_PASSCRED /* powerpc only differs in these */
> 
> #ifndef TARGET_PPC ?

In fact, I copied the line from Linux. I think it's better to depend
not on the target but on the value we want to define.

Thanks,
Laurent
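
As a stand-alone illustration of that choice, the sketch below keys the
generic fallback on the macro being defined rather than on the target name.
The ppc values are the ones removed from generic/sockbits.h in the quoted
hunk; the generic values are those of Linux's asm-generic socket.h as far as
I recall them.  DEMO_TARGET_PPC and target_so_passcred() are hypothetical,
standing in for "ppc/sockbits.h was included first".

```c
#include <assert.h>

/*
 * A target-specific header would be included first.  DEMO_TARGET_PPC is a
 * hypothetical switch standing in for that include.
 */
#ifdef DEMO_TARGET_PPC
#define TARGET_SO_RCVLOWAT 16
#define TARGET_SO_SNDLOWAT 17
#define TARGET_SO_RCVTIMEO 18
#define TARGET_SO_SNDTIMEO 19
#define TARGET_SO_PASSCRED 20
#define TARGET_SO_PEERCRED 21
#endif

/* Generic header: test for the value itself, not for the target. */
#ifndef TARGET_SO_PASSCRED /* powerpc only differs in these */
#define TARGET_SO_PASSCRED 16
#define TARGET_SO_PEERCRED 17
#define TARGET_SO_RCVLOWAT 18
#define TARGET_SO_SNDLOWAT 19
#define TARGET_SO_RCVTIMEO 20
#define TARGET_SO_SNDTIMEO 21
#endif

/* Returns the value that ended up in effect for this build. */
static int target_so_passcred(void)
{
    /* Whichever header won, the two credential options must not collide. */
    assert(TARGET_SO_PASSCRED != TARGET_SO_PEERCRED);
    return TARGET_SO_PASSCRED;
}
```

Any other target that later needs different numbers only has to define them
before the generic header is reached; the generic header itself does not need
to learn the new target's name.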

Re: [Qemu-devel] FW: [PATCH 2/2] migration: not wait RDMA_CM_EVENT_DISCONNECTED event after rdma_disconnect

2018-05-17 Thread Aviad Yehezkel




On 5/17/2018 10:41 AM, 858585 jemmy wrote:

On Thu, May 17, 2018 at 3:31 PM, Aviad Yehezkel
 wrote:


On 5/17/2018 5:42 AM, 858585 jemmy wrote:

On Wed, May 16, 2018 at 11:11 PM, Aviad Yehezkel
 wrote:

Hi Lidong and David,
Sorry for the late response, I had to ramp up on migration code and build
a setup on my side.

PSB my comments for this patch below.
For the RDMA post-copy patches I will comment next week after testing on
Mellanox side too.

Thanks!

On 5/16/2018 5:21 PM, Aviad Yehezkel wrote:


-Original Message-
From: Dr. David Alan Gilbert [mailto:dgilb...@redhat.com]
Sent: Wednesday, May 16, 2018 4:13 PM
To: 858585 jemmy 
Cc: Aviad Yehezkel ; Juan Quintela
; qemu-devel ; Gal Shachaf
; Adi Dotan ; Lidong Chen

Subject: Re: [PATCH 2/2] migration: not wait RDMA_CM_EVENT_DISCONNECTED
event after rdma_disconnect

* 858585 jemmy (jemmy858...@gmail.com) wrote:




I wonder why dereg_mr takes so long - I could understand if
reg_mr took a long time, but why for dereg, that sounds like the
easy side.

I use perf collect the information when ibv_dereg_mr is invoked.

-   9.95%  client2  [kernel.kallsyms]  [k] put_compound_page
 `
  - put_compound_page
 - 98.45% put_page
  __ib_umem_release
  ib_umem_release
  dereg_mr
  mlx5_ib_dereg_mr
  ib_dereg_mr
  uverbs_free_mr
  remove_commit_idr_uobject
  _rdma_remove_commit_uobject
  rdma_remove_commit_uobject
  ib_uverbs_dereg_mr
  ib_uverbs_write
  vfs_write
  sys_write
  system_call_fastpath
  __GI___libc_write
  0
 + 1.55% __ib_umem_release
+   8.31%  client2  [kernel.kallsyms]  [k] compound_unlock_irqrestore
+   7.01%  client2  [kernel.kallsyms]  [k] page_waitqueue
+   7.00%  client2  [kernel.kallsyms]  [k] set_page_dirty
+   6.61%  client2  [kernel.kallsyms]  [k] unlock_page
+   6.33%  client2  [kernel.kallsyms]  [k] put_page_testzero
+   5.68%  client2  [kernel.kallsyms]  [k] set_page_dirty_lock
+   4.30%  client2  [kernel.kallsyms]  [k] __wake_up_bit
+   4.04%  client2  [kernel.kallsyms]  [k] free_pages_prepare
+   3.65%  client2  [kernel.kallsyms]  [k] release_pages
+   3.62%  client2  [kernel.kallsyms]  [k] arch_local_irq_save
+   3.35%  client2  [kernel.kallsyms]  [k] page_mapping
+   3.13%  client2  [kernel.kallsyms]  [k] get_pageblock_flags_group
+   3.09%  client2  [kernel.kallsyms]  [k] put_page

the reason is __ib_umem_release will loop many times for each page.

static void __ib_umem_release(struct ib_device *dev, struct
ib_umem *umem, int dirty) {
   struct scatterlist *sg;
   struct page *page;
   int i;

   if (umem->nmap > 0)
ib_dma_unmap_sg(dev, umem->sg_head.sgl,
   umem->npages,
   DMA_BIDIRECTIONAL);

for_each_sg(umem->sg_head.sgl, sg, umem->npages, i) {
<<
loop a lot of times for each page.here

Why 'lot of times for each page'?  I don't know this code at all,
but I'd expected once per page?

sorry, once per page, but a lot of page for a big size virtual
machine.

Ah OK; so yes it seems best if you can find a way to do the release
in the migration thread then;  still maybe this is something some of
the kernel people could look at speeding up?

The kernel code seem is not complex, and I have no idea how to speed
up.

Me neither; but I'll ask around.

I will ask internally and get back on this one.



With your other kernel fix, does the problem of the missing
RDMA_CM_EVENT_DISCONNECTED events go away?

Yes, after kernel and qemu fixed, this issue never happens again.

I'm confused; which qemu fix; my question was whether the kernel fix
by itself fixed the problem of the missing event.

this qemu fix:
migration: update index field when delete or qsort RDMALocalBlock

OK good; so then we shouldn't need this 2/2 patch.

It is good that the qemu fix solved this issue, but my colleagues and I
think we need the 2/2 patch anyway.
According to the IB spec, once the active side sends a DREQ message it
should wait for the DREP message, and only once that has arrived should it
trigger a DISCONNECT event.
The DREP message can be dropped due to network issues.
For that case the spec defines a DREP_timeout state in the CM state
machine: if the DREP is dropped we should get a timeout, and a
TIMEWAIT_EXIT event will be triggered.
Unfortunately the current kernel CM implementation doesn't include the
DREP_timeout state, and in the above scenario we will not get DISCONNECT or
TIMEWAIT_EXIT events.
(A similar scenario also exists for the passive side.)
We think it is best to apply this patch until we

[Qemu-devel] [PATCH v4 07/14] spapr: route all memory devices through the machine hotplug handler

2018-05-17 Thread David Hildenbrand

Necessary to hotplug them cleanly later.

Signed-off-by: David Hildenbrand 
---
 hw/ppc/spapr.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index b7c5c95f7a..2f315f963b 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -3666,6 +3666,7 @@ static HotplugHandler *spapr_get_hotplug_handler(MachineState *machine,
                                                  DeviceState *dev)
 {
     if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM) ||
+        object_dynamic_cast(OBJECT(dev), TYPE_MEMORY_DEVICE) ||
         object_dynamic_cast(OBJECT(dev), TYPE_SPAPR_CPU_CORE)) {
         return HOTPLUG_HANDLER(machine);
     }
-- 
2.14.3

Re: [Qemu-devel] [PATCH v6 2/2] qga: update guest-suspend-ram and guest-suspend-hybrid descriptions

2018-05-17 Thread Markus Armbruster

Daniel Henrique Barboza  writes:

> On 05/15/2018 12:35 PM, Markus Armbruster wrote:
>> Daniel Henrique Barboza  writes:
>>
>>> This patch updates the descriptions of 'guest-suspend-ram' and
>>> 'guest-suspend-hybrid' to mention that both commands relies now
>>> on the existence of 'system_wakeup' and also on the proper support
>>> for wake up from suspend, retrieved by the 'wakeup-suspend-support'
>>> attribute of the 'query-target' QMP command.
>>>
>>> Reported-by: Balamuruhan S 
>>> Signed-off-by: Daniel Henrique Barboza 
>>> Reviewed-by: Michael Roth 
>>> ---
>>>   qga/qapi-schema.json | 14 ++
>>>   1 file changed, 10 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/qga/qapi-schema.json b/qga/qapi-schema.json
>>> index 17884c7c70..e3fb8adfce 100644
>>> --- a/qga/qapi-schema.json
>>> +++ b/qga/qapi-schema.json
>>> @@ -566,8 +566,11 @@
>>>   # package installed in the guest.
>>>   #
>>>   # IMPORTANT: guest-suspend-ram requires QEMU to support the 'system_wakeup'
>>> -# command.  Thus, it's *required* to query QEMU for the presence of the
>>> -# 'system_wakeup' command before issuing guest-suspend-ram.
>>> +# command and the guest to support wake up from suspend. Thus, it's
>>> +# *required* to query QEMU for the presence of the 'system_wakeup' command
>>> +# and to verify that wake up from suspend is enabled by checking the
>>> +# 'wakeup-suspend-support' flag of 'query-target' QMP command, before issuing
>>> +# guest-suspend-ram.
>> Isn't checking for presence of system_wakeup redundant?
>>
>> When query-target tells us "system_wakeup works" by returning
>> wakeup-suspend-support: true, we surely have system_wakeup (or else
>> query-target would be lying to us).
>>
>> When it returns wakeup-suspend-support: false, it doesn't matter whether
>> we have system_wakeup.
>>
>> Unless I'm wrong, we can simplify this to something like
>>
>> # IMPORTANT: guest-suspend-ram requires working wakeup support in
>> # QEMU.  You *must* check QMP command query-target returns
>> # wakeup-suspend-support: true before issuing this command.
>
> It is worth noticing that this API isn't checking for the existence of
> system_wakeup. It is checking whether there are notifiers added in
> the wakeup_notifiers QLIST.
>
> However, I think we can simplify the text as you suggested because that
> part seems outdated anyway. Is there any relevant scenario where
> system_wakeup will not be present?

I doubt it: we have system_wakeup since 1.1, and we've even backported
it to RHEL-6.

But even if there *was* a relevant scenario involving a QEMU that
doesn't provide system_wakeup, that QEMU will also not provide member
wakeup-suspend-support, let alone tell us wakeup-suspend-support: true,
unless somebody set out to break things on purpose.  That somebody would
get to keep the pieces then.

Re: [Qemu-devel] [PATCH v2 02/10] intel-iommu: remove IntelIOMMUNotifierNode

2018-05-17 Thread Peter Xu

On Thu, May 17, 2018 at 11:46:22AM +0200, Auger Eric wrote:
> Hi Peter,
> 
> On 05/04/2018 05:08 AM, Peter Xu wrote:
> > That is not really necessary.  Remove that node struct and put the
> > list entry directly into VTDAddressSpace.  It simplifies the code a lot.
> > 
> > Signed-off-by: Peter Xu 
> > ---
> >  include/hw/i386/intel_iommu.h |  9 ++--
> >  hw/i386/intel_iommu.c | 41 ++-
> >  2 files changed, 14 insertions(+), 36 deletions(-)
> > 
> > diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
> > index 45ec8919b6..220697253f 100644
> > --- a/include/hw/i386/intel_iommu.h
> > +++ b/include/hw/i386/intel_iommu.h
> > @@ -67,7 +67,6 @@ typedef union VTD_IR_TableEntry VTD_IR_TableEntry;
> >  typedef union VTD_IR_MSIAddress VTD_IR_MSIAddress;
> >  typedef struct VTDIrq VTDIrq;
> >  typedef struct VTD_MSIMessage VTD_MSIMessage;
> > -typedef struct IntelIOMMUNotifierNode IntelIOMMUNotifierNode;
> >  
> >  /* Context-Entry */
> >  struct VTDContextEntry {
> > @@ -93,6 +92,7 @@ struct VTDAddressSpace {
> >  MemoryRegion iommu_ir;  /* Interrupt region: 0xfeeX */
> >  IntelIOMMUState *iommu_state;
> >  VTDContextCacheEntry context_cache_entry;
> > +QLIST_ENTRY(VTDAddressSpace) next;
> >  };
> >  
> >  struct VTDBus {
> > @@ -253,11 +253,6 @@ struct VTD_MSIMessage {
> >  /* When IR is enabled, all MSI/MSI-X data bits should be zero */
> >  #define VTD_IR_MSI_DATA  (0)
> >  
> > -struct IntelIOMMUNotifierNode {
> > -VTDAddressSpace *vtd_as;
> > -QLIST_ENTRY(IntelIOMMUNotifierNode) next;
> > -};
> > -
> >  /* The iommu (DMAR) device state struct */
> >  struct IntelIOMMUState {
> >  X86IOMMUState x86_iommu;
> > @@ -295,7 +290,7 @@ struct IntelIOMMUState {
> >  GHashTable *vtd_as_by_busptr;   /* VTDBus objects indexed by PCIBus* reference */
> >  VTDBus *vtd_as_by_bus_num[VTD_PCI_BUS_MAX]; /* VTDBus objects indexed by bus number */
> >  /* list of registered notifiers */
> > -QLIST_HEAD(, IntelIOMMUNotifierNode) notifiers_list;
> > +QLIST_HEAD(, VTDAddressSpace) notifiers_list;
> Wouldn't it make sense to rename notifiers_list into something more
> understandable like address_spaces?

But address_spaces might be confusing too, since it could be read as
"a list of all VT-d address spaces".  How about something like:

 address_spaces_with_notifiers

?

[...]

> > -/* update notifier node with new flags */
> > -QLIST_FOREACH_SAFE(node, >notifiers_list, next, next_node) {
> > -if (node->vtd_as == vtd_as) {
> > -if (new == IOMMU_NOTIFIER_NONE) {
> > -QLIST_REMOVE(node, next);
> > -g_free(node);
> > -}
> > -return;
> > -}
> > +/* Insert new ones */
> s/ones/one
> 
> > +QLIST_INSERT_HEAD(>notifiers_list, vtd_as, next);
> > +} else if (new == IOMMU_NOTIFIER_NONE) {
> > +/* Remove old ones */
> same. Not sure the comments are worth.

Will remove two "s"s there.  Thanks,

-- 
Peter Xu

Re: [Qemu-devel] [RFC PATCH 07/12] qcow2-bitmap: add basic bitmaps info

2018-05-17 Thread Vladimir Sementsov-Ogievskiy


17.05.2018 00:17, John Snow wrote:


On 05/14/2018 11:12 AM, Vladimir Sementsov-Ogievskiy wrote:

12.05.2018 04:25, John Snow wrote:

Add functions for querying the basic information inside of bitmaps.
Restructure the bitmaps flags masks to facilitate providing a list of
flags belonging to the bitmap(s) being queried.

Signed-off-by: John Snow 
---
   block/qcow2-bitmap.c | 81
++--
   block/qcow2.c    |  7 +
   block/qcow2.h    |  1 +
   3 files changed, 87 insertions(+), 2 deletions(-)

diff --git a/block/qcow2-bitmap.c b/block/qcow2-bitmap.c
index 60e01abfd7..811b82743a 100644
--- a/block/qcow2-bitmap.c
+++ b/block/qcow2-bitmap.c
@@ -49,8 +49,28 @@
     /* Bitmap directory entry flags */
   #define BME_RESERVED_FLAGS 0xfffcU
-#define BME_FLAG_IN_USE (1U << 0)
-#define BME_FLAG_AUTO   (1U << 1)
+
+enum BME_FLAG_BITS {
+    BME_FLAG_BIT__BEGIN = 0,
+    BME_FLAG_BIT_IN_USE = BME_FLAG_BIT__BEGIN,
+    BME_FLAG_BIT_AUTO   = 1,
+    BME_FLAG_BIT_EXTRA  = 2,
+    BME_FLAG_BIT__MAX,
+};
+
+#define BME_FLAG_IN_USE (1U << BME_FLAG_BIT_IN_USE)
+#define BME_FLAG_AUTO   (1U << BME_FLAG_BIT_AUTO)
+#define BME_FLAG_EXTRA  (1U << BME_FLAG_BIT_EXTRA)
+
+/* Correlate canonical spec values to autogenerated QAPI values */
+struct {
+    uint32_t mask;

mask is unused in this patch


+    int qapi_value;
+} BMEFlagMap[BME_FLAG_BIT__MAX] = {
+    [BME_FLAG_BIT_IN_USE] = { BME_FLAG_IN_USE, BITMAP_FLAG_ENUM_IN_USE },
+    [BME_FLAG_BIT_AUTO]   = { BME_FLAG_AUTO,   BITMAP_FLAG_ENUM_AUTO },
+    [BME_FLAG_BIT_EXTRA]  = { BME_FLAG_EXTRA,  BITMAP_FLAG_ENUM_EXTRA_DATA_COMPATIBLE },
+};
     /* bits [1, 8] U [56, 63] are reserved */
   #define BME_TABLE_ENTRY_RESERVED_MASK 0xff0001feULL
@@ -663,6 +683,63 @@ static void del_bitmap_list(BlockDriverState *bs)
   }
   }
   +static BitmapFlagEnumList *get_bitmap_flags(uint32_t flags)
+{
+    int i;
+    BitmapFlagEnumList *flist = NULL;
+    BitmapFlagEnumList *ftmp;
+
+    while (flags) {
+    i = ctz32(flags);
+    ftmp = g_new0(BitmapFlagEnumList, 1);
+    if (i >= BME_FLAG_BIT__BEGIN && i < BME_FLAG_BIT__MAX) {
+    ftmp->value = BMEFlagMap[i].qapi_value;
+    } else {
+    ftmp->value = BITMAP_FLAG_ENUM_UNKNOWN;

so, there may be several "unknown" entries. It's inconsistent with
"@unknown: This bitmap has unknown or reserved properties.".

Finally, can we export values for unknown flags? It should be more
informative.


I changed my mind -- qemu-img is not a debugging tool and shouldn't get
lost in the weeds trying to enumerate the different kinds of reserved
values that are set.

It's an error to have them set, or not -- which ones specifically is not
information that an end-user need know or care about, and I don't want
to create an extra structure to contain the value.

I'm going to rewrite this loop so that it reports at most one unknown
value.

--js
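
One possible shape for such a loop is sketched below.  It is an assumption
about how the rewrite could look, not the code that was eventually posted:
known bits are reported individually and all remaining bits collapse into a
single "unknown" entry.  FlagName, flag_names() and the FLAG_BIT_* constants
are hypothetical stand-ins for the BME_FLAG_* and BitmapFlagEnum names in the
patch, and a plain array replaces the QAPI list.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Known flag bits; mirrors the layout of the enum in the quoted patch. */
enum {
    FLAG_BIT_IN_USE = 0,
    FLAG_BIT_AUTO   = 1,
    FLAG_BIT_EXTRA  = 2,
    FLAG_BIT__MAX,
};

/* The first FLAG_BIT__MAX values deliberately match the bit numbers. */
typedef enum {
    NAME_IN_USE  = FLAG_BIT_IN_USE,
    NAME_AUTO    = FLAG_BIT_AUTO,
    NAME_EXTRA   = FLAG_BIT_EXTRA,
    NAME_UNKNOWN = FLAG_BIT__MAX,
} FlagName;

/*
 * Translate a flags word into names.  Writes at most FLAG_BIT__MAX + 1
 * entries to out[] and returns how many were written.
 */
static size_t flag_names(uint32_t flags, FlagName out[FLAG_BIT__MAX + 1])
{
    size_t n = 0;

    assert(out != NULL);
    for (int bit = 0; bit < FLAG_BIT__MAX; bit++) {
        if (flags & (1U << bit)) {
            out[n++] = (FlagName)bit;
        }
    }
    /* Any bit at or above FLAG_BIT__MAX is reserved: report it once. */
    if (flags >> FLAG_BIT__MAX) {
        out[n++] = NAME_UNKNOWN;
    }
    return n;
}
```

However many reserved bits are set, the result contains a single unknown
entry, which keeps the output consistent with the "@unknown" documentation
quoted above.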



Ok, I agree.

--
Best regards,
Vladimir

Re: [Qemu-devel] [PATCH v3 5/8] xen_disk: remove use of grant map/unmap

2018-05-17 Thread Anthony PERARD

On Fri, May 04, 2018 at 08:26:04PM +0100, Paul Durrant wrote:
> Now that the (native or emulated) xen_be_copy_grant_refs() helper is
> always available, the xen_disk code can be significantly simplified by
> removing direct use of grant map and unmap operations.
> 
> Signed-off-by: Paul Durrant 

Acked-by: Anthony PERARD 

-- 
Anthony PERARD

Re: [Qemu-devel] FW: [PATCH 2/2] migration: not wait RDMA_CM_EVENT_DISCONNECTED event after rdma_disconnect

2018-05-17 Thread Aviad Yehezkel




On 5/17/2018 5:42 AM, 858585 jemmy wrote:

On Wed, May 16, 2018 at 11:11 PM, Aviad Yehezkel
 wrote:

Hi Lidong and David,
Sorry for the late response, I had to ramp up on migration code and build a
setup on my side.

PSB my comments for this patch below.
For the RDMA post-copy patches I will comment next week after testing on
Mellanox side too.

Thanks!

On 5/16/2018 5:21 PM, Aviad Yehezkel wrote:


-Original Message-
From: Dr. David Alan Gilbert [mailto:dgilb...@redhat.com]
Sent: Wednesday, May 16, 2018 4:13 PM
To: 858585 jemmy 
Cc: Aviad Yehezkel ; Juan Quintela
; qemu-devel ; Gal Shachaf
; Adi Dotan ; Lidong Chen

Subject: Re: [PATCH 2/2] migration: not wait RDMA_CM_EVENT_DISCONNECTED
event after rdma_disconnect

* 858585 jemmy (jemmy858...@gmail.com) wrote:




I wonder why dereg_mr takes so long - I could understand if
reg_mr took a long time, but why for dereg, that sounds like the
easy side.

I used perf to collect the information when ibv_dereg_mr is invoked.

-   9.95%  client2  [kernel.kallsyms]  [k] put_compound_page
`
 - put_compound_page
- 98.45% put_page
 __ib_umem_release
 ib_umem_release
 dereg_mr
 mlx5_ib_dereg_mr
 ib_dereg_mr
 uverbs_free_mr
 remove_commit_idr_uobject
 _rdma_remove_commit_uobject
 rdma_remove_commit_uobject
 ib_uverbs_dereg_mr
 ib_uverbs_write
 vfs_write
 sys_write
 system_call_fastpath
 __GI___libc_write
 0
+ 1.55% __ib_umem_release
+   8.31%  client2  [kernel.kallsyms]  [k] compound_unlock_irqrestore
+   7.01%  client2  [kernel.kallsyms]  [k] page_waitqueue
+   7.00%  client2  [kernel.kallsyms]  [k] set_page_dirty
+   6.61%  client2  [kernel.kallsyms]  [k] unlock_page
+   6.33%  client2  [kernel.kallsyms]  [k] put_page_testzero
+   5.68%  client2  [kernel.kallsyms]  [k] set_page_dirty_lock
+   4.30%  client2  [kernel.kallsyms]  [k] __wake_up_bit
+   4.04%  client2  [kernel.kallsyms]  [k] free_pages_prepare
+   3.65%  client2  [kernel.kallsyms]  [k] release_pages
+   3.62%  client2  [kernel.kallsyms]  [k] arch_local_irq_save
+   3.35%  client2  [kernel.kallsyms]  [k] page_mapping
+   3.13%  client2  [kernel.kallsyms]  [k] get_pageblock_flags_group
+   3.09%  client2  [kernel.kallsyms]  [k] put_page

the reason is __ib_umem_release will loop many times for each page.

static void __ib_umem_release(struct ib_device *dev, struct
ib_umem *umem, int dirty) {
  struct scatterlist *sg;
  struct page *page;
  int i;

  if (umem->nmap > 0)
   ib_dma_unmap_sg(dev, umem->sg_head.sgl,
  umem->npages,
  DMA_BIDIRECTIONAL);

   for_each_sg(umem->sg_head.sgl, sg, umem->npages, i) {
<<
loop a lot of times for each page.here

Why 'lot of times for each page'?  I don't know this code at all,
but I'd expected once per page?

Sorry, once per page, but there are a lot of pages for a large virtual machine.
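
To put "a lot of pages" into numbers, here is a minimal sketch. It assumes 4 KiB base pages and a fully pinned guest, neither of which is stated in the thread; pages_to_release() is an illustrative name, not a kernel function.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Number of pages needed to cover `bytes` of registered memory.
 * A once-per-page release loop runs this many iterations.
 */
static uint64_t pages_to_release(uint64_t bytes, uint64_t page_size)
{
    assert(page_size != 0);
    return (bytes + page_size - 1) / page_size;   /* round up */
}
```

Under these assumptions a 64 GiB guest means pages_to_release(64ULL << 30, 4096) iterations, i.e. about 16.8 million page releases for a single dereg.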

Ah OK; so yes it seems best if you can find a way to do the release
in the migration thread then;  still maybe this is something some of
the kernel people could look at speeding up?

The kernel code does not seem complex, and I have no idea how to speed it up.

Me neither; but I'll ask around.

I will ask internally and get back on this one.



With your other kernel fix, does the problem of the missing
RDMA_CM_EVENT_DISCONNECTED events go away?

Yes, after the kernel and qemu fixes, this issue never happens again.

I'm confused; which qemu fix; my question was whether the kernel fix
by itself fixed the problem of the missing event.

this qemu fix:
migration: update index field when delete or qsort RDMALocalBlock

OK good; so then we shouldn't need this 2/2 patch.

It is good that the qemu fix solved this issue, but my colleagues and I
think we need the 2/2 patch anyway.
According to the IB spec, once the active side sends a DREQ message it should
wait for the DREP message, and only once that has arrived should it trigger a
DISCONNECT event.
The DREP message can be dropped due to network issues.
For that case the spec defines a DREP_timeout state in the CM state machine:
if the DREP is dropped we should get a timeout, and a TIMEWAIT_EXIT event
will be triggered.
Unfortunately the current kernel CM implementation doesn't include the
DREP_timeout state, and in the above scenario we will not get DISCONNECT or
TIMEWAIT_EXIT events.
(A similar scenario also exists for the passive side.)
We think it is best to apply this patch until we enhance the kernel
code.
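
The disconnect sequence described above can be sketched as a tiny state machine. This is only a model of the description in this mail, not the IB spec text and not the kernel CM code; all type and function names (CmState, CmInput, CmNotify, cm_step) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical active-side states and inputs, after the description above. */
typedef enum { CM_ESTABLISHED, CM_DREQ_SENT, CM_TIMEWAIT, CM_IDLE } CmState;
typedef enum { IN_SEND_DREQ, IN_RECV_DREP, IN_DREP_TIMEOUT, IN_TIMEWAIT_DONE } CmInput;
typedef enum { NOTIFY_NONE, NOTIFY_DISCONNECTED, NOTIFY_TIMEWAIT_EXIT } CmNotify;

/* Advance the state machine by one input; return the event reported upward. */
static CmNotify cm_step(CmState *state, CmInput in)
{
    assert(state != NULL);
    switch (*state) {
    case CM_ESTABLISHED:
        if (in == IN_SEND_DREQ) {
            *state = CM_DREQ_SENT;          /* now waiting for the DREP */
        }
        return NOTIFY_NONE;
    case CM_DREQ_SENT:
        if (in == IN_RECV_DREP) {
            *state = CM_TIMEWAIT;
            return NOTIFY_DISCONNECTED;     /* DREP arrived */
        }
        if (in == IN_DREP_TIMEOUT) {
            *state = CM_TIMEWAIT;           /* DREP lost: give up waiting */
        }
        return NOTIFY_NONE;
    case CM_TIMEWAIT:
        if (in == IN_TIMEWAIT_DONE) {
            *state = CM_IDLE;
            return NOTIFY_TIMEWAIT_EXIT;
        }
        return NOTIFY_NONE;
    default:
        return NOTIFY_NONE;
    }
}
```

In this model, an implementation that never generates IN_DREP_TIMEOUT stays in CM_DREQ_SENT forever when the DREP is lost, so the caller sees neither notification. That is the situation the 2/2 patch works around.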


Hi Aviad:
 How long is the DREP_timeout state?
 Does RDMA have a tool like tcpdump? Then I can use it to confirm

Re: [Qemu-devel] [edk2] [PATCH 0/4] RFC: ovmf: Add support for TPM Physical Presence interface

2018-05-17 Thread Laszlo Ersek

On 05/16/18 11:29, Laszlo Ersek wrote:
> Hi Marc-André,
> 
> On 05/15/18 14:30, marcandre.lur...@redhat.com wrote:
>> From: Marc-André Lureau 
>>
>> Hi,
>>
>> The following series adds basic TPM PPI 1.3 support for OVMF-on-QEMU
>> with TPM2 (I haven't tested TPM1, for lack of interest).

Here's another general request: please make sure that all code you add
(new lines in existent files, and especially brand new files) use CRLF
line terminators.

Thanks!
Laszlo

Re: [Qemu-devel] [PATCH] RISC-V: make it possible to alter default reset vector

2018-05-17 Thread Antony Pavlov

On Tue,  8 May 2018 00:08:38 +0300
Antony Pavlov  wrote:

Please comment on this patch!

> The RISC-V Instruction Set Manual, Volume II:
> Privileged Architecture, Version 1.10 states
> that upon reset the pc is set to
> an implementation-defined reset vector
> (see chapter 3.3 Reset).
> 
> This patch makes it possible to alter default
> reset vector by setting "rstvec" property
> for TYPE_RISCV_HART_ARRAY.
> 
> Signed-off-by: Antony Pavlov 
> Cc: Michael Clark 
> Cc: Palmer Dabbelt 
> Cc: Sagar Karandikar 
> Cc: Bastian Koppelmann 
> Cc: Peter Crosthwaite 
> Cc: Peter Maydell 
> ---
>  hw/riscv/riscv_hart.c |  3 +++
>  include/hw/riscv/riscv_hart.h |  1 +
>  target/riscv/cpu.c| 17 ++---
>  target/riscv/cpu.h|  2 ++
>  4 files changed, 16 insertions(+), 7 deletions(-)
> 
> diff --git a/hw/riscv/riscv_hart.c b/hw/riscv/riscv_hart.c
> index 14e3c186fe..98e5e50f33 100644
> --- a/hw/riscv/riscv_hart.c
> +++ b/hw/riscv/riscv_hart.c
> @@ -27,6 +27,7 @@
>  static Property riscv_harts_props[] = {
>  DEFINE_PROP_UINT32("num-harts", RISCVHartArrayState, num_harts, 1),
>  DEFINE_PROP_STRING("cpu-type", RISCVHartArrayState, cpu_type),
> +DEFINE_PROP_UINT64("rstvec", RISCVHartArrayState, rstvec, DEFAULT_RSTVEC),
>  DEFINE_PROP_END_OF_LIST(),
>  };
>  
> @@ -50,6 +51,8 @@ static void riscv_harts_realize(DeviceState *dev, Error **errp)
>  s->harts[n].env.mhartid = n;
>  object_property_add_child(OBJECT(s), "harts[*]", OBJECT(&s->harts[n]),
>  &error_abort);
> +object_property_set_uint(OBJECT(&s->harts[n]), s->rstvec,
> + "rstvec", &err);
>  qemu_register_reset(riscv_harts_cpu_reset, &s->harts[n]);
>  object_property_set_bool(OBJECT(&s->harts[n]), true,
>   "realized", &err);
> diff --git a/include/hw/riscv/riscv_hart.h b/include/hw/riscv/riscv_hart.h
> index 0671d88a44..3cc19e2b60 100644
> --- a/include/hw/riscv/riscv_hart.h
> +++ b/include/hw/riscv/riscv_hart.h
> @@ -34,6 +34,7 @@ typedef struct RISCVHartArrayState {
>  uint32_t num_harts;
>  char *cpu_type;
>  RISCVCPU *harts;
> +uint64_t rstvec;
>  } RISCVHartArrayState;
>  
>  #endif
> diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
> index 5a527fbba0..061aa5cc6b 100644
> --- a/target/riscv/cpu.c
> +++ b/target/riscv/cpu.c
> @@ -23,6 +23,7 @@
>  #include "exec/exec-all.h"
>  #include "qapi/error.h"
>  #include "migration/vmstate.h"
> +#include "hw/qdev-properties.h"
>  
>  /* RISC-V CPU definitions */
>  
> @@ -112,7 +113,6 @@ static void riscv_any_cpu_init(Object *obj)
>  CPURISCVState *env = &RISCV_CPU(obj)->env;
>  set_misa(env, RVXLEN | RVI | RVM | RVA | RVF | RVD | RVC | RVU);
>  set_versions(env, USER_VERSION_2_02_0, PRIV_VERSION_1_10_0);
> -set_resetvec(env, DEFAULT_RSTVEC);
>  }
>  
>  #if defined(TARGET_RISCV32)
> @@ -122,7 +122,6 @@ static void rv32gcsu_priv1_09_1_cpu_init(Object *obj)
>  CPURISCVState *env = &RISCV_CPU(obj)->env;
>  set_misa(env, RV32 | RVI | RVM | RVA | RVF | RVD | RVC | RVS | RVU);
>  set_versions(env, USER_VERSION_2_02_0, PRIV_VERSION_1_09_1);
> -set_resetvec(env, DEFAULT_RSTVEC);
>  set_feature(env, RISCV_FEATURE_MMU);
>  }
>  
> @@ -131,7 +130,6 @@ static void rv32gcsu_priv1_10_0_cpu_init(Object *obj)
>  CPURISCVState *env = &RISCV_CPU(obj)->env;
>  set_misa(env, RV32 | RVI | RVM | RVA | RVF | RVD | RVC | RVS | RVU);
>  set_versions(env, USER_VERSION_2_02_0, PRIV_VERSION_1_10_0);
> -set_resetvec(env, DEFAULT_RSTVEC);
>  set_feature(env, RISCV_FEATURE_MMU);
>  }
>  
> @@ -140,7 +138,6 @@ static void rv32imacu_nommu_cpu_init(Object *obj)
>  CPURISCVState *env = &RISCV_CPU(obj)->env;
>  set_misa(env, RV32 | RVI | RVM | RVA | RVC | RVU);
>  set_versions(env, USER_VERSION_2_02_0, PRIV_VERSION_1_10_0);
> -set_resetvec(env, DEFAULT_RSTVEC);
>  }
>  
>  #elif defined(TARGET_RISCV64)
> @@ -150,7 +147,6 @@ static void rv64gcsu_priv1_09_1_cpu_init(Object *obj)
>  CPURISCVState *env = &RISCV_CPU(obj)->env;
>  set_misa(env, RV64 | RVI | RVM | RVA | RVF | RVD | RVC | RVS | RVU);
>  set_versions(env, USER_VERSION_2_02_0, PRIV_VERSION_1_09_1);
> -set_resetvec(env, DEFAULT_RSTVEC);
>  set_feature(env, RISCV_FEATURE_MMU);
>  }
>  
> @@ -159,7 +155,6 @@ static void rv64gcsu_priv1_10_0_cpu_init(Object *obj)
>  CPURISCVState *env = &RISCV_CPU(obj)->env;
>  set_misa(env, RV64 | RVI | RVM | RVA | RVF | RVD | RVC | RVS | RVU);
>  set_versions(env, USER_VERSION_2_02_0, PRIV_VERSION_1_10_0);
> -set_resetvec(env, DEFAULT_RSTVEC);
>  set_feature(env, RISCV_FEATURE_MMU);
>  }
>  
> @@ -168,7 +163,6 @@ static void rv64imacu_nommu_cpu_init(Object *obj)
>  CPURISCVState *env = &RISCV_CPU(obj)->env;
>
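
For readers skimming the diff, the idea of the patch can be sketched outside QEMU as follows. This is a simplified model under stated assumptions: ExampleHart, ExampleHartArray and both functions are hypothetical stand-ins, not the QOM property machinery the patch actually uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-hart state: current pc and the configured reset vector. */
typedef struct {
    uint64_t pc;
    uint64_t resetvec;
} ExampleHart;

/* Hypothetical hart array carrying one "rstvec" setting for all harts. */
typedef struct {
    ExampleHart *harts;
    size_t num_harts;
    uint64_t rstvec;
} ExampleHartArray;

/* At realize time, push the array-level setting down to every hart. */
static void example_harts_realize(ExampleHartArray *a)
{
    assert(a != NULL && a->harts != NULL);
    for (size_t n = 0; n < a->num_harts; n++) {
        a->harts[n].resetvec = a->rstvec;
    }
}

/* On reset, the pc is set to the (implementation-defined) reset vector. */
static void example_hart_reset(ExampleHart *h)
{
    assert(h != NULL);
    h->pc = h->resetvec;
}
```

The point of the design is that the reset address lives in one configurable place instead of being hard-coded in every cpu_init function.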

Re: [Qemu-devel] [PATCH V7 RESEND 16/17] COLO: notify net filters about checkpoint/failover event

2018-05-17 Thread Dr. David Alan Gilbert

* Zhang Chen (zhangc...@gmail.com) wrote:
> From: zhanghailiang 
> 
> Notify all net filters about the checkpoint and failover event.
> 
> Signed-off-by: zhanghailiang 

Reviewed-by: Dr. David Alan Gilbert 

> ---
>  migration/colo.c | 12 
>  1 file changed, 12 insertions(+)
> 
> diff --git a/migration/colo.c b/migration/colo.c
> index 3dfd84d897..15463e2823 100644
> --- a/migration/colo.c
> +++ b/migration/colo.c
> @@ -88,6 +88,11 @@ static void secondary_vm_do_failover(void)
>  if (local_err) {
>  error_report_err(local_err);
>  }
> +/* Notify all filters of all NIC to do checkpoint */
> +colo_notify_filters_event(COLO_EVENT_FAILOVER, &local_err);
> +if (local_err) {
> +error_report_err(local_err);
> +}
>  
>  if (!autostart) {
>  error_report("\"-S\" qemu option will be ignored in secondary side");
> @@ -799,6 +804,13 @@ void *colo_process_incoming_thread(void *opaque)
>  goto out;
>  }
>  
> +/* Notify all filters of all NIC to do checkpoint */
> +colo_notify_filters_event(COLO_EVENT_CHECKPOINT, &local_err);
> +if (local_err) {
> +qemu_mutex_unlock_iothread();
> +goto out;
> +}
> +
>  vmstate_loading = false;
>  vm_start();
>  trace_colo_vm_state_change("stop", "run");
> -- 
> 2.17.0
> 
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK

Re: [Qemu-devel] [PATCH v2 02/10] intel-iommu: remove IntelIOMMUNotifierNode

2018-05-17 Thread Auger Eric

Hi Peter,

On 05/04/2018 05:08 AM, Peter Xu wrote:
> That is not really necessary.  Remove that node struct and put the
> list entry directly into VTDAddressSpace.  It simplifies the code a lot.
> 
> Signed-off-by: Peter Xu 
> ---
>  include/hw/i386/intel_iommu.h |  9 ++--
>  hw/i386/intel_iommu.c | 41 ++-
>  2 files changed, 14 insertions(+), 36 deletions(-)
> 
> diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
> index 45ec8919b6..220697253f 100644
> --- a/include/hw/i386/intel_iommu.h
> +++ b/include/hw/i386/intel_iommu.h
> @@ -67,7 +67,6 @@ typedef union VTD_IR_TableEntry VTD_IR_TableEntry;
>  typedef union VTD_IR_MSIAddress VTD_IR_MSIAddress;
>  typedef struct VTDIrq VTDIrq;
>  typedef struct VTD_MSIMessage VTD_MSIMessage;
> -typedef struct IntelIOMMUNotifierNode IntelIOMMUNotifierNode;
>  
>  /* Context-Entry */
>  struct VTDContextEntry {
> @@ -93,6 +92,7 @@ struct VTDAddressSpace {
>  MemoryRegion iommu_ir;  /* Interrupt region: 0xfeeX */
>  IntelIOMMUState *iommu_state;
>  VTDContextCacheEntry context_cache_entry;
> +QLIST_ENTRY(VTDAddressSpace) next;
>  };
>  
>  struct VTDBus {
> @@ -253,11 +253,6 @@ struct VTD_MSIMessage {
>  /* When IR is enabled, all MSI/MSI-X data bits should be zero */
>  #define VTD_IR_MSI_DATA  (0)
>  
> -struct IntelIOMMUNotifierNode {
> -VTDAddressSpace *vtd_as;
> -QLIST_ENTRY(IntelIOMMUNotifierNode) next;
> -};
> -
>  /* The iommu (DMAR) device state struct */
>  struct IntelIOMMUState {
>  X86IOMMUState x86_iommu;
> @@ -295,7 +290,7 @@ struct IntelIOMMUState {
>  GHashTable *vtd_as_by_busptr;   /* VTDBus objects indexed by PCIBus* reference */
>  VTDBus *vtd_as_by_bus_num[VTD_PCI_BUS_MAX]; /* VTDBus objects indexed by bus number */
>  /* list of registered notifiers */
> -QLIST_HEAD(, IntelIOMMUNotifierNode) notifiers_list;
> +QLIST_HEAD(, VTDAddressSpace) notifiers_list;
Wouldn't it make sense to rename notifiers_list into something more
understandable like address_spaces?
>  
>  /* interrupt remapping */
>  bool intr_enabled;  /* Whether guest enabled IR */
> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> index b359efd6f9..5987b48d43 100644
> --- a/hw/i386/intel_iommu.c
> +++ b/hw/i386/intel_iommu.c
> @@ -1248,10 +1248,10 @@ static void vtd_interrupt_remap_table_setup(IntelIOMMUState *s)
>  
>  static void vtd_iommu_replay_all(IntelIOMMUState *s)
>  {
> -IntelIOMMUNotifierNode *node;
> +VTDAddressSpace *vtd_as;
>  
> -QLIST_FOREACH(node, &s->notifiers_list, next) {
> -memory_region_iommu_replay_all(&node->vtd_as->iommu);
> +QLIST_FOREACH(vtd_as, &s->notifiers_list, next) {
> +memory_region_iommu_replay_all(&vtd_as->iommu);
>  }
>  }
>  
> @@ -1372,7 +1372,6 @@ static void vtd_iotlb_global_invalidate(IntelIOMMUState *s)
>  
>  static void vtd_iotlb_domain_invalidate(IntelIOMMUState *s, uint16_t domain_id)
>  {
> -IntelIOMMUNotifierNode *node;
>  VTDContextEntry ce;
>  VTDAddressSpace *vtd_as;
>  
> @@ -1381,8 +1380,7 @@ static void vtd_iotlb_domain_invalidate(IntelIOMMUState *s, uint16_t domain_id)
>  g_hash_table_foreach_remove(s->iotlb, vtd_hash_remove_by_domain,
>  &domain_id);
>  
> -QLIST_FOREACH(node, &s->notifiers_list, next) {
> -vtd_as = node->vtd_as;
> +QLIST_FOREACH(vtd_as, &s->notifiers_list, next) {
>  if (!vtd_dev_to_context_entry(s, pci_bus_num(vtd_as->bus),
>vtd_as->devfn, &ce) &&
>  domain_id == VTD_CONTEXT_ENTRY_DID(ce.hi)) {
> @@ -1402,12 +1400,11 @@ static void vtd_iotlb_page_invalidate_notify(IntelIOMMUState *s,
> uint16_t domain_id, hwaddr addr,
> uint8_t am)
>  {
> -IntelIOMMUNotifierNode *node;
> +VTDAddressSpace *vtd_as;
>  VTDContextEntry ce;
>  int ret;
>  
> -QLIST_FOREACH(node, &(s->notifiers_list), next) {
> -VTDAddressSpace *vtd_as = node->vtd_as;
> +QLIST_FOREACH(vtd_as, &(s->notifiers_list), next) {
>  ret = vtd_dev_to_context_entry(s, pci_bus_num(vtd_as->bus),
> vtd_as->devfn, &ce);
>  if (!ret && domain_id == VTD_CONTEXT_ENTRY_DID(ce.hi)) {
> @@ -2344,8 +2341,6 @@ static void vtd_iommu_notify_flag_changed(IOMMUMemoryRegion *iommu,
>  {
>  VTDAddressSpace *vtd_as = container_of(iommu, VTDAddressSpace, iommu);
>  IntelIOMMUState *s = vtd_as->iommu_state;
> -IntelIOMMUNotifierNode *node = NULL;
> -IntelIOMMUNotifierNode *next_node = NULL;
>  
>  if (!s->caching_mode && new & IOMMU_NOTIFIER_MAP) {
>  error_report("We need to set caching-mode=1 for intel-iommu to enable "
> @@ -2354,21 +2349,11 @@ static void vtd_iommu_notify_flag_changed(IOMMUMemoryRegion *iommu,
>  }
>  
>  if (old ==
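
The refactoring in this patch (dropping a wrapper node and embedding the list link in the object itself) can be illustrated with the BSD <sys/queue.h> LIST macros, which the QLIST macros closely resemble. This is a sketch under that assumption; ExampleAddressSpace and example_count_registered() are made-up names, not QEMU code.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h>

/*
 * "After" layout: the link lives inside the address space, so no
 * separate node structure has to be allocated, looked up, or freed.
 */
typedef struct ExampleAddressSpace {
    int devfn;
    LIST_ENTRY(ExampleAddressSpace) next;   /* embedded list link */
} ExampleAddressSpace;

/* List head type holding ExampleAddressSpace elements directly. */
LIST_HEAD(ExampleAddressSpaceList, ExampleAddressSpace);

/* Walk the list the same way the patched QLIST_FOREACH loops do. */
static int example_count_registered(struct ExampleAddressSpaceList *head)
{
    ExampleAddressSpace *as;
    int n = 0;

    assert(head != NULL);
    LIST_FOREACH(as, head, next) {
        n++;
    }
    return n;
}
```

With the link embedded, unregistering becomes a single LIST_REMOVE(as, next) on the object, with no search for the matching wrapper node.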

Re: [Qemu-devel] [PULL 09/28] target/arm: convert conversion helpers to fpst/ahp_flag

2018-05-17 Thread Peter Maydell

On 16 May 2018 at 16:52, Richard Henderson  wrote:
> diff --git a/target/arm/translate.c b/target/arm/translate.c
> index 731cf327a1..613598d090 100644
> --- a/target/arm/translate.c
> +++ b/target/arm/translate.c

Just noticed, but in the 32-bit translator where the argument to
get_fpstatus_ptr() is "is this neon?" (ie "do we use the standard
FPSCR value"), shouldn't we be passing 'true' to get_fpstatus_ptr()
for the halfprec conversions in disas_neon_data_insn() ?

I haven't tested, but I imagine that otherwise you get the wrong
results if the input is a denormal and FPSCR.FZ is 0 or if the
output should be a NaN and FPSCR.DN is 0.
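
A minimal model of the concern, assuming (as in the Arm architecture) that the "standard FPSCR value" used by Neon always behaves as flush-to-zero with default NaN. ExampleFpStatus, ExampleVfpState and both functions are hypothetical names for illustration, not the QEMU implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical subset of a floating-point status block. */
typedef struct {
    bool flush_to_zero;
    bool default_nan;
} ExampleFpStatus;

typedef struct {
    ExampleFpStatus fp_status;           /* follows FPSCR.FZ / FPSCR.DN */
    ExampleFpStatus standard_fp_status;  /* fixed "standard FPSCR value" */
} ExampleVfpState;

/* Set up both status blocks from the guest-visible FPSCR bits. */
static void example_vfp_init(ExampleVfpState *v, bool fpscr_fz, bool fpscr_dn)
{
    assert(v != NULL);
    v->fp_status = (ExampleFpStatus){ fpscr_fz, fpscr_dn };
    v->standard_fp_status = (ExampleFpStatus){ true, true };
}

/* Model of the selection: Neon operations use the standard status. */
static const ExampleFpStatus *example_get_fpstatus(const ExampleVfpState *v,
                                                   bool neon)
{
    assert(v != NULL);
    return neon ? &v->standard_fp_status : &v->fp_status;
}
```

With FPSCR.FZ and FPSCR.DN both clear, the two choices differ in exactly the cases mentioned above (denormal inputs and NaN outputs), which is why the argument passed in disas_neon_data_insn() matters.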

> @@ -7222,53 +7247,70 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
>  }
>  break;
>  case NEON_2RM_VCVT_F16_F32:
> +{
> +TCGv_ptr fpst;
> +TCGv_i32 ahp;
> +
>  if (!arm_dc_feature(s, ARM_FEATURE_VFP_FP16) ||
>  q || (rm & 1)) {
>  return 1;
>  }
>  tmp = tcg_temp_new_i32();
>  tmp2 = tcg_temp_new_i32();
> +fpst = get_fpstatus_ptr(false);
> +ahp = get_ahp_flag();
>  tcg_gen_ld_f32(cpu_F0s, cpu_env, neon_reg_offset(rm, 0));
> -gen_helper_neon_fcvt_f32_to_f16(tmp, cpu_F0s, cpu_env);
> +gen_helper_vfp_fcvt_f32_to_f16(tmp, cpu_F0s, fpst, ahp);
>  tcg_gen_ld_f32(cpu_F0s, cpu_env, neon_reg_offset(rm, 1));
> -gen_helper_neon_fcvt_f32_to_f16(tmp2, cpu_F0s, cpu_env);
> +gen_helper_vfp_fcvt_f32_to_f16(tmp2, cpu_F0s, fpst, ahp);
>  tcg_gen_shli_i32(tmp2, tmp2, 16);
>  tcg_gen_or_i32(tmp2, tmp2, tmp);
>  tcg_gen_ld_f32(cpu_F0s, cpu_env, neon_reg_offset(rm, 2));
> -gen_helper_neon_fcvt_f32_to_f16(tmp, cpu_F0s, cpu_env);
> +gen_helper_vfp_fcvt_f32_to_f16(tmp, cpu_F0s, fpst, ahp);
>  tcg_gen_ld_f32(cpu_F0s, cpu_env, neon_reg_offset(rm, 3));
>  neon_store_reg(rd, 0, tmp2);
>  tmp2 = tcg_temp_new_i32();
> -gen_helper_neon_fcvt_f32_to_f16(tmp2, cpu_F0s, cpu_env);
> +gen_helper_vfp_fcvt_f32_to_f16(tmp2, cpu_F0s, fpst, ahp);
>  tcg_gen_shli_i32(tmp2, tmp2, 16);
>  tcg_gen_or_i32(tmp2, tmp2, tmp);
>  neon_store_reg(rd, 1, tmp2);
>  tcg_temp_free_i32(tmp);
> +tcg_temp_free_i32(ahp);
> +tcg_temp_free_ptr(fpst);
>  break;
> +}
>  case NEON_2RM_VCVT_F32_F16:
> +{
> +TCGv_ptr fpst;
> +TCGv_i32 ahp;
>  if (!arm_dc_feature(s, ARM_FEATURE_VFP_FP16) ||
>  q || (rd & 1)) {
>  return 1;
>  }
> +fpst = get_fpstatus_ptr(false);
> +ahp = get_ahp_flag();
>  tmp3 = tcg_temp_new_i32();
>  tmp = neon_load_reg(rm, 0);
>  tmp2 = neon_load_reg(rm, 1);
>  tcg_gen_ext16u_i32(tmp3, tmp);
> -gen_helper_neon_fcvt_f16_to_f32(cpu_F0s, tmp3, cpu_env);
> +gen_helper_vfp_fcvt_f16_to_f32(cpu_F0s, tmp3, fpst, ahp);
>  tcg_gen_st_f32(cpu_F0s, cpu_env, neon_reg_offset(rd, 0));
>  tcg_gen_shri_i32(tmp3, tmp, 16);
> -gen_helper_neon_fcvt_f16_to_f32(cpu_F0s, tmp3, cpu_env);
> +gen_helper_vfp_fcvt_f16_to_f32(cpu_F0s, tmp3, fpst, ahp);
>  tcg_gen_st_f32(cpu_F0s, cpu_env, neon_reg_offset(rd, 1));
>  tcg_temp_free_i32(tmp);
>  tcg_gen_ext16u_i32(tmp3, tmp2);
> -gen_helper_neon_fcvt_f16_to_f32(cpu_F0s, tmp3, cpu_env);
> +gen_helper_vfp_fcvt_f16_to_f32(cpu_F0s, tmp3, fpst, ahp);
>  tcg_gen_st_f32(cpu_F0s, cpu_env, neon_reg_offset(rd, 2));
>  tcg_gen_shri_i32(tmp3, tmp2, 16);
> -gen_helper_neon_fcvt_f16_to_f32(cpu_F0s, tmp3, cpu_env);
> +gen_helper_vfp_fcvt_f16_to_f32(cpu_F0s, tmp3, fpst, ahp);
>  tcg_gen_st_f32(cpu_F0s, cpu_env, neon_reg_offset(rd, 3));
>  tcg_temp_free_i32(tmp2);
>  tcg_temp_free_i32(tmp3);
> +tcg_temp_free_i32(ahp);
> +tcg_temp_free_ptr(fpst);
>  break;
> +}
>  case NEON_2RM_AESE: case NEON_2RM_AESMC:
>

[Qemu-devel] [Bug 1459626] Re: emacs (gtk3) core dumped with -vga qxl

2018-05-17 Thread Thomas Huth

Looking through old bug tickets... can you still reproduce this issue
with the latest versions of QEMU, x11, kernel, spice, etc.? Or could we
close this ticket nowadays?

** Changed in: qemu
   Status: New => Incomplete

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1459626

Title:
  emacs (gtk3) core dumped with -vga qxl

Status in QEMU:
  Incomplete

Bug description:
  Emacs (gtk3) exited with bus error and core dumped with -vga qxl. If I
  use the builtin modesetting xorg driver, emacs could survive for a
  short while sometimes. If I use xf86-video-qxl (git r587.8babd05), it
  dies right away with the same error. It seems to corrupt xorg at some
  point as well, because sometimes I cannot exit i3 properly and gnome-
  terminal can go crazy afterwards (all letters become empty rectangles).

  It doesn't seem to happen with other -vga drivers.

  Error message is attached. I can also provide the core dump, but it is
  47M.

  I started the vm as root (sudo) with the following command: qemu-
  system-x86_64 -enable-kvm -m 4G -virtfs
  local,mount_tag=qemu,security_model=passthrough,path=/mnt/qemu/
  -kernel /mnt/qemu/boot/vmlinuz-linux -initrd /mnt/qemu/boot/initramfs-
  linux-fallback.img -append 'rw root=qemu fstype=9p' -usbdevice tablet
  -vga qxl -spice port=12345,disable-ticketing

  /mnt/qemu is a btrfs snapshot of the subvolume used as the host root

  Arch Linux, qemu 2.3.0, emacs 24.5, xorg-server 1.17.1, linux 4.0.4,
  gtk 3.16.3, glib 2.44.1

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1459626/+subscriptions

[Qemu-devel] [Bug 1459622] Re: firefox hang with virtfs

2018-05-17 Thread Thomas Huth

Looking through old bug tickets... can you still reproduce this issue
with the latest version of QEMU? Or could we close this ticket nowadays?

** Changed in: qemu
   Status: New => Incomplete

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1459622

Title:
  firefox hang with virtfs

Status in QEMU:
  Incomplete

Bug description:
  Firefox hangs once it starts to load pages. I tried deleting
  .cache/mozilla/ and .mozilla/ but it doesn't help. But if I mount
  tmpfs onto .mozilla (not necessary for .cache/mozilla/), pages load
  fine.

  I started the vm as root (sudo) with the following command: qemu-
  system-x86_64 -enable-kvm -m 4G -virtfs
  local,mount_tag=qemu,security_model=passthrough,path=/mnt/qemu/
  -kernel /mnt/qemu/boot/vmlinuz-linux -initrd /mnt/qemu/boot/initramfs-
  linux-fallback.img -append 'rw root=qemu fstype=9p' -usbdevice tablet
  -vga qxl -spice port=12345,disable-ticketing

  /mnt/qemu is a btrfs snapshot of the subvolume used as the host root

  Arch Linux, qemu 2.3.0, firefox 38.0.1

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1459622/+subscriptions

[Qemu-devel] [PATCH 3/4] bochs-display: add dirty tracking support

2018-05-17 Thread Gerd Hoffmann

Signed-off-by: Gerd Hoffmann 
---
 hw/display/bochs-display.c | 33 -
 1 file changed, 32 insertions(+), 1 deletion(-)

diff --git a/hw/display/bochs-display.c b/hw/display/bochs-display.c
index beeda58475..9ea6f798f4 100644
--- a/hw/display/bochs-display.c
+++ b/hw/display/bochs-display.c
@@ -192,9 +192,13 @@ nofb:
 static void bochs_display_update(void *opaque)
 {
 BochsDisplayState *s = opaque;
+DirtyBitmapSnapshot *snap = NULL;
+bool full_update = false;
 BochsDisplayMode mode;
 DisplaySurface *ds;
 uint8_t *ptr;
+bool dirty;
+int y, ys;
 
 bochs_display_get_mode(s, &mode);
 if (!mode.size) {
@@ -212,9 +216,34 @@ static void bochs_display_update(void *opaque)
  mode.stride,
  ptr + mode.offset);
 dpy_gfx_replace_surface(s->con, ds);
+full_update = true;
 }
 
-dpy_gfx_update_full(s->con);
+if (full_update) {
+dpy_gfx_update_full(s->con);
+} else {
+snap = memory_region_snapshot_and_clear_dirty(&s->vram,
+  mode.offset, mode.size,
+  DIRTY_MEMORY_VGA);
+ys = -1;
+for (y = 0; y < mode.height; y++) {
+dirty = memory_region_snapshot_get_dirty(&s->vram, snap,
+ mode.offset + mode.stride * y,
+ mode.stride);
+if (dirty && ys < 0) {
+ys = y;
+}
+if (!dirty && ys >= 0) {
+dpy_gfx_update(s->con, 0, ys,
+   mode.width, y - ys);
+ys = -1;
+}
+}
+if (ys >= 0) {
+dpy_gfx_update(s->con, 0, ys,
+   mode.width, y - ys);
+}
+}
 }
 
 static const GraphicHwOps bochs_display_gfx_ops = {
@@ -250,6 +279,8 @@ static void bochs_display_realize(PCIDevice *dev, Error **errp)
 pci_set_byte(&s->pci.config[PCI_REVISION_ID], 2);
 pci_register_bar(&s->pci, 0, PCI_BASE_ADDRESS_MEM_PREFETCH, &s->vram);
 pci_register_bar(&s->pci, 2, PCI_BASE_ADDRESS_SPACE_MEMORY, &s->mmio);
+
+memory_region_set_log(&s->vram, true, DIRTY_MEMORY_VGA);
 }
 
 static bool bochs_display_get_big_endian_fb(Object *obj, Error **errp)
-- 
2.9.3
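
The scanline loop in this patch merges consecutive dirty rows into one update rectangle. Here is a standalone sketch of that bookkeeping, with the dirty bitmap replaced by a plain bool array; ExampleDirtySpan and example_coalesce_dirty_rows() are illustrative names only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One vertical span of dirty scanlines: rows [y, y + height). */
typedef struct {
    int y;
    int height;
} ExampleDirtySpan;

/*
 * Walk per-row dirty flags and merge runs of consecutive dirty rows,
 * using the same ys/y bookkeeping as the patch. Returns the number of
 * spans written to out (never more than max_out).
 */
static size_t example_coalesce_dirty_rows(const bool *dirty, int rows,
                                          ExampleDirtySpan *out,
                                          size_t max_out)
{
    size_t n = 0;
    int ys = -1;                /* start of the current run, or -1 */
    int y;

    assert(dirty != NULL && out != NULL && rows >= 0);
    for (y = 0; y < rows; y++) {
        if (dirty[y] && ys < 0) {
            ys = y;             /* a new dirty run begins here */
        }
        if (!dirty[y] && ys >= 0) {
            if (n < max_out) {
                out[n++] = (ExampleDirtySpan){ ys, y - ys };
            }
            ys = -1;            /* the run ended on the previous row */
        }
    }
    if (ys >= 0 && n < max_out) {
        out[n++] = (ExampleDirtySpan){ ys, y - ys };  /* run hits last row */
    }
    return n;
}
```

Each returned span corresponds to one dpy_gfx_update() call in the patch, so a mostly idle screen produces a handful of small updates instead of a full-frame refresh.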

[Qemu-devel] [PATCH 4/4] bochs-display: add pcie support

2018-05-17 Thread Gerd Hoffmann

Signed-off-by: Gerd Hoffmann 
---
 hw/display/bochs-display.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/hw/display/bochs-display.c b/hw/display/bochs-display.c
index 9ea6f798f4..23dbf57b8f 100644
--- a/hw/display/bochs-display.c
+++ b/hw/display/bochs-display.c
@@ -254,6 +254,7 @@ static void bochs_display_realize(PCIDevice *dev, Error **errp)
 {
 BochsDisplayState *s = BOCHS_DISPLAY(dev);
 Object *obj = OBJECT(dev);
+int ret;
 
 s->con = graphic_console_init(DEVICE(dev), 0, &bochs_display_gfx_ops, s);
 
@@ -280,6 +281,12 @@ static void bochs_display_realize(PCIDevice *dev, Error **errp)
 pci_register_bar(&s->pci, 0, PCI_BASE_ADDRESS_MEM_PREFETCH, &s->vram);
 pci_register_bar(&s->pci, 2, PCI_BASE_ADDRESS_SPACE_MEMORY, &s->mmio);
 
+if (pci_bus_is_express(pci_get_bus(dev))) {
+dev->cap_present |= QEMU_PCI_CAP_EXPRESS;
+ret = pcie_endpoint_cap_init(dev, 0x80);
+assert(ret > 0);
+}
+
 memory_region_set_log(&s->vram, true, DIRTY_MEMORY_VGA);
 }
 
@@ -341,6 +348,7 @@ static const TypeInfo bochs_display_type_info = {
 .instance_init  = bochs_display_init,
 .class_init = bochs_display_class_init,
 .interfaces = (InterfaceInfo[]) {
+{ INTERFACE_PCIE_DEVICE },
 { INTERFACE_CONVENTIONAL_PCI_DEVICE },
 { },
 },
-- 
2.9.3

Re: [Qemu-devel] [PATCH 0/5] NBD reconnect: preliminary refactoring

2018-05-17 Thread Vladimir Sementsov-Ogievskiy


What about patches 1-4?

07.05.2018 18:44, Vladimir Sementsov-Ogievskiy wrote:

Hi all!

Here are some preliminary refactoring patches, before NBD reconnect
series.

Vladimir Sementsov-Ogievskiy (5):
   block/nbd-client: split channel errors from export errors
   block/nbd: move connection code from block/nbd to block/nbd-client
   block/nbd-client: split connection from initialization
   block/nbd-client: fix nbd_reply_chunk_iter_receive
   block/nbd-client: don't check ioc

  block/nbd-client.h |   2 +-
  block/nbd-client.c | 163 ++---
  block/nbd.c|  41 +-
  3 files changed, 107 insertions(+), 99 deletions(-)




--
Best regards,
Vladimir

Re: [Qemu-devel] [edk2] [PATCH 3/4] ovmf: replace SecurityPkg with OvfmPkg Tcg2PhysicalPresenceLibQemu

2018-05-17 Thread Laszlo Ersek

On 05/15/18 14:30, marcandre.lur...@redhat.com wrote:
> From: Marc-André Lureau 
> 
> Cloned "SecurityPkg/Library/DxeTcg2PhysicalPresenceLib" and:
> 
> - removed all the functions that are unreachable from
>   Tcg2PhysicalPresenceLibProcessRequest()
> 
> - replaced everything that's related to the
>   TCG2_PHYSICAL_PRESENCE*_VARIABLE variables, with direct access to
>   the QEMU structures.
> 
> This commit is based on initial experimental work from Stefan Berger.
> In particular, he wrote most of QEMU PPI support, and designed the
> qemu/firmware interaction. Initially, Stefan tried to reuse the
> existing SecurityPkg code, but we eventually decided to get rid of the
> variables and simplify the ovmf/qemu version.
> 
> Signed-off-by: Marc-André Lureau 
> ---
>  .../DxeTcg2PhysicalPresenceLib.c  | 881 ++
>  .../DxeTcg2PhysicalPresenceLib.inf|  67 ++
>  .../DxeTcg2PhysicalPresenceLib.uni|  26 +

(1) Please drop the "DxeTcg2PhysicalPresenceLib.uni" file (also the
reference in the INF file).

We generally don't do MODULE_UNI_FILEs in OvmfPkg because OvmfPkg is not
distributed with UPT (UEFI Packaging Tool).

>  .../PhysicalPresenceStrings.uni   |  49 +
>  OvmfPkg/OvmfPkgIa32.dsc   |   2 +-
>  OvmfPkg/OvmfPkgIa32X64.dsc|   2 +-
>  OvmfPkg/OvmfPkgX64.dsc|   2 +-
>  7 files changed, 1026 insertions(+), 3 deletions(-)
>  create mode 100644 OvmfPkg/Library/Tcg2PhysicalPresenceLibQemu/DxeTcg2PhysicalPresenceLib.c
>  create mode 100644 OvmfPkg/Library/Tcg2PhysicalPresenceLibQemu/DxeTcg2PhysicalPresenceLib.inf
>  create mode 100644 OvmfPkg/Library/Tcg2PhysicalPresenceLibQemu/DxeTcg2PhysicalPresenceLib.uni
>  create mode 100644 OvmfPkg/Library/Tcg2PhysicalPresenceLibQemu/PhysicalPresenceStrings.uni
> 
> diff --git a/OvmfPkg/Library/Tcg2PhysicalPresenceLibQemu/DxeTcg2PhysicalPresenceLib.c b/OvmfPkg/Library/Tcg2PhysicalPresenceLibQemu/DxeTcg2PhysicalPresenceLib.c
> new file mode 100644
> index ..da45f990369a
> --- /dev/null
> +++ b/OvmfPkg/Library/Tcg2PhysicalPresenceLibQemu/DxeTcg2PhysicalPresenceLib.c
> @@ -0,0 +1,881 @@
> +/** @file
> +  Execute pending TPM2 requests from OS or BIOS.
> +
> +  Caution: This module requires additional review when modified.
> +  This driver will have external input - variable.
> +  This external input must be validated carefully to avoid security issue.
> +
> +  Tcg2ExecutePendingTpmRequest() will receive untrusted input and do validation.
> +
> +Copyright (C) 2018, Red Hat, Inc.
> +Copyright (c) 2018, IBM Corporation. All rights reserved.
> +Copyright (c) 2013 - 2016, Intel Corporation. All rights reserved.
> +This program and the accompanying materials
> +are licensed and made available under the terms and conditions of the BSD License
> +which accompanies this distribution.  The full text of the license may be found at
> +http://opensource.org/licenses/bsd-license.php
> +
> +THE PROGRAM IS DISTRIBUTED UNDER THE BSD LICENSE ON AN "AS IS" BASIS,
> +WITHOUT WARRANTIES OR REPRESENTATIONS OF ANY KIND, EITHER EXPRESS OR IMPLIED.
> +
> +**/
> +
> +#include 
> +
> +#include 
> +#include 
> +#include 
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include 
> +
> +#define CONFIRM_BUFFER_SIZE 4096
> +
> +EFI_HII_HANDLE mTcg2PpStringPackHandle;
> +
> +#define TPM_PPI_FLAGS (QEMU_TPM_PPI_FUNC_ALLOWED_USR_REQ)
> +
> +STATIC CONST UINT8 mTpm2PPIFuncs[] = {
> +  [TCG2_PHYSICAL_PRESENCE_NO_ACTION] = TPM_PPI_FLAGS,
> +  [TCG2_PHYSICAL_PRESENCE_CLEAR] = TPM_PPI_FLAGS,
> +  [TCG2_PHYSICAL_PRESENCE_ENABLE_CLEAR] = TPM_PPI_FLAGS,
> +  [TCG2_PHYSICAL_PRESENCE_ENABLE_CLEAR_2] = TPM_PPI_FLAGS,
> +  [TCG2_PHYSICAL_PRESENCE_ENABLE_CLEAR_3] = TPM_PPI_FLAGS,
> +  [TCG2_PHYSICAL_PRESENCE_SET_PCR_BANKS] = TPM_PPI_FLAGS,
> +  [TCG2_PHYSICAL_PRESENCE_CHANGE_EPS] = TPM_PPI_FLAGS,
> +  [TCG2_PHYSICAL_PRESENCE_LOG_ALL_DIGESTS] = TPM_PPI_FLAGS,
> +  [TCG2_PHYSICAL_PRESENCE_ENABLE_BLOCK_SID] = TPM_PPI_FLAGS,
> +  [TCG2_PHYSICAL_PRESENCE_DISABLE_BLOCK_SID] = TPM_PPI_FLAGS,
> +};

(2) Unfortunately, designated initializers cannot be used in edk2.
You'll have to

- either spell out the entire initial slice of the array (and use "//
TCG2_PHYSICAL_PRESENCE_NO_ACTION" style comments to help readers),

- or else drop the "CONST", introduce a CONSTRUCTOR function to the
library instance, and perform the assignments manually in the
constructor function, such as

  mTpm2PPIFuncs[TCG2_PHYSICAL_PRESENCE_NO_ACTION] = TPM_PPI_FLAGS;
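
Both alternatives can be sketched in plain C as follows. This is only an illustration under stated assumptions: the EXAMPLE_* indices and flag value are made up (the real TCG2_PHYSICAL_PRESENCE_* constants come from the edk2 headers), <stdint.h> types stand in for UINT8, and ExampleLibConstructor() merely imitates a library CONSTRUCTOR.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical operation indices and flag value, for illustration only. */
enum {
    EXAMPLE_PP_NO_ACTION = 0,
    EXAMPLE_PP_CLEAR     = 5,
    EXAMPLE_PP_MAX       = 8
};
#define EXAMPLE_PPI_FLAGS 0x04u

// Alternative 1: spell out the initial slice, with comments as index labels.
static const uint8_t mExampleSpelledOut[] = {
    EXAMPLE_PPI_FLAGS,  // 0: EXAMPLE_PP_NO_ACTION
    0,                  // 1
    0,                  // 2
    0,                  // 3
    0,                  // 4
    EXAMPLE_PPI_FLAGS,  // 5: EXAMPLE_PP_CLEAR
};

// Alternative 2: no CONST; a constructor fills the table by index.
static uint8_t mExampleFuncs[EXAMPLE_PP_MAX];

static int ExampleLibConstructor(void)
{
    assert(sizeof(mExampleFuncs) == EXAMPLE_PP_MAX);
    mExampleFuncs[EXAMPLE_PP_NO_ACTION] = EXAMPLE_PPI_FLAGS;
    mExampleFuncs[EXAMPLE_PP_CLEAR]     = EXAMPLE_PPI_FLAGS;
    return 0;
}
```

The first form keeps the table read-only but has to list every gap explicitly; the second stays readable for sparse indices at the cost of dropping CONST.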

> +
> +STATIC QEMU_TPM_PPI *mPpi;
> +
> +
> +/**
> +  Reads QEMU PPI config from fw_cfg.
> +**/
> +EFI_STATUS
> +QemuTpmReadConfig (

(3) Please make this STATIC.

Please make *all* functions STATIC that can be.

> +  IN QEMU_FWCFG_TPM_CONFIG *Config

(4) Should be

Re: [Qemu-devel] [Qemu-arm] [PATCH 2/2] hw/arm/smmu-common: Fix coverity issue in get_block_pte_address

2018-05-17 Thread Auger Eric

Hi Philippe,
On 05/16/2018 10:01 PM, Philippe Mathieu-Daudé wrote:
> On 05/16/2018 01:23 PM, Peter Maydell wrote:
>> On 16 May 2018 at 16:16, Philippe Mathieu-Daudé  wrote:
>>> Hi Eric,
>>>
>>> On 05/16/2018 03:03 PM, Eric Auger wrote:
>>>> Coverity points out that this can overflow if n > 31,
>>>> because it's only doing 32-bit arithmetic. Let's use 1ULL instead
>>>> of 1. Also the formulae used to compute n can be replaced by
>>>> the level_shift() macro.
>>>
>>> This level_shift() replacement doesn't seem that obvious to me, can you
>>> split it in another patch?
>>>
>>>>
>>>> Reported-by: Peter Maydell
>>>> Signed-off-by: Eric Auger
>>>> ---
>>>>  hw/arm/smmu-common.c | 4 ++--
>>>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/hw/arm/smmu-common.c b/hw/arm/smmu-common.c
>>>> index 01c7be8..3c5f724 100644
>>>> --- a/hw/arm/smmu-common.c
>>>> +++ b/hw/arm/smmu-common.c
>>>> @@ -83,9 +83,9 @@ static inline hwaddr get_table_pte_address(uint64_t pte, int granule_sz)
>>>>  static inline hwaddr get_block_pte_address(uint64_t pte, int level,
>>>> int granule_sz, uint64_t *bsz)
>>>>  {
>>>> -int n = (granule_sz - 3) * (4 - level) + 3;
>>>> +int n = level_shift(level, granule_sz);
>>>
>>> Shouldn't this be level_shift(level + 1, granule_sz)?
>>
>> No. The two expressions are equivalent, they're
>> just arranged differently:
>>
>>level_shift(lvl, gsz)
>>   == gsz + (3 - lvl) * (gsz - 3)
>>   == gsz + (4 - lvl) * (gsz - 3) - (gsz - 3)
>>   == gsz - gsz + (4 - lvl) * (gsz - 3) + 3
>>   == (gsz - 3) * (4 - lvl) + 3
> 
> Argh I failed this middle school demonstrations...
> 
> Thanks Peter :)
> 
> So for the much cleaner level_shift() use:
> Reviewed-by: Philippe Mathieu-Daudé 

Thank you for the review!

Eric
>
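
The equivalence Peter derives above, and the reason for switching to 1ULL, can be checked mechanically. The following is only a sketch: level_shift() here is a hypothetical function written out from the expansion quoted in the thread, not the macro from the QEMU tree, and old_shift(), block_size() and shifts_agree() are names made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the level_shift() macro, written out from the
 * expansion quoted in the thread: gsz + (3 - lvl) * (gsz - 3).
 */
static int level_shift(int level, int granule_sz)
{
    return granule_sz + (3 - level) * (granule_sz - 3);
}

/* The open-coded expression that the patch removes. */
static int old_shift(int level, int granule_sz)
{
    return (granule_sz - 3) * (4 - level) + 3;
}

/* Block size at a level; 1ULL keeps the shift in 64-bit arithmetic. */
static uint64_t block_size(int level, int granule_sz)
{
    int n = level_shift(level, granule_sz);

    assert(n >= 0 && n < 64);
    return 1ULL << n;
}

/* Returns 1 if both forms agree for 4K/16K/64K granules and levels 0..3. */
static int shifts_agree(void)
{
    static const int granules[] = { 12, 14, 16 };
    size_t i;
    int level;

    for (i = 0; i < sizeof(granules) / sizeof(granules[0]); i++) {
        for (level = 0; level <= 3; level++) {
            if (level_shift(level, granules[i]) !=
                old_shift(level, granules[i])) {
                return 0;
            }
        }
    }
    return 1;
}
```

With a 4K granule (granule_sz = 12), level 1 gives n = 30, which still fits a 32-bit shift, while level 0 gives n = 39, which is where a plain `1 << n` would overflow.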

Re: [Qemu-devel] [PATCH v2 1/1] memory.h: Improve IOMMU related documentation

2018-05-17 Thread Auger Eric

Hi Peter,
On 05/16/2018 06:18 PM, Peter Maydell wrote:
> On 16 May 2018 at 15:33, Auger Eric  wrote:
>> On 05/08/2018 06:25 PM, Peter Maydell wrote:
>>> This runs into something I found when I was implementing the Arm
>>> Memory Protection Controller -- at the point when the guest changes
>>> the config,
>> do you mean the config structures (STE, CD) or the page table update?
> 
> The Memory Protection Controller is not the SMMUv3. The MPC
> config is set using some registers to write to the MPC's LUT
> (lookup table).
> 
>>> it doesn't have enough information to be able to call a
>>> "map" notifier, because the mapping depends on the memory transaction
>>> attributes, it's not fixed and dependent only on the address.
>>
>> I am not sure I understand what you mean here. When the notifier gets
>> called, aren't we supposed to look for the info in the actual page table
>> (where memory access control attributes and others can be found at that
>> time, i.e. ARM AP[] for instance) and send those through the notifier (as
>> stored in the IOTLBEntry)?
> 
> The problem is that if your translations depend on the tx attributes,
> ie "secure access to address X should translate to Y, but nonsecure
> access to address X should translate to Z", then the notifier
> API doesn't let you report that, because all it knows about is
> unmap events which are "address X unmapped" and map events which
> are "address X translates to Y".
> 
> Paolo has suggested some API changes in another thread (roughly,
> having the notifier say which tx attributes matter for the translation,
> and send multiple map events with appropriate information).
> 
>> For instance in the intel iommu code, the whole table is parsed and the
>> replay hook is called for all valid entries.
> 
> This works because the intel iommu does not care about memory
> transaction attributes: it has one translation for the input
> address, regardless.

OK I get it now. Thank you for the explanations.

Eric
> 
> thanks
> -- PMM
>

Re: [Qemu-devel] [RFC 00/10] [TESTING NEEDED] python: futurize --stage1 (Python 3 compatibility)

2018-05-17 Thread Fam Zheng

On Fri, 05/11 19:20, Eduardo Habkost wrote:
> TESTING NEEDED: Due to the amount of changes, I didn't test all
> scripts touched by this series.  If you are responsible for any
> of the touched files, I would appreciate help on testing the
> series.

The tests/docker and tests/vm changes look good to me:

Acked-by: Fam Zheng

[Qemu-devel] [PATCH v4 02/14] memory-device: introduce separate config option

2018-05-17 Thread David Hildenbrand

Some architectures might support memory devices, while they don't
support DIMM/NVDIMM. So let's
- Rename CONFIG_MEM_HOTPLUG to CONFIG_MEM_DEVICE
- Introduce CONFIG_DIMM and use it similarly to CONFIG_NVDIMM

CONFIG_DIMM and CONFIG_NVDIMM require CONFIG_MEM_DEVICE.

Signed-off-by: David Hildenbrand 
---
 default-configs/i386-softmmu.mak   | 3 ++-
 default-configs/ppc64-softmmu.mak  | 3 ++-
 default-configs/x86_64-softmmu.mak | 3 ++-
 hw/Makefile.objs   | 2 +-
 hw/mem/Makefile.objs   | 4 ++--
 qapi/misc.json | 2 +-
 6 files changed, 10 insertions(+), 7 deletions(-)

diff --git a/default-configs/i386-softmmu.mak b/default-configs/i386-softmmu.mak
index 8c7d4a0fa0..4c1637338b 100644
--- a/default-configs/i386-softmmu.mak
+++ b/default-configs/i386-softmmu.mak
@@ -50,7 +50,8 @@ CONFIG_PCI_Q35=y
 CONFIG_APIC=y
 CONFIG_IOAPIC=y
 CONFIG_PVPANIC=y
-CONFIG_MEM_HOTPLUG=y
+CONFIG_MEM_DEVICE=y
+CONFIG_DIMM=y
 CONFIG_NVDIMM=y
 CONFIG_ACPI_NVDIMM=y
 CONFIG_PCIE_PORT=y
diff --git a/default-configs/ppc64-softmmu.mak b/default-configs/ppc64-softmmu.mak
index b94af6c7c6..f550573782 100644
--- a/default-configs/ppc64-softmmu.mak
+++ b/default-configs/ppc64-softmmu.mak
@@ -16,4 +16,5 @@ CONFIG_VIRTIO_VGA=y
 CONFIG_XICS=$(CONFIG_PSERIES)
 CONFIG_XICS_SPAPR=$(CONFIG_PSERIES)
 CONFIG_XICS_KVM=$(call land,$(CONFIG_PSERIES),$(CONFIG_KVM))
-CONFIG_MEM_HOTPLUG=y
+CONFIG_MEM_DEVICE=y
+CONFIG_DIMM=y
diff --git a/default-configs/x86_64-softmmu.mak b/default-configs/x86_64-softmmu.mak
index 0390b4303c..7785351414 100644
--- a/default-configs/x86_64-softmmu.mak
+++ b/default-configs/x86_64-softmmu.mak
@@ -50,7 +50,8 @@ CONFIG_PCI_Q35=y
 CONFIG_APIC=y
 CONFIG_IOAPIC=y
 CONFIG_PVPANIC=y
-CONFIG_MEM_HOTPLUG=y
+CONFIG_MEM_DEVICE=y
+CONFIG_DIMM=y
 CONFIG_NVDIMM=y
 CONFIG_ACPI_NVDIMM=y
 CONFIG_PCIE_PORT=y
diff --git a/hw/Makefile.objs b/hw/Makefile.objs
index 6a0ffe0afd..127a60eca4 100644
--- a/hw/Makefile.objs
+++ b/hw/Makefile.objs
@@ -33,7 +33,7 @@ devices-dirs-$(CONFIG_SOFTMMU) += vfio/
 devices-dirs-$(CONFIG_SOFTMMU) += virtio/
 devices-dirs-$(CONFIG_SOFTMMU) += watchdog/
 devices-dirs-$(CONFIG_SOFTMMU) += xen/
-devices-dirs-$(CONFIG_MEM_HOTPLUG) += mem/
+devices-dirs-$(CONFIG_MEM_DEVICE) += mem/
 devices-dirs-$(CONFIG_SOFTMMU) += smbios/
 devices-dirs-y += core/
 common-obj-y += $(devices-dirs-y)
diff --git a/hw/mem/Makefile.objs b/hw/mem/Makefile.objs
index 10be4df2a2..3e2f7c5ca2 100644
--- a/hw/mem/Makefile.objs
+++ b/hw/mem/Makefile.objs
@@ -1,3 +1,3 @@
-common-obj-$(CONFIG_MEM_HOTPLUG) += pc-dimm.o
-common-obj-$(CONFIG_MEM_HOTPLUG) += memory-device.o
+common-obj-$(CONFIG_DIMM) += pc-dimm.o
+common-obj-$(CONFIG_MEM_DEVICE) += memory-device.o
 common-obj-$(CONFIG_NVDIMM) += nvdimm.o
diff --git a/qapi/misc.json b/qapi/misc.json
index f5988cc0b5..4e6265cd2e 100644
--- a/qapi/misc.json
+++ b/qapi/misc.json
@@ -2060,7 +2060,7 @@
 #
 # @plugged-memory: size of memory that can be hot-unplugged. This field
 #  is omitted if target doesn't support memory hotplug
-#  (i.e. CONFIG_MEM_HOTPLUG not defined on build time).
+#  (i.e. CONFIG_MEM_DEVICE not defined at build time).
 #
 # Since: 2.11.0
 ##
-- 
2.14.3

[Qemu-devel] [PATCH v4 12/14] memory-device: factor out pre-plug into hotplug handler

2018-05-17 Thread David Hildenbrand

Let's move all pre-plug checks we can do without the device being
realized into the applicable hotplug handler for pc and spapr.

Signed-off-by: David Hildenbrand 
---
 hw/i386/pc.c   | 11 +++
 hw/mem/memory-device.c | 72 +++---
 hw/ppc/spapr.c | 11 +++
 include/hw/mem/memory-device.h |  2 ++
 4 files changed, 57 insertions(+), 39 deletions(-)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 8bc41ef24b..61f1537e14 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -2010,6 +2010,16 @@ static void pc_machine_device_pre_plug_cb(HotplugHandler 
*hotplug_dev,
 {
 Error *local_err = NULL;
 
+/* first stage hotplug handler */
+if (object_dynamic_cast(OBJECT(dev), TYPE_MEMORY_DEVICE)) {
+memory_device_pre_plug(MACHINE(hotplug_dev), MEMORY_DEVICE(dev),
+   &local_err);
+}
+
+if (local_err) {
+goto out;
+}
+
 /* final stage hotplug handler */
 if (object_dynamic_cast(OBJECT(dev), TYPE_CPU)) {
 pc_cpu_pre_plug(hotplug_dev, dev, &local_err);
@@ -2017,6 +2027,7 @@ static void pc_machine_device_pre_plug_cb(HotplugHandler 
*hotplug_dev,
 hotplug_handler_pre_plug(dev->parent_bus->hotplug_handler, dev,
  &local_err);
 }
+out:
 error_propagate(errp, local_err);
 }
 
diff --git a/hw/mem/memory-device.c b/hw/mem/memory-device.c
index 361d38bfc5..d22c91993f 100644
--- a/hw/mem/memory-device.c
+++ b/hw/mem/memory-device.c
@@ -68,58 +68,26 @@ static int memory_device_used_region_size(Object *obj, void 
*opaque)
 return 0;
 }
 
-static void memory_device_check_addable(MachineState *ms, uint64_t size,
-Error **errp)
-{
-uint64_t used_region_size = 0;
-
-/* we will need a new memory slot for kvm and vhost */
-if (kvm_enabled() && !kvm_has_free_slot(ms)) {
-error_setg(errp, "hypervisor has no free memory slots left");
-return;
-}
-if (!vhost_has_free_slot()) {
-error_setg(errp, "a used vhost backend has no free memory slots left");
-return;
-}
-
-/* will we exceed the total amount of memory specified */
-memory_device_used_region_size(OBJECT(ms), &used_region_size);
-if (used_region_size + size > ms->maxram_size - ms->ram_size) {
-error_setg(errp, "not enough space, currently 0x%" PRIx64
-   " in use of total hot pluggable 0x" RAM_ADDR_FMT,
-   used_region_size, ms->maxram_size - ms->ram_size);
-return;
-}
-
-}
-
 uint64_t memory_device_get_free_addr(MachineState *ms, const uint64_t *hint,
  uint64_t align, uint64_t size,
  Error **errp)
 {
 uint64_t address_space_start, address_space_end;
+uint64_t used_region_size = 0;
 GSList *list = NULL, *item;
 uint64_t new_addr = 0;
 
-if (!ms->device_memory) {
-error_setg(errp, "memory devices (e.g. for memory hotplug) are not "
- "supported by the machine");
-return 0;
-}
-
-if (!memory_region_size(&ms->device_memory->mr)) {
-error_setg(errp, "memory devices (e.g. for memory hotplug) are not "
- "enabled, please specify the maxmem option");
-return 0;
-}
 address_space_start = ms->device_memory->base;
 address_space_end = address_space_start +
 memory_region_size(&ms->device_memory->mr);
 g_assert(address_space_end >= address_space_start);
 
-memory_device_check_addable(ms, size, errp);
-if (*errp) {
+/* will we exceed the total amount of memory specified */
+memory_device_used_region_size(OBJECT(ms), &used_region_size);
+if (used_region_size + size > ms->maxram_size - ms->ram_size) {
+error_setg(errp, "not enough space, currently 0x%" PRIx64
+   " in use of total hot pluggable 0x" RAM_ADDR_FMT,
+   used_region_size, ms->maxram_size - ms->ram_size);
 return 0;
 }
 
@@ -242,6 +210,32 @@ uint64_t get_plugged_memory_size(void)
 return size;
 }
 
+void memory_device_pre_plug(MachineState *ms, const MemoryDeviceState *md,
+Error **errp)
+{
+if (!ms->device_memory) {
+error_setg(errp, "memory devices (e.g. for memory hotplug) are not "
+ "supported by the machine");
+return;
+}
+
+if (!memory_region_size(&ms->device_memory->mr)) {
+error_setg(errp, "memory devices (e.g. for memory hotplug) are not "
+ "enabled, please specify the maxmem option");
+return;
+}
+
+/* we will need a new memory slot for kvm and vhost */
+if (kvm_enabled() && !kvm_has_free_slot(ms)) {
+error_setg(errp, "hypervisor has no free memory slots left");
+return;
+}
+if (!vhost_has_free_slot()) {
+error_setg(errp, "a used vhost

Re: [Qemu-devel] [PATCH v6 1/2] qmp: adding 'wakeup-suspend-support' in query-target

2018-05-17 Thread Markus Armbruster

Daniel Henrique Barboza  writes:

> On 05/15/2018 12:45 PM, Markus Armbruster wrote:
>> Daniel Henrique Barboza  writes:
>>
>>> When issuing the qmp/hmp 'system_wakeup' command, what happens in a
>>> nutshell is:
>>>
>>> - qmp_system_wakeup_request set runstate to RUNNING, sets a wakeup_reason
>>> and notify the event
>>> - in the main_loop, all vcpus are paused, a system reset is issued, all
>>> subscribers of wakeup_notifiers receives a notification, vcpus are then
>>> resumed and the wake up QAPI event is fired
>>>
>>> Note that this procedure alone doesn't ensure that the guest will awake
>>> from SUSPENDED state - the subscribers of the wake up event must take
>>> action to resume the guest, otherwise the guest will simply reboot.
>>>
>>> At this moment there are only two subscribers of the wake up event: one
>>> in hw/acpi/core.c and another one in hw/i386/xen/xen-hvm.c. This means
>>> that system_wakeup does not work as intended with other architectures.
>>>
>>> However, only the presence of 'system_wakeup' is required for QGA to
>>> support 'guest-suspend-ram' and 'guest-suspend-hybrid' at this moment.
>>> This means that the user/management will expect to suspend the guest using
>>> one of those suspend commands and then resume execution using system_wakeup,
>>> regardless of the support offered in system_wakeup in the first place.
>>>
>>> This patch adds a new flag called 'wakeup-suspend-support' in TargetInfo
>>> that allows the caller to query if the guest supports wake up from
>>> suspend via system_wakeup. It goes over the subscribers of the wake up
>>> event and, if it's empty, it assumes that the guest does not support
>>> wake up from suspend (and thus, pm-suspend itself).
>>>
>>> This is the expected output of query-target when running a x86 guest:
>>>
>>> {"execute" : "query-target"}
>>> {"return": {"arch": "x86_64", "wakeup-suspend-support": true}}
>>>
>>> This is the output when running a pseries guest:
>>>
>>> {"execute" : "query-target"}
>>> {"return": {"arch": "ppc64", "wakeup-suspend-support": false}}
>>>
>>> Given that the TargetInfo structure is read-only, adding a new flag to
>>> it is backwards compatible. There is no need to deprecate the old
>>> TargetInfo format.
>>>
>>> With this extra tool, management can avoid situations where a guest
>>> that does not have proper suspend/wake capabilities ends up in
>>> inconsistent state (e.g.
>>> https://github.com/open-power-host-os/qemu/issues/31).
>>>
>>> Reported-by: Balamuruhan S 
>>> Signed-off-by: Daniel Henrique Barboza 
>>> ---
>>>   arch_init.c |  1 +
>>>   include/sysemu/sysemu.h |  1 +
>>>   qapi/misc.json  |  4 +++-
>>>   vl.c| 21 +
>>>   4 files changed, 26 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/arch_init.c b/arch_init.c
>>> index 9597218ced..67bdf27528 100644
>>> --- a/arch_init.c
>>> +++ b/arch_init.c
>>> @@ -115,6 +115,7 @@ TargetInfo *qmp_query_target(Error **errp)
>>> info->arch = qapi_enum_parse(&SysEmuTarget_lookup,
>>> TARGET_NAME, -1,
>>>&error_abort);
>>> +info->wakeup_suspend_support = !qemu_wakeup_notifier_is_empty();
>> Huh?  Hmm, see "hack" below.
>>
>>> return info;
>>>   }
>>> diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
>>> index 544ab77a2b..fbe2a3373e 100644
>>> --- a/include/sysemu/sysemu.h
>>> +++ b/include/sysemu/sysemu.h
>>> @@ -69,6 +69,7 @@ typedef enum WakeupReason {
>>>   void qemu_system_reset_request(ShutdownCause reason);
>>>   void qemu_system_suspend_request(void);
>>>   void qemu_register_suspend_notifier(Notifier *notifier);
>>> +bool qemu_wakeup_notifier_is_empty(void);
>>>   void qemu_system_wakeup_request(WakeupReason reason);
>>>   void qemu_system_wakeup_enable(WakeupReason reason, bool enabled);
>>>   void qemu_register_wakeup_notifier(Notifier *notifier);
>>> diff --git a/qapi/misc.json b/qapi/misc.json
>>> index f5988cc0b5..a385d897ae 100644
>>> --- a/qapi/misc.json
>>> +++ b/qapi/misc.json
>>> @@ -2484,11 +2484,13 @@
>>>   # Information describing the QEMU target.
>>>   #
>>>   # @arch: the target architecture
>>> +# @wakeup-suspend-support: true if the target supports wake up from
>>> +#  suspend (since 2.13)
>>>   #
>>>   # Since: 1.2.0
>>>   ##
>>>   { 'struct': 'TargetInfo',
>>> -  'data': { 'arch': 'SysEmuTarget' } }
>>> +  'data': { 'arch': 'SysEmuTarget', 'wakeup-suspend-support': 'bool' } }
>>> ##
>>>   # @query-target:
>> Does the documentation of system_wakeup need fixing?
>>
>> ##
>> # @system_wakeup:
>> #
>> # Wakeup guest from suspend.  Does nothing in case the guest isn't suspended.
>> #
>> # Since:  1.1
>> #
>> # Returns:  nothing.
>> #
>> # Example:
>> #
>> # -> { "execute": "system_wakeup" }
>> # <- { "return": {} }
>> #
>> ##
>> { 'command':

[Qemu-devel] [PATCH v3 11/12] intel-iommu: new vtd_sync_shadow_page_table_range

2018-05-17 Thread Peter Xu

I pick part of the page walk implementation out into this single
function.  It is only used for PSIs for now, but it can be used not
only for invalidations but anywhere we want to synchronize the shadow
page table.  No functional change at all.

Here I pass in the context entry explicitly to avoid fetching it again.
However, I enhanced the helper so that it can also fetch it on its own,
since in follow-up patches we might call it without a context entry.

Signed-off-by: Peter Xu 
---
 hw/i386/intel_iommu.c | 65 ---
 1 file changed, 48 insertions(+), 17 deletions(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 272e49ff66..a1a2a009c1 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -1018,6 +1018,53 @@ static int vtd_dev_to_context_entry(IntelIOMMUState *s, 
uint8_t bus_num,
 return 0;
 }
 
+static int vtd_sync_shadow_page_hook(IOMMUTLBEntry *entry,
+ void *private)
+{
+memory_region_notify_iommu((IOMMUMemoryRegion *)private, *entry);
+return 0;
+}
+
+/* If context entry is NULL, we'll try to fetch it on our own. */
+static int vtd_sync_shadow_page_table_range(VTDAddressSpace *vtd_as,
+VTDContextEntry *ce,
+hwaddr addr, hwaddr size)
+{
+IntelIOMMUState *s = vtd_as->iommu_state;
+vtd_page_walk_info info = {
+.hook_fn = vtd_sync_shadow_page_hook,
+.private = (void *)&vtd_as->iommu,
+.notify_unmap = true,
+.aw = s->aw_bits,
+.as = vtd_as,
+};
+VTDContextEntry ce_cache;
+int ret;
+
+if (ce) {
+/* If the caller provided context entry, use it */
+ce_cache = *ce;
+} else {
+/* If the caller didn't provide ce, try to fetch */
+ret = vtd_dev_to_context_entry(s, pci_bus_num(vtd_as->bus),
+   vtd_as->devfn, &ce_cache);
+if (ret) {
+/*
+ * This should not really happen, but in case it happens,
+ * we just skip the sync for this time.  After all we even
+ * don't have the root table pointer!
+ */
+trace_vtd_err("Detected invalid context entry when "
+  "trying to sync shadow page table");
+return 0;
+}
+}
+
+info.domain_id = VTD_CONTEXT_ENTRY_DID(ce_cache.hi);
+
+return vtd_page_walk(&ce_cache, addr, addr + size, &info);
+}
+
 /*
  * Fetch translation type for specific device. Returns <0 if error
  * happens, otherwise return the shifted type to check against
@@ -1493,13 +1540,6 @@ static void vtd_iotlb_domain_invalidate(IntelIOMMUState 
*s, uint16_t domain_id)
 }
 }
 
-static int vtd_page_invalidate_notify_hook(IOMMUTLBEntry *entry,
-   void *private)
-{
-memory_region_notify_iommu((IOMMUMemoryRegion *)private, *entry);
-return 0;
-}
-
 static void vtd_iotlb_page_invalidate_notify(IntelIOMMUState *s,
uint16_t domain_id, hwaddr addr,
uint8_t am)
@@ -1514,20 +1554,11 @@ static void 
vtd_iotlb_page_invalidate_notify(IntelIOMMUState *s,
 vtd_as->devfn, &ce);
 if (!ret && domain_id == VTD_CONTEXT_ENTRY_DID(ce.hi)) {
 if (vtd_as_notify_mappings(vtd_as)) {
-vtd_page_walk_info info = {
-.hook_fn = vtd_page_invalidate_notify_hook,
-.private = (void *)&vtd_as->iommu,
-.notify_unmap = true,
-.aw = s->aw_bits,
-.as = vtd_as,
-.domain_id = domain_id,
-};
-
 /*
  * For MAP-inclusive notifiers, we need to walk the
  * page table to sync the shadow page table.
  */
-vtd_page_walk(&ce, addr, addr + size, &info);
+vtd_sync_shadow_page_table_range(vtd_as, &ce, addr, size);
 } else {
 /*
  * For UNMAP-only notifiers, we don't need to walk the
-- 
2.17.0

[Qemu-devel] [PATCH 1/4] vga: move bochs vbe defines to header file

2018-05-17 Thread Gerd Hoffmann

Create a new header file, move the bochs vbe dispi interface
defines to it, so they can be used outside vga code.

Signed-off-by: Gerd Hoffmann 
---
 hw/display/vga_int.h   | 35 ++-
 include/hw/display/bochs-vbe.h | 64 ++
 hw/display/vga-pci.c   | 13 -
 3 files changed, 66 insertions(+), 46 deletions(-)
 create mode 100644 include/hw/display/bochs-vbe.h

diff --git a/hw/display/vga_int.h b/hw/display/vga_int.h
index fe23b81442..313cff84fc 100644
--- a/hw/display/vga_int.h
+++ b/hw/display/vga_int.h
@@ -29,42 +29,11 @@
 #include "exec/memory.h"
 #include "ui/console.h"
 
+#include "hw/display/bochs-vbe.h"
+
 #define ST01_V_RETRACE      0x08
 #define ST01_DISP_ENABLE    0x01
 
-#define VBE_DISPI_MAX_XRES              16000
-#define VBE_DISPI_MAX_YRES              12000
-#define VBE_DISPI_MAX_BPP               32
-
-#define VBE_DISPI_INDEX_ID              0x0
-#define VBE_DISPI_INDEX_XRES            0x1
-#define VBE_DISPI_INDEX_YRES            0x2
-#define VBE_DISPI_INDEX_BPP             0x3
-#define VBE_DISPI_INDEX_ENABLE          0x4
-#define VBE_DISPI_INDEX_BANK            0x5
-#define VBE_DISPI_INDEX_VIRT_WIDTH      0x6
-#define VBE_DISPI_INDEX_VIRT_HEIGHT     0x7
-#define VBE_DISPI_INDEX_X_OFFSET        0x8
-#define VBE_DISPI_INDEX_Y_OFFSET        0x9
-#define VBE_DISPI_INDEX_NB              0xa /* size of vbe_regs[] */
-#define VBE_DISPI_INDEX_VIDEO_MEMORY_64K 0xa /* read-only, not in vbe_regs */
-
-#define VBE_DISPI_ID0                   0xB0C0
-#define VBE_DISPI_ID1                   0xB0C1
-#define VBE_DISPI_ID2                   0xB0C2
-#define VBE_DISPI_ID3                   0xB0C3
-#define VBE_DISPI_ID4                   0xB0C4
-#define VBE_DISPI_ID5                   0xB0C5
-
-#define VBE_DISPI_DISABLED              0x00
-#define VBE_DISPI_ENABLED               0x01
-#define VBE_DISPI_GETCAPS               0x02
-#define VBE_DISPI_8BIT_DAC              0x20
-#define VBE_DISPI_LFB_ENABLED           0x40
-#define VBE_DISPI_NOCLEARMEM            0x80
-
-#define VBE_DISPI_LFB_PHYSICAL_ADDRESS  0xE000
-
 #define CH_ATTR_SIZE (160 * 100)
 #define VGA_MAX_HEIGHT 2048
 
diff --git a/include/hw/display/bochs-vbe.h b/include/hw/display/bochs-vbe.h
new file mode 100644
index 00..6f27ed4a91
--- /dev/null
+++ b/include/hw/display/bochs-vbe.h
@@ -0,0 +1,64 @@
+/*
+ * bochs vesa bios extension interface
+ */
+
+#define VBE_DISPI_MAX_XRES              16000
+#define VBE_DISPI_MAX_YRES              12000
+#define VBE_DISPI_MAX_BPP               32
+
+#define VBE_DISPI_INDEX_ID              0x0
+#define VBE_DISPI_INDEX_XRES            0x1
+#define VBE_DISPI_INDEX_YRES            0x2
+#define VBE_DISPI_INDEX_BPP             0x3
+#define VBE_DISPI_INDEX_ENABLE          0x4
+#define VBE_DISPI_INDEX_BANK            0x5
+#define VBE_DISPI_INDEX_VIRT_WIDTH      0x6
+#define VBE_DISPI_INDEX_VIRT_HEIGHT     0x7
+#define VBE_DISPI_INDEX_X_OFFSET        0x8
+#define VBE_DISPI_INDEX_Y_OFFSET        0x9
+#define VBE_DISPI_INDEX_NB              0xa /* size of vbe_regs[] */
+#define VBE_DISPI_INDEX_VIDEO_MEMORY_64K 0xa /* read-only, not in vbe_regs */
+
+/* VBE_DISPI_INDEX_ID */
+#define VBE_DISPI_ID0                   0xB0C0
+#define VBE_DISPI_ID1                   0xB0C1
+#define VBE_DISPI_ID2                   0xB0C2
+#define VBE_DISPI_ID3                   0xB0C3
+#define VBE_DISPI_ID4                   0xB0C4
+#define VBE_DISPI_ID5                   0xB0C5
+
+/* VBE_DISPI_INDEX_ENABLE */
+#define VBE_DISPI_DISABLED              0x00
+#define VBE_DISPI_ENABLED               0x01
+#define VBE_DISPI_GETCAPS               0x02
+#define VBE_DISPI_8BIT_DAC              0x20
+#define VBE_DISPI_LFB_ENABLED           0x40
+#define VBE_DISPI_NOCLEARMEM            0x80
+
+/* only used by isa-vga, pci vga devices use a memory bar */
+#define VBE_DISPI_LFB_PHYSICAL_ADDRESS  0xE000
+
+
+/*
+ * qemu extension: mmio bar (region 2)
+ */
+
+#define PCI_VGA_MMIO_SIZE     0x1000
+
+/* vga register region */
+#define PCI_VGA_IOPORT_OFFSET 0x400
+#define PCI_VGA_IOPORT_SIZE   (0x3e0 - 0x3c0)
+
+/* bochs vbe register region */
+#define PCI_VGA_BOCHS_OFFSET  0x500
+#define PCI_VGA_BOCHS_SIZE    (0x0b * 2)
+
+/* qemu extension register region */
+#define PCI_VGA_QEXT_OFFSET   0x600
+#define PCI_VGA_QEXT_SIZE     (2 * 4)
+
+/* qemu extension registers */
+#define PCI_VGA_QEXT_REG_SIZE         (0 * 4)
+#define PCI_VGA_QEXT_REG_BYTEORDER    (1 * 4)
+#define  PCI_VGA_QEXT_LITTLE_ENDIAN   0x1e1e1e1e
+#define  PCI_VGA_QEXT_BIG_ENDIAN      0xbebebebe
diff --git a/hw/display/vga-pci.c b/hw/display/vga-pci.c
index f312930664..fb3e4cd400 100644
--- a/hw/display/vga-pci.c
+++ b/hw/display/vga-pci.c
@@ -31,19 +31,6 @@
 #include "qemu/timer.h"
 #include "hw/loader.h"
 
-#define PCI_VGA_IOPORT_OFFSET 0x400
-#define PCI_VGA_IOPORT_SIZE   (0x3e0 - 0x3c0)
-#define PCI_VGA_BOCHS_OFFSET  0x500
-#define PCI_VGA_BOCHS_SIZE    (0x0b * 2)
-#define

Re: [Qemu-devel] [PATCH v2 02/10] intel-iommu: remove IntelIOMMUNotifierNode

2018-05-17 Thread Auger Eric

Hi Peter,

On 05/17/2018 12:02 PM, Peter Xu wrote:
> On Thu, May 17, 2018 at 11:46:22AM +0200, Auger Eric wrote:
>> Hi Peter,
>>
>> On 05/04/2018 05:08 AM, Peter Xu wrote:
>>> That is not really necessary.  Remove that node struct and put the
>>> list entry directly into VTDAddressSpace.  It simplifies the code a lot.
>>>
>>> Signed-off-by: Peter Xu 
>>> ---
>>>  include/hw/i386/intel_iommu.h |  9 ++--
>>>  hw/i386/intel_iommu.c | 41 ++-
>>>  2 files changed, 14 insertions(+), 36 deletions(-)
>>>
>>> diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
>>> index 45ec8919b6..220697253f 100644
>>> --- a/include/hw/i386/intel_iommu.h
>>> +++ b/include/hw/i386/intel_iommu.h
>>> @@ -67,7 +67,6 @@ typedef union VTD_IR_TableEntry VTD_IR_TableEntry;
>>>  typedef union VTD_IR_MSIAddress VTD_IR_MSIAddress;
>>>  typedef struct VTDIrq VTDIrq;
>>>  typedef struct VTD_MSIMessage VTD_MSIMessage;
>>> -typedef struct IntelIOMMUNotifierNode IntelIOMMUNotifierNode;
>>>  
>>>  /* Context-Entry */
>>>  struct VTDContextEntry {
>>> @@ -93,6 +92,7 @@ struct VTDAddressSpace {
>>>  MemoryRegion iommu_ir;  /* Interrupt region: 0xfeeX */
>>>  IntelIOMMUState *iommu_state;
>>>  VTDContextCacheEntry context_cache_entry;
>>> +QLIST_ENTRY(VTDAddressSpace) next;
>>>  };
>>>  
>>>  struct VTDBus {
>>> @@ -253,11 +253,6 @@ struct VTD_MSIMessage {
>>>  /* When IR is enabled, all MSI/MSI-X data bits should be zero */
>>>  #define VTD_IR_MSI_DATA  (0)
>>>  
>>> -struct IntelIOMMUNotifierNode {
>>> -VTDAddressSpace *vtd_as;
>>> -QLIST_ENTRY(IntelIOMMUNotifierNode) next;
>>> -};
>>> -
>>>  /* The iommu (DMAR) device state struct */
>>>  struct IntelIOMMUState {
>>>  X86IOMMUState x86_iommu;
>>> @@ -295,7 +290,7 @@ struct IntelIOMMUState {
>>>  GHashTable *vtd_as_by_busptr;   /* VTDBus objects indexed by PCIBus* 
>>> reference */
>>>  VTDBus *vtd_as_by_bus_num[VTD_PCI_BUS_MAX]; /* VTDBus objects indexed 
>>> by bus number */
>>>  /* list of registered notifiers */
>>> -QLIST_HEAD(, IntelIOMMUNotifierNode) notifiers_list;
>>> +QLIST_HEAD(, VTDAddressSpace) notifiers_list;
>> Wouldn't it make sense to rename notifiers_list into something more
>> understandable like address_spaces?
> 
> But address_spaces might be a bit confusing too on the other side as
> "a list of all VT-d address spaces".  How about something like:
> 
>  address_spaces_with_notifiers
Hum, I missed that not all of them belong to that list. A bit long now?
vtd_as_with_notifiers?

Thanks

Eric
> 
> ?
> 
> [...]
> 
>>> -/* update notifier node with new flags */
>>> -QLIST_FOREACH_SAFE(node, &s->notifiers_list, next, next_node) {
>>> -if (node->vtd_as == vtd_as) {
>>> -if (new == IOMMU_NOTIFIER_NONE) {
>>> -QLIST_REMOVE(node, next);
>>> -g_free(node);
>>> -}
>>> -return;
>>> -}
>>> +/* Insert new ones */
>> s/ones/one
>>
>>> +QLIST_INSERT_HEAD(&s->notifiers_list, vtd_as, next);
>>> +} else if (new == IOMMU_NOTIFIER_NONE) {
>>> +/* Remove old ones */
>> same. Not sure the comments are worth.
> 
> Will remove two "s"s there.  Thanks,
>

Re: [Qemu-devel] [PATCH v2 02/10] intel-iommu: remove IntelIOMMUNotifierNode

2018-05-17 Thread Peter Xu

On Thu, May 17, 2018 at 12:10:41PM +0200, Auger Eric wrote:
> Hi Peter,
> 
> On 05/17/2018 12:02 PM, Peter Xu wrote:
> > On Thu, May 17, 2018 at 11:46:22AM +0200, Auger Eric wrote:
> >> Hi Peter,
> >>
> >> On 05/04/2018 05:08 AM, Peter Xu wrote:
> >>> That is not really necessary.  Remove that node struct and put the
> >>> list entry directly into VTDAddressSpace.  It simplifies the code a lot.
> >>>
> >>> Signed-off-by: Peter Xu 
> >>> ---
> >>>  include/hw/i386/intel_iommu.h |  9 ++--
> >>>  hw/i386/intel_iommu.c | 41 ++-
> >>>  2 files changed, 14 insertions(+), 36 deletions(-)
> >>>
> >>> diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
> >>> index 45ec8919b6..220697253f 100644
> >>> --- a/include/hw/i386/intel_iommu.h
> >>> +++ b/include/hw/i386/intel_iommu.h
> >>> @@ -67,7 +67,6 @@ typedef union VTD_IR_TableEntry VTD_IR_TableEntry;
> >>>  typedef union VTD_IR_MSIAddress VTD_IR_MSIAddress;
> >>>  typedef struct VTDIrq VTDIrq;
> >>>  typedef struct VTD_MSIMessage VTD_MSIMessage;
> >>> -typedef struct IntelIOMMUNotifierNode IntelIOMMUNotifierNode;
> >>>  
> >>>  /* Context-Entry */
> >>>  struct VTDContextEntry {
> >>> @@ -93,6 +92,7 @@ struct VTDAddressSpace {
> >>>  MemoryRegion iommu_ir;  /* Interrupt region: 0xfeeX */
> >>>  IntelIOMMUState *iommu_state;
> >>>  VTDContextCacheEntry context_cache_entry;
> >>> +QLIST_ENTRY(VTDAddressSpace) next;
> >>>  };
> >>>  
> >>>  struct VTDBus {
> >>> @@ -253,11 +253,6 @@ struct VTD_MSIMessage {
> >>>  /* When IR is enabled, all MSI/MSI-X data bits should be zero */
> >>>  #define VTD_IR_MSI_DATA  (0)
> >>>  
> >>> -struct IntelIOMMUNotifierNode {
> >>> -VTDAddressSpace *vtd_as;
> >>> -QLIST_ENTRY(IntelIOMMUNotifierNode) next;
> >>> -};
> >>> -
> >>>  /* The iommu (DMAR) device state struct */
> >>>  struct IntelIOMMUState {
> >>>  X86IOMMUState x86_iommu;
> >>> @@ -295,7 +290,7 @@ struct IntelIOMMUState {
> >>>  GHashTable *vtd_as_by_busptr;   /* VTDBus objects indexed by PCIBus* 
> >>> reference */
> >>>  VTDBus *vtd_as_by_bus_num[VTD_PCI_BUS_MAX]; /* VTDBus objects 
> >>> indexed by bus number */
> >>>  /* list of registered notifiers */
> >>> -QLIST_HEAD(, IntelIOMMUNotifierNode) notifiers_list;
> >>> +QLIST_HEAD(, VTDAddressSpace) notifiers_list;
> >> Wouldn't it make sense to rename notifiers_list into something more
> >> understandable like address_spaces?
> > 
> > But address_spaces might be a bit confusing too on the other side as
> > "a list of all VT-d address spaces".  How about something like:
> > 
> >  address_spaces_with_notifiers
> Hum, I missed that not all of them belong to that list. A bit long now?
> vtd_as_with_notifiers?

Okay, I can use that.  Regarding the other "s" issues - I think
I'll just drop those comments since they aren't really helpful after
all.  Thanks,

-- 
Peter Xu

Re: [Qemu-devel] [PATCH v6 2/2] iothread: let aio_epoll_disable fit to aio_context_destroy

2018-05-17 Thread WangJie (Pluto)

I benefited greatly from your suggestions, and I will do better next time. :)
This time, I would ask the maintainers to touch up the commit message based on
version 5 and merge it. Thanks very much.

On 2018/5/17 14:22, Peter Xu wrote:
> On Thu, May 17, 2018 at 10:26:17AM +0800, Jie Wang wrote:
>> epoll_available will only be set if epollfd != -1, so we
>> can swap the two variables in aio_epoll_disable, and
>> aio_context_destroy can call aio_epoll_disable directly.
>>
>> Signed-off-by: Jie Wang 
>> ---
>>  util/aio-posix.c | 10 --
>>  1 file changed, 4 insertions(+), 6 deletions(-)
>>
>> diff --git a/util/aio-posix.c b/util/aio-posix.c
>> index 0ade2c7..118bf57 100644
>> --- a/util/aio-posix.c
>> +++ b/util/aio-posix.c
>> @@ -45,11 +45,11 @@ struct AioHandler
>>  
>>  static void aio_epoll_disable(AioContext *ctx)
>>  {
>> -ctx->epoll_available = false;
>> -if (!ctx->epoll_enabled) {
>> +ctx->epoll_enabled = false;
>> +if (!ctx->epoll_available) {
>>  return;
>>  }
>> -ctx->epoll_enabled = false;
>> +ctx->epoll_available = false;
>>  close(ctx->epollfd);
>>  }
>>  
>> @@ -716,9 +716,7 @@ void aio_context_setup(AioContext *ctx)
>>  void aio_context_destroy(AioContext *ctx)
>>  {
>>  #ifdef CONFIG_EPOLL_CREATE1
>> -if (ctx->epollfd >= 0) {
>> -close(ctx->epollfd);
>> -}
>> +aio_epoll_disable(ctx);
> 
> Hmm... I think this patch should be the first if to split.
> 
> Anyway, IMHO version 5 is already good enough and has got r-bs, no
> need to bother reposting a version 7.  Maintainer could possibly touch
> up the commit message if necessary.
> 
> Thanks,
> 
>>  #endif
>>  }
>>  
>> -- 
>> 1.8.3.1
>>
>
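
The effect of swapping the two flags in the quoted hunk can be sketched with a stand-in context: once the early return keys on epoll_available, the disable routine closes the descriptor at most once and does nothing when epoll was never set up, so the destroy path can call it unconditionally. FakeAioContext and the fake_* helpers below are hypothetical; a counter replaces the real close() call.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the epoll-related fields of AioContext. */
typedef struct {
    int epollfd;
    bool epoll_enabled;
    bool epoll_available;
    int close_calls;    /* instrumentation, stands in for close(epollfd) */
} FakeAioContext;

/* epoll_available is only set when a valid descriptor exists. */
static void fake_setup(FakeAioContext *ctx, int epollfd)
{
    ctx->epollfd = epollfd;
    ctx->epoll_available = (epollfd != -1);
    ctx->epoll_enabled = false;
    ctx->close_calls = 0;
}

/* Same flag ordering as the "+" lines of the quoted hunk. */
static void fake_epoll_disable(FakeAioContext *ctx)
{
    ctx->epoll_enabled = false;
    if (!ctx->epoll_available) {
        return;
    }
    ctx->epoll_available = false;
    ctx->close_calls++;
}

/* The destroy path simply reuses the disable routine. */
static void fake_destroy(FakeAioContext *ctx)
{
    fake_epoll_disable(ctx);
}

/* Disable at runtime, then destroy; returns how often "close" ran. */
static int closes_after_disable_then_destroy(int epollfd)
{
    FakeAioContext ctx;

    fake_setup(&ctx, epollfd);
    ctx.epoll_enabled = ctx.epoll_available;
    fake_epoll_disable(&ctx);
    fake_destroy(&ctx);
    return ctx.close_calls;
}
```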

Re: [Qemu-devel] [edk2] [PATCH 0/4] RFC: ovmf: Add support for TPM Physical Presence interface

2018-05-17 Thread Laszlo Ersek

On 05/15/18 14:30, marcandre.lur...@redhat.com wrote:
> From: Marc-André Lureau 
> 
> Hi,
> 
> The following series adds basic TPM PPI 1.3 support for OVMF-on-QEMU
> with TPM2 (I haven't tested TPM1, for lack of interest).
> 
> PPI test runs successfully with Windows 10 WHLK, despite the limited
> number of supported functions (tpm2_ppi_funcs table, in particular, no
> function allows to manipulate Tcg2PhysicalPresenceFlags)
> 
> The way it works is relatively simple: a memory region is allocated by
> QEMU to save PPI related variables. An ACPI interface is exposed by
> QEMU to let the guest manipulate those. At boot, ovmf processes and
> updates the PPI qemu region and request variables.
> 
> I build edk2 with:
> 
> $ build -DTPM2_ENABLE -DSECURE_BOOT_ENABLE

Is -DSECURE_BOOT_ENABLE necessary for *building* with -DTPM2_ENABLE? If
that's the case, we should update the DSC files; users building OVMF
from source shouldn't have to care about "-D" inter-dependencies, if we
can manage that somehow.

If -DSECURE_BOOT_ENABLE is only there because otherwise a guest OS
doesn't really make use of -DTPM2_ENABLE either, that's different. In
that case, it's fine to allow building OVMF with TPM2 support but
without SB support.

Thanks!
Laszlo

[Qemu-devel] [PATCH v4 09/14] spapr: handle cpu core unplug via hotplug handler chain

2018-05-17 Thread David Hildenbrand

Let's handle it via hotplug_handler_unplug().

Signed-off-by: David Hildenbrand 
---
 hw/ppc/spapr.c | 13 -
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 286c38c842..13d153b5a6 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -3412,7 +3412,16 @@ static void *spapr_populate_hotplug_cpu_dt(CPUState *cs, 
int *fdt_offset,
 /* Callback to be called during DRC release. */
 void spapr_core_release(DeviceState *dev)
 {
-MachineState *ms = MACHINE(qdev_get_hotplug_handler(dev));
+HotplugHandler *hotplug_ctrl = qdev_get_hotplug_handler(dev);
+
+/* Call the unplug handler chain. This can never fail. */
+hotplug_handler_unplug(hotplug_ctrl, dev, &error_abort);
+}
+
+static void spapr_core_unplug(HotplugHandler *hotplug_dev, DeviceState *dev,
+  Error **errp)
+{
+MachineState *ms = MACHINE(hotplug_dev);
 sPAPRMachineClass *smc = SPAPR_MACHINE_GET_CLASS(ms);
 CPUCore *cc = CPU_CORE(dev);
 CPUArchId *core_slot = spapr_find_cpu_slot(ms, cc->core_id, NULL);
@@ -3623,6 +3632,8 @@ static void spapr_machine_device_unplug(HotplugHandler 
*hotplug_dev,
 /* final stage hotplug handler */
 if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
 spapr_memory_unplug(hotplug_dev, dev, &local_err);
+} else if (object_dynamic_cast(OBJECT(dev), TYPE_SPAPR_CPU_CORE)) {
+spapr_core_unplug(hotplug_dev, dev, &local_err);
 } else if (dev->parent_bus && dev->parent_bus->hotplug_handler) {
 hotplug_handler_unplug(dev->parent_bus->hotplug_handler, dev,
&local_err);
-- 
2.14.3

[Qemu-devel] [PATCH v4 00/14] MemoryDevice: use multi stage hotplug handlers

2018-05-17 Thread David Hildenbrand

We can have devices that need certain other resources that are e.g.
system resources managed by the machine. We need a clean way to assign
these resources (without violating layers as brought up by Igor).

One example is virtio-mem/virtio-pmem. Both device types need to be
assigned some region in guest physical address space. This device memory
belongs to the machine and is managed by it. However, virtio devices are
hotplugged using the hotplug handler their proxy device implements. So we
could trigger e.g. a PCI hotplug handler for virtio-pci or a CSS/CCW
hotplug handler for virtio-ccw. But definitely not the machine.

Now, we can route other devices through the machine hotplug handler, to
properly assign/unassign resources - like a portion in guest physical
address space.
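
As a rough illustration of the multi stage idea described above, the
following toy model lets a "machine" handler do its own resource
assignment first and then pass control to the bus handler as the final
stage. All type and function names here are hypothetical and are not
the QEMU HotplugHandler API; this is a sketch of the control flow only.

```c
#include <assert.h>
#include <stddef.h>

typedef struct ToyDevice ToyDevice;
typedef struct ToyHandler ToyHandler;

/* Hypothetical stand-in for a hotplug handler stage. */
struct ToyHandler {
    void (*plug)(ToyHandler *h, ToyDevice *dev);
    int plugged;                  /* number of devices this stage handled */
};

struct ToyDevice {
    int needs_machine_resources;  /* e.g. a region of guest physical memory */
    ToyHandler *bus_handler;      /* final stage handler, may be NULL */
};

/* Final stage: the handler of the bus (or proxy device). */
static void toy_bus_plug(ToyHandler *h, ToyDevice *dev)
{
    (void)dev;
    h->plugged++;
}

/* First stage: the machine assigns its resources, then forwards. */
static void toy_machine_plug(ToyHandler *h, ToyDevice *dev)
{
    if (dev->needs_machine_resources) {
        h->plugged++;             /* machine-owned resource assigned here */
    }
    if (dev->bus_handler) {
        dev->bus_handler->plug(dev->bus_handler, dev);
    }
}
```

With this shape, a device that needs machine resources is seen by both
stages, while an ordinary bus device is only forwarded to the bus
handler.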

v3 -> v4:
- Removed the s390x bits, will send that out separately (was just a proof
  that it works just fine with s390x)
- Fixed a typo and reworded a comment

v2 -> v3:
- Added "memory-device: introduce separate config option"
- Dropped "parent_bus" check from hotplug handler lookup functions
- "Handly" -> "Handle" in patch description.

v1 -> v2:
- Use multi stage hotplug handler instead of resource handler
- MemoryDevices only compiled if necessary (CONFIG_MEM_HOTPLUG)
- Prepare PC/SPAPR machines properly for multi stage hotplug handlers
- Route SPAPR unplug code via the hotunplug handler
- Directly include s390x support. But there are no usable memory devices
  yet (well, only my virtio-mem prototype)
- Included "memory-device: drop assert related to align and start of address
  space"

David Hildenbrand (13):
  memory-device: drop assert related to align and start of address space
  memory-device: introduce separate config option
  pc: prepare for multi stage hotplug handlers
  pc: route all memory devices through the machine hotplug handler
  spapr: prepare for multi stage hotplug handlers
  spapr: route all memory devices through the machine hotplug handler
  spapr: handle pc-dimm unplug via hotplug handler chain
  spapr: handle cpu core unplug via hotplug handler chain
  memory-device: new functions to handle plug/unplug
  pc-dimm: implement new memory device functions
  memory-device: factor out pre-plug into hotplug handler
  memory-device: factor out unplug into hotplug handler
  memory-device: factor out plug into hotplug handler

Igor Mammedov (1):
  qdev: let machine hotplug handler to override bus hotplug handler

 default-configs/i386-softmmu.mak   |   3 +-
 default-configs/ppc64-softmmu.mak  |   3 +-
 default-configs/x86_64-softmmu.mak |   3 +-
 hw/Makefile.objs   |   2 +-
 hw/core/qdev.c |   6 +-
 hw/i386/pc.c   | 102 ++---
 hw/mem/Makefile.objs   |   4 +-
 hw/mem/memory-device.c | 129 +++--
 hw/mem/pc-dimm.c   |  48 ++
 hw/mem/trace-events|   4 +-
 hw/ppc/spapr.c | 129 +++--
 include/hw/mem/memory-device.h |  21 --
 include/hw/mem/pc-dimm.h   |   3 +-
 include/hw/qdev-core.h |  11 
 qapi/misc.json |   2 +-
 15 files changed, 330 insertions(+), 140 deletions(-)

-- 
2.14.3

[Qemu-devel] [PATCH v4 04/14] pc: prepare for multi stage hotplug handlers

2018-05-17 Thread David Hildenbrand

For multi stage hotplug handlers, we'll have to do some error handling
in some hotplug functions, so let's use a local error variable (except
for unplug requests).

Also, add code to pass control to the final stage hotplug handler at the
parent bus.

Signed-off-by: David Hildenbrand 
---
 hw/i386/pc.c | 39 +++
 1 file changed, 31 insertions(+), 8 deletions(-)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index d768930d02..510076e156 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -2007,19 +2007,32 @@ static void pc_cpu_pre_plug(HotplugHandler *hotplug_dev,
 static void pc_machine_device_pre_plug_cb(HotplugHandler *hotplug_dev,
   DeviceState *dev, Error **errp)
 {
+Error *local_err = NULL;
+
+/* final stage hotplug handler */
 if (object_dynamic_cast(OBJECT(dev), TYPE_CPU)) {
-pc_cpu_pre_plug(hotplug_dev, dev, errp);
+pc_cpu_pre_plug(hotplug_dev, dev, &local_err);
+} else if (dev->parent_bus && dev->parent_bus->hotplug_handler) {
+hotplug_handler_pre_plug(dev->parent_bus->hotplug_handler, dev,
+ &local_err);
 }
+error_propagate(errp, local_err);
 }
 
 static void pc_machine_device_plug_cb(HotplugHandler *hotplug_dev,
   DeviceState *dev, Error **errp)
 {
+Error *local_err = NULL;
+
+/* final stage hotplug handler */
 if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
-pc_dimm_plug(hotplug_dev, dev, errp);
+pc_dimm_plug(hotplug_dev, dev, &local_err);
 } else if (object_dynamic_cast(OBJECT(dev), TYPE_CPU)) {
-pc_cpu_plug(hotplug_dev, dev, errp);
+pc_cpu_plug(hotplug_dev, dev, &local_err);
+} else if (dev->parent_bus && dev->parent_bus->hotplug_handler) {
+hotplug_handler_plug(dev->parent_bus->hotplug_handler, dev, &local_err);
 }
+error_propagate(errp, local_err);
 }
 
 static void pc_machine_device_unplug_request_cb(HotplugHandler *hotplug_dev,
@@ -2029,7 +2042,10 @@ static void 
pc_machine_device_unplug_request_cb(HotplugHandler *hotplug_dev,
 pc_dimm_unplug_request(hotplug_dev, dev, errp);
 } else if (object_dynamic_cast(OBJECT(dev), TYPE_CPU)) {
 pc_cpu_unplug_request_cb(hotplug_dev, dev, errp);
-} else {
+} else if (dev->parent_bus && dev->parent_bus->hotplug_handler) {
+hotplug_handler_unplug_request(dev->parent_bus->hotplug_handler, dev,
+   errp);
+} else if (!dev->parent_bus) {
 error_setg(errp, "acpi: device unplug request for not supported device"
" type: %s", object_get_typename(OBJECT(dev)));
 }
@@ -2038,14 +2054,21 @@ static void 
pc_machine_device_unplug_request_cb(HotplugHandler *hotplug_dev,
 static void pc_machine_device_unplug_cb(HotplugHandler *hotplug_dev,
 DeviceState *dev, Error **errp)
 {
+Error *local_err = NULL;
+
+/* final stage hotplug handler */
 if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
-pc_dimm_unplug(hotplug_dev, dev, errp);
+pc_dimm_unplug(hotplug_dev, dev, &local_err);
 } else if (object_dynamic_cast(OBJECT(dev), TYPE_CPU)) {
-pc_cpu_unplug_cb(hotplug_dev, dev, errp);
-} else {
-error_setg(errp, "acpi: device unplug for not supported device"
+pc_cpu_unplug_cb(hotplug_dev, dev, &local_err);
+} else if (dev->parent_bus && dev->parent_bus->hotplug_handler) {
+hotplug_handler_unplug(dev->parent_bus->hotplug_handler, dev,
+   &local_err);
+} else if (!dev->parent_bus) {
+error_setg(&local_err, "acpi: device unplug for not supported device"
" type: %s", object_get_typename(OBJECT(dev)));
 }
+error_propagate(errp, local_err);
 }
 
 static HotplugHandler *pc_get_hotpug_handler(MachineState *machine,
-- 
2.14.3

[Qemu-devel] [PATCH v4 05/14] pc: route all memory devices through the machine hotplug handler

2018-05-17 Thread David Hildenbrand

Necessary to hotplug them cleanly later. We could drop the PC_DIMM
check, as PC_DIMM are just memory devices, but this approach is cleaner.

Signed-off-by: David Hildenbrand 
---
 hw/i386/pc.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 510076e156..8bc41ef24b 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -74,6 +74,7 @@
 #include "hw/nmi.h"
 #include "hw/i386/intel_iommu.h"
 #include "hw/net/ne2000-isa.h"
+#include "hw/mem/memory-device.h"
 
 /* debug PC/ISA interrupts */
 //#define DEBUG_IRQ
@@ -2075,6 +2076,7 @@ static HotplugHandler *pc_get_hotpug_handler(MachineState 
*machine,
  DeviceState *dev)
 {
 if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM) ||
+object_dynamic_cast(OBJECT(dev), TYPE_MEMORY_DEVICE) ||
 object_dynamic_cast(OBJECT(dev), TYPE_CPU)) {
 return HOTPLUG_HANDLER(machine);
 }
-- 
2.14.3

[Qemu-devel] [PATCH v3 00/12] intel-iommu: nested vIOMMU, cleanups, bug fixes

2018-05-17 Thread Peter Xu

(Hello, Jintack. Feel free to test this branch again against your scp
 error case when you have free time.)

I rewrote some of the patches in V3.  Major changes:

- Dropped mergeable interval tree, instead introduced IOVA tree, which
  is even simpler.

- Fix the scp error issue that Jintack reported.  Please see patches
  for detailed information.  That's the major reason to rewrite a few
  of the patches.  Our past use of replay for domain flushes was
  possibly incorrect.  The thing is that IOMMU replay has a
  "definition" that "we should only send MAP when new page detected",
  while for shadow page syncing we actually need something other than
  that.  So in this version I started to use a new
  vtd_sync_shadow_page_table() helper to do the page sync.

- Some other refinements after the refactoring.

I'll add a unit test for the IOVA tree after this series is merged to
make sure we won't switch to another new tree implementation...

The element size in the new IOVA tree should be around
sizeof(GTreeNode + IOMMUTLBEntry) ~= (5*8+4*8) = 72 bytes.  So the
worst case usage ratio would be 72/4K=2%, which still seems acceptable
(it means 8G L2 guest will use 8G*2%=160MB as metadata to maintain the
mapping in QEMU).
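
The estimate above can be reproduced with a few lines of C.  This is
only a back-of-the-envelope sketch: the 72-byte node size and the 4K
page size are the figures from the paragraph above, the helper name is
hypothetical, and the exact result for 8G (144MB) is slightly below the
rounded 2% figure.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: worst-case IOVA tree metadata in bytes, assuming
 * one tree node of node_bytes for every mapped page of page_bytes
 * (i.e. no two mappings can be merged).
 */
static uint64_t iova_tree_worst_case_bytes(uint64_t guest_bytes,
                                           uint64_t page_bytes,
                                           uint64_t node_bytes)
{
    assert(page_bytes != 0);
    return guest_bytes / page_bytes * node_bytes;
}
```

For example, iova_tree_worst_case_bytes(8ULL << 30, 4096, 72) gives
2^21 pages times 72 bytes.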

I did an explicit test with scp this time, copying a 1G file more than
10 times in each of the following cases:

- L1 guest, with vIOMMU and with assigned device
- L2 guest, without vIOMMU and with assigned device
- L2 guest, with vIOMMU (so 3-layer nested IOMMU) and with assigned device

Please review.  Thanks,

(Below is the old content from the previous cover letter)

==

v2:
- fix patchew code style warnings
- interval tree: postpone malloc when inserting; simplify node remove
  a bit where proper [Jason]
- fix up comment and commit message for iommu lock patch [Kevin]
- protect context cache too using the iommu lock [Kevin, Jason]
- add vast comment in patch 8 to explain the modify-PTE problem
  [Jason, Kevin]

Online repo:

  https://github.com/xzpeter/qemu/tree/fix-vtd-dma

This series fixes several major problems that current code has:

- Issue 1: when getting very big PSI UNMAP invalidations, the current
  code is buggy in that we might skip the notification while actually
  we should always send that notification.

- Issue 2: IOTLB is not thread safe, while block dataplane can be
  accessing and updating it in parallel.

- Issue 3: For devices that only registered with UNMAP-only notifiers,
  we don't really need to do page walking for PSIs, we can directly
  deliver the notification down.  For example, vhost.

- Issue 4: unsafe window for MAP notified devices like vfio-pci (and
  in the future, vDPA as well).  The problem is that, now for domain
  invalidations we do this to make sure the shadow page tables are
  correctly synced:

  1. unmap the whole address space
  2. replay the whole address space, map existing pages

  However, between steps 1 and 2 there will be a very tiny window (it
  can be as big as 3ms) during which the shadow page table is either
  invalid or incomplete (since we're rebuilding it).  That's a fatal
  error since devices never know that this is happening and it's still
  possible to DMA to memory.

Patch 1 fixes issue 1.  I put it at the first since it's picked from
an old post.

Patch 2 is a cleanup to remove useless IntelIOMMUNotifierNode struct.

Patch 3 fixes issue 2.

Patch 4 fixes issue 3.

Patch 5-9 fix issue 4.  Here a very simple interval tree is
implemented based on GTree.  It differs from a general interval tree
in that it does not allow the user to pass in private data (e.g.,
translated addresses).  The benefit is that we can then merge
adjacent interval leaves so that hopefully we won't consume much
memory even if there are a lot of mappings (that happens for nested
virt - when mapping the whole L2 guest RAM range, it can be at least
in GBs).

Patch 10 is another big cleanup that can only work after patch 9.

Tests:

- device assignments to L1, even L2 guests.  With this series applied
  (and the kernel IOMMU patches: https://lkml.org/lkml/2018/4/18/5),
  we can even nest vIOMMU now, e.g., we can specify vIOMMU in L2 guest
  with assigned devices and things will work.  We can't before.

- vhost smoke test for regression.

Please review.  Thanks,

Peter Xu (12):
  intel-iommu: send PSI always even if across PDEs
  intel-iommu: remove IntelIOMMUNotifierNode
  intel-iommu: add iommu lock
  intel-iommu: only do page walk for MAP notifiers
  intel-iommu: introduce vtd_page_walk_info
  intel-iommu: pass in address space when page walk
  intel-iommu: trace domain id during page walk
  util: implement simple iova tree
  intel-iommu: maintain per-device iova ranges
  intel-iommu: simplify page walk logic
  intel-iommu: new vtd_sync_shadow_page_table_range
  intel-iommu: new sync_shadow_page_table

 include/hw/i386/intel_iommu.h |  19 +-
 include/qemu/iova-tree.h  | 134 
 hw/i386/intel_iommu.c | 381 +-

[Qemu-devel] [PATCH v3 01/12] intel-iommu: send PSI always even if across PDEs

2018-05-17 Thread Peter Xu

During IOVA page table walking, there is a special case when the PSI
covers one whole PDE (Page Directory Entry, which contains 512 Page
Table Entries) or more.  In the past, we skipped that entry and did not
notify the IOMMU notifiers.  This is not correct.  We should send UNMAP
notification to registered UNMAP notifiers in this case.

For UNMAP only notifiers, this might leave IOTLB entries cached in the
devices even though they are already invalid.  For MAP/UNMAP notifiers
like vfio-pci, this will cause stale page mappings.

This special case doesn't trigger often, but it is very easy to be
triggered by nested device assignments, since in that case we'll
possibly map the whole L2 guest RAM region into the device's IOVA
address space (several GBs at least), which is far bigger than normal
kernel driver usages of the device (tens of MBs normally).
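
To make the special case concrete: assuming 4K pages and the VT-d
convention that a PSI with address mask "am" invalidates 2^am
contiguous pages, a PSI reaches the size of one whole PDE once it spans
512 pages (2M).  A minimal sketch, with hypothetical helper names that
are not part of intel_iommu.c:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT_4K 12   /* 4K pages (assumption) */
#define TOY_PTES_PER_PDE  512  /* figure from the commit message above */

/* Hypothetical: number of bytes covered by a PSI with address mask am. */
static uint64_t toy_psi_size_bytes(uint8_t am)
{
    assert(am < 52);           /* keep the shifts well defined */
    return (1ULL << am) << TOY_PAGE_SHIFT_4K;
}

/* Hypothetical: true once the PSI spans at least one whole PDE. */
static bool toy_psi_covers_whole_pde(uint8_t am)
{
    uint64_t pde_bytes = (uint64_t)TOY_PTES_PER_PDE << TOY_PAGE_SHIFT_4K;

    return toy_psi_size_bytes(am) >= pde_bytes;
}
```

Under these assumptions any PSI with am >= 9 hits the case that used to
be skipped.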

Without this patch applied to L1 QEMU, nested device assignment to L2
guests will dump some errors like:

qemu-system-x86_64: VFIO_MAP_DMA: -17
qemu-system-x86_64: vfio_dma_map(0x557305420c30, 0xad000, 0x1000,
0x7f89a920d000) = -17 (File exists)

Acked-by: Jason Wang 
[peterx: rewrite the commit message]
Signed-off-by: Peter Xu 
---
 hw/i386/intel_iommu.c | 42 ++
 1 file changed, 30 insertions(+), 12 deletions(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index fb31de9416..b359efd6f9 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -722,6 +722,15 @@ static int vtd_iova_to_slpte(VTDContextEntry *ce, uint64_t 
iova, bool is_write,
 
 typedef int (*vtd_page_walk_hook)(IOMMUTLBEntry *entry, void *private);
 
+static int vtd_page_walk_one(IOMMUTLBEntry *entry, int level,
+ vtd_page_walk_hook hook_fn, void *private)
+{
+assert(hook_fn);
+trace_vtd_page_walk_one(level, entry->iova, entry->translated_addr,
+entry->addr_mask, entry->perm);
+return hook_fn(entry, private);
+}
+
 /**
  * vtd_page_walk_level - walk over specific level for IOVA range
  *
@@ -781,28 +790,37 @@ static int vtd_page_walk_level(dma_addr_t addr, uint64_t 
start,
  */
 entry_valid = read_cur | write_cur;
 
+entry.target_as = &address_space_memory;
+entry.iova = iova & subpage_mask;
+entry.perm = IOMMU_ACCESS_FLAG(read_cur, write_cur);
+entry.addr_mask = ~subpage_mask;
+
 if (vtd_is_last_slpte(slpte, level)) {
-entry.target_as = &address_space_memory;
-entry.iova = iova & subpage_mask;
 /* NOTE: this is only meaningful if entry_valid == true */
 entry.translated_addr = vtd_get_slpte_addr(slpte, aw);
-entry.addr_mask = ~subpage_mask;
-entry.perm = IOMMU_ACCESS_FLAG(read_cur, write_cur);
 if (!entry_valid && !notify_unmap) {
 trace_vtd_page_walk_skip_perm(iova, iova_next);
 goto next;
 }
-trace_vtd_page_walk_one(level, entry.iova, entry.translated_addr,
-entry.addr_mask, entry.perm);
-if (hook_fn) {
-ret = hook_fn(&entry, private);
-if (ret < 0) {
-return ret;
-}
+ret = vtd_page_walk_one(&entry, level, hook_fn, private);
+if (ret < 0) {
+return ret;
 }
 } else {
 if (!entry_valid) {
-trace_vtd_page_walk_skip_perm(iova, iova_next);
+if (notify_unmap) {
+/*
+ * The whole entry is invalid; unmap it all.
+ * Translated address is meaningless, zero it.
+ */
+entry.translated_addr = 0x0;
+ret = vtd_page_walk_one(&entry, level, hook_fn, private);
+if (ret < 0) {
+return ret;
+}
+} else {
+trace_vtd_page_walk_skip_perm(iova, iova_next);
+}
 goto next;
 }
 ret = vtd_page_walk_level(vtd_get_slpte_addr(slpte, aw), iova,
-- 
2.17.0

[Qemu-devel] [PATCH v3 05/12] intel-iommu: introduce vtd_page_walk_info

2018-05-17 Thread Peter Xu

During the recursive page walking of IOVA page tables, some stack
variables are constant and never change during the whole page walking
procedure.  Isolate them into a struct so that we don't need to pass
those constants down the stack every time and multiple times.
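
The refactoring pattern can be sketched on a toy recursive walk: the
values that never change (hook and private data) live in one struct
and a single pointer is passed down, while only the per-level values
remain as parameters.  Names below are hypothetical and the walk is a
simple binary split, not the VT-d page table walk.

```c
#include <assert.h>
#include <stdint.h>

typedef int (*toy_walk_hook)(uint64_t addr, void *opaque);

/* Constant information shared by every level of the walk. */
typedef struct {
    toy_walk_hook hook_fn;
    void *opaque;
} toy_walk_info;

/* Only start/end/level change per call; the rest travels via info. */
static int toy_walk_level(uint64_t start, uint64_t end, unsigned level,
                          const toy_walk_info *info)
{
    uint64_t mid;
    int ret;

    if (level == 0) {
        return info->hook_fn(start, info->opaque);
    }
    mid = start + (end - start) / 2;
    ret = toy_walk_level(start, mid, level - 1, info);
    if (ret < 0) {
        return ret;
    }
    return toy_walk_level(mid, end, level - 1, info);
}

/* Example hook: count how many leaves were visited. */
static int toy_count_hook(uint64_t addr, void *opaque)
{
    (void)addr;
    ++*(unsigned *)opaque;
    return 0;
}

/* Walk a range of 2^levels units and return the number of leaves. */
static unsigned toy_count_leaves(unsigned levels)
{
    unsigned n = 0;
    toy_walk_info info = { .hook_fn = toy_count_hook, .opaque = &n };

    toy_walk_level(0, 1ULL << levels, levels, &info);
    return n;
}
```

Adding another constant later then only touches the struct and its
initializers, not every recursive call site.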

Signed-off-by: Peter Xu 
---
 hw/i386/intel_iommu.c | 84 ++-
 1 file changed, 52 insertions(+), 32 deletions(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 9a418abfb6..4953d02ed0 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -747,9 +747,27 @@ static int vtd_iova_to_slpte(VTDContextEntry *ce, uint64_t 
iova, bool is_write,
 
 typedef int (*vtd_page_walk_hook)(IOMMUTLBEntry *entry, void *private);
 
+/**
+ * Constant information used during page walking
+ *
+ * @hook_fn: hook func to be called when detected page
+ * @private: private data to be passed into hook func
+ * @notify_unmap: whether we should notify invalid entries
+ * @aw: maximum address width
+ */
+typedef struct {
+vtd_page_walk_hook hook_fn;
+void *private;
+bool notify_unmap;
+uint8_t aw;
+} vtd_page_walk_info;
+
 static int vtd_page_walk_one(IOMMUTLBEntry *entry, int level,
- vtd_page_walk_hook hook_fn, void *private)
+ vtd_page_walk_info *info)
 {
+vtd_page_walk_hook hook_fn = info->hook_fn;
+void *private = info->private;
+
 assert(hook_fn);
 trace_vtd_page_walk_one(level, entry->iova, entry->translated_addr,
 entry->addr_mask, entry->perm);
@@ -762,17 +780,13 @@ static int vtd_page_walk_one(IOMMUTLBEntry *entry, int 
level,
  * @addr: base GPA addr to start the walk
  * @start: IOVA range start address
  * @end: IOVA range end address (start <= addr < end)
- * @hook_fn: hook func to be called when detected page
- * @private: private data to be passed into hook func
  * @read: whether parent level has read permission
  * @write: whether parent level has write permission
- * @notify_unmap: whether we should notify invalid entries
- * @aw: maximum address width
+ * @info: constant information for the page walk
  */
 static int vtd_page_walk_level(dma_addr_t addr, uint64_t start,
-   uint64_t end, vtd_page_walk_hook hook_fn,
-   void *private, uint32_t level, bool read,
-   bool write, bool notify_unmap, uint8_t aw)
+   uint64_t end, uint32_t level, bool read,
+   bool write, vtd_page_walk_info *info)
 {
 bool read_cur, write_cur, entry_valid;
 uint32_t offset;
@@ -822,24 +836,24 @@ static int vtd_page_walk_level(dma_addr_t addr, uint64_t 
start,
 
 if (vtd_is_last_slpte(slpte, level)) {
 /* NOTE: this is only meaningful if entry_valid == true */
-entry.translated_addr = vtd_get_slpte_addr(slpte, aw);
-if (!entry_valid && !notify_unmap) {
+entry.translated_addr = vtd_get_slpte_addr(slpte, info->aw);
+if (!entry_valid && !info->notify_unmap) {
 trace_vtd_page_walk_skip_perm(iova, iova_next);
 goto next;
 }
-ret = vtd_page_walk_one(&entry, level, hook_fn, private);
+ret = vtd_page_walk_one(&entry, level, info);
 if (ret < 0) {
 return ret;
 }
 } else {
 if (!entry_valid) {
-if (notify_unmap) {
+if (info->notify_unmap) {
 /*
  * The whole entry is invalid; unmap it all.
  * Translated address is meaningless, zero it.
  */
 entry.translated_addr = 0x0;
-ret = vtd_page_walk_one(&entry, level, hook_fn, private);
+ret = vtd_page_walk_one(&entry, level, info);
 if (ret < 0) {
 return ret;
 }
@@ -848,10 +862,9 @@ static int vtd_page_walk_level(dma_addr_t addr, uint64_t 
start,
 }
 goto next;
 }
-ret = vtd_page_walk_level(vtd_get_slpte_addr(slpte, aw), iova,
-  MIN(iova_next, end), hook_fn, private,
-  level - 1, read_cur, write_cur,
-  notify_unmap, aw);
+ret = vtd_page_walk_level(vtd_get_slpte_addr(slpte, info->aw),
+  iova, MIN(iova_next, end), level - 1,
+  read_cur, write_cur, info);
 if (ret < 0) {
 return ret;
 }
@@ -870,28 +883,24 @@ next:
  * @ce: context entry to walk upon
  * @start: IOVA address to start the walk
  * @end: IOVA range end address (start <= addr < end)
- * @hook_fn: the hook that to be called for each detected

[Qemu-devel] [PATCH v3 09/12] intel-iommu: maintain per-device iova ranges

2018-05-17 Thread Peter Xu

For each VTDAddressSpace, now we maintain what IOVA ranges we have
mapped and what we have not.  With that information, now we only send
MAP or UNMAP when necessary.  Say, we don't send MAP notifies if we know
we have already mapped the range, meanwhile we don't send UNMAP notifies
if we know we never mapped the range at all.
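
The filtering rule can be sketched with a toy tracker that uses a small
fixed array in place of the IOVA tree.  Everything here is
hypothetical (names, the array, the return codes), and it simplifies
one case: when the translation changed, the patch emulates an UNMAP
before the new MAP, while this sketch just overwrites the entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_MAPS 16

enum { TOY_FULL = -1, TOY_SKIPPED = 0, TOY_SENT = 1 };

typedef struct {
    uint64_t iova;
    uint64_t translated_addr;
    bool used;
} ToyMapping;

typedef struct {
    ToyMapping maps[TOY_MAX_MAPS];
} ToyTracker;

/* Return the tracked mapping for iova, or NULL if it is not mapped. */
static ToyMapping *toy_find(ToyTracker *t, uint64_t iova)
{
    for (size_t i = 0; i < TOY_MAX_MAPS; i++) {
        if (t->maps[i].used && t->maps[i].iova == iova) {
            return &t->maps[i];
        }
    }
    return NULL;
}

/* Return an unused slot, or NULL if the tracker is full. */
static ToyMapping *toy_free_slot(ToyTracker *t)
{
    for (size_t i = 0; i < TOY_MAX_MAPS; i++) {
        if (!t->maps[i].used) {
            return &t->maps[i];
        }
    }
    return NULL;
}

/* Decide whether a MAP (is_map) or UNMAP notification must be sent. */
static int toy_notify(ToyTracker *t, uint64_t iova,
                      uint64_t translated_addr, bool is_map)
{
    ToyMapping *m = toy_find(t, iova);

    if (is_map) {
        if (m && m->translated_addr == translated_addr) {
            return TOY_SKIPPED;       /* already mapped identically */
        }
        if (!m) {
            m = toy_free_slot(t);
            if (!m) {
                return TOY_FULL;
            }
        }
        m->iova = iova;
        m->translated_addr = translated_addr;
        m->used = true;
        return TOY_SENT;
    }
    if (!m) {
        return TOY_SKIPPED;           /* never mapped, nothing to unmap */
    }
    m->used = false;
    return TOY_SENT;
}
```

A repeated MAP of the same page and an UNMAP of a page that was never
mapped both return TOY_SKIPPED, which mirrors the two skip cases in the
commit message.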

Signed-off-by: Peter Xu 
---
 include/hw/i386/intel_iommu.h |  2 ++
 hw/i386/intel_iommu.c | 67 +++
 hw/i386/trace-events  |  2 ++
 3 files changed, 71 insertions(+)

diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
index 9e0a6c1c6a..90190bfaa1 100644
--- a/include/hw/i386/intel_iommu.h
+++ b/include/hw/i386/intel_iommu.h
@@ -27,6 +27,7 @@
 #include "hw/i386/ioapic.h"
 #include "hw/pci/msi.h"
 #include "hw/sysbus.h"
+#include "qemu/iova-tree.h"
 
 #define TYPE_INTEL_IOMMU_DEVICE "intel-iommu"
 #define INTEL_IOMMU_DEVICE(obj) \
@@ -95,6 +96,7 @@ struct VTDAddressSpace {
 QLIST_ENTRY(VTDAddressSpace) next;
 /* Superset of notifier flags that this address space has */
 IOMMUNotifierFlag notifier_flags;
+IOVATree *iova_tree;  /* Traces mapped IOVA ranges */
 };
 
 struct VTDBus {
diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 29fcf2b3a8..5a5175a4ed 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -768,10 +768,71 @@ typedef struct {
 
 static int vtd_page_walk_one(IOMMUTLBEntry *entry, vtd_page_walk_info *info)
 {
+VTDAddressSpace *as = info->as;
 vtd_page_walk_hook hook_fn = info->hook_fn;
 void *private = info->private;
+DMAMap target = {
+.iova = entry->iova,
+.size = entry->addr_mask,
+.translated_addr = entry->translated_addr,
+.perm = entry->perm,
+};
+DMAMap *mapped = iova_tree_find(as->iova_tree, &target);
 
 assert(hook_fn);
+
+/* Update local IOVA mapped ranges */
+if (entry->perm) {
+if (mapped) {
+/* If it's exactly the same translation, skip */
+if (!memcmp(mapped, &target, sizeof(target))) {
+trace_vtd_page_walk_one_skip_map(entry->iova, entry->addr_mask,
+ entry->translated_addr);
+return 0;
+} else {
+/*
+ * Translation changed.  This should not happen with
+ * "intel_iommu=on,strict", but it can happen when
+ * delayed flushing is used in guest IOMMU driver
+ * (when without "strict") when page A is reused
+ * before its previous unmap (the unmap can still be
+ * queued in the delayed flushing queue).  Now we do
+ * our best to remap.  Note that there will be a small
+ * window that we don't have map at all.  But that's
+ * the best effort we can do, and logically
+ * well-behaved guests should not really using this
+ * DMA region yet so we should be very safe.
+ */
+IOMMUAccessFlags cache_perm = entry->perm;
+int ret;
+
+/* Emulate an UNMAP */
+entry->perm = IOMMU_NONE;
+trace_vtd_page_walk_one(info->domain_id,
+entry->iova,
+entry->translated_addr,
+entry->addr_mask,
+entry->perm);
+ret = hook_fn(entry, private);
+if (ret) {
+return ret;
+}
+/* Drop any existing mapping */
+iova_tree_remove(as->iova_tree, &target);
+/* Recover the correct permission */
+entry->perm = cache_perm;
+}
+}
+iova_tree_insert(as->iova_tree, &target);
+} else {
+if (!mapped) {
+/* Skip since we didn't map this range at all */
+trace_vtd_page_walk_one_skip_unmap(entry->iova, entry->addr_mask);
+return 0;
+}
+iova_tree_remove(as->iova_tree, &target);
+}
+
 trace_vtd_page_walk_one(info->domain_id, entry->iova,
 entry->translated_addr, entry->addr_mask,
 entry->perm);
@@ -2804,6 +2865,7 @@ VTDAddressSpace *vtd_find_add_as(IntelIOMMUState *s, 
PCIBus *bus, int devfn)
 vtd_dev_as->devfn = (uint8_t)devfn;
 vtd_dev_as->iommu_state = s;
 vtd_dev_as->context_cache_entry.context_cache_gen = 0;
+vtd_dev_as->iova_tree = iova_tree_new();
 
 /*
  * Memory region relationships looks like (Address range shows
@@ -2856,6 +2918,7 @@ static void vtd_address_space_unmap(VTDAddressSpace *as, 
IOMMUNotifier *n)
 hwaddr start = n->start;
 hwaddr end = n->end;
 IntelIOMMUState *s = as->iommu_state;
+DMAMap map;
 
 /*
  *

[Qemu-devel] [PATCH v3 06/12] intel-iommu: pass in address space when page walk

2018-05-17 Thread Peter Xu

We pass in the VTDAddressSpace too.  It'll be used in the follow up
patches.

Signed-off-by: Peter Xu 
---
 hw/i386/intel_iommu.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 4953d02ed0..fe5ee77d46 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -753,9 +753,11 @@ typedef int (*vtd_page_walk_hook)(IOMMUTLBEntry *entry, 
void *private);
  * @hook_fn: hook func to be called when detected page
  * @private: private data to be passed into hook func
  * @notify_unmap: whether we should notify invalid entries
+ * @as: VT-d address space of the device
  * @aw: maximum address width
  */
 typedef struct {
+VTDAddressSpace *as;
 vtd_page_walk_hook hook_fn;
 void *private;
 bool notify_unmap;
@@ -1460,6 +1462,7 @@ static void 
vtd_iotlb_page_invalidate_notify(IntelIOMMUState *s,
 .private = (void *)&vtd_as->iommu,
 .notify_unmap = true,
 .aw = s->aw_bits,
+.as = vtd_as,
 };
 
 /*
@@ -2941,6 +2944,7 @@ static void vtd_iommu_replay(IOMMUMemoryRegion *iommu_mr, 
IOMMUNotifier *n)
 .private = (void *)n,
 .notify_unmap = false,
 .aw = s->aw_bits,
+.as = vtd_as,
 };
 
 vtd_page_walk(&ce, 0, ~0ULL, &info);
-- 
2.17.0

[Qemu-devel] [PATCH 2/4] display: add new bochs-display device

2018-05-17 Thread Gerd Hoffmann

After writing up the virtual mdev device emulating a display supporting
the bochs vbe dispi interface (mbochs.ko) and seeing how simple it
actually is I've figured that would be useful for qemu too.

So, here it is, -device bochs-display.  It is basically -device VGA
without legacy vga emulation.  PCI bar 0 is the framebuffer, PCI bar 2
is mmio with the registers.  The vga registers are simply not there
though, neither in the legacy ioport location nor in the mmio bar.
Consequently it is PCI class DISPLAY_OTHER not DISPLAY_VGA.

So there is no text mode emulation, no weird video modes (planar,
256color palette), no memory window at 0xa.  Just a linear
framebuffer in the pci memory bar.  And the amount of code to emulate
this (and therefore the attack surface) is an order of magnitude smaller
when compared to vga emulation.

Compatibility wise it almost works with OVMF (little tweak needed).
The bochs-drm.ko linux kernel module can handle it just fine too.
So once the OVMF fix is merged UEFI guests should not see any
functional difference to VGA.

Signed-off-by: Gerd Hoffmann 
---
 hw/display/bochs-display.c | 323 +
 hw/display/Makefile.objs   |   1 +
 2 files changed, 324 insertions(+)
 create mode 100644 hw/display/bochs-display.c

diff --git a/hw/display/bochs-display.c b/hw/display/bochs-display.c
new file mode 100644
index 00..beeda58475
--- /dev/null
+++ b/hw/display/bochs-display.c
@@ -0,0 +1,323 @@
+/*
+ * QEMU PCI bochs display adapter.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+#include "qemu/osdep.h"
+#include "hw/hw.h"
+#include "hw/pci/pci.h"
+#include "hw/display/bochs-vbe.h"
+
+#include "qapi/error.h"
+
+#include "ui/console.h"
+#include "ui/qemu-pixman.h"
+
+typedef struct BochsDisplayMode {
+pixman_format_code_t format;
+uint32_t bytepp;
+uint32_t width;
+uint32_t height;
+uint32_t stride;
+uint32_t __pad;
+uint64_t offset;
+uint64_t size;
+} BochsDisplayMode;
+
+typedef struct BochsDisplayState {
+/* parent */
+PCIDevicepci;
+
+/* device elements */
+QemuConsole  *con;
+MemoryRegion vram;
+MemoryRegion mmio;
+MemoryRegion vbe;
+MemoryRegion qext;
+
+/* device config */
+uint64_t vgamem;
+
+/* device registers */
+uint16_t vbe_regs[VBE_DISPI_INDEX_NB];
+bool big_endian_fb;
+
+/* device state */
+BochsDisplayMode mode;
+} BochsDisplayState;
+
+#define TYPE_BOCHS_DISPLAY "bochs-display"
+#define BOCHS_DISPLAY(obj) OBJECT_CHECK(BochsDisplayState, (obj), \
+TYPE_BOCHS_DISPLAY)
+
+static const VMStateDescription vmstate_bochs_display = {
+.name = "bochs-display",
+.fields = (VMStateField[]) {
+VMSTATE_PCI_DEVICE(pci, BochsDisplayState),
+VMSTATE_UINT16_ARRAY(vbe_regs, BochsDisplayState, VBE_DISPI_INDEX_NB),
+VMSTATE_BOOL(big_endian_fb, BochsDisplayState),
+VMSTATE_END_OF_LIST()
+}
+};
+
+static uint64_t bochs_display_vbe_read(void *ptr, hwaddr addr,
+   unsigned size)
+{
+BochsDisplayState *s = ptr;
+unsigned int index = addr >> 1;
+
+switch (index) {
+case VBE_DISPI_INDEX_ID:
+return VBE_DISPI_ID5;
+case VBE_DISPI_INDEX_VIDEO_MEMORY_64K:
+return s->vgamem / (64 * 1024);
+}
+
+if (index >= ARRAY_SIZE(s->vbe_regs)) {
+return -1;
+}
+return s->vbe_regs[index];
+}
+
+static void bochs_display_vbe_write(void *ptr, hwaddr addr,
+uint64_t val, unsigned size)
+{
+BochsDisplayState *s = ptr;
+unsigned int index = addr >> 1;
+
+if (index >= ARRAY_SIZE(s->vbe_regs)) {
+return;
+}
+s->vbe_regs[index] = val;
+}
+
+static const MemoryRegionOps bochs_display_vbe_ops = {
+.read = bochs_display_vbe_read,
+.write = bochs_display_vbe_write,
+.valid.min_access_size = 1,
+.valid.max_access_size = 4,
+.impl.min_access_size = 2,
+.impl.max_access_size = 2,
+.endianness = DEVICE_LITTLE_ENDIAN,
+};
+
+static uint64_t bochs_display_qext_read(void *ptr, hwaddr addr,
+unsigned size)
+{
+BochsDisplayState *s = ptr;
+
+switch (addr) {
+case PCI_VGA_QEXT_REG_SIZE:
+return PCI_VGA_QEXT_SIZE;
+case PCI_VGA_QEXT_REG_BYTEORDER:
+return s->big_endian_fb ?
+PCI_VGA_QEXT_BIG_ENDIAN : PCI_VGA_QEXT_LITTLE_ENDIAN;
+default:
+return 0;
+}
+}
+
+static void bochs_display_qext_write(void *ptr, hwaddr addr,
+ uint64_t val, unsigned size)
+{
+BochsDisplayState *s = ptr;
+
+switch (addr) {
+case PCI_VGA_QEXT_REG_BYTEORDER:
+

Re: [Qemu-devel] [PATCH 1/3] hw/arm/smmuv3: Cache/invalidate config data

2018-05-17 Thread Auger Eric

Hi Peter,

On 05/16/2018 08:31 PM, Eric Auger wrote:
> Let's cache config data to avoid fetching and parsing STE/CD
> structures on each translation. We invalidate them on data structure
> invalidation commands.

You may remember that initially I was taking a QemuMutex to protect
IOTLB/cache structures against concurrent access. I checked whether the
BQL was held on translate and I did not notice any case where it isn't.
However I may have missed some, e.g. with virtio-blk-pci where
translates are called from IO threads. It looks like the problem was
reported on Intel and Peter is trying to fix the issue with the
introduction of a local mutex.

[Qemu-devel] [PATCH v2 03/10] intel-iommu: add iommu lock
http://patchwork.ozlabs.org/patch/908464/

Also please see the original thread:
https://lists.gnu.org/archive/html/qemu-devel/2018-04/msg04153.html

So I think I may respin this series with the addition of the QemuMutex.

Thanks

Eric
> 
> Signed-off-by: Eric Auger 
> 
> ---
> 
> v11 -> v12:
> - only insert the new config if decode_cfg succeeds
> - use smmu_get_sid for trace_* and store hits/misses in the SMMUDevice
> - s/smmuv3_put_config/smmuv3_flush_config
> - document smmuv3_get_config
> - removing the mutex as BQL does the job
> ---
>  hw/arm/smmu-common.c |  26 -
>  hw/arm/smmuv3.c  | 130 
> +--
>  hw/arm/trace-events  |   6 ++
>  include/hw/arm/smmu-common.h |   5 ++
>  4 files changed, 159 insertions(+), 8 deletions(-)
> 
> diff --git a/hw/arm/smmu-common.c b/hw/arm/smmu-common.c
> index 3c5f724..7e9827d 100644
> --- a/hw/arm/smmu-common.c
> +++ b/hw/arm/smmu-common.c
> @@ -297,6 +297,8 @@ static AddressSpace *smmu_find_add_as(PCIBus *bus, void 
> *opaque, int devfn)
>  sdev->smmu = s;
>  sdev->bus = bus;
>  sdev->devfn = devfn;
> +sdev->cfg_cache_misses = 0;
> +sdev->cfg_cache_hits = 0;
>  
>  memory_region_init_iommu(&sdev->iommu, sizeof(sdev->iommu),
>   s->mrtypename,
> @@ -310,6 +312,24 @@ static AddressSpace *smmu_find_add_as(PCIBus *bus, void 
> *opaque, int devfn)
>  return &sdev->as;
>  }
>  
> +IOMMUMemoryRegion *smmu_iommu_mr(SMMUState *s, uint32_t sid)
> +{
> +uint8_t bus_n, devfn;
> +SMMUPciBus *smmu_bus;
> +SMMUDevice *smmu;
> +
> +bus_n = PCI_BUS_NUM(sid);
> +smmu_bus = smmu_find_smmu_pcibus(s, bus_n);
> +if (smmu_bus) {
> +devfn = sid & 0x7;
> +smmu = smmu_bus->pbdev[devfn];
> +if (smmu) {
> +return &smmu->iommu;
> +}
> +}
> +return NULL;
> +}
> +
>  static void smmu_base_realize(DeviceState *dev, Error **errp)
>  {
>  SMMUState *s = ARM_SMMU(dev);
> @@ -321,7 +341,7 @@ static void smmu_base_realize(DeviceState *dev, Error 
> **errp)
>  error_propagate(errp, local_err);
>  return;
>  }
> -
> +s->configs = g_hash_table_new_full(NULL, NULL, NULL, g_free);
>  s->smmu_pcibus_by_busptr = g_hash_table_new(NULL, NULL);
>  
>  if (s->primary_bus) {
> @@ -333,7 +353,9 @@ static void smmu_base_realize(DeviceState *dev, Error 
> **errp)
>  
>  static void smmu_base_reset(DeviceState *dev)
>  {
> -/* will be filled later on */
> +SMMUState *s = ARM_SMMU(dev);
> +
> +g_hash_table_remove_all(s->configs);
>  }
>  
>  static Property smmu_dev_properties[] = {
> diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
> index 42dc521..d3b64c2 100644
> --- a/hw/arm/smmuv3.c
> +++ b/hw/arm/smmuv3.c
> @@ -537,6 +537,58 @@ static int smmuv3_decode_config(IOMMUMemoryRegion *mr, 
> SMMUTransCfg *cfg,
>  return decode_cd(cfg, &cd, event);
>  }
>  
> +/**
> + * smmuv3_get_config - Look up for a cached copy of configuration data for
> + * @sdev and on cache miss performs a configuration structure decoding from
> + * guest RAM.
> + *
> + * @sdev: SMMUDevice handle
> + * @event: output event info
> + *
> + * The configuration cache contains data resulting from both STE and CD
> + * decoding under the form of an SMMUTransCfg struct. The hash table is 
> indexed
> + * by the SMMUDevice handle.
> + */
> +static SMMUTransCfg *smmuv3_get_config(SMMUDevice *sdev, SMMUEventInfo 
> *event)
> +{
> +SMMUv3State *s = sdev->smmu;
> +SMMUState *bc = &s->smmu_state;
> +SMMUTransCfg *cfg;
> +
> +cfg = g_hash_table_lookup(bc->configs, sdev);
> +if (cfg) {
> +sdev->cfg_cache_hits += 1;
> +trace_smmuv3_config_cache_hit(smmu_get_sid(sdev),
> +sdev->cfg_cache_hits, sdev->cfg_cache_misses,
> +100 * sdev->cfg_cache_hits /
> +(sdev->cfg_cache_hits + sdev->cfg_cache_misses));
> +} else {
> +sdev->cfg_cache_misses += 1;
> +trace_smmuv3_config_cache_miss(smmu_get_sid(sdev),
> +sdev->cfg_cache_hits, sdev->cfg_cache_misses,
> +100 * sdev->cfg_cache_hits /
> +

Re: [Qemu-devel] [PATCH v4 01/10] block: Introduce API for copy offloading

2018-05-17 Thread Stefan Hajnoczi

On Fri, May 11, 2018 at 08:08:14PM +0800, Fam Zheng wrote:
> Introduce the bdrv_co_copy_range() API for copy offloading.  Block
> drivers implementing this API support efficient copy operations that
> avoid reading each block from the source device and writing it to the
> destination devices.  Examples of copy offload primitives are SCSI
> EXTENDED COPY and Linux copy_file_range(2).
> 
> Signed-off-by: Fam Zheng 
> ---
>  block/io.c| 96 
> +++
>  include/block/block.h | 32 
>  include/block/block_int.h | 38 +++
>  3 files changed, 166 insertions(+)

Reviewed-by: Stefan Hajnoczi 


Re: [Qemu-devel] [edk2] [PATCH 4/4] ovmf: process TPM PPI request in AfterConsole()

2018-05-17 Thread Laszlo Ersek

On 05/15/18 14:30, marcandre.lur...@redhat.com wrote:
> From: Marc-André Lureau 
> 
> Call Tcg2PhysicalPresenceLibProcessRequest() to process pending PPI
> requests from PlatformBootManagerAfterConsole().
> 
> Laszlo's understanding of edk2 is that the PPI operation processing was
> meant to occur *entirely* before End-Of-Dxe, so that 3rd party UEFI
> drivers couldn't interfere with PPI opcode processing *at all*.
> 
> He suggested that we should *not* call
> Tcg2PhysicalPresenceLibProcessRequest() from BeforeConsole(). Because,
> an "auth" console, i.e. one that does not depend on a 3rd party
> driver, is *in general* impossible to guarantee. Instead we could opt
> to trust 3rd party drivers, and use the "normal" console(s) in
> AfterConsole(), in order to let the user confirm the PPI requests. It
> will depend on the user to enable Secure Boot, so that the
> trustworthiness of those 3rd party drivers is ensured. If an attacker
> roots the guest OS from within, queues some TPM2 PPI requests, and
> also modifies drivers on the EFI system partition and/or in GPU option
> ROMs (?), then those drivers will not load after guest reboot, and
> thus the dependent console(s) won't be used for confirming the PPI
> requests.
> 
> Signed-off-by: Marc-André Lureau 
> ---
>  OvmfPkg/Library/PlatformBootManagerLib/BdsPlatform.c  | 8 
>  .../PlatformBootManagerLib/PlatformBootManagerLib.inf | 2 ++
>  2 files changed, 10 insertions(+)
> 
> diff --git a/OvmfPkg/Library/PlatformBootManagerLib/BdsPlatform.c 
> b/OvmfPkg/Library/PlatformBootManagerLib/BdsPlatform.c
> index 004b753f4d26..8b1beaa3e207 100644
> --- a/OvmfPkg/Library/PlatformBootManagerLib/BdsPlatform.c
> +++ b/OvmfPkg/Library/PlatformBootManagerLib/BdsPlatform.c
> @@ -16,6 +16,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  
>  //
> @@ -1410,6 +1411,13 @@ PlatformBootManagerAfterConsole (
>//
>PciAcpiInitialization ();
>  
> +
> +  //
> +  // Process TPM PPI request
> +  //
> +  Tcg2PhysicalPresenceLibProcessRequest (NULL);
> +
> +

Please just keep one empty line before and after the new code. With that
cleanup, for this patch:

Reviewed-by: Laszlo Ersek 

This series is very nice work IMO, thank you both Stefan and
Marc-André. I hope v2 can be merged!

Thanks!
Laszlo

>//
>// Process QEMU's -kernel command line option
>//
> diff --git 
> a/OvmfPkg/Library/PlatformBootManagerLib/PlatformBootManagerLib.inf 
> b/OvmfPkg/Library/PlatformBootManagerLib/PlatformBootManagerLib.inf
> index 27789b7377bc..4b72c44bcf0a 100644
> --- a/OvmfPkg/Library/PlatformBootManagerLib/PlatformBootManagerLib.inf
> +++ b/OvmfPkg/Library/PlatformBootManagerLib/PlatformBootManagerLib.inf
> @@ -38,6 +38,7 @@ [Packages]
>IntelFrameworkModulePkg/IntelFrameworkModulePkg.dec
>SourceLevelDebugPkg/SourceLevelDebugPkg.dec
>OvmfPkg/OvmfPkg.dec
> +  SecurityPkg/SecurityPkg.dec
>  
>  [LibraryClasses]
>BaseLib
> @@ -56,6 +57,7 @@ [LibraryClasses]
>LoadLinuxLib
>QemuBootOrderLib
>UefiLib
> +  Tcg2PhysicalPresenceLib
>  
>  [Pcd]
>gUefiOvmfPkgTokenSpaceGuid.PcdEmuVariableEvent
>

Re: [Qemu-devel] [PATCH v7 11/11] tests: functional tests for QMP command set-numa-node

2018-05-17 Thread Igor Mammedov

On Wed, 16 May 2018 19:12:30 -0300
Eduardo Habkost  wrote:

> On Fri, May 04, 2018 at 10:37:49AM +0200, Igor Mammedov wrote:
> >  * start QEMU with 2 unmapped cpus,
> >  * while in preconfig state
> > * add 2 numa nodes
> > * assign cpus to them
> >  * exit preconfig and in running state check that cpus
> >are mapped correctly.
> > 
> > Signed-off-by: Igor Mammedov 
> > ---
> > v6:
> >   * replace 'cont' with 'exit-preconfig' command
> > v5:
> >   * s/qobject_to_qdict(/qobject_to(QDict,/
> >   * s/-preconfig/--preconfig/
> > v4:
> >   * drop duplicate is_err() and reuse qmp_rsp_is_err() which is moved
> > to generic file libqtest.c. (Eric Blake )
> > 
> > FIXUP! tests: functional tests for QMP command  set-numa-node
> > ---
> >  tests/libqtest.h  |  9 
> >  tests/libqtest.c  |  7 +++
> >  tests/numa-test.c | 61 
> > +++
> >  tests/qmp-test.c  |  7 ---
> >  4 files changed, 77 insertions(+), 7 deletions(-)
> > 
> > diff --git a/tests/libqtest.h b/tests/libqtest.h
> > index cbe8df4..ac52872 100644
> > --- a/tests/libqtest.h
> > +++ b/tests/libqtest.h
> > @@ -972,4 +972,13 @@ void qtest_qmp_device_add(const char *driver, const 
> > char *id, const char *fmt,
> >   */
> >  void qtest_qmp_device_del(const char *id);
> >  
> > +/**
> > + * qmp_rsp_is_err:
> > + * @rsp: QMP response to check for error
> > + *
> > + * Test @rsp for error and discard @rsp.
> > + * Returns 'true' if there is error in @rsp and 'false' otherwise.
> > + */
> > +bool qmp_rsp_is_err(QDict *rsp);
> > +
> >  #endif
> > diff --git a/tests/libqtest.c b/tests/libqtest.c
> > index 6f33a37..33426d5 100644
> > --- a/tests/libqtest.c
> > +++ b/tests/libqtest.c
> > @@ -1098,3 +1098,10 @@ void qtest_qmp_device_del(const char *id)
> >  QDECREF(response1);
> >  QDECREF(response2);
> >  }
> > +
> > +bool qmp_rsp_is_err(QDict *rsp)
> > +{
> > +QDict *error = qdict_get_qdict(rsp, "error");
> > +QDECREF(rsp);  
> 
> 
> Oops:
> 
>   tests/libqtest.c: In function ‘qmp_rsp_is_err’:
>   tests/libqtest.c:1105:5: error: implicit declaration of function ‘QDECREF’ 
> [-Werror=implicit-function-declaration]
>QDECREF(rsp);
>^
>   tests/libqtest.c:1105:5: error: nested extern declaration of ‘QDECREF’ 
> [-Werror=nested-externs]
> 
> I've fixed this on numa-next, replaced QDECREF with object_unref.
I guess QDECREF was removed while the patch was sitting on the list.

Re: [Qemu-devel] [edk2] [PATCH 2/4] ovmf: add QemuTpm.h header

2018-05-17 Thread Laszlo Ersek

On 05/15/18 14:30, marcandre.lur...@redhat.com wrote:
> From: Marc-André Lureau 
> 
> Add some common macros and type definitions corresponding to the QEMU
> TPM interface.
> 
> Signed-off-by: Marc-André Lureau 
> ---
>  OvmfPkg/Include/IndustryStandard/QemuTpm.h | 67 ++
>  1 file changed, 67 insertions(+)
>  create mode 100644 OvmfPkg/Include/IndustryStandard/QemuTpm.h
> 
> diff --git a/OvmfPkg/Include/IndustryStandard/QemuTpm.h 
> b/OvmfPkg/Include/IndustryStandard/QemuTpm.h
> new file mode 100644
> index ..054cf79374b5
> --- /dev/null
> +++ b/OvmfPkg/Include/IndustryStandard/QemuTpm.h
> @@ -0,0 +1,67 @@
> +/** @file
> +  Macro and type definitions corresponding to the QEMU TPM interface.
> +
> +  Refer to "docs/specs/tpm.txt" in the QEMU source directory.
> +
> +  Copyright (C) 2018, Red Hat, Inc.
> +  Copyright (c) 2018, IBM Corporation. All rights reserved.
> +
> +  This program and the accompanying materials are licensed and made available
> +  under the terms and conditions of the BSD License which accompanies this
> +  distribution. The full text of the license may be found at
> +  http://opensource.org/licenses/bsd-license.php
> +
> +  THE PROGRAM IS DISTRIBUTED UNDER THE BSD LICENSE ON AN "AS IS" BASIS, 
> WITHOUT
> +  WARRANTIES OR REPRESENTATIONS OF ANY KIND, EITHER EXPRESS OR IMPLIED.
> +**/
> +
> +#ifndef __QEMU_TPM_H__
> +#define __QEMU_TPM_H__
> +
> +#include 
> +
> +/* whether function is blocked by BIOS settings; bits 0, 1, 2 */
> +#define QEMU_TPM_PPI_FUNC_NOT_IMPLEMENTED (0 << 0)
> +#define QEMU_TPM_PPI_FUNC_BIOS_ONLY   (1 << 0)
> +#define QEMU_TPM_PPI_FUNC_BLOCKED (2 << 0)
> +#define QEMU_TPM_PPI_FUNC_ALLOWED_USR_REQ (3 << 0)
> +#define QEMU_TPM_PPI_FUNC_ALLOWED_USR_NOT_REQ (4 << 0)
> +#define QEMU_TPM_PPI_FUNC_MASK(7 << 0)
> +
> +//
> +// The following structure is shared between firmware and ACPI.
> +//
> +#pragma pack (1)
> +typedef struct {
> +  UINT8 Func[256];   /* func */
> +  UINT8 In;  /* ppin */
> +  UINT32 Ip; /* ppip */
> +  UINT32 Response;   /* pprp */
> +  UINT32 Request;/* pprq */
> +  UINT32 RequestParameter;   /* pprm */
> +  UINT32 LastRequest;/* lppr */
> +  UINT32 FRet;   /* fret */
> +  UINT8 Res1[0x40];  /* res1 */
> +  UINT8 NextStep;/* next_step */
> +} QEMU_TPM_PPI;
> +#pragma pack ()
> +
> +//
> +// The following structure is for the fw_cfg etc/tpm/config file.
> +//
> +#pragma pack (1)
> +typedef struct {
> +  UINT32 PpiAddress;
> +  UINT8 TpmVersion;
> +  UINT8 PpiVersion;
> +} QEMU_FWCFG_TPM_CONFIG;
> +#pragma pack ()
> +
> +#define QEMU_TPM_VERSION_UNSPEC0
> +#define QEMU_TPM_VERSION_1_2   1
> +#define QEMU_TPM_VERSION_2 2
> +
> +#define QEMU_TPM_PPI_VERSION_NONE  0
> +#define QEMU_TPM_PPI_VERSION_1_30  1
> +
> +#endif
> 

(1) Please update the subject line as discussed earlier; for example:

OvmfPkg/IndustryStandard: add QemuTpm.h header

(2) Please convert the file to CRLF.

(3) Please use the "// ..." comment style near the fields of QEMU_TPM_PPI.

(4) Please align the member identifiers in each of QEMU_TPM_PPI and
QEMU_FWCFG_TPM_CONFIG -- in practice this means inserting another space
char after each "UINT8" type name.

With those changes:

Acked-by: Laszlo Ersek 

Thanks!
Laszlo
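
As a side note to the review above, here is a minimal sketch (not part of the patch) of how the packed layout of the two structures could be checked at compile time. It assumes uint8_t and uint32_t as stand-ins for the edk2 UINT8 and UINT32 types, and a compiler that honors "#pragma pack"; the expected sizes are simply the sums of the member sizes listed in the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the edk2 base types (assumption for this sketch). */
typedef uint8_t  UINT8;
typedef uint32_t UINT32;

#pragma pack (1)
typedef struct {
  UINT8  Func[256];          // func
  UINT8  In;                 // ppin
  UINT32 Ip;                 // ppip
  UINT32 Response;           // pprp
  UINT32 Request;            // pprq
  UINT32 RequestParameter;   // pprm
  UINT32 LastRequest;        // lppr
  UINT32 FRet;               // fret
  UINT8  Res1[0x40];         // res1
  UINT8  NextStep;           // next_step
} QEMU_TPM_PPI;

typedef struct {
  UINT32 PpiAddress;
  UINT8  TpmVersion;
  UINT8  PpiVersion;
} QEMU_FWCFG_TPM_CONFIG;
#pragma pack ()

//
// With byte packing there is no padding, so the sizes are plain sums:
// 256 + 1 + 6 * 4 + 0x40 + 1 = 346, and 4 + 1 + 1 = 6.
//
_Static_assert (sizeof (QEMU_TPM_PPI) == 346, "unexpected PPI layout");
_Static_assert (sizeof (QEMU_FWCFG_TPM_CONFIG) == 6, "unexpected config layout");
```

Without the pack pragma, "Ip" would typically be aligned to offset 260 rather than 257, and firmware and ACPI would disagree about where each field lives.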

[Qemu-devel] [PATCH v4 01/14] memory-device: drop assert related to align and start of address space

2018-05-17 Thread David Hildenbrand

The start of the address space does not have to be aligned for the
search. Handle this case explicitly when starting the search for a new
address.

Signed-off-by: David Hildenbrand 
---
 hw/mem/memory-device.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/hw/mem/memory-device.c b/hw/mem/memory-device.c
index 3e04f3954e..361d38bfc5 100644
--- a/hw/mem/memory-device.c
+++ b/hw/mem/memory-device.c
@@ -116,7 +116,6 @@ uint64_t memory_device_get_free_addr(MachineState *ms, 
const uint64_t *hint,
 address_space_start = ms->device_memory->base;
 address_space_end = address_space_start +
 memory_region_size(&ms->device_memory->mr);
-g_assert(QEMU_ALIGN_UP(address_space_start, align) == address_space_start);
 g_assert(address_space_end >= address_space_start);
 
 memory_device_check_addable(ms, size, errp);
@@ -149,7 +148,7 @@ uint64_t memory_device_get_free_addr(MachineState *ms, 
const uint64_t *hint,
 return 0;
 }
 } else {
-new_addr = address_space_start;
+new_addr = QEMU_ALIGN_UP(address_space_start, align);
 }
 
 /* find address range that will fit new memory device */
-- 
2.14.3
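
For illustration only, a minimal sketch of the effect of the second hunk. align_up() is a hypothetical stand-in for QEMU_ALIGN_UP (it assumes a non-zero alignment), and first_candidate() is an invented name for the no-hint branch of the search; neither is code from the patch.

```c
#include <stdint.h>

/* Hypothetical stand-in for QEMU_ALIGN_UP; align must be non-zero. */
static uint64_t align_up(uint64_t n, uint64_t align)
{
    return (n + align - 1) / align * align;
}

/*
 * First address the search tries when no hint is given: the start of
 * the device memory region, rounded up to the requested alignment.
 */
static uint64_t first_candidate(uint64_t address_space_start, uint64_t align)
{
    return align_up(address_space_start, align);
}
```

An already aligned base is returned unchanged, so the old behaviour is preserved for every configuration that used to pass the assertion; an unaligned base now moves up to the next aligned address instead of aborting.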

[Qemu-devel] [PATCH v4 10/14] memory-device: new functions to handle plug/unplug

2018-05-17 Thread David Hildenbrand

We will need a handful of new functions:
- set_addr(): To set the calculated address
- get_memory_region(): To add it to the memory region container
- get_align(): If the device has any specific alignment requirements

Using these and the existing functions, we can properly plug/unplug
memory devices.

Signed-off-by: David Hildenbrand 
---
 include/hw/mem/memory-device.h | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/include/hw/mem/memory-device.h b/include/hw/mem/memory-device.h
index 2853b084b5..62d906be50 100644
--- a/include/hw/mem/memory-device.h
+++ b/include/hw/mem/memory-device.h
@@ -29,14 +29,24 @@ typedef struct MemoryDeviceState {
 Object parent_obj;
 } MemoryDeviceState;
 
+/*
+ * MemoryDeviceClass functions should only be called on realized
+ * MemoryDevice instances.
+ */
 typedef struct MemoryDeviceClass {
 InterfaceClass parent_class;
 
+/* required functions that have to be implemented */
 uint64_t (*get_addr)(const MemoryDeviceState *md);
+void (*set_addr)(MemoryDeviceState *md, uint64_t addr);
+MemoryRegion *(*get_memory_region)(MemoryDeviceState *md);
 uint64_t (*get_plugged_size)(const MemoryDeviceState *md);
 uint64_t (*get_region_size)(const MemoryDeviceState *md);
 void (*fill_device_info)(const MemoryDeviceState *md,
  MemoryDeviceInfo *info);
+
+/* optional functions that can be implemented */
+uint64_t (*get_align)(const MemoryDeviceState *md);
 } MemoryDeviceClass;
 
 MemoryDeviceInfoList *qmp_memory_device_list(void);
-- 
2.14.3

[Qemu-devel] [PATCH v4 06/14] spapr: prepare for multi stage hotplug handlers

2018-05-17 Thread David Hildenbrand

For multi stage hotplug handlers, we'll have to do some error handling
in some hotplug functions, so let's use a local error variable (except
for unplug requests).

Also, add code to pass control to the final stage hotplug handler at the
parent bus.

Signed-off-by: David Hildenbrand 
---
 hw/ppc/spapr.c | 54 +++---
 1 file changed, 43 insertions(+), 11 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index ebf30dd60b..b7c5c95f7a 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -3571,27 +3571,48 @@ static void spapr_machine_device_plug(HotplugHandler 
*hotplug_dev,
 {
 MachineState *ms = MACHINE(hotplug_dev);
 sPAPRMachineClass *smc = SPAPR_MACHINE_GET_CLASS(ms);
+Error *local_err = NULL;
 
+/* final stage hotplug handler */
 if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
 int node;
 
 if (!smc->dr_lmb_enabled) {
-error_setg(errp, "Memory hotplug not supported for this machine");
-return;
+error_setg(&local_err,
+   "Memory hotplug not supported for this machine");
+goto out;
 }
-node = object_property_get_uint(OBJECT(dev), PC_DIMM_NODE_PROP, errp);
-if (*errp) {
-return;
+node = object_property_get_uint(OBJECT(dev), PC_DIMM_NODE_PROP,
+&local_err);
+if (local_err) {
+goto out;
 }
 if (node < 0 || node >= MAX_NODES) {
-error_setg(errp, "Invaild node %d", node);
-return;
+error_setg(&local_err, "Invaild node %d", node);
+goto out;
 }
 
-spapr_memory_plug(hotplug_dev, dev, node, errp);
+spapr_memory_plug(hotplug_dev, dev, node, &local_err);
 } else if (object_dynamic_cast(OBJECT(dev), TYPE_SPAPR_CPU_CORE)) {
-spapr_core_plug(hotplug_dev, dev, errp);
+spapr_core_plug(hotplug_dev, dev, &local_err);
+} else if (dev->parent_bus && dev->parent_bus->hotplug_handler) {
+hotplug_handler_plug(dev->parent_bus->hotplug_handler, dev, 
&local_err);
+}
+out:
+error_propagate(errp, local_err);
+}
+
+static void spapr_machine_device_unplug(HotplugHandler *hotplug_dev,
+DeviceState *dev, Error **errp)
+{
+Error *local_err = NULL;
+
+/* final stage hotplug handler */
+if (dev->parent_bus && dev->parent_bus->hotplug_handler) {
+hotplug_handler_unplug(dev->parent_bus->hotplug_handler, dev,
+   &local_err);
 }
+error_propagate(errp, local_err);
 }
 
 static void spapr_machine_device_unplug_request(HotplugHandler *hotplug_dev,
@@ -3618,17 +3639,27 @@ static void 
spapr_machine_device_unplug_request(HotplugHandler *hotplug_dev,
 return;
 }
 spapr_core_unplug_request(hotplug_dev, dev, errp);
+} else if (dev->parent_bus && dev->parent_bus->hotplug_handler) {
+hotplug_handler_unplug_request(dev->parent_bus->hotplug_handler, dev,
+   errp);
 }
 }
 
 static void spapr_machine_device_pre_plug(HotplugHandler *hotplug_dev,
   DeviceState *dev, Error **errp)
 {
+Error *local_err = NULL;
+
+/* final stage hotplug handler */
 if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
-spapr_memory_pre_plug(hotplug_dev, dev, errp);
+spapr_memory_pre_plug(hotplug_dev, dev, &local_err);
 } else if (object_dynamic_cast(OBJECT(dev), TYPE_SPAPR_CPU_CORE)) {
-spapr_core_pre_plug(hotplug_dev, dev, errp);
+spapr_core_pre_plug(hotplug_dev, dev, &local_err);
+} else if (dev->parent_bus && dev->parent_bus->hotplug_handler) {
+hotplug_handler_pre_plug(dev->parent_bus->hotplug_handler, dev,
+ &local_err);
 }
+error_propagate(errp, local_err);
 }
 
 static HotplugHandler *spapr_get_hotplug_handler(MachineState *machine,
@@ -3988,6 +4019,7 @@ static void spapr_machine_class_init(ObjectClass *oc, 
void *data)
 mc->get_default_cpu_node_id = spapr_get_default_cpu_node_id;
 mc->possible_cpu_arch_ids = spapr_possible_cpu_arch_ids;
 hc->unplug_request = spapr_machine_device_unplug_request;
+hc->unplug = spapr_machine_device_unplug;
 
 smc->dr_lmb_enabled = true;
 mc->default_cpu_type = POWERPC_CPU_TYPE_NAME("power8_v2.0");
-- 
2.14.3

Re: [Qemu-devel] [PATCH V7 10/17] qmp event: Add COLO_EXIT event to notify users while exited COLO

2018-05-17 Thread Markus Armbruster

Zhang Chen  writes:

> On Tue, May 15, 2018 at 10:29 PM, Markus Armbruster 
> wrote:
>
>> Zhang Chen  writes:
>>
>> > From: zhanghailiang 
>> >
>> > If some errors happen during VM's COLO FT stage, it's important to
>> > notify the users of this event. Together with 'x-colo-lost-heartbeat',
>> > Users can intervene in COLO's failover work immediately.
>> > If users don't want to get involved in COLO's failover verdict,
>> > it is still necessary to notify users that we exited COLO mode.
>> >
>> > Signed-off-by: zhanghailiang 
>> > Signed-off-by: Li Zhijian 
>> > Signed-off-by: Zhang Chen 
>> > Reviewed-by: Eric Blake 
>> > ---
>> >  migration/colo.c| 20 
>> >  qapi/migration.json | 37 +
>> >  2 files changed, 57 insertions(+)
>> >
>> > diff --git a/migration/colo.c b/migration/colo.c
>> > index c083d36..8ca6381 100644
>> > --- a/migration/colo.c
>> > +++ b/migration/colo.c
>> > @@ -28,6 +28,7 @@
>> >  #include "net/colo-compare.h"
>> >  #include "net/colo.h"
>> >  #include "block/block.h"
>> > +#include "qapi/qapi-events-migration.h"
>> >
>> >  static bool vmstate_loading;
>> >  static Notifier packets_compare_notifier;
>> > @@ -514,6 +515,18 @@ out:
>> >  qemu_fclose(fb);
>> >  }
>> >
>> > +/*
>> > + * There are only two reasons we can go here, some error happened.
>> > + * Or the user triggered failover.
>> > + */
>> > +if (failover_get_state() == FAILOVER_STATUS_NONE) {
>> > +qapi_event_send_colo_exit(COLO_MODE_PRIMARY,
>> > +  COLO_EXIT_REASON_ERROR, NULL);
>> > +} else {
>> > +qapi_event_send_colo_exit(COLO_MODE_PRIMARY,
>> > +  COLO_EXIT_REASON_REQUEST, NULL);
>> > +}
>>
>> Your comment makes me suspect failover_get_state() can only be
>> FAILOVER_STATUS_NONE or FAILOVER_STATUS_REQUIRE here.  Is that correct?
>>
>> If yes, I recommend to add a suitable assertion.

... to make the possible states immediately obvious.  The fact that you
felt a need for a comment is further evidence of non-obviousness.

>
> Yes, and what kinds of 'suitable assertion'? Just for the
> 'failover_get_state()' ?

Here's one way to skin this cat:

  failover_state = failover_get_state();
  if (failover_state == FAILOVER_STATUS_NONE) {
  qapi_event_send_colo_exit(COLO_MODE_PRIMARY,
COLO_EXIT_REASON_ERROR, NULL);
  } else {
  assert(failover_state == FAILOVER_STATUS_REQUIRE);
  qapi_event_send_colo_exit(COLO_MODE_PRIMARY,
COLO_EXIT_REASON_REQUEST, NULL);
  }

Another one:

  switch (failover_get_state()) {
  case FAILOVER_STATUS_NONE:
  qapi_event_send_colo_exit(COLO_MODE_PRIMARY,
COLO_EXIT_REASON_ERROR, NULL);
  break;
  case FAILOVER_STATUS_REQUIRE:
  qapi_event_send_colo_exit(COLO_MODE_PRIMARY,
COLO_EXIT_REASON_REQUEST, NULL);
  break;
  default:
  abort();
  }

Either way, the possible states are immediately obvious.  The run time
check is a nice bonus.

With just your comment, the reader still has to make the connection from
the comment's prose to states, i.e. from "some error happened" to
FAILOVER_STATUS_NONE, and from "user triggered failover" to
FAILOVER_STATUS_REQUIRE.

[...]

[Qemu-devel] [PATCH v3 04/12] intel-iommu: only do page walk for MAP notifiers

2018-05-17 Thread Peter Xu

For UNMAP-only IOMMU notifiers, we don't really need to walk the page
tables.  Speed up that procedure by skipping the page table walk.  That
should boost performance for UNMAP-only notifiers like vhost.

Signed-off-by: Peter Xu 
---
 include/hw/i386/intel_iommu.h |  2 ++
 hw/i386/intel_iommu.c | 43 +++
 2 files changed, 40 insertions(+), 5 deletions(-)

diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
index ee517704e7..9e0a6c1c6a 100644
--- a/include/hw/i386/intel_iommu.h
+++ b/include/hw/i386/intel_iommu.h
@@ -93,6 +93,8 @@ struct VTDAddressSpace {
 IntelIOMMUState *iommu_state;
 VTDContextCacheEntry context_cache_entry;
 QLIST_ENTRY(VTDAddressSpace) next;
+/* Superset of notifier flags that this address space has */
+IOMMUNotifierFlag notifier_flags;
 };
 
 struct VTDBus {
diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 112971638d..9a418abfb6 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -138,6 +138,12 @@ static inline void vtd_iommu_unlock(IntelIOMMUState *s)
 qemu_mutex_unlock(&s->iommu_lock);
 }
 
+/* Whether the address space needs to notify new mappings */
+static inline gboolean vtd_as_notify_mappings(VTDAddressSpace *as)
+{
+return as->notifier_flags & IOMMU_NOTIFIER_MAP;
+}
+
 /* GHashTable functions */
 static gboolean vtd_uint64_equal(gconstpointer v1, gconstpointer v2)
 {
@@ -1433,14 +1439,35 @@ static void 
vtd_iotlb_page_invalidate_notify(IntelIOMMUState *s,
 VTDAddressSpace *vtd_as;
 VTDContextEntry ce;
 int ret;
+hwaddr size = (1 << am) * VTD_PAGE_SIZE;
 
 QLIST_FOREACH(vtd_as, &(s->notifiers_list), next) {
 ret = vtd_dev_to_context_entry(s, pci_bus_num(vtd_as->bus),
 vtd_as->devfn, &ce);
 if (!ret && domain_id == VTD_CONTEXT_ENTRY_DID(ce.hi)) {
-vtd_page_walk(&ce, addr, addr + (1 << am) * VTD_PAGE_SIZE,
-  vtd_page_invalidate_notify_hook,
-  (void *)&vtd_as->iommu, true, s->aw_bits);
+if (vtd_as_notify_mappings(vtd_as)) {
+/*
+ * For MAP-inclusive notifiers, we need to walk the
+ * page table to sync the shadow page table.
+ */
+vtd_page_walk(&ce, addr, addr + size,
+  vtd_page_invalidate_notify_hook,
+  (void *)&vtd_as->iommu, true, s->aw_bits);
+} else {
+/*
+ * For UNMAP-only notifiers, we don't need to walk the
+ * page tables.  We just deliver the PSI down to
+ * invalidate caches.
+ */
+IOMMUTLBEntry entry = {
+.target_as = &address_space_memory,
+.iova = addr,
+.translated_addr = 0,
+.addr_mask = size - 1,
+.perm = IOMMU_NONE,
+};
+memory_region_notify_iommu(&vtd_as->iommu, entry);
+}
 }
 }
 }
@@ -2380,6 +2407,9 @@ static void 
vtd_iommu_notify_flag_changed(IOMMUMemoryRegion *iommu,
 exit(1);
 }
 
+/* Update per-address-space notifier flags */
+vtd_as->notifier_flags = new;
+
 if (old == IOMMU_NOTIFIER_NONE) {
 /* Insert new ones */
 QLIST_INSERT_HEAD(&s->notifiers_list, vtd_as, next);
@@ -2890,8 +2920,11 @@ static void vtd_iommu_replay(IOMMUMemoryRegion 
*iommu_mr, IOMMUNotifier *n)
   PCI_FUNC(vtd_as->devfn),
   VTD_CONTEXT_ENTRY_DID(ce.hi),
   ce.hi, ce.lo);
-vtd_page_walk(&ce, 0, ~0ULL, vtd_replay_hook, (void *)n, false,
-  s->aw_bits);
+if (vtd_as_notify_mappings(vtd_as)) {
+/* This is required only for MAP typed notifiers */
+vtd_page_walk(&ce, 0, ~0ULL, vtd_replay_hook, (void *)n, false,
+  s->aw_bits);
+}
 } else {
 trace_vtd_replay_ce_invalid(bus_n, PCI_SLOT(vtd_as->devfn),
 PCI_FUNC(vtd_as->devfn));
-- 
2.17.0
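
As an illustration of the UNMAP-only path, here is a minimal sketch of how the invalidation size and address mask follow from the address mask order "am" of a page-selective invalidation. The 4 KiB page size and the two flag values are assumptions made for this example (they stand in for VTD_PAGE_SIZE and the IOMMU_NOTIFIER_* flags), and the helper names are invented. The sketch uses a 64-bit shift, so it also works for orders where a plain int shift would overflow.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096ULL   /* assumed value of VTD_PAGE_SIZE */

/* Hypothetical stand-ins for the notifier flag bits. */
#define SKETCH_NOTIFIER_UNMAP 0x1
#define SKETCH_NOTIFIER_MAP   0x2

/* Size in bytes covered by an invalidation of order am (2^am pages). */
static uint64_t psi_size(uint8_t am)
{
    return (1ULL << am) * SKETCH_PAGE_SIZE;
}

/* Mask to put into the UNMAP notification for that range. */
static uint64_t psi_addr_mask(uint8_t am)
{
    return psi_size(am) - 1;
}

/* Only address spaces with a MAP notifier need the page table walk. */
static bool needs_page_walk(int notifier_flags)
{
    return (notifier_flags & SKETCH_NOTIFIER_MAP) != 0;
}
```

For am = 0 this yields a single 4 KiB page; for am = 9 it yields a 2 MiB range, which an UNMAP-only notifier such as vhost can invalidate in one notification without any page table access.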

[Qemu-devel] [PATCH v3 07/12] intel-iommu: trace domain id during page walk

2018-05-17 Thread Peter Xu

This patch only modifies the trace points.

Previously we were tracing page walk levels.  They are redundant since
we already have the page mask (size).  Now we trace something much more
useful: the domain ID of the page walk.  That can be very
useful when we trace more than one device on the same system, so that
we can know which map is for which domain.

Signed-off-by: Peter Xu 
---
 hw/i386/intel_iommu.c | 16 ++--
 hw/i386/trace-events  |  2 +-
 2 files changed, 11 insertions(+), 7 deletions(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index fe5ee77d46..29fcf2b3a8 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -755,6 +755,7 @@ typedef int (*vtd_page_walk_hook)(IOMMUTLBEntry *entry, 
void *private);
  * @notify_unmap: whether we should notify invalid entries
  * @as: VT-d address space of the device
  * @aw: maximum address width
+ * @domain_id: domain ID of the page walk
  */
 typedef struct {
 VTDAddressSpace *as;
@@ -762,17 +763,18 @@ typedef struct {
 void *private;
 bool notify_unmap;
 uint8_t aw;
+uint16_t domain_id;
 } vtd_page_walk_info;
 
-static int vtd_page_walk_one(IOMMUTLBEntry *entry, int level,
- vtd_page_walk_info *info)
+static int vtd_page_walk_one(IOMMUTLBEntry *entry, vtd_page_walk_info *info)
 {
 vtd_page_walk_hook hook_fn = info->hook_fn;
 void *private = info->private;
 
 assert(hook_fn);
-trace_vtd_page_walk_one(level, entry->iova, entry->translated_addr,
-entry->addr_mask, entry->perm);
+trace_vtd_page_walk_one(info->domain_id, entry->iova,
+entry->translated_addr, entry->addr_mask,
+entry->perm);
 return hook_fn(entry, private);
 }
 
@@ -843,7 +845,7 @@ static int vtd_page_walk_level(dma_addr_t addr, uint64_t 
start,
 trace_vtd_page_walk_skip_perm(iova, iova_next);
 goto next;
 }
-ret = vtd_page_walk_one(&entry, level, info);
+ret = vtd_page_walk_one(&entry, info);
 if (ret < 0) {
 return ret;
 }
@@ -855,7 +857,7 @@ static int vtd_page_walk_level(dma_addr_t addr, uint64_t 
start,
  * Translated address is meaningless, zero it.
  */
 entry.translated_addr = 0x0;
-ret = vtd_page_walk_one(&entry, level, info);
+ret = vtd_page_walk_one(&entry, info);
 if (ret < 0) {
 return ret;
 }
@@ -1463,6 +1465,7 @@ static void 
vtd_iotlb_page_invalidate_notify(IntelIOMMUState *s,
 .notify_unmap = true,
 .aw = s->aw_bits,
 .as = vtd_as,
+.domain_id = domain_id,
 };
 
 /*
@@ -2945,6 +2948,7 @@ static void vtd_iommu_replay(IOMMUMemoryRegion *iommu_mr, 
IOMMUNotifier *n)
 .notify_unmap = false,
 .aw = s->aw_bits,
 .as = vtd_as,
+.domain_id = VTD_CONTEXT_ENTRY_DID(ce.hi),
 };
 
 vtd_page_walk(&ce, 0, ~0ULL, &info);
diff --git a/hw/i386/trace-events b/hw/i386/trace-events
index 22d44648af..ca23ba9fad 100644
--- a/hw/i386/trace-events
+++ b/hw/i386/trace-events
@@ -39,7 +39,7 @@ vtd_fault_disabled(void) "Fault processing disabled for 
context entry"
 vtd_replay_ce_valid(uint8_t bus, uint8_t dev, uint8_t fn, uint16_t domain, 
uint64_t hi, uint64_t lo) "replay valid context device 
%02"PRIx8":%02"PRIx8".%02"PRIx8" domain 0x%"PRIx16" hi 0x%"PRIx64" lo 0x%"PRIx64
 vtd_replay_ce_invalid(uint8_t bus, uint8_t dev, uint8_t fn) "replay invalid 
context device %02"PRIx8":%02"PRIx8".%02"PRIx8
 vtd_page_walk_level(uint64_t addr, uint32_t level, uint64_t start, uint64_t 
end) "walk (base=0x%"PRIx64", level=%"PRIu32") iova range 0x%"PRIx64" - 
0x%"PRIx64
-vtd_page_walk_one(uint32_t level, uint64_t iova, uint64_t gpa, uint64_t mask, 
int perm) "detected page level 0x%"PRIx32" iova 0x%"PRIx64" -> gpa 0x%"PRIx64" 
mask 0x%"PRIx64" perm %d"
+vtd_page_walk_one(uint16_t domain, uint64_t iova, uint64_t gpa, uint64_t mask, 
int perm) "domain 0x%"PRIx16" iova 0x%"PRIx64" -> gpa 0x%"PRIx64" mask 
0x%"PRIx64" perm %d"
 vtd_page_walk_skip_read(uint64_t iova, uint64_t next) "Page walk skip iova 
0x%"PRIx64" - 0x%"PRIx64" due to unable to read"
 vtd_page_walk_skip_perm(uint64_t iova, uint64_t next) "Page walk skip iova 
0x%"PRIx64" - 0x%"PRIx64" due to perm empty"
 vtd_page_walk_skip_reserve(uint64_t iova, uint64_t next) "Page walk skip iova 
0x%"PRIx64" - 0x%"PRIx64" due to rsrv set"
-- 
2.17.0

Re: [Qemu-devel] [PATCH v3 6/8] xen_backend: make the xen_feature_grant_copy flag private

2018-05-17 Thread Anthony PERARD

On Fri, May 04, 2018 at 08:26:05PM +0100, Paul Durrant wrote:
> There is no longer any use of this flag outside of the xen_backend code.
> 
> Signed-off-by: Paul Durrant 

Acked-by: Anthony PERARD 

-- 
Anthony PERARD

[Qemu-devel] [PATCH v4 03/14] qdev: let machine hotplug handler to override bus hotplug handler

2018-05-17 Thread David Hildenbrand

From: Igor Mammedov 

This allows the machine to return a hotplug handler other than the
default one for a specific bus-based device type. That is needed to
handle non-trivial plug/unplug sequences that need access to resources
configured outside of the bus where the device is attached.

The returned hotplug handler can then orchestrate wiring in arbitrary
order, chaining other hotplug handlers when needed.

PS:
It could be used for hybrid virtio-mem and virtio-pmem devices
where it will return machine as hotplug handler which will do
necessary wiring at machine level and then pass control down
the chain to bus specific hotplug handler.

Example of top level hotplug handler override and custom plug sequence:

  some_machine_get_hotplug_handler(machine){
      if (object_dynamic_cast(OBJECT(dev), TYPE_SOME_BUS_DEVICE)) {
          return HOTPLUG_HANDLER(machine);
      }
      return NULL;
  }

  some_machine_device_plug(hotplug_dev, dev) {
      if (object_dynamic_cast(OBJECT(dev), TYPE_SOME_BUS_DEVICE)) {
          /* do machine specific initialization */
          some_machine_init_special_device(dev);

          /* pass control to bus specific handler */
          hotplug_handler_plug(dev->parent_bus->hotplug_handler, dev);
      }
  }
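
For illustration only, the lookup order this patch introduces (machine
handler first, parent bus handler as fallback) can be modelled outside
of QEMU. The sketch below uses hypothetical stand-in types (Handler,
Bus, Device, is_special); none of them are the real QEMU definitions,
and the real code goes through qdev_get_machine_hotplug_handler().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for QEMU's types; not the real definitions. */
typedef struct Handler { const char *name; } Handler;
typedef struct Bus { Handler *hotplug_handler; } Bus;
typedef struct Device { Bus *parent_bus; int is_special; } Device;

static Handler machine_handler = { "machine" };

/* Models a machine hook that only claims "special" devices. */
static Handler *machine_get_hotplug_handler(const Device *dev)
{
    assert(dev != NULL);
    return dev->is_special ? &machine_handler : NULL;
}

/* Machine override wins; otherwise fall back to the parent bus handler. */
static Handler *get_hotplug_handler(const Device *dev)
{
    Handler *h = machine_get_hotplug_handler(dev);

    if (h == NULL && dev->parent_bus) {
        h = dev->parent_bus->hotplug_handler;
    }
    return h;
}
```

A device the machine does not claim still resolves to its bus handler,
and a device with neither resolves to NULL.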

Signed-off-by: Igor Mammedov 
Signed-off-by: David Hildenbrand 
---
 hw/core/qdev.c |  6 ++
 include/hw/qdev-core.h | 11 +++
 2 files changed, 13 insertions(+), 4 deletions(-)

diff --git a/hw/core/qdev.c b/hw/core/qdev.c
index f6f92473b8..885286f579 100644
--- a/hw/core/qdev.c
+++ b/hw/core/qdev.c
@@ -261,12 +261,10 @@ HotplugHandler 
*qdev_get_machine_hotplug_handler(DeviceState *dev)
 
 HotplugHandler *qdev_get_hotplug_handler(DeviceState *dev)
 {
-HotplugHandler *hotplug_ctrl;
+HotplugHandler *hotplug_ctrl = qdev_get_machine_hotplug_handler(dev);
 
-if (dev->parent_bus && dev->parent_bus->hotplug_handler) {
+if (hotplug_ctrl == NULL && dev->parent_bus) {
 hotplug_ctrl = dev->parent_bus->hotplug_handler;
-} else {
-hotplug_ctrl = qdev_get_machine_hotplug_handler(dev);
 }
 return hotplug_ctrl;
 }
diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
index 9453588160..e6a8eca558 100644
--- a/include/hw/qdev-core.h
+++ b/include/hw/qdev-core.h
@@ -286,6 +286,17 @@ void qdev_init_nofail(DeviceState *dev);
 void qdev_set_legacy_instance_id(DeviceState *dev, int alias_id,
  int required_for_version);
 HotplugHandler *qdev_get_machine_hotplug_handler(DeviceState *dev);
+/**
+ * qdev_get_hotplug_handler: Get handler responsible for device wiring
+ *
+ * Find HOTPLUG_HANDLER for @dev that provides [pre|un]plug callbacks for it.
+ *
+ * Note: in case @dev has a parent bus, it will be returned as handler unless
+ * machine handler overrides it.
+ *
+ * Returns: pointer to object that implements TYPE_HOTPLUG_HANDLER interface
+ *  or NULL if there aren't any.
+ */
 HotplugHandler *qdev_get_hotplug_handler(DeviceState *dev);
 void qdev_unplug(DeviceState *dev, Error **errp);
 void qdev_simple_device_unplug_cb(HotplugHandler *hotplug_dev,
-- 
2.14.3

[Qemu-devel] [PATCH v3 02/12] intel-iommu: remove IntelIOMMUNotifierNode

2018-05-17 Thread Peter Xu

That is not really necessary.  Remove that node struct and put the
list entry directly into VTDAddressSpace.  It simplifies the code a lot.

Signed-off-by: Peter Xu 
---
 include/hw/i386/intel_iommu.h |  9 ++--
 hw/i386/intel_iommu.c | 41 ++-
 2 files changed, 14 insertions(+), 36 deletions(-)

diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
index 45ec8919b6..220697253f 100644
--- a/include/hw/i386/intel_iommu.h
+++ b/include/hw/i386/intel_iommu.h
@@ -67,7 +67,6 @@ typedef union VTD_IR_TableEntry VTD_IR_TableEntry;
 typedef union VTD_IR_MSIAddress VTD_IR_MSIAddress;
 typedef struct VTDIrq VTDIrq;
 typedef struct VTD_MSIMessage VTD_MSIMessage;
-typedef struct IntelIOMMUNotifierNode IntelIOMMUNotifierNode;
 
 /* Context-Entry */
 struct VTDContextEntry {
@@ -93,6 +92,7 @@ struct VTDAddressSpace {
 MemoryRegion iommu_ir;  /* Interrupt region: 0xfeeX */
 IntelIOMMUState *iommu_state;
 VTDContextCacheEntry context_cache_entry;
+QLIST_ENTRY(VTDAddressSpace) next;
 };
 
 struct VTDBus {
@@ -253,11 +253,6 @@ struct VTD_MSIMessage {
 /* When IR is enabled, all MSI/MSI-X data bits should be zero */
 #define VTD_IR_MSI_DATA  (0)
 
-struct IntelIOMMUNotifierNode {
-VTDAddressSpace *vtd_as;
-QLIST_ENTRY(IntelIOMMUNotifierNode) next;
-};
-
 /* The iommu (DMAR) device state struct */
 struct IntelIOMMUState {
 X86IOMMUState x86_iommu;
@@ -295,7 +290,7 @@ struct IntelIOMMUState {
 GHashTable *vtd_as_by_busptr;   /* VTDBus objects indexed by PCIBus* 
reference */
 VTDBus *vtd_as_by_bus_num[VTD_PCI_BUS_MAX]; /* VTDBus objects indexed by 
bus number */
 /* list of registered notifiers */
-QLIST_HEAD(, IntelIOMMUNotifierNode) notifiers_list;
+QLIST_HEAD(, VTDAddressSpace) notifiers_list;
 
 /* interrupt remapping */
 bool intr_enabled;  /* Whether guest enabled IR */
diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index b359efd6f9..5987b48d43 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -1248,10 +1248,10 @@ static void 
vtd_interrupt_remap_table_setup(IntelIOMMUState *s)
 
 static void vtd_iommu_replay_all(IntelIOMMUState *s)
 {
-IntelIOMMUNotifierNode *node;
+VTDAddressSpace *vtd_as;
 
-QLIST_FOREACH(node, &s->notifiers_list, next) {
-memory_region_iommu_replay_all(&node->vtd_as->iommu);
+QLIST_FOREACH(vtd_as, &s->notifiers_list, next) {
+memory_region_iommu_replay_all(&vtd_as->iommu);
 }
 }
 
@@ -1372,7 +1372,6 @@ static void vtd_iotlb_global_invalidate(IntelIOMMUState 
*s)
 
 static void vtd_iotlb_domain_invalidate(IntelIOMMUState *s, uint16_t domain_id)
 {
-IntelIOMMUNotifierNode *node;
 VTDContextEntry ce;
 VTDAddressSpace *vtd_as;
 
@@ -1381,8 +1380,7 @@ static void vtd_iotlb_domain_invalidate(IntelIOMMUState 
*s, uint16_t domain_id)
 g_hash_table_foreach_remove(s->iotlb, vtd_hash_remove_by_domain,
 &domain_id);
 
-QLIST_FOREACH(node, &s->notifiers_list, next) {
-vtd_as = node->vtd_as;
+QLIST_FOREACH(vtd_as, &s->notifiers_list, next) {
 if (!vtd_dev_to_context_entry(s, pci_bus_num(vtd_as->bus),
   vtd_as->devfn, &ce) &&
 domain_id == VTD_CONTEXT_ENTRY_DID(ce.hi)) {
@@ -1402,12 +1400,11 @@ static void 
vtd_iotlb_page_invalidate_notify(IntelIOMMUState *s,
uint16_t domain_id, hwaddr addr,
uint8_t am)
 {
-IntelIOMMUNotifierNode *node;
+VTDAddressSpace *vtd_as;
 VTDContextEntry ce;
 int ret;
 
-QLIST_FOREACH(node, &(s->notifiers_list), next) {
-VTDAddressSpace *vtd_as = node->vtd_as;
+QLIST_FOREACH(vtd_as, &(s->notifiers_list), next) {
 ret = vtd_dev_to_context_entry(s, pci_bus_num(vtd_as->bus),
vtd_as->devfn, &ce);
 if (!ret && domain_id == VTD_CONTEXT_ENTRY_DID(ce.hi)) {
@@ -2344,8 +2341,6 @@ static void 
vtd_iommu_notify_flag_changed(IOMMUMemoryRegion *iommu,
 {
 VTDAddressSpace *vtd_as = container_of(iommu, VTDAddressSpace, iommu);
 IntelIOMMUState *s = vtd_as->iommu_state;
-IntelIOMMUNotifierNode *node = NULL;
-IntelIOMMUNotifierNode *next_node = NULL;
 
 if (!s->caching_mode && new & IOMMU_NOTIFIER_MAP) {
 error_report("We need to set caching-mode=1 for intel-iommu to enable "
@@ -2354,21 +2349,11 @@ static void 
vtd_iommu_notify_flag_changed(IOMMUMemoryRegion *iommu,
 }
 
 if (old == IOMMU_NOTIFIER_NONE) {
-node = g_malloc0(sizeof(*node));
-node->vtd_as = vtd_as;
-QLIST_INSERT_HEAD(&s->notifiers_list, node, next);
-return;
-}
-
-/* update notifier node with new flags */
-QLIST_FOREACH_SAFE(node, &s->notifiers_list, next, next_node) {
-if (node->vtd_as == vtd_as) {
-if (new == IOMMU_NOTIFIER_NONE) {
-

[Qemu-devel] [PATCH v3 03/12] intel-iommu: add iommu lock

2018-05-17 Thread Peter Xu

Add a per-iommu big lock to protect IOMMU status.  Currently the only
thing to be protected is the IOTLB/context cache, since that can be
accessed even without BQL, e.g., in IO dataplane.

Note that we don't need to protect device page tables since that's fully
controlled by the guest kernel.  However there is still possibility that
malicious drivers will program the device to not obey the rule.  In that
case QEMU can't really do anything useful, instead the guest itself will
be responsible for all uncertainties.
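
As an illustration of the locking convention used in the diff below (a
"_locked" variant for callers that already hold the lock, plus a
wrapper that takes it), here is a minimal standalone sketch. It uses a
pthread mutex and hypothetical names (MiniIOMMU, mini_*) rather than
QemuMutex and the real intel_iommu functions.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical miniature of an IOMMU state: one lock guarding one cache. */
typedef struct MiniIOMMU {
    pthread_mutex_t lock;
    int iotlb_entries;
} MiniIOMMU;

static void mini_init(MiniIOMMU *s)
{
    pthread_mutex_init(&s->lock, NULL);
    s->iotlb_entries = 0;
}

/* Must be called with s->lock held. */
static void mini_reset_iotlb_locked(MiniIOMMU *s)
{
    s->iotlb_entries = 0;
}

/* Takes the lock itself; for callers that do not hold it. */
static void mini_reset_iotlb(MiniIOMMU *s)
{
    pthread_mutex_lock(&s->lock);
    mini_reset_iotlb_locked(s);
    pthread_mutex_unlock(&s->lock);
}

/* Must be called with s->lock held; uses the _locked reset to avoid
 * taking a non-recursive mutex twice. */
static void mini_update_iotlb_locked(MiniIOMMU *s, int max_entries)
{
    assert(max_entries > 0);
    if (s->iotlb_entries >= max_entries) {
        mini_reset_iotlb_locked(s);
    }
    s->iotlb_entries++;
}

/* Translation path: lock once, do all cache work, unlock on exit. */
static int mini_translate(MiniIOMMU *s, int max_entries)
{
    int n;

    pthread_mutex_lock(&s->lock);
    mini_update_iotlb_locked(s, max_entries);
    n = s->iotlb_entries;
    pthread_mutex_unlock(&s->lock);
    return n;
}
```

The point of the split is that the update path, which already runs
under the lock, never calls the self-locking reset.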

Reported-by: Fam Zheng 
Signed-off-by: Peter Xu 
---
 include/hw/i386/intel_iommu.h |  6 +
 hw/i386/intel_iommu.c | 43 +++
 2 files changed, 44 insertions(+), 5 deletions(-)

diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
index 220697253f..ee517704e7 100644
--- a/include/hw/i386/intel_iommu.h
+++ b/include/hw/i386/intel_iommu.h
@@ -300,6 +300,12 @@ struct IntelIOMMUState {
 OnOffAuto intr_eim; /* Toggle for EIM cabability */
 bool buggy_eim; /* Force buggy EIM unless eim=off */
 uint8_t aw_bits;/* Host/IOVA address width (in bits) */
+
+/*
+ * Protects IOMMU states in general.  Currently it protects the
+ * per-IOMMU IOTLB cache, and context entry cache in VTDAddressSpace.
+ */
+QemuMutex iommu_lock;
 };
 
 /* Find the VTD Address space associated with the given bus pointer,
diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 5987b48d43..112971638d 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -128,6 +128,16 @@ static uint64_t vtd_set_clear_mask_quad(IntelIOMMUState 
*s, hwaddr addr,
 return new_val;
 }
 
+static inline void vtd_iommu_lock(IntelIOMMUState *s)
+{
+qemu_mutex_lock(&s->iommu_lock);
+}
+
+static inline void vtd_iommu_unlock(IntelIOMMUState *s)
+{
+qemu_mutex_unlock(&s->iommu_lock);
+}
+
 /* GHashTable functions */
 static gboolean vtd_uint64_equal(gconstpointer v1, gconstpointer v2)
 {
@@ -172,7 +182,7 @@ static gboolean vtd_hash_remove_by_page(gpointer key, 
gpointer value,
 }
 
 /* Reset all the gen of VTDAddressSpace to zero and set the gen of
- * IntelIOMMUState to 1.
+ * IntelIOMMUState to 1.  Must be with IOMMU lock held.
  */
 static void vtd_reset_context_cache(IntelIOMMUState *s)
 {
@@ -197,12 +207,19 @@ static void vtd_reset_context_cache(IntelIOMMUState *s)
 s->context_cache_gen = 1;
 }
 
-static void vtd_reset_iotlb(IntelIOMMUState *s)
+static void vtd_reset_iotlb_locked(IntelIOMMUState *s)
 {
 assert(s->iotlb);
 g_hash_table_remove_all(s->iotlb);
 }
 
+static void vtd_reset_iotlb(IntelIOMMUState *s)
+{
+vtd_iommu_lock(s);
+vtd_reset_iotlb_locked(s);
+vtd_iommu_unlock(s);
+}
+
 static uint64_t vtd_get_iotlb_key(uint64_t gfn, uint16_t source_id,
   uint32_t level)
 {
@@ -215,6 +232,7 @@ static uint64_t vtd_get_iotlb_gfn(hwaddr addr, uint32_t 
level)
 return (addr & vtd_slpt_level_page_mask(level)) >> VTD_PAGE_SHIFT_4K;
 }
 
+/* Must be with IOMMU lock held */
 static VTDIOTLBEntry *vtd_lookup_iotlb(IntelIOMMUState *s, uint16_t source_id,
hwaddr addr)
 {
@@ -235,6 +253,7 @@ out:
 return entry;
 }
 
+/* Must be with IOMMU lock held */
 static void vtd_update_iotlb(IntelIOMMUState *s, uint16_t source_id,
  uint16_t domain_id, hwaddr addr, uint64_t slpte,
  uint8_t access_flags, uint32_t level)
@@ -246,7 +265,7 @@ static void vtd_update_iotlb(IntelIOMMUState *s, uint16_t 
source_id,
 trace_vtd_iotlb_page_update(source_id, addr, slpte, domain_id);
 if (g_hash_table_size(s->iotlb) >= VTD_IOTLB_MAX_SIZE) {
 trace_vtd_iotlb_reset("iotlb exceeds size limit");
-vtd_reset_iotlb(s);
+vtd_reset_iotlb_locked(s);
 }
 
 entry->gfn = gfn;
@@ -1106,7 +1125,7 @@ static bool vtd_do_iommu_translate(VTDAddressSpace 
*vtd_as, PCIBus *bus,
 IntelIOMMUState *s = vtd_as->iommu_state;
 VTDContextEntry ce;
 uint8_t bus_num = pci_bus_num(bus);
-VTDContextCacheEntry *cc_entry = &vtd_as->context_cache_entry;
+VTDContextCacheEntry *cc_entry;
 uint64_t slpte, page_mask;
 uint32_t level;
 uint16_t source_id = vtd_make_source_id(bus_num, devfn);
@@ -1123,6 +1142,10 @@ static bool vtd_do_iommu_translate(VTDAddressSpace 
*vtd_as, PCIBus *bus,
  */
 assert(!vtd_is_interrupt_addr(addr));
 
+vtd_iommu_lock(s);
+
+cc_entry = &vtd_as->context_cache_entry;
+
 /* Try to fetch slpte form IOTLB */
 iotlb_entry = vtd_lookup_iotlb(s, source_id, addr);
 if (iotlb_entry) {
@@ -1182,7 +1205,7 @@ static bool vtd_do_iommu_translate(VTDAddressSpace 
*vtd_as, PCIBus *bus,
  * IOMMU region can be swapped back.
  */
 vtd_pt_enable_fast_path(s, source_id);
-
+vtd_iommu_unlock(s);
 return true;
 }
 
@@ -1203,6 +1226,7 @@

Re: [Qemu-devel] [PATCH v3 09/12] intel-iommu: maintain per-device iova ranges

2018-05-17 Thread Peter Xu

On Thu, May 17, 2018 at 04:59:24PM +0800, Peter Xu wrote:

[...]

> +/* Update local IOVA mapped ranges */
> +if (entry->perm) {
> +if (mapped) {
> +/* If it's exactly the same translation, skip */
> +if (!memcmp(mapped, &target, sizeof(target))) {
> +trace_vtd_page_walk_one_skip_map(entry->iova, 
> entry->addr_mask,
> + entry->translated_addr);
> +return 0;
> +} else {
> +/*
> + * Translation changed.  This should not happen with
> + * "intel_iommu=on,strict", but it can happen when
> + * delayed flushing is used in guest IOMMU driver
> + * (when without "strict") when page A is reused
> + * before its previous unmap (the unmap can still be
> + * queued in the delayed flushing queue).  Now we do

The comment above is wrong and can be ignored for now: as I explained
in the other thread, Linux IOVA deferred flushing won't free an IOVA
range until the unmap is flushed.  The comment below is still valid.

Regards,

> + * our best to remap.  Note that there will be a small
> + * window that we don't have map at all.  But that's
> + * the best effort we can do, and logically
> + * well-behaved guests should not really using this
> + * DMA region yet so we should be very safe.
> + */
> +IOMMUAccessFlags cache_perm = entry->perm;
> +int ret;
> +
> +/* Emulate an UNMAP */
> +entry->perm = IOMMU_NONE;
> +trace_vtd_page_walk_one(info->domain_id,
> +entry->iova,
> +entry->translated_addr,
> +entry->addr_mask,
> +entry->perm);
> +ret = hook_fn(entry, private);
> +if (ret) {
> +return ret;
> +}
> +/* Drop any existing mapping */
> +iova_tree_remove(as->iova_tree, &target);
> +/* Recover the correct permission */
> +entry->perm = cache_perm;
> +}
> +}
> +iova_tree_insert(as->iova_tree, &target);
> +} else {
> +if (!mapped) {
> +/* Skip since we didn't map this range at all */
> +trace_vtd_page_walk_one_skip_unmap(entry->iova, 
> entry->addr_mask);
> +return 0;
> +}
> +iova_tree_remove(as->iova_tree, &target);
> +}
> +
>  trace_vtd_page_walk_one(info->domain_id, entry->iova,
>  entry->translated_addr, entry->addr_mask,
>  entry->perm);

-- 
Peter Xu

Re: [Qemu-devel] [PATCH] nbd/server: introduce NBD_CMD_CACHE

2018-05-17 Thread Vladimir Sementsov-Ogievskiy


Finally, what about this?

13.04.2018 17:31, Vladimir Sementsov-Ogievskiy wrote:

Handle the NBD CACHE command: just do a read, without sending the read
data back. The caching mechanism itself should be implemented by the
exported node's driver chain.
Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  include/block/nbd.h |  3 ++-
  nbd/server.c| 10 ++
  2 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/include/block/nbd.h b/include/block/nbd.h
index fcdcd54502..b4793d0a29 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -135,6 +135,7 @@ typedef struct NBDExtent {
  #define NBD_FLAG_SEND_TRIM (1 << 5) /* Send TRIM (discard) */
  #define NBD_FLAG_SEND_WRITE_ZEROES (1 << 6) /* Send WRITE_ZEROES */
  #define NBD_FLAG_SEND_DF   (1 << 7) /* Send DF (Do not Fragment) */
+#define NBD_FLAG_SEND_CACHE        (1 << 8) /* Send CACHE (prefetch) */
  
  /* New-style handshake (global) flags, sent from server to client, and

 control what will happen during handshake phase. */
@@ -195,7 +196,7 @@ enum {
  NBD_CMD_DISC = 2,
  NBD_CMD_FLUSH = 3,
  NBD_CMD_TRIM = 4,
-/* 5 reserved for failed experiment NBD_CMD_CACHE */
+NBD_CMD_CACHE = 5,
  NBD_CMD_WRITE_ZEROES = 6,
  NBD_CMD_BLOCK_STATUS = 7,
  };
diff --git a/nbd/server.c b/nbd/server.c
index 9e1f227178..30d7d3f444 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -1134,7 +1134,7 @@ static coroutine_fn int nbd_negotiate(NBDClient *client, 
Error **errp)
  int ret;
  const uint16_t myflags = (NBD_FLAG_HAS_FLAGS | NBD_FLAG_SEND_TRIM |
NBD_FLAG_SEND_FLUSH | NBD_FLAG_SEND_FUA |
-  NBD_FLAG_SEND_WRITE_ZEROES);
+  NBD_FLAG_SEND_WRITE_ZEROES | 
NBD_FLAG_SEND_CACHE);
  bool oldStyle;
  
  /* Old style negotiation header, no room for options

@@ -1826,7 +1826,9 @@ static int nbd_co_receive_request(NBDRequestData *req, 
NBDRequest *request,
  return -EIO;
  }
  
-if (request->type == NBD_CMD_READ || request->type == NBD_CMD_WRITE) {

+if (request->type == NBD_CMD_READ || request->type == NBD_CMD_WRITE ||
+request->type == NBD_CMD_CACHE)
+{
  if (request->len > NBD_MAX_BUFFER_SIZE) {
  error_setg(errp, "len (%" PRIu32" ) is larger than max len (%u)",
 request->len, NBD_MAX_BUFFER_SIZE);
@@ -1911,7 +1913,7 @@ static coroutine_fn int nbd_do_cmd_read(NBDClient 
*client, NBDRequest *request,
  int ret;
  NBDExport *exp = client->exp;
  
-assert(request->type == NBD_CMD_READ);

+assert(request->type == NBD_CMD_READ || request->type == NBD_CMD_CACHE);
  
  /* XXX: NBD Protocol only documents use of FUA with WRITE */

  if (request->flags & NBD_CMD_FLAG_FUA) {
@@ -1930,7 +1932,7 @@ static coroutine_fn int nbd_do_cmd_read(NBDClient 
*client, NBDRequest *request,
  
  ret = blk_pread(exp->blk, request->from + exp->dev_offset, data,

  request->len);
-if (ret < 0) {
+if (ret < 0 || request->type == NBD_CMD_CACHE) {
  return nbd_send_generic_reply(client, request->handle, ret,
"reading from file failed", errp);
  }



--
Best regards,
Vladimir

Re: [Qemu-devel] [RFC PATCH 06/12] qapi: add bitmap info

2018-05-17 Thread Vladimir Sementsov-Ogievskiy


17.05.2018 00:15, John Snow wrote:


On 05/14/2018 10:30 AM, Vladimir Sementsov-Ogievskiy wrote:

12.05.2018 04:25, John Snow wrote:

Add some of the necessary scaffolding for reporting bitmap information.

Signed-off-by: John Snow 
---
   qapi/block-core.json | 60
+++-
   1 file changed, 59 insertions(+), 1 deletion(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index c50517bff3..8f33f41ce7 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -33,6 +33,61 @@
   'date-sec': 'int', 'date-nsec': 'int',
   'vm-clock-sec': 'int', 'vm-clock-nsec': 'int' } }
   +##
+# @BitmapTypeEnum:
+#
+# An enumeration of possible serialized bitmap types.
+#
+# @dirty-tracking: This bitmap records information on dirty
+#  segments within the file.
+#
+# @unknown: This bitmap is of an unknown or reserved type.
+#
+# Since: 2.13
+##
+{ 'enum': 'BitmapTypeEnum', 'data': [ 'dirty-tracking', 'unknown' ] }
+
+##
+# @BitmapFlagEnum:
+#
+# An enumeration of possible flags for serialized bitmaps.
+#
+# @in-use: This bitmap is considered to be in-use, and may now be
inconsistent.
+#
+# @auto: This bitmap must reflect any and all changes to the file it
describes.
+#    The type of this bitmap must be @DirtyTrackingBitmap.

Logical, but I don't see this restriction in the spec. Maybe we need to
update the spec.


1: auto

"The bitmap must reflect all changes of the virtual disk by any
application that would write to this qcow2 file (including writes,
snapshot switching, etc.). The type of this bitmap must be 'dirty
tracking bitmap'."

Actually, this looks correct now that I'm looking at the spec again.
I've used a terser phrasing but I think it's correct.


Another thought: why "must"? We know nothing about other types. Maybe
for another type this flag will have a similar or a different meaning. To me,
this flag looks like a property of a dirty-tracking bitmap, not something
that dictates that type.





+#
+# @extra-data-compatible: This bitmap has extra information
associated with it.

No, this flag means that the extra data is compatible. So if you don't
know what the extra data is, you can still read and modify the bitmap,
leaving the data as is. If this flag is unset and there is some extra
data, the bitmap must not be used.
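
Put as code, the rule I mean is roughly the following. This is only a
sketch: the helper name and the bit position used for the flag are
assumptions made for the example, so check docs/interop/qcow2.txt for
the authoritative layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit position, for illustration only. */
#define FLAG_EXTRA_DATA_COMPATIBLE (1u << 2)

/*
 * A reader that does not understand the extra data may use the bitmap
 * only if there is no extra data, or if the flag says it is compatible.
 */
static bool bitmap_may_be_used(uint32_t flags, uint32_t extra_data_size)
{
    if (extra_data_size == 0) {
        return true;
    }
    return (flags & FLAG_EXTRA_DATA_COMPATIBLE) != 0;
}
```

So the flag says nothing about whether extra data exists; it only
says whether unknown extra data can safely be ignored.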

Finally, this spec should be consistent with (or, maybe better, duplicate)
the spec in docs/interop/qcow2.txt.

I might suggest a rewrite of this portion of the spec as it's a little
unclear to me.

I've given this portion a rewrite.


+#
+# @unknown: This bitmap has unknown or reserved properties.

Better is only "reserved flags" (not unknown and not properties), they
are reserved by spec.


+#
+# Since: 2.13
+##
+{ 'enum': 'BitmapFlagEnum', 'data': [ 'in-use', 'auto',
+  'extra-data-compatible',
'unknown' ] }
+
+##
+# @BitmapInfo:
+#
+# @name: The name of the bitmap.
+#
+# @type: The type of bitmap.
+#
+# @granularity: Bitmap granularity, in bytes.
+#
+# @count: Overall bitmap dirtiness, in bytes.
+#
+# @flags: Bitmap flags, if any.
+#
+# Since: 2.13
+#
+##
+{ 'struct': 'BitmapInfo',
+  'data': { 'name': 'str', 'type': 'BitmapTypeEnum', 'granularity':
'int',
+    'count': 'int', '*flags': ['BitmapFlagEnum']

It may be worth adding 'has-extra-data'.


+  }
+}
+
   ##
   # @ImageInfoSpecificQCow2EncryptionBase:
   #
@@ -69,6 +124,8 @@
   # @encrypt: details about encryption parameters; only set if image
   #   is encrypted (since 2.10)
   #
+# @bitmaps: list of image bitmaps (since 2.13)
+#
   # Since: 1.7
   ##
   { 'struct': 'ImageInfoSpecificQCow2',
@@ -77,7 +134,8 @@
     '*lazy-refcounts': 'bool',
     '*corrupt': 'bool',
     'refcount-bits': 'int',
-  '*encrypt': 'ImageInfoSpecificQCow2Encryption'
+  '*encrypt': 'ImageInfoSpecificQCow2Encryption',
+  '*bitmaps': ['BitmapInfo']
     } }
     ##





--
Best regards,
Vladimir

Re: [Qemu-devel] [PATCH v4 05/10] file-posix: Implement bdrv_co_copy_range

2018-05-17 Thread Stefan Hajnoczi

On Fri, May 11, 2018 at 08:08:18PM +0800, Fam Zheng wrote:
> With copy_file_range(2), we can implement the bdrv_co_copy_range
> semantics.
> 
> Signed-off-by: Fam Zheng 
> ---
>  block/file-posix.c  | 96 
> +++--
>  include/block/raw-aio.h | 10 --
>  2 files changed, 101 insertions(+), 5 deletions(-)

Reviewed-by: Stefan Hajnoczi 


signature.asc
Description: PGP signature

Re: [Qemu-devel] [PATCH v4 04/10] qcow2: Implement copy offloading

2018-05-17 Thread Stefan Hajnoczi

On Fri, May 11, 2018 at 08:08:17PM +0800, Fam Zheng wrote:
> +static int qcow2_handle_l2meta(BlockDriverState *bs, QCowL2Meta *l2meta)
> +{
> +int ret = 0;
> +
> +while (l2meta != NULL) {
> +QCowL2Meta *next;
> +
> +if (!ret) {
> +ret = qcow2_alloc_cluster_link_l2(bs, l2meta);
> +}
> +
> +/* Take the request off the list of running requests */
> +if (l2meta->nb_clusters != 0) {
> +QLIST_REMOVE(l2meta, next_in_flight);
> +}
> +
> +qemu_co_queue_restart_all(&l2meta->dependent_requests);

A coroutine_fn may only be called by a coroutine_fn.  Please mark
qcow2_handle_l2meta() coroutine_fn.

> @@ -2069,18 +2080,7 @@ static coroutine_fn int 
> qcow2_co_pwritev(BlockDriverState *bs, uint64_t offset,
>  ret = 0;
>  
>  fail:
> -while (l2meta != NULL) {
> -QCowL2Meta *next;
> -
> -if (l2meta->nb_clusters != 0) {
> -QLIST_REMOVE(l2meta, next_in_flight);
> -}
> -qemu_co_queue_restart_all(&l2meta->dependent_requests);
> -
> -next = l2meta->next;
> -g_free(l2meta);
> -l2meta = next;
> -}
> +qcow2_handle_l2meta(bs, l2meta);

Is qcow2_handle_l2meta() equivalent to the code that is being removed?
qcow2_handle_l2meta() calls qcow2_alloc_cluster_link_l2() while this
code does not.
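
One way to make the difference explicit would be a helper that takes a
flag saying whether to link, so that the failure path only cleans up.
The sketch below is a standalone model with hypothetical names (Meta,
meta_new, handle_metas, link_l2); it is not the patch's code, and
link_result merely stands in for qcow2_alloc_cluster_link_l2().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for QCowL2Meta. */
typedef struct Meta {
    struct Meta *next;
    int link_result;   /* what the stand-in "link" step returns */
} Meta;

static int links_attempted;

static Meta *meta_new(int link_result, Meta *next)
{
    Meta *m = malloc(sizeof(*m));

    assert(m != NULL);
    m->link_result = link_result;
    m->next = next;
    return m;
}

/*
 * Always walks and frees the whole list. Linking happens only when
 * requested, and stops after the first error; cleanup continues.
 */
static int handle_metas(Meta *m, bool link_l2)
{
    int ret = 0;

    while (m != NULL) {
        Meta *next = m->next;

        if (link_l2 && ret == 0) {
            links_attempted++;
            ret = m->link_result;
        }
        free(m);
        m = next;
    }
    return ret;
}
```

With link_l2 false this behaves like the loop being removed; with it
true it behaves like the new helper.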

>  
>  qemu_co_mutex_unlock(&s->lock);
>  
> @@ -3267,6 +3267,176 @@ static coroutine_fn int 
> qcow2_co_pdiscard(BlockDriverState *bs,
>  return ret;
>  }
>  
> +static int qcow2_co_copy_range_from(BlockDriverState *bs,

Missing coroutine_fn.  Please check your patches for more occurrences.

> +while (l2meta != NULL) {
> +QCowL2Meta *next;
> +
> +ret = qcow2_alloc_cluster_link_l2(bs, l2meta);
> +if (ret < 0) {
> +goto fail;
> +}
> +
> +/* Take the request off the list of running requests */
> +if (l2meta->nb_clusters != 0) {
> +QLIST_REMOVE(l2meta, next_in_flight);
> +}
> +
> +qemu_co_queue_restart_all(&l2meta->dependent_requests);
> +
> +next = l2meta->next;
> +g_free(l2meta);
> +l2meta = next;
> +}

Why isn't this a call to qcow2_handle_l2meta()?  I guess the reason for
extracting that function was to use it here?


signature.asc
Description: PGP signature

Re: [Qemu-devel] FW: [PATCH 2/2] migration: not wait RDMA_CM_EVENT_DISCONNECTED event after rdma_disconnect

2018-05-17 Thread 858585 jemmy

On Thu, May 17, 2018 at 3:31 PM, Aviad Yehezkel
 wrote:
>
>
> On 5/17/2018 5:42 AM, 858585 jemmy wrote:
>>
>> On Wed, May 16, 2018 at 11:11 PM, Aviad Yehezkel
>>  wrote:
>>>
>>> Hi Lidong and David,
>>> Sorry for the late response, I had to ramp up on migration code and build
>>> a
>>> setup on my side.
>>>
>>> PSB my comments for this patch below.
>>> For the RDMA post-copy patches I will comment next week after testing on
>>> Mellanox side too.
>>>
>>> Thanks!
>>>
>>> On 5/16/2018 5:21 PM, Aviad Yehezkel wrote:


 -Original Message-
 From: Dr. David Alan Gilbert [mailto:dgilb...@redhat.com]
 Sent: Wednesday, May 16, 2018 4:13 PM
 To: 858585 jemmy 
 Cc: Aviad Yehezkel ; Juan Quintela
 ; qemu-devel ; Gal Shachaf
 ; Adi Dotan ; Lidong Chen
 
 Subject: Re: [PATCH 2/2] migration: not wait RDMA_CM_EVENT_DISCONNECTED
 event after rdma_disconnect

 * 858585 jemmy (jemmy858...@gmail.com) wrote:

 

>> I wonder why dereg_mr takes so long - I could understand if
>> reg_mr took a long time, but why for dereg, that sounds like the
>> easy side.
>
> I use perf collect the information when ibv_dereg_mr is invoked.
>
> -   9.95%  client2  [kernel.kallsyms]  [k] put_compound_page
> `
>  - put_compound_page
> - 98.45% put_page
>  __ib_umem_release
>  ib_umem_release
>  dereg_mr
>  mlx5_ib_dereg_mr
>  ib_dereg_mr
>  uverbs_free_mr
>  remove_commit_idr_uobject
>  _rdma_remove_commit_uobject
>  rdma_remove_commit_uobject
>  ib_uverbs_dereg_mr
>  ib_uverbs_write
>  vfs_write
>  sys_write
>  system_call_fastpath
>  __GI___libc_write
>  0
> + 1.55% __ib_umem_release
> +   8.31%  client2  [kernel.kallsyms]  [k]
> compound_unlock_irqrestore
> +   7.01%  client2  [kernel.kallsyms]  [k] page_waitqueue
> +   7.00%  client2  [kernel.kallsyms]  [k] set_page_dirty
> +   6.61%  client2  [kernel.kallsyms]  [k] unlock_page
> +   6.33%  client2  [kernel.kallsyms]  [k] put_page_testzero
> +   5.68%  client2  [kernel.kallsyms]  [k] set_page_dirty_lock
> +   4.30%  client2  [kernel.kallsyms]  [k] __wake_up_bit
> +   4.04%  client2  [kernel.kallsyms]  [k] free_pages_prepare
> +   3.65%  client2  [kernel.kallsyms]  [k] release_pages
> +   3.62%  client2  [kernel.kallsyms]  [k] arch_local_irq_save
> +   3.35%  client2  [kernel.kallsyms]  [k] page_mapping
> +   3.13%  client2  [kernel.kallsyms]  [k]
> get_pageblock_flags_group
> +   3.09%  client2  [kernel.kallsyms]  [k] put_page
>
> the reason is __ib_umem_release will loop many times for each page.
>
> static void __ib_umem_release(struct ib_device *dev, struct
> ib_umem *umem, int dirty) {
>   struct scatterlist *sg;
>   struct page *page;
>   int i;
>
>   if (umem->nmap > 0)
>ib_dma_unmap_sg(dev, umem->sg_head.sgl,
>   umem->npages,
>   DMA_BIDIRECTIONAL);
>
>for_each_sg(umem->sg_head.sgl, sg, umem->npages, i) {
> <<
> loop a lot of times for each page.here

 Why 'lot of times for each page'?  I don't know this code at all,
 but I'd expected once per page?
>>>
>>> sorry, once per page, but a lot of page for a big size virtual
>>> machine.
>>
>> Ah OK; so yes it seems best if you can find a way to do the release
>> in the migration thread then;  still maybe this is something some of
>> the kernel people could look at speeding up?
>
> The kernel code seem is not complex, and I have no idea how to speed
> up.

 Me neither; but I'll ask around.
>>>
>>> I will ask internally and get back on this one.


 With your other kernel fix, does the problem of the missing
 RDMA_CM_EVENT_DISCONNECTED events go away?
>>>
>>> Yes, after kernel and qemu fixed, this issue never happens again.
>>
>> I'm confused; which qemu fix; my question was whether the kernel fix
>> by itself fixed the problem of the missing event.
>
> this qemu fix:
> migration: update

[Qemu-devel] [PATCH v4 14/14] memory-device: factor out plug into hotplug handler

2018-05-17 Thread David Hildenbrand

Let's move the plug logic into the applicable hotplug handler for pc and
spapr.
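
The alignment choice made in memory_device_plug() in the diff below
boils down to: take the enforced (compat) alignment if the machine
passes one, else the memory region's alignment, then raise it to the
device's own requirement if that is larger. A standalone sketch, with
a hypothetical helper name and with 0 standing in for "the device has
no get_align callback":

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * enforced_align: compat override from the machine, or NULL.
 * region_align:   alignment reported by the memory region.
 * device_align:   device requirement, 0 if the device has none.
 */
static uint64_t pick_align(const uint64_t *enforced_align,
                           uint64_t region_align, uint64_t device_align)
{
    uint64_t align = enforced_align ? *enforced_align : region_align;

    if (device_align > align) {
        align = device_align;
    }
    return align;
}
```

Note that a stronger device requirement wins even over the enforced
compat value in this model.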

Signed-off-by: David Hildenbrand 
---
 hw/i386/pc.c   | 35 ---
 hw/mem/memory-device.c | 40 ++--
 hw/mem/pc-dimm.c   | 29 +
 hw/mem/trace-events|  2 +-
 hw/ppc/spapr.c | 15 ---
 include/hw/mem/memory-device.h |  7 ++-
 include/hw/mem/pc-dimm.h   |  3 +--
 7 files changed, 71 insertions(+), 60 deletions(-)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 426fb534c2..f022eb042e 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -1682,22 +1682,8 @@ static void pc_dimm_plug(HotplugHandler *hotplug_dev,
 HotplugHandlerClass *hhc;
 Error *local_err = NULL;
 PCMachineState *pcms = PC_MACHINE(hotplug_dev);
-PCMachineClass *pcmc = PC_MACHINE_GET_CLASS(pcms);
-PCDIMMDevice *dimm = PC_DIMM(dev);
-PCDIMMDeviceClass *ddc = PC_DIMM_GET_CLASS(dimm);
-MemoryRegion *mr;
-uint64_t align = TARGET_PAGE_SIZE;
 bool is_nvdimm = object_dynamic_cast(OBJECT(dev), TYPE_NVDIMM);
 
-mr = ddc->get_memory_region(dimm, &local_err);
-if (local_err) {
-goto out;
-}
-
-if (memory_region_get_alignment(mr) && pcmc->enforce_aligned_dimm) {
-align = memory_region_get_alignment(mr);
-}
-
 /*
  * When -no-acpi is used with Q35 machine type, no ACPI is built,
  * but pcms->acpi_dev is still created. Check !acpi_enabled in
@@ -1715,7 +1701,7 @@ static void pc_dimm_plug(HotplugHandler *hotplug_dev,
 goto out;
 }
 
-pc_dimm_memory_plug(dev, MACHINE(pcms), align, &local_err);
+pc_dimm_memory_plug(dev, MACHINE(pcms), &local_err);
 if (local_err) {
 goto out;
 }
@@ -2036,6 +2022,25 @@ static void pc_machine_device_plug_cb(HotplugHandler 
*hotplug_dev,
 {
 Error *local_err = NULL;
 
+/* first stage hotplug handler */
+if (object_dynamic_cast(OBJECT(dev), TYPE_MEMORY_DEVICE)) {
+const PCMachineClass *pcmc = PC_MACHINE_GET_CLASS(hotplug_dev);
+uint64_t align = 0;
+
+/* compat handling: force to TARGET_PAGE_SIZE */
+if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM) &&
+!pcmc->enforce_aligned_dimm) {
+align = TARGET_PAGE_SIZE;
+}
+memory_device_plug(MACHINE(hotplug_dev), MEMORY_DEVICE(dev),
+   align ? &align : NULL, &local_err);
+}
+
+if (local_err) {
+error_propagate(errp, local_err);
+return;
+}
+
 /* final stage hotplug handler */
 if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
 pc_dimm_plug(hotplug_dev, dev, &local_err);
diff --git a/hw/mem/memory-device.c b/hw/mem/memory-device.c
index 8f10d613ea..04bdb30f22 100644
--- a/hw/mem/memory-device.c
+++ b/hw/mem/memory-device.c
@@ -69,9 +69,10 @@ static int memory_device_used_region_size(Object *obj, void 
*opaque)
 return 0;
 }
 
-uint64_t memory_device_get_free_addr(MachineState *ms, const uint64_t *hint,
- uint64_t align, uint64_t size,
- Error **errp)
+static uint64_t memory_device_get_free_addr(MachineState *ms,
+const uint64_t *hint,
+uint64_t align, uint64_t size,
+Error **errp)
 {
 uint64_t address_space_start, address_space_end;
 uint64_t used_region_size = 0;
@@ -237,11 +238,38 @@ void memory_device_pre_plug(MachineState *ms, const 
MemoryDeviceState *md,
 }
 }
 
-void memory_device_plug_region(MachineState *ms, MemoryRegion *mr,
-   uint64_t addr)
+void memory_device_plug(MachineState *ms, MemoryDeviceState *md,
+uint64_t *enforced_align, Error **errp)
 {
-/* we expect a previous call to memory_device_get_free_addr() */
+const MemoryDeviceClass *mdc = MEMORY_DEVICE_GET_CLASS(md);
+const uint64_t size = mdc->get_region_size(md);
+MemoryRegion *mr = mdc->get_memory_region(md);
+uint64_t addr = mdc->get_addr(md);
+uint64_t align;
+
+/* we expect a previous call to memory_device_pre_plug */
 g_assert(ms->device_memory);
+g_assert(mr && !memory_region_is_mapped(mr));
+
+/* compat handling, some alignment has to be enforced for DIMMs */
+if (enforced_align) {
+align = *enforced_align;
+} else {
+align = memory_region_get_alignment(mr);
+}
+
+/* our device might have stronger alignment requirements */
+if (mdc->get_align) {
+align = MAX(align, mdc->get_align(md));
+}
+
+addr = memory_device_get_free_addr(ms, !addr ? NULL : &addr, align,
+   size, errp);
+if (*errp) {
+return;
+}
+trace_memory_device_assign_address(addr);
+mdc->set_addr(md, addr);

[Qemu-devel] [PATCH v4 13/14] memory-device: factor out unplug into hotplug handler

2018-05-17 Thread David Hildenbrand

Let's move the unplug logic into the applicable hotplug handler for pc and
spapr.

We'll move the plug logic next, then this will look more symmetrical in
the hotplug handlers.

Signed-off-by: David Hildenbrand 
---
 hw/i386/pc.c   | 17 -
 hw/mem/memory-device.c | 14 --
 hw/mem/pc-dimm.c   |  2 --
 hw/mem/trace-events|  2 ++
 hw/ppc/spapr.c | 16 +++-
 include/hw/mem/memory-device.h |  2 +-
 6 files changed, 46 insertions(+), 7 deletions(-)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 61f1537e14..426fb534c2 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -2044,6 +2044,12 @@ static void pc_machine_device_plug_cb(HotplugHandler 
*hotplug_dev,
 } else if (dev->parent_bus && dev->parent_bus->hotplug_handler) {
 hotplug_handler_plug(dev->parent_bus->hotplug_handler, dev, 
&local_err);
 }
+
+if (local_err) {
+if (object_dynamic_cast(OBJECT(dev), TYPE_MEMORY_DEVICE)) {
+memory_device_unplug(MACHINE(hotplug_dev), MEMORY_DEVICE(dev));
+}
+}
 error_propagate(errp, local_err);
 }
 
@@ -2080,7 +2086,16 @@ static void pc_machine_device_unplug_cb(HotplugHandler 
*hotplug_dev,
 error_setg(&local_err, "acpi: device unplug for not supported device"
" type: %s", object_get_typename(OBJECT(dev)));
 }
-error_propagate(errp, local_err);
+
+if (local_err) {
+error_propagate(errp, local_err);
+return;
+}
+
+/* first stage hotplug handler */
+if (object_dynamic_cast(OBJECT(dev), TYPE_MEMORY_DEVICE)) {
+memory_device_unplug(MACHINE(hotplug_dev), MEMORY_DEVICE(dev));
+}
 }
 
 static HotplugHandler *pc_get_hotpug_handler(MachineState *machine,
diff --git a/hw/mem/memory-device.c b/hw/mem/memory-device.c
index d22c91993f..8f10d613ea 100644
--- a/hw/mem/memory-device.c
+++ b/hw/mem/memory-device.c
@@ -17,6 +17,7 @@
 #include "qemu/range.h"
 #include "hw/virtio/vhost.h"
 #include "sysemu/kvm.h"
+#include "trace.h"
 
 static gint memory_device_addr_sort(gconstpointer a, gconstpointer b)
 {
@@ -246,12 +247,21 @@ void memory_device_plug_region(MachineState *ms, 
MemoryRegion *mr,
 addr - ms->device_memory->base, mr);
 }
 
-void memory_device_unplug_region(MachineState *ms, MemoryRegion *mr)
+void memory_device_unplug(MachineState *ms, MemoryDeviceState *md)
 {
-/* we expect a previous call to memory_device_get_free_addr() */
+const MemoryDeviceClass *mdc = MEMORY_DEVICE_GET_CLASS(md);
+MemoryRegion *mr = mdc->get_memory_region(md);
+
+/* we expect a previous call to memory_device_pre_plug */
 g_assert(ms->device_memory);
 
+if (!memory_region_is_mapped(mr)) {
+return;
+}
+
 memory_region_del_subregion(&ms->device_memory->mr, mr);
+trace_memory_device_unassign_address(mdc->get_addr(md));
+mdc->set_addr(md, 0);
 }
 
 static const TypeInfo memory_device_info = {
diff --git a/hw/mem/pc-dimm.c b/hw/mem/pc-dimm.c
index 5e2e3263ab..d487bb513b 100644
--- a/hw/mem/pc-dimm.c
+++ b/hw/mem/pc-dimm.c
@@ -94,9 +94,7 @@ void pc_dimm_memory_unplug(DeviceState *dev, MachineState 
*machine)
 PCDIMMDevice *dimm = PC_DIMM(dev);
 PCDIMMDeviceClass *ddc = PC_DIMM_GET_CLASS(dimm);
 MemoryRegion *vmstate_mr = ddc->get_vmstate_memory_region(dimm);
-MemoryRegion *mr = ddc->get_memory_region(dimm, &error_abort);
 
-memory_device_unplug_region(machine, mr);
 vmstate_unregister_ram(vmstate_mr, dev);
 }
 
diff --git a/hw/mem/trace-events b/hw/mem/trace-events
index e150dcc497..a661ee49a3 100644
--- a/hw/mem/trace-events
+++ b/hw/mem/trace-events
@@ -3,3 +3,5 @@
 # hw/mem/pc-dimm.c
 mhp_pc_dimm_assigned_slot(int slot) "%d"
 mhp_pc_dimm_assigned_address(uint64_t addr) "0x%"PRIx64
+# hw/mem/memory-device.c
+memory_device_unassign_address(uint64_t addr) "0x%"PRIx64
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 562712def2..abdd38a6b5 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -3621,6 +3621,11 @@ static void spapr_machine_device_plug(HotplugHandler 
*hotplug_dev,
 hotplug_handler_plug(dev->parent_bus->hotplug_handler, dev, 
&local_err);
 }
 out:
+if (local_err) {
+if (object_dynamic_cast(OBJECT(dev), TYPE_MEMORY_DEVICE)) {
+memory_device_unplug(MACHINE(hotplug_dev), MEMORY_DEVICE(dev));
+}
+}
 error_propagate(errp, local_err);
 }
 
@@ -3638,7 +3643,16 @@ static void spapr_machine_device_unplug(HotplugHandler 
*hotplug_dev,
 hotplug_handler_unplug(dev->parent_bus->hotplug_handler, dev,
&local_err);
 }
-error_propagate(errp, local_err);
+
+if (local_err) {
+error_propagate(errp, local_err);
+return;
+}
+
+/* first stage hotplug handler */
+if (object_dynamic_cast(OBJECT(dev), TYPE_MEMORY_DEVICE)) {
+memory_device_unplug(MACHINE(hotplug_dev), MEMORY_DEVICE(dev));
+}
 }
 
 static void

Re: [Qemu-devel] [edk2] [PATCH 0/4] RFC: ovmf: Add support for TPM Physical Presence interface

2018-05-17 Thread Laszlo Ersek

On 05/17/18 09:54, Laszlo Ersek wrote:
> On 05/15/18 14:30, marcandre.lur...@redhat.com wrote:
>> From: Marc-André Lureau 
>>
>> Hi,
>>
>> The following series adds basic TPM PPI 1.3 support for OVMF-on-QEMU
>> with TPM2 (I haven't tested TPM1, for lack of interest).
>>
>> PPI test runs successfully with Windows 10 WHLK, despite the limited
>> number of supported functions (tpm2_ppi_funcs table, in particular, no
>> function allows to manipulate Tcg2PhysicalPresenceFlags)
>>
>> The way it works is relatively simple: a memory region is allocated by
>> QEMU to save PPI related variables. An ACPI interface is exposed by
>> QEMU to let the guest manipulate those. At boot, ovmf processes and
>> updates the PPI qemu region and request variables.
>>
>> I build edk2 with:
>>
>> $ build -DTPM2_ENABLE -DSECURE_BOOT_ENABLE
> 
> Is -DSECURE_BOOT_ENABLE necessary for *building* with -DTPM2_ENABLE? If
> that's the case, we should update the DSC files; users building OVMF
> from source shouldn't have to care about "-D" inter-dependencies, if we
> can manage that somehow.
> 
> If -DSECURE_BOOT_ENABLE is only there because otherwise a guest OS
> doesn't really make use of -DTPM2_ENABLE either, that's different. In
> that case, it's fine to allow building OVMF with TPM2 support but
> without SB support.

Oops, almost missed another important omission: in every commit message,
please insert the following line just above your S-o-b:

Contributed-under: TianoCore Contribution Agreement 1.1

We cannot take patches without that line. You can read about it in the
"Contributions.txt" file, in the project root directory.

Thanks!
Laszlo

[Qemu-devel] [PATCH v3 08/12] util: implement simple iova tree

2018-05-17 Thread Peter Xu

Introduce a very simple IOVA tree implementation based on GTree.
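
For readers unfamiliar with the idea, the core of such a tree is a comparison function that treats overlapping ranges as equal, so a lookup for any range finds a stored range it overlaps with. The following is a minimal standalone sketch of that comparison, assuming inclusive sizes as in the DMAMap definition in the patch. ToyRange and toy_range_compare are hypothetical names, and the sketch ignores address overflow.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical range type.  "size" is inclusive, so the range covers
 * [start, start + size].
 */
typedef struct ToyRange {
    uint64_t start;
    uint64_t size;
} ToyRange;

/*
 * Three-way comparison for use as a balanced-tree key function:
 *   1  if a lies entirely above b
 *  -1  if a lies entirely below b
 *   0  if the two ranges overlap (treated as "equal" by the tree)
 */
static int toy_range_compare(const ToyRange *a, const ToyRange *b)
{
    assert(a && b);

    if (a->start > b->start + b->size) {
        return 1;
    }
    if (a->start + a->size < b->start) {
        return -1;
    }
    return 0;
}
```

With a comparison like this, inserting a range that compares equal to an existing key signals an overlap, which is presumably how an error such as IOVA_ERR_OVERLAP can be detected.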

Signed-off-by: Peter Xu 
---
 include/qemu/iova-tree.h | 134 +++
 util/iova-tree.c | 114 +
 MAINTAINERS  |   6 ++
 util/Makefile.objs   |   1 +
 4 files changed, 255 insertions(+)
 create mode 100644 include/qemu/iova-tree.h
 create mode 100644 util/iova-tree.c

diff --git a/include/qemu/iova-tree.h b/include/qemu/iova-tree.h
new file mode 100644
index 00..b061932097
--- /dev/null
+++ b/include/qemu/iova-tree.h
@@ -0,0 +1,134 @@
+/*
+ * A very simplified iova tree implementation based on GTree.
+ *
+ * Copyright 2018 Red Hat, Inc.
+ *
+ * Authors:
+ *  Peter Xu 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ */
+#ifndef IOVA_TREE_H
+#define IOVA_TREE_H
+
+/*
+ * Currently the iova tree only keeps range information, and no
+ * extra user data is allowed for each element.  A benefit is that
+ * we can merge adjacent ranges internally within the tree.  This
+ * can save a lot of memory when the ranges are split but mostly
+ * contiguous.
+ *
+ * Note that current implementation does not provide any thread
+ * protections.  Callers of the iova tree should be responsible
+ * for the thread safety issue.
+ */
+
+#include "qemu/osdep.h"
+#include "exec/memory.h"
+#include "exec/hwaddr.h"
+
+#define  IOVA_OK   (0)
+#define  IOVA_ERR_INVALID  (-1) /* Invalid parameters */
+#define  IOVA_ERR_OVERLAP  (-2) /* IOVA range overlapped */
+
+typedef struct IOVATree IOVATree;
+typedef struct DMAMap {
+hwaddr iova;
+hwaddr translated_addr;
+hwaddr size;/* Inclusive */
+IOMMUAccessFlags perm;
+} QEMU_PACKED DMAMap;
+typedef gboolean (*iova_tree_iterator)(DMAMap *map);
+
+/**
+ * iova_tree_new:
+ *
+ * Create a new iova tree.
+ *
+ * Returns: the tree pointer when succeeded, or NULL if error.
+ */
+IOVATree *iova_tree_new(void);
+
+/**
+ * iova_tree_insert:
+ *
+ * @tree: the iova tree to insert
+ * @map: the mapping to insert
+ *
+ * Insert an iova range into the tree.  If there are overlapping
+ * ranges, IOVA_ERR_OVERLAP will be returned.
+ *
+ * Return: 0 if succeeded, or <0 if error.
+ */
+int iova_tree_insert(IOVATree *tree, DMAMap *map);
+
+/**
+ * iova_tree_remove:
+ *
+ * @tree: the iova tree to remove range from
+ * @map: the map range to remove
+ *
+ * Remove mappings from the tree that are covered by the map range
+ * provided.  The range does not need to be exactly what was inserted;
+ * all the mappings that are included in the provided range will be
+ * removed from the tree.  Here map->translated_addr is meaningless.
+ *
+ * Return: 0 if succeeded, or <0 if error.
+ */
+int iova_tree_remove(IOVATree *tree, DMAMap *map);
+
+/**
+ * iova_tree_find:
+ *
+ * @tree: the iova tree to search from
+ * @map: the mapping to search
+ *
+ * Search for a mapping in the iova tree that overlaps with the
+ * mapping range specified.  Only the first found mapping will be
+ * returned.
+ *
+ * Return: DMAMap pointer if found, or NULL if not found.  Note that
+ * the returned DMAMap pointer is maintained internally.  User should
+ * only read the content but never modify or free the content.  Also,
+ * user is responsible to make sure the pointer is valid (say, no
+ * concurrent deletion in progress).
+ */
+DMAMap *iova_tree_find(IOVATree *tree, DMAMap *map);
+
+/**
+ * iova_tree_find_address:
+ *
+ * @tree: the iova tree to search from
+ * @iova: the iova address to find
+ *
+ * Similar to iova_tree_find(), but it tries to find mapping with
+ * range iova=iova & size=0.
+ *
+ * Return: same as iova_tree_find().
+ */
+DMAMap *iova_tree_find_address(IOVATree *tree, hwaddr iova);
+
+/**
+ * iova_tree_foreach:
+ *
+ * @tree: the iova tree to iterate on
+ * @iterator: the iterator for the mappings, return true to stop
+ *
+ * Iterate over the iova tree.
+ *
+ * Return: None.
+ */
+void iova_tree_foreach(IOVATree *tree, iova_tree_iterator iterator);
+
+/**
+ * iova_tree_destroy:
+ *
+ * @tree: the iova tree to destroy
+ *
+ * Destroy an existing iova tree.
+ *
+ * Return: None.
+ */
+void iova_tree_destroy(IOVATree *tree);
+
+#endif
diff --git a/util/iova-tree.c b/util/iova-tree.c
new file mode 100644
index 00..2d9cebfc89
--- /dev/null
+++ b/util/iova-tree.c
@@ -0,0 +1,114 @@
+/*
+ * IOVA tree implementation based on GTree.
+ *
+ * Copyright 2018 Red Hat, Inc.
+ *
+ * Authors:
+ *  Peter Xu 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ */
+
+#include 
+#include "qemu/iova-tree.h"
+
+struct IOVATree {
+GTree *tree;
+};
+
+static int iova_tree_compare(gconstpointer a, gconstpointer b, gpointer data)
+{
+const DMAMap *m1 = a, *m2 = b;
+
+if (m1->iova > m2->iova + m2->size) {
+return 1;
+}
+
+if (m1->iova + m1->size <

[Qemu-devel] [PATCH 0/4] display: add new bochs-display device

2018-05-17 Thread Gerd Hoffmann



Gerd Hoffmann (4):
  vga: move bochs vbe defines to header file
  display: add new bochs-display device
  bochs-display: add dirty tracking support
  bochs-display: add pcie support

 hw/display/vga_int.h   |  35 +---
 include/hw/display/bochs-vbe.h |  64 
 hw/display/bochs-display.c | 362 +
 hw/display/vga-pci.c   |  13 --
 hw/display/Makefile.objs   |   1 +
 5 files changed, 429 insertions(+), 46 deletions(-)
 create mode 100644 include/hw/display/bochs-vbe.h
 create mode 100644 hw/display/bochs-display.c

-- 
2.9.3

Re: [Qemu-devel] [PATCH V7 RESEND 17/17] COLO: quick failover process by kick COLO thread

2018-05-17 Thread Dr. David Alan Gilbert

* Zhang Chen (zhangc...@gmail.com) wrote:
> From: zhanghailiang 
> 
> COLO thread may sleep at qemu_sem_wait(&s->colo_checkpoint_sem)
> when failover begins. It is better to wake it up to speed up
> the process.
> 
> Signed-off-by: zhanghailiang 

Reviewed-by: Dr. David Alan Gilbert 
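
The pattern the patch relies on can be sketched in isolation: failover changes the state and then kicks the waiter, and the waiter rechecks the state after waking up instead of assuming that every wakeup means a checkpoint. This is a single-threaded model only. ToyMigration and the toy_* helpers are made up, and a plain counter stands in for the real semaphore.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum ToyState {
    TOY_STATE_COLO,
    TOY_STATE_COMPLETED
} ToyState;

/* Hypothetical, heavily reduced migration state. */
typedef struct ToyMigration {
    ToyState state;
    int pending_kicks;  /* stand-in for the semaphore count */
} ToyMigration;

/* Stand-in for posting the checkpoint semaphore. */
static void toy_checkpoint_notify(ToyMigration *s)
{
    s->pending_kicks++;
}

/* Failover path: leave the COLO state first, then kick the waiter. */
static void toy_do_failover(ToyMigration *s)
{
    s->state = TOY_STATE_COMPLETED;
    toy_checkpoint_notify(s);
}

/*
 * One iteration of the checkpoint loop after a wakeup.  Returns true
 * if a checkpoint should run, false if the loop should bail out.
 */
static bool toy_checkpoint_step(ToyMigration *s)
{
    /* A real semaphore would block here until a kick arrives. */
    assert(s->pending_kicks > 0);
    s->pending_kicks--;

    /* Recheck the state: the kick may have come from failover. */
    return s->state == TOY_STATE_COLO;
}
```

Without the state recheck, a wakeup caused by failover would start one more checkpoint transaction before the thread noticed that it should stop.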

> ---
>  migration/colo.c | 8 
>  1 file changed, 8 insertions(+)
> 
> diff --git a/migration/colo.c b/migration/colo.c
> index 15463e2823..16def4865c 100644
> --- a/migration/colo.c
> +++ b/migration/colo.c
> @@ -135,6 +135,11 @@ static void primary_vm_do_failover(void)
>  
>  migrate_set_state(&s->state, MIGRATION_STATUS_COLO,
>MIGRATION_STATUS_COMPLETED);
> +/*
> + * kick COLO thread which might wait at
> + * qemu_sem_wait(&s->colo_checkpoint_sem).
> + */
> +colo_checkpoint_notify(migrate_get_current());
>  
>  /*
>   * Wake up COLO thread which may blocked in recv() or send(),
> @@ -552,6 +557,9 @@ static void colo_process_checkpoint(MigrationState *s)
>  
>  qemu_sem_wait(&s->colo_checkpoint_sem);
>  
> +if (s->state != MIGRATION_STATUS_COLO) {
> +goto out;
> +}
>  ret = colo_do_checkpoint_transaction(s, bioc, fb);
>  if (ret < 0) {
>  goto out;
> -- 
> 2.17.0
> 
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK

Re: [Qemu-devel] [PULL 0/7] x86 queue, 2018-05-15

2018-05-17 Thread Peter Maydell

On 15 May 2018 at 22:54, Eduardo Habkost  wrote:
> The following changes since commit ad1b4ec39caa5b3f17cbd8160283a03a3dcfe2ae:
>
>   Merge remote-tracking branch 
> 'remotes/kraxel/tags/input-20180515-pull-request' into staging (2018-05-15 
> 12:50:06 +0100)
>
> are available in the Git repository at:
>
>   git://github.com/ehabkost/qemu.git tags/x86-next-pull-request
>
> for you to fetch changes up to ab8f992e3e63e91be257e4e343d386dae7be4bcb:
>
>   i386: Add new property to control cache info (2018-05-15 11:33:33 -0300)
>
> 
> x86 queue, 2018-05-15
>
> * KnightsMill CPU model
> * CLDEMOTE(Demote Cache Line) cpu feature
> * pc-i440fx-2.13 and pc-q35-2.13 machine-types
> * Add model-specific cache information to EPYC CPU model
>
> 

Applied, thanks.

-- PMM

Re: [Qemu-devel] [PATCH v3-a 00/27] target/arm: Scalable Vector Extension

2018-05-17 Thread no-reply

Hi,

This series seems to have some coding style problems. See output below for
more information:

Type: series
Message-id: 20180516223007.10256-1-richard.hender...@linaro.org
Subject: [Qemu-devel] [PATCH v3-a 00/27] target/arm: Scalable Vector Extension

=== TEST SCRIPT BEGIN ===
#!/bin/bash

BASE=base
n=1
total=$(git log --oneline $BASE.. | wc -l)
failed=0

git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram

commits="$(git log --format=%H --reverse $BASE..)"
for c in $commits; do
echo "Checking PATCH $n/$total: $(git log -n 1 --format=%s $c)..."
if ! git show $c --format=email | ./scripts/checkpatch.pl --mailback -; then
failed=1
echo
fi
n=$((n+1))
done

exit $failed
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
From https://github.com/patchew-project/qemu
 * [new tag]   
patchew/20180516223007.10256-1-richard.hender...@linaro.org -> 
patchew/20180516223007.10256-1-richard.hender...@linaro.org
Switched to a new branch 'test'
d5d6c8f1c1 target/arm: Implement SVE Permute - Unpredicated Group
3e81d585ea target/arm: Extend vec_reg_offset to larger sizes
b1fd0b35ed target/arm: Implement SVE Permute - Extract Group
7440d418b4 target/arm: Implement SVE Integer Wide Immediate - Predicated Group
c35230538c target/arm: Implement SVE Bitwise Immediate Group
b74492415e target/arm: Implement SVE Element Count Group
4c4d56ab60 target/arm: Implement SVE floating-point trig select coefficient
9037d7db89 target/arm: Implement SVE floating-point exponential accelerator
4dcb0e1e0d target/arm: Implement SVE Compute Vector Address Group
6ebd038b3f target/arm: Implement SVE Bitwise Shift - Unpredicated Group
6f313170d7 target/arm: Implement SVE Stack Allocation Group
3ac441769e target/arm: Implement SVE Index Generation Group
55bb3f6e0f target/arm: Implement SVE Integer Arithmetic - Unpredicated Group
a903947c6e target/arm: Implement SVE Integer Multiply-Add Group
61ca1958eb target/arm: Implement SVE Integer Arithmetic - Unary Predicated Group
10ebd0fdb6 target/arm: Implement SVE bitwise shift by wide elements (predicated)
2bc70c192a target/arm: Implement SVE bitwise shift by vector (predicated)
df68c49188 target/arm: Implement SVE bitwise shift by immediate (predicated)
f7ff94163d target/arm: Implement SVE Integer Reduction Group
b1496deb00 target/arm: Implement SVE Integer Binary Arithmetic - Predicated 
Group
c5f223b5fb target/arm: Implement SVE Predicate Misc Group
a437ba8c21 target/arm: Implement SVE Predicate Logical Operations Group
c2eb96dd2f target/arm: Implement SVE predicate test
d8c66d6568 target/arm: Implement SVE load vector/predicate
356c9c883c target/arm: Implement SVE Bitwise Logical - Unpredicated Group
932a4173aa target/arm: Add SVE decode skeleton
313294c78c target/arm: Introduce translate-a64.h

=== OUTPUT BEGIN ===
Checking PATCH 1/27: target/arm: Introduce translate-a64.h...
WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
#256: 
new file mode 100644

total: 0 errors, 1 warnings, 337 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
Checking PATCH 2/27: target/arm: Add SVE decode skeleton...
WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
#41: 
new file mode 100644

total: 0 errors, 1 warnings, 140 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
Checking PATCH 3/27: target/arm: Implement SVE Bitwise Logical - Unpredicated 
Group...
Checking PATCH 4/27: target/arm: Implement SVE load vector/predicate...
Checking PATCH 5/27: target/arm: Implement SVE predicate test...
WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
#21: 
new file mode 100644

total: 0 errors, 1 warnings, 197 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
Checking PATCH 6/27: target/arm: Implement SVE Predicate Logical Operations 
Group...
Checking PATCH 7/27: target/arm: Implement SVE Predicate Misc Group...
Checking PATCH 8/27: target/arm: Implement SVE Integer Binary Arithmetic - 
Predicated Group...
Checking PATCH 9/27: target/arm: Implement SVE Integer Reduction Group...
Checking PATCH 10/27: target/arm: Implement SVE bitwise shift by immediate 
(predicated)...
Checking PATCH 11/27: target/arm: Implement SVE bitwise shift by vector 
(predicated)...
Checking PATCH 12/27: target/arm: Implement SVE bitwise shift by wide elements 
(predicated)...
ERROR: spaces required around that '*' (ctx:WxV)
#118: FILE: target/arm/translate-sve.c:505:
+static bool trans_##NAME##_zpzw(DisasContext *s, arg_rprr_esz *a, \

Re: [Qemu-devel] [PATCH v3 7/8] xen_disk: use a single entry iovec

2018-05-17 Thread Anthony PERARD

On Fri, May 04, 2018 at 08:26:06PM +0100, Paul Durrant wrote:
> Since xen_disk now always copies data to and from a guest there is no need
> to maintain a vector entry corresponding to every page of a request.
> This means there is less per-request state to maintain so the ioreq
> structure can shrink significantly.
> 
> Signed-off-by: Paul Durrant 

Acked-by: Anthony PERARD 

-- 
Anthony PERARD

Re: [Qemu-devel] [PATCH 1/2] qapi: allow flat unions with empty branches

2018-05-17 Thread Eric Blake


On 05/17/2018 03:05 AM, Markus Armbruster wrote:

QAPI language design alternatives:

1. Having unions cover all discriminator values explicitly is useful.



2. Having unions repeat all the discriminator values explicitly is not
useful.  All we need is replacing the code enforcing that by code
defaulting missing ones to the empty type.




I think I'd vote for 2 (never enforce all-branches coverage) as well.


Eric, what do you think?


I'm sold. Let's go ahead and make the change that for any flat union, a 
branch not listed defaults to the empty type (no added fields) rather 
than being an error, then simplify a couple of the existing flat unions 
that benefit from that.
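
To make the proposed rule concrete, here is a small C model of it. This is not QAPI generator output and not QAPI schema syntax; ToyKind, ToyBranch and toy_branch_for are invented for illustration. The table lists only the branches that add fields, and any discriminator value without an entry resolves to an empty member set instead of being rejected.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical discriminator enum. */
typedef enum ToyKind {
    TOY_KIND_FILE,
    TOY_KIND_SOCKET,
    TOY_KIND_NULL,
    TOY_KIND__MAX
} ToyKind;

/* Members that a branch adds on top of the union's base fields. */
typedef struct ToyBranch {
    size_t n_members;
    const char *const *members;
} ToyBranch;

static const char *const toy_file_members[] = { "path" };
static const char *const toy_socket_members[] = { "host", "port" };

static const ToyBranch toy_file = { 1, toy_file_members };
static const ToyBranch toy_socket = { 2, toy_socket_members };
static const ToyBranch toy_empty = { 0, NULL };

/* Only branches that add fields are listed; TOY_KIND_NULL is omitted. */
static const ToyBranch *const toy_listed[TOY_KIND__MAX] = {
    [TOY_KIND_FILE] = &toy_file,
    [TOY_KIND_SOCKET] = &toy_socket,
};

/* An unlisted discriminator value defaults to the empty branch. */
static const ToyBranch *toy_branch_for(ToyKind kind)
{
    assert((unsigned)kind < (unsigned)TOY_KIND__MAX);
    return toy_listed[kind] ? toy_listed[kind] : &toy_empty;
}
```

Under the old rule, the missing TOY_KIND_NULL entry would correspond to a schema error; under the new one it simply means "no added fields".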




One more thought: if we ever get around to provide more convenient flat
union syntax so users don't have to enumerate the variant names twice,
we'll need a way to say "empty branch".  Let's worry about that problem
when we have it.


In other words, our current "simple" unions act in a manner that 
declares an implicit enum type - if we ever add an implicit enum to a 
flat union (where you don't have to declare a pre-existing enum type), 
then you need some syntax to declare additional acceptable enum values 
that form an empty branch.  Indeed, not a problem to be solved right now.


--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

Re: [Qemu-devel] [PATCH] ui: add x_keymap.o to modules

2018-05-17 Thread Gerd Hoffmann

  Hi,

> > Related: can modules depend on modules, so we could make x_keymap a
> > module of its own and have both gtk and sdl depend on it?
> > 
> > That would also be useful when trying to modularize spice.
> 
> How hard would it be to modularize the libspice-server side?  The part
> of the library that is used by QXL rendering should have much fewer
> dependencies than the part that is used for keyboard, mouse, audio,
> vmchannel/agent, etc.

kbd, mouse, audio is needed on the client side and is not part of
libspice-server anyway.  So spice is much less of a burden compared
to sdl/gtk which bring a lot of ui toolkit deps.

cheers,
  Gerd

Re: [Qemu-devel] [PATCH] pnv: add a physical mapping array describing MMIO ranges in each chip

2018-05-17 Thread Philippe Mathieu-Daudé

Hi Cédric,

On 05/17/2018 10:18 AM, Cédric Le Goater wrote:
> Based on previous work done in skiboot, the physical mapping array
> helps in calculating the MMIO ranges of each controller depending on
> the chip id and the chip type. This will be particularly useful for
> the P9 models, which use the XSCOM bus less and rely more on MMIOs.
> 
> A link on the chip is now necessary to calculate MMIO BARs and
> sizes. This is why such a link is introduced in the PSIHB model.
> 
> Signed-off-by: Cédric Le Goater 
> ---
>  hw/ppc/pnv.c | 32 +
>  hw/ppc/pnv_psi.c | 11 +-
>  hw/ppc/pnv_xscom.c   |  8 
>  include/hw/ppc/pnv.h | 58 
> +---

I recommend using scripts/git.orderfile to make this review easier.
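
For anyone skimming the thread, the physical mapping idea from the commit message can be sketched as a per-chip-type table of { base, size } entries indexed by controller. The sketch below assumes that each chip's window is offset by chip id times the window size; that formula, the names (ToyPhysMapEntry, toy_mmio_base, toy_mmio_size) and the addresses are assumptions made for illustration, not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical controller indexes into the mapping table. */
typedef enum ToyMapType {
    TOY_MAP_XSCOM,
    TOY_MAP_ICP,
    TOY_MAP__MAX
} ToyMapType;

typedef struct ToyPhysMapEntry {
    uint64_t base;  /* window base for chip 0 */
    uint64_t size;  /* window size per chip */
} ToyPhysMapEntry;

/* Made-up addresses; one such table would exist per chip type. */
static const ToyPhysMapEntry toy_phys_map[TOY_MAP__MAX] = {
    [TOY_MAP_XSCOM] = { 0x100000000ull, 0x10000000ull },
    [TOY_MAP_ICP]   = { 0x200000000ull, 0x00100000ull },
};

/* Size of a controller's MMIO window. */
static uint64_t toy_mmio_size(ToyMapType type)
{
    assert((unsigned)type < (unsigned)TOY_MAP__MAX);
    return toy_phys_map[type].size;
}

/* Base of a controller's MMIO window on a given chip (assumed formula). */
static uint64_t toy_mmio_base(uint32_t chip_id, ToyMapType type)
{
    assert((unsigned)type < (unsigned)TOY_MAP__MAX);
    return toy_phys_map[type].base +
           (uint64_t)chip_id * toy_phys_map[type].size;
}
```

A device model that holds a link to its chip can then derive its own BAR and size from the chip id and the chip's table, which is the reason the commit message gives for adding the link.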

>  4 files changed, 80 insertions(+), 29 deletions(-)
> 
> diff --git a/hw/ppc/pnv.c b/hw/ppc/pnv.c
> index 031488131629..330bf6f74810 100644
> --- a/hw/ppc/pnv.c
> +++ b/hw/ppc/pnv.c
> @@ -712,6 +712,16 @@ static uint32_t pnv_chip_core_pir_p9(PnvChip *chip, 
> uint32_t core_id)
>   */
>  #define POWER9_CORE_MASK   (0xffull)
>  
> +/*
> + * POWER8 MMIOs
> + */
> +static const PnvPhysMapEntry pnv_chip_power8_phys_map[] = {
> +[PNV_MAP_XSCOM] = { 0x0003fc00ull, 0x0008ull },
> +[PNV_MAP_ICP]   = { 0x00038000ull, 0x0010ull },
> +[PNV_MAP_PSIHB] = { 0x0003fffe8000ull, 0x0010ull },
> +[PNV_MAP_PSIHB_FSP] = { 0x0003ffe0ull, 0x0001ull },
> +};
> +
>  static void pnv_chip_power8e_class_init(ObjectClass *klass, void *data)
>  {
>  DeviceClass *dc = DEVICE_CLASS(klass);
> @@ -721,7 +731,7 @@ static void pnv_chip_power8e_class_init(ObjectClass 
> *klass, void *data)
>  k->chip_cfam_id = 0x221ef0498000ull;  /* P8 Murano DD2.1 */
>  k->cores_mask = POWER8E_CORE_MASK;
>  k->core_pir = pnv_chip_core_pir_p8;
> -k->xscom_base = 0x003fc00ull;
> +k->phys_map = pnv_chip_power8_phys_map;
>  dc->desc = "PowerNV Chip POWER8E";
>  }
>  
> @@ -734,7 +744,7 @@ static void pnv_chip_power8_class_init(ObjectClass 
> *klass, void *data)
>  k->chip_cfam_id = 0x220ea0498000ull; /* P8 Venice DD2.0 */
>  k->cores_mask = POWER8_CORE_MASK;
>  k->core_pir = pnv_chip_core_pir_p8;
> -k->xscom_base = 0x003fc00ull;
> +k->phys_map = pnv_chip_power8_phys_map;
>  dc->desc = "PowerNV Chip POWER8";
>  }
>  
> @@ -747,10 +757,17 @@ static void pnv_chip_power8nvl_class_init(ObjectClass 
> *klass, void *data)
>  k->chip_cfam_id = 0x120d30498000ull;  /* P8 Naples DD1.0 */
>  k->cores_mask = POWER8_CORE_MASK;
>  k->core_pir = pnv_chip_core_pir_p8;
> -k->xscom_base = 0x003fc00ull;
> +k->phys_map = pnv_chip_power8_phys_map;
>  dc->desc = "PowerNV Chip POWER8NVL";
>  }
>  
> +/*
> + * POWER9 MMIOs
> + */
> +static const PnvPhysMapEntry pnv_chip_power9_phys_map[] = {
> +[PNV_MAP_XSCOM] = { 0x000603fcull, 0x0400ull },
> +};
> +
>  static void pnv_chip_power9_class_init(ObjectClass *klass, void *data)
>  {
>  DeviceClass *dc = DEVICE_CLASS(klass);
> @@ -760,7 +777,7 @@ static void pnv_chip_power9_class_init(ObjectClass 
> *klass, void *data)
>  k->chip_cfam_id = 0x220d10498000ull; /* P9 Nimbus DD2.0 */
>  k->cores_mask = POWER9_CORE_MASK;
>  k->core_pir = pnv_chip_core_pir_p9;
> -k->xscom_base = 0x00603fcull;
> +k->phys_map = pnv_chip_power9_phys_map;
>  dc->desc = "PowerNV Chip POWER9";
>  }
>  
> @@ -797,15 +814,14 @@ static void pnv_chip_core_sanitize(PnvChip *chip, Error 
> **errp)
>  static void pnv_chip_init(Object *obj)
>  {
>  PnvChip *chip = PNV_CHIP(obj);
> -PnvChipClass *pcc = PNV_CHIP_GET_CLASS(chip);
> -
> -chip->xscom_base = pcc->xscom_base;
>  
>  object_initialize(&chip->lpc, sizeof(chip->lpc), TYPE_PNV_LPC);
>  object_property_add_child(obj, "lpc", OBJECT(&chip->lpc), NULL);
>  
>  object_initialize(&chip->psi, sizeof(chip->psi), TYPE_PNV_PSI);
>  object_property_add_child(obj, "psi", OBJECT(&chip->psi), NULL);
> +object_property_add_const_link(OBJECT(&chip->psi), "chip", obj,
> +   &error_abort);
>  object_property_add_const_link(OBJECT(&chip->psi), "xics",
> OBJECT(qdev_get_machine()), &error_abort);
>  
> @@ -829,7 +845,7 @@ static void pnv_chip_icp_realize(PnvChip *chip, Error 
> **errp)
>  XICSFabric *xi = XICS_FABRIC(qdev_get_machine());
>  
>  name = g_strdup_printf("icp-%x", chip->chip_id);
> -memory_region_init(&chip->icp_mmio, OBJECT(chip), name, PNV_ICP_SIZE);
> +memory_region_init(&chip->icp_mmio, OBJECT(chip), name, 
> PNV_ICP_SIZE(chip));
>  sysbus_init_mmio(SYS_BUS_DEVICE(chip), &chip->icp_mmio);
>  g_free(name);
>  
> diff --git a/hw/ppc/pnv_psi.c b/hw/ppc/pnv_psi.c
> index 5b969127c303..dd7707b971c9 100644
> --- a/hw/ppc/pnv_psi.c
> +++ b/hw/ppc/pnv_psi.c
>

[Qemu-devel] [PATCH v8 08/11] tests: extend qmp test with preconfig checks

2018-05-17 Thread Igor Mammedov

Add permission checks for commands at 'preconfig' stage.

Signed-off-by: Igor Mammedov 
---
v8:
  * QDECREF no longer exists, use qobject_unref instead
  * add negative test for exit-preconfig
v6:
  * replace 'cont' with 'exit-preconfig' command
v5:
  * s/-preconfig/--preconfig/
v4:
  * s/is_err()/qmp_rsp_is_err()/
  * return true even if 'error' doesn't contain 'desc'
(Eric Blake )
---
 tests/qmp-test.c | 47 +++
 1 file changed, 47 insertions(+)

diff --git a/tests/qmp-test.c b/tests/qmp-test.c
index 88f867f..2ee441c 100644
--- a/tests/qmp-test.c
+++ b/tests/qmp-test.c
@@ -392,6 +392,52 @@ static void add_query_tests(QmpSchema *schema)
 }
 }
 
+static bool qmp_rsp_is_err(QDict *rsp)
+{
+QDict *error = qdict_get_qdict(rsp, "error");
+qobject_unref(rsp);
+return !!error;
+}
+
+static void test_qmp_preconfig(void)
+{
+QDict *rsp, *ret;
+QTestState *qs = qtest_startf("%s --preconfig", common_args);
+
+/* preconfig state */
+/* enabled commands, no error expected  */
+g_assert(!qmp_rsp_is_err(qtest_qmp(qs, "{ 'execute': 'query-commands' 
}")));
+
+/* forbidden commands, expected error */
+g_assert(qmp_rsp_is_err(qtest_qmp(qs, "{ 'execute': 'query-cpus' }")));
+
+/* check that query-status returns preconfig state */
+rsp = qtest_qmp(qs, "{ 'execute': 'query-status' }");
+ret = qdict_get_qdict(rsp, "return");
+g_assert(ret);
+g_assert_cmpstr(qdict_get_try_str(ret, "status"), ==, "preconfig");
+qobject_unref(rsp);
+
+/* exit preconfig state */
+g_assert(!qmp_rsp_is_err(qtest_qmp(qs, "{ 'execute': 'exit-preconfig' 
}")));
+qtest_qmp_eventwait(qs, "RESUME");
+
+/* check that query-status returns running state */
+rsp = qtest_qmp(qs, "{ 'execute': 'query-status' }");
+ret = qdict_get_qdict(rsp, "return");
+g_assert(ret);
+g_assert_cmpstr(qdict_get_try_str(ret, "status"), ==, "running");
+qobject_unref(rsp);
+
+/* check that exit-preconfig returns error after exiting preconfig */
+g_assert(qmp_rsp_is_err(qtest_qmp(qs, "{ 'execute': 'exit-preconfig' }")));
+
+/* enabled commands, no error expected  */
+g_assert(!qmp_rsp_is_err(qtest_qmp(qs, "{ 'execute': 'query-cpus' }")));
+
+qtest_quit(qs);
+}
+
 int main(int argc, char *argv[])
 {
 QmpSchema schema;
@@ -403,6 +449,7 @@ int main(int argc, char *argv[])
 qtest_add_func("qmp/oob", test_qmp_oob);
 qmp_schema_init();
 add_query_tests();
+qtest_add_func("qmp/preconfig", test_qmp_preconfig);
 
 ret = g_test_run();
 
-- 
2.7.4

Re: [Qemu-devel] [PATCH v6 1/2] qmp: adding 'wakeup-suspend-support' in query-target

2018-05-17 Thread Daniel Henrique Barboza




On 05/17/2018 05:44 AM, Markus Armbruster wrote:

Daniel Henrique Barboza  writes:


On 05/15/2018 12:45 PM, Markus Armbruster wrote:

Daniel Henrique Barboza  writes:


When issuing the qmp/hmp 'system_wakeup' command, what happens in a
nutshell is:

- qmp_system_wakeup_request sets the runstate to RUNNING, sets a wakeup_reason
and notifies the event
- in the main_loop, all vcpus are paused, a system reset is issued, all
subscribers of wakeup_notifiers receive a notification, vcpus are then
resumed and the wake up QAPI event is fired

Note that this procedure alone doesn't ensure that the guest will awake
from SUSPENDED state - the subscribers of the wake up event must take
action to resume the guest, otherwise the guest will simply reboot.

At this moment there are only two subscribers of the wake up event: one
in hw/acpi/core.c and another one in hw/i386/xen/xen-hvm.c. This means
that system_wakeup does not work as intended with other architectures.

However, only the presence of 'system_wakeup' is required for QGA to
support 'guest-suspend-ram' and 'guest-suspend-hybrid' at this moment.
This means that the user/management will expect to suspend the guest using
one of those suspend commands and then resume execution using system_wakeup,
regardless of the support offered in system_wakeup in the first place.

This patch adds a new flag called 'wakeup-suspend-support' in TargetInfo
that allows the caller to query if the guest supports wake up from
suspend via system_wakeup. It goes over the subscribers of the wake up
event and, if it's empty, it assumes that the guest does not support
wake up from suspend (and thus, pm-suspend itself).
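
The check described above boils down to "is anybody subscribed to the wake up event?". A minimal standalone sketch of that idea follows. The ToyNotifier types and toy_* functions are invented for this example and only model the concept, not QEMU's notifier implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical notifier: a callback plus an intrusive list link. */
typedef struct ToyNotifier ToyNotifier;
struct ToyNotifier {
    void (*notify)(ToyNotifier *n, void *data);
    ToyNotifier *next;
};

typedef struct ToyNotifierList {
    ToyNotifier *head;
} ToyNotifierList;

/* Subscribe a notifier to the list. */
static void toy_notifier_list_add(ToyNotifierList *list, ToyNotifier *n)
{
    assert(n->notify != NULL);
    n->next = list->head;
    list->head = n;
}

static bool toy_notifier_list_is_empty(const ToyNotifierList *list)
{
    return list->head == NULL;
}

/* No subscriber means nobody would act on a wake up request. */
static bool toy_wakeup_suspend_support(const ToyNotifierList *wakeup)
{
    return !toy_notifier_list_is_empty(wakeup);
}

/* Example subscriber, standing in for a handler such as the ACPI one. */
static void toy_acpi_wakeup(ToyNotifier *n, void *data)
{
    (void)n;
    (void)data;
}
```

Note that a non-empty list only shows that some code registered for the event. It does not prove that the subscriber can actually resume the guest, so the flag is a heuristic.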

This is the expected output of query-target when running a x86 guest:

{"execute" : "query-target"}
{"return": {"arch": "x86_64", "wakeup-suspend-support": true}}

This is the output when running a pseries guest:

{"execute" : "query-target"}
{"return": {"arch": "ppc64", "wakeup-suspend-support": false}}

Given that the TargetInfo structure is read-only, adding a new flag to
it is backwards compatible. There is no need to deprecate the old
TargetInfo format.

With this extra tool, management can avoid situations where a guest
that does not have proper suspend/wake capabilities ends up in
inconsistent state (e.g.
https://github.com/open-power-host-os/qemu/issues/31).

Reported-by: Balamuruhan S 
Signed-off-by: Daniel Henrique Barboza 
---
   arch_init.c |  1 +
   include/sysemu/sysemu.h |  1 +
   qapi/misc.json  |  4 +++-
   vl.c| 21 +
   4 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/arch_init.c b/arch_init.c
index 9597218ced..67bdf27528 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -115,6 +115,7 @@ TargetInfo *qmp_query_target(Error **errp)
 info->arch = qapi_enum_parse(&SysEmuTarget_lookup,
TARGET_NAME, -1,
&error_abort);
+info->wakeup_suspend_support = !qemu_wakeup_notifier_is_empty();

Huh?  Hmm, see "hack" below.


 return info;
   }
diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index 544ab77a2b..fbe2a3373e 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -69,6 +69,7 @@ typedef enum WakeupReason {
   void qemu_system_reset_request(ShutdownCause reason);
   void qemu_system_suspend_request(void);
   void qemu_register_suspend_notifier(Notifier *notifier);
+bool qemu_wakeup_notifier_is_empty(void);
   void qemu_system_wakeup_request(WakeupReason reason);
   void qemu_system_wakeup_enable(WakeupReason reason, bool enabled);
   void qemu_register_wakeup_notifier(Notifier *notifier);
diff --git a/qapi/misc.json b/qapi/misc.json
index f5988cc0b5..a385d897ae 100644
--- a/qapi/misc.json
+++ b/qapi/misc.json
@@ -2484,11 +2484,13 @@
   # Information describing the QEMU target.
   #
   # @arch: the target architecture
+# @wakeup-suspend-support: true if the target supports wake up from
+#  suspend (since 2.13)
   #
   # Since: 1.2.0
   ##
   { 'struct': 'TargetInfo',
-  'data': { 'arch': 'SysEmuTarget' } }
+  'data': { 'arch': 'SysEmuTarget', 'wakeup-suspend-support': 'bool' } }
 ##
   # @query-target:

Does the documentation of system_wakeup need fixing?

 ##
 # @system_wakeup:
 #
 # Wakeup guest from suspend.  Does nothing in case the guest isn't suspended.
 #
 # Since:  1.1
 #
 # Returns:  nothing.
 #
 # Example:
 #
 # -> { "execute": "system_wakeup" }
 # <- { "return": {} }
 #
 ##
 { 'command': 'system_wakeup' }

I figure we better explain right here what the command does, both for
wakeup-suspend-support: false and true.

Hmm, yesterday I re-sent a patch that slightly changes the behavior of
system_wakeup. The command should now fail with an error if the VM isn't in
SUSPENDED state. However, I failed to update this documentation
in that

Re: [Qemu-devel] [RFC PATCH v2 00/12] iommu: add MemTxAttrs argument to IOMMU translate function

2018-05-17 Thread Peter Maydell

On 17 May 2018 at 13:46, Paolo Bonzini  wrote:
> Yes, this also sounds good.  It does have the same issue for VFIO that
> get_num_indexes() would be called too late to fail (and again, in a
> place where it's hard to fail).
>
> Maybe the index count and the index-from/to-attrs translation should be
> static (index-to-attrs could use the same pair of MemTxAttrs for "which
> bits matter" and "what value should they have"), so that VFIO can
> inspect it and decide if it's okay to proceed with e.g. the first iommu_idx?

Well, they need to be static in the sense that the IOMMU can't
change its opinion later about what it has. So as long as you
have a pointer to the IOMMU you can ask it how many indexes it has.

thanks
-- PMM

Re: [Qemu-devel] [PATCH] nbd/server: introduce NBD_CMD_CACHE

2018-05-17 Thread Eric Blake


On 05/17/2018 04:52 AM, Vladimir Sementsov-Ogievskiy wrote:

Finally, what about this?

13.04.2018 17:31, Vladimir Sementsov-Ogievskiy wrote:

Handle nbd CACHE command. Just do read, without sending read data back.
Cache mechanism should be done by exported node driver chain.


Still waiting on the NBD spec review, which I've pinged on the NBD list. 
But as mentioned there, I'll probably go ahead and accept this (possibly 
with slight tweaks) on Monday, after giving one more weekend for any 
last-minute review comments.


@@ -1826,7 +1826,9 @@ static int nbd_co_receive_request(NBDRequestData 
*req, NBDRequest *request,

  return -EIO;
  }
-    if (request->type == NBD_CMD_READ || request->type == 
NBD_CMD_WRITE) {
+    if (request->type == NBD_CMD_READ || request->type == 
NBD_CMD_WRITE ||

+    request->type == NBD_CMD_CACHE)
+    {
  if (request->len > NBD_MAX_BUFFER_SIZE) {
  error_setg(errp, "len (%" PRIu32" ) is larger than max 
len (%u)",


I'm not sure I agree with this one. Since we aren't passing the cached 
data over the wire, we can reject the command with EINVAL instead of 
killing the connection entirely.


(As it is, I wonder if we can be nicer about rejecting a read request > 
32M. For a write request, we have to disconnect; but for a read request, 
we can keep the connection alive by just returning EINVAL for a 
too-large read, instead of our current behavior of disconnecting)
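A minimal sketch of the policy suggested above, not the actual QEMU code: an oversized READ or CACHE request gets an EINVAL reply and the connection stays open, while an oversized WRITE still forces a disconnect. The enum values, MAX_BUFFER_SIZE, OversizeAction and check_request_len() are hypothetical stand-ins invented for this illustration.

```c
#include <errno.h>
#include <stdint.h>

/* Stand-in command codes and limit; QEMU defines its own constants. */
enum { CMD_READ = 0, CMD_WRITE = 1, CMD_CACHE = 5 };
#define MAX_BUFFER_SIZE (32u * 1024 * 1024)

/* What the server should do with a request after checking its length. */
typedef enum {
    ACTION_PROCEED,      /* length is fine, handle the request */
    ACTION_REPLY_ERROR,  /* send an error reply, keep the connection */
    ACTION_DISCONNECT    /* drop the connection */
} OversizeAction;

/*
 * Hypothetical helper: decide how to treat a request of the given type
 * and length. *reply_errno is set only for ACTION_REPLY_ERROR.
 */
static OversizeAction check_request_len(int type, uint32_t len,
                                        int *reply_errno)
{
    *reply_errno = 0;
    if (len <= MAX_BUFFER_SIZE) {
        return ACTION_PROCEED;
    }
    if (type == CMD_WRITE) {
        /* The write payload follows the request on the wire, so the
         * server cannot simply skip it: disconnect. */
        return ACTION_DISCONNECT;
    }
    /* READ and CACHE carry no client payload: reject, stay connected. */
    *reply_errno = EINVAL;
    return ACTION_REPLY_ERROR;
}
```

With this split, a client that sends an oversized CACHE or READ can recover by retrying with a smaller length instead of having to reconnect.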



 request->len, NBD_MAX_BUFFER_SIZE);
@@ -1911,7 +1913,7 @@ static coroutine_fn int 
nbd_do_cmd_read(NBDClient *client, NBDRequest *request,

  int ret;
  NBDExport *exp = client->exp;
-    assert(request->type == NBD_CMD_READ);
+    assert(request->type == NBD_CMD_READ || request->type == 
NBD_CMD_CACHE);

  /* XXX: NBD Protocol only documents use of FUA with WRITE */
  if (request->flags & NBD_CMD_FLAG_FUA) {
@@ -1930,7 +1932,7 @@ static coroutine_fn int 
nbd_do_cmd_read(NBDClient *client, NBDRequest *request,

  ret = blk_pread(exp->blk, request->from + exp->dev_offset, data,
  request->len);
-    if (ret < 0) {
+    if (ret < 0 || request->type == NBD_CMD_CACHE) {
  return nbd_send_generic_reply(client, request->handle, ret,
    "reading from file failed", 
errp);


As for the implementation on the server side, yes, this looks 
reasonable, given the proposed spec wording being considered on the NBD 
list.


--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

Re: [Qemu-devel] [PATCH] ui: add x_keymap.o to modules

2018-05-17 Thread Paolo Bonzini

On 17/05/2018 15:45, Gerd Hoffmann wrote:
>   Hi,
> 
>>> Related: can modules depend on modules, so we could make x_keymap a
>>> module of its own and have both gtk and sdl depend on it?
>>>
>>> That would also be useful when trying to modularize spice.
>>
>> How hard would it be to modularize the libspice-server side?  The part
>> of the library that is used by QXL rendering should have much fewer
>> dependencies than the part that is used for keyboard, mouse, audio,
>> vmchannel/agent, etc.
> 
> kbd, mouse, audio is needed on the client side and is not part of
> libspice-server anyway.

Yes, I'm talking about separating the client side from the QXL rendering
part.

>  So spice is much less of a burden compared
> to sdl/gtk which bring a lot of ui toolkit deps.

But SPICE does bring in 16 libraries, including both NSS and OpenSSL...

Paolo

Re: [Qemu-devel] [PATCH v3 28/38] target-microblaze: dec_msr: Plug a temp leak

2018-05-17 Thread Philippe Mathieu-Daudé

On 05/16/2018 03:51 PM, Edgar E. Iglesias wrote:
> From: "Edgar E. Iglesias" 
> 
> Plug a temp leak.
> 
> Reported-by: Richard Henderson 
> Signed-off-by: Edgar E. Iglesias 

Reviewed-by: Philippe Mathieu-Daudé 

> ---
>  target/microblaze/translate.c | 13 +
>  1 file changed, 9 insertions(+), 4 deletions(-)
> 
> diff --git a/target/microblaze/translate.c b/target/microblaze/translate.c
> index 03a0289858..cf1b87c09e 100644
> --- a/target/microblaze/translate.c
> +++ b/target/microblaze/translate.c
> @@ -516,12 +516,17 @@ static void dec_msr(DisasContext *dc)
>  #if !defined(CONFIG_USER_ONLY)
>  /* Catch read/writes to the mmu block.  */
>  if ((sr & ~0xff) == 0x1000) {
> +TCGv_i32 tmp_sr;
> +
>  sr &= 7;
> +tmp_sr = tcg_const_i32(sr);
>  LOG_DIS("m%ss sr%d r%d imm=%x\n", to ? "t" : "f", sr, dc->ra, 
> dc->imm);
> -if (to)
> -gen_helper_mmu_write(cpu_env, tcg_const_i32(sr), cpu_R[dc->ra]);
> -else
> -gen_helper_mmu_read(cpu_R[dc->rd], cpu_env, tcg_const_i32(sr));
> +if (to) {
> +gen_helper_mmu_write(cpu_env, tmp_sr, cpu_R[dc->ra]);
> +} else {
> +gen_helper_mmu_read(cpu_R[dc->rd], cpu_env, tmp_sr);
> +}
> +tcg_temp_free_i32(tmp_sr);
>  return;
>  }
>  #endif
>

Re: [Qemu-devel] [PATCH v3 36/38] target-microblaze: Use tcg_gen_movcond in eval_cond_jmp

2018-05-17 Thread Philippe Mathieu-Daudé

On 05/16/2018 03:51 PM, Edgar E. Iglesias wrote:
> From: "Edgar E. Iglesias" 
> 
> Cleanup eval_cond_jmp to use tcg_gen_movcond_i64().
> No functional change.
> 
> Suggested-by: Richard Henderson 
> Signed-off-by: Edgar E. Iglesias 
> ---
>  target/microblaze/translate.c | 16 ++--
>  1 file changed, 10 insertions(+), 6 deletions(-)
> 
> diff --git a/target/microblaze/translate.c b/target/microblaze/translate.c
> index a846797d9c..78c2855ff0 100644
> --- a/target/microblaze/translate.c
> +++ b/target/microblaze/translate.c
> @@ -1171,12 +1171,16 @@ static inline void eval_cc(DisasContext *dc, unsigned 
> int cc,
>  
>  static void eval_cond_jmp(DisasContext *dc, TCGv_i64 pc_true, TCGv_i64 
> pc_false)
>  {
> -TCGLabel *l1 = gen_new_label();
> -/* Conditional jmp.  */
> -tcg_gen_mov_i64(cpu_SR[SR_PC], pc_false);
> -tcg_gen_brcondi_i32(TCG_COND_EQ, env_btaken, 0, l1);
> -tcg_gen_mov_i64(cpu_SR[SR_PC], pc_true);
> -gen_set_label(l1);
> +TCGv_i64 tmp_btaken = tcg_temp_new_i64();
> +TCGv_i64 tmp_zero = tcg_const_i64(0);
> +
> +tcg_gen_extu_i32_i64(tmp_btaken, env_btaken);

env_btaken is i32, ok.

Reviewed-by: Philippe Mathieu-Daudé 
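As a plain-C model of why the hunk under review is "no functional change" (a sketch only; it does not use the TCG API, and both function names are hypothetical): the old code assigned pc_false and then conditionally overwrote it, while the new code zero-extends the 32-bit flag and selects between the two values.

```c
#include <stdint.h>

/* Old shape: default to pc_false, skip the second move when btaken == 0. */
static uint64_t next_pc_branch(uint32_t btaken, uint64_t pc_true,
                               uint64_t pc_false)
{
    uint64_t pc = pc_false;
    if (btaken != 0) {
        pc = pc_true;
    }
    return pc;
}

/* New shape: widen the flag, then do a branch-free conditional select. */
static uint64_t next_pc_movcond(uint32_t btaken, uint64_t pc_true,
                                uint64_t pc_false)
{
    uint64_t wide = (uint64_t)btaken;   /* models the i32 -> i64 zero-extend */
    return (wide != 0) ? pc_true : pc_false;
}
```

The widening step is needed because the select compares 64-bit operands while the flag is 32 bits wide.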

> +tcg_gen_movcond_i64(TCG_COND_NE, cpu_SR[SR_PC],
> +tmp_btaken, tmp_zero,
> +pc_true, pc_false);
> +
> +tcg_temp_free_i64(tmp_btaken);
> +tcg_temp_free_i64(tmp_zero);
>  }
>  
>  static void dec_bcc(DisasContext *dc)
>

[Qemu-devel] [PATCH v2] linux-user: update comments to point to tcg_exec_init()

2018-05-17 Thread Igor Mammedov

cpu_init() was replaced by cpu_create() since 2.12 but comments
weren't updated. So update stale comments to point out that page
sizes are actually initialized by tcg_exec_init(). Also move
another qemu_host_page_size related comment before tcg_exec_init()
where it belongs.

Signed-off-by: Igor Mammedov 
---
 bsd-user/main.c   | 7 ---
 linux-user/main.c | 5 ++---
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/bsd-user/main.c b/bsd-user/main.c
index 283dc6f..da3b833 100644
--- a/bsd-user/main.c
+++ b/bsd-user/main.c
@@ -898,9 +898,10 @@ int main(int argc, char **argv)
 cpu_model = "any";
 #endif
 }
+
+/* init tcg before creating CPUs and to get qemu_host_page_size */
 tcg_exec_init(0);
-/* NOTE: we need to init the CPU at this stage to get
-   qemu_host_page_size */
+
 cpu_type = parse_cpu_model(cpu_model);
 cpu = cpu_create(cpu_type);
 env = cpu->env_ptr;
@@ -917,7 +918,7 @@ int main(int argc, char **argv)
 envlist_free(envlist);
 
 /*
- * Now that page sizes are configured in cpu_init() we can do
+ * Now that page sizes are configured in tcg_exec_init() we can do
  * proper page alignment for guest_base.
  */
 guest_base = HOST_PAGE_ALIGN(guest_base);
diff --git a/linux-user/main.c b/linux-user/main.c
index 3234754..78d6d3e 100644
--- a/linux-user/main.c
+++ b/linux-user/main.c
@@ -671,9 +671,8 @@ int main(int argc, char **argv, char **envp)
 }
 cpu_type = parse_cpu_model(cpu_model);
 
+/* init tcg before creating CPUs and to get qemu_host_page_size */
 tcg_exec_init(0);
-/* NOTE: we need to init the CPU at this stage to get
-   qemu_host_page_size */
 
 cpu = cpu_create(cpu_type);
 env = cpu->env_ptr;
@@ -693,7 +692,7 @@ int main(int argc, char **argv, char **envp)
 envlist_free(envlist);
 
 /*
- * Now that page sizes are configured in cpu_init() we can do
+ * Now that page sizes are configured in tcg_exec_init() we can do
  * proper page alignment for guest_base.
  */
 guest_base = HOST_PAGE_ALIGN(guest_base);
-- 
2.7.4

Re: [Qemu-devel] [Qemu-ppc] [PATCH v4 06/14] spapr: prepare for multi stage hotplug handlers

2018-05-17 Thread Greg Kurz

On Thu, 17 May 2018 10:15:19 +0200
David Hildenbrand  wrote:

> For multi stage hotplug handlers, we'll have to do some error handling
> in some hotplug functions, so let's use a local error variable (except
> for unplug requests).
> 
> Also, add code to pass control to the final stage hotplug handler at the
> parent bus.
> 
> Signed-off-by: David Hildenbrand 
> ---
>  hw/ppc/spapr.c | 54 +++---
>  1 file changed, 43 insertions(+), 11 deletions(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index ebf30dd60b..b7c5c95f7a 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -3571,27 +3571,48 @@ static void spapr_machine_device_plug(HotplugHandler 
> *hotplug_dev,
>  {
>  MachineState *ms = MACHINE(hotplug_dev);
>  sPAPRMachineClass *smc = SPAPR_MACHINE_GET_CLASS(ms);
> +Error *local_err = NULL;
>  
> +/* final stage hotplug handler */
>  if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
>  int node;
>  
>  if (!smc->dr_lmb_enabled) {
> -error_setg(errp, "Memory hotplug not supported for this 
> machine");
> -return;
> +error_setg(&local_err,
> +   "Memory hotplug not supported for this machine");
> +goto out;
>  }
> -node = object_property_get_uint(OBJECT(dev), PC_DIMM_NODE_PROP, 
> errp);
> -if (*errp) {

Heh ! This is even a fix since errp could theoretically be NULL.

Reviewed-by: Greg Kurz 
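The remark about errp possibly being NULL can be illustrated with a simplified, self-contained model of the local-error pattern. This is a sketch under stated assumptions: Error, set_error(), propagate(), get_node() and plug() are hypothetical stand-ins, not QEMU's real error API (which, among other things, allocates and frees error objects).

```c
#include <stddef.h>

typedef struct Error { const char *msg; } Error;

/* Stand-in for an error setter: a NULL errp means "caller ignores errors". */
static void set_error(Error **errp, const char *msg)
{
    static Error storage;   /* single static slot keeps the sketch simple */
    if (errp) {
        storage.msg = msg;
        *errp = &storage;
    }
}

/* Stand-in for propagation: hand the local error to the caller, if any. */
static void propagate(Error **dst, Error *local)
{
    if (dst && local) {
        *dst = local;
    }
}

/* Hypothetical property getter that can fail. */
static int get_node(int requested, Error **errp)
{
    if (requested < 0) {
        set_error(errp, "bad node");
        return 0;
    }
    return requested;
}

/* Returns 1 on success, 0 on failure. Never dereferences errp directly. */
static int plug(int requested_node, Error **errp)
{
    Error *local_err = NULL;
    int plugged = 0;

    int node = get_node(requested_node, &local_err);
    if (local_err) {        /* safe even when errp == NULL */
        goto out;
    }
    (void)node;             /* the real handler would plug the device here */
    plugged = 1;
out:
    propagate(errp, local_err);
    return plugged;
}
```

Testing a local variable instead of `*errp` means the failure is still detected when the caller passes NULL, which is the situation the old `if (*errp)` check could not handle.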

> -return;
> +node = object_property_get_uint(OBJECT(dev), PC_DIMM_NODE_PROP,
> +&local_err);
> +if (local_err) {
> +goto out;
>  }
>  if (node < 0 || node >= MAX_NODES) {
> -error_setg(errp, "Invaild node %d", node);
> -return;
> +error_setg(&local_err, "Invaild node %d", node);
> +goto out;
>  }
>  
> -spapr_memory_plug(hotplug_dev, dev, node, errp);
> +spapr_memory_plug(hotplug_dev, dev, node, &local_err);
>  } else if (object_dynamic_cast(OBJECT(dev), TYPE_SPAPR_CPU_CORE)) {
> -spapr_core_plug(hotplug_dev, dev, errp);
> +spapr_core_plug(hotplug_dev, dev, &local_err);
> +} else if (dev->parent_bus && dev->parent_bus->hotplug_handler) {
> +hotplug_handler_plug(dev->parent_bus->hotplug_handler, dev, &local_err);
> +}
> +out:
> +error_propagate(errp, local_err);
> +}
> +
> +static void spapr_machine_device_unplug(HotplugHandler *hotplug_dev,
> +DeviceState *dev, Error **errp)
> +{
> +Error *local_err = NULL;
> +
> +/* final stage hotplug handler */
> +if (dev->parent_bus && dev->parent_bus->hotplug_handler) {
> +hotplug_handler_unplug(dev->parent_bus->hotplug_handler, dev,
> +   &local_err);
>  }
> +error_propagate(errp, local_err);
>  }
>  
>  static void spapr_machine_device_unplug_request(HotplugHandler *hotplug_dev,
> @@ -3618,17 +3639,27 @@ static void 
> spapr_machine_device_unplug_request(HotplugHandler *hotplug_dev,
>  return;
>  }
>  spapr_core_unplug_request(hotplug_dev, dev, errp);
> +} else if (dev->parent_bus && dev->parent_bus->hotplug_handler) {
> +hotplug_handler_unplug_request(dev->parent_bus->hotplug_handler, dev,
> +   errp);
>  }
>  }
>  
>  static void spapr_machine_device_pre_plug(HotplugHandler *hotplug_dev,
>DeviceState *dev, Error **errp)
>  {
> +Error *local_err = NULL;
> +
> +/* final stage hotplug handler */
>  if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
> -spapr_memory_pre_plug(hotplug_dev, dev, errp);
> +spapr_memory_pre_plug(hotplug_dev, dev, &local_err);
>  } else if (object_dynamic_cast(OBJECT(dev), TYPE_SPAPR_CPU_CORE)) {
> -spapr_core_pre_plug(hotplug_dev, dev, errp);
> +spapr_core_pre_plug(hotplug_dev, dev, &local_err);
> +} else if (dev->parent_bus && dev->parent_bus->hotplug_handler) {
> +hotplug_handler_pre_plug(dev->parent_bus->hotplug_handler, dev,
> + &local_err);
>  }
> +error_propagate(errp, local_err);
>  }
>  
>  static HotplugHandler *spapr_get_hotplug_handler(MachineState *machine,
> @@ -3988,6 +4019,7 @@ static void spapr_machine_class_init(ObjectClass *oc, 
> void *data)
>  mc->get_default_cpu_node_id = spapr_get_default_cpu_node_id;
>  mc->possible_cpu_arch_ids = spapr_possible_cpu_arch_ids;
>  hc->unplug_request = spapr_machine_device_unplug_request;
> +hc->unplug = spapr_machine_device_unplug;
>  
>  smc->dr_lmb_enabled = true;
>  mc->default_cpu_type = POWERPC_CPU_TYPE_NAME("power8_v2.0");

Re: [Qemu-devel] [PATCH] ui: add x_keymap.o to modules

2018-05-17 Thread Gerd Hoffmann

  Hi,

> > +ifeq ($(CONFIG_X11),y)
> > +sdl.mo-objs += x_keymap.o
> > +gtk.mo-objs += x_keymap.o
> 
> Would this cause symbol clash if both sdl & gtk modules are loaded
> at the same time, or have we used linker scripts to limit what symbols
> each module exposes ?

Related: can modules depend on modules, so we could make x_keymap a
module of its own and have both gtk and sdl depend on it?

That would also be useful when trying to modularize spice.

cheers,
  Gerd

[Qemu-devel] KVM Forum 2018: Call For Participation

2018-05-17 Thread Paolo Bonzini


KVM Forum 2018: Call For Participation
October 24-26, 2018 -  Edinburgh International Conference Centre - Edinburgh, UK

(All submissions must be received before midnight June 14, 2018)
=

KVM Forum is an annual event that presents a rare opportunity
for developers and users to meet, discuss the state of Linux
virtualization technology, and plan for the challenges ahead. 
We invite you to lead part of the discussion by submitting a speaking
proposal for KVM Forum 2018.

At this highly technical conference, developers driving innovation
in the KVM virtualization stack (Linux, KVM, QEMU, libvirt) can
meet users who depend on KVM as part of their offerings, or to
power their data centers and clouds.

KVM Forum will include sessions on the state of the KVM
virtualization stack, planning for the future, and many
opportunities for attendees to collaborate. After more than ten
years of development in the Linux kernel, KVM continues to be a
critical part of the FOSS cloud infrastructure.

This year, KVM Forum is joining Open Source Summit in Edinburgh, UK. Selected
talks from KVM Forum will be presented on Wednesday October 24 to the full
audience of the Open Source Summit.  Also, attendees of KVM Forum will have
access to all of the talks from Open Source Summit on Wednesday.
https://events.linuxfoundation.org/events/kvm-forum-2018/program/cfp/

Suggested topics:
* Scaling, latency optimizations, performance tuning, real-time guests
* Hardening and security
* New features
* Testing

KVM and the Linux kernel:
* Nested virtualization
* Resource management (CPU, I/O, memory) and scheduling
* VFIO: IOMMU, SR-IOV, virtual GPU, etc.
* Networking: Open vSwitch, XDP, etc.
* virtio and vhost
* Architecture ports and new processor features

QEMU:
* Management interfaces: QOM and QMP
* New devices, new boards, new architectures
* Graphics, desktop virtualization and virtual GPU
* New storage features
* High availability, live migration and fault tolerance
* Emulation and TCG
* Firmware: ACPI, UEFI, coreboot, U-Boot, etc.

Management and infrastructure
* Managing KVM: Libvirt, OpenStack, oVirt, etc.
* Storage: Ceph, Gluster, SPDK, etc.
* Network Function Virtualization: DPDK, OPNFV, OVN, etc.
* Provisioning


===
SUBMITTING YOUR PROPOSAL
===
Abstracts due: June 14, 2018

Please submit a short abstract (~150 words) describing your presentation
proposal. Slots vary in length up to 45 minutes. Also include the proposal
type -- one of:
- technical talk
- end-user talk

Submit your proposal here: http://events.linuxfoundation.org/cfp
Please only use the categories "presentation" and "panel discussion"

You will receive a notification whether or not your presentation proposal
was accepted by August 10, 2018.

Speakers will receive a complimentary pass for the event. If your submission
has multiple presenters, only the primary speaker for the proposal will
receive a complimentary event pass. For panel discussions, all
panelists will receive a complimentary event pass.

TECHNICAL TALKS

A good technical talk should not just report on what has happened over
the last year; it should present a concrete problem and how it impacts
the user and/or developer community. Whenever applicable, focus on
work that needs to be done, difficulties that haven't yet been solved,
and on decisions that other developers should be aware of. Summarizing
recent developments is okay but it should not be more than a small
portion of the overall talk.

END-USER TALKS

One of the big challenges as developers is to know what, where and how
people actually use our software. We will reserve a few slots for end
users talking about their deployment challenges and achievements.

If you are using KVM in production you are encouraged to submit a speaking
proposal. Simply mark it as an end-user talk. As an end user, this is a
unique opportunity to get your input to developers.

HANDS-ON / BOF SESSIONS

We will reserve some time for people to get together and discuss
strategic decisions as well as other topics that are best solved within
smaller groups.

These sessions will be announced during the event. If you are interested
in organizing such a session, please add it to the list at

  http://www.linux-kvm.org/page/KVM_Forum_2018_BOF

Let people who you think might be interested know about your BOF, and encourage
them to add their names to the wiki page as well. Please try to
add your ideas to the list before KVM Forum starts.


PANEL DISCUSSIONS

If you are proposing a panel discussion, please make sure that you list
all of your potential panelists in your abstract. We will request full
biographies if a panel is accepted.


===
HOTEL / TRAVEL
===

This year's event will take place at the Edinburgh International Conference
Centre.
For information about discounted hotel

Re: [Qemu-devel] [PATCH v2] linux-user: update comments to point to tcg_exec_init()

2018-05-17 Thread Laurent Vivier

Le 17/05/2018 à 13:51, Igor Mammedov a écrit :
> cpu_init() was replaced by cpu_create() since 2.12 but comments
> weren't updated. So update stale comments to point out that page
> sizes are actually initialized by tcg_exec_init(). Also move
> another qemu_host_page_size related comment before tcg_exec_init()
> where it belongs.
> 
> Signed-off-by: Igor Mammedov 

Reviewed-by: Laurent Vivier
