Re: [Qemu-devel] Can qemu run on the ARM hardware?

2017-03-28 Thread Wangjintang

> -----Original Message-----
> From: Qemu-devel
> [mailto:qemu-devel-bounces+wangjintang=huawei@nongnu.org] On
> Behalf Of liangy...@zhwei.com
> Sent: Wednesday, March 29, 2017 10:53 AM
> To: qemu-devel
> Subject: [Qemu-devel] Can qemu run on the ARM hardware?
> 
> Dear QEMU Community,
> 
> I am new to QEMU and have some questions.
> My hardware is an ARM board running Ubuntu Linux. I have
> some code that must run on the x86 platform,
> so I want to install VM software like VMware or VirtualBox on the
> ARM board and simulate an x86 machine.
> 
> 1. Can QEMU run on the ARM platform?
Yes. 
> 2. If QEMU supports ARM hardware, when the host is ARM (QEMU running on
> the ARM board), can QEMU simulate an x86 virtual machine?
Yes.
> 3. If QEMU can meet my requirements, where can I find the manual?
http://www.qemu-project.org/documentation/
> 
> Waiting for your reply, thank you very much!
> 
> 
> 
> liangy...@zhwei.com



Re: [Qemu-devel] [PATCH 1/1] target/ppc: Improve accuracy of guest HTM availability on P8s

2017-03-28 Thread David Gibson
On Wed, Mar 29, 2017 at 04:01:28PM +1100, Sam Bobroff wrote:
> On Power8 hosts it is currently theoretically possible for QEMU/KVM-HV guests
> to receive a ibm,pa-features property indicating that HTM support is available
> when it is not.  The situation would occur if the platform firmware of
> a Power8 host cleared the HTM bit of the ibm,pa-features property.
> QEMU would query KVM for the availability of HTM, which will return no
> support, but workaround code in kvm_arch_init_vcpu() would then
> re-enable it because KVM_HV is in use and the processor is P8.
> 
> This patch adjusts the workaround in kvm_arch_init_vcpu() so that it does not
> enable HTM (in the above case) unless the host kernel indicates to the QEMU
> process, via the auxiliary vector, that userspace can use HTM (via the HWCAP2
> bit PPC_FEATURE2_HAS_HTM).
> 
> The reason to use the value from the auxiliary vector is that it is
> set based only on what the host kernel found in the ibm,pa-features
> HTM bit at boot time.
> 
> Signed-off-by: Sam Bobroff 

Applied to ppc-for-2.9.

> ---
>  target/ppc/kvm.c | 8 ++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/target/ppc/kvm.c b/target/ppc/kvm.c
> index 9f1f132cef..8a54709ae4 100644
> --- a/target/ppc/kvm.c
> +++ b/target/ppc/kvm.c
> @@ -49,6 +49,7 @@
>  #if defined(TARGET_PPC64)
>  #include "hw/ppc/spapr_cpu_core.h"
>  #endif
> +#include "elf.h"
>  
>  //#define DEBUG_KVM
>  
> @@ -509,8 +510,11 @@ int kvm_arch_init_vcpu(CPUState *cs)
>  case POWERPC_MMU_2_07:
>  if (!cap_htm && !kvmppc_is_pr(cs->kvm_state)) {
>  /* KVM-HV has transactional memory on POWER8 also without the
> - * KVM_CAP_PPC_HTM extension, so enable it here instead. */
> -cap_htm = true;
> + * KVM_CAP_PPC_HTM extension, so enable it here instead as
> + * long as it's available to userspace on the host. */
> +if (qemu_getauxval(AT_HWCAP2) & PPC_FEATURE2_HAS_HTM) {
> +cap_htm = true;
> +}
>  }
>  break;
>  default:

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [Qemu-devel] [PATCH qemu] spapr_pci: Removed unused include

2017-03-28 Thread David Gibson
On Wed, Mar 29, 2017 at 04:09:58PM +1100, Alexey Kardashevskiy wrote:
> Signed-off-by: Alexey Kardashevskiy 
> ---
> 
> This leftover is just confusing :)

Applied to ppc-for-2.10.

> 
> 
> ---
>  hw/ppc/spapr_pci.c | 2 --
>  1 file changed, 2 deletions(-)
> 
> diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
> index 097ebdd51d..e7567e2e8f 100644
> --- a/hw/ppc/spapr_pci.c
> +++ b/hw/ppc/spapr_pci.c
> @@ -50,8 +50,6 @@
>  #include "sysemu/hostmem.h"
>  #include "sysemu/numa.h"
>  
> -#include "hw/vfio/vfio.h"
> -
>  /* Copied from the kernel arch/powerpc/platforms/pseries/msi.c */
>  #define RTAS_QUERY_FN   0
>  #define RTAS_CHANGE_FN  1

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [Qemu-devel] [PATCH v4 1/8] xen: import ring.h from xen

2017-03-28 Thread Juergen Gross
On 29/03/17 01:54, Stefano Stabellini wrote:
> On Tue, 28 Mar 2017, Juergen Gross wrote:
>> On 28/03/17 00:48, Stefano Stabellini wrote:
>>> On Mon, 27 Mar 2017, Juergen Gross wrote:
 On 24/03/17 18:37, Stefano Stabellini wrote:
> On Fri, 24 Mar 2017, Juergen Gross wrote:
>> On 23/03/17 19:22, Stefano Stabellini wrote:
>>> On Thu, 23 Mar 2017, Paolo Bonzini wrote:
 On 23/03/2017 14:55, Juergen Gross wrote:
> On 23/03/17 14:00, Greg Kurz wrote:
>> On Mon, 20 Mar 2017 11:19:05 -0700
>> Stefano Stabellini  wrote:
>>
>>> Do not use the ring.h header installed on the system. Instead, 
>>> import
>>> the header into the QEMU codebase. This avoids problems when QEMU is
>>> built against a Xen version too old to provide all the ring macros.
>>>
>>> Signed-off-by: Stefano Stabellini 
>>> Reviewed-by: Greg Kurz 
>>> CC: anthony.per...@citrix.com
>>> CC: jgr...@suse.com
>>> ---
>>> NB: The new macros have not been committed to Xen yet. Do not apply 
>>> this
>>> patch until they do.
>>> ---
>>
>> Looking at your other series for the kernel part of this feature:
>>
>> https://lkml.org/lkml/2017/3/22/761
>>
>> I realize that the ring.h header from Xen also exists in the kernel 
>> tree... 
>>
>> Shouldn't all the code that can be used in both kernel and userspace 
>> go to a
>> header file under include/uapi in the kernel tree ? And then we 
>> would import
>> it under include/standard-headers/linux in the QEMU tree and we 
>> could keep it
>> in sync using scripts/update-linux-headers.sh.
>>
>> Cc'ing Paolo for insights.
>
> As Xen isn't part of the kernel we don't want that. You can use and/or
> build qemu with xen-9pfs backend support on an old Linux kernel 
> without
> the related frontend.

 As long as the header changes rarely, I guess it's fine not to go
 through update-linux-headers.sh.
>>>
>>> Very rarely, last time ring.h was changed was 2015, and to introduce a
>>> new macro (which we don't necessarily need in QEMU).
>>>
>>>
> OTOH I don't see the advantage of not using the headers from Xen. This
> is working for qdisk and pvusb backends and for all the Xen libraries.
> Do you expect the 9pfs backend to be used for a qemu version built
> against a Xen version not supporting that backend?
>>>
>>> Yes, I think that is entirely possible: Xen and QEMU versions can mix
>>> and match.
>>>
>>> Keeping in mind that the 9pfs backend has actually no build dependencies
>>> on Xen, except for these new ring.h macros, we have the following
>>> options:
>>>
>>> 1) we build the 9pfs backend only for Xen >= 4.9, because of the new
>>>macros in ring.h that we need
>>
>> Right. You have sent 9pfs support patches for Xen tools. So obviously
>> you need a proper Xen version to use 9pfs. Why not build qemu against
>> it? Do you really expect a new Xen being used with an old qemu while
>> wanting to use new features? That makes no sense to me.
>  
> Tools support is needed to setup the frontend/backend connection as
> usual, but that's not a requirement for building the 9pfs backend. In
> fact, the backend doesn't need any tools support for it to work. The
> macros themselves are just a convenience - the backend would work just
> fine without them. Why restrict the QEMU build gratuitously?

 You are duplicating a header without any real benefit I can see. This
 is adding future work for keeping both versions of the header in sync.

 In which scenario would you want qemu to support xen-9pfs without being
 built against a Xen version supporting xen-9pfs?

 I am not completely against copying the header, I just don't see an
 advantage for any distro or user in doing it.
>>>
>>> I understand your point of view, and honestly it wouldn't be a problem
>>> doing it the way you suggested either. However, I think that going
>>> forward it will be less of a maintenance pain to keep ring.h in sync,
>>> compared to maintaining a versioned build dependency between Xen and
>>> QEMU for the compilation of one PV backend. We do have version checks
>>> in QEMU for Xen compatibility, but not for PV backends or the xenpv
>>> machine yet.
>>
>> For the pvUSB backend I just used a mandatory macro from the header for
>> the #ifdef. The backend will signal support when it was defined during
>> build and will refuse initialization otherwise. Xen tools are able to
>> recognize qemu support of the backend by looking into Xenstore.
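
The build-time detection described for the pvUSB backend can be sketched roughly as follows. This is only an illustration: `HYPOTHETICAL_RING_FEATURE_MACRO`, `SKETCH_BACKEND_BUILT` and `sketch_backend_init()` are made-up names, not identifiers from Xen or QEMU.

```c
#include <stdio.h>

/*
 * HYPOTHETICAL_RING_FEATURE_MACRO stands in for a macro that only newer
 * versions of the Xen header define. It is not a real Xen name.
 */
#ifdef HYPOTHETICAL_RING_FEATURE_MACRO
#define SKETCH_BACKEND_BUILT 1
#else
#define SKETCH_BACKEND_BUILT 0
#endif

/* Returns 0 on success, -1 if the backend was compiled out. */
int sketch_backend_init(void)
{
    if (!SKETCH_BACKEND_BUILT) {
        /* Refuse initialization when the header lacked the macro. */
        fprintf(stderr, "backend not supported by this build\n");
        return -1;
    }
    /* Real initialization would go here. */
    return 0;
}
```

With this pattern the binary always builds, and the lack of support only shows up when the backend is asked to initialize.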
> 
> 
> What do you think of 

Re: [Qemu-devel] [PATCH 1/1] target/ppc: Improve accuracy of guest HTM availability on P8s

2017-03-28 Thread Thomas Huth
On 29.03.2017 07:01, Sam Bobroff wrote:
> On Power8 hosts it is currently theoretically possible for QEMU/KVM-HV guests
> to receive a ibm,pa-features property indicating that HTM support is available
> when it is not.  The situation would occur if the platform firmware of
> a Power8 host cleared the HTM bit of the ibm,pa-features property.

Out of curiosity: Is there a machine out there where this happens?

> QEMU would query KVM for the availability of HTM, which will return no
> support, but workaround code in kvm_arch_init_vcpu() would then
> re-enable it because KVM_HV is in use and the processor is P8.
> 
> This patch adjusts the workaround in kvm_arch_init_vcpu() so that it does not
> enable HTM (in the above case) unless the host kernel indicates to the QEMU
> process, via the auxiliary vector, that userspace can use HTM (via the HWCAP2
> bit PPC_FEATURE2_HAS_HTM).
> 
> The reason to use the value from the auxiliary vector is that it is
> set based only on what the host kernel found in the ibm,pa-features
> HTM bit at boot time.
> 
> Signed-off-by: Sam Bobroff 
> ---
>  target/ppc/kvm.c | 8 ++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/target/ppc/kvm.c b/target/ppc/kvm.c
> index 9f1f132cef..8a54709ae4 100644
> --- a/target/ppc/kvm.c
> +++ b/target/ppc/kvm.c
> @@ -49,6 +49,7 @@
>  #if defined(TARGET_PPC64)
>  #include "hw/ppc/spapr_cpu_core.h"
>  #endif
> +#include "elf.h"
>  
>  //#define DEBUG_KVM
>  
> @@ -509,8 +510,11 @@ int kvm_arch_init_vcpu(CPUState *cs)
>  case POWERPC_MMU_2_07:
>  if (!cap_htm && !kvmppc_is_pr(cs->kvm_state)) {
>  /* KVM-HV has transactional memory on POWER8 also without the
> - * KVM_CAP_PPC_HTM extension, so enable it here instead. */
> -cap_htm = true;
> + * KVM_CAP_PPC_HTM extension, so enable it here instead as
> + * long as it's available to userspace on the host. */
> +if (qemu_getauxval(AT_HWCAP2) & PPC_FEATURE2_HAS_HTM) {
> +cap_htm = true;
> +}

That's a very good idea! ... but I think you could also merge the two
if-statements into one to save one level of indentation.
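
A minimal sketch of the merged condition, written as a standalone function so it can be read outside QEMU. The name `should_enable_htm` and its parameters are hypothetical stand-ins for `cap_htm`, `kvmppc_is_pr()` and `qemu_getauxval(AT_HWCAP2)`, and the bit value is assumed to match `PPC_FEATURE2_HAS_HTM`.

```c
#include <stdbool.h>

/* Assumed to match PPC_FEATURE2_HAS_HTM as used in the patch. */
#define SKETCH_PPC_FEATURE2_HAS_HTM 0x40000000UL

/*
 * Hypothetical helper that returns the new value of cap_htm.
 *   cap_htm: what KVM reported for KVM_CAP_PPC_HTM
 *   is_pr:   stand-in for kvmppc_is_pr(cs->kvm_state)
 *   hwcap2:  stand-in for qemu_getauxval(AT_HWCAP2)
 */
bool should_enable_htm(bool cap_htm, bool is_pr, unsigned long hwcap2)
{
    /* One condition instead of two nested if-statements. */
    if (!cap_htm && !is_pr && (hwcap2 & SKETCH_PPC_FEATURE2_HAS_HTM)) {
        cap_htm = true;
    }
    return cap_htm;
}
```

Because `&&` short-circuits, the auxiliary vector would still only be consulted when the first two tests pass.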

 Thomas




Re: [Qemu-devel] [PATCH v3 7/8] ppc/pnv: link the CPUs to the machine XICSFabric

2017-03-28 Thread David Gibson
On Tue, Mar 28, 2017 at 09:32:31AM +0200, Cédric Le Goater wrote:
> This assigns the ICPState object to the CPU using the PIR number for
> lookups before calling the XICS layer to finish the job.
> 
> Signed-off-by: Cédric Le Goater 
> Reviewed-by: David Gibson 
> ---
>  hw/ppc/pnv.c  |  2 ++
>  hw/ppc/pnv_core.c | 20 
>  2 files changed, 18 insertions(+), 4 deletions(-)
> 
> diff --git a/hw/ppc/pnv.c b/hw/ppc/pnv.c
> index e441b8ac1cad..ae894834892f 100644
> --- a/hw/ppc/pnv.c
> +++ b/hw/ppc/pnv.c
> @@ -711,6 +711,8 @@ static void pnv_chip_realize(DeviceState *dev, Error 
> **errp)
>  object_property_set_int(OBJECT(pnv_core),
>  pcc->core_pir(chip, core_hwid),
>  "pir", &error_fatal);
> +object_property_add_const_link(OBJECT(pnv_core), "xics",
> +   qdev_get_machine(), &error_fatal);
>  object_property_set_bool(OBJECT(pnv_core), true, "realized",
>   &error_fatal);
>  object_unref(OBJECT(pnv_core));
> diff --git a/hw/ppc/pnv_core.c b/hw/ppc/pnv_core.c
> index d79d530b4881..a5e9614dac7d 100644
> --- a/hw/ppc/pnv_core.c
> +++ b/hw/ppc/pnv_core.c
> @@ -25,6 +25,7 @@
>  #include "hw/ppc/pnv.h"
>  #include "hw/ppc/pnv_core.h"
>  #include "hw/ppc/pnv_xscom.h"
> +#include "hw/ppc/xics.h"
>  
>  static void powernv_cpu_reset(void *opaque)
>  {
> @@ -43,7 +44,7 @@ static void powernv_cpu_reset(void *opaque)
>  env->msr |= MSR_HVB; /* Hypervisor mode */
>  }
>  
> -static void powernv_cpu_init(PowerPCCPU *cpu, Error **errp)
> +static void powernv_cpu_init(PowerPCCPU *cpu, XICSFabric *xi, Error **errp)
>  {
>  CPUPPCState *env = &cpu->env;
>  int core_pir;
> @@ -63,6 +64,9 @@ static void powernv_cpu_init(PowerPCCPU *cpu, Error **errp)
>  cpu_ppc_tb_init(env, PNV_TIMEBASE_FREQ);
>  
>  qemu_register_reset(powernv_cpu_reset, cpu);
> +
> +cpu->icp = OBJECT(xics_icp_get(xi, pir->default_value));
> +xics_cpu_setup(xi, cpu);

Hmm.. seems like xics_cpu_setup() should probably set the cpu->icp link..

>  }
>  
>  /*
> @@ -110,7 +114,7 @@ static const MemoryRegionOps pnv_core_xscom_ops = {
>  .endianness = DEVICE_BIG_ENDIAN,
>  };
>  
> -static void pnv_core_realize_child(Object *child, Error **errp)
> +static void pnv_core_realize_child(Object *child, XICSFabric *xi, Error **errp)
>  {
>  Error *local_err = NULL;
>  CPUState *cs = CPU(child);
> @@ -122,7 +126,7 @@ static void pnv_core_realize_child(Object *child, Error 
> **errp)
>  return;
>  }
>  
> -powernv_cpu_init(cpu, &local_err);
> +powernv_cpu_init(cpu, xi, &local_err);
>  if (local_err) {
>  error_propagate(errp, local_err);
>  return;
> @@ -140,6 +144,14 @@ static void pnv_core_realize(DeviceState *dev, Error 
> **errp)
>  void *obj;
>  int i, j;
>  char name[32];
> +Object *xi;
> +
> +xi = object_property_get_link(OBJECT(dev), "xics", &local_err);
> +if (!xi) {
> +error_setg(errp, "%s: required link 'xics' not found: %s",
> +   __func__, error_get_pretty(local_err));
> +return;
> +}
>  
>  pc->threads = g_malloc0(size * cc->nr_threads);
>  for (i = 0; i < cc->nr_threads; i++) {
> @@ -160,7 +172,7 @@ static void pnv_core_realize(DeviceState *dev, Error 
> **errp)
>  for (j = 0; j < cc->nr_threads; j++) {
>  obj = pc->threads + j * size;
>  
> -pnv_core_realize_child(obj, &local_err);
> +pnv_core_realize_child(obj, XICS_FABRIC(xi), &local_err);
>  if (local_err) {
>  goto err;
>  }

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [Qemu-devel] [PATCH v3 5/8] ppc/pnv: create the ICP and ICS objects under the machine

2017-03-28 Thread David Gibson
On Tue, Mar 28, 2017 at 09:32:29AM +0200, Cédric Le Goater wrote:
> As is done for the sPAPR machine, we use a simple array under
> the PowerNV machine to store the Interrupt Control Presenters (ICP)
> objects, one for each vCPU. This array is indexed by 'cpu_index' of
> the CPUState but the users will provide a core PIR number. The mapping
> is done in the icp_get() handler of the machine and is transparent to
> XICS.
> 
> The Interrupt Control Sources (ICS), Processor Service Interface and
> PCI-E interface models, will be introduced in subsequent patches. For
> now, we have none, so we just prepare ground with place holders.
> 
> Finally, to interface with the XICS layer which manipulates the ICP
> and ICS objects, we extend the PowerNV machine with an XICSFabric
> interface and its associated handlers.
> 
> Signed-off-by: Cédric Le Goater 
> ---
> 
>  Changes since v2:
> 
>  - removed the list of ICS. The handlers will iterate on the chips to
>use the available ICS.
> 
>  Changes since v1:
> 
>  - handled pir-to-cpu_index mapping under icp_get 
>  - removed ics_eio handler
>  - changed ICP name indexing
>  - removed sysbus parenting of the ICP object
> 
>  hw/ppc/pnv.c | 96 
> 
>  include/hw/ppc/pnv.h |  3 ++
>  2 files changed, 99 insertions(+)
> 
> diff --git a/hw/ppc/pnv.c b/hw/ppc/pnv.c
> index 3fa722af82e6..e441b8ac1cad 100644
> --- a/hw/ppc/pnv.c
> +++ b/hw/ppc/pnv.c
> @@ -33,7 +33,10 @@
>  #include "exec/address-spaces.h"
>  #include "qemu/cutils.h"
>  #include "qapi/visitor.h"
> +#include "monitor/monitor.h"
> +#include "hw/intc/intc.h"
>  
> +#include "hw/ppc/xics.h"
>  #include "hw/ppc/pnv_xscom.h"
>  
>  #include "hw/isa/isa.h"
> @@ -417,6 +420,23 @@ static void ppc_powernv_init(MachineState *machine)
>  machine->cpu_model = "POWER8";
>  }
>  
> +/* Create the Interrupt Control Presenters before the vCPUs */
> +pnv->nr_servers = pnv->num_chips * smp_cores * smp_threads;
> +pnv->icps = g_new0(PnvICPState, pnv->nr_servers);
> +for (i = 0; i < pnv->nr_servers; i++) {
> +PnvICPState *icp = &pnv->icps[i];
> +char name[32];
> +
> +/* TODO: fix ICP object name to be in sync with the core name */
> +snprintf(name, sizeof(name), "icp[%d]", i);

It may end up being the same value, but since the qom name is exposed
to the outside, it would be better to have it be the PIR, rather than
the cpu_index.

> +object_initialize(icp, sizeof(*icp), TYPE_PNV_ICP);
> +object_property_add_child(OBJECT(pnv), name, OBJECT(icp),
> +  &error_fatal);
> +object_property_add_const_link(OBJECT(icp), "xics", OBJECT(pnv),
> +   &error_fatal);
> +object_property_set_bool(OBJECT(icp), true, "realized", &error_fatal);
> +}
> +
>  /* Create the processor chips */
>  chip_typename = g_strdup_printf(TYPE_PNV_CHIP "-%s", machine->cpu_model);
>  if (!object_class_by_name(chip_typename)) {
> @@ -737,6 +757,71 @@ static const TypeInfo pnv_chip_info = {
>  .abstract  = true,
>  };
>  
> +static ICSState *pnv_ics_get(XICSFabric *xi, int irq)
> +{
> +PnvMachineState *pnv = POWERNV_MACHINE(xi);
> +int i;
> +
> +for (i = 0; i < pnv->num_chips; i++) {
> +/* place holder */
> +}
> +return NULL;
> +}
> +
> +static void pnv_ics_resend(XICSFabric *xi)
> +{
> +PnvMachineState *pnv = POWERNV_MACHINE(xi);
> +int i;
> +
> +for (i = 0; i < pnv->num_chips; i++) {
> +/* place holder */
> +}
> +}

Seems like the above two functions belong in a later patch.

> +
> +static PowerPCCPU *ppc_get_vcpu_by_pir(int pir)
> +{
> +CPUState *cs;
> +
> +CPU_FOREACH(cs) {
> +PowerPCCPU *cpu = POWERPC_CPU(cs);
> +CPUPPCState *env = &cpu->env;
> +
> +if (env->spr_cb[SPR_PIR].default_value == pir) {
> +return cpu;
> +}
> +}
> +
> +return NULL;
> +}
> +
> +static ICPState *pnv_icp_get(XICSFabric *xi, int pir)
> +{
> +PnvMachineState *pnv = POWERNV_MACHINE(xi);
> +PowerPCCPU *cpu = ppc_get_vcpu_by_pir(pir);
> +
> +if (!cpu) {
> +return NULL;
> +}
> +
> +assert(cpu->parent_obj.cpu_index < pnv->nr_servers);
> +return ICP(&pnv->icps[cpu->parent_obj.cpu_index]);

Should use CPU() instead of parent_obj here.
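
The PIR-to-cpu_index lookup done by pnv_icp_get() can be pictured with a self-contained sketch like the one below. The types `SketchCPU` and `SketchICP` and the function `sketch_icp_get()` are hypothetical and only mimic the shape of the QEMU code. Unlike the patch, which asserts on the index, the sketch returns NULL for an out-of-range index.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the vCPU and ICP state. */
typedef struct {
    int pir;        /* value the guest-visible PIR register holds */
    int cpu_index;  /* position of the vCPU in QEMU's CPU list */
} SketchCPU;

typedef struct {
    int placeholder;
} SketchICP;

/*
 * Find the ICP for a given PIR: first locate the CPU whose PIR matches,
 * then use its cpu_index to index the ICP array.
 */
SketchICP *sketch_icp_get(const SketchCPU *cpus, size_t nr_cpus,
                          SketchICP *icps, size_t nr_servers, int pir)
{
    for (size_t i = 0; i < nr_cpus; i++) {
        if (cpus[i].pir == pir) {
            int idx = cpus[i].cpu_index;

            if (idx < 0 || (size_t)idx >= nr_servers) {
                return NULL;
            }
            return &icps[idx];
        }
    }
    return NULL; /* no vCPU with this PIR */
}
```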

> +}
> +
> +static void pnv_pic_print_info(InterruptStatsProvider *obj,
> +   Monitor *mon)
> +{
> +PnvMachineState *pnv = POWERNV_MACHINE(obj);
> +int i;
> +
> +for (i = 0; i < pnv->nr_servers; i++) {
> +icp_pic_print_info(ICP(&pnv->icps[i]), mon);
> +}
> +
> +for (i = 0; i < pnv->num_chips; i++) {
> +/* place holder */
> +}
> +}
> +
>  static void pnv_get_num_chips(Object *obj, Visitor *v, const char *name,
>void *opaque, Error **errp)
>  {
> @@ -787,6 +872,8 @@ static void 

Re: [Qemu-devel] [PATCH qemu] spapr_pci: Warn when RAM page size is not enabled in IOMMU page mask

2017-03-28 Thread David Gibson
On Tue, Mar 28, 2017 at 07:13:49PM +1100, Alexey Kardashevskiy wrote:
> If a page size used by QEMU is not enabled in the PHB IOMMU page mask,
> in-kernel acceleration of TCE handling won't be enabled and performance
> might be slower than expected.
> 
> This prints a warning if system page size is not enabled. This should
> print a warning if huge pages are enabled but sphb.pgsz still uses
> the default value of 4K|64K.
> 
> Signed-off-by: Alexey Kardashevskiy 
> ---
> 
> This is a follow-up to "exec, spapr_pci: Advertise huge IOMMU pages".
> Instead of silently changing PHB properties+behaviour for better
> performance if huge pages are detected, this simply warns the user
> if IOMMU page mask needs to be adjusted. Since the user chooses
> huge pages in the first place, the user can also supply additional
> page masks as well.

Applied to ppc-for-2.10.
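
For illustration, the mask test in the patch can be worked through with concrete numbers. This is a sketch only: `sketch_page_size_enabled()` and the `SKETCH_PAGE_*` constants are hypothetical names, and the 16M huge page size is just an example value.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_4K   0x1000ULL
#define SKETCH_PAGE_64K  0x10000ULL
#define SKETCH_PAGE_16M  0x1000000ULL   /* example huge page size */

/*
 * Each bit in the mask stands for one supported IOMMU page size, so a
 * RAM page size is enabled only if its bit is set in the mask.
 */
bool sketch_page_size_enabled(uint64_t page_size_mask, uint64_t ram_page_size)
{
    return (page_size_mask & ram_page_size) != 0;
}
```

With the default mask of 4K|64K (0x11000), a guest backed by 16M pages fails the test and would trigger the warning; adding 0x1000000 to the mask makes it pass.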

> 
> 
> ---
>  hw/ppc/spapr_pci.c | 6 ++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
> index 98c52e411f..097ebdd51d 100644
> --- a/hw/ppc/spapr_pci.c
> +++ b/hw/ppc/spapr_pci.c
> @@ -1771,6 +1771,12 @@ static void spapr_phb_realize(DeviceState *dev, Error 
> **errp)
>  }
>  
>  /* DMA setup */
> +if ((sphb->page_size_mask & qemu_getrampagesize()) == 0) {
> +error_report("System page size 0x%lx is not enabled in page_size_mask "
> + "(0x%"PRIx64"). Performance may be slow",
> + qemu_getrampagesize(), sphb->page_size_mask);
> +}
> +
>  for (i = 0; i < windows_supported; ++i) {
>  tcet = spapr_tce_new_table(DEVICE(sphb), sphb->dma_liobn[i]);
>  if (!tcet) {

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson




[Qemu-devel] [PATCH qemu] spapr_pci: Removed unused include

2017-03-28 Thread Alexey Kardashevskiy
Signed-off-by: Alexey Kardashevskiy 
---

This leftover is just confusing :)


---
 hw/ppc/spapr_pci.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
index 097ebdd51d..e7567e2e8f 100644
--- a/hw/ppc/spapr_pci.c
+++ b/hw/ppc/spapr_pci.c
@@ -50,8 +50,6 @@
 #include "sysemu/hostmem.h"
 #include "sysemu/numa.h"
 
-#include "hw/vfio/vfio.h"
-
 /* Copied from the kernel arch/powerpc/platforms/pseries/msi.c */
 #define RTAS_QUERY_FN   0
 #define RTAS_CHANGE_FN  1
-- 
2.11.0




Re: [Qemu-devel] [RFC PATCH qemu 1/3] memory: Add get_fd() hook for IOMMU MR

2017-03-28 Thread Alexey Kardashevskiy
On 29/03/17 14:35, David Gibson wrote:
> On Tue, Mar 28, 2017 at 11:48:29AM -0600, Alex Williamson wrote:
>> On Tue, 28 Mar 2017 20:05:28 +1100
>> Alexey Kardashevskiy  wrote:
>>
>>> Signed-off-by: Alexey Kardashevskiy 
>>> ---
>>>  include/exec/memory.h | 2 ++
>>>  hw/ppc/spapr_iommu.c  | 8 
>>>  2 files changed, 10 insertions(+)
>>>
>>> diff --git a/include/exec/memory.h b/include/exec/memory.h
>>> index e39256ad03..925c10b35b 100644
>>> --- a/include/exec/memory.h
>>> +++ b/include/exec/memory.h
>>> @@ -174,6 +174,8 @@ struct MemoryRegionIOMMUOps {
>>>  void (*notify_flag_changed)(MemoryRegion *iommu,
>>>  IOMMUNotifierFlag old_flags,
>>>  IOMMUNotifierFlag new_flags);
>>> +/* Returns a kernel fd for IOMMU */
>>> +int (*get_fd)(MemoryRegion *iommu);
>>
>> What if we used this as a prototype:
>>
>> int (*get_fd)(IOMMUFdType type, MemoryRegion *iommu);
>>
>> And then we defined:
>>
>> typedef enum {
>> SPAPR_IOMMU_TABLE_FD = 0,
>> } IOMMUFdType;
> 
> Are we expecting any new types of fd?  Maybe it would be simpler just
> to name this spapr_tce_fd() or something more specific, and only
> generalize if we really need it for another fd type.


So far we have managed to keep VFIO and sPAPR-IOMMU relatively separate - they
do not include each other's headers and interact via memory regions and
the kernel uapi interface. The only direct connection between VFIO and sPAPR at
all is vfio_eeh_as_ok/vfio_eeh_as_op, which is rather a workaround. I like
this separation tbh.
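
A self-contained sketch of the typed get_fd hook discussed in this thread: the caller asks the region for a specific kind of fd through an ops table and never touches sPAPR-specific types. All `Sketch*` and `sketch_*` names are hypothetical mock-ups, not QEMU's real MemoryRegion API.

```c
#include <stddef.h>

typedef enum {
    SKETCH_SPAPR_IOMMU_TABLE_FD = 0,
    SKETCH_OTHER_FD,                /* hypothetical second fd type */
} SketchIOMMUFdType;

typedef struct SketchRegion SketchRegion;

typedef struct {
    /* Optional hook: returns an fd of the requested type, or -1. */
    int (*get_fd)(SketchIOMMUFdType type, SketchRegion *mr);
} SketchIOMMUOps;

struct SketchRegion {
    const SketchIOMMUOps *iommu_ops;
    int table_fd;                   /* stand-in for tcet->fd */
};

/* Provider side: only answers for the fd type it actually owns. */
int sketch_spapr_get_fd(SketchIOMMUFdType type, SketchRegion *mr)
{
    return type == SKETCH_SPAPR_IOMMU_TABLE_FD ? mr->table_fd : -1;
}

/* Caller side: generic wrapper that tolerates a missing hook. */
int sketch_region_get_fd(SketchIOMMUFdType type, SketchRegion *mr)
{
    if (mr->iommu_ops && mr->iommu_ops->get_fd) {
        return mr->iommu_ops->get_fd(type, mr);
    }
    return -1;
}
```

A caller that gets -1 back knows either that the region has no hook or that it does not provide that fd type, without having to guess what kind of fd it received.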



> 
>>
>> Such that you're actually asking the IOMMUOps for a specific type of FD
>> and it either has it or not, so the caller doesn't need to assume what
>> it is they get back.
>>
>> Furthermore, add:
>>
>> int memory_region_iommu_get_fd(IOMMUFdType type, MemoryRegion *mr)
>> {
>> assert(memory_region_is_iommu(mr));
>>
>> if (mr->iommu_ops && mr->iommu_ops->get_fd) {
>> return mr->iommu_ops->get_fd(type, mr);
>> }
>>
>> return -1;
>> }
>>
>>>  };
>>>
>>
>> This should be two patches, patch 1 above, patch 2 below
>>   
>>>  typedef struct CoalescedMemoryRange CoalescedMemoryRange;
>>> diff --git a/hw/ppc/spapr_iommu.c b/hw/ppc/spapr_iommu.c
>>> index 9e30e148d6..b61c8f053e 100644
>>> --- a/hw/ppc/spapr_iommu.c
>>> +++ b/hw/ppc/spapr_iommu.c
>>> @@ -170,6 +170,13 @@ static void spapr_tce_notify_flag_changed(MemoryRegion 
>>> *iommu,
>>>  }
>>>  }
>>>  
>>> +static int spapr_tce_get_fd(MemoryRegion *iommu)
>>> +{
>>> +sPAPRTCETable *tcet = container_of(iommu, sPAPRTCETable, iommu);
>>> +
>>> +return tcet->fd;
>>
>>
>> This would then be:
>>
>> return type == SPAPR_IOMMU_TABLE_FD ? tcet->fd : -1;
>>
>>> +}
>>> +
>>>  static int spapr_tce_table_post_load(void *opaque, int version_id)
>>>  {
>>>  sPAPRTCETable *tcet = SPAPR_TCE_TABLE(opaque);
>>> @@ -251,6 +258,7 @@ static MemoryRegionIOMMUOps spapr_iommu_ops = {
>>>  .translate = spapr_tce_translate_iommu,
>>>  .get_min_page_size = spapr_tce_get_min_page_size,
>>>  .notify_flag_changed = spapr_tce_notify_flag_changed,
>>> +.get_fd = spapr_tce_get_fd,
>>>  };
>>>  
>>>  static int spapr_tce_table_realize(DeviceState *dev)
>>
> 


-- 
Alexey





[Qemu-devel] [PATCH 0/1] target/ppc: Improve accuracy of guest HTM availability on P8s

2017-03-28 Thread Sam Bobroff

Hi QEMU,

See the patch itself for a description of the issue it's fixing.

Additionally, I've done some investigation on the effect of the patch on older
kernels. The discussion below only refers to the situation in which the
existing workaround would have an effect (system is P8, KVM is HV and KVM does
not indicate support for HTM):

PPC_FEATURE2_HTM has existed since mid 2013 [1], and at that time it was
unconditionally set for P8: nothing will change here because the new test will
always be true, always allowing the workaround to activate. The patch doesn't
help here.

In early 2016 [2] PPC_FEATURE2_HTM was linked to the HTM bit of
ibm,pa-features: the patch will help from here onwards.

So the patch doesn't fix all situations but it doesn't break any either, and it
fixes versions going forward.
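
As a rough userspace illustration of the check the patch relies on, the HWCAP2 word can be read from the auxiliary vector with glibc's getauxval(). This is a sketch only: the helper names are hypothetical, the bit value 0x40000000 is assumed to match PPC_FEATURE2_HTM, and the result is only meaningful on a PowerPC Linux host.

```c
#include <stdbool.h>
#include <sys/auxv.h>   /* getauxval() and AT_HWCAP2, Linux specific */

/* Assumed to match the kernel's PPC_FEATURE2_HTM bit. */
#define SKETCH_PPC_FEATURE2_HTM 0x40000000UL

/* Pure helper: tests the HTM bit in an already fetched HWCAP2 word. */
bool sketch_hwcap2_has_htm(unsigned long hwcap2)
{
    return (hwcap2 & SKETCH_PPC_FEATURE2_HTM) != 0;
}

/*
 * Reads AT_HWCAP2 for the current process. getauxval() returns 0 when
 * the entry is missing, which reads as "no HTM". On non-PowerPC hosts
 * the bit has a different meaning, so the answer should be ignored.
 */
bool sketch_host_userspace_has_htm(void)
{
    return sketch_hwcap2_has_htm(getauxval(AT_HWCAP2));
}
```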

Cheers,
Sam.

1: Around kernel commit cbbc6f1b1433ef553d57826eee87a84ca49645ce (v3.10-rc1)
2: Around kernel commit 4705e02498d6d5a7ab98dfee9595cd5e91db2017 (v4.6-rc1)


Sam Bobroff (1):
  target/ppc: Improve accuracy of guest HTM availability on P8s

 target/ppc/kvm.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

-- 
2.12.1.382.gc0f9c7058




[Qemu-devel] [PATCH 1/1] target/ppc: Improve accuracy of guest HTM availability on P8s

2017-03-28 Thread Sam Bobroff
On Power8 hosts it is currently theoretically possible for QEMU/KVM-HV guests
to receive a ibm,pa-features property indicating that HTM support is available
when it is not.  The situation would occur if the platform firmware of
a Power8 host cleared the HTM bit of the ibm,pa-features property.
QEMU would query KVM for the availability of HTM, which will return no
support, but workaround code in kvm_arch_init_vcpu() would then
re-enable it because KVM_HV is in use and the processor is P8.

This patch adjusts the workaround in kvm_arch_init_vcpu() so that it does not
enable HTM (in the above case) unless the host kernel indicates to the QEMU
process, via the auxiliary vector, that userspace can use HTM (via the HWCAP2
bit PPC_FEATURE2_HAS_HTM).

The reason to use the value from the auxiliary vector is that it is
set based only on what the host kernel found in the ibm,pa-features
HTM bit at boot time.

Signed-off-by: Sam Bobroff 
---
 target/ppc/kvm.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/target/ppc/kvm.c b/target/ppc/kvm.c
index 9f1f132cef..8a54709ae4 100644
--- a/target/ppc/kvm.c
+++ b/target/ppc/kvm.c
@@ -49,6 +49,7 @@
 #if defined(TARGET_PPC64)
 #include "hw/ppc/spapr_cpu_core.h"
 #endif
+#include "elf.h"
 
 //#define DEBUG_KVM
 
@@ -509,8 +510,11 @@ int kvm_arch_init_vcpu(CPUState *cs)
 case POWERPC_MMU_2_07:
 if (!cap_htm && !kvmppc_is_pr(cs->kvm_state)) {
 /* KVM-HV has transactional memory on POWER8 also without the
- * KVM_CAP_PPC_HTM extension, so enable it here instead. */
-cap_htm = true;
+ * KVM_CAP_PPC_HTM extension, so enable it here instead as
+ * long as it's available to userspace on the host. */
+if (qemu_getauxval(AT_HWCAP2) & PPC_FEATURE2_HAS_HTM) {
+cap_htm = true;
+}
 }
 break;
 default:
-- 
2.12.1.382.gc0f9c7058




Re: [Qemu-devel] [PATCH] e1000: disable debug by default

2017-03-28 Thread Jason Wang



On 2017-03-22 11:07, Jason Wang wrote:

Disable debug output by default; the information is not needed for
a release.

Cc: Peter Maydell 
Cc: Stefan Hajnoczi 
Cc: Leonid Bloch 
Cc: Dmitry Fleytman 
Cc: qemu-sta...@nongnu.org
Signed-off-by: Jason Wang 
---
  hw/net/e1000.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/net/e1000.c b/hw/net/e1000.c
index 9324949..f2e5072 100644
--- a/hw/net/e1000.c
+++ b/hw/net/e1000.c
@@ -40,7 +40,7 @@
  
  static const uint8_t bcast[] = {0xff, 0xff, 0xff, 0xff, 0xff, 0xff};
  
-#define E1000_DEBUG

+/* #define E1000_DEBUG */
  
  #ifdef E1000_DEBUG

  enum {


Applied, thanks.



Re: [Qemu-devel] [PATCH for 2.9?] tap-win32: don't abort in tap_enable(); enables -netdev tap

2017-03-28 Thread Jason Wang



On 2017-03-29 12:13, Andrew Baumann via Qemu-devel wrote:

From: Jason Wang [mailto:jasow...@redhat.com]
Sent: Tuesday, 28 March 2017 19:39

On 2017-03-29 02:55, Andrew Baumann wrote:

From: Stefan Weil [mailto:s...@weilnetz.de]
Sent: Tuesday, 28 March 2017 11:28
On 25.03.2017 at 00:46, Andrew Baumann wrote:

The docs generally steer users away from using the legacy -net
parameter, however on win32 attempting to enable a tap device using
-netdev tap fails at an abort() in tap_enable(). Removing the abort()s
seems to be enough to get everything working, so do that.

Signed-off-by: Andrew Baumann

[...]

Jason, what is the use of tap_enable, tap_disable?

It should be only used when we want to enable and disable a specific
queue of a multiqueue supported tap.


   Is it fine
to simply do nothing on Windows here?

Unless windows support multiqueue tap, we should keep the assert here.


I was also hoping for a review -- I'm no expert on this stuff either, but my
quick reading of those code paths is that they issue ioctls to enable/disable
packet reception on the underlying tap device. As win32 TAP is implemented,
that is already enabled from start of day.

It's possible this patch still does not permit dynamic reconfiguration of tap
devices (e.g. from the monitor console). However, it does work with the
-netdev tap option on the command-line.

And is this something for QEMU 2.9 (I added question to subject line)?

Ideally, yes. If not, -netdev tap will continue to blow up in the abort as it
does today...

Andrew

Yes, so the problem is we should prevent tap_enable() and tap_disable()
from being called if multiqueue is disabled.

I believe the following patch can fix this issue, could you give a try
on this?

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index c321680..7d091c9 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -510,6 +510,10 @@ static int peer_attach(VirtIONet *n, int index)
   return 0;
   }

+if (n->max_queues == 1) {
+return 0;
+}
+
   return tap_enable(nc->peer);
   }


Yep, this works. Thanks!

Andrew


Thanks, queued this for 2.9.



Re: [Qemu-devel] [RFC for-2.10 3/3] pseries: Allow PCIe virtio and XHCI on pseries machine type

2017-03-28 Thread David Gibson
On Wed, Mar 29, 2017 at 01:20:50PM +1100, Alexey Kardashevskiy wrote:
> On 28/03/17 13:16, David Gibson wrote:
> > pseries now allows PCIe devices (both emulated and VFIO), although its
> > PCI bus is in most respects a plain PCI bus - this uses paravirtualized
> > access methods to PCIe extended config space defined in the PAPR spec.
> > 
> > However, because the bus is not PCIe, it means that virtio-pci and XHCI
> > devices will present themselves as plain PCI rather than PCIe, which would
> > be preferable.
> > 
> > This patch uses the new hook to override the behaviour for such PCI/PCIe
> > "hybrid" devices to allow PCIe virtio-pci and XHCI on pseries.
> 
> 
> Not clear what all these tests/virtio-*.c changes are for and why here -
> does "make check" break if you do not enforce disable-legacy=off?

Yes it does.  There's an explanation in the 0/3 message.

> > Signed-off-by: David Gibson 
> > ---
> >  hw/ppc/spapr_pci.c   | 9 +
> >  tests/virtio-9p-test.c   | 2 +-
> >  tests/virtio-blk-test.c  | 4 ++--
> >  tests/virtio-net-test.c  | 2 +-
> >  tests/virtio-scsi-test.c | 2 +-
> >  5 files changed, 14 insertions(+), 5 deletions(-)
> > 
> > diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
> > index 98c52e4..7686f7f 100644
> > --- a/hw/ppc/spapr_pci.c
> > +++ b/hw/ppc/spapr_pci.c
> > @@ -1979,6 +1979,14 @@ static const char 
> > *spapr_phb_root_bus_path(PCIHostState *host_bridge,
> >  return sphb->dtbusname;
> >  }
> >  
> > +static bool spapr_phb_allow_hybrid_pcie(PCIHostState *host_bridge,
> > +PCIDevice *pci_dev)
> > +{
> > +sPAPRPHBState *sphb = SPAPR_PCI_HOST_BRIDGE(host_bridge);
> > +
> > +return sphb->pcie_ecs;
> > +}
> > +
> >  static void spapr_phb_class_init(ObjectClass *klass, void *data)
> >  {
> >  PCIHostBridgeClass *hc = PCI_HOST_BRIDGE_CLASS(klass);
> > @@ -1986,6 +1994,7 @@ static void spapr_phb_class_init(ObjectClass *klass, 
> > void *data)
> >  HotplugHandlerClass *hp = HOTPLUG_HANDLER_CLASS(klass);
> >  
> >  hc->root_bus_path = spapr_phb_root_bus_path;
> > +hc->allow_hybrid_pcie = spapr_phb_allow_hybrid_pcie;
> >  dc->realize = spapr_phb_realize;
> >  dc->props = spapr_phb_properties;
> >  dc->reset = spapr_phb_reset;
> > diff --git a/tests/virtio-9p-test.c b/tests/virtio-9p-test.c
> > index 43a1ad8..ae0d51e 100644
> > --- a/tests/virtio-9p-test.c
> > +++ b/tests/virtio-9p-test.c
> > @@ -32,7 +32,7 @@ static QVirtIO9P *qvirtio_9p_start(const char *driver)
> >  {
> >  const char *arch = qtest_get_arch();
> >  const char *cmd = "-fsdev local,id=fsdev0,security_model=none,path=%s "
> > -  "-device %s,fsdev=fsdev0,mount_tag=%s";
> > +  "-device 
> > %s,fsdev=fsdev0,mount_tag=%s,disable-legacy=off";
> >  QVirtIO9P *v9p = g_new0(QVirtIO9P, 1);
> >  
> >  v9p->test_share = g_strdup("/tmp/qtest.XX");
> > diff --git a/tests/virtio-blk-test.c b/tests/virtio-blk-test.c
> > index 1eee95d..5fb7882 100644
> > --- a/tests/virtio-blk-test.c
> > +++ b/tests/virtio-blk-test.c
> > @@ -65,7 +65,7 @@ static QOSState *pci_test_start(void)
> >  const char *cmd = "-drive if=none,id=drive0,file=%s,format=raw "
> >"-drive if=none,id=drive1,file=/dev/null,format=raw "
> >"-device virtio-blk-pci,id=drv0,drive=drive0,"
> > -  "addr=%x.%x";
> > +  "addr=%x.%x,disable-legacy=off";
> >  
> >  tmp_path = drive_create();
> >  
> > @@ -656,7 +656,7 @@ static void pci_hotplug(void)
> >  
> >  /* plug secondary disk */
> >  qpci_plug_device_test("virtio-blk-pci", "drv1", PCI_SLOT_HP,
> > -  "'drive': 'drive1'");
> > +  "'drive': 'drive1', 'disable-legacy': 'off'");
> >  
> >  dev = virtio_blk_pci_init(qs->pcibus, PCI_SLOT_HP);
> >  g_assert(dev);
> > diff --git a/tests/virtio-net-test.c b/tests/virtio-net-test.c
> > index 8f94360..a35d87b 100644
> > --- a/tests/virtio-net-test.c
> > +++ b/tests/virtio-net-test.c
> > @@ -55,7 +55,7 @@ static QOSState *pci_test_start(int socket)
> >  {
> >  const char *arch = qtest_get_arch();
> >  const char *cmd = "-netdev socket,fd=%d,id=hs0 -device "
> > -  "virtio-net-pci,netdev=hs0";
> > +  "virtio-net-pci,netdev=hs0,disable-legacy=off";
> >  
> >  if (strcmp(arch, "i386") == 0 || strcmp(arch, "x86_64") == 0) {
> >  return qtest_pc_boot(cmd, socket);
> > diff --git a/tests/virtio-scsi-test.c b/tests/virtio-scsi-test.c
> > index 0eabd56..5a802d9 100644
> > --- a/tests/virtio-scsi-test.c
> > +++ b/tests/virtio-scsi-test.c
> > @@ -36,7 +36,7 @@ static QOSState *qvirtio_scsi_start(const char 
> > *extra_opts)
> >  {
> >  const char *arch = qtest_get_arch();
> >  const char *cmd = "-drive id=drv0,if=none,file=/dev/null,format=raw "
> > -  "-device 

Re: [Qemu-devel] [PATCH v3 1/8] ppc/xics: introduce an 'icp' backlink under PowerPCCPU

2017-03-28 Thread David Gibson
On Tue, Mar 28, 2017 at 09:32:25AM +0200, Cédric Le Goater wrote:
> Today, the ICPState array of the sPAPR machine is indexed with
> 'cpu_index' of the CPUState. This numbering of CPUs is internal to
> QEMU and the guest only knows about what is exposed in the device
> tree, that is the 'cpu_dt_id'. This is why sPAPR uses the helper
> xics_get_cpu_index_by_dt_id() to do the mapping in a couple of places.
> 
> To provide a more generic XICS layer, we need to abstract the IRQ
> 'server' number and remove any assumption made on its nature. It
> should not be used as a 'cpu_index' for lookups like xics_cpu_setup()
> and xics_cpu_destroy() do.
> 
> To reach that goal, we choose to introduce an 'icp' backlink under
> PowerPCCPU, and let the machine core init routine do the ICPState
> lookup. The resulting object is stored under PowerPCCPU which is
> passed on to xics_cpu_setup(). The IRQ 'server' number in XICS is now
> generic. sPAPR uses 'cpu_dt_id' and PowerNV will use 'PIR' number.
> 
> This also has the benefit of simplifying the sPAPR hcall routines
> which do not need to do any ICPState lookups anymore.

Since you've changed the type to a generic Object *, the name needs to
be changed to something generic as well.  Maybe 'intc' or
'irq_private'.

> 
> Signed-off-by: Cédric Le Goater 
> ---
> 
> Changes since v2:
> 
>  - changed the 'icp' backlink type to be an 'Object'
> 
>  hw/intc/xics.c  |  4 ++--
>  hw/intc/xics_spapr.c| 20 +---
>  hw/ppc/spapr_cpu_core.c |  5 -
>  target/ppc/cpu.h|  1 +
>  4 files changed, 12 insertions(+), 18 deletions(-)
> 
> diff --git a/hw/intc/xics.c b/hw/intc/xics.c
> index e740989a1162..bb485cc5b078 100644
> --- a/hw/intc/xics.c
> +++ b/hw/intc/xics.c
> @@ -52,7 +52,7 @@ int xics_get_cpu_index_by_dt_id(int cpu_dt_id)
>  void xics_cpu_destroy(XICSFabric *xi, PowerPCCPU *cpu)
>  {
>  CPUState *cs = CPU(cpu);
> -ICPState *icp = xics_icp_get(xi, cs->cpu_index);
> +ICPState *icp = ICP(cpu->icp);
>  
>  assert(icp);
>  assert(cs == icp->cs);
> @@ -65,7 +65,7 @@ void xics_cpu_setup(XICSFabric *xi, PowerPCCPU *cpu)
>  {
>  CPUState *cs = CPU(cpu);
>  CPUPPCState *env = &cpu->env;
> -ICPState *icp = xics_icp_get(xi, cs->cpu_index);
> +ICPState *icp = ICP(cpu->icp);
>  ICPStateClass *icpc;
>  
>  assert(icp);
> diff --git a/hw/intc/xics_spapr.c b/hw/intc/xics_spapr.c
> index 84d24b2837a7..6144f9876ae3 100644
> --- a/hw/intc/xics_spapr.c
> +++ b/hw/intc/xics_spapr.c
> @@ -43,11 +43,9 @@
>  static target_ulong h_cppr(PowerPCCPU *cpu, sPAPRMachineState *spapr,
> target_ulong opcode, target_ulong *args)
>  {
> -CPUState *cs = CPU(cpu);
> -ICPState *icp = xics_icp_get(XICS_FABRIC(spapr), cs->cpu_index);
>  target_ulong cppr = args[0];
>  
> -icp_set_cppr(icp, cppr);
> +icp_set_cppr(ICP(cpu->icp), cppr);
>  return H_SUCCESS;
>  }
>  
> @@ -69,9 +67,7 @@ static target_ulong h_ipi(PowerPCCPU *cpu, 
> sPAPRMachineState *spapr,
>  static target_ulong h_xirr(PowerPCCPU *cpu, sPAPRMachineState *spapr,
> target_ulong opcode, target_ulong *args)
>  {
> -CPUState *cs = CPU(cpu);
> -ICPState *icp = xics_icp_get(XICS_FABRIC(spapr), cs->cpu_index);
> -uint32_t xirr = icp_accept(icp);
> +uint32_t xirr = icp_accept(ICP(cpu->icp));
>  
>  args[0] = xirr;
>  return H_SUCCESS;
> @@ -80,9 +76,7 @@ static target_ulong h_xirr(PowerPCCPU *cpu, 
> sPAPRMachineState *spapr,
>  static target_ulong h_xirr_x(PowerPCCPU *cpu, sPAPRMachineState *spapr,
>   target_ulong opcode, target_ulong *args)
>  {
> -CPUState *cs = CPU(cpu);
> -ICPState *icp = xics_icp_get(XICS_FABRIC(spapr), cs->cpu_index);
> -uint32_t xirr = icp_accept(icp);
> +uint32_t xirr = icp_accept(ICP(cpu->icp));
>  
>  args[0] = xirr;
>  args[1] = cpu_get_host_ticks();
> @@ -92,21 +86,17 @@ static target_ulong h_xirr_x(PowerPCCPU *cpu, 
> sPAPRMachineState *spapr,
>  static target_ulong h_eoi(PowerPCCPU *cpu, sPAPRMachineState *spapr,
>target_ulong opcode, target_ulong *args)
>  {
> -CPUState *cs = CPU(cpu);
> -ICPState *icp = xics_icp_get(XICS_FABRIC(spapr), cs->cpu_index);
>  target_ulong xirr = args[0];
>  
> -icp_eoi(icp, xirr);
> +icp_eoi(ICP(cpu->icp), xirr);
>  return H_SUCCESS;
>  }
>  
>  static target_ulong h_ipoll(PowerPCCPU *cpu, sPAPRMachineState *spapr,
>  target_ulong opcode, target_ulong *args)
>  {
> -CPUState *cs = CPU(cpu);
> -ICPState *icp = xics_icp_get(XICS_FABRIC(spapr), cs->cpu_index);
>  uint32_t mfrr;
> -uint32_t xirr = icp_ipoll(icp, &mfrr);
> +uint32_t xirr = icp_ipoll(ICP(cpu->icp), &mfrr);
>  
>  args[0] = xirr;
>  args[1] = mfrr;
> diff --git a/hw/ppc/spapr_cpu_core.c b/hw/ppc/spapr_cpu_core.c
> index 6883f0991ae9..f9ca3f09a0f8 100644
> --- a/hw/ppc/spapr_cpu_core.c
> +++ 

[Qemu-devel] [PATCH] tcg/i386: Display AMD HT warning only for KVM

2017-03-28 Thread Pranith Kumar
TCG uses the AMD cpu which warns when we use hyperthreading. Disable
the warning for TCG since it is not necessary.

Signed-off-by: Pranith Kumar 
---
 target/i386/cpu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 7aa762245a..66242893b6 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -3647,7 +3647,7 @@ static void x86_cpu_realizefn(DeviceState *dev, Error 
**errp)
  * NOTE: the following code has to follow qemu_init_vcpu(). Otherwise
  * cs->nr_threads hasn't be populated yet and the checking is incorrect.
  */
-if (!IS_INTEL_CPU(env) && cs->nr_threads > 1 && !ht_warned) {
+if (!IS_INTEL_CPU(env) && cs->nr_threads > 1 && !ht_warned && kvm_enabled()) {
 error_report("AMD CPU doesn't support hyperthreading. Please configure"
  " -smp options properly.");
 ht_warned = true;
-- 
2.11.0




Re: [Qemu-devel] [RFC PATCH qemu 3/3] vfio: Enable in-kernel acceleration via VFIO KVM device

2017-03-28 Thread Alexey Kardashevskiy
On 29/03/17 04:48, Alex Williamson wrote:
> On Tue, 28 Mar 2017 20:05:30 +1100
> Alexey Kardashevskiy  wrote:
> 
>> This enables in-kernel acceleration of TCE update requests via
>> VFIO KVM device.
>>
>> Signed-off-by: Alexey Kardashevskiy 
>> ---
>>  include/hw/vfio/vfio-common.h |  1 +
>>  target/ppc/kvm_ppc.h  |  6 ++
>>  hw/ppc/spapr_iommu.c  |  4 
>>  hw/vfio/common.c  | 13 +
>>  hw/vfio/spapr.c   | 26 ++
>>  target/ppc/kvm.c  |  7 ++-
>>  hw/vfio/trace-events  |  1 +
>>  7 files changed, 57 insertions(+), 1 deletion(-)
>>
>> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
>> index c582de18c9..ee8c96cc4a 100644
>> --- a/include/hw/vfio/vfio-common.h
>> +++ b/include/hw/vfio/vfio-common.h
> 
> Two patches intermixed here again it seems.  I'll refer to them as "A"
> and "B".  Seems easy to split at the file level.
> 
> Patch "B"
> 
>> @@ -175,6 +175,7 @@ extern const MemoryListener vfio_prereg_listener;
>>  int vfio_spapr_create_window(VFIOContainer *container,
>>   MemoryRegionSection *section,
>>   hwaddr *pgsize);
>> +int vfio_spapr_notify_kvm(int vfio_kvm_device_fd, int groupfd, int tablefd);
>>  int vfio_spapr_remove_window(VFIOContainer *container,
>>   hwaddr offset_within_address_space);
>>  
>> diff --git a/target/ppc/kvm_ppc.h b/target/ppc/kvm_ppc.h
>> index f48243d13f..ce7327a4e0 100644
>> --- a/target/ppc/kvm_ppc.h
>> +++ b/target/ppc/kvm_ppc.h
> 
> Patch "A"
> 
>> @@ -46,6 +46,7 @@ void *kvmppc_create_spapr_tce(uint32_t liobn, uint32_t 
>> page_shift,
>>  int kvmppc_remove_spapr_tce(void *table, int pfd, uint32_t window_size);
>>  int kvmppc_reset_htab(int shift_hint);
>>  uint64_t kvmppc_rma_size(uint64_t current_size, unsigned int hash_shift);
>> +bool kvmppc_has_cap_spapr_vfio(void);
>>  #endif /* !CONFIG_USER_ONLY */
>>  bool kvmppc_has_cap_epr(void);
>>  int kvmppc_define_rtas_kernel_token(uint32_t token, const char *function);
>> @@ -216,6 +217,11 @@ static inline bool 
>> kvmppc_is_mem_backend_page_size_ok(char *obj_path)
>>  return true;
>>  }
>>  
>> +static inline bool kvmppc_has_cap_spapr_vfio(void)
>> +{
>> +return false;
>> +}
>> +
>>  #endif /* !CONFIG_USER_ONLY */
>>  
>>  static inline bool kvmppc_has_cap_epr(void)
>> diff --git a/hw/ppc/spapr_iommu.c b/hw/ppc/spapr_iommu.c
>> index b61c8f053e..fc23d81645 100644
>> --- a/hw/ppc/spapr_iommu.c
>> +++ b/hw/ppc/spapr_iommu.c
> 
> Patch "A"
> 
>> @@ -293,6 +293,10 @@ void spapr_tce_set_need_vfio(sPAPRTCETable *tcet, bool 
>> need_vfio)
>>  
>>  tcet->need_vfio = need_vfio;
>>  
>> +if (!need_vfio || (tcet->fd != -1 && kvmppc_has_cap_spapr_vfio())) {
>> +return;
>> +}


Separating into "A" and "B" makes sense most of the time; however, this bit,
if put into "A", will look at the capability and change the behaviour,
effectively disabling TCE request handling in the kernel, as
vfio_spapr_notify_kvm() only appears in "B". Bad for bisectability.

I could swap "A" and "B"; this way vfio_spapr_notify_kvm() would fail but
things would keep working.



>> +
>>  oldtable = tcet->table;
>>  
>>  tcet->table = spapr_tce_alloc_table(tcet->liobn,
>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
>> index c75c7594d5..9aaf861904 100644
>> --- a/hw/vfio/common.c
>> +++ b/hw/vfio/common.c
> 
> Patch "B"
> 
>> @@ -440,6 +440,19 @@ static void vfio_listener_region_add(MemoryListener 
>> *listener,
>>  goto fail;
>>  }
>>  
>> +#ifdef CONFIG_KVM
> 
> I don't think we need this just for kvm_enabled(), do we?


We do for vfio_kvm_device_fd - this one is defined under #ifdef.


> 
>> +if (kvm_enabled() && section->mr->iommu_ops->get_fd) {
>> +VFIOGroup *group;
>> +int tablefd =  section->mr->iommu_ops->get_fd(section->mr);
> 
> This would change to
> 
> tablefd=memory_region_iommu_get_fd(SPAPR_IOMMU_TABLE_FD,section->mr);
> 
>> +
>> +if (tablefd != -1) {
>> +QLIST_FOREACH(group, &container->group_list, container_next) {
>> +vfio_spapr_notify_kvm(vfio_kvm_device_fd,
>> +  group->fd, tablefd);
>> +}
>> +}
>> +}
>> +#endif
>>  vfio_host_win_add(container, section->offset_within_address_space,
>>section->offset_within_address_space +
>>int128_get64(section->size) - 1, pgsize);
>> diff --git a/hw/vfio/spapr.c b/hw/vfio/spapr.c
>> index 4409bcc0d7..dffef3bd5f 100644
>> --- a/hw/vfio/spapr.c
>> +++ b/hw/vfio/spapr.c
> 
> Patch "B"
> 
>> @@ -17,6 +17,9 @@
>>  #include "hw/hw.h"
>>  #include "qemu/error-report.h"
>>  #include "trace.h"
>> +#ifdef CONFIG_KVM
>> +#include "linux/kvm.h"
>> +#endif
>>  
>>  static bool 

Re: [Qemu-devel] [PATCH for 2.9?] tap-win32: don't abort in tap_enable(); enables -netdev tap

2017-03-28 Thread Andrew Baumann via Qemu-devel
> From: Jason Wang [mailto:jasow...@redhat.com]
> Sent: Tuesday, 28 March 2017 19:39
> 
> On 2017年03月29日 02:55, Andrew Baumann wrote:
> >> From: Stefan Weil [mailto:s...@weilnetz.de]
> >> Sent: Tuesday, 28 March 2017 11:28
> >> Am 25.03.2017 um 00:46 schrieb Andrew Baumann:
> >>> The docs generally steer users away from using the legacy -net
> >>> parameter, however on win32 attempting to enable a tap device using
> >>> -netdev tap fails at an abort() in tap_enable(). Removing the abort()s
> >>> seems to be enough to get everything working, so do that.
> >>>
> >>> Signed-off-by: Andrew Baumann 
[...]
> >> Jason, what is the use of tap_enable, tap_disable?
> 
> It should be only used when we want to enable and disable a specific
> queue of a multiqueue supported tap.
> 
> >>   Is it fine
> >> to simply do nothing on Windows here?
> 
> Unless windows support multiqueue tap, we should keep the assert here.
> 
> > I was also hoping for a review -- I'm no expert on this stuff either, but my
> quick reading of those code paths is that they issue ioctls to enable/disable
> packet reception on the underlying tap device. As win32 TAP is implemented,
> that is already enabled from start of day.
> >
> > It's possible this patch still does not permit dynamic reconfiguration of 
> > tap
> devices (e.g. from the monitor console). However, it does work with the -
> netdev tap option on the command-line.
> >
> >> And is this something for QEMU‌ 2.9 (I added question to subject line)?
> > Ideally, yes. If not, -netdev tap will continue to blow up in the abort as 
> > it does
> today...
> >
> > Andrew
> 
> Yes, so the problem is we should prevent tap_enable() and tap_disable()
> from being called if multiqueue is disabled.
> 
> I believe the following patch can fix this issue, could you give a try
> on this?
> 
> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> index c321680..7d091c9 100644
> --- a/hw/net/virtio-net.c
> +++ b/hw/net/virtio-net.c
> @@ -510,6 +510,10 @@ static int peer_attach(VirtIONet *n, int index)
>   return 0;
>   }
> 
> +if (n->max_queues == 1) {
> +return 0;
> +}
> +
>   return tap_enable(nc->peer);
>   }
> 

Yep, this works. Thanks!

Andrew


[Qemu-devel] [PATCH for 2.9] vhost: generalize iommu memory region

2017-03-28 Thread Jason Wang
We assume the iommu_ops are attached to the root region of the address
space. This may not be true for all kinds of IOMMU implementations,
especially after commit 3716d5902d74 ("pci: introduce a bus master
container"). So fix this by not assuming as->root has iommu_ops,
and instead depend on the regions reported by the memory listener:

- register a memory listener to dma_as
- during region_add, if it's a region of IOMMU, register a specific
  IOMMU notifier, and store all notifiers in a list.
- during region_del, compare and delete the IOMMU notifier from the list

This is also a must for making the vhost device IOTLB work for all types
of IOMMUs. Note that since we register one notifier during each
.region_add, the IOTLB may be flushed more than once; this is
suboptimal and could be optimized in the future.

Reported-by: Maxime Coquelin 
Fixes: 3716d5902d74 ("pci: introduce a bus master container")
Cc: Peter Xu 
Signed-off-by: Jason Wang 
---
 hw/virtio/vhost.c | 84 ---
 include/hw/virtio/vhost.h | 11 +++
 2 files changed, 75 insertions(+), 20 deletions(-)

diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index febe519..613494d 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -425,10 +425,8 @@ static inline void vhost_dev_log_resize(struct vhost_dev 
*dev, uint64_t size)
 static int vhost_dev_has_iommu(struct vhost_dev *dev)
 {
 VirtIODevice *vdev = dev->vdev;
-AddressSpace *dma_as = vdev->dma_as;
 
-return memory_region_is_iommu(dma_as->root) &&
-   virtio_host_has_feature(vdev, VIRTIO_F_IOMMU_PLATFORM);
+return virtio_host_has_feature(vdev, VIRTIO_F_IOMMU_PLATFORM);
 }
 
 static void *vhost_memory_map(struct vhost_dev *dev, hwaddr addr,
@@ -720,6 +718,63 @@ static void vhost_region_del(MemoryListener *listener,
 }
 }
 
+static void vhost_iommu_unmap_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
+{
+struct vhost_iommu *iommu = container_of(n, struct vhost_iommu, n);
+struct vhost_dev *hdev = iommu->hdev;
+hwaddr iova = iotlb->iova + iommu->iommu_offset;
+
+if (hdev->vhost_ops->vhost_invalidate_device_iotlb(hdev, iova,
+   iotlb->addr_mask + 1)) {
+error_report("Fail to invalidate device iotlb");
+}
+}
+
+static void vhost_iommu_region_add(MemoryListener *listener,
+   MemoryRegionSection *section)
+{
+struct vhost_dev *dev = container_of(listener, struct vhost_dev,
+ iommu_listener);
+struct vhost_iommu *iommu;
+
+if (!memory_region_is_iommu(section->mr)) {
+return;
+}
+
+iommu = g_malloc0(sizeof(*iommu));
+iommu->n.notify = vhost_iommu_unmap_notify;
+iommu->n.notifier_flags = IOMMU_NOTIFIER_UNMAP;
+iommu->mr = section->mr;
+iommu->iommu_offset = section->offset_within_address_space -
+  section->offset_within_region;
+iommu->hdev = dev;
+memory_region_register_iommu_notifier(section->mr, &iommu->n);
+QLIST_INSERT_HEAD(&dev->iommu_list, iommu, iommu_next);
+/* TODO: can replay help performance here? */
+}
+
+static void vhost_iommu_region_del(MemoryListener *listener,
+   MemoryRegionSection *section)
+{
+struct vhost_dev *dev = container_of(listener, struct vhost_dev,
+ iommu_listener);
+struct vhost_iommu *iommu;
+
+if (!memory_region_is_iommu(section->mr)) {
+return;
+}
+
+QLIST_FOREACH(iommu, &dev->iommu_list, iommu_next) {
+if (iommu->mr == section->mr) {
+memory_region_unregister_iommu_notifier(iommu->mr,
+&iommu->n);
+QLIST_REMOVE(iommu, iommu_next);
+g_free(iommu);
+break;
+}
+}
+}
+
 static void vhost_region_nop(MemoryListener *listener,
  MemoryRegionSection *section)
 {
@@ -1161,17 +1216,6 @@ static void vhost_virtqueue_cleanup(struct 
vhost_virtqueue *vq)
 event_notifier_cleanup(&vq->masked_notifier);
 }
 
-static void vhost_iommu_unmap_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
-{
-struct vhost_dev *hdev = container_of(n, struct vhost_dev, n);
-
-if (hdev->vhost_ops->vhost_invalidate_device_iotlb(hdev,
-   iotlb->iova,
-   iotlb->addr_mask + 1)) {
-error_report("Fail to invalidate device iotlb");
-}
-}
-
 int vhost_dev_init(struct vhost_dev *hdev, void *opaque,
VhostBackendType backend_type, uint32_t busyloop_timeout)
 {
@@ -1244,8 +1288,10 @@ int vhost_dev_init(struct vhost_dev *hdev, void *opaque,
 .priority = 10
 };
 
-hdev->n.notify = vhost_iommu_unmap_notify;
-hdev->n.notifier_flags = IOMMU_NOTIFIER_UNMAP;
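
The bookkeeping described in the commit message (add one notifier per IOMMU region in region_add, find and drop it again in region_del) can be sketched on its own. This is a minimal illustration under stated assumptions: struct sketch_notifier, struct sketch_dev and the sketch_* functions are hypothetical, and a plain singly linked list stands in for QEMU's QLIST macros and the real notifier registration calls.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical per-region notifier record; not the real struct vhost_iommu. */
struct sketch_notifier {
    const void *mr;                 /* region this notifier watches */
    struct sketch_notifier *next;
};

/* Hypothetical device holding the list of active notifiers. */
struct sketch_dev {
    struct sketch_notifier *iommu_list;
};

/* region_add: allocate one notifier for the region and link it in. */
static void sketch_region_add(struct sketch_dev *dev, const void *mr)
{
    struct sketch_notifier *n = calloc(1, sizeof(*n));

    assert(n != NULL);
    n->mr = mr;
    n->next = dev->iommu_list;
    dev->iommu_list = n;
}

/* region_del: find the notifier matching the region, unlink and free it. */
static void sketch_region_del(struct sketch_dev *dev, const void *mr)
{
    struct sketch_notifier **pp = &dev->iommu_list;

    while (*pp) {
        if ((*pp)->mr == mr) {
            struct sketch_notifier *victim = *pp;

            *pp = victim->next;
            free(victim);
            return;
        }
        pp = &(*pp)->next;
    }
}

/* Helper for inspection only: number of notifiers currently registered. */
static int sketch_count(const struct sketch_dev *dev)
{
    int count = 0;

    for (const struct sketch_notifier *n = dev->iommu_list; n; n = n->next) {
        count++;
    }
    return count;
}
```

Matching on the region pointer is what lets region_del remove exactly the notifier that the corresponding region_add created.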

Re: [Qemu-devel] [RFC PATCH qemu 3/3] vfio: Enable in-kernel acceleration via VFIO KVM device

2017-03-28 Thread David Gibson
On Tue, Mar 28, 2017 at 08:05:30PM +1100, Alexey Kardashevskiy wrote:
> This enables in-kernel acceleration of TCE update requests via
> VFIO KVM device.
> 
> Signed-off-by: Alexey Kardashevskiy 
> ---
>  include/hw/vfio/vfio-common.h |  1 +
>  target/ppc/kvm_ppc.h  |  6 ++
>  hw/ppc/spapr_iommu.c  |  4 
>  hw/vfio/common.c  | 13 +
>  hw/vfio/spapr.c   | 26 ++
>  target/ppc/kvm.c  |  7 ++-
>  hw/vfio/trace-events  |  1 +
>  7 files changed, 57 insertions(+), 1 deletion(-)
> 
> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
> index c582de18c9..ee8c96cc4a 100644
> --- a/include/hw/vfio/vfio-common.h
> +++ b/include/hw/vfio/vfio-common.h
> @@ -175,6 +175,7 @@ extern const MemoryListener vfio_prereg_listener;
>  int vfio_spapr_create_window(VFIOContainer *container,
>   MemoryRegionSection *section,
>   hwaddr *pgsize);
> +int vfio_spapr_notify_kvm(int vfio_kvm_device_fd, int groupfd, int tablefd);
>  int vfio_spapr_remove_window(VFIOContainer *container,
>   hwaddr offset_within_address_space);
>  
> diff --git a/target/ppc/kvm_ppc.h b/target/ppc/kvm_ppc.h
> index f48243d13f..ce7327a4e0 100644
> --- a/target/ppc/kvm_ppc.h
> +++ b/target/ppc/kvm_ppc.h
> @@ -46,6 +46,7 @@ void *kvmppc_create_spapr_tce(uint32_t liobn, uint32_t 
> page_shift,
>  int kvmppc_remove_spapr_tce(void *table, int pfd, uint32_t window_size);
>  int kvmppc_reset_htab(int shift_hint);
>  uint64_t kvmppc_rma_size(uint64_t current_size, unsigned int hash_shift);
> +bool kvmppc_has_cap_spapr_vfio(void);
>  #endif /* !CONFIG_USER_ONLY */
>  bool kvmppc_has_cap_epr(void);
>  int kvmppc_define_rtas_kernel_token(uint32_t token, const char *function);
> @@ -216,6 +217,11 @@ static inline bool 
> kvmppc_is_mem_backend_page_size_ok(char *obj_path)
>  return true;
>  }
>  
> +static inline bool kvmppc_has_cap_spapr_vfio(void)
> +{
> +return false;
> +}
> +
>  #endif /* !CONFIG_USER_ONLY */
>  
>  static inline bool kvmppc_has_cap_epr(void)
> diff --git a/hw/ppc/spapr_iommu.c b/hw/ppc/spapr_iommu.c
> index b61c8f053e..fc23d81645 100644
> --- a/hw/ppc/spapr_iommu.c
> +++ b/hw/ppc/spapr_iommu.c
> @@ -293,6 +293,10 @@ void spapr_tce_set_need_vfio(sPAPRTCETable *tcet, bool 
> need_vfio)
>  
>  tcet->need_vfio = need_vfio;
>  
> +if (!need_vfio || (tcet->fd != -1 && kvmppc_has_cap_spapr_vfio())) {
> +return;
> +}
> +
>  oldtable = tcet->table;
>  
>  tcet->table = spapr_tce_alloc_table(tcet->liobn,
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index c75c7594d5..9aaf861904 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -440,6 +440,19 @@ static void vfio_listener_region_add(MemoryListener 
> *listener,
>  goto fail;
>  }
>  
> +#ifdef CONFIG_KVM
> +if (kvm_enabled() && section->mr->iommu_ops->get_fd) {
> +VFIOGroup *group;
> +int tablefd =  section->mr->iommu_ops->get_fd(section->mr);
> +
> +if (tablefd != -1) {
> +QLIST_FOREACH(group, &container->group_list, container_next) {
> +vfio_spapr_notify_kvm(vfio_kvm_device_fd,
> +  group->fd, tablefd);

This is only going to make sense if we have both PAPR-style TCE tables
on the guest and TCE-based IOMMU backend on the host.  In which case
wouldn't it make more sense to explicitly verify that, and upcast,
rather than adding a new vaguely-specified get_fd hook?

> +}
> +}
> +}
> +#endif
>  vfio_host_win_add(container, section->offset_within_address_space,
>section->offset_within_address_space +
>int128_get64(section->size) - 1, pgsize);
> diff --git a/hw/vfio/spapr.c b/hw/vfio/spapr.c
> index 4409bcc0d7..dffef3bd5f 100644
> --- a/hw/vfio/spapr.c
> +++ b/hw/vfio/spapr.c
> @@ -17,6 +17,9 @@
>  #include "hw/hw.h"
>  #include "qemu/error-report.h"
>  #include "trace.h"
> +#ifdef CONFIG_KVM
> +#include "linux/kvm.h"
> +#endif
>  
>  static bool vfio_prereg_listener_skipped_section(MemoryRegionSection 
> *section)
>  {
> @@ -187,6 +190,29 @@ int vfio_spapr_create_window(VFIOContainer *container,
>  return 0;
>  }
>  
> +int vfio_spapr_notify_kvm(int vfio_kvm_device_fd, int groupfd, int tablefd)
> +{
> +#ifdef CONFIG_KVM
> +struct kvm_vfio_spapr_tce param = {
> +.groupfd = groupfd,
> +.tablefd = tablefd
> +};
> +struct kvm_device_attr attr = {
> +.group = KVM_DEV_VFIO_GROUP,
> +.attr = KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE,
> +.addr = (uint64_t)(unsigned long)&param,
> +};
> +
> +if (ioctl(vfio_kvm_device_fd, KVM_SET_DEVICE_ATTR, &attr)) {
> +error_report("vfio: failed to setup fd %d for a group with fd %d: 
> %s",
> + 

Re: [Qemu-devel] [RFC PATCH qemu 1/3] memory: Add get_fd() hook for IOMMU MR

2017-03-28 Thread David Gibson
On Tue, Mar 28, 2017 at 11:48:29AM -0600, Alex Williamson wrote:
> On Tue, 28 Mar 2017 20:05:28 +1100
> Alexey Kardashevskiy  wrote:
> 
> > Signed-off-by: Alexey Kardashevskiy 
> > ---
> >  include/exec/memory.h | 2 ++
> >  hw/ppc/spapr_iommu.c  | 8 
> >  2 files changed, 10 insertions(+)
> > 
> > diff --git a/include/exec/memory.h b/include/exec/memory.h
> > index e39256ad03..925c10b35b 100644
> > --- a/include/exec/memory.h
> > +++ b/include/exec/memory.h
> > @@ -174,6 +174,8 @@ struct MemoryRegionIOMMUOps {
> >  void (*notify_flag_changed)(MemoryRegion *iommu,
> >  IOMMUNotifierFlag old_flags,
> >  IOMMUNotifierFlag new_flags);
> > +/* Returns a kernel fd for IOMMU */
> > +int (*get_fd)(MemoryRegion *iommu);
> 
> What if we used this as a prototype:
> 
> int (*get_fd)(IOMMUFdType type, MemoryRegion *iommu);
> 
> And then we defined:
> 
> typedef enum {
> SPAPR_IOMMU_TABLE_FD = 0,
> } IOMMUFdType;

Are we expecting any new types of fd?  Maybe it would be simpler just
to name this spapr_tce_fd() or something more specific, and only
generalize if we really need it for another fd type.

> 
> Such that you're actually asking the IOMMUOps for a specific type of FD
> and it either has it or not, so the caller doesn't need to assume what
> it is they get back.
> 
> Furthermore, add:
> 
> int memory_region_iommu_get_fd(IOMMUFdType type, MemoryRegion *mr)
> {
> assert(memory_region_is_iommu(mr));
> 
> if (mr->iommu_ops && mr->iommu_ops->get_fd) {
> return mr->iommu_ops->get_fd(type, mr);
> }
> 
> return -1;
> }
> 
> >  };
> >
> 
> This should be two patches, patch 1 above, patch 2 below
>   
> >  typedef struct CoalescedMemoryRange CoalescedMemoryRange;
> > diff --git a/hw/ppc/spapr_iommu.c b/hw/ppc/spapr_iommu.c
> > index 9e30e148d6..b61c8f053e 100644
> > --- a/hw/ppc/spapr_iommu.c
> > +++ b/hw/ppc/spapr_iommu.c
> > @@ -170,6 +170,13 @@ static void spapr_tce_notify_flag_changed(MemoryRegion 
> > *iommu,
> >  }
> >  }
> >  
> > +static int spapr_tce_get_fd(MemoryRegion *iommu)
> > +{
> > +sPAPRTCETable *tcet = container_of(iommu, sPAPRTCETable, iommu);
> > +
> > +return tcet->fd;
> 
> 
> This would then be:
> 
> return type == SPAPR_IOMMU_TABLE_FD ? tcet->fd : -1;
> 
> > +}
> > +
> >  static int spapr_tce_table_post_load(void *opaque, int version_id)
> >  {
> >  sPAPRTCETable *tcet = SPAPR_TCE_TABLE(opaque);
> > @@ -251,6 +258,7 @@ static MemoryRegionIOMMUOps spapr_iommu_ops = {
> >  .translate = spapr_tce_translate_iommu,
> >  .get_min_page_size = spapr_tce_get_min_page_size,
> >  .notify_flag_changed = spapr_tce_notify_flag_changed,
> > +.get_fd = spapr_tce_get_fd,
> >  };
> >  
> >  static int spapr_tce_table_realize(DeviceState *dev)
> 
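
The typed lookup suggested in the quoted text can be sketched standalone. This is only a rough illustration under stated assumptions: every Sketch/sketch_ name below is hypothetical, including the second enum value, and the structs are simplified stand-ins rather than the real MemoryRegion or MemoryRegionIOMMUOps.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical fd type tag, modelled on the IOMMUFdType idea above. */
typedef enum {
    SKETCH_SPAPR_IOMMU_TABLE_FD = 0,
    SKETCH_OTHER_FD,    /* invented second type, for illustration only */
} SketchFdType;

struct sketch_region;

/* Simplified stand-in for the IOMMU ops table. */
struct sketch_iommu_ops {
    int (*get_fd)(SketchFdType type, struct sketch_region *mr);
};

/* Simplified stand-in for an IOMMU memory region. */
struct sketch_region {
    const struct sketch_iommu_ops *iommu_ops;
    int table_fd;
};

/* Backend hook: only answers for the fd type it actually owns. */
static int sketch_spapr_get_fd(SketchFdType type, struct sketch_region *mr)
{
    return type == SKETCH_SPAPR_IOMMU_TABLE_FD ? mr->table_fd : -1;
}

/* Generic wrapper: returns -1 when the region has no hook at all. */
static int sketch_region_get_fd(SketchFdType type, struct sketch_region *mr)
{
    assert(mr != NULL);

    if (mr->iommu_ops && mr->iommu_ops->get_fd) {
        return mr->iommu_ops->get_fd(type, mr);
    }

    return -1;
}
```

The caller asks for one specific kind of fd and gets -1 for anything the backend does not provide, so it never has to guess what the returned descriptor refers to.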

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [Qemu-devel] [RFC PATCH qemu 2/3] vfio-pci: Reorder group-to-container attaching

2017-03-28 Thread David Gibson
On Tue, Mar 28, 2017 at 11:48:36AM -0600, Alex Williamson wrote:
> On Tue, 28 Mar 2017 20:05:29 +1100
> Alexey Kardashevskiy  wrote:
> 
> > At the moment VFIO PCI device initialization works as follows:
> > vfio_realize
> > vfio_get_group
> > vfio_connect_container
> > register memory listeners (1)
> > update QEMU groups lists
> > vfio_kvm_device_add_group
> > 
> > Then (example for pseries) the machine reset hook triggers region_add()
> > for all regions where listeners from (1) are listening:
> > 
> > ppc_spapr_reset
> > spapr_phb_reset
> > spapr_tce_table_enable
> > memory_region_add_subregion
> > vfio_listener_region_add
> > vfio_spapr_create_window
> > 
> > This scheme works fine until we need to handle VFIO PCI device hotplug
> > _and_ we want to enable in-kernel acceleration on, i.e. after PCI hotplug
> > we need a place to call
> > ioctl(vfio_kvm_device_fd, KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE).
> > Since the ioctl needs a LIOBN fd (from sPAPRTCETable) and a IOMMU group fd
> > (from VFIOGroup), vfio_listener_region_add() seems to be the only place
> > for this ioctl().
> > 
> > However this only works during boot time because the machine reset
> > happens strictly after all devices are finalized. When hotplug happens,
> > vfio_listener_region_add() is called when a memory listener is registered
> > but when this happens:
> > 1. new group is not added to the container->group_list yet;
> > 2. VFIO KVM device is unaware of the new IOMMU group.
> > 
> > This moves bits around to have all necessary VFIO infrastructure
> > in place for both initial startup and hotplug cases.
> > 
> > Signed-off-by: Alexey Kardashevskiy 
> > ---
> >  hw/vfio/common.c | 21 +++--
> >  1 file changed, 11 insertions(+), 10 deletions(-)
> > 
> > diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> > index f3ba9b9007..c75c7594d5 100644
> > --- a/hw/vfio/common.c
> > +++ b/hw/vfio/common.c
> > @@ -1086,6 +1086,16 @@ static int vfio_connect_container(VFIOGroup *group, 
> > AddressSpace *as,
> >  goto free_container_exit;
> >  }
> >  
> > +container->initialized = true;
> 
> This ignores the purpose of this variable, which is to make runtime
> mapping faults fatal, but device realize time faults simply cause the
> device to be rejected.  This cannot be moved above
> memory_listener_register().

Apart from that, the rest of the code motion looks ok, though.  Even
if it weren't for the hotplug case, having the container/group
more-or-less initialized before registering the listener makes more
logical sense to me.

> 
> > +
> > +vfio_kvm_device_add_group(group);
> > +
> > +QLIST_INIT(&container->group_list);
> > +QLIST_INSERT_HEAD(&space->containers, container, next);
> > +
> > +group->container = container;
> > +QLIST_INSERT_HEAD(&container->group_list, group, container_next);
> > +
> >  container->listener = vfio_memory_listener;
> >  
> >  memory_listener_register(&container->listener, container->space->as);
> > @@ -1097,16 +1107,9 @@ static int vfio_connect_container(VFIOGroup *group, 
> > AddressSpace *as,
> >  goto listener_release_exit;
> >  }
> >  
> > -container->initialized = true;
> > -
> > -QLIST_INIT(&container->group_list);
> > -QLIST_INSERT_HEAD(&space->containers, container, next);
> > -
> > -group->container = container;
> > -QLIST_INSERT_HEAD(&container->group_list, group, container_next);
> > -
> >  return 0;
> >  listener_release_exit:
> > +vfio_kvm_device_del_group(group);
> 
> 
> Where's the QLIST cleanup?

Moving it does introduce more intermediate cleanup cases which need to
be handled, though..

> 
> 
> >  vfio_listener_release(container);
> >  
> >  free_container_exit:
> > @@ -1210,8 +1213,6 @@ VFIOGroup *vfio_get_group(int groupid, AddressSpace 
> > *as, Error **errp)
> >  
> >  QLIST_INSERT_HEAD(&vfio_group_list, group, next);
> >  
> > -vfio_kvm_device_add_group(group);
> > -
> >  return group;
> >  
> >  close_fd_exit:
> 

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson
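
The review comments above ask where the list cleanup went once the code
motion adds more intermediate failure points. As a minimal sketch of the
usual answer (unwind in reverse order of setup, one label per completed
step), here is a self-contained C model. Every name in it (Container,
connect_container, fail_at) is a hypothetical stand-in for illustration
and not QEMU's VFIO code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the container state; not QEMU's VFIOContainer. */
typedef struct Container {
    bool kvm_group_added;   /* models vfio_kvm_device_add_group() */
    bool on_lists;          /* models the QLIST insertions */
    bool listener_active;   /* models memory_listener_register() */
} Container;

/*
 * Set up three resources in order.  fail_at selects which step fails
 * (0 = none, 4 = an error noticed after the listener is registered) so
 * that each unwinding path can be exercised.
 * Returns 0 on success, -1 on failure with every earlier step undone.
 */
int connect_container(Container *c, int fail_at)
{
    assert(c != NULL);

    if (fail_at == 1) {
        goto fail;
    }
    c->kvm_group_added = true;

    if (fail_at == 2) {
        goto undo_kvm_group;
    }
    c->on_lists = true;

    if (fail_at == 3) {
        goto undo_lists;
    }
    c->listener_active = true;

    if (fail_at == 4) {
        goto undo_listener;
    }
    return 0;

    /* Labels fall through, so each one undoes its step and all earlier ones. */
undo_listener:
    c->listener_active = false;
undo_lists:
    c->on_lists = false;
undo_kvm_group:
    c->kvm_group_added = false;
fail:
    return -1;
}
```

The point of the sketch is only that moving a setup step earlier means
every later exit label has to undo it as well.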




[Qemu-devel] [PULL 1/2] spapr: fix buffer-overflow

2017-03-28 Thread David Gibson
From: Marc-André Lureau 

Running postcopy-test with ASAN produces the following error:

QTEST_QEMU_BINARY=ppc64-softmmu/qemu-system-ppc64  tests/postcopy-test
...
=
==23641==ERROR: AddressSanitizer: heap-buffer-overflow on address 
0x7f155660 at pc 0x55b8e9d28208 bp 0x7f1555f4d3c0 sp 0x7f1555f4d3b0
READ of size 8 at 0x7f155660 thread T6
#0 0x55b8e9d28207 in htab_save_first_pass 
/home/elmarco/src/qq/hw/ppc/spapr.c:1528
#1 0x55b8e9d2939c in htab_save_iterate 
/home/elmarco/src/qq/hw/ppc/spapr.c:1665
#2 0x55b8e9beae3a in qemu_savevm_state_iterate 
/home/elmarco/src/qq/migration/savevm.c:1044
#3 0x55b8ea677733 in migration_thread 
/home/elmarco/src/qq/migration/migration.c:1976
#4 0x7f15845f46c9 in start_thread (/lib64/libpthread.so.0+0x76c9)
#5 0x7f157d9d0f7e in clone (/lib64/libc.so.6+0x107f7e)

0x7f155660 is located 0 bytes to the right of 2097152-byte region 
[0x7f155640,0x7f155660)
allocated by thread T0 here:
#0 0x7f159bb76980 in posix_memalign (/lib64/libasan.so.3+0xc7980)
#1 0x55b8eab185b2 in qemu_try_memalign 
/home/elmarco/src/qq/util/oslib-posix.c:106
#2 0x55b8eab186c8 in qemu_memalign 
/home/elmarco/src/qq/util/oslib-posix.c:122
#3 0x55b8e9d268a8 in spapr_reallocate_hpt 
/home/elmarco/src/qq/hw/ppc/spapr.c:1214
#4 0x55b8e9d26e04 in ppc_spapr_reset 
/home/elmarco/src/qq/hw/ppc/spapr.c:1261
#5 0x55b8ea12e913 in qemu_system_reset /home/elmarco/src/qq/vl.c:1697
#6 0x55b8ea13fa40 in main /home/elmarco/src/qq/vl.c:4679
#7 0x7f157d8e9400 in __libc_start_main (/lib64/libc.so.6+0x20400)

Thread T6 created by T0 here:
#0 0x7f159bae0488 in __interceptor_pthread_create 
(/lib64/libasan.so.3+0x31488)
#1 0x55b8eab1d9cb in qemu_thread_create 
/home/elmarco/src/qq/util/qemu-thread-posix.c:465
#2 0x55b8ea67874c in migrate_fd_connect 
/home/elmarco/src/qq/migration/migration.c:2096
#3 0x55b8ea66cbb0 in migration_channel_connect 
/home/elmarco/src/qq/migration/migration.c:500
#4 0x55b8ea678f38 in socket_outgoing_migration 
/home/elmarco/src/qq/migration/socket.c:87
#5 0x55b8eaa5a03a in qio_task_complete /home/elmarco/src/qq/io/task.c:142
#6 0x55b8eaa599cc in gio_task_thread_result 
/home/elmarco/src/qq/io/task.c:88
#7 0x7f15823e38e6  (/lib64/libglib-2.0.so.0+0x468e6)
SUMMARY: AddressSanitizer: heap-buffer-overflow 
/home/elmarco/src/qq/hw/ppc/spapr.c:1528 in htab_save_first_pass

The index seems to be incremented too early, before CLEAN_HPTE() uses
it, unless I am missing something that would be worth a comment.

Signed-off-by: Marc-André Lureau 
Signed-off-by: David Gibson 
---
 hw/ppc/spapr.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 8aecea3..44c26e4 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1524,16 +1524,16 @@ static void htab_save_first_pass(QEMUFile *f, 
sPAPRMachineState *spapr,
 /* Consume invalid HPTEs */
 while ((index < htabslots)
&& !HPTE_VALID(HPTE(spapr->htab, index))) {
-index++;
 CLEAN_HPTE(HPTE(spapr->htab, index));
+index++;
 }
 
 /* Consume valid HPTEs */
 chunkstart = index;
 while ((index < htabslots) && (index - chunkstart < USHRT_MAX)
&& HPTE_VALID(HPTE(spapr->htab, index))) {
-index++;
 CLEAN_HPTE(HPTE(spapr->htab, index));
+index++;
 }
 
 if (index > chunkstart) {
-- 
2.9.3
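
To make the ordering issue in the hunk above concrete, here is a small
self-contained C sketch of the corrected loop shape: clean the slot that
was just tested, then advance. The Slot type and consume_invalid() are
hypothetical names for illustration and do not reflect QEMU's HPTE
layout or macros.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a hash page table entry. */
typedef struct Slot {
    bool valid;
    bool dirty;
} Slot;

/*
 * Clear the dirty flag on the run of invalid slots starting at index and
 * return the index of the first slot that was not consumed.
 * The increment comes after the slot is cleaned, so the largest index
 * ever dereferenced is nslots - 1.
 */
size_t consume_invalid(Slot *table, size_t nslots, size_t index)
{
    assert(table != NULL);

    while (index < nslots && !table[index].valid) {
        table[index].dirty = false;   /* clean the slot that was just tested */
        index++;                      /* only then move on */
    }
    return index;
}
```

With the increment placed first, a run of invalid slots ending at the
last entry would clean table[nslots], one element past the end, which
matches the "0 bytes to the right" report from ASAN.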




[Qemu-devel] [PULL 0/2] ppc-for-2.9 queue 20170329

2017-03-28 Thread David Gibson
The following changes since commit df9046363220e57d45818312759b954c033c58ab:

  Update version for v2.9.0-rc2 release (2017-03-28 19:11:16 +0100)

are available in the git repository at:

  git://github.com/dgibson/qemu.git tags/ppc-for-2.9-20170329

for you to fetch changes up to fe6824d12642b005c69123ecf8631f9b13553f8b:

  spapr: fix memory hot-unplugging (2017-03-29 11:35:16 +1100)


ppc patch queue for 2017-03-29

Two more bugfixes of sufficient severity to warrant going into 2.9.


Laurent Vivier (1):
  spapr: fix memory hot-unplugging

Marc-André Lureau (1):
  spapr: fix buffer-overflow

 hw/ppc/spapr.c |  4 ++--
 hw/ppc/spapr_drc.c | 20 +---
 include/hw/ppc/spapr_drc.h |  1 +
 3 files changed, 20 insertions(+), 5 deletions(-)



[Qemu-devel] [PULL 2/2] spapr: fix memory hot-unplugging

2017-03-28 Thread David Gibson
From: Laurent Vivier 

If, once the kernel has booted, we try to remove memory that was
hotplugged before the kernel had started, QEMU crashes on
an assert:

qemu-system-ppc64: hw/virtio/vhost.c:651:
   vhost_commit: Assertion `r >= 0' failed.
...
#4  in vhost_commit
#5  in memory_region_transaction_commit
#6  in pc_dimm_memory_unplug
#7  in spapr_memory_unplug
#8  spapr_machine_device_unplug
#9  in hotplug_handler_unplug
#10 in spapr_lmb_release
#11 in detach
#12 in set_allocation_state
#13 in rtas_set_indicator
...

If we take a closer look at the guest kernel log, we can see the
following when we try to unplug the memory:

pseries-hotplug-mem: Attempting to hot-add 4 LMB(s)

What happens:

1- The kernel has ignored the memory hotplug event because
   it had not started yet when the event was generated.

2- When we hot-unplug the memory,
   QEMU starts to remove the memory,
   generates a hot-unplug event,
   and signals the kernel that a new event is pending.

3- As the kernel is now running, on the QEMU signal it reads
   the event list, decodes the hotplug event and tries to
   finish the hotplugging.

4- QEMU receives the hotplug notification while it
   is trying to hot-unplug the memory. This moves the memory
   DRC to an invalid state.

This patch prevents this by refusing to set the allocation
state to USABLE while the DRC is awaiting release.

RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=1432382

Signed-off-by: Laurent Vivier 
Signed-off-by: David Gibson 
---
 hw/ppc/spapr_drc.c | 20 +---
 include/hw/ppc/spapr_drc.h |  1 +
 2 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c
index 150f6bf..a1cdc87 100644
--- a/hw/ppc/spapr_drc.c
+++ b/hw/ppc/spapr_drc.c
@@ -135,6 +135,17 @@ static uint32_t set_allocation_state(sPAPRDRConnector *drc,
 if (!drc->dev) {
 return RTAS_OUT_NO_SUCH_INDICATOR;
 }
+if (drc->awaiting_release && drc->awaiting_allocation) {
+/* kernel is acknowledging a previous hotplug event
+ * while we are already removing it.
+ * it's safe to ignore awaiting_allocation here since we know the
+ * situation is predicated on the guest either already having done
+ * so (boot-time hotplug), or never being able to acquire in the
+ * first place (hotplug followed by immediate unplug).
+ */
+drc->awaiting_allocation_skippable = true;
+return RTAS_OUT_NO_SUCH_INDICATOR;
+}
 }
 
 if (drc->type != SPAPR_DR_CONNECTOR_TYPE_PCI) {
@@ -436,9 +447,11 @@ static void detach(sPAPRDRConnector *drc, DeviceState *d,
 }
 
 if (drc->awaiting_allocation) {
-drc->awaiting_release = true;
-trace_spapr_drc_awaiting_allocation(get_index(drc));
-return;
+if (!drc->awaiting_allocation_skippable) {
+drc->awaiting_release = true;
+trace_spapr_drc_awaiting_allocation(get_index(drc));
+return;
+}
 }
 
 drc->indicator_state = SPAPR_DR_INDICATOR_STATE_INACTIVE;
@@ -448,6 +461,7 @@ static void detach(sPAPRDRConnector *drc, DeviceState *d,
 }
 
 drc->awaiting_release = false;
+drc->awaiting_allocation_skippable = false;
 g_free(drc->fdt);
 drc->fdt = NULL;
 drc->fdt_start_offset = 0;
diff --git a/include/hw/ppc/spapr_drc.h b/include/hw/ppc/spapr_drc.h
index fa531d5..5524247 100644
--- a/include/hw/ppc/spapr_drc.h
+++ b/include/hw/ppc/spapr_drc.h
@@ -154,6 +154,7 @@ typedef struct sPAPRDRConnector {
 bool awaiting_release;
 bool signalled;
 bool awaiting_allocation;
+bool awaiting_allocation_skippable;
 
 /* device pointer, via link property */
 DeviceState *dev;
-- 
2.9.3
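
The interaction between the two hunks in spapr_drc.c can be modelled in
a few lines of standalone C. This is a simplified sketch under stated
assumptions: the Drc struct, the two enums and their values are
hypothetical, and only the flag handling shown in the diff is modelled,
not the rest of QEMU's DRC logic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified, hypothetical model of a DR connector. */
typedef enum { STATE_UNUSABLE, STATE_USABLE } AllocState;
typedef enum { RC_SUCCESS, RC_NO_SUCH_INDICATOR } ReturnCode;

typedef struct Drc {
    bool has_device;
    bool awaiting_release;
    bool awaiting_allocation;
    bool awaiting_allocation_skippable;
    AllocState state;
} Drc;

ReturnCode set_allocation_state(Drc *drc, AllocState state)
{
    assert(drc != NULL);

    if (state == STATE_USABLE) {
        if (!drc->has_device) {
            return RC_NO_SUCH_INDICATOR;
        }
        if (drc->awaiting_release && drc->awaiting_allocation) {
            /* Late hotplug acknowledgement during an unplug: refuse it
             * and let a later detach stop waiting for the allocation. */
            drc->awaiting_allocation_skippable = true;
            return RC_NO_SUCH_INDICATOR;
        }
    }
    drc->state = state;
    return RC_SUCCESS;
}

/* Returns true when the detach completed, false when it was deferred. */
bool detach(Drc *drc)
{
    assert(drc != NULL);

    if (drc->awaiting_allocation && !drc->awaiting_allocation_skippable) {
        drc->awaiting_release = true;   /* retried once the guest responds */
        return false;
    }
    drc->awaiting_release = false;
    drc->awaiting_allocation_skippable = false;
    drc->has_device = false;
    return true;
}
```

In this model a first detach() is deferred, the guest's late request for
USABLE is rejected and marks the wait as skippable, and a second
detach() then completes.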




[Qemu-devel] Can qemu run on the ARM hardware?

2017-03-28 Thread liangy...@zhwei.com
Dear QEMU Community,

I am a newbie to QEMU and have some questions for you.
My hardware is an ARM board with Ubuntu Linux installed.  I have some
code that must run on the x86 platform,
so I want to install VM software like VMware or VirtualBox on the ARM
board and simulate an x86 machine.

1. Can QEMU run on the ARM platform?
2. If QEMU supports ARM hardware, can it simulate an x86 virtual machine
when the host is ARM (QEMU running on ARM)?
3. If QEMU can meet my requirements, where can I find the manual?

Waiting for your reply, thank you very much!



liangy...@zhwei.com


Re: [Qemu-devel] host stalls when qemu-system-aarch64 with kvm and pflash

2017-03-28 Thread Christoffer Dall
Hi Radha,

On Tue, Mar 28, 2017 at 12:58:24PM -0700, Radha Mohan wrote:
> Hi,
> I am seeing an issue with qemu-system-aarch64 when using pflash
> (booting kernel via UEFI bios).
> 
> Host kernel: 4.11.0-rc3-next-20170323
> Qemu version: v2.9.0-rc1
> 
> Command used:
> ./aarch64-softmmu/qemu-system-aarch64 -cpu host -enable-kvm -M
> virt,gic_version=3 -nographic -smp 1 -m 2048 -drive
> if=none,id=hd0,file=/root/zesty-server-cloudimg-arm64.img,id=0 -device
> virtio-blk-device,drive=hd0 -pflash /root/flash0.img -pflash
> /root/flash1.img
> 
> 
> As soon as the guest kernel boots, the host starts to stall and prints
> the messages below, and the system never recovers. I can power off
> neither the guest nor the host, so I have to resort to an external
> power reset of the host.
> 
> ==
> [  116.199077] NMI watchdog: BUG: soft lockup - CPU#25 stuck for 23s!
> [kworker/25:1:454]
> [  116.206901] Modules linked in: binfmt_misc nls_iso8859_1 aes_ce_blk
> shpchp crypto_simd gpio_keys cryptd aes_ce_cipher ghash_ce sha2_ce
> sha1_ce uio_pdrv_genirq uio autofs4 btrfs raid10 raid456
> async_raid6_recov async_memcpy async_pq async_xor async_tx xor
> raid6_pq libcrc32c raid1 raid0 multipath linear ast i2c_algo_bit ttm
> drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops
> drm nicvf ahci nicpf libahci thunder_bgx thunder_xcv
> mdio_thunder mdio_cavium
> 
> [  116.206995] CPU: 25 PID: 454 Comm: kworker/25:1 Not tainted
> 4.11.0-rc3-next-20170323 #1
> [  116.206997] Hardware name: www.cavium.com crb-1s/crb-1s, BIOS 0.3 Feb 23 
> 2017
> [  116.207010] Workqueue: events netstamp_clear
> [  116.207015] task: 801f906b5400 task.stack: 801f901a4000
> [  116.207020] PC is at smp_call_function_many+0x284/0x2e8
> [  116.207023] LR is at smp_call_function_many+0x244/0x2e8
> [  116.207026] pc : [] lr : []
> pstate: 8145
> [  116.207028] sp : 801f901a7be0
> [  116.207030] x29: 801f901a7be0 x28: 09139000
> [  116.207036] x27: 09139434 x26: 0080
> [  116.207041] x25:  x24: 081565d0
> [  116.207047] x23: 0001 x22: 08e11e00
> [  116.207052] x21: 801f6d5cff00 x20: 801f6d5cff08
> [  116.207057] x19: 09138e38 x18: 0a03
> [  116.207063] x17: b77c9028 x16: 082e81d8
> [  116.207068] x15: 3d0d6dd44d08 x14: 0036312196549b4a
> [  116.207073] x13: 58dabe4c x12: 0018
> [  116.207079] x11: 366e2f04 x10: 09f0
> [  116.207084] x9 : 801f901a7d30 x8 : 0002
> [  116.207089] x7 :  x6 : 
> [  116.207095] x5 :  x4 : 0020
> [  116.207100] x3 : 0020 x2 : 
> [  116.207105] x1 : 801f6d682578 x0 : 0003
> 
> [  150.443116] INFO: rcu_sched self-detected stall on CPU
> [  150.448261]  25-...: (14997 ticks this GP)
> idle=47a/141/0 softirq=349/349 fqs=7495
> [  150.451115] INFO: rcu_sched detected stalls on CPUs/tasks:
> [  150.451123]  25-...: (14997 ticks this GP)
> idle=47a/141/0 softirq=349/349 fqs=7495
> [  150.451124]  (detected by 13, t=15002 jiffies, g=805, c=804, q=8384)
> [  150.451136] Task dump for CPU 25:
> [  150.451138] kworker/25:1R  running task0   454  2 
> 0x0002
> [  150.451155] Workqueue: events netstamp_clear
> [  150.451158] Call trace:
> [  150.451164] [] __switch_to+0x90/0xa8
> [  150.451172] [] static_key_slow_inc+0x128/0x138
> [  150.451175] [] static_key_enable+0x34/0x60
> [  150.451178] [] netstamp_clear+0x68/0x80
> [  150.451181] [] process_one_work+0x158/0x478
> [  150.451183] [] worker_thread+0x50/0x4a8
> [  150.451187] [] kthread+0x108/0x138
> [  150.451190] [] ret_from_fork+0x10/0x50
> [  150.477451]   (t=15008 jiffies g=805 c=804 q=8384)
> [  150.482242] Task dump for CPU 25:
> [  150.482245] kworker/25:1R  running task0   454  2 
> 0x0002
> [  150.482259] Workqueue: events netstamp_clear
> [  150.482264] Call trace:
> [  150.482271] [] dump_backtrace+0x0/0x2b0
> [  150.482277] [] show_stack+0x24/0x30
> [  150.482281] [] sched_show_task+0x128/0x178
> [  150.482285] [] dump_cpu_task+0x48/0x58
> [  150.482288] [] rcu_dump_cpu_stacks+0xa0/0xe8
> [  150.482297] [] rcu_check_callbacks+0x774/0x938
> [  150.482305] [] update_process_times+0x34/0x60
> [  150.482314] [] tick_sched_handle.isra.7+0x38/0x70
> [  150.482319] [] tick_sched_timer+0x4c/0x98
> [  150.482324] [] __hrtimer_run_queues+0xd8/0x2b8
> [  150.482328] [] hrtimer_interrupt+0xa8/0x228
> [  150.482334] [] arch_timer_handler_phys+0x3c/0x50
> [  150.482341] [] handle_percpu_devid_irq+0x8c/0x230
> [  150.482344] [] generic_handle_irq+0x34/0x50
> [  150.482347] [] __handle_domain_irq+0x68/0xc0
> [  150.482351] [] gic_handle_irq+0xc4/0x170
> [  150.482356] Exception stack(0x801f901a7ab0 to 0x801f901a7be0)
> [  150.482360] 7aa0:
> 0003 801f6d682578
> [  150.482364] 7ac0: 

Re: [Qemu-devel] [RFC PATCH qemu 1/3] memory: Add get_fd() hook for IOMMU MR

2017-03-28 Thread Alex Williamson
On Wed, 29 Mar 2017 12:41:01 +1100
Alexey Kardashevskiy  wrote:

> On 29/03/17 04:48, Alex Williamson wrote:
> > On Tue, 28 Mar 2017 20:05:28 +1100
> > Alexey Kardashevskiy  wrote:
> >   
> >> Signed-off-by: Alexey Kardashevskiy 
> >> ---
> >>  include/exec/memory.h | 2 ++
> >>  hw/ppc/spapr_iommu.c  | 8 
> >>  2 files changed, 10 insertions(+)
> >>
> >> diff --git a/include/exec/memory.h b/include/exec/memory.h
> >> index e39256ad03..925c10b35b 100644
> >> --- a/include/exec/memory.h
> >> +++ b/include/exec/memory.h
> >> @@ -174,6 +174,8 @@ struct MemoryRegionIOMMUOps {
> >>  void (*notify_flag_changed)(MemoryRegion *iommu,
> >>  IOMMUNotifierFlag old_flags,
> >>  IOMMUNotifierFlag new_flags);
> >> +/* Returns a kernel fd for IOMMU */
> >> +int (*get_fd)(MemoryRegion *iommu);  
> > 
> > What if we used this as a prototype:
> > 
> > int (*get_fd)(IOMMUFdType type, MemoryRegion *iommu);
> > 
> > And then we defined:
> > 
> > typedef enum {
> > SPAPR_IOMMU_TABLE_FD = 0,
> > } IOMMUFdType;  
> 
> 
> Where do I put this enum definition? include/exec/memory.h? It does not
> have any mention of any platform yet...

I would assume memory.h, yes.  The enum is just an
abstraction: what does "get fd" mean generically to an IOMMU
MemoryRegion?  How can anyone else ever implement that callback if the
initial user assumes that the returned fd is of a specific, yet
unspecified, type?  If the API is "give me an fd for this type of thing"
then the IOMMU driver can either provide it or indicate that the type is
not supported.  There's really no platform knowledge at the memory API
level; it's just a type of thing that means something to the driver
providing the MemoryRegionIOMMUOps and to the caller.
 
> I could pass char* instead of IOMMUFdType (and pass there something like
> TYPE_SPAPR_TCE_TABLE), would it be any better?

Gack, an enum seems so much cleaner than requiring a strcmp.  Thanks,

Alex
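
For readers who want to see the shape of the typed callback proposed in
this thread in isolation, here is a self-contained C sketch. Region,
RegionOps, region_get_fd(), tce_get_fd() and the OTHER_IOMMU_FD
enumerator are hypothetical stand-ins so the example compiles on its
own; only SPAPR_IOMMU_TABLE_FD and the overall shape come from the
discussion.

```c
#include <assert.h>
#include <stddef.h>

typedef enum {
    SPAPR_IOMMU_TABLE_FD = 0,
    OTHER_IOMMU_FD              /* made-up second type, for illustration only */
} IOMMUFdType;

typedef struct Region Region;

/* Hypothetical stand-in for the IOMMU ops table. */
typedef struct RegionOps {
    int (*get_fd)(IOMMUFdType type, Region *mr);   /* may be NULL */
} RegionOps;

/* Hypothetical stand-in for an IOMMU memory region. */
struct Region {
    const RegionOps *ops;       /* NULL when the region has no IOMMU ops */
    int table_fd;
};

/* Driver side: hand out the fd only for the type this driver knows. */
int tce_get_fd(IOMMUFdType type, Region *mr)
{
    return type == SPAPR_IOMMU_TABLE_FD ? mr->table_fd : -1;
}

/* Generic side: the caller names the type it wants and gets -1 otherwise. */
int region_get_fd(IOMMUFdType type, Region *mr)
{
    assert(mr != NULL);

    if (mr->ops && mr->ops->get_fd) {
        return mr->ops->get_fd(type, mr);
    }
    return -1;
}
```

The caller never has to guess what kind of fd came back: it either asked
for a type the driver supports or it receives -1.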

> > Such that you're actually asking the IOMMUOps for a specific type of FD
> > and it either has it or not, so the caller doesn't need to assume what
> > it is they get back.
> > 
> > Furthermore, add:
> > 
> > int memory_region_iommu_get_fd(IOMMUFdType type, MemoryRegion *mr)
> > {
> > assert(memory_region_is_iommu(mr));
> > 
> > if (mr->iommu_ops && mr->iommu_ops->get_fd) {
> > return mr->iommu_ops->get_fd(type, mr);
> > }
> > 
> > return -1;
> > }
> >   
> >>  };
> >>  
> > 
> > This should be two patches, patch 1 above, patch 2 below
> > 
> >>  typedef struct CoalescedMemoryRange CoalescedMemoryRange;
> >> diff --git a/hw/ppc/spapr_iommu.c b/hw/ppc/spapr_iommu.c
> >> index 9e30e148d6..b61c8f053e 100644
> >> --- a/hw/ppc/spapr_iommu.c
> >> +++ b/hw/ppc/spapr_iommu.c
> >> @@ -170,6 +170,13 @@ static void 
> >> spapr_tce_notify_flag_changed(MemoryRegion *iommu,
> >>  }
> >>  }
> >>  
> >> +static int spapr_tce_get_fd(MemoryRegion *iommu)
> >> +{
> >> +sPAPRTCETable *tcet = container_of(iommu, sPAPRTCETable, iommu);
> >> +
> >> +return tcet->fd;  
> > 
> > 
> > This would then be:
> > 
> > return type == SPAPR_IOMMU_TABLE_FD ? tcet->fd : -1;
> >   
> >> +}
> >> +
> >>  static int spapr_tce_table_post_load(void *opaque, int version_id)
> >>  {
> >>  sPAPRTCETable *tcet = SPAPR_TCE_TABLE(opaque);
> >> @@ -251,6 +258,7 @@ static MemoryRegionIOMMUOps spapr_iommu_ops = {
> >>  .translate = spapr_tce_translate_iommu,
> >>  .get_min_page_size = spapr_tce_get_min_page_size,
> >>  .notify_flag_changed = spapr_tce_notify_flag_changed,
> >> +.get_fd = spapr_tce_get_fd,
> >>  };
> >>  
> >>  static int spapr_tce_table_realize(DeviceState *dev)  
> >   
> 
> 




Re: [Qemu-devel] [PATCH fixup 2/2] vhost: genearlize iommu memory region

2017-03-28 Thread Jason Wang



On 2017-03-21 09:39, Peter Xu wrote:
> On Mon, Mar 20, 2017 at 08:21:44PM -0500, Eric Blake wrote:
> > On 03/20/2017 08:12 PM, Michael S. Tsirkin wrote:
> > > > Since this patchset depends on vtd vfio series and fixes its breakage
> > > > to vhost, I'll pick them up for consistency for next post of vtd vfio
> > > > series as well.
> > > >
> > > > Thanks,
> > > >
> > > > -- peterx
> > >
> > > Sounds good. It's best to order patches in a way that avoids
> > > breakages even for people that bisect though.
> > > Might require some patch squashing.
> >
> > Indeed - a patch submitted with 'fixup' in the title is usually best
> > incorporated by squashing into a prior patch that has not actually
> > landed in master, rather than as a standalone patch.
> >
> > But if you do post this to master as a standalone patch, please fix the
> > subject line: s/genearlize/generalize/
>
> Will do. Thanks!
>
> -- peterx


Looks like the assumption was broken by the introduction of the bus
master container, so this patch is needed for 2.9.  I will post a formal
patch for this.


Thanks



Re: [Qemu-devel] [PATCH v3 3/3] vfio-pci: process non fatal error of AER

2017-03-28 Thread Alex Williamson
On Wed, 29 Mar 2017 02:59:34 +0300
"Michael S. Tsirkin"  wrote:

> On Tue, Mar 28, 2017 at 10:12:25AM -0600, Alex Williamson wrote:
> > On Tue, 28 Mar 2017 21:49:17 +0800
> > Cao jin  wrote:
> >   
> > > On 03/25/2017 06:12 AM, Alex Williamson wrote:  
> > > > On Thu, 23 Mar 2017 17:09:23 +0800
> > > > Cao jin  wrote:
> > > > 
> > > >> Make use of the non fatal error eventfd that the kernel module provide
> > > >> to process the AER non fatal error. Fatal error still goes into the
> > > >> legacy way which results in VM stop.
> > > >>
> > > >> Register the handler, wait for notification. Construct aer message and
> > > >> pass it to root port on notification. Root port will trigger an 
> > > >> interrupt
> > > >> to signal guest, then guest driver will do the recovery.
> > > > 
> > > > Can we guarantee this is the better solution in all cases or could
> > > > there be guests without AER support where the VM stop is the better
> > > > solution?
> > > > 
> > > 
> > > Currently, we only have VM stop on errors, that looks the same as a
> > > sudden power down to me.  With this solution, we have about
> > > 50%(non-fatal) chance to reduce the sudden power-down risk.  
> > 
> > If half of all faults are expected to be non-fatal, then you must have
> > some real examples of devices triggering non-fatal errors which can be
> > corrected in the guest driver that you can share to justify why it's a
> > good thing to enable this behavior.
> >   
> > > What if a guest doesn't support AER?  It looks the same as a host
> > > without AER support. Now I only can speculate the worst condition: guest
> > > crash, would that be quite different from a sudden power-down?  
> > 
> > Yes, it's very different.  In one case we contain the fault by stopping
> > the guest, in the other case we allow the guest to continue operating
> > with a known fault in the device which may allow the fault to propagate
> > and perhaps go unnoticed.  We have established with the current
> > behavior that QEMU will prevent further propagation of a fault by
> > halting the VM.  To change QEMU's behavior here risks that a VM relying
> > on that behavior no longer has that protection.  So it seems we either
> > need to detect whether the VM is handling AER or we need to require the
> > VM administrator to opt-in to this new feature.  
> 
> An opt-in flag sounds very reasonable. It can also specify whether
> to log the errors. We have a similar flag for disk errors.

An opt-in works, but is rather burdensome to the user.
 
> >  Real hardware has
> > these same issues and I believe there are handshakes that can be done
> > through ACPI to allow the guest to take over error handling from the
> > system.  
> 
> No, that's only for error reporting IIUC. Driver needs to be
> aware of a chance for errors to trigger and be able to
> handle them.

See drivers/acpi/pci_root.c:negotiate_os_control(), it seems that the
OSPM uses an _OSC to tell ACPI via OSC_PCI_EXPRESS_AER_CONTROL.  Would
that not be a reasonable mechanism for the guest to indicate AER
support?

> So yes, some guests might have benefitted from VM stop
> on AER but
> 1. the stop happens asynchronously so if guest can't handle
>    errors there's a chance it is already crashed by the time we
>    try to do vm stop

I fully concede that it's asynchronous, bad data can propagate and a
guest crash is one potential outcome.  That's fine, a guest crash
indicates a problem.  A VM stop also indicates a problem.  Potential
lack of a crash or VM stop is the worrisome case.

> 2. it's more of a chance by-product - we never promised
>    guests that VMs would be more robust than bare metal

Does that make it not a regression if we change the behavior?  I
wouldn't exactly call it a chance by-product, perhaps it wasn't the
primary motivation, but it was considered.  Thanks,

Alex
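
One way to read the two options discussed above (detecting that the
guest has taken over AER handling, or requiring an administrator opt-in)
is as a small decision function. The sketch below is purely illustrative
C: the types, field names and the policy itself are assumptions made for
the example, not behaviour that QEMU or vfio-pci implements.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { ERR_NON_FATAL, ERR_FATAL } ErrSeverity;
typedef enum { ACTION_STOP_VM, ACTION_FORWARD_TO_GUEST } ErrAction;

/* Hypothetical per-VM state; neither field exists under this name in QEMU. */
typedef struct GuestAerState {
    bool guest_took_aer_control;   /* e.g. learned from an _OSC-style handshake */
    bool admin_opt_in;             /* explicit opt-in by the VM administrator */
} GuestAerState;

/*
 * Fatal errors always stop the VM, which keeps the existing containment.
 * Non-fatal errors are forwarded only when there is some evidence that
 * the guest can handle them; otherwise the VM is stopped as before.
 */
ErrAction choose_action(const GuestAerState *g, ErrSeverity sev)
{
    assert(g != NULL);

    if (sev == ERR_FATAL) {
        return ACTION_STOP_VM;
    }
    if (g->guest_took_aer_control || g->admin_opt_in) {
        return ACTION_FORWARD_TO_GUEST;
    }
    return ACTION_STOP_VM;
}
```

Under this reading a guest that never negotiates AER control keeps the
current stop-on-error protection without the user having to do anything.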

> > > >> Signed-off-by: Dou Liyang 
> > > >> Signed-off-by: Cao jin 
> > > >> ---
> > > >>  hw/vfio/pci.c  | 202 
> > > >> +
> > > >>  hw/vfio/pci.h  |   2 +
> > > >>  linux-headers/linux/vfio.h |   2 +
> > > >>  3 files changed, 206 insertions(+)
> > > >>
> > > >> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> > > >> index 3d0d005..c6786d5 100644
> > > >> --- a/hw/vfio/pci.c
> > > >> +++ b/hw/vfio/pci.c
> > > >> @@ -2432,6 +2432,200 @@ static void vfio_put_device(VFIOPCIDevice 
> > > >> *vdev)
> > > > >>  vfio_put_base_device(&vdev->vbasedev);
> > > >>  }
> > > >>  
> > > >> +static void vfio_non_fatal_err_notifier_handler(void *opaque)
> > > >> +{
> > > >> +VFIOPCIDevice *vdev = opaque;
> > > > >> +PCIDevice *dev = &vdev->pdev;
> > > >> +PCIEAERMsg msg = {
> > > >> +.severity = PCI_ERR_ROOT_CMD_NONFATAL_EN,
> > > >> +.source_id = pci_requester_id(dev),
> > > >> +};
> > > >> +
> > > >> +if 
> > > >> 

Re: [Qemu-devel] [PATCH v6] vfio error recovery: kernel support

2017-03-28 Thread Alex Williamson
On Wed, 29 Mar 2017 03:01:48 +0300
"Michael S. Tsirkin"  wrote:

> On Tue, Mar 28, 2017 at 10:12:33AM -0600, Alex Williamson wrote:
> > On Tue, 28 Mar 2017 21:47:00 +0800
> > Cao jin  wrote:
> >   
> > > On 03/25/2017 06:12 AM, Alex Williamson wrote:  
> > > > On Thu, 23 Mar 2017 17:07:31 +0800
> > > > Cao jin  wrote:
> > > > 
> > > > A more appropriate patch subject would be:
> > > > 
> > > > vfio-pci: Report correctable errors and slot reset events to user
> > > >
> > > 
> > > Correctable? It is confusing to me. Correctable error has its clear
> > > definition in PCIe spec, shouldn't it be "non-fatal"?  
> > 
> > My mistake, non-fatal.
> >
> > > >> From: "Michael S. Tsirkin" 
> > > > 
> > > > This hardly seems accurate anymore.  You could say Suggested-by and let
> > > > Michael add a sign-off, but it's changed since he sent it.
> > > > 
> > > >>
> > > >> 0. What happens now (PCIE AER only)
> > > >>Fatal errors cause a link reset. Non fatal errors don't.
> > > >>All errors stop the QEMU guest eventually, but not immediately,
> > > >>because it's detected and reported asynchronously.
> > > >>Interrupts are forwarded as usual.
> > > >>Correctable errors are not reported to user at all.
> > > >>
> > > >>Note:
> > > >>PPC EEH is different, but this approach won't affect EEH. EEH treat
> > > >>all errors as fatal ones in AER, so they will still be signalled to 
> > > >> user
> > > >>via the legacy eventfd.  Besides, all devices/functions in a PE 
> > > >> belongs
> > > >>to the same IOMMU group, so the slot_reset handler in this approach
> > > >>won't affect EEH either.
> > > >>
> > > >> 1. Correctable errors
> > > >>Hardware can correct these errors without software intervention,
> > > >>clear the error status is enough, this is what already done now.
> > > >>No need to recover it, nothing changed, leave it as it is.
> > > >>
> > > >> 2. Fatal errors
> > > >>They will induce a link reset. This is troublesome when user is
> > > >>a QEMU guest. This approach doesn't touch the existing mechanism.
> > > >>
> > > >> 3. Non-fatal errors
> > > >>Before this patch, they are signalled to user the same way as fatal 
> > > >> ones.
> > > >>With this patch, a new eventfd is introduced only for non-fatal 
> > > >> error
> > > >>notification. By splitting non-fatal ones out, it will benefit AER
> > > >>recovery of a QEMU guest user.
> > > >>
> > > >>To maintain backwards compatibility with userspace, non-fatal errors
> > > >>will continue to trigger via the existing error interrupt index if a
> > > >>non-fatal signaling mechanism has not been registered.
> > > >>
> > > >>Note:
> > > >>In case of PCI Express errors, kernel might request a slot reset
> > > >>affecting our device (from our point of view this is a passive 
> > > >> device
> > > >>reset as opposed to an active one requested by vfio itself).
> > > >>This might currently happen if a slot reset is requested by a driver
> > > >>(other than vfio) bound to another device function in the same slot.
> > > >>This will cause our device to lose its state so report this event to
> > > >>userspace.
> > > > 
> > > > I tried to convey this in my last comments, I don't think this is an
> > > > appropriate commit log.  Lead with what is the problem you're trying to
> > > > fix and why, what is the benefit to the user, and how is the change
> > > > accomplished.  If you want to provide a State of Error Handling in
> > > > VFIO, append it after the main points of the commit log.
> > > 
> > > ok.
> > >   
> > > > 
> > > > I also asked in my previous comments to provide examples of errors that
> > > > might trigger correctable errors to the user, this comment seems to
> > > > have been missed.  In my experience, AERs generated during device
> > > > assignment are generally hardware faults or induced by bad guest
> > > > drivers.  These are cases where a single fatal error is an appropriate
> > > > and sufficient response.  We've scaled back this support to the point
> > > > where we're only improving the situation of correctable errors and I'm
> > > > not convinced this is worthwhile and we're not simply checking a box on
> > > > an ill-conceived marketing requirements document.
> > > 
> > > Sorry. I noticed that question: "what actual errors do we expect
> > > userspace to see as non-fatal errors?", but I am confused about it.
> > > Correctable, non-fatal, fatal errors are clearly defined in PCIe spec,
> > > and Uncorrectable Error Severity Register will tell which is fatal, and
> > > which is non-fatal, this register is configurable, they are device
> > > specific as I guess. AER core driver distinguish them by
> > > pci_channel_io_normal/pci_channel_io_frozen,  So I don't understand your
> > > question. Or
> > > 
> > > Or, Do you mean we could 

Re: [Qemu-devel] [PATCH for-2.10 05/23] numa: move source of default CPUs to NUMA node mapping into boards

2017-03-28 Thread David Gibson
On Tue, Mar 28, 2017 at 12:53:10PM +0200, Igor Mammedov wrote:
> On Tue, 28 Mar 2017 15:19:20 +1100
> David Gibson  wrote:
> 
> > On Wed, Mar 22, 2017 at 02:32:30PM +0100, Igor Mammedov wrote:
> > > Originally CPU threads were by default assigned in
> > > round-robin fashion. However it was causing issues in
> > > guest since CPU threads from the same socket/core could
> > > be placed on different NUMA nodes.
> > > Commit fb43b73b (pc: fix default VCPU to NUMA node mapping)
> > > fixed it by grouping threads within a socket on the same node
> > > introducing cpu_index_to_socket_id() callback and commit
> > > 20bb648d (spapr: Fix default NUMA node allocation for threads)
> > > reused callback to fix similar issues for SPAPR machine
> > > even though socket doesn't make much sense there.
> > > 
> > > As result QEMU ended up having 3 default distribution rules
> > > used by 3 targets /virt-arm, spapr, pc/.
> > > 
> > > In effort of moving NUMA mapping for CPUs into possible_cpus,
> > > generalize default mapping in numa.c by making boards decide
> > > on default mapping and let them explicitly tell generic
> > > numa code to which node a CPU thread belongs to by replacing
> > > cpu_index_to_socket_id() with @cpu_index_to_instance_props()
> > > which provides default node_id assigned by board to specified
> > > cpu_index.
> > > 
> > > Signed-off-by: Igor Mammedov 
[snip]
> > > +static CpuInstanceProperties
> > > +virt_cpu_index_to_props(MachineState *ms, unsigned cpu_index)
> > > +{
> > > +MachineClass *mc = MACHINE_GET_CLASS(ms);
> > > +const CPUArchIdList *possible_cpus = mc->possible_cpu_arch_ids(ms);
> > > +
> > > +assert(cpu_index < possible_cpus->len);
> > > +return possible_cpus->cpus[cpu_index].props;;
> > > +}
> > > +
> > 
> > It seems a bit weird to have a machine specific hook to pull the
> > property information when one way or another it's coming from the
> > possible_cpus table, which is already constructed by a machine
> > specific hook.  Could we add a range or list of cpu_index values to
> > each possible_cpus entry instead, and have a generic lookup of the
> > right entry based on that?

[snip]
> > > -static unsigned pc_cpu_index_to_socket_id(unsigned cpu_index)
> > > +static CpuInstanceProperties
> > > +pc_cpu_index_to_props(MachineState *ms, unsigned cpu_index)
> > >  {
> > > -X86CPUTopoInfo topo;
> > > -x86_topo_ids_from_idx(smp_cores, smp_threads, cpu_index,
> > > -  );
> > > -return topo.pkg_id;
> > > +MachineClass *mc = MACHINE_GET_CLASS(ms);
> > > +const CPUArchIdList *possible_cpus = mc->possible_cpu_arch_ids(ms);
> > > +
> > > +assert(cpu_index < possible_cpus->len);
> > > +return possible_cpus->cpus[cpu_index].props;;
> > 
> > Since the pc and arm version of this are basically identical, I wonder
> > if that should actually be the default implementation.  If we need it
> > at all.
> ARM is still moving target and props are not really defined for it yet,
> so I'd like to keep it separate for now and when it stabilizes we can think
> about generalizing it.

Fair enough.

Any thoughts on my more general query above?

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson
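
The more general query above (attach a range of cpu_index values to each
possible_cpus entry and do the lookup generically) could look roughly
like the following standalone C sketch. The PossibleCpu and CpuProps
types, their fields and lookup_props() are hypothetical names chosen for
the example; they are not QEMU's CPUArchIdList or CpuInstanceProperties.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced property set. */
typedef struct CpuProps {
    int node_id;
    int socket_id;
} CpuProps;

/* Hypothetical table entry that records which cpu_index values it covers. */
typedef struct PossibleCpu {
    unsigned first_index;   /* first cpu_index covered by this entry */
    unsigned count;         /* number of consecutive cpu_index values */
    CpuProps props;
} PossibleCpu;

/*
 * Generic lookup: return the props of the entry whose range contains
 * cpu_index, or NULL when no entry covers it.  No board hook is needed
 * because the board already described the ranges when it built the table.
 */
const CpuProps *lookup_props(const PossibleCpu *cpus, size_t len,
                             unsigned cpu_index)
{
    assert(cpus != NULL || len == 0);

    for (size_t i = 0; i < len; i++) {
        if (cpu_index >= cpus[i].first_index &&
            cpu_index - cpus[i].first_index < cpus[i].count) {
            return &cpus[i].props;
        }
    }
    return NULL;
}
```

A core-granularity board would use one entry per core with count set to
the number of threads, while a thread-granularity board would use
count 1 everywhere.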




Re: [Qemu-devel] [PATCH for-2.10 22/23] numa: add '-numa cpu, ...' option for property based node mapping

2017-03-28 Thread David Gibson
On Tue, Mar 28, 2017 at 01:09:11PM +0200, Igor Mammedov wrote:
> On Tue, 28 Mar 2017 16:16:02 +1100
> David Gibson  wrote:
> 
> > On Wed, Mar 22, 2017 at 02:32:47PM +0100, Igor Mammedov wrote:
> > > legacy cpu to node mapping is using cpu index values to map
> > > VCPU to node with help of '-numa node,nodeid=node,cpus=x[-y]'
> > > option. However cpu index is internal concept and QEMU users
> > > have to guess /reimplement qemu's logic/ to map it to
> > > a concrete cpu socket/core/thread to make sane CPUs
> > > placement across numa nodes.
> > > 
> > > This patch allows to map cpu objects to numa nodes using
> > > the same properties as used for cpus with -device/device_add
> > > (socket-id/core-id/thread-id/node-id).
> > > 
> > > At present valid properties/values to address CPUs could be
> > > fetched using hotpluggable-cpus monitor/qmp command, it will
> > > require user to start qemu twice when creating domain to fetch
> > > possible CPUs for a machine type/-smp layout first and
> > > then the second time with numa explicit mapping for actual
> > > usage. The first step results could be saved and reused to
> > > set/change mapping later as far as machine type/-smp stays
> > > the same.
> > > 
> > > Proposed impl. supports exact and wildcard matching to
> > > simplify CLI and allow to set mapping for a specific cpu
> > > or group of cpu objects specified by matched properties.
> > > 
> > > For example:
> > > 
> > ># exact mapping x86
> > >-numa cpu,node-id=x,socket-id=y,core-id=z,thread-id=n
> > > 
> > ># exact mapping SPAPR
> > >-numa cpu,node-id=x,core-id=y
> > > 
> > ># wildcard mapping, all cpu objects that match socket-id=y
> > ># are mapped to node-id=x
> > >-numa cpu,node-id=x,socket-id=y
> > > 
> > > Signed-off-by: Igor Mammedov 
> > 
> > What's the rationale for adding a new CLI, rather than adding node-id
> > properties to the appropriate objects with -device, -global or -set as
> > appropriate?
>  '-global' applies to all cpus, while '-device,-set' applies to present
>  at boot time cpus only. So they do not work for the case of possible but
>  not present at boot time objects.

Ah!  Of course.

> For ACPI based targets, we need to have
>  numa mapping at boot time to build ACPI SRAT table.
>  I don't know if it's important for spapr/fdt,

Not in the same way.  For spapr the device tree fragment for the new
cpu is supplied to the guest at hotplug time rather than having to be
in the initial device tree.  So for us, node could be supplied with
device_add.

> but it uses current predefined
>  mapping with -numa node,cpus=x-y and new CLI hides from user internal
>  cpu_index and allows to use the same properties as we use for -device cpu,...
>  to define mapping to numa nodes for present/possible cpus.
> 
> > 
> > > ---
> > >  numa.c   | 13 +
> > >  qapi-schema.json |  7 +--
> > >  qemu-options.hx  | 23 ++-
> > >  3 files changed, 40 insertions(+), 3 deletions(-)
> > > 
> > > diff --git a/numa.c b/numa.c
> > > index 088fae3..588586b 100644
> > > --- a/numa.c
> > > +++ b/numa.c
> > > @@ -246,6 +246,19 @@ static int parse_numa(void *opaque, QemuOpts *opts, 
> > > Error **errp)
> > >  }
> > >  nb_numa_nodes++;
> > >  break;
> > > +case NUMA_OPTIONS_TYPE_CPU:
> > > +if (!object->u.cpu.has_node_id) {
> > > +error_setg(, "Missing mandatory node-id property");
> > > +goto end;
> > > +}
> > > +if (!numa_info[object->u.cpu.node_id].present) {
> > > +error_setg(, "Invalid node-id=%" PRId64 ", NUMA node 
> > > must be "
> > > +"defined with -numa node,nodeid=ID before it's used with 
> > > "
> > > +"-numa cpu,node-id=ID", object->u.cpu.node_id);
> > > +goto end;
> > > +}
> > > +machine_set_cpu_numa_node(ms, >u.cpu, );
> > > +break;
> > >  default:
> > >  abort();
> > >  }
> > > diff --git a/qapi-schema.json b/qapi-schema.json
> > > index a6b5955..a9a1d5e 100644
> > > --- a/qapi-schema.json
> > > +++ b/qapi-schema.json
> > > @@ -5673,10 +5673,12 @@
> > >  ##
> > >  # @NumaOptionsType:
> > >  #
> > > +# @cpu: property based CPU(s) to node mapping (Since: 2.10)
> > > +#
> > >  # Since: 2.1
> > >  ##
> > >  { 'enum': 'NumaOptionsType',
> > > -  'data': [ 'node' ] }
> > > +  'data': [ 'node', 'cpu' ] }
> > >  
> > >  ##
> > >  # @NumaOptions:
> > > @@ -5689,7 +5691,8 @@
> > >'base': { 'type': 'NumaOptionsType' },
> > >'discriminator': 'type',
> > >'data': {
> > > -'node': 'NumaNodeOptions' }}
> > > +'node': 'NumaNodeOptions',
> > > +'cpu': 'CpuInstanceProperties' }}
> > >  
> > >  ##
> > >  # @NumaNodeOptions:
> > > diff --git a/qemu-options.hx b/qemu-options.hx
> > > index 99af8ed..2185c34 100644
> > > --- a/qemu-options.hx
> > > +++ b/qemu-options.hx
> > > @@ -139,13 +139,16 @@ ETEXI
> > >  
> > 

Re: [Qemu-devel] [PATCH qemu] pci: Add missing drop of bus master AS reference

2017-03-28 Thread Jason Wang



On 2017-03-27 12:40, Alexey Kardashevskiy wrote:

The recent introduction of a bus master container added
memory_region_add_subregion() into the PCI device registering path but
missed memory_region_del_subregion() in the unregistering path leaving
a reference to the root memory region of the new container.

This adds missing memory_region_del_subregion().

Fixes: 3716d5902d743 ("pci: introduce a bus master container")
Signed-off-by: Alexey Kardashevskiy 
---
  hw/pci/pci.c | 2 ++
  1 file changed, 2 insertions(+)

diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index e6b08e1988..bd8043c460 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -869,6 +869,8 @@ static void do_pci_unregister_device(PCIDevice *pci_dev)
  pci_dev->bus->devices[pci_dev->devfn] = NULL;
  pci_config_free(pci_dev);
  
+memory_region_del_subregion(&pci_dev->bus_master_container_region,
+&pci_dev->bus_master_enable_region);
  address_space_destroy(&pci_dev->bus_master_as);
  }
  


Acked-by: Jason Wang 

Thanks!



Re: [Qemu-devel] [PATCH for 2.9?] tap-win32: don't abort in tap_enable(); enables -netdev tap

2017-03-28 Thread Jason Wang



On 2017-03-29 02:55, Andrew Baumann wrote:

From: Stefan Weil [mailto:s...@weilnetz.de]
Sent: Tuesday, 28 March 2017 11:28
On 25.03.2017 at 00:46, Andrew Baumann wrote:

The docs generally steer users away from using the legacy -net
parameter, however on win32 attempting to enable a tap device using
-netdev tap fails at an abort() in tap_enable(). Removing the abort()s
seems to be enough to get everything working, so do that.

Signed-off-by: Andrew Baumann 
---
  net/tap-win32.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/tap-win32.c b/net/tap-win32.c
index 662f9b6..3620843 100644
--- a/net/tap-win32.c
+++ b/net/tap-win32.c
@@ -811,10 +811,10 @@ int net_init_tap(const Netdev *netdev, const char *name,

  int tap_enable(NetClientState *nc)
  {
-abort();
+return 0;
  }

  int tap_disable(NetClientState *nc)
  {
-abort();
+return 0;
  }

As I never worked with TAP on Windows, I cannot say much to this fix.

Jason, what is the use of tap_enable, tap_disable?


It should only be used when we want to enable or disable a specific
queue of a multiqueue-capable tap.



Is it fine to simply do nothing on Windows here?


Unless Windows supports multiqueue tap, we should keep the assert here.


I was also hoping for a review -- I'm no expert on this stuff either, but my 
quick reading of those code paths is that they issue ioctls to enable/disable 
packet reception on the underlying tap device. As win32 TAP is implemented, 
that is already enabled from start of day.

It's possible this patch still does not permit dynamic reconfiguration of tap 
devices (e.g. from the monitor console). However, it does work with the -netdev 
tap option on the command-line.


And is this something for QEMU 2.9 (I added a question to the subject line)?

Ideally, yes. If not, -netdev tap will continue to blow up in the abort as it 
does today...

Andrew


Yes, so the problem is that we should prevent tap_enable() and tap_disable()
from being called if multiqueue is disabled.


I believe the following patch can fix this issue; could you give it a try?


diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index c321680..7d091c9 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -510,6 +510,10 @@ static int peer_attach(VirtIONet *n, int index)
 return 0;
 }

+if (n->max_queues == 1) {
+return 0;
+}
+
 return tap_enable(nc->peer);
 }
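
For readers who do not have the virtio-net code at hand, the idea behind the guard can be sketched standalone as below. This is only an illustration under simplifying assumptions: FakePeer, FakeNet, fake_tap_enable() and fake_peer_attach() are made-up names, not QEMU types or functions, and the real peer_attach() performs more checks than shown here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the QEMU structures. */
typedef struct {
    bool is_tap;       /* peer is a tap backend */
    int enable_calls;  /* how often the backend hook was reached */
} FakePeer;

typedef struct {
    int max_queues;    /* 1 means multiqueue is not in use */
    FakePeer *peer;
} FakeNet;

/* Models a backend hook that is only meaningful for multiqueue taps
 * (on win32 the real tap_enable() currently calls abort()). */
static int fake_tap_enable(FakePeer *peer)
{
    peer->enable_calls++;
    return 0;
}

/* Same shape as the proposed change: return early for single-queue
 * devices so the backend hook is never reached. */
static int fake_peer_attach(FakeNet *n)
{
    assert(n != NULL);

    if (n->peer == NULL || !n->peer->is_tap) {
        return 0;
    }
    if (n->max_queues == 1) {
        return 0;
    }
    return fake_tap_enable(n->peer);
}
```

With this shape a single-queue setup never calls into the backend, so a backend whose hook aborts is not affected; a symmetric guard would presumably be needed on the detach path as well.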





[Qemu-devel] Python support in Risu

2017-03-28 Thread G 3
I was wondering: if someone wrote a risugen_*.pm file in Python and it
still worked with risugen, would you accept that patch?




Re: [Qemu-devel] [RFC for-2.10 3/3] pseries: Allow PCIe virtio and XHCI on pseries machine type

2017-03-28 Thread Alexey Kardashevskiy
On 28/03/17 13:16, David Gibson wrote:
> pseries now allows PCIe devices (both emulated and VFIO), although its
> PCI bus is in most respects a plain PCI bus - this uses paravirtualized
> access methods to PCIe extended config space defined in the PAPR spec.
> 
> However, because the bus is not PCIe, it means that virtio-pci and XHCI
> devices will present themselves as plain PCI rather than PCIe, which would
> be preferable.
> 
> This patch uses the new hook to override the behaviour for such PCI/PCIe
> "hybrid" devices to allow PCIe virtio-pci and XHCI on pseries.


Not clear what all these tests/virtio-*.c changes are for and why here -
does "make check" break if you do not enforce disable-legacy=off?


> 
> Signed-off-by: David Gibson 
> ---
>  hw/ppc/spapr_pci.c   | 9 +
>  tests/virtio-9p-test.c   | 2 +-
>  tests/virtio-blk-test.c  | 4 ++--
>  tests/virtio-net-test.c  | 2 +-
>  tests/virtio-scsi-test.c | 2 +-
>  5 files changed, 14 insertions(+), 5 deletions(-)
> 
> diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
> index 98c52e4..7686f7f 100644
> --- a/hw/ppc/spapr_pci.c
> +++ b/hw/ppc/spapr_pci.c
> @@ -1979,6 +1979,14 @@ static const char *spapr_phb_root_bus_path(PCIHostState *host_bridge,
>  return sphb->dtbusname;
>  }
>  
> +static bool spapr_phb_allow_hybrid_pcie(PCIHostState *host_bridge,
> +PCIDevice *pci_dev)
> +{
> +sPAPRPHBState *sphb = SPAPR_PCI_HOST_BRIDGE(host_bridge);
> +
> +return sphb->pcie_ecs;
> +}
> +
>  static void spapr_phb_class_init(ObjectClass *klass, void *data)
>  {
>  PCIHostBridgeClass *hc = PCI_HOST_BRIDGE_CLASS(klass);
> @@ -1986,6 +1994,7 @@ static void spapr_phb_class_init(ObjectClass *klass, void *data)
>  HotplugHandlerClass *hp = HOTPLUG_HANDLER_CLASS(klass);
>  
>  hc->root_bus_path = spapr_phb_root_bus_path;
> +hc->allow_hybrid_pcie = spapr_phb_allow_hybrid_pcie;
>  dc->realize = spapr_phb_realize;
>  dc->props = spapr_phb_properties;
>  dc->reset = spapr_phb_reset;
> diff --git a/tests/virtio-9p-test.c b/tests/virtio-9p-test.c
> index 43a1ad8..ae0d51e 100644
> --- a/tests/virtio-9p-test.c
> +++ b/tests/virtio-9p-test.c
> @@ -32,7 +32,7 @@ static QVirtIO9P *qvirtio_9p_start(const char *driver)
>  {
>  const char *arch = qtest_get_arch();
>  const char *cmd = "-fsdev local,id=fsdev0,security_model=none,path=%s "
> -  "-device %s,fsdev=fsdev0,mount_tag=%s";
> +  "-device %s,fsdev=fsdev0,mount_tag=%s,disable-legacy=off";
>  QVirtIO9P *v9p = g_new0(QVirtIO9P, 1);
>  
>  v9p->test_share = g_strdup("/tmp/qtest.XX");
> diff --git a/tests/virtio-blk-test.c b/tests/virtio-blk-test.c
> index 1eee95d..5fb7882 100644
> --- a/tests/virtio-blk-test.c
> +++ b/tests/virtio-blk-test.c
> @@ -65,7 +65,7 @@ static QOSState *pci_test_start(void)
>  const char *cmd = "-drive if=none,id=drive0,file=%s,format=raw "
>"-drive if=none,id=drive1,file=/dev/null,format=raw "
>"-device virtio-blk-pci,id=drv0,drive=drive0,"
> -  "addr=%x.%x";
> +  "addr=%x.%x,disable-legacy=off";
>  
>  tmp_path = drive_create();
>  
> @@ -656,7 +656,7 @@ static void pci_hotplug(void)
>  
>  /* plug secondary disk */
>  qpci_plug_device_test("virtio-blk-pci", "drv1", PCI_SLOT_HP,
> -  "'drive': 'drive1'");
> +  "'drive': 'drive1', 'disable-legacy': 'off'");
>  
>  dev = virtio_blk_pci_init(qs->pcibus, PCI_SLOT_HP);
>  g_assert(dev);
> diff --git a/tests/virtio-net-test.c b/tests/virtio-net-test.c
> index 8f94360..a35d87b 100644
> --- a/tests/virtio-net-test.c
> +++ b/tests/virtio-net-test.c
> @@ -55,7 +55,7 @@ static QOSState *pci_test_start(int socket)
>  {
>  const char *arch = qtest_get_arch();
>  const char *cmd = "-netdev socket,fd=%d,id=hs0 -device "
> -  "virtio-net-pci,netdev=hs0";
> +  "virtio-net-pci,netdev=hs0,disable-legacy=off";
>  
>  if (strcmp(arch, "i386") == 0 || strcmp(arch, "x86_64") == 0) {
>  return qtest_pc_boot(cmd, socket);
> diff --git a/tests/virtio-scsi-test.c b/tests/virtio-scsi-test.c
> index 0eabd56..5a802d9 100644
> --- a/tests/virtio-scsi-test.c
> +++ b/tests/virtio-scsi-test.c
> @@ -36,7 +36,7 @@ static QOSState *qvirtio_scsi_start(const char *extra_opts)
>  {
>  const char *arch = qtest_get_arch();
>  const char *cmd = "-drive id=drv0,if=none,file=/dev/null,format=raw "
> -  "-device virtio-scsi-pci,id=vs0 "
> +  "-device virtio-scsi-pci,id=vs0,disable-legacy=off "
>"-device scsi-hd,bus=vs0.0,drive=drv0 %s";
>  
>  if (strcmp(arch, "i386") == 0 || strcmp(arch, "x86_64") == 0) {
> 


-- 
Alexey



Re: [Qemu-devel] [RFC PATCH qemu 1/3] memory: Add get_fd() hook for IOMMU MR

2017-03-28 Thread Alexey Kardashevskiy
On 29/03/17 04:48, Alex Williamson wrote:
> On Tue, 28 Mar 2017 20:05:28 +1100
> Alexey Kardashevskiy  wrote:
> 
>> Signed-off-by: Alexey Kardashevskiy 
>> ---
>>  include/exec/memory.h | 2 ++
>>  hw/ppc/spapr_iommu.c  | 8 
>>  2 files changed, 10 insertions(+)
>>
>> diff --git a/include/exec/memory.h b/include/exec/memory.h
>> index e39256ad03..925c10b35b 100644
>> --- a/include/exec/memory.h
>> +++ b/include/exec/memory.h
>> @@ -174,6 +174,8 @@ struct MemoryRegionIOMMUOps {
>>  void (*notify_flag_changed)(MemoryRegion *iommu,
>>  IOMMUNotifierFlag old_flags,
>>  IOMMUNotifierFlag new_flags);
>> +/* Returns a kernel fd for IOMMU */
>> +int (*get_fd)(MemoryRegion *iommu);
> 
> What if we used this as a prototype:
> 
> int (*get_fd)(IOMMUFdType type, MemoryRegion *iommu);
> 
> And then we defined:
> 
> typedef enum {
> SPAPR_IOMMU_TABLE_FD = 0,
> } IOMMUFdType;


Where do I put this enum definition? include/exec/memory.h? It does not
have any mention of any platform yet...

I could pass char* instead of IOMMUFdType (and pass there something like
TYPE_SPAPR_TCE_TABLE), would it be any better?


> 
> Such that you're actually asking the IOMMUOps for a specific type of FD
> and it either has it or not, so the caller doesn't need to assume what
> it is they get back.
> 
> Furthermore, add:
> 
> int memory_region_iommu_get_fd(IOMMUFdType type, MemoryRegion *mr)
> {
> assert(memory_region_is_iommu(mr));
> 
> if (mr->iommu_ops && mr->iommu_ops->get_fd) {
> return mr->iommu_ops->get_fd(type, mr);
> }
> 
> return -1;
> }
> 
>>  };
>>
> 
> This should be two patches, patch 1 above, patch 2 below
>   
>>  typedef struct CoalescedMemoryRange CoalescedMemoryRange;
>> diff --git a/hw/ppc/spapr_iommu.c b/hw/ppc/spapr_iommu.c
>> index 9e30e148d6..b61c8f053e 100644
>> --- a/hw/ppc/spapr_iommu.c
>> +++ b/hw/ppc/spapr_iommu.c
>> @@ -170,6 +170,13 @@ static void spapr_tce_notify_flag_changed(MemoryRegion *iommu,
>>  }
>>  }
>>  
>> +static int spapr_tce_get_fd(MemoryRegion *iommu)
>> +{
>> +sPAPRTCETable *tcet = container_of(iommu, sPAPRTCETable, iommu);
>> +
>> +return tcet->fd;
> 
> 
> This would then be:
> 
> return type == SPAPR_IOMMU_TABLE_FD ? tcet->fd : -1;
> 
>> +}
>> +
>>  static int spapr_tce_table_post_load(void *opaque, int version_id)
>>  {
>>  sPAPRTCETable *tcet = SPAPR_TCE_TABLE(opaque);
>> @@ -251,6 +258,7 @@ static MemoryRegionIOMMUOps spapr_iommu_ops = {
>>  .translate = spapr_tce_translate_iommu,
>>  .get_min_page_size = spapr_tce_get_min_page_size,
>>  .notify_flag_changed = spapr_tce_notify_flag_changed,
>> +.get_fd = spapr_tce_get_fd,
>>  };
>>  
>>  static int spapr_tce_table_realize(DeviceState *dev)
> 


-- 
Alexey



Re: [Qemu-devel] [PATCH v2] spapr: fix memory hot-unplugging

2017-03-28 Thread David Gibson
On Tue, Mar 28, 2017 at 02:09:34PM +0200, Laurent Vivier wrote:
> If, once the kernel has booted, we try to remove a memory
> hotplugged while the kernel was not started, QEMU crashes on
> an assert:
> 
> qemu-system-ppc64: hw/virtio/vhost.c:651:
>vhost_commit: Assertion `r >= 0' failed.
> ...
> #4  in vhost_commit
> #5  in memory_region_transaction_commit
> #6  in pc_dimm_memory_unplug
> #7  in spapr_memory_unplug
> #8  spapr_machine_device_unplug
> #9  in hotplug_handler_unplug
> #10 in spapr_lmb_release
> #11 in detach
> #12 in set_allocation_state
> #13 in rtas_set_indicator
> ...
> 
> If we take a closer look to the guest kernel log, we can see when
> we try to unplug the memory:
> 
> pseries-hotplug-mem: Attempting to hot-add 4 LMB(s)
> 
> What happens:
> 
> 1- The kernel has ignored the memory hotplug event because
>it was not started when it was generated.
> 
> 2- When we hot-unplug the memory,
>QEMU starts to remove the memory,
> generates an hot-unplug event,
> and signals the kernel of the incoming new event
> 
> 3- as the kernel is started, on the QEMU signal, it reads
>the event list, decodes the hotplug event and tries to
>finish the hotplugging.
> 
> 4- QEMU receives the hotplug notification while it
>is trying to hot-unplug the memory. This moves the memory
>DRC to an invalid state
> 
> This patch prevents this by not allowing to set the allocation
> state to USABLE while the DRC is awaiting release.
> 
> RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=1432382
> 
> Signed-off-by: Laurent Vivier 

Applied to ppc-for-2.9, thanks.

> ---
> v2: Add awaiting_allocation_skippable flag
> as suggested by Michael
> 
>  hw/ppc/spapr_drc.c | 20 +---
>  include/hw/ppc/spapr_drc.h |  1 +
>  2 files changed, 18 insertions(+), 3 deletions(-)
> 
> diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c
> index 150f6bf..a1cdc87 100644
> --- a/hw/ppc/spapr_drc.c
> +++ b/hw/ppc/spapr_drc.c
> @@ -135,6 +135,17 @@ static uint32_t set_allocation_state(sPAPRDRConnector *drc,
>  if (!drc->dev) {
>  return RTAS_OUT_NO_SUCH_INDICATOR;
>  }
> +if (drc->awaiting_release && drc->awaiting_allocation) {
> +/* kernel is acknowledging a previous hotplug event
> + * while we are already removing it.
> + * it's safe to ignore awaiting_allocation here since we know the
> + * situation is predicated on the guest either already having 
> done
> + * so (boot-time hotplug), or never being able to acquire in the
> + * first place (hotplug followed by immediate unplug).
> + */
> +drc->awaiting_allocation_skippable = true;
> +return RTAS_OUT_NO_SUCH_INDICATOR;
> +}
>  }
>  
>  if (drc->type != SPAPR_DR_CONNECTOR_TYPE_PCI) {
> @@ -436,9 +447,11 @@ static void detach(sPAPRDRConnector *drc, DeviceState *d,
>  }
>  
>  if (drc->awaiting_allocation) {
> -drc->awaiting_release = true;
> -trace_spapr_drc_awaiting_allocation(get_index(drc));
> -return;
> +if (!drc->awaiting_allocation_skippable) {
> +drc->awaiting_release = true;
> +trace_spapr_drc_awaiting_allocation(get_index(drc));
> +return;
> +}
>  }
>  
>  drc->indicator_state = SPAPR_DR_INDICATOR_STATE_INACTIVE;
> @@ -448,6 +461,7 @@ static void detach(sPAPRDRConnector *drc, DeviceState *d,
>  }
>  
>  drc->awaiting_release = false;
> +drc->awaiting_allocation_skippable = false;
>  g_free(drc->fdt);
>  drc->fdt = NULL;
>  drc->fdt_start_offset = 0;
> diff --git a/include/hw/ppc/spapr_drc.h b/include/hw/ppc/spapr_drc.h
> index fa531d5..5524247 100644
> --- a/include/hw/ppc/spapr_drc.h
> +++ b/include/hw/ppc/spapr_drc.h
> @@ -154,6 +154,7 @@ typedef struct sPAPRDRConnector {
>  bool awaiting_release;
>  bool signalled;
>  bool awaiting_allocation;
> +bool awaiting_allocation_skippable;
>  
>  /* device pointer, via link property */
>  DeviceState *dev;

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson




[Qemu-devel] [PULL 2/3] pci: Add missing drop of bus master AS reference

2017-03-28 Thread Michael S. Tsirkin
From: Alexey Kardashevskiy 

The recent introduction of a bus master container added
memory_region_add_subregion() into the PCI device registering path but
missed memory_region_del_subregion() in the unregistering path leaving
a reference to the root memory region of the new container.

This adds missing memory_region_del_subregion().

Fixes: 3716d5902d743 ("pci: introduce a bus master container")
Signed-off-by: Alexey Kardashevskiy 
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
Reviewed-by: Paolo Bonzini 
---
 hw/pci/pci.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index e6b08e1..bd8043c 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -869,6 +869,8 @@ static void do_pci_unregister_device(PCIDevice *pci_dev)
 pci_dev->bus->devices[pci_dev->devfn] = NULL;
 pci_config_free(pci_dev);
 
+memory_region_del_subregion(&pci_dev->bus_master_container_region,
+&pci_dev->bus_master_enable_region);
 address_space_destroy(&pci_dev->bus_master_as);
 }
 
-- 
MST




[Qemu-devel] [PULL 3/3] virtio: fix vring_align() on 64-bit windows

2017-03-28 Thread Michael S. Tsirkin
From: Andrew Baumann 

long is 32 bits on 64-bit Windows, which caused the top half of the
address to be truncated; this patch changes it to use the
QEMU_ALIGN_UP macro, which does not suffer from the same problem.

Signed-off-by: Andrew Baumann 
Reviewed-by: Eric Blake 
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
Reviewed-by: Stefan Weil 
Reviewed-by: Philippe Mathieu-Daudé 
---
 include/hw/virtio/virtio.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index 15efcf2..7b6edba 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -34,7 +34,7 @@ struct VirtQueue;
 static inline hwaddr vring_align(hwaddr addr,
  unsigned long align)
 {
-return (addr + align - 1) & ~(align - 1);
+return QEMU_ALIGN_UP(addr, align);
 }
 
 typedef struct VirtQueue VirtQueue;
-- 
MST
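
The truncation described in this patch can be reproduced in isolation. The sketch below is an illustration rather than QEMU code: uint32_t stands in for "unsigned long" as it behaves on 64-bit Windows, uint64_t stands in for hwaddr, and align_up_mask32() and align_up_div() are made-up names. The second helper uses division-based rounding, which is my understanding of how QEMU_ALIGN_UP works.

```c
#include <assert.h>
#include <stdint.h>

/* Old expression, with a 32-bit alignment type. The mask ~(align - 1)
 * is computed in 32 bits and then zero-extended to 64 bits, so its upper
 * half is all zeroes and the AND clears the top 32 bits of the address. */
static uint64_t align_up_mask32(uint64_t addr, uint32_t align)
{
    return (addr + align - 1) & ~(align - 1);
}

/* Division-based round-up: the arithmetic is carried out in the wider
 * type, so no bits of the address are dropped. */
static uint64_t align_up_div(uint64_t addr, uint32_t align)
{
    assert(align != 0);
    return (addr + align - 1) / align * align;
}
```

For an address just above 4 GiB, for example 0x100000001 with a 4096-byte alignment, the masked version yields 0x1000 while the division-based version yields 0x100001000. Below 4 GiB the two agree, which is probably why the problem went unnoticed for a while.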




[Qemu-devel] [PULL 0/3] virtio, pci: fixes

2017-03-28 Thread Michael S. Tsirkin
The following changes since commit df9046363220e57d45818312759b954c033c58ab:

  Update version for v2.9.0-rc2 release (2017-03-28 19:11:16 +0100)

are available in the git repository at:

  git://git.kernel.org/pub/scm/virt/kvm/mst/qemu.git tags/for_upstream

for you to fetch changes up to b8adbc657802482e4da1767bf983ebfdf9bfe9fc:

  virtio: fix vring_align() on 64-bit windows (2017-03-29 02:35:24 +0300)


virtio, pci: fixes

More fixes for 2.9.

Signed-off-by: Michael S. Tsirkin 


Alexey Kardashevskiy (1):
  pci: Add missing drop of bus master AS reference

Andrew Baumann (1):
  virtio: fix vring_align() on 64-bit windows

Halil Pasic (1):
  event_notifier: prevent accidental use after close

 include/hw/virtio/virtio.h  | 2 +-
 hw/pci/pci.c| 2 ++
 util/event_notifier-posix.c | 2 ++
 util/event_notifier-win32.c | 1 +
 4 files changed, 6 insertions(+), 1 deletion(-)




[Qemu-devel] [PULL 1/3] event_notifier: prevent accidental use after close

2017-03-28 Thread Michael S. Tsirkin
From: Halil Pasic 

Let's set the handles to the underlying facilities to their extremal
value so no accidental misuse can happen, and to make it obvious that the
notifier is dysfunctional. E.g. if we just close an fd but do not touch
the int holding the fd eventually a read/write could succeed again when
the fd gets reused, and corrupt the file addressed by the fd.

Signed-off-by: Halil Pasic 
Reviewed-by: Stefan Hajnoczi 
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
 util/event_notifier-posix.c | 2 ++
 util/event_notifier-win32.c | 1 +
 2 files changed, 3 insertions(+)

diff --git a/util/event_notifier-posix.c b/util/event_notifier-posix.c
index 7e40252..acdbe3b 100644
--- a/util/event_notifier-posix.c
+++ b/util/event_notifier-posix.c
@@ -81,8 +81,10 @@ void event_notifier_cleanup(EventNotifier *e)
 {
 if (e->rfd != e->wfd) {
 close(e->rfd);
+e->rfd = -1;
 }
 close(e->wfd);
+e->wfd = -1;
 }
 
 int event_notifier_get_fd(const EventNotifier *e)
diff --git a/util/event_notifier-win32.c b/util/event_notifier-win32.c
index 519fb59..62c53b0 100644
--- a/util/event_notifier-win32.c
+++ b/util/event_notifier-win32.c
@@ -25,6 +25,7 @@ int event_notifier_init(EventNotifier *e, int active)
 void event_notifier_cleanup(EventNotifier *e)
 {
 CloseHandle(e->event);
+e->event = NULL;
 }
 
 HANDLE event_notifier_get_handle(EventNotifier *e)
-- 
MST
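
As a small standalone illustration of the reasoning in the commit message, the sketch below resets the descriptors after closing them so that a later stray write fails instead of landing in whatever file has reused the descriptor number. DemoNotifier and its two helpers are hypothetical names built on a plain POSIX pipe, not the QEMU EventNotifier API, and unlike the patch the sketch resets both fields unconditionally.

```c
#include <assert.h>
#include <unistd.h>

/* Hypothetical notifier backed by a pipe: rfd is the read end,
 * wfd is the write end. */
typedef struct {
    int rfd;
    int wfd;
} DemoNotifier;

static int demo_notifier_init(DemoNotifier *e)
{
    int fds[2];

    if (pipe(fds) < 0) {
        return -1;
    }
    e->rfd = fds[0];
    e->wfd = fds[1];
    return 0;
}

/* Close the descriptors and poison the fields with -1. */
static void demo_notifier_cleanup(DemoNotifier *e)
{
    if (e->rfd != e->wfd) {
        close(e->rfd);
    }
    close(e->wfd);
    e->rfd = -1;
    e->wfd = -1;
}
```

After cleanup, a write(e->wfd, ...) fails with EBADF because -1 is never a valid descriptor, whereas a stale but reused descriptor number could have succeeded silently.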




[Qemu-devel] [PATCH v3 0/1] block: pass the right options for BlockDriver.bdrv_open()

2017-03-28 Thread Dong Jia Shi
Trying to restore an rbd image on a ceph cluster from a snapshot with
qemu-img can trigger a call to raw_open() with NULL @options,
which makes applying the snapshot fail.

[root@s8345007 ~]# gdb --args qemu-img snapshot -a snap1 rbd:test_pool/dj_image
... ...
Program received signal SIGSEGV, Segmentation fault.
0x801395a8 in qdict_next_entry (qdict=0x0, first_bucket=0) at qobject/qdict.c:327
327 if (!QLIST_EMPTY(&qdict->table[i])) {
(gdb) bt
#0  0x801395a8 in qdict_next_entry (qdict=0x0, first_bucket=0) at qobject/qdict.c:327
#1  0x80139626 in qdict_first (qdict=0x0) at qobject/qdict.c:340
#2  0x8013a00c in qdict_extract_subqdict (src=0x0, dst=0x3ffec50, start=0x80698260 "file.")
    at qobject/qdict.c:576
#3  0x80019c26 in bdrv_open_child_bs (filename=0x0, options=0x0, bdref_key=0x8017ab38 "file",
    parent=0x80630300, child_role=0x80176108 , allow_none=false, errp=0x0) at block.c:2018
#4  0x80019dfa in bdrv_open_child (filename=0x0, options=0x0, bdref_key=0x8017ab38 "file",
    parent=0x80630300, child_role=0x80176108 , allow_none=false, errp=0x0) at block.c:2065
#5  0x8002b9a0 in raw_open (bs=0x80630300, options=0x0, flags=8194, errp=0x0)
    at block/raw-format.c:387
#6  0x80087516 in bdrv_snapshot_goto (bs=0x80630300, snapshot_id=0x3fff75c "snap1")
    at block/snapshot.c:194
#7  0x80010b8c in img_snapshot (argc=4, argv=0x3fff4c0) at qemu-img.c:2937
#8  0x800140e4 in main (argc=4, argv=0x3fff4c0) at qemu-img.c:4373

The problematic code is block/snapshot.c:194:
178 int bdrv_snapshot_goto(BlockDriverState *bs,
179const char *snapshot_id)
180 {
181 BlockDriver *drv = bs->drv;
182 int ret, open_ret;
183 
184 if (!drv) {
185 return -ENOMEDIUM;
186 }
187 if (drv->bdrv_snapshot_goto) {
188 return drv->bdrv_snapshot_goto(bs, snapshot_id);
189 }
190 
191 if (bs->file) {
192 drv->bdrv_close(bs);
193 ret = bdrv_snapshot_goto(bs->file->bs, snapshot_id);
194 open_ret = drv->bdrv_open(bs, NULL, bs->open_flags, NULL);
195 if (open_ret < 0) {
196 bdrv_unref(bs->file->bs);
197 bs->drv = NULL;
198 return open_ret;
199 }
200 return ret;
201 }
202 
203 return -ENOTSUP;
204 }

After a chat with Kevin, my understanding is that this happens because
NULL @options is not a valid parameter value for the bdrv_open callback
of the BlockDriver.
We should prepare @options by adding the "file" key-value pair to
the actual options that were given for the node (i.e. bs->options),
and pass it to the callback.

Then Max pointed out that we're not guaranteed that "file" is a
nested QDict. The following commands will both result in segfaults:
$ ./qemu-img snapshot -a foo \
"json:{'driver':'raw','file':{'driver':'qcow2','file':{'driver':'file','filename':'foo.qcow2'}}}"
$ ./qemu-img snapshot -a foo --image-opts \
driver=raw,file.driver=qcow2,file.file.driver=file,file.file.filename=foo.qcow2

Max also proposed for the previous patch (v2) to:
(1) Remove every option in "options" that has a "file." prefix before
the qdict_put() call.
(2) Use bdrv_unref_child(bs, bs->file) instead of bdrv_unref(bs->file->bs).

So I adopted Max's suggestions, and here is the new patch.

Dong Jia Shi (1):
  block: pass the right options for BlockDriver.bdrv_open()

 block/snapshot.c | 14 --
 1 file changed, 12 insertions(+), 2 deletions(-)

-- 
2.8.4




[Qemu-devel] [PATCH v3 1/1] block: pass the right options for BlockDriver.bdrv_open()

2017-03-28 Thread Dong Jia Shi
raw_open() expects the caller to always pass in valid @options.
But when trying to apply a snapshot on an RBD image,
bdrv_snapshot_goto() calls raw_open() (by calling the
bdrv_open callback on the BlockDriver) with NULL @options, and
that results in a segmentation fault.

For the other non-raw format drivers, it also makes sense to pass
in the actual options, although they don't trigger the problem so
far.

Let's prepare @options by adding the "file" key-value pair to a
copy of the actual options that were given for the node (i.e.
bs->options), and pass it to the callback.

BlockDriver.bdrv_open() expects bs->file to be NULL and just
overwrites it with the result from bdrv_open_child(). If that
bdrv_open_child() fails, the field becomes NULL. While we are at
it, we also correct the cleanup action for a failed call to
BlockDriver.bdrv_open() by replacing bdrv_unref() with
bdrv_unref_child().

Suggested-by: Max Reitz 
Signed-off-by: Dong Jia Shi 
---
 block/snapshot.c | 14 --
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/block/snapshot.c b/block/snapshot.c
index bf5c2ca..281626c 100644
--- a/block/snapshot.c
+++ b/block/snapshot.c
@@ -27,6 +27,7 @@
 #include "block/block_int.h"
 #include "qapi/error.h"
 #include "qapi/qmp/qerror.h"
+#include "qapi/qmp/qstring.h"
 
 QemuOptsList internal_snapshot_opts = {
 .name = "snapshot",
@@ -189,11 +190,20 @@ int bdrv_snapshot_goto(BlockDriverState *bs,
 }
 
 if (bs->file) {
+QDict *options = qdict_clone_shallow(bs->options);
+QDict *file_options;
+
+qdict_extract_subqdict(options, &file_options, "file.");
+QDECREF(file_options);
+qdict_put(options, "file",
+  qstring_from_str(bdrv_get_node_name(bs->file->bs)));
+
 drv->bdrv_close(bs);
 ret = bdrv_snapshot_goto(bs->file->bs, snapshot_id);
-open_ret = drv->bdrv_open(bs, NULL, bs->open_flags, NULL);
+open_ret = drv->bdrv_open(bs, options, bs->open_flags, NULL);
+QDECREF(options);
 if (open_ret < 0) {
-bdrv_unref(bs->file->bs);
+bdrv_unref_child(bs, bs->file);
 bs->drv = NULL;
 return open_ret;
 }
-- 
2.8.4
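
The option preparation in this patch first drops every key carrying the "file." prefix before putting a single "file" entry back. As a rough standalone sketch of that first step, with a plain array of key strings standing in for a QDict (drop_prefixed_keys() is a made-up helper, not a QEMU function):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Remove every key that starts with prefix, compacting the array in
 * place, and return the number of keys that remain. */
static size_t drop_prefixed_keys(const char **keys, size_t n,
                                 const char *prefix)
{
    size_t plen, kept = 0, i;

    assert(keys != NULL && prefix != NULL);
    plen = strlen(prefix);
    for (i = 0; i < n; i++) {
        if (strncmp(keys[i], prefix, plen) != 0) {
            keys[kept++] = keys[i];
        }
    }
    return kept;
}
```

Applied to keys such as "driver", "file.driver" and "file.file.filename" with the prefix "file.", only "driver" survives. The caller can then add one "file" entry that refers to the already opened child node by name, which is what the qdict_put() call in the patch does.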




Re: [Qemu-devel] [PATCH 3/4] savevm: fix savevm after migration

2017-03-28 Thread Denis V. Lunev
On 03/28/2017 01:55 PM, Dr. David Alan Gilbert wrote:
> * Kevin Wolf (kw...@redhat.com) wrote:
>> On 25.02.2017 at 20:31, Vladimir Sementsov-Ogievskiy wrote:
>>> After migration all drives are inactive and savevm will fail with
>>>
>>> qemu-kvm: block/io.c:1406: bdrv_co_do_pwritev:
>>>Assertion `!(bs->open_flags & 0x0800)' failed.
>>>
>>> Signed-off-by: Vladimir Sementsov-Ogievskiy 
>> What's the exact state you're in? I tried to reproduce this, but just
>> doing a live migration and then savevm on the destination works fine for
>> me.
>>
>> Hm... Or do you mean on the source? In that case, I think the operation
>> must fail, but of course more gracefully than now.
>>
>> Actually, the question that you're asking implicitly here is how the
>> source qemu process should be "reactivated" after a failed migration.
>> Currently, as far as I know, this is only with issuing a "cont" command.
>> It might make sense to provide a way to get control without resuming the
>> VM, but I doubt that adding automatic resume to every QMP command is the
>> right way to achieve it.
>>
>> Dave, Juan, what do you think?
> I'd only ever really thought of 'cont' or retrying the migration.
> However, it does make sense to me that you might want to do a savevm instead;
> if you can't migrate then perhaps a savevm is the best you can do before
> your machine dies.  Are there any other things that should be allowed?
>
> We would want to be careful not to accidentally reactivate the disks on the
> source after what was actually a successful migration.
>
> As for the actual patch contents, I'd leave that to you to say if it's
> OK from the block side of things.
>
> Dave
>
>>> diff --git a/block/snapshot.c b/block/snapshot.c
>>> index bf5c2ca5e1..256d06ac9f 100644
>>> --- a/block/snapshot.c
>>> +++ b/block/snapshot.c
>>> @@ -145,7 +145,8 @@ bool bdrv_snapshot_find_by_id_and_name(BlockDriverState *bs,
>>>  int bdrv_can_snapshot(BlockDriverState *bs)
>>>  {
>>>  BlockDriver *drv = bs->drv;
>>> -if (!drv || !bdrv_is_inserted(bs) || bdrv_is_read_only(bs)) {
>>> +if (!drv || !bdrv_is_inserted(bs) || bdrv_is_read_only(bs) ||
>>> +(bs->open_flags & BDRV_O_INACTIVE)) {
>>>  return 0;
>>>  }
>> I wasn't sure if this doesn't disable too much, but it seems it only
>> makes 'info snapshots' turn up empty, which might not be nice, but maybe
>> tolerable.
>>
>> At least it should definitely fix the assertion.
> Did Denis have some concerns about this chunk?
Yep. I really think that this check is unnecessary and wrong.
Actually all disks are in the INACTIVE state and we will face
the problem later on the actual write. This exact operation
is sane.

Den



Re: [Qemu-devel] [PATCH for 2.9 1/1] block: add missed aio_context_acquire into release_drive

2017-03-28 Thread Fam Zheng
On Tue, 03/28 19:12, Denis V. Lunev wrote:
> Recently we experienced a hang with iothreads enabled, with the following
> call trace:
> Thread 1 (Thread 0x7fa95efebc80 (LWP 177117)):
> 0  ppoll () from /lib64/libc.so.6
> 2  qemu_poll_ns () at qemu-timer.c:313
> 3  aio_poll () at aio-posix.c:457
> 4  bdrv_flush () at block/io.c:2641
> 5  bdrv_close () at block.c:2143
> 6  bdrv_delete () at block.c:2352
> 7  bdrv_unref () at block.c:3429
> 8  blk_remove_bs () at block/block-backend.c:427
> 9  blk_delete () at block/block-backend.c:178
> 10 blk_unref () at block/block-backend.c:226
> 11 object_property_del_all () at qom/object.c:399
> 12 object_finalize () at qom/object.c:461
> 13 object_unref () at qom/object.c:898
> 14 object_property_del_child () at qom/object.c:422
> 15 qmp_marshal_device_del () at qmp-marshal.c:1145
> 16 handle_qmp_command () at /usr/src/debug/qemu-2.6.0/monitor.c:3929
> 
> Technically, bdrv_flush() gets stuck in
> while (rwco.ret == NOT_DONE) {
> aio_poll(aio_context, true);
> }
> but rwco.ret is equal to 0, so we have missed a wakeup. Code investigation
> reveals that we have not performed aio_context_acquire() on this call
> stack.
> 
> This patch adds the missing lock.
> 
> Signed-off-by: Denis V. Lunev 
> CC: Kevin Wolf 
> CC: Max Reitz 
> CC: Eric Blake 
> CC: Markus Armbruster 

Nit: reading the subject I thought it was an unbalanced acquire/release, but it
is actually a missing pair.

In bdrv_unref we should have asserted that we have acquired the AioContext; that
way you wouldn't have been bitten by this bug.

Reviewed-by: Fam Zheng 
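
To make the suggested assertion concrete, here is a minimal, self-contained
sketch. It does not use QEMU's real API: ToyContext, ToyNode and every toy_*
function are hypothetical stand-ins, and the "lock" is only a depth counter.
The point is just that the final unref asserts the caller holds the context,
and that the release path takes the lock as a pair around it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for an AioContext: a recursive lock modelled
 * as a simple depth counter. */
typedef struct ToyContext {
    int lock_depth;
} ToyContext;

/* Hypothetical stand-in for a block node that belongs to one context. */
typedef struct ToyNode {
    ToyContext *ctx;
    int refcnt;
    bool closed;
} ToyNode;

static void toy_context_acquire(ToyContext *ctx)
{
    ctx->lock_depth++;
}

static void toy_context_release(ToyContext *ctx)
{
    assert(ctx->lock_depth > 0);    /* releasing an unheld lock is a bug */
    ctx->lock_depth--;
}

/* Drop one reference. The last unref "flushes and closes" the node, which
 * is only safe while the context is held, so assert that up front. */
static void toy_node_unref(ToyNode *node)
{
    assert(node->ctx->lock_depth > 0);
    assert(node->refcnt > 0);
    if (--node->refcnt == 0) {
        node->closed = true;
    }
}

/* Release path: acquire and release as a pair around the unref. */
static void toy_release_drive(ToyNode *node)
{
    ToyContext *ctx = node->ctx;

    toy_context_acquire(ctx);
    toy_node_unref(node);
    toy_context_release(ctx);
}
```

With such an assertion in place, calling toy_node_unref() without the
surrounding acquire would abort immediately instead of hanging later in a
poll loop.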



Re: [Qemu-devel] [PATCH v6] vfio error recovery: kernel support

2017-03-28 Thread Michael S. Tsirkin
On Tue, Mar 28, 2017 at 10:12:33AM -0600, Alex Williamson wrote:
> On Tue, 28 Mar 2017 21:47:00 +0800
> Cao jin  wrote:
> 
> > On 03/25/2017 06:12 AM, Alex Williamson wrote:
> > > On Thu, 23 Mar 2017 17:07:31 +0800
> > > Cao jin  wrote:
> > > 
> > > A more appropriate patch subject would be:
> > > 
> > > vfio-pci: Report correctable errors and slot reset events to user
> > >  
> > 
> > Correctable? It is confusing to me. Correctable error has its clear
> > definition in PCIe spec, shouldn't it be "non-fatal"?
> 
> My mistake, non-fatal.
>  
> > >> From: "Michael S. Tsirkin"   
> > > 
> > > This hardly seems accurate anymore.  You could say Suggested-by and let
> > > Michael add a sign-off, but it's changed since he sent it.
> > >   
> > >>
> > >> 0. What happens now (PCIE AER only)
> > >>Fatal errors cause a link reset. Non fatal errors don't.
> > >>All errors stop the QEMU guest eventually, but not immediately,
> > >>because it's detected and reported asynchronously.
> > >>Interrupts are forwarded as usual.
> > >>Correctable errors are not reported to user at all.
> > >>
> > >>Note:
> > > >>PPC EEH is different, but this approach won't affect EEH. EEH treats
> > > >>all errors as fatal ones in AER, so they will still be signalled to user
> > > >>via the legacy eventfd.  Besides, all devices/functions in a PE belong
> > > >>to the same IOMMU group, so the slot_reset handler in this approach
> > > >>won't affect EEH either.
> > >>
> > >> 1. Correctable errors
> > > >>Hardware can correct these errors without software intervention;
> > > >>clearing the error status is enough, which is what is already done now.
> > > >>No recovery is needed, so nothing changes here.
> > >>
> > >> 2. Fatal errors
> > >>They will induce a link reset. This is troublesome when user is
> > >>a QEMU guest. This approach doesn't touch the existing mechanism.
> > >>
> > >> 3. Non-fatal errors
> > > >>Before this patch, they are signalled to user the same way as fatal ones.
> > >>With this patch, a new eventfd is introduced only for non-fatal error
> > >>notification. By splitting non-fatal ones out, it will benefit AER
> > >>recovery of a QEMU guest user.
> > >>
> > >>To maintain backwards compatibility with userspace, non-fatal errors
> > >>will continue to trigger via the existing error interrupt index if a
> > >>non-fatal signaling mechanism has not been registered.
> > >>
> > >>Note:
> > >>In case of PCI Express errors, kernel might request a slot reset
> > >>affecting our device (from our point of view this is a passive device
> > >>reset as opposed to an active one requested by vfio itself).
> > >>This might currently happen if a slot reset is requested by a driver
> > >>(other than vfio) bound to another device function in the same slot.
> > >>This will cause our device to lose its state so report this event to
> > >>userspace.  
> > > 
> > > I tried to convey this in my last comments, I don't think this is an
> > > appropriate commit log.  Lead with what is the problem you're trying to
> > > fix and why, what is the benefit to the user, and how is the change
> > > accomplished.  If you want to provide a State of Error Handling in
> > > VFIO, append it after the main points of the commit log.  
> > 
> > ok.
> > 
> > > 
> > > I also asked in my previous comments to provide examples of errors that
> > > might trigger correctable errors to the user, this comment seems to
> > > have been missed.  In my experience, AERs generated during device
> > > assignment are generally hardware faults or induced by bad guest
> > > drivers.  These are cases where a single fatal error is an appropriate
> > > and sufficient response.  We've scaled back this support to the point
> > > where we're only improving the situation of correctable errors and I'm
> > > not convinced this is worthwhile and we're not simply checking a box on
> > > an ill-conceived marketing requirements document.  
> > 
> > Sorry. I noticed that question: "what actual errors do we expect
> > userspace to see as non-fatal errors?", but I am confused about it.
> > > Correctable, non-fatal, fatal errors are clearly defined in PCIe spec,
> > > and the Uncorrectable Error Severity Register tells which is fatal and
> > > which is non-fatal. This register is configurable, so I guess the
> > > settings are device specific. The AER core driver distinguishes them by
> > > pci_channel_io_normal/pci_channel_io_frozen, so I don't understand your
> > > question.
> > > 
> > > Or do you mean we could list the default non-fatal errors of the
> > > Uncorrectable Error Severity Register as provided by the PCIe spec?
> 
> I'm trying to ask why is this patch series useful.  It's clearly
> possible for us to signal non-fatal errors for a device to a guest, but
> why is it necessarily a good idea to do so?  What additional 

Re: [Qemu-devel] [PATCH v3 3/3] vfio-pci: process non fatal error of AER

2017-03-28 Thread Michael S. Tsirkin
On Tue, Mar 28, 2017 at 10:12:25AM -0600, Alex Williamson wrote:
> On Tue, 28 Mar 2017 21:49:17 +0800
> Cao jin  wrote:
> 
> > On 03/25/2017 06:12 AM, Alex Williamson wrote:
> > > On Thu, 23 Mar 2017 17:09:23 +0800
> > > Cao jin  wrote:
> > >   
> > > >> Make use of the non-fatal error eventfd that the kernel module provides
> > > >> to process AER non-fatal errors. Fatal errors still go the legacy way,
> > > >> which results in a VM stop.
> > >>
> > >> Register the handler, wait for notification. Construct aer message and
> > >> pass it to root port on notification. Root port will trigger an interrupt
> > >> to signal guest, then guest driver will do the recovery.  
> > > 
> > > Can we guarantee this is the better solution in all cases or could
> > > there be guests without AER support where the VM stop is the better
> > > solution?
> > >   
> > 
> > Currently, we only have VM stop on errors, that looks the same as a
> > sudden power down to me.  With this solution, we have about
> > 50%(non-fatal) chance to reduce the sudden power-down risk.
> 
> If half of all faults are expected to be non-fatal, then you must have
> some real examples of devices triggering non-fatal errors which can be
> corrected in the guest driver that you can share to justify why it's a
> good thing to enable this behavior.
> 
> > What if a guest doesn't support AER?  It looks the same as a host
> > without AER support. Now I only can speculate the worst condition: guest
> > crash, would that be quite different from a sudden power-down?
> 
> Yes, it's very different.  In one case we contain the fault by stopping
> the guest, in the other case we allow the guest to continue operating
> with a known fault in the device which may allow the fault to propagate
> and perhaps go unnoticed.  We have established with the current
> behavior that QEMU will prevent further propagation of a fault by
> halting the VM.  To change QEMU's behavior here risks that a VM relying
> on that behavior no longer has that protection.  So it seems we either
> need to detect whether the VM is handling AER or we need to require the
> VM administrator to opt-in to this new feature.

An opt-in flag sounds very reasonable. It can also specify whether
to log the errors. We have a similar flag for disk errors.

>  Real hardware has
> these same issues and I believe there are handshakes that can be done
> through ACPI to allow the guest to take over error handling from the
> system.

No, that's only for error reporting IIUC. Driver needs to be
aware of a chance for errors to trigger and be able to
handle them.

So yes, some guests might have benefitted from VM stop
on AER but
1. the stop happens asynchronously so if guest can't handle
   errors there's a chance it is already crashed by the time we
   try to do vm stop
2. it's more of a chance by-product - we never promised
   guests that VMs would be more robust than bare metal
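
To make the opt-in idea concrete, here is a rough sketch of the decision
logic. It is not QEMU code: AerPolicy, AerAction and aer_decide() are
hypothetical names used only for illustration. The only thing taken from
the patch under discussion is the fatal test, i.e. an uncorrectable error is
fatal when its status bit is also set in the severity register.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical outcome of handling one AER notification. */
typedef enum {
    AER_ACTION_IGNORE,            /* nothing pending */
    AER_ACTION_FORWARD_TO_GUEST,  /* inject the error, guest driver recovers */
    AER_ACTION_STOP_VM            /* legacy behaviour */
} AerAction;

/* Hypothetical per-device policy chosen by the VM administrator. */
typedef struct AerPolicy {
    bool forward_non_fatal;       /* opt-in flag, off by default */
} AerPolicy;

static AerAction aer_decide(const AerPolicy *policy,
                            uint32_t uncor_status,
                            uint32_t uncor_severity)
{
    assert(policy != NULL);

    if (uncor_status == 0) {
        return AER_ACTION_IGNORE;
    }
    /* Any pending error whose severity bit is set is fatal. */
    if (uncor_status & uncor_severity) {
        return AER_ACTION_STOP_VM;
    }
    /* Non-fatal: only forward when the administrator opted in. */
    return policy->forward_non_fatal ? AER_ACTION_FORWARD_TO_GUEST
                                     : AER_ACTION_STOP_VM;
}
```

With the flag left off, every pending error still ends in a VM stop, so
existing setups keep the current behaviour.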



> > >> Signed-off-by: Dou Liyang 
> > >> Signed-off-by: Cao jin 
> > >> ---
> > > >>  hw/vfio/pci.c  | 202 +
> > >>  hw/vfio/pci.h  |   2 +
> > >>  linux-headers/linux/vfio.h |   2 +
> > >>  3 files changed, 206 insertions(+)
> > >>
> > >> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> > >> index 3d0d005..c6786d5 100644
> > >> --- a/hw/vfio/pci.c
> > >> +++ b/hw/vfio/pci.c
> > >> @@ -2432,6 +2432,200 @@ static void vfio_put_device(VFIOPCIDevice *vdev)
> > > >>  vfio_put_base_device(&vdev->vbasedev);
> > >>  }
> > >>  
> > >> +static void vfio_non_fatal_err_notifier_handler(void *opaque)
> > >> +{
> > >> +VFIOPCIDevice *vdev = opaque;
> > > >> +PCIDevice *dev = &vdev->pdev;
> > >> +PCIEAERMsg msg = {
> > >> +.severity = PCI_ERR_ROOT_CMD_NONFATAL_EN,
> > >> +.source_id = pci_requester_id(dev),
> > >> +};
> > >> +
> > > >> +if (!event_notifier_test_and_clear(&vdev->non_fatal_err_notifier)) {
> > >> +return;
> > >> +}
> > >> +
> > >> +/* Populate the aer msg and send it to root port */
> > >> +if (dev->exp.aer_cap) {  
> > > 
> > > Why would we have registered this notifier otherwise?
> > >   
> > >> +uint8_t *aer_cap = dev->config + dev->exp.aer_cap;
> > >> +uint32_t uncor_status;
> > >> +bool isfatal;
> > >> +
> > >> +uncor_status = vfio_pci_read_config(dev,
> > >> +dev->exp.aer_cap + PCI_ERR_UNCOR_STATUS, 4);
> > >> +if (!uncor_status) {
> > >> +return;
> > >> +}
> > >> +
> > > >> +isfatal = uncor_status & pci_get_long(aer_cap + PCI_ERR_UNCOR_SEVER);
> > >> +if (isfatal) {
> > >> +goto stop;
> > >> +}  
> > > 
> > > Huh?  How can we get a non-fatal error notice for a fatal error?  (and
> > > why are we saving this to a variable rather than testing it within the
> > > > 'if' condition?)
> > >  
> > 
> > Both 

Re: [Qemu-devel] [PATCH v4 1/8] xen: import ring.h from xen

2017-03-28 Thread Stefano Stabellini
On Tue, 28 Mar 2017, Juergen Gross wrote:
> On 28/03/17 00:48, Stefano Stabellini wrote:
> > On Mon, 27 Mar 2017, Juergen Gross wrote:
> >> On 24/03/17 18:37, Stefano Stabellini wrote:
> >>> On Fri, 24 Mar 2017, Juergen Gross wrote:
>  On 23/03/17 19:22, Stefano Stabellini wrote:
> > On Thu, 23 Mar 2017, Paolo Bonzini wrote:
> >> On 23/03/2017 14:55, Juergen Gross wrote:
> >>> On 23/03/17 14:00, Greg Kurz wrote:
>  On Mon, 20 Mar 2017 11:19:05 -0700
>  Stefano Stabellini  wrote:
> 
> > Do not use the ring.h header installed on the system. Instead, 
> > import
> > the header into the QEMU codebase. This avoids problems when QEMU is
> > built against a Xen version too old to provide all the ring macros.
> >
> > Signed-off-by: Stefano Stabellini 
> > Reviewed-by: Greg Kurz 
> > CC: anthony.per...@citrix.com
> > CC: jgr...@suse.com
> > ---
> > NB: The new macros have not been committed to Xen yet. Do not apply 
> > this
> > patch until they do.
> > ---
> 
>  Looking at your other series for the kernel part of this feature:
> 
>  https://lkml.org/lkml/2017/3/22/761
> 
>  I realize that the ring.h header from Xen also exists in the kernel 
>  tree... 
> 
>  Shouldn't all the code that can be used in both kernel and userspace 
>  go to a
>  header file under include/uapi in the kernel tree ? And then we 
>  would import
>  it under include/standard-headers/linux in the QEMU tree and we 
>  could keep it
>  in sync using scripts/update-linux-headers.sh.
> 
>  Cc'ing Paolo for insights.
> >>>
> >>> As Xen isn't part of the kernel we don't want that. You can use and/or
> >>> build qemu with xen-9pfs backend support on an old Linux kernel 
> >>> without
> >>> the related frontend.
> >>
> >> As long as the header changes rarely, I guess it's fine not to go
> >> through update-linux-headers.sh.
> >
> > Very rarely, last time ring.h was changed was 2015, and to introduce a
> > new macro (which we don't necessarily need in QEMU).
> >
> >
> >>> OTOH I don't see the advantage of not using the headers from Xen. This
> >>> is working for qdisk and pvusb backends and for all the Xen libraries.
> >>> Do you expect the 9pfs backend to be used for a qemu version built
> >>> against a Xen version not supporting that backend?
> >
> > Yes, I think that is entirely possible: Xen and QEMU versions can mix
> > and match.
> >
> > Keeping in mind that the 9pfs backend has actually no build dependencies
> > on Xen, except for these new ring.h macros, we have the following
> > options:
> >
> > 1) we build the 9pfs backend only for Xen >= 4.9, because of the new
> >macros in ring.h that we need
> 
>  Right. You have sent 9pfs support patches for Xen tools. So obviously
>  you need a proper Xen version to use 9pfs. Why not build qemu against
>  it? Do you really expect a new Xen being used with an old qemu while
>  wanting to use new features? That makes no sense to me.
> >>>  
> >>> Tools support is needed to setup the frontend/backend connection as
> >>> usual, but that's not a requirement for building the 9pfs backend. In
> >>> fact, the backend doesn't need any tools support for it to work. The
> >>> macros themselves are just a convenience - the backend would work just
> >>> fine without them. Why restrict the QEMU build gratuitously?
> >>
> >> You are duplicating a header without any real benefit I can see. This
> >> is adding future work for keeping both versions of the header in sync.
> >>
> >> In which scenario would you want qemu to support xen-9pfs without being
> >> built against a Xen version supporting xen-9pfs?
> >>
> >> I am not completely against copying the header, I just don't see an
> >> advantage for any distro or user in doing it.
> > 
> > I understand your point of view, and honestly it wouldn't be a problem
> > doing it the way you suggested either. However, I think that going
> > forward it will be less of a maintenance pain to keep ring.h in sync,
> > compared to maintaining a versioned build dependency between Xen and
> > QEMU for the compilation of one PV backend. We do have version checks
> > in QEMU for Xen compatibility, but not for PV backends or the xenpv
> > machine yet.
> 
> For the pvUSB backend I just used a mandatory macro from the header for
> the #ifdef. The backend will signal support when it was defined during
> build and will refuse initialization otherwise. Xen tools are able to
> recognize qemu support of the backend by looking into Xenstore.


What do you think of the following:

diff --git a/hw/9pfs/Makefile.objs 

Re: [Qemu-devel] [PATCH v2] xen: additionally restrict xenforeignmemory operations

2017-03-28 Thread Stefano Stabellini
On Tue, 28 Mar 2017, Paul Durrant wrote:
> Commit f0f272baf3a7 "xen: use libxendevicemodel to restrict operations"
> added a command-line option (-xen-domid-restrict) to limit operations
> using the libxendevicemodel API to a specified domid. The commit also
> noted that the restriction would be extended to cover operations issued
> via other xen libraries by subsequent patches.
> 
> My recent Xen patch [1] added a call to the xenforeignmemory API to allow
> it to be restricted. This patch now makes use of that new call when the
> -xen-domid-restrict option is passed.
> 
> [1] http://xenbits.xen.org/gitweb/?p=xen.git;a=commit;h=5823d6eb
> 
> Signed-off-by: Paul Durrant 

Reviewed-by: Stefano Stabellini 
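
For readers following along, the ENOTTY handling described in the comment
inside xen_restrict() can be sketched in isolation. This is only an
illustration: restrict_fn and all toy_* names are hypothetical stubs, not
Xen library calls. It assumes each restriction step returns a negative value
and sets errno on failure, as the stubs in the patch do.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical signature of one "restrict this handle to domid" step. */
typedef int (*restrict_fn)(int domid);

/* Counts how many steps were actually attempted (for illustration). */
static int toy_calls;

static int toy_restrict_ok(int domid)
{
    (void)domid;
    toy_calls++;
    return 0;
}

/* Models a library that does not implement restriction at all. */
static int toy_restrict_unimplemented(int domid)
{
    (void)domid;
    toy_calls++;
    errno = ENOTTY;
    return -1;
}

/* Models a genuine failure. */
static int toy_restrict_denied(int domid)
{
    (void)domid;
    toy_calls++;
    errno = EPERM;
    return -1;
}

/* Run the steps in order. ENOTTY means "not implemented": stop trying
 * further steps but do not report a failure. Any other error is fatal. */
static int toy_restrict_all(const restrict_fn *steps, size_t n, int domid)
{
    size_t i;

    assert(steps != NULL || n == 0);

    for (i = 0; i < n; i++) {
        if (steps[i](domid) < 0) {
            return errno == ENOTTY ? 0 : -1;
        }
    }
    return 0;
}
```

The design choice worth noting is that an unimplemented first step
short-circuits the remaining ones, since an old library is unlikely to
support the later restrictions either.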


> ---
> Cc: Stefano Stabellini 
> Cc: Anthony Perard 
> 
> v2:
>  - Use Stefano's re-arrangement
> ---
>  include/hw/xen/xen_common.h | 134 ++--
>  1 file changed, 78 insertions(+), 56 deletions(-)
> 
> diff --git a/include/hw/xen/xen_common.h b/include/hw/xen/xen_common.h
> index 4f3bd35..624fb86 100644
> --- a/include/hw/xen/xen_common.h
> +++ b/include/hw/xen/xen_common.h
> @@ -27,6 +27,58 @@ extern xc_interface *xen_xc;
>   * We don't support Xen prior to 4.2.0.
>   */
>  
> +/* Xen 4.2 through 4.6 */
> +#if CONFIG_XEN_CTRL_INTERFACE_VERSION < 40701
> +
> +typedef xc_interface xenforeignmemory_handle;
> +typedef xc_evtchn xenevtchn_handle;
> +typedef xc_gnttab xengnttab_handle;
> +
> +#define xenevtchn_open(l, f) xc_evtchn_open(l, f);
> +#define xenevtchn_close(h) xc_evtchn_close(h)
> +#define xenevtchn_fd(h) xc_evtchn_fd(h)
> +#define xenevtchn_pending(h) xc_evtchn_pending(h)
> +#define xenevtchn_notify(h, p) xc_evtchn_notify(h, p)
> +#define xenevtchn_bind_interdomain(h, d, p) xc_evtchn_bind_interdomain(h, d, p)
> +#define xenevtchn_unmask(h, p) xc_evtchn_unmask(h, p)
> +#define xenevtchn_unbind(h, p) xc_evtchn_unbind(h, p)
> +
> +#define xengnttab_open(l, f) xc_gnttab_open(l, f)
> +#define xengnttab_close(h) xc_gnttab_close(h)
> +#define xengnttab_set_max_grants(h, n) xc_gnttab_set_max_grants(h, n)
> +#define xengnttab_map_grant_ref(h, d, r, p) xc_gnttab_map_grant_ref(h, d, r, p)
> +#define xengnttab_unmap(h, a, n) xc_gnttab_munmap(h, a, n)
> +#define xengnttab_map_grant_refs(h, c, d, r, p) \
> +xc_gnttab_map_grant_refs(h, c, d, r, p)
> +#define xengnttab_map_domain_grant_refs(h, c, d, r, p) \
> +xc_gnttab_map_domain_grant_refs(h, c, d, r, p)
> +
> +#define xenforeignmemory_open(l, f) xen_xc
> +#define xenforeignmemory_close(h)
> +
> +static inline void *xenforeignmemory_map(xc_interface *h, uint32_t dom,
> + int prot, size_t pages,
> + const xen_pfn_t arr[/*pages*/],
> + int err[/*pages*/])
> +{
> +if (err)
> +return xc_map_foreign_bulk(h, dom, prot, arr, err, pages);
> +else
> +return xc_map_foreign_pages(h, dom, prot, arr, pages);
> +}
> +
> +#define xenforeignmemory_unmap(h, p, s) munmap(p, s * XC_PAGE_SIZE)
> +
> +#else /* CONFIG_XEN_CTRL_INTERFACE_VERSION >= 40701 */
> +
> +#include 
> +#include 
> +#include 
> +
> +#endif
> +
> +extern xenforeignmemory_handle *xen_fmem;
> +
>  #if CONFIG_XEN_CTRL_INTERFACE_VERSION < 40900
>  
>  typedef xc_interface xendevicemodel_handle;
> @@ -159,6 +211,13 @@ static inline int xendevicemodel_restrict(
>  return -1;
>  }
>  
> +static inline int xenforeignmemory_restrict(
> +xenforeignmemory_handle *fmem, domid_t domid)
> +{
> +errno = ENOTTY;
> +return -1;
> +}
> +
>  #else /* CONFIG_XEN_CTRL_INTERFACE_VERSION >= 40900 */
>  
>  #include 
> @@ -215,69 +274,32 @@ static inline int xen_modified_memory(domid_t domid, uint64_t first_pfn,
>  
>  static inline int xen_restrict(domid_t domid)
>  {
> -int rc = xendevicemodel_restrict(xen_dmod, domid);
> +int rc;
>  
> -trace_xen_domid_restrict(errno);
> +/* Attempt to restrict devicemodel operations */
> +rc = xendevicemodel_restrict(xen_dmod, domid);
> +trace_xen_domid_restrict(rc ? errno : 0);
>  
> -if (errno == ENOTTY) {
> -return 0;
> +if (rc < 0) {
> +/*
> + * If errno is ENOTTY then restriction is not implemented so
> + * there's no point in trying to restrict other types of
> + * operation, but it should not be treated as a failure.
> + */
> +if (errno == ENOTTY) {
> +return 0;
> +}
> +
> +return rc;
>  }
>  
> -return rc;
> -}
> -
> -/* Xen 4.2 through 4.6 */
> -#if CONFIG_XEN_CTRL_INTERFACE_VERSION < 40701
> -
> -typedef xc_interface xenforeignmemory_handle;
> -typedef xc_evtchn xenevtchn_handle;
> -typedef xc_gnttab xengnttab_handle;
> -
> -#define xenevtchn_open(l, f) xc_evtchn_open(l, f);
> -#define xenevtchn_close(h) xc_evtchn_close(h)
> -#define xenevtchn_fd(h) 

[Qemu-devel] [PULL 2/3] slirp: Make RA build more flexible

2017-03-28 Thread Samuel Thibault
Do not hardcode the RA size at all; use a pl_size variable which
accounts for the accumulated size, and fill in rip->ip_pl at the end.

This will allow making some blocks optional.

Signed-off-by: Samuel Thibault 
Reviewed-by: Philippe Mathieu-Daudé 
---
 slirp/ip6_icmp.c | 24 +---
 1 file changed, 9 insertions(+), 15 deletions(-)

diff --git a/slirp/ip6_icmp.c b/slirp/ip6_icmp.c
index 298a48dd25..d0f5cc1456 100644
--- a/slirp/ip6_icmp.c
+++ b/slirp/ip6_icmp.c
@@ -143,17 +143,10 @@ void ndp_send_ra(Slirp *slirp)
 /* Build IPv6 packet */
 struct mbuf *t = m_get(slirp);
 struct ip6 *rip = mtod(t, struct ip6 *);
+size_t pl_size = 0;
 rip->ip_src = (struct in6_addr)LINKLOCAL_ADDR;
 rip->ip_dst = (struct in6_addr)ALLNODES_MULTICAST;
 rip->ip_nh = IPPROTO_ICMPV6;
-rip->ip_pl = htons(ICMP6_NDP_RA_MINLEN
-+ NDPOPT_LINKLAYER_LEN
-+ NDPOPT_PREFIXINFO_LEN
-#ifndef _WIN32
-+ NDPOPT_RDNSS_LEN
-#endif
-);
-t->m_len = sizeof(struct ip6) + ntohs(rip->ip_pl);
 
 /* Build ICMPv6 packet */
 t->m_data += sizeof(struct ip6);
@@ -171,6 +164,7 @@ void ndp_send_ra(Slirp *slirp)
 ricmp->icmp6_nra.reach_time = htonl(NDP_AdvReachableTime);
 ricmp->icmp6_nra.retrans_time = htonl(NDP_AdvRetransTime);
 t->m_data += ICMP6_NDP_RA_MINLEN;
+pl_size += ICMP6_NDP_RA_MINLEN;
 
 /* Source link-layer address (NDP option) */
 struct ndpopt *opt = mtod(t, struct ndpopt *);
@@ -178,6 +172,7 @@ void ndp_send_ra(Slirp *slirp)
 opt->ndpopt_len = NDPOPT_LINKLAYER_LEN / 8;
 in6_compute_ethaddr(rip->ip_src, opt->ndpopt_linklayer);
 t->m_data += NDPOPT_LINKLAYER_LEN;
+pl_size += NDPOPT_LINKLAYER_LEN;
 
 /* Prefix information (NDP option) */
 struct ndpopt *opt2 = mtod(t, struct ndpopt *);
@@ -192,6 +187,7 @@ void ndp_send_ra(Slirp *slirp)
 opt2->ndpopt_prefixinfo.reserved2 = 0;
 opt2->ndpopt_prefixinfo.prefix = slirp->vprefix_addr6;
 t->m_data += NDPOPT_PREFIXINFO_LEN;
+pl_size += NDPOPT_PREFIXINFO_LEN;
 
 #ifndef _WIN32
 /* Prefix information (NDP option) */
@@ -203,16 +199,14 @@ void ndp_send_ra(Slirp *slirp)
 opt3->ndpopt_rdnss.lifetime = htonl(2 * NDP_MaxRtrAdvInterval);
 opt3->ndpopt_rdnss.addr = slirp->vnameserver_addr6;
 t->m_data += NDPOPT_RDNSS_LEN;
+pl_size += NDPOPT_RDNSS_LEN;
 #endif
 
+rip->ip_pl = htons(pl_size);
+t->m_data -= sizeof(struct ip6) + pl_size;
+t->m_len = sizeof(struct ip6) + pl_size;
+
 /* ICMPv6 Checksum */
-#ifndef _WIN32
-t->m_data -= NDPOPT_RDNSS_LEN;
-#endif
-t->m_data -= NDPOPT_PREFIXINFO_LEN;
-t->m_data -= NDPOPT_LINKLAYER_LEN;
-t->m_data -= ICMP6_NDP_RA_MINLEN;
-t->m_data -= sizeof(struct ip6);
 ricmp->icmp6_cksum = ip6_cksum(t);
 
 ip6_output(NULL, t, 0);
-- 
2.11.0
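
The accumulation pattern used by this patch can be sketched outside of slirp
as follows. This is an illustration only: the toy_* names, the 4-byte header
and the block sizes are made up and do not correspond to the real IPv6 or NDP
layouts. Each block bumps both the write cursor and pl_size, and the length
field is filled in once at the end, so any block can be made conditional
without touching the others.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Made-up sizes, chosen only for the example. */
enum {
    TOY_HDR_LEN   = 4,   /* header; first two bytes hold the payload length */
    TOY_BASE_LEN  = 16,  /* mandatory part of the payload */
    TOY_OPT_A_LEN = 8,   /* mandatory option */
    TOY_OPT_B_LEN = 24   /* optional option */
};

/* Build a packet into buf and return its total length. */
static size_t toy_build_packet(uint8_t *buf, size_t cap, int with_opt_b)
{
    uint8_t *p;
    size_t pl_size = 0;

    assert(cap >= (size_t)(TOY_HDR_LEN + TOY_BASE_LEN +
                           TOY_OPT_A_LEN + TOY_OPT_B_LEN));

    memset(buf, 0, TOY_HDR_LEN);
    p = buf + TOY_HDR_LEN;

    /* Mandatory base part */
    memset(p, 0x11, TOY_BASE_LEN);
    p += TOY_BASE_LEN;
    pl_size += TOY_BASE_LEN;

    /* Mandatory option */
    memset(p, 0x22, TOY_OPT_A_LEN);
    p += TOY_OPT_A_LEN;
    pl_size += TOY_OPT_A_LEN;

    /* Optional option: only the accumulated size changes */
    if (with_opt_b) {
        memset(p, 0x33, TOY_OPT_B_LEN);
        p += TOY_OPT_B_LEN;
        pl_size += TOY_OPT_B_LEN;
    }

    /* Fill in the payload length last, in network byte order */
    buf[0] = (uint8_t)(pl_size >> 8);
    buf[1] = (uint8_t)(pl_size & 0xff);

    return TOY_HDR_LEN + pl_size;
}
```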




[Qemu-devel] [PULL 1/3] slirp: fix compilation errors with DEBUG set

2017-03-28 Thread Samuel Thibault
From: Laurent Vivier 

slirp/slirp.c: In function 'get_dns_addr_resolv_conf':
slirp/slirp.c:202:29: error: initialization discards 'const' qualifier from pointer target type [-Werror=discarded-qualifiers]
 char *res = inet_ntop(af, tmp_addr, s, sizeof(s));
 ^
slirp/slirp.c:204:25: error: assignment discards 'const' qualifier from pointer target type [-Werror=discarded-qualifiers]
 res = "(string conversion error)";

Signed-off-by: Laurent Vivier 
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Samuel Thibault 
---
 slirp/slirp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/slirp/slirp.c b/slirp/slirp.c
index 60539de7a3..5a94b06f5e 100644
--- a/slirp/slirp.c
+++ b/slirp/slirp.c
@@ -198,7 +198,7 @@ static int get_dns_addr_resolv_conf(int af, void *pdns_addr, void *cached_addr,
 #ifdef DEBUG
 else {
 char s[INET6_ADDRSTRLEN];
-char *res = inet_ntop(af, tmp_addr, s, sizeof(s));
+const char *res = inet_ntop(af, tmp_addr, s, sizeof(s));
 if (!res) {
 res = "(string conversion error)";
 }
-- 
2.11.0
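
For reference, the same fix can be shown as a small standalone program
fragment. The helper name toy_addr_to_string() is made up for this example;
inet_ntop() itself is the standard POSIX call, which returns a const char *
pointing into the caller's buffer, or NULL on failure. Holding the result in
a const char * is what lets it also point at a string literal without the
warnings quoted above.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

/* Convert a binary address to text. On failure, return a fixed message
 * instead of NULL so the result can always be printed. */
static const char *toy_addr_to_string(int af, const void *addr,
                                      char *buf, socklen_t len)
{
    const char *res;

    assert(addr != NULL && buf != NULL);

    res = inet_ntop(af, addr, buf, len);
    if (!res) {
        res = "(string conversion error)";
    }
    return res;
}
```

A buffer that is too small is one way to hit the fallback branch, since
inet_ntop() then fails instead of truncating.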




[Qemu-devel] [PULL 3/3] slirp: Send RDNSS in RA only if host has an IPv6 DNS server

2017-03-28 Thread Samuel Thibault
Previously we would always send an RDNSS option in the RA, making the guest
try to resolve DNS through IPv6, even if the host does not actually have
an IPv6 DNS server available.

This enables the RDNSS option only when an IPv6 DNS server is
available.

Signed-off-by: Samuel Thibault 
Reviewed-by: Philippe Mathieu-Daudé 
---
 slirp/ip6_icmp.c | 25 ++---
 1 file changed, 14 insertions(+), 11 deletions(-)

diff --git a/slirp/ip6_icmp.c b/slirp/ip6_icmp.c
index d0f5cc1456..777eb574be 100644
--- a/slirp/ip6_icmp.c
+++ b/slirp/ip6_icmp.c
@@ -144,6 +144,9 @@ void ndp_send_ra(Slirp *slirp)
 struct mbuf *t = m_get(slirp);
 struct ip6 *rip = mtod(t, struct ip6 *);
 size_t pl_size = 0;
+struct in6_addr addr;
+uint32_t scope_id;
+
 rip->ip_src = (struct in6_addr)LINKLOCAL_ADDR;
 rip->ip_dst = (struct in6_addr)ALLNODES_MULTICAST;
 rip->ip_nh = IPPROTO_ICMPV6;
@@ -189,18 +192,18 @@ void ndp_send_ra(Slirp *slirp)
 t->m_data += NDPOPT_PREFIXINFO_LEN;
 pl_size += NDPOPT_PREFIXINFO_LEN;
 
-#ifndef _WIN32
 /* Prefix information (NDP option) */
-/* disabled for windows for now, until get_dns6_addr is implemented */
-struct ndpopt *opt3 = mtod(t, struct ndpopt *);
-opt3->ndpopt_type = NDPOPT_RDNSS;
-opt3->ndpopt_len = NDPOPT_RDNSS_LEN / 8;
-opt3->ndpopt_rdnss.reserved = 0;
-opt3->ndpopt_rdnss.lifetime = htonl(2 * NDP_MaxRtrAdvInterval);
-opt3->ndpopt_rdnss.addr = slirp->vnameserver_addr6;
-t->m_data += NDPOPT_RDNSS_LEN;
-pl_size += NDPOPT_RDNSS_LEN;
-#endif
+if (get_dns6_addr(&addr, &scope_id) >= 0) {
+/* Host system does have an IPv6 DNS server, announce our proxy.  */
+struct ndpopt *opt3 = mtod(t, struct ndpopt *);
+opt3->ndpopt_type = NDPOPT_RDNSS;
+opt3->ndpopt_len = NDPOPT_RDNSS_LEN / 8;
+opt3->ndpopt_rdnss.reserved = 0;
+opt3->ndpopt_rdnss.lifetime = htonl(2 * NDP_MaxRtrAdvInterval);
+opt3->ndpopt_rdnss.addr = slirp->vnameserver_addr6;
+t->m_data += NDPOPT_RDNSS_LEN;
+pl_size += NDPOPT_RDNSS_LEN;
+}
 
 rip->ip_pl = htons(pl_size);
 t->m_data -= sizeof(struct ip6) + pl_size;
-- 
2.11.0




[Qemu-devel] [PULL 0/3] slirp updates

2017-03-28 Thread Samuel Thibault
The following changes since commit df9046363220e57d45818312759b954c033c58ab:

  Update version for v2.9.0-rc2 release (2017-03-28 19:11:16 +0100)

are available in the git repository at:

  http://people.debian.org/~sthibault/qemu.git tags/samuel-thibault

for you to fetch changes up to a2f80fdfc683019901cdf4c0863a5920c0ca7245:

  slirp: Send RDNSS in RA only if host has an IPv6 DNS server (2017-03-29 00:51:25 +0200)


slirp updates


Laurent Vivier (1):
  slirp: fix compilation errors with DEBUG set

Samuel Thibault (2):
  slirp: Make RA build more flexible
  slirp: Send RDNSS in RA only if host has an IPv6 DNS server

 slirp/ip6_icmp.c | 47 ++-
 slirp/slirp.c|  2 +-
 2 files changed, 23 insertions(+), 26 deletions(-)



[Qemu-devel] [ANNOUNCE] QEMU 2.9.0-rc2 is now available

2017-03-28 Thread Michael Roth
Hello,

On behalf of the QEMU Team, I'd like to announce the availability of the
third release candidate for the QEMU 2.9 release.  This release is meant
for testing purposes and should not be used in a production environment.

  http://download.qemu-project.org/qemu-2.9.0-rc2.tar.xz
  http://download.qemu-project.org/qemu-2.9.0-rc2.tar.xz.sig

You can help improve the quality of the QEMU 2.9 release by testing this
release and reporting bugs on Launchpad:

  https://bugs.launchpad.net/qemu/

The release plan, as well as documented known issues for release
candidates, are available at:

  http://wiki.qemu.org/Planning/2.9

The dates have all been pushed back a week due to delays with the
initial RC.

Please add entries to the ChangeLog for the 2.9 release below:

  http://wiki.qemu.org/ChangeLog/2.9

Changes since rc1:

df90463: Update version for v2.9.0-rc2 release (Peter Maydell)
44fdc76: sockets: Fix socket_address_to_string() hostname truncation (Markus 
Armbruster)
2836284: rbd: Fix bugs around -drive parameter "server" (Markus Armbruster)
577d8c9: rbd: Revert -blockdev parameter password-secret (Markus Armbruster)
46f: rbd: Revert -blockdev and -drive parameter auth-supported (Markus 
Armbruster)
0784639: rbd: Clean up qemu_rbd_create()'s detour through QemuOpts (Markus 
Armbruster)
cbf036b: rbd: Clean up runtime_opts, fix -drive to reject filename (Markus 
Armbruster)
82f20e8: rbd: Don't accept -drive driver=rbd, keyvalue-pairs=... (Markus 
Armbruster)
8efb339: rbd: Clean up after the previous commit (Markus Armbruster)
730b00b: rbd: Don't limit length of parameter values (Markus Armbruster)
f51c363: rbd: Fix to cleanly reject -drive without pool or image (Markus 
Armbruster)
eb87203: rbd: Reject -blockdev server.*.{numeric, to, ipv4, ipv6} (Markus 
Armbruster)
79b7a77: block: Declare blockdev-add and blockdev-del supported (Markus 
Armbruster)
7609ffb: trace: fix tcg tracing build breakage (Stefan Hajnoczi)
dc62da8: parallels: wrong call to bdrv_truncate (Denis V. Lunev)
5b12c16: replay/replay.c: bump REPLAY_VERSION (Alex Bennée)
8cfef89: tcg: Add a new line after incompatibility warning (Pranith Kumar)
0096109: ui/console: use exclusive mechanism directly (Alex Bennée)
8539093: ui/console: ensure do_safe_dpy_refresh holds BQL (Alex Bennée)
95992b6: bsd-user: align use of mmap_lock to that of linux-user (Alex Bennée)
02bed6b: user-exec: handle synchronous signals from QEMU gracefully (Alex 
Bennée)
34ef723: tests/virtio-9p-test: Don't call le*_to_cpus on fields of packed 
struct (Peter Maydell)
d63fb19: 9pfs: fix file descriptor leak (Li Qiang)
700f9ce: block/file-posix.c: Fix unused variable warning on OpenBSD (Peter 
Maydell)
bed58b4: scsi-generic: Fill in opt_xfer_len in INQUIRY reply if it is zero (Fam 
Zheng)
e5bcf96: file-posix: Make bdrv_flush() failure permanent without O_DIRECT 
(Kevin Wolf)
a12a712: nbd-client: fix handling of hungup connections (Paolo Bonzini)
c919297: qemu-img: print short help on getopt failure (Stefan Hajnoczi)
f707762: qemu-img: fix switch indentation in img_amend() (Stefan Hajnoczi)
4581c16: qemu-img: show help for invalid global options (Stefan Hajnoczi)
5354edd: Revert "apic: save apic_delivered flag" (Paolo Bonzini)
e4548bb: nbd: drop unused NBDClientSession.is_unix field (Stefan Hajnoczi)
12f8def: win32: replace custom mutex and condition variable with native 
primitives (Andrey Shedel)
e5766eb: vnc: fix reverse mode (Gerd Hoffmann)
8bce03e: ui/egl-helpers: fix egl 1.5 display init (Gerd Hoffmann)
db6cd4c: cirrus: fix PUTPIXEL macro (Gerd Hoffmann)
5709454: virtio-input: fix eventq batching (Ladi Prosek)
0f5a15e: virtio-input: free event queue when finalizing (Ladi Prosek)
7150d34: boot-serial-test: use -no-shutdown (Christian Borntraeger)
dfd0dcc: mem-prealloc: fix sysconf(_SC_NPROCESSORS_ONLN) failure case. 
(Jitendra Kolhe)
30663fd: tcg/i386: Check the size of instruction being translated (Pranith 
Kumar)
7140778: virtio-scsi: Fix acquire/release in dataplane handlers (Fam Zheng)
3d69f82: virtio-scsi: Make virtio_scsi_acquire/release public (Fam Zheng)
ade9c1a: clear pending status before calling memory commit (Xu, Anthony)
bd517b4: disas/microblaze: Remove unused REG_PC define (Peter Maydell)
0d3ef78: trace: Avoid abuse of amdvi_mmio_read (Eric Blake)
d17e744: trace: Fix incorrect megasas trace parameters (Eric Blake)
67adf4b: trace: Fix backwards mirror_yield parameters (Eric Blake)
0832970: qom: Fix regression with 'qom-type' (Eric Blake)
c50126a: configure: Fix cut-n-paste errors in OS deprecation warning (Peter Maydell)
a352aa6: target/s390x: Fix broken user mode (Stefan Weil)
b7bad50: cryptodev: fix asserting single queue (Halil Pasic)
50d19cf: cryptodev: setiv only when really need (Longpeng(Mike))
21f88d0: qapi: Fix QemuOpts visitor regression on unvisited input (Eric Blake)
9a6d1ac: qom: Avoid unvisited 'id'/'qom-type' in user_creatable_add_opts (Eric Blake)
600ac6a: blockjob: add devops to blockjob backends (John Snow)
f4d9cc8: block-backend: 

Re: [Qemu-devel] [PATCH 1/1] parallels: wrong call to bdrv_truncate

2017-03-28 Thread Denis V. Lunev
On 03/28/2017 07:26 PM, Kevin Wolf wrote:
> [ Cc: qemu-block ]
>
> On 27.03.2017 at 16:38, Denis V. Lunev wrote:
>> Parallels driver should not call bdrv_truncate if the image was opened
>> in the read-only mode. Without the patch
>> qemu-img check harddisk.hds
>> asserts with
>> bdrv_truncate: Assertion `child->perm & BLK_PERM_RESIZE' failed.
>>
>> Parameters used on the write path are not needed if the image is opened
>> in the read-only mode.
>>
>> Signed-off-by: Denis V. Lunev 
>> Reported-by: Edgar Kaziahmedov 
>> CC: Stefan Hajnoczi 
>> ---
>>  block/parallels.c | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/block/parallels.c b/block/parallels.c
>> index 6bf9375..4173b3f 100644
>> --- a/block/parallels.c
>> +++ b/block/parallels.c
>> @@ -687,7 +687,8 @@ static int parallels_open(BlockDriverState *bs, QDict 
>> *options, int flags,
>>  if (local_err != NULL) {
>>  goto fail_options;
>>  }
>> -if (!bdrv_has_zero_init(bs->file->bs) ||
>> +
>> +if (!(flags & BDRV_O_RESIZE) || !bdrv_has_zero_init(bs->file->bs) ||
>>  bdrv_truncate(bs->file, bdrv_getlength(bs->file->bs)) != 0) {
>>  s->prealloc_mode = PRL_PREALLOC_MODE_FALLOCATE;
>>  }
> Relying on BDRV_O_RESIZE in block drivers is wrong. It is set in some
> paths (specifically the users of blk_new_open), but not in others. We
> should probably have filtered out the flag before passing it to the
> drivers.
>
> As a concrete example, if you're using -blockdev, the bdrv_truncate()
> call won't be executed after applying this patch.
>
> I think the correct way would be to check bdrv_is_read_only() instead.
>
> Kevin
hmmm. But why do we have

int bdrv_truncate(BdrvChild *child, int64_t offset)
{
    BlockDriverState *bs = child->bs;
    BlockDriver *drv = bs->drv;
    int ret;

    assert(child->perm & BLK_PERM_RESIZE);

    if (!drv)
        return -ENOMEDIUM;
    if (!drv->bdrv_truncate)
        return -ENOTSUP;
    if (bs->read_only)
        return -EACCES;

    ret = drv->bdrv_truncate(bs, offset);

instead of

int bdrv_truncate(BdrvChild *child, int64_t offset)
{
    BlockDriverState *bs = child->bs;
    BlockDriver *drv = bs->drv;
    int ret;

    if (!drv)
        return -ENOMEDIUM;
    if (!drv->bdrv_truncate)
        return -ENOTSUP;
    if (bs->read_only)
        return -EACCES;

    assert(child->perm & BLK_PERM_RESIZE);
    ret = drv->bdrv_truncate(bs, offset);

Technically, this would work properly for my case, and the call to
bdrv_truncate() would then be valid.

Another thing: should we add an assert like the one added to
bdrv_co_pwritev, namely
assert(!(bs->open_flags & BDRV_O_INACTIVE));
in the same place, below the access check?

Technically, the requested change is not a problem, but it looks a bit
strange and inconsistent to me.

Den
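
The ordering question above can be tried out in isolation. What follows is only a minimal sketch of the second listing (soft checks first, permission assertion last); ToyNode, ToyChild, TOY_PERM_RESIZE and toy_truncate_checks_first() are hypothetical stand-ins, not QEMU types or functions, and the sketch does not say which ordering block.c should use.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for BLK_PERM_RESIZE. */
#define TOY_PERM_RESIZE 0x1u

/* Hypothetical stand-in for BlockDriverState; only the fields used here. */
typedef struct ToyNode {
    bool read_only;
    bool can_truncate;   /* models drv->bdrv_truncate != NULL */
    int64_t length;
} ToyNode;

/* Hypothetical stand-in for BdrvChild. */
typedef struct ToyChild {
    ToyNode *bs;
    unsigned perm;
} ToyChild;

/*
 * Soft checks come first, so a read-only node returns -EACCES
 * before the permission assertion is ever evaluated.
 */
static int toy_truncate_checks_first(ToyChild *child, int64_t offset)
{
    ToyNode *bs = child->bs;

    if (!bs->can_truncate) {
        return -ENOTSUP;
    }
    if (bs->read_only) {
        return -EACCES;
    }

    /* Only a writable, resizable node reaches the assertion. */
    assert(child->perm & TOY_PERM_RESIZE);
    bs->length = offset;
    return 0;
}
```

With this ordering, a caller that opened the image read-only and never requested the resize permission gets an error code back instead of an abort; a writable node without the permission still trips the assertion.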




Re: [Qemu-devel] [RFC v3 2/3] hw/intc/arm_gicv3_its: Implement state save/restore

2017-03-28 Thread Peter Maydell
On 28 March 2017 at 20:45, Juan Quintela  wrote:
> Let me see if I understood this correctly.
>
> We have an ARM_GICV3_ITS_COMMON.  And that has some fields.
> In particular:
>
> struct GICv3ITSState {
> /* Registers */
> uint32_t ctlr;
> uint64_t cbaser;
> uint64_t cwriter;
> uint64_t creadr;
> uint64_t baser[8];
> /* lots of things removed */
> };
>
>
>
> We have this in arm_gicv3_its_common.c  (it is exactly the same for
> post_load, so we can forget about it for now).
>
>
> static void gicv3_its_pre_save(void *opaque)
> {
> GICv3ITSState *s = (GICv3ITSState *)opaque; (*)
>    /* nitpick: the cast
>    is useless */
> GICv3ITSCommonClass *c = ARM_GICV3_ITS_COMMON_GET_CLASS(s);
>
> if (c->pre_save) {
> c->pre_save(s);
> }
> }
>
> And then we have in the patch:
>
>
>> @@ -109,6 +203,8 @@ static void kvm_arm_its_class_init(ObjectClass *klass, 
>> void *data)
>>
>>  dc->realize = kvm_arm_its_realize;
>>  icc->send_msi = kvm_its_send_msi;
>> +icc->pre_save = kvm_arm_its_pre_save;
>> +icc->post_load = kvm_arm_its_post_load;
>>  }
>
>
> struct GICv3ITSCommonClass {
> 
> void (*pre_save)(GICv3ITSState *s);
> void (*post_load)(GICv3ITSState *s);
> };
>
>
> Notice that I have only found one user of this on the tree, so I don't
> know if there is a good reason for this.

This is just following the existing pattern we have for
the GICv3 itself (and the GICv2, for that matter).
At some point we'll implement the emulated ITS which
will share the base class (and the vmstate).

thanks
-- PMM
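
For readers who have not seen the pattern, here is a rough sketch of a common base with optional per-class pre_save/post_load hooks. It is plain C with hypothetical Toy* names and does not use QOM; the real code goes through GICv3ITSCommonClass and ARM_GICV3_ITS_COMMON_GET_CLASS() as quoted above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct ToyITSState ToyITSState;

/* Hypothetical class struct: hooks a subclass may or may not provide. */
typedef struct ToyITSClass {
    void (*pre_save)(ToyITSState *s);
    void (*post_load)(ToyITSState *s);
} ToyITSClass;

/* Hypothetical shared state; both variants migrate the same fields. */
struct ToyITSState {
    const ToyITSClass *klass;
    uint32_t ctlr;
    uint64_t cbaser;
};

/* Common code: dispatch to the subclass hook only if one is set. */
static void toy_its_pre_save(void *opaque)
{
    ToyITSState *s = opaque;   /* void * converts implicitly in C */

    assert(s->klass != NULL);
    if (s->klass->pre_save) {
        s->klass->pre_save(s);
    }
}

/* In-kernel style variant: pretend to fetch a register before saving. */
static void toy_kvm_pre_save(ToyITSState *s)
{
    s->ctlr = 0x1;
}

static const ToyITSClass toy_kvm_class = {
    .pre_save = toy_kvm_pre_save,
    .post_load = NULL,
};

/* Emulated style variant: state is already in the struct, so no hooks. */
static const ToyITSClass toy_emulated_class = {
    .pre_save = NULL,
    .post_load = NULL,
};
```

The point of the split is that a second implementation can reuse the same state layout without touching the common save path.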



Re: [Qemu-devel] [PATCH v3 for-2.10 0/4] block: Add errp to b{lk, drv}_truncate()

2017-03-28 Thread Max Reitz
On 28.03.2017 22:51, Max Reitz wrote:
> Having an Error parameter for these functions makes sense because we
> sometimes want a bit more information than just "Something failed". Some
> drivers already use error_report() and the like to emit this additional
> information, so it's rather obvious that we do want a real error object
> here.
> 
> 
> v3:
> - Patch 2: Keep "Could not resize image" message in qcow2_create2() by
>using error_prepend() [Kevin]
> - Patch 3: Dropped archipelago
> - Patch 4:
>   - Keep errno information where available [Kevin]
>   - Make all drivers generate error messages [Stefan/Eric]
>   - Drop generic error message from bdrv_truncate() [Stefan/Eric]

Oops, forgot the backport-diff against v2, here you go:

Key:
[----] : patches are identical
[####] : number of functional differences between upstream/downstream patch
[down] : patch is downstream-only
The flags [FC] indicate (F)unctional and (C)ontextual differences,
respectively

001/4:[----] [--] 'block/vhdx: Make vhdx_create() always set errp'
002/4:[0001] [FC] 'block: Add errp to b{lk,drv}_truncate()'
003/4:[0003] [FC] 'block: Add errp to BD.bdrv_truncate()'
004/4:[0025] [FC] 'block: Add .bdrv_truncate() error messages'


Max

> Max Reitz (4):
>   block/vhdx: Make vhdx_create() always set errp
>   block: Add errp to b{lk,drv}_truncate()
>   block: Add errp to BD.bdrv_truncate()
>   block: Add .bdrv_truncate() error messages
> 
>  include/block/block.h          |  2 +-
>  include/block/block_int.h      |  2 +-
>  include/sysemu/block-backend.h |  2 +-
>  block.c                        | 16 +++-
>  block/blkdebug.c               |  4 ++--
>  block/block-backend.c          |  5 +++--
>  block/commit.c                 |  5 +++--
>  block/crypto.c                 |  5 +++--
>  block/file-posix.c             | 19 +--
>  block/file-win32.c             |  6 +++---
>  block/gluster.c                |  7 +--
>  block/iscsi.c                  |  6 --
>  block/mirror.c                 |  2 +-
>  block/nfs.c                    | 12 ++--
>  block/parallels.c              | 13 -
>  block/qcow.c                   |  6 +++---
>  block/qcow2-refcount.c         |  5 -
>  block/qcow2.c                  | 24 +++-
>  block/qed.c                    |  8 +---
>  block/raw-format.c             |  6 --
>  block/rbd.c                    |  3 ++-
>  block/sheepdog.c               | 14 ++
>  block/vdi.c                    |  4 ++--
>  block/vhdx-log.c               |  2 +-
>  block/vhdx.c                   | 25 ++---
>  block/vmdk.c                   | 13 +++--
>  block/vpc.c                    | 13 +++--
>  blockdev.c                     | 21 +
>  qemu-img.c                     | 17 -
>  qemu-io-cmds.c                 |  5 +++--
>  30 files changed, 147 insertions(+), 125 deletions(-)
> 






[Qemu-devel] [PATCH v3 for-2.10 4/4] block: Add .bdrv_truncate() error messages

2017-03-28 Thread Max Reitz
Add missing error messages for the block driver implementations of
.bdrv_truncate(); drop the generic one from block.c's bdrv_truncate().

Since one of these changes touches a mis-indented block in
block/file-posix.c, this patch fixes that coding style issue along the
way.

Signed-off-by: Max Reitz 
---
 block.c            |  2 --
 block/file-posix.c | 17 -
 block/gluster.c    |  4 +++-
 block/iscsi.c      |  2 ++
 block/nfs.c        | 10 +-
 block/qcow2.c      |  2 ++
 block/qed.c        |  4 +++-
 block/raw-format.c |  2 ++
 block/rbd.c        |  1 +
 9 files changed, 34 insertions(+), 10 deletions(-)

diff --git a/block.c b/block.c
index 7b9841f99a..80e16e33d3 100644
--- a/block.c
+++ b/block.c
@@ -3249,8 +3249,6 @@ int bdrv_truncate(BdrvChild *child, int64_t offset, Error 
**errp)
 bdrv_dirty_bitmap_truncate(bs);
 bdrv_parent_cb_resize(bs);
 ++bs->write_gen;
-} else if (errp && !*errp) {
-error_setg_errno(errp, -ret, "Failed to resize image");
 }
 return ret;
 }
diff --git a/block/file-posix.c b/block/file-posix.c
index d23464013f..c08d031c78 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -1413,20 +1413,27 @@ static int raw_truncate(BlockDriverState *bs, int64_t 
offset, Error **errp)
 {
 BDRVRawState *s = bs->opaque;
 struct stat st;
+int ret;
 
 if (fstat(s->fd, &st)) {
-return -errno;
+ret = -errno;
+error_setg_errno(errp, -ret, "Failed to fstat() the file");
+return ret;
 }
 
 if (S_ISREG(st.st_mode)) {
 if (ftruncate(s->fd, offset) < 0) {
-return -errno;
+ret = -errno;
+error_setg_errno(errp, -ret, "Failed to resize the file");
+return ret;
 }
 } else if (S_ISCHR(st.st_mode) || S_ISBLK(st.st_mode)) {
-   if (offset > raw_getlength(bs)) {
-   return -EINVAL;
-   }
+if (offset > raw_getlength(bs)) {
+error_setg(errp, "Cannot grow device files");
+return -EINVAL;
+}
 } else {
+error_setg(errp, "Resizing this file is not supported");
 return -ENOTSUP;
 }
 
diff --git a/block/gluster.c b/block/gluster.c
index 00b8240562..65350b575b 100644
--- a/block/gluster.c
+++ b/block/gluster.c
@@ -1092,7 +1092,9 @@ static int qemu_gluster_truncate(BlockDriverState *bs, 
int64_t offset,
 
 ret = glfs_ftruncate(s->fd, offset);
 if (ret < 0) {
-return -errno;
+ret = -errno;
+error_setg_errno(errp, -ret, "Failed to truncate file");
+return ret;
 }
 
 return 0;
diff --git a/block/iscsi.c b/block/iscsi.c
index ab559a6f71..036f5b6930 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -2066,6 +2066,7 @@ static int iscsi_truncate(BlockDriverState *bs, int64_t 
offset, Error **errp)
 Error *local_err = NULL;
 
 if (iscsilun->type != TYPE_DISK) {
+error_setg(errp, "Cannot resize non-disk iSCSI devices");
 return -ENOTSUP;
 }
 
@@ -2076,6 +2077,7 @@ static int iscsi_truncate(BlockDriverState *bs, int64_t 
offset, Error **errp)
 }
 
 if (offset > iscsi_getlength(bs)) {
+error_setg(errp, "Cannot grow iSCSI devices");
 return -EINVAL;
 }
 
diff --git a/block/nfs.c b/block/nfs.c
index 57d12efc51..6eccf18d75 100644
--- a/block/nfs.c
+++ b/block/nfs.c
@@ -760,7 +760,15 @@ static int64_t 
nfs_get_allocated_file_size(BlockDriverState *bs)
 static int nfs_file_truncate(BlockDriverState *bs, int64_t offset, Error 
**errp)
 {
 NFSClient *client = bs->opaque;
-return nfs_ftruncate(client->context, client->fh, offset);
+int ret;
+
+ret = nfs_ftruncate(client->context, client->fh, offset);
+if (ret < 0) {
+error_setg_errno(errp, -ret, "Failed to truncate file");
+return ret;
+}
+
+return 0;
 }
 
 /* Note that this will not re-establish a connection with the NFS server
diff --git a/block/qcow2.c b/block/qcow2.c
index 6c347989e3..4ca4cf04b0 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2551,6 +2551,7 @@ static int qcow2_truncate(BlockDriverState *bs, int64_t 
offset, Error **errp)
 new_l1_size = size_to_l1(s, offset);
 ret = qcow2_grow_l1_table(bs, new_l1_size, true);
 if (ret < 0) {
+error_setg_errno(errp, -ret, "Failed to grow the L1 table");
 return ret;
 }
 
@@ -2559,6 +2560,7 @@ static int qcow2_truncate(BlockDriverState *bs, int64_t 
offset, Error **errp)
 ret = bdrv_pwrite_sync(bs->file, offsetof(QCowHeader, size),
 &offset, sizeof(uint64_t));
 if (ret < 0) {
+error_setg_errno(errp, -ret, "Failed to update the image size");
 return ret;
 }
 
diff --git a/block/qed.c b/block/qed.c
index fa2aeee471..fd76817cbb 100644
--- a/block/qed.c
+++ b/block/qed.c
@@ -1526,11 +1526,12 @@ static int bdrv_qed_truncate(BlockDriverState *bs, 
int64_t offset, Error **errp)
 
 if (!qed_is_image_size_valid(offset, 

[Qemu-devel] [PATCH v3 for-2.10 3/4] block: Add errp to BD.bdrv_truncate()

2017-03-28 Thread Max Reitz
Add an Error parameter to the block drivers' bdrv_truncate() interface.
If a block driver does not set this in case of an error, the generic
bdrv_truncate() implementation will do so.

Where it is obvious, this patch also makes some block drivers set this
value.

Signed-off-by: Max Reitz 
---
 include/block/block_int.h |  2 +-
 block.c                   |  4 ++--
 block/blkdebug.c          |  4 ++--
 block/crypto.c            |  5 +++--
 block/file-posix.c        |  2 +-
 block/file-win32.c        |  6 +++---
 block/gluster.c           |  3 ++-
 block/iscsi.c             |  4 ++--
 block/nfs.c               |  2 +-
 block/qcow2.c             |  8 
 block/qed.c               |  2 +-
 block/raw-format.c        |  4 ++--
 block/rbd.c               |  2 +-
 block/sheepdog.c          | 14 ++
 14 files changed, 31 insertions(+), 31 deletions(-)

diff --git a/include/block/block_int.h b/include/block/block_int.h
index 59400bd848..08063c10c8 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -196,7 +196,7 @@ struct BlockDriver {
 int coroutine_fn (*bdrv_co_flush_to_os)(BlockDriverState *bs);
 
 const char *protocol_name;
-int (*bdrv_truncate)(BlockDriverState *bs, int64_t offset);
+int (*bdrv_truncate)(BlockDriverState *bs, int64_t offset, Error **errp);
 
 int64_t (*bdrv_getlength)(BlockDriverState *bs);
 bool has_variable_length;
diff --git a/block.c b/block.c
index 9ed526e01d..7b9841f99a 100644
--- a/block.c
+++ b/block.c
@@ -3243,13 +3243,13 @@ int bdrv_truncate(BdrvChild *child, int64_t offset, 
Error **errp)
 return -EACCES;
 }
 
-ret = drv->bdrv_truncate(bs, offset);
+ret = drv->bdrv_truncate(bs, offset, errp);
 if (ret == 0) {
 ret = refresh_total_sectors(bs, offset >> BDRV_SECTOR_BITS);
 bdrv_dirty_bitmap_truncate(bs);
 bdrv_parent_cb_resize(bs);
 ++bs->write_gen;
-} else {
+} else if (errp && !*errp) {
 error_setg_errno(errp, -ret, "Failed to resize image");
 }
 return ret;
diff --git a/block/blkdebug.c b/block/blkdebug.c
index 15a9966096..c795ae9e72 100644
--- a/block/blkdebug.c
+++ b/block/blkdebug.c
@@ -661,9 +661,9 @@ static int64_t blkdebug_getlength(BlockDriverState *bs)
 return bdrv_getlength(bs->file->bs);
 }
 
-static int blkdebug_truncate(BlockDriverState *bs, int64_t offset)
+static int blkdebug_truncate(BlockDriverState *bs, int64_t offset, Error 
**errp)
 {
-return bdrv_truncate(bs->file, offset, NULL);
+return bdrv_truncate(bs->file, offset, errp);
 }
 
 static void blkdebug_refresh_filename(BlockDriverState *bs, QDict *options)
diff --git a/block/crypto.c b/block/crypto.c
index 52e4f2b20f..17b3140998 100644
--- a/block/crypto.c
+++ b/block/crypto.c
@@ -381,7 +381,8 @@ static int block_crypto_create_generic(QCryptoBlockFormat 
format,
 return ret;
 }
 
-static int block_crypto_truncate(BlockDriverState *bs, int64_t offset)
+static int block_crypto_truncate(BlockDriverState *bs, int64_t offset,
+ Error **errp)
 {
 BlockCrypto *crypto = bs->opaque;
 size_t payload_offset =
@@ -389,7 +390,7 @@ static int block_crypto_truncate(BlockDriverState *bs, 
int64_t offset)
 
 offset += payload_offset;
 
-return bdrv_truncate(bs->file, offset, NULL);
+return bdrv_truncate(bs->file, offset, errp);
 }
 
 static void block_crypto_close(BlockDriverState *bs)
diff --git a/block/file-posix.c b/block/file-posix.c
index 0841a08785..d23464013f 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -1409,7 +1409,7 @@ static void raw_close(BlockDriverState *bs)
 }
 }
 
-static int raw_truncate(BlockDriverState *bs, int64_t offset)
+static int raw_truncate(BlockDriverState *bs, int64_t offset, Error **errp)
 {
 BDRVRawState *s = bs->opaque;
 struct stat st;
diff --git a/block/file-win32.c b/block/file-win32.c
index 800fabdd72..3f3925623f 100644
--- a/block/file-win32.c
+++ b/block/file-win32.c
@@ -461,7 +461,7 @@ static void raw_close(BlockDriverState *bs)
 }
 }
 
-static int raw_truncate(BlockDriverState *bs, int64_t offset)
+static int raw_truncate(BlockDriverState *bs, int64_t offset, Error **errp)
 {
 BDRVRawState *s = bs->opaque;
 LONG low, high;
@@ -476,11 +476,11 @@ static int raw_truncate(BlockDriverState *bs, int64_t 
offset)
  */
 dwPtrLow = SetFilePointer(s->hfile, low, &high, FILE_BEGIN);
 if (dwPtrLow == INVALID_SET_FILE_POINTER && GetLastError() != NO_ERROR) {
-fprintf(stderr, "SetFilePointer error: %lu\n", GetLastError());
+error_setg_win32(errp, GetLastError(), "SetFilePointer error");
 return -EIO;
 }
 if (SetEndOfFile(s->hfile) == 0) {
-fprintf(stderr, "SetEndOfFile error: %lu\n", GetLastError());
+error_setg_win32(errp, GetLastError(), "SetEndOfFile error");
 return -EIO;
 }
 return 0;
diff --git a/block/gluster.c b/block/gluster.c
index a577daef10..00b8240562 100644

[Qemu-devel] [PATCH v3 for-2.10 2/4] block: Add errp to b{lk, drv}_truncate()

2017-03-28 Thread Max Reitz
For one thing, this allows us to drop the error message generation from
qemu-img.c and blockdev.c and instead have it unified in
bdrv_truncate().

Signed-off-by: Max Reitz 
---
 include/block/block.h          |  2 +-
 include/sysemu/block-backend.h |  2 +-
 block.c                        | 16 
 block/blkdebug.c               |  2 +-
 block/block-backend.c          |  5 +++--
 block/commit.c                 |  5 +++--
 block/crypto.c                 |  2 +-
 block/mirror.c                 |  2 +-
 block/parallels.c              | 13 -
 block/qcow.c                   |  6 +++---
 block/qcow2-refcount.c         |  5 -
 block/qcow2.c                  | 14 +-
 block/qed.c                    |  2 +-
 block/raw-format.c             |  2 +-
 block/vdi.c                    |  4 ++--
 block/vhdx-log.c               |  2 +-
 block/vhdx.c                   | 10 +++---
 block/vmdk.c                   | 13 +++--
 block/vpc.c                    | 13 +++--
 blockdev.c                     | 21 +
 qemu-img.c                     | 17 -
 qemu-io-cmds.c                 |  5 +++--
 22 files changed, 73 insertions(+), 90 deletions(-)

diff --git a/include/block/block.h b/include/block/block.h
index 5149260827..4c9ed0e43c 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -294,7 +294,7 @@ BlockDriverState *bdrv_find_backing_image(BlockDriverState 
*bs,
 const char *backing_file);
 int bdrv_get_backing_file_depth(BlockDriverState *bs);
 void bdrv_refresh_filename(BlockDriverState *bs);
-int bdrv_truncate(BdrvChild *child, int64_t offset);
+int bdrv_truncate(BdrvChild *child, int64_t offset, Error **errp);
 int64_t bdrv_nb_sectors(BlockDriverState *bs);
 int64_t bdrv_getlength(BlockDriverState *bs);
 int64_t bdrv_get_allocated_file_size(BlockDriverState *bs);
diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h
index 7462228ac1..0ba4e277b9 100644
--- a/include/sysemu/block-backend.h
+++ b/include/sysemu/block-backend.h
@@ -225,7 +225,7 @@ int coroutine_fn blk_co_pwrite_zeroes(BlockBackend *blk, 
int64_t offset,
   int count, BdrvRequestFlags flags);
 int blk_pwrite_compressed(BlockBackend *blk, int64_t offset, const void *buf,
   int count);
-int blk_truncate(BlockBackend *blk, int64_t offset);
+int blk_truncate(BlockBackend *blk, int64_t offset, Error **errp);
 int blk_pdiscard(BlockBackend *blk, int64_t offset, int count);
 int blk_save_vmstate(BlockBackend *blk, const uint8_t *buf,
  int64_t pos, int size);
diff --git a/block.c b/block.c
index 6e906ec53c..9ed526e01d 100644
--- a/block.c
+++ b/block.c
@@ -3222,7 +3222,7 @@ exit:
 /**
  * Truncate file to 'offset' bytes (needed only for file protocols)
  */
-int bdrv_truncate(BdrvChild *child, int64_t offset)
+int bdrv_truncate(BdrvChild *child, int64_t offset, Error **errp)
 {
 BlockDriverState *bs = child->bs;
 BlockDriver *drv = bs->drv;
@@ -3230,12 +3230,18 @@ int bdrv_truncate(BdrvChild *child, int64_t offset)
 
 assert(child->perm & BLK_PERM_RESIZE);
 
-if (!drv)
+if (!drv) {
+error_setg(errp, "No medium inserted");
 return -ENOMEDIUM;
-if (!drv->bdrv_truncate)
+}
+if (!drv->bdrv_truncate) {
+error_setg(errp, "Image format driver does not support resize");
 return -ENOTSUP;
-if (bs->read_only)
+}
+if (bs->read_only) {
+error_setg(errp, "Image is read-only");
 return -EACCES;
+}
 
 ret = drv->bdrv_truncate(bs, offset);
 if (ret == 0) {
@@ -3243,6 +3249,8 @@ int bdrv_truncate(BdrvChild *child, int64_t offset)
 bdrv_dirty_bitmap_truncate(bs);
 bdrv_parent_cb_resize(bs);
 ++bs->write_gen;
+} else {
+error_setg_errno(errp, -ret, "Failed to resize image");
 }
 return ret;
 }
diff --git a/block/blkdebug.c b/block/blkdebug.c
index 67e8024e36..15a9966096 100644
--- a/block/blkdebug.c
+++ b/block/blkdebug.c
@@ -663,7 +663,7 @@ static int64_t blkdebug_getlength(BlockDriverState *bs)
 
 static int blkdebug_truncate(BlockDriverState *bs, int64_t offset)
 {
-return bdrv_truncate(bs->file, offset);
+return bdrv_truncate(bs->file, offset, NULL);
 }
 
 static void blkdebug_refresh_filename(BlockDriverState *bs, QDict *options)
diff --git a/block/block-backend.c b/block/block-backend.c
index 0b6377332c..3abd9005b9 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -1705,13 +1705,14 @@ int blk_pwrite_compressed(BlockBackend *blk, int64_t 
offset, const void *buf,
BDRV_REQ_WRITE_COMPRESSED);
 }
 
-int blk_truncate(BlockBackend *blk, int64_t offset)
+int blk_truncate(BlockBackend *blk, int64_t offset, Error **errp)
 {
 if (!blk_is_available(blk)) {
+error_setg(errp, "No medium inserted");
 return -ENOMEDIUM;
 }
 
-return 

[Qemu-devel] [PATCH v3 for-2.10 0/4] block: Add errp to b{lk, drv}_truncate()

2017-03-28 Thread Max Reitz
Having an Error parameter for these functions makes sense because we
sometimes want a bit more information than just "Something failed". Some
drivers already use error_report() and the like to emit this additional
information, so it's rather obvious that we do want a real error object
here.


v3:
- Patch 2: Keep "Could not resize image" message in qcow2_create2() by
   using error_prepend() [Kevin]
- Patch 3: Dropped archipelago
- Patch 4:
  - Keep errno information where available [Kevin]
  - Make all drivers generate error messages [Stefan/Eric]
  - Drop generic error message from bdrv_truncate() [Stefan/Eric]


Max Reitz (4):
  block/vhdx: Make vhdx_create() always set errp
  block: Add errp to b{lk,drv}_truncate()
  block: Add errp to BD.bdrv_truncate()
  block: Add .bdrv_truncate() error messages

 include/block/block.h          |  2 +-
 include/block/block_int.h      |  2 +-
 include/sysemu/block-backend.h |  2 +-
 block.c                        | 16 +++-
 block/blkdebug.c               |  4 ++--
 block/block-backend.c          |  5 +++--
 block/commit.c                 |  5 +++--
 block/crypto.c                 |  5 +++--
 block/file-posix.c             | 19 +--
 block/file-win32.c             |  6 +++---
 block/gluster.c                |  7 +--
 block/iscsi.c                  |  6 --
 block/mirror.c                 |  2 +-
 block/nfs.c                    | 12 ++--
 block/parallels.c              | 13 -
 block/qcow.c                   |  6 +++---
 block/qcow2-refcount.c         |  5 -
 block/qcow2.c                  | 24 +++-
 block/qed.c                    |  8 +---
 block/raw-format.c             |  6 --
 block/rbd.c                    |  3 ++-
 block/sheepdog.c               | 14 ++
 block/vdi.c                    |  4 ++--
 block/vhdx-log.c               |  2 +-
 block/vhdx.c                   | 25 ++---
 block/vmdk.c                   | 13 +++--
 block/vpc.c                    | 13 +++--
 blockdev.c                     | 21 +
 qemu-img.c                     | 17 -
 qemu-io-cmds.c                 |  5 +++--
 30 files changed, 147 insertions(+), 125 deletions(-)

-- 
2.12.1
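
To illustrate what "a real error object" buys over a bare return code, here is a small stand-alone sketch of the errp convention the series relies on. ToyError, toy_error_setg_errno(), toy_error_free() and toy_driver_truncate() are hypothetical names made up for this note; they only imitate the shape of QEMU's Error API and are not its implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical error object: just a formatted message. */
typedef struct ToyError {
    char msg[128];
} ToyError;

/*
 * Store "<msg>: <strerror(os_errno)>" in *errp.
 * A NULL errp means the caller does not want the details.
 */
static void toy_error_setg_errno(ToyError **errp, int os_errno,
                                 const char *msg)
{
    if (!errp) {
        return;
    }
    assert(*errp == NULL);   /* never overwrite an earlier error */
    *errp = malloc(sizeof(**errp));
    if (!*errp) {
        abort();
    }
    snprintf((*errp)->msg, sizeof((*errp)->msg), "%s: %s",
             msg, strerror(os_errno));
}

static void toy_error_free(ToyError *err)
{
    free(err);
}

/*
 * Toy driver callback: it keeps returning a negative errno, and in
 * addition describes the failure through errp.
 */
static int toy_driver_truncate(long current_len, long offset,
                               ToyError **errp)
{
    if (offset > current_len) {
        toy_error_setg_errno(errp, EINVAL, "Failed to resize the file");
        return -EINVAL;
    }
    return 0;
}
```

A caller that only cares about success can still pass NULL and look at the return value; a caller that reports to the user passes the address of a NULL-initialised pointer and prints the message on failure.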




[Qemu-devel] [PATCH v3 for-2.10 1/4] block/vhdx: Make vhdx_create() always set errp

2017-03-28 Thread Max Reitz
This patch makes vhdx_create() always set errp in case of an error. It
also adds errp parameters to vhdx_create_bat() and
vhdx_create_new_region_table() so we can pass on the error object
generated by blk_truncate() as of a future commit.

Signed-off-by: Max Reitz 
Reviewed-by: Kevin Wolf 
---
 block/vhdx.c | 23 +++
 1 file changed, 19 insertions(+), 4 deletions(-)

diff --git a/block/vhdx.c b/block/vhdx.c
index 052a753159..d25bcd91de 100644
--- a/block/vhdx.c
+++ b/block/vhdx.c
@@ -1586,7 +1586,7 @@ exit:
 static int vhdx_create_bat(BlockBackend *blk, BDRVVHDXState *s,
uint64_t image_size, VHDXImageType type,
bool use_zero_blocks, uint64_t file_offset,
-   uint32_t length)
+   uint32_t length, Error **errp)
 {
 int ret = 0;
 uint64_t data_file_offset;
@@ -1609,14 +1609,19 @@ static int vhdx_create_bat(BlockBackend *blk, 
BDRVVHDXState *s,
  * is the furthest thing we have written yet */
 ret = blk_truncate(blk, data_file_offset);
 if (ret < 0) {
+error_setg_errno(errp, -ret,
+"Failed to resize the underlying file");
 goto exit;
 }
 } else if (type == VHDX_TYPE_FIXED) {
 ret = blk_truncate(blk, data_file_offset + image_size);
 if (ret < 0) {
+error_setg_errno(errp, -ret,
+"Failed to resize the underlying file");
 goto exit;
 }
 } else {
+error_setg(errp, "Unsupported image type");
 ret = -ENOTSUP;
 goto exit;
 }
@@ -1627,6 +1632,7 @@ static int vhdx_create_bat(BlockBackend *blk, 
BDRVVHDXState *s,
 /* for a fixed file, the default BAT entry is not zero */
 s->bat = g_try_malloc0(length);
 if (length && s->bat == NULL) {
+error_setg(errp, "Failed to allocate memory for the BAT");
 ret = -ENOMEM;
 goto exit;
 }
@@ -1646,6 +1652,7 @@ static int vhdx_create_bat(BlockBackend *blk, 
BDRVVHDXState *s,
 }
 ret = blk_pwrite(blk, file_offset, s->bat, length, 0);
 if (ret < 0) {
+error_setg_errno(errp, -ret, "Failed to write the BAT");
 goto exit;
 }
 }
@@ -1671,7 +1678,8 @@ static int vhdx_create_new_region_table(BlockBackend *blk,
 uint32_t log_size,
 bool use_zero_blocks,
 VHDXImageType type,
-uint64_t *metadata_offset)
+uint64_t *metadata_offset,
+Error **errp)
 {
 int ret = 0;
 uint32_t offset = 0;
@@ -1740,7 +1748,7 @@ static int vhdx_create_new_region_table(BlockBackend *blk,
 /* The region table gives us the data we need to create the BAT,
  * so do that now */
 ret = vhdx_create_bat(blk, s, image_size, type, use_zero_blocks,
-  bat_file_offset, bat_length);
+  bat_file_offset, bat_length, errp);
 if (ret < 0) {
 goto exit;
 }
@@ -1749,12 +1757,14 @@ static int vhdx_create_new_region_table(BlockBackend 
*blk,
 ret = blk_pwrite(blk, VHDX_REGION_TABLE_OFFSET, buffer,
  VHDX_HEADER_BLOCK_SIZE, 0);
 if (ret < 0) {
+error_setg_errno(errp, -ret, "Failed to write first region table");
 goto exit;
 }
 
 ret = blk_pwrite(blk, VHDX_REGION_TABLE2_OFFSET, buffer,
  VHDX_HEADER_BLOCK_SIZE, 0);
 if (ret < 0) {
+error_setg_errno(errp, -ret, "Failed to write second region table");
 goto exit;
 }
 
@@ -1825,6 +1835,7 @@ static int vhdx_create(const char *filename, QemuOpts 
*opts, Error **errp)
 ret = -ENOTSUP;
 goto exit;
 } else {
+error_setg(errp, "Invalid subformat '%s'", type);
 ret = -EINVAL;
 goto exit;
 }
@@ -1879,12 +1890,14 @@ static int vhdx_create(const char *filename, QemuOpts 
*opts, Error **errp)
 ret = blk_pwrite(blk, VHDX_FILE_ID_OFFSET, &signature, sizeof(signature),
  0);
 if (ret < 0) {
+error_setg_errno(errp, -ret, "Failed to write file signature");
 goto delete_and_exit;
 }
 if (creator) {
 ret = blk_pwrite(blk, VHDX_FILE_ID_OFFSET + sizeof(signature),
  creator, creator_items * sizeof(gunichar2), 0);
 if (ret < 0) {
+error_setg_errno(errp, -ret, "Failed to write creator field");
 goto delete_and_exit;
 }
 }
@@ -1893,13 +1906,14 @@ static int vhdx_create(const char *filename, QemuOpts 
*opts, Error **errp)
 /* Creates (B),(C) */
 ret = vhdx_create_new_headers(blk, image_size, log_size);
 if (ret < 0) {
+error_setg_errno(errp, 

Re: [Qemu-devel] host stalls when qemu-system-aarch64 with kvm and pflash

2017-03-28 Thread Radha Mohan
On Tue, Mar 28, 2017 at 1:16 PM, Christoffer Dall  wrote:
> Hi Radha,
>
> On Tue, Mar 28, 2017 at 12:58:24PM -0700, Radha Mohan wrote:
>> Hi,
>> I am seeing an issue with qemu-system-aarch64 when using pflash
>> (booting kernel via UEFI bios).
>>
>> Host kernel: 4.11.0-rc3-next-20170323
>> Qemu version: v2.9.0-rc1
>>
>> Command used:
>> ./aarch64-softmmu/qemu-system-aarch64 -cpu host -enable-kvm -M
>> virt,gic_version=3 -nographic -smp 1 -m 2048 -drive
>> if=none,id=hd0,file=/root/zesty-server-cloudimg-arm64.img,id=0 -device
>> virtio-blk-device,drive=hd0 -pflash /root/flash0.img -pflash
>> /root/flash1.img
>>
>>
>> As soon as the guest kernel boots the host starts to stall and prints
>> the below messages. And the system never recovers. I can neither
>> power off the guest nor the host. So I have to resort to an external
>> power reset of the host.
>>
>> ==
>> [  116.199077] NMI watchdog: BUG: soft lockup - CPU#25 stuck for 23s!
>> [kworker/25:1:454]
>> [  116.206901] Modules linked in: binfmt_misc nls_iso8859_1 aes_ce_blk
>> shpchp crypto_simd gpio_keys cryptd aes_ce_cipher ghash_ce sha2_ce
>> sha1_ce uio_pdrv_genirq uio autofs4 btrfs raid10 rai
>> d456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor
>> raid6_pq libcrc32c raid1 raid0 multipath linear ast i2c_algo_bit ttm
>> drm_kms_helper syscopyarea sysfillrect sysimgblt fb_s
>> ys_fops drm nicvf ahci nicpf libahci thunder_bgx thunder_xcv
>> mdio_thunder mdio_cavium
>>
>> [  116.206995] CPU: 25 PID: 454 Comm: kworker/25:1 Not tainted
>> 4.11.0-rc3-next-20170323 #1
>> [  116.206997] Hardware name: www.cavium.com crb-1s/crb-1s, BIOS 0.3 Feb 23 
>> 2017
>> [  116.207010] Workqueue: events netstamp_clear
>> [  116.207015] task: 801f906b5400 task.stack: 801f901a4000
>> [  116.207020] PC is at smp_call_function_many+0x284/0x2e8
>> [  116.207023] LR is at smp_call_function_many+0x244/0x2e8
>> [  116.207026] pc : [] lr : []
>> pstate: 8145
>> [  116.207028] sp : 801f901a7be0
>> [  116.207030] x29: 801f901a7be0 x28: 09139000
>> [  116.207036] x27: 09139434 x26: 0080
>> [  116.207041] x25:  x24: 081565d0
>> [  116.207047] x23: 0001 x22: 08e11e00
>> [  116.207052] x21: 801f6d5cff00 x20: 801f6d5cff08
>> [  116.207057] x19: 09138e38 x18: 0a03
>> [  116.207063] x17: b77c9028 x16: 082e81d8
>> [  116.207068] x15: 3d0d6dd44d08 x14: 0036312196549b4a
>> [  116.207073] x13: 58dabe4c x12: 0018
>> [  116.207079] x11: 366e2f04 x10: 09f0
>> [  116.207084] x9 : 801f901a7d30 x8 : 0002
>> [  116.207089] x7 :  x6 : 
>> [  116.207095] x5 :  x4 : 0020
>> [  116.207100] x3 : 0020 x2 : 
>> [  116.207105] x1 : 801f6d682578 x0 : 0003
>>
>> [  150.443116] INFO: rcu_sched self-detected stall on CPU
>> [  150.448261]  25-...: (14997 ticks this GP)
>> idle=47a/141/0 softirq=349/349 fqs=7495
>> [  150.451115] INFO: rcu_sched detected stalls on CPUs/tasks:
>> [  150.451123]  25-...: (14997 ticks this GP)
>> idle=47a/141/0 softirq=349/349 fqs=7495
>> [  150.451124]  (detected by 13, t=15002 jiffies, g=805, c=804, q=8384)
>> [  150.451136] Task dump for CPU 25:
>> [  150.451138] kworker/25:1R  running task0   454  2 
>> 0x0002
>> [  150.451155] Workqueue: events netstamp_clear
>> [  150.451158] Call trace:
>> [  150.451164] [] __switch_to+0x90/0xa8
>> [  150.451172] [] static_key_slow_inc+0x128/0x138
>> [  150.451175] [] static_key_enable+0x34/0x60
>> [  150.451178] [] netstamp_clear+0x68/0x80
>> [  150.451181] [] process_one_work+0x158/0x478
>> [  150.451183] [] worker_thread+0x50/0x4a8
>> [  150.451187] [] kthread+0x108/0x138
>> [  150.451190] [] ret_from_fork+0x10/0x50
>> [  150.477451]   (t=15008 jiffies g=805 c=804 q=8384)
>> [  150.482242] Task dump for CPU 25:
>> [  150.482245] kworker/25:1R  running task0   454  2 
>> 0x0002
>> [  150.482259] Workqueue: events netstamp_clear
>> [  150.482264] Call trace:
>> [  150.482271] [] dump_backtrace+0x0/0x2b0
>> [  150.482277] [] show_stack+0x24/0x30
>> [  150.482281] [] sched_show_task+0x128/0x178
>> [  150.482285] [] dump_cpu_task+0x48/0x58
>> [  150.482288] [] rcu_dump_cpu_stacks+0xa0/0xe8
>> [  150.482297] [] rcu_check_callbacks+0x774/0x938
>> [  150.482305] [] update_process_times+0x34/0x60
>> [  150.482314] [] tick_sched_handle.isra.7+0x38/0x70
>> [  150.482319] [] tick_sched_timer+0x4c/0x98
>> [  150.482324] [] __hrtimer_run_queues+0xd8/0x2b8
>> [  150.482328] [] hrtimer_interrupt+0xa8/0x228
>> [  150.482334] [] arch_timer_handler_phys+0x3c/0x50
>> [  150.482341] [] handle_percpu_devid_irq+0x8c/0x230
>> [  150.482344] [] generic_handle_irq+0x34/0x50
>> [  150.482347] [] __handle_domain_irq+0x68/0xc0
>> [  150.482351] [] 

Re: [Qemu-devel] [PATCH v3 for-2.9] virtio: fix vring_align() on 64-bit windows

2017-03-28 Thread Stefan Weil

On 28.03.2017 at 22:11, Michael S. Tsirkin wrote:

> I'm doing a pull request a bit later today - I can pick this one up
> if you prefer. If yes, pls send your ack.



Yes, please.

Thanks a lot
Stefan



Re: [Qemu-devel] [PATCH v3 for-2.9] virtio: fix vring_align() on 64-bit windows

2017-03-28 Thread Michael S. Tsirkin
On Tue, Mar 28, 2017 at 10:02:10PM +0200, Stefan Weil wrote:
> Am 28.03.2017 um 20:56 schrieb Andrew Baumann:
> > > From: Eric Blake [mailto:ebl...@redhat.com]
> > > Sent: Tuesday, 28 March 2017 11:52
> > > 
> > > On 03/28/2017 01:38 PM, Stefan Weil wrote:
> > > > Am 25.03.2017 um 00:19 schrieb Andrew Baumann:
> > > > > long is 32-bits on 64-bit windows, which caused the top half of the
> > > > > address to be truncated; this patch changes it to use the
> > > > > QEMU_ALIGN_UP macro which does not suffer the same problem
> > > > > 
> > > > > Signed-off-by: Andrew Baumann 
> > > > > Reviewed-by: Eric Blake 
> > > > > ---
> > > 
> > > > Eric added "for-2.9" to the subject line of v2, but now it was
> > > > missing again for v3.
> > > > 
> > > > Is this needed for 2.9?
> > > 
> > > Yes, it's a correctness bug that avoids miscompilation on 64-bit targets
> > > where long is 32 bits (which, at the moment, is really just Windows).
> > 
> > I agree, this should be in 2.9. I dropped the tag by accident.
> > 
> > > > I wonder why I never before noticed
> > > > a problem or got a bug report for this issue.
> > > 
> > > Probably because so few people are testing on native Windows, and it
> > > doesn't affect other platforms.
> > 
> > In addition to that, you only notice it on virtio devices mapped above the 
> > 32-bit limit...
> > 
> > Andrew
> > 
> 
> Reviewed-by: Stefan Weil 
> 
> I added this patch to my queue. Peter, do you still accept pull requests
> for 2.9? I'm still waiting for a review of another bug fix for Windows
> (http://patchwork.ozlabs.org/patch/743416/). How long do I have time
> to get bug fixes for Windows into 2.9?
> 
> Of course I would not mind if you pulled this one directly (see
> http://patchwork.ozlabs.org/patch/743410/).
> 
> Stefan

I'm doing a pull request a bit later today - I can pick this one up
if you prefer. If yes, pls send your ack.

-- 
MST



Re: [Qemu-devel] [PATCH v3 for-2.9] virtio: fix vring_align() on 64-bit windows

2017-03-28 Thread Stefan Weil

Am 28.03.2017 um 20:56 schrieb Andrew Baumann:

From: Eric Blake [mailto:ebl...@redhat.com]
Sent: Tuesday, 28 March 2017 11:52

On 03/28/2017 01:38 PM, Stefan Weil wrote:

Am 25.03.2017 um 00:19 schrieb Andrew Baumann:

long is 32-bits on 64-bit windows, which caused the top half of the
address to be truncated; this patch changes it to use the
QEMU_ALIGN_UP macro which does not suffer the same problem

Signed-off-by: Andrew Baumann 
Reviewed-by: Eric Blake 
---



Eric added "for-2.9" to the subject line of v2, but now it was
missing again for v3.

Is this needed for 2.9?


Yes, it's a correctness bug that avoids miscompilation on 64-bit targets
where long is 32 bits (which, at the moment, is really just Windows).


I agree, this should be in 2.9. I dropped the tag by accident.


I wonder why I never before noticed
a problem or got a bug report for this issue.


Probably because so few people are testing on native Windows, and it
doesn't affect other platforms.


In addition to that, you only notice it on virtio devices mapped above the 
32-bit limit...

Andrew



Reviewed-by: Stefan Weil 

I added this patch to my queue. Peter, do you still accept pull requests
for 2.9? I'm still waiting for a review of another bug fix for Windows 
(http://patchwork.ozlabs.org/patch/743416/). How long do I have time

to get bug fixes for Windows into 2.9?

Of course I would not mind if you pulled this one directly (see 
http://patchwork.ozlabs.org/patch/743410/).


Stefan




Re: [Qemu-devel] [PATCH v3] virtio: fix vring_align() on 64-bit windows

2017-03-28 Thread Michael S. Tsirkin
On Fri, Mar 24, 2017 at 04:19:43PM -0700, Andrew Baumann wrote:
> long is 32-bits on 64-bit windows, which caused the top half of the
> address to be truncated; this patch changes it to use the
> QEMU_ALIGN_UP macro which does not suffer the same problem
> 
> Signed-off-by: Andrew Baumann 
> Reviewed-by: Eric Blake 

Reviewed-by: Michael S. Tsirkin 

> ---
>  include/hw/virtio/virtio.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
> index 15efcf2..7b6edba 100644
> --- a/include/hw/virtio/virtio.h
> +++ b/include/hw/virtio/virtio.h
> @@ -34,7 +34,7 @@ struct VirtQueue;
>  static inline hwaddr vring_align(hwaddr addr,
>   unsigned long align)
>  {
> -return (addr + align - 1) & ~(align - 1);
> +return QEMU_ALIGN_UP(addr, align);
>  }
>  
>  typedef struct VirtQueue VirtQueue;
> -- 
> 2.8.3
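
Note for archive readers: the truncation described above can be reproduced on any platform by narrowing the alignment to 32 bits, which is the width of unsigned long on 64-bit Windows. The sketch below is illustrative only. hwaddr is a local stand-in typedef, and align_up_div() is a hypothetical helper that rounds up by division, on the assumption that QEMU_ALIGN_UP behaves this way; it is not copied from QEMU.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t hwaddr;   /* local stand-in for QEMU's hwaddr */

/* Old formula with a 32-bit alignment, as on 64-bit Windows.
 * ~(align - 1) is computed in 32 bits and then zero-extended,
 * so the mask clears the upper 32 bits of the address. */
static hwaddr align_up_mask32(hwaddr addr, uint32_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr + align - 1) & ~(align - 1);
}

/* Hypothetical division-based round-up: all arithmetic is done
 * in the 64-bit type of addr, so nothing is truncated. */
static hwaddr align_up_div(hwaddr addr, uint32_t align)
{
    assert(align != 0);
    return (addr + align - 1) / align * align;
}
```

With an address just above 4 GiB, such as 0x100000001, and a 4096-byte alignment, the masked version loses the top half of the result while the division-based version keeps it. Both agree for addresses below the 32-bit limit, which matches the observation in the thread that the bug only shows up for devices mapped above that limit.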



Re: [Qemu-devel] [PULL for-2.9] block: Declare blockdev-add and blockdev-del supported

2017-03-28 Thread Max Reitz
On 28.03.2017 20:49, Markus Armbruster wrote:
> Paolo Bonzini  writes:
> 
>> On 28/03/2017 15:45, Markus Armbruster wrote:
>>> It's been a long journey, but here we are.
>>>
>>> The supported blockdev-add is not compatible to its experimental
>>> predecessors; bump all Since: tags to 2.9.
>>
>> Can you document the differences in the 2.9 changelog?
> 
> Not sure I can before I drop off for a week of vacation.  Kevin, Max,
> can you chip in?

Just document the changes we have done to blockdev-add since it was
originally introduced? Sure, I'll have a shot.

Max



signature.asc
Description: OpenPGP digital signature


[Qemu-devel] host stalls when qemu-system-aarch64 with kvm and pflash

2017-03-28 Thread Radha Mohan
Hi,
I am seeing an issue with qemu-system-aarch64 when using pflash
(booting kernel via UEFI bios).

Host kernel: 4.11.0-rc3-next-20170323
Qemu version: v2.9.0-rc1

Command used:
./aarch64-softmmu/qemu-system-aarch64 -cpu host -enable-kvm -M
virt,gic_version=3 -nographic -smp 1 -m 2048 -drive
if=none,id=hd0,file=/root/zesty-server-cloudimg-arm64.img,id=0 -device
virtio-blk-device,drive=hd0 -pflash /root/flash0.img -pflash
/root/flash1.img


As soon as the guest kernel boots, the host starts to stall and prints
the messages below, and the system never recovers. I can power off
neither the guest nor the host, so I have to resort to an external power
reset of the host.

==
[  116.199077] NMI watchdog: BUG: soft lockup - CPU#25 stuck for 23s!
[kworker/25:1:454]
[  116.206901] Modules linked in: binfmt_misc nls_iso8859_1 aes_ce_blk
shpchp crypto_simd gpio_keys cryptd aes_ce_cipher ghash_ce sha2_ce
sha1_ce uio_pdrv_genirq uio autofs4 btrfs raid10 raid456
async_raid6_recov async_memcpy async_pq async_xor async_tx xor
raid6_pq libcrc32c raid1 raid0 multipath linear ast i2c_algo_bit ttm
drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops
drm nicvf ahci nicpf libahci thunder_bgx thunder_xcv
mdio_thunder mdio_cavium

[  116.206995] CPU: 25 PID: 454 Comm: kworker/25:1 Not tainted
4.11.0-rc3-next-20170323 #1
[  116.206997] Hardware name: www.cavium.com crb-1s/crb-1s, BIOS 0.3 Feb 23 2017
[  116.207010] Workqueue: events netstamp_clear
[  116.207015] task: 801f906b5400 task.stack: 801f901a4000
[  116.207020] PC is at smp_call_function_many+0x284/0x2e8
[  116.207023] LR is at smp_call_function_many+0x244/0x2e8
[  116.207026] pc : [] lr : []
pstate: 8145
[  116.207028] sp : 801f901a7be0
[  116.207030] x29: 801f901a7be0 x28: 09139000
[  116.207036] x27: 09139434 x26: 0080
[  116.207041] x25:  x24: 081565d0
[  116.207047] x23: 0001 x22: 08e11e00
[  116.207052] x21: 801f6d5cff00 x20: 801f6d5cff08
[  116.207057] x19: 09138e38 x18: 0a03
[  116.207063] x17: b77c9028 x16: 082e81d8
[  116.207068] x15: 3d0d6dd44d08 x14: 0036312196549b4a
[  116.207073] x13: 58dabe4c x12: 0018
[  116.207079] x11: 366e2f04 x10: 09f0
[  116.207084] x9 : 801f901a7d30 x8 : 0002
[  116.207089] x7 :  x6 : 
[  116.207095] x5 :  x4 : 0020
[  116.207100] x3 : 0020 x2 : 
[  116.207105] x1 : 801f6d682578 x0 : 0003

[  150.443116] INFO: rcu_sched self-detected stall on CPU
[  150.448261]  25-...: (14997 ticks this GP)
idle=47a/141/0 softirq=349/349 fqs=7495
[  150.451115] INFO: rcu_sched detected stalls on CPUs/tasks:
[  150.451123]  25-...: (14997 ticks this GP)
idle=47a/141/0 softirq=349/349 fqs=7495
[  150.451124]  (detected by 13, t=15002 jiffies, g=805, c=804, q=8384)
[  150.451136] Task dump for CPU 25:
[  150.451138] kworker/25:1R  running task0   454  2 0x0002
[  150.451155] Workqueue: events netstamp_clear
[  150.451158] Call trace:
[  150.451164] [] __switch_to+0x90/0xa8
[  150.451172] [] static_key_slow_inc+0x128/0x138
[  150.451175] [] static_key_enable+0x34/0x60
[  150.451178] [] netstamp_clear+0x68/0x80
[  150.451181] [] process_one_work+0x158/0x478
[  150.451183] [] worker_thread+0x50/0x4a8
[  150.451187] [] kthread+0x108/0x138
[  150.451190] [] ret_from_fork+0x10/0x50
[  150.477451]   (t=15008 jiffies g=805 c=804 q=8384)
[  150.482242] Task dump for CPU 25:
[  150.482245] kworker/25:1R  running task0   454  2 0x0002
[  150.482259] Workqueue: events netstamp_clear
[  150.482264] Call trace:
[  150.482271] [] dump_backtrace+0x0/0x2b0
[  150.482277] [] show_stack+0x24/0x30
[  150.482281] [] sched_show_task+0x128/0x178
[  150.482285] [] dump_cpu_task+0x48/0x58
[  150.482288] [] rcu_dump_cpu_stacks+0xa0/0xe8
[  150.482297] [] rcu_check_callbacks+0x774/0x938
[  150.482305] [] update_process_times+0x34/0x60
[  150.482314] [] tick_sched_handle.isra.7+0x38/0x70
[  150.482319] [] tick_sched_timer+0x4c/0x98
[  150.482324] [] __hrtimer_run_queues+0xd8/0x2b8
[  150.482328] [] hrtimer_interrupt+0xa8/0x228
[  150.482334] [] arch_timer_handler_phys+0x3c/0x50
[  150.482341] [] handle_percpu_devid_irq+0x8c/0x230
[  150.482344] [] generic_handle_irq+0x34/0x50
[  150.482347] [] __handle_domain_irq+0x68/0xc0
[  150.482351] [] gic_handle_irq+0xc4/0x170
[  150.482356] Exception stack(0x801f901a7ab0 to 0x801f901a7be0)
[  150.482360] 7aa0:
0003 801f6d682578
[  150.482364] 7ac0:  0020
0020 
[  150.482367] 7ae0:  
0002 801f901a7d30
[  150.482371] 7b00: 09f0 366e2f04
0018 58dabe4c
[  150.482375] 7b20: 0036312196549b4a 

[Qemu-devel] [PULL 2/2] i386: Don't override -cpu options on -cpu host/max

2017-03-28 Thread Eduardo Habkost
The existing code for "host" and "max" CPU models overrides every
single feature in the CPU object at realize time, even the ones
that were explicitly enabled or disabled by the user using
"feat=on" or "feat=off", while features set using +feat/-feat are
kept.

This means "-cpu host,+invtsc" works as expected, while
"-cpu host,invtsc=on" doesn't.

This was a known bug, already documented in a comment inside
x86_cpu_expand_features(). What makes this bug worse now is that
libvirt 3.0.0 and newer now use "feat=on|off" instead of
+feat/-feat when it detects a QEMU version that supports it (see
libvirt commit d47db7b16dd5422c7e487c8c8ee5b181a2f9cd66).

Change the feature property getter/setter to set a
env->user_features field, to keep track of features that were
explicitly changed using QOM properties. Then make the
max_features code not override user features when handling "-cpu
host" and "-cpu max".

This will also allow us to remove the plus_features/minus_features
hack in the future, but I plan to do that after 2.9.0 is
released.

Reported-by: Jiri Denemark 
Signed-off-by: Eduardo Habkost 
Message-Id: <20170327144815.8043-3-ehabk...@redhat.com>
Reviewed-by: Igor Mammedov 
Tested-by: Jiri Denemark 
Signed-off-by: Eduardo Habkost 
---
 target/i386/cpu.h |  2 ++
 target/i386/cpu.c | 13 +
 2 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 07401ad9fe..c4602ca80d 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -1147,6 +1147,8 @@ typedef struct CPUX86State {
 uint32_t cpuid_vendor3;
 uint32_t cpuid_version;
 FeatureWordArray features;
+/* Features that were explicitly enabled/disabled */
+FeatureWordArray user_features;
 uint32_t cpuid_model[12];
 
 /* MTRRs */
diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index feefa5b8a4..13c0985f11 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -3373,15 +3373,19 @@ static void x86_cpu_expand_features(X86CPU *cpu, Error **errp)
 GList *l;
 Error *local_err = NULL;
 
-/*TODO: cpu->max_features incorrectly overwrites features
- * set using "feat=on|off". Once we fix this, we can convert
+/*TODO: Now cpu->max_features doesn't overwrite features
+ * set using QOM properties, and we can convert
  * plus_features & minus_features to global properties
  * inside x86_cpu_parse_featurestr() too.
  */
 if (cpu->max_features) {
 for (w = 0; w < FEATURE_WORDS; w++) {
-env->features[w] =
-x86_cpu_get_supported_feature_word(w, cpu->migratable);
+/* Override only features that weren't set explicitly
+ * by the user.
+ */
+env->features[w] |=
+x86_cpu_get_supported_feature_word(w, cpu->migratable) &
+~env->user_features[w];
 }
 }
 
@@ -3731,6 +3735,7 @@ static void x86_cpu_set_bit_prop(Object *obj, Visitor *v, const char *name,
 } else {
 cpu->env.features[fp->w] &= ~fp->mask;
 }
+cpu->env.user_features[fp->w] |= fp->mask;
 }
 
 static void x86_cpu_release_bit_prop(Object *obj, const char *name,
-- 
2.11.0.259.g40922b1
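
Note for archive readers: the masking step in x86_cpu_expand_features() can be tried out in isolation. The sketch below is an assumption-laden simplification: CpuSketch, FEATURE_WORDS with a value of 2, and expand_max_features() are hypothetical names for illustration, and only the one-line bit expression mirrors the patch.

```c
#include <assert.h>
#include <stdint.h>

#define FEATURE_WORDS 2   /* illustrative; the real CPU model has more words */

/* Hypothetical, trimmed-down CPU state for this sketch. */
typedef struct {
    uint32_t features[FEATURE_WORDS];       /* current feature bits */
    uint32_t user_features[FEATURE_WORDS];  /* bits the user set explicitly */
} CpuSketch;

/* Enable every host-supported bit except those the user touched. */
static void expand_max_features(CpuSketch *cpu,
                                const uint32_t supported[FEATURE_WORDS])
{
    assert(cpu && supported);
    for (int w = 0; w < FEATURE_WORDS; w++) {
        cpu->features[w] |= supported[w] & ~cpu->user_features[w];
    }
}
```

A bit the user switched on stays on, a bit the user switched off stays off even if the host supports it, and every untouched bit picks up the host-supported value.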




[Qemu-devel] [PULL 0/2] i386: Fix for "-cpu host,invtsc=on" bug

2017-03-28 Thread Eduardo Habkost
Last-minute fix for a bug found by Jiri Denemark. Unfortunately
not in time for -rc2, but I would like to get this in -rc3.

The following changes since commit df9046363220e57d45818312759b954c033c58ab:

  Update version for v2.9.0-rc2 release (2017-03-28 19:11:16 +0100)

are available in the git repository at:

  git://github.com/ehabkost/qemu.git tags/x86-pull-request

for you to fetch changes up to d4a606b38b5d4b3689b86cc1575908e82179ecfb:

  i386: Don't override -cpu options on -cpu host/max (2017-03-28 16:41:10 -0300)


i386: Fix for "-cpu host,invtsc=on" bug



Eduardo Habkost (2):
  i386: Replace uint32_t* with FeatureWord on feature getter/setter
  i386: Don't override -cpu options on -cpu host/max

 target/i386/cpu.h |  2 ++
 target/i386/cpu.c | 32 
 2 files changed, 22 insertions(+), 12 deletions(-)

-- 
2.11.0.259.g40922b1




[Qemu-devel] [PULL 1/2] i386: Replace uint32_t* with FeatureWord on feature getter/setter

2017-03-28 Thread Eduardo Habkost
Instead of passing a pointer to the feature property getter and
setter functions, pass a FeatureWord enum so they can perform
other actions related to the feature flag.

This will be used to add a new "user_features" field to keep
track of features that were explicitly set by the user.

Signed-off-by: Eduardo Habkost 
Message-Id: <20170327144815.8043-2-ehabk...@redhat.com>
Reviewed-by: Igor Mammedov 
Tested-by: Jiri Denemark 
Signed-off-by: Eduardo Habkost 
---
 target/i386/cpu.c | 19 +++
 1 file changed, 11 insertions(+), 8 deletions(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 7aa762245a..feefa5b8a4 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -3692,15 +3692,17 @@ static void x86_cpu_unrealizefn(DeviceState *dev, Error **errp)
 }
 
 typedef struct BitProperty {
-uint32_t *ptr;
+FeatureWord w;
 uint32_t mask;
 } BitProperty;
 
 static void x86_cpu_get_bit_prop(Object *obj, Visitor *v, const char *name,
  void *opaque, Error **errp)
 {
+X86CPU *cpu = X86_CPU(obj);
 BitProperty *fp = opaque;
-bool value = (*fp->ptr & fp->mask) == fp->mask;
+uint32_t f = cpu->env.features[fp->w];
+bool value = (f & fp->mask) == fp->mask;
 visit_type_bool(v, name, &value, errp);
 }
 
@@ -3708,6 +3710,7 @@ static void x86_cpu_set_bit_prop(Object *obj, Visitor *v, const char *name,
  void *opaque, Error **errp)
 {
 DeviceState *dev = DEVICE(obj);
+X86CPU *cpu = X86_CPU(obj);
 BitProperty *fp = opaque;
 Error *local_err = NULL;
 bool value;
@@ -3724,9 +3727,9 @@ static void x86_cpu_set_bit_prop(Object *obj, Visitor *v, const char *name,
 }
 
 if (value) {
-*fp->ptr |= fp->mask;
+cpu->env.features[fp->w] |= fp->mask;
 } else {
-*fp->ptr &= ~fp->mask;
+cpu->env.features[fp->w] &= ~fp->mask;
 }
 }
 
@@ -3745,7 +3748,7 @@ static void x86_cpu_release_bit_prop(Object *obj, const char *name,
  */
 static void x86_cpu_register_bit_prop(X86CPU *cpu,
   const char *prop_name,
-  uint32_t *field,
+  FeatureWord w,
   int bitnr)
 {
 BitProperty *fp;
@@ -3755,11 +3758,11 @@ static void x86_cpu_register_bit_prop(X86CPU *cpu,
 op = object_property_find(OBJECT(cpu), prop_name, NULL);
 if (op) {
 fp = op->opaque;
-assert(fp->ptr == field);
+assert(fp->w == w);
 fp->mask |= mask;
 } else {
 fp = g_new0(BitProperty, 1);
-fp->ptr = field;
+fp->w = w;
 fp->mask = mask;
 object_property_add(OBJECT(cpu), prop_name, "bool",
 x86_cpu_get_bit_prop,
@@ -3787,7 +3790,7 @@ static void x86_cpu_register_feature_bit_props(X86CPU *cpu,
 /* aliases don't use "|" delimiters anymore, they are registered
  * manually using object_property_add_alias() */
 assert(!strchr(name, '|'));
-x86_cpu_register_bit_prop(cpu, name, &cpu->env.features[w], bitnr);
+x86_cpu_register_bit_prop(cpu, name, w, bitnr);
 }
 
 static GuestPanicInformation *x86_cpu_get_crash_info(CPUState *cs)
-- 
2.11.0.259.g40922b1
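
Note for archive readers: the point of storing a word index instead of a pointer is that the setter can then reach more than one per-word array. The sketch below is a simplified illustration under that assumption. CpuSketch, BitPropSketch, and the WORD_A/WORD_B enum are hypothetical names, and the user_features bookkeeping shown in the setter is the follow-up from patch 2/2 rather than part of this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum { WORD_A, WORD_B, WORD_COUNT };   /* hypothetical feature words */

typedef struct {
    uint32_t features[WORD_COUNT];
    uint32_t user_features[WORD_COUNT];
} CpuSketch;

/* The property names a word by index rather than pointing into one
 * array, so the setter can update any per-word array it likes. */
typedef struct {
    int word;
    uint32_t mask;
} BitPropSketch;

static bool bit_prop_get(const CpuSketch *cpu, const BitPropSketch *p)
{
    return (cpu->features[p->word] & p->mask) == p->mask;
}

static void bit_prop_set(CpuSketch *cpu, const BitPropSketch *p, bool value)
{
    assert(p->word >= 0 && p->word < WORD_COUNT);
    if (value) {
        cpu->features[p->word] |= p->mask;
    } else {
        cpu->features[p->word] &= ~p->mask;
    }
    /* Record that this bit was set explicitly, whichever way. */
    cpu->user_features[p->word] |= p->mask;
}
```

With a raw uint32_t pointer into features[], the setter would have no way to find the matching slot in a second array without extra pointer arithmetic.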




Re: [Qemu-devel] [RFC v3 3/3] hw/intc/arm_gicv3_its: Allow save/restore

2017-03-28 Thread Juan Quintela
Eric Auger  wrote:
> We change the restoration priority of both the GICv3 and ITS. The
> GICv3 must be restored before the ITS and the ITS needs to be restored
> before PCIe devices since it translates their MSI transactions.
>
> Signed-off-by: Eric Auger 

Reviewed-by: Juan Quintela 



Re: [Qemu-devel] [RFC v3 2/3] hw/intc/arm_gicv3_its: Implement state save/restore

2017-03-28 Thread Juan Quintela
Eric Auger  wrote:
> We need to handle both registers and ITS tables. While
> register handling is standard, ITS table handling is more
> challenging since the kernel API is devised so that the
> tables are flushed into guest RAM and not in vmstate buffers.
>
> Flushing the ITS tables on device pre_save() is too late
> since the guest RAM is already saved at this point.

We need to put a way to register handlers for this.

> Table flushing needs to happen when we are sure the vcpus
> are stopped and before the last dirty page saving. The
> right point is RUN_STATE_FINISH_MIGRATE but sometimes the
> VM gets stopped before migration launch so let's simply
> flush the tables each time the VM gets stopped.

Just curious, how slow is doing that in all stops?


No comments in the rest of the patch


>  static void kvm_arm_its_init(Object *obj)
> @@ -102,6 +122,80 @@ static void kvm_arm_its_init(Object *obj)
>   &error_abort);
>  }
>  
> +/**
> + * kvm_arm_its_pre_save - handles the saving of ITS registers.
> + * ITS tables are flushed into guest RAM separately and earlier,
> + * through the VM change state handler, since at the moment pre_save()
> + * is called, the guest RAM has already been saved.
> + */
> +static void kvm_arm_its_pre_save(GICv3ITSState *s)
> +{

...

> +}
> +
> +/**
> + * kvm_arm_its_post_load - Restore both the ITS registers and tables
> + */
> +static void kvm_arm_its_post_load(GICv3ITSState *s)
> +{

...

> +}
> +

I assume that these two functions are right.  I have no clue about ARM.

> @@ -109,6 +203,8 @@ static void kvm_arm_its_class_init(ObjectClass *klass, void *data)
>  
>  dc->realize = kvm_arm_its_realize;
>  icc->send_msi = kvm_its_send_msi;
> +icc->pre_save = kvm_arm_its_pre_save;
> +icc->post_load = kvm_arm_its_post_load;
>  }

Let me see if I understood this correctly.

We have an ARM_GICV3_ITS_COMMON.  And that has some fields.
In particular:

struct GICv3ITSState {
/* Registers */
uint32_t ctlr;
uint64_t cbaser;
uint64_t cwriter;
uint64_t creadr;
uint64_t baser[8];
/* lots of things removed */
};



We have this in arm_gicv3_its_common.c  (it is exactly the same for
post_load, so we can forget about that for now).


static void gicv3_its_pre_save(void *opaque)
{
GICv3ITSState *s = (GICv3ITSState *)opaque; /* (*) nitpick: the cast is useless */
GICv3ITSCommonClass *c = ARM_GICV3_ITS_COMMON_GET_CLASS(s);

if (c->pre_save) {
c->pre_save(s);
}
}

And then we have in the patch:


> @@ -109,6 +203,8 @@ static void kvm_arm_its_class_init(ObjectClass *klass, void *data)
>  
>  dc->realize = kvm_arm_its_realize;
>  icc->send_msi = kvm_its_send_msi;
> +icc->pre_save = kvm_arm_its_pre_save;
> +icc->post_load = kvm_arm_its_post_load;
>  }


struct GICv3ITSCommonClass {

void (*pre_save)(GICv3ITSState *s);
void (*post_load)(GICv3ITSState *s);
};


Notice that I have only found one user of this on the tree, so I don't
know if there is a good reason for this.


static void gicv3_its_common_class_init(ObjectClass *klass, void *data)
{
DeviceClass *dc = DEVICE_CLASS(klass);

dc->reset = gicv3_its_common_reset;
dc->vmsd = &vmstate_its;
}

So, what if we change:

const VMStateField vmstate_its_fields[] = {
 VMSTATE_UINT32(ctlr, GICv3ITSState),
 VMSTATE_UINT32(iidr, GICv3ITSState),
 VMSTATE_UINT64(cbaser, GICv3ITSState),
 VMSTATE_UINT64(cwriter, GICv3ITSState),
 VMSTATE_UINT64(creadr, GICv3ITSState),
 VMSTATE_UINT64_ARRAY(baser, GICv3ITSState, 8),
 VMSTATE_END_OF_LIST()
};



Remove the dc->vmsd = &vmstate_its; from gicv3_its_common_class_init();

And we add in arm_gicv3_its_kvm.c


static const VMStateDescription vmstate_its_kvm = {
.name = "arm_gicv3_its",
.pre_save = kvm_arm_its_pre_save,
.post_load = kvm_arm_its_post_load,
.fields = vmstate_its_fields,
};

And add the:

dc->vmsd = &vmstate_its_kvm;

into kvm_arm_its_class_init()?

And be done with it?  Or is it too late by then?

I am assuming that there is some reason why we want to call
arm_gicv3_its either for kvm or for anything else.  But IMHO, you are
making things more complicated than they need to be.

My understanding:
- We have GICv3 ITS state
- We want to have several implementations
- We want to be able to migrate from one to another


Or have I missed something?

Note that I like this other approach more, but as far as I can see,
yours should also work.

Thanks, Juan.
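
Note for archive readers: the dispatch pattern discussed above, a common pre_save entry point that forwards to an optional per-implementation hook, can be sketched in plain C. Everything below is hypothetical scaffolding (ItsStateSketch, ItsClassSketch, kvm_like_pre_save); it only illustrates the shape of the indirection, not the real QOM class machinery.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct ItsStateSketch ItsStateSketch;

/* Per-implementation hooks; either may be left NULL. */
typedef struct {
    void (*pre_save)(ItsStateSketch *s);
    void (*post_load)(ItsStateSketch *s);
} ItsClassSketch;

struct ItsStateSketch {
    const ItsClassSketch *klass;
    uint32_t ctlr;
    int saved;      /* set by the example hook below */
};

/* Common entry point: receives an untyped pointer, as a generic
 * save framework would pass, and forwards to the hook if present. */
static void its_common_pre_save(void *opaque)
{
    ItsStateSketch *s = opaque;   /* no cast needed from void * in C */

    assert(s != NULL);
    if (s->klass && s->klass->pre_save) {
        s->klass->pre_save(s);
    }
}

/* Example backend hook standing in for the KVM implementation. */
static void kvm_like_pre_save(ItsStateSketch *s)
{
    s->saved = 1;
}
```

The alternative suggested in the mail removes this extra layer by letting each implementation register its own description with its own hooks, at the cost of the common code no longer owning the save format.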






Re: [Qemu-devel] [PATCH v2 for-2.10 0/4] block: Add errp to b{lk, drv}_truncate()

2017-03-28 Thread Max Reitz
On 23.03.2017 19:05, Kevin Wolf wrote:
> Am 08.03.2017 um 20:14 hat Max Reitz geschrieben:
>> Having an Error parameter for these functions makes sense because we
>> sometimes want a bit more information than just "Something failed". Some
>> drivers already use error_report() and the like to emit this additional
>> information, so it's rather obvious that we do want a real error object
>> here.
> 
> I had only some minor comments about improving one error message or
> another, so whether you fix them or not:
> 
> Reviewed-by: Kevin Wolf 
> 
> (But if you don't fix them, you need to merge the patches yourself)

That sounds very fair, I like it. :-)

Don't worry, though, I will address them. Thanks for reviewing!

Max





Re: [Qemu-devel] QEMU website (wiki) improvements

2017-03-28 Thread Stefan Weil

Am 23.01.2017 um 11:28 schrieb Stefan Hajnoczi:

On Sun, Jan 22, 2017 at 04:19:43PM +0100, Stefan Weil wrote:

On 03/02/15 23:12, Stefan Hajnoczi wrote:

On Sat, Feb 28, 2015 at 04:29:44PM +0100, Stefan Weil wrote:

* It does not support secure access (https), so each login is insecure.
  Can we get a free server certificate?


This is on my todo list.  I'm travelling right now but will work on it
over the coming weeks.

There are some gotchas:

1. qemu.org vs qemu-project.org.  Unless we get a SNI certificate, the
   certificate will only be valid for one or the other.  Users will get
   an untrusted certificate message if they go to the other domain name.

2. We use subdomains, so a wildcard certificate is necessary.  That's
   not always offered for free so I need to compare the certificate
   vendors.

Stefan



Although this discussion thread is rather old, its subject
still applies.

In the meantime there are free certificates available.
We could add https support with a certificate from
https://letsencrypt.org/. As long as there is only a
small number of host names (*), I'd simply add them all
to the primary certificate. In addition, SNI certificates
for the different names can be installed.


Good idea, Jeff and I have discussed Let's Encrypt and have experience
setting it up.


I can help with the installation if that is needed.

Stefan

(*)

qemu.org
qemu.osuosl.org
qemu-project.org
wiki.qemu.org
wiki.qemu-project.org
www.qemu.org
www.qemu-project.org

Are there more host names used?


git.qemu.org
git.qemu-project.org



Update: Currently https://www.qemu-project.org/ supports secure
connections, but uses a self-signed certificate which was issued
for qemu.org, so it still cannot be simply used in most browsers.

Regards
Stefan




Re: [Qemu-devel] [PATCH v2 for-2.10 3/4] block: Add errp to BD.bdrv_truncate()

2017-03-28 Thread Max Reitz
On 23.03.2017 19:00, Kevin Wolf wrote:
> Am 08.03.2017 um 20:15 hat Max Reitz geschrieben:
>> Add an Error parameter to the block drivers' bdrv_truncate() interface.
>> If a block driver does not set this in case of an error, the generic
>> bdrv_truncate() implementation will do so.
>>
>> Where it is obvious, this patch also makes some block drivers set this
>> value.
>>
>> Signed-off-by: Max Reitz 
> 
>> diff --git a/block/iscsi.c b/block/iscsi.c
>> index 75d890538e..ab559a6f71 100644
>> --- a/block/iscsi.c
>> +++ b/block/iscsi.c
>> @@ -2060,7 +2060,7 @@ static void iscsi_reopen_commit(BDRVReopenState *reopen_state)
>>  }
>>  }
>>  
>> -static int iscsi_truncate(BlockDriverState *bs, int64_t offset)
>> +static int iscsi_truncate(BlockDriverState *bs, int64_t offset, Error **errp)
>>  {
>>  IscsiLun *iscsilun = bs->opaque;
>>  Error *local_err = NULL;
>> @@ -2071,7 +2071,7 @@ static int iscsi_truncate(BlockDriverState *bs, int64_t offset)
>>  
>>  iscsi_readcapacity_sync(iscsilun, &local_err);
>>  if (local_err != NULL) {
>> -error_free(local_err);
>> +error_propagate(errp, local_err);
>>  return -EIO;
>>  }
> 
> I think this function contains a few more cases for patch 4.

I'm probably going to follow Stefan's proposal of adding (even generic)
error messages to all drivers anyway.

Max
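
Note for archive readers: the change from error_free() to error_propagate() in the quoted hunk means a failure detail is handed to the caller instead of being discarded. The sketch below shows the idea with a hypothetical minimal error type. ErrSketch and the err_* helpers are local stand-ins written for this note and are not QEMU's Error API.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical minimal error object: a heap-allocated message. */
typedef struct {
    char *msg;
} ErrSketch;

static ErrSketch *err_new(const char *msg)
{
    ErrSketch *e = malloc(sizeof(*e));
    assert(e);
    e->msg = malloc(strlen(msg) + 1);
    assert(e->msg);
    strcpy(e->msg, msg);
    return e;
}

static void err_free(ErrSketch *e)
{
    if (e) {
        free(e->msg);
        free(e);
    }
}

/* Hand local_err to the caller if it asked for errors (dst != NULL
 * and still empty); otherwise release it so nothing leaks. */
static void err_propagate(ErrSketch **dst, ErrSketch *local_err)
{
    if (!local_err) {
        return;
    }
    if (dst && !*dst) {
        *dst = local_err;
    } else {
        err_free(local_err);
    }
}

/* Stand-in for a truncate callback whose helper reports a failure. */
static int truncate_sketch(ErrSketch **errp)
{
    ErrSketch *local_err = err_new("capacity read failed");

    err_propagate(errp, local_err);
    return -EIO;
}
```

A caller that passes NULL still gets the numeric return code, and a caller that passes a pointer additionally receives the message and becomes responsible for freeing it.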





Re: [Qemu-devel] [PATCH v2 for-2.10 4/4] block: Add some bdrv_truncate() error messages

2017-03-28 Thread Max Reitz
On 23.03.2017 19:03, Kevin Wolf wrote:
> Am 08.03.2017 um 20:15 hat Max Reitz geschrieben:
>> Add missing error messages for the drivers I am comfortable to do this
>> in.
>>
>> Since one of these changes touches a mis-indented block in
>> block/file-posix.c, this patch fixes that coding style issue along the
>> way.
>>
>> Signed-off-by: Max Reitz 
> 
>> diff --git a/block/qcow2.c b/block/qcow2.c
>> index 17585fbb89..53b0bd61a7 100644
>> --- a/block/qcow2.c
>> +++ b/block/qcow2.c
>> @@ -2550,6 +2550,7 @@ static int qcow2_truncate(BlockDriverState *bs, int64_t offset, Error **errp)
>>  new_l1_size = size_to_l1(s, offset);
>>  ret = qcow2_grow_l1_table(bs, new_l1_size, true);
>>  if (ret < 0) {
>> +error_setg(errp, "Failed to grow the L1 table");
> 
> Let's not throw away error codes, error_setg_errno() is your friend.

:-)

Will do. Not sure why I haven't.

Max

>>  return ret;
>>  }
>>  
>> @@ -2558,6 +2559,7 @@ static int qcow2_truncate(BlockDriverState *bs, int64_t offset, Error **errp)
>>  ret = bdrv_pwrite_sync(bs->file, offsetof(QCowHeader, size),
>>                         &offset, sizeof(uint64_t));
>>  if (ret < 0) {
>> +error_setg(errp, "Failed to update the image size");
> 
> Here, too.
> 
>>  return ret;
>>  }
>>  
>> diff --git a/block/qed.c b/block/qed.c
>> index fa2aeee471..eb346d645b 100644
>> --- a/block/qed.c
>> +++ b/block/qed.c
>> @@ -1526,11 +1526,12 @@ static int bdrv_qed_truncate(BlockDriverState *bs, int64_t offset, Error **errp)
>>  
>>  if (!qed_is_image_size_valid(offset, s->header.cluster_size,
>>   s->header.table_size)) {
>> +error_setg(errp, "Invalid image size specified");
>>  return -EINVAL;
>>  }
>>  
>> -/* Shrinking is currently not supported */
>>  if ((uint64_t)offset < s->header.image_size) {
>> +error_setg(errp, "Shrinking images is currently not supported");
>>  return -ENOTSUP;
>>  }
>>  
>> @@ -1539,6 +1540,7 @@ static int bdrv_qed_truncate(BlockDriverState *bs, int64_t offset, Error **errp)
>>  ret = qed_write_header_sync(s);
>>  if (ret < 0) {
>>  s->header.image_size = old_image_size;
>> +error_setg(errp, "Failed to update the image size");
> 
> As well as here.
> 
>>  }
>>  return ret;
>>  }
> 
> Kevin
> 
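
Note for archive readers: the review comment asks that the errno value not be thrown away when the message is set. A minimal sketch of the idea follows, assuming a hypothetical fixed-size message buffer. ErrSketch, err_set_errno() and grow_table_sketch() are names made up for this note; the real helper mentioned in the mail is error_setg_errno(), whose implementation is not reproduced here.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for an error object: just a message buffer. */
typedef struct {
    char msg[256];
} ErrSketch;

/* Format "<what>: <strerror(os_errno)>" so the cause is not lost.
 * This is a local helper for illustration only. */
static void err_set_errno(ErrSketch *err, int os_errno, const char *what)
{
    assert(err && what);
    snprintf(err->msg, sizeof(err->msg), "%s: %s", what, strerror(os_errno));
}

/* A caller holding a negative errno-style return value. */
static int grow_table_sketch(int ret, ErrSketch *err)
{
    if (ret < 0) {
        err_set_errno(err, -ret, "Failed to grow the L1 table");
    }
    return ret;
}
```

The user then sees both what the driver was trying to do and why the underlying operation failed, instead of only the former.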






Re: [Qemu-devel] [PATCH v2 for-2.10 2/4] block: Add errp to b{lk, drv}_truncate()

2017-03-28 Thread Max Reitz
On 23.03.2017 18:46, Kevin Wolf wrote:
> Am 08.03.2017 um 20:14 hat Max Reitz geschrieben:
>> For one thing, this allows us to drop the error message generation from
>> qemu-img.c and blockdev.c and instead have it unified in
>> bdrv_truncate().
>>
>> Signed-off-by: Max Reitz 
> 
>> diff --git a/block/qcow2.c b/block/qcow2.c
>> index 6a92d2ef3f..43b8a986f0 100644
>> --- a/block/qcow2.c
>> +++ b/block/qcow2.c
>> @@ -2294,9 +2294,8 @@ static int qcow2_create2(const char *filename, int64_t total_size,
>>  }
>>  
>>  /* Okay, now that we have a valid image, let's give it the right size */
>> -ret = blk_truncate(blk, total_size);
>> +ret = blk_truncate(blk, total_size, errp);
>>  if (ret < 0) {
>> -error_setg_errno(errp, -ret, "Could not resize image");
> 
> Maybe error_prepend(errp, "Could not resize image: ") could make sense?

Sure, why not.

Max

>>  goto out;
>>  }
> 
> Kevin
> 
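
Note for archive readers: the prepend suggestion keeps the detailed message produced by the lower layer and adds the caller's context in front of it. A small sketch of that idea follows, assuming a hypothetical fixed-size message buffer. ErrSketch and err_prepend() are local illustrations, not QEMU's error_prepend().

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef struct {
    char msg[256];
} ErrSketch;

/* Put caller context in front of an existing message, keeping the
 * detail produced lower in the call chain. Overlong results are
 * truncated to the buffer size. */
static void err_prepend(ErrSketch *err, const char *prefix)
{
    char tmp[sizeof(err->msg)];

    assert(err && prefix);
    snprintf(tmp, sizeof(tmp), "%s%s", prefix, err->msg);
    memcpy(err->msg, tmp, sizeof(tmp));
}
```

Compared with overwriting the message in the caller, this preserves the specific reason while still telling the user which high-level operation failed.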






Re: [Qemu-devel] [PATCH v3 for-2.9?] virtio: fix vring_align() on 64-bit windows

2017-03-28 Thread Stefan Weil

Am 28.03.2017 um 20:56 schrieb Andrew Baumann:

From: Eric Blake [mailto:ebl...@redhat.com]

Is this needed for 2.9?


Yes, it's a correctness bug that avoids miscompilation on 64-bit targets
where long is 32 bits (which, at the moment, is really just Windows).


I agree, this should be in 2.9. I dropped the tag by accident.


I wonder why I never before noticed
a problem or got a bug report for this issue.


Probably because so few people are testing on native Windows, and it
doesn't affect other platforms.


In addition to that, you only notice it on virtio devices mapped above the 
32-bit limit...


I think that is the reason why most people don't get
that problem.

I also think that only a few people are testing on Windows,
but there seem to be more people than expected who simply
use it. Most of them will never complain when they have a
problem, but sometimes I also get e-mails which report
an issue.

By the way: I expect that more Windows users will be
attracted as soon as the HAXM acceleration works better
(Intel is just preparing a new HAXM version which fixes
CPUID, something which was reported to me by a Windows
user).

Stefan





Re: [Qemu-devel] [PATCH v3 for-2.9?] virtio: fix vring_align() on 64-bit windows

2017-03-28 Thread Philippe Mathieu-Daudé

Hi, I never received Andrew Baumann's mail via the ML...

On 03/28/2017 03:38 PM, Stefan Weil wrote:

Am 25.03.2017 um 00:19 schrieb Andrew Baumann:

long is 32-bits on 64-bit windows, which caused the top half of the
address to be truncated; this patch changes it to use the
QEMU_ALIGN_UP macro which does not suffer the same problem

Signed-off-by: Andrew Baumann 
Reviewed-by: Eric Blake 


Reviewed-by: Philippe Mathieu-Daudé 


---
 include/hw/virtio/virtio.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index 15efcf2..7b6edba 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -34,7 +34,7 @@ struct VirtQueue;
 static inline hwaddr vring_align(hwaddr addr,
  unsigned long align)
 {
-return (addr + align - 1) & ~(align - 1);
+return QEMU_ALIGN_UP(addr, align);
 }

 typedef struct VirtQueue VirtQueue;



Eric added "for-2.9" to the subject line of v2, but now it was
missing again for v3.

Is this needed for 2.9? I wonder why I never before noticed
a problem or got a bug report for this issue.

Regards
Stefan






[Qemu-devel] [PATCH v2 1/1] slirp: add SOCKS5 support

2017-03-28 Thread Laurent Vivier
When the VM is used behind a firewall, this allows
the use of a SOCKS5 proxy server to connect the VM IP stack
directly to the Internet.

This implementation doesn't manage UDP packets, so they
are simply dropped (as with restrict=on), except for
the localhost as we need it for DNS.

Signed-off-by: Laurent Vivier 
---
 net/slirp.c |  39 +++-
 qapi-schema.json|   9 ++
 qemu-options.hx |  11 ++
 slirp/Makefile.objs |   2 +-
 slirp/ip_icmp.c |   2 +-
 slirp/libslirp.h|   3 +
 slirp/slirp.c   |  66 +++-
 slirp/slirp.h   |   6 ++
 slirp/socket.h  |   4 +
 slirp/socks5.c  | 284 
 slirp/socks5.h  |  85 
 slirp/tcp_subr.c|  21 +++-
 slirp/udp.c |   9 ++
 slirp/udp6.c|   2 +-
 14 files changed, 531 insertions(+), 12 deletions(-)
 create mode 100644 slirp/socks5.c
 create mode 100644 slirp/socks5.h

diff --git a/net/slirp.c b/net/slirp.c
index f97ec23..8a5dc3f 100644
--- a/net/slirp.c
+++ b/net/slirp.c
@@ -41,6 +41,7 @@
 #include "sysemu/sysemu.h"
 #include "qemu/cutils.h"
 #include "qapi/error.h"
+#include "crypto/secret.h"
 
 static int get_str_sep(char *buf, int buf_size, const char **pp, int sep)
 {
@@ -139,6 +140,33 @@ static void net_slirp_cleanup(NetClientState *nc)
 QTAILQ_REMOVE(&slirp_stacks, s, entry);
 }
 
+static int net_slirp_add_proxy(SlirpState *s, const char *proxy_server,
+   const char *proxy_user,
+   const char *proxy_secretid)
+{
+InetSocketAddress *addr;
+char *password = NULL;
+int ret;
+
+if (proxy_server == NULL) {
+return 0;
+}
+
+if (proxy_secretid) {
+password = qcrypto_secret_lookup_as_utf8(proxy_secretid, &error_fatal);
+}
+
+addr = inet_parse(proxy_server, &error_fatal);
+
+ret = slirp_add_proxy(s->slirp, addr->host, atoi(addr->port),
+  proxy_user, password);
+
+qapi_free_InetSocketAddress(addr);
+g_free(password);
+
+return ret;
+}
+
 static NetClientInfo net_slirp_info = {
 .type = NET_CLIENT_DRIVER_USER,
 .size = sizeof(SlirpState),
@@ -155,7 +183,8 @@ static int net_slirp_init(NetClientState *peer, const char *model,
   const char *bootfile, const char *vdhcp_start,
   const char *vnameserver, const char *vnameserver6,
   const char *smb_export, const char *vsmbserver,
-  const char **dnssearch)
+  const char **dnssearch, const char *proxy_server,
+  const char *proxy_user, const char *proxy_secretid)
 {
 /* default settings according to historic slirp */
 struct in_addr net  = { .s_addr = htonl(0x0a000200) }; /* 10.0.2.0 */
@@ -361,6 +390,11 @@ static int net_slirp_init(NetClientState *peer, const char *model,
 }
 #endif
 
+if (net_slirp_add_proxy(s, proxy_server,
+proxy_user, proxy_secretid) < 0) {
+goto error;
+}
+
 s->exit_notifier.notify = slirp_smb_exit;
 qemu_add_exit_notifier(&s->exit_notifier);
 return 0;
@@ -878,7 +912,8 @@ int net_init_slirp(const Netdev *netdev, const char *name,
  user->ipv6_host, user->hostname, user->tftp,
  user->bootfile, user->dhcpstart,
  user->dns, user->ipv6_dns, user->smb,
- user->smbserver, dnssearch);
+ user->smbserver, dnssearch, user->proxy_server,
+ user->proxy_user, user->proxy_secretid);
 
 while (slirp_configs) {
 config = slirp_configs;
diff --git a/qapi-schema.json b/qapi-schema.json
index b921994..1799ae2 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -3658,6 +3658,12 @@
 #
 # @guestfwd: forward guest TCP connections
 #
+# @proxy-server: address of the SOCKS5 proxy server to use (since 2.10)
+#
+# @proxy-user: username to use with the proxy server (since 2.10)
+#
+# @proxy-secretid: secret id to use for the proxy server password (since 2.10)
+#
 # Since: 1.2
 ##
 { 'struct': 'NetdevUserOptions',
@@ -3680,6 +3686,9 @@
 '*ipv6-dns': 'str',
 '*smb':   'str',
 '*smbserver': 'str',
+'*proxy-server': 'str',
+'*proxy-user':   'str',
+'*proxy-secretid': 'str',
 '*hostfwd':   ['String'],
 '*guestfwd':  ['String'] } }
 
diff --git a/qemu-options.hx b/qemu-options.hx
index 99af8ed..e625d1a 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -1645,6 +1645,7 @@ DEF("netdev", HAS_ARG, QEMU_OPTION_netdev,
 #ifndef _WIN32
  "[,smb=dir[,smbserver=addr]]\n"
 #endif
+" [,proxy-server=addr:port[,proxy-user=user,proxy-secretid=id]]\n"
 "configure a user mode network backend with ID 'str',\n"
 "its DHCP server and optional services\n"
 #endif
@@ 

[Qemu-devel] [PATCH v2 0/1] slirp: add SOCKS5 support

2017-03-28 Thread Laurent Vivier
This patch implements the SOCKS5 client part for "-net user" backend.

It routes all Internet traffic of the virtual machine
through a SOCKS5 server.

Local traffic (to the host), however, is still sent to the host.
This is needed because this SOCKS5 client doesn't route UDP traffic,
and it lets the guest use the host DNS server.
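
For readers unfamiliar with the wire format, here is a minimal sketch of
the first two client messages defined by RFC 1928 (method negotiation and
a CONNECT request for an IPv4 destination). It is not taken from
slirp/socks5.c; the function and constant names below are hypothetical and
only illustrate the protocol layout.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Constants from RFC 1928; the identifiers themselves are illustrative. */
enum {
    SOCKS5_VERSION       = 0x05,
    SOCKS5_AUTH_NONE     = 0x00,
    SOCKS5_AUTH_USERPASS = 0x02,
    SOCKS5_CMD_CONNECT   = 0x01,
    SOCKS5_ATYP_IPV4     = 0x01
};

/*
 * Build the method-selection message: VER, NMETHODS, METHODS...
 * Offers "no authentication" and, if with_auth is non-zero, also
 * username/password. Returns the number of bytes written, or 0 if
 * the buffer is too small.
 */
static size_t socks5_build_greeting(uint8_t *buf, size_t len, int with_auth)
{
    size_t n = with_auth ? 4 : 3;

    if (len < n) {
        return 0;
    }
    buf[0] = SOCKS5_VERSION;
    buf[1] = with_auth ? 2 : 1;
    buf[2] = SOCKS5_AUTH_NONE;
    if (with_auth) {
        buf[3] = SOCKS5_AUTH_USERPASS;
    }
    return n;
}

/*
 * Build a CONNECT request for an IPv4 destination:
 * VER, CMD, RSV, ATYP, DST.ADDR (4 bytes), DST.PORT (network byte order).
 * Returns the number of bytes written, or 0 if the buffer is too small.
 */
static size_t socks5_build_connect_ipv4(uint8_t *buf, size_t len,
                                        const uint8_t addr[4], uint16_t port)
{
    if (len < 10) {
        return 0;
    }
    buf[0] = SOCKS5_VERSION;
    buf[1] = SOCKS5_CMD_CONNECT;
    buf[2] = 0x00;                    /* reserved */
    buf[3] = SOCKS5_ATYP_IPV4;
    memcpy(&buf[4], addr, 4);
    buf[8] = (uint8_t)(port >> 8);    /* high byte first */
    buf[9] = (uint8_t)(port & 0xff);
    return 10;
}
```

SOCKS5 also defines a UDP ASSOCIATE command, which this sketch (like the
patch, per the description above) leaves out.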

I've tested this using a public SOCKS5 proxy list found on the web, and
using a TOR server on my host.

Used with TOR, all TCP connections are sent to the TOR network, which
places the virtual machine directly in the TOR network without
any further configuration in the virtual machine.

But be aware that all DNS requests are sent to the host, which can
forward them to the Internet with its own IP address. So confidentiality
will not be as good as with the TOR browser, which hides all DNS
requests in the TOR network.

If you want to test this:

- with a public SOCKS5 server, ask google for "socks5 proxy address"
  and start QEMU with, for instance:

  qemu-system-x86_64 -net nic,model=e1000 -net user,proxy-server=46.105.121.37:63066 ...

  If needed, you can provide a user/password using the secret objects framework:
  "-object secret,id=sec0,data=password,format=raw \
   -net user,...,proxy-user=user,proxy-secretid=sec0"

- with a local TOR proxy:

  sudo systemctl start tor
  qemu-system-x86_64 -net nic,model=e1000 -net user,proxy-server=localhost:9050 ...

You can check that your IP address is that of the proxy by connecting
to http://check.torproject.org with a browser inside the VM.

v2:
  - use secret objects framework to provide password
  - add documentation for new parameters in qapi-schema.json
  - s/passwd/password/g
  - I didn't move proxy parameters to a substruct as I didn't
    find a way to set them from the command line :(

Laurent Vivier (1):
  slirp: add SOCKS5 support

 net/slirp.c |  39 +++-
 qapi-schema.json|   9 ++
 qemu-options.hx |  11 ++
 slirp/Makefile.objs |   2 +-
 slirp/ip_icmp.c |   2 +-
 slirp/libslirp.h|   3 +
 slirp/slirp.c   |  66 +++-
 slirp/slirp.h   |   6 ++
 slirp/socket.h  |   4 +
 slirp/socks5.c  | 284 
 slirp/socks5.h  |  85 
 slirp/tcp_subr.c|  21 +++-
 slirp/udp.c |   9 ++
 slirp/udp6.c|   2 +-
 14 files changed, 531 insertions(+), 12 deletions(-)
 create mode 100644 slirp/socks5.c
 create mode 100644 slirp/socks5.h

-- 
2.9.3




Re: [Qemu-devel] [RFC] Split migration bitmaps by ramblock

2017-03-28 Thread Dr. David Alan Gilbert
* Juan Quintela (quint...@redhat.com) wrote:
> Hi
> 
> This series split the migration and unsent bitmaps by ramblock.  This
> makes it easier to synchronize in small bits.  This is on top of the
> RAMState and not-hotplug series.

So I think generally this is a good idea; my main reason is I'd like
to see per-NUMA node syncing, preferably tied with multi-fd so that
each of the NUMA nodes syncs and stuffs data over its own NIC.
Although we would still have to be careful about cases where one
node is hammering its RAM and the others are idling.

> Why?
> 
> reason 1:
> 
> People have complained that by the time that we detect that a page is
> sent, it has already been marked dirty "again" inside kvm, so we are
> going to send it again.  On top of this patch, my idea is, for words
> of the bitmap that have any bit set, just synchronize the bitmap before
> sending the pages.  I have not looked into performance numbers yet,
> just asking for comments about how it is done.
> 
> reason 2:
> 
> In case where the host page is a multiple of the TARGET_PAGE_SIZE,
> we do a lot of work when we are synchronizing the bitmaps to pass it
> to target page size.  The idea is to change the bitmaps on that
> RAMBlocks to mean host page size and not TARGET_PAGE_SIZE.
> 
> Note that there are two reasons for this: ARM and PPC do things like
> guests with 4kb pages on hosts with 16/64kb pages, and then we have
> HugePages.  Note all the workarounds that postcopy has to do because
> it has to work in HugePages size.

There are some fun problems with changing the bitmap page size;
off the top of my head, the ones I can remember include:
a) I'm sure I've seen rare cases where a target page is marked as
   dirty inside a hostpage; I'm guessing that was qemu's doing, but
   there are more subtle cases, e.g. running a 4kb guest on a 64kb host;
   it's legal - and 4kb power guests used to exist;  I think in those
   cases you see KVM only marking one target page as dirty.

b) Are we required to support migration across hosts of different pagesize;
   and if we do that what size should a bit represent?
   People asked about it during postcopy but I think it's restricted to
   matching sizes.  I don't think precopy has any requirement for matching
   host pagesize at the moment.  64bit ARM does 4k, 64k and I think 16k was
   added later.

c) Hugepages have similar issues; precopy doesn't currently have any
   requirement for the hugepage selection on the two hosts to match,
   but it does on postcopy.  Also you don't want to have a single dirty
   bit for a 1GB host hugepage if you can handle detecting changes at
   a finer grain level.
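
To make the granularity concern in (a) and (c) concrete, here is a minimal
sketch of what folding a per-target-page dirty bitmap into a per-host-page
bitmap does. This is not QEMU code; fold_to_host_pages() and its parameters
are hypothetical names. The point is that one dirty target page marks the
whole host page dirty, so with a 1GB hugepage and 4kb target pages a single
write would cost a 1GB resend.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Fold a bitmap with one bit per target page into a bitmap with one bit
 * per host page. 'ratio' is the number of target pages per host page and
 * must be non-zero. 'host' must have room for ceil(ntarget / ratio) bits;
 * it is cleared first and then filled in.
 */
static void fold_to_host_pages(const uint64_t *target, size_t ntarget,
                               size_t ratio, uint64_t *host)
{
    size_t nhost, i, p;

    if (ratio == 0) {
        return;
    }
    nhost = (ntarget + ratio - 1) / ratio;
    for (i = 0; i < (nhost + 63) / 64; i++) {
        host[i] = 0;
    }
    for (p = 0; p < ntarget; p++) {
        if ((target[p / 64] >> (p % 64)) & 1) {
            size_t h = p / ratio;   /* host page containing target page p */
            host[h / 64] |= (uint64_t)1 << (h % 64);
        }
    }
}
```

With ratio = 16 (4kb target pages on a 64kb host), a single dirty target
page at index 17 sets host bit 1, and the information about which of the
16 target pages changed is lost.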

Dave
> Please, comment?
> 
> Later, Juan.
> 
> Juan Quintela (1):
>   ram: Split dirty bitmap by RAMBlock
> 
>  include/exec/ram_addr.h |  13 +++-
>  migration/ram.c | 201 
> ++--
>  2 files changed, 85 insertions(+), 129 deletions(-)
> 
> -- 
> 2.9.3
> 
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK



Re: [Qemu-devel] [PATCH] Create libqemutrace.a for all trace.o

2017-03-28 Thread Xu, Anthony
> >>> ./trace.o, ./qapi/trace.o and ./util/trace.o are added into
> >>> libqemuutil.a  to avoid recursive dependencies between
> >>> libqemuutil.a and libqemutrace.a.
> >> Why would libqemutrace.a depend on libqemuutil.a?
> > Each trace.c calls trace_event_register_group to register events,
> > trace_event_register_group is defined in trace/control.c , which
> > is linked into libqemuutil.a.
> 
> Ah:
> 
> util-obj-$(CONFIG_TRACE_SIMPLE) += simple.o
> util-obj-$(CONFIG_TRACE_FTRACE) += ftrace.o
> util-obj-y += control.o
> util-obj-y += qmp.o
> 
> With the introduction of libqemutrace.a, I believe these should be moved
> into libqemutrace.a.

Agreed,
but it doesn't solve the infinite recursion issue. libqemutrace.a needs
register_module_init, which is defined in util/module.c.

It is hard to remove libqemutrace.a's dependency on libqemuutil.a.

Removing libqemuutil.a's dependency on libqemutrace.a is feasible.
As I did in this patch, include all util-related trace.o files
in libqemuutil.a.

The other simple way is to include all trace.o files in libqemuutil.a.

What's your opinion?

Thanks
Anthony



Re: [Qemu-devel] [PATCH v3 for-2.9?] virtio: fix vring_align() on 64-bit windows

2017-03-28 Thread Andrew Baumann via Qemu-devel
> From: Eric Blake [mailto:ebl...@redhat.com]
> Sent: Tuesday, 28 March 2017 11:52
> 
> On 03/28/2017 01:38 PM, Stefan Weil wrote:
> > Am 25.03.2017 um 00:19 schrieb Andrew Baumann:
> >> long is 32-bits on 64-bit windows, which caused the top half of the
> >> address to be truncated; this patch changes it to use the
> >> QEMU_ALIGN_UP macro which does not suffer the same problem
> >>
> >> Signed-off-by: Andrew Baumann 
> >> Reviewed-by: Eric Blake 
> >> ---
> 
> > Eric added "for-2.9" to the subject line of v2, but now it was
> > missing again for v3.
> >
> > Is this needed for 2.9?
> 
> Yes, it's a correctness fix that avoids miscompilation on 64-bit targets
> where long is 32 bits (which, at the moment, is really just Windows).

I agree, this should be in 2.9. I dropped the tag by accident.

> > I wonder why I never before noticed
> > a problem or got a bug report for this issue.
> 
> Probably because so few people are testing on native Windows, and it
> doesn't affect other platforms.

In addition to that, you only notice it on virtio devices mapped above the 
32-bit limit...
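
For anyone wondering why only addresses above 4GB are affected, here is a
minimal standalone sketch of the truncation. It is not the QEMU source:
win_ulong is a hypothetical stand-in for "unsigned long" on 64-bit Windows
(32 bits wide), and align_div() assumes the divide-and-multiply form of
rounding up, which is how I understand QEMU_ALIGN_UP to behave.

```c
#include <stdint.h>

/* Stand-in for "unsigned long" on 64-bit Windows, where it is 32 bits. */
typedef uint32_t win_ulong;

/*
 * Old expression. ~(align - 1) is evaluated in 32 bits and then
 * zero-extended to 64 bits, so the mask becomes 0x00000000fffff000
 * for align = 4096 and the upper half of the address is cleared.
 */
static uint64_t align_mask(uint64_t addr, win_ulong align)
{
    return (addr + align - 1) & ~(align - 1);
}

/*
 * Divide-and-multiply form. All arithmetic happens in 64 bits because
 * addr is 64 bits wide, so no bits are lost.
 */
static uint64_t align_div(uint64_t addr, win_ulong align)
{
    return (addr + align - 1) / align * align;
}
```

With addr = 0x100000001 and align = 4096, align_mask() yields 0x1000 while
align_div() yields 0x100001000. Below the 32-bit limit both agree, which
matches the observation that only devices mapped above it misbehave.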

Andrew


Re: [Qemu-devel] [PATCH for 2.9?] tap-win32: don't abort in tap_enable(); enables -netdev tap

2017-03-28 Thread Andrew Baumann via Qemu-devel
> From: Stefan Weil [mailto:s...@weilnetz.de]
> Sent: Tuesday, 28 March 2017 11:28

> Am 25.03.2017 um 00:46 schrieb Andrew Baumann:
> > The docs generally steer users away from using the legacy -net
> > parameter, however on win32 attempting to enable a tap device using
> > -netdev tap fails at an abort() in tap_enable(). Removing the abort()s
> > seems to be enough to get everything working, so do that.
> >
> > Signed-off-by: Andrew Baumann 
> > ---
> >  net/tap-win32.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/net/tap-win32.c b/net/tap-win32.c
> > index 662f9b6..3620843 100644
> > --- a/net/tap-win32.c
> > +++ b/net/tap-win32.c
> > @@ -811,10 +811,10 @@ int net_init_tap(const Netdev *netdev, const char *name,
> >
> >  int tap_enable(NetClientState *nc)
> >  {
> > -abort();
> > +return 0;
> >  }
> >
> >  int tap_disable(NetClientState *nc)
> >  {
> > -abort();
> > +return 0;
> >  }
> 
> As I never worked with TAP on Windows, I cannot say much to this fix.
> 
> Jason, what is the use of tap_enable, tap_disable? Is it fine
> to simply do nothing on Windows here?

I was also hoping for a review -- I'm no expert on this stuff either, but my 
quick reading of those code paths is that they issue ioctls to enable/disable 
packet reception on the underlying tap device. As win32 TAP is implemented, 
that is already enabled from start of day.

It's possible this patch still does not permit dynamic reconfiguration of tap 
devices (e.g. from the monitor console). However, it does work with the -netdev 
tap option on the command-line.

> And is this something for QEMU 2.9 (I added question to subject line)?

Ideally, yes. If not, -netdev tap will continue to blow up in the abort as it 
does today...

Andrew


Re: [Qemu-devel] [PATCH v3 for-2.9?] virtio: fix vring_align() on 64-bit windows

2017-03-28 Thread Eric Blake
On 03/28/2017 01:38 PM, Stefan Weil wrote:
> Am 25.03.2017 um 00:19 schrieb Andrew Baumann:
>> long is 32-bits on 64-bit windows, which caused the top half of the
>> address to be truncated; this patch changes it to use the
>> QEMU_ALIGN_UP macro which does not suffer the same problem
>>
>> Signed-off-by: Andrew Baumann 
>> Reviewed-by: Eric Blake 
>> ---

> Eric added "for-2.9" to the subject line of v2, but now it was
> missing again for v3.
> 
> Is this needed for 2.9?

Yes, it's a correctness fix that avoids miscompilation on 64-bit targets
where long is 32 bits (which, at the moment, is really just Windows).

> I wonder why I never before noticed
> a problem or got a bug report for this issue.

Probably because so few people are testing on native Windows, and it
doesn't affect other platforms.

-- 
Eric Blake   eblake redhat com   +1-919-301-3266
Libvirt virtualization library http://libvirt.org





Re: [Qemu-devel] [RFC] Split migration bitmaps by ramblock

2017-03-28 Thread Dr. David Alan Gilbert
* Yang Hongyang (yanghongy...@huawei.com) wrote:
> 
> 
> On 2017/3/24 16:34, Juan Quintela wrote:
> > Yang Hongyang  wrote:
> >> Hi Juan,
> >>
> >>   First of all, I like the refactor patchset about RAMState, it makes
> >> things clean, great!
> > 
> > Thanks.
> > 
> > The whole idea of the series was to make testing changes easier.
> > 
> >> On 2017/3/24 5:01, Juan Quintela wrote:
> >>> Hi
> >>>
> >>> This series split the migration and unsent bitmaps by ramblock.  This
> >>> makes it easier to synchronize in small bits.  This is on top of the
> >>> RAMState and not-hotplug series.
> >>>
> >>> Why?
> >>>
> >>> reason 1:
> >>>
> >>> People have complained that by the time that we detect that a page is
> >>> sent, it has already been marked dirty "again" inside kvm, so we are
> >>> going to send it again.  On top of this patch, my idea is, for words
> >>> of the bitmap that have any bit set, just synchronize the bitmap before
> >>> sending the pages.  I have not looked into performance numbers yet,
> >>> just asking for comments about how it is done.
> >>
> >> Here you said 'synchronize the bitmap before sending the pages', do you
> >> mean synchronize the bitmap from kvm? If so, I doubt the performance...
> >> because every synchronization will require an ioctl(). If not, the
> >> per-block synchronization is useless.
> >>
> >> Currently, migration thread will synchronize the bitmap from kvm every
> >> iter(migration_bitmap_sync()). The RAMBlock already has kind of per block
> >> bitmap for this kind of sync. And the migration bitmap is used to put all
> >> those per block bitmap together for data sending use.
> > 
> > Hi
> > For huge memory machines, we are doing it always in one go.
> > 
> > 
> > bitmap_sync(1TB RAM)
> > walk bitmap for 512MB of RAM, at that point, it is very probable that
> > this page is again dirty in the KVM bitmap, so, we send it, but as it is
> > dirty again, we would have to send it in the next pass.  This sent is
> > completely useless.
> 
> I got your point, the problem is that KVM does not currently have the
> ability to sync in small chunks, and even if it had, it would generate
> lots of ioctls: KVM bitmap sync is per Memory Region IIRC. Say we have a
> 1T mem guest with 16 MRs, for example; currently, every iter we need to
> do 16 ioctls. But if we sync in small chunks, 64M for example, we might
> need to do 16384 ioctls in the worst case, e.g. when the memory pressure
> (dirty rate) is very high.

Yes, but I think people have suggested for a long time that it should
have that ability; so hopefully we can add it.

I'd assumed the sizes of these chunks would be a bit bigger, a few GB each maybe,
and also assumed we'd have a separate thread doing the syncing, distinct
from the one doing the writing, trying to keep ahead of the
write pointer.

Even with the structure Juan has here, we could get to syncing each NUMA
node separately like that, which would kind of make sense.
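
The ioctl counts being discussed are simple arithmetic; a minimal sketch,
assuming one sync call per chunk and nothing else (sync_calls_per_pass()
is a hypothetical helper, not a QEMU function):

```c
#include <stdint.h>

/*
 * Number of sync calls needed to cover ram_size bytes when each call
 * covers chunk_size bytes (rounded up). Returns 0 for a zero chunk size
 * rather than dividing by zero.
 */
static uint64_t sync_calls_per_pass(uint64_t ram_size, uint64_t chunk_size)
{
    if (chunk_size == 0) {
        return 0;
    }
    return (ram_size + chunk_size - 1) / chunk_size;
}
```

For a 1T guest this gives 16384 calls with 64M chunks (2^40 / 2^26), which
is the worst case quoted above, but only 256 calls with 4G chunks.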

Dave

> 
> > 
> > So my idea is to split things in smaller chunks.  As we have to do an
> > ioctl, we wouldn't want to synchronize page by page, but perhaps 16MB at
> > a time, 64MB, anything less than the whole amount of memory.
> > 
> > Later, Juan.
> > 
> 
> -- 
> Thanks,
> Yang
> 
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK



[Qemu-devel] [PATCH for-2.9?] configure: Remove unused code (found by shellcheck)

2017-03-28 Thread Stefan Weil
smartcard_cflags is no longer needed since commit
0b22ef0f57a8910d849602bef0940edcd0553d2c.

Signed-off-by: Stefan Weil 
---

This is a very old patch which I just found in my patch queue.
It's not mandatory for 2.9, but I also think that there is
no risk in applying it.

Sorry for sending it so late.

Regards
Stefan

 configure | 1 -
 1 file changed, 1 deletion(-)

diff --git a/configure b/configure
index d1ce33bc79..b65ea2931c 100755
--- a/configure
+++ b/configure
@@ -4096,7 +4096,6 @@ EOF
 fi
 
 # check for smartcard support
-smartcard_cflags=""
 if test "$smartcard" != "no"; then
 if $pkg_config libcacard; then
 libcacard_cflags=$($pkg_config --cflags libcacard)
-- 
2.11.0




Re: [Qemu-devel] [PULL for-2.9] block: Declare blockdev-add and blockdev-del supported

2017-03-28 Thread Markus Armbruster
Paolo Bonzini  writes:

> On 28/03/2017 15:45, Markus Armbruster wrote:
>> It's been a long journey, but here we are.
>> 
>> The supported blockdev-add is not compatible to its experimental
>> predecessors; bump all Since: tags to 2.9.
>
> Can you document the differences in the 2.9 changelog?

Not sure I can before I drop off for a week of vacation.  Kevin, Max,
can you chip in?



Re: [Qemu-devel] [PATCH 42/51] ram: Pass RAMBlock to bitmap_sync

2017-03-28 Thread Juan Quintela
"Dr. David Alan Gilbert"  wrote:
> * Juan Quintela (quint...@redhat.com) wrote:
>> Yang Hongyang  wrote:
>> > On 2017/3/24 4:45, Juan Quintela wrote:
>> >> We change the meaning of start to be the offset from the beginning of
>> >> the block.
>> >> 
>> >> @@ -701,7 +701,7 @@ static void migration_bitmap_sync(RAMState *rs)
>> >>  qemu_mutex_lock(&rs->bitmap_mutex);
>> >>  rcu_read_lock();
>> >>  QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
>> >> -migration_bitmap_sync_range(rs, block->offset, block->used_length);
>> >> +migration_bitmap_sync_range(rs, block, 0, block->used_length);
>> If you have several terabytes of RAM that is too inefficient, because
>> when we arrive at page_send(page), it is possible that the page is already
>> dirty again, and we have to send it twice.  So, the idea is to change to
>> something like:
>> 
>> while(true) {
>> foreach(block)
>> bitmap_sync(block)
>> foreach(block)
>> foreach(64pages)
>> bitmap_sync(64pages)
>> foreach(page of the 64)
>>if (dirty)
>>   page_send(page)
>
> Yes, although it might be best to actually do the sync in a separate thread
> so that the sync is always a bit ahead of the thread doing the writing.

Doing it synchronously shouldn't be a problem.  But we should be able to
do it in smaller chunks.
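
A minimal sketch of the "sync 64 pages, then send them" loop, under loud
assumptions: dirty_log is a plain array standing in for KVM's dirty log
(no ioctl is modelled), BlockBitmaps, sync_word() and send_pass() are
hypothetical names, and this version syncs every word rather than only
words that already have a bit set.

```c
#include <stddef.h>
#include <stdint.h>

#define PAGES_PER_WORD 64

/* Hypothetical per-block state: one bit per page in each bitmap. */
typedef struct {
    uint64_t *dirty_log;   /* stand-in for the hypervisor's dirty log */
    uint64_t *mig;         /* migration bitmap: pages still to send */
    size_t nwords;
} BlockBitmaps;

typedef void (*send_fn)(size_t page, void *opaque);

/* Pull the dirty bits for one 64-page word into the migration bitmap. */
static void sync_word(BlockBitmaps *b, size_t w)
{
    b->mig[w] |= b->dirty_log[w];
    b->dirty_log[w] = 0;
}

/*
 * One pass over a block. Each word is synced immediately before its
 * pages are sent, so the window in which a page can be re-dirtied
 * between sync and send is 64 pages instead of the whole of RAM.
 * 'send' may be NULL, in which case pages are only counted.
 * Returns the number of pages sent.
 */
static size_t send_pass(BlockBitmaps *b, send_fn send, void *opaque)
{
    size_t sent = 0;
    size_t w;

    for (w = 0; w < b->nwords; w++) {
        uint64_t bits;
        size_t bit;

        sync_word(b, w);
        bits = b->mig[w];
        b->mig[w] = 0;
        for (bit = 0; bits != 0; bit++, bits >>= 1) {
            if (bits & 1) {
                if (send) {
                    send(w * PAGES_PER_WORD + bit, opaque);
                }
                sent++;
            }
        }
    }
    return sent;
}
```

Whether per-word syncing is affordable depends on the cost of the real
sync primitive, which is exactly the open question in this thread.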

Thanks, Juan.



Re: [Qemu-devel] [PATCH 11/51] ram: Move dup_pages into RAMState

2017-03-28 Thread Juan Quintela
Peter Xu  wrote:
> On Thu, Mar 23, 2017 at 09:45:04PM +0100, Juan Quintela wrote:
>> Once there rename it to its actual meaning, zero_pages.
>> 
>> Signed-off-by: Juan Quintela 
>> Reviewed-by: Dr. David Alan Gilbert 
>
> Reviewed-by: Peter Xu 
>
> Will post a question below though (not directly related to this patch
> but context-wide)...
>>  {
>>  int pages = -1;
>>  
>>  if (is_zero_range(p, TARGET_PAGE_SIZE)) {
>> -acct_info.dup_pages++;
>> +rs->zero_pages++;
>>  *bytes_transferred += save_page_header(f, block,
>> offset | RAM_SAVE_FLAG_COMPRESS);
>>  qemu_put_byte(f, 0);
>> @@ -822,11 +826,11 @@ static int ram_save_page(RAMState *rs, MigrationState *ms, QEMUFile *f,
>>  if (bytes_xmit > 0) {
>>  acct_info.norm_pages++;
>>  } else if (bytes_xmit == 0) {
>> -acct_info.dup_pages++;
>> +rs->zero_pages++;
>
> This code path looks suspicious... since iiuc currently it should only
> be triggered by RDMA case, and I believe here qemu_rdma_save_page()
> should have met something wrong (so that it didn't return with
> RAM_SAVE_CONTROL_DELAYED). Then is it correct we do increase zero page
> counting unconditionally here? (hmm, the default bytes_xmit is zero as
> well...)

My head hurts at this point.
ok.  bytes_xmit can only be zero if we called qemu_rdma_save_page() with
size=0 or there has been an RDMA error.  We never call the function with
size = 0.  And if there is an error, we are in very bad shape already.

> Another thing is that I see when RDMA is enabled we are updating
> accounting info with acct_update_position(), while we updated it here
> as well. Is this an issue of duplicated accounting?

I think stats and rdma are not right.  I have to check that more.

Thanks, Juan.



Re: [Qemu-devel] qemu-devel mailing list vs DMARC and microsoft.com's p=reject policy

2017-03-28 Thread Eric Blake
On 03/28/2017 01:28 PM, Michael S. Tsirkin wrote:
>>  (2) I could reconfigure mailman to try to not rewrite anything that
>>  we think is likely to be signed (in particular not the body or the
>>  subject)
>>* this means dropping the [qemu-devel] tag from the subject, which I'm
>>  a bit reluctant to do (it seems likely at least some readers are
>>  filtering on it, and personally I quite like it)
>>* if anybody DKIM-signs the Sender: header we're stuck anyway
> 
> For the record I'd strongly prefer this option - I tag all list mail
> and so "qemu-devel" appears twice: in subject and as a tag.
> Also, if mail is copied to another list, qemu-devel will
> still appear as gmail de-duplicates email by msg id.
> I can remove tags I don't care about but can't remove
> subject prefixes.

I'm ambivalent - I like the prefixes, but don't mind if they are not
present (it's easy enough to filter on List-Sender: when the prefix is
not reliable).  It's especially nice that the prefix lets me tell the
difference between mail sent to qemu-devel and qemu-block while still
dumping both lists into the same folder.

> Is there a way not to munge the name? It's currently rewritten to
> add "via qemu-devel" which confuses the clients which think
> it's part of the name, and can't be easily stripped away.

Not that I know of, but at least the munging only occurs for senders
with restrictive DMARC and not for ALL senders.

-- 
Eric Blake   eblake redhat com   +1-919-301-3266
Libvirt virtualization library http://libvirt.org





Re: [Qemu-devel] [PATCH v3 for-2.9?] virtio: fix vring_align() on 64-bit windows

2017-03-28 Thread Stefan Weil

Am 25.03.2017 um 00:19 schrieb Andrew Baumann:

long is 32-bits on 64-bit windows, which caused the top half of the
address to be truncated; this patch changes it to use the
QEMU_ALIGN_UP macro which does not suffer the same problem

Signed-off-by: Andrew Baumann 
Reviewed-by: Eric Blake 
---
 include/hw/virtio/virtio.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index 15efcf2..7b6edba 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -34,7 +34,7 @@ struct VirtQueue;
 static inline hwaddr vring_align(hwaddr addr,
  unsigned long align)
 {
-return (addr + align - 1) & ~(align - 1);
+return QEMU_ALIGN_UP(addr, align);
 }

 typedef struct VirtQueue VirtQueue;



Eric added "for-2.9" to the subject line of v2, but now it was
missing again for v3.

Is this needed for 2.9? I wonder why I never before noticed
a problem or got a bug report for this issue.

Regards
Stefan




Re: [Qemu-devel] qemu-devel mailing list vs DMARC and microsoft.com's p=reject policy

2017-03-28 Thread Eric Blake
On 03/28/2017 12:53 PM, Andrew Baumann via Qemu-devel wrote:

>>  (3) I could set dmarc_moderation_action to Munge From, which means that
>>  those senders who have a p=reject policy will get their mails
>>  rewritten to have a From="Whoever (via the list) "
>>  and their actual email in the Reply-to:
>>* if anybody's mail client doesn't honour Reply-to: then what they
>>  think is a personal reply will go to the list by accident

That's my favorite of the options (and these days, reply-to works a lot
better than it used to even 10 years ago)...

>>
>> For the moment I have picked option (3), but I'm open to argument
>> that we should pick something else.
> 
> Option 3 is a fine one from my perspective (I could also live with 2). This 
> email will hopefully help you test whether it's effective.

and it appears to have worked; Andrew's mail purported to be from the
list, but had a correct 'Reply-to', and my mailer (thunderbird) appears
to do the right thing for reply-to-all.

-- 
Eric Blake   eblake redhat com   +1-919-301-3266
Libvirt virtualization library http://libvirt.org





Re: [Qemu-devel] [PATCH 16/51] ram: Move iterations into RAMState

2017-03-28 Thread Juan Quintela
Peter Xu  wrote:
> On Thu, Mar 23, 2017 at 09:45:09PM +0100, Juan Quintela wrote:
>> Signed-off-by: Juan Quintela 

>> @@ -693,13 +694,13 @@ static void migration_bitmap_sync(RAMState *rs)
>>  }
>>  
>>  if (migrate_use_xbzrle()) {
>> -if (rs->iterations_prev != acct_info.iterations) {
>> +if (rs->iterations_prev != rs->iterations) {
>>  acct_info.xbzrle_cache_miss_rate =
>> (double)(acct_info.xbzrle_cache_miss -
>>  rs->xbzrle_cache_miss_prev) /
>> -   (acct_info.iterations - rs->iterations_prev);
>> +   (rs->iterations - rs->iterations_prev);
>
> Here we are calculating cache miss rate by xbzrle_cache_miss and
> iterations. However looks like xbzrle_cache_miss is counted per guest
> page (in save_xbzrle_page()) while the iteration count is per host
> page (in ram_save_iterate()). Then, what if the host page size does not
> equal the guest page size? E.g., when the host uses 2M huge pages, the host
> page size is 2M, while the guest page size can be 4K?

Good catch.  Will have to think about this.  You are right.  I will
change that later.
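
To illustrate the unit mismatch, here is a minimal sketch of one possible
normalization: scale the iteration count by the number of target pages per
host page before dividing. This is an assumption about how it could be
fixed, not what the series does; xbzrle_miss_rate() and its parameters are
hypothetical names.

```c
#include <stdint.h>

/*
 * Cache miss rate with both counts expressed in target pages.
 * miss_delta: misses since the last sync, counted per target page.
 * iter_delta: iterations since the last sync, counted per host page.
 * target_pages_per_host_page: e.g. 512 for 2M host pages and 4K target
 * pages, or 1 when the sizes match.
 */
static double xbzrle_miss_rate(uint64_t miss_delta, uint64_t iter_delta,
                               uint64_t target_pages_per_host_page)
{
    uint64_t pages = iter_delta * target_pages_per_host_page;

    return pages ? (double)miss_delta / (double)pages : 0.0;
}
```

With 256 misses in one iteration over a 2M host page, the unscaled ratio
is 256 (meaningless as a rate), while the scaled one is 0.5.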

Thanks, Juan.


