date:20180416

Re: [Qemu-devel] [PATCH v4 1/2] Implement .hex file loader

2018-04-16 Thread Su Hang

> -Original Messages-
> From: "Su Hang" 
> Sent Time: 2018-04-16 22:49:48 (Monday)
> To: "stefan hajnoczi" 
> Cc: "jim mussared" , qemu-devel 
> , "joel stanley" 
> Subject: Re: [Qemu-devel] [PATCH v4 1/2] Implement .hex file loader

> I have tried to use be16_to_cpu() the first time you suggested it to me,
> but qemu crashed...
> I have read other places call this function to verify I use it correctly,
> but it did crash(that's why I choose bswap16()), It's strange...
> I'll report more detail about this case tomorrow.

Oops, when I try be16_to_cpu() today, It works without any problems...
I don't know what to say, maybe I have misused someplace last time.

Re: [Qemu-devel] [PATCH 1/1] Make qemu-bridge-helper work in macOS and FreeBSD

2018-04-16 Thread Nikhil Balachandra

Hi Stefan,

Thanks for the review. I'll be sending v2 version of this patch in next 3-4
days with the following changes.

1) New scripts/update-xnu-headers.sh script to import if_bridgevar.h[4]
into include/standard-headers/xnu directory.
@qemu-devel - Please let me know if there is a better approach.
2) Fix broken vnet hdr in this patch.
3) Having minimal code in qemu-bridge-helper.c to open tap device instead
of using tap_open().

[4] - macOS if_bridgevar.h header file has APSL and BSD license.


On Mon, Apr 16, 2018 at 1:01 PM, Stefan Hajnoczi  wrote:

> On Sat, Apr 07, 2018 at 01:12:05AM +0530, Nikhil Balachandra wrote:
> > Eventhough macOS does not ship with the if_bridgevar.h header file[2],
> > I expect the API to remain stable as this header file is similar to what
> > is found in other BSDs. If this patch is decided to be included in the
> > qemu, can experienced qemu developers please tell me how to go about
> > having this header file in the include path such that it does not require
> > manually downloading and copying the file[3]?
>
> QEMU ships Linux headers.  They are synced using this script:
> scripts/update-linux-headers.sh
>
> If the macOS header is appropriately licensed, it could be kept under
> include/standard-headers/ alongside the other third-party headers that
> QEMU ships.
>
> > @@ -310,30 +374,18 @@ int main(int argc, char **argv)
> >  goto cleanup;
> >  }
> >
> > +
> >  /* open the tap device */
> > -fd = open("/dev/net/tun", O_RDWR);
> > +memset(, '\0', sizeof(char) * IFNAMSIZ);
> > +int vnet_supported = has_vnet_hdr(fd);
>
> fd is always -1 here, so this patch breaks vnet hdr?
>
> > +Error *err = NULL;
> > +fd = tap_open([0], sizeof(iface), _supported, use_vnet,
> 0, );
>
> tap_open() was not written with setuid programs in mind.  I think this
> is a case where code duplication is justified.
>
> It's safer to have the minimal code to open the tap device rather than
> calling into QEMU code which may not realize it is running setuid.
>

Re: [Qemu-devel] [PATCH v2 5/7] ACPICA: Integrate package handling with module-level code

2018-04-16 Thread Dan Williams

On Mon, Apr 16, 2018 at 5:05 PM, Schmauss, Erik  wrote:
>
>> -Original Message-
>> From: linux-acpi-ow...@vger.kernel.org [mailto:linux-acpi-
>> ow...@vger.kernel.org] On Behalf Of Dan Williams
>> Sent: Monday, April 16, 2018 4:22 PM
>> To: Schmauss, Erik 
>> Cc: Rafael J. Wysocki ; Linux ACPI > a...@vger.kernel.org>; Moore, Robert ; linux-
>> nvdimm ; Qemu Developers > de...@nongnu.org>
>> Subject: Re: [PATCH v2 5/7] ACPICA: Integrate package handling with module-
>> level code
>>
>> On Mon, Apr 16, 2018 at 4:15 PM, Schmauss, Erik 
>> wrote:
>> > [ trimming ]
>> >> >> Rafael, we may want to hold back on the module-level code changes
>> >> >> (the patches below) for rc1. Between this and the strange _TSS
>> >> >> issue, it seems like there are a few more things to resolve before
>> >> >> this is ready for kernel upstream.
>> >> >
>> > Hi Rafael,
>> >
>> >> > It looks like you are asking me to queue up reverts as per the
>> >> > Dan's report, is that correct?
>> >
>> > This is indeed what I meant last week. However, I've looked into the
>> > issue and Dan's qemu instance had AML that we no longer support. This
>> > is because the ACPICA commit makes changes to the execution of AML
>> > during table load to match windows AML interpreter behavior so this commit
>> also got rid of support for executing code containing forward references 
>> (except
>> for package elements).
>> >
>> > I've suggested a fix for the firmware in a separate email. So I would
>> > say that this issue is resolved after if Dan can run his test successfully 
>> > with the
>> adjusted firmware.
>> >
>> > If Dan's test is successful, we don’t need to revert these changes
>>
>
> Hi Dan,
>
>> I'm concerned about other qemu-kvm users that do not upgrade their hypervisor
>> at the same pace as their guest kernel. Especially for cloud providers that 
>> may
>> be running latest mainline kernel on older qemu-kvm this will look like a 
>> pure
>> kernel regression. Is there a quick fix we can carry in the kernel to 
>> support these
>> forward references, at least until we know that qemu-kvm is no longer 
>> shipping
>> the broken AML?
>
> This is a very good point. Thanks for bringing this up! One option is for 
> them to set
> the global variable acpi_gbl_execute_tables_as_methods in 
> include/acpi/acpixf.h to false.
> This will effectively revert the new behavior in the AML interpreter and go 
> back to the old way.
> Since this is a global flag, we could have a command line option for Linux 
> kernel to turn this
> feature on.
>
> Out of curiosity, is this ACPI table somehow customized for your work? I have 
> a collection
> of acpi tables and your ACPI tables are the only ones that have an 
> OperationRegion called
> NRAM. What is the chance that others will be running Linux on the same tables 
> as the one
> you sent me?

I don't think there's anything atypical about my particular setup. It
creates two virtual NVDIMMs that each represent a 30GB address space.
I suspect any user of the KVM NVDIMM virtualization would see the same
problem.

Re: [Qemu-devel] [PATCH for 2.13 v3 1/2] spapr: Add ibm, max-associativity-domains property

2018-04-16 Thread Bharata B Rao

On Mon, Apr 16, 2018 at 07:47:29PM +0300, Serhii Popovych wrote:
> Bharata B Rao wrote:
> > On Wed, Apr 11, 2018 at 02:41:59PM -0400, Serhii Popovych wrote:
> >> Now recent kernels (i.e. since linux-stable commit a346137e9142
> >> ("powerpc/numa: Use ibm,max-associativity-domains to discover possible 
> >> nodes")
> >> support this property to mark initially memory-less NUMA nodes as 
> >> "possible"
> >> to allow further memory hot-add to them.
> >>
> >> Advertise this property for pSeries machines to let guest kernels detect
> >> maximum supported node configuration and benefit from kernel side change
> >> when hot-add memory to specific, possibly empty before, NUMA node.
> >>
> >> Signed-off-by: Serhii Popovych 
> >> ---
> >>  hw/ppc/spapr.c | 10 ++
> >>  1 file changed, 10 insertions(+)
> >>
> >> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> >> index a81570e..c05bbad 100644
> >> --- a/hw/ppc/spapr.c
> >> +++ b/hw/ppc/spapr.c
> >> @@ -910,6 +910,13 @@ static void spapr_dt_rtas(sPAPRMachineState *spapr, 
> >> void *fdt)
> >>  0, cpu_to_be32(SPAPR_MEMORY_BLOCK_SIZE),
> >>  cpu_to_be32(max_cpus / smp_threads),
> >>  };
> >> +uint32_t maxdomains[] = {
> >> +cpu_to_be32(4),
> >> +cpu_to_be32(0),
> >> +cpu_to_be32(0),
> >> +cpu_to_be32(0),
> >> +cpu_to_be32(nb_numa_nodes - 1),
> >> +};
> >>
> >>  _FDT(rtas = fdt_add_subnode(fdt, 0, "rtas"));
> >>
> >> @@ -946,6 +953,9 @@ static void spapr_dt_rtas(sPAPRMachineState *spapr, 
> >> void *fdt)
> >>  _FDT(fdt_setprop(fdt, rtas, "ibm,associativity-reference-points",
> >>   refpoints, sizeof(refpoints)));
> >>
> >> +_FDT(fdt_setprop(fdt, rtas, "ibm,max-associativity-domains",
> >> + maxdomains, sizeof(maxdomains)));
> >> +
> >>  _FDT(fdt_setprop_cell(fdt, rtas, "rtas-error-log-max",
> >>RTAS_ERROR_LOG_MAX));
> >>  _FDT(fdt_setprop_cell(fdt, rtas, "rtas-event-scan-rate",
> >> -- 
> >> 1.8.3.1
> > 
> > This commit causes hash guest with latest guest kernel to hang at early 
> > boot.
> 
> I use v4.16 tag from stable and can't reproduce on P8 machine reported
> issue.
> 
> Could you please share more details about your setup, kernel commit id
> you spot problem?

I am on 4.16.0-rc7 (commit id: 0b412605ef5f)

BTW this happens only for non-NUMA guest.

Regards,
Bharata.

Re: [Qemu-devel] [PATCH v3 0/4] Small fixes for s390x QEMU boot menu

2018-04-16 Thread Thomas Huth

On 16.04.2018 18:56, Collin Walling wrote:
> Change Log:
> 
> v3
> 
> - added r-b's
> - added check around memset
> 
> v2
> 
> - added r-b's
> - s/zipl_println/zipl_print_entry
> - prints entry and returns entry number
> - while loop now handles valid_entries
> 
> These patches fix the following:
> 
> - The QEMU zIPL boot menu does not allow accurate selection of
> non-sequential entries.
> 
> - The QEMU zIPL boot menu does not have all the capabilities of the
> real zIPL menu (such as commandline args). We should print a different
> banner to reflect this.
> 
> - The loadparm array in main.c can end up being not null terminated when
> converted to an integer via atoui.
> 
> - A loadparm set to an empty string does not allow a boot menu.
> 
> Collin Walling (4):
>   pc-bios/s390-ccw: rename MAX_TABLE_ENTRIES to MAX_BOOT_ENTRIES
>   pc-bios/s390-ccw: fix loadparm initialization and int conversion
>   pc-bios/s390-ccw: fix non-sequential boot entries (eckd)
>   pc-bios/s390-ccw: fix non-sequential boot entries (enum)

Thanks, queued to my s390-ccw-bios branch now.

 Thomas

[Qemu-devel] [PATCH 2/2] fpu: Bound increment for scalbn

2018-04-16 Thread Richard Henderson

Without bounding the increment, we can overflow exp either here
in scalbn_decomposed or when adding the bias in round_canonical.
This can result in e.g. underflowing to 0 instead of overflowing
to infinity.

The old softfloat code did bound the increment.

Signed-off-by: Richard Henderson 
---
 fpu/softfloat.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index ba6e654050..a589f328c9 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -1883,6 +1883,12 @@ static FloatParts scalbn_decomposed(FloatParts a, int n, 
float_status *s)
 return return_nan(a, s);
 }
 if (a.cls == float_class_normal) {
+/* The largest float type (even though not supported by FloatParts)
+ * is float128, which has a 15 bit exponent.  Bounding N to 16 bits
+ * still allows rounding to infinity, without allowing overflow
+ * within the int32_t that backs FloatParts.exp.
+ */
+n = MIN(MAX(n, -0x1), 0x1);
 a.exp += n;
 }
 return a;
-- 
2.14.3

[Qemu-devel] [PATCH for-2.12 0/2] softfloat fixes

2018-04-16 Thread Richard Henderson

An alternate fix for Bastian's division problem and another
fix for a scalbn problem exposed during SVE testing.  Again,
the SVE issue should be visible with normal NEON insns too.


r~


Richard Henderson (2):
  fpu: Check for inf/x before x/0
  fpu: Bound increment for scalbn

 fpu/softfloat.c | 17 ++---
 1 file changed, 14 insertions(+), 3 deletions(-)

-- 
2.14.3

[Qemu-devel] [PATCH 1/2] fpu: Check for inf/x before x/0

2018-04-16 Thread Richard Henderson

The re-factoring of div_floats changed the order of checking meaning
an operation like -inf/0 erroneously raises the divbyzero flag.
IEEE-754 (2008) specifies this should only occur for operations
on finite operands.

We fix this by moving the check on the dividend being Inf/0 to
before the divisor is zero check.

Cc: Bastian Koppelmann 
Cc: Alex Bennée 
Cc: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 fpu/softfloat.c | 11 ---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index fb8663f59e..ba6e654050 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -1146,15 +1146,20 @@ static FloatParts div_floats(FloatParts a, FloatParts 
b, float_status *s)
 a.cls = float_class_dnan;
 return a;
 }
-/* Div 0 => Inf */
+/* Inf / x */
+if (a.cls == float_class_inf) {
+a.sign = sign;
+return a;
+}
+/* x / 0 => Inf */
 if (b.cls == float_class_zero) {
 s->float_exception_flags |= float_flag_divbyzero;
 a.cls = float_class_inf;
 a.sign = sign;
 return a;
 }
-/* Inf / x or 0 / x */
-if (a.cls == float_class_inf || a.cls == float_class_zero) {
+/* 0 / x */
+if (a.cls == float_class_zero) {
 a.sign = sign;
 return a;
 }
-- 
2.14.3

Re: [Qemu-devel] [RFC] vhost-user: introduce F_NEED_ALL_IOTLB protocol feature

2018-04-16 Thread Tiwei Bie

On Mon, Apr 16, 2018 at 03:47:06PM +0800, Stefan Hajnoczi wrote:
> On Wed, Apr 11, 2018 at 03:20:27PM +0800, Tiwei Bie wrote:
> > This patch introduces VHOST_USER_PROTOCOL_F_NEED_ALL_IOTLB
> 
> Naming suggestion: "PREFILL_IOTLB" instead of "NEED_ALL_IOTLB".
> 
> I wasn't able to guess what "NEED_ALL_IOTLB" does.  "PREFILL_IOTLB"
> communicates that memory translation will be set up in advance.

By naming it as NEED_ALL_IOTLB, I want to indicate that,
without this, it's hard or impossible for backend to get
all the IOTLBs (because backend needs to meet IOVAs first
before getting the corresponding IOTLBs, and backend can't
tell whether it has met all the IOVAs or not, especially
when dynamic mapping is being used by guest). But with
this feature negotiated, backend will expect to get all
the IOTLBs. But it seems that, the name (NEED_ALL_IOTLB)
isn't good. And thanks for your suggestion! :)

Best regards,
Tiwei Bie

Re: [Qemu-devel] [PATCH] vfio/pci-quirks.c: Make stolen memory size adjustable for igd VFIO

2018-04-16 Thread Zhang, Xiong Y

> On Wed, 11 Apr 2018 13:24:10 +0800
> Xiong Zhang  wrote:
> 
> > Currenly linux guest with kernel above 3.19 couldn't boot up on igd
> > passthrough env. The root case is i915 driver use stolen memory, but
> > qemu vfio doesn't support it.
> >
> > This patch set stolen memory size to zero default, so guest i915 won't
> > use it. But this breaks old windows igd driver which will be unloaded
> > once it see zero stolen memory size. Then this patch also use
> > x-igd-gms option to adjust stolen memory size. New windows igd driver
> > fixes this and could work with zero stolen memory size.
> >
> > Finally with this patch, Linux guest and windows guest with igd driver
> > above 15.45.4860 could work successfully, windows guest with igd
> > driver below 15.45.4860 should add x-igd-gms=* option.
> 
> Sorry, I can't understand the above commit log in comparison to what this
> patch is actually doing.
> 
> > Signed-off-by: Xiong Zhang 
> > ---
> >  hw/vfio/pci-quirks.c | 95
> > 
> >  1 file changed, 52 insertions(+), 43 deletions(-)
> >
> > diff --git a/hw/vfio/pci-quirks.c b/hw/vfio/pci-quirks.c index
> > 60ad5fb..6e9ed7f 100644
> > --- a/hw/vfio/pci-quirks.c
> > +++ b/hw/vfio/pci-quirks.c
> > @@ -1372,14 +1372,60 @@ static void
> vfio_probe_igd_bar4_quirk(VFIOPCIDevice *vdev, int nr)
> >  uint16_t cmd_orig, cmd;
> >  Error *err = NULL;
> >
> > +/* This must be an Intel VGA device. */
> > +if (!vfio_pci_is(vdev, PCI_VENDOR_ID_INTEL, PCI_ANY_ID) ||
> > +!vfio_is_vga(vdev) || nr != 4) {
> > +return;
> > +}
> > +
> >  /*
> > - * This must be an Intel VGA device at address 00:02.0 for us to even
> > - * consider enabling legacy mode.  The vBIOS has dependencies on
> the
> > - * PCI bus address.
> > + * IGD is not a standard, they like to change their specs often.  We
> > + * only attempt to support back to SandBridge and we hope that
> newer
> > + * devices maintain compatibility with generation 8.
> >   */
> > -if (!vfio_pci_is(vdev, PCI_VENDOR_ID_INTEL, PCI_ANY_ID) ||
> > -!vfio_is_vga(vdev) || nr != 4 ||
> > ->pdev !=
> pci_find_device(pci_device_root_bus(>pdev),
> > +gen = igd_gen(vdev);
> > +if (gen != 6 && gen != 8) {
> > +error_report("IGD device %s is unsupported by IGD quirks, "
> > + "try SandyBridge or newer",
> vdev->vbasedev.name);
> > +return;
> > +}
> > +
> > +/*
> > + * Regardless of running in UPT or legacy mode, the guest graphics
> > + * driver may attempt to use stolen memory, however only legacy
> mode
> > + * has BIOS support for reserving stolen memory in the guest VM.
> > + * Emulate the GMCH register in all cases and zero out the stolen
> > + * memory size here.
> > + */
> > +gmch = vfio_pci_read_config(>pdev, IGD_GMCH, 4);
> > +gmch &= ~((gen < 8 ? 0x1f : 0xff) << (gen < 8 ? 3 : 8));
> 
> Aha!  You want to enable stolen memory in either UPT or legacy mode.  I
> could not possibly have determined that from the commit log.  Please
> rewrite the commit log to describe what is actually changing here.
[Zhang, Xiong Y] Here, I want to disable stolen memory in UPT and legacy mode.
Sure, I will rewrite the commit log.
> 
> > +
> > +/*
> > + * Assume we have no GMS memory, but allow it to be overrided by
> device
> > + * option (experimental). Old windows igd driver must see non-zero
> stolen
> > + * memory size, otherwise it will be unloaded. New igd windows
> driver fix
> > + * this issue and could load with zero stolen memory size.
> > + */
> > +if (vdev->igd_gms) {
> > +if (vdev->igd_gms <= 0x10) {
> > +gms_mb = vdev->igd_gms * 32;
> > +gmch |= vdev->igd_gms << (gen < 8 ? 3 : 8);
> > +} else {
> > +error_report("Unsupported IGD GMS value 0x%x",
> vdev->igd_gms);
> > +vdev->igd_gms = 0;
> > +}
> > +}
> 
> You state in the previous code block that only legacy mode has BIOS support
> for reserving stolen memory, so why should UPT mode be allowed to specify
> stolen memory?  Where is it allocated?
[Zhang, Xiong Y] As the previous code disable stolen memory and set the size of 
stolen memory to zero, this cause old windows igd driver couldn't be loaded as 
the old windows igd driver will be unloaded once it see zero size stolen memory 
but the old windows igd driver doesn't use stolen memory, so for old windows 
igd driver, we just need to set a none zero size stolen memory and don't need 
to allocate an emulated stolen memory . This is old windows igd driver issue 
and has been fixed in new windows igd driver.
So the above code is just for old windows igd driver. if user use an old 
windows igd driver, he could use x-igd-gms option to workaround old windows igd 
driver issue. Without this code, old windows igd driver couldn't

Re: [Qemu-devel] [RFC] vhost-user: introduce F_NEED_ALL_IOTLB protocol feature

2018-04-16 Thread Jason Wang




On 2018年04月16日 15:47, Stefan Hajnoczi wrote:

On Wed, Apr 11, 2018 at 03:20:27PM +0800, Tiwei Bie wrote:

This patch introduces VHOST_USER_PROTOCOL_F_NEED_ALL_IOTLB

Naming suggestion: "PREFILL_IOTLB" instead of "NEED_ALL_IOTLB".

I wasn't able to guess what "NEED_ALL_IOTLB" does.  "PREFILL_IOTLB"
communicates that memory translation will be set up in advance.


I agree we need a better name here but it should not have anything like 
"IOTLB" since it actually try to do page table shadowing on the remote 
vhost-user backend.


Thanks

Re: [Qemu-devel] [PATCH for-2.13 2/4] platform-bus-device: use device plug callback instead of machine_done notifier

2018-04-16 Thread David Gibson

On Mon, Apr 16, 2018 at 10:19:14AM +0200, Igor Mammedov wrote:
> On Mon, 16 Apr 2018 12:43:55 +1000
> David Gibson  wrote:
> 
> > On Thu, Apr 12, 2018 at 06:40:19PM +0200, Igor Mammedov wrote:
> > > platform-bus were using machine_done notifier to get and map
> > > (assign irq/mmio resources) dynamically added sysbus devices
> > > after all '-device' options had been processed.
> > > That however creates non obvious dependencies on ordering of
> > > machine_done notifiers and requires carefull line juggling
> > > to keep it working. For example see comment above
> > > create_platform_bus() and 'straitforward' arm_load_kernel()
> > > had to converted to machine_done notifier and that lead to
> > > yet another machine_done notifier to keep it working
> > > arm_register_platform_bus_fdt_creator().
> > > 
> > > Instead of hiding resource assignment in platform-bus-device
> > > to magically initialize sysbus devices, use device plug
> > > callback and assign resources explicitly at board level
> > > at the moment each -device option is being processed.
> > > 
> > > That adds a bunch of machine declaration boiler plate to
> > > e500plat board, similar to ARM/x86 but gets rid of hidden
> > > machine_done notifier and would allow to remove the dependent
> > > notifiers in ARM code simplifying it and making code flow
> > > easier to follow.
> > > 
> > > Signed-off-by: Igor Mammedov 
> > > ---
> > > CC: ag...@suse.de
> > > CC: da...@gibson.dropbear.id.au
> > > CC: qemu-...@nongnu.org
> > > ---
> > >  hw/ppc/e500.h |  3 +++
> > >  include/hw/arm/virt.h |  3 +++
> > >  include/hw/platform-bus.h |  4 ++--
> > >  hw/arm/sysbus-fdt.c   |  3 ---
> > >  hw/arm/virt.c | 36 
> > >  hw/core/platform-bus.c| 29 ---
> > >  hw/ppc/e500.c | 37 +
> > >  hw/ppc/e500plat.c | 60 
> > > +--
> > >  8 files changed, 129 insertions(+), 46 deletions(-)
> > > 
> > > diff --git a/hw/ppc/e500.h b/hw/ppc/e500.h
> > > index 70ba1d8..d0f8ddd 100644
> > > --- a/hw/ppc/e500.h
> > > +++ b/hw/ppc/e500.h
> > > @@ -2,6 +2,7 @@
> > >  #define PPCE500_H
> > >  
> > >  #include "hw/boards.h"
> > > +#include "hw/sysbus.h"
> > >  
> > >  typedef struct PPCE500Params {
> > >  int pci_first_slot;
> > > @@ -26,6 +27,8 @@ typedef struct PPCE500Params {
> > >  
> > >  void ppce500_init(MachineState *machine, PPCE500Params *params);
> > >  
> > > +void ppce500_plug_dynamic_sysbus_device(SysBusDevice *sbdev);
> > > +
> > >  hwaddr booke206_page_size_to_tlb(uint64_t size);
> > >  
> > >  #endif
> > > diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
> > > index ba0c1a4..5535760 100644
> > > --- a/include/hw/arm/virt.h
> > > +++ b/include/hw/arm/virt.h
> > > @@ -86,11 +86,14 @@ typedef struct {
> > >  bool no_pmu;
> > >  bool claim_edge_triggered_timers;
> > >  bool smbios_old_sys_ver;
> > > +HotplugHandler *(*get_hotplug_handler)(MachineState *machine,
> > > +   DeviceState *dev);
> > >  } VirtMachineClass;
> > >  
> > >  typedef struct {
> > >  MachineState parent;
> > >  Notifier machine_done;
> > > +DeviceState *platform_bus_dev;
> > >  FWCfgState *fw_cfg;
> > >  bool secure;
> > >  bool highmem;
> > > diff --git a/include/hw/platform-bus.h b/include/hw/platform-bus.h
> > > index a00775c..19e20c5 100644
> > > --- a/include/hw/platform-bus.h
> > > +++ b/include/hw/platform-bus.h
> > > @@ -37,8 +37,6 @@ typedef struct PlatformBusDevice PlatformBusDevice;
> > >  struct PlatformBusDevice {
> > >  /*< private >*/
> > >  SysBusDevice parent_obj;
> > > -Notifier notifier;
> > > -bool done_gathering;
> > >  
> > >  /*< public >*/
> > >  uint32_t mmio_size;
> > > @@ -54,4 +52,6 @@ int platform_bus_get_irqn(PlatformBusDevice 
> > > *platform_bus, SysBusDevice *sbdev,
> > >  hwaddr platform_bus_get_mmio_addr(PlatformBusDevice *pbus, SysBusDevice 
> > > *sbdev,
> > >int n);
> > >  
> > > +void platform_bus_link_device(PlatformBusDevice *pbus, SysBusDevice 
> > > *sbdev);
> > > +
> > >  #endif /* HW_PLATFORM_BUS_H */
> > > diff --git a/hw/arm/sysbus-fdt.c b/hw/arm/sysbus-fdt.c
> > > index d68e3dc..80ff70e 100644
> > > --- a/hw/arm/sysbus-fdt.c
> > > +++ b/hw/arm/sysbus-fdt.c
> > > @@ -506,9 +506,6 @@ static void 
> > > add_all_platform_bus_fdt_nodes(ARMPlatformBusFDTParams *fdt_params)
> > >  dev = qdev_find_recursive(sysbus_get_default(), 
> > > TYPE_PLATFORM_BUS_DEVICE);
> > >  pbus = PLATFORM_BUS_DEVICE(dev);
> > >  
> > > -/* We can only create dt nodes for dynamic devices when they're 
> > > ready */
> > > -assert(pbus->done_gathering);
> > > -
> > >  PlatformBusFDTData data = {
> > >  .fdt = fdt,
> > >  .irq_start = irq_start,
> > > diff --git a/hw/arm/virt.c

Re: [Qemu-devel] [PATCH v3] spapr: Support ibm, dynamic-memory-v2 property

2018-04-16 Thread David Gibson

On Mon, Apr 16, 2018 at 02:25:12PM +0530, Bharata B Rao wrote:
> The new property ibm,dynamic-memory-v2 allows memory to be represented
> in a more compact manner in device tree.
> 
> Signed-off-by: Bharata B Rao 
> ---
> v2 - https://lists.nongnu.org/archive/html/qemu-ppc/2018-04/msg00052.html
> Changes in v3:
> - Addressed David Gibson's review comments: use of CamelCase, separate
>   helper for populating drconf_cell queue entries, removed the user
>   configurability of drmem_v2 machine property.
>  
>  docs/specs/ppc-spapr-hotplug.txt |  19 
>  hw/ppc/spapr.c   | 233 
> +++
>  include/hw/ppc/spapr.h   |   1 +
>  include/hw/ppc/spapr_ovec.h  |   1 +
>  4 files changed, 207 insertions(+), 47 deletions(-)
> 
> diff --git a/docs/specs/ppc-spapr-hotplug.txt 
> b/docs/specs/ppc-spapr-hotplug.txt
> index f57e2a09c6..cc7833108e 100644
> --- a/docs/specs/ppc-spapr-hotplug.txt
> +++ b/docs/specs/ppc-spapr-hotplug.txt
> @@ -387,4 +387,23 @@ Each LMB list entry consists of the following elements:
>  - A 32bit flags word. The bit at bit position 0x0008 defines whether
>the LMB is assigned to the the partition as of boot time.
>  
> +ibm,dynamic-memory-v2
> +
> +This property describes the dynamically reconfigurable memory. This is
> +an alternate and newer way to describe dyanamically reconfigurable memory.
> +It is a property encoded array that has an integer N (the number of
> +LMB set entries) followed by N LMB set entries. There is an LMB set entry
> +for each sequential group of LMBs that share common attributes.
> +
> +Each LMB set entry consists of the following elements:
> +
> +- Number of sequential LMBs in the entry represented by a 32bit integer.
> +- Logical address of the first LMB in the set encoded as a 64bit integer.
> +- DRC index of the first LMB in the set.
> +- Associativity list index that is used as an index into
> +  ibm,associativity-lookup-arrays property described earlier. This
> +  is used to retrieve the right associativity list to be used for all
> +  the LMBs in this set.
> +- A 32bit flags word that applies to all the LMBs in the set.
> +
>  [1] http://thread.gmane.org/gmane.linux.ports.ppc.embedded/75350/focus=106867
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 3ffadd6ac7..fe327f3c1b 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -669,63 +669,136 @@ static uint32_t 
> spapr_pc_dimm_node(MemoryDeviceInfoList *list, ram_addr_t addr)
>  return -1;
>  }
>  
> -/*
> - * Adds ibm,dynamic-reconfiguration-memory node.
> - * Refer to docs/specs/ppc-spapr-hotplug.txt for the documentation
> - * of this device tree node.
> - */
> -static int spapr_populate_drconf_memory(sPAPRMachineState *spapr, void *fdt)
> +struct sPAPRDrconfCellV2 {
> + uint32_t seq_lmbs;
> + uint64_t base_addr;
> + uint32_t drc_index;
> + uint32_t aa_index;
> + uint32_t flags;
> +} __attribute__((packed));

Should be QEMU_PACKED to at least allow for the possibility of other compilers.

> +
> +/* 6 uint32_t entries in sPAPRDrconfCellV2 */
> +#define SPAPR_NR_DRCONF_CELL_ENTRIES 6

Please use sizeof, as noted before.

> +
> +typedef struct DrconfCellQueue {
> +struct sPAPRDrconfCellV2 cell;
> +QSIMPLEQ_ENTRY(DrconfCellQueue) entry;
> +} DrconfCellQueue;
> +
> +QSIMPLEQ_HEAD(, DrconfCellQueue) drconf_queue
> += QSIMPLEQ_HEAD_INITIALIZER(drconf_queue);
> +
> +static void spapr_add_drconf_cell(uint32_t seq_lmbs, uint64_t base_addr,
> +  uint32_t drc_index, uint32_t aa_index,
> +  uint32_t flags)
> +{
> +DrconfCellQueue *elem;
> +
> +elem = g_malloc0(sizeof(*elem));
> +elem->cell.seq_lmbs = cpu_to_be32(seq_lmbs);
> +elem->cell.base_addr = cpu_to_be64(base_addr);
> +elem->cell.drc_index = cpu_to_be32(drc_index);
> +elem->cell.aa_index = cpu_to_be32(aa_index);
> +elem->cell.flags = cpu_to_be32(flags);
> +QSIMPLEQ_INSERT_TAIL(_queue, elem, entry);

Don't use a global, pass the queue as a parameter.  Or alternatively
have the helper just return the allocated and initialized element to
add to the queue in the caller.

> +}
> +
> +/* ibm,dynamic-memory-v2 */
> +static int spapr_populate_drmem_v2(sPAPRMachineState *spapr, void *fdt,
> +   int offset, MemoryDeviceInfoList *dimms)
>  {
> -MachineState *machine = MACHINE(spapr);
> -int ret, i, offset;
> -uint64_t lmb_size = SPAPR_MEMORY_BLOCK_SIZE;
> -uint32_t prop_lmb_size[] = {0, cpu_to_be32(lmb_size)};
> -uint32_t hotplug_lmb_start = spapr->hotplug_memory.base / lmb_size;
> -uint32_t nr_lmbs = (spapr->hotplug_memory.base +
> -   memory_region_size(>hotplug_memory.mr)) /
> -   lmb_size;
>  uint32_t *int_buf, *cur_index, buf_len;
> -int nr_nodes = nb_numa_nodes ? nb_numa_nodes : 1;
> -MemoryDeviceInfoList *dimms = NULL;
> +

Re: [Qemu-devel] [Consult] nfs-vsocks support

2018-04-16 Thread Roy, Arindam

Hi Jing,
Busybox is a standalone statically linked bash env.
You can download the source from here:
https://busybox.net/downloads/

Or you can just download and copy it to your /usr/sbin :
https://busybox.net/downloads/binaries/

The script is creating a runtime filesystem for your guest.
Its linking all the standard shell commands to busybox, so in the guest runtime 
those are executed by busybox.

Arindam

Re: [Qemu-devel] [PATCH v2 5/7] ACPICA: Integrate package handling with module-level code

2018-04-16 Thread Dan Williams

On Mon, Apr 16, 2018 at 4:15 PM, Schmauss, Erik  wrote:
> [ trimming ]
>> >> Rafael, we may want to hold back on the module-level code changes
>> >> (the patches below) for rc1. Between this and the strange _TSS issue,
>> >> it seems like there are a few more things to resolve before this is
>> >> ready for kernel upstream.
>> >
> Hi Rafael,
>
>> > It looks like you are asking me to queue up reverts as per the Dan's
>> > report, is that correct?
>
> This is indeed what I meant last week. However, I've looked into the issue 
> and Dan's qemu
> instance had AML that we no longer support. This is because the ACPICA commit 
> makes changes to the execution of AML
> during table load to match windows AML interpreter behavior so this commit 
> also got rid of support for executing code
> containing forward references (except for package elements).
>
> I've suggested a fix for the firmware in a separate email. So I would say 
> that this issue is resolved after if Dan can run
> his test successfully with the adjusted firmware.
>
> If Dan's test is successful, we don’t need to revert these changes

I'm concerned about other qemu-kvm users that do not upgrade their
hypervisor at the same pace as their guest kernel. Especially for
cloud providers that may be running latest mainline kernel on older
qemu-kvm this will look like a pure kernel regression. Is there a
quick fix we can carry in the kernel to support these forward
references, at least until we know that qemu-kvm is no longer shipping
the broken AML?

[Qemu-devel] [Bug 955379] Re: cmake hangs with qemu-arm-static

2018-04-16 Thread David Lechner

I have filed a new bug:
https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/1764555

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/955379

Title:
  cmake hangs with qemu-arm-static

Status in QEMU:
  Fix Released
Status in Linaro QEMU:
  Confirmed
Status in qemu package in Ubuntu:
  Fix Released
Status in qemu-linaro package in Ubuntu:
  Confirmed
Status in qemu source package in Xenial:
  Incomplete
Status in qemu-linaro source package in Xenial:
  New

Bug description:
  I'm using git commit 3e7ecd976b06f... configured with --target-list
  =arm-linux-user --static in a chroot environment to compile some
  things. I ran into this problem with both pcl and opencv-2.3.1. cmake
  consistently freezes at some point during its execution, though in a
  different spot each time, usually during a step when it's searching
  for some libraries. For instance, pcl most commonly stops after:

  [snip]
  -- Boost version: 1.46.1
  -- Found the following Boost libraries:
  --   system
  --   filesystem
  --   thread
  --   date_time
  -- checking for module 'eigen3'
  --   found eigen3, version 3.0.1

  which is perplexing because it freezes after finding what it wants,
  not during the search. When it does get past that point, it does so
  almost immediately but freezes somewhere else.

  I'm using 64-bit Ubuntu 11.10 with kernel release 3.0.0-16-generic
  with an Intel i5.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/955379/+subscriptions

Re: [Qemu-devel] [PATCH V4] migration: add capability to bypass the shared memory

2018-04-16 Thread Lai Jiangshan

On Tue, Apr 10, 2018 at 1:30 AM, Dr. David Alan Gilbert
 wrote:

>>
>> +bool migrate_bypass_shared_memory(void)
>> +{
>> +MigrationState *s;
>> +
>> +/* it is not workable with postcopy yet. */
>> +if (migrate_postcopy_ram()) {
>> +return false;
>> +}
>
> Please change this to work in the same way as the check for
> postcopy+compress in migration.c migrate_caps_check.


done in V5.

>
>> +s = migrate_get_current();
>> +
>> +return 
>> s->enabled_capabilities[MIGRATION_CAPABILITY_BYPASS_SHARED_MEMORY];
>> +}
>> +
>>  bool migrate_postcopy_ram(void)
>>  {
>>  MigrationState *s;
>> diff --git a/migration/migration.h b/migration/migration.h
>> index 8d2f320c48..cfd2513ef0 100644
>> --- a/migration/migration.h
>> +++ b/migration/migration.h
>> @@ -206,6 +206,7 @@ MigrationState *migrate_get_current(void);
>>
>>  bool migrate_postcopy(void);
>>
>> +bool migrate_bypass_shared_memory(void);
>>  bool migrate_release_ram(void);
>>  bool migrate_postcopy_ram(void);
>>  bool migrate_zero_blocks(void);
>> diff --git a/migration/ram.c b/migration/ram.c
>> index 0e90efa092..bca170c386 100644
>> --- a/migration/ram.c
>> +++ b/migration/ram.c
>> @@ -780,6 +780,11 @@ unsigned long migration_bitmap_find_dirty(RAMState *rs, 
>> RAMBlock *rb,
>>  unsigned long *bitmap = rb->bmap;
>>  unsigned long next;
>>
>> +/* when this ramblock is requested bypassing */
>> +if (!bitmap) {
>> +return size;
>> +}
>> +
>>  if (rs->ram_bulk_stage && start > 0) {
>>  next = start + 1;
>>  } else {
>> @@ -850,7 +855,9 @@ static void migration_bitmap_sync(RAMState *rs)
>>  qemu_mutex_lock(>bitmap_mutex);
>>  rcu_read_lock();
>>  RAMBLOCK_FOREACH(block) {
>> -migration_bitmap_sync_range(rs, block, 0, block->used_length);
>> +if (!migrate_bypass_shared_memory() || !qemu_ram_is_shared(block)) {
>> +migration_bitmap_sync_range(rs, block, 0, block->used_length);
>> +}
>>  }
>>  rcu_read_unlock();
>>  qemu_mutex_unlock(>bitmap_mutex);
>> @@ -2132,18 +2139,12 @@ static int ram_state_init(RAMState **rsp)
>>  qemu_mutex_init(&(*rsp)->src_page_req_mutex);
>>  QSIMPLEQ_INIT(&(*rsp)->src_page_requests);
>>
>> -/*
>> - * Count the total number of pages used by ram blocks not including any
>> - * gaps due to alignment or unplugs.
>> - */
>> -(*rsp)->migration_dirty_pages = ram_bytes_total() >> TARGET_PAGE_BITS;
>> -
>>  ram_state_reset(*rsp);
>>
>>  return 0;
>>  }
>>
>> -static void ram_list_init_bitmaps(void)
>> +static void ram_list_init_bitmaps(RAMState *rs)
>>  {
>>  RAMBlock *block;
>>  unsigned long pages;
>> @@ -2151,9 +2152,17 @@ static void ram_list_init_bitmaps(void)
>>  /* Skip setting bitmap if there is no RAM */
>>  if (ram_bytes_total()) {
>
> I think you need to add here a :
>rs->migration_dirty_pages = 0;

In ram_state_init(),
*rsp = g_try_new0(RAMState, 1);
so the state is always reset.

>
> I don't see anywhere else that initialises it, and there is the case of
> a migration that fails, followed by a 2nd attempt.
>
>>  QLIST_FOREACH_RCU(block, _list.blocks, next) {
>> +if (migrate_bypass_shared_memory() && 
>> qemu_ram_is_shared(block)) {
>> +continue;
>> +}
>>  pages = block->max_length >> TARGET_PAGE_BITS;
>>  block->bmap = bitmap_new(pages);
>>  bitmap_set(block->bmap, 0, pages);
>> +/*
>> + * Count the total number of pages used by ram blocks not
>> + * including any gaps due to alignment or unplugs.
>> + */
>> +rs->migration_dirty_pages += pages;
>>  if (migrate_postcopy_ram()) {
>>  block->unsentmap = bitmap_new(pages);
>>  bitmap_set(block->unsentmap, 0, pages);
>> @@ -2169,7 +2178,7 @@ static void ram_init_bitmaps(RAMState *rs)
>>  qemu_mutex_lock_ramlist();
>>  rcu_read_lock();
>>
>> -ram_list_init_bitmaps();
>> +ram_list_init_bitmaps(rs);
>>  memory_global_dirty_log_start();
>>  migration_bitmap_sync(rs);
>>
>> diff --git a/qapi/migration.json b/qapi/migration.json
>> index 9d0bf82cf4..45326480bd 100644
>> --- a/qapi/migration.json
>> +++ b/qapi/migration.json
>> @@ -357,13 +357,17 @@
>>  # @dirty-bitmaps: If enabled, QEMU will migrate named dirty bitmaps.
>>  # (since 2.12)
>>  #
>> +# @bypass-shared-memory: the shared memory region will be bypassed on 
>> migration.
>> +#  This feature allows the memory region to be reused by new qemu(s)
>> +#  or be migrated separately. (since 2.13)
>> +#
>>  # Since: 1.2
>>  ##
>>  { 'enum': 'MigrationCapability',
>>'data': ['xbzrle', 'rdma-pin-all', 'auto-converge', 'zero-blocks',
>> 'compress', 'events', 'postcopy-ram', 'x-colo', 'release-ram',
>> 'block', 'return-path', 'pause-before-switchover', 'x-multifd',

Re: [Qemu-devel] qapi: [PATCH v2] Implement query-usbhost QMP command

2018-04-16 Thread Alexander Kappner

Hi Gerd,

thanks for reviewing. I'll follow up with a v3 of the patch addressing your 
proposed changes.

>> This allows a QMP client to query which USB devices may be available
>> for redirection.

> At least libvirt sandboxes qemu, for security reasons, and you wouldn't
> get any useful results because qemu hasn't the permissions needed to
> scan the usb bus.
Then at least this tells you that you can't make anything available via 
redirection either, which is also useful information.

> I'm wondering what the use case is?
I'm working on a smartphone client that talks to a QEMU instance via QMP, and 
that allows the user to switch USB devices between host and QEMU guest. This 
removes the need for a separate set of input devices permanently assigned to 
the host (since if I pass through a keyboard/mouse to the VM, I can't use that 
same keyboard/mouse to run a command that re-attaches those devices to the 
host, so I need some external channel). Basically using the phone as a KVM 
switch.

Best
Alexander

On 04/16/2018 01:29 AM, Gerd Hoffmann wrote:
> On Sat, Apr 14, 2018 at 02:29:34AM -0700, Alexander Kappner wrote:
>> Implement a QMP command similar to the HMP's "info usbhost" command.
> 
> The hmp version of the command should be switched over to call the
> qmp_query_usbhost() and pretty-print the qapi struct then, to make
> sure hmp and qmp versions of the command deliver consistent results.
> 
>> This allows a QMP client to query which USB devices may be available
>> for redirection.
> 
> I'm wondering what the use case is?
> 
> At least libvirt sandboxes qemu, for security reasons, and you wouldn't
> get any useful results because qemu hasn't the permissions needed to
> scan the usb bus.
> 
>>  ##
>> +# @query-usbhost:
>> +#
>> +# Returns information about USB devices available on the host
>> +#
>> +# Returns: a [UsbDeviceInfo]. Returns an error if compiled without
>> +# CONFIG_USB_LIBUSB
>> +#
>> +# Since: TODO (maintainer insert version number if mainlined)
> 
> Will not make it into 2.12, so 2.13.
> 
>> +# @UsbDeviceInfo:
>> +#
>> +# @speed: the speed
> 
> Use an enum for this?
> 
>> +# @id_vendor: idVendor field from device descriptor
>> +#
>> +# @id_product: idProduct field from device descriptor
>> +#
>> +# @str_product: string descriptor referenced by iProduct index, if any
>> +#
>> +# @str_manufacturer: string descriptor referenced by iManufacturer index, 
>> if any
>> +#
>> +# @dev_addr: address on bus that device is connected to
>> +#
>> +# @bus_num: bus number device is connected to
> 
> No port?
> 
> cheers,
>   Gerd
>

Re: [Qemu-devel] [PATCH] vfio/pci-quirks.c: Make stolen memory size adjustable for igd VFIO

2018-04-16 Thread Alex Williamson

On Wed, 11 Apr 2018 13:24:10 +0800
Xiong Zhang  wrote:

> Currenly linux guest with kernel above 3.19 couldn't boot up on igd
> passthrough env. The root case is i915 driver use stolen memory, but
> qemu vfio doesn't support it.
> 
> This patch set stolen memory size to zero default, so guest i915 won't
> use it. But this breaks old windows igd driver which will be unloaded
> once it see zero stolen memory size. Then this patch also use x-igd-gms
> option to adjust stolen memory size. New windows igd driver fixes this and
> could work with zero stolen memory size.
> 
> Finally with this patch, Linux guest and windows guest with igd driver
> above 15.45.4860 could work successfully, windows guest with igd driver
> below 15.45.4860 should add x-igd-gms=* option.

Sorry, I can't understand the above commit log in comparison to what
this patch is actually doing.

> Signed-off-by: Xiong Zhang 
> ---
>  hw/vfio/pci-quirks.c | 95 
> 
>  1 file changed, 52 insertions(+), 43 deletions(-)
> 
> diff --git a/hw/vfio/pci-quirks.c b/hw/vfio/pci-quirks.c
> index 60ad5fb..6e9ed7f 100644
> --- a/hw/vfio/pci-quirks.c
> +++ b/hw/vfio/pci-quirks.c
> @@ -1372,14 +1372,60 @@ static void vfio_probe_igd_bar4_quirk(VFIOPCIDevice 
> *vdev, int nr)
>  uint16_t cmd_orig, cmd;
>  Error *err = NULL;
>  
> +/* This must be an Intel VGA device. */
> +if (!vfio_pci_is(vdev, PCI_VENDOR_ID_INTEL, PCI_ANY_ID) ||
> +!vfio_is_vga(vdev) || nr != 4) {
> +return;
> +}
> +
>  /*
> - * This must be an Intel VGA device at address 00:02.0 for us to even
> - * consider enabling legacy mode.  The vBIOS has dependencies on the
> - * PCI bus address.
> + * IGD is not a standard, they like to change their specs often.  We
> + * only attempt to support back to SandBridge and we hope that newer
> + * devices maintain compatibility with generation 8.
>   */
> -if (!vfio_pci_is(vdev, PCI_VENDOR_ID_INTEL, PCI_ANY_ID) ||
> -!vfio_is_vga(vdev) || nr != 4 ||
> ->pdev != pci_find_device(pci_device_root_bus(>pdev),
> +gen = igd_gen(vdev);
> +if (gen != 6 && gen != 8) {
> +error_report("IGD device %s is unsupported by IGD quirks, "
> + "try SandyBridge or newer", vdev->vbasedev.name);
> +return;
> +}
> +
> +/*
> + * Regardless of running in UPT or legacy mode, the guest graphics
> + * driver may attempt to use stolen memory, however only legacy mode
> + * has BIOS support for reserving stolen memory in the guest VM.
> + * Emulate the GMCH register in all cases and zero out the stolen
> + * memory size here.
> + */
> +gmch = vfio_pci_read_config(>pdev, IGD_GMCH, 4);
> +gmch &= ~((gen < 8 ? 0x1f : 0xff) << (gen < 8 ? 3 : 8));

Aha!  You want to enable stolen memory in either UPT or legacy mode.  I
could not possibly have determined that from the commit log.  Please
rewrite the commit log to describe what is actually changing here.

> +
> +/*
> + * Assume we have no GMS memory, but allow it to be overrided by device
> + * option (experimental). Old windows igd driver must see non-zero stolen
> + * memory size, otherwise it will be unloaded. New igd windows driver fix
> + * this issue and could load with zero stolen memory size.
> + */
> +if (vdev->igd_gms) {
> +if (vdev->igd_gms <= 0x10) {
> +gms_mb = vdev->igd_gms * 32;
> +gmch |= vdev->igd_gms << (gen < 8 ? 3 : 8);
> +} else {
> +error_report("Unsupported IGD GMS value 0x%x", vdev->igd_gms);
> +vdev->igd_gms = 0;
> +}
> +}

You state in the previous code block that only legacy mode has BIOS
support for reserving stolen memory, so why should UPT mode be allowed
to specify stolen memory?  Where is it allocated?

> +
> +/* GMCH is read-only, emulated */
> +pci_set_long(vdev->pdev.config + IGD_GMCH, gmch);
> +pci_set_long(vdev->pdev.wmask + IGD_GMCH, 0);
> +pci_set_long(vdev->emulated_config_bits + IGD_GMCH, ~0);
> +
> +/*
> + * This must be at address 00:02.0 for us to even onsider enabling
> + * legacy mode.  The vBIOS has dependencies on the PCI bus address.
> + */
> +if (>pdev != pci_find_device(pci_device_root_bus(>pdev),
> 0, PCI_DEVFN(0x2, 0))) {

So we allow the user to specify stolen memory size in UPT mode, we
emulate that in the GMCH register, but we do nothing with the BDSM
register which tells the guest driver the address of the stolen
memory.  What's the point?  I'm lost.  Thanks,

Alex

>  return;
>  }
> @@ -1399,18 +1445,6 @@ static void vfio_probe_igd_bar4_quirk(VFIOPCIDevice 
> *vdev, int nr)
>  }
>  
>  /*
> - * IGD is not a standard, they like to change their specs often.  We
> - * only attempt to support back to SandBridge and

Re: [Qemu-devel] [Xen-devel] [PATCH v3 14/41] hw/display: Use the BYTE-based definitions

2018-04-16 Thread Alistair Francis

On Sun, Apr 15, 2018 at 4:42 PM, Philippe Mathieu-Daudé  wrote:
> It eases code review, unit is explicit.
>
> Patch generated using:
>
>   $ git grep -E '(1024|2048|4096|8192|(<<|>>).?(10|20|30))' hw/ include/hw/
>
> and modified manually.
>
> Signed-off-by: Philippe Mathieu-Daudé 
> Reviewed-by: Gerd Hoffmann 

Reviewed-by: Alistair Francis 

Alistair

> ---
>  include/hw/display/xlnx_dp.h |  5 +++--
>  hw/display/cirrus_vga.c  | 10 +-
>  hw/display/g364fb.c  |  3 ++-
>  hw/display/qxl.c | 27 ---
>  hw/display/sm501.c   |  2 +-
>  hw/display/vga-isa-mm.c  |  5 +++--
>  hw/display/vga.c |  5 +++--
>  hw/display/virtio-gpu.c  |  4 ++--
>  hw/display/vmware_vga.c  |  3 ++-
>  hw/display/xenfb.c   |  3 ++-
>  10 files changed, 35 insertions(+), 32 deletions(-)
>
> diff --git a/include/hw/display/xlnx_dp.h b/include/hw/display/xlnx_dp.h
> index ee046a5fac..8fb604dee0 100644
> --- a/include/hw/display/xlnx_dp.h
> +++ b/include/hw/display/xlnx_dp.h
> @@ -29,14 +29,15 @@
>  #include "hw/display/dpcd.h"
>  #include "hw/i2c/i2c-ddc.h"
>  #include "qemu/fifo8.h"
> +#include "qemu/units.h"
>  #include "hw/dma/xlnx_dpdma.h"
>  #include "audio/audio.h"
>
>  #ifndef XLNX_DP_H
>  #define XLNX_DP_H
>
> -#define AUD_CHBUF_MAX_DEPTH 32768
> -#define MAX_QEMU_BUFFER_SIZE4096
> +#define AUD_CHBUF_MAX_DEPTH (32 * K_BYTE)
> +#define MAX_QEMU_BUFFER_SIZE(4 * K_BYTE)
>
>  #define DP_CORE_REG_ARRAY_SIZE  (0x3AF >> 2)
>  #define DP_AVBUF_REG_ARRAY_SIZE (0x238 >> 2)
> diff --git a/hw/display/cirrus_vga.c b/hw/display/cirrus_vga.c
> index 138ae961b9..b6d6263297 100644
> --- a/hw/display/cirrus_vga.c
> +++ b/hw/display/cirrus_vga.c
> @@ -27,6 +27,7 @@
>   *   available at http://home.worldonline.dk/~finth/
>   */
>  #include "qemu/osdep.h"
> +#include "qemu/units.h"
>  #include "qapi/error.h"
>  #include "trace.h"
>  #include "hw/hw.h"
> @@ -2218,7 +2219,7 @@ static inline void 
> cirrus_cursor_compute_yrange(CirrusVGAState *s)
>  uint32_t content;
>  int y, y_min, y_max;
>
> -src = s->vga.vram_ptr + s->real_vram_size - 16 * 1024;
> +src = s->vga.vram_ptr + s->real_vram_size - 16 * K_BYTE;
>  if (s->vga.sr[0x12] & CIRRUS_CURSOR_LARGE) {
>  src += (s->vga.sr[0x13] & 0x3c) * 256;
>  y_min = 64;
> @@ -2347,7 +2348,7 @@ static void cirrus_cursor_draw_line(VGACommonState *s1, 
> uint8_t *d1, int scr_y)
>  return;
>  }
>
> -src = s->vga.vram_ptr + s->real_vram_size - 16 * 1024;
> +src = s->vga.vram_ptr + s->real_vram_size - 16 * K_BYTE;
>  if (s->vga.sr[0x12] & CIRRUS_CURSOR_LARGE) {
>  src += (s->vga.sr[0x13] & 0x3c) * 256;
>  src += (scr_y - s->vga.hw_cursor_y) * 16;
> @@ -2995,8 +2996,7 @@ static void cirrus_init_common(CirrusVGAState *s, 
> Object *owner,
>
>  /* I/O handler for LFB */
>  memory_region_init_io(>cirrus_linear_io, owner, 
> _linear_io_ops, s,
> -  "cirrus-linear-io", s->vga.vram_size_mb
> -  * 1024 * 1024);
> +  "cirrus-linear-io", s->vga.vram_size_mb * M_BYTE);
>  memory_region_set_flush_coalesced(>cirrus_linear_io);
>
>  /* I/O handler for LFB */
> @@ -3013,7 +3013,7 @@ static void cirrus_init_common(CirrusVGAState *s, 
> Object *owner,
>  memory_region_set_flush_coalesced(>cirrus_mmio_io);
>
>  s->real_vram_size =
> -(s->device_id == CIRRUS_ID_CLGD5446) ? 4096 * 1024 : 2048 * 1024;
> +(s->device_id == CIRRUS_ID_CLGD5446) ? 4 * M_BYTE : 2 * M_BYTE;
>
>  /* XXX: s->vga.vram_size must be a power of two */
>  s->cirrus_addr_mask = s->real_vram_size - 1;
> diff --git a/hw/display/g364fb.c b/hw/display/g364fb.c
> index 3d75394e77..2e7af33427 100644
> --- a/hw/display/g364fb.c
> +++ b/hw/display/g364fb.c
> @@ -18,6 +18,7 @@
>   */
>
>  #include "qemu/osdep.h"
> +#include "qemu/units.h"
>  #include "hw/hw.h"
>  #include "qemu/error-report.h"
>  #include "ui/console.h"
> @@ -511,7 +512,7 @@ static void g364fb_sysbus_reset(DeviceState *d)
>
>  static Property g364fb_sysbus_properties[] = {
>  DEFINE_PROP_UINT32("vram_size", G364SysBusState, g364.vram_size,
> -8 * 1024 * 1024),
> +   8 * M_BYTE),
>  DEFINE_PROP_END_OF_LIST(),
>  };
>
> diff --git a/hw/display/qxl.c b/hw/display/qxl.c
> index a71714ccb4..f0340ae355 100644
> --- a/hw/display/qxl.c
> +++ b/hw/display/qxl.c
> @@ -19,6 +19,7 @@
>   */
>
>  #include "qemu/osdep.h"
> +#include "qemu/units.h"
>  #include 
>
>  #include "qapi/error.h"
> @@ -2012,11 +2013,11 @@ static void qxl_init_ramsize(PCIQXLDevice *qxl)
>  if (qxl->vgamem_size_mb > 256) {
>  qxl->vgamem_size_mb = 256;
>  }
> -qxl->vgamem_size = qxl->vgamem_size_mb * 1024 * 1024;
> +

Re: [Qemu-devel] [PATCH v3 29/41] hw/mips: Use the BYTE-based definitions

2018-04-16 Thread Alistair Francis

On Sun, Apr 15, 2018 at 4:42 PM, Philippe Mathieu-Daudé  wrote:
> It eases code review, unit is explicit.
>
> Patch generated using:
>
>   $ git grep -E '(1024|2048|4096|8192|(<<|>>).?(10|20|30))' hw/ include/hw/
>
> and modified manually.
>
> Signed-off-by: Philippe Mathieu-Daudé 

Reviewed-by: Alistair Francis 

Alistair

> ---
>  include/hw/intc/mips_gic.h |  3 ++-
>  include/hw/mips/bios.h |  3 ++-
>  hw/mips/boston.c   |  4 ++--
>  hw/mips/mips_fulong2e.c|  7 ---
>  hw/mips/mips_malta.c   | 23 +--
>  hw/mips/mips_r4k.c | 11 ++-
>  hw/misc/mips_itu.c |  3 ++-
>  hw/pci-host/xilinx-pcie.c  |  5 +++--
>  8 files changed, 34 insertions(+), 25 deletions(-)
>
> diff --git a/include/hw/intc/mips_gic.h b/include/hw/intc/mips_gic.h
> index b98d50094a..c0da44bdb3 100644
> --- a/include/hw/intc/mips_gic.h
> +++ b/include/hw/intc/mips_gic.h
> @@ -11,6 +11,7 @@
>  #ifndef MIPS_GIC_H
>  #define MIPS_GIC_H
>
> +#include "qemu/units.h"
>  #include "hw/timer/mips_gictimer.h"
>  #include "cpu.h"
>  /*
> @@ -19,7 +20,7 @@
>
>  /* The MIPS default location */
>  #define GIC_BASE_ADDR   0x1bdcULL
> -#define GIC_ADDRSPACE_SZ(128 * 1024)
> +#define GIC_ADDRSPACE_SZ(128 * K_BYTE)
>
>  /* Constants */
>  #define GIC_POL_POS 1
> diff --git a/include/hw/mips/bios.h b/include/hw/mips/bios.h
> index b4b88ac43d..b4c97ce87c 100644
> --- a/include/hw/mips/bios.h
> +++ b/include/hw/mips/bios.h
> @@ -1,6 +1,7 @@
> +#include "qemu/units.h"
>  #include "cpu.h"
>
> -#define BIOS_SIZE (4 * 1024 * 1024)
> +#define BIOS_SIZE (4 * M_BYTE)
>  #ifdef TARGET_WORDS_BIGENDIAN
>  #define BIOS_FILENAME "mips_bios.bin"
>  #else
> diff --git a/hw/mips/boston.c b/hw/mips/boston.c
> index fb23161b33..edc39e91f7 100644
> --- a/hw/mips/boston.c
> +++ b/hw/mips/boston.c
> @@ -32,7 +32,7 @@
>  #include "hw/mips/cpudevs.h"
>  #include "hw/pci-host/xilinx-pcie.h"
>  #include "qapi/error.h"
> -#include "qemu/cutils.h"
> +#include "qemu/units.h"
>  #include "qemu/error-report.h"
>  #include "qemu/log.h"
>  #include "chardev/char.h"
> @@ -437,7 +437,7 @@ static void boston_mach_init(MachineState *machine)
>  bool is_64b;
>
>  if ((machine->ram_size % G_BYTE) ||
> -(machine->ram_size > (2 * G_BYTE))) {
> +(machine->ram_size > 2 * G_BYTE)) {
>  error_report("Memory size must be 1GB or 2GB");
>  exit(1);
>  }
> diff --git a/hw/mips/mips_fulong2e.c b/hw/mips/mips_fulong2e.c
> index 02fb2fdcc4..779883db7c 100644
> --- a/hw/mips/mips_fulong2e.c
> +++ b/hw/mips/mips_fulong2e.c
> @@ -19,6 +19,7 @@
>   */
>
>  #include "qemu/osdep.h"
> +#include "qemu/units.h"
>  #include "qapi/error.h"
>  #include "hw/hw.h"
>  #include "hw/i386/pc.h"
> @@ -159,7 +160,7 @@ static int64_t load_kernel (CPUMIPSState *env)
>  /* Setup minimum environment variables */
>  prom_set(prom_buf, index++, "busclock=3300");
>  prom_set(prom_buf, index++, "cpuclock=1");
> -prom_set(prom_buf, index++, "memsize=%i", 
> loaderparams.ram_size/1024/1024);
> +prom_set(prom_buf, index++, "memsize=%llu", loaderparams.ram_size / 
> M_BYTE);
>  prom_set(prom_buf, index++, "modetty0=38400n8r");
>  prom_set(prom_buf, index++, NULL);
>
> @@ -303,10 +304,10 @@ static void mips_fulong2e_init(MachineState *machine)
>  qemu_register_reset(main_cpu_reset, cpu);
>
>  /* fulong 2e has 256M ram. */
> -ram_size = 256 * 1024 * 1024;
> +ram_size = 256 * M_BYTE;
>
>  /* fulong 2e has a 1M flash.Winbond W39L040AP70Z */
> -bios_size = 1024 * 1024;
> +bios_size = 1 * M_BYTE;
>
>  /* allocate RAM */
>  memory_region_allocate_system_memory(ram, NULL, "fulong2e.ram", 
> ram_size);
> diff --git a/hw/mips/mips_malta.c b/hw/mips/mips_malta.c
> index f6513a4fd5..240dd762be 100644
> --- a/hw/mips/mips_malta.c
> +++ b/hw/mips/mips_malta.c
> @@ -23,6 +23,7 @@
>   */
>
>  #include "qemu/osdep.h"
> +#include "qemu/units.h"
>  #include "qemu-common.h"
>  #include "cpu.h"
>  #include "hw/hw.h"
> @@ -844,7 +845,8 @@ static int64_t load_kernel (void)
>  /* The kernel allocates the bootmap memory in the low memory 
> after
> the initrd.  It takes at most 128kiB for 2GB RAM and 4kiB
> pages.  */
> -initrd_offset = (loaderparams.ram_low_size - initrd_size - 131072
> +initrd_offset = (loaderparams.ram_low_size - initrd_size
> + - (128 * K_BYTE)
>   - ~INITRD_PAGE_MASK) & INITRD_PAGE_MASK;
>  if (kernel_high >= initrd_offset) {
>  error_report("memory too small for initial ram disk '%s'",
> @@ -1022,9 +1024,9 @@ void mips_malta_init(MachineState *machine)
>  mips_create_cpu(s, machine->cpu_type, _irq, _irq);
>
>  /* allocate RAM */
> -if (ram_size > (2048u << 20)) {
> -error_report("Too much memory for

Re: [Qemu-devel] [PATCH for-2.12?] m25p80: Correct the at25128a/at25256a eeproms size

2018-04-16 Thread Alistair Francis

On Mon, Apr 16, 2018 at 9:05 AM, mar.krzeminski
 wrote:
> W dniu 15.04.2018 o 22:31, Philippe Mathieu-Daudé pisze:
>>
>> >From the datasheet (3368J–SEEPR) description:
>>  The AT25128A/256A provides 131,072/262,144 bits of serial
>>  electrically-erasable programmable read only memory (EEPROM)
>>  organized as 16,384/32,768 words of 8 bits each.
>>
>> However QEMU models the flash size in bytes.
>> Correct the at25128a/at25256a entries to reflect the datasheet size.
>>
>> Signed-off-by: Philippe Mathieu-Daudé 
>
> Reviewed-by: Marcin Krzeminski 

Reviewed-by: Alistair Francis 

Alistair

>>
>> ---
>> This is not a regression, so can now wait for 2.13.
>>
>>   hw/block/m25p80.c | 4 ++--
>>   1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/hw/block/m25p80.c b/hw/block/m25p80.c
>> index b49c8e9caa..d0b9fbfc50 100644
>> --- a/hw/block/m25p80.c
>> +++ b/hw/block/m25p80.c
>> @@ -196,8 +196,8 @@ static const FlashPartInfo known_devices[] = {
>>   /* Atmel EEPROMS - it is assumed, that don't care bit in command
>>* is set to 0. Block protection is not supported.
>>*/
>> -{ INFO("at25128a-nonjedec", 0x0, 0, 1, 131072, EEPROM) },
>> -{ INFO("at25256a-nonjedec", 0x0, 0, 1, 262144, EEPROM) },
>> +{ INFO("at25128a-nonjedec", 0x0, 0, 1, 16384, EEPROM) },
>> +{ INFO("at25256a-nonjedec", 0x0, 0, 1, 32768, EEPROM) },
>
> Thanks!
> Marcin
>
>> /* EON -- en25xxx */
>>   { INFO("en25f32", 0x1c3116,  0,  64 << 10,  64, ER_4K) },
>
>
>

Re: [Qemu-devel] [PATCH] WHPX fixes an issue with CPUID 1 not returning CPUID_EXT_HYPERVISOR

2018-04-16 Thread Justin Terry (VM) via Qemu-devel

Hey Eduardo/Paolo,

I have not forgotten about your responses. I am working out how best to do this 
in our platform and will send a follow up patch (this one is already merged) to 
fully support the -cpu flag. It looks like all the pieces are in place between 
the two and we just need a bit of translation between the QEMU state at 
partition creation/start/execution to handle the various exits either by 
pre-setting the value or via the actual CPUID trap at runtime. Thank you for 
all your insights/input up to here. It has been really helpful. More to come.

Thanks,
Justin

> -Original Message-
> From: Eduardo Habkost 
> Sent: Monday, April 16, 2018 9:33 AM
> To: Paolo Bonzini 
> Cc: Justin Terry (VM) ; qemu-devel@nongnu.org;
> r...@twiddle.net
> Subject: Re: [PATCH] WHPX fixes an issue with CPUID 1 not returning
> CPUID_EXT_HYPERVISOR
> 
> On Thu, Apr 05, 2018 at 05:50:20PM +0200, Paolo Bonzini wrote:
> > On 28/03/2018 22:48, Justin Terry (VM) wrote:
> [...]
> > > If we use [2] to inject the answers at creation time WHPX needs
> > > access to the CPUX86State at accel init which also doesn't seem to
> > > be possible in QEMU today. WHPX could basically just call
> > > cpu_x86_cpuid() for each CPUID QEMU cares about and plumb the
> answer
> > > before start. This has the best performance as we avoid the
> > > additional exits but has an issue in that the results must be known ahead
> of time.
> >
> > The earliest where you have access to that is x86_cpu_initfn.
> 
> x86_cpu_initfn() is the earliest you have access to the CPU object, but note
> that the final CPUID bits (based on -cpu options, accel data, and possibly
> other input) are known only when x86_cpu_realizefn() is called.
> 
> --
> Eduardo

Re: [Qemu-devel] [RFC PATCH V4 3/4] vfio: Add SaveVMHanlders for VFIO device to support live migration

2018-04-16 Thread Alex Williamson

On Tue, 10 Apr 2018 14:03:13 +0800
Yulei Zhang  wrote:

> Instead of using vm state description, add SaveVMHandlers for VFIO
> device to support live migration.
> 
> Introduce new Ioctl VFIO_DEVICE_GET_DIRTY_BITMAP to fetch the memory
> bitmap that dirtied by vfio device during the iterative precopy stage
> to shorten the system downtime afterward.
> 
> For vfio pci device status migrate, during the system downtime, it will
> save the following states
> 1. pci configuration space addr0~addr5
> 2. pci configuration space msi_addr msi_data
> 3. pci device status fetch from device driver
> 
> And on the target side the vfio_load will restore the same states
> 1. re-setup the pci bar configuration
> 2. re-setup the pci device msi configuration
> 3. restore the pci device status

Interrupts are configured via ioctl, but I don't see any code here to
configure the device into the correct interrupt state.  How do we know
the target device is compatible with the source device?  Do we rely on
the vendor driver to implicitly include some kind of device and version
information and fail at the very end of the migration?  It seems like
we need to somehow front-load that sort of device compatibility
checking since a vfio-pci device can be anything (ex. what happens if
a user tries to migrate a GVT-g vGPU to an NVIDIA vGPU?).  Thanks,

Alex

Re: [Qemu-devel] [RFC PATCH V4 2/4] vfio: Add vm status change callback to stop/restart the mdev device

2018-04-16 Thread Alex Williamson

On Mon, 16 Apr 2018 20:14:27 +0530
Kirti Wankhede  wrote:

> On 4/10/2018 11:32 AM, Yulei Zhang wrote:
> > VM status change handler is added to change the vfio pci device
> > status during the migration, write the demanded device status
> > to the DEVICE STATUS subregion to stop the device on the source side
> > before fetch its status and start the deivce on the target side
> > after restore its status.
> > 
> > Signed-off-by: Yulei Zhang 
> > ---
> >  hw/vfio/pci.c | 20 
> >  include/hw/vfio/vfio-common.h |  1 +
> >  linux-headers/linux/vfio.h|  6 ++
> >  roms/seabios  |  2 +-
> >  4 files changed, 28 insertions(+), 1 deletion(-)
> > 
> > diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> > index f98a9dd..13d8c73 100644
> > --- a/hw/vfio/pci.c
> > +++ b/hw/vfio/pci.c
> > @@ -38,6 +38,7 @@
> >  
> >  static void vfio_disable_interrupts(VFIOPCIDevice *vdev);
> >  static void vfio_mmap_set_enabled(VFIOPCIDevice *vdev, bool enabled);
> > +static void vfio_vm_change_state_handler(void *pv, int running, RunState 
> > state);
> >  
> >  /*
> >   * Disabling BAR mmaping can be slow, but toggling it around INTx can
> > @@ -2896,6 +2897,7 @@ static void vfio_realize(PCIDevice *pdev, Error 
> > **errp)
> >  vfio_register_err_notifier(vdev);
> >  vfio_register_req_notifier(vdev);
> >  vfio_setup_resetfn_quirk(vdev);
> > +qemu_add_vm_change_state_handler(vfio_vm_change_state_handler, vdev);
> >  
> >  return;
> >  
> > @@ -2982,6 +2984,24 @@ post_reset:
> >  vfio_pci_post_reset(vdev);
> >  }
> >  
> > +static void vfio_vm_change_state_handler(void *pv, int running, RunState 
> > state)
> > +{
> > +VFIOPCIDevice *vdev = pv;
> > +VFIODevice *vbasedev = >vbasedev;
> > +uint8_t dev_state;
> > +uint8_t sz = 1;
> > +
> > +dev_state = running ? VFIO_DEVICE_START : VFIO_DEVICE_STOP;
> > +
> > +if (pwrite(vdev->vbasedev.fd, _state,
> > +   sz, vdev->device_state.offset) != sz) {
> > +error_report("vfio: Failed to %s device", running ? "start" : 
> > "stop");
> > +return;
> > +}
> > +
> > +vbasedev->device_state = dev_state;
> > +}
> > +  
> 
> Is it expected to trap device_state region by vendor driver?
> Can this information be communicated to vendor driver through an ioctl?

Either the mdev vendor driver or vfio bus driver (ie. vfio-pci) would
be providing REGION_INFO for this region, so the vendor driver is
already in full control here using existing ioctls.  I don't see that
we need new ioctls, we just need to fully define the API of the
proposed regions here.

> Here only device state is conveyed to vendor driver but knowing
> 'RunState' in vendor driver is very useful and vendor driver can take
> necessary action accordingly like RUN_STATE_PAUSED indicating that VM is
> in paused state, similarly RUN_STATE_SUSPENDED,
> RUN_STATE_FINISH_MIGRATE, RUN_STATE_IO_ERROR. If these states are
> handled properly, all the cases can be supported with same interface
> like VM suspend-resume, VM pause-restore.

I agree, but let's remember that we're talking about device state, not
VM state.  vfio is a userspace device interface, not specifically a
virtual machine interface, so states should be in terms of the device.
The API of this region needs to be clearly defined and using only 1
byte at the start of the region is not very forward looking.  Thanks,

Alex

> >  static void vfio_instance_init(Object *obj)
> >  {
> >  PCIDevice *pci_dev = PCI_DEVICE(obj);
> > diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
> > index f3a2ac9..9c14a8f 100644
> > --- a/include/hw/vfio/vfio-common.h
> > +++ b/include/hw/vfio/vfio-common.h
> > @@ -125,6 +125,7 @@ typedef struct VFIODevice {
> >  unsigned int num_irqs;
> >  unsigned int num_regions;
> >  unsigned int flags;
> > +bool device_state;
> >  } VFIODevice;
> >  
> >  struct VFIODeviceOps {
> > diff --git a/linux-headers/linux/vfio.h b/linux-headers/linux/vfio.h
> > index e3380ad..8f02f2f 100644
> > --- a/linux-headers/linux/vfio.h
> > +++ b/linux-headers/linux/vfio.h
> > @@ -304,6 +304,12 @@ struct vfio_region_info_cap_type {
> >  /* Mdev sub-type for device state save and restore */
> >  #define VFIO_REGION_SUBTYPE_DEVICE_STATE   (4)
> >  
> > +/* Offset in region to save device state */
> > +#define VFIO_DEVICE_STATE_OFFSET   1
> > +
> > +#define VFIO_DEVICE_START  0
> > +#define VFIO_DEVICE_STOP   1
> > +
> >  /**
> >   * VFIO_DEVICE_GET_IRQ_INFO - _IOWR(VFIO_TYPE, VFIO_BASE + 9,
> >   * struct vfio_irq_info)
> > diff --git a/roms/seabios b/roms/seabios
> > index 63451fc..5f4c7b1 16
> > --- a/roms/seabios
> > +++ b/roms/seabios
> > @@ -1 +1 @@
> > -Subproject commit 63451fca13c75870e1703eb3e20584d91179aebc
> > +Subproject commit 5f4c7b13cdf9c450eb55645f4362ea58fa61b79b
> >

Re: [Qemu-devel] [RFC PATCH V4 1/4] vfio: introduce a new VFIO subregion for mdev device migration support

2018-04-16 Thread Alex Williamson

On Mon, 16 Apr 2018 20:14:03 +0530
Kirti Wankhede  wrote:

> On 4/10/2018 11:32 AM, Yulei Zhang wrote:
> > New VFIO sub region VFIO_REGION_SUBTYPE_DEVICE_STATE is added
> > to fetch and restore the status of mdev device vGPU during the
> > live migration.
> > 
> > Signed-off-by: Yulei Zhang 
> > ---
> >  hw/vfio/pci.c  | 25 -
> >  hw/vfio/pci.h  |  2 ++
> >  linux-headers/linux/vfio.h |  9 ++---
> >  3 files changed, 32 insertions(+), 4 deletions(-)
> > 
> > diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> > index c977ee3..f98a9dd 100644
> > --- a/hw/vfio/pci.c
> > +++ b/hw/vfio/pci.c
> > @@ -32,6 +32,7 @@
> >  #include "pci.h"
> >  #include "trace.h"
> >  #include "qapi/error.h"
> > +#include "migration/blocker.h"
> >  
> >  #define MSIX_CAP_LENGTH 12
> >  
> > @@ -2821,6 +2822,25 @@ static void vfio_realize(PCIDevice *pdev, Error 
> > **errp)
> >  vfio_vga_quirk_setup(vdev);
> >  }
> >  
> > +struct vfio_region_info *device_state;
> > +/* device state region setup */
> > +if (!vfio_get_dev_region_info(>vbasedev,
> > +VFIO_REGION_TYPE_PCI_VENDOR_TYPE,

This is not how VENDOR_TYPE is meant to be used.  We have a 32-bit type
and 32-bit sub-type.  When bit 31 of type is set (ie. VENDOR_TYPE),
then the low 16-bits (VENDOR_MASK) defines a vendor specific set of
sub-types.  Using it as above, you're stomping on vendor 0x's
vendor sub-types.  I would expect non-vendor specific types to leave
bit 31 of the type field clear.  We could then define a universal type
for device state with a sub-type specifying the API such that we could
revise the API by providing an updated sub-type.  Or perhaps there are
devices that want separate save vs restore regions, we could do that
with the sub-type.

> > +VFIO_REGION_SUBTYPE_DEVICE_STATE, _state)) {
> > +memcpy(>device_state, device_state,
> > +   sizeof(struct vfio_region_info));
> > +g_free(device_state);
> > +} else {
> > +error_setg(>migration_blocker,
> > +"Migration disabled: cannot support device state region");
> > +migrate_add_blocker(vdev->migration_blocker, );

This appears as if it's going to be rather verbose and generate errors
for anything not supporting migration, which is currently everything.
Maybe there should be an OnOffAuto vfio-pci device option for the user
to specify migration such that if migration=on is specified the device
will fail if it's not available.  Otherwise the default would be auto.

> > +if (err) {
> > +error_propagate(errp, err);
> > +error_free(vdev->migration_blocker);
> > +goto error;
> > +}
> > +}
> > +  
> 
> I think there should be a _PROBE ioctl before trying to setup region for
> migration. If vfio device driver or vendor driver for mdev device
> supports migration capability, _PROBE ioctl should return success along
> with region's information  which can be used to setup the
> region.

How is _PROBE different from _REGION_INFO which already exists to tell
us information about the region?

> >  for (i = 0; i < PCI_ROM_SLOT; i++) {
> >  vfio_bar_quirk_setup(vdev, i);
> >  }
> > @@ -2884,6 +2904,10 @@ out_teardown:
> >  vfio_teardown_msi(vdev);
> >  vfio_bars_exit(vdev);
> >  error:
> > +if (vdev->migration_blocker) {
> > +migrate_del_blocker(vdev->migration_blocker);
> > +error_free(vdev->migration_blocker);
> > +}
> >  error_prepend(errp, ERR_PREFIX, vdev->vbasedev.name);
> >  }
> >  
> > @@ -3009,7 +3033,6 @@ static Property vfio_pci_dev_properties[] = {
> >  
> >  static const VMStateDescription vfio_pci_vmstate = {
> >  .name = "vfio-pci",
> > -.unmigratable = 1,  
> 
> Based on the result of above _PROBE ioctl, 'unmigratable' should be set
> if vendor driver doesn't support migration capability.

.unmigratable isn't really dynamically settable aiui, thus the
existence of the migration blocker.

> >  };
> >  
> >  static void vfio_pci_dev_class_init(ObjectClass *klass, void *data)
> > diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
> > index 502a575..0ee1724 100644
> > --- a/hw/vfio/pci.h
> > +++ b/hw/vfio/pci.h
> > @@ -116,6 +116,8 @@ typedef struct VFIOPCIDevice {
> >  VFIOBAR bars[PCI_NUM_REGIONS - 1]; /* No ROM */
> >  VFIOVGA *vga; /* 0xa, 0x3b0, 0x3c0 */
> >  void *igd_opregion;
> > +struct vfio_region_info device_state;
> > +Error *migration_blocker;
> >  PCIHostDeviceAddress host;
> >  EventNotifier err_notifier;
> >  EventNotifier req_notifier;
> > diff --git a/linux-headers/linux/vfio.h b/linux-headers/linux/vfio.h
> > index 4312e96..e3380ad 100644
> > --- a/linux-headers/linux/vfio.h
> > +++ b/linux-headers/linux/vfio.h
> > @@ -297,9 +297,12 @@ struct vfio_region_info_cap_type {
> >  #define VFIO_REGION_TYPE_PCI_VENDOR_MASK   (0x)
> >  
> >

Re: [Qemu-devel] [PATCH v1 23/24] Makefile.target: add (clean-)guest-tests targets

2018-04-16 Thread Alex Bennée


Philippe Mathieu-Daudé  writes:

> On 04/16/2018 05:53 AM, Alex Bennée wrote:
>>
>> Philippe Mathieu-Daudé  writes:
>>
>>> On 04/10/2018 04:39 PM, Alex Bennée wrote:
 Now all the build infrastructure is in place we can build tests for
 each guest that we support. That support mainly depends on having
 cross compilers installed or docker setup. To keep all the logic for
 that together we put the rules in tests/tcg/Makefile.include and
 include it from the main Makefile.target.

 Signed-off-by: Alex Bennée 
 ---
  Makefile.target|  5 +++
  tests/tcg/Makefile.include | 79 
 ++
  2 files changed, 84 insertions(+)
  create mode 100644 tests/tcg/Makefile.include

 diff --git a/Makefile.target b/Makefile.target
 index d0ec77a307..a30fd40257 100644
 --- a/Makefile.target
 +++ b/Makefile.target
 @@ -36,6 +36,11 @@ endif
  PROGS=$(QEMU_PROG) $(QEMU_PROGW)
  STPFILES=

 +# Makefile Tests
 +ifdef CONFIG_USER_ONLY
 +include $(SRC_PATH)/tests/tcg/Makefile.include
 +endif
 +
  config-target.h: config-target.h-timestamp
  config-target.h-timestamp: config-target.mak

 diff --git a/tests/tcg/Makefile.include b/tests/tcg/Makefile.include
 new file mode 100644
 index 00..cb8bb36026
 --- /dev/null
 +++ b/tests/tcg/Makefile.include
 @@ -0,0 +1,79 @@
 +# -*- Mode: makefile -*-
 +#
 +# TCG tests (per-target rules)
 +#
 +# This Makefile fragement is included from the per-target
 +# Makefile.target so will be invoked for each linux-user program we
 +# build. We have two options for compiling, either using a configured
 +# guest compiler or calling one of our docker images to do it for us.
 +#
 +
 +# The per ARCH makefile, if it exists holds extra information about
 +# useful docker images or alternative compiler flags. Include it if it
 +# exists
 +
 +ARCH_MAKEFILE=$(SRC_PATH)/tests/tcg/$(TARGET_NAME)/Makefile.include
 +CHECK_INCLUDE=$(wildcard $(ARCH_MAKEFILE))
 +
 +ifeq ($(ARCH_MAKEFILE),$(CHECK_INCLUDE))
 +include $(ARCH_MAKEFILE)
 +endif
 +
 +GUEST_BUILD=
 +
 +# Support installed Cross Compilers
 +
 +ifdef CROSS_CC_GUEST
 +
 +.PHONY: cross-build-guest-tests
 +cross-build-guest-tests:
 +  $(call quiet-command, \
 +  (mkdir -p tests && cd tests && \
 + make -f $(SRC_PATH)/tests/tcg/Makefile ARCH=$(TARGET_NAME) 
 CC=$(CROSS_CC_GUEST)), \
 +"CROSS-BUILD","$(TARGET_NAME) guest-tests with $(CROSS_CC_GUEST)")
 +
 +
 +GUEST_BUILD=cross-build-guest-tests
 +
 +endif
 +
 +# Support building with Docker
 +
 +ifeq ($(HAVE_USER_DOCKER)$(GUEST_BUILD),y)
 +ifneq ($(DOCKER_IMAGE),)
 +
 +# We also need the Docker make rules to depend on
 +include $(SRC_PATH)/tests/docker/Makefile.include
 +
>>>
>>> Eventually:
>>>
>>> DOCKER_CROSS_COMPILER ?= $(DOCKER_CROSS_COMPILER_PREFIX)-gcc
>>>
 +DOCKER_COMPILE_CMD="$(DOCKER_SCRIPT) cc --user $(shell id -u) \
 +  --cc $(DOCKER_CROSS_COMPILER) \
 +  -i qemu:$(DOCKER_IMAGE) \
 +  -s $(SRC_PATH) -- "
>>>
>>> Should we check $(DOCKER_CROSS_COMPILER) is set?
>>
>> Currently it's implied. It would be nice if there was an easy way just
>> to fish it out of the env of the docker image as they already have it
>> set there.
>>
>> I've been trying quite hard to ensure all the tests can just use LD (see
>> the hoops I jump in i386 for this). Maybe a prefix config makes more
>> sense though
>
> We might also need some CROSS_CFLAGS, i.e. for armeb we need to use
> -mbig-endian. This would go in
> tests/tcg/$(TARGET_NAME)/Makefile.target

Yeah we can fish that out from config-target.mak

However I've run into problems with compiling things with -mbig-endian
as it forces the inclusion of something that seems to be missing in the
glibc-dev:


aarch64-linux-gnu-gcc -mbig-endian -o config-temp/qemu-conf.exe 
config-temp/qemu-conf.c -static
In file included from /usr/aarch64-linux-gnu/include/features.h:391:0,
 from /usr/aarch64-linux-gnu/include/stdio.h:27,
 from config-temp/qemu-conf.c:1:
/usr/aarch64-linux-gnu/include/gnu/stubs.h:11:32: fatal error: 
gnu/stubs-lp64_be.h: No such file or directory
compilation terminated.

And they aren't packaged if the distro doesn't support the arch.

>
>>
>>>
 +DOCKER_PREREQ=docker-image-$(DOCKER_IMAGE)
 +
 +.PHONY: docker-build-guest-tests
 +docker-build-guest-tests: $(DOCKER_PREREQ)
 +  $(call quiet-command, \
 +  (mkdir -p tests && cd tests && \
 + make -f $(SRC_PATH)/tests/tcg/Makefile ARCH=$(TARGET_NAME) 
 CC=$(DOCKER_COMPILE_CMD)), \
 +"CROSS-BUILD","$(TARGET_NAME) guest-tests with docker

Re: [Qemu-devel] [prefix=PATCH for-2.12?] linux-user: check that all of AArch64 SVE extended sigframe is writable

2018-04-16 Thread Richard Henderson

On 04/16/2018 05:19 AM, Peter Maydell wrote:
> In commit 8c5931de0ac7738809 we added support for SVE extended
> sigframe records.  These mean that the signal frame might now be
> larger than the size of the target_rt_sigframe record, so make sure
> we call lock_user on the entire frame size when we're creating it.
> (The code for restoring the signal frame already correctly handles
> the extended records by locking the 'extra' section separately to the
> main section.)
> 
> In particular, this fixes a bug even for non-SVE signal frames,
> because it extends the locked section to cover the
> target_rt_frame_record. Previously this was part of 'struct
> target_rt_sigframe', but in commit e1eecd1d9d4c1ade3 we pulled
> it out into its own struct, and so locking the target_rt_sigframe
> alone doesn't cover it. This bug would mean that we would fail
> to correctly handle the case where a signal was taken with
> SP pointing 16 bytes into an unwritable page, with the page
> immediately below it in memory being writable.
> 
> Signed-off-by: Peter Maydell 
> ---
> The requirements to trigger the bug sound implausible, except
> that the stack page might be unwritable because we just
> executed some trampoline code from it, so perhaps not so
> unlikely as it first seems? Not sure whether to put into 2.12
> or not...
> ---
>  linux-user/signal.c | 7 ---
>  1 file changed, 4 insertions(+), 3 deletions(-)

Yes, let's just go ahead and fix this all properly now.
Reviewed-by: Richard Henderson 


r~

[Qemu-devel] [PATCH] dump: add Windows dump format to dump-guest-memory

2018-04-16 Thread Viktor Prutyanov

This patch adds Windows crashdumping feature. Now QEMU can produce crashdump
file understandable for WinDbg. The crashdump will be obtained by joining
physical memory dump and 8K header exposed through vmcoreinfo/fw_cfg device
by guest driver at BSOD time. Option '-w' was added to dump-guest-memory
command. At the moment, only x64 configuration is supported.
Suitable driver can be found at
https://github.com/virtio-win/kvm-guest-drivers-windows/tree/master/fwcfg64

Signed-off-by: Viktor Prutyanov 
---
 Makefile.target |   1 +
 dump.c  |  24 +++-
 hmp-commands.hx |  13 ++--
 hmp.c   |   9 ++-
 qapi/misc.json  |   4 +-
 win_dump.c  | 182 
 win_dump.h  |  86 ++
 7 files changed, 310 insertions(+), 9 deletions(-)
 create mode 100644 win_dump.c
 create mode 100644 win_dump.h

diff --git a/Makefile.target b/Makefile.target
index 6549481096..f47ae7187e 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -138,6 +138,7 @@ obj-y += hw/
 obj-y += memory.o
 obj-y += memory_mapping.o
 obj-y += dump.o
+obj-y += win_dump.o
 obj-y += migration/ram.o
 LIBS := $(libs_softmmu) $(LIBS)
 
diff --git a/dump.c b/dump.c
index 6bdb0dbe23..961f26a7ee 100644
--- a/dump.c
+++ b/dump.c
@@ -29,6 +29,10 @@
 #include "qemu/error-report.h"
 #include "hw/misc/vmcoreinfo.h"
 
+#ifdef TARGET_X86_64
+#include "win_dump.h"
+#endif
+
 #include 
 #ifdef CONFIG_LZO
 #include 
@@ -1861,7 +1865,11 @@ static void dump_process(DumpState *s, Error **errp)
 Error *local_err = NULL;
 DumpQueryResult *result = NULL;
 
-if (s->has_format && s->format != DUMP_GUEST_MEMORY_FORMAT_ELF) {
+if (s->has_format && s->format == DUMP_GUEST_MEMORY_FORMAT_WIN_DMP) {
+#ifdef TARGET_X86_64
+create_win_dump(s, _err);
+#endif
+} else if (s->has_format && s->format != DUMP_GUEST_MEMORY_FORMAT_ELF) {
 create_kdump_vmcore(s, _err);
 } else {
 create_vmcore(s, _err);
@@ -1965,6 +1973,13 @@ void qmp_dump_guest_memory(bool paging, const char *file,
 }
 #endif
 
+#ifndef TARGET_X86_64
+if (has_format && format == DUMP_GUEST_MEMORY_FORMAT_WIN_DMP) {
+error_setg(errp, "Windows dump is only available for x86-64");
+return;
+}
+#endif
+
 #if !defined(WIN32)
 if (strstart(file, "fd:", )) {
 fd = monitor_get_fd(cur_mon, p, errp);
@@ -2039,5 +2054,12 @@ DumpGuestMemoryCapability 
*qmp_query_dump_guest_memory_capability(Error **errp)
 item->value = DUMP_GUEST_MEMORY_FORMAT_KDUMP_SNAPPY;
 #endif
 
+/* Windows dump is available only if target is x86_64 */
+#ifdef TARGET_X86_64
+item->next = g_malloc0(sizeof(DumpGuestMemoryFormatList));
+item = item->next;
+item->value = DUMP_GUEST_MEMORY_FORMAT_WIN_DMP;
+#endif
+
 return cap;
 }
diff --git a/hmp-commands.hx b/hmp-commands.hx
index 35d862a5d2..196aebea65 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -1088,30 +1088,33 @@ ETEXI
 
 {
 .name   = "dump-guest-memory",
-.args_type  = 
"paging:-p,detach:-d,zlib:-z,lzo:-l,snappy:-s,filename:F,begin:i?,length:i?",
-.params = "[-p] [-d] [-z|-l|-s] filename [begin length]",
+.args_type  = 
"paging:-p,detach:-d,windmp:-w,zlib:-z,lzo:-l,snappy:-s,filename:F,begin:i?,length:i?",
+.params = "[-p] [-d] [-z|-l|-s|-w] filename [begin length]",
 .help   = "dump guest memory into file 'filename'.\n\t\t\t"
   "-p: do paging to get guest's memory mapping.\n\t\t\t"
   "-d: return immediately (do not wait for 
completion).\n\t\t\t"
   "-z: dump in kdump-compressed format, with zlib 
compression.\n\t\t\t"
   "-l: dump in kdump-compressed format, with lzo 
compression.\n\t\t\t"
   "-s: dump in kdump-compressed format, with snappy 
compression.\n\t\t\t"
+  "-w: dump in Windows crashdump format,\n\t\t\t"
+  "for Windows x64 guests with vmcoreinfo driver 
only.\n\t\t\t"
   "begin: the starting physical address.\n\t\t\t"
   "length: the memory size, in bytes.",
 .cmd= hmp_dump_guest_memory,
 },
 
-
 STEXI
 @item dump-guest-memory [-p] @var{filename} @var{begin} @var{length}
-@item dump-guest-memory [-z|-l|-s] @var{filename}
+@item dump-guest-memory [-z|-l|-s|-w] @var{filename}
 @findex dump-guest-memory
 Dump guest memory to @var{protocol}. The file can be processed with crash or
-gdb. Without -z|-l|-s, the dump format is ELF.
+gdb. Without -z|-l|-s|-w, the dump format is ELF.
 -p: do paging to get guest's memory mapping.
 -z: dump in kdump-compressed format, with zlib compression.
 -l: dump in kdump-compressed format, with lzo compression.
 -s: dump in kdump-compressed format, with snappy compression.
+-w: dump in Windows crashdump format,
+

Re: [Qemu-devel] [PATCH] fpu/softfloat: check for Inf / x or 0 / x before /0

2018-04-16 Thread Richard Henderson

On 04/16/2018 03:54 AM, Alex Bennée wrote:
> +/* Inf / x or 0 / x */
> +if (a.cls == float_class_inf || a.cls == float_class_zero) {
> +a.sign = sign;
> +return a;
> +}

0/0 should raise an exception.

I find inf/0 non-intuitive, but there ya go.

r~

[Qemu-devel] Patch for ARM memory barriers and getdent

2018-04-16 Thread Henry Wertz

Please find submitted a patch for ARM memory barriers.  This patch is
against qemu-2.12-rc2 but I do believe it should apply for anything from
2.11.x to current.

I found with qemu 2.11.x or newer that I would get an illegal instruction
error running some Intel binaries on my ARM chromebook.  On investigation,
I found it was quitting on memory barriers.
qemu instruction:
mb $0x31
was translating as:
0x604050cc:  5bf07ff5  blpl #0x600250a8

After patch it gives:
0x604050cc:  f57ff05b  dmb  ish

In short, I found INSN_DMB_ISH (memory barrier for ARMv7) appeared to be
correct based on online docs, but due to some endian-related shenanigans it
had to be byte-swapped to suit qemu; it appears INSN_DMB_MCR (memory
barrier for ARMv6) also should be byte swapped  (and this patch does so).
I have not checked for correctness of aarch64's barrier instruction.
---
Please find submitted a patch for getdents (this system call stands for
"get directory entries", it is passed a file descriptor pointing to a
directory and returns a struct with info on the entries in that
directory.)  This patch is against qemu-2.10 series but continues to apply
cleanly on current as of April 15 2018.  If you are running a 32-bit binary
on 64-bit target current qemu will convert he getdents struct, but it will
NOT in the case of 64-bit binary on 32-bit host.  (In my case, I had
android SDK's "aapt" for x86-64 error out on 32-bit ARM without this patch,
and run fine with it.)  This patch simply replaces a "#if TARGET_ABI_BITS
== 32 && HOST_LONG_BITS == 64" with "#if TARGET_ABI_BITS != HOST_LONG_BITS".

Thanks!
*** tcg/arm/tcg-target.inc.c.orig	2018-04-04 15:28:50.0 -0500
--- tcg/arm/tcg-target.inc.c	2018-04-16 12:55:04.917518898 -0500
***
*** 158,167 
  INSN_LDRD_REG  = 0x00d0,
  INSN_STRD_IMM  = 0x004000f0,
  INSN_STRD_REG  = 0x00f0,
  
! INSN_DMB_ISH   = 0x5bf07ff5,
! INSN_DMB_MCR   = 0xba0f07ee,
  
  /* Architected nop introduced in v6k.  */
  /* ??? This is an MSR (imm) 0,0,0 insn.  Anyone know if this
 also Just So Happened to do nothing on pre-v6k so that we
--- 158,167 
  INSN_LDRD_REG  = 0x00d0,
  INSN_STRD_IMM  = 0x004000f0,
  INSN_STRD_REG  = 0x00f0,
  
! INSN_DMB_ISH   = 0xf57ff05b,
! INSN_DMB_MCR   = 0xee070fba,
  
  /* Architected nop introduced in v6k.  */
  /* ??? This is an MSR (imm) 0,0,0 insn.  Anyone know if this
 also Just So Happened to do nothing on pre-v6k so that we
*** linux-user/syscall.c~	2017-03-04 10:31:14.0 -0600
--- linux-user/syscall.c	2017-03-07 17:08:24.615399116 -0600
***
*** 9913,9921 
  #endif
  #ifdef TARGET_NR_getdents
  case TARGET_NR_getdents:
  #ifdef __NR_getdents
! #if TARGET_ABI_BITS == 32 && HOST_LONG_BITS == 64
  {
  struct target_dirent *target_dirp;
  struct linux_dirent *dirp;
  abi_long count = arg3;
--- 9913,9921 
  #endif
  #ifdef TARGET_NR_getdents
  case TARGET_NR_getdents:
  #ifdef __NR_getdents
! #if TARGET_ABI_BITS != HOST_LONG_BITS
  {
  struct target_dirent *target_dirp;
  struct linux_dirent *dirp;
  abi_long count = arg3;

Re: [Qemu-devel] [PATCH] mux: fix ctrl-a b again

2018-04-16 Thread Daniel P . Berrangé

On Mon, Apr 16, 2018 at 07:28:28PM +0100, Peter Maydell wrote:
> On 16 April 2018 at 19:18, Marc-André Lureau
>  wrote:
> > Commit fb5e19d2e1472e96d72d5e4d89c20033f8ab345c originally fixed the
> > regression, but was inadvertently broken again in merge commit
> > 2d6752d38d8acda.
> >
> > Fixes:
> > https://bugs.launchpad.net/qemu/+bug/1654137
> >
> > Cc: qemu-sta...@nongnu.org
> > Signed-off-by: Marc-André Lureau 
> > ---
> >  chardev/char-mux.c | 1 +
> >  1 file changed, 1 insertion(+)
> >
> > diff --git a/chardev/char-mux.c b/chardev/char-mux.c
> > index 1b925c8dec..6055e76293 100644
> > --- a/chardev/char-mux.c
> > +++ b/chardev/char-mux.c
> > @@ -304,6 +304,7 @@ void mux_set_focus(Chardev *chr, int focus)
> >  }
> >
> >  d->focus = focus;
> > +chr->be = d->backends[focus];
> >  mux_chr_send_event(d, d->focus, CHR_EVENT_MUX_IN);
> >  }
> >
> > --
> > 2.17.0.rc1.36.gcedb63ea2f
> 
> Opinions welcome on whether this is a regression fix worth
> putting into rc4.

It is a regression, but a long standing one - we've been broken for quite
a while since 2.9.0 or even before.

If we're doing an rc4 anyway I'd suggest including it, but not the end of
the world if it has to go in via -stable given how long we've been broken
for.

Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|

Re: [Qemu-devel] [PATCH 0/3] Remove artificial length limits when parsing options

2018-04-16 Thread Daniel P . Berrangé

On Mon, Apr 16, 2018 at 08:13:42PM +0200, Markus Armbruster wrote:
> Daniel P. Berrangé  writes:
> 
> > On Mon, Apr 16, 2018 at 06:30:45PM +0200, Markus Armbruster wrote:
> >> Daniel P. Berrangé  writes:
> >> 
> >> > A user trying out SMBIOS "OEM strings" feature reported that the data
> >> > they are exposing to the guest was truncated at 1023 bytes, which breaks
> >> > the app consuming in the guest. After searching for the cause I
> >> > eventually found that the QemuOpts parsing is using fixed length 1024
> >> > byte array for option values and 128 byte array for key names.
> >> >
> >> > We can certainly debate whether it is sane to have such long command
> >> > line argument values (it is not sane), but if the OS was capable of
> >> > exec'ing QEMU with such an ARGV array, there is little good reason for
> >> > imposing an artificial length restriction when parsing it. Even worse is
> >> > that we silently truncate without reporting an error when hitting limits
> >> > resulting in a semantically incorrect behaviour, possibly even leading
> >> > to security flaws depending on the data that was truncated.
> >> >
> >> > Thus this patch series removes the artificial length limits by killing
> >> > the fixed length buffers.
> >> >
> >> > Separately I intend to make it possible to read "OEM strings" data from
> >> > a file, to avoid need to have long command line args.
> >> 
> >> Too bad I haven't been able to complete my quest to kill QemuOpts.
> >> 
> >> As far as I know, keyval.c's only arbitrary limit is the length of a key
> >> fragment (the things separated by '.').
> >
> > Looks like that's the same scenario I tried to address in patch 2. The
> > 'key' part in QemuOpts has the same 128 byte limit as in the keyval.c
> > code. I fear that could be hit with -blockdev when setting params on
> > very deeply nested block backends.  On the plus side  keyval.c actually
> > reports an error when it hits its 128 byte limit, instead of silently
> > carrying on as if all was well like QemuOpts did :-)
> 
> In keyval.c, the key (things like "a.b.c") can be arbitrarily long
> (well, until g_malloc() throws in the towel), but each key fragment
> ("a", "b" and "c") is limited to 128 bytes.
> 
> If key length was limited there, I would've asked you to fix it there,
> too.

Agreed, if only fragments are limited, that's fine because we know that
no code ever declares a key long enough to exceed the individual fragment
size.


Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|

Re: [Qemu-devel] [PATCH] mux: fix ctrl-a b again

2018-04-16 Thread Peter Maydell

On 16 April 2018 at 19:18, Marc-André Lureau
 wrote:
> Commit fb5e19d2e1472e96d72d5e4d89c20033f8ab345c originally fixed the
> regression, but was inadvertently broken again in merge commit
> 2d6752d38d8acda.
>
> Fixes:
> https://bugs.launchpad.net/qemu/+bug/1654137
>
> Cc: qemu-sta...@nongnu.org
> Signed-off-by: Marc-André Lureau 
> ---
>  chardev/char-mux.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/chardev/char-mux.c b/chardev/char-mux.c
> index 1b925c8dec..6055e76293 100644
> --- a/chardev/char-mux.c
> +++ b/chardev/char-mux.c
> @@ -304,6 +304,7 @@ void mux_set_focus(Chardev *chr, int focus)
>  }
>
>  d->focus = focus;
> +chr->be = d->backends[focus];
>  mux_chr_send_event(d, d->focus, CHR_EVENT_MUX_IN);
>  }
>
> --
> 2.17.0.rc1.36.gcedb63ea2f

Opinions welcome on whether this is a regression fix worth
putting into rc4.

thanks
-- PMM

[Qemu-devel] [PATCH] mux: fix ctrl-a b again

2018-04-16 Thread Marc-André Lureau

Commit fb5e19d2e1472e96d72d5e4d89c20033f8ab345c originally fixed the
regression, but was inadvertently broken again in merge commit
2d6752d38d8acda.

Fixes:
https://bugs.launchpad.net/qemu/+bug/1654137

Cc: qemu-sta...@nongnu.org
Signed-off-by: Marc-André Lureau 
---
 chardev/char-mux.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/chardev/char-mux.c b/chardev/char-mux.c
index 1b925c8dec..6055e76293 100644
--- a/chardev/char-mux.c
+++ b/chardev/char-mux.c
@@ -304,6 +304,7 @@ void mux_set_focus(Chardev *chr, int focus)
 }
 
 d->focus = focus;
+chr->be = d->backends[focus];
 mux_chr_send_event(d, d->focus, CHR_EVENT_MUX_IN);
 }
 
-- 
2.17.0.rc1.36.gcedb63ea2f

Re: [Qemu-devel] [PATCH 0/3] Remove artificial length limits when parsing options

2018-04-16 Thread Markus Armbruster

Daniel P. Berrangé  writes:

> On Mon, Apr 16, 2018 at 06:30:45PM +0200, Markus Armbruster wrote:
>> Daniel P. Berrangé  writes:
>> 
>> > A user trying out SMBIOS "OEM strings" feature reported that the data
>> > they are exposing to the guest was truncated at 1023 bytes, which breaks
>> > the app consuming in the guest. After searching for the cause I
>> > eventually found that the QemuOpts parsing is using fixed length 1024
>> > byte array for option values and 128 byte array for key names.
>> >
>> > We can certainly debate whether it is sane to have such long command
>> > line argument values (it is not sane), but if the OS was capable of
>> > exec'ing QEMU with such an ARGV array, there is little good reason for
>> > imposing an artificial length restriction when parsing it. Even worse is
>> > that we silently truncate without reporting an error when hitting limits
>> > resulting in a semantically incorrect behaviour, possibly even leading
>> > to security flaws depending on the data that was truncated.
>> >
>> > Thus this patch series removes the artificial length limits by killing
>> > the fixed length buffers.
>> >
>> > Separately I intend to make it possible to read "OEM strings" data from
>> > a file, to avoid need to have long command line args.
>> 
>> Too bad I haven't been able to complete my quest to kill QemuOpts.
>> 
>> As far as I know, keyval.c's only arbitrary limit is the length of a key
>> fragment (the things separated by '.').
>
> Looks like that's the same scenario I tried to address in patch 2. The
> 'key' part in QemuOpts has the same 128 byte limit as in the keyval.c
> code. I fear that could be hit with -blockdev when setting params on
> very deeply nested block backends.  On the plus side  keyval.c actually
> reports an error when it hits its 128 byte limit, instead of silently
> carrying on as if all was well like QemuOpts did :-)

In keyval.c, the key (things like "a.b.c") can be arbitrarily long
(well, until g_malloc() throws in the towel), but each key fragment
("a", "b" and "c") is limited to 128 bytes.

If key length was limited there, I would've asked you to fix it there,
too.

Re: [Qemu-devel] [PATCH 08/12] os-posix: Provide new -runas : facility

2018-04-16 Thread Markus Armbruster

Ian Jackson  writes:

> Markus Armbruster writes ("Re: [Qemu-devel] [PATCH 08/12] os-posix: Provide 
> new -runas : facility"):
>> Ian Jackson  writes:
>> > That would defer the getpwnam from argument parsing to os_setup_post.
>> > I think that's undesriable.
>> 
>> No argument.  But why can't os_parse_cmd_args() call getpwnam() as it
>> does now, then store user_pwd->pw_uid, ->pw_gid and ->pw_name instead of
>> user_pwd?  Store a null name when it parses the argument as UID:GID.
>
> Oh, I see.  It seems less obvious to me than what I have done, but I
> can do it like that if you like.

I just wanted to toss out the idea.  Please use your judgenment and do
it the way you like better.

[Qemu-devel] [Bug 1654137] Re: Ctrl-A b not working in 2.8.0

2018-04-16 Thread Thomas Huth

** Changed in: qemu
 Assignee: (unassigned) => elmarco (marcandre-lureau)

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1654137

Title:
  Ctrl-A b not working in 2.8.0

Status in QEMU:
  Confirmed

Bug description:
  With a recent update from 2.7.0 to 2.8.0 I have discovered that I can
  no longer send a "break" to the VM.  Ctrl-A b is simply ignored.
  Other Ctrl-A sequences seem to work correctly.

  This is on a NetBSD amd64 system, version 7.99.53, and qemu was
  installed on this system from source.

  Reverting to the previous install restores "break" capability.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1654137/+subscriptions

Re: [Qemu-devel] Purpose of memory-backend-file.discard-data

2018-04-16 Thread Eduardo Habkost

(CCing Zack Cornelius)

On Fri, Apr 13, 2018 at 08:21:26AM +0200, Michal Privoznik wrote:
> Eduardo et al,
> 
> I'm looking at 11ae6ed8affdd131e and I wanted to implement libvirt
> support for that. But more I look at it less I understand it. My
> understanding it is an optimization (although not very effective one
> since madvise() is/should be immediately followed by munmap()). So any
> application that is trying to keep track of guest memory  can stop doing
> so as soon as it sees munmap(). Or does the optimization lies in fact
> that madvise() is called sooner and thus the app can stop caring
> slightly sooner?

munmap() or close() are not enough: it would still make the host
kernel unnecessarily write data to backing storage.  The point of
the madvise() call is to tell the kernel that data can be
destroyed instead of being written back.

The original use case is described by Zack here:
https://www.mail-archive.com/qemu-devel@nongnu.org/msg473538.html


> 
> Also, I don't quite understand why is this configurable. What's the harm
> in turning it on by default? Yesterday I've posted some patches to
> libvirt list [1] (although I have to admit I am still not fully
> convinced about the design) and they implement just this - whenever qemu
> supports the feature libvirt turns it on.S

The option can destroy the data on the backing file, so QEMU can't enable it by
default.  libvirt can enable it as long as it knows it can destroy the data on
the backing file.

> 
> Michal
> 
> 1: https://www.redhat.com/archives/libvir-list/2018-April/msg01066.html

-- 
Eduardo

[Qemu-devel] [PATCH] tests: request access to locality 0 before sending command

2018-04-16 Thread Stefan Berger

Recent changes to the CRB code now require that access to locality 0
be requested before the locality can be used for sending a command.
This patch adds the request to access the locality.

Signed-off-by: Stefan Berger 
---
 tests/tpm-util.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/tests/tpm-util.c b/tests/tpm-util.c
index 4967a4e..6910503 100644
--- a/tests/tpm-util.c
+++ b/tests/tpm-util.c
@@ -25,6 +25,8 @@ void tpm_util_crb_transfer(QTestState *s,
 uint64_t caddr = qtest_readq(s, TPM_CRB_ADDR_BASE + A_CRB_CTRL_CMD_LADDR);
 uint64_t raddr = qtest_readq(s, TPM_CRB_ADDR_BASE + A_CRB_CTRL_RSP_ADDR);
 
+qtest_writeb(s, TPM_CRB_ADDR_BASE + A_CRB_LOC_CTRL, 1);
+
 qtest_memwrite(s, caddr, req, req_size);
 
 uint32_t sts, start = 1;
-- 
2.5.5

Re: [Qemu-devel] [PULL 0/1] i386: Don't automatically enable FEAT_KVM_HINTS bits

2018-04-16 Thread Peter Maydell

On 16 April 2018 at 18:46, Eduardo Habkost  wrote:
> On Mon, Apr 16, 2018 at 02:42:52PM -0300, Eduardo Habkost wrote:
>> On Mon, Apr 16, 2018 at 05:57:35PM +0100, Peter Maydell wrote:
>> > On 16 April 2018 at 17:39, Eduardo Habkost  wrote:
>> > > Last remaining fix for -rc4.
>> > >
>> > > The following changes since commit 
>> > > 042f6a31af3d38eefc6ec995cce1d762c41d4515:
>> > >
>> > >   Merge remote-tracking branch 
>> > > 'remotes/maxreitz/tags/pull-block-2018-04-16' into staging (2018-04-16 
>> > > 15:30:54 +0100)
>> > >
>> > > are available in the Git repository at:
>> > >
>> > >   git://github.com/ehabkost/qemu.git tags/x86-next-pull-request
>> > >
>> > > for you to fetch changes up to 0d914f39a77954bf5472b16130aeefe573a068f5:
>> > >
>> > >   i386: Don't automatically enable FEAT_KVM_HINTS bits (2018-04-16 
>> > > 13:36:52 -0300)
>> > >
>> > > 
>> > > i386: Don't automatically enable FEAT_KVM_HINTS bits
>> > >
>> > > Bug fix for "-cpu host" with newer kernels.
>> > >
>> > > 
>> >
>> > Is this a regression since 2.11 ?
>>
>> Yes.
>
> For reference, it was introduced by:
>
> commit be7773268d98176489483a315d3e2323cb0615b9
> Author: Wanpeng Li 
> Date:   Fri Feb 9 06:15:25 2018 -0800
>
> target-i386: add KVM_HINTS_DEDICATED performance hint
>
> Add KVM_HINTS_DEDICATED performance hint, guest checks this feature bit
> to determine if they run on dedicated vCPUs, allowing optimizations such
> as usage of qspinlocks.
>
> Sorry for not including a "Fixes:" tag in the commit message.  If
> you think it's worth it, I can redo the pull request with it.

No, that's fine. I'm just checking that we're all on the same page
about the level of fix that goes into a late-stage rc. Particularly
as we get past rc2 or so, it's definitely helpful to me if pull
request cover letters include justifications for why the changes
should go in.

thanks
-- PMM

Re: [Qemu-devel] [PATCH] scripts/device-crash-test: Removed fixed CAN entries

2018-04-16 Thread Eduardo Habkost

On Mon, Apr 16, 2018 at 07:41:29PM +0200, Thomas Huth wrote:
> The CAN device crashes have been fixed with the commit
> 089eac81e1d34d202471c0a023284f47f4c5f00e already.
> 
> Signed-off-by: Thomas Huth 

Queued, thanks.

-- 
Eduardo

Re: [Qemu-devel] [PULL 0/1] i386: Don't automatically enable FEAT_KVM_HINTS bits

2018-04-16 Thread Eduardo Habkost

On Mon, Apr 16, 2018 at 02:42:52PM -0300, Eduardo Habkost wrote:
> On Mon, Apr 16, 2018 at 05:57:35PM +0100, Peter Maydell wrote:
> > On 16 April 2018 at 17:39, Eduardo Habkost  wrote:
> > > Last remaining fix for -rc4.
> > >
> > > The following changes since commit 
> > > 042f6a31af3d38eefc6ec995cce1d762c41d4515:
> > >
> > >   Merge remote-tracking branch 
> > > 'remotes/maxreitz/tags/pull-block-2018-04-16' into staging (2018-04-16 
> > > 15:30:54 +0100)
> > >
> > > are available in the Git repository at:
> > >
> > >   git://github.com/ehabkost/qemu.git tags/x86-next-pull-request
> > >
> > > for you to fetch changes up to 0d914f39a77954bf5472b16130aeefe573a068f5:
> > >
> > >   i386: Don't automatically enable FEAT_KVM_HINTS bits (2018-04-16 
> > > 13:36:52 -0300)
> > >
> > > 
> > > i386: Don't automatically enable FEAT_KVM_HINTS bits
> > >
> > > Bug fix for "-cpu host" with newer kernels.
> > >
> > > 
> > 
> > Is this a regression since 2.11 ?
> 
> Yes.

For reference, it was introduced by:

commit be7773268d98176489483a315d3e2323cb0615b9
Author: Wanpeng Li 
Date:   Fri Feb 9 06:15:25 2018 -0800

target-i386: add KVM_HINTS_DEDICATED performance hint

Add KVM_HINTS_DEDICATED performance hint, guest checks this feature bit
to determine if they run on dedicated vCPUs, allowing optimizations such
as usage of qspinlocks.

Sorry for not including a "Fixes:" tag in the commit message.  If
you think it's worth it, I can redo the pull request with it.

-- 
Eduardo

Re: [Qemu-devel] [PULL 0/1] i386: Don't automatically enable FEAT_KVM_HINTS bits

2018-04-16 Thread Eduardo Habkost

On Mon, Apr 16, 2018 at 05:57:35PM +0100, Peter Maydell wrote:
> On 16 April 2018 at 17:39, Eduardo Habkost  wrote:
> > Last remaining fix for -rc4.
> >
> > The following changes since commit 042f6a31af3d38eefc6ec995cce1d762c41d4515:
> >
> >   Merge remote-tracking branch 
> > 'remotes/maxreitz/tags/pull-block-2018-04-16' into staging (2018-04-16 
> > 15:30:54 +0100)
> >
> > are available in the Git repository at:
> >
> >   git://github.com/ehabkost/qemu.git tags/x86-next-pull-request
> >
> > for you to fetch changes up to 0d914f39a77954bf5472b16130aeefe573a068f5:
> >
> >   i386: Don't automatically enable FEAT_KVM_HINTS bits (2018-04-16 13:36:52 
> > -0300)
> >
> > 
> > i386: Don't automatically enable FEAT_KVM_HINTS bits
> >
> > Bug fix for "-cpu host" with newer kernels.
> >
> > 
> 
> Is this a regression since 2.11 ?

Yes.

-- 
Eduardo

[Qemu-devel] [PATCH] scripts/device-crash-test: Removed fixed CAN entries

2018-04-16 Thread Thomas Huth

The CAN device crashes have been fixed with the commit
089eac81e1d34d202471c0a023284f47f4c5f00e already.

Signed-off-by: Thomas Huth 
---
 scripts/device-crash-test | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/scripts/device-crash-test b/scripts/device-crash-test
index 5d17dc6..b3ce720 100755
--- a/scripts/device-crash-test
+++ b/scripts/device-crash-test
@@ -218,9 +218,6 @@ ERROR_WHITELIST = [
 {'exitcode':-6, 'log':r"Object .* is not an instance of type e500-ccsr", 
'loglevel':logging.ERROR},
 {'exitcode':-6, 'log':r"vmstate_register_with_alias_id: Assertion 
`!se->compat \|\| se->instance_id == 0' failed", 'loglevel':logging.ERROR},
 {'exitcode':-11, 'device':'isa-serial', 'loglevel':logging.ERROR, 
'expected':True},
-{'exitcode':-11, 'device':'mioe3680_pci', 'loglevel':logging.ERROR, 
'expected':True},
-{'exitcode':-11, 'device':'pcm3680_pci', 'loglevel':logging.ERROR, 
'expected':True},
-{'exitcode':-11, 'device':'kvaser_pci', 'loglevel':logging.ERROR, 
'expected':True},
 
 # everything else (including SIGABRT and SIGSEGV) will be a fatal error:
 {'exitcode':None, 'fatal':True, 'loglevel':logging.FATAL},
-- 
1.8.3.1

Re: [Qemu-devel] [PATCH v2 for-2.12] Makefile: install gtk message catalogs if CONFIG_GTK=y too, not only =m

2018-04-16 Thread Peter Maydell

On 16 April 2018 at 10:37, Michael Tokarev  wrote:
> Fixes 722cd7496474cebb2218f21e038592fad8603365
>
> Signed-off-by: Michael Tokarev 
> ---
>  Makefile | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/Makefile b/Makefile
> index 727ef118f3..d71dd5bea4 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -856,7 +856,7 @@ ifneq ($(BLOBS),)
> $(INSTALL_DATA) $(SRC_PATH)/pc-bios/$$x 
> "$(DESTDIR)$(qemu_datadir)"; \
> done
>  endif
> -ifeq ($(CONFIG_GTK),m)
> +ifdef CONFIG_GTK
> $(MAKE) -C po $@
>  endif
> $(INSTALL_DIR) "$(DESTDIR)$(qemu_datadir)/keymaps"
> --
> 2.11.0

Applied, thanks.

-- PMM

Re: [Qemu-devel] [PATCH for-2.13 4/4] arm/boot: split load_dtb() from arm_load_kernel()

2018-04-16 Thread Peter Maydell

On 12 April 2018 at 17:40, Igor Mammedov  wrote:
> load_dtb() depends on arm_load_kernel() to figure out place
> in RAM where it should be loaded, but it's not required for
> arm_load_kernel() to work. Sometimes it's neccesary for
> devices added with -device/device_add to be enumerated in
> DTB as well, which's lead to [1] and surrounding commits to
> add 2 more machine_done notifiers with non obvious ordering
> to make dynamic sysbus devices initialization happen in
> the right order.
>
> However instead of moving whole arm_load_kernel() in to
> machine_done, it's sufficient to move only load_dtb() into
> virt_machine_done() notifier and remove ArmLoadKernelNotifier/
> /PlatformBusFDTNotifierParams notifiers, which saves us ~90LOC
> and simplifies code flow quite a bit.
> Later would allow to consolidate DTB generation within one
> function for 'mach-virt' board and make it reentrant so it
> could generate updated DTB in device hotplug secenarios.
>
> While at it rename load_dtb() to arm_load_dtb() since it's
> public now.
>
> Add additional field skip_dtb_autoload to struct arm_boot_info
> to allow manual DTB load later in mach-virt and to avoid touching
> all other boards to explicitly call arm_load_dtb().
>
>  1) (ac9d32e hw/arm/boot: arm_load_kernel implemented as a machine init done 
> notifier)
>
> Signed-off-by: Igor Mammedov 
> ---
>  include/hw/arm/arm.h| 37 ++---
>  include/hw/arm/sysbus-fdt.h | 37 +
>  hw/arm/boot.c   | 67 
> +++--
>  hw/arm/sysbus-fdt.c | 64 ---
>  hw/arm/virt.c   | 64 ---
>  5 files changed, 86 insertions(+), 183 deletions(-)
>
> diff --git a/include/hw/arm/arm.h b/include/hw/arm/arm.h
> index 188d18b..312e533 100644
> --- a/include/hw/arm/arm.h
> +++ b/include/hw/arm/arm.h
> @@ -39,15 +39,6 @@ DeviceState *armv7m_init(MemoryRegion *system_memory, int 
> mem_size, int num_irq,
>   */
>  void armv7m_load_kernel(ARMCPU *cpu, const char *kernel_filename, int 
> mem_size);
>
> -/*
> - * struct used as a parameter of the arm_load_kernel machine init
> - * done notifier
> - */
> -typedef struct {
> -Notifier notifier; /* actual notifier */
> -ARMCPU *cpu; /* handle to the first cpu object */
> -} ArmLoadKernelNotifier;
> -
>  /* arm_boot.c */
>  struct arm_boot_info {
>  uint64_t ram_size;
> @@ -56,6 +47,9 @@ struct arm_boot_info {
>  const char *initrd_filename;
>  const char *dtb_filename;
>  hwaddr loader_start;
> +hwaddr dtb_start;
> +hwaddr dtb_limit;
> +bool skip_dtb_autoload;

skip_dtb_autoload is a flag that the board code needs to set, so can
we have a comment documenting what it does, please?

>  /* multicore boards that use the default secondary core boot functions
>   * need to put the address of the secondary boot code, the boot reg,
>   * and the GIC address in the next 3 values, respectively. boards that

Otherwise this looks good, especially the diffstat :-)
I'm assuming Eric will check the platform-bus/sysbus-fdt parts
of this patch.

thanks
-- PMM

Re: [Qemu-devel] [PATCH for-2.13 2/4] platform-bus-device: use device plug callback instead of machine_done notifier

2018-04-16 Thread Peter Maydell

On 12 April 2018 at 17:40, Igor Mammedov  wrote:
> platform-bus were using machine_done notifier to get and map
> (assign irq/mmio resources) dynamically added sysbus devices
> after all '-device' options had been processed.
> That however creates non obvious dependencies on ordering of
> machine_done notifiers and requires carefull line juggling
> to keep it working. For example see comment above
> create_platform_bus() and 'straitforward' arm_load_kernel()
> had to converted to machine_done notifier and that lead to
> yet another machine_done notifier to keep it working
> arm_register_platform_bus_fdt_creator().
>
> Instead of hiding resource assignment in platform-bus-device
> to magically initialize sysbus devices, use device plug
> callback and assign resources explicitly at board level
> at the moment each -device option is being processed.
>
> That adds a bunch of machine declaration boiler plate to
> e500plat board, similar to ARM/x86 but gets rid of hidden
> machine_done notifier and would allow to remove the dependent
> notifiers in ARM code simplifying it and making code flow
> easier to follow.
>
> Signed-off-by: Igor Mammedov 

> +static void virt_machine_device_plug_cb(HotplugHandler *hotplug_dev,
> +DeviceState *dev, Error **errp)
> +{
> +VirtMachineState *vms = VIRT_MACHINE(hotplug_dev);
> +
> +if (object_dynamic_cast(OBJECT(dev), TYPE_SYS_BUS_DEVICE)) {
> +if (vms->platform_bus_dev) {
> +
> platform_bus_link_device(PLATFORM_BUS_DEVICE(vms->platform_bus_dev),
> + SYS_BUS_DEVICE(dev));
> +}
> +}
> +}
> +
> +static HotplugHandler *virt_machine_get_hotpug_handler(MachineState *machine,
> +   DeviceState *dev)

Nit: typo in function name, should be "hotplug".

Will let others review the meat of the patch.

thanks
-- PMM

Re: [Qemu-devel] [Qemu-arm] [PATCH for-2.13 3/4] arm: always start from first_cpu when registering loader cpu reset callback

2018-04-16 Thread Peter Maydell

On 12 April 2018 at 19:29, Peter Maydell  wrote:
> On 12 April 2018 at 17:40, Igor Mammedov  wrote:
>> if arm_load_kernel() were passed non first_cpu, QEMU would end up
>> with partially set do_cpu_reset() callback leaving some CPUs without it.
>>
>> Make sure that do_cpu_reset() is registered for all CPUs by enumerating
>> CPUs from first_cpu.
>>
>> Signed-off-by: Igor Mammedov 
>> ---
>>  hw/arm/boot.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/hw/arm/boot.c b/hw/arm/boot.c
>> index 2f464ca..2591fee 100644
>> --- a/hw/arm/boot.c
>> +++ b/hw/arm/boot.c
>> @@ -1188,7 +1188,7 @@ void arm_load_kernel(ARMCPU *cpu, struct arm_boot_info 
>> *info)
>>   * actually loading a kernel, the handler is also responsible for
>>   * arranging that we start it correctly.
>>   */
>> -for (cs = CPU(cpu); cs; cs = CPU_NEXT(cs)) {
>> +for (cs = first_cpu; cs; cs = CPU_NEXT(cs)) {
>>  qemu_register_reset(do_cpu_reset, ARM_CPU(cs));
>>  }
>>  }
>
> Definitely a bug fix, so:
> Reviewed-by: Peter Maydell 
>
> I think though that in at least some cases we'll still mishandle
> being passed anything other than first_cpu as the CPU pointer,
> because in do_cpu_reset() we do some checks for "do this if
> cs == first_cpu", on the assumption that first_cpu is the
> primary CPU that we're booting. We should instead I suppose
> be checking against the CPU pointer we're given as the
> arm_load_kernel() argument (which I think do_cpu_reset() can
> get at via info->load_kernel_notifier.cpu).
>
> We should probably analyse which boards actually pass anything
> other than first_cpu -- I suspect it will end up just being
> the xilinx board which has both A and R profile cores...

I did the check, and in fact all of our boards always pass the
first CPU as the boot CPU. Xilinx comes the closest, in that
the SoC object has a boot-cpu property to pick a boot CPU, but
since none of our boards set that it defaults to the first CPU.

Since this patch isn't very related to the rest of the series
I've just applied it to target-arm.next ready for 2.13 (with
a note in the commit message that in practice it isn't a behaviour
change). Let me know if that's awkward for you and you'd prefer
to keep it in this series, in which case I'll drop it.

thanks
-- PMM

[Qemu-devel] [PATCH 8/9] iotests: Copy 197 for COR filter driver

2018-04-16 Thread Max Reitz

iotest 197 tests copy-on-read using the (now old) copy-on-read flag.
Copy it to 215 and modify it to use the COR filter driver instead.

Signed-off-by: Max Reitz 
---
 tests/qemu-iotests/215 | 120 +
 tests/qemu-iotests/215.out |  26 ++
 tests/qemu-iotests/group   |   1 +
 3 files changed, 147 insertions(+)
 create mode 100755 tests/qemu-iotests/215
 create mode 100644 tests/qemu-iotests/215.out

diff --git a/tests/qemu-iotests/215 b/tests/qemu-iotests/215
new file mode 100755
index 00..8c4073a439
--- /dev/null
+++ b/tests/qemu-iotests/215
@@ -0,0 +1,120 @@
+#!/bin/bash
+#
+# Test case for copy-on-read into qcow2, using the COR filter driver
+#
+# Copyright (C) 2018 Red Hat, Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see .
+#
+
+seq="$(basename $0)"
+echo "QA output created by $seq"
+
+here="$PWD"
+status=1 # failure is the default!
+
+# get standard environment, filters and checks
+. ./common.rc
+. ./common.filter
+
+TEST_WRAP="$TEST_DIR/t.wrap.qcow2"
+BLKDBG_CONF="$TEST_DIR/blkdebug.conf"
+
+# Sanity check: our use of blkdebug fails if $TEST_DIR contains spaces
+# or other problems
+case "$TEST_DIR" in
+*[^-_a-zA-Z0-9/]*)
+_notrun "Suspicious TEST_DIR='$TEST_DIR', cowardly refusing to run" ;;
+esac
+
+_cleanup()
+{
+_cleanup_test_img
+rm -f "$TEST_WRAP"
+rm -f "$BLKDBG_CONF"
+}
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+# Test is supported for any backing file; but we force qcow2 for our wrapper.
+_supported_fmt generic
+_supported_proto generic
+_supported_os Linux
+# LUKS support may be possible, but it complicates things.
+_unsupported_fmt luks
+
+echo
+echo '=== Copy-on-read ==='
+echo
+
+# Prep the images
+# VPC rounds image sizes to a specific geometry, force a specific size.
+if [ "$IMGFMT" = "vpc" ]; then
+IMGOPTS=$(_optstr_add "$IMGOPTS" "force_size")
+fi
+_make_test_img 4G
+$QEMU_IO -c "write -P 55 3G 1k" "$TEST_IMG" | _filter_qemu_io
+IMGPROTO=file IMGFMT=qcow2 IMGOPTS= TEST_IMG_FILE="$TEST_WRAP" \
+_make_test_img -F "$IMGFMT" -b "$TEST_IMG" | _filter_img_create
+$QEMU_IO -f qcow2 -c "write -z -u 1M 64k" "$TEST_WRAP" | _filter_qemu_io
+
+# Ensure that a read of two clusters, but where one is already allocated,
+# does not re-write the allocated cluster
+cat > "$BLKDBG_CONF" <&1 | _filter_qemu_io)
+case $output in
+*allocate*)
+_notrun "Insufficent memory to run test" ;;
+*) printf '%s\n' "$output" ;;
+esac
+$QEMU_IO \
+-c "open -o driver=cor,file.driver=qcow2 $TEST_WRAP" \
+-c "read -P 0 $((3*1024*1024*1024 + 1024)) 1k" \
+| _filter_qemu_io
+
+# Copy-on-read is incompatible with read-only
+$QEMU_IO \
+-c "open -r -o driver=cor,file.driver=qcow2 $TEST_WRAP" \
+2>&1 | _filter_testdir
+
+# Break the backing chain, and show that images are identical, and that
+# we properly copied over explicit zeros.
+$QEMU_IMG rebase -u -b "" -f qcow2 "$TEST_WRAP"
+$QEMU_IO -f qcow2 -c map "$TEST_WRAP"
+_check_test_img
+$QEMU_IMG compare -f $IMGFMT -F qcow2 "$TEST_IMG" "$TEST_WRAP"
+
+# success, all done
+echo '*** done'
+status=0
diff --git a/tests/qemu-iotests/215.out b/tests/qemu-iotests/215.out
new file mode 100644
index 00..70b0f5fb19
--- /dev/null
+++ b/tests/qemu-iotests/215.out
@@ -0,0 +1,26 @@
+QA output created by 215
+
+=== Copy-on-read ===
+
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=4294967296
+wrote 1024/1024 bytes at offset 3221225472
+1 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+Formatting 'TEST_DIR/t.wrap.IMGFMT', fmt=IMGFMT size=4294967296 
backing_file=TEST_DIR/t.IMGFMT backing_fmt=IMGFMT
+wrote 65536/65536 bytes at offset 1048576
+64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 131072/131072 bytes at offset 1048576
+128 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 0/0 bytes at offset 0
+0 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 2147483136/2147483136 bytes at offset 1024
+2 GiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 1024/1024 bytes at offset 3221226496
+1 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+can't open device TEST_DIR/t.wrap.qcow2: Block node is read-only
+2 GiB (0x8001) bytes allocated at offset 0 bytes (0x0)
+1023.938 MiB (0x3fff) bytes not allocated at offset 2 GiB (0x8001)
+64 KiB (0x1) bytes allocated at offset 3

Re: [Qemu-devel] [PATCH v1 23/24] Makefile.target: add (clean-)guest-tests targets

2018-04-16 Thread Philippe Mathieu-Daudé

On 04/16/2018 05:53 AM, Alex Bennée wrote:
> 
> Philippe Mathieu-Daudé  writes:
> 
>> On 04/10/2018 04:39 PM, Alex Bennée wrote:
>>> Now all the build infrastructure is in place we can build tests for
>>> each guest that we support. That support mainly depends on having
>>> cross compilers installed or docker setup. To keep all the logic for
>>> that together we put the rules in tests/tcg/Makefile.include and
>>> include it from the main Makefile.target.
>>>
>>> Signed-off-by: Alex Bennée 
>>> ---
>>>  Makefile.target|  5 +++
>>>  tests/tcg/Makefile.include | 79 
>>> ++
>>>  2 files changed, 84 insertions(+)
>>>  create mode 100644 tests/tcg/Makefile.include
>>>
>>> diff --git a/Makefile.target b/Makefile.target
>>> index d0ec77a307..a30fd40257 100644
>>> --- a/Makefile.target
>>> +++ b/Makefile.target
>>> @@ -36,6 +36,11 @@ endif
>>>  PROGS=$(QEMU_PROG) $(QEMU_PROGW)
>>>  STPFILES=
>>>  
>>> +# Makefile Tests
>>> +ifdef CONFIG_USER_ONLY
>>> +include $(SRC_PATH)/tests/tcg/Makefile.include
>>> +endif
>>> +
>>>  config-target.h: config-target.h-timestamp
>>>  config-target.h-timestamp: config-target.mak
>>>  
>>> diff --git a/tests/tcg/Makefile.include b/tests/tcg/Makefile.include
>>> new file mode 100644
>>> index 00..cb8bb36026
>>> --- /dev/null
>>> +++ b/tests/tcg/Makefile.include
>>> @@ -0,0 +1,79 @@
>>> +# -*- Mode: makefile -*-
>>> +#
>>> +# TCG tests (per-target rules)
>>> +#
>>> +# This Makefile fragement is included from the per-target
>>> +# Makefile.target so will be invoked for each linux-user program we
>>> +# build. We have two options for compiling, either using a configured
>>> +# guest compiler or calling one of our docker images to do it for us.
>>> +#
>>> +
>>> +# The per ARCH makefile, if it exists holds extra information about
>>> +# useful docker images or alternative compiler flags. Include it if it
>>> +# exists
>>> +
>>> +ARCH_MAKEFILE=$(SRC_PATH)/tests/tcg/$(TARGET_NAME)/Makefile.include
>>> +CHECK_INCLUDE=$(wildcard $(ARCH_MAKEFILE))
>>> +
>>> +ifeq ($(ARCH_MAKEFILE),$(CHECK_INCLUDE))
>>> +include $(ARCH_MAKEFILE)
>>> +endif
>>> +
>>> +GUEST_BUILD=
>>> +
>>> +# Support installed Cross Compilers
>>> +
>>> +ifdef CROSS_CC_GUEST
>>> +
>>> +.PHONY: cross-build-guest-tests
>>> +cross-build-guest-tests:
>>> +   $(call quiet-command, \
>>> +  (mkdir -p tests && cd tests && \
>>> +  make -f $(SRC_PATH)/tests/tcg/Makefile ARCH=$(TARGET_NAME) 
>>> CC=$(CROSS_CC_GUEST)), \
>>> + "CROSS-BUILD","$(TARGET_NAME) guest-tests with $(CROSS_CC_GUEST)")
>>> +
>>> +
>>> +GUEST_BUILD=cross-build-guest-tests
>>> +
>>> +endif
>>> +
>>> +# Support building with Docker
>>> +
>>> +ifeq ($(HAVE_USER_DOCKER)$(GUEST_BUILD),y)
>>> +ifneq ($(DOCKER_IMAGE),)
>>> +
>>> +# We also need the Docker make rules to depend on
>>> +include $(SRC_PATH)/tests/docker/Makefile.include
>>> +
>>
>> Eventually:
>>
>> DOCKER_CROSS_COMPILER ?= $(DOCKER_CROSS_COMPILER_PREFIX)-gcc
>>
>>> +DOCKER_COMPILE_CMD="$(DOCKER_SCRIPT) cc --user $(shell id -u) \
>>> +   --cc $(DOCKER_CROSS_COMPILER) \
>>> +   -i qemu:$(DOCKER_IMAGE) \
>>> +   -s $(SRC_PATH) -- "
>>
>> Should we check $(DOCKER_CROSS_COMPILER) is set?
> 
> Currently it's implied. It would be nice if there was an easy way just
> to fish it out of the env of the docker image as they already have it
> set there.
> 
> I've been trying quite hard to ensure all the tests can just use LD (see
> the hoops I jump in i386 for this). Maybe a prefix config makes more
> sense though

We might also need some CROSS_CFLAGS, i.e. for armeb we need to use
-mbig-endian. This would go in tests/tcg/$(TARGET_NAME)/Makefile.target

> 
>>
>>> +DOCKER_PREREQ=docker-image-$(DOCKER_IMAGE)
>>> +
>>> +.PHONY: docker-build-guest-tests
>>> +docker-build-guest-tests: $(DOCKER_PREREQ)
>>> +   $(call quiet-command, \
>>> +  (mkdir -p tests && cd tests && \
>>> +  make -f $(SRC_PATH)/tests/tcg/Makefile ARCH=$(TARGET_NAME) 
>>> CC=$(DOCKER_COMPILE_CMD)), \
>>> + "CROSS-BUILD","$(TARGET_NAME) guest-tests with docker 
>>> qemu:$(DOCKER_IMAGE)")
>>> +
>>> +GUEST_BUILD=docker-build-guest-tests
>>> +
>>> +endif
>>> +endif
>>> +
>>> +# Final targets
>>> +.PHONY: guest-tests
>>> +
>>> +ifneq ($(GUEST_BUILD),)
>>> +guest-tests: $(GUEST_BUILD)
>>> +else
>>> +guest-tests:
>>> +   $(call quiet-command, /bin/true, "CROSS-BUILD", "$(TARGET_NAME) 
>>> guest-tests SKIPPED")
>>> +endif
>>> +
>>> +# It doesn't mater if these don't exits
>>> +.PHONY: clean-guest-tests
>>> +clean-guest-tests:
>>> +   rm -rf tests || echo "no $(TARGET_NAME) tests to remove"
>>>

Re: [Qemu-devel] [PATCH 08/12] os-posix: Provide new -runas : facility

2018-04-16 Thread Ian Jackson

Markus Armbruster writes ("Re: [Qemu-devel] [PATCH 08/12] os-posix: Provide new 
-runas : facility"):
> Ian Jackson  writes:
> > That would defer the getpwnam from argument parsing to os_setup_post.
> > I think that's undesriable.
> 
> No argument.  But why can't os_parse_cmd_args() call getpwnam() as it
> does now, then store user_pwd->pw_uid, ->pw_gid and ->pw_name instead of
> user_pwd?  Store a null name when it parses the argument as UID:GID.

Oh, I see.  It seems less obvious to me than what I have done, but I
can do it like that if you like.

Ian.

[Qemu-devel] [PATCH v2 for 2.13] migration: Don't activate block devices if using -S

2018-04-16 Thread Dr. David Alan Gilbert (git)

From: "Dr. David Alan Gilbert" 

Activating the block devices causes the locks to be taken on
the backing file.  If we're running with -S and the destination libvirt
hasn't started the destination with 'cont', it's expecting the locks are
still untaken.

Don't activate the block devices if we're not going to autostart the VM;
'cont' already will do that anyway.   This change is tied to the new
migration capability 'late-block-activate' that defaults to off, keeping
the old behaviour by default.

bz: https://bugzilla.redhat.com/show_bug.cgi?id=1560854
Signed-off-by: Dr. David Alan Gilbert 
---
 migration/migration.c | 34 +++---
 qapi/migration.json   |  6 +-
 2 files changed, 32 insertions(+), 8 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 52a5092add..a05c6c893b 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -197,6 +197,16 @@ static void migrate_generate_event(int new_state)
 }
 }
 
+static bool migrate_late_block_activate(void)
+{
+MigrationState *s;
+
+s = migrate_get_current();
+
+return s->enabled_capabilities[
+MIGRATION_CAPABILITY_LATE_BLOCK_ACTIVATE];
+}
+
 /*
  * Called on -incoming with a defer: uri.
  * The migration can be started later after any parameters have been
@@ -306,13 +316,23 @@ static void process_incoming_migration_bh(void *opaque)
 Error *local_err = NULL;
 MigrationIncomingState *mis = opaque;
 
-/* Make sure all file formats flush their mutable metadata.
- * If we get an error here, just don't restart the VM yet. */
-bdrv_invalidate_cache_all(_err);
-if (local_err) {
-error_report_err(local_err);
-local_err = NULL;
-autostart = false;
+/* If capability late_block_activate is set:
+ * Only fire up the block code now if we're going to restart the
+ * VM, else 'cont' will do it.
+ * This causes file locking to happen; so we don't want it to happen
+ * unless we really are starting the VM.
+ */
+if (!migrate_late_block_activate() ||
+ (autostart && (!global_state_received() ||
+global_state_get_runstate() == RUN_STATE_RUNNING))) {
+/* Make sure all file formats flush their mutable metadata.
+ * If we get an error here, just don't restart the VM yet. */
+bdrv_invalidate_cache_all(_err);
+if (local_err) {
+error_report_err(local_err);
+local_err = NULL;
+autostart = false;
+}
 }
 
 /*
diff --git a/qapi/migration.json b/qapi/migration.json
index 9d0bf82cf4..757302762d 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -357,13 +357,17 @@
 # @dirty-bitmaps: If enabled, QEMU will migrate named dirty bitmaps.
 # (since 2.12)
 #
+# @late-block-activate: If enabled, the destination will not activate block
+#   devices (and thus take locks) immediately at the end of migration.
+#   (since 2.13)
+#
 # Since: 1.2
 ##
 { 'enum': 'MigrationCapability',
   'data': ['xbzrle', 'rdma-pin-all', 'auto-converge', 'zero-blocks',
'compress', 'events', 'postcopy-ram', 'x-colo', 'release-ram',
'block', 'return-path', 'pause-before-switchover', 'x-multifd',
-   'dirty-bitmaps' ] }
+   'dirty-bitmaps', 'late-block-activate' ] }
 
 ##
 # @MigrationCapabilityStatus:
-- 
2.17.0

[Qemu-devel] [PATCH 7/9] iotests: Clean up wrap image in 197

2018-04-16 Thread Max Reitz

Signed-off-by: Max Reitz 
---
 tests/qemu-iotests/197 | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tests/qemu-iotests/197 b/tests/qemu-iotests/197
index 5e869fe2b7..3ae4975eec 100755
--- a/tests/qemu-iotests/197
+++ b/tests/qemu-iotests/197
@@ -44,6 +44,7 @@ esac
 _cleanup()
 {
 _cleanup_test_img
+rm -f "$TEST_WRAP"
 rm -f "$BLKDBG_CONF"
 }
 trap "_cleanup; exit \$status" 0 1 2 3 15
-- 
2.14.3

[Qemu-devel] [PATCH 9/9] iotests: Add test for COR across nodes

2018-04-16 Thread Max Reitz

COR across nodes (that is, you have some filter node between the
actually COR target and the node that performs the COR) cannot reliably
work together with the permission system when there is no explicit COR
node that can request the WRITE_UNCHANGED permission for its child.
This is because COR (currently) sneaks its requests by the usual
permission checks, so it can work without a WRITE* permission; but if
there is a filter node in between, that will re-issue the request, which
then passes through the usual check -- and if nobody has requested a
WRITE_UNCHANGED permission, that check will fail.

There is no real direct fix apart from hoping that there is someone who
has requested that permission; in case of just the qemu-io HMP command
(and no guest device), however, that is not the case.  The real real fix
is to implement the copy-on-read flag through an implicitly added COR
node.  Such a node can request the necessary permissions as shown in
this test.

Signed-off-by: Max Reitz 
---
 tests/qemu-iotests/216 | 117 +
 tests/qemu-iotests/216.out |  28 +++
 tests/qemu-iotests/group   |   1 +
 3 files changed, 146 insertions(+)
 create mode 100755 tests/qemu-iotests/216
 create mode 100644 tests/qemu-iotests/216.out

diff --git a/tests/qemu-iotests/216 b/tests/qemu-iotests/216
new file mode 100755
index 00..2f34f94faa
--- /dev/null
+++ b/tests/qemu-iotests/216
@@ -0,0 +1,117 @@
+#!/usr/bin/env python
+#
+# Copy-on-read tests using a COR filter node
+#
+# Copyright (C) 2018 Red Hat, Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see .
+#
+# Creator/Owner: Max Reitz 
+#
+# Non-shared storage migration test using NBD server and drive-mirror
+
+import iotests
+from iotests import log, qemu_img_pipe, qemu_io, filter_qemu_io
+
+# Need backing file support
+iotests.verify_image_format(supported_fmts=['qcow2', 'qcow', 'qed', 'vmdk'])
+iotests.verify_platform(['linux'])
+
+log('')
+log('=== Copy-on-read across nodes ===')
+log('')
+
+# The old copy-on-read mechanism without a filter node cannot request
+# WRITE_UNCHANGED permissions for its child.  Therefore it just tries
+# to sneak its write by the usual permission system and holds its
+# fingers crossed.  However, that sneaking does not work so well when
+# there is a filter node in the way: That will receive the write
+# request and re-issue a new one to its child, which this time is a
+# proper write request that will make the permission system cough --
+# unless there is someone at the top (like a guest device) that has
+# requested write permissions.
+#
+# A COR filter node, however, can request the proper permissions for
+# its child and therefore is not hit by this issue.
+
+with iotests.FilePath('base.img') as base_img_path, \
+ iotests.FilePath('top.img') as top_img_path, \
+ iotests.VM() as vm:
+
+log('--- Setting up images ---')
+log('')
+
+qemu_img_pipe('create', '-f', iotests.imgfmt, base_img_path, '64M')
+
+log(filter_qemu_io(qemu_io(base_img_path, '-c', 'write -P 1 0M 1M')))
+
+qemu_img_pipe('create', '-f', iotests.imgfmt, '-b', base_img_path,
+  top_img_path)
+
+log(filter_qemu_io(qemu_io(top_img_path,  '-c', 'write -P 2 1M 1M')))
+
+log('')
+log('--- Doing COR ---')
+log('')
+
+# Compare with e.g. the following:
+#   vm.add_drive_raw('if=none,node-name=node0,copy-on-read=on,driver=raw,' 
\
+#'file.driver=%s,file.file.filename=%s' %
+#   (iotests.imgfmt, top_img_path))
+# (Remove the blockdev-add instead.)
+# ((Not tested here because it hits an assertion in the permission
+#   system.))
+
+vm.launch()
+
+log(vm.qmp('blockdev-add',
+node_name='node0',
+driver='cor',
+file={
+'driver': 'raw',
+'file': {
+'driver': 'cor',
+'file': {
+'driver': 'raw',
+'file': {
+'driver': iotests.imgfmt,
+'file': {
+'driver': 'file',
+'filename': top_img_path
+

[Qemu-devel] [PATCH 4/9] block: Set BDRV_REQ_WRITE_UNCHANGED for COR writes

2018-04-16 Thread Max Reitz

Signed-off-by: Max Reitz 
---
 block/io.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/block/io.c b/block/io.c
index 134b2a498f..fada4efbf3 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1115,13 +1115,15 @@ static int coroutine_fn 
bdrv_co_do_copy_on_readv(BdrvChild *child,
 /* FIXME: Should we (perhaps conditionally) be setting
  * BDRV_REQ_MAY_UNMAP, if it will allow for a sparser copy
  * that still correctly reads as zero? */
-ret = bdrv_co_do_pwrite_zeroes(bs, cluster_offset, pnum, 0);
+ret = bdrv_co_do_pwrite_zeroes(bs, cluster_offset, pnum,
+   BDRV_REQ_WRITE_UNCHANGED);
 } else {
 /* This does not change the data on the disk, it is not
  * necessary to flush even in cache=writethrough mode.
  */
 ret = bdrv_driver_pwritev(bs, cluster_offset, pnum,
-  _qiov, 0);
+  _qiov,
+  BDRV_REQ_WRITE_UNCHANGED);
 }
 
 if (ret < 0) {
-- 
2.14.3

[Qemu-devel] [PATCH 5/9] block/quorum: Support BDRV_REQ_WRITE_UNCHANGED

2018-04-16 Thread Max Reitz

We just need to forward it to quorum's children (except in case of a
rewrite because of corruption), but for that we first have to support
flags in child requests at all.

Signed-off-by: Max Reitz 
---
 block/quorum.c | 19 +--
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/block/quorum.c b/block/quorum.c
index cfe484a945..8cd689b2c1 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -115,6 +115,7 @@ struct QuorumAIOCB {
 /* Request metadata */
 uint64_t offset;
 uint64_t bytes;
+int flags;
 
 QEMUIOVector *qiov; /* calling IOV */
 
@@ -157,7 +158,8 @@ static bool quorum_64bits_compare(QuorumVoteValue *a, 
QuorumVoteValue *b)
 static QuorumAIOCB *quorum_aio_get(BlockDriverState *bs,
QEMUIOVector *qiov,
uint64_t offset,
-   uint64_t bytes)
+   uint64_t bytes,
+   int flags)
 {
 BDRVQuorumState *s = bs->opaque;
 QuorumAIOCB *acb = g_new(QuorumAIOCB, 1);
@@ -168,6 +170,7 @@ static QuorumAIOCB *quorum_aio_get(BlockDriverState *bs,
 .bs = bs,
 .offset = offset,
 .bytes  = bytes,
+.flags  = flags,
 .qiov   = qiov,
 .votes.compare  = quorum_sha256_compare,
 .votes.vote_list= QLIST_HEAD_INITIALIZER(acb.votes.vote_list),
@@ -271,9 +274,11 @@ static void quorum_rewrite_entry(void *opaque)
 BDRVQuorumState *s = acb->bs->opaque;
 
 /* Ignore any errors, it's just a correction attempt for already
- * corrupted data. */
+ * corrupted data.
+ * Mask out BDRV_REQ_WRITE_UNCHANGED because this overwrites the
+ * area with different data from the other children. */
 bdrv_co_pwritev(s->children[co->idx], acb->offset, acb->bytes,
-acb->qiov, 0);
+acb->qiov, acb->flags & ~BDRV_REQ_WRITE_UNCHANGED);
 
 /* Wake up the caller after the last rewrite */
 acb->rewrite_count--;
@@ -673,7 +678,7 @@ static int quorum_co_preadv(BlockDriverState *bs, uint64_t 
offset,
 uint64_t bytes, QEMUIOVector *qiov, int flags)
 {
 BDRVQuorumState *s = bs->opaque;
-QuorumAIOCB *acb = quorum_aio_get(bs, qiov, offset, bytes);
+QuorumAIOCB *acb = quorum_aio_get(bs, qiov, offset, bytes, flags);
 int ret;
 
 acb->is_read = true;
@@ -699,7 +704,7 @@ static void write_quorum_entry(void *opaque)
 
 sacb->bs = s->children[i]->bs;
 sacb->ret = bdrv_co_pwritev(s->children[i], acb->offset, acb->bytes,
-acb->qiov, 0);
+acb->qiov, acb->flags);
 if (sacb->ret == 0) {
 acb->success_count++;
 } else {
@@ -719,7 +724,7 @@ static int quorum_co_pwritev(BlockDriverState *bs, uint64_t 
offset,
  uint64_t bytes, QEMUIOVector *qiov, int flags)
 {
 BDRVQuorumState *s = bs->opaque;
-QuorumAIOCB *acb = quorum_aio_get(bs, qiov, offset, bytes);
+QuorumAIOCB *acb = quorum_aio_get(bs, qiov, offset, bytes, flags);
 int i, ret;
 
 for (i = 0; i < s->num_children; i++) {
@@ -961,6 +966,8 @@ static int quorum_open(BlockDriverState *bs, QDict 
*options, int flags,
 }
 s->next_child_index = s->num_children;
 
+bs->supported_write_flags = BDRV_REQ_WRITE_UNCHANGED;
+
 g_free(opened);
 goto exit;
 
-- 
2.14.3

[Qemu-devel] [PATCH 1/9] block: Add COR filter driver

2018-04-16 Thread Max Reitz

This adds a simple copy-on-read filter driver.  It relies on the already
existing COR functionality in the central block layer code, which may be
moved here once we no longer need it there.

Signed-off-by: Max Reitz 
---
 block/Makefile.objs  |   2 +-
 qapi/block-core.json |   5 +-
 block/cor.c  | 171 +++
 3 files changed, 176 insertions(+), 2 deletions(-)
 create mode 100644 block/cor.c

diff --git a/block/Makefile.objs b/block/Makefile.objs
index d644bac60a..6fdf786c04 100644
--- a/block/Makefile.objs
+++ b/block/Makefile.objs
@@ -26,7 +26,7 @@ block-obj-y += accounting.o dirty-bitmap.o
 block-obj-y += write-threshold.o
 block-obj-y += backup.o
 block-obj-$(CONFIG_REPLICATION) += replication.o
-block-obj-y += throttle.o
+block-obj-y += throttle.o cor.o
 
 block-obj-y += crypto.o
 
diff --git a/qapi/block-core.json b/qapi/block-core.json
index c50517bff3..3abbf4fcf1 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -2506,11 +2506,12 @@
 # @vxhs: Since 2.10
 # @throttle: Since 2.11
 # @nvme: Since 2.12
+# @cor: Since 2.13
 #
 # Since: 2.9
 ##
 { 'enum': 'BlockdevDriver',
-  'data': [ 'blkdebug', 'blkverify', 'bochs', 'cloop',
+  'data': [ 'blkdebug', 'blkverify', 'bochs', 'cloop', 'cor',
 'dmg', 'file', 'ftp', 'ftps', 'gluster', 'host_cdrom',
 'host_device', 'http', 'https', 'iscsi', 'luks', 'nbd', 'nfs',
 'null-aio', 'null-co', 'nvme', 'parallels', 'qcow', 'qcow2', 'qed',
@@ -3522,6 +3523,7 @@
   'blkverify':  'BlockdevOptionsBlkverify',
   'bochs':  'BlockdevOptionsGenericFormat',
   'cloop':  'BlockdevOptionsGenericFormat',
+  'cor':'BlockdevOptionsGenericFormat',
   'dmg':'BlockdevOptionsGenericFormat',
   'file':   'BlockdevOptionsFile',
   'ftp':'BlockdevOptionsCurlFtp',
@@ -4049,6 +4051,7 @@
   'blkverify':  'BlockdevCreateNotSupported',
   'bochs':  'BlockdevCreateNotSupported',
   'cloop':  'BlockdevCreateNotSupported',
+  'cor':'BlockdevCreateNotSupported',
   'dmg':'BlockdevCreateNotSupported',
   'file':   'BlockdevCreateOptionsFile',
   'ftp':'BlockdevCreateNotSupported',
diff --git a/block/cor.c b/block/cor.c
new file mode 100644
index 00..a4cb4e2b84
--- /dev/null
+++ b/block/cor.c
@@ -0,0 +1,171 @@
+/*
+ * Copy-on-read filter block driver
+ *
+ * Copyright (c) 2018 Red Hat, Inc.
+ *
+ * Author:
+ *   Max Reitz 
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation; either version 2 or
+ * (at your option) version 3 of the License.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see .
+ */
+
+#include "qemu/osdep.h"
+#include "block/block_int.h"
+
+
+static int cor_open(BlockDriverState *bs, QDict *options, int flags,
+Error **errp)
+{
+bs->file = bdrv_open_child(NULL, options, "file", bs, _file, false,
+   errp);
+if (!bs->file) {
+return -EINVAL;
+}
+
+bs->supported_write_flags = BDRV_REQ_FUA &
+bs->file->bs->supported_write_flags;
+
+bs->supported_zero_flags = (BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP) &
+bs->file->bs->supported_zero_flags;
+
+return 0;
+}
+
+
+static void cor_close(BlockDriverState *bs)
+{
+}
+
+
+#define PERM_PASSTHROUGH (BLK_PERM_CONSISTENT_READ \
+  | BLK_PERM_WRITE \
+  | BLK_PERM_RESIZE)
+#define PERM_UNCHANGED (BLK_PERM_ALL & ~PERM_PASSTHROUGH)
+
+static void cor_child_perm(BlockDriverState *bs, BdrvChild *c,
+   const BdrvChildRole *role,
+   BlockReopenQueue *reopen_queue,
+   uint64_t perm, uint64_t shared,
+   uint64_t *nperm, uint64_t *nshared)
+{
+if (c == NULL) {
+*nperm = (perm & PERM_PASSTHROUGH) | BLK_PERM_WRITE_UNCHANGED;
+*nshared = (shared & PERM_PASSTHROUGH) | PERM_UNCHANGED;
+return;
+}
+
+*nperm = (perm & PERM_PASSTHROUGH) |
+ (c->perm & PERM_UNCHANGED);
+*nshared = (shared & PERM_PASSTHROUGH) |
+   (c->shared_perm & PERM_UNCHANGED);
+}
+
+
+static int64_t cor_getlength(BlockDriverState *bs)
+{
+return bdrv_getlength(bs->file->bs);
+}
+
+
+static int cor_truncate(BlockDriverState *bs, int64_t offset,
+

[Qemu-devel] [PATCH 6/9] block: Support BDRV_REQ_WRITE_UNCHANGED in filters

2018-04-16 Thread Max Reitz

Update the rest of the filter drivers to support
BDRV_REQ_WRITE_UNCHANGED.  They already forward write request flags to
their children, so we just have to announce support for it.

This patch does not cover the replication driver because that currently
does not support flags at all, and because it just grabs the WRITE
permission for its children when it can, so we should be fine just
submitting the incoming WRITE_UNCHANGED requests as normal writes.

It also does not cover format drivers for similar reasons.  They all use
bdrv_format_default_perms() as their .bdrv_child_perm() implementation
so they just always grab the WRITE permission for their file children
whenever possible.  In addition, it often would be difficult to
ascertain whether incoming unchanging writes end up as unchanging writes
in their files.  So we just leave them as normal potentially changing
writes.

Signed-off-by: Max Reitz 
---
 block/blkdebug.c   |  9 +
 block/blkreplay.c  |  3 +++
 block/blkverify.c  |  3 +++
 block/cor.c| 10 ++
 block/mirror.c |  2 ++
 block/raw-format.c |  9 +
 block/throttle.c   |  6 --
 7 files changed, 28 insertions(+), 14 deletions(-)

diff --git a/block/blkdebug.c b/block/blkdebug.c
index 589712475a..762ec2527c 100644
--- a/block/blkdebug.c
+++ b/block/blkdebug.c
@@ -398,10 +398,11 @@ static int blkdebug_open(BlockDriverState *bs, QDict 
*options, int flags,
 goto out;
 }
 
-bs->supported_write_flags = BDRV_REQ_FUA &
-bs->file->bs->supported_write_flags;
-bs->supported_zero_flags = (BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP) &
-bs->file->bs->supported_zero_flags;
+bs->supported_write_flags = BDRV_REQ_WRITE_UNCHANGED |
+(BDRV_REQ_FUA & bs->file->bs->supported_write_flags);
+bs->supported_zero_flags = BDRV_REQ_WRITE_UNCHANGED |
+((BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP) &
+bs->file->bs->supported_zero_flags);
 ret = -EINVAL;
 
 /* Set alignment overrides */
diff --git a/block/blkreplay.c b/block/blkreplay.c
index fe5a9b4a98..b016dbeee7 100755
--- a/block/blkreplay.c
+++ b/block/blkreplay.c
@@ -35,6 +35,9 @@ static int blkreplay_open(BlockDriverState *bs, QDict 
*options, int flags,
 goto fail;
 }
 
+bs->supported_write_flags = BDRV_REQ_WRITE_UNCHANGED;
+bs->supported_zero_flags = BDRV_REQ_WRITE_UNCHANGED;
+
 ret = 0;
 fail:
 return ret;
diff --git a/block/blkverify.c b/block/blkverify.c
index 331365be33..1646404b46 100644
--- a/block/blkverify.c
+++ b/block/blkverify.c
@@ -141,6 +141,9 @@ static int blkverify_open(BlockDriverState *bs, QDict 
*options, int flags,
 goto fail;
 }
 
+bs->supported_write_flags = BDRV_REQ_WRITE_UNCHANGED;
+bs->supported_zero_flags = BDRV_REQ_WRITE_UNCHANGED;
+
 ret = 0;
 fail:
 qemu_opts_del(opts);
diff --git a/block/cor.c b/block/cor.c
index a4cb4e2b84..189e5bd13e 100644
--- a/block/cor.c
+++ b/block/cor.c
@@ -33,11 +33,13 @@ static int cor_open(BlockDriverState *bs, QDict *options, 
int flags,
 return -EINVAL;
 }
 
-bs->supported_write_flags = BDRV_REQ_FUA &
-bs->file->bs->supported_write_flags;
+bs->supported_write_flags = BDRV_REQ_WRITE_UNCHANGED |
+(BDRV_REQ_FUA &
+bs->file->bs->supported_write_flags);
 
-bs->supported_zero_flags = (BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP) &
-bs->file->bs->supported_zero_flags;
+bs->supported_zero_flags = BDRV_REQ_WRITE_UNCHANGED |
+   ((BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP) &
+bs->file->bs->supported_zero_flags);
 
 return 0;
 }
diff --git a/block/mirror.c b/block/mirror.c
index 820f512c7b..6fdd36d27f 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -1148,6 +1148,8 @@ static void mirror_start_job(const char *job_id, 
BlockDriverState *bs,
 mirror_top_bs->implicit = true;
 }
 mirror_top_bs->total_sectors = bs->total_sectors;
+mirror_top_bs->supported_write_flags = BDRV_REQ_WRITE_UNCHANGED;
+mirror_top_bs->supported_zero_flags = BDRV_REQ_WRITE_UNCHANGED;
 bdrv_set_aio_context(mirror_top_bs, bdrv_get_aio_context(bs));
 
 /* bdrv_append takes ownership of the mirror_top_bs reference, need to keep
diff --git a/block/raw-format.c b/block/raw-format.c
index a378547c99..fe33693a2d 100644
--- a/block/raw-format.c
+++ b/block/raw-format.c
@@ -415,10 +415,11 @@ static int raw_open(BlockDriverState *bs, QDict *options, 
int flags,
 }
 
 bs->sg = bs->file->bs->sg;
-bs->supported_write_flags = BDRV_REQ_FUA &
-bs->file->bs->supported_write_flags;
-bs->supported_zero_flags = (BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP) &
-bs->file->bs->supported_zero_flags;
+bs->supported_write_flags = BDRV_REQ_WRITE_UNCHANGED |
+(BDRV_REQ_FUA & bs->file->bs->supported_write_flags);
+

[Qemu-devel] [PATCH 2/9] block: BLK_PERM_WRITE includes ..._UNCHANGED

2018-04-16 Thread Max Reitz

Currently we never actually check whether the WRITE_UNCHANGED
permission has been taken for unchanging writes.  But the one check that
is commented out checks both WRITE and WRITE_UNCHANGED; and considering
that WRITE_UNCHANGED is already documented as being weaker then WRITE,
we should probably explicitly document WRITE to include WRITE_UNCHANGED.

Signed-off-by: Max Reitz 
---
 include/block/block.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/include/block/block.h b/include/block/block.h
index cdec3639a3..397b5e8d44 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -205,6 +205,9 @@ enum {
  * This permission (which is weaker than BLK_PERM_WRITE) is both enough and
  * required for writes to the block node when the caller promises that
  * the visible disk content doesn't change.
+ *
+ * As the BLK_PERM_WRITE permission is strictly stronger, either is
+ * sufficient to perform an unchanging write.
  */
 BLK_PERM_WRITE_UNCHANGED= 0x04,
 
-- 
2.14.3

Re: [Qemu-devel] [PULL 0/1] i386: Don't automatically enable FEAT_KVM_HINTS bits

2018-04-16 Thread Peter Maydell

On 16 April 2018 at 17:39, Eduardo Habkost  wrote:
> Last remaining fix for -rc4.
>
> The following changes since commit 042f6a31af3d38eefc6ec995cce1d762c41d4515:
>
>   Merge remote-tracking branch 'remotes/maxreitz/tags/pull-block-2018-04-16' 
> into staging (2018-04-16 15:30:54 +0100)
>
> are available in the Git repository at:
>
>   git://github.com/ehabkost/qemu.git tags/x86-next-pull-request
>
> for you to fetch changes up to 0d914f39a77954bf5472b16130aeefe573a068f5:
>
>   i386: Don't automatically enable FEAT_KVM_HINTS bits (2018-04-16 13:36:52 
> -0300)
>
> 
> i386: Don't automatically enable FEAT_KVM_HINTS bits
>
> Bug fix for "-cpu host" with newer kernels.
>
> 

Is this a regression since 2.11 ?

thanks
-- PMM

[Qemu-devel] [PATCH 0/9] block: Add COR filter driver

2018-04-16 Thread Max Reitz

In our quest to... (Oh, man, I always struggle with writing cover
letters.  But rarely have I become stuck so early on.)  Orthogonalize?
the block layer (that is, turn hard-coded special options into
independent data processing nodes you can put anywhere in your data flow
graph), this series adds a copy-on-read filter driver that is supposed
to replace the copy-on-read -drive option in the long run.  In the short
run, it allows you to use COR with blockdev-add.

The patch itself is extremely simple.  It's just patch 1.  All we have
to do for now is to set the BDRV_REQ_COPY_ON_READ flag for read
requests.  We probably want to extend the COR driver's capabilities
later on, though, more on that below under "What to do as a follow-up?".

But there is something to do on top of that.  One real issue with the
current copy-on-read model is that you generally cannot do COR through
filter nodes (-drive copy-on-read=on,driver=raw,file.driver=qcow2,...).
Unless you are lucky that someone has requested WRITE permission, you
will run into a failed assertion from the permission system (see
patch 9).  Actually, you should always run into such an abort, but the
COR code cleverly sneaks its writes around it.  Well, actually, the
respective assertion is just commented out with a comment noting that
we'd need a proper COR filter so we could do the assertion.

So the new COR node will take the permission for its child that the
current code does not (because it cannot, for the lack of a node that
would do this).  However, of course we don't want it to require a WRITE
permission but just WRITE_UNCHANGED.  But when it issues a post-COR
write request that goes down to a filter and that filter re-issues that
write down to its child, then it will just be a normal write.  So
suddenly WRITE_UNCHANGED won't be sufficient, we'd need a standard WRITE
instead.

That isn't what we want, though.  So this series also adds a
BDRV_REQ_WRITE_UNCHANGED write request flag that is set when issuing a
COR write.  This flag tells the block code that the WRITE_UNCHANGED
permission will be sufficient to execute the request.  Filter drivers
need to pass this along; format drivers don't because they take a
full-on WRITE permission on their file anyway.

So most of this series is about adding this new flag.

(Note that this does nothing to fix the situation with the old
 copy-on-read=on.  That will still runs into a failed assertion if you
 poke it the right way.  But the cruel reality is that the only way to
 really fix this is by converting copy-on-read=on into an implicit COR
 node.)


Finally, the test we have for COR (197) is copied and modified to use
the COR filter; and another test is added for the situation described
above (COR through filter nodes).


=== What to do as a follow-up? ===

The obvious thing is to transform the current copy-on-read flag into an
implicit COR node.  Probably not too difficult, actually (just handle it
like snapshot=on), but we need to hide that fact in the query functions.

Also, we want to make the stream block job code use this node.  Maybe we
eventually want this node become the stream block job eventually,
actually.  That is, give it enough runtime options that you can perform
a stream operation just by inserting a COR node into the graph.  But
probably that won't be possible, because someone still needs to submit
reads requests across the whole disk so that everything is actually
copied.  This could be achieved with a blockdev-copy to null-co://,
though.

Once both is done, we want to remove the COR code from block.c and move
it directly into the COR driver.

Max Reitz (9):
  block: Add COR filter driver
  block: BLK_PERM_WRITE includes ..._UNCHANGED
  block: Add BDRV_REQ_WRITE_UNCHANGED flag
  block: Set BDRV_REQ_WRITE_UNCHANGED for COR writes
  block/quorum: Support BDRV_REQ_WRITE_UNCHANGED
  block: Support BDRV_REQ_WRITE_UNCHANGED in filters
  iotests: Clean up wrap image in 197
  iotests: Copy 197 for COR filter driver
  iotests: Add test for COR across nodes

 block/Makefile.objs|   2 +-
 qapi/block-core.json   |   5 +-
 include/block/block.h  |   9 ++-
 block/blkdebug.c   |   9 +--
 block/blkreplay.c  |   3 +
 block/blkverify.c  |   3 +
 block/cor.c| 173 +
 block/io.c |  12 +++-
 block/mirror.c |   2 +
 block/quorum.c |  19 +++--
 block/raw-format.c |   9 +--
 block/throttle.c   |   6 +-
 tests/qemu-iotests/197 |   1 +
 tests/qemu-iotests/215 | 120 +++
 tests/qemu-iotests/215.out |  26 +++
 tests/qemu-iotests/216 | 117 ++
 tests/qemu-iotests/216.out |  28 
 tests/qemu-iotests/group   |   2 +
 18 files changed, 524 insertions(+), 22 deletions(-)
 create mode 100644 block/cor.c
 create mode 100755 tests/qemu-iotests/215
 create mode 100644 tests/qemu-iotests/215.out
 create

[Qemu-devel] [PATCH 3/9] block: Add BDRV_REQ_WRITE_UNCHANGED flag

2018-04-16 Thread Max Reitz

This flag signifies that a write request will not change the visible
disk content.  With this flag set, it is sufficient to have the
BLK_PERM_WRITE_UNCHANGED permission instead of BLK_PERM_WRITE.

Signed-off-by: Max Reitz 
---
 include/block/block.h | 6 +-
 block/io.c| 6 +-
 2 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/include/block/block.h b/include/block/block.h
index 397b5e8d44..3894edda9d 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -54,8 +54,12 @@ typedef enum {
 BDRV_REQ_FUA= 0x10,
 BDRV_REQ_WRITE_COMPRESSED   = 0x20,
 
+/* Signifies that this write request will not change the visible disk
+ * content. */
+BDRV_REQ_WRITE_UNCHANGED= 0x40,
+
 /* Mask of valid flags */
-BDRV_REQ_MASK   = 0x3f,
+BDRV_REQ_MASK   = 0x7f,
 } BdrvRequestFlags;
 
 typedef struct BlockSizes {
diff --git a/block/io.c b/block/io.c
index bd9a19a9c4..134b2a498f 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1501,7 +1501,11 @@ static int coroutine_fn bdrv_aligned_pwritev(BdrvChild 
*child,
 assert(!waited || !req->serialising);
 assert(req->overlap_offset <= offset);
 assert(offset + bytes <= req->overlap_offset + req->overlap_bytes);
-assert(child->perm & BLK_PERM_WRITE);
+if (flags & BDRV_REQ_WRITE_UNCHANGED) {
+assert(child->perm & (BLK_PERM_WRITE_UNCHANGED | BLK_PERM_WRITE));
+} else {
+assert(child->perm & BLK_PERM_WRITE);
+}
 assert(end_sector <= bs->total_sectors || child->perm & BLK_PERM_RESIZE);
 
 ret = notifier_with_return_list_notify(>before_write_notifiers, req);
-- 
2.14.3

[Qemu-devel] [PATCH v3 1/4] pc-bios/s390-ccw: rename MAX_TABLE_ENTRIES to MAX_BOOT_ENTRIES

2018-04-16 Thread Collin Walling

The MAX_TABLE_ENTRIES constant has a name that is too generic. As we
want to declare a limit for boot menu entries, let's rename it to a more
fitting MAX_BOOT_ENTRIES and set its value to 31 (30 boot entries and
1 default entry). Also we move it from bootmap.h to s390-ccw.h to make
it available for menu.c in a later patch.

Signed-off-by: Collin Walling 
Reviewed-by: Thomas Huth 
Reviewed-by: Janosch Frank 
---
 pc-bios/s390-ccw/bootmap.c  | 6 +++---
 pc-bios/s390-ccw/bootmap.h  | 2 --
 pc-bios/s390-ccw/s390-ccw.h | 2 ++
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/pc-bios/s390-ccw/bootmap.c b/pc-bios/s390-ccw/bootmap.c
index fc2a9fe..2f79346 100644
--- a/pc-bios/s390-ccw/bootmap.c
+++ b/pc-bios/s390-ccw/bootmap.c
@@ -290,7 +290,7 @@ static void run_eckd_boot_script(block_number_t 
bmt_block_nr,
 }
 
 debug_print_int("loadparm", loadparm);
-IPL_assert(loadparm <= MAX_TABLE_ENTRIES, "loadparm value greater than"
+IPL_assert(loadparm < MAX_BOOT_ENTRIES, "loadparm value greater than"
" maximum number of boot entries allowed");
 
 memset(sec, FREE_SPACE_FILLER, sizeof(sec));
@@ -578,7 +578,7 @@ static void ipl_scsi(void)
 read_block(mbr->pt.blockno, sec, "Error reading Program Table");
 IPL_assert(magic_match(sec, ZIPL_MAGIC), "No zIPL magic in PT");
 
-while (program_table_entries <= MAX_TABLE_ENTRIES) {
+while (program_table_entries < MAX_BOOT_ENTRIES) {
 if (!prog_table->entry[program_table_entries].scsi.blockno) {
 break;
 }
@@ -593,7 +593,7 @@ static void ipl_scsi(void)
 }
 
 debug_print_int("loadparm", loadparm);
-IPL_assert(loadparm <= MAX_TABLE_ENTRIES, "loadparm value greater than"
+IPL_assert(loadparm < MAX_BOOT_ENTRIES, "loadparm value greater than"
" maximum number of boot entries allowed");
 
 zipl_run(_table->entry[loadparm].scsi); /* no return */
diff --git a/pc-bios/s390-ccw/bootmap.h b/pc-bios/s390-ccw/bootmap.h
index 07eb600..732c111 100644
--- a/pc-bios/s390-ccw/bootmap.h
+++ b/pc-bios/s390-ccw/bootmap.h
@@ -57,8 +57,6 @@ typedef union BootMapPointer {
 ExtEckdBlockPtr xeckd;
 } __attribute__ ((packed)) BootMapPointer;
 
-#define MAX_TABLE_ENTRIES  30
-
 /* aka Program Table */
 typedef struct BootMapTable {
 uint8_t magic[4];
diff --git a/pc-bios/s390-ccw/s390-ccw.h b/pc-bios/s390-ccw/s390-ccw.h
index fd18da2..2c9e601 100644
--- a/pc-bios/s390-ccw/s390-ccw.h
+++ b/pc-bios/s390-ccw/s390-ccw.h
@@ -94,6 +94,8 @@ bool menu_is_enabled_zipl(void);
 int menu_get_enum_boot_index(int entries);
 bool menu_is_enabled_enum(void);
 
+#define MAX_BOOT_ENTRIES  31
+
 static inline void fill_hex(char *out, unsigned char val)
 {
 const char hex[] = "0123456789abcdef";
-- 
2.7.4

[Qemu-devel] [PATCH v3 2/4] pc-bios/s390-ccw: fix loadparm initialization and int conversion

2018-04-16 Thread Collin Walling

Rename the loadparm char array in main.c to loadparm_str and
increased the size by one byte to account for a null termination
when converting the loadparm string to an int  via atoui. We
also allow the boot menu to be enabled when loadparm is set to
an empty string or a series of spaces.

Signed-off-by: Collin Walling 
Reported-by: Vasily Gorbik 
Reviewed-by: Thomas Huth 
Reviewed-by: Janosch Frank 
---
 hw/s390x/ipl.c  |  4 
 pc-bios/s390-ccw/main.c | 14 +++---
 2 files changed, 11 insertions(+), 7 deletions(-)

diff --git a/hw/s390x/ipl.c b/hw/s390x/ipl.c
index fdeaec3..8907136 100644
--- a/hw/s390x/ipl.c
+++ b/hw/s390x/ipl.c
@@ -352,6 +352,10 @@ int s390_ipl_set_loadparm(uint8_t *loadparm)
 loadparm[i] = ascii2ebcdic[(uint8_t) lp[i]];
 }
 
+if (i < 8) {
+memset(loadparm + i, 0x40, 8 - i); /* fill with EBCDIC spaces */
+}
+
 g_free(lp);
 return 0;
 }
diff --git a/pc-bios/s390-ccw/main.c b/pc-bios/s390-ccw/main.c
index 9d9f8cf..26f9adf 100644
--- a/pc-bios/s390-ccw/main.c
+++ b/pc-bios/s390-ccw/main.c
@@ -15,11 +15,11 @@
 char stack[PAGE_SIZE * 8] __attribute__((__aligned__(PAGE_SIZE)));
 static SubChannelId blk_schid = { .one = 1 };
 IplParameterBlock iplb __attribute__((__aligned__(PAGE_SIZE)));
-static char loadparm[8] = { 0, 0, 0, 0, 0, 0, 0, 0 };
+static char loadparm_str[9] = { 0, 0, 0, 0, 0, 0, 0, 0, 0 };
 QemuIplParameters qipl;
 
 #define LOADPARM_PROMPT "PROMPT  "
-#define LOADPARM_EMPTY  ""
+#define LOADPARM_EMPTY  ""
 #define BOOT_MENU_FLAG_MASK (QIPL_FLAG_BM_OPTS_CMD | QIPL_FLAG_BM_OPTS_ZIPL)
 
 /*
@@ -45,7 +45,7 @@ void panic(const char *string)
 
 unsigned int get_loadparm_index(void)
 {
-return atoui(loadparm);
+return atoui(loadparm_str);
 }
 
 static bool find_dev(Schib *schib, int dev_no)
@@ -80,13 +80,13 @@ static bool find_dev(Schib *schib, int dev_no)
 
 static void menu_setup(void)
 {
-if (memcmp(loadparm, LOADPARM_PROMPT, 8) == 0) {
+if (memcmp(loadparm_str, LOADPARM_PROMPT, 8) == 0) {
 menu_set_parms(QIPL_FLAG_BM_OPTS_CMD, 0);
 return;
 }
 
 /* If loadparm was set to any other value, then do not enable menu */
-if (memcmp(loadparm, LOADPARM_EMPTY, 8) != 0) {
+if (memcmp(loadparm_str, LOADPARM_EMPTY, 8) != 0) {
 return;
 }
 
@@ -116,8 +116,8 @@ static void virtio_setup(void)
  */
 enable_mss_facility();
 
-sclp_get_loadparm_ascii(loadparm);
-memcpy(ldp + 10, loadparm, 8);
+sclp_get_loadparm_ascii(loadparm_str);
+memcpy(ldp + 10, loadparm_str, 8);
 sclp_print(ldp);
 
 memcpy(, early_qipl, sizeof(QemuIplParameters));
-- 
2.7.4

[Qemu-devel] [PATCH v3 4/4] pc-bios/s390-ccw: fix non-sequential boot entries (enum)

2018-04-16 Thread Collin Walling

zIPL boot menu entries can be non-sequential. Let's account
for this issue for the s390 enumerated boot menu. Since we
can no longer print a range of available entries to the
user, we have to present a list of each available entry.

An example of this menu:

  s390-ccw Enumerated Boot Menu.

   [0] default

   [1]
   [2]
   [7]
   [8]
   [9]
  [11]
  [12]

  Please choose:

Signed-off-by: Collin Walling 
Reported-by: Vasily Gorbik 
Reviewed-by: Thomas Huth 
Reviewed-by: Janosch Frank 
---
 pc-bios/s390-ccw/bootmap.c  | 12 +++-
 pc-bios/s390-ccw/menu.c | 29 -
 pc-bios/s390-ccw/s390-ccw.h |  2 +-
 3 files changed, 28 insertions(+), 15 deletions(-)

diff --git a/pc-bios/s390-ccw/bootmap.c b/pc-bios/s390-ccw/bootmap.c
index 2f79346..f9a2fb3 100644
--- a/pc-bios/s390-ccw/bootmap.c
+++ b/pc-bios/s390-ccw/bootmap.c
@@ -558,6 +558,8 @@ static void ipl_scsi(void)
 int program_table_entries = 0;
 BootMapTable *prog_table = (void *)sec;
 unsigned int loadparm = get_loadparm_index();
+bool valid_entries[MAX_BOOT_ENTRIES] = {false};
+size_t i;
 
 /* Grab the MBR */
 memset(sec, FREE_SPACE_FILLER, sizeof(sec));
@@ -578,18 +580,18 @@ static void ipl_scsi(void)
 read_block(mbr->pt.blockno, sec, "Error reading Program Table");
 IPL_assert(magic_match(sec, ZIPL_MAGIC), "No zIPL magic in PT");
 
-while (program_table_entries < MAX_BOOT_ENTRIES) {
-if (!prog_table->entry[program_table_entries].scsi.blockno) {
-break;
+for (i = 0; i < MAX_BOOT_ENTRIES; i++) {
+if (prog_table->entry[i].scsi.blockno) {
+valid_entries[i] = true;
+program_table_entries++;
 }
-program_table_entries++;
 }
 
 debug_print_int("program table entries", program_table_entries);
 IPL_assert(program_table_entries != 0, "Empty Program Table");
 
 if (menu_is_enabled_enum()) {
-loadparm = menu_get_enum_boot_index(program_table_entries);
+loadparm = menu_get_enum_boot_index(valid_entries);
 }
 
 debug_print_int("loadparm", loadparm);
diff --git a/pc-bios/s390-ccw/menu.c b/pc-bios/s390-ccw/menu.c
index aaf5d61..82a4ae6 100644
--- a/pc-bios/s390-ccw/menu.c
+++ b/pc-bios/s390-ccw/menu.c
@@ -228,19 +228,30 @@ int menu_get_zipl_boot_index(const char *menu_data)
 return get_boot_index(valid_entries);
 }
 
-
-int menu_get_enum_boot_index(int entries)
+int menu_get_enum_boot_index(bool *valid_entries)
 {
-char tmp[4];
+char tmp[3];
+int i;
 
-sclp_print("s390x Enumerated Boot Menu.\n\n");
+sclp_print("s390-ccw Enumerated Boot Menu.\n\n");
 
-sclp_print(uitoa(entries, tmp, sizeof(tmp)));
-sclp_print(" entries detected. Select from boot index 0 to ");
-sclp_print(uitoa(entries - 1, tmp, sizeof(tmp)));
-sclp_print(".\n\n");
+for (i = 0; i < MAX_BOOT_ENTRIES; i++) {
+if (valid_entries[i]) {
+if (i < 10) {
+sclp_print(" ");
+}
+sclp_print("[");
+sclp_print(uitoa(i, tmp, sizeof(tmp)));
+sclp_print("]");
+if (i == 0) {
+sclp_print(" default\n");
+}
+sclp_print("\n");
+}
+}
 
-return get_boot_index(entries);
+sclp_print("\n");
+return get_boot_index(valid_entries);
 }
 
 void menu_set_parms(uint8_t boot_menu_flag, uint32_t boot_menu_timeout)
diff --git a/pc-bios/s390-ccw/s390-ccw.h b/pc-bios/s390-ccw/s390-ccw.h
index 2c9e601..a1bdb4c 100644
--- a/pc-bios/s390-ccw/s390-ccw.h
+++ b/pc-bios/s390-ccw/s390-ccw.h
@@ -91,7 +91,7 @@ void zipl_load(void);
 void menu_set_parms(uint8_t boot_menu_flag, uint32_t boot_menu_timeout);
 int menu_get_zipl_boot_index(const char *menu_data);
 bool menu_is_enabled_zipl(void);
-int menu_get_enum_boot_index(int entries);
+int menu_get_enum_boot_index(bool *valid_entries);
 bool menu_is_enabled_enum(void);
 
 #define MAX_BOOT_ENTRIES  31
-- 
2.7.4

[Qemu-devel] [PATCH v3 3/4] pc-bios/s390-ccw: fix non-sequential boot entries (eckd)

2018-04-16 Thread Collin Walling

zIPL boot menu entries can be non-sequential. Let's account
for this issue for the s390 zIPL boot menu. Since this boot
menu is actually an imitation and is not completely capable
of everything the real zIPL menu can do, let's also print a
different banner to the user.

Signed-off-by: Collin Walling 
Reported-by: Vasily Gorbik 
Reviewed-by: Thomas Huth 
Reviewed-by: Janosch Frank 
---
 pc-bios/s390-ccw/menu.c | 29 -
 1 file changed, 20 insertions(+), 9 deletions(-)

diff --git a/pc-bios/s390-ccw/menu.c b/pc-bios/s390-ccw/menu.c
index 96eec81..aaf5d61 100644
--- a/pc-bios/s390-ccw/menu.c
+++ b/pc-bios/s390-ccw/menu.c
@@ -158,7 +158,7 @@ static void boot_menu_prompt(bool retry)
 }
 }
 
-static int get_boot_index(int entries)
+static int get_boot_index(bool *valid_entries)
 {
 int boot_index;
 bool retry = false;
@@ -168,7 +168,8 @@ static int get_boot_index(int entries)
 boot_menu_prompt(retry);
 boot_index = get_index();
 retry = true;
-} while (boot_index < 0 || boot_index >= entries);
+} while (boot_index < 0 || boot_index >= MAX_BOOT_ENTRIES ||
+ !valid_entries[boot_index]);
 
 sclp_print("\nBooting entry #");
 sclp_print(uitoa(boot_index, tmp, sizeof(tmp)));
@@ -176,7 +177,8 @@ static int get_boot_index(int entries)
 return boot_index;
 }
 
-static void zipl_println(const char *data, size_t len)
+/* Returns the entry number that was printed */
+static int zipl_print_entry(const char *data, size_t len)
 {
 char buf[len + 2];
 
@@ -185,12 +187,15 @@ static void zipl_println(const char *data, size_t len)
 buf[len + 1] = '\0';
 
 sclp_print(buf);
+
+return buf[0] == ' ' ? atoui(buf + 1) : atoui(buf);
 }
 
 int menu_get_zipl_boot_index(const char *menu_data)
 {
 size_t len;
-int entries;
+int entry;
+bool valid_entries[MAX_BOOT_ENTRIES] = {false};
 uint16_t zipl_flag = *(uint16_t *)(menu_data - ZIPL_FLAG_OFFSET);
 uint16_t zipl_timeout = *(uint16_t *)(menu_data - ZIPL_TIMEOUT_OFFSET);
 
@@ -202,19 +207,25 @@ int menu_get_zipl_boot_index(const char *menu_data)
 timeout = zipl_timeout * 1000;
 }
 
-/* Print and count all menu items, including the banner */
-for (entries = 0; *menu_data; entries++) {
+/* Print banner */
+sclp_print("s390-ccw zIPL Boot Menu\n\n");
+menu_data += strlen(menu_data) + 1;
+
+/* Print entries */
+while (*menu_data) {
 len = strlen(menu_data);
-zipl_println(menu_data, len);
+entry = zipl_print_entry(menu_data, len);
 menu_data += len + 1;
 
-if (entries < 2) {
+valid_entries[entry] = true;
+
+if (entry == 0) {
 sclp_print("\n");
 }
 }
 
 sclp_print("\n");
-return get_boot_index(entries - 1); /* subtract 1 to exclude banner */
+return get_boot_index(valid_entries);
 }
 
 
-- 
2.7.4

[Qemu-devel] [PATCH v3 0/4] Small fixes for s390x QEMU boot menu

2018-04-16 Thread Collin Walling

Change Log:

v3

- added r-b's
- added check around memset

v2

- added r-b's
- s/zipl_println/zipl_print_entry
- prints entry and returns entry number
- while loop now handles valid_entries

These patches fix the following:

- The QEMU zIPL boot menu does not allow accurate selection of
non-sequential entries.

- The QEMU zIPL boot menu does not have all the capabilities of the
real zIPL menu (such as commandline args). We should print a different
banner to reflect this.

- The loadparm array in main.c can end up being not null terminated when
converted to an integer via atoui.

- A loadparm set to an empty string does not allow a boot menu.

Collin Walling (4):
  pc-bios/s390-ccw: rename MAX_TABLE_ENTRIES to MAX_BOOT_ENTRIES
  pc-bios/s390-ccw: fix loadparm initialization and int conversion
  pc-bios/s390-ccw: fix non-sequential boot entries (eckd)
  pc-bios/s390-ccw: fix non-sequential boot entries (enum)

 hw/s390x/ipl.c  |  4 
 pc-bios/s390-ccw/bootmap.c  | 16 +++--
 pc-bios/s390-ccw/bootmap.h  |  2 --
 pc-bios/s390-ccw/main.c | 14 +--
 pc-bios/s390-ccw/menu.c | 58 +++--
 pc-bios/s390-ccw/s390-ccw.h |  4 +++-
 6 files changed, 63 insertions(+), 35 deletions(-)

-- 
2.7.4

Re: [Qemu-devel] [PATCH v11 14/17] hw/arm/virt: Introduce the iommu option

2018-04-16 Thread Peter Maydell

On 12 April 2018 at 08:38, Eric Auger  wrote:
> ARM virt machine now exposes a new "iommu" option.
> The SMMUv3 IOMMU is instantiated using -machine virt,iommu=smmuv3.
>
> Signed-off-by: Eric Auger 
> Signed-off-by: Prem Mallappa 
>
> ---
> v9 -> v10:
> - remove no_iommu
>
> v7 -> v8:
> - Revert to machine option, now dubbed "iommu", preparing for virtio
>   instantiation.
>
> v5 -> v6: machine 2_11
> ---
>  hw/arm/virt.c | 36 
>  1 file changed, 36 insertions(+)
>
> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> index 1799702..a3398d6 100644
> --- a/hw/arm/virt.c
> +++ b/hw/arm/virt.c
> @@ -1560,6 +1560,34 @@ static void virt_set_gic_version(Object *obj, const 
> char *value, Error **errp)
>  }
>  }
>
> +static char *virt_get_iommu(Object *obj, Error **errp)
> +{
> +VirtMachineState *vms = VIRT_MACHINE(obj);
> +
> +switch (vms->iommu) {
> +case VIRT_IOMMU_NONE:
> +return g_strdup("none");
> +case VIRT_IOMMU_SMMUV3:
> +return g_strdup("smmuv3");
> +default:
> +g_assert_not_reached();
> +}
> +}
> +
> +static void virt_set_iommu(Object *obj, const char *value, Error **errp)
> +{
> +VirtMachineState *vms = VIRT_MACHINE(obj);
> +
> +if (!strcmp(value, "smmuv3")) {
> +vms->iommu = VIRT_IOMMU_SMMUV3;
> +} else if (!strcmp(value, "none")) {
> +vms->iommu = VIRT_IOMMU_NONE;
> +} else {
> +error_setg(errp, "Invalid iommu value");
> +error_append_hint(errp, "Valid values are none, smmuv3.\n");
> +}
> +}
> +
>  static CpuInstanceProperties
>  virt_cpu_index_to_props(MachineState *ms, unsigned cpu_index)
>  {
> @@ -1692,6 +1720,14 @@ static void virt_2_12_instance_init(Object *obj)
>  NULL);
>  }
>
> +/* Default disallows iommu instantiation */
> +vms->iommu = VIRT_IOMMU_NONE;
> +object_property_add_str(obj, "iommu", virt_get_iommu, virt_set_iommu, 
> NULL);
> +object_property_set_description(obj, "iommu",
> +"Set the IOMMU model among "
> +"none, smmuv3 (default none)",
> +NULL);

We should phrase this consistently with how we're describing other
options like the GIC version, so
   "Set the IOMMU type. Valid values are none and smmuv3"

Otherwise
Reviewed-by: Peter Maydell 

thanks
-- PMM

Re: [Qemu-devel] [PATCH v4 2/4] qobject: use a QObjectBase_ struct

2018-04-16 Thread Marc-André Lureau

On Mon, Apr 16, 2018 at 10:31 AM, Markus Armbruster  wrote:
> Markus Armbruster  writes:
>
>> Marc-André Lureau  writes:
>>
>>> By moving the base fields to a QObjectBase_, QObject can be a type
>>> which also has a 'base' field. This allows to write a generic
>>> QOBJECT() macro that will work with any QObject type, including
>>> QObject itself. The container_of() macro ensures that the object to
>>> cast has a QObjectBase_ base field, giving some type safety
>>> guarantees. However, for it to work properly, all QObject types must
>>> have 'base' at offset 0 (which is ensured by static checking from
>>> the previous patch)
>>
>> I'm afraid this condition is neither sufficient nor necessary.
>>
>> QOBJECT() maps a pointer to some subtype to the base type QObject.  For
>> this to work, the subtype must contain a QObject.
>>
>> Before the patch, this is trivially the case: the subtypes have a member
>> QObject base, and QOBJECT() returns its address.
>>
>> Afterwards, the subtypes have a member QObjectBase_ base, and QOBJECT()
>> returns the address of a notional QObject wrapped around this member.
>> Works because QObject has no other members.
>>
>> The condition 'base is at offset 0' does not ensure this, and is
>> therefore not sufficient.
>>
>> It's not necessary, either: putting base elsewhere would work just fine
>> as long as we put *all* of QObject there.
>>
>> Please document the real condition "QObject must have no members but
>> QObjectBase_ base, or else QOBJECT() breaks".  Feel free to check their
>> sizes are the same (I wouldn't bother).
>>
>> If requiring base to be at offset zero for all subtypes materially
>> simplifies code, then go ahead and do that in a separate patch.  My gut
>> feeling is it doesn't, but I could be wrong.
>
> Uh, there's another reason: existing type casts from QObject * to
> subtypes.  I just spotted one in tests/check-qdict.c:
>
> dst = (QDict *)qdict_crumple(src, _abort);
>
> There may well be more.
>

So we better have checks from patch 1, don't we?

Re: [Qemu-devel] [PATCH v11 08/17] hw/arm/smmuv3: Event queue recording helper

2018-04-16 Thread Peter Maydell

On 12 April 2018 at 08:37, Eric Auger  wrote:
> Let's introduce a helper function aiming at recording an
> event in the event queue.
>
> Signed-off-by: Eric Auger 
>
> ---
> v9 -> v10:
> - rework SMMU_EVENT_STRING
> - trigger a GERROR EVENTQ_ABT_ERR in case of eventq write failure
>
> v8 -> v9:
> - add SMMU_EVENT_STRING
>
> v7 -> v8:
> - use dma_addr_t instead of hwaddr in smmuv3_record_event()
> - introduce struct SMMUEventInfo
> - add event_stringify + helpers for all fields
> ---
>  hw/arm/smmuv3-internal.h | 142 
> ++-
>  hw/arm/smmuv3.c  | 108 +--
>  hw/arm/trace-events  |   1 +
>  3 files changed, 243 insertions(+), 8 deletions(-)
>
> diff --git a/hw/arm/smmuv3-internal.h b/hw/arm/smmuv3-internal.h
> index 8550be0..8e546bf 100644
> --- a/hw/arm/smmuv3-internal.h
> +++ b/hw/arm/smmuv3-internal.h
> @@ -205,8 +205,6 @@ static inline void smmu_write_cmdq_err(SMMUv3State *s, 
> uint32_t err_type)
>  s->cmdq.cons = FIELD_DP32(s->cmdq.cons, CMDQ_CONS, ERR, err_type);
>  }
>
> -void smmuv3_write_eventq(SMMUv3State *s, Evt *evt);
> -
>  /* Commands */
>
>  typedef enum SMMUCommandType {
> @@ -308,4 +306,144 @@ enum { /* Command completion notification */
>
>  #define SMMU_FEATURE_2LVL_STE (1 << 0)
>
> +/* Events */
> +
> +typedef enum SMMUEventType {
> +SMMU_EVT_OK = 0x00,
> +SMMU_EVT_F_UUT  = 0x01,
> +SMMU_EVT_C_BAD_STREAMID = 0x02,
> +SMMU_EVT_F_STE_FETCH= 0x03,
> +SMMU_EVT_C_BAD_STE  = 0x04,
> +SMMU_EVT_F_BAD_ATS_TREQ = 0x05,
> +SMMU_EVT_F_STREAM_DISABLED  = 0x06,
> +SMMU_EVT_F_TRANS_FORBIDDEN  = 0x07,
> +SMMU_EVT_C_BAD_SUBSTREAMID  = 0x08,
> +SMMU_EVT_F_CD_FETCH = 0x09,
> +SMMU_EVT_C_BAD_CD   = 0x0a,
> +SMMU_EVT_F_WALK_EABT= 0x0b,

Most of the other enums you let the auto-increment deal
with long runs of consecutive integers like this. I don't
much care which but being consistent is generally nicer.

> +SMMU_EVT_F_TRANSLATION  = 0x10,
> +SMMU_EVT_F_ADDR_SIZE= 0x11,
> +SMMU_EVT_F_ACCESS   = 0x12,
> +SMMU_EVT_F_PERMISSION   = 0x13,
> +SMMU_EVT_F_TLB_CONFLICT = 0x20,
> +SMMU_EVT_F_CFG_CONFLICT = 0x21,
> +SMMU_EVT_E_PAGE_REQ = 0x24,
> +} SMMUEventType;
> +
> +static const char *event_stringify[] = {
> +[SMMU_EVT_OK]   = "SMMU_EVT_OK",
> +[SMMU_EVT_F_UUT]= "SMMU_EVT_F_UUT",
> +[SMMU_EVT_C_BAD_STREAMID]   = "SMMU_EVT_C_BAD_STREAMID",
> +[SMMU_EVT_F_STE_FETCH]  = "SMMU_EVT_F_STE_FETCH",
> +[SMMU_EVT_C_BAD_STE]= "SMMU_EVT_C_BAD_STE",
> +[SMMU_EVT_F_BAD_ATS_TREQ]   = "SMMU_EVT_F_BAD_ATS_TREQ",
> +[SMMU_EVT_F_STREAM_DISABLED]= "SMMU_EVT_F_STREAM_DISABLED",
> +[SMMU_EVT_F_TRANS_FORBIDDEN]= "SMMU_EVT_F_TRANS_FORBIDDEN",
> +[SMMU_EVT_C_BAD_SUBSTREAMID]= "SMMU_EVT_C_BAD_SUBSTREAMID",
> +[SMMU_EVT_F_CD_FETCH]   = "SMMU_EVT_F_CD_FETCH",
> +[SMMU_EVT_C_BAD_CD] = "SMMU_EVT_C_BAD_CD",
> +[SMMU_EVT_F_WALK_EABT]  = "SMMU_EVT_F_WALK_EABT",
> +[SMMU_EVT_F_TRANSLATION]= "SMMU_EVT_F_TRANSLATION",
> +[SMMU_EVT_F_ADDR_SIZE]  = "SMMU_EVT_F_ADDR_SIZE",
> +[SMMU_EVT_F_ACCESS] = "SMMU_EVT_F_ACCESS",
> +[SMMU_EVT_F_PERMISSION] = "SMMU_EVT_F_PERMISSION",
> +[SMMU_EVT_F_TLB_CONFLICT]   = "SMMU_EVT_F_TLB_CONFLICT",
> +[SMMU_EVT_F_CFG_CONFLICT]   = "SMMU_EVT_F_CFG_CONFLICT",
> +[SMMU_EVT_E_PAGE_REQ]   = "SMMU_EVT_E_PAGE_REQ",
> +};
> +
> +static inline const char *smmu_event_string(SMMUEventType type)
> +{
> +return event_stringify[type] ? event_stringify[type] : "UNKNOWN";

Same remarks about being defensive about out of range values
apply here, I expect.

> +}
> +
> +/*  Encode an event record */
> +typedef struct SMMUEventInfo {
> +SMMUEventType type;
> +uint32_t sid;
> +bool recorded;
> +bool record_trans_faults;
> +union {
> +struct {
> +uint32_t ssid;
> +bool ssv;
> +dma_addr_t addr;
> +bool rnw;
> +bool pnu;
> +bool ind;
> +   } f_uut;
> +   struct ssid_info {
> +uint32_t ssid;
> +bool ssv;
> +   } c_bad_streamid;

If we were being really picky about coding style these
embedded struct names like ssid_info ought to be camelcase.

> +   struct ssid_addr_info {
> +uint32_t ssid;
> +bool ssv;
> +dma_addr_t addr;
> +   } f_ste_fetch;
> +   struct ssid_info c_bad_ste;
> +   struct {
> +dma_addr_t addr;
> +bool rnw;
> +   } f_transl_forbidden;
> +   struct {
> +uint32_t ssid;
> +

Re: [Qemu-devel] [PATCH v4 2/4] qobject: use a QObjectBase_ struct

2018-04-16 Thread Marc-André Lureau

Hi

On Mon, Apr 16, 2018 at 10:12 AM, Markus Armbruster  wrote:
> Marc-André Lureau  writes:
>
>> By moving the base fields to a QObjectBase_, QObject can be a type
>> which also has a 'base' field. This allows to write a generic
>> QOBJECT() macro that will work with any QObject type, including
>> QObject itself. The container_of() macro ensures that the object to
>> cast has a QObjectBase_ base field, giving some type safety
>> guarantees. However, for it to work properly, all QObject types must
>> have 'base' at offset 0 (which is ensured by static checking from
>> the previous patch)
>
> I'm afraid this condition is neither sufficient nor necessary.
>
> QOBJECT() maps a pointer to some subtype to the base type QObject.  For
> this to work, the subtype must contain a QObject.
>
> Before the patch, this is trivially the case: the subtypes have a member
> QObject base, and QOBJECT() returns its address.
>
> Afterwards, the subtypes have a member QObjectBase_ base, and QOBJECT()
> returns the address of a notional QObject wrapped around this member.
> Works because QObject has no other members.
>
> The condition 'base is at offset 0' does not ensure this, and is
> therefore not sufficient.
>
> It's not necessary, either: putting base elsewhere would work just fine
> as long as we put *all* of QObject there.
>
> Please document the real condition "QObject must have no members but
> QObjectBase_ base, or else QOBJECT() breaks".  Feel free to check their
> sizes are the same (I wouldn't bother).

ok

>
> If requiring base to be at offset zero for all subtypes materially
> simplifies code, then go ahead and do that in a separate patch.  My gut
> feeling is it doesn't, but I could be wrong.

what is missing from patch 1?

>
>> QObjectBase_ is not typedef and use a trailing underscore to make it
>> obvious it is not for normal use and to avoid potential abuse.
>
> Trailing underscore I like, lack of typedef I don't mind (but I'm firmly
> in the "eschew typedef for structs" camp).  It does violate the common
> QEMU coding style, though.
>
> A comment /* Not for use outside include/qapi/qmp/ */ next to
> QObjectBase_ wouldn't hurt.
>

ok

Re: [Qemu-devel] [PATCH for 2.13 v3 1/2] spapr: Add ibm, max-associativity-domains property

2018-04-16 Thread Serhii Popovych

Bharata B Rao wrote:
> On Wed, Apr 11, 2018 at 02:41:59PM -0400, Serhii Popovych wrote:
>> Now recent kernels (i.e. since linux-stable commit a346137e9142
>> ("powerpc/numa: Use ibm,max-associativity-domains to discover possible 
>> nodes")
>> support this property to mark initially memory-less NUMA nodes as "possible"
>> to allow further memory hot-add to them.
>>
>> Advertise this property for pSeries machines to let guest kernels detect
>> maximum supported node configuration and benefit from kernel side change
>> when hot-add memory to specific, possibly empty before, NUMA node.
>>
>> Signed-off-by: Serhii Popovych 
>> ---
>>  hw/ppc/spapr.c | 10 ++
>>  1 file changed, 10 insertions(+)
>>
>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
>> index a81570e..c05bbad 100644
>> --- a/hw/ppc/spapr.c
>> +++ b/hw/ppc/spapr.c
>> @@ -910,6 +910,13 @@ static void spapr_dt_rtas(sPAPRMachineState *spapr, 
>> void *fdt)
>>  0, cpu_to_be32(SPAPR_MEMORY_BLOCK_SIZE),
>>  cpu_to_be32(max_cpus / smp_threads),
>>  };
>> +uint32_t maxdomains[] = {
>> +cpu_to_be32(4),
>> +cpu_to_be32(0),
>> +cpu_to_be32(0),
>> +cpu_to_be32(0),
>> +cpu_to_be32(nb_numa_nodes - 1),
>> +};
>>
>>  _FDT(rtas = fdt_add_subnode(fdt, 0, "rtas"));
>>
>> @@ -946,6 +953,9 @@ static void spapr_dt_rtas(sPAPRMachineState *spapr, void 
>> *fdt)
>>  _FDT(fdt_setprop(fdt, rtas, "ibm,associativity-reference-points",
>>   refpoints, sizeof(refpoints)));
>>
>> +_FDT(fdt_setprop(fdt, rtas, "ibm,max-associativity-domains",
>> + maxdomains, sizeof(maxdomains)));
>> +
>>  _FDT(fdt_setprop_cell(fdt, rtas, "rtas-error-log-max",
>>RTAS_ERROR_LOG_MAX));
>>  _FDT(fdt_setprop_cell(fdt, rtas, "rtas-event-scan-rate",
>> -- 
>> 1.8.3.1
> 
> This commit causes hash guest with latest guest kernel to hang at early boot.

I use v4.16 tag from stable and can't reproduce on P8 machine reported
issue.

Could you please share more details about your setup, kernel commit id
you spot problem?

> 
> Quiescing Open Firmware ...
> Booting Linux via __start() @ 0x0200 ...
> [0.00] hash-mmu: Page sizes from device-tree:
> [0.00] hash-mmu: base_shift=12: shift=12, sllp=0x, 
> avpnm=0x, tlbiel=1, penc=0
> [0.00] hash-mmu: base_shift=16: shift=16, sllp=0x0110, 
> avpnm=0x, tlbiel=1, penc=1
> [0.00] Using 1TB segments
> [0.00] hash-mmu: Initializing hash mmu with SLB
> [0.00] Linux version 4.16.0-rc7+ (root@localhost.localdomain) (gcc 
> version 7.1.1 20170622 (Red Hat 7.1.1-3) (GCC)) #60 SMP Wed Apr 11 10:36:22 
> IST 2018
> [0.00] Found initrd at 0xc3c0:0xc4f9a34c
> [0.00] Using pSeries machine description
> [0.00] bootconsole [udbg0] enabled
> [0.00] Partition configured for 32 cpus.
> [0.00] CPU maps initialized for 1 thread per core
> [0.00] -
> [0.00] ppc64_pft_size= 0x1a
> [0.00] phys_mem_size = 0x2
> [0.00] dcache_bsize  = 0x80
> [0.00] icache_bsize  = 0x80
> [0.00] cpu_features  = 0x077c7a6c18500249
> [0.00]   possible= 0x18500649
> [0.00]   always  = 0x18100040
> [0.00] cpu_user_features = 0xdc0065c2 0xae00
> [0.00] mmu_features  = 0x78006001
> [0.00] firmware_features = 0x0001415a445f
> [0.00] htab_hash_mask= 0x7
> [0.00] -
> 
> No progess after this.
> 
> 


-- 
Thanks,
Serhii



signature.asc
Description: OpenPGP digital signature

[Qemu-devel] [PULL 0/1] vhost: bugfix

2018-04-16 Thread Michael S. Tsirkin

The following changes since commit 38e83a71d02e026d4a6d0ab1ef9855c4924c2c68:

  Update version for v2.12.0-rc3 release (2018-04-11 19:03:24 +0100)

are available in the git repository at:

  git://git.kernel.org/pub/scm/virt/kvm/mst/qemu.git tags/for_upstream

for you to fetch changes up to aebbdbee559389a9f0d0e56670b9332534b6bb9b:

  vhost: do not verify ring mappings when IOMMU is enabled (2018-04-16 19:11:38 
+0300)


vhost: bugfix

This fixes a regression in vhost.

Signed-off-by: Michael S. Tsirkin 


Jason Wang (1):
  vhost: do not verify ring mappings when IOMMU is enabled

 hw/virtio/vhost.c | 4 
 1 file changed, 4 insertions(+)

[Qemu-devel] [PULL 1/1] vhost: do not verify ring mappings when IOMMU is enabled

2018-04-16 Thread Michael S. Tsirkin

From: Jason Wang 

When IOMMU is enabled, we store virtqueue metadata as iova (though it
may has _phys suffix) and access them through dma helpers. Any
translation failures could be reported by IOMMU.

In this case, trying to validate iova against gpa won't work and will
cause a false error reporting. So this patch bypasses the ring
verification if IOMMU is enabled which is similar to the behavior
before 0ca1fd2d6878 that calls vhost_memory_map() which is a nop when
IOMMU is enabled.

Fixes: 0ca1fd2d6878 ("vhost: Simplify ring verification checks")
Cc: Dr. David Alan Gilbert 
Cc: Igor Mammedov 
Cc: Michael S. Tsirkin 
Signed-off-by: Jason Wang 
Reviewed-by: Igor Mammedov 
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
 hw/virtio/vhost.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index f51bf57..9d5850a 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -342,6 +342,10 @@ static int vhost_verify_ring_mappings(struct vhost_dev 
*dev,
 "used ring"
 };
 
+if (vhost_dev_has_iommu(dev)) {
+return 0;
+}
+
 for (i = 0; i < dev->nvqs; ++i) {
 struct vhost_virtqueue *vq = dev->vqs + i;
 
-- 
MST

Re: [Qemu-devel] [PATCH v11 07/17] hw/arm/smmuv3: Implement MMIO write operations

2018-04-16 Thread Peter Maydell

On 12 April 2018 at 08:37, Eric Auger  wrote:
> Now we have relevant helpers for queue and irq
> management, let's implement MMIO write operations.
>
> Signed-off-by: Eric Auger 
> Signed-off-by: Prem Mallappa 

> -int smmuv3_cmdq_consume(SMMUv3State *s)
> +static int smmuv3_cmdq_consume(SMMUv3State *s)
>  {
>  SMMUCmdError cmd_error = SMMU_CERROR_NONE;
>  SMMUQueue *q = >cmdq;
> @@ -269,11 +270,170 @@ int smmuv3_cmdq_consume(SMMUv3State *s)
>  return 0;
>  }
>
> -static MemTxResult smmu_write_mmio(void *opaque, hwaddr offset, uint64_t 
> data,

You changed the name of the argument here from 'value' to 'data', which
is why the diff has come out looking a bit awkward like this.
Better to be consistent with the name from the start.

> +static MemTxResult smmu_writell(SMMUv3State *s, hwaddr offset,
> +   uint64_t value, MemTxAttrs attrs)

Otherwise
Reviewed-by: Peter Maydell 

thanks
-- PMM

[Qemu-devel] [Bug 1654137] Re: Ctrl-A b not working in 2.8.0

2018-04-16 Thread Daniel Berrange

@elmarco could you take a look at this possible regression since bisect
claims it was due to the mux refactor

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1654137

Title:
  Ctrl-A b not working in 2.8.0

Status in QEMU:
  Confirmed

Bug description:
  With a recent update from 2.7.0 to 2.8.0 I have discovered that I can
  no longer send a "break" to the VM.  Ctrl-A b is simply ignored.
  Other Ctrl-A sequences seem to work correctly.

  This is on a NetBSD amd64 system, version 7.99.53, and qemu was
  installed on this system from source.

  Reverting to the previous install restores "break" capability.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1654137/+subscriptions

Re: [Qemu-devel] [PATCH v11 06/17] hw/arm/smmuv3: Queue helpers

2018-04-16 Thread Peter Maydell

On 12 April 2018 at 08:37, Eric Auger  wrote:
> We introduce helpers to read/write into the command and event
> circular queues.
>
> smmuv3_write_eventq and smmuv3_cmq_consume will become static
> in subsequent patches.
>
> Invalidation commands are not yet dealt with. We do not cache
> data that need to be invalidated. This will change with vhost
> integration.
>
> Signed-off-by: Eric Auger 
> Signed-off-by: Prem Mallappa 
>
> ---
> v9 -> v10:
> - simplified macros
> - s/BASE/Q_BASE
> - use log2size field
> - static inline functions replacing some macros
> - simplified queue_prod_incr/queue_cons_incr and use deposit32
> - trace for cmdq_consume failure
>
> v8 -> v9:
> - fix CMD_SSID & CMD_ADDR + some renamings
> - do cons increment after the execution of the command
> - add Q_INCONSISTENT()
>
> v7 -> v8
> - use address_space_rw
> - helpers inspired from spec
> ---
>  hw/arm/smmuv3-internal.h | 154 
> +++
>  hw/arm/smmuv3.c  | 136 +
>  hw/arm/trace-events  |   5 ++
>  3 files changed, 295 insertions(+)
>
> diff --git a/hw/arm/smmuv3-internal.h b/hw/arm/smmuv3-internal.h
> index 32f81d4..968fa25 100644
> --- a/hw/arm/smmuv3-internal.h
> +++ b/hw/arm/smmuv3-internal.h
> @@ -156,4 +156,158 @@ static inline bool 
> smmuv3_gerror_irq_enabled(SMMUv3State *s)
>  void smmuv3_trigger_irq(SMMUv3State *s, SMMUIrq irq, uint32_t gerror_mask);
>  void smmuv3_write_gerrorn(SMMUv3State *s, uint32_t gerrorn);
>
> +/* Queue Handling */
> +
> +#define Q_BASE(q)  ((q)->base & SMMU_BASE_ADDR_MASK)
> +#define WRAP_MASK(q)   (1 << (q)->log2size)
> +#define INDEX_MASK(q)  (((1 << (q)->log2size)) - 1)
> +#define WRAP_INDEX_MASK(q) ((1 << ((q)->log2size + 1)) - 1)
> +
> +#define Q_CONS(q) ((q)->cons & INDEX_MASK(q))
> +#define Q_PROD(q) ((q)->prod & INDEX_MASK(q))
> +
> +#define Q_CONS_ENTRY(q)  (Q_BASE(q) + (q)->entry_size * Q_CONS(q))
> +#define Q_PROD_ENTRY(q)  (Q_BASE(q) + (q)->entry_size * Q_PROD(q))
> +
> +#define Q_CONS_WRAP(q) (((q)->cons & WRAP_MASK(q)) >> (q)->log2size)
> +#define Q_PROD_WRAP(q) (((q)->prod & WRAP_MASK(q)) >> (q)->log2size)
> +
> +static inline bool smmuv3_q_full(SMMUQueue *q)
> +{
> +return ((q->cons ^ q->prod) & WRAP_INDEX_MASK(q)) == WRAP_MASK(q);
> +}
> +
> +static inline bool smmuv3_q_empty(SMMUQueue *q)
> +{
> +return (q->cons & WRAP_INDEX_MASK(q)) == (q->prod & WRAP_INDEX_MASK(q));
> +}
> +
> +static inline void queue_prod_incr(SMMUQueue *q)
> +{
> +q->prod = (q->prod + 1) & WRAP_INDEX_MASK(q);
> +}
> +
> +static inline void queue_cons_incr(SMMUQueue *q)
> +{
> +q->cons = deposit32(q->cons, 0, q->log2size + 1, q->cons + 1);
> +}

I was expecting these two functions to be implemented in
the same way. Worth a comment
/* We have to use deposit for the CONS registers to preserve
 * the ERR field in the high bits.
 */

or if you prefer, instead a comment in queue_prod_incr saying
   /* We don't need to use deposit here because there are no
* fields above WRAP in a PROD register
*/

(you don't need both comments).

> +
> +static inline bool smmuv3_cmdq_enabled(SMMUv3State *s)
> +{
> +return FIELD_EX32(s->cr[0], CR0, CMDQEN);
> +}
> +
> +static inline bool smmuv3_eventq_enabled(SMMUv3State *s)
> +{
> +return FIELD_EX32(s->cr[0], CR0, EVENTQEN);
> +}
> +
> +static inline void smmu_write_cmdq_err(SMMUv3State *s, uint32_t err_type)
> +{
> +s->cmdq.cons = FIELD_DP32(s->cmdq.cons, CMDQ_CONS, ERR, err_type);
> +}
> +
> +void smmuv3_write_eventq(SMMUv3State *s, Evt *evt);
> +
> +/* Commands */
> +
> +typedef enum SMMUCommandType {
> +SMMU_CMD_PREFETCH_CONFIG = 0x01,
> +SMMU_CMD_PREFETCH_ADDR,
> +SMMU_CMD_CFGI_STE,
> +SMMU_CMD_CFGI_STE_RANGE,
> +SMMU_CMD_CFGI_CD,
> +SMMU_CMD_CFGI_CD_ALL,
> +SMMU_CMD_CFGI_ALL,
> +SMMU_CMD_TLBI_NH_ALL = 0x10,
> +SMMU_CMD_TLBI_NH_ASID,
> +SMMU_CMD_TLBI_NH_VA,
> +SMMU_CMD_TLBI_NH_VAA,
> +SMMU_CMD_TLBI_EL3_ALL= 0x18,
> +SMMU_CMD_TLBI_EL3_VA = 0x1a,
> +SMMU_CMD_TLBI_EL2_ALL= 0x20,
> +SMMU_CMD_TLBI_EL2_ASID,
> +SMMU_CMD_TLBI_EL2_VA,
> +SMMU_CMD_TLBI_EL2_VAA,  /* 0x23 */
> +SMMU_CMD_TLBI_S12_VMALL  = 0x28,
> +SMMU_CMD_TLBI_S2_IPA = 0x2a,
> +SMMU_CMD_TLBI_NSNH_ALL   = 0x30,
> +SMMU_CMD_ATC_INV = 0x40,
> +SMMU_CMD_PRI_RESP,
> +SMMU_CMD_RESUME  = 0x44,
> +SMMU_CMD_STALL_TERM,
> +SMMU_CMD_SYNC,  /* 0x46 */

I don't think the comments are very useful here (and they're
not consistent -- you don't bother to mark 0x13 or 0x07).

> +} SMMUCommandType;
> +
> +static const char *cmd_stringify[] = {
> +[SMMU_CMD_PREFETCH_CONFIG] = "SMMU_CMD_PREFETCH_CONFIG",
> +[SMMU_CMD_PREFETCH_ADDR]   = "SMMU_CMD_PREFETCH_ADDR",
> +[SMMU_CMD_CFGI_STE]= "SMMU_CMD_CFGI_STE",
> +[SMMU_CMD_CFGI_STE_RANGE]  =

[Qemu-devel] [PULL 1/1] i386: Don't automatically enable FEAT_KVM_HINTS bits

2018-04-16 Thread Eduardo Habkost

The assumption in the cpu->max_features code is that anything
enabled on GET_SUPPORTED_CPUID should be enabled on "-cpu host".
This shouldn't be the case for FEAT_KVM_HINTS.

This adds a new FeatureWordInfo::no_autoenable_flags field, that
can be used to prevent FEAT_KVM_HINTS bits to be enabled
automatically.

Reviewed-by: Paolo Bonzini 
Signed-off-by: Eduardo Habkost 
Message-Id: <20180410211534.26079-1-ehabk...@redhat.com>
Tested-by: Wanpeng Li 
Signed-off-by: Eduardo Habkost 
---
 target/i386/cpu.c | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 1a6b082b6f..a20fe26573 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -295,6 +295,8 @@ typedef struct FeatureWordInfo {
 uint32_t tcg_features; /* Feature flags supported by TCG */
 uint32_t unmigratable_flags; /* Feature flags known to be unmigratable */
 uint32_t migratable_flags; /* Feature flags known to be migratable */
+/* Features that shouldn't be auto-enabled by "-cpu host" */
+uint32_t no_autoenable_flags;
 } FeatureWordInfo;
 
 static FeatureWordInfo feature_word_info[FEATURE_WORDS] = {
@@ -400,6 +402,11 @@ static FeatureWordInfo feature_word_info[FEATURE_WORDS] = {
 },
 .cpuid_eax = KVM_CPUID_FEATURES, .cpuid_reg = R_EDX,
 .tcg_features = TCG_KVM_FEATURES,
+/*
+ * KVM hints aren't auto-enabled by -cpu host, they need to be
+ * explicitly enabled in the command-line.
+ */
+.no_autoenable_flags = ~0U,
 },
 [FEAT_HYPERV_EAX] = {
 .feat_names = {
@@ -4062,7 +4069,8 @@ static void x86_cpu_expand_features(X86CPU *cpu, Error 
**errp)
  */
 env->features[w] |=
 x86_cpu_get_supported_feature_word(w, cpu->migratable) &
-~env->user_features[w];
+~env->user_features[w] & \
+~feature_word_info[w].no_autoenable_flags;
 }
 }
 
-- 
2.14.3

[Qemu-devel] [PULL 0/1] i386: Don't automatically enable FEAT_KVM_HINTS bits

2018-04-16 Thread Eduardo Habkost

Last remaining fix for -rc4.

The following changes since commit 042f6a31af3d38eefc6ec995cce1d762c41d4515:

  Merge remote-tracking branch 'remotes/maxreitz/tags/pull-block-2018-04-16' 
into staging (2018-04-16 15:30:54 +0100)

are available in the Git repository at:

  git://github.com/ehabkost/qemu.git tags/x86-next-pull-request

for you to fetch changes up to 0d914f39a77954bf5472b16130aeefe573a068f5:

  i386: Don't automatically enable FEAT_KVM_HINTS bits (2018-04-16 13:36:52 
-0300)


i386: Don't automatically enable FEAT_KVM_HINTS bits

Bug fix for "-cpu host" with newer kernels.



Eduardo Habkost (1):
  i386: Don't automatically enable FEAT_KVM_HINTS bits

 target/i386/cpu.c | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

-- 
2.14.3

Re: [Qemu-devel] [PATCH 0/3] Remove artificial length limits when parsing options

2018-04-16 Thread Daniel P . Berrangé

On Mon, Apr 16, 2018 at 06:30:45PM +0200, Markus Armbruster wrote:
> Daniel P. Berrangé  writes:
> 
> > A user trying out SMBIOS "OEM strings" feature reported that the data
> > they are exposing to the guest was truncated at 1023 bytes, which breaks
> > the app consuming in the guest. After searching for the cause I
> > eventually found that the QemuOpts parsing is using fixed length 1024
> > byte array for option values and 128 byte array for key names.
> >
> > We can certainly debate whether it is sane to have such long command
> > line argument values (it is not sane), but if the OS was capable of
> > exec'ing QEMU with such an ARGV array, there is little good reason for
> > imposing an artificial length restriction when parsing it. Even worse is
> > that we silently truncate without reporting an error when hitting limits
> > resulting in a semantically incorrect behaviour, possibly even leading
> > to security flaws depending on the data that was truncated.
> >
> > Thus this patch series removes the artificial length limits by killing
> > the fixed length buffers.
> >
> > Separately I intend to make it possible to read "OEM strings" data from
> > a file, to avoid need to have long command line args.
> 
> Too bad I haven't been able to complete my quest to kill QemuOpts.
> 
> As far as I know, keyval.c's only arbitrary limit is the length of a key
> fragment (the things separated by '.').

Looks like that's the same scenario I tried to address in patch 2. The
'key' part in QemuOpts has the same 128 byte limit as in the keyval.c
code. I fear that could be hit with -blockdev when setting params on
very deeply nested block backends.  On the plus side  keyval.c actually
reports an error when it hits its 128 byte limit, instead of silently
carrying on as if all was well like QemuOpts did :-)

Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|

Re: [Qemu-devel] [PATCH] WHPX fixes an issue with CPUID 1 not returning CPUID_EXT_HYPERVISOR

2018-04-16 Thread Eduardo Habkost

On Thu, Apr 05, 2018 at 05:50:20PM +0200, Paolo Bonzini wrote:
> On 28/03/2018 22:48, Justin Terry (VM) wrote:
[...]
> > If we use [2] to inject the answers at creation time WHPX needs access
> > to the CPUX86State at accel init which also doesn't seem to be possible
> > in QEMU today. WHPX could basically just call cpu_x86_cpuid() for each
> > CPUID QEMU cares about and plumb the answer before start. This has the
> > best performance as we avoid the additional exits but has an issue in
> > that the results must be known ahead of time.
> 
> The earliest where you have access to that is x86_cpu_initfn.

x86_cpu_initfn() is the earliest you have access to the CPU
object, but note that the final CPUID bits (based on -cpu
options, accel data, and possibly other input) are known only
when x86_cpu_realizefn() is called.

-- 
Eduardo

Re: [Qemu-devel] [PATCH 0/3] Remove artificial length limits when parsing options

2018-04-16 Thread Markus Armbruster

Daniel P. Berrangé  writes:

> A user trying out SMBIOS "OEM strings" feature reported that the data
> they are exposing to the guest was truncated at 1023 bytes, which breaks
> the app consuming in the guest. After searching for the cause I
> eventually found that the QemuOpts parsing is using fixed length 1024
> byte array for option values and 128 byte array for key names.
>
> We can certainly debate whether it is sane to have such long command
> line argument values (it is not sane), but if the OS was capable of
> exec'ing QEMU with such an ARGV array, there is little good reason for
> imposing an artificial length restriction when parsing it. Even worse is
> that we silently truncate without reporting an error when hitting limits
> resulting in a semantically incorrect behaviour, possibly even leading
> to security flaws depending on the data that was truncated.
>
> Thus this patch series removes the artificial length limits by killing
> the fixed length buffers.
>
> Separately I intend to make it possible to read "OEM strings" data from
> a file, to avoid need to have long command line args.

Too bad I haven't been able to complete my quest to kill QemuOpts.

As far as I know, keyval.c's only arbitrary limit is the length of a key
fragment (the things separated by '.').

[Qemu-devel] [Bug 1759264] Re: fpu/softfloat: round_to_int_and_pack refactor broke TriCore ftoi insns

2018-04-16 Thread Bastian Koppelmann

** Changed in: qemu
   Status: New => Fix Committed

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1759264

Title:
  fpu/softfloat: round_to_int_and_pack refactor broke TriCore ftoi insns

Status in QEMU:
  Fix Committed

Bug description:
  After the refactor from ab52f973a504f8de0c5df64631ba4caea70a7d9e the
  bahaviour of int32_to_float32() was altered.

  helper_ftoi() in target/tricore/fpu_helper.c relied on
  int32_to_float32 to raise the invalid flag if the input was NaN to
  properly return 0. Likewise if the input is infinity.

  The obvious fix for softfloat would be to raise this flag in 
round_to_int_and_pack(). However,
  I'm not sure if this breaks other targets and I have no easy way to test it.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1759264/+subscriptions

Re: [Qemu-devel] [PATCH v2 2/2] vl: Partial support for non-scalar properties with -object

2018-04-16 Thread Daniel P . Berrangé

On Mon, Aug 14, 2017 at 01:24:17PM +0200, Markus Armbruster wrote:
> "Daniel P. Berrange"  writes:
> 
> > On Fri, Aug 11, 2017 at 12:47:44PM -0500, Eric Blake wrote:
> >> On 08/11/2017 11:05 AM, Markus Armbruster wrote:
> >> > We've wanted -object to support non-scalar properties for a while.
> >> > Dan Berrange tried in "[PATCH v4 00/10]Provide a QOM-based
> >> > authorization API".  Review led to the conclusion that we need to
> >> > replace rather than add to QemuOpts.  Initial work towards that goal
> >> > has been merged to provide -blockdev (commit 8746709), but there's
> >> > substantial work left, mostly due to an bewildering array of
> >> > compatibility problems.
> >> > 
> >> > Even if a full solution is still out of reach, we can have a partial
> >> > solution now: accept -object argument in JSON syntax.  This should
> >> > unblock development work that needs non-scalar properties with
> >> > -object.
> >> > 
> >> > The implementation is similar to -blockdev, except we use the new
> >> > infrastructure only for the new JSON case, and stick to QemuOpts for
> >> > the existing KEY=VALUE,... case, to sidestep compatibility problems.
> >> > 
> >> > If we did this for more options, we'd have to factor out common code.
> >> > But for one option, this will do.
> >> > 
> >> > Signed-off-by: Markus Armbruster 
> >> > ---
> >> >  qapi-schema.json | 14 +++---
> >> >  vl.c | 51 
> >> > +++
> >> >  2 files changed, 62 insertions(+), 3 deletions(-)
> >> > 
> >> >  static void object_create(bool (*type_predicate)(const char *))
> >> >  {
> >> > +ObjectOptionsQueueEntry *e, *next;
> >> > +
> >> > +QSIMPLEQ_FOREACH_SAFE(e, _queue, entry, next) {
> >> > +if (!type_predicate(e->oo->qom_type)) {
> >> > +continue;
> >> > +}
> >> > +
> >> > +loc_push_restore(>loc);
> >> > +qmp_object_add(e->oo->qom_type, e->oo->id,
> >> > +   e->oo->has_props, e->oo->props, _fatal);
> >> > +loc_pop(>loc);
> >> > +
> >> > +QSIMPLEQ_REMOVE(_queue, e, ObjectOptionsQueueEntry, entry);
> >> > +qapi_free_ObjectOptions(e->oo);
> >> > +}
> >> > +
> >> >  if (qemu_opts_foreach(qemu_find_opts("object"),
> >> 
> >> This handles all JSON forms prior to any QemuOpt forms (within the two
> >> priority levels), such that a command line using:
> >> 
> >>  -object type,id=1,oldstyle... -object '{'id':2, 'type':..., newstyle...}'
> >> 
> >> processes the arguments in a different order than
> >> 
> >>  -object type,id=1,oldstyle... -object type,id=2,oldstyle
> >> 
> >> But I don't see that as too bad (ideally, someone using the {} JSON
> >> style will use it consistently).
> >
> > I don't really like such a constraint - the ordering of object
> > creation is already complex with some objets created at a different
> > point in startup to other objects. Adding yet another constraint
> > feels like it is painting ourselves into a corner wrt future changes.
> 
> The full solution will evaluate -object left to right.
> 
> This partial solution doesn't, but it's not meant for use in anger, just
> for unblocking development work.  Can add scary warnings to deter
> premature use.
> 
> > In particular I think it is quite possible to use the dotted
> > form primarily, and only use JSON for the immediate scenario
> > where non-JSON form won't work - I expect that's how we would
> > use it in libvirt - I don't see libvirt changing 100% to JSON
> > based objects
> 
> You need the JSON form anyway for QMP, and for the cases where dotted
> keys break down.  Doing both just for the command line requires code to
> do dotted keys (which may already exist), and code to decide whether it
> can be used (which probably doesn't exist, yet).
> 
> Wouldn't this add complexity?  For what benefit exactly?

Surprisingly, it appears we do actually have code that generates the
JSON syntax for (probably) all uses of -object today. In fact we are
actually generating JSON and then converting it to dotted syntax in
most cases, which I didn't realize when writing the above.

We'll have to keep support for dotted syntax around a while for old
QEMU versions, but it looks like we could reasonably easily switch
to JSON syntax for all -object usage at the same time.

Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|

Re: [Qemu-devel] [PATCH v2 2/2] vl: Partial support for non-scalar properties with -object

2018-04-16 Thread Daniel P . Berrangé

On Mon, Aug 14, 2017 at 07:49:54AM +0200, Markus Armbruster wrote:
> Eric Blake  writes:
> 
> > On 08/11/2017 11:05 AM, Markus Armbruster wrote:
> >> We've wanted -object to support non-scalar properties for a while.
> >> Dan Berrange tried in "[PATCH v4 00/10]Provide a QOM-based
> >> authorization API".  Review led to the conclusion that we need to
> >> replace rather than add to QemuOpts.  Initial work towards that goal
> >> has been merged to provide -blockdev (commit 8746709), but there's
> >> substantial work left, mostly due to an bewildering array of
> >> compatibility problems.
> >> 
> >> Even if a full solution is still out of reach, we can have a partial
> >> solution now: accept -object argument in JSON syntax.  This should
> >> unblock development work that needs non-scalar properties with
> >> -object.
> >> 
> >> The implementation is similar to -blockdev, except we use the new
> >> infrastructure only for the new JSON case, and stick to QemuOpts for
> >> the existing KEY=VALUE,... case, to sidestep compatibility problems.
> >> 
> >> If we did this for more options, we'd have to factor out common code.
> >> But for one option, this will do.
> >> 
> >> Signed-off-by: Markus Armbruster 
> >> ---
> >>  qapi-schema.json | 14 +++---
> >>  vl.c | 51 +++
> >>  2 files changed, 62 insertions(+), 3 deletions(-)
> >> 
> >>  static void object_create(bool (*type_predicate)(const char *))
> >>  {
> >> +ObjectOptionsQueueEntry *e, *next;
> >> +
> >> +QSIMPLEQ_FOREACH_SAFE(e, _queue, entry, next) {
> >> +if (!type_predicate(e->oo->qom_type)) {
> >> +continue;
> >> +}
> >> +
> >> +loc_push_restore(>loc);
> >> +qmp_object_add(e->oo->qom_type, e->oo->id,
> >> +   e->oo->has_props, e->oo->props, _fatal);
> >> +loc_pop(>loc);
> >> +
> >> +QSIMPLEQ_REMOVE(_queue, e, ObjectOptionsQueueEntry, entry);
> >> +qapi_free_ObjectOptions(e->oo);
> >> +}
> >> +
> >>  if (qemu_opts_foreach(qemu_find_opts("object"),
> >
> > This handles all JSON forms prior to any QemuOpt forms (within the two
> > priority levels), such that a command line using:
> >
> >  -object type,id=1,oldstyle... -object '{'id':2, 'type':..., newstyle...}'
> >
> > processes the arguments in a different order than
> >
> >  -object type,id=1,oldstyle... -object type,id=2,oldstyle
> >
> > But I don't see that as too bad (ideally, someone using the {} JSON
> > style will use it consistently).
> 
> Yes, that's another restriction: if you need your -object evaluated in a
> certain order, you may have to stick to one of the two syntax variants.
> 
> Aside: there are two sane evaluation orders: (1) strictly left to right,
> and (2) order doesn't matter.  QEMU of course does (3) unpredicable for
> humans without referring back to the source code.

IIUC, to "fix" the ordering problem we need to be able to consider the
ordering of all QEMU args, not just -object.  

The horrible hack with the two stage setup of -object in vl.c is driven by
the fact that some objects are referenced by -device/-chardev args, while
objects are referencing -device/-chardev args etc. This is the big problem
to untangle, and understandable you don't want to tackle that for this
patch. Until we can figure out how to address the big problem, it would be
nice not to introduce yet another ordering though driven off usage of
json vs non-json syntax.

Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|

Re: [Qemu-devel] [PATCH v2 0/2] vl: Partial support for non-scalar properties with -object

2018-04-16 Thread Daniel P . Berrangé

Just a ping to say I'd like us to restart work this patch and try to get
it mergable for the 2.13 cycle, so I can rely it on for the ACL support
I've had out of tree since 2.6 :-)

On Fri, Aug 11, 2017 at 06:05:20PM +0200, Markus Armbruster wrote:
> v2:
> * PATCH 1: Whitespace change dropped [Eric]
> * PATCH 2: Deallocation done differently [Paolo], R-by dropped
>Commit message typo [Eric]
> 
> Markus Armbruster (2):
>   vl: Factor object_create() out of main()
>   vl: Partial support for non-scalar properties with -object
> 
>  qapi-schema.json | 14 ---
>  vl.c | 71 
> 
>  2 files changed, 72 insertions(+), 13 deletions(-)
> 
> -- 
> 2.7.5
> 

Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|

Re: [Qemu-devel] [PATCH 08/12] os-posix: Provide new -runas : facility

2018-04-16 Thread Markus Armbruster

Ian Jackson  writes:

> Thanks for the review.  Taking your comments out of order slightly:
>
> Markus Armbruster writes ("Re: [Qemu-devel] [PATCH 08/12] os-posix: Provide 
> new -runas : facility"):
>> [change_process_uid] is the only user of @user_pwd, @user_uid, @user_gid.
>> 
>> Have you considered replacing global @user_pwd by @user_uid, @user_gid
>> and @user_name?  --runas with numeric uid and gid would leave @user_name
>> null.
>
> That would defer the getpwnam from argument parsing to os_setup_post.
> I think that's undesriable.

No argument.  But why can't os_parse_cmd_args() call getpwnam() as it
does now, then store user_pwd->pw_uid, ->pw_gid and ->pw_name instead of
user_pwd?  Store a null name when it parses the argument as UID:GID.

>> Ian Jackson  writes:
>> >  static struct passwd *user_pwd;
>> > +static uid_t user_uid = (uid_t)-1;
>> > +static gid_t user_gid = (gid_t)-1;
>> 
>> As we'll see below, @user_pwd->pw_uid, @user_pwd_pw_gid take precedence
>> over @user_uid, @user_gid.  Awkward.
>
> My patch has the right behaviour: each -runas completely overrides the
> previous one.  -runas that sets user_{uid,gid} always clears user_pwd
> on the way.  So user_pwd can only be set if the most recent -runas was
> a name, and then we should honour the name.
>
> This is rather obscure.  I think you are right that this is confusing.
> It ought to be clearer.
>
> I will
>   - add a comment next to these three variables saying they must
> all be set at the same time
>   - explicitly (redundantly) clear user_pwd in os_parse_runas_uid_gid
>   - explicitly set user_{uid,gid} to -1 when -runas gets a
> success from getpwnam
>   - assert in change_process_uid that the combination is legal

Yes, that's better.  But perhaps you like my idea above.

[...]

Re: [Qemu-devel] [PATCH v2 2/2] docs: Document the new default sizes of the qcow2 caches

2018-04-16 Thread Max Reitz

On 2018-03-14 09:29, Alberto Garcia wrote:
> We have just reduced the refcount cache size to the minimum unless
> the user explicitly requests a larger one, so we have to update the
> documentation to reflect this change.
> 
> Signed-off-by: Alberto Garcia 
> Reviewed-by: Eric Blake 
> ---
>  docs/qcow2-cache.txt | 31 ++-
>  1 file changed, 14 insertions(+), 17 deletions(-)
> 
> diff --git a/docs/qcow2-cache.txt b/docs/qcow2-cache.txt
> index 170191a242..c640d45d06 100644
> --- a/docs/qcow2-cache.txt
> +++ b/docs/qcow2-cache.txt
> @@ -116,31 +116,28 @@ There are three options available, and all of them take 
> bytes:
>  "refcount-cache-size":   maximum size of the refcount block cache
>  "cache-size":maximum size of both caches combined
>  
> -There are two things that need to be taken into account:
> +There are a few things that need to be taken into account:
>  
>   - Both caches must have a size that is a multiple of the cluster size
> (or the cache entry size: see "Using smaller cache sizes" below).
>  
> - - If you only set one of the options above, QEMU will automatically
> -   adjust the others so that the L2 cache is 4 times bigger than the
> -   refcount cache.
> + - The default L2 cache size is 8 clusters or 1MB (whichever is more),
> +   and the minimum is 2 clusters (or 2 cache entries, see below).
>  
> -This means that these options are equivalent:
> + - The default (and minimum) refcount cache size is 4 clusters.
>  
> -   -drive file=hd.qcow2,l2-cache-size=2097152
> -   -drive file=hd.qcow2,refcount-cache-size=524288
> -   -drive file=hd.qcow2,cache-size=2621440
> + - If only "cache-size" is specified then QEMU will assign as much
> +   memory as possible to the L2 cache before increasing the refcount
> +   cache size.
>  
> -The reason for this 1/4 ratio is to ensure that both caches cover the
> -same amount of disk space. Note however that this is only valid with
> -the default value of refcount_bits (16). If you are using a different
> -value you might want to calculate both cache sizes yourself since QEMU
> -will always use the same 1/4 ratio.
> +Unlike L2 tables, refcount blocks are not used during normal I/O but
> +only during allocations and internal snapshots. In most cases they are
> +accessed sequentially (even during random guest I/O) so increasing the
> +refcount cache size won't have any measurable effect in performance.

Could you add a note that it can have a measurable effect when using
internal snapshots and the user may want to think about increasing its
size when making heavy use of them?

Max

> -It's also worth mentioning that there's no strict need for both caches
> -to cover the same amount of disk space. The refcount cache is used
> -much less often than the L2 cache, so it's perfectly reasonable to
> -keep it small.
> +Before QEMU 2.12 the refcount cache had a default size of 1/4 of the
> +L2 cache size. This resulted in unnecessarily large caches, so now the
> +refcount cache is as small as possible unless overridden by the user.
>  
>  
>  Using smaller cache entries
> 




signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH v2 1/2] qcow2: Give the refcount cache the minimum possible size by default

2018-04-16 Thread Max Reitz

On 2018-03-14 09:29, Alberto Garcia wrote:
> The L2 and refcount caches have default sizes that can be overridden
> using the l2-cache-size and refcount-cache-size (an additional
> parameter named cache-size sets the combined size of both caches).
> 
> Unless forced by one of the aforementioned parameters, QEMU will set
> the unspecified sizes so that the L2 cache is 4 times larger than the
> refcount cache.
> 
> This is based on the premise that the refcount metadata needs to be
> only a fourth of the L2 metadata to cover the same amount of disk
> space. This is incorrect for two reasons:
> 
>  a) The amount of disk covered by an L2 table depends solely on the
> cluster size, but in the case of a refcount block it depends on
> the cluster size *and* the width of each refcount entry.
> The 4/1 ratio is only valid with 16-bit entries (the default).
> 
>  b) When we talk about disk space and L2 tables we are talking about
> guest space (L2 tables map guest clusters to host clusters),
> whereas refcount blocks are used for host clusters (including
> L1/L2 tables and the refcount blocks themselves). On a fully
> populated (and uncompressed) qcow2 file, image size > virtual size
> so there are more refcount entries than L2 entries.
> 
> Problem (a) could be fixed by adjusting the algorithm to take into
> account the refcount entry width. Problem (b) could be fixed by
> increasing a bit the refcount cache size to account for the clusters
> used for qcow2 metadata.
> 
> However this patch takes a completely different approach and instead
> of keeping a ratio between both cache sizes it assigns as much as
> possible to the L2 cache and the remainder to the refcount cache.
> 
> The reason is that L2 tables are used for every single I/O request
> from the guest and the effect of increasing the cache is significant
> and clearly measurable. Refcount blocks are however only used for
> cluster allocation and internal snapshots and in practice are accessed
> sequentially in most cases, so the effect of increasing the cache is
> negligible (even when doing random writes from the guest).
> 
> So, make the refcount cache as small as possible unless the user
> explicitly asks for a larger one.
> 
> Signed-off-by: Alberto Garcia 
> Reviewed-by: Eric Blake 
> ---
>  block/qcow2.c  | 31 +++
>  block/qcow2.h  |  4 
>  tests/qemu-iotests/137.out |  2 +-
>  3 files changed, 20 insertions(+), 17 deletions(-)

Reviewed-by: Max Reitz 



signature.asc
Description: OpenPGP digital signature

[Qemu-devel] [Bug 1654137] Re: Ctrl-A b not working in 2.8.0

2018-04-16 Thread Andreas Gustafsson

This regression is still unfixed three months after being reported, and
it's rendering qemu 2.11.1 unusable for my present use case, so I just
reverted my system to the ever reliable qemu 0.15.1.

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1654137

Title:
  Ctrl-A b not working in 2.8.0

Status in QEMU:
  Confirmed

Bug description:
  With a recent update from 2.7.0 to 2.8.0 I have discovered that I can
  no longer send a "break" to the VM.  Ctrl-A b is simply ignored.
  Other Ctrl-A sequences seem to work correctly.

  This is on a NetBSD amd64 system, version 7.99.53, and qemu was
  installed on this system from source.

  Reverting to the previous install restores "break" capability.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1654137/+subscriptions

Re: [Qemu-devel] [PATCH] checkpatch: warn about missing MAINTAINERS file changes

2018-04-16 Thread Markus Armbruster

Peter Maydell  writes:

> On 16 April 2018 at 15:11, Markus Armbruster  wrote:
>> Peter Maydell  writes:
>>
>>> On 12 March 2018 at 13:18, Stefan Hajnoczi  wrote:
 Warn if files are added/renamed/deleted without MAINTAINERS file
 changes.  This has helped me in Linux and we could benefit from this
 check in QEMU.

 This patch is a manual cherry-pick of Linux commit
 13f1937ef33950b1112049972249e6191b82e6c9 ("checkpatch: emit a warning on
 file add/move/delete") by Joe Perches .

 Signed-off-by: Stefan Hajnoczi 
 ---
>>>
>>> Unfortunately the patch doesn't magically cause maintainers
>>> to appear for the new files :-)
>>
>> Fortunately, the patch can get new files non-magically rejected unless a
>> maintainers appears :)
>
> I think that "author of code lists themselves in MAINTAINERS
> but then doesn't in practice do anything" is not really much
> better (and arguably worse) than "code has no listed maintainer".

Having our tooling flag new and moved files for a possible MAINTAINERS
update need not mean adding J. Random Codeslinger to MAINTAINERS.  It
should make us stop and think.

If the kernel guys can do that, why can't we?

Re: [Qemu-devel] [PATCH for-2.12?] m25p80: Correct the at25128a/at25256a eeproms size

2018-04-16 Thread mar.krzeminski


W dniu 15.04.2018 o 22:31, Philippe Mathieu-Daudé pisze:

>From the datasheet (3368J–SEEPR) description:
 The AT25128A/256A provides 131,072/262,144 bits of serial
 electrically-erasable programmable read only memory (EEPROM)
 organized as 16,384/32,768 words of 8 bits each.

However QEMU models the flash size in bytes.
Correct the at25128a/at25256a entries to reflect the datasheet size.

Signed-off-by: Philippe Mathieu-Daudé 

Reviewed-by: Marcin Krzeminski 

---
This is not a regression, so can now wait for 2.13.

  hw/block/m25p80.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/hw/block/m25p80.c b/hw/block/m25p80.c
index b49c8e9caa..d0b9fbfc50 100644
--- a/hw/block/m25p80.c
+++ b/hw/block/m25p80.c
@@ -196,8 +196,8 @@ static const FlashPartInfo known_devices[] = {
  /* Atmel EEPROMS - it is assumed, that don't care bit in command
   * is set to 0. Block protection is not supported.
   */
-{ INFO("at25128a-nonjedec", 0x0, 0, 1, 131072, EEPROM) },
-{ INFO("at25256a-nonjedec", 0x0, 0, 1, 262144, EEPROM) },
+{ INFO("at25128a-nonjedec", 0x0, 0, 1, 16384, EEPROM) },
+{ INFO("at25256a-nonjedec", 0x0, 0, 1, 32768, EEPROM) },

Thanks!
Marcin
  
  /* EON -- en25xxx */

  { INFO("en25f32", 0x1c3116,  0,  64 << 10,  64, ER_4K) },

Re: [Qemu-devel] [PATCH v4 3/5] s390x/cpumodel: Set up CPU model for AP device support

2018-04-16 Thread David Hildenbrand


>  
> diff --git a/target/s390x/gen-features.c b/target/s390x/gen-features.c
> index 0cdbc15..0d5b0f7 100644
> --- a/target/s390x/gen-features.c
> +++ b/target/s390x/gen-features.c
> @@ -447,6 +447,9 @@ static uint16_t full_GEN12_GA1[] = {
>  S390_FEAT_ADAPTER_INT_SUPPRESSION,
>  S390_FEAT_EDAT_2,
>  S390_FEAT_SIDE_EFFECT_ACCESS_ESOP2,
> +S390_FEAT_AP_QUERY_CONFIG_INFO,
> +S390_FEAT_AP_FACILITIES_TEST,
> +S390_FEAT_AP,
>  };
>  

Now I have to ask a very stupid question:

I heard that the execution controls in the SIE block for AP are one of
the oldest ones we have around. How can it be that the AP feature cannot
be used before zEC12?


-- 

Thanks,

David / dhildenb

Re: [Qemu-devel] [RFC PATCH] target/arm: support reading of CNTVCT_EL0 from user-space

2018-04-16 Thread Peter Maydell

On 16 April 2018 at 16:29, Alex Bennée  wrote:
>
> Peter Maydell  writes:
>> CNTVCT_EL0 isn't much use without CNTFRQ_EL0 which tells
>> you how fast it ticks...
>
> I've added it but of course:
>
> /* Note that CNTFRQ is purely reads-as-written for the benefit
>  * of software; writing it doesn't actually change the timer frequency.
>  * Our reset value matches the fixed frequency we implement the timer at.
>  */
>
> So it's not like we do anything with it internally.

But we do correctly use it to report the frequency of our
system-mode CNTVCT counters, as the comment says.

> I assume in real
> life you could mess with this but still have a monotonically
> increasing counter.

It's purely a reporting mechanism. In real hardware the
firmware is supposed to know how fast the system clock is
and and set CNTFRQ up appropriately to tell the OS that that's
how fast the CNTVCT counter runs. (Notice that CNTFRQ is only
RW to the highest implemented exception level, and RO below that.)

>> It looks like other targets use cpu_get_host_ticks() for an
>> arbitrary time-counter thingy. Not sure you can get the frequency
>> for it, though :-(
>
> Hmm I've just used:
>
>   return cpu_get_clock()/GTIMER_SCALE;
>
> For now

I guess that will work -- it boils down to a gettimeofday()
syscall, which will be ok for speed if it's in a VDSO and
a bit worse if it's a real syscall.

thanks
-- PMM

Re: [Qemu-devel] [PATCH] e1000e: Do not auto-clear ICR bits which aren't set in EIAC

2018-04-16 Thread Peter Maydell

On 16 April 2018 at 16:25, Jan Kiszka  wrote:
> On 2018-04-01 23:17, Jan Kiszka wrote:
>> From: Jan Kiszka 
>>
>> The spec does not justify clearing of any E1000_ICR_OTHER_CAUSES when
>> E1000_ICR_OTHER is set in EIAC. In fact, removing this code fixes the
>> issue the Linux driver runs into since 4aea7a5c5e94 ("e1000e: Avoid
>> receiver overrun interrupt bursts") and was worked around by
>> 745d0bd3af99 ("e1000e: Remove Other from EIAC").
>>
>> Signed-off-by: Jan Kiszka 
>> ---
>>
>> This resolves the issue I reported on February 18 ("e1000e: MSI-X
>> problem with recent Linux drivers").
>>
>>  hw/net/e1000e_core.c | 4 
>>  1 file changed, 4 deletions(-)
>>
>> diff --git a/hw/net/e1000e_core.c b/hw/net/e1000e_core.c
>> index ecf9b1..d38f025c0f 100644
>> --- a/hw/net/e1000e_core.c
>> +++ b/hw/net/e1000e_core.c
>> @@ -2022,10 +2022,6 @@ e1000e_msix_notify_one(E1000ECore *core, uint32_t 
>> cause, uint32_t int_cfg)
>>
>>  effective_eiac = core->mac[EIAC] & cause;
>>
>> -if (effective_eiac == E1000_ICR_OTHER) {
>> -effective_eiac |= E1000_ICR_OTHER_CAUSES;
>> -}
>> -
>>  core->mac[ICR] &= ~effective_eiac;
>>
>>  if (!(core->mac[CTRL_EXT] & E1000_CTRL_EXT_IAME)) {
>>
>
> Ping for this - as well as https://patchwork.ozlabs.org/patch/895476.
>
> Given that q35 uses e1000e by default and many Linux kernel versions no
> longer work, this should likely go into upcoming and stable versions

I'd rather not put it into 2.12 at this point in the release
cycle unless it's a regression from 2.11, I think.

thanks
-- PMM

Re: [Qemu-devel] [RFC PATCH] target/arm: support reading of CNTVCT_EL0 from user-space

2018-04-16 Thread Alex Bennée


Peter Maydell  writes:

> On 16 April 2018 at 15:03, Alex Bennée  wrote:
>> Since kernel commit a86bd139f2 (arm64: arch_timer: Enable CNTVCT_EL0
>> trap..) user-space has been able to read this system register. This
>> patch enables access to that register although currently it always
>> returns 0 as we don't yet have a mechanism for managing timers in
>> linux-user mode.
>>
>> Signed-off-by: Alex Bennée 
>> ---
>>  target/arm/helper.c | 20 +---
>>  1 file changed, 17 insertions(+), 3 deletions(-)
>>
>> diff --git a/target/arm/helper.c b/target/arm/helper.c
>> index b14fdab140..8244badd63 100644
>> --- a/target/arm/helper.c
>> +++ b/target/arm/helper.c
>> @@ -2121,11 +2121,25 @@ static const ARMCPRegInfo generic_timer_cp_reginfo[] 
>> = {
>>  };
>>
>>  #else
>> -/* In user-mode none of the generic timer registers are accessible,
>> - * and their implementation depends on QEMU_CLOCK_VIRTUAL and qdev gpio 
>> outputs,
>> - * so instead just don't register any of them.
>> +
>> +/* In user-mode most of the generic timer registers are inaccessible
>> + * however modern kernels (4.12+) allow access to cntvct_el0
>>   */
>> +
>> +static uint64_t gt_virt_cnt_read(CPUARMState *env, const ARMCPRegInfo *ri)
>> +{
>> +/* Currently we have no support for QEMUTimer in linux-user so we
>> + * can't call gt_get_countervalue(env).
>> + */
>> +return 0;
>> +}
>> +
>>  static const ARMCPRegInfo generic_timer_cp_reginfo[] = {
>> +{ .name = "CNTVCT_EL0", .state = ARM_CP_STATE_AA64,
>> +  .opc0 = 3, .opc1 = 3, .crn = 14, .crm = 0, .opc2 = 2,
>> +  .access = PL0_R, .type = ARM_CP_NO_RAW | ARM_CP_IO,
>> +  .readfn = gt_virt_cnt_read,
>> +},
>>  REGINFO_SENTINEL
>>  };
>
> CNTVCT_EL0 isn't much use without CNTFRQ_EL0 which tells
> you how fast it ticks...

I've added it but of course:

/* Note that CNTFRQ is purely reads-as-written for the benefit
 * of software; writing it doesn't actually change the timer frequency.
 * Our reset value matches the fixed frequency we implement the timer at.
 */

So it's not like we do anything with it internally. I assume in real
life you could mess with this but still have a monotonically increasing
counter.

>
> It looks like other targets use cpu_get_host_ticks() for an
> arbitrary time-counter thingy. Not sure you can get the frequency
> for it, though :-(

Hmm I've just used:

  return cpu_get_clock()/GTIMER_SCALE;

For now

>
> thanks
> -- PMM


--
Alex Bennée

Re: [Qemu-devel] [PATCH qemu] RFC: memory/hmp: Print owners/parents in "info mtree"

2018-04-16 Thread Paolo Bonzini

On 16/04/2018 16:27, Igor Mammedov wrote:
> On Mon, 16 Apr 2018 15:30:39 +0200
> Paolo Bonzini  wrote:
> 
>> On 16/04/2018 15:20, Igor Mammedov wrote:
>>> Generally object doesn't need to know its own name,
>>> we use it only for debugging and nice error reporting so far.
>>> I'd rather have 'id' property at Object level so we won't have
>>> to fish out ID from parent /which we aren't supposed to do and
>>> which doesn't work in some cases/ when it's needed within
>>> object itself.  
>>
>> Having an 'id' at object level is also a mess, because that id is
>> invalid after unparent.
> I'd just consider 'id' as object name which is valid even if there
> is no parent (during whole object lifecycle).

What's the point of an object name if it cannot be unique?

Paolo

> That would allow for object to have a reachable name vs getting NULL
> when parent isn't set.
> Maybe Object::id is overkill, but we probably could use NamedObject
> where it's needed and avoid reverse engineering id from path.
>  
>> Since this is just for debugging use,
>> object_get_canonical_path_component is the right function.  We can just
>> make it return NULL if there is no parent.
>
> looking at current use it out-grew just debugging usecases
> and it's rather messy right now:
> 
> ram_backend_memory_alloc, throttle_group_obj_complete,
> xlnx_zynqmp_create_rpu, spapr_drc.c:realize, iothread_complete,
> memory_region_name

I agree, _but_ it's definitely okay for debugging usecases.

Paolo

Re: [Qemu-devel] [PATCH] e1000e: Do not auto-clear ICR bits which aren't set in EIAC

2018-04-16 Thread Jan Kiszka

On 2018-04-01 23:17, Jan Kiszka wrote:
> From: Jan Kiszka 
> 
> The spec does not justify clearing of any E1000_ICR_OTHER_CAUSES when
> E1000_ICR_OTHER is set in EIAC. In fact, removing this code fixes the
> issue the Linux driver runs into since 4aea7a5c5e94 ("e1000e: Avoid
> receiver overrun interrupt bursts") and was worked around by
> 745d0bd3af99 ("e1000e: Remove Other from EIAC").
> 
> Signed-off-by: Jan Kiszka 
> ---
> 
> This resolves the issue I reported on February 18 ("e1000e: MSI-X
> problem with recent Linux drivers").
> 
>  hw/net/e1000e_core.c | 4 
>  1 file changed, 4 deletions(-)
> 
> diff --git a/hw/net/e1000e_core.c b/hw/net/e1000e_core.c
> index ecf9b1..d38f025c0f 100644
> --- a/hw/net/e1000e_core.c
> +++ b/hw/net/e1000e_core.c
> @@ -2022,10 +2022,6 @@ e1000e_msix_notify_one(E1000ECore *core, uint32_t 
> cause, uint32_t int_cfg)
>  
>  effective_eiac = core->mac[EIAC] & cause;
>  
> -if (effective_eiac == E1000_ICR_OTHER) {
> -effective_eiac |= E1000_ICR_OTHER_CAUSES;
> -}
> -
>  core->mac[ICR] &= ~effective_eiac;
>  
>  if (!(core->mac[CTRL_EXT] & E1000_CTRL_EXT_IAME)) {
> 

Ping for this - as well as https://patchwork.ozlabs.org/patch/895476.

Given that q35 uses e1000e by default and many Linux kernel versions no
longer work, this should likely go into upcoming and stable versions

Jan

-- 
Siemens AG, Corporate Technology, CT RDA IOT SES-DE
Corporate Competence Center Embedded Linux

Re: [Qemu-devel] [PULL 0/2] Block patches for 2.12.0-rc4

2018-04-16 Thread Peter Maydell

On 16 April 2018 at 13:38, Max Reitz  wrote:
> The following changes since commit ae2b1b4e1bb89ea949446597c8776255da0a79d3:
>
>   Merge remote-tracking branch 
> 'remotes/vivier/tags/m68k-for-2.12-pull-request' into staging (2018-04-16 
> 10:11:17 +0100)
>
> are available in the Git repository at:
>
>   git://github.com/XanClic/qemu.git tags/pull-block-2018-04-16
>
> for you to fetch changes up to 25bf2426f3b27857afa35194227040eab821a047:
>
>   iotests: fix 169 (2018-04-16 13:35:32 +0200)
>
> 
> A fix for handling dirty bitmaps stored in qcow2 files.  This is not
> absolutely necessary for 2.12, but if there is an rc4, it should go in.
>
> 
> Vladimir Sementsov-Ogievskiy (2):
>   qcow2: try load bitmaps only once
>   iotests: fix 169
>
>  block/qcow2.h  |  1 +
>  block/qcow2.c  | 16 
>  tests/qemu-iotests/169 | 48 +++-
>  3 files changed, 36 insertions(+), 29 deletions(-)

Applied, thanks.

-- PMM

1 2 3 >

1 - 100 of 231 matches

Mail list logo