date:20180412

Re: [Qemu-devel] [Bug Report] vm paused after succeeding to migrate

2018-04-12 Thread linzhecheng



> -邮件原件-
> 发件人: Dr. David Alan Gilbert [mailto:dgilb...@redhat.com]
> 发送时间: 2018年4月12日 20:37
> 收件人: linzhecheng ; pbonz...@redhat.com
> 抄送: qemu-devel@nongnu.org; wangxin (U) ;
> Zhoujian (jay) ; quint...@redhat.com
> 主题: Re: [Qemu-devel] [Bug Report] vm paused after succeeding to migrate
> 
> * linzhecheng (linzhech...@huawei.com) wrote:
> > Hi, all
> > I encounterd a bug when I try to migrate a windows vm.
> >
> > Enviroment information:
> > host A: cpu E5620(model WestmereEP without flag xsave) host B: cpu
> > E5-2643(model SandyBridgeEP with xsave)
> >
> > The reproduce steps is :
> > 1. Start a windows 2008 vm with -cpu host(which means host-passthrough).
> > 2. Migrate the vm to host B when cr4.OSXSAVE=0 (successfully).
> > 3. Vm runs on host B for a while so that cr4.OSXSAVE changes to 1.
> > 4. Then migrate the vm to host A (successfully), but vm was paused, and
> qemu printed log as followed:
> 
> Remember that migrating using -cpu host  across different CPU models is NOT
> expected to work.
> 
> > KVM: entry failed, hardware error 0x8021
> >
> > If you're running a guest on an Intel machine without unrestricted
> > mode support, the failure can be most likely due to the guest entering
> > an invalid state for Intel VT. For example, the guest maybe running in
> > big real mode which is not supported on less recent Intel processors.
> >
> > EAX=019b3bb0 EBX=01a3ae80 ECX=01a61ce8 EDX=
> > ESI=01a62000 EDI= EBP= ESP=01718b20
> > EIP=0185d982 EFL=0286 [--S--P-] CPL=0 II=0 A20=1 SMM=0 HLT=0 ES
> > =   9300 CS =f000   9b00
> > SS =   9300 DS =  
> > 9300 FS =   9300 GS = 
> >  9300
> > LDT=   8200
> > TR =   8b00
> > GDT=  
> > IDT=  
> > CR0=6010 CR2= CR3= CR4=
> > DR0= DR1= DR2=
> > DR3=
> > DR6=0ff0 DR7=0400
> > EFER=
> > Code=00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <00>
> > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > 00 00 00 00 00 00
> >
> > I have found that problem happened when kvm_put_sregs returns err -
> 22(called by kvm_arch_put_registers(qemu)).
> > Because kvm_arch_vcpu_ioctl_set_sregs(kvm-mod) checked that
> guest_cpuid_has no X86_FEATURE_XSAVE but cr4.OSXSAVE=1.
> > So should we cancel migration when kvm_arch_put_registers returns error?
> 
> It would seem good if we can make the migration fail there rather than hitting
> that KVM error.
> It looks like we need to do a bit of plumbing to convert the places that call 
> it to
> return a bool rather than void.
I think we should return a int value of run_on_cpu which callback 
run_on_cpu_func, but run_on_cpu_func is the prototype of many functions,
Is it overkill?
> 
> Dave
> 
> --
> Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK

Re: [Qemu-devel] [PATCH 2/4] pc-bios/s390-ccw: fix loadparm initialization and int conversion

2018-04-12 Thread Thomas Huth

On 12.04.2018 23:04, Farhan Ali wrote:
> 
> 
> On 04/12/2018 04:57 PM, Collin Walling wrote:
>> On 04/12/2018 02:57 PM, Thomas Huth wrote:
>>> On 10.04.2018 17:01, Collin Walling wrote:
 Rename the loadparm char array in main.c to loadparm_str and
 increase the size by one byte to account for a null termination
 when converting the loadparm string to an int via atoui. Also
 allow the boot menu to be enabled when loadparm is set to an
 empty string or a series of spaces.

 Signed-off-by: Collin Walling 
 Reported-by: Vasily Gorbik 
 ---
   hw/s390x/ipl.c  |  2 ++
   pc-bios/s390-ccw/main.c | 14 +++---
   2 files changed, 9 insertions(+), 7 deletions(-)

 diff --git a/hw/s390x/ipl.c b/hw/s390x/ipl.c
 index fdeaec3..23b5b54 100644
 --- a/hw/s390x/ipl.c
 +++ b/hw/s390x/ipl.c
 @@ -352,6 +352,8 @@ int s390_ipl_set_loadparm(uint8_t *loadparm)
   loadparm[i] = ascii2ebcdic[(uint8_t) lp[i]];
   }
   +    memset(loadparm + i, 0x40, 8 - i); /* fill with EBCDIC
 spaces */
 +
   g_free(lp);
   return 0;
   }
 diff --git a/pc-bios/s390-ccw/main.c b/pc-bios/s390-ccw/main.c
 index 9d9f8cf..26f9adf 100644
 --- a/pc-bios/s390-ccw/main.c
 +++ b/pc-bios/s390-ccw/main.c
 @@ -15,11 +15,11 @@
   char stack[PAGE_SIZE * 8] __attribute__((__aligned__(PAGE_SIZE)));
   static SubChannelId blk_schid = { .one = 1 };
   IplParameterBlock iplb __attribute__((__aligned__(PAGE_SIZE)));
 -static char loadparm[8] = { 0, 0, 0, 0, 0, 0, 0, 0 };
 +static char loadparm_str[9] = { 0, 0, 0, 0, 0, 0, 0, 0, 0 };
   QemuIplParameters qipl;
     #define LOADPARM_PROMPT "PROMPT  "
 -#define LOADPARM_EMPTY  ""
 +#define LOADPARM_EMPTY  "    "
>>>
>>> Sorry for my ignorance, but why was the old string containing dots?
>>>
>>>   Thomas
>>>
>>
>> No need for apologies :)
>>
>> If -machine loadparm is *not* present on the command line, then the
>> loadparm in the sclp
>> will be a series of nulls. For whatever reason, that gets translated
>> into a series of dots.
>>
> 
> It's because of the ebc2asc table we use for conversion, which results
> in the dots when converting from ebcdic_to_ascii.

Ah, great, thanks to both of you for the explanation, that was the part
that I was missing. I was only looking at the tables in
include/hw/s390x/ebcdic.h (since I thought that the dots were already
created on the QEMU side), and did not expect that the pc-bios could
create them.

The patch now makes sense to me:

Reviewed-by: Thomas Huth

Re: [Qemu-devel] [PATCH v2 1/3] pc-dimm: factor out MemoryDevice interface

2018-04-12 Thread David Gibson

On Thu, Apr 12, 2018 at 04:36:33PM +0200, David Hildenbrand wrote:
> On the qmp level, we already have the concept of memory devices:
> "query-memory-devices"
> Right now, we only support NVDIMM and PCDIMM.
> 
> We want to map other devices later into the address space of the guest.
> Such device could e.g. be virtio devices. These devices will have a
> guest memory range assigned but won't be exposed via e.g. ACPI. We want
> to make them look like memory device, but not glued to pc-dimm.
> 
> Especially, it will not always be possible to have TYPE_PC_DIMM as a parent
> class (e.g. virtio devices). Let's use an interface instead. As a first
> part, convert handling of
> - qmp_pc_dimm_device_list
> - get_plugged_memory_size
> to our new model. plug/unplug stuff etc. will follow later.
> 
> A memory device will have to provide the following functions:
> - get_addr(): Necessary, as the property "addr" can e.g. not be used for
>   virtio devices (already defined).

Is the assumption here that a memory device has a fixed RAM address
(GPA) from when it is realized?  It seems to me plausible that you
could have a memory device that has a specific size, but has a BAR of
some sort that's programmed by the guest as part of the "plug"
process.

> - get_plugged_size(): The amount this device offers to the guest as of
>   now.
> - get_region_size(): Because this can later on be bigger than the
>  plugged size.
> - fill_device_info(): Fill MemoryDeviceInfo, e.g. for qmp.
> 
> Signed-off-by: David Hildenbrand 
> ---
>  hw/i386/acpi-build.c |   3 +-
>  hw/mem/Makefile.objs |   1 +
>  hw/mem/memory-device.c   | 119 
> +++
>  hw/mem/pc-dimm.c | 119 
> ++-
>  hw/ppc/spapr.c   |   3 +-
>  hw/ppc/spapr_hcall.c |   1 +
>  include/hw/mem/memory-device.h   |  44 ++
>  include/hw/mem/pc-dimm.h |   2 -
>  numa.c   |   3 +-
>  qmp.c|   4 +-
>  stubs/Makefile.objs  |   2 +-
>  stubs/{qmp_pc_dimm.c => qmp_memory_device.c} |   4 +-
>  12 files changed, 239 insertions(+), 66 deletions(-)
>  create mode 100644 hw/mem/memory-device.c
>  create mode 100644 include/hw/mem/memory-device.h
>  rename stubs/{qmp_pc_dimm.c => qmp_memory_device.c} (61%)
> 
> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> index 3cf2a1679c..ca3645d57b 100644
> --- a/hw/i386/acpi-build.c
> +++ b/hw/i386/acpi-build.c
> @@ -46,6 +46,7 @@
>  #include "hw/acpi/vmgenid.h"
>  #include "sysemu/tpm_backend.h"
>  #include "hw/timer/mc146818rtc_regs.h"
> +#include "hw/mem/memory-device.h"
>  #include "sysemu/numa.h"
>  
>  /* Supported chipsets: */
> @@ -2253,7 +2254,7 @@ build_tpm2(GArray *table_data, BIOSLinker *linker, 
> GArray *tcpalog)
>  static void build_srat_hotpluggable_memory(GArray *table_data, uint64_t base,
> uint64_t len, int default_node)
>  {
> -MemoryDeviceInfoList *info_list = qmp_pc_dimm_device_list();
> +MemoryDeviceInfoList *info_list = qmp_memory_device_list();
>  MemoryDeviceInfoList *info;
>  MemoryDeviceInfo *mi;
>  PCDIMMDeviceInfo *di;
> diff --git a/hw/mem/Makefile.objs b/hw/mem/Makefile.objs
> index f12f8b97a2..10be4df2a2 100644
> --- a/hw/mem/Makefile.objs
> +++ b/hw/mem/Makefile.objs
> @@ -1,2 +1,3 @@
>  common-obj-$(CONFIG_MEM_HOTPLUG) += pc-dimm.o
> +common-obj-$(CONFIG_MEM_HOTPLUG) += memory-device.o
>  common-obj-$(CONFIG_NVDIMM) += nvdimm.o
> diff --git a/hw/mem/memory-device.c b/hw/mem/memory-device.c
> new file mode 100644
> index 00..0e0d6b539a
> --- /dev/null
> +++ b/hw/mem/memory-device.c
> @@ -0,0 +1,119 @@
> +/*
> + * Memory Device Interface
> + *
> + * Copyright ProfitBricks GmbH 2012
> + * Copyright (C) 2014 Red Hat Inc
> + * Copyright (c) 2018 Red Hat Inc
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + */
> +
> +#include "qemu/osdep.h"
> +#include "hw/mem/memory-device.h"
> +#include "hw/qdev.h"
> +#include "qapi/error.h"
> +#include "hw/boards.h"
> +#include "qemu/range.h"
> +
> +static gint memory_device_addr_sort(gconstpointer a, gconstpointer b)
> +{
> +MemoryDeviceState *md_a = MEMORY_DEVICE(a);
> +MemoryDeviceState *md_b = MEMORY_DEVICE(b);
> +MemoryDeviceClass *mdc = MEMORY_DEVICE_GET_CLASS(a);

AFAICT from the code below, the two devices don't necessarily have to
have the same class, so I think you need to get the class pointers for
each of them separately.  (Or if they do have to be the same by
design, I think there should at least be an assert).

> +const uint64_t addr_a = mdc->get_addr(md_a);
> +const uint64_t addr_b =

[Qemu-devel] [PATCH] vhost: do not verify ring mappings when IOMMU is enabled

2018-04-12 Thread Jason Wang

When IOMMU is enabled, we store virtqueue metadata as iova (though it
may has _phys suffix) and access them through dma helpers. Any
translation failures could be reported by IOMMU.

In this case, trying to validate iova against gpa won't work and will
cause a false error reporting. So this patch bypasses the ring
verification if IOMMU is enabled which is similar to the behavior
before 0ca1fd2d6878 that calls vhost_memory_map() which is a nop when
IOMMU is enabled.

Fixes: 0ca1fd2d6878 ("vhost: Simplify ring verification checks")
Cc: Dr. David Alan Gilbert 
Cc: Igor Mammedov 
Cc: Michael S. Tsirkin 
Signed-off-by: Jason Wang 
---
 hw/virtio/vhost.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index f51bf57..9d5850a 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -342,6 +342,10 @@ static int vhost_verify_ring_mappings(struct vhost_dev 
*dev,
 "used ring"
 };
 
+if (vhost_dev_has_iommu(dev)) {
+return 0;
+}
+
 for (i = 0; i < dev->nvqs; ++i) {
 struct vhost_virtqueue *vq = dev->vqs + i;
 
-- 
2.7.4

[Qemu-devel] [Bug 1763536] [NEW] go build fails under qemu-ppc64le-static (qemu-user)

2018-04-12 Thread David Wilder

Public bug reported:

I am using qemu-user (built static) in a docker container environment.
When running multi-threaded go commands in the container (go build for
example) the process may hang, report segfaults or other errors.  I
built qemu-ppc64le from the upstream git (master).

I see the problem running on a multi core system with Intel i7 processors.
# cat /proc/cpuinfo | grep "model name"
model name  : Intel(R) Core(TM) i7-2760QM CPU @ 2.40GHz
model name  : Intel(R) Core(TM) i7-2760QM CPU @ 2.40GHz
model name  : Intel(R) Core(TM) i7-2760QM CPU @ 2.40GHz
model name  : Intel(R) Core(TM) i7-2760QM CPU @ 2.40GHz
model name  : Intel(R) Core(TM) i7-2760QM CPU @ 2.40GHz
model name  : Intel(R) Core(TM) i7-2760QM CPU @ 2.40GHz
model name  : Intel(R) Core(TM) i7-2760QM CPU @ 2.40GHz
model name  : Intel(R) Core(TM) i7-2760QM CPU @ 2.40GHz

Steps to reproduce:
1) Build qemu-ppc64le as static and copy into docker build directory named it 
qemu-ppc64le-static.

2) Add hello.go to docker build dir.

package main
import "fmt"
func main() {
fmt.Println("hello world")
}

3) Create the Dockerfile from below:

FROM ppc64le/golang:1.10.1-alpine3.
COPY qemu-ppc64le-static /usr/bin/
COPY hello.go /go

4) Build container
$ docker build -t qemutest -f Dockerfile ./go 

5) Run test
$ docker run -it qemutest

/go # /usr/bin/qemu-ppc64le-static --version
qemu-ppc64le version 2.11.93 (v2.12.0-rc3-dirty)
Copyright (c) 2003-2017 Fabrice Bellard and the QEMU Project developers

/go # go version
go version go1.10.1 linux/ppc64le

/go # go build hello.go
fatal error: fatal error: stopm holding locksunexpected signal during runtime 
execution

panic during panic
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x1003528c]

runtime stack:
runtime: unexpected return pc for syscall.Syscall6 called from 0xc42007f500
stack: frame={sp:0xc4203be840, fp:0xc4203be860} 
stack=[0x4000b7ecf0,0x4000b928f0)

syscall.Syscall6(0x100744e8, 0x3d, 0xc42050c140, 0x20, 0x18, 0x10422b80, 
0xc4203be968[signal , 0x10012d88SIGSEGV: segmentation violation, 0xc420594000 
code=, 0x00x1 addr=0x0 pc=0x1003528c)
]

runtime stack:

/usr/local/go/src/syscall/asm_linux_ppc64x.s:61runtime.throw(0x10472d19, 0x13)
 +  /usr/local/go/src/runtime/panic.go:0x6c616 +0x68


runtime.stopm()
/usr/local/go/src/runtime/proc.go:1939goroutine  +10x158
 [runtime.exitsyscall0semacquire(0xc42007f500)
/usr/local/go/src/runtime/proc.go:3129 +]:
0x130
runtime.mcall(0xc42007f500)
/usr/local/go/src/runtime/asm_ppc64x.s:183 +0x58sync.runtime_Semacquire
(0xc4201fab1c)
/usr/local/go/src/runtime/sema.go:56 +0x38


Note the results may differ between attempts,  hangs and other faults sometimes 
happen.

If I run "go: single threaded I don't see the problem, for example:

/go # GOMAXPROCS=1 go build -p 1 hello.go 
/go # ./hello
hello world

I see the same issue with arm64.  I don't think this is a go issue, but
don't have a real evidence to prove that.  This problem looks similar to
other problem I have seen reported against qemu running multi-threaded
applications.

** Affects: qemu
 Importance: Undecided
 Status: New

** Summary changed:

- go buld failes under qemu-ppc64le-static (qemu-user) 
+ go build fails under qemu-ppc64le-static (qemu-user)

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1763536

Title:
  go build fails under qemu-ppc64le-static (qemu-user)

Status in QEMU:
  New

Bug description:
  I am using qemu-user (built static) in a docker container environment.
  When running multi-threaded go commands in the container (go build for
  example) the process may hang, report segfaults or other errors.  I
  built qemu-ppc64le from the upstream git (master).

  I see the problem running on a multi core system with Intel i7 processors.
  # cat /proc/cpuinfo | grep "model name"
  model name: Intel(R) Core(TM) i7-2760QM CPU @ 2.40GHz
  model name: Intel(R) Core(TM) i7-2760QM CPU @ 2.40GHz
  model name: Intel(R) Core(TM) i7-2760QM CPU @ 2.40GHz
  model name: Intel(R) Core(TM) i7-2760QM CPU @ 2.40GHz
  model name: Intel(R) Core(TM) i7-2760QM CPU @ 2.40GHz
  model name: Intel(R) Core(TM) i7-2760QM CPU @ 2.40GHz
  model name: Intel(R) Core(TM) i7-2760QM CPU @ 2.40GHz
  model name: Intel(R) Core(TM) i7-2760QM CPU @ 2.40GHz

  Steps to reproduce:
  1) Build qemu-ppc64le as static and copy into docker build directory named it 
qemu-ppc64le-static.

  2) Add hello.go to docker build dir.

  package main
  import "fmt"
  func main() {
fmt.Println("hello world")
  }

  3) Create the Dockerfile from below:

  FROM ppc64le/golang:1.10.1-alpine3.
  COPY qemu-ppc64le-static /usr/bin/
  COPY hello.go /go

  4) Build container
  $ docker build -t qemutest -f Dockerfile ./go 

  5) Run test
  $ docker run -it qemutest

  /go #

Re: [Qemu-devel] [PATCH 2/4] pc-bios/s390-ccw: fix loadparm initialization and int conversion

2018-04-12 Thread Farhan Ali




On 04/12/2018 04:57 PM, Collin Walling wrote:

On 04/12/2018 02:57 PM, Thomas Huth wrote:

On 10.04.2018 17:01, Collin Walling wrote:

Rename the loadparm char array in main.c to loadparm_str and
increase the size by one byte to account for a null termination
when converting the loadparm string to an int via atoui. Also
allow the boot menu to be enabled when loadparm is set to an
empty string or a series of spaces.

Signed-off-by: Collin Walling 
Reported-by: Vasily Gorbik 
---
  hw/s390x/ipl.c  |  2 ++
  pc-bios/s390-ccw/main.c | 14 +++---
  2 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/hw/s390x/ipl.c b/hw/s390x/ipl.c
index fdeaec3..23b5b54 100644
--- a/hw/s390x/ipl.c
+++ b/hw/s390x/ipl.c
@@ -352,6 +352,8 @@ int s390_ipl_set_loadparm(uint8_t *loadparm)
  loadparm[i] = ascii2ebcdic[(uint8_t) lp[i]];
  }
  
+memset(loadparm + i, 0x40, 8 - i); /* fill with EBCDIC spaces */

+
  g_free(lp);
  return 0;
  }
diff --git a/pc-bios/s390-ccw/main.c b/pc-bios/s390-ccw/main.c
index 9d9f8cf..26f9adf 100644
--- a/pc-bios/s390-ccw/main.c
+++ b/pc-bios/s390-ccw/main.c
@@ -15,11 +15,11 @@
  char stack[PAGE_SIZE * 8] __attribute__((__aligned__(PAGE_SIZE)));
  static SubChannelId blk_schid = { .one = 1 };
  IplParameterBlock iplb __attribute__((__aligned__(PAGE_SIZE)));
-static char loadparm[8] = { 0, 0, 0, 0, 0, 0, 0, 0 };
+static char loadparm_str[9] = { 0, 0, 0, 0, 0, 0, 0, 0, 0 };
  QemuIplParameters qipl;
  
  #define LOADPARM_PROMPT "PROMPT  "

-#define LOADPARM_EMPTY  ""
+#define LOADPARM_EMPTY  ""


Sorry for my ignorance, but why was the old string containing dots?

  Thomas



No need for apologies :)

If -machine loadparm is *not* present on the command line, then the loadparm in 
the sclp
will be a series of nulls. For whatever reason, that gets translated into a 
series of dots.



It's because of the ebc2asc table we use for conversion, which results 
in the dots when converting from ebcdic_to_ascii.



If -machine loadparm="" is present, then loadparm will in the sclp will be a 
series of
spaces. >
We want to enable the boot menu for both of these cases and, to make things 
easier, this
patch replaces any nulls (dots) to spaces when setting loadparm so that we only 
have to
handle one case instead of two within the bios.

Re: [Qemu-devel] [PATCH 2/4] pc-bios/s390-ccw: fix loadparm initialization and int conversion

2018-04-12 Thread Collin Walling

On 04/12/2018 02:57 PM, Thomas Huth wrote:
> On 10.04.2018 17:01, Collin Walling wrote:
>> Rename the loadparm char array in main.c to loadparm_str and
>> increase the size by one byte to account for a null termination
>> when converting the loadparm string to an int via atoui. Also 
>> allow the boot menu to be enabled when loadparm is set to an 
>> empty string or a series of spaces.
>>
>> Signed-off-by: Collin Walling 
>> Reported-by: Vasily Gorbik 
>> ---
>>  hw/s390x/ipl.c  |  2 ++
>>  pc-bios/s390-ccw/main.c | 14 +++---
>>  2 files changed, 9 insertions(+), 7 deletions(-)
>>
>> diff --git a/hw/s390x/ipl.c b/hw/s390x/ipl.c
>> index fdeaec3..23b5b54 100644
>> --- a/hw/s390x/ipl.c
>> +++ b/hw/s390x/ipl.c
>> @@ -352,6 +352,8 @@ int s390_ipl_set_loadparm(uint8_t *loadparm)
>>  loadparm[i] = ascii2ebcdic[(uint8_t) lp[i]];
>>  }
>>  
>> +memset(loadparm + i, 0x40, 8 - i); /* fill with EBCDIC spaces */
>> +
>>  g_free(lp);
>>  return 0;
>>  }
>> diff --git a/pc-bios/s390-ccw/main.c b/pc-bios/s390-ccw/main.c
>> index 9d9f8cf..26f9adf 100644
>> --- a/pc-bios/s390-ccw/main.c
>> +++ b/pc-bios/s390-ccw/main.c
>> @@ -15,11 +15,11 @@
>>  char stack[PAGE_SIZE * 8] __attribute__((__aligned__(PAGE_SIZE)));
>>  static SubChannelId blk_schid = { .one = 1 };
>>  IplParameterBlock iplb __attribute__((__aligned__(PAGE_SIZE)));
>> -static char loadparm[8] = { 0, 0, 0, 0, 0, 0, 0, 0 };
>> +static char loadparm_str[9] = { 0, 0, 0, 0, 0, 0, 0, 0, 0 };
>>  QemuIplParameters qipl;
>>  
>>  #define LOADPARM_PROMPT "PROMPT  "
>> -#define LOADPARM_EMPTY  ""
>> +#define LOADPARM_EMPTY  ""
> 
> Sorry for my ignorance, but why was the old string containing dots?
> 
>  Thomas
> 

No need for apologies :)

If -machine loadparm is *not* present on the command line, then the loadparm in 
the sclp 
will be a series of nulls. For whatever reason, that gets translated into a 
series of dots.

If -machine loadparm="" is present, then loadparm will in the sclp will be a 
series of
spaces. 

We want to enable the boot menu for both of these cases and, to make things 
easier, this 
patch replaces any nulls (dots) to spaces when setting loadparm so that we only 
have to 
handle one case instead of two within the bios.


-- 
Respectfully,
- Collin Walling

Re: [Qemu-devel] [PULL, 08/36] qapi: Remove qobject_to_X() functions

2018-04-12 Thread Eric Blake

On 04/12/2018 12:51 PM, Yuval Shaia wrote:
> Hi Eric,
> 
> On Mon, Mar 12, 2018 at 01:35:59PM -0500, Eric Blake wrote:
>> From: Max Reitz 
>>
>> They are no longer needed now.
> 
> I'm doing some off-list development which use these functions.
> Unfortunately i'm not subscribed (yet) to qemu-devel so obviously missed
> this email.
> 
> Is there any alternative to these functions?

Yes, in the patches right before this one.

> 
> My code looks something like this:
> 
> QObject *oid;
> unsigned long id;
> 
> oid = qlist_pop(list);
> id = qnum_get_uint(qobject_to_qnum(oid));
> 

id = qnum_get_uint(qobject_to(QNum, oid));

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH RFC] s390x: refactor reset/reipl handling

2018-04-12 Thread Paolo Bonzini

On 12/04/2018 21:26, David Hildenbrand wrote:
> Calling pause_all_vcpus()/resume_all_vcpus() from a VCPU thread might
> not be the best idea. As pause_all_vcpus() temporarily drops the qemu
> mutex, two parallel calls to pause_all_vcpus() can be active at a time,
> resulting in a deadlock. (either by two VCPUs or by the main thread and a
> VCPU)
> 
> Let's handle it via the main loop instead, as suggested by Paolo. If we
> would have two parallel reset requests by two different VCPUs at the
> same time, the last one would win.
> 
> We use the existing ipl device to handle it. The nice side effect is
> that we can get rid of reipl_requested.
> 
> This change implies that all reset handling now goes via the common
> path, so "no-reboot" handling is now active for all kinds of reboots.
> 
> Let's execute any CPU initialization code on the target CPU using
> run_on_cpu.
> 
> Signed-off-by: David Hildenbrand 
> ---
> 
> Did only a quick check with TCG. I think this way it is way easier to
> control what is getting reset.

Happy that it went well (there's always some trepidation when suggesting
a refactoring :)).  For what I can understand, it looks sane.

Paolo

>  hw/s390x/ipl.c | 44 ---
>  hw/s390x/ipl.h | 16 --
>  hw/s390x/s390-virtio-ccw.c | 49 +-
>  include/hw/s390x/s390-virtio-ccw.h |  2 --
>  target/s390x/cpu.h | 26 
>  target/s390x/diag.c| 61 
> +++---
>  target/s390x/internal.h|  6 
>  target/s390x/kvm.c |  2 +-
>  8 files changed, 127 insertions(+), 79 deletions(-)
> 
> diff --git a/hw/s390x/ipl.c b/hw/s390x/ipl.c
> index fb554ab156..f7589d20f1 100644
> --- a/hw/s390x/ipl.c
> +++ b/hw/s390x/ipl.c
> @@ -26,6 +26,7 @@
>  #include "qemu/config-file.h"
>  #include "qemu/cutils.h"
>  #include "qemu/option.h"
> +#include "exec/exec-all.h"
>  
>  #define KERN_IMAGE_START0x01UL
>  #define KERN_PARM_AREA  0x010480UL
> @@ -484,11 +485,22 @@ IplParameterBlock *s390_ipl_get_iplb(void)
>  return >iplb;
>  }
>  
> -void s390_reipl_request(void)
> +void s390_ipl_reset_request(CPUState *cs, enum s390_reset reset_type)
>  {
>  S390IPLState *ipl = get_ipl_device();
>  
> -ipl->reipl_requested = true;
> +if (reset_type == S390_RESET_EXTERNAL || reset_type == S390_RESET_REIPL) 
> {
> +/* let's always use CPU 0 */
> +ipl->reset_cpu = NULL;
> +} else {
> +ipl->reset_cpu = cs;
> +}
> +ipl->reset_type = reset_type;
> +
> +if (reset_type != S390_RESET_REIPL) {
> +goto no_reipl;
> +}
> +
>  if (ipl->iplb_valid &&
>  !ipl->netboot &&
>  ipl->iplb.pbt == S390_IPL_TYPE_CCW &&
> @@ -505,7 +517,32 @@ void s390_reipl_request(void)
>  ipl->iplb_valid = s390_gen_initial_iplb(ipl);
>  }
>  }
> +no_reipl:
>  qemu_system_reset_request(SHUTDOWN_CAUSE_GUEST_RESET);
> +/* as this is triggered by a CPU, make sure to exit the loop */
> +if (tcg_enabled()) {
> +cpu_loop_exit(cs);
> +}
> +}
> +
> +void s390_ipl_get_reset_request(CPUState **cs, enum s390_reset *reset_type)
> +{
> +S390IPLState *ipl = get_ipl_device();
> +
> +*cs = ipl->reset_cpu;
> +if (!ipl->reset_cpu) {
> +/* use CPU 0 as default */
> +*cs = qemu_get_cpu(0);
> +}
> +*reset_type = ipl->reset_type;
> +}
> +
> +void s390_ipl_clear_reset_request(void)
> +{
> +S390IPLState *ipl = get_ipl_device();
> +
> +ipl->reset_type = S390_RESET_EXTERNAL;
> +ipl->reset_cpu = NULL;
>  }
>  
>  static void s390_ipl_prepare_qipl(S390CPU *cpu)
> @@ -552,11 +589,10 @@ static void s390_ipl_reset(DeviceState *dev)
>  {
>  S390IPLState *ipl = S390_IPL(dev);
>  
> -if (!ipl->reipl_requested) {
> +if (ipl->reset_type != S390_RESET_REIPL) {
>  ipl->iplb_valid = false;
>  memset(>iplb, 0, sizeof(IplParameterBlock));
>  }
> -ipl->reipl_requested = false;
>  }
>  
>  static void s390_ipl_class_init(ObjectClass *klass, void *data)
> diff --git a/hw/s390x/ipl.h b/hw/s390x/ipl.h
> index 0570d0ad75..102f1ea7af 100644
> --- a/hw/s390x/ipl.h
> +++ b/hw/s390x/ipl.h
> @@ -87,7 +87,17 @@ int s390_ipl_set_loadparm(uint8_t *loadparm);
>  void s390_ipl_update_diag308(IplParameterBlock *iplb);
>  void s390_ipl_prepare_cpu(S390CPU *cpu);
>  IplParameterBlock *s390_ipl_get_iplb(void);
> -void s390_reipl_request(void);
> +
> +enum s390_reset {
> +/* default is a reset not triggered by a CPU e.g. issued by QMP */
> +S390_RESET_EXTERNAL = 0,
> +S390_RESET_REIPL,
> +S390_RESET_MODIFIED_CLEAR,
> +S390_RESET_LOAD_NORMAL,
> +};
> +void s390_ipl_reset_request(CPUState *cs, enum s390_reset reset_type);
> +void s390_ipl_get_reset_request(CPUState **cs, enum s390_reset *reset_type);
> +void s390_ipl_clear_reset_request(void);
>

Re: [Qemu-devel] [PATCH 07/19] block: Really pause block jobs on drain

2018-04-12 Thread Paolo Bonzini

On 12/04/2018 16:25, Kevin Wolf wrote:
> This is already the order we have there. What is probably different from
> what you envision is that after the parents have concluded, we still
> check that they are still quiescent in every iteration.

Yes, and that's the quadratic part.

> What we could do easily is introducing a bool bs->quiesce_concluded or
> something that bdrv_drain_poll() sets to true the first time that it
> returns false. It also gets a shortcut so that it returns false
> immediately if bs->quiesce_concluded is true. The field is reset in
> bdrv_do_drained_end() when bs->quiesce_counter reaches 0.

Or bs->quiescent, for the sake of bikeshedding.

> That would be a rather simple optimisation that could be done as the
> final patch in this series, and would ensure that we don't recurse back
> to parents that are already quiescent.
> 
> Would you be happy with this change?

Yes, I'll leave the organization of the series to you of course.

Paolo

Re: [Qemu-devel] Bad icount read when running qemu-system-ppc64 and mfspr atbu guest instruction

2018-04-12 Thread Richard Henderson

On 04/11/2018 04:37 PM, Darrell Leinwand wrote:
> Ah I see, that makes sense and the atb instructions are marked as unused in
the mainline which when I compared I recall now that we modifed them because
the guest software we are running is trying to use those special registers.
> 
> I added the wrappers into those function, is that correct?

Yes.


r~

Re: [Qemu-devel] [PATCH v3 00/22] More fully implement ARM PMUv3

2018-04-12 Thread Aaron Lindsay

On Apr 12 18:32, Peter Maydell wrote:
> On 16 March 2018 at 20:30, Aaron Lindsay  wrote:
> > The ARM PMU implementation currently contains a basic cycle counter, but it 
> > is
> > often useful to gather counts of other events and filter them based on
> > execution mode. These patches flesh out the implementations of various PMU
> > registers including PM[X]EVCNTR and PM[X]EVTYPER, add a struct definition to
> > represent arbitrary counter types, implement mode filtering, and add
> > instruction, cycle, and software increment events.
> 
> Hi; sorry it's taken me a while to get to this (I've been focusing
> on for-2.12 work). I've now reviewed most of the patches in this
> set. My suggestion is that you fix the stuff I've commented on
> and send out a v4, and then I can take some of the patches from
> the start of that into target-arm.next, and I'll review the tail
> end of the set then.

Thanks for the review! I'll see if I can get a v4 out late this week or
early next.

I'll make a note of this when I send it out, but one thing to look out
for in the next version is that I had to reorganize a few things to make
the interrupt-on-overflow functionality work better. I think the changes
are fairly minor and limited to the later patches, with the possible
exception of using two variables per counter (because you have to know
what the previous counter value was in addition to the current value to
know if you've overflowed since your last check).

-Aaron

-- 
Qualcomm Datacenter Technologies as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the
Code Aurora Forum, a Linux Foundation Collaborative Project.

[Qemu-devel] [PATCH RFC] s390x: refactor reset/reipl handling

2018-04-12 Thread David Hildenbrand

Calling pause_all_vcpus()/resume_all_vcpus() from a VCPU thread might
not be the best idea. As pause_all_vcpus() temporarily drops the qemu
mutex, two parallel calls to pause_all_vcpus() can be active at a time,
resulting in a deadlock. (either by two VCPUs or by the main thread and a
VCPU)

Let's handle it via the main loop instead, as suggested by Paolo. If we
would have two parallel reset requests by two different VCPUs at the
same time, the last one would win.

We use the existing ipl device to handle it. The nice side effect is
that we can get rid of reipl_requested.

This change implies that all reset handling now goes via the common
path, so "no-reboot" handling is now active for all kinds of reboots.

Let's execute any CPU initialization code on the target CPU using
run_on_cpu.

Signed-off-by: David Hildenbrand 
---

Did only a quick check with TCG. I think this way it is way easier to
control what is getting reset.

 hw/s390x/ipl.c | 44 ---
 hw/s390x/ipl.h | 16 --
 hw/s390x/s390-virtio-ccw.c | 49 +-
 include/hw/s390x/s390-virtio-ccw.h |  2 --
 target/s390x/cpu.h | 26 
 target/s390x/diag.c| 61 +++---
 target/s390x/internal.h|  6 
 target/s390x/kvm.c |  2 +-
 8 files changed, 127 insertions(+), 79 deletions(-)

diff --git a/hw/s390x/ipl.c b/hw/s390x/ipl.c
index fb554ab156..f7589d20f1 100644
--- a/hw/s390x/ipl.c
+++ b/hw/s390x/ipl.c
@@ -26,6 +26,7 @@
 #include "qemu/config-file.h"
 #include "qemu/cutils.h"
 #include "qemu/option.h"
+#include "exec/exec-all.h"
 
 #define KERN_IMAGE_START0x01UL
 #define KERN_PARM_AREA  0x010480UL
@@ -484,11 +485,22 @@ IplParameterBlock *s390_ipl_get_iplb(void)
 return >iplb;
 }
 
-void s390_reipl_request(void)
+void s390_ipl_reset_request(CPUState *cs, enum s390_reset reset_type)
 {
 S390IPLState *ipl = get_ipl_device();
 
-ipl->reipl_requested = true;
+if (reset_type == S390_RESET_EXTERNAL || reset_type == S390_RESET_REIPL) {
+/* let's always use CPU 0 */
+ipl->reset_cpu = NULL;
+} else {
+ipl->reset_cpu = cs;
+}
+ipl->reset_type = reset_type;
+
+if (reset_type != S390_RESET_REIPL) {
+goto no_reipl;
+}
+
 if (ipl->iplb_valid &&
 !ipl->netboot &&
 ipl->iplb.pbt == S390_IPL_TYPE_CCW &&
@@ -505,7 +517,32 @@ void s390_reipl_request(void)
 ipl->iplb_valid = s390_gen_initial_iplb(ipl);
 }
 }
+no_reipl:
 qemu_system_reset_request(SHUTDOWN_CAUSE_GUEST_RESET);
+/* as this is triggered by a CPU, make sure to exit the loop */
+if (tcg_enabled()) {
+cpu_loop_exit(cs);
+}
+}
+
+void s390_ipl_get_reset_request(CPUState **cs, enum s390_reset *reset_type)
+{
+S390IPLState *ipl = get_ipl_device();
+
+*cs = ipl->reset_cpu;
+if (!ipl->reset_cpu) {
+/* use CPU 0 as default */
+*cs = qemu_get_cpu(0);
+}
+*reset_type = ipl->reset_type;
+}
+
+void s390_ipl_clear_reset_request(void)
+{
+S390IPLState *ipl = get_ipl_device();
+
+ipl->reset_type = S390_RESET_EXTERNAL;
+ipl->reset_cpu = NULL;
 }
 
 static void s390_ipl_prepare_qipl(S390CPU *cpu)
@@ -552,11 +589,10 @@ static void s390_ipl_reset(DeviceState *dev)
 {
 S390IPLState *ipl = S390_IPL(dev);
 
-if (!ipl->reipl_requested) {
+if (ipl->reset_type != S390_RESET_REIPL) {
 ipl->iplb_valid = false;
 memset(>iplb, 0, sizeof(IplParameterBlock));
 }
-ipl->reipl_requested = false;
 }
 
 static void s390_ipl_class_init(ObjectClass *klass, void *data)
diff --git a/hw/s390x/ipl.h b/hw/s390x/ipl.h
index 0570d0ad75..102f1ea7af 100644
--- a/hw/s390x/ipl.h
+++ b/hw/s390x/ipl.h
@@ -87,7 +87,17 @@ int s390_ipl_set_loadparm(uint8_t *loadparm);
 void s390_ipl_update_diag308(IplParameterBlock *iplb);
 void s390_ipl_prepare_cpu(S390CPU *cpu);
 IplParameterBlock *s390_ipl_get_iplb(void);
-void s390_reipl_request(void);
+
+enum s390_reset {
+/* default is a reset not triggered by a CPU e.g. issued by QMP */
+S390_RESET_EXTERNAL = 0,
+S390_RESET_REIPL,
+S390_RESET_MODIFIED_CLEAR,
+S390_RESET_LOAD_NORMAL,
+};
+void s390_ipl_reset_request(CPUState *cs, enum s390_reset reset_type);
+void s390_ipl_get_reset_request(CPUState **cs, enum s390_reset *reset_type);
+void s390_ipl_clear_reset_request(void);
 
 #define QIPL_ADDRESS  0xcc
 
@@ -129,9 +139,11 @@ struct S390IPLState {
 bool enforce_bios;
 IplParameterBlock iplb;
 bool iplb_valid;
-bool reipl_requested;
 bool netboot;
 QemuIplParameters qipl;
+/* reset related properties don't have to be migrated or reset */
+enum s390_reset reset_type;
+CPUState *reset_cpu;
 
 /*< public >*/
 char *kernel;
diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c
index

Re: [Qemu-devel] [PATCH 4/5] migration: fix qemu carsh when RDMA live migration

2018-04-12 Thread Dr. David Alan Gilbert

* 858585 jemmy (jemmy858...@gmail.com) wrote:
> On Thu, Apr 12, 2018 at 12:43 AM, Dr. David Alan Gilbert
>  wrote:
> > * Lidong Chen (jemmy858...@gmail.com) wrote:
> >> After postcopy, the destination qemu work in the dedicated
> >> thread, so only invoke yield_until_fd_readable before postcopy
> >> migration.
> >
> > The subject line needs to be more discriptive:
> >migration: Stop rdma yielding during incoming postcopy
> >
> > I think.
> > (Also please check the subject spellings)
> >
> >> Signed-off-by: Lidong Chen 
> >> ---
> >>  migration/rdma.c | 4 +++-
> >>  1 file changed, 3 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/migration/rdma.c b/migration/rdma.c
> >> index 53773c7..81be482 100644
> >> --- a/migration/rdma.c
> >> +++ b/migration/rdma.c
> >> @@ -1489,11 +1489,13 @@ static int qemu_rdma_wait_comp_channel(RDMAContext 
> >> *rdma)
> >>   * Coroutine doesn't start until migration_fd_process_incoming()
> >>   * so don't yield unless we know we're running inside of a coroutine.
> >>   */
> >> -if (rdma->migration_started_on_destination) {
> >> +if (rdma->migration_started_on_destination &&
> >> +migration_incoming_get_current()->state == 
> >> MIGRATION_STATUS_ACTIVE) {
> >
> > OK, that's a bit delicate; watch if it ever gets called in a failure
> > case or similar - and also wathc out if we make more use of the status
> > on the destination, but otherwise, and with a fix for the subject;
> 
> How about use migration_incoming_get_current()->have_listen_thread?

That's supposed to be pretty internal to the postcopy code, so I prefer
the status check.

Dave

> if (rdma->migration_started_on_destination &&
> migration_incoming_get_current()->have_listen_thread == false) {
> yield_until_fd_readable(rdma->comp_channel->fd);
> }
> 
> >
> >
> > Reviewed-by: Dr. David Alan Gilbert 
> >
> >>  yield_until_fd_readable(rdma->comp_channel->fd);
> >>  } else {
> >>  /* This is the source side, we're in a separate thread
> >>   * or destination prior to migration_fd_process_incoming()
> >> + * after postcopy, the destination also in a seprate thread.
> >>   * we can't yield; so we have to poll the fd.
> >>   * But we need to be able to handle 'cancel' or an error
> >>   * without hanging forever.
> >> --
> >> 1.8.3.1
> >>
> > --
> > Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK

Re: [Qemu-devel] [PATCH 2/4] pc-bios/s390-ccw: fix loadparm initialization and int conversion

2018-04-12 Thread Thomas Huth

On 10.04.2018 17:01, Collin Walling wrote:
> Rename the loadparm char array in main.c to loadparm_str and
> increase the size by one byte to account for a null termination
> when converting the loadparm string to an int via atoui. Also 
> allow the boot menu to be enabled when loadparm is set to an 
> empty string or a series of spaces.
> 
> Signed-off-by: Collin Walling 
> Reported-by: Vasily Gorbik 
> ---
>  hw/s390x/ipl.c  |  2 ++
>  pc-bios/s390-ccw/main.c | 14 +++---
>  2 files changed, 9 insertions(+), 7 deletions(-)
> 
> diff --git a/hw/s390x/ipl.c b/hw/s390x/ipl.c
> index fdeaec3..23b5b54 100644
> --- a/hw/s390x/ipl.c
> +++ b/hw/s390x/ipl.c
> @@ -352,6 +352,8 @@ int s390_ipl_set_loadparm(uint8_t *loadparm)
>  loadparm[i] = ascii2ebcdic[(uint8_t) lp[i]];
>  }
>  
> +memset(loadparm + i, 0x40, 8 - i); /* fill with EBCDIC spaces */
> +
>  g_free(lp);
>  return 0;
>  }
> diff --git a/pc-bios/s390-ccw/main.c b/pc-bios/s390-ccw/main.c
> index 9d9f8cf..26f9adf 100644
> --- a/pc-bios/s390-ccw/main.c
> +++ b/pc-bios/s390-ccw/main.c
> @@ -15,11 +15,11 @@
>  char stack[PAGE_SIZE * 8] __attribute__((__aligned__(PAGE_SIZE)));
>  static SubChannelId blk_schid = { .one = 1 };
>  IplParameterBlock iplb __attribute__((__aligned__(PAGE_SIZE)));
> -static char loadparm[8] = { 0, 0, 0, 0, 0, 0, 0, 0 };
> +static char loadparm_str[9] = { 0, 0, 0, 0, 0, 0, 0, 0, 0 };
>  QemuIplParameters qipl;
>  
>  #define LOADPARM_PROMPT "PROMPT  "
> -#define LOADPARM_EMPTY  ""
> +#define LOADPARM_EMPTY  ""

Sorry for my ignorance, but why was the old string containing dots?

 Thomas

Re: [Qemu-devel] [PATCH 5/5] migration: disable RDMA WRITR after postcopy started.

2018-04-12 Thread Dr. David Alan Gilbert

* 858585 jemmy (jemmy858...@gmail.com) wrote:
> On Wed, Apr 11, 2018 at 11:56 PM, Dr. David Alan Gilbert
>  wrote:
> > * Lidong Chen (jemmy858...@gmail.com) wrote:
> >> RDMA write operations are performed with no notification to the destination
> >> qemu, then the destination qemu can not wakeup. So disable RDMA WRITE after
> >> postcopy started.
> >>
> >> Signed-off-by: Lidong Chen 
> >
> > This patch needs to be near the beginning of the series; at the moment a
> > bisect would lead you to the middle of the series which had return
> > paths, but then would fail to work properly because it would try and use
> > the RDMA code.
> 
> I will fix this problem in next version.
> 
> >
> >> ---
> >>  migration/qemu-file.c |  3 ++-
> >>  migration/rdma.c  | 12 
> >>  2 files changed, 14 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/migration/qemu-file.c b/migration/qemu-file.c
> >> index 8acb574..a64ac3a 100644
> >> --- a/migration/qemu-file.c
> >> +++ b/migration/qemu-file.c
> >> @@ -260,7 +260,8 @@ size_t ram_control_save_page(QEMUFile *f, ram_addr_t 
> >> block_offset,
> >>  int ret = f->hooks->save_page(f, f->opaque, block_offset,
> >>offset, size, bytes_sent);
> >>  f->bytes_xfer += size;
> >> -if (ret != RAM_SAVE_CONTROL_DELAYED) {
> >> +if (ret != RAM_SAVE_CONTROL_DELAYED &&
> >> +ret != RAM_SAVE_CONTROL_NOT_SUPP) {
> >
> > What about f->bytes_xfer in this case?
> 
> f->bytes_xfer should not update when RAM_SAVE_CONTROL_NOT_SUPP.
> I will fix this problem in next version.
> 
> >
> > Is there anything we have to do at the switchover into postcopy to make
> > sure that all pages have been received?
> 
> ram_save_iterate invoke ram_control_after_iterate(f, RAM_CONTROL_ROUND),
> so before next iteration which switchover into postcopy, all the pages
> sent by previous
> iteration have been received.

OK, great.

Dave

> >
> > Dave
> >
> >>  if (bytes_sent && *bytes_sent > 0) {
> >>  qemu_update_position(f, *bytes_sent);
> >>  } else if (ret < 0) {
> >> diff --git a/migration/rdma.c b/migration/rdma.c
> >> index 81be482..8529ddd 100644
> >> --- a/migration/rdma.c
> >> +++ b/migration/rdma.c
> >> @@ -2964,6 +2964,10 @@ static size_t qemu_rdma_save_page(QEMUFile *f, void 
> >> *opaque,
> >>
> >>  CHECK_ERROR_STATE();
> >>
> >> +if (migrate_get_current()->state == MIGRATION_STATUS_POSTCOPY_ACTIVE) 
> >> {
> >> +return RAM_SAVE_CONTROL_NOT_SUPP;
> >> +}
> >> +
> >>  qemu_fflush(f);
> >>
> >>  if (size > 0) {
> >> @@ -3528,6 +3532,10 @@ static int qemu_rdma_registration_start(QEMUFile 
> >> *f, void *opaque,
> >>
> >>  CHECK_ERROR_STATE();
> >>
> >> +if (migrate_get_current()->state == MIGRATION_STATUS_POSTCOPY_ACTIVE) 
> >> {
> >> +return 0;
> >> +}
> >> +
> >>  trace_qemu_rdma_registration_start(flags);
> >>  qemu_put_be64(f, RAM_SAVE_FLAG_HOOK);
> >>  qemu_fflush(f);
> >> @@ -3550,6 +3558,10 @@ static int qemu_rdma_registration_stop(QEMUFile *f, 
> >> void *opaque,
> >>
> >>  CHECK_ERROR_STATE();
> >>
> >> +if (migrate_get_current()->state == MIGRATION_STATUS_POSTCOPY_ACTIVE) 
> >> {
> >> +return 0;
> >> +}
> >> +
> >>  qemu_fflush(f);
> >>  ret = qemu_rdma_drain_cq(f, rdma);
> >>
> >> --
> >> 1.8.3.1
> >>
> > --
> > Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK

Re: [Qemu-devel] [PATCH] block/gluster: defend on legacy ftruncate api use

2018-04-12 Thread Niels de Vos

On Thu, Apr 12, 2018 at 11:21:42AM -0500, Eric Blake wrote:
> n 04/12/2018 08:31 AM, Niels de Vos wrote:
> > This change looks good to me, but a commit message would have been
> > helpful. I suggest something like this:
> > 
> >   Gluster 4.0 changed the signature of glfs_ftruncate(). The function
> >   now has two additional arguments, namely prestat and poststat. These
> >   provide not benefit for QEMU, so ignoring them and passing NULL makes
> >   the gluster-block driver compile with the new Gluster version again.
> > 
> > And maybe add this too:
> > 
> >   Glusters libgfapi uses symbol versioning and provides backwards
> >   compatible functions. Binaries compiled against previous versions of
> >   Gluster keep on functioning with the new library.
> > 
> 
> >> @@ -3856,6 +3857,9 @@ if test "$glusterfs" != "no" ; then
> >>glusterfs_fallocate="yes"
> >>glusterfs_zerofill="yes"
> >>  fi
> >> +if ! $pkg_config --atleast-version=7.4 glusterfs-api; then
> >> +  glusterfs_legacy_ftruncate="yes"
> 
> Also, version-based tests are lousy.  Feature-based tests (does a call
> to glfs_ftruncate(0, 0) compile without error? then we are legacy) are
> less brittle, especially when features can be backported across
> versions.  So this should be reworked to a compile check, rather than a
> version query.

In this case, the cange to glfs_ftruncate() will not be backported in
the releases from the Gluster Community or any distributions that I am
familiar with.

I agree that a compile-test is safer and can understand that those are
preferred over pkg-config version checks.

Niels

Re: [Qemu-devel] [PATCH 1/4] pc-bios/s390-ccw: rename MAX_TABLE_ENTRIES to MAX_BOOT_ENTRIES

2018-04-12 Thread Thomas Huth

On 10.04.2018 17:01, Collin Walling wrote:
> The MAX_TABLE_ENTRIES constant has a name that is too generic. As we
> want to declare a limit for boot menu entries, let's rename it to a more
> fitting MAX_BOOT_ENTRIES and set its value to 31 (30 boot entries and
> 1 default entry). Also we move it from bootmap.h to s390-ccw.h to make
> it available for menu.c in a later patch.
> 
> Signed-off-by: Collin Walling 
> ---
>  pc-bios/s390-ccw/bootmap.c  | 6 +++---
>  pc-bios/s390-ccw/bootmap.h  | 2 --
>  pc-bios/s390-ccw/s390-ccw.h | 2 ++
>  3 files changed, 5 insertions(+), 5 deletions(-)

Reviewed-by: Thomas Huth

Re: [Qemu-devel] [Qemu-arm] [PATCH for-2.13 3/4] arm: always start from first_cpu when registering loader cpu reset callback

2018-04-12 Thread Peter Maydell

On 12 April 2018 at 17:40, Igor Mammedov  wrote:
> if arm_load_kernel() were passed non first_cpu, QEMU would end up
> with partially set do_cpu_reset() callback leaving some CPUs without it.
>
> Make sure that do_cpu_reset() is registered for all CPUs by enumerating
> CPUs from first_cpu.
>
> Signed-off-by: Igor Mammedov 
> ---
>  hw/arm/boot.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/hw/arm/boot.c b/hw/arm/boot.c
> index 2f464ca..2591fee 100644
> --- a/hw/arm/boot.c
> +++ b/hw/arm/boot.c
> @@ -1188,7 +1188,7 @@ void arm_load_kernel(ARMCPU *cpu, struct arm_boot_info 
> *info)
>   * actually loading a kernel, the handler is also responsible for
>   * arranging that we start it correctly.
>   */
> -for (cs = CPU(cpu); cs; cs = CPU_NEXT(cs)) {
> +for (cs = first_cpu; cs; cs = CPU_NEXT(cs)) {
>  qemu_register_reset(do_cpu_reset, ARM_CPU(cs));
>  }
>  }

Definitely a bug fix, so:
Reviewed-by: Peter Maydell 

I think though that in at least some cases we'll still mishandle
being passed anything other than first_cpu as the CPU pointer,
because in do_cpu_reset() we do some checks for "do this if
cs == first_cpu", on the assumption that first_cpu is the
primary CPU that we're booting. We should instead I suppose
be checking against the CPU pointer we're given as the
arm_load_kernel() argument (which I think do_cpu_reset() can
get at via info->load_kernel_notifier.cpu).

We should probably analyse which boards actually pass anything
other than first_cpu -- I suspect it will end up just being
the xilinx board which has both A and R profile cores...

thanks
-- PMM

Re: [Qemu-devel] [PATCH for-2.13 1/4] arm: reuse arm_boot_address_space() in armv7m_load_kernel()

2018-04-12 Thread Peter Maydell

On 12 April 2018 at 17:40, Igor Mammedov  wrote:
> reduce code duplication by resusing arm_boot_address_space()
>
> Signed-off-by: Igor Mammedov 
> ---
>  include/hw/arm/arm.h |  2 ++
>  hw/arm/armv7m.c  | 10 +-
>  hw/arm/boot.c| 16 
>  3 files changed, 11 insertions(+), 17 deletions(-)

This minor duplication was deliberate, because hw/arm/armv7m.c
is for M profile cores only, and hw/arm/boot.c is for
A/R profile cores only. I don't think that saving a
single if/else in armv7m.c really justifies blurring
that dividing line.

thanks
-- PMM

Re: [Qemu-devel] [PATCH 04/14] timer: ds1338 clarify HOUR handling

2018-04-12 Thread Peter Maydell

On 24 March 2018 at 19:24, Michael Davidsaver  wrote:
> Simplify and comment the translation between
> registers and struct tm.
>
> Signed-off-by: Michael Davidsaver 
> ---
>  hw/timer/ds1338.c | 35 +--
>  1 file changed, 17 insertions(+), 18 deletions(-)
>
> diff --git a/hw/timer/ds1338.c b/hw/timer/ds1338.c
> index 72a4692d60..9bcda26e51 100644
> --- a/hw/timer/ds1338.c
> +++ b/hw/timer/ds1338.c
> @@ -88,23 +88,23 @@ static void capture_current_time(DS1338State *s)
>   * which will be actually read by the data transfer operation.
>   */
>  struct tm now;
> +bool mode12 = ARRAY_FIELD_EX32(s->nvram, HOUR, SET12);
>  qemu_get_timedate(, s->offset);
> +
>  s->nvram[R_SEC] = to_bcd(now.tm_sec);
>  s->nvram[R_MIN] = to_bcd(now.tm_min);
> -if (ARRAY_FIELD_EX32(s->nvram, HOUR, SET12)) {
> -int tmp = now.tm_hour;
> -if (tmp % 12 == 0) {
> -tmp += 12;
> -}
> -s->nvram[R_HOUR] = 0;
> +s->nvram[R_HOUR] = 0;
> +if (mode12) {
> +/* map 0-23 to 1-12 am/pm */
>  ARRAY_FIELD_DP32(s->nvram, HOUR, SET12, 1);
> -if (tmp <= 12) {
> -ARRAY_FIELD_DP32(s->nvram, HOUR, AMPM, 0);
> -ARRAY_FIELD_DP32(s->nvram, HOUR, HOUR12, to_bcd(tmp));
> -} else {
> -ARRAY_FIELD_DP32(s->nvram, HOUR, AMPM, 1);
> -ARRAY_FIELD_DP32(s->nvram, HOUR, HOUR12, to_bcd(tmp - 12));

Oh, you're rewriting all this code from the earlier patch anyway.
In that case you should definitely just have the earlier patch
match the logic that the existing code did, so that it's easier
to review as being a no-change patch.

> +ARRAY_FIELD_DP32(s->nvram, HOUR, AMPM, now.tm_hour >= 12u);
> +now.tm_hour %= 12u; /* wrap 0-23 to 0-11 */
> +if (now.tm_hour == 0u) {
> +/* midnight/noon stored as 12 */
> +now.tm_hour = 12u;
>  }
> +ARRAY_FIELD_DP32(s->nvram, HOUR, HOUR12, to_bcd(now.tm_hour));
> +
>  } else {
>  ARRAY_FIELD_DP32(s->nvram, HOUR, HOUR24, to_bcd(now.tm_hour));
>  }
> @@ -182,14 +182,13 @@ static int ds1338_send(I2CSlave *i2c, uint8_t data)
>  break;
>  case R_HOUR:
>  if (FIELD_EX32(data, HOUR, SET12)) {
> -int tmp = from_bcd(FIELD_EX32(data, HOUR, HOUR12));
> +/* 12 hour (1-12) */
> +/* read and wrap 1-12 -> 0-11 */
> +now.tm_hour = from_bcd(FIELD_EX32(data, HOUR, HOUR12)) % 12u;
>  if (FIELD_EX32(data, HOUR, AMPM)) {
> -tmp += 12;
> +now.tm_hour += 12;
>  }
> -if (tmp % 12 == 0) {
> -tmp -= 12;
> -}
> -now.tm_hour = tmp;
> +
>  } else {
>  now.tm_hour = from_bcd(FIELD_EX32(data, HOUR, HOUR24));
>  }
> --
> 2.11.0
>


thanks
-- PMM

Re: [Qemu-devel] [PATCH 03/14] timer: ds1338 persist 12-hour mode selection

2018-04-12 Thread Peter Maydell

On 24 March 2018 at 19:24, Michael Davidsaver  wrote:
> Need to save HOUR[HOUR12] bit to keep
> track of guest selection of 12-hour mode.
> Write through current time registers to
> achieve this.  Will be overwritten
> by the next read/latch.
>
> Signed-off-by: Michael Davidsaver 
> ---
>  hw/timer/ds1338.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
>
> diff --git a/hw/timer/ds1338.c b/hw/timer/ds1338.c
> index b5630e214a..72a4692d60 100644
> --- a/hw/timer/ds1338.c
> +++ b/hw/timer/ds1338.c
> @@ -224,10 +224,8 @@ static int ds1338_send(I2CSlave *i2c, uint8_t data)
> value unchanged. */
>  data = (data & ~CTRL_OSF) | (data & s->nvram[s->ptr] & CTRL_OSF);
>
> -s->nvram[s->ptr] = data;
> -} else {
> -s->nvram[s->ptr] = data;
>  }
> +s->nvram[s->ptr] = data;
>  inc_regptr(s);
>  return 0;

The code change here looks like a reasonable no-behaviour-change
simplification of the code, but it doesn't match up with
the description in the commit message ?

thanks
-- PMM

Re: [Qemu-devel] [PATCH 02/14] timer: ds1338 use registerfields.h

2018-04-12 Thread Peter Maydell

On 24 March 2018 at 19:24, Michael Davidsaver  wrote:
> Use names for registers and bits except
> for R_CTRL which will be dealt with later,
> and isn't modeled anyway.
>
> Signed-off-by: Michael Davidsaver 

> ---
>  hw/timer/ds1338.c | 84 
> ++-
>  1 file changed, 58 insertions(+), 26 deletions(-)
> @@ -61,25 +89,29 @@ static void capture_current_time(DS1338State *s)
>   */
>  struct tm now;
>  qemu_get_timedate(, s->offset);
> -s->nvram[0] = to_bcd(now.tm_sec);
> -s->nvram[1] = to_bcd(now.tm_min);
> -if (s->nvram[2] & HOURS_12) {
> +s->nvram[R_SEC] = to_bcd(now.tm_sec);
> +s->nvram[R_MIN] = to_bcd(now.tm_min);
> +if (ARRAY_FIELD_EX32(s->nvram, HOUR, SET12)) {
>  int tmp = now.tm_hour;
>  if (tmp % 12 == 0) {
>  tmp += 12;
>  }
> +s->nvram[R_HOUR] = 0;
> +ARRAY_FIELD_DP32(s->nvram, HOUR, SET12, 1);
>  if (tmp <= 12) {
> -s->nvram[2] = HOURS_12 | to_bcd(tmp);
> +ARRAY_FIELD_DP32(s->nvram, HOUR, AMPM, 0);

This is zeroing a bit that's already zero.

> +ARRAY_FIELD_DP32(s->nvram, HOUR, HOUR12, to_bcd(tmp));
>  } else {
> -s->nvram[2] = HOURS_12 | HOURS_PM | to_bcd(tmp - 12);
> +ARRAY_FIELD_DP32(s->nvram, HOUR, AMPM, 1);
> +ARRAY_FIELD_DP32(s->nvram, HOUR, HOUR12, to_bcd(tmp - 12));
>  }
>  } else {
> -s->nvram[2] = to_bcd(now.tm_hour);
> +ARRAY_FIELD_DP32(s->nvram, HOUR, HOUR24, to_bcd(now.tm_hour));

This else clause used to definitely set all the bits in nvram[2],
and now it doesn't (the s->nvram[R_HOUR] = 0 is only in the if {}).

For cases like this where we're setting the whole thing, I think
s->nvram[R_HOUR] = R_HOUR_SET12_MASK | to_bcd(tmp);

s->nvram[R_HOUR] = R_HOUR_SET12_MASK | R_HOUR_AMPM_MASK | to_bcd(tmp - 12);

s->nvram[R_HOUR] = R_HOUR_HOUR24_MASK | to_bcm(now.tm_hour);

is clearer than individually setting each field, but you can
do the "clear and then set individual fields" approach if you like.

Otherwise

Reviewed-by: Peter Maydell 

thanks
-- PMM

Re: [Qemu-devel] [PULL, 08/36] qapi: Remove qobject_to_X() functions

2018-04-12 Thread Yuval Shaia

Hi Eric,

On Mon, Mar 12, 2018 at 01:35:59PM -0500, Eric Blake wrote:
> From: Max Reitz 
> 
> They are no longer needed now.

I'm doing some off-list development which use these functions.
Unfortunately i'm not subscribed (yet) to qemu-devel so obviously missed
this email.

Is there any alternative to these functions?

My code looks something like this:

QObject *oid;
unsigned long id;

oid = qlist_pop(list);
id = qnum_get_uint(qobject_to_qnum(oid));

Since objects in the list are QObject type.

Yuval

> 
> Signed-off-by: Max Reitz 
> Reviewed-by: Alberto Garcia 
> Message-Id: <20180224154033.29559-5-mre...@redhat.com>
> Reviewed-by: Eric Blake 
> Signed-off-by: Eric Blake 
> ---
>  include/qapi/qmp/qbool.h   |  1 -
>  include/qapi/qmp/qdict.h   |  1 -
>  include/qapi/qmp/qlist.h   |  1 -
>  include/qapi/qmp/qnum.h|  1 -
>  include/qapi/qmp/qstring.h |  1 -
>  qobject/qbool.c| 11 ---
>  qobject/qdict.c| 11 ---
>  qobject/qlist.c| 11 ---
>  qobject/qnum.c | 11 ---
>  qobject/qstring.c  | 11 ---
>  10 files changed, 60 deletions(-)
> 
> diff --git a/include/qapi/qmp/qbool.h b/include/qapi/qmp/qbool.h
> index 629c508d34f..b9a44a1bfe0 100644
> --- a/include/qapi/qmp/qbool.h
> +++ b/include/qapi/qmp/qbool.h
> @@ -23,7 +23,6 @@ struct QBool {
> 
>  QBool *qbool_from_bool(bool value);
>  bool qbool_get_bool(const QBool *qb);
> -QBool *qobject_to_qbool(const QObject *obj);
>  bool qbool_is_equal(const QObject *x, const QObject *y);
>  void qbool_destroy_obj(QObject *obj);
> 
> diff --git a/include/qapi/qmp/qdict.h b/include/qapi/qmp/qdict.h
> index 7c6d8445495..2cc3e906f7b 100644
> --- a/include/qapi/qmp/qdict.h
> +++ b/include/qapi/qmp/qdict.h
> @@ -39,7 +39,6 @@ void qdict_put_obj(QDict *qdict, const char *key, QObject 
> *value);
>  void qdict_del(QDict *qdict, const char *key);
>  int qdict_haskey(const QDict *qdict, const char *key);
>  QObject *qdict_get(const QDict *qdict, const char *key);
> -QDict *qobject_to_qdict(const QObject *obj);
>  bool qdict_is_equal(const QObject *x, const QObject *y);
>  void qdict_iter(const QDict *qdict,
>  void (*iter)(const char *key, QObject *obj, void *opaque),
> diff --git a/include/qapi/qmp/qlist.h b/include/qapi/qmp/qlist.h
> index 5fd976a3981..5c673acb060 100644
> --- a/include/qapi/qmp/qlist.h
> +++ b/include/qapi/qmp/qlist.h
> @@ -53,7 +53,6 @@ QObject *qlist_pop(QList *qlist);
>  QObject *qlist_peek(QList *qlist);
>  int qlist_empty(const QList *qlist);
>  size_t qlist_size(const QList *qlist);
> -QList *qobject_to_qlist(const QObject *obj);
>  bool qlist_is_equal(const QObject *x, const QObject *y);
>  void qlist_destroy_obj(QObject *obj);
> 
> diff --git a/include/qapi/qmp/qnum.h b/include/qapi/qmp/qnum.h
> index 15e3971c7f1..3e47475b2cb 100644
> --- a/include/qapi/qmp/qnum.h
> +++ b/include/qapi/qmp/qnum.h
> @@ -68,7 +68,6 @@ double qnum_get_double(QNum *qn);
> 
>  char *qnum_to_string(QNum *qn);
> 
> -QNum *qobject_to_qnum(const QObject *obj);
>  bool qnum_is_equal(const QObject *x, const QObject *y);
>  void qnum_destroy_obj(QObject *obj);
> 
> diff --git a/include/qapi/qmp/qstring.h b/include/qapi/qmp/qstring.h
> index 98070ef3d6a..b72843fc1b0 100644
> --- a/include/qapi/qmp/qstring.h
> +++ b/include/qapi/qmp/qstring.h
> @@ -30,7 +30,6 @@ const char *qstring_get_str(const QString *qstring);
>  void qstring_append_int(QString *qstring, int64_t value);
>  void qstring_append(QString *qstring, const char *str);
>  void qstring_append_chr(QString *qstring, int c);
> -QString *qobject_to_qstring(const QObject *obj);
>  bool qstring_is_equal(const QObject *x, const QObject *y);
>  void qstring_destroy_obj(QObject *obj);
> 
> diff --git a/qobject/qbool.c b/qobject/qbool.c
> index 5be6277cca8..b58249925c2 100644
> --- a/qobject/qbool.c
> +++ b/qobject/qbool.c
> @@ -39,17 +39,6 @@ bool qbool_get_bool(const QBool *qb)
>  return qb->value;
>  }
> 
> -/**
> - * qobject_to_qbool(): Convert a QObject into a QBool
> - */
> -QBool *qobject_to_qbool(const QObject *obj)
> -{
> -if (!obj || qobject_type(obj) != QTYPE_QBOOL) {
> -return NULL;
> -}
> -return container_of(obj, QBool, base);
> -}
> -
>  /**
>   * qbool_is_equal(): Test whether the two QBools are equal
>   */
> diff --git a/qobject/qdict.c b/qobject/qdict.c
> index 1e588123d00..45c8b53361f 100644
> --- a/qobject/qdict.c
> +++ b/qobject/qdict.c
> @@ -37,17 +37,6 @@ QDict *qdict_new(void)
>  return qdict;
>  }
> 
> -/**
> - * qobject_to_qdict(): Convert a QObject into a QDict
> - */
> -QDict *qobject_to_qdict(const QObject *obj)
> -{
> -if (!obj || qobject_type(obj) != QTYPE_QDICT) {
> -return NULL;
> -}
> -return container_of(obj, QDict, base);
> -}
> -
>  /**
>   * tdb_hash(): based on the hash agorithm from gdbm, via tdb
>   * (from module-init-tools)
> diff

Re: [Qemu-devel] [PATCH v1] s390x/kvm: cleanup calls to cpu_synchronize_state()

2018-04-12 Thread Thomas Huth

On 12.04.2018 11:35, David Hildenbrand wrote:
> We have a call to cpu_synchronize_state() on every kvm_arch_handle_exit().
> 
> Let's remove the ones that are no longer needed.
> 
> Remaining places (for s390x) are in
> - target/s390x/sigp.c, on the target CPU
> - target/s390x/cpu.c:s390_cpu_get_crash_info()
> 
> While at it, use kvm_cpu_synchronize_state() instead of
> cpu_synchronize_state() in KVM code. (suggested by Thomas Huth)
> 
> Signed-off-by: David Hildenbrand 
> ---
>  hw/s390x/s390-pci-inst.c |  8 
>  target/s390x/kvm.c   | 20 +---
>  2 files changed, 1 insertion(+), 27 deletions(-)

Wow, that was really a mess, looking at which different "levels" the
cpu_synchronize_state was done before. Good that you cleaned it up now.

Reviewed-by: Thomas Huth

Re: [Qemu-devel] [PATCH v3 12/22] target/arm: Filter cycle counter based on PMCCFILTR_EL0

2018-04-12 Thread Aaron Lindsay

On Apr 12 18:15, Peter Maydell wrote:
> On 16 March 2018 at 20:31, Aaron Lindsay  wrote:
> > The pmu_counter_filtered and pmu_op_start/finish functions are generic
> > (as opposed to PMCCNTR-specific) to allow for the implementation of
> > other events.
> >
> > Signed-off-by: Aaron Lindsay 
> > ---
> >  target/arm/cpu.c|  3 ++
> >  target/arm/cpu.h| 37 +++
> >  target/arm/helper.c | 87 
> > ++---
> >  3 files changed, 116 insertions(+), 11 deletions(-)
> >
> > diff --git a/target/arm/cpu.c b/target/arm/cpu.c
> > index a2cb21e..b0d032c 100644
> > --- a/target/arm/cpu.c
> > +++ b/target/arm/cpu.c
> > @@ -887,6 +887,9 @@ static void arm_cpu_realizefn(DeviceState *dev, Error 
> > **errp)
> >  if (!cpu->has_pmu) {
> >  unset_feature(env, ARM_FEATURE_PMU);
> >  cpu->id_aa64dfr0 &= ~0xf00;
> > +} else {
> > +arm_register_pre_el_change_hook(cpu, _pre_el_change, 0);
> > +arm_register_el_change_hook(cpu, _post_el_change, 0);
> 
> You probably don't want to do this if we're using KVM, as there
> are some code paths where we call do_interrupt() when using KVM
> which will trigger the hooks and likely do unexpected things.
> (For instance kvm_arm_handle_debug() does this to set the guest
> up to take debug exceptions.)

Okay, I'll surround this with `if (!kvm_enabled())` for the next pass.
> 
> >  }
> >
> >  if (!arm_feature(env, ARM_FEATURE_EL2)) {
> > diff --git a/target/arm/cpu.h b/target/arm/cpu.h
> > index b0ef727..9c3b5ef 100644
> > --- a/target/arm/cpu.h
> > +++ b/target/arm/cpu.h
> > @@ -458,6 +458,11 @@ typedef struct CPUARMState {
> >   * was reset. Otherwise it stores the counter value
> >   */
> >  uint64_t c15_ccnt;
> > +/* ccnt_cached_cycles is used to hold the last cycle count when
> > + * c15_ccnt holds the guest-visible count instead of the delta 
> > during
> > + * PMU operations which require this.
> > + */
> > +uint64_t ccnt_cached_cycles;
> 
> Can this ever hold valid state at a point when we need to do VM
> migration, or is it purely temporary ?

I believe that as of this version of the patch it is temporary and will
not need to be migrated. However, I believe it's going to be necessary
to have two variables to represent the state of each counter in order to
implement interrupt on overflow. 

> 
> >  uint64_t pmccfiltr_el0; /* Performance Monitor Filter Register */
> >  uint64_t vpidr_el2; /* Virtualization Processor ID Register */
> >  uint64_t vmpidr_el2; /* Virtualization Multiprocessor ID Register 
> > */
> > @@ -896,15 +901,35 @@ int cpu_arm_signal_handler(int host_signum, void 
> > *pinfo,
> > void *puc);
> >
> >  /**
> > - * pmccntr_sync
> > + * pmccntr_op_start/finish
> 
> Shouldn't this doc change and prototype change have gone with the earlier
> patch that changed the implementation?

Hmmm, yes. Seems like I got pmccntr_op_start and pmu_op_start confused
when staging this.

> >   * @env: CPUARMState
> >   *
> > - * Synchronises the counter in the PMCCNTR. This must always be called 
> > twice,
> > - * once before any action that might affect the timer and again afterwards.
> > - * The function is used to swap the state of the register if required.
> > - * This only happens when not in user mode (!CONFIG_USER_ONLY)
> > + * Convert the counter in the PMCCNTR between its delta form (the typical 
> > mode
> > + * when it's enabled) and the guest-visible value. These two calls must 
> > always
> > + * surround any action which might affect the counter, and the return value
> > + * from pmccntr_op_start must be supplied as the second argument to
> > + * pmccntr_op_finish.
> > + */
> > +uint64_t pmccntr_op_start(CPUARMState *env);
> > +void pmccntr_op_finish(CPUARMState *env, uint64_t prev_cycles);
> > +
> > +/**
> > + * pmu_op_start/finish
> > + * @env: CPUARMState
> > + *
> > + * Convert all PMU counters between their delta form (the typical mode when
> > + * they are enabled) and the guest-visible values. These two calls must
> > + * surround any action which might affect the counters, and the return 
> > value
> > + * from pmu_op_start must be supplied as the second argument to 
> > pmu_op_finish.
> > + */
> > +uint64_t pmu_op_start(CPUARMState *env);
> > +void pmu_op_finish(CPUARMState *env, uint64_t prev_cycles);
> > +
> > +/**
> > + * Functions to register as EL change hooks for PMU mode filtering
> >   */
> > -void pmccntr_sync(CPUARMState *env);
> > +void pmu_pre_el_change(ARMCPU *cpu, void *ignored);
> > +void pmu_post_el_change(ARMCPU *cpu, void *ignored);
> >
> >  /* SCTLR bit meanings. Several bits have been reused in newer
> >   * versions of the architecture; in that case we define constants
> > diff --git a/target/arm/helper.c b/target/arm/helper.c
> > index 0102357..95b09d6 100644
> > ---

Re: [Qemu-devel] [PATCH v3 00/22] More fully implement ARM PMUv3

2018-04-12 Thread Peter Maydell

On 16 March 2018 at 20:30, Aaron Lindsay  wrote:
> The ARM PMU implementation currently contains a basic cycle counter, but it is
> often useful to gather counts of other events and filter them based on
> execution mode. These patches flesh out the implementations of various PMU
> registers including PM[X]EVCNTR and PM[X]EVTYPER, add a struct definition to
> represent arbitrary counter types, implement mode filtering, and add
> instruction, cycle, and software increment events.

Hi; sorry it's taken me a while to get to this (I've been focusing
on for-2.12 work). I've now reviewed most of the patches in this
set. My suggestion is that you fix the stuff I've commented on
and send out a v4, and then I can take some of the patches from
the start of that into target-arm.next, and I'll review the tail
end of the set then.

thanks
-- PMM

Re: [Qemu-devel] [PATCH v3 17/22] target/arm: Split arm_ccnt_enabled into generic pmu_counter_enabled

2018-04-12 Thread Peter Maydell

On 16 March 2018 at 20:31, Aaron Lindsay  wrote:
> Signed-off-by: Aaron Lindsay 
> Reviewed-by: Peter Maydell 
> ---
>  target/arm/helper.c | 11 ---
>  1 file changed, 8 insertions(+), 3 deletions(-)
>
> diff --git a/target/arm/helper.c b/target/arm/helper.c
> index f5e800e..2073d56 100644
> --- a/target/arm/helper.c
> +++ b/target/arm/helper.c
> @@ -1009,17 +1009,22 @@ static CPAccessResult pmreg_access_ccntr(CPUARMState 
> *env,
>  return pmreg_access(env, ri, isread);
>  }
>
> -static inline bool arm_ccnt_enabled(CPUARMState *env)
> +static inline bool pmu_counter_enabled(CPUARMState *env, uint8_t counter)
>  {
>  /* Does not check PMCCFILTR_EL0, which is handled by 
> pmu_counter_filtered */
> -
> -if (!(env->cp15.c9_pmcr & PMCRE) || !(env->cp15.c9_pmcnten & (1 << 31))) 
> {
> +if (!(env->cp15.c9_pmcr & PMCRE) ||
> +!(env->cp15.c9_pmcnten & (1 << counter))) {
>  return false;
>  }
>
>  return true;
>  }
>
> +static inline bool arm_ccnt_enabled(CPUARMState *env)
> +{
> +return pmu_counter_enabled(env, 31);
> +}

We only use arm_ccnt_enabled() in a couple of places, so
I would just make those callsites use pmu_counter_enabled(env, 31)
directly.

thanks
-- PMM

Re: [Qemu-devel] [PATCH v3 16/22] target/arm: Implement PMOVSSET

2018-04-12 Thread Peter Maydell

On 16 March 2018 at 20:31, Aaron Lindsay  wrote:
> Adding an array for v7VE+ CP registers was necessary so that PMOVSSET
> wasn't defined for all v7 processors.
>
> Signed-off-by: Aaron Lindsay 
> ---
>  target/arm/helper.c | 32 +++-
>  1 file changed, 31 insertions(+), 1 deletion(-)
>
> diff --git a/target/arm/helper.c b/target/arm/helper.c
> index d4f06e6..f5e800e 100644
> --- a/target/arm/helper.c
> +++ b/target/arm/helper.c
> @@ -1241,9 +1241,17 @@ static void pmcntenclr_write(CPUARMState *env, const 
> ARMCPRegInfo *ri,
>  static void pmovsr_write(CPUARMState *env, const ARMCPRegInfo *ri,
>   uint64_t value)
>  {
> +value &= PMU_COUNTER_MASK(env);
>  env->cp15.c9_pmovsr &= ~value;
>  }
>
> +static void pmovsset_write(CPUARMState *env, const ARMCPRegInfo *ri,
> + uint64_t value)
> +{
> +value &= PMU_COUNTER_MASK(env);
> +env->cp15.c9_pmovsr |= value;
> +}
> +
>  static void pmxevtyper_write(CPUARMState *env, const ARMCPRegInfo *ri,
>   uint64_t value)
>  {
> @@ -1406,7 +1414,7 @@ static const ARMCPRegInfo v7_cp_reginfo[] = {
>.fieldoffset = offsetof(CPUARMState, cp15.c9_pmcnten),
>.writefn = pmcntenclr_write },
>  { .name = "PMOVSR", .cp = 15, .crn = 9, .crm = 12, .opc1 = 0, .opc2 = 3,
> -  .access = PL0_RW, .fieldoffset = offsetof(CPUARMState, cp15.c9_pmovsr),
> +  .access = PL0_RW, .fieldoffset = offsetoflow32(CPUARMState, 
> cp15.c9_pmovsr),
>.accessfn = pmreg_access,
>.writefn = pmovsr_write,
>.raw_writefn = raw_write },

This change is half of a bug fix (the other half being to make the
field in the CPU struct be uint64_t rather than uint32_t). That
bug fix should be in a patch of its own.

pmuserenr has the same bug (uint32_t state field accessed by
a STATE_AA64 sysreg).

thanks
-- PMM

Re: [Qemu-devel] [PATCH v3 10/22] target/arm: Allow EL change hooks to do IO

2018-04-12 Thread Peter Maydell

On 12 April 2018 at 18:08, Aaron Lindsay  wrote:
> On Apr 12 17:53, Peter Maydell wrote:
>> On 16 March 2018 at 20:31, Aaron Lindsay  wrote:
>> > During code generation, surround CPSR writes and exception returns which
>> > call the EL change hooks with gen_io_start/end. The immediate need is
>> > for the PMU to access the clock and icount during EL change to support
>> > mode filtering.
>> >
>> > Signed-off-by: Aaron Lindsay 
>> > ---
>> >  target/arm/translate-a64.c | 2 ++
>> >  target/arm/translate.c | 4 
>> >  2 files changed, 6 insertions(+)
>> >
>> > diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
>> > index 31ff047..e1ae676 100644
>> > --- a/target/arm/translate-a64.c
>> > +++ b/target/arm/translate-a64.c
>> > @@ -1919,7 +1919,9 @@ static void disas_uncond_b_reg(DisasContext *s, 
>> > uint32_t insn)
>> >  unallocated_encoding(s);
>> >  return;
>> >  }
>> > +gen_io_start();
>> >  gen_helper_exception_return(cpu_env);
>> > +gen_io_end();
>>
>> You don't want to call gen_io_start() or gen_io_end() unless
>> tb_cflags(s->base.tb) & CF_USE_ICOUNT) is true.
>>
>> (Ditto in the other cases below.)
>
> I assume there's nothing tricky about this and updating this as follows
> is sufficient?

Yes, that's sufficient. (The other thing that needs to happen
for a gen_io_start/end is that the insn has to end the TB --
but in all these cases that's already true as they set
s->base.is_jmp = DISAS_EXIT.)

thanks
-- PMM

Re: [Qemu-devel] [PATCH v3 09/22] target/arm: Add pre-EL change hooks

2018-04-12 Thread Peter Maydell

On 12 April 2018 at 18:01, Aaron Lindsay  wrote:
> On Apr 12 17:49, Peter Maydell wrote:
>> Do we make the guarantee that if we call the pre-change hook
>> then we will definitely subsequently call the post-change hook?
>> (ie does the PMU hook rely on that?)
>
> Yes, the PMU relies on the pre- and post- hooks being called in pairs
> since they drive the state machine of sorts that exists in the variables
> holding the counter values/deltas. And unless I've really screwed up the
> implementation, I believe the change hooks in my patchset make that
> guarantee.

Yes, I think I agree. We should document in the comment that
this is a guarantee that we make.

thanks
-- PMM

Re: [Qemu-devel] [PATCH v3 15/22] target/arm: Add ARM_FEATURE_V7VE for v7 Virtualization Extensions

2018-04-12 Thread Peter Maydell

On 16 March 2018 at 20:31, Aaron Lindsay  wrote:
> Signed-off-by: Aaron Lindsay 
> ---
>  target/arm/cpu.c | 3 +++
>  target/arm/cpu.h | 1 +
>  2 files changed, 4 insertions(+)
>
> diff --git a/target/arm/cpu.c b/target/arm/cpu.c
> index b0d032c..e544f1d 100644
> --- a/target/arm/cpu.c
> +++ b/target/arm/cpu.c
> @@ -765,6 +765,7 @@ static void arm_cpu_realizefn(DeviceState *dev, Error 
> **errp)
>  /* Some features automatically imply others: */
>  if (arm_feature(env, ARM_FEATURE_V8)) {
>  set_feature(env, ARM_FEATURE_V7);
> +set_feature(env, ARM_FEATURE_V7VE);
>  set_feature(env, ARM_FEATURE_ARM_DIV);
>  set_feature(env, ARM_FEATURE_LPAE);
>  }
> @@ -1481,6 +1482,7 @@ static void cortex_a7_initfn(Object *obj)
>
>  cpu->dtb_compatible = "arm,cortex-a7";
>  set_feature(>env, ARM_FEATURE_V7);
> +set_feature(>env, ARM_FEATURE_V7VE);
>  set_feature(>env, ARM_FEATURE_VFP4);
>  set_feature(>env, ARM_FEATURE_NEON);
>  set_feature(>env, ARM_FEATURE_THUMB2EE);
> @@ -1526,6 +1528,7 @@ static void cortex_a15_initfn(Object *obj)
>
>  cpu->dtb_compatible = "arm,cortex-a15";
>  set_feature(>env, ARM_FEATURE_V7);
> +set_feature(>env, ARM_FEATURE_V7VE);
>  set_feature(>env, ARM_FEATURE_VFP4);
>  set_feature(>env, ARM_FEATURE_NEON);
>  set_feature(>env, ARM_FEATURE_THUMB2EE);
> diff --git a/target/arm/cpu.h b/target/arm/cpu.h
> index fb2f983..cc1e2fb 100644
> --- a/target/arm/cpu.h
> +++ b/target/arm/cpu.h
> @@ -1439,6 +1439,7 @@ enum arm_features {
>  ARM_FEATURE_OMAPCP, /* OMAP specific CP15 ops handling.  */
>  ARM_FEATURE_THUMB2EE,
>  ARM_FEATURE_V7MP,/* v7 Multiprocessing Extensions */
> +ARM_FEATURE_V7VE,/* v7 with Virtualization Extensions */
>  ARM_FEATURE_V4T,
>  ARM_FEATURE_V5,
>  ARM_FEATURE_STRONGARM,

What's the difference between this and ARM_FEATURE_EL2 ?

thanks
-- PMM

[Qemu-devel] [PATCH v3] RFC: target/arm: Send interrupts on PMU counter overflow

2018-04-12 Thread Aaron Lindsay

On Mar 16 16:30, Aaron Lindsay wrote:
> I aim to eventually add raising interrupts on counter overflow, but that is 
> not
> covered by this patchset. I think I have a reasonable grasp of the mechanics 
> of
> *how* to raise them, but am curious if anyone has thoughts on how to determine
> *when* to raise them - we don't want to call into PMU code every time an
> instruction is executed to check if any instruction counters have overflowed,
> etc. The main candidate I've seen for doing this so far would be to set up a
> QEMUTimer, but I haven't fully explored it. Does that seem plausible? Any
> other/better ideas?

I'm planning to post a full v4 of this patchset soon, pending a few
review fixes, but I figured I'd throw out an early version of a patch to
add interrupts on overflow in case it obviously has major issues that
will need to be addressed.

This patch sets up a QEMUTimer to get a callback when we expect counters
to next overflow and triggers an interrupt at that time.

Signed-off-by: Aaron Lindsay 
---
 target/arm/cpu.c|  11 +
 target/arm/cpu.h|   7 +++
 target/arm/helper.c | 129 
 3 files changed, 138 insertions(+), 9 deletions(-)

diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index df27188..9108c6b 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -740,6 +740,12 @@ static void arm_cpu_finalizefn(Object *obj)
 QLIST_REMOVE(hook, node);
 g_free(hook);
 }
+#ifndef CONFIG_USER_ONLY
+if (arm_feature(>env, ARM_FEATURE_PMU)) {
+timer_deinit(cpu->pmu_timer);
+timer_free(cpu->pmu_timer);
+}
+#endif
 }
 
 static void arm_cpu_realizefn(DeviceState *dev, Error **errp)
@@ -907,6 +913,11 @@ static void arm_cpu_realizefn(DeviceState *dev, Error 
**errp)
 
 arm_register_pre_el_change_hook(cpu, _pre_el_change, 0);
 arm_register_el_change_hook(cpu, _post_el_change, 0);
+
+#ifndef CONFIG_USER_ONLY
+cpu->pmu_timer = timer_new(QEMU_CLOCK_VIRTUAL, 1, arm_pmu_timer_cb,
+cpu);
+#endif
 } else {
 cpu->pmceid0 = 0x;
 cpu->pmceid1 = 0x;
diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index 5e6bbd3..bc0867f 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -703,6 +703,8 @@ struct ARMCPU {
 
 /* Timers used by the generic (architected) timer */
 QEMUTimer *gt_timer[NUM_GTIMERS];
+/* Timer used by the PMU */
+QEMUTimer *pmu_timer;
 /* GPIO outputs for generic timer */
 qemu_irq gt_timer_outputs[NUM_GTIMERS];
 /* GPIO output for GICv3 maintenance interrupt signal */
@@ -934,6 +936,11 @@ void pmu_op_start(CPUARMState *env);
 void pmu_op_finish(CPUARMState *env);
 
 /**
+ * Called when a PMU counter is due to overflow
+ */
+void arm_pmu_timer_cb(void *opaque);
+
+/**
  * Functions to register as EL change hooks for PMU mode filtering
  */
 void pmu_pre_el_change(ARMCPU *cpu, void *ignored);
diff --git a/target/arm/helper.c b/target/arm/helper.c
index 2147678..abe24dc 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -905,6 +905,7 @@ static const ARMCPRegInfo v6_cp_reginfo[] = {
 /* Definitions for the PMU registers */
 #define PMCRN_MASK  0xf800
 #define PMCRN_SHIFT 11
+#define PMCRLC  0x40
 #define PMCRD   0x8
 #define PMCRC   0x4
 #define PMCRP   0x2
@@ -919,6 +920,8 @@ static const ARMCPRegInfo v6_cp_reginfo[] = {
 #define PMXEVTYPER_MT 0x0200
 #define PMXEVTYPER_EVTCOUNT   0x03ff
 
+#define PMEVCNTR_OVERFLOW_MASK ((uint64_t)1 << 31)
+
 #define PMCCFILTR 0xf800
 #define PMCCFILTR_M   PMXEVTYPER_M
 #define PMCCFILTR_EL0 (PMCCFILTR | PMCCFILTR_M)
@@ -934,6 +937,11 @@ typedef struct pm_event {
 /* Retrieve the current count of the underlying event. The programmed
  * counters hold a difference from the return value from this function */
 uint64_t (*get_count)(CPUARMState *);
+/* Return how many nanoseconds it will take (at a minimum) for count events
+ * to occur. A negative value indicates the counter will never overflow, or
+ * that the counter has otherwise arranged for the overflow bit to be set
+ * and the PMU interrupt to be raised on overflow. */
+int64_t (*ns_per_count)(uint64_t);
 } pm_event;
 
 static bool event_always_supported(CPUARMState *env)
@@ -950,6 +958,11 @@ static uint64_t swinc_get_count(CPUARMState *env)
 return 0;
 }
 
+static int64_t swinc_ns_per(uint64_t ignored)
+{
+return -1;
+}
+
 /*
  * Return the underlying cycle count for the PMU cycle counters. If we're in
  * usermode, simply return 0.
@@ -965,6 +978,11 @@ static uint64_t cycles_get_count(CPUARMState *env)
 }
 
 #ifndef CONFIG_USER_ONLY
+static int64_t cycles_ns_per(uint64_t cycles)
+{
+return ARM_CPU_FREQ/NANOSECONDS_PER_SECOND;
+}
+
 static bool instructions_supported(CPUARMState *env)
 {
 return use_icount == 1 /* Precise instruction counting */;
@@ -974,22 +992,30 @@ static uint64_t

Re: [Qemu-devel] [PATCH v3 12/22] target/arm: Filter cycle counter based on PMCCFILTR_EL0

2018-04-12 Thread Peter Maydell

On 16 March 2018 at 20:31, Aaron Lindsay  wrote:
> The pmu_counter_filtered and pmu_op_start/finish functions are generic
> (as opposed to PMCCNTR-specific) to allow for the implementation of
> other events.
>
> Signed-off-by: Aaron Lindsay 
> ---
>  target/arm/cpu.c|  3 ++
>  target/arm/cpu.h| 37 +++
>  target/arm/helper.c | 87 
> ++---
>  3 files changed, 116 insertions(+), 11 deletions(-)
>
> diff --git a/target/arm/cpu.c b/target/arm/cpu.c
> index a2cb21e..b0d032c 100644
> --- a/target/arm/cpu.c
> +++ b/target/arm/cpu.c
> @@ -887,6 +887,9 @@ static void arm_cpu_realizefn(DeviceState *dev, Error 
> **errp)
>  if (!cpu->has_pmu) {
>  unset_feature(env, ARM_FEATURE_PMU);
>  cpu->id_aa64dfr0 &= ~0xf00;
> +} else {
> +arm_register_pre_el_change_hook(cpu, _pre_el_change, 0);
> +arm_register_el_change_hook(cpu, _post_el_change, 0);

You probably don't want to do this if we're using KVM, as there
are some code paths where we call do_interrupt() when using KVM
which will trigger the hooks and likely do unexpected things.
(For instance kvm_arm_handle_debug() does this to set the guest
up to take debug exceptions.)

>  }
>
>  if (!arm_feature(env, ARM_FEATURE_EL2)) {
> diff --git a/target/arm/cpu.h b/target/arm/cpu.h
> index b0ef727..9c3b5ef 100644
> --- a/target/arm/cpu.h
> +++ b/target/arm/cpu.h
> @@ -458,6 +458,11 @@ typedef struct CPUARMState {
>   * was reset. Otherwise it stores the counter value
>   */
>  uint64_t c15_ccnt;
> +/* ccnt_cached_cycles is used to hold the last cycle count when
> + * c15_ccnt holds the guest-visible count instead of the delta during
> + * PMU operations which require this.
> + */
> +uint64_t ccnt_cached_cycles;

Can this ever hold valid state at a point when we need to do VM
migration, or is it purely temporary ?

>  uint64_t pmccfiltr_el0; /* Performance Monitor Filter Register */
>  uint64_t vpidr_el2; /* Virtualization Processor ID Register */
>  uint64_t vmpidr_el2; /* Virtualization Multiprocessor ID Register */
> @@ -896,15 +901,35 @@ int cpu_arm_signal_handler(int host_signum, void *pinfo,
> void *puc);
>
>  /**
> - * pmccntr_sync
> + * pmccntr_op_start/finish

Shouldn't this doc change and prototype change have gone with the earlier
patch that changed the implementation?

>   * @env: CPUARMState
>   *
> - * Synchronises the counter in the PMCCNTR. This must always be called twice,
> - * once before any action that might affect the timer and again afterwards.
> - * The function is used to swap the state of the register if required.
> - * This only happens when not in user mode (!CONFIG_USER_ONLY)
> + * Convert the counter in the PMCCNTR between its delta form (the typical 
> mode
> + * when it's enabled) and the guest-visible value. These two calls must 
> always
> + * surround any action which might affect the counter, and the return value
> + * from pmccntr_op_start must be supplied as the second argument to
> + * pmccntr_op_finish.
> + */
> +uint64_t pmccntr_op_start(CPUARMState *env);
> +void pmccntr_op_finish(CPUARMState *env, uint64_t prev_cycles);
> +
> +/**
> + * pmu_op_start/finish
> + * @env: CPUARMState
> + *
> + * Convert all PMU counters between their delta form (the typical mode when
> + * they are enabled) and the guest-visible values. These two calls must
> + * surround any action which might affect the counters, and the return value
> + * from pmu_op_start must be supplied as the second argument to 
> pmu_op_finish.
> + */
> +uint64_t pmu_op_start(CPUARMState *env);
> +void pmu_op_finish(CPUARMState *env, uint64_t prev_cycles);
> +
> +/**
> + * Functions to register as EL change hooks for PMU mode filtering
>   */
> -void pmccntr_sync(CPUARMState *env);
> +void pmu_pre_el_change(ARMCPU *cpu, void *ignored);
> +void pmu_post_el_change(ARMCPU *cpu, void *ignored);
>
>  /* SCTLR bit meanings. Several bits have been reused in newer
>   * versions of the architecture; in that case we define constants
> diff --git a/target/arm/helper.c b/target/arm/helper.c
> index 0102357..95b09d6 100644
> --- a/target/arm/helper.c
> +++ b/target/arm/helper.c
> @@ -908,6 +908,15 @@ static const ARMCPRegInfo v6_cp_reginfo[] = {
>  #define PMCRC   0x4
>  #define PMCRE   0x1
>
> +#define PMXEVTYPER_P  0x8000
> +#define PMXEVTYPER_U  0x4000
> +#define PMXEVTYPER_NSK0x2000
> +#define PMXEVTYPER_NSU0x1000
> +#define PMXEVTYPER_NSH0x0800
> +#define PMXEVTYPER_M  0x0400
> +#define PMXEVTYPER_MT 0x0200
> +#define PMXEVTYPER_EVTCOUNT   0x03ff
> +
>  #define PMU_NUM_COUNTERS(env) ((env->cp15.c9_pmcr & PMCRN_MASK) >> 
> PMCRN_SHIFT)
>  /* Bits allowed to be set/cleared for PMCNTEN* and PMINTEN* */
>  #define

Re: [Qemu-devel] [PATCH v3 10/22] target/arm: Allow EL change hooks to do IO

2018-04-12 Thread Aaron Lindsay

On Apr 12 17:53, Peter Maydell wrote:
> On 16 March 2018 at 20:31, Aaron Lindsay  wrote:
> > During code generation, surround CPSR writes and exception returns which
> > call the EL change hooks with gen_io_start/end. The immediate need is
> > for the PMU to access the clock and icount during EL change to support
> > mode filtering.
> >
> > Signed-off-by: Aaron Lindsay 
> > ---
> >  target/arm/translate-a64.c | 2 ++
> >  target/arm/translate.c | 4 
> >  2 files changed, 6 insertions(+)
> >
> > diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
> > index 31ff047..e1ae676 100644
> > --- a/target/arm/translate-a64.c
> > +++ b/target/arm/translate-a64.c
> > @@ -1919,7 +1919,9 @@ static void disas_uncond_b_reg(DisasContext *s, 
> > uint32_t insn)
> >  unallocated_encoding(s);
> >  return;
> >  }
> > +gen_io_start();
> >  gen_helper_exception_return(cpu_env);
> > +gen_io_end();
> 
> You don't want to call gen_io_start() or gen_io_end() unless
> tb_cflags(s->base.tb) & CF_USE_ICOUNT) is true.
> 
> (Ditto in the other cases below.)

I assume there's nothing tricky about this and updating this as follows
is sufficient?

> > +if (tb_cflags(s->base.tb) & CF_USE_ICOUNT) {
> > +gen_io_start();
> > +}
> >  gen_helper_exception_return(cpu_env);
> > +if (tb_cflags(s->base.tb) & CF_USE_ICOUNT) {
> > +gen_io_end();
> > +}

-Aaron

> 
> >  /* Must exit loop to check un-masked IRQs */
> >  s->base.is_jmp = DISAS_EXIT;
> >  return;
> > diff --git a/target/arm/translate.c b/target/arm/translate.c
> > index ba6ab7d..fd5871e 100644
> > --- a/target/arm/translate.c
> > +++ b/target/arm/translate.c
> > @@ -4536,7 +4536,9 @@ static void gen_rfe(DisasContext *s, TCGv_i32 pc, 
> > TCGv_i32 cpsr)
> >   * appropriately depending on the new Thumb bit, so it must
> >   * be called after storing the new PC.
> >   */
> > +gen_io_start();
> >  gen_helper_cpsr_write_eret(cpu_env, cpsr);
> > +gen_io_end();
> >  tcg_temp_free_i32(cpsr);
> >  /* Must exit loop to check un-masked IRQs */
> >  s->base.is_jmp = DISAS_EXIT;
> > @@ -9828,7 +9830,9 @@ static void disas_arm_insn(DisasContext *s, unsigned 
> > int insn)
> >  if (exc_return) {
> >  /* Restore CPSR from SPSR.  */
> >  tmp = load_cpu_field(spsr);
> > +gen_io_start();
> >  gen_helper_cpsr_write_eret(cpu_env, tmp);
> > +gen_io_end();
> >  tcg_temp_free_i32(tmp);
> >  /* Must exit loop to check un-masked IRQs */
> >  s->base.is_jmp = DISAS_EXIT;
> > --
> > Qualcomm Datacenter Technologies as an affiliate of Qualcomm Technologies, 
> > Inc.
> > Qualcomm Technologies, Inc. is a member of the
> > Code Aurora Forum, a Linux Foundation Collaborative Project.
> >
> 
> thanks
> -- PMM

-- 
Qualcomm Datacenter Technologies as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the
Code Aurora Forum, a Linux Foundation Collaborative Project.

[Qemu-devel] [RFC] Intermediate block mirroring

2018-04-12 Thread Alberto Garcia

Hello,

I mentioned this some time ago, but I'd like to retake it now: I'm
checking how to copy arbitrary nodes on a backing chain, so if I have
e.g.

   [A] <- [B] <- [C] <- [D]

I'd like to end up with

   [A] <- [E] <- [C] <- [D]

where [E] is a copy of [B]. The most obvious use case is to move [B]
to a different storage backend.

At the moment we can already copy [B] into [E] (outside QEMU) and then
do

   change-backing-file device=[D] image-node-name=[C] backing-file=[E]

However this only changes the image on disk and the bs->backing_file
string in memory. QEMU keeps using the [B] BlockDriverState, and we
need to make it switch to [E].

One possible way to do this would be to modify blockdev-mirror so the
source image can be located anywhere on a chain. Currently there's
bs->backing_blocker preventing that, plus qmp_blockdev_mirror()
refuses to take any non-root source node. Other than permission
changes I don't think the algorithm itself needs any additional
modification, although I suppose we'd like to change the backing file
as part of the same job, and that would require API changes.

One other way is to have a more generic replace-node command which
would call bdrv_replace_node(), but I don't know if we want to expose
that and I don't have any other use case for it at the moment.

Opinions, comments?

Berto

Re: [Qemu-devel] [PATCH v3 09/22] target/arm: Add pre-EL change hooks

2018-04-12 Thread Aaron Lindsay

On Apr 12 17:49, Peter Maydell wrote:
> On 16 March 2018 at 20:31, Aaron Lindsay  wrote:
> > Because the design of the PMU requires that the counter values be
> > converted between their delta and guest-visible forms for mode
> > filtering, an additional hook which occurs before the EL is changed is
> > necessary.
> >
> > Signed-off-by: Aaron Lindsay 
> > ---
> >  target/arm/cpu.c   | 13 +
> >  target/arm/cpu.h   | 12 
> >  target/arm/helper.c| 14 --
> >  target/arm/internals.h |  7 +++
> >  target/arm/op_helper.c |  8 
> >  5 files changed, 44 insertions(+), 10 deletions(-)
> >
> > diff --git a/target/arm/cpu.c b/target/arm/cpu.c
> > index 5f782bf..a2cb21e 100644
> > --- a/target/arm/cpu.c
> > +++ b/target/arm/cpu.c
> > @@ -55,6 +55,18 @@ static bool arm_cpu_has_work(CPUState *cs)
> >   | CPU_INTERRUPT_EXITTB);
> >  }
> >
> > +void arm_register_pre_el_change_hook(ARMCPU *cpu, ARMELChangeHookFn *hook,
> > + void *opaque)
> > +{
> > +ARMELChangeHook *entry;
> > +entry = g_malloc0(sizeof (*entry));
> 
> g_new0().
> 
> > +
> > +entry->hook = hook;
> > +entry->opaque = opaque;
> > +
> > +QLIST_INSERT_HEAD(>pre_el_change_hooks, entry, node);
> > +}
> > +
> >  void arm_register_el_change_hook(ARMCPU *cpu, ARMELChangeHookFn *hook,
> >   void *opaque)
> >  {
> > @@ -747,6 +759,7 @@ static void arm_cpu_realizefn(DeviceState *dev, Error 
> > **errp)
> >  return;
> >  }
> >
> > +QLIST_INIT(>pre_el_change_hooks);
> >  QLIST_INIT(>el_change_hooks);
> >
> >  /* Some features automatically imply others: */
> > diff --git a/target/arm/cpu.h b/target/arm/cpu.h
> > index 3b45d3d..b0ef727 100644
> > --- a/target/arm/cpu.h
> > +++ b/target/arm/cpu.h
> > @@ -832,6 +832,7 @@ struct ARMCPU {
> >   */
> >  bool cfgend;
> >
> > +QLIST_HEAD(, ARMELChangeHook) pre_el_change_hooks;
> >  QLIST_HEAD(, ARMELChangeHook) el_change_hooks;
> >
> >  int32_t node_id; /* NUMA node this CPU belongs to */
> > @@ -2895,12 +2896,15 @@ static inline AddressSpace 
> > *arm_addressspace(CPUState *cs, MemTxAttrs attrs)
> >  #endif
> >
> >  /**
> > + * arm_register_pre_el_change_hook:
> >   * arm_register_el_change_hook:
> > - * Register a hook function which will be called back whenever this
> > - * CPU changes exception level or mode. The hook function will be
> > - * passed a pointer to the ARMCPU and the opaque data pointer passed
> > - * to this function when the hook was registered.
> > + * Register a hook function which will be called back before or after this 
> > CPU
> > + * changes exception level or mode. The hook function will be passed a 
> > pointer
> > + * to the ARMCPU and the opaque data pointer passed to this function when 
> > the
> > + * hook was registered.
> >   */
> 
> I would just have one doc comment for each function, rather than
> trying to share. (Some day we may actually autogenerate HTML docs
> from these comments...)
> 
> Do we make the guarantee that if we call the pre-change hook
> then we will definitely subsequently call the post-change hook?
> (ie does the PMU hook rely on that?)

Yes, the PMU relies on the pre- and post- hooks being called in pairs
since they drive the state machine of sorts that exists in the variables
holding the counter values/deltas. And unless I've really screwed up the
implementation, I believe the change hooks in my patchset make that
guarantee.

-Aaron

> 
> Otherwise looks OK.
> 
> thanks
> -- PMM

-- 
Qualcomm Datacenter Technologies as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the
Code Aurora Forum, a Linux Foundation Collaborative Project.

Re: [Qemu-devel] [PATCH v3 04/22] target/arm: Treat PMCCNTR as alias of PMCCNTR_EL0

2018-04-12 Thread Aaron Lindsay

On Apr 12 17:10, Peter Maydell wrote:
> On 16 March 2018 at 20:31, Aaron Lindsay  wrote:
> > They share the same underlying state
> >
> > Signed-off-by: Aaron Lindsay 
> > ---
> >  target/arm/helper.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/target/arm/helper.c b/target/arm/helper.c
> > index 5e48982..5634561 100644
> > --- a/target/arm/helper.c
> > +++ b/target/arm/helper.c
> > @@ -1318,7 +1318,7 @@ static const ARMCPRegInfo v7_cp_reginfo[] = {
> >.fieldoffset = offsetof(CPUARMState, cp15.c9_pmselr),
> >.writefn = pmselr_write, .raw_writefn = raw_write, },
> >  { .name = "PMCCNTR", .cp = 15, .crn = 9, .crm = 13, .opc1 = 0, .opc2 = 
> > 0,
> > -  .access = PL0_RW, .resetvalue = 0, .type = ARM_CP_IO,
> > +  .access = PL0_RW, .resetvalue = 0, .type = ARM_CP_ALIAS | ARM_CP_IO,
> >.readfn = pmccntr_read, .writefn = pmccntr_write32,
> >.accessfn = pmreg_access_ccntr },
> >  { .name = "PMCCNTR_EL0", .state = ARM_CP_STATE_AA64,
> > --
> 
> Does this fix an observed bug (presumably migration related),
> or is it just something you saw in code inspection ?
> Worth noting in the commit message if the former.

Purely code inspection. I forget now, but I think I found it after
chasing down some other register issue in the vicinity and noticed this
while digging around.

-Aaron

> Reviewed-by: Peter Maydell 
> 
> thanks
> -- PMM

-- 
Qualcomm Datacenter Technologies as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the
Code Aurora Forum, a Linux Foundation Collaborative Project.

Re: [Qemu-devel] [PATCH v3 10/22] target/arm: Allow EL change hooks to do IO

2018-04-12 Thread Peter Maydell

On 16 March 2018 at 20:31, Aaron Lindsay  wrote:
> During code generation, surround CPSR writes and exception returns which
> call the EL change hooks with gen_io_start/end. The immediate need is
> for the PMU to access the clock and icount during EL change to support
> mode filtering.
>
> Signed-off-by: Aaron Lindsay 
> ---
>  target/arm/translate-a64.c | 2 ++
>  target/arm/translate.c | 4 
>  2 files changed, 6 insertions(+)
>
> diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
> index 31ff047..e1ae676 100644
> --- a/target/arm/translate-a64.c
> +++ b/target/arm/translate-a64.c
> @@ -1919,7 +1919,9 @@ static void disas_uncond_b_reg(DisasContext *s, 
> uint32_t insn)
>  unallocated_encoding(s);
>  return;
>  }
> +gen_io_start();
>  gen_helper_exception_return(cpu_env);
> +gen_io_end();

You don't want to call gen_io_start() or gen_io_end() unless
tb_cflags(s->base.tb) & CF_USE_ICOUNT) is true.

(Ditto in the other cases below.)

>  /* Must exit loop to check un-masked IRQs */
>  s->base.is_jmp = DISAS_EXIT;
>  return;
> diff --git a/target/arm/translate.c b/target/arm/translate.c
> index ba6ab7d..fd5871e 100644
> --- a/target/arm/translate.c
> +++ b/target/arm/translate.c
> @@ -4536,7 +4536,9 @@ static void gen_rfe(DisasContext *s, TCGv_i32 pc, 
> TCGv_i32 cpsr)
>   * appropriately depending on the new Thumb bit, so it must
>   * be called after storing the new PC.
>   */
> +gen_io_start();
>  gen_helper_cpsr_write_eret(cpu_env, cpsr);
> +gen_io_end();
>  tcg_temp_free_i32(cpsr);
>  /* Must exit loop to check un-masked IRQs */
>  s->base.is_jmp = DISAS_EXIT;
> @@ -9828,7 +9830,9 @@ static void disas_arm_insn(DisasContext *s, unsigned 
> int insn)
>  if (exc_return) {
>  /* Restore CPSR from SPSR.  */
>  tmp = load_cpu_field(spsr);
> +gen_io_start();
>  gen_helper_cpsr_write_eret(cpu_env, tmp);
> +gen_io_end();
>  tcg_temp_free_i32(tmp);
>  /* Must exit loop to check un-masked IRQs */
>  s->base.is_jmp = DISAS_EXIT;
> --
> Qualcomm Datacenter Technologies as an affiliate of Qualcomm Technologies, 
> Inc.
> Qualcomm Technologies, Inc. is a member of the
> Code Aurora Forum, a Linux Foundation Collaborative Project.
>

thanks
-- PMM

Re: [Qemu-devel] [PATCH v3 09/22] target/arm: Add pre-EL change hooks

2018-04-12 Thread Peter Maydell

On 16 March 2018 at 20:31, Aaron Lindsay  wrote:
> Because the design of the PMU requires that the counter values be
> converted between their delta and guest-visible forms for mode
> filtering, an additional hook which occurs before the EL is changed is
> necessary.
>
> Signed-off-by: Aaron Lindsay 
> ---
>  target/arm/cpu.c   | 13 +
>  target/arm/cpu.h   | 12 
>  target/arm/helper.c| 14 --
>  target/arm/internals.h |  7 +++
>  target/arm/op_helper.c |  8 
>  5 files changed, 44 insertions(+), 10 deletions(-)
>
> diff --git a/target/arm/cpu.c b/target/arm/cpu.c
> index 5f782bf..a2cb21e 100644
> --- a/target/arm/cpu.c
> +++ b/target/arm/cpu.c
> @@ -55,6 +55,18 @@ static bool arm_cpu_has_work(CPUState *cs)
>   | CPU_INTERRUPT_EXITTB);
>  }
>
> +void arm_register_pre_el_change_hook(ARMCPU *cpu, ARMELChangeHookFn *hook,
> + void *opaque)
> +{
> +ARMELChangeHook *entry;
> +entry = g_malloc0(sizeof (*entry));

g_new0().

> +
> +entry->hook = hook;
> +entry->opaque = opaque;
> +
> +QLIST_INSERT_HEAD(>pre_el_change_hooks, entry, node);
> +}
> +
>  void arm_register_el_change_hook(ARMCPU *cpu, ARMELChangeHookFn *hook,
>   void *opaque)
>  {
> @@ -747,6 +759,7 @@ static void arm_cpu_realizefn(DeviceState *dev, Error 
> **errp)
>  return;
>  }
>
> +QLIST_INIT(>pre_el_change_hooks);
>  QLIST_INIT(>el_change_hooks);
>
>  /* Some features automatically imply others: */
> diff --git a/target/arm/cpu.h b/target/arm/cpu.h
> index 3b45d3d..b0ef727 100644
> --- a/target/arm/cpu.h
> +++ b/target/arm/cpu.h
> @@ -832,6 +832,7 @@ struct ARMCPU {
>   */
>  bool cfgend;
>
> +QLIST_HEAD(, ARMELChangeHook) pre_el_change_hooks;
>  QLIST_HEAD(, ARMELChangeHook) el_change_hooks;
>
>  int32_t node_id; /* NUMA node this CPU belongs to */
> @@ -2895,12 +2896,15 @@ static inline AddressSpace *arm_addressspace(CPUState 
> *cs, MemTxAttrs attrs)
>  #endif
>
>  /**
> + * arm_register_pre_el_change_hook:
>   * arm_register_el_change_hook:
> - * Register a hook function which will be called back whenever this
> - * CPU changes exception level or mode. The hook function will be
> - * passed a pointer to the ARMCPU and the opaque data pointer passed
> - * to this function when the hook was registered.
> + * Register a hook function which will be called back before or after this 
> CPU
> + * changes exception level or mode. The hook function will be passed a 
> pointer
> + * to the ARMCPU and the opaque data pointer passed to this function when the
> + * hook was registered.
>   */

I would just have one doc comment for each function, rather than
trying to share. (Some day we may actually autogenerate HTML docs
from these comments...)

Do we make the guarantee that if we call the pre-change hook
then we will definitely subsequently call the post-change hook?
(ie does the PMU hook rely on that?)

Otherwise looks OK.

thanks
-- PMM

Re: [Qemu-devel] [PATCH v3 11/22] target/arm: Fix bitmask for PMCCFILTR writes

2018-04-12 Thread Peter Maydell

On 16 March 2018 at 20:31, Aaron Lindsay  wrote:
> It was shifted to the left one bit too few.
>
> Signed-off-by: Aaron Lindsay 
> ---
>  target/arm/helper.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/target/arm/helper.c b/target/arm/helper.c
> index 50eaed7..0102357 100644
> --- a/target/arm/helper.c
> +++ b/target/arm/helper.c
> @@ -1123,7 +1123,7 @@ static void pmccfiltr_write(CPUARMState *env, const 
> ARMCPRegInfo *ri,
>  uint64_t value)
>  {
>  uint64_t saved_cycles = pmccntr_op_start(env);
> -env->cp15.pmccfiltr_el0 = value & 0x7E00;
> +env->cp15.pmccfiltr_el0 = value & 0xfc00;
>  pmccntr_op_finish(env, saved_cycles);
>  }
>

I wonder why we got that one wrong.

Reviewed-by: Peter Maydell 

Strictly speaking, bit 26 (M) should be visible only in
the AArch64 view of the register, not the AArch32 one,
but that's a separate issue.

thanks
-- PMM

[Qemu-devel] [PATCH for-2.13 4/4] arm/boot: split load_dtb() from arm_load_kernel()

2018-04-12 Thread Igor Mammedov

load_dtb() depends on arm_load_kernel() to figure out place
in RAM where it should be loaded, but it's not required for
arm_load_kernel() to work. Sometimes it's neccesary for
devices added with -device/device_add to be enumerated in
DTB as well, which's lead to [1] and surrounding commits to
add 2 more machine_done notifiers with non obvious ordering
to make dynamic sysbus devices initialization happen in
the right order.

However instead of moving whole arm_load_kernel() in to
machine_done, it's sufficient to move only load_dtb() into
virt_machine_done() notifier and remove ArmLoadKernelNotifier/
/PlatformBusFDTNotifierParams notifiers, which saves us ~90LOC
and simplifies code flow quite a bit.
Later would allow to consolidate DTB generation within one
function for 'mach-virt' board and make it reentrant so it
could generate updated DTB in device hotplug secenarios.

While at it rename load_dtb() to arm_load_dtb() since it's
public now.

Add additional field skip_dtb_autoload to struct arm_boot_info
to allow manual DTB load later in mach-virt and to avoid touching
all other boards to explicitly call arm_load_dtb().

 1) (ac9d32e hw/arm/boot: arm_load_kernel implemented as a machine init done 
notifier)

Signed-off-by: Igor Mammedov 
---
 include/hw/arm/arm.h| 37 ++---
 include/hw/arm/sysbus-fdt.h | 37 +
 hw/arm/boot.c   | 67 +++--
 hw/arm/sysbus-fdt.c | 64 ---
 hw/arm/virt.c   | 64 ---
 5 files changed, 86 insertions(+), 183 deletions(-)

diff --git a/include/hw/arm/arm.h b/include/hw/arm/arm.h
index 188d18b..312e533 100644
--- a/include/hw/arm/arm.h
+++ b/include/hw/arm/arm.h
@@ -39,15 +39,6 @@ DeviceState *armv7m_init(MemoryRegion *system_memory, int 
mem_size, int num_irq,
  */
 void armv7m_load_kernel(ARMCPU *cpu, const char *kernel_filename, int 
mem_size);
 
-/*
- * struct used as a parameter of the arm_load_kernel machine init
- * done notifier
- */
-typedef struct {
-Notifier notifier; /* actual notifier */
-ARMCPU *cpu; /* handle to the first cpu object */
-} ArmLoadKernelNotifier;
-
 /* arm_boot.c */
 struct arm_boot_info {
 uint64_t ram_size;
@@ -56,6 +47,9 @@ struct arm_boot_info {
 const char *initrd_filename;
 const char *dtb_filename;
 hwaddr loader_start;
+hwaddr dtb_start;
+hwaddr dtb_limit;
+bool skip_dtb_autoload;
 /* multicore boards that use the default secondary core boot functions
  * need to put the address of the secondary boot code, the boot reg,
  * and the GIC address in the next 3 values, respectively. boards that
@@ -95,7 +89,6 @@ struct arm_boot_info {
  */
 void (*modify_dtb)(const struct arm_boot_info *info, void *fdt);
 /* machine init done notifier executing arm_load_dtb */
-ArmLoadKernelNotifier load_kernel_notifier;
 /* Used internally by arm_boot.c */
 int is_linux;
 hwaddr initrd_start;
@@ -145,6 +138,30 @@ void arm_load_kernel(ARMCPU *cpu, struct arm_boot_info 
*info);
 
 AddressSpace *arm_boot_address_space(ARMCPU *cpu, bool secure_boot);
 
+/**
+ * arm_load_dtb() - load a device tree binary image into memory
+ * @addr:   the address to load the image at
+ * @binfo:  struct describing the boot environment
+ * @addr_limit: upper limit of the available memory area at @addr
+ * @as: address space to load image to
+ *
+ * Load a device tree supplied by the machine or by the user  with the
+ * '-dtb' command line option, and put it at offset @addr in target
+ * memory.
+ *
+ * If @addr_limit contains a meaningful value (i.e., it is strictly greater
+ * than @addr), the device tree is only loaded if its size does not exceed
+ * the limit.
+ *
+ * Returns: the size of the device tree image on success,
+ *  0 if the image size exceeds the limit,
+ *  -1 on errors.
+ *
+ * Note: Must not be called unless have_dtb(binfo) is true.
+ */
+int arm_load_dtb(hwaddr addr, const struct arm_boot_info *binfo,
+ hwaddr addr_limit, AddressSpace *as);
+
 /* Write a secure board setup routine with a dummy handler for SMCs */
 void arm_write_secure_board_setup_dummy_smc(ARMCPU *cpu,
 const struct arm_boot_info *info,
diff --git a/include/hw/arm/sysbus-fdt.h b/include/hw/arm/sysbus-fdt.h
index e15bb81..340c382 100644
--- a/include/hw/arm/sysbus-fdt.h
+++ b/include/hw/arm/sysbus-fdt.h
@@ -24,37 +24,14 @@
 #ifndef HW_ARM_SYSBUS_FDT_H
 #define HW_ARM_SYSBUS_FDT_H
 
-#include "hw/arm/arm.h"
-#include "qemu-common.h"
-#include "hw/sysbus.h"
-
-/*
- * struct that contains dimensioning parameters of the platform bus
- */
-typedef struct {
-hwaddr platform_bus_base; /* start address of the bus */
-hwaddr platform_bus_size; /* size of the bus */
-int platform_bus_first_irq; /* first hwirq assigned

[Qemu-devel] [PATCH for-2.13 2/4] platform-bus-device: use device plug callback instead of machine_done notifier

2018-04-12 Thread Igor Mammedov

platform-bus were using machine_done notifier to get and map
(assign irq/mmio resources) dynamically added sysbus devices
after all '-device' options had been processed.
That however creates non obvious dependencies on ordering of
machine_done notifiers and requires carefull line juggling
to keep it working. For example see comment above
create_platform_bus() and 'straitforward' arm_load_kernel()
had to converted to machine_done notifier and that lead to
yet another machine_done notifier to keep it working
arm_register_platform_bus_fdt_creator().

Instead of hiding resource assignment in platform-bus-device
to magically initialize sysbus devices, use device plug
callback and assign resources explicitly at board level
at the moment each -device option is being processed.

That adds a bunch of machine declaration boiler plate to
e500plat board, similar to ARM/x86 but gets rid of hidden
machine_done notifier and would allow to remove the dependent
notifiers in ARM code simplifying it and making code flow
easier to follow.

Signed-off-by: Igor Mammedov 
---
CC: ag...@suse.de
CC: da...@gibson.dropbear.id.au
CC: qemu-...@nongnu.org
---
 hw/ppc/e500.h |  3 +++
 include/hw/arm/virt.h |  3 +++
 include/hw/platform-bus.h |  4 ++--
 hw/arm/sysbus-fdt.c   |  3 ---
 hw/arm/virt.c | 36 
 hw/core/platform-bus.c| 29 ---
 hw/ppc/e500.c | 37 +
 hw/ppc/e500plat.c | 60 +--
 8 files changed, 129 insertions(+), 46 deletions(-)

diff --git a/hw/ppc/e500.h b/hw/ppc/e500.h
index 70ba1d8..d0f8ddd 100644
--- a/hw/ppc/e500.h
+++ b/hw/ppc/e500.h
@@ -2,6 +2,7 @@
 #define PPCE500_H
 
 #include "hw/boards.h"
+#include "hw/sysbus.h"
 
 typedef struct PPCE500Params {
 int pci_first_slot;
@@ -26,6 +27,8 @@ typedef struct PPCE500Params {
 
 void ppce500_init(MachineState *machine, PPCE500Params *params);
 
+void ppce500_plug_dynamic_sysbus_device(SysBusDevice *sbdev);
+
 hwaddr booke206_page_size_to_tlb(uint64_t size);
 
 #endif
diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
index ba0c1a4..5535760 100644
--- a/include/hw/arm/virt.h
+++ b/include/hw/arm/virt.h
@@ -86,11 +86,14 @@ typedef struct {
 bool no_pmu;
 bool claim_edge_triggered_timers;
 bool smbios_old_sys_ver;
+HotplugHandler *(*get_hotplug_handler)(MachineState *machine,
+   DeviceState *dev);
 } VirtMachineClass;
 
 typedef struct {
 MachineState parent;
 Notifier machine_done;
+DeviceState *platform_bus_dev;
 FWCfgState *fw_cfg;
 bool secure;
 bool highmem;
diff --git a/include/hw/platform-bus.h b/include/hw/platform-bus.h
index a00775c..19e20c5 100644
--- a/include/hw/platform-bus.h
+++ b/include/hw/platform-bus.h
@@ -37,8 +37,6 @@ typedef struct PlatformBusDevice PlatformBusDevice;
 struct PlatformBusDevice {
 /*< private >*/
 SysBusDevice parent_obj;
-Notifier notifier;
-bool done_gathering;
 
 /*< public >*/
 uint32_t mmio_size;
@@ -54,4 +52,6 @@ int platform_bus_get_irqn(PlatformBusDevice *platform_bus, 
SysBusDevice *sbdev,
 hwaddr platform_bus_get_mmio_addr(PlatformBusDevice *pbus, SysBusDevice *sbdev,
   int n);
 
+void platform_bus_link_device(PlatformBusDevice *pbus, SysBusDevice *sbdev);
+
 #endif /* HW_PLATFORM_BUS_H */
diff --git a/hw/arm/sysbus-fdt.c b/hw/arm/sysbus-fdt.c
index d68e3dc..80ff70e 100644
--- a/hw/arm/sysbus-fdt.c
+++ b/hw/arm/sysbus-fdt.c
@@ -506,9 +506,6 @@ static void 
add_all_platform_bus_fdt_nodes(ARMPlatformBusFDTParams *fdt_params)
 dev = qdev_find_recursive(sysbus_get_default(), TYPE_PLATFORM_BUS_DEVICE);
 pbus = PLATFORM_BUS_DEVICE(dev);
 
-/* We can only create dt nodes for dynamic devices when they're ready */
-assert(pbus->done_gathering);
-
 PlatformBusFDTData data = {
 .fdt = fdt,
 .irq_start = irq_start,
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 94dcb12..2e10d8b 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -1087,6 +1087,7 @@ static void create_platform_bus(VirtMachineState *vms, 
qemu_irq *pic)
 qdev_prop_set_uint32(dev, "mmio_size",
 platform_bus_params.platform_bus_size);
 qdev_init_nofail(dev);
+vms->platform_bus_dev = dev;
 s = SYS_BUS_DEVICE(dev);
 
 for (i = 0; i < platform_bus_params.platform_bus_num_irqs; i++) {
@@ -1536,9 +1537,37 @@ static const CPUArchIdList 
*virt_possible_cpu_arch_ids(MachineState *ms)
 return ms->possible_cpus;
 }
 
+static void virt_machine_device_plug_cb(HotplugHandler *hotplug_dev,
+DeviceState *dev, Error **errp)
+{
+VirtMachineState *vms = VIRT_MACHINE(hotplug_dev);
+
+if (object_dynamic_cast(OBJECT(dev), TYPE_SYS_BUS_DEVICE)) {
+if (vms->platform_bus_dev) {
+

[Qemu-devel] [PATCH for-2.13 1/4] arm: reuse arm_boot_address_space() in armv7m_load_kernel()

2018-04-12 Thread Igor Mammedov

reduce code duplication by resusing arm_boot_address_space()

Signed-off-by: Igor Mammedov 
---
 include/hw/arm/arm.h |  2 ++
 hw/arm/armv7m.c  | 10 +-
 hw/arm/boot.c| 16 
 3 files changed, 11 insertions(+), 17 deletions(-)

diff --git a/include/hw/arm/arm.h b/include/hw/arm/arm.h
index ce769bd..188d18b 100644
--- a/include/hw/arm/arm.h
+++ b/include/hw/arm/arm.h
@@ -143,6 +143,8 @@ struct arm_boot_info {
  */
 void arm_load_kernel(ARMCPU *cpu, struct arm_boot_info *info);
 
+AddressSpace *arm_boot_address_space(ARMCPU *cpu, bool secure_boot);
+
 /* Write a secure board setup routine with a dummy handler for SMCs */
 void arm_write_secure_board_setup_dummy_smc(ARMCPU *cpu,
 const struct arm_boot_info *info,
diff --git a/hw/arm/armv7m.c b/hw/arm/armv7m.c
index f123cc7..d372d9c 100644
--- a/hw/arm/armv7m.c
+++ b/hw/arm/armv7m.c
@@ -289,8 +289,6 @@ void armv7m_load_kernel(ARMCPU *cpu, const char 
*kernel_filename, int mem_size)
 uint64_t lowaddr;
 int big_endian;
 AddressSpace *as;
-int asidx;
-CPUState *cs = CPU(cpu);
 
 #ifdef TARGET_WORDS_BIGENDIAN
 big_endian = 1;
@@ -303,13 +301,7 @@ void armv7m_load_kernel(ARMCPU *cpu, const char 
*kernel_filename, int mem_size)
 exit(1);
 }
 
-if (arm_feature(>env, ARM_FEATURE_EL3)) {
-asidx = ARMASIdx_S;
-} else {
-asidx = ARMASIdx_NS;
-}
-as = cpu_get_address_space(cs, asidx);
-
+as = arm_boot_address_space(cpu, true);
 if (kernel_filename) {
 image_size = load_elf_as(kernel_filename, NULL, NULL, , ,
  NULL, big_endian, EM_ARM, 1, 0, as);
diff --git a/hw/arm/boot.c b/hw/arm/boot.c
index 26184bc..2f464ca 100644
--- a/hw/arm/boot.c
+++ b/hw/arm/boot.c
@@ -36,8 +36,7 @@
 #define ARM64_TEXT_OFFSET_OFFSET8
 #define ARM64_MAGIC_OFFSET  56
 
-static AddressSpace *arm_boot_address_space(ARMCPU *cpu,
-const struct arm_boot_info *info)
+AddressSpace *arm_boot_address_space(ARMCPU *cpu, bool secure_boot)
 {
 /* Return the address space to use for bootloader reads and writes.
  * We prefer the secure address space if the CPU has it and we're
@@ -46,7 +45,7 @@ static AddressSpace *arm_boot_address_space(ARMCPU *cpu,
 int asidx;
 CPUState *cs = CPU(cpu);
 
-if (arm_feature(>env, ARM_FEATURE_EL3) && info->secure_boot) {
+if (arm_feature(>env, ARM_FEATURE_EL3) && secure_boot) {
 asidx = ARMASIdx_S;
 } else {
 asidx = ARMASIdx_NS;
@@ -193,7 +192,7 @@ static void default_write_secondary(ARMCPU *cpu,
 const struct arm_boot_info *info)
 {
 uint32_t fixupcontext[FIXUP_MAX];
-AddressSpace *as = arm_boot_address_space(cpu, info);
+AddressSpace *as = arm_boot_address_space(cpu, info->secure_boot);
 
 fixupcontext[FIXUP_GIC_CPU_IF] = info->gic_cpu_if_addr;
 fixupcontext[FIXUP_BOOTREG] = info->smp_bootreg_addr;
@@ -211,7 +210,7 @@ void arm_write_secure_board_setup_dummy_smc(ARMCPU *cpu,
 const struct arm_boot_info *info,
 hwaddr mvbar_addr)
 {
-AddressSpace *as = arm_boot_address_space(cpu, info);
+AddressSpace *as = arm_boot_address_space(cpu, info->secure_boot);
 int n;
 uint32_t mvbar_blob[] = {
 /* mvbar_addr: secure monitor vectors
@@ -262,7 +261,7 @@ void arm_write_secure_board_setup_dummy_smc(ARMCPU *cpu,
 static void default_reset_secondary(ARMCPU *cpu,
 const struct arm_boot_info *info)
 {
-AddressSpace *as = arm_boot_address_space(cpu, info);
+AddressSpace *as = arm_boot_address_space(cpu, info->secure_boot);
 CPUState *cs = CPU(cpu);
 
 address_space_stl_notdirty(as, info->smp_bootreg_addr,
@@ -753,7 +752,8 @@ static void do_cpu_reset(void *opaque)
 }
 
 if (cs == first_cpu) {
-AddressSpace *as = arm_boot_address_space(cpu, info);
+AddressSpace *as =
+arm_boot_address_space(cpu, info->secure_boot);
 
 cpu_set_pc(cs, info->loader_start);
 
@@ -950,7 +950,7 @@ static void arm_load_kernel_notify(Notifier *notifier, void 
*data)
 ARMCPU *cpu = n->cpu;
 struct arm_boot_info *info =
 container_of(n, struct arm_boot_info, load_kernel_notifier);
-AddressSpace *as = arm_boot_address_space(cpu, info);
+AddressSpace *as = arm_boot_address_space(cpu, info->secure_boot);
 
 /* The board code is not supposed to set secure_board_setup unless
  * running its code in secure mode is actually possible, and KVM
-- 
2.7.4

[Qemu-devel] [PATCH for-2.13 3/4] arm: always start from first_cpu when registering loader cpu reset callback

2018-04-12 Thread Igor Mammedov

if arm_load_kernel() were passed non first_cpu, QEMU would end up
with partially set do_cpu_reset() callback leaving some CPUs without it.

Make sure that do_cpu_reset() is registered for all CPUs by enumerating
CPUs from first_cpu.

Signed-off-by: Igor Mammedov 
---
 hw/arm/boot.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/arm/boot.c b/hw/arm/boot.c
index 2f464ca..2591fee 100644
--- a/hw/arm/boot.c
+++ b/hw/arm/boot.c
@@ -1188,7 +1188,7 @@ void arm_load_kernel(ARMCPU *cpu, struct arm_boot_info 
*info)
  * actually loading a kernel, the handler is also responsible for
  * arranging that we start it correctly.
  */
-for (cs = CPU(cpu); cs; cs = CPU_NEXT(cs)) {
+for (cs = first_cpu; cs; cs = CPU_NEXT(cs)) {
 qemu_register_reset(do_cpu_reset, ARM_CPU(cs));
 }
 }
-- 
2.7.4

[Qemu-devel] [PATCH for-2.13 0/4] arm: isolate and clean up dtb generation

2018-04-12 Thread Igor Mammedov


While working on cpu hotplug for mach-virt, I've noticed that DTB
is generated incrementally across whole machvirt_init(). While it's
fine for machines with static DTB, it won't really work in case of
cpu hotplug and followed up reset since machine will load old DTB
that doesn't account for hotplugged CPUs.
So I've set on a quest to consolidate DTB generation and make it
reentrant so that on reset guest would see update DTB.

It's preliminary series which makes possible to call load_dtb()
later outside of arm_load_kernel() and in process of it drops
several machine_done notifiers, that were introduced to make
plaform-bus-devices work. Hopefully it makes code easier to follow.
It replaces machine_done notifiers with device hotplug framework
to allow for dynamic sysbus devices mapping at the moment they
are created instead of waiting for machine_done time and trying to
juggle with notifiers order to do initialization in propper order.

Mostly 'make check' tested +
manually with "ppce500" machine type and eTSEC device
(eTSEC is still initialized with the same IRQs as before series)


CC: qemu-...@nongnu.org
CC: peter.mayd...@linaro.org
CC: eric.au...@redhat.com

Igor Mammedov (4):
  arm: reuse arm_boot_address_space() in armv7m_load_kernel()
  platform-bus-device: use device plug callback instead of machine_done
notifier
  arm: always start from first_cpu when registering loader cpu reset
callback
  arm/boot: split load_dtb() from arm_load_kernel()

 hw/ppc/e500.h   |   3 ++
 include/hw/arm/arm.h|  39 -
 include/hw/arm/sysbus-fdt.h |  37 
 include/hw/arm/virt.h   |   3 ++
 include/hw/platform-bus.h   |   4 +-
 hw/arm/armv7m.c |  10 +
 hw/arm/boot.c   |  85 +++--
 hw/arm/sysbus-fdt.c |  67 +++--
 hw/arm/virt.c   | 100 +---
 hw/core/platform-bus.c  |  29 +++--
 hw/ppc/e500.c   |  37 +---
 hw/ppc/e500plat.c   |  60 +-
 12 files changed, 227 insertions(+), 247 deletions(-)

-- 
2.7.4

Re: [Qemu-devel] [PATCH v3 08/22] target/arm: Support multiple EL change hooks

2018-04-12 Thread Peter Maydell

On 16 March 2018 at 20:31, Aaron Lindsay  wrote:
> Signed-off-by: Aaron Lindsay 
> ---
>  target/arm/cpu.c   | 15 ++-
>  target/arm/cpu.h   | 23 ---
>  target/arm/internals.h |  7 ---
>  3 files changed, 26 insertions(+), 19 deletions(-)
>
> diff --git a/target/arm/cpu.c b/target/arm/cpu.c
> index 072cbbf..5f782bf 100644
> --- a/target/arm/cpu.c
> +++ b/target/arm/cpu.c
> @@ -55,13 +55,16 @@ static bool arm_cpu_has_work(CPUState *cs)
>   | CPU_INTERRUPT_EXITTB);
>  }
>
> -void arm_register_el_change_hook(ARMCPU *cpu, ARMELChangeHook *hook,
> +void arm_register_el_change_hook(ARMCPU *cpu, ARMELChangeHookFn *hook,
>   void *opaque)
>  {
> -/* We currently only support registering a single hook function */
> -assert(!cpu->el_change_hook);
> -cpu->el_change_hook = hook;
> -cpu->el_change_hook_opaque = opaque;
> +ARMELChangeHook *entry;
> +entry = g_malloc0(sizeof (*entry));

g_new0(ARMELChangeHook, 1) is nicer for initializing a thing of
known size.

> +
> +entry->hook = hook;
> +entry->opaque = opaque;
> +
> +QLIST_INSERT_HEAD(>el_change_hooks, entry, node);
>  }
>
>  static void cp_reg_reset(gpointer key, gpointer value, gpointer opaque)
> @@ -744,6 +747,8 @@ static void arm_cpu_realizefn(DeviceState *dev, Error 
> **errp)
>  return;
>  }
>
> +QLIST_INIT(>el_change_hooks);
> +

If you put this in arm_cpu_initfn() you don't have to wonder
whether anybody's calling arm_register_el_change_hook() after
the CPU object has been created but before it is realized...

>  /* Some features automatically imply others: */
>  if (arm_feature(env, ARM_FEATURE_V8)) {
>  set_feature(env, ARM_FEATURE_V7);
> diff --git a/target/arm/cpu.h b/target/arm/cpu.h
> index f17592b..3b45d3d 100644
> --- a/target/arm/cpu.h
> +++ b/target/arm/cpu.h
> @@ -633,11 +633,17 @@ typedef struct CPUARMState {
>
>  /**
>   * ARMELChangeHook:
> - * type of a function which can be registered via 
> arm_register_el_change_hook()
> - * to get callbacks when the CPU changes its exception level or mode.
> + * Support registering functions with ARMELChangeHookFn's signature via
> + * arm_register_el_change_hook() to get callbacks when the CPU changes its
> + * exception level or mode.
>   */

This is supposed to be the doc comment for the type,
which I think was fine as it is, apart from the name change.
If you want to also have a doc comment for the struct type
that's fine but would be something different (and is less
necessary, because the struct type is internal to the CPU
whereas the function type is part of its API to the rest of
the system).

> -typedef void ARMELChangeHook(ARMCPU *cpu, void *opaque);
> -
> +typedef void ARMELChangeHookFn(ARMCPU *cpu, void *opaque);
> +typedef struct ARMELChangeHook ARMELChangeHook;
> +struct ARMELChangeHook {
> +ARMELChangeHookFn *hook;
> +void *opaque;
> +QLIST_ENTRY(ARMELChangeHook) node;
> +};
>
>  /* These values map onto the return values for
>   * QEMU_PSCI_0_2_FN_AFFINITY_INFO */
> @@ -826,8 +832,7 @@ struct ARMCPU {
>   */
>  bool cfgend;
>
> -ARMELChangeHook *el_change_hook;
> -void *el_change_hook_opaque;
> +QLIST_HEAD(, ARMELChangeHook) el_change_hooks;
>
>  int32_t node_id; /* NUMA node this CPU belongs to */
>
> @@ -2895,12 +2900,8 @@ static inline AddressSpace *arm_addressspace(CPUState 
> *cs, MemTxAttrs attrs)
>   * CPU changes exception level or mode. The hook function will be
>   * passed a pointer to the ARMCPU and the opaque data pointer passed
>   * to this function when the hook was registered.
> - *
> - * Note that we currently only support registering a single hook function,
> - * and will assert if this function is called twice.
> - * This facility is intended for the use of the GICv3 emulation.
>   */
> -void arm_register_el_change_hook(ARMCPU *cpu, ARMELChangeHook *hook,
> +void arm_register_el_change_hook(ARMCPU *cpu, ARMELChangeHookFn *hook,
>   void *opaque);
>
>  /**
> diff --git a/target/arm/internals.h b/target/arm/internals.h
> index 47cc224..7df3eda 100644
> --- a/target/arm/internals.h
> +++ b/target/arm/internals.h
> @@ -727,11 +727,12 @@ void arm_cpu_do_transaction_failed(CPUState *cs, hwaddr 
> physaddr,
> int mmu_idx, MemTxAttrs attrs,
> MemTxResult response, uintptr_t retaddr);
>
> -/* Call the EL change hook if one has been registered */
> +/* Call any registered EL change hooks */
>  static inline void arm_call_el_change_hook(ARMCPU *cpu)
>  {
> -if (cpu->el_change_hook) {
> -cpu->el_change_hook(cpu, cpu->el_change_hook_opaque);
> +ARMELChangeHook *hook, *next;
> +QLIST_FOREACH_SAFE(hook, >el_change_hooks, node, next) {
> +hook->hook(cpu, hook->opaque);
>  }
>  }
>

thanks
-- PMM

Re: [Qemu-devel] [PATCH v3 07/22] target/arm: Fetch GICv3 state directly from CPUARMState

2018-04-12 Thread Peter Maydell

On 16 March 2018 at 20:31, Aaron Lindsay  wrote:
> This eliminates the need for fetching it from el_change_hook_opaque, and
> allows for supporting multiple el_change_hooks without having to hack
> something together to find the registered opaque belonging to GICv3.
>
> Signed-off-by: Aaron Lindsay 
> ---
>  hw/intc/arm_gicv3_cpuif.c | 10 ++
>  target/arm/cpu.h  | 10 --
>  2 files changed, 2 insertions(+), 18 deletions(-)

I'm not wonderfully happy about this, because the original aim
here was to try to keep the GICv3 device end reasonably decoupled
from the CPU proper. On the other hand since d3a3e529626fb we've had
this pointer because we ended up needing it in the KVM reset codepath,
so we might as well use it here too. Maybe one day I'll come back
to this and clean it up.

Reviewed-by: Peter Maydell 

thanks
-- PMM

Re: [Qemu-devel] [PATCH v3 06/22] target/arm: Mask PMU register writes based on PMCR_EL0.N

2018-04-12 Thread Peter Maydell

On 16 March 2018 at 20:31, Aaron Lindsay  wrote:
> This is in preparation for enabling counters other than PMCCNTR
>
> Signed-off-by: Aaron Lindsay 
> ---
>  target/arm/helper.c | 24 +++-
>  1 file changed, 15 insertions(+), 9 deletions(-)
>
> diff --git a/target/arm/helper.c b/target/arm/helper.c
> index 6480b80..5d5c738 100644
> --- a/target/arm/helper.c
> +++ b/target/arm/helper.c
> @@ -52,11 +52,6 @@ typedef struct V8M_SAttributes {
>  static void v8m_security_lookup(CPUARMState *env, uint32_t address,
>  MMUAccessType access_type, ARMMMUIdx mmu_idx,
>  V8M_SAttributes *sattrs);
> -
> -/* Definitions for the PMCCNTR and PMCR registers */
> -#define PMCRD   0x8
> -#define PMCRC   0x4
> -#define PMCRE   0x1
>  #endif
>
>  static int vfp_gdb_get_reg(CPUARMState *env, uint8_t *buf, int reg)
> @@ -906,6 +901,17 @@ static const ARMCPRegInfo v6_cp_reginfo[] = {
>  REGINFO_SENTINEL
>  };
>
> +/* Definitions for the PMU registers */
> +#define PMCRN_MASK  0xf800
> +#define PMCRN_SHIFT 11
> +#define PMCRD   0x8
> +#define PMCRC   0x4
> +#define PMCRE   0x1
> +
> +#define PMU_NUM_COUNTERS(env) ((env->cp15.c9_pmcr & PMCRN_MASK) >> 
> PMCRN_SHIFT)
> +/* Bits allowed to be set/cleared for PMCNTEN* and PMINTEN* */
> +#define PMU_COUNTER_MASK(env) ((1 << 31) | ((1 << PMU_NUM_COUNTERS(env)) - 
> 1))

These would be better as inline functions I think.

Otherwise
Reviewed-by: Peter Maydell 

thanks
-- PMM

Re: [Qemu-devel] [RFC PATCH] migration: discard RAMBlocks of type ram_device

2018-04-12 Thread Alex Williamson

On Thu, 12 Apr 2018 15:59:24 +
"Zhang, Yulei"  wrote:

> > -Original Message-
> > From: Alex Williamson [mailto:alex.william...@redhat.com]
> > Sent: Thursday, April 12, 2018 1:55 AM
> > To: Cédric Le Goater 
> > Cc: qemu-devel@nongnu.org; Juan Quintela ; Dr .
> > David Alan Gilbert ; David Gibson
> > ; Zhang, Yulei ; Tian,
> > Kevin ; joonas.lahti...@linux.intel.com;
> > zhen...@linux.intel.com; kwankh...@nvidia.com; Wang, Zhi A
> > 
> > Subject: Re: [Qemu-devel] [RFC PATCH] migration: discard RAMBlocks of type
> > ram_device
> > 
> > [cc +folks working on vfio-mdev migration]
> > 
> > On Wed, 11 Apr 2018 19:20:14 +0200
> > Cédric Le Goater  wrote:
> >   
> > > Here is some context for this strange change request.
> > >
> > > On the POWER9 processor, the XIVE interrupt controller can control
> > > interrupt sources using MMIO to trigger events, to EOI or to turn off
> > > the sources. Priority management and interrupt acknowledgment is also
> > > controlled by MMIO in the presenter subengine.
> > >
> > > These MMIO regions are exposed to guests in QEMU with a set of 'ram
> > > device' memory mappings, similarly to VFIO, and the VMAs are populated
> > > dynamically with the appropriate pages using a fault handler.
> > >
> > > But, these regions are an issue for migration. We need to discard the
> > > associated RAMBlocks from the RAM state on the source VM and let the
> > > destination VM rebuild the memory mappings on the new host in the
> > > post_load() operation just before resuming the system.
> > >
> > > This is the goal of the following proposal. Does it make sense ? It
> > > seems to be working enough to migrate a running guest but there might
> > > be a better, more subtle, approach.  
> > 
> > Yulei, is this something you've run into with GVT-g migration?  I don't see
> > how we can read from or write to ram_device regions in a useful way during
> > migration anyway, so the change initially looks correct to me.
> > Thanks,
> > 
> > Alex
> >   
> 
> Didn't meet such issue before. I think the change will be fine if the vendor 
> driver
> handle the reconstruction well on the target side. And I agree with Dave's 
> suggestion,
> how about vendor driver report a flag about the mapped region to indicate it 
> can
> be registered as migratable memory block or not.   

"migration" and "memory blocks" are not vfio concepts, you'd need to
come up with flags that actually conveys the device level property of
the region that you're trying to indicate.  I don't see why we'd do
this though, the application of such a flag seems too narrow and it
tarnishes the concept that a vendor driver provides a region, through
which *all* device state is saved and restored.  Thanks,

Alex

Re: [Qemu-devel] [PATCH] block/gluster: defend on legacy ftruncate api use

2018-04-12 Thread Eric Blake

n 04/12/2018 08:31 AM, Niels de Vos wrote:
> This change looks good to me, but a commit message would have been
> helpful. I suggest something like this:
> 
>   Gluster 4.0 changed the signature of glfs_ftruncate(). The function
>   now has two additional arguments, namely prestat and poststat. These
>   provide not benefit for QEMU, so ignoring them and passing NULL makes
>   the gluster-block driver compile with the new Gluster version again.
> 
> And maybe add this too:
> 
>   Glusters libgfapi uses symbol versioning and provides backwards
>   compatible functions. Binaries compiled against previous versions of
>   Gluster keep on functioning with the new library.
> 

>> @@ -3856,6 +3857,9 @@ if test "$glusterfs" != "no" ; then
>>glusterfs_fallocate="yes"
>>glusterfs_zerofill="yes"
>>  fi
>> +if ! $pkg_config --atleast-version=7.4 glusterfs-api; then
>> +  glusterfs_legacy_ftruncate="yes"

Also, version-based tests are lousy.  Feature-based tests (does a call
to glfs_ftruncate(0, 0) compile without error? then we are legacy) are
less brittle, especially when features can be backported across
versions.  So this should be reworked to a compile check, rather than a
version query.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH v3 05/22] target/arm: Reorganize PMCCNTR read, write, sync

2018-04-12 Thread Peter Maydell

On 16 March 2018 at 20:31, Aaron Lindsay  wrote:
> pmccntr_read and pmccntr_write contained duplicate code that was already
> being handled by pmccntr_sync. Split pmccntr_sync into pmccntr_op_start
> and pmccntr_op_finish, passing the clock value between the two, to avoid
> losing time between the two calls.
>
> Signed-off-by: Aaron Lindsay 
> ---
>  target/arm/helper.c | 101 
> +---
>  1 file changed, 56 insertions(+), 45 deletions(-)
>
> diff --git a/target/arm/helper.c b/target/arm/helper.c
> index 5634561..6480b80 100644
> --- a/target/arm/helper.c
> +++ b/target/arm/helper.c
> @@ -1000,28 +1000,58 @@ static inline bool arm_ccnt_enabled(CPUARMState *env)
>
>  return true;
>  }
> -
> -void pmccntr_sync(CPUARMState *env)

If you configure your git to use the 'histogram' diff algorithm
("git config --global diff.algorithm histogram", or edit ~/.gitconfig
equivalently), does it make git format-patch make less of a mess
of this commit ?

> +/*
> + * Ensure c15_ccnt is the guest-visible count so that operations such as
> + * enabling/disabling the counter or filtering, modifying the count itself,
> + * etc. can be done logically. This is essentially a no-op if the counter is
> + * not enabled at the time of the call.
> + *
> + * The current cycle count is returned so that it can be passed into the 
> paired
> + * pmccntr_op_finish() call which must follow each call to 
> pmccntr_op_start().
> + */
> +uint64_t pmccntr_op_start(CPUARMState *env)
>  {
> -uint64_t temp_ticks;
> -
> -temp_ticks = muldiv64(qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL),
> +uint64_t cycles = 0;
> +#ifndef CONFIG_USER_ONLY
> +cycles = muldiv64(qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL),
>ARM_CPU_FREQ, NANOSECONDS_PER_SECOND);
> +#endif

Is this ifdef necessary? You have a do-nothing version of
pmccntr_op_start() for CONFIG_USER_ONLY later on, so presumably
this one is already inside a suitable ifndef.

Otherwise

Reviewed-by: Peter Maydell 

thanks
-- PMM

Re: [Qemu-devel] [PATCH v3 04/22] target/arm: Treat PMCCNTR as alias of PMCCNTR_EL0

2018-04-12 Thread Peter Maydell

On 16 March 2018 at 20:31, Aaron Lindsay  wrote:
> They share the same underlying state
>
> Signed-off-by: Aaron Lindsay 
> ---
>  target/arm/helper.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/target/arm/helper.c b/target/arm/helper.c
> index 5e48982..5634561 100644
> --- a/target/arm/helper.c
> +++ b/target/arm/helper.c
> @@ -1318,7 +1318,7 @@ static const ARMCPRegInfo v7_cp_reginfo[] = {
>.fieldoffset = offsetof(CPUARMState, cp15.c9_pmselr),
>.writefn = pmselr_write, .raw_writefn = raw_write, },
>  { .name = "PMCCNTR", .cp = 15, .crn = 9, .crm = 13, .opc1 = 0, .opc2 = 0,
> -  .access = PL0_RW, .resetvalue = 0, .type = ARM_CP_IO,
> +  .access = PL0_RW, .resetvalue = 0, .type = ARM_CP_ALIAS | ARM_CP_IO,
>.readfn = pmccntr_read, .writefn = pmccntr_write32,
>.accessfn = pmreg_access_ccntr },
>  { .name = "PMCCNTR_EL0", .state = ARM_CP_STATE_AA64,
> --

Does this fix an observed bug (presumably migration related),
or is it just something you saw in code inspection ?
Worth noting in the commit message if the former.

Reviewed-by: Peter Maydell 

thanks
-- PMM

Re: [Qemu-devel] [PATCH v3 02/22] target/arm: A15 PMCEID0 initialization style nit

2018-04-12 Thread Peter Maydell

On 16 March 2018 at 20:31, Aaron Lindsay  wrote:
> Signed-off-by: Aaron Lindsay 
> ---
>  target/arm/cpu.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/target/arm/cpu.c b/target/arm/cpu.c
> index 022d8c5..072cbbf 100644
> --- a/target/arm/cpu.c
> +++ b/target/arm/cpu.c
> @@ -1524,7 +1524,7 @@ static void cortex_a15_initfn(Object *obj)
>  cpu->id_pfr0 = 0x1131;
>  cpu->id_pfr1 = 0x00011011;
>  cpu->id_dfr0 = 0x02010555;
> -cpu->pmceid0 = 0x000;
> +cpu->pmceid0 = 0x;
>  cpu->pmceid1 = 0x;
>  cpu->id_afr0 = 0x;
>  cpu->id_mmfr0 = 0x10201105;

Reviewed-by: Peter Maydell 

thanks
-- PMM

Re: [Qemu-devel] [RFC PATCH] migration: discard RAMBlocks of type ram_device

2018-04-12 Thread Zhang, Yulei



> -Original Message-
> From: Alex Williamson [mailto:alex.william...@redhat.com]
> Sent: Thursday, April 12, 2018 1:55 AM
> To: Cédric Le Goater 
> Cc: qemu-devel@nongnu.org; Juan Quintela ; Dr .
> David Alan Gilbert ; David Gibson
> ; Zhang, Yulei ; Tian,
> Kevin ; joonas.lahti...@linux.intel.com;
> zhen...@linux.intel.com; kwankh...@nvidia.com; Wang, Zhi A
> 
> Subject: Re: [Qemu-devel] [RFC PATCH] migration: discard RAMBlocks of type
> ram_device
> 
> [cc +folks working on vfio-mdev migration]
> 
> On Wed, 11 Apr 2018 19:20:14 +0200
> Cédric Le Goater  wrote:
> 
> > Here is some context for this strange change request.
> >
> > On the POWER9 processor, the XIVE interrupt controller can control
> > interrupt sources using MMIO to trigger events, to EOI or to turn off
> > the sources. Priority management and interrupt acknowledgment is also
> > controlled by MMIO in the presenter subengine.
> >
> > These MMIO regions are exposed to guests in QEMU with a set of 'ram
> > device' memory mappings, similarly to VFIO, and the VMAs are populated
> > dynamically with the appropriate pages using a fault handler.
> >
> > But, these regions are an issue for migration. We need to discard the
> > associated RAMBlocks from the RAM state on the source VM and let the
> > destination VM rebuild the memory mappings on the new host in the
> > post_load() operation just before resuming the system.
> >
> > This is the goal of the following proposal. Does it make sense ? It
> > seems to be working enough to migrate a running guest but there might
> > be a better, more subtle, approach.
> 
> Yulei, is this something you've run into with GVT-g migration?  I don't see
> how we can read from or write to ram_device regions in a useful way during
> migration anyway, so the change initially looks correct to me.
> Thanks,
> 
> Alex
> 

Didn't meet such issue before. I think the change will be fine if the vendor 
driver
handle the reconstruction well on the target side. And I agree with Dave's 
suggestion,
how about vendor driver report a flag about the mapped region to indicate it can
be registered as migratable memory block or not.   

Yulei
> > Signed-off-by: Cédric Le Goater 
> > ---
> >  migration/ram.c | 42 --
> >  1 file changed, 40 insertions(+), 2 deletions(-)
> >
> > diff --git a/migration/ram.c b/migration/ram.c index
> > 0e90efa09236..6404ccd046d8 100644
> > --- a/migration/ram.c
> > +++ b/migration/ram.c
> > @@ -780,6 +780,10 @@ unsigned long
> migration_bitmap_find_dirty(RAMState *rs, RAMBlock *rb,
> >  unsigned long *bitmap = rb->bmap;
> >  unsigned long next;
> >
> > +if (memory_region_is_ram_device(rb->mr)) {
> > +return size;
> > +}
> > +
> >  if (rs->ram_bulk_stage && start > 0) {
> >  next = start + 1;
> >  } else {
> > @@ -826,6 +830,9 @@ uint64_t ram_pagesize_summary(void)
> >  uint64_t summary = 0;
> >
> >  RAMBLOCK_FOREACH(block) {
> > +if (memory_region_is_ram_device(block->mr)) {
> > +continue;
> > +}
> >  summary |= block->page_size;
> >  }
> >
> > @@ -850,6 +857,9 @@ static void migration_bitmap_sync(RAMState *rs)
> >  qemu_mutex_lock(>bitmap_mutex);
> >  rcu_read_lock();
> >  RAMBLOCK_FOREACH(block) {
> > +if (memory_region_is_ram_device(block->mr)) {
> > +continue;
> > +}
> >  migration_bitmap_sync_range(rs, block, 0, block->used_length);
> >  }
> >  rcu_read_unlock();
> > @@ -1499,6 +1509,10 @@ static int ram_save_host_page(RAMState *rs,
> PageSearchStatus *pss,
> >  size_t pagesize_bits =
> >  qemu_ram_pagesize(pss->block) >> TARGET_PAGE_BITS;
> >
> > +if (memory_region_is_ram_device(pss->block->mr)) {
> > +return 0;
> > +}
> > +
> >  do {
> >  tmppages = ram_save_target_page(rs, pss, last_stage);
> >  if (tmppages < 0) {
> > @@ -1588,6 +1602,9 @@ uint64_t ram_bytes_total(void)
> >
> >  rcu_read_lock();
> >  RAMBLOCK_FOREACH(block) {
> > +if (memory_region_is_ram_device(block->mr)) {
> > +continue;
> > +}
> >  total += block->used_length;
> >  }
> >  rcu_read_unlock();
> > @@ -1643,6 +1660,9 @@ static void ram_save_cleanup(void *opaque)
> >  memory_global_dirty_log_stop();
> >
> >  QLIST_FOREACH_RCU(block, _list.blocks, next) {
> > +if (memory_region_is_ram_device(block->mr)) {
> > +continue;
> > +}
> >  g_free(block->bmap);
> >  block->bmap = NULL;
> >  g_free(block->unsentmap);
> > @@ -1710,6 +1730,9 @@ void
> ram_postcopy_migrated_memory_release(MigrationState *ms)
> >  unsigned long range = block->used_length >> TARGET_PAGE_BITS;
> >  unsigned long run_start

Re: [Qemu-devel] [RFC PATCH V4 4/4] vifo: introduce new VFIO ioctl VFIO_IOMMU_GET_DIRTY_BITMAP

2018-04-12 Thread Eric Blake

On 04/10/2018 01:03 AM, Yulei Zhang wrote:
> New VFIO ioctl VFIO_IOMMU_GET_DIRTY_BITMAP is used to fetch the
> bitmap of pinned memory in iommu container, we need copy those
> memory to the target during the migration as they are dirtied by
> mdev devices.

[meta-comment]

This patch, and the other 3 in the series, were sent with no
'In-Reply-To:' or 'References:' headers; as such, they were not threaded
to the 0/4 cover letter ('Message-Id:
<1523340124-8881-1-git-send-email-yulei.zh...@intel.com>').  In the
future, you'll want to make sure that proper threading is preserved in
the mail messages, as it makes it a lot easier when each message in the
series is not a new top-level thread.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] Withdrawn: Tweak the NBD_OPT_SET_META_CONTEXT layout

2018-04-12 Thread Eric Blake

On 03/31/2018 07:24 AM, Eric Blake wrote:
> As mentioned in earlier email threads, I experimented with an
> alternative layout to the structs sent across the wire during
> NBD_OPT_{LIST,SET}_META_CONTEXT, on the grounds that having the
> namespace and leaf-name combined into one string that requires further
> parsing on each end is not as nice as having two separate fields.  I'm
> replying to this mail with two RFC patches, one for the NBD spec, and
> one for the qemu implementation.
> 
> If we like the idea, this has to go into qemu 2.12, so we have less than
> a week.  Or, we can ignore the RFC, keep qemu the way it is currently
> coded, and when it is released, it will be time to promote the NBD
> extension-blockstatus to current as there will be an existing
> implementation to be interoperable with.

As I got no comments, and as qemu is now frozen hard at 2.12-rc3, it is
officially too late to change the format sent over the wire; qemu's
implementation will be the reference implementation, and for nbd.git,
I'll be promoting extension-blockstatus to the master branch once qemu
2.12 is officially released.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH v3 6/7] s390x/kvm: handle AP instruction interception

2018-04-12 Thread Tony Krowiak


On 04/11/2018 09:50 AM, Halil Pasic wrote:


On 04/11/2018 03:20 PM, Tony Krowiak wrote:

I may be all wrong, though... can we at least have a translation of
ECA.28 and EECA.28 (the "ap is there" bit and the "ap instructions are
interpreted" bit?)


I think we have a misunderstanding here. I will wait for Tony. Maybe
he can understand this better or explain in more accessible way.

  From what I get from your explanation, this approach sounds like a good
way forward. But let's wait for Tony.

I agree, this solves the problem with specifying installing AP facilities
on the guest (-cpu xxx,ap=on) and not configuring a vfio-ap device
(-device vfio-ap,sysfsdev=$path-to-mediated-device).

To recap:
The realize() function for the vfio-ap device opens the mediated matrix
device file descriptor to set up the communication pathway with the vfio_ap
kernel driver. When the fd is opened, the vfio_ap driver sets ECA.28 for the
guest to instruct SIE to interpret AP instructions. The driver also
configures the AP matrix for the guest in its SIE state description. 
Consequently,
if a vfio-ap device is not configured for the guest, but AP facilities are
installed, all AP instructions will be intercepted and routed to QEMU. If there
are no interception handlers for the AP instructions, QEMU injects an operation
exception into the guest. This results in initialization of the AP bus on the
guest to terminate. This patch was intended to circumvent that problem.

With Halil's suggestion, there is no need to provide these handlers. If ECA.28
is set for the guest by default when the AP facilities are installed, then the 
AP
instructions will be interpreted and the AP bus will get initialized on the
guest. Since there is no vfio-ap device to provide the AP matrix configuration
for the guest, the AP bus will not detect any devices, but that's okay. AP
instructions targeting at an APQN will execute successfully and set a response
code in the status block returned from the instruction indicating the
APQN is invalid  but there will be no exception unless there is truly
an exception condition caused by the execution of the instruction.

That's exactly the idea, but it will only work out this way if the effective
ECA.28 bit (i.e. EECA.28) is also set.

Could you please comment the first paragraph quoted in this email?
See my response to Connie's comment in Message ID 
<20180409113244.380a32c4.coh...@redhat.com>.
My response is in Message ID 
<17b4ab9b-a8c7-37ec-bae4-c57b146ba...@linux.vnet.ibm.com>.


Halil

Re: [Qemu-devel] [PATCH v3 6/7] s390x/kvm: handle AP instruction interception

2018-04-12 Thread Tony Krowiak


On 04/09/2018 05:32 AM, Cornelia Huck wrote:

On Fri, 6 Apr 2018 18:07:56 +0200
Halil Pasic  wrote:


On 04/05/2018 06:38 PM, Tony Krowiak wrote:

On 04/03/2018 05:36 AM, Cornelia Huck wrote:

On Mon, 2 Apr 2018 12:36:27 -0400
Tony Krowiak  wrote:
  

On 03/26/2018 05:03 AM, Pierre Morel wrote:

On 26/03/2018 10:32, David Hildenbrand wrote:

On 16.03.2018 00:24, Tony Krowiak wrote:

+/*
+ * The Query Configuration Information (QCI) function (fc == 4)
does not
+ * set a response code in reg 1, so check for that along with the
+ * AP feature.
+ */
+if ((fc != 4) && s390_has_feat(S390_FEAT_AP)) {
+env->regs[1] = 0x1;
+
+return 0;
+}

This would imply an operation exception in case fc==4, which sounds very
wrong.

It depends but I think that the S390_FEAT_AP_QUERY_CONFIG_INFO must be
tested
to know what to answer.
If the feature is there, QCI must be answered correctly.

This is an interesting proposition which raises several issues that will
need to
be addressed. The only time the PQAP(QCI) instruction is intercepted is
when:
* A vfio-ap device is NOT defined for the guest because the vfio_ap
device driver
 will set ECA.28 and the PQAP(QCI) instruction will be interpreted
* STFLE.12 is set for the guest

You say that the QCI must be answered correctly, but what is the correct
response?
If we return the matrix - i.e., APM, ADM and AQM - configured via the
mediated
matrix device's sysfs attributes files, then if any APQNs are defined in
the matrix,
we will have to handle *ALL* AP instructions by executing them on behalf
of the
guest. I suppose we could return an empty matrix in which case the AP
bus will come
up without any devices on the guest, but what is the expectation of an
admin who
deliberately configures the mediated matrix device? Should we forego
handling interception
of AP instructions and consider a guest configuration that turns on
S390_FEAT_AP but
does not define a vfio-ap device to be erroneous and terminate starting
of the guest?
Anybody have any thoughts?

Hard to really give good advice without access to the documentation, but:
- If we tell the guest that the feature is available, but it does not
get any cards to use, returning an empty matrix makes the most sense
to me.
- I would not tie starting the guest to the presence of a vfio-ap
device. Having the feature available in theory but without any
devices actually being usable by the guest does not really sound
wrong (can we hotplug this later?)

For this phase of development, it is my opinion that introducing AP instruction
interception handlers is superfluous for the following reasons:

1. Interception handling was introduced solely to ensure an operation exception 
would
not be injected into the guest when CPU model feature for AP (i.e., ap=on)
is specified but a VFIO AP device (i.e., -device vfio-ap,sysfsdev=$path)
is not.

We can kind of (i.e. modulo EECA.28) ensure this in a different fashion I 
think. How
about proclaiming a 'has ap instructions, but nothing to see here' in the
SIE interpreted flavor (ECA.28 set) the default way of having ap instructions
under KVM. This should be easily accomplished with an all zero CRYCB and eca.28
set. The for the guest to actually get real work done with AP we would
still require some sort of driver to either provide a non-zero matrix by
altering the CRYCB or unsettling ECA.28 and doing the intercepted flavor.

Please notice, the cpu facility ap would still keep it's semantic
'has ap instructions' (opposed to 'has ap instructions implemented in
SIE interpreted flavor). And give us all the flexibility.

Yet implementing what we want to have in absence of a driver would become
much easier (under the assumption that ECA.28 equals EECA.28).

How about this?

Unfortunately, this is really hard to follow without the AR... let me
summarize it to check whether I got the gist of it :)

- If the "ap" cpu feature is specified, set a bit that indicates "hey,
   we basically have have AP support" and create the basics, but don't
   enable actual SIE handling. This means the guest gets exceptions from
   the SIE already and we don't need to emulate them.
- Actually enable the missing pieces if a vfio device is created. This
   would enable processing by the SIE, and we would not need to do
   emulation, either (for most of it, IIRC).

I may be all wrong, though... can we at least have a translation of
ECA.28 and EECA.28 (the "ap is there" bit and the "ap instructions are
interpreted" bit?)

I am not sure what you are asking here, but I'll attempt to answer the
question I think you are asking.

The ap=on|off flag indicates that AP instructions are installed on the 
guest.

This feature is enabled by the kernel only if AP instructions are installed
on the host. Since there is no facilities bit to query, this is determined
by attempting to execute an AP instruction using an

Re: [Qemu-devel] [PATCH v3] monitor: let cur_mon be per-thread

2018-04-12 Thread Eric Blake

On 04/12/2018 01:11 AM, Peter Xu wrote:
> In the future the monitor iothread may be accessing the cur_mon as
> well (via monitor_qmp_dispatch_one()).  Before we introduce a real
> Out-Of-Band command, let's convert the cur_mon variable to be a
> per-thread variable to make sure there won't be a race between threads.
> 
> Note that thread variables are not initialized to a valid value when new
> thread is created.  However for our case we don't need to set it up,
> since the cur_mon variable is only used in such a pattern:
> 
>   old_mon = cur_mon;
>   cur_mon = xxx;
>   (do something, read cur_mon if necessary in the stack)
>   cur_mon = old_mon;
> 
> It plays a role as stack variable, so no need to be initialized at all.
> We only need to make sure the variable won't be changed unexpectedly by
> other threads.
> 
> Signed-off-by: Peter Xu 
> ---

Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH v2 1/2 for-2.12?] qapi: Parameter gl of DisplayType now accept an enum

2018-04-12 Thread Eric Blake

On 04/12/2018 09:11 AM, Elie Tournier wrote:
> Hello,
> 
> On Tue, Apr 10, 2018 at 03:33:35PM +0200, Gerd Hoffmann wrote:
>>> # @off: Disable OpenGL (default).
> 
> Just to be sure, I have to add @ in front of all parameter, right?

Yes.


>>> You may be in luck - there is no instance of 'window-close' in the
>>> introspection output, which means 'DisplayType' exists only for ease of
>>> command-line parsing and is not currently used by QMP, so tweaks here
>>> are not affecting the command line.
>>
>> Yes, right now the struct is only used to store the parsed command line
>> opts, so no effect on QMP.
>>
>> Plan for the future is to also parse command line options with generic
>> qapi/json code instead of the home-grown parser, but that switch didn't
>> happen yet.
>>
>>> That said, you can STILL name the enum value something smarter than 'on'
>>> IF you make the change now, for 2.12, given that the QAPI type was only
>>> introduced in 2.12 (you only have to worry about backwards compatibility
>>> if 2.11 already parsed gl=on).
>>
>> gl=on is older, so that must continue to work.  Making both "on" and
>> "auto" work isn't a problem for our home-grown parser (aka
>> parse_display() in vl.c).  But having quirks like this makes the switch
>> to generic parser code more difficuilt, so I'd prefer to avoid that ...
> 
> Is it possible to upstream this change before 2.12 release?

At this point, no, we've missed -rc3, and it's not a big enough bug to
warrant needing -rc4 on its own.  And as Gerd said, we already parsed
'gl=on' in 2.11 (the QAPI representation is new to 2.12, but only to
simplify how the existing command line parser was coded), so that's
different than if 2.12 were introducing brand-new content in a form that
should be fixed before it is baked into existing uses.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH] migration: discard RAMBlocks of type ram_device

2018-04-12 Thread Cédric Le Goater

On 04/12/2018 04:11 PM, Paolo Bonzini wrote:
> On 12/04/2018 15:51, Peter Maydell wrote:
>> Paolo may have an opinion what the API here should be, but
>> at the MemoryRegion level we already have a mix of functions
>> memory_region_init_foo_nomigrate() and memory_region_init_foo(),
>> which at the moment just control whether we call
>> vmstate_register_ram() or not. Is it possible to hook into that,
>> so that we default to marking RAMBlocks as not-migrated, and
>> when vmstate_register_ram() is called we set the flag to
>> "migrated"?
> 
> Yes, that sounds pretty.

I have added the MIGRATABLE flag un/setting under qemu_ram_un/set_idstr() 
which are the only routines called by vmstate_un/register_ram().   
Tell me if that's fine or if you'd rather have helpers to
set MIGRATABLE from savem ? below is some code extract.

Thanks,

C.

>> NB: this probably requires careful thought to make sure we
>> don't accidentally break migration compat, but it seems like
>> the cleaner approach. At the moment we do weird things if you
>> migrate a RAM block that hasn't had its ID string set, IIRC,
>> so just not migrating those RAM blocks seems like a better
>> plan than mishandling them. It would also mean that the
>> vmstate_register/unregister_ram() functions did what they
>> claim to do based on the function name, I think.

--- qemu-xive.git.orig/exec.c
+++ qemu-xive.git/exec.c
@@ -104,6 +104,9 @@ static MemoryRegion io_mem_unassigned;
  * (Set during postcopy)
  */
 #define RAM_UF_ZEROPAGE (1 << 3)
+
+/* RAM can be migrated */
+#define RAM_MIGRATABLE (1 << 4)
 #endif
 
 #ifdef TARGET_PAGE_BITS_VARY
@@ -1807,6 +1810,11 @@ void qemu_ram_set_uf_zeroable(RAMBlock *
 rb->flags |= RAM_UF_ZEROPAGE;
 }
 
+bool qemu_ram_is_migratable(RAMBlock *rb)
+{
+return rb->flags & RAM_MIGRATABLE;
+}
+
 /* Called with iothread lock held.  */
 void qemu_ram_set_idstr(RAMBlock *new_block, const char *name, DeviceState 
*dev)
 {
@@ -1823,6 +1831,7 @@ void qemu_ram_set_idstr(RAMBlock *new_bl
 }
 }
 pstrcat(new_block->idstr, sizeof(new_block->idstr), name);
+new_block->flags |= RAM_MIGRATABLE;
 
 rcu_read_lock();
 RAMBLOCK_FOREACH(block) {
@@ -1845,6 +1854,7 @@ void qemu_ram_unset_idstr(RAMBlock *bloc
  */
 if (block) {
 memset(block->idstr, 0, sizeof(block->idstr));
+block->flags &= ~RAM_MIGRATABLE;
 }
 }

[Qemu-devel] [PATCH v3 6/6] vhost-user: support registering external host notifiers

2018-04-12 Thread Tiwei Bie

This patch introduces VHOST_USER_PROTOCOL_F_HOST_NOTIFIER.
With this feature negotiated, vhost-user backend can register
memory region based host notifiers. And it will allow the guest
driver in the VM to notify the hardware accelerator at the
vhost-user backend directly.

Signed-off-by: Tiwei Bie 
---
 docs/interop/vhost-user.txt|  33 +++
 hw/virtio/vhost-user.c | 123 +
 include/hw/virtio/vhost-user.h |   8 +++
 3 files changed, 164 insertions(+)

diff --git a/docs/interop/vhost-user.txt b/docs/interop/vhost-user.txt
index 534caab18a..9e57b36b20 100644
--- a/docs/interop/vhost-user.txt
+++ b/docs/interop/vhost-user.txt
@@ -132,6 +132,16 @@ Depending on the request type, payload can be:
Payload: Size bytes array holding the contents of the virtio
device's configuration space
 
+ * Vring area description
+   ---
+   | u64 | size | offset |
+   ---
+
+   u64: a 64-bit integer contains vring index and flags
+   Size: a 64-bit size of this area
+   Offset: a 64-bit offset of this area from the start of the
+   supplied file descriptor
+
 In QEMU the vhost-user message is implemented with the following struct:
 
 typedef struct VhostUserMsg {
@@ -146,6 +156,7 @@ typedef struct VhostUserMsg {
 VhostUserLog log;
 struct vhost_iotlb_msg iotlb;
 VhostUserConfig config;
+VhostUserVringArea area;
 };
 } QEMU_PACKED VhostUserMsg;
 
@@ -380,6 +391,7 @@ Protocol features
 #define VHOST_USER_PROTOCOL_F_CRYPTO_SESSION 7
 #define VHOST_USER_PROTOCOL_F_PAGEFAULT  8
 #define VHOST_USER_PROTOCOL_F_CONFIG 9
+#define VHOST_USER_PROTOCOL_F_HOST_NOTIFIER  10
 
 Master message types
 
@@ -777,6 +789,27 @@ Slave message types
  the VHOST_USER_NEED_REPLY flag, master must respond with zero when
  operation is successfully completed, or non-zero otherwise.
 
+ * VHOST_USER_SLAVE_VRING_HOST_NOTIFIER_MSG
+
+  Id: 3
+  Equivalent ioctl: N/A
+  Slave payload: vring area description
+  Master payload: N/A
+
+  Sets host notifier for a specified queue. The queue index is contained
+  in the u64 field of the vring area description. The host notifier is
+  described by the file descriptor (typically it's a VFIO device fd) which
+  is passed as ancillary data and the size (which is mmap size and should
+  be the same as host page size) and offset (which is mmap offset) carried
+  in the vring area description. QEMU can mmap the file descriptor based
+  on the size and offset to get a memory range. Registering a host notifier
+  means mapping this memory range to the VM as the specified queue's notify
+  MMIO region. Slave sends this request to tell QEMU to de-register the
+  existing notifier if any and register the new notifier if the request is
+  sent with a file descriptor.
+  This request should be sent only when VHOST_USER_PROTOCOL_F_HOST_NOTIFIER
+  protocol feature has been successfully negotiated.
+
 VHOST_USER_PROTOCOL_F_REPLY_ACK:
 ---
 The original vhost-user specification only demands replies for certain
diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index 791e0a4763..1cd9c7276b 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -13,6 +13,7 @@
 #include "hw/virtio/vhost.h"
 #include "hw/virtio/vhost-user.h"
 #include "hw/virtio/vhost-backend.h"
+#include "hw/virtio/virtio.h"
 #include "hw/virtio/virtio-net.h"
 #include "chardev/char-fe.h"
 #include "sysemu/kvm.h"
@@ -48,6 +49,7 @@ enum VhostUserProtocolFeature {
 VHOST_USER_PROTOCOL_F_CRYPTO_SESSION = 7,
 VHOST_USER_PROTOCOL_F_PAGEFAULT = 8,
 VHOST_USER_PROTOCOL_F_CONFIG = 9,
+VHOST_USER_PROTOCOL_F_HOST_NOTIFIER = 10,
 VHOST_USER_PROTOCOL_F_MAX
 };
 
@@ -92,6 +94,7 @@ typedef enum VhostUserSlaveRequest {
 VHOST_USER_SLAVE_NONE = 0,
 VHOST_USER_SLAVE_IOTLB_MSG = 1,
 VHOST_USER_SLAVE_CONFIG_CHANGE_MSG = 2,
+VHOST_USER_SLAVE_VRING_HOST_NOTIFIER_MSG = 3,
 VHOST_USER_SLAVE_MAX
 }  VhostUserSlaveRequest;
 
@@ -136,6 +139,12 @@ static VhostUserConfig c __attribute__ ((unused));
+ sizeof(c.size) \
+ sizeof(c.flags))
 
+typedef struct VhostUserVringArea {
+uint64_t u64;
+uint64_t size;
+uint64_t offset;
+} VhostUserVringArea;
+
 typedef struct {
 VhostUserRequest request;
 
@@ -157,6 +166,7 @@ typedef union {
 struct vhost_iotlb_msg iotlb;
 VhostUserConfig config;
 VhostUserCryptoSession session;
+VhostUserVringArea area;
 } VhostUserPayload;
 
 typedef struct VhostUserMsg {
@@ -638,9 +648,37 @@ static int vhost_user_set_vring_num(struct vhost_dev *dev,
 return vhost_set_vring(dev, VHOST_USER_SET_VRING_NUM, ring);
 }
 
+static void vhost_user_host_notifier_restore(struct vhost_dev *dev,
+

Re: [Qemu-devel] [PATCH] migration: discard RAMBlocks of type ram_device

2018-04-12 Thread Eric Blake

On 04/12/2018 06:47 AM, Peter Maydell wrote:

>>  /*
>> + * Identifies RAM blocks which should be discarded from migration. For
>> + * the moment, it only applies to blocks backed by a 'ram_device'
>> + * memory region.
>> + */
>> +static inline bool ram_block_is_migratable(RAMBlock *block)
>> +{
>> +return !memory_region_is_ram_device(block->mr);
>> +}
>> +
>> +/* Should be holding either ram_list.mutex, or the RCU lock. */
>> +#define RAMBLOCK_FOREACH_MIGRATABLE(block) \
>> +RAMBLOCK_FOREACH(block)\
>> +if (ram_block_is_migratable(block))
> 
> This will mishandle some uses, like:
> 
> if (foo)
> RAMBLOCK_FOREACH_MIGRATABLE(block)
> stuff;
> else
> morestuff;
> 
> as the if() inside the macro will capture the else clause.
> (The lack of braces in the calling code would be against our
> coding style, of course, so not very likely.)

All existing callers of RAMBLOCK_FOREACH() already supply their own {}
around stuff (same is true for QLIST_FOREACH_RCU(), which is what is
being used under the hood).  By the time you correct that to:

if (foo)
RAMBLOCK_FOREACH_MIGRATABLE(block) {
stuff;
}
else
morestuff;

then even though the outer if violates coding standard, the correct
usage of the macro with {} around stuff won't leak that there is a
trailing if.

But yeah, if you're worrying about code that omitted {}, then dealing
with omitted {} in both places is something to think about.

> 
> Eric, is there a 'standard' trick for this? I thought of
> maybe
> 
> #define RAMBLOCK_FOREACH_MIGRATABLE(block) \
> RAMBLOCK_FOREACH(block)\
> if (!ram_block_is_migratable(block)) {} else

at which point, yes, this is a little bit safer for taking exactly one
statement without risking that a later bare 'else' could match to the
wrong unbraced 'if'.  I'm not coming up with any better ideas for
(abusing?) the syntax.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

signature.asc
Description: OpenPGP digital signature

[Qemu-devel] [PATCH v3 3/6] vhost-user: support receiving file descriptors in slave_read

2018-04-12 Thread Tiwei Bie

Signed-off-by: Tiwei Bie 
---
 hw/virtio/vhost-user.c | 41 -
 1 file changed, 40 insertions(+), 1 deletion(-)

diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index 91edd95453..9cea2c8c51 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -854,14 +854,44 @@ static void slave_read(void *opaque)
 VhostUserHeader hdr = { 0, };
 VhostUserPayload payload = { 0, };
 int size, ret = 0;
+struct iovec iov;
+struct msghdr msgh;
+int fd = -1;
+char control[CMSG_SPACE(sizeof(fd))];
+struct cmsghdr *cmsg;
+size_t fdsize;
+
+memset(, 0, sizeof(msgh));
+msgh.msg_iov = 
+msgh.msg_iovlen = 1;
+msgh.msg_control = control;
+msgh.msg_controllen = sizeof(control);
 
 /* Read header */
-size = read(u->slave_fd, , VHOST_USER_HDR_SIZE);
+iov.iov_base = 
+iov.iov_len = VHOST_USER_HDR_SIZE;
+
+size = recvmsg(u->slave_fd, , 0);
 if (size != VHOST_USER_HDR_SIZE) {
 error_report("Failed to read from slave.");
 goto err;
 }
 
+if (msgh.msg_flags & MSG_CTRUNC) {
+error_report("Truncated message.");
+goto err;
+}
+
+for (cmsg = CMSG_FIRSTHDR(); cmsg != NULL;
+ cmsg = CMSG_NXTHDR(, cmsg)) {
+if (cmsg->cmsg_level == SOL_SOCKET &&
+cmsg->cmsg_type == SCM_RIGHTS) {
+fdsize = cmsg->cmsg_len - CMSG_LEN(0);
+memcpy(, CMSG_DATA(cmsg), fdsize);
+break;
+}
+}
+
 if (hdr.size > VHOST_USER_PAYLOAD_SIZE) {
 error_report("Failed to read msg header."
 " Size %d exceeds the maximum %zu.", hdr.size,
@@ -885,9 +915,15 @@ static void slave_read(void *opaque)
 break;
 default:
 error_report("Received unexpected msg type.");
+if (fd != -1) {
+close(fd);
+}
 ret = -EINVAL;
 }
 
+/* Message handlers need to make sure that fd will be consumed. */
+fd = -1;
+
 /*
  * REPLY_ACK feature handling. Other reply types has to be managed
  * directly in their request handlers.
@@ -920,6 +956,9 @@ err:
 qemu_set_fd_handler(u->slave_fd, NULL, NULL, NULL);
 close(u->slave_fd);
 u->slave_fd = -1;
+if (fd != -1) {
+close(fd);
+}
 return;
 }
 
-- 
2.11.0

[Qemu-devel] [PATCH v3 5/6] vhost: allow backends to filter memory sections

2018-04-12 Thread Tiwei Bie

This patch introduces a vhost op for vhost backends to allow
them to filter the memory sections that they can handle.

Signed-off-by: Tiwei Bie 
---
 hw/virtio/vhost-user.c| 11 +++
 hw/virtio/vhost.c |  9 +++--
 include/hw/virtio/vhost-backend.h |  4 
 3 files changed, 22 insertions(+), 2 deletions(-)

diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index 9cea2c8c51..791e0a4763 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -1622,6 +1622,16 @@ vhost_user_crypto_close_session(struct vhost_dev *dev, 
uint64_t session_id)
 return 0;
 }
 
+static bool vhost_user_mem_section_filter(struct vhost_dev *dev,
+  MemoryRegionSection *section)
+{
+bool result;
+
+result = memory_region_get_fd(section->mr) >= 0;
+
+return result;
+}
+
 VhostUserState *vhost_user_init(void)
 {
 VhostUserState *user = g_new0(struct VhostUserState, 1);
@@ -1663,4 +1673,5 @@ const VhostOps user_ops = {
 .vhost_set_config = vhost_user_set_config,
 .vhost_crypto_create_session = vhost_user_crypto_create_session,
 .vhost_crypto_close_session = vhost_user_crypto_close_session,
+.vhost_backend_mem_section_filter = vhost_user_mem_section_filter,
 };
diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index f51bf573d5..a939a38d97 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -382,7 +382,7 @@ static int vhost_verify_ring_mappings(struct vhost_dev *dev,
 return r;
 }
 
-static bool vhost_section(MemoryRegionSection *section)
+static bool vhost_section(struct vhost_dev *dev, MemoryRegionSection *section)
 {
 bool result;
 bool log_dirty = memory_region_get_dirty_log_mask(section->mr) &
@@ -395,6 +395,11 @@ static bool vhost_section(MemoryRegionSection *section)
  */
 result &= !log_dirty;
 
+if (result && dev->vhost_ops->vhost_backend_mem_section_filter) {
+result &=
+dev->vhost_ops->vhost_backend_mem_section_filter(dev, section);
+}
+
 trace_vhost_section(section->mr->name, result);
 return result;
 }
@@ -628,7 +633,7 @@ static void vhost_region_addnop(MemoryListener *listener,
 struct vhost_dev *dev = container_of(listener, struct vhost_dev,
  memory_listener);
 
-if (!vhost_section(section)) {
+if (!vhost_section(dev, section)) {
 return;
 }
 vhost_region_add_section(dev, section);
diff --git a/include/hw/virtio/vhost-backend.h 
b/include/hw/virtio/vhost-backend.h
index 5dac61f9ea..81283ec50f 100644
--- a/include/hw/virtio/vhost-backend.h
+++ b/include/hw/virtio/vhost-backend.h
@@ -101,6 +101,9 @@ typedef int (*vhost_crypto_create_session_op)(struct 
vhost_dev *dev,
 typedef int (*vhost_crypto_close_session_op)(struct vhost_dev *dev,
  uint64_t session_id);
 
+typedef bool (*vhost_backend_mem_section_filter_op)(struct vhost_dev *dev,
+MemoryRegionSection *section);
+
 typedef struct VhostOps {
 VhostBackendType backend_type;
 vhost_backend_init vhost_backend_init;
@@ -138,6 +141,7 @@ typedef struct VhostOps {
 vhost_set_config_op vhost_set_config;
 vhost_crypto_create_session_op vhost_crypto_create_session;
 vhost_crypto_close_session_op vhost_crypto_close_session;
+vhost_backend_mem_section_filter_op vhost_backend_mem_section_filter;
 } VhostOps;
 
 extern const VhostOps user_ops;
-- 
2.11.0

[Qemu-devel] [PATCH v3 2/6] vhost-user: introduce shared vhost-user state

2018-04-12 Thread Tiwei Bie

When multi queue is enabled e.g. for a virtio-net device,
each queue pair will have a vhost_dev, and the only thing
shared between vhost devs currently is the chardev. This
patch introduces a vhost-user state structure which will
be shared by all vhost devs of the same virtio device.

Signed-off-by: Tiwei Bie 
---
 backends/cryptodev-vhost-user.c | 20 ++-
 hw/block/vhost-user-blk.c   | 22 +++-
 hw/scsi/vhost-user-scsi.c   | 20 ++-
 hw/virtio/Makefile.objs |  2 +-
 hw/virtio/vhost-stub.c  | 10 ++
 hw/virtio/vhost-user.c  | 31 +++-
 include/hw/virtio/vhost-user-blk.h  |  2 ++
 include/hw/virtio/vhost-user-scsi.h |  2 ++
 include/hw/virtio/vhost-user.h  | 20 +++
 net/vhost-user.c| 40 ++---
 10 files changed, 149 insertions(+), 20 deletions(-)
 create mode 100644 include/hw/virtio/vhost-user.h

diff --git a/backends/cryptodev-vhost-user.c b/backends/cryptodev-vhost-user.c
index 862d4f2580..d52daccfcd 100644
--- a/backends/cryptodev-vhost-user.c
+++ b/backends/cryptodev-vhost-user.c
@@ -26,6 +26,7 @@
 #include "qapi/error.h"
 #include "qapi/qmp/qerror.h"
 #include "qemu/error-report.h"
+#include "hw/virtio/vhost-user.h"
 #include "standard-headers/linux/virtio_crypto.h"
 #include "sysemu/cryptodev-vhost.h"
 #include "chardev/char-fe.h"
@@ -46,6 +47,7 @@
 typedef struct CryptoDevBackendVhostUser {
 CryptoDevBackend parent_obj;
 
+VhostUserState *vhost_user;
 CharBackend chr;
 char *chr_name;
 bool opened;
@@ -102,7 +104,7 @@ cryptodev_vhost_user_start(int queues,
 continue;
 }
 
-options.opaque = >chr;
+options.opaque = s->vhost_user;
 options.backend_type = VHOST_BACKEND_TYPE_USER;
 options.cc = b->conf.peers.ccs[i];
 s->vhost_crypto[i] = cryptodev_vhost_init();
@@ -185,6 +187,7 @@ static void cryptodev_vhost_user_init(
 size_t i;
 Error *local_err = NULL;
 Chardev *chr;
+VhostUserState *user;
 CryptoDevBackendClient *cc;
 CryptoDevBackendVhostUser *s =
   CRYPTODEV_BACKEND_VHOST_USER(backend);
@@ -215,6 +218,15 @@ static void cryptodev_vhost_user_init(
 }
 }
 
+user = vhost_user_init();
+if (!user) {
+error_setg(errp, "Failed to init vhost_user");
+return;
+}
+
+user->chr = >chr;
+s->vhost_user = user;
+
 qemu_chr_fe_set_handlers(>chr, NULL, NULL,
  cryptodev_vhost_user_event, NULL, s, NULL, true);
 
@@ -299,6 +311,12 @@ static void cryptodev_vhost_user_cleanup(
 backend->conf.peers.ccs[i] = NULL;
 }
 }
+
+if (s->vhost_user) {
+vhost_user_cleanup(s->vhost_user);
+g_free(s->vhost_user);
+s->vhost_user = NULL;
+}
 }
 
 static void cryptodev_vhost_user_set_chardev(Object *obj,
diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
index 262baca432..4021d71c31 100644
--- a/hw/block/vhost-user-blk.c
+++ b/hw/block/vhost-user-blk.c
@@ -229,6 +229,7 @@ static void vhost_user_blk_device_realize(DeviceState *dev, 
Error **errp)
 {
 VirtIODevice *vdev = VIRTIO_DEVICE(dev);
 VHostUserBlk *s = VHOST_USER_BLK(vdev);
+VhostUserState *user;
 int i, ret;
 
 if (!s->chardev.chr) {
@@ -246,6 +247,15 @@ static void vhost_user_blk_device_realize(DeviceState 
*dev, Error **errp)
 return;
 }
 
+user = vhost_user_init();
+if (!user) {
+error_setg(errp, "vhost-user-blk: failed to init vhost_user");
+return;
+}
+
+user->chr = >chardev;
+s->vhost_user = user;
+
 virtio_init(vdev, "virtio-blk", VIRTIO_ID_BLOCK,
 sizeof(struct virtio_blk_config));
 
@@ -261,7 +271,7 @@ static void vhost_user_blk_device_realize(DeviceState *dev, 
Error **errp)
 
 vhost_dev_set_config_notifier(>dev, _ops);
 
-ret = vhost_dev_init(>dev, >chardev, VHOST_BACKEND_TYPE_USER, 0);
+ret = vhost_dev_init(>dev, s->vhost_user, VHOST_BACKEND_TYPE_USER, 0);
 if (ret < 0) {
 error_setg(errp, "vhost-user-blk: vhost initialization failed: %s",
strerror(-ret));
@@ -286,6 +296,10 @@ vhost_err:
 virtio_err:
 g_free(s->dev.vqs);
 virtio_cleanup(vdev);
+
+vhost_user_cleanup(user);
+g_free(user);
+s->vhost_user = NULL;
 }
 
 static void vhost_user_blk_device_unrealize(DeviceState *dev, Error **errp)
@@ -297,6 +311,12 @@ static void vhost_user_blk_device_unrealize(DeviceState 
*dev, Error **errp)
 vhost_dev_cleanup(>dev);
 g_free(s->dev.vqs);
 virtio_cleanup(vdev);
+
+if (s->vhost_user) {
+vhost_user_cleanup(s->vhost_user);
+g_free(s->vhost_user);
+s->vhost_user = NULL;
+}
 }
 
 static void vhost_user_blk_instance_init(Object *obj)
diff --git a/hw/scsi/vhost-user-scsi.c b/hw/scsi/vhost-user-scsi.c

[Qemu-devel] [PATCH v3 4/6] virtio: support setting memory region based host notifier

2018-04-12 Thread Tiwei Bie

This patch introduces the support for setting memory region
based host notifiers for virtio device. This is helpful when
using a hardware accelerator for a virtio device, because
hardware heavily depends on the notification, this will allow
the guest driver in the VM to notify the hardware directly.

Signed-off-by: Tiwei Bie 
---
 hw/virtio/virtio-pci.c | 22 ++
 hw/virtio/virtio.c | 13 +
 include/hw/virtio/virtio-bus.h |  2 ++
 include/hw/virtio/virtio.h |  2 ++
 4 files changed, 39 insertions(+)

diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index 1e8ab7bbc5..5eb0c323ca 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -1037,6 +1037,27 @@ assign_error:
 return r;
 }
 
+static int virtio_pci_set_host_notifier_mr(DeviceState *d, int n,
+   MemoryRegion *mr, bool assign)
+{
+VirtIOPCIProxy *proxy = to_virtio_pci_proxy(d);
+int offset;
+
+if (n >= VIRTIO_QUEUE_MAX || !virtio_pci_modern(proxy) ||
+virtio_pci_queue_mem_mult(proxy) != memory_region_size(mr)) {
+return -1;
+}
+
+if (assign) {
+offset = virtio_pci_queue_mem_mult(proxy) * n;
+memory_region_add_subregion_overlap(>notify.mr, offset, mr, 1);
+} else {
+memory_region_del_subregion(>notify.mr, mr);
+}
+
+return 0;
+}
+
 static void virtio_pci_vmstate_change(DeviceState *d, bool running)
 {
 VirtIOPCIProxy *proxy = to_virtio_pci_proxy(d);
@@ -2652,6 +2673,7 @@ static void virtio_pci_bus_class_init(ObjectClass *klass, 
void *data)
 k->has_extra_state = virtio_pci_has_extra_state;
 k->query_guest_notifiers = virtio_pci_query_guest_notifiers;
 k->set_guest_notifiers = virtio_pci_set_guest_notifiers;
+k->set_host_notifier_mr = virtio_pci_set_host_notifier_mr;
 k->vmstate_change = virtio_pci_vmstate_change;
 k->pre_plugged = virtio_pci_pre_plugged;
 k->device_plugged = virtio_pci_device_plugged;
diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 006d3d1148..1debb0147b 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -2454,6 +2454,19 @@ EventNotifier *virtio_queue_get_host_notifier(VirtQueue 
*vq)
 return >host_notifier;
 }
 
+int virtio_queue_set_host_notifier_mr(VirtIODevice *vdev, int n,
+  MemoryRegion *mr, bool assign)
+{
+BusState *qbus = qdev_get_parent_bus(DEVICE(vdev));
+VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus);
+
+if (k->set_host_notifier_mr) {
+return k->set_host_notifier_mr(qbus->parent, n, mr, assign);
+}
+
+return -1;
+}
+
 void virtio_device_set_child_bus_name(VirtIODevice *vdev, char *bus_name)
 {
 g_free(vdev->bus_name);
diff --git a/include/hw/virtio/virtio-bus.h b/include/hw/virtio/virtio-bus.h
index ced3d2d2b0..7fec9dc929 100644
--- a/include/hw/virtio/virtio-bus.h
+++ b/include/hw/virtio/virtio-bus.h
@@ -52,6 +52,8 @@ typedef struct VirtioBusClass {
 bool (*has_extra_state)(DeviceState *d);
 bool (*query_guest_notifiers)(DeviceState *d);
 int (*set_guest_notifiers)(DeviceState *d, int nvqs, bool assign);
+int (*set_host_notifier_mr)(DeviceState *d, int n,
+MemoryRegion *mr, bool assign);
 void (*vmstate_change)(DeviceState *d, bool running);
 /*
  * Expose the features the transport layer supports before
diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index 098bdaaea3..9c1fa07d6d 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -239,6 +239,8 @@ void virtio_queue_set_align(VirtIODevice *vdev, int n, int 
align);
 void virtio_queue_notify(VirtIODevice *vdev, int n);
 uint16_t virtio_queue_vector(VirtIODevice *vdev, int n);
 void virtio_queue_set_vector(VirtIODevice *vdev, int n, uint16_t vector);
+int virtio_queue_set_host_notifier_mr(VirtIODevice *vdev, int n,
+  MemoryRegion *mr, bool assign);
 int virtio_set_status(VirtIODevice *vdev, uint8_t val);
 void virtio_reset(void *opaque);
 void virtio_update_irq(VirtIODevice *vdev);
-- 
2.11.0

[Qemu-devel] [PATCH v3 0/6] Extend vhost-user to support registering external host notifiers

2018-04-12 Thread Tiwei Bie

The original subject is: Extend vhost-user to support VFIO based accelerators

Update notes


Now, this patch set just focuses on adding the support for
registering memory region based host notifiers. With this
support, guest driver in the VM will be able to notify the
hardware device at the vhost backend directly.

It's one of the most important things in vDPA -- the host
notification offload. Because, normally, the hardware device
heavily depends on the notifications. Without this support,
there will be a lot of VM-Exit happen due to the notifications
from guest driver (it will drop the VM performance) and a
lot of CPU resources wasted to do the notification relay
(it will make the hardware offload less attractive, because
one important goal of hardware offload is to free the CPU
resources).

More backgrounds of this patch set can be found from the
cover letter of the previous versions:

RFC: http://lists.nongnu.org/archive/html/qemu-devel/2017-12/msg04844.html
v1:  http://lists.nongnu.org/archive/html/qemu-devel/2018-01/msg06028.html
v2:  http://lists.nongnu.org/archive/html/qemu-devel/2018-03/msg05009.html

v2 -> v3:
- A better implementation of the shared vhost-user state (MST);
- Use bus callback to add/delete subregions for notification (MST);
- Refine APIs' names which add/delete subregions for notification (MST);
- Refine the doc of the new vhost-user types and messages (MST);
- Separate host notification offload from the guest notification offload (MST);
- Drop the guest notification offload support from this patch;
- Add memory filter for vhost backend to filter the sections they can handle;

v1 -> v2:
- Add some explanations about why extend vhost-user in commit log (Paolo);
- Bug fix in slave_read() according to Stefan's fix in DPDK;
- Remove IOMMU feature check and related commit log;
- Some minor refinements;
- Rebase to the latest QEMU;

RFC -> v1:
- Add some details about how vDPA works in cover letter (Alexey)
- Add some details about the OVS offload use-case in cover letter (Jason)
- Move PCI specific stuffs out of vhost-user (Jason)
- Handle the virtual IOMMU case (Jason)
- Move VFIO group management code into vfio/common.c (Alex)
- Various refinements;
(approximately sorted by comment posting time)

Tiwei Bie (6):
  vhost-user: add Net prefix to internal state structure
  vhost-user: introduce shared vhost-user state
  vhost-user: support receiving file descriptors in slave_read
  virtio: support setting memory region based host notifier
  vhost: allow backends to filter memory sections
  vhost-user: support registering external host notifiers

 backends/cryptodev-vhost-user.c |  20 +++-
 docs/interop/vhost-user.txt |  33 ++
 hw/block/vhost-user-blk.c   |  22 +++-
 hw/scsi/vhost-user-scsi.c   |  20 +++-
 hw/virtio/Makefile.objs |   2 +-
 hw/virtio/vhost-stub.c  |  10 ++
 hw/virtio/vhost-user.c  | 206 ++--
 hw/virtio/vhost.c   |   9 +-
 hw/virtio/virtio-pci.c  |  22 
 hw/virtio/virtio.c  |  13 +++
 include/hw/virtio/vhost-backend.h   |   4 +
 include/hw/virtio/vhost-user-blk.h  |   2 +
 include/hw/virtio/vhost-user-scsi.h |   2 +
 include/hw/virtio/vhost-user.h  |  28 +
 include/hw/virtio/virtio-bus.h  |   2 +
 include/hw/virtio/virtio.h  |   2 +
 net/vhost-user.c|  78 +-
 17 files changed, 433 insertions(+), 42 deletions(-)
 create mode 100644 include/hw/virtio/vhost-user.h

-- 
2.11.0

[Qemu-devel] [PATCH v3 1/6] vhost-user: add Net prefix to internal state structure

2018-04-12 Thread Tiwei Bie

We are going to introduce a shared vhost user state which
will be named as 'VhostUserState'. So add 'Net' prefix to
the existing internal state structure in the vhost-user
netdev to avoid conflict.

Signed-off-by: Tiwei Bie 
---
 net/vhost-user.c | 38 +++---
 1 file changed, 19 insertions(+), 19 deletions(-)

diff --git a/net/vhost-user.c b/net/vhost-user.c
index e0f16c895b..fa28aad12d 100644
--- a/net/vhost-user.c
+++ b/net/vhost-user.c
@@ -20,38 +20,38 @@
 #include "qemu/option.h"
 #include "trace.h"
 
-typedef struct VhostUserState {
+typedef struct NetVhostUserState {
 NetClientState nc;
 CharBackend chr; /* only queue index 0 */
 VHostNetState *vhost_net;
 guint watch;
 uint64_t acked_features;
 bool started;
-} VhostUserState;
+} NetVhostUserState;
 
 VHostNetState *vhost_user_get_vhost_net(NetClientState *nc)
 {
-VhostUserState *s = DO_UPCAST(VhostUserState, nc, nc);
+NetVhostUserState *s = DO_UPCAST(NetVhostUserState, nc, nc);
 assert(nc->info->type == NET_CLIENT_DRIVER_VHOST_USER);
 return s->vhost_net;
 }
 
 uint64_t vhost_user_get_acked_features(NetClientState *nc)
 {
-VhostUserState *s = DO_UPCAST(VhostUserState, nc, nc);
+NetVhostUserState *s = DO_UPCAST(NetVhostUserState, nc, nc);
 assert(nc->info->type == NET_CLIENT_DRIVER_VHOST_USER);
 return s->acked_features;
 }
 
 static void vhost_user_stop(int queues, NetClientState *ncs[])
 {
-VhostUserState *s;
+NetVhostUserState *s;
 int i;
 
 for (i = 0; i < queues; i++) {
 assert(ncs[i]->info->type == NET_CLIENT_DRIVER_VHOST_USER);
 
-s = DO_UPCAST(VhostUserState, nc, ncs[i]);
+s = DO_UPCAST(NetVhostUserState, nc, ncs[i]);
 
 if (s->vhost_net) {
 /* save acked features */
@@ -68,7 +68,7 @@ static int vhost_user_start(int queues, NetClientState 
*ncs[], CharBackend *be)
 {
 VhostNetOptions options;
 struct vhost_net *net = NULL;
-VhostUserState *s;
+NetVhostUserState *s;
 int max_queues;
 int i;
 
@@ -77,7 +77,7 @@ static int vhost_user_start(int queues, NetClientState 
*ncs[], CharBackend *be)
 for (i = 0; i < queues; i++) {
 assert(ncs[i]->info->type == NET_CLIENT_DRIVER_VHOST_USER);
 
-s = DO_UPCAST(VhostUserState, nc, ncs[i]);
+s = DO_UPCAST(NetVhostUserState, nc, ncs[i]);
 
 options.net_backend = ncs[i];
 options.opaque  = be;
@@ -123,7 +123,7 @@ static ssize_t vhost_user_receive(NetClientState *nc, const 
uint8_t *buf,
without GUEST_ANNOUNCE capability.
  */
 if (size == 60) {
-VhostUserState *s = DO_UPCAST(VhostUserState, nc, nc);
+NetVhostUserState *s = DO_UPCAST(NetVhostUserState, nc, nc);
 int r;
 static int display_rarp_failure = 1;
 char mac_addr[6];
@@ -146,7 +146,7 @@ static ssize_t vhost_user_receive(NetClientState *nc, const 
uint8_t *buf,
 
 static void vhost_user_cleanup(NetClientState *nc)
 {
-VhostUserState *s = DO_UPCAST(VhostUserState, nc, nc);
+NetVhostUserState *s = DO_UPCAST(NetVhostUserState, nc, nc);
 
 if (s->vhost_net) {
 vhost_net_cleanup(s->vhost_net);
@@ -180,7 +180,7 @@ static bool vhost_user_has_ufo(NetClientState *nc)
 
 static NetClientInfo net_vhost_user_info = {
 .type = NET_CLIENT_DRIVER_VHOST_USER,
-.size = sizeof(VhostUserState),
+.size = sizeof(NetVhostUserState),
 .receive = vhost_user_receive,
 .cleanup = vhost_user_cleanup,
 .has_vnet_hdr = vhost_user_has_vnet_hdr,
@@ -190,7 +190,7 @@ static NetClientInfo net_vhost_user_info = {
 static gboolean net_vhost_user_watch(GIOChannel *chan, GIOCondition cond,
void *opaque)
 {
-VhostUserState *s = opaque;
+NetVhostUserState *s = opaque;
 
 qemu_chr_fe_disconnect(>chr);
 
@@ -203,7 +203,7 @@ static void chr_closed_bh(void *opaque)
 {
 const char *name = opaque;
 NetClientState *ncs[MAX_QUEUE_NUM];
-VhostUserState *s;
+NetVhostUserState *s;
 Error *err = NULL;
 int queues;
 
@@ -212,7 +212,7 @@ static void chr_closed_bh(void *opaque)
   MAX_QUEUE_NUM);
 assert(queues < MAX_QUEUE_NUM);
 
-s = DO_UPCAST(VhostUserState, nc, ncs[0]);
+s = DO_UPCAST(NetVhostUserState, nc, ncs[0]);
 
 qmp_set_link(name, false, );
 vhost_user_stop(queues, ncs);
@@ -229,7 +229,7 @@ static void net_vhost_user_event(void *opaque, int event)
 {
 const char *name = opaque;
 NetClientState *ncs[MAX_QUEUE_NUM];
-VhostUserState *s;
+NetVhostUserState *s;
 Chardev *chr;
 Error *err = NULL;
 int queues;
@@ -239,7 +239,7 @@ static void net_vhost_user_event(void *opaque, int event)
   MAX_QUEUE_NUM);
 assert(queues < MAX_QUEUE_NUM);
 
-s = DO_UPCAST(VhostUserState, nc, ncs[0]);
+s = DO_UPCAST(NetVhostUserState, nc, ncs[0]);

[Qemu-devel] [Bug 1761401] Re: ARM/Neon: vcvt rounding error

2018-04-12 Thread Christophe Lyon

I updated my QEMU git tree such that it includes commit
bd49e6027cbc207c, and the test now passes.

Thanks for the prompt fix!


** Changed in: qemu
   Status: New => Fix Committed

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1761401

Title:
  ARM/Neon: vcvt rounding error

Status in QEMU:
  Fix Committed

Bug description:
  Hello,

  While using QEMU commit 47d3b60858d90ac8a0cc3a72af7f95c96781125a
  (March 28, 2018), I've noticed failures in one of the GCC ARM/Neon
  tests. The test passes on hardware, and with QEMU-2.11.0, so it looks
  like a recent regression.

  The test builds a vector of 4 float32 with "125.9" as value, then converts 
them to 4 uint32_t.
  The expected result is 125, but we get 126 instead.

  Maybe it's just a matter of default rounding mode?

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1761401/+subscriptions

Re: [Qemu-devel] [PATCH v3 35/38] console: make screendump asynchronous

2018-04-12 Thread Dr. David Alan Gilbert

* Marc-André Lureau (marcandre.lur...@redhat.com) wrote:
> Make screendump asynchronous to provide correct screendumps.
> 
> HMP doesn't have async support, so it has to remain synchronous and
> potentially incorrect to avoid potential races.
> 
> Fixes:
> https://bugzilla.redhat.com/show_bug.cgi?id=1230527
> 
> Signed-off-by: Marc-André Lureau 

Hi,
  Looking at this and the previous pair of patches, I think we'd be
better if we defined that async commands took a callback function
pointer and data that they call.

  Here we've got 'graphic_hw_update_done' that explicitly walks qmp
lists; but I think it's better just to pass graphic_hw_update() the
completion data together with a callback function and it calls that when
it's done.

(Also isn't it a little bit of a race, calling graphic_hw_update() and
*then* adding the entry to the list? Can't it end up calling the _done()
before we've added it to the list).

Also, do we need a list of outstanding screendumps, or should we just
have a list of async commands.

It wouldn't seem hard to use the async-QMP command from the HMP
and then just provide a way to tell whether it had finished.

Dave

> ---
>  qapi/ui.json |  3 +-
>  include/ui/console.h |  3 ++
>  hmp.c|  2 +-
>  ui/console.c | 99 
>  4 files changed, 96 insertions(+), 11 deletions(-)
> 
> diff --git a/qapi/ui.json b/qapi/ui.json
> index 5d01ad4304..4d2c326fb9 100644
> --- a/qapi/ui.json
> +++ b/qapi/ui.json
> @@ -96,7 +96,8 @@
>  #
>  ##
>  { 'command': 'screendump',
> -  'data': {'filename': 'str', '*device': 'str', '*head': 'int'} }
> +  'data': {'filename': 'str', '*device': 'str', '*head': 'int'},
> +  'async': true }
>  
>  ##
>  # == Spice
> diff --git a/include/ui/console.h b/include/ui/console.h
> index fc21621656..0c02190963 100644
> --- a/include/ui/console.h
> +++ b/include/ui/console.h
> @@ -74,6 +74,9 @@ struct MouseTransformInfo {
>  };
>  
>  void hmp_mouse_set(Monitor *mon, const QDict *qdict);
> +void hmp_screendump_sync(const char *filename,
> + bool has_device, const char *device,
> + bool has_head, int64_t head, Error **errp);
>  
>  /* keysym is a unicode code except for special keys (see QEMU_KEY_xxx
> constants) */
> diff --git a/hmp.c b/hmp.c
> index 679467d85a..da9008fe63 100644
> --- a/hmp.c
> +++ b/hmp.c
> @@ -2144,7 +2144,7 @@ void hmp_screendump(Monitor *mon, const QDict *qdict)
>  int64_t head = qdict_get_try_int(qdict, "head", 0);
>  Error *err = NULL;
>  
> -qmp_screendump(filename, id != NULL, id, id != NULL, head, );
> +hmp_screendump_sync(filename, id != NULL, id, id != NULL, head, );
>  hmp_handle_error(mon, );
>  }
>  
> diff --git a/ui/console.c b/ui/console.c
> index 29234605a7..da51861191 100644
> --- a/ui/console.c
> +++ b/ui/console.c
> @@ -32,6 +32,7 @@
>  #include "chardev/char-fe.h"
>  #include "trace.h"
>  #include "exec/memory.h"
> +#include "monitor/monitor.h"
>  
>  #define DEFAULT_BACKSCROLL 512
>  #define CONSOLE_CURSOR_PERIOD 500
> @@ -116,6 +117,12 @@ typedef enum {
>  TEXT_CONSOLE_FIXED_SIZE
>  } console_type_t;
>  
> +struct qmp_screendump {
> +gchar *filename;
> +QmpReturn *ret;
> +QLIST_ENTRY(qmp_screendump) link;
> +};
> +
>  struct QemuConsole {
>  Object parent;
>  
> @@ -165,6 +172,8 @@ struct QemuConsole {
>  QEMUFIFO out_fifo;
>  uint8_t out_fifo_buf[16];
>  QEMUTimer *kbd_timer;
> +
> +QLIST_HEAD(, qmp_screendump) qmp_screendumps;
>  };
>  
>  struct DisplayState {
> @@ -190,6 +199,8 @@ static void dpy_refresh(DisplayState *s);
>  static DisplayState *get_alloc_displaystate(void);
>  static void text_console_update_cursor_timer(void);
>  static void text_console_update_cursor(void *opaque);
> +static void ppm_save(const char *filename, DisplaySurface *ds,
> + Error **errp);
>  
>  static void gui_update(void *opaque)
>  {
> @@ -256,8 +267,42 @@ static void gui_setup_refresh(DisplayState *ds)
>  ds->have_text = have_text;
>  }
>  
> +static void qmp_screendump_finish(QemuConsole *con, const char *filename,
> +  QmpReturn *ret)
> +{
> +Error *err = NULL;
> +DisplaySurface *surface;
> +Monitor *prev_mon = cur_mon;
> +
> +if (qmp_return_is_cancelled(ret)) {
> +return;
> +}
> +
> +cur_mon = qmp_return_get_monitor(ret);
> +surface = qemu_console_surface(con);
> +
> +/* FIXME: async save with coroutine? it would have to copy or lock
> + * the surface. */
> +ppm_save(filename, surface, );
> +
> +if (err) {
> +qmp_return_error(ret, err);
> +} else {
> +qmp_screendump_return(ret);
> +}
> +cur_mon = prev_mon;
> +}
> +
>  void graphic_hw_update_done(QemuConsole *con)
>  {
> +struct qmp_screendump *dump, *next;
> +
> +QLIST_FOREACH_SAFE(dump, >qmp_screendumps, link, next) {
> +

[Qemu-devel] [PATCH v2 2/3] machine: make MemoryHotplugState accessible via the machine

2018-04-12 Thread David Hildenbrand

Let's allow to query the MemoryHotplugState from the machine.

This allows us to generically detect if a certain machine has support
for memory devices, and to generically manage it (find free address
range, plug/unplug a memory region).

Signed-off-by: David Hildenbrand 
---
 hw/i386/pc.c | 12 
 hw/ppc/spapr.c   | 12 
 include/hw/boards.h  | 16 
 include/hw/mem/pc-dimm.h | 12 +---
 4 files changed, 41 insertions(+), 11 deletions(-)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index d36bac8c89..fa8862af33 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -2337,6 +2337,17 @@ static void x86_nmi(NMIState *n, int cpu_index, Error 
**errp)
 }
 }
 
+static MemoryHotplugState *pc_machine_get_memory_hotplug_state(MachineState 
*ms)
+{
+PCMachineState *pcms = PC_MACHINE(ms);
+
+if (!pcms->hotplug_memory.base) {
+/* hotplug not supported */
+return NULL;
+}
+return >hotplug_memory;
+}
+
 static void pc_machine_class_init(ObjectClass *oc, void *data)
 {
 MachineClass *mc = MACHINE_CLASS(oc);
@@ -2376,6 +2387,7 @@ static void pc_machine_class_init(ObjectClass *oc, void 
*data)
 hc->unplug = pc_machine_device_unplug_cb;
 nc->nmi_monitor_handler = x86_nmi;
 mc->default_cpu_type = TARGET_DEFAULT_CPU_TYPE;
+mc->get_memory_hotplug_state = pc_machine_get_memory_hotplug_state;
 
 object_class_property_add(oc, PC_MACHINE_MEMHP_REGION_SIZE, "int",
 pc_machine_get_hotplug_memory_region_size, NULL,
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index a7428f7da7..7ccdb705b3 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -3868,6 +3868,17 @@ PowerPCCPU *spapr_find_cpu(int vcpu_id)
 return NULL;
 }
 
+static MemoryHotplugState *spapr_get_memory_hotplug_state(MachineState *ms)
+{
+sPAPRMachineState *spapr = SPAPR_MACHINE(ms);
+
+if (!spapr->hotplug_memory.base) {
+/* hotplug not supported */
+return NULL;
+}
+return >hotplug_memory;
+}
+
 static void spapr_machine_class_init(ObjectClass *oc, void *data)
 {
 MachineClass *mc = MACHINE_CLASS(oc);
@@ -3927,6 +3938,7 @@ static void spapr_machine_class_init(ObjectClass *oc, 
void *data)
  * in which LMBs are represented and hot-added
  */
 mc->numa_mem_align_shift = 28;
+mc->get_memory_hotplug_state = spapr_get_memory_hotplug_state;
 
 smc->default_caps.caps[SPAPR_CAP_HTM] = SPAPR_CAP_OFF;
 smc->default_caps.caps[SPAPR_CAP_VSX] = SPAPR_CAP_ON;
diff --git a/include/hw/boards.h b/include/hw/boards.h
index a609239112..447bdc7219 100644
--- a/include/hw/boards.h
+++ b/include/hw/boards.h
@@ -105,6 +105,17 @@ typedef struct {
 CPUArchId cpus[0];
 } CPUArchIdList;
 
+/**
+ * MemoryHotplugState:
+ * @base: address in guest physical address space where hotplug memory
+ * address space begins.
+ * @mr: hotplug memory address space container
+ */
+typedef struct MemoryHotplugState {
+hwaddr base;
+MemoryRegion mr;
+} MemoryHotplugState;
+
 /**
  * MachineClass:
  * @max_cpus: maximum number of CPUs supported. Default: 1
@@ -156,6 +167,10 @@ typedef struct {
  *should instead use "unimplemented-device" for all memory ranges where
  *the guest will attempt to probe for a device that QEMU doesn't
  *implement and a stub device is required.
+ * @get_memory_hotplug_state:
+ *If a machine support memory hotplug, the returned data ontains
+ *information about the portion of guest physical address space
+ *where memory devices can be mapped to (e.g. to hotplug a pc-dimm).
  */
 struct MachineClass {
 /*< private >*/
@@ -212,6 +227,7 @@ struct MachineClass {
  unsigned cpu_index);
 const CPUArchIdList *(*possible_cpu_arch_ids)(MachineState *machine);
 int64_t (*get_default_cpu_node_id)(const MachineState *ms, int idx);
+MemoryHotplugState *(*get_memory_hotplug_state)(MachineState *ms);
 };
 
 /**
diff --git a/include/hw/mem/pc-dimm.h b/include/hw/mem/pc-dimm.h
index e88073321f..8bda37adab 100644
--- a/include/hw/mem/pc-dimm.h
+++ b/include/hw/mem/pc-dimm.h
@@ -19,6 +19,7 @@
 #include "exec/memory.h"
 #include "sysemu/hostmem.h"
 #include "hw/qdev.h"
+#include "hw/boards.h"
 
 #define TYPE_PC_DIMM "pc-dimm"
 #define PC_DIMM(obj) \
@@ -75,17 +76,6 @@ typedef struct PCDIMMDeviceClass {
 MemoryRegion *(*get_vmstate_memory_region)(PCDIMMDevice *dimm);
 } PCDIMMDeviceClass;
 
-/**
- * MemoryHotplugState:
- * @base: address in guest physical address space where hotplug memory
- * address space begins.
- * @mr: hotplug memory address space container
- */
-typedef struct MemoryHotplugState {
-hwaddr base;
-MemoryRegion mr;
-} MemoryHotplugState;
-
 uint64_t pc_dimm_get_free_addr(uint64_t address_space_start,
uint64_t address_space_size,
uint64_t *hint, uint64_t align, uint64_t size,
-- 
2.14.3

[Qemu-devel] [PATCH v2 3/3] pc-dimm: factor out address space logic into MemoryDevice code

2018-04-12 Thread David Hildenbrand

To be able to reuse MemoryDevice logic from other devices besides
pc-dimm, factor the relevant stuff out into the MemoryDevice code.

As we don't care about slots for memory devices that are not pc-dimm,
don't factor that part out.

Most of this patch just moves checks and logic around. While at it, make
the code properly detect certain error conditions better (e.g. fragmented
memory).

Signed-off-by: David Hildenbrand 
---
 hw/i386/pc.c   |  12 +--
 hw/mem/memory-device.c | 162 
 hw/mem/pc-dimm.c   | 185 +++--
 hw/ppc/spapr.c |   9 +-
 include/hw/mem/memory-device.h |   4 +
 include/hw/mem/pc-dimm.h   |  14 +---
 6 files changed, 185 insertions(+), 201 deletions(-)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index fa8862af33..1c25546a0c 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -1711,7 +1711,7 @@ static void pc_dimm_plug(HotplugHandler *hotplug_dev,
 goto out;
 }
 
-pc_dimm_memory_plug(dev, >hotplug_memory, mr, align, _err);
+pc_dimm_memory_plug(dev, align, _err);
 if (local_err) {
 goto out;
 }
@@ -1761,17 +1761,9 @@ static void pc_dimm_unplug(HotplugHandler *hotplug_dev,
DeviceState *dev, Error **errp)
 {
 PCMachineState *pcms = PC_MACHINE(hotplug_dev);
-PCDIMMDevice *dimm = PC_DIMM(dev);
-PCDIMMDeviceClass *ddc = PC_DIMM_GET_CLASS(dimm);
-MemoryRegion *mr;
 HotplugHandlerClass *hhc;
 Error *local_err = NULL;
 
-mr = ddc->get_memory_region(dimm, _err);
-if (local_err) {
-goto out;
-}
-
 hhc = HOTPLUG_HANDLER_GET_CLASS(pcms->acpi_dev);
 hhc->unplug(HOTPLUG_HANDLER(pcms->acpi_dev), dev, _err);
 
@@ -1779,7 +1771,7 @@ static void pc_dimm_unplug(HotplugHandler *hotplug_dev,
 goto out;
 }
 
-pc_dimm_memory_unplug(dev, >hotplug_memory, mr);
+pc_dimm_memory_unplug(dev);
 object_unparent(OBJECT(dev));
 
  out:
diff --git a/hw/mem/memory-device.c b/hw/mem/memory-device.c
index 0e0d6b539a..611c3252f0 100644
--- a/hw/mem/memory-device.c
+++ b/hw/mem/memory-device.c
@@ -15,6 +15,8 @@
 #include "qapi/error.h"
 #include "hw/boards.h"
 #include "qemu/range.h"
+#include "hw/virtio/vhost.h"
+#include "sysemu/kvm.h"
 
 static gint memory_device_addr_sort(gconstpointer a, gconstpointer b)
 {
@@ -105,6 +107,166 @@ uint64_t get_plugged_memory_size(void)
 return size;
 }
 
+static int memory_device_used_region_size_internal(Object *obj, void *opaque)
+{
+uint64_t *size = opaque;
+
+if (object_dynamic_cast(obj, TYPE_MEMORY_DEVICE)) {
+DeviceState *dev = DEVICE(obj);
+MemoryDeviceState *md = MEMORY_DEVICE(obj);
+MemoryDeviceClass *mdc = MEMORY_DEVICE_GET_CLASS(obj);
+
+if (dev->realized) {
+*size += mdc->get_region_size(md, _abort);
+}
+}
+
+object_child_foreach(obj, memory_device_used_region_size_internal, opaque);
+return 0;
+}
+
+static uint64_t memory_device_used_region_size(void)
+{
+uint64_t size = 0;
+
+memory_device_used_region_size_internal(qdev_get_machine(), );
+
+return size;
+}
+
+uint64_t memory_device_get_free_addr(uint64_t *hint, uint64_t align,
+ uint64_t size, Error **errp)
+{
+const uint64_t used_region_size = memory_device_used_region_size();
+uint64_t address_space_start, address_space_end;
+MachineState *machine = MACHINE(qdev_get_machine());
+MachineClass *mc = MACHINE_GET_CLASS(machine);
+MemoryHotplugState *hpms;
+GSList *list = NULL, *item;
+uint64_t new_addr = 0;
+
+if (!mc->get_memory_hotplug_state) {
+error_setg(errp, "memory devices (e.g. for memory hotplug) are not "
+ "supported by the machine");
+return 0;
+}
+
+hpms = mc->get_memory_hotplug_state(machine);
+if (!hpms || !memory_region_size(>mr)) {
+error_setg(errp, "memory devices (e.g. for memory hotplug) are not "
+ "enabled, please specify the maxmem option");
+return 0;
+}
+address_space_start = hpms->base;
+address_space_end = address_space_start + memory_region_size(>mr);
+g_assert(address_space_end >= address_space_start);
+
+if (used_region_size + size > machine->maxram_size - machine->ram_size) {
+error_setg(errp, "not enough space, currently 0x%" PRIx64
+   " in use of total hot pluggable 0x" RAM_ADDR_FMT,
+   used_region_size, machine->maxram_size - machine->ram_size);
+return 0;
+}
+
+if (hint && QEMU_ALIGN_UP(*hint, align) != *hint) {
+error_setg(errp, "address must be aligned to 0x%" PRIx64 " bytes",
+   align);
+return 0;
+}
+
+if (QEMU_ALIGN_UP(size, align) != size) {
+error_setg(errp, "backend memory size must be multiple of 0x%"
+   PRIx64, align);
+

[Qemu-devel] [PATCH v2 0/3] pc-dimm: factor out MemoryDevice

2018-04-12 Thread David Hildenbrand

Right now we can only map PCDIMM/NVDIMM into guest address space. In the
future, we might want to do the same for virtio devices - e.g.
virtio-pmem or virtio-mem. Especially, they should be able to live side
by side to each other.

E.g. the virto based memory devices regions will not be exposed via ACPI
and friends. They will be detected just like other virtio devices and
indicate the applicable memory region. This makes it possible to also use
them on architectures without memory device detection support (e.g. s390x).

Let's factor out the memory device code into a MemoryDevice interface.

See the patch descriptions for details.

v1 -> v2:
- Fix compile issues on ppc (still untested O:-) )

David Hildenbrand (3):
  pc-dimm: factor out MemoryDevice interface
  machine: make MemoryHotplugState accessible via the machine
  pc-dimm: factor out address space logic into MemoryDevice code

 hw/i386/acpi-build.c |   3 +-
 hw/i386/pc.c |  24 ++-
 hw/mem/Makefile.objs |   1 +
 hw/mem/memory-device.c   | 281 +
 hw/mem/pc-dimm.c | 304 +++
 hw/ppc/spapr.c   |  24 ++-
 hw/ppc/spapr_hcall.c |   1 +
 include/hw/boards.h  |  16 ++
 include/hw/mem/memory-device.h   |  48 +
 include/hw/mem/pc-dimm.h |  26 +--
 numa.c   |   3 +-
 qmp.c|   4 +-
 stubs/Makefile.objs  |   2 +-
 stubs/{qmp_pc_dimm.c => qmp_memory_device.c} |   4 +-
 14 files changed, 464 insertions(+), 277 deletions(-)
 create mode 100644 hw/mem/memory-device.c
 create mode 100644 include/hw/mem/memory-device.h
 rename stubs/{qmp_pc_dimm.c => qmp_memory_device.c} (61%)

-- 
2.14.3

[Qemu-devel] [PATCH v2 1/3] pc-dimm: factor out MemoryDevice interface

2018-04-12 Thread David Hildenbrand

On the qmp level, we already have the concept of memory devices:
"query-memory-devices"
Right now, we only support NVDIMM and PCDIMM.

We want to map other devices later into the address space of the guest.
Such device could e.g. be virtio devices. These devices will have a
guest memory range assigned but won't be exposed via e.g. ACPI. We want
to make them look like memory device, but not glued to pc-dimm.

Especially, it will not always be possible to have TYPE_PC_DIMM as a parent
class (e.g. virtio devices). Let's use an interface instead. As a first
part, convert handling of
- qmp_pc_dimm_device_list
- get_plugged_memory_size
to our new model. plug/unplug stuff etc. will follow later.

A memory device will have to provide the following functions:
- get_addr(): Necessary, as the property "addr" can e.g. not be used for
  virtio devices (already defined).
- get_plugged_size(): The amount this device offers to the guest as of
  now.
- get_region_size(): Because this can later on be bigger than the
 plugged size.
- fill_device_info(): Fill MemoryDeviceInfo, e.g. for qmp.

Signed-off-by: David Hildenbrand 
---
 hw/i386/acpi-build.c |   3 +-
 hw/mem/Makefile.objs |   1 +
 hw/mem/memory-device.c   | 119 +++
 hw/mem/pc-dimm.c | 119 ++-
 hw/ppc/spapr.c   |   3 +-
 hw/ppc/spapr_hcall.c |   1 +
 include/hw/mem/memory-device.h   |  44 ++
 include/hw/mem/pc-dimm.h |   2 -
 numa.c   |   3 +-
 qmp.c|   4 +-
 stubs/Makefile.objs  |   2 +-
 stubs/{qmp_pc_dimm.c => qmp_memory_device.c} |   4 +-
 12 files changed, 239 insertions(+), 66 deletions(-)
 create mode 100644 hw/mem/memory-device.c
 create mode 100644 include/hw/mem/memory-device.h
 rename stubs/{qmp_pc_dimm.c => qmp_memory_device.c} (61%)

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 3cf2a1679c..ca3645d57b 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -46,6 +46,7 @@
 #include "hw/acpi/vmgenid.h"
 #include "sysemu/tpm_backend.h"
 #include "hw/timer/mc146818rtc_regs.h"
+#include "hw/mem/memory-device.h"
 #include "sysemu/numa.h"
 
 /* Supported chipsets: */
@@ -2253,7 +2254,7 @@ build_tpm2(GArray *table_data, BIOSLinker *linker, GArray 
*tcpalog)
 static void build_srat_hotpluggable_memory(GArray *table_data, uint64_t base,
uint64_t len, int default_node)
 {
-MemoryDeviceInfoList *info_list = qmp_pc_dimm_device_list();
+MemoryDeviceInfoList *info_list = qmp_memory_device_list();
 MemoryDeviceInfoList *info;
 MemoryDeviceInfo *mi;
 PCDIMMDeviceInfo *di;
diff --git a/hw/mem/Makefile.objs b/hw/mem/Makefile.objs
index f12f8b97a2..10be4df2a2 100644
--- a/hw/mem/Makefile.objs
+++ b/hw/mem/Makefile.objs
@@ -1,2 +1,3 @@
 common-obj-$(CONFIG_MEM_HOTPLUG) += pc-dimm.o
+common-obj-$(CONFIG_MEM_HOTPLUG) += memory-device.o
 common-obj-$(CONFIG_NVDIMM) += nvdimm.o
diff --git a/hw/mem/memory-device.c b/hw/mem/memory-device.c
new file mode 100644
index 00..0e0d6b539a
--- /dev/null
+++ b/hw/mem/memory-device.c
@@ -0,0 +1,119 @@
+/*
+ * Memory Device Interface
+ *
+ * Copyright ProfitBricks GmbH 2012
+ * Copyright (C) 2014 Red Hat Inc
+ * Copyright (c) 2018 Red Hat Inc
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "hw/mem/memory-device.h"
+#include "hw/qdev.h"
+#include "qapi/error.h"
+#include "hw/boards.h"
+#include "qemu/range.h"
+
+static gint memory_device_addr_sort(gconstpointer a, gconstpointer b)
+{
+MemoryDeviceState *md_a = MEMORY_DEVICE(a);
+MemoryDeviceState *md_b = MEMORY_DEVICE(b);
+MemoryDeviceClass *mdc = MEMORY_DEVICE_GET_CLASS(a);
+const uint64_t addr_a = mdc->get_addr(md_a);
+const uint64_t addr_b = mdc->get_addr(md_b);
+
+if (addr_a > addr_b) {
+return 1;
+} else if (addr_a < addr_b) {
+return -1;
+}
+return 0;
+}
+
+static int memory_device_built_list(Object *obj, void *opaque)
+{
+GSList **list = opaque;
+
+if (object_dynamic_cast(obj, TYPE_MEMORY_DEVICE)) {
+DeviceState *dev = DEVICE(obj);
+if (dev->realized) { /* only realized memory devices matter */
+*list = g_slist_insert_sorted(*list, dev, memory_device_addr_sort);
+}
+}
+
+object_child_foreach(obj, memory_device_built_list, opaque);
+return 0;
+}
+
+MemoryDeviceInfoList *qmp_memory_device_list(void)
+{
+GSList *devices = NULL, *item;
+MemoryDeviceInfoList *list = NULL, *prev = NULL;
+
+object_child_foreach(qdev_get_machine(),

Re: [Qemu-devel] [PATCH v1 04/24] Makefile: Rename TARGET_DIRS to TARGET_LIST

2018-04-12 Thread Philippe Mathieu-Daudé

On 04/10/2018 04:38 PM, Alex Bennée wrote:
> From: Fam Zheng 
> 
> To be more accurate on its purpose and make code that looks for a certain
> target out of this variable more readable.
> 
> Signed-off-by: Fam Zheng 

Reviewed-by: Philippe Mathieu-Daudé 
Tested-by: Philippe Mathieu-Daudé 

> ---
>  Makefile   | 20 ++--
>  configure  |  2 +-
>  scripts/create_config  |  2 +-
>  tests/Makefile.include |  2 +-
>  4 files changed, 13 insertions(+), 13 deletions(-)
> 
> diff --git a/Makefile b/Makefile
> index 727ef118f3..2c54cd8345 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -62,8 +62,8 @@ seems to have been used for an in-tree build. You can fix 
> this by running \
>  endif
>  endif
>  
> -CONFIG_SOFTMMU := $(if $(filter %-softmmu,$(TARGET_DIRS)),y)
> -CONFIG_USER_ONLY := $(if $(filter %-user,$(TARGET_DIRS)),y)
> +CONFIG_SOFTMMU := $(if $(filter %-softmmu,$(TARGET_LIST)),y)
> +CONFIG_USER_ONLY := $(if $(filter %-user,$(TARGET_LIST)),y)
>  CONFIG_XEN := $(CONFIG_XEN_BACKEND)
>  CONFIG_ALL=y
>  -include config-all-devices.mak
> @@ -362,8 +362,8 @@ DOCS=
>  endif
>  
>  SUBDIR_MAKEFLAGS=$(if $(V),,--no-print-directory --quiet) 
> BUILD_DIR=$(BUILD_DIR)
> -SUBDIR_DEVICES_MAK=$(patsubst %, %/config-devices.mak, $(TARGET_DIRS))
> -SUBDIR_DEVICES_MAK_DEP=$(patsubst %, %-config-devices.mak.d, $(TARGET_DIRS))
> +SUBDIR_DEVICES_MAK=$(patsubst %, %/config-devices.mak, $(TARGET_LIST))
> +SUBDIR_DEVICES_MAK_DEP=$(patsubst %, %-config-devices.mak.d, $(TARGET_LIST))
>  
>  ifeq ($(SUBDIR_DEVICES_MAK),)
>  config-all-devices.mak:
> @@ -466,7 +466,7 @@ config-host.h-timestamp: config-host.mak
>  qemu-options.def: $(SRC_PATH)/qemu-options.hx $(SRC_PATH)/scripts/hxtool
>   $(call quiet-command,sh $(SRC_PATH)/scripts/hxtool -h < $< > 
> $@,"GEN","$@")
>  
> -SUBDIR_RULES=$(patsubst %,subdir-%, $(TARGET_DIRS))
> +SUBDIR_RULES=$(patsubst %,subdir-%, $(TARGET_LIST))
>  SOFTMMU_SUBDIR_RULES=$(filter %-softmmu,$(SUBDIR_RULES))
>  
>  $(SOFTMMU_SUBDIR_RULES): $(block-obj-y)
> @@ -510,7 +510,7 @@ ROMSUBDIR_RULES=$(patsubst %,romsubdir-%, $(ROMS))
>  romsubdir-%:
>   $(call quiet-command,$(MAKE) $(SUBDIR_MAKEFLAGS) -C pc-bios/$* V="$(V)" 
> TARGET_DIR="$*/" CFLAGS="$(filter -O% -g%,$(CFLAGS))",)
>  
> -ALL_SUBDIRS=$(TARGET_DIRS) $(patsubst %,pc-bios/%, $(ROMS))
> +ALL_SUBDIRS=$(TARGET_LIST) $(patsubst %,pc-bios/%, $(ROMS))
>  
>  recurse-all: $(SUBDIR_RULES) $(ROMSUBDIR_RULES)
>  
> @@ -763,7 +763,7 @@ distclean: clean
>   rm -f docs/interop/qemu-qmp-ref.pdf docs/interop/qemu-ga-ref.pdf
>   rm -f docs/interop/qemu-qmp-ref.html docs/interop/qemu-ga-ref.html
>   rm -f docs/qemu-block-drivers.7
> - for d in $(TARGET_DIRS); do \
> + for d in $(TARGET_LIST); do \
>   rm -rf $$d || exit 1 ; \
>  done
>   rm -Rf .sdk
> @@ -864,7 +864,7 @@ endif
>   $(INSTALL_DATA) $(SRC_PATH)/pc-bios/keymaps/$$x 
> "$(DESTDIR)$(qemu_datadir)/keymaps"; \
>   done
>   $(INSTALL_DATA) $(BUILD_DIR)/trace-events-all 
> "$(DESTDIR)$(qemu_datadir)/trace-events-all"
> - for d in $(TARGET_DIRS); do \
> + for d in $(TARGET_LIST); do \
>   $(MAKE) $(SUBDIR_MAKEFLAGS) TARGET_DIR=$$d/ -C $$d $@ || exit 1 ; \
>  done
>  
> @@ -1062,9 +1062,9 @@ endif
>   @echo  '  ctags/TAGS  - Generate tags file for editors'
>   @echo  '  cscope  - Generate cscope index'
>   @echo  ''
> - @$(if $(TARGET_DIRS), \
> + @$(if $(TARGET_LIST), \
>   echo 'Architecture specific targets:'; \
> - $(foreach t, $(TARGET_DIRS), \
> + $(foreach t, $(TARGET_LIST), \
>   printf "  %-30s - Build for %s\\n" $(patsubst %,subdir-%,$(t)) 
> $(t);) \
>   echo '')
>   @echo  'Cleaning targets:'
> diff --git a/configure b/configure
> index add87ff4d4..5a41c87cc3 100755
> --- a/configure
> +++ b/configure
> @@ -6094,7 +6094,7 @@ qemu_version=$(head $source_path/VERSION)
>  echo "VERSION=$qemu_version" >>$config_host_mak
>  echo "PKGVERSION=$pkgversion" >>$config_host_mak
>  echo "SRC_PATH=$source_path" >> $config_host_mak
> -echo "TARGET_DIRS=$target_list" >> $config_host_mak
> +echo "TARGET_LIST=$target_list" >> $config_host_mak
>  if [ "$docs" = "yes" ] ; then
>echo "BUILD_DOCS=yes" >> $config_host_mak
>  fi
> diff --git a/scripts/create_config b/scripts/create_config
> index d727e5e36e..58948a67a4 100755
> --- a/scripts/create_config
> +++ b/scripts/create_config
> @@ -107,7 +107,7 @@ case $line in
>  target_name=${line#*=}
>  echo "#define TARGET_NAME \"$target_name\""
>  ;;
> - TARGET_DIRS=*)
> + TARGET_LIST=*)
>  # do nothing
>  ;;
>   TARGET_*=y) # configuration
> diff --git a/tests/Makefile.include b/tests/Makefile.include
> index 3b9a5e31a2..3d2f0458ab 100644
> --- a/tests/Makefile.include
> +++ b/tests/Makefile.include
> @@ -854,7 +854,7 @@ endif
>  
>  # QTest rules
>  
> -TARGETS=$(patsubst

[Qemu-devel] buffer sharing across VMs - xen-zcopy and hyper_dmabuf discussion

2018-04-12 Thread Dongwon Kim

(changed subject and decoupling from udmabuf thread)

On Wed, Apr 11, 2018 at 08:59:32AM +0300, Oleksandr Andrushchenko wrote:
> On 04/10/2018 08:26 PM, Dongwon Kim wrote:
> >On Tue, Apr 10, 2018 at 09:37:53AM +0300, Oleksandr Andrushchenko wrote:
> >>On 04/06/2018 09:57 PM, Dongwon Kim wrote:
> >>>On Fri, Apr 06, 2018 at 03:36:03PM +0300, Oleksandr Andrushchenko wrote:
> On 04/06/2018 02:57 PM, Gerd Hoffmann wrote:
> >   Hi,
> >
> >>>I fail to see any common ground for xen-zcopy and udmabuf ...
> >>Does the above mean you can assume that xen-zcopy and udmabuf
> >>can co-exist as two different solutions?
> >Well, udmabuf route isn't fully clear yet, but yes.
> >
> >See also gvt (intel vgpu), where the hypervisor interface is abstracted
> >away into a separate kernel modules even though most of the actual vgpu
> >emulation code is common.
> Thank you for your input, I'm just trying to figure out
> which of the three z-copy solutions intersect and how much
> >>And what about hyper-dmabuf?
> >>>xen z-copy solution is pretty similar fundamentally to hyper_dmabuf
> >>>in terms of these core sharing feature:
> >>>
> >>>1. the sharing process - import prime/dmabuf from the producer -> extract
> >>>underlying pages and get those shared -> return references for shared pages
> >Another thing is danvet was kind of against to the idea of importing existing
> >dmabuf/prime buffer and forward it to the other domain due to synchronization
> >issues. He proposed to make hyper_dmabuf only work as an exporter so that it
> >can have a full control over the buffer. I think we need to talk about this
> >further as well.
> Yes, I saw this. But this limits the use-cases so much.

I agree. Our current approach is a lot more flexible. You can find very
similar feedback in my reply to those review messages. However, I also
understand Daniel's concern as well. I believe we need more dicussion
regarding this matter.

> For instance, running Android as a Guest (which uses ION to allocate
> buffers) means that finally HW composer will import dma-buf into
> the DRM driver. Then, in case of xen-front for example, it needs to be
> shared with the backend (Host side). Of course, we can change user-space
> to make xen-front allocate the buffers (make it exporter), but what we try
> to avoid is to change user-space which in normal world would have remain
> unchanged otherwise.
> So, I do think we have to support this use-case and just have to understand
> the complexity.
> 
> >
> >danvet, can you comment on this topic?
> >
> >>>2. the page sharing mechanism - it uses Xen-grant-table.
> >>>
> >>>And to give you a quick summary of differences as far as I understand
> >>>between two implementations (please correct me if I am wrong, Oleksandr.)
> >>>
> >>>1. xen-zcopy is DRM specific - can import only DRM prime buffer
> >>>while hyper_dmabuf can export any dmabuf regardless of originator
> >>Well, this is true. And at the same time this is just a matter
> >>of extending the API: xen-zcopy is a helper driver designed for
> >>xen-front/back use-case, so this is why it only has DRM PRIME API
> >>>2. xen-zcopy doesn't seem to have dma-buf synchronization between two VMs
> >>>while (as danvet called it as remote dmabuf api sharing) hyper_dmabuf sends
> >>>out synchronization message to the exporting VM for synchronization.
> >>This is true. Again, this is because of the use-cases it covers.
> >>But having synchronization for a generic solution seems to be a good idea.
> >Yeah, understood xen-zcopy works ok with your use case. But I am just curious
> >if it is ok not to have any inter-domain synchronization in this sharing 
> >model.
> The synchronization is done with displif protocol [1]
> >The buffer being shared is technically dma-buf and originator needs to be 
> >able
> >to keep track of it.
> As I am working in DRM terms the tracking is done by the DRM core
> for me for free. (This might be one of the reasons Daniel sees DRM
> based implementation fit very good from code-reuse POV).

yeah but once you have a DRM object (whether it's dmabuf or not) on a remote
domain, it is totally new object and out of sync (correct me if I am wrong)
with original DRM prime, isn't it? How could these two different but based on
same pages be synchronized?

> >
> >>>3. 1-level references - when using grant-table for sharing pages, there 
> >>>will
> >>>be same # of refs (each 8 byte)
> >>To be precise, grant ref is 4 bytes
> >You are right. Thanks for correction.;)
> >
> >>>as # of shared pages, which is passed to
> >>>the userspace to be shared with importing VM in case of xen-zcopy.
> >>The reason for that is that xen-zcopy is a helper driver, e.g.
> >>the grant references come from the display backend [1], which implements
> >>Xen display protocol [2]. So, effectively the backend extracts references
> >>from frontend's requests and passes those to xen-zcopy as an array
> >>of refs.
> >>>  Compared
> >>>to

Re: [Qemu-devel] [PATCH 07/19] block: Really pause block jobs on drain

2018-04-12 Thread Kevin Wolf

Am 12.04.2018 um 15:42 hat Paolo Bonzini geschrieben:
> On 12/04/2018 15:27, Kevin Wolf wrote:
> > Not sure I follow. Let's look at an example. Say, we have a block job
> > BlockBackend as the root (because that uses proper layering, unlike
> > devices which use aio_disable_external()), connected to a qcow2 node
> > over file.
> > 
> > 1. The block job issues a request to the qcow2 node
> > 
> > 2. In order to process that request, the qcow2 node internally issues
> >one or more requests to the file node
> > 
> > 3. Someone calls bdrv_drained_begin(qcow2_node)
> > 
> > 4. We call BlockDriver.bdrv_drained_begin() for qcow2_node and
> >file_node, and BdrvChildRole.drained_begin() for the block job.
> > 
> >Now the block nodes don't create any new original requests any more
> >(qcow2 and file don't do that anyway; qcow2 only creates requests to
> >its children in the context of parent requests). The job potentially
> >continues sending requests until it reaches the next pause point. The
> >previously issued requests are still in flight.
> >
> >Is this step what you meant by X->drv->bdrv_drain(X)? I don't see why
> >pending requests can only be in X's children. Why can't the request
> >be pending in X itself, say waiting for a thread pool worker
> >decrypting a buffer?
> 
> No, that step is ->bdrv_co_drain_begin in BlockDriver.  It's where the
> "last" requests are sent to file_node after we know that qcow2_node
> won't get any more requests.

That's not really how .bdrv_co_drain_begin works. It is called first,
potentially long before we know that the parent won't send new requests
any more. All it does is preventing the node from creating new original
requests, i.e. internal requests that are not triggered by a parent
request.

> >Also, note that after this series, the driver callbacks are called
> >asynchronously, but I don't think it makes a big difference here.
> > 
> > 5. The file_node request completes. file_node doesn't have any requests
> >in flight any more, but in theory it could still get new requests
> >from qcow2_node. Anyway, let's say this is the last request, so I
> >think we'd call its requests concluded?
> 
> No, if it can still get more requests they're not concluded.

Okay. In that case, we can't really determine any order, because the
whole subtree concludes when the root node concludes. (Ignoring any
additional parents, but the effect is the same: Children always conclude
at the same time as their last parent.)

> That's why we need to first ensure qcow2_node is quiescent, and before
> then we need to ensure that the BlockBackends are quiescent (in this
> case meaning the job has reached its pause point).  Only then we can
> look at file_node.  In this case we'll see that we have nothing to
> do---file_node is already quiescent---and bdrv_drained_begin() can
> return.

bdrv_drained_begin(qcow2_node) actually never looks at file_node. Only
bdrv_subtree_drained_begin(qcow2_node) would do so.

But what you're saying is essentially that for a subtree drain, we want
the following order in bdrv_drained_poll():

1. Check if a parent still has pending requests. If so, don't bother
   checking the rest.

2. Check the node the drain request was for. If so, don't bother looking
   at the children.

3. Finally, if all parents and the node itself are drained, look at the
   children.

This is already the order we have there. What is probably different from
what you envision is that after the parents have concluded, we still
check that they are still quiescent in every iteration.

What we could do easily is introducing a bool bs->quiesce_concluded or
something that bdrv_drain_poll() sets to true the first time that it
returns false. It also gets a shortcut so that it returns false
immediately if bs->quiesce_concluded is true. The field is reset in
bdrv_do_drained_end() when bs->quiesce_counter reaches 0.

That would be a rather simple optimisation that could be done as the
final patch in this series, and would ensure that we don't recurse back
to parents that are already quiescent.

Would you be happy with this change?

Kevin

Re: [Qemu-devel] [PATCH v2 1/2 for-2.12?] qapi: Parameter gl of DisplayType now accept an enum

2018-04-12 Thread Elie Tournier

Hello,

On Tue, Apr 10, 2018 at 03:33:35PM +0200, Gerd Hoffmann wrote:
> > # @off: Disable OpenGL (default).

Just to be sure, I have to add @ in front of all parameter, right?

> > 
> > > + # 'on'Use OpenGL, pick context type automatically.
> > > + # Would better be named 'auto' but is called 'on' for backward
> > > + # compatibility with bool type.
> > 
> > See below...
> 
> > DisplayOptions was added in 2.12.  This is a backwards-incompatible
> > change in QMP (you CANNOT change 'bool' to 'DisplayGLMode' across
> > releases, because the on-the-wire representation differs; pre-patch it
> > would be "gl":true, post-patch it is "gl":"on").  So if it affects QMP,
> > and we want it, this patch either HAS to go in 2.12, or we have to have
> > more finesse (perhaps by using an 'alternate' type in the QAPI) so that
> > it remains backwards-compatible.
> > 
> > /me goes and looks at introspection output...
> > 
> > You may be in luck - there is no instance of 'window-close' in the
> > introspection output, which means 'DisplayType' exists only for ease of
> > command-line parsing and is not currently used by QMP, so tweaks here
> > are not affecting the command line.
> 
> Yes, right now the struct is only used to store the parsed command line
> opts, so no effect on QMP.
> 
> Plan for the future is to also parse command line options with generic
> qapi/json code instead of the home-grown parser, but that switch didn't
> happen yet.
> 
> > That said, you can STILL name the enum value something smarter than 'on'
> > IF you make the change now, for 2.12, given that the QAPI type was only
> > introduced in 2.12 (you only have to worry about backwards compatibility
> > if 2.11 already parsed gl=on).
> 
> gl=on is older, so that must continue to work.  Making both "on" and
> "auto" work isn't a problem for our home-grown parser (aka
> parse_display() in vl.c).  But having quirks like this makes the switch
> to generic parser code more difficuilt, so I'd prefer to avoid that ...

Is it possible to upstream this change before 2.12 release?

> 
> cheers,
>   Gerd
>

Re: [Qemu-devel] [PATCH] migration: discard RAMBlocks of type ram_device

2018-04-12 Thread Paolo Bonzini

On 12/04/2018 15:51, Peter Maydell wrote:
> Paolo may have an opinion what the API here should be, but
> at the MemoryRegion level we already have a mix of functions
> memory_region_init_foo_nomigrate() and memory_region_init_foo(),
> which at the moment just control whether we call
> vmstate_register_ram() or not. Is it possible to hook into that,
> so that we default to marking RAMBlocks as not-migrated, and
> when vmstate_register_ram() is called we set the flag to
> "migrated"?

Yes, that sounds pretty.

Paolo

> NB: this probably requires careful thought to make sure we
> don't accidentally break migration compat, but it seems like
> the cleaner approach. At the moment we do weird things if you
> migrate a RAM block that hasn't had its ID string set, IIRC,
> so just not migrating those RAM blocks seems like a better
> plan than mishandling them. It would also mean that the
> vmstate_register/unregister_ram() functions did what they
> claim to do based on the function name, I think.

[Qemu-devel] [PATCH for-2.12] linux-user/signal.c: Put AArch64 frame record in the right place

2018-04-12 Thread Peter Maydell

AArch64 stack frames include a 'frame record' which holds a pointer
to the next frame record in the chain and the LR on entry to the
function. The procedure calling standard doesn't mandate where
exactly this frame record is in the stack frame, but for signal
frames the kernel puts it right at the top. We used to put it
there too, but in commit 7f0f4208b3a96f22 we accidentally put
the "enlarge to the 4K reserved space minimum" check after the
"allow for the frame record" code, rather than before it, with
the effect that the frame record would be inside the reserved
space and immediately after the last used part of it.

Move the frame record back out of the reserved space to where
we used to put it.

This bug shouldn't break any sensible guest code, but test
programs that deliberately look at the internal details
of the signal frame layout will not find what they are
expecting to see.

Fixes: 7f0f4208b3a96f22
Signed-off-by: Peter Maydell 
---
I'm marking this as for-2.12 on the basis that it puts our frame
layout back to exactly what 2.11 had, and so seems safest.
No sensible guest code should really care, though, so this is
in the "only if we're doing an rc4" bucket; but I think that the
softfloat fixes deserve an rc4 anyway.

 linux-user/signal.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/linux-user/signal.c b/linux-user/signal.c
index 8d9e6e8410..e6dfe0adfd 100644
--- a/linux-user/signal.c
+++ b/linux-user/signal.c
@@ -1843,6 +1843,12 @@ static void target_setup_frame(int usig, struct 
target_sigaction *ka,
 layout.total_size += sizeof(struct target_aarch64_ctx);
 }
 
+/* We must always provide at least the standard 4K reserved space,
+ * even if we don't use all of it (this is part of the ABI)
+ */
+layout.total_size = MAX(layout.total_size,
+sizeof(struct target_rt_sigframe));
+
 /* Reserve space for the return code.  On a real system this would
  * be within the VDSO.  So, despite the name this is not a "real"
  * record within the frame.
@@ -1850,12 +1856,6 @@ static void target_setup_frame(int usig, struct 
target_sigaction *ka,
 fr_ofs = layout.total_size;
 layout.total_size += sizeof(struct target_rt_frame_record);
 
-/* We must always provide at least the standard 4K reserved space,
- * even if we don't use all of it (this is part of the ABI)
- */
-layout.total_size = MAX(layout.total_size,
-sizeof(struct target_rt_sigframe));
-
 frame_addr = get_sigframe(ka, env, layout.total_size);
 trace_user_setup_frame(env, frame_addr);
 if (!lock_user_struct(VERIFY_WRITE, frame, frame_addr, 0)) {
-- 
2.16.2

[Qemu-devel] [PATCH qemu v3] device_tree: Increase FDT_MAX_SIZE to 1 MiB

2018-04-12 Thread Geert Uytterhoeven

It is not uncommon for a contemporary FDT to be larger than 64 KiB,
leading to failures loading the device tree from sysfs:

qemu-system-aarch64: qemu_fdt_setprop: Couldn't set ...: FDT_ERR_NOSPACE

Hence increase the limit to 1 MiB, like on PPC.

For reference, the largest arm64 DTB created from the Linux sources is
ca. 75 KiB large (100 KiB when built with symbols/fixup support).

Signed-off-by: Geert Uytterhoeven 
---
v3:
  - Update example size figures,

v2:
  - Enlarge from 128 KiB to 1 MiB, as suggested by Peter Maydell.
---
 device_tree.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/device_tree.c b/device_tree.c
index 19458b32bf81e55e..52c3358a55838d33 100644
--- a/device_tree.c
+++ b/device_tree.c
@@ -29,7 +29,7 @@
 
 #include 
 
-#define FDT_MAX_SIZE  0x1
+#define FDT_MAX_SIZE  0x10
 
 void *create_device_tree(int *sizep)
 {
-- 
2.7.4

Re: [Qemu-devel] [PATCH] migration: discard RAMBlocks of type ram_device

2018-04-12 Thread Peter Maydell

On 12 April 2018 at 14:41, Cédric Le Goater  wrote:
> On 04/12/2018 02:08 PM, Peter Maydell wrote:
>> On 12 April 2018 at 12:53, Dr. David Alan Gilbert  
>> wrote:
>>> * Peter Maydell (peter.mayd...@linaro.org) wrote:
 David suggested on IRC that we would want a flag on the ramblock
 for "not migratable", because there are other uses for "don't
 migrate this" than just "is this a ram device".
>>>
>>> My original suggestion to your series was with a flag, but I'd forgotten
>>> about that by the time I'd made the suggestion to Cédric.
>>> In your case would just adding an extra term to the
>>> ram_block_is_migratable function work, or do you really need a flag?
>>
>> I don't see how else you would identify the ram block that needs
>> to be skipped. Also I think it's just better design to decouple
>> the decision about "should we migrate this ram block" from the
>> migration code itself, and push it up to the code layer that knows
>> it's creating ram blocks that shouldn't be migrated.
>
> Do you mean adding a new RAMBlock flag RAM_NON_MIGRATABLE
> in exec.c ? That would require to add an extra bool to the
> following functions :
>
> memory_region_init_ram_ptr()
> qemu_ram_alloc_from_ptr()
> qemu_ram_alloc_internal()
>
> Would that be ok ?

Paolo may have an opinion what the API here should be, but
at the MemoryRegion level we already have a mix of functions
memory_region_init_foo_nomigrate() and memory_region_init_foo(),
which at the moment just control whether we call
vmstate_register_ram() or not. Is it possible to hook into that,
so that we default to marking RAMBlocks as not-migrated, and
when vmstate_register_ram() is called we set the flag to
"migrated"?

NB: this probably requires careful thought to make sure we
don't accidentally break migration compat, but it seems like
the cleaner approach. At the moment we do weird things if you
migrate a RAM block that hasn't had its ID string set, IIRC,
so just not migrating those RAM blocks seems like a better
plan than mishandling them. It would also mean that the
vmstate_register/unregister_ram() functions did what they
claim to do based on the function name, I think.

thanks
-- PMM

Re: [Qemu-devel] [PATCH 07/19] block: Really pause block jobs on drain

2018-04-12 Thread Paolo Bonzini

On 12/04/2018 15:27, Kevin Wolf wrote:
> Not sure I follow. Let's look at an example. Say, we have a block job
> BlockBackend as the root (because that uses proper layering, unlike
> devices which use aio_disable_external()), connected to a qcow2 node
> over file.
> 
> 1. The block job issues a request to the qcow2 node
> 
> 2. In order to process that request, the qcow2 node internally issues
>one or more requests to the file node
> 
> 3. Someone calls bdrv_drained_begin(qcow2_node)
> 
> 4. We call BlockDriver.bdrv_drained_begin() for qcow2_node and
>file_node, and BdrvChildRole.drained_begin() for the block job.
> 
>Now the block nodes don't create any new original requests any more
>(qcow2 and file don't do that anyway; qcow2 only creates requests to
>its children in the context of parent requests). The job potentially
>continues sending requests until it reaches the next pause point. The
>previously issued requests are still in flight.
>
>Is this step what you meant by X->drv->bdrv_drain(X)? I don't see why
>pending requests can only be in X's children. Why can't the request
>be pending in X itself, say waiting for a thread pool worker
>decrypting a buffer?

No, that step is ->bdrv_co_drain_begin in BlockDriver.  It's where the
"last" requests are sent to file_node after we know that qcow2_node
won't get any more requests.

>Also, note that after this series, the driver callbacks are called
>asynchronously, but I don't think it makes a big difference here.
> 
> 5. The file_node request completes. file_node doesn't have any requests
>in flight any more, but in theory it could still get new requests
>from qcow2_node. Anyway, let's say this is the last request, so I
>think we'd call its requests concluded?

No, if it can still get more requests they're not concluded.  That's why
we need to first ensure qcow2_node is quiescent, and before then we need
to ensure that the BlockBackends are quiescent (in this case meaning the
job has reached its pause point).  Only then we can look at file_node.
In this case we'll see that we have nothing to do---file_node is already
quiescent---and bdrv_drained_begin() can return.

> 6. qcow2 still has a request in flight, but doesn't need to access
>file_node for it. It finishes the work and therefore concludes its
>requests as well. Note that qcow2_node (the parent) concludes after
>file_node (the child).
> 
> 7. We'll keep the example simple, so after completion of its request,
>the job reaches a pause point without sending a new request. Again,
>this happens after qcow2_node has concluded.
> 
> 8. Only when neither file_node nor qcow2_node have a request in flight
>and the job has reached a pause point, bdrv_drained_begin() can
>return.
> 
> So completing the last request and reaching an actually quiescent state
> looks very much like a process in child-to-parent order to me?

Re: [Qemu-devel] [PATCH] migration: discard RAMBlocks of type ram_device

2018-04-12 Thread Cédric Le Goater

On 04/12/2018 02:08 PM, Peter Maydell wrote:
> On 12 April 2018 at 12:53, Dr. David Alan Gilbert  wrote:
>> * Peter Maydell (peter.mayd...@linaro.org) wrote:
>>> David suggested on IRC that we would want a flag on the ramblock
>>> for "not migratable", because there are other uses for "don't
>>> migrate this" than just "is this a ram device".
>>
>> My original suggestion to your series was with a flag, but I'd forgotten
>> about that by the time I'd made the suggestion to Cédric.
>> In your case would just adding an extra term to the
>> ram_block_is_migratable function work, or do you really need a flag?
> 
> I don't see how else you would identify the ram block that needs
> to be skipped. Also I think it's just better design to decouple
> the decision about "should we migrate this ram block" from the
> migration code itself, and push it up to the code layer that knows
> it's creating ram blocks that shouldn't be migrated.

Do you mean adding a new RAMBlock flag RAM_NON_MIGRATABLE 
in exec.c ? That would require to add an extra bool to the 
following functions :

memory_region_init_ram_ptr()
qemu_ram_alloc_from_ptr()
qemu_ram_alloc_internal()

Would that be ok ? 

Thanks,

C.

Re: [Qemu-devel] [Bug 1759264] Re: fpu/softfloat: round_to_int_and_pack refactor broke TriCore ftoi insns

2018-04-12 Thread Alex Bennée


Bastian Koppelmann  writes:

> On 04/11/2018 01:01 PM, Alex Bennée wrote:
>> Bastian Koppelmann  writes:
>>
>>> On 04/10/2018 10:07 PM, Alex Bennée wrote:
 Yeah it looks like it was missed, the round_to_uint code does it.

 Do you have a test case I can verify?

>>>
>>> For the NaN input 0x the expected result for the flags is that
>>> flag_invalid is raised.
>>>
>>> I can provide you with some TriCore asm, but it is a bit of pain to get
>>> the gnu assembler to build, since the public version is a decade old.
>>
>> I'll trust you if you send me a static binary for this particular
>> verification ;-)
>
> I set up a github repo with working binutils and the corresponding
> testcase:
>
> https://github.com/bkoppelmann/tricore-fpu
>
> The one caveat is, that we cannot produce any binaries with the TriCore
> ISA > 1.3.
>
> In this testcase the last ftoi instruction is supposed to raise the
> invalid flag and return 0 since the input was NaN. We did that by only
> checking for NaN if any flag was raised after ftoi, then do the NaN
> check, and if positive, return 0.

Well it builds and I get an fpu-test.elf but I'm a bit stuck on how to
run it. What are the runes for launching the test?

>
> Cheers,
> Bastian


--
Alex Bennée

Re: [Qemu-devel] [PATCH] block/gluster: defend on legacy ftruncate api use

2018-04-12 Thread Niels de Vos

This change looks good to me, but a commit message would have been
helpful. I suggest something like this:

  Gluster 4.0 changed the signature of glfs_ftruncate(). The function
  now has two additional arguments, namely prestat and poststat. These
  provide not benefit for QEMU, so ignoring them and passing NULL makes
  the gluster-block driver compile with the new Gluster version again.

And maybe add this too:

  Glusters libgfapi uses symbol versioning and provides backwards
  compatible functions. Binaries compiled against previous versions of
  Gluster keep on functioning with the new library.

Thanks!

Reviewed-by: Niels de Vos 


On Thu, Apr 12, 2018 at 04:46:14PM +0530, Prasanna Kumar Kalever wrote:
> Signed-off-by: Prasanna Kumar Kalever 
> ---
>  block/gluster.c | 15 +--
>  configure   |  8 
>  2 files changed, 21 insertions(+), 2 deletions(-)
> 
> diff --git a/block/gluster.c b/block/gluster.c
> index 4adc1a875b..2474580ad6 100644
> --- a/block/gluster.c
> +++ b/block/gluster.c
> @@ -996,6 +996,7 @@ static int qemu_gluster_do_truncate(struct glfs_fd *fd, 
> int64_t offset,
>  PreallocMode prealloc, Error **errp)
>  {
>  int64_t current_length;
> +int ret;
>  
>  current_length = glfs_lseek(fd, 0, SEEK_END);
>  if (current_length < 0) {
> @@ -1023,7 +1024,12 @@ static int qemu_gluster_do_truncate(struct glfs_fd 
> *fd, int64_t offset,
>  #endif /* CONFIG_GLUSTERFS_FALLOCATE */
>  #ifdef CONFIG_GLUSTERFS_ZEROFILL
>  case PREALLOC_MODE_FULL:
> -if (glfs_ftruncate(fd, offset)) {
> +#ifdef CONFIG_GLUSTERFS_LEGACY_FTRUNCATE
> +ret = glfs_ftruncate(fd, offset);
> +#else
> +ret = glfs_ftruncate(fd, offset, NULL, NULL);
> +#endif
> +if (ret) {
>  error_setg_errno(errp, errno, "Could not resize file");
>  return -errno;
>  }
> @@ -1034,7 +1040,12 @@ static int qemu_gluster_do_truncate(struct glfs_fd 
> *fd, int64_t offset,
>  break;
>  #endif /* CONFIG_GLUSTERFS_ZEROFILL */
>  case PREALLOC_MODE_OFF:
> -if (glfs_ftruncate(fd, offset)) {
> +#ifdef CONFIG_GLUSTERFS_LEGACY_FTRUNCATE
> +ret = glfs_ftruncate(fd, offset);
> +#else
> +ret = glfs_ftruncate(fd, offset, NULL, NULL);
> +#endif
> +if (ret) {
>  error_setg_errno(errp, errno, "Could not resize file");
>  return -errno;
>  }
> diff --git a/configure b/configure
> index 0a19b033bc..69827b0098 100755
> --- a/configure
> +++ b/configure
> @@ -429,6 +429,7 @@ glusterfs_xlator_opt="no"
>  glusterfs_discard="no"
>  glusterfs_fallocate="no"
>  glusterfs_zerofill="no"
> +glusterfs_legacy_ftruncate="no"
>  gtk=""
>  gtkabi=""
>  gtk_gl="no"
> @@ -3856,6 +3857,9 @@ if test "$glusterfs" != "no" ; then
>glusterfs_fallocate="yes"
>glusterfs_zerofill="yes"
>  fi
> +if ! $pkg_config --atleast-version=7.4 glusterfs-api; then
> +  glusterfs_legacy_ftruncate="yes"
> +fi
>else
>  if test "$glusterfs" = "yes" ; then
>feature_not_found "GlusterFS backend support" \
> @@ -6502,6 +6506,10 @@ if test "$glusterfs_zerofill" = "yes" ; then
>echo "CONFIG_GLUSTERFS_ZEROFILL=y" >> $config_host_mak
>  fi
>  
> +if test "$glusterfs_legacy_ftruncate" = "yes" ; then
> +  echo "CONFIG_GLUSTERFS_LEGACY_FTRUNCATE=y" >> $config_host_mak
> +fi
> +
>  if test "$libssh2" = "yes" ; then
>echo "CONFIG_LIBSSH2=m" >> $config_host_mak
>echo "LIBSSH2_CFLAGS=$libssh2_cflags" >> $config_host_mak
> -- 
> 2.14.3
>

Re: [Qemu-devel] [PATCH 07/19] block: Really pause block jobs on drain

2018-04-12 Thread Kevin Wolf

Am 12.04.2018 um 14:02 hat Paolo Bonzini geschrieben:
> On 12/04/2018 13:53, Kevin Wolf wrote:
> >> The problem I have is that there is a direction through which I/O flows
> >> (parent-to-child), so why can't draining follow that natural direction.
> >> Having to check for the parents' I/O, while draining the child, seems
> >> wrong.  Perhaps we can't help it, but I cannot understand the reason.
> > I'm not sure what's there that could be not understood. You already
> > confirmed that we need to drain the parents, too, when we drain a node.
> > Drain really must propagate in the opposite direction of I/O, because
> > part of its job is to quiesce the origin of any I/O to the node that
> > should be drained. Opposite of I/O _is_ the natural direction for drain.
> 
> Opposite of I/O is the natural direction for drain to propagate, yes.
> 
> However, I/O direction is the natural direction for requests to stop.
> After quiescing X and calling X->drv->bdrv_drain(X), there can be
> pending requests only in X's children.  So I don't understand why you
> need to keep checking in_flight over the whole subgraph, when there are
> roots that will conclude their request first, and then their children,
> and so on so forth.

Not sure I follow. Let's look at an example. Say, we have a block job
BlockBackend as the root (because that uses proper layering, unlike
devices which use aio_disable_external()), connected to a qcow2 node
over file.

1. The block job issues a request to the qcow2 node

2. In order to process that request, the qcow2 node internally issues
   one or more requests to the file node

3. Someone calls bdrv_drained_begin(qcow2_node)

4. We call BlockDriver.bdrv_drained_begin() for qcow2_node and
   file_node, and BdrvChildRole.drained_begin() for the block job.

   Now the block nodes don't create any new original requests any more
   (qcow2 and file don't do that anyway; qcow2 only creates requests to
   its children in the context of parent requests). The job potentially
   continues sending requests until it reaches the next pause point. The
   previously issued requests are still in flight.

   Is this step what you meant by X->drv->bdrv_drain(X)? I don't see why
   pending requests can only be in X's children. Why can't the request
   be pending in X itself, say waiting for a thread pool worker
   decrypting a buffer?

   Also, note that after this series, the driver callbacks are called
   asynchronously, but I don't think it makes a big difference here.

5. The file_node request completes. file_node doesn't have any requests
   in flight any more, but in theory it could still get new requests
   from qcow2_node. Anyway, let's say this is the last request, so I
   think we'd call its requests concluded?

6. qcow2 still has a request in flight, but doesn't need to access
   file_node for it. It finishes the work and therefore concludes its
   requests as well. Note that qcow2_node (the parent) concludes after
   file_node (the child).

7. We'll keep the example simple, so after completion of its request,
   the job reaches a pause point without sending a new request. Again,
   this happens after qcow2_node has concluded.

8. Only when neither file_node nor qcow2_node have a request in flight
   and the job has reached a pause point, bdrv_drained_begin() can
   return.

So completing the last request and reaching an actually quiescent state
looks very much like a process in child-to-parent order to me?

Kevin

Re: [Qemu-devel] [PATCH v1 2/2] fpu/softfloat: raise float_invalid for NaN in float_to_int

2018-04-12 Thread Alex Bennée


Peter Maydell  writes:

> On 12 April 2018 at 12:58, Alex Bennée  wrote:
>> Fixes https://bugs.launchpad.net/qemu/+bug/1759264
>>
>> Signed-off-by: Alex Bennée 
>> Cc: Bastian Koppelmann 
>> ---
>>  fpu/softfloat.c | 1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/fpu/softfloat.c b/fpu/softfloat.c
>> index 9b99aa6ec8..ddc77c273c 100644
>> --- a/fpu/softfloat.c
>> +++ b/fpu/softfloat.c
>> @@ -1344,6 +1344,7 @@ static int64_t round_to_int_and_pack(FloatParts in, 
>> int rmode,
>>  case float_class_qnan:
>>  case float_class_dnan:
>>  case float_class_msnan:
>> +s->float_exception_flags = orig_flags | float_flag_invalid;
>>  return max;
>>  case float_class_inf:
>>  return p.sign ? min : max;
>
> Don't we also need to raise the Invalid flag for float_class_inf ?
>
> In both cases, this is fixing a regression introduced in
> commit ab52f973a50, I think.

And I guess the round_to_uint_and_pack as well which doesn't handle the
input inf case.

>
> thanks
> -- PMM


--
Alex Bennée

Re: [Qemu-devel] [PATCH v7 4/8] migration: Create socket-address parameter

2018-04-12 Thread Dr. David Alan Gilbert

* Juan Quintela (quint...@redhat.com) wrote:
> It will be used to store the uri parameters. We want this only for
> tcp, so we don't set it for other uris.  We need it to know what port
> is migration running.
> 
> Signed-off-by: Juan Quintela 
> 
> --
> 
> This used to be uri parameter, but it has so many troubles to
> reproduce that it don't just make sense.
> 
> This used to be a port parameter.  I was asked to move to
> SocketAddress, done.
> I also merged the setting of the migration tcp port in this one
> because now I need to free the address, and this makes it easier.
> This used to be x-socket-address with a single direction, now it is a
> list of addresses.

Is there a reason it's a parameter rather than just an entry in
MigrationInfo?

Dave

> ---
>  hmp.c | 14 ++
>  migration/migration.c | 25 +
>  migration/migration.h |  1 +
>  migration/socket.c| 11 +++
>  qapi/migration.json   | 13 +++--
>  qapi/sockets.json | 13 +
>  6 files changed, 75 insertions(+), 2 deletions(-)
> 
> diff --git a/hmp.c b/hmp.c
> index a25c7bd9a8..caf94345c9 100644
> --- a/hmp.c
> +++ b/hmp.c
> @@ -355,6 +355,20 @@ void hmp_info_migrate_parameters(Monitor *mon, const 
> QDict *qdict)
>  monitor_printf(mon, "%s: %" PRIu64 "\n",
>  MigrationParameter_str(MIGRATION_PARAMETER_XBZRLE_CACHE_SIZE),
>  params->xbzrle_cache_size);
> +if (params->has_socket_address) {
> +SocketAddressList *addr;
> +
> +monitor_printf(mon, "%s: [\n",
> +MigrationParameter_str(MIGRATION_PARAMETER_SOCKET_ADDRESS));
> +
> +for (addr = params->socket_address; addr; addr = addr->next) {
> +char *s = SocketAddress_to_str("", addr->value,
> +   false, false);
> +monitor_printf(mon, "\t%s\n", s);
> +}
> +
> +monitor_printf(mon, "]\n");
> +}
>  }
>  
>  qapi_free_MigrationParameters(params);
> diff --git a/migration/migration.c b/migration/migration.c
> index 58bd382730..71fa4c8176 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -31,6 +31,8 @@
>  #include "migration/vmstate.h"
>  #include "block/block.h"
>  #include "qapi/error.h"
> +#include "qapi/clone-visitor.h"
> +#include "qapi/qapi-visit-sockets.h"
>  #include "qapi/qapi-commands-migration.h"
>  #include "qapi/qapi-events-migration.h"
>  #include "qapi/qmp/qerror.h"
> @@ -277,6 +279,21 @@ int migrate_send_rp_req_pages(MigrationIncomingState 
> *mis, const char *rbname,
>  return migrate_send_rp_message(mis, msg_type, msglen, bufc);
>  }
>  
> +void migrate_set_address(SocketAddress *address)
> +{
> +MigrationState *s = migrate_get_current();
> +SocketAddressList *addrs;
> +
> +addrs = g_new0(SocketAddressList, 1);
> +addrs->next = s->parameters.socket_address;
> +s->parameters.socket_address = addrs;
> +
> +if (!s->parameters.has_socket_address) {
> +s->parameters.has_socket_address = true;
> +}
> +addrs->value = QAPI_CLONE(SocketAddress, address);
> +}
> +
>  void qemu_start_incoming_migration(const char *uri, Error **errp)
>  {
>  const char *p;
> @@ -564,6 +581,11 @@ MigrationParameters *qmp_query_migrate_parameters(Error 
> **errp)
>  params->x_multifd_page_count = s->parameters.x_multifd_page_count;
>  params->has_xbzrle_cache_size = true;
>  params->xbzrle_cache_size = s->parameters.xbzrle_cache_size;
> +if (s->parameters.socket_address) {
> +params->has_socket_address = true;
> +params->socket_address =
> +QAPI_CLONE(SocketAddressList, s->parameters.socket_address);
> +}
>  
>  return params;
>  }
> @@ -2571,6 +2593,9 @@ static void migration_instance_finalize(Object *obj)
>  qemu_mutex_destroy(>error_mutex);
>  g_free(params->tls_hostname);
>  g_free(params->tls_creds);
> +if (params->socket_address) {
> +qapi_free_SocketAddressList(params->socket_address);
> +}
>  qemu_sem_destroy(>pause_sem);
>  error_free(ms->error);
>  }
> diff --git a/migration/migration.h b/migration/migration.h
> index 8d2f320c48..4774ee305f 100644
> --- a/migration/migration.h
> +++ b/migration/migration.h
> @@ -241,5 +241,6 @@ int migrate_send_rp_req_pages(MigrationIncomingState 
> *mis, const char* rbname,
>  
>  void dirty_bitmap_mig_before_vm_start(void);
>  void init_dirty_bitmap_incoming_migration(void);
> +void migrate_set_address(SocketAddress *address);
>  
>  #endif
> diff --git a/migration/socket.c b/migration/socket.c
> index 122d8ccfbe..5195fd57c5 100644
> --- a/migration/socket.c
> +++ b/migration/socket.c
> @@ -15,6 +15,7 @@
>   */
>  
>  #include "qemu/osdep.h"
> +#include "qemu/cutils.h"
>  
>  #include "qemu-common.h"
>  #include "qemu/error-report.h"
> @@ -152,6 +153,7 @@ static void socket_start_incoming_migration(SocketAddress 
>

[Qemu-devel] [PATCH v2 5/5] qmp: add pmemload command

2018-04-12 Thread Simon Ruderich

Adapted patch from Baojun Wang [1] with the following commit message:

I found this could be useful to have qemu-softmmu as a cross
debugger (launch with -s -S command line option), then if we can
have a command to load guest physical memory, we can use cross gdb
to do some target debug which gdb cannot do directly.

pmemload is necessary to directly write physical memory which is not
possible with gdb alone as it uses only logical addresses.

The QAPI for pmemload uses "val" as parameter name for the physical
address. This name is not very descriptive but is consistent with the
existing pmemsave. Changing the parameter name of pmemsave is not
possible without breaking the existing API.

[1]: https://lists.gnu.org/archive/html/qemu-trivial/2014-04/msg00074.html

Based-on-patch-by: Baojun Wang 
Signed-off-by: Simon Ruderich 
---
 cpus.c  | 41 +
 hmp-commands.hx | 14 ++
 hmp.c   | 12 
 hmp.h   |  1 +
 qapi/misc.json  | 20 
 5 files changed, 88 insertions(+)

diff --git a/cpus.c b/cpus.c
index d256d8e9b4..c690dc205f 100644
--- a/cpus.c
+++ b/cpus.c
@@ -2314,6 +2314,47 @@ exit:
 qemu_close(fd);
 }
 
+void qmp_pmemload(int64_t addr, int64_t size, int64_t offset,
+  const char *filename, Error **errp)
+{
+int fd;
+size_t l;
+ssize_t r;
+uint8_t buf[1024];
+
+fd = qemu_open(filename, O_RDONLY | O_BINARY);
+if (fd < 0) {
+error_setg_file_open(errp, errno, filename);
+return;
+}
+if (offset > 0) {
+if (lseek(fd, offset, SEEK_SET) != offset) {
+error_setg_errno(errp, errno,
+ "could not seek to offset %" PRIx64, offset);
+goto exit;
+}
+}
+
+while (size != 0) {
+l = sizeof(buf);
+if (l > size) {
+l = size;
+}
+r = read(fd, buf, l);
+if (r <= 0) {
+error_setg(errp, QERR_IO_ERROR);
+goto exit;
+}
+l = r; /* in case of short read */
+cpu_physical_memory_write(addr, buf, l);
+addr += l;
+size -= l;
+}
+
+exit:
+qemu_close(fd);
+}
+
 void qmp_inject_nmi(Error **errp)
 {
 nmi_monitor_handle(monitor_get_cpu_index(), errp);
diff --git a/hmp-commands.hx b/hmp-commands.hx
index 35d862a5d2..a392d0e379 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -822,6 +822,20 @@ STEXI
 @item pmemsave @var{addr} @var{size} @var{file}
 @findex pmemsave
 save to disk physical memory dump starting at @var{addr} of size @var{size}.
+ETEXI
+
+{
+.name   = "pmemload",
+.args_type  = "val:l,size:i,offset:i,filename:s",
+.params = "addr size offset file",
+.help   = "load from disk physical memory dump starting at 'addr' 
of size 'size' at file offset 'offset'",
+.cmd= hmp_pmemload,
+},
+
+STEXI
+@item pmemload @var{addr} @var{size} @var{offset} @var{file}
+@findex pmemload
+load from disk physical memory dump starting at @var{addr} of size @var{size} 
at file offset @var{offset}.
 ETEXI
 
 {
diff --git a/hmp.c b/hmp.c
index 1e392055f7..c9b37dbfbd 100644
--- a/hmp.c
+++ b/hmp.c
@@ -1090,6 +1090,18 @@ void hmp_pmemsave(Monitor *mon, const QDict *qdict)
 hmp_handle_error(mon, );
 }
 
+void hmp_pmemload(Monitor *mon, const QDict *qdict)
+{
+uint64_t size = qdict_get_int(qdict, "size");
+uint64_t offset = qdict_get_int(qdict, "offset");
+const char *filename = qdict_get_str(qdict, "filename");
+uint64_t addr = qdict_get_int(qdict, "val");
+Error *err = NULL;
+
+qmp_pmemload(addr, size, offset, filename, );
+hmp_handle_error(mon, );
+}
+
 void hmp_ringbuf_write(Monitor *mon, const QDict *qdict)
 {
 const char *chardev = qdict_get_str(qdict, "device");
diff --git a/hmp.h b/hmp.h
index 4e2ec375b0..0a69e371ca 100644
--- a/hmp.h
+++ b/hmp.h
@@ -47,6 +47,7 @@ void hmp_system_powerdown(Monitor *mon, const QDict *qdict);
 void hmp_cpu(Monitor *mon, const QDict *qdict);
 void hmp_memsave(Monitor *mon, const QDict *qdict);
 void hmp_pmemsave(Monitor *mon, const QDict *qdict);
+void hmp_pmemload(Monitor *mon, const QDict *qdict);
 void hmp_ringbuf_write(Monitor *mon, const QDict *qdict);
 void hmp_ringbuf_read(Monitor *mon, const QDict *qdict);
 void hmp_cont(Monitor *mon, const QDict *qdict);
diff --git a/qapi/misc.json b/qapi/misc.json
index 5636f4a149..2255d219fa 100644
--- a/qapi/misc.json
+++ b/qapi/misc.json
@@ -1185,6 +1185,26 @@
 { 'command': 'pmemsave',
   'data': {'val': 'int', 'size': 'int', 'filename': 'str'} }
 
+##
+# @pmemload:
+#
+# Load a portion of guest physical memory from a file.
+#
+# @val: the physical address of the guest to start from
+#
+# @size: the size of memory region to load
+#
+# @offset: the offset in the file to start from
+#
+# @filename: the file to load the memory from as binary data
+#

[Qemu-devel] [PATCH v2 1/5] cpus: correct coding style in qmp_memsave/qmp_pmemsave

2018-04-12 Thread Simon Ruderich

Signed-off-by: Simon Ruderich 
---
 cpus.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/cpus.c b/cpus.c
index 38eba8bff3..c78f430532 100644
--- a/cpus.c
+++ b/cpus.c
@@ -2263,8 +2263,9 @@ void qmp_memsave(int64_t addr, int64_t size, const char 
*filename,
 
 while (size != 0) {
 l = sizeof(buf);
-if (l > size)
+if (l > size) {
 l = size;
+}
 if (cpu_memory_rw_debug(cpu, addr, buf, l, 0) != 0) {
 error_setg(errp, "Invalid addr 0x%016" PRIx64 "/size %" PRId64
  " specified", orig_addr, orig_size);
@@ -2297,8 +2298,9 @@ void qmp_pmemsave(int64_t addr, int64_t size, const char 
*filename,
 
 while (size != 0) {
 l = sizeof(buf);
-if (l > size)
+if (l > size) {
 l = size;
+}
 cpu_physical_memory_read(addr, buf, l);
 if (fwrite(buf, 1, l, f) != l) {
 error_setg(errp, QERR_IO_ERROR);
-- 
2.15.0

[Qemu-devel] [PATCH v2 4/5] hmp: don't truncate size in hmp_memsave/hmp_pmemsave

2018-04-12 Thread Simon Ruderich

The called function takes an uint64_t as size parameter and
qdict_get_int() returns an uint64_t. Don't truncate it needlessly to an
uint32_t.

Signed-off-by: Simon Ruderich 
---
 hmp.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/hmp.c b/hmp.c
index a25c7bd9a8..1e392055f7 100644
--- a/hmp.c
+++ b/hmp.c
@@ -1064,7 +1064,7 @@ void hmp_cpu(Monitor *mon, const QDict *qdict)
 
 void hmp_memsave(Monitor *mon, const QDict *qdict)
 {
-uint32_t size = qdict_get_int(qdict, "size");
+uint64_t size = qdict_get_int(qdict, "size");
 const char *filename = qdict_get_str(qdict, "filename");
 uint64_t addr = qdict_get_int(qdict, "val");
 Error *err = NULL;
@@ -1081,7 +1081,7 @@ void hmp_memsave(Monitor *mon, const QDict *qdict)
 
 void hmp_pmemsave(Monitor *mon, const QDict *qdict)
 {
-uint32_t size = qdict_get_int(qdict, "size");
+uint64_t size = qdict_get_int(qdict, "size");
 const char *filename = qdict_get_str(qdict, "filename");
 uint64_t addr = qdict_get_int(qdict, "val");
 Error *err = NULL;
-- 
2.15.0

[Qemu-devel] [PATCH v2 2/5] cpus: convert qmp_memsave/qmp_pmemsave to use qemu_open

2018-04-12 Thread Simon Ruderich

qemu_open() allow passing file descriptors to qemu which is used in
restricted environments like libvirt where open() is prohibited.

Suggested-by: Eric Blake 
Signed-off-by: Simon Ruderich 
---
 cpus.c | 20 ++--
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/cpus.c b/cpus.c
index c78f430532..292d5b94b1 100644
--- a/cpus.c
+++ b/cpus.c
@@ -2238,7 +2238,7 @@ CpuInfoFastList *qmp_query_cpus_fast(Error **errp)
 void qmp_memsave(int64_t addr, int64_t size, const char *filename,
  bool has_cpu, int64_t cpu_index, Error **errp)
 {
-FILE *f;
+int fd;
 uint32_t l;
 CPUState *cpu;
 uint8_t buf[1024];
@@ -2255,8 +2255,8 @@ void qmp_memsave(int64_t addr, int64_t size, const char 
*filename,
 return;
 }
 
-f = fopen(filename, "wb");
-if (!f) {
+fd = qemu_open(filename, O_WRONLY | O_CREAT | O_TRUNC | O_BINARY, 0600);
+if (fd < 0) {
 error_setg_file_open(errp, errno, filename);
 return;
 }
@@ -2271,7 +2271,7 @@ void qmp_memsave(int64_t addr, int64_t size, const char 
*filename,
  " specified", orig_addr, orig_size);
 goto exit;
 }
-if (fwrite(buf, 1, l, f) != l) {
+if (qemu_write_full(fd, buf, l) != l) {
 error_setg(errp, QERR_IO_ERROR);
 goto exit;
 }
@@ -2280,18 +2280,18 @@ void qmp_memsave(int64_t addr, int64_t size, const char 
*filename,
 }
 
 exit:
-fclose(f);
+qemu_close(fd);
 }
 
 void qmp_pmemsave(int64_t addr, int64_t size, const char *filename,
   Error **errp)
 {
-FILE *f;
+int fd;
 uint32_t l;
 uint8_t buf[1024];
 
-f = fopen(filename, "wb");
-if (!f) {
+fd = qemu_open(filename, O_WRONLY | O_CREAT | O_TRUNC | O_BINARY, 0600);
+if (fd < 0) {
 error_setg_file_open(errp, errno, filename);
 return;
 }
@@ -2302,7 +2302,7 @@ void qmp_pmemsave(int64_t addr, int64_t size, const char 
*filename,
 l = size;
 }
 cpu_physical_memory_read(addr, buf, l);
-if (fwrite(buf, 1, l, f) != l) {
+if (qemu_write_full(fd, buf, l) != l) {
 error_setg(errp, QERR_IO_ERROR);
 goto exit;
 }
@@ -2311,7 +2311,7 @@ void qmp_pmemsave(int64_t addr, int64_t size, const char 
*filename,
 }
 
 exit:
-fclose(f);
+qemu_close(fd);
 }
 
 void qmp_inject_nmi(Error **errp)
-- 
2.15.0

Re: [Qemu-devel] [PATCH/RFC 3/5] hw/arm/virt: Allow dynamic sysbus devices again

2018-04-12 Thread Geert Uytterhoeven

Hi Eric,

On Wed, Feb 14, 2018 at 11:37 AM, Auger Eric  wrote:
> On 09/02/18 16:17, Geert Uytterhoeven wrote:
>> Allow the instantation of generic dynamic sysbus devices again, without
>> the need to create a new device-specific vfio type.
>>
>> This is a partial revert of commit  6f2062b9758ebc64 ("hw/arm/virt:
>> Allow only supported dynamic sysbus devices").
>>
>> Not-Yet-Signed-off-by: Geert Uytterhoeven 
>> ---
>>  hw/arm/virt.c | 1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
>> index b334c82edae9fb1f..fa83784fc08ed076 100644
>> --- a/hw/arm/virt.c
>> +++ b/hw/arm/virt.c
>> @@ -1596,6 +1596,7 @@ static void virt_machine_class_init(ObjectClass *oc, 
>> void *data)
>>  mc->max_cpus = 255;
>>  machine_class_allow_dynamic_sysbus_dev(mc, TYPE_VFIO_CALXEDA_XGMAC);
>>  machine_class_allow_dynamic_sysbus_dev(mc, TYPE_VFIO_AMD_XGBE);
>> +machine_class_allow_dynamic_sysbus_dev(mc, TYPE_SYS_BUS_DEVICE);
> Couldn't you limit that to TYPE_VFIO_PLATFORM instead?

Yep, that works. Thanks!

Gr{oetje,eeting}s,

Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

[Qemu-devel] [PATCH v2 3/5] cpus: use size_t in qmp_memsave/qmp_pmemsave

2018-04-12 Thread Simon Ruderich

It's the natural type for object sizes and matches the return value of
sizeof(buf).

Signed-off-by: Simon Ruderich 
---
 cpus.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/cpus.c b/cpus.c
index 292d5b94b1..d256d8e9b4 100644
--- a/cpus.c
+++ b/cpus.c
@@ -2239,7 +2239,7 @@ void qmp_memsave(int64_t addr, int64_t size, const char 
*filename,
  bool has_cpu, int64_t cpu_index, Error **errp)
 {
 int fd;
-uint32_t l;
+size_t l;
 CPUState *cpu;
 uint8_t buf[1024];
 int64_t orig_addr = addr, orig_size = size;
@@ -2287,7 +2287,7 @@ void qmp_pmemsave(int64_t addr, int64_t size, const char 
*filename,
   Error **errp)
 {
 int fd;
-uint32_t l;
+size_t l;
 uint8_t buf[1024];
 
 fd = qemu_open(filename, O_WRONLY | O_CREAT | O_TRUNC | O_BINARY, 0600);
-- 
2.15.0

Re: [Qemu-devel] [PATCH] qmp: add pmemload command

2018-04-12 Thread Simon Ruderich

On Wed, Apr 11, 2018 at 08:02:58AM -0500, Eric Blake wrote:
> You could always add qemu_fopen/qemu_fclose to match the existing
> qemu_open/qemu_close.  But you do have a point that you can't call
> qemu_close/fclose (because fclose would be left with a stale fd that
> might spuriously close something that has been opened in another thread
> during the race window), nor fclose/qemu_close.
>
> The FILE interface may sometimes be easier to work with, but it also
> comes with buffering issues that you don't have to worry about when
> using the fd interface.

Yeah. I'm using plain qemu_open()/qemu_close() and read/write
directly.

> Good point - but yes, ideally it should always be possible to pass in an
> fd instead of having qemu itself open a file, because there are
> execution environments where qemu is intentionally prohibited from
> directly calling open() for security reasons (where the management app,
> such as libvirt, opens and then passes fds to qemu instead).

I've converted qmp_memsave/qmp_pmemsave in extra patches to use
qemu_open() (and some additional cleanup).

> You are correct that we can't really change the existing interface; so
> documenting in the commit message that your choice of names is for
> consistency reasons may be sufficient.

Will do.

Revised patches which should implement all of your suggestions
(and the style issue noticed by the bot) as replies to this mail.

Regards
Simon
-- 
+ privacy is necessary
+ using gnupg http://gnupg.org
+ public key id: 0x92FEFDB7E44C32F9

Re: [Qemu-devel] [Bug Report] vm paused after succeeding to migrate

2018-04-12 Thread Dr. David Alan Gilbert

* linzhecheng (linzhech...@huawei.com) wrote:
> Hi, all
> I encounterd a bug when I try to migrate a windows vm.
> 
> Enviroment information:
> host A: cpu E5620(model WestmereEP without flag xsave)
> host B: cpu E5-2643(model SandyBridgeEP with xsave)
> 
> The reproduce steps is :
> 1. Start a windows 2008 vm with -cpu host(which means host-passthrough).
> 2. Migrate the vm to host B when cr4.OSXSAVE=0 (successfully).
> 3. Vm runs on host B for a while so that cr4.OSXSAVE changes to 1.
> 4. Then migrate the vm to host A (successfully), but vm was paused, and qemu 
> printed log as followed:

Remember that migrating using -cpu host  across different CPU models is NOT
expected to work.

> KVM: entry failed, hardware error 0x8021
> 
> If you're running a guest on an Intel machine without unrestricted mode
> support, the failure can be most likely due to the guest entering an invalid
> state for Intel VT. For example, the guest maybe running in big real mode
> which is not supported on less recent Intel processors.
> 
> EAX=019b3bb0 EBX=01a3ae80 ECX=01a61ce8 EDX=
> ESI=01a62000 EDI= EBP= ESP=01718b20
> EIP=0185d982 EFL=0286 [--S--P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
> ES =   9300
> CS =f000   9b00
> SS =   9300
> DS =   9300
> FS =   9300
> GS =   9300
> LDT=   8200
> TR =   8b00
> GDT=  
> IDT=  
> CR0=6010 CR2= CR3= CR4=
> DR0= DR1= DR2= 
> DR3=
> DR6=0ff0 DR7=0400
> EFER=
> Code=00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <00> 00 00 
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
> 00
> 
> I have found that problem happened when kvm_put_sregs returns err -22(called 
> by kvm_arch_put_registers(qemu)).
> Because kvm_arch_vcpu_ioctl_set_sregs(kvm-mod) checked that guest_cpuid_has 
> no X86_FEATURE_XSAVE but cr4.OSXSAVE=1.
> So should we cancel migration when kvm_arch_put_registers returns error?

It would seem good if we can make the migration fail there rather than
hitting that KVM error.
It looks like we need to do a bit of plumbing to convert the places that
call it to return a bool rather than void.

Dave

--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK

Re: [Qemu-devel] [PATCH v1 2/2] fpu/softfloat: raise float_invalid for NaN in float_to_int

2018-04-12 Thread Bastian Koppelmann

On 04/12/2018 02:26 PM, Peter Maydell wrote:
> On 12 April 2018 at 12:58, Alex Bennée  wrote:
>> Fixes https://bugs.launchpad.net/qemu/+bug/1759264
>>
>> Signed-off-by: Alex Bennée 
>> Cc: Bastian Koppelmann 
>> ---
>>  fpu/softfloat.c | 1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/fpu/softfloat.c b/fpu/softfloat.c
>> index 9b99aa6ec8..ddc77c273c 100644
>> --- a/fpu/softfloat.c
>> +++ b/fpu/softfloat.c
>> @@ -1344,6 +1344,7 @@ static int64_t round_to_int_and_pack(FloatParts in, 
>> int rmode,
>>  case float_class_qnan:
>>  case float_class_dnan:
>>  case float_class_msnan:
>> +s->float_exception_flags = orig_flags | float_flag_invalid;
>>  return max;
>>  case float_class_inf:
>>  return p.sign ? min : max;
> 
> Don't we also need to raise the Invalid flag for float_class_inf ?
> 

Yup, as mentioned in the bug report.

Cheers,
Bastian

1 2 >

1 - 100 of 182 matches

Mail list logo