date:20170414

[Qemu-devel] [Bug 821078] Re: virtio-serial-bus: Unexpected port id

2017-04-14 Thread Launchpad Bug Tracker

[Expired for QEMU because there has been no activity for 60 days.]

** Changed in: qemu
   Status: Incomplete => Expired

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/821078

Title:
  virtio-serial-bus: Unexpected port id

Status in QEMU:
  Expired

Bug description:
  With qemu-kvm-0.15.0-rc1 virtio-serial-bus reports an error, and windows 
vdagent can not start.  qemu-0.15.0-rc1 behaves as expected, ie vdagent runs in 
the guest, mouse passes seamlessly between spicec and host and copy/paste works 
between guest and host.
  qemu-kvm has been configured with
  ./configure --target-list=x86_64-softmmu --disable-curses  --disable-curl 
--audio-drv-list=alsa --audio-card-list=sb16,ac97,hda --enable-vnc-thread 
--disable-bluez --enable-vhost-net --enable-spice
  and is started with
  qemu-system-x86_64 -cpu host -enable-kvm -pidfile /home/rick/qemu/hds/wxp.pid 
-drive file=/home/rick/qemu/hds/wxp.raw,if=virtio,aio=native -m 1536 -name 
WinXP -net nic,model -net user -localtime -usb -vga qxl -device 
virtio-serial-pci,id=virtio-serial0,max_ports=16,bus=pci.0,addr=0x5 -chardev 
spicevmc,name=vdagent,id=vdagent -device 
virtserialport,nr=1,bus=virtio-serial0.0,chardev=vdagent,name=com.redhat.spice.0
 -spice port=1234,disable-ticketing -monitor stdio

  I've also tried start qemu like
  qemu-system-x86_64 -cpu host -enable-kvm -pidfile /home/rick/qemu/hds/wxp.pid 
-drive file=/home/rick/qemu/hds/wxp.raw,if=virtio -m 768 -name WinXP -net 
nic,model=virtio -net user -localtime -usb -vga qxl -device virtio-serial 
-chardev spicevmc,name=vdagent,id=vdagent -device 
virtserialport,chardev=vdagent,name=com.redhat.spice.0 -spice 
port=1234,disable-ticketing -monitor stdio
  and observed the same results.

  the host runs 2.6.39.4 vanilla kernel.  the guest uses the most recent
  virtio-serial, vga-qxl and vdagent from spice-space.org

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/821078/+subscriptions

[Qemu-devel] [Bug 988909] Re: Assert failed in arp_table.c

2017-04-14 Thread Launchpad Bug Tracker

[Expired for QEMU because there has been no activity for 60 days.]

** Changed in: qemu
   Status: Incomplete => Expired

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/988909

Title:
  Assert failed in arp_table.c

Status in QEMU:
  Expired

Bug description:
  
  With latest git (8001954) hen running:

  qemu-system-64 -hda $VDISK -kernel arch/x86/boot/bzImage \
  -append "ro root=/dev/hda1 console=ttyS0 init=/bin/systemd" \
  -curses \
  -net nic  -smp 3 -m 312 $@

  I'm getting this:

   qemu-system-x86_64: slirp/arp_table.c:75: arp_table_search: Assertion
  `(ip_addr & (__extension__ ({ register unsigned int __v, __x = (~(0xf
  << 28)); if (__builtin_constant_p (__x)) __v = __x) & 0xff00)
  >> 24) | (((__x) & 0x00ff) >> 8) | (((__x) & 0xff00) << 8) |
  (((__x) & 0x00ff) << 24)); else __asm__ ("bswap %0" : "=r" (__v) :
  "0" (__x)); __v; }))) != 0' failed.

  Bug #824650 seems to be related to this one, but it is not. Fix for that one 
is already upstream. 
  I can help on testing.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/988909/+subscriptions

[Qemu-devel] [Bug 989504] Re: assertion failed when attaching USB MSD device

2017-04-14 Thread Launchpad Bug Tracker

[Expired for QEMU because there has been no activity for 60 days.]

** Changed in: qemu
   Status: Incomplete => Expired

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/989504

Title:
  assertion failed when attaching USB MSD device

Status in QEMU:
  Expired

Bug description:
  version: git rev be5ea8ed4481f0ffa4ea0f7ba13e465701536001
  commandline: qemu-system-i386 -usb -fda dosusb.img -drive 
if=none,id=usbstick,file=usb.img -device usb-storage,bus=usb.0,drive=usbstick 
-boot a -L d:\_programs\qemu

  ---
  Microsoft Visual C++ Runtime Library
  ---
  Assertion failed!

  Program: E:\qemu-system-i386.exe
  File: C:/msys/home/User/qemu/hw/usb/hcd-uhci.c
  Line: 968

  Expression: ret == TD_RESULT_ASYNC_START

  For information on how your program can cause an assertion
  failure, see the Visual C++ documentation on asserts

  (Press Retry to debug the application - JIT must be enabled)
  ---
  Abort   Retry   Ignore   
  ---

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/989504/+subscriptions

[Qemu-devel] [Bug 966316] Re: Can't load Android VBOX image or even linux test image as well

2017-04-14 Thread Launchpad Bug Tracker

[Expired for QEMU because there has been no activity for 60 days.]

** Changed in: qemu
   Status: Incomplete => Expired

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/966316

Title:
  Can't load Android VBOX image or even linux test image as well

Status in QEMU:
  Expired

Bug description:
  Can't load Android X86 ICS 4.0 VBOX image.
  It worked with previous version before adding /qemu/hw/pc_sysfw.c file ( 
tested with version 1.0 ). 

  x86_64-softmmu# ./qemu-system-x86_64 ~/kvm-test-image/x86-linux-0.2.img
  qemu: PC system firmware (pflash) must be a multiple of 0x1000

  In QEMU website (http://wiki.qemu.org/Testing), there is a test image for 
linux
  but, new version can't load the image as well because of upper error.
  linux-0.2.img.bz2 (8 MB)  Small Linux disk image containing a 2.6.20 
Linux kernel, X11 and various utilities to test QEMU

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/966316/+subscriptions

Re: [Qemu-devel] [PATCH v3 0/2] block: Quiesce old aio context during bdrv_set_aio_context

2017-04-14 Thread Paolo Bonzini



On 13/04/2017 15:21, Fam Zheng wrote:
> Paolo, maybe you want to submit a formal patch for 2.10? (Sorry for the
> confusion, these series of mine went a bit messy.)

Yes, will do.  Or just submit the alternative fix that is mentioned in
the comment.

Paolo

Re: [Qemu-devel] QEMU build breakage on ARM against Xen 4.9 caused by libxendevicemodel

2017-04-14 Thread Stefano Stabellini

On Fri, 14 Apr 2017, Stefano Stabellini wrote:
> Hi Paul,
> 
> The following commit in my qemu "next" branch breaks the build on arm
> and arm64:
> 
> commit 670271647ad15e9d937ced7a72c892349c709216
> Author: Paul Durrant 
> Date:   Tue Mar 7 10:55:34 2017 +
> 
> xen: use libxendevicemodel when available
> 
> See the appended build log. Sorry for not realizing it sooner.

As I imagined, this bug is easy to solve. It is reproducible on x86 too,
if you pass -DXC_WANT_COMPAT_DEVICEMODEL_API=1 to configure and
forcefully disable Xen 4.9 detection in the configure script.

If QEMU detects xen_ctrl_version = 480, the build will fail against Xen
4.9, even when -DXC_WANT_COMPAT_DEVICEMODEL_API=1 is specified.

The appended patch solves the problem. However, Xen 4.9 detection and
compilation remain broken.

---

diff --git a/include/hw/xen/xen_common.h b/include/hw/xen/xen_common.h
index 274accc..b1f5f53 100644
--- a/include/hw/xen/xen_common.h
+++ b/include/hw/xen/xen_common.h
@@ -9,7 +9,6 @@
 #undef XC_WANT_COMPAT_EVTCHN_API
 #undef XC_WANT_COMPAT_GNTTAB_API
 #undef XC_WANT_COMPAT_MAP_FOREIGN_API
-#undef XC_WANT_COMPAT_DEVICEMODEL_API

 #include 
 #include 
@@ -154,6 +153,7 @@ static inline int xendevicemodel_set_mem_type(

 #else /* CONFIG_XEN_CTRL_INTERFACE_VERSION >= 490 */

+#undef XC_WANT_COMPAT_DEVICEMODEL_API
 #include 

 #endif

[Qemu-devel] QEMU build breakage on ARM against Xen 4.9 caused by libxendevicemodel

2017-04-14 Thread Stefano Stabellini

Hi Paul,

The following commit in my qemu "next" branch breaks the build on arm
and arm64:

commit 670271647ad15e9d937ced7a72c892349c709216
Author: Paul Durrant 
Date:   Tue Mar 7 10:55:34 2017 +

xen: use libxendevicemodel when available

See the appended build log. Sorry for not realizing it sooner.

Please note that up to now we have been unable to disentangle the PV
machine from the i386 emulator in QEMU.  Thus, on arm and arm64 we build
qemu with --target-list=i386-softmmu as we do on x86. Of course, the
xenfv machine cannot work on arm. Only the xenpv machine can be used.
However, the xenfv machine is still included in the build.

This means that we cannot break the xenfv machine build on arm and arm64.


Going into details, "xen: use libxendevicemodel when available" breaks
the build on arm and arm64 even with -DXC_WANT_COMPAT_DEVICEMODEL_API=1.
It breaks 4.8 compatibility which is a regression from QEMU's point of
view.


Why 4.8 compatibility? Because the previous commit:

commit 2a15d804eb0e4f8bc25738b6aec30a0142d5d293
Author: Paul Durrant 
Date:   Tue Mar 7 10:55:33 2017 +

configure: detect presence of libxendevicemodel

doesn't work properly on arm: Xen 4.9 is not detected. However, that is
not a regression per se.


We cannot break the QEMU build against Xen 4.9 with
-DXC_WANT_COMPAT_DEVICEMODEL_API=1 or older releases. That has to work,
but I guess it won't be too difficult to fix.

The difficulty is how to introduce proper Xen 4.9 and libxendevicemodel
support in QEMU. We either need to fix libxendevicemodel on arm
(although obviously is never going to be actually used at run time, but
it is necessary for the build), or we need to disentangle the xenpv
machine build from the xenfv machine build in QEMU. I prefer the latter
approach, of course.

I haven't dug into what is the problem with libxendevicemodel on arm.
Needless to say, if there is an issue on the Xen side, it might impact
the Xen 4.9 release.


For your information, you can download a free (as in beer) arm64
software model from ARM called "Foundation Model", see the Xen wiki for
instructions:

https://wiki.xenproject.org/wiki/Xen_ARM_with_Virtualization_Extensions/FastModels

In addition, I think you can purchase access to ARM64 servers from
packet.net.


Thanks,

Stefano


---

In file included from /root/qemu/include/hw/xen/xen_backend.h:4:0,
 from hw/block/xen_disk.c:27:
/root/qemu/include/hw/xen/xen_common.h: In function 
‘xendevicemodel_create_ioreq_server’:
/root/qemu/include/hw/xen/xen_common.h:46:12: error: implicit declaration of 
function ‘xc_hvm_create_ioreq_server’ [-Werror=implicit-function-declaration]
 return xc_hvm_create_ioreq_server(dmod, domid, handle_bufioreq,
^
/root/qemu/include/hw/xen/xen_common.h:46:5: error: nested extern declaration 
of ‘xc_hvm_create_ioreq_server’ [-Werror=nested-externs]
 return xc_hvm_create_ioreq_server(dmod, domid, handle_bufioreq,
 ^
/root/qemu/include/hw/xen/xen_common.h: In function 
‘xendevicemodel_get_ioreq_server_info’:
/root/qemu/include/hw/xen/xen_common.h:55:12: error: implicit declaration of 
function ‘xc_hvm_get_ioreq_server_info’ [-Werror=implicit-function-declaration]
 return xc_hvm_get_ioreq_server_info(dmod, domid, id, ioreq_pfn,
^
/root/qemu/include/hw/xen/xen_common.h:55:5: error: nested extern declaration 
of ‘xc_hvm_get_ioreq_server_info’ [-Werror=nested-externs]
 return xc_hvm_get_ioreq_server_info(dmod, domid, id, ioreq_pfn,
 ^
/root/qemu/include/hw/xen/xen_common.h: In function 
‘xendevicemodel_map_io_range_to_ioreq_server’:
/root/qemu/include/hw/xen/xen_common.h:63:12: error: implicit declaration of 
function ‘xc_hvm_map_io_range_to_ioreq_server’ 
[-Werror=implicit-function-declaration]
 return xc_hvm_map_io_range_to_ioreq_server(dmod, domid, id, is_mmio,
^
/root/qemu/include/hw/xen/xen_common.h:63:5: error: nested extern declaration 
of ‘xc_hvm_map_io_range_to_ioreq_server’ [-Werror=nested-externs]
 return xc_hvm_map_io_range_to_ioreq_server(dmod, domid, id, is_mmio,
 ^
/root/qemu/include/hw/xen/xen_common.h: In function 
‘xendevicemodel_unmap_io_range_from_ioreq_server’:
/root/qemu/include/hw/xen/xen_common.h:71:12: error: implicit declaration of 
function ‘xc_hvm_unmap_io_range_from_ioreq_server’ 
[-Werror=implicit-function-declaration]
 return xc_hvm_unmap_io_range_from_ioreq_server(dmod, domid, id, is_mmio,
^
/root/qemu/include/hw/xen/xen_common.h:71:5: error: nested extern declaration 
of ‘xc_hvm_unmap_io_range_from_ioreq_server’ [-Werror=nested-externs]
 return xc_hvm_unmap_io_range_from_ioreq_server(dmod, domid, id, is_mmio,
 ^
/root/qemu/include/hw/xen/xen_common.h: In function 
‘xendevicemodel_map_pcidev_to_ioreq_server’:
/root/qemu/include/hw/xen/xen_common.h:79:12: error: implicit declaration of 
function ‘xc_hvm_map_pcidev_to_ioreq_server’

Re: [Qemu-devel] [virtio-dev] Re: [PATCH v9 2/5] virtio-balloon: VIRTIO_BALLOON_F_BALLOON_CHUNKS

2017-04-14 Thread Michael S. Tsirkin

On Fri, Apr 14, 2017 at 04:37:52PM +0800, Wei Wang wrote:
> On 04/14/2017 12:34 AM, Michael S. Tsirkin wrote:
> > On Thu, Apr 13, 2017 at 05:35:05PM +0800, Wei Wang wrote:
> > 
> > So we don't need the bitmap to talk to host, it is just
> > a data structure we chose to maintain lists of pages, right?
> Right. bitmap is the way to gather pages to chunk.
> It's only needed in the balloon page case.
> For the unused page case, we don't need it, since the free
> page blocks are already chunks.
> 
> > OK as far as it goes but you need much better isolation for it.
> > Build a data structure with APIs such as _init, _cleanup, _add, _clear,
> > _find_first, _find_next.
> > Completely unrelated to pages, it just maintains bits.
> > Then use it here.
> > 
> > 
> > >   static int oom_pages = OOM_VBALLOON_DEFAULT_PAGES;
> > >   module_param(oom_pages, int, S_IRUSR | S_IWUSR);
> > >   MODULE_PARM_DESC(oom_pages, "pages to free on OOM");
> > > @@ -50,6 +54,10 @@ MODULE_PARM_DESC(oom_pages, "pages to free on OOM");
> > >   static struct vfsmount *balloon_mnt;
> > >   #endif
> > > +/* Types of pages to chunk */
> > > +#define PAGE_CHUNK_TYPE_BALLOON 0
> > > +
> > Doesn't look like you are ever adding more types in this
> > patchset.  Pls keep code simple, generalize it later.
> > 
> "#define PAGE_CHUNK_TYPE_UNUSED 1" is added in another patch.

I would say add the extra code there too. Or maybe we can avoid
adding it altogether.

> Types of page to chunk are treated differently. Different types of page
> chunks are sent to the host via different protocols.
> 
> 1) PAGE_CHUNK_TYPE_BALLOON: Ballooned (i.e. inflated/deflated) pages
> to chunk.  For the ballooned type, it uses the basic chunk msg format:
> 
> virtio_balloon_page_chunk_hdr +
> virtio_balloon_page_chunk * MAX_PAGE_CHUNKS
> 
> 2) PAGE_CHUNK_TYPE_UNUSED: unused pages to chunk. It uses this miscq msg
> format:
> miscq_hdr +
> virtio_balloon_page_chunk_hdr +
> virtio_balloon_page_chunk * MAX_PAGE_CHUNKS
> 
> The chunk msg is actually the payload of the miscq msg.
> 
> 

So just combine the two message formats and then it'll all be easier?


> > > +#define MAX_PAGE_CHUNKS 4096
> > This is an order-4 allocation. I'd make it 4095 and then it's
> > an order-3 one.
> 
> Sounds good, thanks.
> I think it would be better to make it 4090. Leave some space for the hdr
> as well.

And miscq hdr. In fact just let compiler do the math - something like:
(8 * PAGE_SIZE - sizeof(hdr)) / sizeof(chunk)


I skimmed explanation of algorithms below but please make sure
code speaks for itself and add comments inline to document it.
Whenever you answered me inline this is where you want to
try to make code clearer and add comments.

Also, pls find ways to abstract the data structure so we don't
need to deal with its internals all over the code.




> > 
> > >   {
> > >   struct scatterlist sg;
> > > + struct virtio_balloon_page_chunk_hdr *hdr;
> > > + void *buf;
> > >   unsigned int len;
> > > - sg_init_one(, vb->pfns, sizeof(vb->pfns[0]) * vb->num_pfns);
> > > + switch (type) {
> > > + case PAGE_CHUNK_TYPE_BALLOON:
> > > + hdr = vb->balloon_page_chunk_hdr;
> > > + len = 0;
> > > + break;
> > > + default:
> > > + dev_warn(>vdev->dev, "%s: chunk %d of unknown pages\n",
> > > +  __func__, type);
> > > + return;
> > > + }
> > > - /* We should always be able to add one buffer to an empty queue. */
> > > - virtqueue_add_outbuf(vq, , 1, vb, GFP_KERNEL);
> > > - virtqueue_kick(vq);
> > > + buf = (void *)hdr - len;
> > Moving back to before the header? How can this make sense?
> > It works fine since len is 0, so just buf = hdr.
> > 
> For the unused page chunk case, it follows its own protocol:
> miscq_hdr + payload(chunk msg).
>  "buf = (void *)hdr - len" moves the buf pointer to the miscq_hdr, to send
> the entire miscq msg.

Well just pass the correct pointer in.

> Please check the patch for implementing the unused page chunk,
> it will be clear. If necessary, I can put "buf = (void *)hdr - len" from
> that patch.

Exactly. And all this pointer math is very messy. Please look for ways
to clean it. It's generally easy to fill structures:

struct foo *foo = kmalloc(..., sizeof(*foo) + n * sizeof(foo->a[0]));
for (i = 0; i < n; ++i)
foo->a[i] = b;

this is the kind of code that's easy to understand and it's
obvious there are no overflows and no info leaks here.

> 
> > > + len += sizeof(struct virtio_balloon_page_chunk_hdr);
> > > + len += hdr->chunks * sizeof(struct virtio_balloon_page_chunk);
> > > + sg_init_table(, 1);
> > > + sg_set_buf(, buf, len);
> > > + if (!virtqueue_add_outbuf(vq, , 1, vb, GFP_KERNEL)) {
> > > + virtqueue_kick(vq);
> > > + if (busy_wait)
> > > + while (!virtqueue_get_buf(vq, ) &&
> > > +!virtqueue_is_broken(vq))
> > > + cpu_relax();
> > > + else
> > > + wait_event(vb->acked,

Re: [Qemu-devel] [Qemu-devel RFC v2 2/4] msf2: Microsemi Smartfusion2 System Register block.

2017-04-14 Thread Alistair Francis

On Sun, Apr 9, 2017 at 4:19 AM, Subbaraya Sundeep
 wrote:
> Added Sytem register block of Smartfusion2.
> This block has PLL registers which are accessed by guest.
>
> Signed-off-by: Subbaraya Sundeep 
> ---
>  hw/misc/Makefile.objs |   1 +
>  hw/misc/msf2_sysreg.c | 168 
> ++
>  2 files changed, 169 insertions(+)
>  create mode 100644 hw/misc/msf2_sysreg.c
>
> diff --git a/hw/misc/Makefile.objs b/hw/misc/Makefile.objs
> index c8b4893..aee53df 100644
> --- a/hw/misc/Makefile.objs
> +++ b/hw/misc/Makefile.objs
> @@ -56,3 +56,4 @@ obj-$(CONFIG_EDU) += edu.o
>  obj-$(CONFIG_HYPERV_TESTDEV) += hyperv_testdev.o
>  obj-$(CONFIG_AUX) += auxbus.o
>  obj-$(CONFIG_ASPEED_SOC) += aspeed_scu.o aspeed_sdmc.o
> +obj-$(CONFIG_MSF2) += msf2_sysreg.o
> diff --git a/hw/misc/msf2_sysreg.c b/hw/misc/msf2_sysreg.c
> new file mode 100644
> index 000..4873463
> --- /dev/null
> +++ b/hw/misc/msf2_sysreg.c
> @@ -0,0 +1,168 @@
> +/*
> + * System Register block model of Microsemi SmartFusion2.
> + *
> + * Copyright (c) 2017 Subbaraya Sundeep 
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License
> + * as published by the Free Software Foundation; either version
> + * 2 of the License, or (at your option) any later version.
> + *
> + * You should have received a copy of the GNU General Public License along
> + * with this program; if not, see .
> + */
> +
> +#include "qemu/osdep.h"
> +#include "hw/hw.h"
> +#include "qemu/timer.h"
> +#include "hw/sysbus.h"
> +#include "sysemu/sysemu.h"
> +#include "qemu/log.h"
> +
> +#ifndef MSF2_SYSREG_ERR_DEBUG
> +#define MSF2_SYSREG_ERR_DEBUG 0
> +#endif
> +
> +#define DB_PRINT(...) do { \
> +if (MSF2_SYSREG_ERR_DEBUG) { \
> +fprintf(stderr,  ": %s: ", __func__); \
> +fprintf(stderr, ## __VA_ARGS__); \

You should use qemu_log() here.

> +} \
> +} while (0);
> +
> +#define R_PSS_RST_CTRL_SOFT_RST 0x1
> +
> +enum {
> +ESRAM_CR= 0x00 / 4,
> +ESRAM_MAX_LAT,
> +DDR_CR,
> +ENVM_CR,
> +ENVM_REMAP_BASE_CR,
> +ENVM_REMAP_FAB_CR,
> +CC_CR,
> +CC_REGION_CR,
> +CC_LOCK_BASE_ADDR_CR,
> +CC_FLUSH_INDX_CR,
> +DDRB_BUF_TIMER_CR,
> +DDRB_NB_ADDR_CR,
> +DDRB_NB_SIZE_CR,
> +DDRB_CR,
> +
> +SOFT_RESET_CR  = 0x48 / 4,
> +M3_CR,
> +
> +GPIO_SYSRESET_SEL_CR = 0x58 / 4,
> +
> +MDDR_CR = 0x60 / 4,
> +
> +MSSDDR_PLL_STATUS_LOW_CR = 0x90 / 4,
> +MSSDDR_PLL_STATUS_HIGH_CR,
> +MSSDDR_FACC1_CR,
> +MSSDDR_FACC2_CR,
> +
> +MSSDDR_PLL_STATUS = 0x150 / 4,
> +
> +};
> +
> +#define MSF2_SYSREG_MMIO_SIZE 0x300
> +#define MSF2_SYSREG_NUM_REGS  (MSF2_SYSREG_MMIO_SIZE / 4)
> +
> +#define TYPE_MSF2_SYSREG  "msf2-sysreg"
> +#define MSF2_SYSREG(obj)  OBJECT_CHECK(Sf2SysregState, (obj), 
> TYPE_MSF2_SYSREG)
> +
> +typedef struct Sf2SysregState {
> +SysBusDevice parent_obj;
> +
> +MemoryRegion iomem;
> +
> +uint32_t regs[MSF2_SYSREG_NUM_REGS];
> +} Sf2SysregState;
> +
> +static void msf2_sysreg_reset(DeviceState *d)
> +{
> +Sf2SysregState *s = MSF2_SYSREG(d);
> +
> +DB_PRINT("RESET\n");
> +
> +s->regs[MSSDDR_PLL_STATUS_LOW_CR] = 0x02420041;
> +s->regs[MSSDDR_FACC1_CR] = 0x0A482124;
> +s->regs[MSSDDR_PLL_STATUS] = 0x3;
> +}
> +
> +static uint64_t msf2_sysreg_read(void *opaque, hwaddr offset,
> +unsigned size)
> +{
> +Sf2SysregState *s = opaque;
> +offset /= 4;
> +uint32_t ret = s->regs[offset];

Probably should check that this offset value is inside the array.

> +
> +DB_PRINT("addr: %08" HWADDR_PRIx " data: %08" PRIx32 "\n", offset * 4, 
> ret);
> +
> +   return ret;
> +}
> +
> +static void msf2_sysreg_write(void *opaque, hwaddr offset,
> +  uint64_t val, unsigned size)
> +{
> +Sf2SysregState *s = (Sf2SysregState *)opaque;
> +offset /= 4;
> +
> +DB_PRINT("addr: %08" HWADDR_PRIx " data: %08" PRIx64 "\n", offset * 4, 
> val);
> +
> +switch (offset) {
> +case MSSDDR_PLL_STATUS:
> +break;
> +
> +default:
> +s->regs[offset] = val;

Check the bounds here as well.

> +break;
> +}
> +}
> +
> +static const MemoryRegionOps sysreg_ops = {
> +.read = msf2_sysreg_read,
> +.write = msf2_sysreg_write,
> +.endianness = DEVICE_NATIVE_ENDIAN,
> +};
> +
> +static void msf2_sysreg_init(Object *obj)
> +{
> +Sf2SysregState *s = MSF2_SYSREG(obj);
> +
> +memory_region_init_io(>iomem, obj, _ops, s, "sysreg",
> +  MSF2_SYSREG_MMIO_SIZE);
> +sysbus_init_mmio(SYS_BUS_DEVICE(obj), >iomem);
> +}
> +
> +static const VMStateDescription vmstate_msf2_sysreg = {
> +.name = "msf2_sysreg",
> +.version_id = 2,
> +.minimum_version_id = 2,

This version_id should start at 1, you only increase it when

Re: [Qemu-devel] [Qemu-devel RFC v2 1/4] msf2: Add Smartfusion2 System timer

2017-04-14 Thread Alistair Francis

On Sun, Apr 9, 2017 at 4:19 AM, Subbaraya Sundeep
 wrote:
> Modelled System Timer in Microsemi's Smartfusion2 Soc.
> Timer has two 32bit down counters and two interrupts.
>
> Signed-off-by: Subbaraya Sundeep 

Hey Sundeep,

I have some comments inline below.

> ---
>  hw/timer/Makefile.objs |   1 +
>  hw/timer/msf2_timer.c  | 273 
> +
>  2 files changed, 274 insertions(+)
>  create mode 100644 hw/timer/msf2_timer.c
>
> diff --git a/hw/timer/Makefile.objs b/hw/timer/Makefile.objs
> index dd6f27e..0bdf1e1 100644
> --- a/hw/timer/Makefile.objs
> +++ b/hw/timer/Makefile.objs
> @@ -41,3 +41,4 @@ common-obj-$(CONFIG_STM32F2XX_TIMER) += stm32f2xx_timer.o
>  common-obj-$(CONFIG_ASPEED_SOC) += aspeed_timer.o
>
>  common-obj-$(CONFIG_SUN4V_RTC) += sun4v-rtc.o
> +common-obj-$(CONFIG_MSF2) += msf2_timer.o
> diff --git a/hw/timer/msf2_timer.c b/hw/timer/msf2_timer.c
> new file mode 100644
> index 000..962ada4
> --- /dev/null
> +++ b/hw/timer/msf2_timer.c
> @@ -0,0 +1,273 @@
> +/*
> + * Timer block model of Microsemi SmartFusion2.
> + *
> + * Copyright (c) 2017 Subbaraya Sundeep .
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a 
> copy
> + * of this software and associated documentation files (the "Software"), to 
> deal
> + * in the Software without restriction, including without limitation the 
> rights
> + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
> + * copies of the Software, and to permit persons to whom the Software is
> + * furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING 
> FROM,
> + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
> + * THE SOFTWARE.
> + */
> +
> +#include "qemu/osdep.h"
> +#include "hw/sysbus.h"
> +#include "hw/ptimer.h"
> +#include "qemu/log.h"
> +#include "qemu/main-loop.h"
> +
> +#define D(x)

Don't use this for debug prints, you want something more along the
lines of what is at the top of hw/misc/stm32f2xx_syscfg.c. Basically
wrap around the qemu_log() function.

> +
> +#define NUM_TIMERS 2
> +
> +#define R_VAL  0
> +#define R_LOADVAL  1
> +#define R_BGLOADVAL2
> +#define R_CTRL 3
> +#define R_RIS  4
> +#define R_MIS  5
> +#define R_MAX  6
> +
> +#define TIMER_CTRL_ENBL (1 << 0)
> +#define TIMER_CTRL_ONESHOT  (1 << 1)
> +#define TIMER_CTRL_INTR (1 << 2)
> +#define TIMER_RIS_ACK   (1 << 0)
> +#define TIMER_RST_CLR   (1 << 6)
> +
> +struct msf2_timer {
> +QEMUBH *bh;
> +ptimer_state *ptimer;
> +void *parent;
> +int nr; /* for debug.  */
> +
> +unsigned long timer_div;
> +
> +uint32_t regs[R_MAX];
> +qemu_irq irq;
> +};

Structs in QEMU should always be named in CamelCase.

> +
> +#define TYPE_MSF2_TIMER "msf2-timer"
> +#define MSF2_TIMER(obj) \
> +OBJECT_CHECK(struct timerblock, (obj), TYPE_MSF2_TIMER)
> +
> +struct timerblock {
> +SysBusDevice parent_obj;
> +
> +MemoryRegion mmio;
> +uint32_t freq_hz;
> +struct msf2_timer *timers;
> +};
> +
> +static inline unsigned int timer_from_addr(hwaddr addr)
> +{
> +/* Timers get a 6x32bit control reg area each.  */
> +return addr / R_MAX;
> +}
> +
> +static void timer_update_irq(struct msf2_timer *st)
> +{
> +int isr;
> +int ier;

I'd declare these both on the same line and declare them as bools.

> +
> +isr = !!(st->regs[R_RIS] & TIMER_RIS_ACK);
> +ier = !!(st->regs[R_CTRL] & TIMER_CTRL_INTR);
> +
> +qemu_set_irq(st->irq, (ier && isr));
> +}
> +
> +static uint64_t
> +timer_read(void *opaque, hwaddr addr, unsigned int size)
> +{
> +struct timerblock *t = opaque;
> +struct msf2_timer *st;
> +uint32_t r = 0;
> +unsigned int timer;
> +int isr;
> +int ier;
> +
> +addr >>= 2;
> +timer = timer_from_addr(addr);
> +st = >timers[timer];
> +
> +if (timer) {
> +addr -= 6;
> +}

Isn't this timer logic just checking if (addr >> 2) == R_MAX and if it
is set (addr >> 2) back to zero? This seems an overly complex way to
check that.

> +
> +switch (addr) {
> +case R_VAL:
> +r = ptimer_get_count(st->ptimer);
> +D(qemu_log("msf2_timer t=%d read counter=%x\n", timer, r));
> +break;
> +
> +case R_MIS:
> +isr = !!(st->regs[R_RIS] & TIMER_RIS_ACK);
> +ier =

Re: [Qemu-devel] [Qemu-devel RFC v2 4/4] msf2: Add Emcraft's Smartfusion2 SOM kit.

2017-04-14 Thread Alistair Francis

On Sun, Apr 9, 2017 at 4:19 AM, Subbaraya Sundeep
 wrote:
> Emulated Emcraft's Smartfusion2 System On Module starter
> kit.
>
> Signed-off-by: Subbaraya Sundeep 

Hey Sundeep,

Cool patch, I have some comments below.

> ---
>  default-configs/arm-softmmu.mak |   1 +
>  hw/arm/Makefile.objs|   2 +-
>  hw/arm/msf2_soc.c   | 141 
> 
>  3 files changed, 143 insertions(+), 1 deletion(-)
>  create mode 100644 hw/arm/msf2_soc.c
>
> diff --git a/default-configs/arm-softmmu.mak b/default-configs/arm-softmmu.mak
> index 1e3bd2b..379f7e1 100644
> --- a/default-configs/arm-softmmu.mak
> +++ b/default-configs/arm-softmmu.mak
> @@ -121,3 +121,4 @@ CONFIG_ACPI=y
>  CONFIG_SMBIOS=y
>  CONFIG_ASPEED_SOC=y
>  CONFIG_GPIO_KEY=y
> +CONFIG_MSF2=y
> diff --git a/hw/arm/Makefile.objs b/hw/arm/Makefile.objs
> index 4c5c4ee..cce2759 100644
> --- a/hw/arm/Makefile.objs
> +++ b/hw/arm/Makefile.objs
> @@ -1,7 +1,7 @@
>  obj-y += boot.o collie.o exynos4_boards.o gumstix.o highbank.o
>  obj-$(CONFIG_DIGIC) += digic_boards.o
>  obj-y += integratorcp.o mainstone.o musicpal.o nseries.o
> -obj-y += omap_sx1.o palm.o realview.o spitz.o stellaris.o
> +obj-y += omap_sx1.o palm.o realview.o spitz.o stellaris.o msf2_soc.o
>  obj-y += tosa.o versatilepb.o vexpress.o virt.o xilinx_zynq.o z2.o
>  obj-$(CONFIG_ACPI) += virt-acpi-build.o
>  obj-y += netduino2.o
> diff --git a/hw/arm/msf2_soc.c b/hw/arm/msf2_soc.c
> new file mode 100644
> index 000..7277b51
> --- /dev/null
> +++ b/hw/arm/msf2_soc.c
> @@ -0,0 +1,141 @@
> +/*
> + * Smartfusion2 SOM starter kit(from Emcraft) emulation.
> + *
> + * Copyright (c) 2017 Subbaraya Sundeep 
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a 
> copy
> + * of this software and associated documentation files (the "Software"), to 
> deal
> + * in the Software without restriction, including without limitation the 
> rights
> + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
> + * copies of the Software, and to permit persons to whom the Software is
> + * furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING 
> FROM,
> + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
> + * THE SOFTWARE.
> + */
> +
> +#include "qemu/osdep.h"
> +#include "qapi/error.h"
> +#include "qemu-common.h"
> +#include "hw/arm/arm.h"
> +#include "exec/address-spaces.h"
> +#include "hw/sysbus.h"
> +#include "hw/char/serial.h"
> +#include "hw/boards.h"
> +#include "sysemu/block-backend.h"
> +#include "hw/ssi/ssi.h"
> +
> +#define MSF2_NUM_USARTS   1
> +#define MSF2_NUM_TIMERS   2
> +
> +#define ENVM_BASE_ADDRESS 0x6000
> +#define ENVM_SIZE (128 * 1024)
> +
> +#define DDR_BASE_ADDRESS  0xA000
> +#define DDR_SIZE  (64 * 1024 * 1024)
> +
> +#define SRAM_BASE_ADDRESS 0x2000
> +#define SRAM_SIZE (64 * 1024)
> +
> +#define MSF2_TIMER_BASE   0x40004000
> +#define MSF2_UART0_BASE   0x4000
> +#define MSF2_SYSREG_BASE  0x40038000
> +#define MSF2_SPI0_BASE0x40001000
> +
> +#define MSF2_SPI0_IRQ 2
> +#define MSF2_UART0_IRQ10
> +#define MSF2_TIMER_IRQ0   14
> +#define MSF2_TIMER_IRQ1   15
> +
> +static void msf2_init(MachineState *machine)
> +{
> +const char *kernel_filename = NULL;
> +DeviceState *dev, *nvic;
> +MemoryRegion *system_memory = get_system_memory();
> +MemoryRegion *nvm = g_new(MemoryRegion, 1);
> +MemoryRegion *nvm_alias = g_new(MemoryRegion, 1);
> +MemoryRegion *sram = g_new(MemoryRegion, 1);
> +MemoryRegion *ddr = g_new(MemoryRegion, 1);
> +QemuOpts *machine_opts = qemu_get_machine_opts();
> +SysBusDevice *busdev;
> +DriveInfo *dinfo = drive_get_next(IF_MTD);
> +qemu_irq cs_line;
> +SSIBus *spi;
> +
> +kernel_filename = qemu_opt_get(machine_opts, "kernel");

You can just get the kernel filename directly from: machine->kernel_filename

> +
> +memory_region_init_ram(nvm, NULL, "MSF2.envm", ENVM_SIZE,
> +   _fatal);
> +memory_region_init_alias(nvm_alias, NULL, "MSF2.flash.alias",
> + nvm, 0, ENVM_SIZE);
> +vmstate_register_ram_global(nvm);
> +
> +memory_region_set_readonly(nvm, true);
> +memory_region_set_readonly(nvm_alias, true);
> +
> +

Re: [Qemu-devel] [PATCH v2 1/4] arm: remove remaining cannot_destroy_with_object_finalize_yet

2017-04-14 Thread Alistair Francis

On Fri, Apr 14, 2017 at 1:37 AM, Laurent Vivier  wrote:
> With commit ce5b1bbf624b ("exec: move cpu_exec_init() calls to
> realize functions"), we can now remove all the
> remaining cannot_destroy_with_object_finalize_yet as
> unsafe references have been moved to cpu_exec_realizefn().
> (tested with QOM command provided by commit 4c315c27).
>
> Suggested-by: Markus Armbruster 
> Signed-off-by: Laurent Vivier 
> Reviewed-by: Markus Armbruster 

Acked-by: Alistair Francis 

Thanks,

Alistair

> ---
>  hw/arm/allwinner-a10.c | 6 --
>  hw/arm/bcm2836.c   | 6 --
>  hw/arm/digic.c | 6 --
>  hw/arm/fsl-imx25.c | 5 -
>  hw/arm/fsl-imx31.c | 5 -
>  hw/arm/fsl-imx6.c  | 5 -
>  hw/arm/xlnx-zynqmp.c   | 6 --
>  7 files changed, 39 deletions(-)
>
> diff --git a/hw/arm/allwinner-a10.c b/hw/arm/allwinner-a10.c
> index ca15d1c..f62a9a3 100644
> --- a/hw/arm/allwinner-a10.c
> +++ b/hw/arm/allwinner-a10.c
> @@ -118,12 +118,6 @@ static void aw_a10_class_init(ObjectClass *oc, void 
> *data)
>  DeviceClass *dc = DEVICE_CLASS(oc);
>
>  dc->realize = aw_a10_realize;
> -
> -/*
> - * Reason: creates an ARM CPU, thus use after free(), see
> - * arm_cpu_class_init()
> - */
> -dc->cannot_destroy_with_object_finalize_yet = true;
>  }
>
>  static const TypeInfo aw_a10_type_info = {
> diff --git a/hw/arm/bcm2836.c b/hw/arm/bcm2836.c
> index 8451190..8c43291 100644
> --- a/hw/arm/bcm2836.c
> +++ b/hw/arm/bcm2836.c
> @@ -160,12 +160,6 @@ static void bcm2836_class_init(ObjectClass *oc, void 
> *data)
>
>  dc->props = bcm2836_props;
>  dc->realize = bcm2836_realize;
> -
> -/*
> - * Reason: creates an ARM CPU, thus use after free(), see
> - * arm_cpu_class_init()
> - */
> -dc->cannot_destroy_with_object_finalize_yet = true;
>  }
>
>  static const TypeInfo bcm2836_type_info = {
> diff --git a/hw/arm/digic.c b/hw/arm/digic.c
> index d60ea39..94f3263 100644
> --- a/hw/arm/digic.c
> +++ b/hw/arm/digic.c
> @@ -101,12 +101,6 @@ static void digic_class_init(ObjectClass *oc, void *data)
>  DeviceClass *dc = DEVICE_CLASS(oc);
>
>  dc->realize = digic_realize;
> -
> -/*
> - * Reason: creates an ARM CPU, thus use after free(), see
> - * arm_cpu_class_init()
> - */
> -dc->cannot_destroy_with_object_finalize_yet = true;
>  }
>
>  static const TypeInfo digic_type_info = {
> diff --git a/hw/arm/fsl-imx25.c b/hw/arm/fsl-imx25.c
> index 2126f73..9056f27 100644
> --- a/hw/arm/fsl-imx25.c
> +++ b/hw/arm/fsl-imx25.c
> @@ -290,11 +290,6 @@ static void fsl_imx25_class_init(ObjectClass *oc, void 
> *data)
>
>  dc->realize = fsl_imx25_realize;
>
> -/*
> - * Reason: creates an ARM CPU, thus use after free(), see
> - * arm_cpu_class_init()
> - */
> -dc->cannot_destroy_with_object_finalize_yet = true;
>  dc->desc = "i.MX25 SOC";
>  }
>
> diff --git a/hw/arm/fsl-imx31.c b/hw/arm/fsl-imx31.c
> index dd1c713..d7e2d83 100644
> --- a/hw/arm/fsl-imx31.c
> +++ b/hw/arm/fsl-imx31.c
> @@ -262,11 +262,6 @@ static void fsl_imx31_class_init(ObjectClass *oc, void 
> *data)
>
>  dc->realize = fsl_imx31_realize;
>
> -/*
> - * Reason: creates an ARM CPU, thus use after free(), see
> - * arm_cpu_class_init()
> - */
> -dc->cannot_destroy_with_object_finalize_yet = true;
>  dc->desc = "i.MX31 SOC";
>  }
>
> diff --git a/hw/arm/fsl-imx6.c b/hw/arm/fsl-imx6.c
> index 76dd8a4..6969e73 100644
> --- a/hw/arm/fsl-imx6.c
> +++ b/hw/arm/fsl-imx6.c
> @@ -442,11 +442,6 @@ static void fsl_imx6_class_init(ObjectClass *oc, void 
> *data)
>
>  dc->realize = fsl_imx6_realize;
>
> -/*
> - * Reason: creates an ARM CPU, thus use after free(), see
> - * arm_cpu_class_init()
> - */
> -dc->cannot_destroy_with_object_finalize_yet = true;
>  dc->desc = "i.MX6 SOC";
>  }
>
> diff --git a/hw/arm/xlnx-zynqmp.c b/hw/arm/xlnx-zynqmp.c
> index bc4e66b..4f67158 100644
> --- a/hw/arm/xlnx-zynqmp.c
> +++ b/hw/arm/xlnx-zynqmp.c
> @@ -439,12 +439,6 @@ static void xlnx_zynqmp_class_init(ObjectClass *oc, void 
> *data)
>
>  dc->props = xlnx_zynqmp_props;
>  dc->realize = xlnx_zynqmp_realize;
> -
> -/*
> - * Reason: creates an ARM CPU, thus use after free(), see
> - * arm_cpu_class_init()
> - */
> -dc->cannot_destroy_with_object_finalize_yet = true;
>  }
>
>  static const TypeInfo xlnx_zynqmp_type_info = {
> --
> 2.9.3
>
>

[Qemu-devel] [Bug 986318] Re: [sdl] Mouse grab breaks GNOME 3 screensaver unlock screen

2017-04-14 Thread Philippe Gauthier via Qemu-devel

On current versions (QEMU 2.8, GNOME-Shell 3.22.3 on Wayland), the
screensaver will activate. Cursor control is given to the lock screen,
everything works properly. After the screen is unlocked, the cursor is
given back to QEMU and is still in "mouse grab" mode.

** Changed in: qemu
   Status: Incomplete => Fix Released

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/986318

Title:
  [sdl] Mouse grab breaks GNOME 3 screensaver unlock screen

Status in QEMU:
  Fix Released

Bug description:
  When the GNOME 3 screensaver activates with the mouse cursor over the
  SDL window, the screensaver will not unlock unless the gnome-shell
  process is killed and restarted manually. This seems to be related to
  the fact that the key strokes are grabbed by SDL, but the screensaver
  will not allow the Ctrl and Alt keys to be passed to QEMU to exit the
  mouse grab.

  Qemu-kvm 1.0.1, QEMU 1.0.1, kernel 3.2.15.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/986318/+subscriptions

Re: [Qemu-devel] [PATCH 0/3] move xen related files to corresponding xen directory

2017-04-14 Thread Stefano Stabellini

On Wed, 5 Apr 2017, Anthony Xu wrote:
> 
>   move xen related files to corresponding xen directory
> 
>   move xen-common.c to hw/xen/
>   move xen-hvm.c to hw/i386/xen/
>   move xen-mapcache.c to hw/i386/xen/
> 
> Signed-off -by: Anthony Xu 

Reviewed-by: Stefano Stabellini 

I queued this series in my "next" branch.


> 
>  Makefile.target  |  6 --
>  default-configs/i386-softmmu.mak |  1 -
>  default-configs/x86_64-softmmu.mak   |  1 -
>  hw/i386/xen/Makefile.objs|  2 +-
>  hw/i386/xen/trace-events | 17 +
>  xen-hvm.c => hw/i386/xen/xen-hvm.c   |  2 +-
>  xen-mapcache.c => hw/i386/xen/xen-mapcache.c |  2 +-
>  hw/xen/Makefile.objs |  2 +-
>  xen-common.c => hw/xen/xen-common.c  |  0
>  stubs/Makefile.objs  |  2 ++
>  xen-common-stub.c => stubs/xen-common.c  |  0
>  xen-hvm-stub.c => stubs/xen-hvm.c|  0
>  trace-events | 16 
>  13 files changed, 23 insertions(+), 28 deletions(-)
>  rename xen-hvm.c => hw/i386/xen/xen-hvm.c (99%)
>  rename xen-mapcache.c => hw/i386/xen/xen-mapcache.c (99%)
>  rename xen-common.c => hw/xen/xen-common.c (100%)
>  rename xen-common-stub.c => stubs/xen-common.c (100%)
>  rename xen-hvm-stub.c => stubs/xen-hvm.c (100%)
> 
> -- 
> 1.8.3.1
>

Re: [Qemu-devel] [Xen-devel] [PATCH] configure: introduce --enable-xen-fb-backend

2017-04-14 Thread Stefano Stabellini

On Fri, 14 Apr 2017, Juergen Gross wrote:
> On 14/04/17 08:06, Oleksandr Andrushchenko wrote:
> > On 04/14/2017 03:12 AM, Stefano Stabellini wrote:
> >> On Tue, 11 Apr 2017, Oleksandr Andrushchenko wrote:
> >>> From: Oleksandr Andrushchenko 
> >>>
> >>> For some use cases when Xen framebuffer/input backend
> >>> is not a part of Qemu it is required to disable it,
> >>> because of conflicting access to input/display devices.
> >>> Introduce additional configuration option for explicit
> >>> input/display control.
> >> In these cases when you don't want xenfb, why don't you just remove
> >> "vfb" from the xl config file? QEMU only starts the xenfb backend when
> >> requested by the toolstack.
> >>
> >> Is it because you have an alternative xenfb backend? If so, is it really
> >> fully xenfb compatible, or is it a different protocol? If it is a
> >> different protocol, I suggest you rename your frontend/backend PV device
> >> name to something different from "vfb".
> >>
> > Well, offending part is vkbd actually (for multi-touch
> > we run our own user-space backend which supports
> > kbd/ptr/mtouch), but vfb and vkbd is the same backend
> > in QEMU. So, I am ok for vfb, but just want vkbd off
> > So, there are 2 options:
> > 1. At compile time remove vkbd and still allow vfb
> > 2. Remove xenfb completely, if acceptable (this is my case)
> 
> What about adding a Xenstore entry for backend type and let qemu test
> for it being not present or containing "qemu"?

That is what we do for the console, using the xenstore node "type". QEMU
is "ioemu" while xenconsoled is "xenconsoled". Weirdly, instead of a
backend node, it is a read-only frontend node, see
tools/libxl/libxl_console.c:libxl__device_console_add.

Oleksandr, I am sorry to feature-creep this simple patch, but I think
Juergen is right. But we cannot do it just for one protocol. We need to
introduce a generic way to enable/disable backends in QEMU. Using a
xenstore node is OK.

We could do exactly the same as the PV console, thus "type" = "ioemu",
read-only, under the frontend xenstore directory. Or we could introduce
new nodes. I would probably go for "backend-type" = "qemu" under the
backend xenstore directory. I don't have a strong opinion about this. In
the example below I'll use the PV console convention.

For starters:

* libxl needs to write the "type" node to xenstore for *all* protocols.
  The "type" is not yet configurable.
* qemu reads them for all backends, proceeds if "type" = "ioemu"

These should be two simple patches. Stage 2:

* we add options in the xl config file to configure any backend, libxl
  set "type" accordingly (Maybe not *any*, but vif, vkbd, vfb could all
  have a "type". It is OK if you only add an option for vkbd.)
* non-QEMU backends, in particular Linux backends, also read the "type"
  node and proceed if it's "linux"

Does this sound OK to you?

Re: [Qemu-devel] [RFC v2 4/4] spec/vhost-user spec: Add IOMMU support

2017-04-14 Thread Maxime Coquelin




On 04/14/2017 07:40 PM, Maxime Coquelin wrote:

 static void slave_read(void *opaque)
 {
 struct vhost_dev *dev = opaque;
@@ -611,6 +639,8 @@ static void slave_read(void *opaque)
 }

 switch (msg.request) {

Oops, I missed "case VHOST_USER_SLAVE_IOTLB_MSG:" here..

+ret = vhost_user_iotlb_read(dev, );
+break;
 default:
 error_report("Received unexpected msg type.");
 ret = -EINVAL;
@@ -848,6 +878,71 @@ static int vhost_user_net_set_mtu(struct vhost_dev *dev, 
uint16_t mtu)
 return 0;
 }


Maxime

[Qemu-devel] [RFC v2 3/4] vhost-user: add slave-req-fd support

2017-04-14 Thread Maxime Coquelin

From: Marc-André Lureau 

Learn to give a socket to the slave to let him make requests to the
master.

Signed-off-by: Marc-André Lureau 
Signed-off-by: Maxime Coquelin 
---
 docs/specs/vhost-user.txt |  30 +++-
 hw/virtio/vhost-user.c| 117 ++
 2 files changed, 145 insertions(+), 2 deletions(-)

diff --git a/docs/specs/vhost-user.txt b/docs/specs/vhost-user.txt
index 036890f..49c6293 100644
--- a/docs/specs/vhost-user.txt
+++ b/docs/specs/vhost-user.txt
@@ -139,6 +139,7 @@ in the ancillary data:
  * VHOST_USER_SET_VRING_KICK
  * VHOST_USER_SET_VRING_CALL
  * VHOST_USER_SET_VRING_ERR
+ * VHOST_USER_SET_SLAVE_REQ_FD
 
 If Master is unable to send the full message or receives a wrong reply it will
 close the connection. An optional reconnection mechanism can be implemented.
@@ -252,6 +253,18 @@ Once the source has finished migration, rings will be 
stopped by
 the source. No further update must be done before rings are
 restarted.
 
+Slave communication
+---
+
+An optional communication channel is provided if the slave declares
+VHOST_USER_PROTOCOL_F_SLAVE_REQ protocol feature, to allow the slave to make
+requests to the master.
+
+The fd is provided via VHOST_USER_SET_SLAVE_REQ_FD ancillary data.
+
+A slave may then send VHOST_USER_SLAVE_* messages to the master
+using this fd communication channel.
+
 Protocol features
 -
 
@@ -260,9 +273,10 @@ Protocol features
 #define VHOST_USER_PROTOCOL_F_RARP   2
 #define VHOST_USER_PROTOCOL_F_REPLY_ACK  3
 #define VHOST_USER_PROTOCOL_F_MTU4
+#define VHOST_USER_PROTOCOL_F_SLAVE_REQ  5
 
-Message types
--
+Master message types
+
 
  * VHOST_USER_GET_FEATURES
 
@@ -486,6 +500,18 @@ Message types
   If VHOST_USER_PROTOCOL_F_REPLY_ACK is negotiated, slave must respond
   with zero in case the specified MTU is valid, or non-zero otherwise.
 
+ * VHOST_USER_SET_SLAVE_REQ_FD
+
+  Id: 21
+  Equivalent ioctl: N/A
+  Master payload: N/A
+
+  Set the socket file descriptor for slave initiated requests. It is passed
+  in the ancillary data.
+  This request should be sent only when VHOST_USER_F_PROTOCOL_FEATURES
+  has been negotiated, and protocol feature bit 
VHOST_USER_PROTOCOL_F_SLAVE_REQ
+  bit is present in VHOST_USER_GET_PROTOCOL_FEATURES.
+
 VHOST_USER_PROTOCOL_F_REPLY_ACK:
 ---
 The original vhost-user specification only demands replies for certain
diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index f0e10d0..f26f11b 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -33,6 +33,7 @@ enum VhostUserProtocolFeature {
 VHOST_USER_PROTOCOL_F_RARP = 2,
 VHOST_USER_PROTOCOL_F_REPLY_ACK = 3,
 VHOST_USER_PROTOCOL_F_NET_MTU = 4,
+VHOST_USER_PROTOCOL_F_SLAVE_REQ = 5,
 
 VHOST_USER_PROTOCOL_F_MAX
 };
@@ -61,9 +62,15 @@ typedef enum VhostUserRequest {
 VHOST_USER_SET_VRING_ENABLE = 18,
 VHOST_USER_SEND_RARP = 19,
 VHOST_USER_NET_SET_MTU = 20,
+VHOST_USER_SET_SLAVE_REQ_FD = 21,
 VHOST_USER_MAX
 } VhostUserRequest;
 
+typedef enum VhostUserSlaveRequest {
+VHOST_USER_SLAVE_NONE = 0,
+VHOST_USER_SLAVE_MAX
+}  VhostUserSlaveRequest;
+
 typedef struct VhostUserMemoryRegion {
 uint64_t guest_phys_addr;
 uint64_t memory_size;
@@ -113,6 +120,7 @@ static VhostUserMsg m __attribute__ ((unused));
 
 struct vhost_user {
 CharBackend *chr;
+int slave_fd;
 };
 
 static bool ioeventfd_enabled(void)
@@ -574,6 +582,104 @@ static int vhost_user_reset_device(struct vhost_dev *dev)
 return 0;
 }
 
+static void slave_read(void *opaque)
+{
+struct vhost_dev *dev = opaque;
+struct vhost_user *u = dev->opaque;
+VhostUserMsg msg = { 0, };
+int size, ret = 0;
+
+/* Read header */
+size = read(u->slave_fd, , VHOST_USER_HDR_SIZE);
+if (size != VHOST_USER_HDR_SIZE) {
+error_report("Failed to read from slave.");
+goto err;
+}
+
+if (msg.size > VHOST_USER_PAYLOAD_SIZE) {
+error_report("Failed to read msg header."
+" Size %d exceeds the maximum %zu.", msg.size,
+VHOST_USER_PAYLOAD_SIZE);
+goto err;
+}
+
+/* Read payload */
+size = read(u->slave_fd, , msg.size);
+if (size != msg.size) {
+error_report("Failed to read payload from slave.");
+goto err;
+}
+
+switch (msg.request) {
+default:
+error_report("Received unexpected msg type.");
+ret = -EINVAL;
+}
+
+/*
+ * REPLY_ACK feature handling. Other reply types has to be managed
+ * directly in their request handlers.
+ */
+if (msg.flags & VHOST_USER_NEED_REPLY_MASK) {
+msg.flags &= ~VHOST_USER_NEED_REPLY_MASK;
+msg.flags |= VHOST_USER_REPLY_MASK;
+
+msg.payload.u64 = !!ret;

[Qemu-devel] [RFC v2 4/4] spec/vhost-user spec: Add IOMMU support

2017-04-14 Thread Maxime Coquelin

This patch specifies and implements the master/slave communication
to support device IOTLB in slave.

The vhost_iotlb_msg structure introduced for kernel backends is
re-used, making the design close between the two backends.

An exception is the use of the secondary channel to enable the
slave to send IOTLB miss requests to the master.

Signed-off-by: Maxime Coquelin 
---
 docs/specs/vhost-user.txt | 74 +++
 hw/virtio/vhost-user.c| 98 +++
 2 files changed, 172 insertions(+)

diff --git a/docs/specs/vhost-user.txt b/docs/specs/vhost-user.txt
index 49c6293..a16dc44 100644
--- a/docs/specs/vhost-user.txt
+++ b/docs/specs/vhost-user.txt
@@ -97,6 +97,23 @@ Depending on the request type, payload can be:
log offset: offset from start of supplied file descriptor
where logging starts (i.e. where guest address 0 would be logged)
 
+ * An IOTLB message
+   -
+   | iova | size | user address | permissions flags | type |
+   -
+
+   IOVA: a 64-bit guest I/O virtual address
+   Size: a 64-bit size
+   User address: a 64-bit user address
+   Permissions flags: a 8-bit bit field:
+- Bit 0: Read access
+- Bit 1: Write access
+   Type: a 8-bit IOTLB message type:
+- 1: IOTLB miss
+- 2: IOTLB update
+- 3: IOTLB invalidate
+- 4: IOTLB access fail
+
 In QEMU the vhost-user message is implemented with the following struct:
 
 typedef struct VhostUserMsg {
@@ -109,6 +126,7 @@ typedef struct VhostUserMsg {
 struct vhost_vring_addr addr;
 VhostUserMemory memory;
 VhostUserLog log;
+struct vhost_iotlb_msg iotlb;
 };
 } QEMU_PACKED VhostUserMsg;
 
@@ -253,6 +271,30 @@ Once the source has finished migration, rings will be 
stopped by
 the source. No further update must be done before rings are
 restarted.
 
+IOMMU support
+-
+
+When the VIRTIO_F_IOMMU_PLATFORM feature has been negotiated, the master has
+to send IOTLB entries update & invalidation by sending VHOST_USER_IOTLB_MSG
+requests to the slave with a struct vhost_iotlb_msg payload. For update events,
+the iotlb payload has to be filled with the update message type (2), the I/O
+virtual address, the size, the user virtual address, and the permissions
+flags. For invalidation events, the iotlb payload has to be filled with the
+update message type (3), the I/O virtual address and the size. On success, the
+slave is expected to reply with a zero payload, non-zero otherwise.
+
+When the VHOST_USER_PROTOCOL_F_SLAVE_REQ is supported by the slave, and the
+master initiated the slave to master communication channel using the
+VHOST_USER_SET_SLAVE_REQ_FD request, the slave can send IOTLB miss and access
+failure events by sending VHOST_USER_SLAVE_IOTLB_MSG requests to the master
+with a struct vhost_iotlb_msg payload. For miss events, the iotlb payload has
+to be filled with the miss message type (1), the I/O virtual address and the
+permissions flags. For access failure event, the iotlb payload has to be
+filled with the access failure message type (4), the I/O virtual address and
+the permissions flags. For synchronization purpose, the slave may rely on the
+reply-ack feature, so the master may send a reply when operation is completed
+if the reply-ack feature is negotiated and slaves requests a reply.
+
 Slave communication
 ---
 
@@ -512,6 +554,38 @@ Master message types
   has been negotiated, and protocol feature bit 
VHOST_USER_PROTOCOL_F_SLAVE_REQ
   bit is present in VHOST_USER_GET_PROTOCOL_FEATURES.
 
+ * VHOST_USER_IOTLB_MSG
+
+  Id: 22
+  Equivalent ioctl: N/A (equivalent to VHOST_IOTLB_MSG message type)
+  Master payload: struct vhost_iotlb_msg
+  Slave payload: u64
+
+  Send IOTLB messages with struct vhost_iotlb_msg as payload.
+  Master sends such requests to update and invalidate entries in the device
+  IOTLB. The slave has to acknowledge the request with sending zero as u64
+  payload for success, non-zero otherwise.
+  This request should be send only when VIRTIO_F_IOMMU_PLATFORM feature
+  has been successfully negotiated.
+
+Slave message types
+---
+
+ * VHOST_USER_SLAVE_IOTLB_MSG
+
+  Id: 1
+  Equivalent ioctl: N/A (equivalent to VHOST_IOTLB_MSG message type)
+  Slave payload: struct vhost_iotlb_msg
+  Master payload: N/A
+
+  Send IOTLB messages with struct vhost_iotlb_msg as payload.
+  Slave sends such requests to notify of an IOTLB miss, or an IOTLB
+  access failure. If VHOST_USER_PROTOCOL_F_REPLY_ACK is negotiated,
+  and slave set the VHOST_USER_NEED_REPLY flag, master must respond with
+  zero when operation is successfully completed, or non-zero otherwise.
+  This request should be send only when VIRTIO_F_IOMMU_PLATFORM feature
+

[Qemu-devel] [RFC v2 0/4] vhost-user: Specify and implement device IOTLB support

2017-04-14 Thread Maxime Coquelin

This series aims at specifying ans implementing the protocol update
required to support device IOTLB with user backends.

In this second version, the slave requests channel part is re-used
from Marc-André's series submitted last year[0], with main changes
from original version being request/feature names renaming and addition
of the REPLY_ACK feature support.

Regarding IOTLB protocol, one noticeable change is the IOTLB miss request
reply made optionnal (i.e. only if slave requests it by setting the
VHOST_USER_NEED_REPLY flag in the message header). This change provides
more flexibility in the backend implementation of the feature.

The protocol is very close to kernel backends, except that a new
communication channel is introduced to enable the slave to send
requests to the master.

[0]: https://lists.gnu.org/archive/html/qemu-devel/2016-04/msg00095.html

Marc-André Lureau (2):
  vhost-user: add vhost_user to hold the chr
  vhost-user: add slave-req-fd support

Maxime Coquelin (2):
  vhost: propagate errors in vhost_device_iotlb_miss()
  spec/vhost-user spec: Add IOMMU support

 docs/specs/vhost-user.txt | 104 -
 hw/virtio/vhost-user.c| 234 +-
 hw/virtio/vhost.c |  15 ++-
 include/hw/virtio/vhost.h |   2 +-
 4 files changed, 344 insertions(+), 11 deletions(-)

-- 
2.9.3

[Qemu-devel] [RFC v2 2/4] vhost-user: add vhost_user to hold the chr

2017-04-14 Thread Maxime Coquelin

From: Marc-André Lureau 

Next patches will add more fields to the structure

Signed-off-by: Marc-André Lureau 
Signed-off-by: Maxime Coquelin 
---
 hw/virtio/vhost-user.c | 19 ---
 1 file changed, 16 insertions(+), 3 deletions(-)

diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index 9334a8a..f0e10d0 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -111,6 +111,10 @@ static VhostUserMsg m __attribute__ ((unused));
 /* The version of the protocol we support */
 #define VHOST_USER_VERSION(0x1)
 
+struct vhost_user {
+CharBackend *chr;
+};
+
 static bool ioeventfd_enabled(void)
 {
 return kvm_enabled() && kvm_eventfds_enabled();
@@ -118,7 +122,8 @@ static bool ioeventfd_enabled(void)
 
 static int vhost_user_read(struct vhost_dev *dev, VhostUserMsg *msg)
 {
-CharBackend *chr = dev->opaque;
+struct vhost_user *u = dev->opaque;
+CharBackend *chr = u->chr;
 uint8_t *p = (uint8_t *) msg;
 int r, size = VHOST_USER_HDR_SIZE;
 
@@ -199,7 +204,8 @@ static bool vhost_user_one_time_request(VhostUserRequest 
request)
 static int vhost_user_write(struct vhost_dev *dev, VhostUserMsg *msg,
 int *fds, int fd_num)
 {
-CharBackend *chr = dev->opaque;
+struct vhost_user *u = dev->opaque;
+CharBackend *chr = u->chr;
 int ret, size = VHOST_USER_HDR_SIZE + msg->size;
 
 /*
@@ -571,11 +577,14 @@ static int vhost_user_reset_device(struct vhost_dev *dev)
 static int vhost_user_init(struct vhost_dev *dev, void *opaque)
 {
 uint64_t features;
+struct vhost_user *u;
 int err;
 
 assert(dev->vhost_ops->backend_type == VHOST_BACKEND_TYPE_USER);
 
-dev->opaque = opaque;
+u = g_new0(struct vhost_user, 1);
+u->chr = opaque;
+dev->opaque = u;
 
 err = vhost_user_get_features(dev, );
 if (err < 0) {
@@ -620,8 +629,12 @@ static int vhost_user_init(struct vhost_dev *dev, void 
*opaque)
 
 static int vhost_user_cleanup(struct vhost_dev *dev)
 {
+struct vhost_user *u;
+
 assert(dev->vhost_ops->backend_type == VHOST_BACKEND_TYPE_USER);
 
+u = dev->opaque;
+g_free(u);
 dev->opaque = 0;
 
 return 0;
-- 
2.9.3

[Qemu-devel] [RFC v2 1/4] vhost: propagate errors in vhost_device_iotlb_miss()

2017-04-14 Thread Maxime Coquelin

Some backends might want to know when things went wrong.

Signed-off-by: Maxime Coquelin 
---
 hw/virtio/vhost.c | 15 ++-
 include/hw/virtio/vhost.h |  2 +-
 2 files changed, 11 insertions(+), 6 deletions(-)

diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index 613494d..c490cf9 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -964,18 +964,20 @@ static int vhost_memory_region_lookup(struct vhost_dev 
*hdev,
 return -EFAULT;
 }
 
-void vhost_device_iotlb_miss(struct vhost_dev *dev, uint64_t iova, int write)
+int vhost_device_iotlb_miss(struct vhost_dev *dev, uint64_t iova, int write)
 {
 IOMMUTLBEntry iotlb;
 uint64_t uaddr, len;
+int ret = -EFAULT;
 
 rcu_read_lock();
 
 iotlb = address_space_get_iotlb_entry(dev->vdev->dma_as,
   iova, write);
 if (iotlb.target_as != NULL) {
-if (vhost_memory_region_lookup(dev, iotlb.translated_addr,
-   , )) {
+ret = vhost_memory_region_lookup(dev, iotlb.translated_addr,
+ , );
+if (ret) {
 error_report("Fail to lookup the translated address "
  "%"PRIx64, iotlb.translated_addr);
 goto out;
@@ -984,14 +986,17 @@ void vhost_device_iotlb_miss(struct vhost_dev *dev, 
uint64_t iova, int write)
 len = MIN(iotlb.addr_mask + 1, len);
 iova = iova & ~iotlb.addr_mask;
 
-if (dev->vhost_ops->vhost_update_device_iotlb(dev, iova, uaddr,
-  len, iotlb.perm)) {
+ret = dev->vhost_ops->vhost_update_device_iotlb(dev, iova, uaddr,
+  len, iotlb.perm);
+if (ret) {
 error_report("Fail to update device iotlb");
 goto out;
 }
 }
 out:
 rcu_read_unlock();
+
+return ret;
 }
 
 static int vhost_virtqueue_start(struct vhost_dev *dev,
diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
index a450321..467dc77 100644
--- a/include/hw/virtio/vhost.h
+++ b/include/hw/virtio/vhost.h
@@ -105,5 +105,5 @@ bool vhost_has_free_slot(void);
 int vhost_net_set_backend(struct vhost_dev *hdev,
   struct vhost_vring_file *file);
 
-void vhost_device_iotlb_miss(struct vhost_dev *dev, uint64_t iova, int write);
+int vhost_device_iotlb_miss(struct vhost_dev *dev, uint64_t iova, int write);
 #endif
-- 
2.9.3

Re: [Qemu-devel] [Qemu-block] [PATCH for-2.9-rc5 v2] block: Drain BH in bdrv_drained_begin

2017-04-14 Thread Paolo Bonzini

On 14/04/2017 16:51, Stefan Hajnoczi wrote:
> On Fri, Apr 14, 2017 at 9:02 AM, Fam Zheng  wrote:
>> @@ -398,11 +399,15 @@ void bdrv_drain_all(void);
>>   */\
>>  assert(!bs_->wakeup);  \
>>  bs_->wakeup = true;\
>> -while ((cond)) {   \
>> -aio_context_release(ctx_); \
>> -aio_poll(qemu_get_aio_context(), true);\
>> -aio_context_acquire(ctx_); \
>> -waited_ = true;\
>> +while (busy_) {\
>> +if ((cond)) {  \
>> +waited_ = busy_ = true;\
>> +aio_context_release(ctx_); \
>> +aio_poll(qemu_get_aio_context(), true);\
>> +aio_context_acquire(ctx_); \
>> +} else {   \
>> +busy_ = aio_poll(ctx_, false); \
>> +}  \
> 
> Wait, I'm confused.  The current thread is not in the BDS AioContext.
> We're not allowed to call aio_poll(ctx_, false).

It's pretty ugly indeed.  Strictly from a thread-safety point of view,
everything that aio_poll calls will acquire the AioContext, so that is
safe and in fact the release/acquire pair can beeven  hoisted outside
the "if".

If we did that for blocking=true in both I/O and main thread, then that
would be racy.  This is the scenario mentioned in the commit message for
c9d1a56, "block: only call aio_poll on the current thread's AioContext",
2016-10-28).

If only one thread has blocking=true, it's subject to races too.  In
this case, the I/O thread may fail to be woken by iothread_stop's
aio_notify.  However, by the time iothread_stop is called there should
be no BlockDriverStates (and thus no BDRV_POLL_WHILE running the above
code) for the I/O thread's AioContext.

The root cause of the bug is simple: due to the main thread having the
I/O thread's AioContext, the main thread is not letting the I/O thread
run.  This is what RFifoLock and the contention callbacks were designed
to fix, though they had other issues related to recursive locking (if
you acquired an AioContext twice, the contention callback failed to
release it).  The right way to fix it is just not to hold _any_ lock
across BDRV_POLL_WHILE.  This means getting rid of
aio_context_acquire/release, and being somewhat careful about
bdrv_drained_begin, but overall it's not hard.

But Fam's patch seems ugly but safe, making it the right thing to do for
the 2.9 branch---possibly with a FIXME comment explaining the above.
bdrv_coroutine_enter can be removed too once BDS can do fine-grained
locking.

So, the ramifications of the current partial conversion are pretty
complex (though many older QEMU releases had other similar
dataplane+blockjob bugs due to the above recursive locking issue), but
it looks like it is more or less manageable to fix them either now or in
a quick 2.9.1 release a few weeks after 2.9.0.  (Fam and I also tried
another way to fix it today, but it led to deadlocks not related at all
to the partial conversion, so it seemed like a dead end).

Regarding the other code path mentioned in the cover letter:

>> 
>> qmp_block_commit
>>   commit_active_start
>> bdrv_reopen
>>   bdrv_reopen_multiple
>> bdrv_reopen_prepare
>>   bdrv_flush
>> aio_poll
>>   aio_bh_poll
>> aio_bh_call
>>   block_job_defer_to_main_loop_bh
>> stream_complete
>>   bdrv_reopen

it's possible that it's just a missing bdrv_ref/unref pair, respectively
in bdrv_reopen_queue_child and bdrv_reopen_multiple, so not related to
this patch.  We didn't test adding the ref/unref though.

Thanks,

Paolo

> Did you mean aio_poll(qemu_get_aio_context(), false) in order to
> process defer to main loop BHs?

Re: [Qemu-devel] [PATCH 2/6] util: introduce glib-helper.c

2017-04-14 Thread Philippe Mathieu-Daudé


Hi Alexey,

On 04/14/2017 10:17 AM, Alexey Perevalov wrote:

There is a lack of g_int_cmp which compares pointers value in glib,
xen_disk.c introduced its own, so the same function now requires
in migration.c. So logically to move it into common place.
Futher: maybe extend glib.

Also this commit moves existing glib-compat.h into util/glib
folder for consolidation purpose.


Can you do this in two commits? First one moving files only, second move 
the function?


I'm not sure naming it "g_int_cmp()" won't clash with future _extended_ 
glib, what do you think about naming it "qemu_g_int_cmp()"?



Signed-off-by: Alexey Perevalov 
---
 hw/block/xen_disk.c|  10 +-
 include/glib-compat.h  | 352 -
 include/glib/glib-compat.h | 352 +
 include/glib/glib-helper.h |  30 
 include/qemu/osdep.h   |   2 +-
 linux-user/main.c  |   2 +-
 scripts/clean-includes |   2 +-
 util/Makefile.objs |   1 +
 util/glib-helper.c |  29 
 9 files changed, 417 insertions(+), 363 deletions(-)
 delete mode 100644 include/glib-compat.h
 create mode 100644 include/glib/glib-compat.h
 create mode 100644 include/glib/glib-helper.h
 create mode 100644 util/glib-helper.c

diff --git a/hw/block/xen_disk.c b/hw/block/xen_disk.c
index 456a2d5..36f6396 100644
--- a/hw/block/xen_disk.c
+++ b/hw/block/xen_disk.c
@@ -20,6 +20,7 @@
  */

 #include "qemu/osdep.h"
+#include "glib/glib-helper.h"
 #include 
 #include 

@@ -154,13 +155,6 @@ static void ioreq_reset(struct ioreq *ioreq)
 qemu_iovec_reset(>v);
 }

-static gint int_cmp(gconstpointer a, gconstpointer b, gpointer user_data)
-{
-uint ua = GPOINTER_TO_UINT(a);
-uint ub = GPOINTER_TO_UINT(b);
-return (ua > ub) - (ua < ub);
-}
-
 static void destroy_grant(gpointer pgnt)
 {
 PersistentGrant *grant = pgnt;
@@ -1191,7 +1185,7 @@ static int blk_connect(struct XenDevice *xendev)
 if (blkdev->feature_persistent) {
 /* Init persistent grants */
 blkdev->max_grants = max_requests * BLKIF_MAX_SEGMENTS_PER_REQUEST;
-blkdev->persistent_gnts = g_tree_new_full((GCompareDataFunc)int_cmp,
+blkdev->persistent_gnts = g_tree_new_full((GCompareDataFunc)g_int_cmp,
  NULL, NULL,
  batch_maps ?
  (GDestroyNotify)g_free :
diff --git a/include/glib-compat.h b/include/glib-compat.h
deleted file mode 100644
index 863c8cf..000
--- a/include/glib-compat.h
+++ /dev/null
@@ -1,352 +0,0 @@
-/*
- * GLIB Compatibility Functions
- *
- * Copyright IBM, Corp. 2013
- *
- * Authors:
- *  Anthony Liguori   
- *  Michael Tokarev   
- *  Paolo Bonzini 
- *
- * This work is licensed under the terms of the GNU GPL, version 2 or later.
- * See the COPYING file in the top-level directory.
- *
- */
-
-#ifndef QEMU_GLIB_COMPAT_H
-#define QEMU_GLIB_COMPAT_H
-
-#include 
-
-/* GLIB version compatibility flags */
-#if !GLIB_CHECK_VERSION(2, 26, 0)
-#define G_TIME_SPAN_SECOND  (G_GINT64_CONSTANT(100))
-#endif
-
-#if !GLIB_CHECK_VERSION(2, 28, 0)
-static inline gint64 qemu_g_get_monotonic_time(void)
-{
-/* g_get_monotonic_time() is best-effort so we can use the wall clock as a
- * fallback.
- */
-
-GTimeVal time;
-g_get_current_time();
-
-return time.tv_sec * G_TIME_SPAN_SECOND + time.tv_usec;
-}
-/* work around distro backports of this interface */
-#define g_get_monotonic_time() qemu_g_get_monotonic_time()
-#endif
-
-#if defined(_WIN32) && !GLIB_CHECK_VERSION(2, 50, 0)
-/*
- * g_poll has a problem on Windows when using
- * timeouts < 10ms, so use wrapper.
- */
-#define g_poll(fds, nfds, timeout) g_poll_fixed(fds, nfds, timeout)
-gint g_poll_fixed(GPollFD *fds, guint nfds, gint timeout);
-#endif
-
-#if !GLIB_CHECK_VERSION(2, 30, 0)
-/* Not a 100% compatible implementation, but good enough for most
- * cases. Placeholders are only supported at the end of the
- * template. */
-static inline gchar *qemu_g_dir_make_tmp(gchar const *tmpl, GError **error)
-{
-gchar *path = g_build_filename(g_get_tmp_dir(), tmpl ?: ".XX", NULL);
-
-if (mkdtemp(path) != NULL) {
-return path;
-}
-/* Error occurred, clean up. */
-g_set_error(error, G_FILE_ERROR, g_file_error_from_errno(errno),
-"mkdtemp() failed");
-g_free(path);
-return NULL;
-}
-#define g_dir_make_tmp(tmpl, error) qemu_g_dir_make_tmp(tmpl, error)
-#endif /* glib 2.30 */
-
-#if !GLIB_CHECK_VERSION(2, 31, 0)
-/* before glib-2.31, GMutex and GCond was dynamic-only (there was a separate
- * GStaticMutex, but it didn't work with condition variables).
- *
- * Our implementation uses GOnce to fake a static implementation that does
- * not require separate initialization.
- * We need to rename the types to

Re: [Qemu-devel] [PATCH for-2.10 1/3] qemu-img/convert: Always set ret < 0 on error

2017-04-14 Thread Philippe Mathieu-Daudé


On 04/13/2017 05:33 PM, Max Reitz wrote:

Otherwise the qemu-img process will exit with EXIT_SUCCESS instead of
EXIT_FAILURE.

Cc: qemu-stable 
Signed-off-by: Max Reitz 


Reviewed-by: Philippe Mathieu-Daudé 


---
 qemu-img.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/qemu-img.c b/qemu-img.c
index 37c2894678..f2809e1ab4 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -2069,6 +2069,7 @@ static int img_convert(int argc, char **argv)
 opts = qemu_opts_parse_noisily(_object_opts,
optarg, true);
 if (!opts) {
+ret = -1;
 goto fail_getopt;
 }
 break;
@@ -2081,6 +2082,7 @@ static int img_convert(int argc, char **argv)
 if (qemu_opts_foreach(_object_opts,
   user_creatable_add_opts_foreach,
   NULL, NULL)) {
+ret = -1;
 goto fail_getopt;
 }

Re: [Qemu-devel] [PATCH v9 0/5] Extend virtio-balloon for fast (de)inflating & fast live migration

2017-04-14 Thread Michael S. Tsirkin

On Fri, Apr 14, 2017 at 02:47:40AM -0700, Matthew Wilcox wrote:
> On Fri, Apr 14, 2017 at 04:50:48AM +0300, Michael S. Tsirkin wrote:
> > On Thu, Apr 13, 2017 at 01:44:11PM -0700, Matthew Wilcox wrote:
> > > On Thu, Apr 13, 2017 at 05:35:03PM +0800, Wei Wang wrote:
> > > > 2) transfer the guest unused pages to the host so that they
> > > > can be skipped to migrate in live migration.
> > > 
> > > I don't understand this second bit.  You leave the pages on the free list,
> > > and tell the host they're free.  What's preventing somebody else from
> > > allocating them and using them for something?  Is the guest semi-frozen
> > > at this point with just enough of it running to ask the balloon driver
> > > to do things?
> > 
> > There's missing documentation here.
> > 
> > The way things actually work is host sends to guest
> > a request for unused pages and then write-protects all memory.
> 
> ... hopefully you mean "write protects all memory, then sends a request
> for unused pages", otherwise there's a race condition.

Exactly.

> And I see the utility of this, but does this functionality belong in
> the balloon driver?

We have historically put all kind of memory-related functionality in the
balloon device. Consider for example memory statistics - seems related
conceptually. See patches 1-2: the new mechanism for reporting lists of
pages seems to be benefitial for both which seems to indicate using the
balloon for this is a good idea.

> It seems like it's something you might want even if you don't have the
> balloon driver loaded.  Or something you might not want if you do have
> the balloon driver loaded.

Most of balloon functionality is kind of loosely coupled.  Yes we could
split it up but I'm not sure what would this buy us. What do you have
in mind?

-- 
MST

[Qemu-devel] [PATCH 6/6] migration: detailed traces for postcopy

2017-04-14 Thread Alexey Perevalov

It could help to track down vCPU state during page fault and
page fault sources.

This patch showes proc's status/stack/syscall file at the moment of pagefault,
it's very interesting to know who was page fault initiator.

Signed-off-by: Alexey Perevalov 
---
 migration/postcopy-ram.c | 98 +++-
 migration/trace-events   |  6 +++
 2 files changed, 103 insertions(+), 1 deletion(-)

diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 42330fd..513633c 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -412,7 +412,91 @@ static int ram_block_enable_notify(const char *block_name, 
void *host_addr,
 return 0;
 }
 
-static int get_mem_fault_cpu_index(uint32_t pid)
+#define PROC_LEN 1024
+#define DEBUG_FAULT_PROCESS_STATUS 1
+
+#ifdef DEBUG_FAULT_PROCESS_STATUS
+
+static FILE *get_proc_file(const gchar *frmt, pid_t thread_id)
+{
+FILE *f = NULL;
+gchar *file_path = g_strdup_printf(frmt, thread_id);
+if (file_path == NULL) {
+error_report("Couldn't allocate path for %u", thread_id);
+return NULL;
+}
+f = fopen(file_path, "r");
+if (!f) {
+error_report("can't open %s", file_path);
+}
+
+trace_get_proc_file(file_path);
+g_free(file_path);
+return f;
+}
+
+typedef void(*proc_line_handler)(const char *line);
+
+static void proc_line_cb(const char *line)
+{
+/* trace_ functions are inline */
+trace_proc_line_cb(line);
+}
+
+static void foreach_line_in_file(FILE *f, proc_line_handler cb)
+{
+char *line = NULL;
+ssize_t read;
+size_t len;
+
+while ((read = getline(, , f)) != -1) {
+/* workaround, trace_ infrastructure already insert \n
+ * and getline includes it */
+ssize_t str_len = strlen(line) - 1;
+if (str_len <= 0)
+continue;
+line[str_len] = '\0';
+cb(line);
+}
+free(line);
+}
+
+static void observe_thread_proc(const gchar *path_frmt, pid_t thread_id)
+{
+FILE *f = get_proc_file(path_frmt, thread_id);
+if (!f) {
+error_report("can't read thread's proc");
+return;
+}
+
+foreach_line_in_file(f, proc_line_cb);
+fclose(f);
+}
+
+/*
+ * for convinience tracing need to trace
+ * observe_thread_begin
+ * get_proc_file
+ * proc_line_cb
+ * observe_thread_end
+ */
+static void observe_thread(const char *msg, pid_t thread_id)
+{
+trace_observe_thread_begin(msg);
+observe_thread_proc("/proc/%d/status", thread_id);
+observe_thread_proc("/proc/%d/syscall", thread_id);
+observe_thread_proc("/proc/%d/stack", thread_id);
+trace_observe_thread_end(msg);
+}
+
+#else
+static void observe_thread(const char *msg, pid_t thread_id)
+{
+}
+
+#endif /* DEBUG_FAULT_PROCESS_STATUS */
+
+static int get_mem_fault_cpu_index(pid_t pid)
 {
 CPUState *cpu_iter;
 
@@ -421,9 +505,20 @@ static int get_mem_fault_cpu_index(uint32_t pid)
return cpu_iter->cpu_index;
 }
 trace_get_mem_fault_cpu_index(pid);
+observe_thread("not a vCPU", pid);
+
 return -1;
 }
 
+static void observe_vcpu_state(void)
+{
+CPUState *cpu_iter;
+CPU_FOREACH(cpu_iter) {
+observe_thread("vCPU", cpu_iter->thread_id);
+trace_vcpu_state(cpu_iter->running, cpu_iter->cpu_index);
+}
+}
+
 /*
  * Handle faults detected by the USERFAULT markings
  */
@@ -465,6 +560,7 @@ static void *postcopy_ram_fault_thread(void *opaque)
 }
 
 ret = read(mis->userfault_fd, , sizeof(msg));
+observe_vcpu_state();
 if (ret != sizeof(msg)) {
 if (errno == EAGAIN) {
 /*
diff --git a/migration/trace-events b/migration/trace-events
index ab2e1e4..3a74f91 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -202,6 +202,12 @@ save_xbzrle_page_overflow(void) ""
 ram_save_iterate_big_wait(uint64_t milliconds, int iterations) "big wait: %" 
PRIu64 " milliseconds, %d iterations"
 ram_load_complete(int ret, uint64_t seq_iter) "exit_code %d seq iteration %" 
PRIu64
 get_mem_fault_cpu_index(uint32_t pid) "pid %u is not vCPU"
+observe_thread_status(int ptid, char *name, char *status) "host_tid %d %s %s"
+vcpu_state(int cpu_index, int is_running) "cpu %d running %d"
+proc_line_cb(const char *str) "%s"
+get_proc_file(const char *str) "opened %s"
+observe_thread_begin(const char *str) "%s"
+observe_thread_end(const char *str) "%s"
 
 # migration/exec.c
 migration_exec_outgoing(const char *cmd) "cmd=%s"
-- 
1.8.3.1

[Qemu-devel] [PATCH 5/6] migration: send postcopy downtime back to source

2017-04-14 Thread Alexey Perevalov

Right now to initiate postcopy live migration need to
send request to source machine and specify destination.

User could request migration status by query-migrate qmp command on
source machine, but postcopy downtime is being evaluated on destination,
so it should be transmitted back to source. For this purpose return path
socket was shosen.

Signed-off-by: Alexey Perevalov 
---
 include/migration/migration.h |  4 +++-
 migration/migration.c | 20 ++--
 migration/postcopy-ram.c  |  1 +
 3 files changed, 22 insertions(+), 3 deletions(-)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index 5d2c628..5535aa6 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -55,7 +55,8 @@ enum mig_rp_message_type {
 
 MIG_RP_MSG_REQ_PAGES_ID, /* data (start: be64, len: be32, id: string) */
 MIG_RP_MSG_REQ_PAGES,/* data (start: be64, len: be32) */
-
+MIG_RP_MSG_DOWNTIME,/* downtime value from destination,
+   calculated and sent in case of post copy */
 MIG_RP_MSG_MAX
 };
 
@@ -364,6 +365,7 @@ void migrate_send_rp_pong(MigrationIncomingState *mis,
   uint32_t value);
 void migrate_send_rp_req_pages(MigrationIncomingState *mis, const char* rbname,
   ram_addr_t start, size_t len);
+void migrate_send_rp_downtime(MigrationIncomingState *mis, uint64_t downtime);
 
 void ram_control_before_iterate(QEMUFile *f, uint64_t flags);
 void ram_control_after_iterate(QEMUFile *f, uint64_t flags);
diff --git a/migration/migration.c b/migration/migration.c
index 5bac434..3134e24 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -553,6 +553,19 @@ void migrate_send_rp_message(MigrationIncomingState *mis,
 }
 
 /*
+ * Send postcopy migration downtime,
+ * at the moment of calling this function migration should
+ * be completed.
+ */
+void migrate_send_rp_downtime(MigrationIncomingState *mis, uint64_t downtime)
+{
+uint64_t buf;
+
+buf = cpu_to_be64(downtime);
+migrate_send_rp_message(mis, MIG_RP_MSG_DOWNTIME, sizeof(downtime), );
+}
+
+/*
  * Send a 'SHUT' message on the return channel with the given value
  * to indicate that we've finished with the RP.  Non-0 value indicates
  * error.
@@ -1483,6 +1496,7 @@ static struct rp_cmd_args {
 [MIG_RP_MSG_PONG]   = { .len =  4, .name = "PONG" },
 [MIG_RP_MSG_REQ_PAGES]  = { .len = 12, .name = "REQ_PAGES" },
 [MIG_RP_MSG_REQ_PAGES_ID]   = { .len = -1, .name = "REQ_PAGES_ID" },
+[MIG_RP_MSG_DOWNTIME]   = { .len =  8, .name = "DOWNTIME" },
 [MIG_RP_MSG_MAX]= { .len = -1, .name = "MAX" },
 };
 
@@ -1613,6 +1627,10 @@ static void *source_return_path_thread(void *opaque)
 migrate_handle_rp_req_pages(ms, (char *)[13], start, len);
 break;
 
+case MIG_RP_MSG_DOWNTIME:
+ms->downtime = ldq_be_p(buf);
+break;
+
 default:
 break;
 }
@@ -1677,7 +1695,6 @@ static int postcopy_start(MigrationState *ms, bool 
*old_vm_running)
 int ret;
 QIOChannelBuffer *bioc;
 QEMUFile *fb;
-int64_t time_at_stop = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
 bool restart_block = false;
 migrate_set_state(>state, MIGRATION_STATUS_ACTIVE,
   MIGRATION_STATUS_POSTCOPY_ACTIVE);
@@ -1779,7 +1796,6 @@ static int postcopy_start(MigrationState *ms, bool 
*old_vm_running)
  */
 ms->postcopy_after_devices = true;
 notifier_list_notify(_state_notifiers, ms);
-ms->downtime =  qemu_clock_get_ms(QEMU_CLOCK_REALTIME) - time_at_stop;
 
 qemu_mutex_unlock_iothread();
 
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index ea89f4e..42330fd 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -330,6 +330,7 @@ int postcopy_ram_incoming_cleanup(MigrationIncomingState 
*mis)
 }
 
 postcopy_state_set(POSTCOPY_INCOMING_END);
+migrate_send_rp_downtime(mis, get_postcopy_total_downtime());
 migrate_send_rp_shut(mis, qemu_file_get_error(mis->from_src_file) != 0);
 
 if (mis->postcopy_tmp_page) {
-- 
1.8.3.1

[Qemu-devel] [PATCH 2/6] util: introduce glib-helper.c

2017-04-14 Thread Alexey Perevalov

There is a lack of g_int_cmp which compares pointers value in glib,
xen_disk.c introduced its own, so the same function now requires
in migration.c. So logically to move it into common place.
Futher: maybe extend glib.

Also this commit moves existing glib-compat.h into util/glib
folder for consolidation purpose.

Signed-off-by: Alexey Perevalov 
---
 hw/block/xen_disk.c|  10 +-
 include/glib-compat.h  | 352 -
 include/glib/glib-compat.h | 352 +
 include/glib/glib-helper.h |  30 
 include/qemu/osdep.h   |   2 +-
 linux-user/main.c  |   2 +-
 scripts/clean-includes |   2 +-
 util/Makefile.objs |   1 +
 util/glib-helper.c |  29 
 9 files changed, 417 insertions(+), 363 deletions(-)
 delete mode 100644 include/glib-compat.h
 create mode 100644 include/glib/glib-compat.h
 create mode 100644 include/glib/glib-helper.h
 create mode 100644 util/glib-helper.c

diff --git a/hw/block/xen_disk.c b/hw/block/xen_disk.c
index 456a2d5..36f6396 100644
--- a/hw/block/xen_disk.c
+++ b/hw/block/xen_disk.c
@@ -20,6 +20,7 @@
  */
 
 #include "qemu/osdep.h"
+#include "glib/glib-helper.h"
 #include 
 #include 
 
@@ -154,13 +155,6 @@ static void ioreq_reset(struct ioreq *ioreq)
 qemu_iovec_reset(>v);
 }
 
-static gint int_cmp(gconstpointer a, gconstpointer b, gpointer user_data)
-{
-uint ua = GPOINTER_TO_UINT(a);
-uint ub = GPOINTER_TO_UINT(b);
-return (ua > ub) - (ua < ub);
-}
-
 static void destroy_grant(gpointer pgnt)
 {
 PersistentGrant *grant = pgnt;
@@ -1191,7 +1185,7 @@ static int blk_connect(struct XenDevice *xendev)
 if (blkdev->feature_persistent) {
 /* Init persistent grants */
 blkdev->max_grants = max_requests * BLKIF_MAX_SEGMENTS_PER_REQUEST;
-blkdev->persistent_gnts = g_tree_new_full((GCompareDataFunc)int_cmp,
+blkdev->persistent_gnts = g_tree_new_full((GCompareDataFunc)g_int_cmp,
  NULL, NULL,
  batch_maps ?
  (GDestroyNotify)g_free :
diff --git a/include/glib-compat.h b/include/glib-compat.h
deleted file mode 100644
index 863c8cf..000
--- a/include/glib-compat.h
+++ /dev/null
@@ -1,352 +0,0 @@
-/*
- * GLIB Compatibility Functions
- *
- * Copyright IBM, Corp. 2013
- *
- * Authors:
- *  Anthony Liguori   
- *  Michael Tokarev   
- *  Paolo Bonzini 
- *
- * This work is licensed under the terms of the GNU GPL, version 2 or later.
- * See the COPYING file in the top-level directory.
- *
- */
-
-#ifndef QEMU_GLIB_COMPAT_H
-#define QEMU_GLIB_COMPAT_H
-
-#include 
-
-/* GLIB version compatibility flags */
-#if !GLIB_CHECK_VERSION(2, 26, 0)
-#define G_TIME_SPAN_SECOND  (G_GINT64_CONSTANT(100))
-#endif
-
-#if !GLIB_CHECK_VERSION(2, 28, 0)
-static inline gint64 qemu_g_get_monotonic_time(void)
-{
-/* g_get_monotonic_time() is best-effort so we can use the wall clock as a
- * fallback.
- */
-
-GTimeVal time;
-g_get_current_time();
-
-return time.tv_sec * G_TIME_SPAN_SECOND + time.tv_usec;
-}
-/* work around distro backports of this interface */
-#define g_get_monotonic_time() qemu_g_get_monotonic_time()
-#endif
-
-#if defined(_WIN32) && !GLIB_CHECK_VERSION(2, 50, 0)
-/*
- * g_poll has a problem on Windows when using
- * timeouts < 10ms, so use wrapper.
- */
-#define g_poll(fds, nfds, timeout) g_poll_fixed(fds, nfds, timeout)
-gint g_poll_fixed(GPollFD *fds, guint nfds, gint timeout);
-#endif
-
-#if !GLIB_CHECK_VERSION(2, 30, 0)
-/* Not a 100% compatible implementation, but good enough for most
- * cases. Placeholders are only supported at the end of the
- * template. */
-static inline gchar *qemu_g_dir_make_tmp(gchar const *tmpl, GError **error)
-{
-gchar *path = g_build_filename(g_get_tmp_dir(), tmpl ?: ".XX", NULL);
-
-if (mkdtemp(path) != NULL) {
-return path;
-}
-/* Error occurred, clean up. */
-g_set_error(error, G_FILE_ERROR, g_file_error_from_errno(errno),
-"mkdtemp() failed");
-g_free(path);
-return NULL;
-}
-#define g_dir_make_tmp(tmpl, error) qemu_g_dir_make_tmp(tmpl, error)
-#endif /* glib 2.30 */
-
-#if !GLIB_CHECK_VERSION(2, 31, 0)
-/* before glib-2.31, GMutex and GCond was dynamic-only (there was a separate
- * GStaticMutex, but it didn't work with condition variables).
- *
- * Our implementation uses GOnce to fake a static implementation that does
- * not require separate initialization.
- * We need to rename the types to avoid passing our CompatGMutex/CompatGCond
- * by mistake to a function that expects GMutex/GCond.  However, for ease
- * of use we keep the GLib function names.  GLib uses macros for the
- * implementation, we use inline functions instead and undefine the macros.
- */
-
-typedef

[Qemu-devel] [PATCH 4/6] migration: calculate downtime on dst side

2017-04-14 Thread Alexey Perevalov

This patch provides downtime calculation per vCPU,
as a summary and as a overlapped value for all vCPUs.

This approach just keeps tree with page fault addr as a key,
and t1-t2 interval of pagefault time and page copy time, with
affected vCPU bit mask.
For more implementation details please see comment to
get_postcopy_total_downtime function.

Signed-off-by: Alexey Perevalov 
---
 include/migration/migration.h |  14 +++
 migration/migration.c | 280 +-
 migration/postcopy-ram.c  |  24 +++-
 migration/qemu-file.c |   1 -
 migration/trace-events|   9 +-
 5 files changed, 323 insertions(+), 5 deletions(-)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index 5720c88..5d2c628 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -123,10 +123,24 @@ struct MigrationIncomingState {
 
 /* See savevm.c */
 LoadStateEntry_Head loadvm_handlers;
+
+/*
+ *  Tree for keeping postcopy downtime,
+ *  necessary to calculate correct downtime, during multiple
+ *  vm suspends, it keeps host page address as a key and
+ *  DowntimeDuration as a data
+ *  NULL means kernel couldn't provide process thread id,
+ *  and QEMU couldn't identify which vCPU raise page fault
+ */
+GTree *postcopy_downtime;
 };
 
 MigrationIncomingState *migration_incoming_get_current(void);
 void migration_incoming_state_destroy(void);
+void mark_postcopy_downtime_begin(uint64_t addr, int cpu);
+void mark_postcopy_downtime_end(uint64_t addr);
+uint64_t get_postcopy_total_downtime(void);
+void destroy_downtime_duration(gpointer data);
 
 /*
  * An outstanding page request, on the source, having been received
diff --git a/migration/migration.c b/migration/migration.c
index 79f6425..5bac434 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -38,6 +38,8 @@
 #include "io/channel-tls.h"
 #include "migration/colo.h"
 
+#define DEBUG_VCPU_DOWNTIME 1
+
 #define MAX_THROTTLE  (32 << 20)  /* Migration transfer speed throttling */
 
 /* Amount of time to allocate to each "chunk" of bandwidth-throttled
@@ -77,6 +79,19 @@ static NotifierList migration_state_notifiers =
 
 static bool deferred_incoming;
 
+typedef struct {
+int64_t begin;
+int64_t end;
+uint64_t *cpus; /* cpus bit mask array, QEMU bit functions support
+ bit operation on memory regions, but doesn't check out of range */
+} DowntimeDuration;
+
+typedef struct {
+int64_t tp; /* point in time */
+bool is_end;
+uint64_t *cpus;
+} OverlapDowntime;
+
 /*
  * Current state of incoming postcopy; note this is not part of
  * MigrationIncomingState since it's state is used during cleanup
@@ -117,6 +132,13 @@ MigrationState *migrate_get_current(void)
 return _migration;
 }
 
+void destroy_downtime_duration(gpointer data)
+{
+DowntimeDuration *dd = (DowntimeDuration *)data;
+g_free(dd->cpus);
+g_free(data);
+}
+
 MigrationIncomingState *migration_incoming_get_current(void)
 {
 static bool once;
@@ -138,10 +160,13 @@ void migration_incoming_state_destroy(void)
 struct MigrationIncomingState *mis = migration_incoming_get_current();
 
 qemu_event_destroy(>main_thread_load_event);
+if (mis->postcopy_downtime) {
+g_tree_destroy(mis->postcopy_downtime);
+mis->postcopy_downtime = NULL;
+}
 loadvm_free_handlers(mis);
 }
 
-
 typedef struct {
 bool optional;
 uint32_t size;
@@ -1754,7 +1779,6 @@ static int postcopy_start(MigrationState *ms, bool 
*old_vm_running)
  */
 ms->postcopy_after_devices = true;
 notifier_list_notify(_state_notifiers, ms);
-
 ms->downtime =  qemu_clock_get_ms(QEMU_CLOCK_REALTIME) - time_at_stop;
 
 qemu_mutex_unlock_iothread();
@@ -2117,3 +2141,255 @@ PostcopyState postcopy_state_set(PostcopyState 
new_state)
 return atomic_xchg(_postcopy_state, new_state);
 }
 
+#define SIZE_TO_KEEP_CPUBITS (1 + smp_cpus/sizeof(guint64))
+
+void mark_postcopy_downtime_begin(uint64_t addr, int cpu)
+{
+MigrationIncomingState *mis = migration_incoming_get_current();
+DowntimeDuration *dd;
+if (!mis->postcopy_downtime) {
+return;
+}
+
+dd = g_tree_lookup(mis->postcopy_downtime, (gpointer)addr); /* !!! cast */
+if (!dd) {
+dd = (DowntimeDuration *)g_new0(DowntimeDuration, 1);
+dd->cpus = g_new0(guint64, SIZE_TO_KEEP_CPUBITS);
+g_tree_insert(mis->postcopy_downtime, (gpointer)addr, (gpointer)dd);
+}
+
+if (cpu < 0) {
+/* assume in this situation all vCPUs are sleeping */
+int i;
+for (i = 0; i < SIZE_TO_KEEP_CPUBITS; i++) {
+dd->cpus[i] = ~(uint64_t)0u;
+}
+} else
+set_bit(cpu, dd->cpus);
+
+/*
+ *  overwrite previously set dd->begin, if that page already was
+ * faulted on another cpu
+ */
+dd->begin = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
+

[Qemu-devel] [PATCH 1/6] userfault: add pid into uffd_msg & update UFFD_FEATURE_*

2017-04-14 Thread Alexey Perevalov

This commit duplicates header of "userfaultfd: provide pid in userfault msg"
into linux kernel.

Signed-off-by: Alexey Perevalov 
---
 linux-headers/linux/userfaultfd.h | 5 +
 1 file changed, 5 insertions(+)

diff --git a/linux-headers/linux/userfaultfd.h 
b/linux-headers/linux/userfaultfd.h
index 2ed5dc3..760f02f 100644
--- a/linux-headers/linux/userfaultfd.h
+++ b/linux-headers/linux/userfaultfd.h
@@ -77,6 +77,9 @@ struct uffd_msg {
struct {
__u64   flags;
__u64   address;
+  union {
+   __u32   ptid;
+  } feat;
} pagefault;
 
struct {
@@ -158,6 +161,8 @@ struct uffdio_api {
 #define UFFD_FEATURE_EVENT_MADVDONTNEED(1<<3)
 #define UFFD_FEATURE_MISSING_HUGETLBFS (1<<4)
 #define UFFD_FEATURE_MISSING_SHMEM (1<<5)
+#define UFFD_FEATURE_EVENT_UNMAP   (1<<6)
+#define UFFD_FEATURE_THREAD_ID (1<<7)
__u64 features;
 
__u64 ioctls;
-- 
1.8.3.1

[Qemu-devel] [PATCH 3/6] migration: add UFFD_FEATURE_THREAD_ID feature support

2017-04-14 Thread Alexey Perevalov

Userfaultfd mechanism is able to provide process thread id,
in case when client request it with UFDD_API ioctl.

Signed-off-by: Alexey Perevalov 
---
 include/migration/postcopy-ram.h |  2 +-
 migration/migration.c|  2 +-
 migration/postcopy-ram.c | 12 ++--
 migration/savevm.c   |  2 +-
 4 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/include/migration/postcopy-ram.h b/include/migration/postcopy-ram.h
index 8e036b9..809f6db 100644
--- a/include/migration/postcopy-ram.h
+++ b/include/migration/postcopy-ram.h
@@ -14,7 +14,7 @@
 #define QEMU_POSTCOPY_RAM_H
 
 /* Return true if the host supports everything we need to do postcopy-ram */
-bool postcopy_ram_supported_by_host(void);
+bool postcopy_ram_supported_by_host(MigrationIncomingState *mis);
 
 /*
  * Make all of RAM sensitive to accesses to areas that haven't yet been written
diff --git a/migration/migration.c b/migration/migration.c
index ad4036f..79f6425 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -802,7 +802,7 @@ void 
qmp_migrate_set_capabilities(MigrationCapabilityStatusList *params,
  * special support.
  */
 if (!old_postcopy_cap && runstate_check(RUN_STATE_INMIGRATE) &&
-!postcopy_ram_supported_by_host()) {
+!postcopy_ram_supported_by_host(NULL)) {
 /* postcopy_ram_supported_by_host will have emitted a more
  * detailed message
  */
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index dc80dbb..70f0480 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -60,13 +60,13 @@ struct PostcopyDiscardState {
 #include 
 #include 
 
-static bool ufd_version_check(int ufd)
+static bool ufd_version_check(int ufd, MigrationIncomingState *mis)
 {
 struct uffdio_api api_struct;
 uint64_t ioctl_mask;
 
 api_struct.api = UFFD_API;
-api_struct.features = 0;
+api_struct.features = UFFD_FEATURE_THREAD_ID;
 if (ioctl(ufd, UFFDIO_API, _struct)) {
 error_report("postcopy_ram_supported_by_host: UFFDIO_API failed: %s",
  strerror(errno));
@@ -113,7 +113,7 @@ static int test_range_shared(const char *block_name, void 
*host_addr,
  * normally fine since if the postcopy succeeds it gets turned back on at the
  * end.
  */
-bool postcopy_ram_supported_by_host(void)
+bool postcopy_ram_supported_by_host(MigrationIncomingState *mis)
 {
 long pagesize = getpagesize();
 int ufd = -1;
@@ -136,7 +136,7 @@ bool postcopy_ram_supported_by_host(void)
 }
 
 /* Version and features check */
-if (!ufd_version_check(ufd)) {
+if (!ufd_version_check(ufd, mis)) {
 goto out;
 }
 
@@ -515,7 +515,7 @@ int postcopy_ram_enable_notify(MigrationIncomingState *mis)
  * Although the host check already tested the API, we need to
  * do the check again as an ABI handshake on the new fd.
  */
-if (!ufd_version_check(mis->userfault_fd)) {
+if (!ufd_version_check(mis->userfault_fd, mis)) {
 return -1;
 }
 
@@ -653,7 +653,7 @@ void *postcopy_get_tmp_page(MigrationIncomingState *mis)
 
 #else
 /* No target OS support, stubs just fail */
-bool postcopy_ram_supported_by_host(void)
+bool postcopy_ram_supported_by_host(MigrationIncomingState *mis)
 {
 error_report("%s: No OS support", __func__);
 return false;
diff --git a/migration/savevm.c b/migration/savevm.c
index 3b19a4a..f01e418 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1360,7 +1360,7 @@ static int 
loadvm_postcopy_handle_advise(MigrationIncomingState *mis)
 return -1;
 }
 
-if (!postcopy_ram_supported_by_host()) {
+if (!postcopy_ram_supported_by_host(mis)) {
 postcopy_state_set(POSTCOPY_INCOMING_NONE);
 return -1;
 }
-- 
1.8.3.1

[Qemu-devel] [PATCH 0/6] calculate downtime for postcopy live migration

2017-04-14 Thread Alexey Perevalov

This patch set includes downtime calculation on destination,
sending it to source machine for statistics (query-migration).

Also additional traceses here for track down who was pagefault
initiator.

This patch set is based on master branch of git://git.qemu-project.org/qemu.git
base commit is commit 372b3fe0b2ecdd39ba850e31c0c6686315c507af.

It contains kernel side pages, just for convinience of applying current patch 
set,
for testing util kernel headers arn't synced.

Alexey Perevalov (6):
  userfault: add pid into uffd_msg & update UFFD_FEATURE_*
  util: introduce glib-helper.c
  migration: add UFFD_FEATURE_THREAD_ID feature support
  migration: calculate downtime on dst side
  migration: send postcopy downtime back to source
  migration: detailed traces for postcopy

 hw/block/xen_disk.c   |  10 +-
 include/glib-compat.h | 352 --
 include/glib/glib-compat.h| 352 ++
 include/glib/glib-helper.h|  30 
 include/migration/migration.h |  18 +-
 include/migration/postcopy-ram.h  |   2 +-
 include/qemu/osdep.h  |   2 +-
 linux-headers/linux/userfaultfd.h |   5 +
 linux-user/main.c |   2 +-
 migration/migration.c | 302 +++-
 migration/postcopy-ram.c  | 133 +-
 migration/qemu-file.c |   1 -
 migration/savevm.c|   2 +-
 migration/trace-events|  15 +-
 scripts/clean-includes|   2 +-
 util/Makefile.objs|   1 +
 util/glib-helper.c|  29 
 17 files changed, 878 insertions(+), 380 deletions(-)
 delete mode 100644 include/glib-compat.h
 create mode 100644 include/glib/glib-compat.h
 create mode 100644 include/glib/glib-helper.h
 create mode 100644 util/glib-helper.c

-- 
1.8.3.1

[Qemu-devel] [RFC v5 3/4] hw/intc/arm_gicv3_kvm: Implement pending table save

2017-04-14 Thread Eric Auger

This patch adds the flush of the LPI pending bits into the
redistributor pending tables. This happens on VM stop.

There is no explicit restore as the tables are implicitly sync'ed
on ITS table restore and on LPI enable at redistributor level.

Signed-off-by: Eric Auger 

---

v5: new
---
 hw/intc/arm_gicv3_kvm.c | 25 +
 1 file changed, 25 insertions(+)

diff --git a/hw/intc/arm_gicv3_kvm.c b/hw/intc/arm_gicv3_kvm.c
index 19aab56..9898c49 100644
--- a/hw/intc/arm_gicv3_kvm.c
+++ b/hw/intc/arm_gicv3_kvm.c
@@ -25,6 +25,7 @@
 #include "hw/sysbus.h"
 #include "qemu/error-report.h"
 #include "sysemu/kvm.h"
+#include "sysemu/sysemu.h"
 #include "kvm_arm.h"
 #include "gicv3_internal.h"
 #include "vgic_common.h"
@@ -680,6 +681,26 @@ static const ARMCPRegInfo gicv3_cpuif_reginfo[] = {
 REGINFO_SENTINEL
 };
 
+/**
+ * vm_change_state_handler - VM change state callback aiming at flushing
+ * RDIST pending tables into guest RAM
+ *
+ * The tables get flushed to guest RAM whenever the VM gets stopped.
+ */
+static void vm_change_state_handler(void *opaque, int running,
+RunState state)
+{
+GICv3State *s = (GICv3State *)opaque;
+
+if (running) {
+return;
+}
+
+kvm_device_access(s->dev_fd, KVM_DEV_ARM_VGIC_GRP_CTRL,
+  KVM_DEV_ARM_VGIC_SAVE_PENDING_TABLES, NULL, true);
+}
+
+
 static void kvm_arm_gicv3_realize(DeviceState *dev, Error **errp)
 {
 GICv3State *s = KVM_ARM_GICV3(dev);
@@ -751,6 +772,10 @@ static void kvm_arm_gicv3_realize(DeviceState *dev, Error 
**errp)
 return;
 }
 }
+if (kvm_device_check_attr(s->dev_fd, KVM_DEV_ARM_VGIC_GRP_CTRL,
+  KVM_DEV_ARM_VGIC_SAVE_PENDING_TABLES)) {
+qemu_add_vm_change_state_handler(vm_change_state_handler, s);
+}
 }
 
 static void kvm_arm_gicv3_class_init(ObjectClass *klass, void *data)
-- 
2.5.5

[Qemu-devel] [RFC v5 4/4] hw/intc/arm_gicv3_its: Allow save/restore

2017-04-14 Thread Eric Auger

We change the restoration priority of both the GICv3 and ITS. The
GICv3 must be restored before the ITS and the ITS needs to be restored
before PCIe devices since it translates their MSI transactions.

Signed-off-by: Eric Auger 
Reviewed-by: Juan Quintela 

---
v2 -> v3:
- reword migration blocker message
- remove unmigratable setting to false

v1 -> v2:
- handle case where migrate_add_blocker fails
- add comments along with ITS and GICv3 migration priorities
---
 hw/intc/arm_gicv3_common.c |  1 +
 hw/intc/arm_gicv3_its_common.c |  2 +-
 hw/intc/arm_gicv3_its_kvm.c| 24 
 include/migration/vmstate.h|  2 ++
 4 files changed, 16 insertions(+), 13 deletions(-)

diff --git a/hw/intc/arm_gicv3_common.c b/hw/intc/arm_gicv3_common.c
index c6493d6..4228b7c 100644
--- a/hw/intc/arm_gicv3_common.c
+++ b/hw/intc/arm_gicv3_common.c
@@ -145,6 +145,7 @@ static const VMStateDescription vmstate_gicv3 = {
 .minimum_version_id = 1,
 .pre_save = gicv3_pre_save,
 .post_load = gicv3_post_load,
+.priority = MIG_PRI_GICV3,
 .fields = (VMStateField[]) {
 VMSTATE_UINT32(gicd_ctlr, GICv3State),
 VMSTATE_UINT32_ARRAY(gicd_statusr, GICv3State, 2),
diff --git a/hw/intc/arm_gicv3_its_common.c b/hw/intc/arm_gicv3_its_common.c
index efab8c7..22ce4c4 100644
--- a/hw/intc/arm_gicv3_its_common.c
+++ b/hw/intc/arm_gicv3_its_common.c
@@ -48,7 +48,7 @@ static const VMStateDescription vmstate_its = {
 .name = "arm_gicv3_its",
 .pre_save = gicv3_its_pre_save,
 .post_load = gicv3_its_post_load,
-.unmigratable = true,
+.priority = MIG_PRI_GICV3_ITS,
 .fields = (VMStateField[]) {
 VMSTATE_UINT32(ctlr, GICv3ITSState),
 VMSTATE_UINT32(iidr, GICv3ITSState),
diff --git a/hw/intc/arm_gicv3_its_kvm.c b/hw/intc/arm_gicv3_its_kvm.c
index 7c5502c..8401d2f 100644
--- a/hw/intc/arm_gicv3_its_kvm.c
+++ b/hw/intc/arm_gicv3_its_kvm.c
@@ -77,18 +77,6 @@ static void kvm_arm_its_realize(DeviceState *dev, Error 
**errp)
 GICv3ITSState *s = ARM_GICV3_ITS_COMMON(dev);
 Error *local_err = NULL;
 
-/*
- * Block migration of a KVM GICv3 ITS device: the API for saving and
- * restoring the state in the kernel is not yet available
- */
-error_setg(>migration_blocker, "vITS migration is not implemented");
-migrate_add_blocker(s->migration_blocker, _err);
-if (local_err) {
-error_propagate(errp, local_err);
-error_free(s->migration_blocker);
-return;
-}
-
 s->dev_fd = kvm_create_device(kvm_state, KVM_DEV_TYPE_ARM_VGIC_ITS, false);
 if (s->dev_fd < 0) {
 error_setg_errno(errp, -s->dev_fd, "error creating in-kernel ITS");
@@ -105,6 +93,18 @@ static void kvm_arm_its_realize(DeviceState *dev, Error 
**errp)
 
 gicv3_its_init_mmio(s, NULL);
 
+if (!kvm_device_check_attr(s->dev_fd, KVM_DEV_ARM_VGIC_GRP_ITS_REGS,
+GITS_CTLR)) {
+error_setg(>migration_blocker, "This operating system kernel "
+   "does not support vGICv3 migration");
+migrate_add_blocker(s->migration_blocker, _err);
+if (local_err) {
+error_propagate(errp, local_err);
+error_free(s->migration_blocker);
+return;
+}
+}
+
 kvm_msi_use_devid = true;
 kvm_gsi_direct_mapping = false;
 kvm_msi_via_irqfd_allowed = kvm_irqfds_enabled();
diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
index f2dbf84..8dab9c7 100644
--- a/include/migration/vmstate.h
+++ b/include/migration/vmstate.h
@@ -198,6 +198,8 @@ enum VMStateFlags {
 typedef enum {
 MIG_PRI_DEFAULT = 0,
 MIG_PRI_IOMMU,  /* Must happen before PCI devices */
+MIG_PRI_GICV3_ITS,  /* Must happen before PCI devices */
+MIG_PRI_GICV3,  /* Must happen before the ITS */
 MIG_PRI_MAX,
 } MigrationPriority;
 
-- 
2.5.5

[Qemu-devel] [RFC v5 2/4] hw/intc/arm_gicv3_its: Implement state save/restore

2017-04-14 Thread Eric Auger

We need to handle both registers and ITS tables. While
register handling is standard, ITS table handling is more
challenging since the kernel API is devised so that the
tables are flushed into guest RAM and not in vmstate buffers.

Flushing the ITS tables on device pre_save() is too late
since the guest RAM is already saved at this point.

Table flushing needs to happen when we are sure the vcpus
are stopped and before the last dirty page saving. The
right point is RUN_STATE_FINISH_MIGRATE but sometimes the
VM gets stopped before migration launch so let's simply
flush the tables each time the VM gets stopped.

For regular ITS registers we just can use vmstate pre_save()
and post_load() callbacks.

Signed-off-by: Eric Auger 

---
v4 -> v5:
- Use the new user API, ie. attributes in
  KVM_DEV_ARM_VGIC_GRP_CTRL group

v3 -> v4:
- remove useless reg declaration in pre_save()

v2 -> v3:
- take into account Peter's comments:
  - add iidr save/restore
  - reword comments
  - remove s->ctlr = extract64(reg, 0, 32);
  - rename get()/put() function into pre_save()/post_load()
- do not execute put() if iidr == 0

Signed-off-by: Eric Auger 
---
 hw/intc/arm_gicv3_its_common.c |  9 
 hw/intc/arm_gicv3_its_kvm.c| 96 ++
 include/hw/intc/arm_gicv3_its_common.h |  8 +++
 3 files changed, 113 insertions(+)

diff --git a/hw/intc/arm_gicv3_its_common.c b/hw/intc/arm_gicv3_its_common.c
index 9d67c5c..efab8c7 100644
--- a/hw/intc/arm_gicv3_its_common.c
+++ b/hw/intc/arm_gicv3_its_common.c
@@ -49,6 +49,15 @@ static const VMStateDescription vmstate_its = {
 .pre_save = gicv3_its_pre_save,
 .post_load = gicv3_its_post_load,
 .unmigratable = true,
+.fields = (VMStateField[]) {
+VMSTATE_UINT32(ctlr, GICv3ITSState),
+VMSTATE_UINT32(iidr, GICv3ITSState),
+VMSTATE_UINT64(cbaser, GICv3ITSState),
+VMSTATE_UINT64(cwriter, GICv3ITSState),
+VMSTATE_UINT64(creadr, GICv3ITSState),
+VMSTATE_UINT64_ARRAY(baser, GICv3ITSState, 8),
+VMSTATE_END_OF_LIST()
+},
 };
 
 static MemTxResult gicv3_its_trans_read(void *opaque, hwaddr offset,
diff --git a/hw/intc/arm_gicv3_its_kvm.c b/hw/intc/arm_gicv3_its_kvm.c
index bd4f3aa..7c5502c 100644
--- a/hw/intc/arm_gicv3_its_kvm.c
+++ b/hw/intc/arm_gicv3_its_kvm.c
@@ -53,6 +53,25 @@ static int kvm_its_send_msi(GICv3ITSState *s, uint32_t 
value, uint16_t devid)
 return kvm_vm_ioctl(kvm_state, KVM_SIGNAL_MSI, );
 }
 
+/**
+ * vm_change_state_handler - VM change state callback aiming at flushing
+ * ITS tables into guest RAM
+ *
+ * The tables get flushed to guest RAM whenever the VM gets stopped.
+ */
+static void vm_change_state_handler(void *opaque, int running,
+RunState state)
+{
+GICv3ITSState *s = (GICv3ITSState *)opaque;
+
+if (running) {
+return;
+}
+
+kvm_device_access(s->dev_fd, KVM_DEV_ARM_VGIC_GRP_CTRL,
+  KVM_DEV_ARM_ITS_SAVE_TABLES, NULL, true);
+}
+
 static void kvm_arm_its_realize(DeviceState *dev, Error **errp)
 {
 GICv3ITSState *s = ARM_GICV3_ITS_COMMON(dev);
@@ -89,6 +108,8 @@ static void kvm_arm_its_realize(DeviceState *dev, Error 
**errp)
 kvm_msi_use_devid = true;
 kvm_gsi_direct_mapping = false;
 kvm_msi_via_irqfd_allowed = kvm_irqfds_enabled();
+
+qemu_add_vm_change_state_handler(vm_change_state_handler, s);
 }
 
 static void kvm_arm_its_init(Object *obj)
@@ -102,6 +123,79 @@ static void kvm_arm_its_init(Object *obj)
  _abort);
 }
 
+/**
+ * kvm_arm_its_pre_save - handles the saving of ITS registers.
+ * ITS tables are flushed into guest RAM separately and earlier,
+ * through the VM change state handler, since at the moment pre_save()
+ * is called, the guest RAM has already been saved.
+ */
+static void kvm_arm_its_pre_save(GICv3ITSState *s)
+{
+int i;
+
+for (i = 0; i < 8; i++) {
+kvm_device_access(s->dev_fd, KVM_DEV_ARM_VGIC_GRP_ITS_REGS,
+  GITS_BASER + i * 8, >baser[i], false);
+}
+
+kvm_device_access(s->dev_fd, KVM_DEV_ARM_VGIC_GRP_ITS_REGS,
+  GITS_CTLR, >ctlr, false);
+
+kvm_device_access(s->dev_fd, KVM_DEV_ARM_VGIC_GRP_ITS_REGS,
+  GITS_CBASER, >cbaser, false);
+
+kvm_device_access(s->dev_fd, KVM_DEV_ARM_VGIC_GRP_ITS_REGS,
+  GITS_CREADR, >creadr, false);
+
+kvm_device_access(s->dev_fd, KVM_DEV_ARM_VGIC_GRP_ITS_REGS,
+  GITS_CWRITER, >cwriter, false);
+
+kvm_device_access(s->dev_fd, KVM_DEV_ARM_VGIC_GRP_ITS_REGS,
+  GITS_IIDR, >iidr, false);
+}
+
+/**
+ * kvm_arm_its_post_load - Restore both the ITS registers and tables
+ */
+static void kvm_arm_its_post_load(GICv3ITSState *s)
+{
+uint64_t reg;
+int i;
+
+if (!s->iidr) {
+return;
+}
+
+kvm_device_access(s->dev_fd,

[Qemu-devel] [RFC v5 1/4] linux-headers: Update for vITS save/restore

2017-04-14 Thread Eric Auger

This is a linux header update against 4.11-rc6 plus the non
upstreamed ITS migration series.

https://github.com/eauger/linux/tree/v4.11-rc6-its-mig-v5

It aims at enhancing the KVM user API with vITS save/restore
capability. Those are new attributes supported in the
ITS and GICv3 KVM device KVM_DEV_ARM_VGIC_GRP_CTRL groups.

Signed-off-by: Eric Auger 
---
 linux-headers/asm-arm/kvm.h   | 6 +-
 linux-headers/asm-arm64/kvm.h | 6 +-
 2 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/linux-headers/asm-arm/kvm.h b/linux-headers/asm-arm/kvm.h
index 1101d55..ca7b5bf 100644
--- a/linux-headers/asm-arm/kvm.h
+++ b/linux-headers/asm-arm/kvm.h
@@ -192,13 +192,17 @@ struct kvm_arch_memory_slot {
 #define KVM_DEV_ARM_VGIC_GRP_REDIST_REGS 5
 #define KVM_DEV_ARM_VGIC_GRP_CPU_SYSREGS 6
 #define KVM_DEV_ARM_VGIC_GRP_LEVEL_INFO  7
+#define KVM_DEV_ARM_VGIC_GRP_ITS_REGS  8
 #define KVM_DEV_ARM_VGIC_LINE_LEVEL_INFO_SHIFT 10
 #define KVM_DEV_ARM_VGIC_LINE_LEVEL_INFO_MASK \
(0x3fULL << KVM_DEV_ARM_VGIC_LINE_LEVEL_INFO_SHIFT)
 #define KVM_DEV_ARM_VGIC_LINE_LEVEL_INTID_MASK 0x3ff
 #define VGIC_LEVEL_INFO_LINE_LEVEL 0
 
-#define   KVM_DEV_ARM_VGIC_CTRL_INIT0
+#define   KVM_DEV_ARM_VGIC_CTRL_INIT   0
+#define   KVM_DEV_ARM_ITS_SAVE_TABLES  1
+#define   KVM_DEV_ARM_ITS_RESTORE_TABLES   2
+#define   KVM_DEV_ARM_VGIC_SAVE_PENDING_TABLES 3
 
 /* KVM_IRQ_LINE irq field index values */
 #define KVM_ARM_IRQ_TYPE_SHIFT 24
diff --git a/linux-headers/asm-arm64/kvm.h b/linux-headers/asm-arm64/kvm.h
index 651ec30..d1a9046 100644
--- a/linux-headers/asm-arm64/kvm.h
+++ b/linux-headers/asm-arm64/kvm.h
@@ -212,13 +212,17 @@ struct kvm_arch_memory_slot {
 #define KVM_DEV_ARM_VGIC_GRP_REDIST_REGS 5
 #define KVM_DEV_ARM_VGIC_GRP_CPU_SYSREGS 6
 #define KVM_DEV_ARM_VGIC_GRP_LEVEL_INFO  7
+#define KVM_DEV_ARM_VGIC_GRP_ITS_REGS 8
 #define KVM_DEV_ARM_VGIC_LINE_LEVEL_INFO_SHIFT 10
 #define KVM_DEV_ARM_VGIC_LINE_LEVEL_INFO_MASK \
(0x3fULL << KVM_DEV_ARM_VGIC_LINE_LEVEL_INFO_SHIFT)
 #define KVM_DEV_ARM_VGIC_LINE_LEVEL_INTID_MASK 0x3ff
 #define VGIC_LEVEL_INFO_LINE_LEVEL 0
 
-#define   KVM_DEV_ARM_VGIC_CTRL_INIT   0
+#define   KVM_DEV_ARM_VGIC_CTRL_INIT   0
+#define   KVM_DEV_ARM_ITS_SAVE_TABLES   1
+#define   KVM_DEV_ARM_ITS_RESTORE_TABLES2
+#define   KVM_DEV_ARM_VGIC_SAVE_PENDING_TABLES 3
 
 /* Device Control API on vcpu fd */
 #define KVM_ARM_VCPU_PMU_V3_CTRL   0
-- 
2.5.5

[Qemu-devel] [RFC v5 0/4] vITS save/restore

2017-04-14 Thread Eric Auger

This series allows ITS save/restore and migration use cases.
It relies on not upstreamed kernel series [1].

ITS tables are flushed into guest RAM on VM stop while registers
are save on pre_save() callback. Tables and registers are restored
on ITS post_load().

Redistributor pending tables also are flushed on VM stop, independently
on ITS tables.

That work was tested on Cavium ThunderX using virsh save/restore and
virt-manager live migration.

Best Regards

Eric

Host Kernel dependencies:
- [1] [PATCH v5 00/22] vITS save/restore

History:
v4 -> v5:
- adapt to the new user API
- new patch "hw/intc/arm_gicv3_kvm: Implement pending table save"
  as the pending table save now is handled on GICV3 side.

v3 -> v4:
- oversight in v3, missed a last minute correction related to
  reg useless declaration in kvm_arm_its_pre_save

v2 -> v3:
- GITS_IIDR is now saved and restored to check ABI revision.
- get/put functions renamed into pre_save/post_load
- unmigratable = false removed
- changed the migration blocker message
- remove the extract64 round s->ctlr
- reword some comments

v1 -> v2:
- rebase on 2.9 soft release code
- handle case where migrate_add_blocker fails
- add comments along with ITS and GICv3 migration priorities


Eric Auger (4):
  linux-headers: Update for vITS save/restore
  hw/intc/arm_gicv3_its: Implement state save/restore
  hw/intc/arm_gicv3_kvm: Implement pending table save
  hw/intc/arm_gicv3_its: Allow save/restore

 hw/intc/arm_gicv3_common.c |   1 +
 hw/intc/arm_gicv3_its_common.c |  11 ++-
 hw/intc/arm_gicv3_its_kvm.c| 120 +
 hw/intc/arm_gicv3_kvm.c|  25 +++
 include/hw/intc/arm_gicv3_its_common.h |   8 +++
 include/migration/vmstate.h|   2 +
 linux-headers/asm-arm/kvm.h|   6 +-
 linux-headers/asm-arm64/kvm.h  |   6 +-
 8 files changed, 164 insertions(+), 15 deletions(-)

-- 
2.5.5

Re: [Qemu-devel] [Bug 1030666] Re: gdb can't proceed after a breakpoint

2017-04-14 Thread Legorol

Hi,

Thank you for your email, I remember this issue. Unfortunately I don’t
have the time to try this right now, but I may be able to get to it in
the next couple of weeks.

Regards,
Legorol

From: Thomas Huth
Sent: 07 April 2017 14:59
To: legorol@gmail.com
Subject: [Bug 1030666] Re: gdb can't proceed after a breakpoint

Triaging old bug tickets ... can you still reproduce this issue with the
latest version of QEMU (version 2.9)?

** Changed in: qemu
   Status: New => Incomplete

-- 
You received this bug notification because you are subscribed to the bug
report.
https://bugs.launchpad.net/bugs/1030666

Title:
  gdb can't proceed after a breakpoint

Status in QEMU:
  Incomplete

Bug description:
  Using qemu-1.1.0-windows.zip package from http://lassauge.free.fr/qemu/
  Host: Windows 7 Ultimate 64-bit
  Guest: i386 system running MS-DOS 6.22
  Launch command line:
    qemu-system-i386.exe -L Bios -fda "DOS.vfd" -fdb "Data.vfd" -gdb 
tcp:127.0.0.1:1234
  Debbugers tried:
    gdb 7.3.50.20111026-cvs running on Cygwin 1.7.16
    gdb 7.4 on MinGW

  Short description:
  I use gdb to attach to a running Qemu session, set a breakpoint and resume 
execution. When the breakpoint is hit, gdb gains control as expected. However, 
trying to single-step or continue at this point just causes the same breakpoint 
to be hit immediately again. Deleting the breakpoint allows single-stepping or 
continue to work.

  Steps to reproduce:
  DOS.vfd is a floppy image containing an MS-DOS 6.22 startup disk.
  Data.vfd is a floppy image containing a single program (hello.com).
  The aim is to debug the execution of hello.com with gdb.
  Launch Qemu.
  Launch gdb, an attach to qemu:
    "target remote localhost:1234"
  I know the address at which hello.com will be loaded, so set a breakpoint 
there and resume execution:
    "break *0xf730"
    "continue"
  In Qemu, start hello.com. The breakpoint is immediately hit, execution stops 
and gdb gains control.
  Examining the program gives expected results (such as "info reg" or 
"disassemble").
  At this point, try to proceed either with single-stepping ("si" or "ni") or 
with "continue", and the same breakpoint is immediately hit again. Subsequent 
attempts to single-step or continue just keep hitting the same breakpoint.
  The only way to proceed at this point is to delete the breakpoint, after 
which both single-stepping and continue work.

  Note that single-stepping and continue works as expected if it is done
  after interrupting execution with Ctrl-C.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1030666/+subscriptions

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1030666

Title:
  gdb can't proceed after a breakpoint

Status in QEMU:
  Incomplete

Bug description:
  Using qemu-1.1.0-windows.zip package from http://lassauge.free.fr/qemu/
  Host: Windows 7 Ultimate 64-bit
  Guest: i386 system running MS-DOS 6.22
  Launch command line:
    qemu-system-i386.exe -L Bios -fda "DOS.vfd" -fdb "Data.vfd" -gdb 
tcp:127.0.0.1:1234
  Debbugers tried:
    gdb 7.3.50.20111026-cvs running on Cygwin 1.7.16
    gdb 7.4 on MinGW

  Short description:
  I use gdb to attach to a running Qemu session, set a breakpoint and resume 
execution. When the breakpoint is hit, gdb gains control as expected. However, 
trying to single-step or continue at this point just causes the same breakpoint 
to be hit immediately again. Deleting the breakpoint allows single-stepping or 
continue to work.

  Steps to reproduce:
  DOS.vfd is a floppy image containing an MS-DOS 6.22 startup disk.
  Data.vfd is a floppy image containing a single program (hello.com).
  The aim is to debug the execution of hello.com with gdb.
  Launch Qemu.
  Launch gdb, an attach to qemu:
    "target remote localhost:1234"
  I know the address at which hello.com will be loaded, so set a breakpoint 
there and resume execution:
    "break *0xf730"
    "continue"
  In Qemu, start hello.com. The breakpoint is immediately hit, execution stops 
and gdb gains control.
  Examining the program gives expected results (such as "info reg" or 
"disassemble").
  At this point, try to proceed either with single-stepping ("si" or "ni") or 
with "continue", and the same breakpoint is immediately hit again. Subsequent 
attempts to single-step or continue just keep hitting the same breakpoint.
  The only way to proceed at this point is to delete the breakpoint, after 
which both single-stepping and continue work.

  Note that single-stepping and continue works as expected if it is done
  after interrupting execution with Ctrl-C.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1030666/+subscriptions

Re: [Qemu-devel] [Qemu-block] [RFC] Proposed qcow2 extension: subcluster allocation

2017-04-14 Thread Roman Kagan

On Thu, Apr 13, 2017 at 09:06:19PM -0400, John Snow wrote:
> So if we have a 1MB cluster with 64k subclusters as a hypothetical, if
> we write just the first subcluster, we'll have a map like:
> 
> X---
> 
> Whatever actually happens to exist in this space, whether it be a hole
> we punched via fallocate or literal zeroes, this space is known to the
> filesystem to be contiguous.
> 
> If we write to the last subcluster, we'll get:
> 
> X--X
> 
> And again, maybe the dashes are a fallocate hole, maybe they're zeroes.
> but the last subcluster is located virtually exactly 15 subclusters
> behind the first, they're not physically contiguous. We've saved the
> space between them. Future out-of-order writes won't contribute to any
> fragmentation, at least at this level.

Yeah I think this is where the confusion lies.  You apparently assume
that the filesystem is smart enough to compensate for the subclusters
being sparse within a cluster, and will make them eventually contiguous
on the *media* once they are all written.  Denis is claiming the
opposite.  I posted a simple experiment with a 64kB sparse file written
out of order which ended up being 16 disparate blocks on the platters
(ext4; with xfs this may be different), and this is obviously
detrimental for performance with rotating disks.

Note also that if the filesystem actually is smart to maintain the
subclusters contiguos even if written out of order, apparently by not
allowing blocks from other files to take the yet unused space between
sparse subclusters, the disk space saving becomes not so obvious.

> You might be able to reduce COW from 5 IOPs to 3 IOPs, but if we tune
> the subclusters right, we'll have *zero*, won't we?

Right, this is an attractive advantage.  Need to test if the later
access to such interleaved clusters is not degraded, though.

Roman.

Re: [Qemu-devel] [PATCH] char: Fix removing wrong GSource that be found by fd_in_tag

2017-04-14 Thread Hailiang Zhang


Hi,

On 2017/4/14 18:10, Marc-André Lureau wrote:

Hi

- Original Message -

We use fd_in_tag to find a GSource, fd_in_tag is return value of
g_source_attach(GSource *source, GMainContext *context), the return
value is unique only in the same context, so we may get the same
values with different 'context' parameters.

It is no problem to find the right fd_in_tag by using
  g_main_context_find_source_by_id(GMainContext *context, guint source_id)
while there is only one default main context.

But colo-compare tries to create/use its own context, and if we pass wrong
'context' parameter with right fd_in_tag, we will find a wrong GSource
to handle. We tied to fix the related codes in commit b43dec, but it didn't

tied->tried

Please use a bit longer commit sha1, or full sha1, it will likely conflict 
otherwise in the future.


OK, I'll replace it with the full sha1.

fix the bug completely, because we still have some codes didn't pass *right*
context parameter for remove_fd_in_watch().

Mixing contexts sounds like a colo-compare bug, did you fix it there too?


Yes, we don't have to change any other codes in colo-compare,
We just call qemu_chr_fe_set_handlers() in colo-compare to detach the old 
GSource
from main default context, and attach the new GSource to the new context.


Let's fix it by record the GSource directly instead of fd_in_tag.

Make sense to me, and even simplify a bit the code. We should be more careful 
with pointers though (the reason why tag existed in the first place I imagine).


Agreed.


Signed-off-by: zhanghailiang 
---
  chardev/char-fd.c |  8 
  chardev/char-io.c | 23 ---
  chardev/char-io.h |  4 ++--
  chardev/char-pty.c|  6 +++---
  chardev/char-socket.c |  8 
  chardev/char-udp.c|  8 
  chardev/char.c|  2 +-
  include/sysemu/char.h |  2 +-
  8 files changed, 27 insertions(+), 34 deletions(-)

diff --git a/chardev/char-fd.c b/chardev/char-fd.c
index 548dd4c..7f0169d 100644
--- a/chardev/char-fd.c
+++ b/chardev/char-fd.c
@@ -58,7 +58,7 @@ static gboolean fd_chr_read(QIOChannel *chan, GIOCondition
cond, void *opaque)
  ret = qio_channel_read(
  chan, (gchar *)buf, len, NULL);
  if (ret == 0) {
-remove_fd_in_watch(chr, NULL);
+remove_fd_in_watch(chr);
  qemu_chr_be_event(chr, CHR_EVENT_CLOSED);
  return FALSE;
  }
@@ -89,9 +89,9 @@ static void fd_chr_update_read_handler(Chardev *chr,
  {
  FDChardev *s = FD_CHARDEV(chr);
  
-remove_fd_in_watch(chr, NULL);

+remove_fd_in_watch(chr);
  if (s->ioc_in) {
-chr->fd_in_tag = io_add_watch_poll(chr, s->ioc_in,
+chr->chr_gsource = io_add_watch_poll(chr, s->ioc_in,
 fd_chr_read_poll,
 fd_chr_read, chr,
 context);
@@ -103,7 +103,7 @@ static void char_fd_finalize(Object *obj)
  Chardev *chr = CHARDEV(obj);
  FDChardev *s = FD_CHARDEV(obj);
  
-remove_fd_in_watch(chr, NULL);

+remove_fd_in_watch(chr);
  if (s->ioc_in) {
  object_unref(OBJECT(s->ioc_in));
  }
diff --git a/chardev/char-io.c b/chardev/char-io.c
index b4bb094..6deb193 100644
--- a/chardev/char-io.c
+++ b/chardev/char-io.c
@@ -98,7 +98,7 @@ static GSourceFuncs io_watch_poll_funcs = {
  .finalize = io_watch_poll_finalize,
  };
  
-guint io_add_watch_poll(Chardev *chr,

+GSource *io_add_watch_poll(Chardev *chr,
  QIOChannel *ioc,
  IOCanReadHandler *fd_can_read,
  QIOChannelFunc fd_read,
@@ -106,7 +106,6 @@ guint io_add_watch_poll(Chardev *chr,
  GMainContext *context)
  {
  IOWatchPoll *iwp;
-int tag;
  char *name;
  
  iwp = (IOWatchPoll *) g_source_new(_watch_poll_funcs,

@@ -122,21 +121,15 @@ guint io_add_watch_poll(Chardev *chr,
  g_source_set_name((GSource *)iwp, name);
  g_free(name);
  
-tag = g_source_attach(>parent, context);

+g_source_attach(>parent, context);
  g_source_unref(>parent);
-return tag;
+return (GSource *)iwp;
  }
  
-static void io_remove_watch_poll(guint tag, GMainContext *context)

+static void io_remove_watch_poll(GSource *source)
  {
-GSource *source;
  IOWatchPoll *iwp;
  
-g_return_if_fail(tag > 0);

-
-source = g_main_context_find_source_by_id(context, tag);
-g_return_if_fail(source != NULL);
-
  iwp = io_watch_poll_from_source(source);
  if (iwp->src) {
  g_source_destroy(iwp->src);
@@ -146,11 +139,11 @@ static void io_remove_watch_poll(guint tag,
GMainContext *context)
  g_source_destroy(>parent);
  }
  
-void remove_fd_in_watch(Chardev *chr, GMainContext *context)

+void remove_fd_in_watch(Chardev *chr)
  {
-if (chr->fd_in_tag) {
-io_remove_watch_poll(chr->fd_in_tag, context);
-chr->fd_in_tag = 0;

Re: [Qemu-devel] [PATCH] char: Fix removing wrong GSource that be found by fd_in_tag

2017-04-14 Thread Marc-André Lureau

Hi

- Original Message -
> We use fd_in_tag to find a GSource, fd_in_tag is return value of
> g_source_attach(GSource *source, GMainContext *context), the return
> value is unique only in the same context, so we may get the same
> values with different 'context' parameters.
> 
> It is no problem to find the right fd_in_tag by using
>  g_main_context_find_source_by_id(GMainContext *context, guint source_id)
> while there is only one default main context.
> 
> But colo-compare tries to create/use its own context, and if we pass wrong
> 'context' parameter with right fd_in_tag, we will find a wrong GSource
> to handle. We tied to fix the related codes in commit b43dec, but it didn't

tied->tried

Please use a bit longer commit sha1, or full sha1, it will likely conflict 
otherwise in the future.

> fix the bug completely, because we still have some codes didn't pass *right*
> context parameter for remove_fd_in_watch().

Mixing contexts sounds like a colo-compare bug, did you fix it there too?

> 
> Let's fix it by record the GSource directly instead of fd_in_tag.

Make sense to me, and even simplify a bit the code. We should be more careful 
with pointers though (the reason why tag existed in the first place I imagine).

> 
> Signed-off-by: zhanghailiang 

> ---
>  chardev/char-fd.c |  8 
>  chardev/char-io.c | 23 ---
>  chardev/char-io.h |  4 ++--
>  chardev/char-pty.c|  6 +++---
>  chardev/char-socket.c |  8 
>  chardev/char-udp.c|  8 
>  chardev/char.c|  2 +-
>  include/sysemu/char.h |  2 +-
>  8 files changed, 27 insertions(+), 34 deletions(-)
> 
> diff --git a/chardev/char-fd.c b/chardev/char-fd.c
> index 548dd4c..7f0169d 100644
> --- a/chardev/char-fd.c
> +++ b/chardev/char-fd.c
> @@ -58,7 +58,7 @@ static gboolean fd_chr_read(QIOChannel *chan, GIOCondition
> cond, void *opaque)
>  ret = qio_channel_read(
>  chan, (gchar *)buf, len, NULL);
>  if (ret == 0) {
> -remove_fd_in_watch(chr, NULL);
> +remove_fd_in_watch(chr);
>  qemu_chr_be_event(chr, CHR_EVENT_CLOSED);
>  return FALSE;
>  }
> @@ -89,9 +89,9 @@ static void fd_chr_update_read_handler(Chardev *chr,
>  {
>  FDChardev *s = FD_CHARDEV(chr);
>  
> -remove_fd_in_watch(chr, NULL);
> +remove_fd_in_watch(chr);
>  if (s->ioc_in) {
> -chr->fd_in_tag = io_add_watch_poll(chr, s->ioc_in,
> +chr->chr_gsource = io_add_watch_poll(chr, s->ioc_in,
> fd_chr_read_poll,
> fd_chr_read, chr,
> context);
> @@ -103,7 +103,7 @@ static void char_fd_finalize(Object *obj)
>  Chardev *chr = CHARDEV(obj);
>  FDChardev *s = FD_CHARDEV(obj);
>  
> -remove_fd_in_watch(chr, NULL);
> +remove_fd_in_watch(chr);
>  if (s->ioc_in) {
>  object_unref(OBJECT(s->ioc_in));
>  }
> diff --git a/chardev/char-io.c b/chardev/char-io.c
> index b4bb094..6deb193 100644
> --- a/chardev/char-io.c
> +++ b/chardev/char-io.c
> @@ -98,7 +98,7 @@ static GSourceFuncs io_watch_poll_funcs = {
>  .finalize = io_watch_poll_finalize,
>  };
>  
> -guint io_add_watch_poll(Chardev *chr,
> +GSource *io_add_watch_poll(Chardev *chr,
>  QIOChannel *ioc,
>  IOCanReadHandler *fd_can_read,
>  QIOChannelFunc fd_read,
> @@ -106,7 +106,6 @@ guint io_add_watch_poll(Chardev *chr,
>  GMainContext *context)
>  {
>  IOWatchPoll *iwp;
> -int tag;
>  char *name;
>  
>  iwp = (IOWatchPoll *) g_source_new(_watch_poll_funcs,
> @@ -122,21 +121,15 @@ guint io_add_watch_poll(Chardev *chr,
>  g_source_set_name((GSource *)iwp, name);
>  g_free(name);
>  
> -tag = g_source_attach(>parent, context);
> +g_source_attach(>parent, context);
>  g_source_unref(>parent);
> -return tag;
> +return (GSource *)iwp;
>  }
>  
> -static void io_remove_watch_poll(guint tag, GMainContext *context)
> +static void io_remove_watch_poll(GSource *source)
>  {
> -GSource *source;
>  IOWatchPoll *iwp;
>  
> -g_return_if_fail(tag > 0);
> -
> -source = g_main_context_find_source_by_id(context, tag);
> -g_return_if_fail(source != NULL);
> -
>  iwp = io_watch_poll_from_source(source);
>  if (iwp->src) {
>  g_source_destroy(iwp->src);
> @@ -146,11 +139,11 @@ static void io_remove_watch_poll(guint tag,
> GMainContext *context)
>  g_source_destroy(>parent);
>  }
>  
> -void remove_fd_in_watch(Chardev *chr, GMainContext *context)
> +void remove_fd_in_watch(Chardev *chr)
>  {
> -if (chr->fd_in_tag) {
> -io_remove_watch_poll(chr->fd_in_tag, context);
> -chr->fd_in_tag = 0;
> +if (chr->chr_gsource) {
> +io_remove_watch_poll(chr->chr_gsource);
> +chr->chr_gsource = 0;

It's a pointer, let's use NULL.

Re: [Qemu-devel] [PATCH v9 0/5] Extend virtio-balloon for fast (de)inflating & fast live migration

2017-04-14 Thread Matthew Wilcox

On Fri, Apr 14, 2017 at 04:50:48AM +0300, Michael S. Tsirkin wrote:
> On Thu, Apr 13, 2017 at 01:44:11PM -0700, Matthew Wilcox wrote:
> > On Thu, Apr 13, 2017 at 05:35:03PM +0800, Wei Wang wrote:
> > > 2) transfer the guest unused pages to the host so that they
> > > can be skipped to migrate in live migration.
> > 
> > I don't understand this second bit.  You leave the pages on the free list,
> > and tell the host they're free.  What's preventing somebody else from
> > allocating them and using them for something?  Is the guest semi-frozen
> > at this point with just enough of it running to ask the balloon driver
> > to do things?
> 
> There's missing documentation here.
> 
> The way things actually work is host sends to guest
> a request for unused pages and then write-protects all memory.

... hopefully you mean "write protects all memory, then sends a request
for unused pages", otherwise there's a race condition.

And I see the utility of this, but does this functionality belong in
the balloon driver?  It seems like it's something you might want even
if you don't have the balloon driver loaded.  Or something you might
not want if you do have the balloon driver loaded.

Re: [Qemu-devel] [PATCH for-2.9 4/5] block: Drain BH in bdrv_drained_begin

2017-04-14 Thread Paolo Bonzini



On 06/04/2017 23:37, Kevin Wolf wrote:
> We used to poll BHs in earlier times. Commit 99723548 ('block: add BDS
> field to count in-flight requests') changed this, without an explanation
> in the commit message.
> 
> Paolo, is this simply a bug in that commit, or do you rely on it
> somewhere? I remember that you wanted to get rid of some aio_poll()
> calls a while ago.

It was replaced by the bdrv_inc_in_flight/bdrv_dec_in_flight calls in
bdrv_aio_prw and bdrv_co_complete.

These let bdrv_drain complete the invocation of the callbacks.  But
block_job_defer_to_main_loop is different because it schedules the
bottom half in another QEMU AioContext (the main thread's).

Since he's here with me, I'll ask Fam exactly what's happening.  Stefan
also found something not too convincing in his review.

Paolo

>>  block/io.c | 7 ++-
>>  1 file changed, 6 insertions(+), 1 deletion(-)
>>
>> diff --git a/block/io.c b/block/io.c
>> index 2709a70..b9cfd18 100644
>> --- a/block/io.c
>> +++ b/block/io.c
>> @@ -228,7 +228,12 @@ void bdrv_drained_begin(BlockDriverState *bs)
>>  bdrv_parent_drained_begin(bs);
>>  }
>>  
>> -bdrv_drain_recurse(bs);
>> +while (true) {
>> +if (!bdrv_drain_recurse(bs) &&
>> +!aio_poll(bdrv_get_aio_context(bs), false)) {
>> +break;
>> +}
> 
> The indentation is off here.
> 
>> +}
>>  }
> 
> The old code had this in what is now the BDRV_POLL_WHILE() call in
> bdrv_drain_recurse(). I think it makes more sense there, saves you a
> loop here and fixes bdrv_drain_all_begin() at the same time.
> 
> Kevin
> 
>

Re: [Qemu-devel] [RFC 1/2] spec/vhost-user: Introduce secondary channel for slave initiated requests

2017-04-14 Thread Marc-André Lureau

Hi

On Tue, Apr 11, 2017 at 5:53 PM Maxime Coquelin 
wrote:

> Hi Marc-André,
>
> On 04/11/2017 03:06 PM, Marc-André Lureau wrote:
> > Hi
> >
> > On Tue, Apr 11, 2017 at 12:10 PM Maxime Coquelin
> > > wrote:
> >
> > This vhost-user specification update aims at enabling the
> > slave to send requests to the master using a dedicated socket
> > created by the master.
> >
> > It can be used for example when the slave implements a device
> > IOTLB to send cache miss requests to the master.
> >
> > The message types list is updated with an "Initiator" field to
> > indicate for each type whether the master and/or slave can
> > initiate the request.
> >
> > Signed-off-by: Maxime Coquelin  > >
> >
> >
> > This is very similar to a patch I proposed for shutdown slave initiated
> > requests:
> > https://lists.gnu.org/archive/html/qemu-devel/2016-04/msg00095.html
>
> Indeed, thanks for pointing this out, I wasn't aware of your series.
>
> I find your proposal of having dedicated messages types
> (VHOST_USER_SLAVE_*) cleaner.
>
> ok


> Are you ok if I handover your patch, and replace
> VHOST_USER_SET_SLAVE_FD to VHOST_USER_SET_SLAVE_REQ_FD?
>

They are very similar, I suggest you update your patch with the best of
both.

I suppose you came to the same conclusion with me that trying to make the
communication both ways on the same fd would be quite difficult, although
it's a bit strange that the qemu implementation forces the design of the
protocol in some direction.
-- 
Marc-André Lureau

Re: [Qemu-devel] [PATCH v9 3/5] mm: function to offer a page block on the free list

2017-04-14 Thread Wei Wang


On 04/14/2017 10:58 AM, Matthew Wilcox wrote:

On Fri, Apr 14, 2017 at 10:30:27AM +0800, Wei Wang wrote:

OK. What do you think if we add this:

#if defined(CONFIG_VIRTIO_BALLOON) || defined(CONFIG_VIRTIO_BALLOON_MODULE)

That's spelled "IS_ENABLED(CONFIG_VIRTIO_BALLOON)", FYI.


Right, thanks.

Best,
Wei

Re: [Qemu-devel] [Qemu-block] [PATCH for-2.9-rc5 v2] block: Drain BH in bdrv_drained_begin

2017-04-14 Thread Stefan Hajnoczi

On Fri, Apr 14, 2017 at 9:02 AM, Fam Zheng  wrote:
> @@ -398,11 +399,15 @@ void bdrv_drain_all(void);
>   */\
>  assert(!bs_->wakeup);  \
>  bs_->wakeup = true;\
> -while ((cond)) {   \
> -aio_context_release(ctx_); \
> -aio_poll(qemu_get_aio_context(), true);\
> -aio_context_acquire(ctx_); \
> -waited_ = true;\
> +while (busy_) {\
> +if ((cond)) {  \
> +waited_ = busy_ = true;\
> +aio_context_release(ctx_); \
> +aio_poll(qemu_get_aio_context(), true);\
> +aio_context_acquire(ctx_); \
> +} else {   \
> +busy_ = aio_poll(ctx_, false); \
> +}  \

Wait, I'm confused.  The current thread is not in the BDS AioContext.
We're not allowed to call aio_poll(ctx_, false).

Did you mean aio_poll(qemu_get_aio_context(), false) in order to
process defer to main loop BHs?

Stefan

Re: [Qemu-devel] [Xen-devel] [PATCH] configure: introduce --enable-xen-fb-backend

2017-04-14 Thread Juergen Gross

On 14/04/17 08:06, Oleksandr Andrushchenko wrote:
> On 04/14/2017 03:12 AM, Stefano Stabellini wrote:
>> On Tue, 11 Apr 2017, Oleksandr Andrushchenko wrote:
>>> From: Oleksandr Andrushchenko 
>>>
>>> For some use cases when Xen framebuffer/input backend
>>> is not a part of Qemu it is required to disable it,
>>> because of conflicting access to input/display devices.
>>> Introduce additional configuration option for explicit
>>> input/display control.
>> In these cases when you don't want xenfb, why don't you just remove
>> "vfb" from the xl config file? QEMU only starts the xenfb backend when
>> requested by the toolstack.
>>
>> Is it because you have an alternative xenfb backend? If so, is it really
>> fully xenfb compatible, or is it a different protocol? If it is a
>> different protocol, I suggest you rename your frontend/backend PV device
>> name to something different from "vfb".
>>
> Well, offending part is vkbd actually (for multi-touch
> we run our own user-space backend which supports
> kbd/ptr/mtouch), but vfb and vkbd is the same backend
> in QEMU. So, I am ok for vfb, but just want vkbd off
> So, there are 2 options:
> 1. At compile time remove vkbd and still allow vfb
> 2. Remove xenfb completely, if acceptable (this is my case)

What about adding a Xenstore entry for backend type and let qemu test
for it being not present or containing "qemu"?


Juergen

[Qemu-devel] [BUG] network : windows os lost ip address of the network card in some cases

2017-04-14 Thread yin.zuowei

we  found this problem for a long time 。For example, if we has three network 
card in virtual xml file ，such as "network connection 1" / "network connection 
2"/"network connection 3" 。

Echo network card has own ip address ，such as 192.168.1.1 / 2.1 /3.1 , when 
delete the first card ，reboot the windows virtual os, then this problem 
happened !




we found that the sencond network card will  replace the first one , then the 
ip address of "network connection 2 " become 192.168.1.1 。


Our third party users began to complain about this bug 。All the business of the 
second ip  lost !!! 

I mean both of windows and linux has this bug ,  we solve this bug in linux  
throught bonding netcrad pci and mac address 。

There is no good solution on windows os . thera are ?  we implemented a plan to 
resumption of IP by QGA.  Is there a better way ?

Re: [Qemu-devel] [Qemu-block] [PATCH for-2.9-rc5 v2] block: Drain BH in bdrv_drained_begin

2017-04-14 Thread Stefan Hajnoczi

On Fri, Apr 14, 2017 at 9:02 AM, Fam Zheng  wrote:
> During block job completion, nothing is preventing
> block_job_defer_to_main_loop_bh from being called in a nested
> aio_poll(), which is a trouble, such as in this code path:
>
> qmp_block_commit
>   commit_active_start
> bdrv_reopen
>   bdrv_reopen_multiple
> bdrv_reopen_prepare
>   bdrv_flush
> aio_poll
>   aio_bh_poll
> aio_bh_call
>   block_job_defer_to_main_loop_bh
> stream_complete
>   bdrv_reopen
>
> block_job_defer_to_main_loop_bh is the last step of the stream job,
> which should have been "paused" by the bdrv_drained_begin/end in
> bdrv_reopen_multiple, but it is not done because it's in the form of a
> main loop BH.
>
> Similar to why block jobs should be paused between drained_begin and
> drained_end, BHs they schedule must be excluded as well.  To achieve
> this, this patch forces draining the BH in BDRV_POLL_WHILE.
>
> Also because the BH in question can do bdrv_unref and child replacing,
> protect @bs carefully to avoid use-after-free.
>
> As a side effect this fixes a hang in block_job_detach_aio_context
> during system_reset when a block job is ready:
>
> #0  0x55aa79f3 in bdrv_drain_recurse
> #1  0x55aa825d in bdrv_drained_begin
> #2  0x55aa8449 in bdrv_drain
> #3  0x55a9c356 in blk_drain
> #4  0x55aa3cfd in mirror_drain
> #5  0x55a66e11 in block_job_detach_aio_context
> #6  0x55a62f4d in bdrv_detach_aio_context
> #7  0x55a63116 in bdrv_set_aio_context
> #8  0x55a9d326 in blk_set_aio_context
> #9  0x557e38da in virtio_blk_data_plane_stop
> #10 0x559f9d5f in virtio_bus_stop_ioeventfd
> #11 0x559fa49b in virtio_bus_stop_ioeventfd
> #12 0x559f6a18 in virtio_pci_stop_ioeventfd
> #13 0x559f6a18 in virtio_pci_reset
> #14 0x559139a9 in qdev_reset_one
> #15 0x55916738 in qbus_walk_children
> #16 0x55913318 in qdev_walk_children
> #17 0x55916738 in qbus_walk_children
> #18 0x559168ca in qemu_devices_reset
> #19 0x5581fcbb in pc_machine_reset
> #20 0x558a4d96 in qemu_system_reset
> #21 0x5577157a in main_loop_should_exit
> #22 0x5577157a in main_loop
> #23 0x5577157a in main
>
> The rationale is that the loop in block_job_detach_aio_context cannot
> make any progress in pausing/completing the job, because bs->in_flight
> is 0, so bdrv_drain doesn't process the block_job_defer_to_main_loop
> BH. With this patch, it does.
>
> Reported-by: Jeff Cody 
> Signed-off-by: Fam Zheng 
>
> ---
>
> v2: Do the BH poll in BDRV_POLL_WHILE to cover bdrv_drain_all_begin as
> well. [Kevin]
> ---
>  block/io.c| 10 +++---
>  include/block/block.h | 21 +
>  2 files changed, 20 insertions(+), 11 deletions(-)

Reviewed-by: Stefan Hajnoczi 

>
> diff --git a/block/io.c b/block/io.c
> index 8706bfa..a472157 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -158,7 +158,7 @@ bool bdrv_requests_pending(BlockDriverState *bs)
>
>  static bool bdrv_drain_recurse(BlockDriverState *bs)
>  {
> -BdrvChild *child;
> +BdrvChild *child, *tmp;
>  bool waited;
>
>  waited = BDRV_POLL_WHILE(bs, atomic_read(>in_flight) > 0);
> @@ -167,8 +167,12 @@ static bool bdrv_drain_recurse(BlockDriverState *bs)
>  bs->drv->bdrv_drain(bs);
>  }
>
> -QLIST_FOREACH(child, >children, next) {
> -waited |= bdrv_drain_recurse(child->bs);
> +QLIST_FOREACH_SAFE(child, >children, next, tmp) {
> +BlockDriverState *bs = child->bs;
> +assert(bs->refcnt > 0);
> +bdrv_ref(bs);
> +waited |= bdrv_drain_recurse(bs);
> +bdrv_unref(bs);
>  }

There are other loops over bs->children in the block layer code.  If a
they indirectly call aio_poll() then the child bs could disappear.
They should probably also take areference.

That said, I wonder if this really solves the problem of graph changes
during bs->children traversal.  We now have a reference but are
iterating over stale data.

This is a separate issue and doesn't need to hold up this patch.

>
>  return waited;
> diff --git a/include/block/block.h b/include/block/block.h
> index 97d4330..3f05e71 100644
> --- a/include/block/block.h
> +++ b/include/block/block.h
> @@ -381,12 +381,13 @@ void bdrv_drain_all(void);
>
>  #define BDRV_POLL_WHILE(bs, cond) ({   \
>  bool waited_ = false;  \
> +bool busy_ = true; \
>  BlockDriverState *bs_ = (bs);  \
>  AioContext *ctx_ = bdrv_get_aio_context(bs_);

[Qemu-devel] [PATCH v2 4/4] qdev: remove cannot_destroy_with_object_finalize_yet

2017-04-14 Thread Laurent Vivier

As all users have been removed, we can remove
cannot_destroy_with_object_finalize_yet field
from the DeviceClass structure.

Signed-off-by: Laurent Vivier 
---
 include/hw/qdev-core.h | 13 -
 qmp.c  |  5 -
 2 files changed, 18 deletions(-)

diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
index b44b476..ac682a6 100644
--- a/include/hw/qdev-core.h
+++ b/include/hw/qdev-core.h
@@ -113,19 +113,6 @@ typedef struct DeviceClass {
  * TODO remove once we're there
  */
 bool cannot_instantiate_with_device_add_yet;
-/*
- * Does this device model survive object_unref(object_new(TNAME))?
- * All device models should, and this flag shouldn't exist.  Some
- * devices crash in object_new(), some crash or hang in
- * object_unref().  Makes introspecting properties with
- * qmp_device_list_properties() dangerous.  Bad, because it's used
- * by -device FOO,help.  This flag serves to protect that code.
- * It should never be set without a comment explaining why it is
- * set.
- * TODO remove once we're there
- */
-bool cannot_destroy_with_object_finalize_yet;
-
 bool hotpluggable;
 
 /* callbacks */
diff --git a/qmp.c b/qmp.c
index a744e44..ab74cd7 100644
--- a/qmp.c
+++ b/qmp.c
@@ -548,11 +548,6 @@ DevicePropertyInfoList *qmp_device_list_properties(const 
char *typename,
 return NULL;
 }
 
-if (DEVICE_CLASS(klass)->cannot_destroy_with_object_finalize_yet) {
-error_setg(errp, "Can't list properties of device '%s'", typename);
-return NULL;
-}
-
 obj = object_new(typename);
 
 object_property_iter_init(, obj);
-- 
2.9.3

[Qemu-devel] [PATCH v2 0/4] qdev: remove all remaining cannot_destroy_with_object_finalize_yet

2017-04-14 Thread Laurent Vivier

This series removes all the remaining uses of
cannot_destroy_with_object_finalize_yet to finally remove
the flag itself.

The ARM patch has already been sent alone and reviewed by Markus.
I have tested the ppc one on ppc64 machine with KVM and using
QDM device-list-properties command.

For the versatile one, the flag allowed to workaround a problem
in the bus unparent function: the bus unparent is trying to
unparent all the children of the bus. To do that, it has a list
of the children of the bus, and calls object_unparent() for each
child, and object_unparent() calls object_property_del_child() if
obj->parent is not NULL.  As qdev_set_parent_bus() set only
parent_bus and the list of children, parent is NULL and the child
is never deleted.  We can avoid the problem by moving the
qdev_set_parent_bus() to the realize part.

I've tested all the changes with "make check" (including
device-introspect-test). I've booted a versatilepb machine
with a 3.16.0-4 debian installer kernel.

Laurent Vivier (4):
  arm: remove remaining cannot_destroy_with_object_finalize_yet
  ppc: remove cannot_destroy_with_object_finalize_yet
  versatile: remove cannot_destroy_with_object_finalize_yet
  qdev: remove cannot_destroy_with_object_finalize_yet

 hw/arm/allwinner-a10.c  |  6 --
 hw/arm/bcm2836.c|  6 --
 hw/arm/digic.c  |  6 --
 hw/arm/fsl-imx25.c  |  5 -
 hw/arm/fsl-imx31.c  |  5 -
 hw/arm/fsl-imx6.c   |  5 -
 hw/arm/xlnx-zynqmp.c|  6 --
 hw/pci-host/versatile.c | 35 ---
 include/hw/qdev-core.h  | 13 -
 qmp.c   |  5 -
 target/ppc/kvm.c| 10 --
 11 files changed, 12 insertions(+), 90 deletions(-)

-- 
2.9.3

[Qemu-devel] [PATCH v2 1/4] arm: remove remaining cannot_destroy_with_object_finalize_yet

2017-04-14 Thread Laurent Vivier

With commit ce5b1bbf624b ("exec: move cpu_exec_init() calls to
realize functions"), we can now remove all the
remaining cannot_destroy_with_object_finalize_yet as
unsafe references have been moved to cpu_exec_realizefn().
(tested with QOM command provided by commit 4c315c27).

Suggested-by: Markus Armbruster 
Signed-off-by: Laurent Vivier 
Reviewed-by: Markus Armbruster 
---
 hw/arm/allwinner-a10.c | 6 --
 hw/arm/bcm2836.c   | 6 --
 hw/arm/digic.c | 6 --
 hw/arm/fsl-imx25.c | 5 -
 hw/arm/fsl-imx31.c | 5 -
 hw/arm/fsl-imx6.c  | 5 -
 hw/arm/xlnx-zynqmp.c   | 6 --
 7 files changed, 39 deletions(-)

diff --git a/hw/arm/allwinner-a10.c b/hw/arm/allwinner-a10.c
index ca15d1c..f62a9a3 100644
--- a/hw/arm/allwinner-a10.c
+++ b/hw/arm/allwinner-a10.c
@@ -118,12 +118,6 @@ static void aw_a10_class_init(ObjectClass *oc, void *data)
 DeviceClass *dc = DEVICE_CLASS(oc);
 
 dc->realize = aw_a10_realize;
-
-/*
- * Reason: creates an ARM CPU, thus use after free(), see
- * arm_cpu_class_init()
- */
-dc->cannot_destroy_with_object_finalize_yet = true;
 }
 
 static const TypeInfo aw_a10_type_info = {
diff --git a/hw/arm/bcm2836.c b/hw/arm/bcm2836.c
index 8451190..8c43291 100644
--- a/hw/arm/bcm2836.c
+++ b/hw/arm/bcm2836.c
@@ -160,12 +160,6 @@ static void bcm2836_class_init(ObjectClass *oc, void *data)
 
 dc->props = bcm2836_props;
 dc->realize = bcm2836_realize;
-
-/*
- * Reason: creates an ARM CPU, thus use after free(), see
- * arm_cpu_class_init()
- */
-dc->cannot_destroy_with_object_finalize_yet = true;
 }
 
 static const TypeInfo bcm2836_type_info = {
diff --git a/hw/arm/digic.c b/hw/arm/digic.c
index d60ea39..94f3263 100644
--- a/hw/arm/digic.c
+++ b/hw/arm/digic.c
@@ -101,12 +101,6 @@ static void digic_class_init(ObjectClass *oc, void *data)
 DeviceClass *dc = DEVICE_CLASS(oc);
 
 dc->realize = digic_realize;
-
-/*
- * Reason: creates an ARM CPU, thus use after free(), see
- * arm_cpu_class_init()
- */
-dc->cannot_destroy_with_object_finalize_yet = true;
 }
 
 static const TypeInfo digic_type_info = {
diff --git a/hw/arm/fsl-imx25.c b/hw/arm/fsl-imx25.c
index 2126f73..9056f27 100644
--- a/hw/arm/fsl-imx25.c
+++ b/hw/arm/fsl-imx25.c
@@ -290,11 +290,6 @@ static void fsl_imx25_class_init(ObjectClass *oc, void 
*data)
 
 dc->realize = fsl_imx25_realize;
 
-/*
- * Reason: creates an ARM CPU, thus use after free(), see
- * arm_cpu_class_init()
- */
-dc->cannot_destroy_with_object_finalize_yet = true;
 dc->desc = "i.MX25 SOC";
 }
 
diff --git a/hw/arm/fsl-imx31.c b/hw/arm/fsl-imx31.c
index dd1c713..d7e2d83 100644
--- a/hw/arm/fsl-imx31.c
+++ b/hw/arm/fsl-imx31.c
@@ -262,11 +262,6 @@ static void fsl_imx31_class_init(ObjectClass *oc, void 
*data)
 
 dc->realize = fsl_imx31_realize;
 
-/*
- * Reason: creates an ARM CPU, thus use after free(), see
- * arm_cpu_class_init()
- */
-dc->cannot_destroy_with_object_finalize_yet = true;
 dc->desc = "i.MX31 SOC";
 }
 
diff --git a/hw/arm/fsl-imx6.c b/hw/arm/fsl-imx6.c
index 76dd8a4..6969e73 100644
--- a/hw/arm/fsl-imx6.c
+++ b/hw/arm/fsl-imx6.c
@@ -442,11 +442,6 @@ static void fsl_imx6_class_init(ObjectClass *oc, void 
*data)
 
 dc->realize = fsl_imx6_realize;
 
-/*
- * Reason: creates an ARM CPU, thus use after free(), see
- * arm_cpu_class_init()
- */
-dc->cannot_destroy_with_object_finalize_yet = true;
 dc->desc = "i.MX6 SOC";
 }
 
diff --git a/hw/arm/xlnx-zynqmp.c b/hw/arm/xlnx-zynqmp.c
index bc4e66b..4f67158 100644
--- a/hw/arm/xlnx-zynqmp.c
+++ b/hw/arm/xlnx-zynqmp.c
@@ -439,12 +439,6 @@ static void xlnx_zynqmp_class_init(ObjectClass *oc, void 
*data)
 
 dc->props = xlnx_zynqmp_props;
 dc->realize = xlnx_zynqmp_realize;
-
-/*
- * Reason: creates an ARM CPU, thus use after free(), see
- * arm_cpu_class_init()
- */
-dc->cannot_destroy_with_object_finalize_yet = true;
 }
 
 static const TypeInfo xlnx_zynqmp_type_info = {
-- 
2.9.3

[Qemu-devel] [PATCH v2 2/4] ppc: remove cannot_destroy_with_object_finalize_yet

2017-04-14 Thread Laurent Vivier

This removes the assert(kvm_enabled()) from kvmppc_host_cpu_initfn()

This assert can never be triggered as the function is only registered
when KVM is available (see also 4c315c2
"qdev: Protect device-list-properties against broken devices").

So we can remove the cannot_destroy_with_object_finalize_yet from
kvmppc_host_cpu_class_init() without fear and beyond reproach.
(as it has already be done for i386 with 771a13e "i386: Unset
cannot_destroy_with_object_finalize_yet on "host" model" and
e435601 "target-i386: Remove assert(kvm_enabled()) from
host_x86_cpu_initfn()")

Signed-off-by: Laurent Vivier 
---
 target/ppc/kvm.c | 10 --
 1 file changed, 10 deletions(-)

diff --git a/target/ppc/kvm.c b/target/ppc/kvm.c
index 9f1f132..64017ac 100644
--- a/target/ppc/kvm.c
+++ b/target/ppc/kvm.c
@@ -2245,14 +2245,8 @@ static void alter_insns(uint64_t *word, uint64_t flags, 
bool on)
 }
 }
 
-static void kvmppc_host_cpu_initfn(Object *obj)
-{
-assert(kvm_enabled());
-}
-
 static void kvmppc_host_cpu_class_init(ObjectClass *oc, void *data)
 {
-DeviceClass *dc = DEVICE_CLASS(oc);
 PowerPCCPUClass *pcc = POWERPC_CPU_CLASS(oc);
 uint32_t vmx = kvmppc_get_vmx();
 uint32_t dfp = kvmppc_get_dfp();
@@ -2279,9 +2273,6 @@ static void kvmppc_host_cpu_class_init(ObjectClass *oc, 
void *data)
 if (icache_size != -1) {
 pcc->l1_icache_size = icache_size;
 }
-
-/* Reason: kvmppc_host_cpu_initfn() dies when !kvm_enabled() */
-dc->cannot_destroy_with_object_finalize_yet = true;
 }
 
 bool kvmppc_has_cap_epr(void)
@@ -2333,7 +2324,6 @@ static int kvm_ppc_register_host_cpu_type(void)
 {
 TypeInfo type_info = {
 .name = TYPE_HOST_POWERPC_CPU,
-.instance_init = kvmppc_host_cpu_initfn,
 .class_init = kvmppc_host_cpu_class_init,
 };
 PowerPCCPUClass *pvr_pcc;
-- 
2.9.3

[Qemu-devel] [PATCH v2 3/4] versatile: remove cannot_destroy_with_object_finalize_yet

2017-04-14 Thread Laurent Vivier

cannot_destroy_with_object_finalize_yet was added by 4c315c2
("qdev: Protect device-list-properties against broken devices")
because "realview_pci" and "versatile_pci" were hanging
during "device-list-properties" cleanup (an infinite loop in
bus_unparent()).

We have this problem because the child is not removed from
the list of the PCI bus child because it has no defined parent:
qdev_set_parent_bus() set the device parent_bus pointer to bus, and
adds the device in the bus children list, but doesn't update the
device parent pointer.

To fix the problem, move all the involved parts to the realize function.

Signed-off-by: Laurent Vivier 
---
 hw/pci-host/versatile.c | 35 ---
 1 file changed, 12 insertions(+), 23 deletions(-)

diff --git a/hw/pci-host/versatile.c b/hw/pci-host/versatile.c
index 467cbb9..27fde46 100644
--- a/hw/pci-host/versatile.c
+++ b/hw/pci-host/versatile.c
@@ -380,20 +380,8 @@ static void pci_vpb_reset(DeviceState *d)
 
 static void pci_vpb_init(Object *obj)
 {
-PCIHostState *h = PCI_HOST_BRIDGE(obj);
 PCIVPBState *s = PCI_VPB(obj);
 
-memory_region_init(>pci_io_space, OBJECT(s), "pci_io", 1ULL << 32);
-memory_region_init(>pci_mem_space, OBJECT(s), "pci_mem", 1ULL << 32);
-
-pci_bus_new_inplace(>pci_bus, sizeof(s->pci_bus), DEVICE(obj), "pci",
->pci_mem_space, >pci_io_space,
-PCI_DEVFN(11, 0), TYPE_PCI_BUS);
-h->bus = >pci_bus;
-
-object_initialize(>pci_dev, sizeof(s->pci_dev), 
TYPE_VERSATILE_PCI_HOST);
-qdev_set_parent_bus(DEVICE(>pci_dev), BUS(>pci_bus));
-
 /* Window sizes for VersatilePB; realview_pci's init will override */
 s->mem_win_size[0] = 0x0c00;
 s->mem_win_size[1] = 0x1000;
@@ -403,10 +391,22 @@ static void pci_vpb_init(Object *obj)
 static void pci_vpb_realize(DeviceState *dev, Error **errp)
 {
 PCIVPBState *s = PCI_VPB(dev);
+PCIHostState *h = PCI_HOST_BRIDGE(dev);
 SysBusDevice *sbd = SYS_BUS_DEVICE(dev);
 pci_map_irq_fn mapfn;
 int i;
 
+memory_region_init(>pci_io_space, OBJECT(s), "pci_io", 1ULL << 32);
+memory_region_init(>pci_mem_space, OBJECT(s), "pci_mem", 1ULL << 32);
+
+pci_bus_new_inplace(>pci_bus, sizeof(s->pci_bus), dev, "pci",
+>pci_mem_space, >pci_io_space,
+PCI_DEVFN(11, 0), TYPE_PCI_BUS);
+h->bus = >pci_bus;
+
+object_initialize(>pci_dev, sizeof(s->pci_dev), 
TYPE_VERSATILE_PCI_HOST);
+qdev_set_parent_bus(DEVICE(>pci_dev), BUS(>pci_bus));
+
 for (i = 0; i < 4; i++) {
 sysbus_init_irq(sbd, >irq[i]);
 }
@@ -503,8 +503,6 @@ static void pci_vpb_class_init(ObjectClass *klass, void 
*data)
 dc->reset = pci_vpb_reset;
 dc->vmsd = _vpb_vmstate;
 dc->props = pci_vpb_properties;
-/* Reason: object_unref() hangs */
-dc->cannot_destroy_with_object_finalize_yet = true;
 }
 
 static const TypeInfo pci_vpb_info = {
@@ -526,19 +524,10 @@ static void pci_realview_init(Object *obj)
 s->mem_win_size[2] = 0x0800;
 }
 
-static void pci_realview_class_init(ObjectClass *class, void *data)
-{
-DeviceClass *dc = DEVICE_CLASS(class);
-
-/* Reason: object_unref() hangs */
-dc->cannot_destroy_with_object_finalize_yet = true;
-}
-
 static const TypeInfo pci_realview_info = {
 .name  = "realview_pci",
 .parent= TYPE_VERSATILE_PCI,
 .instance_init = pci_realview_init,
-.class_init= pci_realview_class_init,
 };
 
 static void versatile_pci_register_types(void)
-- 
2.9.3

[Qemu-devel] [PATCH v3 5/5] slirp: add a fake NC-SI backend

2017-04-14 Thread Cédric Le Goater

NC-SI (Network Controller Sideband Interface) enables a BMC to manage
a set of NICs on a system. This model takes the simplest approach and
reverses the NC-SI packets to pretend a NIC is present and exercise
the Linux driver.

The NCSI header file  comes from mainline Linux and was
untabified.

Signed-off-by: Cédric Le Goater 
Reviewed-by: Philippe Mathieu-Daudé 
Acked-by: Samuel Thibault 
---

 Changes since v2:
 - modified the 'ncsi-pkt.h' file to add its location under the Linux
   tree 
 - removed useless zeroing of the 'reserved' fields 
 - introduced a ncsi_rsp_handlers array to hold the payload size and
   a specific handler routine for each command   
 - fixed the output size of the frame to match the command payload
   size

 include/net/eth.h   |   1 +
 include/net/eth.h   |   1 +
 slirp/Makefile.objs |   2 +-
 slirp/ncsi-pkt.h| 419 
 slirp/ncsi.c| 130 
 slirp/slirp.c   |   4 +
 slirp/slirp.h   |   3 +
 6 files changed, 558 insertions(+), 1 deletion(-)
 create mode 100644 slirp/ncsi-pkt.h
 create mode 100644 slirp/ncsi.c

diff --git a/include/net/eth.h b/include/net/eth.h
index afeb45be34e1..09054a506d6e 100644
--- a/include/net/eth.h
+++ b/include/net/eth.h
@@ -209,6 +209,7 @@ struct tcp_hdr {
 #define ETH_P_IPV6(0x86dd)
 #define ETH_P_VLAN(0x8100)
 #define ETH_P_DVLAN   (0x88a8)
+#define ETH_P_NCSI(0x88f8)
 #define ETH_P_UNKNOWN (0x)
 #define VLAN_VID_MASK 0x0fff
 #define IP_HEADER_VERSION_4   (4)
diff --git a/slirp/Makefile.objs b/slirp/Makefile.objs
index 1baa1f1c7cfb..28049b03cddc 100644
--- a/slirp/Makefile.objs
+++ b/slirp/Makefile.objs
@@ -2,4 +2,4 @@ common-obj-y = cksum.o if.o ip_icmp.o ip6_icmp.o ip6_input.o 
ip6_output.o \
ip_input.o ip_output.o dnssearch.o dhcpv6.o
 common-obj-y += slirp.o mbuf.o misc.o sbuf.o socket.o tcp_input.o tcp_output.o
 common-obj-y += tcp_subr.o tcp_timer.o udp.o udp6.o bootp.o tftp.o arp_table.o 
\
-ndp_table.o
+ndp_table.o ncsi.o
diff --git a/slirp/ncsi-pkt.h b/slirp/ncsi-pkt.h
new file mode 100644
index ..ea07d1cd0f97
--- /dev/null
+++ b/slirp/ncsi-pkt.h
@@ -0,0 +1,419 @@
+/*
+ * Copyright Gavin Shan, IBM Corporation 2016.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#ifndef NCSI_PKT_H
+#define NCSI_PKT_H
+
+/* from linux/net/ncsi/ncsi-pkt.h */
+#define __be32 uint32_t
+#define __be16 uint16_t
+
+struct ncsi_pkt_hdr {
+unsigned char mc_id;/* Management controller ID */
+unsigned char revision; /* NCSI version - 0x01  */
+unsigned char reserved; /* Reserved */
+unsigned char id;   /* Packet sequence number   */
+unsigned char type; /* Packet type  */
+unsigned char channel;  /* Network controller ID*/
+__be16length;   /* Payload length   */
+__be32reserved1[2]; /* Reserved */
+};
+
+struct ncsi_cmd_pkt_hdr {
+struct ncsi_pkt_hdr common; /* Common NCSI packet header */
+};
+
+struct ncsi_rsp_pkt_hdr {
+struct ncsi_pkt_hdr common; /* Common NCSI packet header */
+__be16  code;   /* Response code */
+__be16  reason; /* Response reason   */
+};
+
+struct ncsi_aen_pkt_hdr {
+struct ncsi_pkt_hdr common;   /* Common NCSI packet header */
+unsigned char   reserved2[3]; /* Reserved  */
+unsigned char   type; /* AEN packet type   */
+};
+
+/* NCSI common command packet */
+struct ncsi_cmd_pkt {
+struct ncsi_cmd_pkt_hdr cmd;  /* Command header */
+__be32  checksum; /* Checksum   */
+unsigned char   pad[26];
+};
+
+struct ncsi_rsp_pkt {
+struct ncsi_rsp_pkt_hdr rsp;  /* Response header */
+__be32  checksum; /* Checksum*/
+unsigned char   pad[22];
+};
+
+/* Select Package */
+struct ncsi_cmd_sp_pkt {
+struct ncsi_cmd_pkt_hdr cmd;/* Command header */
+unsigned char   reserved[3];/* Reserved   */
+unsigned char   hw_arbitration; /* HW arbitration */
+__be32  checksum;   /* Checksum   */
+unsigned char   pad[22];
+};
+
+/* Disable Channel */
+struct ncsi_cmd_dc_pkt {
+struct ncsi_cmd_pkt_hdr cmd; /* Command header  */
+unsigned char   reserved[3]; /* Reserved*/
+unsigned

[Qemu-devel] [PATCH v3 4/5] aspeed: add a FTGMAC100 nic

2017-04-14 Thread Cédric Le Goater

There is a second NIC but we do not use it for the moment. We use the
'aspeed' property to tune the definition of the end of ring buffer bit
for the Aspeed SoCs.

Signed-off-by: Cédric Le Goater 
---
 hw/arm/aspeed_soc.c | 21 +
 include/hw/arm/aspeed_soc.h |  2 ++
 2 files changed, 23 insertions(+)

diff --git a/hw/arm/aspeed_soc.c b/hw/arm/aspeed_soc.c
index 571e4f097b02..4937e2bc8323 100644
--- a/hw/arm/aspeed_soc.c
+++ b/hw/arm/aspeed_soc.c
@@ -19,6 +19,7 @@
 #include "hw/char/serial.h"
 #include "qemu/log.h"
 #include "hw/i2c/aspeed_i2c.h"
+#include "net/net.h"
 
 #define ASPEED_SOC_UART_5_BASE  0x00184000
 #define ASPEED_SOC_IOMEM_SIZE   0x0020
@@ -33,6 +34,8 @@
 #define ASPEED_SOC_TIMER_BASE   0x1E782000
 #define ASPEED_SOC_WDT_BASE 0x1E785000
 #define ASPEED_SOC_I2C_BASE 0x1E78A000
+#define ASPEED_SOC_ETH1_BASE0x1E66
+#define ASPEED_SOC_ETH2_BASE0x1E68
 
 static const int uart_irqs[] = { 9, 32, 33, 34, 10 };
 static const int timer_irqs[] = { 16, 17, 18, 35, 36, 37, 38, 39, };
@@ -175,6 +178,10 @@ static void aspeed_soc_init(Object *obj)
 object_initialize(>wdt, sizeof(s->wdt), TYPE_ASPEED_WDT);
 object_property_add_child(obj, "wdt", OBJECT(>wdt), NULL);
 qdev_set_parent_bus(DEVICE(>wdt), sysbus_get_default());
+
+object_initialize(>ftgmac100, sizeof(s->ftgmac100), TYPE_FTGMAC100);
+object_property_add_child(obj, "ftgmac100", OBJECT(>ftgmac100), NULL);
+qdev_set_parent_bus(DEVICE(>ftgmac100), sysbus_get_default());
 }
 
 static void aspeed_soc_realize(DeviceState *dev, Error **errp)
@@ -299,6 +306,20 @@ static void aspeed_soc_realize(DeviceState *dev, Error 
**errp)
 return;
 }
 sysbus_mmio_map(SYS_BUS_DEVICE(>wdt), 0, ASPEED_SOC_WDT_BASE);
+
+/* Net */
+qdev_set_nic_properties(DEVICE(>ftgmac100), _table[0]);
+object_property_set_bool(OBJECT(>ftgmac100), true, "aspeed", );
+object_property_set_bool(OBJECT(>ftgmac100), true, "realized",
+ _err);
+error_propagate(, local_err);
+if (err) {
+error_propagate(errp, err);
+return;
+}
+sysbus_mmio_map(SYS_BUS_DEVICE(>ftgmac100), 0, ASPEED_SOC_ETH1_BASE);
+sysbus_connect_irq(SYS_BUS_DEVICE(>ftgmac100), 0,
+   qdev_get_gpio_in(DEVICE(>vic), 2));
 }
 
 static void aspeed_soc_class_init(ObjectClass *oc, void *data)
diff --git a/include/hw/arm/aspeed_soc.h b/include/hw/arm/aspeed_soc.h
index dbec0c159885..4c5fc66a1ee0 100644
--- a/include/hw/arm/aspeed_soc.h
+++ b/include/hw/arm/aspeed_soc.h
@@ -20,6 +20,7 @@
 #include "hw/i2c/aspeed_i2c.h"
 #include "hw/ssi/aspeed_smc.h"
 #include "hw/watchdog/wdt_aspeed.h"
+#include "hw/net/ftgmac100.h"
 
 #define ASPEED_SPIS_NUM  2
 
@@ -39,6 +40,7 @@ typedef struct AspeedSoCState {
 AspeedSMCState spi[ASPEED_SPIS_NUM];
 AspeedSDMCState sdmc;
 AspeedWDTState wdt;
+FTGMAC100State ftgmac100;
 } AspeedSoCState;
 
 #define TYPE_ASPEED_SOC "aspeed-soc"
-- 
2.7.4

Re: [Qemu-devel] [virtio-dev] Re: [PATCH v9 2/5] virtio-balloon: VIRTIO_BALLOON_F_BALLOON_CHUNKS

2017-04-14 Thread Wei Wang


On 04/14/2017 12:34 AM, Michael S. Tsirkin wrote:

On Thu, Apr 13, 2017 at 05:35:05PM +0800, Wei Wang wrote:

So we don't need the bitmap to talk to host, it is just
a data structure we chose to maintain lists of pages, right?

Right. bitmap is the way to gather pages to chunk.
It's only needed in the balloon page case.
For the unused page case, we don't need it, since the free
page blocks are already chunks.


OK as far as it goes but you need much better isolation for it.
Build a data structure with APIs such as _init, _cleanup, _add, _clear,
_find_first, _find_next.
Completely unrelated to pages, it just maintains bits.
Then use it here.



  static int oom_pages = OOM_VBALLOON_DEFAULT_PAGES;
  module_param(oom_pages, int, S_IRUSR | S_IWUSR);
  MODULE_PARM_DESC(oom_pages, "pages to free on OOM");
@@ -50,6 +54,10 @@ MODULE_PARM_DESC(oom_pages, "pages to free on OOM");
  static struct vfsmount *balloon_mnt;
  #endif
  
+/* Types of pages to chunk */

+#define PAGE_CHUNK_TYPE_BALLOON 0
+

Doesn't look like you are ever adding more types in this
patchset.  Pls keep code simple, generalize it later.


"#define PAGE_CHUNK_TYPE_UNUSED 1" is added in another patch.

Types of page to chunk are treated differently. Different types of page
chunks are sent to the host via different protocols.

1) PAGE_CHUNK_TYPE_BALLOON: Ballooned (i.e. inflated/deflated) pages
to chunk.  For the ballooned type, it uses the basic chunk msg format:

virtio_balloon_page_chunk_hdr +
virtio_balloon_page_chunk * MAX_PAGE_CHUNKS

2) PAGE_CHUNK_TYPE_UNUSED: unused pages to chunk. It uses this miscq msg
format:
miscq_hdr +
virtio_balloon_page_chunk_hdr +
virtio_balloon_page_chunk * MAX_PAGE_CHUNKS

The chunk msg is actually the payload of the miscq msg.




+#define MAX_PAGE_CHUNKS 4096

This is an order-4 allocation. I'd make it 4095 and then it's
an order-3 one.


Sounds good, thanks.
I think it would be better to make it 4090. Leave some space for the hdr
as well.




  struct virtio_balloon {
struct virtio_device *vdev;
struct virtqueue *inflate_vq, *deflate_vq, *stats_vq;
@@ -78,6 +86,32 @@ struct virtio_balloon {
/* Synchronize access/update to this struct virtio_balloon elements */
struct mutex balloon_lock;
  
+	/*

+* Buffer for PAGE_CHUNK_TYPE_BALLOON:
+* virtio_balloon_page_chunk_hdr +
+* virtio_balloon_page_chunk * MAX_PAGE_CHUNKS
+*/
+   struct virtio_balloon_page_chunk_hdr *balloon_page_chunk_hdr;
+   struct virtio_balloon_page_chunk *balloon_page_chunk;
+
+   /* Bitmap used to record pages */
+   unsigned long *page_bmap[PAGE_BMAP_COUNT_MAX];
+   /* Number of the allocated page_bmap */
+   unsigned int page_bmaps;
+
+   /*
+* The allocated page_bmap size may be smaller than the pfn range of
+* the ballooned pages. In this case, we need to use the page_bmap
+* multiple times to cover the entire pfn range. It's like using a
+* short ruler several times to finish measuring a long object.
+* The start location of the ruler in the next measurement is the end
+* location of the ruler in the previous measurement.
+*
+* pfn_max & pfn_min: forms the pfn range of the ballooned pages
+* pfn_start & pfn_stop: records the start and stop pfn in each cover

cover? what does this mean?

looks like you only use these to pass data to tell_host.
so pass these as parameters and you won't need to keep
them in this structure.

And then you can move this comment to set_page_bmap where
it belongs.


+*/
+   unsigned long pfn_min, pfn_max, pfn_start, pfn_stop;
+
/* The array of pfns we tell the Host about. */
unsigned int num_pfns;
__virtio32 pfns[VIRTIO_BALLOON_ARRAY_PFNS_MAX];
@@ -110,20 +144,201 @@ static void balloon_ack(struct virtqueue *vq)
wake_up(>acked);
  }
  
-static void tell_host(struct virtio_balloon *vb, struct virtqueue *vq)

+static inline void init_page_bmap_range(struct virtio_balloon *vb)
+{
+   vb->pfn_min = ULONG_MAX;
+   vb->pfn_max = 0;
+}
+
+static inline void update_page_bmap_range(struct virtio_balloon *vb,
+ struct page *page)
+{
+   unsigned long balloon_pfn = page_to_balloon_pfn(page);
+
+   vb->pfn_min = min(balloon_pfn, vb->pfn_min);
+   vb->pfn_max = max(balloon_pfn, vb->pfn_max);
+}
+
+/* The page_bmap size is extended by adding more number of page_bmap */

did you mean

Allocate more bitmaps to cover the given number of pfns
and add them to page_bmap

?

This isn't what this function does.
It blindly assumes 1 bitmap is allocated
and allocates more, up to PAGE_BMAP_COUNT_MAX.



Please let me use a concrete analogy to explain this algorithm:
We have a 2-meter long ruler (i.e. page_bmap[0]).

Case 1:
To measure a  1-meter long object (i.e. pfn_max=1, pfn_min=0),
we can simply use the ruler once and get to know that the object
is

[Qemu-devel] [PATCH v3 2/5] net: add FTGMAC100 support

2017-04-14 Thread Cédric Le Goater

The FTGMAC100 device is an Ethernet controller with DMA function that
can be found on Aspeed SoCs (which include NCSI).

It is fully compliant with IEEE 802.3 specification for 10/100 Mbps
Ethernet and IEEE 802.3z specification for 1000 Mbps Ethernet and
includes Reduced Media Independent Interface (RMII) and Reduced
Gigabit Media Independent Interface (RGMII) interfaces. It adopts an
AHB bus interface and integrates a link list DMA engine with direct
M-Bus accesses for transmitting and receiving packets. It has
independent TX/RX fifos, supports half and full duplex (1000 Mbps mode
only supports full duplex), flow control for full duplex and
backpressure for half duplex.

The FTGMAC100 also implements IP, TCP, UDP checksum offloads and
supports IEEE 802.1Q VLAN tag insertion and removal. It offers
high-priority transmit queue for QoS and CoS applications

This model is backed with a RealTek 8211E PHY which is the chip found
on the AST2500 EVB. It is complete enough to satisfy two different
Linux drivers and a U-Boot driver. Not supported features are :

 - IEEE 802.1Q VLAN
 - High Priority Transmit Queue
 - Wake-On-LAN functions

The code is based on the Coldfire Fast Ethernet Controller model.

Signed-off-by: Cédric Le Goater 
---
 Tested on the palmetto, romulus, ast2500 evb machines.

 Changes since v1:
 - removed TODO comments and used LOG_UNIMP in the read/write ops
 - used camelcase for struct names and typedefs. 
 - removed the useless struct definitions for ring descriptors and
   the alignment pragma   
 - introduced a frame buffer at the machine level to reduce stack
   usage in the tx path.
 - introduced symbolic constants for PHY values.
 - introduced rtl8211e PHY chip specific registers 
 - removed qemu_set_irq() in reset path 
 - checked for dma_memory_read() errors. Also for write but that was
   less important as the descriptor is first read so it should be
   valid for the write.
 - removed the irq state
 - removed the weird hack to catch a first valid descriptor.
 - fixed the read of the mac address 

 default-configs/arm-softmmu.mak |1 +
 hw/net/Makefile.objs|1 +
 hw/net/ftgmac100.c  | 1003 +++
 include/hw/net/ftgmac100.h  |   60 +++
 4 files changed, 1065 insertions(+)
 create mode 100644 hw/net/ftgmac100.c
 create mode 100644 include/hw/net/ftgmac100.h

diff --git a/default-configs/arm-softmmu.mak b/default-configs/arm-softmmu.mak
index 1e3bd2b8ca23..78d7af03a2e8 100644
--- a/default-configs/arm-softmmu.mak
+++ b/default-configs/arm-softmmu.mak
@@ -29,6 +29,7 @@ CONFIG_LAN9118=y
 CONFIG_SMC91C111=y
 CONFIG_ALLWINNER_EMAC=y
 CONFIG_IMX_FEC=y
+CONFIG_FTGMAC100=y
 CONFIG_DS1338=y
 CONFIG_PFLASH_CFI01=y
 CONFIG_PFLASH_CFI02=y
diff --git a/hw/net/Makefile.objs b/hw/net/Makefile.objs
index 6a95d92d37f8..5ddaffe63a46 100644
--- a/hw/net/Makefile.objs
+++ b/hw/net/Makefile.objs
@@ -26,6 +26,7 @@ common-obj-$(CONFIG_IMX_FEC) += imx_fec.o
 common-obj-$(CONFIG_CADENCE) += cadence_gem.o
 common-obj-$(CONFIG_STELLARIS_ENET) += stellaris_enet.o
 common-obj-$(CONFIG_LANCE) += lance.o
+common-obj-$(CONFIG_FTGMAC100) += ftgmac100.o
 
 obj-$(CONFIG_ETRAXFS) += etraxfs_eth.o
 obj-$(CONFIG_COLDFIRE) += mcf_fec.o
diff --git a/hw/net/ftgmac100.c b/hw/net/ftgmac100.c
new file mode 100644
index ..fc309b3629fd
--- /dev/null
+++ b/hw/net/ftgmac100.c
@@ -0,0 +1,1003 @@
+/*
+ * Faraday FTGMAC100 Gigabit Ethernet
+ *
+ * Copyright (C) 2016-2017, IBM Corporation.
+ *
+ * Based on Coldfire Fast Ethernet Controller emulation.
+ *
+ * Copyright (c) 2007 CodeSourcery.
+ *
+ * This code is licensed under the GPL version 2 or later. See the
+ * COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "hw/net/ftgmac100.h"
+#include "sysemu/dma.h"
+#include "qemu/log.h"
+#include "net/checksum.h"
+#include "net/eth.h"
+#include "hw/net/mii.h"
+
+/* For crc32 */
+#include 
+
+/*
+ * FTGMAC100 registers
+ */
+#define FTGMAC100_ISR 0x00
+#define FTGMAC100_IER 0x04
+#define FTGMAC100_MAC_MADR0x08
+#define FTGMAC100_MAC_LADR0x0c
+#define FTGMAC100_MATH0   0x10
+#define FTGMAC100_MATH1   0x14
+#define FTGMAC100_NPTXPD  0x18
+#define FTGMAC100_RXPD0x1C
+#define FTGMAC100_NPTXR_BADR  0x20
+#define FTGMAC100_RXR_BADR0x24
+#define FTGMAC100_HPTXPD  0x28
+#define FTGMAC100_HPTXR_BADR  0x2c
+#define FTGMAC100_ITC 0x30
+#define FTGMAC100_APTC0x34
+#define FTGMAC100_DBLAC   0x38
+#define FTGMAC100_REVR0x40
+#define FTGMAC100_FEAR1   0x44
+#define FTGMAC100_RBSR0x4c
+#define FTGMAC100_TPAFCR  0x48
+
+#define FTGMAC100_MACCR   0x50
+#define FTGMAC100_MACSR   0x54
+#define FTGMAC100_PHYCR   0x60
+#define FTGMAC100_PHYDATA 0x64
+#define FTGMAC100_FCR 0x68
+
+/*
+ * Interrupt status register & interrupt

[Qemu-devel] [PATCH v3 3/5] net/ftgmac100: add a 'aspeed' property

2017-04-14 Thread Cédric Le Goater

The Aspeed SoCs have a different definition of the end of the ring
buffer bit. Add a property to specify which set of bits should be used
by the NIC.

Signed-off-by: Cédric Le Goater 
---
 hw/net/ftgmac100.c | 17 +++--
 include/hw/net/ftgmac100.h |  4 
 2 files changed, 19 insertions(+), 2 deletions(-)

diff --git a/hw/net/ftgmac100.c b/hw/net/ftgmac100.c
index fc309b3629fd..4c218bd79c64 100644
--- a/hw/net/ftgmac100.c
+++ b/hw/net/ftgmac100.c
@@ -126,6 +126,7 @@
 #define FTGMAC100_TXDES0_CRC_ERR (1 << 19)
 #define FTGMAC100_TXDES0_LTS (1 << 28)
 #define FTGMAC100_TXDES0_FTS (1 << 29)
+#define FTGMAC100_TXDES0_EDOTR_ASPEED(1 << 30)
 #define FTGMAC100_TXDES0_TXDMA_OWN   (1 << 31)
 
 #define FTGMAC100_TXDES1_VLANTAG_CI(x)   ((x) & 0x)
@@ -154,6 +155,7 @@
 #define FTGMAC100_RXDES0_PAUSE_FRAME (1 << 25)
 #define FTGMAC100_RXDES0_LRS (1 << 28)
 #define FTGMAC100_RXDES0_FRS (1 << 29)
+#define FTGMAC100_RXDES0_EDORR_ASPEED(1 << 30)
 #define FTGMAC100_RXDES0_RXPKT_RDY   (1 << 31)
 
 #define FTGMAC100_RXDES1_VLANTAG_CI  0x
@@ -462,7 +464,7 @@ static void ftgmac100_do_tx(FTGMAC100State *s, uint32_t 
tx_ring,
 /* Write back the modified descriptor.  */
 ftgmac100_write_bd(, addr);
 /* Advance to the next descriptor.  */
-if (bd.des0 & FTGMAC100_TXDES0_EDOTR) {
+if (bd.des0 & s->txdes0_edotr) {
 addr = tx_ring;
 } else {
 addr += sizeof(FTGMAC100Desc);
@@ -880,7 +882,7 @@ static ssize_t ftgmac100_receive(NetClientState *nc, const 
uint8_t *buf,
 s->isr |= FTGMAC100_INT_RPKT_FIFO;
 }
 ftgmac100_write_bd(, addr);
-if (bd.des0 & FTGMAC100_RXDES0_EDORR) {
+if (bd.des0 & s->rxdes0_edorr) {
 addr = s->rx_ring;
 } else {
 addr += sizeof(FTGMAC100Desc);
@@ -921,6 +923,14 @@ static void ftgmac100_realize(DeviceState *dev, Error 
**errp)
 FTGMAC100State *s = FTGMAC100(dev);
 SysBusDevice *sbd = SYS_BUS_DEVICE(dev);
 
+if (s->aspeed) {
+s->txdes0_edotr = FTGMAC100_TXDES0_EDOTR_ASPEED;
+s->rxdes0_edorr = FTGMAC100_RXDES0_EDORR_ASPEED;
+} else {
+s->txdes0_edotr = FTGMAC100_TXDES0_EDOTR;
+s->rxdes0_edorr = FTGMAC100_RXDES0_EDORR;
+}
+
 memory_region_init_io(>iomem, OBJECT(dev), _ops, s,
   TYPE_FTGMAC100, 0x2000);
 sysbus_init_mmio(sbd, >iomem);
@@ -967,11 +977,14 @@ static const VMStateDescription vmstate_ftgmac100 = {
 VMSTATE_UINT32(phy_advertise, FTGMAC100State),
 VMSTATE_UINT32(phy_int, FTGMAC100State),
 VMSTATE_UINT32(phy_int_mask, FTGMAC100State),
+VMSTATE_UINT32(txdes0_edotr, FTGMAC100State),
+VMSTATE_UINT32(rxdes0_edorr, FTGMAC100State),
 VMSTATE_END_OF_LIST()
 }
 };
 
 static Property ftgmac100_properties[] = {
+DEFINE_PROP_BOOL("aspeed", FTGMAC100State, aspeed, false),
 DEFINE_NIC_PROPERTIES(FTGMAC100State, conf),
 DEFINE_PROP_END_OF_LIST(),
 };
diff --git a/include/hw/net/ftgmac100.h b/include/hw/net/ftgmac100.h
index 962a718f43b8..d9bc589fbf72 100644
--- a/include/hw/net/ftgmac100.h
+++ b/include/hw/net/ftgmac100.h
@@ -55,6 +55,10 @@ typedef struct FTGMAC100State {
 uint32_t phy_advertise;
 uint32_t phy_int;
 uint32_t phy_int_mask;
+
+bool aspeed;
+uint32_t txdes0_edotr;
+uint32_t rxdes0_edorr;
 } FTGMAC100State;
 
 #endif
-- 
2.7.4

[Qemu-devel] [PATCH v3 1/5] hw/net: add MII definitions

2017-04-14 Thread Cédric Le Goater

This adds comments on the Basic mode control and status registers bit
definitions. It also adds a couple of bits for 1000BASE-T and the
RealTek 8211E PHY for the FTGMAC100 model to use.

Signed-off-by: Cédric Le Goater 
Reviewed-by: Philippe Mathieu-Daudé 
---
 include/hw/net/mii.h | 71 +++-
 1 file changed, 53 insertions(+), 18 deletions(-)

diff --git a/include/hw/net/mii.h b/include/hw/net/mii.h
index 9fdd7bbe75f1..6ce48a6d7802 100644
--- a/include/hw/net/mii.h
+++ b/include/hw/net/mii.h
@@ -22,13 +22,20 @@
 #define MII_H
 
 /* PHY registers */
-#define MII_BMCR0
-#define MII_BMSR1
-#define MII_PHYID1  2
-#define MII_PHYID2  3
-#define MII_ANAR4
-#define MII_ANLPAR  5
-#define MII_ANER6
+#define MII_BMCR0  /* Basic mode control register */
+#define MII_BMSR1  /* Basic mode status register */
+#define MII_PHYID1  2  /* ID register 1 */
+#define MII_PHYID2  3  /* ID register 2 */
+#define MII_ANAR4  /* Autonegotiation advertisement */
+#define MII_ANLPAR  5  /* Autonegotiation lnk partner abilities */
+#define MII_ANER6  /* Autonegotiation expansion */
+#define MII_ANNP7  /* Autonegotiation next page */
+#define MII_ANLPRNP 8  /* Autonegotiation link partner rx next page */
+#define MII_CTRL10009  /* 1000BASE-T control */
+#define MII_STAT100010 /* 1000BASE-T status */
+#define MII_MDDACR  13 /* MMD access control */
+#define MII_MDDAADR 14 /* MMD access address data */
+#define MII_EXTSTAT 15 /* Extended Status */
 #define MII_NSR 16
 #define MII_LBREMR  17
 #define MII_REC 18
@@ -38,19 +45,33 @@
 /* PHY registers fields */
 #define MII_BMCR_RESET  (1 << 15)
 #define MII_BMCR_LOOPBACK   (1 << 14)
-#define MII_BMCR_SPEED  (1 << 13)
-#define MII_BMCR_AUTOEN (1 << 12)
-#define MII_BMCR_FD (1 << 8)
+#define MII_BMCR_SPEED100   (1 << 13)  /* LSB of Speed (100) */
+#define MII_BMCR_SPEED  MII_BMCR_SPEED100
+#define MII_BMCR_AUTOEN (1 << 12) /* Autonegotiation enable */
+#define MII_BMCR_PDOWN  (1 << 11) /* Enable low power state */
+#define MII_BMCR_ISOLATE(1 << 10) /* Isolate data paths from MII */
+#define MII_BMCR_ANRESTART  (1 << 9)  /* Auto negotiation restart */
+#define MII_BMCR_FD (1 << 8)  /* Set duplex mode */
+#define MII_BMCR_CTST   (1 << 7)  /* Collision test */
+#define MII_BMCR_SPEED1000  (1 << 6)  /* MSB of Speed (1000) */
 
-#define MII_BMSR_100TX_FD   (1 << 14)
-#define MII_BMSR_100TX_HD   (1 << 13)
-#define MII_BMSR_10T_FD (1 << 12)
-#define MII_BMSR_10T_HD (1 << 11)
-#define MII_BMSR_MFPS   (1 << 6)
-#define MII_BMSR_AN_COMP(1 << 5)
-#define MII_BMSR_AUTONEG(1 << 3)
-#define MII_BMSR_LINK_ST(1 << 2)
+#define MII_BMSR_100TX_FD   (1 << 14) /* Can do 100mbps, full-duplex */
+#define MII_BMSR_100TX_HD   (1 << 13) /* Can do 100mbps, half-duplex */
+#define MII_BMSR_10T_FD (1 << 12) /* Can do 10mbps, full-duplex */
+#define MII_BMSR_10T_HD (1 << 11) /* Can do 10mbps, half-duplex */
+#define MII_BMSR_100T2_FD   (1 << 10) /* Can do 100mbps T2, full-duplex */
+#define MII_BMSR_100T2_HD   (1 << 9)  /* Can do 100mbps T2, half-duplex */
+#define MII_BMSR_EXTSTAT(1 << 8)  /* Extended status in register 15 */
+#define MII_BMSR_MFPS   (1 << 6)  /* MII Frame Preamble Suppression */
+#define MII_BMSR_AN_COMP(1 << 5)  /* Auto-negotiation complete */
+#define MII_BMSR_RFAULT (1 << 4)  /* Remote fault */
+#define MII_BMSR_AUTONEG(1 << 3)  /* Able to do auto-negotiation */
+#define MII_BMSR_LINK_ST(1 << 2)  /* Link status */
+#define MII_BMSR_JABBER (1 << 1)  /* Jabber detected */
+#define MII_BMSR_EXTCAP (1 << 0)  /* Ext-reg capability */
 
+#define MII_ANAR_PAUSE_ASYM (1 << 11) /* Try for asymetric pause */
+#define MII_ANAR_PAUSE  (1 << 10) /* Try for pause */
 #define MII_ANAR_TXFD   (1 << 8)
 #define MII_ANAR_TX (1 << 7)
 #define MII_ANAR_10FD   (1 << 6)
@@ -58,17 +79,31 @@
 #define MII_ANAR_CSMACD (1 << 0)
 
 #define MII_ANLPAR_ACK  (1 << 14)
+#define MII_ANLPAR_PAUSEASY (1 << 11) /* can pause asymmetrically */
+#define MII_ANLPAR_PAUSE(1 << 10) /* can pause */
 #define MII_ANLPAR_TXFD (1 << 8)
 #define MII_ANLPAR_TX   (1 << 7)
 #define MII_ANLPAR_10FD (1 << 6)
 #define MII_ANLPAR_10   (1 << 5)
 #define MII_ANLPAR_CSMACD   (1 << 0)
 
+#define MII_ANER_NWAY   (1 << 0) /* Can do N-way auto-nego */
+
+#define MII_CTRL1000_FULL   (1 << 9)  /* 1000BASE-T full duplex */
+#define MII_CTRL1000_HALF   (1 << 8)  /* 1000BASE-T half duplex */
+
+#define MII_STAT1000_FULL   (1 << 11) /* 1000BASE-T full duplex */
+#define MII_STAT1000_HALF   (1 << 10) /* 1000BASE-T half duplex */
+
 /* List of vendor identifiers */
 /* RealTek 8201 */
 #define

[Qemu-devel] [PATCH v3 0/5] FTGMAC100 nic model for the Aspeed SoCs

2017-04-14 Thread Cédric Le Goater

Hi,

The Aspeed SoCs AST2400 and AST2500 have two FTGMAC100 ethernet
controllers. This series proposes a model for this device and a way to
customize the bit definitions which are slightly different from the
Faraday definitions.

The last patch adds a fake NC-SI (Network Controller Sideband
Interface) backend to pretend a NIC is being managed. This is only
usable with the slirp stack for the moment.

The model has been tested on the 'palmetto', 'romulus' and
'ast2500-evb' machines using different implementations of the Linux
driver and with U-Boot. It has been stressed with iperf.

The full patchset is available here, based on QEMU v2.9.0-rc4 :

  https://github.com/legoater/qemu/commits/aspeed

To test, grab a palmetto or a romulus BMC firmware image :

  
https://openpower.xyz/job/openbmc-build/distro=ubuntu,target=palmetto/lastSuccessfulBuild/artifact/images/palmetto/flash-palmetto
  
https://openpower.xyz/job/openbmc-build/distro=ubuntu,target=romulus/lastSuccessfulBuild/artifact/images/romulus/flash-romulus

and start the machines with  :

  qemu-system-arm -m 512 -M romulus-bmc  -drive 
file=flash-romulus,format=raw,if=mtd  -nographic
  qemu-system-arm -m 512 -M palmetto-bmc -drive 
file=flash-palmetto,format=raw,if=mtd -nographic

the fake NC-SI NIC should come up :

  ftgmac100 1e66.ethernet eth0: NCSI interface up

and to log, use credentials root/0penBmc

Thanks,

C. 

Changes since v2:
 - rebased on QEMU v2.9.0-rc4
 - modified the 'ncsi-pkt.h' file to add its location under the Linux
   tree 
 - removed useless zeroing of the 'reserved' fields 
 - introduced a ncsi_rsp_handlers array to hold the payload size and
   a specific handler routine for each command   
 - fixed the output size of the frame to match the command payload
   size

Changes since v1:
 - removed TODO comments and used LOG_UNIMP in the read/write ops
 - used camelcase for struct names and typedefs. 
 - removed the useless struct definitions for ring descriptors and
   the alignment pragma   
 - introduced a frame buffer at the machine level to reduce stack
   usage in the tx path.
 - introduced symbolic constants for PHY values.
 - introduced rtl8211e PHY chip specific registers 
 - removed qemu_set_irq() in reset path 
 - checked for dma_memory_read() errors. Also for write but that was
   less important as the descriptor is first read so it should be
   valid for the write.
 - removed the irq state
 - removed the weird hack to catch a first valid descriptor.
 - fixed the read of the mac address 

Cédric Le Goater (5):
  hw/net: add MII definitions
  net: add FTGMAC100 support
  net/ftgmac100: add a 'aspeed' property
  aspeed: add a FTGMAC100 nic
  slirp: add a fake NC-SI backend

 default-configs/arm-softmmu.mak |1 +
 hw/arm/aspeed_soc.c |   21 +
 hw/net/Makefile.objs|1 +
 hw/net/ftgmac100.c  | 1016 +++
 include/hw/arm/aspeed_soc.h |2 +
 include/hw/net/ftgmac100.h  |   64 +++
 include/hw/net/mii.h|   71 ++-
 include/net/eth.h   |1 +
 slirp/Makefile.objs |2 +-
 slirp/ncsi-pkt.h|  419 
 slirp/ncsi.c|  130 +
 slirp/slirp.c   |4 +
 slirp/slirp.h   |3 +
 13 files changed, 1716 insertions(+), 19 deletions(-)
 create mode 100644 hw/net/ftgmac100.c
 create mode 100644 include/hw/net/ftgmac100.h
 create mode 100644 slirp/ncsi-pkt.h
 create mode 100644 slirp/ncsi.c

-- 
2.7.4

Re: [Qemu-devel] [PATCH for-2.9-rc5 v2] block: Drain BH in bdrv_drained_begin

2017-04-14 Thread Fam Zheng

On Fri, 04/14 16:02, Fam Zheng wrote:
> @@ -167,8 +167,12 @@ static bool bdrv_drain_recurse(BlockDriverState *bs)
>  bs->drv->bdrv_drain(bs);
>  }
>  
> -QLIST_FOREACH(child, >children, next) {
> -waited |= bdrv_drain_recurse(child->bs);
> +QLIST_FOREACH_SAFE(child, >children, next, tmp) {
> +BlockDriverState *bs = child->bs;

Poor name, @bs is the function parameter already, let's use something like @bs1.

Fam

[Qemu-devel] [PATCH for-2.9-rc5 v2] block: Drain BH in bdrv_drained_begin

2017-04-14 Thread Fam Zheng

During block job completion, nothing is preventing
block_job_defer_to_main_loop_bh from being called in a nested
aio_poll(), which is a trouble, such as in this code path:

qmp_block_commit
  commit_active_start
bdrv_reopen
  bdrv_reopen_multiple
bdrv_reopen_prepare
  bdrv_flush
aio_poll
  aio_bh_poll
aio_bh_call
  block_job_defer_to_main_loop_bh
stream_complete
  bdrv_reopen

block_job_defer_to_main_loop_bh is the last step of the stream job,
which should have been "paused" by the bdrv_drained_begin/end in
bdrv_reopen_multiple, but it is not done because it's in the form of a
main loop BH.

Similar to why block jobs should be paused between drained_begin and
drained_end, BHs they schedule must be excluded as well.  To achieve
this, this patch forces draining the BH in BDRV_POLL_WHILE.

Also because the BH in question can do bdrv_unref and child replacing,
protect @bs carefully to avoid use-after-free.

As a side effect this fixes a hang in block_job_detach_aio_context
during system_reset when a block job is ready:

#0  0x55aa79f3 in bdrv_drain_recurse
#1  0x55aa825d in bdrv_drained_begin
#2  0x55aa8449 in bdrv_drain
#3  0x55a9c356 in blk_drain
#4  0x55aa3cfd in mirror_drain
#5  0x55a66e11 in block_job_detach_aio_context
#6  0x55a62f4d in bdrv_detach_aio_context
#7  0x55a63116 in bdrv_set_aio_context
#8  0x55a9d326 in blk_set_aio_context
#9  0x557e38da in virtio_blk_data_plane_stop
#10 0x559f9d5f in virtio_bus_stop_ioeventfd
#11 0x559fa49b in virtio_bus_stop_ioeventfd
#12 0x559f6a18 in virtio_pci_stop_ioeventfd
#13 0x559f6a18 in virtio_pci_reset
#14 0x559139a9 in qdev_reset_one
#15 0x55916738 in qbus_walk_children
#16 0x55913318 in qdev_walk_children
#17 0x55916738 in qbus_walk_children
#18 0x559168ca in qemu_devices_reset
#19 0x5581fcbb in pc_machine_reset
#20 0x558a4d96 in qemu_system_reset
#21 0x5577157a in main_loop_should_exit
#22 0x5577157a in main_loop
#23 0x5577157a in main

The rationale is that the loop in block_job_detach_aio_context cannot
make any progress in pausing/completing the job, because bs->in_flight
is 0, so bdrv_drain doesn't process the block_job_defer_to_main_loop
BH. With this patch, it does.

Reported-by: Jeff Cody 
Signed-off-by: Fam Zheng 

---

v2: Do the BH poll in BDRV_POLL_WHILE to cover bdrv_drain_all_begin as
well. [Kevin]
---
 block/io.c| 10 +++---
 include/block/block.h | 21 +
 2 files changed, 20 insertions(+), 11 deletions(-)

diff --git a/block/io.c b/block/io.c
index 8706bfa..a472157 100644
--- a/block/io.c
+++ b/block/io.c
@@ -158,7 +158,7 @@ bool bdrv_requests_pending(BlockDriverState *bs)
 
 static bool bdrv_drain_recurse(BlockDriverState *bs)
 {
-BdrvChild *child;
+BdrvChild *child, *tmp;
 bool waited;
 
 waited = BDRV_POLL_WHILE(bs, atomic_read(>in_flight) > 0);
@@ -167,8 +167,12 @@ static bool bdrv_drain_recurse(BlockDriverState *bs)
 bs->drv->bdrv_drain(bs);
 }
 
-QLIST_FOREACH(child, >children, next) {
-waited |= bdrv_drain_recurse(child->bs);
+QLIST_FOREACH_SAFE(child, >children, next, tmp) {
+BlockDriverState *bs = child->bs;
+assert(bs->refcnt > 0);
+bdrv_ref(bs);
+waited |= bdrv_drain_recurse(bs);
+bdrv_unref(bs);
 }
 
 return waited;
diff --git a/include/block/block.h b/include/block/block.h
index 97d4330..3f05e71 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -381,12 +381,13 @@ void bdrv_drain_all(void);
 
 #define BDRV_POLL_WHILE(bs, cond) ({   \
 bool waited_ = false;  \
+bool busy_ = true; \
 BlockDriverState *bs_ = (bs);  \
 AioContext *ctx_ = bdrv_get_aio_context(bs_);  \
 if (aio_context_in_iothread(ctx_)) {   \
-while ((cond)) {   \
-aio_poll(ctx_, true);  \
-waited_ = true;\
+while ((cond) || busy_) {  \
+busy_ = aio_poll(ctx_, (cond));\
+waited_ |= !!(cond);   \
 }  \
 } else {   \
 assert(qemu_get_current_aio_context() ==   \
@@ -398,11 +399,15 @@ void bdrv_drain_all(void);
  */

Re: [Qemu-devel] [Qemu-block] [RFC] Proposed qcow2 extension: subcluster allocation

2017-04-14 Thread Denis V. Lunev

[skipped...]

> Hi Denis,
>
> I've read this entire thread now and I really like Berto's summary which
> I think is one of the best recaps of existing qcow2 problems and this
> discussion so far.
>
> I understand your opinion that we should focus on compatible changes
> before incompatible ones, and I also understand that you are very
> concerned about physical fragmentation for reducing long-term IO.
>
> What I don't understand is why you think that subclusters will increase
> fragmentation. If we admit that fragmentation is a problem now, surely
> increasing cluster sizes to 1 or 2 MB will only help to *reduce*
> physical fragmentation, right?
>
> Subclusters as far as I am understanding them will not actually allow
> subclusters to be located at virtually disparate locations, we will
> continue to allocate clusters as normal -- we'll just keep track of
> which portions of the cluster we've written to to help us optimize COW*.
>
> So if we have a 1MB cluster with 64k subclusters as a hypothetical, if
> we write just the first subcluster, we'll have a map like:
>
> X---
>
> Whatever actually happens to exist in this space, whether it be a hole
> we punched via fallocate or literal zeroes, this space is known to the
> filesystem to be contiguous.
>
> If we write to the last subcluster, we'll get:
>
> X--X
>
> And again, maybe the dashes are a fallocate hole, maybe they're zeroes.
> but the last subcluster is located virtually exactly 15 subclusters
> behind the first, they're not physically contiguous. We've saved the
> space between them. Future out-of-order writes won't contribute to any
> fragmentation, at least at this level.
>
> You might be able to reduce COW from 5 IOPs to 3 IOPs, but if we tune
> the subclusters right, we'll have *zero*, won't we?
>
> As far as I can tell, this lets us do a lot of good things all at once:
>
> (1) We get some COW optimizations (for reasons Berto and Kevin have
> detailed)
Yes. We are fine with COW. Let us assume that we will have issued read
entire cluster command after the COW, in the situation

X--X

with a backing store. This is possible even with 1-2 Mb cluster size.
I have seen 4-5 Mb requests from the guest in the real life. In this
case we will have 3 IOP:
read left X area, read backing, read right X.
This is the real drawback of the approach, if sub-cluster size is really
small enough, which should be the case for optimal COW. Thus we
will have random IO in the host instead of sequential one in guest.
Thus we have optimized COW at the cost of long term reads. This
is what I am worrying about as we can have a lot of such reads before
any further data change.

With real holes the situation is even worse. If we have real hole
(obtained with truncate), we are in the case mentioned by Roman.
Virtually the space is continuous, but we have host fragmentation,
equal to sub-cluster size.

We are in the tough position even with COW. If sub-cluster is
not equals to the size of the guest filesystem block, we still need
to do COW. The only win is the amount of data copied, but the loss
in the amount of IOPSes is the same. On the other hand, to get
real win, we should have properly aligned guest partitions, know
exactly guest filesystem block size etc etc etc. This is not that
easy as can seen.

> (2) We can increase our cluster size
> (3) L2 cache can cover bigger files with less memory
> (4) Fragmentation decreases.
yes. I like (2)-(4) a lot. But at my opinion we could stick to 1 Mb
without sub-clusters and still get (2)-(4).

>
> Is this not a win all around? We can improve throughput for a lot of
> different reasons all at once, here. Have I misunderstood the discussion
> so far, anyone?
>
> Please correct me where I am wrong and _ELI5_.
Have tried that, sorry for my illiteracy ;)

Den

Re: [Qemu-devel] [Qemu-block] [PATCH v6] migration/block: use blk_pwrite_zeroes for each zero cluster

2017-04-14 Thread 858585 jemmy

On Fri, Apr 14, 2017 at 2:38 PM, Stefan Hajnoczi  wrote:
> On Fri, Apr 14, 2017 at 7:00 AM, Fam Zheng  wrote:
>> On Thu, 04/13 10:34, jemmy858...@gmail.com wrote:
>>> From: Lidong Chen 
>>>
>>> BLOCK_SIZE is (1 << 20), qcow2 cluster size is 65536 by default,
>>> this may cause the qcow2 file size to be bigger after migration.
>>> This patch checks each cluster, using blk_pwrite_zeroes for each
>>> zero cluster.
>>>
>>> Reviewed-by: Stefan Hajnoczi 
>>> Signed-off-by: Lidong Chen 
>>> ---
>>> v6 changelog:
>>> Fix up some grammar in the comment.
>>> ---
>>>  migration/block.c | 35 +--
>>>  1 file changed, 33 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/migration/block.c b/migration/block.c
>>> index 7734ff7..41c7a55 100644
>>> --- a/migration/block.c
>>> +++ b/migration/block.c
>>> @@ -885,6 +885,8 @@ static int block_load(QEMUFile *f, void *opaque, int 
>>> version_id)
>>>  int64_t total_sectors = 0;
>>>  int nr_sectors;
>>>  int ret;
>>> +BlockDriverInfo bdi;
>>> +int cluster_size;
>>>
>>>  do {
>>>  addr = qemu_get_be64(f);
>>> @@ -919,6 +921,15 @@ static int block_load(QEMUFile *f, void *opaque, int 
>>> version_id)
>>>  error_report_err(local_err);
>>>  return -EINVAL;
>>>  }
>>> +
>>> +ret = bdrv_get_info(blk_bs(blk), );
>>> +if (ret == 0 && bdi.cluster_size > 0 &&
>>> +bdi.cluster_size <= BLOCK_SIZE &&
>>> +BLOCK_SIZE % bdi.cluster_size == 0) {
>>> +cluster_size = bdi.cluster_size;
>>> +} else {
>>> +cluster_size = BLOCK_SIZE;
>>> +}
>>>  }
>>>
>>>  if (total_sectors - addr < BDRV_SECTORS_PER_DIRTY_CHUNK) {
>>> @@ -932,10 +943,30 @@ static int block_load(QEMUFile *f, void *opaque, int 
>>> version_id)
>>>  nr_sectors * BDRV_SECTOR_SIZE,
>>>  BDRV_REQ_MAY_UNMAP);
>>>  } else {
>>> +int i;
>>> +int64_t cur_addr;
>>> +uint8_t *cur_buf;
>>> +
>>>  buf = g_malloc(BLOCK_SIZE);
>>>  qemu_get_buffer(f, buf, BLOCK_SIZE);
>>> -ret = blk_pwrite(blk, addr * BDRV_SECTOR_SIZE, buf,
>>> - nr_sectors * BDRV_SECTOR_SIZE, 0);
>>> +for (i = 0; i < BLOCK_SIZE / cluster_size; i++) {
>>> +cur_addr = addr * BDRV_SECTOR_SIZE + i * cluster_size;
>>> +cur_buf = buf + i * cluster_size;
>>> +
>>> +if ((!block_mig_state.zero_blocks ||
>>> +cluster_size < BLOCK_SIZE) &&
>>> +buffer_is_zero(cur_buf, cluster_size)) {
>>> +ret = blk_pwrite_zeroes(blk, cur_addr,
>>> +cluster_size,
>>> +BDRV_REQ_MAY_UNMAP);
>>> +} else {
>>> +ret = blk_pwrite(blk, cur_addr, cur_buf,
>>> + cluster_size, 0);
>>> +}
>>> +if (ret < 0) {
>>> +break;
>>> +}
>>> +}
>>>  g_free(buf);
>>>  }
>>
>> Sorry for asking this question so late, but, before it gets too late: did you
>> evaluate the performance impact of this change under real world workload?
>>
>> Effectively, if no cluster is zero, this patch still splits a big write into
>> small ones, which is the opposition of usual performance optimizations (i.e.
>> trying to coalesce requests).
>
> Good point!
>
> Another patch can modify the loop to perform the largest writes
> possible.  In other words, do not perform the write immediately and
> keep a cluster counter instead.  When the zero/non-zero state changes,
> perform the write for the accumulated cluster count.

if the zero/non-zero state changes very frequently, it will not work.

I also consider this way before i submit this patch.
but i find the performance is almost the same for qcow2 which
cluster_size is 65536.

I worry about some other format which have very small cluster_size,for
example 512.
but i don't find. please tell me if you know, and i will test it.

Do you think it's necessary if the size of the cluster_size is too small, we
can cluster_size*N instead?

Thanks.


>
> Stefan

Re: [Qemu-devel] [Qemu-block] [PATCH for-2.10] Revert "block/io: Comment out permission assertions"

2017-04-14 Thread Stefan Hajnoczi

On Fri, Apr 14, 2017 at 2:10 AM, John Snow  wrote:
> On 04/13/2017 10:34 AM, Max Reitz wrote:
>> On 13.04.2017 15:28, Stefan Hajnoczi wrote:
>>> On Tue, Apr 11, 2017 at 05:52:26PM +0200, Max Reitz wrote:
 This reverts commit e3e0003a8f6570aba1421ef99a0b383a43371a74.

 This commit was necessary for the 2.9 release because we were unable to
 fix the underlying issue(s) in time. However, we will be for 2.10.

 Signed-off-by: Max Reitz 
 ---
  block.c|  6 +-
  block/io.c | 12 ++--
  2 files changed, 3 insertions(+), 15 deletions(-)
>>>
>>> Should we merge a fix before enabling the assertion again?  It's a known
>>> issue.  Let people using qemu.git live a little and have fun without the
>>> inevitable SIGABRT coredumps.  We don't benefit if more people encounter
>>> this crash and duplicate work debugging it.
>>
>> Yes, we should probably merge the fixes we know about before. But after
>> that, I'd rather merge this patch as soon as possible so we do have a
>> chance of getting more reports (if anything else is broken) before the
>> next freeze. :-)
>>
>> Max
>>
>
> It's nice to have a working tree, but I think Max has a good point about
> wanting to see the reports sooner rather than later, especially if we
> want to roll a 2.9.1 very shortly after release.
>
> (Which I think we should.)

I interpreted Max's reply to mean "okay, let's merge a fix first and
then immediately enable the assertion again".

Your reply seems to interpret Max's email as "the assertion might
catch other bugs so it should be enabled immediately without a fix".

Those two are not the same :).

Stefan

[Qemu-devel] [Bug 1681439] Re: qemu-system-x86_64: hw/ide/core.c:685: ide_cancel_dma_sync: Assertion `s->bus->dma->aiocb == NULL' failed.

2017-04-14 Thread Michał Kępień via Qemu-devel

> Let's try and see if this doesn't fix your problem:
> https://github.com/jnsnow/qemu/commit/57bf2ccdfe8dd35838c1e6642bf9bd76dc9ad1a9

Sadly, no, it still crashes.  Same assertion, same canceled command,
same time to reproduce.

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1681439

Title:
  qemu-system-x86_64: hw/ide/core.c:685: ide_cancel_dma_sync: Assertion
  `s->bus->dma->aiocb == NULL' failed.

Status in QEMU:
  New

Bug description:
  Since upgrading to QEMU 2.8.0, my Windows 7 64-bit virtual machines
  started crashing due to the assertion quoted in the summary failing.
  The assertion in question was added by commit 9972354856 ("block: add
  BDS field to count in-flight requests").  My tests show that setting
  discard=unmap is needed to reproduce the issue.  Speaking of
  reproduction, it is a bit flaky, because I have been unable to come up
  with specific instructions that would allow the issue to be triggered
  outside of my environment, but I do have a semi-sane way of testing that
  appears to depend on a specific initial state of data on the underlying
  storage volume, actions taken within the VM and waiting for about 20
  minutes.

  Here is the shortest QEMU command line that I managed to reproduce the
  bug with:

  qemu-system-x86_64 \
  -machine pc-i440fx-2.7,accel=kvm \
  -m 3072 \
  -drive file=/dev/lvm/qemu,format=raw,if=ide,discard=unmap \
-netdev tap,id=hostnet0,ifname=tap0,script=no,downscript=no,vhost=on \
  -device virtio-net-pci,netdev=hostnet0 \
-vnc :0

  The underlying storage (/dev/lvm/qemu) is a thin LVM snapshot.

  QEMU was compiled using:

  ./configure --python=/usr/bin/python2.7 --target-list=x86_64-softmmu
  make -j3

  My virtualization environment is not really a critical one and
  reproduction is not that much of a hassle, so if you need me to gather
  further diagnostic information or test patches, I will be happy to help.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1681439/+subscriptions

Re: [Qemu-devel] [PATCH v2 0/5] FTGMAC100 nic model for the Aspeed SoCs

2017-04-14 Thread Cédric Le Goater

On 04/14/2017 08:26 AM, Jason Wang wrote:
> 
> 
> On 2017年04月13日 17:41, Cédric Le Goater wrote:
>> Hi,
>>
>> The Aspeed SoCs AST2400 and AST2500 have two FTGMAC100 ethernet
>> controllers. This serie proposes a model for this device and a way to
>> customize the bit definitions which are slightly different from the
>> Faraday definitions.
>>
>> The last patch adds a fake NC-SI (Network Controller Sideband
>> Interface) backend to pretend a NIC is being managed. This is only
>> usable with the slirp stack.
>>
>> The model has been tested on the 'palmetto', 'romulus' and
>> 'ast2500-evb' machines using different implementations of the Linux
>> driver and with U-Boot.  It has been stressed with iperf.
>>
>> Thanks,
>>
>> C.
>>
>> Cédric Le Goater (5):
>>hw/net: add MII definitions
>>net: add FTGMAC100 support
>>net/ftgmac100: add a 'aspeed' property
>>aspeed: add a FTGMAC100 nic
>>slirp: add a fake NC-SI backend
>>
>>   default-configs/arm-softmmu.mak |1 +
>>   hw/arm/aspeed_soc.c |   21 +
>>   hw/net/Makefile.objs|1 +
>>   hw/net/ftgmac100.c  | 1016 
>> +++
>>   include/hw/arm/aspeed_soc.h |2 +
>>   include/hw/net/ftgmac100.h  |   64 +++
>>   include/hw/net/mii.h|   71 ++-
>>   include/net/eth.h   |1 +
>>   slirp/Makefile.objs |2 +-
>>   slirp/ncsi-pkt.h|  418 
>>   slirp/ncsi.c|   78 +++
>>   slirp/slirp.c   |4 +
>>   slirp/slirp.h   |3 +
>>   13 files changed, 1663 insertions(+), 19 deletions(-)
>>   create mode 100644 hw/net/ftgmac100.c
>>   create mode 100644 include/hw/net/ftgmac100.h
>>   create mode 100644 slirp/ncsi-pkt.h
>>   create mode 100644 slirp/ncsi.c
>>
> 
> Want to merge but looks like patch 2 does not apply cleanly on qemu master.

ah. I will check that. Anyhow I have improved the fake NC-SI model. So I will 
send a v3. 

Thanks,

C.

Re: [Qemu-devel] [PATCH 02/15] colo-compare: implement the process of checkpoint

2017-04-14 Thread Jason Wang




On 2017年04月14日 14:22, Hailiang Zhang wrote:

Hi Jason,

On 2017/4/14 13:57, Jason Wang wrote:


On 2017年02月22日 17:31, Zhang Chen wrote:


On 02/22/2017 11:42 AM, zhanghailiang wrote:

While do checkpoint, we need to flush all the unhandled packets,
By using the filter notifier mechanism, we can easily to notify
every compare object to do this process, which runs inside
of compare threads as a coroutine.

Hi~ Jason and Hailiang.

I will send a patch set later about colo-compare notify mechanism for
Xen like this patch.
I want to add a new chardev socket way in colo-comapre connect to Xen
colo, for notify
checkpoint or failover, Because We have no choice to use this way
communicate with Xen codes.
That's means we will have two notify mechanism.
What do you think about this?


Thanks
Zhang Chen

I was thinking the possibility of using similar way to for colo compare.
E.g can we use socket? This can saves duplicated codes more or less.


Since there are too many sockets used by filter and COLO, (Two unix 
sockets and two
 tcp sockets for each vNIC), I don't want to introduce more ;) , but 
i'm not sure if it is
possible to make it more flexible and optional, abstract these 
duplicated codes,
pass the opened fd (No matter eventfd or socket fd ) as parameter, for 
example.

Is this way acceptable ?

Thanks,
Hailiang


Yes, that's kind of what I want. We don't want to use two message 
format. Passing a opened fd need management support, we still need a 
fallback if there's no management on top. For qemu/kvm, we can do all 
stuffs transparent to the cli by e.g socketpair() or others, but the key 
is to have a unified message format.


Thoughts?

Thanks




Thanks


.

Re: [Qemu-devel] [Qemu-block] [PATCH v6] migration/block: use blk_pwrite_zeroes for each zero cluster

2017-04-14 Thread Stefan Hajnoczi

On Fri, Apr 14, 2017 at 7:00 AM, Fam Zheng  wrote:
> On Thu, 04/13 10:34, jemmy858...@gmail.com wrote:
>> From: Lidong Chen 
>>
>> BLOCK_SIZE is (1 << 20), qcow2 cluster size is 65536 by default,
>> this may cause the qcow2 file size to be bigger after migration.
>> This patch checks each cluster, using blk_pwrite_zeroes for each
>> zero cluster.
>>
>> Reviewed-by: Stefan Hajnoczi 
>> Signed-off-by: Lidong Chen 
>> ---
>> v6 changelog:
>> Fix up some grammar in the comment.
>> ---
>>  migration/block.c | 35 +--
>>  1 file changed, 33 insertions(+), 2 deletions(-)
>>
>> diff --git a/migration/block.c b/migration/block.c
>> index 7734ff7..41c7a55 100644
>> --- a/migration/block.c
>> +++ b/migration/block.c
>> @@ -885,6 +885,8 @@ static int block_load(QEMUFile *f, void *opaque, int 
>> version_id)
>>  int64_t total_sectors = 0;
>>  int nr_sectors;
>>  int ret;
>> +BlockDriverInfo bdi;
>> +int cluster_size;
>>
>>  do {
>>  addr = qemu_get_be64(f);
>> @@ -919,6 +921,15 @@ static int block_load(QEMUFile *f, void *opaque, int 
>> version_id)
>>  error_report_err(local_err);
>>  return -EINVAL;
>>  }
>> +
>> +ret = bdrv_get_info(blk_bs(blk), );
>> +if (ret == 0 && bdi.cluster_size > 0 &&
>> +bdi.cluster_size <= BLOCK_SIZE &&
>> +BLOCK_SIZE % bdi.cluster_size == 0) {
>> +cluster_size = bdi.cluster_size;
>> +} else {
>> +cluster_size = BLOCK_SIZE;
>> +}
>>  }
>>
>>  if (total_sectors - addr < BDRV_SECTORS_PER_DIRTY_CHUNK) {
>> @@ -932,10 +943,30 @@ static int block_load(QEMUFile *f, void *opaque, int 
>> version_id)
>>  nr_sectors * BDRV_SECTOR_SIZE,
>>  BDRV_REQ_MAY_UNMAP);
>>  } else {
>> +int i;
>> +int64_t cur_addr;
>> +uint8_t *cur_buf;
>> +
>>  buf = g_malloc(BLOCK_SIZE);
>>  qemu_get_buffer(f, buf, BLOCK_SIZE);
>> -ret = blk_pwrite(blk, addr * BDRV_SECTOR_SIZE, buf,
>> - nr_sectors * BDRV_SECTOR_SIZE, 0);
>> +for (i = 0; i < BLOCK_SIZE / cluster_size; i++) {
>> +cur_addr = addr * BDRV_SECTOR_SIZE + i * cluster_size;
>> +cur_buf = buf + i * cluster_size;
>> +
>> +if ((!block_mig_state.zero_blocks ||
>> +cluster_size < BLOCK_SIZE) &&
>> +buffer_is_zero(cur_buf, cluster_size)) {
>> +ret = blk_pwrite_zeroes(blk, cur_addr,
>> +cluster_size,
>> +BDRV_REQ_MAY_UNMAP);
>> +} else {
>> +ret = blk_pwrite(blk, cur_addr, cur_buf,
>> + cluster_size, 0);
>> +}
>> +if (ret < 0) {
>> +break;
>> +}
>> +}
>>  g_free(buf);
>>  }
>
> Sorry for asking this question so late, but, before it gets too late: did you
> evaluate the performance impact of this change under real world workload?
>
> Effectively, if no cluster is zero, this patch still splits a big write into
> small ones, which is the opposition of usual performance optimizations (i.e.
> trying to coalesce requests).

Good point!

Another patch can modify the loop to perform the largest writes
possible.  In other words, do not perform the write immediately and
keep a cluster counter instead.  When the zero/non-zero state changes,
perform the write for the accumulated cluster count.

Stefan

Re: [Qemu-devel] [PATCH v6] migration/block: use blk_pwrite_zeroes for each zero cluster

2017-04-14 Thread 858585 jemmy

On Fri, Apr 14, 2017 at 2:00 PM, Fam Zheng  wrote:
> On Thu, 04/13 10:34, jemmy858...@gmail.com wrote:
>> From: Lidong Chen 
>>
>> BLOCK_SIZE is (1 << 20), qcow2 cluster size is 65536 by default,
>> this may cause the qcow2 file size to be bigger after migration.
>> This patch checks each cluster, using blk_pwrite_zeroes for each
>> zero cluster.
>>
>> Reviewed-by: Stefan Hajnoczi 
>> Signed-off-by: Lidong Chen 
>> ---
>> v6 changelog:
>> Fix up some grammar in the comment.
>> ---
>>  migration/block.c | 35 +--
>>  1 file changed, 33 insertions(+), 2 deletions(-)
>>
>> diff --git a/migration/block.c b/migration/block.c
>> index 7734ff7..41c7a55 100644
>> --- a/migration/block.c
>> +++ b/migration/block.c
>> @@ -885,6 +885,8 @@ static int block_load(QEMUFile *f, void *opaque, int 
>> version_id)
>>  int64_t total_sectors = 0;
>>  int nr_sectors;
>>  int ret;
>> +BlockDriverInfo bdi;
>> +int cluster_size;
>>
>>  do {
>>  addr = qemu_get_be64(f);
>> @@ -919,6 +921,15 @@ static int block_load(QEMUFile *f, void *opaque, int 
>> version_id)
>>  error_report_err(local_err);
>>  return -EINVAL;
>>  }
>> +
>> +ret = bdrv_get_info(blk_bs(blk), );
>> +if (ret == 0 && bdi.cluster_size > 0 &&
>> +bdi.cluster_size <= BLOCK_SIZE &&
>> +BLOCK_SIZE % bdi.cluster_size == 0) {
>> +cluster_size = bdi.cluster_size;
>> +} else {
>> +cluster_size = BLOCK_SIZE;
>> +}
>>  }
>>
>>  if (total_sectors - addr < BDRV_SECTORS_PER_DIRTY_CHUNK) {
>> @@ -932,10 +943,30 @@ static int block_load(QEMUFile *f, void *opaque, int 
>> version_id)
>>  nr_sectors * BDRV_SECTOR_SIZE,
>>  BDRV_REQ_MAY_UNMAP);
>>  } else {
>> +int i;
>> +int64_t cur_addr;
>> +uint8_t *cur_buf;
>> +
>>  buf = g_malloc(BLOCK_SIZE);
>>  qemu_get_buffer(f, buf, BLOCK_SIZE);
>> -ret = blk_pwrite(blk, addr * BDRV_SECTOR_SIZE, buf,
>> - nr_sectors * BDRV_SECTOR_SIZE, 0);
>> +for (i = 0; i < BLOCK_SIZE / cluster_size; i++) {
>> +cur_addr = addr * BDRV_SECTOR_SIZE + i * cluster_size;
>> +cur_buf = buf + i * cluster_size;
>> +
>> +if ((!block_mig_state.zero_blocks ||
>> +cluster_size < BLOCK_SIZE) &&
>> +buffer_is_zero(cur_buf, cluster_size)) {
>> +ret = blk_pwrite_zeroes(blk, cur_addr,
>> +cluster_size,
>> +BDRV_REQ_MAY_UNMAP);
>> +} else {
>> +ret = blk_pwrite(blk, cur_addr, cur_buf,
>> + cluster_size, 0);
>> +}
>> +if (ret < 0) {
>> +break;
>> +}
>> +}
>>  g_free(buf);
>>  }
>
> Sorry for asking this question so late, but, before it gets too late: did you
> evaluate the performance impact of this change under real world workload?
>
> Effectively, if no cluster is zero, this patch still splits a big write into
> small ones, which is the opposition of usual performance optimizations (i.e.
> trying to coalesce requests).

I test this patch for qcow2, the migration speed is the same before
apply this patch.

Do you know some other format which have very small cluster size?

>
> Fam

[Qemu-devel] [PATCH] net/filter-rewriter.c: Fix trace_event print bug

2017-04-14 Thread Zhang Chen

Because of inet_ntoa() return a statically allocated buffer,
subsequent calls will overwrite, So we fix this bug.

Signed-off-by: Zhang Chen 
---
 net/filter-rewriter.c | 14 --
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/net/filter-rewriter.c b/net/filter-rewriter.c
index 91cf6b2..aad00b4 100644
--- a/net/filter-rewriter.c
+++ b/net/filter-rewriter.c
@@ -69,8 +69,13 @@ static int handle_primary_tcp_pkt(RewriterState *rf,
 
 tcp_pkt = (struct tcphdr *)pkt->transport_header;
 if (trace_event_get_state(TRACE_COLO_FILTER_REWRITER_DEBUG)) {
+char ip_src[20], ip_dst[20];
+
+strcpy(ip_src, inet_ntoa(pkt->ip->ip_src));
+strcpy(ip_dst, inet_ntoa(pkt->ip->ip_dst));
+
 trace_colo_filter_rewriter_pkt_info(__func__,
-inet_ntoa(pkt->ip->ip_src), inet_ntoa(pkt->ip->ip_dst),
+ip_src, ip_dst,
 ntohl(tcp_pkt->th_seq), ntohl(tcp_pkt->th_ack),
 tcp_pkt->th_flags);
 trace_colo_filter_rewriter_conn_offset(conn->offset);
@@ -144,8 +149,13 @@ static int handle_secondary_tcp_pkt(RewriterState *rf,
 tcp_pkt = (struct tcphdr *)pkt->transport_header;
 
 if (trace_event_get_state(TRACE_COLO_FILTER_REWRITER_DEBUG)) {
+char ip_src[20], ip_dst[20];
+
+strcpy(ip_src, inet_ntoa(pkt->ip->ip_src));
+strcpy(ip_dst, inet_ntoa(pkt->ip->ip_dst));
+
 trace_colo_filter_rewriter_pkt_info(__func__,
-inet_ntoa(pkt->ip->ip_src), inet_ntoa(pkt->ip->ip_dst),
+ip_src, ip_dst,
 ntohl(tcp_pkt->th_seq), ntohl(tcp_pkt->th_ack),
 tcp_pkt->th_flags);
 trace_colo_filter_rewriter_conn_offset(conn->offset);
-- 
2.7.4

Re: [Qemu-devel] [PATCH] progress: Show current progress on SIGINFO

2017-04-14 Thread Fam Zheng

On Thu, 04/13 22:38, Max Reitz wrote:
> On 08.02.2017 00:57, Max Reitz wrote:
> > Currently we only print progress information on retrieval of SIGUSR1.
> > Some systems have a dedicated SIGINFO for this, however, so it should be
> > handled appropriately if it is available.
> > 
> > Buglink: https://bugs.launchpad.net/qemu/+bug/1662468
> > Signed-off-by: Max Reitz 
> > ---
> > Can anyone test this on a BSD system? O:-)
> > ---
> >  util/qemu-progress.c | 3 +++
> >  qemu-img.texi| 3 ++-
> >  2 files changed, 5 insertions(+), 1 deletion(-)
> 
> Ping
> 
> (If nobody replies in the next month or so, I'm just going to merge
> this, as util/qemu-progress.c doesn't have a maintainer.)

qemu-img.c is the sole user, so it implicitly belongs to the block maintainers,
I assume.

Actually, why not add this file under block layer section of MAINTAINERS?

Fam

Re: [Qemu-devel] [PATCH v2 0/5] FTGMAC100 nic model for the Aspeed SoCs

2017-04-14 Thread Jason Wang




On 2017年04月13日 17:41, Cédric Le Goater wrote:

Hi,

The Aspeed SoCs AST2400 and AST2500 have two FTGMAC100 ethernet
controllers. This serie proposes a model for this device and a way to
customize the bit definitions which are slightly different from the
Faraday definitions.

The last patch adds a fake NC-SI (Network Controller Sideband
Interface) backend to pretend a NIC is being managed. This is only
usable with the slirp stack.

The model has been tested on the 'palmetto', 'romulus' and
'ast2500-evb' machines using different implementations of the Linux
driver and with U-Boot.  It has been stressed with iperf.

Thanks,

C.

Cédric Le Goater (5):
   hw/net: add MII definitions
   net: add FTGMAC100 support
   net/ftgmac100: add a 'aspeed' property
   aspeed: add a FTGMAC100 nic
   slirp: add a fake NC-SI backend

  default-configs/arm-softmmu.mak |1 +
  hw/arm/aspeed_soc.c |   21 +
  hw/net/Makefile.objs|1 +
  hw/net/ftgmac100.c  | 1016 +++
  include/hw/arm/aspeed_soc.h |2 +
  include/hw/net/ftgmac100.h  |   64 +++
  include/hw/net/mii.h|   71 ++-
  include/net/eth.h   |1 +
  slirp/Makefile.objs |2 +-
  slirp/ncsi-pkt.h|  418 
  slirp/ncsi.c|   78 +++
  slirp/slirp.c   |4 +
  slirp/slirp.h   |3 +
  13 files changed, 1663 insertions(+), 19 deletions(-)
  create mode 100644 hw/net/ftgmac100.c
  create mode 100644 include/hw/net/ftgmac100.h
  create mode 100644 slirp/ncsi-pkt.h
  create mode 100644 slirp/ncsi.c



Want to merge but looks like patch 2 does not apply cleanly on qemu master.

Thanks

Re: [Qemu-devel] [PATCH 02/15] colo-compare: implement the process of checkpoint

2017-04-14 Thread Hailiang Zhang


Hi Jason,

On 2017/4/14 13:57, Jason Wang wrote:


On 2017年02月22日 17:31, Zhang Chen wrote:


On 02/22/2017 11:42 AM, zhanghailiang wrote:

While do checkpoint, we need to flush all the unhandled packets,
By using the filter notifier mechanism, we can easily to notify
every compare object to do this process, which runs inside
of compare threads as a coroutine.

Hi~ Jason and Hailiang.

I will send a patch set later about colo-compare notify mechanism for
Xen like this patch.
I want to add a new chardev socket way in colo-comapre connect to Xen
colo, for notify
checkpoint or failover, Because We have no choice to use this way
communicate with Xen codes.
That's means we will have two notify mechanism.
What do you think about this?


Thanks
Zhang Chen

I was thinking the possibility of using similar way to for colo compare.
E.g can we use socket? This can saves duplicated codes more or less.


Since there are too many sockets used by filter and COLO, (Two unix sockets and 
two
 tcp sockets for each vNIC), I don't want to introduce more ;) , but i'm not 
sure if it is
possible to make it more flexible and optional, abstract these duplicated codes,
pass the opened fd (No matter eventfd or socket fd ) as parameter, for example.
Is this way acceptable ?

Thanks,
Hailiang


Thanks


.

Re: [Qemu-devel] [Xen-devel][PATCH] configure: introduce --enable-xen-fb-backend

2017-04-14 Thread Oleksandr Andrushchenko


On 04/14/2017 03:12 AM, Stefano Stabellini wrote:

On Tue, 11 Apr 2017, Oleksandr Andrushchenko wrote:

From: Oleksandr Andrushchenko 

For some use cases when Xen framebuffer/input backend
is not a part of Qemu it is required to disable it,
because of conflicting access to input/display devices.
Introduce additional configuration option for explicit
input/display control.

In these cases when you don't want xenfb, why don't you just remove
"vfb" from the xl config file? QEMU only starts the xenfb backend when
requested by the toolstack.

Is it because you have an alternative xenfb backend? If so, is it really
fully xenfb compatible, or is it a different protocol? If it is a
different protocol, I suggest you rename your frontend/backend PV device
name to something different from "vfb".


Well, offending part is vkbd actually (for multi-touch
we run our own user-space backend which supports
kbd/ptr/mtouch), but vfb and vkbd is the same backend
in QEMU. So, I am ok for vfb, but just want vkbd off
So, there are 2 options:
1. At compile time remove vkbd and still allow vfb
2. Remove xenfb completely, if acceptable (this is my case)

Signed-off-by: Oleksandr Andrushchenko 
---
  configure | 18 ++
  hw/display/Makefile.objs  |  2 +-
  hw/xen/xen_backend.c  |  2 ++
  hw/xenpv/xen_machine_pv.c |  4 
  4 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/configure b/configure
index 476210b1b93f..b805cb908f03 100755
--- a/configure
+++ b/configure
@@ -220,6 +220,7 @@ xen=""
  xen_ctrl_version=""
  xen_pv_domain_build="no"
  xen_pci_passthrough=""
+xen_fb_backend=""
  linux_aio=""
  cap_ng=""
  attr=""
@@ -909,6 +910,10 @@ for opt do
;;
--enable-xen-pv-domain-build) xen_pv_domain_build="yes"
;;
+  --disable-xen-fb-backend) xen_fb_backend="no"
+  ;;
+  --enable-xen-fb-backend) xen_fb_backend="yes"
+  ;;
--disable-brlapi) brlapi="no"
;;
--enable-brlapi) brlapi="yes"
@@ -1368,6 +1373,7 @@ disabled with --disable-FEATURE, default is enabled if 
available:
virtfs  VirtFS
xen xen backend driver support
xen-pci-passthrough
+  xen-fb-backend  framebuffer/input backend support
brlapi  BrlAPI (Braile)
curlcurl connectivity
fdt fdt device tree
@@ -2213,6 +2219,15 @@ if test "$xen_pv_domain_build" = "yes" &&
   "which requires Xen support."
  fi
  
+if test "$xen_fb_backend" != "no"; then

+   if test "$xen" = "yes"; then
+ xen_fb_backend=yes
+   else
+ error_exit "User requested feature Xen framebufer backend support" \
+" but this feature requires Xen support."
+   fi
+fi
+
  ##
  # Sparse probe
  if test "$sparse" != "no" ; then
@@ -5444,6 +5459,9 @@ if test "$xen" = "yes" ; then
if test "$xen_pv_domain_build" = "yes" ; then
  echo "CONFIG_XEN_PV_DOMAIN_BUILD=y" >> $config_host_mak
fi
+  if test "$xen_fb_backend" = "yes" ; then
+echo "CONFIG_XEN_FB_BACKEND=y" >> $config_host_mak
+  fi
  fi
  if test "$linux_aio" = "yes" ; then
echo "CONFIG_LINUX_AIO=y" >> $config_host_mak
diff --git a/hw/display/Makefile.objs b/hw/display/Makefile.objs
index 063889beaf4a..f5ec97ed4f48 100644
--- a/hw/display/Makefile.objs
+++ b/hw/display/Makefile.objs
@@ -5,7 +5,7 @@ common-obj-$(CONFIG_JAZZ_LED) += jazz_led.o
  common-obj-$(CONFIG_PL110) += pl110.o
  common-obj-$(CONFIG_SSD0303) += ssd0303.o
  common-obj-$(CONFIG_SSD0323) += ssd0323.o
-common-obj-$(CONFIG_XEN_BACKEND) += xenfb.o
+common-obj-$(CONFIG_XEN_FB_BACKEND) += xenfb.o
  
  common-obj-$(CONFIG_VGA_PCI) += vga-pci.o

  common-obj-$(CONFIG_VGA_ISA) += vga-isa.o
diff --git a/hw/xen/xen_backend.c b/hw/xen/xen_backend.c
index d1190041ae12..5146cbba6ca5 100644
--- a/hw/xen/xen_backend.c
+++ b/hw/xen/xen_backend.c
@@ -582,7 +582,9 @@ void xen_be_register_common(void)
  xen_set_dynamic_sysbus();
  
  xen_be_register("console", _console_ops);

+#ifdef CONFIG_XEN_FB_BACKEND
  xen_be_register("vkbd", _kbdmouse_ops);
+#endif
  xen_be_register("qdisk", _blkdev_ops);
  #ifdef CONFIG_USB_LIBUSB
  xen_be_register("qusb", _usb_ops);
diff --git a/hw/xenpv/xen_machine_pv.c b/hw/xenpv/xen_machine_pv.c
index 79aef4ecc37b..b731344c3f0a 100644
--- a/hw/xenpv/xen_machine_pv.c
+++ b/hw/xenpv/xen_machine_pv.c
@@ -68,7 +68,9 @@ static void xen_init_pv(MachineState *machine)
  }
  
  xen_be_register_common();

+#ifdef CONFIG_XEN_FB_BACKEND
  xen_be_register("vfb", _framebuffer_ops);
+#endif
  xen_be_register("qnic", _netdev_ops);
  
  /* configure framebuffer */

@@ -95,8 +97,10 @@ static void xen_init_pv(MachineState *machine)
  /* config cleanup hook */
  atexit(xen_config_cleanup);
  
+#ifdef CONFIG_XEN_FB_BACKEND

  /* setup framebuffer */
  xen_init_display(xen_domid);
+#endif
  }
  
  static void xenpv_machine_init(MachineClass *mc)

--

Re: [Qemu-devel] [PATCH 0/5] mc146818rtc: fix Windows VM clock faster

2017-04-14 Thread Xiao Guangrong




On 04/14/2017 01:09 PM, Paolo Bonzini wrote:



On 13/04/2017 16:52, Xiao Guangrong wrote:

On 04/13/2017 04:39 PM, Xiao Guangrong wrote:

On 04/13/2017 02:37 PM, Paolo Bonzini wrote:

However, I think that the above should be exactly how the RTC should
work.  The original RTC circuit had 22 divider stages (see page 13 of
the datasheet[1], at the bottom right), and the periodic interrupt taps
the rising edge of one of the dividers (page 16, second paragraph).  The
datasheet also never mentions a comparator being used to trigger the
periodic interrupts.



That was my thought before, however, after more test, i am not sure if
re-configuring RegA changes these divider stages internal...


It's unlikely because there is a separate divider reset command.  But
Hailiang found the same problem, and Bochs does the same implementation
as you.



Happy to see the same approach is being used in Bochs. Thanks for your
check. :)

Re: [Qemu-devel] [PATCH v6] migration/block: use blk_pwrite_zeroes for each zero cluster

2017-04-14 Thread Fam Zheng

On Thu, 04/13 10:34, jemmy858...@gmail.com wrote:
> From: Lidong Chen 
> 
> BLOCK_SIZE is (1 << 20), qcow2 cluster size is 65536 by default,
> this may cause the qcow2 file size to be bigger after migration.
> This patch checks each cluster, using blk_pwrite_zeroes for each
> zero cluster.
> 
> Reviewed-by: Stefan Hajnoczi 
> Signed-off-by: Lidong Chen 
> ---
> v6 changelog:
> Fix up some grammar in the comment.
> ---
>  migration/block.c | 35 +--
>  1 file changed, 33 insertions(+), 2 deletions(-)
> 
> diff --git a/migration/block.c b/migration/block.c
> index 7734ff7..41c7a55 100644
> --- a/migration/block.c
> +++ b/migration/block.c
> @@ -885,6 +885,8 @@ static int block_load(QEMUFile *f, void *opaque, int 
> version_id)
>  int64_t total_sectors = 0;
>  int nr_sectors;
>  int ret;
> +BlockDriverInfo bdi;
> +int cluster_size;
>  
>  do {
>  addr = qemu_get_be64(f);
> @@ -919,6 +921,15 @@ static int block_load(QEMUFile *f, void *opaque, int 
> version_id)
>  error_report_err(local_err);
>  return -EINVAL;
>  }
> +
> +ret = bdrv_get_info(blk_bs(blk), );
> +if (ret == 0 && bdi.cluster_size > 0 &&
> +bdi.cluster_size <= BLOCK_SIZE &&
> +BLOCK_SIZE % bdi.cluster_size == 0) {
> +cluster_size = bdi.cluster_size;
> +} else {
> +cluster_size = BLOCK_SIZE;
> +}
>  }
>  
>  if (total_sectors - addr < BDRV_SECTORS_PER_DIRTY_CHUNK) {
> @@ -932,10 +943,30 @@ static int block_load(QEMUFile *f, void *opaque, int 
> version_id)
>  nr_sectors * BDRV_SECTOR_SIZE,
>  BDRV_REQ_MAY_UNMAP);
>  } else {
> +int i;
> +int64_t cur_addr;
> +uint8_t *cur_buf;
> +
>  buf = g_malloc(BLOCK_SIZE);
>  qemu_get_buffer(f, buf, BLOCK_SIZE);
> -ret = blk_pwrite(blk, addr * BDRV_SECTOR_SIZE, buf,
> - nr_sectors * BDRV_SECTOR_SIZE, 0);
> +for (i = 0; i < BLOCK_SIZE / cluster_size; i++) {
> +cur_addr = addr * BDRV_SECTOR_SIZE + i * cluster_size;
> +cur_buf = buf + i * cluster_size;
> +
> +if ((!block_mig_state.zero_blocks ||
> +cluster_size < BLOCK_SIZE) &&
> +buffer_is_zero(cur_buf, cluster_size)) {
> +ret = blk_pwrite_zeroes(blk, cur_addr,
> +cluster_size,
> +BDRV_REQ_MAY_UNMAP);
> +} else {
> +ret = blk_pwrite(blk, cur_addr, cur_buf,
> + cluster_size, 0);
> +}
> +if (ret < 0) {
> +break;
> +}
> +}
>  g_free(buf);
>  }

Sorry for asking this question so late, but, before it gets too late: did you
evaluate the performance impact of this change under real world workload?

Effectively, if no cluster is zero, this patch still splits a big write into
small ones, which is the opposition of usual performance optimizations (i.e.
trying to coalesce requests).

Fam

78 matches

Mail list logo