date:20160525

Re: [Qemu-devel] [PATCH v5 27/27] qemu-iotests: Add test case 153 for image locking

2016-05-25 Thread Max Reitz

On 17.05.2016 09:35, Fam Zheng wrote:
> Signed-off-by: Fam Zheng 
> ---
>  tests/qemu-iotests/153 | 197 +
>  tests/qemu-iotests/153.out | 426 
> +
>  tests/qemu-iotests/group   |   1 +
>  3 files changed, 624 insertions(+)
>  create mode 100755 tests/qemu-iotests/153
>  create mode 100644 tests/qemu-iotests/153.out
> 
> diff --git a/tests/qemu-iotests/153 b/tests/qemu-iotests/153
> new file mode 100755
> index 000..cbeda8e
> --- /dev/null
> +++ b/tests/qemu-iotests/153

[...]

> +for opts1 in "" "readonly=on" "lock-mode=off" "lock-mode=shared"; do
> +echo
> +echo "== Creating base image =="
> +TEST_IMG="${TEST_IMG}.base" _make_test_img $size
> +
> +echo
> +echo "== Creating test image =="
> +$QEMU_IMG create -f $IMGFMT "${TEST_IMG}" -b ${TEST_IMG}.base | 
> _filter_img_create

_make_test_img understands -b, so maybe you want to use it here.

Also I'm not sure why you're creating the images inside of this loop,
but it certainly isn't wrong.

But maybe you want to add a line that echos $opts1 somewhere here.

However you decide:

Reviewed-by: Max Reitz 



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH v3 05/15] block: Convert block job core to BlockBackend

2016-05-25 Thread Max Reitz

On 25.05.2016 14:29, Kevin Wolf wrote:
> This adds a new BlockBackend field to the BlockJob struct, which
> coexists with the BlockDriverState while converting the individual jobs.
> 
> When creating a block job, a new BlockBackend is created on top of the
> given BlockDriverState, and it is destroyed when the BlockJob ends. The
> reference to the BDS is now held by the BlockBackend instead of calling
> bdrv_ref/unref manually.
> 
> We have to be careful when we use bdrv_replace_in_backing_chain() in
> block jobs because this changes the BDS that job->blk points to. At the
> moment block jobs are too tightly coupled with their BDS, so that moving
> a job to another BDS isn't easily possible; therefore, we need to just
> manually undo this change afterwards.
> 
> Signed-off-by: Kevin Wolf 
> Reviewed-by: Eric Blake 
> Reviewed-by: Alberto Garcia 
> ---
>  block/mirror.c   |  3 +++
>  blockjob.c   | 37 -
>  include/block/blockjob.h |  3 ++-
>  3 files changed, 25 insertions(+), 18 deletions(-)

Reviewed-by: Max Reitz 



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH v3 07/15] stream: Use BlockBackend for I/O

2016-05-25 Thread Max Reitz

On 25.05.2016 14:29, Kevin Wolf wrote:
> This changes the streaming block job to use the job's BlockBackend for
> performing the COR reads. job->bs isn't used by the streaming code any
> more afterwards.
> 
> Signed-off-by: Kevin Wolf 
> Reviewed-by: Eric Blake 
> Reviewed-by: Alberto Garcia 
> ---
>  block/io.c|  9 -
>  block/stream.c| 15 +--
>  include/block/block.h |  2 --
>  trace-events  |  1 -
>  4 files changed, 9 insertions(+), 18 deletions(-)

Reviewed-by: Max Reitz 



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH v2 2/3] atomics: emit an smp_read_barrier_depends() barrier only for Sparc and Thread Sanitizer

2016-05-25 Thread Emilio G. Cota

On Wed, May 25, 2016 at 14:16:56 +0200, Paolo Bonzini wrote:
> On 24/05/2016 22:06, Emilio G. Cota wrote:
> > For correctness, smp_read_barrier_depends() is only required to
> > emit a barrier on Sparc hosts. However, we are currently emitting
> > a consume fence unconditionally.
> 
> Let's say why this is suboptimal:
> 
> ... and most compilers currently treat consume and acquire fences as 
> equivalent.
> 
> Likewise, let's add a comment like this:
> 
> +/* Most compilers currently treat consume and acquire the same, but really
> + * no processors except Alpha need a barrier here.  Leave it in if
> + * using Thread Sanitizer to avoid warnings, otherwise optimize it away.
> + */
> 
> If okay I can do the change myself.

On Wed, May 25, 2016 at 14:20:02 +0200, Paolo Bonzini wrote:
> On 24/05/2016 22:06, Emilio G. Cota wrote:
> > Currently we emit a consume-load in atomic_rcu_read. This is
> > overkill for non-Sparc hosts, and is only useful to make
> > things easier for Thread Sanitizer, which as far as I understand
> > works best without explicit fences.
> 
> Likewise:
> 
> Currently we emit a consume-load in atomic_rcu_read.  Because of
> limitations in current compilers, this is overkill for non-Alpha hosts
> and it is only useful to make Thread Sanitizer work.
> 
> and
> 
> +/* See above: most compilers currently treat consume and acquire the
> + * same, but this slows down atomic_rcu_read unnecessarily.
> + */

Please go ahead with these changes. Don't forget to s/Sparc/Alpha/ on the
commit messages! There are 3 bogus Sparc's in the commit log of
(my) patch 2/3, including the commit title.

Thanks,

Emilio

Re: [Qemu-devel] [PATCH v5 26/27] block: Turn on image locking by default

2016-05-25 Thread Max Reitz

On 17.05.2016 09:35, Fam Zheng wrote:
> Now that test cases are covered, we can turn it on.
> 
> Signed-off-by: Fam Zheng 
> ---
>  blockdev.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Reviewed-by: Max Reitz 



signature.asc
Description: OpenPGP digital signature

[Qemu-devel] [Bug 1585533] Re: cache-miss-rate / Invalid JSON

2016-05-25 Thread Paolo Bonzini

Marc, what distro are you running on?  QEMU 2.3 is not maintained
anymore upstream, so unless you are running on Ubuntu (and then we can
reuse this bug tracker) you'll have to reopen the bug in your distro.

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1585533

Title:
  cache-miss-rate / Invalid JSON

Status in QEMU:
  New

Bug description:
  Hi,

  We have VMs which were started with an older version than qemu 2.1
  which added "cache-miss-rate" property for XBZRLECacheStats. While
  trying to migrate the VM to a new host which is running a higher
  version (2.3) of Qemu we got an exception:

  virJSONValueFromString:1642 : internal error: cannot parse json {"return": 
{"expected-downtime": 1, "xbzrle-cache": {"bytes": 0, "cache-size": 67108864, 
"cache-miss-rate": -nan, "pages": 0, "overflow": 0, "cache-miss": 8933}, 
"status": "active", "disk": {"total": 429496729600, "dirty-sync-count": 0, 
"remaining": 193896382464, "mbps": 0, "transferred": 235600347136, "duplicate": 
0, "dirty-pages-rate": 0, "skipped": 0, "normal-bytes": 0, "normal": 0}, 
"setup-time": 13, "total-time": 1543124, "ram": {"total": 8599183360, 
"dirty-sync-count": 4, "remaining": 30695424, "mbps": 830.636997, 
"transferred": 3100448901, "duplicate": 1358341, "dirty-pages-rate": 7, 
"skipped": 0, "normal-bytes": 3082199040, "normal": 752490}}, "id": 
"libvirt-186200"}: lexical error: malformed number, a digit is required after 
the minus sign.
67108864, "cache-miss-rate": -nan, "pages": 0, "overflow": 0
   (right here) --^

  virNetClientStreamRaiseError:191 : stream aborted at client request

  
  Would it be possible to improve the JSON parser to skip the key if the value 
is incorrect instead of throwing an exception? Then hopefully qemu 2.3 or 
higher is able to handle the data without this property, falling back to its 
default.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1585533/+subscriptions

[Qemu-devel] [RFC PATCH V4 1/4] colo-compare: introduce colo compare initialization

2016-05-25 Thread Zhang Chen

Packets coming from the primary char indev will be sent to outdev
Packets coming from the secondary char dev will be dropped

usage:

primary:
-netdev tap,id=hn0,vhost=off,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown
-device e1000,id=e0,netdev=hn0,mac=52:a4:00:12:78:66
-chardev socket,id=mirror0,host=3.3.3.3,port=9003,server,nowait
-chardev socket,id=compare1,host=3.3.3.3,port=9004,server,nowait
-chardev socket,id=compare0,host=3.3.3.3,port=9001,server,nowait
-chardev socket,id=compare0-0,host=3.3.3.3,port=9001
-chardev socket,id=compare_out,host=3.3.3.3,port=9005,server,nowait
-chardev socket,id=compare_out0,host=3.3.3.3,port=9005
-object filter-mirror,id=m0,netdev=hn0,queue=tx,outdev=mirror0
-object filter-redirector,netdev=hn0,id=redire0,queue=rx,indev=compare_out
-object filter-redirector,netdev=hn0,id=redire1,queue=rx,outdev=compare0
-object 
colo-compare,id=comp0,primary_in=compare0-0,secondary_in=compare1,outdev=compare_out0

secondary:
-netdev tap,id=hn0,vhost=off,script=/etc/qemu-ifup,down script=/etc/qemu-ifdown
-device e1000,netdev=hn0,mac=52:a4:00:12:78:66
-chardev socket,id=red0,host=3.3.3.3,port=9003
-chardev socket,id=red1,host=3.3.3.3,port=9004
-object filter-redirector,id=f1,netdev=hn0,queue=tx,indev=red0
-object filter-redirector,id=f2,netdev=hn0,queue=rx,outdev=red1

Signed-off-by: Zhang Chen 
Signed-off-by: Li Zhijian 
Signed-off-by: Wen Congyang 
---
 net/Makefile.objs  |   1 +
 net/colo-compare.c | 299 +
 qemu-options.hx|  34 ++
 vl.c   |   3 +-
 4 files changed, 336 insertions(+), 1 deletion(-)
 create mode 100644 net/colo-compare.c

diff --git a/net/Makefile.objs b/net/Makefile.objs
index b7c22fd..ba92f73 100644
--- a/net/Makefile.objs
+++ b/net/Makefile.objs
@@ -16,3 +16,4 @@ common-obj-$(CONFIG_NETMAP) += netmap.o
 common-obj-y += filter.o
 common-obj-y += filter-buffer.o
 common-obj-y += filter-mirror.o
+common-obj-y += colo-compare.o
diff --git a/net/colo-compare.c b/net/colo-compare.c
new file mode 100644
index 000..9768ed8
--- /dev/null
+++ b/net/colo-compare.c
@@ -0,0 +1,299 @@
+/*
+ * COarse-grain LOck-stepping Virtual Machines for Non-stop Service (COLO)
+ * (a.k.a. Fault Tolerance or Continuous Replication)
+ *
+ * Copyright (c) 2016 HUAWEI TECHNOLOGIES CO., LTD.
+ * Copyright (c) 2016 FUJITSU LIMITED
+ * Copyright (c) 2016 Intel Corporation
+ *
+ * Author: Zhang Chen 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * later.  See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/error-report.h"
+#include "qemu-common.h"
+#include "qapi/qmp/qerror.h"
+#include "qapi/error.h"
+#include "net/net.h"
+#include "net/vhost_net.h"
+#include "qom/object_interfaces.h"
+#include "qemu/iov.h"
+#include "qom/object.h"
+#include "qemu/typedefs.h"
+#include "net/queue.h"
+#include "sysemu/char.h"
+#include "qemu/sockets.h"
+#include "qapi-visit.h"
+#include "trace.h"
+
+#define TYPE_COLO_COMPARE "colo-compare"
+#define COLO_COMPARE(obj) \
+OBJECT_CHECK(CompareState, (obj), TYPE_COLO_COMPARE)
+
+#define COMPARE_READ_LEN_MAX NET_BUFSIZE
+
+static QTAILQ_HEAD(, CompareState) net_compares =
+   QTAILQ_HEAD_INITIALIZER(net_compares);
+
+typedef struct CompareState {
+Object parent;
+
+char *pri_indev;
+char *sec_indev;
+char *outdev;
+CharDriverState *chr_pri_in;
+CharDriverState *chr_sec_in;
+CharDriverState *chr_out;
+QTAILQ_ENTRY(CompareState) next;
+SocketReadState pri_rs;
+SocketReadState sec_rs;
+} CompareState;
+
+typedef struct CompareClass {
+ObjectClass parent_class;
+} CompareClass;
+
+static int compare_chr_send(CharDriverState *out,
+const uint8_t *buf,
+uint32_t size)
+{
+int ret = 0;
+uint32_t len = htonl(size);
+
+if (!size) {
+return 0;
+}
+
+ret = qemu_chr_fe_write_all(out, (uint8_t *), sizeof(len));
+if (ret != sizeof(len)) {
+goto err;
+}
+
+ret = qemu_chr_fe_write_all(out, (uint8_t *)buf, size);
+if (ret != size) {
+goto err;
+}
+
+return 0;
+
+err:
+return ret < 0 ? ret : -EIO;
+}
+
+static int compare_chr_can_read(void *opaque)
+{
+return COMPARE_READ_LEN_MAX;
+}
+
+/*
+ * called from the main thread on the primary for packets
+ * arriving over the socket from the primary.
+ */
+static void compare_pri_chr_in(void *opaque, const uint8_t *buf, int size)
+{
+CompareState *s = COLO_COMPARE(opaque);
+int ret;
+
+ret = net_fill_rstate(>pri_rs, buf, size);
+if (ret == -1) {
+qemu_chr_add_handlers(s->chr_pri_in, NULL, NULL, NULL, NULL);
+error_report("colo-compare primary_in error");
+}
+}
+
+/*
+ * called from the main thread on the primary for packets
+ * arriving over the socket from the secondary.

Re: [Qemu-devel] [PATCH v3 03/15] block: Cancel jobs first in bdrv_close_all()

2016-05-25 Thread Max Reitz

On 25.05.2016 14:29, Kevin Wolf wrote:
> So far, bdrv_close_all() first removed all root BlockDriverStates of
> BlockBackends and monitor owned BDSes, and then assumed that the
> remaining BDSes must be related to jobs and cancelled these jobs.
> 
> This order doesn't work that well any more when block jobs use
> BlockBackends internally because then they will lose their BDS before
> being cancelled.
> 
> This patch changes bdrv_close_all() to first cancel all jobs and then
> remove all root BDSes from the remaining BBs.
> 
> Signed-off-by: Kevin Wolf 
> Reviewed-by: Eric Blake 
> Reviewed-by: Alberto Garcia 
> ---
>  block.c  | 23 ++-
>  blockjob.c   | 13 +
>  include/block/blockjob.h |  7 +++
>  3 files changed, 22 insertions(+), 21 deletions(-)

Reviewed-by: Max Reitz 



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH 11/12] vfio: register aer resume notification handler for aer resume

2016-05-25 Thread Zhou Jie

Hi, Alex

>> Do we even need a timer?  What if we simply spin?
>>
>> for (i = 0; i < 1000; i++) {
>> if (vdev->pci_aer_resume_signaled) {
>> break;
>> }
>> g_usleep(1000);
>> }
>>
>> if (i == 1000) {
>> /* We failed */
>> } else {
>> /* Proceed with reset */
>> }
>>
>> Does QEMU have enough concurrency to do this?  Thanks,On 2016/5/25
In my test, it can not work.
vfio_aer_resume_notifier_handler is invoked after vfio_pci_reset.
It is useless to wait in vfio_pci_reset.

Sincerely,
Zhou Jie

14:23, Zhou Jie wrote:

Hi, Alex

  3. Stall any access to the device until resume is signaled.

The code below doesn't actually do this, mmaps are disabled, but that
just means any device access will use the slow path through QEMU.  That
path is still fully enabled and so is config space access.

I will modify the code and find other way to
stall any access to the device.

I find that to stall any access to the device,
I need modify the following function.
1. vfio_region_read and vfio_region_write
   For stalling any access to the device bar region.
2. vfio_vga_read and vfio_vga_write
   For stalling any access to the vga device region.
3. vfio_pci_read_config and vfio_pci_write_config
   For stalling any access to the device config space.

What will happen if I don't stall any access to the device?

Sincerely,
Zhou Jie

Re: [Qemu-devel] [patch v6 11/12] vfio: register aer resume notification handler for aer resume

2016-05-25 Thread Alex Williamson

On Wed, 25 May 2016 11:45:11 +0300
"Michael S. Tsirkin"  wrote:

> On Tue, May 24, 2016 at 08:54:06PM -0600, Alex Williamson wrote:
> > On Tue, 24 May 2016 13:49:12 +0300
> > "Michael S. Tsirkin"  wrote:
> >   
> > > On Tue, Apr 26, 2016 at 08:48:15AM -0600, Alex Williamson wrote:  
> > > > I think that means that if we want to switch from a
> > > > simple halt-on-error to a mechanism for the guest to handle recovery,
> > > > we need to disable access to the device between being notified that the
> > > > error occurred and being notified to resume.
> > > 
> > > But this isn't what happens on bare metal.
> > > Errors are reported asynchronously and host might access the device
> > > meanwhile.  These accesses might or might not trigger more errors, but
> > > fundamentally this should not matter too much as device is going to be
> > > reset.  
> > 
> > Bare metal also doesn't have a hypervisor underneath performing a PCI
> > bus reset,  
> 
> This is where I get lost. I assumed we do reset when guest
> requests it. Isn't that the case? Why not?

Unless we can somehow opt-out vfio-pci devices from AER handling on the
host, then the host is going to try to recover the device as part of
the core AER handling, which is not driver directed.  The host driver
can register various callbacks, which is where we get the
error_detected notification, but until we get the resume notification,
I think we assume the device is being operated on by the core.

At the same time we're using the error_detected to signal the AER event
to the guest, which begins a similar recovery process.  We don't
particularly want to rely on how the host has recovered the device and
the guest driver needs to be aware that the device state may need to be
recovered, so we let the recovery proceed in both the host and guest,
but it seems we need some synchronization point and guest accesses to
the device while we know the host is still in recovery may do more harm
than good.

> > there's only one OS trying to control the device at a time,
> > so we have some clear differences from bare metal that I don't know we
> > can avoid.  The thought here was that we need to notify the guest at the
> > earliest point we can, but let the host recovery run to completion
> > before allowing the user to interact with the device.  Perhaps there is
> > no need to block region access to the device (ie. config space & BAR
> > resources), but I think we do need to somehow synchronize the bus resets
> > or else we get situations like that observed previously where the bus is
> > still in reset while userspace trys to proceed with using it.
> >  
> 
> Why do we have to trigger reset upon an error?
> Why not wait for guest to request reset?

Is there a way to opt-out of host AER handling?  Do we want to create a
special case for vfio-pci to do this?  Does doing so allow the user to
exploit the host in anyway, such as the user failing to recover the
device, continuing to signal error events on the host, and perhaps
leaving the device in a lest trustworthy state when returned to the
host (not that we should ever be trusting the state of the device at
that point).

> > The next question then would be whether that's QEMU's job or something
> > that should be done in the host kernel.  It's been proposed to add yet
> > another eventfd for the kernel vfio-pci to signal QEMU when a resume
> > notification has occured, but perhaps the better approach would be for
> > the hot reset ioctl (and base reset ioctl) to handle this situation more
> > transparently.  We could immediately return -EAGAIN and allow QEMU to
> > delay itself for any reset ioctl received after the AER error detected
> > event, but before the resume event.  We could also allow some sort of
> > timeout, that the ioctl might enter an interruptible sleep, woken on
> > the resume notification or timeout.  That sounds a bit better to me as
> > the specification of what's allowed between the error detected
> > notification and the resume notification is otherwise pretty poorly
> > defined.  
> 
> So if guest started reset, it might take a while for
> device to come out of that state, and access during this
> time might trigger errors. But that's already possible
> for guest to trigger, right?  How is this different?

You can look back through the history of this series, there was an
arbitrary delay added after reset because the device was still in
reset (presumably host and guest reset racing).  Yes we do not
currently block access to the device during a reset, the issue is
mostly that we don't expect device resets to be occurring in the host
except when directed by the guest.  In this case we expect a host
directed reset is occurring and the guest directed reset seems to be a
synchronization point.

> >  Do you think we can run completely asynchronous, letting the
> > host and guest bus resets race?  Thanks,
> > 
> > Alex  
> 
> I have a feeling we need to put some code out,
>

Re: [Qemu-devel] [PATCH RFC kernel] balloon: speed up inflating/deflating process

2016-05-25 Thread Li, Liang Z

> > > Interesting. How about instead of tell host, we do multiple scans,
> > > each time ignoring pages out of range?
> > >
> > >   for (pfn = min pfn; pfn < max pfn; pfn += 1G) {
> > >   foreach page
> > >   if page pfn < pfn || page pfn >= pfn + 1G
> > >   continue
> > >   set bit
> > >   tell host
> > >   }
> > >
> >
> > That means we have to allocate/free all the requested pages first, and then
> tell the host.
> > It works fine for inflating, but for deflating, because the page has
> > been deleted from the vb-> vb_dev_info->pages, so, we have to use a
> > struct to save the dequeued pages before calling
> > release_pages_balloon(),
> 
> struct list_head?  I think you can just replace set_page_pfns with
> list_add(>lru, _list);
> 
> 

That's is fine, I will retry and get back to you.

Thanks!

Liang
> 
> >  I think a page bitmap is the best struct to save these pages, because it
> consumes less memory.
> > And that bitmap should be large enough to save pfn 0 to max_pfn.
> >
> > If the above is true, then we are back to the square one. we really need a
> large page bitmap. Right?
> >
> > Liang
> 
> These look like implementation issues to me.
> 
> I think the below might be helpful (completely untested), your work can go
> on top.
> 
> --->
> 
> virtio-balloon: rework deflate to add page to a tmp list
> 
> Will allow faster notifications using a bitmap down the road.
> 
> Signed-off-by: Michael S. Tsirkin 
> 
> ---
> 
> diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
> index 476c0e3..44050a3 100644
> --- a/drivers/virtio/virtio_balloon.c
> +++ b/drivers/virtio/virtio_balloon.c
> @@ -195,8 +195,9 @@ static void release_pages_balloon(struct
> virtio_balloon *vb)  static unsigned leak_balloon(struct virtio_balloon *vb,
> size_t num)  {
>   unsigned num_freed_pages;
> - struct page *page;
> + struct page *page, *next;
>   struct balloon_dev_info *vb_dev_info = >vb_dev_info;
> + LIST_HEAD(pages);   /* Pages dequeued for handing to
> Host */
> 
>   /* We can only do one array worth at a time. */
>   num = min(num, ARRAY_SIZE(vb->pfns));
> @@ -207,10 +208,13 @@ static unsigned leak_balloon(struct virtio_balloon
> *vb, size_t num)
>   page = balloon_page_dequeue(vb_dev_info);
>   if (!page)
>   break;
> - set_page_pfns(vb, vb->pfns + vb->num_pfns, page);
> + list_add(>lru, );
>   vb->num_pages -= VIRTIO_BALLOON_PAGES_PER_PAGE;
>   }
> 
> + list_for_each_entry_safe(page, next, , lru)
> + set_page_pfns(vb, vb->pfns + vb->num_pfns, page);
> +
>   num_freed_pages = vb->num_pfns;
>   /*
>* Note that if
> --
> MST

[Qemu-devel] [PATCH] 9p: getattr: use fstat if we have a fd

2016-05-25 Thread Greg Kurz

If we have an opened fd, it is better to call fstat() as the underlying
file may have been unlinked and lstat() will fail.

Signed-off-by: Greg Kurz 
---

This QEMU patch goes with the create-unlink-getattr fix at:

http://git.kernel.org/cgit/linux/kernel/git/ericvh/v9fs.git/commit/?id=5eb393c464294b2183d526a25a344d7ae6ba8383

I could verify that a basic program doing open(O_CREAT)+unlink()+fstat() in
the guest works as expected.

Eric (or other v9fs maintainers),

Is there a chance the above kernel patch gets pushed upstream in a near
future ?

Cheers.

--
Greg

 hw/9pfs/9p.c |6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c
index 482a35ead51b..572aaf09930b 100644
--- a/hw/9pfs/9p.c
+++ b/hw/9pfs/9p.c
@@ -1103,7 +1103,11 @@ static void v9fs_getattr(void *opaque)
  * Currently we only support BASIC fields in stat, so there is no
  * need to look at request_mask.
  */
-retval = v9fs_co_lstat(pdu, >path, );
+if (fidp->fs.fd > 0) {
+retval = v9fs_co_fstat(pdu, fidp, );
+} else {
+retval = v9fs_co_lstat(pdu, >path, );
+}
 if (retval < 0) {
 goto out;
 }

Re: [Qemu-devel] [PATCH 12/13] vmdk: Convert to bdrv_co_pwrite_zeroes()

2016-05-25 Thread Eric Blake

On 05/25/2016 08:23 AM, Kevin Wolf wrote:
> Am 25.05.2016 um 00:25 hat Eric Blake geschrieben:
>> Another step on our continuing quest to switch to byte-based
>> interfaces.
>>
>> Signed-off-by: Eric Blake 
>> ---
>>  block/vmdk.c | 13 ++---
>>  1 file changed, 6 insertions(+), 7 deletions(-)
>>
>> diff --git a/block/vmdk.c b/block/vmdk.c
>> index 8494d63..284d7a0 100644
>> --- a/block/vmdk.c
>> +++ b/block/vmdk.c
>> @@ -1704,15 +1704,14 @@ static int vmdk_write_compressed(BlockDriverState 
>> *bs,
>>  }
>>  }
>>
>> -static int coroutine_fn vmdk_co_write_zeroes(BlockDriverState *bs,
>> - int64_t sector_num,
>> - int nb_sectors,
>> - BdrvRequestFlags flags)
>> +static int coroutine_fn vmdk_co_pwrite_zeroes(BlockDriverState *bs,
>> +  int64_t offset,
>> +  int count,
>> +  BdrvRequestFlags flags)
>>  {
>>  int ret;
>>  BDRVVmdkState *s = bs->opaque;
>> -uint64_t offset = sector_num * BDRV_SECTOR_SIZE;
>> -uint64_t bytes = nb_sectors * BDRV_SECTOR_SIZE;
>> +uint64_t bytes = count;
> 
> That's an unnecessary variable again. Whether you decide to change it or
> not:
> 
> Reviewed-by: Kevin Wolf 

Unnecessary, except that it is 64-bit instead of the block layer
interface 32-bit, and I didn't want to have to think too hard about how
'bytes' was used in the rest of the function if I used the narrower type
from the get-go.  I also think that 'int count' is fishy, because it
forces us to think about negative values and placating code sanitizers
on undefined shift values; maybe we'd be better with making all byte
interfaces use 'uint32_t' (but still limiting ourselves to 0x8000 or
2G for any power-of-two limit, and 0x size transactions would
not be possible if request_alignment is larger than 1).  If we made that
switch, I'd still want to keep 0 as a no-op transaction, and not a
special case for a 4G transaction.  Still, now might be the time to do it.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org

signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH v3 01/15] block: Rename blk_write_zeroes()

2016-05-25 Thread Max Reitz

On 25.05.2016 14:29, Kevin Wolf wrote:
> From: Eric Blake 
> 
> Commit 983a1600 changed the semantics of blk_write_zeroes() to
> be byte-based rather than sector-based, but did not change the
> name, which is an open invitation for other code to misuse the
> function.  Renaming to pwrite_zeroes() makes it more in line
> with other byte-based interfaces, and will help make it easier
> to track which remaining write_zeroes interfaces still need
> conversion.
> 
> Reported-by: Kevin Wolf 
> Signed-off-by: Eric Blake 
> Signed-off-by: Kevin Wolf 
> ---
>  block/block-backend.c  | 14 +++---
>  block/parallels.c  |  4 ++--
>  hw/scsi/scsi-disk.c|  2 +-
>  include/sysemu/block-backend.h | 14 +++---
>  qemu-img.c |  4 ++--
>  qemu-io-cmds.c | 22 +++---
>  6 files changed, 30 insertions(+), 30 deletions(-)

Reviewed-by: Max Reitz 



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH 00/23] GICv3 emulation

2016-05-25 Thread Shannon Zhao

On 2016年05月10日 01:29, Peter Maydell wrote:
> This series implements emulation of the GICv3 interrupt controller.
> It is based to some extent on previous patches from Shlomo and
> Pavel, but the bulk of it has turned out to be new code. (The
> combination of changing the underlying data structures, adding
> support for TrustZone and implementing proper GICv3 behaviour rather
> than borrowing fragments of GICv2 emulation code meant there wasn't
> much left to reuse.) I've tried to reflect this in the various
> authorship credits on the patches, but please let me know if you
> feel I got anything miscredited one way or the other.
> 
> Key points about the GICv3 emulated here:
>  * "non-legacy" only, ie system registers and affinity routing
>  * TrustZone is implemented
>  * no virtualization support
>  * only the "core" GICv3, so no LPI support (via ITS or otherwise)
> 
> I have included the "support KVM save/restore" patches from Pavel,
> reworked to use the new data structures, but they are only RFC
> status because the kernel API is not yet final (there are a couple
> of loose threads there to be followed up). Those patches are at the
> end of the series; I think everything else is in a commitable state
> (give or take code review).
> 
> Testing: I have confirmed that we can boot a Linux guest kernel,
> but not tried any other GIC-using guest code. I've done some light
> stress-testing using 'stress', and checked an SMP (2-cpu) boot.
> 
> Design: all the code here is in hw/intc/, split into several
> files to avoid them being huge. The interface between the CPU
> proper and the CPU interface is a bit ad-hoc (you can see the
> awkward seams that result from the choice to think of the cpu
> i/f as part of the GIC device rather than part of the CPU itself),
> but I think that if you put the cpu i/f in the CPU you'd end up
> with an ad-hoc interface and awkward seams in the other direction.
> The GICv3 device currently assumes it is always connected to all
> CPUs; we can change that later to allow some kind of QOM link
> property to specify the CPUs explicitly, but I think this is OK
> for now (and it's already a pretty huge set of code changes to
> have to review).
> 
> I include a workaround for a Linux kernel bug which otherwise
> prevents booting on the virt board. I've reported the bug to
> Marc Zyngier and he should post a kernel patch to fix it shortly,
> but I think we'll need to retain the workaround for the benefit
> of older kernels (an ugly but pragmatic choice).
> 
> 
> Code review, testing, attempts to run guests other than Linux
> welcome (UEFI, anybody?)
> 
> thanks
> -- PMM
> 
> Pavel Fedin (5):
>   target-arm: Add mp-affinity property for ARM CPU class
>   hw/intc/arm_gicv3: Add state information
>   hw/intc/arm_gicv3: Add vmstate descriptors
>   NOT-FOR-UPSTREAM: kernel: Add definitions for GICv3 attributes
>   RFC: hw/intc/arm_gicv3_kvm: Implement get/put functions
> 
> Peter Maydell (15):
>   migration: Define VMSTATE_UINT64_2DARRAY
>   bitops.h: Implement half-shuffle and half-unshuffle ops
>   target-arm: Define new arm_is_el3_or_mon() function
>   target-arm: Provide hook to tell GICv3 about changes of security state
>   hw/intc/arm_gicv3: Move irq lines into GICv3CPUState structure
>   hw/intc/arm_gicv3: Implement functions to identify next pending irq
>   hw/intc/arm_gicv3: Wire up distributor and redistributor MMIO regions
>   hw/intc/arm_gicv3: Implement gicv3_set_irq()
>   hw/intc/arm_gicv3: Implement GICv3 CPU interface registers
>   hw/intc/arm_gicv3: Implement gicv3_cpuif_update()
>   hw/intc/arm_gicv3: Implement CPU i/f SGI generation registers
>   hw/intc/arm_gicv3: Add IRQ handling CPU interface registers
>   target-arm/machine.c: Allow user to request GICv3 emulation
>   target-arm/monitor.c: Advertise emulated GICv3 in capabilities
>   hw/intc/arm_gicv3: Work around Linux assuming interrupts are group 1
> 
> Shlomo Pongratz (3):
>   hw/intc/arm_gicv3: ARM GICv3 device framework
>   hw/intc/arm_gicv3: Implement GICv3 distributor registers
>   hw/intc/arm_gicv3: Implement GICv3 redistributor registers
> 
Hi Peter, I have gone through these patches. I think I don't have
further comments. Sorry that I just reviewed the patches while studying
the SPEC, so there are only few problems I found(on the hand your
patches are good:) ). I hope I'll give effective comments on the next
version.

Thanks,
-- 
Shannon

Re: [Qemu-devel] [PATCH v3 09/15] mirror: Use BlockBackend for I/O

2016-05-25 Thread Max Reitz

On 25.05.2016 14:29, Kevin Wolf wrote:
> This changes the mirror block job to use the job's BlockBackend for
> performing its I/O. job->bs isn't used by the mirroring code any more
> afterwards.
> 
> Signed-off-by: Kevin Wolf 
> Reviewed-by: Eric Blake 
> ---
>  block/mirror.c | 70 
> --
>  blockdev.c |  4 +---
>  2 files changed, 40 insertions(+), 34 deletions(-)

Reviewed-by: Max Reitz 



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH v3 13/15] backup: Use BlockBackend for I/O

2016-05-25 Thread Max Reitz

On 25.05.2016 14:29, Kevin Wolf wrote:
> This changes the backup block job to use the job's BlockBackend for
> performing its I/O. job->bs isn't used by the backup code any more
> afterwards.
> 
> Signed-off-by: Kevin Wolf 
> Reviewed-by: Eric Blake 
> ---
>  block/backup.c| 46 +-
>  block/io.c|  9 -
>  blockdev.c|  4 +---
>  include/block/block.h |  2 --
>  trace-events  |  1 -
>  5 files changed, 22 insertions(+), 40 deletions(-)

Reviewed-by: Max Reitz 



signature.asc
Description: OpenPGP digital signature

[Qemu-devel] [PATCH 1/6] hw/char: QOM'ify pl011 model

2016-05-25 Thread xiaoqiang zhao

* drop qemu_char_get_next_serial and use chardev prop
* add pl011_create wrapper function to create pl011 uart device
* change affected board code to use the new way

Signed-off-by: xiaoqiang zhao 
---
 hw/arm/bcm2835_peripherals.c | 16 +++---
 hw/arm/highbank.c|  3 ++-
 hw/arm/integratorcp.c|  5 +++--
 hw/arm/realview.c|  9 
 hw/arm/stellaris.c   |  6 +++--
 hw/arm/versatilepb.c |  9 
 hw/arm/vexpress.c|  9 
 hw/arm/virt.c|  1 +
 hw/char/pl011.c  | 11 +-
 include/hw/char/pl011.h  | 52 
 10 files changed, 86 insertions(+), 35 deletions(-)
 create mode 100644 include/hw/char/pl011.h

diff --git a/hw/arm/bcm2835_peripherals.c b/hw/arm/bcm2835_peripherals.c
index 234d518..46c320f 100644
--- a/hw/arm/bcm2835_peripherals.c
+++ b/hw/arm/bcm2835_peripherals.c
@@ -14,6 +14,7 @@
 #include "hw/misc/bcm2835_mbox_defs.h"
 #include "hw/arm/raspi_platform.h"
 #include "sysemu/char.h"
+#include "sysemu/sysemu.h"
 
 /* Peripheral base address on the VC (GPU) system bus */
 #define BCM2835_VC_PERI_BASE 0x7e00
@@ -106,7 +107,6 @@ static void bcm2835_peripherals_realize(DeviceState *dev, 
Error **errp)
 MemoryRegion *ram;
 Error *err = NULL;
 uint32_t ram_size, vcram_size;
-CharDriverState *chr;
 int n;
 
 obj = object_property_get_link(OBJECT(dev), "ram", );
@@ -147,6 +147,7 @@ static void bcm2835_peripherals_realize(DeviceState *dev, 
Error **errp)
 sysbus_pass_irq(SYS_BUS_DEVICE(s), SYS_BUS_DEVICE(>ic));
 
 /* UART0 */
+qdev_prop_set_chr(DEVICE(>uart0), "chardev", serial_hds[0]);
 object_property_set_bool(OBJECT(s->uart0), true, "realized", );
 if (err) {
 error_propagate(errp, err);
@@ -158,17 +159,8 @@ static void bcm2835_peripherals_realize(DeviceState *dev, 
Error **errp)
 sysbus_connect_irq(s->uart0, 0,
 qdev_get_gpio_in_named(DEVICE(>ic), BCM2835_IC_GPU_IRQ,
INTERRUPT_UART));
-
 /* AUX / UART1 */
-/* TODO: don't call qemu_char_get_next_serial() here, instead set
- * chardev properties for each uart at the board level, once pl011
- * (uart0) has been updated to avoid qemu_char_get_next_serial()
- */
-chr = qemu_char_get_next_serial();
-if (chr == NULL) {
-chr = qemu_chr_new("bcm2835.uart1", "null", NULL);
-}
-qdev_prop_set_chr(DEVICE(>aux), "chardev", chr);
+qdev_prop_set_chr(DEVICE(>aux), "chardev", serial_hds[1]);
 
 object_property_set_bool(OBJECT(>aux), true, "realized", );
 if (err) {
@@ -292,8 +284,6 @@ static void bcm2835_peripherals_class_init(ObjectClass *oc, 
void *data)
 DeviceClass *dc = DEVICE_CLASS(oc);
 
 dc->realize = bcm2835_peripherals_realize;
-/* Reason: realize() method uses qemu_char_get_next_serial() */
-dc->cannot_instantiate_with_device_add_yet = true;
 }
 
 static const TypeInfo bcm2835_peripherals_type_info = {
diff --git a/hw/arm/highbank.c b/hw/arm/highbank.c
index 41029a6..80e5fd4 100644
--- a/hw/arm/highbank.c
+++ b/hw/arm/highbank.c
@@ -30,6 +30,7 @@
 #include "sysemu/block-backend.h"
 #include "exec/address-spaces.h"
 #include "qemu/error-report.h"
+#include "hw/char/pl011.h"
 
 #define SMP_BOOT_ADDR   0x100
 #define SMP_BOOT_REG0x40
@@ -326,7 +327,7 @@ static void calxeda_init(MachineState *machine, enum 
cxmachines machine_id)
 busdev = SYS_BUS_DEVICE(dev);
 sysbus_mmio_map(busdev, 0, 0xfff34000);
 sysbus_connect_irq(busdev, 0, pic[18]);
-sysbus_create_simple("pl011", 0xfff36000, pic[20]);
+pl011_create(0xfff36000, pic[20], serial_hds[0]);
 
 dev = qdev_create(NULL, "highbank-regs");
 qdev_init_nofail(dev);
diff --git a/hw/arm/integratorcp.c b/hw/arm/integratorcp.c
index 24f1687..96dc150 100644
--- a/hw/arm/integratorcp.c
+++ b/hw/arm/integratorcp.c
@@ -20,6 +20,7 @@
 #include "exec/address-spaces.h"
 #include "sysemu/sysemu.h"
 #include "qemu/error-report.h"
+#include "hw/char/pl011.h"
 
 #define TYPE_INTEGRATOR_CM "integrator_core"
 #define INTEGRATOR_CM(obj) \
@@ -588,8 +589,8 @@ static void integratorcp_init(MachineState *machine)
 sysbus_create_varargs("integrator_pit", 0x1300,
   pic[5], pic[6], pic[7], NULL);
 sysbus_create_simple("pl031", 0x1500, pic[8]);
-sysbus_create_simple("pl011", 0x1600, pic[1]);
-sysbus_create_simple("pl011", 0x1700, pic[2]);
+pl011_create(0x1600, pic[1], serial_hds[0]);
+pl011_create(0x1700, pic[2], serial_hds[1]);
 icp = sysbus_create_simple(TYPE_ICP_CONTROL_REGS, 0xcb00,
qdev_get_gpio_in(sic, 3));
 sysbus_create_simple("pl050_keyboard", 0x1800, pic[3]);
diff --git a/hw/arm/realview.c b/hw/arm/realview.c
index 3222b36..7d0aa6f 100644
--- a/hw/arm/realview.c
+++ b/hw/arm/realview.c
@@ -23,6 +23,7 @@
 #include

[Qemu-devel] [PATCH 5/6] hw/char: QOM'ify xilinx_uartlite model

2016-05-25 Thread xiaoqiang zhao

* drop qemu_char_get_next_serial and use chardev prop
* create xilinx_uartlite_create wrapper function to create
  xilinx_uartlite device
* change affected board code to use the new way

Signed-off-by: xiaoqiang zhao 
---
 hw/char/xilinx_uartlite.c| 10 +
 hw/microblaze/petalogix_s3adsp1800_mmu.c |  5 +++--
 include/hw/char/xilinx_uartlite.h| 35 
 3 files changed, 44 insertions(+), 6 deletions(-)
 create mode 100644 include/hw/char/xilinx_uartlite.h

diff --git a/hw/char/xilinx_uartlite.c b/hw/char/xilinx_uartlite.c
index 911af4a..4847efb 100644
--- a/hw/char/xilinx_uartlite.c
+++ b/hw/char/xilinx_uartlite.c
@@ -172,6 +172,11 @@ static const MemoryRegionOps uart_ops = {
 }
 };
 
+static Property xilinx_uartlite_properties[] = {
+DEFINE_PROP_CHR("chardev", XilinxUARTLite, chr),
+DEFINE_PROP_END_OF_LIST(),
+};
+
 static void uart_rx(void *opaque, const uint8_t *buf, int size)
 {
 XilinxUARTLite *s = opaque;
@@ -206,8 +211,6 @@ static void xilinx_uartlite_realize(DeviceState *dev, Error 
**errp)
 {
 XilinxUARTLite *s = XILINX_UARTLITE(dev);
 
-/* FIXME use a qdev chardev prop instead of qemu_char_get_next_serial() */
-s->chr = qemu_char_get_next_serial();
 if (s->chr)
 qemu_chr_add_handlers(s->chr, uart_can_rx, uart_rx, uart_event, s);
 }
@@ -229,8 +232,7 @@ static void xilinx_uartlite_class_init(ObjectClass *klass, 
void *data)
 
 dc->reset = xilinx_uartlite_reset;
 dc->realize = xilinx_uartlite_realize;
-/* Reason: realize() method uses qemu_char_get_next_serial() */
-dc->cannot_instantiate_with_device_add_yet = true;
+dc->props = xilinx_uartlite_properties;
 }
 
 static const TypeInfo xilinx_uartlite_info = {
diff --git a/hw/microblaze/petalogix_s3adsp1800_mmu.c 
b/hw/microblaze/petalogix_s3adsp1800_mmu.c
index f821e1c..423bcd7 100644
--- a/hw/microblaze/petalogix_s3adsp1800_mmu.c
+++ b/hw/microblaze/petalogix_s3adsp1800_mmu.c
@@ -36,6 +36,7 @@
 #include "hw/boards.h"
 #include "sysemu/block-backend.h"
 #include "exec/address-spaces.h"
+#include "hw/char/xilinx_uartlite.h"
 
 #include "boot.h"
 
@@ -103,8 +104,8 @@ petalogix_s3adsp1800_init(MachineState *machine)
 irq[i] = qdev_get_gpio_in(dev, i);
 }
 
-sysbus_create_simple("xlnx.xps-uartlite", UARTLITE_BASEADDR,
- irq[UARTLITE_IRQ]);
+xilinx_uartlite_create(UARTLITE_BASEADDR, irq[UARTLITE_IRQ],
+   serial_hds[0]);
 
 /* 2 timers at irq 2 @ 62 Mhz.  */
 dev = qdev_create(NULL, "xlnx.xps-timer");
diff --git a/include/hw/char/xilinx_uartlite.h 
b/include/hw/char/xilinx_uartlite.h
new file mode 100644
index 000..8b4fc54
--- /dev/null
+++ b/include/hw/char/xilinx_uartlite.h
@@ -0,0 +1,35 @@
+/*
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2 or later, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program.  If not, see .
+ */
+
+#ifndef XILINX_UARTLITE_H
+#define XILINX_UARTLITE_H
+
+static inline DeviceState *xilinx_uartlite_create(hwaddr addr,
+qemu_irq irq,
+CharDriverState *chr)
+{
+DeviceState *dev;
+SysBusDevice *s;
+
+dev = qdev_create(NULL, "xlnx.xps-uartlite");
+s = SYS_BUS_DEVICE(dev);
+qdev_prop_set_chr(dev, "chardev", chr);
+qdev_init_nofail(dev);
+sysbus_mmio_map(s, 0, addr);
+sysbus_connect_irq(s, 0, irq);
+
+return dev;
+}
+
+#endif
-- 
2.1.4

[Qemu-devel] [PATCH v2 1/4] xlnx-zynqmp: Add a secure prop to en/disable ARM Security Extensions

2016-05-25 Thread Edgar E. Iglesias

From: "Edgar E. Iglesias" 

Add a secure prop to en/disable ARM Security Extensions.
This is particularly useful for KVM runs.

Default to disabled to match the behavior of KVM.

This changes the default setup from having the ARM Security
Extensions to not longer having them.

Signed-off-by: Edgar E. Iglesias 
---
 hw/arm/xlnx-zynqmp.c | 3 +++
 include/hw/arm/xlnx-zynqmp.h | 3 +++
 2 files changed, 6 insertions(+)

diff --git a/hw/arm/xlnx-zynqmp.c b/hw/arm/xlnx-zynqmp.c
index 4d504da..965a250 100644
--- a/hw/arm/xlnx-zynqmp.c
+++ b/hw/arm/xlnx-zynqmp.c
@@ -238,6 +238,8 @@ static void xlnx_zynqmp_realize(DeviceState *dev, Error 
**errp)
 }
 g_free(name);
 
+object_property_set_bool(OBJECT(>apu_cpu[i]),
+ s->secure, "has_el3", NULL);
 object_property_set_int(OBJECT(>apu_cpu[i]), GIC_BASE_ADDR,
 "reset-cbar", _abort);
 object_property_set_bool(OBJECT(>apu_cpu[i]), true, "realized",
@@ -370,6 +372,7 @@ static void xlnx_zynqmp_realize(DeviceState *dev, Error 
**errp)
 
 static Property xlnx_zynqmp_props[] = {
 DEFINE_PROP_STRING("boot-cpu", XlnxZynqMPState, boot_cpu),
+DEFINE_PROP_BOOL("secure", XlnxZynqMPState, secure, false),
 DEFINE_PROP_END_OF_LIST()
 };
 
diff --git a/include/hw/arm/xlnx-zynqmp.h b/include/hw/arm/xlnx-zynqmp.h
index 2332596..38d4c8c 100644
--- a/include/hw/arm/xlnx-zynqmp.h
+++ b/include/hw/arm/xlnx-zynqmp.h
@@ -84,6 +84,9 @@ typedef struct XlnxZynqMPState {
 
 char *boot_cpu;
 ARMCPU *boot_cpu_ptr;
+
+/* Has the ARM Security extensions?  */
+bool secure;
 }  XlnxZynqMPState;
 
 #define XLNX_ZYNQMP_H
-- 
2.5.0

Re: [Qemu-devel] [PATCH v2 0/3] linux-user: netlink support

2016-05-25 Thread Riku Voipio

On Tue, May 24, 2016 at 02:08:14PM +0100, Peter Maydell wrote:
> On 24 May 2016 at 13:54, Riku Voipio  wrote:
> > On tiistaina 24. toukokuuta 2016 15.05.08 EEST, Riku Voipio wrote:
> >>
> >> Ok, fair enough, applied.
> >
> >
> > dropped since the build failed on squeeze. I take the simplest fix is to
> > wrap all nl support #ifdef IFLA_PROTO_DOWN since that is the newest define
> > available.
> >
> > /squeeze-container/qemu/linux-user/syscall.c
 
> Tangentially, are your containers here chroots, ad-hoc containers,
> docker images, or something else? I was thinking about trying to
> sort out a more self-contained and reproducible LTP setup than the
> chroot I have at the moment, but perhaps you've already gone down
> that path ?

Local docker images - these aren't exactly clean yet. But if people want
I can look at publishing them in dockerhub. That said, for build testing
docker images don't add much to plain chroots.

Riku

Re: [Qemu-devel] [PATCH 0/3] Qemu: scsi: megasas: various OOB r/w access checks

2016-05-25 Thread Paolo Bonzini



On 25/05/2016 12:31, P J P wrote:
> From: Prasad J Pandit 
> 
> Hello,
> 
> Multiple OOB r/w access issues were reported by Li Qiang, in SCSI MegaRAID SAS
> Host bus Adapter emulation. Please see below proposed patches to fix these
> issues.

Thanks, queued.

Paolo

Re: [Qemu-devel] [PATCH v6 for-2.7 25/28] migration: define 'tls-creds' and 'tls-hostname' migration parameters

2016-05-25 Thread Amit Shah

On (Wed) 27 Apr 2016 [11:05:15], Daniel P. Berrange wrote:
> Define two new migration parameters to be used with TLS encryption.
> The 'tls-creds' parameter provides the ID of an instance of the
> 'tls-creds' object type, or rather a subclass such as 'tls-creds-x509'.
> Providing these credentials will enable use of TLS on the migration
> data stream.
> 
> If using x509 certificates, together with a migration URI that does
> not include a hostname, the 'tls-hostname' parameter provides the
> hostname to use when verifying the server's x509 certificate. This
> allows TLS to be used in combination with fd: and exec: protocols
> where a TCP connection is established by a 3rd party outside of
> QEMU.
> 
> NB, this requires changing the migrate_set_parameter method in the
> HMP to accept a 's' (string) value instead of 'i' (integer). This
> is backwards compatible, because the parsing of strings allows the
> quotes to be optional, thus any integer is also a valid string.
> 
> Reviewed-by: Dr. David Alan Gilbert 
> Signed-off-by: Daniel P. Berrange 

> diff --git a/qapi-schema.json b/qapi-schema.json
> index 9aa14b4..12be303 100644
> --- a/qapi-schema.json
> +++ b/qapi-schema.json
> @@ -617,11 +617,28 @@
>  # @x-cpu-throttle-increment: throttle percentage increase each time
>  #auto-converge detects that migration is not 
> making
>  #progress. The default value is 10. (Since 2.5)
> +#
> +# @tls-creds: ID of the 'tls-creds' object that provides credentials for
> +# establishing a TLS connection over the migration data channel.
> +# On the outgoing side of the migration, the credentials must
> +# be for a 'client' endpoint, while for the incoming side the
> +# credentials must be for a 'server' endpoint. Setting this
> +# will enable TLS for all migrations. The default is unset,
> +# resulting in unsecured migration at the QEMU level. (Since 2.6)

All these need to be "Since 2.7"

I've updated these in my branch, no respin required for this.

Amit

Re: [Qemu-devel] [PATCH v2 1/3] docs/atomics: update atomic_read/set comparison with Linux

2016-05-25 Thread Paolo Bonzini



On 24/05/2016 22:06, Emilio G. Cota wrote:
> Recently Linux did a mass conversion of its atomic_read/set calls
> so that they at least are READ/WRITE_ONCE. See Linux's commit
> 62e8a325 ("atomic, arch: Audit atomic_{read,set}()"). It seems though
> that their documentation hasn't been updated to reflect this.

Let's mention explicitly that the change happened in Linux 4.1.  Queued
with this small change.

Thanks,

Paolo

[Qemu-devel] [PATCH v3 09/15] mirror: Use BlockBackend for I/O

2016-05-25 Thread Kevin Wolf

This changes the mirror block job to use the job's BlockBackend for
performing its I/O. job->bs isn't used by the mirroring code any more
afterwards.

Signed-off-by: Kevin Wolf 
Reviewed-by: Eric Blake 
---
 block/mirror.c | 70 --
 blockdev.c |  4 +---
 2 files changed, 40 insertions(+), 34 deletions(-)

diff --git a/block/mirror.c b/block/mirror.c
index 23aa10e..80fd3c7 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -35,7 +35,7 @@ typedef struct MirrorBuffer {
 typedef struct MirrorBlockJob {
 BlockJob common;
 RateLimit limit;
-BlockDriverState *target;
+BlockBackend *target;
 BlockDriverState *base;
 /* The name of the graph node to replace */
 char *replaces;
@@ -156,7 +156,8 @@ static void mirror_read_complete(void *opaque, int ret)
 mirror_iteration_done(op, ret);
 return;
 }
-bdrv_aio_writev(s->target, op->sector_num, >qiov, op->nb_sectors,
+blk_aio_pwritev(s->target, op->sector_num * BDRV_SECTOR_SIZE, >qiov,
+op->nb_sectors * BDRV_SECTOR_SIZE,
 mirror_write_complete, op);
 }
 
@@ -185,7 +186,7 @@ static int mirror_cow_align(MirrorBlockJob *s,
 need_cow |= !test_bit((*sector_num + *nb_sectors - 1) / chunk_sectors,
   s->cow_bitmap);
 if (need_cow) {
-bdrv_round_to_clusters(s->target, *sector_num, *nb_sectors,
+bdrv_round_to_clusters(blk_bs(s->target), *sector_num, *nb_sectors,
_sector_num, _nb_sectors);
 }
 
@@ -223,7 +224,7 @@ static inline void mirror_wait_for_io(MirrorBlockJob *s)
 static int mirror_do_read(MirrorBlockJob *s, int64_t sector_num,
   int nb_sectors)
 {
-BlockDriverState *source = s->common.bs;
+BlockBackend *source = s->common.blk;
 int sectors_per_chunk, nb_chunks;
 int ret = nb_sectors;
 MirrorOp *op;
@@ -273,7 +274,8 @@ static int mirror_do_read(MirrorBlockJob *s, int64_t 
sector_num,
 s->sectors_in_flight += nb_sectors;
 trace_mirror_one_iteration(s, sector_num, nb_sectors);
 
-bdrv_aio_readv(source, sector_num, >qiov, nb_sectors,
+blk_aio_preadv(source, sector_num * BDRV_SECTOR_SIZE, >qiov,
+   nb_sectors * BDRV_SECTOR_SIZE,
mirror_read_complete, op);
 return ret;
 }
@@ -295,10 +297,11 @@ static void mirror_do_zero_or_discard(MirrorBlockJob *s,
 s->in_flight++;
 s->sectors_in_flight += nb_sectors;
 if (is_discard) {
-bdrv_aio_discard(s->target, sector_num, op->nb_sectors,
- mirror_write_complete, op);
+blk_aio_discard(s->target, sector_num, op->nb_sectors,
+mirror_write_complete, op);
 } else {
-bdrv_aio_write_zeroes(s->target, sector_num, op->nb_sectors,
+blk_aio_pwrite_zeroes(s->target, sector_num * BDRV_SECTOR_SIZE,
+  op->nb_sectors * BDRV_SECTOR_SIZE,
   s->unmap ? BDRV_REQ_MAY_UNMAP : 0,
   mirror_write_complete, op);
 }
@@ -306,7 +309,7 @@ static void mirror_do_zero_or_discard(MirrorBlockJob *s,
 
 static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
 {
-BlockDriverState *source = s->common.bs;
+BlockDriverState *source = blk_bs(s->common.blk);
 int64_t sector_num, first_chunk;
 uint64_t delay_ns = 0;
 /* At least the first dirty chunk is mirrored in one iteration. */
@@ -383,7 +386,7 @@ static uint64_t coroutine_fn 
mirror_iteration(MirrorBlockJob *s)
 } else if (ret >= 0 && !(ret & BDRV_BLOCK_DATA)) {
 int64_t target_sector_num;
 int target_nb_sectors;
-bdrv_round_to_clusters(s->target, sector_num, io_sectors,
+bdrv_round_to_clusters(blk_bs(s->target), sector_num, io_sectors,
_sector_num, _nb_sectors);
 if (target_sector_num == sector_num &&
 target_nb_sectors == io_sectors) {
@@ -448,7 +451,8 @@ static void mirror_exit(BlockJob *job, void *opaque)
 MirrorBlockJob *s = container_of(job, MirrorBlockJob, common);
 MirrorExitData *data = opaque;
 AioContext *replace_aio_context = NULL;
-BlockDriverState *src = s->common.bs;
+BlockDriverState *src = blk_bs(s->common.blk);
+BlockDriverState *target_bs = blk_bs(s->target);
 
 /* Make sure that the source BDS doesn't go away before we called
  * block_job_completed(). */
@@ -460,20 +464,20 @@ static void mirror_exit(BlockJob *job, void *opaque)
 }
 
 if (s->should_complete && data->ret == 0) {
-BlockDriverState *to_replace = s->common.bs;
+BlockDriverState *to_replace = src;
 if (s->to_replace) {
 to_replace = s->to_replace;
 }
 
-if (bdrv_get_flags(s->target) != bdrv_get_flags(to_replace)) {
-bdrv_reopen(s->target,

Re: [Qemu-devel] [PATCH v2] nbd: Don't trim unrequested bytes

2016-05-25 Thread Kevin Wolf

Am 25.05.2016 um 13:25 hat Eric Blake geschrieben:
> On 05/25/2016 04:59 AM, Eric Blake wrote:
> > +/* Ignore unaligned head or tail, until block layer adds byte
> > + * interface */
> > +if (request.len >= BDRV_SECTOR_SIZE) {
> > +request.len -= (request.from + request.len) % BDRV_SECTOR_SIZE;
> > +ret = blk_co_discard(exp->blk,
> > + DIV_ROUND_UP(request.from + 
> > exp->dev_offset,
> > +  BDRV_SECTOR_SIZE),
> > + request.len / BDRV_SECTOR_SIZE);
> 
> This can end up calling blk_co_discard(, , 0) - do we need to audit
> whether all underlying drivers handle a 0-length request, and/or
> special-case this in blk_co_discard() to be an explicit no-op?

Auditing shouldn't be too hard as we don't have many drivers that
support the operation in the first place. In any case, it might be
useful to add a qemu-iotest case that tries all kinds of operations with
a zero length.

Kevin


pgp2EXGsE2khp.pgp
Description: PGP signature

[Qemu-devel] [RFC PATCH V4 3/4] colo-compare: introduce packet comparison thread

2016-05-25 Thread Zhang Chen

if packets are same, we send primary packet and drop secondary
packet, otherwise notify COLO do checkpoint.

Signed-off-by: Zhang Chen 
Signed-off-by: Li Zhijian 
Signed-off-by: Wen Congyang 
---
 net/colo-base.c|   1 +
 net/colo-base.h|   3 +
 net/colo-compare.c | 172 +
 trace-events   |   2 +
 4 files changed, 178 insertions(+)

diff --git a/net/colo-base.c b/net/colo-base.c
index 6bb80ca..e795b5f 100644
--- a/net/colo-base.c
+++ b/net/colo-base.c
@@ -134,6 +134,7 @@ Packet *packet_new(const void *data, int size)
 
 pkt->data = g_memdup(data, size);
 pkt->size = size;
+pkt->creation_ms = qemu_clock_get_ms(QEMU_CLOCK_HOST);
 
 return pkt;
 }
diff --git a/net/colo-base.h b/net/colo-base.h
index 5d72302..7b0abdf 100644
--- a/net/colo-base.h
+++ b/net/colo-base.h
@@ -18,6 +18,7 @@
 #include "slirp/slirp.h"
 #include "qemu/jhash.h"
 #include "qemu/rcu.h"
+#include "qemu/timer.h"
 
 #define HASHTABLE_MAX_SIZE 16384
 
@@ -47,6 +48,8 @@ typedef struct Packet {
 };
 uint8_t *transport_layer;
 int size;
+/* Time of packet creation, in wall clock ms */
+int64_t creation_ms;
 } Packet;
 
 typedef struct ConnectionKey {
diff --git a/net/colo-compare.c b/net/colo-compare.c
index 309d341..f2ad260 100644
--- a/net/colo-compare.c
+++ b/net/colo-compare.c
@@ -35,6 +35,8 @@
 OBJECT_CHECK(CompareState, (obj), TYPE_COLO_COMPARE)
 
 #define COMPARE_READ_LEN_MAX NET_BUFSIZE
+/* TODO: Should be configurable */
+#define REGULAR_CHECK_MS 400
 
 static QTAILQ_HEAD(, CompareState) net_compares =
QTAILQ_HEAD_INITIALIZER(net_compares);
@@ -86,6 +88,14 @@ typedef struct CompareState {
 GQueue unprocessed_connections;
 /* proxy current hash size */
 uint32_t hashtable_size;
+
+/* notify compare thread */
+QemuEvent event;
+/* compare thread, a thread for each NIC */
+QemuThread thread;
+int thread_status;
+/* Timer used on the primary to find packets that are never matched */
+QEMUTimer *timer;
 } CompareState;
 
 typedef struct CompareClass {
@@ -97,6 +107,15 @@ enum {
 SECONDARY_IN,
 };
 
+enum {
+/* compare thread isn't started */
+COMPARE_THREAD_NONE,
+/* compare thread is running */
+COMPARE_THREAD_RUNNING,
+/* compare thread exit */
+COMPARE_THREAD_EXIT,
+};
+
 static int compare_chr_send(CharDriverState *out,
 const uint8_t *buf,
 uint32_t size);
@@ -145,6 +164,119 @@ static int packet_enqueue(CompareState *s, int mode)
 return 0;
 }
 
+/*
+ * The IP packets sent by primary and secondary
+ * will be compared in here
+ * TODO support ip fragment, Out-Of-Order
+ * return:0  means packet same
+ *> 0 || < 0 means packet different
+ */
+static int colo_packet_compare(Packet *ppkt, Packet *spkt)
+{
+trace_colo_compare_ip_info(ppkt->size, inet_ntoa(ppkt->ip->ip_src),
+   inet_ntoa(ppkt->ip->ip_dst), spkt->size,
+   inet_ntoa(spkt->ip->ip_src),
+   inet_ntoa(spkt->ip->ip_dst));
+
+if (ppkt->size == spkt->size) {
+return memcmp(ppkt->data, spkt->data, spkt->size);
+} else {
+return -1;
+}
+}
+
+static int colo_packet_compare_all(Packet *spkt, Packet *ppkt)
+{
+trace_colo_compare_main("compare all");
+return colo_packet_compare(ppkt, spkt);
+}
+
+static void colo_old_packet_check(void *opaque_packet, void *opaque_found)
+{
+int64_t now;
+bool *found_old = (bool *)opaque_found;
+Packet *ppkt = (Packet *)opaque_packet;
+
+if (*found_old) {
+/* Someone found an old packet earlier in the queue */
+return;
+}
+
+now = qemu_clock_get_ms(QEMU_CLOCK_HOST);
+if ((ppkt->creation_ms < now) &&
+((now - ppkt->creation_ms) > REGULAR_CHECK_MS)) {
+trace_colo_old_packet_check_found(ppkt->creation_ms);
+*found_old = true;
+}
+}
+
+/*
+ * called from the compare thread on the primary
+ * for compare connection
+ */
+static void colo_compare_connection(void *opaque, void *user_data)
+{
+CompareState *s = user_data;
+Connection *conn = opaque;
+Packet *pkt = NULL;
+GList *result = NULL;
+bool found_old;
+int ret;
+
+while (!g_queue_is_empty(>primary_list) &&
+   !g_queue_is_empty(>secondary_list)) {
+qemu_mutex_lock(>list_lock);
+pkt = g_queue_pop_tail(>primary_list);
+result = g_queue_find_custom(>secondary_list,
+  pkt, (GCompareFunc)colo_packet_compare_all);
+qemu_mutex_unlock(>list_lock);
+
+if (result) {
+ret = compare_chr_send(s->chr_out, pkt->data, pkt->size);
+if (ret < 0) {
+error_report("colo_send_primary_packet failed");
+}
+trace_colo_compare_main("packet same

Re: [Qemu-devel] [PATCH 06/13] qcow2: Convert to bdrv_co_pwrite_zeroes()

2016-05-25 Thread Kevin Wolf

Am 25.05.2016 um 00:25 hat Eric Blake geschrieben:
> Another step on our continuing quest to switch to byte-based
> interfaces.
> 
> There are still opportunities to optimize the qcow2 handling
> of zero clusters.  For example, if the backing file only has
> non-zero data in the portion about to be overwritten, then
> we could widen the request and make the entire cluster zero,
> rather than falling back to -ENOTSUP.  But for this patch,
> intentionally leave the semantics unchanged, even if not
> optimal.
> 
> Signed-off-by: Eric Blake 

Reviewed-by: Kevin Wolf

Re: [Qemu-devel] ARM IRQ Generation

2016-05-25 Thread Peter Maydell

On 25 May 2016 at 13:42, Karthik  wrote:
> Okay, understood. I`ll hold the IRQ line until CPU acknowledges the
> interrupt.
>
> By the way, is there a distinction between edge and level triggered
> interrupt in the qemu or it is up to the emulation implementation?

In QEMU, if you call qemu_set_irq(irq, 1) this is equivalent to
the hardware taking the IRQ wire to logical 1. Calling
qemu_set_irq(irq, 0) is taking the wire to logical 0.
(qemu_irq_pulse() takes the wire very briefly to logical 1, and
is a bit of an odd thing to do.) You need to do what the hardware
does to the hardware irq wire in the same situations when the
hardware changes the irq wire state.

Edge vs level is about what a device does when it sees a change
in state on an an incoming interrupt wire. "Edge triggered" means
it cares about the 0->1 change, and implies that it has an
internal latch which is set when the edge is detected, so that
it doesn't matter if the incoming wire drops to 0 again. "Level
triggered" means that it is the level state of the incoming wire
that matters. This is again all about "how do you behave when
you see the wire state changing", and you need to do what the
hardware does. If the hardware is strictly edge-triggered then
your model should be also; if it is always level-triggered then
your model must do the same; if it is configurable then again
you need to model the configurability.

thanks
-- PMM

Re: [Qemu-devel] [PATCH 6/6] char: get rid of qemu_char_get_next_serial

2016-05-25 Thread Paolo Bonzini



On 25/05/2016 12:58, xiaoqiang zhao wrote:
> since there is no user of qemu_char_get_next_serial any more,
> it's time to let it go away.
> 
> Signed-off-by: xiaoqiang zhao 
> ---
>  include/sysemu/char.h |  1 -
>  qemu-char.c   | 16 
>  2 files changed, 17 deletions(-)

Acked-by: Paolo Bonzini

Re: [Qemu-devel] [PATCH qemu v16 01/19] vfio: Delay DMA address space listener release

2016-05-25 Thread Alex Williamson

On Wed, 25 May 2016 16:34:37 +1000
David Gibson  wrote:

> On Fri, May 13, 2016 at 04:24:53PM -0600, Alex Williamson wrote:
> > On Fri, 13 May 2016 17:16:48 +1000
> > Alexey Kardashevskiy  wrote:
> >   
> > > On 05/06/2016 08:39 AM, Alex Williamson wrote:  
> > > > On Wed,  4 May 2016 16:52:13 +1000
> > > > Alexey Kardashevskiy  wrote:
> > > >
> > > >> This postpones VFIO container deinitialization to let region_del()
> > > >> callbacks (called via vfio_listener_release) do proper clean up
> > > >> while the group is still attached to the container.
> > > >
> > > > Any mappings within the container should clean themselves up when the
> > > > container is deprivleged by removing the last group in the kernel. Is
> > > > the issue that that doesn't happen, which would be a spapr vfio kernel
> > > > bug, or that our QEMU side structures get all out of whack if we let
> > > > that happen?
> > > 
> > > My mailbase got corrupted, missed that.
> > > 
> > > This is mostly for "[PATCH qemu v16 17/19] spapr_iommu, vfio, memory: 
> > > Notify IOMMU about starting/stopping being used by VFIO", I should have 
> > > put 
> > > 01/19 and 02/19 right before 17/19, sorry about that.  
> > 
> > Which I object to, it's just ridiculous to have vfio start/stop
> > callbacks in a set of generic iommu region ops.  
> 
> It's ugly, but I don't actually see a better way to do this (the
> general concept of having vfio start/stop callbacks, that is, not the
> specifics of the patches).
> 
> The fact is that how we implement the guest side IOMMU *does* need to
> change depending on whether VFIO devices are present or not. 

No, how the guest side iommu is implemented needs to change depending
on whether there's someone, anyone, in QEMU that cares about the iommu,
which can be determined by whether the iommu notifier has any clients.
Alexey has posted another patch that does this.

> That's
> due essentially to incompatibilities between a couple of kernel
> mechanisms.  Which in itself is ugly, but nonetheless real.
> 
> A (usually blank) vfio on/off callback in the guest side IOMMU ops
> seems like the least-bad way to handle this.

I disagree, we already call memory_region_register_iommu_notifier() to
indicate we care about the guest iommu, so the abstraction is already
there, there's absolutely no reason to make a vfio specific interface.
Thanks,

Alex

Re: [Qemu-devel] [PATCH 10/13] raw-posix: Convert to bdrv_co_pwrite_zeroes()

2016-05-25 Thread Kevin Wolf

Am 25.05.2016 um 00:25 hat Eric Blake geschrieben:
> Another step on our continuing quest to switch to byte-based
> interfaces.
> 
> Signed-off-by: Eric Blake 
> ---
>  block/raw-posix.c | 37 +++--
>  trace-events  |  2 +-
>  2 files changed, 16 insertions(+), 23 deletions(-)
> 
> diff --git a/block/raw-posix.c b/block/raw-posix.c
> index a4f5a1b..bb691f6 100644
> --- a/block/raw-posix.c
> +++ b/block/raw-posix.c
> @@ -1252,8 +1252,7 @@ static int aio_worker(void *arg)
>  }
> 
>  static int paio_submit_co(BlockDriverState *bs, int fd,
> -int64_t sector_num, QEMUIOVector *qiov, int nb_sectors,
> -int type)
> +  int64_t offset, int count, int type)

Removing qiov makes sense if we only want to use the function for
write_zeroes and therefore don't need the full power of paio_submit(). I
still think that it would be good to convert raw-posix to the
(coroutine-based) .bdrv_co_preadv/pwritev and then we will need qiov
again.

Kevin

Re: [Qemu-devel] [PATCH 11/13] raw_bsd: Convert to bdrv_co_pwrite_zeroes()

2016-05-25 Thread Kevin Wolf

Am 25.05.2016 um 00:25 hat Eric Blake geschrieben:
> Another step on our continuing quest to switch to byte-based
> interfaces.
> 
> Signed-off-by: Eric Blake 

Reviewed-by: Kevin Wolf

Re: [Qemu-devel] [PATCH 13/13] block: Kill bdrv_co_write_zeroes()

2016-05-25 Thread Kevin Wolf

Am 25.05.2016 um 00:25 hat Eric Blake geschrieben:
> Now that all drivers have been converted to a byte interface,
> we no longer need a sector interface.
> 
> Signed-off-by: Eric Blake 

Reviewed-by: Kevin Wolf

Re: [Qemu-devel] [PATCH v4 00/15] Dirty bitmap changes for migration/persistence work

2016-05-25 Thread Vladimir Sementsov-Ogievskiy


Hi!

Are you going to update the series in the near future?

On 08.03.2016 07:44, Fam Zheng wrote:

v4: Rebase.
 Add rev-by from John in patches 1-5, 7, 8.
 Remove BdrvDirtyBitmap typedef from dirty-bitmap.h in patch 4. [Max]
 Add assertion on bm->meta in patch 9. [John]

Two major features are added to block dirty bitmap (and underlying HBitmap) in
this series: meta bitmap and serialization, together with all other supportive
patches.

Both operations are common in dirty bitmap migration and persistence: they need
to find whether and which part of the dirty bitmap in question has changed with
meta dirty bitmap, and they need to write it to the target with serialization.


Fam Zheng (13):
   backup: Use Bitmap to replace "s->bitmap"
   block: Include hbitmap.h in block.h
   typedefs: Add BdrvDirtyBitmap
   block: Move block dirty bitmap code to separate files
   block: Remove unused typedef of BlockDriverDirtyHandler
   block: Hide HBitmap in block dirty bitmap interface
   HBitmap: Introduce "meta" bitmap to track bit changes
   tests: Add test code for meta bitmap
   block: Support meta dirty bitmap
   block: Add two dirty bitmap getters
   block: Assert that bdrv_release_dirty_bitmap succeeded
   tests: Add test code for hbitmap serialization
   block: More operations for meta dirty bitmap

Vladimir Sementsov-Ogievskiy (2):
   hbitmap: serialization
   block: BdrvDirtyBitmap serialization interface

  block.c  | 360 -
  block/Makefile.objs  |   2 +-
  block/backup.c   |  25 +-
  block/dirty-bitmap.c | 535 +++
  block/mirror.c   |  15 +-
  include/block/block.h|  40 +---
  include/block/dirty-bitmap.h |  75 ++
  include/qemu/hbitmap.h   |  96 
  include/qemu/typedefs.h  |   2 +
  tests/test-hbitmap.c | 255 +
  util/hbitmap.c   | 203 ++--
  11 files changed, 1177 insertions(+), 431 deletions(-)
  create mode 100644 block/dirty-bitmap.c
  create mode 100644 include/block/dirty-bitmap.h




--
Best regards,
Vladimir

Re: [Qemu-devel] [PATCH v3 06/15] block: Make blk_co_preadv/pwritev() public

2016-05-25 Thread Max Reitz

On 25.05.2016 14:29, Kevin Wolf wrote:
> Also add trace points now that the function can be directly called.
> 
> Signed-off-by: Kevin Wolf 
> ---
>  block/block-backend.c  | 21 ++---
>  include/sysemu/block-backend.h |  6 ++
>  trace-events   |  4 
>  3 files changed, 24 insertions(+), 7 deletions(-)
> 
> diff --git a/block/block-backend.c b/block/block-backend.c
> index c8b13f1..34500e6 100644
> --- a/block/block-backend.c
> +++ b/block/block-backend.c
> @@ -19,6 +19,7 @@
>  #include "sysemu/sysemu.h"
>  #include "qapi-event.h"
>  #include "qemu/id.h"
> +#include "trace.h"

Reviewed-by: Max Reitz 



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH 09/13] qed: Convert to bdrv_co_pwrite_zeroes()

2016-05-25 Thread Kevin Wolf

Am 25.05.2016 um 16:28 hat Eric Blake geschrieben:
> On 05/25/2016 08:07 AM, Kevin Wolf wrote:
> > Am 25.05.2016 um 00:25 hat Eric Blake geschrieben:
> >> Another step on our continuing quest to switch to byte-based
> >> interfaces.
> >>
> >> Kill an abuse of the comma operator while at it (fortunately,
> >> the semantics were still right).
> >>
> >> Signed-off-by: Eric Blake 
> >> ---
> >>  block/qed.c | 25 +
> >>  1 file changed, 13 insertions(+), 12 deletions(-)
> >>
> 
> >> @@ -1443,10 +1443,10 @@ static int coroutine_fn 
> >> bdrv_qed_co_write_zeroes(BlockDriverState *bs,
> >>
> >>  /* Refuse if there are untouched backing file sectors */
> 
> The comment wasn't very helpful, so I may rewort it, too
> (s/untouched/unaligned/, or something like that)

I think "unaligned" is not what the comment means. What would an
unaligned sector be anyway?

The idea is probably like in qcow2 that if there is no backing file, the
rest of the cluster already reads as zeros, so we can overwrite it.

> >>  if (bs->backing) {
> >> -if (qed_offset_into_cluster(s, sector_num * BDRV_SECTOR_SIZE) != 
> >> 0) {
> >> +if (qed_offset_into_cluster(s, offset) != 0) {
> >>  return -ENOTSUP;
> >>  }
> >> -if (qed_offset_into_cluster(s, nb_sectors * BDRV_SECTOR_SIZE) != 
> >> 0) {
> >> +if (qed_offset_into_cluster(s, count) != 0) {
> >>  return -ENOTSUP;
> >>  }
> >>  }
> > 
> > Unaligned requests are only emulated if there is no backing file...
> > 
> >> @@ -1454,12 +1454,13 @@ static int coroutine_fn 
> >> bdrv_qed_co_write_zeroes(BlockDriverState *bs,
> >>  /* Zero writes start without an I/O buffer.  If a buffer becomes 
> >> necessary
> >>   * then it will be allocated during request processing.
> >>   */
> >> -iov.iov_base = NULL,
> >> -iov.iov_len  = nb_sectors * BDRV_SECTOR_SIZE,
> >> +iov.iov_base = NULL;
> >> +iov.iov_len = count;
> >>
> >>  qemu_iovec_init_external(, , 1);
> >> -blockacb = qed_aio_setup(bs, sector_num, , nb_sectors,
> >> - qed_co_write_zeroes_cb, ,
> >> +blockacb = qed_aio_setup(bs, offset >> BDRV_SECTOR_BITS, ,
> >> + count >> BDRV_SECTOR_BITS,
> > 
> > ...so offset and count can still be unaligned here and we end up zeroing
> > out the wrong part of the sector. I guess we need to return -ENOTSUP for
> > all sub-sector requests, even without a backing file.
> 
> Hmm. Wouldn't it be nicer if we could guarantee that blk_pwrite_zeroes()
> will never call bdrv_co_pwrite_zeroes() with less than
> request_alignment?

If we want this restriction, the right place is to implement it is in
bdrv_co_pwrite_zeroes() before calling into the driver, not on the
BlockBackend level.

> That is, if the block layer takes care of
> read-modify-write for any unaligned byte offset less than
> request_alignment, then the driver only has to worry about sector
> alignment.  Except qed.c doesn't seem to set request_alignment, but is
> just relying on io.c currently setting it to MAX(BDRV_SECTOR_SIZE,
> bs->bl.request_alignment) everywhere. (And the fact that
> request_alignment is a sibling rather than a member to BlockLimits bl is
> awkward.)

As explained on IRC, the block driver shouldn't care about sector
alignment. 512 is just an arbitrary number that some interfaces
misguidedly happen to use as their unit for parameters. The QED request
struct is one of them, but there is no real reason for it. There is no
metadata that we have at a sector granularity.

So I would avoid baking assumptions of bad drivers into new interfaces.

> Maybe we want three limits in BlockLimits, rather than two: the current
> max_pwrite_zeroes does a good job at saying how small blk_pwrite_zeroes
> must fragment large requests, and pwrite_zeroes_alignment does a good
> job at saying how large a request must be to potentially punch a hole,
> but at least in the case of qcow2, where we want to optimize a partial
> write to potentially zeroing an entire cluster, we still want to limit
> things to sector boundaries when checking for whether the rest of the
> cluster already reads as zeroes, whether or not we also want to support
> request_alignment of 1 instead of 512.

For qcow2, the only reason that sectors are involved in the optimisation
is that that's the granularity of bdrv_get_block_status(). Once that is
fixed, qcow2 can use bytes for its optimisation.

Until then, if we decided that we don't want to check the full cluster
any more (clusters are always aligned to sectors) but only the area that
is not overwritten, it would have to round start and end of the request
to the sector bounary.

> There are other drivers that I touched in this series that were relying
> on the fact that the block layer currently guarantees sector alignment,
> and maybe they should be setting request_alignment, or maybe we want to
> add yet another

Re: [Qemu-devel] [PATCH v3 15/15] blockjob: Remove BlockJob.bs

2016-05-25 Thread Max Reitz

On 25.05.2016 14:29, Kevin Wolf wrote:
> There is a single remaining user in qemu-img, and another one in a test
> case, both of which can be trivially converted to using BlockJob.blk
> instead.
> 
> Signed-off-by: Kevin Wolf 
> ---
>  blockjob.c| 1 -
>  include/block/blockjob.h  | 1 -
>  qemu-img.c| 2 +-
>  tests/test-blockjob-txn.c | 3 ++-
>  4 files changed, 3 insertions(+), 4 deletions(-)

Reviewed-by: Max Reitz 



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [RFC PATCH 3/3] tcg: Add frontend support for fence gen in ARMv7

2016-05-25 Thread Richard Henderson


On 05/24/2016 10:18 AM, Pranith Kumar wrote:

-/* We don't emulate caches so these are a no-op.  */
+if (TCG_TARGET_HAS_fence) {
+tcg_gen_fence();
+}


This should then be unconditional.


r~

Re: [Qemu-devel] [RFC PATCH 2/3] tcg: Add support for fence generation in x86 backend

2016-05-25 Thread Richard Henderson


On 05/24/2016 10:18 AM, Pranith Kumar wrote:

Signed-off-by: Pranith Kumar 
---
 tcg/i386/tcg-target.h | 1 +
 tcg/i386/tcg-target.inc.c | 9 +
 tcg/tcg-opc.h | 2 +-
 tcg/tcg.c | 1 +
 4 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h
index 92be341..93ea42e 100644
--- a/tcg/i386/tcg-target.h
+++ b/tcg/i386/tcg-target.h
@@ -100,6 +100,7 @@ extern bool have_bmi1;
 #define TCG_TARGET_HAS_muls2_i321
 #define TCG_TARGET_HAS_muluh_i320
 #define TCG_TARGET_HAS_mulsh_i320
+#define TCG_TARGET_HAS_fence1


This has to be defined for all hosts.

The default implementation should be a function call into tcg-runtime.c that 
calls smp_mb().



@@ -347,6 +347,7 @@ static inline int tcg_target_const_match(tcg_target_long 
val, TCGType type,
 #define OPC_SHRX(0xf7 | P_EXT38 | P_SIMDF2)
 #define OPC_TESTL  (0x85)
 #define OPC_XCHG_ax_r32(0x90)
+#define OPC_MFENCE  (0xAE | P_EXT)

 #define OPC_GRP3_Ev(0xf7)
 #define OPC_GRP5   (0xff)
@@ -686,6 +687,14 @@ static inline void tcg_out_pushi(TCGContext *s, 
tcg_target_long val)
 }
 }

+static inline void tcg_out_fence(TCGContext *s)
+{
+/* TODO: Figure out an appropriate place for the encoding */
+tcg_out8(s, 0x0F);
+tcg_out8(s, 0xAE);
+tcg_out8(s, 0xF0);
+}


Why define OPC_MFENCE if you're not going to use it?  Of course, it's not 
exactly a complete and useful definition, so maybe just delete OPC_MFENCE.


Also, for 32-bit you need to check for sse2 before outputting this.  See also 
the existing cpuid checks in tcg_target_init and the fallback smp_mb definition 
for pre-gcc-4.4.



r~

[Qemu-devel] [PULL 03/31] block: Let bdrv_open_inherit() return the snapshot

2016-05-25 Thread Kevin Wolf

From: Max Reitz 

If bdrv_open_inherit() creates a snapshot BDS and *pbs is NULL, that
snapshot BDS should be returned instead of the BDS under it.

This has worked so far because (nearly) all users of BDRV_O_SNAPSHOT use
blk_new_open() to create the BDS tree. bdrv_append() (which is called by
bdrv_append_temp_snapshot()) redirects pointers from parents (i.e. the
BB in this case) to the newly appended child (i.e. the overlay),
therefore, while bdrv_open_inherit() did not return the root BDS, the BB
still pointed to it.

The only instance where BDRV_O_SNAPSHOT is used but blk_new_open() is
not is in blockdev_init() if no BDS tree is created, and instead
blk_new() is used and the flags are stored in the BB root state.
However, qmp_blockdev_change_medium() filters the BDRV_O_SNAPSHOT flag
before invoking bdrv_open(), so it will not have any effect.

In any case, it would be nicer if bdrv_open_inherit() could just always
return the root of the BDS tree that has been created.

To this end, bdrv_append_temp_snapshot() now returns the snapshot BDS
instead of just appending it on top of the snapshotted BDS. Also, it
calls bdrv_ref() before bdrv_append() (which bdrv_open_inherit() has to
undo if not returning the overlay).

Signed-off-by: Max Reitz 
Reviewed-by: Kevin Wolf 
Signed-off-by: Kevin Wolf 
---
 block.c | 47 ---
 1 file changed, 40 insertions(+), 7 deletions(-)

diff --git a/block.c b/block.c
index f85c5a2..17ee088 100644
--- a/block.c
+++ b/block.c
@@ -1422,8 +1422,10 @@ done:
 return c;
 }
 
-static int bdrv_append_temp_snapshot(BlockDriverState *bs, int flags,
- QDict *snapshot_options, Error **errp)
+static BlockDriverState *bdrv_append_temp_snapshot(BlockDriverState *bs,
+   int flags,
+   QDict *snapshot_options,
+   Error **errp)
 {
 /* TODO: extra byte is a hack to ensure MAX_PATH space on Windows. */
 char *tmp_filename = g_malloc0(PATH_MAX + 1);
@@ -1439,7 +1441,6 @@ static int bdrv_append_temp_snapshot(BlockDriverState 
*bs, int flags,
 /* Get the required size from the image */
 total_size = bdrv_getlength(bs);
 if (total_size < 0) {
-ret = total_size;
 error_setg_errno(errp, -total_size, "Could not get image size");
 goto out;
 }
@@ -1479,12 +1480,19 @@ static int bdrv_append_temp_snapshot(BlockDriverState 
*bs, int flags,
 goto out;
 }
 
+/* bdrv_append() consumes a strong reference to bs_snapshot (i.e. it will
+ * call bdrv_unref() on it), so in order to be able to return one, we have
+ * to increase bs_snapshot's refcount here */
+bdrv_ref(bs_snapshot);
 bdrv_append(bs_snapshot, bs);
 
+g_free(tmp_filename);
+return bs_snapshot;
+
 out:
 QDECREF(snapshot_options);
 g_free(tmp_filename);
-return ret;
+return NULL;
 }
 
 /*
@@ -1704,17 +1712,42 @@ static int bdrv_open_inherit(BlockDriverState **pbs, 
const char *filename,
 }
 
 QDECREF(options);
-*pbs = bs;
 
 /* For snapshot=on, create a temporary qcow2 overlay. bs points to the
  * temporary snapshot afterwards. */
 if (snapshot_flags) {
-ret = bdrv_append_temp_snapshot(bs, snapshot_flags, snapshot_options,
-_err);
+BlockDriverState *snapshot_bs;
+snapshot_bs = bdrv_append_temp_snapshot(bs, snapshot_flags,
+snapshot_options, _err);
 snapshot_options = NULL;
 if (local_err) {
+ret = -EINVAL;
 goto close_and_fail;
 }
+if (!*pbs) {
+/* We are not going to return bs but the overlay on top of it
+ * (snapshot_bs); thus, we have to drop the strong reference to bs
+ * (which we obtained by calling bdrv_new()). bs will not be
+ * deleted, though, because the overlay still has a reference to 
it.
+ */
+bdrv_unref(bs);
+bs = snapshot_bs;
+} else {
+/* We are not going to return snapshot_bs, so we have to drop the
+ * strong reference to it (which was returned by
+ * bdrv_append_temp_snapshot()). snapshot_bs will not be deleted,
+ * though, because bdrv_append_temp_snapshot() made all parental
+ * references to bs (*pbs) point to snapshot_bs.
+ * In fact, if *pbs was not NULL, we are not going to return any 
new
+ * BDS. But we do not need to decrement bs's refcount here as is
+ * done above, because with a non-NULL *pbs this function never 
even
+ * had a strong reference to bs. */
+bdrv_unref(snapshot_bs);
+}
+}
+
+if (!*pbs) {
+

[Qemu-devel] [PULL 11/31] block: Introduce bdrv_replace_child()

2016-05-25 Thread Kevin Wolf

This adds a common function that is called when attaching a new child to
a parent, removing a child from a parent and when reconfiguring the
graph so that an existing child points to a different node now.

Signed-off-by: Kevin Wolf 
Reviewed-by: Eric Blake 
Reviewed-by: Fam Zheng 
---
 block.c | 26 --
 1 file changed, 20 insertions(+), 6 deletions(-)

diff --git a/block.c b/block.c
index 38df365..351344e 100644
--- a/block.c
+++ b/block.c
@@ -1150,18 +1150,32 @@ static int bdrv_fill_options(QDict **options, const 
char *filename,
 return 0;
 }
 
+static void bdrv_replace_child(BdrvChild *child, BlockDriverState *new_bs)
+{
+BlockDriverState *old_bs = child->bs;
+
+if (old_bs) {
+QLIST_REMOVE(child, next_parent);
+}
+if (new_bs) {
+QLIST_INSERT_HEAD(_bs->parents, child, next_parent);
+}
+
+child->bs = new_bs;
+}
+
 BdrvChild *bdrv_root_attach_child(BlockDriverState *child_bs,
   const char *child_name,
   const BdrvChildRole *child_role)
 {
 BdrvChild *child = g_new(BdrvChild, 1);
 *child = (BdrvChild) {
-.bs = child_bs,
+.bs = NULL,
 .name   = g_strdup(child_name),
 .role   = child_role,
 };
 
-QLIST_INSERT_HEAD(_bs->parents, child, next_parent);
+bdrv_replace_child(child, child_bs);
 
 return child;
 }
@@ -1182,7 +1196,9 @@ static void bdrv_detach_child(BdrvChild *child)
 QLIST_REMOVE(child, next);
 child->next.le_prev = NULL;
 }
-QLIST_REMOVE(child, next_parent);
+
+bdrv_replace_child(child, NULL);
+
 g_free(child->name);
 g_free(child);
 }
@@ -2203,10 +2219,8 @@ static void change_parent_backing_link(BlockDriverState 
*from,
 
 QLIST_FOREACH_SAFE(c, >parents, next_parent, next) {
 assert(c->role != _backing);
-c->bs = to;
-QLIST_REMOVE(c, next_parent);
-QLIST_INSERT_HEAD(>parents, c, next_parent);
 bdrv_ref(to);
+bdrv_replace_child(c, to);
 bdrv_unref(from);
 }
 }
-- 
1.8.3.1

[Qemu-devel] [PULL 05/31] block: Drop blk_new_with_bs()

2016-05-25 Thread Kevin Wolf

From: Max Reitz 

Its only caller is blk_new_open(), so we can just inline it there.

The bdrv_new_root() call is dropped in the process because we can just
let bdrv_open() create the BDS.

Signed-off-by: Max Reitz 
Signed-off-by: Kevin Wolf 
---
 block/block-backend.c  | 30 +++---
 include/sysemu/block-backend.h |  1 -
 2 files changed, 7 insertions(+), 24 deletions(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index 9f306f8..b96323f 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -136,27 +136,7 @@ BlockBackend *blk_new(Error **errp)
 }
 
 /*
- * Create a new BlockBackend with a new BlockDriverState attached.
- * Otherwise just like blk_new(), which see.
- */
-BlockBackend *blk_new_with_bs(Error **errp)
-{
-BlockBackend *blk;
-BlockDriverState *bs;
-
-blk = blk_new(errp);
-if (!blk) {
-return NULL;
-}
-
-bs = bdrv_new_root();
-blk->root = bdrv_root_attach_child(bs, "root", _root);
-blk->root->opaque = blk;
-return blk;
-}
-
-/*
- * Calls blk_new_with_bs() and then calls bdrv_open() on the BlockDriverState.
+ * Creates a new BlockBackend, opens a new BlockDriverState, and connects both.
  *
  * Just as with bdrv_open(), after having called this function the reference to
  * @options belongs to the block layer (even on failure).
@@ -171,21 +151,25 @@ BlockBackend *blk_new_open(const char *filename, const 
char *reference,
QDict *options, int flags, Error **errp)
 {
 BlockBackend *blk;
+BlockDriverState *bs;
 int ret;
 
-blk = blk_new_with_bs(errp);
+blk = blk_new(errp);
 if (!blk) {
 QDECREF(options);
 return NULL;
 }
 
-ret = bdrv_open(>root->bs, filename, reference, options, flags, errp);
+bs = NULL;
+ret = bdrv_open(, filename, reference, options, flags, errp);
 if (ret < 0) {
 blk_unref(blk);
 return NULL;
 }
 
 blk_set_enable_write_cache(blk, true);
+blk->root = bdrv_root_attach_child(bs, "root", _root);
+blk->root->opaque = blk;
 
 return blk;
 }
diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h
index 68d92b5..d0db3c3 100644
--- a/include/sysemu/block-backend.h
+++ b/include/sysemu/block-backend.h
@@ -79,7 +79,6 @@ typedef struct BlockBackendPublic {
 } BlockBackendPublic;
 
 BlockBackend *blk_new(Error **errp);
-BlockBackend *blk_new_with_bs(Error **errp);
 BlockBackend *blk_new_open(const char *filename, const char *reference,
QDict *options, int flags, Error **errp);
 int blk_get_refcnt(BlockBackend *blk);
-- 
1.8.3.1

[Qemu-devel] [PULL 08/31] block: Assert !bs->refcnt in bdrv_close()

2016-05-25 Thread Kevin Wolf

From: Max Reitz 

The only caller of bdrv_close() left is bdrv_delete(). We may as well
assert that, in a way (there are some things in bdrv_close() that make
more sense under that assumption, such as the call to
bdrv_release_all_dirty_bitmaps() which in turn assumes that no frozen
bitmaps are attached to the BDS).

In addition, being called only in bdrv_delete() means that we can drop
bdrv_close()'s forward declaration at the top of block.c.

Signed-off-by: Max Reitz 
Reviewed-by: Alberto Garcia 
Reviewed-by: Kevin Wolf 
Signed-off-by: Kevin Wolf 
---
 block.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/block.c b/block.c
index ad4be90..4c021c0 100644
--- a/block.c
+++ b/block.c
@@ -74,8 +74,6 @@ static BlockDriverState *bdrv_open_inherit(const char 
*filename,
 /* If non-zero, use only whitelisted block drivers */
 static int use_bdrv_whitelist;
 
-static void bdrv_close(BlockDriverState *bs);
-
 #ifdef _WIN32
 static int is_windows_drive_prefix(const char *filename)
 {
@@ -2110,6 +2108,7 @@ static void bdrv_close(BlockDriverState *bs)
 BdrvAioNotifier *ban, *ban_next;
 
 assert(!bs->job);
+assert(!bs->refcnt);
 
 bdrv_drained_begin(bs); /* complete I/O */
 bdrv_flush(bs);
-- 
1.8.3.1

[Qemu-devel] [PULL 19/31] block: Cancel jobs first in bdrv_close_all()

2016-05-25 Thread Kevin Wolf

So far, bdrv_close_all() first removed all root BlockDriverStates of
BlockBackends and monitor owned BDSes, and then assumed that the
remaining BDSes must be related to jobs and cancelled these jobs.

This order doesn't work that well any more when block jobs use
BlockBackends internally because then they will lose their BDS before
being cancelled.

This patch changes bdrv_close_all() to first cancel all jobs and then
remove all root BDSes from the remaining BBs.

Signed-off-by: Kevin Wolf 
Reviewed-by: Eric Blake 
Reviewed-by: Alberto Garcia 
Reviewed-by: Max Reitz 
---
 block.c  | 23 ++-
 blockjob.c   | 13 +
 include/block/blockjob.h |  7 +++
 3 files changed, 22 insertions(+), 21 deletions(-)

diff --git a/block.c b/block.c
index 45cddd6..736432f 100644
--- a/block.c
+++ b/block.c
@@ -2209,8 +2209,7 @@ static void bdrv_close(BlockDriverState *bs)
 
 void bdrv_close_all(void)
 {
-BlockDriverState *bs;
-AioContext *aio_context;
+block_job_cancel_sync_all();
 
 /* Drop references from requests still in flight, such as canceled block
  * jobs whose AIO context has not been polled yet */
@@ -2219,25 +2218,7 @@ void bdrv_close_all(void)
 blk_remove_all_bs();
 blockdev_close_all_bdrv_states();
 
-/* Cancel all block jobs */
-while (!QTAILQ_EMPTY(_bdrv_states)) {
-QTAILQ_FOREACH(bs, _bdrv_states, bs_list) {
-aio_context = bdrv_get_aio_context(bs);
-
-aio_context_acquire(aio_context);
-if (bs->job) {
-block_job_cancel_sync(bs->job);
-aio_context_release(aio_context);
-break;
-}
-aio_context_release(aio_context);
-}
-
-/* All the remaining BlockDriverStates are referenced directly or
- * indirectly from block jobs, so there needs to be at least one BDS
- * directly used by a block job */
-assert(bs);
-}
+assert(QTAILQ_EMPTY(_bdrv_states));
 }
 
 static void change_parent_backing_link(BlockDriverState *from,
diff --git a/blockjob.c b/blockjob.c
index 0f1bc77..e916b41 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -331,6 +331,19 @@ int block_job_cancel_sync(BlockJob *job)
 return block_job_finish_sync(job, _job_cancel_err, NULL);
 }
 
+void block_job_cancel_sync_all(void)
+{
+BlockJob *job;
+AioContext *aio_context;
+
+while ((job = QLIST_FIRST(_jobs))) {
+aio_context = bdrv_get_aio_context(job->bs);
+aio_context_acquire(aio_context);
+block_job_cancel_sync(job);
+aio_context_release(aio_context);
+}
+}
+
 int block_job_complete_sync(BlockJob *job, Error **errp)
 {
 return block_job_finish_sync(job, _job_complete, errp);
diff --git a/include/block/blockjob.h b/include/block/blockjob.h
index 30bb2c6..4ac6831 100644
--- a/include/block/blockjob.h
+++ b/include/block/blockjob.h
@@ -371,6 +371,13 @@ bool block_job_is_paused(BlockJob *job);
 int block_job_cancel_sync(BlockJob *job);
 
 /**
+ * block_job_cancel_sync_all:
+ *
+ * Synchronously cancels all jobs using block_job_cancel_sync().
+ */
+void block_job_cancel_sync_all(void);
+
+/**
  * block_job_complete_sync:
  * @job: The job to be completed.
  * @errp: Error object which may be set by block_job_complete(); this is not
-- 
1.8.3.1

[Qemu-devel] [PULL 28/31] backup: Remove bs parameter from backup_do_cow()

2016-05-25 Thread Kevin Wolf

Now that we pass the job to the function, bs is implied by that.

Signed-off-by: Kevin Wolf 
Reviewed-by: Max Reitz 
Reviewed-by: Alberto Garcia 
---
 block/backup.c | 13 ++---
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/block/backup.c b/block/backup.c
index a57288f..670ba50 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -94,12 +94,12 @@ static void cow_request_end(CowRequest *req)
 qemu_co_queue_restart_all(>wait_queue);
 }
 
-static int coroutine_fn backup_do_cow(BlockDriverState *bs,
-  BackupBlockJob *job,
+static int coroutine_fn backup_do_cow(BackupBlockJob *job,
   int64_t sector_num, int nb_sectors,
   bool *error_is_read,
   bool is_write_notifier)
 {
+BlockDriverState *bs = job->common.bs;
 CowRequest cow_request;
 struct iovec iov;
 QEMUIOVector bounce_qiov;
@@ -203,11 +203,11 @@ static int coroutine_fn backup_before_write_notify(
 int64_t sector_num = req->offset >> BDRV_SECTOR_BITS;
 int nb_sectors = req->bytes >> BDRV_SECTOR_BITS;
 
+assert(req->bs == job->common.bs);
 assert((req->offset & (BDRV_SECTOR_SIZE - 1)) == 0);
 assert((req->bytes & (BDRV_SECTOR_SIZE - 1)) == 0);
 
-return backup_do_cow(req->bs, job, sector_num,
- nb_sectors, NULL, true);
+return backup_do_cow(job, sector_num, nb_sectors, NULL, true);
 }
 
 static void backup_set_speed(BlockJob *job, int64_t speed, Error **errp)
@@ -324,7 +324,6 @@ static int coroutine_fn 
backup_run_incremental(BackupBlockJob *job)
 int64_t end;
 int64_t last_cluster = -1;
 int64_t sectors_per_cluster = cluster_size_sectors(job);
-BlockDriverState *bs = job->common.bs;
 HBitmapIter hbi;
 
 granularity = bdrv_dirty_bitmap_granularity(job->sync_bitmap);
@@ -346,7 +345,7 @@ static int coroutine_fn 
backup_run_incremental(BackupBlockJob *job)
 if (yield_and_check(job)) {
 return ret;
 }
-ret = backup_do_cow(bs, job, cluster * sectors_per_cluster,
+ret = backup_do_cow(job, cluster * sectors_per_cluster,
 sectors_per_cluster, _is_read,
 false);
 if ((ret < 0) &&
@@ -446,7 +445,7 @@ static void coroutine_fn backup_run(void *opaque)
 }
 }
 /* FULL sync mode we copy the whole drive. */
-ret = backup_do_cow(bs, job, start * sectors_per_cluster,
+ret = backup_do_cow(job, start * sectors_per_cluster,
 sectors_per_cluster, _is_read, false);
 if (ret < 0) {
 /* Depending on error action, fail now or retry cluster */
-- 
1.8.3.1

[Qemu-devel] [PULL 25/31] mirror: Use BlockBackend for I/O

2016-05-25 Thread Kevin Wolf

This changes the mirror block job to use the job's BlockBackend for
performing its I/O. job->bs isn't used by the mirroring code any more
afterwards.

Signed-off-by: Kevin Wolf 
Reviewed-by: Eric Blake 
Reviewed-by: Max Reitz 
---
 block/mirror.c | 70 --
 blockdev.c |  4 +---
 2 files changed, 40 insertions(+), 34 deletions(-)

diff --git a/block/mirror.c b/block/mirror.c
index 23aa10e..80fd3c7 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -35,7 +35,7 @@ typedef struct MirrorBuffer {
 typedef struct MirrorBlockJob {
 BlockJob common;
 RateLimit limit;
-BlockDriverState *target;
+BlockBackend *target;
 BlockDriverState *base;
 /* The name of the graph node to replace */
 char *replaces;
@@ -156,7 +156,8 @@ static void mirror_read_complete(void *opaque, int ret)
 mirror_iteration_done(op, ret);
 return;
 }
-bdrv_aio_writev(s->target, op->sector_num, >qiov, op->nb_sectors,
+blk_aio_pwritev(s->target, op->sector_num * BDRV_SECTOR_SIZE, >qiov,
+op->nb_sectors * BDRV_SECTOR_SIZE,
 mirror_write_complete, op);
 }
 
@@ -185,7 +186,7 @@ static int mirror_cow_align(MirrorBlockJob *s,
 need_cow |= !test_bit((*sector_num + *nb_sectors - 1) / chunk_sectors,
   s->cow_bitmap);
 if (need_cow) {
-bdrv_round_to_clusters(s->target, *sector_num, *nb_sectors,
+bdrv_round_to_clusters(blk_bs(s->target), *sector_num, *nb_sectors,
_sector_num, _nb_sectors);
 }
 
@@ -223,7 +224,7 @@ static inline void mirror_wait_for_io(MirrorBlockJob *s)
 static int mirror_do_read(MirrorBlockJob *s, int64_t sector_num,
   int nb_sectors)
 {
-BlockDriverState *source = s->common.bs;
+BlockBackend *source = s->common.blk;
 int sectors_per_chunk, nb_chunks;
 int ret = nb_sectors;
 MirrorOp *op;
@@ -273,7 +274,8 @@ static int mirror_do_read(MirrorBlockJob *s, int64_t 
sector_num,
 s->sectors_in_flight += nb_sectors;
 trace_mirror_one_iteration(s, sector_num, nb_sectors);
 
-bdrv_aio_readv(source, sector_num, >qiov, nb_sectors,
+blk_aio_preadv(source, sector_num * BDRV_SECTOR_SIZE, >qiov,
+   nb_sectors * BDRV_SECTOR_SIZE,
mirror_read_complete, op);
 return ret;
 }
@@ -295,10 +297,11 @@ static void mirror_do_zero_or_discard(MirrorBlockJob *s,
 s->in_flight++;
 s->sectors_in_flight += nb_sectors;
 if (is_discard) {
-bdrv_aio_discard(s->target, sector_num, op->nb_sectors,
- mirror_write_complete, op);
+blk_aio_discard(s->target, sector_num, op->nb_sectors,
+mirror_write_complete, op);
 } else {
-bdrv_aio_write_zeroes(s->target, sector_num, op->nb_sectors,
+blk_aio_pwrite_zeroes(s->target, sector_num * BDRV_SECTOR_SIZE,
+  op->nb_sectors * BDRV_SECTOR_SIZE,
   s->unmap ? BDRV_REQ_MAY_UNMAP : 0,
   mirror_write_complete, op);
 }
@@ -306,7 +309,7 @@ static void mirror_do_zero_or_discard(MirrorBlockJob *s,
 
 static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
 {
-BlockDriverState *source = s->common.bs;
+BlockDriverState *source = blk_bs(s->common.blk);
 int64_t sector_num, first_chunk;
 uint64_t delay_ns = 0;
 /* At least the first dirty chunk is mirrored in one iteration. */
@@ -383,7 +386,7 @@ static uint64_t coroutine_fn 
mirror_iteration(MirrorBlockJob *s)
 } else if (ret >= 0 && !(ret & BDRV_BLOCK_DATA)) {
 int64_t target_sector_num;
 int target_nb_sectors;
-bdrv_round_to_clusters(s->target, sector_num, io_sectors,
+bdrv_round_to_clusters(blk_bs(s->target), sector_num, io_sectors,
_sector_num, _nb_sectors);
 if (target_sector_num == sector_num &&
 target_nb_sectors == io_sectors) {
@@ -448,7 +451,8 @@ static void mirror_exit(BlockJob *job, void *opaque)
 MirrorBlockJob *s = container_of(job, MirrorBlockJob, common);
 MirrorExitData *data = opaque;
 AioContext *replace_aio_context = NULL;
-BlockDriverState *src = s->common.bs;
+BlockDriverState *src = blk_bs(s->common.blk);
+BlockDriverState *target_bs = blk_bs(s->target);
 
 /* Make sure that the source BDS doesn't go away before we called
  * block_job_completed(). */
@@ -460,20 +464,20 @@ static void mirror_exit(BlockJob *job, void *opaque)
 }
 
 if (s->should_complete && data->ret == 0) {
-BlockDriverState *to_replace = s->common.bs;
+BlockDriverState *to_replace = src;
 if (s->to_replace) {
 to_replace = s->to_replace;
 }
 
-if (bdrv_get_flags(s->target) !=

Re: [Qemu-devel] [RFC v2 03/11] docs: new design document multi-thread-tcg.txt (DRAFTING)

2016-05-25 Thread Sergey Fedorov

On 25/05/16 21:03, Paolo Bonzini wrote:
>> The page table seems to be protected by 'mmap_lock' in user mode
>> emulation but by 'tb_lock' in system mode emulation. It may turn to be
>> possible to read it safely even with no lock held.
> Yes, it is possible to at least follow the radix tree safely with no
> lock held.  The fields in the leaves can be either lockless or protected
> by a lock.
>
> The radix tree can be followed without a lock just like you do with RCU.
> The difference with RCU is that:
>
> 1) the leaves are protected with a lock, so you don't do the "copy";
> instead after reading you lock around updates
>
> 2) the radix tree is only ever added to, so you don't need to protect
> the reads with rcu_read_lock/rcu_read_unlock.  rcu_read_lock and
> rcu_read_unlock are only needed to inform the deleters that something
> cannot yet go away.  Without deleters, you don't need rcu_read_lock
> and rcu_read_unlock (but you still need atomic_rcu_read/atomic_rcu_set).
>
>

Yes, however looking closer at how the leafs are used I can't see much
point to do this so far...

Thanks,
Sergey

Re: [Qemu-devel] [PATCH v2 2/4] xlnx-zynqmp: Make the RPU subsystem optional

2016-05-25 Thread Alistair Francis

On Wed, May 25, 2016 at 3:52 AM, Edgar E. Iglesias
 wrote:
> From: "Edgar E. Iglesias" 
>
> The way we currently model the RPU subsystem is of quite
> limited use. In addition to that, it causes problems for
> KVM and for GDB debugging.
>
> Make the RPU optional by adding a has_rpu property and
> default to having it disabled.
>
> This changes the default setup from having the RPU to not
> longer having it.
>
> Signed-off-by: Edgar E. Iglesias 

Reviewed-by: Alistair Francis 

Thanks,

Alistair

> ---
>  hw/arm/xlnx-zynqmp.c | 62 
> +++-
>  include/hw/arm/xlnx-zynqmp.h |  2 ++
>  2 files changed, 40 insertions(+), 24 deletions(-)
>
> diff --git a/hw/arm/xlnx-zynqmp.c b/hw/arm/xlnx-zynqmp.c
> index 965a250..3a8af6a 100644
> --- a/hw/arm/xlnx-zynqmp.c
> +++ b/hw/arm/xlnx-zynqmp.c
> @@ -83,6 +83,41 @@ static inline int arm_gic_ppi_index(int cpu_nr, int 
> ppi_index)
>  return GIC_NUM_SPI_INTR + cpu_nr * GIC_INTERNAL + ppi_index;
>  }
>
> +static void xlnx_zynqmp_create_rpu(XlnxZynqMPState *s, const char *boot_cpu,
> +   Error **errp)
> +{
> +Error *err = NULL;
> +int i;
> +
> +for (i = 0; i < XLNX_ZYNQMP_NUM_RPU_CPUS; i++) {
> +char *name;
> +
> +object_initialize(>rpu_cpu[i], sizeof(s->rpu_cpu[i]),
> +  "cortex-r5-" TYPE_ARM_CPU);
> +object_property_add_child(OBJECT(s), "rpu-cpu[*]",
> +  OBJECT(>rpu_cpu[i]), _abort);
> +
> +name = object_get_canonical_path_component(OBJECT(>rpu_cpu[i]));
> +if (strcmp(name, boot_cpu)) {
> +/* Secondary CPUs start in PSCI powered-down state */
> +object_property_set_bool(OBJECT(>rpu_cpu[i]), true,
> + "start-powered-off", _abort);
> +} else {
> +s->boot_cpu_ptr = >rpu_cpu[i];
> +}
> +g_free(name);
> +
> +object_property_set_bool(OBJECT(>rpu_cpu[i]), true, 
> "reset-hivecs",
> + _abort);
> +object_property_set_bool(OBJECT(>rpu_cpu[i]), true, "realized",
> + );
> +if (err) {
> +error_propagate(errp, err);
> +return;
> +}
> +}
> +}
> +
>  static void xlnx_zynqmp_init(Object *obj)
>  {
>  XlnxZynqMPState *s = XLNX_ZYNQMP(obj);
> @@ -95,13 +130,6 @@ static void xlnx_zynqmp_init(Object *obj)
>_abort);
>  }
>
> -for (i = 0; i < XLNX_ZYNQMP_NUM_RPU_CPUS; i++) {
> -object_initialize(>rpu_cpu[i], sizeof(s->rpu_cpu[i]),
> -  "cortex-r5-" TYPE_ARM_CPU);
> -object_property_add_child(obj, "rpu-cpu[*]", OBJECT(>rpu_cpu[i]),
> -  _abort);
> -}
> -
>  object_property_add_link(obj, "ddr-ram", TYPE_MEMORY_REGION,
>   (Object **)>ddr_ram,
>   qdev_prop_allow_set_link_before_realize,
> @@ -260,23 +288,8 @@ static void xlnx_zynqmp_realize(DeviceState *dev, Error 
> **errp)
>  qdev_connect_gpio_out(DEVICE(>apu_cpu[i]), 1, irq);
>  }
>
> -for (i = 0; i < XLNX_ZYNQMP_NUM_RPU_CPUS; i++) {
> -char *name;
> -
> -name = object_get_canonical_path_component(OBJECT(>rpu_cpu[i]));
> -if (strcmp(name, boot_cpu)) {
> -/* Secondary CPUs start in PSCI powered-down state */
> -object_property_set_bool(OBJECT(>rpu_cpu[i]), true,
> - "start-powered-off", _abort);
> -} else {
> -s->boot_cpu_ptr = >rpu_cpu[i];
> -}
> -g_free(name);
> -
> -object_property_set_bool(OBJECT(>rpu_cpu[i]), true, 
> "reset-hivecs",
> - _abort);
> -object_property_set_bool(OBJECT(>rpu_cpu[i]), true, "realized",
> - );
> +if (s->has_rpu) {
> +xlnx_zynqmp_create_rpu(s, boot_cpu, );
>  if (err) {
>  error_propagate(errp, err);
>  return;
> @@ -373,6 +386,7 @@ static void xlnx_zynqmp_realize(DeviceState *dev, Error 
> **errp)
>  static Property xlnx_zynqmp_props[] = {
>  DEFINE_PROP_STRING("boot-cpu", XlnxZynqMPState, boot_cpu),
>  DEFINE_PROP_BOOL("secure", XlnxZynqMPState, secure, false),
> +DEFINE_PROP_BOOL("has_rpu", XlnxZynqMPState, has_rpu, false),
>  DEFINE_PROP_END_OF_LIST()
>  };
>
> diff --git a/include/hw/arm/xlnx-zynqmp.h b/include/hw/arm/xlnx-zynqmp.h
> index 38d4c8c..68f6eb0 100644
> --- a/include/hw/arm/xlnx-zynqmp.h
> +++ b/include/hw/arm/xlnx-zynqmp.h
> @@ -87,6 +87,8 @@ typedef struct XlnxZynqMPState {
>
>  /* Has the ARM Security extensions?  */
>  bool secure;
> +/* Has the RPU subsystem?  */
> +bool has_rpu;
>  }  XlnxZynqMPState;
>
>  #define

[Qemu-devel] [PULL 00/31] Block layer patches

2016-05-25 Thread Kevin Wolf

The following changes since commit 287db79df8af8e31f18e262feb5e05103a09e4d4:

  Merge remote-tracking branch 'remotes/ehabkost/tags/x86-pull-request' into 
staging (2016-05-24 13:06:33 +0100)

are available in the git repository at:


  git://repo.or.cz/qemu/kevin.git tags/for-upstream

for you to fetch changes up to b75536c9fa742f887304769d0608557bb8e3a27f:

  blockjob: Remove BlockJob.bs (2016-05-25 19:04:21 +0200)


Block layer patches


Alberto Garcia (1):
  block: keep a list of block jobs

Eric Blake (1):
  block: Rename blk_write_zeroes()

John Snow (1):
  backup: Pack Notifier within BackupBlockJob

Kevin Wolf (17):
  block: Fix bdrv_next() memory leak
  block: Introduce bdrv_replace_child()
  block: Make bdrv_drain() use bdrv_drained_begin/end()
  block: Fix reconfiguring graph with drained nodes
  block: Propagate .drained_begin/end callbacks
  block: Cancel jobs first in bdrv_close_all()
  block: Default to enabled write cache in blk_new()
  block: Convert block job core to BlockBackend
  block: Make blk_co_preadv/pwritev() public
  stream: Use BlockBackend for I/O
  mirror: Allow target that already has a BlockBackend
  mirror: Use BlockBackend for I/O
  backup: Don't leak BackupBlockJob in error path
  backup: Remove bs parameter from backup_do_cow()
  backup: Use BlockBackend for I/O
  commit: Use BlockBackend for I/O
  blockjob: Remove BlockJob.bs

Max Reitz (9):
  block: Drop useless bdrv_new() call
  block: Let bdrv_open_inherit() return the snapshot
  tests: Drop BDS from test-throttle.c
  block: Drop blk_new_with_bs()
  block: Drop bdrv_new_root()
  block: Make bdrv_open() return a BDS
  block: Assert !bs->refcnt in bdrv_close()
  block: Drop bdrv_parent_cb_...() from bdrv_close()
  block: Drop errp parameter from blk_new()

Paolo Bonzini (2):
  dma-helpers: change interface to byte-based
  dma-helpers: change BlockBackend to opaque value in DMAIOFunc

 block.c| 245 -
 block/backup.c |  71 ++--
 block/block-backend.c  | 123 +
 block/commit.c |  53 +
 block/io.c |  97 +++-
 block/mirror.c | 100 -
 block/parallels.c  |   4 +-
 block/snapshot.c   |  55 ++---
 block/stream.c |  15 ++-
 block/vvfat.c  |   8 +-
 blockdev.c |  60 --
 blockjob.c |  62 ---
 dma-helpers.c  |  54 ++---
 hw/block/nvme.c|   6 +-
 hw/ide/ahci.c  |   6 +-
 hw/ide/core.c  |  20 ++--
 hw/ide/internal.h  |   6 +-
 hw/ide/macio.c |   2 +-
 hw/scsi/scsi-disk.c|   8 +-
 include/block/block.h  |  24 ++--
 include/block/block_int.h  |   3 +-
 include/block/blockjob.h   |  23 +++-
 include/sysemu/block-backend.h |  23 ++--
 include/sysemu/dma.h   |  20 ++--
 migration/block.c  |   4 +-
 monitor.c  |   4 +-
 qemu-img.c |   6 +-
 qemu-io-cmds.c |  22 ++--
 qmp.c  |   5 +-
 tests/qemu-iotests/041 |  27 -
 tests/qemu-iotests/041.out |   4 +-
 tests/test-blockjob-txn.c  |   3 +-
 tests/test-throttle.c  |   6 +-
 trace-events   |   8 +-
 34 files changed, 603 insertions(+), 574 deletions(-)

[Qemu-devel] [PULL 02/31] block: Drop useless bdrv_new() call

2016-05-25 Thread Kevin Wolf

From: Max Reitz 

bdrv_append_temp_snapshot() uses bdrv_new() to create an empty BDS
before invoking bdrv_open() on that BDS. This is probably a relict from
when it used to do some modifications on that empty BDS, but now that is
unnecessary, so we can just set bs_snapshot to NULL and let bdrv_open()
do the rest.

Signed-off-by: Max Reitz 
Reviewed-by: Alberto Garcia 
Reviewed-by: Kevin Wolf 
Signed-off-by: Kevin Wolf 
---
 block.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/block.c b/block.c
index d287771..f85c5a2 100644
--- a/block.c
+++ b/block.c
@@ -1470,8 +1470,7 @@ static int bdrv_append_temp_snapshot(BlockDriverState 
*bs, int flags,
 qdict_put(snapshot_options, "driver",
   qstring_from_str("qcow2"));
 
-bs_snapshot = bdrv_new();
-
+bs_snapshot = NULL;
 ret = bdrv_open(_snapshot, NULL, NULL, snapshot_options,
 flags, _err);
 snapshot_options = NULL;
-- 
1.8.3.1

[Qemu-devel] [PULL 01/31] block: Fix bdrv_next() memory leak

2016-05-25 Thread Kevin Wolf

The bdrv_next() users all leaked the BdrvNextIterator after completing
the iteration. Simply changing bdrv_next() to free the iterator before
returning NULL at the end of list doesn't work because some callers exit
the loop before looking at all BDSes.

This patch moves the BdrvNextIterator from the heap to the stack of
the caller and switches to a bdrv_first()/bdrv_next() interface for
initialising the iterator.

Signed-off-by: Kevin Wolf 
Reviewed-by: Fam Zheng 
---
 block.c   | 18 -
 block/block-backend.c | 41 +-
 block/io.c| 10 --
 block/snapshot.c  | 55 ++-
 blockdev.c|  4 ++--
 include/block/block.h | 15 --
 migration/block.c |  4 ++--
 monitor.c |  4 ++--
 qmp.c |  5 ++---
 9 files changed, 92 insertions(+), 64 deletions(-)

diff --git a/block.c b/block.c
index 1205ef8..d287771 100644
--- a/block.c
+++ b/block.c
@@ -3195,9 +3195,9 @@ void bdrv_invalidate_cache_all(Error **errp)
 {
 BlockDriverState *bs;
 Error *local_err = NULL;
-BdrvNextIterator *it = NULL;
+BdrvNextIterator it;
 
-while ((it = bdrv_next(it, )) != NULL) {
+for (bs = bdrv_first(); bs; bs = bdrv_next()) {
 AioContext *aio_context = bdrv_get_aio_context(bs);
 
 aio_context_acquire(aio_context);
@@ -3239,11 +3239,11 @@ static int bdrv_inactivate_recurse(BlockDriverState *bs,
 int bdrv_inactivate_all(void)
 {
 BlockDriverState *bs = NULL;
-BdrvNextIterator *it = NULL;
+BdrvNextIterator it;
 int ret = 0;
 int pass;
 
-while ((it = bdrv_next(it, )) != NULL) {
+for (bs = bdrv_first(); bs; bs = bdrv_next()) {
 aio_context_acquire(bdrv_get_aio_context(bs));
 }
 
@@ -3252,8 +3252,7 @@ int bdrv_inactivate_all(void)
  * the second pass sets the BDRV_O_INACTIVE flag so that no further write
  * is allowed. */
 for (pass = 0; pass < 2; pass++) {
-it = NULL;
-while ((it = bdrv_next(it, )) != NULL) {
+for (bs = bdrv_first(); bs; bs = bdrv_next()) {
 ret = bdrv_inactivate_recurse(bs, pass);
 if (ret < 0) {
 goto out;
@@ -3262,8 +3261,7 @@ int bdrv_inactivate_all(void)
 }
 
 out:
-it = NULL;
-while ((it = bdrv_next(it, )) != NULL) {
+for (bs = bdrv_first(); bs; bs = bdrv_next()) {
 aio_context_release(bdrv_get_aio_context(bs));
 }
 
@@ -3753,10 +3751,10 @@ bool bdrv_recurse_is_first_non_filter(BlockDriverState 
*bs,
 bool bdrv_is_first_non_filter(BlockDriverState *candidate)
 {
 BlockDriverState *bs;
-BdrvNextIterator *it = NULL;
+BdrvNextIterator it;
 
 /* walk down the bs forest recursively */
-while ((it = bdrv_next(it, )) != NULL) {
+for (bs = bdrv_first(); bs; bs = bdrv_next()) {
 bool perm;
 
 /* try to recurse in this top level bs */
diff --git a/block/block-backend.c b/block/block-backend.c
index 6928d61..9f306f8 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -286,25 +286,11 @@ BlockBackend *blk_next(BlockBackend *blk)
: QTAILQ_FIRST(_block_backends);
 }
 
-struct BdrvNextIterator {
-enum {
-BDRV_NEXT_BACKEND_ROOTS,
-BDRV_NEXT_MONITOR_OWNED,
-} phase;
-BlockBackend *blk;
-BlockDriverState *bs;
-};
-
 /* Iterates over all top-level BlockDriverStates, i.e. BDSs that are owned by
  * the monitor or attached to a BlockBackend */
-BdrvNextIterator *bdrv_next(BdrvNextIterator *it, BlockDriverState **bs)
+BlockDriverState *bdrv_next(BdrvNextIterator *it)
 {
-if (!it) {
-it = g_new(BdrvNextIterator, 1);
-*it = (BdrvNextIterator) {
-.phase = BDRV_NEXT_BACKEND_ROOTS,
-};
-}
+BlockDriverState *bs;
 
 /* First, return all root nodes of BlockBackends. In order to avoid
  * returning a BDS twice when multiple BBs refer to it, we only return it
@@ -312,11 +298,11 @@ BdrvNextIterator *bdrv_next(BdrvNextIterator *it, 
BlockDriverState **bs)
 if (it->phase == BDRV_NEXT_BACKEND_ROOTS) {
 do {
 it->blk = blk_all_next(it->blk);
-*bs = it->blk ? blk_bs(it->blk) : NULL;
-} while (it->blk && (*bs == NULL || bdrv_first_blk(*bs) != it->blk));
+bs = it->blk ? blk_bs(it->blk) : NULL;
+} while (it->blk && (bs == NULL || bdrv_first_blk(bs) != it->blk));
 
-if (*bs) {
-return it;
+if (bs) {
+return bs;
 }
 it->phase = BDRV_NEXT_MONITOR_OWNED;
 }
@@ -326,10 +312,19 @@ BdrvNextIterator *bdrv_next(BdrvNextIterator *it, 
BlockDriverState **bs)
  * by the above block already */
 do {
 it->bs = bdrv_next_monitor_owned(it->bs);
-*bs = it->bs;
-} while (*bs && bdrv_has_blk(*bs));
+bs = it->bs;
+} while (bs && bdrv_has_blk(bs));
+
+return bs;
+}
+

[Qemu-devel] [PULL 13/31] block: Fix reconfiguring graph with drained nodes

2016-05-25 Thread Kevin Wolf

When changing the BlockDriverState that a BdrvChild points to while the
node is currently drained, we must call the .drained_end() parent
callback. Conversely, when this means attaching a new node that is
already drained, we need to call .drained_begin().

bdrv_root_attach_child() takes now an opaque parameter, which is needed
because the callbacks must also be called if we're attaching a new child
to the BlockBackend when the root node is already drained, and they need
a way to identify the BlockBackend. Previously, child->opaque was set
too late and the callbacks would still see it as NULL.

Signed-off-by: Kevin Wolf 
Reviewed-by: Eric Blake 
Reviewed-by: Fam Zheng 
---
 block.c   | 18 ++
 block/block-backend.c |  9 +
 include/block/block_int.h |  3 ++-
 3 files changed, 21 insertions(+), 9 deletions(-)

diff --git a/block.c b/block.c
index 351344e..598624f 100644
--- a/block.c
+++ b/block.c
@@ -1155,24 +1155,33 @@ static void bdrv_replace_child(BdrvChild *child, 
BlockDriverState *new_bs)
 BlockDriverState *old_bs = child->bs;
 
 if (old_bs) {
+if (old_bs->quiesce_counter && child->role->drained_end) {
+child->role->drained_end(child);
+}
 QLIST_REMOVE(child, next_parent);
 }
+
+child->bs = new_bs;
+
 if (new_bs) {
 QLIST_INSERT_HEAD(_bs->parents, child, next_parent);
+if (new_bs->quiesce_counter && child->role->drained_begin) {
+child->role->drained_begin(child);
+}
 }
-
-child->bs = new_bs;
 }
 
 BdrvChild *bdrv_root_attach_child(BlockDriverState *child_bs,
   const char *child_name,
-  const BdrvChildRole *child_role)
+  const BdrvChildRole *child_role,
+  void *opaque)
 {
 BdrvChild *child = g_new(BdrvChild, 1);
 *child = (BdrvChild) {
 .bs = NULL,
 .name   = g_strdup(child_name),
 .role   = child_role,
+.opaque = opaque,
 };
 
 bdrv_replace_child(child, child_bs);
@@ -1185,7 +1194,8 @@ BdrvChild *bdrv_attach_child(BlockDriverState *parent_bs,
  const char *child_name,
  const BdrvChildRole *child_role)
 {
-BdrvChild *child = bdrv_root_attach_child(child_bs, child_name, 
child_role);
+BdrvChild *child = bdrv_root_attach_child(child_bs, child_name, child_role,
+  NULL);
 QLIST_INSERT_HEAD(_bs->children, child, next);
 return child;
 }
diff --git a/block/block-backend.c b/block/block-backend.c
index 4e8298b..14e528e 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -161,8 +161,7 @@ BlockBackend *blk_new_open(const char *filename, const char 
*reference,
 }
 
 blk_set_enable_write_cache(blk, true);
-blk->root = bdrv_root_attach_child(bs, "root", _root);
-blk->root->opaque = blk;
+blk->root = bdrv_root_attach_child(bs, "root", _root, blk);
 
 return blk;
 }
@@ -481,8 +480,7 @@ void blk_remove_bs(BlockBackend *blk)
 void blk_insert_bs(BlockBackend *blk, BlockDriverState *bs)
 {
 bdrv_ref(bs);
-blk->root = bdrv_root_attach_child(bs, "root", _root);
-blk->root->opaque = blk;
+blk->root = bdrv_root_attach_child(bs, "root", _root, blk);
 
 notifier_list_notify(>insert_bs_notifiers, blk);
 if (blk->public.throttle_state) {
@@ -1676,6 +1674,9 @@ static void blk_root_drained_begin(BdrvChild *child)
 {
 BlockBackend *blk = child->opaque;
 
+/* Note that blk->root may not be accessible here yet if we are just
+ * attaching to a BlockDriverState that is drained. Use child instead. */
+
 if (blk->public.io_limits_disabled++ == 0) {
 throttle_group_restart_blk(blk);
 }
diff --git a/include/block/block_int.h b/include/block/block_int.h
index b6f4755..30a9717 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -719,7 +719,8 @@ void hmp_drive_add_node(Monitor *mon, const char *optstr);
 
 BdrvChild *bdrv_root_attach_child(BlockDriverState *child_bs,
   const char *child_name,
-  const BdrvChildRole *child_role);
+  const BdrvChildRole *child_role,
+  void *opaque);
 void bdrv_root_unref_child(BdrvChild *child);
 
 const char *bdrv_get_parent_name(const BlockDriverState *bs);
-- 
1.8.3.1

[Qemu-devel] [PULL 10/31] block: Drop errp parameter from blk_new()

2016-05-25 Thread Kevin Wolf

From: Max Reitz 

blk_new() cannot fail so its Error ** parameter has become superfluous.

Signed-off-by: Max Reitz 
Signed-off-by: Kevin Wolf 
---
 block/block-backend.c  | 9 ++---
 blockdev.c | 6 +-
 include/sysemu/block-backend.h | 2 +-
 tests/test-throttle.c  | 6 +++---
 4 files changed, 7 insertions(+), 16 deletions(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index bc1f071..4e8298b 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -119,7 +119,7 @@ static const BdrvChildRole child_root = {
  * Store an error through @errp on failure, unless it's null.
  * Return the new BlockBackend on success, null on failure.
  */
-BlockBackend *blk_new(Error **errp)
+BlockBackend *blk_new(void)
 {
 BlockBackend *blk;
 
@@ -153,12 +153,7 @@ BlockBackend *blk_new_open(const char *filename, const 
char *reference,
 BlockBackend *blk;
 BlockDriverState *bs;
 
-blk = blk_new(errp);
-if (!blk) {
-QDECREF(options);
-return NULL;
-}
-
+blk = blk_new();
 bs = bdrv_open(filename, reference, options, flags, errp);
 if (!bs) {
 blk_unref(blk);
diff --git a/blockdev.c b/blockdev.c
index 7d4a1ba..af0b022 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -567,11 +567,7 @@ static BlockBackend *blockdev_init(const char *file, QDict 
*bs_opts,
 if ((!file || !*file) && !qdict_size(bs_opts)) {
 BlockBackendRootState *blk_rs;
 
-blk = blk_new(errp);
-if (!blk) {
-goto early_err;
-}
-
+blk = blk_new();
 blk_rs = blk_get_root_state(blk);
 blk_rs->open_flags= bdrv_flags;
 blk_rs->read_only = !(bdrv_flags & BDRV_O_RDWR);
diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h
index d0db3c3..9d6615c 100644
--- a/include/sysemu/block-backend.h
+++ b/include/sysemu/block-backend.h
@@ -78,7 +78,7 @@ typedef struct BlockBackendPublic {
 QLIST_ENTRY(BlockBackendPublic) round_robin;
 } BlockBackendPublic;
 
-BlockBackend *blk_new(Error **errp);
+BlockBackend *blk_new(void);
 BlockBackend *blk_new_open(const char *filename, const char *reference,
QDict *options, int flags, Error **errp);
 int blk_get_refcnt(BlockBackend *blk);
diff --git a/tests/test-throttle.c b/tests/test-throttle.c
index d7fb0a6..c02be80 100644
--- a/tests/test-throttle.c
+++ b/tests/test-throttle.c
@@ -578,9 +578,9 @@ static void test_groups(void)
 BlockBackend *blk1, *blk2, *blk3;
 BlockBackendPublic *blkp1, *blkp2, *blkp3;
 
-blk1 = blk_new(_abort);
-blk2 = blk_new(_abort);
-blk3 = blk_new(_abort);
+blk1 = blk_new();
+blk2 = blk_new();
+blk3 = blk_new();
 
 blkp1 = blk_get_public(blk1);
 blkp2 = blk_get_public(blk2);
-- 
1.8.3.1

[Qemu-devel] [PULL 15/31] dma-helpers: change interface to byte-based

2016-05-25 Thread Kevin Wolf

From: Paolo Bonzini 

Signed-off-by: Paolo Bonzini 
Signed-off-by: Kevin Wolf 
---
 dma-helpers.c| 14 +++---
 hw/block/nvme.c  |  6 +++---
 hw/ide/ahci.c|  6 --
 hw/ide/core.c|  8 +---
 hw/scsi/scsi-disk.c  |  6 --
 include/sysemu/dma.h |  6 +++---
 trace-events |  2 +-
 7 files changed, 27 insertions(+), 21 deletions(-)

diff --git a/dma-helpers.c b/dma-helpers.c
index a6cc15f..af07aab 100644
--- a/dma-helpers.c
+++ b/dma-helpers.c
@@ -192,18 +192,18 @@ static const AIOCBInfo dma_aiocb_info = {
 };
 
 BlockAIOCB *dma_blk_io(
-BlockBackend *blk, QEMUSGList *sg, uint64_t sector_num,
+BlockBackend *blk, QEMUSGList *sg, uint64_t offset,
 DMAIOFunc *io_func, BlockCompletionFunc *cb,
 void *opaque, DMADirection dir)
 {
 DMAAIOCB *dbs = blk_aio_get(_aiocb_info, blk, cb, opaque);
 
-trace_dma_blk_io(dbs, blk, sector_num, (dir == DMA_DIRECTION_TO_DEVICE));
+trace_dma_blk_io(dbs, blk, offset, (dir == DMA_DIRECTION_TO_DEVICE));
 
 dbs->acb = NULL;
 dbs->blk = blk;
 dbs->sg = sg;
-dbs->offset = sector_num << BDRV_SECTOR_BITS;
+dbs->offset = offset;
 dbs->sg_cur_index = 0;
 dbs->sg_cur_byte = 0;
 dbs->dir = dir;
@@ -216,18 +216,18 @@ BlockAIOCB *dma_blk_io(
 
 
 BlockAIOCB *dma_blk_read(BlockBackend *blk,
- QEMUSGList *sg, uint64_t sector,
+ QEMUSGList *sg, uint64_t offset,
  void (*cb)(void *opaque, int ret), void *opaque)
 {
-return dma_blk_io(blk, sg, sector, blk_aio_preadv, cb, opaque,
+return dma_blk_io(blk, sg, offset, blk_aio_preadv, cb, opaque,
   DMA_DIRECTION_FROM_DEVICE);
 }
 
 BlockAIOCB *dma_blk_write(BlockBackend *blk,
-  QEMUSGList *sg, uint64_t sector,
+  QEMUSGList *sg, uint64_t offset,
   void (*cb)(void *opaque, int ret), void *opaque)
 {
-return dma_blk_io(blk, sg, sector, blk_aio_pwritev, cb, opaque,
+return dma_blk_io(blk, sg, offset, blk_aio_pwritev, cb, opaque,
   DMA_DIRECTION_TO_DEVICE);
 }
 
diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 173988e..9faad29 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -239,7 +239,7 @@ static uint16_t nvme_rw(NvmeCtrl *n, NvmeNamespace *ns, 
NvmeCmd *cmd,
 uint8_t lba_index  = NVME_ID_NS_FLBAS_INDEX(ns->id_ns.flbas);
 uint8_t data_shift = ns->id_ns.lbaf[lba_index].ds;
 uint64_t data_size = (uint64_t)nlb << data_shift;
-uint64_t aio_slba  = slba << (data_shift - BDRV_SECTOR_BITS);
+uint64_t data_offset = slba << data_shift;
 int is_write = rw->opcode == NVME_CMD_WRITE ? 1 : 0;
 enum BlockAcctType acct = is_write ? BLOCK_ACCT_WRITE : BLOCK_ACCT_READ;
 
@@ -258,8 +258,8 @@ static uint16_t nvme_rw(NvmeCtrl *n, NvmeNamespace *ns, 
NvmeCmd *cmd,
 req->has_sg = true;
 dma_acct_start(n->conf.blk, >acct, >qsg, acct);
 req->aiocb = is_write ?
-dma_blk_write(n->conf.blk, >qsg, aio_slba, nvme_rw_cb, req) :
-dma_blk_read(n->conf.blk, >qsg, aio_slba, nvme_rw_cb, req);
+dma_blk_write(n->conf.blk, >qsg, data_offset, nvme_rw_cb, req) :
+dma_blk_read(n->conf.blk, >qsg, data_offset, nvme_rw_cb, req);
 
 return NVME_NO_COMPLETE;
 }
diff --git a/hw/ide/ahci.c b/hw/ide/ahci.c
index f244bc0..502d4f1 100644
--- a/hw/ide/ahci.c
+++ b/hw/ide/ahci.c
@@ -1006,7 +1006,8 @@ static void execute_ncq_command(NCQTransferState *ncq_tfs)
 dma_acct_start(ide_state->blk, _tfs->acct,
_tfs->sglist, BLOCK_ACCT_READ);
 ncq_tfs->aiocb = dma_blk_read(ide_state->blk, _tfs->sglist,
-  ncq_tfs->lba, ncq_cb, ncq_tfs);
+  ncq_tfs->lba << BDRV_SECTOR_BITS,
+  ncq_cb, ncq_tfs);
 break;
 case WRITE_FPDMA_QUEUED:
 DPRINTF(port, "NCQ writing %d sectors to LBA %"PRId64", tag %d\n",
@@ -1018,7 +1019,8 @@ static void execute_ncq_command(NCQTransferState *ncq_tfs)
 dma_acct_start(ide_state->blk, _tfs->acct,
_tfs->sglist, BLOCK_ACCT_WRITE);
 ncq_tfs->aiocb = dma_blk_write(ide_state->blk, _tfs->sglist,
-   ncq_tfs->lba, ncq_cb, ncq_tfs);
+   ncq_tfs->lba << BDRV_SECTOR_BITS,
+   ncq_cb, ncq_tfs);
 break;
 default:
 DPRINTF(port, "error: unsupported NCQ command (0x%02x) received\n",
diff --git a/hw/ide/core.c b/hw/ide/core.c
index fe2bfba..7326189 100644
--- a/hw/ide/core.c
+++ b/hw/ide/core.c
@@ -799,6 +799,7 @@ static void ide_dma_cb(void *opaque, int ret)
 IDEState *s = opaque;
 int n;
 int64_t sector_num;
+uint64_t offset;
 bool stay_active = false;
 
 if (ret == -ECANCELED) {
@@ -859,17 +860,18 @@ static

[Qemu-devel] [PULL 18/31] block: keep a list of block jobs

2016-05-25 Thread Kevin Wolf

From: Alberto Garcia 

The current way to obtain the list of existing block jobs is to
iterate over all root nodes and check which ones own a job.

Since we want to be able to support block jobs in other nodes as well,
this patch keeps a list of jobs that is updated every time one is
created or destroyed.

Signed-off-by: Alberto Garcia 
Signed-off-by: Kevin Wolf 
Reviewed-by: Max Reitz 
---
 blockjob.c   | 13 +
 include/block/blockjob.h | 14 ++
 2 files changed, 27 insertions(+)

diff --git a/blockjob.c b/blockjob.c
index 5b840a7..0f1bc77 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -50,6 +50,16 @@ struct BlockJobTxn {
 int refcnt;
 };
 
+static QLIST_HEAD(, BlockJob) block_jobs = QLIST_HEAD_INITIALIZER(block_jobs);
+
+BlockJob *block_job_next(BlockJob *job)
+{
+if (!job) {
+return QLIST_FIRST(_jobs);
+}
+return QLIST_NEXT(job, job_list);
+}
+
 void *block_job_create(const BlockJobDriver *driver, BlockDriverState *bs,
int64_t speed, BlockCompletionFunc *cb,
void *opaque, Error **errp)
@@ -76,6 +86,8 @@ void *block_job_create(const BlockJobDriver *driver, 
BlockDriverState *bs,
 job->refcnt= 1;
 bs->job = job;
 
+QLIST_INSERT_HEAD(_jobs, job, job_list);
+
 /* Only set speed when necessary to avoid NotSupported error */
 if (speed != 0) {
 Error *local_err = NULL;
@@ -103,6 +115,7 @@ void block_job_unref(BlockJob *job)
 bdrv_unref(job->bs);
 error_free(job->blocker);
 g_free(job->id);
+QLIST_REMOVE(job, job_list);
 g_free(job);
 }
 }
diff --git a/include/block/blockjob.h b/include/block/blockjob.h
index 073a433..30bb2c6 100644
--- a/include/block/blockjob.h
+++ b/include/block/blockjob.h
@@ -135,6 +135,9 @@ struct BlockJob {
  */
 bool deferred_to_main_loop;
 
+/** Element of the list of block jobs */
+QLIST_ENTRY(BlockJob) job_list;
+
 /** Status that is published by the query-block-jobs QMP API */
 BlockDeviceIoStatus iostatus;
 
@@ -173,6 +176,17 @@ struct BlockJob {
 };
 
 /**
+ * block_job_next:
+ * @job: A block job, or %NULL.
+ *
+ * Get the next element from the list of block jobs after @job, or the
+ * first one if @job is %NULL.
+ *
+ * Returns the requested job, or %NULL if there are no more jobs left.
+ */
+BlockJob *block_job_next(BlockJob *job);
+
+/**
  * block_job_create:
  * @job_type: The class object for the newly-created job.
  * @bs: The block
-- 
1.8.3.1

[Qemu-devel] [PULL 27/31] backup: Pack Notifier within BackupBlockJob

2016-05-25 Thread Kevin Wolf

From: John Snow 

Instead of relying on peeking at bs->job, we want to explicitly get
a reference to the job that was involved in this notifier callback.

Pack the Notifier inside of the BackupBlockJob so we can use
container_of to get a reference back to the BackupBlockJob object.

This cuts out one more case where we rely unnecessarily on bs->job.

Suggested-by: Paolo Bonzini 
Signed-off-by: John Snow 
Signed-off-by: Kevin Wolf 
Reviewed-by: Max Reitz 
---
 block/backup.c | 19 ++-
 1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/block/backup.c b/block/backup.c
index a990cf1..a57288f 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -47,6 +47,7 @@ typedef struct BackupBlockJob {
 uint64_t sectors_read;
 unsigned long *done_bitmap;
 int64_t cluster_size;
+NotifierWithReturn before_write;
 QLIST_HEAD(, CowRequest) inflight_reqs;
 } BackupBlockJob;
 
@@ -94,11 +95,11 @@ static void cow_request_end(CowRequest *req)
 }
 
 static int coroutine_fn backup_do_cow(BlockDriverState *bs,
+  BackupBlockJob *job,
   int64_t sector_num, int nb_sectors,
   bool *error_is_read,
   bool is_write_notifier)
 {
-BackupBlockJob *job = (BackupBlockJob *)bs->job;
 CowRequest cow_request;
 struct iovec iov;
 QEMUIOVector bounce_qiov;
@@ -197,6 +198,7 @@ static int coroutine_fn backup_before_write_notify(
 NotifierWithReturn *notifier,
 void *opaque)
 {
+BackupBlockJob *job = container_of(notifier, BackupBlockJob, before_write);
 BdrvTrackedRequest *req = opaque;
 int64_t sector_num = req->offset >> BDRV_SECTOR_BITS;
 int nb_sectors = req->bytes >> BDRV_SECTOR_BITS;
@@ -204,7 +206,8 @@ static int coroutine_fn backup_before_write_notify(
 assert((req->offset & (BDRV_SECTOR_SIZE - 1)) == 0);
 assert((req->bytes & (BDRV_SECTOR_SIZE - 1)) == 0);
 
-return backup_do_cow(req->bs, sector_num, nb_sectors, NULL, true);
+return backup_do_cow(req->bs, job, sector_num,
+ nb_sectors, NULL, true);
 }
 
 static void backup_set_speed(BlockJob *job, int64_t speed, Error **errp)
@@ -343,7 +346,7 @@ static int coroutine_fn 
backup_run_incremental(BackupBlockJob *job)
 if (yield_and_check(job)) {
 return ret;
 }
-ret = backup_do_cow(bs, cluster * sectors_per_cluster,
+ret = backup_do_cow(bs, job, cluster * sectors_per_cluster,
 sectors_per_cluster, _is_read,
 false);
 if ((ret < 0) &&
@@ -378,9 +381,6 @@ static void coroutine_fn backup_run(void *opaque)
 BackupCompleteData *data;
 BlockDriverState *bs = job->common.bs;
 BlockDriverState *target = job->target;
-NotifierWithReturn before_write = {
-.notify = backup_before_write_notify,
-};
 int64_t start, end;
 int64_t sectors_per_cluster = cluster_size_sectors(job);
 int ret = 0;
@@ -393,7 +393,8 @@ static void coroutine_fn backup_run(void *opaque)
 
 job->done_bitmap = bitmap_new(end);
 
-bdrv_add_before_write_notifier(bs, _write);
+job->before_write.notify = backup_before_write_notify;
+bdrv_add_before_write_notifier(bs, >before_write);
 
 if (job->sync_mode == MIRROR_SYNC_MODE_NONE) {
 while (!block_job_is_cancelled(>common)) {
@@ -445,7 +446,7 @@ static void coroutine_fn backup_run(void *opaque)
 }
 }
 /* FULL sync mode we copy the whole drive. */
-ret = backup_do_cow(bs, start * sectors_per_cluster,
+ret = backup_do_cow(bs, job, start * sectors_per_cluster,
 sectors_per_cluster, _is_read, false);
 if (ret < 0) {
 /* Depending on error action, fail now or retry cluster */
@@ -461,7 +462,7 @@ static void coroutine_fn backup_run(void *opaque)
 }
 }
 
-notifier_with_return_remove(_write);
+notifier_with_return_remove(>before_write);
 
 /* wait until pending backup_do_cow() calls have completed */
 qemu_co_rwlock_wrlock(>flush_rwlock);
-- 
1.8.3.1

[Qemu-devel] [PULL 14/31] block: Propagate .drained_begin/end callbacks

2016-05-25 Thread Kevin Wolf

When draining intermediate nodes (i.e. nodes that aren't the root node
for at least one of their parents; with node references, the user can
always configure the graph to create this situation), we need to
propagate the .drained_begin/end callbacks all the way up to the root
for the drain to be effective.

Signed-off-by: Kevin Wolf 
Reviewed-by: Eric Blake 
Reviewed-by: Fam Zheng 
---
 block.c | 20 +++-
 1 file changed, 19 insertions(+), 1 deletion(-)

diff --git a/block.c b/block.c
index 598624f..45cddd6 100644
--- a/block.c
+++ b/block.c
@@ -659,6 +659,18 @@ int bdrv_parse_cache_mode(const char *mode, int *flags, 
bool *writethrough)
 return 0;
 }
 
+static void bdrv_child_cb_drained_begin(BdrvChild *child)
+{
+BlockDriverState *bs = child->opaque;
+bdrv_drained_begin(bs);
+}
+
+static void bdrv_child_cb_drained_end(BdrvChild *child)
+{
+BlockDriverState *bs = child->opaque;
+bdrv_drained_end(bs);
+}
+
 /*
  * Returns the options and flags that a temporary snapshot should get, based on
  * the originally requested flags (the originally requested image will have
@@ -705,6 +717,8 @@ static void bdrv_inherited_options(int *child_flags, QDict 
*child_options,
 
 const BdrvChildRole child_file = {
 .inherit_options = bdrv_inherited_options,
+.drained_begin   = bdrv_child_cb_drained_begin,
+.drained_end = bdrv_child_cb_drained_end,
 };
 
 /*
@@ -723,6 +737,8 @@ static void bdrv_inherited_fmt_options(int *child_flags, 
QDict *child_options,
 
 const BdrvChildRole child_format = {
 .inherit_options = bdrv_inherited_fmt_options,
+.drained_begin   = bdrv_child_cb_drained_begin,
+.drained_end = bdrv_child_cb_drained_end,
 };
 
 /*
@@ -750,6 +766,8 @@ static void bdrv_backing_options(int *child_flags, QDict 
*child_options,
 
 static const BdrvChildRole child_backing = {
 .inherit_options = bdrv_backing_options,
+.drained_begin   = bdrv_child_cb_drained_begin,
+.drained_end = bdrv_child_cb_drained_end,
 };
 
 static int bdrv_open_flags(BlockDriverState *bs, int flags)
@@ -1195,7 +1213,7 @@ BdrvChild *bdrv_attach_child(BlockDriverState *parent_bs,
  const BdrvChildRole *child_role)
 {
 BdrvChild *child = bdrv_root_attach_child(child_bs, child_name, child_role,
-  NULL);
+  parent_bs);
 QLIST_INSERT_HEAD(_bs->children, child, next);
 return child;
 }
-- 
1.8.3.1

[Qemu-devel] [PULL 23/31] stream: Use BlockBackend for I/O

2016-05-25 Thread Kevin Wolf

This changes the streaming block job to use the job's BlockBackend for
performing the COR reads. job->bs isn't used by the streaming code any
more afterwards.

Signed-off-by: Kevin Wolf 
Reviewed-by: Eric Blake 
Reviewed-by: Alberto Garcia 
Reviewed-by: Max Reitz 
---
 block/io.c|  9 -
 block/stream.c| 15 +--
 include/block/block.h |  2 --
 trace-events  |  1 -
 4 files changed, 9 insertions(+), 18 deletions(-)

diff --git a/block/io.c b/block/io.c
index 9bc1d45..bc2eda2 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1117,15 +1117,6 @@ int coroutine_fn 
bdrv_co_readv_no_serialising(BlockDriverState *bs,
 BDRV_REQ_NO_SERIALISING);
 }
 
-int coroutine_fn bdrv_co_copy_on_readv(BlockDriverState *bs,
-int64_t sector_num, int nb_sectors, QEMUIOVector *qiov)
-{
-trace_bdrv_co_copy_on_readv(bs, sector_num, nb_sectors);
-
-return bdrv_co_do_readv(bs, sector_num, nb_sectors, qiov,
-BDRV_REQ_COPY_ON_READ);
-}
-
 #define MAX_WRITE_ZEROES_BOUNCE_BUFFER 32768
 
 static int coroutine_fn bdrv_co_do_write_zeroes(BlockDriverState *bs,
diff --git a/block/stream.c b/block/stream.c
index 40aa322..c0efbda 100644
--- a/block/stream.c
+++ b/block/stream.c
@@ -39,7 +39,7 @@ typedef struct StreamBlockJob {
 char *backing_file_str;
 } StreamBlockJob;
 
-static int coroutine_fn stream_populate(BlockDriverState *bs,
+static int coroutine_fn stream_populate(BlockBackend *blk,
 int64_t sector_num, int nb_sectors,
 void *buf)
 {
@@ -52,7 +52,8 @@ static int coroutine_fn stream_populate(BlockDriverState *bs,
 qemu_iovec_init_external(, , 1);
 
 /* Copy-on-read the unallocated clusters */
-return bdrv_co_copy_on_readv(bs, sector_num, nb_sectors, );
+return blk_co_preadv(blk, sector_num * BDRV_SECTOR_SIZE, qiov.size, ,
+ BDRV_REQ_COPY_ON_READ);
 }
 
 typedef struct {
@@ -64,6 +65,7 @@ static void stream_complete(BlockJob *job, void *opaque)
 {
 StreamBlockJob *s = container_of(job, StreamBlockJob, common);
 StreamCompleteData *data = opaque;
+BlockDriverState *bs = blk_bs(job->blk);
 BlockDriverState *base = s->base;
 
 if (!block_job_is_cancelled(>common) && data->reached_end &&
@@ -75,8 +77,8 @@ static void stream_complete(BlockJob *job, void *opaque)
 base_fmt = base->drv->format_name;
 }
 }
-data->ret = bdrv_change_backing_file(job->bs, base_id, base_fmt);
-bdrv_set_backing_hd(job->bs, base);
+data->ret = bdrv_change_backing_file(bs, base_id, base_fmt);
+bdrv_set_backing_hd(bs, base);
 }
 
 g_free(s->backing_file_str);
@@ -88,7 +90,8 @@ static void coroutine_fn stream_run(void *opaque)
 {
 StreamBlockJob *s = opaque;
 StreamCompleteData *data;
-BlockDriverState *bs = s->common.bs;
+BlockBackend *blk = s->common.blk;
+BlockDriverState *bs = blk_bs(blk);
 BlockDriverState *base = s->base;
 int64_t sector_num = 0;
 int64_t end = -1;
@@ -159,7 +162,7 @@ wait:
 goto wait;
 }
 }
-ret = stream_populate(bs, sector_num, n, buf);
+ret = stream_populate(blk, sector_num, n, buf);
 }
 if (ret < 0) {
 BlockErrorAction action =
diff --git a/include/block/block.h b/include/block/block.h
index e899f7d..0a4b973 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -242,8 +242,6 @@ int bdrv_pwrite_sync(BlockDriverState *bs, int64_t offset,
 const void *buf, int count);
 int coroutine_fn bdrv_co_readv(BlockDriverState *bs, int64_t sector_num,
 int nb_sectors, QEMUIOVector *qiov);
-int coroutine_fn bdrv_co_copy_on_readv(BlockDriverState *bs,
-int64_t sector_num, int nb_sectors, QEMUIOVector *qiov);
 int coroutine_fn bdrv_co_readv_no_serialising(BlockDriverState *bs,
 int64_t sector_num, int nb_sectors, QEMUIOVector *qiov);
 int coroutine_fn bdrv_co_writev(BlockDriverState *bs, int64_t sector_num,
diff --git a/trace-events b/trace-events
index b85fb49..6966dd6 100644
--- a/trace-events
+++ b/trace-events
@@ -72,7 +72,6 @@ bdrv_aio_readv(void *bs, int64_t sector_num, int nb_sectors, 
void *opaque) "bs %
 bdrv_aio_writev(void *bs, int64_t sector_num, int nb_sectors, void *opaque) 
"bs %p sector_num %"PRId64" nb_sectors %d opaque %p"
 bdrv_aio_write_zeroes(void *bs, int64_t sector_num, int nb_sectors, int flags, 
void *opaque) "bs %p sector_num %"PRId64" nb_sectors %d flags %#x opaque %p"
 bdrv_co_readv(void *bs, int64_t sector_num, int nb_sector) "bs %p sector_num 
%"PRId64" nb_sectors %d"
-bdrv_co_copy_on_readv(void *bs, int64_t sector_num, int nb_sector) "bs %p 
sector_num %"PRId64" nb_sectors %d"
 bdrv_co_readv_no_serialising(void *bs, int64_t sector_num, int nb_sector) "bs 
%p sector_num %"PRId64"

Re: [Qemu-devel] [PATCH v2 4/4] xlnx-zynqmp: Use the in kernel GIC model for KVM runs

2016-05-25 Thread Alistair Francis

On Wed, May 25, 2016 at 3:52 AM, Edgar E. Iglesias
 wrote:
> From: "Edgar E. Iglesias" 
>
> Use the in kernel GIC model when running with KVM enabled.
>
> Reviewed-by: Peter Maydell 
> Signed-off-by: Edgar E. Iglesias 

Reviewed-by: Alistair Francis 

Thanks,

Alistair

> ---
>  hw/arm/xlnx-zynqmp.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/hw/arm/xlnx-zynqmp.c b/hw/arm/xlnx-zynqmp.c
> index db5b82b..9a1bc94 100644
> --- a/hw/arm/xlnx-zynqmp.c
> +++ b/hw/arm/xlnx-zynqmp.c
> @@ -22,6 +22,8 @@
>  #include "hw/arm/xlnx-zynqmp.h"
>  #include "hw/intc/arm_gic_common.h"
>  #include "exec/address-spaces.h"
> +#include "sysemu/kvm.h"
> +#include "kvm_arm.h"
>
>  #define GIC_NUM_SPI_INTR 160
>
> @@ -135,7 +137,7 @@ static void xlnx_zynqmp_init(Object *obj)
>   qdev_prop_allow_set_link_before_realize,
>   OBJ_PROP_LINK_UNREF_ON_RELEASE, _abort);
>
> -object_initialize(>gic, sizeof(s->gic), TYPE_ARM_GIC);
> +object_initialize(>gic, sizeof(s->gic), gic_class_name());
>  qdev_set_parent_bus(DEVICE(>gic), sysbus_get_default());
>
>  for (i = 0; i < XLNX_ZYNQMP_NUM_GEMS; i++) {
> --
> 2.5.0
>
>

[Qemu-devel] [PULL 12/31] block: Make bdrv_drain() use bdrv_drained_begin/end()

2016-05-25 Thread Kevin Wolf

Until now, bdrv_drained_begin() used bdrv_drain() internally to drain
the queue. This is kind of backwards and caused quiescing code to be
duplicated because bdrv_drained_begin() had to ensure that no new
requests come in even after bdrv_drain() returns, whereas bdrv_drain()
had to have them because it could be called from other places.

Instead move the bdrv_drain() code to bdrv_drained_begin() and make
bdrv_drain() a simple wrapper around bdrv_drained_begin/end().

Signed-off-by: Kevin Wolf 
Reviewed-by: Eric Blake 
Reviewed-by: Fam Zheng 
---
 block/io.c | 69 ++
 1 file changed, 33 insertions(+), 36 deletions(-)

diff --git a/block/io.c b/block/io.c
index 2a28d63..9bc1d45 100644
--- a/block/io.c
+++ b/block/io.c
@@ -225,6 +225,34 @@ static void coroutine_fn 
bdrv_co_yield_to_drain(BlockDriverState *bs)
 assert(data.done);
 }
 
+void bdrv_drained_begin(BlockDriverState *bs)
+{
+if (!bs->quiesce_counter++) {
+aio_disable_external(bdrv_get_aio_context(bs));
+bdrv_parent_drained_begin(bs);
+}
+
+bdrv_io_unplugged_begin(bs);
+bdrv_drain_recurse(bs);
+if (qemu_in_coroutine()) {
+bdrv_co_yield_to_drain(bs);
+} else {
+bdrv_drain_poll(bs);
+}
+bdrv_io_unplugged_end(bs);
+}
+
+void bdrv_drained_end(BlockDriverState *bs)
+{
+assert(bs->quiesce_counter > 0);
+if (--bs->quiesce_counter > 0) {
+return;
+}
+
+bdrv_parent_drained_end(bs);
+aio_enable_external(bdrv_get_aio_context(bs));
+}
+
 /*
  * Wait for pending requests to complete on a single BlockDriverState subtree,
  * and suspend block driver's internal I/O until next request arrives.
@@ -238,26 +266,15 @@ static void coroutine_fn 
bdrv_co_yield_to_drain(BlockDriverState *bs)
  */
 void coroutine_fn bdrv_co_drain(BlockDriverState *bs)
 {
-bdrv_parent_drained_begin(bs);
-bdrv_io_unplugged_begin(bs);
-bdrv_drain_recurse(bs);
-bdrv_co_yield_to_drain(bs);
-bdrv_io_unplugged_end(bs);
-bdrv_parent_drained_end(bs);
+assert(qemu_in_coroutine());
+bdrv_drained_begin(bs);
+bdrv_drained_end(bs);
 }
 
 void bdrv_drain(BlockDriverState *bs)
 {
-bdrv_parent_drained_begin(bs);
-bdrv_io_unplugged_begin(bs);
-bdrv_drain_recurse(bs);
-if (qemu_in_coroutine()) {
-bdrv_co_yield_to_drain(bs);
-} else {
-bdrv_drain_poll(bs);
-}
-bdrv_io_unplugged_end(bs);
-bdrv_parent_drained_end(bs);
+bdrv_drained_begin(bs);
+bdrv_drained_end(bs);
 }
 
 /*
@@ -2541,23 +2558,3 @@ void bdrv_io_unplugged_end(BlockDriverState *bs)
 }
 }
 }
-
-void bdrv_drained_begin(BlockDriverState *bs)
-{
-if (!bs->quiesce_counter++) {
-aio_disable_external(bdrv_get_aio_context(bs));
-}
-bdrv_parent_drained_begin(bs);
-bdrv_drain(bs);
-}
-
-void bdrv_drained_end(BlockDriverState *bs)
-{
-bdrv_parent_drained_end(bs);
-
-assert(bs->quiesce_counter > 0);
-if (--bs->quiesce_counter > 0) {
-return;
-}
-aio_enable_external(bdrv_get_aio_context(bs));
-}
-- 
1.8.3.1

[Qemu-devel] [PULL 04/31] tests: Drop BDS from test-throttle.c

2016-05-25 Thread Kevin Wolf

From: Max Reitz 

Now that throttling has been moved to the BlockBackend level, we do not
need to create a BDS along with the BB in the I/O throttling test.

Signed-off-by: Max Reitz 
Reviewed-by: Kevin Wolf 
Signed-off-by: Kevin Wolf 
---
 tests/test-throttle.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/tests/test-throttle.c b/tests/test-throttle.c
index 5ec966c..d7fb0a6 100644
--- a/tests/test-throttle.c
+++ b/tests/test-throttle.c
@@ -578,9 +578,9 @@ static void test_groups(void)
 BlockBackend *blk1, *blk2, *blk3;
 BlockBackendPublic *blkp1, *blkp2, *blkp3;
 
-blk1 = blk_new_with_bs(_abort);
-blk2 = blk_new_with_bs(_abort);
-blk3 = blk_new_with_bs(_abort);
+blk1 = blk_new(_abort);
+blk2 = blk_new(_abort);
+blk3 = blk_new(_abort);
 
 blkp1 = blk_get_public(blk1);
 blkp2 = blk_get_public(blk2);
-- 
1.8.3.1

[Qemu-devel] [PULL 21/31] block: Convert block job core to BlockBackend

2016-05-25 Thread Kevin Wolf

This adds a new BlockBackend field to the BlockJob struct, which
coexists with the BlockDriverState while converting the individual jobs.

When creating a block job, a new BlockBackend is created on top of the
given BlockDriverState, and it is destroyed when the BlockJob ends. The
reference to the BDS is now held by the BlockBackend instead of calling
bdrv_ref/unref manually.

We have to be careful when we use bdrv_replace_in_backing_chain() in
block jobs because this changes the BDS that job->blk points to. At the
moment block jobs are too tightly coupled with their BDS, so that moving
a job to another BDS isn't easily possible; therefore, we need to just
manually undo this change afterwards.

Signed-off-by: Kevin Wolf 
Reviewed-by: Eric Blake 
Reviewed-by: Alberto Garcia 
Reviewed-by: Max Reitz 
---
 block/mirror.c   |  3 +++
 blockjob.c   | 37 -
 include/block/blockjob.h |  3 ++-
 3 files changed, 25 insertions(+), 18 deletions(-)

diff --git a/block/mirror.c b/block/mirror.c
index b9986d8..efca8fc 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -478,6 +478,9 @@ static void mirror_exit(BlockJob *job, void *opaque)
 bdrv_reopen(s->target, bdrv_get_flags(to_replace), NULL);
 }
 bdrv_replace_in_backing_chain(to_replace, s->target);
+/* We just changed the BDS the job BB refers to */
+blk_remove_bs(job->blk);
+blk_insert_bs(job->blk, src);
 }
 
 out:
diff --git a/blockjob.c b/blockjob.c
index e916b41..2097e1d 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -64,13 +64,17 @@ void *block_job_create(const BlockJobDriver *driver, 
BlockDriverState *bs,
int64_t speed, BlockCompletionFunc *cb,
void *opaque, Error **errp)
 {
+BlockBackend *blk;
 BlockJob *job;
 
 if (bs->job) {
 error_setg(errp, QERR_DEVICE_IN_USE, bdrv_get_device_name(bs));
 return NULL;
 }
-bdrv_ref(bs);
+
+blk = blk_new();
+blk_insert_bs(blk, bs);
+
 job = g_malloc0(driver->instance_size);
 error_setg(>blocker, "block device is in use by block job: %s",
BlockJobType_lookup[driver->job_type]);
@@ -80,6 +84,7 @@ void *block_job_create(const BlockJobDriver *driver, 
BlockDriverState *bs,
 job->driver= driver;
 job->id= g_strdup(bdrv_get_device_name(bs));
 job->bs= bs;
+job->blk   = blk;
 job->cb= cb;
 job->opaque= opaque;
 job->busy  = true;
@@ -110,9 +115,10 @@ void block_job_ref(BlockJob *job)
 void block_job_unref(BlockJob *job)
 {
 if (--job->refcnt == 0) {
-job->bs->job = NULL;
-bdrv_op_unblock_all(job->bs, job->blocker);
-bdrv_unref(job->bs);
+BlockDriverState *bs = blk_bs(job->blk);
+bs->job = NULL;
+bdrv_op_unblock_all(bs, job->blocker);
+blk_unref(job->blk);
 error_free(job->blocker);
 g_free(job->id);
 QLIST_REMOVE(job, job_list);
@@ -153,7 +159,7 @@ static void block_job_completed_txn_abort(BlockJob *job)
 txn->aborting = true;
 /* We are the first failed job. Cancel other jobs. */
 QLIST_FOREACH(other_job, >jobs, txn_list) {
-ctx = bdrv_get_aio_context(other_job->bs);
+ctx = blk_get_aio_context(other_job->blk);
 aio_context_acquire(ctx);
 }
 QLIST_FOREACH(other_job, >jobs, txn_list) {
@@ -170,7 +176,7 @@ static void block_job_completed_txn_abort(BlockJob *job)
 assert(other_job->completed);
 }
 QLIST_FOREACH_SAFE(other_job, >jobs, txn_list, next) {
-ctx = bdrv_get_aio_context(other_job->bs);
+ctx = blk_get_aio_context(other_job->blk);
 block_job_completed_single(other_job);
 aio_context_release(ctx);
 }
@@ -192,7 +198,7 @@ static void block_job_completed_txn_success(BlockJob *job)
 }
 /* We are the last completed job, commit the transaction. */
 QLIST_FOREACH_SAFE(other_job, >jobs, txn_list, next) {
-ctx = bdrv_get_aio_context(other_job->bs);
+ctx = blk_get_aio_context(other_job->blk);
 aio_context_acquire(ctx);
 assert(other_job->ret == 0);
 block_job_completed_single(other_job);
@@ -202,9 +208,7 @@ static void block_job_completed_txn_success(BlockJob *job)
 
 void block_job_completed(BlockJob *job, int ret)
 {
-BlockDriverState *bs = job->bs;
-
-assert(bs->job == job);
+assert(blk_bs(job->blk)->job == job);
 assert(!job->completed);
 job->completed = true;
 job->ret = ret;
@@ -295,11 +299,10 @@ static int block_job_finish_sync(BlockJob *job,
  void (*finish)(BlockJob *, Error **errp),
  Error **errp)
 {
-BlockDriverState *bs = job->bs;
 Error *local_err = NULL;
 int ret;
 
-assert(bs->job == job);
+

[Qemu-devel] [PULL 26/31] backup: Don't leak BackupBlockJob in error path

2016-05-25 Thread Kevin Wolf

Signed-off-by: Kevin Wolf 
Reviewed-by: Max Reitz 
Reviewed-by: Alberto Garcia 
---
 block/backup.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/block/backup.c b/block/backup.c
index fec45e8..a990cf1 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -485,6 +485,7 @@ void backup_start(BlockDriverState *bs, BlockDriverState 
*target,
 {
 int64_t len;
 BlockDriverInfo bdi;
+BackupBlockJob *job = NULL;
 int ret;
 
 assert(bs);
@@ -542,8 +543,7 @@ void backup_start(BlockDriverState *bs, BlockDriverState 
*target,
 goto error;
 }
 
-BackupBlockJob *job = block_job_create(_job_driver, bs, speed,
-   cb, opaque, errp);
+job = block_job_create(_job_driver, bs, speed, cb, opaque, errp);
 if (!job) {
 goto error;
 }
@@ -584,4 +584,7 @@ void backup_start(BlockDriverState *bs, BlockDriverState 
*target,
 if (sync_bitmap) {
 bdrv_reclaim_dirty_bitmap(bs, sync_bitmap, NULL);
 }
+if (job) {
+block_job_unref(>common);
+}
 }
-- 
1.8.3.1

Re: [Qemu-devel] [RFC v2 03/11] docs: new design document multi-thread-tcg.txt (DRAFTING)

2016-05-25 Thread Paolo Bonzini

> The page table seems to be protected by 'mmap_lock' in user mode
> emulation but by 'tb_lock' in system mode emulation. It may turn to be
> possible to read it safely even with no lock held.

Yes, it is possible to at least follow the radix tree safely with no
lock held.  The fields in the leaves can be either lockless or protected
by a lock.

The radix tree can be followed without a lock just like you do with RCU.
The difference with RCU is that:

1) the leaves are protected with a lock, so you don't do the "copy";
instead after reading you lock around updates

2) the radix tree is only ever added to, so you don't need to protect
the reads with rcu_read_lock/rcu_read_unlock.  rcu_read_lock and
rcu_read_unlock are only needed to inform the deleters that something
cannot yet go away.  Without deleters, you don't need rcu_read_lock
and rcu_read_unlock (but you still need atomic_rcu_read/atomic_rcu_set).

Paolo

Re: [Qemu-devel] [PATCH 1/5] block: split write_zeroes always

2016-05-25 Thread Eric Blake

On 05/17/2016 10:34 AM, Kevin Wolf wrote:
> Am 17.05.2016 um 11:15 hat Denis V. Lunev geschrieben:
>> We should split requests even if they are less than write_zeroes_alignment.
>> For example we can have the following request:
>>   offset 62k
>>   size   4k
>>   write_zeroes_alignment 64k
>> The original code sent 1 request covering 2 qcow2 clusters, and resulted
>> in both clusters being allocated. But by splitting the request, we can
>> cater to the case where one of the two clusters can be zeroed as a
>> whole, for only 1 cluster allocated after the operation.
>>
>> Signed-off-by: Denis V. Lunev 
>> CC: Eric Blake 
>> CC: Kevin Wolf 
>> ---
>>  block/io.c | 6 +++---
>>  1 file changed, 3 insertions(+), 3 deletions(-)
>>
>> diff --git a/block/io.c b/block/io.c
>> index cd6d71a..6a24ea8 100644
>> --- a/block/io.c
>> +++ b/block/io.c
>> @@ -1172,13 +1172,13 @@ static int coroutine_fn 
>> bdrv_co_do_write_zeroes(BlockDriverState *bs,
>>  /* Align request.  Block drivers can expect the "bulk" of the 
>> request
>>   * to be aligned.
>>   */
>> -if (bs->bl.write_zeroes_alignment
>> -&& num > bs->bl.write_zeroes_alignment) {
>> +if (bs->bl.write_zeroes_alignment) {
>>  if (sector_num % bs->bl.write_zeroes_alignment != 0) {
>>  /* Make a small request up to the first aligned sector.  */
>>  num = bs->bl.write_zeroes_alignment;
>>  num -= sector_num % bs->bl.write_zeroes_alignment;
> 
> Turns out this doesn't work. If this is a small request that zeros
> something in the middle of a single cluster (i.e. we have untouched data
> both before and after the request in the same cluster), then num can now
> become greater than nb_sectors, so that we end up zeroing too much.

I'm planning on folding in a working version of this patch in my
byte-based write_zeroes conversion series.  As part of the patch, I'm
also hoisting the division out of the loop (no guarantees that the
compiler can spot that bs->bl.write_zeroes_alignment will be a power of
two, to optimize it to a shift).



-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH v2 1/4] xlnx-zynqmp: Add a secure prop to en/disable ARM Security Extensions

2016-05-25 Thread Alistair Francis

On Wed, May 25, 2016 at 3:52 AM, Edgar E. Iglesias
 wrote:
> From: "Edgar E. Iglesias" 
>
> Add a secure prop to en/disable ARM Security Extensions.
> This is particularly useful for KVM runs.
>
> Default to disabled to match the behavior of KVM.
>
> This changes the default setup from having the ARM Security
> Extensions to not longer having them.
>
> Signed-off-by: Edgar E. Iglesias 

Looks good to me

Reviewed-by: Alistair Francis 

Thanks,

Alistair

> ---
>  hw/arm/xlnx-zynqmp.c | 3 +++
>  include/hw/arm/xlnx-zynqmp.h | 3 +++
>  2 files changed, 6 insertions(+)
>
> diff --git a/hw/arm/xlnx-zynqmp.c b/hw/arm/xlnx-zynqmp.c
> index 4d504da..965a250 100644
> --- a/hw/arm/xlnx-zynqmp.c
> +++ b/hw/arm/xlnx-zynqmp.c
> @@ -238,6 +238,8 @@ static void xlnx_zynqmp_realize(DeviceState *dev, Error 
> **errp)
>  }
>  g_free(name);
>
> +object_property_set_bool(OBJECT(>apu_cpu[i]),
> + s->secure, "has_el3", NULL);
>  object_property_set_int(OBJECT(>apu_cpu[i]), GIC_BASE_ADDR,
>  "reset-cbar", _abort);
>  object_property_set_bool(OBJECT(>apu_cpu[i]), true, "realized",
> @@ -370,6 +372,7 @@ static void xlnx_zynqmp_realize(DeviceState *dev, Error 
> **errp)
>
>  static Property xlnx_zynqmp_props[] = {
>  DEFINE_PROP_STRING("boot-cpu", XlnxZynqMPState, boot_cpu),
> +DEFINE_PROP_BOOL("secure", XlnxZynqMPState, secure, false),
>  DEFINE_PROP_END_OF_LIST()
>  };
>
> diff --git a/include/hw/arm/xlnx-zynqmp.h b/include/hw/arm/xlnx-zynqmp.h
> index 2332596..38d4c8c 100644
> --- a/include/hw/arm/xlnx-zynqmp.h
> +++ b/include/hw/arm/xlnx-zynqmp.h
> @@ -84,6 +84,9 @@ typedef struct XlnxZynqMPState {
>
>  char *boot_cpu;
>  ARMCPU *boot_cpu_ptr;
> +
> +/* Has the ARM Security extensions?  */
> +bool secure;
>  }  XlnxZynqMPState;
>
>  #define XLNX_ZYNQMP_H
> --
> 2.5.0
>
>

[Qemu-devel] [PULL 09/31] block: Drop bdrv_parent_cb_...() from bdrv_close()

2016-05-25 Thread Kevin Wolf

From: Max Reitz 

bdrv_close() now asserts that the BDS's refcount is 0, therefore it
cannot have any parents and the bdrv_parent_cb_change_media() call is a
no-op.

Signed-off-by: Max Reitz 
Reviewed-by: Kevin Wolf 
Signed-off-by: Kevin Wolf 
---
 block.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/block.c b/block.c
index 4c021c0..38df365 100644
--- a/block.c
+++ b/block.c
@@ -2117,8 +2117,6 @@ static void bdrv_close(BlockDriverState *bs)
 bdrv_release_named_dirty_bitmaps(bs);
 assert(QLIST_EMPTY(>dirty_bitmaps));
 
-bdrv_parent_cb_change_media(bs, false);
-
 if (bs->drv) {
 BdrvChild *child, *next;
 
-- 
1.8.3.1

[Qemu-devel] [PULL 06/31] block: Drop bdrv_new_root()

2016-05-25 Thread Kevin Wolf

From: Max Reitz 

It is unused now, so we may just as well drop it.

Signed-off-by: Max Reitz 
Reviewed-by: Alberto Garcia 
Reviewed-by: Kevin Wolf 
Signed-off-by: Kevin Wolf 
---
 block.c   | 5 -
 include/block/block.h | 1 -
 2 files changed, 6 deletions(-)

diff --git a/block.c b/block.c
index 17ee088..ed4b487 100644
--- a/block.c
+++ b/block.c
@@ -220,11 +220,6 @@ void bdrv_register(BlockDriver *bdrv)
 QLIST_INSERT_HEAD(_drivers, bdrv, list);
 }
 
-BlockDriverState *bdrv_new_root(void)
-{
-return bdrv_new();
-}
-
 BlockDriverState *bdrv_new(void)
 {
 BlockDriverState *bs;
diff --git a/include/block/block.h b/include/block/block.h
index 770ca26..348e478 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -197,7 +197,6 @@ BlockDriver *bdrv_find_format(const char *format_name);
 int bdrv_create(BlockDriver *drv, const char* filename,
 QemuOpts *opts, Error **errp);
 int bdrv_create_file(const char *filename, QemuOpts *opts, Error **errp);
-BlockDriverState *bdrv_new_root(void);
 BlockDriverState *bdrv_new(void);
 void bdrv_append(BlockDriverState *bs_new, BlockDriverState *bs_top);
 void bdrv_replace_in_backing_chain(BlockDriverState *old,
-- 
1.8.3.1

[Qemu-devel] [PULL 24/31] mirror: Allow target that already has a BlockBackend

2016-05-25 Thread Kevin Wolf

We had to forbid mirroring to a target BDS that already had a BB
attached because the node swapping at job completion would add a second
BB and we didn't support multiple BBs on a single BDS at the time. Now
we do, so we can lift the restriction.

As we allow additional BlockBackends for the target, we must expect
other users to be sending requests. There may no requests be in flight
during the graph modification, so we have to drain those users now.

The core part of this patch is a revert of commit 40365552.

Signed-off-by: Kevin Wolf 
Reviewed-by: Max Reitz 
---
 block/mirror.c | 33 ++---
 blockdev.c |  4 
 tests/qemu-iotests/041 | 27 ---
 tests/qemu-iotests/041.out |  4 ++--
 4 files changed, 8 insertions(+), 60 deletions(-)

diff --git a/block/mirror.c b/block/mirror.c
index efca8fc..23aa10e 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -20,7 +20,6 @@
 #include "qapi/qmp/qerror.h"
 #include "qemu/ratelimit.h"
 #include "qemu/bitmap.h"
-#include "qemu/error-report.h"
 
 #define SLICE_TIME1ULL /* ns */
 #define MAX_IN_FLIGHT 16
@@ -466,24 +465,20 @@ static void mirror_exit(BlockJob *job, void *opaque)
 to_replace = s->to_replace;
 }
 
-/* This was checked in mirror_start_job(), but meanwhile one of the
- * nodes could have been newly attached to a BlockBackend. */
-if (bdrv_has_blk(to_replace) && bdrv_has_blk(s->target)) {
-error_report("block job: Can't create node with two 
BlockBackends");
-data->ret = -EINVAL;
-goto out;
-}
-
 if (bdrv_get_flags(s->target) != bdrv_get_flags(to_replace)) {
 bdrv_reopen(s->target, bdrv_get_flags(to_replace), NULL);
 }
+
+/* The mirror job has no requests in flight any more, but we need to
+ * drain potential other users of the BDS before changing the graph. */
+bdrv_drained_begin(s->target);
 bdrv_replace_in_backing_chain(to_replace, s->target);
+bdrv_drained_end(s->target);
+
 /* We just changed the BDS the job BB refers to */
 blk_remove_bs(job->blk);
 blk_insert_bs(job->blk, src);
 }
-
-out:
 if (s->to_replace) {
 bdrv_op_unblock_all(s->to_replace, s->replace_blocker);
 error_free(s->replace_blocker);
@@ -807,7 +802,6 @@ static void mirror_start_job(BlockDriverState *bs, 
BlockDriverState *target,
  bool is_none_mode, BlockDriverState *base)
 {
 MirrorBlockJob *s;
-BlockDriverState *replaced_bs;
 
 if (granularity == 0) {
 granularity = bdrv_get_default_bitmap_granularity(target);
@@ -824,21 +818,6 @@ static void mirror_start_job(BlockDriverState *bs, 
BlockDriverState *target,
 buf_size = DEFAULT_MIRROR_BUF_SIZE;
 }
 
-/* We can't support this case as long as the block layer can't handle
- * multiple BlockBackends per BlockDriverState. */
-if (replaces) {
-replaced_bs = bdrv_lookup_bs(replaces, replaces, errp);
-if (replaced_bs == NULL) {
-return;
-}
-} else {
-replaced_bs = bs;
-}
-if (bdrv_has_blk(replaced_bs) && bdrv_has_blk(target)) {
-error_setg(errp, "Can't create node with two BlockBackends");
-return;
-}
-
 s = block_job_create(driver, bs, speed, cb, opaque, errp);
 if (!s) {
 return;
diff --git a/blockdev.c b/blockdev.c
index af0b022..3c06746 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -3459,10 +3459,6 @@ static void blockdev_mirror_common(BlockDriverState *bs,
 if (bdrv_op_is_blocked(target, BLOCK_OP_TYPE_MIRROR_TARGET, errp)) {
 return;
 }
-if (bdrv_has_blk(target)) {
-error_setg(errp, "Cannot mirror to an attached block device");
-return;
-}
 
 if (!bs->backing && sync == MIRROR_SYNC_MODE_TOP) {
 sync = MIRROR_SYNC_MODE_FULL;
diff --git a/tests/qemu-iotests/041 b/tests/qemu-iotests/041
index b1c542f..ed1d9d4 100755
--- a/tests/qemu-iotests/041
+++ b/tests/qemu-iotests/041
@@ -207,33 +207,6 @@ class TestSingleBlockdev(TestSingleDrive):
 test_image_not_found = None
 test_small_buffer2 = None
 
-class TestBlockdevAttached(iotests.QMPTestCase):
-image_len = 1 * 1024 * 1024 # MB
-
-def setUp(self):
-iotests.create_image(backing_img, self.image_len)
-qemu_img('create', '-f', iotests.imgfmt, '-o', 'backing_file=%s' % 
backing_img, test_img)
-qemu_img('create', '-f', iotests.imgfmt, '-o', 'backing_file=%s' % 
backing_img, target_img)
-self.vm = iotests.VM().add_drive(test_img)
-self.vm.launch()
-
-def tearDown(self):
-self.vm.shutdown()
-os.remove(test_img)
-os.remove(target_img)
-
-def test_blockdev_attached(self):
-self.assert_no_active_block_jobs()
-args = {'options':
-{'driver':

[Qemu-devel] [PULL 22/31] block: Make blk_co_preadv/pwritev() public

2016-05-25 Thread Kevin Wolf

Also add trace points now that the function can be directly called.

Signed-off-by: Kevin Wolf 
Reviewed-by: Max Reitz 
Reviewed-by: Eric Blake 
Reviewed-by: Alberto Garcia 
---
 block/block-backend.c  | 21 ++---
 include/sysemu/block-backend.h |  6 ++
 trace-events   |  4 
 3 files changed, 24 insertions(+), 7 deletions(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index c8b13f1..34500e6 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -19,6 +19,7 @@
 #include "sysemu/sysemu.h"
 #include "qapi-event.h"
 #include "qemu/id.h"
+#include "trace.h"
 
 /* Number of coroutines to reserve per attached device model */
 #define COROUTINE_POOL_RESERVATION 64
@@ -741,11 +742,15 @@ static int blk_check_request(BlockBackend *blk, int64_t 
sector_num,
   nb_sectors * BDRV_SECTOR_SIZE);
 }
 
-static int coroutine_fn blk_co_preadv(BlockBackend *blk, int64_t offset,
-  unsigned int bytes, QEMUIOVector *qiov,
-  BdrvRequestFlags flags)
+int coroutine_fn blk_co_preadv(BlockBackend *blk, int64_t offset,
+   unsigned int bytes, QEMUIOVector *qiov,
+   BdrvRequestFlags flags)
 {
-int ret = blk_check_byte_request(blk, offset, bytes);
+int ret;
+
+trace_blk_co_preadv(blk, blk_bs(blk), offset, bytes, flags);
+
+ret = blk_check_byte_request(blk, offset, bytes);
 if (ret < 0) {
 return ret;
 }
@@ -758,12 +763,14 @@ static int coroutine_fn blk_co_preadv(BlockBackend *blk, 
int64_t offset,
 return bdrv_co_preadv(blk_bs(blk), offset, bytes, qiov, flags);
 }
 
-static int coroutine_fn blk_co_pwritev(BlockBackend *blk, int64_t offset,
-  unsigned int bytes, QEMUIOVector *qiov,
-  BdrvRequestFlags flags)
+int coroutine_fn blk_co_pwritev(BlockBackend *blk, int64_t offset,
+unsigned int bytes, QEMUIOVector *qiov,
+BdrvRequestFlags flags)
 {
 int ret;
 
+trace_blk_co_pwritev(blk, blk_bs(blk), offset, bytes, flags);
+
 ret = blk_check_byte_request(blk, offset, bytes);
 if (ret < 0) {
 return ret;
diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h
index 9571726..c04af8e 100644
--- a/include/sysemu/block-backend.h
+++ b/include/sysemu/block-backend.h
@@ -113,6 +113,12 @@ void *blk_get_attached_dev(BlockBackend *blk);
 void blk_set_dev_ops(BlockBackend *blk, const BlockDevOps *ops, void *opaque);
 int blk_pread_unthrottled(BlockBackend *blk, int64_t offset, uint8_t *buf,
   int count);
+int coroutine_fn blk_co_preadv(BlockBackend *blk, int64_t offset,
+   unsigned int bytes, QEMUIOVector *qiov,
+   BdrvRequestFlags flags);
+int coroutine_fn blk_co_pwritev(BlockBackend *blk, int64_t offset,
+   unsigned int bytes, QEMUIOVector *qiov,
+   BdrvRequestFlags flags);
 int blk_pwrite_zeroes(BlockBackend *blk, int64_t offset,
   int count, BdrvRequestFlags flags);
 BlockAIOCB *blk_aio_pwrite_zeroes(BlockBackend *blk, int64_t offset,
diff --git a/trace-events b/trace-events
index e01b436..b85fb49 100644
--- a/trace-events
+++ b/trace-events
@@ -61,6 +61,10 @@ virtio_console_chr_event(unsigned int port, int event) "port 
%u, event %d"
 bdrv_open_common(void *bs, const char *filename, int flags, const char 
*format_name) "bs %p filename \"%s\" flags %#x format_name \"%s\""
 bdrv_lock_medium(void *bs, bool locked) "bs %p locked %d"
 
+# block/block-backend.c
+blk_co_preadv(void *blk, void *bs, int64_t offset, unsigned int bytes, int 
flags) "blk %p bs %p offset %"PRId64" bytes %u flags %x"
+blk_co_pwritev(void *blk, void *bs, int64_t offset, unsigned int bytes, int 
flags) "blk %p bs %p offset %"PRId64" bytes %u flags %x"
+
 # block/io.c
 bdrv_aio_discard(void *bs, int64_t sector_num, int nb_sectors, void *opaque) 
"bs %p sector_num %"PRId64" nb_sectors %d opaque %p"
 bdrv_aio_flush(void *bs, void *opaque) "bs %p opaque %p"
-- 
1.8.3.1

[Qemu-devel] [PULL 17/31] block: Rename blk_write_zeroes()

2016-05-25 Thread Kevin Wolf

From: Eric Blake 

Commit 983a1600 changed the semantics of blk_write_zeroes() to
be byte-based rather than sector-based, but did not change the
name, which is an open invitation for other code to misuse the
function.  Renaming to pwrite_zeroes() makes it more in line
with other byte-based interfaces, and will help make it easier
to track which remaining write_zeroes interfaces still need
conversion.

Reported-by: Kevin Wolf 
Signed-off-by: Eric Blake 
Signed-off-by: Kevin Wolf 
Reviewed-by: Max Reitz 
---
 block/block-backend.c  | 14 +++---
 block/parallels.c  |  4 ++--
 hw/scsi/scsi-disk.c|  2 +-
 include/sysemu/block-backend.h | 14 +++---
 qemu-img.c |  4 ++--
 qemu-io-cmds.c | 22 +++---
 6 files changed, 30 insertions(+), 30 deletions(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index 14e528e..f9da1d1 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -855,8 +855,8 @@ int blk_pread_unthrottled(BlockBackend *blk, int64_t 
offset, uint8_t *buf,
 return ret;
 }
 
-int blk_write_zeroes(BlockBackend *blk, int64_t offset,
- int count, BdrvRequestFlags flags)
+int blk_pwrite_zeroes(BlockBackend *blk, int64_t offset,
+  int count, BdrvRequestFlags flags)
 {
 return blk_prw(blk, offset, NULL, count, blk_write_entry,
flags | BDRV_REQ_ZERO_WRITE);
@@ -971,9 +971,9 @@ static void blk_aio_write_entry(void *opaque)
 blk_aio_complete(acb);
 }
 
-BlockAIOCB *blk_aio_write_zeroes(BlockBackend *blk, int64_t offset,
- int count, BdrvRequestFlags flags,
- BlockCompletionFunc *cb, void *opaque)
+BlockAIOCB *blk_aio_pwrite_zeroes(BlockBackend *blk, int64_t offset,
+  int count, BdrvRequestFlags flags,
+  BlockCompletionFunc *cb, void *opaque)
 {
 return blk_aio_prwv(blk, offset, count, NULL, blk_aio_write_entry,
 flags | BDRV_REQ_ZERO_WRITE, cb, opaque);
@@ -1462,8 +1462,8 @@ void *blk_aio_get(const AIOCBInfo *aiocb_info, 
BlockBackend *blk,
 return qemu_aio_get(aiocb_info, blk_bs(blk), cb, opaque);
 }
 
-int coroutine_fn blk_co_write_zeroes(BlockBackend *blk, int64_t offset,
- int count, BdrvRequestFlags flags)
+int coroutine_fn blk_co_pwrite_zeroes(BlockBackend *blk, int64_t offset,
+  int count, BdrvRequestFlags flags)
 {
 return blk_co_pwritev(blk, offset, count, NULL,
   flags | BDRV_REQ_ZERO_WRITE);
diff --git a/block/parallels.c b/block/parallels.c
index 88cface..99fc0f7 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -517,8 +517,8 @@ static int parallels_create(const char *filename, QemuOpts 
*opts, Error **errp)
 if (ret < 0) {
 goto exit;
 }
-ret = blk_write_zeroes(file, BDRV_SECTOR_SIZE,
-   (bat_sectors - 1) << BDRV_SECTOR_BITS, 0);
+ret = blk_pwrite_zeroes(file, BDRV_SECTOR_SIZE,
+(bat_sectors - 1) << BDRV_SECTOR_BITS, 0);
 if (ret < 0) {
 goto exit;
 }
diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c
index c374ea8..8865da5 100644
--- a/hw/scsi/scsi-disk.c
+++ b/hw/scsi/scsi-disk.c
@@ -1780,7 +1780,7 @@ static void scsi_disk_emulate_write_same(SCSIDiskReq *r, 
uint8_t *inbuf)
 block_acct_start(blk_get_stats(s->qdev.conf.blk), >acct,
  nb_sectors * s->qdev.blocksize,
 BLOCK_ACCT_WRITE);
-r->req.aiocb = blk_aio_write_zeroes(s->qdev.conf.blk,
+r->req.aiocb = blk_aio_pwrite_zeroes(s->qdev.conf.blk,
 r->req.cmd.lba * s->qdev.blocksize,
 nb_sectors * s->qdev.blocksize,
 flags, scsi_aio_complete, r);
diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h
index 9d6615c..9571726 100644
--- a/include/sysemu/block-backend.h
+++ b/include/sysemu/block-backend.h
@@ -113,11 +113,11 @@ void *blk_get_attached_dev(BlockBackend *blk);
 void blk_set_dev_ops(BlockBackend *blk, const BlockDevOps *ops, void *opaque);
 int blk_pread_unthrottled(BlockBackend *blk, int64_t offset, uint8_t *buf,
   int count);
-int blk_write_zeroes(BlockBackend *blk, int64_t offset,
- int count, BdrvRequestFlags flags);
-BlockAIOCB *blk_aio_write_zeroes(BlockBackend *blk, int64_t offset,
- int count, BdrvRequestFlags flags,
- BlockCompletionFunc *cb, void *opaque);
+int blk_pwrite_zeroes(BlockBackend *blk, int64_t offset,
+  int count, BdrvRequestFlags flags);
+BlockAIOCB

Re: [Qemu-devel] [PATCH v3] xen-hvm: ignore background I/O sections

2016-05-25 Thread Paolo Bonzini



- Original Message -
> From: "Anthony PERARD" 
> To: "Paul Durrant" 
> Cc: qemu-devel@nongnu.org, xen-de...@lists.xenproject.org, "Stefano 
> Stabellini" , "Paolo
> Bonzini" 
> Sent: Wednesday, May 25, 2016 5:52:32 PM
> Subject: Re: [PATCH v3] xen-hvm: ignore background I/O sections
> 
> On Mon, May 09, 2016 at 05:31:20PM +0100, Paul Durrant wrote:
> > Since Xen will correctly handle accesses to unimplemented I/O ports (by
> > returning all 1's for reads and ignoring writes) there is no need for
> > QEMU to register backgroud I/O sections.
> > 
> > This patch therefore adds checks to xen_io_add/del so that sections with
> > memory-region ops pointing at 'unassigned_io_ops' are ignored.
> > 
> > Signed-off-by: Paul Durrant 
> > Cc: Stefano Stabellini 
> > Cc: Anthony Perard 
> > Cc: Paolo Bonzini 
> 
> Acked-by: Anthony PERARD 

Do you have a signed GPG key or do I need to apply this for you?

Thanks,

Paolo

[Qemu-devel] [PULL 31/31] blockjob: Remove BlockJob.bs

2016-05-25 Thread Kevin Wolf

There is a single remaining user in qemu-img, and another one in a test
case, both of which can be trivially converted to using BlockJob.blk
instead.

Signed-off-by: Kevin Wolf 
Reviewed-by: Max Reitz 
Reviewed-by: Eric Blake 
---
 blockjob.c| 1 -
 include/block/blockjob.h  | 1 -
 qemu-img.c| 2 +-
 tests/test-blockjob-txn.c | 3 ++-
 4 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/blockjob.c b/blockjob.c
index 2097e1d..c095cc5 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -83,7 +83,6 @@ void *block_job_create(const BlockJobDriver *driver, 
BlockDriverState *bs,
 
 job->driver= driver;
 job->id= g_strdup(bdrv_get_device_name(bs));
-job->bs= bs;
 job->blk   = blk;
 job->cb= cb;
 job->opaque= opaque;
diff --git a/include/block/blockjob.h b/include/block/blockjob.h
index 32012af..86d2807 100644
--- a/include/block/blockjob.h
+++ b/include/block/blockjob.h
@@ -82,7 +82,6 @@ struct BlockJob {
 const BlockJobDriver *driver;
 
 /** The block device on which the job is operating.  */
-BlockDriverState *bs; /* TODO Remove */
 BlockBackend *blk;
 
 /**
diff --git a/qemu-img.c b/qemu-img.c
index 2065348..4b56ad3 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -775,7 +775,7 @@ static void common_block_job_cb(void *opaque, int ret)
 
 static void run_block_job(BlockJob *job, Error **errp)
 {
-AioContext *aio_context = bdrv_get_aio_context(job->bs);
+AioContext *aio_context = blk_get_aio_context(job->blk);
 
 do {
 aio_poll(aio_context, true);
diff --git a/tests/test-blockjob-txn.c b/tests/test-blockjob-txn.c
index 55fad95..828389b 100644
--- a/tests/test-blockjob-txn.c
+++ b/tests/test-blockjob-txn.c
@@ -15,6 +15,7 @@
 #include "qapi/error.h"
 #include "qemu/main-loop.h"
 #include "block/blockjob.h"
+#include "sysemu/block-backend.h"
 
 typedef struct {
 BlockJob common;
@@ -30,7 +31,7 @@ static const BlockJobDriver test_block_job_driver = {
 
 static void test_block_job_complete(BlockJob *job, void *opaque)
 {
-BlockDriverState *bs = job->bs;
+BlockDriverState *bs = blk_bs(job->blk);
 int rc = (intptr_t)opaque;
 
 if (block_job_is_cancelled(job)) {
-- 
1.8.3.1

Re: [Qemu-devel] [QEMU RFC PATCH v2 4/6] Migration: migrate QTAILQ

2016-05-25 Thread Jianjun Duan

I will try to explain my design rationale in details here.

1 QTAILQ should only be accessed using the interfaces defined in
queue.h. Its structs should not be directly used. So I created
interfaces in queue.h to query about its layout. If the implementation
is changed, these interfaces should be changed accordingly. Code using
these interfaces should not break.

2 Based on point 1, vmstate_load_state/vmstate_put_state in vmstate.c
doesn't exactly know the structs of QTAILAQ head and entry. So pointer
arithmetic is needed to put/get a QTAILQ. To do it, we need those 6
parameters to be passed in. So it is not redundant if we only want to
only use the interfaces.

3 At this moment, vmstate_load_state/vmstate_put_state couldn't handle a
queue, or a list, or another recursive structure. To make it
extensible, I think a metadata is needed. The idea is for any
structure which needs special handling, customized metadata/put/get
should provide enough flexibility to hack around.

There are two issues I tried to address. One is the recursive queue,
another is working around of hiding of the QTAILQ implementation
details. It seems we need to agree on how the latter should be
addressed.

I will give it a try to fix those 35 calling sites of VMStateInfo.

Thanks,
Jianjun

On 05/25/2016 10:51 AM, Paolo Bonzini wrote:
 +/*
 + * Following 3 fields are for VMStateField which needs customized
 handling,
 + * such as QTAILQ in qemu/queue.h, lists, and tree.
 + */
 +const void *meta_data;
 +int (*extend_get)(QEMUFile *f, const void *metadata, void *opaque);
 +void (*extend_put)(QEMUFile *f, const void *metadata, void *opaque,
 +   QJSON *vmdesc);
  } VMStateField;
>>>
>>> Do not add these two function pointers to VMStateField, instead add
>>> QJSON* and VMStateField* arguments as needed to VMStateInfo's get and put.
>>
>> That is definitely the ideal way. However, VMStateInfo's get/put are
>> already used extensively. If I change them, I need change all the
>> calling sites of them. Not very sure about whether it will be welcomed.
> 
> Sure, don't be worried. :)  True, there are quite a few VMStateInfos (about
> 35) but the changes are simple.
> 
 +#define VMSTATE_QTAILQ_METADATA(_field, _state, _type, _next, _vmsd) { \
 +.first = QTAILQ_FIRST_OFFSET(typeof_field(_state, _field)),\
 +.last =  QTAILQ_LAST_OFFSET(typeof_field(_state, _field)), \
 +.next = QTAILQ_NEXT_OFFSET(_type, _next),  \
 +.prev = QTAILQ_PREV_OFFSET(_type, _next),  \
 +.entry = offsetof(_type, _next),   \
 +.size = sizeof(_type), \
 +.vmsd = &(_vmsd),  \
 +}
>>>
>>> .last and .prev are unnecessary, since they come just after .first and
>>> .next (and their use is hidden behind QTAILQ_RAW_*).  .first and .next
>   
> 
> Actually, .first and .next are always 0.  I misread your changes to 
> qemu/queue.h.
> See below.
> 
>>> can be placed in .offset and .num_offset respectively.  So you don't
>>> really need an extra metadata struct.
>>>
>>> If you prefer you could have something like
>>>
>>> union {
>>> size_t num_offset;/* VMS_VARRAY */
>>> size_t size_offset;   /* VMS_VBUFFER */
>>> size_t next_offset;   /* VMS_TAILQ */
>>> } offsets;
>>
>> Actually I explored the approach in which I use a VMSD to encode all the
>> information. But a VMSD describes actual memory layout. Interpreting it
>> another way could be confusing.
>>
>> One of the assumption about QTAILQ is that we can only use the
>> interfaces defined in queue.h to access it. I intend not to depend on
>> its actually layout. From this perspective, these 6 parameters are
>> needed.
> 
> You are already adding QTAILQ_RAW_*, aren't you?  Those interfaces
> would need to know about the layout, and you are passing redundant
> information:
> 
> - .next_offset should always be 0
> - .prev_offset should always be sizeof(void *)
> - .first_offset should always be 0
> - .last_offset should always be sizeof(void *)
> 
> so you only need head and entry, which you can store in .offset and
> .num_offset.  The .vmsd field you have in ->metadata can be stored
> in VMStateField's .vmsd, and likewise for .size (which can be stored
> in VMStateField's .vmsd).
> 
> Thanks,
> 
> Paolo
>

[Qemu-devel] [PATCH v1 1/1] zynqmp: Add the ZCU102 board

2016-05-25 Thread Alistair Francis

Most Zynq UltraScale+ users will be targetting and using the ZCU102
board instead of the development focused EP108. To make our QEMU machine
names clearer add a ZCU102 machine model.

Signed-off-by: Alistair Francis 
---
There are differences between the two boards, but at the moment we don't
support those differences in QEMU so it doesn't make sense to use a new
init function. So for the time being this machine is the same as the
EP108 machine.

 hw/arm/xlnx-ep108.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/hw/arm/xlnx-ep108.c b/hw/arm/xlnx-ep108.c
index 40f7cc1..34b4641 100644
--- a/hw/arm/xlnx-ep108.c
+++ b/hw/arm/xlnx-ep108.c
@@ -114,3 +114,11 @@ static void xlnx_ep108_machine_init(MachineClass *mc)
 }
 
 DEFINE_MACHINE("xlnx-ep108", xlnx_ep108_machine_init)
+
+static void xlnx_zcu102_machine_init(MachineClass *mc)
+{
+mc->desc = "Xilinx ZynqMP ZCU102 board";
+mc->init = xlnx_ep108_init;
+}
+
+DEFINE_MACHINE("xlnx-zcu102", xlnx_zcu102_machine_init)
-- 
2.7.4

[Qemu-devel] [PATCH v7 1/3] loader: Allow ELF loader to auto-detect the ELF arch

2016-05-25 Thread Alistair Francis

If the caller didn't specify an architecture for the ELF machine
the load_elf() function will auto detect it based on the ELF file.

Signed-off-by: Alistair Francis 
---
V7:
 - Fix typo

 hw/core/loader.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/hw/core/loader.c b/hw/core/loader.c
index 53e0e41..a8a372d 100644
--- a/hw/core/loader.c
+++ b/hw/core/loader.c
@@ -419,6 +419,7 @@ int load_elf(const char *filename, uint64_t 
(*translate_fn)(void *, uint64_t),
 {
 int fd, data_order, target_data_order, must_swab, ret = ELF_LOAD_FAILED;
 uint8_t e_ident[EI_NIDENT];
+uint16_t e_machine;
 
 fd = open(filename, O_RDONLY | O_BINARY);
 if (fd < 0) {
@@ -451,6 +452,15 @@ int load_elf(const char *filename, uint64_t 
(*translate_fn)(void *, uint64_t),
 goto fail;
 }
 
+if (elf_machine < 1) {
+/* The caller didn't specify an ARCH, we can figure it out */
+lseek(fd, 0x12, SEEK_SET);
+if (read(fd, _machine, sizeof(e_machine)) != sizeof(e_machine)) {
+goto fail;
+}
+elf_machine = e_machine;
+}
+
 lseek(fd, 0, SEEK_SET);
 if (e_ident[EI_CLASS] == ELFCLASS64) {
 ret = load_elf64(filename, fd, translate_fn, translate_opaque, 
must_swab,
-- 
2.7.4

Re: [Qemu-devel] [RFC PATCH 2/3] tcg: Add support for fence generation in x86 backend

2016-05-25 Thread Alex Bennée


Richard Henderson  writes:

> On 05/24/2016 10:18 AM, Pranith Kumar wrote:
>> Signed-off-by: Pranith Kumar 
>> ---
>>  tcg/i386/tcg-target.h | 1 +
>>  tcg/i386/tcg-target.inc.c | 9 +
>>  tcg/tcg-opc.h | 2 +-
>>  tcg/tcg.c | 1 +
>>  4 files changed, 12 insertions(+), 1 deletion(-)
>>
>> diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h
>> index 92be341..93ea42e 100644
>> --- a/tcg/i386/tcg-target.h
>> +++ b/tcg/i386/tcg-target.h
>> @@ -100,6 +100,7 @@ extern bool have_bmi1;
>>  #define TCG_TARGET_HAS_muls2_i321
>>  #define TCG_TARGET_HAS_muluh_i320
>>  #define TCG_TARGET_HAS_mulsh_i320
>> +#define TCG_TARGET_HAS_fence1
>
> This has to be defined for all hosts.
>
> The default implementation should be a function call into tcg-runtime.c that
> calls smp_mb().

That would solves the problem of converting the various backends
piecemeal - although obviously we should move to all backends having
"native" support ASAP. However by introducing expensive substitute
functions we will slow down the translations as each front end is
expanded to translate the target barrier ops.

Should we make the emitting of the function call/TCGop conditional on
MTTCG being enabled? If we are running in round-robin mode there is no
need to issue any fence operations.

>
>> @@ -347,6 +347,7 @@ static inline int tcg_target_const_match(tcg_target_long 
>> val, TCGType type,
>>  #define OPC_SHRX(0xf7 | P_EXT38 | P_SIMDF2)
>>  #define OPC_TESTL   (0x85)
>>  #define OPC_XCHG_ax_r32 (0x90)
>> +#define OPC_MFENCE  (0xAE | P_EXT)
>>
>>  #define OPC_GRP3_Ev (0xf7)
>>  #define OPC_GRP5(0xff)
>> @@ -686,6 +687,14 @@ static inline void tcg_out_pushi(TCGContext *s, 
>> tcg_target_long val)
>>  }
>>  }
>>
>> +static inline void tcg_out_fence(TCGContext *s)
>> +{
>> +/* TODO: Figure out an appropriate place for the encoding */
>> +tcg_out8(s, 0x0F);
>> +tcg_out8(s, 0xAE);
>> +tcg_out8(s, 0xF0);
>> +}
>
> Why define OPC_MFENCE if you're not going to use it?  Of course, it's not
> exactly a complete and useful definition, so maybe just delete OPC_MFENCE.
>
> Also, for 32-bit you need to check for sse2 before outputting this.  See also
> the existing cpuid checks in tcg_target_init and the fallback smp_mb 
> definition
> for pre-gcc-4.4.
>
>
> r~


--
Alex Bennée

Re: [Qemu-devel] [RFC PATCH 2/3] tcg: Add support for fence generation in x86 backend

2016-05-25 Thread Sergey Fedorov

On 25/05/16 22:25, Alex Bennée wrote:
> Richard Henderson  writes:
>> On 05/24/2016 10:18 AM, Pranith Kumar wrote:
>>> Signed-off-by: Pranith Kumar 
>>> ---
>>>  tcg/i386/tcg-target.h | 1 +
>>>  tcg/i386/tcg-target.inc.c | 9 +
>>>  tcg/tcg-opc.h | 2 +-
>>>  tcg/tcg.c | 1 +
>>>  4 files changed, 12 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h
>>> index 92be341..93ea42e 100644
>>> --- a/tcg/i386/tcg-target.h
>>> +++ b/tcg/i386/tcg-target.h
>>> @@ -100,6 +100,7 @@ extern bool have_bmi1;
>>>  #define TCG_TARGET_HAS_muls2_i321
>>>  #define TCG_TARGET_HAS_muluh_i320
>>>  #define TCG_TARGET_HAS_mulsh_i320
>>> +#define TCG_TARGET_HAS_fence1
>> This has to be defined for all hosts.
>>
>> The default implementation should be a function call into tcg-runtime.c that
>> calls smp_mb().
> That would solves the problem of converting the various backends
> piecemeal - although obviously we should move to all backends having
> "native" support ASAP. However by introducing expensive substitute
> functions we will slow down the translations as each front end is
> expanded to translate the target barrier ops.

I think it would better not to defer native support for the operation.
It should be relatively simple instruction. Otherwise we could wind up
deferring this indefinitely.

> Should we make the emitting of the function call/TCGop conditional on
> MTTCG being enabled? If we are running in round-robin mode there is no
> need to issue any fence operations.

Good idea.

Kind regards,
Sergey

Re: [Qemu-devel] [PATCH v2 3/4] xlnx-zynqmp: Delay realization of GIC until post CPU realization

2016-05-25 Thread Alistair Francis

On Wed, May 25, 2016 at 3:52 AM, Edgar E. Iglesias
 wrote:
> From: "Edgar E. Iglesias" 
>
> Delay the realization of the GIC until after CPUs are
> realized. This is needed for KVM as the in-kernel GIC
> model will fail if it is realized with no available CPUs.
>
> Reviewed-by: Peter Maydell 
> Signed-off-by: Edgar E. Iglesias 

Reviewed-by: Alistair Francis 

Thanks,

Alistair

> ---
>  hw/arm/xlnx-zynqmp.c | 56 
> +---
>  1 file changed, 31 insertions(+), 25 deletions(-)
>
> diff --git a/hw/arm/xlnx-zynqmp.c b/hw/arm/xlnx-zynqmp.c
> index 3a8af6a..db5b82b 100644
> --- a/hw/arm/xlnx-zynqmp.c
> +++ b/hw/arm/xlnx-zynqmp.c
> @@ -224,33 +224,9 @@ static void xlnx_zynqmp_realize(DeviceState *dev, Error 
> **errp)
>  qdev_prop_set_uint32(DEVICE(>gic), "num-irq", GIC_NUM_SPI_INTR + 32);
>  qdev_prop_set_uint32(DEVICE(>gic), "revision", 2);
>  qdev_prop_set_uint32(DEVICE(>gic), "num-cpu", 
> XLNX_ZYNQMP_NUM_APU_CPUS);
> -object_property_set_bool(OBJECT(>gic), true, "realized", );
> -if (err) {
> -error_propagate(errp, err);
> -return;
> -}
> -assert(ARRAY_SIZE(xlnx_zynqmp_gic_regions) == XLNX_ZYNQMP_GIC_REGIONS);
> -for (i = 0; i < XLNX_ZYNQMP_GIC_REGIONS; i++) {
> -SysBusDevice *gic = SYS_BUS_DEVICE(>gic);
> -const XlnxZynqMPGICRegion *r = _zynqmp_gic_regions[i];
> -MemoryRegion *mr = sysbus_mmio_get_region(gic, r->region_index);
> -uint32_t addr = r->address;
> -int j;
> -
> -sysbus_mmio_map(gic, r->region_index, addr);
> -
> -for (j = 0; j < XLNX_ZYNQMP_GIC_ALIASES; j++) {
> -MemoryRegion *alias = >gic_mr[i][j];
> -
> -addr += XLNX_ZYNQMP_GIC_REGION_SIZE;
> -memory_region_init_alias(alias, OBJECT(s), "zynqmp-gic-alias", 
> mr,
> - 0, XLNX_ZYNQMP_GIC_REGION_SIZE);
> -memory_region_add_subregion(system_memory, addr, alias);
> -}
> -}
>
> +/* Realize APUs before realizing the GIC. KVM requires this.  */
>  for (i = 0; i < XLNX_ZYNQMP_NUM_APU_CPUS; i++) {
> -qemu_irq irq;
>  char *name;
>
>  object_property_set_int(OBJECT(>apu_cpu[i]), 
> QEMU_PSCI_CONDUIT_SMC,
> @@ -276,6 +252,36 @@ static void xlnx_zynqmp_realize(DeviceState *dev, Error 
> **errp)
>  error_propagate(errp, err);
>  return;
>  }
> +}
> +
> +object_property_set_bool(OBJECT(>gic), true, "realized", );
> +if (err) {
> +error_propagate(errp, err);
> +return;
> +}
> +
> +assert(ARRAY_SIZE(xlnx_zynqmp_gic_regions) == XLNX_ZYNQMP_GIC_REGIONS);
> +for (i = 0; i < XLNX_ZYNQMP_GIC_REGIONS; i++) {
> +SysBusDevice *gic = SYS_BUS_DEVICE(>gic);
> +const XlnxZynqMPGICRegion *r = _zynqmp_gic_regions[i];
> +MemoryRegion *mr = sysbus_mmio_get_region(gic, r->region_index);
> +uint32_t addr = r->address;
> +int j;
> +
> +sysbus_mmio_map(gic, r->region_index, addr);
> +
> +for (j = 0; j < XLNX_ZYNQMP_GIC_ALIASES; j++) {
> +MemoryRegion *alias = >gic_mr[i][j];
> +
> +addr += XLNX_ZYNQMP_GIC_REGION_SIZE;
> +memory_region_init_alias(alias, OBJECT(s), "zynqmp-gic-alias", 
> mr,
> + 0, XLNX_ZYNQMP_GIC_REGION_SIZE);
> +memory_region_add_subregion(system_memory, addr, alias);
> +}
> +}
> +
> +for (i = 0; i < XLNX_ZYNQMP_NUM_APU_CPUS; i++) {
> +qemu_irq irq;
>
>  sysbus_connect_irq(SYS_BUS_DEVICE(>gic), i,
> qdev_get_gpio_in(DEVICE(>apu_cpu[i]),
> --
> 2.5.0
>
>

[Qemu-devel] [PULL 20/31] block: Default to enabled write cache in blk_new()

2016-05-25 Thread Kevin Wolf

The existing users of the function are:

1. blk_new_open(), which already enabled the write cache
2. Some test cases that don't care about the setting
3. blockdev_init() for empty drives, where the cache mode is overridden
   with the value from the options when a medium is inserted

Therefore, this patch doesn't change the current behaviour. It will be
convenient, however, for additional users of blk_new() (like block
jobs) if the most sensible WCE setting is the default.

Signed-off-by: Kevin Wolf 
Reviewed-by: Max Reitz 
Reviewed-by: Alberto Garcia 
---
 block/block-backend.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index f9da1d1..c8b13f1 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -125,6 +125,8 @@ BlockBackend *blk_new(void)
 
 blk = g_new0(BlockBackend, 1);
 blk->refcnt = 1;
+blk_set_enable_write_cache(blk, true);
+
 qemu_co_queue_init(>public.throttled_reqs[0]);
 qemu_co_queue_init(>public.throttled_reqs[1]);
 
@@ -160,7 +162,6 @@ BlockBackend *blk_new_open(const char *filename, const char 
*reference,
 return NULL;
 }
 
-blk_set_enable_write_cache(blk, true);
 blk->root = bdrv_root_attach_child(bs, "root", _root, blk);
 
 return blk;
-- 
1.8.3.1

[Qemu-devel] [PULL 16/31] dma-helpers: change BlockBackend to opaque value in DMAIOFunc

2016-05-25 Thread Kevin Wolf

From: Paolo Bonzini 

Callers of dma_blk_io have no way to pass extra data to the DMAIOFunc,
because the original callback and opaque are gone by the time DMAIOFunc
is called.  On the other hand, the BlockBackend is usually derived
from those extra data that you could pass to the DMAIOFunc (in the
next patch, that would be the SCSIRequest).

So change DMAIOFunc's prototype, decoupling it from blk_aio_readv
and blk_aio_writev's.  The new prototype loses the BlockBackend
and gains an extra opaque value which, in the case of dma_blk_readv
and dma_blk_writev, is of course used for the BlockBackend.

Signed-off-by: Paolo Bonzini 
Signed-off-by: Kevin Wolf 
---
 dma-helpers.c| 48 +++-
 hw/ide/core.c| 14 --
 hw/ide/internal.h|  6 +++---
 hw/ide/macio.c   |  2 +-
 include/sysemu/dma.h | 12 ++--
 5 files changed, 53 insertions(+), 29 deletions(-)

diff --git a/dma-helpers.c b/dma-helpers.c
index af07aab..b521d84 100644
--- a/dma-helpers.c
+++ b/dma-helpers.c
@@ -70,7 +70,7 @@ void qemu_sglist_destroy(QEMUSGList *qsg)
 
 typedef struct {
 BlockAIOCB common;
-BlockBackend *blk;
+AioContext *ctx;
 BlockAIOCB *acb;
 QEMUSGList *sg;
 uint64_t offset;
@@ -80,6 +80,7 @@ typedef struct {
 QEMUIOVector iov;
 QEMUBH *bh;
 DMAIOFunc *io_func;
+void *io_func_opaque;
 } DMAAIOCB;
 
 static void dma_blk_cb(void *opaque, int ret);
@@ -154,8 +155,7 @@ static void dma_blk_cb(void *opaque, int ret)
 
 if (dbs->iov.size == 0) {
 trace_dma_map_wait(dbs);
-dbs->bh = aio_bh_new(blk_get_aio_context(dbs->blk),
- reschedule_dma, dbs);
+dbs->bh = aio_bh_new(dbs->ctx, reschedule_dma, dbs);
 cpu_register_map_client(dbs->bh);
 return;
 }
@@ -164,8 +164,8 @@ static void dma_blk_cb(void *opaque, int ret)
 qemu_iovec_discard_back(>iov, dbs->iov.size & ~BDRV_SECTOR_MASK);
 }
 
-dbs->acb = dbs->io_func(dbs->blk, dbs->offset, >iov, 0,
-dma_blk_cb, dbs);
+dbs->acb = dbs->io_func(dbs->offset, >iov,
+dma_blk_cb, dbs, dbs->io_func_opaque);
 assert(dbs->acb);
 }
 
@@ -191,23 +191,25 @@ static const AIOCBInfo dma_aiocb_info = {
 .cancel_async   = dma_aio_cancel,
 };
 
-BlockAIOCB *dma_blk_io(
-BlockBackend *blk, QEMUSGList *sg, uint64_t offset,
-DMAIOFunc *io_func, BlockCompletionFunc *cb,
+BlockAIOCB *dma_blk_io(AioContext *ctx,
+QEMUSGList *sg, uint64_t offset,
+DMAIOFunc *io_func, void *io_func_opaque,
+BlockCompletionFunc *cb,
 void *opaque, DMADirection dir)
 {
-DMAAIOCB *dbs = blk_aio_get(_aiocb_info, blk, cb, opaque);
+DMAAIOCB *dbs = qemu_aio_get(_aiocb_info, NULL, cb, opaque);
 
-trace_dma_blk_io(dbs, blk, offset, (dir == DMA_DIRECTION_TO_DEVICE));
+trace_dma_blk_io(dbs, io_func_opaque, offset, (dir == 
DMA_DIRECTION_TO_DEVICE));
 
 dbs->acb = NULL;
-dbs->blk = blk;
 dbs->sg = sg;
+dbs->ctx = ctx;
 dbs->offset = offset;
 dbs->sg_cur_index = 0;
 dbs->sg_cur_byte = 0;
 dbs->dir = dir;
 dbs->io_func = io_func;
+dbs->io_func_opaque = io_func_opaque;
 dbs->bh = NULL;
 qemu_iovec_init(>iov, sg->nsg);
 dma_blk_cb(dbs, 0);
@@ -215,19 +217,39 @@ BlockAIOCB *dma_blk_io(
 }
 
 
+static
+BlockAIOCB *dma_blk_read_io_func(int64_t offset, QEMUIOVector *iov,
+ BlockCompletionFunc *cb, void *cb_opaque,
+ void *opaque)
+{
+BlockBackend *blk = opaque;
+return blk_aio_preadv(blk, offset, iov, 0, cb, cb_opaque);
+}
+
 BlockAIOCB *dma_blk_read(BlockBackend *blk,
  QEMUSGList *sg, uint64_t offset,
  void (*cb)(void *opaque, int ret), void *opaque)
 {
-return dma_blk_io(blk, sg, offset, blk_aio_preadv, cb, opaque,
+return dma_blk_io(blk_get_aio_context(blk),
+  sg, offset, dma_blk_read_io_func, blk, cb, opaque,
   DMA_DIRECTION_FROM_DEVICE);
 }
 
+static
+BlockAIOCB *dma_blk_write_io_func(int64_t offset, QEMUIOVector *iov,
+  BlockCompletionFunc *cb, void *cb_opaque,
+  void *opaque)
+{
+BlockBackend *blk = opaque;
+return blk_aio_pwritev(blk, offset, iov, 0, cb, cb_opaque);
+}
+
 BlockAIOCB *dma_blk_write(BlockBackend *blk,
   QEMUSGList *sg, uint64_t offset,
   void (*cb)(void *opaque, int ret), void *opaque)
 {
-return dma_blk_io(blk, sg, offset, blk_aio_pwritev, cb, opaque,
+return dma_blk_io(blk_get_aio_context(blk),
+  sg, offset, dma_blk_write_io_func, blk, cb, opaque,
   DMA_DIRECTION_TO_DEVICE);
 }
 
diff --git a/hw/ide/core.c b/hw/ide/core.c
index 7326189..029f6b9 100644
--- a/hw/ide/core.c
+++

[Qemu-devel] [PULL 07/31] block: Make bdrv_open() return a BDS

2016-05-25 Thread Kevin Wolf

From: Max Reitz 

There are no callers to bdrv_open() or bdrv_open_inherit() left that
pass a pointer to a non-NULL BDS pointer as the first argument of these
functions, so we can finally drop that parameter and just make them
return the new BDS.

Generally, the following pattern is applied:

bs = NULL;
ret = bdrv_open(, ..., _err);
if (ret < 0) {
error_propagate(errp, local_err);
...
}

by

bs = bdrv_open(..., errp);
if (!bs) {
ret = -EINVAL;
...
}

Of course, there are only a few instances where the pattern is really
pure.

Signed-off-by: Max Reitz 
Reviewed-by: Kevin Wolf 
Signed-off-by: Kevin Wolf 
---
 block.c   | 138 --
 block/block-backend.c |   6 +--
 block/vvfat.c |   8 +--
 blockdev.c|  38 +-
 include/block/block.h |   4 +-
 5 files changed, 66 insertions(+), 128 deletions(-)

diff --git a/block.c b/block.c
index ed4b487..ad4be90 100644
--- a/block.c
+++ b/block.c
@@ -64,10 +64,12 @@ static QTAILQ_HEAD(, BlockDriverState) all_bdrv_states =
 static QLIST_HEAD(, BlockDriver) bdrv_drivers =
 QLIST_HEAD_INITIALIZER(bdrv_drivers);
 
-static int bdrv_open_inherit(BlockDriverState **pbs, const char *filename,
- const char *reference, QDict *options, int flags,
- BlockDriverState *parent,
- const BdrvChildRole *child_role, Error **errp);
+static BlockDriverState *bdrv_open_inherit(const char *filename,
+   const char *reference,
+   QDict *options, int flags,
+   BlockDriverState *parent,
+   const BdrvChildRole *child_role,
+   Error **errp);
 
 /* If non-zero, use only whitelisted block drivers */
 static int use_bdrv_whitelist;
@@ -1336,14 +1338,13 @@ int bdrv_open_backing_file(BlockDriverState *bs, QDict 
*parent_options,
 qdict_put(options, "driver", qstring_from_str(bs->backing_format));
 }
 
-backing_hd = NULL;
-ret = bdrv_open_inherit(_hd,
-*backing_filename ? backing_filename : NULL,
-reference, options, 0, bs, _backing,
-errp);
-if (ret < 0) {
+backing_hd = bdrv_open_inherit(*backing_filename ? backing_filename : NULL,
+   reference, options, 0, bs, _backing,
+   errp);
+if (!backing_hd) {
 bs->open_flags |= BDRV_O_NO_BACKING;
 error_prepend(errp, "Could not open backing file: ");
+ret = -EINVAL;
 goto free_exit;
 }
 
@@ -1383,7 +1384,6 @@ BdrvChild *bdrv_open_child(const char *filename,
 BdrvChild *c = NULL;
 BlockDriverState *bs;
 QDict *image_options;
-int ret;
 char *bdref_key_dot;
 const char *reference;
 
@@ -1403,10 +1403,9 @@ BdrvChild *bdrv_open_child(const char *filename,
 goto done;
 }
 
-bs = NULL;
-ret = bdrv_open_inherit(, filename, reference, image_options, 0,
-parent, child_role, errp);
-if (ret < 0) {
+bs = bdrv_open_inherit(filename, reference, image_options, 0,
+   parent, child_role, errp);
+if (!bs) {
 goto done;
 }
 
@@ -1427,7 +1426,6 @@ static BlockDriverState 
*bdrv_append_temp_snapshot(BlockDriverState *bs,
 int64_t total_size;
 QemuOpts *opts = NULL;
 BlockDriverState *bs_snapshot;
-Error *local_err = NULL;
 int ret;
 
 /* if snapshot, we create a temporary backing file and open it
@@ -1466,12 +1464,10 @@ static BlockDriverState 
*bdrv_append_temp_snapshot(BlockDriverState *bs,
 qdict_put(snapshot_options, "driver",
   qstring_from_str("qcow2"));
 
-bs_snapshot = NULL;
-ret = bdrv_open(_snapshot, NULL, NULL, snapshot_options,
-flags, _err);
+bs_snapshot = bdrv_open(NULL, NULL, snapshot_options, flags, errp);
 snapshot_options = NULL;
-if (ret < 0) {
-error_propagate(errp, local_err);
+if (!bs_snapshot) {
+ret = -EINVAL;
 goto out;
 }
 
@@ -1505,10 +1501,12 @@ out:
  * should be opened. If specified, neither options nor a filename may be given,
  * nor can an existing BDS be reused (that is, *pbs has to be NULL).
  */
-static int bdrv_open_inherit(BlockDriverState **pbs, const char *filename,
- const char *reference, QDict *options, int flags,
- BlockDriverState *parent,
- const BdrvChildRole *child_role, Error **errp)
+static BlockDriverState *bdrv_open_inherit(const char *filename,
+   const char

[Qemu-devel] [PULL 30/31] commit: Use BlockBackend for I/O

2016-05-25 Thread Kevin Wolf

This changes the commit block job to use the job's BlockBackend for
performing its I/O. job->bs isn't used by the commit code any more
afterwards.

Signed-off-by: Kevin Wolf 
Reviewed-by: Eric Blake 
Reviewed-by: Max Reitz 
---
 block/commit.c | 53 +
 1 file changed, 33 insertions(+), 20 deletions(-)

diff --git a/block/commit.c b/block/commit.c
index f308c8c..8a00e11 100644
--- a/block/commit.c
+++ b/block/commit.c
@@ -36,28 +36,36 @@ typedef struct CommitBlockJob {
 BlockJob common;
 RateLimit limit;
 BlockDriverState *active;
-BlockDriverState *top;
-BlockDriverState *base;
+BlockBackend *top;
+BlockBackend *base;
 BlockdevOnError on_error;
 int base_flags;
 int orig_overlay_flags;
 char *backing_file_str;
 } CommitBlockJob;
 
-static int coroutine_fn commit_populate(BlockDriverState *bs,
-BlockDriverState *base,
+static int coroutine_fn commit_populate(BlockBackend *bs, BlockBackend *base,
 int64_t sector_num, int nb_sectors,
 void *buf)
 {
 int ret = 0;
+QEMUIOVector qiov;
+struct iovec iov = {
+.iov_base = buf,
+.iov_len = nb_sectors * BDRV_SECTOR_SIZE,
+};
 
-ret = bdrv_read(bs, sector_num, buf, nb_sectors);
-if (ret) {
+qemu_iovec_init_external(, , 1);
+
+ret = blk_co_preadv(bs, sector_num * BDRV_SECTOR_SIZE,
+qiov.size, , 0);
+if (ret < 0) {
 return ret;
 }
 
-ret = bdrv_write(base, sector_num, buf, nb_sectors);
-if (ret) {
+ret = blk_co_pwritev(base, sector_num * BDRV_SECTOR_SIZE,
+ qiov.size, , 0);
+if (ret < 0) {
 return ret;
 }
 
@@ -73,8 +81,8 @@ static void commit_complete(BlockJob *job, void *opaque)
 CommitBlockJob *s = container_of(job, CommitBlockJob, common);
 CommitCompleteData *data = opaque;
 BlockDriverState *active = s->active;
-BlockDriverState *top = s->top;
-BlockDriverState *base = s->base;
+BlockDriverState *top = blk_bs(s->top);
+BlockDriverState *base = blk_bs(s->base);
 BlockDriverState *overlay_bs;
 int ret = data->ret;
 
@@ -94,6 +102,8 @@ static void commit_complete(BlockJob *job, void *opaque)
 bdrv_reopen(overlay_bs, s->orig_overlay_flags, NULL);
 }
 g_free(s->backing_file_str);
+blk_unref(s->top);
+blk_unref(s->base);
 block_job_completed(>common, ret);
 g_free(data);
 }
@@ -102,8 +112,6 @@ static void coroutine_fn commit_run(void *opaque)
 {
 CommitBlockJob *s = opaque;
 CommitCompleteData *data;
-BlockDriverState *top = s->top;
-BlockDriverState *base = s->base;
 int64_t sector_num, end;
 int ret = 0;
 int n = 0;
@@ -111,27 +119,27 @@ static void coroutine_fn commit_run(void *opaque)
 int bytes_written = 0;
 int64_t base_len;
 
-ret = s->common.len = bdrv_getlength(top);
+ret = s->common.len = blk_getlength(s->top);
 
 
 if (s->common.len < 0) {
 goto out;
 }
 
-ret = base_len = bdrv_getlength(base);
+ret = base_len = blk_getlength(s->base);
 if (base_len < 0) {
 goto out;
 }
 
 if (base_len < s->common.len) {
-ret = bdrv_truncate(base, s->common.len);
+ret = blk_truncate(s->base, s->common.len);
 if (ret) {
 goto out;
 }
 }
 
 end = s->common.len >> BDRV_SECTOR_BITS;
-buf = qemu_blockalign(top, COMMIT_BUFFER_SIZE);
+buf = blk_blockalign(s->top, COMMIT_BUFFER_SIZE);
 
 for (sector_num = 0; sector_num < end; sector_num += n) {
 uint64_t delay_ns = 0;
@@ -146,7 +154,8 @@ wait:
 break;
 }
 /* Copy if allocated above the base */
-ret = bdrv_is_allocated_above(top, base, sector_num,
+ret = bdrv_is_allocated_above(blk_bs(s->top), blk_bs(s->base),
+  sector_num,
   COMMIT_BUFFER_SIZE / BDRV_SECTOR_SIZE,
   );
 copy = (ret == 1);
@@ -158,7 +167,7 @@ wait:
 goto wait;
 }
 }
-ret = commit_populate(top, base, sector_num, n, buf);
+ret = commit_populate(s->top, s->base, sector_num, n, buf);
 bytes_written += n * BDRV_SECTOR_SIZE;
 }
 if (ret < 0) {
@@ -253,8 +262,12 @@ void commit_start(BlockDriverState *bs, BlockDriverState 
*base,
 return;
 }
 
-s->base   = base;
-s->top= top;
+s->base = blk_new();
+blk_insert_bs(s->base, base);
+
+s->top = blk_new();
+blk_insert_bs(s->top, top);
+
 s->active = bs;
 
 s->base_flags  = orig_base_flags;
-- 
1.8.3.1

[Qemu-devel] [PULL 29/31] backup: Use BlockBackend for I/O

2016-05-25 Thread Kevin Wolf

This changes the backup block job to use the job's BlockBackend for
performing its I/O. job->bs isn't used by the backup code any more
afterwards.

Signed-off-by: Kevin Wolf 
Reviewed-by: Eric Blake 
Reviewed-by: Max Reitz 
---
 block/backup.c| 46 +-
 block/io.c|  9 -
 blockdev.c|  4 +---
 include/block/block.h |  2 --
 trace-events  |  1 -
 5 files changed, 22 insertions(+), 40 deletions(-)

diff --git a/block/backup.c b/block/backup.c
index 670ba50..feeb9f8 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -36,7 +36,7 @@ typedef struct CowRequest {
 
 typedef struct BackupBlockJob {
 BlockJob common;
-BlockDriverState *target;
+BlockBackend *target;
 /* bitmap for sync=incremental */
 BdrvDirtyBitmap *sync_bitmap;
 MirrorSyncMode sync_mode;
@@ -99,7 +99,7 @@ static int coroutine_fn backup_do_cow(BackupBlockJob *job,
   bool *error_is_read,
   bool is_write_notifier)
 {
-BlockDriverState *bs = job->common.bs;
+BlockBackend *blk = job->common.blk;
 CowRequest cow_request;
 struct iovec iov;
 QEMUIOVector bounce_qiov;
@@ -132,20 +132,15 @@ static int coroutine_fn backup_do_cow(BackupBlockJob *job,
 start * sectors_per_cluster);
 
 if (!bounce_buffer) {
-bounce_buffer = qemu_blockalign(bs, job->cluster_size);
+bounce_buffer = blk_blockalign(blk, job->cluster_size);
 }
 iov.iov_base = bounce_buffer;
 iov.iov_len = n * BDRV_SECTOR_SIZE;
 qemu_iovec_init_external(_qiov, , 1);
 
-if (is_write_notifier) {
-ret = bdrv_co_readv_no_serialising(bs,
-   start * sectors_per_cluster,
-   n, _qiov);
-} else {
-ret = bdrv_co_readv(bs, start * sectors_per_cluster, n,
-_qiov);
-}
+ret = blk_co_preadv(blk, start * job->cluster_size,
+bounce_qiov.size, _qiov,
+is_write_notifier ? BDRV_REQ_NO_SERIALISING : 0);
 if (ret < 0) {
 trace_backup_do_cow_read_fail(job, start, ret);
 if (error_is_read) {
@@ -155,13 +150,11 @@ static int coroutine_fn backup_do_cow(BackupBlockJob *job,
 }
 
 if (buffer_is_zero(iov.iov_base, iov.iov_len)) {
-ret = bdrv_co_write_zeroes(job->target,
-   start * sectors_per_cluster,
-   n, BDRV_REQ_MAY_UNMAP);
+ret = blk_co_pwrite_zeroes(job->target, start * job->cluster_size,
+   bounce_qiov.size, BDRV_REQ_MAY_UNMAP);
 } else {
-ret = bdrv_co_writev(job->target,
- start * sectors_per_cluster, n,
- _qiov);
+ret = blk_co_pwritev(job->target, start * job->cluster_size,
+ bounce_qiov.size, _qiov, 0);
 }
 if (ret < 0) {
 trace_backup_do_cow_write_fail(job, start, ret);
@@ -203,7 +196,7 @@ static int coroutine_fn backup_before_write_notify(
 int64_t sector_num = req->offset >> BDRV_SECTOR_BITS;
 int nb_sectors = req->bytes >> BDRV_SECTOR_BITS;
 
-assert(req->bs == job->common.bs);
+assert(req->bs == blk_bs(job->common.blk));
 assert((req->offset & (BDRV_SECTOR_SIZE - 1)) == 0);
 assert((req->bytes & (BDRV_SECTOR_SIZE - 1)) == 0);
 
@@ -224,7 +217,7 @@ static void backup_set_speed(BlockJob *job, int64_t speed, 
Error **errp)
 static void backup_cleanup_sync_bitmap(BackupBlockJob *job, int ret)
 {
 BdrvDirtyBitmap *bm;
-BlockDriverState *bs = job->common.bs;
+BlockDriverState *bs = blk_bs(job->common.blk);
 
 if (ret < 0 || block_job_is_cancelled(>common)) {
 /* Merge the successor back into the parent, delete nothing. */
@@ -282,7 +275,7 @@ static void backup_complete(BlockJob *job, void *opaque)
 BackupBlockJob *s = container_of(job, BackupBlockJob, common);
 BackupCompleteData *data = opaque;
 
-bdrv_unref(s->target);
+blk_unref(s->target);
 
 block_job_completed(job, data->ret);
 g_free(data);
@@ -378,8 +371,8 @@ static void coroutine_fn backup_run(void *opaque)
 {
 BackupBlockJob *job = opaque;
 BackupCompleteData *data;
-BlockDriverState *bs = job->common.bs;
-BlockDriverState *target = job->target;
+BlockDriverState *bs = blk_bs(job->common.blk);
+BlockBackend *target = job->target;
 int64_t start, end;
 int64_t sectors_per_cluster = cluster_size_sectors(job);
 int ret = 0;
@@ -468,7 +461,7 @@ static void coroutine_fn backup_run(void *opaque)
 qemu_co_rwlock_unlock(>flush_rwlock);

Re: [Qemu-devel] [QEMU RFC PATCH v2 4/6] Migration: migrate QTAILQ

2016-05-25 Thread Paolo Bonzini

> >> +/*
> >> + * Following 3 fields are for VMStateField which needs customized
> >> handling,
> >> + * such as QTAILQ in qemu/queue.h, lists, and tree.
> >> + */
> >> +const void *meta_data;
> >> +int (*extend_get)(QEMUFile *f, const void *metadata, void *opaque);
> >> +void (*extend_put)(QEMUFile *f, const void *metadata, void *opaque,
> >> +   QJSON *vmdesc);
> >>  } VMStateField;
> > 
> > Do not add these two function pointers to VMStateField, instead add
> > QJSON* and VMStateField* arguments as needed to VMStateInfo's get and put.
> 
> That is definitely the ideal way. However, VMStateInfo's get/put are
> already used extensively. If I change them, I need change all the
> calling sites of them. Not very sure about whether it will be welcomed.

Sure, don't be worried. :)  True, there are quite a few VMStateInfos (about
35) but the changes are simple.

> >> +#define VMSTATE_QTAILQ_METADATA(_field, _state, _type, _next, _vmsd) { \
> >> +.first = QTAILQ_FIRST_OFFSET(typeof_field(_state, _field)),\
> >> +.last =  QTAILQ_LAST_OFFSET(typeof_field(_state, _field)), \
> >> +.next = QTAILQ_NEXT_OFFSET(_type, _next),  \
> >> +.prev = QTAILQ_PREV_OFFSET(_type, _next),  \
> >> +.entry = offsetof(_type, _next),   \
> >> +.size = sizeof(_type), \
> >> +.vmsd = &(_vmsd),  \
> >> +}
> > 
> > .last and .prev are unnecessary, since they come just after .first and
> > .next (and their use is hidden behind QTAILQ_RAW_*).  .first and .next
  

Actually, .first and .next are always 0.  I misread your changes to 
qemu/queue.h.
See below.

> > can be placed in .offset and .num_offset respectively.  So you don't
> > really need an extra metadata struct.
> > 
> > If you prefer you could have something like
> > 
> > union {
> > size_t num_offset;/* VMS_VARRAY */
> > size_t size_offset;   /* VMS_VBUFFER */
> > size_t next_offset;   /* VMS_TAILQ */
> > } offsets;
>
> Actually I explored the approach in which I use a VMSD to encode all the
> information. But a VMSD describes actual memory layout. Interpreting it
> another way could be confusing.
> 
> One of the assumption about QTAILQ is that we can only use the
> interfaces defined in queue.h to access it. I intend not to depend on
> its actually layout. From this perspective, these 6 parameters are
> needed.

You are already adding QTAILQ_RAW_*, aren't you?  Those interfaces
would need to know about the layout, and you are passing redundant
information:

- .next_offset should always be 0
- .prev_offset should always be sizeof(void *)
- .first_offset should always be 0
- .last_offset should always be sizeof(void *)

so you only need head and entry, which you can store in .offset and
.num_offset.  The .vmsd field you have in ->metadata can be stored
in VMStateField's .vmsd, and likewise for .size (which can be stored
in VMStateField's .vmsd).

Thanks,

Paolo

Re: [Qemu-devel] [PATCH v6 1/3] loader: Allow ELF loader to auto-detect the ELF arch

2016-05-25 Thread Alistair Francis

On Tue, May 24, 2016 at 3:08 PM, Cleber Rosa  wrote:
>
> On 05/13/2016 05:37 PM, Alistair Francis wrote:
>>
>>
>> +if (elf_machine < 1) {
>> +/* The caller didn't specify and ARCH, we can figure it out */
>
>
> Spotted a comment typo: s/and/an/

Thanks, sending a version 7 with it fixed.

Thanks,

Alistair

>
>
>> +lseek(fd, 0x12, SEEK_SET);
>> +if (read(fd, _machine, sizeof(e_machine)) != sizeof(e_machine))
>> {
>> +goto fail;
>> +}
>> +elf_machine = e_machine;
>> +}
>> +
>
>
> --
> Cleber Rosa
> [ Sr Software Engineer - Virtualization Team - Red Hat ]
> [ Avocado Test Framework - avocado-framework.github.io ]
>

[Qemu-devel] [PATCH v7 0/3] Add a generic loader

2016-05-25 Thread Alistair Francis

This work is based on the original work by Li Guang with extra
features added by Peter C and myself.

The idea of this loader is to allow the user to load multiple images
or values into QEMU at startup.

Memory values can be loaded like this: -device 
loader,addr=0xfd1a0104,data=0x800e,data-len=4

Images can be loaded like this: -device loader,file=./images/u-boot.elf,cpu=0

This can be useful and we use it a lot in Xilinx to load multiple images
into a machine at creation (ATF, Kernel and DTB for example).

It can also be used to set registers.

This patch series also make the load_elf() function more generic by not
requiring an architecture.

V7:
 - Fix typo in comment
 - Rebase
V6:
 - Add error checking
V5:
 - Rebase
V4:
 - Re-write documentation
 - Allow the loader to work with every architecture
 - Move the file to hw/core
 - Increase the maximum number of CPUs
 - Make the CPU operations conditional
 - Convert the cpu option to cpu-num
 - Require the user to specify endianess
V2:
 - Add an entry to the maintainers file
 - Add some documentation
 - Perform bounds checking on the data_len
 - Register and unregister the reset in the realise/unrealise
Changes since RFC:
 - Add support for BE


Alistair Francis (3):
  loader: Allow ELF loader to auto-detect the ELF arch
  generic-loader: Add a generic loader
  docs: Add a generic loader explanation document

 MAINTAINERS  |   6 ++
 docs/generic-loader.txt  |  54 +
 hw/core/Makefile.objs|   2 +
 hw/core/generic-loader.c | 170 +++
 hw/core/loader.c |  10 +++
 include/hw/core/generic-loader.h |  45 +++
 6 files changed, 287 insertions(+)
 create mode 100644 docs/generic-loader.txt
 create mode 100644 hw/core/generic-loader.c
 create mode 100644 include/hw/core/generic-loader.h

-- 
2.7.4

[Qemu-devel] [PATCH v7 3/3] docs: Add a generic loader explanation document

2016-05-25 Thread Alistair Francis

Signed-off-by: Alistair Francis 
---
V6:
 - Fixup documentation
V4:
 - Re-write to be more comprehensive

 docs/generic-loader.txt | 54 +
 1 file changed, 54 insertions(+)
 create mode 100644 docs/generic-loader.txt

diff --git a/docs/generic-loader.txt b/docs/generic-loader.txt
new file mode 100644
index 000..effb244
--- /dev/null
+++ b/docs/generic-loader.txt
@@ -0,0 +1,54 @@
+Copyright (c) 2016 Xilinx Inc.
+
+This work is licensed under the terms of the GNU GPL, version 2 or later.  See
+the COPYING file in the top-level directory.
+
+
+The 'loader' device allows the user to load multiple images or values into
+QEMU at startup.
+
+Loading Memory Values
+-
+The loader device allows memory values to be set from the command line. This
+can be done by following the syntax below:
+
+-device loader,addr=,data=,data-len=
+-device loader,addr=,cpu-num=
+
+NOTE: It is also possible to mix the commands above, e.g. include the cpu-num
+  argument with the data argument.
+
+  - The address to store the data or the value to set the CPUs PC
+  - The value to be written to the addr. The maximum size of the
+  data is 8 bytes.
+  - The length of the data in bytes. This argument must be 
included
+  if the data argument is.
+   - Set to true if the data to be stored on the guest should be
+  written as big endian data. The default is to write little
+  endian data.
+   - This will cause the CPU to be reset and the PC to be set to
+  the value of addr.
+
+An example of loading value 0x800e to address 0xfd1a0104 is:
+-device loader,addr=0xfd1a0104,data=0x800e,data-len=4
+
+Loading Files
+-
+The loader device also allows files to be loaded into memory. This can be done
+similarly to setting memory values. The syntax is shown below:
+
+-device loader,file=,addr=,cpu-num=,force-raw=
+
+  - A file to be loaded into memory
+  - The addr in memory that the file should be loaded. This is
+  ignored if you are using an ELF (unless force-raw is true).
+  This is required if you aren't loading an ELF.
+   - This specifies the CPU that should be used. This is an
+  optional argument and will cause the CPU's PC to be set to
+  where the image is stored. This option should only be used
+  for the boot image.
+ - Forces the file to be treated as a raw image. This can be
+  used to specify the load address of ELF files.
+
+An example of loading an ELF file which CPU0 will boot is shown below:
+-device loader,file=./images/boot.elf,cpu-num=0
-- 
2.7.4

[Qemu-devel] [PATCH v7 2/3] generic-loader: Add a generic loader

2016-05-25 Thread Alistair Francis

Add a generic loader to QEMU which can be used to load images or set
memory values.

Signed-off-by: Alistair Francis 
---
V7:
 - Rebase
V6:
 - Add error checking
V5:
 - Rebase
V4:
 - Allow the loader to work with every architecture
 - Move the file to hw/core
 - Increase the maximum number of CPUs
 - Make the CPU operations conditional
 - Convert the cpu option to cpu-num
 - Require the user to specify endianess
V3:
 - Pass the ram_size to load_image_targphys()
V2:
 - Add maintainers entry
 - Perform bounds checking
 - Register and unregister the reset in the realise/unrealise
Changes since RFC:
 - Add BE support

 MAINTAINERS  |   6 ++
 hw/core/Makefile.objs|   2 +
 hw/core/generic-loader.c | 170 +++
 include/hw/core/generic-loader.h |  45 +++
 4 files changed, 223 insertions(+)
 create mode 100644 hw/core/generic-loader.c
 create mode 100644 include/hw/core/generic-loader.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 81e7fac..b4a77d8 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -972,6 +972,12 @@ F: hw/acpi/nvdimm.c
 F: hw/mem/nvdimm.c
 F: include/hw/mem/nvdimm.h
 
+Generic Loader
+M: Alistair Francis 
+S: Maintained
+F: hw/core/generic-loader.c
+F: include/hw/core/generic-loader.h
+
 Subsystems
 --
 Audio
diff --git a/hw/core/Makefile.objs b/hw/core/Makefile.objs
index 70951d4..eb00e7d 100644
--- a/hw/core/Makefile.objs
+++ b/hw/core/Makefile.objs
@@ -15,3 +15,5 @@ common-obj-$(CONFIG_SOFTMMU) += null-machine.o
 common-obj-$(CONFIG_SOFTMMU) += loader.o
 common-obj-$(CONFIG_SOFTMMU) += qdev-properties-system.o
 common-obj-$(CONFIG_PLATFORM_BUS) += platform-bus.o
+
+obj-$(CONFIG_SOFTMMU) += generic-loader.o
diff --git a/hw/core/generic-loader.c b/hw/core/generic-loader.c
new file mode 100644
index 000..7160d58
--- /dev/null
+++ b/hw/core/generic-loader.c
@@ -0,0 +1,170 @@
+/*
+ * Generic Loader
+ *
+ * Copyright (C) 2014 Li Guang
+ * Copyright (C) 2016 Xilinx Inc.
+ * Written by Li Guang 
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ */
+
+#include "qemu/osdep.h"
+#include "qom/cpu.h"
+#include "hw/sysbus.h"
+#include "sysemu/dma.h"
+#include "hw/loader.h"
+#include "qapi/error.h"
+#include "hw/core/generic-loader.h"
+
+#define CPU_NONE 0x
+
+static void generic_loader_reset(void *opaque)
+{
+GenericLoaderState *s = GENERIC_LOADER(opaque);
+
+if (s->cpu) {
+CPUClass *cc = CPU_GET_CLASS(s->cpu);
+cpu_reset(s->cpu);
+if (cc) {
+cc->set_pc(s->cpu, s->addr);
+}
+}
+
+if (s->data_len) {
+assert(s->data_len < sizeof(s->data));
+dma_memory_write((s->cpu ? s->cpu : first_cpu)->as, s->addr, >data,
+ s->data_len);
+}
+}
+
+static void generic_loader_realize(DeviceState *dev, Error **errp)
+{
+GenericLoaderState *s = GENERIC_LOADER(dev);
+hwaddr entry;
+int big_endian;
+int size = 0;
+
+/* Perform some eror checking on the users options */
+if (s->data || s->data_len  || s->data_be) {
+/* User is loading memory values */
+if (s->file) {
+error_setg(errp, "Specifying a file is not supported when loading "
+   "memory values.");
+return;
+} else if (s->force_raw) {
+error_setg(errp, "Specifying force raw is not supported when "
+   "loading memory values.");
+return;
+} else if (!s->data || !s->data_len) {
+error_setg(errp, "Both data and data length must be specified.");
+return;
+}
+} else if (s->file || s->force_raw)  {
+/* User is loading an image */
+if (s->data || s->data_len || s->data_be) {
+error_setg(errp, "Data can not be specified when loading an "
+   "image.");
+return;
+}
+}
+
+qemu_register_reset(generic_loader_reset, dev);
+
+if (s->cpu_num != CPU_NONE) {
+s->cpu = qemu_get_cpu(s->cpu_num);
+if (!s->cpu) {
+error_setg(errp, "Specified boot CPU#%d is nonexistent",
+   s->cpu_num);
+return;
+}
+}
+
+#ifdef TARGET_WORDS_BIGENDIAN
+big_endian = 1;
+#else
+big_endian = 0;
+#endif
+
+if (s->file) {
+if (!s->force_raw) {
+size = load_elf(s->file, NULL, NULL, , NULL, NULL,
+

Re: [Qemu-devel] [QEMU RFC PATCH v2 4/6] Migration: migrate QTAILQ

2016-05-25 Thread Paolo Bonzini

> 1 QTAILQ should only be accessed using the interfaces defined in
> queue.h. Its structs should not be directly used. So I created
> interfaces in queue.h to query about its layout. If the implementation
> is changed, these interfaces should be changed accordingly. Code using
> these interfaces should not break.

You don't need to query the layout, as long as the knowledge
remains hidden in QTAILQ_RAW_* macros.  And because QTAILQ_*_OFFSET
returns constant values, you can just put the knowledge of the offsets
directly in QTAILQ_RAW_FOREACH and QTAILQ_RAW_INSERT_TAIL.

> 2 Based on point 1, vmstate_load_state/vmstate_put_state in vmstate.c
> doesn't exactly know the structs of QTAILAQ head and entry. So pointer
> arithmetic is needed to put/get a QTAILQ. To do it, we need those 6
> parameters to be passed in. So it is not redundant if we only want to
> only use the interfaces.

No, you only need two.  The other four are internal to qemu/queue.h.
Just like QTAILQ users do not know about tqh_* and tqe_*, they need not
know about their offsets, only the fields that contain them.

> 3 At this moment, vmstate_load_state/vmstate_put_state couldn't handle a
> queue, or a list, or another recursive structure. To make it
> extensible, I think a metadata is needed. The idea is for any
> structure which needs special handling, customized metadata/put/get
> should provide enough flexibility to hack around.

I think your solution is a bit overengineered.  If the metadata can
fit in the VMStateField, you can use VMStateField.

Thanks,

Paolo

Re: [Qemu-devel] [RFC PATCH v4 1/3] Mediated device Core driver

2016-05-25 Thread Alex Williamson

On Wed, 25 May 2016 01:28:15 +0530
Kirti Wankhede  wrote:

> Design for Mediated Device Driver:
> Main purpose of this driver is to provide a common interface for mediated
> device management that can be used by differnt drivers of different
> devices.
> 
> This module provides a generic interface to create the device, add it to
> mediated bus, add device to IOMMU group and then add it to vfio group.
> 
> Below is the high Level block diagram, with Nvidia, Intel and IBM devices
> as example, since these are the devices which are going to actively use
> this module as of now.
> 
>  +---+
>  |   |
>  | +---+ |  mdev_register_driver() +--+
>  | |   | +<+ __init() |
>  | |   | | |  |
>  | |  mdev | +>+  |<-> VFIO user
>  | |  bus  | | probe()/remove()| vfio_mpci.ko |APIs
>  | |  driver   | | |  |
>  | |   | | +--+
>  | |   | |  mdev_register_driver() +--+
>  | |   | +<+ __init() |
>  | |   | | |  |
>  | |   | +>+  |<-> VFIO user
>  | +---+ | probe()/remove()| vfio_mccw.ko |APIs
>  |   | |  |
>  |  MDEV CORE| +--+
>  |   MODULE  |
>  |   mdev.ko |
>  | +---+ |  mdev_register_device() +--+
>  | |   | +<+  |
>  | |   | | |  nvidia.ko   |<-> physical
>  | |   | +>+  |device
>  | |   | |callback +--+
>  | | Physical  | |
>  | |  device   | |  mdev_register_device() +--+
>  | | interface | |<+  |
>  | |   | | |  i915.ko |<-> physical
>  | |   | +>+  |device
>  | |   | |callback +--+
>  | |   | |
>  | |   | |  mdev_register_device() +--+
>  | |   | +<+  |
>  | |   | | | ccw_device.ko|<-> physical
>  | |   | +>+  |device
>  | |   | |callback +--+
>  | +---+ |
>  +---+
> 
> Core driver provides two types of registration interfaces:
> 1. Registration interface for mediated bus driver:
> 
> /**
>   * struct mdev_driver - Mediated device's driver
>   * @name: driver name
>   * @probe: called when new device created
>   * @remove:called when device removed
>   * @match: called when new device or driver is added for this bus.
>   Return 1 if given device can be handled by given driver and
>   zero otherwise.
>   * @driver:device driver structure
>   *
>   **/
> struct mdev_driver {
>  const char *name;
>  int  (*probe)  (struct device *dev);
>  void (*remove) (struct device *dev);
>int  (*match)(struct device *dev);
>  struct device_driverdriver;
> };
> 
> int  mdev_register_driver(struct mdev_driver *drv, struct module *owner);
> void mdev_unregister_driver(struct mdev_driver *drv);
> 
> Mediated device's driver for mdev should use this interface to register
> with Core driver. With this, mediated devices driver for such devices is
> responsible to add mediated device to VFIO group.
> 
> 2. Physical device driver interface
> This interface provides vendor driver the set APIs to manage physical
> device related work in their own driver. APIs are :
> - supported_config: provide supported configuration list by the vendor
>   driver
> - create: to allocate basic resources in vendor driver for a mediated
> device.
> - destroy: to free resources in vendor driver when mediated device is
>  destroyed.
> - start: to initiate mediated device initialization process from vendor
>driver when VM boots and before QEMU starts.
> - shutdown: to teardown mediated device resources during VM teardown.
> - read : read emulation callback.
> - write: write emulation callback.
> - set_irqs: send interrupt configuration information that QEMU sets.
> - get_region_info: to provide region size and its flags for the mediated
>  device.
> - validate_map_request: to validate remap pfn request.

nit, vfio is a userspace driver interface where QEMU is simply a user
of that interface.  We should never assume QEMU is the only user.

> 
> This registration interface should be used by vendor drivers to register
> each physical device to mdev core driver.
> 
>

[Qemu-devel] [PULL 1/1] qdev: Start disentangling bus from device

2016-05-25 Thread Andreas Färber

Move bus type and related APIs to a separate file bus.c.
This is a first step in breaking up qdev.c into more manageable chunks.

Reviewed-by: Peter Maydell 
[AF: Rebased onto osdep.h]
Signed-off-by: Andreas Färber 
---
 hw/core/Makefile.objs |   1 +
 hw/core/bus.c | 251 ++
 hw/core/qdev.c| 222 
 3 files changed, 252 insertions(+), 222 deletions(-)
 create mode 100644 hw/core/bus.c

diff --git a/hw/core/Makefile.objs b/hw/core/Makefile.objs
index 70951d4..82a9ef8 100644
--- a/hw/core/Makefile.objs
+++ b/hw/core/Makefile.objs
@@ -1,5 +1,6 @@
 # core qdev-related obj files, also used by *-user:
 common-obj-y += qdev.o qdev-properties.o
+common-obj-y += bus.o
 common-obj-y += fw-path-provider.o
 # irq.o needed for qdev GPIO handling:
 common-obj-y += irq.o
diff --git a/hw/core/bus.c b/hw/core/bus.c
new file mode 100644
index 000..3e3f8ac
--- /dev/null
+++ b/hw/core/bus.c
@@ -0,0 +1,251 @@
+/*
+ *  Dynamic device configuration and creation -- buses.
+ *
+ *  Copyright (c) 2009 CodeSourcery
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see .
+ */
+
+#include "qemu/osdep.h"
+#include "qemu-common.h"
+#include "hw/qdev.h"
+#include "qapi/error.h"
+
+static void qbus_set_hotplug_handler_internal(BusState *bus, Object *handler,
+  Error **errp)
+{
+
+object_property_set_link(OBJECT(bus), OBJECT(handler),
+ QDEV_HOTPLUG_HANDLER_PROPERTY, errp);
+}
+
+void qbus_set_hotplug_handler(BusState *bus, DeviceState *handler, Error 
**errp)
+{
+qbus_set_hotplug_handler_internal(bus, OBJECT(handler), errp);
+}
+
+void qbus_set_bus_hotplug_handler(BusState *bus, Error **errp)
+{
+qbus_set_hotplug_handler_internal(bus, OBJECT(bus), errp);
+}
+
+int qbus_walk_children(BusState *bus,
+   qdev_walkerfn *pre_devfn, qbus_walkerfn *pre_busfn,
+   qdev_walkerfn *post_devfn, qbus_walkerfn *post_busfn,
+   void *opaque)
+{
+BusChild *kid;
+int err;
+
+if (pre_busfn) {
+err = pre_busfn(bus, opaque);
+if (err) {
+return err;
+}
+}
+
+QTAILQ_FOREACH(kid, >children, sibling) {
+err = qdev_walk_children(kid->child,
+ pre_devfn, pre_busfn,
+ post_devfn, post_busfn, opaque);
+if (err < 0) {
+return err;
+}
+}
+
+if (post_busfn) {
+err = post_busfn(bus, opaque);
+if (err) {
+return err;
+}
+}
+
+return 0;
+}
+
+static void qbus_realize(BusState *bus, DeviceState *parent, const char *name)
+{
+const char *typename = object_get_typename(OBJECT(bus));
+BusClass *bc;
+char *buf;
+int i, len, bus_id;
+
+bus->parent = parent;
+
+if (name) {
+bus->name = g_strdup(name);
+} else if (bus->parent && bus->parent->id) {
+/* parent device has id -> use it plus parent-bus-id for bus name */
+bus_id = bus->parent->num_child_bus;
+
+len = strlen(bus->parent->id) + 16;
+buf = g_malloc(len);
+snprintf(buf, len, "%s.%d", bus->parent->id, bus_id);
+bus->name = buf;
+} else {
+/* no id -> use lowercase bus type plus global bus-id for bus name */
+bc = BUS_GET_CLASS(bus);
+bus_id = bc->automatic_ids++;
+
+len = strlen(typename) + 16;
+buf = g_malloc(len);
+len = snprintf(buf, len, "%s.%d", typename, bus_id);
+for (i = 0; i < len; i++) {
+buf[i] = qemu_tolower(buf[i]);
+}
+bus->name = buf;
+}
+
+if (bus->parent) {
+QLIST_INSERT_HEAD(>parent->child_bus, bus, sibling);
+bus->parent->num_child_bus++;
+object_property_add_child(OBJECT(bus->parent), bus->name, OBJECT(bus), 
NULL);
+object_unref(OBJECT(bus));
+} else if (bus != sysbus_get_default()) {
+/* TODO: once all bus devices are qdevified,
+   only reset handler for main_system_bus should be registered here. */
+qemu_register_reset(qbus_reset_all_fn, bus);
+}
+}
+
+static void bus_unparent(Object *obj)
+{
+BusState *bus = BUS(obj);
+BusChild *kid;
+
+

[Qemu-devel] [PULL 0/1] QOM devices patch queue 2016-05-25

2016-05-25 Thread Andreas Färber

Hello Peter,

This is my QOM (devices) patch queue. Please pull.

I've needed to build-fix it twice by now, so if I fixed the #includes wrongly
please pick it up as patch and tweak it or apply a cleanup on top.

Thanks,
Andreas

P.S. I don't seem to have a MAINTAINERS patch to go with it yet, but part of
the discussion prompting the patch was about creating more fine-grained sections
reflecting maintenance reality, so we might want to consider adding a new one?
The intended follow-up was refactoring tree-walking and device reset for s390x.

Cc: Peter Maydell 
Cc: Eduardo Habkost 
Cc: David Hildenbrand 

The following changes since commit 287db79df8af8e31f18e262feb5e05103a09e4d4:

  Merge remote-tracking branch 'remotes/ehabkost/tags/x86-pull-request' into 
staging (2016-05-24 13:06:33 +0100)

are available in the git repository at:

  git://github.com/afaerber/qemu-cpu.git tags/qom-devices-for-peter

for you to fetch changes up to 8cce06115b6f69e4596e9f81455140759cca8c9d:

  qdev: Start disentangling bus from device (2016-05-25 23:24:35 +0200)


QOM infrastructure fixes and device conversions

* Start splitting up qdev.c


Andreas Färber (1):
  qdev: Start disentangling bus from device

 hw/core/Makefile.objs |   1 +
 hw/core/bus.c | 251 ++
 hw/core/qdev.c| 222 
 3 files changed, 252 insertions(+), 222 deletions(-)
 create mode 100644 hw/core/bus.c

Re: [Qemu-devel] [RFC PATCH 2/3] tcg: Add support for fence generation in x86 backend

2016-05-25 Thread Pranith Kumar

On Wed, May 25, 2016 at 3:25 PM, Alex Bennée  wrote:
> Should we make the emitting of the function call/TCGop conditional on
> MTTCG being enabled? If we are running in round-robin mode there is no
> need to issue any fence operations.
>

Also, we should check if SMP(> 1 processors) is enabled since fences
are not necessary on UP systems.

-- 
Pranith

Re: [Qemu-devel] [RFC PATCH 2/3] tcg: Add support for fence generation in x86 backend

2016-05-25 Thread Pranith Kumar

On Wed, May 25, 2016 at 3:43 PM, Sergey Fedorov  wrote:
>
> I think it would better not to defer native support for the operation.
> It should be relatively simple instruction. Otherwise we could wind up
> deferring this indefinitely.
>

Agreed. I will go with the native generation for now.

Thanks,
-- 
Pranith

Re: [Qemu-devel] [QEMU RFC PATCH v2 4/6] Migration: migrate QTAILQ

2016-05-25 Thread Jianjun Duan



On 05/25/2016 12:22 PM, Paolo Bonzini wrote:
>> 1 QTAILQ should only be accessed using the interfaces defined in
>> queue.h. Its structs should not be directly used. So I created
>> interfaces in queue.h to query about its layout. If the implementation
>> is changed, these interfaces should be changed accordingly. Code using
>> these interfaces should not break.
> 
> You don't need to query the layout, as long as the knowledge
> remains hidden in QTAILQ_RAW_* macros.  And because QTAILQ_*_OFFSET
> returns constant values, you can just put the knowledge of the offsets
> directly in QTAILQ_RAW_FOREACH and QTAILQ_RAW_INSERT_TAIL.
> 
>> 2 Based on point 1, vmstate_load_state/vmstate_put_state in vmstate.c
>> doesn't exactly know the structs of QTAILAQ head and entry. So pointer
>> arithmetic is needed to put/get a QTAILQ. To do it, we need those 6
>> parameters to be passed in. So it is not redundant if we only want to
>> only use the interfaces.
> 
> No, you only need two.  The other four are internal to qemu/queue.h.
> Just like QTAILQ users do not know about tqh_* and tqe_*, they need not
> know about their offsets, only the fields that contain them.
> 

In vmstate_load/put_state usually we use a VMSD to describe the type for
dump/load purpose. The metadata for the QTAILQ is for the same purpose.
Of course we can encode the information in a VMSD or VMStateField if
we don't have to change much.

The user may only care the position of head and entry. But to
implement QTAILQ_RAW_***, we need more offset information than that.
If we don't query the offsets using something like offset() and store
it in a metadata, we have to make the assumption that all the pointer
types have the same size. Since in vmstate_load/put_state we don't have
any type information about the QTAILQ instance, we cannot use offset()
in QTAILQ_RAW_*** macros. May have to stick the constants there for
first/last/next/prev in QTAILQ_RAW_***. Not sure if it will work for
all arches.
>> 3 At this moment, vmstate_load_state/vmstate_put_state couldn't handle a
>> queue, or a list, or another recursive structure. To make it
>> extensible, I think a metadata is needed. The idea is for any
>> structure which needs special handling, customized metadata/put/get
>> should provide enough flexibility to hack around.
> 
> I think your solution is a bit overengineered.  If the metadata can
> fit in the VMStateField, you can use VMStateField.
> 
> Thanks,
> 
> Paolo
> 

Thanks,
Jianjun

Re: [Qemu-devel] [RFC PATCH 2/3] tcg: Add support for fence generation in x86 backend

2016-05-25 Thread Richard Henderson


On 05/25/2016 12:25 PM, Alex Bennée wrote:

That would solves the problem of converting the various backends
piecemeal - although obviously we should move to all backends having
"native" support ASAP. However by introducing expensive substitute
functions we will slow down the translations as each front end is
expanded to translate the target barrier ops.


Obviously.  We could in fact do that all up front if desired.  It doesn't take 
long to look up the barrier instructions for each isa.




Should we make the emitting of the function call/TCGop conditional on
MTTCG being enabled? If we are running in round-robin mode there is no
need to issue any fence operations.


Probably.  But to keep the translators clean we should probably hide that 
within tcg_gen_fence().



r~

Re: [Qemu-devel] [RFC PATCH 2/3] tcg: Add support for fence generation in x86 backend

2016-05-25 Thread Sergey Fedorov

On 25/05/16 22:59, Pranith Kumar wrote:
> On Wed, May 25, 2016 at 3:43 PM, Sergey Fedorov  wrote:
>> I think it would better not to defer native support for the operation.
>> It should be relatively simple instruction. Otherwise we could wind up
>> deferring this indefinitely.
>>
> Agreed. I will go with the native generation for now.

I mean we'd better implement native support for all the supported host
architectures right away.

Kind regards,
Sergey

Re: [Qemu-devel] [RFC PATCH 2/3] tcg: Add support for fence generation in x86 backend

2016-05-25 Thread Pranith Kumar

Hi Richard,

Thank you for the helpful comments.

On Wed, May 25, 2016 at 1:35 PM, Richard Henderson  wrote:
> On 05/24/2016 10:18 AM, Pranith Kumar wrote:
>> diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h
>> index 92be341..93ea42e 100644
>> --- a/tcg/i386/tcg-target.h
>> +++ b/tcg/i386/tcg-target.h
>> @@ -100,6 +100,7 @@ extern bool have_bmi1;
>>  #define TCG_TARGET_HAS_muls2_i321
>>  #define TCG_TARGET_HAS_muluh_i320
>>  #define TCG_TARGET_HAS_mulsh_i320
>> +#define TCG_TARGET_HAS_fence1
>
>
> This has to be defined for all hosts.

OK. I will add an entry in tcg.h with default 0 and override in
individual architecture once it is implemented.

>> @@ -347,6 +347,7 @@ static inline int
>> tcg_target_const_match(tcg_target_long val, TCGType type,
>>  #define OPC_SHRX(0xf7 | P_EXT38 | P_SIMDF2)
>>  #define OPC_TESTL  (0x85)
>>  #define OPC_XCHG_ax_r32(0x90)
>> +#define OPC_MFENCE  (0xAE | P_EXT)
>
> Why define OPC_MFENCE if you're not going to use it?  Of course, it's not
> exactly a complete and useful definition, so maybe just delete OPC_MFENCE.

I want to use OPC_MFENCE instead of hard-coding the value in
tcg_out_fence(), but as you said the definition is not complete(it
currently generates only 0x0FAE). I am trying to figure out how to
generate 0x0FAEF0 using the definition.

>
> Also, for 32-bit you need to check for sse2 before outputting this.  See
> also the existing cpuid checks in tcg_target_init and the fallback smp_mb
> definition for pre-gcc-4.4.

OK, I'll check the current code and do something similar.

Thanks,
-- 
Pranith

1 2 3 4 >

1 - 100 of 350 matches

Mail list logo