date:20131213

[Qemu-devel] [PATCH v8 08/12] block: Parse backing option to reference existing BDS

2013-12-13 Thread Fam Zheng

Now it's safe to allow reference for backing_hd in the interface.

Signed-off-by: Fam Zheng f...@redhat.com
---
 block.c | 26 +++---
 1 file changed, 19 insertions(+), 7 deletions(-)

diff --git a/block.c b/block.c
index b3993d7..fba7148 100644
--- a/block.c
+++ b/block.c
@@ -1191,11 +1191,25 @@ int bdrv_open(BlockDriverState *bs, const char 
*filename, QDict *options,
 /* If there is a backing file, use it */
 if ((flags  BDRV_O_NO_BACKING) == 0) {
 QDict *backing_options;
-
-qdict_extract_subqdict(options, backing_options, backing.);
-ret = bdrv_open_backing_file(bs, backing_options, local_err);
-if (ret  0) {
-goto close_and_fail;
+const char *backing_name;
+BlockDriverState *backing_hd;
+
+backing_name = qdict_get_try_str(options, backing);
+qdict_del(options, backing);
+if (backing_name) {
+backing_hd = bdrv_find(backing_name);
+if (!backing_hd) {
+error_set(local_err, QERR_DEVICE_NOT_FOUND, backing_name);
+ret = -ENOENT;
+goto close_and_fail;
+}
+bdrv_set_backing_hd(bs, backing_hd);
+} else {
+qdict_extract_subqdict(options, backing_options, backing.);
+ret = bdrv_open_backing_file(bs, backing_options, local_err);
+if (ret  0) {
+goto close_and_fail;
+}
 }
 }
 
@@ -1682,7 +1696,6 @@ void bdrv_swap(BlockDriverState *bs_new, BlockDriverState 
*bs_old)
 assert(QLIST_EMPTY(bs_new-dirty_bitmaps));
 assert(bs_new-job == NULL);
 assert(bs_new-dev == NULL);
-assert(bdrv_op_blocker_is_empty(bs_new));
 assert(bs_new-io_limits_enabled == false);
 assert(!throttle_have_timer(bs_new-throttle_state));
 
@@ -1701,7 +1714,6 @@ void bdrv_swap(BlockDriverState *bs_new, BlockDriverState 
*bs_old)
 /* Check a few fields that should remain attached to the device */
 assert(bs_new-dev == NULL);
 assert(bs_new-job == NULL);
-assert(bdrv_op_blocker_is_empty(bs_new));
 assert(bs_new-io_limits_enabled == false);
 assert(!throttle_have_timer(bs_new-throttle_state));
 
-- 
1.8.5.1

Re: [Qemu-devel] Debugging with printf

2013-12-13 Thread Fam Zheng


On 2013年12月13日 15:47, Mar Tsan wrote:

I understand that they're not one and the same but there are
similarities. After all the Emulator is based on QEMU. How would someone
go about editing QEMU source to display messages?



There are also similarities in editing any C program to display message. 
You didn't say what information you want from QEMU and you said you know 
it's about adding printf's, then what's the problem you have there? 
Please be specific about your question. If you want to know how to get 
or build the source code, here are some instructions:


http://wiki.qemu-project.org/Documentation/GettingStartedDevelopers

Thanks,
Fam

Re: [Qemu-devel] Debugging with printf

2013-12-13 Thread Mar Tsan

My problem is how do I read the information I want to print. From where?
What do I do?


2013/12/13 Fam Zheng f...@redhat.com

 On 2013年12月13日 15:47, Mar Tsan wrote:

 I understand that they're not one and the same but there are
 similarities. After all the Emulator is based on QEMU. How would someone
 go about editing QEMU source to display messages?


 There are also similarities in editing any C program to display message.
 You didn't say what information you want from QEMU and you said you know
 it's about adding printf's, then what's the problem you have there? Please
 be specific about your question. If you want to know how to get or build
 the source code, here are some instructions:

 http://wiki.qemu-project.org/Documentation/GettingStartedDevelopers

 Thanks,
 Fam

Re: [Qemu-devel] Occasional clockjump in Win2012 after Live Migration

2013-12-13 Thread Peter Lieven

Am 13.12.2013 05:12, schrieb Vadim Rozenfeld:
 Does your VM belong to domain or workgroup? 

We had 2 vServers where this happened. One was a Domain Controller and the 
second was an independent Workgroup Server.

Do you have evidence how the DateTime Clock is driven in Windows 2012?

Peter

Re: [Qemu-devel] controlling qemu

2013-12-13 Thread Stefan Hajnoczi

On Mon, Dec 09, 2013 at 05:59:32PM +0100, lukass.va...@seznam.cz wrote:
 I want to implement library in c++, that will start and control QEMU. But I 
 dont know how to handle qemu and how to send signals to QEMU. Could you give
 me at least some hint how to do that?

Most management stacks and cloud frameworks seem to be using libvirt
nowadays.  And that's a good thing because a lot of work goes into
libvirt to use the latest and greatest QEMU options, set up networking
and firewalls, PCI passthrough, device hotplug, etc.

So start by looking at http://libvirt.org/ to see if it meets your
needs.

If you really want to write your own library, learn the QEMU
command-line and QMP monitor.  See docs/qmp/qmp-spec.txt and
scripts/qmp/qmp-shell for QMP examples.

Also consider that writing your own library means you'll have to study
the man pages and QEMU source code.  Here is a very quick overview of
QEMU to get you started:
http://vmsplice.net/~stefan/qemu-code-overview.pdf

Stefan

Re: [Qemu-devel] [RFC 3/7] iothread: add I/O thread object

2013-12-13 Thread Stefan Hajnoczi

On Thu, Dec 12, 2013 at 12:00:12PM -0600, Michael Roth wrote:
 Quoting Stefan Hajnoczi (2013-12-12 07:19:40)
  +static void *iothread_run(void *opaque)
  +{
  +IOThread *iothread = opaque;
  +
  +for (;;) {
  +/* TODO can we optimize away acquire/release to only happen when
  + * aio_notify() was called?
  + */
 
 Perhaps have the AioContext's notifier callback set a flag that can be
 checked for afterward to determine whether we should release/re-acquire?
 Calls to aio_context_acquire() could reset it upon acquistion, so we could
 maybe do something like:
 
 while(!iothread-stopping) {
 aio_context_acquire(iothread-ctx);
 while (!iothread-ctx-notified) {
 aio_poll(iothread-ctx, true);
 }
 aio_context_release(iothread-ctx);
 }

When aio_notify() kicks aio_poll() it returns false.  So I was thinking of:

while (!iothread-stopping) {
aio_context_acquire(iothread-ctx);
while (!iothread-stopping  aio_poll(iothread-ctx, true)) {
/* Progress was made, keep going */
}
aio_context_release(iothread-ctx);
}

I'll try it in the next version.  Just didn't want to get too fancy yet.

 
  +aio_context_acquire(iothread-ctx);
  +if (iothread-stopping) {
  +aio_context_release(iothread-ctx);
  +break;
  +}
  +aio_poll(iothread-ctx, true);
  +aio_context_release(iothread-ctx);
  +}
  +return NULL;
  +}
  +
  +static void iothread_instance_init(Object *obj)
  +{
  +IOThread *iothread = IOTHREAD(obj);
  +
  +iothread-stopping = false;
  +iothread-ctx = aio_context_new();
  +
  +/* This assumes .instance_init() is called from a thread with useful 
  CPU
  + * affinity for us to inherit.
  + */
 
 Is this assumption necessary/controllable? Couldn't we just expose the thread
 id via QOM or some other interface so users/management can set the affinity
 later?

This assumption holds since the monitor and command-line run in the main
thread.

The fix has traditionally been to create the thread from a BH scheduled
in the main loop.  That way it inherits the main thread's affinity.

We definitely need to expose tids via QOM/QMP.  That's something I'm
looking at QContext for.  Did you already implement an interface?

Stefan

Re: [Qemu-devel] [PATCH] target-sh4: Use new qemu_ld/st opcodes

2013-12-13 Thread Alex Bennée


aurel...@aurel32.net writes:

 Signed-off-by: Aurelien Jarno aurel...@aurel32.net
 ---
  target-sh4/translate.c |  167 
 ++--
  1 file changed, 90 insertions(+), 77 deletions(-)

 diff --git a/target-sh4/translate.c b/target-sh4/translate.c
 index 2272eb0..87f532a 100644
 --- a/target-sh4/translate.c
 +++ b/target-sh4/translate.c
 @@ -464,7 +464,7 @@ static void _decode_opc(DisasContext * ctx)
   {
   TCGv addr = tcg_temp_new();
   tcg_gen_addi_i32(addr, REG(B11_8), B3_0 * 4);
 - tcg_gen_qemu_st32(REG(B7_4), addr, ctx-memidx);
 +tcg_gen_qemu_st_i32(REG(B7_4), addr, ctx-memidx, MO_TEUL);
   tcg_temp_free(addr);
snip

There seems to be a fix of tabs and spaces in that patch.

-- 
Alex Bennée
QEMU/KVM Hacker for Linaro

Re: [Qemu-devel] [PATCH] trace: add glib 2.32+ static GMutex support

2013-12-13 Thread Stefan Hajnoczi

On Thu, Dec 12, 2013 at 07:36:18PM +0400, Michael Tokarev wrote:
 12.12.2013 18:52, Stefan Hajnoczi wrote:
  The GStaticMutex API was deprecated in glib 2.32.  We cannot switch over
  to GMutex unconditionally since we would drop support for older glib
  versions.  But the deprecated API warnings during build are annoying so
  use static GMutex when possible.
  
  Signed-off-by: Stefan Hajnoczi stefa...@redhat.com
  ---
   trace/simple.c | 45 ++---
   1 file changed, 38 insertions(+), 7 deletions(-)
  
  diff --git a/trace/simple.c b/trace/simple.c
  index 1e3f691..941f7ea 100644
  --- a/trace/simple.c
  +++ b/trace/simple.c
  @@ -39,7 +39,11 @@
* Trace records are written out by a dedicated thread.  The thread waits 
  for
* records to become available, writes them out, and then waits again.
*/
  +#if GLIB_CHECK_VERSION(2, 32, 0)
  +static GMutex trace_lock;
  +#else
   static GStaticMutex trace_lock = G_STATIC_MUTEX_INIT;
  +#endif
   
   /* g_cond_new() was deprecated in glib 2.31 but we still need to support 
  it */
   #if GLIB_CHECK_VERSION(2, 31, 0)
  @@ -86,6 +90,34 @@ typedef struct {
   static void read_from_buffer(unsigned int idx, void *dataptr, size_t size);
   static unsigned int write_to_buffer(unsigned int idx, void *dataptr, 
  size_t size);
   
  +/* Hide changes in glib mutex APIs */
  +static void lock_trace_lock(void)
  +{
  +#if GLIB_CHECK_VERSION(2, 32, 0)
  +g_mutex_lock(trace_lock);
  +#else
  +g_static_mutex_lock(trace_lock);
  +#endif
  +}
  +
  +static void unlock_trace_lock(void)
  +{
  +#if GLIB_CHECK_VERSION(2, 32, 0)
  +g_mutex_unlock(trace_lock);
  +#else
  +g_static_mutex_unlock(trace_lock);
  +#endif
  +}
  +
  +static GMutex *get_trace_lock_mutex(void)
  +{
  +#if GLIB_CHECK_VERSION(2, 32, 0)
  +return trace_lock;
  +#else
  +return g_static_mutex_get_mutex(trace_lock);
  +#endif
  +}
 
 
 I'd group mutex definition above with all the functions accessing it,
 and also make the functions inline.
 
 Well, to my taste, this is a good example where #define is better than
 an inline function.  Compare the above with:
 
 diff --git a/trace/simple.c b/trace/simple.c
 index 1e3f691..2e55ac1 100644
 --- a/trace/simple.c
 +++ b/trace/simple.c
 @@ -39,7 +39,17 @@
   * Trace records are written out by a dedicated thread.  The thread waits for
   * records to become available, writes them out, and then waits again.
   */
 +#if GLIB_CHECK_VERSION(2, 32, 0)
 +static GMutex trace_lock;
 +#define lock_trace_lock() g_mutex_lock(trace_lock)
 +#define unlock_trace_lock() g_mutex_unlock(trace_lock)
 +#define get_trace_lock_mutex() (trace_lock)
 +#else
  static GStaticMutex trace_lock = G_STATIC_MUTEX_INIT;
 +#define lock_trace_lock() g_static_mutex_lock(trace_lock)
 +#define unlock_trace_lock() g_static_mutex_unlock(trace_lock)
 +#define get_trace_lock_mutex() g_static_mutex_get_mutex(trace_lock)
 +#endif
 
  /* g_cond_new() was deprecated in glib 2.31 but we still need to support it 
 */
  #if GLIB_CHECK_VERSION(2, 31, 0)
 
 (#defines here and elsewhere has added bonus - when debugging, debugger
 does not step into the inline functions, -- such stepping is quite annoying).
 
 But somehow many developers prefer inline functions (sometimes it is better
 indeed, especially in a commonly used header files, and when the functions
 has complex or many parameters; in this case we have much simpler situation.
 
 For fun, this #ifdeffery is 5 times larger than the actual users of the
 functions being defined :)

Yes, I think you are right.  In general I avoid using macros but here it
does make things nicer.

Stefan

Re: [Qemu-devel] [Spice-devel] Vdagent not working on xen linux hvm DomUs

2013-12-13 Thread Fabio Fantoni


Il 12/12/2013 17:05, Fabio Fantoni ha scritto:

Il 12/12/2013 16:23, Wei Liu ha scritto:

On Thu, Dec 12, 2013 at 02:10:23PM +0100, Fabio Fantoni wrote:
[...]
I did some other tests, I narrowed down the commit range to the one 
between:


commit c9fea5d701f8fd33f0843728ec264d95cee3ed37 Mon, 22 Jul 2013
15:14:18 (Merge remote-tracking branch 'bonzini/iommu-for-anthony')
where there is virtio net regression with xen

and

commit962b03fcf509db25c847aa67c4eff574c240dcfe Thu, 4 Jul 2013
15:42:43 + (xen: Mark fixed platform I/O as unaligned)
where virtio net is working

I also tested:
commit 2562becfc126ed7678c662ee23b7c1fe135d8966 Mon, 15 Jul 2013
19:02:41 +
and
commit dcb117bfda5af6f6ceb7231778d36d8bce4aee93 Thu, 4 Jul 2013
15:42:46 +
but qemu crashes on xl create for another error and I haven't found
which is the commit to apply with git cherry-pick so that I can
check if the virtio net regression is present.

Can someone help me please?

I added also qemu-devel to cc.




I did a quick test with Xen's QEMU, currently at

commit 1c514a7734b7f98625a0d18d5e8ee7581f26e50c
Merge: 79c097d 35bdc13
Author: Stefano Stabellini stefano.stabell...@eu.citrix.com
Date:   Tue Jun 25 11:34:24 2013 +

 Merge remote branch 'perard/cpu-hotplug-port-v2' into 
xen-staging-master-7


from git://xenbits.xen.org/qemu-upstream-unstable.git

My guest is Squeeze with stock kernel 2.6.32.

vif=['model=virtio-net-pci,bridge=xenbr0']

No pci=nomsi in guest kernel command line.

Everything worked fine. And /proc/interrupts shows that it's indeed
using MSI for virtio PCI.

I'm kind of confused. (And in the long run of this thread I probably
didn't remember everything.)

Wei.


I tried with commit e16435c95be86244bd92c5c26579bd4298aa65a6 
(xen_disk: mark ioreq as mapped before unmapping in error case) from 
git://xenbits.xen.org/qemu-upstream-4.3-testing.git.

There are only 4 commits difference between mine and your test.
FWIK the only other difference is domUs kernel versions, and the msi 
problem is probably a regression between kernel 2.6.32 and 3.2 (the 
older domUs used in my tests was Precise with kernel 3.2).

Tomorrow I'll try also with squeeze.


I tested with squeeze and with qemu 1.3.1, virtio net works also without 
pci=nomsi, so seems kernel regression about msi using xen (that make 
virtio devices not working) between versions 2.6.32 and 3.2.

Any idea about solve it?



RIguardo invece l'altra regressione qemu tra il 4 e 22 luglio che da 
errore xen mapcache usando virtio net puoi aiutarmi?
Another question: the qemu 1.6 regression between july 4th-22nd 
commits (qemu crash on domU kernel load with xen mapcache error with 
virtio net), could you help me?


Thanks for any reply.


I did another test about this qemu regression and I'm unable to decrease 
the range of commits :(

[Qemu-devel] [PATCH v2] inet_listen_opts: add error checking

2013-12-13 Thread Gerd Hoffmann

Don't use atoi() function which doesn't detect errors, switch to
strtol and error out on failures.  Also add a range check while
being at it.

[ v2: use parse_uint_full instead of strtol ]

Signed-off-by: Gerd Hoffmann kra...@redhat.com
---
 util/qemu-sockets.c | 14 --
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/util/qemu-sockets.c b/util/qemu-sockets.c
index 6b97dc1..c3560c1 100644
--- a/util/qemu-sockets.c
+++ b/util/qemu-sockets.c
@@ -133,8 +133,18 @@ int inet_listen_opts(QemuOpts *opts, int port_offset, 
Error **errp)
 ai.ai_family = PF_INET6;
 
 /* lookup */
-if (port_offset)
-snprintf(port, sizeof(port), %d, atoi(port) + port_offset);
+if (port_offset) {
+int baseport;
+if (parse_uint_full(port, baseport, 10)  0) {
+error_setg(errp, can't convert to a number: %s, port);
+return -1;
+}
+if (baseport  0 || baseport + port_offset  65535) {
+error_setg(errp, port %s out of range, port);
+return -1;
+}
+snprintf(port, sizeof(port), %d, baseport + port_offset);
+}
 rc = getaddrinfo(strlen(addr) ? addr : NULL, port, ai, res);
 if (rc != 0) {
 error_setg(errp, address resolution failed for %s:%s: %s, addr, port,
-- 
1.8.3.1

Re: [Qemu-devel] [PATCH 5/7] hw/arm/boot: Allow easier swapping in of different loader code

2013-12-13 Thread Peter Maydell

On 13 December 2013 03:19, Peter Crosthwaite
peter.crosthwa...@xilinx.com wrote:
 Why do we need blobs at all? Cant we just fix arm/boot to directly
 setup the CPU state to the desired? Rather than complex blobs that
 execute ARM instructions just manipulate the regs directly.

We could in theory do this for the primary bootloader, but
the secondary CPU blob has to have a loop in it so we
can sit around waiting for the guest code running in the
primary to tell us it's time to go:

 +static const ARMInsnFixup smpboot[] = {
 +{ 0xe59f2028 }, /* ldr r2, gic_cpu_if */
 +{ 0xe59f0028 }, /* ldr r0, startaddr */
 +{ 0xe3a01001 }, /* mov r1, #1 */
 +{ 0xe5821000 }, /* str r1, [r2] - set GICC_CTLR.Enable */
 +{ 0xe3a010ff }, /* mov r1, #0xff */
 +{ 0xe5821004 }, /* str r1, [r2, 4] - set GIC_PMR.Priority to 0xff */
 +{ 0, FIXUP_DSB },   /* dsb */
 +{ 0xe320f003 }, /* wfi */
 +{ 0xe5901000 }, /* ldr r1, [r0] */
 +{ 0xe1110001 }, /* tst r1, r1 */
 +{ 0x0afb }, /* beq wfi */
 +{ 0xe12fff11 }, /* bx  r1 */
 +{ 0, FIXUP_GIC_CPU_IF },
 +{ 0, FIXUP_BOOTREG },
 +{ 0, FIXUP_TERMINATOR }
  };

We're also writing to devices here, and it's cleaner to do that
by running a guest code instruction than by somehow having
the boot code ferret around inside the device's implementation
pre-start, I think.

thanks
-- PMM

[Qemu-devel] [PATCH v3] inet_listen_opts: add error checking

2013-12-13 Thread Gerd Hoffmann

Don't use atoi() function which doesn't detect errors, switch to
strtol and error out on failures.  Also add a range check while
being at it.

[ v3: oops, v2 didn't build ]
[ v2: use parse_uint_full instead of strtol ]

Signed-off-by: Gerd Hoffmann kra...@redhat.com
---
 util/qemu-sockets.c | 14 --
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/util/qemu-sockets.c b/util/qemu-sockets.c
index 6b97dc1..c3560c1 100644
--- a/util/qemu-sockets.c
+++ b/util/qemu-sockets.c
@@ -133,8 +133,18 @@ int inet_listen_opts(QemuOpts *opts, int port_offset, 
Error **errp)
 ai.ai_family = PF_INET6;
 
 /* lookup */
-if (port_offset)
-snprintf(port, sizeof(port), %d, atoi(port) + port_offset);
+if (port_offset) {
+int baseport;
+if (parse_uint_full(port, baseport, 10)  0) {
+error_setg(errp, can't convert to a number: %s, port);
+return -1;
+}
+if (baseport  0 || baseport + port_offset  65535) {
+error_setg(errp, port %s out of range, port);
+return -1;
+}
+snprintf(port, sizeof(port), %d, baseport + port_offset);
+}
 rc = getaddrinfo(strlen(addr) ? addr : NULL, port, ai, res);
 if (rc != 0) {
 error_setg(errp, address resolution failed for %s:%s: %s, addr, port,
-- 
1.8.3.1

Re: [Qemu-devel] Occasional clockjump in Win2012 after Live Migration

2013-12-13 Thread Vadim Rozenfeld

On Fri, 2013-12-13 at 09:27 +0100, Peter Lieven wrote:
 Am 13.12.2013 05:12, schrieb Vadim Rozenfeld:
  Does your VM belong to domain or workgroup? 
 
 We had 2 vServers where this happened. One was a Domain Controller and the 
 second was an independent Workgroup Server.
 
 Do you have evidence how the DateTime Clock is driven in Windows 2012?

Should be CMOS periodic timer.

 
 Peter

[Qemu-devel] [PATCH v4] inet_listen_opts: add error checking

2013-12-13 Thread Gerd Hoffmann

Don't use atoi() function which doesn't detect errors, switch to
strtol and error out on failures.  Also add a range check while
being at it.

[ v4: didn't commit buildfix.  -ENOCOFFEE.  sorry for the spam ]
[ v3: oops, v2 didn't build ]
[ v2: use parse_uint_full instead of strtol ]

Signed-off-by: Gerd Hoffmann kra...@redhat.com
---
 util/qemu-sockets.c | 14 --
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/util/qemu-sockets.c b/util/qemu-sockets.c
index 6b97dc1..13590de 100644
--- a/util/qemu-sockets.c
+++ b/util/qemu-sockets.c
@@ -133,8 +133,18 @@ int inet_listen_opts(QemuOpts *opts, int port_offset, 
Error **errp)
 ai.ai_family = PF_INET6;
 
 /* lookup */
-if (port_offset)
-snprintf(port, sizeof(port), %d, atoi(port) + port_offset);
+if (port_offset) {
+unsigned long long baseport;
+if (parse_uint_full(port, baseport, 10)  0) {
+error_setg(errp, can't convert to a number: %s, port);
+return -1;
+}
+if (baseport + port_offset  65535) {
+error_setg(errp, port %s out of range, port);
+return -1;
+}
+snprintf(port, sizeof(port), %d, (int)baseport + port_offset);
+}
 rc = getaddrinfo(strlen(addr) ? addr : NULL, port, ai, res);
 if (rc != 0) {
 error_setg(errp, address resolution failed for %s:%s: %s, addr, port,
-- 
1.8.3.1

Re: [Qemu-devel] [PATCH] inet_listen_opts: add error checking

2013-12-13 Thread Gerd Hoffmann

  Hi,

 rant
 WHY is strtol() such a PAINFUL interface to use correctly?  And WHY
 can't qemu copy libvirt's lead of writing a SANE wrapper function, and
 then mandating that the rest of the code base use the sane wrapper
 instead of strtol()?
 /rant

Turns out there already is one, just /me didn't know.
So, for the record and the mailing list archives:

util/cutil.c provides parse_uint() and parse_uint_full().

The first returns a pointer to the remaining bits to parse, so you can
inspect the suffix (if any).  The second errors out in case there is
trailing garbage, simliar to the libvirt wrapper in case you pass in
NULL as end_ptr.

cheers,
  Gerd

[Qemu-devel] [PATCH v3 1/7] Add -mem-share option

2013-12-13 Thread Antonios Motakis

This option complements -mem-path. It implies -mem-prealloc. If specified,
the guest RAM is allocated as a shared memory object. If both -mem-path
and -mem-share are provided, the memory is allocated from the HugeTLBFS
supplied path, and then mmapped with MAP_SHARED.

Signed-off-by: Antonios Motakis a.mota...@virtualopensystems.com
Signed-off-by: Nikolay Nikolaev n.nikol...@virtualopensystems.com
---
 exec.c | 72 +++---
 include/exec/cpu-all.h |  1 +
 qemu-options.hx|  9 +++
 vl.c   |  5 
 4 files changed, 66 insertions(+), 21 deletions(-)

diff --git a/exec.c b/exec.c
index f4b9ef2..10720f1 100644
--- a/exec.c
+++ b/exec.c
@@ -883,6 +883,7 @@ void qemu_mutex_unlock_ramlist(void)
 #include sys/vfs.h
 
 #define HUGETLBFS_MAGIC   0x958458f6
+#define MIN_HUGE_PAGE_SIZE(2*1024*1024)
 
 static long gethugepagesize(const char *path)
 {
@@ -920,15 +921,24 @@ static void *file_ram_alloc(RAMBlock *block,
 char *c;
 void *area;
 int fd;
+int flags;
 unsigned long hpagesize;
 
-hpagesize = gethugepagesize(path);
-if (!hpagesize) {
-return NULL;
-}
+if (path) {
+hpagesize = gethugepagesize(path);
 
-if (memory  hpagesize) {
-return NULL;
+if (!hpagesize) {
+return NULL ;
+}
+
+if (memory  hpagesize) {
+return NULL ;
+}
+} else {
+if (memory  MIN_HUGE_PAGE_SIZE) {
+return NULL ;
+}
+hpagesize = MIN_HUGE_PAGE_SIZE;
 }
 
 if (kvm_enabled()  !kvm_has_sync_mmu()) {
@@ -943,20 +953,37 @@ static void *file_ram_alloc(RAMBlock *block,
 *c = '_';
 }
 
-filename = g_strdup_printf(%s/qemu_back_mem.%s.XX, path,
-   sanitized_name);
-g_free(sanitized_name);
+if (path) {
+
+filename = g_strdup_printf(%s/qemu_back_mem.%s.XX, path,
+sanitized_name);
+g_free(sanitized_name);
 
-fd = mkstemp(filename);
-if (fd  0) {
-perror(unable to create backing store for hugepages);
+fd = mkstemp(filename);
+if (fd  0) {
+perror(unable to create backing store for huge);
+g_free(filename);
+return NULL ;
+}
+unlink(filename);
 g_free(filename);
+memory = (memory + hpagesize - 1)  ~(hpagesize - 1);
+} else if (mem_share) {
+filename = g_strdup_printf(qemu_back_mem.%s.XX, sanitized_name);
+g_free(sanitized_name);
+fd = shm_open(filename, O_CREAT | O_RDWR | O_EXCL,
+  S_IRWXU | S_IRWXG | S_IRWXO);
+if (fd  0) {
+perror(unable to create backing store for shared memory);
+g_free(filename);
+return NULL ;
+}
+shm_unlink(filename);
+g_free(filename);
+} else {
+fprintf(stderr, -mem-path or -mem-share required\n);
 return NULL;
 }
-unlink(filename);
-g_free(filename);
-
-memory = (memory+hpagesize-1)  ~(hpagesize-1);
 
 /*
  * ftruncate is not supported by hugetlbfs in older
@@ -967,7 +994,8 @@ static void *file_ram_alloc(RAMBlock *block,
 if (ftruncate(fd, memory))
 perror(ftruncate);
 
-area = mmap(0, memory, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
+flags = mem_share ? MAP_SHARED : MAP_PRIVATE;
+area = mmap(0, memory, PROT_READ | PROT_WRITE, flags, fd, 0);
 if (area == MAP_FAILED) {
 perror(file_ram_alloc: can't mmap RAM pages);
 close(fd);
@@ -1150,13 +1178,14 @@ ram_addr_t qemu_ram_alloc_from_ptr(ram_addr_t size, 
void *host,
 new_block-host = host;
 new_block-flags |= RAM_PREALLOC_MASK;
 } else if (xen_enabled()) {
-if (mem_path) {
-fprintf(stderr, -mem-path not supported with Xen\n);
+if (mem_path || mem_share) {
+fprintf(stderr,
+-mem-path and -mem-share not supported with Xen\n);
 exit(1);
 }
 xen_ram_alloc(new_block-offset, size, mr);
 } else {
-if (mem_path) {
+if (mem_path || mem_share) {
 if (phys_mem_alloc != qemu_anon_ram_alloc) {
 /*
  * file_ram_alloc() needs to allocate just like
@@ -1164,7 +1193,8 @@ ram_addr_t qemu_ram_alloc_from_ptr(ram_addr_t size, void 
*host,
  * a hook there.
  */
 fprintf(stderr,
--mem-path not supported with this accelerator\n);
+-mem-path and -mem-share 
+not supported with this accelerator\n);
 exit(1);
 }
 new_block-host = file_ram_alloc(new_block, size, mem_path);
diff --git a/include/exec/cpu-all.h b/include/exec/cpu-all.h
index b6998f0..05ad0ee 100644
--- a/include/exec/cpu-all.h
+++ b/include/exec/cpu-all.h
@@ -469,6

[Qemu-devel] [PATCH v3 0/7] host and vhost-net support for userspace based backends

2013-12-13 Thread Antonios Motakis

In this patch series we would like to introduce our approach for putting a
virtio-net backend in an external userspace process. Our eventual target is to
run the network backend in the Snabbswitch ethernet switch, while receiving
traffic from a guest inside QEMU/KVM which runs an unmodified virtio-net
implementation.

For this, we are working into extending vhost to allow equivalent functionality
for userspace. Vhost already passes control of the data plane of virtio-net to
the host kernel; we want to realize a similar model, but for userspace.

In this patch series the concept of a vhost-backend is introduced.

We define two vhost backend types - vhost-kernel and vhost-user. The former is
the interface to the current kernel module implementation. Its control plane is
ioctl based. The data plane is the kernel directly accessing the QEMU allocated,
guest memory.

In the new vhost-user backend, the control plane is based on communication
between QEMU and another userspace process using a unix domain socket. This
allows to implement a virtio backend for a guest running in QEMU, inside the
other userspace process.

We introduce a new memory related flag: -mem-share. When used the ram is
created as a shared memory object. When combined with -mem-path, it will reuse
a HugeTLBFS file instead of /dev/shm. This flag also implies -mem-prealloc.

The data path is realized by directly accessing the vrings and the buffer data
off the guest's memory.

The current user of vhost-user is only vhost-net. We add new netdev backend
that is intended to initialize vhost-net with vhost-user backend.

Example usage:

qemu -m 1024 -mem-share -mem-path /hugetlbfs \
 -netdev type=vhost-user,id=net0,file=/path/to/sock \
 -device virtio-net-pci,netdev=net0

Changes from v2:
 - Reconnect when the backend disappears

Changes from v1:
 - Implementation of vhost-user netdev backend
 - Code improvements

Antonios Motakis (7):
  Add -mem-share option
  Decouple vhost from kernel interface
  Add vhost-user skeleton
  Add domain socket communication for vhost-user backend
  Add vhost-user calls implementation
  Add new vhost-user netdev backend
  Add vhost-user reconnection

 exec.c|  72 +---
 hmp-commands.hx   |   4 +-
 hw/net/vhost_net.c| 144 +++
 hw/net/virtio-net.c   |  42 ++---
 hw/scsi/vhost-scsi.c  |  13 +-
 hw/virtio/Makefile.objs   |   2 +-
 hw/virtio/vhost-backend.c | 372 ++
 hw/virtio/vhost.c |  46 ++---
 include/exec/cpu-all.h|   1 +
 include/hw/virtio/vhost-backend.h |  40 
 include/hw/virtio/vhost.h |   4 +-
 include/net/vhost-user.h  |  17 ++
 include/net/vhost_net.h   |  15 +-
 net/Makefile.objs |   2 +-
 net/clients.h |   3 +
 net/hub.c |   1 +
 net/net.c |   2 +
 net/tap.c |  16 +-
 net/vhost-user.c  | 167 +
 qapi-schema.json  |  18 +-
 qemu-options.hx   |  12 ++
 vl.c  |   5 +
 22 files changed, 873 insertions(+), 125 deletions(-)
 create mode 100644 hw/virtio/vhost-backend.c
 create mode 100644 include/hw/virtio/vhost-backend.h
 create mode 100644 include/net/vhost-user.h
 create mode 100644 net/vhost-user.c

-- 
1.8.3.2

[Qemu-devel] [PATCH v3 5/7] Add vhost-user calls implementation

2013-12-13 Thread Antonios Motakis

Each ioctl request of vhost-kernel has a vhost-user message equivalent,
which is sent it over the control socket.

The general approach is to copy the data from the supplied argument
pointer to a designated field in the message. If a file descriptor is
to be passed it should be placed also in the fds array for inclusion in
the sendmsd control header.

VHOST_SET_MEM_TABLE ignores the supplied vhost_memory structure and scans
the global ram_list for ram blocks wiht a valid fd field set. This would
be set when the -mem-path and -mem-prealloc command line options are used.

Signed-off-by: Antonios Motakis a.mota...@virtualopensystems.com
Signed-off-by: Nikolay Nikolaev n.nikol...@virtualopensystems.com
---
 hw/virtio/vhost-backend.c | 99 ++-
 1 file changed, 98 insertions(+), 1 deletion(-)

diff --git a/hw/virtio/vhost-backend.c b/hw/virtio/vhost-backend.c
index 96d3bf0..c280936 100644
--- a/hw/virtio/vhost-backend.c
+++ b/hw/virtio/vhost-backend.c
@@ -66,6 +66,38 @@ typedef struct VhostUserMsg {
 };
 } VhostUserMsg;
 
+static unsigned long int ioctl_to_vhost_user_request[VHOST_USER_MAX] = {
+-1, /* VHOST_USER_NONE */
+VHOST_GET_FEATURES, /* VHOST_USER_GET_FEATURES */
+VHOST_SET_FEATURES, /* VHOST_USER_SET_FEATURES */
+VHOST_SET_OWNER, /* VHOST_USER_SET_OWNER */
+VHOST_RESET_OWNER, /* VHOST_USER_RESET_OWNER */
+VHOST_SET_MEM_TABLE, /* VHOST_USER_SET_MEM_TABLE */
+VHOST_SET_LOG_BASE, /* VHOST_USER_SET_LOG_BASE */
+VHOST_SET_LOG_FD, /* VHOST_USER_SET_LOG_FD */
+VHOST_SET_VRING_NUM, /* VHOST_USER_SET_VRING_NUM */
+VHOST_SET_VRING_ADDR, /* VHOST_USER_SET_VRING_ADDR */
+VHOST_SET_VRING_BASE, /* VHOST_USER_SET_VRING_BASE */
+VHOST_GET_VRING_BASE, /* VHOST_USER_GET_VRING_BASE */
+VHOST_SET_VRING_KICK, /* VHOST_USER_SET_VRING_KICK */
+VHOST_SET_VRING_CALL, /* VHOST_USER_SET_VRING_CALL */
+VHOST_SET_VRING_ERR, /* VHOST_USER_SET_VRING_ERR */
+VHOST_NET_SET_BACKEND /* VHOST_USER_NET_SET_BACKEND */
+};
+
+static VhostUserRequest vhost_user_request_translate(unsigned long int request)
+{
+VhostUserRequest idx;
+
+for (idx = 0; idx  VHOST_USER_MAX; idx++) {
+if (ioctl_to_vhost_user_request[idx] == request) {
+break;
+}
+}
+
+return (idx == VHOST_USER_MAX) ? VHOST_USER_NONE : idx;
+}
+
 static int vhost_user_recv(int fd, VhostUserMsg *msg)
 {
 ssize_t r = read(fd, msg, sizeof(VhostUserMsg));
@@ -129,13 +161,74 @@ static int vhost_user_call(struct vhost_dev *dev, 
unsigned long int request,
 {
 int fd = dev-control;
 VhostUserMsg msg;
+RAMBlock *block = 0;
 int result = 0, need_reply = 0;
 int fds[VHOST_MEMORY_MAX_NREGIONS];
 size_t fd_num = 0;
 
 assert(dev-vhost_ops-backend_type == VHOST_BACKEND_TYPE_USER);
 
+msg.request = vhost_user_request_translate(request);
+msg.flags = 0;
+
 switch (request) {
+case VHOST_GET_FEATURES:
+case VHOST_GET_VRING_BASE:
+need_reply = 1;
+break;
+
+case VHOST_SET_FEATURES:
+case VHOST_SET_LOG_BASE:
+msg.u64 = *((uint64_t *) arg);
+break;
+
+case VHOST_SET_OWNER:
+case VHOST_RESET_OWNER:
+break;
+
+case VHOST_SET_MEM_TABLE:
+QTAILQ_FOREACH(block, ram_list.blocks, next)
+{
+if (block-fd  0) {
+msg.memory.regions[fd_num].userspace_addr = (__u64) 
block-host;
+msg.memory.regions[fd_num].memory_size = block-length;
+msg.memory.regions[fd_num].guest_phys_addr = block-offset;
+fds[fd_num++] = block-fd;
+}
+}
+
+msg.memory.nregions = fd_num;
+
+if (!fd_num) {
+fprintf(stderr, Failed initializing vhost-user memory map\n
+consider -mem-path and -mem-prealloc options\n);
+return -1;
+}
+break;
+
+case VHOST_SET_LOG_FD:
+msg.fd = *((int *) arg);
+break;
+
+case VHOST_SET_VRING_NUM:
+case VHOST_SET_VRING_BASE:
+memcpy(msg.state, arg, sizeof(struct vhost_vring_state));
+break;
+
+case VHOST_SET_VRING_ADDR:
+memcpy(msg.addr, arg, sizeof(struct vhost_vring_addr));
+break;
+
+case VHOST_SET_VRING_KICK:
+case VHOST_SET_VRING_CALL:
+case VHOST_SET_VRING_ERR:
+case VHOST_NET_SET_BACKEND:
+memcpy(msg.file, arg, sizeof(struct vhost_vring_file));
+if (msg.file.fd  0) {
+fds[0] = msg.file.fd;
+fd_num = 1;
+}
+break;
 default:
 fprintf(stderr, vhost-user trying to send unhandled ioctl\n);
 return -1;
@@ -148,7 +241,11 @@ static int vhost_user_call(struct vhost_dev *dev, unsigned 
long int request,
 result = vhost_user_recv(fd, msg);
 if (!result) {
 switch (request) {
-default:
+case VHOST_GET_FEATURES:
+*((uint64_t *) arg) = msg.u64;
+

[Qemu-devel] [PATCH v3 4/7] Add domain socket communication for vhost-user backend

2013-12-13 Thread Antonios Motakis

Add structures for passing vhost-user messages over a unix domain socket.
This is the equivalent of the existing vhost-kernel ioctls.

Connect to the named unix domain socket. The system call sendmsg
is used for communication. To be able to pass file descriptors
between processes - we use SCM_RIGHTS type in the message control header.

Signed-off-by: Antonios Motakis a.mota...@virtualopensystems.com
Signed-off-by: Nikolay Nikolaev n.nikol...@virtualopensystems.com
---
 hw/virtio/vhost-backend.c | 167 --
 1 file changed, 161 insertions(+), 6 deletions(-)

diff --git a/hw/virtio/vhost-backend.c b/hw/virtio/vhost-backend.c
index 847809f..96d3bf0 100644
--- a/hw/virtio/vhost-backend.c
+++ b/hw/virtio/vhost-backend.c
@@ -14,30 +14,185 @@
 #include fcntl.h
 #include unistd.h
 #include sys/ioctl.h
+#include sys/socket.h
+#include sys/un.h
+#include linux/vhost.h
+
+#define VHOST_MEMORY_MAX_NREGIONS8
+
+typedef enum VhostUserRequest {
+VHOST_USER_NONE = 0,
+VHOST_USER_GET_FEATURES = 1,
+VHOST_USER_SET_FEATURES = 2,
+VHOST_USER_SET_OWNER = 3,
+VHOST_USER_RESET_OWNER = 4,
+VHOST_USER_SET_MEM_TABLE = 5,
+VHOST_USER_SET_LOG_BASE = 6,
+VHOST_USER_SET_LOG_FD = 7,
+VHOST_USER_SET_VRING_NUM = 8,
+VHOST_USER_SET_VRING_ADDR = 9,
+VHOST_USER_SET_VRING_BASE = 10,
+VHOST_USER_GET_VRING_BASE = 11,
+VHOST_USER_SET_VRING_KICK = 12,
+VHOST_USER_SET_VRING_CALL = 13,
+VHOST_USER_SET_VRING_ERR = 14,
+VHOST_USER_NET_SET_BACKEND = 15,
+VHOST_USER_MAX
+} VhostUserRequest;
+
+typedef struct VhostUserMemoryRegion {
+__u64 guest_phys_addr;
+__u64 memory_size;
+__u64 userspace_addr;
+} VhostUserMemoryRegion;
+
+typedef struct VhostUserMemory {
+__u32 nregions;
+VhostUserMemoryRegion regions[VHOST_MEMORY_MAX_NREGIONS];
+} VhostUserMemory;
+
+typedef struct VhostUserMsg {
+VhostUserRequest request;
+
+int flags;
+union {
+uint64_tu64;
+int fd;
+struct vhost_vring_state state;
+struct vhost_vring_addr addr;
+struct vhost_vring_file file;
+
+VhostUserMemory memory;
+};
+} VhostUserMsg;
+
+static int vhost_user_recv(int fd, VhostUserMsg *msg)
+{
+ssize_t r = read(fd, msg, sizeof(VhostUserMsg));
+
+return (r == sizeof(VhostUserMsg)) ? 0 : -1;
+}
+
+static int vhost_user_send_fds(int fd, const VhostUserMsg *msg, int *fds,
+size_t fd_num)
+{
+int r;
+
+struct msghdr msgh;
+struct iovec iov[1];
+
+size_t fd_size = fd_num * sizeof(int);
+char control[CMSG_SPACE(fd_size)];
+struct cmsghdr *cmsg;
+
+memset(msgh, 0, sizeof(msgh));
+memset(control, 0, sizeof(control));
+
+/* set the payload */
+iov[0].iov_base = (void *) msg;
+iov[0].iov_len = sizeof(VhostUserMsg);
+
+msgh.msg_iov = iov;
+msgh.msg_iovlen = 1;
+
+if (fd_num) {
+msgh.msg_control = control;
+msgh.msg_controllen = sizeof(control);
+
+cmsg = CMSG_FIRSTHDR(msgh);
+
+cmsg-cmsg_len = CMSG_LEN(fd_size);
+cmsg-cmsg_level = SOL_SOCKET;
+cmsg-cmsg_type = SCM_RIGHTS;
+memcpy(CMSG_DATA(cmsg), fds, fd_size);
+} else {
+msgh.msg_control = 0;
+msgh.msg_controllen = 0;
+}
+
+do {
+r = sendmsg(fd, msgh, 0);
+} while (r  0  errno == EINTR);
+
+if (r  0) {
+fprintf(stderr, Failed to send msg(%d), reason: %s\n,
+msg-request, strerror(errno));
+} else {
+r = 0;
+}
+
+return r;
+}
 
 static int vhost_user_call(struct vhost_dev *dev, unsigned long int request,
 void *arg)
 {
+int fd = dev-control;
+VhostUserMsg msg;
+int result = 0, need_reply = 0;
+int fds[VHOST_MEMORY_MAX_NREGIONS];
+size_t fd_num = 0;
+
 assert(dev-vhost_ops-backend_type == VHOST_BACKEND_TYPE_USER);
-fprintf(stderr, vhost_user_call not implemented\n);
 
-return -1;
+switch (request) {
+default:
+fprintf(stderr, vhost-user trying to send unhandled ioctl\n);
+return -1;
+break;
+}
+
+result = vhost_user_send_fds(fd, msg, fds, fd_num);
+
+if (!result  need_reply) {
+result = vhost_user_recv(fd, msg);
+if (!result) {
+switch (request) {
+default:
+break;
+}
+}
+}
+
+return result;
 }
 
 static int vhost_user_init(struct vhost_dev *dev, const char *devpath)
 {
+int fd = -1;
+struct sockaddr_un un;
+size_t len;
+
 assert(dev-vhost_ops-backend_type == VHOST_BACKEND_TYPE_USER);
-fprintf(stderr, vhost_user_init not implemented\n);
 
-return -1;
+/* Create the socket */
+fd = socket(AF_UNIX, SOCK_STREAM, 0);
+if (fd == -1) {
+perror(socket);
+return -1;
+}
+
+un.sun_family = AF_UNIX;
+strcpy(un.sun_path, devpath);
+
+len = sizeof(un.sun_family) + strlen(devpath);
+
+/* Connect */
+if

[Qemu-devel] [PATCH v3 7/7] Add vhost-user reconnection

2013-12-13 Thread Antonios Motakis

At runtime vhost-user netdev will detect if the vhost backend is up or down.
Upon disconnection it will set link_down accordingly and notify virtio-net.

Signed-off-by: Antonios Motakis a.mota...@virtualopensystems.com
Signed-off-by: Nikolay Nikolaev n.nikol...@virtualopensystems.com
---
 hw/net/vhost_net.c| 16 +++
 hw/virtio/vhost-backend.c | 21 +++
 include/hw/virtio/vhost-backend.h |  2 ++
 include/net/vhost_net.h   |  1 +
 net/vhost-user.c  | 56 +++
 5 files changed, 96 insertions(+)

diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index e42f4d6..56c218e 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -304,6 +304,17 @@ void vhost_net_virtqueue_mask(VHostNetState *net, 
VirtIODevice *dev,
 vhost_virtqueue_mask(net-dev, dev, idx, mask);
 }
 
+int vhost_net_link_status(VHostNetState *net)
+{
+int r = 0;
+
+if (net-dev.vhost_ops-vhost_status) {
+r = net-dev.vhost_ops-vhost_status(net-dev);
+}
+
+return r;
+}
+
 VHostNetState *get_vhost_net(NetClientState *nc)
 {
 VHostNetState *vhost_net = 0;
@@ -372,6 +383,11 @@ void vhost_net_virtqueue_mask(VHostNetState *net, 
VirtIODevice *dev,
 {
 }
 
+int vhost_net_link_status(VHostNetState *net)
+{
+return 0;
+}
+
 VHostNetState *get_vhost_net(NetClientState *nc)
 {
 return 0;
diff --git a/hw/virtio/vhost-backend.c b/hw/virtio/vhost-backend.c
index c280936..2c0b4f0 100644
--- a/hw/virtio/vhost-backend.c
+++ b/hw/virtio/vhost-backend.c
@@ -168,6 +168,10 @@ static int vhost_user_call(struct vhost_dev *dev, unsigned 
long int request,
 
 assert(dev-vhost_ops-backend_type == VHOST_BACKEND_TYPE_USER);
 
+if (fd  0) {
+return 0;
+}
+
 msg.request = vhost_user_request_translate(request);
 msg.flags = 0;
 
@@ -251,9 +255,24 @@ static int vhost_user_call(struct vhost_dev *dev, unsigned 
long int request,
 }
 }
 
+/* mark the backend non operational */
+if (result  0) {
+dev-control = -1;
+return 0;
+}
+
 return result;
 }
 
+static int vhost_user_status(struct vhost_dev *dev)
+{
+uint64_t features = 0;
+
+vhost_user_call(dev, VHOST_GET_FEATURES, features);
+
+return (dev-control = 0);
+}
+
 static int vhost_user_init(struct vhost_dev *dev, const char *devpath)
 {
 int fd = -1;
@@ -295,6 +314,7 @@ static int vhost_user_cleanup(struct vhost_dev *dev)
 static const VhostOps user_ops = {
 .backend_type = VHOST_BACKEND_TYPE_USER,
 .vhost_call = vhost_user_call,
+.vhost_status = vhost_user_status,
 .vhost_backend_init = vhost_user_init,
 .vhost_backend_cleanup = vhost_user_cleanup
 };
@@ -327,6 +347,7 @@ static int vhost_kernel_cleanup(struct vhost_dev *dev)
 static const VhostOps kernel_ops = {
 .backend_type = VHOST_BACKEND_TYPE_KERNEL,
 .vhost_call = vhost_kernel_call,
+.vhost_status = 0,
 .vhost_backend_init = vhost_kernel_init,
 .vhost_backend_cleanup = vhost_kernel_cleanup
 };
diff --git a/include/hw/virtio/vhost-backend.h 
b/include/hw/virtio/vhost-backend.h
index ef87ffa..f2b4a6c 100644
--- a/include/hw/virtio/vhost-backend.h
+++ b/include/hw/virtio/vhost-backend.h
@@ -22,12 +22,14 @@ struct vhost_dev;
 
 typedef int (*vhost_call)(struct vhost_dev *dev, unsigned long int request,
  void *arg);
+typedef int (*vhost_status)(struct vhost_dev *dev);
 typedef int (*vhost_backend_init)(struct vhost_dev *dev, const char *devpath);
 typedef int (*vhost_backend_cleanup)(struct vhost_dev *dev);
 
 typedef struct VhostOps {
 VhostBackendType backend_type;
 vhost_call vhost_call;
+vhost_status vhost_status;
 vhost_backend_init vhost_backend_init;
 vhost_backend_cleanup vhost_backend_cleanup;
 } VhostOps;
diff --git a/include/net/vhost_net.h b/include/net/vhost_net.h
index abd3d0b..6390907 100644
--- a/include/net/vhost_net.h
+++ b/include/net/vhost_net.h
@@ -31,5 +31,6 @@ void vhost_net_ack_features(VHostNetState *net, unsigned 
features);
 bool vhost_net_virtqueue_pending(VHostNetState *net, int n);
 void vhost_net_virtqueue_mask(VHostNetState *net, VirtIODevice *dev,
   int idx, bool mask);
+int vhost_net_link_status(VHostNetState *net);
 VHostNetState *get_vhost_net(NetClientState *nc);
 #endif
diff --git a/net/vhost-user.c b/net/vhost-user.c
index 6fd5afc..56f7dd4 100644
--- a/net/vhost-user.c
+++ b/net/vhost-user.c
@@ -12,6 +12,7 @@
 #include net/vhost_net.h
 #include net/vhost-user.h
 #include qemu/error-report.h
+#include qemu/timer.h
 
 typedef struct VhostUserState {
 NetClientState nc;
@@ -19,6 +20,9 @@ typedef struct VhostUserState {
 char *devpath;
 } VhostUserState;
 
+static QEMUTimer *vhost_user_timer;
+#define VHOST_USER_TIMEOUT  (1*1000)
+
 VHostNetState *vhost_user_get_vhost_net(NetClientState *nc)
 {
 VhostUserState *s = DO_UPCAST(VhostUserState, nc, nc);
@@ -31,6 +35,11

[Qemu-devel] [PATCH v3 3/7] Add vhost-user skeleton

2013-12-13 Thread Antonios Motakis

Add empty vhost_call, init and cleanup for the vhost-user backend.

Signed-off-by: Antonios Motakis a.mota...@virtualopensystems.com
Signed-off-by: Nikolay Nikolaev n.nikol...@virtualopensystems.com
---
 hw/net/vhost_net.c| 57 ++-
 hw/virtio/vhost-backend.c | 35 
 include/hw/virtio/vhost-backend.h |  3 ++-
 include/net/vhost_net.h   | 13 -
 net/tap.c | 16 ++-
 5 files changed, 91 insertions(+), 33 deletions(-)

diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index 4aaf0b4..3614e6c 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -91,43 +91,51 @@ static int vhost_net_get_fd(NetClientState *backend)
 }
 }
 
-struct vhost_net *vhost_net_init(NetClientState *backend, int devfd,
- bool force)
+struct vhost_net *vhost_net_init(VhostNetOptions *options)
 {
-int r;
+int r = -1;
 struct vhost_net *net = g_malloc(sizeof *net);
-if (!backend) {
-fprintf(stderr, vhost-net requires backend to be setup\n);
+
+if (!options-net_backend) {
+fprintf(stderr, vhost-net requires net backend to be setup\n);
 goto fail;
 }
-r = vhost_net_get_fd(backend);
-if (r  0) {
-goto fail;
+
+if (options-backend_type == VHOST_BACKEND_TYPE_KERNEL) {
+r = vhost_net_get_fd(options-net_backend);
+if (r  0) {
+goto fail;
+}
+
+net-dev.backend_features =
+tap_has_vnet_hdr(options-net_backend) ? 0 :
+(1  VHOST_NET_F_VIRTIO_NET_HDR);
 }
-net-nc = backend;
-net-dev.backend_features = tap_has_vnet_hdr(backend) ? 0 :
-(1  VHOST_NET_F_VIRTIO_NET_HDR);
+
+net-nc = options-net_backend;
 net-backend = r;
 
 net-dev.nvqs = 2;
 net-dev.vqs = net-vqs;
 
-r = vhost_dev_init(net-dev, devfd, /dev/vhost-net,
-   VHOST_BACKEND_TYPE_KERNEL, force);
+r = vhost_dev_init(net-dev, options-devfd, options-devpath,
+   options-backend_type, options-force);
 if (r  0) {
 goto fail;
 }
-if (!tap_has_vnet_hdr_len(backend,
-  sizeof(struct virtio_net_hdr_mrg_rxbuf))) {
-net-dev.features = ~(1  VIRTIO_NET_F_MRG_RXBUF);
-}
-if (~net-dev.features  net-dev.backend_features) {
-fprintf(stderr, vhost lacks feature mask % PRIu64  for backend\n,
-(uint64_t)(~net-dev.features  net-dev.backend_features));
-vhost_dev_cleanup(net-dev);
-goto fail;
+if (options-backend_type == VHOST_BACKEND_TYPE_KERNEL) {
+if (!tap_has_vnet_hdr_len(options-net_backend,
+sizeof(struct virtio_net_hdr_mrg_rxbuf))) {
+net-dev.features = ~(1  VIRTIO_NET_F_MRG_RXBUF);
+}
+if (~net-dev.features  net-dev.backend_features) {
+fprintf(stderr, vhost lacks feature mask % PRIu64
+for backend\n,
+   (uint64_t)(~net-dev.features  net-dev.backend_features));
+vhost_dev_cleanup(net-dev);
+goto fail;
+}
 }
-
 /* Set sane init value. Override when guest acks. */
 vhost_net_ack_features(net, 0);
 return net;
@@ -286,8 +294,7 @@ void vhost_net_virtqueue_mask(VHostNetState *net, 
VirtIODevice *dev,
 vhost_virtqueue_mask(net-dev, dev, idx, mask);
 }
 #else
-struct vhost_net *vhost_net_init(NetClientState *backend, int devfd,
- bool force)
+struct vhost_net *vhost_net_init(VhostNetOptions *options)
 {
 error_report(vhost-net support is not compiled in);
 return NULL;
diff --git a/hw/virtio/vhost-backend.c b/hw/virtio/vhost-backend.c
index 2a9a1ec..847809f 100644
--- a/hw/virtio/vhost-backend.c
+++ b/hw/virtio/vhost-backend.c
@@ -15,6 +15,38 @@
 #include unistd.h
 #include sys/ioctl.h
 
+static int vhost_user_call(struct vhost_dev *dev, unsigned long int request,
+void *arg)
+{
+assert(dev-vhost_ops-backend_type == VHOST_BACKEND_TYPE_USER);
+fprintf(stderr, vhost_user_call not implemented\n);
+
+return -1;
+}
+
+static int vhost_user_init(struct vhost_dev *dev, const char *devpath)
+{
+assert(dev-vhost_ops-backend_type == VHOST_BACKEND_TYPE_USER);
+fprintf(stderr, vhost_user_init not implemented\n);
+
+return -1;
+}
+
+static int vhost_user_cleanup(struct vhost_dev *dev)
+{
+assert(dev-vhost_ops-backend_type == VHOST_BACKEND_TYPE_USER);
+fprintf(stderr, vhost_user_cleanup not implemented\n);
+
+return -1;
+}
+
+static const VhostOps user_ops = {
+.backend_type = VHOST_BACKEND_TYPE_USER,
+.vhost_call = vhost_user_call,
+.vhost_backend_init = vhost_user_init,
+.vhost_backend_cleanup = vhost_user_cleanup
+};
+
 static int vhost_kernel_call(struct vhost_dev *dev, unsigned long int request,
  void *arg)
 {

[Qemu-devel] [PATCH v3 6/7] Add new vhost-user netdev backend

2013-12-13 Thread Antonios Motakis

Add a new QEMU netdev backend that is intended to invoke vhost_net
with the vhost-user backend. Also decouple virtio-net from the tap
backend.

Signed-off-by: Antonios Motakis a.mota...@virtualopensystems.com
Signed-off-by: Nikolay Nikolaev n.nikol...@virtualopensystems.com
---
 hmp-commands.hx  |   4 +-
 hw/net/vhost_net.c   |  66 ++--
 hw/net/virtio-net.c  |  42 --
 hw/virtio/vhost.c|   1 -
 include/net/vhost-user.h |  17 
 include/net/vhost_net.h  |   1 +
 net/Makefile.objs|   2 +-
 net/clients.h|   3 ++
 net/hub.c|   1 +
 net/net.c|   2 +
 net/vhost-user.c | 111 +++
 qapi-schema.json |  18 +++-
 qemu-options.hx  |   3 ++
 13 files changed, 227 insertions(+), 44 deletions(-)
 create mode 100644 include/net/vhost-user.h
 create mode 100644 net/vhost-user.c

diff --git a/hmp-commands.hx b/hmp-commands.hx
index ebe8e78..d5a3774 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -1190,7 +1190,7 @@ ETEXI
 {
 .name   = host_net_add,
 .args_type  = device:s,opts:s?,
-.params = tap|user|socket|vde|netmap|dump [options],
+.params = tap|user|socket|vde|netmap|vhost-user|dump [options],
 .help   = add host VLAN client,
 .mhandler.cmd = net_host_device_add,
 },
@@ -1218,7 +1218,7 @@ ETEXI
 {
 .name   = netdev_add,
 .args_type  = netdev:O,
-.params = 
[user|tap|socket|hubport|netmap],id=str[,prop=value][,...],
+.params = 
[user|tap|socket|hubport|netmap|vhost-user],id=str[,prop=value][,...],
 .help   = add host network device,
 .mhandler.cmd = hmp_netdev_add,
 },
diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index 3614e6c..e42f4d6 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -15,6 +15,7 @@
 
 #include net/net.h
 #include net/tap.h
+#include net/vhost-user.h
 
 #include hw/virtio/virtio-net.h
 #include net/vhost_net.h
@@ -174,15 +175,20 @@ static int vhost_net_start_one(struct vhost_net *net,
 goto fail_start;
 }
 
-net-nc-info-poll(net-nc, false);
-qemu_set_fd_handler(net-backend, NULL, NULL, NULL);
-file.fd = net-backend;
-for (file.index = 0; file.index  net-dev.nvqs; ++file.index) {
-const VhostOps *vhost_ops = net-dev.vhost_ops;
-r = vhost_ops-vhost_call(net-dev, VHOST_NET_SET_BACKEND, file);
-if (r  0) {
-r = -errno;
-goto fail;
+if (net-nc-info-poll) {
+net-nc-info-poll(net-nc, false);
+}
+
+if (net-nc-info-type == NET_CLIENT_OPTIONS_KIND_TAP) {
+qemu_set_fd_handler(net-backend, NULL, NULL, NULL);
+file.fd = net-backend;
+for (file.index = 0; file.index  net-dev.nvqs; ++file.index) {
+const VhostOps *vhost_ops = net-dev.vhost_ops;
+r = vhost_ops-vhost_call(net-dev, VHOST_NET_SET_BACKEND, file);
+if (r  0) {
+r = -errno;
+goto fail;
+}
 }
 }
 return 0;
@@ -193,7 +199,9 @@ fail:
 int r = vhost_ops-vhost_call(net-dev, VHOST_NET_SET_BACKEND, file);
 assert(r = 0);
 }
-net-nc-info-poll(net-nc, true);
+if (net-nc-info-poll) {
+net-nc-info-poll(net-nc, true);
+}
 vhost_dev_stop(net-dev, dev);
 fail_start:
 vhost_dev_disable_notifiers(net-dev, dev);
@@ -215,7 +223,9 @@ static void vhost_net_stop_one(struct vhost_net *net,
 int r = vhost_ops-vhost_call(net-dev, VHOST_NET_SET_BACKEND, file);
 assert(r = 0);
 }
-net-nc-info-poll(net-nc, true);
+if (net-nc-info-poll) {
+net-nc-info-poll(net-nc, true);
+}
 vhost_dev_stop(net-dev, dev);
 vhost_dev_disable_notifiers(net-dev, dev);
 }
@@ -235,7 +245,7 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
 }
 
 for (i = 0; i  total_queues; i++) {
-r = vhost_net_start_one(tap_get_vhost_net(ncs[i].peer), dev, i * 2);
+r = vhost_net_start_one(get_vhost_net(ncs[i].peer), dev, i * 2);
 
 if (r  0) {
 goto err;
@@ -252,7 +262,7 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
 
 err:
 while (--i = 0) {
-vhost_net_stop_one(tap_get_vhost_net(ncs[i].peer), dev);
+vhost_net_stop_one(get_vhost_net(ncs[i].peer), dev);
 }
 return r;
 }
@@ -273,7 +283,7 @@ void vhost_net_stop(VirtIODevice *dev, NetClientState *ncs,
 assert(r = 0);
 
 for (i = 0; i  total_queues; i++) {
-vhost_net_stop_one(tap_get_vhost_net(ncs[i].peer), dev);
+vhost_net_stop_one(get_vhost_net(ncs[i].peer), dev);
 }
 }
 
@@ -293,6 +303,29 @@ void vhost_net_virtqueue_mask(VHostNetState *net, 
VirtIODevice *dev,
 {
 vhost_virtqueue_mask(net-dev, dev, idx, mask);
 }
+
+VHostNetState *get_vhost_net(NetClientState *nc)
+{
+VHostNetState

[Qemu-devel] [PATCH v3 2/7] Decouple vhost from kernel interface

2013-12-13 Thread Antonios Motakis

We introduce the concept of vhost-backend, which can be either vhost-kernel
or vhost-user. The existing vhost interface to the kernel is abstracted
behind the vhost-kernel backend.

We replace all direct ioctls to the kernel with a vhost_call to the backend.
vhost dev-control is referenced only in the vhost-backend (ioctl, open, close).

Signed-off-by: Antonios Motakis a.mota...@virtualopensystems.com
Signed-off-by: Nikolay Nikolaev n.nikol...@virtualopensystems.com
---
 hw/net/vhost_net.c| 13 +---
 hw/scsi/vhost-scsi.c  | 13 +---
 hw/virtio/Makefile.objs   |  2 +-
 hw/virtio/vhost-backend.c | 64 +++
 hw/virtio/vhost.c | 47 ++--
 include/hw/virtio/vhost-backend.h | 37 ++
 include/hw/virtio/vhost.h |  4 ++-
 7 files changed, 147 insertions(+), 33 deletions(-)
 create mode 100644 hw/virtio/vhost-backend.c
 create mode 100644 include/hw/virtio/vhost-backend.h

diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index 006576d..4aaf0b4 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -27,7 +27,6 @@
 #include sys/socket.h
 #include linux/kvm.h
 #include fcntl.h
-#include sys/ioctl.h
 #include linux/virtio_ring.h
 #include netpacket/packet.h
 #include net/ethernet.h
@@ -113,7 +112,8 @@ struct vhost_net *vhost_net_init(NetClientState *backend, 
int devfd,
 net-dev.nvqs = 2;
 net-dev.vqs = net-vqs;
 
-r = vhost_dev_init(net-dev, devfd, /dev/vhost-net, force);
+r = vhost_dev_init(net-dev, devfd, /dev/vhost-net,
+   VHOST_BACKEND_TYPE_KERNEL, force);
 if (r  0) {
 goto fail;
 }
@@ -170,7 +170,8 @@ static int vhost_net_start_one(struct vhost_net *net,
 qemu_set_fd_handler(net-backend, NULL, NULL, NULL);
 file.fd = net-backend;
 for (file.index = 0; file.index  net-dev.nvqs; ++file.index) {
-r = ioctl(net-dev.control, VHOST_NET_SET_BACKEND, file);
+const VhostOps *vhost_ops = net-dev.vhost_ops;
+r = vhost_ops-vhost_call(net-dev, VHOST_NET_SET_BACKEND, file);
 if (r  0) {
 r = -errno;
 goto fail;
@@ -180,7 +181,8 @@ static int vhost_net_start_one(struct vhost_net *net,
 fail:
 file.fd = -1;
 while (file.index--  0) {
-int r = ioctl(net-dev.control, VHOST_NET_SET_BACKEND, file);
+const VhostOps *vhost_ops = net-dev.vhost_ops;
+int r = vhost_ops-vhost_call(net-dev, VHOST_NET_SET_BACKEND, file);
 assert(r = 0);
 }
 net-nc-info-poll(net-nc, true);
@@ -201,7 +203,8 @@ static void vhost_net_stop_one(struct vhost_net *net,
 }
 
 for (file.index = 0; file.index  net-dev.nvqs; ++file.index) {
-int r = ioctl(net-dev.control, VHOST_NET_SET_BACKEND, file);
+const VhostOps *vhost_ops = net-dev.vhost_ops;
+int r = vhost_ops-vhost_call(net-dev, VHOST_NET_SET_BACKEND, file);
 assert(r = 0);
 }
 net-nc-info-poll(net-nc, true);
diff --git a/hw/scsi/vhost-scsi.c b/hw/scsi/vhost-scsi.c
index 9e770fb..b074b65 100644
--- a/hw/scsi/vhost-scsi.c
+++ b/hw/scsi/vhost-scsi.c
@@ -27,12 +27,13 @@
 static int vhost_scsi_set_endpoint(VHostSCSI *s)
 {
 VirtIOSCSICommon *vs = VIRTIO_SCSI_COMMON(s);
+const VhostOps *vhost_ops = s-dev.vhost_ops;
 struct vhost_scsi_target backend;
 int ret;
 
 memset(backend, 0, sizeof(backend));
 pstrcpy(backend.vhost_wwpn, sizeof(backend.vhost_wwpn), vs-conf.wwpn);
-ret = ioctl(s-dev.control, VHOST_SCSI_SET_ENDPOINT, backend);
+ret = vhost_ops-vhost_call(s-dev, VHOST_SCSI_SET_ENDPOINT, backend);
 if (ret  0) {
 return -errno;
 }
@@ -43,10 +44,11 @@ static void vhost_scsi_clear_endpoint(VHostSCSI *s)
 {
 VirtIOSCSICommon *vs = VIRTIO_SCSI_COMMON(s);
 struct vhost_scsi_target backend;
+const VhostOps *vhost_ops = s-dev.vhost_ops;
 
 memset(backend, 0, sizeof(backend));
 pstrcpy(backend.vhost_wwpn, sizeof(backend.vhost_wwpn), vs-conf.wwpn);
-ioctl(s-dev.control, VHOST_SCSI_CLEAR_ENDPOINT, backend);
+vhost_ops-vhost_call(s-dev, VHOST_SCSI_CLEAR_ENDPOINT, backend);
 }
 
 static int vhost_scsi_start(VHostSCSI *s)
@@ -55,13 +57,15 @@ static int vhost_scsi_start(VHostSCSI *s)
 VirtIODevice *vdev = VIRTIO_DEVICE(s);
 BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(vdev)));
 VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus);
+const VhostOps *vhost_ops = s-dev.vhost_ops;
 
 if (!k-set_guest_notifiers) {
 error_report(binding does not support guest notifiers);
 return -ENOSYS;
 }
 
-ret = ioctl(s-dev.control, VHOST_SCSI_GET_ABI_VERSION, abi_version);
+ret = vhost_ops-vhost_call(s-dev,
+VHOST_SCSI_GET_ABI_VERSION, abi_version);
 if (ret  0) {
 return -errno;
 }
@@ -225,7 +229,8 @@ static int vhost_scsi_init(VirtIODevice *vdev)
 s-dev.vqs = g_new(struct vhost_virtqueue, s-dev.nvqs);

[Qemu-devel] [PATCH] inet_listen_opts: add error checking

2013-12-13 Thread Gerd Hoffmann

Don't use atoi() function which doesn't detect errors, switch to
strtol and error out on failures.  Also add a range check while
being at it.

[ v2: use parse_uint_full instead of strtol ]

Signed-off-by: Gerd Hoffmann kra...@redhat.com
---
 util/qemu-sockets.c | 16 ++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/util/qemu-sockets.c b/util/qemu-sockets.c
index 6b97dc1..5636510 100644
--- a/util/qemu-sockets.c
+++ b/util/qemu-sockets.c
@@ -133,8 +133,20 @@ int inet_listen_opts(QemuOpts *opts, int port_offset, 
Error **errp)
 ai.ai_family = PF_INET6;
 
 /* lookup */
-if (port_offset)
-snprintf(port, sizeof(port), %d, atoi(port) + port_offset);
+if (port_offset) {
+int baseport;
+errno = 0;
+baseport = strtol(port, NULL, 10);
+if (errno != 0) {
+error_setg(errp, can't convert to a number: %s, port);
+return -1;
+}
+if (baseport  0 || baseport + port_offset  65535) {
+error_setg(errp, port %s out of range, port);
+return -1;
+}
+snprintf(port, sizeof(port), %d, baseport + port_offset);
+}
 rc = getaddrinfo(strlen(addr) ? addr : NULL, port, ai, res);
 if (rc != 0) {
 error_setg(errp, address resolution failed for %s:%s: %s, addr, port,
-- 
1.8.3.1

[Qemu-devel] [PATCH v2] trace: add glib 2.32+ static GMutex support

2013-12-13 Thread Stefan Hajnoczi

The GStaticMutex API was deprecated in glib 2.32.  We cannot switch over
to GMutex unconditionally since we would drop support for older glib
versions.  But the deprecated API warnings during build are annoying so
use static GMutex when possible.

Signed-off-by: Stefan Hajnoczi stefa...@redhat.com
---
 trace/simple.c | 23 ---
 1 file changed, 16 insertions(+), 7 deletions(-)

diff --git a/trace/simple.c b/trace/simple.c
index 1e3f691..40ce448 100644
--- a/trace/simple.c
+++ b/trace/simple.c
@@ -39,7 +39,17 @@
  * Trace records are written out by a dedicated thread.  The thread waits for
  * records to become available, writes them out, and then waits again.
  */
+#if GLIB_CHECK_VERSION(2, 32, 0)
+static GMutex trace_lock;
+#define lock_trace_lock() g_mutex_lock(trace_lock)
+#define unlock_trace_lock() g_mutex_unlock(trace_lock)
+#define get_trace_lock_mutex() (trace_lock)
+#else
 static GStaticMutex trace_lock = G_STATIC_MUTEX_INIT;
+#define lock_trace_lock() g_static_mutex_lock(trace_lock)
+#define unlock_trace_lock() g_static_mutex_unlock(trace_lock)
+#define get_trace_lock_mutex() g_static_mutex_get_mutex(trace_lock)
+#endif
 
 /* g_cond_new() was deprecated in glib 2.31 but we still need to support it */
 #if GLIB_CHECK_VERSION(2, 31, 0)
@@ -139,27 +149,26 @@ static bool get_trace_record(unsigned int idx, 
TraceRecord **recordptr)
  */
 static void flush_trace_file(bool wait)
 {
-g_static_mutex_lock(trace_lock);
+lock_trace_lock();
 trace_available = true;
 g_cond_signal(trace_available_cond);
 
 if (wait) {
-g_cond_wait(trace_empty_cond, g_static_mutex_get_mutex(trace_lock));
+g_cond_wait(trace_empty_cond, get_trace_lock_mutex());
 }
 
-g_static_mutex_unlock(trace_lock);
+unlock_trace_lock();
 }
 
 static void wait_for_trace_records_available(void)
 {
-g_static_mutex_lock(trace_lock);
+lock_trace_lock();
 while (!(trace_available  trace_writeout_enabled)) {
 g_cond_signal(trace_empty_cond);
-g_cond_wait(trace_available_cond,
-g_static_mutex_get_mutex(trace_lock));
+g_cond_wait(trace_available_cond, get_trace_lock_mutex());
 }
 trace_available = false;
-g_static_mutex_unlock(trace_lock);
+unlock_trace_lock();
 }
 
 static gpointer writeout_thread(gpointer opaque)
-- 
1.8.4.2

Re: [Qemu-devel] Occasional clockjump in Win2012 after Live Migration

2013-12-13 Thread Peter Lieven

Am 13.12.2013 11:10, schrieb Vadim Rozenfeld:
 On Fri, 2013-12-13 at 09:27 +0100, Peter Lieven wrote:
 Am 13.12.2013 05:12, schrieb Vadim Rozenfeld:
 Does your VM belong to domain or workgroup? 
 We had 2 vServers where this happened. One was a Domain Controller and the 
 second was an independent Workgroup Server.

 Do you have evidence how the DateTime Clock is driven in Windows 2012?
 Should be CMOS periodic timer.
you mean the RTC?
Ok, so if there is a huge clock jump this one must have got messed up?

Peter

[Qemu-devel] [PATCH 1/5] net: extend NetClientInfo for offloading manipulations

2013-12-13 Thread Vincenzo Maffione

A set of new callbacks has been added to the NetClientInfo struct in
order to abstract the operations done by virtio-net and vmxnet3
frontends to manipulate TAP offloadings.

The net.h API has been extended with functions that access those
abstract operations, providing frontends with a way to manipulate
backend offloadings.

Signed-off-by: Vincenzo Maffione v.maffi...@gmail.com
---
 include/net/net.h | 19 +++
 net/net.c | 55 +++
 2 files changed, 74 insertions(+)

diff --git a/include/net/net.h b/include/net/net.h
index 11e1468..f5b5bae 100644
--- a/include/net/net.h
+++ b/include/net/net.h
@@ -50,6 +50,12 @@ typedef void (NetCleanup) (NetClientState *);
 typedef void (LinkStatusChanged)(NetClientState *);
 typedef void (NetClientDestructor)(NetClientState *);
 typedef RxFilterInfo *(QueryRxFilter)(NetClientState *);
+typedef bool (HasUfo)(NetClientState *);
+typedef int (HasVnetHdr)(NetClientState *);
+typedef int (HasVnetHdrLen)(NetClientState *, int);
+typedef void (UsingVnetHdr)(NetClientState *, bool);
+typedef void (SetOffload)(NetClientState *, int, int, int, int, int);
+typedef void (SetVnetHdrLen)(NetClientState *, int);
 
 typedef struct NetClientInfo {
 NetClientOptionsKind type;
@@ -62,6 +68,12 @@ typedef struct NetClientInfo {
 LinkStatusChanged *link_status_changed;
 QueryRxFilter *query_rx_filter;
 NetPoll *poll;
+HasUfo *has_ufo;
+HasVnetHdr *has_vnet_hdr;
+HasVnetHdrLen *has_vnet_hdr_len;
+UsingVnetHdr *using_vnet_hdr;
+SetOffload *set_offload;
+SetVnetHdrLen *set_vnet_hdr_len;
 } NetClientInfo;
 
 struct NetClientState {
@@ -120,6 +132,13 @@ ssize_t qemu_send_packet_async(NetClientState *nc, const 
uint8_t *buf,
 void qemu_purge_queued_packets(NetClientState *nc);
 void qemu_flush_queued_packets(NetClientState *nc);
 void qemu_format_nic_info_str(NetClientState *nc, uint8_t macaddr[6]);
+bool qemu_peer_has_ufo(NetClientState *nc);
+int qemu_peer_has_vnet_hdr(NetClientState *nc);
+int qemu_peer_has_vnet_hdr_len(NetClientState *nc, int len);
+void qemu_peer_using_vnet_hdr(NetClientState *nc, bool enable);
+void qemu_peer_set_offload(NetClientState *nc, int csum, int tso4, int tso6,
+   int ecn, int ufo);
+void qemu_peer_set_vnet_hdr_len(NetClientState *nc, int len);
 void qemu_macaddr_default_if_unset(MACAddr *macaddr);
 int qemu_show_nic_models(const char *arg, const char *const *models);
 void qemu_check_nic_model(NICInfo *nd, const char *model);
diff --git a/net/net.c b/net/net.c
index 9db88cc..96f05d9 100644
--- a/net/net.c
+++ b/net/net.c
@@ -381,6 +381,61 @@ void qemu_foreach_nic(qemu_nic_foreach func, void *opaque)
 }
 }
 
+bool qemu_peer_has_ufo(NetClientState *nc)
+{
+if (!nc-peer || !nc-peer-info-has_ufo) {
+return false;
+}
+
+return nc-peer-info-has_ufo(nc-peer);
+}
+
+int qemu_peer_has_vnet_hdr(NetClientState *nc)
+{
+if (!nc-peer || !nc-peer-info-has_vnet_hdr) {
+return false;
+}
+
+return nc-peer-info-has_vnet_hdr(nc-peer);
+}
+
+int qemu_peer_has_vnet_hdr_len(NetClientState *nc, int len)
+{
+if (!nc-peer || !nc-peer-info-has_vnet_hdr_len) {
+return false;
+}
+
+return nc-peer-info-has_vnet_hdr_len(nc-peer, len);
+}
+
+void qemu_peer_using_vnet_hdr(NetClientState *nc, bool enable)
+{
+if (!nc-peer || !nc-peer-info-using_vnet_hdr) {
+return;
+}
+
+nc-peer-info-using_vnet_hdr(nc-peer, enable);
+}
+
+void qemu_peer_set_offload(NetClientState *nc, int csum, int tso4, int tso6,
+  int ecn, int ufo)
+{
+if (!nc-peer || !nc-peer-info-set_offload) {
+return;
+}
+
+nc-peer-info-set_offload(nc-peer, csum, tso4, tso6, ecn, ufo);
+}
+
+void qemu_peer_set_vnet_hdr_len(NetClientState *nc, int len)
+{
+if (!nc-peer || !nc-peer-info-set_vnet_hdr_len) {
+return;
+}
+
+nc-peer-info-set_vnet_hdr_len(nc-peer, len);
+}
+
 int qemu_can_send_packet(NetClientState *sender)
 {
 if (!sender-peer) {
-- 
1.8.5.1

[Qemu-devel] [PATCH 2/5] net: TAP uses NetClientInfo offloading callbacks

2013-12-13 Thread Vincenzo Maffione

The TAP NetClientInfo structure is inizialized with the TAP-specific
callbacks that manipulates backend offloading features.

Signed-off-by: Vincenzo Maffione v.maffi...@gmail.com
---
 net/tap.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/net/tap.c b/net/tap.c
index 39c1cda..175fcb3 100644
--- a/net/tap.c
+++ b/net/tap.c
@@ -314,6 +314,12 @@ static NetClientInfo net_tap_info = {
 .receive_iov = tap_receive_iov,
 .poll = tap_poll,
 .cleanup = tap_cleanup,
+.has_ufo = tap_has_ufo,
+.has_vnet_hdr = tap_has_vnet_hdr,
+.has_vnet_hdr_len = tap_has_vnet_hdr_len,
+.using_vnet_hdr = tap_using_vnet_hdr,
+.set_offload = tap_set_offload,
+.set_vnet_hdr_len = tap_set_vnet_hdr_len,
 };
 
 static TAPState *net_tap_fd_init(NetClientState *peer,
-- 
1.8.5.1

[Qemu-devel] [PATCH 5/5] net: virtio-net and vmxnet3 can use netmap offloadings

2013-12-13 Thread Vincenzo Maffione

With this patch we remove the existing checks in the virtio-net
and vmxnet3 frontends that prevents them from using
offloadings with backends different from TAP (e.g. netmap).

Signed-off-by: Vincenzo Maffione v.maffi...@gmail.com
---
 hw/net/virtio-net.c | 4 
 hw/net/vmxnet3.c| 4 +---
 2 files changed, 1 insertion(+), 7 deletions(-)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index c8ee2fa..8a94539 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -325,10 +325,6 @@ static void peer_test_vnet_hdr(VirtIONet *n)
 return;
 }
 
-if (nc-peer-info-type != NET_CLIENT_OPTIONS_KIND_TAP) {
-return;
-}
-
 n-has_vnet_hdr = qemu_peer_has_vnet_hdr(nc);
 }
 
diff --git a/hw/net/vmxnet3.c b/hw/net/vmxnet3.c
index f00c649..0524684 100644
--- a/hw/net/vmxnet3.c
+++ b/hw/net/vmxnet3.c
@@ -1885,9 +1885,7 @@ static bool vmxnet3_peer_has_vnet_hdr(VMXNET3State *s)
 {
 NetClientState *nc = qemu_get_queue(s-nic);
 
-if ((NULL != nc-peer)  
-(nc-peer-info-type == NET_CLIENT_OPTIONS_KIND_TAP)   
-qemu_peer_has_vnet_hdr(nc)) {
+if (qemu_peer_has_vnet_hdr(nc)) {
 return true;
 }
 
-- 
1.8.5.1

[Qemu-devel] [PATCH 0/5] Add netmap backend offloadings support

2013-12-13 Thread Vincenzo Maffione

The purpose of this patch series is to add offloadings support
(TSO/UFO/CSUM) to the netmap network backend, and make it possible
for the paravirtual network frontends (virtio-net and vmxnet3) to
use it.
In order to achieve this, these patches extend the existing
net.h interface to add abstract operations through which a network
frontend can manipulate backend offloading features, instead of
directly calling TAP-specific functions.

Guest-to-guest performance before this patches for virtio-net + netmap:

TCP_STREAM  5.0 Gbps
TCP_RR  12.7 Gbps
UDP_STREAM (64-bytes)   790 Kpps

Guest-to-guest performance after this patches for virtio-net + netmap:

TCP_STREAM  21.4 Gbps
TCP_RR  12.7 Gbps
UDP_STREAM (64-bytes)   790 Kpps

Experiment details:
- Processor: Intel i7-3770K CPU @ 3.50GHz (8 cores)
- Memory @ 1333 MHz
- Host O.S.: Archlinux with Linux 3.11
- Guest O.S.: Archlinux with Linux 3.11

- QEMU command line:
qemu-system-x86_64 archdisk.qcow -snapshot -enable-kvm -device 
virtio-net-pci,ioeventfd=on,mac=00:AA:BB:CC:DD:01,netdev=mynet -netdev 
netmap,ifname=vale0:01,id=mynet -smp 2 -vga std -m 3G


Vincenzo Maffione (5):
  net: extend NetClientInfo for offloading manipulations
  net: TAP uses NetClientInfo offloading callbacks
  net: virtio-net and vmxnet3 use offloading API
  net: add offloadings support to netmap backend
  net: virtio-net and vmxnet3 can use netmap offloadings

 hw/net/virtio-net.c | 16 +-
 hw/net/vmxnet3.c| 12 +-
 include/net/net.h   | 19 
 net/net.c   | 55 +
 net/netmap.c| 64 -
 net/tap.c   |  6 +
 6 files changed, 154 insertions(+), 18 deletions(-)

-- 
1.8.5.1

[Qemu-devel] [PATCH 3/5] net: virtio-net and vmxnet3 use offloading API

2013-12-13 Thread Vincenzo Maffione

With this patch, virtio-net and vmxnet3 frontends make
use of the qemu_peer_* API for backend offloadings manipulations,
instead of calling TAP-specific functions directly.

Signed-off-by: Vincenzo Maffione v.maffi...@gmail.com
---
 hw/net/virtio-net.c | 12 ++--
 hw/net/vmxnet3.c| 14 +++---
 2 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index d312b9c..c8ee2fa 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -329,7 +329,7 @@ static void peer_test_vnet_hdr(VirtIONet *n)
 return;
 }
 
-n-has_vnet_hdr = tap_has_vnet_hdr(nc-peer);
+n-has_vnet_hdr = qemu_peer_has_vnet_hdr(nc);
 }
 
 static int peer_has_vnet_hdr(VirtIONet *n)
@@ -342,7 +342,7 @@ static int peer_has_ufo(VirtIONet *n)
 if (!peer_has_vnet_hdr(n))
 return 0;
 
-n-has_ufo = tap_has_ufo(qemu_get_queue(n-nic)-peer);
+n-has_ufo = qemu_peer_has_ufo(qemu_get_queue(n-nic));
 
 return n-has_ufo;
 }
@@ -361,8 +361,8 @@ static void virtio_net_set_mrg_rx_bufs(VirtIONet *n, int 
mergeable_rx_bufs)
 nc = qemu_get_subqueue(n-nic, i);
 
 if (peer_has_vnet_hdr(n) 
-tap_has_vnet_hdr_len(nc-peer, n-guest_hdr_len)) {
-tap_set_vnet_hdr_len(nc-peer, n-guest_hdr_len);
+qemu_peer_has_vnet_hdr_len(nc, n-guest_hdr_len)) {
+qemu_peer_set_vnet_hdr_len(nc, n-guest_hdr_len);
 n-host_hdr_len = n-guest_hdr_len;
 }
 }
@@ -463,7 +463,7 @@ static uint32_t virtio_net_bad_features(VirtIODevice *vdev)
 
 static void virtio_net_apply_guest_offloads(VirtIONet *n)
 {
-tap_set_offload(qemu_get_subqueue(n-nic, 0)-peer,
+qemu_peer_set_offload(qemu_get_subqueue(n-nic, 0),
 !!(n-curr_guest_offloads  (1ULL  VIRTIO_NET_F_GUEST_CSUM)),
 !!(n-curr_guest_offloads  (1ULL  VIRTIO_NET_F_GUEST_TSO4)),
 !!(n-curr_guest_offloads  (1ULL  VIRTIO_NET_F_GUEST_TSO6)),
@@ -1546,7 +1546,7 @@ static int virtio_net_device_init(VirtIODevice *vdev)
 peer_test_vnet_hdr(n);
 if (peer_has_vnet_hdr(n)) {
 for (i = 0; i  n-max_queues; i++) {
-tap_using_vnet_hdr(qemu_get_subqueue(n-nic, i)-peer, true);
+qemu_peer_using_vnet_hdr(qemu_get_subqueue(n-nic, i), true);
 }
 n-host_hdr_len = sizeof(struct virtio_net_hdr);
 } else {
diff --git a/hw/net/vmxnet3.c b/hw/net/vmxnet3.c
index 19687aa..f00c649 100644
--- a/hw/net/vmxnet3.c
+++ b/hw/net/vmxnet3.c
@@ -1290,7 +1290,7 @@ static void vmxnet3_update_features(VMXNET3State *s)
   s-lro_supported, rxcso_supported,
   s-rx_vlan_stripping);
 if (s-peer_has_vhdr) {
-tap_set_offload(qemu_get_queue(s-nic)-peer,
+qemu_peer_set_offload(qemu_get_queue(s-nic),
 rxcso_supported,
 s-lro_supported,
 s-lro_supported,
@@ -1883,11 +1883,11 @@ static NetClientInfo net_vmxnet3_info = {
 
 static bool vmxnet3_peer_has_vnet_hdr(VMXNET3State *s)
 {
-NetClientState *peer = qemu_get_queue(s-nic)-peer;
+NetClientState *nc = qemu_get_queue(s-nic);
 
-if ((NULL != peer)  
-(peer-info-type == NET_CLIENT_OPTIONS_KIND_TAP)   
-tap_has_vnet_hdr(peer)) {
+if ((NULL != nc-peer)  
+(nc-peer-info-type == NET_CLIENT_OPTIONS_KIND_TAP)   
+qemu_peer_has_vnet_hdr(nc)) {
 return true;
 }
 
@@ -1935,10 +1935,10 @@ static void vmxnet3_net_init(VMXNET3State *s)
 s-lro_supported = false;
 
 if (s-peer_has_vhdr) {
-tap_set_vnet_hdr_len(qemu_get_queue(s-nic)-peer,
+qemu_peer_set_vnet_hdr_len(qemu_get_queue(s-nic),
 sizeof(struct virtio_net_hdr));
 
-tap_using_vnet_hdr(qemu_get_queue(s-nic)-peer, 1);
+qemu_peer_using_vnet_hdr(qemu_get_queue(s-nic), 1);
 }
 
 qemu_format_nic_info_str(qemu_get_queue(s-nic), s-conf.macaddr.a);
-- 
1.8.5.1

[Qemu-devel] [PATCH 4/5] net: add offloadings support to netmap backend

2013-12-13 Thread Vincenzo Maffione

Whit this patch, the netmap backend supports TSO/UFO/CSUM
offloadings, and accepts the virtio-net header, similarly to what
happens with TAP. The offloading callbacks in the NetClientInfo
interface have been implemented.

Signed-off-by: Vincenzo Maffione v.maffi...@gmail.com
---
 net/netmap.c | 64 +++-
 1 file changed, 63 insertions(+), 1 deletion(-)

diff --git a/net/netmap.c b/net/netmap.c
index 0ccc497..f02a574 100644
--- a/net/netmap.c
+++ b/net/netmap.c
@@ -54,6 +54,7 @@ typedef struct NetmapState {
 boolread_poll;
 boolwrite_poll;
 struct ioveciov[IOV_MAX];
+int vnet_hdr_len;  /* Current virtio-net header length. */
 } NetmapState;
 
 #define D(format, ...)  \
@@ -274,7 +275,7 @@ static ssize_t netmap_receive_iov(NetClientState *nc,
 return iov_size(iov, iovcnt);
 }
 
-i = ring-cur;
+last = i = ring-cur;
 avail = ring-avail;
 
 if (avail  iovcnt) {
@@ -394,6 +395,60 @@ static void netmap_cleanup(NetClientState *nc)
 s-me.fd = -1;
 }
 
+/* Offloading manipulation support callbacks.
+ *
+ * Currently we are simply able to pass the virtio-net header
+ * throughout the VALE switch, without modifying it or interpreting it.
+ * For this reason we are always able to support TSO/UFO/CSUM and
+ * different header lengths as long as two virtio-net-header-capable
+ * VMs are attached to the bridge.
+ */
+static bool netmap_has_ufo(NetClientState *nc)
+{
+return true;
+}
+
+static int netmap_has_vnet_hdr(NetClientState *nc)
+{
+return true;
+}
+
+static int netmap_has_vnet_hdr_len(NetClientState *nc, int len)
+{
+return len = 0  len = NETMAP_BDG_MAX_OFFSET;
+}
+
+static void netmap_using_vnet_hdr(NetClientState *nc, bool enable)
+{
+}
+
+static void netmap_set_offload(NetClientState *nc, int csum, int tso4, int 
tso6,
+   int ecn, int ufo)
+{
+}
+
+static void netmap_set_vnet_hdr_len(NetClientState *nc, int len)
+{
+NetmapState *s = DO_UPCAST(NetmapState, nc, nc);
+int err;
+struct nmreq req;
+
+/* Issue a NETMAP_BDG_OFFSET command to change the netmap adapter
+   offset of 'me-ifname'. */
+memset(req, 0, sizeof(req));
+pstrcpy(req.nr_name, sizeof(req.nr_name), s-me.ifname);
+req.nr_version = NETMAP_API;
+req.nr_cmd = NETMAP_BDG_OFFSET;
+req.nr_arg1 = len;
+err = ioctl(s-me.fd, NIOCREGIF, req);
+if (err) {
+error_report(Unable to execute NETMAP_BDG_OFFSET on %s: %s,
+ s-me.ifname, strerror(errno));
+} else {
+/* Keep track of the current length, may be usefule in the future. */
+s-vnet_hdr_len = len;
+}
+}
 
 /* NetClientInfo methods */
 static NetClientInfo net_netmap_info = {
@@ -403,6 +458,12 @@ static NetClientInfo net_netmap_info = {
 .receive_iov = netmap_receive_iov,
 .poll = netmap_poll,
 .cleanup = netmap_cleanup,
+.has_ufo = netmap_has_ufo,
+.has_vnet_hdr = netmap_has_vnet_hdr,
+.has_vnet_hdr_len = netmap_has_vnet_hdr_len,
+.using_vnet_hdr = netmap_using_vnet_hdr,
+.set_offload = netmap_set_offload,
+.set_vnet_hdr_len = netmap_set_vnet_hdr_len,
 };
 
 /* The exported init function
@@ -428,6 +489,7 @@ int net_init_netmap(const NetClientOptions *opts,
 nc = qemu_new_net_client(net_netmap_info, peer, netmap, name);
 s = DO_UPCAST(NetmapState, nc, nc);
 s-me = me;
+s-vnet_hdr_len = 0;
 netmap_read_poll(s, true); /* Initially only poll for reads. */
 
 return 0;
-- 
1.8.5.1

Re: [Qemu-devel] [PATCH 4/5] monitor: add object-add (QMP) and object_add (HMP) command

2013-12-13 Thread Paolo Bonzini

Il 13/12/2013 03:55, Wenchao Xia ha scritto:
 于 2013/12/11 2:15, Paolo Bonzini 写道:
 -BEGIN PGP SIGNED MESSAGE-
 Hash: SHA1

 Il 10/12/2013 19:00, Eric Blake ha scritto:
 +  'data': {'qom-type': 'str', 'id': 'str', '*props': 'dict'},
 +  'gen': 'no' }

 This feels VERY open-coded.  No where else in qapi-schema do we
 have 'dict' as a type

 Yes, in fact the data field is entirely skipped by the code
 generator (that's 'gen':'no').

 
 Could we use hmp_object_add()-qmp_object_add()-object_add() code path,
 instead of  hmp_object_add()-object_add(),qmp_object_add()-object_add()?
 Would skipping by generator brings some difficult to it?

No, you cannot do that because both hmp_object_add and qmp_object_add
take a QDict.  But hmp_object_add's qdict has strings, while
qmp_object_add has the right types (e.g. integer for integer properties).

It is not entirely clear, but actually the structure is the same as
regular commands that use code generation.

The usual code path is

hmp_cont  qmp_marshal_input_cont
\/
 qmp_cont

Here it is

  hmp_object_add  qmp_object_add
\/
object_add

So the current structure is fine.

Paolo

Re: [Qemu-devel] [PATCH v3] inet_listen_opts: add error checking

2013-12-13 Thread Eric Blake

On 12/13/2013 03:08 AM, Gerd Hoffmann wrote:
 Don't use atoi() function which doesn't detect errors, switch to
 strtol and error out on failures.  Also add a range check while
 being at it.
 
 [ v3: oops, v2 didn't build ]
 [ v2: use parse_uint_full instead of strtol ]

Patch changelog belongs...

 
 Signed-off-by: Gerd Hoffmann kra...@redhat.com
 ---

...here, so that 'git am' will strip it (it's useful for reviewers on
list, but not in 'git log', where you no longer have access to v1 or v2).

  util/qemu-sockets.c | 14 --
  1 file changed, 12 insertions(+), 2 deletions(-)
 
 diff --git a/util/qemu-sockets.c b/util/qemu-sockets.c
 index 6b97dc1..c3560c1 100644
 --- a/util/qemu-sockets.c
 +++ b/util/qemu-sockets.c
 @@ -133,8 +133,18 @@ int inet_listen_opts(QemuOpts *opts, int port_offset, 
 Error **errp)
  ai.ai_family = PF_INET6;
  
  /* lookup */
 -if (port_offset)
 -snprintf(port, sizeof(port), %d, atoi(port) + port_offset);
 +if (port_offset) {
 +int baseport;
 +if (parse_uint_full(port, baseport, 10)  0) {

parse_uint_full takes an 'unsigned long long *', but you are passing an
'int *'.  I'm surprised it compiled for you.  It causes a buffer
overflow if the pointer is assigned to, and gives different results
depending on platform endianness.

 +error_setg(errp, can't convert to a number: %s, port);
 +return -1;
 +}
 +if (baseport  0 || baseport + port_offset  65535) {
 +error_setg(errp, port %s out of range, port);

But errno is not set to a sane value at this point, so error_setg() is
wrong.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH v2] roms: Flush icache when writing roms to guest memory

2013-12-13 Thread Paolo Bonzini

Il 12/12/2013 10:29, Alexander Graf ha scritto:
 We use the rom infrastructure to write firmware and/or initial kernel
 blobs into guest address space. So we're basically emulating the cache
 off phase on very early system bootup.
 
 That phase is usually responsible for clearing the instruction cache for
 anything it writes into cachable memory, to ensure that after reboot we
 don't happen to execute stale bits from the instruction cache.
 
 So we need to invalidate the icache every time we write a rom into guest
 address space. We do not need to do this for every DMA since the guest
 expects it has to flush the icache manually in that case.
 
 This fixes random reboot issues on e5500 (booke ppc) for me.
 
 Signed-off-by: Alexander Graf ag...@suse.de

Interesting way to avoid cut-and-paste.

Applied to uq/master, thanks.

Paolo

 ---
 
 v1 - v2:
 
   - extract the flush into a helper function
 ---
  exec.c| 44 +++-
  hw/core/loader.c  |  7 +++
  include/exec/cpu-common.h |  1 +
  3 files changed, 47 insertions(+), 5 deletions(-)
 
 diff --git a/exec.c b/exec.c
 index f4b9ef2..896f7b8 100644
 --- a/exec.c
 +++ b/exec.c
 @@ -50,6 +50,7 @@
  #include translate-all.h
  
  #include exec/memory-internal.h
 +#include qemu/cache-utils.h
  
  //#define DEBUG_SUBPAGE
  
 @@ -2010,9 +2011,13 @@ void cpu_physical_memory_rw(hwaddr addr, uint8_t *buf,
  address_space_rw(address_space_memory, addr, buf, len, is_write);
  }
  
 -/* used for ROM loading : can write in RAM and ROM */
 -void cpu_physical_memory_write_rom(hwaddr addr,
 -   const uint8_t *buf, int len)
 +enum write_rom_type {
 +WRITE_DATA,
 +FLUSH_CACHE,
 +};
 +
 +static inline void cpu_physical_memory_write_rom_internal(
 +hwaddr addr, const uint8_t *buf, int len, enum write_rom_type type)
  {
  hwaddr l;
  uint8_t *ptr;
 @@ -2031,8 +2036,15 @@ void cpu_physical_memory_write_rom(hwaddr addr,
  addr1 += memory_region_get_ram_addr(mr);
  /* ROM/RAM case */
  ptr = qemu_get_ram_ptr(addr1);
 -memcpy(ptr, buf, l);
 -invalidate_and_set_dirty(addr1, l);
 +switch (type) {
 +case WRITE_DATA:
 +memcpy(ptr, buf, l);
 +invalidate_and_set_dirty(addr1, l);
 +break;
 +case FLUSH_CACHE:
 +flush_icache_range((uintptr_t)ptr, (uintptr_t)ptr + l);
 +break;
 +}
  }
  len -= l;
  buf += l;
 @@ -2040,6 +2052,28 @@ void cpu_physical_memory_write_rom(hwaddr addr,
  }
  }
  
 +/* used for ROM loading : can write in RAM and ROM */
 +void cpu_physical_memory_write_rom(hwaddr addr,
 +   const uint8_t *buf, int len)
 +{
 +cpu_physical_memory_write_rom_internal(addr, buf, len, WRITE_DATA);
 +}
 +
 +void cpu_flush_icache_range(hwaddr start, int len)
 +{
 +/*
 + * This function should do the same thing as an icache flush that was
 + * triggered from within the guest. For TCG we are always cache coherent,
 + * so there is no need to flush anything. For KVM / Xen we need to flush
 + * the host's instruction cache at least.
 + */
 +if (tcg_enabled()) {
 +return;
 +}
 +
 +cpu_physical_memory_write_rom_internal(start, NULL, len, FLUSH_CACHE);
 +}
 +
  typedef struct {
  MemoryRegion *mr;
  void *buffer;
 diff --git a/hw/core/loader.c b/hw/core/loader.c
 index 60d2ebd..0634bee 100644
 --- a/hw/core/loader.c
 +++ b/hw/core/loader.c
 @@ -785,6 +785,13 @@ static void rom_reset(void *unused)
  g_free(rom-data);
  rom-data = NULL;
  }
 +/*
 + * The rom loader is really on the same level as firmware in the 
 guest
 + * shadowing a ROM into RAM. Such a shadowing mechanism needs to 
 ensure
 + * that the instruction cache for that new region is clear, so that 
 the
 + * CPU definitely fetches its instructions from the just written 
 data.
 + */
 +cpu_flush_icache_range(rom-addr, rom-datasize);
  }
  }
  
 diff --git a/include/exec/cpu-common.h b/include/exec/cpu-common.h
 index e4996e1..8f33122 100644
 --- a/include/exec/cpu-common.h
 +++ b/include/exec/cpu-common.h
 @@ -110,6 +110,7 @@ void stq_phys(hwaddr addr, uint64_t val);
  
  void cpu_physical_memory_write_rom(hwaddr addr,
 const uint8_t *buf, int len);
 +void cpu_flush_icache_range(hwaddr start, int len);
  
  extern struct MemoryRegion io_mem_rom;
  extern struct MemoryRegion io_mem_notdirty;

[Qemu-devel] [Bug 1259499] Re: QEmu 1.7.0 cannot restore a 1.6.0 live snapshot made in qemu-system-x86_64

2013-12-13 Thread Dr. David Alan Gilbert

Hi Francois,
  I've managed to reproduce this, in my log file 
(/var/log/libvirt/qemu/machinename.log) I see:

Unknown ramblock :02.0/qxl.vram, cannot accept migration
qemu: warning: error while loading state for instance 0x0 of device 'ram'
qemu-system-x86_64: Error -22 while loading VM state

do you also see that unknown ramblock warning?

(I'm running on F20 using 1.6.0 and 1.7.0 qemu's built from source
running minimal F20 guests)

Dave

** Changed in: qemu
   Status: New = Confirmed

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1259499

Title:
  QEmu 1.7.0 cannot restore a 1.6.0 live snapshot made in qemu-system-
  x86_64

Status in QEMU:
  Confirmed

Bug description:
  I have upgraded to QEmu 1.7.0 (Debian 1.7.0+dfsg-2) but now when I try
  to restore a live snapshot made in QEmu 1.6.0 (Debian 1.6.0+dfsg-1) I
  see that the VM boots from scratch instead of starting directly in the
  snapshot's running state.

  Furthermore if the VM is already running and I try to revert to the
  snapshot again I get the following message:

  $ virsh --connect qemu:///system snapshot-revert fgtbbuild wtb; echo $?
  error: operation failed: Error -22 while loading VM state
  1

  I have test VMs with live snapshots corresponding to different testing
  configurations. So I typically revert the VMs in one of the live
  snapshots and run the tests. It would be pretty annoying to have to
  recreate all these live snapshots any time I upgrade QEmu bug it looks
  like I'll have to do it again.

  This all sounds very much like bug 1123975 where QEmu 1.3 broke
  compatibility with previous versions live snapshots :-(

  Here is the command being run by libvirt:

  /usr/bin/qemu-system-x86_64 -name fgtbbuild -S -machine
  pc-1.1,accel=kvm,usb=off -m 512 -realtime mlock=off -smp
  4,sockets=4,cores=1,threads=1 -uuid f510955c-
  17de-9907-1e33-dfe1ef7a08b6 -no-user-config -nodefaults -chardev
  
socket,id=charmonitor,path=/var/lib/libvirt/qemu/fgtbbuild.monitor,server,nowait
  -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-
  shutdown -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive
  file=/mnt/storage1/qemu/fgtbbuild.qcow2,if=none,id=drive-virtio-
  disk0,format=qcow2,cache=writeback -device virtio-blk-
  pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-
  disk0,bootindex=1 -drive if=none,id=drive-
  ide0-0-0,readonly=on,format=raw -device ide-cd,bus=ide.0,unit=0,drive
  =drive-ide0-0-0,id=ide0-0-0 -netdev
  tap,fd=25,id=hostnet0,vhost=on,vhostfd=26 -device virtio-net-
  pci,netdev=hostnet0,id=net0,mac=52:54:00:0a:3c:e8,bus=pci.0,addr=0x3
  -chardev pty,id=charserial0 -device isa-
  serial,chardev=charserial0,id=serial0 -device usb-tablet,id=input0
  -vnc 127.0.0.1:0 -device qxl-
  vga,id=video0,ram_size=67108864,vram_size=67108864,bus=pci.0,addr=0x2
  -device intel-hda,id=sound0,bus=pci.0,addr=0x4 -device hda-
  duplex,id=sound0-codec0,bus=sound0.0,cad=0 -device virtio-balloon-
  pci,id=balloon0,bus=pci.0,addr=0x6 -loadvm wtb

  ipxe-qemu 1.0.0+git-20120202.f6840ba-3
  qemu 1.7.0+dfsg-2
  qemu-keymaps 1.7.0+dfsg-2
  qemu-slof 20130430+dfsg-1
  qemu-system 1.7.0+dfsg-2
  qemu-system-arm 1.7.0+dfsg-2
  qemu-system-common 1.7.0+dfsg-2
  qemu-system-mips 1.7.0+dfsg-2
  qemu-system-misc 1.7.0+dfsg-2
  qemu-system-ppc 1.7.0+dfsg-2
  qemu-system-sparc 1.7.0+dfsg-2
  qemu-system-x86 1.7.0+dfsg-2
  qemu-user 1.7.0+dfsg-2
  qemu-utils 1.7.0+dfsg-2
  libvirt-bin 1.1.4-2
  libvirt0 1.1.4-2
  libvirtodbc0 6.1.6+dfsg-4

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1259499/+subscriptions

Re: [Qemu-devel] [PATCH v4] inet_listen_opts: add error checking

2013-12-13 Thread Eric Blake

On 12/13/2013 03:12 AM, Gerd Hoffmann wrote:
 Don't use atoi() function which doesn't detect errors, switch to
 strtol and error out on failures.  Also add a range check while
 being at it.
 
 [ v4: didn't commit buildfix.  -ENOCOFFEE.  sorry for the spam ]
 [ v3: oops, v2 didn't build ]
 [ v2: use parse_uint_full instead of strtol ]

Patch changelog belongs...

 
 Signed-off-by: Gerd Hoffmann kra...@redhat.com
 ---

...here.  You can add a '---' separator in your commit message (and thus
have two '---' lines in 'git send-email' output), if you still want to
track the changelog in your commits (but remember to hoist your S-o-B
when doing that).

  util/qemu-sockets.c | 14 --
  1 file changed, 12 insertions(+), 2 deletions(-)
 

 +if (port_offset) {
 +unsigned long long baseport;
 +if (parse_uint_full(port, baseport, 10)  0) {
 +error_setg(errp, can't convert to a number: %s, port);
 +return -1;
 +}
 +if (baseport + port_offset  65535) {
 +error_setg(errp, port %s out of range, port);

error_setg() is still reporting on a bogus errno value at this point.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature

[Qemu-devel] [PATCH 01/11] qom: do not register interface types in the type table

2013-12-13 Thread Igor Mammedov

From: Paolo Bonzini pbonz...@redhat.com

There should be no need to look them up nor enumerate the interface
types, whose classes are really just vtables.  Just create the
types and add them to the interface list of the parent type.

Signed-off-by: Paolo Bonzini pbonz...@redhat.com
Signed-off-by: Igor Mammedov imamm...@redhat.com
---
 qom/object.c | 13 ++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/qom/object.c b/qom/object.c
index fc19cf6..3a43186 100644
--- a/qom/object.c
+++ b/qom/object.c
@@ -88,7 +88,7 @@ static TypeImpl *type_table_lookup(const char *name)
 return g_hash_table_lookup(type_table_get(), name);
 }
 
-static TypeImpl *type_register_internal(const TypeInfo *info)
+static TypeImpl *type_new(const TypeInfo *info)
 {
 TypeImpl *ti = g_malloc0(sizeof(*ti));
 int i;
@@ -122,8 +122,15 @@ static TypeImpl *type_register_internal(const TypeInfo 
*info)
 }
 ti-num_interfaces = i;
 
-type_table_add(ti);
+return ti;
+}
 
+static TypeImpl *type_register_internal(const TypeInfo *info)
+{
+TypeImpl *ti;
+ti = type_new(info);
+
+type_table_add(ti);
 return ti;
 }
 
@@ -216,7 +223,7 @@ static void type_initialize_interface(TypeImpl *ti, const 
char *parent)
 info.name = g_strdup_printf(%s::%s, ti-name, info.parent);
 info.abstract = true;
 
-iface_impl = type_register(info);
+iface_impl = type_new(info);
 type_initialize(iface_impl);
 g_free((char *)info.name);
 
-- 
1.8.3.1

[Qemu-devel] [PATCH 00/11 v3] Refactor PCI/SHPC/PCIE hotplug to use a more generic hotplug API

2013-12-13 Thread Igor Mammedov

changes since v2:
* s/hotplugable/hotpluggable/
* move hotplug check to an earlier patch:
  qdev: add hotpluggable property to Device
--
Refactor PCI specific hotplug API to a more generic/reusable one.
Model it after SCSI-BUS like hotplug API replacing single hotplug
callback with hotplug/hot_unplug pair of callbacks as suggested by
Paolo.
Difference between SCSI-BUS and this approach is that the former
is BUS centric while the latter is device centred. Which is evolved
from the fact that hotplug callbacks used by ACPI/SHPC/PCIE are
implemented by devices rather than by bus and bus serves only as
a proxy to forward event to hotplug device.
Memory hotplug also exposes tha same usage pattern hence an attempt
to generalize hotplug API.

Refactoring also simplifies wiring of a hotplug device with a bus,
all it needs is to set hotplug-device link on bus, which
would potentially allow to do it from configuration file,
there is not need to setup hotplug device callbacks on bus
synce it can get them via HOTPLUG_DEVICE API of hotplug-device
target.

In addition device centred hotplug API may be used by bus-less
hotplug implementations as well if it's decided to use
linkfoo... instead of bus.

Patches 8-11 are should be merged as one and are split only for
simplifying review (they compile fine but PCI hotplug is broken
until the last patch is applyed).

git tree for testing:
https://github.com/imammedo/qemu/commits/hotplug_dev_inf_v3

tested only ACPI and PCIE hotplug.

Hervé Poussineau (1):
  qom: detect bad reentrance during object_class_foreach

Igor Mammedov (9):
  define hotplug interface
  qdev: add to BusState hotplug-handler link
  qdev: add hotpluggable property to Device
  hw/acpi: move typeinfo to the file end
  qdev:pci: refactor PCIDevice to use generic hotpluggable property
  acpi/piix4pm: convert ACPI PCI hotplug to use hotplug-handler API
  pci/shpc: convert SHPC hotplug to use hotplug-handler API
  pci/pcie: convert PCIE hotplug to use hotplug-handler API
  hw/pci: switch to a generic hotplug handling for PCIDevice

Paolo Bonzini (1):
  qom: do not register interface types in the type table

 hw/acpi/piix4.c| 151 ++---
 hw/core/Makefile.objs  |   1 +
 hw/core/hotplug.c  |  48 +
 hw/core/qdev.c |  50 --
 hw/display/cirrus_vga.c|   2 +-
 hw/display/qxl.c   |   2 +-
 hw/display/vga-pci.c   |   2 +-
 hw/display/vmware_vga.c|   2 +-
 hw/i386/acpi-build.c   |   6 +-
 hw/ide/piix.c  |   4 +-
 hw/isa/piix4.c |   2 +-
 hw/pci-bridge/pci_bridge_dev.c |   9 +++
 hw/pci-host/piix.c |   6 +-
 hw/pci/pci.c   |  40 +--
 hw/pci/pcie.c  |  73 +---
 hw/pci/pcie_port.c |   8 +++
 hw/pci/shpc.c  | 133 +++-
 hw/usb/hcd-ehci-pci.c  |   2 +-
 hw/usb/hcd-ohci.c  |   2 +-
 hw/usb/hcd-uhci.c  |   2 +-
 hw/usb/hcd-xhci.c  |   2 +-
 include/hw/hotplug.h   |  75 
 include/hw/pci/pci.h   |  13 
 include/hw/pci/pci_bus.h   |   2 -
 include/hw/pci/pcie.h  |   5 ++
 include/hw/pci/shpc.h  |   8 +++
 include/hw/qdev-core.h |   8 +++
 qom/object.c   |  17 -
 28 files changed, 455 insertions(+), 220 deletions(-)
 create mode 100644 hw/core/hotplug.c
 create mode 100644 include/hw/hotplug.h

-- 
1.8.3.1

[Qemu-devel] [PATCH 08/11] acpi/piix4pm: convert ACPI PCI hotplug to use hotplug-handler API

2013-12-13 Thread Igor Mammedov

Split piix4_device_hotplug() into hotplug/unplug callbacks
and register them as hotplug-handler interface implementation of
PIIX4_PM device.

Replace pci_bus_hotplug() wiring with setting link on
PCI BUS hotplug-handler property to PIIX4_PM device.

Signed-off-by: Igor Mammedov imamm...@redhat.com
---
 hw/acpi/piix4.c | 73 -
 1 file changed, 41 insertions(+), 32 deletions(-)

diff --git a/hw/acpi/piix4.c b/hw/acpi/piix4.c
index fff2126..857d039 100644
--- a/hw/acpi/piix4.c
+++ b/hw/acpi/piix4.c
@@ -30,6 +30,8 @@
 #include hw/nvram/fw_cfg.h
 #include exec/address-spaces.h
 #include hw/acpi/piix4.h
+#include qapi/qmp/qerror.h
+#include hw/hotplug.h
 
 //#define DEBUG
 
@@ -107,7 +109,7 @@ typedef struct PIIX4PMState {
 OBJECT_CHECK(PIIX4PMState, (obj), TYPE_PIIX4_PM)
 
 static void piix4_acpi_system_hot_add_init(MemoryRegion *parent,
-   PCIBus *bus, PIIX4PMState *s);
+   BusState *bus, PIIX4PMState *s);
 
 #define ACPI_ENABLE 0xf1
 #define ACPI_DISABLE 0xf0
@@ -478,7 +480,7 @@ static int piix4_pm_initfn(PCIDevice *dev)
 qemu_add_machine_init_done_notifier(s-machine_ready);
 qemu_register_reset(piix4_reset, s);
 
-piix4_acpi_system_hot_add_init(pci_address_space_io(dev), dev-bus, s);
+piix4_acpi_system_hot_add_init(pci_address_space_io(dev), BUS(dev-bus), 
s);
 
 piix4_pm_add_propeties(s);
 return 0;
@@ -664,13 +666,11 @@ static void piix4_cpu_added_req(Notifier *n, void *opaque)
 piix4_cpu_hotplug_req(s, CPU(opaque), PLUG);
 }
 
-static int piix4_device_hotplug(DeviceState *qdev, PCIDevice *dev,
-PCIHotplugState state);
-
 static void piix4_acpi_system_hot_add_init(MemoryRegion *parent,
-   PCIBus *bus, PIIX4PMState *s)
+   BusState *bus, PIIX4PMState *s)
 {
 CPUState *cpu;
+Error *local_error = NULL;
 
 memory_region_init_io(s-io_gpe, OBJECT(s), piix4_gpe_ops, s,
   acpi-gpe0, GPE_LEN);
@@ -680,7 +680,14 @@ static void piix4_acpi_system_hot_add_init(MemoryRegion 
*parent,
   acpi-pci-hotplug, PCI_HOTPLUG_SIZE);
 memory_region_add_subregion(parent, PCI_HOTPLUG_ADDR,
 s-io_pci);
-pci_bus_hotplug(bus, piix4_device_hotplug, DEVICE(s));
+object_property_set_link(OBJECT(bus), OBJECT(s),
+ QDEV_HOTPLUG_HANDLER_PROPERTY, local_error);
+if (error_is_set(local_error)) {
+qerror_report_err(local_error);
+error_free(local_error);
+abort();
+}
+bus-allow_hotplug = 1;
 
 CPU_FOREACH(cpu) {
 CPUClass *cc = CPU_GET_CLASS(cpu);
@@ -696,41 +703,36 @@ static void piix4_acpi_system_hot_add_init(MemoryRegion 
*parent,
 qemu_register_cpu_added_notifier(s-cpu_added_notifier);
 }
 
-static void enable_device(PIIX4PMState *s, int slot)
+static void piix4_pci_device_hotplug_cb(HotplugHandler *hotplug_dev,
+DeviceState *dev, Error **errp)
 {
-s-ar.gpe.sts[0] |= PIIX4_PCI_HOTPLUG_STATUS;
-s-pci0_slot_device_present |= (1U  slot);
-}
-
-static void disable_device(PIIX4PMState *s, int slot)
-{
-s-ar.gpe.sts[0] |= PIIX4_PCI_HOTPLUG_STATUS;
-s-pci0_status.down |= (1U  slot);
-}
-
-static int piix4_device_hotplug(DeviceState *qdev, PCIDevice *dev,
-   PCIHotplugState state)
-{
-int slot = PCI_SLOT(dev-devfn);
-PIIX4PMState *s = PIIX4_PM(qdev);
+PCIDevice *pci_dev = PCI_DEVICE(dev);
+int slot = PCI_SLOT(pci_dev-devfn);
+PIIX4PMState *s = PIIX4_PM(hotplug_dev);
 
 /* Don't send event when device is enabled during qemu machine creation:
  * it is present on boot, no hotplug event is necessary. We do send an
  * event when the device is disabled later. */
-if (state == PCI_COLDPLUG_ENABLED) {
+if (!dev-hotplugged) {
 s-pci0_slot_device_present |= (1U  slot);
-return 0;
-}
-
-if (state == PCI_HOTPLUG_ENABLED) {
-enable_device(s, slot);
-} else {
-disable_device(s, slot);
+return;
 }
 
+s-ar.gpe.sts[0] |= PIIX4_PCI_HOTPLUG_STATUS;
+s-pci0_slot_device_present |= (1U  slot);
 pm_update_sci(s);
+}
 
-return 0;
+static void piix4_pci_device_hot_unplug_cb(HotplugHandler *hotplug_dev,
+   DeviceState *dev, Error **errp)
+{
+PCIDevice *pci_dev = PCI_DEVICE(dev);
+int slot = PCI_SLOT(pci_dev-devfn);
+PIIX4PMState *s = PIIX4_PM(hotplug_dev);
+
+s-ar.gpe.sts[0] |= PIIX4_PCI_HOTPLUG_STATUS;
+s-pci0_status.down |= (1U  slot);
+pm_update_sci(s);
 }
 
 static Property piix4_pm_properties[] = {
@@ -745,6 +747,7 @@ static void piix4_pm_class_init(ObjectClass *klass, void 
*data)
 {
 DeviceClass *dc = DEVICE_CLASS(klass);
 PCIDeviceClass *k =

[Qemu-devel] [PATCH 06/11] hw/acpi: move typeinfo to the file end

2013-12-13 Thread Igor Mammedov

do so to avoid not necessary forward declarations and
place typeinfo registration at the file end where it's
usualy expected.

Signed-off-by: Igor Mammedov imamm...@redhat.com
---
 hw/acpi/piix4.c | 80 -
 1 file changed, 40 insertions(+), 40 deletions(-)

diff --git a/hw/acpi/piix4.c b/hw/acpi/piix4.c
index 93849c8..8084a60 100644
--- a/hw/acpi/piix4.c
+++ b/hw/acpi/piix4.c
@@ -523,46 +523,6 @@ i2c_bus *piix4_pm_init(PCIBus *bus, int devfn, uint32_t 
smb_io_base,
 return s-smb.smbus;
 }
 
-static Property piix4_pm_properties[] = {
-DEFINE_PROP_UINT32(smb_io_base, PIIX4PMState, smb_io_base, 0),
-DEFINE_PROP_UINT8(ACPI_PM_PROP_S3_DISABLED, PIIX4PMState, disable_s3, 0),
-DEFINE_PROP_UINT8(ACPI_PM_PROP_S4_DISABLED, PIIX4PMState, disable_s4, 0),
-DEFINE_PROP_UINT8(ACPI_PM_PROP_S4_VAL, PIIX4PMState, s4_val, 2),
-DEFINE_PROP_END_OF_LIST(),
-};
-
-static void piix4_pm_class_init(ObjectClass *klass, void *data)
-{
-DeviceClass *dc = DEVICE_CLASS(klass);
-PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
-
-k-no_hotplug = 1;
-k-init = piix4_pm_initfn;
-k-config_write = pm_write_config;
-k-vendor_id = PCI_VENDOR_ID_INTEL;
-k-device_id = PCI_DEVICE_ID_INTEL_82371AB_3;
-k-revision = 0x03;
-k-class_id = PCI_CLASS_BRIDGE_OTHER;
-dc-desc = PM;
-dc-no_user = 1;
-dc-vmsd = vmstate_acpi;
-dc-props = piix4_pm_properties;
-}
-
-static const TypeInfo piix4_pm_info = {
-.name  = TYPE_PIIX4_PM,
-.parent= TYPE_PCI_DEVICE,
-.instance_size = sizeof(PIIX4PMState),
-.class_init= piix4_pm_class_init,
-};
-
-static void piix4_pm_register_types(void)
-{
-type_register_static(piix4_pm_info);
-}
-
-type_init(piix4_pm_register_types)
-
 static uint64_t gpe_readb(void *opaque, hwaddr addr, unsigned width)
 {
 PIIX4PMState *s = opaque;
@@ -772,3 +732,43 @@ static int piix4_device_hotplug(DeviceState *qdev, 
PCIDevice *dev,
 
 return 0;
 }
+
+static Property piix4_pm_properties[] = {
+DEFINE_PROP_UINT32(smb_io_base, PIIX4PMState, smb_io_base, 0),
+DEFINE_PROP_UINT8(ACPI_PM_PROP_S3_DISABLED, PIIX4PMState, disable_s3, 0),
+DEFINE_PROP_UINT8(ACPI_PM_PROP_S4_DISABLED, PIIX4PMState, disable_s4, 0),
+DEFINE_PROP_UINT8(ACPI_PM_PROP_S4_VAL, PIIX4PMState, s4_val, 2),
+DEFINE_PROP_END_OF_LIST(),
+};
+
+static void piix4_pm_class_init(ObjectClass *klass, void *data)
+{
+DeviceClass *dc = DEVICE_CLASS(klass);
+PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
+
+k-no_hotplug = 1;
+k-init = piix4_pm_initfn;
+k-config_write = pm_write_config;
+k-vendor_id = PCI_VENDOR_ID_INTEL;
+k-device_id = PCI_DEVICE_ID_INTEL_82371AB_3;
+k-revision = 0x03;
+k-class_id = PCI_CLASS_BRIDGE_OTHER;
+dc-desc = PM;
+dc-no_user = 1;
+dc-vmsd = vmstate_acpi;
+dc-props = piix4_pm_properties;
+}
+
+static const TypeInfo piix4_pm_info = {
+.name  = TYPE_PIIX4_PM,
+.parent= TYPE_PCI_DEVICE,
+.instance_size = sizeof(PIIX4PMState),
+.class_init= piix4_pm_class_init,
+};
+
+static void piix4_pm_register_types(void)
+{
+type_register_static(piix4_pm_info);
+}
+
+type_init(piix4_pm_register_types)
-- 
1.8.3.1

Re: [Qemu-devel] [PATCH v2 00/19] bsd-user: Add system call and mips/arm support.

2013-12-13 Thread Paolo Bonzini

Il 12/12/2013 20:57, Ed Maste ha scritto:
  This is a large change in an area that hasn't had a lot of activity of
  late; what are the next steps here?
 
  We're now in hard freeze, so the next step is to wait for 1.8 to be
  released.
 
  I reviewed the parts out of bsd-user, and had only one question.
 Ok, 1.7's now out, and we'll sort out the HOST_ABI vs. HOST_VARIANT
 question.  What's our next step after that?

Post the patches.  I'll review the non-bsd-user parts again.  And then
the next step is to find someone who commits them...  shouldn't be hard
since you probably know much more about this code than anyone else.

Paolo

[Qemu-devel] [PATCH 02/11] qom: detect bad reentrance during object_class_foreach

2013-12-13 Thread Igor Mammedov

From: Hervé Poussineau hpous...@reactos.org

We should not modify the type hash table while it is being iterated on.
Assert that it does not happen.

Signed-off-by: Hervé Poussineau hpous...@reactos.org
Signed-off-by: Paolo Bonzini pbonz...@redhat.com
Signed-off-by: Igor Mammedov imamm...@redhat.com
---
v2:
  * make ver more descriptinve s/enumerating/enumerating_classes/
[asked-by: Peter Crosthwaite]
---
 qom/object.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/qom/object.c b/qom/object.c
index 3a43186..4a0fb86 100644
--- a/qom/object.c
+++ b/qom/object.c
@@ -78,8 +78,10 @@ static GHashTable *type_table_get(void)
 return type_table;
 }
 
+static bool enumerating_classes = false;
 static void type_table_add(TypeImpl *ti)
 {
+assert(!enumerating_classes);
 g_hash_table_insert(type_table_get(), (void *)ti-name, ti);
 }
 
@@ -666,7 +668,9 @@ void object_class_foreach(void (*fn)(ObjectClass *klass, 
void *opaque),
 {
 OCFData data = { fn, implements_type, include_abstract, opaque };
 
+enumerating_classes = true;
 g_hash_table_foreach(type_table_get(), object_class_foreach_tramp, data);
+enumerating_classes = false;
 }
 
 int object_child_foreach(Object *obj, int (*fn)(Object *child, void *opaque),
-- 
1.8.3.1

[Qemu-devel] [PATCH 11/11] hw/pci: switch to a generic hotplug handling for PCIDevice

2013-12-13 Thread Igor Mammedov

make qdev_unplug()/device_set_realized() to call hotplug handler's
plug/unplug methods if available and remove not needed anymore
hot(un)plug handling from PCIDevice.

In case if hotplug handler is not available, revert to the legacy
hotplug method.

Signed-off-by: Igor Mammedov imamm...@redhat.com
---
 hw/core/qdev.c   | 17 +
 hw/pci/pci.c | 29 +
 include/hw/pci/pci.h | 10 --
 include/hw/pci/pci_bus.h |  2 --
 4 files changed, 14 insertions(+), 44 deletions(-)

diff --git a/hw/core/qdev.c b/hw/core/qdev.c
index 9418fea..7711c76 100644
--- a/hw/core/qdev.c
+++ b/hw/core/qdev.c
@@ -213,7 +213,6 @@ void qdev_unplug(DeviceState *dev, Error **errp)
 error_set(errp, QERR_BUS_NO_HOTPLUG, dev-parent_bus-name);
 return;
 }
-assert(dc-unplug != NULL);
 
 if (!dc-hotpluggable) {
 error_set(errp, QERR_DEVICE_NO_HOTPLUG,
@@ -223,9 +222,13 @@ void qdev_unplug(DeviceState *dev, Error **errp)
 
 qdev_hot_removed = true;
 
-if (dc-unplug(dev)  0) {
-error_set(errp, QERR_UNDEFINED_ERROR);
-return;
+if (dev-parent_bus  dev-parent_bus-hotplug_handler) {
+hotplug_handler_unplug(dev-parent_bus-hotplug_handler, dev, errp);
+} else {
+assert(dc-unplug != NULL);
+if (dc-unplug(dev)  0) { /* legacy handler */
+error_set(errp, QERR_UNDEFINED_ERROR);
+}
 }
 }
 
@@ -705,6 +708,12 @@ static void device_set_realized(Object *obj, bool value, 
Error **err)
 dc-realize(dev, local_err);
 }
 
+if (dev-parent_bus  dev-parent_bus-hotplug_handler 
+local_err == NULL) {
+hotplug_handler_plug(dev-parent_bus-hotplug_handler,
+ dev, local_err);
+}
+
 if (qdev_get_vmsd(dev)  local_err == NULL) {
 vmstate_register_with_alias_id(dev, -1, qdev_get_vmsd(dev), dev,
dev-instance_id_alias,
diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index 8a7a21b..6642e4c 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -35,6 +35,7 @@
 #include hw/pci/msi.h
 #include hw/pci/msix.h
 #include exec/address-spaces.h
+#include hw/hotplug.h
 
 //#define DEBUG_PCI
 #ifdef DEBUG_PCI
@@ -348,13 +349,6 @@ void pci_bus_irqs(PCIBus *bus, pci_set_irq_fn set_irq, 
pci_map_irq_fn map_irq,
 bus-irq_count = g_malloc0(nirq * sizeof(bus-irq_count[0]));
 }
 
-void pci_bus_hotplug(PCIBus *bus, pci_hotplug_fn hotplug, DeviceState *qdev)
-{
-bus-qbus.allow_hotplug = 1;
-bus-hotplug = hotplug;
-bus-hotplug_qdev = qdev;
-}
-
 PCIBus *pci_register_bus(DeviceState *parent, const char *name,
  pci_set_irq_fn set_irq, pci_map_irq_fn map_irq,
  void *irq_opaque,
@@ -1752,29 +1746,9 @@ static int pci_qdev_init(DeviceState *qdev)
 }
 pci_add_option_rom(pci_dev, is_default_rom);
 
-if (bus-hotplug) {
-/* Let buses differentiate between hotplug and when device is
- * enabled during qemu machine creation. */
-rc = bus-hotplug(bus-hotplug_qdev, pci_dev,
-  qdev-hotplugged ? PCI_HOTPLUG_ENABLED:
-  PCI_COLDPLUG_ENABLED);
-if (rc != 0) {
-int r = pci_unregister_device(pci_dev-qdev);
-assert(!r);
-return rc;
-}
-}
 return 0;
 }
 
-static int pci_unplug_device(DeviceState *qdev)
-{
-PCIDevice *dev = PCI_DEVICE(qdev);
-
-return dev-bus-hotplug(dev-bus-hotplug_qdev, dev,
- PCI_HOTPLUG_DISABLED);
-}
-
 PCIDevice *pci_create_multifunction(PCIBus *bus, int devfn, bool multifunction,
 const char *name)
 {
@@ -2245,7 +2219,6 @@ static void pci_device_class_init(ObjectClass *klass, 
void *data)
 {
 DeviceClass *k = DEVICE_CLASS(klass);
 k-init = pci_qdev_init;
-k-unplug = pci_unplug_device;
 k-exit = pci_unregister_device;
 k-bus_type = TYPE_PCI_BUS;
 k-props = pci_props;
diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index 6b79358..dd45e04 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -327,15 +327,6 @@ typedef void (*pci_set_irq_fn)(void *opaque, int irq_num, 
int level);
 typedef int (*pci_map_irq_fn)(PCIDevice *pci_dev, int irq_num);
 typedef PCIINTxRoute (*pci_route_irq_fn)(void *opaque, int pin);
 
-typedef enum {
-PCI_HOTPLUG_DISABLED,
-PCI_HOTPLUG_ENABLED,
-PCI_COLDPLUG_ENABLED,
-} PCIHotplugState;
-
-typedef int (*pci_hotplug_fn)(DeviceState *qdev, PCIDevice *pci_dev,
-  PCIHotplugState state);
-
 #define TYPE_PCI_BUS PCI
 #define PCI_BUS(obj) OBJECT_CHECK(PCIBus, (obj), TYPE_PCI_BUS)
 #define TYPE_PCIE_BUS PCIE
@@ -354,7 +345,6 @@ PCIBus *pci_bus_new(DeviceState *parent, const char *name,
 void pci_bus_irqs(PCIBus *bus, pci_set_irq_fn set_irq, pci_map_irq_fn map_irq,
   void *irq_opaque, int

Re: [Qemu-devel] [PATCH v3] inet_listen_opts: add error checking

2013-12-13 Thread Gerd Hoffmann

  Hi,

 parse_uint_full takes an 'unsigned long long *', but you are passing an
 'int *'.  I'm surprised it compiled for you.  It causes a buffer
 overflow if the pointer is assigned to, and gives different results
 depending on platform endianness.

Fixed in v4.

  +error_setg(errp, can't convert to a number: %s, port);
  +return -1;
  +}
  +if (baseport  0 || baseport + port_offset  65535) {
  +error_setg(errp, port %s out of range, port);
 
 But errno is not set to a sane value at this point, so error_setg() is
 wrong.

That would be true for error_setg_errno()-

cheers,
  Gerd

Re: [Qemu-devel] [PATCH v4] inet_listen_opts: add error checking

2013-12-13 Thread Eric Blake

On 12/13/2013 05:42 AM, Eric Blake wrote:
 On 12/13/2013 03:12 AM, Gerd Hoffmann wrote:
 Don't use atoi() function which doesn't detect errors, switch to
 strtol and error out on failures.  Also add a range check while
 being at it.

 [ v4: didn't commit buildfix.  -ENOCOFFEE.  sorry for the spam ]
 [ v3: oops, v2 didn't build ]
 [ v2: use parse_uint_full instead of strtol ]
 

 
 +if (port_offset) {
 +unsigned long long baseport;
 +if (parse_uint_full(port, baseport, 10)  0) {
 +error_setg(errp, can't convert to a number: %s, port);
 +return -1;
 +}
 +if (baseport + port_offset  65535) {
 +error_setg(errp, port %s out of range, port);
 
 error_setg() is still reporting on a bogus errno value at this point.

My bad, now it's my turn for early-morning confusion.

I was thinking of error_setg_errno, but you are using error_setg.

That said, you STILL have a problem:

 +if (baseport + port_offset  65535) {

If baseport is 0xfffe, and port_offset is 5000, then their
sum is 4998 which is not  65535, so you fall through:

 +error_setg(errp, port %s out of range, port);
 +return -1;
 +}
 +snprintf(port, sizeof(port), %d, (int)baseport + port_offset);

and happily use a value that is less than port_offset at this point.  I
don't think you meant to do that.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature

[Qemu-devel] [PATCH 07/11] qdev:pci: refactor PCIDevice to use generic hotpluggable property

2013-12-13 Thread Igor Mammedov

Get rid of PCIDevice specific PCIDeviceClass.no_hotplug and use
generic DeviceClass.hotpluggable field instead.

Signed-off-by: Igor Mammedov imamm...@redhat.com
---
v2:
* move generic hotplug checks to
  qdev: add hotpluggable property to Device patch
* s/hotplugable/hotpluggable/
---
 hw/acpi/piix4.c | 10 +-
 hw/display/cirrus_vga.c |  2 +-
 hw/display/qxl.c|  2 +-
 hw/display/vga-pci.c|  2 +-
 hw/display/vmware_vga.c |  2 +-
 hw/i386/acpi-build.c|  6 +++---
 hw/ide/piix.c   |  4 ++--
 hw/isa/piix4.c  |  2 +-
 hw/pci-host/piix.c  |  6 +++---
 hw/pci/pci.c| 11 +--
 hw/usb/hcd-ehci-pci.c   |  2 +-
 hw/usb/hcd-ohci.c   |  2 +-
 hw/usb/hcd-uhci.c   |  2 +-
 hw/usb/hcd-xhci.c   |  2 +-
 include/hw/pci/pci.h|  3 ---
 15 files changed, 23 insertions(+), 35 deletions(-)

diff --git a/hw/acpi/piix4.c b/hw/acpi/piix4.c
index 8084a60..fff2126 100644
--- a/hw/acpi/piix4.c
+++ b/hw/acpi/piix4.c
@@ -323,9 +323,9 @@ static void acpi_piix_eject_slot(PIIX4PMState *s, unsigned 
slots)
 QTAILQ_FOREACH_SAFE(kid, bus-children, sibling, next) {
 DeviceState *qdev = kid-child;
 PCIDevice *dev = PCI_DEVICE(qdev);
-PCIDeviceClass *pc = PCI_DEVICE_GET_CLASS(dev);
+DeviceClass *dc = DEVICE_GET_CLASS(dev);
 if (PCI_SLOT(dev-devfn) == slot) {
-if (pc-no_hotplug) {
+if (!dc-hotpluggable) {
 slot_free = false;
 } else {
 object_unparent(OBJECT(qdev));
@@ -353,10 +353,10 @@ static void piix4_update_hotplug(PIIX4PMState *s)
 QTAILQ_FOREACH_SAFE(kid, bus-children, sibling, next) {
 DeviceState *qdev = kid-child;
 PCIDevice *pdev = PCI_DEVICE(qdev);
-PCIDeviceClass *pc = PCI_DEVICE_GET_CLASS(pdev);
+DeviceClass *dc = DEVICE_GET_CLASS(qdev);
 int slot = PCI_SLOT(pdev-devfn);
 
-if (pc-no_hotplug) {
+if (!dc-hotpluggable) {
 s-pci0_hotplug_enable = ~(1U  slot);
 }
 
@@ -746,7 +746,6 @@ static void piix4_pm_class_init(ObjectClass *klass, void 
*data)
 DeviceClass *dc = DEVICE_CLASS(klass);
 PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
 
-k-no_hotplug = 1;
 k-init = piix4_pm_initfn;
 k-config_write = pm_write_config;
 k-vendor_id = PCI_VENDOR_ID_INTEL;
@@ -757,6 +756,7 @@ static void piix4_pm_class_init(ObjectClass *klass, void 
*data)
 dc-no_user = 1;
 dc-vmsd = vmstate_acpi;
 dc-props = piix4_pm_properties;
+dc-hotpluggable = false;
 }
 
 static const TypeInfo piix4_pm_info = {
diff --git a/hw/display/cirrus_vga.c b/hw/display/cirrus_vga.c
index e4c345f..3a8fc0b 100644
--- a/hw/display/cirrus_vga.c
+++ b/hw/display/cirrus_vga.c
@@ -2996,7 +2996,6 @@ static void cirrus_vga_class_init(ObjectClass *klass, 
void *data)
 DeviceClass *dc = DEVICE_CLASS(klass);
 PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
 
-k-no_hotplug = 1;
 k-init = pci_cirrus_vga_initfn;
 k-romfile = VGABIOS_CIRRUS_FILENAME;
 k-vendor_id = PCI_VENDOR_ID_CIRRUS;
@@ -3006,6 +3005,7 @@ static void cirrus_vga_class_init(ObjectClass *klass, 
void *data)
 dc-desc = Cirrus CLGD 54xx VGA;
 dc-vmsd = vmstate_pci_cirrus_vga;
 dc-props = pci_vga_cirrus_properties;
+dc-hotpluggable = false;
 }
 
 static const TypeInfo cirrus_vga_info = {
diff --git a/hw/display/qxl.c b/hw/display/qxl.c
index efdefd6..abb45ce 100644
--- a/hw/display/qxl.c
+++ b/hw/display/qxl.c
@@ -2289,7 +2289,6 @@ static void qxl_primary_class_init(ObjectClass *klass, 
void *data)
 DeviceClass *dc = DEVICE_CLASS(klass);
 PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
 
-k-no_hotplug = 1;
 k-init = qxl_init_primary;
 k-romfile = vgabios-qxl.bin;
 k-vendor_id = REDHAT_PCI_VENDOR_ID;
@@ -2300,6 +2299,7 @@ static void qxl_primary_class_init(ObjectClass *klass, 
void *data)
 dc-reset = qxl_reset_handler;
 dc-vmsd = qxl_vmstate;
 dc-props = qxl_properties;
+dc-hotpluggable = false;
 }
 
 static const TypeInfo qxl_primary_info = {
diff --git a/hw/display/vga-pci.c b/hw/display/vga-pci.c
index b3a45c8..f74fc43 100644
--- a/hw/display/vga-pci.c
+++ b/hw/display/vga-pci.c
@@ -190,7 +190,6 @@ static void vga_class_init(ObjectClass *klass, void *data)
 DeviceClass *dc = DEVICE_CLASS(klass);
 PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
 
-k-no_hotplug = 1;
 k-init = pci_std_vga_initfn;
 k-romfile = vgabios-stdvga.bin;
 k-vendor_id = PCI_VENDOR_ID_QEMU;
@@ -198,6 +197,7 @@ static void vga_class_init(ObjectClass *klass, void *data)
 k-class_id = PCI_CLASS_DISPLAY_VGA;
 dc-vmsd = vmstate_vga_pci;
 dc-props = vga_pci_properties;
+dc-hotpluggable = false;
 set_bit(DEVICE_CATEGORY_DISPLAY, dc-categories);
 }
 
diff --git a/hw/display/vmware_vga.c b/hw/display/vmware_vga.c
index aba292c..334e718 100644
--- a/hw/display/vmware_vga.c
+++ b/hw/display/vmware_vga.c
@@ -1296,7 +1296,6 @@ static void

[Qemu-devel] [PATCH 05/11] qdev: add hotpluggable property to Device

2013-12-13 Thread Igor Mammedov

Currently it's possible to make PCIDevice not hotpluggable by using
no_hotplug field of PCIDeviceClass. However it limits this
only to PCI devices and prevents from generalizing hotplug code.

So add similar field to DeviceClass so it could be reused with other
Devices and would allow to replace PCI specific hotplug callbacks
with generic implementation.

In addition expose field as hotpluggable readonly property, to make
it possible to get it via QOM interface.

Make DeviceClass hotpluggable by default as it was assumed before.

Signed-off-by: Igor Mammedov imamm...@redhat.com
---
v4:
* s/hotplugable/hotpluggable/

v3:
* make DeviceClass hotpluggable by default
  Since PCIDevice still uses internal no_hotlpug checks it shouldn't
  reggress. And follow up patch that converts PCIDevices to use
  hotpluggable property will take care about not hotpluggable PCI
  devices explicitly setting hotpluggable to false in their class_init().

* move generic hotplug checks from
  7/11 qdev:pci: refactor PCIDevice to use generic hotplugable property
  to this patch
---
 hw/core/qdev.c | 29 +
 include/hw/qdev-core.h |  3 +++
 2 files changed, 32 insertions(+)

diff --git a/hw/core/qdev.c b/hw/core/qdev.c
index 25c2d2c..9418fea 100644
--- a/hw/core/qdev.c
+++ b/hw/core/qdev.c
@@ -215,6 +215,12 @@ void qdev_unplug(DeviceState *dev, Error **errp)
 }
 assert(dc-unplug != NULL);
 
+if (!dc-hotpluggable) {
+error_set(errp, QERR_DEVICE_NO_HOTPLUG,
+  object_get_typename(OBJECT(dev)));
+return;
+}
+
 qdev_hot_removed = true;
 
 if (dc-unplug(dev)  0) {
@@ -679,6 +685,11 @@ static void device_set_realized(Object *obj, bool value, 
Error **err)
 DeviceClass *dc = DEVICE_GET_CLASS(dev);
 Error *local_err = NULL;
 
+if (dev-hotplugged  !dc-hotpluggable) {
+error_set(err, QERR_DEVICE_NO_HOTPLUG, object_get_typename(obj));
+return;
+}
+
 if (value  !dev-realized) {
 if (!obj-parent  local_err == NULL) {
 static int unattached_count;
@@ -719,6 +730,14 @@ static void device_set_realized(Object *obj, bool value, 
Error **err)
 dev-realized = value;
 }
 
+static bool device_get_hotpluggable(Object *obj, Error **err)
+{
+DeviceClass *dc = DEVICE_GET_CLASS(obj);
+DeviceState *dev = DEVICE(obj);
+
+return dc-hotpluggable  dev-parent_bus-allow_hotplug;
+}
+
 static void device_initfn(Object *obj)
 {
 DeviceState *dev = DEVICE(obj);
@@ -736,6 +755,8 @@ static void device_initfn(Object *obj)
 
 object_property_add_bool(obj, realized,
  device_get_realized, device_set_realized, NULL);
+object_property_add_bool(obj, hotpluggable,
+ device_get_hotpluggable, NULL, NULL);
 
 class = object_get_class(OBJECT(dev));
 do {
@@ -784,6 +805,14 @@ static void device_class_base_init(ObjectClass *class, 
void *data)
  * so do not propagate them to the subclasses.
  */
 klass-props = NULL;
+
+/* by default all devices were considered as hotpluggable,
+ * so with intent to check it in generic qdev_unplug() /
+ * device_set_realized() functions make every device
+ * hotpluggable. Devices that shouldn't be hoplugable,
+ * should override it in their class_init()
+ */
+klass-hotpluggable = true;
 }
 
 static void device_unparent(Object *obj)
diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
index 684a5da..04bbef4 100644
--- a/include/hw/qdev-core.h
+++ b/include/hw/qdev-core.h
@@ -50,6 +50,8 @@ struct VMStateDescription;
  * is changed to %true. Deprecated, new types inheriting directly from
  * TYPE_DEVICE should use @realize instead, new leaf types should consult
  * their respective parent type.
+ * @hotpluggable: booleean indicating if #DeviceClass is hotpluggable, 
available
+ * as readonly hotpluggable property of #DeviceState instance
  *
  * # Realization #
  * Devices are constructed in two stages,
@@ -99,6 +101,7 @@ typedef struct DeviceClass {
 const char *desc;
 Property *props;
 int no_user;
+bool hotpluggable;
 
 /* callbacks */
 void (*reset)(DeviceState *dev);
-- 
1.8.3.1

[Qemu-devel] [PATCH 09/11] pci/shpc: convert SHPC hotplug to use hotplug-handler API

2013-12-13 Thread Igor Mammedov

Split shpc_device_hotplug() into hotplug/unplug callbacks
and register them as hotplug-handler interface implementation of
PCI_BRIDGE_DEV device.

Replace pci_bus_hotplug() wiring with setting link on PCI BUS
hotplug-handler property to PCI_BRIDGE_DEV device.

Signed-off-by: Igor Mammedov imamm...@redhat.com
---
 hw/pci-bridge/pci_bridge_dev.c |   9 +++
 hw/pci/shpc.c  | 133 ++---
 include/hw/pci/shpc.h  |   8 +++
 3 files changed, 103 insertions(+), 47 deletions(-)

diff --git a/hw/pci-bridge/pci_bridge_dev.c b/hw/pci-bridge/pci_bridge_dev.c
index 440e187..e68145c 100644
--- a/hw/pci-bridge/pci_bridge_dev.c
+++ b/hw/pci-bridge/pci_bridge_dev.c
@@ -26,6 +26,7 @@
 #include hw/pci/slotid_cap.h
 #include exec/memory.h
 #include hw/pci/pci_bus.h
+#include hw/hotplug.h
 
 #define TYPE_PCI_BRIDGE_DEV pci-bridge
 #define PCI_BRIDGE_DEV(obj) \
@@ -136,6 +137,8 @@ static void pci_bridge_dev_class_init(ObjectClass *klass, 
void *data)
 {
 DeviceClass *dc = DEVICE_CLASS(klass);
 PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
+HotplugHandlerClass *hc = HOTPLUG_HANDLER_CLASS(klass);
+
 k-init = pci_bridge_dev_initfn;
 k-exit = pci_bridge_dev_exitfn;
 k-config_write = pci_bridge_dev_write_config;
@@ -148,6 +151,8 @@ static void pci_bridge_dev_class_init(ObjectClass *klass, 
void *data)
 dc-props = pci_bridge_dev_properties;
 dc-vmsd = pci_bridge_dev_vmstate;
 set_bit(DEVICE_CATEGORY_BRIDGE, dc-categories);
+hc-plug = shpc_device_hotplug_cb;
+hc-unplug = shpc_device_hot_unplug_cb;
 }
 
 static const TypeInfo pci_bridge_dev_info = {
@@ -155,6 +160,10 @@ static const TypeInfo pci_bridge_dev_info = {
 .parent= TYPE_PCI_BRIDGE,
 .instance_size = sizeof(PCIBridgeDev),
 .class_init = pci_bridge_dev_class_init,
+.interfaces = (InterfaceInfo[]) {
+{ TYPE_HOTPLUG_HANDLER },
+{ }
+}
 };
 
 static void pci_bridge_dev_register(void)
diff --git a/hw/pci/shpc.c b/hw/pci/shpc.c
index 576244b..92dbd95 100644
--- a/hw/pci/shpc.c
+++ b/hw/pci/shpc.c
@@ -7,6 +7,7 @@
 #include hw/pci/pci.h
 #include hw/pci/pci_bus.h
 #include hw/pci/msi.h
+#include qapi/qmp/qerror.h
 
 /* TODO: model power only and disabled slot states. */
 /* TODO: handle SERR and wakeups */
@@ -490,73 +491,103 @@ static const MemoryRegionOps shpc_mmio_ops = {
 .max_access_size = 4,
 },
 };
-
-static int shpc_device_hotplug(DeviceState *qdev, PCIDevice *affected_dev,
-   PCIHotplugState hotplug_state)
+static void shpc_device_hotplug_common(PCIDevice *affected_dev, int *slot,
+   SHPCDevice *shpc, Error **errp)
 {
 int pci_slot = PCI_SLOT(affected_dev-devfn);
-uint8_t state;
-uint8_t led;
-PCIDevice *d = DO_UPCAST(PCIDevice, qdev, qdev);
-SHPCDevice *shpc = d-shpc;
-int slot = SHPC_PCI_TO_IDX(pci_slot);
-if (pci_slot  SHPC_IDX_TO_PCI(0) || slot = shpc-nslots) {
-error_report(Unsupported PCI slot %d for standard hotplug 
- controller. Valid slots are between %d and %d.,
- pci_slot, SHPC_IDX_TO_PCI(0),
- SHPC_IDX_TO_PCI(shpc-nslots) - 1);
-return -1;
+*slot = SHPC_PCI_TO_IDX(pci_slot);
+
+if (pci_slot  SHPC_IDX_TO_PCI(0) || *slot = shpc-nslots) {
+error_setg(errp, Unsupported PCI slot %d for standard hotplug 
+   controller. Valid slots are between %d and %d.,
+   pci_slot, SHPC_IDX_TO_PCI(0),
+   SHPC_IDX_TO_PCI(shpc-nslots) - 1);
+return;
+}
+}
+
+void shpc_device_hotplug_cb(HotplugHandler *hotplug_dev, DeviceState *dev,
+Error **errp)
+{
+Error *local_err = NULL;
+PCIDevice *pci_hotplug_dev = PCI_DEVICE(hotplug_dev);
+SHPCDevice *shpc = pci_hotplug_dev-shpc;
+int slot;
+
+shpc_device_hotplug_common(PCI_DEVICE(dev), slot, shpc, local_err);
+if (error_is_set(local_err)) {
+error_propagate(errp, local_err);
+return;
 }
+
 /* Don't send event when device is enabled during qemu machine creation:
  * it is present on boot, no hotplug event is necessary. We do send an
  * event when the device is disabled later. */
-if (hotplug_state == PCI_COLDPLUG_ENABLED) {
+if (!dev-hotplugged) {
 shpc_set_status(shpc, slot, 0, SHPC_SLOT_STATUS_MRL_OPEN);
 shpc_set_status(shpc, slot, SHPC_SLOT_STATUS_PRSNT_7_5W,
 SHPC_SLOT_STATUS_PRSNT_MASK);
-return 0;
+return;
 }
-if (hotplug_state == PCI_HOTPLUG_DISABLED) {
-shpc-config[SHPC_SLOT_EVENT_LATCH(slot)] |= SHPC_SLOT_EVENT_BUTTON;
-state = shpc_get_status(shpc, slot, SHPC_SLOT_STATE_MASK);
-led = shpc_get_status(shpc, slot, SHPC_SLOT_PWR_LED_MASK);
-if (state == SHPC_STATE_DISABLED  led == SHPC_LED_OFF) {
-shpc_free_devices_in_slot(shpc, slot);
-

Re: [Qemu-devel] [PATCH 00/13] Freescale mxs/imx23 + Olimex Olinuxino support

2013-12-13 Thread M P

Can someone give me a pointer on how the review (if any) is done for these
patches? I have to say I'm rather amazed at the rate of submission on the
mailing list, and I worried to see these patches buried further and further
down in such a short timescale :-)

Michael



On Wed, Dec 11, 2013 at 1:56 PM, Michel Pollet buser...@gmail.com wrote:

 This series adds support for the imx233 SoC, and also adds support for
 emulating
 an Olinux Olinuxino board with a few peripherals, as a test harness.
 The emulation works pretty well, boots linux 3.12 vanilla from an emulated
 SD card,
 has USB bridge support (but no support for USB 1.1 devices like
 mouse+keyboard), RTC
 and quite a few other bits (some of them fairly skeletal)

 This series has been in used for quite a few months; it was posted here a
 few month
 back and one of the question was to wether I would stick around to support
 it.
 Perhaps the fact that I reworked it all on trunk and reposted it will help
 answer
 this question.

 This patch series is also available on this github branch, in case its'
 more
 convenient to use the inline comment function there.
 https://github.com/buserror-uk/qemu-buserror/commits/dev-imx233


 Michel Pollet (13):
   mxs/imx23: Add main header file
   mxs: Add CONFIG_MXS to the arm-softmmu config
   mxs/imx23: Add uart driver
   mxs/imx23: Add DMA driver
   mxs/imx23: Add the interrupt collector
   mxs/imx23: Add digctl driver
   mxs/imx23: Implements the pin mux, GPIOs
   mxs/imx23: Add SSP/SPI driver
   mxs/imx23: Add the RTC block
   mxs/imx23: Add the timers
   mxs/imx23: Add the USB driver
   mxs/imx23: Main core instantiation and minor IO blocks
   mxs/imx23: Adds support for an Olinuxino board

  default-configs/arm-softmmu.mak |   1 +
  hw/arm/Makefile.objs|   2 +
  hw/arm/imx233-olinuxino.c   | 169 +
  hw/arm/imx23_digctl.c   | 110 
  hw/arm/imx23_pinctrl.c  | 293 ++
  hw/arm/mxs.c| 388
 
  hw/arm/mxs.h| 208 +
  hw/char/Makefile.objs   |   1 +
  hw/char/mxs_uart.c  | 146 +++
  hw/dma/Makefile.objs|   1 +
  hw/dma/mxs_dma.c| 347 +++
  hw/intc/Makefile.objs   |   1 +
  hw/intc/mxs_icoll.c | 200 +
  hw/ssi/Makefile.objs|   1 +
  hw/ssi/mxs_spi.c| 239 +
  hw/timer/Makefile.objs  |   1 +
  hw/timer/mxs_rtc.c  | 147 +++
  hw/timer/mxs_timrot.c   | 271 
  hw/usb/Makefile.objs|   1 +
  hw/usb/mxs_usb.c| 254 ++
  20 files changed, 2781 insertions(+)
  create mode 100644 hw/arm/imx233-olinuxino.c
  create mode 100644 hw/arm/imx23_digctl.c
  create mode 100644 hw/arm/imx23_pinctrl.c
  create mode 100644 hw/arm/mxs.c
  create mode 100644 hw/arm/mxs.h
  create mode 100644 hw/char/mxs_uart.c
  create mode 100644 hw/dma/mxs_dma.c
  create mode 100644 hw/intc/mxs_icoll.c
  create mode 100644 hw/ssi/mxs_spi.c
  create mode 100644 hw/timer/mxs_rtc.c
  create mode 100644 hw/timer/mxs_timrot.c
  create mode 100644 hw/usb/mxs_usb.c

 --
 1.8.5.1

Re: [Qemu-devel] Occasional clockjump in Win2012 after Live Migration

2013-12-13 Thread Vadim Rozenfeld

On Fri, 2013-12-13 at 12:50 +0100, Peter Lieven wrote:
 Am 13.12.2013 11:10, schrieb Vadim Rozenfeld:
  On Fri, 2013-12-13 at 09:27 +0100, Peter Lieven wrote:
  Am 13.12.2013 05:12, schrieb Vadim Rozenfeld:
  Does your VM belong to domain or workgroup? 
  We had 2 vServers where this happened. One was a Domain Controller and the 
  second was an independent Workgroup Server.
 
  Do you have evidence how the DateTime Clock is driven in Windows 2012?
  Should be CMOS periodic timer.
 you mean the RTC?
Right.
 Ok, so if there is a huge clock jump this one must have got messed up?
It might be. We have seen some similar problem during WHQL testing some
time ago. The problem happened because time on the host was not set
correct. So guest was taking this wrong time on boot and then it was a
big jump after resync with domain controller.

best regards,
Vadim.   
 
 Peter

Re: [Qemu-devel] Occasional clockjump in Win2012 after Live Migration

2013-12-13 Thread Peter Lieven

Am 13.12.2013 13:53, schrieb Vadim Rozenfeld:
 On Fri, 2013-12-13 at 12:50 +0100, Peter Lieven wrote:
 Am 13.12.2013 11:10, schrieb Vadim Rozenfeld:
 On Fri, 2013-12-13 at 09:27 +0100, Peter Lieven wrote:
 Am 13.12.2013 05:12, schrieb Vadim Rozenfeld:
 Does your VM belong to domain or workgroup? 
 We had 2 vServers where this happened. One was a Domain Controller and the 
 second was an independent Workgroup Server.

 Do you have evidence how the DateTime Clock is driven in Windows 2012?
 Should be CMOS periodic timer.
 you mean the RTC?
 Right.
 Ok, so if there is a huge clock jump this one must have got messed up?
 It might be. We have seen some similar problem during WHQL testing some
 time ago. The problem happened because time on the host was not set
 correct. So guest was taking this wrong time on boot and then it was a
 big jump after resync with domain controller.
our system forbids starting of vservers on a host where the time is not 
synched. ;-)

the big jump i see seems to be introduced by the live migration. i will 
investigate further.

Peter

[Qemu-devel] [PATCH 10/11] pci/pcie: convert PCIE hotplug to use hotplug-handler API

2013-12-13 Thread Igor Mammedov

Split pcie_cap_slot_hotplug() into hotplug/unplug callbacks
and register them as hotplug-handler interface implementation of
PCIE_SLOT device.

Replace pci_bus_hotplug() wiring with setting link on PCI BUS
hotplug-handler property to PCI_BRIDGE_DEV device.

Signed-off-by: Igor Mammedov imamm...@redhat.com
---
 hw/pci/pcie.c | 73 +--
 hw/pci/pcie_port.c|  8 ++
 include/hw/pci/pcie.h |  5 
 3 files changed, 61 insertions(+), 25 deletions(-)

diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
index ca60cf2..d4e978c 100644
--- a/hw/pci/pcie.c
+++ b/hw/pci/pcie.c
@@ -26,6 +26,7 @@
 #include hw/pci/pci_bus.h
 #include hw/pci/pcie_regs.h
 #include qemu/range.h
+#include qapi/qmp/qerror.h
 
 //#define DEBUG_PCIE
 #ifdef DEBUG_PCIE
@@ -216,28 +217,20 @@ static void pcie_cap_slot_event(PCIDevice *dev, 
PCIExpressHotPlugEvent event)
 hotplug_event_notify(dev);
 }
 
-static int pcie_cap_slot_hotplug(DeviceState *qdev,
- PCIDevice *pci_dev, PCIHotplugState state)
+static void pcie_cap_slot_hotplug_common(PCIDevice *hotplug_dev,
+ DeviceState *dev,
+ uint8_t **exp_cap, Error **errp)
 {
-PCIDevice *d = PCI_DEVICE(qdev);
-uint8_t *exp_cap = d-config + d-exp.exp_cap;
-uint16_t sltsta = pci_get_word(exp_cap + PCI_EXP_SLTSTA);
-
-/* Don't send event when device is enabled during qemu machine creation:
- * it is present on boot, no hotplug event is necessary. We do send an
- * event when the device is disabled later. */
-if (state == PCI_COLDPLUG_ENABLED) {
-pci_word_test_and_set_mask(exp_cap + PCI_EXP_SLTSTA,
-   PCI_EXP_SLTSTA_PDS);
-return 0;
-}
+PCIDevice *pci_dev = PCI_DEVICE(dev);
+*exp_cap = hotplug_dev-config + hotplug_dev-exp.exp_cap;
+uint16_t sltsta = pci_get_word(*exp_cap + PCI_EXP_SLTSTA);
 
 PCIE_DEV_PRINTF(pci_dev, hotplug state: %d\n, state);
 if (sltsta  PCI_EXP_SLTSTA_EIS) {
 /* the slot is electromechanically locked.
  * This error is propagated up to qdev and then to HMP/QMP.
  */
-return -EBUSY;
+error_setg_errno(errp, -EBUSY, slot is electromechanically locked);
 }
 
 /* TODO: multifunction hot-plug.
@@ -245,18 +238,40 @@ static int pcie_cap_slot_hotplug(DeviceState *qdev,
  * hot plugged/unplugged.
  */
 assert(PCI_FUNC(pci_dev-devfn) == 0);
+}
 
-if (state == PCI_HOTPLUG_ENABLED) {
+void pcie_cap_slot_hotplug_cb(HotplugHandler *hotplug_dev, DeviceState *dev,
+  Error **errp)
+{
+uint8_t *exp_cap;
+
+pcie_cap_slot_hotplug_common(PCI_DEVICE(hotplug_dev), dev, exp_cap, errp);
+
+/* Don't send event when device is enabled during qemu machine creation:
+ * it is present on boot, no hotplug event is necessary. We do send an
+ * event when the device is disabled later. */
+if (!dev-hotplugged) {
 pci_word_test_and_set_mask(exp_cap + PCI_EXP_SLTSTA,
PCI_EXP_SLTSTA_PDS);
-pcie_cap_slot_event(d, PCI_EXP_HP_EV_PDC);
-} else {
-object_unparent(OBJECT(pci_dev));
-pci_word_test_and_clear_mask(exp_cap + PCI_EXP_SLTSTA,
- PCI_EXP_SLTSTA_PDS);
-pcie_cap_slot_event(d, PCI_EXP_HP_EV_PDC);
+return;
 }
-return 0;
+
+pci_word_test_and_set_mask(exp_cap + PCI_EXP_SLTSTA,
+   PCI_EXP_SLTSTA_PDS);
+pcie_cap_slot_event(PCI_DEVICE(hotplug_dev), PCI_EXP_HP_EV_PDC);
+}
+
+void pcie_cap_slot_hot_unplug_cb(HotplugHandler *hotplug_dev, DeviceState *dev,
+ Error **errp)
+{
+uint8_t *exp_cap;
+
+pcie_cap_slot_hotplug_common(PCI_DEVICE(hotplug_dev), dev, exp_cap, errp);
+
+object_unparent(OBJECT(dev));
+pci_word_test_and_clear_mask(exp_cap + PCI_EXP_SLTSTA,
+ PCI_EXP_SLTSTA_PDS);
+pcie_cap_slot_event(PCI_DEVICE(hotplug_dev), PCI_EXP_HP_EV_PDC);
 }
 
 /* pci express slot for pci express root/downstream port
@@ -264,6 +279,8 @@ static int pcie_cap_slot_hotplug(DeviceState *qdev,
 void pcie_cap_slot_init(PCIDevice *dev, uint16_t slot)
 {
 uint32_t pos = dev-exp.exp_cap;
+BusState *bus = BUS(pci_bridge_get_sec_bus(PCI_BRIDGE(dev)));
+Error *local_error = NULL;
 
 pci_word_test_and_set_mask(dev-config + pos + PCI_EXP_FLAGS,
PCI_EXP_FLAGS_SLOT);
@@ -305,8 +322,14 @@ void pcie_cap_slot_init(PCIDevice *dev, uint16_t slot)
 
 dev-exp.hpev_notified = false;
 
-pci_bus_hotplug(pci_bridge_get_sec_bus(PCI_BRIDGE(dev)),
-pcie_cap_slot_hotplug, dev-qdev);
+object_property_set_link(OBJECT(bus), OBJECT(dev),
+ QDEV_HOTPLUG_HANDLER_PROPERTY, local_error);
+if (error_is_set(local_error)) {
+

[Qemu-devel] [PATCH v2 07/24] block: rename buffer_alignment to guest_block_size

2013-12-13 Thread Kevin Wolf

From: Paolo Bonzini pbonz...@redhat.com

The alignment field is now set to the value that is promised to the
guest, rather than required by the host.  The next patches will make
QEMU aware of the host-provided values, so make this clear.

The alignment is also not about memory buffers, but about the sectors on
the disk, change the documentation of the field.

At this point, the field is set by the device emulation, but completely
ignored by the block layer.

Signed-off-by: Paolo Bonzini pbonz...@redhat.com
Signed-off-by: Kevin Wolf kw...@redhat.com
Reviewed-by: Wenchao Xia xiaw...@linux.vnet.ibm.com
---
 block.c   | 10 +-
 hw/block/virtio-blk.c |  2 +-
 hw/ide/core.c |  2 +-
 hw/scsi/scsi-disk.c   |  2 +-
 hw/scsi/scsi-generic.c|  2 +-
 include/block/block.h |  2 +-
 include/block/block_int.h |  4 ++--
 7 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/block.c b/block.c
index d50a4e4..3504d17 100644
--- a/block.c
+++ b/block.c
@@ -812,7 +812,7 @@ static int bdrv_open_common(BlockDriverState *bs, 
BlockDriverState *file,
 }
 
 bs-open_flags = flags;
-bs-buffer_alignment = 512;
+bs-guest_block_size = 512;
 bs-zero_beyond_eof = true;
 open_flags = bdrv_open_flags(bs, flags);
 bs-read_only = !(open_flags  BDRV_O_RDWR);
@@ -1648,7 +1648,7 @@ static void bdrv_move_feature_fields(BlockDriverState 
*bs_dest,
 bs_dest-dev_ops= bs_src-dev_ops;
 bs_dest-dev_opaque = bs_src-dev_opaque;
 bs_dest-dev= bs_src-dev;
-bs_dest-buffer_alignment   = bs_src-buffer_alignment;
+bs_dest-guest_block_size   = bs_src-guest_block_size;
 bs_dest-copy_on_read   = bs_src-copy_on_read;
 
 bs_dest-enable_write_cache = bs_src-enable_write_cache;
@@ -1800,7 +1800,7 @@ void bdrv_detach_dev(BlockDriverState *bs, void *dev)
 bs-dev = NULL;
 bs-dev_ops = NULL;
 bs-dev_opaque = NULL;
-bs-buffer_alignment = 512;
+bs-guest_block_size = 512;
 }
 
 /* TODO change to return DeviceState * when all users are qdevified */
@@ -4556,9 +4556,9 @@ BlockDriverAIOCB *bdrv_aio_ioctl(BlockDriverState *bs,
 return NULL;
 }
 
-void bdrv_set_buffer_alignment(BlockDriverState *bs, int align)
+void bdrv_set_guest_block_size(BlockDriverState *bs, int align)
 {
-bs-buffer_alignment = align;
+bs-guest_block_size = align;
 }
 
 void *qemu_blockalign(BlockDriverState *bs, size_t size)
diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index 13f6d82..323e9ec 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -720,7 +720,7 @@ static int virtio_blk_device_init(VirtIODevice *vdev)
 register_savevm(qdev, virtio-blk, virtio_blk_id++, 2,
 virtio_blk_save, virtio_blk_load, s);
 bdrv_set_dev_ops(s-bs, virtio_block_ops, s);
-bdrv_set_buffer_alignment(s-bs, s-conf-logical_block_size);
+bdrv_set_guest_block_size(s-bs, s-conf-logical_block_size);
 
 bdrv_iostatus_enable(s-bs);
 
diff --git a/hw/ide/core.c b/hw/ide/core.c
index e1f4c33..036cd4a 100644
--- a/hw/ide/core.c
+++ b/hw/ide/core.c
@@ -2103,7 +2103,7 @@ int ide_init_drive(IDEState *s, BlockDriverState *bs, 
IDEDriveKind kind,
 s-smart_selftest_count = 0;
 if (kind == IDE_CD) {
 bdrv_set_dev_ops(bs, ide_cd_block_ops, s);
-bdrv_set_buffer_alignment(bs, 2048);
+bdrv_set_guest_block_size(bs, 2048);
 } else {
 if (!bdrv_is_inserted(s-bs)) {
 error_report(Device needs media, but drive is empty);
diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c
index efadfc0..6cf6040 100644
--- a/hw/scsi/scsi-disk.c
+++ b/hw/scsi/scsi-disk.c
@@ -2225,7 +2225,7 @@ static int scsi_initfn(SCSIDevice *dev)
 } else {
 bdrv_set_dev_ops(s-qdev.conf.bs, scsi_disk_block_ops, s);
 }
-bdrv_set_buffer_alignment(s-qdev.conf.bs, s-qdev.blocksize);
+bdrv_set_guest_block_size(s-qdev.conf.bs, s-qdev.blocksize);
 
 bdrv_iostatus_enable(s-qdev.conf.bs);
 add_boot_device_path(s-qdev.conf.bootindex, dev-qdev, NULL);
diff --git a/hw/scsi/scsi-generic.c b/hw/scsi/scsi-generic.c
index 8f195be..f08b64e 100644
--- a/hw/scsi/scsi-generic.c
+++ b/hw/scsi/scsi-generic.c
@@ -210,7 +210,7 @@ static void scsi_read_complete(void * opaque, int ret)
 s-blocksize = ldl_be_p(r-buf[8]);
 s-max_lba = ldq_be_p(r-buf[0]);
 }
-bdrv_set_buffer_alignment(s-conf.bs, s-blocksize);
+bdrv_set_guest_block_size(s-conf.bs, s-blocksize);
 
 scsi_req_data(r-req, len);
 if (!r-req.io_canceled) {
diff --git a/include/block/block.h b/include/block/block.h
index cf63ee2..05252d5 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -422,7 +422,7 @@ void bdrv_img_create(const char *filename, const char *fmt,
 /* Returns the alignment in bytes that is required so that no bounce buffer
  * is required throughout the stack */
 size_t bdrv_opt_mem_align(BlockDriverState *bs);
-void

[Qemu-devel] [PATCH v2 00/24] block: Support for 512b-on-4k emulatio

2013-12-13 Thread Kevin Wolf

This patch series adds code to the block layer that allows performing
I/O requests in smaller granularities than required by the host backend
(most importantly, O_DIRECT restrictions). It achieves this for reads
by rounding the request to host-side block boundary, and for writes by
performing a read-modify-write cycle (and serialising requests
touching the same block so that the RMW doesn't write back stale data).

Originally I intended to reuse a lot of code from Paolo's previous
patch series, however as I tried to integrate pread/pwrite, which
already do a very similar thing (except for considering concurrency),
and because I wanted to implement zero-copy, most of this series ended
up being new code.

Zero-copy is possible in a common case because while XFS defauls to a
4k sector size and therefore 4k on-disk O_DIRECT alignment for 512E
disks, it still only has a 512 byte memory alignment requirement.
(Unfortunately the XFS_IOC_DIOINFO ioctl claims 4k even for memory, but
we know that the value is wrong and can probe it.)


Changes in v1 - v2:
- Fixed overlap_bytes calculation in mark_request_serialising()
- Fixed wait_serialising_requests() deadlock
- iscsi: Set bs-request_alignment [Peter]
- iscsi: Query block limits only in iscsi_open() when no other request
  are in flight, and in iscsi_refresh_limits() copy the stored values
  into bs-bl [Peter]

Changes in RFC - v1:
- Moved opt_mem_alignment into BlockLimits [Paolo]
- Changed BlockLimits in turn to work a bit more like the
  .bdrv_opt_mem_align() callback of the RFC; allows updating the
  BlockLimits later when the chain changes or bdrv_reopen() toggles
  O_DIRECT
- Fixed a typo in a commit message [Eric]

Kevin Wolf (21):
  block: Move initialisation of BlockLimits to bdrv_refresh_limits()
  block: Inherit opt_transfer_length
  block: Update BlockLimits when they might have changed
  qemu_memalign: Allow small alignments
  block: Detect unaligned length in bdrv_qiov_is_aligned()
  block: Don't use guest sector size for qemu_blockalign()
  block: Introduce bdrv_aligned_preadv()
  block: Introduce bdrv_co_do_preadv()
  block: Introduce bdrv_aligned_pwritev()
  block: write: Handle COR dependency after I/O throttling
  block: Introduce bdrv_co_do_pwritev()
  block: Switch BdrvTrackedRequest to byte granularity
  block: Allow waiting for overlapping requests between begin/end
  block: Make zero-after-EOF work with larger alignment
  block: Generalise and optimise COR serialisation
  block: Make overlap range for serialisation dynamic
  block: Allow wait_serialising_requests() at any point
  block: Align requests in bdrv_co_do_pwritev()
  block: Change coroutine wrapper to byte granularity
  block: Make bdrv_pread() a bdrv_prwv_co() wrapper
  block: Make bdrv_pwrite() a bdrv_prwv_co() wrapper

Paolo Bonzini (3):
  block: rename buffer_alignment to guest_block_size
  raw: Probe required direct I/O alignment
  iscsi: Set bs-request_alignment

 block.c   | 634 +++---
 block/backup.c|   7 +-
 block/iscsi.c |  47 ++--
 block/qcow2.c |  11 +-
 block/qed.c   |  11 +-
 block/raw-posix.c | 102 ++--
 block/raw-win32.c |  41 +++
 block/stream.c|   2 +
 block/vmdk.c  |  22 +-
 hw/block/virtio-blk.c |   2 +-
 hw/ide/core.c |   2 +-
 hw/scsi/scsi-disk.c   |   2 +-
 hw/scsi/scsi-generic.c|   2 +-
 include/block/block.h |   6 +-
 include/block/block_int.h |  27 +-
 util/oslib-posix.c|   5 +
 16 files changed, 673 insertions(+), 250 deletions(-)

-- 
1.8.1.4

[Qemu-devel] [PATCH v2 02/24] block: Inherit opt_transfer_length

2013-12-13 Thread Kevin Wolf

When there is a format driver between the backend, it's not guaranteed
that exposing the opt_transfer_length for the format driver results in
the optimal requests (because of fragmentation etc.), but it can't make
things worse, so let's just do it.

Signed-off-by: Kevin Wolf kw...@redhat.com
Reviewed-by: Wenchao Xia xiaw...@linux.vnet.ibm.com
---
 block.c | 20 +++-
 1 file changed, 19 insertions(+), 1 deletion(-)

diff --git a/block.c b/block.c
index 3bbcad4..79a325f 100644
--- a/block.c
+++ b/block.c
@@ -485,7 +485,25 @@ static int bdrv_refresh_limits(BlockDriverState *bs)
 
 memset(bs-bl, 0, sizeof(bs-bl));
 
-if (drv  drv-bdrv_refresh_limits) {
+if (!drv) {
+return 0;
+}
+
+/* Take some limits from the children as a default */
+if (bs-file) {
+bdrv_refresh_limits(bs-file);
+bs-bl.opt_transfer_length = bs-file-bl.opt_transfer_length;
+}
+
+if (bs-backing_hd) {
+bdrv_refresh_limits(bs-backing_hd);
+bs-bl.opt_transfer_length =
+MAX(bs-bl.opt_transfer_length,
+bs-backing_hd-bl.opt_transfer_length);
+}
+
+/* Then let the driver override it */
+if (drv-bdrv_refresh_limits) {
 return drv-bdrv_refresh_limits(bs);
 }
 
-- 
1.8.1.4

[Qemu-devel] [PATCH v2 03/24] block: Update BlockLimits when they might have changed

2013-12-13 Thread Kevin Wolf

When reopening with different flags, or when backing files disappear
from the chain, the limits may change. Make sure they get updated in
these cases.

Signed-off-by: Kevin Wolf kw...@redhat.com
Reviewed-by: Wenchao Xia xiaw...@linux.vnet.ibm.com
---
 block.c   | 5 -
 block/stream.c| 2 ++
 include/block/block.h | 1 +
 3 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/block.c b/block.c
index 79a325f..8eae359 100644
--- a/block.c
+++ b/block.c
@@ -479,7 +479,7 @@ int bdrv_create_file(const char* filename, 
QEMUOptionParameter *options,
 return ret;
 }
 
-static int bdrv_refresh_limits(BlockDriverState *bs)
+int bdrv_refresh_limits(BlockDriverState *bs)
 {
 BlockDriver *drv = bs-drv;
 
@@ -1464,6 +1464,8 @@ void bdrv_reopen_commit(BDRVReopenState *reopen_state)
 reopen_state-bs-enable_write_cache = !!(reopen_state-flags 
   BDRV_O_CACHE_WB);
 reopen_state-bs-read_only = !(reopen_state-flags  BDRV_O_RDWR);
+
+bdrv_refresh_limits(reopen_state-bs);
 }
 
 /*
@@ -2261,6 +2263,7 @@ int bdrv_drop_intermediate(BlockDriverState *active, 
BlockDriverState *top,
 }
 new_top_bs-backing_hd = base_bs;
 
+bdrv_refresh_limits(new_top_bs);
 
 QSIMPLEQ_FOREACH_SAFE(intermediate_state, states_to_delete, entry, next) {
 /* so that bdrv_close() does not recursively close the chain */
diff --git a/block/stream.c b/block/stream.c
index 46bec7d..dd0b4ac 100644
--- a/block/stream.c
+++ b/block/stream.c
@@ -75,6 +75,8 @@ static void close_unused_images(BlockDriverState *top, 
BlockDriverState *base,
 unused-backing_hd = NULL;
 bdrv_unref(unused);
 }
+
+bdrv_refresh_limits(top);
 }
 
 static void coroutine_fn stream_run(void *opaque)
diff --git a/include/block/block.h b/include/block/block.h
index 36efaea..64fb319 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -249,6 +249,7 @@ int bdrv_truncate(BlockDriverState *bs, int64_t offset);
 int64_t bdrv_getlength(BlockDriverState *bs);
 int64_t bdrv_get_allocated_file_size(BlockDriverState *bs);
 void bdrv_get_geometry(BlockDriverState *bs, uint64_t *nb_sectors_ptr);
+int bdrv_refresh_limits(BlockDriverState *bs);
 int bdrv_commit(BlockDriverState *bs);
 int bdrv_commit_all(void);
 int bdrv_change_backing_file(BlockDriverState *bs,
-- 
1.8.1.4

[Qemu-devel] [PATCH v2 04/24] qemu_memalign: Allow small alignments

2013-12-13 Thread Kevin Wolf

The functions used by qemu_memalign() require an alignment that is at
least sizeof(void*). Adjust it if it is too small.

Signed-off-by: Kevin Wolf kw...@redhat.com
Reviewed-by: Wenchao Xia xiaw...@linux.vnet.ibm.com
---
 util/oslib-posix.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/util/oslib-posix.c b/util/oslib-posix.c
index e00a44c..54f8932 100644
--- a/util/oslib-posix.c
+++ b/util/oslib-posix.c
@@ -85,6 +85,11 @@ void *qemu_oom_check(void *ptr)
 void *qemu_memalign(size_t alignment, size_t size)
 {
 void *ptr;
+
+if (alignment  sizeof(void*)) {
+alignment = sizeof(void*);
+}
+
 #if defined(_POSIX_C_SOURCE)  !defined(__sun__)
 int ret;
 ret = posix_memalign(ptr, alignment, size);
-- 
1.8.1.4

[Qemu-devel] [PATCH v2 01/24] block: Move initialisation of BlockLimits to bdrv_refresh_limits()

2013-12-13 Thread Kevin Wolf

This function separates filling the BlockLimits from bdrv_open(), which
allows it to call it from other operations which may change the limits
(e.g. modifications to the backing file chain or bdrv_reopen)

Signed-off-by: Kevin Wolf kw...@redhat.com
---
 block.c   | 19 +++
 block/iscsi.c | 46 +-
 block/qcow2.c | 11 ++-
 block/qed.c   | 11 ++-
 block/vmdk.c  | 22 ++
 include/block/block_int.h |  2 ++
 6 files changed, 88 insertions(+), 23 deletions(-)

diff --git a/block.c b/block.c
index 13f001a..3bbcad4 100644
--- a/block.c
+++ b/block.c
@@ -479,6 +479,19 @@ int bdrv_create_file(const char* filename, 
QEMUOptionParameter *options,
 return ret;
 }
 
+static int bdrv_refresh_limits(BlockDriverState *bs)
+{
+BlockDriver *drv = bs-drv;
+
+memset(bs-bl, 0, sizeof(bs-bl));
+
+if (drv  drv-bdrv_refresh_limits) {
+return drv-bdrv_refresh_limits(bs);
+}
+
+return 0;
+}
+
 /*
  * Create a uniquely-named empty temporary file.
  * Return 0 upon success, otherwise a negative errno value.
@@ -833,6 +846,8 @@ static int bdrv_open_common(BlockDriverState *bs, 
BlockDriverState *file,
 goto free_and_fail;
 }
 
+bdrv_refresh_limits(bs);
+
 #ifndef _WIN32
 if (bs-is_temporary) {
 assert(bs-filename[0] != '\0');
@@ -1018,6 +1033,10 @@ int bdrv_open_backing_file(BlockDriverState *bs, QDict 
*options, Error **errp)
 }
 pstrcpy(bs-backing_file, sizeof(bs-backing_file),
 bs-backing_hd-file-filename);
+
+/* Recalculate the BlockLimits with the backing file */
+bdrv_refresh_limits(bs);
+
 return 0;
 }
 
diff --git a/block/iscsi.c b/block/iscsi.c
index 829d444..18bc093 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -1443,23 +1443,6 @@ static int iscsi_open(BlockDriverState *bs, QDict 
*options, int flags,
sizeof(struct scsi_inquiry_block_limits));
 scsi_free_scsi_task(task);
 task = NULL;
-
-if (iscsilun-bl.max_unmap  0x) {
-bs-bl.max_discard = sector_lun2qemu(iscsilun-bl.max_unmap,
- iscsilun);
-}
-bs-bl.discard_alignment = sector_lun2qemu(iscsilun-bl.opt_unmap_gran,
-   iscsilun);
-
-if (iscsilun-bl.max_ws_len  0x) {
-bs-bl.max_write_zeroes = sector_lun2qemu(iscsilun-bl.max_ws_len,
-  iscsilun);
-}
-bs-bl.write_zeroes_alignment = 
sector_lun2qemu(iscsilun-bl.opt_unmap_gran,
-iscsilun);
-
-bs-bl.opt_transfer_length = sector_lun2qemu(iscsilun-bl.opt_xfer_len,
- iscsilun);
 }
 
 #if defined(LIBISCSI_FEATURE_NOP_COUNTER)
@@ -1504,6 +1487,34 @@ static void iscsi_close(BlockDriverState *bs)
 memset(iscsilun, 0, sizeof(IscsiLun));
 }
 
+static int iscsi_refresh_limits(BlockDriverState *bs)
+{
+IscsiLun *iscsilun = bs-opaque;
+
+/* We don't actually refresh here, but just return data queried in
+ * iscsi_open(): iscsi targets don't change their limits. */
+if (iscsilun-lbp.lbpu || iscsilun-lbp.lbpws) {
+if (iscsilun-bl.max_unmap  0x) {
+bs-bl.max_discard = sector_lun2qemu(iscsilun-bl.max_unmap,
+ iscsilun);
+}
+bs-bl.discard_alignment = sector_lun2qemu(iscsilun-bl.opt_unmap_gran,
+   iscsilun);
+
+if (iscsilun-bl.max_ws_len  0x) {
+bs-bl.max_write_zeroes = sector_lun2qemu(iscsilun-bl.max_ws_len,
+  iscsilun);
+}
+bs-bl.write_zeroes_alignment = 
sector_lun2qemu(iscsilun-bl.opt_unmap_gran,
+iscsilun);
+
+bs-bl.opt_transfer_length = sector_lun2qemu(iscsilun-bl.opt_xfer_len,
+ iscsilun);
+}
+
+return 0;
+}
+
 static int iscsi_truncate(BlockDriverState *bs, int64_t offset)
 {
 IscsiLun *iscsilun = bs-opaque;
@@ -1616,6 +1627,7 @@ static BlockDriver bdrv_iscsi = {
 .bdrv_getlength  = iscsi_getlength,
 .bdrv_get_info   = iscsi_get_info,
 .bdrv_truncate   = iscsi_truncate,
+.bdrv_refresh_limits = iscsi_refresh_limits,
 
 #if defined(LIBISCSI_FEATURE_IOVECTOR)
 .bdrv_co_get_block_status = iscsi_co_get_block_status,
diff --git a/block/qcow2.c b/block/qcow2.c
index f29aa88..f40355c 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -718,7 +718,6 @@ static int qcow2_open(BlockDriverState *bs, QDict *options, 
int flags,
 }
 
 qemu_opts_del(opts);
-bs-bl.write_zeroes_alignment = s-cluster_sectors;
 
 if

[Qemu-devel] [PATCH v2 05/24] block: Detect unaligned length in bdrv_qiov_is_aligned()

2013-12-13 Thread Kevin Wolf

For an O_DIRECT request to succeed, it's not only necessary that all
base addresses in the qiov are aligned, but also that each length in it
is aligned.

Signed-off-by: Kevin Wolf kw...@redhat.com
Reviewed-by: Wenchao Xia xiaw...@linux.vnet.ibm.com
---
 block.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/block.c b/block.c
index 8eae359..62c5a75 100644
--- a/block.c
+++ b/block.c
@@ -4561,6 +4561,9 @@ bool bdrv_qiov_is_aligned(BlockDriverState *bs, 
QEMUIOVector *qiov)
 if ((uintptr_t) qiov-iov[i].iov_base % bs-buffer_alignment) {
 return false;
 }
+if (qiov-iov[i].iov_len % bs-buffer_alignment) {
+return false;
+}
 }
 
 return true;
-- 
1.8.1.4

[Qemu-devel] [PATCH v2 08/24] raw: Probe required direct I/O alignment

2013-12-13 Thread Kevin Wolf

From: Paolo Bonzini pbonz...@redhat.com

Add a bs-request_alignment field that contains the required
offset/length alignment for I/O requests and fill it in the raw block
drivers. Use ioctls if possible, else see what alignment it takes for
O_DIRECT to succeed.

While at it, also expose the memory alignment requirements, which may be
(and in practice are) different from the disk alignment requirements.

Signed-off-by: Paolo Bonzini pbonz...@redhat.com
Signed-off-by: Kevin Wolf kw...@redhat.com
---
 block.c   |   3 ++
 block/raw-posix.c | 102 ++
 block/raw-win32.c |  41 +++
 include/block/block_int.h |   3 ++
 4 files changed, 132 insertions(+), 17 deletions(-)

diff --git a/block.c b/block.c
index 3504d17..b86e754 100644
--- a/block.c
+++ b/block.c
@@ -813,6 +813,7 @@ static int bdrv_open_common(BlockDriverState *bs, 
BlockDriverState *file,
 
 bs-open_flags = flags;
 bs-guest_block_size = 512;
+bs-request_alignment = 512;
 bs-zero_beyond_eof = true;
 open_flags = bdrv_open_flags(bs, flags);
 bs-read_only = !(open_flags  BDRV_O_RDWR);
@@ -881,6 +882,8 @@ static int bdrv_open_common(BlockDriverState *bs, 
BlockDriverState *file,
 }
 
 bdrv_refresh_limits(bs);
+assert(bdrv_opt_mem_align(bs) != 0);
+assert(bs-request_alignment != 0);
 
 #ifndef _WIN32
 if (bs-is_temporary) {
diff --git a/block/raw-posix.c b/block/raw-posix.c
index 10c6b34..e8e75a7 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -127,6 +127,8 @@ typedef struct BDRVRawState {
 int fd;
 int type;
 int open_flags;
+size_t buf_align;
+
 #if defined(__linux__)
 /* linux floppy specific */
 int64_t fd_open_time;
@@ -213,6 +215,76 @@ static int raw_normalize_devicepath(const char **filename)
 }
 #endif
 
+static void raw_probe_alignment(BlockDriverState *bs)
+{
+BDRVRawState *s = bs-opaque;
+char *buf;
+unsigned int sector_size;
+
+/* For /dev/sg devices the alignment is not really used.
+   With buffered I/O, we don't have any restrictions. */
+if (bs-sg || !(s-open_flags  O_DIRECT)) {
+bs-request_alignment = 1;
+s-buf_align = 1;
+return;
+}
+
+/* Try a few ioctls to get the right size */
+bs-request_alignment = 0;
+s-buf_align = 0;
+
+#ifdef BLKSSZGET
+if (ioctl(s-fd, BLKSSZGET, sector_size) = 0) {
+bs-request_alignment = sector_size;
+}
+#endif
+#ifdef DKIOCGETBLOCKSIZE
+if (ioctl(s-fd, DKIOCGETBLOCKSIZE, sector_size) = 0) {
+bs-request_alignment = sector_size;
+}
+#endif
+#ifdef DIOCGSECTORSIZE
+if (ioctl(s-fd, DIOCGSECTORSIZE, sector_size) = 0) {
+bs-request_alignment = sector_size;
+}
+#endif
+#ifdef CONFIG_XFS
+if (s-is_xfs) {
+struct dioattr da;
+if (xfsctl(NULL, s-fd, XFS_IOC_DIOINFO, da) = 0) {
+bs-request_alignment = da.d_miniosz;
+/* The kernel returns wrong information for d_mem */
+/* s-buf_align = da.d_mem; */
+}
+}
+#endif
+
+/* If we could not get the sizes so far, we can only guess them */
+if (!s-buf_align) {
+size_t align;
+buf = qemu_memalign(MAX_BLOCKSIZE, 2 * MAX_BLOCKSIZE);
+for (align = 512; align = MAX_BLOCKSIZE; align = 1) {
+if (pread(s-fd, buf + align, MAX_BLOCKSIZE, 0) = 0) {
+s-buf_align = align;
+break;
+}
+}
+qemu_vfree(buf);
+}
+
+if (!bs-request_alignment) {
+size_t align;
+buf = qemu_memalign(s-buf_align, MAX_BLOCKSIZE);
+for (align = 512; align = MAX_BLOCKSIZE; align = 1) {
+if (pread(s-fd, buf, align, 0) = 0) {
+bs-request_alignment = align;
+break;
+}
+}
+qemu_vfree(buf);
+}
+}
+
 static void raw_parse_flags(int bdrv_flags, int *open_flags)
 {
 assert(open_flags != NULL);
@@ -463,7 +535,6 @@ static int raw_reopen_prepare(BDRVReopenState *state,
 return ret;
 }
 
-
 static void raw_reopen_commit(BDRVReopenState *state)
 {
 BDRVRawReopenState *raw_s = state-opaque;
@@ -499,23 +570,15 @@ static void raw_reopen_abort(BDRVReopenState *state)
 state-opaque = NULL;
 }
 
+static int raw_refresh_limits(BlockDriverState *bs)
+{
+BDRVRawState *s = bs-opaque;
 
-/* XXX: use host sector size if necessary with:
-#ifdef DIOCGSECTORSIZE
-{
-unsigned int sectorsize = 512;
-if (!ioctl(fd, DIOCGSECTORSIZE, sectorsize) 
-sectorsize  bufsize)
-bufsize = sectorsize;
-}
-#endif
-#ifdef CONFIG_COCOA
-uint32_t blockSize = 512;
-if ( !ioctl( fd, DKIOCGETBLOCKSIZE, blockSize )  blockSize  
bufsize) {
-bufsize = blockSize;
-}
-#endif
-*/
+raw_probe_alignment(bs);
+bs-bl.opt_mem_alignment = s-buf_align;
+
+return 0;
+}
 
 static ssize_t

[Qemu-devel] [PATCH v2 10/24] block: Introduce bdrv_co_do_preadv()

2013-12-13 Thread Kevin Wolf

Similar to bdrv_pread(), which aligns byte-aligned request to 512 byte
sectors, bdrv_co_do_preadv() takes a byte-aligned request and aligns it
to the alignment specified in bs-request_alignment.

Signed-off-by: Kevin Wolf kw...@redhat.com
---
 block.c | 63 +--
 1 file changed, 57 insertions(+), 6 deletions(-)

diff --git a/block.c b/block.c
index cbc9f6d..7812d8f 100644
--- a/block.c
+++ b/block.c
@@ -2786,17 +2786,23 @@ out:
 /*
  * Handle a read request in coroutine context
  */
-static int coroutine_fn bdrv_co_do_readv(BlockDriverState *bs,
-int64_t sector_num, int nb_sectors, QEMUIOVector *qiov,
+static int coroutine_fn bdrv_co_do_preadv(BlockDriverState *bs,
+int64_t offset, unsigned int bytes, QEMUIOVector *qiov,
 BdrvRequestFlags flags)
 {
 BlockDriver *drv = bs-drv;
+/* TODO Lift BDRV_SECTOR_SIZE restriction in BlockDriver interface */
+uint64_t align = MAX(BDRV_SECTOR_SIZE, bs-request_alignment);
+uint8_t *head_buf = NULL;
+uint8_t *tail_buf = NULL;
+QEMUIOVector local_qiov;
+bool use_local_qiov = false;
 int ret;
 
 if (!drv) {
 return -ENOMEDIUM;
 }
-if (bdrv_check_request(bs, sector_num, nb_sectors)) {
+if (bdrv_check_byte_request(bs, offset, bytes)) {
 return -EIO;
 }
 
@@ -2806,14 +2812,59 @@ static int coroutine_fn 
bdrv_co_do_readv(BlockDriverState *bs,
 
 /* throttling disk I/O */
 if (bs-io_limits_enabled) {
-bdrv_io_limits_intercept(bs, nb_sectors, false);
+bdrv_io_limits_intercept(bs, bytes  BDRV_SECTOR_BITS, false);
+}
+
+/* Align read if necessary by padding qiov */
+if (offset  (align - 1)) {
+head_buf = qemu_blockalign(bs, align);
+qemu_iovec_init(local_qiov, qiov-niov + 2);
+qemu_iovec_add(local_qiov, head_buf, offset  (align - 1));
+qemu_iovec_concat(local_qiov, qiov, 0, qiov-size);
+use_local_qiov = true;
+
+bytes += offset  (align - 1);
+offset = offset  ~(align - 1);
+}
+
+if ((offset + bytes)  (align - 1)) {
+if (!use_local_qiov) {
+qemu_iovec_init(local_qiov, qiov-niov + 1);
+qemu_iovec_concat(local_qiov, qiov, 0, qiov-size);
+use_local_qiov = true;
+}
+tail_buf = qemu_blockalign(bs, align);
+qemu_iovec_add(local_qiov, tail_buf,
+   align - ((offset + bytes)  (align - 1)));
+
+bytes = ROUND_UP(bytes, align);
+}
+
+ret = bdrv_aligned_preadv(bs, offset, bytes,
+  use_local_qiov ? local_qiov : qiov,
+  flags);
+
+if (use_local_qiov) {
+qemu_iovec_destroy(local_qiov);
+qemu_vfree(head_buf);
+qemu_vfree(tail_buf);
 }
 
-ret = bdrv_aligned_preadv(bs, sector_num  BDRV_SECTOR_BITS,
- nb_sectors  BDRV_SECTOR_BITS, qiov, flags);
 return ret;
 }
 
+static int coroutine_fn bdrv_co_do_readv(BlockDriverState *bs,
+int64_t sector_num, int nb_sectors, QEMUIOVector *qiov,
+BdrvRequestFlags flags)
+{
+if (nb_sectors  0 || nb_sectors  (UINT_MAX  BDRV_SECTOR_BITS)) {
+return -EINVAL;
+}
+
+return bdrv_co_do_preadv(bs, sector_num  BDRV_SECTOR_BITS,
+ nb_sectors  BDRV_SECTOR_BITS, qiov, flags);
+}
+
 int coroutine_fn bdrv_co_readv(BlockDriverState *bs, int64_t sector_num,
 int nb_sectors, QEMUIOVector *qiov)
 {
-- 
1.8.1.4

[Qemu-devel] [PATCH v2 09/24] block: Introduce bdrv_aligned_preadv()

2013-12-13 Thread Kevin Wolf

This separates the part of bdrv_co_do_readv() that needs to happen
before the request is modified to match the backend alignment, and a
part that needs to be executed afterwards and passes the request to the
BlockDriver.

Signed-off-by: Kevin Wolf kw...@redhat.com
---
 block.c | 61 +++--
 1 file changed, 43 insertions(+), 18 deletions(-)

diff --git a/block.c b/block.c
index b86e754..cbc9f6d 100644
--- a/block.c
+++ b/block.c
@@ -2700,26 +2700,24 @@ err:
 }
 
 /*
- * Handle a read request in coroutine context
+ * Forwards an already correctly aligned request to the BlockDriver. This
+ * handles copy on read and zeroing after EOF; any other features must be
+ * implemented by the caller.
  */
-static int coroutine_fn bdrv_co_do_readv(BlockDriverState *bs,
-int64_t sector_num, int nb_sectors, QEMUIOVector *qiov,
-BdrvRequestFlags flags)
+static int coroutine_fn bdrv_aligned_preadv(BlockDriverState *bs,
+int64_t offset, unsigned int bytes, QEMUIOVector *qiov, int flags)
 {
 BlockDriver *drv = bs-drv;
 BdrvTrackedRequest req;
 int ret;
 
-if (!drv) {
-return -ENOMEDIUM;
-}
-if (bdrv_check_request(bs, sector_num, nb_sectors)) {
-return -EIO;
-}
+int64_t sector_num = offset  BDRV_SECTOR_BITS;
+unsigned int nb_sectors = bytes  BDRV_SECTOR_BITS;
 
-if (bs-copy_on_read) {
-flags |= BDRV_REQ_COPY_ON_READ;
-}
+assert((offset  (BDRV_SECTOR_SIZE - 1)) == 0);
+assert((bytes  (BDRV_SECTOR_SIZE - 1)) == 0);
+
+/* Handle Copy on Read and associated serialisation */
 if (flags  BDRV_REQ_COPY_ON_READ) {
 bs-copy_on_read_in_flight++;
 }
@@ -2728,11 +2726,6 @@ static int coroutine_fn 
bdrv_co_do_readv(BlockDriverState *bs,
 wait_for_overlapping_requests(bs, sector_num, nb_sectors);
 }
 
-/* throttling disk I/O */
-if (bs-io_limits_enabled) {
-bdrv_io_limits_intercept(bs, nb_sectors, false);
-}
-
 tracked_request_begin(req, bs, sector_num, nb_sectors, false);
 
 if (flags  BDRV_REQ_COPY_ON_READ) {
@@ -2749,6 +2742,7 @@ static int coroutine_fn bdrv_co_do_readv(BlockDriverState 
*bs,
 }
 }
 
+/* Forward the request to the BlockDriver */
 if (!(bs-zero_beyond_eof  bs-growable)) {
 ret = drv-bdrv_co_readv(bs, sector_num, nb_sectors, qiov);
 } else {
@@ -2789,6 +2783,37 @@ out:
 return ret;
 }
 
+/*
+ * Handle a read request in coroutine context
+ */
+static int coroutine_fn bdrv_co_do_readv(BlockDriverState *bs,
+int64_t sector_num, int nb_sectors, QEMUIOVector *qiov,
+BdrvRequestFlags flags)
+{
+BlockDriver *drv = bs-drv;
+int ret;
+
+if (!drv) {
+return -ENOMEDIUM;
+}
+if (bdrv_check_request(bs, sector_num, nb_sectors)) {
+return -EIO;
+}
+
+if (bs-copy_on_read) {
+flags |= BDRV_REQ_COPY_ON_READ;
+}
+
+/* throttling disk I/O */
+if (bs-io_limits_enabled) {
+bdrv_io_limits_intercept(bs, nb_sectors, false);
+}
+
+ret = bdrv_aligned_preadv(bs, sector_num  BDRV_SECTOR_BITS,
+ nb_sectors  BDRV_SECTOR_BITS, qiov, flags);
+return ret;
+}
+
 int coroutine_fn bdrv_co_readv(BlockDriverState *bs, int64_t sector_num,
 int nb_sectors, QEMUIOVector *qiov)
 {
-- 
1.8.1.4

[Qemu-devel] [PATCH v2 06/24] block: Don't use guest sector size for qemu_blockalign()

2013-12-13 Thread Kevin Wolf

bs-buffer_alignment is set by the device emulation and contains the
logical block size of the guest device. This isn't something that the
block layer should know, and even less something to use for determining
the right alignment of buffers to be used for the host.

The new BlockLimits field opt_mem_alignment tells the qemu block layer
the optimal alignment to be used so that no bounce buffer must be used
in the driver.

This patch may change the buffer alignment from 4k to 512 for all
callers that used qemu_blockalign() with the top-level image format
BlockDriverState. The value was never propagated to other levels in the
tree, so in particular raw-posix never required anything else than 512.

While on disks with 4k sectors direct I/O requires a 4k alignment,
memory may still be okay when aligned to 512 byte boundaries. This is
what must have happened in practice, because otherwise this would
already have failed earlier. Therefore I don't expect regressions even
with this intermediate state. Later, raw-posix can implement the hook
and expose a different memory alignment requirement.

Signed-off-by: Kevin Wolf kw...@redhat.com
Reviewed-by: Wenchao Xia xiaw...@linux.vnet.ibm.com
---
 block.c   | 23 ---
 include/block/block.h |  3 +++
 include/block/block_int.h |  3 +++
 3 files changed, 26 insertions(+), 3 deletions(-)

diff --git a/block.c b/block.c
index 62c5a75..d50a4e4 100644
--- a/block.c
+++ b/block.c
@@ -214,6 +214,16 @@ static void bdrv_io_limits_intercept(BlockDriverState *bs,
 qemu_co_queue_next(bs-throttled_reqs[is_write]);
 }
 
+size_t bdrv_opt_mem_align(BlockDriverState *bs)
+{
+if (!bs || !bs-drv) {
+/* 4k should be on the safe side */
+return 4096;
+}
+
+return bs-bl.opt_mem_alignment;
+}
+
 /* check if the path starts with protocol: */
 static int path_has_protocol(const char *path)
 {
@@ -493,6 +503,9 @@ int bdrv_refresh_limits(BlockDriverState *bs)
 if (bs-file) {
 bdrv_refresh_limits(bs-file);
 bs-bl.opt_transfer_length = bs-file-bl.opt_transfer_length;
+bs-bl.opt_mem_alignment = bs-file-bl.opt_mem_alignment;
+} else {
+bs-bl.opt_mem_alignment = 512;
 }
 
 if (bs-backing_hd) {
@@ -500,6 +513,9 @@ int bdrv_refresh_limits(BlockDriverState *bs)
 bs-bl.opt_transfer_length =
 MAX(bs-bl.opt_transfer_length,
 bs-backing_hd-bl.opt_transfer_length);
+bs-bl.opt_mem_alignment =
+MAX(bs-bl.opt_mem_alignment,
+bs-backing_hd-bl.opt_mem_alignment);
 }
 
 /* Then let the driver override it */
@@ -4547,7 +4563,7 @@ void bdrv_set_buffer_alignment(BlockDriverState *bs, int 
align)
 
 void *qemu_blockalign(BlockDriverState *bs, size_t size)
 {
-return qemu_memalign((bs  bs-buffer_alignment) ? bs-buffer_alignment : 
512, size);
+return qemu_memalign(bdrv_opt_mem_align(bs), size);
 }
 
 /*
@@ -4556,12 +4572,13 @@ void *qemu_blockalign(BlockDriverState *bs, size_t size)
 bool bdrv_qiov_is_aligned(BlockDriverState *bs, QEMUIOVector *qiov)
 {
 int i;
+size_t alignment = bdrv_opt_mem_align(bs);
 
 for (i = 0; i  qiov-niov; i++) {
-if ((uintptr_t) qiov-iov[i].iov_base % bs-buffer_alignment) {
+if ((uintptr_t) qiov-iov[i].iov_base % alignment) {
 return false;
 }
-if (qiov-iov[i].iov_len % bs-buffer_alignment) {
+if (qiov-iov[i].iov_len % alignment) {
 return false;
 }
 }
diff --git a/include/block/block.h b/include/block/block.h
index 64fb319..cf63ee2 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -419,6 +419,9 @@ void bdrv_img_create(const char *filename, const char *fmt,
  char *options, uint64_t img_size, int flags,
  Error **errp, bool quiet);
 
+/* Returns the alignment in bytes that is required so that no bounce buffer
+ * is required throughout the stack */
+size_t bdrv_opt_mem_align(BlockDriverState *bs);
 void bdrv_set_buffer_alignment(BlockDriverState *bs, int align);
 void *qemu_blockalign(BlockDriverState *bs, size_t size);
 bool bdrv_qiov_is_aligned(BlockDriverState *bs, QEMUIOVector *qiov);
diff --git a/include/block/block_int.h b/include/block/block_int.h
index c49fa6b..6ffae7c 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -252,6 +252,9 @@ typedef struct BlockLimits {
 
 /* optimal transfer length in sectors */
 int opt_transfer_length;
+
+/* memory alignment so that no bounce buffer is needed */
+size_t opt_mem_alignment;
 } BlockLimits;
 
 /*
-- 
1.8.1.4

[Qemu-devel] [PATCH v2 17/24] block: Generalise and optimise COR serialisation

2013-12-13 Thread Kevin Wolf

Change the API so that specific requests can be marked serialising. Only
these requests are checked for overlaps then.

This means that during a Copy on Read operation, not all requests
overlapping other requests are serialised any more, but only those that
actually overlap with the specific COR request.

Also remove COR from function and variable names because this
functionality can be useful in other contexts.

Signed-off-by: Kevin Wolf kw...@redhat.com
---
 block.c   | 48 ---
 include/block/block_int.h |  5 +++--
 2 files changed, 32 insertions(+), 21 deletions(-)

diff --git a/block.c b/block.c
index 6dddb7c..bf8b46e 100644
--- a/block.c
+++ b/block.c
@@ -2028,6 +2028,10 @@ int bdrv_commit_all(void)
  */
 static void tracked_request_end(BdrvTrackedRequest *req)
 {
+if (req-serialising) {
+req-bs-serialising_in_flight--;
+}
+
 QLIST_REMOVE(req, list);
 qemu_co_queue_restart_all(req-wait_queue);
 }
@@ -2042,10 +2046,11 @@ static void tracked_request_begin(BdrvTrackedRequest 
*req,
 {
 *req = (BdrvTrackedRequest){
 .bs = bs,
-.offset = offset,
-.bytes = bytes,
-.is_write = is_write,
-.co = qemu_coroutine_self(),
+.offset = offset,
+.bytes  = bytes,
+.is_write   = is_write,
+.co = qemu_coroutine_self(),
+.serialising= false,
 };
 
 qemu_co_queue_init(req-wait_queue);
@@ -2053,6 +2058,14 @@ static void tracked_request_begin(BdrvTrackedRequest 
*req,
 QLIST_INSERT_HEAD(bs-tracked_requests, req, list);
 }
 
+static void mark_request_serialising(BdrvTrackedRequest *req)
+{
+if (!req-serialising) {
+req-bs-serialising_in_flight++;
+req-serialising = true;
+}
+}
+
 /**
  * Round a region to cluster boundaries
  */
@@ -2105,26 +2118,31 @@ static bool tracked_request_overlaps(BdrvTrackedRequest 
*req,
 return true;
 }
 
-static void coroutine_fn wait_for_overlapping_requests(BlockDriverState *bs,
-BdrvTrackedRequest *self, int64_t offset, unsigned int bytes)
+static void coroutine_fn wait_serialising_requests(BdrvTrackedRequest *self)
 {
+BlockDriverState *bs = self-bs;
 BdrvTrackedRequest *req;
 int64_t cluster_offset;
 unsigned int cluster_bytes;
 bool retry;
 
+if (!bs-serialising_in_flight) {
+return;
+}
+
 /* If we touch the same cluster it counts as an overlap.  This guarantees
  * that allocating writes will be serialized and not race with each other
  * for the same cluster.  For example, in copy-on-read it ensures that the
  * CoR read and write operations are atomic and guest writes cannot
  * interleave between them.
  */
-round_bytes_to_clusters(bs, offset, bytes, cluster_offset, 
cluster_bytes);
+round_bytes_to_clusters(bs, self-offset, self-bytes,
+cluster_offset, cluster_bytes);
 
 do {
 retry = false;
 QLIST_FOREACH(req, bs-tracked_requests, list) {
-if (req == self) {
+if (req == self || (!req-serialising  !self-serialising)) {
 continue;
 }
 if (tracked_request_overlaps(req, cluster_offset, cluster_bytes)) {
@@ -2738,12 +2756,10 @@ static int coroutine_fn 
bdrv_aligned_preadv(BlockDriverState *bs,
 
 /* Handle Copy on Read and associated serialisation */
 if (flags  BDRV_REQ_COPY_ON_READ) {
-bs-copy_on_read_in_flight++;
+mark_request_serialising(req);
 }
 
-if (bs-copy_on_read_in_flight) {
-wait_for_overlapping_requests(bs, req, offset, bytes);
-}
+wait_serialising_requests(req);
 
 if (flags  BDRV_REQ_COPY_ON_READ) {
 int pnum;
@@ -2791,10 +2807,6 @@ static int coroutine_fn 
bdrv_aligned_preadv(BlockDriverState *bs,
 }
 
 out:
-if (flags  BDRV_REQ_COPY_ON_READ) {
-bs-copy_on_read_in_flight--;
-}
-
 return ret;
 }
 
@@ -2992,9 +3004,7 @@ static int coroutine_fn 
bdrv_aligned_pwritev(BlockDriverState *bs,
 assert((offset  (BDRV_SECTOR_SIZE - 1)) == 0);
 assert((bytes  (BDRV_SECTOR_SIZE - 1)) == 0);
 
-if (bs-copy_on_read_in_flight) {
-wait_for_overlapping_requests(bs, req, offset, bytes);
-}
+wait_serialising_requests(req);
 
 ret = notifier_with_return_list_notify(bs-before_write_notifiers, req);
 
diff --git a/include/block/block_int.h b/include/block/block_int.h
index a11e5c9..b00402b 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -60,6 +60,7 @@ typedef struct BdrvTrackedRequest {
 int64_t offset;
 unsigned int bytes;
 bool is_write;
+bool serialising;
 QLIST_ENTRY(BdrvTrackedRequest) list;
 Coroutine *co; /* owner, used for deadlock detection */
 CoQueue wait_queue; /* coroutines blocked on this request */
@@ -296,8 +297,8 @@ struct BlockDriverState {
 /* Callback before write request is processed

[Qemu-devel] [PATCH v2 11/24] block: Introduce bdrv_aligned_pwritev()

2013-12-13 Thread Kevin Wolf

This separates the part of bdrv_co_do_writev() that needs to happen
before the request is modified to match the backend alignment, and a
part that needs to be executed afterwards and passes the request to the
BlockDriver.

Signed-off-by: Kevin Wolf kw...@redhat.com
---
 block.c | 62 +-
 1 file changed, 41 insertions(+), 21 deletions(-)

diff --git a/block.c b/block.c
index 7812d8f..e26e31c 100644
--- a/block.c
+++ b/block.c
@@ -2958,34 +2958,20 @@ static int coroutine_fn 
bdrv_co_do_write_zeroes(BlockDriverState *bs,
 }
 
 /*
- * Handle a write request in coroutine context
+ * Forwards an already correctly aligned write request to the BlockDriver.
  */
-static int coroutine_fn bdrv_co_do_writev(BlockDriverState *bs,
-int64_t sector_num, int nb_sectors, QEMUIOVector *qiov,
-BdrvRequestFlags flags)
+static int coroutine_fn bdrv_aligned_pwritev(BlockDriverState *bs,
+int64_t offset, unsigned int bytes, QEMUIOVector *qiov, int flags)
 {
 BlockDriver *drv = bs-drv;
 BdrvTrackedRequest req;
 int ret;
 
-if (!bs-drv) {
-return -ENOMEDIUM;
-}
-if (bs-read_only) {
-return -EACCES;
-}
-if (bdrv_check_request(bs, sector_num, nb_sectors)) {
-return -EIO;
-}
-
-if (bs-copy_on_read_in_flight) {
-wait_for_overlapping_requests(bs, sector_num, nb_sectors);
-}
+int64_t sector_num = offset  BDRV_SECTOR_BITS;
+unsigned int nb_sectors = bytes  BDRV_SECTOR_BITS;
 
-/* throttling disk I/O */
-if (bs-io_limits_enabled) {
-bdrv_io_limits_intercept(bs, nb_sectors, true);
-}
+assert((offset  (BDRV_SECTOR_SIZE - 1)) == 0);
+assert((bytes  (BDRV_SECTOR_SIZE - 1)) == 0);
 
 tracked_request_begin(req, bs, sector_num, nb_sectors, true);
 
@@ -3017,6 +3003,40 @@ static int coroutine_fn 
bdrv_co_do_writev(BlockDriverState *bs,
 return ret;
 }
 
+/*
+ * Handle a write request in coroutine context
+ */
+static int coroutine_fn bdrv_co_do_writev(BlockDriverState *bs,
+int64_t sector_num, int nb_sectors, QEMUIOVector *qiov,
+BdrvRequestFlags flags)
+{
+int ret;
+
+if (!bs-drv) {
+return -ENOMEDIUM;
+}
+if (bs-read_only) {
+return -EACCES;
+}
+if (bdrv_check_request(bs, sector_num, nb_sectors)) {
+return -EIO;
+}
+
+if (bs-copy_on_read_in_flight) {
+wait_for_overlapping_requests(bs, sector_num, nb_sectors);
+}
+
+/* throttling disk I/O */
+if (bs-io_limits_enabled) {
+bdrv_io_limits_intercept(bs, nb_sectors, true);
+}
+
+ret = bdrv_aligned_pwritev(bs, sector_num  BDRV_SECTOR_BITS,
+   nb_sectors  BDRV_SECTOR_BITS, qiov, flags);
+
+return ret;
+}
+
 int coroutine_fn bdrv_co_writev(BlockDriverState *bs, int64_t sector_num,
 int nb_sectors, QEMUIOVector *qiov)
 {
-- 
1.8.1.4

[Qemu-devel] [PATCH v2 13/24] block: Introduce bdrv_co_do_pwritev()

2013-12-13 Thread Kevin Wolf

This is going to become the bdrv_co_do_preadv() equivalent for writes.
In this patch, however, just a function taking byte offsets is created,
it doesn't align anything yet.

Signed-off-by: Kevin Wolf kw...@redhat.com
---
 block.c | 23 +--
 1 file changed, 17 insertions(+), 6 deletions(-)

diff --git a/block.c b/block.c
index 385fb8a..a80db2e 100644
--- a/block.c
+++ b/block.c
@@ -3010,8 +3010,8 @@ static int coroutine_fn 
bdrv_aligned_pwritev(BlockDriverState *bs,
 /*
  * Handle a write request in coroutine context
  */
-static int coroutine_fn bdrv_co_do_writev(BlockDriverState *bs,
-int64_t sector_num, int nb_sectors, QEMUIOVector *qiov,
+static int coroutine_fn bdrv_co_do_pwritev(BlockDriverState *bs,
+int64_t offset, unsigned int bytes, QEMUIOVector *qiov,
 BdrvRequestFlags flags)
 {
 int ret;
@@ -3022,21 +3022,32 @@ static int coroutine_fn 
bdrv_co_do_writev(BlockDriverState *bs,
 if (bs-read_only) {
 return -EACCES;
 }
-if (bdrv_check_request(bs, sector_num, nb_sectors)) {
+if (bdrv_check_byte_request(bs, offset, bytes)) {
 return -EIO;
 }
 
 /* throttling disk I/O */
 if (bs-io_limits_enabled) {
-bdrv_io_limits_intercept(bs, nb_sectors, true);
+bdrv_io_limits_intercept(bs, bytes  BDRV_SECTOR_BITS, true);
 }
 
-ret = bdrv_aligned_pwritev(bs, sector_num  BDRV_SECTOR_BITS,
-   nb_sectors  BDRV_SECTOR_BITS, qiov, flags);
+ret = bdrv_aligned_pwritev(bs, offset, bytes, qiov, flags);
 
 return ret;
 }
 
+static int coroutine_fn bdrv_co_do_writev(BlockDriverState *bs,
+int64_t sector_num, int nb_sectors, QEMUIOVector *qiov,
+BdrvRequestFlags flags)
+{
+if (nb_sectors  0 || nb_sectors  (UINT_MAX  BDRV_SECTOR_BITS)) {
+return -EINVAL;
+}
+
+return bdrv_co_do_pwritev(bs, sector_num  BDRV_SECTOR_BITS,
+  nb_sectors  BDRV_SECTOR_BITS, qiov, flags);
+}
+
 int coroutine_fn bdrv_co_writev(BlockDriverState *bs, int64_t sector_num,
 int nb_sectors, QEMUIOVector *qiov)
 {
-- 
1.8.1.4

[Qemu-devel] [PATCH v2 12/24] block: write: Handle COR dependency after I/O throttling

2013-12-13 Thread Kevin Wolf

First waiting for all COR requests to complete and calling the
throttling function afterwards means that the request could be delayed
and we still need to wait for the COR request even if it was issued only
after the throttled write request.

Signed-off-by: Kevin Wolf kw...@redhat.com
---
 block.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/block.c b/block.c
index e26e31c..385fb8a 100644
--- a/block.c
+++ b/block.c
@@ -2973,6 +2973,10 @@ static int coroutine_fn 
bdrv_aligned_pwritev(BlockDriverState *bs,
 assert((offset  (BDRV_SECTOR_SIZE - 1)) == 0);
 assert((bytes  (BDRV_SECTOR_SIZE - 1)) == 0);
 
+if (bs-copy_on_read_in_flight) {
+wait_for_overlapping_requests(bs, sector_num, nb_sectors);
+}
+
 tracked_request_begin(req, bs, sector_num, nb_sectors, true);
 
 ret = notifier_with_return_list_notify(bs-before_write_notifiers, req);
@@ -3022,10 +3026,6 @@ static int coroutine_fn 
bdrv_co_do_writev(BlockDriverState *bs,
 return -EIO;
 }
 
-if (bs-copy_on_read_in_flight) {
-wait_for_overlapping_requests(bs, sector_num, nb_sectors);
-}
-
 /* throttling disk I/O */
 if (bs-io_limits_enabled) {
 bdrv_io_limits_intercept(bs, nb_sectors, true);
-- 
1.8.1.4

[Qemu-devel] [PATCH v2 14/24] block: Switch BdrvTrackedRequest to byte granularity

2013-12-13 Thread Kevin Wolf

Signed-off-by: Kevin Wolf kw...@redhat.com
---
 block.c   | 52 +++
 block/backup.c|  7 ++-
 include/block/block_int.h |  4 ++--
 3 files changed, 42 insertions(+), 21 deletions(-)

diff --git a/block.c b/block.c
index a80db2e..fa888d9 100644
--- a/block.c
+++ b/block.c
@@ -2037,13 +2037,13 @@ static void tracked_request_end(BdrvTrackedRequest *req)
  */
 static void tracked_request_begin(BdrvTrackedRequest *req,
   BlockDriverState *bs,
-  int64_t sector_num,
-  int nb_sectors, bool is_write)
+  int64_t offset,
+  unsigned int bytes, bool is_write)
 {
 *req = (BdrvTrackedRequest){
 .bs = bs,
-.sector_num = sector_num,
-.nb_sectors = nb_sectors,
+.offset = offset,
+.bytes = bytes,
 .is_write = is_write,
 .co = qemu_coroutine_self(),
 };
@@ -2074,25 +2074,43 @@ void bdrv_round_to_clusters(BlockDriverState *bs,
 }
 }
 
+static void round_bytes_to_clusters(BlockDriverState *bs,
+int64_t offset, unsigned int bytes,
+int64_t *cluster_offset,
+unsigned int *cluster_bytes)
+{
+BlockDriverInfo bdi;
+
+if (bdrv_get_info(bs, bdi)  0 || bdi.cluster_size == 0) {
+*cluster_offset = offset;
+*cluster_bytes = bytes;
+} else {
+*cluster_offset = QEMU_ALIGN_DOWN(offset, bdi.cluster_size);
+*cluster_bytes = QEMU_ALIGN_UP(offset - *cluster_offset + bytes,
+   bdi.cluster_size);
+}
+}
+
 static bool tracked_request_overlaps(BdrvTrackedRequest *req,
- int64_t sector_num, int nb_sectors) {
+ int64_t offset, int bytes)
+{
 /*    */
-if (sector_num = req-sector_num + req-nb_sectors) {
+if (offset = req-offset + req-bytes) {
 return false;
 }
 /*    */
-if (req-sector_num = sector_num + nb_sectors) {
+if (req-offset = offset + bytes) {
 return false;
 }
 return true;
 }
 
 static void coroutine_fn wait_for_overlapping_requests(BlockDriverState *bs,
-int64_t sector_num, int nb_sectors)
+int64_t offset, unsigned int bytes)
 {
 BdrvTrackedRequest *req;
-int64_t cluster_sector_num;
-int cluster_nb_sectors;
+int64_t cluster_offset;
+unsigned int cluster_bytes;
 bool retry;
 
 /* If we touch the same cluster it counts as an overlap.  This guarantees
@@ -2101,14 +2119,12 @@ static void coroutine_fn 
wait_for_overlapping_requests(BlockDriverState *bs,
  * CoR read and write operations are atomic and guest writes cannot
  * interleave between them.
  */
-bdrv_round_to_clusters(bs, sector_num, nb_sectors,
-   cluster_sector_num, cluster_nb_sectors);
+round_bytes_to_clusters(bs, offset, bytes, cluster_offset, 
cluster_bytes);
 
 do {
 retry = false;
 QLIST_FOREACH(req, bs-tracked_requests, list) {
-if (tracked_request_overlaps(req, cluster_sector_num,
- cluster_nb_sectors)) {
+if (tracked_request_overlaps(req, cluster_offset, cluster_bytes)) {
 /* Hitting this means there was a reentrant request, for
  * example, a block driver issuing nested requests.  This must
  * never happen since it means deadlock.
@@ -2723,10 +2739,10 @@ static int coroutine_fn 
bdrv_aligned_preadv(BlockDriverState *bs,
 }
 
 if (bs-copy_on_read_in_flight) {
-wait_for_overlapping_requests(bs, sector_num, nb_sectors);
+wait_for_overlapping_requests(bs, offset, bytes);
 }
 
-tracked_request_begin(req, bs, sector_num, nb_sectors, false);
+tracked_request_begin(req, bs, offset, bytes, false);
 
 if (flags  BDRV_REQ_COPY_ON_READ) {
 int pnum;
@@ -2974,10 +2990,10 @@ static int coroutine_fn 
bdrv_aligned_pwritev(BlockDriverState *bs,
 assert((bytes  (BDRV_SECTOR_SIZE - 1)) == 0);
 
 if (bs-copy_on_read_in_flight) {
-wait_for_overlapping_requests(bs, sector_num, nb_sectors);
+wait_for_overlapping_requests(bs, offset, bytes);
 }
 
-tracked_request_begin(req, bs, sector_num, nb_sectors, true);
+tracked_request_begin(req, bs, offset, bytes, true);
 
 ret = notifier_with_return_list_notify(bs-before_write_notifiers, req);
 
diff --git a/block/backup.c b/block/backup.c
index 0198514..15a2e55 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -181,8 +181,13 @@ static int coroutine_fn backup_before_write_notify(
 void *opaque)
 {
 BdrvTrackedRequest *req = opaque;
+int64_t sector_num = req-offset  BDRV_SECTOR_BITS;
+int

[Qemu-devel] [PATCH v2 24/24] iscsi: Set bs-request_alignment

2013-12-13 Thread Kevin Wolf

From: Paolo Bonzini pbonz...@redhat.com

The iSCSI backend already gets the block size from the READ CAPACITY
command it sends.  Save it so that the generic block layer gets it
too.

Signed-off-by: Paolo Bonzini pbonz...@redhat.com
Signed-off-by: Kevin Wolf kw...@redhat.com
---
 block/iscsi.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/block/iscsi.c b/block/iscsi.c
index 18bc093..039b8ad 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -1395,6 +1395,7 @@ static int iscsi_open(BlockDriverState *bs, QDict 
*options, int flags,
 goto out;
 }
 bs-total_sectors = sector_lun2qemu(iscsilun-num_blocks, iscsilun);
+bs-request_alignment = iscsilun-block_size;
 
 /* Medium changer or tape. We dont have any emulation for this so this must
  * be sg ioctl compatible. We force it to be sg, otherwise qemu will try
-- 
1.8.1.4

[Qemu-devel] [PATCH v2 19/24] block: Allow wait_serialising_requests() at any point

2013-12-13 Thread Kevin Wolf

We can only have a single wait_serialising_requests() call per request
because otherwise we can run into deadlocks where requests are waiting
for each other. The same is true when wait_serialising_requests() is not
at the very beginning of a request, so that other requests can be issued
between the start of the tracking and wait_serialising_requests().

Fix this by changing wait_serialising_requests() to ignore requests that
are already (directly or indirectly) waiting for the calling request.

Signed-off-by: Kevin Wolf kw...@redhat.com
---
 block.c   | 26 +++---
 include/block/block_int.h |  2 ++
 2 files changed, 25 insertions(+), 3 deletions(-)

diff --git a/block.c b/block.c
index 73bba47..8a0225d 100644
--- a/block.c
+++ b/block.c
@@ -2123,6 +2123,20 @@ static bool tracked_request_overlaps(BdrvTrackedRequest 
*req,
 return true;
 }
 
+static bool coroutine_fn is_waiting_for(BdrvTrackedRequest *waiting,
+BdrvTrackedRequest *waited_for)
+{
+BdrvTrackedRequest *req;
+
+for (req = waiting; req != NULL; req = req-waiting_for) {
+if (req == waited_for) {
+return true;
+}
+}
+
+return false;
+}
+
 static void coroutine_fn wait_serialising_requests(BdrvTrackedRequest *self)
 {
 BlockDriverState *bs = self-bs;
@@ -2148,9 +2162,15 @@ static void coroutine_fn 
wait_serialising_requests(BdrvTrackedRequest *self)
  */
 assert(qemu_coroutine_self() != req-co);
 
-qemu_co_queue_wait(req-wait_queue);
-retry = true;
-break;
+/* If the request is already (indircetly) waiting for us, then
+ * just go on instead of producing a deadlock. */
+if (!is_waiting_for(req, self)) {
+self-waiting_for = req;
+qemu_co_queue_wait(req-wait_queue);
+self-waiting_for = NULL;
+retry = true;
+break;
+}
 }
 }
 } while (retry);
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 1aee02b..0f7fcef 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -68,6 +68,8 @@ typedef struct BdrvTrackedRequest {
 QLIST_ENTRY(BdrvTrackedRequest) list;
 Coroutine *co; /* owner, used for deadlock detection */
 CoQueue wait_queue; /* coroutines blocked on this request */
+
+struct BdrvTrackedRequest *waiting_for;
 } BdrvTrackedRequest;
 
 struct BlockDriver {
-- 
1.8.1.4

[Qemu-devel] [PATCH v2 16/24] block: Make zero-after-EOF work with larger alignment

2013-12-13 Thread Kevin Wolf

Odd file sizes could make bdrv_aligned_preadv() shorten the request in
non-aligned ways. Fix it by rounding to the required alignment instead
of 512 bytes.

Signed-off-by: Kevin Wolf kw...@redhat.com
---
 block.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/block.c b/block.c
index b4f6ead..6dddb7c 100644
--- a/block.c
+++ b/block.c
@@ -2725,7 +2725,7 @@ err:
  */
 static int coroutine_fn bdrv_aligned_preadv(BlockDriverState *bs,
 BdrvTrackedRequest *req, int64_t offset, unsigned int bytes,
-QEMUIOVector *qiov, int flags)
+int64_t align, QEMUIOVector *qiov, int flags)
 {
 BlockDriver *drv = bs-drv;
 int ret;
@@ -2773,7 +2773,7 @@ static int coroutine_fn 
bdrv_aligned_preadv(BlockDriverState *bs,
 }
 
 total_sectors = DIV_ROUND_UP(len, BDRV_SECTOR_SIZE);
-max_nb_sectors = MAX(0, total_sectors - sector_num);
+max_nb_sectors = MAX(0, ROUND_UP(total_sectors - sector_num, align));
 if (max_nb_sectors  0) {
 ret = drv-bdrv_co_readv(bs, sector_num,
  MIN(nb_sectors, max_nb_sectors), qiov);
@@ -2858,7 +2858,7 @@ static int coroutine_fn 
bdrv_co_do_preadv(BlockDriverState *bs,
 }
 
 tracked_request_begin(req, bs, offset, bytes, false);
-ret = bdrv_aligned_preadv(bs, req, offset, bytes,
+ret = bdrv_aligned_preadv(bs, req, offset, bytes, align,
   use_local_qiov ? local_qiov : qiov,
   flags);
 tracked_request_end(req);
-- 
1.8.1.4

[Qemu-devel] [PATCH v2 22/24] block: Make bdrv_pread() a bdrv_prwv_co() wrapper

2013-12-13 Thread Kevin Wolf

Instead of implementing the alignment adjustment here, use the now
existing functionality of bdrv_co_do_preadv().

Signed-off-by: Kevin Wolf kw...@redhat.com
---
 block.c | 49 +
 1 file changed, 13 insertions(+), 36 deletions(-)

diff --git a/block.c b/block.c
index 5b00d23..ab165c5 100644
--- a/block.c
+++ b/block.c
@@ -2545,49 +2545,26 @@ int bdrv_make_zero(BlockDriverState *bs, 
BdrvRequestFlags flags)
 }
 }
 
-int bdrv_pread(BlockDriverState *bs, int64_t offset,
-   void *buf, int count1)
+int bdrv_pread(BlockDriverState *bs, int64_t offset, void *buf, int bytes)
 {
-uint8_t tmp_buf[BDRV_SECTOR_SIZE];
-int len, nb_sectors, count;
-int64_t sector_num;
+QEMUIOVector qiov;
+struct iovec iov = {
+.iov_base = (void *)buf,
+.iov_len = bytes,
+};
 int ret;
 
-count = count1;
-/* first read to align to sector start */
-len = (BDRV_SECTOR_SIZE - offset)  (BDRV_SECTOR_SIZE - 1);
-if (len  count)
-len = count;
-sector_num = offset  BDRV_SECTOR_BITS;
-if (len  0) {
-if ((ret = bdrv_read(bs, sector_num, tmp_buf, 1))  0)
-return ret;
-memcpy(buf, tmp_buf + (offset  (BDRV_SECTOR_SIZE - 1)), len);
-count -= len;
-if (count == 0)
-return count1;
-sector_num++;
-buf += len;
+if (bytes  0) {
+return -EINVAL;
 }
 
-/* read the sectors in place */
-nb_sectors = count  BDRV_SECTOR_BITS;
-if (nb_sectors  0) {
-if ((ret = bdrv_read(bs, sector_num, buf, nb_sectors))  0)
-return ret;
-sector_num += nb_sectors;
-len = nb_sectors  BDRV_SECTOR_BITS;
-buf += len;
-count -= len;
+qemu_iovec_init_external(qiov, iov, 1);
+ret = bdrv_prwv_co(bs, offset, qiov, false, 0);
+if (ret  0) {
+return ret;
 }
 
-/* add data from the last sector */
-if (count  0) {
-if ((ret = bdrv_read(bs, sector_num, tmp_buf, 1))  0)
-return ret;
-memcpy(buf, tmp_buf, count);
-}
-return count1;
+return bytes;
 }
 
 int bdrv_pwritev(BlockDriverState *bs, int64_t offset, QEMUIOVector *qiov)
-- 
1.8.1.4

[Qemu-devel] [PATCH v2 20/24] block: Align requests in bdrv_co_do_pwritev()

2013-12-13 Thread Kevin Wolf

This patch changes bdrv_co_do_pwritev() to actually be what its name
promises. If requests aren't properly aligned, it performs a RMW.

Requests touching the same block are serialised against the RMW request.
Further optimisation of this is possible by differentiating types of
requests (concurrent reads should actually be okay here).

Signed-off-by: Kevin Wolf kw...@redhat.com
---
 block.c | 86 -
 1 file changed, 85 insertions(+), 1 deletion(-)

diff --git a/block.c b/block.c
index 8a0225d..bfa2233 100644
--- a/block.c
+++ b/block.c
@@ -3061,6 +3061,12 @@ static int coroutine_fn 
bdrv_co_do_pwritev(BlockDriverState *bs,
 BdrvRequestFlags flags)
 {
 BdrvTrackedRequest req;
+/* TODO Lift BDRV_SECTOR_SIZE restriction in BlockDriver interface */
+uint64_t align = MAX(BDRV_SECTOR_SIZE, bs-request_alignment);
+uint8_t *head_buf = NULL;
+uint8_t *tail_buf = NULL;
+QEMUIOVector local_qiov;
+bool use_local_qiov = false;
 int ret;
 
 if (!bs-drv) {
@@ -3078,10 +3084,88 @@ static int coroutine_fn 
bdrv_co_do_pwritev(BlockDriverState *bs,
 bdrv_io_limits_intercept(bs, bytes  BDRV_SECTOR_BITS, true);
 }
 
+/*
+ * Align write if necessary by performing a read-modify-write cycle.
+ * Pad qiov with the read parts and be sure to have a tracked request not
+ * only for bdrv_aligned_pwritev, but also for the reads of the RMW cycle.
+ */
 tracked_request_begin(req, bs, offset, bytes, true);
-ret = bdrv_aligned_pwritev(bs, req, offset, bytes, qiov, flags);
+
+if (offset  (align - 1)) {
+QEMUIOVector head_qiov;
+struct iovec head_iov;
+
+mark_request_serialising(req, align);
+wait_serialising_requests(req);
+
+head_buf = qemu_blockalign(bs, align);
+head_iov = (struct iovec) {
+.iov_base   = head_buf,
+.iov_len= align,
+};
+qemu_iovec_init_external(head_qiov, head_iov, 1);
+
+ret = bdrv_aligned_preadv(bs, req, offset  ~(align - 1), align,
+  align, head_qiov, 0);
+if (ret  0) {
+goto fail;
+}
+
+qemu_iovec_init(local_qiov, qiov-niov + 2);
+qemu_iovec_add(local_qiov, head_buf, offset  (align - 1));
+qemu_iovec_concat(local_qiov, qiov, 0, qiov-size);
+use_local_qiov = true;
+
+bytes += offset  (align - 1);
+offset = offset  ~(align - 1);
+}
+
+if ((offset + bytes)  (align - 1)) {
+QEMUIOVector tail_qiov;
+struct iovec tail_iov;
+size_t tail_bytes;
+
+mark_request_serialising(req, align);
+wait_serialising_requests(req);
+
+tail_buf = qemu_blockalign(bs, align);
+tail_iov = (struct iovec) {
+.iov_base   = tail_buf,
+.iov_len= align,
+};
+qemu_iovec_init_external(tail_qiov, tail_iov, 1);
+
+ret = bdrv_aligned_preadv(bs, req, (offset + bytes)  ~(align - 1), 
align,
+  align, tail_qiov, 0);
+if (ret  0) {
+goto fail;
+}
+
+if (!use_local_qiov) {
+qemu_iovec_init(local_qiov, qiov-niov + 1);
+qemu_iovec_concat(local_qiov, qiov, 0, qiov-size);
+use_local_qiov = true;
+}
+
+tail_bytes = (offset + bytes)  (align - 1);
+qemu_iovec_add(local_qiov, tail_buf + tail_bytes, align - tail_bytes);
+
+bytes = ROUND_UP(bytes, align);
+}
+
+ret = bdrv_aligned_pwritev(bs, req, offset, bytes,
+   use_local_qiov ? local_qiov : qiov,
+   flags);
+
+fail:
 tracked_request_end(req);
 
+if (use_local_qiov) {
+qemu_iovec_destroy(local_qiov);
+qemu_vfree(head_buf);
+qemu_vfree(tail_buf);
+}
+
 return ret;
 }
 
-- 
1.8.1.4

[Qemu-devel] [PATCH v2 23/24] block: Make bdrv_pwrite() a bdrv_prwv_co() wrapper

2013-12-13 Thread Kevin Wolf

Instead of implementing the alignment adjustment here, use the now
existing functionality of bdrv_co_do_pwritev().

Signed-off-by: Kevin Wolf kw...@redhat.com
---
 block.c | 64 +---
 1 file changed, 9 insertions(+), 55 deletions(-)

diff --git a/block.c b/block.c
index ab165c5..350cb81 100644
--- a/block.c
+++ b/block.c
@@ -2496,11 +2496,6 @@ int bdrv_write(BlockDriverState *bs, int64_t sector_num,
 return bdrv_rw_co(bs, sector_num, (uint8_t *)buf, nb_sectors, true, 0);
 }
 
-int bdrv_writev(BlockDriverState *bs, int64_t sector_num, QEMUIOVector *qiov)
-{
-return bdrv_prwv_co(bs, sector_num  BDRV_SECTOR_BITS, qiov, true, 0);
-}
-
 int bdrv_write_zeroes(BlockDriverState *bs, int64_t sector_num,
   int nb_sectors, BdrvRequestFlags flags)
 {
@@ -2569,70 +2564,29 @@ int bdrv_pread(BlockDriverState *bs, int64_t offset, 
void *buf, int bytes)
 
 int bdrv_pwritev(BlockDriverState *bs, int64_t offset, QEMUIOVector *qiov)
 {
-uint8_t tmp_buf[BDRV_SECTOR_SIZE];
-int len, nb_sectors, count;
-int64_t sector_num;
 int ret;
 
-count = qiov-size;
-
-/* first write to align to sector start */
-len = (BDRV_SECTOR_SIZE - offset)  (BDRV_SECTOR_SIZE - 1);
-if (len  count)
-len = count;
-sector_num = offset  BDRV_SECTOR_BITS;
-if (len  0) {
-if ((ret = bdrv_read(bs, sector_num, tmp_buf, 1))  0)
-return ret;
-qemu_iovec_to_buf(qiov, 0, tmp_buf + (offset  (BDRV_SECTOR_SIZE - 1)),
-  len);
-if ((ret = bdrv_write(bs, sector_num, tmp_buf, 1))  0)
-return ret;
-count -= len;
-if (count == 0)
-return qiov-size;
-sector_num++;
-}
-
-/* write the sectors in place */
-nb_sectors = count  BDRV_SECTOR_BITS;
-if (nb_sectors  0) {
-QEMUIOVector qiov_inplace;
-
-qemu_iovec_init(qiov_inplace, qiov-niov);
-qemu_iovec_concat(qiov_inplace, qiov, len,
-  nb_sectors  BDRV_SECTOR_BITS);
-ret = bdrv_writev(bs, sector_num, qiov_inplace);
-qemu_iovec_destroy(qiov_inplace);
-if (ret  0) {
-return ret;
-}
-
-sector_num += nb_sectors;
-len = nb_sectors  BDRV_SECTOR_BITS;
-count -= len;
+ret = bdrv_prwv_co(bs, offset, qiov, true, 0);
+if (ret  0) {
+return ret;
 }
 
-/* add data from the last sector */
-if (count  0) {
-if ((ret = bdrv_read(bs, sector_num, tmp_buf, 1))  0)
-return ret;
-qemu_iovec_to_buf(qiov, qiov-size - count, tmp_buf, count);
-if ((ret = bdrv_write(bs, sector_num, tmp_buf, 1))  0)
-return ret;
-}
 return qiov-size;
 }
 
 int bdrv_pwrite(BlockDriverState *bs, int64_t offset,
-const void *buf, int count1)
+const void *buf, int bytes)
 {
 QEMUIOVector qiov;
 struct iovec iov = {
 .iov_base   = (void *) buf,
-.iov_len= count1,
+.iov_len= bytes,
 };
 
+if (bytes  0) {
+return -EINVAL;
+}
+
 qemu_iovec_init_external(qiov, iov, 1);
 return bdrv_pwritev(bs, offset, qiov);
 }
-- 
1.8.1.4

Re: [Qemu-devel] [PATCH RFC 1/3] os-posix: include sys/time.h

2013-12-13 Thread Eric Blake

On 11/12/2013 06:44 PM, Wenchao Xia wrote:
 Since gettimeofday() is used in this header file as a macro define,
 include the function's define header file, to avoid compile warning
 when other file include os-posix.h.
 
 Signed-off-by: Wenchao Xia xiaw...@linux.vnet.ibm.com
 ---
  include/sysemu/os-posix.h |2 ++
  1 files changed, 2 insertions(+), 0 deletions(-)

Reviewed-by: Eric Blake ebl...@redhat.com

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature

[Qemu-devel] [PATCH v2 21/24] block: Change coroutine wrapper to byte granularity

2013-12-13 Thread Kevin Wolf

Signed-off-by: Kevin Wolf kw...@redhat.com
---
 block.c | 48 ++--
 1 file changed, 26 insertions(+), 22 deletions(-)

diff --git a/block.c b/block.c
index bfa2233..5b00d23 100644
--- a/block.c
+++ b/block.c
@@ -69,11 +69,11 @@ static int coroutine_fn bdrv_co_readv_em(BlockDriverState 
*bs,
 static int coroutine_fn bdrv_co_writev_em(BlockDriverState *bs,
  int64_t sector_num, int nb_sectors,
  QEMUIOVector *iov);
-static int coroutine_fn bdrv_co_do_readv(BlockDriverState *bs,
-int64_t sector_num, int nb_sectors, QEMUIOVector *qiov,
+static int coroutine_fn bdrv_co_do_preadv(BlockDriverState *bs,
+int64_t offset, unsigned int bytes, QEMUIOVector *qiov,
 BdrvRequestFlags flags);
-static int coroutine_fn bdrv_co_do_writev(BlockDriverState *bs,
-int64_t sector_num, int nb_sectors, QEMUIOVector *qiov,
+static int coroutine_fn bdrv_co_do_pwritev(BlockDriverState *bs,
+int64_t offset, unsigned int bytes, QEMUIOVector *qiov,
 BdrvRequestFlags flags);
 static BlockDriverAIOCB *bdrv_co_aio_rw_vector(BlockDriverState *bs,
int64_t sector_num,
@@ -2383,8 +2383,7 @@ static int bdrv_check_request(BlockDriverState *bs, 
int64_t sector_num,
 
 typedef struct RwCo {
 BlockDriverState *bs;
-int64_t sector_num;
-int nb_sectors;
+int64_t offset;
 QEMUIOVector *qiov;
 bool is_write;
 int ret;
@@ -2396,34 +2395,32 @@ static void coroutine_fn bdrv_rw_co_entry(void *opaque)
 RwCo *rwco = opaque;
 
 if (!rwco-is_write) {
-rwco-ret = bdrv_co_do_readv(rwco-bs, rwco-sector_num,
- rwco-nb_sectors, rwco-qiov,
- rwco-flags);
-} else {
-rwco-ret = bdrv_co_do_writev(rwco-bs, rwco-sector_num,
-  rwco-nb_sectors, rwco-qiov,
+rwco-ret = bdrv_co_do_preadv(rwco-bs, rwco-offset,
+  rwco-qiov-size, rwco-qiov,
   rwco-flags);
+} else {
+rwco-ret = bdrv_co_do_pwritev(rwco-bs, rwco-offset,
+   rwco-qiov-size, rwco-qiov,
+   rwco-flags);
 }
 }
 
 /*
  * Process a vectored synchronous request using coroutines
  */
-static int bdrv_rwv_co(BlockDriverState *bs, int64_t sector_num,
-   QEMUIOVector *qiov, bool is_write,
-   BdrvRequestFlags flags)
+static int bdrv_prwv_co(BlockDriverState *bs, int64_t offset,
+QEMUIOVector *qiov, bool is_write,
+BdrvRequestFlags flags)
 {
 Coroutine *co;
 RwCo rwco = {
 .bs = bs,
-.sector_num = sector_num,
-.nb_sectors = qiov-size  BDRV_SECTOR_BITS,
+.offset = offset,
 .qiov = qiov,
 .is_write = is_write,
 .ret = NOT_DONE,
 .flags = flags,
 };
-assert((qiov-size  (BDRV_SECTOR_SIZE - 1)) == 0);
 
 /**
  * In sync call context, when the vcpu is blocked, this throttling timer
@@ -2462,7 +2459,8 @@ static int bdrv_rw_co(BlockDriverState *bs, int64_t 
sector_num, uint8_t *buf,
 };
 
 qemu_iovec_init_external(qiov, iov, 1);
-return bdrv_rwv_co(bs, sector_num, qiov, is_write, flags);
+return bdrv_prwv_co(bs, sector_num  BDRV_SECTOR_BITS,
+qiov, is_write, flags);
 }
 
 /* return  0 if error. See bdrv_write() for the return codes */
@@ -2500,7 +2498,7 @@ int bdrv_write(BlockDriverState *bs, int64_t sector_num,
 
 int bdrv_writev(BlockDriverState *bs, int64_t sector_num, QEMUIOVector *qiov)
 {
-return bdrv_rwv_co(bs, sector_num, qiov, true, 0);
+return bdrv_prwv_co(bs, sector_num  BDRV_SECTOR_BITS, qiov, true, 0);
 }
 
 int bdrv_write_zeroes(BlockDriverState *bs, int64_t sector_num,
@@ -4608,9 +4606,15 @@ int bdrv_flush(BlockDriverState *bs)
 return rwco.ret;
 }
 
+typedef struct DiscardCo {
+BlockDriverState *bs;
+int64_t sector_num;
+int nb_sectors;
+int ret;
+} DiscardCo;
 static void coroutine_fn bdrv_discard_co_entry(void *opaque)
 {
-RwCo *rwco = opaque;
+DiscardCo *rwco = opaque;
 
 rwco-ret = bdrv_co_discard(rwco-bs, rwco-sector_num, rwco-nb_sectors);
 }
@@ -4694,7 +4698,7 @@ int coroutine_fn bdrv_co_discard(BlockDriverState *bs, 
int64_t sector_num,
 int bdrv_discard(BlockDriverState *bs, int64_t sector_num, int nb_sectors)
 {
 Coroutine *co;
-RwCo rwco = {
+DiscardCo rwco = {
 .bs = bs,
 .sector_num = sector_num,
 .nb_sectors = nb_sectors,
-- 
1.8.1.4

Re: [Qemu-devel] [PATCH 00/13] Freescale mxs/imx23 + Olimex Olinuxino support

2013-12-13 Thread Peter Maydell

On 13 December 2013 12:53, M P buser...@gmail.com wrote:
 Can someone give me a pointer on how the review (if any) is done for these
 patches? I have to say I'm rather amazed at the rate of submission on the
 mailing list, and I worried to see these patches buried further and further
 down in such a short timescale :-)

The rough process is:
 * people reply to your email with review comments; when you've
   accumulated enough you send out a fixed set of patches for
   further review
 * if nobody replies at all within say 2 weeks you can send out
   a 'ping' followup to this coverletter to bring the set back
   to peoples' attention

http://wiki.qemu.org/Contribute/SubmitAPatch is where we try
to describe how the process works.

In this particular case, I have tagged your mail in my mail
client as 'must-review'; however since my review queue is
currently pretty full (you're one of four new ARM board/soc
support patchsets, the others being Allwinner A10, Canon DIGIC
and RaspberryPi; then there's the TrustZone support patchset
and all the 64 bit support work) I'm afraid I can't currently
promise a particularly rapid turnaround time.

If you can find the mail threads relating to the other
recent board/soc support patchsets I listed above and
look through review comments to see if any of them would
apply to your board that would probably help.

thanks
-- PMM

Re: [Qemu-devel] [PATCH RFC 3/3] tests: add test cases for qapi event support

2013-12-13 Thread Eric Blake

On 11/12/2013 06:44 PM, Wenchao Xia wrote:
 Signed-off-by: Wenchao Xia xiaw...@linux.vnet.ibm.com
 ---

 +++ b/tests/qapi-schema/qapi-schema-test.json
 @@ -93,3 +93,15 @@
  '*u16' : [ 'uint16' ],
  '*i64x':   'int' ,
  '*u64x':   'uint64'  } }
 +
 +# testing event
 +{ 'type': 'EventStructOne',
 +  'data': { 'struct1': 'UserDefOne', 'string': 'str', '*enum2': 'EnumOne' } }
 +
 +{ 'event': 'EVENT_A' }
 +{ 'event': 'EVENT_B',
 +  'data': { } }

Do we really have events with no associated information (the mere name
of the event carrying the full data?)

 +{ 'event': 'EVENT_C',
 +  'data': { '*a': 'int', '*b': 'UserDefOne', 'c': 'str' } }
 +{ 'event': 'EVENT_D',
 +  'data': { 'a' : 'EventStructOne', 'b' : 'str', '*c': 'str', '*enum3': 
 'EnumOne' } }

Looks like reasonable support for encoding existing events.

 + OrderedDict([('event', 'EVENT_A')]),
 + OrderedDict([('event', 'EVENT_B'), ('data', OrderedDict())]),

Shouldn't the omission of 'data' be the same as 'data':{}?

 +++ b/tests/test-qmp-event.c
 @@ -0,0 +1,250 @@
 +/*
 + * QMP Input Visitor unit-tests.
 + *
 + * Copyright (C) 2011 Red Hat Inc.

Unusual choice of copyright year and ownership.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH] target-sh4: Use new qemu_ld/st opcodes

2013-12-13 Thread Aurelien Jarno

On Fri, Dec 13, 2013 at 09:36:13AM +, Alex Bennée wrote:
 
 aurel...@aurel32.net writes:
 
  Signed-off-by: Aurelien Jarno aurel...@aurel32.net
  ---
   target-sh4/translate.c |  167 
  ++--
   1 file changed, 90 insertions(+), 77 deletions(-)
 
  diff --git a/target-sh4/translate.c b/target-sh4/translate.c
  index 2272eb0..87f532a 100644
  --- a/target-sh4/translate.c
  +++ b/target-sh4/translate.c
  @@ -464,7 +464,7 @@ static void _decode_opc(DisasContext * ctx)
  {
  TCGv addr = tcg_temp_new();
  tcg_gen_addi_i32(addr, REG(B11_8), B3_0 * 4);
  -   tcg_gen_qemu_st32(REG(B7_4), addr, ctx-memidx);
  +tcg_gen_qemu_st_i32(REG(B7_4), addr, ctx-memidx, MO_TEUL);
  tcg_temp_free(addr);
 snip
 
 There seems to be a fix of tabs and spaces in that patch.
 

Indeed, this file is partly tabs indented for historical reasons, so
they are changed to space in the patch, to conform to the QEMU coding
style. AFAIK there is a consensus that things should be done that way.

-- 
Aurelien Jarno  GPG: 1024D/F1BCDB73
aurel...@aurel32.net http://www.aurel32.net

Re: [Qemu-devel] [PATCH RFC 2/3] qapi script: add support of event

2013-12-13 Thread Kevin Wolf

Am 13.12.2013 um 14:31 hat Eric Blake geschrieben:
 On 11/12/2013 06:44 PM, Wenchao Xia wrote:
  +++ b/scripts/qapi-event.py
  @@ -0,0 +1,355 @@
  +#
  +# QAPI event generator
  +#
  +# Copyright IBM, Corp. 2013
  +#
  +# Authors:
  +#  Wenchao Xia xiaw...@linux.vnet.ibm.com
  +#
  +# This work is licensed under the terms of the GNU GPLv2.
 
 Can you please use GPLv2+ (that is, add the or later clause)?  We
 already have GPLv2-only code, but I don't want to increase the size of
 that unfortunate license choice.

In fact, it's even worse:

+# This work is licensed under the terms of the GNU GPLv2.
+# See the COPYING.LIB file in the top-level directory.

These two lines contradict each other, COPYING.LIB contains the
LGPL 2.1. The same bad license header is in the other QAPI generator
scripts, so it's only copypaste here.

This doesn't make things easier, because if things are copied, the
license of the source must be respected. And it seems rather dubious to
me what this license actually is. If it's GPLv2-only, we can't just
change it in the new copy.

Kevin


pgpPXoh_N4hV5.pgp
Description: PGP signature

Re: [Qemu-devel] [PATCH V7 4/6] qemu-img: add -l for snapshot in convert

2013-12-13 Thread Eric Blake

On 12/08/2013 08:43 PM, Wenchao Xia wrote:

 convert -s snapshot.name=name1

Previous I planned to use -l for internal snapshot in all possible
 program, since -s is taken as external snapshot in qemu, qemu-nbd.

Consistency in command line options between different tools is nice, but
is less important than adding functionality.  I'm perfectly fine if we
use -l in one tool and -s in another, as long as the documentation is
clear on how to spell the option for the tool I want to use.

 let -s stands for internal in qemu-img convert only, may bring
 confuse to user, so I deprecated it instead of enhance it(I want
 to remove it but may bring compatiablity issue).
Yes, it should report error when both specified, will send a patch
 if you agree '-l' should still be used.


   Eric, I hope to get your idea before patching, any comments?
 

My biggest concern was that by adding -l as a superset of -s, but not
taking care of the relation between the two, you created odd command
line usage patterns.  For qemu-img, it may be simpler to just make -s do
everything, instead of trying to deprecate it (that is, adding -l for
consistency with other tools while breaking -s isn't nice).

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature

[Qemu-devel] [PATCH 03/11] define hotplug interface

2013-12-13 Thread Igor Mammedov

Provide generic hotplug interface for devices.
Intended for replacing hotplug mechanism used by
PCI/PCIE/SHPC code and will be used for memory hotplug.

Signed-off-by: Igor Mammedov imamm...@redhat.com
---
v2:
* s/device/handler/
* add hotplug_handler_plug/hotplug_handler_unplug API
v1:
it's scsi-bus like interface, but abstracted from bus altogether
since all current users care about in hotplug handlers, it's
hotplug device and hotplugged device and bus only serves
as a means to get access to hotplug device and it's callbacks.
---
 hw/core/Makefile.objs |  1 +
 hw/core/hotplug.c | 48 +
 include/hw/hotplug.h  | 75 +++
 3 files changed, 124 insertions(+)
 create mode 100644 hw/core/hotplug.c
 create mode 100644 include/hw/hotplug.h

diff --git a/hw/core/Makefile.objs b/hw/core/Makefile.objs
index 950146c..9e324be 100644
--- a/hw/core/Makefile.objs
+++ b/hw/core/Makefile.objs
@@ -2,6 +2,7 @@
 common-obj-y += qdev.o qdev-properties.o
 # irq.o needed for qdev GPIO handling:
 common-obj-y += irq.o
+common-obj-y += hotplug.o
 
 common-obj-$(CONFIG_EMPTY_SLOT) += empty_slot.o
 common-obj-$(CONFIG_XILINX_AXI) += stream.o
diff --git a/hw/core/hotplug.c b/hw/core/hotplug.c
new file mode 100644
index 000..5c3b5c9
--- /dev/null
+++ b/hw/core/hotplug.c
@@ -0,0 +1,48 @@
+/*
+ * Hotplug handler interface.
+ *
+ * Copyright (c) 2013 Red Hat Inc.
+ *
+ * Authors:
+ *  Igor Mammedov imamm...@redhat.com,
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+#include hw/hotplug.h
+#include qemu/module.h
+
+void hotplug_handler_plug(HotplugHandler *plug_handler,
+  DeviceState *plugged_dev,
+  Error **errp)
+{
+HotplugHandlerClass *hdc = HOTPLUG_HANDLER_GET_CLASS(plug_handler);
+
+if (hdc-plug) {
+hdc-plug(plug_handler, plugged_dev, errp);
+}
+}
+
+void hotplug_handler_unplug(HotplugHandler *plug_handler,
+DeviceState *plugged_dev,
+Error **errp)
+{
+HotplugHandlerClass *hdc = HOTPLUG_HANDLER_GET_CLASS(plug_handler);
+
+if (hdc-unplug) {
+hdc-unplug(plug_handler, plugged_dev, errp);
+}
+}
+
+static const TypeInfo hotplug_handler_info = {
+.name  = TYPE_HOTPLUG_HANDLER,
+.parent= TYPE_INTERFACE,
+.class_size = sizeof(HotplugHandlerClass),
+};
+
+static void hotplug_handler_register_types(void)
+{
+type_register_static(hotplug_handler_info);
+}
+
+type_init(hotplug_handler_register_types)
diff --git a/include/hw/hotplug.h b/include/hw/hotplug.h
new file mode 100644
index 000..64ae6f7
--- /dev/null
+++ b/include/hw/hotplug.h
@@ -0,0 +1,75 @@
+/*
+ * Hotplug handler interface.
+ *
+ * Copyright (c) 2013 Red Hat Inc.
+ *
+ * Authors:
+ *  Igor Mammedov imamm...@redhat.com,
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+#ifndef HOTPLUG_H
+#define HOTPLUG_H
+
+#include qom/object.h
+#include qemu/typedefs.h
+
+#define TYPE_HOTPLUG_HANDLER hotplug-handler
+
+#define HOTPLUG_HANDLER_CLASS(klass) \
+ OBJECT_CLASS_CHECK(HotplugHandlerClass, (klass), TYPE_HOTPLUG_HANDLER)
+#define HOTPLUG_HANDLER_GET_CLASS(obj) \
+ OBJECT_GET_CLASS(HotplugHandlerClass, (obj), TYPE_HOTPLUG_HANDLER)
+#define HOTPLUG_HANDLER(obj) \
+ INTERFACE_CHECK(HotplugHandler, (obj), TYPE_HOTPLUG_HANDLER)
+
+
+typedef struct HotplugHandler {
+Object Parent;
+} HotplugHandler;
+
+/**
+ * hotplug_fn:
+ * @plug_handler: a device performing plug/uplug action
+ * @plugged_dev: a device that has been (un)plugged
+ * @errp: returns an error if this function fails
+ */
+typedef void (*hotplug_fn)(HotplugHandler *plug_handler,
+   DeviceState *plugged_dev, Error **errp);
+
+/**
+ * HotplugDeviceClass:
+ *
+ * Interface to be implemented by a device performing
+ * hardware (un)plug functions.
+ *
+ * @parent: Opaque parent interface.
+ * @plug: plug callback.
+ * @unplug: unplug callback.
+ */
+typedef struct HotplugHandlerClass {
+InterfaceClass parent;
+
+hotplug_fn plug;
+hotplug_fn unplug;
+} HotplugHandlerClass;
+
+/**
+ * hotplug_handler_plug:
+ *
+ * Call #HotplugHandlerClass.plug callback of @plug_handler.
+ */
+void hotplug_handler_plug(HotplugHandler *plug_handler,
+  DeviceState *plugged_dev,
+  Error **errp);
+
+/**
+ * hotplug_handler_unplug:
+ *
+ * Call #HotplugHandlerClass.unplug callback of @plug_handler.
+ */
+void hotplug_handler_unplug(HotplugHandler *plug_handler,
+DeviceState *plugged_dev,
+Error **errp);
+#endif
-- 
1.8.3.1

Re: [Qemu-devel] [PATCH 00/13] Freescale mxs/imx23 + Olimex Olinuxino support

2013-12-13 Thread M P

Thanks Peter, very helpful -- and err, sorry to land you more work than you
already had :-)

Would I contribute anything by also reviewing and/or testing the sus
mentioned patches? I can at least test the A10 and raspi if this helps...

M



On Fri, Dec 13, 2013 at 1:29 PM, Peter Maydell peter.mayd...@linaro.orgwrote:

 On 13 December 2013 12:53, M P buser...@gmail.com wrote:
  Can someone give me a pointer on how the review (if any) is done for
 these
  patches? I have to say I'm rather amazed at the rate of submission on the
  mailing list, and I worried to see these patches buried further and
 further
  down in such a short timescale :-)

 The rough process is:
  * people reply to your email with review comments; when you've
accumulated enough you send out a fixed set of patches for
further review
  * if nobody replies at all within say 2 weeks you can send out
a 'ping' followup to this coverletter to bring the set back
to peoples' attention

 http://wiki.qemu.org/Contribute/SubmitAPatch is where we try
 to describe how the process works.

 In this particular case, I have tagged your mail in my mail
 client as 'must-review'; however since my review queue is
 currently pretty full (you're one of four new ARM board/soc
 support patchsets, the others being Allwinner A10, Canon DIGIC
 and RaspberryPi; then there's the TrustZone support patchset
 and all the 64 bit support work) I'm afraid I can't currently
 promise a particularly rapid turnaround time.

 If you can find the mail threads relating to the other
 recent board/soc support patchsets I listed above and
 look through review comments to see if any of them would
 apply to your board that would probably help.

 thanks
 -- PMM

[Qemu-devel] [PATCH v2 18/24] block: Make overlap range for serialisation dynamic

2013-12-13 Thread Kevin Wolf

Copy on Read wants to serialise with all requests touching the same
cluster, so wait_serialising_requests() rounded to cluster boundaries.
Other users like alignment RMW will have different requirements, though
(requests touching the same sector), so make it dynamic.

Signed-off-by: Kevin Wolf kw...@redhat.com
---
 block.c   | 53 ---
 include/block/block_int.h |  4 
 2 files changed, 31 insertions(+), 26 deletions(-)

diff --git a/block.c b/block.c
index bf8b46e..73bba47 100644
--- a/block.c
+++ b/block.c
@@ -2051,6 +2051,8 @@ static void tracked_request_begin(BdrvTrackedRequest *req,
 .is_write   = is_write,
 .co = qemu_coroutine_self(),
 .serialising= false,
+.overlap_offset = offset,
+.overlap_bytes  = bytes,
 };
 
 qemu_co_queue_init(req-wait_queue);
@@ -2058,12 +2060,19 @@ static void tracked_request_begin(BdrvTrackedRequest 
*req,
 QLIST_INSERT_HEAD(bs-tracked_requests, req, list);
 }
 
-static void mark_request_serialising(BdrvTrackedRequest *req)
+static void mark_request_serialising(BdrvTrackedRequest *req, size_t align)
 {
+int64_t overlap_offset = req-offset  ~(align - 1);
+int overlap_bytes = ROUND_UP(req-offset + req-bytes, align)
+  - overlap_offset;
+
 if (!req-serialising) {
 req-bs-serialising_in_flight++;
 req-serialising = true;
 }
+
+req-overlap_offset = MIN(req-overlap_offset, overlap_offset);
+req-overlap_bytes = MAX(req-overlap_bytes, overlap_bytes);
 }
 
 /**
@@ -2087,20 +2096,16 @@ void bdrv_round_to_clusters(BlockDriverState *bs,
 }
 }
 
-static void round_bytes_to_clusters(BlockDriverState *bs,
-int64_t offset, unsigned int bytes,
-int64_t *cluster_offset,
-unsigned int *cluster_bytes)
+static int bdrv_get_cluster_size(BlockDriverState *bs)
 {
 BlockDriverInfo bdi;
+int ret;
 
-if (bdrv_get_info(bs, bdi)  0 || bdi.cluster_size == 0) {
-*cluster_offset = offset;
-*cluster_bytes = bytes;
+ret = bdrv_get_info(bs, bdi);
+if (ret  0 || bdi.cluster_size == 0) {
+return bs-request_alignment;
 } else {
-*cluster_offset = QEMU_ALIGN_DOWN(offset, bdi.cluster_size);
-*cluster_bytes = QEMU_ALIGN_UP(offset - *cluster_offset + bytes,
-   bdi.cluster_size);
+return bdi.cluster_size;
 }
 }
 
@@ -2108,11 +2113,11 @@ static bool tracked_request_overlaps(BdrvTrackedRequest 
*req,
  int64_t offset, int bytes)
 {
 /*    */
-if (offset = req-offset + req-bytes) {
+if (offset = req-overlap_offset + req-overlap_bytes) {
 return false;
 }
 /*    */
-if (req-offset = offset + bytes) {
+if (req-overlap_offset = offset + bytes) {
 return false;
 }
 return true;
@@ -2122,30 +2127,21 @@ static void coroutine_fn 
wait_serialising_requests(BdrvTrackedRequest *self)
 {
 BlockDriverState *bs = self-bs;
 BdrvTrackedRequest *req;
-int64_t cluster_offset;
-unsigned int cluster_bytes;
 bool retry;
 
 if (!bs-serialising_in_flight) {
 return;
 }
 
-/* If we touch the same cluster it counts as an overlap.  This guarantees
- * that allocating writes will be serialized and not race with each other
- * for the same cluster.  For example, in copy-on-read it ensures that the
- * CoR read and write operations are atomic and guest writes cannot
- * interleave between them.
- */
-round_bytes_to_clusters(bs, self-offset, self-bytes,
-cluster_offset, cluster_bytes);
-
 do {
 retry = false;
 QLIST_FOREACH(req, bs-tracked_requests, list) {
 if (req == self || (!req-serialising  !self-serialising)) {
 continue;
 }
-if (tracked_request_overlaps(req, cluster_offset, cluster_bytes)) {
+if (tracked_request_overlaps(req, self-overlap_offset,
+ self-overlap_bytes))
+{
 /* Hitting this means there was a reentrant request, for
  * example, a block driver issuing nested requests.  This must
  * never happen since it means deadlock.
@@ -2756,7 +2752,12 @@ static int coroutine_fn 
bdrv_aligned_preadv(BlockDriverState *bs,
 
 /* Handle Copy on Read and associated serialisation */
 if (flags  BDRV_REQ_COPY_ON_READ) {
-mark_request_serialising(req);
+/* If we touch the same cluster it counts as an overlap.  This
+ * guarantees that allocating writes will be serialized and not race
+ * with each other for the same cluster.  For example, in copy-on-read
+ * it ensures that the CoR read and write

Re: [Qemu-devel] [PATCH RFC 2/3] qapi script: add support of event

2013-12-13 Thread Eric Blake

On 11/12/2013 06:44 PM, Wenchao Xia wrote:
 Nested structure is not supported now, so following define is not valid:
 { 'event': 'EVENT_C',
   'data': { 'a': { 'a_a', 'str', 'a_b', 'str' }, 'b': 'int' }

But what IS valid?  You need to document this in docs/qapi-code-gen.txt
at a bare minimum.

This patch series is hard to review, and still has the RFC subject line.
 At this point, I think it's worth rebasing and resending what you have;
even if it needs more review, it will at least get it to a state that is
easier to review.

 +++ b/scripts/qapi-event.py
 @@ -0,0 +1,355 @@
 +#
 +# QAPI event generator
 +#
 +# Copyright IBM, Corp. 2013
 +#
 +# Authors:
 +#  Wenchao Xia xiaw...@linux.vnet.ibm.com
 +#
 +# This work is licensed under the terms of the GNU GPLv2.

Can you please use GPLv2+ (that is, add the or later clause)?  We
already have GPLv2-only code, but I don't want to increase the size of
that unfortunate license choice.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature

[Qemu-devel] [PATCH v2 15/24] block: Allow waiting for overlapping requests between begin/end

2013-12-13 Thread Kevin Wolf

Previously, it was not possible to use wait_for_overlapping_requests()
between tracked_request_begin()/end() because it would wait for itself.

Ignore the current request in the overlap check and run more of the
bdrv_co_do_preadv/pwritev code with a BdrvTrackedRequest present.

Signed-off-by: Kevin Wolf kw...@redhat.com
---
 block.c | 38 --
 1 file changed, 20 insertions(+), 18 deletions(-)

diff --git a/block.c b/block.c
index fa888d9..b4f6ead 100644
--- a/block.c
+++ b/block.c
@@ -2106,7 +2106,7 @@ static bool tracked_request_overlaps(BdrvTrackedRequest 
*req,
 }
 
 static void coroutine_fn wait_for_overlapping_requests(BlockDriverState *bs,
-int64_t offset, unsigned int bytes)
+BdrvTrackedRequest *self, int64_t offset, unsigned int bytes)
 {
 BdrvTrackedRequest *req;
 int64_t cluster_offset;
@@ -2124,6 +2124,9 @@ static void coroutine_fn 
wait_for_overlapping_requests(BlockDriverState *bs,
 do {
 retry = false;
 QLIST_FOREACH(req, bs-tracked_requests, list) {
+if (req == self) {
+continue;
+}
 if (tracked_request_overlaps(req, cluster_offset, cluster_bytes)) {
 /* Hitting this means there was a reentrant request, for
  * example, a block driver issuing nested requests.  This must
@@ -2721,10 +2724,10 @@ err:
  * implemented by the caller.
  */
 static int coroutine_fn bdrv_aligned_preadv(BlockDriverState *bs,
-int64_t offset, unsigned int bytes, QEMUIOVector *qiov, int flags)
+BdrvTrackedRequest *req, int64_t offset, unsigned int bytes,
+QEMUIOVector *qiov, int flags)
 {
 BlockDriver *drv = bs-drv;
-BdrvTrackedRequest req;
 int ret;
 
 int64_t sector_num = offset  BDRV_SECTOR_BITS;
@@ -2739,11 +2742,9 @@ static int coroutine_fn 
bdrv_aligned_preadv(BlockDriverState *bs,
 }
 
 if (bs-copy_on_read_in_flight) {
-wait_for_overlapping_requests(bs, offset, bytes);
+wait_for_overlapping_requests(bs, req, offset, bytes);
 }
 
-tracked_request_begin(req, bs, offset, bytes, false);
-
 if (flags  BDRV_REQ_COPY_ON_READ) {
 int pnum;
 
@@ -2790,8 +2791,6 @@ static int coroutine_fn 
bdrv_aligned_preadv(BlockDriverState *bs,
 }
 
 out:
-tracked_request_end(req);
-
 if (flags  BDRV_REQ_COPY_ON_READ) {
 bs-copy_on_read_in_flight--;
 }
@@ -2807,6 +2806,8 @@ static int coroutine_fn 
bdrv_co_do_preadv(BlockDriverState *bs,
 BdrvRequestFlags flags)
 {
 BlockDriver *drv = bs-drv;
+BdrvTrackedRequest req;
+
 /* TODO Lift BDRV_SECTOR_SIZE restriction in BlockDriver interface */
 uint64_t align = MAX(BDRV_SECTOR_SIZE, bs-request_alignment);
 uint8_t *head_buf = NULL;
@@ -2856,9 +2857,11 @@ static int coroutine_fn 
bdrv_co_do_preadv(BlockDriverState *bs,
 bytes = ROUND_UP(bytes, align);
 }
 
-ret = bdrv_aligned_preadv(bs, offset, bytes,
+tracked_request_begin(req, bs, offset, bytes, false);
+ret = bdrv_aligned_preadv(bs, req, offset, bytes,
   use_local_qiov ? local_qiov : qiov,
   flags);
+tracked_request_end(req);
 
 if (use_local_qiov) {
 qemu_iovec_destroy(local_qiov);
@@ -2977,10 +2980,10 @@ static int coroutine_fn 
bdrv_co_do_write_zeroes(BlockDriverState *bs,
  * Forwards an already correctly aligned write request to the BlockDriver.
  */
 static int coroutine_fn bdrv_aligned_pwritev(BlockDriverState *bs,
-int64_t offset, unsigned int bytes, QEMUIOVector *qiov, int flags)
+BdrvTrackedRequest *req, int64_t offset, unsigned int bytes,
+QEMUIOVector *qiov, int flags)
 {
 BlockDriver *drv = bs-drv;
-BdrvTrackedRequest req;
 int ret;
 
 int64_t sector_num = offset  BDRV_SECTOR_BITS;
@@ -2990,12 +2993,10 @@ static int coroutine_fn 
bdrv_aligned_pwritev(BlockDriverState *bs,
 assert((bytes  (BDRV_SECTOR_SIZE - 1)) == 0);
 
 if (bs-copy_on_read_in_flight) {
-wait_for_overlapping_requests(bs, offset, bytes);
+wait_for_overlapping_requests(bs, req, offset, bytes);
 }
 
-tracked_request_begin(req, bs, offset, bytes, true);
-
-ret = notifier_with_return_list_notify(bs-before_write_notifiers, req);
+ret = notifier_with_return_list_notify(bs-before_write_notifiers, req);
 
 if (ret  0) {
 /* Do nothing, write notifier decided to fail this request */
@@ -3018,8 +3019,6 @@ static int coroutine_fn 
bdrv_aligned_pwritev(BlockDriverState *bs,
 bs-total_sectors = MAX(bs-total_sectors, sector_num + nb_sectors);
 }
 
-tracked_request_end(req);
-
 return ret;
 }
 
@@ -3030,6 +3029,7 @@ static int coroutine_fn 
bdrv_co_do_pwritev(BlockDriverState *bs,
 int64_t offset, unsigned int bytes, QEMUIOVector *qiov,
 BdrvRequestFlags flags)
 {
+BdrvTrackedRequest req;
 int ret;
 
 if (!bs-drv) {
@@ -3047,7 +3047,9 @@ static int coroutine_fn

Re: [Qemu-devel] [PATCH V4 6/9] qapi script: support pre-defined enum type as discriminator in union

2013-12-13 Thread Eric Blake

On 12/10/2013 10:48 PM, Wenchao Xia wrote:
 By default, any union will automatically generate a enum type as
 [UnionName]Kind in C code, and it is duplicated when the discriminator
 is specified as a pre-defined enum type in schema. After this patch,
 the pre-defined enum type will be really used as the switch case
 condition in generated C code, if discriminator is an enum field.
 
 Signed-off-by: Wenchao Xia xiaw...@linux.vnet.ibm.com
 ---
  docs/qapi-code-gen.txt |8 ++--
  scripts/qapi-types.py  |   18 ++
  scripts/qapi-visit.py  |   23 ---
  scripts/qapi.py|4 +++-
  4 files changed, 39 insertions(+), 14 deletions(-)
 
 diff --git a/docs/qapi-code-gen.txt b/docs/qapi-code-gen.txt
 index 0728f36..f00640b 100644
 --- a/docs/qapi-code-gen.txt
 +++ b/docs/qapi-code-gen.txt
 @@ -123,11 +123,15 @@ And it looks like this on the wire:
  
  Flat union types avoid the nesting on the wire. They are used whenever a
  specific field of the base type is declared as the discriminator ('type' is
 -then no longer generated). The discriminator must always be a string field.
 +then no longer generated). The discriminator can be a string field or a
 +predefined enum field. If it is a string field, a hidden enum type will be
 +generated as [UNION_NAME]Kind. If it is an enum field, compile time check

s/compile/a compile/

Minor enough that I'm okay giving:

Reviewed-by: Eric Blake ebl...@redhat.com

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature

[Qemu-devel] [PATCH 04/11] qdev: add to BusState hotplug-handler link

2013-12-13 Thread Igor Mammedov

It will allow to reuse field with different BUSes, reducing code duplication.
Field is intended fot replacing 'hotplug_qdev' field in PCIBus and also
will allow to avoid adding equivalent field to DimmBus with possiblitity
to refactor other BUSes to use it instead of custom field.
In addition once all users of allow_hotplug field are converted to new
API, link could replace allow_hotplug in qdev hotplug code.

Signed-off-by: Igor Mammedov imamm...@redhat.com
---
 hw/core/qdev.c | 4 
 include/hw/qdev-core.h | 5 +
 2 files changed, 9 insertions(+)

diff --git a/hw/core/qdev.c b/hw/core/qdev.c
index e374a93..25c2d2c 100644
--- a/hw/core/qdev.c
+++ b/hw/core/qdev.c
@@ -32,6 +32,7 @@
 #include qapi/visitor.h
 #include qapi/qmp/qjson.h
 #include monitor/monitor.h
+#include hw/hotplug.h
 
 int qdev_hotplug = 0;
 static bool qdev_hot_added = false;
@@ -868,6 +869,9 @@ static void qbus_initfn(Object *obj)
 BusState *bus = BUS(obj);
 
 QTAILQ_INIT(bus-children);
+object_property_add_link(obj, QDEV_HOTPLUG_HANDLER_PROPERTY,
+ TYPE_HOTPLUG_HANDLER,
+ (Object **)bus-hotplug_handler, NULL);
 }
 
 static char *default_bus_get_fw_dev_path(DeviceState *dev)
diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
index f2043a6..684a5da 100644
--- a/include/hw/qdev-core.h
+++ b/include/hw/qdev-core.h
@@ -8,6 +8,7 @@
 #include qom/object.h
 #include hw/irq.h
 #include qapi/error.h
+#include hw/hotplug.h
 
 enum {
 DEV_NVECTORS_UNSPECIFIED = -1,
@@ -169,14 +170,18 @@ typedef struct BusChild {
 QTAILQ_ENTRY(BusChild) sibling;
 } BusChild;
 
+#define QDEV_HOTPLUG_HANDLER_PROPERTY hotplug-handler
+
 /**
  * BusState:
+ * @hotplug_device: link to a hotplug device associated with bus.
  */
 struct BusState {
 Object obj;
 DeviceState *parent;
 const char *name;
 int allow_hotplug;
+HotplugHandler *hotplug_handler;
 int max_index;
 QTAILQ_HEAD(ChildrenHead, BusChild) children;
 QLIST_ENTRY(BusState) sibling;
-- 
1.8.3.1

Re: [Qemu-devel] [PATCH v2 1/8] target-arm: A64: add support for ld/st pair

2013-12-13 Thread Peter Maydell

On 12 December 2013 12:14, C Fontana claudio.font...@linaro.org wrote:
 I think that there is more than the missing return:

 we need to handle the case 0 as well, as it's a perfectly valid form
 of a load/store pair:
 it's the Load/Store no-allocate pair (offset) (LDNP, STNP).

 So in my view we need to add a case 0 where we handle  the load/store
 no-allocate pair,
 and no default case, or a default case where we assert(0) for unreachable 
 code,
 since all possible index values (2 bits) should be handled by the
 switch statement.

Yep. I've added support for the non-temporal versions
(pretty easy since qemu doesn't care about cache allocation
hints) to my patchstack so we'll get this in the next
version of the patchset.

thanks
-- PMM

[Qemu-devel] How to Learn Qemu for Development?

2013-12-13 Thread Hossein Zabolzadeh

Dear All,
Hi,
I traverse qemu website, to find out  some useful documents for learning qemu 
basics, and concepts. But, all available documents require so in-depth 
technical knowledge and no one found to be self-instructive.
I need a book or any other tutorial-like resources, to learn Qemu and its 
cohesion with linux kernel.
Thanks in advanced.

Re: [Qemu-devel] [PULL 00/30] virtio conversion to realize and hotplug/unplug fixes

2013-12-13 Thread Paolo Bonzini

Il 11/12/2013 08:45, Paolo Bonzini ha scritto:
  Anthony,
 
  the following changes since commit 
  7dc65c02fe3fb8f3146ce0b9ff5fec5945329f0e:
 
Open 2.0 development tree (2013-11-27 14:02:45 -0800)
  
  This also conflicts badly.
 This doesn't conflict here either.  What files are conflicting exactly?

FWIW, I tried again merging and it all went smoothly:

$ git pull  git://github.com/bonzini/qemu.git virtio
Da git://github.com/bonzini/qemu
 * branchvirtio - FETCH_HEAD
Auto-merging hw/net/virtio-net.c
Merge made by the 'recursive' strategy.
 hw/9pfs/virtio-9p-device.c  |  43 +++---
 hw/block/dataplane/virtio-blk.c |  30 +-
 hw/block/dataplane/virtio-blk.h |   5 +-
 hw/block/virtio-blk.c   |  47 ++--
 hw/char/virtio-serial-bus.c |  33 +--
 hw/net/virtio-net.c |  36 ++--
 hw/s390x/virtio-ccw.c   |  83 +++
 hw/s390x/virtio-ccw.h   |   1 -
 hw/scsi/vhost-scsi.c|  45 +++
 hw/scsi/virtio-scsi.c   |  57 +--
 hw/virtio/virtio-balloon.c  |  32 +--
 hw/virtio/virtio-bus.c  |  80 +++---
 hw/virtio/virtio-mmio.c |   9 +--
 hw/virtio/virtio-pci.c  | 122 
 hw/virtio/virtio-pci.h  |   1 -
 hw/virtio/virtio-rng.c  |  43 +++---
 hw/virtio/virtio.c  |  40 -
 include/hw/virtio/virtio-bus.h  |  22 +---
 include/hw/virtio/virtio-rng.h  |   2 +
 include/hw/virtio/virtio-scsi.h |   4 +-
 include/hw/virtio/virtio.h  |   8 ++-
 tests/qdev-monitor-test.c   |   4 +-
 22 files changed, 413 insertions(+), 334 deletions(-)

$ git pull  git://github.com/bonzini/qemu.git scsi-next
Da git://github.com/bonzini/qemu
 * branchscsi-next  - FETCH_HEAD
Auto-merging qemu-options.hx
Auto-merging hw/scsi/scsi-bus.c
Auto-merging configure
Merge made by the 'recursive' strategy.
 block/iscsi.c   | 401 ++--
 configure   |   6 +-
 hw/scsi/scsi-bus.c  |  14 +-
 hw/scsi/scsi-disk.c |  29 ++--
 qemu-options.hx |   2 +-
 5 files changed, 143 insertions(+), 309 deletions(-)

So I'm not going to resend these series.  In particular the beginning of
the virtio series needs to be backported to 1.7.1 (it fixes a crash), which
is why I based it on an earlier commit than usual.

The SCSI patches might conflict with Kevin's 512-on-4K changes so I might
have to resend that one anyway, but that's a different story and I'll get
there when it happens. :)

Please confirm that it works for you, or describe your breakage.  I'm using
git 1.8.4.2 in case that matters.

Paolo

Re: [Qemu-devel] [PATCH v3 04/21] qapi: extend qdict_flatten() for QLists

2013-12-13 Thread Max Reitz


On 12.12.2013 07:17, Fam Zheng wrote:

On 2013年12月12日 02:10, Max Reitz wrote:

Reversing qdict_array_split(), qdict_flatten() should flatten QLists as
well by interpreting them as QDicts where every entry's key is its
index.

This allows bringing QDicts with QLists from QMP commands to the same
form as they would be given as command-line options, thereby allowing
them to be parsed the same way.

Signed-off-by: Max Reitz mre...@redhat.com
---


The logic looks fine, just a few questions about assertions below.


qobject/qdict.c | 45 +
1 file changed, 41 insertions(+), 4 deletions(-)

diff --git a/qobject/qdict.c b/qobject/qdict.c
index fca1902..d7ce4b3 100644
--- a/qobject/qdict.c
+++ b/qobject/qdict.c
@@ -477,7 +477,40 @@ static void qdict_destroy_obj(QObject *obj)
g_free(qdict);
}

-static void qdict_do_flatten(QDict *qdict, QDict *target, const char 
*prefix)

+static void qdict_flatten_qdict(QDict *qdict, QDict *target,
+ const char *prefix);
+
+static void qdict_flatten_qlist(QList *qlist, QDict *target, const 
char *prefix)

+{
+ QObject *value;
+ const QListEntry *entry;
+ char *new_key;
+ int i;
+
+ /* This function is never called with prefix == NULL, i.e., it is 
always
+ * called from within qdict_flatten_q(list|dict)(). Therefore, it 
does not
+ * need to remove list entries during the iteration (the whole list 
will be
+ * deleted eventually anyway from qdict_flatten_qdict()). Also, 
prefix can

+ * never be NULL. */


Then maybe assert this?


Yes, I'll do that.




+
+ entry = qlist_first(qlist);
+
+ for (i = 0; entry; entry = qlist_next(entry), i++) {
+ value = qlist_entry_obj(entry);
+
+ qobject_incref(value);
+ new_key = g_strdup_printf(%s.%i, prefix, i);
+ qdict_put_obj(target, new_key, value);
+
+ if (qobject_type(value) == QTYPE_QDICT) {
+ qdict_flatten_qdict(qobject_to_qdict(value), target, new_key);
+ } else {


And maybe assert the type are not something other than dict and list?


I think I rather forgot to support other types. Those should be moved to 
the target QDict, just as qdict_flatten_qdict() does it.





+ qdict_flatten_qlist(qobject_to_qlist(value), target, new_key);
+ }
+ }
+}
+
+static void qdict_flatten_qdict(QDict *qdict, QDict *target, const 
char *prefix)

{
QObject *value;
const QDictEntry *entry, *next;
@@ -500,8 +533,12 @@ static void qdict_do_flatten(QDict *qdict, QDict 
*target, const char *prefix)

if (qobject_type(value) == QTYPE_QDICT) {
/* Entries of QDicts are processed recursively, the QDict object
* itself disappears. */
- qdict_do_flatten(qobject_to_qdict(value), target,
- new_key ? new_key : entry-key);
+ qdict_flatten_qdict(qobject_to_qdict(value), target,
+ new_key ? new_key : entry-key);
+ delete = true;
+ } else if (qobject_type(value) == QTYPE_QLIST) {
+ qdict_flatten_qlist(qobject_to_qlist(value), target,
+ new_key ? new_key : entry-key);
delete = true;
} else if (prefix) {
/* All other objects are moved to the target unchanged. */
@@ -531,7 +568,7 @@ static void qdict_do_flatten(QDict *qdict, QDict 
*target, const char *prefix)

*/
void qdict_flatten(QDict *qdict)
{
- qdict_do_flatten(qdict, qdict, NULL);
+ qdict_flatten_qdict(qdict, qdict, NULL);


If there was an assumption that the top level is a qdict, also assert 
that here?


Well, the type is already QDict *, therefore I think we are safe to 
assume it is indeed one – if that is what you meant. ;-)


The question to me would be whether we should be supporting QLists here, 
too. Since this would make qdict_flatten_qlist() more complex, I think 
we should not do that until it is required.


Thank you,

Max

Re: [Qemu-devel] [PATCH v3 04/21] qapi: extend qdict_flatten() for QLists

2013-12-13 Thread Max Reitz


On 12.12.2013 07:21, Fam Zheng wrote:

On 2013年12月12日 02:10, Max Reitz wrote:

Reversing qdict_array_split(), qdict_flatten() should flatten QLists as
well by interpreting them as QDicts where every entry's key is its
index.

This allows bringing QDicts with QLists from QMP commands to the same
form as they would be given as command-line options, thereby allowing
them to be parsed the same way.

Signed-off-by: Max Reitz mre...@redhat.com
---
qobject/qdict.c | 45 +
1 file changed, 41 insertions(+), 4 deletions(-)

diff --git a/qobject/qdict.c b/qobject/qdict.c
index fca1902..d7ce4b3 100644
--- a/qobject/qdict.c
+++ b/qobject/qdict.c
@@ -477,7 +477,40 @@ static void qdict_destroy_obj(QObject *obj)
g_free(qdict);
}

-static void qdict_do_flatten(QDict *qdict, QDict *target, const char 
*prefix)

+static void qdict_flatten_qdict(QDict *qdict, QDict *target,
+ const char *prefix);
+
+static void qdict_flatten_qlist(QList *qlist, QDict *target, const 
char *prefix)

+{
+ QObject *value;
+ const QListEntry *entry;
+ char *new_key;
+ int i;
+
+ /* This function is never called with prefix == NULL, i.e., it is 
always
+ * called from within qdict_flatten_q(list|dict)(). Therefore, it 
does not
+ * need to remove list entries during the iteration (the whole list 
will be
+ * deleted eventually anyway from qdict_flatten_qdict()). Also, 
prefix can

+ * never be NULL. */
+
+ entry = qlist_first(qlist);
+
+ for (i = 0; entry; entry = qlist_next(entry), i++) {
+ value = qlist_entry_obj(entry);
+
+ qobject_incref(value);
+ new_key = g_strdup_printf(%s.%i, prefix, i);
+ qdict_put_obj(target, new_key, value);
+
+ if (qobject_type(value) == QTYPE_QDICT) {
+ qdict_flatten_qdict(qobject_to_qdict(value), target, new_key);
+ } else {
+ qdict_flatten_qlist(qobject_to_qlist(value), target, new_key);
+ }
+ }
+}
+
+static void qdict_flatten_qdict(QDict *qdict, QDict *target, const 
char *prefix)


Sorry for replying twice on the same patch. But it would be nice if 
you could add a few comments on this function with some examples, as 
you did with qdict_array_split, I appreciated that. :)


Okay.

Max

Re: [Qemu-devel] [PATCH v3 08/21] block: Allow reference for bdrv_file_open()

2013-12-13 Thread Max Reitz


On 12.12.2013 09:01, Fam Zheng wrote:

On 2013年12月12日 02:11, Max Reitz wrote:

Allow specifying a reference to an existing block device (by name) for
bdrv_file_open() instead of a filename and/or options.

Signed-off-by: Max Reitz mre...@redhat.com
---
block.c | 27 ---
block/blkdebug.c | 2 +-
block/blkverify.c | 2 +-
block/cow.c | 3 ++-
block/qcow.c | 3 ++-
block/qcow2.c | 2 +-
block/qed.c | 4 ++--
block/sheepdog.c | 4 ++--
block/vhdx.c | 2 +-
block/vmdk.c | 4 ++--
include/block/block.h | 3 ++-
qemu-io.c | 2 +-
12 files changed, 41 insertions(+), 17 deletions(-)

diff --git a/block.c b/block.c
index 13f001a..9e197b3 100644
--- a/block.c
+++ b/block.c
@@ -858,9 +858,10 @@ free_and_fail:
* dictionary, it needs to use QINCREF() before calling bdrv_file_open.
*/
int bdrv_file_open(BlockDriverState **pbs, const char *filename,
- QDict *options, int flags, Error **errp)
+ const char *reference, QDict *options, int flags,
+ Error **errp)
{
- BlockDriverState *bs;
+ BlockDriverState *bs = NULL;
BlockDriver *drv;
const char *drvname;
bool allow_protocol_prefix = false;
@@ -872,6 +873,26 @@ int bdrv_file_open(BlockDriverState **pbs, const 
char *filename,

options = qdict_new();
}

+ if (reference) {
+ if (filename || qdict_size(options)) {
+ error_setg(errp, Cannot reference an existing block device with 
+ additional options or a new filename);
+ ret = -EINVAL;
+ goto fail;
+ }
+ QDECREF(options);


QDECREF called...


+
+ bs = bdrv_find(reference);
+ if (!bs) {
+ error_setg(errp, Cannot find block device '%s', reference);
+ ret = -ENODEV;
+ goto fail;


Jump to fail for QDECREF again. Duplicated?


Oh, right, thanks.

Max

Re: [Qemu-devel] [PATCH] qemu-img: make progress output more accurate during convert

2013-12-13 Thread Kevin Wolf

Am 05.12.2013 um 15:54 hat Peter Lieven geschrieben:
 the progress output is very bumpy if the input images contains
 a significant portion of unallocated sectors. This patch
 checks how much sectors are allocated a priori if progress
 output is selected.
 
 Signed-off-by: Peter Lieven p...@kamp.de

Thanks, applied to the block branch.

Kevin

Re: [Qemu-devel] [PATCH v3 21/21] iotests: Test new blkdebug/blkverify interface

2013-12-13 Thread Max Reitz


On 12.12.2013 11:41, Fam Zheng wrote:

On 2013年12月12日 02:11, Max Reitz wrote:

Add a test for the new blkdebug/blkverify interface.

Signed-off-by: Max Reitz mre...@redhat.com
---
  tests/qemu-iotests/071 | 201 
+

  tests/qemu-iotests/071.out |  73 
  tests/qemu-iotests/group   |   1 +
  3 files changed, 275 insertions(+)
  create mode 100755 tests/qemu-iotests/071
  create mode 100644 tests/qemu-iotests/071.out

diff --git a/tests/qemu-iotests/071 b/tests/qemu-iotests/071
new file mode 100755
index 000..4be525e
--- /dev/null
+++ b/tests/qemu-iotests/071
@@ -0,0 +1,201 @@
+#!/bin/bash


Is is intended to use hand coded json instead of iotests.py for QMP 
test here?


I first tried to use Python, but I hit problems with 
human-monitor-command and qemu-io. qemu-io prints errors to stderr, 
therefore they are not returned via QMP. iotests.py discards stderr, 
therefore there was no way to actually check for the qemu-io return status.


I could have added a function to iotests.py, but then I would have to 
manually compare and filter the qemu-io output with a reference. All 
these functions are already there for shell script tests.


I guess I should add this explanation to the commit message, though…

Max

Re: [Qemu-devel] detecting -enable-fips

2013-12-13 Thread Eric Blake

On 12/05/2013 02:04 PM, Eric Blake wrote:
 Commit 0f66998 added the command line option -enable-fips for qemu 1.2;
 but as of at least qemu 1.6, the 'query-command-line-options' QMP
 monitor command does not report it.  This is particularly annoying since
 the command line option is conditional - it is present in Linux builds
 but absent in BSD builds.  Does anyone know of any other QMP method for
 querying if this command line option is supported?  Or am I just
 relegated to trying it and seeing if the option gets rejected?
 
 [I'm personally of the opinion that libvirt should use -enable-fips 100%
 of the time; I don't really see what it is buying us to have an option
 that can be enabled but not disabled, and where enabling it has no
 impact except when running in FIPS mode; especially when the other
 libraries in use on the system already honor FIPS mode without any extra
 command line option.  But I'm not going to be the one to argue for a
 change in behavior other than the mere detection of the option.]

Ping.  Any thoughts at all on how to detect boolean command-line options
via QMP?

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [libvirt] [PATCH] qemu: always ask for -enable-fips

2013-12-13 Thread Eric Blake

On 12/13/2013 08:06 AM, Jiri Denemark wrote:
 On Fri, Dec 13, 2013 at 15:58:55 +0100, Michal Privoznik wrote:
 On 05.12.2013 22:54, Eric Blake wrote:
 On a system that is enforcing FIPS, most libraries honor the
 current mode by default.  Qemu, on the other hand, refused to
 honor FIPS mode unless you add the '-enable-fips' command
 line option; worse, this option is not discoverable via QMP,
 and is only present on binaries built for Linux.  As far as
 I can tell, unconditionally using the option when it is
 available has no negative consequences (the option has no
 change to qemu behavior except when FIPS is enabled, at which
 point it cripples insecure VNC passwords which is the one thing
 that libvirt must not allow when FIPS is active).

 This fixes https://bugzilla.redhat.com/show_bug.cgi?id=1035474

 Sigh, oh boy, your favorite swear-word. ACK.
 
 Don't we want to wait for QEMU to decide what they should be doing with
 -enable-fips to make it detectable?

I tried, and got no response:
https://lists.gnu.org/archive/html/qemu-devel/2013-12/msg00946.html

adding qemu-devel in CC for another try.

 If we push this patch, we can't
 basically move into detecting the option and enabling it only when
 detected since that could cause regressions for older QEMU version that
 supported the option but did not advertise it.

Not necessarily.  We can code things along these lines:
if qemu new enough to provide binary option detection:
use detection results, use if present
else:
if Linux:
hard-code on
else:
assume unavailable

 If we just wait for the
 option to be detectable and enable it only when we detect its support in
 QEMU, we won't enable it for all possible QEMU versions but we won't
 regress in any way.

I'm not worried about regressions - if someone backports binary option
detection, we'll do the right thing; if they don't, then trying to the
option on a build where it is not present will give a nice loud failure
from qemu about an unrecognized command line option, which we would get
anyways even if binary option detection is not added in qemu and our
hard-code guess is wrong.  I'd rather have libvirt turn the option on
NOW (since it is a FIPS certification nightmare if we don't) than wait
for some future qemu to fix binary option detection and hope it gets
backported to any version of qemu that libvirt must drive in a FIPS
environment.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH] blkdebug: Use QLIST_FOREACH_SAFE to resume IO

2013-12-13 Thread Kevin Wolf

Am 13.12.2013 um 08:25 hat Fam Zheng geschrieben:
 Qemu-iotest 030 was broken.
 
 When the coroutine runs and finishes, it will remove itself from the req
 list, so let's use safe version of foreach to avoid use after free.
 
 Signed-off-by: Fam Zheng f...@redhat.com

Thanks, applied to the block branch.

 diff --git a/block/blkdebug.c b/block/blkdebug.c
 index 37cf028..957be2c 100644
 --- a/block/blkdebug.c
 +++ b/block/blkdebug.c
 @@ -594,9 +594,9 @@ static int blkdebug_debug_breakpoint(BlockDriverState 
 *bs, const char *event,
  static int blkdebug_debug_resume(BlockDriverState *bs, const char *tag)
  {
  BDRVBlkdebugState *s = bs-opaque;
 -BlkdebugSuspendedReq *r;
 +BlkdebugSuspendedReq *r, *next;
  
 -QLIST_FOREACH(r, s-suspended_reqs, next) {
 +QLIST_FOREACH_SAFE(r, s-suspended_reqs, next, next) {
  if (!strcmp(r-tag, tag)) {
  qemu_coroutine_enter(r-co, NULL);
  return 0;

This hunk wasn't strictly necessary because of the return 0, but it
doesn't hurt either.

Kevin

1 2 3 >

1 - 100 of 224 matches

Mail list logo