Re: [Qemu-devel] [PULL 2/8] i.MX: Implement a more complete version of the GPT timer.

2013-06-25 Thread Paolo Bonzini
Il 25/06/2013 22:53, Peter Maydell ha scritto:
> On 25 June 2013 19:42, Paolo Bonzini  wrote:
>> Il 25/06/2013 20:21, Peter Maydell ha scritto:
>>> @@ -416,7 +513,7 @@ static int imx_timerg_init(SysBusDevice *dev)
>>>
>>>  sysbus_init_irq(dev, &s->irq);
>>>  memory_region_init_io(&s->iomem, &imx_timerg_ops,
>>> -  s, "imxg-timer",
>>> +  s, TYPE_IMX_GPT,
>>>0x1000);
>>>  sysbus_init_mmio(dev, &s->iomem);
>>>
>>
>> There was some agreement that this is not a good change.
> 
> I agree (and more so regarding the use of the macro in the
> vmstate name), but nobody actually posted any comment to
> that effect against any of the versions of this patch that
> got sent out for review...

Yeah, the timing was bad... Can you post a revert, though?

Paolo



Re: [Qemu-devel] [PATCH v3 1/2] net: introduce RX_FILTER_CHANGED event

2013-06-25 Thread Markus Armbruster
Eric Blake  writes:

> On 05/23/2013 03:07 AM, Amos Kong wrote:
>> Introduce this new QMP event to notify management after guest changes
>> rx-filter configuration.
>> 
>> Signed-off-by: Amos Kong 
>> ---
>>  QMP/qmp-events.txt| 14 ++
>>  include/monitor/monitor.h |  1 +
>>  monitor.c |  1 +
>>  3 files changed, 16 insertions(+)
>> 
>> diff --git a/QMP/qmp-events.txt b/QMP/qmp-events.txt
>> index 92fe5fb..ad6612b 100644
>> --- a/QMP/qmp-events.txt
>> +++ b/QMP/qmp-events.txt
>> @@ -154,6 +154,20 @@ Data:
>>  "path": "/machine/peripheral/virtio-net-pci-0" },
>>"timestamp": { "seconds": 1265044230, "microseconds": 450486 } }
>>  
>> +RX_FILTER_CHANGED
>> +-
>
>> +
>>  DEVICE_TRAY_MOVED
>>  -
>
> Isn't this file supposed to be kept in sorted order, to minimize merge
> conflicts when backporting events?

Yes, please.  Also helps humans navigate.

GUEST_PANICKED is out of order, needs fixing.



Re: [Qemu-devel] block: Review of .has_zero_init use

2013-06-25 Thread Bharata B Rao
On Wed, Jun 26, 2013 at 07:59:27AM +0200, Peter Lieven wrote:
> 
> Am 26.06.2013 um 05:14 schrieb Bharata B Rao :
> 
> > On Tue, Jun 25, 2013 at 01:39:11PM +0200, Kevin Wolf wrote:
> >> 
> >> Can you please review for the gluster, rbd, sheepdog and ssh driver
> >> whether it's safe to assume that the image reads back as zeros after
> >> bdrv_create?
> > 
> > Gluster supports both file and block backends. While the above is true for
> > file backend (which uses ftruncate), the same is not true for
> > block backend (which uses lvcreate & lvresize).
> > 
> > So overall it is not safe to assume that an image on GlusterFS volume
> > reads back as zeroes after create.
> 
> Okay, so for safety we have to return has_zero_init = 0. Erroneously assuming
> a device is zero initialized can bring severe filesystem corruption.
> 
> Do you see a way to query the information of the underlaying
> backend from the storage and return 1 or 0 conditionally?

Right now no. There have been efforts to get the capabilities (file, block etc)
of gluster volume from gluster cmdline, but there haven't been discussions
about providing such capabilities from libgfapi which is the library
QEMU uses to talk with gluster.

I see has_zero_init being used from qemu-img convert path. Is there any
other use of this in QEMU ?

Regards,
Bharata. 




Re: [Qemu-devel] [PATCH v11 14/15] rdma: introduce MIG_STATE_NONE and change MIG_STATE_SETUP state transition

2013-06-25 Thread Paolo Bonzini
Il 26/06/2013 02:31, Michael R. Hines ha scritto:
> On 06/25/2013 05:06 PM, Paolo Bonzini wrote:
>> Il 25/06/2013 22:56, Michael R. Hines ha scritto:
>>> I was wrong - this does require a protocol extension.
>>>
>>> This is because the RDMA transfers are asynchronous, and thus
>>> we cannot know in advance that it is safe to unregister the memory
>>> associated with each individual transfer before the transfer actually
>>> completes.
>>>
>>> While the destination currently uses the protocol to participate in
>>> *registering* the page, the destination does not participate in the
>>> RDMA transfers themselves, only the source does, and thus would
>>> require a new exchange of messages to block and instruct the
>>> destination to unpin the memory.
>> Yes, that's what I recalled too (really what mst told me :)).  Does it
>> need to be blocking though?  As long as the pinning is blocking, and
>> messages are processed in order, the source can proceed immediately
>> after sending an unpin message.  This assumes of course that the chunk
>> is not being transmitted, and I am not sure how easy the source can
>> determine that.
> 
> No, they're not processed in order. In fact, not only does the device
> write out of order, but also the PCI bus writes out of order.
> This was such a problem in fact, that I fixed several bugs as a result
> a few weeks ago (v7 of the patch with an in-depth description).
> 
> The destination simply cannot assume whatsoever what the ordering
> of the writes are - that's really the whole point of using RDMA in the
> first place so that the software can get out of the way of the transfer
> process to lower the latency of each transfer.

The memory is processed out of order, but what about the messages?
Those must be in order.

Note that I wrote above "This assumes of course that the chunk is not
being transmitted".  Can the source know when an asynchronous transfer
finished, and delay the unpinning until that time?

Paolo

> 
> The only option is to send a blocking message to the other side to
> request the unpinning (in addition to unpinning on the source first upon
> completion of the original transfer).
> 
> As you can expect, this would be very expensive and we must ensure
> that we have *very* good a-priori information that this memory will
> not need to be re-registered anytime in the near future.
> 
> - Michael
> 
> 
> 




Re: [Qemu-devel] [PATCH 0/3] make bh safe with hot-unplug

2013-06-25 Thread Paolo Bonzini
Il 26/06/2013 04:59, liu ping fan ha scritto:
>> The latter part could be the hard one in a multi-threaded context, but I
>> think it's up to the device to ensure it.  It doesn't _have_ to be hard.
>>  For example, joining the data-plane thread would do that as well.
>>
> It seems not easy, take consideration of the sequence:
> _bh_delete(), // ensure not scheduled, ie, no further access to 
> DeviceState
> wait for rcu grace period to come, // ensure not running
> free DeviceState in rcu-reclaimer
> But  in current code, _delete() can be placed in DeviceState
> finalization path(in reclaimer), which means while reclaiming, there
> still exist access path to the DeviceState.

It can be placed there, but it doesn't mean it is a good idea. :)

> On the other hand, pushing _delete() out of  finalization path is not
> easy, since we do not what time the DeviceState has done with its bh.

See above:

- if the BH will run in the iothread, the BH is definitely not running
(because the BH runs under the BQL, and the reclaimer owns it)

- if the BH is running in another thread, waiting for that thread to
terminate obviously makes the BH not running.

What we need is separation of removal and reclamation.  Without that,
any effort to make things unplug-safe is going to be way way more
complicated than necessary.

Paolo



Re: [Qemu-devel] [PATCH v4 06/10] qemu-ga: Add Windows VSS provider to quiesce applications on fsfreeze

2013-06-25 Thread Paolo Bonzini
Il 26/06/2013 00:31, Tomoki Sekiyama ha scritto:
> From: Paolo Bonzini [paolo.bonz...@gmail.com] on behalf of Paolo Bonzini 
> [pbonz...@redhat.com]
>> Il 06/06/2013 17:06, Tomoki Sekiyama ha scritto:
>>> +STDAPI VSSCheckOSVersion(void);
>>> +
>>> +STDAPI COMRegister(void);
>>> +STDAPI COMUnregister(void);
>>> +
>>> +STDAPI DllRegisterServer(void);
>>> +STDAPI DllUnregisterServer(void);
>>
>> Can you explain the difference between COMRegister/COMUnregister and
>> DllRegisterServer/DllUnregisterServer (and why the COM+ part need not be
>> done by regsvr32)?  Also, why does COMUnregister call
>> DllUnregisterServer but COMRegister does not call DllRegisterServer?
> 
> COMRegister and COMUnregister are called by`qemu-ga -s install`,
> to register/unregister the DLL into/from COM+ application catalogue.
> 
> DllRegisterServer is automatically called inside
> pCatalog->InstallComponent() (like regsvr32 does), and register
> this DLL as a VSS provider. DllUnregisterServer will do the oposite.
> 
> ICOMAdminCatalog (pCatalog) does not provide a method to uninstall
> component, so COMUnregister calls DllUnregisterServer by itself.

Understood, thanks.  Just one question remains: why is the COM+ part not
needed when you invoke regsvr32?

Thanks,

Paolo

>> Also, is it needed to call VSSCheckOSVersion from the requestor?  I
>> would think that checking VSSAPI.DLL is stronger than checking the
>> version, and indeed you do that check too.
> 
> In Windows XP, VSSAPI.DLL exists but it has different functionality
> and interfaces from newer Windows. 
> http://msdn.microsoft.com/en-us/library/windows/desktop/aa384627(v=vs.85).aspx
> 
> It is checking the OS version because this patchset only supports
> Windows 2003 or later.
> 
> Thanks,
> Tomoki Sekiyama
> 




Re: [Qemu-devel] block: Review of .has_zero_init use

2013-06-25 Thread Peter Lieven

Am 26.06.2013 um 05:14 schrieb Bharata B Rao :

> On Tue, Jun 25, 2013 at 01:39:11PM +0200, Kevin Wolf wrote:
>> 
>> Can you please review for the gluster, rbd, sheepdog and ssh driver
>> whether it's safe to assume that the image reads back as zeros after
>> bdrv_create?
> 
> Gluster supports both file and block backends. While the above is true for
> file backend (which uses ftruncate), the same is not true for
> block backend (which uses lvcreate & lvresize).
> 
> So overall it is not safe to assume that an image on GlusterFS volume
> reads back as zeroes after create.

Okay, so for safety we have to return has_zero_init = 0. Erroneously assuming
a device is zero initialized can bring severe filesystem corruption.

Do you see a way to query the information of the underlaying
backend from the storage and return 1 or 0 conditionally?

Peter



[Qemu-devel] meaningless to compare irqfd's msi message with new msi message in virtio_pci_vq_vector_unmask

2013-06-25 Thread Zhanghaoyu (A)
I searched "vector_irqfd" globally,  no place found to set/change irqfd's msi 
message, only irqfd's virq or users member may be changed in 
kvm_virtio_pci_vq_vector_use, 
kvm_virtio_pci_vq_vector_release, etc.
So I think it's meaningless to do below check in virtio_pci_vq_vector_unmask,
if (irqfd->msg.data != msg.data || irqfd->msg.address != msg.address)

And, I think the comparison between old msi message and new msi messge should 
be performed in kvm_update_routing_entry, the raw patch shown as below,
Signed-off-by: Zhang Haoyu 
Signed-off-by: Zhang Huanzhong 
---
 hw/virtio/virtio-pci.c |8 +++-
 kvm-all.c  |5 +
 2 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index b070b64..e4829a3 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -613,11 +613,9 @@ static int virtio_pci_vq_vector_unmask(VirtIOPCIProxy 
*proxy,
 
 if (proxy->vector_irqfd) {
 irqfd = &proxy->vector_irqfd[vector];
-if (irqfd->msg.data != msg.data || irqfd->msg.address != msg.address) {
-ret = kvm_irqchip_update_msi_route(kvm_state, irqfd->virq, msg);
-if (ret < 0) {
-return ret;
-}
+ret = kvm_irqchip_update_msi_route(kvm_state, irqfd->virq, msg);
+if (ret < 0) {
+return ret;
 }
 }
 
diff --git a/kvm-all.c b/kvm-all.c
index e6b262f..63a33b4 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -1034,6 +1034,11 @@ static int kvm_update_routing_entry(KVMState *s,
 continue;
 }
 
+if (entry->type == new_entry->type &&
+entry->flags == new_entry->flags &&
+!memcmp(&entry->u, &new_entry->u, sizeof(entry->u))) {
+return 0;
+}
 entry->type = new_entry->type;
 entry->flags = new_entry->flags;
 entry->u = new_entry->u;
-- 
1.7.3.1.msysgit.0


This patch works for both virtio-pci device and pci-passthrough device.
MST and I had been discussed this patch before, this patch can avoid 
meaninglessly updating the routing entry in kvm hypervisor when new msi message 
is identical with old msi message, 
especially in some cases, for example, frequently mask/unmask per-vector 
masking control bit in ISR on some old linux guest(e.g., rhel-5.5), which gains 
much.
At MST's request, the number will be provided later.

Thanks,
Zhang Haoyu



[Qemu-devel] [RFC PATCH 3/3] nbd: don't get ref if bs has no drive

2013-06-25 Thread Fam Zheng
Signed-off-by: Fam Zheng 
---
 blockdev-nbd.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/blockdev-nbd.c b/blockdev-nbd.c
index 95f10c8..2657a90 100644
--- a/blockdev-nbd.c
+++ b/blockdev-nbd.c
@@ -81,6 +81,7 @@ void qmp_nbd_server_add(const char *device, bool 
has_writable, bool writable,
 BlockDriverState *bs;
 NBDExport *exp;
 NBDCloseNotifier *n;
+DriveInfo *dinfo;
 
 if (server_fd == -1) {
 error_setg(errp, "NBD server not running");
@@ -109,7 +110,10 @@ void qmp_nbd_server_add(const char *device, bool 
has_writable, bool writable,
  nbd_server_put_ref);
 
 nbd_export_set_name(exp, device);
-drive_get_ref(drive_get_by_blockdev(bs));
+dinfo = drive_get_by_blockdev(bs);
+if (dinfo) {
+drive_get_ref(dinfo);
+}
 
 n = g_malloc0(sizeof(NBDCloseNotifier));
 n->n.notify = nbd_close_notifier;
-- 
1.8.3.1




[Qemu-devel] [RFC PATCH 0/3] Point-in-time snapshot exporting with drive-backup

2013-06-25 Thread Fam Zheng
This series adds partial support for point-in-time snapshot NBD exporting based
on drive-backup. The ideas is described below and patches followed (the missing
part is item 3, which work is in progress by Ian Main and will have another
patch on it). As the work does not overlap, these series should be quite
reviewable by itself.

Background
==

The goal of image fleecing is to provide a interface to inspect a point-in-time
snapshot of guest image data, not being interfered with guest overwrites after
it's created.  With drive-backup we already have the point-in-time snapshot
image (the target image), we only need three modifications to realize this:

 1. Give backup target an id, so we can add it to NBD server.

 2. Assign source device as backing of target, so reading the unallocated will
be passed to source.
As there's copy-on-write mechanism with drive-backup job, all the modified
data after snapshot created is copied to target, the unallocated data is
guaranteed to be unchanged, so reading from the source is correct. Note
that this requires target format supports backing file.

 3. Adding sync mode 'none' to drive-backup, so the block job only copy changed
data from source, which has minimal IO overhead.

Usage
=

With above three, we can simply export a point-in-time snapshot with two QMP 
commands:

drive-backup device=virtio0 format=qcow2 target=point-in-time.qcow2 
target-id=pit0 sync=none

nbd-server-add device=pit0 writable=no

Lifecycle
=

The block job originally had total control over target bs: when the job
completes, the target is deleted. Now it's shared with NBD, so we should make
sure the job completion or canceling wouldn't crash NBD server. This patch
doesn't handle this case: if the block job is ended before NBD exporting,
there'll be problem.

To avoid this, we either add more delicate condition checks as we do with
bs->in_use, or introduce reference count to BlockDriverState, which seems much
better in long term but needs more work. I have done some work on it, but it's
not ready yet. As we don't depend on it that heavily, let's do the things one
by one.

Any comments?

Fam Zheng (3):
  block: add target-id option to drive-backup QMP command
  block: assign backing relationship in drive-backup
  nbd: don't get ref if bs has no drive

 blockdev-nbd.c   | 6 +-
 blockdev.c   | 8 +++-
 qapi-schema.json | 7 +--
 qmp-commands.hx  | 3 ++-
 4 files changed, 19 insertions(+), 5 deletions(-)

-- 
1.8.3.1




[Qemu-devel] [RFC PATCH 2/3] block: assign backing relationship in drive-backup

2013-06-25 Thread Fam Zheng
Assign source image as the backing hd of target bs, so reading target bs
gets the point-in-time copy of data from source image.

Signed-off-by: Fam Zheng 
---
 blockdev.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/blockdev.c b/blockdev.c
index 5e694f3..a1d816e 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -1504,6 +1504,10 @@ void qmp_drive_backup(const char *device, const char 
*target,
 return;
 }
 
+target_bs->backing_hd = bs;
+pstrcpy(target_bs->backing_file, sizeof(target_bs->backing_file),
+bs->filename);
+
 backup_start(bs, target_bs, speed, on_source_error, on_target_error,
  block_job_cb, bs, &local_err);
 if (local_err != NULL) {
-- 
1.8.3.1




[Qemu-devel] [RFC PATCH 1/3] block: add target-id option to drive-backup QMP command

2013-06-25 Thread Fam Zheng
Add target-id (optional) to drive-backup command, to make the target bs
a named drive so that we can operate on it (e.g. export with NBD).

Signed-off-by: Fam Zheng 
---
 blockdev.c   | 4 +++-
 qapi-schema.json | 7 +--
 qmp-commands.hx  | 3 ++-
 3 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/blockdev.c b/blockdev.c
index b3a57e0..5e694f3 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -935,6 +935,7 @@ static void drive_backup_prepare(BlkTransactionState 
*common, Error **errp)
 backup = common->action->drive_backup;
 
 qmp_drive_backup(backup->device, backup->target,
+ backup->has_target_id, backup->target_id,
  backup->has_format, backup->format,
  backup->has_mode, backup->mode,
  backup->has_speed, backup->speed,
@@ -1420,6 +1421,7 @@ void qmp_block_commit(const char *device,
 }
 
 void qmp_drive_backup(const char *device, const char *target,
+  bool has_target_id, const char *target_id,
   bool has_format, const char *format,
   bool has_mode, enum NewImageMode mode,
   bool has_speed, int64_t speed,
@@ -1494,7 +1496,7 @@ void qmp_drive_backup(const char *device, const char 
*target,
 return;
 }
 
-target_bs = bdrv_new("");
+target_bs = bdrv_new(has_target_id ? target_id : "");
 ret = bdrv_open(target_bs, target, NULL, flags, drv);
 if (ret < 0) {
 bdrv_delete(target_bs);
diff --git a/qapi-schema.json b/qapi-schema.json
index 30b1edb..abd29c3 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -1646,7 +1646,8 @@
 # Since: 1.6
 ##
 { 'type': 'DriveBackup',
-  'data': { 'device': 'str', 'target': 'str', '*format': 'str',
+  'data': { 'device': 'str', 'target': 'str',
+'*target-id': 'str', '*format': 'str',
 '*mode': 'NewImageMode', '*speed': 'int',
 '*on-source-error': 'BlockdevOnError',
 '*on-target-error': 'BlockdevOnError' } }
@@ -1799,6 +1800,7 @@
 #  is a device, the existing file/device will be used as the new
 #  destination.  If it does not exist, a new file will be created.
 #
+# @target-id: #optional the drive id of the target.
 # @format: #optional the format of the new destination, default is to
 #  probe if @mode is 'existing', else the format of the source
 #
@@ -1825,7 +1827,8 @@
 # Since 1.6
 ##
 { 'command': 'drive-backup',
-  'data': { 'device': 'str', 'target': 'str', '*format': 'str',
+  'data': { 'device': 'str', 'target': 'str',
+'*target-id': 'str', '*format': 'str',
 '*mode': 'NewImageMode', '*speed': 'int',
 '*on-source-error': 'BlockdevOnError',
 '*on-target-error': 'BlockdevOnError' } }
diff --git a/qmp-commands.hx b/qmp-commands.hx
index 362f0e1..c90e132 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -913,7 +913,7 @@ EQMP
 
 {
 .name   = "drive-backup",
-.args_type  = "device:B,target:s,speed:i?,mode:s?,format:s?,"
+.args_type  = 
"device:B,target:s,target-id:s?,speed:i?,mode:s?,format:s?,"
   "on-source-error:s?,on-target-error:s?",
 .mhandler.cmd_new = qmp_marshal_input_drive_backup,
 },
@@ -936,6 +936,7 @@ Arguments:
 device, the existing file/device will be used as the new
 destination.  If it does not exist, a new file will be created.
 (json-string)
+- "target_id": the drive id of the target image.
 - "format": the format of the new destination, default is to probe if 'mode' is
 'existing', else the format of the source
 (json-string, optional)
-- 
1.8.3.1




Re: [Qemu-devel] [PATCH V2 5/7] monitor: support sub commands in auto completion

2013-06-25 Thread Wenchao Xia
于 2013-6-24 20:48, Wenchao Xia 写道:
> This patch allow auot completion work normal in sub command case,
> "info block [DEVICE]" can auto complete now, by re-enter the completion
> function. Also, original "info" is treated as a special case, now it is
> treated as a sub command group, global variable info_cmds is not used
> any more.
> 
> "help" command is still treated as a special case, since it is not a sub
> command group but want to auto complete command in root command table.
> 
> Signed-off-by: Wenchao Xia 
> ---
>   monitor.c |   36 
>   1 files changed, 24 insertions(+), 12 deletions(-)
> 
> diff --git a/monitor.c b/monitor.c
> index aa641de..f364a0d 100644
> --- a/monitor.c
> +++ b/monitor.c
> @@ -4179,10 +4179,11 @@ static const char *next_arg_type(const char *typestr)
>   return (p != NULL ? ++p : typestr);
>   }
> 
> -static void monitor_find_completion(Monitor *mon,
> -const char *cmdline)
> +static void monitor_find_completion_by_table(Monitor *mon,
> + const mon_cmd_t *cmd_table,
> + const char *cmdline)
>   {
> -const char *cmdname;
> +const char *cmdname, *p;
>   char *args[MAX_ARGS];
>   int nb_args, i, len;
>   const char *ptype, *str;
> @@ -4212,12 +4213,12 @@ static void monitor_find_completion(Monitor *mon,
>   else
>   cmdname = args[0];
>   readline_set_completion_index(mon->rs, strlen(cmdname));
> -for (cmd = mon->cmd_table; cmd->name != NULL; cmd++) {
> +for (cmd = cmd_table; cmd->name != NULL; cmd++) {
>   cmd_completion(mon, cmdname, cmd->name);
>   }
>   } else {
>   /* find the command */
> -for (cmd = mon->cmd_table; cmd->name != NULL; cmd++) {
> +for (cmd = cmd_table; cmd->name != NULL; cmd++) {
>   if (compare_cmd(args[0], cmd->name)) {
>   break;
>   }
> @@ -4226,6 +4227,17 @@ static void monitor_find_completion(Monitor *mon,
>   goto cleanup;
>   }
> 
> +/* locate next valid string in original cmdline used by re-enter */
> +p = cmdline + strlen(args[0]);
> +while (qemu_isspace(*p)) {
> +p++;
> +}
> +
  Here it can't handle command start with space such as "   blk", I plan
make parse_cmdline() return additional const char **args_cmdline, which
point to cmdline correspond to each args. Just want to
mention it to save reviewer's time, I'll fix it with any other
comments:).


> +if (cmd->sub_table) {
> +monitor_find_completion_by_table(mon, cmd->sub_table, p);
> +goto cleanup;
> +}
> +
>   ptype = next_arg_type(cmd->args_type);
>   for(i = 0; i < nb_args - 2; i++) {
>   if (*ptype != '\0') {
> @@ -4252,13 +4264,7 @@ static void monitor_find_completion(Monitor *mon,
>   bdrv_iterate(block_completion_it, &mbs);
>   break;
>   case 's':
> -/* XXX: more generic ? */
> -if (!strcmp(cmd->name, "info")) {
> -readline_set_completion_index(mon->rs, strlen(str));
> -for(cmd = info_cmds; cmd->name != NULL; cmd++) {
> -cmd_completion(mon, str, cmd->name);
> -}
> -} else if (!strcmp(cmd->name, "sendkey")) {
> +if (!strcmp(cmd->name, "sendkey")) {
>   char *sep = strrchr(str, '-');
>   if (sep)
>   str = sep + 1;
> @@ -4284,6 +4290,12 @@ cleanup:
>   }
>   }
> 
> +static void monitor_find_completion(Monitor *mon,
> +const char *cmdline)
> +{
> +return monitor_find_completion_by_table(mon, mon->cmd_table, cmdline);
> +}
> +
>   static int monitor_can_read(void *opaque)
>   {
>   Monitor *mon = opaque;
> 


-- 
Best Regards

Wenchao Xia




[Qemu-devel] [PATCH V3 8/9] hmp: add interface hmp_snapshot_delete_blkdev_internal

2013-06-25 Thread Wenchao Xia
It is hard to make both id and name optional in hmp console as qmp
interface, so this interface require user to specify name.

Signed-off-by: Wenchao Xia 
---
 hmp-commands.hx |   18 ++
 hmp.c   |   12 
 hmp.h   |1 +
 3 files changed, 31 insertions(+), 0 deletions(-)

diff --git a/hmp-commands.hx b/hmp-commands.hx
index 53e3ef3..b8d17dc 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -1053,6 +1053,24 @@ Take an internal snapshot on device if it support
 ETEXI
 
 {
+.name   = "snapshot_delete_blkdev_internal",
+.args_type  = "device:B,name:s,id:s?",
+.params = "device name [id]",
+.help   = "delete an internal snapshot of device.\n\t\t\t"
+  "If id is specified, qemu will try delete\n\t\t\t"
+  "the snapshot matching both id and name.\n\t\t\t"
+  "The format of the image used by device must\n\t\t\t"
+  "support it, such as qcow2.\n\t\t\t",
+.mhandler.cmd = hmp_snapshot_delete_blkdev_internal,
+},
+
+STEXI
+@item snapshot_delete_blkdev_internal
+@findex snapshot_delete_blkdev_internal
+Delete an internal snapshot on device if it support
+ETEXI
+
+{
 .name   = "drive_mirror",
 .args_type  = "reuse:-n,full:-f,device:B,target:s,format:s?",
 .params = "[-n] [-f] device target [format]",
diff --git a/hmp.c b/hmp.c
index d66ab20..c202d6d 100644
--- a/hmp.c
+++ b/hmp.c
@@ -922,6 +922,18 @@ void hmp_snapshot_blkdev_internal(Monitor *mon, const 
QDict *qdict)
 hmp_handle_error(mon, &errp);
 }
 
+void hmp_snapshot_delete_blkdev_internal(Monitor *mon, const QDict *qdict)
+{
+const char *device = qdict_get_str(qdict, "device");
+const char *name = qdict_get_str(qdict, "name");
+const char *id = qdict_get_try_str(qdict, "id");
+Error *errp = NULL;
+
+qmp_blockdev_snapshot_delete_internal_sync(device, !!id, id,
+   true, name, &errp);
+hmp_handle_error(mon, &errp);
+}
+
 void hmp_migrate_cancel(Monitor *mon, const QDict *qdict)
 {
 qmp_migrate_cancel(NULL);
diff --git a/hmp.h b/hmp.h
index fddf3a9..fa0e5d9 100644
--- a/hmp.h
+++ b/hmp.h
@@ -55,6 +55,7 @@ void hmp_balloon(Monitor *mon, const QDict *qdict);
 void hmp_block_resize(Monitor *mon, const QDict *qdict);
 void hmp_snapshot_blkdev(Monitor *mon, const QDict *qdict);
 void hmp_snapshot_blkdev_internal(Monitor *mon, const QDict *qdict);
+void hmp_snapshot_delete_blkdev_internal(Monitor *mon, const QDict *qdict);
 void hmp_drive_mirror(Monitor *mon, const QDict *qdict);
 void hmp_migrate_cancel(Monitor *mon, const QDict *qdict);
 void hmp_migrate_set_downtime(Monitor *mon, const QDict *qdict);
-- 
1.7.1





[Qemu-devel] [PATCH V3 6/9] qmp: add interface blockdev-snapshot-delete-internal-sync

2013-06-25 Thread Wenchao Xia
This interface use id and name as optional parameters, to handle the
case that one image contain multiple snapshots with same name which
may be '', but with different id.

Adding parameter id is for historical compatiability reason, and
that case is not possible in qemu's new interface for internal
snapshot at block device level, but still possible in qemu-img.

Signed-off-by: Wenchao Xia 
---
 blockdev.c   |   61 ++
 qapi-schema.json |   27 +++
 qmp-commands.hx  |   41 
 3 files changed, 129 insertions(+), 0 deletions(-)

diff --git a/blockdev.c b/blockdev.c
index ce89f83..35fffd6 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -790,6 +790,67 @@ void qmp_blockdev_snapshot_internal_sync(const char 
*device,
&snapshot, errp);
 }
 
+SnapshotInfo *qmp_blockdev_snapshot_delete_internal_sync(const char *device,
+ bool has_id,
+ const char *id,
+ bool has_name,
+ const char *name,
+ Error **errp)
+{
+BlockDriverState *bs = bdrv_find(device);
+QEMUSnapshotInfo sn;
+Error *local_err = NULL;
+SnapshotInfo *info = NULL;
+int ret;
+
+if (!bs) {
+error_set(errp, QERR_DEVICE_NOT_FOUND, device);
+return NULL;
+};
+
+if (!has_id) {
+id = NULL;
+}
+
+if (!has_name) {
+name = NULL;
+}
+
+if (!id && !name) {
+error_setg(errp, "Name or id must be provided");
+return NULL;
+}
+
+ret = bdrv_snapshot_find_by_id_and_name(bs, id, name, &sn, &local_err);
+if (error_is_set(&local_err)) {
+error_propagate(errp, local_err);
+return NULL;
+}
+if (!ret) {
+error_setg(errp,
+   "Snapshot with id '%s' and name '%s' do not exist on "
+   "device '%s'",
+   STR_PRINT_CHAR(id), STR_PRINT_CHAR(name), device);
+return NULL;
+}
+
+bdrv_snapshot_delete(bs, id, name, &local_err);
+if (error_is_set(&local_err)) {
+error_propagate(errp, local_err);
+return NULL;
+}
+
+info = g_malloc0(sizeof(SnapshotInfo));
+info->id = g_strdup(sn.id_str);
+info->name = g_strdup(sn.name);
+info->date_nsec = sn.date_nsec;
+info->date_sec = sn.date_sec;
+info->vm_state_size = sn.vm_state_size;
+info->vm_clock_nsec = sn.vm_clock_nsec % 10;
+info->vm_clock_sec = sn.vm_clock_nsec / 10;
+
+return info;
+}
 
 /* New and old BlockDriverState structs for group snapshots */
 
diff --git a/qapi-schema.json b/qapi-schema.json
index fba9b15..ffcdca7 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -1760,6 +1760,33 @@
   'data': { 'device': 'str', 'name': 'str'} }
 
 ##
+# @blockdev-snapshot-delete-internal-sync
+#
+# Synchronously delete an internal snapshot of a block device, when the format
+# of the image used support it. The snapshot is identified by name or id or
+# both. One of the name or id is required. It will returns SnapshotInfo of
+# successfully deleted snapshot.
+#
+# @device: the name of the device to delete the snapshot from
+#
+# @id: optional the snapshot's ID to be deleted
+#
+# @name: optional the snapshot's name to be deleted
+#
+# Returns: SnapshotInfo on success
+#  If @device is not a valid block device, DeviceNotFound
+#  If snapshot not found, GenericError
+#  If the format of the image used does not support it,
+#  BlockFormatFeatureNotSupported
+#  If @id and @name are both not specified, GenericError
+#
+# Since 1.6
+##
+{ 'command': 'blockdev-snapshot-delete-internal-sync',
+  'data': { 'device': 'str', '*id': 'str', '*name': 'str'},
+  'returns': 'SnapshotInfo' }
+
+##
 # @human-monitor-command:
 #
 # Execute a command on the human monitor and return the output.
diff --git a/qmp-commands.hx b/qmp-commands.hx
index 3df50de..71260c9 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -1121,6 +1121,47 @@ Example:
 EQMP
 
 {
+.name   = "blockdev-snapshot-delete-internal-sync",
+.args_type  = "device:B,id:s?,name:s?",
+.mhandler.cmd_new =
+  qmp_marshal_input_blockdev_snapshot_delete_internal_sync,
+},
+
+SQMP
+blockdev-snapshot-delete-internal-sync
+--
+
+Synchronously delete an internal snapshot of a block device when the format of
+image used support it.  The snapshot is identified by name or id or both.  One
+of the name or id is required.  If the snapshot is not found, operation will
+fail.
+
+Arguments:
+
+- "device": device name (json-string)
+- "id": ID of the snapshot (json-string, optional)
+- "name": name of the snapshot (j

[Qemu-devel] [PATCH V3 9/9] qemu-iotests: add 056 internal snapshot for block device test case

2013-06-25 Thread Wenchao Xia
Create in transaction and deletion in single command will be tested.

Signed-off-by: Wenchao Xia 
---
 tests/qemu-iotests/056 |  267 
 tests/qemu-iotests/056.out |5 +
 tests/qemu-iotests/group   |1 +
 3 files changed, 273 insertions(+), 0 deletions(-)
 create mode 100755 tests/qemu-iotests/056
 create mode 100644 tests/qemu-iotests/056.out

diff --git a/tests/qemu-iotests/056 b/tests/qemu-iotests/056
new file mode 100755
index 000..1a8a02c
--- /dev/null
+++ b/tests/qemu-iotests/056
@@ -0,0 +1,267 @@
+#!/usr/bin/env python
+#
+# Tests for internal snapshot.
+#
+# Copyright (C) 2013 IBM, Inc.
+#
+# Based on 055.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see .
+#
+
+import time
+import os
+import iotests
+from iotests import qemu_img, qemu_io
+
+test_drv_base_name = 'drive'
+
+class ImageSnapshotTestCase(iotests.QMPTestCase):
+image_len = 120 * 1024 * 1024 # MB
+
+def __init__(self, *args):
+self.expect = []
+super(ImageSnapshotTestCase, self).__init__(*args)
+
+def _setUp(self, test_img_base_name, image_num):
+self.vm = iotests.VM()
+for i in range(0, image_num):
+filename = '%s%d' % (test_img_base_name, i)
+img = os.path.join(iotests.test_dir, filename)
+device = '%s%d' % (test_drv_base_name, i)
+qemu_img('create', '-f', iotests.imgfmt, img, str(self.image_len))
+self.vm.add_drive(img)
+self.expect.append({'image': img, 'device': device,
+'snapshots': [],
+'snapshots_name_counter': 0})
+self.vm.launch()
+
+def tearDown(self):
+self.vm.shutdown()
+for dev_expect in self.expect:
+os.remove(dev_expect['image'])
+
+def createSnapshotInTransaction(self, snapshot_num, abort = False):
+actions = []
+for dev_expect in self.expect:
+num = dev_expect['snapshots_name_counter']
+for j in range(0, snapshot_num):
+name = '%s_sn%d' % (dev_expect['device'], num)
+num = num + 1
+if abort == False:
+dev_expect['snapshots'].append({'name': name})
+dev_expect['snapshots_name_counter'] = num
+actions.append({
+'type': 'blockdev-snapshot-internal-sync',
+'data': { 'device': dev_expect['device'],
+  'name': name },
+})
+
+if abort == True:
+actions.append({
+'type': 'abort',
+'data': {},
+})
+
+result = self.vm.qmp('transaction', actions = actions)
+
+if abort == True:
+self.assert_qmp(result, 'error/class', 'GenericError')
+else:
+self.assert_qmp(result, 'return', {})
+
+def verifySnapshotInfo(self):
+result = self.vm.qmp('query-block')
+
+# Verify each expected result
+for dev_expect in self.expect:
+# 1. Find the returned image value and snapshot info
+image_result = None
+for device in result['return']:
+if device['device'] == dev_expect['device']:
+image_result = device['inserted']['image']
+break
+self.assertTrue(image_result != None)
+# Do not consider zero snapshot case now
+sn_list_result = image_result['snapshots']
+sn_list_expect = dev_expect['snapshots']
+
+# 2. Verify it with expect
+self.assertTrue(len(sn_list_result) == len(sn_list_expect))
+
+for sn_expect in sn_list_expect:
+sn_result = None
+for sn in sn_list_result:
+if sn_expect['name'] == sn['name']:
+sn_result = sn
+break
+self.assertTrue(sn_result != None)
+# Fill in the detail info
+sn_expect.update(sn_result)
+
+def deleteSnapshot(self, device, id = None, name = None):
+sn_list_expect = None
+sn_expect = None
+
+self.assertTrue(id != None or name != None)
+
+# Fill in the detail info include ID
+self.verifySnapshotInfo()
+
+#find the expected snapshot list
+

[Qemu-devel] [PATCH V3 5/9] qmp: add interface blockdev-snapshot-internal-sync

2013-06-25 Thread Wenchao Xia
Snapshot ID can't be specified in this interface.

Signed-off-by: Wenchao Xia 
---
 blockdev.c   |   13 +
 qapi-schema.json |   22 ++
 qmp-commands.hx  |   30 ++
 3 files changed, 65 insertions(+), 0 deletions(-)

diff --git a/blockdev.c b/blockdev.c
index c3b607d..ce89f83 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -777,6 +777,19 @@ void qmp_blockdev_snapshot_sync(const char *device, const 
char *snapshot_file,
&snapshot, errp);
 }
 
+void qmp_blockdev_snapshot_internal_sync(const char *device,
+ const char *name,
+ Error **errp)
+{
+BlockdevSnapshotInternal snapshot = {
+.device = (char *) device,
+.name = (char *) name
+};
+
+blockdev_do_action(TRANSACTION_ACTION_KIND_BLOCKDEV_SNAPSHOT_INTERNAL_SYNC,
+   &snapshot, errp);
+}
+
 
 /* New and old BlockDriverState structs for group snapshots */
 
diff --git a/qapi-schema.json b/qapi-schema.json
index 2547a7d..fba9b15 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -1738,6 +1738,28 @@
 '*mode': 'NewImageMode'} }
 
 ##
+# @blockdev-snapshot-internal-sync
+#
+# Synchronously take an internal snapshot of a block device, when the format
+# of the image used supports it.
+#
+# @device: the name of the device to generate the snapshot from
+#
+# @name: the name of new snapshot name
+#
+# Returns: nothing on success
+#  If @device is not a valid block device, DeviceNotFound
+#  If any snapshot matching @name exists, or the name string is invalid
+#  which may mess up with snapshot ID, or name is empty, GenericError
+#  If the format of the image used does not support it,
+#  BlockFormatFeatureNotSupported
+#
+# Since 1.6
+##
+{ 'command': 'blockdev-snapshot-internal-sync',
+  'data': { 'device': 'str', 'name': 'str'} }
+
+##
 # @human-monitor-command:
 #
 # Execute a command on the human monitor and return the output.
diff --git a/qmp-commands.hx b/qmp-commands.hx
index 27642c0..3df50de 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -1091,6 +1091,36 @@ Example:
 EQMP
 
 {
+.name   = "blockdev-snapshot-internal-sync",
+.args_type  = "device:B,name:s",
+.mhandler.cmd_new = qmp_marshal_input_blockdev_snapshot_internal_sync,
+},
+
+SQMP
+blockdev-snapshot-internal-sync
+---
+
+Synchronously take an internal snapshot of a block device when the format of
+image used supports it.  If the name is a numeric string which may mess up with
+ID, such as "19", or an empty string, the operation will fail.  If a snapshot
+with name already exists, the operation will fail.
+
+Arguments:
+
+- "device": device name to snapshot (json-string)
+- "name": name of the new snapshot (json-string)
+
+Example:
+
+-> { "execute": "blockdev-snapshot-internal-sync",
+"arguments": { "device": "ide-hd0",
+   "name": "snapshot0" }
+   }
+<- { "return": {} }
+
+EQMP
+
+{
 .name   = "drive-mirror",
 .args_type  = "sync:s,device:B,target:s,speed:i?,mode:s?,format:s?,"
   "on-source-error:s?,on-target-error:s?,"
-- 
1.7.1





[Qemu-devel] [PATCH V3 7/9] hmp: add interface hmp_snapshot_blkdev_internal

2013-06-25 Thread Wenchao Xia
Signed-off-by: Wenchao Xia 
---
 hmp-commands.hx |   19 +--
 hmp.c   |   10 ++
 hmp.h   |1 +
 3 files changed, 28 insertions(+), 2 deletions(-)

diff --git a/hmp-commands.hx b/hmp-commands.hx
index 915b0d1..53e3ef3 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -1023,8 +1023,7 @@ ETEXI
   "of device. If a new image file is specified, 
the\n\t\t\t"
   "new image file will become the new root image.\n\t\t\t"
   "If format is specified, the snapshot file will\n\t\t\t"
-  "be created in that format. Otherwise the\n\t\t\t"
-  "snapshot will be internal! (currently 
unsupported).\n\t\t\t"
+  "be created in that format.\n\t\t\t"
   "The default format is qcow2.  The -n flag requests 
QEMU\n\t\t\t"
   "to reuse the image found in new-image-file, instead 
of\n\t\t\t"
   "recreating it from scratch.",
@@ -1038,6 +1037,22 @@ Snapshot device, using snapshot file as target if 
provided
 ETEXI
 
 {
+.name   = "snapshot_blkdev_internal",
+.args_type  = "device:B,name:s",
+.params = "device name",
+.help   = "take an internal snapshot of device.\n\t\t\t"
+  "The format of the image used by device must\n\t\t\t"
+  "support it, such as qcow2.\n\t\t\t",
+.mhandler.cmd = hmp_snapshot_blkdev_internal,
+},
+
+STEXI
+@item snapshot_blkdev_internal
+@findex snapshot_blkdev_internal
+Take an internal snapshot on device if it support
+ETEXI
+
+{
 .name   = "drive_mirror",
 .args_type  = "reuse:-n,full:-f,device:B,target:s,format:s?",
 .params = "[-n] [-f] device target [format]",
diff --git a/hmp.c b/hmp.c
index 494a9aa..d66ab20 100644
--- a/hmp.c
+++ b/hmp.c
@@ -912,6 +912,16 @@ void hmp_snapshot_blkdev(Monitor *mon, const QDict *qdict)
 hmp_handle_error(mon, &errp);
 }
 
+void hmp_snapshot_blkdev_internal(Monitor *mon, const QDict *qdict)
+{
+const char *device = qdict_get_str(qdict, "device");
+const char *name = qdict_get_str(qdict, "name");
+Error *errp = NULL;
+
+qmp_blockdev_snapshot_internal_sync(device, name, &errp);
+hmp_handle_error(mon, &errp);
+}
+
 void hmp_migrate_cancel(Monitor *mon, const QDict *qdict)
 {
 qmp_migrate_cancel(NULL);
diff --git a/hmp.h b/hmp.h
index 56d2e92..fddf3a9 100644
--- a/hmp.h
+++ b/hmp.h
@@ -54,6 +54,7 @@ void hmp_block_passwd(Monitor *mon, const QDict *qdict);
 void hmp_balloon(Monitor *mon, const QDict *qdict);
 void hmp_block_resize(Monitor *mon, const QDict *qdict);
 void hmp_snapshot_blkdev(Monitor *mon, const QDict *qdict);
+void hmp_snapshot_blkdev_internal(Monitor *mon, const QDict *qdict);
 void hmp_drive_mirror(Monitor *mon, const QDict *qdict);
 void hmp_migrate_cancel(Monitor *mon, const QDict *qdict);
 void hmp_migrate_set_downtime(Monitor *mon, const QDict *qdict);
-- 
1.7.1





[Qemu-devel] [PATCH V3 1/9] snapshot: new function bdrv_snapshot_find_by_id_and_name()

2013-06-25 Thread Wenchao Xia
To make it clear about id and name in searching, add this API
to distinguish them. Caller can choose to search by id or name,
*errp will be set only for exception.

Some code are modified based on Pavel's patch.

Signed-off-by: Wenchao Xia 
Signed-off-by: Pavel Hrdina 
---
 block/snapshot.c |   73 ++
 include/block/snapshot.h |6 
 2 files changed, 79 insertions(+), 0 deletions(-)

diff --git a/block/snapshot.c b/block/snapshot.c
index 6c6d9de..481a3ee 100644
--- a/block/snapshot.c
+++ b/block/snapshot.c
@@ -48,6 +48,79 @@ int bdrv_snapshot_find(BlockDriverState *bs, 
QEMUSnapshotInfo *sn_info,
 return ret;
 }
 
+/**
+ * Look up an internal snapshot by @id and @name.
+ * @bs: block device to search
+ * @id: unique snapshot ID, or NULL
+ * @name: snapshot name, or NULL
+ * @sn_info: location to store information on the snapshot found
+ * @errp: location to store error, will be set only for exception
+ *
+ * This function will traverse snapshot list in @bs to search the matching
+ * one, @id and @name are the matching condition:
+ * If both @id and @name are specified, find the first one with id @id and
+ * name @name.
+ * If only @id is specified, find the first one with id @id.
+ * If only @name is specified, find the first one with name @name.
+ * if none is specified, abort().
+ *
+ * Returns: true when a snapshot is found and @sn_info will be filled, false
+ * when error or not found. If all operation succeed but no matching one is
+ * found, @errp will NOT be set.
+ */
+bool bdrv_snapshot_find_by_id_and_name(BlockDriverState *bs,
+   const char *id,
+   const char *name,
+   QEMUSnapshotInfo *sn_info,
+   Error **errp)
+{
+QEMUSnapshotInfo *sn_tab, *sn;
+int nb_sns, i;
+bool ret = false;
+
+assert(id || name);
+
+nb_sns = bdrv_snapshot_list(bs, &sn_tab);
+if (nb_sns < 0) {
+error_setg_errno(errp, -nb_sns, "Failed to get a snapshot list");
+return false;
+} else if (nb_sns == 0) {
+return false;
+}
+
+if (id && name) {
+for (i = 0; i < nb_sns; i++) {
+sn = &sn_tab[i];
+if (!strcmp(sn->id_str, id) && !strcmp(sn->name, name)) {
+*sn_info = *sn;
+ret = true;
+break;
+}
+}
+} else if (id) {
+for (i = 0; i < nb_sns; i++) {
+sn = &sn_tab[i];
+if (!strcmp(sn->id_str, id)) {
+*sn_info = *sn;
+ret = true;
+break;
+}
+}
+} else if (name) {
+for (i = 0; i < nb_sns; i++) {
+sn = &sn_tab[i];
+if (!strcmp(sn->name, name)) {
+*sn_info = *sn;
+ret = true;
+break;
+}
+}
+}
+
+g_free(sn_tab);
+return ret;
+}
+
 int bdrv_can_snapshot(BlockDriverState *bs)
 {
 BlockDriver *drv = bs->drv;
diff --git a/include/block/snapshot.h b/include/block/snapshot.h
index eaf61f0..9d06dc1 100644
--- a/include/block/snapshot.h
+++ b/include/block/snapshot.h
@@ -26,6 +26,7 @@
 #define SNAPSHOT_H
 
 #include "qemu-common.h"
+#include "qapi/error.h"
 
 typedef struct QEMUSnapshotInfo {
 char id_str[128]; /* unique snapshot id */
@@ -40,6 +41,11 @@ typedef struct QEMUSnapshotInfo {
 
 int bdrv_snapshot_find(BlockDriverState *bs, QEMUSnapshotInfo *sn_info,
const char *name);
+bool bdrv_snapshot_find_by_id_and_name(BlockDriverState *bs,
+   const char *id,
+   const char *name,
+   QEMUSnapshotInfo *sn_info,
+   Error **errp);
 int bdrv_can_snapshot(BlockDriverState *bs);
 int bdrv_snapshot_create(BlockDriverState *bs,
  QEMUSnapshotInfo *sn_info);
-- 
1.7.1





[Qemu-devel] [PATCH V3 3/9] snapshot: distinguish id and name in snapshot delete

2013-06-25 Thread Wenchao Xia
Snapshot creation actually already distinguish id and name since it take
a structured parameter *sn, but delete can't. Later an accurate delete
is needed in qmp_transaction abort and blockdev-snapshot-delete-sync,
so change its prototype. Also *errp is added to tip error, but return
value is kepted to let caller check what kind of error happens. Existing
caller for it are savevm, delvm and qemu-img, they are not impacted by
check the return value and do the operations again.

Before this patch:
  For qcow2, it search id first then name to find the one to delete.
  For rbd, it search name.
  For sheepdog, it does nothing.

After this patch:
  For qcow2, logic is the same by call it twice in caller.
  For rbd, it always fails in delete with id, but still search for name
in second try, no change for user.

Some code for *errp is based on Pavel's patch.

Signed-off-by: Wenchao Xia 
Signed-off-by: Pavel Hrdina 
---
 block/qcow2-snapshot.c|   66 ++---
 block/qcow2.h |5 +++-
 block/rbd.c   |   23 +++-
 block/sheepdog.c  |5 +++-
 block/snapshot.c  |   36 ++--
 include/block/block_int.h |5 +++-
 include/block/snapshot.h  |5 +++-
 include/qemu-common.h |3 ++
 qemu-img.c|5 +++-
 savevm.c  |   10 +-
 10 files changed, 136 insertions(+), 27 deletions(-)

diff --git a/block/qcow2-snapshot.c b/block/qcow2-snapshot.c
index 4f9c524..e8a8e15 100644
--- a/block/qcow2-snapshot.c
+++ b/block/qcow2-snapshot.c
@@ -288,31 +288,47 @@ static void find_new_snapshot_id(BlockDriverState *bs,
 snapshot_id_string_generate(id_max + 1, id_str, id_str_size);
 }
 
-static int find_snapshot_by_id(BlockDriverState *bs, const char *id_str)
+static int find_snapshot_by_id_and_name(BlockDriverState *bs,
+const char *id,
+const char *name)
 {
 BDRVQcowState *s = bs->opaque;
 int i;
 
-for(i = 0; i < s->nb_snapshots; i++) {
-if (!strcmp(s->snapshots[i].id_str, id_str))
-return i;
+if (id && name) {
+for (i = 0; i < s->nb_snapshots; i++) {
+if (!strcmp(s->snapshots[i].id_str, id) &&
+!strcmp(s->snapshots[i].name, name)) {
+return i;
+}
+}
+} else if (id) {
+for (i = 0; i < s->nb_snapshots; i++) {
+if (!strcmp(s->snapshots[i].id_str, id)) {
+return i;
+}
+}
+} else if (name) {
+for (i = 0; i < s->nb_snapshots; i++) {
+if (!strcmp(s->snapshots[i].name, name)) {
+return i;
+}
+}
 }
+
 return -1;
 }
 
-static int find_snapshot_by_id_or_name(BlockDriverState *bs, const char *name)
+static int find_snapshot_by_id_or_name(BlockDriverState *bs,
+   const char *id_or_name)
 {
-BDRVQcowState *s = bs->opaque;
-int i, ret;
+int ret;
 
-ret = find_snapshot_by_id(bs, name);
-if (ret >= 0)
+ret = find_snapshot_by_id_and_name(bs, id_or_name, NULL);
+if (ret >= 0) {
 return ret;
-for(i = 0; i < s->nb_snapshots; i++) {
-if (!strcmp(s->snapshots[i].name, name))
-return i;
 }
-return -1;
+return find_snapshot_by_id_and_name(bs, NULL, id_or_name);
 }
 
 /* if no id is provided, a new one is constructed */
@@ -334,7 +350,7 @@ int qcow2_snapshot_create(BlockDriverState *bs, 
QEMUSnapshotInfo *sn_info)
 }
 
 /* Check that the ID is unique */
-if (find_snapshot_by_id(bs, sn_info->id_str) >= 0) {
+if (find_snapshot_by_id_and_name(bs, sn_info->id_str, NULL) >= 0) {
 return -EEXIST;
 }
 
@@ -531,15 +547,23 @@ fail:
 return ret;
 }
 
-int qcow2_snapshot_delete(BlockDriverState *bs, const char *snapshot_id)
+int qcow2_snapshot_delete(BlockDriverState *bs,
+  const char *snapshot_id,
+  const char *name,
+  Error **errp)
 {
 BDRVQcowState *s = bs->opaque;
 QCowSnapshot sn;
 int snapshot_index, ret;
+const char *device = bdrv_get_device_name(bs);
 
 /* Search the snapshot */
-snapshot_index = find_snapshot_by_id_or_name(bs, snapshot_id);
+snapshot_index = find_snapshot_by_id_and_name(bs, snapshot_id, name);
 if (snapshot_index < 0) {
+error_setg(errp,
+   "Can't find a snapshot with ID '%s' and name '%s' "
+   "on device '%s'",
+   STR_PRINT_CHAR(snapshot_id), STR_PRINT_CHAR(name), device);
 return -ENOENT;
 }
 sn = s->snapshots[snapshot_index];
@@ -551,6 +575,10 @@ int qcow2_snapshot_delete(BlockDriverState *bs, const char 
*snapshot_id)
 s->nb_snapshots--;
 ret = qcow2_write_snapshots(bs);
 if (ret < 0) {
+error_setg(errp,
+   "Failed to remo

[Qemu-devel] [PATCH V3 4/9] qmp: add internal snapshot support in qmp_transaction

2013-06-25 Thread Wenchao Xia
Unlike savevm, the qmp_transaction interface will not generate
snapshot name automatically, saving trouble to return information
of the new created snapshot. The snapshot name should not mess up
with snapshot ID, there is a check for it.

Although qcow2 support storing multiple snapshots with same name
but different ID, here it will fail when an snapshot with that name
already exist before the operation. Format such as rbd do not support
ID at all, and in most case, it means trouble to user when he faces
multiple snapshots with same name, so ban that case.

Snapshot ID can't be specified in this interface.

Signed-off-by: Wenchao Xia 
---
 blockdev.c   |  117 ++
 qapi-schema.json |   18 -
 qmp-commands.hx  |   33 ---
 3 files changed, 160 insertions(+), 8 deletions(-)

diff --git a/blockdev.c b/blockdev.c
index b3a57e0..c3b607d 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -808,6 +808,118 @@ struct BlkTransactionState {
 QSIMPLEQ_ENTRY(BlkTransactionState) entry;
 };
 
+/* internal snapshot private data */
+typedef struct InternalSnapshotState {
+BlkTransactionState common;
+BlockDriverState *bs;
+QEMUSnapshotInfo sn;
+} InternalSnapshotState;
+
+static void internal_snapshot_prepare(BlkTransactionState *common,
+  Error **errp)
+{
+const char *device;
+const char *name;
+BlockDriverState *bs;
+QEMUSnapshotInfo sn, *sn1;
+bool ret;
+qemu_timeval tv;
+BlockdevSnapshotInternal *internal;
+InternalSnapshotState *state;
+
+g_assert(common->action->kind ==
+ TRANSACTION_ACTION_KIND_BLOCKDEV_SNAPSHOT_INTERNAL_SYNC);
+internal = common->action->blockdev_snapshot_internal_sync;
+state = DO_UPCAST(InternalSnapshotState, common, common);
+
+/* 1. parse input */
+device = internal->device;
+name = internal->name;
+
+/* 2. check for validation */
+bs = bdrv_find(device);
+if (!bs) {
+error_set(errp, QERR_DEVICE_NOT_FOUND, device);
+return;
+}
+
+if (!bdrv_is_inserted(bs)) {
+error_set(errp, QERR_DEVICE_HAS_NO_MEDIUM, device);
+return;
+}
+
+if (bdrv_is_read_only(bs)) {
+error_set(errp, QERR_DEVICE_IS_READ_ONLY, device);
+return;
+}
+
+if (!bdrv_can_snapshot(bs)) {
+error_set(errp, QERR_BLOCK_FORMAT_FEATURE_NOT_SUPPORTED,
+  bs->drv->format_name, device, "internal snapshot");
+return;
+}
+
+/* check whether a snapshot with name exist, no need to check id, since
+   name will be checked later to make sure it does not mess up with id. */
+ret = bdrv_snapshot_find_by_id_and_name(bs, NULL, name, &sn, errp);
+if (error_is_set(errp)) {
+return;
+}
+if (ret) {
+error_setg(errp,
+   "Snapshot with name '%s' already exist on device '%s'",
+   name, device);
+return;
+}
+
+/* Forbid having a name similar to id, empty name is also forbidden. */
+if (!snapshot_name_wellformed(name)) {
+error_setg(errp, "Name '%s' on device '%s' is not a valid one",
+   name, device);
+return;
+}
+
+/* 3. take the snapshot */
+sn1 = &state->sn;
+pstrcpy(sn1->name, sizeof(sn1->name), name);
+qemu_gettimeofday(&tv);
+sn1->date_sec = tv.tv_sec;
+sn1->date_nsec = tv.tv_usec * 1000;
+sn1->vm_clock_nsec = qemu_get_clock_ns(vm_clock);
+
+if (bdrv_snapshot_create(bs, sn1) < 0) {
+error_setg(errp, "Failed to create snapshot '%s' on device '%s'",
+   name, device);
+return;
+}
+
+/* 4. succeed, mark a snapshot is created */
+state->bs = bs;
+}
+
+static void internal_snapshot_abort(BlkTransactionState *common)
+{
+InternalSnapshotState *state =
+ DO_UPCAST(InternalSnapshotState, common, common);
+BlockDriverState *bs = state->bs;
+QEMUSnapshotInfo *sn = &state->sn;
+Error *local_error = NULL;
+
+if (!bs) {
+return;
+}
+
+if (bdrv_snapshot_delete(bs, sn->id_str, sn->name, &local_error) < 0) {
+error_report("Failed to delete snapshot with id '%s' and name '%s' on "
+ "device '%s' in abort, reason is: '%s'",
+ sn->id_str,
+ sn->name,
+ bdrv_get_device_name(bs),
+ error_get_pretty(local_error));
+error_free(local_error);
+}
+}
+
 /* external snapshot private data */
 typedef struct ExternalSnapshotState {
 BlkTransactionState common;
@@ -990,6 +1102,11 @@ static const BdrvActionOps actions[] = {
 .prepare = abort_prepare,
 .commit = abort_commit,
 },
+[TRANSACTION_ACTION_KIND_BLOCKDEV_SNAPSHOT_INTERNAL_SYNC] = {
+.instance_size = sizeof(InternalSnapshotState),
+.prepare  = internal_snapshot_prepare,
+.abort = internal_snap

[Qemu-devel] [PATCH V3 0/9] add internal snapshot support at block device level

2013-06-25 Thread Wenchao Xia
  This series brings internal snapshot support at block devices level, now we
have two three methods to do block snapshot lively: 1) backing chain,
2) internal one and 3) drive-back up approach.

Comparation:
 Advantages:Disadvantages:
1)delta data, taken fast, export, sizeperformance, delete slow.
2)  taken fast, delete fast, performance, size   delta data, format
3)  performance, export, format   taken slow, delta data, size

  I think in most case, saving vmstate in an standalone file is better than
saving it inside qcow2, So suggest treat internal snapshot as block level
methods and not encourage user to savevm in qcow2 any more.

Implemention details:
  To avoid trouble, this serial have hide ID in create interfaces, this make
sure no chaos of ID and name will be introduced by these interfaces.
  There is one patch may be common to Pavel's savvm transaction, patch 1/11,
others are not quite related. Patch 1/11 will not set errp when no snapshot
find, since patch 3/11 need to distinguish real error case.

Next steps to better full VM snapshot:
  Improve internal snapshot's export capability.
  Better vmstate saving.

  Thanks Kevin to give advisement about how add it in qmp_transaction, oldest
version comes drom Dietmar Maurer.

v3:
  General:
  Rebased after Stenfan's driver-backup patch V6.

  Address Eric's comments:
  4/9: grammar fix and better doc.
  5/9: parameter name is mandatory now. grammar fix.
  6/9: redesiged interface: take both id and name as optional parameter, return
the deleted snapshot's info.

  Address Stefan's comments:
  4/9: add '' around %s in message. drop code comments about vm_clock.
  9/9: better doc, refined the code and add more test case.


Wenchao Xia (9):
  1 snapshot: new function bdrv_snapshot_find_by_id_and_name()
  2 snapshot: add paired functions for internal snapshot id and name
  3 snapshot: distinguish id and name in snapshot delete
  4 qmp: add internal snapshot support in qmp_transaction
  5 qmp: add interface blockdev-snapshot-internal-sync
  6 qmp: add interface blockdev-snapshot-delete-internal-sync
  7 hmp: add interface hmp_snapshot_blkdev_internal
  8 hmp: add interface hmp_snapshot_delete_blkdev_internal
  9 qemu-iotests: add 056 internal snapshot for block device test case

 block/qcow2-snapshot.c |   68 +---
 block/qcow2.h  |5 +-
 block/rbd.c|   23 -
 block/sheepdog.c   |5 +-
 block/snapshot.c   |  130 +-
 blockdev.c |  191 +++
 hmp-commands.hx|   37 ++-
 hmp.c  |   22 
 hmp.h  |2 +
 include/block/block_int.h  |5 +-
 include/block/snapshot.h   |   14 ++-
 include/qemu-common.h  |3 +
 qapi-schema.json   |   67 +++-
 qemu-img.c |5 +-
 qmp-commands.hx|  104 -
 savevm.c   |   10 ++-
 tests/qemu-iotests/056 |  267 
 tests/qemu-iotests/056.out |5 +
 tests/qemu-iotests/group   |1 +
 19 files changed, 926 insertions(+), 38 deletions(-)
 create mode 100755 tests/qemu-iotests/056
 create mode 100644 tests/qemu-iotests/056.out





[Qemu-devel] [PATCH V3 2/9] snapshot: add paired functions for internal snapshot id and name

2013-06-25 Thread Wenchao Xia
Internal snapshot's ID and name concept are both visible in general
block level, they are observed by user in "info snapshots", so it is
possible to mess them up. Although we can separate the two concept in
programming, but if they can be distinguished in string itself, things
will be simple and clear, so introduce two functions to do it.

The implemention, qcow2 snapshot calls snapshot_id_string_generate() to
make sure it follows the rule in driver. If caller or user give a check
with snapshot_name_wellformed() before create snapshot, then the ID
and name will never mess up. The check can be also taken in
qcow2_snapshot_create(), but require it to return error reason.

For rbd, it have no ID, so have no impact.
For sheepdog, ID can't be determined in qemu, so still can't guarantee
that no more mess up for ID and name.

Signed-off-by: Wenchao Xia 
---
 block/qcow2-snapshot.c   |2 +-
 block/snapshot.c |   21 +
 include/block/snapshot.h |3 +++
 3 files changed, 25 insertions(+), 1 deletions(-)

diff --git a/block/qcow2-snapshot.c b/block/qcow2-snapshot.c
index 0caac90..4f9c524 100644
--- a/block/qcow2-snapshot.c
+++ b/block/qcow2-snapshot.c
@@ -285,7 +285,7 @@ static void find_new_snapshot_id(BlockDriverState *bs,
 if (id > id_max)
 id_max = id;
 }
-snprintf(id_str, id_str_size, "%d", id_max + 1);
+snapshot_id_string_generate(id_max + 1, id_str, id_str_size);
 }
 
 static int find_snapshot_by_id(BlockDriverState *bs, const char *id_str)
diff --git a/block/snapshot.c b/block/snapshot.c
index 481a3ee..40e0cc7 100644
--- a/block/snapshot.c
+++ b/block/snapshot.c
@@ -228,3 +228,24 @@ int bdrv_snapshot_load_tmp(BlockDriverState *bs,
 }
 return -ENOTSUP;
 }
+
+/*
+ * Return true if the given internal snapshot name is valid, false
+ * otherwise.
+ *
+ * To prevent clashes with internal snapshot IDs, names consisting only
+ * of digits are rejected.  Empty strings are also rejected.
+ */
+bool snapshot_name_wellformed(const char *name)
+{
+return strspn(name, "0123456789") != strlen(name);
+}
+
+/*
+ * Following function generate id string, used by block driver, such as qcow2.
+ * Since no better place to go, place the funtion here for now.
+ */
+void snapshot_id_string_generate(int id, char *id_str, int id_str_size)
+{
+snprintf(id_str, id_str_size, "%d", id);
+}
diff --git a/include/block/snapshot.h b/include/block/snapshot.h
index 9d06dc1..3d93719 100644
--- a/include/block/snapshot.h
+++ b/include/block/snapshot.h
@@ -56,4 +56,7 @@ int bdrv_snapshot_list(BlockDriverState *bs,
QEMUSnapshotInfo **psn_info);
 int bdrv_snapshot_load_tmp(BlockDriverState *bs,
const char *snapshot_name);
+
+bool snapshot_name_wellformed(const char *name);
+void snapshot_id_string_generate(int id, char *id_str, int id_str_size);
 #endif
-- 
1.7.1





Re: [Qemu-devel] [PATCH for mst/pci] output nc->name in NIC_RX_FILTER_CHANGED event

2013-06-25 Thread Amos Kong
On Mon, Jun 24, 2013 at 02:34:59PM +0800, Amos Kong wrote:
> netclient 'name' entry in event is useful for management to know
> which device is changed. n->netclient_name is not always set.
> This patch changes to use nc->name. If we don't assign 'id',
> qemu will set a generated name to nc->name.


IRC:  akong, what do other events include? name or id?

I just checked QMP/qmp-event.txt, they all use 'device name'.
(eg: BLOCK_IO_ERROR, DEVICE_DELETED, DEVICE_TRAY_MOVED, BLOCK_JOB_*)

If we assign 'id' for -device, device name will be set to id.
Otherwise, a generated device name will set to some device.
 
> Signed-off-by: Amos Kong 
> ---
>  hw/net/virtio-net.c | 11 +++
>  1 file changed, 3 insertions(+), 8 deletions(-)
> 
> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> index c88403a..e4d9752 100644
> --- a/hw/net/virtio-net.c
> +++ b/hw/net/virtio-net.c
> @@ -200,14 +200,9 @@ static void rxfilter_notify(NetClientState *nc)
>  VirtIONet *n = qemu_get_nic_opaque(nc);
>  
>  if (nc->rxfilter_notify_enabled) {
> -if (n->netclient_name) {
> -event_data = qobject_from_jsonf("{ 'name': %s, 'path': %s }",
> -n->netclient_name,
> -
> object_get_canonical_path(OBJECT(n->qdev)));
> -} else {
> -event_data = qobject_from_jsonf("{ 'path': %s }",
> -
> object_get_canonical_path(OBJECT(n->qdev)));
> -}
> +event_data = qobject_from_jsonf("{ 'name': %s, 'path': %s }",
> +   nc->name,
> +   object_get_canonical_path(OBJECT(n->qdev)));
>  monitor_protocol_event(QEVENT_NIC_RX_FILTER_CHANGED, event_data);
>  qobject_decref(event_data);
>  
> -- 
> 1.8.1.4

-- 
Amos.



Re: [Qemu-devel] block: Review of .has_zero_init use

2013-06-25 Thread Bharata B Rao
On Tue, Jun 25, 2013 at 01:39:11PM +0200, Kevin Wolf wrote:
> 
> Can you please review for the gluster, rbd, sheepdog and ssh driver
> whether it's safe to assume that the image reads back as zeros after
> bdrv_create?

Gluster supports both file and block backends. While the above is true for
file backend (which uses ftruncate), the same is not true for
block backend (which uses lvcreate & lvresize).

So overall it is not safe to assume that an image on GlusterFS volume
reads back as zeroes after create.

Regards,
Bharata.




Re: [Qemu-devel] [PATCH 0/3] make bh safe with hot-unplug

2013-06-25 Thread liu ping fan
On Tue, Jun 25, 2013 at 3:53 PM, Paolo Bonzini  wrote:
> Il 25/06/2013 08:32, liu ping fan ha scritto:
>> On Tue, Jun 25, 2013 at 2:24 PM, Paolo Bonzini  wrote:
>>> Il 25/06/2013 19:38, Liu Ping Fan ha scritto:
 This series relies on refcnt of object used by bh callback to run against 
 unplug.

 Open issue:
 Another choice may be rcu, but I think some issues are hard to resolve.
 Using rcu, we have two choice:
   when holding object refcnt, call qemu_bh_delete(); then after grace 
 period, we can release.
   Or making qemu_bh_delete() sync in the path of DeviceState's 
 finalization.
>>>
>>> What do you mean exactly?  qemu_bh_delete() is async, but it operates on
>>> dynamically-allocated memory.
>>>
>> Before DeviceState is freed, we should make sure the bh is delete, and
>> not related bh is in fly.
>
> The bh need not be deleted.  It just needs to be not-scheduled (easy,
> qemu_bh_delete does that) and not running.
>
Yes, not-scheduled and not running are the only things needed, but not
easy to know.
> The latter part could be the hard one in a multi-threaded context, but I
> think it's up to the device to ensure it.  It doesn't _have_ to be hard.
>  For example, joining the data-plane thread would do that as well.
>
It seems not easy, take consideration of the sequence:
_bh_delete(), // ensure not scheduled, ie, no further access to DeviceState
wait for rcu grace period to come, // ensure not running
free DeviceState in rcu-reclaimer
But  in current code, _delete() can be placed in DeviceState
finalization path(in reclaimer), which means while reclaiming, there
still exist access path to the DeviceState.
On the other hand, pushing _delete() out of  finalization path is not
easy, since we do not what time the DeviceState has done with its bh.

Regards,
Pingfan
> Paolo
>
>>> Paolo
>>>
 but currently, the callers of qemu_bh_delete() can not satisfy any of the 
 two condition.
>>>
>



[Qemu-devel] Monitoring Screen Activity in QEMU/KVM

2013-06-25 Thread Shehbaz Jaffer
I want to determine the amount of screen activity taking place on VGA
monitor/ Screen for different applications (eg. playing vlc video, normal
typing.)

When I do not start the X server, I can easily determine the screen
activity by counting the number of pages accessed in the region 0xA -
0xB (This is the VGA Monitor region in boot screen).

However when I start the X Server, A diffrent set of pages are hit. Could
anyone please explain how the VGA Monitor works in QEMU? Or if someone
could suggest an alternate solution to determine amount of screen activity
while playing diffrent applications?

[NOTE : I have already checked for the region where X server pages are
mapped using cat /proc/iomem. But pages are not getting written to the
"possible" X server mmapped region.]

Regards,
-- 
Shehbaz Jaffer
Graduate Student
Department of Computer Engineering
Indian Institute of Technology, Delhi


Re: [Qemu-devel] [PATCH] target-arm: Added CP15 VBAR support

2013-06-25 Thread Peter Crosthwaite
Hi Peter,

(Sorry about the delay),

On Wed, Feb 27, 2013 at 8:10 PM, Peter Maydell  wrote:
> On 27 February 2013 07:16, Peter Crosthwaite
>  wrote:
>> From: Nathan Rossi 
>>
>> Added Vector Base Address remapping on ARM v7.
>
> This one's tricky because the VBAR only exists in CPUs with
> TrustZone, and strictly speaking QEMU models a non-TrustZone
> core. However we already have some dummy cp15 registers which
> are really TZ-only just to get guests working, and this may
> well fall into that category.
>
> I need to have a think about how we can coherently define
> what QEMU does about TZ so we can have a better idea of
> what should and shouldn't be implemented. (If you have any
> suggestions I'm open to them.)
>

ARM_FEATURE_TZ?

> On the patch itself:
>  * you've forgotten to bump the cpu vmstate version
>  * I'm trying to move towards using the official register
>abbreviations in cpu struct member names
>  * the low bits of VBAR are "UNK/SBZP", which means our
>implementation must ignore writes and read as zeroes. So
>you need to do the masking in the write-accessor, not at
>point of use

Fixed.

>  * we can just rely on the fact that the vbar field will be
>zero when QEMU is emulating a pre-v7 core, so you can
>avoid the feature check on use
>

Fixed.

Regards,
Peter

> thanks
> -- PMM
>



[Qemu-devel] [PATCH v12 00/15] rdma: migration support

2013-06-25 Thread Michael R. Hines

Changes since v11:

- Minor state transition fix
- Fixed localhost migration
- Fixed 0.0.0.0 migration
- Drop invalid hunk
- Updated tags

Michael R. Hines (15):
  rdma: add documentation
  rdma: introduce qemu_update_position()
  rdma: export yield_until_fd_readable()
  rdma: export throughput w/ MigrationStats QMP
  rdma: introduce qemu_file_mode_is_not_valid()
  rdma: export qemu_fflush()
  rdma: introduce ram_handle_compressed()
  rdma: introduce qemu_ram_foreach_block()
  rdma: new QEMUFileOps hooks
  rdma: introduce capability x-rdma-pin-all
  rdma: core logic
  rdma: send pc.ram
  rdma: allow state transitions between other states besides ACTIVE
  rdma: introduce MIG_STATE_NONE and change MIG_STATE_SETUP state
transition
  rdma: account for the time spent in MIG_STATE_SETUP through QMP

 Makefile.objs |1 +
 arch_init.c   |   69 +-
 configure |   29 +
 docs/rdma.txt |  415 ++
 exec.c|9 +
 hmp.c |6 +
 include/block/coroutine.h |6 +
 include/exec/cpu-common.h |5 +
 include/migration/migration.h |   32 +
 include/migration/qemu-file.h |   32 +
 migration-rdma.c  | 2789 
+

 migration.c   |   63 +-
 qapi-schema.json  |   21 +-
 qemu-coroutine-io.c   |   23 +
 savevm.c  |  114 +-
 15 files changed, 3554 insertions(+), 60 deletions(-)
 create mode 100644 docs/rdma.txt
 create mode 100644 migration-rdma.c

--
1.7.10.4




[Qemu-devel] [PATCH v12 14/15] rdma: introduce MIG_STATE_NONE and change MIG_STATE_SETUP state transition

2013-06-25 Thread mrhines
From: "Michael R. Hines" 

As described in the previous patch, until now, the MIG_STATE_SETUP
state was not really a 'formal' state. It has been used as a 'zero' state
(what we're calling 'NONE' here) and QEMU has been unconditionally transitioning
into this state when the QMP migration command was called. Instead we want to
introduce MIG_STATE_NONE, which is our starting state in the state machine, and
then immediately transition into the MIG_STATE_SETUP state when the QMP migrate
command is issued.

In order to do this, we must delay the transition into MIG_STATE_ACTIVE until
later in the migration_thread(). This is done to be able to timestamp the 
amount of
time spent in the SETUP state for proper accounting to the user during
an RDMA migration.

Furthermore, the management software, until now, has never been aware of the
existence of the SETUP state whatsoever. This must change, because, timing of 
this
state implies that the state actually exists.

These two patches cannot be separated because the 'query_migrate' QMP
switch statement needs to know how to handle this new state transition.

Reviewed-by: Juan Quintela 
Tested-by: Michael R. Hines 
Signed-off-by: Michael R. Hines 
---
 migration.c |   23 +++
 1 file changed, 15 insertions(+), 8 deletions(-)

diff --git a/migration.c b/migration.c
index dcb99c9..a199a67 100644
--- a/migration.c
+++ b/migration.c
@@ -36,7 +36,8 @@
 #endif
 
 enum {
-MIG_STATE_ERROR,
+MIG_STATE_ERROR = -1,
+MIG_STATE_NONE,
 MIG_STATE_SETUP,
 MIG_STATE_CANCELLED,
 MIG_STATE_ACTIVE,
@@ -63,7 +64,7 @@ static NotifierList migration_state_notifiers =
 MigrationState *migrate_get_current(void)
 {
 static MigrationState current_migration = {
-.state = MIG_STATE_SETUP,
+.state = MIG_STATE_NONE,
 .bandwidth_limit = MAX_THROTTLE,
 .xbzrle_cache_size = DEFAULT_MIGRATE_CACHE_SIZE,
 .mbps = -1,
@@ -184,9 +185,13 @@ MigrationInfo *qmp_query_migrate(Error **errp)
 MigrationState *s = migrate_get_current();
 
 switch (s->state) {
-case MIG_STATE_SETUP:
+case MIG_STATE_NONE:
 /* no migration has happened ever */
 break;
+case MIG_STATE_SETUP:
+info->has_status = true;
+info->status = g_strdup("setup");
+break;
 case MIG_STATE_ACTIVE:
 info->has_status = true;
 info->status = g_strdup("active");
@@ -257,7 +262,7 @@ void 
qmp_migrate_set_capabilities(MigrationCapabilityStatusList *params,
 MigrationState *s = migrate_get_current();
 MigrationCapabilityStatusList *cap;
 
-if (s->state == MIG_STATE_ACTIVE) {
+if (s->state == MIG_STATE_ACTIVE || s->state == MIG_STATE_SETUP) {
 error_set(errp, QERR_MIGRATION_ACTIVE);
 return;
 }
@@ -316,7 +321,7 @@ static void migrate_fd_cancel(MigrationState *s)
 {
 DPRINTF("cancelling migration\n");
 
-migrate_set_state(s, MIG_STATE_ACTIVE, MIG_STATE_CANCELLED);
+migrate_set_state(s, s->state, MIG_STATE_CANCELLED);
 }
 
 void add_migration_state_change_notifier(Notifier *notify)
@@ -393,7 +398,7 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
 params.blk = blk;
 params.shared = inc;
 
-if (s->state == MIG_STATE_ACTIVE) {
+if (s->state == MIG_STATE_ACTIVE || s->state == MIG_STATE_SETUP) {
 error_set(errp, QERR_MIGRATION_ACTIVE);
 return;
 }
@@ -525,6 +530,8 @@ static void *migration_thread(void *opaque)
 DPRINTF("beginning savevm\n");
 qemu_savevm_state_begin(s->file, &s->params);
 
+migrate_set_state(s, MIG_STATE_SETUP, MIG_STATE_ACTIVE);
+
 while (s->state == MIG_STATE_ACTIVE) {
 int64_t current_time;
 uint64_t pending_size;
@@ -604,8 +611,8 @@ static void *migration_thread(void *opaque)
 
 void migrate_fd_connect(MigrationState *s)
 {
-s->state = MIG_STATE_ACTIVE;
-trace_migrate_set_state(MIG_STATE_ACTIVE);
+s->state = MIG_STATE_SETUP;
+trace_migrate_set_state(MIG_STATE_SETUP);
 
 /* This is a best 1st approximation. ns to ms */
 s->expected_downtime = max_downtime/100;
-- 
1.7.10.4




[Qemu-devel] [PATCH v12 01/15] rdma: add documentation

2013-06-25 Thread mrhines
From: "Michael R. Hines" 

docs/rdma.txt contains full documentation,
wiki links, github url and contact information.

Reviewed-by: Juan Quintela 
Reviewed-by: Paolo Bonzini 
Reviewed-by: Chegu Vinod 
Tested-by: Chegu Vinod 
Tested-by: Michael R. Hines 
Signed-off-by: Michael R. Hines 
---
 docs/rdma.txt |  415 +
 1 file changed, 415 insertions(+)
 create mode 100644 docs/rdma.txt

diff --git a/docs/rdma.txt b/docs/rdma.txt
new file mode 100644
index 000..45a4b1d
--- /dev/null
+++ b/docs/rdma.txt
@@ -0,0 +1,415 @@
+(RDMA: Remote Direct Memory Access)
+RDMA Live Migration Specification, Version # 1
+==
+Wiki: http://wiki.qemu.org/Features/RDMALiveMigration
+Github: g...@github.com:hinesmr/qemu.git, 'rdma' branch
+
+Copyright (C) 2013 Michael R. Hines 
+
+An *exhaustive* paper (2010) shows additional performance details
+linked on the QEMU wiki above.
+
+Contents:
+=
+* Introduction
+* Before running
+* Running
+* Performance
+* RDMA Migration Protocol Description
+* Versioning and Capabilities
+* QEMUFileRDMA Interface
+* Migration of pc.ram
+* Error handling
+* TODO
+
+Introduction:
+=
+
+RDMA helps make your migration more deterministic under heavy load because
+of the significantly lower latency and higher throughput over TCP/IP. This is
+because the RDMA I/O architecture reduces the number of interrupts and
+data copies by bypassing the host networking stack. In particular, a TCP-based
+migration, under certain types of memory-bound workloads, may take a more
+unpredicatable amount of time to complete the migration if the amount of
+memory tracked during each live migration iteration round cannot keep pace
+with the rate of dirty memory produced by the workload.
+
+RDMA currently comes in two flavors: both Ethernet based (RoCE, or RDMA
+over Convered Ethernet) as well as Infiniband-based. This implementation of
+migration using RDMA is capable of using both technologies because of
+the use of the OpenFabrics OFED software stack that abstracts out the
+programming model irrespective of the underlying hardware.
+
+Refer to openfabrics.org or your respective RDMA hardware vendor for
+an understanding on how to verify that you have the OFED software stack
+installed in your environment. You should be able to successfully link
+against the "librdmacm" and "libibverbs" libraries and development headers
+for a working build of QEMU to run successfully using RDMA Migration.
+
+BEFORE RUNNING:
+===
+
+Use of RDMA during migration requires pinning and registering memory
+with the hardware. This means that memory must be physically resident
+before the hardware can transmit that memory to another machine.
+If this is not acceptable for your application or product, then the use
+of RDMA migration may in fact be harmful to co-located VMs or other
+software on the machine if there is not sufficient memory available to
+relocate the entire footprint of the virtual machine. If so, then the
+use of RDMA is discouraged and it is recommended to use standard TCP migration.
+
+Experimental: Next, decide if you want dynamic page registration.
+For example, if you have an 8GB RAM virtual machine, but only 1GB
+is in active use, then enabling this feature will cause all 8GB to
+be pinned and resident in memory. This feature mostly affects the
+bulk-phase round of the migration and can be enabled for extremely
+high-performance RDMA hardware using the following command:
+
+QEMU Monitor Command:
+$ migrate_set_capability x-rdma-pin-all on # disabled by default
+
+Performing this action will cause all 8GB to be pinned, so if that's
+not what you want, then please ignore this step altogether.
+
+On the other hand, this will also significantly speed up the bulk round
+of the migration, which can greatly reduce the "total" time of your migration.
+Example performance of this using an idle VM in the previous example
+can be found in the "Performance" section.
+
+Note: for very large virtual machines (hundreds of GBs), pinning all
+*all* of the memory of your virtual machine in the kernel is very expensive
+may extend the initial bulk iteration time by many seconds,
+and thus extending the total migration time. However, this will not
+affect the determinism or predictability of your migration you will
+still gain from the benefits of advanced pinning with RDMA.
+
+RUNNING:
+
+
+First, set the migration speed to match your hardware's capabilities:
+
+QEMU Monitor Command:
+$ migrate_set_speed 40g # or whatever is the MAX of your RDMA device
+
+Next, on the destination machine, add the following to the QEMU command line:
+
+qemu . -incoming x-rdma:host:port
+
+Finally, perform the actual migration on the source machine:
+
+QEMU Monitor Command:
+$ migrate -d x-rdma:host:port
+
+PERFORMANCE
+===
+
+Here is a brief summary of total migration time and downtime using RDMA:
+Using a 40gbp

[Qemu-devel] [PATCH v12 06/15] rdma: export qemu_fflush()

2013-06-25 Thread mrhines
From: "Michael R. Hines" 

RDMA uses this to flush the control channel before sending its
own message to handle page registrations.

Reviewed-by: Juan Quintela 
Reviewed-by: Paolo Bonzini 
Reviewed-by: Chegu Vinod 
Tested-by: Chegu Vinod 
Tested-by: Michael R. Hines 
Signed-off-by: Michael R. Hines 
---
 include/migration/qemu-file.h |1 +
 savevm.c  |2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/migration/qemu-file.h b/include/migration/qemu-file.h
index dd3fd51..37d1604 100644
--- a/include/migration/qemu-file.h
+++ b/include/migration/qemu-file.h
@@ -112,6 +112,7 @@ void qemu_file_reset_rate_limit(QEMUFile *f);
 void qemu_file_set_rate_limit(QEMUFile *f, int64_t new_rate);
 int64_t qemu_file_get_rate_limit(QEMUFile *f);
 int qemu_file_get_error(QEMUFile *f);
+void qemu_fflush(QEMUFile *f);
 
 static inline void qemu_put_be64s(QEMUFile *f, const uint64_t *pv)
 {
diff --git a/savevm.c b/savevm.c
index 67ddf06..26d5607 100644
--- a/savevm.c
+++ b/savevm.c
@@ -589,7 +589,7 @@ static inline bool qemu_file_is_writable(QEMUFile *f)
  * If there is writev_buffer QEMUFileOps it uses it otherwise uses
  * put_buffer ops.
  */
-static void qemu_fflush(QEMUFile *f)
+void qemu_fflush(QEMUFile *f)
 {
 ssize_t ret = 0;
 
-- 
1.7.10.4




[Qemu-devel] [PATCH v12 00/15] rdma: migration support

2013-06-25 Thread mrhines
From: "Michael R. Hines" 

Changes since v11:

- Minor state transition fix
- Fixed localhost migration
- Fixed 0.0.0.0 migration
- Drop invalid hunk
- Updated tags

Michael R. Hines (15):
  rdma: add documentation
  rdma: introduce qemu_update_position()
  rdma: export yield_until_fd_readable()
  rdma: export throughput w/ MigrationStats QMP
  rdma: introduce qemu_file_mode_is_not_valid()
  rdma: export qemu_fflush()
  rdma: introduce ram_handle_compressed()
  rdma: introduce qemu_ram_foreach_block()
  rdma: new QEMUFileOps hooks
  rdma: introduce capability x-rdma-pin-all
  rdma: core logic
  rdma: send pc.ram
  rdma: allow state transitions between other states besides ACTIVE
  rdma: introduce MIG_STATE_NONE and change MIG_STATE_SETUP state
transition
  rdma: account for the time spent in MIG_STATE_SETUP through QMP

 Makefile.objs |1 +
 arch_init.c   |   69 +-
 configure |   29 +
 docs/rdma.txt |  415 ++
 exec.c|9 +
 hmp.c |6 +
 include/block/coroutine.h |6 +
 include/exec/cpu-common.h |5 +
 include/migration/migration.h |   32 +
 include/migration/qemu-file.h |   32 +
 migration-rdma.c  | 2789 +
 migration.c   |   63 +-
 qapi-schema.json  |   21 +-
 qemu-coroutine-io.c   |   23 +
 savevm.c  |  114 +-
 15 files changed, 3554 insertions(+), 60 deletions(-)
 create mode 100644 docs/rdma.txt
 create mode 100644 migration-rdma.c

-- 
1.7.10.4




[Qemu-devel] [PATCH v12 13/15] rdma: allow state transitions between other states besides ACTIVE

2013-06-25 Thread mrhines
From: "Michael R. Hines" 

This patch is in preparation for the next ones: Until now the MIG_STATE_SETUP
state was not really a 'formal' state. It has been used as a 'zero' state
and QEMU has been unconditionally transitioning into this state when
the QMP migrate command was called. In preparation for timing this state,
we have to make this state a a 'real' state which actually gets transitioned
from later in the migration_thread() from SETUP => ACTIVE, rather than just
automatically dropping into this state at the beginninig of the migration.

This means that the state transition function (migration_finish_set_state())
needs to be capable of transitioning from valid states _other_ than just
MIG_STATE_ACTIVE.

The function is in fact already capable of doing that, but was not allowing the
old state to be a parameter specified as an input.

This patch fixes that and only makes the transition if the current state
matches the old state that the caller intended to transition from.

Reviewed-by: Juan Quintela 
Tested-by: Michael R. Hines 
Signed-off-by: Michael R. Hines 
---
 migration.c |   10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/migration.c b/migration.c
index 62c6b85..dcb99c9 100644
--- a/migration.c
+++ b/migration.c
@@ -295,9 +295,9 @@ static void migrate_fd_cleanup(void *opaque)
 notifier_list_notify(&migration_state_notifiers, s);
 }
 
-static void migrate_finish_set_state(MigrationState *s, int new_state)
+static void migrate_set_state(MigrationState *s, int old_state, int new_state)
 {
-if (__sync_val_compare_and_swap(&s->state, MIG_STATE_ACTIVE,
+if (__sync_val_compare_and_swap(&s->state, old_state,
 new_state) == new_state) {
 trace_migrate_set_state(new_state);
 }
@@ -316,7 +316,7 @@ static void migrate_fd_cancel(MigrationState *s)
 {
 DPRINTF("cancelling migration\n");
 
-migrate_finish_set_state(s, MIG_STATE_CANCELLED);
+migrate_set_state(s, MIG_STATE_ACTIVE, MIG_STATE_CANCELLED);
 }
 
 void add_migration_state_change_notifier(Notifier *notify)
@@ -546,14 +546,14 @@ static void *migration_thread(void *opaque)
 qemu_savevm_state_complete(s->file);
 qemu_mutex_unlock_iothread();
 if (!qemu_file_get_error(s->file)) {
-migrate_finish_set_state(s, MIG_STATE_COMPLETED);
+migrate_set_state(s, MIG_STATE_ACTIVE, 
MIG_STATE_COMPLETED);
 break;
 }
 }
 }
 
 if (qemu_file_get_error(s->file)) {
-migrate_finish_set_state(s, MIG_STATE_ERROR);
+migrate_set_state(s, MIG_STATE_ACTIVE, MIG_STATE_ERROR);
 break;
 }
 current_time = qemu_get_clock_ms(rt_clock);
-- 
1.7.10.4




[Qemu-devel] [PATCH v12 08/15] rdma: introduce qemu_ram_foreach_block()

2013-06-25 Thread mrhines
From: "Michael R. Hines" 

This is used during RDMA initialization in order to
transmit a description of all the RAM blocks to the
peer for later dynamic chunk registration purposes.

Reviewed-by: Juan Quintela 
Reviewed-by: Paolo Bonzini 
Reviewed-by: Chegu Vinod 
Tested-by: Chegu Vinod 
Tested-by: Michael R. Hines 
Signed-off-by: Michael R. Hines 
---
 exec.c|9 +
 include/exec/cpu-common.h |5 +
 2 files changed, 14 insertions(+)

diff --git a/exec.c b/exec.c
index 0b0118b..2e6fc00 100644
--- a/exec.c
+++ b/exec.c
@@ -2630,3 +2630,12 @@ bool cpu_physical_memory_is_io(hwaddr phys_addr)
  memory_region_is_romd(mr));
 }
 #endif
+
+void qemu_ram_foreach_block(RAMBlockIterFunc func, void *opaque)
+{
+RAMBlock *block;
+
+QTAILQ_FOREACH(block, &ram_list.blocks, next) {
+func(block->host, block->offset, block->length, opaque);
+}
+}
diff --git a/include/exec/cpu-common.h b/include/exec/cpu-common.h
index e061e21..92a4223 100644
--- a/include/exec/cpu-common.h
+++ b/include/exec/cpu-common.h
@@ -113,6 +113,11 @@ void cpu_physical_memory_write_rom(hwaddr addr,
 extern struct MemoryRegion io_mem_rom;
 extern struct MemoryRegion io_mem_notdirty;
 
+typedef void (RAMBlockIterFunc)(void *host_addr,
+ram_addr_t offset, ram_addr_t length, void *opaque);
+
+void qemu_ram_foreach_block(RAMBlockIterFunc func, void *opaque);
+
 #endif
 
 #endif /* !CPU_COMMON_H */
-- 
1.7.10.4




[Qemu-devel] [PATCH v12 02/15] rdma: introduce qemu_update_position()

2013-06-25 Thread mrhines
From: "Michael R. Hines" 

RDMA writes happen asynchronously, and thus the performance accounting
also needs to be able to occur asynchronously. This allows anybody
to call into savevm.c to update both f->pos as well as into arch_init.c
to update the acct_info structure with up-to-date values when
the RDMA transfer actually completes.

Reviewed-by: Juan Quintela 
Tested-by: Chegu Vinod 
Tested-by: Michael R. Hines 
Signed-off-by: Michael R. Hines 
---
 arch_init.c   |   12 
 include/migration/migration.h |2 ++
 include/migration/qemu-file.h |1 +
 savevm.c  |5 +
 4 files changed, 20 insertions(+)

diff --git a/arch_init.c b/arch_init.c
index a8b91ee..494a14f 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -498,6 +498,18 @@ static int ram_save_block(QEMUFile *f, bool last_stage)
 
 static uint64_t bytes_transferred;
 
+void acct_update_position(QEMUFile *f, size_t size, bool zero)
+{
+uint64_t pages = size / TARGET_PAGE_SIZE;
+if (zero) {
+acct_info.dup_pages += pages;
+} else {
+acct_info.norm_pages += pages;
+bytes_transferred += size;
+qemu_update_position(f, size);
+}
+}
+
 static ram_addr_t ram_save_remaining(void)
 {
 return migration_dirty_pages;
diff --git a/include/migration/migration.h b/include/migration/migration.h
index e2acec6..0be28a2 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -92,6 +92,8 @@ uint64_t ram_bytes_remaining(void);
 uint64_t ram_bytes_transferred(void);
 uint64_t ram_bytes_total(void);
 
+void acct_update_position(QEMUFile *f, size_t size, bool zero);
+
 extern SaveVMHandlers savevm_ram_handlers;
 
 uint64_t dup_mig_bytes_transferred(void);
diff --git a/include/migration/qemu-file.h b/include/migration/qemu-file.h
index 7519464..8fab0dd 100644
--- a/include/migration/qemu-file.h
+++ b/include/migration/qemu-file.h
@@ -93,6 +93,7 @@ void qemu_put_be32(QEMUFile *f, unsigned int v);
 void qemu_put_be64(QEMUFile *f, uint64_t v);
 int qemu_get_buffer(QEMUFile *f, uint8_t *buf, int size);
 int qemu_get_byte(QEMUFile *f);
+void qemu_update_position(QEMUFile *f, size_t size);
 
 static inline unsigned int qemu_get_ubyte(QEMUFile *f)
 {
diff --git a/savevm.c b/savevm.c
index 48cc2a9..9b5577e 100644
--- a/savevm.c
+++ b/savevm.c
@@ -671,6 +671,11 @@ int qemu_get_fd(QEMUFile *f)
 return -1;
 }
 
+void qemu_update_position(QEMUFile *f, size_t size)
+{
+f->pos += size;
+}
+
 /** Closes the file
  *
  * Returns negative error value if any error happened on previous operations or
-- 
1.7.10.4




[Qemu-devel] [PATCH v12 15/15] rdma: account for the time spent in MIG_STATE_SETUP through QMP

2013-06-25 Thread mrhines
From: "Michael R. Hines" 

Using the previous patches, we're now able to timestamp the SETUP
state. Once we have this time, let the user know about it in the
schema.

Reviewed-by: Juan Quintela 
Signed-off-by: Michael R. Hines 
---
 hmp.c |4 
 include/migration/migration.h |1 +
 migration.c   |9 +
 qapi-schema.json  |9 -
 4 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/hmp.c b/hmp.c
index 148a3fb..5f52f17 100644
--- a/hmp.c
+++ b/hmp.c
@@ -164,6 +164,10 @@ void hmp_info_migrate(Monitor *mon, const QDict *qdict)
 monitor_printf(mon, "downtime: %" PRIu64 " milliseconds\n",
info->downtime);
 }
+if (info->has_setup_time) {
+monitor_printf(mon, "setup: %" PRIu64 " milliseconds\n",
+   info->setup_time);
+}
 }
 
 if (info->has_ram) {
diff --git a/include/migration/migration.h b/include/migration/migration.h
index b5e413a..71dbe54 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -49,6 +49,7 @@ struct MigrationState
 int64_t dirty_bytes_rate;
 bool enabled_capabilities[MIGRATION_CAPABILITY_MAX];
 int64_t xbzrle_cache_size;
+int64_t setup_time;
 };
 
 void process_incoming_migration(QEMUFile *f);
diff --git a/migration.c b/migration.c
index a199a67..892302a 100644
--- a/migration.c
+++ b/migration.c
@@ -191,6 +191,7 @@ MigrationInfo *qmp_query_migrate(Error **errp)
 case MIG_STATE_SETUP:
 info->has_status = true;
 info->status = g_strdup("setup");
+info->has_total_time = false;
 break;
 case MIG_STATE_ACTIVE:
 info->has_status = true;
@@ -200,6 +201,8 @@ MigrationInfo *qmp_query_migrate(Error **errp)
 - s->total_time;
 info->has_expected_downtime = true;
 info->expected_downtime = s->expected_downtime;
+info->has_setup_time = true;
+info->setup_time = s->setup_time;
 
 info->has_ram = true;
 info->ram = g_malloc0(sizeof(*info->ram));
@@ -231,6 +234,8 @@ MigrationInfo *qmp_query_migrate(Error **errp)
 info->total_time = s->total_time;
 info->has_downtime = true;
 info->downtime = s->downtime;
+info->has_setup_time = true;
+info->setup_time = s->setup_time;
 
 info->has_ram = true;
 info->ram = g_malloc0(sizeof(*info->ram));
@@ -522,6 +527,7 @@ static void *migration_thread(void *opaque)
 {
 MigrationState *s = opaque;
 int64_t initial_time = qemu_get_clock_ms(rt_clock);
+int64_t setup_start = qemu_get_clock_ms(host_clock);
 int64_t initial_bytes = 0;
 int64_t max_size = 0;
 int64_t start_time = initial_time;
@@ -530,8 +536,11 @@ static void *migration_thread(void *opaque)
 DPRINTF("beginning savevm\n");
 qemu_savevm_state_begin(s->file, &s->params);
 
+s->setup_time = qemu_get_clock_ms(host_clock) - setup_start;
 migrate_set_state(s, MIG_STATE_SETUP, MIG_STATE_ACTIVE);
 
+DPRINTF("setup complete\n");
+
 while (s->state == MIG_STATE_ACTIVE) {
 int64_t current_time;
 uint64_t pending_size;
diff --git a/qapi-schema.json b/qapi-schema.json
index a30a728..0ad7257 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -578,6 +578,12 @@
 #expected downtime in milliseconds for the guest in last walk
 #of the dirty bitmap. (since 1.3)
 #
+# @setup-time: #optional amount of setup time spent _before_ the iterations
+#begin but _after_ the QMP command is issued. This is designed to
+#provide an accounting of any activities (such as RDMA pinning) which
+#may be expensive, but do not actually occur during the iterative
+#migration rounds themselves. (since 1.6)
+#
 # Since: 0.14.0
 ##
 { 'type': 'MigrationInfo',
@@ -586,7 +592,8 @@
'*xbzrle-cache': 'XBZRLECacheStats',
'*total-time': 'int',
'*expected-downtime': 'int',
-   '*downtime': 'int'} }
+   '*downtime': 'int',
+   '*setup-time' : 'int'} }
 
 ##
 # @query-migrate
-- 
1.7.10.4




[Qemu-devel] [PATCH v12 04/15] rdma: export throughput w/ MigrationStats QMP

2013-06-25 Thread mrhines
From: "Michael R. Hines" 

This exposes throughput (in megabits/sec) through QMP.

Reviewed-by: Juan Quintela 
Reviewed-by: Paolo Bonzini 
Reviewed-by: Chegu Vinod 
Tested-by: Chegu Vinod 
Tested-by: Michael R. Hines 
Signed-off-by: Michael R. Hines 
---
 hmp.c |2 ++
 include/migration/migration.h |1 +
 migration.c   |6 ++
 qapi-schema.json  |5 -
 4 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/hmp.c b/hmp.c
index 494a9aa..148a3fb 100644
--- a/hmp.c
+++ b/hmp.c
@@ -169,6 +169,8 @@ void hmp_info_migrate(Monitor *mon, const QDict *qdict)
 if (info->has_ram) {
 monitor_printf(mon, "transferred ram: %" PRIu64 " kbytes\n",
info->ram->transferred >> 10);
+monitor_printf(mon, "throughput: %0.2f mbps\n",
+   info->ram->mbps);
 monitor_printf(mon, "remaining ram: %" PRIu64 " kbytes\n",
info->ram->remaining >> 10);
 monitor_printf(mon, "total ram: %" PRIu64 " kbytes\n",
diff --git a/include/migration/migration.h b/include/migration/migration.h
index 0be28a2..535e844 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -40,6 +40,7 @@ struct MigrationState
 
 int state;
 MigrationParams params;
+double mbps;
 int64_t total_time;
 int64_t downtime;
 int64_t expected_downtime;
diff --git a/migration.c b/migration.c
index 058f9e6..441a3b2 100644
--- a/migration.c
+++ b/migration.c
@@ -66,6 +66,7 @@ MigrationState *migrate_get_current(void)
 .state = MIG_STATE_SETUP,
 .bandwidth_limit = MAX_THROTTLE,
 .xbzrle_cache_size = DEFAULT_MIGRATE_CACHE_SIZE,
+.mbps = -1,
 };
 
 return ¤t_migration;
@@ -201,6 +202,7 @@ MigrationInfo *qmp_query_migrate(Error **errp)
 info->ram->normal = norm_mig_pages_transferred();
 info->ram->normal_bytes = norm_mig_bytes_transferred();
 info->ram->dirty_pages_rate = s->dirty_pages_rate;
+info->ram->mbps = s->mbps;
 
 if (blk_mig_active()) {
 info->has_disk = true;
@@ -230,6 +232,7 @@ MigrationInfo *qmp_query_migrate(Error **errp)
 info->ram->skipped = skipped_mig_pages_transferred();
 info->ram->normal = norm_mig_pages_transferred();
 info->ram->normal_bytes = norm_mig_bytes_transferred();
+info->ram->mbps = s->mbps;
 break;
 case MIG_STATE_ERROR:
 info->has_status = true;
@@ -543,6 +546,9 @@ static void *migration_thread(void *opaque)
 double bandwidth = transferred_bytes / time_spent;
 max_size = bandwidth * migrate_max_downtime() / 100;
 
+s->mbps = time_spent ? (((double) transferred_bytes * 8.0) /
+((double) time_spent / 1000.0)) / 1000.0 / 1000.0 : -1;
+
 DPRINTF("transferred %" PRIu64 " time_spent %" PRIu64
 " bandwidth %g max_size %" PRId64 "\n",
 transferred_bytes, time_spent, bandwidth, max_size);
diff --git a/qapi-schema.json b/qapi-schema.json
index 6cc07c2..78da557 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -513,12 +513,15 @@
 # @dirty-pages-rate: number of pages dirtied by second by the
 #guest (since 1.3)
 #
+# @mbps: throughput in megabits/sec. (since 1.6)
+#
 # Since: 0.14.0
 ##
 { 'type': 'MigrationStats',
   'data': {'transferred': 'int', 'remaining': 'int', 'total': 'int' ,
'duplicate': 'int', 'skipped': 'int', 'normal': 'int',
-   'normal-bytes': 'int', 'dirty-pages-rate' : 'int' } }
+   'normal-bytes': 'int', 'dirty-pages-rate' : 'int',
+   'mbps' : 'number' } }
 
 ##
 # @XBZRLECacheStats
-- 
1.7.10.4




[Qemu-devel] [PATCH v12 09/15] rdma: new QEMUFileOps hooks

2013-06-25 Thread mrhines
From: "Michael R. Hines" 

These are the prototypes and implementation of new hooks that
RDMA takes advantage of to perform dynamic page registration.

An optional hook is also introduced for a custom function
to be able to override the default save_page function.

Also included are the prototypes and accessor methods used by
arch_init.c which invoke funtions inside savevm.c to call out
to the hooks that may or may not have been overridden
inside of QEMUFileOps.

Reviewed-by: Juan Quintela 
Reviewed-by: Paolo Bonzini 
Reviewed-by: Chegu Vinod 
Tested-by: Chegu Vinod 
Tested-by: Michael R. Hines 
Signed-off-by: Michael R. Hines 
---
 include/migration/migration.h |   20 ++
 include/migration/qemu-file.h |   29 
 savevm.c  |   59 +
 3 files changed, 108 insertions(+)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index 8d9cbd1..7b153d7 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -21,6 +21,7 @@
 #include "qapi/error.h"
 #include "migration/vmstate.h"
 #include "qapi-types.h"
+#include "exec/cpu-common.h"
 
 struct MigrationParams {
 bool blk;
@@ -132,4 +133,23 @@ int migrate_use_xbzrle(void);
 int64_t migrate_xbzrle_cache_size(void);
 
 int64_t xbzrle_cache_resize(int64_t new_size);
+
+void ram_control_before_iterate(QEMUFile *f, uint64_t flags);
+void ram_control_after_iterate(QEMUFile *f, uint64_t flags);
+void ram_control_load_hook(QEMUFile *f, uint64_t flags);
+
+/* Whenever this is found in the data stream, the flags
+ * will be passed to ram_control_load_hook in the incoming-migration
+ * side. This lets before_ram_iterate/after_ram_iterate add
+ * transport-specific sections to the RAM migration data.
+ */
+#define RAM_SAVE_FLAG_HOOK 0x80
+
+#define RAM_SAVE_CONTROL_NOT_SUPP -1000
+#define RAM_SAVE_CONTROL_DELAYED  -2000
+
+size_t ram_control_save_page(QEMUFile *f, ram_addr_t block_offset,
+ ram_addr_t offset, size_t size,
+ int *bytes_sent);
+
 #endif
diff --git a/include/migration/qemu-file.h b/include/migration/qemu-file.h
index 37d1604..0f757fb 100644
--- a/include/migration/qemu-file.h
+++ b/include/migration/qemu-file.h
@@ -23,6 +23,7 @@
  */
 #ifndef QEMU_FILE_H
 #define QEMU_FILE_H 1
+#include "exec/cpu-common.h"
 
 /* This function writes a chunk of data to a file at the given position.
  * The pos argument can be ignored if the file is only being used for
@@ -57,12 +58,40 @@ typedef int (QEMUFileGetFD)(void *opaque);
 typedef ssize_t (QEMUFileWritevBufferFunc)(void *opaque, struct iovec *iov,
int iovcnt, int64_t pos);
 
+/*
+ * This function provides hooks around different
+ * stages of RAM migration.
+ */
+typedef int (QEMURamHookFunc)(QEMUFile *f, void *opaque, uint64_t flags);
+
+/*
+ * Constants used by ram_control_* hooks
+ */
+#define RAM_CONTROL_SETUP0
+#define RAM_CONTROL_ROUND1
+#define RAM_CONTROL_HOOK 2
+#define RAM_CONTROL_FINISH   3
+
+/*
+ * This function allows override of where the RAM page
+ * is saved (such as RDMA, for example.)
+ */
+typedef size_t (QEMURamSaveFunc)(QEMUFile *f, void *opaque,
+   ram_addr_t block_offset,
+   ram_addr_t offset,
+   size_t size,
+   int *bytes_sent);
+
 typedef struct QEMUFileOps {
 QEMUFilePutBufferFunc *put_buffer;
 QEMUFileGetBufferFunc *get_buffer;
 QEMUFileCloseFunc *close;
 QEMUFileGetFD *get_fd;
 QEMUFileWritevBufferFunc *writev_buffer;
+QEMURamHookFunc *before_ram_iterate;
+QEMURamHookFunc *after_ram_iterate;
+QEMURamHookFunc *hook_ram_load;
+QEMURamSaveFunc *save_page;
 } QEMUFileOps;
 
 QEMUFile *qemu_fopen_ops(void *opaque, const QEMUFileOps *ops);
diff --git a/savevm.c b/savevm.c
index 26d5607..e0491e7 100644
--- a/savevm.c
+++ b/savevm.c
@@ -616,6 +616,65 @@ void qemu_fflush(QEMUFile *f)
 }
 }
 
+void ram_control_before_iterate(QEMUFile *f, uint64_t flags)
+{
+int ret = 0;
+
+if (f->ops->before_ram_iterate) {
+ret = f->ops->before_ram_iterate(f, f->opaque, flags);
+if (ret < 0) {
+qemu_file_set_error(f, ret);
+}
+}
+}
+
+void ram_control_after_iterate(QEMUFile *f, uint64_t flags)
+{
+int ret = 0;
+
+if (f->ops->after_ram_iterate) {
+ret = f->ops->after_ram_iterate(f, f->opaque, flags);
+if (ret < 0) {
+qemu_file_set_error(f, ret);
+}
+}
+}
+
+void ram_control_load_hook(QEMUFile *f, uint64_t flags)
+{
+int ret = 0;
+
+if (f->ops->hook_ram_load) {
+ret = f->ops->hook_ram_load(f, f->opaque, flags);
+if (ret < 0) {
+qemu_file_set_error(f, ret);
+}
+} else {
+qemu_file_set_error(f, ret);
+}
+}
+
+size_t ram_control_save_page(QEMUFile *f, ram_addr_t block_off

[Qemu-devel] [PATCH v12 03/15] rdma: export yield_until_fd_readable()

2013-06-25 Thread mrhines
From: "Michael R. Hines" 

The RDMA event channel can be made non-blocking just like a TCP
socket. Exporting this function allows us to yield so that the
QEMU monitor remains available.

Reviewed-by: Juan Quintela 
Reviewed-by: Paolo Bonzini 
Reviewed-by: Chegu Vinod 
Tested-by: Chegu Vinod 
Tested-by: Michael R. Hines 
Signed-off-by: Michael R. Hines 
---
 include/block/coroutine.h |6 ++
 qemu-coroutine-io.c   |   23 +++
 savevm.c  |   28 
 3 files changed, 29 insertions(+), 28 deletions(-)

diff --git a/include/block/coroutine.h b/include/block/coroutine.h
index a978162..377805a 100644
--- a/include/block/coroutine.h
+++ b/include/block/coroutine.h
@@ -209,4 +209,10 @@ void qemu_co_rwlock_unlock(CoRwlock *lock);
  */
 void coroutine_fn co_sleep_ns(QEMUClock *clock, int64_t ns);
 
+/**
+ * Yield until a file descriptor becomes readable
+ *
+ * Note that this function clobbers the handlers for the file descriptor.
+ */
+void coroutine_fn yield_until_fd_readable(int fd);
 #endif /* QEMU_COROUTINE_H */
diff --git a/qemu-coroutine-io.c b/qemu-coroutine-io.c
index e8ad1a4..c4df35a 100644
--- a/qemu-coroutine-io.c
+++ b/qemu-coroutine-io.c
@@ -63,3 +63,26 @@ qemu_co_send_recv(int sockfd, void *buf, size_t bytes, bool 
do_send)
 struct iovec iov = { .iov_base = buf, .iov_len = bytes };
 return qemu_co_sendv_recvv(sockfd, &iov, 1, 0, bytes, do_send);
 }
+
+typedef struct {
+Coroutine *co;
+int fd;
+} FDYieldUntilData;
+
+static void fd_coroutine_enter(void *opaque)
+{
+FDYieldUntilData *data = opaque;
+qemu_set_fd_handler(data->fd, NULL, NULL, NULL);
+qemu_coroutine_enter(data->co, NULL);
+}
+
+void coroutine_fn yield_until_fd_readable(int fd)
+{
+FDYieldUntilData data;
+
+assert(qemu_in_coroutine());
+data.co = qemu_coroutine_self();
+data.fd = fd;
+qemu_set_fd_handler(fd, fd_coroutine_enter, NULL, &data);
+qemu_coroutine_yield();
+}
diff --git a/savevm.c b/savevm.c
index 9b5577e..e35c7a4 100644
--- a/savevm.c
+++ b/savevm.c
@@ -149,34 +149,6 @@ typedef struct QEMUFileSocket
 QEMUFile *file;
 } QEMUFileSocket;
 
-typedef struct {
-Coroutine *co;
-int fd;
-} FDYieldUntilData;
-
-static void fd_coroutine_enter(void *opaque)
-{
-FDYieldUntilData *data = opaque;
-qemu_set_fd_handler(data->fd, NULL, NULL, NULL);
-qemu_coroutine_enter(data->co, NULL);
-}
-
-/**
- * Yield until a file descriptor becomes readable
- *
- * Note that this function clobbers the handlers for the file descriptor.
- */
-static void coroutine_fn yield_until_fd_readable(int fd)
-{
-FDYieldUntilData data;
-
-assert(qemu_in_coroutine());
-data.co = qemu_coroutine_self();
-data.fd = fd;
-qemu_set_fd_handler(fd, fd_coroutine_enter, NULL, &data);
-qemu_coroutine_yield();
-}
-
 static ssize_t socket_writev_buffer(void *opaque, struct iovec *iov, int 
iovcnt,
 int64_t pos)
 {
-- 
1.7.10.4




[Qemu-devel] [PATCH v12 10/15] rdma: introduce capability x-rdma-pin-all

2013-06-25 Thread mrhines
From: "Michael R. Hines" 

This capability allows you to disable dynamic chunk registration
for better throughput on high-performance links.

For example, using an 8GB RAM virtual machine with all 8GB of memory in
active use and the VM itself is completely idle using a 40 gbps infiniband link:

1. x-rdma-pin-all disabled total time: approximately 7.5 seconds @ 9.5 Gbps
2. x-rdma-pin-all enabled total time: approximately 4 seconds @ 26 Gbps

These numbers would of course scale up to whatever size virtual machine
you have to migrate using RDMA.

Enabling this feature does *not* have any measurable affect on
migration *downtime*. This is because, without this feature, all of the
memory will have already been registered already in advance during
the bulk round and does not need to be re-registered during the successive
iteration rounds.

Reviewed-by: Juan Quintela 
Reviewed-by: Paolo Bonzini 
Reviewed-by: Chegu Vinod 
Reviewed-by: Eric Blake 
Tested-by: Chegu Vinod 
Tested-by: Michael R. Hines 
Signed-off-by: Michael R. Hines 
---
 include/migration/migration.h |2 ++
 migration.c   |9 +
 qapi-schema.json  |7 ++-
 3 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index 7b153d7..9d3cc85 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -125,6 +125,8 @@ void migrate_add_blocker(Error *reason);
  */
 void migrate_del_blocker(Error *reason);
 
+bool migrate_rdma_pin_all(void);
+
 int xbzrle_encode_buffer(uint8_t *old_buf, uint8_t *new_buf, int slen,
  uint8_t *dst, int dlen);
 int xbzrle_decode_buffer(uint8_t *src, int slen, uint8_t *dst, int dlen);
diff --git a/migration.c b/migration.c
index 441a3b2..a704d48 100644
--- a/migration.c
+++ b/migration.c
@@ -476,6 +476,15 @@ void qmp_migrate_set_downtime(double value, Error **errp)
 max_downtime = (uint64_t)value;
 }
 
+bool migrate_rdma_pin_all(void)
+{
+MigrationState *s;
+
+s = migrate_get_current();
+
+return s->enabled_capabilities[MIGRATION_CAPABILITY_X_RDMA_PIN_ALL];
+}
+
 int migrate_use_xbzrle(void)
 {
 MigrationState *s;
diff --git a/qapi-schema.json b/qapi-schema.json
index 78da557..a30a728 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -608,10 +608,15 @@
 #  This feature allows us to minimize migration traffic for certain 
work
 #  loads, by sending compressed difference of the pages
 #
+# @x-rdma-pin-all: Controls whether or not the entire VM memory footprint is
+#  mlock()'d on demand or all at once. Refer to docs/rdma.txt for 
usage.
+#  Disabled by default. Experimental: may (or may not) be renamed after
+#  further testing is complete. (since 1.6)
+#
 # Since: 1.2
 ##
 { 'enum': 'MigrationCapability',
-  'data': ['xbzrle'] }
+  'data': ['xbzrle', 'x-rdma-pin-all'] }
 
 ##
 # @MigrationCapabilityStatus
-- 
1.7.10.4




[Qemu-devel] [PATCH v12 05/15] rdma: introduce qemu_file_mode_is_not_valid()

2013-06-25 Thread mrhines
From: "Michael R. Hines" 

QEMUFileRDMA also has read and write modes. This function is now
shared to reduce code duplication.

Reviewed-by: Juan Quintela 
Reviewed-by: Paolo Bonzini 
Reviewed-by: Chegu Vinod 
Tested-by: Chegu Vinod 
Tested-by: Michael R. Hines 
Signed-off-by: Michael R. Hines 
---
 include/migration/qemu-file.h |1 +
 savevm.c  |   20 +---
 2 files changed, 14 insertions(+), 7 deletions(-)

diff --git a/include/migration/qemu-file.h b/include/migration/qemu-file.h
index 8fab0dd..dd3fd51 100644
--- a/include/migration/qemu-file.h
+++ b/include/migration/qemu-file.h
@@ -80,6 +80,7 @@ void qemu_put_byte(QEMUFile *f, int v);
  * The buffer should be available till it is sent asynchronously.
  */
 void qemu_put_buffer_async(QEMUFile *f, const uint8_t *buf, int size);
+bool qemu_file_mode_is_not_valid(const char *mode);
 
 static inline void qemu_put_ubyte(QEMUFile *f, unsigned int v)
 {
diff --git a/savevm.c b/savevm.c
index e35c7a4..67ddf06 100644
--- a/savevm.c
+++ b/savevm.c
@@ -449,14 +449,23 @@ static const QEMUFileOps socket_write_ops = {
 .close =  socket_close
 };
 
-QEMUFile *qemu_fopen_socket(int fd, const char *mode)
+bool qemu_file_mode_is_not_valid(const char *mode)
 {
-QEMUFileSocket *s;
-
 if (mode == NULL ||
 (mode[0] != 'r' && mode[0] != 'w') ||
 mode[1] != 'b' || mode[2] != 0) {
 fprintf(stderr, "qemu_fopen: Argument validity check failed\n");
+return true;
+}
+
+return false;
+}
+
+QEMUFile *qemu_fopen_socket(int fd, const char *mode)
+{
+QEMUFileSocket *s;
+
+if (qemu_file_mode_is_not_valid(mode)) {
 return NULL;
 }
 
@@ -475,10 +484,7 @@ QEMUFile *qemu_fopen(const char *filename, const char 
*mode)
 {
 QEMUFileStdio *s;
 
-if (mode == NULL ||
-   (mode[0] != 'r' && mode[0] != 'w') ||
-   mode[1] != 'b' || mode[2] != 0) {
-fprintf(stderr, "qemu_fopen: Argument validity check failed\n");
+if (qemu_file_mode_is_not_valid(mode)) {
 return NULL;
 }
 
-- 
1.7.10.4




[Qemu-devel] [PATCH v12 12/15] rdma: send pc.ram

2013-06-25 Thread mrhines
From: "Michael R. Hines" 

This takes advantages of the previous patches:

1. use the new QEMUFileOps hook 'save_page'

2. call out to the right accessor methods to invoke
   the iteration hooks defined in QEMUFileOps

Reviewed-by: Paolo Bonzini 
Reviewed-by: Chegu Vinod 
Tested-by: Chegu Vinod 
Tested-by: Michael R. Hines 
Signed-off-by: Michael R. Hines 
---
 arch_init.c |   33 -
 1 file changed, 32 insertions(+), 1 deletion(-)

diff --git a/arch_init.c b/arch_init.c
index 00bc3a9..5ea32db 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -115,6 +115,7 @@ const uint32_t arch_type = QEMU_ARCH;
 #define RAM_SAVE_FLAG_EOS  0x10
 #define RAM_SAVE_FLAG_CONTINUE 0x20
 #define RAM_SAVE_FLAG_XBZRLE   0x40
+/* 0x80 is reserved in migration.h start with 0x100 next */
 
 
 static struct defconfig_file {
@@ -447,6 +448,7 @@ static int ram_save_block(QEMUFile *f, bool last_stage)
 ram_bulk_stage = false;
 }
 } else {
+int ret;
 uint8_t *p;
 int cont = (block == last_sent_block) ?
 RAM_SAVE_FLAG_CONTINUE : 0;
@@ -455,7 +457,18 @@ static int ram_save_block(QEMUFile *f, bool last_stage)
 
 /* In doubt sent page as normal */
 bytes_sent = -1;
-if (is_zero_page(p)) {
+ret = ram_control_save_page(f, block->offset,
+   offset, TARGET_PAGE_SIZE, &bytes_sent);
+
+if (ret != RAM_SAVE_CONTROL_NOT_SUPP) {
+if (ret != RAM_SAVE_CONTROL_DELAYED) {
+if (bytes_sent > 0) {
+acct_info.norm_pages++;
+} else if (bytes_sent == 0) {
+acct_info.dup_pages++;
+}
+}
+} else if (is_zero_page(p)) {
 acct_info.dup_pages++;
 if (!ram_bulk_stage) {
 bytes_sent = save_block_hdr(f, block, offset, cont,
@@ -610,6 +623,10 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
 }
 
 qemu_mutex_unlock_ramlist();
+
+ram_control_before_iterate(f, RAM_CONTROL_SETUP);
+ram_control_after_iterate(f, RAM_CONTROL_SETUP);
+
 qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
 
 return 0;
@@ -628,6 +645,8 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
 reset_ram_globals();
 }
 
+ram_control_before_iterate(f, RAM_CONTROL_ROUND);
+
 t0 = qemu_get_clock_ns(rt_clock);
 i = 0;
 while ((ret = qemu_file_rate_limit(f)) == 0) {
@@ -658,6 +677,12 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
 
 qemu_mutex_unlock_ramlist();
 
+/*
+ * Must occur before EOS (or any QEMUFile operation)
+ * because of RDMA protocol.
+ */
+ram_control_after_iterate(f, RAM_CONTROL_ROUND);
+
 if (ret < 0) {
 bytes_transferred += total_sent;
 return ret;
@@ -675,6 +700,8 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
 qemu_mutex_lock_ramlist();
 migration_bitmap_sync();
 
+ram_control_before_iterate(f, RAM_CONTROL_FINISH);
+
 /* try transferring iterative blocks of memory */
 
 /* flush all remaining blocks regardless of rate limiting */
@@ -688,6 +715,8 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
 }
 bytes_transferred += bytes_sent;
 }
+
+ram_control_after_iterate(f, RAM_CONTROL_FINISH);
 migration_end();
 
 qemu_mutex_unlock_ramlist();
@@ -884,6 +913,8 @@ static int ram_load(QEMUFile *f, void *opaque, int 
version_id)
 ret = -EINVAL;
 goto done;
 }
+} else if (flags & RAM_SAVE_FLAG_HOOK) {
+ram_control_load_hook(f, flags);
 }
 error = qemu_file_get_error(f);
 if (error) {
-- 
1.7.10.4




[Qemu-devel] RFC: Busifiying NAND

2013-06-25 Thread Peter Crosthwaite
Hi All,

NAND emulation is a little strange, in that is doesnt have an
appropriate bus type (NAND was formerly SYS_BUS_DEVICE which was
completely wrong - so changed to DEVICE). Do want TYPE_NAND_BUS so
centrally support multiple NAND chips connected to the one NAND
interface? Currently controllers have to do this ad-hoc.

It occurs to me that theres possibly a lot in common with
TYPE_SSI_BUS. Main difference is it's parallel instead of series,
which at the end of the day, QEMU really shouldn't care about.

So proposed action Items:

Convert write-protect and chip select pins to GPIO - Supports
(awkward) case where CS/WP is managed by something other than the NAND
controller.
Generalise SSI (Syncrhonous Serial Interface) to SI (Synchronous Interface)
Inherit SSI from SI (trivially)
Inherit NAND_BUS from SI
Add support for Address/Command Latch Enable ALE/CLE to NAND_BUS
Convert NAND and controllers.

The trafficking of data to/from multiple devs is handled by generic SI
layer, while the NAND specifics are in the inherited bus layer.

Regards,
Peter



Re: [Qemu-devel] [PATCH arm-devs v1 1/1] char/cadence_uart: Fix reset for unattached instances

2013-06-25 Thread Peter Crosthwaite
Hi Peter,

On Wed, Jun 26, 2013 at 3:16 AM, Peter Maydell  wrote:
> On 7 June 2013 05:02,   wrote:
>> From: Peter Crosthwaite 
>>
>> commit 1db8b5efe0c2b5000e50691eea61264a615f43de introduced an issue
>> where QEMU would segfault if you have an unattached Cadence UART.
>>
>> Fix by guarding the flush-on-reset logic on there being a qemu_chr
>> attachment.
>>
>> Reported-by: Soren Brinkmann 
>> Signed-off-by: Peter Crosthwaite 
>> ---
>>
>>  hw/char/cadence_uart.c | 4 +++-
>>  1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/hw/char/cadence_uart.c b/hw/char/cadence_uart.c
>> index c2a7834..29827d2 100644
>> --- a/hw/char/cadence_uart.c
>> +++ b/hw/char/cadence_uart.c
>> @@ -157,7 +157,9 @@ static void uart_rx_reset(UartState *s)
>>  {
>>  s->rx_wpos = 0;
>>  s->rx_count = 0;
>> -qemu_chr_accept_input(s->chr);
>> +if (s->chr) {
>> +qemu_chr_accept_input(s->chr);
>> +}
>
> There seem to be a lot of references to s->chr in this device:
> is this really the only one which needs a guard against NULL?
>

Actually no,

We have had some corner cases where other dereferences have segfaulted
on us too. On my todo list to sweep and fix globally.

But this is unique and a much bigger problem than the rest, as it is
in the reset path. So even if the uart is present in system and unused
it segafaults.

Regards,
Peter

> thanks
> -- PMM
>



Re: [Qemu-devel] [PATCH v1 0/5] Nand Cleanup

2013-06-25 Thread Peter Crosthwaite
On Wed, Jun 26, 2013 at 4:16 AM, Peter Maydell  wrote:
> On 18 June 2013 12:07,   wrote:
>> From: Peter Crosthwaite 
>>
>>
>> Misc refactorings and cleanup of NAND.
>>
>>
>> Peter Crosthwaite (5):
>>   block/nand: Factor out common code
>>   block/nand: Formatting sweep
>>   block/nand: QOM casting sweep
>>   block/nand: Convert Sysbus::init to Device::realize
>>   nand: Don't inherit from Sysbus
>
> I've applied patches 3..5 to arm-devs.next.
>

Thanks,

Regards
Peter

> Thanks
> -- PMM
>



Re: [Qemu-devel] [PATCH v11 14/15] rdma: introduce MIG_STATE_NONE and change MIG_STATE_SETUP state transition

2013-06-25 Thread Michael R. Hines

On 06/25/2013 05:06 PM, Paolo Bonzini wrote:

Il 25/06/2013 22:56, Michael R. Hines ha scritto:

I was wrong - this does require a protocol extension.

This is because the RDMA transfers are asynchronous, and thus
we cannot know in advance that it is safe to unregister the memory
associated with each individual transfer before the transfer actually
completes.

While the destination currently uses the protocol to participate in
*registering* the page, the destination does not participate in the
RDMA transfers themselves, only the source does, and thus would
require a new exchange of messages to block and instruct the
destination to unpin the memory.

Yes, that's what I recalled too (really what mst told me :)).  Does it
need to be blocking though?  As long as the pinning is blocking, and
messages are processed in order, the source can proceed immediately
after sending an unpin message.  This assumes of course that the chunk
is not being transmitted, and I am not sure how easy the source can
determine that.

Paolo



No, they're not processed in order. In fact, not only does the device
write out of order, but also the PCI bus writes out of order.
This was such a problem in fact, that I fixed several bugs as a result
a few weeks ago (v7 of the patch with an in-depth description).

The destination simply cannot assume whatsoever what the ordering
of the writes are - that's really the whole point of using RDMA in the
first place so that the software can get out of the way of the transfer 
process

to lower the latency of each transfer.

The only option is to send a blocking message to the other side to
request the unpinning (in addition to unpinning on the source first upon
completion of the original transfer).

As you can expect, this would be very expensive and we must ensure
that we have *very* good a-priori information that this memory will
not need to be re-registered anytime in the near future.

- Michael




Re: [Qemu-devel] [PATCH 1/2] device_tree: Add qemu_devtree_setprop_sized_cells() utility function

2013-06-25 Thread David Gibson
On Mon, Jun 24, 2013 at 12:02:39PM +0100, Peter Maydell wrote:
> On 24 June 2013 11:56, Alexander Graf  wrote:
> > On 24.06.2013, at 12:22, Peter Maydell wrote:
> >> We already have a qemu_devtree_setprop_cells() which sets a dtb
> >> property to an array of cells whose values are specified by varargs.
> >> However for the fairly common case of setting a property to a list
> >> of addresses or of address,size pairs the number of cells used by
> >> each element in the list depends on the parent's #address-cells
> >> and #size-cells properties. To make this easier we provide an analogous
> >> qemu_devtree_setprop_sized_cells() function which allows the number
> >> of cells used by each element to be specified.
> >>
> >> Signed-off-by: Peter Maydell 
> >
> > This looks pretty complicated for something actually quite
> > simple: All you want is to pass in a number of 64bit values,
> > rather than 32bit ones, right?
> 
> Nope. If the device tree blob says #address-cells is 1
> and #size-cells is 1, then I want to pass in values to
> go in 32 bit cells. If it says #address-cells is 2 but
> #size-cells is still 1, then I want to pass in a value
> for a 64 bit cell then one for a 32 bit cell. If they're
> both 2 then I want to pass in values for two 64 bit
> cells.

Hmm.. the property certainly needs to be constructed that way.  But I
think Alex's point is that you could make the arguments all 64-bit,
and then truncate them in the generated property.

There's a bigger problem, though, that exists with both versions.  In
general people expect integer arguments like this not to care too much
about the exact integer type, because it will be promoted to the right
thing.  Except with varargs it won't.  So if you ever have a
notionally 64-bit address that happens to fit in a 32-bit variable and
you pass that it, this function will be broken.  And the worst of it
is, it'll work most of the time, until you happen to hit the wrong ABI
and parameter combination :(.

> It's pretty complicated because the device tree
> spec is pretty complicated (if it had just mandated
> 64 bit addr/size everywhere then this would be easy).

Actually, no, because #address-cells and #size-cells cover things
other than "memory-like" addresses.  e.g. #address-cells is 3 for PCI,
which wouldn't be covered by 64-bit addressing.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


pgpS6hVxFkN0T.pgp
Description: PGP signature


Re: [Qemu-devel] [PATCH 2/5] Blackfin: initial port

2013-06-25 Thread Mike Frysinger
On Tuesday 25 June 2013 17:23:57 Richard Henderson wrote:

whee, got a review! :)  i've snipped items that were obvious in the "i'll go 
do this" category rather than just copying & pasting "OK" many times.  got a 
long flight coming up soon, so hopefully i can tackle the majority of this work 
then.

> > diff --git a/target-bfin/bfin-sim.c b/target-bfin/bfin-sim.c
> 
> Why this separate file from translate.c?

because this port is based on the GNU/sim Blackfin port.  bfin-sim.c focuses on 
the actual opcode translation while the higher level file (translate.c in QEMU 
and interp.c in GNU/sim) takes care of the higher layers (like clock ticking).  
i like keeping the core structure the same between the two sims so that i can 
more easily merge changes between them.

> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> 
> Certainly you shouldn't need these, since this isn't a separately
> compiled object -- you're included from translate.c.

yes, this is most likely true.  it's due to the previous reason.  maybe i 
should make it a sep compiled file then there won't be any confusion ...

> > +#define HOST_LONG_WORD_SIZE (sizeof(long) * 8)
> 
> You mean TCG_TARGET_REG_BITS?

maybe?  this is for extracting target encoded immediates and properly 
extending them up to the system that is running the code (e.g. x86_64) so that 
it can then be checked naturally (like comparing it to a -1).  but this is 
done in the decode logic, not in the tcg output logic, so i'm not sure TCG is 
the right abstraction.

> > +static TCGv
> > +get_allreg(DisasContext *dc, int grp, int reg)
> > +{
> > +TCGv *ret = cpu_regs[(grp << 3) | reg];
> > +if (ret) {
> > +   return *ret;
> > +}
> > +abort();
> > +illegal_instruction(dc);
> > +}
> 
> Well, which is it?  abort or illegal_instruction.

i had this in there while doing the initial port to track down bad things.  i 
can probably cut it over to illegal_instruction() now.

> And come to that, how is
> abort any better than SEGV from dereferecing the null?  Certainly the later
> will generate a faster translator...

QEMU doesn't need any help segfaulting :p.  an abort() is much better at 
showing the source of the problem.

> > +decode_multfunc_tl(DisasContext *dc, int h0, int h1, int src0, int src1,
> > +   int mmod, int MM, TCGv psat)
> > +{
> > +TCGv s0, s1, val;
> > +
> > +s0 = tcg_temp_local_new();
> 
> You'll really want to avoid local temps and branches, if at all possible. 
> For some of the more complex stuff that you're open-coding, you may be
> better off with helper functions instead.

it seemed like having generated (and cached) opcodes was better than relying 
on helpers since helpers requires interrupting the native code flow and muck 
around with state ?  is there a good (or even semi-decent) rule of thumb i can 
use to decide when to use one over the other ?

> > +static void
> > +saturate_s32(TCGv_i64 val, TCGv overflow)
> 
> I shall now stop mentioning movcond.  I sense there are many locations to
> come.

looks like movcond was introduced after i did the initial (bulk) port.  so 
there are probably many locations that can take advantage of it.  i'll have to 
go through the code top-to-bottom looking for things.  and probably look at 
the history of tcg/README to see what other interesting opcodes have been 
added since.

> > +} else if (prgfunc == 11 && poprnd < 6) {
> > +/* TESTSET (Preg{poprnd}); */
> > +TCGv tmp = tcg_temp_new();
> > +tcg_gen_qemu_ld8u(tmp, cpu_preg[poprnd], dc->mem_idx);
> > +tcg_gen_setcondi_tl(TCG_COND_EQ, cpu_cc, tmp, 0);
> > +tcg_gen_ori_tl(tmp, tmp, 0x80);
> > +tcg_gen_qemu_st8(tmp, cpu_preg[poprnd], dc->mem_idx);
> > +tcg_temp_free(tmp);
> 
> I'll note that this is fine for system code, but for user code ought to be
> atomic.  There are a bunch of really bad examples in the tree, and no real
> good solutions atm.

i general, i completely agree with you.  in practice, i think a (mis)feature 
of the Blackfin arch helps out here.  TESTSET doesn't work on cached memory, 
and it only works on system memory (i.e. not L1), and every Linux build runs 
with caches turned on.  so maybe flaky misbehavior is a good thing ? :)

i can drop a note in there noting the issue though

> > +/* Everything here needs to be aligned, so check once */
> > +gen_align_check(dc, cpu_spreg, 4, false);
> 
> You ought not need to generate explicit alignment checks.  Yes, we don't do
> that correctly for user-mode, but we do for system mode.
> 
> My hope is that user mode eventually has the option of using the system
> mode page tables too -- there are just too many things that don't work
> correctly when host and target page sizes don't match, or the host and
> target don't have the same unaligned access characteristics.

i had this code because it wasn't working in user mode, and someone in a 
previous review sugges

Re: [Qemu-devel] [PATCH v4 06/10] qemu-ga: Add Windows VSS provider to quiesce applications on fsfreeze

2013-06-25 Thread Tomoki Sekiyama
From: Paolo Bonzini [paolo.bonz...@gmail.com] on behalf of Paolo Bonzini 
[pbonz...@redhat.com]
> Il 06/06/2013 17:06, Tomoki Sekiyama ha scritto:
>> +STDAPI VSSCheckOSVersion(void);
>> +
>> +STDAPI COMRegister(void);
>> +STDAPI COMUnregister(void);
>> +
>> +STDAPI DllRegisterServer(void);
>> +STDAPI DllUnregisterServer(void);
>
> Can you explain the difference between COMRegister/COMUnregister and
> DllRegisterServer/DllUnregisterServer (and why the COM+ part need not be
> done by regsvr32)?  Also, why does COMUnregister call
> DllUnregisterServer but COMRegister does not call DllRegisterServer?

COMRegister and COMUnregister are called by`qemu-ga -s install`,
to register/unregister the DLL into/from COM+ application catalogue.

DllRegisterServer is automatically called inside
pCatalog->InstallComponent() (like regsvr32 does), and register
this DLL as a VSS provider. DllUnregisterServer will do the oposite.

ICOMAdminCatalog (pCatalog) does not provide a method to uninstall
component, so COMUnregister calls DllUnregisterServer by itself.

> Also, is it needed to call VSSCheckOSVersion from the requestor?  I
> would think that checking VSSAPI.DLL is stronger than checking the
> version, and indeed you do that check too.

In Windows XP, VSSAPI.DLL exists but it has different functionality
and interfaces from newer Windows. 
http://msdn.microsoft.com/en-us/library/windows/desktop/aa384627(v=vs.85).aspx

It is checking the OS version because this patchset only supports
Windows 2003 or later.

Thanks,
Tomoki Sekiyama


Re: [Qemu-devel] [PULL 0/8] arm-devs queue

2013-06-25 Thread Peter Maydell
On 25 June 2013 22:45, Anthony Liguori  wrote:
> Peter Maydell  writes:
>> and (more generically) do you care if the tag message
>> actually has any useful content? I notice mst does both.
>
> Note the tags don't show up in qemu.git git history.

The tags don't, but the message does, at least according to
https://www.kernel.org/pub/software/scm/git/docs/howto/using-signed-tag-in-pull-request.html
which says a signed tag always results in a merge commit
which includes the message that came with the tag.

For most of these pull requests I'm not sure the message
would actually add much useful info though, which is why
I left it at just "arm-devs queue".

thanks
-- PMM



Re: [Qemu-devel] [PULL v2 00/21] pci,kvm,misc enhancements

2013-06-25 Thread Anthony Liguori
"Michael S. Tsirkin"  writes:

> From: Michael S. Tsirkin 
>
> The following changes since commit 90a2541b763b31d2b551b07e24aae3de5266d31b:
>
>   target-i386: fix over 80 chars warnings (2013-06-15 17:50:38 +)

Just as an aside, you should rebase and test before sending pull
requests.  This conflicts with current tip.  Normally, wouldn't be a
problem but FYI for the future.

There's a trivial conflict with ppc-softmmu.mak that I can resolve but
also a conflict with Stefano's pull request that I cannot resolve
without rebasing :-/  So please rebase against latest tip and resubmit.

Regards,

Anthony Liguori

>
> are available in the git repository at:
>
>   git://git.kernel.org/pub/scm/virt/kvm/mst/qemu.git tags/for_anthony
>
> for you to fetch changes up to 55cb9714d8a54f2dd82fc9fd7091762408bda531:
>
>   pci: Fold host_buses list into PCIHostState functionality (2013-06-23 
> 14:25:12 +0300)
>
> 
> pci,kvm,misc enhancements
>
> This includes some pci and kvm-related enhancements:
>
> Better support for systems with multiple PCI root buses
> KVM Speedup for MSI updates on kvm
> FW cfg interface for more robust pci programming in BIOS
> Minor fixes/cleanups for fw cfg and cross-version migration -
> because of dependencies with other patches
>
> Signed-off-by: Michael S. Tsirkin 
>
> Changes from v1:
> dropped rx filtering patch, it needs more testing
> there are no other changes
>
> 
> Andrew Jones (1):
>   e1000: cleanup process_tx_desc
>
> David Gibson (10):
>   pci: Cleanup configuration for pci-hotplug.c
>   pci: Move pci_read_devaddr to pci-hotplug-old.c
>   pci: Abolish pci_find_root_bus()
>   pci: Use helper to find device's root bus in pci_find_domain()
>   pci: Replace pci_find_domain() with more general pci_root_bus_path()
>   pci: Add root bus argument to pci_get_bus_devfn()
>   pci: Add root bus parameter to pci_nic_init()
>   pci: Simpler implementation of primary PCI bus
>   pci: Remove domain from PCIHostBus
>   pci: Fold host_buses list into PCIHostState functionality
>
> Michael S. Tsirkin (9):
>   range: add Range structure
>   pci: store PCI hole ranges in guestinfo structure
>   pc: pass PCI hole ranges to Guests
>   pc_piix: cleanup init compat handling
>   kvm: zero-initialize KVM_SET_GSI_ROUTING input
>   kvm: skip system call when msi route is unchanged
>   MAINTAINERS: s/Marcelo/Paolo/
>   pvpanic: initialization cleanup
>   pvpanic: fix fwcfg for big endian hosts
>
>  MAINTAINERS |   2 +-
>  default-configs/i386-softmmu.mak|   3 +-
>  default-configs/ppc64-softmmu.mak   |   2 -
>  default-configs/x86_64-softmmu.mak  |   3 +-
>  hmp-commands.hx |   4 +-
>  hw/alpha/dp264.c|   2 +-
>  hw/arm/realview.c   |   6 +-
>  hw/arm/versatilepb.c|   2 +-
>  hw/i386/pc.c|  74 ++-
>  hw/i386/pc_piix.c   |  40 +---
>  hw/i386/pc_q35.c|  18 +++-
>  hw/mips/mips_fulong2e.c |   6 +-
>  hw/mips/mips_malta.c|   6 +-
>  hw/misc/pvpanic.c   |  31 ---
>  hw/net/e1000.c  |  18 ++--
>  hw/pci-host/piix.c  |   9 ++
>  hw/pci-host/q35.c   |  17 
>  hw/pci/Makefile.objs|   2 +-
>  hw/pci/{pci-hotplug.c => pci-hotplug-old.c} |  75 ---
>  hw/pci/pci.c| 137 
> ++--
>  hw/pci/pci_host.c   |   1 +
>  hw/pci/pcie_aer.c   |   9 +-
>  hw/ppc/e500.c   |   2 +-
>  hw/ppc/mac_newworld.c   |   2 +-
>  hw/ppc/mac_oldworld.c   |   2 +-
>  hw/ppc/ppc440_bamboo.c  |   2 +-
>  hw/ppc/prep.c   |   2 +-
>  hw/ppc/spapr.c  |   2 +-
>  hw/ppc/spapr_pci.c  |  10 ++
>  hw/sh4/r2d.c|   5 +-
>  hw/sparc64/sun4u.c  |   2 +-
>  include/hw/i386/pc.h|  22 -
>  include/hw/pci-host/q35.h   |   2 +
>  include/hw/pci/pci.h|  17 ++--
>  include/hw/pci/pci_host.h   |  12 +++
>  include/qemu/range.h|  16 
>  include/qemu/typedefs.h |   1 +
>  kvm-all.c   |  23 +++--
>  38 files changed, 415 insertions(+), 174 deletions(-)
>  rename hw/pci/{pci-hotplug.c => pci-hotplug-old.c} (78%)



Re: [Qemu-devel] [PULL 0/8] arm-devs queue

2013-06-25 Thread Anthony Liguori
Peter Maydell  writes:

> On 25 June 2013 20:07, Anthony Liguori  wrote:
>> Peter Maydell  writes:
>>
>>> arm-devs pullreq; nothing particularly earthshattering.
>>> As with the target-arm pullreq, I'm trying to create a signed
>>> pullreq here, so let me know if I've messed it up.
>>> The signed tag itself is 'pull-arm-devs-20130625'; arm-devs.for-upstream
>>> is the branch name, as usual.
>>>
>>>
>>> The following changes since commit baf8673ca802cb3ea2cdbe94813441d23bde223b:
>>>
>>>   Merge remote-tracking branch 'stefanha/block' into staging (2013-06-24 
>>> 14:33:17 -0500)
>>>
>>> are available in the git repository at:
>>>
>>>
>>>   git://git.linaro.org/people/pmaydell/qemu-arm.git
>>>   arm-devs.for-upstream
>>
>> No need to resubmit, but you'll want to use the tag name instead of the
>> branch name here in the future.
>
> Yeah, I'd wondered if that might be the case.
>
> Oh, do you want the actual tag to have a signed-off-by:
> line,

I don't see any practical reason to add a SoB to a tag.  A tag is a
symbolic reference to an existing commit.  There is never any code
change.

A merge may involve code change so a SoB makes sense.  I now SoB all
merges.

Of course, if you want to include a SoB, more power to you but I am not
requiring it.

> and (more generically) do you care if the tag message
> actually has any useful content? I notice mst does both.

Note the tags don't show up in qemu.git git history.  I don't push them
as there's no point in pushing them there.  When I do a merge, I'm
merging the commit that the tag points to, not the tag itself.  The tag
only exists for me to be able to validate you are who you say you are
sending what you said you've sent.

So I really don't care if there's any content in the tag.  It's mostly
for your own personal benefit.

Regards,

Anthony Liguori

>
> thanks
> -- PMM




Re: [Qemu-devel] [PATCH 2/5] Blackfin: initial port

2013-06-25 Thread Richard Henderson
> diff --git a/target-bfin/bfin-sim.c b/target-bfin/bfin-sim.c

Why this separate file from translate.c?

> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 

Certainly you shouldn't need these, since this isn't a separately
compiled object -- you're included from translate.c.

> +static void
> +unhandled_instruction(DisasContext *dc, const char *insn)
> +{
> +fprintf(stderr, "unhandled insn: %s\n", insn);

Use LOG_UNIMP.

> +#define HOST_LONG_WORD_SIZE (sizeof(long) * 8)

You mean TCG_TARGET_REG_BITS?

> +static TCGv
> +get_allreg(DisasContext *dc, int grp, int reg)
> +{
> +TCGv *ret = cpu_regs[(grp << 3) | reg];
> +if (ret) {
> +   return *ret;
> +}
> +abort();
> +illegal_instruction(dc);
> +}

Well, which is it?  abort or illegal_instruction.  And come to that, how is
abort any better than SEGV from dereferecing the null?  Certainly the later
will generate a faster translator...

> +decode_multfunc_tl(DisasContext *dc, int h0, int h1, int src0, int src1,
> +   int mmod, int MM, TCGv psat)
> +{
> +TCGv s0, s1, val;
> +
> +s0 = tcg_temp_local_new();

You'll really want to avoid local temps and branches, if at all possible.  For
some of the more complex stuff that you're open-coding, you may be better off
with helper functions instead.

> +l = gen_new_label();
> +endl = gen_new_label();
> +
> +tcg_gen_brcondi_tl(TCG_COND_NE, val, 0x4000, l);
> +if (mmod == M_W32) {
> +tcg_gen_movi_tl(val, 0x7fff);
> +} else {
> +tcg_gen_movi_tl(val, 0x8000);
> +}
> +tcg_gen_movi_tl(psat, 1);
> +tcg_gen_br(endl);
> +
> +gen_set_label(l);
> +tcg_gen_shli_tl(val, val, 1);
> +
> +gen_set_label(endl);

Certainly possible here with 2 movcond, or 1 movcond, 1 setcond + 1 or.

> +l = gen_new_label();
> +tcg_gen_brcondi_tl(TCG_COND_EQ, psat, 0, l);
> +tcg_gen_ext32u_i64(val1, val1);
> +gen_set_label(l);

movcond again.

> +static void
> +saturate_s32(TCGv_i64 val, TCGv overflow)

I shall now stop mentioning movcond.  I sense there are many locations to come.

> +} else if (prgfunc == 11 && poprnd < 6) {
> +/* TESTSET (Preg{poprnd}); */
> +TCGv tmp = tcg_temp_new();
> +tcg_gen_qemu_ld8u(tmp, cpu_preg[poprnd], dc->mem_idx);
> +tcg_gen_setcondi_tl(TCG_COND_EQ, cpu_cc, tmp, 0);
> +tcg_gen_ori_tl(tmp, tmp, 0x80);
> +tcg_gen_qemu_st8(tmp, cpu_preg[poprnd], dc->mem_idx);
> +tcg_temp_free(tmp);

I'll note that this is fine for system code, but for user code ought to be
atomic.  There are a bunch of really bad examples in the tree, and no real
good solutions atm.

> +/* Can't push/pop reserved registers */
> +/*if (reg_is_reserved(grp, reg))
> +illegal_instruction(dc);*/

No commented out code like this.

> +/* Everything here needs to be aligned, so check once */
> +gen_align_check(dc, cpu_spreg, 4, false);

You ought not need to generate explicit alignment checks.  Yes, we don't do
that correctly for user-mode, but we do for system mode.

My hope is that user mode eventually has the option of using the system mode
page tables too -- there are just too many things that don't work correctly
when host and target page sizes don't match, or the host and target don't have
the same unaligned access characteristics.

> +} else if (grp == 4 && (reg == 0 || reg == 2)) {
> +/* Pop A#.X */
> +tmp = tcg_temp_new();
> +tcg_gen_qemu_ld32u(tmp, cpu_spreg, dc->mem_idx);
> +tcg_gen_andi_tl(tmp, tmp, 0xff);
> +tmp64 = tcg_temp_new_i64();
> +tcg_gen_extu_i32_i64(tmp64, tmp);
> +tcg_temp_free(tmp);
> +
> +tcg_gen_andi_i64(cpu_areg[reg >> 1], cpu_areg[reg >> 1], 
> 0x);
> +tcg_gen_shli_i64(tmp64, tmp64, 32);
> +tcg_gen_or_i64(cpu_areg[reg >> 1], cpu_areg[reg >> 1], tmp64);
> +tcg_temp_free_i64(tmp64);

Drop the andi with 0xff and use

tcg_gen_deposit_i64(cpu_areg[reg >> 1], cpu_areg[reg >> 1], tmp64, 32, 8)

> +} else if (grp == 4 && (reg == 1 || reg == 3)) {
> +/* Pop A#.W */
> +tcg_gen_andi_i64(cpu_areg[reg >> 1], cpu_areg[reg >> 1], 
> 0xff);
> +tmp = tcg_temp_new();
> +tcg_gen_qemu_ld32u(tmp, cpu_spreg, dc->mem_idx);
> +tmp64 = tcg_temp_new_i64();
> +tcg_gen_extu_i32_i64(tmp64, tmp);
> +tcg_temp_free(tmp);
> +tcg_gen_or_i64(cpu_areg[reg >> 1], cpu_areg[reg >> 1], tmp64);
> +tcg_temp_free_i64(tmp64);

And then this one becomes deposit(areg, areg, tmp64, 0, 32).

> +} else if (grp == 4 && (reg == 0 || reg == 2)) {
> +/* Push A#.X */
> +tmp64 = tcg_temp_new_i64();
> +tcg_gen_shri_i64(tmp64, cpu_areg[reg >> 1], 32);
> +tmp = tcg_temp_new();
>

Re: [Qemu-devel] [PATCH v4 06/10] qemu-ga: Add Windows VSS provider to quiesce applications on fsfreeze

2013-06-25 Thread Paolo Bonzini
Il 06/06/2013 17:06, Tomoki Sekiyama ha scritto:
> +STDAPI VSSCheckOSVersion(void);
> +
> +STDAPI COMRegister(void);
> +STDAPI COMUnregister(void);
> +
> +STDAPI DllRegisterServer(void);
> +STDAPI DllUnregisterServer(void);

Can you explain the difference between COMRegister/COMUnregister and
DllRegisterServer/DllUnregisterServer (and why the COM+ part need not be
done by regsvr32)?  Also, why does COMUnregister call
DllUnregisterServer but COMRegister does not call DllRegisterServer?

Also, is it needed to call VSSCheckOSVersion from the requestor?  I
would think that checking VSSAPI.DLL is stronger than checking the
version, and indeed you do that check too.

Sorry for the probably stupid questions, COM stuff is way beyond my
Windows knowledge.

Paolo



[Qemu-devel] [Bug 1101210] Re: qemu 1.3.0: usb keyboard not fully working

2013-06-25 Thread CrystalGamma
** Also affects: archlinux
   Importance: Undecided
   Status: New

** No longer affects: archlinux

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1101210

Title:
  qemu 1.3.0: usb keyboard not fully working

Status in QEMU:
  New

Bug description:
  When using the usb keyboard, I can't type the | character. I'm using
  german keyboard layout (de) on the host and inside the guest. As a
  guest OS, I use Linux (e.g. a recent KNOPPIX cd). To obtain the |
  character on a german keyboard, I need to press AltGr + the < or >
  key, i.e. the key right to the left shift.

  The qemu command line is something like this:
  ./qemu-system-i386 -device pci-ohci -device usb-kbd
  I also tried
  ./qemu-system-i386 -usb -usbdevice keyboard
  with the same effect.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1101210/+subscriptions



Re: [Qemu-devel] [PATCH v11 14/15] rdma: introduce MIG_STATE_NONE and change MIG_STATE_SETUP state transition

2013-06-25 Thread Paolo Bonzini
Il 25/06/2013 22:56, Michael R. Hines ha scritto:
>>
> I was wrong - this does require a protocol extension.
> 
> This is because the RDMA transfers are asynchronous, and thus
> we cannot know in advance that it is safe to unregister the memory
> associated with each individual transfer before the transfer actually
> completes.
> 
> While the destination currently uses the protocol to participate in
> *registering* the page, the destination does not participate in the
> RDMA transfers themselves, only the source does, and thus would
> require a new exchange of messages to block and instruct the
> destination to unpin the memory.

Yes, that's what I recalled too (really what mst told me :)).  Does it
need to be blocking though?  As long as the pinning is blocking, and
messages are processed in order, the source can proceed immediately
after sending an unpin message.  This assumes of course that the chunk
is not being transmitted, and I am not sure how easy the source can
determine that.

Paolo



Re: [Qemu-devel] [PATCH v11 14/15] rdma: introduce MIG_STATE_NONE and change MIG_STATE_SETUP state transition

2013-06-25 Thread Michael R. Hines

On 06/25/2013 10:55 AM, Paolo Bonzini wrote:

Il 25/06/2013 16:54, Michael R. Hines ha scritto:

Do those few lines of code change the protocol?  If yes, I'll go against
all my previous recommendations and suggest a #define.  If no, it is
fine to leave it for later, but I would still suggest posting the patch
on the list just for information.

Ok, you got it - no it does not change the protocol.
I'll use a #define and send it out with a new version for review.

Make sure you send an additional patch on top of these 15.

Paolo


I was wrong - this does require a protocol extension.

This is because the RDMA transfers are asynchronous, and thus
we cannot know in advance that it is safe to unregister the memory
associated with each individual transfer before the transfer actually
completes.

While the destination currently uses the protocol to participate in
*registering* the page, the destination does not participate in the
RDMA transfers themselves, only the source does, and thus would
require a new exchange of messages to block and instruct the
destination to unpin the memory.

- Michael




Re: [Qemu-devel] [PULL 2/8] i.MX: Implement a more complete version of the GPT timer.

2013-06-25 Thread Peter Maydell
On 25 June 2013 19:42, Paolo Bonzini  wrote:
> Il 25/06/2013 20:21, Peter Maydell ha scritto:
>> @@ -416,7 +513,7 @@ static int imx_timerg_init(SysBusDevice *dev)
>>
>>  sysbus_init_irq(dev, &s->irq);
>>  memory_region_init_io(&s->iomem, &imx_timerg_ops,
>> -  s, "imxg-timer",
>> +  s, TYPE_IMX_GPT,
>>0x1000);
>>  sysbus_init_mmio(dev, &s->iomem);
>>
>
> There was some agreement that this is not a good change.

I agree (and more so regarding the use of the macro in the
vmstate name), but nobody actually posted any comment to
that effect against any of the versions of this patch that
got sent out for review...

thanks
-- PMM



Re: [Qemu-devel] [PULL 0/8] arm-devs queue

2013-06-25 Thread Peter Maydell
On 25 June 2013 20:07, Anthony Liguori  wrote:
> Peter Maydell  writes:
>
>> arm-devs pullreq; nothing particularly earthshattering.
>> As with the target-arm pullreq, I'm trying to create a signed
>> pullreq here, so let me know if I've messed it up.
>> The signed tag itself is 'pull-arm-devs-20130625'; arm-devs.for-upstream
>> is the branch name, as usual.
>>
>>
>> The following changes since commit baf8673ca802cb3ea2cdbe94813441d23bde223b:
>>
>>   Merge remote-tracking branch 'stefanha/block' into staging (2013-06-24 
>> 14:33:17 -0500)
>>
>> are available in the git repository at:
>>
>>
>>   git://git.linaro.org/people/pmaydell/qemu-arm.git
>>   arm-devs.for-upstream
>
> No need to resubmit, but you'll want to use the tag name instead of the
> branch name here in the future.

Yeah, I'd wondered if that might be the case.

Oh, do you want the actual tag to have a signed-off-by:
line, and (more generically) do you care if the tag message
actually has any useful content? I notice mst does both.

thanks
-- PMM



Re: [Qemu-devel] [PATCH 10/15] target-i386: convert 'hv_spinlocks' to static property

2013-06-25 Thread Eduardo Habkost
On Tue, Jun 25, 2013 at 05:30:50PM -0300, Eduardo Habkost wrote:
> On Wed, Jun 05, 2013 at 03:18:41PM +0200, Igor Mammedov wrote:
> > Signed-off-by: Igor Mammedov 
> > ---
> [...]
> > @@ -1632,6 +1677,7 @@ static void cpu_x86_parse_featurestr(X86CPU *cpu, 
> > char *features, Error **errp)
> >  } else if (!strcmp(featurestr, "hv-spinlocks")) {
> >  char *err;
> >  const int min = 0xFFF;
> > +char num[32];
> >  numvalue = strtoul(val, &err, 0);
> >  if (!*val || *err) {
> >  error_setg(errp, "bad numerical value %s", val);
> > @@ -1643,7 +1689,8 @@ static void cpu_x86_parse_featurestr(X86CPU *cpu, 
> > char *features, Error **errp)
> >  min);
> >  numvalue = min;
> >  }
> > -env->hyperv_spinlock_attempts = numvalue;
> > +snprintf(num, sizeof(num), "%" PRId32, numvalue);
> > +object_property_parse(OBJECT(cpu), num, featurestr, errp);
> 
> Why not use object_property_set_int()?

Oh, I believe I have asked that before and you have already explained
it: you are using strings to allow the existing object_property_parse()
calls to be easily converted to qdev_prop_register_global() calls later.
Correct?

-- 
Eduardo



Re: [Qemu-devel] [PATCH 10/15] target-i386: convert 'hv_spinlocks' to static property

2013-06-25 Thread Eduardo Habkost
On Wed, Jun 05, 2013 at 03:18:41PM +0200, Igor Mammedov wrote:
> Signed-off-by: Igor Mammedov 
> ---
[...]
> @@ -1632,6 +1677,7 @@ static void cpu_x86_parse_featurestr(X86CPU *cpu, char 
> *features, Error **errp)
>  } else if (!strcmp(featurestr, "hv-spinlocks")) {
>  char *err;
>  const int min = 0xFFF;
> +char num[32];
>  numvalue = strtoul(val, &err, 0);
>  if (!*val || *err) {
>  error_setg(errp, "bad numerical value %s", val);
> @@ -1643,7 +1689,8 @@ static void cpu_x86_parse_featurestr(X86CPU *cpu, char 
> *features, Error **errp)
>  min);
>  numvalue = min;
>  }
> -env->hyperv_spinlock_attempts = numvalue;
> +snprintf(num, sizeof(num), "%" PRId32, numvalue);
> +object_property_parse(OBJECT(cpu), num, featurestr, errp);

Why not use object_property_set_int()?

-- 
Eduardo



Re: [Qemu-devel] [PATCH v4 06/10] qemu-ga: Add Windows VSS provider to quiesce applications on fsfreeze

2013-06-25 Thread Laszlo Ersek
On 06/25/13 20:23, Tomoki Sekiyama wrote:
> On 6/25/13 12:03 , "Laszlo Ersek"  wrote:

>>> +ifeq ($(CONFIG_QGA_VSS),y)
>>> +QEMU_CFLAGS += -DHAS_VSS_SDK
>>
>> Isn't this superfluous? First, I can find no other reference to
>> HAS_VSS_SDK in the series, second, CONFIG_QGA_VSS would be available
>> directly as a macro to source files.
> 
> It is referenced in e.g. qga/commands-win32.c in patch 8/10.

Ah sorry. "ack" (that I sometimes use for recursive grepping, outside of
git clones) didn't return any hits in the directory where I had saved
your patch emails (for git-am), that's why I said "no references in the
series".

Perhaps "ack" ignored the .eml files due to their CRLF line endings...
It did find the references with option "--all".

(Of course I should have noticed something was fishy, since "ack" didn't
find the very occurrence I was questioning! :))

> But as you say, I will omit this, by the second reason.

OK, thanks.

>> Will some VSS service load the provider DLL independently of
>> qemu-ga.exe?
> 
> Yes, VSS provider DLL is registered to VSS on installation of qemu-ga
> service (using .TLB), and loaded into VSS service component when
> snapshot is started.
> 
>> If so, (a) how will they communicate
> 
> Using COM interface.

[...]

>> ... I'll try to continue here. I've peaked forward a little bit, and
>> CQGAVssProvider::CommitSnapshots() seems to be the central method. It
>> kicks the "frozen" event, then blocks until the "thaw" event is kicked
>> back. I wonder how those will translate to QMP communication with
>> libvirt.
> 
> CommitSnapshots() is called by VSS, when the requestor of qemu-ga
> (introduced in next patch) initiates snapshot receiving
> "guest-fsfreeze-freeze" command.
> "frozen" event is used to tell qemu-ga.exe that the fs is frozen,
> then qemu-ga.exe will tell libvirt to "guest-fsfreeze-freeze" is done.
> 
> "thaw" event is kicked by qemnu-ga.exe when it receives
> "guest-fsfreeze-thaw" command. The command is finished when VSS notify
> qeum-ga.exe requestor that VSS snapshot process is completed.

Very interesting. I'm curious about the requestor in the next patch.

Based on the VSS description in MSDN, I thought the requestor would make
some kind of blocking call (like a normal function call) into VSS that
could not return until the full process was done (ie. filesystems were
re-thawed).

This likely isn't the case, as CommitSnapshots() appears to signal
(wake) qemu-ga.exe asynchronously -- the requestor (qemu-ga) and the
provider (the DLL in the VSS component) seem to run in parallel.

The more I think I understand of your design the more I like it.

I'll attempt to continue the review tomorrow.

Thanks!
Laszlo




Re: [Qemu-devel] [PATCH 09/15] target-i386: move hyperv_* static globals to CPUState

2013-06-25 Thread Eduardo Habkost
On Wed, Jun 05, 2013 at 03:18:40PM +0200, Igor Mammedov wrote:
> - since hyperv_* helper functions are used only in target-i386/kvm.c
>   move them there as static helpers
> 
> Signed-off-by: Igor Mammedov 
> Requestd-By: Eduardo Habkost 

Reviewed-by: Eduardo Habkost 


-- 
Eduardo



[Qemu-devel] [Bug 1101210] Re: qemu 1.4.2: usb keyboard not fully working

2013-06-25 Thread Sven
Any comment? The <>| key is still not working in qemu 1.4.2.

** Summary changed:

- qemu 1.3.0: usb keyboard not fully working
+ qemu 1.4.2: usb keyboard not fully working

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1101210

Title:
  qemu 1.4.2: usb keyboard not fully working

Status in QEMU:
  New

Bug description:
  When using the usb keyboard, I can't type the | character. I'm using
  german keyboard layout (de) on the host and inside the guest. As a
  guest OS, I use Linux (e.g. a recent KNOPPIX cd). To obtain the |
  character on a german keyboard, I need to press AltGr + the < or >
  key, i.e. the key right to the left shift.

  The qemu command line is something like this:
  ./qemu-system-i386 -device pci-ohci -device usb-kbd
  I also tried
  ./qemu-system-i386 -usb -usbdevice keyboard
  with the same effect.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1101210/+subscriptions



Re: [Qemu-devel] [PATCH v4 06/10] qemu-ga: Add Windows VSS provider to quiesce applications on fsfreeze

2013-06-25 Thread Tomoki Sekiyama
>Il 25/06/2013 20:23, Tomoki Sekiyama ha scritto:
>>> >(b) shouldn't "qga-obj-y" be independent of "qga-provider.dll"
>> The DLL is linked to qemu-ga.exe because qemu-ga.exe kicks the
>> registration of the DLL (COMRegister(), which depends on the DLL's
>> Internal variable) at the installation of the qemu-ga service.
> 
> Does the DLL also support DllRegisterServer?  I don't know Windows very
> much, but if so I wonder if LoadLibrary/GetProcAddress is preferrable to
> linking with the DLL.
> 
> Paolo

I think that is feasible.

That will cut off qemu-ga.exe from qga-provider.{dll,tlb}, so then,
we should build them separately, like
`make qemu-ga.exe qga/vss-win32-provider/qga-provider.{dll,tlb}`.

Thanks,
Tomoki Sekiyama



Re: [Qemu-devel] [PULL 0/8] arm-devs queue

2013-06-25 Thread Anthony Liguori
Peter Maydell  writes:

> arm-devs pullreq; nothing particularly earthshattering.
> As with the target-arm pullreq, I'm trying to create a signed
> pullreq here, so let me know if I've messed it up.
> The signed tag itself is 'pull-arm-devs-20130625'; arm-devs.for-upstream
> is the branch name, as usual. 
>
>
> The following changes since commit baf8673ca802cb3ea2cdbe94813441d23bde223b:
>
>   Merge remote-tracking branch 'stefanha/block' into staging (2013-06-24 
> 14:33:17 -0500)
>
> are available in the git repository at:
>
>
>   git://git.linaro.org/people/pmaydell/qemu-arm.git
>   arm-devs.for-upstream

No need to resubmit, but you'll want to use the tag name instead of the
branch name here in the future.

Regards,

Anthony Liguori

>
> for you to fetch changes up to 7426aa72c36c908a7d0eae3e38568bb0a70de479:
>
>   nand: Don't inherit from Sysbus (2013-06-25 19:15:46 +0100)
>
> 
> arm-devs queue
>
> 
> Jean-Christophe DUBOIS (2):
>   i.MX: Implement a more complete version of the GPT timer.
>   i.MX: Rework functions/types name and use new style initialization
>
> John Rigby (1):
>   ARM: Allow dumping of device tree
>
> Peter Crosthwaite (3):
>   block/nand: QOM casting sweep
>   block/nand: Convert Sysbus::init to Device::realize
>   nand: Don't inherit from Sysbus
>
> Peter Maydell (1):
>   arm/boot: Free dtb blob memory after use
>
> Stefan Weil (1):
>   i.MX31: Fix PRCS bit test
>
>  hw/arm/boot.c  |   21 +-
>  hw/block/nand.c|   48 +++--
>  hw/misc/imx_ccm.c  |2 +-
>  hw/timer/imx_gpt.c |  558 
> ++--
>  4 files changed, 369 insertions(+), 260 deletions(-)




Re: [Qemu-devel] [PATCH v4 06/10] qemu-ga: Add Windows VSS provider to quiesce applications on fsfreeze

2013-06-25 Thread Paolo Bonzini
Il 25/06/2013 20:23, Tomoki Sekiyama ha scritto:
>> >(b) shouldn't "qga-obj-y" be independent of "qga-provider.dll"
> The DLL is linked to qemu-ga.exe because qemu-ga.exe kicks the
> registration of the DLL (COMRegister(), which depends on the DLL's
> Internal variable) at the installation of the qemu-ga service.
> 

Does the DLL also support DllRegisterServer?  I don't know Windows very
much, but if so I wonder if LoadLibrary/GetProcAddress is preferrable to
linking with the DLL.

Paolo



Re: [Qemu-devel] [PATCH v11 14/15] rdma: introduce MIG_STATE_NONE and change MIG_STATE_SETUP state transition

2013-06-25 Thread Michael R. Hines

On 06/25/2013 10:17 AM, Juan Quintela wrote:

"Michael R. Hines"  wrote:

On 06/25/2013 06:13 AM, Paolo Bonzini wrote:

Il 25/06/2013 11:49, Juan Quintela ha scritto:

mrhi...@linux.vnet.ibm.com wrote:

From: "Michael R. Hines" 

As described in the previous patch, until now, the MIG_STATE_SETUP
state was not really a 'formal' state. It has been used as a 'zero' state
(what we're calling 'NONE' here) and QEMU has been unconditionally transitioning
into this state when the QMP migration command was called. Instead we want to
introduce MIG_STATE_NONE, which is our starting state in the state machine, and
then immediately transition into the MIG_STATE_SETUP state when the QMP migrate
command is issued.

In order to do this, we must delay the transition into MIG_STATE_ACTIVE until
later in the migration_thread(). This is done to be able to timestamp the 
amount of
time spent in the SETUP state for proper accounting to the user during
an RDMA migration.

Furthermore, the management software, until now, has never been aware of the
existence of the SETUP state whatsoever. This must change, because, timing of 
this
state implies that the state actually exists.

These two patches cannot be separated because the 'query_migrate' QMP
switch statement needs to know how to handle this new state transition.

Tested-by: Michael R. Hines 
Signed-off-by: Michael R. Hines 
@@ -316,6 +321,7 @@ static void migrate_fd_cancel(MigrationState *s)
   {
   DPRINTF("cancelling migration\n");
   +migrate_set_state(s, MIG_STATE_SETUP, MIG_STATE_CANCELLED);
   migrate_set_state(s, MIG_STATE_ACTIVE, MIG_STATE_CANCELLED);
   }

This chunk is wrong.

we can call qme_migrate_cancel() at any point,  and it is going to be
called normally from MIG_STATE_ACTIVE.

  migrate_set_satet(s, s->state,  MIG_STATE_CANCELLED)

should do the trick.  Or something like that,  what do you think?

I don't like the three-arguments migrate_set_state, but I don't have any
better idea.

With Juan's modification, it is fine (but not reviewed-by me :)).  While
you resend, the first 13 patches of v10 can be merged (pull request).
You can then rebase the last three on top.

So, I should send a v11 before Juan applies and generates the pull request?

Did I understand correctly?


Michael, did you look at the "debugging/getting the protocol ready" mode
where all pages are unpinned?

Paolo



We did not come up with a design on how such a mode would be
exposed to the user. Implementing this in migration-rdma.c is
just a few lines of code, but without a clear use case in this patch
series, I've not chosen expose it to the user yet without some kind
of agreement with you guys.

Recommendations?

Dunno.  I have tried to use the thing:

I am using -incoming x-rdma:deus:

(qemu) char device redirected to /dev/pts/2 (label serial0)
librdmacm: Warning: couldn't read ABI version.
librdmacm: Warning: assuming: 4
librdmacm: Fatal: unable to get RDMA device list
RDMA ERROR: could not create rdma event channel
-incoming x-rdma:deus:: RDMA ERROR: could not create rdma event channel


I also used "0" instead of deus (the machine name).

The only thing that I did appart from intsalling librdmacm-devel was:


# cat /etc/udev/rules.d/80-infinibad.rules
KERNEL="rdma_cm", NAME="infiniband/%k", MODE="0666"

As per README.  Something else I need to do to use the tcp
implementations?

Later,  Juan.




There's a simple bug found by Visilis affecting localhost migration 
which I believe also

affects using '0' as a hostname in the migration.

Both of these should be fixed in my next version. Will send out today.
(I never tried migrating with 0.0.0.0 until now).

- Michael










[Qemu-devel] [PULL 6/8] block/nand: QOM casting sweep

2013-06-25 Thread Peter Maydell
From: Peter Crosthwaite 

Define and use standard QOM cast macro. Remove usages of DO_UPCAST and
direct -> style casting.

Cc: afaer...@suse.de

Signed-off-by: Peter Crosthwaite 
Reviewed-by: Andreas Färber 
Signed-off-by: Peter Maydell 
---
 hw/block/nand.c |   25 -
 1 file changed, 16 insertions(+), 9 deletions(-)

diff --git a/hw/block/nand.c b/hw/block/nand.c
index 43401a0..861e893 100644
--- a/hw/block/nand.c
+++ b/hw/block/nand.c
@@ -82,6 +82,11 @@ struct NANDFlashState {
 uint32_t ioaddr_vmstate;
 };
 
+#define TYPE_NAND "nand"
+
+#define NAND(obj) \
+OBJECT_CHECK(NANDFlashState, (obj), TYPE_NAND)
+
 static void mem_and(uint8_t *dest, const uint8_t *src, size_t n)
 {
 /* Like memcpy() but we logical-AND the data into the destination */
@@ -224,7 +229,7 @@ static const struct {
 
 static void nand_reset(DeviceState *dev)
 {
-NANDFlashState *s = FROM_SYSBUS(NANDFlashState, SYS_BUS_DEVICE(dev));
+NANDFlashState *s = NAND(dev);
 s->cmd = NAND_CMD_READ0;
 s->addr = 0;
 s->addrlen = 0;
@@ -279,7 +284,7 @@ static void nand_command(NANDFlashState *s)
 break;
 
 case NAND_CMD_RESET:
-nand_reset(&s->busdev.qdev);
+nand_reset(DEVICE(s));
 break;
 
 case NAND_CMD_PAGEPROGRAM1:
@@ -319,14 +324,14 @@ static void nand_command(NANDFlashState *s)
 
 static void nand_pre_save(void *opaque)
 {
-NANDFlashState *s = opaque;
+NANDFlashState *s = NAND(opaque);
 
 s->ioaddr_vmstate = s->ioaddr - s->io;
 }
 
 static int nand_post_load(void *opaque, int version_id)
 {
-NANDFlashState *s = opaque;
+NANDFlashState *s = NAND(opaque);
 
 if (s->ioaddr_vmstate > sizeof(s->io)) {
 return -EINVAL;
@@ -365,7 +370,7 @@ static const VMStateDescription vmstate_nand = {
 static int nand_device_init(SysBusDevice *dev)
 {
 int pagesize;
-NANDFlashState *s = FROM_SYSBUS(NANDFlashState, dev);
+NANDFlashState *s = NAND(dev);
 
 s->buswidth = nand_flash_ids[s->chip_id].width >> 3;
 s->size = nand_flash_ids[s->chip_id].size << 20;
@@ -436,7 +441,7 @@ static void nand_class_init(ObjectClass *klass, void *data)
 }
 
 static const TypeInfo nand_info = {
-.name  = "nand",
+.name  = TYPE_NAND,
 .parent= TYPE_SYS_BUS_DEVICE,
 .instance_size = sizeof(NANDFlashState),
 .class_init= nand_class_init,
@@ -456,7 +461,8 @@ static void nand_register_types(void)
 void nand_setpins(DeviceState *dev, uint8_t cle, uint8_t ale,
   uint8_t ce, uint8_t wp, uint8_t gnd)
 {
-NANDFlashState *s = (NANDFlashState *) dev;
+NANDFlashState *s = NAND(dev);
+
 s->cle = cle;
 s->ale = ale;
 s->ce = ce;
@@ -477,7 +483,8 @@ void nand_getpins(DeviceState *dev, int *rb)
 void nand_setio(DeviceState *dev, uint32_t value)
 {
 int i;
-NANDFlashState *s = (NANDFlashState *) dev;
+NANDFlashState *s = NAND(dev);
+
 if (!s->ce && s->cle) {
 if (nand_flash_ids[s->chip_id].options & NAND_SAMSUNG_LP) {
 if (s->cmd == NAND_CMD_READ0 && value == NAND_CMD_LPREAD2)
@@ -581,7 +588,7 @@ uint32_t nand_getio(DeviceState *dev)
 {
 int offset;
 uint32_t x = 0;
-NANDFlashState *s = (NANDFlashState *) dev;
+NANDFlashState *s = NAND(dev);
 
 /* Allow sequential reading */
 if (!s->iolen && s->cmd == NAND_CMD_READ0) {
-- 
1.7.9.5




[Qemu-devel] [PULL 5/8] i.MX31: Fix PRCS bit test

2013-06-25 Thread Peter Maydell
From: Stefan Weil 

cppcheck detected a condition which was always false.

According to the MCIMX31 Reference Manual, the PRCS bits have to be 01
to select the Frequency Pre-Multiplier (FPM). PRCS uses bits 1 and 2,
so we have to test for 2.

Signed-off-by: Stefan Weil 
Signed-off-by: Peter Chubb 
Message-id: 1370810662-32320-1-git-send-email...@weilnetz.de
Signed-off-by: Peter Maydell 
---
 hw/misc/imx_ccm.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/misc/imx_ccm.c b/hw/misc/imx_ccm.c
index c153a24..427ce5c 100644
--- a/hw/misc/imx_ccm.c
+++ b/hw/misc/imx_ccm.c
@@ -153,7 +153,7 @@ static void update_clocks(IMXCCMState *s)
  * approach
  */
 
-if ((s->ccmr & CCMR_PRCS) == 1) {
+if ((s->ccmr & CCMR_PRCS) == 2) {
 s->pll_refclk_freq = CKIL_FREQ * 1024;
 } else {
 s->pll_refclk_freq = CKIH_FREQ;
-- 
1.7.9.5




[Qemu-devel] [PULL 3/8] i.MX: Rework functions/types name and use new style initialization

2013-06-25 Thread Peter Maydell
From: Jean-Christophe DUBOIS 

* use dynamic cast whenever possible
* Change function names to some more meaningful prefix
* Change type names to a more meaningful one
* use new style device initialization

Signed-off-by: Jean-Christophe DUBOIS 
Message-id: 1369898943-1993-3-git-send-email-...@tribudubois.net
Reviewed-by: Peter Chubb 
Signed-off-by: Peter Maydell 
---
 hw/timer/imx_gpt.c |  171 ++--
 1 file changed, 84 insertions(+), 87 deletions(-)

diff --git a/hw/timer/imx_gpt.c b/hw/timer/imx_gpt.c
index ecf6e53..de53b13 100644
--- a/hw/timer/imx_gpt.c
+++ b/hw/timer/imx_gpt.c
@@ -27,7 +27,7 @@
 #define DEBUG_TIMER 0
 #if DEBUG_TIMER
 
-static char const *imx_timerg_reg_name(uint32_t reg)
+static char const *imx_gpt_reg_name(uint32_t reg)
 {
 switch (reg) {
 case 0:
@@ -67,12 +67,14 @@ static char const *imx_timerg_reg_name(uint32_t reg)
  */
 #define DEBUG_IMPLEMENTATION 1
 #if DEBUG_IMPLEMENTATION
-#  define IPRINTF(fmt, args...) \
+#  define IPRINTF(fmt, args...) \
   do { fprintf(stderr, "%s: " fmt, __func__, ##args); } while (0)
 #else
 #  define IPRINTF(fmt, args...) do {} while (0)
 #endif
 
+#define IMX_GPT(obj) \
+OBJECT_CHECK(IMXGPTState, (obj), TYPE_IMX_GPT)
 /*
  * GPT : General purpose timer
  *
@@ -137,33 +139,33 @@ typedef struct {
 uint32_t freq;
 
 qemu_irq irq;
-} IMXTimerGState;
+} IMXGPTState;
 
-static const VMStateDescription vmstate_imx_timerg = {
+static const VMStateDescription vmstate_imx_timer_gpt = {
 .name = TYPE_IMX_GPT,
 .version_id = 3,
 .minimum_version_id = 3,
 .minimum_version_id_old = 3,
 .fields  = (VMStateField[]) {
-VMSTATE_UINT32(cr, IMXTimerGState),
-VMSTATE_UINT32(pr, IMXTimerGState),
-VMSTATE_UINT32(sr, IMXTimerGState),
-VMSTATE_UINT32(ir, IMXTimerGState),
-VMSTATE_UINT32(ocr1, IMXTimerGState),
-VMSTATE_UINT32(ocr2, IMXTimerGState),
-VMSTATE_UINT32(ocr3, IMXTimerGState),
-VMSTATE_UINT32(icr1, IMXTimerGState),
-VMSTATE_UINT32(icr2, IMXTimerGState),
-VMSTATE_UINT32(cnt, IMXTimerGState),
-VMSTATE_UINT32(next_timeout, IMXTimerGState),
-VMSTATE_UINT32(next_int, IMXTimerGState),
-VMSTATE_UINT32(freq, IMXTimerGState),
-VMSTATE_PTIMER(timer, IMXTimerGState),
+VMSTATE_UINT32(cr, IMXGPTState),
+VMSTATE_UINT32(pr, IMXGPTState),
+VMSTATE_UINT32(sr, IMXGPTState),
+VMSTATE_UINT32(ir, IMXGPTState),
+VMSTATE_UINT32(ocr1, IMXGPTState),
+VMSTATE_UINT32(ocr2, IMXGPTState),
+VMSTATE_UINT32(ocr3, IMXGPTState),
+VMSTATE_UINT32(icr1, IMXGPTState),
+VMSTATE_UINT32(icr2, IMXGPTState),
+VMSTATE_UINT32(cnt, IMXGPTState),
+VMSTATE_UINT32(next_timeout, IMXGPTState),
+VMSTATE_UINT32(next_int, IMXGPTState),
+VMSTATE_UINT32(freq, IMXGPTState),
+VMSTATE_PTIMER(timer, IMXGPTState),
 VMSTATE_END_OF_LIST()
 }
 };
 
-static const IMXClk imx_timerg_clocks[] = {
+static const IMXClk imx_gpt_clocks[] = {
 NOCLK,/* 000 No clock source */
 IPG,  /* 001 ipg_clk, 532MHz*/
 IPG,  /* 010 ipg_clk_highfreq */
@@ -174,10 +176,10 @@ static const IMXClk imx_timerg_clocks[] = {
 NOCLK,/* 111 not defined */
 };
 
-static void imx_timerg_set_freq(IMXTimerGState *s)
+static void imx_gpt_set_freq(IMXGPTState *s)
 {
 uint32_t clksrc = extract32(s->cr, GPT_CR_CLKSRC_SHIFT, 3);
-uint32_t freq = imx_clock_frequency(s->ccm, imx_timerg_clocks[clksrc])
+uint32_t freq = imx_clock_frequency(s->ccm, imx_gpt_clocks[clksrc])
 / (1 + s->pr);
 s->freq = freq;
 
@@ -188,7 +190,7 @@ static void imx_timerg_set_freq(IMXTimerGState *s)
 }
 }
 
-static void imx_timerg_update(IMXTimerGState *s)
+static void imx_gpt_update_int(IMXGPTState *s)
 {
 if ((s->sr & s->ir) && (s->cr & GPT_CR_EN)) {
 qemu_irq_raise(s->irq);
@@ -197,14 +199,14 @@ static void imx_timerg_update(IMXTimerGState *s)
 }
 }
 
-static uint32_t imx_timerg_update_counts(IMXTimerGState *s)
+static uint32_t imx_gpt_update_count(IMXGPTState *s)
 {
 s->cnt = s->next_timeout - (uint32_t)ptimer_get_count(s->timer);
 
 return s->cnt;
 }
 
-static inline uint32_t imx_timerg_find_limit(uint32_t count, uint32_t reg,
+static inline uint32_t imx_gpt_find_limit(uint32_t count, uint32_t reg,
  uint32_t timeout)
 {
 if ((count < reg) && (timeout > reg)) {
@@ -214,7 +216,7 @@ static inline uint32_t imx_timerg_find_limit(uint32_t 
count, uint32_t reg,
 return timeout;
 }
 
-static void imx_timerg_compute_next_timeout(IMXTimerGState *s, bool event)
+static void imx_gpt_compute_next_timeout(IMXGPTState *s, bool event)
 {
 uint32_t timeout = TIMER_MAX;
 uint32_t count = 0;
@@ -233,24 +235,24 @@ static void 
imx_timerg_compute_next_timeout

[Qemu-devel] [PULL 0/8] arm-devs queue

2013-06-25 Thread Peter Maydell
arm-devs pullreq; nothing particularly earthshattering.
As with the target-arm pullreq, I'm trying to create a signed
pullreq here, so let me know if I've messed it up.
The signed tag itself is 'pull-arm-devs-20130625'; arm-devs.for-upstream
is the branch name, as usual. 


The following changes since commit baf8673ca802cb3ea2cdbe94813441d23bde223b:

  Merge remote-tracking branch 'stefanha/block' into staging (2013-06-24 
14:33:17 -0500)

are available in the git repository at:


  git://git.linaro.org/people/pmaydell/qemu-arm.git arm-devs.for-upstream

for you to fetch changes up to 7426aa72c36c908a7d0eae3e38568bb0a70de479:

  nand: Don't inherit from Sysbus (2013-06-25 19:15:46 +0100)


arm-devs queue


Jean-Christophe DUBOIS (2):
  i.MX: Implement a more complete version of the GPT timer.
  i.MX: Rework functions/types name and use new style initialization

John Rigby (1):
  ARM: Allow dumping of device tree

Peter Crosthwaite (3):
  block/nand: QOM casting sweep
  block/nand: Convert Sysbus::init to Device::realize
  nand: Don't inherit from Sysbus

Peter Maydell (1):
  arm/boot: Free dtb blob memory after use

Stefan Weil (1):
  i.MX31: Fix PRCS bit test

 hw/arm/boot.c  |   21 +-
 hw/block/nand.c|   48 +++--
 hw/misc/imx_ccm.c  |2 +-
 hw/timer/imx_gpt.c |  558 ++--
 4 files changed, 369 insertions(+), 260 deletions(-)



Re: [Qemu-devel] [PATCH v11 11/15] rdma: core logic

2013-06-25 Thread Michael R. Hines

On 06/25/2013 12:31 PM, Vasilis Liaskovitis wrote:

Hi,

On Mon, Jun 24, 2013 at 09:58:01PM -0400, mrhi...@linux.vnet.ibm.com wrote:

From: "Michael R. Hines" 


[...]

+/*
+ * Put in the log file which RDMA device was opened and the details
+ * associated with that device.
+ */
+static void qemu_rdma_dump_id(const char *who, struct ibv_context *verbs)
+{
+printf("%s RDMA Device opened: kernel name %s "
+   "uverbs device name %s, "
+   "infiniband_verbs class device path %s,"
+   " infiniband class device path %s\n",
+who,
+verbs->device->name,
+verbs->device->dev_name,
+verbs->device->dev_path,
+verbs->device->ibdev_path);
+}

see below


+static int qemu_rdma_dest_init(RDMAContext *rdma, Error **errp)
+{
+int ret = -EINVAL, idx;
+struct sockaddr_in sin;
+struct rdma_cm_id *listen_id;
+char ip[40] = "unknown";
+
+for (idx = 0; idx < RDMA_CONTROL_MAX_WR; idx++) {
+rdma->wr_data[idx].control_len = 0;
+rdma->wr_data[idx].control_curr = NULL;
+}
+
+if (rdma->host == NULL) {
+ERROR(errp, "RDMA host is not set!\n");
+rdma->error_state = -EINVAL;
+return -1;
+}
+/* create CM channel */
+rdma->channel = rdma_create_event_channel();
+if (!rdma->channel) {
+ERROR(errp, "could not create rdma event channel\n");
+rdma->error_state = -EINVAL;
+return -1;
+}
+
+/* create CM id */
+ret = rdma_create_id(rdma->channel, &listen_id, NULL, RDMA_PS_TCP);
+if (ret) {
+ERROR(errp, "could not create cm_id!\n");
+goto err_dest_init_create_listen_id;
+}
+
+memset(&sin, 0, sizeof(sin));
+sin.sin_family = AF_INET;
+sin.sin_port = htons(rdma->port);
+
+if (rdma->host && strcmp("", rdma->host)) {
+struct hostent *dest_addr;
+dest_addr = gethostbyname(rdma->host);
+if (!dest_addr) {
+ERROR(errp, "migration could not gethostbyname!\n");
+ret = -EINVAL;
+goto err_dest_init_bind_addr;
+}
+memcpy(&sin.sin_addr.s_addr, dest_addr->h_addr,
+dest_addr->h_length);
+inet_ntop(AF_INET, dest_addr->h_addr, ip, sizeof ip);
+} else {
+sin.sin_addr.s_addr = INADDR_ANY;
+}
+
+DPRINTF("%s => %s\n", rdma->host, ip);
+
+ret = rdma_bind_addr(listen_id, (struct sockaddr *)&sin);
+if (ret) {
+ERROR(errp, "Error: could not rdma_bind_addr!\n");
+goto err_dest_init_bind_addr;
+}
+
+rdma->listen_id = listen_id;
+if (listen_id->verbs) {
+rdma->verbs = listen_id->verbs;
+}
+qemu_rdma_dump_id("dest_init", rdma->verbs);


I wonder if you have ever hit the case where rdma_bind_addr() does not set the
verbs structure in listen_id because we are binding to the loopback device (also
see linux kernel commit 8523c048).
I keep hitting this case on my destination VM ("incoming x-rdma:host:port)

Then I think qemu_rdma_dump_id can segfault trying to dereference a null verbs
structure. The dump_id function should check for non-NULL verbs argument,
or the dump should be made only in the (verbs != NULL) if clause.

Disabling the dump_id above, I have rdma_resolve_addr() problems on the source
VM side (getting RDMA_CM_EVENT_ADDR_ERROR instead of
RDMA_CM_EVENT_ADDR_RESOLVED).

I assume that is because of the null verbs structure destination problem above.
qemu_rdma_dest_prepare() will always fail with a NULL verbs argument:


Good catch, thank you. I'll fix this immediately in the next version.

I never tried binding to the localhost before..


+
+static int qemu_rdma_dest_prepare(RDMAContext *rdma, Error **errp)
+{
+int ret;
+int idx;
+
+if (!rdma->verbs) {
+ERROR(errp, "no verbs context!\n");
+return 0;
+}

It is first called from rdma_start_incoming_migration() and will fail with the
loopback binding case (rdma->verbs == NULL).

however later qemu_rdma_accept() will check against the incoming cm_event
verbs structure and set the RDMAContext's verb struct, calling
qemu_rdma_dest_prepare with that struct:

[...]

+static int qemu_rdma_accept(RDMAContext *rdma)

[...]

+if (!rdma->verbs) {
+rdma->verbs = verbs;
+/*
+ * Cannot propagate errp, as there is no error pointer
+ * to be propagated.
+ */
+ret = qemu_rdma_dest_prepare(rdma, NULL);
+if (ret) {
+fprintf(stderr, "rdma migration: error preparing dest!\n");
+goto err_rdma_dest_wait;
+}

Are these two cases intentionally different?


Another good catch. We should definitely allow loopback RDMA without any 
issues.


I will fix this immediately as well in the next version.

- Michael




Re: [Qemu-devel] [PULL 2/8] i.MX: Implement a more complete version of the GPT timer.

2013-06-25 Thread Paolo Bonzini
Il 25/06/2013 20:21, Peter Maydell ha scritto:
> @@ -416,7 +513,7 @@ static int imx_timerg_init(SysBusDevice *dev)
>  
>  sysbus_init_irq(dev, &s->irq);
>  memory_region_init_io(&s->iomem, &imx_timerg_ops,
> -  s, "imxg-timer",
> +  s, TYPE_IMX_GPT,
>0x1000);
>  sysbus_init_mmio(dev, &s->iomem);
>  

There was some agreement that this is not a good change.

Paolo



[Qemu-devel] [PULL 8/8] nand: Don't inherit from Sysbus

2013-06-25 Thread Peter Maydell
From: Peter Crosthwaite 

Nand chips are not sysbus devices - they do not have any sense of MMIO,
nor interrupts. Re-parent to TYPE_DEVICE accordingly.

Cc: afaer...@suse.de

Signed-off-by: Peter Crosthwaite 
Reviewed-by: Andreas Färber 
Signed-off-by: Peter Maydell 
---
 hw/block/nand.c |7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/hw/block/nand.c b/hw/block/nand.c
index a0232d1..a871ce0 100644
--- a/hw/block/nand.c
+++ b/hw/block/nand.c
@@ -21,7 +21,7 @@
 # include "hw/hw.h"
 # include "hw/block/flash.h"
 # include "sysemu/blockdev.h"
-# include "hw/sysbus.h"
+#include "hw/qdev.h"
 #include "qemu/error-report.h"
 
 # define NAND_CMD_READ00x00
@@ -54,7 +54,8 @@
 
 typedef struct NANDFlashState NANDFlashState;
 struct NANDFlashState {
-SysBusDevice busdev;
+DeviceState parent_obj;
+
 uint8_t manf_id, chip_id;
 uint8_t buswidth; /* in BYTES */
 int size, pages;
@@ -440,7 +441,7 @@ static void nand_class_init(ObjectClass *klass, void *data)
 
 static const TypeInfo nand_info = {
 .name  = TYPE_NAND,
-.parent= TYPE_SYS_BUS_DEVICE,
+.parent= TYPE_DEVICE,
 .instance_size = sizeof(NANDFlashState),
 .class_init= nand_class_init,
 };
-- 
1.7.9.5




Re: [Qemu-devel] [PATCH v4 06/10] qemu-ga: Add Windows VSS provider to quiesce applications on fsfreeze

2013-06-25 Thread Tomoki Sekiyama
Thanks for your review.

On 6/25/13 12:03 , "Laszlo Ersek"  wrote:
>I'm trying to wrap my head around the build changes first.

>> diff --git a/configure b/configure
>> index cafe830..8e45fad 100755
>> --- a/configure
>> +++ b/configure
>> @@ -3490,9 +3490,12 @@ if test "$softmmu" = yes ; then
>>virtfs=no
>>  fi
>>fi
>> -  if [ "$linux" = "yes" -o "$bsd" = "yes" -o "$solaris" = "yes" ] ;
>>then
>> +  if [ "$linux" = "yes" -o "$bsd" = "yes" -o "$solaris" = "yes" -o
>>"$mingw32" = "yes" ] ; then
>>  if [ "$guest_agent" = "yes" ]; then
>>tools="qemu-ga\$(EXESUF) $tools"
>> +  if [ "$mingw32" = "yes" ]; then
>> +tools="qga/vss-win32-provider/qga-provider.dll
>>qga/vss-win32-provider/qga-provider.tlb $tools"
>> +  fi
>>  fi
>>fi
>>  fi
>
>This adds three targets to "tools" on mingw32 -- "qemu-ga" hasn't been
>there until now. Is this actually a small, unrelated bugfix?

That is true, it is not necessary to build qemu-ga.exe.
The intention was to do everything by just `make`.

>I'm asking because I build upstream qemu-ga.exe on my RHEL-6 laptop as
>follows -- I don't have pixman installed, and configure forces me to
>specify --disable-tools as well:
> 
>$ ./configure --without-pixman --disable-system --disable-tools \
>  --cross-prefix=i686-pc-mingw32-
>$ make qemu-ga.exe
>
>Plain "make" after the above ./configure doesn't produce anything
>useful. Will I have to specify the DLL and TLB targets too on the
>command line? (Apologies, I can't try it out now, msiextract doesn't
>work for me.)

You don't have to specify DLL and TLB targets because qemu-ga.exe
has dependencies to them.

>> diff --git a/Makefile.objs b/Makefile.objs
>> index 286ce06..b6c1505 100644
>> --- a/Makefile.objs
>> +++ b/Makefile.objs
>> @@ -102,6 +102,7 @@ common-obj-y += disas/
>>  # FIXME: a few definitions from qapi-types.o/qapi-visit.o are needed
>>  # by libqemuutil.a.  These should be moved to a separate .json schema.
>>  qga-obj-y = qga/ qapi-types.o qapi-visit.o
>> +qga-prv-obj-y = qga/
>>
>>  vl.o: QEMU_CFLAGS+=$(GPROF_CFLAGS)
>>
>> @@ -113,6 +114,7 @@ nested-vars += \
>>  stub-obj-y \
>>  util-obj-y \
>>  qga-obj-y \
>> +qga-prv-obj-y \
>>  block-obj-y \
>>  common-obj-y
>>  dummy := $(call unnest-vars)
>
>What does this do? Does "qga-prv-obj-y" stand for "all objects under
>'qga'"?

No, these are objects for qga/vss-win32-provider/, used to build
qga-provider.dll. It looks better to be fully spelled as Paolo
said in another mail.


>Or does it mean "look into qga/Makefile.objs for further objects"?

Yes...

>>diff --git a/qga/Makefile.objs b/qga/Makefile.objs
>> index b8d7cd0..8d93866 100644
>> --- a/qga/Makefile.objs
>> +++ b/qga/Makefile.objs
>> @@ -3,3 +3,9 @@ qga-obj-$(CONFIG_POSIX) += commands-posix.o
>>channel-posix.o
>>  qga-obj-$(CONFIG_WIN32) += commands-win32.o channel-win32.o
>>service-win32.o
>>  qga-obj-y += qapi-generated/qga-qapi-types.o
>>qapi-generated/qga-qapi-visit.o
>>  qga-obj-y += qapi-generated/qga-qmp-marshal.o
>> +
>> +ifeq ($(CONFIG_QGA_VSS),y)
>> +QEMU_CFLAGS += -DHAS_VSS_SDK
>
>Isn't this superfluous? First, I can find no other reference to
>HAS_VSS_SDK in the series, second, CONFIG_QGA_VSS would be available
>directly as a macro to source files.

It is referenced in e.g. qga/commands-win32.c in patch 8/10.
But as you say, I will omit this, by the second reason.

>> +qga-obj-y += vss-win32-provider/
>> +qga-prv-obj-y += vss-win32-provider/
>> +endif
>
>So we're probably just saying "look into
>vss-win32-provider/Makefile.objs for further object files", for both
>variables.

Yes...

>> diff --git a/qga/vss-win32-provider/Makefile.objs
>>b/qga/vss-win32-provider/Makefile.objs
>> new file mode 100644
>> index 000..1fe1f8f
>> --- /dev/null
>> +++ b/qga/vss-win32-provider/Makefile.objs
>> @@ -0,0 +1,24 @@
>> +# rules to build qga-provider.dll
>> +
>> +qga-obj-y += qga-provider.dll
>> +qga-prv-obj-y += provider.o install.o
>> +
>> +obj-qga-prv-obj-y = $(addprefix $(obj)/, $(qga-prv-obj-y))
>> +$(obj-qga-prv-obj-y): QEMU_CXXFLAGS = $(filter-out -Wstrict-prototypes
>>-Wmissing-prototypes -Wnested-externs -Wold-style-declaration
>>-Wold-style-definition -Wredundant-decls -fstack-protector-all,
>>$(QEMU_CFLAGS)) -Wno-unknown-pragmas -Wno-delete-non-virtual-dtor
>> +
>> +$(obj)/qga-provider.dll: LDFLAGS = -shared
>>-Wl,--add-stdcall-alias,--enable-stdcall-fixup -lole32 -loleaut32
>>-lshlwapi -luuid -static
>> +$(obj)/qga-provider.dll: $(obj-qga-prv-obj-y)
>>$(SRC_PATH)/$(obj)/qga-provider.def $(obj)/qga-provider.tlb
>> +$(call quiet-command,$(CXX) -o $@ $(qga-prv-obj-y)
>>$(SRC_PATH)/qga/vss-win32-provider/qga-provider.def $(CXXFLAGS)
>>$(LDFLAGS),"  LINK  $(TARGET_DIR)$@")
>
>I'm having a hard time following this.
>
>Looks like "qga-prv-obj-y" consists of nothing more than "provider.o"
>and "install.o", and that "qga-provider.dll" depends on them. In theory,
>would it be possible *not* to introduce "qga-prv-obj-y" in the
>highe

[Qemu-devel] [PULL 2/8] i.MX: Implement a more complete version of the GPT timer.

2013-06-25 Thread Peter Maydell
From: Jean-Christophe DUBOIS 

* implement compare 1 2 and 3 registers
* simplify Debug printf

Signed-off-by: Jean-Christophe DUBOIS 
Message-id: 1369898943-1993-2-git-send-email-...@tribudubois.net
Reviewed-by: Peter Chubb 
Signed-off-by: Peter Maydell 
---
 hw/timer/imx_gpt.c |  443 +++-
 1 file changed, 269 insertions(+), 174 deletions(-)

diff --git a/hw/timer/imx_gpt.c b/hw/timer/imx_gpt.c
index d8c4f0b..ecf6e53 100644
--- a/hw/timer/imx_gpt.c
+++ b/hw/timer/imx_gpt.c
@@ -5,6 +5,7 @@
  * Copyright (c) 2011 NICTA Pty Ltd
  * Originally written by Hans Jiang
  * Updated by Peter Chubb
+ * Updated by Jean-Christophe Dubois
  *
  * This code is licensed under GPL version 2 or later.  See
  * the COPYING file in the top-level directory.
@@ -18,10 +19,44 @@
 #include "hw/sysbus.h"
 #include "hw/arm/imx.h"
 
-//#define DEBUG_TIMER 1
-#ifdef DEBUG_TIMER
+#define TYPE_IMX_GPT "imx.gpt"
+
+/*
+ * Define to 1 for debug messages
+ */
+#define DEBUG_TIMER 0
+#if DEBUG_TIMER
+
+static char const *imx_timerg_reg_name(uint32_t reg)
+{
+switch (reg) {
+case 0:
+return "CR";
+case 1:
+return "PR";
+case 2:
+return "SR";
+case 3:
+return "IR";
+case 4:
+return "OCR1";
+case 5:
+return "OCR2";
+case 6:
+return "OCR3";
+case 7:
+return "ICR1";
+case 8:
+return "ICR2";
+case 9:
+return "CNT";
+default:
+return "[?]";
+}
+}
+
 #  define DPRINTF(fmt, args...) \
-  do { printf("imx_timer: " fmt , ##args); } while (0)
+  do { printf("%s: " fmt , __func__, ##args); } while (0)
 #else
 #  define DPRINTF(fmt, args...) do {} while (0)
 #endif
@@ -33,7 +68,7 @@
 #define DEBUG_IMPLEMENTATION 1
 #if DEBUG_IMPLEMENTATION
 #  define IPRINTF(fmt, args...) \
-do  { fprintf(stderr, "imx_timer: " fmt, ##args); } while (0)
+  do { fprintf(stderr, "%s: " fmt, __func__, ##args); } while (0)
 #else
 #  define IPRINTF(fmt, args...) do {} while (0)
 #endif
@@ -43,16 +78,7 @@
  *
  * This timer counts up continuously while it is enabled, resetting itself
  * to 0 when it reaches TIMER_MAX (in freerun mode) or when it
- * reaches the value of ocr1 (in periodic mode).  WE simulate this using a
- * QEMU ptimer counting down from ocr1 and reloading from ocr1 in
- * periodic mode, or counting from ocr1 to zero, then TIMER_MAX - ocr1.
- * waiting_rov is set when counting from TIMER_MAX.
- *
- * In the real hardware, there are three comparison registers that can
- * trigger interrupts, and compare channel 1 can be used to
- * force-reset the timer. However, this is a `bare-bones'
- * implementation: only what Linux 3.x uses has been implemented
- * (free-running timer from 0 to OCR1 or TIMER_MAX) .
+ * reaches the value of one of the ocrX (in periodic mode).
  */
 
 #define TIMER_MAX  0XUL
@@ -79,9 +105,13 @@
 #define GPT_CR_FO3(1 << 31) /* Force Output Compare Channel 3 */
 
 #define GPT_SR_OF1  (1 << 0)
+#define GPT_SR_OF2  (1 << 1)
+#define GPT_SR_OF3  (1 << 2)
 #define GPT_SR_ROV  (1 << 5)
 
 #define GPT_IR_OF1IE  (1 << 0)
+#define GPT_IR_OF2IE  (1 << 1)
+#define GPT_IR_OF3IE  (1 << 2)
 #define GPT_IR_ROVIE  (1 << 5)
 
 typedef struct {
@@ -101,15 +131,19 @@ typedef struct {
 uint32_t icr2;
 uint32_t cnt;
 
-uint32_t waiting_rov;
+uint32_t next_timeout;
+uint32_t next_int;
+
+uint32_t freq;
+
 qemu_irq irq;
 } IMXTimerGState;
 
 static const VMStateDescription vmstate_imx_timerg = {
-.name = "imx-timerg",
-.version_id = 2,
-.minimum_version_id = 2,
-.minimum_version_id_old = 2,
+.name = TYPE_IMX_GPT,
+.version_id = 3,
+.minimum_version_id = 3,
+.minimum_version_id_old = 3,
 .fields  = (VMStateField[]) {
 VMSTATE_UINT32(cr, IMXTimerGState),
 VMSTATE_UINT32(pr, IMXTimerGState),
@@ -121,7 +155,9 @@ static const VMStateDescription vmstate_imx_timerg = {
 VMSTATE_UINT32(icr1, IMXTimerGState),
 VMSTATE_UINT32(icr2, IMXTimerGState),
 VMSTATE_UINT32(cnt, IMXTimerGState),
-VMSTATE_UINT32(waiting_rov, IMXTimerGState),
+VMSTATE_UINT32(next_timeout, IMXTimerGState),
+VMSTATE_UINT32(next_int, IMXTimerGState),
+VMSTATE_UINT32(freq, IMXTimerGState),
 VMSTATE_PTIMER(timer, IMXTimerGState),
 VMSTATE_END_OF_LIST()
 }
@@ -138,16 +174,14 @@ static const IMXClk imx_timerg_clocks[] = {
 NOCLK,/* 111 not defined */
 };
 
-
 static void imx_timerg_set_freq(IMXTimerGState *s)
 {
-int clksrc;
-uint32_t freq;
-
-clksrc = (s->cr >> GPT_CR_CLKSRC_SHIFT) & GPT_CR_CLKSRC_MASK;
-freq = imx_clock_frequency(s->ccm, imx_timerg_clocks[clksrc]) / (1 + 
s->pr);
+uint32_t clksrc = extract32(s->cr, GPT_CR_CLKSRC_SHIFT, 3);
+uint32_t freq = imx_clock_frequency(s->ccm, imx_timerg_clocks[clksrc])
+  

[Qemu-devel] [PULL 4/8] arm/boot: Free dtb blob memory after use

2013-06-25 Thread Peter Maydell
The dtb blob returned by load_device_tree() is in memory allocated
with g_malloc(). Free it accordingly once we have copied its
contents into the guest memory. To make this easy, we need also to
clean up the error handling in load_dtb() so that we consistently
handle errors in the same way (by printing a message and then
returning -1, rather than either plowing on or exiting immediately).

Signed-off-by: Peter Maydell 
Reviewed-by: Andreas Färber 
Message-id: 1371209256-11408-1-git-send-email-peter.mayd...@linaro.org
---
 hw/arm/boot.c |   20 +++-
 1 file changed, 15 insertions(+), 5 deletions(-)

diff --git a/hw/arm/boot.c b/hw/arm/boot.c
index 797c691..7c0090f 100644
--- a/hw/arm/boot.c
+++ b/hw/arm/boot.c
@@ -237,14 +237,14 @@ static int load_dtb(hwaddr addr, const struct 
arm_boot_info *binfo)
 filename = qemu_find_file(QEMU_FILE_TYPE_BIOS, binfo->dtb_filename);
 if (!filename) {
 fprintf(stderr, "Couldn't open dtb file %s\n", binfo->dtb_filename);
-return -1;
+goto fail;
 }
 
 fdt = load_device_tree(filename, &size);
 if (!fdt) {
 fprintf(stderr, "Couldn't open dtb file %s\n", filename);
 g_free(filename);
-return -1;
+goto fail;
 }
 g_free(filename);
 
@@ -252,7 +252,7 @@ static int load_dtb(hwaddr addr, const struct arm_boot_info 
*binfo)
 scells = qemu_devtree_getprop_cell(fdt, "/", "#size-cells");
 if (acells == 0 || scells == 0) {
 fprintf(stderr, "dtb file invalid (#address-cells or #size-cells 
0)\n");
-return -1;
+goto fail;
 }
 
 mem_reg_propsize = acells + scells;
@@ -264,7 +264,7 @@ static int load_dtb(hwaddr addr, const struct arm_boot_info 
*binfo)
 } else if (hival != 0) {
 fprintf(stderr, "qemu: dtb file not compatible with "
 "RAM start address > 4GB\n");
-exit(1);
+goto fail;
 }
 mem_reg_property[acells + scells - 1] = cpu_to_be32(binfo->ram_size);
 hival = cpu_to_be32(binfo->ram_size >> 32);
@@ -273,13 +273,14 @@ static int load_dtb(hwaddr addr, const struct 
arm_boot_info *binfo)
 } else if (hival != 0) {
 fprintf(stderr, "qemu: dtb file not compatible with "
 "RAM size > 4GB\n");
-exit(1);
+goto fail;
 }
 
 rc = qemu_devtree_setprop(fdt, "/memory", "reg", mem_reg_property,
   mem_reg_propsize * sizeof(uint32_t));
 if (rc < 0) {
 fprintf(stderr, "couldn't set /memory/reg\n");
+goto fail;
 }
 
 if (binfo->kernel_cmdline && *binfo->kernel_cmdline) {
@@ -287,6 +288,7 @@ static int load_dtb(hwaddr addr, const struct arm_boot_info 
*binfo)
   binfo->kernel_cmdline);
 if (rc < 0) {
 fprintf(stderr, "couldn't set /chosen/bootargs\n");
+goto fail;
 }
 }
 
@@ -295,19 +297,27 @@ static int load_dtb(hwaddr addr, const struct 
arm_boot_info *binfo)
 binfo->initrd_start);
 if (rc < 0) {
 fprintf(stderr, "couldn't set /chosen/linux,initrd-start\n");
+goto fail;
 }
 
 rc = qemu_devtree_setprop_cell(fdt, "/chosen", "linux,initrd-end",
 binfo->initrd_start + binfo->initrd_size);
 if (rc < 0) {
 fprintf(stderr, "couldn't set /chosen/linux,initrd-end\n");
+goto fail;
 }
 }
 qemu_devtree_dumpdtb(fdt, size);
 
 cpu_physical_memory_write(addr, fdt, size);
 
+g_free(fdt);
+
 return 0;
+
+fail:
+g_free(fdt);
+return -1;
 }
 
 static void do_cpu_reset(void *opaque)
-- 
1.7.9.5




[Qemu-devel] [PULL 7/8] block/nand: Convert Sysbus::init to Device::realize

2013-06-25 Thread Peter Maydell
From: Peter Crosthwaite 

The prescribed transition from Sysbus::init function to a
Device::realize.

Cc: afaer...@suse.de

Signed-off-by: Peter Crosthwaite 
Reviewed-by: Andreas Färber 
Signed-off-by: Peter Maydell 
---
 hw/block/nand.c |   16 +++-
 1 file changed, 7 insertions(+), 9 deletions(-)

diff --git a/hw/block/nand.c b/hw/block/nand.c
index 861e893..a0232d1 100644
--- a/hw/block/nand.c
+++ b/hw/block/nand.c
@@ -367,7 +367,7 @@ static const VMStateDescription vmstate_nand = {
 }
 };
 
-static int nand_device_init(SysBusDevice *dev)
+static void nand_realize(DeviceState *dev, Error **errp)
 {
 int pagesize;
 NANDFlashState *s = NAND(dev);
@@ -393,16 +393,17 @@ static int nand_device_init(SysBusDevice *dev)
 nand_init_2048(s);
 break;
 default:
-error_report("Unsupported NAND block size");
-return -1;
+error_setg(errp, "Unsupported NAND block size %#x\n",
+   1 << s->page_shift);
+return;
 }
 
 pagesize = 1 << s->oob_shift;
 s->mem_oob = 1;
 if (s->bdrv) {
 if (bdrv_is_read_only(s->bdrv)) {
-error_report("Can't use a read-only drive");
-return -1;
+error_setg(errp, "Can't use a read-only drive");
+return;
 }
 if (bdrv_getlength(s->bdrv) >=
 (s->pages << s->page_shift) + (s->pages << s->oob_shift)) {
@@ -418,8 +419,6 @@ static int nand_device_init(SysBusDevice *dev)
 }
 /* Give s->ioaddr a sane value in case we save state before it is used. */
 s->ioaddr = s->io;
-
-return 0;
 }
 
 static Property nand_properties[] = {
@@ -432,9 +431,8 @@ static Property nand_properties[] = {
 static void nand_class_init(ObjectClass *klass, void *data)
 {
 DeviceClass *dc = DEVICE_CLASS(klass);
-SysBusDeviceClass *k = SYS_BUS_DEVICE_CLASS(klass);
 
-k->init = nand_device_init;
+dc->realize = nand_realize;
 dc->reset = nand_reset;
 dc->vmsd = &vmstate_nand;
 dc->props = nand_properties;
-- 
1.7.9.5




[Qemu-devel] [PULL 1/8] ARM: Allow dumping of device tree

2013-06-25 Thread Peter Maydell
From: John Rigby 

By calling qemu_devtree_dumpdtb near the end of load_dtb.

Signed-off-by: John Rigby 
Signed-off-by: Peter Maydell 
---
 hw/arm/boot.c |1 +
 1 file changed, 1 insertion(+)

diff --git a/hw/arm/boot.c b/hw/arm/boot.c
index defcf15..797c691 100644
--- a/hw/arm/boot.c
+++ b/hw/arm/boot.c
@@ -303,6 +303,7 @@ static int load_dtb(hwaddr addr, const struct arm_boot_info 
*binfo)
 fprintf(stderr, "couldn't set /chosen/linux,initrd-end\n");
 }
 }
+qemu_devtree_dumpdtb(fdt, size);
 
 cpu_physical_memory_write(addr, fdt, size);
 
-- 
1.7.9.5




Re: [Qemu-devel] [PATCH v1 0/5] Nand Cleanup

2013-06-25 Thread Peter Maydell
On 18 June 2013 12:07,   wrote:
> From: Peter Crosthwaite 
>
>
> Misc refactorings and cleanup of NAND.
>
>
> Peter Crosthwaite (5):
>   block/nand: Factor out common code
>   block/nand: Formatting sweep
>   block/nand: QOM casting sweep
>   block/nand: Convert Sysbus::init to Device::realize
>   nand: Don't inherit from Sysbus

I've applied patches 3..5 to arm-devs.next.

Thanks
-- PMM



Re: [Qemu-devel] [PATCH] block: add drive_backup HMP command

2013-06-25 Thread Ian Main
On Tue, Jun 25, 2013 at 04:36:53PM +0200, Paolo Bonzini wrote:
> Il 25/06/2013 16:06, Kevin Wolf ha scritto:
> > Am 25.06.2013 um 15:49 hat Paolo Bonzini geschrieben:
> >> Il 25/06/2013 15:26, Kevin Wolf ha scritto:
> > +if (!full) {
> > +error_setg(&errp, "-f is not yet implemented");
> > +hmp_handle_error(mon, &errp);
> > +return;
> > +}
> >>> Then why make it a valid option and confuse users in the help text by
> >>> describing options that don't really exist?
> >>
> >> Because otherwise we're stuck with a meaning of the flag that is
> >> different between drive-mirror and block-backup.
> > 
> > Do you mean when "otherwise" isn't only "we don't add -f now", but also
> > "we accidentally add a -f with different meaning later"? Not sure if
> > there's a real danger of that when we're aware that we want -f with the
> > same meaning as for mirroring.
> 
> We have drive-mirror with:
> * the default is 'top'
> * -f gives 'full'
> 
> block-backup for now only implements 'full'.  If we do not force the
> user to add -f, the default is 'full' and we should not change it later.
> 
> However, I would move the "not yet implemented" error from HMP to QMP.
> This way, both drive-mirror and block-backup will have a mandatory
> 'sync' argument.  We plan to implement it anyway, and it makes sense imo
> to avoid gratuitous differences in the APIs.

I'm working on a patch to implement sync modes.  I should be posting it
soon.

Ian



[Qemu-devel] [PULL 5/8] target-arm: Initialize cpreg list from KVM when using KVM

2013-06-25 Thread Peter Maydell
When using KVM, use the kernel's initial state to set up the
cpreg list, and sync to and from the kernel when doing
migration.

Signed-off-by: Peter Maydell 
---
 target-arm/Makefile.objs |1 +
 target-arm/kvm-stub.c|   23 +++
 target-arm/kvm.c |  164 +-
 target-arm/kvm_arm.h |   33 ++
 target-arm/machine.c |   30 +++--
 5 files changed, 245 insertions(+), 6 deletions(-)
 create mode 100644 target-arm/kvm-stub.c

diff --git a/target-arm/Makefile.objs b/target-arm/Makefile.objs
index d89b57c..4a6e52e 100644
--- a/target-arm/Makefile.objs
+++ b/target-arm/Makefile.objs
@@ -1,5 +1,6 @@
 obj-y += arm-semi.o
 obj-$(CONFIG_SOFTMMU) += machine.o
 obj-$(CONFIG_KVM) += kvm.o
+obj-$(CONFIG_NO_KVM) += kvm-stub.o
 obj-y += translate.o op_helper.o helper.o cpu.o
 obj-y += neon_helper.o iwmmxt_helper.o
diff --git a/target-arm/kvm-stub.c b/target-arm/kvm-stub.c
new file mode 100644
index 000..cd1849f
--- /dev/null
+++ b/target-arm/kvm-stub.c
@@ -0,0 +1,23 @@
+/*
+ * QEMU KVM ARM specific function stubs
+ *
+ * Copyright Linaro Limited 2013
+ *
+ * Author: Peter Maydell 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+#include "qemu-common.h"
+#include "kvm_arm.h"
+
+bool write_kvmstate_to_list(ARMCPU *cpu)
+{
+abort();
+}
+
+bool write_list_to_kvmstate(ARMCPU *cpu)
+{
+abort();
+}
diff --git a/target-arm/kvm.c b/target-arm/kvm.c
index f427537..66ce67a 100644
--- a/target-arm/kvm.c
+++ b/target-arm/kvm.c
@@ -50,12 +50,35 @@ unsigned long kvm_arch_vcpu_id(CPUState *cpu)
 return cpu->cpu_index;
 }
 
+static bool reg_syncs_via_tuple_list(uint64_t regidx)
+{
+/* Return true if the regidx is a register we should synchronize
+ * via the cpreg_tuples array (ie is not a core reg we sync by
+ * hand in kvm_arch_get/put_registers())
+ */
+switch (regidx & KVM_REG_ARM_COPROC_MASK) {
+case KVM_REG_ARM_CORE:
+case KVM_REG_ARM_VFP:
+return false;
+default:
+return true;
+}
+}
+
+static int compare_u64(const void *a, const void *b)
+{
+return *(uint64_t *)a - *(uint64_t *)b;
+}
+
 int kvm_arch_init_vcpu(CPUState *cs)
 {
 struct kvm_vcpu_init init;
-int ret;
+int i, ret, arraylen;
 uint64_t v;
 struct kvm_one_reg r;
+struct kvm_reg_list rl;
+struct kvm_reg_list *rlp;
+ARMCPU *cpu = ARM_CPU(cs);
 
 init.target = KVM_ARM_TARGET_CORTEX_A15;
 memset(init.features, 0, sizeof(init.features));
@@ -74,6 +97,73 @@ int kvm_arch_init_vcpu(CPUState *cs)
 if (ret == -ENOENT) {
 return -EINVAL;
 }
+
+/* Populate the cpreg list based on the kernel's idea
+ * of what registers exist (and throw away the TCG-created list).
+ */
+rl.n = 0;
+ret = kvm_vcpu_ioctl(cs, KVM_GET_REG_LIST, &rl);
+if (ret != -E2BIG) {
+return ret;
+}
+rlp = g_malloc(sizeof(struct kvm_reg_list) + rl.n * sizeof(uint64_t));
+rlp->n = rl.n;
+ret = kvm_vcpu_ioctl(cs, KVM_GET_REG_LIST, rlp);
+if (ret) {
+goto out;
+}
+/* Sort the list we get back from the kernel, since cpreg_tuples
+ * must be in strictly ascending order.
+ */
+qsort(&rlp->reg, rlp->n, sizeof(rlp->reg[0]), compare_u64);
+
+for (i = 0, arraylen = 0; i < rlp->n; i++) {
+if (!reg_syncs_via_tuple_list(rlp->reg[i])) {
+continue;
+}
+switch (rlp->reg[i] & KVM_REG_SIZE_MASK) {
+case KVM_REG_SIZE_U32:
+case KVM_REG_SIZE_U64:
+break;
+default:
+fprintf(stderr, "Can't handle size of register in kernel list\n");
+ret = -EINVAL;
+goto out;
+}
+
+arraylen++;
+}
+
+cpu->cpreg_indexes = g_renew(uint64_t, cpu->cpreg_indexes, arraylen);
+cpu->cpreg_values = g_renew(uint64_t, cpu->cpreg_values, arraylen);
+cpu->cpreg_vmstate_indexes = g_renew(uint64_t, cpu->cpreg_vmstate_indexes,
+ arraylen);
+cpu->cpreg_vmstate_values = g_renew(uint64_t, cpu->cpreg_vmstate_values,
+arraylen);
+cpu->cpreg_array_len = arraylen;
+cpu->cpreg_vmstate_array_len = arraylen;
+
+for (i = 0, arraylen = 0; i < rlp->n; i++) {
+uint64_t regidx = rlp->reg[i];
+if (!reg_syncs_via_tuple_list(regidx)) {
+continue;
+}
+cpu->cpreg_indexes[arraylen] = regidx;
+arraylen++;
+}
+assert(cpu->cpreg_array_len == arraylen);
+
+if (!write_kvmstate_to_list(cpu)) {
+/* Shouldn't happen unless kernel is inconsistent about
+ * what registers exist.
+ */
+fprintf(stderr, "Initial read of kernel register state failed\n");
+ret = -EINVAL;
+goto out;
+}
+
+out:
+g_free(rlp);
 return ret;
 }
 
@@ -163,6 +253,78 @@ void kvm_arm_register_device(MemoryReg

[Qemu-devel] [PULL 1/8] target-arm: Allow special cpregs to have flags set

2013-06-25 Thread Peter Maydell
Relax the "is this a valid ARMCPRegInfo type value?" check to permit
"special" cpregs to have flags other than ARM_CP_SPECIAL set. At
the moment none of the other flags are relevant for special regs,
but the migration related flag we're about to introduce can apply
here too.

Signed-off-by: Peter Maydell 
---
 target-arm/cpu.h |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target-arm/cpu.h b/target-arm/cpu.h
index 5438444..737c00c 100644
--- a/target-arm/cpu.h
+++ b/target-arm/cpu.h
@@ -456,7 +456,7 @@ static inline bool cptype_valid(int cptype)
 {
 return ((cptype & ~ARM_CP_FLAG_MASK) == 0)
 || ((cptype & ARM_CP_SPECIAL) &&
-(cptype <= ARM_LAST_SPECIAL));
+((cptype & ~ARM_CP_FLAG_MASK) <= ARM_LAST_SPECIAL));
 }
 
 /* Access rights:
-- 
1.7.9.5




[Qemu-devel] [PULL 0/8] target-arm queue

2013-06-25 Thread Peter Maydell
Hi; this is the usual target-arm pullreq, mostly just the cpregs
migration patchset that I posted a while back.

NB: I've updated my make-pullreq script to create a GPG-signed
pull request, but I'm not sure if I got it right -- feedback
welcome :-)

In particular, target-arm.for-upstream is still a branch name
as usual; the signed tag is "pull-target-arm-20130625"; I'm
not sure whether the tag should be the thing named in the
'available at' line below rather than the branch.


The following changes since commit baf8673ca802cb3ea2cdbe94813441d23bde223b:

  Merge remote-tracking branch 'stefanha/block' into staging (2013-06-24 
14:33:17 -0500)

are available in the git repository at:


  git://git.linaro.org/people/pmaydell/qemu-arm.git target-arm.for-upstream

for you to fetch changes up to bdcc150dc44ea96152f05f9e68970b63508d5ae7:

  target-arm: Make LPAE feature imply V7MP (2013-06-25 18:16:10 +0100)


target-arm queue


Peter Maydell (8):
  target-arm: Allow special cpregs to have flags set
  target-arm: Add raw_readfn and raw_writefn to ARMCPRegInfo
  target-arm: mark up cpregs for no-migrate or raw access
  target-arm: Convert TCG to using (index,value) list for cp migration
  target-arm: Initialize cpreg list from KVM when using KVM
  target-arm: Reinitialize all KVM VCPU registers on reset
  target-arm: Use tuple list to sync cp regs with KVM
  target-arm: Make LPAE feature imply V7MP

 target-arm/Makefile.objs |1 +
 target-arm/cpu-qom.h |   24 
 target-arm/cpu.c |4 +-
 target-arm/cpu.h |   89 -
 target-arm/helper.c  |  327 +++---
 target-arm/kvm-stub.c|   23 
 target-arm/kvm.c |  292 +++--
 target-arm/kvm_arm.h |   33 +
 target-arm/machine.c |  134 ---
 9 files changed, 760 insertions(+), 167 deletions(-)
 create mode 100644 target-arm/kvm-stub.c



[Qemu-devel] [PULL 6/8] target-arm: Reinitialize all KVM VCPU registers on reset

2013-06-25 Thread Peter Maydell
Since the ARM KVM API doesn't include a "reset this VCPU"
ioctl, we have to capture the initial values of every
register it knows about so that we can reset the VCPU
by feeding those values back again.

Signed-off-by: Peter Maydell 
---
 target-arm/cpu-qom.h |6 +-
 target-arm/kvm.c |   16 
 2 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/target-arm/cpu-qom.h b/target-arm/cpu-qom.h
index 2242eee..25239b8 100644
--- a/target-arm/cpu-qom.h
+++ b/target-arm/cpu-qom.h
@@ -72,7 +72,11 @@ typedef struct ARMCPU {
 uint64_t *cpreg_indexes;
 /* Values of the registers (cpreg_indexes[i]'s value is cpreg_values[i]) */
 uint64_t *cpreg_values;
-/* Length of the indexes, values arrays */
+/* When using KVM, keeps a copy of the initial state of the VCPU,
+ * so that on reset we can feed the reset values back into the kernel.
+ */
+uint64_t *cpreg_reset_values;
+/* Length of the indexes, values, reset_values arrays */
 int32_t cpreg_array_len;
 /* These are used only for migration: incoming data arrives in
  * these fields and is sanity checked in post_load before copying
diff --git a/target-arm/kvm.c b/target-arm/kvm.c
index 66ce67a..49108cf 100644
--- a/target-arm/kvm.c
+++ b/target-arm/kvm.c
@@ -162,6 +162,13 @@ int kvm_arch_init_vcpu(CPUState *cs)
 goto out;
 }
 
+/* Save a copy of the initial register values so that we can
+ * feed it back to the kernel on VCPU reset.
+ */
+cpu->cpreg_reset_values = g_memdup(cpu->cpreg_values,
+   cpu->cpreg_array_len *
+   sizeof(cpu->cpreg_values[0]));
+
 out:
 g_free(rlp);
 return ret;
@@ -603,6 +610,15 @@ int kvm_arch_handle_exit(CPUState *cs, struct kvm_run *run)
 
 void kvm_arch_reset_vcpu(CPUState *cs)
 {
+/* Feed the kernel back its initial register state */
+ARMCPU *cpu = ARM_CPU(cs);
+
+memmove(cpu->cpreg_values, cpu->cpreg_reset_values,
+cpu->cpreg_array_len * sizeof(cpu->cpreg_values[0]));
+
+if (!write_list_to_kvmstate(cpu)) {
+abort();
+}
 }
 
 bool kvm_arch_stop_on_emulation_error(CPUState *cs)
-- 
1.7.9.5




Re: [Qemu-devel] [PATCH 21/26] kvmclock: use realize for kvmclock

2013-06-25 Thread Eduardo Habkost
On Tue, Jun 25, 2013 at 10:20:08AM +0800, Hu Tao wrote:
[...]
> > Is TYPE_SYS_BUS_DEVICE guaranteed to never override ->realize() itself?
> > 
> > From DeviceClass documentation:
> > 
> >  * If a type derived directly from TYPE_DEVICE implements @realize, it does
> >  * not need to implement @init and therefore does not need to store and call
> >  * #DeviceClass' default @realize callback.
> >  * For other types consult the documentation and implementation of the
> >  * respective parent types.
> > 
> > The problem is that there's no documentation about ->realize() on
> > SysBusDeviceClass. Can we please explicitly document SysBusDeviceClass
> > expectations about ->realize() first, before making those changes?
> 
> IIUC, subclass's overriding ->realize should call parent's ->realize at
> some point. Peter Crosthwaite has a patchset to access a object's parent
> class at http://lists.nongnu.org/archive/html/qemu-devel/2013-06/msg02982.html
> 
> Regarding SysBusDevice::init and SysBusDevice::realize, I think it's the
> same as in the case of DeviceClass. If you agree I'll document it as in
> DeviceClass.

I believe it is reasonable to document that SysBusDevice subclasses
don't need to call the parent ->realize() method, like on DeviceClass.
This is exactly what this patch set does, after all, and I haven't seen
anybody complaining about it yet.

-- 
Eduardo



[Qemu-devel] [PULL 3/8] target-arm: mark up cpregs for no-migrate or raw access

2013-06-25 Thread Peter Maydell
Mark up coprocessor register definitions to add raw access
functions or mark the register as non-migratable where necessary.

Signed-off-by: Peter Maydell 
---
 target-arm/helper.c |  140 ++-
 1 file changed, 94 insertions(+), 46 deletions(-)

diff --git a/target-arm/helper.c b/target-arm/helper.c
index 2585d59..baf7576 100644
--- a/target-arm/helper.c
+++ b/target-arm/helper.c
@@ -64,6 +64,20 @@ static int vfp_gdb_set_reg(CPUARMState *env, uint8_t *buf, 
int reg)
 return 0;
 }
 
+static int raw_read(CPUARMState *env, const ARMCPRegInfo *ri,
+uint64_t *value)
+{
+*value = CPREG_FIELD32(env, ri);
+return 0;
+}
+
+static int raw_write(CPUARMState *env, const ARMCPRegInfo *ri,
+ uint64_t value)
+{
+CPREG_FIELD32(env, ri) = value;
+return 0;
+}
+
 static int dacr_write(CPUARMState *env, const ARMCPRegInfo *ri, uint64_t value)
 {
 env->cp15.c3 = value;
@@ -139,13 +153,13 @@ static const ARMCPRegInfo cp_reginfo[] = {
 { .name = "DACR", .cp = 15,
   .crn = 3, .crm = CP_ANY, .opc1 = CP_ANY, .opc2 = CP_ANY,
   .access = PL1_RW, .fieldoffset = offsetof(CPUARMState, cp15.c3),
-  .resetvalue = 0, .writefn = dacr_write },
+  .resetvalue = 0, .writefn = dacr_write, .raw_writefn = raw_write, },
 { .name = "FCSEIDR", .cp = 15, .crn = 13, .crm = 0, .opc1 = 0, .opc2 = 0,
   .access = PL1_RW, .fieldoffset = offsetof(CPUARMState, cp15.c13_fcse),
-  .resetvalue = 0, .writefn = fcse_write },
+  .resetvalue = 0, .writefn = fcse_write, .raw_writefn = raw_write, },
 { .name = "CONTEXTIDR", .cp = 15, .crn = 13, .crm = 0, .opc1 = 0, .opc2 = 
1,
   .access = PL1_RW, .fieldoffset = offsetof(CPUARMState, cp15.c13_fcse),
-  .resetvalue = 0, .writefn = contextidr_write },
+  .resetvalue = 0, .writefn = contextidr_write, .raw_writefn = raw_write, 
},
 /* ??? This covers not just the impdef TLB lockdown registers but also
  * some v7VMSA registers relating to TEX remap, so it is overly broad.
  */
@@ -155,13 +169,17 @@ static const ARMCPRegInfo cp_reginfo[] = {
  * the unified TLB ops but also the dside/iside/inner-shareable variants.
  */
 { .name = "TLBIALL", .cp = 15, .crn = 8, .crm = CP_ANY,
-  .opc1 = CP_ANY, .opc2 = 0, .access = PL1_W, .writefn = tlbiall_write, },
+  .opc1 = CP_ANY, .opc2 = 0, .access = PL1_W, .writefn = tlbiall_write,
+  .type = ARM_CP_NO_MIGRATE },
 { .name = "TLBIMVA", .cp = 15, .crn = 8, .crm = CP_ANY,
-  .opc1 = CP_ANY, .opc2 = 1, .access = PL1_W, .writefn = tlbimva_write, },
+  .opc1 = CP_ANY, .opc2 = 1, .access = PL1_W, .writefn = tlbimva_write,
+  .type = ARM_CP_NO_MIGRATE },
 { .name = "TLBIASID", .cp = 15, .crn = 8, .crm = CP_ANY,
-  .opc1 = CP_ANY, .opc2 = 2, .access = PL1_W, .writefn = tlbiasid_write, },
+  .opc1 = CP_ANY, .opc2 = 2, .access = PL1_W, .writefn = tlbiasid_write,
+  .type = ARM_CP_NO_MIGRATE },
 { .name = "TLBIMVAA", .cp = 15, .crn = 8, .crm = CP_ANY,
-  .opc1 = CP_ANY, .opc2 = 3, .access = PL1_W, .writefn = tlbimvaa_write, },
+  .opc1 = CP_ANY, .opc2 = 3, .access = PL1_W, .writefn = tlbimvaa_write,
+  .type = ARM_CP_NO_MIGRATE },
 /* Cache maintenance ops; some of this space may be overridden later. */
 { .name = "CACHEMAINT", .cp = 15, .crn = 7, .crm = CP_ANY,
   .opc1 = 0, .opc2 = CP_ANY, .access = PL1_W,
@@ -196,7 +214,8 @@ static const ARMCPRegInfo not_v7_cp_reginfo[] = {
   .resetvalue = 0 },
 /* v6 doesn't have the cache ID registers but Linux reads them anyway */
 { .name = "DUMMY", .cp = 15, .crn = 0, .crm = 0, .opc1 = 1, .opc2 = CP_ANY,
-  .access = PL1_R, .type = ARM_CP_CONST, .resetvalue = 0 },
+  .access = PL1_R, .type = ARM_CP_CONST | ARM_CP_NO_MIGRATE,
+  .resetvalue = 0 },
 REGINFO_SENTINEL
 };
 
@@ -235,6 +254,7 @@ static const ARMCPRegInfo v6_cp_reginfo[] = {
 REGINFO_SENTINEL
 };
 
+
 static int pmreg_read(CPUARMState *env, const ARMCPRegInfo *ri,
   uint64_t *value)
 {
@@ -366,13 +386,16 @@ static const ARMCPRegInfo v7_cp_reginfo[] = {
 { .name = "PMCNTENSET", .cp = 15, .crn = 9, .crm = 12, .opc1 = 0, .opc2 = 
1,
   .access = PL0_RW, .resetvalue = 0,
   .fieldoffset = offsetof(CPUARMState, cp15.c9_pmcnten),
-  .readfn = pmreg_read, .writefn = pmcntenset_write },
+  .readfn = pmreg_read, .writefn = pmcntenset_write,
+  .raw_readfn = raw_read, .raw_writefn = raw_write },
 { .name = "PMCNTENCLR", .cp = 15, .crn = 9, .crm = 12, .opc1 = 0, .opc2 = 
2,
   .access = PL0_RW, .fieldoffset = offsetof(CPUARMState, cp15.c9_pmcnten),
-  .readfn = pmreg_read, .writefn = pmcntenclr_write },
+  .readfn = pmreg_read, .writefn = pmcntenclr_write,
+  .type = ARM_CP_NO_MIGRATE },
 { .name = "PMOVSR", .cp = 15, .crn = 9, .crm = 12, .opc1 = 0, .opc2 = 3,
   .access = PL0_RW, .fieldoffset = offsetof(CPUARMState, cp15.c9_pmovsr),
-  .

[Qemu-devel] [PULL 8/8] target-arm: Make LPAE feature imply V7MP

2013-06-25 Thread Peter Maydell
The v7 ARM ARM specifies that the Large Physical Address
Extension requires implementation of the Multiprocessing
Extensions, so make our LPAE feature imply V7MP rather
than specifying both in the A15 CPU initfn.

Signed-off-by: Peter Maydell 
Reviewed-by: Andreas Färber 
Message-id: 1371127899-10364-1-git-send-email-peter.mayd...@linaro.org
---
 target-arm/cpu.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target-arm/cpu.c b/target-arm/cpu.c
index 241f032..2371f48 100644
--- a/target-arm/cpu.c
+++ b/target-arm/cpu.c
@@ -198,6 +198,7 @@ static void arm_cpu_realizefn(DeviceState *dev, Error 
**errp)
 set_feature(env, ARM_FEATURE_VFP);
 }
 if (arm_feature(env, ARM_FEATURE_LPAE)) {
+set_feature(env, ARM_FEATURE_V7MP);
 set_feature(env, ARM_FEATURE_PXN);
 }
 
@@ -573,7 +574,6 @@ static void cortex_a15_initfn(Object *obj)
 set_feature(&cpu->env, ARM_FEATURE_NEON);
 set_feature(&cpu->env, ARM_FEATURE_THUMB2EE);
 set_feature(&cpu->env, ARM_FEATURE_ARM_DIV);
-set_feature(&cpu->env, ARM_FEATURE_V7MP);
 set_feature(&cpu->env, ARM_FEATURE_GENERIC_TIMER);
 set_feature(&cpu->env, ARM_FEATURE_DUMMY_C15_REGS);
 set_feature(&cpu->env, ARM_FEATURE_LPAE);
-- 
1.7.9.5




Re: [Qemu-devel] [PATCH] i.MX31: Fix PRCS bit test

2013-06-25 Thread Peter Maydell
On 9 June 2013 21:44, Stefan Weil  wrote:
> cppcheck detected a condition which was always false.
>
> According to the MCIMX31 Reference Manual, the PRCS bits have to be 01
> to select the Frequency Pre-Multiplier (FPM). PRCS uses bits 1 and 2,
> so we have to test for 2.
>
> Signed-off-by: Stefan Weil 

Thanks, applied to arm-devs.next.

-- PMM



[Qemu-devel] [PULL 4/8] target-arm: Convert TCG to using (index, value) list for cp migration

2013-06-25 Thread Peter Maydell
Convert the TCG ARM target to using an (index,value) list for migrating
coprocessors. The primary benefit of the (index,value) list is for
passing state between KVM and QEMU, but it works for TCG-to-TCG
migration as well and is a useful self-contained first step.

Signed-off-by: Peter Maydell 
---
 target-arm/cpu-qom.h |   20 ++
 target-arm/cpu.c |2 +
 target-arm/cpu.h |   69 
 target-arm/helper.c  |  174 ++
 target-arm/kvm.c |9 +++
 target-arm/machine.c |  114 +++--
 6 files changed, 341 insertions(+), 47 deletions(-)

diff --git a/target-arm/cpu-qom.h b/target-arm/cpu-qom.h
index 12fcefe..2242eee 100644
--- a/target-arm/cpu-qom.h
+++ b/target-arm/cpu-qom.h
@@ -62,6 +62,25 @@ typedef struct ARMCPU {
 
 /* Coprocessor information */
 GHashTable *cp_regs;
+/* For marshalling (mostly coprocessor) register state between the
+ * kernel and QEMU (for KVM) and between two QEMUs (for migration),
+ * we use these arrays.
+ */
+/* List of register indexes managed via these arrays; (full KVM style
+ * 64 bit indexes, not CPRegInfo 32 bit indexes)
+ */
+uint64_t *cpreg_indexes;
+/* Values of the registers (cpreg_indexes[i]'s value is cpreg_values[i]) */
+uint64_t *cpreg_values;
+/* Length of the indexes, values arrays */
+int32_t cpreg_array_len;
+/* These are used only for migration: incoming data arrives in
+ * these fields and is sanity checked in post_load before copying
+ * to the working data structures above.
+ */
+uint64_t *cpreg_vmstate_indexes;
+uint64_t *cpreg_vmstate_values;
+int32_t cpreg_vmstate_array_len;
 
 /* The instance init functions for implementation-specific subclasses
  * set these fields to specify the implementation-dependent values of
@@ -116,6 +135,7 @@ extern const struct VMStateDescription vmstate_arm_cpu;
 #endif
 
 void register_cp_regs_for_features(ARMCPU *cpu);
+void init_cpreg_list(ARMCPU *cpu);
 
 void arm_cpu_do_interrupt(CPUState *cpu);
 void arm_v7m_cpu_do_interrupt(CPUState *cpu);
diff --git a/target-arm/cpu.c b/target-arm/cpu.c
index 496a59f..241f032 100644
--- a/target-arm/cpu.c
+++ b/target-arm/cpu.c
@@ -204,6 +204,8 @@ static void arm_cpu_realizefn(DeviceState *dev, Error 
**errp)
 register_cp_regs_for_features(cpu);
 arm_cpu_register_gdb_regs_for_features(cpu);
 
+init_cpreg_list(cpu);
+
 cpu_reset(CPU(cpu));
 qemu_init_vcpu(env);
 
diff --git a/target-arm/cpu.h b/target-arm/cpu.h
index 1d8eba5..abcc0b4 100644
--- a/target-arm/cpu.h
+++ b/target-arm/cpu.h
@@ -424,6 +424,43 @@ void armv7m_nvic_complete_irq(void *opaque, int irq);
 (((cp) << 16) | ((is64) << 15) | ((crn) << 11) |\
  ((crm) << 7) | ((opc1) << 3) | (opc2))
 
+/* Note that these must line up with the KVM/ARM register
+ * ID field definitions (kvm.c will check this, but we
+ * can't just use the KVM defines here as the kvm headers
+ * are unavailable to non-KVM-specific files)
+ */
+#define CP_REG_SIZE_SHIFT 52
+#define CP_REG_SIZE_MASK   0x00f0ULL
+#define CP_REG_SIZE_U320x0020ULL
+#define CP_REG_SIZE_U640x0030ULL
+#define CP_REG_ARM 0x4000ULL
+
+/* Convert a full 64 bit KVM register ID to the truncated 32 bit
+ * version used as a key for the coprocessor register hashtable
+ */
+static inline uint32_t kvm_to_cpreg_id(uint64_t kvmid)
+{
+uint32_t cpregid = kvmid;
+if ((kvmid & CP_REG_SIZE_MASK) == CP_REG_SIZE_U64) {
+cpregid |= (1 << 15);
+}
+return cpregid;
+}
+
+/* Convert a truncated 32 bit hashtable key into the full
+ * 64 bit KVM register ID.
+ */
+static inline uint64_t cpreg_to_kvm_id(uint32_t cpregid)
+{
+uint64_t kvmid = cpregid & ~(1 << 15);
+if (cpregid & (1 << 15)) {
+kvmid |= CP_REG_SIZE_U64 | CP_REG_ARM;
+} else {
+kvmid |= CP_REG_SIZE_U32 | CP_REG_ARM;
+}
+return kvmid;
+}
+
 /* ARMCPRegInfo type field bits. If the SPECIAL bit is set this is a
  * special-behaviour cp reg and bits [15..8] indicate what behaviour
  * it has. Otherwise it is a simple cp reg, where CONST indicates that
@@ -621,6 +658,38 @@ static inline bool cp_access_ok(CPUARMState *env,
 return (ri->access >> ((arm_current_pl(env) * 2) + isread)) & 1;
 }
 
+/**
+ * write_list_to_cpustate
+ * @cpu: ARMCPU
+ *
+ * For each register listed in the ARMCPU cpreg_indexes list, write
+ * its value from the cpreg_values list into the ARMCPUState structure.
+ * This updates TCG's working data structures from KVM data or
+ * from incoming migration state.
+ *
+ * Returns: true if all register values were updated correctly,
+ * false if some register was unknown or could not be written.
+ * Note that we do not stop early on failure -- we will attempt
+ * writing all registers in the list.
+ */
+bool write_list_to_cpustate(ARMCPU *cpu);
+
+/**
+ * write_cpustate

[Qemu-devel] [PULL 7/8] target-arm: Use tuple list to sync cp regs with KVM

2013-06-25 Thread Peter Maydell
Use the tuple list of cp registers for syncing KVM state to QEMU,
rather than only syncing a very minimal set by hand.

Signed-off-by: Peter Maydell 
---
 target-arm/kvm.c |  103 +-
 1 file changed, 33 insertions(+), 70 deletions(-)

diff --git a/target-arm/kvm.c b/target-arm/kvm.c
index 49108cf..d3937a2 100644
--- a/target-arm/kvm.c
+++ b/target-arm/kvm.c
@@ -344,17 +344,6 @@ typedef struct Reg {
 offsetof(CPUARMState, QEMUFIELD) \
 }
 
-#define CP15REG(CRN, CRM, OPC1, OPC2, QEMUFIELD) \
-{\
-KVM_REG_ARM | KVM_REG_SIZE_U32 | \
-(15 << KVM_REG_ARM_COPROC_SHIFT) |   \
-((CRN) << KVM_REG_ARM_32_CRN_SHIFT) |\
-((CRM) << KVM_REG_ARM_CRM_SHIFT) |   \
-((OPC1) << KVM_REG_ARM_OPC1_SHIFT) | \
-((OPC2) << KVM_REG_ARM_32_OPC2_SHIFT),   \
-offsetof(CPUARMState, QEMUFIELD) \
-}
-
 #define VFPSYSREG(R)   \
 {  \
 KVM_REG_ARM | KVM_REG_SIZE_U32 | KVM_REG_ARM_VFP | \
@@ -403,12 +392,6 @@ static const Reg regs[] = {
 COREREG(fiq_regs[7], banked_spsr[5]),
 /* R15 */
 COREREG(usr_regs.uregs[15], regs[15]),
-/* A non-comprehensive set of cp15 registers.
- * TODO: drive this from the cp_regs hashtable instead.
- */
-CP15REG(1, 0, 0, 0, cp15.c1_sys), /* SCTLR */
-CP15REG(2, 0, 0, 2, cp15.c2_control), /* TTBCR */
-CP15REG(3, 0, 0, 0, cp15.c3), /* DACR */
 /* VFP system registers */
 VFPSYSREG(FPSID),
 VFPSYSREG(MVFR1),
@@ -426,7 +409,6 @@ int kvm_arch_put_registers(CPUState *cs, int level)
 int mode, bn;
 int ret, i;
 uint32_t cpsr, fpscr;
-uint64_t ttbr;
 
 /* Make sure the banked regs are properly set */
 mode = env->uncached_cpsr & CPSR_M;
@@ -460,26 +442,6 @@ int kvm_arch_put_registers(CPUState *cs, int level)
 return ret;
 }
 
-/* TTBR0: cp15 crm=2 opc1=0 */
-ttbr = ((uint64_t)env->cp15.c2_base0_hi << 32) | env->cp15.c2_base0;
-r.id = KVM_REG_ARM | KVM_REG_SIZE_U64 | (15 << KVM_REG_ARM_COPROC_SHIFT) |
-(2 << KVM_REG_ARM_CRM_SHIFT) | (0 << KVM_REG_ARM_OPC1_SHIFT);
-r.addr = (uintptr_t)(&ttbr);
-ret = kvm_vcpu_ioctl(cs, KVM_SET_ONE_REG, &r);
-if (ret) {
-return ret;
-}
-
-/* TTBR1: cp15 crm=2 opc1=1 */
-ttbr = ((uint64_t)env->cp15.c2_base1_hi << 32) | env->cp15.c2_base1;
-r.id = KVM_REG_ARM | KVM_REG_SIZE_U64 | (15 << KVM_REG_ARM_COPROC_SHIFT) |
-(2 << KVM_REG_ARM_CRM_SHIFT) | (1 << KVM_REG_ARM_OPC1_SHIFT);
-r.addr = (uintptr_t)(&ttbr);
-ret = kvm_vcpu_ioctl(cs, KVM_SET_ONE_REG, &r);
-if (ret) {
-return ret;
-}
-
 /* VFP registers */
 r.id = KVM_REG_ARM | KVM_REG_SIZE_U64 | KVM_REG_ARM_VFP;
 for (i = 0; i < 32; i++) {
@@ -496,6 +458,31 @@ int kvm_arch_put_registers(CPUState *cs, int level)
 fpscr = vfp_get_fpscr(env);
 r.addr = (uintptr_t)&fpscr;
 ret = kvm_vcpu_ioctl(cs, KVM_SET_ONE_REG, &r);
+if (ret) {
+return ret;
+}
+
+/* Note that we do not call write_cpustate_to_list()
+ * here, so we are only writing the tuple list back to
+ * KVM. This is safe because nothing can change the
+ * CPUARMState cp15 fields (in particular gdb accesses cannot)
+ * and so there are no changes to sync. In fact syncing would
+ * be wrong at this point: for a constant register where TCG and
+ * KVM disagree about its value, the preceding write_list_to_cpustate()
+ * would not have had any effect on the CPUARMState value (since the
+ * register is read-only), and a write_cpustate_to_list() here would
+ * then try to write the TCG value back into KVM -- this would either
+ * fail or incorrectly change the value the guest sees.
+ *
+ * If we ever want to allow the user to modify cp15 registers via
+ * the gdb stub, we would need to be more clever here (for instance
+ * tracking the set of registers kvm_arch_get_registers() successfully
+ * managed to update the CPUARMState with, and only allowing those
+ * to be written back up into the kernel).
+ */
+if (!write_list_to_kvmstate(cpu)) {
+return EINVAL;
+}
 
 return ret;
 }
@@ -508,7 +495,6 @@ int kvm_arch_get_registers(CPUState *cs)
 int mode, bn;
 int ret, i;
 uint32_t cpsr, fpscr;
-uint64_t ttbr;
 
 for (i = 0; i < ARRAY_SIZE(regs); i++) {
 r.id = regs[i].id;
@@ -529,28 +515,6 @@ int kvm_arch_get_registers(CPUState *cs)
 }
 cpsr_write(env, cpsr, 0x);
 
-/* TTBR0: cp15 crm=2 opc1=0 */
-r.id = KVM_REG_ARM | KVM_REG_SIZE_U64 | (15 << KVM_REG_ARM_COPROC_SHIFT) |
-(2 << KVM_REG_ARM_CRM_SHIFT) | (0 << KVM_REG_ARM_OPC1_SHIFT);
-r.addr = (uintptr_t)(&ttbr);
-ret = kvm_vcpu_ioctl(cs, KVM_GET_ONE_REG, &r);
-if (ret) {

[Qemu-devel] [PULL 2/8] target-arm: Add raw_readfn and raw_writefn to ARMCPRegInfo

2013-06-25 Thread Peter Maydell
For reading and writing register values from the kernel for KVM,
we need to provide accessor functions which are guaranteed to succeed
and don't impose access checks, mask out unwritable bits, etc.
Define new fields raw_readfn and raw_writefn for this purpose;
these only need to be provided if there is a readfn or writefn
already and it is not suitable.

Signed-off-by: Peter Maydell 
---
 target-arm/cpu.h|   18 +-
 target-arm/helper.c |   13 +
 2 files changed, 30 insertions(+), 1 deletion(-)

diff --git a/target-arm/cpu.h b/target-arm/cpu.h
index 737c00c..1d8eba5 100644
--- a/target-arm/cpu.h
+++ b/target-arm/cpu.h
@@ -434,19 +434,22 @@ void armv7m_nvic_complete_irq(void *opaque, int irq);
  * a register definition to override a previous definition for the
  * same (cp, is64, crn, crm, opc1, opc2) tuple: either the new or the
  * old must have the OVERRIDE bit set.
+ * NO_MIGRATE indicates that this register should be ignored for migration;
+ * (eg because any state is accessed via some other coprocessor register).
  */
 #define ARM_CP_SPECIAL 1
 #define ARM_CP_CONST 2
 #define ARM_CP_64BIT 4
 #define ARM_CP_SUPPRESS_TB_END 8
 #define ARM_CP_OVERRIDE 16
+#define ARM_CP_NO_MIGRATE 32
 #define ARM_CP_NOP (ARM_CP_SPECIAL | (1 << 8))
 #define ARM_CP_WFI (ARM_CP_SPECIAL | (2 << 8))
 #define ARM_LAST_SPECIAL ARM_CP_WFI
 /* Used only as a terminator for ARMCPRegInfo lists */
 #define ARM_CP_SENTINEL 0x
 /* Mask of only the flag bits in a type field */
-#define ARM_CP_FLAG_MASK 0x1f
+#define ARM_CP_FLAG_MASK 0x3f
 
 /* Return true if cptype is a valid type field. This is used to try to
  * catch errors where the sentinel has been accidentally left off the end
@@ -562,6 +565,19 @@ struct ARMCPRegInfo {
  * by fieldoffset.
  */
 CPWriteFn *writefn;
+/* Function for doing a "raw" read; used when we need to copy
+ * coprocessor state to the kernel for KVM or out for
+ * migration. This only needs to be provided if there is also a
+ * readfn and it makes an access permission check.
+ */
+CPReadFn *raw_readfn;
+/* Function for doing a "raw" write; used when we need to copy KVM
+ * kernel coprocessor state into userspace, or for inbound
+ * migration. This only needs to be provided if there is also a
+ * writefn and it makes an access permission check or masks out
+ * "unwritable" bits or has write-one-to-clear or similar behaviour.
+ */
+CPWriteFn *raw_writefn;
 /* Function for resetting the register. If NULL, then reset will be done
  * by writing resetvalue to the field specified in fieldoffset. If
  * fieldoffset is 0 then no reset will be done.
diff --git a/target-arm/helper.c b/target-arm/helper.c
index fd055e8..2585d59 100644
--- a/target-arm/helper.c
+++ b/target-arm/helper.c
@@ -1392,6 +1392,19 @@ void define_one_arm_cp_reg_with_opaque(ARMCPU *cpu,
 r2->crm = crm;
 r2->opc1 = opc1;
 r2->opc2 = opc2;
+/* By convention, for wildcarded registers only the first
+ * entry is used for migration; the others are marked as
+ * NO_MIGRATE so we don't try to transfer the register
+ * multiple times. Special registers (ie NOP/WFI) are
+ * never migratable.
+ */
+if ((r->type & ARM_CP_SPECIAL) ||
+((r->crm == CP_ANY) && crm != 0) ||
+((r->opc1 == CP_ANY) && opc1 != 0) ||
+((r->opc2 == CP_ANY) && opc2 != 0)) {
+r2->type |= ARM_CP_NO_MIGRATE;
+}
+
 /* Overriding of an existing definition must be explicitly
  * requested.
  */
-- 
1.7.9.5




Re: [Qemu-devel] [PATCH arm-devs v1 1/1] char/cadence_uart: Fix reset for unattached instances

2013-06-25 Thread Peter Maydell
On 7 June 2013 05:02,   wrote:
> From: Peter Crosthwaite 
>
> commit 1db8b5efe0c2b5000e50691eea61264a615f43de introduced an issue
> where QEMU would segfault if you have an unattached Cadence UART.
>
> Fix by guarding the flush-on-reset logic on there being a qemu_chr
> attachment.
>
> Reported-by: Soren Brinkmann 
> Signed-off-by: Peter Crosthwaite 
> ---
>
>  hw/char/cadence_uart.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/hw/char/cadence_uart.c b/hw/char/cadence_uart.c
> index c2a7834..29827d2 100644
> --- a/hw/char/cadence_uart.c
> +++ b/hw/char/cadence_uart.c
> @@ -157,7 +157,9 @@ static void uart_rx_reset(UartState *s)
>  {
>  s->rx_wpos = 0;
>  s->rx_count = 0;
> -qemu_chr_accept_input(s->chr);
> +if (s->chr) {
> +qemu_chr_accept_input(s->chr);
> +}

There seem to be a lot of references to s->chr in this device:
is this really the only one which needs a guard against NULL?

thanks
-- PMM



[Qemu-devel] [PATCH] block/ssh: Set bdrv_has_zero_init according to the file type.

2013-06-25 Thread Richard W.M. Jones
From: "Richard W.M. Jones" 

If the remote is a regular file, set it to true (ie. reads of
uninitialized areas in a newly created file will return zeroes).
If we can't prove that, return false (a safe default).

Tested by adding a debugging print statement [not part of this commit]
and creating a remote file and a remote block device:

  $ ./qemu-img create ssh://localhost/tmp/new 100M
  Formatting 'ssh://localhost/tmp/new', fmt=raw size=104857600
  filename ssh://localhost/tmp/new: has_zero_init = 1
  $ sudo lvcreate -L 1G -n tmp /dev/fedora
Logical volume "tmp" created
  $ ./qemu-img create ssh://localhost/dev/fedora/tmp 1G
  Formatting 'ssh://localhost/dev/fedora/tmp', fmt=raw size=1073741824
  filename ssh://localhost/dev/fedora/tmp: has_zero_init = 0

Cc: Kevin Wolf 
Signed-off-by: Richard W.M. Jones 
---
 block/ssh.c | 16 
 1 file changed, 16 insertions(+)

diff --git a/block/ssh.c b/block/ssh.c
index 246a70d..d7e7bf8 100644
--- a/block/ssh.c
+++ b/block/ssh.c
@@ -716,6 +716,21 @@ static void ssh_close(BlockDriverState *bs)
 ssh_state_free(s);
 }
 
+static int ssh_has_zero_init(BlockDriverState *bs)
+{
+BDRVSSHState *s = bs->opaque;
+/* Assume false, unless we can positively prove it's true. */
+int has_zero_init = 0;
+
+if (s->attrs.flags & LIBSSH2_SFTP_ATTR_PERMISSIONS) {
+if (s->attrs.permissions & LIBSSH2_SFTP_S_IFREG) {
+has_zero_init = 1;
+}
+}
+
+return has_zero_init;
+}
+
 static void restart_coroutine(void *opaque)
 {
 Coroutine *co = opaque;
@@ -1037,6 +1052,7 @@ static BlockDriver bdrv_ssh = {
 .bdrv_file_open   = ssh_file_open,
 .bdrv_create  = ssh_create,
 .bdrv_close   = ssh_close,
+.bdrv_has_zero_init   = ssh_has_zero_init,
 .bdrv_co_readv= ssh_co_readv,
 .bdrv_co_writev   = ssh_co_writev,
 .bdrv_getlength   = ssh_getlength,
-- 
1.8.2.1




Re: [Qemu-devel] [PATCH v11 14/15] rdma: introduce MIG_STATE_NONE and change MIG_STATE_SETUP state transition

2013-06-25 Thread Michael R. Hines

On 06/25/2013 10:55 AM, Paolo Bonzini wrote:

Il 25/06/2013 16:54, Michael R. Hines ha scritto:

Do those few lines of code change the protocol?  If yes, I'll go against
all my previous recommendations and suggest a #define.  If no, it is
fine to leave it for later, but I would still suggest posting the patch
on the list just for information.

Ok, you got it - no it does not change the protocol.
I'll use a #define and send it out with a new version for review.

Make sure you send an additional patch on top of these 15.

Paolo


Understood.

- Michael




Re: [Qemu-devel] [PATCH v11 14/15] rdma: introduce MIG_STATE_NONE and change MIG_STATE_SETUP state transition

2013-06-25 Thread Michael R. Hines

On 06/25/2013 10:17 AM, Juan Quintela wrote:

"Michael R. Hines"  wrote:

On 06/25/2013 06:13 AM, Paolo Bonzini wrote:

Il 25/06/2013 11:49, Juan Quintela ha scritto:

mrhi...@linux.vnet.ibm.com wrote:

From: "Michael R. Hines" 

As described in the previous patch, until now, the MIG_STATE_SETUP
state was not really a 'formal' state. It has been used as a 'zero' state
(what we're calling 'NONE' here) and QEMU has been unconditionally transitioning
into this state when the QMP migration command was called. Instead we want to
introduce MIG_STATE_NONE, which is our starting state in the state machine, and
then immediately transition into the MIG_STATE_SETUP state when the QMP migrate
command is issued.

In order to do this, we must delay the transition into MIG_STATE_ACTIVE until
later in the migration_thread(). This is done to be able to timestamp the 
amount of
time spent in the SETUP state for proper accounting to the user during
an RDMA migration.

Furthermore, the management software, until now, has never been aware of the
existence of the SETUP state whatsoever. This must change, because, timing of 
this
state implies that the state actually exists.

These two patches cannot be separated because the 'query_migrate' QMP
switch statement needs to know how to handle this new state transition.

Tested-by: Michael R. Hines 
Signed-off-by: Michael R. Hines 
@@ -316,6 +321,7 @@ static void migrate_fd_cancel(MigrationState *s)
   {
   DPRINTF("cancelling migration\n");
   +migrate_set_state(s, MIG_STATE_SETUP, MIG_STATE_CANCELLED);
   migrate_set_state(s, MIG_STATE_ACTIVE, MIG_STATE_CANCELLED);
   }

This chunk is wrong.

we can call qme_migrate_cancel() at any point,  and it is going to be
called normally from MIG_STATE_ACTIVE.

  migrate_set_satet(s, s->state,  MIG_STATE_CANCELLED)

should do the trick.  Or something like that,  what do you think?

I don't like the three-arguments migrate_set_state, but I don't have any
better idea.

With Juan's modification, it is fine (but not reviewed-by me :)).  While
you resend, the first 13 patches of v10 can be merged (pull request).
You can then rebase the last three on top.

So, I should send a v11 before Juan applies and generates the pull request?

Did I understand correctly?


Michael, did you look at the "debugging/getting the protocol ready" mode
where all pages are unpinned?

Paolo



We did not come up with a design on how such a mode would be
exposed to the user. Implementing this in migration-rdma.c is
just a few lines of code, but without a clear use case in this patch
series, I've not chosen expose it to the user yet without some kind
of agreement with you guys.

Recommendations?

Dunno.  I have tried to use the thing:

I am using -incoming x-rdma:deus:

(qemu) char device redirected to /dev/pts/2 (label serial0)
librdmacm: Warning: couldn't read ABI version.
librdmacm: Warning: assuming: 4
librdmacm: Fatal: unable to get RDMA device list
RDMA ERROR: could not create rdma event channel
-incoming x-rdma:deus:: RDMA ERROR: could not create rdma event channel



This means that your RDMA environment / IB configuration
(outside of QEMU) is not configured properly. There are several
"sanity checks" you can do to make sure it is working:

Run:

$ ibv_devices <= You should see a list of configured IB devices
$ ibv_devinfo <= You should see "PORT_ACTIVE" in the output 
somewhere for each IB device

$ ifconfig ib0 <= if using standard infiniband
$ ifconfig eth0   <= if using RDMA over Converged ethernet

Furthermore, there are several latency/throughput utilities that
come along with the OpenFabrics software stack you can run
to make sure your two hosts are communicating with each other:

There's these commands:

$ ucmatose  [args...]
$ ibv_rc_pingpong [args ..]
$ rdma_server/rdma_client [args...]



I also used "0" instead of deus (the machine name).

The only thing that I did appart from intsalling librdmacm-devel was:


# cat /etc/udev/rules.d/80-infinibad.rules
KERNEL="rdma_cm", NAME="infiniband/%k", MODE="0666"

As per README.  Something else I need to do to use the tcp
implementations?

Later,  Juan.










Re: [Qemu-devel] [PATCH v11 11/15] rdma: core logic

2013-06-25 Thread Paolo Bonzini
Il 25/06/2013 18:31, Vasilis Liaskovitis ha scritto:
> Then I think qemu_rdma_dump_id can segfault trying to dereference a null verbs
> structure. The dump_id function should check for non-NULL verbs argument,
> or the dump should be made only in the (verbs != NULL) if clause.
> 
> Disabling the dump_id above, I have rdma_resolve_addr() problems on the source
> VM side (getting RDMA_CM_EVENT_ADDR_ERROR instead of
> RDMA_CM_EVENT_ADDR_RESOLVED).
> 
> I assume that is because of the null verbs structure destination problem 
> above.
> qemu_rdma_dest_prepare() will always fail with a NULL verbs argument:

FWIW, I don't think this should block merging this patch.  We will
always find one more bug after merging 2500 lines of code.

Paolo



[Qemu-devel] [PULL v2 18/21] pci: Simpler implementation of primary PCI bus

2013-06-25 Thread Michael S. Tsirkin
From: David Gibson 

Currently pci_find_primary_bus() searches the list of root buses for one
with domain 0.  But since host buses are always registered with domain 0,
this just amounts to finding the only PCI host bus.  The only remaining
users of pci_find_primary_bus() are in pci-hotplug-old.c, which implements
the old style pci_add/pci_del commands.

Therefore, this patch redefines pci_find_primary_bus() to find the only
PCI root bus, returning an error if there are multiple roots.  The callers
in pci-hotplug-old.c are updated correspondingly, to produce sensible
error messages.

Signed-off-by: David Gibson 
Signed-off-by: Michael S. Tsirkin 
---
 hw/pci/pci-hotplug-old.c | 26 --
 hw/pci/pci.c |  9 ++---
 2 files changed, 26 insertions(+), 9 deletions(-)

diff --git a/hw/pci/pci-hotplug-old.c b/hw/pci/pci-hotplug-old.c
index 807260c..8077289 100644
--- a/hw/pci/pci-hotplug-old.c
+++ b/hw/pci/pci-hotplug-old.c
@@ -62,10 +62,17 @@ static PCIDevice *qemu_pci_hot_add_nic(Monitor *mon,
 {
 Error *local_err = NULL;
 QemuOpts *opts;
+PCIBus *root = pci_find_primary_bus();
 PCIBus *bus;
 int ret, devfn;
 
-bus = pci_get_bus_devfn(&devfn, pci_find_primary_bus(), devaddr);
+if (!root) {
+monitor_printf(mon, "no primary PCI bus (if there are multiple"
+   " PCI roots, you must use device_add instead)");
+return NULL;
+}
+
+bus = pci_get_bus_devfn(&devfn, root, devaddr);
 if (!bus) {
 monitor_printf(mon, "Invalid PCI device address %s\n", devaddr);
 return NULL;
@@ -92,8 +99,7 @@ static PCIDevice *qemu_pci_hot_add_nic(Monitor *mon,
 monitor_printf(mon, "Parameter addr not supported\n");
 return NULL;
 }
-return pci_nic_init(&nd_table[ret], pci_find_primary_bus(),
-"rtl8139", devaddr);
+return pci_nic_init(&nd_table[ret], root, "rtl8139", devaddr);
 }
 
 static int scsi_hot_add(Monitor *mon, DeviceState *adapter,
@@ -144,7 +150,8 @@ int pci_drive_hot_add(Monitor *mon, const QDict *qdict, 
DriveInfo *dinfo)
 switch (dinfo->type) {
 case IF_SCSI:
 if (!root) {
-monitor_printf(mon, "no primary PCI bus\n");
+monitor_printf(mon, "no primary PCI bus (if there are multiple"
+   " PCI roots, you must use device_add instead)");
 goto err;
 }
 if (pci_read_devaddr(mon, pci_addr, &pci_bus, &slot)) {
@@ -177,6 +184,7 @@ static PCIDevice *qemu_pci_hot_add_storage(Monitor *mon,
 DriveInfo *dinfo = NULL;
 int type = -1;
 char buf[128];
+PCIBus *root = pci_find_primary_bus();
 PCIBus *bus;
 int devfn;
 
@@ -206,7 +214,12 @@ static PCIDevice *qemu_pci_hot_add_storage(Monitor *mon,
 dinfo = NULL;
 }
 
-bus = pci_get_bus_devfn(&devfn, pci_find_primary_bus(), devaddr);
+if (!root) {
+monitor_printf(mon, "no primary PCI bus (if there are multiple"
+   " PCI roots, you must use device_add instead)");
+return NULL;
+}
+bus = pci_get_bus_devfn(&devfn, root, devaddr);
 if (!bus) {
 monitor_printf(mon, "Invalid PCI device address %s\n", devaddr);
 return NULL;
@@ -293,7 +306,8 @@ static int pci_device_hot_remove(Monitor *mon, const char 
*pci_addr)
 Error *local_err = NULL;
 
 if (!root) {
-monitor_printf(mon, "no primary PCI bus\n");
+monitor_printf(mon, "no primary PCI bus (if there are multiple"
+   " PCI roots, you must use device_del instead)");
 return -1;
 }
 
diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index 6747ac5..75513eb 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -249,15 +249,18 @@ static void pci_host_bus_register(int domain, PCIBus *bus)
 
 PCIBus *pci_find_primary_bus(void)
 {
+PCIBus *primary_bus = NULL;
 struct PCIHostBus *host;
 
 QLIST_FOREACH(host, &host_buses, next) {
-if (host->domain == 0) {
-return host->bus;
+if (primary_bus) {
+/* We have multiple root buses, refuse to select a primary */
+return NULL;
 }
+primary_bus = host->bus;
 }
 
-return NULL;
+return primary_bus;
 }
 
 PCIBus *pci_device_root_bus(const PCIDevice *d)
-- 
MST




  1   2   3   >