date:20160104

Re: [Qemu-devel] [PATCH] i2c-tiny-usb: add new usb to i2c bridge

2016-01-04 Thread Gerd Hoffmann


> +case 0x4107:
> +/* this seems to be a byte type access */
> +if (i2c_start_transfer(s->i2cbus, /*address*/index, 0)) {
> +trace_usb_i2c_tiny_i2c_start_transfer_failed();
> +p->actual_length = 0; /* write failure */
> +break;
> +}
> +for (i = 0; i < length; i++) {
> +trace_usb_i2c_tiny_write(request, index, i, data[i]);
> +i2c_send(s->i2cbus, data[i]);
> +}
> +p->actual_length = length;
> +i2c_end_transfer(s->i2cbus);
> +break;

I think most of the tracepoints should be moved into i2c code (or just
dropped in case we already have tracepoints there).

One (high-level) tracepoint per transfer request makes sense in the usb
code, i.e. trace_usb_i2c_transfer_{read,write}, so one can see in the
trace log which usb request triggered which i2c transaction.

> +case 0xc101:
> +{
> +/* thats what the real thing reports, FIXME: can we do better here? */

Hmm, didn't we agree on adding a note about what the "real thing" we
mimic here is, to the comment at the start of the file?

> +uint32_t func = htole32(I2C_FUNC_I2C | I2C_FUNC_SMBUS_EMUL);

Can we move 'func' to the start of the function too, like we did with
'i'?

> +case 0xc106:
> +trace_usb_i2c_tiny_unknown_request(index, request, value, length);
> +trace_usb_i2c_tiny_unknown_request(data[0], data[1], data[2], data[3]);
> +if (i2c_start_transfer(s->i2cbus, /*address*/ index, 1)) {
> +trace_usb_i2c_tiny_i2c_start_transfer_failed();
> +p->actual_length = 0;
> +break;
> +}

Doesn't look like this request is unknown ...

> +for (i = 0; i < length; i++) {
> +data[i] = i2c_recv(s->i2cbus);

Can this fail?

cheers,
  Gerd

[Qemu-devel] [PATCH v2 2/3] qemu-nbd: Minor texi updates

2016-01-04 Thread Sitsofe Wheeler

- Change some spacing.
- Add disconnect usage to synopsis.
- Highlight the command and its options in the synopsis.
- Fix up the grammar in the description.
- Move filename variable description out of the option table.
- Add a description of the dev variable.
- Remove duplicate entry for --format.
- Reword --discard documentation.
- Add --detect-zeroes documentation.
- Add reference to qemu man page to see also section.

Signed-off-by: Sitsofe Wheeler 
---
 qemu-nbd.texi | 34 ++
 1 file changed, 22 insertions(+), 12 deletions(-)

diff --git a/qemu-nbd.texi b/qemu-nbd.texi
index 26cc985..5331d69 100644
--- a/qemu-nbd.texi
+++ b/qemu-nbd.texi
@@ -1,19 +1,23 @@
 @example
 @c man begin SYNOPSIS
-usage: qemu-nbd [OPTION]...  @var{filename}
+@command{qemu-nbd} [OPTION]... @var{filename}
+
+@command{qemu-nbd} @option{-d} @var{dev}
 @c man end
 @end example
 
 @c man begin DESCRIPTION
 
-Export QEMU disk image using NBD protocol.
+Export a QEMU disk image using the NBD protocol.
 
 @c man end
 
 @c man begin OPTIONS
+@var{filename} is a disk image filename.
+
+@var{dev} is an NBD device.
+
 @table @option
-@item @var{filename}
-is a disk image filename
 @item -p, --port=@var{port}
 port to listen on (default @samp{10809})
 @item -o, --offset=@var{offset}
@@ -22,8 +26,9 @@ offset into the image
 interface to bind to (default @samp{0.0.0.0})
 @item -k, --socket=@var{path}
 Use a unix socket with path @var{path}
-@item -f, --format=@var{format}
-Set image format as @var{format}
+@item -f, --format=@var{fmt}
+force the use of the block driver for format @var{fmt} instead of
+auto-detecting
 @item -r, --read-only
 export read-only
 @item -P, --partition=@var{num}
@@ -44,17 +49,22 @@ the emulator's @code{-drive cache=...} option for allowed values.
 choose asynchronous I/O mode between @samp{threads} (the default)
 and @samp{native} (Linux only).
 @item --discard=@var{discard}
-toggles whether @dfn{discard} (also known as @dfn{trim} or @dfn{unmap})
-requests are ignored or passed to the filesystem.  The default is no
-(@samp{--discard=ignore}).
+controls whether @dfn{discard} (also known as @dfn{trim} or @dfn{unmap})
+requests are ignored or passed to the filesystem.  @var{discard} is one of
+@samp{ignore} (or @samp{off}), @samp{unmap} (or @samp{on}).  The default is
+@samp{ignore}.
+@item --detect-zeroes=@var{detect-zeroes}
+enables the automatic conversion of plain zero writes by the OS to
+driver-specific optimized zero write commands.  @var{detect-zeroes} is one of
+@samp{off}, @samp{on} or @samp{unmap}.  @samp{unmap}
+converts a zero write to an unmap operation and can only be used if
+@var{discard} is set to @samp{unmap}.  The default is @samp{off}.
 @item -c, --connect=@var{dev}
 connect @var{filename} to NBD device @var{dev}
 @item -d, --disconnect
 disconnect the specified device
 @item -e, --shared=@var{num}
 device can be shared by @var{num} clients (default @samp{1})
-@item -f, --format=@var{fmt}
-force block driver for format @var{fmt} instead of auto-detecting
 @item -t, --persistent
 don't exit on the last connection
 @item -v, --verbose
@@ -79,7 +89,7 @@ warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
 @c man end
 
 @c man begin SEEALSO
-qemu-img(1)
+qemu(1), qemu-img(1)
 @c man end
 
 @end ignore
-- 
2.4.3

[Qemu-devel] [PATCH v2 3/3] qemu-nbd: Fix texi sentence capitalisation

2016-01-04 Thread Sitsofe Wheeler

Capitalise the first letter of sentences (and reword for grammar) in the
options section of qemu-nbd.texi.

Signed-off-by: Sitsofe Wheeler 
---
 qemu-nbd.texi | 38 +++---
 1 file changed, 19 insertions(+), 19 deletions(-)

diff --git a/qemu-nbd.texi b/qemu-nbd.texi
index 5331d69..0027841 100644
--- a/qemu-nbd.texi
+++ b/qemu-nbd.texi
@@ -19,60 +19,60 @@ Export a QEMU disk image using the NBD protocol.
 
 @table @option
 @item -p, --port=@var{port}
-port to listen on (default @samp{10809})
+The TCP port to listen on (default @samp{10809})
 @item -o, --offset=@var{offset}
-offset into the image
+The offset into the image
 @item -b, --bind=@var{iface}
-interface to bind to (default @samp{0.0.0.0})
+The interface to bind to (default @samp{0.0.0.0})
 @item -k, --socket=@var{path}
 Use a unix socket with path @var{path}
 @item -f, --format=@var{fmt}
-force the use of the block driver for format @var{fmt} instead of
+Force the use of the block driver for format @var{fmt} instead of
 auto-detecting
 @item -r, --read-only
-export read-only
+Export the disk as read-only
 @item -P, --partition=@var{num}
-only expose partition @var{num}
+Only expose partition @var{num}
 @item -s, --snapshot
-use @var{filename} as an external snapshot, create a temporary
+Use @var{filename} as an external snapshot, create a temporary
 file with backing_file=@var{filename}, redirect the write to
 the temporary one
 @item -l, --load-snapshot=@var{snapshot_param}
-load an internal snapshot inside @var{filename} and export it
+Load an internal snapshot inside @var{filename} and export it
 as an read-only device, @var{snapshot_param} format is
 'snapshot.id=[ID],snapshot.name=[NAME]' or '[ID_OR_NAME]'
 @item -n, --nocache
 @itemx --cache=@var{cache}
-set cache mode to be used with the file.  See the documentation of
+The cache mode to be used with the file.  See the documentation of
 the emulator's @code{-drive cache=...} option for allowed values.
 @item --aio=@var{aio}
-choose asynchronous I/O mode between @samp{threads} (the default)
+Set the asynchronous I/O mode between @samp{threads} (the default)
 and @samp{native} (Linux only).
 @item --discard=@var{discard}
-controls whether @dfn{discard} (also known as @dfn{trim} or @dfn{unmap})
+Control whether @dfn{discard} (also known as @dfn{trim} or @dfn{unmap})
 requests are ignored or passed to the filesystem.  @var{discard} is one of
 @samp{ignore} (or @samp{off}), @samp{unmap} (or @samp{on}).  The default is
 @samp{ignore}.
 @item --detect-zeroes=@var{detect-zeroes}
-enables the automatic conversion of plain zero writes by the OS to
+Control the automatic conversion of plain zero writes by the OS to
 driver-specific optimized zero write commands.  @var{detect-zeroes} is one of
 @samp{off}, @samp{on} or @samp{unmap}.  @samp{unmap}
 converts a zero write to an unmap operation and can only be used if
 @var{discard} is set to @samp{unmap}.  The default is @samp{off}.
 @item -c, --connect=@var{dev}
-connect @var{filename} to NBD device @var{dev}
+Connect @var{filename} to NBD device @var{dev}
 @item -d, --disconnect
-disconnect the specified device
+Disconnect the device @var{dev}
 @item -e, --shared=@var{num}
-device can be shared by @var{num} clients (default @samp{1})
+Allow up to @var{num} clients to share the device (default @samp{1})
 @item -t, --persistent
-don't exit on the last connection
+Don't exit on the last connection
 @item -v, --verbose
-display extra debugging information
+Display extra debugging information
 @item -h, --help
-display this help and exit
+Display this help and exit
 @item -V, --version
-output version information and exit
+Display version information and exit
 @end table
 
 @c man end
-- 
2.4.3

[Qemu-devel] [PATCH v2 1/3] qemu-nbd: Fix unintended texi verbatim formatting

2016-01-04 Thread Sitsofe Wheeler

Indented lines in the texi meant that the generated perlpod treated the
paragraph as verbatim (thus formatting codes were not interpreted). Fix
this by un-indenting the problem lines.

Signed-off-by: Sitsofe Wheeler 
---
 qemu-nbd.texi | 58 +-
 1 file changed, 29 insertions(+), 29 deletions(-)

diff --git a/qemu-nbd.texi b/qemu-nbd.texi
index 46fd483..26cc985 100644
--- a/qemu-nbd.texi
+++ b/qemu-nbd.texi
@@ -13,56 +13,56 @@ Export QEMU disk image using NBD protocol.
 @c man begin OPTIONS
 @table @option
 @item @var{filename}
- is a disk image filename
+is a disk image filename
 @item -p, --port=@var{port}
-  port to listen on (default @samp{10809})
+port to listen on (default @samp{10809})
 @item -o, --offset=@var{offset}
-  offset into the image
+offset into the image
 @item -b, --bind=@var{iface}
-  interface to bind to (default @samp{0.0.0.0})
+interface to bind to (default @samp{0.0.0.0})
 @item -k, --socket=@var{path}
-  Use a unix socket with path @var{path}
+Use a unix socket with path @var{path}
 @item -f, --format=@var{format}
-  Set image format as @var{format}
+Set image format as @var{format}
 @item -r, --read-only
-  export read-only
+export read-only
 @item -P, --partition=@var{num}
-  only expose partition @var{num}
+only expose partition @var{num}
 @item -s, --snapshot
-  use @var{filename} as an external snapshot, create a temporary
-  file with backing_file=@var{filename}, redirect the write to
-  the temporary one
+use @var{filename} as an external snapshot, create a temporary
+file with backing_file=@var{filename}, redirect the write to
+the temporary one
 @item -l, --load-snapshot=@var{snapshot_param}
-  load an internal snapshot inside @var{filename} and export it
-  as an read-only device, @var{snapshot_param} format is
-  'snapshot.id=[ID],snapshot.name=[NAME]' or '[ID_OR_NAME]'
+load an internal snapshot inside @var{filename} and export it
+as an read-only device, @var{snapshot_param} format is
+'snapshot.id=[ID],snapshot.name=[NAME]' or '[ID_OR_NAME]'
 @item -n, --nocache
 @itemx --cache=@var{cache}
-  set cache mode to be used with the file.  See the documentation of
-  the emulator's @code{-drive cache=...} option for allowed values.
+set cache mode to be used with the file.  See the documentation of
+the emulator's @code{-drive cache=...} option for allowed values.
 @item --aio=@var{aio}
-  choose asynchronous I/O mode between @samp{threads} (the default)
-  and @samp{native} (Linux only).
+choose asynchronous I/O mode between @samp{threads} (the default)
+and @samp{native} (Linux only).
 @item --discard=@var{discard}
-  toggles whether @dfn{discard} (also known as @dfn{trim} or @dfn{unmap})
-  requests are ignored or passed to the filesystem.  The default is no
-  (@samp{--discard=ignore}).
+toggles whether @dfn{discard} (also known as @dfn{trim} or @dfn{unmap})
+requests are ignored or passed to the filesystem.  The default is no
+(@samp{--discard=ignore}).
 @item -c, --connect=@var{dev}
-  connect @var{filename} to NBD device @var{dev}
+connect @var{filename} to NBD device @var{dev}
 @item -d, --disconnect
-  disconnect the specified device
+disconnect the specified device
 @item -e, --shared=@var{num}
-  device can be shared by @var{num} clients (default @samp{1})
+device can be shared by @var{num} clients (default @samp{1})
 @item -f, --format=@var{fmt}
-  force block driver for format @var{fmt} instead of auto-detecting
+force block driver for format @var{fmt} instead of auto-detecting
 @item -t, --persistent
-  don't exit on the last connection
+don't exit on the last connection
 @item -v, --verbose
-  display extra debugging information
+display extra debugging information
 @item -h, --help
-  display this help and exit
+display this help and exit
 @item -V, --version
-  output version information and exit
+output version information and exit
 @end table
 
 @c man end
-- 
2.4.3

Re: [Qemu-devel] [PATCH v2 1/4] Add Error **errp for xen_host_pci_device_get()

2016-01-04 Thread Cao jin




On 01/04/2016 11:15 PM, Stefano Stabellini wrote:

On Sun, 27 Dec 2015, Cao jin wrote:

To catch the error msg. Also modify the caller

Signed-off-by: Cao jin 


This looks much better, thanks.



[...]


-int xen_host_pci_device_get(XenHostPCIDevice *d, uint16_t domain,
-uint8_t bus, uint8_t dev, uint8_t func)
+void xen_host_pci_device_get(XenHostPCIDevice *d, uint16_t domain,
+uint8_t bus, uint8_t dev, uint8_t func,
+Error **errp)
  {
  unsigned int v;
-int rc = 0;

  d->config_fd = -1;
  d->domain = domain;
@@ -353,43 +360,48 @@ int xen_host_pci_device_get(XenHostPCIDevice *d, uint16_t domain,
  d->dev = dev;
  d->func = func;

-rc = xen_host_pci_config_open(d);
-if (rc) {
+xen_host_pci_config_open(d, errp);
+if (*errp) {


I think that errp could be NULL, therefore the right way to do this is:

 Error *err = NULL;
 foo(arg, &err);
 if (err) {
 handle the error...
 error_propagate(errp, err);
 }

see the comment at the beginning of include/qapi/error.h.



Hi stefano,

I read that comment and noticed a few things that may be relevant:

Regarding "errp could be NULL": I think this applies if we are in a
.realize() function, where *errp* may indeed be NULL. Here, though, we are
in a callee of .realize(). We defined a local variable, Error *local_err =
NULL, in .realize() and passed it to all the callees, so in theory *errp*
won't be NULL. So IMHO the pattern you describe above is suitable in
.realize(), and that is how I did it there.


The comment also says:

 * Receive an error and pass it on to the caller:
 * Error *err = NULL;
 * foo(arg, &err);
 * if (err) {
 * handle the error...
 * error_propagate(errp, err);
 * }
 * where Error **errp is a parameter, by convention the last one.

If I understand the last sentence correctly, the Error **errp in the
.realize() prototype is *the last one*, so we would call
error_propagate(errp, err) only in .realize().


The comment also says:

 * But when all you do with the error is pass it on, please use
 * foo(arg, errp);
 * for readability."

We just pass the error on in all the callees, so I guess I also did as the
comment suggests?


What do you think?

[...]
--
Yours Sincerely,

Cao Jin

Re: [Qemu-devel] [PATCH 0/2] Fix some coverity reported defects

2016-01-04 Thread Gerd Hoffmann

On Mi, 2015-12-23 at 14:39 +0530, Bandan Das wrote:
> The first change replaces QLIST_FOREACH with the safe variant
> and the second was incorrectly using MTPObject * in the trace function
> after freeing it.
> 
> Bandan Das (2):
>   usb-mtp: use safe variant when cleaning events list
>   usb-mtp: fix call to trace function
> 
>  hw/usb/dev-mtp.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 

Added to usb patch queue.

thanks,
  Gerd

Re: [Qemu-devel] [vfio-users] [PATCH v2 1/3] input: add qemu_input_qcode_to_linux + qemu_input_linux_to_qcode

2016-01-04 Thread Gerd Hoffmann

  Hi,

> like this?
> 
> cgroup_controllers = [ "cpu", "memory", "blkio", "cpuset", "cpuacct" ]

yes (+libvirtd restart so it re-reads the config).

cheers,
  Gerd

Re: [Qemu-devel] [vfio-users] [PATCH v2 1/3] input: add qemu_input_qcode_to_linux + qemu_input_linux_to_qcode

2016-01-04 Thread Gerd Hoffmann

On Mo, 2016-01-04 at 13:19 +, Jonathan Scruggs wrote:
> Oh. I just changed /dev/input/eventx (replace x with correct number
> for my devices) to permissions of 666 and it worked. I guess I had to
> change the conf file and change the permissions. Is there a way to
> make the devices work with qemu? The permission user is root and group
> of input for all the eventx devices. Do I need a udev script or is
> there a qemu user that can be added to the group of input?

I'm using chmod 666; adding the qemu user to the input group should work
too.

cheers,
  Gerd

Re: [Qemu-devel] [PATCH] sdhci: add quirk property for card insert interrupt status on Raspberry Pi

2016-01-04 Thread Peter Crosthwaite

On Mon, Jan 4, 2016 at 2:12 PM, Andrew Baumann
 wrote:
>> From: Peter Crosthwaite [mailto:crosthwaitepe...@gmail.com]
>> Sent: Thursday, 31 December 2015 21:38
>> On Thu, Dec 31, 2015 at 1:40 PM, Andrew Baumann
>>  wrote:
>> > This quirk is a workaround for the following hardware behaviour, on
>> > which UEFI (specifically, the bootloader for Windows on Pi2) depends:
>> >
>> > 1. at boot with an SD card present, the interrupt status/enable
>> >registers are initially zero
>> > 2. upon enabling it in the interrupt enable register, the card insert
>> >bit in the interrupt status register is immediately set
>> > 3. after a subsequent controller reset, the card insert interrupt does
>> >not fire, even if enabled in the interrupt enable register
>> >
>>
>> This is a baffling symptom. Does prnsts card ejection state fully work
>> with physical card ejections and insertions both before and after the
>> subsequent controller reset?
>
> I just tested this, by polling prnsts and intsts in a tight loop at board 
> startup. At power on with a card inserted, prnsts reads 1FFF. Subsequent 
> removal of the card, re-insertion etc. does not change its value.

Does either the subsequent reset or the interrupt ack change it? I'm
assuming it is stuck permanently at 1fff.

> After enabling interrupts, I reliably see a card insert interrupt in intsts.
> If I then write zero to the interrupt enable register, the pending card insert
> interrupt remains, which seems to dispel the "mask on read" theory. Once acked
> or reset, the card insert interrupt never recurs. I never saw a card removal
> interrupt.
>

So

* interrupt status is initially 0
* writing one to enable triggers the ghost
* it can only be cleared with a status ack
* you can never get a second ghost

This means you have two latches, as there is no way it can be driven by
the raw pin state; otherwise it would recur.
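
The two-latch reading of those observations can be sketched in C. This is
only an illustration under that assumption, not code from sdhci.c: the
struct, the function names and the 0x0040 bit value are stand-ins, and the
behaviour of a soft reset issued before the interrupt is ever enabled is a
guess, since nobody in this thread tested it.

```c
#include <stdbool.h>
#include <stdint.h>

#define CARD_INSERT 0x0040u  /* assumed position of the card-insert bit */

/* Hypothetical two-latch model; none of these names exist in sdhci.c. */
typedef struct {
    bool     edge_latch;   /* first latch: the single edge seen at power-on */
    uint32_t int_enable;   /* interrupt enable register */
    uint32_t int_status;   /* second latch: interrupt status register */
} GhostModel;

/* Power-on reset: the pin sees its one and only edge. */
static void ghost_por(GhostModel *m)
{
    m->edge_latch = true;
    m->int_enable = 0;
    m->int_status = 0;
}

/* Enabling the interrupt releases a latched edge into the status register. */
static void ghost_write_enable(GhostModel *m, uint32_t val)
{
    m->int_enable = val;
    if ((val & CARD_INSERT) && m->edge_latch) {
        m->edge_latch = false;
        m->int_status |= CARD_INSERT;
    }
}

/* Status is write-one-to-clear; disabling alone does not clear it. */
static void ghost_ack(GhostModel *m, uint32_t val)
{
    m->int_status &= ~val;
}

/* Soft reset clears the registers but produces no new edge (assumption:
 * it leaves an unconsumed edge latch untouched). */
static void ghost_soft_reset(GhostModel *m)
{
    m->int_enable = 0;
    m->int_status = 0;
}
```

Calling ghost_por(), then ghost_write_enable() with CARD_INSERT, raises the
ghost once; after ghost_ack() or ghost_soft_reset() no later enable brings
it back, which matches the four bullet points above.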

> I did once see a card interrupt (0x100, i.e. the one that comes from the card 
> itself, not the controller) after re-inserting the card, but I think that's 
> irrelevant.
>
> It's impossible to boot the Pi without having a card inserted (well, maybe 
> with a jtag debugger), but I did try inserting the card around 0.5s after 
> applying power, and the results were the same.
>
> So, without the prnsts bits, I can't confirm or deny your theory about 
> debouncing logic,

It's disproven if you can never observe something other than 1FFF
for prnsts anyway.

> but either way there is a reliable ghost of a card insertion interrupt that 
> is signalled at power on, and remains pending until it is either acked or the 
> controller reset, after which point it never recurs. And I'd really like to 
> model that somehow without making a mess of sdhci.c :) Any ideas?
>

Ok, I think it can be explained as a bad top-level connection as
follows. The pin is mis-connected in such a way that it sees one edge
on the POR reset and never sees any action again. The controller
considers this pin edge-triggered and has the penning quirk as well,
that is, it saves edge interrupts until they are enabled and then
releases them singly to the status register.

This doesn't explain why the controller doesn't see the interrupt on
the soft reset, but perhaps that is explained by the spec, as I don't
see anywhere that says that the interrupt has to retrigger for a
constantly inserted card over a controller reset. Might be
implementation specific.

Looking at the set_cb stuff, I think the guard on your original quirk
implementation may be missing for the sd_set_cb() in sdhci_initfn().
If this guard were added that quirk would be more complete, as
currently it probably is seeing action on changes of state.

I think the way to correct the original quirk is to:

* make both sd_set_cb()'s conditional
* manually call insert_eject_cb() on the POR reset (call the CB
instead of register it).

Note that sdhci has no device::reset callback. You could add this to
implement your POR reset.

You then have the problem of the prnsts register, which I assume is
getting blasted by the reset memset. That can be managed by
specifically preserving those two bits of prnsts through the reset
(with an accompanying comment that this is needed for your quirk).

Your patch as-is here doesn't seem to address the penning behaviour
(where the interrupt status remains clear until it is enabled), maybe
that can be added as a second quirk if needed later?

Regards,
Peter

> Andrew

Re: [Qemu-devel] [XenGT][IGVT-g] Device model creation failed

2016-01-04 Thread Tian, Kevin

For the error below:

[ 5023.070461] vGT error:(g2h_gm_range:1660) VM(21): invalid address range: g_addr(0x1000), size(0x1000)

Likely you didn’t use a XenGT-aware i915 driver inside the VM, which is
required e.g. to handle address space ballooning. A simple fix is to copy
the same Dom0 kernel/initrd/modules into the guest image.

Thanks
Kevin

From: Oleksii Kurochko [mailto:oleksii.kuroc...@globallogic.com]
Sent: Sunday, January 03, 2016 3:16 AM
To: Tian, Kevin
Cc: igv...@lists.01.org; igv...@ml01.01.org; qemu-devel; Gerd Hoffmann; Li, 
Susie; Dong, Eddie; xen-de...@lists.xen.org
Subject: Re: [XenGT][IGVT-g] Device model creation failed

Hello.
I've tried the byt_experimental branch and got a different result.
With vgt=1 it failed after some time, with the following log:
[ 4859.380332] vGT info:(create_vgt_instance:118) vm_id=21, low_gm_sz=128MB, 
high_gm_sz=384MB, fence_sz=4, vgt_primary=1
[ 4859.389889] vGT info:(create_vgt_instance:170) Virtual GTT size: 0x20
[ 4859.393284] VM21 Ring0 context_save_area is allocated at gm(f90d000)
[ 4859.416542] VM21 Ring1 context_save_area is allocated at gm(f94d000)
[ 4859.416916] VM21 Ring2 context_save_area is allocated at gm(f98d000)
[ 4859.417492] vGT info:(create_vgt_instance:206) aperture: [0xc780, 
0xcf7f] guest [0xc780, 0xcf7f] va(0xc9001820)
[ 4859.417971] vGT info:(create_vgt_instance:217) GM: [0x780, 0xf7f], 
[0x2800, 0x3fff], guest[0x780, 0xf7f], [0x2800, 0x3fff]
[ 4859.422914] vGT info:(create_vgt_instance:254) filling VGT_PVINFO_PAGE for 
dom21:
[ 4859.422914]visable_gm_base=0x780, size=0x800
[ 4859.422914]hidden_gm_base=0x2800, size=0x1800
[ 4859.422914]fence_base=4, num=4
[ 4859.423521] vGT info:(vgt_hvm_io_req_handler:795) Received a IOREQ w/o vcpu 
target
[ 4859.424042] vGT info:(vgt_hvm_io_req_handler:796) Possible a false request 
from event binding
[ 4859.424555] vGT(1): create debugfs node: virtual_mmio_space
[ 4859.424713] vGT(1): create debugfs node: shadow_mmio_space
[ 4859.424883] vGT(1): create debugfs node: frame_buffer_format
[ 4859.425037] vGT(1): create debugfs node: frame_buffer_format
[ 4859.425974] vGT info:(vgt_emulation_thread:530) start kthread for VM21
[ 4859.426231] vGT info:(vgt_propagate_edid:770) EDID_PROPAGATE: Clear PORT_A 
for vm 21
[ 4859.426508] vGT info:(vgt_propagate_edid:770) EDID_PROPAGATE: Clear PORT_B 
for vm 21
[ 4859.426759] vGT info:(vgt_propagate_edid:770) EDID_PROPAGATE: Clear PORT_C 
for vm 21
[ 4859.426992] vGT info:(vgt_propagate_edid:770) EDID_PROPAGATE: Clear PORT_D 
for vm 21
[ 4859.427224] vGT info:(vgt_propagate_edid:770) EDID_PROPAGATE: Clear PORT_E 
for vm 21
[ 4859.518800] vGT info:(vgt_vport_connection_store:595) Monitor detection: 
PORT_A  is disconnected
[ 4859.567255] vGT info:(vgt_vport_connection_store:595) Monitor detection: 
PORT_B  is disconnected
[ 4859.631077] vGT info:(vgt_vport_connection_store:595) Monitor detection: 
PORT_C  is disconnected
[ 4859.682043] vGT info:(vgt_vport_connection_store:595) Monitor detection: 
PORT_D  is disconnected
[ 4859.734223] vGT warning:(pch_adpa_mmio_write:1174) HOTPLUG_FORCE_TRIGGER is 
set while VGA is enabled!
[ 4859.775194] vGT info:(vgt_vport_connection_store:583) Monitor detection: 
PORT_E  is connected
[ 4860.111434] add_map: domid=21 gfn_s=0xc7800 mfn_s=0xc7800 nr_mfns=0x8000
[ 4933.642429] vGT warning:(vgt_hvm_vmem_init:301) VM21: vmem_sz=0xf000!
[ 4943.772119] vGT warning:(vgt_emulate_read:355) vGT: untracked MMIO read: 
vm_id(21), offset=0x4094,len=4, val=0x0!!! base_off=0x0
[ 4945.801949] vGT warning:(vgt_emulate_read:355) vGT: untracked MMIO read: 
vm_id(21), offset=0x182110,len=4, val=0x0!!! base_off=0x2110
[ 4945.803030] vGT warning:(vgt_emulate_write:457) vGT: untracked MMIO write: 
vm_id(21), offset=0x182110,len=4, val=0x0!!! base_off=0x2110
[ 4945.972045] vGT warning:(vgt_emulate_read:355) vGT: untracked MMIO read: 
vm_id(21), offset=0x182110,len=4, val=0x0!!! base_off=0x2110
[ 4945.972649] vGT warning:(vgt_emulate_write:457) vGT: untracked MMIO write: 
vm_id(21), offset=0x182110,len=4, val=0x1!!! base_off=0x2110
[ 4946.038157] vGT warning:(vgt_emulate_read:355) vGT: untracked MMIO read: 
vm_id(21), offset=0x61204,len=4, val=0x0!!! base_off=0x0
[ 4946.039587] vGT warning:(vgt_emulate_write:457) vGT: untracked MMIO write: 
vm_id(21), offset=0x61204,len=4, val=0xabcd!!! base_off=0x0
[ 4983.876827] vGT info:(vgt_handle_default_event_virt:947) IRQ: VM(21) receive 
event (Blitter Command Streamer MI USER INTERRUPT)
[ 5023.070461] vGT error:(g2h_gm_range:1660) VM(21): invalid address range: 
g_addr(0x1000), size(0x1000)
[ 5023.070776] Assert at drivers/xen/vgt/aperture_gm.c line 70
[ 5023.071049] vGT warning:(mmio_g2h_gmadr:70) Killing VM21

The domain was created as follows:
sudo xl -vvv create ubuntu.hvm
Parsing config from ubuntu.hvm
WARNING: specifying "tsc_mode" as an integer is deprecated. Please use the 
named parameter variant. e.g. tsc_mode="default"
WARNING: ignoring "kernel" d

Re: [Qemu-devel] [PATCH v2 1/4] Add Error **errp for xen_host_pci_device_get()

2016-01-04 Thread Cao jin




On 01/04/2016 11:15 PM, Stefano Stabellini wrote:

On Sun, 27 Dec 2015, Cao jin wrote:

To catch the error msg. Also modify the caller

Signed-off-by: Cao jin 


This looks much better, thanks.



[...]


-int xen_host_pci_device_get(XenHostPCIDevice *d, uint16_t domain,
-uint8_t bus, uint8_t dev, uint8_t func)
+void xen_host_pci_device_get(XenHostPCIDevice *d, uint16_t domain,
+uint8_t bus, uint8_t dev, uint8_t func,
+Error **errp)
  {
  unsigned int v;
-int rc = 0;

  d->config_fd = -1;
  d->domain = domain;
@@ -353,43 +360,48 @@ int xen_host_pci_device_get(XenHostPCIDevice *d, uint16_t domain,
  d->dev = dev;
  d->func = func;

-rc = xen_host_pci_config_open(d);
-if (rc) {
+xen_host_pci_config_open(d, errp);
+if (*errp) {


I think that errp could be NULL, therefore the right way to do this is:

 Error *err = NULL;
 foo(arg, &err);
 if (err) {
 handle the error...
 error_propagate(errp, err);
 }

see the comment at the beginning of include/qapi/error.h.



Thanks for the reminder. I hadn't seen the comment in error.h before; now
I understand why so many people use the style you mentioned. I will fix it
in the next version, along with the comments on the other patch.
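
For reference, the pattern Stefano points to can be sketched with a few
stand-in definitions. This is not QEMU code: the Error type and the *_demo
helpers below are simplified, hypothetical versions of the real API in
include/qapi/error.h, kept only detailed enough to show why the local
variable makes a NULL errp safe.

```c
#include <stddef.h>
#include <stdlib.h>

/* Minimal stand-ins for QEMU's Error API, just enough to show the pattern. */
typedef struct Error { const char *msg; } Error;

/* Simplified setter: a NULL errp means the caller ignores errors. */
static void error_set_demo(Error **errp, const char *msg)
{
    if (errp) {
        Error *e = malloc(sizeof(*e));
        if (!e) {
            abort();
        }
        e->msg = msg;
        *errp = e;
    }
}

/* Simplified propagate: hand err to the caller, or drop it if unwanted. */
static void error_propagate_demo(Error **dst, Error *err)
{
    if (dst) {
        *dst = err;
    } else {
        free(err);
    }
}

/* Hypothetical callee that fails on request. */
static void config_open_demo(int fail, Error **errp)
{
    if (fail) {
        error_set_demo(errp, "open failed");
    }
}

/* Returns 1 if an error was seen. Only the local err is dereferenced,
 * so this is safe even when the caller passes errp == NULL. */
static int device_get_demo(int fail, Error **errp)
{
    Error *err = NULL;

    config_open_demo(fail, &err);
    if (err) {
        error_propagate_demo(errp, err);
        return 1;
    }
    return 0;
}
```

With `if (*errp)` instead, a call such as device_get_demo(1, NULL) would
dereference a NULL pointer; with the local variable the failure is still
detected and the error is simply discarded.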


[...]

--
Yours Sincerely,

Cao jin

Re: [Qemu-devel] [PATCH v3 1/1] xlnx-zynqmp: Add support for high DDR memory regions

2016-01-04 Thread Peter Crosthwaite

On Mon, Jan 4, 2016 at 12:47 PM, Alistair Francis
 wrote:
> The Xilinx ZynqMP SoC and EP108 board support three memory regions:
>  - A 2GB region starting at 0
>  - A 32GB region starting at 32GB
>  - A 256GB region starting at 768GB
>
> This patch adds support for the first two memory regions, which are
> automatically created based on the size specified by the QEMU memory
> command line argument.
>
> On hardware the physical memory region is one continuous region; it is then
> mapped into the three different regions by the DDRC. As we don't model the
> DDRC this is done at startup by QEMU. The board creates the memory region and
> then passes that memory region to the SoC. The SoC then maps the memory
> regions.
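
As a rough sketch of the split described above (not the code from this
patch), one continuous RAM size can be divided between the low and high
windows like this. The macro, struct and function names are made up for
illustration; only the 2GB and 32GB figures come from the description.

```c
#include <assert.h>
#include <stdint.h>

#define GiB (1024ULL * 1024 * 1024)

/* Sizes taken from the description above; the names are illustrative. */
#define DDR_LOW_MAX   (2 * GiB)    /* region starting at 0 */
#define DDR_HIGH_MAX  (32 * GiB)   /* region starting at 32GB */
#define DDR_HIGH_BASE (32 * GiB)   /* where the high part would be mapped */

typedef struct {
    uint64_t low_size;    /* bytes mapped at address 0 */
    uint64_t high_size;   /* bytes mapped at DDR_HIGH_BASE */
} DdrSplit;

/* Fill the low window first, then spill the remainder into the high one. */
static DdrSplit ddr_split(uint64_t ram_size)
{
    DdrSplit s;

    assert(ram_size <= DDR_LOW_MAX + DDR_HIGH_MAX);
    s.low_size = ram_size < DDR_LOW_MAX ? ram_size : DDR_LOW_MAX;
    s.high_size = ram_size - s.low_size;
    return s;
}
```

For example, a 4GB machine would end up with 2GB at address 0 and the
remaining 2GB at the 32GB mark.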
>
> Signed-off-by: Alistair Francis 
> ---
> V3:
>  - Assert on the RAM sizes
>  - Remove ram_size property
>  - General fixes
> V2:
>  - Create one continuous memory region and pass it to the SoC
>
> Also, the Xilinx ZynqMP TRM is available at:
> http://www.xilinx.com/products/silicon-devices/soc/zynq-ultrascale-mpsoc.html?resultsTablePreSelect=documenttype:User%20Guides#documentation
>
>  hw/arm/xlnx-ep108.c  | 38 --
>  hw/arm/xlnx-zynqmp.c | 36 
>  include/hw/arm/xlnx-zynqmp.h | 13 +
>  3 files changed, 69 insertions(+), 18 deletions(-)
>
> diff --git a/hw/arm/xlnx-ep108.c b/hw/arm/xlnx-ep108.c
> index 85b978f..d55663b 100644
> --- a/hw/arm/xlnx-ep108.c
> +++ b/hw/arm/xlnx-ep108.c
> @@ -25,9 +25,6 @@ typedef struct XlnxEP108 {
>  MemoryRegion ddr_ram;
>  } XlnxEP108;
>
> -/* Max 2GB RAM */
> -#define EP108_MAX_RAM_SIZE 0x8000ull
> -
>  static struct arm_boot_info xlnx_ep108_binfo;
>
>  static void xlnx_ep108_init(MachineState *machine)
> @@ -35,20 +32,12 @@ static void xlnx_ep108_init(MachineState *machine)
>  XlnxEP108 *s = g_new0(XlnxEP108, 1);
>  Error *err = NULL;
>
> -object_initialize(&s->soc, sizeof(s->soc), TYPE_XLNX_ZYNQMP);
> -object_property_add_child(OBJECT(machine), "soc", OBJECT(&s->soc),
> -  &error_abort);
> -
> -object_property_set_bool(OBJECT(&s->soc), true, "realized", &err);
> -if (err) {
> -error_report("%s", error_get_pretty(err));
> -exit(1);
> -}
> -
> -if (machine->ram_size > EP108_MAX_RAM_SIZE) {
> +/* Create the memory region to pass to the SoC */
> +if (machine->ram_size > XLNX_ZYNQMP_MAX_RAM_SIZE) {
>  error_report("WARNING: RAM size " RAM_ADDR_FMT " above max supported, "
> - "reduced to %llx", machine->ram_size, EP108_MAX_RAM_SIZE);
> -machine->ram_size = EP108_MAX_RAM_SIZE;
> + "reduced to %llx", machine->ram_size,
> + XLNX_ZYNQMP_MAX_RAM_SIZE);
> +machine->ram_size = XLNX_ZYNQMP_MAX_RAM_SIZE;
>  }
>
>  if (machine->ram_size < 0x0800) {
> @@ -56,9 +45,22 @@ static void xlnx_ep108_init(MachineState *machine)
>   machine->ram_size);
>  }
>
> -memory_region_allocate_system_memory(&s->ddr_ram, NULL, "ddr-ram",
> +memory_region_allocate_system_memory(&s->ddr_ram, NULL,
> + "ddr-ram",

Whitespace change unneeded.

>   machine->ram_size);
> -memory_region_add_subregion(get_system_memory(), 0, &s->ddr_ram);
> +
> +object_initialize(&s->soc, sizeof(s->soc), TYPE_XLNX_ZYNQMP);
> +object_property_add_child(OBJECT(machine), "soc", OBJECT(&s->soc),
> +  &error_abort);
> +
> +object_property_set_link(OBJECT(&s->soc), OBJECT(&s->ddr_ram),
> + "ddr-ram", &error_abort);
> +
> +object_property_set_bool(OBJECT(&s->soc), true, "realized", &err);
> +if (err) {
> +error_report("%s", error_get_pretty(err));
> +exit(1);
> +}
>
>  xlnx_ep108_binfo.ram_size = machine->ram_size;
>  xlnx_ep108_binfo.kernel_filename = machine->kernel_filename;
> diff --git a/hw/arm/xlnx-zynqmp.c b/hw/arm/xlnx-zynqmp.c
> index 87553bb..e749fd0 100644
> --- a/hw/arm/xlnx-zynqmp.c
> +++ b/hw/arm/xlnx-zynqmp.c
> @@ -90,6 +90,11 @@ static void xlnx_zynqmp_init(Object *obj)
>&error_abort);
>  }
>
> +object_property_add_link(obj, "ddr-ram", TYPE_MEMORY_REGION,
> + (Object **)&s->ddr_ram,
> + qdev_prop_allow_set_link_before_realize,
> + OBJ_PROP_LINK_UNREF_ON_RELEASE, &error_abort);
> +
>  object_initialize(&s->gic, sizeof(s->gic), TYPE_ARM_GIC);
>  qdev_set_parent_bus(DEVICE(&s->gic), sysbus_get_default());
>
> @@ -120,9 +125,40 @@ static void xlnx_zynqmp_realize(DeviceState *dev, Error **errp)
>  MemoryRegion *system_memory = get_system_memory();
>  uint8_t i;
>  const char *boot_cpu = s->boot_cpu ? s->boot_cpu : "apu-cpu[0]";
> +ram_addr_t ddr_low_size, ddr_high_s

Re: [Qemu-devel] [PATCH v4 11/14] vmdk: Return extent's file in bdrv_get_block_status

2016-01-04 Thread Fam Zheng

On Mon, 01/04 21:48, Max Reitz wrote:
> On 24.12.2015 06:50, Fam Zheng wrote:
> > Reviewed-by: Stefan Hajnoczi 
> > Signed-off-by: Fam Zheng 
> > ---
> >  block/vmdk.c | 11 +--
> >  1 file changed, 5 insertions(+), 6 deletions(-)
> > 
> > diff --git a/block/vmdk.c b/block/vmdk.c
> > index f5a56fd..b60a5af 100644
> > --- a/block/vmdk.c
> > +++ b/block/vmdk.c
> > @@ -1265,6 +1265,7 @@ static int64_t coroutine_fn 
> > vmdk_co_get_block_status(BlockDriverState *bs,
> >   0, 0);
> >  qemu_co_mutex_unlock(&s->lock);
> >  
> > +index_in_cluster = vmdk_find_index_in_cluster(extent, sector_num);
> >  switch (ret) {
> >  case VMDK_ERROR:
> >  ret = -EIO;
> > @@ -1276,15 +1277,13 @@ static int64_t coroutine_fn 
> > vmdk_co_get_block_status(BlockDriverState *bs,
> >  ret = BDRV_BLOCK_ZERO;
> >  break;
> >  case VMDK_OK:
> > -ret = BDRV_BLOCK_DATA;
> > -if (extent->file == bs->file && !extent->compressed) {
> > -ret |= BDRV_BLOCK_OFFSET_VALID | offset;
> > -}
> > -
> > +ret = BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID;
> > +ret |= (offset + (index_in_cluster << BDRV_SECTOR_BITS))
> > +& BDRV_BLOCK_OFFSET_MASK;
> > +*file = extent->file->bs;
> 
> What if the extent is compressed?
> 

You're right, the offset shouldn't be set if compressed. Will fix.
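
For reference, here is a minimal sketch of the intended behaviour, assuming the fix simply skips the offset for compressed extents. The SKETCH_* constants and sketch_block_status() are hypothetical stand-ins for this mail, not the real definitions from include/block/block.h:

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in values for illustration only; the real flags are defined by
 * QEMU's block layer and are not copied from there. */
#define SKETCH_BLOCK_DATA          0x01
#define SKETCH_BLOCK_OFFSET_VALID  0x04
#define SKETCH_SECTOR_BITS         9
#define SKETCH_OFFSET_MASK         (~((int64_t)(1 << SKETCH_SECTOR_BITS) - 1))

/* Build the status value for an allocated cluster.  The byte offset is
 * reported only when the extent is not compressed, because a compressed
 * cluster has no 1:1 mapping into the extent file. */
static int64_t sketch_block_status(int64_t cluster_offset,
                                   int64_t index_in_cluster,
                                   bool compressed)
{
    int64_t ret = SKETCH_BLOCK_DATA;

    if (!compressed) {
        ret |= SKETCH_BLOCK_OFFSET_VALID;
        ret |= (cluster_offset + (index_in_cluster << SKETCH_SECTOR_BITS))
               & SKETCH_OFFSET_MASK;
    }
    return ret;
}
```

With a cluster at byte offset 0x10000 and sector index 3, the uncompressed case carries the offset 0x10600 plus both flags, while the compressed case carries only the data flag.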

Thanks!

Fam

Re: [Qemu-devel] [PATCH v4 01/14] block: Add "file" output parameter to block status query functions

2016-01-04 Thread Fam Zheng

On Mon, 01/04 22:00, Max Reitz wrote:
> On 24.12.2015 06:50, Fam Zheng wrote:
> > The added parameter can be used to return the BDS pointer which the
> > valid offset is referring to. It's value should be ignored unless
> 
> *Its
> 
> > BDRV_BLOCK_OFFSET_VALID in ret is set.
> > 
> > Until block drivers fill in the right value, let's clear it explicitly
> > right before calling .bdrv_get_block_status.
> > 
> > Reviewed-by: Stefan Hajnoczi 
> > Signed-off-by: Fam Zheng 
> > ---
> >  block/io.c| 42 --
> >  block/iscsi.c |  6 --
> >  block/mirror.c|  3 ++-
> >  block/parallels.c |  2 +-
> >  block/qcow.c  |  2 +-
> >  block/qcow2.c |  2 +-
> >  block/qed.c   |  3 ++-
> >  block/raw-posix.c |  3 ++-
> >  block/raw_bsd.c   |  3 ++-
> >  block/sheepdog.c  |  2 +-
> >  block/vdi.c   |  2 +-
> >  block/vmdk.c  |  2 +-
> >  block/vpc.c   |  2 +-
> >  block/vvfat.c |  2 +-
> >  include/block/block.h |  6 --
> >  include/block/block_int.h |  3 ++-
> >  qemu-img.c|  7 +--
> >  17 files changed, 59 insertions(+), 33 deletions(-)
> > 
> 
> [...]
> 
> > diff --git a/include/block/block.h b/include/block/block.h
> > index db8e096..70b4984 100644
> > --- a/include/block/block.h
> > +++ b/include/block/block.h
> 
> The comment explaining BDRV_BLOCK_OFFSET_VALID should be changed
> accordingly (you could also say "fixed", because apparently it wasn't
> always bs->file; sometimes it was bs itself (in case of raw-posix, iscsi
> and sheepdog)).

Yes, good point! Will fix this, and the typo above.

Fam

Re: [Qemu-devel] [PATCH v5 0/4] Extend TPM support with a QEMU-external TPM

2016-01-04 Thread Stefan Berger

"Xu, Quan"  wrote on 01/04/2016 08:26:03 PM:

> Date: 01/04/2016 08:26 PM
> Subject: RE: [PATCH v5 0/4] Extend TPM support with a QEMU-external TPM
> 
> On January 04 2016 11:23 PM,  wrote:
> > The following series of patches extends TPM support with an external TPM that
> > offers a Linux CUSE (character device in userspace) interface. This TPM lets
> > each VM access its own private vTPM.
> > The CUSE TPM supports suspend/resume and migration. Much out-of-band
> > functionality necessary to control the CUSE TPM is implemented using ioctls.
> > 
> 
> Stefan,
> it is a good solution. Could you share more about this architecture?
> If you have an existing doc.

The architecture is as follows:

An external tool (e.g., libvirt) starts the CUSE TPM, which then provides
/dev/vtpm- for the QEMU VM to talk to. QEMU receives the open file
descriptor or device name on the command line. All TPM commands from
the guest go straight into /dev/vtpm- via the read()/write() interface,
just like with the passthrough. Out-of-band control, which we need for
proper vTPM emulation, such as setting the locality, getting and setting
the state blobs of the vTPM following suspend/resume/snapshotting/migration,
resetting the vTPM following a VM reset, and shutting down the vTPM process
following VM shutdown, is done through the ioctl interface. The ioctl
interface is defined in this file:

https://github.com/stefanberger/swtpm/blob/master/include/swtpm/tpm_ioctl.h

I do not have an existing doc but the github swtpm project contains a man 
page describing the ioctls:

https://github.com/stefanberger/swtpm/blob/master/man/man3/swtpm_ioctls.pod

I hope this helps us to make progress.

Thanks and regards,
   Stefan

Re: [Qemu-devel] [RFC PATCH 0/3] x86: Add support for guest DMA dirty page tracking

2016-01-04 Thread Alexander Duyck

On Mon, Jan 4, 2016 at 12:41 PM, Konrad Rzeszutek Wilk
 wrote:
> On Sun, Dec 13, 2015 at 01:28:09PM -0800, Alexander Duyck wrote:
>> This patch set is meant to be the guest side code for a proof of concept
>> involving leaving pass-through devices in the guest during the warm-up
>> phase of guest live migration.  In order to accomplish this I have added a
>
> What does that mean? 'warm-up-phase'?

It is the first phase in a pre-copy migration.
https://en.wikipedia.org/wiki/Live_migration

Basically in this phase all the memory is marked as dirty and then
copied.  Any memory that changes gets marked as dirty as well.
Currently DMA circumvents this as the user space dirty page tracking
isn't able to track DMA.
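
As a rough illustration of the problem, here is a toy model of one pre-copy pass. It is only a sketch: every sketch_* name is made up for this mail, one byte stands in for a whole page, and sketch_dma_mark_dirty() merely mirrors the idea of the dma_mark_dirty hook from the patch set rather than its actual code.

```c
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_NPAGES 8

struct sketch_guest {
    unsigned char mem[SKETCH_NPAGES];   /* one byte stands in for a page */
    bool dirty[SKETCH_NPAGES];          /* host-side dirty bitmap */
};

/* Start of migration: every page is considered dirty. */
static void sketch_start_migration(struct sketch_guest *g)
{
    for (size_t i = 0; i < SKETCH_NPAGES; i++) {
        g->dirty[i] = true;
    }
}

/* A CPU write faults on the write-protected page, so the host sees it. */
static void sketch_cpu_write(struct sketch_guest *g, size_t page,
                             unsigned char v)
{
    g->mem[page] = v;
    g->dirty[page] = true;
}

/* A device DMA write bypasses the CPU page tables: nothing is tracked. */
static void sketch_dma_write(struct sketch_guest *g, size_t page,
                             unsigned char v)
{
    g->mem[page] = v;
}

/* What the guest would do on unmap or sync-for-cpu to close the gap. */
static void sketch_dma_mark_dirty(struct sketch_guest *g, size_t page)
{
    g->dirty[page] = true;
}

/* One pre-copy pass: copy dirty pages to the destination and clear
 * their bits.  Returns the number of pages copied. */
static size_t sketch_precopy_pass(struct sketch_guest *g, unsigned char *dst)
{
    size_t copied = 0;

    for (size_t i = 0; i < SKETCH_NPAGES; i++) {
        if (g->dirty[i]) {
            dst[i] = g->mem[i];
            g->dirty[i] = false;
            copied++;
        }
    }
    return copied;
}
```

In this model a page written only by sketch_dma_write() is left stale on the destination until something calls sketch_dma_mark_dirty() for it, which is the gap the patch set tries to close.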

>> new function called dma_mark_dirty that will mark the pages associated with
>> the DMA transaction as dirty in the case of either an unmap or a
>> sync_.*_for_cpu where the DMA direction is either DMA_FROM_DEVICE or
>> DMA_BIDIRECTIONAL.  The pass-through device must still be removed before
>> the stop-and-copy phase, however allowing the device to be present should
>> significantly improve the performance of the guest during the warm-up
>> period.
>
> .. if the warm-up phase is short I presume? If the warm-up phase takes
> a long time (busy guest that is of 1TB size) it wouldn't help much as the
> tracking of these DMA's may be quite long?
>
>>
>> This current implementation is very preliminary and there are a number of
>> items still missing.  Specifically in order to make this a more complete
>> solution we need to support:
>> 1.  Notifying hypervisor that drivers are dirtying DMA pages received
>
> .. And somehow giving the hypervisor the GPFN so it can retain the PFN in
> the VT-d as long as possible.

Yes, what has happened is that the host went through and marked all
memory as read-only.  So trying to do any operation that requires
write access triggers a page fault which is then used by the host to
track pages that were dirtied.

>> 2.  Bypassing page dirtying when it is not needed.
>
> How would this work with a device doing DMA operations _after_ the
> migration?
> That is, the driver submits a DMA READ.. migrates away, device is unplugged,
> VT-d context is torn down - device does the DMA READ and gets a VT-d error...
>
> and what then? How should the device on the other host replay the DMA READ?

The device has to quiesce before the migration can occur.  We cannot
have any DMA mappings still open when we reach the stop-and-copy phase
of the migration.  The solution I have proposed here works for
streaming mappings but doesn't solve the case for things like
dma_alloc_coherent where a bidirectional mapping is maintained between
the CPU and the device.

>>
>> The two mechanisms referenced above would likely require coordination with
>> QEMU and as such are open to discussion.  I haven't attempted to address
>> them as I am not sure there is a consensus as of yet.  My personal
>> preference would be to add a vendor-specific configuration block to the
>> emulated pci-bridge interfaces created by QEMU that would allow us to
>> essentially extend shpc to support guest live migration with pass-through
>> devices.
>
> shpc?

That is kind of what I was thinking.  We basically need some mechanism
to allow for the host to ask the device to quiesce.  It has been
proposed to possibly even look at something like an ACPI interface
since I know ACPI is used by QEMU to manage hot-plug in the standard
case.

- Alex

[Qemu-devel] [PATCH v8 4/4] arm_mptimer: Convert to use ptimer

2016-01-04 Thread Dmitry Osipenko

The current ARM MPTimer implementation uses QEMUTimer for the actual timer.
This implementation isn't complete and mostly duplicates what the generic
ptimer already does well.

Conversion to ptimer brings the following benefits and fixes:
- Simple timer pausing implementation
- Fixes counter value preservation after stopping the timer
- Code simplification and reduction

Bump VMSD to version 3, since VMState is changed and is not compatible
with the previous implementation.

Signed-off-by: Dmitry Osipenko 
---
 hw/timer/arm_mptimer.c | 110 ++---
 include/hw/timer/arm_mptimer.h |   4 +-
 2 files changed, 49 insertions(+), 65 deletions(-)

diff --git a/hw/timer/arm_mptimer.c b/hw/timer/arm_mptimer.c
index 3e59c2a..c06da5e 100644
--- a/hw/timer/arm_mptimer.c
+++ b/hw/timer/arm_mptimer.c
@@ -19,8 +19,9 @@
  * with this program; if not, see .
  */
 
+#include "hw/ptimer.h"
 #include "hw/timer/arm_mptimer.h"
-#include "qemu/timer.h"
+#include "qemu/main-loop.h"
 #include "qom/cpu.h"
 
 /* This device implements the per-cpu private timer and watchdog block
@@ -47,28 +48,10 @@ static inline uint32_t timerblock_scale(TimerBlock *tb)
 return (((tb->control >> 8) & 0xff) + 1) * 10;
 }
 
-static void timerblock_reload(TimerBlock *tb, int restart)
-{
-if (tb->count == 0) {
-return;
-}
-if (restart) {
-tb->tick = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);
-}
-tb->tick += (int64_t)tb->count * timerblock_scale(tb);
-timer_mod(tb->timer, tb->tick);
-}
-
 static void timerblock_tick(void *opaque)
 {
 TimerBlock *tb = (TimerBlock *)opaque;
 tb->status = 1;
-if (tb->control & 2) {
-tb->count = tb->load;
-timerblock_reload(tb, 0);
-} else {
-tb->count = 0;
-}
 timerblock_update_irq(tb);
 }
 
@@ -76,21 +59,11 @@ static uint64_t timerblock_read(void *opaque, hwaddr addr,
 unsigned size)
 {
 TimerBlock *tb = (TimerBlock *)opaque;
-int64_t val;
 switch (addr) {
 case 0: /* Load */
 return tb->load;
 case 4: /* Counter.  */
-if (((tb->control & 1) == 0) || (tb->count == 0)) {
-return 0;
-}
-/* Slow and ugly, but hopefully won't happen too often.  */
-val = tb->tick - qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);
-val /= timerblock_scale(tb);
-if (val < 0) {
-val = 0;
-}
-return val;
+return ptimer_get_count(tb->timer);
 case 8: /* Control.  */
 return tb->control;
 case 12: /* Interrupt status.  */
@@ -100,6 +73,19 @@ static uint64_t timerblock_read(void *opaque, hwaddr addr,
 }
 }
 
+static void timerblock_run(TimerBlock *tb, uint64_t count, int set_count)
+{
+if (set_count) {
+if (((tb->control & 3) == 3) && (count == 0)) {
+count = tb->load;
+}
+ptimer_set_count(tb->timer, count);
+}
+if ((tb->control & 1) && (count != 0)) {
+ptimer_run(tb->timer, !(tb->control & 2));
+}
+}
+
 static void timerblock_write(void *opaque, hwaddr addr,
  uint64_t value, unsigned size)
 {
@@ -108,32 +94,34 @@ static void timerblock_write(void *opaque, hwaddr addr,
 switch (addr) {
 case 0: /* Load */
 tb->load = value;
-/* Fall through.  */
-case 4: /* Counter.  */
-if ((tb->control & 1) && tb->count) {
-/* Cancel the previous timer.  */
-timer_del(tb->timer);
+/* Setting load to 0 stops the timer.  */
+if (tb->load == 0) {
+ptimer_stop(tb->timer);
 }
-tb->count = value;
-if (tb->control & 1) {
-timerblock_reload(tb, 1);
+ptimer_set_limit(tb->timer, tb->load, 1);
+timerblock_run(tb, tb->load, 0);
+break;
+case 4: /* Counter.  */
+/* Setting counter to 0 stops the one-shot timer.  */
+if (!(tb->control & 2) && (value == 0)) {
+ptimer_stop(tb->timer);
 }
+timerblock_run(tb, value, 1);
 break;
 case 8: /* Control.  */
 old = tb->control;
 tb->control = value;
-if (value & 1) {
-if ((old & 1) && (tb->count != 0)) {
-/* Do nothing if timer is ticking right now.  */
-break;
-}
-if (tb->control & 2) {
-tb->count = tb->load;
-}
-timerblock_reload(tb, 1);
-} else if (old & 1) {
-/* Shutdown the timer.  */
-timer_del(tb->timer);
+/* Timer mode switch requires ptimer to be stopped.  */
+if ((old & 3) != (tb->control & 3)) {
+ptimer_stop(tb->timer);
+}
+if (!(tb->control & 1)) {
+break;
+}
+ptimer_set_period(tb->timer, timerblock_scale(tb));
+if ((old & 3) != (tb->control & 3))

[Qemu-devel] [PATCH v8 2/4] hw/ptimer: Perform tick and counter wrap around if timer already expired

2016-01-04 Thread Dmitry Osipenko

ptimer_get_count() might be called after the QEMU timer has already expired.
In that case ptimer would return counter = 0, which might be undesirable
for a polled timer. Wrap the counter around for a periodic timer to keep
it distributed.

In addition, there is no reason to keep an expired timer's tick deferred, so
just perform the tick from ptimer_get_count().
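
The wrap-around arithmetic can be sketched as follows. This is a simplified model under stated assumptions: it uses an integer period only (no period_frac), and sketch_wrapped_count() is a hypothetical helper, not a function in hw/core/ptimer.c.

```c
#include <stdint.h>

/* Counter value of a periodic down-counter at time 'now'.
 * Before the deadline, the counter is the number of whole periods left.
 * At or after the deadline, the counter is treated as having reloaded
 * from 'limit' and continued counting down, instead of sticking at 0. */
static uint64_t sketch_wrapped_count(int64_t now, int64_t next_event,
                                     uint64_t period, uint64_t limit)
{
    uint64_t ticks_past;

    if (period == 0 || limit == 0) {
        return 0;   /* nothing sensible to compute for a dead timer */
    }
    if (now < next_event) {
        return (uint64_t)(next_event - now) / period;
    }
    ticks_past = (uint64_t)(now - next_event) / period;
    return limit - ticks_past % limit;
}
```

For example, with period = 10, limit = 5 and next_event = 100, a read at now = 120 yields 3 rather than 0, because two ticks have elapsed since the reload.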

Signed-off-by: Dmitry Osipenko 
---
 hw/core/ptimer.c | 35 +--
 1 file changed, 29 insertions(+), 6 deletions(-)

diff --git a/hw/core/ptimer.c b/hw/core/ptimer.c
index 035af97..96a6c7a 100644
--- a/hw/core/ptimer.c
+++ b/hw/core/ptimer.c
@@ -85,15 +85,21 @@ static void ptimer_tick(void *opaque)
 
 uint64_t ptimer_get_count(ptimer_state *s)
 {
+int enabled = s->enabled;
 int64_t now;
+int64_t next;
 uint64_t counter;
+int expired;
+int oneshot;
 
-if (s->enabled) {
+if (enabled) {
 now = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);
+next = s->next_event;
+expired = (now - next >= 0);
+oneshot = (enabled == 2);
 /* Figure out the current counter value.  */
-if (now - s->next_event > 0
-|| s->period == 0) {
-/* Prevent timer underflowing if it should already have
+if (s->period == 0 || (expired && oneshot)) {
+/* Prevent one-shot timer underflowing if it should already have
triggered.  */
 counter = 0;
 } else {
@@ -114,12 +120,12 @@ uint64_t ptimer_get_count(ptimer_state *s)
backwards.
 */
 
-if ((s->enabled == 1) && (s->limit * period < 1)) {
+if (!oneshot && (s->limit * period < 1)) {
 period = 1 / s->limit;
 period_frac = 0;
 }
 
-rem = s->next_event - now;
+rem = expired ? now - next : next - now;
 div = period;
 
 clz1 = clz64(rem);
@@ -139,6 +145,23 @@ uint64_t ptimer_get_count(ptimer_state *s)
 div += 1;
 }
 counter = rem / div;
+
+if (expired) {
+/* Wrap around periodic counter.  */
+counter = s->delta = s->limit - counter % s->limit;
+}
+}
+
+if (expired) {
+if (oneshot) {
+ptimer_tick(s);
+} else {
+/* Don't use ptimer_tick() for the periodic timer since it
+ * would reset the delta value.
+ */
+ptimer_trigger(s);
+ptimer_reload(s);
+}
 }
 } else {
 counter = s->delta;
-- 
2.6.4

[Qemu-devel] [PATCH v8 3/4] hw/ptimer: Update .delta on period/freq change

2016-01-04 Thread Dmitry Osipenko

The delta value must be updated on a period/freq change; otherwise a running
timer would be restarted (counter reloaded with the old delta). Only the
m68k/mcf520x and arm/arm_timer devices currently do the freq change correctly,
i.e. by stopping the timer. Perform the delta update to fix the affected
devices and eliminate potential further mistakes.

Signed-off-by: Dmitry Osipenko 
---
 hw/core/ptimer.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/hw/core/ptimer.c b/hw/core/ptimer.c
index 96a6c7a..8c2dd9f 100644
--- a/hw/core/ptimer.c
+++ b/hw/core/ptimer.c
@@ -207,6 +207,7 @@ void ptimer_stop(ptimer_state *s)
 /* Set counter increment interval in nanoseconds.  */
 void ptimer_set_period(ptimer_state *s, int64_t period)
 {
+s->delta = ptimer_get_count(s);
 s->period = period;
 s->period_frac = 0;
 if (s->enabled) {
@@ -218,6 +219,7 @@ void ptimer_set_period(ptimer_state *s, int64_t period)
 /* Set counter frequency in Hz.  */
 void ptimer_set_freq(ptimer_state *s, uint32_t freq)
 {
+s->delta = ptimer_get_count(s);
 s->period = 10ll / freq;
 s->period_frac = (10ll << 32) / freq;
 if (s->enabled) {
-- 
2.6.4

[Qemu-devel] [PATCH v8 1/4] hw/ptimer: Fix issues caused by the adjusted timer limit value

2016-01-04 Thread Dmitry Osipenko

There are multiple issues related to a timer with an adjusted .limit value:

1) ptimer_get_count() returns an incorrect counter value for a disabled
timer after loading the counter with a small value, because the adjusted
limit value is used instead of the original.

For instance:
1) ptimer_stop(t)
2) ptimer_set_period(t, 1)
3) ptimer_set_limit(t, 0, 1)
4) ptimer_get_count(t) <-- would return 1 instead of 0

2) ptimer_get_count() might return an incorrect value for a timer running
with an adjusted limit value.

For instance:
1) ptimer_stop(t)
2) ptimer_set_period(t, 1)
3) ptimer_set_limit(t, 10, 1)
4) ptimer_run(t)
5) ptimer_get_count(t) <-- might return value > 10

3) Neither ptimer_set_period() nor ptimer_set_freq() adjusts the
limit value, so it is still possible to make the timer timeout value
arbitrarily small.

For instance:
1) ptimer_set_period(t, 1)
2) ptimer_set_limit(t, 1, 0)
3) ptimer_set_period(t, 1) <-- bypass limit correction

Fix all of the above issues by adjusting the timer period instead of the
limit. Do the adjustment for a periodic timer only.

Signed-off-by: Dmitry Osipenko 
---
 hw/core/ptimer.c | 59 ++--
 1 file changed, 36 insertions(+), 23 deletions(-)

diff --git a/hw/core/ptimer.c b/hw/core/ptimer.c
index edf077c..035af97 100644
--- a/hw/core/ptimer.c
+++ b/hw/core/ptimer.c
@@ -34,20 +34,39 @@ static void ptimer_trigger(ptimer_state *s)
 
 static void ptimer_reload(ptimer_state *s)
 {
-if (s->delta == 0) {
+uint32_t period_frac = s->period_frac;
+uint64_t period = s->period;
+uint64_t delta = s->delta;
+uint64_t limit = s->limit;
+
+if (delta == 0) {
 ptimer_trigger(s);
-s->delta = s->limit;
+delta = limit;
 }
-if (s->delta == 0 || s->period == 0) {
+if (delta == 0 || period == 0) {
 fprintf(stderr, "Timer with period zero, disabling\n");
 s->enabled = 0;
 return;
 }
 
+/*
+ * Artificially limit timeout rate to something
+ * achievable under QEMU.  Otherwise, QEMU spends all
+ * its time generating timer interrupts, and there
+ * is no forward progress.
+ * About ten microseconds is the fastest that really works
+ * on the current generation of host machines.
+ */
+
+if ((s->enabled == 1) && (limit * period < 1)) {
+period = 1 / limit;
+period_frac = 0;
+}
+
 s->last_event = s->next_event;
-s->next_event = s->last_event + s->delta * s->period;
-if (s->period_frac) {
-s->next_event += ((int64_t)s->period_frac * s->delta) >> 32;
+s->next_event = s->last_event + delta * period;
+if (period_frac) {
+s->next_event += ((int64_t)period_frac * delta) >> 32;
 }
 timer_mod(s->timer, s->next_event);
 }
@@ -82,6 +101,8 @@ uint64_t ptimer_get_count(ptimer_state *s)
 uint64_t div;
 int clz1, clz2;
 int shift;
+uint32_t period_frac = s->period_frac;
+uint64_t period = s->period;
 
 /* We need to divide time by period, where time is stored in
rem (64-bit integer) and period is stored in period/period_frac
@@ -93,8 +114,13 @@ uint64_t ptimer_get_count(ptimer_state *s)
backwards.
 */
 
+if ((s->enabled == 1) && (s->limit * period < 1)) {
+period = 1 / s->limit;
+period_frac = 0;
+}
+
 rem = s->next_event - now;
-div = s->period;
+div = period;
 
 clz1 = clz64(rem);
 clz2 = clz64(div);
@@ -103,13 +129,13 @@ uint64_t ptimer_get_count(ptimer_state *s)
 rem <<= shift;
 div <<= shift;
 if (shift >= 32) {
-div |= ((uint64_t)s->period_frac << (shift - 32));
+div |= ((uint64_t)period_frac << (shift - 32));
 } else {
 if (shift != 0)
-div |= (s->period_frac >> (32 - shift));
+div |= (period_frac >> (32 - shift));
 /* Look at remaining bits of period_frac and round div up if 
necessary.  */
-if ((uint32_t)(s->period_frac << shift))
+if ((uint32_t)(period_frac << shift))
 div += 1;
 }
 counter = rem / div;
@@ -181,19 +207,6 @@ void ptimer_set_freq(ptimer_state *s, uint32_t freq)
count = limit.  */
 void ptimer_set_limit(ptimer_state *s, uint64_t limit, int reload)
 {
-/*
- * Artificially limit timeout rate to something
- * achievable under QEMU.  Otherwise, QEMU spends all
- * its time generating timer interrupts, and there
- * is no forward progress.
- * About ten microseconds is the fastest that really works
- * on the current generation of host machines.
- */
-
-if (!use_icount && limit * s->period < 1 &&

[Qemu-devel] [PATCH v8 0/4] PTimer fixes and ARM MPTimer conversion

2016-01-04 Thread Dmitry Osipenko

Changelog for ARM MPTimer QEMUTimer to ptimer conversion:

V2: Fixed changing periodic timer counter value "on the fly". I added a
test to the gist to cover that issue.

V3: Fixed starting the timer with load = 0 and counter != 0, added tests
to the gist for this issue. Changed vmstate version for all VMSD's,
since loadvm doesn't check version of nested VMSD.

V4: Fixed spurious IT bit set for the timer starting in the periodic mode
with counter = 0. Test added.

V5: Code cleanup, now depends on ptimer_set_limit() fix.

V6: No code change, added test to check ptimer_get_count() with corrected
.limit value.

V7: No change.

V8: No change.

ARM MPTimer tests: https://gist.github.com/digetx/dbd46109503b1a91941a


Patch for ptimer is introduced since V5 of "ARM MPTimer conversion".

Changelog for the "ptimer fixes" patch:

V5: Only fixed ptimer_set_limit() for the disabled timer.

V6: As pointed out by Peter Maydell, there are other issues beyond
ptimer_set_limit(), so V6 is supposed to cover all those issues.

V7: Restored the accidentally removed !use_icount check.
Added the missing "else" statement.

V8: Adjust period instead of the limit and do it for periodic timer only
(.limit adjusting bug). Added patch/fix for freq/period change and
ptimer_get_count() improvement.

Dmitry Osipenko (4):
  hw/ptimer: Fix issues caused by the adjusted timer limit value
  hw/ptimer: Perform tick and counter wrap around if timer already
expired
  hw/ptimer: Update .delta on period/freq change
  arm_mptimer: Convert to use ptimer

 hw/core/ptimer.c   |  94 ---
 hw/timer/arm_mptimer.c | 110 ++---
 include/hw/timer/arm_mptimer.h |   4 +-
 3 files changed, 115 insertions(+), 93 deletions(-)

-- 
2.6.4

Re: [Qemu-devel] [Qemu-ppc] [PATCH] macio: fix overflow in lba to offset conversion for ATAPI devices

2016-01-04 Thread Programmingkid

> As the IDEState lba field is an int32_t, make sure we cast to int64_t before
> shifting to calculate the offset. Otherwise we end up with an overflow when
> trying to access sectors beyond 2GB as can occur when using DVD images.
> 
> Signed-off-by: Mark Cave-Ayland 
> ---
>  hw/ide/macio.c |2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/hw/ide/macio.c b/hw/ide/macio.c
> index 3ee962f..a78b6e0 100644
> --- a/hw/ide/macio.c
> +++ b/hw/ide/macio.c
> @@ -280,7 +280,7 @@ static void pmac_ide_atapi_transfer_cb(void *opaque, int 
> ret)
>  }
>  
>  /* Calculate current offset */
> -offset = (int64_t)(s->lba << 11) + s->io_buffer_index;
> +offset = ((int64_t)(s->lba) << 11) + s->io_buffer_index;
>  
>  pmac_dma_read(s->blk, offset, io->len, pmac_ide_atapi_transfer_cb, io);
>  return;
> -- 
> 1.7.10.4

It worked, but the first time I tried it, Mac OS 10.4.0 had a kernel panic. The 
AppleADBKeyboard.kext was the cause. The second time it booted to the 
installer. 

Reviewed-by: John Arbuckle
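
For anyone curious why the cast placement in the quoted patch matters, here is a small sketch. Both helper names are hypothetical, and the old behaviour is emulated with unsigned arithmetic so the example itself stays free of signed-overflow undefined behaviour:

```c
#include <stdint.h>

/* Old form: the shift is done in 32 bits and only the result is widened.
 * Emulated here by wrapping modulo 2^32 and reinterpreting as int32_t
 * (that conversion is implementation-defined, not undefined). */
static int64_t sketch_offset_narrow(int32_t lba, int32_t io_buffer_index)
{
    uint32_t shifted = (uint32_t)lba << 11;

    return (int64_t)(int32_t)shifted + io_buffer_index;
}

/* Fixed form: widen first, then shift, so the upper bits survive.
 * Assumes lba is non-negative. */
static int64_t sketch_offset_wide(int32_t lba, int32_t io_buffer_index)
{
    return ((int64_t)lba << 11) + io_buffer_index;
}
```

Sector 1048576 is the first 2048-byte sector at the 2GB mark: the widened form gives 2147483648, while the 32-bit form cannot represent that value. Below 2GB both forms agree.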

Re: [Qemu-devel] [PATCH v5 0/4] Extend TPM support with a QEMU-external TPM

2016-01-04 Thread Xu, Quan

On January 04 2016 11:23 PM,  wrote:
> The following series of patches extends TPM support with an external TPM that
> offers a Linux CUSE (character device in userspace) interface. This TPM lets
> each VM access its own private vTPM.
> The CUSE TPM supports suspend/resume and migration. Much out-of-band
> functionality necessary to control the CUSE TPM is implemented using ioctls.
> 

Stefan,
it is a good solution. Could you share more about this architecture? If you 
have an existing doc.


Quan

> This series of patches applies to 38a762fe.
> 
> Stefan Berger (4):
>   Provide support for the CUSE TPM
>   Introduce condition to notify waiters of completed command
>   Introduce condition in TPM backend for notification
>   Add support for VM suspend/resume for TPM TIS
>

[Qemu-devel] [v15 08/15] vfio: add check host bus reset is support or not

2016-01-04 Thread Cao jin

From: Chen Fan 

When initialization of the vfio devices is done, we should test whether
each device that supports AER conflicts with others. For each one, get the
hot reset info for the affected device list. Each affected device should be
attached to the VM and be on or below the same bus. Also, we should test
all of the non-AER supporting vfio-pci devices on or below the target
bus to verify they have a reset mechanism.

Signed-off-by: Chen Fan 
---
 hw/vfio/pci.c | 238 --
 hw/vfio/pci.h |   1 +
 2 files changed, 232 insertions(+), 7 deletions(-)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 38b0aa5..16ab0e3 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -1832,6 +1832,218 @@ static int vfio_add_std_cap(VFIOPCIDevice *vdev, 
uint8_t pos)
 return 0;
 }
 
+static bool vfio_pci_host_slot_match(PCIHostDeviceAddress *host1,
+ PCIHostDeviceAddress *host2)
+{
+return (host1->domain == host2->domain && host1->bus == host2->bus &&
+host1->slot == host2->slot);
+}
+
+static bool vfio_pci_host_match(PCIHostDeviceAddress *host1,
+PCIHostDeviceAddress *host2)
+{
+return (vfio_pci_host_slot_match(host1, host2) &&
+host1->function == host2->function);
+}
+
+struct VFIODeviceFind {
+PCIDevice *pdev;
+bool found;
+};
+
+static void vfio_check_device_noreset(PCIBus *bus, PCIDevice *pdev,
+  void *opaque)
+{
+DeviceState *dev = DEVICE(pdev);
+DeviceClass *dc = DEVICE_GET_CLASS(dev);
+VFIOPCIDevice *vdev;
+struct VFIODeviceFind *find = opaque;
+
+if (find->found) {
+return;
+}
+
+if (!object_dynamic_cast(OBJECT(dev), "vfio-pci")) {
+if (!dc->reset) {
+goto found;
+}
+return;
+}
+vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
+if (!(vdev->features & VFIO_FEATURE_ENABLE_AER) &&
+!vdev->vbasedev.reset_works) {
+goto found;
+}
+
+return;
+found:
+find->pdev = pdev;
+find->found = true;
+}
+
+static void device_find(PCIBus *bus, PCIDevice *pdev, void *opaque)
+{
+struct VFIODeviceFind *find = opaque;
+
+if (find->found) {
+return;
+}
+
+if (pdev == find->pdev) {
+find->found = true;
+}
+}
+
+static int vfio_check_host_bus_reset(VFIOPCIDevice *vdev)
+{
+PCIBus *bus = vdev->pdev.bus;
+struct vfio_pci_hot_reset_info *info = NULL;
+struct vfio_pci_dependent_device *devices;
+VFIOGroup *group;
+struct VFIODeviceFind find;
+int ret, i;
+
+ret = vfio_get_hot_reset_info(vdev, &info);
+if (ret) {
+error_report("vfio: Cannot enable AER for device %s,"
+ " device does not support hot reset.",
+ vdev->vbasedev.name);
+goto out;
+}
+
+/* List all affected devices by bus reset */
+devices = &info->devices[0];
+
+/* Verify that we have all the groups required */
+for (i = 0; i < info->count; i++) {
+PCIHostDeviceAddress host;
+VFIOPCIDevice *tmp;
+VFIODevice *vbasedev_iter;
+bool found = false;
+
+host.domain = devices[i].segment;
+host.bus = devices[i].bus;
+host.slot = PCI_SLOT(devices[i].devfn);
+host.function = PCI_FUNC(devices[i].devfn);
+
+/* Skip the current device */
+if (vfio_pci_host_match(&host, &vdev->host)) {
+continue;
+}
+
+/* Ensure we own the group of the affected device */
+QLIST_FOREACH(group, &vfio_group_list, next) {
+if (group->groupid == devices[i].group_id) {
+break;
+}
+}
+
+if (!group) {
+error_report("vfio: Cannot enable AER for device %s, "
+ "depends on group %d which is not owned.",
+ vdev->vbasedev.name, devices[i].group_id);
+ret = -1;
+goto out;
+}
+
+/* Ensure affected devices for reset are on/below the bus */
+QLIST_FOREACH(vbasedev_iter, &group->device_list, next) {
+if (vbasedev_iter->type != VFIO_DEVICE_TYPE_PCI) {
+continue;
+}
+tmp = container_of(vbasedev_iter, VFIOPCIDevice, vbasedev);
+if (vfio_pci_host_match(&host, &tmp->host)) {
+PCIDevice *pci = PCI_DEVICE(tmp);
+
+/*
+ * AER errors may be broadcast to all functions of a multi-
+ * function endpoint.  If any of those sibling functions are
+ * also assigned, they need to have AER enabled or else an
+ * error may continue to cause a vm_stop condition.  IOW,
+ * AER setup of this function would be pointless.
+ */
+if (vfio_pci_host_slot_match(&vdev->host, &tmp->host) &&
+!(tmp->features & VFIO_FEATURE_ENABLE_AER)) {
+

[Qemu-devel] [v15 14/15] vfio-pci: pass the aer error to guest

2016-01-04 Thread Cao jin

From: Chen Fan 

When the vfio device encounters an uncorrectable error on the host,
the vfio_pci driver will signal the eventfd registered by this
vfio device, which results in the qemu eventfd handler getting
invoked.

This patch passes the error to the guest and has the guest driver
recover from the error.

Signed-off-by: Chen Fan 
---
 hw/vfio/pci.c | 53 +++--
 1 file changed, 47 insertions(+), 6 deletions(-)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index aa0d945..bc81132 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -2557,18 +2557,59 @@ static void vfio_put_device(VFIOPCIDevice *vdev)
 static void vfio_err_notifier_handler(void *opaque)
 {
 VFIOPCIDevice *vdev = opaque;
+PCIDevice *dev = &vdev->pdev;
+PCIEAERMsg msg = {
+.severity = 0,
+.source_id = (pci_bus_num(dev->bus) << 8) | dev->devfn,
+};
 
 if (!event_notifier_test_and_clear(&vdev->err_notifier)) {
 return;
 }
 
 /*
- * TBD. Retrieve the error details and decide what action
- * needs to be taken. One of the actions could be to pass
- * the error to the guest and have the guest driver recover
- * from the error. This requires that PCIe capabilities be
- * exposed to the guest. For now, we just terminate the
- * guest to contain the error.
+ * in case the real hardware configuration has been changed,
+ * here we should recheck the bus reset capability.
+ */
+if ((vdev->features & VFIO_FEATURE_ENABLE_AER) &&
+vfio_check_host_bus_reset(vdev)) {
+goto stop;
+}
+/*
+ * we should read the error details from the real hardware
+ * configuration spaces, here we only need to do is signaling
+ * to guest an uncorrectable error has occurred.
+ */
+if ((vdev->features & VFIO_FEATURE_ENABLE_AER) &&
+dev->exp.aer_cap) {
+uint8_t *aer_cap = dev->config + dev->exp.aer_cap;
+uint32_t uncor_status;
+bool isfatal;
+
+uncor_status = vfio_pci_read_config(dev,
+   dev->exp.aer_cap + PCI_ERR_UNCOR_STATUS, 4);
+
+/*
+ * if we receive the error signal but not this device, we can
+ * just ignore it.
+ */
+if (!(uncor_status & ~0UL)) {
+return;
+}
+
+isfatal = uncor_status & pci_get_long(aer_cap + PCI_ERR_UNCOR_SEVER);
+
+msg.severity = isfatal ? PCI_ERR_ROOT_CMD_FATAL_EN :
+ PCI_ERR_ROOT_CMD_NONFATAL_EN;
+
+pcie_aer_msg(dev, &msg);
+return;
+}
+
+stop:
+/*
+ * If the aer capability is not exposed to the guest. we just
+ * terminate the guest to contain the error.
  */
 
 error_report("%s(%04x:%02x:%02x.%x) Unrecoverable error detected.  "
-- 
1.9.3

[Qemu-devel] [v15 12/15] vfio: add bus in reset flag

2016-01-04 Thread Cao jin

From: Chen Fan 

Mark the host bus as being in reset, to avoid multiple devices triggering
the host bus reset many times.

Signed-off-by: Chen Fan 
---
 hw/vfio/pci.c | 6 ++
 include/hw/vfio/vfio-common.h | 1 +
 2 files changed, 7 insertions(+)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index ee88db3..aa0d945 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -2249,6 +2249,11 @@ static int vfio_pci_hot_reset(VFIOPCIDevice *vdev, bool 
single)
 
 trace_vfio_pci_hot_reset(vdev->vbasedev.name, single ? "one" : "multi");
 
+if (vdev->vbasedev.bus_in_reset) {
+vdev->vbasedev.bus_in_reset = false;
+return 0;
+}
+
 vfio_pci_pre_reset(vdev);
 vdev->vbasedev.needs_reset = false;
 
@@ -2312,6 +2317,7 @@ static int vfio_pci_hot_reset(VFIOPCIDevice *vdev, bool 
single)
 }
 vfio_pci_pre_reset(tmp);
 tmp->vbasedev.needs_reset = false;
+tmp->vbasedev.bus_in_reset = true;
 multi = true;
 break;
 }
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index f037f3c..44b19d7 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -95,6 +95,7 @@ typedef struct VFIODevice {
 bool reset_works;
 bool needs_reset;
 bool no_mmap;
+bool bus_in_reset;
 VFIODeviceOps *ops;
 unsigned int num_irqs;
 unsigned int num_regions;
-- 
1.9.3
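
The effect of the flag can be illustrated with a small simulation. This is only a sketch under simplifying assumptions: struct sketch_dev, sketch_hot_reset() and the counter are hypothetical, and every device in the array is assumed to share one host bus.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified device: only the flag from the patch. */
struct sketch_dev {
    bool bus_in_reset;
};

static int sketch_host_resets; /* counts simulated host bus resets */

/* Reset the bus on behalf of devs[idx]; all n devices share one host bus. */
static void sketch_hot_reset(struct sketch_dev *devs, size_t n, size_t idx)
{
    if (devs[idx].bus_in_reset) {
        /* Another device already reset the bus: just consume the flag. */
        devs[idx].bus_in_reset = false;
        return;
    }

    /* Mark every other device on the bus as already covered. */
    for (size_t i = 0; i < n; i++) {
        if (i != idx) {
            devs[i].bus_in_reset = true;
        }
    }

    sketch_host_resets++;
}
```

When each of three devices requests a reset in turn, only the first request reaches the host bus; the other two clear their flag and return.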

[Qemu-devel] [v15 05/15] vfio: add pcie extended capability support

2016-01-04 Thread Cao jin

From: Chen Fan 

For a vfio PCIe device, we can expose the extended capabilities on
a PCIe bus. To avoid breaking the config space, we parse the extended
capabilities from a copy of the config space and then rebuild the
PCIe extended config space.

Signed-off-by: Chen Fan 
---
 hw/vfio/pci.c | 70 ++-
 1 file changed, 69 insertions(+), 1 deletion(-)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 288f2c7..64b0867 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -1482,6 +1482,21 @@ static uint8_t vfio_std_cap_max_size(PCIDevice *pdev, uint8_t pos)
 return next - pos;
 }
 
+
+static uint16_t vfio_ext_cap_max_size(const uint8_t *config, uint16_t pos)
+{
+uint16_t tmp, next = PCIE_CONFIG_SPACE_SIZE;
+
+for (tmp = PCI_CONFIG_SPACE_SIZE; tmp;
+tmp = PCI_EXT_CAP_NEXT(pci_get_long(config + tmp))) {
+if (tmp > pos && tmp < next) {
+next = tmp;
+}
+}
+
+return next - pos;
+}
+
 static void vfio_set_word_bits(uint8_t *buf, uint16_t val, uint16_t mask)
 {
 pci_set_word(buf, (pci_get_word(buf) & ~mask) | val);
@@ -1817,16 +1832,69 @@ static int vfio_add_std_cap(VFIOPCIDevice *vdev, uint8_t pos)
 return 0;
 }
 
+static int vfio_add_ext_cap(VFIOPCIDevice *vdev)
+{
+PCIDevice *pdev = &vdev->pdev;
+uint32_t header;
+uint16_t cap_id, next, size;
+uint8_t cap_ver;
+uint8_t *config;
+
+/*
+ * In order to avoid breaking config space, create a copy to
+ * use for parsing extended capabilities.
+ */
+config = g_memdup(pdev->config, vdev->config_size);
+
+for (next = PCI_CONFIG_SPACE_SIZE; next;
+ next = PCI_EXT_CAP_NEXT(pci_get_long(config + next))) {
+header = pci_get_long(config + next);
+cap_id = PCI_EXT_CAP_ID(header);
+cap_ver = PCI_EXT_CAP_VER(header);
+
+/*
+ * If it becomes important to configure extended capabilities to their
+ * actual size, use this as the default when it's something we don't
+ * recognize. Since QEMU doesn't actually handle many of the config
+ * accesses, exact size doesn't seem worthwhile.
+ */
+size = vfio_ext_cap_max_size(config, next);
+
+pcie_add_capability(pdev, cap_id, cap_ver, next, size);
+pci_set_long(dev->config + next, PCI_EXT_CAP(cap_id, cap_ver, 0));
+
+/* Use emulated next pointer to allow dropping extended caps */
+pci_long_test_and_set_mask(vdev->emulated_config_bits + next,
+   PCI_EXT_CAP_NEXT_MASK);
+}
+
+g_free(config);
+return 0;
+}
+
 static int vfio_add_capabilities(VFIOPCIDevice *vdev)
 {
 PCIDevice *pdev = &vdev->pdev;
+int ret;
 
 if (!(pdev->config[PCI_STATUS] & PCI_STATUS_CAP_LIST) ||
 !pdev->config[PCI_CAPABILITY_LIST]) {
 return 0; /* Nothing to add */
 }
 
-return vfio_add_std_cap(vdev, pdev->config[PCI_CAPABILITY_LIST]);
+ret = vfio_add_std_cap(vdev, pdev->config[PCI_CAPABILITY_LIST]);
+if (ret) {
+return ret;
+}
+
+/* on PCI bus, it doesn't make sense to expose extended capabilities. */
+if (!pci_is_express(pdev) ||
+!pci_bus_is_express(pdev->bus) ||
+!pci_get_long(pdev->config + PCI_CONFIG_SPACE_SIZE)) {
+return 0;
+}
+
+return vfio_add_ext_cap(vdev);
 }
 
 static void vfio_pci_pre_reset(VFIOPCIDevice *vdev)
-- 
1.9.3

[Qemu-devel] [v15 09/15] add check reset mechanism when hotplug vfio device

2016-01-04 Thread Cao jin

From: Chen Fan 

Multi-function hotplug is now supported, and hotplugging function 0
indicates the closure of the slot, so that is our chance to do the check.

Signed-off-by: Chen Fan 
---
 hw/pci/pci.c | 29 +
 hw/vfio/pci.c| 19 +++
 hw/vfio/pci.h|  2 ++
 include/hw/pci/pci_bus.h |  5 +
 4 files changed, 55 insertions(+)

diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index 168b9cc..f6ca6ef 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -81,6 +81,7 @@ static void pci_bus_realize(BusState *qbus, Error **errp)
 PCIBus *bus = PCI_BUS(qbus);
 
 vmstate_register(NULL, -1, &vmstate_pcibus, bus);
+notifier_with_return_list_init(&bus->hotplug_notifiers);
 }
 
 static void pci_bus_unrealize(BusState *qbus, Error **errp)
@@ -1835,6 +1836,22 @@ PCIDevice *pci_find_device(PCIBus *bus, int bus_num, uint8_t devfn)
 return bus->devices[devfn];
 }
 
+void pci_bus_add_hotplug_notifier(PCIBus *bus, NotifierWithReturn *notify)
+{
+notifier_with_return_list_add(&bus->hotplug_notifiers, notify);
+}
+
+void pci_bus_remove_hotplug_notifier(NotifierWithReturn *notifier)
+{
+notifier_with_return_remove(notifier);
+}
+
+static int pci_bus_hotplug_notifier(PCIBus *bus, void *opaque)
+{
+return notifier_with_return_list_notify(&bus->hotplug_notifiers,
+opaque);
+}
+
 static void pci_qdev_realize(DeviceState *qdev, Error **errp)
 {
 PCIDevice *pci_dev = (PCIDevice *)qdev;
@@ -1877,6 +1894,18 @@ static void pci_qdev_realize(DeviceState *qdev, Error **errp)
 pci_qdev_unrealize(DEVICE(pci_dev), NULL);
 return;
 }
+
+/*
+ *  Hotplugging function 0 indicates the closure of the slot,
+ *  so signal the callback.
+ */
+if (DEVICE(pci_dev)->hotplugged &&
+pci_get_function_0(pci_dev) == pci_dev &&
+pci_bus_hotplug_notifier(bus, pci_dev)) {
+error_setg(errp, "failed to hotplug function 0");
+pci_qdev_unrealize(DEVICE(pci_dev), NULL);
+return;
+}
 }
 
 static void pci_default_realize(PCIDevice *dev, Error **errp)
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 16ab0e3..ff25c9b 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -2044,6 +2044,19 @@ static int vfio_check_devices_host_bus_reset(void)
 return 0;
 }
 
+static int vfio_check_bus_reset(NotifierWithReturn *n, void *opaque)
+{
+VFIOPCIDevice *vdev = container_of(n, VFIOPCIDevice, hotplug_notifier);
+PCIDevice *pci_dev = PCI_DEVICE(vdev);
+PCIDevice *pci_func0 = opaque;
+
+if (pci_get_function_0(pci_dev) != pci_func0) {
+return 0;
+}
+
+return vfio_check_host_bus_reset(vdev);
+}
+
 static int vfio_setup_aer(VFIOPCIDevice *vdev, uint8_t cap_ver,
   int pos, uint16_t size)
 {
@@ -2091,6 +2104,9 @@ static int vfio_setup_aer(VFIOPCIDevice *vdev, uint8_t cap_ver,
 pdev->exp.aer_log.log_max = 0;
 }
 
+vdev->hotplug_notifier.notify = vfio_check_bus_reset;
+pci_bus_add_hotplug_notifier(pdev->bus, &vdev->hotplug_notifier);
+
 pcie_cap_deverr_init(pdev);
 return pcie_aer_init(pdev, pos, size);
 
@@ -2972,6 +2988,9 @@ static void vfio_exitfn(PCIDevice *pdev)
 vfio_unregister_req_notifier(vdev);
 vfio_unregister_err_notifier(vdev);
 pci_device_set_intx_routing_notifier(&vdev->pdev, NULL);
+if (vdev->features & VFIO_FEATURE_ENABLE_AER) {
+pci_bus_remove_hotplug_notifier(&vdev->hotplug_notifier);
+}
 vfio_disable_interrupts(vdev);
 if (vdev->intx.mmap_timer) {
 timer_free(vdev->intx.mmap_timer);
diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
index 59ae194..b385f07 100644
--- a/hw/vfio/pci.h
+++ b/hw/vfio/pci.h
@@ -142,6 +142,8 @@ typedef struct VFIOPCIDevice {
 bool no_kvm_intx;
 bool no_kvm_msi;
 bool no_kvm_msix;
+
+NotifierWithReturn hotplug_notifier;
 } VFIOPCIDevice;
 
 uint32_t vfio_pci_read_config(PCIDevice *pdev, uint32_t addr, int len);
diff --git a/include/hw/pci/pci_bus.h b/include/hw/pci/pci_bus.h
index 403fec6..7812fa9 100644
--- a/include/hw/pci/pci_bus.h
+++ b/include/hw/pci/pci_bus.h
@@ -39,8 +39,13 @@ struct PCIBus {
Keep a count of the number of devices with raised IRQs.  */
 int nirq;
 int *irq_count;
+
+NotifierWithReturnList hotplug_notifiers;
 };
 
+void pci_bus_add_hotplug_notifier(PCIBus *bus, NotifierWithReturn *notify);
+void pci_bus_remove_hotplug_notifier(NotifierWithReturn *notify);
+
 typedef struct PCIBridgeWindows PCIBridgeWindows;
 
 /*
-- 
1.9.3

[Qemu-devel] [v15 15/15] vfio: add 'aer' property to expose aercap

2016-01-04 Thread Cao jin

From: Chen Fan 

Add an 'aer' property to let the user decide whether to expose
the AER capability. The AER feature is disabled by default
because it imposes configuration restrictions.

Signed-off-by: Chen Fan 
---
 hw/vfio/pci.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index bc81132..e800cf8 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -3124,6 +3124,8 @@ static Property vfio_pci_dev_properties[] = {
sub_vendor_id, PCI_ANY_ID),
 DEFINE_PROP_UINT32("x-pci-sub-device-id", VFIOPCIDevice,
sub_device_id, PCI_ANY_ID),
+DEFINE_PROP_BIT("aer", VFIOPCIDevice, features,
+VFIO_FEATURE_ENABLE_AER_BIT, false),
 /*
  * TODO - support passed fds... is this necessary?
  * DEFINE_PROP_STRING("vfiofd", VFIOPCIDevice, vfiofd_name),
-- 
1.9.3

[Qemu-devel] [v15 13/15] pcie_aer: expose pcie_aer_msg() interface

2016-01-04 Thread Cao jin

From: Chen Fan 

For a vfio device, we need to propagate the AER error to the
guest OS, so we use pcie_aer_msg() to send the AER error
to the guest.

Signed-off-by: Chen Fan 
Reviewed-by: Michael S. Tsirkin 
---
 hw/pci/pcie_aer.c | 2 +-
 include/hw/pci/pcie_aer.h | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/hw/pci/pcie_aer.c b/hw/pci/pcie_aer.c
index 45f351b..fbbd7d2 100644
--- a/hw/pci/pcie_aer.c
+++ b/hw/pci/pcie_aer.c
@@ -370,7 +370,7 @@ static void pcie_aer_msg_root_port(PCIDevice *dev, const PCIEAERMsg *msg)
  *
  * Walk up the bus tree from the device, propagate the error message.
  */
-static void pcie_aer_msg(PCIDevice *dev, const PCIEAERMsg *msg)
+void pcie_aer_msg(PCIDevice *dev, const PCIEAERMsg *msg)
 {
 uint8_t type;
 
diff --git a/include/hw/pci/pcie_aer.h b/include/hw/pci/pcie_aer.h
index 156acb0..c2ee4e2 100644
--- a/include/hw/pci/pcie_aer.h
+++ b/include/hw/pci/pcie_aer.h
@@ -102,5 +102,6 @@ void pcie_aer_root_write_config(PCIDevice *dev,
 
 /* error injection */
 int pcie_aer_inject_error(PCIDevice *dev, const PCIEAERErr *err);
+void pcie_aer_msg(PCIDevice *dev, const PCIEAERMsg *msg);
 
 #endif /* QEMU_PCIE_AER_H */
-- 
1.9.3

[Qemu-devel] [v15 04/15] vfio: make the 4 bytes aligned for capability size

2016-01-04 Thread Cao jin

From: Chen Fan 

This function searches the capabilities from the end, so the size
of the last one should be 0x100 - pos, not 0xff - pos.

Signed-off-by: Chen Fan 
---
 hw/vfio/pci.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index a63cf85..288f2c7 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -1469,7 +1469,8 @@ static void vfio_unmap_bars(VFIOPCIDevice *vdev)
  */
 static uint8_t vfio_std_cap_max_size(PCIDevice *pdev, uint8_t pos)
 {
-uint8_t tmp, next = 0xff;
+uint8_t tmp;
+uint16_t next = PCI_CONFIG_SPACE_SIZE;
 
 for (tmp = pdev->config[PCI_CAPABILITY_LIST]; tmp;
  tmp = pdev->config[tmp + 1]) {
-- 
1.9.3

[Qemu-devel] [v15 11/15] vfio: add hot reset callback

2016-01-04 Thread Cao jin

From: Chen Fan 

For a vfio device, whenever devices must be recovered by a bus reset,
as with AER, we always need to reset the host bus to recover the devices
under that bus, so we need a way to request a host bus reset explicitly.
---
 hw/vfio/pci.c | 15 +++
 hw/vfio/pci.h |  1 +
 2 files changed, 16 insertions(+)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index ff25c9b..ee88db3 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -1914,6 +1914,8 @@ static int vfio_check_host_bus_reset(VFIOPCIDevice *vdev)
 /* List all affected devices by bus reset */
 devices = &info->devices[0];
 
+vdev->single_depend_dev = (info->count == 1);
+
 /* Verify that we have all the groups required */
 for (i = 0; i < info->count; i++) {
 PCIHostDeviceAddress host;
@@ -3035,6 +3037,18 @@ post_reset:
 vfio_pci_post_reset(vdev);
 }
 
+static void vfio_pci_device_hot_reset(DeviceState *dev)
+{
+PCIDevice *pdev = DO_UPCAST(PCIDevice, qdev, dev);
+VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
+
+if (vdev->features & VFIO_FEATURE_ENABLE_AER) {
+vfio_pci_hot_reset(vdev, vdev->single_depend_dev);
+} else {
+vfio_pci_reset(dev);
+}
+}
+
 static void vfio_instance_init(Object *obj)
 {
 PCIDevice *pci_dev = PCI_DEVICE(obj);
@@ -3082,6 +3096,7 @@ static void vfio_pci_dev_class_init(ObjectClass *klass, void *data)
 PCIDeviceClass *pdc = PCI_DEVICE_CLASS(klass);
 
 dc->reset = vfio_pci_reset;
+dc->hot_reset = vfio_pci_device_hot_reset;
 dc->props = vfio_pci_dev_properties;
 dc->vmsd = &vfio_pci_vmstate;
 dc->desc = "VFIO-based PCI device assignment";
diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
index b385f07..6186e62 100644
--- a/hw/vfio/pci.h
+++ b/hw/vfio/pci.h
@@ -142,6 +142,7 @@ typedef struct VFIOPCIDevice {
 bool no_kvm_intx;
 bool no_kvm_msi;
 bool no_kvm_msix;
+bool single_depend_dev;
 
 NotifierWithReturn hotplug_notifier;
 } VFIOPCIDevice;
-- 
1.9.3

[Qemu-devel] [v15 01/15] vfio: extract vfio_get_hot_reset_info as a single function

2016-01-04 Thread Cao jin

From: Chen Fan 

This function gets the devices affected by a bus reset.
Extract it here so that it can also be used for AER soon.

Signed-off-by: Chen Fan 
---
 hw/vfio/pci.c | 66 +++
 1 file changed, 48 insertions(+), 18 deletions(-)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 1fb868c..efcd3cd 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -1654,6 +1654,51 @@ static void vfio_check_af_flr(VFIOPCIDevice *vdev, uint8_t pos)
 }
 }
 
+/*
+ * Returns a negative errno value on failure and 0 on success.
+ * On success, *ret_info points to the reset info of the affected devices.
+ *
+ */
+static int vfio_get_hot_reset_info(VFIOPCIDevice *vdev,
+   struct vfio_pci_hot_reset_info **ret_info)
+{
+struct vfio_pci_hot_reset_info *info;
+int ret, count;
+
+*ret_info = NULL;
+
+info = g_malloc0(sizeof(*info));
+info->argsz = sizeof(*info);
+
+ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_PCI_HOT_RESET_INFO, info);
+if (ret && errno != ENOSPC) {
+ret = -errno;
+goto error;
+}
+
+count = info->count;
+
+info = g_realloc(info, sizeof(*info) +
+ (count * sizeof(struct vfio_pci_dependent_device)));
+info->argsz = sizeof(*info) +
+  (count * sizeof(struct vfio_pci_dependent_device));
+
+ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_PCI_HOT_RESET_INFO, info);
+if (ret) {
+ret = -errno;
+error_report("vfio: hot reset info failed: %m");
+goto error;
+}
+
+*ret_info = info;
+info = NULL;
+
+return 0;
+error:
+g_free(info);
+return ret;
+}
+
 static int vfio_add_std_cap(VFIOPCIDevice *vdev, uint8_t pos)
 {
 PCIDevice *pdev = &vdev->pdev;
@@ -1793,7 +1838,7 @@ static bool vfio_pci_host_match(PCIHostDeviceAddress *host1,
 static int vfio_pci_hot_reset(VFIOPCIDevice *vdev, bool single)
 {
 VFIOGroup *group;
-struct vfio_pci_hot_reset_info *info;
+struct vfio_pci_hot_reset_info *info = NULL;
 struct vfio_pci_dependent_device *devices;
 struct vfio_pci_hot_reset *reset;
 int32_t *fds;
@@ -1805,12 +1850,8 @@ static int vfio_pci_hot_reset(VFIOPCIDevice *vdev, bool single)
 vfio_pci_pre_reset(vdev);
 vdev->vbasedev.needs_reset = false;
 
-info = g_malloc0(sizeof(*info));
-info->argsz = sizeof(*info);
-
-ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_PCI_HOT_RESET_INFO, info);
-if (ret && errno != ENOSPC) {
-ret = -errno;
+ret = vfio_get_hot_reset_info(vdev, &info);
+if (ret) {
 if (!vdev->has_pm_reset) {
 error_report("vfio: Cannot reset device %04x:%02x:%02x.%x, "
  "no available reset mechanism.", vdev->host.domain,
@@ -1819,18 +1860,7 @@ static int vfio_pci_hot_reset(VFIOPCIDevice *vdev, bool single)
 goto out_single;
 }
 
-count = info->count;
-info = g_realloc(info, sizeof(*info) + (count * sizeof(*devices)));
-info->argsz = sizeof(*info) + (count * sizeof(*devices));
 devices = &info->devices[0];
-
-ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_PCI_HOT_RESET_INFO, info);
-if (ret) {
-ret = -errno;
-error_report("vfio: hot reset info failed: %m");
-goto out_single;
-}
-
 trace_vfio_pci_hot_reset_has_dep_devices(vdev->vbasedev.name);
 
 /* Verify that we have all the groups required */
-- 
1.9.3

[Qemu-devel] [v15 10/15] pci: Introduce device hot reset

2016-01-04 Thread Cao jin

From: Chen Fan 

Setting the secondary bus reset bit in the bridge control register
triggers a hot reset. For vfio devices in particular, we usually need
to do a hot reset of the host bus rather than a device reset.

Signed-off-by: Chen Fan 
---
 hw/core/qdev.c | 24 
 hw/pci/pci_bridge.c|  2 +-
 include/hw/qdev-core.h |  3 +++
 3 files changed, 28 insertions(+), 1 deletion(-)

diff --git a/hw/core/qdev.c b/hw/core/qdev.c
index b3ad467..9c48bae 100644
--- a/hw/core/qdev.c
+++ b/hw/core/qdev.c
@@ -311,6 +311,13 @@ static int qdev_reset_one(DeviceState *dev, void *opaque)
 return 0;
 }
 
+static int qdev_hot_reset_one(DeviceState *dev, void *opaque)
+{
+device_hot_reset(dev);
+
+return 0;
+}
+
 static int qbus_reset_one(BusState *bus, void *opaque)
 {
 BusClass *bc = BUS_GET_CLASS(bus);
@@ -335,6 +342,11 @@ void qbus_reset_all(BusState *bus)
 qbus_walk_children(bus, NULL, NULL, qdev_reset_one, qbus_reset_one, NULL);
 }
 
+void qbus_hot_reset_all(BusState *bus)
+{
+qbus_walk_children(bus, NULL, NULL, qdev_hot_reset_one, qbus_reset_one, NULL);
+}
+
 void qbus_reset_all_fn(void *opaque)
 {
 BusState *bus = opaque;
@@ -1284,6 +1296,18 @@ void device_reset(DeviceState *dev)
 }
 }
 
+void device_hot_reset(DeviceState *dev)
+{
+DeviceClass *klass = DEVICE_GET_CLASS(dev);
+
+if (klass->hot_reset) {
+klass->hot_reset(dev);
+return;
+}
+
+device_reset(dev);
+}
+
 Object *qdev_get_machine(void)
 {
 static Object *dev;
diff --git a/hw/pci/pci_bridge.c b/hw/pci/pci_bridge.c
index 40c97b1..f1903db 100644
--- a/hw/pci/pci_bridge.c
+++ b/hw/pci/pci_bridge.c
@@ -268,7 +268,7 @@ void pci_bridge_write_config(PCIDevice *d,
 newctl = pci_get_word(d->config + PCI_BRIDGE_CONTROL);
 if (~oldctl & newctl & PCI_BRIDGE_CTL_BUS_RESET) {
 /* Trigger hot reset on 0->1 transition. */
-qbus_reset_all(&s->sec_bus.qbus);
+qbus_hot_reset_all(&s->sec_bus.qbus);
 }
 }
 
diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
index c537969..e9fe4b3 100644
--- a/include/hw/qdev-core.h
+++ b/include/hw/qdev-core.h
@@ -131,6 +131,7 @@ typedef struct DeviceClass {
 
 /* callbacks */
 void (*reset)(DeviceState *dev);
+void (*hot_reset)(DeviceState *dev);
 DeviceRealize realize;
 DeviceUnrealize unrealize;
 
@@ -351,6 +352,7 @@ void qdev_reset_all_fn(void *opaque);
  */
 void qbus_reset_all(BusState *bus);
 void qbus_reset_all_fn(void *opaque);
+void qbus_hot_reset_all(BusState *bus);
 
 /* This should go away once we get rid of the NULL bus hack */
 BusState *sysbus_get_default(void);
@@ -372,6 +374,7 @@ void qdev_machine_init(void);
  * Reset a single device (by calling the reset method).
  */
 void device_reset(DeviceState *dev);
+void device_hot_reset(DeviceState *dev);
 
 const struct VMStateDescription *qdev_get_vmsd(DeviceState *dev);
 
-- 
1.9.3

[Qemu-devel] [v15 03/15] pcie: modify the capability size assert

2016-01-04 Thread Cao jin

From: Chen Fan 

A device's offset plus size can reach PCIE_CONFIG_SPACE_SIZE,
so fix the corresponding assert.

Signed-off-by: Chen Fan 
Reviewed-by: Marcel Apfelbaum 
Reviewed-by: Michael S. Tsirkin 
---
 hw/pci/pcie.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
index 0eab29d..8f4c0e5 100644
--- a/hw/pci/pcie.c
+++ b/hw/pci/pcie.c
@@ -607,7 +607,7 @@ void pcie_add_capability(PCIDevice *dev,
 
 assert(offset >= PCI_CONFIG_SPACE_SIZE);
 assert(offset < offset + size);
-assert(offset + size < PCIE_CONFIG_SPACE_SIZE);
+assert(offset + size <= PCIE_CONFIG_SPACE_SIZE);
 assert(size >= 8);
 assert(pci_is_express(dev));
 
-- 
1.9.3

[Qemu-devel] [v15 07/15] vfio: add aer support for vfio device

2016-01-04 Thread Cao jin

From: Chen Fan 

Call pcie_aer_init to initialize the AER-related registers for a
vfio device, then reload the registers from the physical device to
expose the device's capability.

Signed-off-by: Chen Fan 
---
 hw/vfio/pci.c | 81 ---
 hw/vfio/pci.h |  3 +++
 2 files changed, 81 insertions(+), 3 deletions(-)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 64b0867..38b0aa5 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -1832,6 +1832,62 @@ static int vfio_add_std_cap(VFIOPCIDevice *vdev, uint8_t pos)
 return 0;
 }
 
+static int vfio_setup_aer(VFIOPCIDevice *vdev, uint8_t cap_ver,
+  int pos, uint16_t size)
+{
+PCIDevice *pdev = &vdev->pdev;
+PCIDevice *dev_iter;
+uint8_t type;
+uint32_t errcap;
+
+if (!(vdev->features & VFIO_FEATURE_ENABLE_AER)) {
+pcie_add_capability(pdev, PCI_EXT_CAP_ID_ERR,
+cap_ver, pos, size);
+return 0;
+}
+
+dev_iter = pci_bridge_get_device(pdev->bus);
+if (!dev_iter) {
+goto error;
+}
+
+while (dev_iter) {
+type = pcie_cap_get_type(dev_iter);
+if ((type != PCI_EXP_TYPE_ROOT_PORT &&
+ type != PCI_EXP_TYPE_UPSTREAM &&
+ type != PCI_EXP_TYPE_DOWNSTREAM)) {
+goto error;
+}
+
+if (!dev_iter->exp.aer_cap) {
+goto error;
+}
+
+dev_iter = pci_bridge_get_device(dev_iter->bus);
+}
+
+errcap = vfio_pci_read_config(pdev, pos + PCI_ERR_CAP, 4);
+/*
+ * The ability to record multiple headers depends on
+ * the state of the Multiple Header Recording Capable bit and is
+ * enabled by the Multiple Header Recording Enable bit.
+ */
+if ((errcap & PCI_ERR_CAP_MHRC) &&
+(errcap & PCI_ERR_CAP_MHRE)) {
+pdev->exp.aer_log.log_max = PCIE_AER_LOG_MAX_DEFAULT;
+} else {
+pdev->exp.aer_log.log_max = 0;
+}
+
+pcie_cap_deverr_init(pdev);
+return pcie_aer_init(pdev, pos, size);
+
+error:
+error_report("vfio: Unable to enable AER for device %s, parent bus "
+ "does not support AER signaling", vdev->vbasedev.name);
+return -1;
+}
+
 static int vfio_add_ext_cap(VFIOPCIDevice *vdev)
 {
 PCIDevice *pdev = &vdev->pdev;
@@ -1839,6 +1895,7 @@ static int vfio_add_ext_cap(VFIOPCIDevice *vdev)
 uint16_t cap_id, next, size;
 uint8_t cap_ver;
 uint8_t *config;
+int ret = 0;
 
 /*
  * In order to avoid breaking config space, create a copy to
@@ -1860,16 +1917,29 @@ static int vfio_add_ext_cap(VFIOPCIDevice *vdev)
  */
 size = vfio_ext_cap_max_size(config, next);
 
-pcie_add_capability(pdev, cap_id, cap_ver, next, size);
-pci_set_long(dev->config + next, PCI_EXT_CAP(cap_id, cap_ver, 0));
+switch (cap_id) {
+case PCI_EXT_CAP_ID_ERR:
+ret = vfio_setup_aer(vdev, cap_ver, next, size);
+break;
+default:
+pcie_add_capability(pdev, cap_id, cap_ver, next, size);
+break;
+}
+
+if (ret) {
+goto out;
+}
+
+pci_set_long(pdev->config + next, PCI_EXT_CAP(cap_id, cap_ver, 0));
 
 /* Use emulated next pointer to allow dropping extended caps */
 pci_long_test_and_set_mask(vdev->emulated_config_bits + next,
PCI_EXT_CAP_NEXT_MASK);
 }
 
+out:
 g_free(config);
-return 0;
+return ret;
 }
 
 static int vfio_add_capabilities(VFIOPCIDevice *vdev)
@@ -2624,6 +2694,11 @@ static int vfio_initfn(PCIDevice *pdev)
 goto out_teardown;
 }
 
+if ((vdev->features & VFIO_FEATURE_ENABLE_AER) &&
+!pdev->exp.aer_cap) {
+goto out_teardown;
+}
+
 /* QEMU emulates all of MSI & MSIX */
 if (pdev->cap_present & QEMU_PCI_CAP_MSIX) {
 memset(vdev->emulated_config_bits + pdev->msix_cap, 0xff,
diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
index f004d52..48c1f69 100644
--- a/hw/vfio/pci.h
+++ b/hw/vfio/pci.h
@@ -15,6 +15,7 @@
 #include "qemu-common.h"
 #include "exec/memory.h"
 #include "hw/pci/pci.h"
+#include "hw/pci/pci_bridge.h"
 #include "hw/vfio/vfio-common.h"
 #include "qemu/event_notifier.h"
 #include "qemu/queue.h"
@@ -127,6 +128,8 @@ typedef struct VFIOPCIDevice {
 #define VFIO_FEATURE_ENABLE_VGA (1 << VFIO_FEATURE_ENABLE_VGA_BIT)
 #define VFIO_FEATURE_ENABLE_REQ_BIT 1
 #define VFIO_FEATURE_ENABLE_REQ (1 << VFIO_FEATURE_ENABLE_REQ_BIT)
+#define VFIO_FEATURE_ENABLE_AER_BIT 2
+#define VFIO_FEATURE_ENABLE_AER (1 << VFIO_FEATURE_ENABLE_AER_BIT)
 int32_t bootindex;
 uint8_t pm_cap;
 bool has_vga;
-- 
1.9.3

[Qemu-devel] [v15 06/15] aer: improve pcie_aer_init to support vfio device

2016-01-04 Thread Cao jin

From: Chen Fan 

pcie_aer_init was used to emulate an AER capability for PCIe devices,
but for a vfio device the AER config space size varies and is not
always equal to PCI_ERR_SIZEOF (0x48). It depends on whether the TLP
Prefix register is required, so add a size argument here.

Signed-off-by: Chen Fan 
Reviewed-by: Michael S. Tsirkin 
---
 hw/pci-bridge/ioh3420.c| 2 +-
 hw/pci-bridge/xio3130_downstream.c | 2 +-
 hw/pci-bridge/xio3130_upstream.c   | 2 +-
 hw/pci/pcie_aer.c  | 4 ++--
 include/hw/pci/pcie_aer.h  | 2 +-
 5 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/hw/pci-bridge/ioh3420.c b/hw/pci-bridge/ioh3420.c
index cce2fdd..4d9cd3f 100644
--- a/hw/pci-bridge/ioh3420.c
+++ b/hw/pci-bridge/ioh3420.c
@@ -129,7 +129,7 @@ static int ioh3420_initfn(PCIDevice *d)
 goto err_pcie_cap;
 }
 pcie_cap_root_init(d);
-rc = pcie_aer_init(d, IOH_EP_AER_OFFSET);
+rc = pcie_aer_init(d, IOH_EP_AER_OFFSET, PCI_ERR_SIZEOF);
 if (rc < 0) {
 goto err;
 }
diff --git a/hw/pci-bridge/xio3130_downstream.c b/hw/pci-bridge/xio3130_downstream.c
index b3a6479..9737041 100644
--- a/hw/pci-bridge/xio3130_downstream.c
+++ b/hw/pci-bridge/xio3130_downstream.c
@@ -92,7 +92,7 @@ static int xio3130_downstream_initfn(PCIDevice *d)
 goto err_pcie_cap;
 }
 pcie_cap_arifwd_init(d);
-rc = pcie_aer_init(d, XIO3130_AER_OFFSET);
+rc = pcie_aer_init(d, XIO3130_AER_OFFSET, PCI_ERR_SIZEOF);
 if (rc < 0) {
 goto err;
 }
diff --git a/hw/pci-bridge/xio3130_upstream.c b/hw/pci-bridge/xio3130_upstream.c
index eada582..4d7f894 100644
--- a/hw/pci-bridge/xio3130_upstream.c
+++ b/hw/pci-bridge/xio3130_upstream.c
@@ -81,7 +81,7 @@ static int xio3130_upstream_initfn(PCIDevice *d)
 }
 pcie_cap_flr_init(d);
 pcie_cap_deverr_init(d);
-rc = pcie_aer_init(d, XIO3130_AER_OFFSET);
+rc = pcie_aer_init(d, XIO3130_AER_OFFSET, PCI_ERR_SIZEOF);
 if (rc < 0) {
 goto err;
 }
diff --git a/hw/pci/pcie_aer.c b/hw/pci/pcie_aer.c
index 98d2c18..45f351b 100644
--- a/hw/pci/pcie_aer.c
+++ b/hw/pci/pcie_aer.c
@@ -94,12 +94,12 @@ static void aer_log_clear_all_err(PCIEAERLog *aer_log)
 aer_log->log_num = 0;
 }
 
-int pcie_aer_init(PCIDevice *dev, uint16_t offset)
+int pcie_aer_init(PCIDevice *dev, uint16_t offset, uint16_t size)
 {
 PCIExpressDevice *exp;
 
 pcie_add_capability(dev, PCI_EXT_CAP_ID_ERR, PCI_ERR_VER,
-offset, PCI_ERR_SIZEOF);
+offset, size);
 exp = &dev->exp;
 exp->aer_cap = offset;
 
diff --git a/include/hw/pci/pcie_aer.h b/include/hw/pci/pcie_aer.h
index 2fb8388..156acb0 100644
--- a/include/hw/pci/pcie_aer.h
+++ b/include/hw/pci/pcie_aer.h
@@ -87,7 +87,7 @@ struct PCIEAERErr {
 
 extern const VMStateDescription vmstate_pcie_aer_log;
 
-int pcie_aer_init(PCIDevice *dev, uint16_t offset);
+int pcie_aer_init(PCIDevice *dev, uint16_t offset, uint16_t size);
 void pcie_aer_exit(PCIDevice *dev);
 void pcie_aer_write_config(PCIDevice *dev,
uint32_t addr, uint32_t val, int len);
-- 
1.9.3

[Qemu-devel] [v15 00/15] vfio-pci: pass the aer error to guest

2016-01-04 Thread Cao jin

From: Chen Fan 

Currently, when QEMU receives an error from the host AER report for
a vfio PCI passthrough device, it just terminates the guest. Usually,
however, the user wants to know what error occurred rather than have
the guest stopped. This series therefore adds AER capability support
for vfio devices and passes the error to the guest, so that the guest
driver can recover from it.

v14-v15:
   1. add device hot reset callback
   2. add a bus_in_reset flag for vfio devices to avoid doing the host
      bus reset multiple times

v13-v14:
   1. for a multifunction device, require all functions to enable AER. (9/13)
   2. since all affected functions receive the error signal, ignore
      functions on which no error occurred. (12/13)

v12-v13:
   1. since multifunction hotplug is supported, add a callback to enable aer.
   2. add pci device pre+post reset for aer host reset.

Chen Fan (15):
  vfio: extract vfio_get_hot_reset_info as a single function
  vfio: squeeze out vfio_pci_do_hot_reset for support bus reset
  pcie: modify the capability size assert
  vfio: make the 4 bytes aligned for capability size
  vfio: add pcie extended capability support
  aer: improve pcie_aer_init to support vfio device
  vfio: add aer support for vfio device
  vfio: add check host bus reset is support or not
  add check reset mechanism when hotplug vfio device
  pci: Introduce device hot reset
  vfio: add hot reset callback
  vfio: add bus in reset flag
  pcie_aer: expose pcie_aer_msg() interface
  vfio-pci: pass the aer error to guest
  vfio: add 'aer' property to expose aercap

 hw/core/qdev.c |  24 ++
 hw/pci-bridge/ioh3420.c|   2 +-
 hw/pci-bridge/xio3130_downstream.c |   2 +-
 hw/pci-bridge/xio3130_upstream.c   |   2 +-
 hw/pci/pci.c   |  29 ++
 hw/pci/pci_bridge.c|   2 +-
 hw/pci/pcie.c  |   2 +-
 hw/pci/pcie_aer.c  |   6 +-
 hw/vfio/pci.c  | 622 +
 hw/vfio/pci.h  |   7 +
 include/hw/pci/pci_bus.h   |   5 +
 include/hw/pci/pcie_aer.h  |   3 +-
 include/hw/qdev-core.h |   3 +
 include/hw/vfio/vfio-common.h  |   1 +
 14 files changed, 637 insertions(+), 73 deletions(-)

-- 
1.9.3

[Qemu-devel] [v15 02/15] vfio: squeeze out vfio_pci_do_hot_reset for support bus reset

2016-01-04 Thread Cao jin

From: Chen Fan 

Factor out vfio_pci_do_hot_reset to do the host bus reset during AER recovery.

Signed-off-by: Chen Fan 
---
 hw/vfio/pci.c | 75 +++
 1 file changed, 44 insertions(+), 31 deletions(-)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index efcd3cd..a63cf85 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -1699,6 +1699,48 @@ error:
 return ret;
 }
 
+static int vfio_pci_do_hot_reset(VFIOPCIDevice *vdev,
+ struct vfio_pci_hot_reset_info *info)
+{
+VFIOGroup *group;
+struct vfio_pci_hot_reset *reset;
+int32_t *fds;
+int ret, i, count;
+struct vfio_pci_dependent_device *devices;
+
+/* Determine how many group fds need to be passed */
+count = 0;
+devices = &info->devices[0];
+QLIST_FOREACH(group, &vfio_group_list, next) {
+for (i = 0; i < info->count; i++) {
+if (group->groupid == devices[i].group_id) {
+count++;
+break;
+}
+}
+}
+
+reset = g_malloc0(sizeof(*reset) + (count * sizeof(*fds)));
+reset->argsz = sizeof(*reset) + (count * sizeof(*fds));
+fds = &reset->group_fds[0];
+
+/* Fill in group fds */
+QLIST_FOREACH(group, &vfio_group_list, next) {
+for (i = 0; i < info->count; i++) {
+if (group->groupid == devices[i].group_id) {
+fds[reset->count++] = group->fd;
+break;
+}
+}
+}
+
+/* Bus reset! */
+ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_PCI_HOT_RESET, reset);
+g_free(reset);
+
+return ret;
+}
+
 static int vfio_add_std_cap(VFIOPCIDevice *vdev, uint8_t pos)
 {
 PCIDevice *pdev = &vdev->pdev;
@@ -1840,9 +1882,7 @@ static int vfio_pci_hot_reset(VFIOPCIDevice *vdev, bool single)
 VFIOGroup *group;
 struct vfio_pci_hot_reset_info *info = NULL;
 struct vfio_pci_dependent_device *devices;
-struct vfio_pci_hot_reset *reset;
-int32_t *fds;
-int ret, i, count;
+int ret, i;
 bool multi = false;
 
 trace_vfio_pci_hot_reset(vdev->vbasedev.name, single ? "one" : "multi");
@@ -1921,34 +1961,7 @@ static int vfio_pci_hot_reset(VFIOPCIDevice *vdev, bool single)
 goto out_single;
 }
 
-/* Determine how many group fds need to be passed */
-count = 0;
-QLIST_FOREACH(group, &vfio_group_list, next) {
-for (i = 0; i < info->count; i++) {
-if (group->groupid == devices[i].group_id) {
-count++;
-break;
-}
-}
-}
-
-reset = g_malloc0(sizeof(*reset) + (count * sizeof(*fds)));
-reset->argsz = sizeof(*reset) + (count * sizeof(*fds));
-fds = &reset->group_fds[0];
-
-/* Fill in group fds */
-QLIST_FOREACH(group, &vfio_group_list, next) {
-for (i = 0; i < info->count; i++) {
-if (group->groupid == devices[i].group_id) {
-fds[reset->count++] = group->fd;
-break;
-}
-}
-}
-
-/* Bus reset! */
-ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_PCI_HOT_RESET, reset);
-g_free(reset);
+ret = vfio_pci_do_hot_reset(vdev, info);
 
 trace_vfio_pci_hot_reset_result(vdev->vbasedev.name,
 ret ? "%m" : "Success");
-- 
1.9.3

[Qemu-devel] [PATCH v2] l2tpv3: fix cookie decoding

2016-01-04 Thread Alexis Dambricourt

If the MSB of a 32-bit l2tpv3 frame cookie is set to 1, the cast to the
uint64_t cookie sign-extends it and sets the four most significant bytes
to 0xff. The condition (cookie != s->rx_cookie) then becomes true even
for a matching cookie, so the frame is rejected.

Signed-off-by: Alexis Dambricourt 
---
 net/l2tpv3.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/l2tpv3.c b/net/l2tpv3.c
index 8e68e54..21d6119 100644
--- a/net/l2tpv3.c
+++ b/net/l2tpv3.c
@@ -325,7 +325,7 @@ static int l2tpv3_verify_header(NetL2TPV3State *s, uint8_t *buf)
 if (s->cookie_is_64) {
 cookie = ldq_be_p(buf + s->cookie_offset);
 } else {
-cookie = ldl_be_p(buf + s->cookie_offset);
+cookie = ldl_be_p(buf + s->cookie_offset) & 0xFFFFFFFFULL;
 }
 if (cookie != s->rx_cookie) {
 if (!s->header_mismatch) {
-- 
2.6.4
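
To make the failure mode concrete, here is a minimal standalone sketch of
the sign extension the commit message describes. It assumes, as the message
implies, that ldl_be_p() returns a signed int; load_be32_signed() and the
two cookie32_*() helpers are hypothetical stand-ins written for this
illustration, not code from net/l2tpv3.c.

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for ldl_be_p(): loads 32 big-endian bits and
 * returns them as a signed int (assumption taken from the commit message).
 */
static int load_be32_signed(const uint8_t *p)
{
    uint32_t v = ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
                 ((uint32_t)p[2] << 8) | (uint32_t)p[3];

    /* Becomes negative on common two's-complement targets when bit 31 is set. */
    return (int32_t)v;
}

/* Before the patch: the int is sign-extended when widened to uint64_t. */
static uint64_t cookie32_unmasked(const uint8_t *buf)
{
    return (uint64_t)load_be32_signed(buf);
}

/* After the patch: the mask discards the sign-extended upper half. */
static uint64_t cookie32_masked(const uint8_t *buf)
{
    return load_be32_signed(buf) & 0xffffffffULL;
}
```

With a cookie whose first byte is 0x80 or higher, the unmasked variant has
all ones in its upper 32 bits and so can never equal a 32-bit rx_cookie; the
masked variant compares as intended.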

Re: [Qemu-devel] [PATCH v6] spec: add qcow2 bitmaps extension specification

2016-01-04 Thread John Snow

Since Max didn't offer a grammatical review, here's my attempt at some
suggestions.

On 12/23/2015 12:49 PM, Vladimir Sementsov-Ogievskiy wrote:
> The new feature for qcow2: storing bitmaps.
> 
> This patch adds new header extension to qcow2 - Bitmaps Extension. It
> provides an ability to store virtual disk related bitmaps in a qcow2
> image. For now there is only one type of such bitmaps: Dirty Tracking
> Bitmap, which just tracks virtual disk changes from some moment.
> 
> Note: Only bitmaps, relative to the virtual disk, stored in qcow2 file,
> should be stored in this qcow2 file. The size of each bitmap
> (considering its granularity) is equal to virtual disk size.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---
> 
> v6:
> 
> - reword bitmap_directory_size description
> - bitmap type: make 0 reserved
> - extra_data_size: resize to 4bytes
>   Also, I've marked this field as "must be zero". We can always change
>   it, if we decide allowing managing app to specify any extra data, by
>   defining some magic value as a top of user extra data.. So, for now
>   non zeor extra_data_size should be considered as an error.
> - swap name and extra_data to give good alignment to extra_data.
> 
> 
> v5:
> 
> - 'Dirty bitmaps' renamed to 'Bitmaps', as we may have several types of
>   bitmaps.
> - rewordings
> - move upper bounds to "Notes about Qemu limits"
> - s/should/must somewhere. (but not everywhere)
> - move name_size field closer to name itself in bitmap header
> - add extra data area to bitmap header
> - move bitmap data description to separate section
> 
>  docs/specs/qcow2.txt | 161 ++-
>  1 file changed, 160 insertions(+), 1 deletion(-)
> 
> diff --git a/docs/specs/qcow2.txt b/docs/specs/qcow2.txt
> index 121dfc8..b23966a 100644
> --- a/docs/specs/qcow2.txt
> +++ b/docs/specs/qcow2.txt
> @@ -103,7 +103,19 @@ in the description of a field.
>  write to an image with unknown auto-clear features if it
>  clears the respective bits from this field first.
>  
> -Bits 0-63:  Reserved (set to 0)
> +Bit 0:  Bitmaps extension bit.

For consistency, no period after this.

> +This bit is responsible for Bitmaps extension
> +consistency.
> +

I might phrase it as: "This bit indicates consistency for the Bitmaps
extension data."

> +If it is set, but there is no Bitmaps
> +extension, this should be considered as an
> +error.
> +

"This should be considered as an error" can be shortened to just "This
is an error." This makes the sentence top-heavy though, so how about:

"It is an error if this bit is set without the Bitmaps extension present."

> +If it is not set, but there is a Bitmaps
> +extension, its data should be considered as
> +inconsistent.
> +

Let's remove "considered" here. The data *is* inconsistent if this has
happened.

"If the Bitmaps extension is present but this bit is unset, the Bitmaps
extension data is inconsistent."

> +Bits 1-63:  Reserved (set to 0)
>  
>   96 -  99:  refcount_order
>  Describes the width of a reference count block entry (width
> @@ -123,6 +135,7 @@ be stored. Each extension has a structure like the following:
>  0x00000000 - End of the header extension area
>  0xE2792ACA - Backing file format name
>  0x6803f857 - Feature name table
> +0x23852875 - Bitmaps extension
>  other  - Unknown header extension, can be safely
>   ignored
>  
> @@ -166,6 +179,34 @@ the header extension data. Each entry look like this:
>  terminated if it has full length)
>  
>  
> +== Bitmaps extension ==
> +
> +Bitmaps extension is an optional header extension. It provides an ability to
> +store virtual disk related bitmaps in a qcow2 image. For now there is only one
> +type of such bitmaps: Dirty Tracking Bitmap, which just tracks virtual disk
> +changes from some moment.
> +

I think "Bitmaps extension" is awkward without "The" leading it, so:

"The Bitmaps extension is an optional header extension."

And from Eric's suggestion:

"It provides the ability to store bitmaps related to a virtual disk.
For now, there is only one bitmap type: Dirty Tracking Bitmap, which
tracks virtual disk changes from some moment."

> +The data of the extension should be considered as consistent only if
> +corresponding auto-clear feature bit is set (see autoclear_features above).
> +

I might remove the parenthetical.

"The data in this extension should be considered consistent only if the
corres

Re: [Qemu-devel] [PATCH v6] spec: add qcow2 bitmaps extension specification

2016-01-04 Thread John Snow



On 12/24/2015 05:00 AM, Vladimir Sementsov-Ogievskiy wrote:
> On 24.12.2015 02:41, Eric Blake wrote:
>> On 12/23/2015 10:49 AM, Vladimir Sementsov-Ogievskiy wrote:
>>> The new feature for qcow2: storing bitmaps.
>>>
>>> This patch adds new header extension to qcow2 - Bitmaps Extension. It
>>> provides an ability to store virtual disk related bitmaps in a qcow2
>>> image. For now there is only one type of such bitmaps: Dirty Tracking
>>> Bitmap, which just tracks virtual disk changes from some moment.
>>>
>>> Note: Only bitmaps, relative to the virtual disk, stored in qcow2 file,
>>> should be stored in this qcow2 file. The size of each bitmap
>>> (considering its granularity) is equal to virtual disk size.
>>>
>>> Signed-off-by: Vladimir Sementsov-Ogievskiy 
>>> ---
>>>
>>> @@ -166,6 +179,34 @@ the header extension data. Each entry look like
>>> this:
>>>   terminated if it has full length)
>>> +== Bitmaps extension ==
>>> +
>>> +Bitmaps extension is an optional header extension. It provides an
>>> ability to
>>> +store virtual disk related bitmaps in a qcow2 image. For now there
>>> is only one
>>> +type of such bitmaps: Dirty Tracking Bitmap, which just tracks
>>> virtual disk
>>> +changes from some moment.
>> This is already the qcow2 spec, so 'in a qcow2 image' may be redundant.
>>   Possible idea for nicer grammar:
>>
>> It provides the ability to store bitmaps related to a virtual disk.  For
>> now, there is only one bitmap type: Dirty Tracking Bitmap, which tracks
>> virtual disk changes from some moment.
>>
>>
>>> + 17:granularity_bits
>>> +Granularity bits. Valid values are: 0 - 63.
>> Elsewhere, the file has 'valid values: 0-63'; dropping 'are' would make
>> this more consistent.
>>
>>> +
>>> +Note: Qemu currently doesn't support
>>> granularity_bits
>>> +greater than 31.
>>> +
>>> +Granularity is calculated as
>>> +granularity = 1 << granularity_bits
>>> +
>>> +Granularity of the bitmap is how many bytes of
>>> the image
>>> +accounts for one bit of the bitmap.
>>> +
>>> +18 - 19:name_size
>>> +Size of the bitmap name. Valid values: 1 - 1023.
>> Should this be more like:
>> Must be non-zero. Note: Qemu currently doesn't support values greater
>> than 1023.
>>
>>
>>> +=== Bitmap Data ===
>>> +
>>> +As noted above, bitmap data is stored in several (or may be one,
>>> exactly
>>> +bitmap_table_size) separate clusters, described by Bitmap Table.
>> bitmap_table_size was documented as "Number of entries in the Bitmap
>> Table of the bitmap", where each entry is 8 bytes.  But this sounds like
>> bitmap_table_size == 1 implies that the table is exactly 1 cluster (at
>> least 512 bytes).  I think you are trying to imply that the bitmap data
>> occupies ceil(cluster size / 8 / bitmap_tablesize) clusters.
> 
> I don't understand.. No. Bitmap data occupies bitmap_table_size
> clusters. The last one may have some meaningless remaining bits. If
> bitmap_table_size = 1, than bitmap data is stored in "exactly 1"
> cluster. Bitmap table is like page table.
> 

Eric is referring to earlier in the spec where you state:

"bitmap_table_size" "Number of entries in the Bitmap Table of the bitmap."

But later on, it appears as if "bitmap_table_size" refers to a number of
clusters:

"As noted above, bitmap data is stored in several (or may be one,
exactly bitmap_table_size) separate clusters"

Here, one may read "bitmap_table_size" to be referring to a cluster
count -- which is only indirectly true.


I think what is meant is this:

- bitmap_table_size refers to the number of bitmap table entries.
- each bitmap table entry indicates a cluster's worth of data.
- "bitmap_table_size := 0x01" implies eight bytes for the header, but an
entire cluster for data.

So "indirectly," bitmap_table_size refers to the number of clusters that
contain bitmap data, but to be precise, it actually refers to the number of
bitmap table entries.

Correct?
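
For what it's worth, the relationship can be written down as a small
sketch. This is only my reading of the thread, not text from the spec: it
assumes one bitmap bit per granularity-sized chunk of the virtual disk, one
8-byte table entry per data cluster, and rounding up at every step. The
function names are made up for the illustration.

```c
#include <stdint.h>

/* ceil(a / b) without risking overflow in a + b - 1; b must be non-zero. */
static uint64_t div_round_up(uint64_t a, uint64_t b)
{
    return a / b + (a % b != 0);
}

/*
 * Number of bitmap table entries, which is also the number of clusters
 * holding bitmap data (one entry per cluster).
 */
static uint64_t bitmap_table_entries(uint64_t disk_size,
                                     unsigned granularity_bits,
                                     uint64_t cluster_size)
{
    uint64_t granularity = UINT64_C(1) << granularity_bits;
    uint64_t bits = div_round_up(disk_size, granularity); /* one bit per chunk */
    uint64_t bytes = div_round_up(bits, 8);

    return div_round_up(bytes, cluster_size);
}

/* Size of the table itself: eight bytes per entry. */
static uint64_t bitmap_table_bytes(uint64_t entries)
{
    return entries * 8;
}
```

For a 1 GiB disk with granularity_bits = 16 and 64 KiB clusters this gives a
single entry: eight bytes of table, but a whole cluster of bitmap data, of
which only the first 2048 bytes carry meaningful bits.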

>>
>> I also wonder if you need more text to cover what happens when the
>> number of entries does not end on a cluster boundary.  Must the
>> remaining bits of the cluster containing the tail of the Bitmap be set
>> to all 0, or is it garbage that must be ignored regardless of content?
>>
> 
>

Re: [Qemu-devel] [PATCH v6] spec: add qcow2 bitmaps extension specification

2016-01-04 Thread Max Reitz

On 23.12.2015 18:49, Vladimir Sementsov-Ogievskiy wrote:
> The new feature for qcow2: storing bitmaps.
> 
> This patch adds new header extension to qcow2 - Bitmaps Extension. It
> provides an ability to store virtual disk related bitmaps in a qcow2
> image. For now there is only one type of such bitmaps: Dirty Tracking
> Bitmap, which just tracks virtual disk changes from some moment.
> 
> Note: Only bitmaps, relative to the virtual disk, stored in qcow2 file,
> should be stored in this qcow2 file. The size of each bitmap
> (considering its granularity) is equal to virtual disk size.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---

OK, keeping my eyes open for grammar etc.; here goes nothing:

Generally, most of the qcow2 specification does not consider structure
names to be proper nouns, that is, they are generally written starting
with lower letters (e.g. "refcount table", "L2 table", "image header").

I'd prefer this spelling for all the structures presented herein (e.g.
"bitmaps extension", "bitmap directory", "dirty tracking bitmap", ...),
too, but that is probably a personal preference.

>  docs/specs/qcow2.txt | 161 ++-
>  1 file changed, 160 insertions(+), 1 deletion(-)
> 
> diff --git a/docs/specs/qcow2.txt b/docs/specs/qcow2.txt
> index 121dfc8..b23966a 100644
> --- a/docs/specs/qcow2.txt
> +++ b/docs/specs/qcow2.txt
> @@ -103,7 +103,19 @@ in the description of a field.
>  write to an image with unknown auto-clear features if it
>  clears the respective bits from this field first.
>  
> -Bits 0-63:  Reserved (set to 0)
> +Bit 0:  Bitmaps extension bit.
> +This bit is responsible for Bitmaps extension
> +consistency.
> +
> +If it is set, but there is no Bitmaps
> +extension, this should be considered as an
> +error.

Maybe correct (this is why a native speaker should be doing this...),
but I'd omit the "as" (i.e. just "this should be considered an error").

> +
> +If it is not set, but there is a Bitmaps
> +extension, its data should be considered as
> +inconsistent.

Same here.

> +
> +Bits 1-63:  Reserved (set to 0)
>  
>   96 -  99:  refcount_order
>  Describes the width of a reference count block entry (width
> @@ -123,6 +135,7 @@ be stored. Each extension has a structure like the following:
>  0x00000000 - End of the header extension area
>  0xE2792ACA - Backing file format name
>  0x6803f857 - Feature name table
> +0x23852875 - Bitmaps extension
>  other  - Unknown header extension, can be safely
>   ignored
>  
> @@ -166,6 +179,34 @@ the header extension data. Each entry look like this:
>  terminated if it has full length)
>  
>  
> +== Bitmaps extension ==
> +
> +Bitmaps extension is an optional header extension. It provides an ability to

"The Bitmaps extension..."

Also, while it may be correct as it is, "provides the ability" sounds
better to me.

> +store virtual disk related bitmaps in a qcow2 image. For now there is only one
> +type of such bitmaps: Dirty Tracking Bitmap, which just tracks virtual disk

I'd prefer "The Dirty Tracking Bitmap" or "Dirty Tracking Bitmaps".

> +changes from some moment.

I think "moment" tends to denote a time span in English. So this
should be "point in time" instead.

(Maybe even "a certain point in time" rather than "some point in time",
but that would be a semantic change.)

> +
> +The data of the extension should be considered as consistent only if

Again, the "as" can be dropped (and maybe it should be).

Also, it should be "...only if the corresponding..." ("the" missing).

> +corresponding auto-clear feature bit is set (see autoclear_features above).
> +
> +The fields of Bitmaps extension are:

"...of the Bitmaps..."

> +
> +  0 -  3:  nb_bitmaps
> +   The number of bitmaps contained in the image. Must be
> +   greater or equal to 1.
> +
> +   Note: Qemu currently only supports up to 65535 bitmaps per
> +   image.
> +
> +  4 -  7:  bitmap_directory_size
> +   Size of the Bitmap Directory in bytes. It is a cumulative

"...It is the cumulative size..."

> +   size of all (nb_bitmaps) bitmap headers.
> +
> +  8 - 15:  bitmap_directory_offset
> +   Offset into the image file at which the Bitmap Directory
> +   starts. Must be aligned to a cluster boundary.
> +
> +
>  == Hos

Re: [Qemu-devel] [PATCH] sdhci: add quirk property for card insert interrupt status on Raspberry Pi

2016-01-04 Thread Andrew Baumann

> From: Peter Crosthwaite [mailto:crosthwaitepe...@gmail.com]
> Sent: Thursday, 31 December 2015 21:38
> On Thu, Dec 31, 2015 at 1:40 PM, Andrew Baumann
>  wrote:
> > This quirk is a workaround for the following hardware behaviour, on
> > which UEFI (specifically, the bootloader for Windows on Pi2) depends:
> >
> > 1. at boot with an SD card present, the interrupt status/enable
> >registers are initially zero
> > 2. upon enabling it in the interrupt enable register, the card insert
> >bit in the interrupt status register is immediately set
> > 3. after a subsequent controller reset, the card insert interrupt does
> >not fire, even if enabled in the interrupt enable register
> >
> 
> This is a baffling symptom. Does prnsts card ejection state fully work
> with physical card ejections and insertions both before and after the
> subsequent controller reset?

I just tested this, by polling prnsts and intsts in a tight loop at board 
startup. At power on with a card inserted, prnsts reads 1FFF. Subsequent 
removal of the card, re-insertion etc. does not change its value. After 
enabling interrupts, I reliably see a card insert interrupt in intsts. If I 
then write zero to the interrupt enable register, the pending card insert 
interrupt remains, which seems to dispel the "mask on read" theory. Once acked 
or reset, the card insert interrupt never recurs. I never saw a card removal 
interrupt.

I did once see a card interrupt (0x100, i.e. the one that comes from the card 
itself, not the controller) after re-inserting the card, but I think that's 
irrelevant.

It's impossible to boot the Pi without having a card inserted (well, maybe with 
a jtag debugger), but I did try inserting the card around 0.5s after applying 
power, and the results were the same.

So, without the prnsts bits, I can't confirm or deny your theory about 
debouncing logic, but either way there is a reliable ghost of a card insertion 
interrupt that is signalled at power on, and remains pending until it is either 
acked or the controller reset, after which point it never recurs. And I'd 
really like to model that somehow without making a mess of sdhci.c :) Any ideas?
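
One possible shape for this, purely as a sketch: a single "armed" flag that
is set at power on and consumed the first time the card insert bit is
enabled. Everything here is hypothetical (QuirkModel and the quirk_*()
functions are not sdhci.c code), the bit position is an assumption, and
write-1-to-clear acking is assumed. Latching at enable time rather than at
power on should be indistinguishable to the guest, since the status register
reads zero until then.

```c
#include <stdbool.h>
#include <stdint.h>

#define CARD_INSERT_BIT 0x0040u /* assumed position of the card insert bit */

/* Hypothetical model of just the registers involved in the quirk. */
typedef struct {
    uint32_t int_status;
    uint32_t int_enable;
    bool ghost_insert_armed; /* true from power on until consumed or reset */
} QuirkModel;

static void quirk_power_on(QuirkModel *m)
{
    m->int_status = 0;
    m->int_enable = 0;
    m->ghost_insert_armed = true;
}

static void quirk_write_enable(QuirkModel *m, uint32_t val)
{
    m->int_enable = val;
    if ((val & CARD_INSERT_BIT) && m->ghost_insert_armed) {
        m->int_status |= CARD_INSERT_BIT; /* the ghost insertion shows up */
        m->ghost_insert_armed = false;    /* and can never recur */
    }
    /* Clearing enable bits deliberately leaves pending status bits alone. */
}

/* Assumed write-1-to-clear acknowledge of the status register. */
static void quirk_ack(QuirkModel *m, uint32_t val)
{
    m->int_status &= ~val;
}

static void quirk_controller_reset(QuirkModel *m)
{
    m->int_status = 0;
    m->int_enable = 0;
    m->ghost_insert_armed = false; /* no ghost interrupt after a reset */
}
```

The flag would be the only extra state; whether it belongs behind a quirk
property or in a board-specific subclass is a separate question.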

Andrew

Re: [Qemu-devel] [Bug 1529859] [NEW] qemu 2.5.0 ivshmem segfault with msi=off option

2016-01-04 Thread Marc-André Lureau

See previously posted patch:

http://lists.gnu.org/archive/html/qemu-stable/2015-12/msg00034.html

On Mon, Jan 4, 2016 at 9:24 PM, Eric Blake  wrote:
> On 12/29/2015 06:38 AM, maquefel wrote:
>> Public bug reported:
>>
>> Launching qemu with "-device ivshmem,chardev=ivshmemid,msi=off -chardev
>> socket,path=/tmp/ivshmem_socket,id=ivshmemid"
>>
>> Causes segfault because, s->msi_vectors is not initialized and
>> s->msi_vectors == 0.
>>
>> Does ivshmem exactly need this line ? :
>>
>> s->msi_vectors[vector].pdev = pdev;
>>
>> It makes no sence for me.
>>
>> Subject: [PATCH] fixed ivshmem empty msi vector on msi=off segfault
>
> Patches require a Signed-off-by: line before they can be applied.
>
>>
>> ---
>>  hw/misc/ivshmem.c | 9 -
>>  1 file changed, 4 insertions(+), 5 deletions(-)
>>
>> diff --git a/hw/misc/ivshmem.c b/hw/misc/ivshmem.c
>> index f73f0c2..2087d5e 100644
>> --- a/hw/misc/ivshmem.c
>> +++ b/hw/misc/ivshmem.c
>> @@ -359,8 +359,6 @@ static CharDriverState* create_eventfd_chr_device(void * opaque, EventNotifier *
>>  int eventfd = event_notifier_get_fd(n);
>>  CharDriverState *chr;
>>
>> -s->msi_vectors[vector].pdev = pdev;
>> -
>
> This avoids the segfault, but it may break other uses. Are you sure you
> don't need an 'if (s->msi_vectors[vector])' conditional?
>
>>  chr = qemu_chr_open_eventfd(eventfd);
>>
>>  if (chr == NULL) {
>> @@ -1038,10 +1036,11 @@ static void pci_ivshmem_exit(PCIDevice *dev)
>>  }
>>
>>  if (ivshmem_has_feature(s, IVSHMEM_MSI)) {
>> -msix_uninit_exclusive_bar(dev);
>> +msix_uninit_exclusive_bar(dev);
>
> I can't see what's changing here.  Whitespace?
>
>>  }
>> -
>> -g_free(s->msi_vectors);
>> +
>> +if(s->msi_vectors)
>> +   g_free(s->msi_vectors);
>
> This hunk is bogus.  g_free(NULL) already works properly.
>
> --
> Eric Blake   eblake redhat com+1-919-301-3266
> Libvirt virtualization library http://libvirt.org
>



-- 
Marc-André Lureau

Re: [Qemu-devel] [PATCH 2/6] device_tree: introduce load_device_tree_from_sysfs

2016-01-04 Thread Peter Maydell

On 4 January 2016 at 17:37, Eric Auger  wrote:
> Hi Peter,
> On 12/18/2015 03:10 PM, Peter Maydell wrote:
>> Does this code compile on non-Linux hosts? (You've put it in a file
>> which is built everywhere, but it's definitely semantically Linux
>> specific.)
>
> I struggled quite a lot while cross-compiling all dependencies for W32
> (~ http://wiki.qemu.org/Hosts/W32).
>
> Eventually device_tree.c compiles but there is a link issue since lstat
> does not seem to be available with MinGW
>
> But there is definitively a problem with hw/arm/sysbus-fdt.c which is
> not compiling due to the inclusion of #include 
>
> So thanks for raising the concern.
>
> With respect to read_fstree, what is your sugestion: shall I keep it in
> device_tree.c while protecting it with a CONFIG_LINUX or is it better to
> move it, for instance in hw/arm/sysbus-fdt.c?

I don't have a strong opinion, but I don't think this code
is arm-specific, so hw/arm doesn't sound quite right.
A CONFIG_LINUX ifdef might be simplest if there's no obvious
other file to put this.
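
A rough sketch of what the ifdef approach could look like, so non-Linux
hosts still compile and link. fstree_node_is_dir() is a made-up helper for
illustration, not the function from the patch; the point is only that the
lstat() call lives inside the CONFIG_LINUX branch and other hosts get a stub
that reports failure.

```c
#ifdef CONFIG_LINUX
#include <sys/stat.h>

/* Returns 1 if path is a directory, 0 if it is not, -1 on error. */
static int fstree_node_is_dir(const char *path)
{
    struct stat st;

    if (lstat(path, &st) < 0) {
        return -1;
    }
    return S_ISDIR(st.st_mode) ? 1 : 0;
}
#else
/* Stub for hosts without the Linux device tree filesystem: always fails. */
static int fstree_node_is_dir(const char *path)
{
    (void)path;
    return -1;
}
#endif
```

Callers then need no ifdefs of their own; they just see an error on hosts
where the feature does not exist.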

thanks
-- PMM

Re: [Qemu-devel] [PATCH v5 4/6] expose floppy drive geometry and CMOS type

2016-01-04 Thread Michael S. Tsirkin

On Mon, Jan 04, 2016 at 03:44:42PM -0500, John Snow wrote:
> 
> 
> On 12/30/2015 03:11 PM, Roman Kagan wrote:
> > Make it possible to query the geometry and the CMOS type of a floppy
> > drive outside of the respective source files.
> > 
> > It will be useful, in particular, when dynamically building ACPI tables,
> > and will allow to properly populate the corresponding ACPI objects and
> > thus enable BIOS-less systems to access the floppy drives.
> > 
> > Signed-off-by: Roman Kagan 
> > Cc: "Michael S. Tsirkin" 
> > Cc: Eduardo Habkost 
> > Cc: Igor Mammedov 
> > Cc: John Snow 
> > Cc: Kevin Wolf 
> > Cc: Paolo Bonzini 
> > Cc: Richard Henderson 
> > Cc: qemu-bl...@nongnu.org
> > Cc: qemu-sta...@nongnu.org
> > ---
> > no changes since v4
> > 
> > changes since v3:
> >  - split out into a separate patch to faciliate review
> > 
> >  hw/block/fdc.c | 11 +++
> >  hw/i386/pc.c   |  2 +-
> >  include/hw/block/fdc.h |  2 ++
> >  include/hw/i386/pc.h   |  1 +
> >  4 files changed, 15 insertions(+), 1 deletion(-)
> > 
> > diff --git a/hw/block/fdc.c b/hw/block/fdc.c
> > index 4292ece..c858c5f 100644
> > --- a/hw/block/fdc.c
> > +++ b/hw/block/fdc.c
> > @@ -2408,6 +2408,17 @@ FDriveType isa_fdc_get_drive_type(ISADevice *fdc, int i)
> >  return isa->state.drives[i].drive;
> >  }
> >  
> > +void isa_fdc_get_drive_geometry(ISADevice *fdc, int i, uint8_t *cylinders,
> > +uint8_t *heads, uint8_t *sectors)
> > +{
> > +FDCtrlISABus *isa = ISA_FDC(fdc);
> > +FDrive *drv = &isa->state.drives[i];
> > +
> > +*cylinders = drv->max_track;
> > +*heads = (drv->flags & FDISK_DBL_SIDES) ? 2 : 1;
> > +*sectors = drv->last_sect;
> > +}
> > +
> >  static const VMStateDescription vmstate_isa_fdc ={
> >  .name = "fdc",
> >  .version_id = 2,
> > diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> > index c36b8cf..99fab83 100644
> > --- a/hw/i386/pc.c
> > +++ b/hw/i386/pc.c
> > @@ -199,7 +199,7 @@ static void pic_irq_request(void *opaque, int irq, int level)
> >  
> >  #define REG_EQUIPMENT_BYTE  0x14
> >  
> > -static int cmos_get_fd_drive_type(FDriveType fd0)
> > +int cmos_get_fd_drive_type(FDriveType fd0)
> >  {
> >  int val;
> >  
> > diff --git a/include/hw/block/fdc.h b/include/hw/block/fdc.h
> > index d48b2f8..adaf3dc 100644
> > --- a/include/hw/block/fdc.h
> > +++ b/include/hw/block/fdc.h
> > @@ -22,5 +22,7 @@ void sun4m_fdctrl_init(qemu_irq irq, hwaddr io_base,
> > DriveInfo **fds, qemu_irq *fdc_tc);
> >  
> >  FDriveType isa_fdc_get_drive_type(ISADevice *fdc, int i);
> > +void isa_fdc_get_drive_geometry(ISADevice *fdc, int i, uint8_t *cylinders,
> > +uint8_t *heads, uint8_t *sectors);
> >  
> >  #endif
> > diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
> > index 819..d044a9a 100644
> > --- a/include/hw/i386/pc.h
> > +++ b/include/hw/i386/pc.h
> > @@ -268,6 +268,7 @@ typedef void (*cpu_set_smm_t)(int smm, void *arg);
> >  void ioapic_init_gsi(GSIState *gsi_state, const char *parent_name);
> >  
> >  ISADevice *pc_find_fdc0(void);
> > +int cmos_get_fd_drive_type(FDriveType fd0);
> >  
> >  /* acpi_piix.c */
> >  
> > 
> 
> Patches 1,4:
> 
> Reviewed-by: John Snow 
> 
> Aside: Why did they have you split out the test changes to be separate
> from the code? Doesn't that introduce commits where the tests now fail?
> 
> --js

It's only a warning not a failure.

Re: [Qemu-devel] [PATCH v4 00/14] qemu-img map: Allow driver to return file of the allocated block

2016-01-04 Thread Max Reitz

On 24.12.2015 06:50, Fam Zheng wrote:
> v4: Rebase and resend, adding Eric's and Stefan's reviewed-by.
> 
> Fix one typo in patch 13.
> 
> Drop previous patch 14 for a later rework because it is not a hard
> requirement, but it is pending on Eric's QAPI-to-JSON visitor series:
> 
> https://lists.gnu.org/archive/html/qemu-devel/2015-12/msg03929.html
> 
> v3: Add Eric's rev-by in patches 6, 7, 13, 14.
> 12: New, split out from the previous 13.
> 12->13: Refactor "entry_mergable" from imp_map().
> Don't mess the merge conditions. [Paolo]
> Address Eric's comments:
> - Check has_foo before using foo.
> - Remove blank line between comments and definition in schema.
> - Use PRId64 instead of %ld.
> - Merge short lines.
> 
> v2: Add Eric's rev-by in patches 2, 4, 5.
> 01: Refering -> referring in commit message. [Eric]
> Recurse to "file" for sensible "zero" flag. [Paolo]
> 12: New. Make MapEntry a QAPI struct. [Paolo, Markus]
> 
> Original cover letter
> -
> 
> I stumbled upon this when looking at external bitmap formats.
> 
> Current "qemu-img map" command only displays filename if the data is allocated
> in bs (bs->file) itself, or in the backing chain. Otherwise, it displays an
> unfriendly error message:
> 
> $ qemu-img create -f vmdk -o subformat=monolithicFlat /tmp/test.vmdk 1G
> 
> $ qemu-img map /tmp/test.vmdk
> Offset  Length  Mapped to   File
> qemu-img: File contains external, encrypted or compressed clusters.
> 
> This can be improved. This series extends the .bdrv_co_get_block_status
> callback, to let block driver return the BDS of file; then updates all driver
> to implement it; and lastly, it changes qemu-img to use this information in
> "map" command:
> 
> 
> $ qemu-img map /tmp/test.vmdk
> Offset  Length  Mapped to   File
> 0   0x40000000  0   /tmp/test-flat.vmdk
> 
> $ qemu-img map --output json /tmp/test.vmdk
> [{"length": 1073741824, "start": 0, "zero": false, "offset": 0, "depth": 0,
>   "file": "/tmp/test-flat.vmdk", "data": true}
> ]
> 
> Fam Zheng (14):
>   block: Add "file" output parameter to block status query functions
>   qcow: Assign bs->file->bs to file in qcow_co_get_block_status
>   qcow2: Assign bs->file->bs to file in qcow2_co_get_block_status

Minor comment: I'd swap these two patches (2 and 3). Patch 1 breaks test
102, patch 3 fixes it again. It would be better to break it for as short
a time as possible.

Alternatively, in order not to break 102 at all, patch 1 would need to
leave the "if (bs->file &&" part of bdrv_co_get_block_status()
(@@ -1544,13 +1550,14 @@) as-is and change it only after the format
block drivers set *file.

Max

>   raw: Assign bs to file in raw_co_get_block_status
>   iscsi: Assign bs to file in iscsi_co_get_block_status
>   parallels: Assign bs->file->bs to file in
> parallels_co_get_block_status
>   qed: Assign bs->file->bs to file in bdrv_qed_co_get_block_status
>   sheepdog: Assign bs to file in sd_co_get_block_status
>   vdi: Assign bs->file->bs to file in vdi_co_get_block_status
>   vpc: Assign bs->file->bs to file in vpc_co_get_block_status
>   vmdk: Return extent's file in bdrv_get_block_status
>   qemu-img: In "map", use the returned "file" from bdrv_get_block_status
>   qemu-img: Make MapEntry a QAPI struct
>   iotests: Add "qemu-img map" test for VMDK extents
> 
>  block/io.c | 42 -
>  block/iscsi.c  |  9 --
>  block/mirror.c |  3 +-
>  block/parallels.c  |  3 +-
>  block/qcow.c   |  3 +-
>  block/qcow2.c  |  3 +-
>  block/qed.c|  6 +++-
>  block/raw-posix.c  |  4 ++-
>  block/raw_bsd.c|  4 ++-
>  block/sheepdog.c   |  5 ++-
>  block/vdi.c|  3 +-
>  block/vmdk.c   | 13 
>  block/vpc.c|  4 ++-
>  block/vvfat.c  |  2 +-
>  include/block/block.h  |  6 ++--
>  include/block/block_int.h  |  3 +-
>  qapi/block-core.json   | 27 
>  qemu-img.c | 78 --
>  tests/qemu-iotests/059 | 10 ++
>  tests/qemu-iotests/059.out | 38 ++
>  20 files changed, 198 insertions(+), 68 deletions(-)
> 





Re: [Qemu-devel] [PATCH] macio: fix overflow in lba to offset conversion for ATAPI devices

2016-01-04 Thread John Snow



On 01/04/2016 03:54 PM, Mark Cave-Ayland wrote:
> On 04/01/16 20:36, John Snow wrote:
> 
>> On 01/04/2016 02:15 PM, Mark Cave-Ayland wrote:
>>> On 04/01/16 19:04, P J P wrote:
>>>
>>>> +-- On Mon, 4 Jan 2016, Mark Cave-Ayland wrote --+
>>>> |  /* Calculate current offset */
>>>> | -offset = (int64_t)(s->lba << 11) + s->io_buffer_index;
>>>> | +offset = ((int64_t)(s->lba) << 11) + s->io_buffer_index;
>>>>
>>>> Maybe ((int64_t)s->lba << 11) ? No parenthesis around s->lba.
>>>
>>> Yes that works here too (perhaps I was just being over-cautious).
>>> Alex/John, please let me know if you want me to resubmit.
>>>
>>
>> PJP's version should work just fine. I won't ask you to resubmit, though...
> 
> Great, thanks :)
> 
>> ...But, well, while we're here, I have a question for you:
>>
>> So s->lba is an int that we left shift by 11 for a max of (2^43 - 2^11)
>> then we add it against s->io_buffer_index, a uint64_t, so this statement
>> could still in theory overflow.
>>
>> Except not really, since io_buffer_index is bounded (in general) by
>> io_buffer_total_len, which is usually (IDE_DMA_BUF_SECTORS*512 + 4) ->
>> ~132K.
>>
>> I don't think there's any rigorous bounds-checking of io_buffer_index,
>> just ad-hoc checking when we're good enough to remember to do it. And we
>> don't seem to do it anywhere in macio. Is it worth peppering in an
>> assert somewhere that io_buffer_index is reasonably small?
> 
> The DBDMA engine is limited to 16-bit transfers so the maximum transfer
> size is 64K, and s->io_buffer_index is used to hold the current position
> within this transfer so unless we get some very large disks I think we
> should be okay here?
> 

For all non-malicious uses of the code, yes.

If I want to apply some rigorous checking to this bound I should just
add a function to manipulate it centrally in core.c, I think.
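
Something along these lines is what I have in mind, as a sketch only.
BufState is a hypothetical cut-down struct holding just the two fields
mentioned in this thread (the type of io_buffer_total_len is an assumption),
and the helper names are invented; nothing like this exists in core.c today.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical subset of the IDE state: only the fields discussed here. */
typedef struct {
    uint64_t io_buffer_index;
    uint64_t io_buffer_total_len;
} BufState;

/* Single choke point for updating the index, so the bound is checked once. */
static void buf_set_index(BufState *s, uint64_t new_index)
{
    assert(new_index <= s->io_buffer_total_len);
    s->io_buffer_index = new_index;
}

/* Advance by nbytes; checked against the remaining space to avoid wraparound. */
static void buf_advance_index(BufState *s, uint64_t nbytes)
{
    assert(nbytes <= s->io_buffer_total_len - s->io_buffer_index);
    buf_set_index(s, s->io_buffer_index + nbytes);
}
```

Whether an assert is the right response to a guest-triggerable condition is
debatable; the sketch only shows where the check would live.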

> 
> ATB,
> 
> Mark.
> 


I'll pull this and edit it to PJP's suggestion.

--js

Re: [Qemu-devel] [PATCH v4 01/14] block: Add "file" output parameter to block status query functions

2016-01-04 Thread Max Reitz

On 24.12.2015 06:50, Fam Zheng wrote:
> The added parameter can be used to return the BDS pointer which the
> valid offset is referring to. It's value should be ignored unless

*Its

> BDRV_BLOCK_OFFSET_VALID in ret is set.
> 
> Until block drivers fill in the right value, let's clear it explicitly
> right before calling .bdrv_get_block_status.
> 
> Reviewed-by: Stefan Hajnoczi 
> Signed-off-by: Fam Zheng 
> ---
>  block/io.c| 42 --
>  block/iscsi.c |  6 --
>  block/mirror.c|  3 ++-
>  block/parallels.c |  2 +-
>  block/qcow.c  |  2 +-
>  block/qcow2.c |  2 +-
>  block/qed.c   |  3 ++-
>  block/raw-posix.c |  3 ++-
>  block/raw_bsd.c   |  3 ++-
>  block/sheepdog.c  |  2 +-
>  block/vdi.c   |  2 +-
>  block/vmdk.c  |  2 +-
>  block/vpc.c   |  2 +-
>  block/vvfat.c |  2 +-
>  include/block/block.h |  6 --
>  include/block/block_int.h |  3 ++-
>  qemu-img.c|  7 +--
>  17 files changed, 59 insertions(+), 33 deletions(-)
> 

[...]

> diff --git a/include/block/block.h b/include/block/block.h
> index db8e096..70b4984 100644
> --- a/include/block/block.h
> +++ b/include/block/block.h

The comment explaining BDRV_BLOCK_OFFSET_VALID should be changed
accordingly (you could also say "fixed", because apparently it wasn't
always bs->file; sometimes it was bs itself (in case of raw-posix, iscsi
and sheepdog)).

Max




Re: [Qemu-devel] [PATCH] macio: fix overflow in lba to offset conversion for ATAPI devices

2016-01-04 Thread Mark Cave-Ayland

On 04/01/16 20:36, John Snow wrote:

> On 01/04/2016 02:15 PM, Mark Cave-Ayland wrote:
>> On 04/01/16 19:04, P J P wrote:
>>
>>> +-- On Mon, 4 Jan 2016, Mark Cave-Ayland wrote --+
>>> |  /* Calculate current offset */
>>> | -offset = (int64_t)(s->lba << 11) + s->io_buffer_index;
>>> | +offset = ((int64_t)(s->lba) << 11) + s->io_buffer_index;
>>>
>>> Maybe ((int64_t)s->lba << 11) ? No parenthesis around s->lba.
>>
>> Yes that works here too (perhaps I was just being over-cautious).
>> Alex/John, please let me know if you want me to resubmit.
>>
> 
> PJP's version should work just fine. I won't ask you to resubmit, though...

Great, thanks :)

> ...But, well, while we're here, I have a question for you:
> 
> So s->lba is an int that we left shift by 11 for a max of (2^43 - 2^11)
> then we add it against s->io_buffer_index, a uint64_t, so this statement
> could still in theory overflow.
> 
> Except not really, since io_buffer_index is bounded (in general) by
> io_buffer_total_len, which is usually (IDE_DMA_BUF_SECTORS*512 + 4) ->
> ~132K.
> 
> I don't think there's any rigorous bounds-checking of io_buffer_index,
> just ad-hoc checking when we're good enough to remember to do it. And we
> don't seem to do it anywhere in macio. Is it worth peppering in an
> assert somewhere that io_buffer_index is reasonably small?

The DBDMA engine is limited to 16-bit transfers so the maximum transfer
size is 64K, and s->io_buffer_index is used to hold the current position
within this transfer so unless we get some very large disks I think we
should be okay here?


ATB,

Mark.
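
For the archive, a small sketch of why the position of the cast matters. It
assumes a 32-bit int and a non-negative lba. Because overflowing a signed
shift is undefined behaviour, the pre-patch expression is only modelled
here, with unsigned arithmetic standing in for the wrap one would typically
observe; both function names are invented for the example.

```c
#include <stdint.h>

/*
 * Model of the old expression, (int64_t)(lba << 11): the shift happens in
 * 32 bits, so the upper bits are already gone before the value is widened.
 */
static int64_t offset_shift_then_widen(int lba, uint64_t io_buffer_index)
{
    uint32_t wrapped = (uint32_t)lba << 11; /* emulated 32-bit wrap */

    return (int64_t)(int32_t)wrapped + (int64_t)io_buffer_index;
}

/* The fixed expression, ((int64_t)lba << 11): widen first, then shift. */
static int64_t offset_widen_then_shift(int lba, uint64_t io_buffer_index)
{
    return ((int64_t)lba << 11) + (int64_t)io_buffer_index;
}
```

Any lba of 0x100000 or more is enough to make the two disagree, i.e.
anything 2 GiB or further into the medium with 2048-byte sectors.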

Re: [Qemu-devel] [PATCH v2 1/1] xlnx-zynqmp: Add support for high DDR memory regions

2016-01-04 Thread Alistair Francis

On Wed, Dec 30, 2015 at 6:35 PM, Peter Crosthwaite
 wrote:
> On Wed, Dec 30, 2015 at 6:19 PM, Peter Crosthwaite
>  wrote:
>> This concept might also be relevant to rPI work, where the SoC aliases
>> RAM. CC Andrew.
>>
>> On Wed, Dec 16, 2015 at 11:27 AM, Alistair Francis wrote:
>>> The Xilinx ZynqMP SoC and EP108 board support three memory regions:
>>>  - A 2GB region starting at 0
>>>  - A 32GB region starting at 32GB
>>>  - A 256GB region starting at 768GB
>>>
>>> This patch adds support for the first two memory regions, which are
>>> automatically created based on the size specified by the QEMU memory
>>> command line argument.
>>>
>>> On hardware the physical memory region is one continuous region; it is then
>>> mapped into the three different regions by the DDRC. As we don't model the
>>> DDRC this is done at startup by QEMU. The board creates the memory region
>>> and then passes that memory region to the SoC. The SoC then maps the memory
>>> regions.
>>>
>>> Signed-off-by: Alistair Francis 
>>> ---
>>> V2:
>>>  - Create one continuous memory region and pass it to the SoC
>>>
>>> Also, the Xilinx ZynqMP TRM is available at:
>>> http://www.xilinx.com/products/silicon-devices/soc/zynq-ultrascale-mpsoc.html?resultsTablePreSelect=documenttype:User%20Guides#documentation
>>>
>>>  hw/arm/xlnx-ep108.c  | 42 +++---
>>>  hw/arm/xlnx-zynqmp.c | 31 +++
>>>  include/hw/arm/xlnx-zynqmp.h | 12 
>>>  3 files changed, 66 insertions(+), 19 deletions(-)
>>>
>>> diff --git a/hw/arm/xlnx-ep108.c b/hw/arm/xlnx-ep108.c
>>> index 85b978f..a0174d5 100644
>>> --- a/hw/arm/xlnx-ep108.c
>>> +++ b/hw/arm/xlnx-ep108.c
>>> @@ -22,12 +22,9 @@
>>>
>>>  typedef struct XlnxEP108 {
>>>  XlnxZynqMPState soc;
>>> -MemoryRegion ddr_ram;
>>> +MemoryRegion ddr_board_ram;
>>
>> Rename not needed. The Machine is the board and has no other DDR RAM
>> to refer to other than the off-SoC DDR chips.
>>
>>>  } XlnxEP108;
>>>
>>> -/* Max 2GB RAM */
>>> -#define EP108_MAX_RAM_SIZE 0x80000000ull
>>> -
>>>  static struct arm_boot_info xlnx_ep108_binfo;
>>>
>>>  static void xlnx_ep108_init(MachineState *machine)
>>> @@ -35,20 +32,12 @@ static void xlnx_ep108_init(MachineState *machine)
>>>  XlnxEP108 *s = g_new0(XlnxEP108, 1);
>>>  Error *err = NULL;
>>>
>>> -object_initialize(&s->soc, sizeof(s->soc), TYPE_XLNX_ZYNQMP);
>>> -object_property_add_child(OBJECT(machine), "soc", OBJECT(&s->soc),
>>> -  &error_abort);
>>> -
>>> -object_property_set_bool(OBJECT(&s->soc), true, "realized", &err);
>>> -if (err) {
>>> -error_report("%s", error_get_pretty(err));
>>> -exit(1);
>>> -}
>>> -
>>> -if (machine->ram_size > EP108_MAX_RAM_SIZE) {
>>> +/* Create the memory region to pass to the SoC */
>>> +if (machine->ram_size > XLNX_ZYNQMP_MAX_RAM_SIZE) {
>>>  error_report("WARNING: RAM size " RAM_ADDR_FMT " above max supported, "
>>> - "reduced to %llx", machine->ram_size, EP108_MAX_RAM_SIZE);
>>> -machine->ram_size = EP108_MAX_RAM_SIZE;
>>> + "reduced to %llx", machine->ram_size,
>>> + XLNX_ZYNQMP_MAX_RAM_SIZE);
>>> +machine->ram_size = XLNX_ZYNQMP_MAX_RAM_SIZE;
>>>  }
>>>
>>>  if (machine->ram_size < 0x0800) {
>>> @@ -56,9 +45,24 @@ static void xlnx_ep108_init(MachineState *machine)
>>>   machine->ram_size);
>>>  }
>>>
>>> -memory_region_allocate_system_memory(&s->ddr_ram, NULL, "ddr-ram",
>>> +memory_region_allocate_system_memory(&s->ddr_board_ram, NULL,
>>> + "xlnx-zynqmp-board-ram",
>>>   machine->ram_size);
>>> -memory_region_add_subregion(get_system_memory(), 0, &s->ddr_ram);
>>> +
>>> +object_initialize(&s->soc, sizeof(s->soc), TYPE_XLNX_ZYNQMP);
>>> +object_property_add_child(OBJECT(machine), "soc", OBJECT(&s->soc),
>>> +  &error_abort);
>>> +
>>> +object_property_set_int(OBJECT(&s->soc), machine->ram_size,
>>> +"ram-size", &error_abort);
>>> +object_property_set_link(OBJECT(&s->soc), OBJECT(&s->ddr_board_ram),
>>> + "xlnx-zynqmp-board-ram", &error_abort);
>>> +
>>> +object_property_set_bool(OBJECT(&s->soc), true, "realized", &err);
>>> +if (err) {
>>> +error_report("%s", error_get_pretty(err));
>>> +exit(1);
>>> +}
>>>
>>>  xlnx_ep108_binfo.ram_size = machine->ram_size;
>>>  xlnx_ep108_binfo.kernel_filename = machine->kernel_filename;
>>> diff --git a/hw/arm/xlnx-zynqmp.c b/hw/arm/xlnx-zynqmp.c
>>> index 87553bb..848d1ff 100644
>>> --- a/hw/arm/xlnx-zynqmp.c
>>> +++ b/hw/arm/xlnx-zynqmp.c
>>> @@ -90,6 +90,11 @@ static void xlnx_zynqmp_init(Object *obj)
>>>&error_abort);
>>>

[Qemu-devel] [PATCH v3 1/1] xlnx-zynqmp: Add support for high DDR memory regions

2016-01-04 Thread Alistair Francis

The Xilinx ZynqMP SoC and EP108 board support three memory regions:
 - A 2GB region starting at 0
 - A 32GB region starting at 32GB
 - A 256GB region starting at 768GB

This patch adds support for the first two memory regions, which are
automatically created based on the size specified by the QEMU memory
command line argument.

On hardware the physical memory region is one continuous region; it is then
mapped into the three different regions by the DDRC. As we don't model the
DDRC this is done at startup by QEMU. The board creates the memory region and
then passes that memory region to the SoC. The SoC then maps the memory
regions.
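
As a rough illustration of the split described above (not taken from the
patch; DDR_LOW_MAX, DDR_HIGH_BASE, DDR_HIGH_MAX, DdrSplit and split_ddr() are
names made up for this sketch), one contiguous RAM size could be divided
between the low and high windows like this:

```c
#include <assert.h>
#include <stdint.h>

#define GiB            (1024ULL * 1024 * 1024)
#define DDR_LOW_MAX    (2 * GiB)   /* low window: 2GB starting at 0 */
#define DDR_HIGH_BASE  (32 * GiB)  /* high window starts at 32GB */
#define DDR_HIGH_MAX   (32 * GiB)  /* high window: 32GB long */

/* Hypothetical result type: how much RAM lands in each window. */
typedef struct {
    uint64_t low_size;   /* mapped at address 0 */
    uint64_t high_size;  /* mapped at DDR_HIGH_BASE, 0 if not needed */
} DdrSplit;

/* Fill the low window first and spill the remainder into the high window. */
static DdrSplit split_ddr(uint64_t ram_size)
{
    DdrSplit split;

    /* The third (768GB) window is not covered, so cap at low + high. */
    assert(ram_size <= DDR_LOW_MAX + DDR_HIGH_MAX);

    split.low_size = ram_size < DDR_LOW_MAX ? ram_size : DDR_LOW_MAX;
    split.high_size = ram_size - split.low_size;
    return split;
}
```

With this arrangement a 6GB guest would get 2GB at 0 and 4GB at DDR_HIGH_BASE,
while anything up to 2GB stays entirely in the low window.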

Signed-off-by: Alistair Francis 
---
V3:
 - Assert on the RAM sizes
 - Remove ram_size property
 - General fixes
V2:
 - Create one continuous memory region and pass it to the SoC

Also, the Xilinx ZynqMP TRM is available at:
http://www.xilinx.com/products/silicon-devices/soc/zynq-ultrascale-mpsoc.html?resultsTablePreSelect=documenttype:User%20Guides#documentation

 hw/arm/xlnx-ep108.c  | 38 --
 hw/arm/xlnx-zynqmp.c | 36 
 include/hw/arm/xlnx-zynqmp.h | 13 +
 3 files changed, 69 insertions(+), 18 deletions(-)

diff --git a/hw/arm/xlnx-ep108.c b/hw/arm/xlnx-ep108.c
index 85b978f..d55663b 100644
--- a/hw/arm/xlnx-ep108.c
+++ b/hw/arm/xlnx-ep108.c
@@ -25,9 +25,6 @@ typedef struct XlnxEP108 {
 MemoryRegion ddr_ram;
 } XlnxEP108;
 
-/* Max 2GB RAM */
-#define EP108_MAX_RAM_SIZE 0x80000000ull
-
 static struct arm_boot_info xlnx_ep108_binfo;
 
 static void xlnx_ep108_init(MachineState *machine)
@@ -35,20 +32,12 @@ static void xlnx_ep108_init(MachineState *machine)
 XlnxEP108 *s = g_new0(XlnxEP108, 1);
 Error *err = NULL;
 
-object_initialize(&s->soc, sizeof(s->soc), TYPE_XLNX_ZYNQMP);
-object_property_add_child(OBJECT(machine), "soc", OBJECT(&s->soc),
-  &error_abort);
-
-object_property_set_bool(OBJECT(&s->soc), true, "realized", &err);
-if (err) {
-error_report("%s", error_get_pretty(err));
-exit(1);
-}
-
-if (machine->ram_size > EP108_MAX_RAM_SIZE) {
+/* Create the memory region to pass to the SoC */
+if (machine->ram_size > XLNX_ZYNQMP_MAX_RAM_SIZE) {
 error_report("WARNING: RAM size " RAM_ADDR_FMT " above max supported, "
- "reduced to %llx", machine->ram_size, EP108_MAX_RAM_SIZE);
-machine->ram_size = EP108_MAX_RAM_SIZE;
+ "reduced to %llx", machine->ram_size,
+ XLNX_ZYNQMP_MAX_RAM_SIZE);
+machine->ram_size = XLNX_ZYNQMP_MAX_RAM_SIZE;
 }
 
 if (machine->ram_size < 0x0800) {
@@ -56,9 +45,22 @@ static void xlnx_ep108_init(MachineState *machine)
  machine->ram_size);
 }
 
-memory_region_allocate_system_memory(&s->ddr_ram, NULL, "ddr-ram",
+memory_region_allocate_system_memory(&s->ddr_ram, NULL,
+ "ddr-ram",
  machine->ram_size);
-memory_region_add_subregion(get_system_memory(), 0, &s->ddr_ram);
+
+object_initialize(&s->soc, sizeof(s->soc), TYPE_XLNX_ZYNQMP);
+object_property_add_child(OBJECT(machine), "soc", OBJECT(&s->soc),
+  &error_abort);
+
+object_property_set_link(OBJECT(&s->soc), OBJECT(&s->ddr_ram),
+ "ddr-ram", &error_abort);
+
+object_property_set_bool(OBJECT(&s->soc), true, "realized", &err);
+if (err) {
+error_report("%s", error_get_pretty(err));
+exit(1);
+}
 
 xlnx_ep108_binfo.ram_size = machine->ram_size;
 xlnx_ep108_binfo.kernel_filename = machine->kernel_filename;
diff --git a/hw/arm/xlnx-zynqmp.c b/hw/arm/xlnx-zynqmp.c
index 87553bb..e749fd0 100644
--- a/hw/arm/xlnx-zynqmp.c
+++ b/hw/arm/xlnx-zynqmp.c
@@ -90,6 +90,11 @@ static void xlnx_zynqmp_init(Object *obj)
   &error_abort);
 }
 
+object_property_add_link(obj, "ddr-ram", TYPE_MEMORY_REGION,
+ (Object **)&s->ddr_ram,
+ qdev_prop_allow_set_link_before_realize,
+ OBJ_PROP_LINK_UNREF_ON_RELEASE, &error_abort);
+
 object_initialize(&s->gic, sizeof(s->gic), TYPE_ARM_GIC);
 qdev_set_parent_bus(DEVICE(&s->gic), sysbus_get_default());
 
@@ -120,9 +125,40 @@ static void xlnx_zynqmp_realize(DeviceState *dev, Error **errp)
 MemoryRegion *system_memory = get_system_memory();
 uint8_t i;
 const char *boot_cpu = s->boot_cpu ? s->boot_cpu : "apu-cpu[0]";
+ram_addr_t ddr_low_size, ddr_high_size;
 qemu_irq gic_spi[GIC_NUM_SPI_INTR];
 Error *err = NULL;
 
+s->ram_size = memory_region_size(s->ddr_ram);
+
+/* Create the DDR Memory Regions. User friendly checks should happen at
+ * the board level
+ */
+if (s->ram_size > XLNX_ZYNQMP_MAX_LOW_RAM_SIZE) {
+/* The RAM

Re: [Qemu-devel] [PATCH v4 11/14] vmdk: Return extent's file in bdrv_get_block_status

2016-01-04 Thread Max Reitz

On 24.12.2015 06:50, Fam Zheng wrote:
> Reviewed-by: Stefan Hajnoczi 
> Signed-off-by: Fam Zheng 
> ---
>  block/vmdk.c | 11 +--
>  1 file changed, 5 insertions(+), 6 deletions(-)
> 
> diff --git a/block/vmdk.c b/block/vmdk.c
> index f5a56fd..b60a5af 100644
> --- a/block/vmdk.c
> +++ b/block/vmdk.c
> @@ -1265,6 +1265,7 @@ static int64_t coroutine_fn vmdk_co_get_block_status(BlockDriverState *bs,
>   0, 0);
>  qemu_co_mutex_unlock(&s->lock);
>  
> +index_in_cluster = vmdk_find_index_in_cluster(extent, sector_num);
>  switch (ret) {
>  case VMDK_ERROR:
>  ret = -EIO;
> @@ -1276,15 +1277,13 @@ static int64_t coroutine_fn vmdk_co_get_block_status(BlockDriverState *bs,
>  ret = BDRV_BLOCK_ZERO;
>  break;
>  case VMDK_OK:
> -ret = BDRV_BLOCK_DATA;
> -if (extent->file == bs->file && !extent->compressed) {
> -ret |= BDRV_BLOCK_OFFSET_VALID | offset;
> -}
> -
> +ret = BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID;
> +ret |= (offset + (index_in_cluster << BDRV_SECTOR_BITS))
> +& BDRV_BLOCK_OFFSET_MASK;
> +*file = extent->file->bs;

What if the extent is compressed?

Max
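
To make the question concrete, here is a small sketch of how the status word
is packed and of one possible way to treat compressed extents (keeping the
old behaviour of not advertising an offset for them). This is not what the
patch does; data_status() is a made-up helper, and the flag values are
assumed to match the block layer headers of the time:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed values, for illustration only. */
#define BDRV_SECTOR_BITS        9
#define BDRV_BLOCK_DATA         0x01
#define BDRV_BLOCK_OFFSET_VALID 0x04
#define BDRV_BLOCK_OFFSET_MASK  (~((INT64_C(1) << BDRV_SECTOR_BITS) - 1))

/*
 * Pack the flags into the low bits and the sector-aligned byte offset into
 * the remaining bits. For a compressed extent the host offset does not map
 * 1:1 to guest data, so this sketch reports plain DATA without an offset.
 */
static int64_t data_status(int64_t cluster_offset, int64_t index_in_cluster,
                           bool compressed)
{
    int64_t ret = BDRV_BLOCK_DATA;

    assert(cluster_offset >= 0 && index_in_cluster >= 0);

    if (!compressed) {
        ret |= BDRV_BLOCK_OFFSET_VALID;
        ret |= (cluster_offset + (index_in_cluster << BDRV_SECTOR_BITS))
               & BDRV_BLOCK_OFFSET_MASK;
    }
    return ret;
}
```

Whether *file should still be set in the compressed case is a separate
question that the sketch leaves open.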

>  break;
>  }
>  
> -index_in_cluster = vmdk_find_index_in_cluster(extent, sector_num);
>  n = extent->cluster_sectors - index_in_cluster;
>  if (n > nb_sectors) {
>  n = nb_sectors;
> 





Re: [Qemu-devel] [PATCH v5 4/6] expose floppy drive geometry and CMOS type

2016-01-04 Thread John Snow



On 12/30/2015 03:11 PM, Roman Kagan wrote:
> Make it possible to query the geometry and the CMOS type of a floppy
> drive outside of the respective source files.
> 
> It will be useful, in particular, when dynamically building ACPI tables,
> and will make it possible to properly populate the corresponding ACPI
> objects and thus enable BIOS-less systems to access the floppy drives.
> 
> Signed-off-by: Roman Kagan 
> Cc: "Michael S. Tsirkin" 
> Cc: Eduardo Habkost 
> Cc: Igor Mammedov 
> Cc: John Snow 
> Cc: Kevin Wolf 
> Cc: Paolo Bonzini 
> Cc: Richard Henderson 
> Cc: qemu-bl...@nongnu.org
> Cc: qemu-sta...@nongnu.org
> ---
> no changes since v4
> 
> changes since v3:
>  - split out into a separate patch to facilitate review
> 
>  hw/block/fdc.c | 11 +++
>  hw/i386/pc.c   |  2 +-
>  include/hw/block/fdc.h |  2 ++
>  include/hw/i386/pc.h   |  1 +
>  4 files changed, 15 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/block/fdc.c b/hw/block/fdc.c
> index 4292ece..c858c5f 100644
> --- a/hw/block/fdc.c
> +++ b/hw/block/fdc.c
> @@ -2408,6 +2408,17 @@ FDriveType isa_fdc_get_drive_type(ISADevice *fdc, int i)
>  return isa->state.drives[i].drive;
>  }
>  
> +void isa_fdc_get_drive_geometry(ISADevice *fdc, int i, uint8_t *cylinders,
> +uint8_t *heads, uint8_t *sectors)
> +{
> +FDCtrlISABus *isa = ISA_FDC(fdc);
> +FDrive *drv = &isa->state.drives[i];
> +
> +*cylinders = drv->max_track;
> +*heads = (drv->flags & FDISK_DBL_SIDES) ? 2 : 1;
> +*sectors = drv->last_sect;
> +}
> +
>  static const VMStateDescription vmstate_isa_fdc ={
>  .name = "fdc",
>  .version_id = 2,
> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> index c36b8cf..99fab83 100644
> --- a/hw/i386/pc.c
> +++ b/hw/i386/pc.c
> @@ -199,7 +199,7 @@ static void pic_irq_request(void *opaque, int irq, int level)
>  
>  #define REG_EQUIPMENT_BYTE  0x14
>  
> -static int cmos_get_fd_drive_type(FDriveType fd0)
> +int cmos_get_fd_drive_type(FDriveType fd0)
>  {
>  int val;
>  
> diff --git a/include/hw/block/fdc.h b/include/hw/block/fdc.h
> index d48b2f8..adaf3dc 100644
> --- a/include/hw/block/fdc.h
> +++ b/include/hw/block/fdc.h
> @@ -22,5 +22,7 @@ void sun4m_fdctrl_init(qemu_irq irq, hwaddr io_base,
> DriveInfo **fds, qemu_irq *fdc_tc);
>  
>  FDriveType isa_fdc_get_drive_type(ISADevice *fdc, int i);
> +void isa_fdc_get_drive_geometry(ISADevice *fdc, int i, uint8_t *cylinders,
> +uint8_t *heads, uint8_t *sectors);
>  
>  #endif
> diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
> index 819..d044a9a 100644
> --- a/include/hw/i386/pc.h
> +++ b/include/hw/i386/pc.h
> @@ -268,6 +268,7 @@ typedef void (*cpu_set_smm_t)(int smm, void *arg);
>  void ioapic_init_gsi(GSIState *gsi_state, const char *parent_name);
>  
>  ISADevice *pc_find_fdc0(void);
> +int cmos_get_fd_drive_type(FDriveType fd0);
>  
>  /* acpi_piix.c */
>  
> 

Patches 1,4:

Reviewed-by: John Snow 

Aside: Why did they have you split out the test changes to be separate
from the code? Doesn't that introduce commits where the tests now fail?

--js

Re: [Qemu-devel] [RFC PATCH 0/3] x86: Add support for guest DMA dirty page tracking

2016-01-04 Thread Konrad Rzeszutek Wilk

On Sun, Dec 13, 2015 at 01:28:09PM -0800, Alexander Duyck wrote:
> This patch set is meant to be the guest side code for a proof of concept
> involving leaving pass-through devices in the guest during the warm-up
> phase of guest live migration.  In order to accomplish this I have added a

What does that mean? 'warm-up-phase'? 

> new function called dma_mark_dirty that will mark the pages associated with
> the DMA transaction as dirty in the case of either an unmap or a
> sync_.*_for_cpu where the DMA direction is either DMA_FROM_DEVICE or
> DMA_BIDIRECTIONAL.  The pass-through device must still be removed before
> the stop-and-copy phase, however allowing the device to be present should
> significantly improve the performance of the guest during the warm-up
> period.

.. if the warm-up phase is short I presume? If the warm-up phase takes
a long time (busy guest that is of 1TB size) it wouldn't help much as the
tracking of these DMA's may be quite long?
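
For reference, the condition described in the quoted paragraph boils down to
something like the sketch below. It is only an illustration, not code from
the series: dma_needs_dirty() and dma_pages_touched() are made-up helpers,
the enum is redeclared locally, and 4 KiB pages are assumed. The page count
is what would have to be marked on every unmap or sync, which is why the
per-transaction cost matters for a long warm-up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local copy of the direction enum, for a standalone sketch. */
enum dma_data_direction {
    DMA_BIDIRECTIONAL = 0,
    DMA_TO_DEVICE = 1,
    DMA_FROM_DEVICE = 2,
    DMA_NONE = 3,
};

#define PAGE_SHIFT 12  /* assumption: 4 KiB pages */

/* Only transfers the device may have written need their pages dirtied. */
static bool dma_needs_dirty(enum dma_data_direction dir)
{
    return dir == DMA_FROM_DEVICE || dir == DMA_BIDIRECTIONAL;
}

/* Number of pages covered by the buffer [addr, addr + len). */
static uint64_t dma_pages_touched(uint64_t addr, uint64_t len)
{
    uint64_t first, last;

    assert(len > 0);
    first = addr >> PAGE_SHIFT;
    last = (addr + len - 1) >> PAGE_SHIFT;
    return last - first + 1;
}
```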

> 
> This current implementation is very preliminary and there are a number of
> items still missing.  Specifically in order to make this a more complete
> solution we need to support:
> 1.  Notifying hypervisor that drivers are dirtying DMA pages received

.. And somehow giving the hypervisor the GPFN so it can retain the PFN in
the VT-d as long as possible.

> 2.  Bypassing page dirtying when it is not needed.

How would this work with a device doing DMA operations _after_ the migration?
That is, the driver submits a DMA READ... migrates away, device is unplugged,
VT-d context is torn down - device does the DMA READ and gets a VT-d error...

and what then? How should the device on the other host replay the DMA READ?

> 
> The two mechanisms referenced above would likely require coordination with
> QEMU and as such are open to discussion.  I haven't attempted to address
> them as I am not sure there is a consensus as of yet.  My personal
> preference would be to add a vendor-specific configuration block to the
> emulated pci-bridge interfaces created by QEMU that would allow us to
> essentially extend shpc to support guest live migration with pass-through
> devices.

shpc?

> 
> The functionality in this patch set is currently disabled by default.  To
> enable it you can select "SWIOTLB page dirtying" from the "Processor type
> and features" menu.
> 
> ---
> 
> Alexander Duyck (3):
>   swiotlb: Fold static unmap and sync calls into calling functions
>   xen/swiotlb: Fold static unmap and sync calls into calling functions
>   x86: Create dma_mark_dirty to dirty pages used for DMA by VM guest
> 
> 
>  arch/arm/include/asm/dma-mapping.h   |3 +
>  arch/arm64/include/asm/dma-mapping.h |5 +-
>  arch/ia64/include/asm/dma.h  |1 
>  arch/mips/include/asm/dma-mapping.h  |1 
>  arch/powerpc/include/asm/swiotlb.h   |1 
>  arch/tile/include/asm/dma-mapping.h  |1 
>  arch/unicore32/include/asm/dma-mapping.h |1 
>  arch/x86/Kconfig |   11 
>  arch/x86/include/asm/swiotlb.h   |   26 
>  drivers/xen/swiotlb-xen.c|   92 +-
>  lib/swiotlb.c|   83 ---
>  11 files changed, 123 insertions(+), 102 deletions(-)
> 
> --

Re: [Qemu-devel] [PATCH] macio: fix overflow in lba to offset conversion for ATAPI devices

2016-01-04 Thread John Snow

On 01/04/2016 02:15 PM, Mark Cave-Ayland wrote:
> On 04/01/16 19:04, P J P wrote:
> 
>> +-- On Mon, 4 Jan 2016, Mark Cave-Ayland wrote --+
>> |  /* Calculate current offset */
>> | -offset = (int64_t)(s->lba << 11) + s->io_buffer_index;
>> | +offset = ((int64_t)(s->lba) << 11) + s->io_buffer_index;
>>
>> Maybe ((int64_t)s->lba << 11) ? No parenthesis around s->lba.
> 
> Yes that works here too (perhaps I was just being over-cautious).
> Alex/John, please let me know if you want me to resubmit.
> 

PJP's version should work just fine. I won't ask you to resubmit, though...

> 
> ATB,
> 
> Mark.
> 

...But, well, while we're here, I have a question for you:

So s->lba is an int that we left shift by 11 for a max of (2^43 - 2^11)
then we add it against s->io_buffer_index, a uint64_t, so this statement
could still in theory overflow.

Except not really, since io_buffer_index is bounded (in general) by
io_buffer_total_len, which is usually (IDE_DMA_BUF_SECTORS*512 + 4) ->
~132K.

I don't think there's any rigorous bounds-checking of io_buffer_index,
just ad-hoc checking when we're good enough to remember to do it. And we
don't seem to do it anywhere in macio. Is it worth peppering in an
assert somewhere that io_buffer_index is reasonably small?
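
Something along these lines is what I have in mind; treat it as a sketch
rather than a patch. atapi_offset() is a made-up helper, IO_BUFFER_MAX is a
name invented here, and the value of IDE_DMA_BUF_SECTORS is assumed, not
checked against the tree:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed value; the real definition lives in the IDE headers. */
#define IDE_DMA_BUF_SECTORS 256
/* Upper bound described above: IDE_DMA_BUF_SECTORS * 512 + 4 bytes. */
#define IO_BUFFER_MAX ((uint64_t)IDE_DMA_BUF_SECTORS * 512 + 4)

/*
 * Convert an ATAPI lba (2048-byte sectors) plus a position inside the
 * current transfer into a byte offset, checking the inputs first.
 */
static int64_t atapi_offset(int32_t lba, uint64_t io_buffer_index)
{
    assert(lba >= 0);
    assert(io_buffer_index <= IO_BUFFER_MAX);

    /* Widen before shifting so the shift cannot overflow 32 bits. */
    return ((int64_t)lba << 11) + (int64_t)io_buffer_index;
}
```

With both asserts in place the sum stays far below INT64_MAX even for the
largest lba, so the remaining overflow is purely theoretical.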

--js

Re: [Qemu-devel] [Bug 1529859] [NEW] qemu 2.5.0 ivshmem segfault with msi=off option

2016-01-04 Thread Eric Blake

On 12/29/2015 06:38 AM, maquefel wrote:
> Public bug reported:
> 
> Launching qemu with "-device ivshmem,chardev=ivshmemid,msi=off -chardev
> socket,path=/tmp/ivshmem_socket,id=ivshmemid"
> 
> Causes a segfault because s->msi_vectors is not initialized and
> s->msi_vectors == 0.
> 
> Does ivshmem really need this line?
> 
> s->msi_vectors[vector].pdev = pdev;
> 
> It makes no sense to me.
> 
> Subject: [PATCH] fixed ivshmem empty msi vector on msi=off segfault

Patches require a Signed-off-by: line before they can be applied.

> 
> ---
>  hw/misc/ivshmem.c | 9 -
>  1 file changed, 4 insertions(+), 5 deletions(-)
> 
> diff --git a/hw/misc/ivshmem.c b/hw/misc/ivshmem.c
> index f73f0c2..2087d5e 100644
> --- a/hw/misc/ivshmem.c
> +++ b/hw/misc/ivshmem.c
> @@ -359,8 +359,6 @@ static CharDriverState* create_eventfd_chr_device(void * opaque, EventNotifier *
>  int eventfd = event_notifier_get_fd(n);
>  CharDriverState *chr;
>  
> -s->msi_vectors[vector].pdev = pdev;
> -

This avoids the segfault, but it may break other uses. Are you sure you
don't need an 'if (s->msi_vectors[vector])' conditional?

>  chr = qemu_chr_open_eventfd(eventfd);
>  
>  if (chr == NULL) {
> @@ -1038,10 +1036,11 @@ static void pci_ivshmem_exit(PCIDevice *dev)
>  }
>  
>  if (ivshmem_has_feature(s, IVSHMEM_MSI)) {
> -msix_uninit_exclusive_bar(dev);
> +msix_uninit_exclusive_bar(dev);

I can't see what's changing here.  Whitespace?

>  }
> -
> -g_free(s->msi_vectors);
> +
> +if(s->msi_vectors)
> +   g_free(s->msi_vectors);

This hunk is bogus.  g_free(NULL) already works properly.
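
To spell out both points, here is a standalone sketch using plain libc
instead of glib. The types and helpers (MsiVector, ShmState,
set_vector_owner(), shm_state_cleanup()) are stand-ins invented for the
example, not the ivshmem structures:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Stand-in types; the real device state has many more fields. */
typedef struct {
    void *pdev;
} MsiVector;

typedef struct {
    MsiVector *msi_vectors;  /* NULL when MSI is disabled (msi=off) */
    size_t nvectors;
} ShmState;

/* Record the owner only if the vector array exists and the index fits. */
static bool set_vector_owner(ShmState *s, size_t vector, void *pdev)
{
    assert(s != NULL);
    if (s->msi_vectors == NULL || vector >= s->nvectors) {
        return false;
    }
    s->msi_vectors[vector].pdev = pdev;
    return true;
}

/* No NULL check needed: free(NULL) is a no-op, just like g_free(NULL). */
static void shm_state_cleanup(ShmState *s)
{
    assert(s != NULL);
    free(s->msi_vectors);
    s->msi_vectors = NULL;
    s->nvectors = 0;
}
```

That keeps the assignment for the MSI case and skips it when the array was
never allocated, instead of dropping the line altogether.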

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org




Re: [Qemu-devel] How to reserve guest physical region for ACPI

2016-01-04 Thread Laszlo Ersek

Michael CC'd me on the grandparent of the email below. I'll try to add
my thoughts in a single go, with regard to OVMF.

On 12/30/15 20:52, Michael S. Tsirkin wrote:
> On Wed, Dec 30, 2015 at 04:55:54PM +0100, Igor Mammedov wrote:
>> On Mon, 28 Dec 2015 14:50:15 +0200
>> "Michael S. Tsirkin"  wrote:
>>
>>> On Mon, Dec 28, 2015 at 10:39:04AM +0800, Xiao Guangrong wrote:

>>>> Hi Michael, Paolo,
>>>>
>>>> Now it is time to return to the challenge of how to reserve the guest
>>>> physical region used internally by ACPI.
>>>>
>>>> Igor suggested that:
>>>> | An alternative place to allocate reserve from could be high memory.
>>>> | For pc we have "reserved-memory-end" which currently makes sure
>>>> | that hotpluggable memory range isn't used by firmware
>>>> (https://lists.nongnu.org/archive/html/qemu-devel/2015-11/msg00926.html)

OVMF has no support for the "reserved-memory-end" fw_cfg file. The
reason is that nobody wrote that patch, nor asked for the patch to be
written. (Not implying that just requesting the patch would be
sufficient for the patch to be written.)

>>> I don't want to tie things to reserved-memory-end because this
>>> does not scale: next time we need to reserve memory,
>>> we'll need to find yet another way to figure out what is where.
>> Could you elaborate a bit more on a problem you're seeing?
>>
>> To me it looks like it scales rather well.
>> For example, let's imagine that we are adding a device
>> that has some on-device memory that should be mapped into GPA;
>> the code to do so would look like:
>>
>>   pc_machine_device_plug_cb(dev)
>>   {
>>       ...
>>       if (dev == OUR_NEW_DEVICE_TYPE) {
>>           memory_region_add_subregion(as, current_reserved_end, &dev->mr);
>>           set_new_reserved_end(current_reserved_end +
>>                                memory_region_size(&dev->mr));
>>       }
>>   }
>>
>> we can practically add any number of new devices that way.
> 
> Yes but we'll have to build a host side allocator for these, and that's
> nasty. We'll also have to maintain these addresses indefinitely (at
> least per machine version) as they are guest visible.
> Not only that, there's no way for guest to know if we move things
> around, so basically we'll never be able to change addresses.
> 
> 
>>  
>>> I would like ./hw/acpi/bios-linker-loader.c interface to be extended to
>>> support 64 bit RAM instead

This looks quite doable in OVMF, as long as the blob to allocate from
high memory contains *zero* ACPI tables.

(
Namely, each ACPI table is installed from the containing fw_cfg blob
with EFI_ACPI_TABLE_PROTOCOL.InstallAcpiTable(), and the latter has its
own allocation policy for the *copies* of ACPI tables it installs.

This allocation policy is left unspecified in the section of the UEFI
spec that governs EFI_ACPI_TABLE_PROTOCOL.

The current policy in edk2 (= the reference implementation) seems to be
"allocate from under 4GB". It is currently being changed to "try to
allocate from under 4GB, and if that fails, retry from high memory". (It
is motivated by Aarch64 machines that may have no DRAM at all under 4GB.)
)

>>> (and maybe a way to allocate and
>>> zero-initialize buffer without loading it through fwcfg),

Sounds reasonable.

>>> this way bios
>>> does the allocation, and addresses can be patched into acpi.
>> and then guest side needs to parse/execute some AML that would
>> initialize QEMU side so it would know where to write data.
> 
> Well not really - we can put it in a data table, by itself
> so it's easy to find.

Do you mean acpi_tb_find_table(), acpi_get_table_by_index() /
acpi_get_table_with_size()?

> 
> AML is only needed if access from ACPI is desired.
> 
> 
>> bios-linker-loader is a great interface for initializing some
>> guest owned data and linking it together but I think it adds
>> unnecessary complexity and is misused if it's used to handle
>> device owned data/on device memory in this and VMGID cases.
> 
> I want a generic interface for guest to enumerate these things.  linker
> seems quite reasonable but if you see a reason why it won't do, or want
> to propose a better interface, fine.

* The guest could do the following:
- while processing the ALLOCATE commands, it would make a note where in
GPA space each fw_cfg blob gets allocated
- at the end the guest would prepare a temporary array with a predefined
record format, that associates each fw_cfg blob's name with the concrete
allocation address
- it would create an FWCfgDmaAccess structure pointing at this array,
with a new "control" bit set (or something similar)
- the guest could write the address of the FWCfgDmaAccess struct to the
appropriate register, as always.

* Another idea would be a GET_ALLOCATION_ADDRESS linker/loader command,
specifying:
- the fw_cfg blob's name, for which to retrieve the guest-allocated
  address (this command could only follow the matching ALLOCATE
  command, never precede it)
- a flag whether the address should be written to IO or MMIO space
  (would be likely IO on x86, MMIO on ARM)
- a unique uint64_t
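
For the first idea above (a temporary array associating each blob name with
its allocation address), the record format could be sketched as follows.
Everything here is hypothetical: BlobAllocation, record_allocation() and
find_allocation() are invented for the example, the 56-byte name field is
only assumed to match the fw_cfg file name size, and a real format would
also have to fix the byte order of the address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOB_NAME_SIZE 56  /* assumption: same size as fw_cfg file names */

/* One record per ALLOCATE command processed by the firmware. */
typedef struct {
    char name[BLOB_NAME_SIZE];  /* NUL-padded fw_cfg blob name */
    uint64_t address;           /* guest-physical address of the blob */
} BlobAllocation;

/* Append a record; returns 0 on success, -1 if it does not fit. */
static int record_allocation(BlobAllocation *table, size_t capacity,
                             size_t *count, const char *name,
                             uint64_t address)
{
    size_t len;

    assert(table != NULL && count != NULL && name != NULL);
    len = strlen(name);
    if (*count >= capacity || len >= BLOB_NAME_SIZE) {
        return -1;
    }
    memset(&table[*count], 0, sizeof(table[*count]));
    memcpy(table[*count].name, name, len);
    table[*count].address = address;
    (*count)++;
    return 0;
}

/* Host-side lookup of a blob's address by name; NULL if not recorded. */
static const BlobAllocation *find_allocation(const BlobAllocation *table,
                                             size_t count, const char *name)
{
    size_t i;

    assert(table != NULL && name != NULL);
    for (i = 0; i < count; i++) {
        if (strncmp(table[i].name, name, BLOB_NAME_SIZE) == 0) {
            return &table[i];
        }
    }
    return NULL;
}
```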

Re: [Qemu-devel] [PATCH v4 4/5] qmp: Add blockdev-mirror command

2016-01-04 Thread Max Reitz

On 24.12.2015 05:45, Fam Zheng wrote:
> This will start a mirror job from a named device to another named
> device. Its relation to drive-mirror is similar to that of blockdev-backup
> to drive-backup.
> 
> In blockdev-mirror, the target node should be prepared by blockdev-add,
> which will be responsible for assigning a name to the new node, so
> we don't have a 'node-name' parameter.
> 
> Signed-off-by: Fam Zheng 
> Acked-by: Markus Armbruster 
> ---
>  blockdev.c   | 62 
>  qapi/block-core.json | 48 
>  qmp-commands.hx  | 50 +-
>  3 files changed, 159 insertions(+), 1 deletion(-)

Reviewed-by: Max Reitz 




Re: [Qemu-devel] [PATCH v3 4/5] qmp: Add blockdev-mirror command

2016-01-04 Thread Max Reitz

On 24.12.2015 04:25, Fam Zheng wrote:
> On Thu, 12/24 01:53, Max Reitz wrote:
>> On 23.12.2015 06:59, Fam Zheng wrote:
>>> This will start a mirror job from a named device to another named
>>> device. Its relation to drive-mirror is similar to that of blockdev-backup
>>> to drive-backup.
>>>
>>> In blockdev-mirror, the target node should be prepared by blockdev-add,
>>> which will be responsible for assigning a name to the new node, so
>>> we don't have a 'node-name' parameter.
>>>
>>> Signed-off-by: Fam Zheng 
>>> ---
>>>  blockdev.c   | 62 
>>>  qapi/block-core.json | 47 +++
>>>  qmp-commands.hx  | 48 
>>>  3 files changed, 157 insertions(+)
>>
>> It appears you haven't addressed the comments for v2. I only had a
>> single one (regarding documentation), but Markus had a couple ones, so
>> those may be worth addressing.
> 
> Will look into that.
> 
>>
>>>
>>> diff --git a/blockdev.c b/blockdev.c
>>> index f42e171..2df0c6d 100644
>>> --- a/blockdev.c
>>> +++ b/blockdev.c
>>> @@ -3345,6 +3345,10 @@ static void blockdev_mirror_common(BlockDriverState *bs,
>>>  if (bdrv_op_is_blocked(target, BLOCK_OP_TYPE_MIRROR_TARGET, errp)) {
>>>  return;
>>>  }
>>> +if (target->blk) {
>>> +error_setg(errp, "Cannot mirror to an attached block device");
>>> +return;
>>> +}
>>>  
>>>  if (!bs->backing && sync == MIRROR_SYNC_MODE_TOP) {
>>>  sync = MIRROR_SYNC_MODE_FULL;
>>> @@ -3518,6 +3522,64 @@ out:
>>>  aio_context_release(aio_context);
>>>  }
>>>  
>>> +void qmp_blockdev_mirror(const char *device, const char *target,
>>> + bool has_replaces, const char *replaces,
>>> + MirrorSyncMode sync,
>>> + bool has_speed, int64_t speed,
>>> + bool has_granularity, uint32_t granularity,
>>> + bool has_buf_size, int64_t buf_size,
>>> + bool has_on_source_error,
>>> + BlockdevOnError on_source_error,
>>> + bool has_on_target_error,
>>> + BlockdevOnError on_target_error,
>>> + Error **errp)
>>> +{
>>> +BlockDriverState *bs;
>>> +BlockBackend *blk;
>>> +BlockDriverState *target_bs;
>>> +AioContext *aio_context;
>>> +Error *local_err = NULL;
>>> +
>>> +blk = blk_by_name(device);
>>> +if (!blk) {
>>> +error_setg(errp, "Device '%s' not found", device);
>>> +return;
>>> +}
>>> +bs = blk_bs(blk);
>>> +
>>> +if (!bs) {
>>> +error_setg(errp, "Device '%s' has no media", device);
>>> +return;
>>> +}
>>> +
>>> +target_bs = bdrv_lookup_bs(target, target, errp);
>>> +if (!target_bs) {
>>> +return;
>>> +}
>>> +
>>> +aio_context = bdrv_get_aio_context(bs);
>>> +aio_context_acquire(aio_context);
>>> +
>>> +bdrv_ref(target_bs);
>>> +bdrv_set_aio_context(target_bs, aio_context);
>>> +
>>> +blockdev_mirror_common(bs, target_bs,
>>> +   has_replaces, replaces, sync,
>>> +   has_speed, speed,
>>> +   has_granularity, granularity,
>>> +   has_buf_size, buf_size,
>>> +   has_on_source_error, on_source_error,
>>> +   has_on_target_error, on_target_error,
>>> +   true, true,
>>
>> Shouldn't this be "false, false,", or, ideally, set by the user?
> 
> I think true is correct here because then it will be effectively controlled by
> open flags of target. I.e. mirror.c always sets BDRV_REQ_MAY_UNMAP, and
> bdrv_co_write_zeroes has:
> 
> if (!(bs->open_flags & BDRV_O_UNMAP)) {
> flags &= ~BDRV_REQ_MAY_UNMAP;
> }

I was asking because it differs from what drive-mirror does - but that
is probably a good thing (drive-mirror takes this flag from the user
(defaulting to false, which is why I was asking), but it takes the
open_flags for the new image from the mirror source, which is...
Interesting.

So it's probably better this way, right.
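
Put differently, with unmap always requested by the job, the effective
behaviour follows the target's open flags. A minimal sketch of that masking,
with flag values that are only assumed for illustration and a made-up helper
name (effective_write_zeroes_flags()):

```c
#include <assert.h>

/* Illustrative values; the real ones live in the block layer headers. */
#define BDRV_O_UNMAP       0x4000
#define BDRV_REQ_MAY_UNMAP 0x4

/*
 * Mirror of the check quoted above: a request may only unmap if the target
 * was opened with unmap enabled, whatever the caller asked for.
 */
static int effective_write_zeroes_flags(int open_flags, int req_flags)
{
    assert(open_flags >= 0 && req_flags >= 0);

    if (!(open_flags & BDRV_O_UNMAP)) {
        req_flags &= ~BDRV_REQ_MAY_UNMAP;
    }
    return req_flags;
}
```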

Max




Re: [Qemu-devel] [PATCH v8 1/2] mirror: Rewrite mirror_iteration

2016-01-04 Thread Max Reitz

On 24.12.2015 04:15, Fam Zheng wrote:
> The "pnum < nb_sectors" condition in deciding whether to actually copy
> data is unnecessarily strict, and the qiov initialization is
> unnecessary for bdrv_aio_write_zeroes and bdrv_aio_discard.
> 
> Rewrite mirror_iteration to fix both flaws.
> 
> Signed-off-by: Fam Zheng 
> ---
>  block/mirror.c | 344 +++--
>  trace-events   |   1 -
>  2 files changed, 213 insertions(+), 132 deletions(-)
> 
> diff --git a/block/mirror.c b/block/mirror.c
> index f201f2b..0081c2e 100644
> --- a/block/mirror.c
> +++ b/block/mirror.c
> @@ -46,7 +46,6 @@ typedef struct MirrorBlockJob {
>  BlockdevOnError on_source_error, on_target_error;
>  bool synced;
>  bool should_complete;
> -int64_t sector_num;
>  int64_t granularity;
>  size_t buf_size;
>  int64_t bdev_length;
> @@ -63,6 +62,8 @@ typedef struct MirrorBlockJob {
>  int ret;
>  bool unmap;
>  bool waiting_for_io;
> +int target_cluster_sectors;
> +int max_iov;
>  } MirrorBlockJob;
>  
>  typedef struct MirrorOp {
> @@ -158,115 +159,90 @@ static void mirror_read_complete(void *opaque, int ret)
>  mirror_write_complete, op);
>  }
>  
> -static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
> +/* Round sector_num and/or nb_sectors to target cluster if COW is needed, and
> + * return the offset of the adjusted tail sector against original. */
> +static int mirror_cow_align(MirrorBlockJob *s,
> +int64_t *sector_num,
> +int *nb_sectors)
> +{
> +bool head_need_cow, tail_need_cow;
> +int diff = 0;
> +int chunk_sectors = s->granularity >> BDRV_SECTOR_BITS;
> +
> +head_need_cow = !test_bit(*sector_num / chunk_sectors, s->cow_bitmap);
> +tail_need_cow = !test_bit((*sector_num + *nb_sectors - 1) / chunk_sectors,
> +  s->cow_bitmap);
> +if (head_need_cow || tail_need_cow) {
> +int64_t align_sector_num;
> +int align_nb_sectors;
> +bdrv_round_to_clusters(s->target, *sector_num, *nb_sectors,
> +   &align_sector_num, &align_nb_sectors);
> +if (tail_need_cow) {
> +diff = align_sector_num + align_nb_sectors
> +   - (*sector_num + *nb_sectors);
> +assert(diff >= 0);
> +*nb_sectors += diff;
> +}
> +if (head_need_cow) {
> +int d = *sector_num - align_sector_num;
> +assert(d >= 0);
> +*sector_num = align_sector_num;
> +*nb_sectors += d;
> +}
> +}
> +
> +/* If the resulting chunks are more than max_iov, we have to shrink it
> + * under the alignment restriction. */
> +if (*nb_sectors / chunk_sectors > s->max_iov) {
> +int shrink = *nb_sectors / chunk_sectors - s->max_iov;

Isn't this missing a "shrink *= chunk_sectors"? Because after this line,
shrink's unit seems to be chunks, but the following code seems to presume it
is sectors.

> +if (tail_need_cow) {
> +shrink -= shrink % s->target_cluster_sectors;
> +}
> +diff -= shrink;
> +*nb_sectors -= shrink;
> +}

Max

(The rest looks fine.)
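
To illustrate the unit issue, this is roughly what I would expect the shrink
computation to look like once the conversion is added. It is a standalone
sketch, not a replacement hunk; mirror_shrink_sectors() is a made-up helper
and the tail rounding simply mirrors the quoted code:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Return how many sectors to drop from a request spanning nb_sectors so
 * that it needs at most max_iov chunks. All sizes are in sectors.
 */
static int mirror_shrink_sectors(int nb_sectors, int chunk_sectors,
                                 int max_iov, int cluster_sectors,
                                 bool tail_need_cow)
{
    int chunks, shrink;

    assert(nb_sectors >= 0);
    assert(chunk_sectors > 0 && cluster_sectors > 0 && max_iov > 0);

    chunks = nb_sectors / chunk_sectors;
    if (chunks <= max_iov) {
        return 0;
    }

    /* Convert the excess from chunks to sectors before using it. */
    shrink = (chunks - max_iov) * chunk_sectors;
    if (tail_need_cow) {
        /* Keep the tail cluster-aligned, as in the quoted hunk. */
        shrink -= shrink % cluster_sectors;
    }
    return shrink;
}
```

For example, 2048 sectors in 128-sector chunks with max_iov = 10 is 6 chunks
too many, i.e. 768 sectors, not 6.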




Re: [Qemu-devel] [PATCH] macio: fix overflow in lba to offset conversion for ATAPI devices

2016-01-04 Thread Mark Cave-Ayland

On 04/01/16 19:04, P J P wrote:

> +-- On Mon, 4 Jan 2016, Mark Cave-Ayland wrote --+
> |  /* Calculate current offset */
> | -offset = (int64_t)(s->lba << 11) + s->io_buffer_index;
> | +offset = ((int64_t)(s->lba) << 11) + s->io_buffer_index;
> 
> Maybe ((int64_t)s->lba << 11) ? No parenthesis around s->lba.

Yes that works here too (perhaps I was just being over-cautious).
Alex/John, please let me know if you want me to resubmit.

ATB,

Mark.
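
For readers following the thread, a small sketch of why the position of the
cast matters. It uses uint32_t as a stand-in for the lba field so that the
32-bit wrap-around is well defined in C; the type of the real field is not
shown in the thread, and the function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Shift is done in 32 bits and only then widened: the high bits are
 * already lost by the time the cast happens. */
static int64_t offset_shift_then_cast(uint32_t lba, int io_buffer_index)
{
    assert(io_buffer_index >= 0);
    return (int64_t)(lba << 11) + io_buffer_index;
}

/* Value is widened first, so the shift is done in 64 bits and nothing
 * is truncated. */
static int64_t offset_cast_then_shift(uint32_t lba, int io_buffer_index)
{
    assert(io_buffer_index >= 0);
    return ((int64_t)lba << 11) + io_buffer_index;
}
```

For an lba of 0x200000 (2^21) the first form yields 0, while the second yields
2^32. Both spellings of the fix discussed above, with or without the inner
parentheses around s->lba, parse the same way because the cast binds tighter
than the shift.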

Re: [Qemu-devel] [PATCH] macio: fix overflow in lba to offset conversion for ATAPI devices

2016-01-04 Thread P J P

+-- On Mon, 4 Jan 2016, Mark Cave-Ayland wrote --+
|  /* Calculate current offset */
| -offset = (int64_t)(s->lba << 11) + s->io_buffer_index;
| +offset = ((int64_t)(s->lba) << 11) + s->io_buffer_index;

Maybe ((int64_t)s->lba << 11) ? No parenthesis around s->lba.

--
 - P J P
47AF CE69 3A90 54AA 9045 1053 DD13 3D32 FE5B 041F

[Qemu-devel] [PATCH 5/6] nvdimm acpi: let qemu handle _DSM method

2016-01-04 Thread Xiao Guangrong

If the dsm memory is successfully patched, we let QEMU fully emulate
the dsm method.

This patch saves the _DSM input parameters into the dsm memory, tells QEMU
the dsm memory address, then fetches the result from the dsm memory.
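
As a rough illustration of the result half of that exchange, the sketch below
writes a 32-bit little-endian length followed by the payload into a buffer
standing in for the dsm page, which matches the shape of the NvdimmDsmOut
struct in the diff (a len field followed by data). The function name, the
bounds check and the return convention are assumptions made for the example
and are not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: store a little-endian 32-bit length followed by the
 * payload. Returns the number of bytes written, or 0 if it does not fit. */
static size_t dsm_write_result(uint8_t *page, size_t page_size,
                               const uint8_t *data, uint32_t len)
{
    assert(page != NULL);

    if (page_size < 4 || len > page_size - 4) {
        return 0;
    }
    for (int i = 0; i < 4; i++) {
        page[i] = (uint8_t)(len >> (8 * i));   /* length, least significant byte first */
    }
    if (len > 0) {
        memcpy(page + 4, data, len);           /* payload follows the length */
    }
    return 4 + (size_t)len;
}
```

The guest side would then read the length first and only consume that many
payload bytes.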

Signed-off-by: Xiao Guangrong 
---
 hw/acpi/aml-build.c |  27 ++
 hw/acpi/nvdimm.c| 124 ++--
 include/hw/acpi/aml-build.h |   2 +
 3 files changed, 150 insertions(+), 3 deletions(-)

diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
index 677c1a6..e65171f 100644
--- a/hw/acpi/aml-build.c
+++ b/hw/acpi/aml-build.c
@@ -1013,6 +1013,19 @@ Aml *create_field_common(int opcode, Aml *srcbuf, Aml *index, const char *name)
 return var;
 }
 
+/* ACPI 1.0b: 16.2.5.2 Named Objects Encoding: DefCreateField */
+Aml *aml_create_field(Aml *srcbuf, Aml *index, Aml *len, const char *name)
+{
+Aml *var = aml_alloc();
+build_append_byte(var->buf, 0x5B); /* ExtOpPrefix */
+build_append_byte(var->buf, 0x13); /* CreateFieldOp */
+aml_append(var, srcbuf);
+aml_append(var, index);
+aml_append(var, len);
+build_append_namestring(var->buf, "%s", name);
+return var;
+}
+
 /* ACPI 1.0b: 16.2.5.2 Named Objects Encoding: DefCreateDWordField */
 Aml *aml_create_dword_field(Aml *srcbuf, Aml *index, const char *name)
 {
@@ -1439,6 +1452,20 @@ Aml *aml_alias(const char *source_object, const char *alias_object)
 return var;
 }
 
+/* ACPI 1.0b: 16.2.5.4 Type 2 Opcodes Encoding: DefConcat */
+Aml *aml_concatenate(Aml *source1, Aml *source2, Aml *target)
+{
+Aml *var = aml_opcode(0x73 /* ConcatOp */);
+aml_append(var, source1);
+aml_append(var, source2);
+
+if (target) {
+aml_append(var, target);
+}
+
+return var;
+}
+
 void
 build_header(GArray *linker, GArray *table_data,
  AcpiTableHeader *h, const char *sig, int len, uint8_t rev,
diff --git a/hw/acpi/nvdimm.c b/hw/acpi/nvdimm.c
index a72104c..dfccbc0 100644
--- a/hw/acpi/nvdimm.c
+++ b/hw/acpi/nvdimm.c
@@ -369,6 +369,24 @@ static void nvdimm_build_nfit(GSList *device_list, GArray *table_offsets,
 g_array_free(structures, true);
 }
 
+struct NvdimmDsmIn {
+uint32_t handle;
+uint32_t revision;
+uint32_t function;
+   /* the remaining size in the page is used by arg3. */
+union {
+uint8_t arg3[0];
+};
+} QEMU_PACKED;
+typedef struct NvdimmDsmIn NvdimmDsmIn;
+
+struct NvdimmDsmOut {
+/* the size of buffer filled by QEMU. */
+uint32_t len;
+uint8_t data[0];
+} QEMU_PACKED;
+typedef struct NvdimmDsmOut NvdimmDsmOut;
+
 static uint64_t
 nvdimm_dsm_read(void *opaque, hwaddr addr, unsigned size)
 {
@@ -408,11 +426,21 @@ void nvdimm_init_acpi_state(AcpiNVDIMMState *state, MemoryRegion *io,
 
 static void nvdimm_build_common_dsm(Aml *dev)
 {
-Aml *method, *ifctx, *function;
+Aml *method, *ifctx, *function, *unpatched, *field, *high_dsm_mem;
+Aml *result_size, *dsm_mem;
 uint8_t byte_list[1];
 
 method = aml_method(NVDIMM_COMMON_DSM, 4, AML_NOTSERIALIZED);
 function = aml_arg(2);
+dsm_mem = aml_arg(3);
+
+aml_append(method, aml_store(aml_call0(NVDIMM_GET_DSM_MEM), dsm_mem));
+
+/*
+ * do not support any method if DSM memory address has not been
+ * patched.
+ */
+unpatched = aml_if(aml_equal(dsm_mem, aml_int64(0x0)));
 
 /*
  * function 0 is called to inquire what functions are supported by
@@ -421,12 +449,102 @@ static void nvdimm_build_common_dsm(Aml *dev)
 ifctx = aml_if(aml_equal(function, aml_int(0)));
 byte_list[0] = 0 /* No function Supported */;
 aml_append(ifctx, aml_return(aml_buffer(1, byte_list)));
-aml_append(method, ifctx);
+aml_append(unpatched, ifctx);
 
 /* No function is supported yet. */
 byte_list[0] = 1 /* Not Supported */;
-aml_append(method, aml_return(aml_buffer(1, byte_list)));
+aml_append(unpatched, aml_return(aml_buffer(1, byte_list)));
+aml_append(method, unpatched);
+
+/* map DSM memory and IO into ACPI namespace. */
+aml_append(method, aml_operation_region("NPIO", AML_SYSTEM_IO,
+   aml_int(NVDIMM_ACPI_IO_BASE), NVDIMM_ACPI_IO_LEN));
+aml_append(method, aml_operation_region("NRAM", AML_SYSTEM_MEMORY,
+dsm_mem, TARGET_PAGE_SIZE));
+
+/*
+ * DSM notifier:
+ * LNTF: write the low 32 bits of DSM memory.
+ * HNTF: write the high 32 bits of DSM memory and notify QEMU to
+ *   emulate the access.
+ *
+ * They are IO ports so that accessing them will cause VM-exit, the
+ * control will be transferred to QEMU.
+ */
+field = aml_field("NPIO", AML_DWORD_ACC, AML_NOLOCK, AML_PRESERVE);
+aml_append(field, aml_named_field("LNTF",
+   sizeof(uint32_t) * BITS_PER_BYTE));
+aml_append(field, aml_named_field("HNTF",
+   sizeof(uint32_t) * BITS_PER_BYTE));
+aml_append(method, field);
 
+/*
+ * DSM input:
+ * @HDLE: store device

[Qemu-devel] [PATCH 4/6] acpi: allow using acpi named offset for OperationRegion

2016-01-04 Thread Xiao Guangrong

Extend aml_operation_region() to use named object

Signed-off-by: Xiao Guangrong 
---
 hw/acpi/aml-build.c | 4 ++--
 hw/i386/acpi-build.c| 7 ---
 include/hw/acpi/aml-build.h | 2 +-
 3 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
index 83eadb3..677c1a6 100644
--- a/hw/acpi/aml-build.c
+++ b/hw/acpi/aml-build.c
@@ -958,14 +958,14 @@ Aml *aml_package(uint8_t num_elements)
 
 /* ACPI 1.0b: 16.2.5.2 Named Objects Encoding: DefOpRegion */
 Aml *aml_operation_region(const char *name, AmlRegionSpace rs,
-  uint32_t offset, uint32_t len)
+  Aml *offset, uint32_t len)
 {
 Aml *var = aml_alloc();
 build_append_byte(var->buf, 0x5B); /* ExtOpPrefix */
 build_append_byte(var->buf, 0x80); /* OpRegionOp */
 build_append_namestring(var->buf, "%s", name);
 build_append_byte(var->buf, rs);
-build_append_int(var->buf, offset);
+aml_append(var, offset);
 build_append_int(var->buf, len);
 return var;
 }
diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 0836119..ad10c48 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -1139,7 +1139,7 @@ build_ssdt(GArray *table_data, GArray *linker,
 aml_append(dev, aml_name_decl("_CRS", crs));
 
 aml_append(dev, aml_operation_region("PEOR", AML_SYSTEM_IO,
-  misc->pvpanic_port, 1));
+  aml_int(misc->pvpanic_port), 1));
 field = aml_field("PEOR", AML_BYTE_ACC, AML_NOLOCK, AML_PRESERVE);
 aml_append(field, aml_named_field("PEPT", 8));
 aml_append(dev, field);
@@ -1179,7 +1179,8 @@ build_ssdt(GArray *table_data, GArray *linker,
 aml_append(sb_scope, dev);
 /* declare CPU hotplug MMIO region and PRS field to access it */
 aml_append(sb_scope, aml_operation_region(
-"PRST", AML_SYSTEM_IO, pm->cpu_hp_io_base, pm->cpu_hp_io_len));
+"PRST", AML_SYSTEM_IO, aml_int(pm->cpu_hp_io_base),
+pm->cpu_hp_io_len));
 field = aml_field("PRST", AML_BYTE_ACC, AML_NOLOCK, AML_PRESERVE);
 aml_append(field, aml_named_field("PRS", 256));
 aml_append(sb_scope, field);
@@ -1251,7 +1252,7 @@ build_ssdt(GArray *table_data, GArray *linker,
 
 aml_append(scope, aml_operation_region(
 stringify(MEMORY_HOTPLUG_IO_REGION), AML_SYSTEM_IO,
-pm->mem_hp_io_base, pm->mem_hp_io_len)
+aml_int(pm->mem_hp_io_base), pm->mem_hp_io_len)
 );
 
 field = aml_field(stringify(MEMORY_HOTPLUG_IO_REGION), AML_DWORD_ACC,
diff --git a/include/hw/acpi/aml-build.h b/include/hw/acpi/aml-build.h
index b4726a4..a8d8f3b 100644
--- a/include/hw/acpi/aml-build.h
+++ b/include/hw/acpi/aml-build.h
@@ -285,7 +285,7 @@ Aml *aml_interrupt(AmlConsumerAndProducer con_and_pro,
 Aml *aml_io(AmlIODecode dec, uint16_t min_base, uint16_t max_base,
 uint8_t aln, uint8_t len);
 Aml *aml_operation_region(const char *name, AmlRegionSpace rs,
-  uint32_t offset, uint32_t len);
+  Aml *offset, uint32_t len);
 Aml *aml_irq_no_flags(uint8_t irq);
 Aml *aml_named_field(const char *name, unsigned length);
 Aml *aml_reserved_field(unsigned length);
-- 
1.8.3.1

Re: [Qemu-devel] [PATCH COLO-Frame v13 17/39] COLO: Load VMState into qsb before restore it

2016-01-04 Thread Dr. David Alan Gilbert

* zhanghailiang (zhang.zhanghaili...@huawei.com) wrote:
> We should not destroy the state of SVM (Secondary VM) until we receive the whole
> state from the PVM (Primary VM), in case the primary fails in the middle of sending
> the state, so, here we cache the device state in Secondary before restore it.
> 
> Besides, we should call qemu_system_reset() before load VM state,
> which can ensure the data is intact.
> 
> Signed-off-by: zhanghailiang 
> Signed-off-by: Li Zhijian 
> Signed-off-by: Gonglei 
> Reviewed-by: Dr. David Alan Gilbert 
> Cc: Dr. David Alan Gilbert 
> ---
> v13:
> - Fix the define of colo_get_cmd_value() to use 'Error **errp' instead of
>   return value.
> v12:
> - Use the new helper colo_get_cmd_value() instead of colo_ctl_get()
> ---
>  migration/colo.c | 74 ++--
>  1 file changed, 72 insertions(+), 2 deletions(-)
> 
> diff --git a/migration/colo.c b/migration/colo.c
> index 925eb3c..8414feb 100644
> --- a/migration/colo.c
> +++ b/migration/colo.c
> @@ -114,6 +114,28 @@ static void colo_get_check_cmd(QEMUFile *f, COLOCommand expect_cmd,
>  }
>  }
>  
> +static uint64_t colo_get_cmd_value(QEMUFile *f, uint32_t expect_cmd,
> +   Error **errp)
> +{
> +Error *local_err = NULL;
> +uint64_t value;
> +int ret;
> +
> +colo_get_check_cmd(f, expect_cmd, &local_err);
> +if (local_err) {
> +error_propagate(errp, local_err);
> +return 0;
> +}
> +
> +value = qemu_get_be64(f);
> +ret = qemu_file_get_error(f);
> +if (ret < 0) {
> +error_setg_errno(errp, -ret, "Failed to get value for COlO commnd: %s",

Tiny typo; you've used 'CO*l*O' rather than 'COLO' - actually all your other
errors say 'colo', so it's probably best to standardise.

Dave

> + COLOCommand_lookup[expect_cmd]);
> +}
> +return value;
> +}
> +
>  static int colo_do_checkpoint_transaction(MigrationState *s,
>QEMUSizedBuffer *buffer)
>  {
> @@ -297,6 +319,10 @@ static void colo_wait_handle_cmd(QEMUFile *f, int *checkpoint_request,
>  void *colo_process_incoming_thread(void *opaque)
>  {
>  MigrationIncomingState *mis = opaque;
> +QEMUFile *fb = NULL;
> +QEMUSizedBuffer *buffer = NULL; /* Cache incoming device state */
> +uint64_t total_size;
> +uint64_t value;
>  Error *local_err = NULL;
>  int ret;
>  
> @@ -320,6 +346,12 @@ void *colo_process_incoming_thread(void *opaque)
>  goto out;
>  }
>  
> +buffer = qsb_create(NULL, COLO_BUFFER_BASE_SIZE);
> +if (buffer == NULL) {
> +error_report("Failed to allocate colo buffer!");
> +goto out;
> +}
> +
>  colo_put_cmd(mis->to_src_file, COLO_COMMAND_CHECKPOINT_READY,
>   &local_err);
>  if (local_err) {
> @@ -347,7 +379,21 @@ void *colo_process_incoming_thread(void *opaque)
>  goto out;
>  }
>  
> -/* TODO: read migration data into colo buffer */
> +/* read the VM state total size first */
> +value = colo_get_cmd_value(mis->from_src_file,
> + COLO_COMMAND_VMSTATE_SIZE, &local_err);
> +if (local_err) {
> +goto out;
> +}
> +
> +/* read vm device state into colo buffer */
> +total_size = qsb_fill_buffer(buffer, mis->from_src_file, value);
> +if (total_size != value) {
> +error_report("Got %lu VMState data, less than expected %lu",
> + total_size, value);
> +ret = -EINVAL;
> +goto out;
> +}
>  
>  colo_put_cmd(mis->to_src_file, COLO_COMMAND_VMSTATE_RECEIVED,
>   &local_err);
> @@ -355,13 +401,32 @@ void *colo_process_incoming_thread(void *opaque)
>  goto out;
>  }
>  
> -/* TODO: load vm state */
> +/* open colo buffer for read */
> +fb = qemu_bufopen("r", buffer);
> +if (!fb) {
> +error_report("Can't open colo buffer for read");
> +goto out;
> +}
> +
> +qemu_mutex_lock_iothread();
> +qemu_system_reset(VMRESET_SILENT);
> +if (qemu_loadvm_state(fb) < 0) {
> +error_report("COLO: loadvm failed");
> +qemu_mutex_unlock_iothread();
> +goto out;
> +}
> +qemu_mutex_unlock_iothread();
> +
> +/* TODO: flush vm state */
>  
>  colo_put_cmd(mis->to_src_file, COLO_COMMAND_VMSTATE_LOADED,
>   &local_err);
>  if (local_err) {
>  goto out;
>  }
> +
> +qemu_fclose(fb);
> +fb = NULL;
>  }
>  
>  out:
> @@ -370,6 +435,11 @@ out:
>  error_report_err(local_err);
>  }
>  
> +if (fb) {
> +qemu_fclose(fb);
> +}
> +qsb_free(buffer);
> +
>  qemu_mutex_lock_iothread();
>  colo_release_ram_cache();
>

[Qemu-devel] [PATCH 2/6] nvdimm acpi: initialize the resource used by NVDIMM ACPI

2016-01-04 Thread Xiao Guangrong

IO port 0x0a18 - 0x0a20 in the guest is reserved for NVDIMM ACPI emulation.
The table, NVDIMM_DSM_MEM_FILE, will be patched into the NVDIMM ACPI
binary code.

OSPM uses this port to tell QEMU the final address of the DSM memory
and to notify QEMU to emulate the DSM method.

Signed-off-by: Xiao Guangrong 
---
 hw/acpi/Makefile.objs   |  2 +-
 hw/acpi/nvdimm.c| 35 +++
 hw/i386/acpi-build.c| 10 +-
 hw/i386/pc.c|  8 +---
 hw/i386/pc_piix.c   |  5 +
 hw/i386/pc_q35.c|  8 +++-
 include/hw/i386/pc.h|  5 -
 include/hw/mem/nvdimm.h | 25 -
 8 files changed, 82 insertions(+), 16 deletions(-)

diff --git a/hw/acpi/Makefile.objs b/hw/acpi/Makefile.objs
index 095597f..84c082d 100644
--- a/hw/acpi/Makefile.objs
+++ b/hw/acpi/Makefile.objs
@@ -2,7 +2,7 @@ common-obj-$(CONFIG_ACPI_X86) += core.o piix4.o pcihp.o
 common-obj-$(CONFIG_ACPI_X86_ICH) += ich9.o tco.o
 common-obj-$(CONFIG_ACPI_CPU_HOTPLUG) += cpu_hotplug.o
 common-obj-$(CONFIG_ACPI_MEMORY_HOTPLUG) += memory_hotplug.o
-common-obj-$(CONFIG_ACPI_NVDIMM) += nvdimm.o
+obj-$(CONFIG_ACPI_NVDIMM) += nvdimm.o
 common-obj-$(CONFIG_ACPI) += acpi_interface.o
 common-obj-$(CONFIG_ACPI) += bios-linker-loader.o
 common-obj-$(CONFIG_ACPI) += aml-build.o
diff --git a/hw/acpi/nvdimm.c b/hw/acpi/nvdimm.c
index a2c58dd..bc7cd8f 100644
--- a/hw/acpi/nvdimm.c
+++ b/hw/acpi/nvdimm.c
@@ -28,6 +28,7 @@
 
 #include "hw/acpi/acpi.h"
 #include "hw/acpi/aml-build.h"
+#include "hw/nvram/fw_cfg.h"
 #include "hw/mem/nvdimm.h"
 
 static int nvdimm_plugged_device_list(Object *obj, void *opaque)
@@ -367,6 +368,40 @@ static void nvdimm_build_nfit(GSList *device_list, GArray *table_offsets,
 g_array_free(structures, true);
 }
 
+static uint64_t
+nvdimm_dsm_read(void *opaque, hwaddr addr, unsigned size)
+{
+return 0;
+}
+
+static void
+nvdimm_dsm_write(void *opaque, hwaddr addr, uint64_t val, unsigned size)
+{
+}
+
+static const MemoryRegionOps nvdimm_dsm_ops = {
+.read = nvdimm_dsm_read,
+.write = nvdimm_dsm_write,
+.endianness = DEVICE_LITTLE_ENDIAN,
+.valid = {
+.min_access_size = 4,
+.max_access_size = 4,
+},
+};
+
+void nvdimm_init_acpi_state(AcpiNVDIMMState *state, MemoryRegion *io,
+FWCfgState *fw_cfg, Object *owner)
+{
+memory_region_init_io(&state->io_mr, owner, &nvdimm_dsm_ops, state,
+  "nvdimm-acpi-io", NVDIMM_ACPI_IO_LEN);
+memory_region_add_subregion(io, NVDIMM_ACPI_IO_BASE, &state->io_mr);
+
+state->dsm_mem = g_array_new(false, true /* clear */, 1);
+acpi_data_push(state->dsm_mem, TARGET_PAGE_SIZE);
+fw_cfg_add_file(fw_cfg, NVDIMM_DSM_MEM_FILE, state->dsm_mem->data,
+state->dsm_mem->len);
+}
+
 #define NVDIMM_COMMON_DSM  "NCAL"
 
 static void nvdimm_build_common_dsm(Aml *dev)
diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 4674461..0836119 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -39,7 +39,6 @@
 #include "hw/loader.h"
 #include "hw/isa/isa.h"
 #include "hw/acpi/memory_hotplug.h"
-#include "hw/mem/nvdimm.h"
 #include "sysemu/tpm.h"
 #include "hw/acpi/tpm.h"
 #include "sysemu/tpm_backend.h"
@@ -1696,13 +1695,6 @@ static bool acpi_has_iommu(void)
 return intel_iommu && !ambiguous;
 }
 
-static bool acpi_has_nvdimm(void)
-{
-PCMachineState *pcms = PC_MACHINE(qdev_get_machine());
-
-return pcms->nvdimm;
-}
-
 static
 void acpi_build(PcGuestInfo *guest_info, AcpiBuildTables *tables)
 {
@@ -1787,7 +1779,7 @@ void acpi_build(PcGuestInfo *guest_info, AcpiBuildTables *tables)
 build_dmar_q35(tables_blob, tables->linker);
 }
 
-if (acpi_has_nvdimm()) {
+if (guest_info->has_nvdimm) {
 nvdimm_build_acpi(table_offsets, tables_blob, tables->linker,
   pm.dsdt_revision);
 }
diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 459260b..c7819e7 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -1220,6 +1220,8 @@ PcGuestInfo *pc_guest_info_init(PCMachineState *pcms)
 }
 }
 
+guest_info->has_nvdimm = pcms->acpi_nvdimm_state.is_enabled;
+
 guest_info_state->machine_done.notify = pc_guest_info_machine_done;
 qemu_add_machine_init_done_notifier(&guest_info_state->machine_done);
 return guest_info;
@@ -1869,14 +1871,14 @@ static bool pc_machine_get_nvdimm(Object *obj, Error **errp)
 {
 PCMachineState *pcms = PC_MACHINE(obj);
 
-return pcms->nvdimm;
+return pcms->acpi_nvdimm_state.is_enabled;
 }
 
 static void pc_machine_set_nvdimm(Object *obj, bool value, Error **errp)
 {
 PCMachineState *pcms = PC_MACHINE(obj);
 
-pcms->nvdimm = value;
+pcms->acpi_nvdimm_state.is_enabled = value;
 }
 
 static void pc_machine_initfn(Object *obj)
@@ -1915,7 +1917,7 @@ static void pc_machine_initfn(Object *obj)
 &error_abort);
 
 /* nvdimm is disabled on default. */
-pcms->nvdimm = false;
+

[Qemu-devel] [PATCH 3/6] nvdimm acpi: introduce patched dsm memory

2016-01-04 Thread Xiao Guangrong

The dsm memory is used to save the input parameters and store
the dsm result which is filled by QEMU.

The address of the dsm memory is decided by the BIOS and patched into
the int64 object returned by the "MEMA" method.

Signed-off-by: Xiao Guangrong 
---
 hw/acpi/aml-build.c | 12 
 hw/acpi/nvdimm.c| 24 ++--
 include/hw/acpi/aml-build.h |  1 +
 3 files changed, 35 insertions(+), 2 deletions(-)

diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
index 78e1290..83eadb3 100644
--- a/hw/acpi/aml-build.c
+++ b/hw/acpi/aml-build.c
@@ -394,6 +394,18 @@ Aml *aml_int(const uint64_t val)
 }
 
 /*
+ * ACPI 1.0b: 16.2.3 Data Objects Encoding:
+ * encode: QWordConst
+ */
+Aml *aml_int64(const uint64_t val)
+{
+Aml *var = aml_alloc();
+build_append_byte(var->buf, 0x0E); /* QWordPrefix */
+build_append_int_noprefix(var->buf, val, 8);
+return var;
+}
+
+/*
  * helper to construct NameString, which returns Aml object
  * for using with aml_append or other aml_* terms
  */
diff --git a/hw/acpi/nvdimm.c b/hw/acpi/nvdimm.c
index bc7cd8f..a72104c 100644
--- a/hw/acpi/nvdimm.c
+++ b/hw/acpi/nvdimm.c
@@ -28,6 +28,7 @@
 
 #include "hw/acpi/acpi.h"
 #include "hw/acpi/aml-build.h"
+#include "hw/acpi/bios-linker-loader.h"
 #include "hw/nvram/fw_cfg.h"
 #include "hw/mem/nvdimm.h"
 
@@ -402,7 +403,8 @@ void nvdimm_init_acpi_state(AcpiNVDIMMState *state, MemoryRegion *io,
 state->dsm_mem->len);
 }
 
-#define NVDIMM_COMMON_DSM  "NCAL"
+#define NVDIMM_GET_DSM_MEM  "MEMA"
+#define NVDIMM_COMMON_DSM   "NCAL"
 
 static void nvdimm_build_common_dsm(Aml *dev)
 {
@@ -468,7 +470,8 @@ static void nvdimm_build_ssdt(GSList *device_list, GArray *table_offsets,
   GArray *table_data, GArray *linker,
   uint8_t revision)
 {
-Aml *ssdt, *sb_scope, *dev;
+Aml *ssdt, *sb_scope, *dev, *method;
+int offset;
 
 acpi_add_table(table_offsets, table_data);
 
@@ -499,9 +502,26 @@ static void nvdimm_build_ssdt(GSList *device_list, GArray *table_offsets,
 
 aml_append(sb_scope, dev);
 
+/*
+ * leave it at the end of ssdt so that we can conveniently get the
+ * offset of int64 object returned by the function which will be
+ * patched with the real address of the dsm memory by BIOS.
+ */
+method = aml_method(NVDIMM_GET_DSM_MEM, 0, AML_NOTSERIALIZED);
+aml_append(method, aml_return(aml_int64(0x0)));
+aml_append(sb_scope, method);
 aml_append(ssdt, sb_scope);
 /* copy AML table into ACPI tables blob and patch header there */
 g_array_append_vals(table_data, ssdt->buf->data, ssdt->buf->len);
+
+offset = table_data->len - 8;
+
+bios_linker_loader_alloc(linker, NVDIMM_DSM_MEM_FILE, TARGET_PAGE_SIZE,
+ false /* high memory */);
+bios_linker_loader_add_pointer(linker, ACPI_BUILD_TABLE_FILE,
+   NVDIMM_DSM_MEM_FILE, table_data,
+   table_data->data + offset,
+   sizeof(uint64_t));
 build_header(linker, table_data,
 (void *)(table_data->data + table_data->len - ssdt->buf->len),
 "SSDT", ssdt->buf->len, revision, "NVDIMM");
diff --git a/include/hw/acpi/aml-build.h b/include/hw/acpi/aml-build.h
index ef44d02..b4726a4 100644
--- a/include/hw/acpi/aml-build.h
+++ b/include/hw/acpi/aml-build.h
@@ -246,6 +246,7 @@ Aml *aml_name(const char *name_format, ...) GCC_FMT_ATTR(1, 2);
 Aml *aml_name_decl(const char *name, Aml *val);
 Aml *aml_return(Aml *val);
 Aml *aml_int(const uint64_t val);
+Aml *aml_int64(const uint64_t val);
 Aml *aml_arg(int pos);
 Aml *aml_to_integer(Aml *arg);
 Aml *aml_to_hexstring(Aml *src, Aml *dst);
-- 
1.8.3.1
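
A note on why the patch above introduces a fixed-width aml_int64() for the
value that gets patched: the QWordConst encoding is always one prefix byte
(0x0E) followed by eight little-endian bytes, so the patch location can be
computed as "table length minus 8" when the constant is the last thing
emitted. The sketch below shows that arithmetic on a plain byte buffer. Both
function names are hypothetical, and the patching step is only a stand-in for
what the firmware's linker/loader does, not its real interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Append QWordPrefix (0x0E) plus the value as 8 little-endian bytes.
 * Returns the new length of the buffer contents. */
static size_t append_qword_const(uint8_t *buf, size_t len, uint64_t val)
{
    buf[len++] = 0x0E;
    for (int i = 0; i < 8; i++) {
        buf[len++] = (uint8_t)(val >> (8 * i));
    }
    return len;
}

/* Overwrite the trailing 8 bytes with an address, the way a loader could
 * once it has allocated the page (illustrative stand-in only). */
static void patch_trailing_qword(uint8_t *buf, size_t len, uint64_t addr)
{
    assert(len >= 8);
    for (int i = 0; i < 8; i++) {
        buf[len - 8 + i] = (uint8_t)(addr >> (8 * i));
    }
}
```

A variable-width integer encoding would not work here, because a zero
placeholder could be emitted in fewer bytes and leave no room for a 64-bit
address.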

[Qemu-devel] [PATCH 6/6] nvdimm acpi: emulate dsm method

2016-01-04 Thread Xiao Guangrong

Emulate the dsm method after the IO VM-exit.

Currently, we only introduce the framework and no function is actually
supported.

Signed-off-by: Xiao Guangrong 
---
 hw/acpi/aml-build.c |  2 +-
 hw/acpi/nvdimm.c| 83 -
 include/hw/acpi/aml-build.h |  1 +
 include/hw/mem/nvdimm.h | 17 ++
 4 files changed, 101 insertions(+), 2 deletions(-)

diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
index e65171f..5a7644a 100644
--- a/hw/acpi/aml-build.c
+++ b/hw/acpi/aml-build.c
@@ -231,7 +231,7 @@ static void build_extop_package(GArray *package, uint8_t op)
 build_prepend_byte(package, 0x5B); /* ExtOpPrefix */
 }
 
-static void build_append_int_noprefix(GArray *table, uint64_t value, int size)
+void build_append_int_noprefix(GArray *table, uint64_t value, int size)
 {
 int i;
 
diff --git a/hw/acpi/nvdimm.c b/hw/acpi/nvdimm.c
index dfccbc0..7be9857 100644
--- a/hw/acpi/nvdimm.c
+++ b/hw/acpi/nvdimm.c
@@ -390,12 +390,80 @@ typedef struct NvdimmDsmOut NvdimmDsmOut;
 static uint64_t
 nvdimm_dsm_read(void *opaque, hwaddr addr, unsigned size)
 {
+fprintf(stderr, "BUG: we never read _DSM IO Port.\n");
 return 0;
 }
 
 static void
 nvdimm_dsm_write(void *opaque, hwaddr addr, uint64_t val, unsigned size)
 {
+AcpiNVDIMMState *state = opaque;
+NvdimmDsmIn *in;
+hwaddr dsm_mem_addr;
+GArray *out;
+uint32_t buf_size;
+
+nvdimm_debug("write address %#lx value %#lx.\n", addr, val);
+
+if (size != sizeof(uint32_t)) {
+fprintf(stderr, "BUG: invalid IO bit width %#x.\n", size);
+return;
+}
+
+switch (addr) {
+case 0:
+state->low_dsm_mem_addr = val;
+return;
+case sizeof(uint32_t):
+state->high_dsm_mem_addr = val;
+break;
+default:
+fprintf(stderr, "BUG: IO access address %#lx is not dword"
+" aligned.\n", addr);
+return;
+};
+
+dsm_mem_addr = state->low_dsm_mem_addr;
+dsm_mem_addr |= (hwaddr)state->high_dsm_mem_addr << (sizeof(uint32_t) *
+BITS_PER_BYTE);
+nvdimm_debug("dsm address %#lx\n", dsm_mem_addr);
+
+/*
+ * The DSM memory is mapped to guest address space so an evil guest
+ * can change its content while we are doing DSM emulation. Avoid
+ * this by copying DSM memory to QEMU local memory.
+ */
+in = g_malloc(TARGET_PAGE_SIZE);
+cpu_physical_memory_read(dsm_mem_addr, in, TARGET_PAGE_SIZE);
+
+le32_to_cpus(&in->revision);
+le32_to_cpus(&in->function);
+le32_to_cpus(&in->handle);
+
+nvdimm_debug("Revision %#x Handler %#x Function %#x.\n", in->revision,
+ in->handle, in->function);
+
+out = g_array_new(false, true /* clear */, 1);
+
+/*
+ * function 0 is called to inquire what functions are supported by
+ * OSPM
+ */
+if (in->function == 0) {
+build_append_int_noprefix(out, 0 /* No function Supported */,
+  sizeof(uint8_t));
+} else {
+/* No function is supported yet. */
+build_append_int_noprefix(out, 1 /* Not Supported */,
+  sizeof(uint8_t));
+}
+
+buf_size = cpu_to_le32(out->len);
+cpu_physical_memory_write(dsm_mem_addr, &buf_size, sizeof(buf_size));
+cpu_physical_memory_write(dsm_mem_addr + sizeof(buf_size), out->data,
+  out->len);
+g_free(in);
+g_array_free(out, true);
 }
 
 static const MemoryRegionOps nvdimm_dsm_ops = {
@@ -408,6 +476,17 @@ static const MemoryRegionOps nvdimm_dsm_ops = {
 },
 };
 
+static const VMStateDescription nvdimm_acpi_vmstate = {
+.name = "nvdimm_acpi_vmstate",
+.version_id = 1,
+.minimum_version_id = 1,
+.fields = (VMStateField[]) {
+VMSTATE_UINT32(low_dsm_mem_addr, AcpiNVDIMMState),
+VMSTATE_UINT32(high_dsm_mem_addr, AcpiNVDIMMState),
+VMSTATE_END_OF_LIST()
+},
+};
+
 void nvdimm_init_acpi_state(AcpiNVDIMMState *state, MemoryRegion *io,
 FWCfgState *fw_cfg, Object *owner)
 {
@@ -419,6 +498,8 @@ void nvdimm_init_acpi_state(AcpiNVDIMMState *state, MemoryRegion *io,
 acpi_data_push(state->dsm_mem, TARGET_PAGE_SIZE);
 fw_cfg_add_file(fw_cfg, NVDIMM_DSM_MEM_FILE, state->dsm_mem->data,
 state->dsm_mem->len);
+
+vmstate_register(NULL, 0, &nvdimm_acpi_vmstate, state);
 }
 
 #define NVDIMM_GET_DSM_MEM  "MEMA"
@@ -430,7 +511,7 @@ static void nvdimm_build_common_dsm(Aml *dev)
 Aml *result_size, *dsm_mem;
 uint8_t byte_list[1];
 
-method = aml_method(NVDIMM_COMMON_DSM, 4, AML_NOTSERIALIZED);
+method = aml_method(NVDIMM_COMMON_DSM, 4, AML_SERIALIZED);
 function = aml_arg(2);
 dsm_mem = aml_arg(3);
 
diff --git a/include/hw/acpi/aml-build.h b/include/hw/acpi/aml-build.h
index 6c1816e..2fa8daa 100644
--- a/include/hw/acpi/aml-build.h
+++ b/include/hw/acpi/aml-build.h
@@ -

[Qemu-devel] [PATCH 1/6] pc: acpi: bump DSDT/SSDT compliance revision to v2

2016-01-04 Thread Xiao Guangrong

From: Igor Mammedov 

It turns on 64-bit integer handling in OSPM, which will be used for
writing simpler/smaller AML code in following patch.

Tested with Windows XP and Windows Server 2008, Linux:
  * XP doesn't care about revision and continues to use 32-bit integers
    and boots just fine with this change.
  * WS 2008 and Linux - support rev2 and use 64-bit integers

[
  Xiao: make dsdt/ssdt be v1 in qemu version <= 2.5 to keep
compatible.
]
Signed-off-by: Igor Mammedov 
Signed-off-by: Xiao Guangrong 
---
 hw/acpi/ich9.c  | 32 
 hw/acpi/nvdimm.c| 10 ++
 hw/acpi/piix4.c |  3 +++
 hw/i386/acpi-build.c| 24 +---
 include/hw/acpi/ich9.h  |  2 ++
 include/hw/i386/pc.h| 14 +-
 include/hw/mem/nvdimm.h |  2 +-
 7 files changed, 74 insertions(+), 13 deletions(-)

diff --git a/hw/acpi/ich9.c b/hw/acpi/ich9.c
index 1c7fcfa..b26f4cc 100644
--- a/hw/acpi/ich9.c
+++ b/hw/acpi/ich9.c
@@ -400,6 +400,33 @@ static void ich9_pm_set_enable_tco(Object *obj, bool value, Error **errp)
 s->pm.enable_tco = value;
 }
 
+static void ich9_pm_get_dsdt_revision(Object *obj, Visitor *v,
+  void *opaque, const char *name,
+  Error **errp)
+{
+ICH9LPCPMRegs *pm = opaque;
+uint8_t value = pm->dsdt_revision;
+
+visit_type_uint8(v, &value, name, errp);
+}
+
+static void ich9_pm_set_dsdt_revision(Object *obj, Visitor *v,
+  void *opaque, const char *name,
+  Error **errp)
+{
+ICH9LPCPMRegs *pm = opaque;
+Error *local_err = NULL;
+uint8_t value;
+
+visit_type_uint8(v, &value, name, &local_err);
+if (local_err) {
+goto out;
+}
+pm->dsdt_revision = value;
+out:
+error_propagate(errp, local_err);
+}
+
 void ich9_pm_add_properties(Object *obj, ICH9LPCPMRegs *pm, Error **errp)
 {
 static const uint32_t gpe0_len = ICH9_PMIO_GPE0_LEN;
@@ -407,6 +434,7 @@ void ich9_pm_add_properties(Object *obj, ICH9LPCPMRegs *pm, Error **errp)
 pm->disable_s3 = 0;
 pm->disable_s4 = 0;
 pm->s4_val = 2;
+pm->dsdt_revision = 2;
 
 object_property_add_uint32_ptr(obj, ACPI_PM_PROP_PM_IO_BASE,
&pm->pm_io_base, errp);
@@ -435,6 +463,10 @@ void ich9_pm_add_properties(Object *obj, ICH9LPCPMRegs *pm, Error **errp)
  ich9_pm_get_enable_tco,
  ich9_pm_set_enable_tco,
  NULL);
+object_property_add(obj, ACPI_DSDT_REVISION, "uint8",
+ich9_pm_get_dsdt_revision,
+ich9_pm_set_dsdt_revision,
+NULL, pm, NULL);
 }
 
 void ich9_pm_device_plug_cb(ICH9LPCPMRegs *pm, DeviceState *dev, Error **errp)
diff --git a/hw/acpi/nvdimm.c b/hw/acpi/nvdimm.c
index 9534418..a2c58dd 100644
--- a/hw/acpi/nvdimm.c
+++ b/hw/acpi/nvdimm.c
@@ -430,7 +430,8 @@ static void nvdimm_build_nvdimm_devices(GSList *device_list, Aml *root_dev)
 }
 
 static void nvdimm_build_ssdt(GSList *device_list, GArray *table_offsets,
-  GArray *table_data, GArray *linker)
+  GArray *table_data, GArray *linker,
+  uint8_t revision)
 {
 Aml *ssdt, *sb_scope, *dev;
 
@@ -468,12 +469,12 @@ static void nvdimm_build_ssdt(GSList *device_list, GArray *table_offsets,
 g_array_append_vals(table_data, ssdt->buf->data, ssdt->buf->len);
 build_header(linker, table_data,
 (void *)(table_data->data + table_data->len - ssdt->buf->len),
-"SSDT", ssdt->buf->len, 1, "NVDIMM");
+"SSDT", ssdt->buf->len, revision, "NVDIMM");
 free_aml_allocator();
 }
 
 void nvdimm_build_acpi(GArray *table_offsets, GArray *table_data,
-   GArray *linker)
+   GArray *linker, uint8_t revision)
 {
 GSList *device_list;
 
@@ -483,6 +484,7 @@ void nvdimm_build_acpi(GArray *table_offsets, GArray *table_data,
 return;
 }
 nvdimm_build_nfit(device_list, table_offsets, table_data, linker);
-nvdimm_build_ssdt(device_list, table_offsets, table_data, linker);
+nvdimm_build_ssdt(device_list, table_offsets, table_data, linker,
+  revision);
 g_slist_free(device_list);
 }
diff --git a/hw/acpi/piix4.c b/hw/acpi/piix4.c
index 2cd2fee..9d365f8 100644
--- a/hw/acpi/piix4.c
+++ b/hw/acpi/piix4.c
@@ -83,6 +83,8 @@ typedef struct PIIX4PMState {
 uint8_t disable_s4;
 uint8_t s4_val;
 
+uint8_t dsdt_revision;
+
 AcpiCpuHotplug gpe_cpu;
 
 MemHotplugState acpi_memory_hotplug;
@@ -588,6 +590,7 @@ static Property piix4_pm_properties[] = {
 DEFINE_PROP_UINT8(ACPI_PM_PROP_S3_DISABLED, PIIX4PMState, disable_s3, 0),
 DEFINE_PROP_UINT8(ACPI_PM_PROP_S4_DISABLED, PIIX4PMState, disable_s4, 0),
 DEFINE_PROP_UINT8(ACPI_PM_PROP_S4_VA

[Qemu-devel] [PATCH 0/6] NVDIMM ACPI: introduce the framework of QEMU emulated DSM

2016-01-04 Thread Xiao Guangrong

This patchset is against commit 5530427f0ca (acpi: extend aml_and() to
accept target argument) on pci branch of Michael's git tree
and can be found at:
  https://github.com/xiaogr/qemu.git nvdimm-acpi-v1

This is the second part of the vNVDIMM implementation, which implements the
BIOS-patched dsm memory and introduces the framework that allows QEMU
to emulate the DSM method.

Thanks to Michael's idea, we do not reserve any memory for NVDIMM ACPI;
instead we let the BIOS allocate the memory and patch the address to the
offset we want.

The IO port is still used, as it is the way to notify QEMU and pass
the patched dsm memory address. The IO port region, 0x0a18 - 0x0a20,
is reserved and divided into two 32-bit ports that pass the low 32 bits
and the high 32 bits of the dsm memory address to QEMU.

Thanks to Igor's idea, this patchset also extends DSDT/SSDT to revision 2
to allow 64-bit operations. In order to keep compatibility, old
versions (<= 2.5) still use revision 1. Since 64-bit operations break
old guests (such as Windows XP), we should keep the 64-bit stuff in
a private place that common ACPI operations do not touch.

Igor Mammedov (1):
  pc: acpi: bump DSDT/SSDT compliance revision to v2

Xiao Guangrong (5):
  nvdimm acpi: initialize the resource used by NVDIMM ACPI
  nvdimm acpi: introduce patched dsm memory
  acpi: allow using acpi named offset for OperationRegion
  nvdimm acpi: let qemu handle _DSM method
  nvdimm acpi: emulate dsm method

 hw/acpi/Makefile.objs   |   2 +-
 hw/acpi/aml-build.c |  45 +++-
 hw/acpi/ich9.c  |  32 +
 hw/acpi/nvdimm.c| 276 ++--
 hw/acpi/piix4.c |   3 +
 hw/i386/acpi-build.c|  41 ---
 hw/i386/pc.c|   8 +-
 hw/i386/pc_piix.c   |   5 +
 hw/i386/pc_q35.c|   8 +-
 include/hw/acpi/aml-build.h |   6 +-
 include/hw/acpi/ich9.h  |   2 +
 include/hw/i386/pc.h|  19 ++-
 include/hw/mem/nvdimm.h |  44 ++-
 13 files changed, 449 insertions(+), 42 deletions(-)

-- 
1.8.3.1

Re: [Qemu-devel] [PATCH v1] qemu-iotests: s390x: fix test 051

2016-01-04 Thread Max Reitz

On 04.01.2016 06:29, Bo Tu wrote:
> From: Bo Tu  
> 
> Replace the remaining "-drive file..."
> by "-drive file=...,if=none,id=$device_id", then x86 and s390x
> can get the common output.
> "if=ide, if=floppy, if=scsi" are not supported by s390x,
> so these test cases are not executed for s390x platform.
> 
> Signed-off-by: Bo Tu 
> ---
>  tests/qemu-iotests/051| 32 
>  tests/qemu-iotests/051.out| 70 ++-
>  tests/qemu-iotests/051.pc.out | 52 +---
>  3 files changed, 69 insertions(+), 85 deletions(-)

Thanks!

Applied to my block branch:

https://github.com/XanClic/qemu/commits/block

Max




[Qemu-devel] [PATCH V6 7/8] introduce xlnx-dp

2016-01-04 Thread fred . konrad

From: KONRAD Frederic 

This is the implementation of the DisplayPort.
It has an aux-bus to access dpcd and edid.

The graphics plane is connected to channel 3.
The video plane is connected to channel 0.
The audio streams are connected to channels 4 and 5.

Signed-off-by: KONRAD Frederic 
Tested-By: Hyun Kwon 
---
 hw/display/Makefile.objs |1 +
 hw/display/xlnx_dp.c | 1361 ++
 include/hw/display/xlnx_dp.h |  110 
 3 files changed, 1472 insertions(+)
 create mode 100644 hw/display/xlnx_dp.c
 create mode 100644 include/hw/display/xlnx_dp.h

diff --git a/hw/display/Makefile.objs b/hw/display/Makefile.objs
index 250a43f..3625ab2 100644
--- a/hw/display/Makefile.objs
+++ b/hw/display/Makefile.objs
@@ -43,3 +43,4 @@ virtio-gpu.o-libs += $(VIRGL_LIBS)
 virtio-gpu-3d.o-cflags := $(VIRGL_CFLAGS)
 virtio-gpu-3d.o-libs += $(VIRGL_LIBS)
 obj-$(CONFIG_DPCD) += dpcd.o
+obj-$(CONFIG_XLNX_ZYNQMP) += xlnx_dp.o
diff --git a/hw/display/xlnx_dp.c b/hw/display/xlnx_dp.c
new file mode 100644
index 0000000..4238d69
--- /dev/null
+++ b/hw/display/xlnx_dp.c
@@ -0,0 +1,1361 @@
+/*
+ * xlnx_dp.c
+ *
+ *  Copyright (C) 2015 : GreenSocs Ltd
+ *  http://www.greensocs.com/ , email: i...@greensocs.com
+ *
+ *  Developed by :
+ *  Frederic Konrad   
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation, either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, see .
+ *
+ */
+
+#include "hw/display/xlnx_dp.h"
+
+#ifndef DEBUG_DP
+#define DEBUG_DP 0
+#endif
+
+#define DPRINTF(fmt, ...) do {                                                 \
+    if (DEBUG_DP) {                                                            \
+        qemu_log("xlnx_dp: " fmt , ## __VA_ARGS__);                            \
+    }                                                                          \
+} while (0);
+
+/*
+ * Register offset for DP.
+ */
+#define DP_LINK_BW_SET                      (0x0000 >> 2)
+#define DP_LANE_COUNT_SET                   (0x0004 >> 2)
+#define DP_ENHANCED_FRAME_EN                (0x0008 >> 2)
+#define DP_TRAINING_PATTERN_SET             (0x000C >> 2)
+#define DP_LINK_QUAL_PATTERN_SET            (0x0010 >> 2)
+#define DP_SCRAMBLING_DISABLE               (0x0014 >> 2)
+#define DP_DOWNSPREAD_CTRL                  (0x0018 >> 2)
+#define DP_SOFTWARE_RESET                   (0x001C >> 2)
+#define DP_TRANSMITTER_ENABLE               (0x0080 >> 2)
+#define DP_MAIN_STREAM_ENABLE               (0x0084 >> 2)
+#define DP_FORCE_SCRAMBLER_RESET            (0x00C0 >> 2)
+#define DP_VERSION_REGISTER                 (0x00F8 >> 2)
+#define DP_CORE_ID                          (0x00FC >> 2)
+
+#define DP_AUX_COMMAND_REGISTER             (0x0100 >> 2)
+#define AUX_ADDR_ONLY_MASK                  (0x1000)
+#define AUX_COMMAND_MASK                    (0x0F00)
+#define AUX_COMMAND_SHIFT                   (8)
+#define AUX_COMMAND_NBYTES                  (0x000F)
+
+#define DP_AUX_WRITE_FIFO                   (0x0104 >> 2)
+#define DP_AUX_ADDRESS                      (0x0108 >> 2)
+#define DP_AUX_CLOCK_DIVIDER                (0x010C >> 2)
+#define DP_TX_USER_FIFO_OVERFLOW            (0x0110 >> 2)
+#define DP_INTERRUPT_SIGNAL_STATE           (0x0130 >> 2)
+#define DP_AUX_REPLY_DATA                   (0x0134 >> 2)
+#define DP_AUX_REPLY_CODE                   (0x0138 >> 2)
+#define DP_AUX_REPLY_COUNT                  (0x013C >> 2)
+#define DP_REPLY_DATA_COUNT                 (0x0148 >> 2)
+#define DP_REPLY_STATUS                     (0x014C >> 2)
+#define DP_HPD_DURATION                     (0x0150 >> 2)
+#define DP_MAIN_STREAM_HTOTAL               (0x0180 >> 2)
+#define DP_MAIN_STREAM_VTOTAL               (0x0184 >> 2)
+#define DP_MAIN_STREAM_POLARITY             (0x0188 >> 2)
+#define DP_MAIN_STREAM_HSWIDTH              (0x018C >> 2)
+#define DP_MAIN_STREAM_VSWIDTH              (0x0190 >> 2)
+#define DP_MAIN_STREAM_HRES                 (0x0194 >> 2)
+#define DP_MAIN_STREAM_VRES                 (0x0198 >> 2)
+#define DP_MAIN_STREAM_HSTART               (0x019C >> 2)
+#define DP_MAIN_STREAM_VSTART               (0x01A0 >> 2)
+#define DP_MAIN_STREAM_MISC0                (0x01A4 >> 2)
+#define DP_MAIN_STREAM_MISC1                (0x01A8 >> 2)
+#define DP_MAIN_STREAM_M_VID                (0x01AC >> 2)
+#define DP_MSA_TRANSFER_UNIT_SIZE           (0x01B0 >> 2)
+#define DP_MAIN_STREAM_N_VID                (0x01B4 >> 2)
+#define DP_USER_DATA_COUNT_PER_LANE

[Qemu-devel] [PATCH V6 5/8] hw/i2c-ddc.c: Implement DDC I2C slave

2016-01-04 Thread fred . konrad

From: Peter Maydell 

Implement an I2C slave which implements DDC and returns the
EDID data for an attached monitor.
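
For readers unfamiliar with the 128-byte EDID blob that such a DDC slave
returns, the sketch below shows the usual checksum rule: the last byte is
chosen so that all 128 bytes sum to zero modulo 256. This is only an
illustration and is not taken from the patch; the sketch_/SKETCH_ names are
hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_EDID_SIZE 128

/* Compute the value for the last byte so that the whole block sums to 0
 * modulo 256. Only the first 127 bytes of 'blob' are read. */
static uint8_t sketch_edid_checksum(const uint8_t *blob)
{
    uint8_t sum = 0;
    size_t i;

    for (i = 0; i < SKETCH_EDID_SIZE - 1; i++) {
        sum += blob[i];
    }
    return (uint8_t)(0x100 - sum);
}

/* Return nonzero if the 128 bytes (checksum included) sum to 0 modulo 256. */
static int sketch_edid_is_valid(const uint8_t *blob)
{
    uint8_t sum = 0;
    size_t i;

    for (i = 0; i < SKETCH_EDID_SIZE; i++) {
        sum += blob[i];
    }
    return sum == 0;
}
```

A caller would fill bytes 0..126 first, store the result of
sketch_edid_checksum() in byte 127, and could then use
sketch_edid_is_valid() as a sanity check.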

Signed-off-by: Peter Maydell 

  - Rebased on the current master.
  - Modified for QOM.

Signed-off-by: KONRAD Frederic 
Reviewed-by: Alistair Francis 
Tested-By: Hyun Kwon 
---
 default-configs/aarch64-softmmu.mak |   1 +
 hw/i2c/Makefile.objs|   1 +
 hw/i2c/i2c-ddc.c| 304 
 include/hw/i2c/i2c-ddc.h|  38 +
 4 files changed, 344 insertions(+)
 create mode 100644 hw/i2c/i2c-ddc.c
 create mode 100644 include/hw/i2c/i2c-ddc.h

diff --git a/default-configs/aarch64-softmmu.mak b/default-configs/aarch64-softmmu.mak
index 87165b7..2449483 100644
--- a/default-configs/aarch64-softmmu.mak
+++ b/default-configs/aarch64-softmmu.mak
@@ -4,5 +4,6 @@
 include arm-softmmu.mak
 
 CONFIG_AUX=y
+CONFIG_DDC=y
 CONFIG_DPCD=y
 CONFIG_XLNX_ZYNQMP=y
diff --git a/hw/i2c/Makefile.objs b/hw/i2c/Makefile.objs
index aeb8f38..6dd7b6c 100644
--- a/hw/i2c/Makefile.objs
+++ b/hw/i2c/Makefile.objs
@@ -1,4 +1,5 @@
 common-obj-y += core.o smbus.o smbus_eeprom.o
+common-obj-$(CONFIG_DDC) += i2c-ddc.o
 common-obj-$(CONFIG_VERSATILE_I2C) += versatile_i2c.o
 common-obj-$(CONFIG_ACPI_X86) += smbus_ich9.o
 common-obj-$(CONFIG_APM) += pm_smbus.o
diff --git a/hw/i2c/i2c-ddc.c b/hw/i2c/i2c-ddc.c
new file mode 100644
index 0000000..08bb7dc
--- /dev/null
+++ b/hw/i2c/i2c-ddc.c
@@ -0,0 +1,304 @@
+/* A simple I2C slave for returning monitor EDID data via DDC.
+ *
+ * Copyright (c) 2011 Linaro Limited
+ * Written by Peter Maydell
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License version 2 as
+ *  published by the Free Software Foundation.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License along
+ *  with this program; if not, see .
+ */
+#include "hw/i2c/i2c.h"
+#include "hw/i2c/i2c-ddc.h"
+
+#ifndef DEBUG_I2CDDC
+#define DEBUG_I2CDDC 0
+#endif
+
+#define DPRINTF(fmt, ...) do {                                                 \
+    if (DEBUG_I2CDDC) {                                                        \
+        qemu_log("i2c-ddc: " fmt , ## __VA_ARGS__);                            \
+    }                                                                          \
+} while (0);
+
+/* Structure defining a monitor's characteristics in a
+ * readable format: this should be passed to build_edid_blob()
+ * to convert it into the 128 byte binary EDID blob.
+ * Not all bits of the EDID are customisable here.
+ */
+struct EDIDData {
+char manuf_id[3]; /* three upper case letters */
+uint16_t product_id;
+uint32_t serial_no;
+uint8_t manuf_week;
+int manuf_year;
+uint8_t h_cm;
+uint8_t v_cm;
+uint8_t gamma;
+char monitor_name[14];
+char serial_no_string[14];
+/* Range limits */
+uint8_t vmin; /* Hz */
+uint8_t vmax; /* Hz */
+uint8_t hmin; /* kHz */
+uint8_t hmax; /* kHz */
+uint8_t pixclock; /* MHz / 10 */
+uint8_t timing_data[18];
+};
+
+typedef struct EDIDData EDIDData;
+
+/* EDID data for a simple LCD monitor */
+static const EDIDData lcd_edid = {
+/* The manuf_id ought really to be an assigned EISA ID */
+.manuf_id = "QMU",
+.product_id = 0,
+.serial_no = 1,
+.manuf_week = 1,
+.manuf_year = 2011,
+.h_cm = 40,
+.v_cm = 30,
+.gamma = 0x78,
+.monitor_name = "QEMU monitor",
+.serial_no_string = "1",
+.vmin = 40,
+.vmax = 120,
+.hmin = 30,
+.hmax = 100,
+.pixclock = 18,
+.timing_data = {
+/* Borrowed from a 21" LCD */
+0x48, 0x3f, 0x40, 0x30, 0x62, 0xb0, 0x32, 0x40, 0x40,
+0xc0, 0x13, 0x00, 0x98, 0x32, 0x11, 0x00, 0x00, 0x1e
+}
+};
+
+static uint8_t manuf_char_to_int(char c)
+{
+return (c - 'A') & 0x1f;
+}
+
+static void write_ascii_descriptor_block(uint8_t *descblob, uint8_t blocktype,
+ const char *string)
+{
+/* Write an EDID Descriptor Block of the "ascii string" type */
+int i;
+descblob[0] = descblob[1] = descblob[2] = descblob[4] = 0;
+descblob[3] = blocktype;
+/* The rest is 13 bytes of ASCII; if the string is shorter, the rest must
+ * be filled with a newline and then spaces
+ */
+for (i = 5; i < 19; i++) {
+descblob[i] = string[i - 5];
+if (!descblob[i]) {
+break;
+}
+}
+if (i < 19) {
+descblob[i++] = '\n';
+}
+for ( ; i < 19; i++) {
+descblob[i] = ' ';
+}
+}
+
+static void write_range_limits_descriptor(const EDIDData *edid,
+

[Qemu-devel] [PATCH V6 6/8] introduce xlnx-dpdma

2016-01-04 Thread fred . konrad

From: KONRAD Frederic 

This is the implementation of the DPDMA.

Signed-off-by: KONRAD Frederic 
Reviewed-by: Alistair Francis 
Tested-By: Hyun Kwon 
---
 hw/dma/Makefile.objs|   1 +
 hw/dma/xlnx_dpdma.c | 792 
 include/hw/dma/xlnx_dpdma.h |  85 +
 3 files changed, 878 insertions(+)
 create mode 100644 hw/dma/xlnx_dpdma.c
 create mode 100644 include/hw/dma/xlnx_dpdma.h

diff --git a/hw/dma/Makefile.objs b/hw/dma/Makefile.objs
index 0e65ed0..5451836 100644
--- a/hw/dma/Makefile.objs
+++ b/hw/dma/Makefile.objs
@@ -8,6 +8,7 @@ common-obj-$(CONFIG_XILINX_AXI) += xilinx_axidma.o
 common-obj-$(CONFIG_ETRAXFS) += etraxfs_dma.o
 common-obj-$(CONFIG_STP2000) += sparc32_dma.o
 common-obj-$(CONFIG_SUN4M) += sun4m_iommu.o
+obj-$(CONFIG_XLNX_ZYNQMP) += xlnx_dpdma.o
 
 obj-$(CONFIG_OMAP) += omap_dma.o soc_dma.o
 obj-$(CONFIG_PXA2XX) += pxa2xx_dma.o
diff --git a/hw/dma/xlnx_dpdma.c b/hw/dma/xlnx_dpdma.c
new file mode 100644
index 0000000..3819b96
--- /dev/null
+++ b/hw/dma/xlnx_dpdma.c
@@ -0,0 +1,792 @@
+/*
+ * xlnx_dpdma.c
+ *
+ *  Copyright (C) 2015 : GreenSocs Ltd
+ *  http://www.greensocs.com/ , email: i...@greensocs.com
+ *
+ *  Developed by :
+ *  Frederic Konrad   
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation, either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, see .
+ *
+ */
+
+#include "hw/dma/xlnx_dpdma.h"
+
+#ifndef DEBUG_DPDMA
+#define DEBUG_DPDMA 0
+#endif
+
+#define DPRINTF(fmt, ...) do {                                                 \
+    if (DEBUG_DPDMA) {                                                         \
+        qemu_log("xlnx_dpdma: " fmt , ## __VA_ARGS__);                         \
+    }                                                                          \
+} while (0);
+
+/*
+ * Registers offset for DPDMA.
+ */
+#define DPDMA_ERR_CTRL                  (0x0000)
+#define DPDMA_ISR                       (0x0004 >> 2)
+#define DPDMA_IMR                       (0x0008 >> 2)
+#define DPDMA_IEN                       (0x000C >> 2)
+#define DPDMA_IDS                       (0x0010 >> 2)
+#define DPDMA_EISR                      (0x0014 >> 2)
+#define DPDMA_EIMR                      (0x0018 >> 2)
+#define DPDMA_EIEN                      (0x001C >> 2)
+#define DPDMA_EIDS                      (0x0020 >> 2)
+#define DPDMA_CNTL                      (0x0100 >> 2)
+
+#define DPDMA_GBL                       (0x0104 >> 2)
+#define DPDMA_GBL_TRG_CH(n)             (1 << n)
+#define DPDMA_GBL_RTRG_CH(n)            (1 << 6 << n)
+
+#define DPDMA_ALC0_CNTL                 (0x0108 >> 2)
+#define DPDMA_ALC0_STATUS               (0x010C >> 2)
+#define DPDMA_ALC0_MAX                  (0x0110 >> 2)
+#define DPDMA_ALC0_MIN                  (0x0114 >> 2)
+#define DPDMA_ALC0_ACC                  (0x0118 >> 2)
+#define DPDMA_ALC0_ACC_TRAN             (0x011C >> 2)
+#define DPDMA_ALC1_CNTL                 (0x0120 >> 2)
+#define DPDMA_ALC1_STATUS               (0x0124 >> 2)
+#define DPDMA_ALC1_MAX                  (0x0128 >> 2)
+#define DPDMA_ALC1_MIN                  (0x012C >> 2)
+#define DPDMA_ALC1_ACC                  (0x0130 >> 2)
+#define DPDMA_ALC1_ACC_TRAN             (0x0134 >> 2)
+
+#define DPDMA_DSCR_STRT_ADDRE_CH(n)     ((0x0200 + n * 0x100) >> 2)
+#define DPDMA_DSCR_STRT_ADDR_CH(n)      ((0x0204 + n * 0x100) >> 2)
+#define DPDMA_DSCR_NEXT_ADDRE_CH(n)     ((0x0208 + n * 0x100) >> 2)
+#define DPDMA_DSCR_NEXT_ADDR_CH(n)      ((0x020C + n * 0x100) >> 2)
+#define DPDMA_PYLD_CUR_ADDRE_CH(n)      ((0x0210 + n * 0x100) >> 2)
+#define DPDMA_PYLD_CUR_ADDR_CH(n)       ((0x0214 + n * 0x100) >> 2)
+
+#define DPDMA_CNTL_CH(n)                ((0x0218 + n * 0x100) >> 2)
+#define DPDMA_CNTL_CH_EN                (1)
+#define DPDMA_CNTL_CH_PAUSED            (1 << 1)
+
+#define DPDMA_STATUS_CH(n)              ((0x021C + n * 0x100) >> 2)
+#define DPDMA_STATUS_BURST_TYPE         (1 << 4)
+#define DPDMA_STATUS_MODE               (1 << 5)
+#define DPDMA_STATUS_EN_CRC             (1 << 6)
+#define DPDMA_STATUS_LAST_DSCR          (1 << 7)
+#define DPDMA_STATUS_LDSCR_FRAME        (1 << 8)
+#define DPDMA_STATUS_IGNR_DONE          (1 << 9)
+#define DPDMA_STATUS_DSCR_DONE

[Qemu-devel] [PATCH V6 8/8] arm: xlnx-zynqmp: Add xlnx-dp and xlnx-dpdma

2016-01-04 Thread fred . konrad

From: KONRAD Frederic 

This adds the DP and the DPDMA to the Zynq MP platform.

Signed-off-by: KONRAD Frederic 
Reviewed-by: Peter Crosthwaite 
Tested-By: Hyun Kwon 
---
 hw/arm/xlnx-zynqmp.c | 30 ++
 include/hw/arm/xlnx-zynqmp.h |  5 +
 2 files changed, 35 insertions(+)

diff --git a/hw/arm/xlnx-zynqmp.c b/hw/arm/xlnx-zynqmp.c
index 87553bb..ce9919f 100644
--- a/hw/arm/xlnx-zynqmp.c
+++ b/hw/arm/xlnx-zynqmp.c
@@ -32,6 +32,12 @@
 #define SATA_ADDR   0xFD0C0000
 #define SATA_NUM_PORTS  2
 
+#define DP_ADDR 0xfd4a0000
+#define DP_IRQ  113
+
+#define DPDMA_ADDR  0xfd4c0000
+#define DPDMA_IRQ   116
+
 static const uint64_t gem_addr[XLNX_ZYNQMP_NUM_GEMS] = {
 0xFF0B0000, 0xFF0C0000, 0xFF0D0000, 0xFF0E0000,
 };
@@ -112,6 +118,12 @@ static void xlnx_zynqmp_init(Object *obj)
 qdev_set_parent_bus(DEVICE(&s->sdhci[i]),
 sysbus_get_default());
 }
+
+object_initialize(&s->dp, sizeof(s->dp), TYPE_XLNX_DP);
+qdev_set_parent_bus(DEVICE(&s->dp), sysbus_get_default());
+
+object_initialize(&s->dpdma, sizeof(s->dpdma), TYPE_XLNX_DPDMA);
+qdev_set_parent_bus(DEVICE(&s->dpdma), sysbus_get_default());
 }
 
 static void xlnx_zynqmp_realize(DeviceState *dev, Error **errp)
@@ -286,6 +298,24 @@ static void xlnx_zynqmp_realize(DeviceState *dev, Error **errp)
 sysbus_connect_irq(SYS_BUS_DEVICE(&s->sdhci[i]), 0,
gic_spi[sdhci_intr[i]]);
 }
+
+sysbus_mmio_map(SYS_BUS_DEVICE(&s->dp), 0, DP_ADDR);
+sysbus_connect_irq(SYS_BUS_DEVICE(&s->dp), 0, gic_spi[DP_IRQ]);
+
+sysbus_mmio_map(SYS_BUS_DEVICE(&s->dpdma), 0, DPDMA_ADDR);
+sysbus_connect_irq(SYS_BUS_DEVICE(&s->dpdma), 0, gic_spi[DPDMA_IRQ]);
+object_property_set_bool(OBJECT(&s->dp), true, "realized", &err);
+if (err) {
+error_propagate(errp, err);
+return;
+}
+object_property_set_bool(OBJECT(&s->dpdma), true, "realized", &err);
+if (err) {
+error_propagate(errp, err);
+return;
+}
+object_property_set_link(OBJECT(&s->dp), OBJECT(&s->dpdma), "dpdma",
+ &error_abort);
 }
 
 static Property xlnx_zynqmp_props[] = {
diff --git a/include/hw/arm/xlnx-zynqmp.h b/include/hw/arm/xlnx-zynqmp.h
index d116092..e683eaf 100644
--- a/include/hw/arm/xlnx-zynqmp.h
+++ b/include/hw/arm/xlnx-zynqmp.h
@@ -25,6 +25,8 @@
 #include "hw/ide/pci.h"
 #include "hw/ide/ahci.h"
 #include "hw/sd/sdhci.h"
+#include "hw/dma/xlnx_dpdma.h"
+#include "hw/display/xlnx_dp.h"
 
 #define TYPE_XLNX_ZYNQMP "xlnx,zynqmp"
 #define XLNX_ZYNQMP(obj) OBJECT_CHECK(XlnxZynqMPState, (obj), \
@@ -69,6 +71,9 @@ typedef struct XlnxZynqMPState {
 
 char *boot_cpu;
 ARMCPU *boot_cpu_ptr;
+
+XlnxDPState dp;
+XlnxDPDMAState dpdma;
 }  XlnxZynqMPState;
 
 #define XLNX_ZYNQMP_H
-- 
1.9.0

[Qemu-devel] [PATCH V6 0/8] Xilinx DisplayPort.

2016-01-04 Thread fred . konrad

From: KONRAD Frederic 

This is the sixth version of the patch set implementing the Xilinx
DisplayPort and DPDMA.

This version fixes some minor issues.

The second patch introduces an AUX bus needed by the DP to read the DPCD.
It is also possible to connect an I2C device to it in order to do I2C through
AUX commands. The driver requires I2C broadcast writes to be modeled as well,
which seems to be missing upstream at the moment.

The tree can be cloned at:
g...@git.greensocs.com:fkonrad/xilinx_dp.git branch xilinx_dp_v6_release

Details of the DPDMA part:
 * DPDMA is implemented as a QEMU SYSBUS device.
 * Interrupts are implemented except the axi error and fifo.

Details of the XILINX-DP:
 * DP is also implemented as a QEMU SYSBUS device. Multiple memory regions are
   used to avoid having a single big region, as there are holes in the DP
   memory map.
 * An aux-bus has been implemented; it creates a memory map for AUX slaves and
   has an I2C bus (which is already implemented in QEMU).
 * The normal programmable i2c clock and controller implementation is missing
   from the QEMU tree, so the easiest way for us was to implement a dummy-clk
   driver in the kernel. It is a clock which does nothing but fake a clock so
   that the DPDMA driver works. The patch will be sent separately.
 * The graphics plane works on channel 3, video on channel 0 and audio on
   channels 4 and 5.

Thanks,
Fred

V5 -> V6 changes:
  * globally:
* Rebased on current master (38a762fec63fd5c035aae29ba9a77d357e21e4a7).
* Fix some coding style issues.

V4 -> V5 changes:
  * aux:
* Move the header include/hw => include/hw/misc
  * dpcd:
* Move the header hw/display => include/hw/display
  * i2c-ddc:
* Move the header hw/i2c => include/hw/i2c
  * xlnx_dpdma:
* Move the header hw/dma => include/hw/dma
* Fix some style issues.
  * xlnx_dp:
* Move the header hw/display => include/hw/display
  * globally:
* Rebased on current master (c49d3411faae8ffaab8f7e5db47405a008411c10).

V3 -> V4 changes:
  * xlnx_dpdma:
* Initialize operation_finished during reset.
* Add a function to trigger a VSYNC interrupt from the xlnx_dp.
  * xlnx_dp:
* Fix the default pixman format for video buffer.
* Remove unused buffer.
  * dpcd:
* Add the missing DPCD_LANE_X_STATUS.
* Set status field for all ports to avoid driver error.
* Use 4 lanes by default.
* Use guest error in case of an out-of-bounds access.
  * i2c broadcast:
* Use a list of devices instead of relying on the broadcast field, to remove
  duplicated code.
  * other:
* rebased on current master (774ee4772b6838b78741ea52d4bf26b8922244c5)

V2 -> V3 changes:
  * dpcd:
* Add a CONFIG_DPCD.
  * i2c-ddc:
* Fill in VMSD.
  * aux:
* Remove address field.
* Add a CONFIG_AUX.
  * dpdma:
* Fill in VMSD.
* Some coding style changes.
  * dp:
* Fill in VMSD.
* Coding style changes.

V1 -> V2 changes:
  * xlnx-zynqmp:
* Remove the dummy object_property_add_child(..).
  * dpcd:
* Compile only when the ZYNQMP platform is compiled.
* Use qemu_log instead of printf.
* Compile test debug traces.
* Remove the unused current_reg.
* Remove the blank realize.
* Use dpcd_ prefixes instead of aux_ prefixes.
* Add a reset callback.
* Add the VMSD.
* Add size constraint in the MemoryRegionOps structure instead of asserting.
* Style fixes.
  * aux:
* Compile only when the ZYNQMP platform is compiled.
* Remove the class init and the class for aux-slave.
  * dpdma:
* Compile only when the ZYNQMP platform is compiled.
* Unify per channel macro in one, simplify the switch case.
* Use extractXX.
* Make DPDMA_GBL an or'ed register.
  * dp:
* Compile only when the ZYNQMP platform is compiled.
* Don't look at the audio channel count.
* Use a third pixman plane when we do blending.
  * other:
* Drop the useless "console: add qemu_alloc_display_format." patch as
  suggested by Gerd.
* Rebase on current master (f3e3b083d4c266ea864ae3c83da49d4086857679).

KONRAD Frederic (7):
  i2cbus: remove unused dev field
  introduce aux-bus
  i2c: implement broadcast write
  introduce dpcd module
  introduce xlnx-dpdma
  introduce xlnx-dp
  arm: xlnx-zynqmp: Add xlnx-dp and xlnx-dpdma

Peter Maydell (1):
  hw/i2c-ddc.c: Implement DDC I2C slave

 default-configs/aarch64-softmmu.mak |3 +
 hw/arm/xlnx-zynqmp.c|   30 +
 hw/display/Makefile.objs|2 +
 hw/display/dpcd.c   |  171 +
 hw/display/xlnx_dp.c| 1361 +++
 hw/dma/Makefile.objs|1 +
 hw/dma/xlnx_dpdma.c |  792 
 hw/i2c/Makefile.objs|1 +
 hw/i2c/core.c   |  130 ++--
 hw/i2c/i2c-ddc.c|  304 
 hw/misc/Makefile.objs   |1 +
 hw/misc/aux.c   |  348 +
 include/hw/arm/xl

[Qemu-devel] [PATCH V6 2/8] introduce aux-bus

2016-01-04 Thread fred . konrad

From: KONRAD Frederic 

This introduces a new bus: aux-bus.

It contains an address space for AUX slave devices and a bridge to an I2C bus
for I2C-through-AUX transactions.
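
The split between the two kinds of transaction can be pictured with the
simplified sketch below: native AUX commands are served from an address
space, while I2C-over-AUX commands are handed to a device behind the bridge.
This is a stand-alone model rather than the code of this patch; every type,
constant and function in it is hypothetical, and a real bus would route to
registered devices instead of fixed arrays.

```c
#include <stddef.h>
#include <stdint.h>

/* All names below are hypothetical and only mirror the idea in the text. */
enum sketch_aux_cmd {
    SKETCH_WRITE_AUX,
    SKETCH_READ_AUX,
    SKETCH_WRITE_I2C,
    SKETCH_READ_I2C
};

enum sketch_aux_reply {
    SKETCH_AUX_ACK,
    SKETCH_AUX_NACK
};

#define SKETCH_AUX_SPACE_SIZE 0x100
#define SKETCH_I2C_DATA_SIZE  16

struct sketch_aux_bus {
    uint8_t aux_space[SKETCH_AUX_SPACE_SIZE]; /* stands in for the AUX space */
    uint8_t i2c_dev_addr;                     /* one device behind the bridge */
    uint8_t i2c_data[SKETCH_I2C_DATA_SIZE];   /* that device's contents */
    size_t i2c_pos;                           /* its current read/write index */
};

static enum sketch_aux_reply sketch_aux_request(struct sketch_aux_bus *bus,
                                                enum sketch_aux_cmd cmd,
                                                uint32_t address, uint8_t len,
                                                uint8_t *data)
{
    size_t i;

    switch (cmd) {
    case SKETCH_WRITE_AUX:
    case SKETCH_READ_AUX:
        /* Native AUX: plain byte accesses into the AUX address space. */
        if ((size_t)address + len > SKETCH_AUX_SPACE_SIZE) {
            return SKETCH_AUX_NACK;
        }
        for (i = 0; i < len; i++) {
            if (cmd == SKETCH_WRITE_AUX) {
                bus->aux_space[address + i] = data[i];
            } else {
                data[i] = bus->aux_space[address + i];
            }
        }
        return SKETCH_AUX_ACK;
    case SKETCH_WRITE_I2C:
    case SKETCH_READ_I2C:
        /* I2C over AUX: 'address' selects a device behind the bridge. */
        if (address != bus->i2c_dev_addr) {
            return SKETCH_AUX_NACK;
        }
        for (i = 0; i < len; i++) {
            if (cmd == SKETCH_WRITE_I2C) {
                bus->i2c_data[bus->i2c_pos] = data[i];
            } else {
                data[i] = bus->i2c_data[bus->i2c_pos];
            }
            bus->i2c_pos = (bus->i2c_pos + 1) % SKETCH_I2C_DATA_SIZE;
        }
        return SKETCH_AUX_ACK;
    }
    return SKETCH_AUX_NACK;
}
```

The point of the sketch is only the dispatch on the command type; replies
other than ACK and NACK are left out.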

Signed-off-by: KONRAD Frederic 
Tested-By: Hyun Kwon 
---
 default-configs/aarch64-softmmu.mak |   1 +
 hw/misc/Makefile.objs   |   1 +
 hw/misc/aux.c   | 348 
 include/hw/misc/aux.h   | 124 +
 4 files changed, 474 insertions(+)
 create mode 100644 hw/misc/aux.c
 create mode 100644 include/hw/misc/aux.h

diff --git a/default-configs/aarch64-softmmu.mak b/default-configs/aarch64-softmmu.mak
index 96dd994..d3a2665 100644
--- a/default-configs/aarch64-softmmu.mak
+++ b/default-configs/aarch64-softmmu.mak
@@ -3,4 +3,5 @@
 # We support all the 32 bit boards so need all their config
 include arm-softmmu.mak
 
+CONFIG_AUX=y
 CONFIG_XLNX_ZYNQMP=y
diff --git a/hw/misc/Makefile.objs b/hw/misc/Makefile.objs
index d4765c2..4af5fbe 100644
--- a/hw/misc/Makefile.objs
+++ b/hw/misc/Makefile.objs
@@ -44,3 +44,4 @@ obj-$(CONFIG_STM32F2XX_SYSCFG) += stm32f2xx_syscfg.o
 obj-$(CONFIG_PVPANIC) += pvpanic.o
 obj-$(CONFIG_EDU) += edu.o
 obj-$(CONFIG_HYPERV_TESTDEV) += hyperv_testdev.o
+obj-$(CONFIG_AUX) += aux.o
diff --git a/hw/misc/aux.c b/hw/misc/aux.c
new file mode 100644
index 0000000..cdfec67
--- /dev/null
+++ b/hw/misc/aux.c
@@ -0,0 +1,348 @@
+/*
+ * aux.c
+ *
+ *  Copyright 2015 : GreenSocs Ltd
+ *  http://www.greensocs.com/ , email: i...@greensocs.com
+ *
+ *  Developed by :
+ *  Frederic Konrad   
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation, either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, see .
+ *
+ */
+
+/*
+ * This is an implementation of the AUX bus for VESA Display Port v1.1a.
+ */
+
+#include "hw/misc/aux.h"
+#include "hw/i2c/i2c.h"
+#include "monitor/monitor.h"
+
+#ifndef DEBUG_AUX
+#define DEBUG_AUX 0
+#endif
+
+#define DPRINTF(fmt, ...) do {                                                 \
+    if (DEBUG_AUX) {                                                           \
+        qemu_log("aux: " fmt , ## __VA_ARGS__);                                \
+    }                                                                          \
+} while (0);
+
+#define TYPE_AUXTOI2C "aux-to-i2c-bridge"
+#define AUXTOI2C(obj) OBJECT_CHECK(AUXTOI2CState, (obj), TYPE_AUXTOI2C)
+
+#define TYPE_AUX_BUS "aux-bus"
+#define AUX_BUS(obj) OBJECT_CHECK(AUXBus, (obj), TYPE_AUX_BUS)
+
+static void aux_slave_dev_print(Monitor *mon, DeviceState *dev, int indent);
+
+static void aux_bus_class_init(ObjectClass *klass, void *data)
+{
+BusClass *k = BUS_CLASS(klass);
+
+/* AUXSlave has an MMIO so we need to change the way we print information
+ * in monitor.
+ */
+k->print_dev = aux_slave_dev_print;
+}
+
+static const TypeInfo aux_bus_info = {
+.name = TYPE_AUX_BUS,
+.parent = TYPE_BUS,
+.instance_size = sizeof(AUXBus),
+.class_init = aux_bus_class_init
+};
+
+AUXBus *aux_init_bus(DeviceState *parent, const char *name)
+{
+AUXBus *bus;
+
+bus = AUX_BUS(qbus_create(TYPE_AUX_BUS, parent, name));
+bus->bridge = AUXTOI2C(qdev_create(BUS(bus), TYPE_AUXTOI2C));
+
+/* Memory related. */
+bus->aux_io = g_malloc(sizeof(*bus->aux_io));
+memory_region_init(bus->aux_io, OBJECT(bus), "aux-io", (1 << 20));
+address_space_init(&bus->aux_addr_space, bus->aux_io, "aux-io");
+return bus;
+}
+
+static void aux_bus_map_device(AUXBus *bus, AUXSlave *dev, hwaddr addr)
+{
+memory_region_add_subregion(bus->aux_io, addr, dev->mmio);
+}
+
+static bool aux_bus_is_bridge(AUXBus *bus, DeviceState *dev)
+{
+return (dev == DEVICE(bus->bridge));
+}
+
+AUXReply aux_request(AUXBus *bus, AUXCommand cmd, uint32_t address,
+  uint8_t len, uint8_t *data)
+{
+int temp;
+AUXReply ret = AUX_NACK;
+I2CBus *i2c_bus = aux_get_i2c_bus(bus);
+size_t i;
+bool is_write = false;
+
+DPRINTF("request at address 0x%" PRIX32 ", command %u, len %u\n", address,
+cmd, len);
+
+switch (cmd) {
+/*
+ * Forward the request on the AUX bus.
+ */
+case WRITE_AUX:
+is_write = true;
+/* fallthrough */
+case READ_AUX:
+for (i = 0; i < len; i++) {
+if (!address_space_rw(&bus->aux_addr_space, address++,
+  MEMTXATTRS_UNSPECIFIED, data++, 1,
+

[Qemu-devel] [PATCH V6 3/8] i2c: implement broadcast write

2016-01-04 Thread fred . konrad

From: KONRAD Frederic 

This writes to every slave when the I2C bus gets a write to address 0.
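
The behaviour can be modelled in isolation as follows. This is a minimal
sketch, not the implementation in this patch: the slave structure, the
sketch_/SKETCH_ names and the return convention (number of slaves reached)
are all assumptions made for the illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Address 0 is treated as the broadcast address, as described above. */
#define SKETCH_I2C_BROADCAST 0x00

/* Hypothetical slave model: remembers the last byte it received. */
struct sketch_i2c_slave {
    uint8_t address;
    uint8_t last_byte;
    unsigned writes;
};

/* Deliver one byte to every slave selected by 'address'.
 * Returns the number of slaves that received the byte; 0 means that
 * nobody answered. */
static size_t sketch_i2c_write(struct sketch_i2c_slave *slaves, size_t n,
                               uint8_t address, uint8_t data)
{
    size_t i;
    size_t hit = 0;

    for (i = 0; i < n; i++) {
        if (address == SKETCH_I2C_BROADCAST || slaves[i].address == address) {
            slaves[i].last_byte = data;
            slaves[i].writes++;
            hit++;
            if (address != SKETCH_I2C_BROADCAST) {
                break; /* a normal write targets a single slave */
            }
        }
    }
    return hit;
}
```

Counting the slaves that were reached keeps "nobody answered" distinct from
"at least one slave took the byte", which a single OR-ed status value cannot
express.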

Signed-off-by: KONRAD Frederic 
Reviewed-by: Alistair Francis 
Reviewed-by: Peter Crosthwaite 
Tested-By: Hyun Kwon 
---
 hw/i2c/core.c | 129 ++
 1 file changed, 75 insertions(+), 54 deletions(-)

diff --git a/hw/i2c/core.c b/hw/i2c/core.c
index e0f92de..5721210 100644
--- a/hw/i2c/core.c
+++ b/hw/i2c/core.c
@@ -9,11 +9,19 @@
 
 #include "hw/i2c/i2c.h"
 
+typedef struct I2CNode I2CNode;
+
+struct I2CNode {
+I2CSlave *elt;
+QLIST_ENTRY(I2CNode) next;
+};
+
 struct I2CBus
 {
 BusState qbus;
-I2CSlave *current_dev;
+QLIST_HEAD(, I2CNode) current_devs;
 uint8_t saved_address;
+bool broadcast;
 };
 
 static Property i2c_props[] = {
@@ -34,17 +42,12 @@ static void i2c_bus_pre_save(void *opaque)
 {
 I2CBus *bus = opaque;
 
-bus->saved_address = bus->current_dev ? bus->current_dev->address : -1;
-}
-
-static int i2c_bus_post_load(void *opaque, int version_id)
-{
-I2CBus *bus = opaque;
-
-/* The bus is loaded before attached devices, so load and save the
-   current device id.  Devices will check themselves as loaded.  */
-bus->current_dev = NULL;
-return 0;
+bus->saved_address = -1;
+if (!QLIST_EMPTY(&bus->current_devs)) {
+if (!bus->broadcast) {
+bus->saved_address = QLIST_FIRST(&bus->current_devs)->elt->address;
+}
+}
 }
 
 static const VMStateDescription vmstate_i2c_bus = {
@@ -52,9 +55,9 @@ static const VMStateDescription vmstate_i2c_bus = {
 .version_id = 1,
 .minimum_version_id = 1,
 .pre_save = i2c_bus_pre_save,
-.post_load = i2c_bus_post_load,
 .fields = (VMStateField[]) {
 VMSTATE_UINT8(saved_address, I2CBus),
+VMSTATE_BOOL(broadcast, I2CBus),
 VMSTATE_END_OF_LIST()
 }
 };
@@ -77,7 +80,7 @@ void i2c_set_slave_address(I2CSlave *dev, uint8_t address)
 /* Return nonzero if bus is busy.  */
 int i2c_bus_busy(I2CBus *bus)
 {
-return bus->current_dev != NULL;
+return !QLIST_EMPTY(&bus->current_devs);
 }
 
 /* Returns non-zero if the address is not valid.  */
@@ -85,95 +88,109 @@ int i2c_bus_busy(I2CBus *bus)
 int i2c_start_transfer(I2CBus *bus, uint8_t address, int recv)
 {
 BusChild *kid;
-I2CSlave *slave = NULL;
 I2CSlaveClass *sc;
+I2CNode *node;
+
+if (address == 0x00) {
+/*
+ * This is a broadcast, the current_devs will be all the devices of the
+ * bus.
+ */
+bus->broadcast = true;
+}
 
 QTAILQ_FOREACH(kid, &bus->qbus.children, sibling) {
 DeviceState *qdev = kid->child;
 I2CSlave *candidate = I2C_SLAVE(qdev);
-if (candidate->address == address) {
-slave = candidate;
-break;
+if ((candidate->address == address) || (bus->broadcast)) {
+node = g_malloc(sizeof(struct I2CNode));
+node->elt = candidate;
+QLIST_INSERT_HEAD(&bus->current_devs, node, next);
+if (!bus->broadcast) {
+break;
+}
 }
 }
 
-if (!slave) {
+if (QLIST_EMPTY(&bus->current_devs)) {
 return 1;
 }
 
-sc = I2C_SLAVE_GET_CLASS(slave);
-/* If the bus is already busy, assume this is a repeated
-   start condition.  */
-bus->current_dev = slave;
-if (sc->event) {
-sc->event(slave, recv ? I2C_START_RECV : I2C_START_SEND);
+QLIST_FOREACH(node, &bus->current_devs, next) {
+sc = I2C_SLAVE_GET_CLASS(node->elt);
+/* If the bus is already busy, assume this is a repeated
+   start condition.  */
+if (sc->event) {
+sc->event(node->elt, recv ? I2C_START_RECV : I2C_START_SEND);
+}
 }
 return 0;
 }
 
 void i2c_end_transfer(I2CBus *bus)
 {
-I2CSlave *dev = bus->current_dev;
 I2CSlaveClass *sc;
+I2CNode *node;
 
-if (!dev) {
+if (QLIST_EMPTY(&bus->current_devs)) {
 return;
 }
 
-sc = I2C_SLAVE_GET_CLASS(dev);
-if (sc->event) {
-sc->event(dev, I2C_FINISH);
+QLIST_FOREACH(node, &bus->current_devs, next) {
+sc = I2C_SLAVE_GET_CLASS(node->elt);
+if (sc->event) {
+sc->event(node->elt, I2C_FINISH);
+}
+QLIST_REMOVE(node, next);
+g_free(node);
 }
-
-bus->current_dev = NULL;
+bus->broadcast = false;
 }
 
 int i2c_send(I2CBus *bus, uint8_t data)
 {
-I2CSlave *dev = bus->current_dev;
 I2CSlaveClass *sc;
+I2CNode *node;
+int ret = -1;
 
-if (!dev) {
-return -1;
-}
-
-sc = I2C_SLAVE_GET_CLASS(dev);
-if (sc->send) {
-return sc->send(dev, data);
+QLIST_FOREACH(node, &bus->current_devs, next) {
+sc = I2C_SLAVE_GET_CLASS(node->elt);
+if (sc->send) {
+ret |= sc->send(node->elt, data);
+}
 }
-
-return -1;
+return ret;
 }
 
 int i2c_recv(I2CBus *b

[Qemu-devel] [PATCH V6 1/8] i2cbus: remove unused dev field

2016-01-04 Thread fred . konrad

From: KONRAD Frederic 

The dev field in i2cbus is not used.
So just drop it.

Signed-off-by: KONRAD Frederic 
Reviewed-by: Alistair Francis 
Reviewed-by: Peter Crosthwaite 
Tested-By: Hyun Kwon 
---
 hw/i2c/core.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/hw/i2c/core.c b/hw/i2c/core.c
index 5a64026..e0f92de 100644
--- a/hw/i2c/core.c
+++ b/hw/i2c/core.c
@@ -13,7 +13,6 @@ struct I2CBus
 {
 BusState qbus;
 I2CSlave *current_dev;
-I2CSlave *dev;
 uint8_t saved_address;
 };
 
-- 
1.9.0

[Qemu-devel] [PATCH V6 4/8] introduce dpcd module

2016-01-04 Thread fred . konrad

From: KONRAD Frederic 

This introduces the dpcd module.
It sits on an aux-bus and can be accessed by the driver to get the lane speed, etc.

Signed-off-by: KONRAD Frederic 
Reviewed-by: Alistair Francis 
Reviewed-by: Peter Crosthwaite 
Tested-By: Hyun Kwon 
---
 default-configs/aarch64-softmmu.mak |   1 +
 hw/display/Makefile.objs|   1 +
 hw/display/dpcd.c   | 171 
 include/hw/display/dpcd.h   | 105 ++
 4 files changed, 278 insertions(+)
 create mode 100644 hw/display/dpcd.c
 create mode 100644 include/hw/display/dpcd.h

diff --git a/default-configs/aarch64-softmmu.mak b/default-configs/aarch64-softmmu.mak
index d3a2665..87165b7 100644
--- a/default-configs/aarch64-softmmu.mak
+++ b/default-configs/aarch64-softmmu.mak
@@ -4,4 +4,5 @@
 include arm-softmmu.mak
 
 CONFIG_AUX=y
+CONFIG_DPCD=y
 CONFIG_XLNX_ZYNQMP=y
diff --git a/hw/display/Makefile.objs b/hw/display/Makefile.objs
index f0cf431..250a43f 100644
--- a/hw/display/Makefile.objs
+++ b/hw/display/Makefile.objs
@@ -42,3 +42,4 @@ virtio-gpu.o-cflags := $(VIRGL_CFLAGS)
 virtio-gpu.o-libs += $(VIRGL_LIBS)
 virtio-gpu-3d.o-cflags := $(VIRGL_CFLAGS)
 virtio-gpu-3d.o-libs += $(VIRGL_LIBS)
+obj-$(CONFIG_DPCD) += dpcd.o
diff --git a/hw/display/dpcd.c b/hw/display/dpcd.c
new file mode 100644
index 0000000..c72bbe0
--- /dev/null
+++ b/hw/display/dpcd.c
@@ -0,0 +1,171 @@
+/*
+ * dpcd.c
+ *
+ *  Copyright (C) 2015 : GreenSocs Ltd
+ *  http://www.greensocs.com/ , email: i...@greensocs.com
+ *
+ *  Developed by :
+ *  Frederic Konrad   
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation, either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, see .
+ *
+ */
+
+/*
+ * This is a simple AUX slave which emulates a connected screen.
+ */
+
+#include "hw/misc/aux.h"
+#include "hw/display/dpcd.h"
+
+#ifndef DEBUG_DPCD
+#define DEBUG_DPCD 0
+#endif
+
+#define DPRINTF(fmt, ...) do {                                                 \
+if (DEBUG_DPCD) {                                                          \
+qemu_log("dpcd: " fmt, ## __VA_ARGS__);                                \
+}                                                                          \
+} while (0);
+
+#define DPCD_READABLE_AREA  0x600
+
+struct DPCDState {
+/*< private >*/
+AUXSlave parent_obj;
+
+/*< public >*/
+/*
+ * The DCPD is 0x7 length but read as 0 after offset 0x5FF.
+ */
+uint8_t dpcd_info[DPCD_READABLE_AREA];
+
+MemoryRegion iomem;
+};
+
+static uint64_t dpcd_read(void *opaque, hwaddr offset, unsigned size)
+{
+uint8_t ret;
+DPCDState *e = DPCD(opaque);
+
+if (offset < DPCD_READABLE_AREA) {
+ret = e->dpcd_info[offset];
+} else {
+qemu_log_mask(LOG_GUEST_ERROR, "dpcd: Bad offset 0x%" HWADDR_PRIX "\n",
+   offset);
+ret = 0;
+}
+
+DPRINTF("read 0x%" PRIX8 " @0x%" HWADDR_PRIX "\n", ret, offset);
+return ret;
+}
+
+static void dpcd_write(void *opaque, hwaddr offset, uint64_t value,
+   unsigned size)
+{
+DPCDState *e = DPCD(opaque);
+
+DPRINTF("write 0x%" PRIX8 " @0x%" HWADDR_PRIX "\n", (uint8_t)value, offset);
+
+if (offset < DPCD_READABLE_AREA) {
+e->dpcd_info[offset] = value;
+} else {
+qemu_log_mask(LOG_GUEST_ERROR, "dpcd: Bad offset 0x%" HWADDR_PRIX "\n",
+   offset);
+}
+}
+
+static const MemoryRegionOps aux_ops = {
+.read = dpcd_read,
+.write = dpcd_write,
+.valid = {
+.min_access_size = 1,
+.max_access_size = 1,
+},
+.impl = {
+.min_access_size = 1,
+.max_access_size = 1,
+},
+};
+
+static void dpcd_reset(DeviceState *dev)
+{
+DPCDState *s = DPCD(dev);
+
+memset(&(s->dpcd_info), 0, sizeof(s->dpcd_info));
+
+s->dpcd_info[DPCD_REVISION] = DPCD_REV_1_0;
+s->dpcd_info[DPCD_MAX_LINK_RATE] = DPCD_5_4GBPS;
+s->dpcd_info[DPCD_MAX_LANE_COUNT] = DPCD_FOUR_LANES;
+s->dpcd_info[DPCD_RECEIVE_PORT0_CAP_0] = DPCD_EDID_PRESENT;
+/* buffer size */
+s->dpcd_info[DPCD_RECEIVE_PORT0_CAP_1] = 0xFF;
+
+s->dpcd_info[DPCD_LANE0_1_STATUS] = DPCD_LANE0_CR_DONE
+  | DPCD_LANE0_CHANNEL_EQ_DONE
+  | DPCD_LANE0_SYMBOL_LOCKED
+  | DPCD_LANE1_CR_DO

Re: [Qemu-devel] [PATCH 2/2] qemu-nbd: Minor texi updates

2016-01-04 Thread Eric Blake

On 12/30/2015 12:57 PM, Sitsofe Wheeler wrote:
> - Change some spacing.
> - Remove duplicate entry for --format.
> - Reword --discard documentation.
> - Add --detect-zeroes documentation.
> 
> Signed-off-by: Sitsofe Wheeler 
> ---
>  qemu-nbd.texi | 22 ++
>  1 file changed, 14 insertions(+), 8 deletions(-)
> 

> @@ -22,8 +22,9 @@ offset into the image
>  interface to bind to (default @samp{0.0.0.0})
>  @item -k, --socket=@var{path}
>  Use a unix socket with path @var{path}
> -@item -f, --format=@var{format}
> -Set image format as @var{format}
> +@item -f, --format=@var{fmt}
> +force the use of the block driver for format @var{fmt} instead of
> +auto-detecting

Why are you abbreviating 'format' to 'fmt'? Oh, because that's how the
other duplicate one was worded.

Should probably start with 'Force', to match the capitalization used
earlier

>  @item -r, --read-only
>  export read-only
>  @item -P, --partition=@var{num}
> @@ -44,17 +45,22 @@ the emulator's @code{-drive cache=...} option for allowed values.
>  choose asynchronous I/O mode between @samp{threads} (the default)
>  and @samp{native} (Linux only).
>  @item --discard=@var{discard}
> -toggles whether @dfn{discard} (also known as @dfn{trim} or @dfn{unmap})

...then again, later lines are not starting with upper case. Maybe
that's worth its own patch to make it uniformly consistent?

> -requests are ignored or passed to the filesystem.  The default is no
> -(@samp{--discard=ignore}).
> +controls whether @dfn{discard} (also known as @dfn{trim} or @dfn{unmap})
> +requests are ignored or passed to the filesystem.  @var{discard} is one of
> +@samp{ignore} (or @samp{off}), @samp{unmap} (or @samp{on}).  The default is
> +@samp{ignore}.
> +@item --detect-zeroes=@var{detect-zeroes}
> +enables the automatic conversion of plain zero writes by the OS to driver
> +specific optimized zero write commands.  @var{detect-zeroes} is one of

I'd probably write this one as 'driver-specific'

> +@samp{off}, @samp{on} or @samp{unmap}.  @samp{unmap}
> +converts a zero write to an unmap operation and can only be used if
> +@var{discard} is set to @samp{unmap}.  The default is @samp{off}.
>  @item -c, --connect=@var{dev}
>  connect @var{filename} to NBD device @var{dev}
>  @item -d, --disconnect
>  disconnect the specified device
>  @item -e, --shared=@var{num}
>  device can be shared by @var{num} clients (default @samp{1})
> -@item -f, --format=@var{fmt}
> -force block driver for format @var{fmt} instead of auto-detecting
>  @item -t, --persistent
>  don't exit on the last connection
>  @item -v, --verbose
> 

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org




Re: [Qemu-devel] An RDMA race?

2016-01-04 Thread Dr. David Alan Gilbert

* Michael R. Hines (mhi...@digitalocean.com) wrote:
> Adding such a control message would defeat the benefits of RDMA, as there
> shouldn't be any signalling in the actual DMA path, or RDMA latency would
> be too high. If you're sending control messages for individual writes, then
> you need to change up your design. It's OK to design ACKs for groups of
> writes, depending on the requirements.

I started off with sending individual messages, and then once I had it working
I made it group them to send one message every 2048 pages.
The performance isn't very good though, and I've not yet analysed why.

> So, the out-of-order issue you're seeing is only with your new message, not
> the original messages?

Yes I believe they're only on the new messages; however:
  1) I'm sending a lot more control messages, so if there's a race I'm
a lot more likely to trigger it. (I'm not sure I'm triggering it in the
case where I group those 2048 together) - so does this mean it would
occasionally trigger on the unmodified code?

  2) My reading of the existing code is that I think it could happen;
a) the source is ready to send something and is waiting for a CONTROL_READY,
b) the destination sends the CONTROL_READY
(blocking in the qemu_rdma_post_send_control call to
 qemu_rdma_block_for_wrid(rdma, RDMA_WRID_SEND_CONTROL, NULL))
c) The source sends its data
d) That arrives at the destination
e) finally the WRID_SEND_CONTROL arrives back

   It's having d/e the wrong way round which is the race I think I'm seeing
   and then we lose (d)'s data.

> Can you describe/document it in more detail so I can help advise?

There are 2 cases where the destination needs to know which pages it's received:
  i) In COLO or checkpointing where it's receiving a partial new checkpoint;
since it's only receiving a partial checkpoint it needs to know what it's
received. This allows the destination to avoid copying the whole of its
received checkpoint and only copy the bits that changed.

 ii) On postcopy once a page is received by the destination the page has to
be atomically placed;  I've not thought too hard about that yet.

Dave

> 
> - Michael
> 
> On Mon, Dec 14, 2015 at 6:53 PM, Dr. David Alan Gilbert wrote:
> 
> > * Michael R. Hines (mhi...@digitalocean.com) wrote:
> > > David,
> > >
> > > Thanks for including my email directly. It helps a lot.
> > >
> > > Below, I'm going to assume that only "dest" is calling
> > > qemu_rdma_exchange_recv()
> > > and only src is calling qemu_rdma_exchange_send(), since you didn't
> > specify
> > > who
> > > is sending and who is receiving.
> > >
> > > If that assumption is wrong, please respond again.
> >
> > That's correct.
> >
> > > Comments inline.
> > >
> > > On Sat, Dec 12, 2015 at 1:48 AM, Dr. David Alan Gilbert <
> > dgilb...@redhat.com
> > > > wrote:
> > >
> > > > Hi Michael,
> > > >I think I've got an RDMA race condition, but I'm being a little
> > > > cautious at the moment and wondered if you agree with the following
> > > > diagnosis.
> > > >
> > > > It's showing up in a world of mine that's sending more control messages
> > > > from the destination->source and I'm seeing the following.
> > > >
> > > > We normally expect:
> > > >
> > > > >    src                    dest
> > > >  --->control ready->
> > > >
> > >
> > > If src is sending, this is not correct. Dest should send the ready
> > message
> > > if it is receiving, not src, which breaks the above assumption. So, I'll
> > > reverse the assumption previously and continue with your observation and
> > > assume that src is receiving instead of dest, which should instead look
> > > like:
> >
> > Gah! Yes, I got the label the wrong way around; it's dest sending control
> > ready.
> >
> > > src  (receiving)  dest (sending)
> > >  --->control ready->
> > >
> > >
> > >
> > > >Sees SEND_CONTROL signal to ack that it has been sent
> > > >
> > >
> > > I'll assume here that you meant that dest sees the ready message and
> > > then later sends something.
> > >
> > >
> > > >  <-control message--
> > > >Sees RECV_CONTROL message from dest
> > > >
> > > >
> > > Similar assumption for the receiver (src).
> > >
> > >
> > > > but what I'm seeing is:
> > > > >    src                    dest
> > > >  --->control ready->
> > > >  <-control message--
> > > >Sees RECV_CONTROL message from dest
> > > >
> > >
> > > hm
> > >
> > >
> > > >Sees SEND_CONTROL signal to ack that it has been sent
> > > >
> > > >
> > > There's not enough information here... do you have a multi-threaded
> > > send or receive or something?
> >
> > No, I've been trying to wire RDMA into the COLO fault-tolerant setup;
> > so the change which got me to trigger this bug was that I'd
> > added a new control message 'notify write' which explicitly
> > told the destination it had a page written to; at the RDMA

Re: [Qemu-devel] [PATCH 1/2] qemu-nbd: Fix unintended texi verbatim formatting

2016-01-04 Thread Eric Blake

On 12/30/2015 12:54 PM, Sitsofe Wheeler wrote:

[meta-comment] When sending a patch series, it's best to also include a
0/2 cover letter that summarizes the series.  Doable with 'git config
format.coverLetter auto'.

> Indented lines in the texi meant the perlpod produced interpreted the
> paragraph as being verbatim (thus formatting codes were not
> interpreted). Fix this by un-indenting problem lines.
> 
> Signed-off-by: Sitsofe Wheeler 
> ---
>  qemu-nbd.texi | 58 +-
>  1 file changed, 29 insertions(+), 29 deletions(-)

Reviewed-by: Eric Blake 

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org


[Qemu-devel] [Bug 1399191] Re: Large VHDX image size

2016-01-04 Thread Max Reitz

Hi Jan,

ls -l returns the length of the file; qemu-img info prints the size of
the file (just like du does). Those are not necessarily the same, as you
can see. On modern filesystems, files can have holes in them which do
contribute to the file length, but which do not use any space on disk
and thus do not contribute to file size.

Besides using qemu-img, you can obtain the actual disk usage (the file
size) by using du or ls --size (-s for short).
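
The difference between file length and disk usage can be reproduced with a small C sketch. It assumes a POSIX host where st_blocks is counted in 512-byte units (true on Linux, not guaranteed everywhere); the helper names make_sparse_file and file_length_and_usage are made up for this illustration and are not part of qemu-img.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create a file of the given length that consists of
 * one big hole. path_template must end in "XXXXXX" as mkstemp() requires. */
static int make_sparse_file(char *path_template, off_t length)
{
    int fd = mkstemp(path_template);
    int ret;

    if (fd < 0) {
        return -1;
    }
    /* Extending with ftruncate() sets the length without writing any data. */
    ret = ftruncate(fd, length);
    close(fd);
    return ret;
}

/* Hypothetical helper: report the number "ls -l" shows (length) and the
 * number "du" shows (allocated), both in bytes. */
static int file_length_and_usage(const char *path,
                                 long long *length, long long *allocated)
{
    struct stat st;

    assert(length && allocated);
    if (stat(path, &st) < 0) {
        return -1;
    }
    *length = st.st_size;
    /* Assumption: st_blocks uses 512-byte units, as on Linux. */
    *allocated = (long long)st.st_blocks * 512;
    return 0;
}
```

On a filesystem that supports holes, a freshly created 1 MiB sparse file reports a length of 1048576 bytes while the allocated figure stays far below that, which mirrors the ls -l versus du discrepancy described above.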

Max

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1399191

Title:
  Large VHDX image size

Status in QEMU:
  New

Bug description:
  We are trying to convert a VMDK image to a VHDX image for deployment to a
  Hyper-V server (SCVMM 2012 SP1) using qemu-img.
  We tried converting the image using both the 'fixed' and the 'dynamic'
  subformat, and found that both disks occupy the same 50GB. When the same is
  done with a VHD image, the dynamic disk is much smaller (5GB) than the
  fixed disk (50GB).

  Why does VHDX generate such large images for both subformats?

  The following commands were used to convert the vmdk image to VHDX
  format

  1. qemu-img convert -p -o subformat=fixed  -f vmdk -O vhdx Test.vmdk
  Test-fixed.vhdx

  qemu-img info Test-fixed.vhdx
  image: Test-fixed.vhdx
  file format: vhdx
  virtual size: 50G (53687091200 bytes)
  disk size: 50G
  cluster_size: 16777216


  
  2. qemu-img convert -p -o subformat=dynamic  -f vmdk -O vhdx Test.vmdk
  Test-dynamic.vhdx

  qemu-img info Test-dynamic.vhdx
  image: Test-dynamic.vhdx
  file format: vhdx
  virtual size: 50G (53687091200 bytes)
  disk size: 50G
  cluster_size: 16777216

  
  We tried this with the following version of qemu
  1. qemu-2.0.0
  2. qemu-2.1.2
  3. qemu-2.2.0-rc4

  
  Please let us know how to create compact VHDX images using qemu-img.
  Thank you

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1399191/+subscriptions

Re: [Qemu-devel] [PATCH v2] target-mips: Fix ALIGN instruction when bp=0

2016-01-04 Thread P J P

  Hello Miodrag,

+-- On Mon, 4 Jan 2016, Miodrag Dinic wrote --+
| thanks for your comments and review.
| Version 2 of the patch is in the attachment.

 -> http://qemu-project.org/Contribute/SubmitAPatch#Do_not_send_as_an_attachment

Generally it is preferred to have patches inline, instead of attachments. And 
using git-format-patch(1) and git-send-email(1) is more appreciated. I too 
learned it quite recently.

--
 - P J P
47AF CE69 3A90 54AA 9045 1053 DD13 3D32 FE5B 041F

Re: [Qemu-devel] could i using qemu-img covert && rebase -u to do qcow2 rollback?

2016-01-04 Thread Eric Blake

On 01/04/2016 10:45 AM, Max Reitz wrote:
> On 30.12.2015 12:31, Huan Zhang wrote:
>> Hi,
>> We are finding a way to do rollback for qcow2 in production environment,
>> But we can't ensure the below way will work well, and user data are safe.
>>
>> for example,
>> snap0.qcow2 -> snap1.qcow2 ->snap2.qcow2 -> active.qcow2
>>
>> rollback to snap1 using convert && rebase -u:
>> 1. qemu-img convert -O qcow2 snap1.qcow2 rollback.qcow2
>> 2. qemu-img rebase -u -b snap2.qcow2 rollback.qcow2
> 
> What do you mean by "rollback"? Just getting back to the state presented
> in snap1, discarding all the changes done in snap2 and active?
> 
> Then why don't you just throw snap2.qcow2 and active.qcow2 away and
> continue to work on snap1.qcow2? If you want to keep snap1 in its
> current state, just create a new image on top of it:
> 
> $ qemu-img create -f qcow2 -b snap1.qcow2 active.qcow2

Don't forget to also set the backing format (either '-o
backing_fmt=qcow2' or the undocumented '-F qcow2').  Without an explicit
format, libvirt will default to refusing to honor your backing files
rather than risk an unsafe format probe.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org




Re: [Qemu-devel] could i using qemu-img covert && rebase -u to do qcow2 rollback?

2016-01-04 Thread Max Reitz

On 30.12.2015 12:31, Huan Zhang wrote:
> Hi,
> We are finding a way to do rollback for qcow2 in production environment,
> But we can't ensure the below way will work well, and user data are safe.
> 
> for example,
> snap0.qcow2 -> snap1.qcow2 ->snap2.qcow2 -> active.qcow2
> 
> rollback to snap1 using convert && rebase -u:
> 1. qemu-img convert -O qcow2 snap1.qcow2 rollback.qcow2
> 2. qemu-img rebase -u -b snap2.qcow2 rollback.qcow2

What do you mean by "rollback"? Just getting back to the state presented
in snap1, discarding all the changes done in snap2 and active?

Then why don't you just throw snap2.qcow2 and active.qcow2 away and
continue to work on snap1.qcow2? If you want to keep snap1 in its
current state, just create a new image on top of it:

$ qemu-img create -f qcow2 -b snap1.qcow2 active.qcow2

Anyway, the qemu-img rebase line looks fishy to me. What you are doing is
basically using snap2.qcow2 as the backing file of a copy of snap1.qcow2,
which is the opposite of how the chain was arranged before.

Max


Re: [Qemu-devel] qcow2 snapshot + resize

2016-01-04 Thread Max Reitz

On 29.12.2015 10:38, lihuiba wrote:
> Hi,
> 
> In our production environment, we need to extend a qcow2 image with
> snapshots in it. This feature, however, is not implemented yet. 
> 
> So I want to ask, if this feature is under active development?

No, it is not.

> How can I
> help with this feature?

Well, you can implement it. ;-)

> It seems that, this feature is not too difficult as long as cluster_size
> is kept unchanged. Is this correct?

The thing is that one would need to update all the inactive L1 tables. I
don't think it should be too difficult, it's just that apparently so far
nobody ever had the need for this feature.

I can take a look, but I can't say anything about whether or when
anything will come out of it.

Max


Re: [Qemu-devel] [PATCH 2/6] device_tree: introduce load_device_tree_from_sysfs

2016-01-04 Thread Eric Auger

Hi Peter,
On 12/18/2015 03:10 PM, Peter Maydell wrote:
> On 17 December 2015 at 12:29, Eric Auger  wrote:
>> This function returns the host device tree blob from sysfs
>> (/sys/firmware/devicetree/base). It uses a recursive function
>> inspired from dtc read_fstree.
>>
>> Signed-off-by: Eric Auger 
>>
>> ---
>>
>> RFC -> v1:
>> - remove runtime dependency on dtc binary and introduce read_fstree
>> ---
>>  device_tree.c| 102 
>> +++
>>  include/sysemu/device_tree.h |   1 +
>>  2 files changed, 103 insertions(+)
>>
>> diff --git a/device_tree.c b/device_tree.c
>> index a9f5f8e..e556a99 100644
>> --- a/device_tree.c
>> +++ b/device_tree.c
>> @@ -17,6 +17,7 @@
>>  #include 
>>  #include 
>>  #include 
>> +#include 
> 
> Does this code compile on non-Linux hosts? (You've put it in a file
> which is built everywhere, but it's definitely semantically Linux
> specific.)

I struggled quite a lot while cross-compiling all dependencies for W32
(~ http://wiki.qemu.org/Hosts/W32).

Eventually device_tree.c compiles, but there is a link issue since lstat
does not seem to be available with MinGW.

But there is definitely a problem with hw/arm/sysbus-fdt.c which is
not compiling due to the inclusion of #include 

So thanks for raising the concern.

With respect to read_fstree, what is your suggestion: shall I keep it in
device_tree.c while protecting it with CONFIG_LINUX, or is it better to
move it, for instance to hw/arm/sysbus-fdt.c?

> 
>>  #include "qemu-common.h"
>>  #include "qemu/error-report.h"
>> @@ -117,6 +118,107 @@ fail:
>>  return NULL;
>>  }
>>
>> +/**
>> + * read_fstree: this function is inspired from dtc read_fstree
>> + * @fdt: preallocated fdt blob buffer, to be populated
>> + * @dirname: directory to scan under /sys/firmware/devicetree/base
>> + * the search is recursive and the tree is search down to the
>> + * leafs (property files).
>> + *
>> + * the function self-asserts in case of error
>> + */
>> +static void read_fstree(void *fdt, const char *dirname)
>> +{
>> +DIR *d;
>> +struct dirent *de;
> 
> Indent here doesn't match QEMU coding style, which is four-space.
OK
> 
>> +struct stat st;
>> +const char *root_dir = "/sys/firmware/devicetree/base";
> 
> You use this string twice and its length once so it would be nice
> to have it in a #define.
OK
> 
>> +char *parent_node;
>> +
>> +if (strstr(dirname, root_dir) != dirname) {
>> +error_report("%s: %s must be searched within %s",
>> + __func__, dirname, root_dir);
>> +exit(1);
>> +}
>> +parent_node = (char *)&dirname[29];
> 
> I think 29 here should be strlen(SYSFS_DT_BASEDIR) or whatever
> you want to call it.
OK
> 
>> +
>> +d = opendir(dirname);
>> +if (!d) {
>> +error_report("%s cannot open %s", __func__, dirname);
>> +exit(1);
>> +}
>> +
>> +while ((de = readdir(d)) != NULL) {
>> +char *tmpnam;
>> +
>> +if (!g_strcmp0(de->d_name, ".")
>> +|| !g_strcmp0(de->d_name, "..")) {
>> +continue;
>> +}
> 
> If you used glib g_dir_open/g_dir_read_name/g_dir_close it would
> automatically skip '.' and '..' for you, but I'm not sure the
> benefit is enough to bother redoing this code now.
OK thanks for the hint
> 
>> +
>> +tmpnam = g_strjoin("/", dirname, de->d_name, NULL);
>> +
>> +if (lstat(tmpnam, &st) < 0) {
>> +error_report("%s cannot lstat %s", __func__, tmpnam);
>> +exit(1);
>> +}
>> +
>> +if (S_ISREG(st.st_mode)) {
>> +int ret, size = st.st_size;
>> +void *val = g_malloc0(size);
>> +FILE *pfile;
>> +
>> +pfile = fopen(tmpnam, "r");
>> +if (!pfile) {
>> +error_report("%s cannot open %s", __func__, tmpnam);
>> +exit(1);
>> +}
>> +ret = fread(val, 1, size, pfile);
>> +if (ferror(pfile) || ret < size) {
>> +error_report("%s fail reading %s", __func__, tmpnam);
>> +exit(1);
>> +}
>> +fclose(pfile);
> 
> This looks like it's reimplementing g_file_get_contents().
OK
> 
>> +
>> +if (strlen(parent_node) > 0) {
>> +qemu_fdt_setprop(fdt, parent_node,
>> + de->d_name, val, size);
>> +} else {
>> +qemu_fdt_setprop(fdt, "/", de->d_name, val, size);
>> +}
>> +g_free(val);
>> +} else if (S_ISDIR(st.st_mode)) {
>> +char *node_name;
>>
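
The review comments about the hard-coded 29 and the repeated root path can be illustrated with a plain C sketch. SYSFS_DT_BASEDIR is the name floated in the review; dt_node_path is a hypothetical helper, not code from the patch. The sketch also rejects sibling directories that merely share the prefix, which is stricter than the strstr() check in the quoted code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SYSFS_DT_BASEDIR "/sys/firmware/devicetree/base"

/* Hypothetical helper: map a directory below SYSFS_DT_BASEDIR to the
 * corresponding device tree node path. Returns NULL when dirname is not
 * the base directory or one of its descendants. */
static const char *dt_node_path(const char *dirname)
{
    size_t base_len = strlen(SYSFS_DT_BASEDIR);   /* replaces the literal 29 */
    char next;

    assert(dirname != NULL);
    if (strncmp(dirname, SYSFS_DT_BASEDIR, base_len) != 0) {
        return NULL;
    }
    next = dirname[base_len];
    if (next == '\0') {
        return "/";            /* the base directory itself is the root node */
    }
    if (next != '/') {
        return NULL;           /* e.g. ".../basement" only shares the prefix */
    }
    return dirname + base_len;
}
```

Returning "/" for the base directory folds the empty parent_node special case into the helper, so a caller could pass the result straight to qemu_fdt_setprop().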

[Qemu-devel] [PATCH] i2c-tiny-usb: add new usb to i2c bridge

2016-01-04 Thread Tim Sander

Version 3 with improvements suggested by Gerd Hoffmann

Signed-off-by: Tim Sander 

i2c-tiny-usb is a small usb to i2c bridge:
 http://www.harbaum.org/till/i2c_tiny_usb/index.shtml

It is pretty simple and has no usb endpoints just a control.
Reasons for adding this device:
* Linux device driver available
* adding an additional i2c bus via command line e.g.
  -device usb-i2c-tiny,id=i2c-0 -device tmp105,bus=i2c,address=0x50
---
 default-configs/usb.mak |   1 +
 hw/usb/Makefile.objs    |   1 +
 hw/usb/dev-i2c-tiny.c   | 313 
 trace-events            |  11 ++
 4 files changed, 326 insertions(+)

[Qemu-devel] [PATCH] macio: fix overflow in lba to offset conversion for ATAPI devices

2016-01-04 Thread Mark Cave-Ayland

As the IDEState lba field is an int32_t, make sure we cast to int64_t before
shifting to calculate the offset. Otherwise we end up with an overflow when
trying to access sectors beyond 2GB as can occur when using DVD images.

Signed-off-by: Mark Cave-Ayland 
---
 hw/ide/macio.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/ide/macio.c b/hw/ide/macio.c
index 3ee962f..a78b6e0 100644
--- a/hw/ide/macio.c
+++ b/hw/ide/macio.c
@@ -280,7 +280,7 @@ static void pmac_ide_atapi_transfer_cb(void *opaque, int ret)
 }
 
 /* Calculate current offset */
-offset = (int64_t)(s->lba << 11) + s->io_buffer_index;
+offset = ((int64_t)(s->lba) << 11) + s->io_buffer_index;
 
 pmac_dma_read(s->blk, offset, io->len, pmac_ide_atapi_transfer_cb, io);
 return;
-- 
1.7.10.4
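
The effect of moving the cast can be seen in isolation with a small C sketch. The function names are made up for illustration. Because overflowing a signed 32-bit shift is undefined behaviour in C, the old expression is modelled here with unsigned arithmetic followed by a conversion back to int32_t (which wraps on common compilers such as GCC and Clang).

```c
#include <assert.h>
#include <stdint.h>

/* Model of the old expression: the shift happens in 32 bits, so the high
 * bits are already gone by the time the value is widened. */
static int64_t atapi_offset_narrow(int32_t lba, int32_t index)
{
    uint32_t shifted;

    assert(lba >= 0 && index >= 0);
    shifted = (uint32_t)lba << 11;          /* 2048-byte sectors */
    return (int64_t)(int32_t)shifted + index;
}

/* Model of the fixed expression: widen first, then shift. */
static int64_t atapi_offset_wide(int32_t lba, int32_t index)
{
    assert(lba >= 0 && index >= 0);
    return ((int64_t)lba << 11) + index;
}
```

For small sector numbers both variants agree. For lba = 1 << 21 (a sector 4GB into the image) the widened form yields 4294967296 while the narrow form collapses to 0.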

Re: [Qemu-devel] [PATCH] hw/i386: fill in the CENTURY field of the FADT (FACP) ACPI table

2016-01-04 Thread Laszlo Ersek

On 12/10/15 19:53, Igor Mammedov wrote:
> On Thu, 10 Dec 2015 18:25:34 +0100
> Laszlo Ersek  wrote:
> 
>> The ACPI specification (minimally versions 1.0b through 6.0) define
>> the FADT.CENTURY field as:
>>
>>   The RTC CMOS RAM index to the century of data value (hundred and
>>   thousand year decimals). If this field contains a zero, then the RTC
>>   centenary feature is not supported. If this field has a non-zero
>> value, then this field contains an index into RTC RAM space that OSPM
>> can use to program the centenary field.
>>
>> The x86 targets generate ACPI payload, emulate an RTC
>> (CONFIG_MC146818RTC), and that RTC supports the "centenary
>> feature" (see occurrences of RTC_CENTURY in cmos_ioport_write() and
>> cmos_ioport_read() in "hw/timer/mc146818rtc.c".)
>>
>> However, FADT.CENTURY is left at zero currently:
>>
>>   [06Ch 0108   1]RTC Century Index : 00
>>
>> which -- according to analysis done by Ruiyu Ni at Intel -- should
>> cause Linux and Windows 8+ to think the RTC centenary feature is
>> unavailable, and cause Windows 7 to (incorrectly) assume that the
>> offset to use is constant 0x32. (0x32 happens to be the right value
>> on QEMU, but Windows 7 is wrong to assume anything at all).
>>
>> Exposing the right nonzero offset in FADT.CENTURY informs Linux and
>> Windows 8+ about the right capabilities of the hardware, plus it
>> retrofits our FADT to Windows 7's behavior.
>>
>> Regression tested with the following guests (all UEFI installs):
>> - i386 Q35: Fedora 21 ("Fedlet" edition)
>> - x86_64:
>>   - i440fx:
>> - Fedora 21
>> - RHEL 6 and 7
>> - Windows 7 and 10
>> - Windows Server 2008 R2 and 2012 R2
>>   - Q35:
>> - Fedora 22
>> - Windows 8.1
>>
>> Cc: "Michael S. Tsirkin"  (supporter:ACPI/SMBIOS)
>> Cc: Igor Mammedov  (supporter:ACPI/SMBIOS)
>> Cc: Paolo Bonzini  (maintainer:X86)
>> Cc: Richard Henderson  (maintainer:X86)
>> Cc: Eduardo Habkost  (maintainer:X86)
>> Cc: Ruiyu Ni 
>> Signed-off-by: Laszlo Ersek 
>> ---
>>  hw/i386/acpi-build.c | 2 ++
>>  1 file changed, 2 insertions(+)
>>
>> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
>> index 95e0c65..c5e6c4b 100644
>> --- a/hw/i386/acpi-build.c
>> +++ b/hw/i386/acpi-build.c
>> @@ -42,6 +42,7 @@
>>  #include "sysemu/tpm.h"
>>  #include "hw/acpi/tpm.h"
>>  #include "sysemu/tpm_backend.h"
>> +#include "hw/timer/mc146818rtc_regs.h"
>>  
>>  /* Supported chipsets: */
>>  #include "hw/acpi/piix4.h"
>> @@ -334,6 +335,7 @@ static void fadt_setup(AcpiFadtDescriptorRev1
>> *fadt, AcpiPmInfo *pm) if (max_cpus > 8) {
>>  fadt->flags |= cpu_to_le32(1 <<
>> ACPI_FADT_F_FORCE_APIC_CLUSTER_MODEL); }
>> +fadt->century = RTC_CENTURY;
>>  }
>>  
>>  
> 
> Reviewed-by: Igor Mammedov 
> 

Thanks.

Can someone please pick up this patch?

Thanks
Laszlo
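
As a rough sketch of how a guest might consume FADT.CENTURY: the code below treats the CMOS RAM as a plain byte array instead of doing port I/O, assumes the RTC is in BCD mode, and uses the conventional MC146818 year register at index 0x09. The function names are hypothetical and do not come from QEMU or any guest OS.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RTC_YEAR_INDEX 0x09   /* MC146818 year register (two BCD digits) */

/* Decode one packed BCD byte, e.g. 0x16 -> 16. */
static int bcd_to_int(uint8_t bcd)
{
    return (bcd >> 4) * 10 + (bcd & 0x0f);
}

/* Hypothetical helper: combine century and year registers into a full year.
 * century_index is the value read from FADT.CENTURY; zero means the
 * centenary feature is not supported, reported here as -1. */
static int rtc_full_year(const uint8_t *cmos, uint8_t century_index)
{
    assert(cmos != NULL);
    if (century_index == 0) {
        return -1;
    }
    return bcd_to_int(cmos[century_index]) * 100
           + bcd_to_int(cmos[RTC_YEAR_INDEX]);
}
```

With the patch applied the guest would see index 0x32, so a CMOS image holding 0x20 at 0x32 and 0x16 at 0x09 decodes to 2016.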

Re: [Qemu-devel] [Qemu-block] [PATCH 00/10] qcow2: Implement image locking

2016-01-04 Thread Max Reitz

On 24.12.2015 06:41, Denis V. Lunev wrote:
> On 12/24/2015 02:19 AM, Max Reitz wrote:
>> So the benefits of a qcow2 flag are only minor ones. However, I
>> personally believe that automatic unlock on crash is a very minor
>> benefit as well. That should never happen in practice anyway, and a
>> crashing qemu is such a great inconvenience that I as a user wouldn't
>> really mind having to unlock the image afterwards.
> IMHO you are wrong. This is VERY important. The situation would be exactly
> the same after node poweroff, which could happen and really happens in
> the real life from time to time.
> 
> In this cases VMs should start automatically and ASAP if configured this
> way. Any manual interaction here is a REAL pain.

Thanks, that's a good example.

However, I don't know much about management at that layer, so this is
probably where I'm out of the discussion.

(For instance, I don't know which kind of node you are talking about; I
presume it is a physical node, because if it was a virtual node, you'd
just kill the qemu instance in question by sending a QMP quit command.)

>> In fact, libvirt could even do that manually, couldn't it? If qemu
>> crashes, it just invokes qemu-img force-unlock on any qcow2 image which
>> was attached R/W to the VM.
> 
> in the situation above libvirt does not have the information or this
> information could be unreliable.

Well, then s/libvirt/any of the management layers/. As far as I know,
qemu-img commands are still used pretty high up in the stack.

>>> As an alternative, can we introduce .bdrv_flock() in protocol
>>> drivers, with
>>> similar semantics to flock(2) or lockf(3)? That way all formats can
>>> benefit,
>>> and a program crash will automatically drop the lock.
>> Making other formats benefit from addressing this issue is a good point,
>> but it too is a minor point. Formats other than qcow2 and raw are only
>> supported for compatibility anyway, and we don't need this feature for
>> raw.
> I would like to have this covered by flock and this indeed working for
> years with Parallels.
> 
>>
>> I feel like most of the question which approach to take revolves around
>> "But what if qemu crashes?". You (and others) are right in that having
>> to manually unlock the image then is cumbersome, however, I think that:
>> (1) qemu should never crash anyway.
>> (2) If qemu does crash, having to unlock the image is probably the
>>  least of your worries.
>> (3) If you are using libvirt, it should actually be possible for
>>  libvirt to automatically force-unlock images on qemu crash.
>>
>> This is why I don't think that keeping a locked image behind on qemu
>> crash is actually an issue.
>>
>> Max
>>
> pls see above. Node failure and unexpected power loss really matters.

Good points indeed (maybe, I can't actually judge, but I'll trust you).
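
For reference, the flock(2) behaviour that the .bdrv_flock() proposal relies on can be sketched in C. This only shows the host-side system call, assuming a Linux or BSD host; open_locked is a made-up helper and nothing here is an existing QEMU block-layer interface.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/file.h>
#include <unistd.h>

/* Hypothetical helper: open an image file and take an exclusive advisory
 * lock without blocking. Returns the file descriptor, or -1 if the file
 * cannot be opened or somebody else already holds the lock. */
static int open_locked(const char *path)
{
    int fd;

    assert(path != NULL);
    fd = open(path, O_RDWR);
    if (fd < 0) {
        return -1;
    }
    if (flock(fd, LOCK_EX | LOCK_NB) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

The lock belongs to the open file description, so the kernel drops it when the holder closes the descriptor or the process dies. That covers a crash of the locking process; whether it also behaves well after a node failure on network filesystems is a separate question this sketch does not answer.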

Max




Re: [Qemu-devel] [Qemu-block] ping Re: [PATCH v12] block/raw-posix.c: Make physical devices usable in QEMU under Mac OS X host

2016-01-04 Thread Programmingkid

On Jan 4, 2016, at 11:35 AM, Max Reitz wrote:

> On 29.12.2015 01:27, Programmingkid wrote:
>> I do realize you are busy Kevin, but I would
>> appreciate knowing my patch is in line 
>> for review.
> 
> Primarily, he's been on holiday since before christmas until next week.
> 
> (I'm telling you so you don't wonder why nothing happens.)
> 
> Max
> 

Thank you very much. I guess I have been a little frustrated with
this patch. I have been trying to have it submitted into QEMU
since August 2015!

Re: [Qemu-devel] [PATCH v4] igd-passthrough-i440FX: convert to realize()

2016-01-04 Thread Lars Kurth

On 04/01/2016 14:47, "Stefano Stabellini"
 wrote:

>Unfortunately I don't have a setup to test this either. Maybe Lars can
>find out who should be involved on the Intel side on this.

I can certainly help with this and get back to you. What exactly are we
asking Intel to do?
It is not clear to me from this email thread.

Regards
Lars

Re: [Qemu-devel] [RFC PATCH v2 00/10] Add colo-proxy based on netfilter

2016-01-04 Thread Dr. David Alan Gilbert

* Jason Wang (jasow...@redhat.com) wrote:
> 
> 
> On 01/04/2016 04:16 PM, Zhang Chen wrote:
> >
> >
> > On 01/04/2016 01:37 PM, Jason Wang wrote:
> >>
> >> On 12/31/2015 04:40 PM, Zhang Chen wrote:
> >>>
> >>> On 12/31/2015 10:36 AM, Jason Wang wrote:
>  On 12/22/2015 06:42 PM, Zhang Chen wrote:
> > From: zhangchen 
> >
> > Hi,all
> >
> > This patch add an colo-proxy object, COLO-Proxy is a part of COLO,
> > based on qemu netfilter and it's a plugin for qemu netfilter. the
> > function
> > keep Secondary VM connect normal to Primary VM and compare packets
> > sent by PVM to sent by SVM.if the packet difference,notify COLO do
> > checkpoint and send all primary packet has queued.
>  Thanks for the work. I don't object this method but still not
>  convinced
>  that qemu is the best place for this.
> 
>  As been raised in the past discussion, it's almost impossible to
>  cooperate with vhost backends. If we want this to be used in
>  production
>  environment, need to think of a solution for vhost. There's no such
>  worry if we decouple this from qemu.
> 
> > You can also get the series from:
> >
> > https://github.com/zhangckid/qemu/tree/colo-v2.2-periodic-mode-with-colo-proxyV2
> >
> >
> >
> > Usage:
> >
> > primary:
> > -netdev tap,id=bn0 -device e1000,netdev=bn0
> > -object
> > colo-proxy,id=f0,netdev=bn0,queue=all,mode=primary,addr=host:port
> >
> > secondary:
> > -netdev tap,id=bn0 -device e1000,netdev=bn0
> > -object
> > colo-proxy,id=f0,netdev=bn0,queue=all,mode=secondary,addr=host:port
>  Have a quick glance at how secondary mode work. What it does is just
>  forwarding packets between a nic and a socket, qemu socket backend did
>  exact the same job. You could even use socket in primary node and let
>  packet compare module talk to both primary and secondary node.
> >>> If we use qemu socket backend , the same netdev will used by qemu
> >>> socket and
> >>> qemu netfilter. this will against qemu net design. and then, when colo
> >>> do failover,
> >>> secondary do not have backend to use. that's the real problem.
> >> Then, maybe it's time to implement changing the netdev of a nic. The
> >> point here is that what secondary mode did is in fact a netdev backend
> >> instead of a filter ...
> >
> > Currently, you are right. In the colo-proxy V2 code, I just compare IP
> > packets to decide whether to do a checkpoint.
> > But in colo-proxy V3 I will compare tcp, icmp and udp packets to decide,
> > because that can reduce the frequency of checkpoints and improve
> > performance. To keep tcp connections working, the colo secondary needs to
> > record the primary guest's init seq and adjust the secondary guest's ack.
> > If colo does failover, the secondary also needs to do this for old tcp
> > connections. The qemu socket can't do this job.
> 
> So a question here: is it a must to do things (e.g. TCP analysis stuff)
> at the secondary? Looks like we could do this at the primary node. And I saw
> you're doing packet comparing in the primary node; any advantages of doing
> this in the primary instead of the secondary?

It needs to do this on the secondary; the trick is that things like TCP sequence
numbers are likely to be different on the primary and secondary. The kernel
colo-proxy implementation solved this problem by rewriting the sequence numbers
on the secondary to match the primary; after a failover, the secondary has
to keep doing that rewrite to ensure existing connections are OK.
Thus it's holding some state about the current connections.
I think also, to be able to do a 2nd failover (i.e. recover from the 1st failure
and then sometime later have another) you'd have to sync this
state over to a new host, so again that says the state needs to be part of
qemu or at least easily available to it.

Dave

> > Another problem is failover: if we use the qemu socket
> > as the backend in the secondary, then when colo does failover, I don't
> > know how to change the secondary into a normal qemu. If you know, please
> > tell me.
> 
> Current qemu can't do this, but I mean we could implement something like
> nic_change_backend, which can change a nic's peer(s). With this, in the
> secondary, we can replace the socket backend with whatever you want (e.g.
> tap or other).
> 
> Thanks
> 
> >
> >
> > Thanks for your review
> > zhangchen 
> 
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK

Re: [Qemu-devel] [Qemu-block] ping Re: [PATCH v12] block/raw-posix.c: Make physical devices usable in QEMU under Mac OS X host

2016-01-04 Thread Max Reitz

On 29.12.2015 01:27, Programmingkid wrote:
> I do realize you are busy Kevin, but I would
> appreciate knowing my patch is in line 
> for review.

Primarily, he's been on holiday since before Christmas until next week.

(I'm telling you so you don't wonder why nothing happens.)

Max




[Qemu-devel] [PATCH v2 1/3] Enable PPC64 with TPM support

2016-01-04 Thread Stefan Berger

From: Stefan Berger 

Compile the TPM passthrough device emulation on ppc64.

Signed-off-by: Stefan Berger 
CC: Alexander Graf 
CC: qemu-...@nongnu.org
---
 configure | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/configure b/configure
index 83b40fc..82ca1b5 100755
--- a/configure
+++ b/configure
@@ -3229,7 +3229,8 @@ fi
 ##
 # TPM passthrough is only on x86 Linux
 
-if test "$targetos" = Linux && test "$cpu" = i386 -o "$cpu" = x86_64; then
+if test "$targetos" = Linux && test "$cpu" = i386 -o "$cpu" = x86_64 \
+-o "$cpu" = ppc64; then
   tpm_passthrough=$tpm
 else
   tpm_passthrough=no
-- 
2.4.3
