date:20170807

** Changed in: qemu
   Status: Incomplete => Triaged

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1708551

Title:
  macOS Guest Reading USB 3.0 Bus as USB 2.0

Status in QEMU:
  Triaged

Bug description:
  Description:
  I'm having trouble with USB Passthrough. Running `system_profiler 
SPUSBDataType` on macOS guest confirms that the system "sees" my device, and 
that qemu is passing *something* through. However, the system sees my 
connection as USB 2.0, even though i'm passing through XHCI controllers 
(nec-usb-xhci/qemu-xhci). I suspect this is the reason why my guest is unable 
to recognize my iPhone in XCode & iTunes.

  Parameters:
  QEMU release version: 2.10.0-rc0
  Bios: [patched version of OVMF](https://github.com/gsomlo/edk2/tree/macboot)]
  Command Line: https://pastebin.com/pBSYbrW1
  Host: Arch Linux
  Guest: macOS v10.12.6
  Guest System Info: https://pastebin.com/U1Tihxei

  Notes:
  1. Observed sometime after late-May-early-June of this year.

  2. Due to [a defect in qemu v2.8 which affected GTK
  users](https://bugs.launchpad.net/qemu/+bug/1578192), and [a recent
  change to macOS' booting process conflicting with qemu
  v2.9](https://lists.nongnu.org/archive/html/qemu-
  devel/2017-03/msg06366.html), i'm forced to use qemu v2.10.0-rc0 (as
  -rc1 fails to load OVMF right now).

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1708551/+subscriptions

[Qemu-devel] [Bug 1709025] Re: Disk corrupted after snapshot deletion

Please try again with the latest release of QEMU first (currently 2.9,
or the latest RC of 2.10). QEMU 2.1.2 is very old, the bug might have
been fixed in the latest release already.

** Changed in: qemu
   Status: New => Incomplete

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1709025

Title:
  Disk corrupted after snapshot deletion

Status in QEMU:
  Incomplete

Bug description:
I found the vm disk corruption after snapshot deletion sometimes(the 
probability is very low, I'm afraid i can't reproduce it). And I found there is 
a patch for it as follow, but I'm not sure whether the patch repaired the bug. 
Drain disk before snapshot deletion can't guarantee anything, there is 
still pending IO in snapshot-deletion process. Anyone can help?

  authorZhang Haoyu    2014-10-21 16:38:01 
+0800
  committer Stefan Hajnoczi    2014-11-03 09:48:42 
+
  commit3432a1929ee18e08787ce35476abd74f2c93a17c (patch)
  tree  13a81c0a46707d91622f1593ccf7b926935371fd /block/snapshot.c
  parent573742a5431a99ceaba6968ae269cee247727cce (diff)
  snapshot: add bdrv_drain_all() to bdrv_snapshot_delete() to avoid concurrency 
problem
  If there are still pending i/o while deleting snapshot,
  because deleting snapshot is done in non-coroutine context, and
  the pending i/o read/write (bdrv_co_do_rw) is done in coroutine context,
  so it's possible to cause concurrency problem between above two operations.
  Add bdrv_drain_all() to bdrv_snapshot_delete() to avoid this problem.

  Signed-off-by: Zhang Haoyu 
  Reviewed-by: Paolo Bonzini 
  Message-id: 201410211637596311...@sangfor.com
  Signed-off-by: Stefan Hajnoczi 
  Diffstat (limited to 'block/snapshot.c')
  -rw-r--r--block/snapshot.c4   
  1 files changed, 4 insertions, 0 deletions
  diff --git a/block/snapshot.c b/block/snapshot.c
  index 85c52ff..698e1a1 100644
  --- a/block/snapshot.c
  +++ b/block/snapshot.c
  @@ -236,6 +236,10 @@ int bdrv_snapshot_delete(BlockDriverState *bs,
   error_setg(errp, "snapshot_id and name are both NULL");
   return -EINVAL;
   }
  +
  +/* drain all pending i/o before deleting snapshot */
  +bdrv_drain_all();
  +
   if (drv->bdrv_snapshot_delete) {
   return drv->bdrv_snapshot_delete(bs, snapshot_id, name, errp);
   }

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1709025/+subscriptions

Re: [Qemu-devel] [Bug 1708551] Re: macOS Guest Reading USB 3.0 Bus as USB 2.0

2017-08-07 Thread Divine E

In doing so, the device is now unseen by the system.

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1708551

Title:
  macOS Guest Reading USB 3.0 Bus as USB 2.0

Status in QEMU:
  Incomplete

Bug description:
  Description:
  I'm having trouble with USB Passthrough. Running `system_profiler 
SPUSBDataType` on macOS guest confirms that the system "sees" my device, and 
that qemu is passing *something* through. However, the system sees my 
connection as USB 2.0, even though i'm passing through XHCI controllers 
(nec-usb-xhci/qemu-xhci). I suspect this is the reason why my guest is unable 
to recognize my iPhone in XCode & iTunes.

  Parameters:
  QEMU release version: 2.10.0-rc0
  Bios: [patched version of OVMF](https://github.com/gsomlo/edk2/tree/macboot)]
  Command Line: https://pastebin.com/pBSYbrW1
  Host: Arch Linux
  Guest: macOS v10.12.6
  Guest System Info: https://pastebin.com/U1Tihxei

  Notes:
  1. Observed sometime after late-May-early-June of this year.

  2. Due to [a defect in qemu v2.8 which affected GTK
  users](https://bugs.launchpad.net/qemu/+bug/1578192), and [a recent
  change to macOS' booting process conflicting with qemu
  v2.9](https://lists.nongnu.org/archive/html/qemu-
  devel/2017-03/msg06366.html), i'm forced to use qemu v2.10.0-rc0 (as
  -rc1 fails to load OVMF right now).

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1708551/+subscriptions

Re: [Qemu-devel] [PATCH] tests/pxe: Check virtio-net-ccw on s390x

On 07.08.2017 22:35, Michael S. Tsirkin wrote:
> On Thu, Aug 03, 2017 at 03:30:19PM +0200, Thomas Huth wrote:
>> Now that we've got a firmware that can do TFTP booting on s390x (i.e.
>> the pc-bios/s390-netboot.img), we can enable the PXE tester for this
>> architecture, too.
>>
>> Signed-off-by: Thomas Huth 
>> ---
>>  Since this only adds a test, I guess this could still be included
>>  for 2.10 ... otherwise, I think it's also OK to simply postpone this
>>  to 2.11 instead
> 
> I think 2.11 is preferable.
> 
>>  tests/Makefile.include |  1 +
>>  tests/boot-sector.c| 25 ++---
>>  tests/pxe-test.c   |  7 +++
>>  3 files changed, 30 insertions(+), 3 deletions(-)
>>
>> diff --git a/tests/Makefile.include b/tests/Makefile.include
>> index 59e536b..f9af5e3 100644
>> --- a/tests/Makefile.include
>> +++ b/tests/Makefile.include
>> @@ -337,6 +337,7 @@ check-qtest-microblazeel-y = $(check-qtest-microblaze-y)
>>  check-qtest-xtensaeb-y = $(check-qtest-xtensa-y)
>>  
>>  check-qtest-s390x-y = tests/boot-serial-test$(EXESUF)
>> +check-qtest-s390x-y += tests/pxe-test$(EXESUF)
>>  
>>  check-qtest-generic-y += tests/qom-test$(EXESUF)
>>  check-qtest-generic-y += tests/test-hmp$(EXESUF)
>> diff --git a/tests/boot-sector.c b/tests/boot-sector.c
>> index e3880f4..87a8fb5 100644
>> --- a/tests/boot-sector.c
>> +++ b/tests/boot-sector.c
>> @@ -21,6 +21,7 @@
>>  #define SIGNATURE 0xdead
>>  #define SIGNATURE_OFFSET 0x10
>>  #define BOOT_SECTOR_ADDRESS 0x7c00
>> +#define SIGNATURE_ADDR (BOOT_SECTOR_ADDRESS + SIGNATURE_OFFSET)
>>  
>>  /* Boot sector code: write SIGNATURE into memory,
>>   * then halt.
>> @@ -73,6 +74,7 @@ int boot_sector_init(char *fname)
>>  {
>>  int fd, ret;
>>  size_t len = sizeof boot_sector;
>> +const char *arch = qtest_get_arch();
>>  
>>  fd = mkstemp(fname);
>>  if (fd < 0) {
>> @@ -81,10 +83,27 @@ int boot_sector_init(char *fname)
>>  }
>>  
>>  /* For Open Firmware based system, we can use a Forth script instead */
>> -if (strcmp(qtest_get_arch(), "ppc64") == 0) {
>> +if (g_str_equal(arch, "ppc64")) {
>>  len = sprintf((char *)boot_sector, "\\ Bootscript\n%x %x c! %x %x 
>> c!\n",
>> -LOW(SIGNATURE), BOOT_SECTOR_ADDRESS + SIGNATURE_OFFSET,
>> -HIGH(SIGNATURE), BOOT_SECTOR_ADDRESS + SIGNATURE_OFFSET + 
>> 1);
>> +LOW(SIGNATURE), SIGNATURE_ADDR,
>> +HIGH(SIGNATURE), SIGNATURE_ADDR + 1);
>> +} else if (g_str_equal(arch, "s390x")) {
>> +/* For s390x, fake a kernel signature */
>> +const uint8_t psw[] = {
>> +0x00, 0x08, 0x00, 0x00, 0x80, 0x01, 0x00, 0x00
>> +};
>> +const uint8_t code[] = {
>> +0xa7, 0xf4, 0x00, 0x0a, /* j 0x10010 */
>> +0x00, 0x00, 0x00, 0x00,
>> +'S', '3', '9', '0',
>> +'E', 'P', 0x00, 0x01,
>> +0xa7, 0x38, HIGH(SIGNATURE_ADDR), LOW(SIGNATURE_ADDR), /* lhi 
>> r3 */
>> +0xa7, 0x48, LOW(SIGNATURE), HIGH(SIGNATURE),/* lhi 
>> r4,0xadde */
>> +0x40, 0x40, 0x30, 0x00, /* sth r4,0(r3) */
>> +0xa7, 0xf4, 0xff, 0xfa  /* j 0x10010 */
>> +};
>> +memcpy(boot_sector, psw, 8);
>> +memcpy(_sector[0x1], code, sizeof(code));
> 
> 
> Could we avoid overwriting boot sector?
> We really should have
> x86_boot_sector
> ppc64_boot_sector
> s390_boot_sector
> 
> And write out the correct thing.

Ok, I don't mind either way ... I can try to come up with a patch.

> Also:
>  * Q35 machine requires a minimum 0x7e000 bytes disk.
>  * (bug or feature?)
> 
> probably does not apply to s390x, does it?

We can also just load 0x1 + sizeof(code) bytes here. I'll fix it.

Cornelia, please unqueue the patch, I'll send a v2...

 Thomas

Re: [Qemu-devel] [Qemu-block] [PATCH] block: document semanatics of bdrv_co_preadv|pwritev

On Fri, 08/04 16:49, Daniel P. Berrange wrote:
> This is odd.  In the bdrv_aligned_readv() it looks very much like
> we'll reference qiov->niov, if bytes != 0, so if qiov was NULL we
> would crash.

It doesn't make sense if read doesn't have an iov, where should the data be
placed? :)

> 
> In bdrv_aligned_writev(), qiov->niov is also refernced if bytes != 0,
> *unless*  flags contains BDRV_REQ_ZERO_WRITE, in which case we'll
> invoke bdrv_co_do_pwrite_zeroes() instead.

This is intended. Zero-write doesn't need qiov, hence the BDRV_REQ_ZERO_WRITE
branch. Otherwise, we can assert qiov != NULL.

> 
> So unless I'm missing something, bdrv_co_preadv|writev cannot be
> called with a NULL  qiov, and bdrv_aligned_writev|readv might
> need their assertions tightened up.

bdrv_co_pwritev _is_ called with a NULL qiov from blk_aio_pwrite_zeroes. Your
other reasonings are right.

So for write we cannot remove the bytes parameter.

Fam

[Qemu-devel] Howto: instructing patchew about patch dependency

Hi, QEMU patch submitters,

I've patched patchew so that if you add a line:

Based-on: $MESSAGE_ID

in your cover letter body, or in a reply to it (or reply to reply), patchew will
try to apply your series on top of the one $MESSAGE_ID refers to, assuming the
base series _was_ successfully applied by patchew earlier.

In the single patch case, you can put the line under the "---" mark in the
commit message (no need to let it enter git history).

For human readability, around the "Based-on:" line it may be better if the
subject is noted. E.g.:

Based-on: 20170804140942.19342-1-f...@redhat.com
([PATCH for-2.10] vmdk: Fix error handling/reporting of vmdk_check)

Please don't join them in one line because it will confuse the parser, after all
following the message id text, there aren't many spaces left.

Fam

Re: [Qemu-devel] [PATCH f0r 2.11] runstate/migrate: Two more transitions

2017-08-07 Thread Peter Xu

On Mon, Aug 07, 2017 at 01:25:29PM +0100, Dr. David Alan Gilbert wrote:
> * Peter Xu (pet...@redhat.com) wrote:
> > On Fri, Aug 04, 2017 at 06:50:11PM +0100, Dr. David Alan Gilbert (git) 
> > wrote:
> > > From: "Dr. David Alan Gilbert" 
> > > 
> > > There's a race if someone does a 'stop' near the end of migrate;
> > > the migration process goes through two runstates:
> > > 'finish migrate'
> > > 'postmigrate'
> > > 
> > > If the user issues a 'stop' between the two we end up with invalid
> > > state transitions.
> > > Add the transitions as valid.
> > > 
> > > Signed-off-by: Dr. David Alan Gilbert 
> > > ---
> > >  vl.c | 2 ++
> > >  1 file changed, 2 insertions(+)
> > > 
> > > diff --git a/vl.c b/vl.c
> > > index 99fcfa0442..bacb03f49d 100644
> > > --- a/vl.c
> > > +++ b/vl.c
> > > @@ -621,6 +621,7 @@ static const RunStateTransition 
> > > runstate_transitions_def[] = {
> > >  
> > >  { RUN_STATE_PAUSED, RUN_STATE_RUNNING },
> > >  { RUN_STATE_PAUSED, RUN_STATE_FINISH_MIGRATE },
> > > +{ RUN_STATE_PAUSED, RUN_STATE_POSTMIGRATE },
> > >  { RUN_STATE_PAUSED, RUN_STATE_PRELAUNCH },
> > >  { RUN_STATE_PAUSED, RUN_STATE_COLO},
> > >  
> > > @@ -633,6 +634,7 @@ static const RunStateTransition 
> > > runstate_transitions_def[] = {
> > >  { RUN_STATE_PRELAUNCH, RUN_STATE_INMIGRATE },
> > >  
> > >  { RUN_STATE_FINISH_MIGRATE, RUN_STATE_RUNNING },
> > > +{ RUN_STATE_FINISH_MIGRATE, RUN_STATE_PAUSED },
> > >  { RUN_STATE_FINISH_MIGRATE, RUN_STATE_POSTMIGRATE },
> > >  { RUN_STATE_FINISH_MIGRATE, RUN_STATE_PRELAUNCH },
> > >  { RUN_STATE_FINISH_MIGRATE, RUN_STATE_COLO},
> > 
> > Do we need this as well?
> > 
> > { RUN_STATE_POSTMIGRATE, RUN_STATE_PAUSED },
> 
> Apparently not:
> 
> (qemu) migrate "exec:cat > /dev/null"
> (qemu) infomigrate
> unknown command: 'infomigrate'
> (qemu) info status
> VM status: paused (postmigrate)
> (qemu) stop
> (qemu) info status
> VM status: paused (postmigrate)
> (qemu)
> 
> 
> so doing a stop at that point doesn't seem to cause any problem.

Yes. Thanks.

Reviewed-by: Peter Xu 

-- 
Peter Xu

Re: [Qemu-devel] [RFC PATCH 29/56] block: Make BlockDirtyInfo byte count unsigned in QAPI/QMP



On 08/07/2017 10:45 AM, Markus Armbruster wrote:
> Byte counts should use QAPI type 'size' (uint64_t).  BlockDirtyInfo
> member @count is 'int' (int64_t).  bdrv_query_dirty_bitmaps() computes
> @count from bdrv_get_dirty_count() in uint64_t, then implicitly
> converts to int64_t.  Before the commit before previous, the
> conversion was in bdrv_get_dirty_count() instead.
> 
> Change member @count to 'size'.
> 
> query-block now reports @count values above 2^63-1 correctly instead
> of their (negative) two's complement.
> 
> Signed-off-by: Markus Armbruster 

Assuming there's no "gotcha" here introduced by changing the QAPI, then
ACK; but you're the expert there, so I trust you!

Reviewed-by: John Snow

Re: [Qemu-devel] [RFC PATCH 28/56] block: Widen dirty bitmap granularity to uint64_t for safety



On 08/07/2017 10:45 AM, Markus Armbruster wrote:
> Block dirty bitmaps represent granularity in bytes as uint32_t.  It
> must be a power of two and a multiple of BDRV_SECTOR_SIZE.
> 
> The trouble with uint32_t is computations like this one in
> mirror_do_read():
> 
> uint64_t max_bytes;
> 
> max_bytes = s->granularity * s->max_iov;
> 
> The operands of * are uint32_t and int, so the product is computed in
> uint32_t (assuming 32 bit int), then zero-extended to uint64_t.
> 
> Since granularity is generally combined with 64 bit file offsets, it's
> best to make it 64 bits, too.  Less opportunity to screw up.
> 
> Signed-off-by: Markus Armbruster 

[bweeooop]

> --- a/block/dirty-bitmap.c
> +++ b/block/dirty-bitmap.c

[bwp]

> @@ -506,16 +506,11 @@ uint32_t 
> bdrv_get_default_bitmap_granularity(BlockDriverState *bs)
>  return granularity;
>  }
>  
> -uint32_t bdrv_dirty_bitmap_granularity(const BdrvDirtyBitmap *bitmap)
> +uint64_t bdrv_dirty_bitmap_granularity(const BdrvDirtyBitmap *bitmap)
>  {
>  return BDRV_SECTOR_SIZE << hbitmap_granularity(bitmap->bitmap);
>  }
>  
> -uint32_t bdrv_dirty_bitmap_meta_granularity(BdrvDirtyBitmap *bitmap)
> -{
> -return BDRV_SECTOR_SIZE << hbitmap_granularity(bitmap->meta);
> -}

Why? Unused? Not cool enough to mention?

Re: [Qemu-devel] [RFC PATCH 27/56] block/dirty-bitmap: Clean up signed vs. unsigned dirty counts



On 08/07/2017 10:45 AM, Markus Armbruster wrote:
> hbitmap_count() returns uint64_t.
> 
> Clean up test-hbitmap.c to check its value with g_assert_cmpuint()
> instead of g_assert_cmpint().
> 
> bdrv_get_dirty_count() and bdrv_get_meta_dirty_count() return its
> value converted to int64_t.  Clean them up to return it unadulterated.
> 
> This moves the implicit conversion to some callers, so clean them up,
> too.
> 
> mirror_run() assigns the value of bdrv_get_meta_dirty_count() to a
> local int64_t variable.  Change it to uint64_t.  Signedness still gets
> mixed up in the computation of s->common.len, but messing with that is
> more than I can handle right now.
> 
> get_remaining_dirty() tallies bdrv_get_dirty_count() values in
> int64_t.  Its caller block_save_pending() converts it back to
> uint64_t.  Change get_remaining_dirty() to uint64_t.
> 
> Signed-off-by: Markus Armbruster 

So these functions can't fail, so there's no reason to keep the int64_t
type around, OK.

Reviewed-by: John Snow

Re: [Qemu-devel] About virtio device hotplug in Q35! 【外域邮件.谨慎查阅】

2017-08-07 Thread Bob Chen

1. How to test the KVM exit rate?

2. The switches are separate devices of PLX Technology

# lspci -s 07:08.0 -nn
07:08.0 PCI bridge [0604]: PLX Technology, Inc. PEX 8747 48-Lane, 5-Port
PCI Express Gen 3 (8.0 GT/s) Switch [10b5:8747] (rev ca)

# This is one of the Root Ports in the system.
[:00]-+-00.0  Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D
DMI2
 +-01.0-[01]00.0  LSI Logic / Symbios Logic MegaRAID SAS
2208 [Thunderbolt]
 +-02.0-[02-05]--
 +-03.0-[06-09]00.0-[07-09]--+-08.0-[08]--+-00.0  NVIDIA
Corporation GP102 [TITAN Xp]
 |   |\-00.1  NVIDIA
Corporation GP102 HDMI Audio Controller
 |   \-10.0-[09]--+-00.0  NVIDIA
Corporation GP102 [TITAN Xp]
 |\-00.1  NVIDIA
Corporation GP102 HDMI Audio Controller




3. ACS

It seemed that I had misunderstood your point? I finally found ACS
information on switches, not on GPUs.

Capabilities: [f24 v1] Access Control Services
ACSCap: SrcValid+ TransBlk+ ReqRedir+ CmpltRedir+ UpstreamFwd+ EgressCtrl+
DirectTrans+
ACSCtl: SrcValid+ TransBlk- ReqRedir+ CmpltRedir+ UpstreamFwd+ EgressCtrl-
DirectTrans-



2017-08-07 23:52 GMT+08:00 Alex Williamson :

> On Mon, 7 Aug 2017 21:00:04 +0800
> Bob Chen  wrote:
>
> > Bad news... The performance had dropped dramatically when using emulated
> > switches.
> >
> > I was referring to the PCIe doc at
> > https://github.com/qemu/qemu/blob/master/docs/pcie.txt
> >
> > # qemu-system-x86_64_2.6.2 -enable-kvm -cpu host,kvm=off -machine
> > q35,accel=kvm -nodefaults -nodefconfig \
> > -device ioh3420,id=root_port1,chassis=1,slot=1,bus=pcie.0 \
> > -device x3130-upstream,id=upstream_port1,bus=root_port1 \
> > -device
> > xio3130-downstream,id=downstream_port1,bus=upstream_
> port1,chassis=11,slot=11
> > \
> > -device
> > xio3130-downstream,id=downstream_port2,bus=upstream_
> port1,chassis=12,slot=12
> > \
> > -device vfio-pci,host=08:00.0,multifunction=on,bus=downstream_port1 \
> > -device vfio-pci,host=09:00.0,multifunction=on,bus=downstream_port2 \
> > -device ioh3420,id=root_port2,chassis=2,slot=2,bus=pcie.0 \
> > -device x3130-upstream,id=upstream_port2,bus=root_port2 \
> > -device
> > xio3130-downstream,id=downstream_port3,bus=upstream_
> port2,chassis=21,slot=21
> > \
> > -device
> > xio3130-downstream,id=downstream_port4,bus=upstream_
> port2,chassis=22,slot=22
> > \
> > -device vfio-pci,host=89:00.0,multifunction=on,bus=downstream_port3 \
> > -device vfio-pci,host=8a:00.0,multifunction=on,bus=downstream_port4 \
> > ...
> >
> >
> > Not 8 GPUs this time, only 4.
> >
> > *1. Attached to pcie bus directly (former situation):*
> >
> > Unidirectional P2P=Disabled Bandwidth Matrix (GB/s)
> >D\D 0  1  2  3
> >  0 420.93  10.03  11.07  11.09
> >  1  10.04 425.05  11.08  10.97
> >  2  11.17  11.17 425.07  10.07
> >  3  11.25  11.25  10.07 423.64
> > Unidirectional P2P=Enabled Bandwidth Matrix (GB/s)
> >D\D 0  1  2  3
> >  0 425.98  10.03  11.07  11.09
> >  1   9.99 426.43  11.07  11.07
> >  2  11.04  11.20 425.98   9.89
> >  3  11.21  11.21  10.06 425.97
> > Bidirectional P2P=Disabled Bandwidth Matrix (GB/s)
> >D\D 0  1  2  3
> >  0 430.67  10.45  19.59  19.58
> >  1  10.44 428.81  19.49  19.53
> >  2  19.62  19.62 429.52  10.57
> >  3  19.60  19.66  10.43 427.38
> > Bidirectional P2P=Enabled Bandwidth Matrix (GB/s)
> >D\D 0  1  2  3
> >  0 429.47  10.47  19.52  19.39
> >  1  10.48 427.15  19.64  19.52
> >  2  19.64  19.59 429.02  10.42
> >  3  19.60  19.64  10.47 427.81
> > P2P=Disabled Latency Matrix (us)
> >D\D 0  1  2  3
> >  0   4.50  13.72  14.49  14.44
> >  1  13.65   4.53  14.52  14.33
> >  2  14.22  13.82   4.52  14.50
> >  3  13.87  13.75  14.53   4.55
> > P2P=Enabled Latency Matrix (us)
> >D\D 0  1  2  3
> >  0   4.44  13.56  14.58  14.45
> >  1  13.56   4.48  14.39  14.45
> >  2  13.85  13.93   4.86  14.80
> >  3  14.51  14.23  14.70   4.72
> >
> >
> > *2. Attached to emulated Root Port and Switches:*
> >
> > Unidirectional P2P=Disabled Bandwidth Matrix (GB/s)
> >D\D 0  1  2  3
> >  0 420.48   3.15   3.12   3.12
> >  1   3.13 422.31   3.12   3.12
> >  2   3.08   3.09 421.40   3.13
> >  3   3.10   3.10   3.13 418.68
> > Unidirectional P2P=Enabled Bandwidth Matrix (GB/s)
> >D\D 0  1  2  3
> >  0 418.68   3.14   3.12   3.12
> >  1   3.15 420.03   3.12   3.12
> >  2   3.11   3.10 421.39   3.14
> >  3   3.11   3.08   3.13 419.13
> > Bidirectional P2P=Disabled Bandwidth Matrix (GB/s)
> >D\D 0  1  2  3
> >  0 424.36   5.36   5.35   5.34
> >  1   5.36 424.36   5.34

Re: [Qemu-devel] [PATCH for 2.10 v3 4/6] docker: docker.py make --no-cache skip checksum test

On Mon, 08/07 16:39, Alex Bennée wrote:
> If you invoke with NOCACHE=1 we pass --no-cache in the argv to
> docker.py but may still not force a rebuild if the dockerfile checksum
> hasn't changed. By testing for its presence we can force builds
> without having to manually remove the docker image.
> 
> Signed-off-by: Alex Bennée 
> ---
>  tests/docker/docker.py | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/tests/docker/docker.py b/tests/docker/docker.py
> index ee40ca04d9..aab1648cc5 100755
> --- a/tests/docker/docker.py
> +++ b/tests/docker/docker.py
> @@ -261,7 +261,8 @@ class BuildCommand(SubCommand):
>  tag = args.tag
>  
>  dkr = Docker()
> -if dkr.image_matches_dockerfile(tag, dockerfile):
> +if "--no-cache" not in argv and \
> +   dkr.image_matches_dockerfile(tag, dockerfile):
>  if not args.quiet:
>  print "Image is up to date."
>  else:
> -- 
> 2.13.0
> 
> 

NACK.

Please add "--no-cache" to BuildCommand.args, and use args.no_cache.

diff --git a/tests/docker/docker.py b/tests/docker/docker.py
index ee40ca04d9..530bc62d40 100755
--- a/tests/docker/docker.py
+++ b/tests/docker/docker.py
@@ -251,6 +251,8 @@ class BuildCommand(SubCommand):
 parser.add_argument("--add-current-user", "-u", dest="user",
 action="store_true",
 help="Add the current user to image's passwd")
+parser.add_argument("--no-cache", action="store_true",
+help="Disable docker cache and checksum")
 parser.add_argument("tag",
 help="Image Tag")
 parser.add_argument("dockerfile",

Re: [Qemu-devel] [PATCH for 2.10 v3 1/6] docker: ensure NOUSER for travis images

On Mon, 08/07 16:39, Alex Bennée wrote:
> While adding the current user is a useful default behaviour for
> creating new images it is not appropriate for Travis which already has
> a default user.

Target docker-travis@travis will not have NOUSER=1 automatically, right? Is that
a problem?

Fam

> 
> Signed-off-by: Alex Bennée 
> Reviewed-by: Philippe Mathieu-Daudé 
> Tested-by: Philippe Mathieu-Daudé 
> ---
>  tests/docker/Makefile.include | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/tests/docker/Makefile.include b/tests/docker/Makefile.include
> index aaab1a4208..d7dafdbd27 100644
> --- a/tests/docker/Makefile.include
> +++ b/tests/docker/Makefile.include
> @@ -71,6 +71,7 @@ docker-image-debian-ppc64el-cross: docker-image-debian9
>  docker-image-debian-s390x-cross: docker-image-debian9
>  docker-image-debian-win32-cross: docker-image-debian8-mxe
>  docker-image-debian-win64-cross: docker-image-debian8-mxe
> +docker-image-travis: NOUSER=1
>  
>  # Expand all the pre-requistes for each docker image and test combination
>  $(foreach i,$(DOCKER_IMAGES), \
> -- 
> 2.13.0
> 
>

[Qemu-devel] [Bug 1709025] Re: Disk corrupted after snapshot deletion

2017-08-07 Thread junchi

My qemu version is 2.1.2

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1709025

Title:
  Disk corrupted after snapshot deletion

Status in QEMU:
  New

Bug description:
I found the vm disk corruption after snapshot deletion sometimes(the 
probability is very low, I'm afraid i can't reproduce it). And I found there is 
a patch for it as follow, but I'm not sure whether the patch repaired the bug. 
Drain disk before snapshot deletion can't guarantee anything, there is 
still pending IO in snapshot-deletion process. Anyone can help?

  authorZhang Haoyu    2014-10-21 16:38:01 
+0800
  committer Stefan Hajnoczi    2014-11-03 09:48:42 
+
  commit3432a1929ee18e08787ce35476abd74f2c93a17c (patch)
  tree  13a81c0a46707d91622f1593ccf7b926935371fd /block/snapshot.c
  parent573742a5431a99ceaba6968ae269cee247727cce (diff)
  snapshot: add bdrv_drain_all() to bdrv_snapshot_delete() to avoid concurrency 
problem
  If there are still pending i/o while deleting snapshot,
  because deleting snapshot is done in non-coroutine context, and
  the pending i/o read/write (bdrv_co_do_rw) is done in coroutine context,
  so it's possible to cause concurrency problem between above two operations.
  Add bdrv_drain_all() to bdrv_snapshot_delete() to avoid this problem.

  Signed-off-by: Zhang Haoyu 
  Reviewed-by: Paolo Bonzini 
  Message-id: 201410211637596311...@sangfor.com
  Signed-off-by: Stefan Hajnoczi 
  Diffstat (limited to 'block/snapshot.c')
  -rw-r--r--block/snapshot.c4   
  1 files changed, 4 insertions, 0 deletions
  diff --git a/block/snapshot.c b/block/snapshot.c
  index 85c52ff..698e1a1 100644
  --- a/block/snapshot.c
  +++ b/block/snapshot.c
  @@ -236,6 +236,10 @@ int bdrv_snapshot_delete(BlockDriverState *bs,
   error_setg(errp, "snapshot_id and name are both NULL");
   return -EINVAL;
   }
  +
  +/* drain all pending i/o before deleting snapshot */
  +bdrv_drain_all();
  +
   if (drv->bdrv_snapshot_delete) {
   return drv->bdrv_snapshot_delete(bs, snapshot_id, name, errp);
   }

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1709025/+subscriptions

[Qemu-devel] [PATCH] target/i386: set rip_offset for some SSE4.1 instructions

2017-08-07 Thread Joseph Myers

When emulating various SSE4.1 instructions such as pinsrd, the address
of a memory operand is computed without allowing for the 8-bit
immediate operand located after the memory operand, meaning that the
memory operand uses the wrong address in the case where it is
rip-relative.  This patch adds the required rip_offset setting for
those instructions, so fixing some GCC test failures (13 in the gcc
testsuite in my GCC 6-based testing) when testing with a default CPU
setting enabling those instructions.

Signed-off-by: Joseph Myers 

---

diff --git a/target/i386/translate.c b/target/i386/translate.c
index cab9e32..5fdadf9 100644
--- a/target/i386/translate.c
+++ b/target/i386/translate.c
@@ -4080,6 +4080,7 @@ static void gen_sse(CPUX86State *env, DisasContext *s, 
int b,
 if (sse_fn_eppi == SSE_SPECIAL) {
 ot = mo_64_32(s->dflag);
 rm = (modrm & 7) | REX_B(s);
+s->rip_offset = 1;
 if (mod != 3)
 gen_lea_modrm(env, s, modrm);
 reg = ((modrm >> 3) & 7) | rex_r;

-- 
Joseph S. Myers
jos...@codesourcery.com

Re: [Qemu-devel] [Qemu-block] [PATCH for-2.10] block/nfs: fix mutex assertion in nfs_file_close()



On 08/07/2017 06:29 PM, Jeff Cody wrote:
> Commit c096358e747e88fc7364e40e3c354ee0bb683960 introduced assertion
> checks for when qemu_mutex() functions are called without the
> corresponding qemu_mutex_init() having initialized the mutex.
> 
> This uncovered a latent bug in qemu's nfs driver - in
> nfs_client_close(), the NFSClient structure is overwritten with zeros,
> prior to the mutex being destroyed.
> 
> Go ahead and destroy the mutex in nfs_client_close(), and change where
> we call qemu_mutex_init() so that it is correctly balanced.
> 
> There are also a couple of memory leaks obscured by the memset, so this
> fixes those as well.
> 
> Finally, we should be able to get rid of the memset(), as it isn't
> necessary.
> 
> Signed-off-by: Jeff Cody 


Looks sane to me.

Reviewed-by: John Snow

[Qemu-devel] [PATCH 18/22] translate-all: protect TB jumps with a per-destination-TB lock

This applies to both user-mode and !user-mode emulation.

Instead of relying on a global lock, protect the list of incoming
jumps with tb->jmp_lock.

In order to find the destination TB (and therefore its jmp_lock)
from the origin TB, we introduce tb->jmp_dest[]. This results
in TranslationBlock using 3 cache lines on many systems instead
of 2, but benchmarking doesn't seem to suffer much because of it.

I considered not using a linked list of jumps, which simplifies
code and keeps the struct under two cache lines. However, it
unnecessarily increases memory usage, which results in a performance
decrease. See for instance these numbers booting+shutting down
debian-arm:
  Time (s)  Rel. err (%)  Abs. err (s)  Rel. slowdown (%)
--
 before  20.88  0.74  0.154512 0.
 after   20.81  0.38  0.079078-0.33524904
 GTree   21.02  0.28  0.058856 0.67049808
 GHashTable + xxhash 21.63  1.08  0.233604  3.5919540

Using a hash table or a binary tree to keep track of the jumps
doesn't really pay off, not only due to the increased memory usage,
but also because most TBs have only 0 or 1 jumps to them. The maximum
number of jumps when booting debian-arm that I measured is 35, but
as we can see in the histogram below a TB with that many incoming jumps
is extremely rare; the average TB has 0.80 incoming jumps.

n_jumps: 379208; avg jumps/tb: 0.801099
dist: [0.0,1.0)|▄█▁▁▁ ▁▁ ▁▁▁  ▁▁▁ ▁|[34.0,35.0]

Signed-off-by: Emilio G. Cota 
---
 docs/devel/multi-thread-tcg.txt |   6 ++-
 include/exec/exec-all.h |  76 ++-
 accel/tcg/cpu-exec.c|   4 +-
 accel/tcg/translate-all.c   | 110 +---
 4 files changed, 126 insertions(+), 70 deletions(-)

diff --git a/docs/devel/multi-thread-tcg.txt b/docs/devel/multi-thread-tcg.txt
index faf8918..36da1f1 100644
--- a/docs/devel/multi-thread-tcg.txt
+++ b/docs/devel/multi-thread-tcg.txt
@@ -131,8 +131,10 @@ DESIGN REQUIREMENT: Safely handle invalidation of TBs
 
 The direct jump themselves are updated atomically by the TCG
 tb_set_jmp_target() code. Modification to the linked lists that allow
-searching for linked pages are done under the protect of the
-tb_lock().
+searching for linked pages are done under the protection of tb->jmp_lock,
+where tb is the destination block of a jump. Each origin block keeps a
+pointer to its destinations so that the appropriate lock can be acquired before
+iterating over a jump list.
 
 The global page table is a lockless radix tree; cmpxchg is used
 to atomically insert new elements.
diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index 495efdb..64753a0 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -367,7 +367,7 @@ struct TranslationBlock {
 #define CF_NOCACHE 0x1 /* To be freed after execution */
 #define CF_USE_ICOUNT  0x2
 #define CF_IGNORE_ICOUNT 0x4 /* Do not generate icount code */
-#define CF_INVALID 0x8 /* TB is stale. Setters must acquire tb_lock */
+#define CF_INVALID 0x8 /* TB is stale. Protected by tb->jmp_lock */
 #define CF_PARALLEL0x10 /* Generate code for a parallel context */
 /* cflags' mask for hashing/comparison */
 #define CF_HASH_MASK (CF_COUNT_MASK | CF_PARALLEL)
@@ -399,20 +399,27 @@ struct TranslationBlock {
 #else
 uintptr_t jmp_target_addr[2]; /* target address for indirect jump */
 #endif
-/* Each TB has an associated circular list of TBs jumping to this one.
- * jmp_list_first points to the first TB jumping to this one.
- * jmp_list_next is used to point to the next TB in a list.
- * Since each TB can have two jumps, it can participate in two lists.
- * jmp_list_first and jmp_list_next are 4-byte aligned pointers to a
- * TranslationBlock structure, but the two least significant bits of
- * them are used to encode which data field of the pointed TB should
- * be used to traverse the list further from that TB:
- * 0 => jmp_list_next[0], 1 => jmp_list_next[1], 2 => jmp_list_first.
- * In other words, 0/1 tells which jump is used in the pointed TB,
- * and 2 means that this is a pointer back to the target TB of this list.
+/*
+ * Each TB has a NULL-terminated list (jmp_list_head) of incoming jumps.
+ * Each TB can have two outgoing jumps, and therefore can participate
+ * in two lists. The list entries are kept in jmp_list_next[2]. The least
+ * significant bit (LSB) of the pointers in these lists is used to encode
+ * which of the two list entries is to be used in the pointed TB.
+ *
+ * List traversals are protected by jmp_lock. The destination TB of each
+ * outgoing jump is kept in jmp_dest[] so that the appropriate jmp_lock
+ *

Re: [Qemu-devel] [Qemu-block] [PATCH v3 for-2.10 0/4] improved --version/--help tweaks

On 08/03/2017 12:33 PM, Eric Blake wrote:
> Not sure if this should go through Kevin's block tree, Paolo's
> miscellaneous patches, or if I should just do a pull request
> myself (since patch 4 includes a change to qemu-nbd)
> 
> since v2: add R-b on 2-4; also sort 'qemu-img --help' create text
> 
> 001/4:[0012] [FC] 'qemu-img: Sort sub-command names in --help'
> 002/4:[] [--] 'qemu-io: Give more --version information'
> 003/4:[] [--] 'qga: Give more --version information'
> 004/4:[] [--] 'maint: Include bug-reporting info in --help output'
> 
> Eric Blake (4):
>   qemu-img: Sort sub-command names in --help
>   qemu-io: Give more --version information
>   qga: Give more --version information
>   maint: Include bug-reporting info in --help output
> 
>  include/qemu-common.h |  5 +
>  vl.c  |  4 +++-
>  bsd-user/main.c   |  2 ++
>  linux-user/main.c |  4 +++-
>  qemu-img.c|  2 +-
>  qemu-io.c |  9 ++---
>  qemu-nbd.c|  2 +-
>  qga/main.c|  8 +---
>  qemu-img-cmds.hx  | 21 -
>  9 files changed, 38 insertions(+), 19 deletions(-)
> 

Nothing to keep the commands from going out of order again, it looks
like -- can this be added as a comment or otherwise scripted as a
./hey_it_looks_like_you_are_about_to_release_qemu.sh script that makes
sure we've dotted the 'i's and crossed the 't's?

Lastly, Didn't we fix the certificate issue? (I thought Jeff had) -- and
even if not, it's still the correct address to send people to IMO. Let
people report to us if the SSL certificate appears to be broken.

Eh, regardless of shed coloring:

Reviewed-by: John Snow

[Qemu-devel] [PATCH 22/22] tcg: remove tb_lock

Use mmap_lock in user-mode to protect TCG state and the page
descriptors.
In !user-mode, each vCPU has its own TCG state, so no locks
needed. Per-page locks are used to protect the page descriptors.

Per-TB locks are used in both modes to protect TB jumps.

Some notes:

- tcg_tb_lookup/remove/insert/etc have their own internal lock(s),
  so there is no need to further serialize access to them.

- do_tb_flush is run in a safe async context, meaning no other
  vCPU threads are running. Therefore acquiring mmap_lock there
  is just to please tools such as thread sanitizer.

- Not visible in the diff, but tb_invalidate_phys_page already
  has an assert_memory_lock.

- cpu_io_recompile is !user-only, so no mmap_lock there.

- Added mmap_unlock()'s before all siglongjmp's that could
  be called in user-mode while mmap_lock is held.
  + Added an assert for !have_mmap_lock() after returning from
the longjmp in cpu_exec, just like we do in cpu_exec_step_atomic.

Performance numbers for the entire series (i.e. since "tcg: enable
multiple TCG contexts in softmmu"):

Host: Intel(R) Xeon(R) CPU E5-2690 0 @ 2.90GHz, 16 2-way cores

- Single-threaded, debian-arm bootup+shutdown (5 runs):
  * Before:
20.617793812 seconds time elapsed  ( +-  0.73% )
  * After:
20.713322693 seconds time elapsed  ( +-  0.93% )

- "-smp 8", debian-arm bootup+shutdown (5 runs):
  * Before:
16.900252299 seconds time elapsed  ( +-  1.30% )
  * After:
12.777150639 seconds time elapsed  ( +-  1.11% )

- "-smp 16", bootup+shutdown of ubuntu-17.10-ppc64el (5 runs):
  [ NB. The actual times are 27s longer, but I'm subtracting that
since that's the time it takes on this host to start loading
the guest kernel (there's grub, etc) ]
  * Before:
115.382853773 seconds time elapsed ( +-  1.24% )
  * After:
76.361623367 seconds time elapsed ( +-  0.49% )

Signed-off-by: Emilio G. Cota 
---
 docs/devel/multi-thread-tcg.txt |  11 ++--
 include/exec/cpu-common.h   |   1 -
 include/exec/exec-all.h |   4 --
 include/exec/tb-context.h   |   2 -
 tcg/tcg.h   |   4 +-
 accel/tcg/cpu-exec.c|  34 +++---
 accel/tcg/translate-all.c   | 135 
 exec.c  |  14 ++---
 linux-user/main.c   |   3 -
 9 files changed, 62 insertions(+), 146 deletions(-)

diff --git a/docs/devel/multi-thread-tcg.txt b/docs/devel/multi-thread-tcg.txt
index 36da1f1..e1e002b 100644
--- a/docs/devel/multi-thread-tcg.txt
+++ b/docs/devel/multi-thread-tcg.txt
@@ -61,6 +61,7 @@ have their block-to-block jumps patched.
 Global TCG State
 
 
+### User-mode emulation
 We need to protect the entire code generation cycle including any post
 generation patching of the translated code. This also implies a shared
 translation buffer which contains code running on all cores. Any
@@ -75,9 +76,11 @@ patching.
 
 (Current solution)
 
-Mainly as part of the linux-user work all code generation is
-serialised with a tb_lock(). For the SoftMMU tb_lock() also takes the
-place of mmap_lock() in linux-user.
+Code generation is serialised with mmap_lock().
+
+### !User-mode emulation
+Each vCPU has its own TCG context and associated TCG region, thereby
+requiring no locking.
 
 Translation Blocks
 --
@@ -192,7 +195,7 @@ work as "safe work" and exiting the cpu run loop. This 
ensure by the
 time execution restarts all flush operations have completed.
 
 TLB flag updates are all done atomically and are also protected by the
-tb_lock() which is used by the functions that update the TLB in bulk.
+corresponding page lock.
 
 (Known limitation)
 
diff --git a/include/exec/cpu-common.h b/include/exec/cpu-common.h
index 74341b1..6f64df0 100644
--- a/include/exec/cpu-common.h
+++ b/include/exec/cpu-common.h
@@ -23,7 +23,6 @@ typedef struct CPUListState {
 FILE *file;
 } CPUListState;
 
-/* The CPU list lock nests outside tb_lock/tb_unlock.  */
 void qemu_init_cpu_list(void);
 void cpu_list_lock(void);
 void cpu_list_unlock(void);
diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index 64753a0..bc71014 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -557,10 +557,6 @@ extern uintptr_t tci_tb_ptr;
smaller than 4 bytes, so we don't worry about special-casing this.  */
 #define GETPC_ADJ   2
 
-void tb_lock(void);
-void tb_unlock(void);
-void tb_lock_reset(void);
-
 #if !defined(CONFIG_USER_ONLY)
 
 struct MemoryRegion *iotlb_to_region(CPUState *cpu,
diff --git a/include/exec/tb-context.h b/include/exec/tb-context.h
index 8c9b49c..feb585e 100644
--- a/include/exec/tb-context.h
+++ b/include/exec/tb-context.h
@@ -32,8 +32,6 @@ typedef struct TBContext TBContext;
 struct TBContext {
 
 struct qht htable;
-/* any access to the tbs or the page table must use this lock */
-QemuMutex tb_lock;
 
 /* statistics */
 unsigned tb_flush_count;
diff --git a/tcg/tcg.h

[Qemu-devel] [PATCH 07/22] tcg: track TBs with per-region BST's

This paves the way for enabling scalable parallel generation of TCG code.

Instead of tracking TBs with a single binary search tree (BST), use a
BST for each TCG region, protecting it with a lock. This is as scalable
as it gets, since each TCG thread operates on a separate region.

The core of this change is the introduction of struct tcg_region_tree,
which contains a pointer to a GTree and an associated lock to serialize
accesses to it. We then allocate an array of tcg_region_tree's, adding
the appropriate padding to avoid false sharing based on
qemu_dcache_linesize.

Given a tc_ptr, we first find the corresponding region_tree. This
is done by special-casing the first and last regions first, since they
might be of size != region.size; otherwise we just divide the offset
by region.stride. I was worried about this division (several dozen
cycles of latency), but profiling shows that this is not a fast path.
Note that region.stride is not required to be a power of two; it
is only required to be a multiple of the host's page size.

Note that with this design we can also provide consistent snapshots
about all region trees at once; for instance, tcg_tb_foreach
acquires/releases all region_tree locks before/after iterating over them.

As an alternative I considered implementing a concurrent BST, but this
can be tricky to get right, offers no consistent snapshots of the BST,
and performance and scalability-wise I don't think it could ever beat
having separate GTrees, given that our workload is insert-mostly (all
concurrent BST designs I've seen focus, understandably, on making
lookups fast, which comes at the expense of convoluted, non-wait-free
insertions/removals).

Signed-off-by: Emilio G. Cota 
---
 include/exec/exec-all.h   |   1 -
 include/exec/tb-context.h |   1 -
 tcg/tcg.h |   6 ++
 accel/tcg/cpu-exec.c  |   2 +-
 accel/tcg/translate-all.c |  97 ---
 tcg/tcg.c | 191 ++
 6 files changed, 213 insertions(+), 85 deletions(-)

diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index 9abc745..32e97e1 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -428,7 +428,6 @@ static inline uint32_t curr_cflags(void)
 return parallel_cpus ? CF_PARALLEL : 0;
 }
 
-void tb_remove(TranslationBlock *tb);
 void tb_flush(CPUState *cpu);
 void tb_phys_invalidate(TranslationBlock *tb, tb_page_addr_t page_addr);
 TranslationBlock *tb_htable_lookup(CPUState *cpu, target_ulong pc,
diff --git a/include/exec/tb-context.h b/include/exec/tb-context.h
index 1d41202..d8472c8 100644
--- a/include/exec/tb-context.h
+++ b/include/exec/tb-context.h
@@ -31,7 +31,6 @@ typedef struct TBContext TBContext;
 
 struct TBContext {
 
-GTree *tb_tree;
 struct qht htable;
 /* any access to the tbs or the page table must use this lock */
 QemuMutex tb_lock;
diff --git a/tcg/tcg.h b/tcg/tcg.h
index aaf447e..96f3cd8 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -766,6 +766,12 @@ void tcg_region_reset_all(void);
 size_t tcg_code_size(void);
 size_t tcg_code_capacity(void);
 
+void tcg_tb_insert(TranslationBlock *tb);
+void tcg_tb_remove(TranslationBlock *tb);
+TranslationBlock *tcg_tb_lookup(uintptr_t tc_ptr);
+void tcg_tb_foreach(GTraverseFunc func, gpointer user_data);
+size_t tcg_nb_tbs(void);
+
 /* user-mode: Called with tb_lock held.  */
 static inline void *tcg_malloc(int size)
 {
diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
index d2b9411..e8c 100644
--- a/accel/tcg/cpu-exec.c
+++ b/accel/tcg/cpu-exec.c
@@ -218,7 +218,7 @@ static void cpu_exec_nocache(CPUState *cpu, int max_cycles,
 
 tb_lock();
 tb_phys_invalidate(tb, -1);
-tb_remove(tb);
+tcg_tb_remove(tb);
 tb_unlock();
 }
 #endif
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 4ed402e..bfa3547 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -206,8 +206,6 @@ void tb_lock_reset(void)
 }
 }
 
-static TranslationBlock *tb_find_pc(uintptr_t tc_ptr);
-
 void cpu_gen_init(void)
 {
 tcg_context_init(_init_ctx);
@@ -364,7 +362,7 @@ bool cpu_restore_state(CPUState *cpu, uintptr_t retaddr)
  * != 0 before attempting to restore state. We return early to
  * avoid blowing up on a recursive tb_lock(). The target must have
  * previously survived a failed cpu_restore_state because
- * tb_find_pc(0) would have failed anyway. It still should be
+ * tcg_tb_lookup(0) would have failed anyway. It still should be
  * fixed though.
  */
 
@@ -373,13 +371,13 @@ bool cpu_restore_state(CPUState *cpu, uintptr_t retaddr)
 }
 
 tb_lock();
-tb = tb_find_pc(retaddr);
+tb = tcg_tb_lookup(retaddr);
 if (tb) {
 cpu_restore_state_from_tb(cpu, tb, retaddr);
 if (tb->cflags & CF_NOCACHE) {
 /* one-shot translation, invalidate it immediately */
 tb_phys_invalidate(tb, -1);
-

[Qemu-devel] [PATCH 17/22] translate-all: discard TB when tb_link_page returns an existing matching TB

Use the recently-gained QHT feature of returning the matching TB if it
already exists. This allows us to get rid of the lookup we perform
right after acquiring tb_lock.

Suggested-by: Richard Henderson 
Signed-off-by: Emilio G. Cota 
---
 accel/tcg/cpu-exec.c  | 14 ++
 accel/tcg/translate-all.c | 47 ++-
 2 files changed, 40 insertions(+), 21 deletions(-)

diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
index e8c..d86081a 100644
--- a/accel/tcg/cpu-exec.c
+++ b/accel/tcg/cpu-exec.c
@@ -237,10 +237,7 @@ void cpu_exec_step_atomic(CPUState *cpu)
 if (tb == NULL) {
 mmap_lock();
 tb_lock();
-tb = tb_htable_lookup(cpu, pc, cs_base, flags, cf_mask);
-if (likely(tb == NULL)) {
-tb = tb_gen_code(cpu, pc, cs_base, flags, cflags);
-}
+tb = tb_gen_code(cpu, pc, cs_base, flags, cflags);
 tb_unlock();
 mmap_unlock();
 }
@@ -348,14 +345,7 @@ static inline TranslationBlock *tb_find(CPUState *cpu,
 tb_lock();
 acquired_tb_lock = true;
 
-/* There's a chance that our desired tb has been translated while
- * taking the locks so we check again inside the lock.
- */
-tb = tb_htable_lookup(cpu, pc, cs_base, flags, cf_mask);
-if (likely(tb == NULL)) {
-/* if no translated code available, then translate it now */
-tb = tb_gen_code(cpu, pc, cs_base, flags, cf_mask);
-}
+tb = tb_gen_code(cpu, pc, cs_base, flags, cf_mask);
 
 mmap_unlock();
 /* We add the TB in the virtual pc hash table for the fast lookup */
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 8eb7af3..4fc5c76 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -1490,12 +1490,16 @@ static inline void tb_page_add(PageDesc *p, 
TranslationBlock *tb,
  * (-1) to indicate that only one page contains the TB.
  *
  * Called with mmap_lock held for user-mode emulation.
+ *
+ * Returns @tb or an existing TB that matches @tb.
  */
-static void tb_link_page(TranslationBlock *tb, tb_page_addr_t phys_pc,
- tb_page_addr_t phys_page2)
+static TranslationBlock *
+tb_link_page(TranslationBlock *tb, tb_page_addr_t phys_pc,
+ tb_page_addr_t phys_page2)
 {
 PageDesc *p;
 PageDesc *p2 = NULL;
+void *existing_tb;
 uint32_t h;
 
 assert_memory_lock();
@@ -1503,6 +1507,11 @@ static void tb_link_page(TranslationBlock *tb, 
tb_page_addr_t phys_pc,
 /*
  * Add the TB to the page list.
  * To avoid deadlock, acquire first the lock of the lower-addressed page.
+ * We keep the locks held until after inserting the TB in the hash table,
+ * so that if the insertion fails we know for sure that the TBs are still
+ * in the page descriptors.
+ * Note that inserting into the hash table first isn't an option, since
+ * we can only insert TBs that are fully initialized.
  */
 p = page_find_alloc(phys_pc >> TARGET_PAGE_BITS, 1);
 if (likely(phys_page2 == -1)) {
@@ -1522,21 +1531,33 @@ static void tb_link_page(TranslationBlock *tb, 
tb_page_addr_t phys_pc,
 tb_page_add(p2, tb, 1, phys_page2);
 }
 
+/* add in the hash table */
+h = tb_hash_func(phys_pc, tb->pc, tb->flags, tb->cflags & CF_HASH_MASK,
+ tb->trace_vcpu_dstate);
+existing_tb = qht_insert(_ctx.htable, tb, h);
+
+/* remove TB from the page(s) if we couldn't insert it */
+if (existing_tb) {
+tb_page_remove(p, tb);
+invalidate_page_bitmap(p);
+if (p2) {
+tb_page_remove(p2, tb);
+invalidate_page_bitmap(p2);
+}
+tb = existing_tb;
+}
+
 if (p2) {
 page_unlock(p2);
 }
 page_unlock(p);
 
-/* add in the hash table */
-h = tb_hash_func(phys_pc, tb->pc, tb->flags, tb->cflags & CF_HASH_MASK,
- tb->trace_vcpu_dstate);
-qht_insert(_ctx.htable, tb, h);
-
 #ifdef CONFIG_USER_ONLY
 if (DEBUG_TB_CHECK_GATE) {
 tb_page_check();
 }
 #endif
+return tb;
 }
 
 /* Called with mmap_lock held for user mode emulation.  */
@@ -1545,7 +1566,7 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
   uint32_t flags, int cflags)
 {
 CPUArchState *env = cpu->env_ptr;
-TranslationBlock *tb;
+TranslationBlock *tb, *existing_tb;
 tb_page_addr_t phys_pc, phys_page2;
 target_ulong virt_page2;
 tcg_insn_unit *gen_code_buf;
@@ -1676,7 +1697,15 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
  * memory barrier is required before tb_link_page() makes the TB visible
  * through the physical hash table and physical page list.
  */
-tb_link_page(tb, phys_pc, phys_page2);
+existing_tb = tb_link_page(tb, phys_pc, phys_page2);
+/* if the TB already

Re: [Qemu-devel] tb_invalidate_phys_range and tb_invalidate_phys_page_range

On Sat, Jul 29, 2017 at 10:23:42 +0100, Peter Maydell wrote:
> On 29 July 2017 at 01:33, Emilio G. Cota  wrote:
(snip)
> > +void tb_invalidate_phys_range(tb_page_addr_t start, tb_page_addr_t end,
> > +  int is_cpu_write_access)
> > +{
> > +while (start < end) {
> > +tb_invalidate_phys_page_range(start, end, is_cpu_write_access);
> > +start &= TARGET_PAGE_MASK;
> > +start += TARGET_PAGE_SIZE;
> > +}
> > +}
> >
> > What I find puzzling is that here we pass 'end' unmodified, instead of
> > making sure the range passed is within a page.
> >
> > Is this a bug, or am I missing something?
> 
> It's a bit odd, but looking at the code I think it doesn't matter.
> The only thing that tb_invalidate_phys_page_range() uses 'end'
> for is when it does the "does this TB I've found overlap the
> range we're flushing" check with
>  if (!(tb_end <= start || tb_start >= end))
> 
> If we pass an 'end' value that's larger than it would be if
> we confined it to the page, this is safe because at worst
> we'll invalidate more TBs than we needed to (and in practice
> if the TB we've found is within the full range that
> tb_invalidate_phys_range() is checking we need to invalidate
> it anyway). I haven't quite worked it through but I rather
> suspect that you can't have a TB that's in the list for
> the PageDesc for 'start' and which overlaps in the
> "full" [start-end) range but which wouldn't overlap
> in a truncated [start-(end rounded down to end of page))
> range.
> 
> Basically the reason for the "same page" restriction
> is that tb_invalidate_phys_page_range() only does
> a single page_find() for the 'start' address. So
> as long as we call it once per page in the range
> it doesn't matter that we pass an overly pessimistic
> end. This is kind of bordering on "happens to work"
> territory though, so maybe we could improve it...

Thanks Peter.

I just sent a patch to improve this as part of the "tb lock removal"
series. The patch is here:
  https://lists.gnu.org/archive/html/qemu-devel/2017-08/msg01270.html

Cheers,

Emilio

[Qemu-devel] [PATCH 21/22] translate-all: remove tb_lock mention from cpu_restore_state_from_tb

tb_lock was needed when the function did retranslation. However,
since fca8a500d519 ("tcg: Save insn data and use it in
cpu_restore_state_from_tb") we don't do retranslation.

Get rid of the comment.

Signed-off-by: Emilio G. Cota 
---
 accel/tcg/translate-all.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index dc19217..863b418 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -376,9 +376,7 @@ static int encode_search(TranslationBlock *tb, uint8_t 
*block)
 return p - block;
 }
 
-/* The cpu state corresponding to 'searched_pc' is restored.
- * Called with tb_lock held.
- */
+/* The cpu state corresponding to 'searched_pc' is restored */
 static int cpu_restore_state_from_tb(CPUState *cpu, TranslationBlock *tb,
  uintptr_t searched_pc)
 {
-- 
2.7.4

[Qemu-devel] [PATCH 16/22] translate-all: use per-page locking in !user-mode

Groundwork for supporting parallel TCG generation.

Instead of using a global lock (tb_lock) to protect changes
to pages, use fine-grained, per-page locks in !user-mode.
User-mode stays with mmap_lock.

Sometimes changes need to happen atomically on more than one
page (e.g. when a TB that spans across two pages is
added/invalidated, or when a range of pages is invalidated).
We therefore introduce struct page_collection, which helps
us keep track of a set of pages that have been locked in
the appropriate locking order (i.e. by ascending page index).

This commit first introduces the structs and the function helpers,
to then convert the calling code to use per-page locking. Note
that tb_lock is not removed yet.

While at it, rename tb_alloc_page to tb_page_add, which pairs with
tb_page_remove.

Signed-off-by: Emilio G. Cota 
---
 accel/tcg/translate-all.h |   3 +
 include/exec/exec-all.h   |   3 +-
 accel/tcg/translate-all.c | 433 +-
 3 files changed, 397 insertions(+), 42 deletions(-)

diff --git a/accel/tcg/translate-all.h b/accel/tcg/translate-all.h
index ba8e4d6..6d1d258 100644
--- a/accel/tcg/translate-all.h
+++ b/accel/tcg/translate-all.h
@@ -23,6 +23,9 @@
 
 
 /* translate-all.c */
+struct page_collection *page_collection_lock(tb_page_addr_t start,
+ tb_page_addr_t end);
+void page_collection_unlock(struct page_collection *set);
 void tb_invalidate_phys_page_fast(tb_page_addr_t start, int len);
 void tb_invalidate_phys_page_range(tb_page_addr_t start, tb_page_addr_t end,
int is_cpu_write_access);
diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index 26084c3..495efdb 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -380,7 +380,8 @@ struct TranslationBlock {
 /* original tb when cflags has CF_NOCACHE */
 struct TranslationBlock *orig_tb;
 /* first and second physical page containing code. The lower bit
-   of the pointer tells the index in page_next[] */
+   of the pointer tells the index in page_next[].
+   The list is protected by the TB's page('s) lock(s) */
 uintptr_t page_next[2];
 tb_page_addr_t page_addr[2];
 
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 76fecf5..8eb7af3 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -113,8 +113,55 @@ typedef struct PageDesc {
 #else
 unsigned long flags;
 #endif
+#ifndef CONFIG_USER_ONLY
+QemuSpin lock;
+#endif
 } PageDesc;
 
+/**
+ * struct page_entry - page descriptor entry
+ * @pd: pointer to the  PageDesc of the page this entry represents
+ * @index:  page index of the page
+ * @locked: whether the page is locked
+ *
+ * This struct helps us keep track of the locked state of a page, without
+ * bloating  PageDesc.
+ *
+ * A page lock protects accesses to all fields of  PageDesc.
+ *
+ * See also:  page_collection.
+ */
+struct page_entry {
+PageDesc *pd;
+tb_page_addr_t index;
+bool locked;
+};
+
+/**
+ * struct page_collection - tracks a set of pages (i.e.  page_entry's)
+ * @tree:   Binary search tree (BST) of the pages, with key == page index
+ * @max:Pointer to the page in @tree with the highest page index
+ *
+ * To avoid deadlock we lock pages in ascending order of page index.
+ * When operating on a set of pages, we need to keep track of them so that
+ * we can lock them in order and also unlock them later. For this we collect
+ * pages (i.e.  page_entry's) in a binary search @tree. Given that the
+ * @tree implementation we use does not provide an O(1) operation to obtain the
+ * highest-ranked element, we use @max to keep track of the inserted page
+ * with the highest index. This is valuable because if a page is not in
+ * the tree and its index is higher than @max's, then we can lock it
+ * without breaking the locking order rule.
+ *
+ * Note on naming: 'struct page_set' would be shorter, but we already have a 
few
+ * page_set_*() helpers, so page_collection is used instead to avoid confusion.
+ *
+ * See also: page_collection_lock().
+ */
+struct page_collection {
+GTree *tree;
+struct page_entry *max;
+};
+
 /* list iterators for lists of tagged pointers in TranslationBlock */
 #define tb_for_each_tagged(head, tb, n, field)  \
 for (n = (head) & 1,\
@@ -520,6 +567,15 @@ static PageDesc *page_find_alloc(tb_page_addr_t index, int 
alloc)
 return NULL;
 }
 pd = g_new0(PageDesc, V_L2_SIZE);
+#ifndef CONFIG_USER_ONLY
+{
+int i;
+
+for (i = 0; i < V_L2_SIZE; i++) {
+qemu_spin_init([i].lock);
+}
+}
+#endif
 existing = atomic_cmpxchg(lp, NULL, pd);
 if (unlikely(existing)) {
 g_free(pd);
@@ -535,6 +591,229 @@ static inline PageDesc *page_find(tb_page_addr_t index)
 return

[Qemu-devel] [PATCH 15/22] translate-all: move tb_invalidate_phys_page_range up in the file

This greatly simplifies next commit's diff.

Signed-off-by: Emilio G. Cota 
---
 accel/tcg/translate-all.c | 77 ---
 1 file changed, 39 insertions(+), 38 deletions(-)

diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index e8f663f..76fecf5 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -1362,44 +1362,6 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
 
 /*
  * Invalidate all TBs which intersect with the target physical address range
- * [start;end[. NOTE: start and end may refer to *different* physical pages.
- * 'is_cpu_write_access' should be true if called from a real cpu write
- * access: the virtual CPU will exit the current TB if code is modified inside
- * this TB.
- *
- * Called with mmap_lock held for user-mode emulation, grabs tb_lock
- * Called with tb_lock held for system-mode emulation
- */
-static void tb_invalidate_phys_range_1(tb_page_addr_t start, tb_page_addr_t 
end)
-{
-tb_page_addr_t next;
-
-for (next = (start & TARGET_PAGE_MASK) + TARGET_PAGE_SIZE;
- start < end;
- start = next, next += TARGET_PAGE_SIZE) {
-tb_page_addr_t bound = MIN(next, end);
-
-tb_invalidate_phys_page_range(start, bound, 0);
-}
-}
-
-#ifdef CONFIG_SOFTMMU
-void tb_invalidate_phys_range(tb_page_addr_t start, tb_page_addr_t end)
-{
-assert_tb_locked();
-tb_invalidate_phys_range_1(start, end);
-}
-#else
-void tb_invalidate_phys_range(tb_page_addr_t start, tb_page_addr_t end)
-{
-assert_memory_lock();
-tb_lock();
-tb_invalidate_phys_range_1(start, end);
-tb_unlock();
-}
-#endif
-/*
- * Invalidate all TBs which intersect with the target physical address range
  * [start;end[. NOTE: start and end must refer to the *same* physical page.
  * 'is_cpu_write_access' should be true if called from a real cpu write
  * access: the virtual CPU will exit the current TB if code is modified inside
@@ -1501,6 +1463,45 @@ void tb_invalidate_phys_page_range(tb_page_addr_t start, 
tb_page_addr_t end,
 #endif
 }
 
+/*
+ * Invalidate all TBs which intersect with the target physical address range
+ * [start;end[. NOTE: start and end may refer to *different* physical pages.
+ * 'is_cpu_write_access' should be true if called from a real cpu write
+ * access: the virtual CPU will exit the current TB if code is modified inside
+ * this TB.
+ *
+ * Called with mmap_lock held for user-mode emulation, grabs tb_lock
+ * Called with tb_lock held for system-mode emulation
+ */
+static void tb_invalidate_phys_range_1(tb_page_addr_t start, tb_page_addr_t 
end)
+{
+tb_page_addr_t next;
+
+for (next = (start & TARGET_PAGE_MASK) + TARGET_PAGE_SIZE;
+ start < end;
+ start = next, next += TARGET_PAGE_SIZE) {
+tb_page_addr_t bound = MIN(next, end);
+
+tb_invalidate_phys_page_range(start, bound, 0);
+}
+}
+
+#ifdef CONFIG_SOFTMMU
+void tb_invalidate_phys_range(tb_page_addr_t start, tb_page_addr_t end)
+{
+assert_tb_locked();
+tb_invalidate_phys_range_1(start, end);
+}
+#else
+void tb_invalidate_phys_range(tb_page_addr_t start, tb_page_addr_t end)
+{
+assert_memory_lock();
+tb_lock();
+tb_invalidate_phys_range_1(start, end);
+tb_unlock();
+}
+#endif
+
 #ifdef CONFIG_SOFTMMU
 /* len must be <= 8 and start must be a multiple of len.
  * Called via softmmu_template.h when code areas are written to with
-- 
2.7.4

[Qemu-devel] [PATCH 19/22] cputlb: remove tb_lock from tlb_flush functions

The acquisition of tb_lock was added when the async tlb_flush
was introduced in e3b9ca810 ("cputlb: introduce tlb_flush_* async work.")

tb_lock was there to allow us to do memset() on the tb_jmp_cache's.
However, since f3ced3c5928 ("tcg: consistently access cpu->tb_jmp_cache
atomically") all accesses to tb_jmp_cache are atomic, so tb_lock
is not needed here. Get rid of it.

Signed-off-by: Emilio G. Cota 
---
 accel/tcg/cputlb.c | 8 
 1 file changed, 8 deletions(-)

diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
index 9377110..0e90bf2 100644
--- a/accel/tcg/cputlb.c
+++ b/accel/tcg/cputlb.c
@@ -125,8 +125,6 @@ static void tlb_flush_nocheck(CPUState *cpu)
 atomic_set(>tlb_flush_count, env->tlb_flush_count + 1);
 tlb_debug("(count: %zu)\n", tlb_flush_count());
 
-tb_lock();
-
 memset(env->tlb_table, -1, sizeof(env->tlb_table));
 memset(env->tlb_v_table, -1, sizeof(env->tlb_v_table));
 cpu_tb_jmp_cache_clear(cpu);
@@ -135,8 +133,6 @@ static void tlb_flush_nocheck(CPUState *cpu)
 env->tlb_flush_addr = -1;
 env->tlb_flush_mask = 0;
 
-tb_unlock();
-
 atomic_mb_set(>pending_tlb_flush, 0);
 }
 
@@ -180,8 +176,6 @@ static void tlb_flush_by_mmuidx_async_work(CPUState *cpu, 
run_on_cpu_data data)
 
 assert_cpu_is_self(cpu);
 
-tb_lock();
-
 tlb_debug("start: mmu_idx:0x%04lx\n", mmu_idx_bitmask);
 
 for (mmu_idx = 0; mmu_idx < NB_MMU_MODES; mmu_idx++) {
@@ -197,8 +191,6 @@ static void tlb_flush_by_mmuidx_async_work(CPUState *cpu, 
run_on_cpu_data data)
 cpu_tb_jmp_cache_clear(cpu);
 
 tlb_debug("done\n");
-
-tb_unlock();
 }
 
 void tlb_flush_by_mmuidx(CPUState *cpu, uint16_t idxmap)
-- 
2.7.4

[Qemu-devel] [PATCH 14/22] translate-all: work page-by-page in tb_invalidate_phys_range_1

So that we pass a same-page range to tb_invalidate_phys_page_range,
instead of always passing an end address that could be on a different
page.

As discussed with Peter Maydell on the list, tb_invalidate_phys_page_range
doesn't actually do much with 'end', which explains why we have never
hit a bug despite going against what the comment on top of
tb_invalidate_phys_page_range requires:

> * Invalidate all TBs which intersect with the target physical address range
> * [start;end[. NOTE: start and end must refer to the *same* physical page.

The appended honours the comment, which avoids confusion.

While at it, rework the loop into a for loop, which is less error prone
(e.g. "continue" won't result in an infinite loop).

Signed-off-by: Emilio G. Cota 
---
 accel/tcg/translate-all.c | 12 
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 64d099b..e8f663f 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -1372,10 +1372,14 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
  */
 static void tb_invalidate_phys_range_1(tb_page_addr_t start, tb_page_addr_t 
end)
 {
-while (start < end) {
-tb_invalidate_phys_page_range(start, end, 0);
-start &= TARGET_PAGE_MASK;
-start += TARGET_PAGE_SIZE;
+tb_page_addr_t next;
+
+for (next = (start & TARGET_PAGE_MASK) + TARGET_PAGE_SIZE;
+ start < end;
+ start = next, next += TARGET_PAGE_SIZE) {
+tb_page_addr_t bound = MIN(next, end);
+
+tb_invalidate_phys_page_range(start, bound, 0);
 }
 }
 
-- 
2.7.4

[Qemu-devel] [PATCH 05/22] qht: require a default comparison function

qht_lookup now uses the default cmp function. qht_lookup_custom is defined
to retain the old behaviour, that is a cmp function is explicitly provided.

qht_insert will gain use of the default cmp in the next patch.

Signed-off-by: Emilio G. Cota 
---
 include/qemu/qht.h| 23 +++
 accel/tcg/cpu-exec.c  |  4 ++--
 accel/tcg/translate-all.c | 16 +++-
 tests/qht-bench.c | 14 +++---
 tests/test-qht.c  | 15 ++-
 util/qht.c| 13 ++---
 6 files changed, 63 insertions(+), 22 deletions(-)

diff --git a/include/qemu/qht.h b/include/qemu/qht.h
index 531aa95..dd512bf 100644
--- a/include/qemu/qht.h
+++ b/include/qemu/qht.h
@@ -11,8 +11,11 @@
 #include "qemu/thread.h"
 #include "qemu/qdist.h"
 
+typedef bool (*qht_cmp_func_t)(const void *a, const void *b);
+
 struct qht {
 struct qht_map *map;
+qht_cmp_func_t cmp;
 QemuMutex lock; /* serializes setters of ht->map */
 unsigned int mode;
 };
@@ -47,10 +50,12 @@ typedef void (*qht_iter_func_t)(struct qht *ht, void *p, 
uint32_t h, void *up);
 /**
  * qht_init - Initialize a QHT
  * @ht: QHT to be initialized
+ * @cmp: default comparison function. Cannot be NULL.
  * @n_elems: number of entries the hash table should be optimized for.
  * @mode: bitmask with OR'ed QHT_MODE_*
  */
-void qht_init(struct qht *ht, size_t n_elems, unsigned int mode);
+void qht_init(struct qht *ht, qht_cmp_func_t cmp, size_t n_elems,
+  unsigned int mode);
 
 /**
  * qht_destroy - destroy a previously initialized QHT
@@ -78,7 +83,7 @@ void qht_destroy(struct qht *ht);
 bool qht_insert(struct qht *ht, void *p, uint32_t hash);
 
 /**
- * qht_lookup - Look up a pointer in a QHT
+ * qht_lookup_custom - Look up a pointer using a custom comparison function.
  * @ht: QHT to be looked up
  * @func: function to compare existing pointers against @userp
  * @userp: pointer to pass to @func
@@ -94,8 +99,18 @@ bool qht_insert(struct qht *ht, void *p, uint32_t hash);
  * Returns the corresponding pointer when a match is found.
  * Returns NULL otherwise.
  */
-void *qht_lookup(struct qht *ht, qht_lookup_func_t func, const void *userp,
- uint32_t hash);
+void *qht_lookup_custom(struct qht *ht, qht_lookup_func_t func,
+const void *userp, uint32_t hash);
+
+/**
+ * qht_lookup - Look up a pointer in a QHT
+ * @ht: QHT to be looked up
+ * @userp: pointer to pass to @func
+ * @hash: hash of the pointer to be looked up
+ *
+ * Calls qht_lookup_custom() using @ht's default comparison function.
+ */
+void *qht_lookup(struct qht *ht, const void *userp, uint32_t hash);
 
 /**
  * qht_remove - remove a pointer from the hash table
diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
index f42096a..d2b9411 100644
--- a/accel/tcg/cpu-exec.c
+++ b/accel/tcg/cpu-exec.c
@@ -280,7 +280,7 @@ struct tb_desc {
 uint32_t trace_vcpu_dstate;
 };
 
-static bool tb_cmp(const void *p, const void *d)
+static bool tb_lookup_cmp(const void *p, const void *d)
 {
 const TranslationBlock *tb = p;
 const struct tb_desc *desc = d;
@@ -325,7 +325,7 @@ TranslationBlock *tb_htable_lookup(CPUState *cpu, 
target_ulong pc,
 phys_pc = get_page_addr_code(desc.env, pc);
 desc.phys_page1 = phys_pc & TARGET_PAGE_MASK;
 h = tb_hash_func(phys_pc, pc, flags, cf_mask, *cpu->trace_dstate);
-return qht_lookup(_ctx.htable, tb_cmp, , h);
+return qht_lookup_custom(_ctx.htable, tb_lookup_cmp, , h);
 }
 
 static inline TranslationBlock *tb_find(CPUState *cpu,
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index fe3388a..4ed402e 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -782,11 +782,25 @@ static inline void code_gen_alloc(size_t tb_size)
 qemu_mutex_init(_ctx.tb_lock);
 }
 
+static bool tb_cmp(const void *ap, const void *bp)
+{
+const TranslationBlock *a = ap;
+const TranslationBlock *b = bp;
+
+return a->pc == b->pc &&
+a->cs_base == b->cs_base &&
+a->flags == b->flags &&
+(tb_cflags(a) & CF_HASH_MASK) == (tb_cflags(b) & CF_HASH_MASK) &&
+a->trace_vcpu_dstate == b->trace_vcpu_dstate &&
+a->page_addr[0] == b->page_addr[0] &&
+a->page_addr[1] == b->page_addr[1];
+}
+
 static void tb_htable_init(void)
 {
 unsigned int mode = QHT_MODE_AUTO_RESIZE;
 
-qht_init(_ctx.htable, CODE_GEN_HTABLE_SIZE, mode);
+qht_init(_ctx.htable, tb_cmp, CODE_GEN_HTABLE_SIZE, mode);
 }
 
 /* Must be called before using the QEMU cpus. 'tb_size' is the size
diff --git a/tests/qht-bench.c b/tests/qht-bench.c
index 4cabdfd..c94ac25 100644
--- a/tests/qht-bench.c
+++ b/tests/qht-bench.c
@@ -93,10 +93,10 @@ static void usage_complete(int argc, char *argv[])
 exit(-1);
 }
 
-static bool is_equal(const void *obj, const void *userp)
+static bool is_equal(const void *ap, const void *bp)
 {
-const long *a = obj;
-const long *b = userp;
+const

[Qemu-devel] [PATCH 12/22] translate-all: make l1_map lockless

Groundwork for supporting parallel TCG generation.

We never remove entries from the radix tree, so we can use cmpxchg
to implement lockless insertions.

Signed-off-by: Emilio G. Cota 
---
 docs/devel/multi-thread-tcg.txt |  4 ++--
 accel/tcg/translate-all.c   | 24 ++--
 2 files changed, 16 insertions(+), 12 deletions(-)

diff --git a/docs/devel/multi-thread-tcg.txt b/docs/devel/multi-thread-tcg.txt
index a99b456..faf8918 100644
--- a/docs/devel/multi-thread-tcg.txt
+++ b/docs/devel/multi-thread-tcg.txt
@@ -134,8 +134,8 @@ tb_set_jmp_target() code. Modification to the linked lists 
that allow
 searching for linked pages are done under the protect of the
 tb_lock().
 
-The global page table is protected by the tb_lock() in system-mode and
-mmap_lock() in linux-user mode.
+The global page table is a lockless radix tree; cmpxchg is used
+to atomically insert new elements.
 
 The lookup caches are updated atomically and the lookup hash uses QHT
 which is designed for concurrent safe lookup.
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 0e3d223..12e1252 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -482,20 +482,12 @@ static void page_init(void)
 #endif
 }
 
-/* If alloc=1:
- * Called with tb_lock held for system emulation.
- * Called with mmap_lock held for user-mode emulation.
- */
 static PageDesc *page_find_alloc(tb_page_addr_t index, int alloc)
 {
 PageDesc *pd;
 void **lp;
 int i;
 
-if (alloc) {
-assert_memory_lock();
-}
-
 /* Level 1.  Always allocated.  */
 lp = l1_map + ((index >> v_l1_shift) & (v_l1_size - 1));
 
@@ -504,11 +496,17 @@ static PageDesc *page_find_alloc(tb_page_addr_t index, 
int alloc)
 void **p = atomic_rcu_read(lp);
 
 if (p == NULL) {
+void *existing;
+
 if (!alloc) {
 return NULL;
 }
 p = g_new0(void *, V_L2_SIZE);
-atomic_rcu_set(lp, p);
+existing = atomic_cmpxchg(lp, NULL, p);
+if (unlikely(existing)) {
+g_free(p);
+p = existing;
+}
 }
 
 lp = p + ((index >> (i * V_L2_BITS)) & (V_L2_SIZE - 1));
@@ -516,11 +514,17 @@ static PageDesc *page_find_alloc(tb_page_addr_t index, 
int alloc)
 
 pd = atomic_rcu_read(lp);
 if (pd == NULL) {
+void *existing;
+
 if (!alloc) {
 return NULL;
 }
 pd = g_new0(PageDesc, V_L2_SIZE);
-atomic_rcu_set(lp, pd);
+existing = atomic_cmpxchg(lp, NULL, pd);
+if (unlikely(existing)) {
+g_free(pd);
+pd = existing;
+}
 }
 
 return pd + (index & (V_L2_SIZE - 1));
-- 
2.7.4

[Qemu-devel] [PATCH 13/22] translate-all: remove hole in PageDesc

Groundwork for supporting parallel TCG generation.

Move the hole to the end of the struct, so that a u32
field can be added there without bloating the struct.

Signed-off-by: Emilio G. Cota 
---
 accel/tcg/translate-all.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 12e1252..64d099b 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -108,8 +108,8 @@ typedef struct PageDesc {
 #ifdef CONFIG_SOFTMMU
 /* in order to optimize self modifying code, we count the number
of lookups we do to a given page to use a bitmap */
-unsigned int code_write_count;
 unsigned long *code_bitmap;
+unsigned int code_write_count;
 #else
 unsigned long flags;
 #endif
-- 
2.7.4

[Qemu-devel] [PATCH 20/22] exec: remove tb_lock from notdirty_mem_write

By passing a locked page_collection to tb_invalidate_phys_page_fast.

Signed-off-by: Emilio G. Cota 
---
 accel/tcg/translate-all.h |  3 ++-
 accel/tcg/translate-all.c |  8 
 exec.c| 11 +--
 3 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/accel/tcg/translate-all.h b/accel/tcg/translate-all.h
index 6d1d258..e6cb963 100644
--- a/accel/tcg/translate-all.h
+++ b/accel/tcg/translate-all.h
@@ -26,7 +26,8 @@
 struct page_collection *page_collection_lock(tb_page_addr_t start,
  tb_page_addr_t end);
 void page_collection_unlock(struct page_collection *set);
-void tb_invalidate_phys_page_fast(tb_page_addr_t start, int len);
+void tb_invalidate_phys_page_fast(struct page_collection *pages,
+  tb_page_addr_t start, int len);
 void tb_invalidate_phys_page_range(tb_page_addr_t start, tb_page_addr_t end,
int is_cpu_write_access);
 void tb_invalidate_phys_range(tb_page_addr_t start, tb_page_addr_t end);
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 30c3fba..dc19217 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -1915,10 +1915,12 @@ void tb_invalidate_phys_range(tb_page_addr_t start, 
tb_page_addr_t end)
 /* len must be <= 8 and start must be a multiple of len.
  * Called via softmmu_template.h when code areas are written to with
  * iothread mutex not held.
+ *
+ * Call with all @pages in the range [@start, @start + len[ locked.
  */
-void tb_invalidate_phys_page_fast(tb_page_addr_t start, int len)
+void tb_invalidate_phys_page_fast(struct page_collection *pages,
+  tb_page_addr_t start, int len)
 {
-struct page_collection *pages;
 PageDesc *p;
 
 #if 0
@@ -1937,7 +1939,6 @@ void tb_invalidate_phys_page_fast(tb_page_addr_t start, 
int len)
 return;
 }
 
-pages = page_collection_lock(start, start + len);
 if (!p->code_bitmap &&
 ++p->code_write_count >= SMC_BITMAP_USE_THRESHOLD) {
 build_page_bitmap(p);
@@ -1955,7 +1956,6 @@ void tb_invalidate_phys_page_fast(tb_page_addr_t start, 
int len)
 do_invalidate:
 tb_invalidate_phys_page_range__locked(pages, p, start, start + len, 1);
 }
-page_collection_unlock(pages);
 }
 #else
 /* Called with mmap_lock held. If pc is not 0 then it indicates the
diff --git a/exec.c b/exec.c
index 6e85535..620a496 100644
--- a/exec.c
+++ b/exec.c
@@ -2331,13 +2331,12 @@ ram_addr_t qemu_ram_addr_from_host(void *ptr)
 static void notdirty_mem_write(void *opaque, hwaddr ram_addr,
uint64_t val, unsigned size)
 {
-bool locked = false;
+struct page_collection *pages = NULL;
 
 assert(tcg_enabled());
 if (!cpu_physical_memory_get_dirty_flag(ram_addr, DIRTY_MEMORY_CODE)) {
-locked = true;
-tb_lock();
-tb_invalidate_phys_page_fast(ram_addr, size);
+pages = page_collection_lock(ram_addr, ram_addr + size);
+tb_invalidate_phys_page_fast(pages, ram_addr, size);
 }
 switch (size) {
 case 1:
@@ -2353,8 +2352,8 @@ static void notdirty_mem_write(void *opaque, hwaddr 
ram_addr,
 abort();
 }
 
-if (locked) {
-tb_unlock();
+if (pages) {
+page_collection_unlock(pages);
 }
 
 /* Set both VGA and migration bits for simplicity and to remove
-- 
2.7.4

[Qemu-devel] [PATCH 06/22] qht: return existing entry when qht_insert fails

The meaning of "existing" is now changed to "matches in hash and
ht->cmp result". This is saner than just checking the pointer value.

Note that we now return NULL on insertion success, or the existing
pointer on failure. We can do this because NULL pointers are not
allowed to be inserted in QHT.

Suggested-by: Richard Henderson 
Signed-off-by: Emilio G. Cota 
---
 include/qemu/qht.h |  7 ---
 tests/qht-bench.c  |  4 ++--
 tests/test-qht.c   |  5 -
 util/qht.c | 17 +
 4 files changed, 19 insertions(+), 14 deletions(-)

diff --git a/include/qemu/qht.h b/include/qemu/qht.h
index dd512bf..c320cb6 100644
--- a/include/qemu/qht.h
+++ b/include/qemu/qht.h
@@ -77,10 +77,11 @@ void qht_destroy(struct qht *ht);
  * In case of successful operation, smp_wmb() is implied before the pointer is
  * inserted into the hash table.
  *
- * Returns true on success.
- * Returns false if the @p-@hash pair already exists in the hash table.
+ * On success, returns NULL.
+ * On failure, returns the pointer from an entry that is equivalent (i.e.
+ * ht->cmp matches and the hash is the same) to @p-@h.
  */
-bool qht_insert(struct qht *ht, void *p, uint32_t hash);
+void *qht_insert(struct qht *ht, void *p, uint32_t hash);
 
 /**
  * qht_lookup_custom - Look up a pointer using a custom comparison function.
diff --git a/tests/qht-bench.c b/tests/qht-bench.c
index c94ac25..6c2eb8e 100644
--- a/tests/qht-bench.c
+++ b/tests/qht-bench.c
@@ -163,7 +163,7 @@ static void do_rw(struct thread_info *info)
 bool written = false;
 
 if (qht_lookup(, p, hash) == NULL) {
-written = qht_insert(, p, hash);
+written = !qht_insert(, p, hash);
 }
 if (written) {
 stats->in++;
@@ -322,7 +322,7 @@ static void htable_init(void)
 r = xorshift64star(r);
 p = [r & (init_range - 1)];
 hash = h(*p);
-if (qht_insert(, p, hash)) {
+if (!qht_insert(, p, hash)) {
 break;
 }
 retries++;
diff --git a/tests/test-qht.c b/tests/test-qht.c
index f8f2886..7164ae4 100644
--- a/tests/test-qht.c
+++ b/tests/test-qht.c
@@ -27,11 +27,14 @@ static void insert(int a, int b)
 
 for (i = a; i < b; i++) {
 uint32_t hash;
+void *existing;
 
 arr[i] = i;
 hash = i;
 
-qht_insert(, [i], hash);
+g_assert_true(!qht_insert(, [i], hash));
+existing = qht_insert(, [i], hash);
+g_assert_true(existing == [i]);
 }
 }
 
diff --git a/util/qht.c b/util/qht.c
index b69457d..90ecfca 100644
--- a/util/qht.c
+++ b/util/qht.c
@@ -510,9 +510,9 @@ void *qht_lookup(struct qht *ht, const void *userp, 
uint32_t hash)
 }
 
 /* call with head->lock held */
-static bool qht_insert__locked(struct qht *ht, struct qht_map *map,
-   struct qht_bucket *head, void *p, uint32_t hash,
-   bool *needs_resize)
+static void *qht_insert__locked(struct qht *ht, struct qht_map *map,
+struct qht_bucket *head, void *p, uint32_t 
hash,
+bool *needs_resize)
 {
 struct qht_bucket *b = head;
 struct qht_bucket *prev = NULL;
@@ -522,8 +522,9 @@ static bool qht_insert__locked(struct qht *ht, struct 
qht_map *map,
 do {
 for (i = 0; i < QHT_BUCKET_ENTRIES; i++) {
 if (b->pointers[i]) {
-if (unlikely(b->pointers[i] == p)) {
-return false;
+if (unlikely(b->hashes[i] == hash &&
+ ht->cmp(b->pointers[i], p))) {
+return b->pointers[i];
 }
 } else {
 goto found;
@@ -552,7 +553,7 @@ static bool qht_insert__locked(struct qht *ht, struct 
qht_map *map,
 atomic_set(>hashes[i], hash);
 atomic_set(>pointers[i], p);
 seqlock_write_end(>sequence);
-return true;
+return NULL;
 }
 
 static __attribute__((noinline)) void qht_grow_maybe(struct qht *ht)
@@ -576,12 +577,12 @@ static __attribute__((noinline)) void 
qht_grow_maybe(struct qht *ht)
 qemu_mutex_unlock(>lock);
 }
 
-bool qht_insert(struct qht *ht, void *p, uint32_t hash)
+void *qht_insert(struct qht *ht, void *p, uint32_t hash)
 {
 struct qht_bucket *b;
 struct qht_map *map;
 bool needs_resize = false;
-bool ret;
+void *ret;
 
 /* NULL pointers are not supported */
 qht_debug_assert(p);
-- 
2.7.4

[Qemu-devel] [PATCH 08/22] tcg: move tb_ctx.tb_phys_invalidate_count to tcg_ctx

Thereby making it per-TCGContext. Once we remove tb_lock, this will
avoid an atomic increment every time a TB is invalidated.

Signed-off-by: Emilio G. Cota 
---
 include/exec/tb-context.h |  1 -
 tcg/tcg.h |  3 +++
 accel/tcg/translate-all.c |  5 +++--
 tcg/tcg.c | 14 ++
 4 files changed, 20 insertions(+), 3 deletions(-)

diff --git a/include/exec/tb-context.h b/include/exec/tb-context.h
index d8472c8..8c9b49c 100644
--- a/include/exec/tb-context.h
+++ b/include/exec/tb-context.h
@@ -37,7 +37,6 @@ struct TBContext {
 
 /* statistics */
 unsigned tb_flush_count;
-int tb_phys_invalidate_count;
 };
 
 extern TBContext tb_ctx;
diff --git a/tcg/tcg.h b/tcg/tcg.h
index 96f3cd8..680df31 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -709,6 +709,8 @@ struct TCGContext {
 /* Threshold to flush the translated code buffer.  */
 void *code_gen_highwater;
 
+size_t tb_phys_invalidate_count;
+
 /* Track which vCPU triggers events */
 CPUState *cpu;  /* *_trans */
 TCGv_env tcg_env;   /* *_exec  */
@@ -768,6 +770,7 @@ size_t tcg_code_capacity(void);
 
 void tcg_tb_insert(TranslationBlock *tb);
 void tcg_tb_remove(TranslationBlock *tb);
+size_t tcg_tb_phys_invalidate_count(void);
 TranslationBlock *tcg_tb_lookup(uintptr_t tc_ptr);
 void tcg_tb_foreach(GTraverseFunc func, gpointer user_data);
 size_t tcg_nb_tbs(void);
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index bfa3547..8f6f8f1 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -1067,7 +1067,8 @@ void tb_phys_invalidate(TranslationBlock *tb, 
tb_page_addr_t page_addr)
 /* suppress any remaining jumps to this TB */
 tb_jmp_unlink(tb);
 
-tb_ctx.tb_phys_invalidate_count++;
+atomic_set(_ctx->tb_phys_invalidate_count,
+   tcg_ctx->tb_phys_invalidate_count + 1);
 }
 
 #ifdef CONFIG_SOFTMMU
@@ -1858,7 +1859,7 @@ void dump_exec_info(FILE *f, fprintf_function cpu_fprintf)
 cpu_fprintf(f, "\nStatistics:\n");
 cpu_fprintf(f, "TB flush count  %u\n",
 atomic_read(_ctx.tb_flush_count));
-cpu_fprintf(f, "TB invalidate count %d\n", 
tb_ctx.tb_phys_invalidate_count);
+cpu_fprintf(f, "TB invalidate count %zu\n", 
tcg_tb_phys_invalidate_count());
 cpu_fprintf(f, "TLB flush count %zu\n", tlb_flush_count());
 tcg_dump_info(f, cpu_fprintf);
 
diff --git a/tcg/tcg.c b/tcg/tcg.c
index 9f3c099..e51b54a 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -769,6 +769,20 @@ size_t tcg_code_capacity(void)
 return capacity;
 }
 
+size_t tcg_tb_phys_invalidate_count(void)
+{
+unsigned int n_ctxs = atomic_read(_tcg_ctxs);
+unsigned int i;
+size_t total = 0;
+
+for (i = 0; i < n_ctxs; i++) {
+const TCGContext *s = atomic_read(_ctxs[i]);
+
+total += atomic_read(>tb_phys_invalidate_count);
+}
+return total;
+}
+
 /* pool based memory allocation */
 void *tcg_malloc_internal(TCGContext *s, int size)
 {
-- 
2.7.4

[Qemu-devel] [PATCH 11/22] translate-all: exit from tb_phys_invalidate if qht_remove fails

Groundwork for supporting parallel TCG generation.

Once tb_lock goes away, it is conceivable that two (or more) threads
might race while invalidating the same TB. We currently do not
check for this, which means we would wrongly invalidate the same TB
more than once.

Fix this by using qht_remove as the synchronization point; if it fails,
that means the TB has already been invalidated, and therefore there
is nothing left to do in tb_phys_invalidate.

Signed-off-by: Emilio G. Cota 
---
 accel/tcg/translate-all.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 8e5e5b2..0e3d223 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -1061,7 +1061,9 @@ void tb_phys_invalidate(TranslationBlock *tb, 
tb_page_addr_t page_addr)
 phys_pc = tb->page_addr[0] + (tb->pc & ~TARGET_PAGE_MASK);
 h = tb_hash_func(phys_pc, tb->pc, tb->flags, tb->cflags & CF_HASH_MASK,
  tb->trace_vcpu_dstate);
-qht_remove(_ctx.htable, tb, h);
+if (!qht_remove(_ctx.htable, tb, h)) {
+return;
+}
 
 /* remove the TB from the page list */
 if (tb->page_addr[0] != page_addr) {
-- 
2.7.4

[Qemu-devel] [PATCH 01/22] disas/arm: fix 'instuction' typo in comment

Signed-off-by: Emilio G. Cota 
---
 disas/arm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/disas/arm.c b/disas/arm.c
index 27396dd..a08bbbe 100644
--- a/disas/arm.c
+++ b/disas/arm.c
@@ -1651,7 +1651,7 @@ print_insn_coprocessor (bfd_vma pc, struct 
disassemble_info *info, long given,
}
   else
{
- /* Only match unconditional instuctions against unconditional
+ /* Only match unconditional instructions against unconditional
 patterns.  */
  if ((given & 0xf000) == 0xf000)
{
-- 
2.7.4

[Qemu-devel] [PATCH 10/22] translate-all: iterate over TBs in a page with page_for_each_tb

This commit does several things, but to avoid churn I merged them all
into the same commit. To wit:

- Use uintptr_t instead of TranslationBlock * for the list of TBs in a page.
  Just like we did in (c37e6d7e "tcg: Use uintptr_t type for
  jmp_list_{next|first} fields of TB"), the rationale is the same: these
  are tagged pointers, not pointers. So use a more appropriate type.

- Only check the least significant bit of the tagged pointers. Masking
  with 3/~3 is unnecessary and confusing.

- Use the new tb_for_each_tagged macros to define page_for_each_tb(_safe),
  which improves readability.

- Update tb_page_remove to use page_for_each_tb_safe(). In case there
  is a bug and we attempt to remove a TB that is not in the list, instead
  of segfaulting (since the list is NULL-terminated) we will reach
  g_assert_not_reached().

Signed-off-by: Emilio G. Cota 
---
 include/exec/exec-all.h   |  2 +-
 accel/tcg/translate-all.c | 57 ---
 2 files changed, 25 insertions(+), 34 deletions(-)

diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index 32e97e1..26084c3 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -381,7 +381,7 @@ struct TranslationBlock {
 struct TranslationBlock *orig_tb;
 /* first and second physical page containing code. The lower bit
of the pointer tells the index in page_next[] */
-struct TranslationBlock *page_next[2];
+uintptr_t page_next[2];
 tb_page_addr_t page_addr[2];
 
 /* The following data are used to directly call another TB from
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 396c10c..8e5e5b2 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -104,7 +104,7 @@
 
 typedef struct PageDesc {
 /* list of TBs intersecting this ram page */
-TranslationBlock *first_tb;
+uintptr_t first_tb;
 #ifdef CONFIG_SOFTMMU
 /* in order to optimize self modifying code, we count the number
of lookups we do to a given page to use a bitmap */
@@ -134,6 +134,12 @@ typedef struct PageDesc {
  n = (uintptr_t)*prev & 1,  \
  tb = (TranslationBlock *)(*prev & ~1))
 
+#define page_for_each_tb(pagedesc, tb, n)   \
+tb_for_each_tagged((pagedesc)->first_tb, tb, n, page_next)
+
+#define page_for_each_tb_safe(pagedesc, tb, n, prev)\
+tb_for_each_tagged_safe((pagedesc)->first_tb, tb, n, page_next, prev)
+
 /* In system mode we want L1_MAP to be based on ram offsets,
while in user mode we want it to be based on virtual addresses.  */
 #if !defined(CONFIG_USER_ONLY)
@@ -834,7 +840,7 @@ static void page_flush_tb_1(int level, void **lp)
 PageDesc *pd = *lp;
 
 for (i = 0; i < V_L2_SIZE; ++i) {
-pd[i].first_tb = NULL;
+pd[i].first_tb = (uintptr_t)NULL;
 invalidate_page_bitmap(pd + i);
 }
 } else {
@@ -962,21 +968,19 @@ static void tb_page_check(void)
 
 #endif /* CONFIG_USER_ONLY */
 
-static inline void tb_page_remove(TranslationBlock **ptb, TranslationBlock *tb)
+static inline void tb_page_remove(PageDesc *pd, TranslationBlock *tb)
 {
 TranslationBlock *tb1;
+uintptr_t *prev;
 unsigned int n1;
 
-for (;;) {
-tb1 = *ptb;
-n1 = (uintptr_t)tb1 & 3;
-tb1 = (TranslationBlock *)((uintptr_t)tb1 & ~3);
+page_for_each_tb_safe(pd, tb1, n1, prev) {
 if (tb1 == tb) {
-*ptb = tb1->page_next[n1];
-break;
+*prev = tb1->page_next[n1];
+return;
 }
-ptb = >page_next[n1];
 }
+g_assert_not_reached();
 }
 
 /* remove the TB from a list of TBs jumping to the n-th jump target of the TB 
*/
@@ -1062,12 +1066,12 @@ void tb_phys_invalidate(TranslationBlock *tb, 
tb_page_addr_t page_addr)
 /* remove the TB from the page list */
 if (tb->page_addr[0] != page_addr) {
 p = page_find(tb->page_addr[0] >> TARGET_PAGE_BITS);
-tb_page_remove(>first_tb, tb);
+tb_page_remove(p, tb);
 invalidate_page_bitmap(p);
 }
 if (tb->page_addr[1] != -1 && tb->page_addr[1] != page_addr) {
 p = page_find(tb->page_addr[1] >> TARGET_PAGE_BITS);
-tb_page_remove(>first_tb, tb);
+tb_page_remove(p, tb);
 invalidate_page_bitmap(p);
 }
 
@@ -1098,10 +1102,7 @@ static void build_page_bitmap(PageDesc *p)
 
 p->code_bitmap = bitmap_new(TARGET_PAGE_SIZE);
 
-tb = p->first_tb;
-while (tb != NULL) {
-n = (uintptr_t)tb & 3;
-tb = (TranslationBlock *)((uintptr_t)tb & ~3);
+page_for_each_tb(p, tb, n) {
 /* NOTE: this is subtle as a TB may span two physical pages */
 if (n == 0) {
 /* NOTE: tb_end may be after the end of the page, but
@@ -1116,7 +1117,6 @@ static void build_page_bitmap(PageDesc *p)
 tb_end = ((tb->pc + tb->size) & ~TARGET_PAGE_MASK);

[Qemu-devel] [PATCH 03/22] translate-all: fix 'consisits' typo in comment

Signed-off-by: Emilio G. Cota 
---
 accel/tcg/translate-all.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index db3d42c..fe3388a 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -257,7 +257,7 @@ static target_long decode_sleb128(uint8_t **pp)
 /* Encode the data collected about the instructions while compiling TB.
Place the data at BLOCK, and return the number of bytes consumed.
 
-   The logical table consisits of TARGET_INSN_START_WORDS target_ulong's,
+   The logical table consists of TARGET_INSN_START_WORDS target_ulong's,
which come from the target's insn_start data, followed by a uintptr_t
which comes from the host pc of the end of the code implementing the insn.
 
-- 
2.7.4

[Qemu-devel] [PATCH 00/22] tcg: tb_lock removal

This series applies on top of the "multiple TCG contexts" series, v4:
  https://lists.gnu.org/archive/html/qemu-devel/2017-07/msg06769.html

Highlights:

- First, fix a few typos I encountered while working on this (patches 1-3).
  I could send them separately to qemu-trivial if you prefer.
- QHT: use a proper cmp function, instead of just checking pointer values
  to determine equality of matches.
- Use a binary search tree for each TCG region.
- Make l1_map lockless by using cmpxchg
- Introduce page locks (for !user-mode), so that tb_lock is not
  needed when operating on a page
  - Introduce page_collection, to lock a range of pages
- Introduce tb->jmp_lock to protect TB jump lists.
- Remove tb_lock. User-mode uses just mmap_lock and tb->jmp_lock's;
  !user-mode uses the same jump locks as well as page locks.

Performance numbers are in patch 22. We get nice speedups, but I still
see a lot of idling when booting many cores. I suspect it comes from
cross-CPU events (e.g. TLB invalidations), but I need to profile it
better (perf is not good for this; mutrace doesn't quite work). But
anyway that's for another patchset.

You can fetch these changes from:
  https://github.com/cota/qemu/tree/multi-tcg-v4-parallel

Please review!

Thanks,

Emilio

diffstat:

 accel/tcg/cpu-exec.c|  58 +-
 accel/tcg/cputlb.c  |   8 -
 accel/tcg/translate-all.c   | 958 +++
 accel/tcg/translate-all.h   |   6 +-
 disas/arm.c |   2 +-
 docs/devel/multi-thread-tcg.txt |  21 +-
 exec.c  |  25 +-
 include/exec/cpu-common.h   |   1 -
 include/exec/exec-all.h |  86 +-
 include/exec/tb-context.h   |   4 -
 include/qemu/qht.h  |  36 +-
 linux-user/main.c   |   3 -
 linux-user/syscall.c|   2 +-
 tcg/tcg.c   | 205 +
 tcg/tcg.h   |  13 +-
 tests/qht-bench.c   |  18 +-
 tests/test-qht.c|  20 +-
 util/qht.c  |  30 +-
 18 files changed, 1022 insertions(+), 474 deletions(-)

[Qemu-devel] [PATCH 02/22] linux-user: fix 'finshed' typo in comment

Signed-off-by: Emilio G. Cota 
---
 linux-user/syscall.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index bbf7913..b1e993f 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -6231,7 +6231,7 @@ static void *clone_func(void *arg)
 pthread_mutex_lock(>mutex);
 pthread_cond_broadcast(>cond);
 pthread_mutex_unlock(>mutex);
-/* Wait until the parent has finshed initializing the tls state.  */
+/* Wait until the parent has finished initializing the tls state.  */
 pthread_mutex_lock(_lock);
 pthread_mutex_unlock(_lock);
 cpu_loop(env);
-- 
2.7.4

[Qemu-devel] [PATCH 04/22] qht: fix kernel-doc markup in qht.h

While at it, s/stuct/struct/.

Signed-off-by: Emilio G. Cota 
---
 include/qemu/qht.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/include/qemu/qht.h b/include/qemu/qht.h
index 56c2c77..531aa95 100644
--- a/include/qemu/qht.h
+++ b/include/qemu/qht.h
@@ -166,7 +166,7 @@ void qht_iter(struct qht *ht, qht_iter_func_t func, void 
*userp);
 /**
  * qht_statistics_init - Gather statistics from a QHT
  * @ht: QHT to gather statistics from
- * @stats: pointer to a struct qht_stats to be filled in
+ * @stats: pointer to a  qht_stats to be filled in
  *
  * Does NOT need to be called under an RCU read-critical section,
  * since it does not dereference any pointers stored in the hash table.
@@ -177,8 +177,8 @@ void qht_iter(struct qht *ht, qht_iter_func_t func, void 
*userp);
 void qht_statistics_init(struct qht *ht, struct qht_stats *stats);
 
 /**
- * qht_statistics_destroy - Destroy a struct qht_stats
- * @stats: stuct qht_stats to be destroyed
+ * qht_statistics_destroy - Destroy a  qht_stats
+ * @stats:  qht_stats to be destroyed
  *
  * See also: qht_statistics_init().
  */
-- 
2.7.4

[Qemu-devel] [PATCH 09/22] translate-all: introduce iterator macros for tagged TB lists

These will soon gain a couple of users.

Signed-off-by: Emilio G. Cota 
---
 accel/tcg/translate-all.c | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 8f6f8f1..396c10c 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -115,6 +115,25 @@ typedef struct PageDesc {
 #endif
 } PageDesc;
 
+/* list iterators for lists of tagged pointers in TranslationBlock */
+#define tb_for_each_tagged(head, tb, n, field)  \
+for (n = (head) & 1,\
+ tb = (TranslationBlock *)((head) & ~1);\
+ tb;\
+ tb = (TranslationBlock *)tb->field[n], \
+ n = (uintptr_t)tb & 1, \
+ tb = (TranslationBlock *)((uintptr_t)tb & ~1))
+
+/* prev is a *uintptr_t. It allows us to safely remove tb */
+#define tb_for_each_tagged_safe(head, tb, n, field, prev)   \
+for (prev = &(head),\
+ n = *prev & 1, \
+ tb = (TranslationBlock *)(*prev & ~1); \
+ tb;\
+ prev = >field[n],  \
+ n = (uintptr_t)*prev & 1,  \
+ tb = (TranslationBlock *)(*prev & ~1))
+
 /* In system mode we want L1_MAP to be based on ram offsets,
while in user mode we want it to be based on virtual addresses.  */
 #if !defined(CONFIG_USER_ONLY)
-- 
2.7.4

Re: [Qemu-devel] [Qemu-block] [PATCH for-2.10 0/5] block: bdrv_reopen() fixes



On 08/03/2017 11:02 AM, Kevin Wolf wrote:
> This is the first part of some fixes to bdrv_reopen(), which seems
> reasonable enough to merge for 2.10.
> 
> There is much more wrong with bdrv_reopen() currently, especially with
> respect to op blocker permissions (basically the required permissions
> can change based on the options used in bdrv_reopen(), and both things
> affect the whole tree and aren't integrated with each other at all), but
> I have little hope to get this fixed before 2.10, so let's start with
> what is ready now.
> 
> Kevin Wolf (5):
>   block: Fix order in bdrv_replace_child()
>   block: Allow reopen rw without BDRV_O_ALLOW_RDWR
>   block: Set BDRV_O_ALLOW_RDWR during rw reopen
>   qemu-io: Allow reopen read-write
>   qemu-iotests: Test reopen between read-only and read-write
> 
>  include/block/block.h  |  3 +-
>  block.c| 20 +-
>  qemu-io-cmds.c | 19 +++--
>  tests/qemu-iotests/187 | 69 
> ++
>  tests/qemu-iotests/187.out | 18 
>  tests/qemu-iotests/group   |  1 +
>  6 files changed, 120 insertions(+), 10 deletions(-)
>  create mode 100755 tests/qemu-iotests/187
>  create mode 100644 tests/qemu-iotests/187.out
> 

Not that you need it, but I reviewed the series because I wanted to know
what this change was, so while I'm here:

Reviewed-by: John Snow

Re: [Qemu-devel] [Qemu-block] [PATCH 3/4] qcow2: Drop debugging dump_refcounts()

On Mon, Aug 07, 2017 at 03:30:06PM -0500, Eric Blake wrote:
> It's been #if 0'd since its introduction in 2006, commit 585f8587.
> We can revive dead code if we need it, but in the meantime, it has
> bit-rotted (for example, not checking for failure in bdrv_getlength()).
> 
> Signed-off-by: Eric Blake 


Reviewed-by: Jeff Cody 


> ---
>  block/qcow2.c | 21 -
>  1 file changed, 21 deletions(-)
> 
> diff --git a/block/qcow2.c b/block/qcow2.c
> index d7c600b5a2..99407403ea 100644
> --- a/block/qcow2.c
> +++ b/block/qcow2.c
> @@ -3798,27 +3798,6 @@ static ImageInfoSpecific 
> *qcow2_get_specific_info(BlockDriverState *bs)
>  return spec_info;
>  }
> 
> -#if 0
> -static void dump_refcounts(BlockDriverState *bs)
> -{
> -BDRVQcow2State *s = bs->opaque;
> -int64_t nb_clusters, k, k1, size;
> -int refcount;
> -
> -size = bdrv_getlength(bs->file->bs);
> -nb_clusters = size_to_clusters(s, size);
> -for(k = 0; k < nb_clusters;) {
> -k1 = k;
> -refcount = get_refcount(bs, k);
> -k++;
> -while (k < nb_clusters && get_refcount(bs, k) == refcount)
> -k++;
> -printf("%" PRId64 ": refcount=%d nb=%" PRId64 "\n", k, refcount,
> -   k - k1);
> -}
> -}
> -#endif
> -
>  static int qcow2_save_vmstate(BlockDriverState *bs, QEMUIOVector *qiov,
>int64_t pos)
>  {
> -- 
> 2.13.4
> 
>

Re: [Qemu-devel] [Qemu-block] [PATCH 2/4] qcow: Check failure of bdrv_getlength() and bdrv_truncate()

On Mon, Aug 07, 2017 at 03:30:05PM -0500, Eric Blake wrote:
> This also requires changing the return type of get_cluster_offset()
> and adjusting all callers.
> 
> Use osdep.h macros instead of open-coded rounding while in the
> area.
> 
> Reported-by: Markus Armbruster 
> Signed-off-by: Eric Blake 
> ---
>  block/qcow.c | 64 
> ++--
>  1 file changed, 45 insertions(+), 19 deletions(-)
> 
> diff --git a/block/qcow.c b/block/qcow.c
> index c08cdc4a7b..937023d447 100644
> --- a/block/qcow.c
> +++ b/block/qcow.c
> @@ -347,16 +347,17 @@ static int qcow_reopen_prepare(BDRVReopenState *state,
>   * 'compressed_size'. 'compressed_size' must be > 0 and <
>   * cluster_size
>   *
> - * return 0 if not allocated.
> + * return 0 if not allocated, -errno on failure.
>   */
> -static uint64_t get_cluster_offset(BlockDriverState *bs,
> -   uint64_t offset, int allocate,
> -   int compressed_size,
> -   int n_start, int n_end)
> +static int64_t get_cluster_offset(BlockDriverState *bs,
> +  uint64_t offset, int allocate,
> +  int compressed_size,
> +  int n_start, int n_end)
>  {
>  BDRVQcowState *s = bs->opaque;
>  int min_index, i, j, l1_index, l2_index;
> -uint64_t l2_offset, *l2_table, cluster_offset, tmp;
> +int64_t l2_offset, cluster_offset;
> +uint64_t *l2_table, tmp;
>  uint32_t min_count;
>  int new_l2_table;
> 
> @@ -368,8 +369,11 @@ static uint64_t get_cluster_offset(BlockDriverState *bs,
>  return 0;
>  /* allocate a new l2 entry */
>  l2_offset = bdrv_getlength(bs->file->bs);
> +if (l2_offset < 0) {
> +return l2_offset;
> +}
>  /* round to cluster size */
> -l2_offset = (l2_offset + s->cluster_size - 1) & ~(s->cluster_size - 
> 1);
> +l2_offset = QEMU_ALIGN_UP(l2_offset, s->cluster_size);
>  /* update the L1 entry */
>  s->l1_table[l1_index] = l2_offset;
>  tmp = cpu_to_be64(l2_offset);
> @@ -430,8 +434,10 @@ static uint64_t get_cluster_offset(BlockDriverState *bs,
>  if (decompress_cluster(bs, cluster_offset) < 0)
>  return 0;
>  cluster_offset = bdrv_getlength(bs->file->bs);
> -cluster_offset = (cluster_offset + s->cluster_size - 1) &
> -~(s->cluster_size - 1);
> +if (cluster_offset < 0) {
> +return cluster_offset;
> +}
> +cluster_offset = QEMU_ALIGN_UP(cluster_offset, s->cluster_size);
>  /* write the cluster content */
>  if (bdrv_pwrite(bs->file, cluster_offset, s->cluster_cache,
>  s->cluster_size) !=
> @@ -439,12 +445,19 @@ static uint64_t get_cluster_offset(BlockDriverState *bs,
>  return -1;
>  } else {
>  cluster_offset = bdrv_getlength(bs->file->bs);
> +if (cluster_offset < 0) {
> +return cluster_offset;
> +}
>  if (allocate == 1) {
> +int ret;
> +
>  /* round to cluster size */
> -cluster_offset = (cluster_offset + s->cluster_size - 1) &
> -~(s->cluster_size - 1);
> -bdrv_truncate(bs->file, cluster_offset + s->cluster_size,
> -  PREALLOC_MODE_OFF, NULL);
> +cluster_offset = QEMU_ALIGN_UP(cluster_offset, 
> s->cluster_size);
> +ret = bdrv_truncate(bs->file, cluster_offset + 
> s->cluster_size,
> +PREALLOC_MODE_OFF, NULL);

My r-b stands, but do you think it is worth checking for
(cluster_offset + s->cluster_size) > INT64_MAX?

> +if (ret < 0) {
> +return ret;
> +}
>  /* if encrypted, we must initialize the cluster
> content which won't be written */
>  if (bs->encrypted &&
> @@ -491,11 +504,14 @@ static int64_t coroutine_fn 
> qcow_co_get_block_status(BlockDriverState *bs,
>  {
>  BDRVQcowState *s = bs->opaque;
>  int index_in_cluster, n;
> -uint64_t cluster_offset;
> +int64_t cluster_offset;
> 
>  qemu_co_mutex_lock(>lock);
>  cluster_offset = get_cluster_offset(bs, sector_num << 9, 0, 0, 0, 0);
>  qemu_co_mutex_unlock(>lock);
> +if (cluster_offset < 0) {
> +return cluster_offset;
> +}
>  index_in_cluster = sector_num & (s->cluster_sectors - 1);
>  n = s->cluster_sectors - index_in_cluster;
>  if (n > nb_sectors)
> @@ -567,7 +583,7 @@ static coroutine_fn int qcow_co_readv(BlockDriverState 
> *bs, int64_t sector_num,
>  BDRVQcowState *s = bs->opaque;
>  int index_in_cluster;
>  int ret =

Re: [Qemu-devel] [Qemu-block] [PATCH 4/4] qcow2: Check failure of bdrv_getlength()

On Mon, Aug 07, 2017 at 03:30:07PM -0500, Eric Blake wrote:
> qcow2_co_pwritev_compressed() should not call bdrv_truncate()
> if determining the size failed.
> 
> Reported-by: Markus Armbruster 
> Signed-off-by: Eric Blake 



Reviewed-by: Jeff Cody 


> ---
>  block/qcow2.c | 5 -
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/block/qcow2.c b/block/qcow2.c
> index 99407403ea..40ba26c111 100644
> --- a/block/qcow2.c
> +++ b/block/qcow2.c
> @@ -3282,12 +3282,15 @@ qcow2_co_pwritev_compressed(BlockDriverState *bs, 
> uint64_t offset,
>  z_stream strm;
>  int ret, out_len;
>  uint8_t *buf, *out_buf;
> -uint64_t cluster_offset;
> +int64_t cluster_offset;
> 
>  if (bytes == 0) {
>  /* align end of file to a sector boundary to ease reading with
> sector based I/Os */
>  cluster_offset = bdrv_getlength(bs->file->bs);
> +if (cluster_offset < 0) {
> +return cluster_offset;
> +}
>  return bdrv_truncate(bs->file, cluster_offset, PREALLOC_MODE_OFF, 
> NULL);
>  }
> 
> -- 
> 2.13.4
> 
>

Re: [Qemu-devel] [Qemu-block] [PATCH 2/4] qcow: Check failure of bdrv_getlength() and bdrv_truncate()

On Mon, Aug 07, 2017 at 03:30:05PM -0500, Eric Blake wrote:
> This also requires changing the return type of get_cluster_offset()
> and adjusting all callers.
> 
> Use osdep.h macros instead of open-coded rounding while in the
> area.
> 
> Reported-by: Markus Armbruster 
> Signed-off-by: Eric Blake 



Reviewed-by: Jeff Cody 



> ---
>  block/qcow.c | 64 
> ++--
>  1 file changed, 45 insertions(+), 19 deletions(-)
> 
> diff --git a/block/qcow.c b/block/qcow.c
> index c08cdc4a7b..937023d447 100644
> --- a/block/qcow.c
> +++ b/block/qcow.c
> @@ -347,16 +347,17 @@ static int qcow_reopen_prepare(BDRVReopenState *state,
>   * 'compressed_size'. 'compressed_size' must be > 0 and <
>   * cluster_size
>   *
> - * return 0 if not allocated.
> + * return 0 if not allocated, -errno on failure.
>   */
> -static uint64_t get_cluster_offset(BlockDriverState *bs,
> -   uint64_t offset, int allocate,
> -   int compressed_size,
> -   int n_start, int n_end)
> +static int64_t get_cluster_offset(BlockDriverState *bs,
> +  uint64_t offset, int allocate,
> +  int compressed_size,
> +  int n_start, int n_end)
>  {
>  BDRVQcowState *s = bs->opaque;
>  int min_index, i, j, l1_index, l2_index;
> -uint64_t l2_offset, *l2_table, cluster_offset, tmp;
> +int64_t l2_offset, cluster_offset;
> +uint64_t *l2_table, tmp;
>  uint32_t min_count;
>  int new_l2_table;
> 
> @@ -368,8 +369,11 @@ static uint64_t get_cluster_offset(BlockDriverState *bs,
>  return 0;
>  /* allocate a new l2 entry */
>  l2_offset = bdrv_getlength(bs->file->bs);
> +if (l2_offset < 0) {
> +return l2_offset;
> +}
>  /* round to cluster size */
> -l2_offset = (l2_offset + s->cluster_size - 1) & ~(s->cluster_size - 
> 1);
> +l2_offset = QEMU_ALIGN_UP(l2_offset, s->cluster_size);
>  /* update the L1 entry */
>  s->l1_table[l1_index] = l2_offset;
>  tmp = cpu_to_be64(l2_offset);
> @@ -430,8 +434,10 @@ static uint64_t get_cluster_offset(BlockDriverState *bs,
>  if (decompress_cluster(bs, cluster_offset) < 0)
>  return 0;
>  cluster_offset = bdrv_getlength(bs->file->bs);
> -cluster_offset = (cluster_offset + s->cluster_size - 1) &
> -~(s->cluster_size - 1);
> +if (cluster_offset < 0) {
> +return cluster_offset;
> +}
> +cluster_offset = QEMU_ALIGN_UP(cluster_offset, s->cluster_size);
>  /* write the cluster content */
>  if (bdrv_pwrite(bs->file, cluster_offset, s->cluster_cache,
>  s->cluster_size) !=
> @@ -439,12 +445,19 @@ static uint64_t get_cluster_offset(BlockDriverState *bs,
>  return -1;
>  } else {
>  cluster_offset = bdrv_getlength(bs->file->bs);
> +if (cluster_offset < 0) {
> +return cluster_offset;
> +}
>  if (allocate == 1) {
> +int ret;
> +
>  /* round to cluster size */
> -cluster_offset = (cluster_offset + s->cluster_size - 1) &
> -~(s->cluster_size - 1);
> -bdrv_truncate(bs->file, cluster_offset + s->cluster_size,
> -  PREALLOC_MODE_OFF, NULL);
> +cluster_offset = QEMU_ALIGN_UP(cluster_offset, 
> s->cluster_size);
> +ret = bdrv_truncate(bs->file, cluster_offset + 
> s->cluster_size,
> +PREALLOC_MODE_OFF, NULL);
> +if (ret < 0) {
> +return ret;
> +}
>  /* if encrypted, we must initialize the cluster
> content which won't be written */
>  if (bs->encrypted &&
> @@ -491,11 +504,14 @@ static int64_t coroutine_fn 
> qcow_co_get_block_status(BlockDriverState *bs,
>  {
>  BDRVQcowState *s = bs->opaque;
>  int index_in_cluster, n;
> -uint64_t cluster_offset;
> +int64_t cluster_offset;
> 
>  qemu_co_mutex_lock(>lock);
>  cluster_offset = get_cluster_offset(bs, sector_num << 9, 0, 0, 0, 0);
>  qemu_co_mutex_unlock(>lock);
> +if (cluster_offset < 0) {
> +return cluster_offset;
> +}
>  index_in_cluster = sector_num & (s->cluster_sectors - 1);
>  n = s->cluster_sectors - index_in_cluster;
>  if (n > nb_sectors)
> @@ -567,7 +583,7 @@ static coroutine_fn int qcow_co_readv(BlockDriverState 
> *bs, int64_t sector_num,
>  BDRVQcowState *s = bs->opaque;
>  int index_in_cluster;
>  int ret = 0, n;
> -uint64_t cluster_offset;
> +int64_t

Re: [Qemu-devel] [Qemu-block] [PATCH 1/4] vpc: Check failure of bdrv_getlength()

On Mon, Aug 07, 2017 at 03:30:04PM -0500, Eric Blake wrote:
> vpc_open() was checking for bdrv_getlength() failure in one, but
> not the other, location.
> 
> Reported-by: Markus Armbruster 
> Signed-off-by: Eric Blake 
> ---
>  block/vpc.c | 9 -
>  1 file changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/block/vpc.c b/block/vpc.c
> index 574879ba7c..468d10ec1c 100644
> --- a/block/vpc.c
> +++ b/block/vpc.c
> @@ -219,6 +219,7 @@ static int vpc_open(BlockDriverState *bs, QDict *options, 
> int flags,
>  uint64_t pagetable_size;
>  int disk_type = VHD_DYNAMIC;
>  int ret;
> +int64_t bs_size;
> 
>  bs->file = bdrv_open_child(NULL, options, "file", bs, _file,
> false, errp);
> @@ -411,7 +412,13 @@ static int vpc_open(BlockDriverState *bs, QDict 
> *options, int flags,
>  }
>  }
> 
> -if (s->free_data_block_offset > bdrv_getlength(bs->file->bs)) {
> +bs_size = bdrv_getlength(bs->file->bs);
> +if (bs_size < 0) {
> +error_setg_errno(errp, -bs_size, "unable to learn image size");
> +ret = bs_size;
> +goto fail;
> +}
> +if (s->free_data_block_offset > bs_size) {
>  error_setg(errp, "block-vpc: free_data_block_offset points after 
> "
>   "the end of file. The image has been 
> truncated.");
>  ret = -EINVAL;
> -- 
> 2.13.4
> 
> 

Reviewed-by: Jeff Cody

Re: [Qemu-devel] [Qemu-arm] [PATCH 1/8] memory.h: Move MemTxResult type to memattrs.h

2017-08-07 Thread Alistair Francis

On Fri, Aug 4, 2017 at 5:59 PM, Edgar E. Iglesias
 wrote:
> On Fri, Aug 04, 2017 at 06:20:42PM +0100, Peter Maydell wrote:
>> Move the MemTxResult type to memattrs.h. We're going to want to
>> use it in cpu/qom.h, which doesn't want to include all of
>> memory.h. In practice MemTxResult and MemTxAttrs are pretty
>> closely linked since both are used for the new-style
>> read_with_attrs and write_with_attrs callbacks, so memattrs.h
>> is a reasonable home for this rather than creating a whole
>> new header file for it.
>>
>> Signed-off-by: Peter Maydell 
>
> Reviewed-by: Edgar E. Iglesias 

Reviewed-by: Alistair Francis 

Thanks,
Alistair

>
>
>> ---
>>  include/exec/memattrs.h | 10 ++
>>  include/exec/memory.h   | 10 --
>>  2 files changed, 10 insertions(+), 10 deletions(-)
>>
>> diff --git a/include/exec/memattrs.h b/include/exec/memattrs.h
>> index e601061..d4a1642 100644
>> --- a/include/exec/memattrs.h
>> +++ b/include/exec/memattrs.h
>> @@ -46,4 +46,14 @@ typedef struct MemTxAttrs {
>>   */
>>  #define MEMTXATTRS_UNSPECIFIED ((MemTxAttrs) { .unspecified = 1 })
>>
>> +/* New-style MMIO accessors can indicate that the transaction failed.
>> + * A zero (MEMTX_OK) response means success; anything else is a failure
>> + * of some kind. The memory subsystem will bitwise-OR together results
>> + * if it is synthesizing an operation from multiple smaller accesses.
>> + */
>> +#define MEMTX_OK 0
>> +#define MEMTX_ERROR (1U << 0) /* device returned an error */
>> +#define MEMTX_DECODE_ERROR  (1U << 1) /* nothing at that address */
>> +typedef uint32_t MemTxResult;
>> +
>>  #endif
>> diff --git a/include/exec/memory.h b/include/exec/memory.h
>> index 400dd44..1dcd312 100644
>> --- a/include/exec/memory.h
>> +++ b/include/exec/memory.h
>> @@ -112,16 +112,6 @@ static inline void iommu_notifier_init(IOMMUNotifier 
>> *n, IOMMUNotify fn,
>>  n->end = end;
>>  }
>>
>> -/* New-style MMIO accessors can indicate that the transaction failed.
>> - * A zero (MEMTX_OK) response means success; anything else is a failure
>> - * of some kind. The memory subsystem will bitwise-OR together results
>> - * if it is synthesizing an operation from multiple smaller accesses.
>> - */
>> -#define MEMTX_OK 0
>> -#define MEMTX_ERROR (1U << 0) /* device returned an error */
>> -#define MEMTX_DECODE_ERROR  (1U << 1) /* nothing at that address */
>> -typedef uint32_t MemTxResult;
>> -
>>  /*
>>   * Memory region callbacks
>>   */
>> --
>> 2.7.4
>>
>>
>

Re: [Qemu-devel] [Qemu-block] [PATCH for-2.10 0/4] More blk_getlength() fixes



On 08/07/2017 04:30 PM, Eric Blake wrote:
> Thanks again to Markus for catching these spots.
> 
> Eric Blake (4):
>   vpc: Check failure of bdrv_getlength()
>   qcow: Check failure of bdrv_getlength() and bdrv_truncate()
>   qcow2: Drop debugging dump_refcounts()
>   qcow2: Check failure of bdrv_getlength()
> 
>  block/qcow.c  | 64 
> +--
>  block/qcow2.c | 26 
>  block/vpc.c   |  9 -
>  3 files changed, 57 insertions(+), 42 deletions(-)
> 

Reviewed-by: John Snow

[Qemu-devel] [PATCH for-2.10] block/nfs: fix mutex assertion in nfs_file_close()

Commit c096358e747e88fc7364e40e3c354ee0bb683960 introduced assertion
checks for when qemu_mutex() functions are called without the
corresponding qemu_mutex_init() having initialized the mutex.

This uncovered a latent bug in qemu's nfs driver - in
nfs_client_close(), the NFSClient structure is overwritten with zeros,
prior to the mutex being destroyed.

Go ahead and destroy the mutex in nfs_client_close(), and change where
we call qemu_mutex_init() so that it is correctly balanced.

There are also a couple of memory leaks obscured by the memset, so this
fixes those as well.

Finally, we should be able to get rid of the memset(), as it isn't
necessary.

Signed-off-by: Jeff Cody 
---
 block/nfs.c | 21 +
 1 file changed, 13 insertions(+), 8 deletions(-)

diff --git a/block/nfs.c b/block/nfs.c
index d8db419..bec16b7 100644
--- a/block/nfs.c
+++ b/block/nfs.c
@@ -433,19 +433,23 @@ static void nfs_client_close(NFSClient *client)
 if (client->context) {
 if (client->fh) {
 nfs_close(client->context, client->fh);
+client->fh = NULL;
 }
 aio_set_fd_handler(client->aio_context, nfs_get_fd(client->context),
false, NULL, NULL, NULL, NULL);
 nfs_destroy_context(client->context);
+client->context = NULL;
 }
-memset(client, 0, sizeof(NFSClient));
-}
-
-static void nfs_file_close(BlockDriverState *bs)
-{
-NFSClient *client = bs->opaque;
-nfs_client_close(client);
+g_free(client->path);
 qemu_mutex_destroy(>mutex);
+qapi_free_NFSServer(client->server);
+client->server = NULL;
+}
+
+static void nfs_file_close(BlockDriverState *bs)
+{
+NFSClient *client = bs->opaque;
+nfs_client_close(client);
 }
 
 static NFSServer *nfs_config(QDict *options, Error **errp)
@@ -498,6 +502,7 @@ static int64_t nfs_client_open(NFSClient *client, QDict 
*options,
 struct stat st;
 char *file = NULL, *strp = NULL;
 
+qemu_mutex_init(>mutex);
 opts = qemu_opts_create(_opts, NULL, 0, _abort);
 qemu_opts_absorb_qdict(opts, options, _err);
 if (local_err) {
@@ -660,7 +665,7 @@ static int nfs_file_open(BlockDriverState *bs, QDict 
*options, int flags,
 if (ret < 0) {
 return ret;
 }
-qemu_mutex_init(>mutex);
+
 bs->total_sectors = ret;
 ret = 0;
 return ret;
-- 
2.9.4

Re: [Qemu-devel] [PATCH for-2.11 0/5] qmp-shell non-interactive mode, delete scripts/qmp/qmp



On 08/04/2017 05:36 PM, Eduardo Habkost wrote:
> This series adds the ability to run QMP commands
> non-interactively to qmp-shell, and deletes scripts/qmp/qmp.
> 
> Eduardo Habkost (5):
>   qmp-shell: Use argparse module
>   qmp-shell: Pass split cmdargs to __build_cmd()
>   qmp-shell: execute_cmdargs() method
>   qmp-shell: Accept QMP command as argument
>   Remove scripts/qmp/qmp
> 
>  scripts/qmp/qmp   | 126 
> --
>  scripts/qmp/qmp-shell |  90 
>  2 files changed, 40 insertions(+), 176 deletions(-)
>  delete mode 100755 scripts/qmp/qmp
> 

Assuming there are no reasons to keep the old script around (I certainly
never used it) this appears to be fine. May as well fortify one really
good QMP tool instead of a dozen little half baked ones.

Tested-by: John Snow 
Reviewed-by: John Snow

Re: [Qemu-devel] [PULL 1/2] hw/i386: allow SHPC for Q35 machine

On Tue, Aug 08, 2017 at 12:40:08AM +0300, Michael S. Tsirkin wrote:
> From: Aleksandr Bezzubikov 
> 
> Unmask previously masked SHPC feature in _OSC method.
> 
> Signed-off-by: Aleksandr Bezzubikov 
> Reviewed-by: Marcel Apfelbaum 
> Reviewed-by: Michael S. Tsirkin 
> Signed-off-by: Michael S. Tsirkin 

Cc: qemu-sta...@nongnu.org

> ---
>  hw/i386/acpi-build.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> index b9c245c..98dd424 100644
> --- a/hw/i386/acpi-build.c
> +++ b/hw/i386/acpi-build.c
> @@ -1862,9 +1862,9 @@ static Aml *build_q35_osc_method(void)
>  
>  /*
>   * Always allow native PME, AER (no dependencies)
> - * Never allow SHPC (no SHPC controller in this system)
> + * Allow SHPC (PCI bridges can have SHPC controller)
>   */
> -aml_append(if_ctx, aml_and(a_ctrl, aml_int(0x1D), a_ctrl));
> +aml_append(if_ctx, aml_and(a_ctrl, aml_int(0x1F), a_ctrl));
>  
>  if_ctx2 = aml_if(aml_lnot(aml_equal(aml_arg(1), aml_int(1;
>  /* Unknown revision */
> -- 
> MST

[Qemu-devel] [PULL 2/2] cpu: add APIs to allocate/free CPU environment

These will be implemented and then used by follow-up patches.

Signed-off-by: Michael S. Tsirkin 
---
 include/qom/cpu.h | 31 +++
 qom/cpu.c | 34 ++
 2 files changed, 65 insertions(+)

diff --git a/include/qom/cpu.h b/include/qom/cpu.h
index 25eefea..e9d30c5 100644
--- a/include/qom/cpu.h
+++ b/include/qom/cpu.h
@@ -162,6 +162,10 @@ typedef struct CPUClass {
 void (*dump_statistics)(CPUState *cpu, FILE *f,
 fprintf_function cpu_fprintf, int flags);
 int64_t (*get_arch_id)(CPUState *cpu);
+void * (*alloc_env)(CPUState *cpu);
+void (*get_env)(CPUState *cpu, void *env);
+void (*set_env)(CPUState *cpu, void *env);
+void (*free_env)(CPUState *cpu, void *env);
 bool (*get_paging_enabled)(const CPUState *cpu);
 void (*get_memory_mapping)(CPUState *cpu, MemoryMappingList *list,
Error **errp);
@@ -440,6 +444,33 @@ extern bool mttcg_enabled;
 #define qemu_tcg_mttcg_enabled() (mttcg_enabled)
 
 /**
+ * cpu_alloc_env: allocate CPU environment structure
+ * @cpu: allocate environment structure for this CPU
+ */
+void *cpu_alloc_env(CPUState *cpu);
+
+/**
+ * cpu_get_env: retrieve CPU environment structure
+ * @cpu: CPU to use
+ * @env: environment structure to use
+ */
+void cpu_get_env(CPUState *cpu, void *env);
+
+/**
+ * cpu_set_env: switch to given CPU environment
+ * @cpu: CPU to use
+ * @env: environment structure to use
+ */
+void cpu_set_env(CPUState *cpu, void *env);
+
+/**
+ * cpu_free_env: free CPU environment structure
+ * @cpu: free environment structure for this CPU
+ * @env: structure to free
+ */
+void cpu_free_env(CPUState *cpu, void *env);
+
+/**
  * cpu_paging_enabled:
  * @cpu: The CPU whose state is to be inspected.
  *
diff --git a/qom/cpu.c b/qom/cpu.c
index 4f38db0..9201fd9 100644
--- a/qom/cpu.c
+++ b/qom/cpu.c
@@ -89,6 +89,40 @@ out:
 return cpu;
 }
 
+void *cpu_alloc_env(CPUState *cpu)
+{
+CPUClass *cc = CPU_GET_CLASS(cpu);
+
+return cc->alloc_env ? cc->alloc_env(cpu) : NULL;
+}
+
+void cpu_get_env(CPUState *cpu, void *env)
+{
+CPUClass *cc = CPU_GET_CLASS(cpu);
+
+if (cc->get_env) {
+cc->get_env(cpu, env);
+}
+}
+
+void cpu_set_env(CPUState *cpu, void *env)
+{
+CPUClass *cc = CPU_GET_CLASS(cpu);
+
+if (cc->set_env) {
+cc->set_env(cpu, env);
+}
+}
+
+void cpu_free_env(CPUState *cpu, void *env)
+{
+CPUClass *cc = CPU_GET_CLASS(cpu);
+
+if (cc->free_env) {
+cc->free_env(cpu, env);
+}
+}
+
 bool cpu_paging_enabled(const CPUState *cpu)
 {
 CPUClass *cc = CPU_GET_CLASS(cpu);
-- 
MST

[Qemu-devel] [PULL 1/2] hw/i386: allow SHPC for Q35 machine

From: Aleksandr Bezzubikov 

Unmask previously masked SHPC feature in _OSC method.

Signed-off-by: Aleksandr Bezzubikov 
Reviewed-by: Marcel Apfelbaum 
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
 hw/i386/acpi-build.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index b9c245c..98dd424 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -1862,9 +1862,9 @@ static Aml *build_q35_osc_method(void)
 
 /*
  * Always allow native PME, AER (no dependencies)
- * Never allow SHPC (no SHPC controller in this system)
+ * Allow SHPC (PCI bridges can have SHPC controller)
  */
-aml_append(if_ctx, aml_and(a_ctrl, aml_int(0x1D), a_ctrl));
+aml_append(if_ctx, aml_and(a_ctrl, aml_int(0x1F), a_ctrl));
 
 if_ctx2 = aml_if(aml_lnot(aml_equal(aml_arg(1), aml_int(1;
 /* Unknown revision */
-- 
MST

[Qemu-devel] [PULL 0/2] virtio: fix for rc2

The following changes since commit e6a74868d92f858ac33915b6772999d7de2fd288:

  build-sys: add --disable-vhost-user (2017-08-03 15:55:41 +0300)

are available in the git repository at:

  git://git.kernel.org/pub/scm/virt/kvm/mst/qemu.git tags/for_upstream

for you to fetch changes up to e2a7f28693aea7e194ec1435697ec4feb24f8a6f:

  cpu: add APIs to allocate/free CPU environment (2017-08-08 00:31:09 +0300)


virtio: fix for rc2

It turns out there's a way to setup SHPC on Q35: just put
a PCI to PCI bridge behind a DMI to PCI one. Our _OSC is
thus incorrect.

Signed-off-by: Michael S. Tsirkin 


Aleksandr Bezzubikov (1):
  hw/i386: allow SHPC for Q35 machine

Michael S. Tsirkin (1):
  cpu: add APIs to allocate/free CPU environment

 include/qom/cpu.h| 31 +++
 hw/i386/acpi-build.c |  4 ++--
 qom/cpu.c| 34 ++
 3 files changed, 67 insertions(+), 2 deletions(-)

Re: [Qemu-devel] [PATCH for-2.10] throttle: Make LeakyBucket.avg and LeakyBucket.max integer types

On 08/07/2017 11:15 AM, Alberto Garcia wrote:
> Both the throttling limits set with the throttling.iops-* and
> throttling.bps-* options and their QMP equivalents defined in the
> BlockIOThrottle struct are integer values.
> 
> Those limits are also reported in the BlockDeviceInfo struct and they
> are integers there as well.
> 
> Therefore there's no reason to store them internally as double and do
> the conversion everytime we're setting or querying them, so this patch
> uses int64_t for those types.
> 
> LeakyBucket.level and LeakyBucket.burst_level do however remain double
> because their value changes depending on the fraction of time elapsed
> since the previous I/O operation.
> 
> There's one particular instance of the previous code where bkt->max
> could have a non-integer value: that's in throttle_fix_bucket() when
> bkt->max is initialized to bkt->avg / 10. This is now an integer
> division and the result is rounded. We don't need to worry about this
> because:
> 
>a) with the magnitudes we're dealing with (bytes per second, I/O
>   operations per second) the limits are likely to be always
>   multiples of 10.
> 
>b) even if they weren't this doesn't affect the actual limits, only
>   the algorithm that makes the throttling smoother.
> 
> Signed-off-by: Alberto Garcia 
> ---
>  include/qemu/throttle.h | 4 ++--
>  util/throttle.c | 7 ++-
>  2 files changed, 4 insertions(+), 7 deletions(-)

Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [Qemu-arm] [PATCH 2/2] loader: Ignore zero-sized ELF segments


On 08/07/2017 11:39 AM, Peter Maydell wrote:

Some ELF files have program headers that specify segments that
are of zero size. Ignore them, rather than trying to create
zero-length ROM blobs for them, because the zero-length blob
can falsely trigger the overlapping-ROM-blobs check.

Signed-off-by: Peter Maydell 


Reviewed-by: Philippe Mathieu-Daudé 


---
  include/hw/elf_ops.h | 24 +---
  1 file changed, 17 insertions(+), 7 deletions(-)

diff --git a/include/hw/elf_ops.h b/include/hw/elf_ops.h
index 2e526d3..d192e7e 100644
--- a/include/hw/elf_ops.h
+++ b/include/hw/elf_ops.h
@@ -451,14 +451,24 @@ static int glue(load_elf, SZ)(const char *name, int fd,
  *pentry = ehdr.e_entry - ph->p_vaddr + ph->p_paddr;
  }
  
-if (load_rom) {

-snprintf(label, sizeof(label), "phdr #%d: %s", i, name);
-
-/* rom_add_elf_program() seize the ownership of 'data' */
-rom_add_elf_program(label, data, file_size, mem_size, addr, 
as);
-} else {
-cpu_physical_memory_write(addr, data, file_size);
+if (mem_size == 0) {
+/* Some ELF files really do have segments of zero size;
+ * just ignore them rather than trying to create empty
+ * ROM blobs, because the zero-length blob can falsely
+ * trigger the overlapping-ROM-blobs check.
+ */
  g_free(data);
+} else {
+if (load_rom) {
+snprintf(label, sizeof(label), "phdr #%d: %s", i, name);
+
+/* rom_add_elf_program() seize the ownership of 'data' */
+rom_add_elf_program(label, data, file_size, mem_size,
+addr, as);
+} else {
+cpu_physical_memory_write(addr, data, file_size);
+g_free(data);
+}
  }
  
  total_size += mem_size;

Re: [Qemu-devel] [PATCH 2/4] qcow: Check failure of bdrv_getlength() and bdrv_truncate()


On 08/07/2017 05:30 PM, Eric Blake wrote:

This also requires changing the return type of get_cluster_offset()
and adjusting all callers.

Use osdep.h macros instead of open-coded rounding while in the
area.

Reported-by: Markus Armbruster 
Signed-off-by: Eric Blake 


Reviewed-by: Philippe Mathieu-Daudé 


---
  block/qcow.c | 64 ++--
  1 file changed, 45 insertions(+), 19 deletions(-)

diff --git a/block/qcow.c b/block/qcow.c
index c08cdc4a7b..937023d447 100644
--- a/block/qcow.c
+++ b/block/qcow.c
@@ -347,16 +347,17 @@ static int qcow_reopen_prepare(BDRVReopenState *state,
   * 'compressed_size'. 'compressed_size' must be > 0 and <
   * cluster_size
   *
- * return 0 if not allocated.
+ * return 0 if not allocated, -errno on failure.
   */
-static uint64_t get_cluster_offset(BlockDriverState *bs,
-   uint64_t offset, int allocate,
-   int compressed_size,
-   int n_start, int n_end)
+static int64_t get_cluster_offset(BlockDriverState *bs,
+  uint64_t offset, int allocate,
+  int compressed_size,
+  int n_start, int n_end)
  {
  BDRVQcowState *s = bs->opaque;
  int min_index, i, j, l1_index, l2_index;
-uint64_t l2_offset, *l2_table, cluster_offset, tmp;
+int64_t l2_offset, cluster_offset;
+uint64_t *l2_table, tmp;
  uint32_t min_count;
  int new_l2_table;

@@ -368,8 +369,11 @@ static uint64_t get_cluster_offset(BlockDriverState *bs,
  return 0;
  /* allocate a new l2 entry */
  l2_offset = bdrv_getlength(bs->file->bs);
+if (l2_offset < 0) {
+return l2_offset;
+}
  /* round to cluster size */
-l2_offset = (l2_offset + s->cluster_size - 1) & ~(s->cluster_size - 1);
+l2_offset = QEMU_ALIGN_UP(l2_offset, s->cluster_size);
  /* update the L1 entry */
  s->l1_table[l1_index] = l2_offset;
  tmp = cpu_to_be64(l2_offset);
@@ -430,8 +434,10 @@ static uint64_t get_cluster_offset(BlockDriverState *bs,
  if (decompress_cluster(bs, cluster_offset) < 0)
  return 0;
  cluster_offset = bdrv_getlength(bs->file->bs);
-cluster_offset = (cluster_offset + s->cluster_size - 1) &
-~(s->cluster_size - 1);
+if (cluster_offset < 0) {
+return cluster_offset;
+}
+cluster_offset = QEMU_ALIGN_UP(cluster_offset, s->cluster_size);
  /* write the cluster content */
  if (bdrv_pwrite(bs->file, cluster_offset, s->cluster_cache,
  s->cluster_size) !=
@@ -439,12 +445,19 @@ static uint64_t get_cluster_offset(BlockDriverState *bs,
  return -1;
  } else {
  cluster_offset = bdrv_getlength(bs->file->bs);
+if (cluster_offset < 0) {
+return cluster_offset;
+}
  if (allocate == 1) {
+int ret;
+
  /* round to cluster size */
-cluster_offset = (cluster_offset + s->cluster_size - 1) &
-~(s->cluster_size - 1);
-bdrv_truncate(bs->file, cluster_offset + s->cluster_size,
-  PREALLOC_MODE_OFF, NULL);
+cluster_offset = QEMU_ALIGN_UP(cluster_offset, 
s->cluster_size);
+ret = bdrv_truncate(bs->file, cluster_offset + s->cluster_size,
+PREALLOC_MODE_OFF, NULL);
+if (ret < 0) {
+return ret;
+}
  /* if encrypted, we must initialize the cluster
 content which won't be written */
  if (bs->encrypted &&
@@ -491,11 +504,14 @@ static int64_t coroutine_fn 
qcow_co_get_block_status(BlockDriverState *bs,
  {
  BDRVQcowState *s = bs->opaque;
  int index_in_cluster, n;
-uint64_t cluster_offset;
+int64_t cluster_offset;

  qemu_co_mutex_lock(>lock);
  cluster_offset = get_cluster_offset(bs, sector_num << 9, 0, 0, 0, 0);
  qemu_co_mutex_unlock(>lock);
+if (cluster_offset < 0) {
+return cluster_offset;
+}
  index_in_cluster = sector_num & (s->cluster_sectors - 1);
  n = s->cluster_sectors - index_in_cluster;
  if (n > nb_sectors)
@@ -567,7 +583,7 @@ static coroutine_fn int qcow_co_readv(BlockDriverState *bs, 
int64_t sector_num,
  BDRVQcowState *s = bs->opaque;
  int index_in_cluster;
  int ret = 0, n;
-uint64_t cluster_offset;
+int64_t cluster_offset;
  struct iovec hd_iov;
  QEMUIOVector hd_qiov;
  uint8_t *buf;
@@ -588,8 +604,10 @@ static coroutine_fn int qcow_co_readv(BlockDriverState 
*bs, int64_t sector_num,

  while

Re: [Qemu-devel] [PATCH 4/4] qcow2: Check failure of bdrv_getlength()


On 08/07/2017 05:30 PM, Eric Blake wrote:

qcow2_co_pwritev_compressed() should not call bdrv_truncate()
if determining the size failed.

Reported-by: Markus Armbruster 
Signed-off-by: Eric Blake 


Reviewed-by: Philippe Mathieu-Daudé 


---
  block/qcow2.c | 5 -
  1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 99407403ea..40ba26c111 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -3282,12 +3282,15 @@ qcow2_co_pwritev_compressed(BlockDriverState *bs, 
uint64_t offset,
  z_stream strm;
  int ret, out_len;
  uint8_t *buf, *out_buf;
-uint64_t cluster_offset;
+int64_t cluster_offset;

  if (bytes == 0) {
  /* align end of file to a sector boundary to ease reading with
 sector based I/Os */
  cluster_offset = bdrv_getlength(bs->file->bs);
+if (cluster_offset < 0) {
+return cluster_offset;
+}
  return bdrv_truncate(bs->file, cluster_offset, PREALLOC_MODE_OFF, 
NULL);
  }

[Qemu-devel] [Bug 1709170] [NEW] QEMU fails to honor O_TMPFILE

2017-08-07 Thread Thiago Macieira

Public bug reported:

When making a call like

  open("/tmp", O_TMPFILE | O_RDWR);

under QEMU, we ged -EISDIR.

Under any kernel 3.11 or later, we are supposed to get an unnamed file
in /tmp. In case the filesystem for /tmp does not support unnamed files,
we are supposed to get EOPNOTSUPP.

[I don't know the QEMU version, since this happened in a system I don't
have access to]

** Affects: qemu
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1709170

Title:
  QEMU fails to honor O_TMPFILE

Status in QEMU:
  New

Bug description:
  When making a call like

open("/tmp", O_TMPFILE | O_RDWR);

  under QEMU, we ged -EISDIR.

  Under any kernel 3.11 or later, we are supposed to get an unnamed file
  in /tmp. In case the filesystem for /tmp does not support unnamed
  files, we are supposed to get EOPNOTSUPP.

  [I don't know the QEMU version, since this happened in a system I
  don't have access to]

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1709170/+subscriptions

Re: [Qemu-devel] [PATCH 3/4] qcow2: Drop debugging dump_refcounts()


On 08/07/2017 05:30 PM, Eric Blake wrote:

It's been #if 0'd since its introduction in 2006, commit 585f8587.
We can revive dead code if we need it, but in the meantime, it has
bit-rotted (for example, not checking for failure in bdrv_getlength()).

Signed-off-by: Eric Blake 


Reviewed-by: Philippe Mathieu-Daudé 


--- >   block/qcow2.c | 21 -
  1 file changed, 21 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index d7c600b5a2..99407403ea 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -3798,27 +3798,6 @@ static ImageInfoSpecific 
*qcow2_get_specific_info(BlockDriverState *bs)
  return spec_info;
  }

-#if 0
-static void dump_refcounts(BlockDriverState *bs)
-{
-BDRVQcow2State *s = bs->opaque;
-int64_t nb_clusters, k, k1, size;
-int refcount;
-
-size = bdrv_getlength(bs->file->bs);
-nb_clusters = size_to_clusters(s, size);
-for(k = 0; k < nb_clusters;) {
-k1 = k;
-refcount = get_refcount(bs, k);
-k++;
-while (k < nb_clusters && get_refcount(bs, k) == refcount)
-k++;
-printf("%" PRId64 ": refcount=%d nb=%" PRId64 "\n", k, refcount,
-   k - k1);
-}
-}
-#endif
-
  static int qcow2_save_vmstate(BlockDriverState *bs, QEMUIOVector *qiov,
int64_t pos)
  {

Re: [Qemu-devel] [PATCH] tests/pxe: Check virtio-net-ccw on s390x

On Thu, Aug 03, 2017 at 03:30:19PM +0200, Thomas Huth wrote:
> Now that we've got a firmware that can do TFTP booting on s390x (i.e.
> the pc-bios/s390-netboot.img), we can enable the PXE tester for this
> architecture, too.
> 
> Signed-off-by: Thomas Huth 
> ---
>  Since this only adds a test, I guess this could still be included
>  for 2.10 ... otherwise, I think it's also OK to simply postpone this
>  to 2.11 instead

I think 2.11 is preferable.

>  tests/Makefile.include |  1 +
>  tests/boot-sector.c| 25 ++---
>  tests/pxe-test.c   |  7 +++
>  3 files changed, 30 insertions(+), 3 deletions(-)
> 
> diff --git a/tests/Makefile.include b/tests/Makefile.include
> index 59e536b..f9af5e3 100644
> --- a/tests/Makefile.include
> +++ b/tests/Makefile.include
> @@ -337,6 +337,7 @@ check-qtest-microblazeel-y = $(check-qtest-microblaze-y)
>  check-qtest-xtensaeb-y = $(check-qtest-xtensa-y)
>  
>  check-qtest-s390x-y = tests/boot-serial-test$(EXESUF)
> +check-qtest-s390x-y += tests/pxe-test$(EXESUF)
>  
>  check-qtest-generic-y += tests/qom-test$(EXESUF)
>  check-qtest-generic-y += tests/test-hmp$(EXESUF)
> diff --git a/tests/boot-sector.c b/tests/boot-sector.c
> index e3880f4..87a8fb5 100644
> --- a/tests/boot-sector.c
> +++ b/tests/boot-sector.c
> @@ -21,6 +21,7 @@
>  #define SIGNATURE 0xdead
>  #define SIGNATURE_OFFSET 0x10
>  #define BOOT_SECTOR_ADDRESS 0x7c00
> +#define SIGNATURE_ADDR (BOOT_SECTOR_ADDRESS + SIGNATURE_OFFSET)
>  
>  /* Boot sector code: write SIGNATURE into memory,
>   * then halt.
> @@ -73,6 +74,7 @@ int boot_sector_init(char *fname)
>  {
>  int fd, ret;
>  size_t len = sizeof boot_sector;
> +const char *arch = qtest_get_arch();
>  
>  fd = mkstemp(fname);
>  if (fd < 0) {
> @@ -81,10 +83,27 @@ int boot_sector_init(char *fname)
>  }
>  
>  /* For Open Firmware based system, we can use a Forth script instead */
> -if (strcmp(qtest_get_arch(), "ppc64") == 0) {
> +if (g_str_equal(arch, "ppc64")) {
>  len = sprintf((char *)boot_sector, "\\ Bootscript\n%x %x c! %x %x 
> c!\n",
> -LOW(SIGNATURE), BOOT_SECTOR_ADDRESS + SIGNATURE_OFFSET,
> -HIGH(SIGNATURE), BOOT_SECTOR_ADDRESS + SIGNATURE_OFFSET + 1);
> +LOW(SIGNATURE), SIGNATURE_ADDR,
> +HIGH(SIGNATURE), SIGNATURE_ADDR + 1);
> +} else if (g_str_equal(arch, "s390x")) {
> +/* For s390x, fake a kernel signature */
> +const uint8_t psw[] = {
> +0x00, 0x08, 0x00, 0x00, 0x80, 0x01, 0x00, 0x00
> +};
> +const uint8_t code[] = {
> +0xa7, 0xf4, 0x00, 0x0a, /* j 0x10010 */
> +0x00, 0x00, 0x00, 0x00,
> +'S', '3', '9', '0',
> +'E', 'P', 0x00, 0x01,
> +0xa7, 0x38, HIGH(SIGNATURE_ADDR), LOW(SIGNATURE_ADDR), /* lhi r3 
> */
> +0xa7, 0x48, LOW(SIGNATURE), HIGH(SIGNATURE),/* lhi r4,0xadde 
> */
> +0x40, 0x40, 0x30, 0x00, /* sth r4,0(r3) */
> +0xa7, 0xf4, 0xff, 0xfa  /* j 0x10010 */
> +};
> +memcpy(boot_sector, psw, 8);
> +memcpy(_sector[0x1], code, sizeof(code));


Could we avoid overwriting boot sector?
We really should have
x86_boot_sector
ppc64_boot_sector
s390_boot_sector

And write out the correct thing.

Also:
 * Q35 machine requires a minimum 0x7e000 bytes disk.
 * (bug or feature?)

probably does not apply to s390x, does it?

BTW I'm guessing it actually does not apply to the pxe test
at all.

>  }
>  
>  ret = write(fd, boot_sector, len);
> diff --git a/tests/pxe-test.c b/tests/pxe-test.c
> index cf6e225..0d70afc 100644
> --- a/tests/pxe-test.c
> +++ b/tests/pxe-test.c
> @@ -51,6 +51,11 @@ static void test_pxe_spapr_vlan(void)
>  test_pxe_one("-device spapr-vlan,netdev=" NETNAME, true);
>  }
>  
> +static void test_pxe_virtio_ccw(void)
> +{
> +test_pxe_one("-device virtio-net-ccw,bootindex=1,netdev=" NETNAME, 
> false);
> +}
> +
>  int main(int argc, char *argv[])
>  {
>  int ret;
> @@ -68,6 +73,8 @@ int main(int argc, char *argv[])
>  } else if (strcmp(arch, "ppc64") == 0) {
>  qtest_add_func("pxe/virtio", test_pxe_virtio_pci);
>  qtest_add_func("pxe/spapr-vlan", test_pxe_spapr_vlan);
> +} else if (g_str_equal(arch, "s390x")) {
> +qtest_add_func("pxe/virtio-ccw", test_pxe_virtio_ccw);
>  }
>  ret = g_test_run();
>  boot_sector_cleanup(disk);
> -- 
> 1.8.3.1

Re: [Qemu-devel] [PATCH 1/4] vpc: Check failure of bdrv_getlength()


On 08/07/2017 05:30 PM, Eric Blake wrote:

vpc_open() was checking for bdrv_getlength() failure in one, but
not the other, location.

Reported-by: Markus Armbruster 
Signed-off-by: Eric Blake 


Reviewed-by: Philippe Mathieu-Daudé 


---
  block/vpc.c | 9 -
  1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/block/vpc.c b/block/vpc.c
index 574879ba7c..468d10ec1c 100644
--- a/block/vpc.c
+++ b/block/vpc.c
@@ -219,6 +219,7 @@ static int vpc_open(BlockDriverState *bs, QDict *options, 
int flags,
  uint64_t pagetable_size;
  int disk_type = VHD_DYNAMIC;
  int ret;
+int64_t bs_size;

  bs->file = bdrv_open_child(NULL, options, "file", bs, _file,
 false, errp);
@@ -411,7 +412,13 @@ static int vpc_open(BlockDriverState *bs, QDict *options, 
int flags,
  }
  }

-if (s->free_data_block_offset > bdrv_getlength(bs->file->bs)) {
+bs_size = bdrv_getlength(bs->file->bs);
+if (bs_size < 0) {
+error_setg_errno(errp, -bs_size, "unable to learn image size");
+ret = bs_size;
+goto fail;
+}
+if (s->free_data_block_offset > bs_size) {
  error_setg(errp, "block-vpc: free_data_block_offset points after "
   "the end of file. The image has been 
truncated.");
  ret = -EINVAL;

[Qemu-devel] [PATCH 4/4] qcow2: Check failure of bdrv_getlength()

qcow2_co_pwritev_compressed() should not call bdrv_truncate()
if determining the size failed.

Reported-by: Markus Armbruster 
Signed-off-by: Eric Blake 
---
 block/qcow2.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 99407403ea..40ba26c111 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -3282,12 +3282,15 @@ qcow2_co_pwritev_compressed(BlockDriverState *bs, 
uint64_t offset,
 z_stream strm;
 int ret, out_len;
 uint8_t *buf, *out_buf;
-uint64_t cluster_offset;
+int64_t cluster_offset;

 if (bytes == 0) {
 /* align end of file to a sector boundary to ease reading with
sector based I/Os */
 cluster_offset = bdrv_getlength(bs->file->bs);
+if (cluster_offset < 0) {
+return cluster_offset;
+}
 return bdrv_truncate(bs->file, cluster_offset, PREALLOC_MODE_OFF, 
NULL);
 }

-- 
2.13.4

[Qemu-devel] [PATCH 1/4] vpc: Check failure of bdrv_getlength()

vpc_open() was checking for bdrv_getlength() failure in one, but
not the other, location.

Reported-by: Markus Armbruster 
Signed-off-by: Eric Blake 
---
 block/vpc.c | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/block/vpc.c b/block/vpc.c
index 574879ba7c..468d10ec1c 100644
--- a/block/vpc.c
+++ b/block/vpc.c
@@ -219,6 +219,7 @@ static int vpc_open(BlockDriverState *bs, QDict *options, 
int flags,
 uint64_t pagetable_size;
 int disk_type = VHD_DYNAMIC;
 int ret;
+int64_t bs_size;

 bs->file = bdrv_open_child(NULL, options, "file", bs, _file,
false, errp);
@@ -411,7 +412,13 @@ static int vpc_open(BlockDriverState *bs, QDict *options, 
int flags,
 }
 }

-if (s->free_data_block_offset > bdrv_getlength(bs->file->bs)) {
+bs_size = bdrv_getlength(bs->file->bs);
+if (bs_size < 0) {
+error_setg_errno(errp, -bs_size, "unable to learn image size");
+ret = bs_size;
+goto fail;
+}
+if (s->free_data_block_offset > bs_size) {
 error_setg(errp, "block-vpc: free_data_block_offset points after "
  "the end of file. The image has been truncated.");
 ret = -EINVAL;
-- 
2.13.4

[Qemu-devel] [PATCH 3/4] qcow2: Drop debugging dump_refcounts()

It's been #if 0'd since its introduction in 2006, commit 585f8587.
We can revive dead code if we need it, but in the meantime, it has
bit-rotted (for example, not checking for failure in bdrv_getlength()).

Signed-off-by: Eric Blake 
---
 block/qcow2.c | 21 -
 1 file changed, 21 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index d7c600b5a2..99407403ea 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -3798,27 +3798,6 @@ static ImageInfoSpecific 
*qcow2_get_specific_info(BlockDriverState *bs)
 return spec_info;
 }

-#if 0
-static void dump_refcounts(BlockDriverState *bs)
-{
-BDRVQcow2State *s = bs->opaque;
-int64_t nb_clusters, k, k1, size;
-int refcount;
-
-size = bdrv_getlength(bs->file->bs);
-nb_clusters = size_to_clusters(s, size);
-for(k = 0; k < nb_clusters;) {
-k1 = k;
-refcount = get_refcount(bs, k);
-k++;
-while (k < nb_clusters && get_refcount(bs, k) == refcount)
-k++;
-printf("%" PRId64 ": refcount=%d nb=%" PRId64 "\n", k, refcount,
-   k - k1);
-}
-}
-#endif
-
 static int qcow2_save_vmstate(BlockDriverState *bs, QEMUIOVector *qiov,
   int64_t pos)
 {
-- 
2.13.4

[Qemu-devel] [PATCH 2/4] qcow: Check failure of bdrv_getlength() and bdrv_truncate()

This also requires changing the return type of get_cluster_offset()
and adjusting all callers.

Use osdep.h macros instead of open-coded rounding while in the
area.

Reported-by: Markus Armbruster 
Signed-off-by: Eric Blake 
---
 block/qcow.c | 64 ++--
 1 file changed, 45 insertions(+), 19 deletions(-)

diff --git a/block/qcow.c b/block/qcow.c
index c08cdc4a7b..937023d447 100644
--- a/block/qcow.c
+++ b/block/qcow.c
@@ -347,16 +347,17 @@ static int qcow_reopen_prepare(BDRVReopenState *state,
  * 'compressed_size'. 'compressed_size' must be > 0 and <
  * cluster_size
  *
- * return 0 if not allocated.
+ * return 0 if not allocated, -errno on failure.
  */
-static uint64_t get_cluster_offset(BlockDriverState *bs,
-   uint64_t offset, int allocate,
-   int compressed_size,
-   int n_start, int n_end)
+static int64_t get_cluster_offset(BlockDriverState *bs,
+  uint64_t offset, int allocate,
+  int compressed_size,
+  int n_start, int n_end)
 {
 BDRVQcowState *s = bs->opaque;
 int min_index, i, j, l1_index, l2_index;
-uint64_t l2_offset, *l2_table, cluster_offset, tmp;
+int64_t l2_offset, cluster_offset;
+uint64_t *l2_table, tmp;
 uint32_t min_count;
 int new_l2_table;

@@ -368,8 +369,11 @@ static uint64_t get_cluster_offset(BlockDriverState *bs,
 return 0;
 /* allocate a new l2 entry */
 l2_offset = bdrv_getlength(bs->file->bs);
+if (l2_offset < 0) {
+return l2_offset;
+}
 /* round to cluster size */
-l2_offset = (l2_offset + s->cluster_size - 1) & ~(s->cluster_size - 1);
+l2_offset = QEMU_ALIGN_UP(l2_offset, s->cluster_size);
 /* update the L1 entry */
 s->l1_table[l1_index] = l2_offset;
 tmp = cpu_to_be64(l2_offset);
@@ -430,8 +434,10 @@ static uint64_t get_cluster_offset(BlockDriverState *bs,
 if (decompress_cluster(bs, cluster_offset) < 0)
 return 0;
 cluster_offset = bdrv_getlength(bs->file->bs);
-cluster_offset = (cluster_offset + s->cluster_size - 1) &
-~(s->cluster_size - 1);
+if (cluster_offset < 0) {
+return cluster_offset;
+}
+cluster_offset = QEMU_ALIGN_UP(cluster_offset, s->cluster_size);
 /* write the cluster content */
 if (bdrv_pwrite(bs->file, cluster_offset, s->cluster_cache,
 s->cluster_size) !=
@@ -439,12 +445,19 @@ static uint64_t get_cluster_offset(BlockDriverState *bs,
 return -1;
 } else {
 cluster_offset = bdrv_getlength(bs->file->bs);
+if (cluster_offset < 0) {
+return cluster_offset;
+}
 if (allocate == 1) {
+int ret;
+
 /* round to cluster size */
-cluster_offset = (cluster_offset + s->cluster_size - 1) &
-~(s->cluster_size - 1);
-bdrv_truncate(bs->file, cluster_offset + s->cluster_size,
-  PREALLOC_MODE_OFF, NULL);
+cluster_offset = QEMU_ALIGN_UP(cluster_offset, 
s->cluster_size);
+ret = bdrv_truncate(bs->file, cluster_offset + s->cluster_size,
+PREALLOC_MODE_OFF, NULL);
+if (ret < 0) {
+return ret;
+}
 /* if encrypted, we must initialize the cluster
content which won't be written */
 if (bs->encrypted &&
@@ -491,11 +504,14 @@ static int64_t coroutine_fn 
qcow_co_get_block_status(BlockDriverState *bs,
 {
 BDRVQcowState *s = bs->opaque;
 int index_in_cluster, n;
-uint64_t cluster_offset;
+int64_t cluster_offset;

 qemu_co_mutex_lock(>lock);
 cluster_offset = get_cluster_offset(bs, sector_num << 9, 0, 0, 0, 0);
 qemu_co_mutex_unlock(>lock);
+if (cluster_offset < 0) {
+return cluster_offset;
+}
 index_in_cluster = sector_num & (s->cluster_sectors - 1);
 n = s->cluster_sectors - index_in_cluster;
 if (n > nb_sectors)
@@ -567,7 +583,7 @@ static coroutine_fn int qcow_co_readv(BlockDriverState *bs, 
int64_t sector_num,
 BDRVQcowState *s = bs->opaque;
 int index_in_cluster;
 int ret = 0, n;
-uint64_t cluster_offset;
+int64_t cluster_offset;
 struct iovec hd_iov;
 QEMUIOVector hd_qiov;
 uint8_t *buf;
@@ -588,8 +604,10 @@ static coroutine_fn int qcow_co_readv(BlockDriverState 
*bs, int64_t sector_num,

 while (nb_sectors != 0) {
 /* prepare next request */
-cluster_offset = get_cluster_offset(bs, sector_num << 9,
-

[Qemu-devel] [PATCH for-2.10 0/4] More blk_getlength() fixes

Thanks again to Markus for catching these spots.

Eric Blake (4):
  vpc: Check failure of bdrv_getlength()
  qcow: Check failure of bdrv_getlength() and bdrv_truncate()
  qcow2: Drop debugging dump_refcounts()
  qcow2: Check failure of bdrv_getlength()

 block/qcow.c  | 64 +--
 block/qcow2.c | 26 
 block/vpc.c   |  9 -
 3 files changed, 57 insertions(+), 42 deletions(-)

-- 
2.13.4

Re: [Qemu-devel] Virtual RDMA device

2017-08-07 Thread Marcel Apfelbaum

On 01/08/2017 0:52, Michael S. Tsirkin wrote:

On Mon, Jul 31, 2017 at 10:10:15PM +0300, Marcel Apfelbaum wrote:

[...]

This seems to be the mail thread with most CCs.

Hi Michael,

Please copy the upstream lists. virtio-dev, virtualization and
qemu-devel for starters. I'd say linux-rdma as well.

Sure, adding all the mentioned lists.
To recap:

Yuval and I succeeded to implement a QEMU pvrdma device that works
(with some limitations) with VMware's Linux driver.
The git can be found at:
https://github.com/yuvalshaia/qemu/tree/pvrdma.master.ibverbs
The QEMU RFC at:
https://patchwork.kernel.org/patch/9653893/

While we do plan to merge it to QEMU, as a next step it seems
to be a consensus that virtio should be used to implement guest<->host
communication; since the project is not small, we need a plan.

We will do our best to attend the Micro RDMA conference:
-
https://www.linuxplumbersconf.org/2017/rdma-microconference-accepted-into-the-linux-plumbers-conference/

The attached presentation draft is intended to be a starting point
for a discussion on how to continue the project.

I attached a presentation
draft with *current* implementation of the pvrdma
device.(please excuse my presentation skills).

It is supposed to be a starting point for discussing
a virtio-rdma device and for a presentation at
the RDMA Micro-conference.

Please feel free to review/add and point to
what important things have I missed and how
we can continue.

I'd suggest ignoring migration initially, just get
rdma with memory overcommit working. For that:
- come up with a timeline
- decide who works on each of 3 things
1. extend libibverbs
2. layer virtio in qemu on top
3. write guest driver

I want to mention there is another discussion on this matter:
https://lists.gnu.org/archive/html/qemu-devel/2017-07/msg06012.html

That simply lists what is missing in libibverbs to be able to implement
a PV RDMA device with memory overcommit and migration on top.

We should converge somehow.
Once Yuval returns from his vacation we will send a v2.

Thanks,
Marcel

[...]
Thanks,
Marcel

pvrdma.odp
Description: application/vnd.oasis.opendocument.presentation

[Qemu-devel] [PATCH 3/3] Add new functions for whitelisting and their calls

The 'check_updated_properties' function keeps track of properties
that were added/removed from fields across qemu versions. The
'check_updated_sizes' function reduces false positives generated
especially while testing backward migration by keeping a list
of common size/version changes. The 'check_new_sections' function
is used to check for sections that got deprecated or were introduced
in different versions of qemu and will show as false positives while
testing forward migration. Improved the variable names and added
multiple blank newlines to keep Python PEP8 warning away.

Signed-off-by: Deepak Verma 
---
 scripts/vmstate-static-checker.py | 226 ++
 1 file changed, 180 insertions(+), 46 deletions(-)

diff --git a/scripts/vmstate-static-checker.py 
b/scripts/vmstate-static-checker.py
index ae41e44..77e63f1 100755
--- a/scripts/vmstate-static-checker.py
+++ b/scripts/vmstate-static-checker.py
@@ -40,6 +40,107 @@ def bump_taint():
 if taint < 255:
 taint = taint + 1
 
+# Sections gain/lose new fields with time.
+# These are not name changes thats handled by another list.
+# These will be 'missing' or 'not found' in different versions of qemu
+
+
+def check_updated_properties(src_desc, field):
+src_desc = str(src_desc)
+field = str(field)
+updated_property = {
+  'ICH9LPC': ['ICH9LPC/smi_feat'],
+  'ide_bus/error': ['retry_sector_num', 'retry_nsector', 'retry_unit'],
+  'e1000': ['e1000/full_mac_state'],
+  'ich9_pm': ['ich9_pm/tco', 'ich9_pm/cpuhp']
+}
+
+if src_desc in updated_property and field in updated_property[src_desc]:
+return True
+
+return False
+
+
+# A lot of errors are generated due to differences in sizes some of which are 
false positives. This list
+# is used to save those common changes
+def check_updated_sizes(field, old_size, new_size):
+new_sizes_list = {
+'tally_counters.TxOk': [8, 64],
+'intel-iommu': [0, 1],
+'iommu-intel': [0, 1]
+}
+
+if field not in new_sizes_list:
+return False
+
+if (old_size in new_sizes_list[field] and new_size in 
new_sizes_list[field]):
+return True
+
+return False
+
+
+# With time new sections/hardwares supported and old ones are depreciated on
+# chipsets.
+# There is no separate list for new or dead sections as it's relative to which
+# qemu version you compare too.
+# Update this list with such sections.
+# some items in this list might overlap with changed sections names.
+def check_new_sections(sec):
+new_sections_list = [
+'virtio-balloon-device',
+'virtio-rng-device',
+'virtio-scsi-device',
+'virtio-blk-device',
+'virtio-serial-device',
+'virtio-net-device',
+'vhost-vsock-device',
+'virtio-input-host-device',
+'virtio-input-hid-device',
+'virtio-mouse-device',
+'virtio-keyboard-device',
+'virtio-vga',
+'virtio-input-device',
+'virtio-gpu-device',
+'virtio-tablet-device',
+'isa-pcspk',
+'qemu-xhci',
+'base-xhci',
+'vmgenid',
+'intel-iommu',
+'i8257',
+'i82801b11-bridge',
+'ivshmem',
+'ivshmem-doorbell',
+'ivshmem-plain',
+'usb-storage-device',
+'usb-storage-dev',
+'pci-qxl',
+'pci-uhci-usb',
+'pci-piix3',
+'pci-vga',
+'pci-bridge-seat',
+'pcie-root-port',
+'fw_cfg_io',
+'fw_cfg_mem',
+'exynos4210-ehci-usb',
+'sysbus-ehci-usb',
+'tegra2-ehci-usb',
+'kvm-apic',
+'fusbh200-ehci-usb',
+'apic',
+'apic-common',
+'xlnx,ps7-usb',
+'e1000e',
+'e1000-82544gc',
+'e1000-82545em']
+
+if sec in new_sections_list:
+return True
+
+return False
+
+# Fields might change name with time across qemu versions.
+
 
 def check_fields_match(name, s_field, d_field):
 if s_field == d_field:
@@ -286,12 +387,22 @@ def check_fields(src_fields, dest_fields, desc, sec):
 unused_count = s_item["size"] - d_item["size"]
 continue
 
-print('Section "' + sec + '", '
-  'Description "' + desc + '": '
-  'expected field "' + s_item["field"] + '", '
-  'got "' + d_item["field"] + '"; skipping rest')
-bump_taint()
-break
+# commit 20daa90a20d, extra field 'config' was added in newer 
releases
+# there will be a mismatch in the number of fields of irq_state 
and config
+# it's a known false positive so skip it
+if (desc in ["PCIDevice", "PCIEDevice"]):
+

[Qemu-devel] [PATCH 1/3] Fix format and styles; make code more pythonic

- Format fixes, cleaned up the print statement
- Style fixes, e.g. changed "if not x in y" to "if x not in y"
- Improved variable names

Signed-off-by: Deepak Verma 
---
 scripts/vmstate-static-checker.py | 111 +-
 1 file changed, 62 insertions(+), 49 deletions(-)

diff --git a/scripts/vmstate-static-checker.py 
b/scripts/vmstate-static-checker.py
index bcef7ee..b416b66 100755
--- a/scripts/vmstate-static-checker.py
+++ b/scripts/vmstate-static-checker.py
@@ -19,6 +19,11 @@
 # You should have received a copy of the GNU General Public License along
 # with this program; if not, see .
 
+#
+# 2017 Deepak Verma 
+# Added few functions and fields for whitelisting
+#
+
 import argparse
 import json
 import sys
@@ -26,6 +31,7 @@ import sys
 # Count the number of errors found
 taint = 0
 
+
 def bump_taint():
 global taint
 
@@ -92,7 +98,7 @@ def check_fields_match(name, s_field, d_field):
   'io_win_size', 'mig_io_win_size'],
 }
 
-if not name in changed_names:
+if name not in changed_names:
 return False
 
 if s_field in changed_names[name] and d_field in changed_names[name]:
@@ -100,6 +106,7 @@ def check_fields_match(name, s_field, d_field):
 
 return False
 
+
 def get_changed_sec_name(sec):
 # Section names can change -- see commit 292b1634 for an example.
 changes = {
@@ -114,16 +121,17 @@ def get_changed_sec_name(sec):
 return item
 return ""
 
+
 def exists_in_substruct(fields, item):
 # Some QEMU versions moved a few fields inside a substruct.  This
 # kept the on-wire format the same.  This function checks if
 # something got shifted inside a substruct.  For example, the
 # change in commit 1f42d22233b4f3d1a2933ff30e8d6a6d9ee2d08f
 
-if not "Description" in fields:
+if "Description" not in fields:
 return False
 
-if not "Fields" in fields["Description"]:
+if "Fields" not in fields["Description"]:
 return False
 
 substruct_fields = fields["Description"]["Fields"]
@@ -176,10 +184,10 @@ def check_fields(src_fields, dest_fields, desc, sec):
 except StopIteration:
 if d_iter_list == []:
 # We were not in a substruct
-print "Section \"" + sec + "\",",
-print "Description " + "\"" + desc + "\":",
-print "expected field \"" + s_item["field"] + "\",",
-print "while dest has no further fields"
+print('Section "' + sec + '", '
+  'Description "' + desc + '": '
+  'expected field "' + s_item["field"] + '", '
+  'while dest has no further fields')
 bump_taint()
 break
 
@@ -191,30 +199,28 @@ def check_fields(src_fields, dest_fields, desc, sec):
 advance_dest = True
 
 if unused_count != 0:
-if advance_dest == False:
+if not advance_dest:
 unused_count = unused_count - s_item["size"]
 if unused_count == 0:
 advance_dest = True
 continue
 if unused_count < 0:
-print "Section \"" + sec + "\",",
-print "Description \"" + desc + "\":",
-print "unused size mismatch near \"",
-print s_item["field"] + "\""
+print('Section "' + sec + '", '
+  'Description "' + desc + '": '
+  'unused size mismatch near "' + s_item["field"] + 
'"')
 bump_taint()
 break
 continue
 
-if advance_src == False:
+if not advance_src:
 unused_count = unused_count - d_item["size"]
 if unused_count == 0:
 advance_src = True
 continue
 if unused_count < 0:
-print "Section \"" + sec + "\",",
-print "Description \"" + desc + "\":",
-print "unused size mismatch near \"",
-print d_item["field"] + "\""
+print('Section "' + sec + '", '
+  'Description "' + desc + '": '
+  'unused size mismatch near "' + d_item["field"] + 
'"')
 bump_taint()
 break
 continue
@@ -262,16 +268,16 @@ def check_fields(src_fields, dest_fields, desc, sec):
 unused_count = s_item["size"] - d_item["size"]
 continue
 
-print "Section \"" + sec + "\",",
-print "Description \"" + desc + "\":",
-print "expected field \"" + s_item["field"] + "\",",
-print "got \"" +

[Qemu-devel] [PATCH 2/3] Update the existing whitelist

Appended newer fields and introduced new names in the whitelist

Signed-off-by: Deepak Verma 
---
 scripts/vmstate-static-checker.py | 22 --
 1 file changed, 20 insertions(+), 2 deletions(-)

diff --git a/scripts/vmstate-static-checker.py 
b/scripts/vmstate-static-checker.py
index b416b66..ae41e44 100755
--- a/scripts/vmstate-static-checker.py
+++ b/scripts/vmstate-static-checker.py
@@ -49,7 +49,6 @@ def check_fields_match(name, s_field, d_field):
 # is used to whitelist such changes in each section / description.
 changed_names = {
 'apic': ['timer', 'timer_expiry'],
-'e1000': ['dev', 'parent_obj'],
 'ehci': ['dev', 'pcidev'],
 'I440FX': ['dev', 'parent_obj'],
 'ich9_ahci': ['card', 'parent_obj'],
@@ -73,7 +72,6 @@ def check_fields_match(name, s_field, d_field):
  'tmr.timer', 'ar.tmr.timer',
  'tmr.overflow_time', 'ar.tmr.overflow_time',
  'gpe', 'ar.gpe'],
-'rtl8139': ['dev', 'parent_obj'],
 'qxl': ['num_surfaces', 'ssd.num_surfaces'],
 'usb-ccid': ['abProtocolDataStructure', 
'abProtocolDataStructure.data'],
 'usb-host': ['dev', 'parent_obj'],
@@ -96,6 +94,26 @@ def check_fields_match(name, s_field, d_field):
   'mem_win_size', 'mig_mem_win_size',
   'io_win_addr', 'mig_io_win_addr',
   'io_win_size', 'mig_io_win_size'],
+'rtl8139': ['dev', 'parent_obj'],
+'e1000e': ['PCIDevice', 'PCIEDevice', 'intr_state', 
'redhat_7_3_intr_state'],
+'nec-usb-xhci': ['PCIDevice', 'PCIEDevice'],
+'xhci-intr': ['er_full_unused', 'er_full'],
+'e1000': ['dev', 'parent_obj',
+  'tx.ipcss', 'tx.props.ipcss',
+  'tx.ipcso', 'tx.props.ipcso',
+  'tx.ipcse', 'tx.props.ipcse',
+  'tx.tucss', 'tx.props.tucss',
+  'tx.tucso', 'tx.props.tucso',
+  'tx.tucse', 'tx.props.tucse',
+  'tx.paylen', 'tx.props.paylen',
+  'tx.hdr_len', 'tx.props.hdr_len',
+  'tx.mss', 'tx.props.mss',
+  'tx.sum_needed', 'tx.props.sum_needed',
+  'tx.ip', 'tx.props.ip',
+  'tx.tcp', 'tx.props.tcp',
+  'tx.ipcss', 'tx.props.ipcss',
+  'tx.ipcss', 'tx.props.ipcss',
+  ]
 }
 
 if name not in changed_names:
-- 
1.8.3.1

[Qemu-devel] [PATCH 0/3] Vmstate-static-checker.py fix upstream

From: Deepak Verma 

This is an update to the script vmstate-static-checker.py. The whitelist has 
been updated and newer functions have been added to reduce the false
positives generated by the script while testing migration. The code has been
cleaned and updated to follow PEP8 guidelines.

dverma (3):
  Fix format and styles; make code more pythonic
  Update the existing whitelist
  Add new functions for whitelisting and their calls

 scripts/vmstate-static-checker.py | 335 --
 1 file changed, 250 insertions(+), 85 deletions(-)

-- 
1.8.3.1

Re: [Qemu-devel] >256 Virtio-net-pci hotplug Devices

2017-08-07 Thread Kinsella, Ray

Hi Marcel,

Yup - I am using Seabios by default.
I took all the measures from the Kernel time reported in syslog. 
As Seabios wasn't exhibiting any obvious scaling problem. 

Ray K

-Original Message-
From: Marcel Apfelbaum [mailto:mar...@redhat.com] 
Sent: Wednesday, August 2, 2017 5:43 AM
To: Kinsella, Ray ; Kevin O'Connor 
Cc: Tan, Jianfeng ; seab...@seabios.org; Michael 
Tsirkin ; qemu-devel@nongnu.org; Gerd Hoffmann 

Subject: Re: [Qemu-devel] >256 Virtio-net-pci hotplug Devices

It is an issue worth looking into it, one more question, all the measurements 
are from OS boot? Do you use SeaBIOS?
No problems with the firmware?

Thanks,
Marcel

[Qemu-devel] [PATCH v5 8/8] MAINTAINERS: add Dump maintainers

Proposing myself, since I have some familiarity with the code now.

Signed-off-by: Marc-André Lureau 
Acked-by: Laszlo Ersek 
---
 MAINTAINERS | 9 +
 1 file changed, 9 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index ccee28b12d..ed07334ffb 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1286,6 +1286,15 @@ S: Maintained
 F: device_tree.c
 F: include/sysemu/device_tree.h
 
+Dump
+S: Supported
+M: Marc-André Lureau 
+F: dump.c
+F: stubs/dump.c
+F: include/sysemu/dump.h
+F: include/sysemu/dump-arch.h
+F: scripts/dump-guest-memory.py
+
 Error reporting
 M: Markus Armbruster 
 S: Supported
-- 
2.14.0.1.geff633fa0

[Qemu-devel] [PATCH v5 4/8] dump: add guest ELF note

Read the guest ELF PT_NOTE from guest memory when fw_cfg
etc/vmcoreinfo entry provides the location, and write it as an
additional note in the dump.

Signed-off-by: Marc-André Lureau 
---
 include/sysemu/dump.h |   2 +
 dump.c| 103 ++
 2 files changed, 105 insertions(+)

diff --git a/include/sysemu/dump.h b/include/sysemu/dump.h
index 2672a15f8b..df43bd0e07 100644
--- a/include/sysemu/dump.h
+++ b/include/sysemu/dump.h
@@ -192,6 +192,8 @@ typedef struct DumpState {
   * this could be used to calculate
   * how much work we have
   * finished. */
+uint8_t *guest_note; /* ELF note content */
+size_t guest_note_size;
 } DumpState;
 
 uint16_t cpu_to_dump16(DumpState *s, uint16_t val);
diff --git a/dump.c b/dump.c
index d9090a24cc..7fe1044280 100644
--- a/dump.c
+++ b/dump.c
@@ -26,6 +26,8 @@
 #include "qapi/qmp/qerror.h"
 #include "qmp-commands.h"
 #include "qapi-event.h"
+#include "qemu/error-report.h"
+#include "hw/nvram/fw_cfg.h"
 
 #include 
 #ifdef CONFIG_LZO
@@ -38,6 +40,13 @@
 #define ELF_MACHINE_UNAME "Unknown"
 #endif
 
+#define MAX_GUEST_NOTE_SIZE (1 << 20) /* 1MB should be enough */
+
+#define ELF_NOTE_SIZE(hdr_size, name_size, desc_size)   \
+((DIV_ROUND_UP((hdr_size), 4) + \
+  DIV_ROUND_UP((name_size), 4) +\
+  DIV_ROUND_UP((desc_size), 4)) * 4)
+
 uint16_t cpu_to_dump16(DumpState *s, uint16_t val)
 {
 if (s->dump_info.d_endian == ELFDATA2LSB) {
@@ -76,6 +85,8 @@ static int dump_cleanup(DumpState *s)
 guest_phys_blocks_free(>guest_phys_blocks);
 memory_mapping_list_free(>list);
 close(s->fd);
+g_free(s->guest_note);
+s->guest_note = NULL;
 if (s->resume) {
 if (s->detached) {
 qemu_mutex_lock_iothread();
@@ -235,6 +246,19 @@ static inline int cpu_index(CPUState *cpu)
 return cpu->cpu_index + 1;
 }
 
+static void write_guest_note(WriteCoreDumpFunction f, DumpState *s,
+ Error **errp)
+{
+int ret;
+
+if (s->guest_note) {
+ret = f(s->guest_note, s->guest_note_size, s);
+if (ret < 0) {
+error_setg(errp, "dump: failed to write guest note");
+}
+}
+}
+
 static void write_elf64_notes(WriteCoreDumpFunction f, DumpState *s,
   Error **errp)
 {
@@ -258,6 +282,8 @@ static void write_elf64_notes(WriteCoreDumpFunction f, 
DumpState *s,
 return;
 }
 }
+
+write_guest_note(f, s, errp);
 }
 
 static void write_elf32_note(DumpState *s, Error **errp)
@@ -303,6 +329,8 @@ static void write_elf32_notes(WriteCoreDumpFunction f, 
DumpState *s,
 return;
 }
 }
+
+write_guest_note(f, s, errp);
 }
 
 static void write_elf_section(DumpState *s, int type, Error **errp)
@@ -714,6 +742,44 @@ static int buf_write_note(const void *buf, size_t size, 
void *opaque)
 return 0;
 }
 
+/*
+ * This function retrieves various sizes from an elf header.
+ *
+ * @note has to be a valid ELF note. The return sizes are unmodified
+ * (not padded or rounded up to be multiple of 4).
+ */
+static void get_note_sizes(DumpState *s, const void *note,
+   uint64_t *note_head_size,
+   uint64_t *name_size,
+   uint64_t *desc_size)
+{
+uint64_t note_head_sz;
+uint64_t name_sz;
+uint64_t desc_sz;
+
+if (s->dump_info.d_class == ELFCLASS64) {
+const Elf64_Nhdr *hdr = note;
+note_head_sz = sizeof(Elf64_Nhdr);
+name_sz = tswap64(hdr->n_namesz);
+desc_sz = tswap64(hdr->n_descsz);
+} else {
+const Elf32_Nhdr *hdr = note;
+note_head_sz = sizeof(Elf32_Nhdr);
+name_sz = tswap32(hdr->n_namesz);
+desc_sz = tswap32(hdr->n_descsz);
+}
+
+if (note_head_size) {
+*note_head_size = note_head_sz;
+}
+if (name_size) {
+*name_size = name_sz;
+}
+if (desc_size) {
+*desc_size = desc_sz;
+}
+}
+
 /* write common header, sub header and elf note to vmcore */
 static void create_header32(DumpState *s, Error **errp)
 {
@@ -1492,6 +1558,7 @@ static void dump_init(DumpState *s, int fd, bool 
has_format,
   DumpGuestMemoryFormat format, bool paging, bool 
has_filter,
   int64_t begin, int64_t length, Error **errp)
 {
+FWCfgState *fw_cfg = fw_cfg_find();
 CPUState *cpu;
 int nr_cpus;
 Error *err = NULL;
@@ -1563,6 +1630,42 @@ static void dump_init(DumpState *s, int fd, bool 
has_format,
 goto cleanup;
 }
 
+/*
+ * The goal of this block is to copy the guest note out of
+ * the guest.  Failure to do so is not fatal for dumping.
+ */
+if (fw_cfg) {
+uint64_t addr, note_head_size, name_size, desc_size;
+uint32_t

[Qemu-devel] [PATCH v5 6/8] kdump: set vmcoreinfo location

kdump header provides offset and size of the vmcoreinfo content,
append it if available (skip the ELF note header).

crash-7.1.9 was the first version that started looking in the
vmcoreinfo data for phys_base instead of in the kdump_sub_header.

Signed-off-by: Marc-André Lureau 
---
 dump.c | 24 
 1 file changed, 24 insertions(+)

diff --git a/dump.c b/dump.c
index cef2dd5bf9..49805a3cdc 100644
--- a/dump.c
+++ b/dump.c
@@ -858,6 +858,18 @@ static void create_header32(DumpState *s, Error **errp)
 kh->dump_level = cpu_to_dump32(s, DUMP_LEVEL);
 
 offset_note = DISKDUMP_HEADER_BLOCKS * block_size + size;
+if (s->guest_note &&
+note_name_equal(s, s->guest_note, "VMCOREINFO")) {
+uint64_t hsize, name_size, size_vmcoreinfo_desc, offset_vmcoreinfo;
+
+get_note_sizes(s, s->guest_note,
+   , _size, _vmcoreinfo_desc);
+offset_vmcoreinfo = offset_note + s->note_size - s->guest_note_size +
+(DIV_ROUND_UP(hsize, 4) + DIV_ROUND_UP(name_size, 4)) * 4;
+kh->offset_vmcoreinfo = cpu_to_dump64(s, offset_vmcoreinfo);
+kh->size_vmcoreinfo = cpu_to_dump32(s, size_vmcoreinfo_desc);
+}
+
 kh->offset_note = cpu_to_dump64(s, offset_note);
 kh->note_size = cpu_to_dump32(s, s->note_size);
 
@@ -958,6 +970,18 @@ static void create_header64(DumpState *s, Error **errp)
 kh->dump_level = cpu_to_dump32(s, DUMP_LEVEL);
 
 offset_note = DISKDUMP_HEADER_BLOCKS * block_size + size;
+if (s->guest_note &&
+note_name_equal(s, s->guest_note, "VMCOREINFO")) {
+uint64_t hsize, name_size, size_vmcoreinfo_desc, offset_vmcoreinfo;
+
+get_note_sizes(s, s->guest_note,
+   , _size, _vmcoreinfo_desc);
+offset_vmcoreinfo = offset_note + s->note_size - s->guest_note_size +
+(DIV_ROUND_UP(hsize, 4) + DIV_ROUND_UP(name_size, 4)) * 4;
+kh->offset_vmcoreinfo = cpu_to_dump64(s, offset_vmcoreinfo);
+kh->size_vmcoreinfo = cpu_to_dump64(s, size_vmcoreinfo_desc);
+}
+
 kh->offset_note = cpu_to_dump64(s, offset_note);
 kh->note_size = cpu_to_dump64(s, s->note_size);
 
-- 
2.14.0.1.geff633fa0

[Qemu-devel] [PATCH v5 2/8] fw_cfg: add write callback

Reintroduce the write callback that was removed when write support was
removed in commit 023e3148567ac898c7258138f8e86c3c2bb40d07.

The write_cb callback is called when a write reaches the end of file,
that is when the write pointer/offset reaches the size of the file.

Signed-off-by: Marc-André Lureau 
---
 include/hw/nvram/fw_cfg.h |  2 ++
 hw/acpi/vmgenid.c |  2 +-
 hw/core/loader.c  |  2 +-
 hw/i386/acpi-build.c  |  2 +-
 hw/isa/lpc_ich9.c |  4 ++--
 hw/nvram/fw_cfg.c | 18 ++
 6 files changed, 21 insertions(+), 9 deletions(-)

diff --git a/include/hw/nvram/fw_cfg.h b/include/hw/nvram/fw_cfg.h
index f50d068563..3527cd51d8 100644
--- a/include/hw/nvram/fw_cfg.h
+++ b/include/hw/nvram/fw_cfg.h
@@ -183,6 +183,7 @@ void fw_cfg_add_file(FWCfgState *s, const char *filename, 
void *data,
  * @s: fw_cfg device being modified
  * @filename: name of new fw_cfg file item
  * @select_cb: callback function when selecting
+ * @write_cb: callback function when write reached EOF
  * @callback_opaque: argument to be passed into callback function
  * @data: pointer to start of item data
  * @len: size of item data
@@ -202,6 +203,7 @@ void fw_cfg_add_file(FWCfgState *s, const char *filename, 
void *data,
  */
 void fw_cfg_add_file_callback(FWCfgState *s, const char *filename,
   FWCfgCallback select_cb,
+  FWCfgCallback write_cb,
   void *callback_opaque,
   void *data, size_t len, bool read_only);
 
diff --git a/hw/acpi/vmgenid.c b/hw/acpi/vmgenid.c
index a32b847fe0..c1e935da9f 100644
--- a/hw/acpi/vmgenid.c
+++ b/hw/acpi/vmgenid.c
@@ -124,7 +124,7 @@ void vmgenid_add_fw_cfg(VmGenIdState *vms, FWCfgState *s, 
GArray *guid)
 fw_cfg_add_file(s, VMGENID_GUID_FW_CFG_FILE, guid->data,
 VMGENID_FW_CFG_SIZE);
 /* Create a read-write fw_cfg file for Address */
-fw_cfg_add_file_callback(s, VMGENID_ADDR_FW_CFG_FILE, NULL, NULL,
+fw_cfg_add_file_callback(s, VMGENID_ADDR_FW_CFG_FILE, NULL, NULL, NULL,
  vms->vmgenid_addr_le,
  ARRAY_SIZE(vms->vmgenid_addr_le), false);
 }
diff --git a/hw/core/loader.c b/hw/core/loader.c
index 4593061445..91669d65aa 100644
--- a/hw/core/loader.c
+++ b/hw/core/loader.c
@@ -1023,7 +1023,7 @@ MemoryRegion *rom_add_blob(const char *name, const void 
*blob, size_t len,
 }
 
 fw_cfg_add_file_callback(fw_cfg, fw_file_name,
- fw_callback, callback_opaque,
+ fw_callback, NULL, callback_opaque,
  data, rom->datasize, read_only);
 }
 return mr;
diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index b9c245c9bb..be2992b708 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -2922,7 +2922,7 @@ void acpi_setup(void)
 
 build_state->rsdp = g_memdup(tables.rsdp->data, rsdp_size);
 fw_cfg_add_file_callback(pcms->fw_cfg, ACPI_BUILD_RSDP_FILE,
- acpi_build_update, build_state,
+ acpi_build_update, NULL, build_state,
  build_state->rsdp, rsdp_size, true);
 build_state->rsdp_mr = NULL;
 } else {
diff --git a/hw/isa/lpc_ich9.c b/hw/isa/lpc_ich9.c
index ac8416d42b..de8fbb7260 100644
--- a/hw/isa/lpc_ich9.c
+++ b/hw/isa/lpc_ich9.c
@@ -402,12 +402,12 @@ void ich9_lpc_pm_init(PCIDevice *lpc_pci, bool 
smm_enabled)
  * just link them into fw_cfg here.
  */
 fw_cfg_add_file_callback(fw_cfg, "etc/smi/requested-features",
- NULL, NULL,
+ NULL, NULL, NULL,
  lpc->smi_guest_features_le,
  sizeof lpc->smi_guest_features_le,
  false);
 fw_cfg_add_file_callback(fw_cfg, "etc/smi/features-ok",
- smi_features_ok_callback, lpc,
+ smi_features_ok_callback, NULL, lpc,
  >smi_features_ok,
  sizeof lpc->smi_features_ok,
  true);
diff --git a/hw/nvram/fw_cfg.c b/hw/nvram/fw_cfg.c
index e3bd626b8c..28780088b9 100644
--- a/hw/nvram/fw_cfg.c
+++ b/hw/nvram/fw_cfg.c
@@ -56,6 +56,7 @@ struct FWCfgEntry {
 uint8_t *data;
 void *callback_opaque;
 FWCfgCallback select_cb;
+FWCfgCallback write_cb;
 };
 
 #define JPG_FILE 0
@@ -384,6 +385,12 @@ static void fw_cfg_dma_transfer(FWCfgState *s)
 stl_be_dma(s->dma_as, dma_addr + offsetof(FWCfgDmaAccess, control),
 dma.control);
 
+if (write &&
+!(dma.control & FW_CFG_DMA_CTL_ERROR) &&
+s->cur_offset == e->len) {
+e->write_cb(e->callback_opaque);
+}
+

[Qemu-devel] [PATCH v5 3/8] fw_cfg: add vmcoreinfo file

See docs/specs/fw_cfg.txt for details.

The "etc/vmcoreinfo" is added when using "-global
fw_cfg.vmcoreinfo=on" qemu option.

Disabled by default for machine types v2.9 and older.

Signed-off-by: Marc-André Lureau 
---
 include/hw/compat.h   |  8 
 include/hw/nvram/fw_cfg.h |  9 +
 hw/nvram/fw_cfg.c | 20 
 docs/specs/fw_cfg.txt | 16 
 4 files changed, 53 insertions(+)

diff --git a/include/hw/compat.h b/include/hw/compat.h
index 08f36004da..317fd2e2e3 100644
--- a/include/hw/compat.h
+++ b/include/hw/compat.h
@@ -18,6 +18,14 @@
 .driver   = "pcie-root-port",\
 .property = "x-migrate-msix",\
 .value= "false",\
+},{\
+.driver   = "fw_cfg_mem",\
+.property = "vmcoreinfo",\
+.value= "off",\
+},{\
+.driver   = "fw_cfg_io",\
+.property = "vmcoreinfo",\
+.value= "off",\
 },
 
 #define HW_COMPAT_2_8 \
diff --git a/include/hw/nvram/fw_cfg.h b/include/hw/nvram/fw_cfg.h
index 3527cd51d8..a35f47405d 100644
--- a/include/hw/nvram/fw_cfg.h
+++ b/include/hw/nvram/fw_cfg.h
@@ -30,6 +30,11 @@ typedef struct FWCfgFile {
 void fw_cfg_set_order_override(FWCfgState *fw_cfg, int order);
 void fw_cfg_reset_order_override(FWCfgState *fw_cfg);
 
+typedef struct FWCfgVMCoreInfo {
+uint64_t paddr;
+uint32_t size;
+} QEMU_PACKED FWCfgVMCoreInfo;
+
 typedef struct FWCfgFiles {
 uint32_t  count;
 FWCfgFile f[];
@@ -65,6 +70,10 @@ struct FWCfgState {
 dma_addr_t dma_addr;
 AddressSpace *dma_as;
 MemoryRegion dma_iomem;
+
+bool vmcoreinfo_enabled;
+bool has_vmcoreinfo;
+FWCfgVMCoreInfo vmcoreinfo;
 };
 
 struct FWCfgIoState {
diff --git a/hw/nvram/fw_cfg.c b/hw/nvram/fw_cfg.c
index 28780088b9..342afc4ed2 100644
--- a/hw/nvram/fw_cfg.c
+++ b/hw/nvram/fw_cfg.c
@@ -504,6 +504,7 @@ static void fw_cfg_reset(DeviceState *d)
 
 /* we never register a read callback for FW_CFG_SIGNATURE */
 fw_cfg_select(s, FW_CFG_SIGNATURE);
+s->has_vmcoreinfo = false;
 }
 
 /* Save restore 32 bit int as uint16_t
@@ -869,7 +870,12 @@ static void fw_cfg_machine_ready(struct Notifier *n, void 
*data)
 qemu_register_reset(fw_cfg_machine_reset, s);
 }
 
+static void fw_cfg_vmci_written(void *dev)
+{
+FWCfgState *s = FW_CFG(dev);
 
+s->has_vmcoreinfo = true;
+}
 
 static void fw_cfg_common_realize(DeviceState *dev, Error **errp)
 {
@@ -895,6 +901,16 @@ static void fw_cfg_common_realize(DeviceState *dev, Error 
**errp)
 
 fw_cfg_add_i32(s, FW_CFG_ID, version);
 
+if (s->vmcoreinfo_enabled) {
+if (!s->dma_enabled) {
+error_setg(errp, "vmcoreinfo requires dma_enabled");
+return;
+}
+fw_cfg_add_file_callback(s, "etc/vmcoreinfo",
+ NULL, fw_cfg_vmci_written, s,
+ >vmcoreinfo, sizeof(s->vmcoreinfo), false);
+}
+
 s->machine_ready.notify = fw_cfg_machine_ready;
 qemu_add_machine_init_done_notifier(>machine_ready);
 }
@@ -1031,6 +1047,8 @@ static void fw_cfg_file_slots_allocate(FWCfgState *s, 
Error **errp)
 static Property fw_cfg_io_properties[] = {
 DEFINE_PROP_BOOL("dma_enabled", FWCfgIoState, parent_obj.dma_enabled,
  true),
+DEFINE_PROP_BOOL("vmcoreinfo", FWCfgIoState, parent_obj.vmcoreinfo_enabled,
+ true),
 DEFINE_PROP_UINT16("x-file-slots", FWCfgIoState, parent_obj.file_slots,
FW_CFG_FILE_SLOTS_DFLT),
 DEFINE_PROP_END_OF_LIST(),
@@ -1082,6 +1100,8 @@ static Property fw_cfg_mem_properties[] = {
 DEFINE_PROP_UINT32("data_width", FWCfgMemState, data_width, -1),
 DEFINE_PROP_BOOL("dma_enabled", FWCfgMemState, parent_obj.dma_enabled,
  true),
+DEFINE_PROP_BOOL("vmcoreinfo", FWCfgMemState, 
parent_obj.vmcoreinfo_enabled,
+ true),
 DEFINE_PROP_UINT16("x-file-slots", FWCfgMemState, parent_obj.file_slots,
FW_CFG_FILE_SLOTS_DFLT),
 DEFINE_PROP_END_OF_LIST(),
diff --git a/docs/specs/fw_cfg.txt b/docs/specs/fw_cfg.txt
index 08c00bdf44..37d0f9f40a 100644
--- a/docs/specs/fw_cfg.txt
+++ b/docs/specs/fw_cfg.txt
@@ -136,6 +136,22 @@ struct FWCfgFile { /* an individual file entry, 64 
bytes total */
 char name[56]; /* fw_cfg item name, NUL-terminated ascii */
 };
 
+=== etc/vmcoreinfo ===
+
+A guest may use this entry to add information details to qemu
+dumps. The entry gives location and size of an ELF note that is
+appended in qemu dumps.
+
+The entry is of 12 bytes with this format:
+
+struct FWCfgVMCoreInfo {
+uint64_t paddr; /* physical address of ELF note, LE */
+uint32_t size;  /* size of ELF note region, LE */
+};
+
+The note format/class must be of the target bitness and the size must
+be less than 1Mb.
+
 === All Other Data Items ===
 
 Please consult the QEMU source for the

[Qemu-devel] [PATCH v5 7/8] scripts/dump-guest-memory.py: add vmcoreinfo

Add vmcoreinfo ELF note if fw-cfg has the memory location details.

Signed-off-by: Marc-André Lureau 
---
 scripts/dump-guest-memory.py | 51 
 1 file changed, 51 insertions(+)

diff --git a/scripts/dump-guest-memory.py b/scripts/dump-guest-memory.py
index f7c6635f15..1f97373f0d 100644
--- a/scripts/dump-guest-memory.py
+++ b/scripts/dump-guest-memory.py
@@ -14,6 +14,7 @@ the COPYING file in the top-level directory.
 """
 
 import ctypes
+import struct
 
 UINTPTR_T = gdb.lookup_type("uintptr_t")
 
@@ -45,6 +46,12 @@ EM_S390 = 22
 EM_AARCH = 183
 EM_X86_64 = 62
 
+def le32_to_cpu(val):
+return struct.unpack(" 1 << 20:
+print('warning: invalid vmcoreinfo size')
+return
+# now get the full note
+note = get_arch_note(self.endianness,
+ header.n_namesz - 1, header.n_descsz)
+ctypes.memmove(ctypes.pointer(note), vmcoreinfo, ctypes.sizeof(note))
+
+self.notes.append(note)
+self.segments[0].p_filesz += ctypes.sizeof(note)
+self.segments[0].p_memsz += ctypes.sizeof(note)
+
 def add_segment(self, p_type, p_paddr, p_size):
 """Adds a segment to the elf."""
 
@@ -505,6 +531,30 @@ shape and this command should mostly work."""
 cur += chunk_size
 left -= chunk_size
 
+def phys_memory_read(self, addr, size):
+qemu_core = gdb.inferiors()[0]
+for block in self.guest_phys_blocks:
+if block["target_start"] <= addr \
+   and addr + size <= block["target_end"]:
+haddr = block["host_addr"] + (addr - block["target_start"])
+return qemu_core.read_memory(haddr, size)
+return None
+
+def add_vmcoreinfo(self):
+if not gdb.parse_and_eval("fw_cfg_find()") \
+   or not gdb.parse_and_eval("fw_cfg_find()->have_vmcoreinfo"):
+return
+
+addr = gdb.parse_and_eval("fw_cfg_find()->vmcoreinfo.paddr")
+size = gdb.parse_and_eval("fw_cfg_find()->vmcoreinfo.size")
+
+addr = le64_to_cpu(addr)
+size = le32_to_cpu(size)
+
+vmcoreinfo = self.phys_memory_read(addr, size)
+if vmcoreinfo:
+self.elf.add_vmcoreinfo_note(vmcoreinfo.tobytes())
+
 def invoke(self, args, from_tty):
 """Handles command invocation from gdb."""
 
@@ -518,6 +568,7 @@ shape and this command should mostly work."""
 
 self.elf = ELF(argv[1])
 self.guest_phys_blocks = get_guest_phys_blocks()
+self.add_vmcoreinfo()
 
 with open(argv[0], "wb") as vmcore:
 self.dump_init(vmcore)
-- 
2.14.0.1.geff633fa0

[Qemu-devel] [PATCH v5 5/8] dump: update phys_base header field based on VMCOREINFO content

If the guest note is VMCOREINFO, try to get phys_base from it.

Signed-off-by: Marc-André Lureau 
---
 dump.c| 56 +--
 docs/specs/fw_cfg.txt |  8 
 2 files changed, 62 insertions(+), 2 deletions(-)

diff --git a/dump.c b/dump.c
index 7fe1044280..cef2dd5bf9 100644
--- a/dump.c
+++ b/dump.c
@@ -780,6 +780,23 @@ static void get_note_sizes(DumpState *s, const void *note,
 }
 }
 
+static bool note_name_equal(DumpState *s,
+const uint8_t *note, const char *name)
+{
+int len = strlen(name) + 1;
+uint64_t head_size, name_size;
+
+get_note_sizes(s, note, _size, _size, NULL);
+head_size = ROUND_UP(head_size, 4);
+
+if (name_size != len ||
+memcmp(note + head_size, "VMCOREINFO", len)) {
+return false;
+}
+
+return true;
+}
+
 /* write common header, sub header and elf note to vmcore */
 static void create_header32(DumpState *s, Error **errp)
 {
@@ -1554,6 +1571,39 @@ static int64_t dump_calculate_size(DumpState *s)
 return total;
 }
 
+static void vmcoreinfo_update_phys_base(DumpState *s)
+{
+uint64_t size, note_head_size, name_size, phys_base;
+char **lines;
+uint8_t *vmci;
+size_t i;
+
+if (!note_name_equal(s, s->guest_note, "VMCOREINFO")) {
+return;
+}
+
+get_note_sizes(s, s->guest_note, _head_size, _size, );
+note_head_size = ROUND_UP(note_head_size, 4);
+
+vmci = s->guest_note + note_head_size + ROUND_UP(name_size, 4);
+*(vmci + size) = '\0';
+
+lines = g_strsplit((char *)vmci, "\n", -1);
+for (i = 0; lines[i]; i++) {
+if (g_str_has_prefix(lines[i], "NUMBER(phys_base)=")) {
+if (qemu_strtou64(lines[i] + 18, NULL, 16,
+  _base) < 0) {
+warn_report("Failed to read NUMBER(phys_base)=");
+} else {
+s->dump_info.phys_base = phys_base;
+}
+break;
+}
+}
+
+g_strfreev(lines);
+}
+
 static void dump_init(DumpState *s, int fd, bool has_format,
   DumpGuestMemoryFormat format, bool paging, bool 
has_filter,
   int64_t begin, int64_t length, Error **errp)
@@ -1631,8 +1681,9 @@ static void dump_init(DumpState *s, int fd, bool 
has_format,
 }
 
 /*
- * The goal of this block is to copy the guest note out of
- * the guest.  Failure to do so is not fatal for dumping.
+ * The goal of this block is to (a) update the previously guessed
+ * phys_base, (b) copy the guest note note out of the guest.
+ * Failure to do so is not fatal for dumping.
  */
 if (fw_cfg) {
 uint64_t addr, note_head_size, name_size, desc_size;
@@ -1661,6 +1712,7 @@ static void dump_init(DumpState *s, int fd, bool 
has_format,
 g_free(s->guest_note);
 s->guest_note = NULL;
 } else {
+vmcoreinfo_update_phys_base(s);
 s->note_size += s->guest_note_size;
 }
 }
diff --git a/docs/specs/fw_cfg.txt b/docs/specs/fw_cfg.txt
index 37d0f9f40a..64c6aaed1f 100644
--- a/docs/specs/fw_cfg.txt
+++ b/docs/specs/fw_cfg.txt
@@ -152,6 +152,14 @@ struct FWCfgVMCoreInfo {
 The note format/class must be of the target bitness and the size must
 be less than 1Mb.
 
+If the ELF note name is "VMCOREINFO", it is expected to be the Linux
+vmcoreinfo note (see Documentation/ABI/testing/sysfs-kernel-vmcoreinfo
+in Linux source). In this case, qemu dump code will read the content
+as a key=value text file, looking for "NUMBER(phys_base)" key
+value. The value is expected to be more accurate than architecture
+guess of the value. This is useful for KASLR-enabled guest with
+ancient tools not handling the VMCOREINFO note.
+
 === All Other Data Items ===
 
 Please consult the QEMU source for the most up-to-date and authoritative list
-- 
2.14.0.1.geff633fa0

[Qemu-devel] [PATCH v5 1/8] fw_cfg: rename read callback

The callback is called on select.

Furthermore, the next patch introduced a new callback, so rename the
function type with a generic name.

Signed-off-by: Marc-André Lureau 
---
 include/hw/loader.h   |  2 +-
 include/hw/nvram/fw_cfg.h |  7 ---
 hw/core/loader.c  |  2 +-
 hw/nvram/fw_cfg.c | 30 --
 4 files changed, 22 insertions(+), 19 deletions(-)

diff --git a/include/hw/loader.h b/include/hw/loader.h
index 490c9ff8e6..355fe0f5a2 100644
--- a/include/hw/loader.h
+++ b/include/hw/loader.h
@@ -192,7 +192,7 @@ int rom_add_file(const char *file, const char *fw_dir,
 MemoryRegion *rom_add_blob(const char *name, const void *blob, size_t len,
size_t max_len, hwaddr addr,
const char *fw_file_name,
-   FWCfgReadCallback fw_callback,
+   FWCfgCallback fw_callback,
void *callback_opaque, AddressSpace *as,
bool read_only);
 int rom_add_elf_program(const char *name, void *data, size_t datasize,
diff --git a/include/hw/nvram/fw_cfg.h b/include/hw/nvram/fw_cfg.h
index b77ea48abb..f50d068563 100644
--- a/include/hw/nvram/fw_cfg.h
+++ b/include/hw/nvram/fw_cfg.h
@@ -44,7 +44,7 @@ typedef struct FWCfgDmaAccess {
 uint64_t address;
 } QEMU_PACKED FWCfgDmaAccess;
 
-typedef void (*FWCfgReadCallback)(void *opaque);
+typedef void (*FWCfgCallback)(void *opaque);
 
 struct FWCfgState {
 /*< private >*/
@@ -182,7 +182,7 @@ void fw_cfg_add_file(FWCfgState *s, const char *filename, 
void *data,
  * fw_cfg_add_file_callback:
  * @s: fw_cfg device being modified
  * @filename: name of new fw_cfg file item
- * @callback: callback function
+ * @select_cb: callback function when selecting
  * @callback_opaque: argument to be passed into callback function
  * @data: pointer to start of item data
  * @len: size of item data
@@ -201,7 +201,8 @@ void fw_cfg_add_file(FWCfgState *s, const char *filename, 
void *data,
  * with FW_CFG_DMA_CTL_SELECT).
  */
 void fw_cfg_add_file_callback(FWCfgState *s, const char *filename,
-  FWCfgReadCallback callback, void 
*callback_opaque,
+  FWCfgCallback select_cb,
+  void *callback_opaque,
   void *data, size_t len, bool read_only);
 
 /**
diff --git a/hw/core/loader.c b/hw/core/loader.c
index ebe574c7ea..4593061445 100644
--- a/hw/core/loader.c
+++ b/hw/core/loader.c
@@ -989,7 +989,7 @@ err:
 
 MemoryRegion *rom_add_blob(const char *name, const void *blob, size_t len,
size_t max_len, hwaddr addr, const char *fw_file_name,
-   FWCfgReadCallback fw_callback, void *callback_opaque,
+   FWCfgCallback fw_callback, void *callback_opaque,
AddressSpace *as, bool read_only)
 {
 MachineClass *mc = MACHINE_GET_CLASS(qdev_get_machine());
diff --git a/hw/nvram/fw_cfg.c b/hw/nvram/fw_cfg.c
index 5bd904487f..e3bd626b8c 100644
--- a/hw/nvram/fw_cfg.c
+++ b/hw/nvram/fw_cfg.c
@@ -55,7 +55,7 @@ struct FWCfgEntry {
 bool allow_write;
 uint8_t *data;
 void *callback_opaque;
-FWCfgReadCallback read_callback;
+FWCfgCallback select_cb;
 };
 
 #define JPG_FILE 0
@@ -236,8 +236,8 @@ static int fw_cfg_select(FWCfgState *s, uint16_t key)
 /* entry successfully selected, now run callback if present */
 arch = !!(key & FW_CFG_ARCH_LOCAL);
 e = >entries[arch][key & FW_CFG_ENTRY_MASK];
-if (e->read_callback) {
-e->read_callback(e->callback_opaque);
+if (e->select_cb) {
+e->select_cb(e->callback_opaque);
 }
 }
 
@@ -568,11 +568,11 @@ static const VMStateDescription vmstate_fw_cfg = {
 }
 };
 
-static void fw_cfg_add_bytes_read_callback(FWCfgState *s, uint16_t key,
-   FWCfgReadCallback callback,
-   void *callback_opaque,
-   void *data, size_t len,
-   bool read_only)
+static void fw_cfg_add_bytes_callback(FWCfgState *s, uint16_t key,
+  FWCfgCallback select_cb,
+  void *callback_opaque,
+  void *data, size_t len,
+  bool read_only)
 {
 int arch = !!(key & FW_CFG_ARCH_LOCAL);
 
@@ -583,7 +583,7 @@ static void fw_cfg_add_bytes_read_callback(FWCfgState *s, 
uint16_t key,
 
 s->entries[arch][key].data = data;
 s->entries[arch][key].len = (uint32_t)len;
-s->entries[arch][key].read_callback = callback;
+s->entries[arch][key].select_cb = select_cb;
 s->entries[arch][key].callback_opaque = callback_opaque;
 s->entries[arch][key].allow_write = !read_only;
 }
@@ -610,7 +610,7 @@ static

[Qemu-devel] [PATCH v5 0/8] KASLR kernel dump support

Recent linux kernels enable KASLR to randomize phys/virt memory
addresses. This series aims to provide enough information in qemu
dumps so that crash utility can work with randomized kernel too (it
hasn't been tested on other archs than x86 though, help welcome).

The previous design to provide qemu with debug details (using qemu-ga
and a dedicated vmcoreinfo ACPI device) failed to satisfy the
requirements during previous iterations.

In particular, the previous proposed vmcoreinfo ACPI device had the
following issues:
- hazardous memory handling with no explicit synchronization
- occupy 2 fw-cfg entries (for memory and pointer)
- occupy 4k of guest memory (this could have been tweaked)
- did not provide ACPI methods (this could have been added)
- may be difficult to maintain compatibility (according to Michael)

This is a new proposal, that leverage fw-cfg device instead of adding
a new device. A "etc/vmcoreinfo" entry is added, where the guest,
during boot or later, can write the addr/size location of an ELF note
to be appended in the qemu dump.

Note: only guest kernel is expected to write to a fw-cfg entry.  This
method is not meant for general qemu/user-space communication. There
are more appropriate devices for this purpose, and the guest kernel
should not expose this facility.

This is quite easier to implement, and uses less of the limited fw-cfg
slots, and guest memory. It also solves the synchronization issue, and
may be easier to discover or to maintain compatibility.

The Linux ELF note is expected to be the VMCOREINFO note, which will
have a special handling in qemu in this case helping kaslr-kernel
debugging. But it could be any valid ELF note.

Crash 7.1.9 will parse the "phys_base" value from the VMCOREINFO note,
and thus will work with KASLR-dump produced by this series.

The series implements the note addition in qemu ELF/kdump,
as well as the python scripts/dump-guest-memory.py.

To test:

Using kernel from https://github.com/elmarco/linux fw-cfg branch,
Compile and run guest kernel with CONFIG_RANDOMIZE_BASE=y & 
CONFIG_FW_CFG_SYSFS=y.

Run qemu with -global fw_cfg.vmcoreinfo=on

Produce an ELF dump:
{ "execute": "dump-guest-memory", "arguments": { "protocol": "file:dump", 
"paging": false } }

Produce a kdump:
{ "execute": "dump-guest-memory", "arguments": { "protocol": "file:dump", 
"paging": false, "format": "kdump-zlib" } }

Or with (gdb) dump-guest-memory, with scripts/dump-guest-memory.py script.

Analyze with crash >= 7.1.9 (or the git version for 4.13 fixes..):

$ crash vmlinux dump

v5:
- removed x-write-pointer-available patch from this series
- drop vmcoreinfo device
- add write callback to fw_cfg entries
- add a writable fw_cfg "vmcoreinfo" entry
- split phys_base update from VMCOREINFO note in a seperate patch
- most patches had non-trivial changes, dropping reviewed-by tags

v4: from Laszlo review
- switch to warn_report*()
- update test to follow vmgenid and use boot-sector infrastructure
- fix range checks in the python script
- add vmcoreinfo_get() stub

v3: from Laszlo review
- change vmcoreinfo offset to 36
- reset err to null after report
- use PRIu32
- change name_size and desc_size against MAX_VMCOREINFO_SIZE
- python code simplification
- check boundaries of blocks in phys_memory_read()
- fix some vmgi vs vmci names
- add more comments in code
- fix comment indentation
- add r-b tags

v2: from Laszlo review
- vmci: fix guest endianess handling
- vmci: fix wrong sizeof()
- vmci: add back reset logic from vmgenid
- dump: have 1MB size limit for vmcoreinfo
- dump: fix potential off-by-1 buffer manipulation
- dump: use temporary variable for qemu_strtou64
- dump: fixed VMCOREINFO duplication in kdump
- update gdb script to not call into qemu process
- update MAINTAINERS with some new files

Marc-André Lureau (8):
  fw_cfg: rename read callback
  fw_cfg: add write callback
  fw_cfg: add vmcoreinfo file
  dump: add guest ELF note
  dump: update phys_base header field based on VMCOREINFO content
  kdump: set vmcoreinfo location
  scripts/dump-guest-memory.py: add vmcoreinfo
  MAINTAINERS: add Dump maintainers

 scripts/dump-guest-memory.py |  51 
 include/hw/compat.h  |   8 ++
 include/hw/loader.h  |   2 +-
 include/hw/nvram/fw_cfg.h|  18 -
 include/sysemu/dump.h|   2 +
 dump.c   | 179 +++
 hw/acpi/vmgenid.c|   2 +-
 hw/core/loader.c |   4 +-
 hw/i386/acpi-build.c |   2 +-
 hw/isa/lpc_ich9.c|   4 +-
 hw/nvram/fw_cfg.c|  64 
 MAINTAINERS  |   9 +++
 docs/specs/fw_cfg.txt|  24 ++
 13 files changed, 343 insertions(+), 26 deletions(-)

-- 
2.14.0.1.geff633fa0

Re: [Qemu-devel] [PATCH v3 1/3] Fix format and styles; make code more pythonic

On 08/07/2017 11:53 AM, dverma wrote:
> - Format fixes, cleaned up the print statement
> - Style fixes, e.g. changed "if not x in y" to "if x not in y"
> - Improved variable names
> ---

Missing a Signed-off-by: tag; without that, we can't accept the patch.
Can you please resubmit with that fixed?  More hints at
https://wiki.qemu.org/index.php/Contribute/SubmitAPatch

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

signature.asc
Description: OpenPGP digital signature

[Qemu-devel] [for-2.10 PATCH 3/3] spapr: error out if PHB fails to setup PCI DRCs

It is currently possible to start QEMU with two PHBs without using the
index property:

-device spapr-pci-host-bridge,id=pci1,\
  buid=0x8002001,\
  liobn=0x8100,\
  liobn64=0x8101,\
  mem_win_addr=0x2001,\
  mem64_win_addr=0x2200,\
  io_win_addr=0x2001 \
-device spapr-pci-host-bridge,id=pci2,\
  buid=0x8002002,\
  liobn=0x8200,\
  liobn64=0x8201,\
  mem_win_addr=0x20018000,\
  mem64_win_addr=0x2300,\
  io_win_addr=0x2002 \

Each PHB has its index property equal to -1. As a consequence, each PHB
will want to create PCI DRCs with the same ids:

/* allocate connectors for child PCI devices */
if (sphb->dr_enabled) {
for (i = 0; i < PCI_SLOT_MAX * 8; i++) {
spapr_dr_connector_new(OBJECT(phb), TYPE_SPAPR_DRC_PCI,
   (sphb->index << 16) | i);
}
}

When DRC objects are added to the composition tree, an alias using the
DRC id is created in the "/dr-connector" path. But DRC ids are supposed
to be globally unique and the alias creation fails, leaving the DRC
object unrealized, which isn't expected by the rest of the DR logic.

This has the effect of silently turning-off PCI hotplug support (ie, PCI
hotplug no longer works on any PHB and no error message is printed).

This bug always existed and would even cause a non-fatal error to be
reported on the console (until recent commit bf26ae32a92a). This patch
causes the error message to be printed again and QEMU to exit.

Signed-off-by: Greg Kurz 
---
 hw/ppc/spapr_pci.c |9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
index 4b7882e3613d..4a1e6c5f697c 100644
--- a/hw/ppc/spapr_pci.c
+++ b/hw/ppc/spapr_pci.c
@@ -1731,9 +1731,16 @@ static void spapr_phb_realize(DeviceState *dev, Error 
**errp)
 
 /* allocate connectors for child PCI devices */
 if (sphb->dr_enabled) {
+Error *local_err = NULL;
+
 for (i = 0; i < PCI_SLOT_MAX * 8; i++) {
 spapr_dr_connector_new(OBJECT(phb), TYPE_SPAPR_DRC_PCI,
-   (sphb->index << 16) | i, NULL);
+   (sphb->index << 16) | i, _err);
+if (local_err) {
+error_propagate(errp, local_err);
+error_prepend(errp, "Failed to create DRC: ");
+return;
+}
 }
 }

[Qemu-devel] [Bug 1088617] Re: qemu-system-mipsel save/restore broken

Triaging old bug tickets ... can you still reproduce this issue with the
latest version of QEMU?

** Changed in: qemu
   Status: New => Incomplete

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1088617

Title:
  qemu-system-mipsel save/restore broken

Status in QEMU:
  Incomplete

Bug description:
  Save and restore on mipsel seems to be broken (tested with commit
  1c97e303d4ea80a2691334b0febe87a50660f99d). To reproduce:

  1. Download debian_squeeze_mipsel_standard.qcow2 and vmlinux-2.6.32-5
  -4kc-malta from from http://people.debian.org/~aurel32/qemu/mipsel/

  2. Boot the system. I had to ^D past a Bus error in fsck, which may be 
another bug (haven't investigated). The command line used was:
  qemu-system-mipsel -M malta -kernel vmlinux-2.6.32-5-4kc-malta -hda 
debian_squeeze_mipsel_standard.qcow2 -append "root=/dev/sda1 console=tty0" -k 
en-us -vnc :0

  3. Once the system is booted, go to the monitor and do "savevm
  booted". Then quit.

  4. Re-run qemu-system-mipsel again with "-loadvm booted". The guest
  system comes back but is hung (the monitor remains responsive,
  however).

  I also captured a debug log, which is attached. The immediate cause of
  the freeze seems to be that it's stuck in a loop repeatedly handling
  the same page fault over and over.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1088617/+subscriptions

[Qemu-devel] [Bug 1050694] Re: Interrupt 0xffffffff when debug is turned on

Triaging old bug tickets ... is there still something left to do here,
or could we close this ticket nowadays?

** Changed in: qemu
   Status: New => Incomplete

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1050694

Title:
  Interrupt 0x when debug is turned on

Status in QEMU:
  Incomplete

Bug description:
  Hi,

  I have been getting a GPF when I enable interrupts, working on
  implementing processes and a scheduler. When I comment out the
  scheduler code, I still get the GPF. I used the following QEMU command
  line to capture a log:

  qemu-system-i386 -smp 4 -monitor stdio -cpu core2duo -D
  /home/adam/century/util/qemu.log -d int,in_asm -s -hda
  "$harddisk_image" -m 3.5G

  Rather than posting the entire log, I need some help interpreting the 
following section (notice "INT=0x" on the top line):
  Servicing hardware INT=0x
  1: v= e= i=0 cpl=0 IP=0008:0010b63f pc=0010b63f SP=0010:0012b768 
EAX=
  EAX= EBX=2000 ECX=0018 EDX=05a00780
  ESI=00112faa EDI=000b8fa0 EBP=0012b780 ESP=0012b768
  EIP=0010b63f EFL=00207202 [---] CPL=0 II=0 A20=1 SMM=0 HLT=0
  ES =0010   00cf9300 DPL=0 DS [-WA]
  CS =0008   00cf9a00 DPL=0 CS32 [-R-]
  SS =0010   00cf9300 DPL=0 DS [-WA]
  DS =0010   00cf9300 DPL=0 DS [-WA]
  FS =0010   00cf9300 DPL=0 DS [-WA]
  GS =0010   00cf9300 DPL=0 DS [-WA]
  LDT=   8200 DPL=0 LDT
  TR =0008 0580 0067 8900 DPL=0 TSS32-avl
  GDT= 00127760 0027
  IDT= 00122f40 07ff
  CR0=8011 CR2= CR3=0014a000 CR4=
  DR0= DR1= DR2= DR3=
  DR6=0ff0 DR7=0400
  CCS=0024 CCD=0012b75c CCO=ADDL
  EFER=
  check_exception old: 0x new 0xd
  2: v=0d e=fffa i=0 cpl=0 IP=0008:0010b63f pc=0010b63f SP=0010:0012b768 
EAX=
  EAX= EBX=2000 ECX=0018 EDX=05a00780
  ESI=00112faa EDI=000b8fa0 EBP=0012b780 ESP=0012b768
  EIP=0010b63f EFL=00207202 [---] CPL=0 II=0 A20=1 SMM=0 HLT=0
  ES =0010   00cf9300 DPL=0 DS [-WA]
  CS =0008   00cf9a00 DPL=0 CS32 [-R-]
  SS =0010   00cf9300 DPL=0 DS [-WA]
  DS =0010   00cf9300 DPL=0 DS [-WA]
  FS =0010   00cf9300 DPL=0 DS [-WA]
  GS =0010   00cf9300 DPL=0 DS [-WA]
  LDT=   8200 DPL=0 LDT
  TR =0008 0580 0067 8900 DPL=0 TSS32-avl
  GDT= 00127760 0027
  IDT= 00122f40 07ff
  CR0=8011 CR2= CR3=0014a000 CR4=
  DR0= DR1= DR2= DR3=
  DR6=0ff0 DR7=0400
  CCS=0024 CCD=0012b75c CCO=ADDL
  EFER=

  To the best of my ability to interpret, I an getting an undefined
  interrupt, which is then triggering a GPF, which is caught. However,
  do not know where it might be coming from.

  Some additional information:

  This command works:

  qemu-system-i386 -smp 4 -monitor stdio -cpu core2duo -s -hda
  "$harddisk_image" -m 3.5G

  This command works:

  qemu-system-i386 -monitor stdio -cpu core2duo -D
  /home/adam/century/util/qemu.log -d int,in_asm -s -hda
  "$harddisk_image" -m 3.5G

  And, as above, this does not:

  qemu-system-i386 -smp 4 -monitor stdio -cpu core2duo -D
  /home/adam/century/util/qemu.log -d int,in_asm -s -hda
  "$harddisk_image" -m 3.5G

  [adam@os-development ~]$ qemu-system-i386 -version
  QEMU emulator version 1.2.0, Copyright (c) 2003-2008 Fabrice Bellard

  Attached is an image as a test case.  Please let me know if you need
  any additional information.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1050694/+subscriptions

[Qemu-devel] [for-2.10 PATCH 2/3] spapr_drc: add Error ** argument to spapr_dr_connector_new()

This just allow spapr_dr_connector_new() to propagate errors to its
callers. It doesn't change any functionality.

Signed-off-by: Greg Kurz 
---
 hw/ppc/spapr.c |4 ++--
 hw/ppc/spapr_drc.c |4 ++--
 hw/ppc/spapr_pci.c |2 +-
 include/hw/ppc/spapr_drc.h |2 +-
 4 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index f7a19720dcdf..2bc3dbc25653 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -2023,7 +2023,7 @@ static void 
spapr_create_lmb_dr_connectors(sPAPRMachineState *spapr)
 
 addr = i * lmb_size + spapr->hotplug_memory.base;
 spapr_dr_connector_new(OBJECT(spapr), TYPE_SPAPR_DRC_LMB,
-   addr / lmb_size);
+   addr / lmb_size, NULL);
 }
 }
 
@@ -2118,7 +2118,7 @@ static void spapr_init_cpus(sPAPRMachineState *spapr)
 
 if (mc->has_hotpluggable_cpus) {
 spapr_dr_connector_new(OBJECT(spapr), TYPE_SPAPR_DRC_CPU,
-   (core_id / smp_threads) * smt);
+   (core_id / smp_threads) * smt, NULL);
 }
 
 if (i < boot_cores_nr) {
diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c
index 5260b5d363a0..061fff41654a 100644
--- a/hw/ppc/spapr_drc.c
+++ b/hw/ppc/spapr_drc.c
@@ -532,7 +532,7 @@ static void unrealize(DeviceState *d, Error **errp)
 }
 
 sPAPRDRConnector *spapr_dr_connector_new(Object *owner, const char *type,
- uint32_t id)
+ uint32_t id, Error **errp)
 {
 sPAPRDRConnector *drc = SPAPR_DR_CONNECTOR(object_new(type));
 char *prop_name;
@@ -542,7 +542,7 @@ sPAPRDRConnector *spapr_dr_connector_new(Object *owner, 
const char *type,
 prop_name = g_strdup_printf("dr-connector[%"PRIu32"]",
 spapr_drc_index(drc));
 object_property_add_child(owner, prop_name, OBJECT(drc), _abort);
-object_property_set_bool(OBJECT(drc), true, "realized", NULL);
+object_property_set_bool(OBJECT(drc), true, "realized", errp);
 g_free(prop_name);
 
 return drc;
diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
index d84abf1070a0..4b7882e3613d 100644
--- a/hw/ppc/spapr_pci.c
+++ b/hw/ppc/spapr_pci.c
@@ -1733,7 +1733,7 @@ static void spapr_phb_realize(DeviceState *dev, Error 
**errp)
 if (sphb->dr_enabled) {
 for (i = 0; i < PCI_SLOT_MAX * 8; i++) {
 spapr_dr_connector_new(OBJECT(phb), TYPE_SPAPR_DRC_PCI,
-   (sphb->index << 16) | i);
+   (sphb->index << 16) | i, NULL);
 }
 }
 
diff --git a/include/hw/ppc/spapr_drc.h b/include/hw/ppc/spapr_drc.h
index a7958d0a8d14..8651af2ffae0 100644
--- a/include/hw/ppc/spapr_drc.h
+++ b/include/hw/ppc/spapr_drc.h
@@ -248,7 +248,7 @@ uint32_t spapr_drc_index(sPAPRDRConnector *drc);
 sPAPRDRConnectorType spapr_drc_type(sPAPRDRConnector *drc);
 
 sPAPRDRConnector *spapr_dr_connector_new(Object *owner, const char *type,
- uint32_t id);
+ uint32_t id, Error **errp);
 sPAPRDRConnector *spapr_drc_by_index(uint32_t index);
 sPAPRDRConnector *spapr_drc_by_id(const char *type, uint32_t id);
 int spapr_drc_populate_dt(void *fdt, int fdt_offset, Object *owner,

[Qemu-devel] [for-2.10 PATCH 1/3] spapr_drc: abort if object_property_add_child() fails

object_property_add_child() can only fail in two cases:
- the child already has a parent, which shouldn't happen since the DRC was
  allocated a few lines above
- the parent already has a child with the same name, which would mean the
  caller tries to create a DRC that already exists

In both case, this is a QEMU bug and we should abort.

Signed-off-by: Greg Kurz 
---
 hw/ppc/spapr_drc.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c
index 47d94e782ac2..5260b5d363a0 100644
--- a/hw/ppc/spapr_drc.c
+++ b/hw/ppc/spapr_drc.c
@@ -541,7 +541,7 @@ sPAPRDRConnector *spapr_dr_connector_new(Object *owner, 
const char *type,
 drc->owner = owner;
 prop_name = g_strdup_printf("dr-connector[%"PRIu32"]",
 spapr_drc_index(drc));
-object_property_add_child(owner, prop_name, OBJECT(drc), NULL);
+object_property_add_child(owner, prop_name, OBJECT(drc), _abort);
 object_property_set_bool(OBJECT(drc), true, "realized", NULL);
 g_free(prop_name);

[Qemu-devel] [for-2.10 PATCH 0/3] spapr: fix PCI hotplug issue when PHBs don't have index

While working on PHB hotplug for 2.11, a bug was discovered in the PCI DR
logic in the PHB code: it relies on the PHB index property to be set but
it doesn't enforce it. It is hence possible to create two PHBs with the
same index (ie, the default value -1), even though this isn't expected
by the rest of the PHB code. The most visible consequence, is that PCI
hotplug doesn't work anymore.

It was agreed that the right fix would be to make the index property
mandatory. This is too an intrusive change to do during soft/hard
freeze though. It is postponed for 2.11.

In the meantime, we can at least have QEMU to detect the error early
and to exit with an error message, instead of silently going on with
half-broken PHBs.

David,

These patches were made with the future work on PHB hotplug in mind. If
the series is too long, a similar result can be achieved with this single
change in spapr_dr_connector_new():

-object_property_set_bool(OBJECT(drc), true, "realized", NULL);
+object_property_set_bool(OBJECT(drc), true, "realized", _fatal);
 
Just tell me, if you prefer the shorter version.

--
Greg

---

Greg Kurz (3):
  spapr_drc: abort if object_property_add_child() fails
  spapr_drc: add Error ** argument to spapr_dr_connector_new()
  spapr: error out if PHB fails to setup PCI DRCs


 hw/ppc/spapr.c |4 ++--
 hw/ppc/spapr_drc.c |6 +++---
 hw/ppc/spapr_pci.c |8 +++-
 include/hw/ppc/spapr_drc.h |2 +-
 4 files changed, 13 insertions(+), 7 deletions(-)

[Qemu-devel] [PATCH v3 3/3] Add new functions for whitelisting and their calls

The 'check_updated_properties' function keeps track of properties
that were added/removed from fields across qemu versions. The
'check_updated_sizes' function reduces false positives generated
especially while testing backward migration by keeping a list
of common size/version changes. The 'check_new_sections' function
is used to check for sections that got deprecated or were introduced
in different versions of qemu and will show as false positives while
testing forward migration. Improved the variable names and added
multiple blank newlines to keep Python PEP8 warning away.
---
 scripts/vmstate-static-checker.py | 226 ++
 1 file changed, 180 insertions(+), 46 deletions(-)

diff --git a/scripts/vmstate-static-checker.py 
b/scripts/vmstate-static-checker.py
index ae41e44..77e63f1 100755
--- a/scripts/vmstate-static-checker.py
+++ b/scripts/vmstate-static-checker.py
@@ -40,6 +40,107 @@ def bump_taint():
 if taint < 255:
 taint = taint + 1
 
+# Sections gain/lose new fields with time.
+# These are not name changes thats handled by another list.
+# These will be 'missing' or 'not found' in different versions of qemu
+
+
+def check_updated_properties(src_desc, field):
+src_desc = str(src_desc)
+field = str(field)
+updated_property = {
+  'ICH9LPC': ['ICH9LPC/smi_feat'],
+  'ide_bus/error': ['retry_sector_num', 'retry_nsector', 'retry_unit'],
+  'e1000': ['e1000/full_mac_state'],
+  'ich9_pm': ['ich9_pm/tco', 'ich9_pm/cpuhp']
+}
+
+if src_desc in updated_property and field in updated_property[src_desc]:
+return True
+
+return False
+
+
+# A lot of errors are generated due to differences in sizes some of which are 
false positives. This list
+# is used to save those common changes
+def check_updated_sizes(field, old_size, new_size):
+new_sizes_list = {
+'tally_counters.TxOk': [8, 64],
+'intel-iommu': [0, 1],
+'iommu-intel': [0, 1]
+}
+
+if field not in new_sizes_list:
+return False
+
+if (old_size in new_sizes_list[field] and new_size in 
new_sizes_list[field]):
+return True
+
+return False
+
+
+# With time new sections/hardwares supported and old ones are depreciated on
+# chipsets.
+# There is no separate list for new or dead sections as it's relative to which
+# qemu version you compare too.
+# Update this list with such sections.
+# some items in this list might overlap with changed sections names.
+def check_new_sections(sec):
+new_sections_list = [
+'virtio-balloon-device',
+'virtio-rng-device',
+'virtio-scsi-device',
+'virtio-blk-device',
+'virtio-serial-device',
+'virtio-net-device',
+'vhost-vsock-device',
+'virtio-input-host-device',
+'virtio-input-hid-device',
+'virtio-mouse-device',
+'virtio-keyboard-device',
+'virtio-vga',
+'virtio-input-device',
+'virtio-gpu-device',
+'virtio-tablet-device',
+'isa-pcspk',
+'qemu-xhci',
+'base-xhci',
+'vmgenid',
+'intel-iommu',
+'i8257',
+'i82801b11-bridge',
+'ivshmem',
+'ivshmem-doorbell',
+'ivshmem-plain',
+'usb-storage-device',
+'usb-storage-dev',
+'pci-qxl',
+'pci-uhci-usb',
+'pci-piix3',
+'pci-vga',
+'pci-bridge-seat',
+'pcie-root-port',
+'fw_cfg_io',
+'fw_cfg_mem',
+'exynos4210-ehci-usb',
+'sysbus-ehci-usb',
+'tegra2-ehci-usb',
+'kvm-apic',
+'fusbh200-ehci-usb',
+'apic',
+'apic-common',
+'xlnx,ps7-usb',
+'e1000e',
+'e1000-82544gc',
+'e1000-82545em']
+
+if sec in new_sections_list:
+return True
+
+return False
+
+# Fields might change name with time across qemu versions.
+
 
 def check_fields_match(name, s_field, d_field):
 if s_field == d_field:
@@ -286,12 +387,22 @@ def check_fields(src_fields, dest_fields, desc, sec):
 unused_count = s_item["size"] - d_item["size"]
 continue
 
-print('Section "' + sec + '", '
-  'Description "' + desc + '": '
-  'expected field "' + s_item["field"] + '", '
-  'got "' + d_item["field"] + '"; skipping rest')
-bump_taint()
-break
+# commit 20daa90a20d, extra field 'config' was added in newer 
releases
+# there will be a mismatch in the number of fields of irq_state 
and config
+# it's a known false positive so skip it
+if (desc in ["PCIDevice", "PCIEDevice"]):
+if((s_item["field"] in ["irq_state",

[Qemu-devel] [PATCH v3 0/3] Vmstate-static-checker.py fix upstream

This is an update to the script vmstate-static-checker.py. The whitelist has 
been updated and newer functions have been added to reduce the false
positives generated by the script while testing migration. The code has been
cleaned and updated to follow PEP8 guidelines.

dverma (3):
  Fix format and styles; make code more pythonic
  Update the existing whitelist
  Add new functions for whitelisting and their calls

 scripts/vmstate-static-checker.py | 335 --
 1 file changed, 250 insertions(+), 85 deletions(-)

-- 
1.8.3.1

[Qemu-devel] [PATCH v3 2/3] Update the existing whitelist

Appended newer fields and introduced new names in the whitelist
---
 scripts/vmstate-static-checker.py | 22 --
 1 file changed, 20 insertions(+), 2 deletions(-)

diff --git a/scripts/vmstate-static-checker.py 
b/scripts/vmstate-static-checker.py
index b416b66..ae41e44 100755
--- a/scripts/vmstate-static-checker.py
+++ b/scripts/vmstate-static-checker.py
@@ -49,7 +49,6 @@ def check_fields_match(name, s_field, d_field):
 # is used to whitelist such changes in each section / description.
 changed_names = {
 'apic': ['timer', 'timer_expiry'],
-'e1000': ['dev', 'parent_obj'],
 'ehci': ['dev', 'pcidev'],
 'I440FX': ['dev', 'parent_obj'],
 'ich9_ahci': ['card', 'parent_obj'],
@@ -73,7 +72,6 @@ def check_fields_match(name, s_field, d_field):
  'tmr.timer', 'ar.tmr.timer',
  'tmr.overflow_time', 'ar.tmr.overflow_time',
  'gpe', 'ar.gpe'],
-'rtl8139': ['dev', 'parent_obj'],
 'qxl': ['num_surfaces', 'ssd.num_surfaces'],
 'usb-ccid': ['abProtocolDataStructure', 
'abProtocolDataStructure.data'],
 'usb-host': ['dev', 'parent_obj'],
@@ -96,6 +94,26 @@ def check_fields_match(name, s_field, d_field):
   'mem_win_size', 'mig_mem_win_size',
   'io_win_addr', 'mig_io_win_addr',
   'io_win_size', 'mig_io_win_size'],
+'rtl8139': ['dev', 'parent_obj'],
+'e1000e': ['PCIDevice', 'PCIEDevice', 'intr_state', 
'redhat_7_3_intr_state'],
+'nec-usb-xhci': ['PCIDevice', 'PCIEDevice'],
+'xhci-intr': ['er_full_unused', 'er_full'],
+'e1000': ['dev', 'parent_obj',
+  'tx.ipcss', 'tx.props.ipcss',
+  'tx.ipcso', 'tx.props.ipcso',
+  'tx.ipcse', 'tx.props.ipcse',
+  'tx.tucss', 'tx.props.tucss',
+  'tx.tucso', 'tx.props.tucso',
+  'tx.tucse', 'tx.props.tucse',
+  'tx.paylen', 'tx.props.paylen',
+  'tx.hdr_len', 'tx.props.hdr_len',
+  'tx.mss', 'tx.props.mss',
+  'tx.sum_needed', 'tx.props.sum_needed',
+  'tx.ip', 'tx.props.ip',
+  'tx.tcp', 'tx.props.tcp',
+  'tx.ipcss', 'tx.props.ipcss',
+  'tx.ipcss', 'tx.props.ipcss',
+  ]
 }
 
 if name not in changed_names:
-- 
1.8.3.1

[Qemu-devel] [PATCH v3 1/3] Fix format and styles; make code more pythonic

- Format fixes, cleaned up the print statement
- Style fixes, e.g. changed "if not x in y" to "if x not in y"
- Improved variable names
---
 scripts/vmstate-static-checker.py | 111 +-
 1 file changed, 62 insertions(+), 49 deletions(-)

diff --git a/scripts/vmstate-static-checker.py 
b/scripts/vmstate-static-checker.py
index bcef7ee..b416b66 100755
--- a/scripts/vmstate-static-checker.py
+++ b/scripts/vmstate-static-checker.py
@@ -19,6 +19,11 @@
 # You should have received a copy of the GNU General Public License along
 # with this program; if not, see .
 
+#
+# 2017 Deepak Verma 
+# Added few functions and fields for whitelisting
+#
+
 import argparse
 import json
 import sys
@@ -26,6 +31,7 @@ import sys
 # Count the number of errors found
 taint = 0
 
+
 def bump_taint():
 global taint
 
@@ -92,7 +98,7 @@ def check_fields_match(name, s_field, d_field):
   'io_win_size', 'mig_io_win_size'],
 }
 
-if not name in changed_names:
+if name not in changed_names:
 return False
 
 if s_field in changed_names[name] and d_field in changed_names[name]:
@@ -100,6 +106,7 @@ def check_fields_match(name, s_field, d_field):
 
 return False
 
+
 def get_changed_sec_name(sec):
 # Section names can change -- see commit 292b1634 for an example.
 changes = {
@@ -114,16 +121,17 @@ def get_changed_sec_name(sec):
 return item
 return ""
 
+
 def exists_in_substruct(fields, item):
 # Some QEMU versions moved a few fields inside a substruct.  This
 # kept the on-wire format the same.  This function checks if
 # something got shifted inside a substruct.  For example, the
 # change in commit 1f42d22233b4f3d1a2933ff30e8d6a6d9ee2d08f
 
-if not "Description" in fields:
+if "Description" not in fields:
 return False
 
-if not "Fields" in fields["Description"]:
+if "Fields" not in fields["Description"]:
 return False
 
 substruct_fields = fields["Description"]["Fields"]
@@ -176,10 +184,10 @@ def check_fields(src_fields, dest_fields, desc, sec):
 except StopIteration:
 if d_iter_list == []:
 # We were not in a substruct
-print "Section \"" + sec + "\",",
-print "Description " + "\"" + desc + "\":",
-print "expected field \"" + s_item["field"] + "\",",
-print "while dest has no further fields"
+print('Section "' + sec + '", '
+  'Description "' + desc + '": '
+  'expected field "' + s_item["field"] + '", '
+  'while dest has no further fields')
 bump_taint()
 break
 
@@ -191,30 +199,28 @@ def check_fields(src_fields, dest_fields, desc, sec):
 advance_dest = True
 
 if unused_count != 0:
-if advance_dest == False:
+if not advance_dest:
 unused_count = unused_count - s_item["size"]
 if unused_count == 0:
 advance_dest = True
 continue
 if unused_count < 0:
-print "Section \"" + sec + "\",",
-print "Description \"" + desc + "\":",
-print "unused size mismatch near \"",
-print s_item["field"] + "\""
+print('Section "' + sec + '", '
+  'Description "' + desc + '": '
+  'unused size mismatch near "' + s_item["field"] + 
'"')
 bump_taint()
 break
 continue
 
-if advance_src == False:
+if not advance_src:
 unused_count = unused_count - d_item["size"]
 if unused_count == 0:
 advance_src = True
 continue
 if unused_count < 0:
-print "Section \"" + sec + "\",",
-print "Description \"" + desc + "\":",
-print "unused size mismatch near \"",
-print d_item["field"] + "\""
+print('Section "' + sec + '", '
+  'Description "' + desc + '": '
+  'unused size mismatch near "' + d_item["field"] + 
'"')
 bump_taint()
 break
 continue
@@ -262,16 +268,16 @@ def check_fields(src_fields, dest_fields, desc, sec):
 unused_count = s_item["size"] - d_item["size"]
 continue
 
-print "Section \"" + sec + "\",",
-print "Description \"" + desc + "\":",
-print "expected field \"" + s_item["field"] + "\",",
-print "got \"" + d_item["field"] + "\"; skipping rest"
+

Re: [Qemu-devel] [PATCH v4 2/5] hw/pci: introduce pcie-pci-bridge device

2017-08-07 Thread Marcel Apfelbaum


On 07/08/2017 19:42, Alexander Bezzubikov wrote:

2017-08-07 19:39 GMT+03:00 Marcel Apfelbaum :

On 05/08/2017 23:27, Aleksandr Bezzubikov wrote:


Introduce a new PCIExpress-to-PCI Bridge device,
which is a hot-pluggable PCI Express device and
supports devices hot-plug with SHPC.




Hi Aleksandr,


This device is intended to replace the DMI-to-PCI
Bridge in an overwhelming majority of use-cases.



Please drop the last line ( ... majority...)
It simply replaces it.






Signed-off-by: Aleksandr Bezzubikov 
---
   hw/pci-bridge/Makefile.objs |   2 +-
   hw/pci-bridge/pcie_pci_bridge.c | 212

   include/hw/pci/pci.h|   1 +
   3 files changed, 214 insertions(+), 1 deletion(-)
   create mode 100644 hw/pci-bridge/pcie_pci_bridge.c

diff --git a/hw/pci-bridge/Makefile.objs b/hw/pci-bridge/Makefile.objs
index c4683cf..666db37 100644
--- a/hw/pci-bridge/Makefile.objs
+++ b/hw/pci-bridge/Makefile.objs
@@ -1,4 +1,4 @@
-common-obj-y += pci_bridge_dev.o
+common-obj-y += pci_bridge_dev.o pcie_pci_bridge.o
   common-obj-$(CONFIG_PCIE_PORT) += pcie_root_port.o gen_pcie_root_port.o
   common-obj-$(CONFIG_PXB) += pci_expander_bridge.o
   common-obj-$(CONFIG_XIO3130) += xio3130_upstream.o xio3130_downstream.o
diff --git a/hw/pci-bridge/pcie_pci_bridge.c
b/hw/pci-bridge/pcie_pci_bridge.c
new file mode 100644
index 000..4127725
--- /dev/null
+++ b/hw/pci-bridge/pcie_pci_bridge.c
@@ -0,0 +1,212 @@
+/*
+ * QEMU Generic PCIE-PCI Bridge
+ *
+ * Copyright (c) 2017 Aleksandr Bezzubikov
+ *



Please replace the below license with:

" This work is licensed under the terms of the GNU GPL, version 2
   or later. See the COPYING file in the top-level directory."

to be sure it matches QEMU's license.




+ * Permission is hereby granted, free of charge, to any person obtaining
a copy
+ * of this software and associated documentation files (the "Software"),
to deal
+ * in the Software without restriction, including without limitation the
rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or
sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be
included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT
SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
IN
+ * THE SOFTWARE.
+ */
+
+#include "qemu/osdep.h"
+#include "qapi/error.h"
+#include "hw/pci/pci.h"
+#include "hw/pci/pci_bus.h"
+#include "hw/pci/pci_bridge.h"
+#include "hw/pci/msi.h"
+#include "hw/pci/shpc.h"
+#include "hw/pci/slotid_cap.h"
+
+typedef struct PCIEPCIBridge {
+/*< private >*/
+PCIBridge parent_obj;
+
+OnOffAuto msi;
+MemoryRegion shpc_bar;
+/*< public >*/
+} PCIEPCIBridge;
+
+#define TYPE_PCIE_PCI_BRIDGE_DEV  "pcie-pci-bridge"
+#define PCIE_PCI_BRIDGE_DEV(obj) \
+OBJECT_CHECK(PCIEPCIBridge, (obj), TYPE_PCIE_PCI_BRIDGE_DEV)
+
+static void pcie_pci_bridge_realize(PCIDevice *d, Error **errp)
+{
+PCIBridge *br = PCI_BRIDGE(d);
+PCIEPCIBridge *pcie_br = PCIE_PCI_BRIDGE_DEV(d);
+int rc, pos;
+
+pci_bridge_initfn(d, TYPE_PCI_BUS);
+
+d->config[PCI_INTERRUPT_PIN] = 0x1;
+memory_region_init(_br->shpc_bar, OBJECT(d), "shpc-bar",
+   shpc_bar_size(d));
+rc = shpc_init(d, >sec_bus, _br->shpc_bar, 0, errp);
+if (rc) {
+goto error;
+}
+
+rc = pcie_cap_init(d, 0, PCI_EXP_TYPE_PCI_BRIDGE, 0, errp);
+if (rc < 0) {
+goto cap_error;
+}
+
+pos = pci_add_capability(d, PCI_CAP_ID_PM, 0, PCI_PM_SIZEOF, errp);
+if (pos < 0) {
+goto pm_error;
+}
+d->exp.pm_cap = pos;
+pci_set_word(d->config + pos + PCI_PM_PMC, 0x3);
+
+pcie_cap_arifwd_init(d);
+pcie_cap_deverr_init(d);
+
+rc = pcie_aer_init(d, PCI_ERR_VER, 0x100, PCI_ERR_SIZEOF, errp);
+if (rc < 0) {
+goto aer_error;
+}
+
+if (pcie_br->msi != ON_OFF_AUTO_OFF) {
+rc = msi_init(d, 0, 1, true, true, errp);
+if (rc < 0) {
+goto msi_error;
+}
+}
+pci_register_bar(d, 0, PCI_BASE_ADDRESS_SPACE_MEMORY |
+ PCI_BASE_ADDRESS_MEM_TYPE_64, _br->shpc_bar);
+return;
+
+msi_error:
+pcie_aer_exit(d);
+aer_error:
+pm_error:
+pcie_cap_exit(d);
+cap_error:
+shpc_free(d);
+error:
+pci_bridge_exitfn(d);
+}
+
+static void pcie_pci_bridge_exit(PCIDevice *d)
+{
+PCIEPCIBridge *bridge_dev =

[Qemu-devel] [PATCH v5] vmgenid: replace x-write-pointer-available hack