date:20141120

Re: [Qemu-devel] [PATCH] gtk: Don't crash if -nodefaults

2014-11-20 Thread Gerd Hoffmann

On Fr, 2014-11-21 at 09:59 +0800, Fam Zheng wrote:
> This fixes a crash by just skipping the vte resize hack if cur is
> NULL.

Picked up for 2.2.

thanks,
  Gerd

Re: [Qemu-devel] [v2 2/2] migration: Implement multiple compression threads

2014-11-20 Thread Li, Liang Z

> > +int migrate_compress_threads(void)
> > +{
> > +MigrationState *s;
> > +
> > +s = migrate_get_current();
> > +
> > +return s->compress_thread_count;
> > +}
> > +
> >  int migrate_use_xbzrle(void)
> >  {
> >  MigrationState *s;
> > @@ -697,4 +795,5 @@ void migrate_fd_connect(MigrationState *s)
> >  
> >  qemu_thread_create(&s->thread, "migration", migration_thread, s,
> > QEMU_THREAD_JOINABLE);
> > +migrate_compress_threads_create(s);


> don't create compress_threads always.
> It may be better:

> if (!migrate_use_xbzrle()) {
> migrate_compress_threads_create(s);
> }

Thanks for your comments, in fact,  the multiple thread compression can co-work 
with xbrzle, which can help to accelerate live migration.

> BTW, this patch is too big to review. Spliting it into some patch will be 
> welcome.

I am doing it.

Re: [Qemu-devel] [v2 2/2] migration: Implement multiple compression threads

2014-11-20 Thread ChenLiang

On 2014/11/6 19:08, Li Liang wrote:

> Instead of sending the guest memory directly, this solution compress
> the ram page before sending, after receiving, the data will be
> decompressed.
> This feature can help to reduce the data transferred about
> 60%, this is very useful when the network bandwidth is limited,
> and the migration time can also be reduced about 70%. The
> feature is off by default, following the document
> docs/multiple-compression-threads.txt for information to use it.
> 
> Reviewed-by: Eric Blake 
> Signed-off-by: Li Liang 
> ---
>  arch_init.c   | 435 
> --
>  hmp-commands.hx   |  56 ++
>  hmp.c |  57 ++
>  hmp.h |   6 +
>  include/migration/migration.h |  12 +-
>  include/migration/qemu-file.h |   1 +
>  migration.c   |  99 ++
>  monitor.c |  21 ++
>  qapi-schema.json  |  88 -
>  qmp-commands.hx   | 131 +
>  10 files changed, 890 insertions(+), 16 deletions(-)
> 
> diff --git a/arch_init.c b/arch_init.c
> index 88a5ba0..a27d87b 100644
> --- a/arch_init.c
> +++ b/arch_init.c
> @@ -24,6 +24,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #ifndef _WIN32
>  #include 
>  #include 
> @@ -126,6 +127,7 @@ static uint64_t bitmap_sync_count;
>  #define RAM_SAVE_FLAG_CONTINUE 0x20
>  #define RAM_SAVE_FLAG_XBZRLE   0x40
>  /* 0x80 is reserved in migration.h start with 0x100 next */
> +#define RAM_SAVE_FLAG_COMPRESS_PAGE0x100
>  
>  static struct defconfig_file {
>  const char *filename;
> @@ -332,6 +334,177 @@ static uint64_t migration_dirty_pages;
>  static uint32_t last_version;
>  static bool ram_bulk_stage;
>  
> +#define COMPRESS_BUF_SIZE (TARGET_PAGE_SIZE + 16)
> +#define MIG_BUF_SIZE (COMPRESS_BUF_SIZE + 256 + 16)
> +struct MigBuf {
> +int buf_index;
> +uint8_t buf[MIG_BUF_SIZE];
> +};
> +
> +typedef struct MigBuf MigBuf;
> +
> +static void migrate_put_byte(MigBuf *f, int v)
> +{
> +f->buf[f->buf_index] = v;
> +f->buf_index++;
> +}
> +
> +static void migrate_put_be16(MigBuf *f, unsigned int v)
> +{
> +migrate_put_byte(f, v >> 8);
> +migrate_put_byte(f, v);
> +}
> +
> +static void migrate_put_be32(MigBuf *f, unsigned int v)
> +{
> +migrate_put_byte(f, v >> 24);
> +migrate_put_byte(f, v >> 16);
> +migrate_put_byte(f, v >> 8);
> +migrate_put_byte(f, v);
> +}
> +
> +static void migrate_put_be64(MigBuf *f, uint64_t v)
> +{
> +migrate_put_be32(f, v >> 32);
> +migrate_put_be32(f, v);
> +}
> +
> +static void migrate_put_buffer(MigBuf *f, const uint8_t *buf, int size)
> +{
> +int l;
> +
> +while (size > 0) {
> +l = MIG_BUF_SIZE - f->buf_index;
> +if (l > size) {
> +l = size;
> +}
> +memcpy(f->buf + f->buf_index, buf, l);
> +f->buf_index += l;
> +buf += l;
> +size -= l;
> +}
> +}
> +
> +static size_t migrate_save_block_hdr(MigBuf *f, RAMBlock *block,
> +ram_addr_t offset, int cont, int flag)
> +{
> +size_t size;
> +
> +migrate_put_be64(f, offset | cont | flag);
> +size = 8;
> +
> +if (!cont) {
> +migrate_put_byte(f, strlen(block->idstr));
> +migrate_put_buffer(f, (uint8_t *)block->idstr,
> +strlen(block->idstr));
> +size += 1 + strlen(block->idstr);
> +}
> +return size;
> +}
> +
> +static int migrate_qemu_add_compress(MigBuf *f,  const uint8_t *p,
> +int size, int level)
> +{
> +uLong  blen = COMPRESS_BUF_SIZE;
> +if (compress2(f->buf + f->buf_index + sizeof(int), &blen, (Bytef *)p,
> +size, level) != Z_OK) {
> +error_report("Compress Failed!\n");
> +return 0;
> +}
> +migrate_put_be32(f, blen);
> +f->buf_index += blen;
> +return blen + sizeof(int);
> +}
> +
> +enum {
> +COM_DONE = 0,
> +COM_START,
> +};
> +
> +static int  compress_thread_count;
> +static int  decompress_thread_count;
> +
> +struct compress_param {
> +int state;
> +MigBuf migbuf;
> +RAMBlock *block;
> +ram_addr_t offset;
> +bool last_stage;
> +int ret;
> +int bytes_sent;
> +uint8_t *p;
> +int cont;
> +bool bulk_stage;
> +};
> +
> +typedef struct compress_param compress_param;
> +compress_param *comp_param;
> +
> +struct decompress_param {
> +int state;
> +void *des;
> +uint8 compbuf[COMPRESS_BUF_SIZE];
> +int len;
> +};
> +typedef struct decompress_param decompress_param;
> +
> +static decompress_param *decomp_param;
> +bool incomming_migration_done;
> +static bool quit_thread;
> +
> +static int save_compress_ram_page(compress_param *param);
> +
> +
> +static void *do_data_compress(void *opaque)
> +{
> +compress_param *param = opaque;
> +while (!quit_thread) {
> +if (param->state == COM_START) {
> +save_compress_ram_page(param);
> +param->

Re: [Qemu-devel] [PATCH 00/17] RFC: userfault v2

2014-11-20 Thread zhanghailiang


On 2014/11/21 1:38, Andrea Arcangeli wrote:

Hi,

On Thu, Nov 20, 2014 at 10:54:29AM +0800, zhanghailiang wrote:

Yes, you are right. This is what i really want, bypass all non-present faults
and only track strict wrprotect faults. ;)

So, do you plan to support that in the userfault API?


Yes I think it's good idea to support wrprotect/COW faults too.



Great! Then i can expect your patches. ;)


I just wanted to understand if there was any other reason why you
needed only wrprotect faults, because the non-present faults didn't
look like a big performance concern if they triggered in addition to
wrprotect faults, but it's certainly ok to optimize them away so it's
fully optimal.



Er, you have got the answer, no special, it's only for optimality.


All it takes to differentiate the behavior should be one more bit
during registration so you can select non-present, wrprotect faults or
both. postcopy live migration would select only non-present faults,
postcopy live snapshot would select only wrprotect faults, anything
like distributed shared memory supporting shared readonly access and
exclusive write access, would select both flags.



It is really flexible in this way.


I just sent an (unfortunately) longish but way more detailed email
about live snapshotting with userfaultfd but I just wanted to give a
shorter answer here too :).



Thanks for your explanation, and your patience. It is really useful,
now i know more details about why 'fork & dump live snapshot' scenario
is not acceptable. Thanks :)


Thanks,
Andrea

.

Re: [Qemu-devel] [PATCH 3/4] sdhci: Support SDHCI devices on PCI

2014-11-20 Thread Gerd Hoffmann

  Hi,

> I know recent Intel chips (eg, baytrail) have a builtin sdhci
> controller (eg, 8086:0f16).  However, that has quirks defined in the
> Linux driver.  Basic functionality still does seem to work though when
> I use those ids in qemu.  The same basic functionality also seems to
> work when I use 1b36:0005 as well.
> 
> Is there a preference then to use the redhat ids?

Yes, for pci class devices (so the guest driver do not match specific
pci vendor/device ids, except for quirks) we started using the redhat
ids.

> Gerd, you seem to
> be in charge of the redhat pci ids - are you okay if I us one (should
> it be the existing 0005 or add 0006)?

Just grab the first free (which seems to be 0005) and add a line to
docs/specs/pci-ids.txt with your patch.

cheers,
  Gerd

Re: [Qemu-devel] [v2 2/2] migration: Implement multiple compression threads

2014-11-20 Thread Li, Liang Z

> > +static void migrate_put_be32(MigBuf *f, unsigned int v)
> > +{
> > +migrate_put_byte(f, v >> 24);
> > +migrate_put_byte(f, v >> 16);
> > +migrate_put_byte(f, v >> 8);
> > +migrate_put_byte(f, v);
> > +}
> > +
> > +static void migrate_put_be64(MigBuf *f, uint64_t v)
> > +{
> > +migrate_put_be32(f, v >> 32);
> > +migrate_put_be32(f, v);
> > +}
> > +

> This feels like you're doing something very similar to 
> the buffered file code that recently went in; could you
> reuse qemu_bufopen or the QEMUSizedBuffer?
> I think if you could use the qemu_buf somehow (maybe with
> modifications?) then you could avoid a lot of the 'if'd
> code below, because you'd always be working with a QEMUFile,
> it would just be a different QEMUFile.

I will do it in the next version patch.  

> > +static void *do_data_compress(void *opaque)
> > +{
> > +compress_param *param = opaque;
> > +while (!quit_thread) {
> > +if (param->state == COM_START) {
> > +save_compress_ram_page(param);
> > +param->state = COM_DONE;
> > + } else {
> > + g_usleep(1);
> 
> > There has to be a better way than heaving your thread spin
> > with sleeps; qemu_event or semaphore or something?

I will use QemuCond and QemuMutex  instead.

> > +static int save_compress_ram_page(compress_param *param)
> > +{
> > +int bytes_sent = param->bytes_sent;
> > +int blen = COMPRESS_BUF_SIZE;
> > +int cont = param->cont;
> > +uint8_t *p = param->p;
> > +int ret = param->ret;
> > +RAMBlock *block = param->block;
> > +ram_addr_t offset = param->offset;
> > +bool last_stage = param->last_stage;
> > +/* In doubt sent page as normal */
> > +XBZRLE_cache_lock();
> > +ram_addr_t current_addr = block->offset + offset;
> > +if (ret != RAM_SAVE_CONTROL_NOT_SUPP) {
> > +if (ret != RAM_SAVE_CONTROL_DELAYED) {
> > +if (bytes_sent > 0) {
> > +atomic_inc(&acct_info.norm_pages);
> > + } else if (bytes_sent == 0) {
> > +atomic_inc(&acct_info.dup_pages);
> > + }
> > +}
> > +} else if (is_zero_range(p, TARGET_PAGE_SIZE)) {
> > +atomic_inc(&acct_info.dup_pages);
> > +bytes_sent = migrate_save_block_hdr(¶m->migbuf, block, offset, 
> > cont,
> > + RAM_SAVE_FLAG_COMPRESS);
> > +migrate_put_byte(¶m->migbuf, 0);
> > +bytes_sent++;
> > +/* Must let xbzrle know, otherwise a previous (now 0'd) cached
> > + * page would be stale
> > + */
> > +xbzrle_cache_zero_page(current_addr);
> > +} else if (!param->bulk_stage && migrate_use_xbzrle()) {
> > +bytes_sent = save_xbzrle_page(¶m->migbuf, &p, current_addr, 
> > block,
> > +  offset, cont, last_stage, true);
> > +}
> > +XBZRLE_cache_unlock();
> > +/* XBZRLE overflow or normal page */

> I wonder if it's worth the complexity of doing the zero check
> and the xbzrle if you're already doing compression?  I assume
> zlib is going to handle a zero page reasonably well anyway?

Yes, the test show it's worth, with zero check is time will be shorter. The 
reason for checking the xbzrle is that  I want the compression co-work with 
xbzrle, 
using xbzrle can reduce the amount of data transferred.

> > +static uint64_t bytes_transferred;
> > +
> > +static void flush_compressed_data(QEMUFile *f)
> > +{
> > +int idx;
> > +if (!migrate_use_compress()) {
> > +return;
> > +}
> > +
> > +for (idx = 0; idx < compress_thread_count; idx++) {
> > +while (comp_param[idx].state != COM_DONE) {
> > +g_usleep(0);
> > +}

> Again, some type of event/semaphore rather than busy sleeping;
> and also I don't understand how the different threads keep everything
> in order - can you add some comments (or maybe notes in the docs)
> that explain how it all works?

I will add some comments in next version.

> >  }
> >  } else {
> > -bytes_sent = ram_save_page(f, block, offset, last_stage);
> > -
> > -/* if page is unmodified, continue to the next */
> > -if (bytes_sent > 0) {
> > -last_sent_block = block;
> > -break;
> > +if (!migrate_use_compress()) {
> > +bytes_sent = ram_save_page(f, block, offset, last_stage);
> > +/* if page is unmodified, continue to the next */
> > +if (bytes_sent > 0) {
> > +last_sent_block = block;
> > +break;
> > +}
> > +} else {
> > +cont = (block == last_sent_block) ? RAM_SAVE_FLAG_CONTINUE 
> > : 0;
> > +p = memory_region_get_ram_ptr(block->mr) + offset;
> > +ret = ram_control_save_page(f, block->offset,
> > +   offset, TARGET_PAGE_SIZE, &len);
> > +

Re: [Qemu-devel] [PATCH v4 19/47] Rework loadvm path for subloops

2014-11-20 Thread David Gibson

On Wed, Nov 19, 2014 at 05:50:11PM +, Dr. David Alan Gilbert wrote:
> * David Gibson (da...@gibson.dropbear.id.au) wrote:
> > On Fri, Oct 03, 2014 at 06:47:25PM +0100, Dr. David Alan Gilbert (git) 
> > wrote:
> > > From: "Dr. David Alan Gilbert" 
> > > 
> > > Postcopy needs to have two migration streams loading concurrently;
> > > one from memory (with the device state) and the other from the fd
> > > with the memory transactions.
> > > 
> > > Split the core of qemu_loadvm_state out so we can use it for both.
> > > 
> > > Allow the inner loadvm loop to quit and signal whether the parent
> > > should.
> > > 
> > > loadvm_handlers is made static since it's lifetime is greater
> > > than the outer qemu_loadvm_state.
> > 
> > Maybe it's just me, but "made static" to me indicates either a change
> > from fully-global to module-global, or (function) local automatic to
> > local static, not a change from function local-automatic to
> > module-global as here.
> > 
> > It's also not clear from this patch alone why the lifetime of
> > loadvm_handlers now needs to exceed that of qemu_loadvm_state().
> 
> OK, how about if I reworked that last sentence to be:
> 
>loadvm_handlers is made module-global to survive beyond the lifetime
>of the outer qemu_loadvm_state since it may still be in use by
>a subloop in the postcopy listen thread.

Yeah, that's better.  A global seems ugly though.  Would it be better
to dynamically allocate the list head and pass a pointer into the
listen thread, or even to pass the list head by value into the listen
thread.

The individual list elements need to be cleaned up at some point
anyway, so I don't think that introduces any lifetime questions that
weren't already there.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


pgpOgsmPM1cgz.pgp
Description: PGP signature

Re: [Qemu-devel] [PATCH v4 24/47] Allow savevm handlers to state whether they could go into postcopy

2014-11-20 Thread David Gibson

On Wed, Nov 19, 2014 at 05:53:54PM +, Dr. David Alan Gilbert wrote:
> * David Gibson (da...@gibson.dropbear.id.au) wrote:
> > On Fri, Oct 03, 2014 at 06:47:30PM +0100, Dr. David Alan Gilbert (git) 
> > wrote:
> > > From: "Dr. David Alan Gilbert" 
> > > 
> > > Use that to split the qemu_savevm_state_pending counts into postcopiable
> > > and non-postcopiable amounts
> > > 
> > > Signed-off-by: Dr. David Alan Gilbert 
> > > ---
> > >  arch_init.c |  7 +++
> > >  include/migration/vmstate.h |  2 +-
> > >  include/sysemu/sysemu.h |  4 +++-
> > >  migration.c |  9 -
> > >  savevm.c| 23 +++
> > >  5 files changed, 38 insertions(+), 7 deletions(-)
> > > 
> > > diff --git a/arch_init.c b/arch_init.c
> > > index 6970733..44072d8 100644
> > > --- a/arch_init.c
> > > +++ b/arch_init.c
> > > @@ -1192,6 +1192,12 @@ static int ram_load(QEMUFile *f, void *opaque, int 
> > > version_id)
> > >  return ret;
> > >  }
> > >  
> > > +/* RAM's always up for postcopying */
> > > +static bool ram_can_postcopy(void *opaque)
> > > +{
> > > +return true;
> > > +}
> > > +
> > >  static SaveVMHandlers savevm_ram_handlers = {
> > >  .save_live_setup = ram_save_setup,
> > >  .save_live_iterate = ram_save_iterate,
> > > @@ -1199,6 +1205,7 @@ static SaveVMHandlers savevm_ram_handlers = {
> > >  .save_live_pending = ram_save_pending,
> > >  .load_state = ram_load,
> > >  .cancel = ram_migration_cancel,
> > > +.can_postcopy = ram_can_postcopy,
> > 
> > Is there actually any plausible device for which you'd need a callback
> > here, rather than just having a static bool?
> > 
> > On the other hand, it does seem kind of plausible that there might be
> > situations in which some data from a device must be pre-copied, but
> > more can be post-copied, which would necessitate extending the
> > per-handler callback to return quantities for both.
> 
> It's cheap enough and I couldn't make a strong argument about
> any possible device, so I just used the function.

Ok.  I still wonder if it might be better to instead extend
the save_live_pending callback in order to return both
non-postcopyable and postcopyable quantites.  It allows for the case
of a postcopyable device which has some non-postcopyable data - and
with any postcopyable device other than RAM, it seems likely that
there will need to be some precopied metadata at least.  Plus it
avoids adding another callback.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


pgpvOHIjJm3cD.pgp
Description: PGP signature

Re: [Qemu-devel] [v2 2/2] migration: Implement multiple compression threads

2014-11-20 Thread Zhang, Yang Z

Eric Blake wrote on 2014-11-06:

Hi Eric

Thanks for your review and comment.

> On 11/06/2014 12:08 PM, Li Liang wrote:
>> Instead of sending the guest memory directly, this solution compress 
>> the ram page before sending, after receiving, the data will be 
>> decompressed.
>> This feature can help to reduce the data transferred about 60%, this 
>> is very useful when the network bandwidth is limited, and the 
>> migration time can also be reduced about 70%. The feature is off by 
>> default, following the document docs/multiple-compression-threads.txt
>> for information to use it.
>> 
>> Reviewed-by: Eric Blake 
> 
> Please DON'T add this line unless the author spelled it out (or if 
> they mentioned that it would be okay if you fix minor issues).  I 
> intentionally omitted a reviewed-by on v1:
> 
> https://lists.gnu.org/archive/html/qemu-devel/2014-11/msg00672.html
> 
> because I was not happy with the patch as it was presented and did not 
> think the work to fix it was trivial.  Furthermore, my review of v1 
> was just over the interface, and not the entire patch; there are very 
> likely still bugs lurking in the .c files.  Once again, I'm going to 
> limit my review of v2 to the interface (at least in this email):
> 
>> Signed-off-by: Li Liang 
>> ---
>> 
>> +++ b/qapi-schema.json
>> @@ -491,13 +491,17 @@
>>  #  to enable the capability on the source VM. The feature is
>>  disabled by #  default. (since 1.6) #
>> +# @compress: Using the multiple compression threads to accelerate
>> +live migration. +#  This feature can help to reduce the
>> migration traffic, by sending +#  compressed pages. The feature
>> is disabled by default. (since 2.3) +#  # @auto-converge: If enabled, 
>> QEMU will automatically throttle down the
> guest
>>  #  to speed up convergence of RAM migration. (since 1.6)
>>  #
>>  # Since: 1.2
>>  ##
>>  { 'enum': 'MigrationCapability',
>> -  'data': ['xbzrle', 'rdma-pin-all', 'auto-converge', 'zero-blocks'] 
>> }
>> +  'data': ['xbzrle', 'rdma-pin-all', 'auto-converge', 'zero-blocks', 
>> + 'compress'] }
>> 
> 
> I'll repeat what I said on v1 (but this time, with some links to back 
> it up :)

Agree. Additional commands lead to complexity. 

> 
> We really need to avoid a proliferation of new commands, two per 
> tunable does not scale well.  I think now is the time to implement my 
> earlier suggestion at making MigrationCapability become THE resource for 
> tunables:
> 
> https://lists.gnu.org/archive/html/qemu-devel/2014-03/msg02274.html

I see that you said you are trying to clean up this part. Have you done it? It 
is helpful if you can provide some draft patches or help on this since you are 
more professional than us on this part.

Best regards,
Yang

[Qemu-devel] [Bug 1368815] Re: qemu-img convert intermittently corrupts output images

2014-11-20 Thread Tony Breeds

Awesome.

Thanks!

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1368815

Title:
  qemu-img convert intermittently corrupts output images

Status in OpenStack Compute (Nova):
  In Progress
Status in QEMU:
  In Progress
Status in “qemu” package in Ubuntu:
  Fix Released
Status in “qemu” source package in Trusty:
  Triaged
Status in “qemu” source package in Utopic:
  Triaged
Status in “qemu” source package in Vivid:
  Fix Released

Bug description:
  ==
  Impact: occasional qcow2 corruption
  Test case: see the qemu-img command below
  Regression potential: this cherrypicks a patch from upstream to a 
not-insignificantly older qemu source tree.  While the cherrypick seems sane, 
it's possible that there are subtle interactions with the other delta.  I'd 
really like for a full qa-regression-test qemu testcase to be run against this 
package.
  ==

  -- Found in releases qemu-2.0.0, qemu-2.0.2, qemu-2.1.0. Tested on
  Ubuntu 14.04 using Ext4 filesystems.

  The command

qemu-img convert -O raw inputimage.qcow2 outputimage.raw

  intermittently creates corrupted output images, when the input image
  is not yet fully synchronized to disk. While the issue has actually
  been discovered in operation of of OpenStack nova, it can be
  reproduced "easily" on command line using

cat $SRC_PATH > $TMP_PATH && $QEMU_IMG_PATH convert -O raw $TMP_PATH
  $DST_PATH && cksum $DST_PATH

  on filesystems exposing this behavior. (The difficult part of this
  exercise is to prepare a filesystem to reliably trigger this race. On
  my test machine some filesystems are affected while other aren't, and
  unfortunately I haven't found the relevant difference between them,
  yet. Possible it's timing issues completely out of userspace control
  ...)

  The root cause, however, is the same as in

http://lists.gnu.org/archive/html/coreutils/2011-04/msg00069.html

  and it can be solved the same way as suggested in

http://lists.gnu.org/archive/html/coreutils/2011-04/msg00102.html

  In qemu, file block/raw-posix.c use the FIEMAP_FLAG_SYNC, i.e change

  f.fm.fm_flags = 0;

  to

  f.fm.fm_flags = FIEMAP_FLAG_SYNC;

  As discussed in the thread mentioned above, retrieving a page cache
  coherent map of file extents is possible only after fsync on that
  file.

  See also

https://bugs.launchpad.net/nova/+bug/1350766

  In that bug report filed against nova, fsync had been suggested to be
  performed by the framework invoking qemu-img. However, as the choice
  of fiemap -- implying this otherwise unneeded fsync of a temporary
  file  -- is not made by the caller but by qemu-img, I agree with the
  nova bug reviewer's objection to put it into nova. The fsync should
  instead be triggered by qemu-img utilizing the FIEMAP_FLAG_SYNC,
  specifically intended for that purpose.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1368815/+subscriptions

[Qemu-devel] [Bug 1392504] Re: USB Passthrough is not working anymore

2014-11-20 Thread Serge Hallyn

@Leen,

I am pushing qemu 2.1+dfsg-4ubuntu6.3~ppa1 (which includes the fix
proposed by Gonglei in comment #7) to ppa:serge-hallyn/libvirt-testing.
Please let us know whether it does in fact solve your issue.

** Also affects: qemu (Ubuntu)
   Importance: Undecided
   Status: New

** Changed in: qemu (Ubuntu)
   Status: New => Incomplete

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1392504

Title:
  USB Passthrough is not working anymore

Status in QEMU:
  New
Status in “qemu” package in Ubuntu:
  Incomplete

Bug description:
  After upgrading from Ubuntu 14.04 to Ubuntu 14.10 USB passthrough in
  QEMU (version is now 2.1.0 - Debian2.1+dfsg-4ubuntu6.1) is not working
  any more. Already tried to remove and attach the USB devices. I use 1
  USB2 HDD  +  1 USB3 HDD to a virtual linux machine; 1 USB2 HDD to a
  virual FNAS machine and a USB 2camera to virtual Win7 machine. All
  these devices are not working any more.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1392504/+subscriptions

Re: [Qemu-devel] [PATCH for-2.2] acpi-build: mark RAM dirty on table update

2014-11-20 Thread Amit Shah

On (Thu) 20 Nov 2014 [09:16:31], Igor Mammedov wrote:
> On Thu, 20 Nov 2014 09:49:20 +0530
> Amit Shah  wrote:
> 
> > On (Wed) 19 Nov 2014 [11:08:46], Igor Mammedov wrote:
> > > On Wed, 19 Nov 2014 12:51:00 +0530
> > > Amit Shah  wrote:
> > 
> > > > > -static void *acpi_add_rom_blob(AcpiBuildState *build_state, GArray
> > > > > *blob, +static ram_addr_t acpi_add_rom_blob(AcpiBuildState
> > > > > *build_state, GArray *blob, const char *name)
> > > > >  {
> > > > >  return rom_add_blob(name, blob->data, acpi_data_len(blob), -1,
> > > > > name, @@ -1777,6 +1781,7 @@ void acpi_setup(PcGuestInfo *guest_info)
> > > > >  /* Now expose it all to Guest */
> > > > >  build_state->table_ram = acpi_add_rom_blob(build_state,
> > > > > tables.table_data, ACPI_BUILD_TABLE_FILE);
> > > > > +assert(build_state->table_ram != RAM_ADDR_MAX);
> > > > >  build_state->table_size = acpi_data_len(tables.table_data);
> > > > 
> > > > Isn't an assert too strong if this happens during hotplug?
> > > > 
> > > > I'm trying to follow this code, but looks like this isn't called in
> > > > the hotplug path - is that right?
> > > yep, it's called only at startup
> > 
> > Thanks; what's the path taken for hotplug, then?  (Ensuring we have
> > that case covered too).
> In case of hotplug ACPI blob doesn't change at all (it has all
> possible device objects in it). It changes on the first time BIOS
> fetches ROM blob and on reset.
> Later during hotplug, guest gets SCI interrupt and runs related to
> hotplug event AML method, which fishes out bits necessary for hotplug
> via corresponding mmio interface.

Excellent, thanks!

Amit

[Qemu-devel] [Bug 1368815] Re: qemu-img convert intermittently corrupts output images

2014-11-20 Thread Serge Hallyn

Hi Tony,

yes, I've uploaded a proposed fix for trusty-proposed earlier today.  It
should be available for testing as soon as it is accepted.

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1368815

Title:
  qemu-img convert intermittently corrupts output images

Status in OpenStack Compute (Nova):
  In Progress
Status in QEMU:
  In Progress
Status in “qemu” package in Ubuntu:
  Fix Released
Status in “qemu” source package in Trusty:
  Triaged
Status in “qemu” source package in Utopic:
  Triaged
Status in “qemu” source package in Vivid:
  Fix Released

Bug description:
  ==
  Impact: occasional qcow2 corruption
  Test case: see the qemu-img command below
  Regression potential: this cherrypicks a patch from upstream to a 
not-insignificantly older qemu source tree.  While the cherrypick seems sane, 
it's possible that there are subtle interactions with the other delta.  I'd 
really like for a full qa-regression-test qemu testcase to be run against this 
package.
  ==

  -- Found in releases qemu-2.0.0, qemu-2.0.2, qemu-2.1.0. Tested on
  Ubuntu 14.04 using Ext4 filesystems.

  The command

qemu-img convert -O raw inputimage.qcow2 outputimage.raw

  intermittently creates corrupted output images, when the input image
  is not yet fully synchronized to disk. While the issue has actually
  been discovered in operation of of OpenStack nova, it can be
  reproduced "easily" on command line using

cat $SRC_PATH > $TMP_PATH && $QEMU_IMG_PATH convert -O raw $TMP_PATH
  $DST_PATH && cksum $DST_PATH

  on filesystems exposing this behavior. (The difficult part of this
  exercise is to prepare a filesystem to reliably trigger this race. On
  my test machine some filesystems are affected while other aren't, and
  unfortunately I haven't found the relevant difference between them,
  yet. Possible it's timing issues completely out of userspace control
  ...)

  The root cause, however, is the same as in

http://lists.gnu.org/archive/html/coreutils/2011-04/msg00069.html

  and it can be solved the same way as suggested in

http://lists.gnu.org/archive/html/coreutils/2011-04/msg00102.html

  In qemu, file block/raw-posix.c use the FIEMAP_FLAG_SYNC, i.e change

  f.fm.fm_flags = 0;

  to

  f.fm.fm_flags = FIEMAP_FLAG_SYNC;

  As discussed in the thread mentioned above, retrieving a page cache
  coherent map of file extents is possible only after fsync on that
  file.

  See also

https://bugs.launchpad.net/nova/+bug/1350766

  In that bug report filed against nova, fsync had been suggested to be
  performed by the framework invoking qemu-img. However, as the choice
  of fiemap -- implying this otherwise unneeded fsync of a temporary
  file  -- is not made by the caller but by qemu-img, I agree with the
  nova bug reviewer's objection to put it into nova. The fsync should
  instead be triggered by qemu-img utilizing the FIEMAP_FLAG_SYNC,
  specifically intended for that purpose.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1368815/+subscriptions

Re: [Qemu-devel] [PATCH v4 00/47] Postcopy implementation

2014-11-20 Thread zhanghailiang


Hi David,

When i migrated VM in postcopy way when configuring VM with '-realtime 
mlock=on' option,
It failed, and reports "postcopy_ram_hosttest: remap_anon_pages not available: File 
exists" in destination,

Is it a bug of userfaultfd API?

cc: Andrea

reproduce Steps:
Source:
qemu-postcopy/qemu # x86_64-softmmu/qemu-system-x86_64 -msg timestamp=on \
-machine pc-i440fx-2.2,accel=kvm -m 1024 -realtime mlock=on -smp 4 \
-hda /mnt/sdb/pure_IMG/redhat/redhat-6.4-httpd.img -vnc :11 -monitor stdio

Destination:
qemu-postcopy/qemu # x86_64-softmmu/qemu-system-x86_64 -msg timestamp=on \
-machine pc-i440fx-2.2,accel=kvm -m 1024 -realtime mlock=on -smp 4 \
-hda /mnt/sdb/pure_IMG/redhat/redhat-6.4-httpd.img -vnc :12 -monitor stdio \
-incoming unix:/mnt/migrate.sock
(1) migrate_set_capability x-postcopy-ram on
(2) migrate -d unix:/mnt/migrate.sock

In Destination, it fails, reports:
savevm@2040988668 qemu_loadvm_state_main QEMU_VM_COMMAND ret: 0
savevm@2040988668 qemu_loadvm_state loop: section_type=6
savevm@2040988668 loadvm_postcopy_ram_handle_advise
postcopy_ram_hosttest: remap_anon_pages not available: File exists
savevm@2040988668 qemu_loadvm_state_main QEMU_VM_COMMAND ret: -1

And one more thing, i want to know: ;)
Why we must start precopy first before start postcopy?
Can we do postcopy at the beginning of migration?

Thanks,
zhanghailiang

On 2014/10/4 1:47, Dr. David Alan Gilbert (git) wrote:

From: "Dr. David Alan Gilbert" 

Hi,
   This is the 4th cut of my version of postcopy; it is designed for use with
the Linux kernel additions just posted by Andrea Arcangeli here:

http://marc.info/?l=linux-kernel&m=141235633015100&w=2

(Note: This is a new version compared to my previous postcopy patchset; you'll
need to update the kernel to the new version.)

Other than the new kernel ABI (which is only a small change to the userspace 
side);
the major changes are;

   a) Code for host page size != target page size
   b) Support for migration over fd
  From Cristian Klein; this is for libvirt support which Cristian recently
  posted to the libvirt list.
   c) It's now build bisectable and builds on 32bit

Testing wise; I've now done many thousand of postcopy migrations without
failure (both of idle and busy guests); so it seems pretty solid.

Must-TODO's:
   1) A partially repeatable migration_cancel failure
   2) virt_test's migrate.with_reboot test is failing
   3) The ACPI fix in 2.1 that allowed migrating RAMBlocks to be larger than
 the source feels like it needs looking at for postcopy.
   4) Paolo's comments with respect to the wakeup_request/is_running code
  in the migration thread
   5) xbzrle needs disabling once in postcopy

Later-TODO's:
   1) Control the rate of background page transfers during postcopy to
  reduce their impact on the latency of postcopy requests.
   2) Work with RDMA
   3) Could destination RP be made blocking (as per discussion with Paolo;
  I'm still worried that that changes too many assumptions)



V4:
   Initial support for host page size != target page size
 - tested heavily on hps==tps
 - only partially tested on hps!=tps systems
 - This involved quite a bit of rework around the discard code
   Updated to new kernel userfault ABI
 - It won't work with the previous version
   Fix mis-optimisation of postcopy request for wrong RAMBlock
  request for block A offset n
  un-needed fault for block B/m (already received - no req sent)
  request for block B/l  - wrongly sent as request for A/l
   Fix thinko in discard bitmap processing (missed last word of bitmap)
  Symptom: remap failures near the top of RAM if postcopy started late
   Fix bug that caused kernel page acknowledgments to be misaligned
  May have meant the guest was paused for longer than required
   Fix potential for crashing cleaning up failed RP
   Fixes in docs (from Yang)
   Handle migration by fd as sockets if they are sockets
   Build tested on 32bit
   Fully build bisectable (x86-64)


Dave

Cristian Klein (1):
   Handle bi-directional communication for fd migration

Dr. David Alan Gilbert (46):
   QEMUSizedBuffer based QEMUFile
   Tests: QEMUSizedBuffer/QEMUBuffer
   Start documenting how postcopy works.
   qemu_ram_foreach_block: pass up error value, and down the ramblock
 name
   improve DPRINTF macros, add to savevm
   Add qemu_get_counted_string to read a string prefixed by a count byte
   Create MigrationIncomingState
   socket shutdown
   Provide runtime Target page information
   Return path: Open a return path on QEMUFile for sockets
   Return path: socket_writev_buffer: Block even on non-blocking fd's
   Migration commands
   Return path: Control commands
   Return path: Send responses from destination to source
   Return path: Source handling of return path
   qemu_loadvm errors and debug
   ram_debug_dump_bitmap: Dump a migration bitmap as text
   Rework loadvm path for subloops
   Add migration-capability boolean for postcopy-ram.
   A

Re: [Qemu-devel] [BUGFIX][PATCH for 2.2 v5 1/1] -machine vmport=auto: Fix handling of VMWare ioport emulation for xen

2014-11-20 Thread Eduardo Habkost

On Thu, Nov 20, 2014 at 05:32:56PM -0500, Don Slutz wrote:
[...]
> @@ -1711,18 +1711,23 @@ static void pc_machine_set_max_ram_below_4g(Object 
> *obj, Visitor *v,
>  pcms->max_ram_below_4g = value;
>  }
>  
> -static bool pc_machine_get_vmport(Object *obj, Error **errp)
> +static void pc_machine_get_vmport(Object *obj, Visitor *v, void *opaque,
> +  const char *name, Error **errp)
>  {
>  PCMachineState *pcms = PC_MACHINE(obj);
> +int vmport = pcms->vmport;
>  
> -return pcms->vmport;
> +visit_type_enum(v, &vmport, OnOffAuto_lookup, NULL, name, errp);

A visit_type_OnOffAuto() function is automatically generated by the QAPI
schema, so you don't need to deal with the low level visit_type_enum()
function.

>  }
>  
> -static void pc_machine_set_vmport(Object *obj, bool value, Error **errp)
> +static void pc_machine_set_vmport(Object *obj, Visitor *v, void *opaque,
> +  const char *name, Error **errp)
>  {
>  PCMachineState *pcms = PC_MACHINE(obj);
> +int vmport;
>  
> -pcms->vmport = value;
> +visit_type_enum(v, &vmport, OnOffAuto_lookup, NULL, name, errp);
> +pcms->vmport = vmport;

'vmport' may be undefined in case the visitor return an error, and in
this case you shouldn't change pcms->vmport. This won't be a problem if
you just call:

visit_type_OnOffAuto(v, &pcms->vmport, name, errp);

>  }
>  
>  static void pc_machine_initfn(Object *obj)
> @@ -1737,11 +1742,11 @@ static void pc_machine_initfn(Object *obj)
>  pc_machine_get_max_ram_below_4g,
>  pc_machine_set_max_ram_below_4g,
>  NULL, NULL, NULL);
> -pcms->vmport = !xen_enabled();
> -object_property_add_bool(obj, PC_MACHINE_VMPORT,
> - pc_machine_get_vmport,
> - pc_machine_set_vmport,
> - NULL);
> +pcms->vmport = ON_OFF_AUTO_AUTO;
> +object_property_add(obj, PC_MACHINE_VMPORT, "str",

I believe "OnOffAuto" is a valid type name, if it is defined in the QAPI
schema.

> +pc_machine_get_vmport,
> +pc_machine_set_vmport,
> +NULL, NULL, NULL);
>  }
>  
[...]

-- 
Eduardo

[Qemu-devel] [PATCH] gtk: Don't crash if -nodefaults

2014-11-20 Thread Fam Zheng

This fixes a crash by just skipping the vte resize hack if cur is NULL.

Reproducer:

qemu-system-x86_64 -nodefaults

Signed-off-by: Fam Zheng 
---
 ui/gtk.c | 16 +---
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/ui/gtk.c b/ui/gtk.c
index 38bf463..29e09a7 100644
--- a/ui/gtk.c
+++ b/ui/gtk.c
@@ -1910,15 +1910,17 @@ void gtk_display_init(DisplayState *ds, bool 
full_screen, bool grab_on_hover)
 #ifdef VTE_RESIZE_HACK
 {
 VirtualConsole *cur = gd_vc_find_current(s);
-int i;
-
-for (i = 0; i < s->nb_vcs; i++) {
-VirtualConsole *vc = &s->vc[i];
-if (vc && vc->type == GD_VC_VTE && vc != cur) {
-gtk_widget_hide(vc->vte.terminal);
+if (cur) {
+int i;
+
+for (i = 0; i < s->nb_vcs; i++) {
+VirtualConsole *vc = &s->vc[i];
+if (vc && vc->type == GD_VC_VTE && vc != cur) {
+gtk_widget_hide(vc->vte.terminal);
+}
 }
+gd_update_windowsize(cur);
 }
-gd_update_windowsize(cur);
 }
 #endif
 
-- 
1.9.3

Re: [Qemu-devel] [PATCH v2 for-2.2 4/4] rtl8139: fix Pointer to local outside scope

2014-11-20 Thread Jason Wang

On 11/20/2014 07:35 PM, arei.gong...@huawei.com wrote:
> From: Gonglei 
>
> Coverity spot:
>  Assigning: iov = struct iovec [3]({{buf, 12UL},
>{(void *)dot1q_buf, 4UL},
>{buf + 12, size - 12}})
>  (address of temporary variable of type struct iovec [3]).
>  out_of_scope: Temporary variable of type struct iovec [3] goes out of scope.
>
> Pointer to local outside scope (RETURN_LOCAL)
> use_invalid:
>  Using iov, which points to an out-of-scope temporary variable of type struct 
> iovec [3].
>
> Signed-off-by: Gonglei 
> Signed-off-by: Paolo Bonzini 
> ---
>  hw/net/rtl8139.c | 4 
>  1 file changed, 4 insertions(+)
>
> diff --git a/hw/net/rtl8139.c b/hw/net/rtl8139.c
> index 8b8a1b1..5f0197c 100644
> --- a/hw/net/rtl8139.c
> +++ b/hw/net/rtl8139.c
> @@ -1775,6 +1775,7 @@ static void rtl8139_transfer_frame(RTL8139State *s, 
> uint8_t *buf, int size,
>  int do_interrupt, const uint8_t *dot1q_buf)
>  {
>  struct iovec *iov = NULL;
> +struct iovec vlan_iov[3];
>  
>  if (!size)
>  {
> @@ -1789,6 +1790,9 @@ static void rtl8139_transfer_frame(RTL8139State *s, 
> uint8_t *buf, int size,
>  { .iov_base = buf + ETHER_ADDR_LEN * 2,
>  .iov_len = size - ETHER_ADDR_LEN * 2 },
>  };
> +
> +memcpy(vlan_iov, iov, sizeof(vlan_iov));
> +iov = vlan_iov;
>  }
>  
>  if (TxLoopBack == (s->TxConfig & TxLoopBack))

Reviewed-by: Jason Wang

Re: [Qemu-devel] [PATCH v2 for-2.2 3/4] pcnet: fix Negative array index read

2014-11-20 Thread Jason Wang

On 11/20/2014 07:35 PM, arei.gong...@huawei.com wrote:
> From: Gonglei 
>
> s->xmit_pos maybe assigned to a negative value (-1),
> but in this branch variable s->xmit_pos as an index to
> array s->buffer. Let's add a check for s->xmit_pos.
>
> Signed-off-by: Gonglei 
> Signed-off-by: Paolo Bonzini 
> Reviewed-by: Jason Wang 
> ---
>  hw/net/pcnet.c | 55 ++-
>  1 file changed, 30 insertions(+), 25 deletions(-)
>
> diff --git a/hw/net/pcnet.c b/hw/net/pcnet.c
> index d344c15..f409b92 100644
> --- a/hw/net/pcnet.c
> +++ b/hw/net/pcnet.c
> @@ -1212,7 +1212,7 @@ static void pcnet_transmit(PCNetState *s)
>  hwaddr xmit_cxda = 0;
>  int count = CSR_XMTRL(s)-1;
>  int add_crc = 0;
> -
> +int bcnt;
>  s->xmit_pos = -1;
>  
>  if (!CSR_TXON(s)) {
> @@ -1247,35 +1247,40 @@ static void pcnet_transmit(PCNetState *s)
>  s->xmit_pos = -1;
>  goto txdone;
>  }
> +
> +if (s->xmit_pos < 0) {
> +goto txdone;
> +}
> +
> +bcnt = 4096 - GET_FIELD(tmd.length, TMDL, BCNT);
> +s->phys_mem_read(s->dma_opaque, PHYSADDR(s, tmd.tbadr),
> + s->buffer + s->xmit_pos, bcnt, CSR_BSWP(s));
> +s->xmit_pos += bcnt;
> +
>  if (!GET_FIELD(tmd.status, TMDS, ENP)) {
> -int bcnt = 4096 - GET_FIELD(tmd.length, TMDL, BCNT);
> -s->phys_mem_read(s->dma_opaque, PHYSADDR(s, tmd.tbadr),
> - s->buffer + s->xmit_pos, bcnt, CSR_BSWP(s));
> -s->xmit_pos += bcnt;
> -} else if (s->xmit_pos >= 0) {
> -int bcnt = 4096 - GET_FIELD(tmd.length, TMDL, BCNT);
> -s->phys_mem_read(s->dma_opaque, PHYSADDR(s, tmd.tbadr),
> - s->buffer + s->xmit_pos, bcnt, CSR_BSWP(s));
> -s->xmit_pos += bcnt;
> +goto txdone;
> +}
> +
>  #ifdef PCNET_DEBUG
> -printf("pcnet_transmit size=%d\n", s->xmit_pos);
> +printf("pcnet_transmit size=%d\n", s->xmit_pos);
>  #endif
> -if (CSR_LOOP(s)) {
> -if (BCR_SWSTYLE(s) == 1)
> -add_crc = !GET_FIELD(tmd.status, TMDS, NOFCS);
> -s->looptest = add_crc ? PCNET_LOOPTEST_CRC : 
> PCNET_LOOPTEST_NOCRC;
> -pcnet_receive(qemu_get_queue(s->nic), s->buffer, 
> s->xmit_pos);
> -s->looptest = 0;
> -} else
> -if (s->nic)
> -qemu_send_packet(qemu_get_queue(s->nic), s->buffer,
> - s->xmit_pos);
> -
> -s->csr[0] &= ~0x0008;   /* clear TDMD */
> -s->csr[4] |= 0x0004;/* set TXSTRT */
> -s->xmit_pos = -1;
> +if (CSR_LOOP(s)) {
> +if (BCR_SWSTYLE(s) == 1)
> +add_crc = !GET_FIELD(tmd.status, TMDS, NOFCS);
> +s->looptest = add_crc ? PCNET_LOOPTEST_CRC : 
> PCNET_LOOPTEST_NOCRC;
> +pcnet_receive(qemu_get_queue(s->nic), s->buffer, s->xmit_pos);
> +s->looptest = 0;
> +} else {
> +if (s->nic) {
> +qemu_send_packet(qemu_get_queue(s->nic), s->buffer,
> + s->xmit_pos);
> +}
>  }
>  
> +s->csr[0] &= ~0x0008;   /* clear TDMD */
> +s->csr[4] |= 0x0004;/* set TXSTRT */
> +s->xmit_pos = -1;
> +
>  txdone:
>  SET_FIELD(&tmd.status, TMDS, OWN, 0);
>  TMDSTORE(&tmd, PHYSADDR(s,CSR_CXDA(s)));

Reviewed-by: Jason Wang

Re: [Qemu-devel] [PATCH] pcie: fix typo in pcie_cap_deverr_init()

2014-11-20 Thread Gonglei

On 2014/11/18 10:47, Gonglei (Arei) wrote:

> From: Gonglei 
> 
> Reported-by:
>  https://bugs.launchpad.net/qemu/+bug/1393440
> 
> Signed-off-by: Gonglei 
> ---
>  hw/pci/pcie.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
> index 58455bd..fbba589 100644
> --- a/hw/pci/pcie.c
> +++ b/hw/pci/pcie.c
> @@ -145,7 +145,7 @@ void pcie_cap_deverr_init(PCIDevice *dev)
> PCI_EXP_DEVCTL_FERE | PCI_EXP_DEVCTL_URRE);
>  pci_long_test_and_set_mask(dev->w1cmask + pos + PCI_EXP_DEVSTA,
> PCI_EXP_DEVSTA_CED | PCI_EXP_DEVSTA_NFED |
> -   PCI_EXP_DEVSTA_URD | PCI_EXP_DEVSTA_URD);
> +   PCI_EXP_DEVSTA_FED | PCI_EXP_DEVSTA_URD);
>  }
>  
>  void pcie_cap_deverr_reset(PCIDevice *dev)

Ping...
I think this one is a candidate for 2.2 too.

Best regards,
-Gonglei

Re: [Qemu-devel] for 2.2: Re: [PATCH] pcie: fix improper use of negative value

2014-11-20 Thread Gonglei

On 2014/11/21 8:01, Eric Blake wrote:

> On 11/20/2014 01:55 AM, arei.gong...@huawei.com wrote:
>> From: Gonglei 
>>
>> Signed-off-by: Gonglei 
>> ---
>>  hw/pci/pcie.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> adding qemu-trivial in cc.  This is a candidate for 2.2.
> 
> Reviewed-by: Eric Blake 
> 

Thanks. :)

Best regards,
-Gonglei

>>
>> diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
>> index 58455bd..2902f7d 100644
>> --- a/hw/pci/pcie.c
>> +++ b/hw/pci/pcie.c
>> @@ -229,7 +229,7 @@ static void pcie_cap_slot_hotplug_common(PCIDevice 
>> *hotplug_dev,
>>  /* the slot is electromechanically locked.
>>   * This error is propagated up to qdev and then to HMP/QMP.
>>   */
>> -error_setg_errno(errp, -EBUSY, "slot is electromechanically 
>> locked");
>> +error_setg_errno(errp, EBUSY, "slot is electromechanically locked");
>>  }
>>  }
>>  
>>
>

Re: [Qemu-devel] [PATCH v2] persistent dirty bitmap: add QDB file spec.

2014-11-20 Thread Eric Blake

On 11/20/2014 03:34 AM, Vladimir Sementsov-Ogievskiy wrote:
> QDB file is for storing dirty bitmap. The specification is based on
> qcow2 specification.
> 
> Saving several bitmaps is necessary when server shutdowns during
> backup. In this case 2 tables for each disk are available. One
> collected for a previous period and one active. Though this feature
> is discussable.
> 
> Big endian format and Standard Cluster Descriptor are used to simplify
> integration with qcow2, to support internal bitmaps for qcow2 in future.
> 
> The idea is that the same procedure writing the data to QDB file could
> do the same for QCOW2. The only difference is cluster refcount table.
> Should we use it here or not is still questionable.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---
>  docs/specs/qdb.txt | 132 
> +
>  1 file changed, 132 insertions(+)
>  create mode 100644 docs/specs/qdb.txt

No comment on whether the approach itself makes sense - just a
high-level review of this document in isolation.

> 
> diff --git a/docs/specs/qdb.txt b/docs/specs/qdb.txt
> new file mode 100644
> index 000..d570a69
> --- /dev/null
> +++ b/docs/specs/qdb.txt
> @@ -0,0 +1,132 @@
> +== General ==

Missing a copyright notice.  Yeah, you've got a lot of bad examples in
this directory (in docs/* in general), but there ARE a few of the newer
files that are starting to buck the trend and use a copyright/license blurb.

> +
> +"QDB" means "Qemu Dirty Bitmaps". QDB file can store several dirty bitmaps.
> +QDB file is organized in units of constant size, which are called clusters.
> +
> +All numbers in QDB are stored in Big Endian byte order.
> +
> +== Header ==
> +
> +The first cluster of a QDB image contains the file header:
> +
> +Byte  0 -  3:   magic
> +QDB magic string ("QDB\0")
> +
> +  4 -  7:   version
> +Version number (valid value is 1)
> +
> +  8 - 11:   cluster_bits
> +Number of bits that are used for addressing an offset
> +within a cluster (1 << cluster_bits is the cluster size).
> +Must not be less than 9 (i.e. 512 byte clusters).

Is there a maximum?

> +
> + 12 - 15:   nb_bitmaps
> +Number of bitmaps contained in the file
> +
> + 16 - 23:   bitmaps_offset
> +Offset into the QDB file at which the bitmap table 
> starts.
> +Must be aligned to a cluster boundary.
> +
> + 24 - 27:   header_length
> +Length of the header structure in bytes.

does that include the length of all extensions?  Should we enforce a
maximum header length of one cluster?

> +
> +Like in qcow2, directly after the image header, optional sections called 
> header extensions can
> +be stored. Each extension has a structure like the following:
> +
> +Byte  0 -  3:   Header extension type:
> +0x - End of the header extension area
> +other  - Unknown header extension, can be safely
> + ignored
> +
> +  4 -  7:   Length of the header extension data
> +
> +  8 -  n:   Header extension data
> +
> +  n -  m:   Padding to round up the header extension size to the next
> +multiple of 8.
> +
> +Unless stated otherwise, each header extension type shall appear at most once
> +in the same image.

I like how qcow2 v3 has a header extension for listing the name of each
header extension, for nicer error messages.  Also, I think that
declaring all unknown extensions as ignorable may be dangerous, since
you lack a capability bitmask.  Maybe it would be wise to copy the qcow2
v3 capabilities (including flags for ignorable vs. mandatory support of
given features, where a client can sanely decide what to do if it does
not recognize a feature).

> +
> +26 - 27:Size of the bitmap name
> +
> +36 - 39:Size of extra data in the table entry (used for future
> +extensions of the format)
> +
> +variable:   Extra data for future extensions. Unknown fields must be
> +ignored.

This block is width 0 if bytes 36-39 is 0?  How are extensions
identified?  Are they required to be done like overall file headers,
with an id, length, and then variable data, so that it is possible to
scan to the end of each unknown extension to see if the next extension
is known?  This is where capability bits in the overall header may make
more sense.

> +
> +variable:   Name of the bitmap (not null terminated)

The length of this block is determined by bytes 26-27?

> +
> +variable:   Padding to round up the bitmap table entry size to the
> +next multiple of 8.
> +
> +The fields "size", "granularity", "enabled" and "name" are corresponding with
> +the fields in struct BdrvDirtyBitmap.

Re: [Qemu-devel] [RFC][PATCH v2] block: add write threshold reporting for block devices

2014-11-20 Thread Eric Blake

On 11/20/2014 04:04 AM, Stefan Hajnoczi wrote:
>>
>> We're "only" talking about an optimisation here, even though a very
>> useful one, so I wouldn't easily make compromises here. We should
>> probably insist on using the node-name. Management tools need new code
>> anyway to make use of the new functionality, so they can implement
>> node-name support as well while they're at it.
> 
> Using node-name is the best thing to do.
> 
> My concern is just whether libvirt and other management tools are
> actually using node-name yet.

Libvirt is not yet using it, but the more compelling we make it, the
more libvirt will accelerate the efforts needed to start using
node-name.  I'm okay with requiring node-name here.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature

[Qemu-devel] for 2.2: Re: [PATCH] pcie: fix improper use of negative value

2014-11-20 Thread Eric Blake

On 11/20/2014 01:55 AM, arei.gong...@huawei.com wrote:
> From: Gonglei 
> 
> Signed-off-by: Gonglei 
> ---
>  hw/pci/pcie.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

adding qemu-trivial in cc.  This is a candidate for 2.2.

Reviewed-by: Eric Blake 

> 
> diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
> index 58455bd..2902f7d 100644
> --- a/hw/pci/pcie.c
> +++ b/hw/pci/pcie.c
> @@ -229,7 +229,7 @@ static void pcie_cap_slot_hotplug_common(PCIDevice 
> *hotplug_dev,
>  /* the slot is electromechanically locked.
>   * This error is propagated up to qdev and then to HMP/QMP.
>   */
> -error_setg_errno(errp, -EBUSY, "slot is electromechanically locked");
> +error_setg_errno(errp, EBUSY, "slot is electromechanically locked");
>  }
>  }
>  
> 

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH v3 22/22] iotests: Add test for different refcount widths

2014-11-20 Thread Eric Blake

On 11/20/2014 10:06 AM, Max Reitz wrote:
> Add a test for conversion between different refcount widths and errors
> specific to certain widths (i.e. snapshots with refcount_width=1).
> 
> Signed-off-by: Max Reitz 
> ---
>  tests/qemu-iotests/112 | 278 
> +
>  tests/qemu-iotests/112.out | 143 +++
>  tests/qemu-iotests/group   |   1 +
>  3 files changed, 422 insertions(+)
>  create mode 100755 tests/qemu-iotests/112
>  create mode 100644 tests/qemu-iotests/112.out
> 

> +echo
> +echo '=== Multiple walks necessary during amend ==='
> +echo
> +
> +IMGOPTS="$IMGOPTS,refcount_width=1,cluster_size=512" _make_test_img 64k
> +
> +# Cluster 0 is the image header, clusters 1 to 4 are used by the L1 table, a
> +# single L2 table, the reftable and a single refblock. This creates 58 data
> +# clusters (actually, the L2 table is created here, too), so in total there 
> are
> +# then 63 used clusters in the image. With a refcount width of 64, one 
> refblock
> +# describes 64 clusters (512 bytes / 64 bits/entry = 64 entries), so this 
> will
> +# make the first refblock in the amended image have exactly one free entry.
> +$QEMU_IO -c "write 0 $((58 * 512))" "$TEST_IMG" | _filter_qemu_io
> +
> +# Now change the refcount width; since the first new refblock will have 
> exactly
> +# one free entry, that entry will be used to store its own reference. No 
> other
> +# refblocks are needed, so then the new reftable will be allocated; since the
> +# first new refblock is completely filled up, this will require a new 
> refblock
> +# which is why the refcount width changing function will need to run through
> +# everything one more time until the allocations are stable.
> +# Having more walks than usual should be visible as regressing progress (from
> +# 66.67 % (2/3 walks) to 50.00 % (2/4 walks)).
> +$QEMU_IMG amend -o refcount_width=64 -p "$TEST_IMG" | tr '\r' '\n' \
> +| grep -A 1 '66.67'

You probably know the drill by now: 'grep -A' is a GNU extension, and
not necessarily portable to other grep.  Portable would be "grep 66.67 |
head -n2", but I'm not going to insist on a rewrite since we already
depend on other GNU-isms in the testsuite.  (Or put another way, if
someone runs the test on BSD and it fails, _then_ we can patch things).

> +
> +=== Multiple walks necessary during amend ===
> +
> +Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=65536
> +wrote 29696/29696 bytes at offset 0
> +29 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> +(66.67/100%)
> +(50.00/100%)
> +refcount width: 64
> +No errors were found on the image.

Nicely done, here, and in the overall series.

Reviewed-by: Eric Blake 

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature

[Qemu-devel] [Bug 1368815] Re: qemu-img convert intermittently corrupts output images

2014-11-20 Thread Tony Breeds

Hi Serge,


Is there any chance these fixes will go into trusty?

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1368815

Title:
  qemu-img convert intermittently corrupts output images

Status in OpenStack Compute (Nova):
  In Progress
Status in QEMU:
  In Progress
Status in “qemu” package in Ubuntu:
  Fix Released
Status in “qemu” source package in Trusty:
  Triaged
Status in “qemu” source package in Utopic:
  Triaged
Status in “qemu” source package in Vivid:
  Fix Released

Bug description:
  ==
  Impact: occasional qcow2 corruption
  Test case: see the qemu-img command below
  Regression potential: this cherrypicks a patch from upstream to a 
not-insignificantly older qemu source tree.  While the cherrypick seems sane, 
it's possible that there are subtle interactions with the other delta.  I'd 
really like for a full qa-regression-test qemu testcase to be run against this 
package.
  ==

  -- Found in releases qemu-2.0.0, qemu-2.0.2, qemu-2.1.0. Tested on
  Ubuntu 14.04 using Ext4 filesystems.

  The command

qemu-img convert -O raw inputimage.qcow2 outputimage.raw

  intermittently creates corrupted output images, when the input image
  is not yet fully synchronized to disk. While the issue has actually
  been discovered in operation of of OpenStack nova, it can be
  reproduced "easily" on command line using

cat $SRC_PATH > $TMP_PATH && $QEMU_IMG_PATH convert -O raw $TMP_PATH
  $DST_PATH && cksum $DST_PATH

  on filesystems exposing this behavior. (The difficult part of this
  exercise is to prepare a filesystem to reliably trigger this race. On
  my test machine some filesystems are affected while other aren't, and
  unfortunately I haven't found the relevant difference between them,
  yet. Possible it's timing issues completely out of userspace control
  ...)

  The root cause, however, is the same as in

http://lists.gnu.org/archive/html/coreutils/2011-04/msg00069.html

  and it can be solved the same way as suggested in

http://lists.gnu.org/archive/html/coreutils/2011-04/msg00102.html

  In qemu, file block/raw-posix.c use the FIEMAP_FLAG_SYNC, i.e change

  f.fm.fm_flags = 0;

  to

  f.fm.fm_flags = FIEMAP_FLAG_SYNC;

  As discussed in the thread mentioned above, retrieving a page cache
  coherent map of file extents is possible only after fsync on that
  file.

  See also

https://bugs.launchpad.net/nova/+bug/1350766

  In that bug report filed against nova, fsync had been suggested to be
  performed by the framework invoking qemu-img. However, as the choice
  of fiemap -- implying this otherwise unneeded fsync of a temporary
  file  -- is not made by the caller but by qemu-img, I agree with the
  nova bug reviewer's objection to put it into nova. The fsync should
  instead be triggered by qemu-img utilizing the FIEMAP_FLAG_SYNC,
  specifically intended for that purpose.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1368815/+subscriptions

[Qemu-devel] [BUGFIX][PATCH for 2.2 v5 1/1] -machine vmport=auto: Fix handling of VMWare ioport emulation for xen

2014-11-20 Thread Don Slutz

c/s 9b23cfb76b3a5e9eb5cc899eaf2f46bc46d33ba4

or

c/s b154537ad07598377ebf98252fb7d2aff127983b

moved the testing of xen_enabled() from pc_init1() to
pc_machine_initfn().

xen_enabled() does not return the correct value in
pc_machine_initfn().

Changed vmport from a bool to an enum.  Added the value "auto" to do
the old way.

Signed-off-by: Don Slutz 
Acked-by: Eric Blake 
---

v5:
  Eduardo Habkost:
What about changing pc_machine->vmport here instead of using a
no_vmport variable, so the actual vmport configuration may be
queried by anybody later using the QOM property? It would even
make the code shorter.
  Done.
  Eric Blake:
I've only reviewed the qapi/common.json and qemu-options.hx
files for QMP interface (and will leave the rest of the patch to
others), but I'm okay with the changes to those files. I guess
that means no R-b, since I didn't do a full review, so here's a
weaker acked-by.

v4:
  Michael S. Tsirkin, Eric Blake, Eduardo Habkost:
Rename vmport to OnOffAuto and move to qapi/common.json
  Eduardo Habkost:
Simpler convert of enum to no_vmport.
  Michael S. Tsirkin:
Add assert for ON_OFF_AUTO_MAX.

 hw/i386/pc.c | 23 ++-
 hw/i386/pc_piix.c|  7 ++-
 hw/i386/pc_q35.c |  7 ++-
 include/hw/i386/pc.h |  2 +-
 qapi/common.json | 15 +++
 qemu-options.hx  |  8 +---
 vl.c |  2 +-
 7 files changed, 48 insertions(+), 16 deletions(-)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 1205db8..15fba60 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -1711,18 +1711,23 @@ static void pc_machine_set_max_ram_below_4g(Object 
*obj, Visitor *v,
 pcms->max_ram_below_4g = value;
 }
 
-static bool pc_machine_get_vmport(Object *obj, Error **errp)
+static void pc_machine_get_vmport(Object *obj, Visitor *v, void *opaque,
+  const char *name, Error **errp)
 {
 PCMachineState *pcms = PC_MACHINE(obj);
+int vmport = pcms->vmport;
 
-return pcms->vmport;
+visit_type_enum(v, &vmport, OnOffAuto_lookup, NULL, name, errp);
 }
 
-static void pc_machine_set_vmport(Object *obj, bool value, Error **errp)
+static void pc_machine_set_vmport(Object *obj, Visitor *v, void *opaque,
+  const char *name, Error **errp)
 {
 PCMachineState *pcms = PC_MACHINE(obj);
+int vmport;
 
-pcms->vmport = value;
+visit_type_enum(v, &vmport, OnOffAuto_lookup, NULL, name, errp);
+pcms->vmport = vmport;
 }
 
 static void pc_machine_initfn(Object *obj)
@@ -1737,11 +1742,11 @@ static void pc_machine_initfn(Object *obj)
 pc_machine_get_max_ram_below_4g,
 pc_machine_set_max_ram_below_4g,
 NULL, NULL, NULL);
-pcms->vmport = !xen_enabled();
-object_property_add_bool(obj, PC_MACHINE_VMPORT,
- pc_machine_get_vmport,
- pc_machine_set_vmport,
- NULL);
+pcms->vmport = ON_OFF_AUTO_AUTO;
+object_property_add(obj, PC_MACHINE_VMPORT, "str",
+pc_machine_get_vmport,
+pc_machine_set_vmport,
+NULL, NULL, NULL);
 }
 
 static void pc_machine_class_init(ObjectClass *oc, void *data)
diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
index 7bb97a4..fffaab7 100644
--- a/hw/i386/pc_piix.c
+++ b/hw/i386/pc_piix.c
@@ -234,9 +234,14 @@ static void pc_init1(MachineState *machine,
 
 pc_vga_init(isa_bus, pci_enabled ? pci_bus : NULL);
 
+assert(pc_machine->vmport != ON_OFF_AUTO_MAX);
+if (pc_machine->vmport == ON_OFF_AUTO_AUTO) {
+pc_machine->vmport = xen_enabled() ? ON_OFF_AUTO_OFF : ON_OFF_AUTO_ON;
+}
+
 /* init basic PC hardware */
 pc_basic_device_init(isa_bus, gsi, &rtc_state, &floppy,
- !pc_machine->vmport, 0x4);
+ (pc_machine->vmport != ON_OFF_AUTO_ON), 0x4);
 
 pc_nic_init(isa_bus, pci_bus);
 
diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c
index 598e679..88cee93 100644
--- a/hw/i386/pc_q35.c
+++ b/hw/i386/pc_q35.c
@@ -242,9 +242,14 @@ static void pc_q35_init(MachineState *machine)
 
 pc_register_ferr_irq(gsi[13]);
 
+assert(pc_machine->vmport != ON_OFF_AUTO_MAX);
+if (pc_machine->vmport == ON_OFF_AUTO_AUTO) {
+pc_machine->vmport = xen_enabled() ? ON_OFF_AUTO_OFF : ON_OFF_AUTO_ON;
+}
+
 /* init basic PC hardware */
 pc_basic_device_init(isa_bus, gsi, &rtc_state, &floppy,
- !pc_machine->vmport, 0xff0104);
+ (pc_machine->vmport != ON_OFF_AUTO_ON), 0xff0104);
 
 /* connect pm stuff to lpc */
 ich9_lpc_pm_init(lpc);
diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
index 7c3731f..7f7d2d4 100644
--- a/include/hw/i386/pc.h
+++ b/include/hw/i386/pc.h
@@ -37,7 +37,7 @@ struct PCMachineState {
 ISADevice *rtc;
 
 uint64_t max_ram

[Qemu-devel] [Bug 1368815] Re: qemu-img convert intermittently corrupts output images

2014-11-20 Thread Serge Hallyn

** Description changed:

+ ==
+ Impact: occasional qcow2 corruption
+ Test case: see the qemu-img command below
+ Regression potential: this cherrypicks a patch from upstream to a 
not-insignificantly older qemu source tree.  While the cherrypick seems sane, 
it's possible that there are subtle interactions with the other delta.  I'd 
really like for a full qa-regression-test qemu testcase to be run against this 
package.
+ ==
+ 
  -- Found in releases qemu-2.0.0, qemu-2.0.2, qemu-2.1.0. Tested on
  Ubuntu 14.04 using Ext4 filesystems.
  
  The command
  
qemu-img convert -O raw inputimage.qcow2 outputimage.raw
  
  intermittently creates corrupted output images, when the input image is
  not yet fully synchronized to disk. While the issue has actually been
  discovered in operation of of OpenStack nova, it can be reproduced
  "easily" on command line using
  
cat $SRC_PATH > $TMP_PATH && $QEMU_IMG_PATH convert -O raw $TMP_PATH
  $DST_PATH && cksum $DST_PATH
  
  on filesystems exposing this behavior. (The difficult part of this
  exercise is to prepare a filesystem to reliably trigger this race. On my
  test machine some filesystems are affected while other aren't, and
  unfortunately I haven't found the relevant difference between them, yet.
  Possible it's timing issues completely out of userspace control ...)
  
  The root cause, however, is the same as in
  
http://lists.gnu.org/archive/html/coreutils/2011-04/msg00069.html
  
  and it can be solved the same way as suggested in
  
http://lists.gnu.org/archive/html/coreutils/2011-04/msg00102.html
  
  In qemu, file block/raw-posix.c use the FIEMAP_FLAG_SYNC, i.e change
  
  f.fm.fm_flags = 0;
  
  to
  
  f.fm.fm_flags = FIEMAP_FLAG_SYNC;
  
  As discussed in the thread mentioned above, retrieving a page cache
  coherent map of file extents is possible only after fsync on that file.
  
  See also
  
https://bugs.launchpad.net/nova/+bug/1350766
  
  In that bug report filed against nova, fsync had been suggested to be
  performed by the framework invoking qemu-img. However, as the choice of
  fiemap -- implying this otherwise unneeded fsync of a temporary file  --
  is not made by the caller but by qemu-img, I agree with the nova bug
  reviewer's objection to put it into nova. The fsync should instead be
  triggered by qemu-img utilizing the FIEMAP_FLAG_SYNC, specifically
  intended for that purpose.

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1368815

Title:
  qemu-img convert intermittently corrupts output images

Status in OpenStack Compute (Nova):
  In Progress
Status in QEMU:
  In Progress
Status in “qemu” package in Ubuntu:
  Fix Released
Status in “qemu” source package in Trusty:
  Triaged
Status in “qemu” source package in Utopic:
  Triaged
Status in “qemu” source package in Vivid:
  Fix Released

Bug description:
  ==
  Impact: occasional qcow2 corruption
  Test case: see the qemu-img command below
  Regression potential: this cherrypicks a patch from upstream to a 
not-insignificantly older qemu source tree.  While the cherrypick seems sane, 
it's possible that there are subtle interactions with the other delta.  I'd 
really like for a full qa-regression-test qemu testcase to be run against this 
package.
  ==

  -- Found in releases qemu-2.0.0, qemu-2.0.2, qemu-2.1.0. Tested on
  Ubuntu 14.04 using Ext4 filesystems.

  The command

qemu-img convert -O raw inputimage.qcow2 outputimage.raw

  intermittently creates corrupted output images, when the input image
  is not yet fully synchronized to disk. While the issue has actually
  been discovered in operation of of OpenStack nova, it can be
  reproduced "easily" on command line using

cat $SRC_PATH > $TMP_PATH && $QEMU_IMG_PATH convert -O raw $TMP_PATH
  $DST_PATH && cksum $DST_PATH

  on filesystems exposing this behavior. (The difficult part of this
  exercise is to prepare a filesystem to reliably trigger this race. On
  my test machine some filesystems are affected while other aren't, and
  unfortunately I haven't found the relevant difference between them,
  yet. Possible it's timing issues completely out of userspace control
  ...)

  The root cause, however, is the same as in

http://lists.gnu.org/archive/html/coreutils/2011-04/msg00069.html

  and it can be solved the same way as suggested in

http://lists.gnu.org/archive/html/coreutils/2011-04/msg00102.html

  In qemu, file block/raw-posix.c use the FIEMAP_FLAG_SYNC, i.e change

  f.fm.fm_flags = 0;

  to

  f.fm.fm_flags = FIEMAP_FLAG_SYNC;

  As discussed in the thread mentioned above, retrieving a page cache
  coherent map of file extents is possible only after fs

Re: [Qemu-devel] [PATCH v3 13/22] progress: Allow regressing progress

2014-11-20 Thread Eric Blake

On 11/20/2014 10:06 AM, Max Reitz wrote:
> Progress may regress; this should be displayed correctly by
> qemu_progress_print().
> 
> Signed-off-by: Max Reitz 
> ---
>  util/qemu-progress.c | 1 +
>  1 file changed, 1 insertion(+)

Reviewed-by: Eric Blake 

> 
> diff --git a/util/qemu-progress.c b/util/qemu-progress.c
> index 4ee5cd0..c0fb14d 100644
> --- a/util/qemu-progress.c
> +++ b/util/qemu-progress.c
> @@ -152,6 +152,7 @@ void qemu_progress_print(float delta, int max)
>  state.current = current;
>  
>  if (current > (state.last_print + state.min_skip) ||
> +current < (state.last_print - state.min_skip) ||
>  (current == 100) || (current == 0)) {

Maybe we can drop the redundant () as long as we are touching this area
of code?  But not worth a respin.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH v3 10/22] qcow2: refcount_order parameter for qcow2_create2

2014-11-20 Thread Eric Blake

On 11/20/2014 10:06 AM, Max Reitz wrote:
> Add a refcount_order parameter to qcow2_create2(), use that value for
> the image header and for calculating the size required for
> preallocation.
> 
> For now, always pass 4.
> 
> This addition requires changes to the calculation of the file size for
> the "full" and "falloc" preallocation modes. That in turn is a nice
> opportunity to add a command about that calculation not necessarily

s/command/comment/

> being exact (and that being intentional).
> 
> Signed-off-by: Max Reitz 
> Reviewed-by: Eric Blake 

But review still holds. :)

> ---
>  block/qcow2.c | 46 +++---
>  1 file changed, 35 insertions(+), 11 deletions(-)
> 

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH v3 08/22] qcow2: More helpers for refcount modification

2014-11-20 Thread Eric Blake

On 11/20/2014 10:06 AM, Max Reitz wrote:
> Add helper functions for getting and setting refcounts in a refcount
> array for any possible refcount order, and choose the correct one during
> refcount initialization.
> 
> Signed-off-by: Max Reitz 
> ---
>  block/qcow2-refcount.c | 124 
> -
>  1 file changed, 122 insertions(+), 2 deletions(-)

Reviewed-by: Eric Blake 

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH v3 07/22] qcow2: Helper function for refcount modification

2014-11-20 Thread Eric Blake

On 11/20/2014 10:06 AM, Max Reitz wrote:
> Since refcounts do not always have to be a uint16_t, all refcount blocks
> and arrays in memory should not have a specific type (thus they become
> pointers to void) and for accessing them, two helper functions are used
> (a getter and a setter). Those functions are called indirectly through
> function pointers in the BDRVQcowState so they may later be exchanged
> for different refcount orders.
> 
> With the check and repair functions using this function, the refcount
> array they are creating will be in big endian byte order; additionally,
> using realloc_refcount_array() makes the size of this refcount array
> always cluster-aligned. Both combined allow rebuild_refcount_structure()
> to drop the bounce buffer which was used to convert parts of the
> refcount array to big endian byte order and store them on disk. Instead,
> those parts can now be written directly.

Thanks, that helps.

> 
> Signed-off-by: Max Reitz 
> ---
>  block/qcow2-refcount.c | 131 
> ++---
>  block/qcow2.h  |   8 +++
>  2 files changed, 89 insertions(+), 50 deletions(-)
> 

> @@ -2101,7 +2132,7 @@ int qcow2_check_refcounts(BlockDriverState *bs, 
> BdrvCheckResult *res,
>  /* Because the old reftable has been exchanged for a new one the
>   * references have to be recalculated */
>  rebuild = false;
> -memset(refcount_table, 0, nb_clusters * sizeof(uint16_t));
> +memset(refcount_table, 0, refcount_array_byte_size(s, nb_clusters));

Yep, that makes more sense than what you had in v2.

Reviewed-by: Eric Blake 

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH v3 06/22] qcow2: Helper for refcount array reallocation

2014-11-20 Thread Eric Blake

On 11/20/2014 10:06 AM, Max Reitz wrote:
> Add a helper function for reallocating a refcount array, independent of
> the refcount order. The newly allocated space is zeroed and the function
> handles failed reallocations gracefully.
> 
> The helper function will always align the buffer size to a cluster
> boundary; if storing the refcounts in such an array in big endian byte
> order, this makes it possible to write parts of the array directly as
> refcount blocks into the image file.
> 
> Signed-off-by: Max Reitz 
> Reviewed-by: Eric Blake 

Perhaps the changes since v2 warranted removing my earlier R-b to make
sure I review closely?

> ---
>  block/qcow2-refcount.c | 135 
> +++--
>  1 file changed, 86 insertions(+), 49 deletions(-)
> 

> +static int realloc_refcount_array(BDRVQcowState *s, uint16_t **array,
> +  int64_t *size, int64_t new_size)
> +{
> +/* Round to clusters so the array can be directly written to disk */
> +size_t old_byte_size = ROUND_UP(refcount_array_byte_size(s, *size),
> +s->cluster_size);
> +size_t new_byte_size = ROUND_UP(refcount_array_byte_size(s, new_size),
> +s->cluster_size);
> +uint16_t *new_ptr;
> +
> +if (new_byte_size <= old_byte_size) {
> +*size = new_size;
> +return 0;
> +}
> +
> +assert(new_byte_size > 0);

Should this assert be moved before the early exit?

Hmm, that's my only finding, and you did incorporate improvements mostly
to comments or asserts as compared to v2.  Moving the assert is small
enough, and not a show-stopper if you leave it in place, so:

Reviewed-by: Eric Blake 

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature

[Qemu-devel] [PULL 9/9] coverity/s390x: avoid false positive in kvm_irqchip_add_adapter_route

2014-11-20 Thread Christian Borntraeger

Paolo Bonzini reported that Coverity reports an uninitialized pad value.
Let's use a designated initializer for kvm_irq_routing_entry to avoid
this false positive. This is similar to kvm_irqchip_add_msi_route and
other users of kvm_irq_routing_entry.

Signed-off-by: Christian Borntraeger 
---
 kvm-all.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kvm-all.c b/kvm-all.c
index b951320..cfd49f0 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -1258,7 +1258,7 @@ static int kvm_irqchip_assign_irqfd(KVMState *s, int fd, 
int rfd, int virq,
 
 int kvm_irqchip_add_adapter_route(KVMState *s, AdapterInfo *adapter)
 {
-struct kvm_irq_routing_entry kroute;
+struct kvm_irq_routing_entry kroute = {};
 int virq;
 
 if (!kvm_gsi_routing_enabled()) {
-- 
1.9.3

[Qemu-devel] [PULL 8/9] valgrind/s390x: avoid false positives on KVM_SET_FPU ioctl

2014-11-20 Thread Christian Borntraeger

struct kvm_fpu contains an alignment padding on s390x. Let's use a
designated initializer to avoid false positives from valgrind/memcheck.

Signed-off-by: Christian Borntraeger 
---
 target-s390x/kvm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target-s390x/kvm.c b/target-s390x/kvm.c
index d247471..0c1da6e 100644
--- a/target-s390x/kvm.c
+++ b/target-s390x/kvm.c
@@ -208,7 +208,7 @@ int kvm_arch_put_registers(CPUState *cs, int level)
 CPUS390XState *env = &cpu->env;
 struct kvm_sregs sregs;
 struct kvm_regs regs;
-struct kvm_fpu fpu;
+struct kvm_fpu fpu = {};
 int r;
 int i;
 
-- 
1.9.3

[Qemu-devel] [PULL 4/9] valgrind/i386: avoid false positives on KVM_SET_XCRS ioctl

2014-11-20 Thread Christian Borntraeger

struct kvm_xcrs contains padding bytes. Let's use a designated
initializer to avoid false positives from valgrind/memcheck.

Signed-off-by: Christian Borntraeger 
---
 target-i386/kvm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index ccf36e8..f42b4bf 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -1085,7 +1085,7 @@ static int kvm_put_xsave(X86CPU *cpu)
 static int kvm_put_xcrs(X86CPU *cpu)
 {
 CPUX86State *env = &cpu->env;
-struct kvm_xcrs xcrs;
+struct kvm_xcrs xcrs = {};
 
 if (!kvm_has_xcrs()) {
 return 0;
-- 
1.9.3

[Qemu-devel] [PULL 2/9] valgrind/i386: avoid false positives on KVM_SET_CLOCK ioctl

2014-11-20 Thread Christian Borntraeger

kvm_clock_data contains pad fields. Let's use a designated
initializer to avoid false positives from valgrind/memcheck.

Signed-off-by: Christian Borntraeger 
---
 hw/i386/kvm/clock.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/hw/i386/kvm/clock.c b/hw/i386/kvm/clock.c
index 58be2bd..efdf165 100644
--- a/hw/i386/kvm/clock.c
+++ b/hw/i386/kvm/clock.c
@@ -88,7 +88,7 @@ static void kvmclock_vm_state_change(void *opaque, int 
running,
 int ret;
 
 if (running) {
-struct kvm_clock_data data;
+struct kvm_clock_data data = {};
 uint64_t time_at_migration = kvmclock_current_nsec(s);
 
 s->clock_valid = false;
@@ -99,7 +99,6 @@ static void kvmclock_vm_state_change(void *opaque, int 
running,
 }
 
 data.clock = s->clock;
-data.flags = 0;
 ret = kvm_vm_ioctl(kvm_state, KVM_SET_CLOCK, &data);
 if (ret < 0) {
 fprintf(stderr, "KVM_SET_CLOCK failed: %s\n", strerror(ret));
-- 
1.9.3

[Qemu-devel] [PULL 5/9] valgrind/i386: avoid false positives on KVM_SET_MSRS ioctl

2014-11-20 Thread Christian Borntraeger

struct kvm_msrs contains padding bytes. Let's use a designated
initializer on the info part to avoid false positives from
valgrind/memcheck. Do the same for generic MSRS, the TSC and
feature control.

We also need to zero out the reserved fields in the entries.
We do this in kvm_msr_entry_set as suggested by Paolo. This
avoids a big memset that a designated initializer on the
full structure would do.

Signed-off-by: Christian Borntraeger 
---
 target-i386/kvm.c | 15 ---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index f42b4bf..8b4a9e9 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -1152,6 +1152,7 @@ static void kvm_msr_entry_set(struct kvm_msr_entry *entry,
   uint32_t index, uint64_t value)
 {
 entry->index = index;
+entry->reserved = 0;
 entry->data = value;
 }
 
@@ -1170,7 +1171,9 @@ static int kvm_put_tscdeadline_msr(X86CPU *cpu)
 
 kvm_msr_entry_set(&msrs[0], MSR_IA32_TSCDEADLINE, env->tsc_deadline);
 
-msr_data.info.nmsrs = 1;
+msr_data.info = (struct kvm_msrs) {
+.nmsrs = 1,
+};
 
 return kvm_vcpu_ioctl(CPU(cpu), KVM_SET_MSRS, &msr_data);
 }
@@ -1190,7 +1193,11 @@ static int kvm_put_msr_feature_control(X86CPU *cpu)
 
 kvm_msr_entry_set(&msr_data.entry, MSR_IA32_FEATURE_CONTROL,
   cpu->env.msr_ia32_feature_control);
-msr_data.info.nmsrs = 1;
+
+msr_data.info = (struct kvm_msrs) {
+.nmsrs = 1,
+};
+
 return kvm_vcpu_ioctl(CPU(cpu), KVM_SET_MSRS, &msr_data);
 }
 
@@ -1339,7 +1346,9 @@ static int kvm_put_msrs(X86CPU *cpu, int level)
 }
 }
 
-msr_data.info.nmsrs = n;
+msr_data.info = (struct kvm_msrs) {
+.nmsrs = n,
+};
 
 return kvm_vcpu_ioctl(CPU(cpu), KVM_SET_MSRS, &msr_data);
 
-- 
1.9.3

[Qemu-devel] [PULL 1/9] valgrind: avoid false positives in KVM_GET_DIRTY_LOG ioctl

2014-11-20 Thread Christian Borntraeger

struct kvm_dirty_log contains padding fields that trigger false
positives in valgrind. Let's use a designated initializer to avoid
false positives from valgrind/memcheck.

Signed-off-by: Christian Borntraeger 
---
 kvm-all.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kvm-all.c b/kvm-all.c
index 44a5e72..b951320 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -400,7 +400,7 @@ static int 
kvm_physical_sync_dirty_bitmap(MemoryRegionSection *section)
 {
 KVMState *s = kvm_state;
 unsigned long size, allocated_size = 0;
-KVMDirtyLog d;
+KVMDirtyLog d = {};
 KVMSlot *mem;
 int ret = 0;
 hwaddr start_addr = section->offset_within_address_space;
-- 
1.9.3

[Qemu-devel] [PULL 3/9] valgrind/i386: avoid false positives on KVM_SET_PIT ioctl

2014-11-20 Thread Christian Borntraeger

struct kvm_pit_state2 contains pad fields. Let's use a designated
initializer to avoid false positives from valgrind/memcheck.

Signed-off-by: Christian Borntraeger 
---
 hw/i386/kvm/i8254.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/i386/kvm/i8254.c b/hw/i386/kvm/i8254.c
index 472af81..90eea10 100644
--- a/hw/i386/kvm/i8254.c
+++ b/hw/i386/kvm/i8254.c
@@ -138,7 +138,7 @@ static void kvm_pit_get(PITCommonState *pit)
 static void kvm_pit_put(PITCommonState *pit)
 {
 KVMPITState *s = KVM_PIT(pit);
-struct kvm_pit_state2 kpit;
+struct kvm_pit_state2 kpit = {};
 struct kvm_pit_channel_state *kchan;
 struct PITChannelState *sc;
 int i, ret;
-- 
1.9.3

[Qemu-devel] [PULL 6/9] valgrind/i386: avoid false positives on KVM_GET_MSRS ioctl

2014-11-20 Thread Christian Borntraeger

struct kvm_msrs contains a pad field. Let's use a designated
initializer on the info part to avoid false positives from
valgrind/memcheck.

Signed-off-by: Christian Borntraeger 
---
 target-i386/kvm.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 8b4a9e9..7919d3e 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -1655,7 +1655,10 @@ static int kvm_get_msrs(X86CPU *cpu)
 }
 }
 
-msr_data.info.nmsrs = n;
+msr_data.info = (struct kvm_msrs) {
+.nmsrs = n,
+};
+
 ret = kvm_vcpu_ioctl(CPU(cpu), KVM_GET_MSRS, &msr_data);
 if (ret < 0) {
 return ret;
-- 
1.9.3

[Qemu-devel] [PULL 7/9] valgrind/i386: avoid false positives on KVM_SET_VCPU_EVENTS ioctl

2014-11-20 Thread Christian Borntraeger

struct kvm_vcpu_events contains reserved fields. Let's use a
designated initializer to avoid false positives in valgrind.

Signed-off-by: Christian Borntraeger 
---
 target-i386/kvm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 7919d3e..43963c1 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -1884,7 +1884,7 @@ static int kvm_put_apic(X86CPU *cpu)
 static int kvm_put_vcpu_events(X86CPU *cpu, int level)
 {
 CPUX86State *env = &cpu->env;
-struct kvm_vcpu_events events;
+struct kvm_vcpu_events events = {};
 
 if (!kvm_has_vcpu_events()) {
 return 0;
-- 
1.9.3

[Qemu-devel] [PULL 0/9] valgrind/coverity/i386/s390x: memcheck false positives

2014-11-20 Thread Christian Borntraeger

Paolo, Peter,

here is an updated version of my valgrind tree. Please review and
consider for 2.3.

The following changes since commit af3ff19b48f0bbf3a8bd35c47460358e8c6ae5e5:

  Update version for v2.2.0-rc2 release (2014-11-18 18:00:58 +)

are available in the git repository at:

  git://github.com/borntraeger/qemu.git tags/memcheck

for you to fetch changes up to 113fb9793bf21a3219d305206c79024b0801d7ab:

  coverity/s390x: avoid false positive in kvm_irqchip_add_adapter_route 
(2014-11-20 22:10:58 +0100)


valgrind/coverity/i386/s390x: memcheck false positives

Let's avoid most memcheck false positives in KVM ioctls.


Christian Borntraeger (9):
  valgrind: avoid false positives in KVM_GET_DIRTY_LOG ioctl
  valgrind/i386: avoid false positives on KVM_SET_CLOCK ioctl
  valgrind/i386: avoid false positives on KVM_SET_PIT ioctl
  valgrind/i386: avoid false positives on KVM_SET_XCRS ioctl
  valgrind/i386: avoid false positives on KVM_SET_MSRS ioctl
  valgrind/i386: avoid false positives on KVM_GET_MSRS ioctl
  valgrind/i386: avoid false positives on KVM_SET_VCPU_EVENTS ioctl
  valgrind/s390x: avoid false positives on KVM_SET_FPU ioctl
  coverity/s390x: avoid false positive in kvm_irqchip_add_adapter_route

 hw/i386/kvm/clock.c |  3 +--
 hw/i386/kvm/i8254.c |  2 +-
 kvm-all.c   |  4 ++--
 target-i386/kvm.c   | 24 ++--
 target-s390x/kvm.c  |  2 +-
 5 files changed, 23 insertions(+), 12 deletions(-)

Re: [Qemu-devel] [PATCH v2 21/21] iotests: Add test for different refcount widths

2014-11-20 Thread Eric Blake

On 11/20/2014 06:48 AM, Max Reitz wrote:
> Sounds good, the only problem is that I'd have to hand-craft the image
> myself, because qemu generally uses self-references for refblocks (when
> allocating new refblocks, they will contain their own refcount).
> 
> I think this already would be too much effort (I'll reply to your second
> email right away ;-)). There is no fundamental difference in how new
> allocations for the new reftable and for the new refblocks are treated:
> If there's a new allocation, respin. If that works for the new reftable,
> that's enough to convince me it will work for new refblocks as well.

Agreed - one test of needing a third pass is sufficient to prove we
handle allocations until convergence.


-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH v2 21/21] iotests: Add test for different refcount widths

2014-11-20 Thread Eric Blake

On 11/20/2014 07:03 AM, Max Reitz wrote:
> Some people may ask why the walks are performed in a loop without a
> fixed limit (because they can't find cases where allocations haven't
> settled at the third pass). But I doubt that'll be a serious problem.
> It's much easier to have such a basically unlimited loop with the
> reasoning "We don't know exactly how many loops it'll take, but it will
> definitely settle at some point in time" than limiting the loop and then
> having to explain why we know exactly that it won't take more than X
> passes. The only problem with not limiting is that we need one walk to
> verify that all allocations have settled. But we need that for the
> common case (two passes) anyway, so that's not an issue.
> 
> The code from this version does not care whether it takes one, two,
> three, four or 42 passes. It's all the same. It will never take one and
> it will probably never take 42 passes; but if it does, well, it will
> work. Therefore, I think testing one non-standard number of passes
> (three) is enough. I'd like to test more, but the effort's just not
> worth it. I think.

Yep, I agree.  I've pretty much convinced myself that the REASON we are
guaranteed that things converge is that each successive iteration
allocates fewer clusters than the one before, and that in later
iterations, refblocks are not fully populated by these fewer allocations
(that is, on recursion, we are allocating geometrically less).

I think I may have found a case that needs four passes.  What if between
the first and second pass, we have enough refblocks to require
allocating 2752 or more contiguous clusters for the new reftable (again
continuing with my 64-bit from 32-bit example, this means at least 1376
contiguous clusters in the old reftable).  That's a huge image already
(176128 refblocks, 11,272,192 clusters, or 5,771,362,304 bytes).  If we
time things so that the first pass ends without spilling the old
reftable (which by now seems fairly tractable to compute how many spare
clusters to start with), then allocating the new reftable will also
spill the old reftable, and based on the reftables alone, will result in
more than 4096 newly-referenced clusters on the second pass (or more
than 64 new refblocks).  This in turn is enough to require another full
refblock just to describe the reftable, but that spills the size of the
new reftable, so between the second and third iteration we now have to
allocate 2753 instead of 2752 contiguous clusters.  And _that_
reallocation is enough for the third pass to have to allocate yet more
clusters.  But like you say, testing this is going to be prohibitively
slow (it's not worth a 5 gigabyte test).

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org

signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH v2] Add -semihosting-config ....cmdline=string.

2014-11-20 Thread Liviu Ionescu

On 20 Nov 2014, at 22:20, Eric Blake  wrote:

> Instead, try:
> 
> -semihosting-config target=native,cmdline="MessageTest
> --gtest_output=xml:gcm.xml,,baburiba"

with double comma, the output of my custom qemu shows that the command line was 
properly parsed:

GNU ARM Eclipse QEMU v2.1.92 (qemu-system-gnuarmeclipse).
Board/machine: 'LM3S6965EVB'.
Core: 'cortex-m3', flash: 256 KB, RAM: 64 KB.
Image: 'qemu_osx_aep_gcc_minimal_Debug/minimal.elf'.
Command line: 'MessageTest --gtest_output=xml:gcm.xml,baburiba' (47 bytes).

however the syntax is unusual and quite non-intuitive.

regards,

Liviu

Re: [Qemu-devel] [PATCH v2] Add -semihosting-config ....cmdline=string.

2014-11-20 Thread Eric Blake

On 11/20/2014 01:13 PM, Liviu Ionescu wrote:
> 
> On 20 Nov 2014, at 18:40, Liviu Ionescu  wrote:
> 
>> A new sub-option was added to -semihosting-config to define the entire
>> semihosting command line (cmdline=string).
> 
> unfortunately the use of a sub-option is not appropriate, the command line 
> string must be allowed to include *any* character, including the ',' which is 
> not allowed, being used by the parser.

The command line parser explicitly allows ,, as an escape for a literal
comma, rather than a separator between options, for anything converted
to use QemuOpts.

> 
> for example 
> 
>   -semihosting-config target=native,cmdline="MessageTest 
> --gtest_output=xml:gcm.xml,baburiba" 

Instead, try:

-semihosting-config target=native,cmdline="MessageTest
--gtest_output=xml:gcm.xml,,baburiba"


-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH v2] Add -semihosting-config ....cmdline=string.

2014-11-20 Thread Liviu Ionescu

On 20 Nov 2014, at 18:40, Liviu Ionescu  wrote:

> A new sub-option was added to -semihosting-config to define the entire
> semihosting command line (cmdline=string).

unfortunately the use of a sub-option is not appropriate, the command line 
string must be allowed to include *any* character, including the ',' which is 
not allowed, being used by the parser.

for example 

-semihosting-config target=native,cmdline="MessageTest 
--gtest_output=xml:gcm.xml,baburiba" 

triggers an error:

Unsupported semihosting-config target=native,cmdline=MessageTest 
--gtest_output=xml:gcm.xml,baburiba

unless you know of an escape character, the only solution I see is to return to 
a separate option:

-semihosting-cmdline "0 1 2 3"

for my use I would define a global variable and assign it, but this will 
probably not work for other targets.

any suggestions?

regard,

Liviu

Re: [Qemu-devel] [PATCH v3 7/9] raw: Prohibit dangerous writes for probed images

2014-11-20 Thread Dr. David Alan Gilbert

* Kevin Wolf (kw...@redhat.com) wrote:


> diff --git a/block/raw_bsd.c b/block/raw_bsd.c
> index 401b967..2ce5409 100644
> --- a/block/raw_bsd.c
> +++ b/block/raw_bsd.c
> @@ -58,8 +58,58 @@ static int coroutine_fn raw_co_readv(BlockDriverState *bs, 
> int64_t sector_num,
>  static int coroutine_fn raw_co_writev(BlockDriverState *bs, int64_t 
> sector_num,
>int nb_sectors, QEMUIOVector *qiov)
>  {
> +void *buf = NULL;
> +BlockDriver *drv;
> +QEMUIOVector local_qiov;
> +int ret;
> +
> +if (bs->probed && sector_num == 0) {
> +/* As long as these conditions are true, we can't get partial writes 
> to
> + * the probe buffer and can just directly check the request. */
> +QEMU_BUILD_BUG_ON(BLOCK_PROBE_BUF_SIZE != 512);
> +QEMU_BUILD_BUG_ON(BDRV_SECTOR_SIZE != 512);
> +
> +if (nb_sectors == 0) {
> +/* qemu_iovec_to_buf() would fail, but we want to return success
> + * instead of -EINVAL in this case. */
> +return 0;
> +}
> +
> +buf = qemu_try_blockalign(bs->file, 512);
> +if (!buf) {
> +ret = -ENOMEM;
> +goto fail;
> +}
> +
> +ret = qemu_iovec_to_buf(qiov, 0, buf, 512);
> +if (ret != 512) {
> +ret = -EINVAL;
> +goto fail;
> +}
> +
> +drv = bdrv_probe_all(buf, 512, NULL);
> +if (drv != bs->drv) {
> +ret = -EPERM;
> +goto fail;
> +}

Two things about this worry me:
   1) It allows a running guest to prod at the probing code potentially quite
hard; if there is anything nasty that can be done during probing it would
potentially make it easier for a guest to find it.

   2) We don't log anything when this failure happens so if someone hits
this by accident for some reason it'll confuse them no end.  Could we add
a (1 time?) error_report/printf just so that there's something to work with ?

Dave

> +
> +/* Use the checked buffer, a malicious guest might be overwriting its
> + * original buffer in the background. */
> +qemu_iovec_init(&local_qiov, qiov->niov + 1);
> +qemu_iovec_add(&local_qiov, buf, 512);
> +qemu_iovec_concat(&local_qiov, qiov, 512, qiov->size - 512);
> +qiov = &local_qiov;
> +}
> +
>  BLKDBG_EVENT(bs->file, BLKDBG_WRITE_AIO);
> -return bdrv_co_writev(bs->file, sector_num, nb_sectors, qiov);
> +ret = bdrv_co_writev(bs->file, sector_num, nb_sectors, qiov);
> +
> +fail:
> +if (qiov == &local_qiov) {
> +qemu_iovec_destroy(&local_qiov);
> +}
> +qemu_vfree(buf);
> +return ret;
>  }
>  
>  static int64_t coroutine_fn raw_co_get_block_status(BlockDriverState *bs,
> @@ -158,6 +208,18 @@ static int raw_open(BlockDriverState *bs, QDict 
> *options, int flags,
>  Error **errp)
>  {
>  bs->sg = bs->file->sg;
> +
> +if (bs->probed && !bdrv_is_read_only(bs)) {
> +fprintf(stderr,
> +"WARNING: Image format was not specified for '%s' and 
> probing "
> +"guessed raw.\n"
> +" Automatically detecting the format is dangerous 
> for "
> +"raw images, write operations on block 0 will be 
> restricted.\n"
> +" Specify the 'raw' format explicitly to remove the "
> +"restrictions.\n",
> +bs->file->filename);
> +}
> +
>  return 0;
>  }
>  
> diff --git a/include/block/block_int.h b/include/block/block_int.h
> index cd94559..192fce8 100644
> --- a/include/block/block_int.h
> +++ b/include/block/block_int.h
> @@ -326,6 +326,7 @@ struct BlockDriverState {
>  int sg;/* if true, the device is a /dev/sg* */
>  int copy_on_read; /* if true, copy read backing sectors into image
>   note this is a reference count */
> +bool probed;
>  
>  BlockDriver *drv; /* NULL means no media */
>  void *opaque;
> @@ -414,6 +415,8 @@ struct BlockDriverState {
>  };
>  
>  int get_tmp_filename(char *filename, int size);
> +BlockDriver *bdrv_probe_all(const uint8_t *buf, int buf_size,
> +const char *filename);
>  
>  void bdrv_set_io_limits(BlockDriverState *bs,
>  ThrottleConfig *cfg);
> -- 
> 1.8.3.1
> 
> 
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK

Re: [Qemu-devel] [BUGFIX][PATCH for 2.2 v4 1/1] -machine vmport=auto: Fix handling of VMWare ioport emulation for xen

2014-11-20 Thread Eric Blake

On 11/20/2014 11:21 AM, Don Slutz wrote:
> c/s 9b23cfb76b3a5e9eb5cc899eaf2f46bc46d33ba4
> 
> or
> 
> c/s b154537ad07598377ebf98252fb7d2aff127983b
> 
> moved the testing of xen_enabled() from pc_init1() to
> pc_machine_initfn().
> 
> xen_enabled() does not return the correct value in
> pc_machine_initfn().
> 
> Changed vmport from a bool to an enum.  Added the value "auto" to do
> the old way.
> 
> Signed-off-by: Don Slutz 
> ---
> 
> v4:
>   Michael S. Tsirkin, Eric Blake, Eduardo Habkost:
> Rename vmport to OnOffAuto and move to qapi/common.json
>   Eduardo Habkost:
> Simpler convert of enum to no_vmport.
>   Michael S. Tsirkin:
> Add assert for ON_OFF_AUTO_MAX.
> 
>  hw/i386/pc.c | 23 ++-
>  hw/i386/pc_piix.c| 10 +-
>  hw/i386/pc_q35.c | 10 +-
>  include/hw/i386/pc.h |  2 +-
>  qapi/common.json | 15 +++
>  qemu-options.hx  |  8 +---
>  vl.c |  2 +-
>  7 files changed, 54 insertions(+), 16 deletions(-)

I've only reviewed the qapi/common.json and qemu-options.hx files for
QMP interface (and will leave the rest of the patch to others), but I'm
okay with the changes to those files. I guess that means no R-b, since I
didn't do a full review, so here's a weaker:

Acked-by: Eric Blake 

if you think it is worth adding.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH v4 3/3] iotests: Use -qmp-pretty in 067

2014-11-20 Thread Eric Blake

On 11/20/2014 11:56 AM, Max Reitz wrote:
> On 20.11.2014 19:39, Kevin Wolf wrote:
>> Am 17.11.2014 um 13:31 hat Max Reitz geschrieben:
>>> 067 invokes query-block, resulting in a reference output with really
>>> long lines (which may pose a problem in email patches and always poses a
>>> problem when the output changes, because it is hard to see what has
>>> actually changed). Use -qmp-pretty to mitigate this issue.
>>>
>>> Signed-off-by: Max Reitz 
>>> Reviewed-by: Eric Blake 
>>> ---
>>>   tests/qemu-iotests/067 |   2 +-
>>>   tests/qemu-iotests/067.out | 779
>>> +
>>>   2 files changed, 723 insertions(+), 58 deletions(-)
>> Hm, when I tried to rebase my query-block patch that adds cache
>> information, I noticed that I get a trailing space on every line of the
>> JSON dump, which isn't there in this patch. So 'cp 067.out.bad 067.out'
>> ended up changing every single line. :-/
>>
>> Did you postprocess the reference output or do really get it without
>> trailing whitespace?
> 
> Yes, I post-processed it, because you once complained about me sending
> patches with trailing whitespace. :-P
> 
> As I wrote in some reply to some reply from Eric on some version of this
> patch, QEMU's pretty JSON formatter emits whitespace at the end of some
> lines.

Trailing whitespace in canonical sources is annoying and should be
avoided.  But this file is about trailing whitespace in generated files
(such as reference output) and which is therefore worth keeping, unless
we fix the cause of the reference material having trailing whitespace in
the first place.  That way, we can regenerate the file from the same
reference source, and won't have needless differences in trailing
whitespace.  Sounds like a bug in the JSON pretty formatter, but fixing
it should be a separate series.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [BUGFIX][PATCH for 2.2 v4 1/1] -machine vmport=auto: Fix handling of VMWare ioport emulation for xen

2014-11-20 Thread Don Slutz


On 11/20/14 14:05, Eduardo Habkost wrote:

On Thu, Nov 20, 2014 at 01:21:18PM -0500, Don Slutz wrote:
[...]

@@ -242,9 +243,16 @@ static void pc_q35_init(MachineState *machine)
  
  pc_register_ferr_irq(gsi[13]);
  
+assert(pc_machine->vmport != ON_OFF_AUTO_MAX);

+if (pc_machine->vmport == ON_OFF_AUTO_AUTO) {
+no_vmport = xen_enabled();
+} else {
+no_vmport = (pc_machine->vmport != ON_OFF_AUTO_ON);
+}
+
  /* init basic PC hardware */
  pc_basic_device_init(isa_bus, gsi, &rtc_state, &floppy,
- !pc_machine->vmport, 0xff0104);
+ no_vmport, 0xff0104);
  

Sorry for suggesting yet another change, but: what about changing
pc_machine->vmport here instead of using a no_vmport variable, so the
actual vmport configuration may be queried by anybody later using the
QOM property? It would even make the code shorter.

I mean:

 if (pc_machine->vmport == ON_OFF_AUTO_AUTO) {
 pc_machine->vmport = xen_enabled() ? ON_OFF_AUTO_OFF : ON_OFF_AUTO_ON);
 }
 pc_basic_device_init(isa_bus, gsi, &rtc_state, &floppy,
  (pc_machine->vmport != ON_OFF_AUTO_ON), 0xff0104);




Look better to me also.  So will adjust the patch.

-Don Slutz

Re: [Qemu-devel] [PATCH 2/2] LICENSE: clarify

2014-11-20 Thread Christopher Covington

Hi Paolo,

On 07/31/2013 02:19 AM, Paolo Bonzini wrote:
> 1) The GPL says that "if the Program does not specify a version number
> of this License, you may choose any version ever published by the Free
> Software Foundation".  This is not true, QEMU includes parts that are
> v2-only.
> 
> 2) Provide a default for files with no licensing information.
> 
> 3) It is not just hardware emulation that is under BSD license.
> 
> 4) Restrict GPLv2-only contributions to user mode emulation (due to
> code from Linux) and PCI passthrough (due to code from Neocleus).
> 
> 5) The rules were initially set by Fabrice but are being amended by
> other people (already in commit ee12e1f, LICENSE: There is no libqemu.a
> anymore, 2011-11-15).  Do not put words in his mouth.
> 
> Signed-off-by: Paolo Bonzini 
> ---
>  LICENSE | 15 ++-
>  1 file changed, 10 insertions(+), 5 deletions(-)
> 
> diff --git a/LICENSE b/LICENSE
> index acae9a3..2bd2e95 100644
> --- a/LICENSE
> +++ b/LICENSE
> @@ -1,16 +1,21 @@
>  The following points clarify the QEMU license:
>  
> -1) QEMU as a whole is released under the GNU General Public License
> +1) QEMU as a whole is released under the GNU General Public License,
> +version 2.
>  
>  2) Parts of QEMU have specific licenses which are compatible with the
> -GNU General Public License. Hence each source file contains its own
> -licensing information.
> +GNU General Public License, version 2. Hence each source file contains
> +its own licensing information.  Source files with no licensing information
> +are released under the GNU General Public License, version 2 or (at your
> +option) any later version.
>  
> -Many hardware device emulation sources are released under the BSD license.
> +As of July 2013, contributions under version 2 of the GNU General Public
> +License (and no later version) are only accepted for the following files
> +or directories: bsd-user/, linux-user/, hw/misc/vfio.c, hw/xen/xen_pt*.
>  
>  3) The Tiny Code Generator (TCG) is released under the BSD license
> (see license headers in files).
>  
>  4) QEMU is a trademark of Fabrice Bellard.
>  
> -Fabrice Bellard.
> +Fabrice Bellard and the QEMU team

http://wiki.qemu.org/License appears to be out of sync with these changes.

Thanks,
Chris

-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

Re: [Qemu-devel] [BUGFIX][PATCH for 2.2 v4 1/1] -machine vmport=auto: Fix handling of VMWare ioport emulation for xen

2014-11-20 Thread Eduardo Habkost

On Thu, Nov 20, 2014 at 01:21:18PM -0500, Don Slutz wrote:
[...]
> @@ -242,9 +243,16 @@ static void pc_q35_init(MachineState *machine)
>  
>  pc_register_ferr_irq(gsi[13]);
>  
> +assert(pc_machine->vmport != ON_OFF_AUTO_MAX);
> +if (pc_machine->vmport == ON_OFF_AUTO_AUTO) {
> +no_vmport = xen_enabled();
> +} else {
> +no_vmport = (pc_machine->vmport != ON_OFF_AUTO_ON);
> +}
> +
>  /* init basic PC hardware */
>  pc_basic_device_init(isa_bus, gsi, &rtc_state, &floppy,
> - !pc_machine->vmport, 0xff0104);
> + no_vmport, 0xff0104);
>  

Sorry for suggesting yet another change, but: what about changing
pc_machine->vmport here instead of using a no_vmport variable, so the
actual vmport configuration may be queried by anybody later using the
QOM property? It would even make the code shorter.

I mean:

if (pc_machine->vmport == ON_OFF_AUTO_AUTO) {
pc_machine->vmport = xen_enabled() ? ON_OFF_AUTO_OFF : ON_OFF_AUTO_ON);
}
pc_basic_device_init(isa_bus, gsi, &rtc_state, &floppy,
 (pc_machine->vmport != ON_OFF_AUTO_ON), 0xff0104);


-- 
Eduardo

Re: [Qemu-devel] [PATCH v4 3/3] iotests: Use -qmp-pretty in 067

2014-11-20 Thread Max Reitz


On 20.11.2014 19:39, Kevin Wolf wrote:

Am 17.11.2014 um 13:31 hat Max Reitz geschrieben:

067 invokes query-block, resulting in a reference output with really
long lines (which may pose a problem in email patches and always poses a
problem when the output changes, because it is hard to see what has
actually changed). Use -qmp-pretty to mitigate this issue.

Signed-off-by: Max Reitz 
Reviewed-by: Eric Blake 
---
  tests/qemu-iotests/067 |   2 +-
  tests/qemu-iotests/067.out | 779 +
  2 files changed, 723 insertions(+), 58 deletions(-)

Hm, when I tried to rebase my query-block patch that adds cache
information, I noticed that I get a trailing space on every line of the
JSON dump, which isn't there in this patch. So 'cp 067.out.bad 067.out'
ended up changing every single line. :-/

Did you postprocess the reference output or do really get it without
trailing whitespace?


Yes, I post-processed it, because you once complained about me sending 
patches with trailing whitespace. :-P


As I wrote in some reply to some reply from Eric on some version of this 
patch, QEMU's pretty JSON formatter emits whitespace at the end of some 
lines.


Max

Re: [Qemu-devel] [PATCH v4 3/3] iotests: Use -qmp-pretty in 067

2014-11-20 Thread Kevin Wolf

Am 17.11.2014 um 13:31 hat Max Reitz geschrieben:
> 067 invokes query-block, resulting in a reference output with really
> long lines (which may pose a problem in email patches and always poses a
> problem when the output changes, because it is hard to see what has
> actually changed). Use -qmp-pretty to mitigate this issue.
> 
> Signed-off-by: Max Reitz 
> Reviewed-by: Eric Blake 
> ---
>  tests/qemu-iotests/067 |   2 +-
>  tests/qemu-iotests/067.out | 779 
> +
>  2 files changed, 723 insertions(+), 58 deletions(-)

Hm, when I tried to rebase my query-block patch that adds cache
information, I noticed that I get a trailing space on every line of the
JSON dump, which isn't there in this patch. So 'cp 067.out.bad 067.out'
ended up changing every single line. :-/

Did you postprocess the reference output or do really get it without
trailing whitespace?

Kevin

Re: [Qemu-devel] [BUGFIX][PATCH for 2.2 1/1] hw/ide/core.c: Prevent SIGSEGV during migration

2014-11-20 Thread Don Slutz


On 11/19/14 07:29, Markus Armbruster wrote:

Don Slutz  writes:


The other callers to blk_set_enable_write_cache() in this file
already check for s->blk == NULL.

Signed-off-by: Don Slutz 
---

I think this is a bugfix that should be back ported to stable
releases.

I also think this should be done in xen's copy of QEMU for 4.5 with
back port(s) to active stable releases.

Note: In 2.1 and earlier the routine is
bdrv_set_enable_write_cache(); variable is s->bs.

Got a reproducer?


yes.  Migrating a guest from xen 4.2 or 4.3 to xen 4.4 (or 4.5-unstable) on
CentOS 6.3 with xen_emul_unplug=unnecessary and no cdrom defined.




I'm asking because I believe s->identify_set implies s->blk.
s->identify_set is initialized to zero, and gets set to non-zero exactly
on the first successful IDENTIFY DEVICE or IDENTIFY PACKET DEVICE, in
ide_identify(), ide_atapi_identify() or ide_cfata_identify(),
respectively.  Only called via cmd_identify() / cmd_identify_packet()
via ide_exec_cmd().  The latter immediately fails when !s->blk:

 s = idebus_active_if(bus);
 /* ignore commands to non existent slave */
 if (s != bus->ifs && !s->blk) {
 return;
 }


I do think that you are right.  I have now spent more time on why I am
seeing this.



Even if I'm right, your patch is fine, because it makes this spot more
obviously correct, and consistent with the other uses of
blk_set_enable_write_cache().  The case for stable is weak, though.



I had not fully tracked down what is happening before sending the bugfix.
I have now done more debugging, and have tracked it down to xen 4.4
now using "-nodefaults" with QEMU.

I needed to add output to QEMU to track this down because I have long
command lines...

(all I get for ps -ef):
root 14864 1 82 16:42 ?00:00:09 
/usr/lib/xen/bin/qemu-system-i386 -xen-domid 20 -chardev 
socket,id=libxl-cmd,path=/var/run/xen/qmp-libxl-20,server,nowait -mon 
chardev=libxl-cmd,mode=control -name C63-min-tools -machine 
xenfv,vmware_hw=7,xen_platform_pci=240 -hostbridge 82443 -device 
agp-bridge,id=agp,msi=off,msi-x=off,addr=0x12.0 -monitor pty -monitor vc 
-boot menu=on -device 
vmware-pci-bridge,msi=on,msi-x=on,id=pciBridge1.0,addr=0x11.0 -device 
vmware-pcie-bridge,msi=on,msi-x=on,id=pciBridge5.0,multifunction=on,addr=0x15.0 
-device 
vmware-pcie-bridge,msi=on,msi-x=on,id=pciBridge5.1,multifunction=on,addr=0x15.1 
-device 
vmware-pcie-bridge,msi=on,msi-x=on,id=pciBridge5.2,multifunction=on,addr=0x15.2 
-device 
vmware-pcie-bridge,msi=on,msi-x=on,id=pciBridge5.3,multifunction=on,addr=0x15.3 
-device 
vmware-pcie-bridge,msi=on,msi-x=on,id=pciBridge5.4,multifunction=on,addr=0x15.4 
-device 
vmware-pcie-bridge,msi=on,msi-x=on,id=pciBridge5.5,multifunction=on,addr=0x15.5 
-device 
vmware-pcie-bridge,msi=on,msi-x=on,id=pciBridge5.6,multifunction=on,addr=0x15.6 
-device 
vmware-pcie-bridge,msi=on,msi-x=on,id=pciBridge5.7,multifunction=on,addr=0x15.7 
-device 
vmware-pcie-bridge,msi=on,msi-x=on,id=pciBridge6.0,multifunction=on,addr=0x16.0 
-device 
vmware-pcie-bridge,msi=on,msi-x=on,id=pciBridge6.1,multifunction=on,addr=0x16.1 
-device 
vmware-pcie-bridge,msi=on,msi-x=on,id=pciBridge6.2,multifunction=on,addr=0x16.2 
-device 
vmware-pcie-bridge,msi=on,msi-x=on,id=pciBridge6.3,multifunction=on,addr=0x16.3 
-device 
vmware-pcie-bridge,msi=on,msi-x=on,id=pciBridge6.4,multifunction=on,addr=0x16.4 
-device 
vmware-pcie-bridge,msi=on,msi-x=on,id=pciBridge6.5,multifunction=on,addr=0x16.5 
-device 
vmware-pcie-bridge,msi=on,msi-x=on,id=pciBridge6.6,multifunction=on,addr=0x16.6 
-device 
vmware-pcie-bridge,msi=on,msi-x=on,id=pciBridge6.7,multifunction=on,addr=0x16.7 
-device 
vmware-pcie-bridge,msi=on,msi-x=on,id=pciBridge7.0,multifunction=on,addr=0x17.0 
-device 
vmware-pcie-bridge,msi=on,msi-x=on,id=pciBridge7.1,multifunction=on,addr=0x17.1 
-device 
vmware-pcie-bridge,msi=on,msi-x=on,id=pciBridge7.2,multifunction=on,addr=0x17.2 
-device 
vmware-pcie-bridge,msi=on,msi-x=on,id=pciBridge7.3,multifunction=on,addr=0x17.3 
-device 
vmware-pcie-bridge,msi=on,msi-x=on,id=pciBridge7.4,multifunction=on,addr=0x17.4 
-device 
vmware-pcie-bridge,msi=on,msi-x=on,id=pciBridge7.5,multifunction=on,addr=0x17.5 
-device 
vmware-pcie-bridge,msi=on,msi-x=on,id=pciBridge7.6,multifunction=on,addr=0x17.6 
-device 
vmware-pcie-bridge,msi=on,msi-x=on,id=pciBridge7.7,multifunction=on,addr=0x17.7 
-device 
vmware-pcie-bridge,msi=on,msi-x=on,id=pciBridge8.0,multifunction=on,addr=0x18.0 
-device 
vmware-pcie-bridge,msi=on,msi-x=on,id=pciBridge8.1,multifunction=on,addr=0x18.1 
-device 
vmware-pcie-bridge,msi=on,msi-x=on,id=pciBridge8.2,multifunction=on,addr=0x18.2 
-device 
vmware-pcie-bridge,msi=on,msi-x=on,id=pciBridge8.3,multifunction=on,addr=0x18.3 
-device 
vmware-pcie-bridge,msi=on,msi-x=on,id=pciBridge8.4,multifunction=on,addr=0x18.4 
-device 
vmware-pcie-bridge,msi=on,msi-x=on,id=pciBridge8.5,multifunction=on,addr=0x18.5 
-device 
vmware-pcie-bridge,msi=on,msi-x=on,id=pciBridge8.6,multifunction=on,ad

[Qemu-devel] [BUGFIX][PATCH for 2.2 v4 1/1] -machine vmport=auto: Fix handling of VMWare ioport emulation for xen

2014-11-20 Thread Don Slutz

c/s 9b23cfb76b3a5e9eb5cc899eaf2f46bc46d33ba4

or

c/s b154537ad07598377ebf98252fb7d2aff127983b

moved the testing of xen_enabled() from pc_init1() to
pc_machine_initfn().

xen_enabled() does not return the correct value in
pc_machine_initfn().

Changed vmport from a bool to an enum.  Added the value "auto" to do
the old way.

Signed-off-by: Don Slutz 
---

v4:
  Michael S. Tsirkin, Eric Blake, Eduardo Habkost:
Rename vmport to OnOffAuto and move to qapi/common.json
  Eduardo Habkost:
Simpler convert of enum to no_vmport.
  Michael S. Tsirkin:
Add assert for ON_OFF_AUTO_MAX.

 hw/i386/pc.c | 23 ++-
 hw/i386/pc_piix.c| 10 +-
 hw/i386/pc_q35.c | 10 +-
 include/hw/i386/pc.h |  2 +-
 qapi/common.json | 15 +++
 qemu-options.hx  |  8 +---
 vl.c |  2 +-
 7 files changed, 54 insertions(+), 16 deletions(-)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 1205db8..15fba60 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -1711,18 +1711,23 @@ static void pc_machine_set_max_ram_below_4g(Object 
*obj, Visitor *v,
 pcms->max_ram_below_4g = value;
 }
 
-static bool pc_machine_get_vmport(Object *obj, Error **errp)
+static void pc_machine_get_vmport(Object *obj, Visitor *v, void *opaque,
+  const char *name, Error **errp)
 {
 PCMachineState *pcms = PC_MACHINE(obj);
+int vmport = pcms->vmport;
 
-return pcms->vmport;
+visit_type_enum(v, &vmport, OnOffAuto_lookup, NULL, name, errp);
 }
 
-static void pc_machine_set_vmport(Object *obj, bool value, Error **errp)
+static void pc_machine_set_vmport(Object *obj, Visitor *v, void *opaque,
+  const char *name, Error **errp)
 {
 PCMachineState *pcms = PC_MACHINE(obj);
+int vmport;
 
-pcms->vmport = value;
+visit_type_enum(v, &vmport, OnOffAuto_lookup, NULL, name, errp);
+pcms->vmport = vmport;
 }
 
 static void pc_machine_initfn(Object *obj)
@@ -1737,11 +1742,11 @@ static void pc_machine_initfn(Object *obj)
 pc_machine_get_max_ram_below_4g,
 pc_machine_set_max_ram_below_4g,
 NULL, NULL, NULL);
-pcms->vmport = !xen_enabled();
-object_property_add_bool(obj, PC_MACHINE_VMPORT,
- pc_machine_get_vmport,
- pc_machine_set_vmport,
- NULL);
+pcms->vmport = ON_OFF_AUTO_AUTO;
+object_property_add(obj, PC_MACHINE_VMPORT, "str",
+pc_machine_get_vmport,
+pc_machine_set_vmport,
+NULL, NULL, NULL);
 }
 
 static void pc_machine_class_init(ObjectClass *oc, void *data)
diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
index 7bb97a4..6771979 100644
--- a/hw/i386/pc_piix.c
+++ b/hw/i386/pc_piix.c
@@ -101,6 +101,7 @@ static void pc_init1(MachineState *machine,
 FWCfgState *fw_cfg = NULL;
 PcGuestInfo *guest_info;
 ram_addr_t lowmem;
+bool no_vmport;
 
 /* Check whether RAM fits below 4G (leaving 1/2 GByte for IO memory).
  * If it doesn't, we need to split it in chunks below and above 4G.
@@ -234,9 +235,16 @@ static void pc_init1(MachineState *machine,
 
 pc_vga_init(isa_bus, pci_enabled ? pci_bus : NULL);
 
+assert(pc_machine->vmport != ON_OFF_AUTO_MAX);
+if (pc_machine->vmport == ON_OFF_AUTO_AUTO) {
+no_vmport = xen_enabled();
+} else {
+no_vmport = (pc_machine->vmport != ON_OFF_AUTO_ON);
+}
+
 /* init basic PC hardware */
 pc_basic_device_init(isa_bus, gsi, &rtc_state, &floppy,
- !pc_machine->vmport, 0x4);
+ no_vmport, 0x4);
 
 pc_nic_init(isa_bus, pci_bus);
 
diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c
index 598e679..6f25868 100644
--- a/hw/i386/pc_q35.c
+++ b/hw/i386/pc_q35.c
@@ -88,6 +88,7 @@ static void pc_q35_init(MachineState *machine)
 PcGuestInfo *guest_info;
 ram_addr_t lowmem;
 DriveInfo *hd[MAX_SATA_PORTS];
+bool no_vmport;
 
 /* Check whether RAM fits below 4G (leaving 1/2 GByte for IO memory
  * and 256 Mbytes for PCI Express Enhanced Configuration Access Mapping
@@ -242,9 +243,16 @@ static void pc_q35_init(MachineState *machine)
 
 pc_register_ferr_irq(gsi[13]);
 
+assert(pc_machine->vmport != ON_OFF_AUTO_MAX);
+if (pc_machine->vmport == ON_OFF_AUTO_AUTO) {
+no_vmport = xen_enabled();
+} else {
+no_vmport = (pc_machine->vmport != ON_OFF_AUTO_ON);
+}
+
 /* init basic PC hardware */
 pc_basic_device_init(isa_bus, gsi, &rtc_state, &floppy,
- !pc_machine->vmport, 0xff0104);
+ no_vmport, 0xff0104);
 
 /* connect pm stuff to lpc */
 ich9_lpc_pm_init(lpc);
diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
index 7c3731f..7f7d2d4 100644
--- a/include/hw/i386/pc.h
+++ b/include/hw/i386/pc.h
@@ -37,7 +37,7

[Qemu-devel] [Bug 1394550] Re: qemu: linux kernel too old to load a ram disk

2014-11-20 Thread Arsen.Shnurkov

I loaded kernel and initramfs through symlinks and make that symlinks
wrong (kernel become initramfs and vice versa)

** Changed in: qemu
   Status: New => Invalid

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1394550

Title:
  qemu: linux kernel too old to load a ram disk

Status in QEMU:
  Invalid

Bug description:
  I was built  kernel-genkernel-x86_64-3.17.3-gentoo-gnu   and
  initramfs-genkernel-x86_64-3.17.3-gentoo-gnu in Gentoo Linux from sys-
  kernel/gentoo-sources/gentoo-sources-3.17.3.ebuild

  When I run this kernel with switches  -kernel  -initrd  -append (and
  others), qemu gives misleading message:

  "qemu: linux kernel too old to load a ram disk"

  this is not an old linux! Just something should be configured better.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1394550/+subscriptions

[Qemu-devel] [Bug 1357440] Re: qemu-img: Assert for 'amend' command and the fuzzed image

2014-11-20 Thread Max Reitz

Hi,

This issue has been fixed in master
(af3ff19b48f0bbf3a8bd35c47460358e8c6ae5e5, 2.2.0-rc2):

$ ./qemu-img amend -o compat=0.10 -f qcow2 copy.img
qemu-img: Error while amending options: File too large

Thanks for your report,

Max

** Changed in: qemu
   Status: New => Fix Committed

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1357440

Title:
  qemu-img: Assert for 'amend' command and the fuzzed image

Status in QEMU:
  Fix Committed

Bug description:
  'qemu-img amend' failed with the assert on the fuzzed image.

  Sequence:
   1. Unpack the attached archive, make a copy of test.img
   2. Put copy.img and backing_img.vdi in the same directory
   3. Execute
 qemu-img amend -o compat=0.10 -f qcow2 copy.img

  Result: qemu-img was killed by SIGIOT with the reason:

  qemu-img: block/qcow2-cluster.c:1598: expand_zero_clusters_in_l1:
  Assertion `(cluster_index >= 0) && (cluster_index < *nb_clusters)'
  failed.

  qemu.git HEAD 2d591ce2aeebf

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1357440/+subscriptions

[Qemu-devel] [Bug 1357445] Re: qemu-img: 'amend -o compat=0.10' command failed with segfault on the fuzzed image

2014-11-20 Thread Max Reitz

Hi,

Well, I still (on 2.2.0-rc2) receive "File too large", so I guess that's
the fix.

Max

** Changed in: qemu
   Status: New => Fix Committed

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1357445

Title:
  qemu-img: 'amend -o compat=0.10' command failed with segfault on the
  fuzzed image

Status in QEMU:
  Fix Committed

Bug description:
  qemu-img amend -o compat=0.10' failed with a segmentation fault on the
  fuzzed image.

  Sequence:
   1. Unpack the attached archive, make a copy of test.img
   2. Put copy.img and backing_img.qed in the same directory
   3. Execute
 qemu-img amend -o compat=0.10 -f qcow2 copy.img

  Result: qemu-img was killed by SIGSEGV.

  Traces can be found in the attached archive.

  qemu.git HEAD 2d591ce2aeebf

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1357445/+subscriptions

Re: [Qemu-devel] [BUGFIX][PATCH for 2.2 v3 1/1] -machine vmport=auto: Fix handling of VMWare ioport emulation for xen

2014-11-20 Thread Don Slutz


On 11/20/14 03:44, Michael S. Tsirkin wrote:

On Wed, Nov 19, 2014 at 07:38:10PM -0500, Don Slutz wrote:

c/s 9b23cfb76b3a5e9eb5cc899eaf2f46bc46d33ba4

or

c/s b154537ad07598377ebf98252fb7d2aff127983b

moved the testing of xen_enabled() from pc_init1() to
pc_machine_initfn().

xen_enabled() does not return the correct value in
pc_machine_initfn().

Changed vmport from a bool to an enum.  Added the value "auto" to do
the old way.

Signed-off-by: Don Slutz 

This looks fine to me. A couple of minor comments below.
Also this changes qapi schema file, let's get ack from maintainers -
my understanding is that just adding a definition there won't
affect any users, correct?



---
  hw/i386/pc.c | 23 ++-
  hw/i386/pc_piix.c| 27 ++-
  hw/i386/pc_q35.c | 27 ++-
  include/hw/i386/pc.h |  2 +-
  qapi-schema.json | 16 
  qemu-options.hx  |  8 +---
  vl.c |  2 +-
  7 files changed, 89 insertions(+), 16 deletions(-)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 1205db8..2f68e4d 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -1711,18 +1711,23 @@ static void pc_machine_set_max_ram_below_4g(Object 
*obj, Visitor *v,
  pcms->max_ram_below_4g = value;
  }
  
-static bool pc_machine_get_vmport(Object *obj, Error **errp)

+static void pc_machine_get_vmport(Object *obj, Visitor *v, void *opaque,
+  const char *name, Error **errp)
  {
  PCMachineState *pcms = PC_MACHINE(obj);
+int vmport = pcms->vmport;
  
-return pcms->vmport;

+visit_type_enum(v, &vmport, vmport_lookup, NULL, name, errp);
  }
  
-static void pc_machine_set_vmport(Object *obj, bool value, Error **errp)

+static void pc_machine_set_vmport(Object *obj, Visitor *v, void *opaque,
+  const char *name, Error **errp)
  {
  PCMachineState *pcms = PC_MACHINE(obj);
+int vmport;
  
-pcms->vmport = value;

+visit_type_enum(v, &vmport, vmport_lookup, NULL, name, errp);
+pcms->vmport = vmport;
  }
  
  static void pc_machine_initfn(Object *obj)

@@ -1737,11 +1742,11 @@ static void pc_machine_initfn(Object *obj)
  pc_machine_get_max_ram_below_4g,
  pc_machine_set_max_ram_below_4g,
  NULL, NULL, NULL);
-pcms->vmport = !xen_enabled();
-object_property_add_bool(obj, PC_MACHINE_VMPORT,
- pc_machine_get_vmport,
- pc_machine_set_vmport,
- NULL);
+pcms->vmport = VMPORT_AUTO;
+object_property_add(obj, PC_MACHINE_VMPORT, "str",
+pc_machine_get_vmport,
+pc_machine_set_vmport,
+NULL, NULL, NULL);
  }
  
  static void pc_machine_class_init(ObjectClass *oc, void *data)

diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
index 7bb97a4..81a7b83 100644
--- a/hw/i386/pc_piix.c
+++ b/hw/i386/pc_piix.c
@@ -101,6 +101,7 @@ static void pc_init1(MachineState *machine,
  FWCfgState *fw_cfg = NULL;
  PcGuestInfo *guest_info;
  ram_addr_t lowmem;
+bool no_vmport;
  
  /* Check whether RAM fits below 4G (leaving 1/2 GByte for IO memory).

   * If it doesn't, we need to split it in chunks below and above 4G.
@@ -234,9 +235,33 @@ static void pc_init1(MachineState *machine,
  
  pc_vga_init(isa_bus, pci_enabled ? pci_bus : NULL);
  

Probably should be assert(pc_machine->vmport != VMPORT_MAX) -
we never get any values except on/off/auto.


Ok, added the assert.


+if (xen_enabled()) {
+switch (pc_machine->vmport) {
+case VMPORT_MAX:
+case VMPORT_AUTO:
+case VMPORT_OFF:
+no_vmport = true;
+break;
+case VMPORT_ON:
+no_vmport = false;
+break;
+}
+} else {
+switch (pc_machine->vmport) {
+case VMPORT_MAX:
+case VMPORT_OFF:
+no_vmport = true;
+break;
+case VMPORT_AUTO:
+case VMPORT_ON:
+no_vmport = false;
+break;
+}
+}
+

Can't above be moved to a function in pc.c, and be reused between piix
and q35? It's big enough to avoid open-coding, IMHO.



I feel that the what Eduardo Habkost provided:

+assert(pc_machine->vmport != ON_OFF_AUTO_MAX);
+if (pc_machine->vmport == ON_OFF_AUTO_AUTO) {
+no_vmport = xen_enabled();
+} else {
+no_vmport = (pc_machine->vmport == ON_OFF_AUTO_ON);
+}

is short enough to not need it's own function.  But I can do this if 
still needed.



  /* init basic PC hardware */
  pc_basic_device_init(isa_bus, gsi, &rtc_state, &floppy,
- !pc_machine->vmport, 0x4);
+ no_vmport, 0x4);
  
  pc_nic_init(isa_bus, pci_bus);
  
diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c

index 598e679..4f932d6 100644
--- a

Re: [Qemu-devel] [PATCH 00/17] RFC: userfault v2

2014-11-20 Thread Andrea Arcangeli

Hi,

On Thu, Nov 20, 2014 at 10:54:29AM +0800, zhanghailiang wrote:
> Yes, you are right. This is what i really want, bypass all non-present faults
> and only track strict wrprotect faults. ;)
> 
> So, do you plan to support that in the userfault API?

Yes I think it's good idea to support wrprotect/COW faults too.

I just wanted to understand if there was any other reason why you
needed only wrprotect faults, because the non-present faults didn't
look like a big performance concern if they triggered in addition to
wrprotect faults, but it's certainly ok to optimize them away so it's
fully optimal.

All it takes to differentiate the behavior should be one more bit
during registration so you can select non-present, wrprotect faults or
both. postcopy live migration would select only non-present faults,
postcopy live snapshot would select only wrprotect faults, anything
like distributed shared memory supporting shared readonly access and
exclusive write access, would select both flags.

I just sent an (unfortunately) longish but way more detailed email
about live snapshotting with userfaultfd but I just wanted to give a
shorter answer here too :).

Thanks,
Andrea

[Qemu-devel] [Bug 1394550] Re: qemu: linux kernel too old to load a ram disk

2014-11-20 Thread Arsen.Shnurkov

** Attachment added: "/etc/kernels/kernel-config-x86_64-3.17.3-gentoo-gnu-vm"
   
https://bugs.launchpad.net/qemu/+bug/1394550/+attachment/4264503/+files/kernel-config-x86_64-3.17.3-gentoo-gnu-vm

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1394550

Title:
  qemu: linux kernel too old to load a ram disk

Status in QEMU:
  New

Bug description:
  I was built  kernel-genkernel-x86_64-3.17.3-gentoo-gnu   and
  initramfs-genkernel-x86_64-3.17.3-gentoo-gnu in Gentoo Linux from sys-
  kernel/gentoo-sources/gentoo-sources-3.17.3.ebuild

  When I run this kernel with switches  -kernel  -initrd  -append (and
  others), qemu gives misleading message:

  "qemu: linux kernel too old to load a ram disk"

  this is not an old linux! Just something should be configured better.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1394550/+subscriptions

[Qemu-devel] [Bug 1394550] Re: qemu: linux kernel too old to load a ram disk

2014-11-20 Thread Arsen.Shnurkov

** Attachment added: "/boot/kernel-genkernel-x86_64-3.17.3-gentoo-gnu-vm"
   
https://bugs.launchpad.net/qemu/+bug/1394550/+attachment/4264505/+files/kernel-genkernel-x86_64-3.17.3-gentoo-gnu-vm

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1394550

Title:
  qemu: linux kernel too old to load a ram disk

Status in QEMU:
  New

Bug description:
  I was built  kernel-genkernel-x86_64-3.17.3-gentoo-gnu   and
  initramfs-genkernel-x86_64-3.17.3-gentoo-gnu in Gentoo Linux from sys-
  kernel/gentoo-sources/gentoo-sources-3.17.3.ebuild

  When I run this kernel with switches  -kernel  -initrd  -append (and
  others), qemu gives misleading message:

  "qemu: linux kernel too old to load a ram disk"

  this is not an old linux! Just something should be configured better.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1394550/+subscriptions

[Qemu-devel] [Bug 1394550] Re: qemu: linux kernel too old to load a ram disk

2014-11-20 Thread Arsen.Shnurkov

** Attachment added: "/boot/initramfs-genkernel-x86_64-3.17.3-gentoo-gnu-vm"
   
https://bugs.launchpad.net/qemu/+bug/1394550/+attachment/4264506/+files/initramfs-genkernel-x86_64-3.17.3-gentoo-gnu-vm

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1394550

Title:
  qemu: linux kernel too old to load a ram disk

Status in QEMU:
  New

Bug description:
  I was built  kernel-genkernel-x86_64-3.17.3-gentoo-gnu   and
  initramfs-genkernel-x86_64-3.17.3-gentoo-gnu in Gentoo Linux from sys-
  kernel/gentoo-sources/gentoo-sources-3.17.3.ebuild

  When I run this kernel with switches  -kernel  -initrd  -append (and
  others), qemu gives misleading message:

  "qemu: linux kernel too old to load a ram disk"

  this is not an old linux! Just something should be configured better.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1394550/+subscriptions

[Qemu-devel] [Bug 1394550] Re: qemu: linux kernel too old to load a ram disk

2014-11-20 Thread Arsen.Shnurkov

** Attachment added: "/etc/genkernel.conf"
   
https://bugs.launchpad.net/qemu/+bug/1394550/+attachment/4264504/+files/genkernel.conf

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1394550

Title:
  qemu: linux kernel too old to load a ram disk

Status in QEMU:
  New

Bug description:
  I was built  kernel-genkernel-x86_64-3.17.3-gentoo-gnu   and
  initramfs-genkernel-x86_64-3.17.3-gentoo-gnu in Gentoo Linux from sys-
  kernel/gentoo-sources/gentoo-sources-3.17.3.ebuild

  When I run this kernel with switches  -kernel  -initrd  -append (and
  others), qemu gives misleading message:

  "qemu: linux kernel too old to load a ram disk"

  this is not an old linux! Just something should be configured better.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1394550/+subscriptions

Re: [Qemu-devel] [PATCH] linux-user: Use the 5KEf processor for 64-bit emulation

2014-11-20 Thread Peter Maydell

On 20 November 2014 17:21, Maciej W. Rozycki  wrote:
>  It's not possible, the MIPS16 and microMIPS instruction sets are
> mutually exclusive as the same resource (the ISA bit or the LSB of the
> PC) is used to switch to either mode from the standard MIPS mode,
> depending on what a given processor implements.

Pity.

>  Besides it is worthwhile to have a processor that implements the
> microMIPS or the standard MIPS instruction set only, to trap unwanted
> mode switches in software that is supposed to use one of these
> instruction sets only (there is no such issue with the MIPS16
> instruction set as it it not fully self-contained and the presence of
> the standard MIPS instruction set is also required).

Indeed; for ARM "any" is an addition to the usual set of
CPU options, not a replacement for them.

>  The rest can probably be arranged although there are grey areas,
> specifically not all optional instruction sets have their opcodes
> defined in the microMIPS instruction set.  But for the user emulation
> mode that should probably be OK as long the artificial processor is not
> available for the system emulation mode.

Yes; for ARM we don't define the "any" CPU in system mode for
pretty much this reason.

>  Please note that as I suggested the right emulation could be chosen
> automagically based on ELF file header flags.  In fact I have a patch
> outstanding already down the queue that does some processor
> reconfiguration based on that, just like the Linux kernel would do.
> That approach could be further extended up to selecting among processors
> available, so e.g. a program that contains some microMIPS code would
> pick up a microMIPS-enabled processor and one including MIPS16 code
> would pick up one with the MIPS16 extension instead.

That would probably be a useful default. My views here are really
based on two things:
 * the binfmt-misc registration just starts QEMU with no -cpu options,
   so it's helpful if the default can run any binary that matches the
   ELF magic specification
 * the ifdef ladder in linux-user/main.c is kind of ugly :-)

thanks
-- PMM

Re: [Qemu-devel] [PATCH 00/17] RFC: userfault v2

2014-11-20 Thread Andrea Arcangeli

Hi,

On Fri, Oct 31, 2014 at 12:39:32PM -0700, Peter Feiner wrote:
> On Fri, Oct 31, 2014 at 11:29:49AM +0800, zhanghailiang wrote:
> > Agreed, but for doing live memory snapshot (VM is running when do 
> > snapsphot),
> > we have to do this (block the write action), because we have to save the 
> > page before it
> > is dirtied by writing action. This is the difference, compared to pre-copy 
> > migration.
> 
> Ah ha, I understand the difference now. I suppose that you have considered
> doing a traditional pre-copy migration (that is, passes over memory saving
> dirty pages, followed by a pause and a final dump of remaining dirty pages) to
> a file. Your approach has the advantage of having the VM pause time bounded by
> the time it takes to handle the userfault and do the write, as opposed to
> pre-copy migration which has a pause time bounded by the time it takes to do
> the final dump of dirty pages, which, in the worst case, is the time it takes
> to dump all of the guest memory!

It sounds really similar issue as live migration, one can implement a
precopy live snapshot, or a precopy+postcopy live snapshot or a pure
postcopy live snapshot.

The decision on the amount of precopy done before engaging postcopy
(zero passes, 1 pass, or more passes) would have similar tradeoffs
too, except instead of having to re-transmit the re-dirtied pages over
the wire, it would need to overwrite them to disk.

The more precopy passes, the longer it takes for the live snapshotting
process to finish and the more I/O there will be (for live migration it'd
be network bandwidth usage instead of amount of I/O), but the shortest
the postcopy runtime will be (and the shorter postcopy runtime is, the
fewer userfaults will end up triggering on writes, in turn reducing
the slowdown and the artificial fault latency introduced to the guest
runtime). But the more precopy passes the more overwriting will happen
during the "longer" precopy stage and the more overall load there will
be for the host (the otherwise idle part of the host).

For the postcopy live snapshot the wrprotect faults are quite
equivalent to the not-present faults of postcopy live migration logic.

> You could use the old fork & dump trick. Given that the guest's memory is
> backed by private VMA (as of a year ago when I last looked, is always the case
> for QEMU), you can have the kernel do the write protection for you.
> Essentially, you fork Qemu and, in the child process, dump the guest memory
> then exit. If the parent (including the guest) writes to guest memory, then it
> will fault and the kernel will copy the page. 
> 
> The fork & dump approach will give you the best performance w.r.t. guest pause
> times (i.e., just pausing for the COW fault handler), but it does have the
> distinct disadvantage of potentially using 2x the guest memory (i.e., if the
> parent process races ahead and writes to all of the pages before you finish 
> the
> dump). To mitigate memory copying, you could madvise MADV_DONTNEED the child
> memory as you copy it.

This is a very good point. fork must be evaluated first because it
literally already provides you a readonly memory snapshot of the guest
memory.

The memory cons mentioned above could lead to both -ENOMEM of too many
guests runs live snapshots at the same time in the same host, unless
overcommit_memory is set to 1 (0 by default). Even then if too many
live snapshots are running in parallel you could hit the OOM killer if
there are just a bit too many faults at the same time, or you could
hit heavy swapping which isn't ideal either.

In fact the -ENOMEM avoidance (with qemu failing) is one of the two
critical reasons why qemu always set the guest memory as
MADV_DONTFORK. But that's not the only reason.

To use the fork() trick you'd need to undo the MADV_DONTFORK first but
that would open another problem: there's a race condition between
fork() O_DIRECT and <4k hardblocksize of virtio-blk. If there's any
read() syscall with O_DIRECT with len=512 while fork() is running
(think if the aio runs in parallel with the live snapshot thread that
forks the child to dump the snapshot) and if the guest writes with the
CPU to any 512 fragment of the same page that is the destination
buffer of the write(len=512) (on two different 512bytes area of the
same guest page) the O_DIRECT write will get lost.

So to use fork we'd need to fix this longstanding race (I tried but in
the end we declared it an userland issue because it's not exploitable
to bypass permissions or corrupt kernel or unrelated memory). Or you'd
need to add locking between the dataplane/aio threads and the live
snapshot thread to ensure no direct-io I/O is ever in-flight while
fork runs.

The O_DIRECT however would only help if it's qemu TCG, if it's KVM
it's not even enough to stop O_DIRECT reads. KVM would use
gup(write=1) from the async-pf all the time... and then the shadow
pagetables would go out of sync (it won't destabilize the host of
course, but the guest memor

Re: [Qemu-devel] [PATCH] target-ppc: explicitly save page table headers in big endian

2014-11-20 Thread Alexander Graf



On 03.11.14 16:14, Cédric Le Goater wrote:
> Currently, when the page tables are saved, the kvm_get_htab_header structs 
> and the ptes are assumed being big endian and dumped as a indistinct blob 
> in the statefile. This is no longer true when the host is little endian 
> and this breaks restoration.
> 
> This patch unfolds the kvmppc_save_htab routine to write explicitly the 
> kvm_get_htab_header structs in big endian. The ptes are left untouched.
> 
> Signed-off-by: Cédric Le Goater 
> Cc: Paul Mackerras 
> Cc: Alexey Kardashevskiy 
> Cc: Gregory Kurz 

Thanks, applied to ppc-next-2.3.


Alex

Re: [Qemu-devel] [PATCH v3 4/4] target-tricore: Add instructions of RCR opcode format

2014-11-20 Thread Richard Henderson

On 11/20/2014 02:28 PM, Bastian Koppelmann wrote:
> +uint64_t helper_madd64_ssov(CPUTriCoreState *env, target_ulong r1,
> +uint64_t r2, target_ulong r3)
> +{
> +uint64_t ret_low, ret_high;
> +uint64_t r2_high;
> +int64_t t1 = sextract64(r1, 0, 32);
> +int64_t t3 = sextract64(r3, 0, 32);
> +
> +ret_low = t1 * t3;
> +ret_high = ((int64_t)ret_low >> 63);
> +r2_high = ((int64_t)r2 >> 63);
> +add128(&ret_low, &ret_high, r2, r2_high);
> +
> +/* check for saturate */
> +t1 = (int64_t)ret_low >> 63;
> +if (t1 != ret_high) {

Instead of 128-bit addition, just use the "normal" overflow detection:

  mul = t1 * t3;
  ret = mul + r2;
  ovf = (ret ^ mul) & ~(mul ^ r2);
  if ((int64_t)ovf < 0)


> +uint64_t helper_madd64_suov(CPUTriCoreState *env, target_ulong r1,
> +uint64_t r2, target_ulong r3)
> +{
> +uint64_t ret_low, ret_high;
> +uint64_t t1 = extract64(r1, 0, 32);
> +uint64_t t3 = extract64(r3, 0, 32);
> +
> +ret_low = t1 * t3;
> +ret_high = 0;
> +add128(&ret_low, &ret_high, r2, 0);
> +
> +if (ret_high != 0) {

I'm sure this is similar, though easier since its unsigned:

  mul = t1 * t3;
  ret = mul + r2;
  if (ret < r2)

> +env->PSW_USB_V = (1 << 31);
> +env->PSW_USB_SV = (1 << 31);
> +ret_low = UINT64_MAX;
> +} else if ((ret_high & (1LL << 63)) != 0) {

I'm not sure what this is about though, since your "ret_high != 0" shadows it,
so it'll never be executed.  Cut and paste from ssov, or is the addition
actually signed?

> +uint64_t helper_msub64_ssov(CPUTriCoreState *env, target_ulong r1,
> +uint64_t r2, target_ulong r3)
...
> +uint64_t helper_msub64_suov(CPUTriCoreState *env, target_ulong r1,
> +uint64_t r2, target_ulong r3)

Likewise, of course.

Otherwise, it looks good.


r~

Re: [Qemu-devel] [PATCH] linux-user: Use the 5KEf processor for 64-bit emulation

2014-11-20 Thread Maciej W. Rozycki

On Thu, 20 Nov 2014, Peter Maydell wrote:

> >  For user emulation mode I think we want to default to the highest ISA
> > level supported, for maximum user flexibility.  Currently the MIPS64r2
> > ISA is the highest 64-bit ISA we have a real processor support for so
> > use it and the 5KEf which is the processor we have that implements it.
> > Later, as newer processors are added, we can bump it further up.
> 
> Is it feasible to define an "any" CPU for linux-user which
> basically enables all known ISA features? This is what we
> do on ARM and some other target CPU families. If some CPU
> features conflict with each other (so you can't validly
> have a CPU with all of them) that might make it awkward, though.

 It's not possible, the MIPS16 and microMIPS instruction sets are 
mutually exclusive as the same resource (the ISA bit or the LSB of the 
PC) is used to switch to either mode from the standard MIPS mode, 
depending on what a given processor implements.

 Besides it is worthwhile to have a processor that implements the 
microMIPS or the standard MIPS instruction set only, to trap unwanted 
mode switches in software that is supposed to use one of these 
instruction sets only (there is no such issue with the MIPS16 
instruction set as it it not fully self-contained and the presence of 
the standard MIPS instruction set is also required).

 Additionally MIPSr6 is incompatible with any earlier ISA in the 
standard MIPS mode.

 The rest can probably be arranged although there are grey areas, 
specifically not all optional instruction sets have their opcodes 
defined in the microMIPS instruction set.  But for the user emulation 
mode that should probably be OK as long the artificial processor is not 
available for the system emulation mode.

 In the system emulation mode such a processor could lead software to 
confusion, for example the Linux kernel examines processor configuration 
bits and configures itself accordingly.  That works just fine across all 
the optional features in the standard MIPS mode, but a microMIPS kernel 
will not have access to the relevant maintenance instructions that have 
not been defined in the microMIPS instruction set.  An example is the 
SmartMIPS extension whose ACX register has to be maintained in context 
switches and the access to which requires instructions that have not 
been defined in the microMIPS instruction set.  So most likely there's a 
stub somewhere there that makes it panic instead.

 But in any case we'd need at least 3 processors anyway: MIPSr5 with the 
MIPS16 extension, MIPSr5 with the microMIPS extension and MIPSr6.

 Please note that as I suggested the right emulation could be chosen 
automagically based on ELF file header flags.  In fact I have a patch 
outstanding already down the queue that does some processor 
reconfiguration based on that, just like the Linux kernel would do.  
That approach could be further extended up to selecting among processors 
available, so e.g. a program that contains some microMIPS code would 
pick up a microMIPS-enabled processor and one including MIPS16 code 
would pick up one with the MIPS16 extension instead.

 Thanks for your input though, the idea makes sense overall where 
possible.

  Maciej

Re: [Qemu-devel] [PATCH v4 33/47] Postcopy: Postcopy startup in migration thread

2014-11-20 Thread Dr. David Alan Gilbert

* Paolo Bonzini (pbonz...@redhat.com) wrote:
> Il 03/10/2014 19:47, Dr. David Alan Gilbert (git) ha scritto:
> > From: "Dr. David Alan Gilbert" 
> > 
> > Rework the migration thread to setup and start postcopy.
> > 
> > Signed-off-by: Dr. David Alan Gilbert 
> > ---
> >  include/migration/migration.h |   3 +
> >  migration.c   | 201 
> > ++
> >  2 files changed, 185 insertions(+), 19 deletions(-)
> > 
> > diff --git a/include/migration/migration.h b/include/migration/migration.h
> > index b01cc17..f401775 100644
> > --- a/include/migration/migration.h
> > +++ b/include/migration/migration.h



> > +/* Switch from normal iteration to postcopy
> > + * Returns non-0 on error
> > + */
> > +static int postcopy_start(MigrationState *ms)
> > +{
> > +int ret;
> > +const QEMUSizedBuffer *qsb;
> > +migrate_set_state(ms, MIG_STATE_ACTIVE, MIG_STATE_POSTCOPY_ACTIVE);
> > +
> > +DPRINTF("postcopy_start\n");
> > +qemu_mutex_lock_iothread();
> > +DPRINTF("postcopy_start: setting run state\n");
> > +ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
> > +
> > +if (ret < 0) {
> > +migrate_set_state(ms, MIG_STATE_POSTCOPY_ACTIVE, MIG_STATE_ERROR);
> > +qemu_mutex_unlock_iothread();
> > +return -1;
> 
> Please use "goto" for error returns, like
> 
> fail_locked:
> qemu_mutex_unlock_iothread();
> fail:
> migrate_set_state(ms, MIG_STATE_POSTCOPY_ACTIVE, MIG_STATE_ERROR);
> return -1;

Done; they all end up unlocking, but I've got another label
for a case that has to close the fb later.

> > +}
> > +
> > +/*
> > + * in Finish migrate and with the io-lock held everything should
> > + * be quiet, but we've potentially still got dirty pages and we
> > + * need to tell the destination to throw any pages it's already 
> > received
> > + * that are dirty
> > + */
> > +if (ram_postcopy_send_discard_bitmap(ms)) {
> > +DPRINTF("postcopy send discard bitmap failed\n");
> > +migrate_set_state(ms, MIG_STATE_POSTCOPY_ACTIVE, MIG_STATE_ERROR);
> > +qemu_mutex_unlock_iothread();
> > +return -1;
> > +}
> > +
> > +DPRINTF("postcopy_start: sending req 2\n");
> > +qemu_savevm_send_reqack(ms->file, 2);
> 
> Perhaps move it below qemu_file_set_rate_limit, and add
> trace_qemu_savevm_send_reqack?

Trace added, and also moved as requested - was the request to move
it just to elimintate the other DPRINTF?

> Also what is 2/3/4?  Is this just for debugging or is it part of the
> protocol?

Debug; they're very useful for matching the debug streams up, especially
when the timers on the two hosts are very different.
(I'm up for suggestions on how to mark the 2/3/4 for debug more clearly,
especially if it meant that it didn't make the ping (ne reqack) dedicated
to debug).

> > +/*
> > + * send rest of state - note things that are doing postcopy
> > + * will notice we're in MIG_STATE_POSTCOPY_ACTIVE and not actually
> > + * wrap their state up here
> > + */
> > +qemu_file_set_rate_limit(ms->file, INT64_MAX);
> > +DPRINTF("postcopy_start: do state_complete\n");
> > +
> > +/*
> > + * We need to leave the fd free for page transfers during the
> > + * loading of the device state, so wrap all the remaining
> > + * commands and state into a package that gets sent in one go
> > + */
> 
> The comments in the code are very nice.  Thanks.  This is a huge
> improvement from the last version I received.
> 
> > +QEMUFile *fb = qemu_bufopen("w", NULL);
> > +if (!fb) {
> > +error_report("Failed to create buffered file");
> > +migrate_set_state(ms, MIG_STATE_POSTCOPY_ACTIVE, MIG_STATE_ERROR);
> > +qemu_mutex_unlock_iothread();
> > +return -1;
> > +}
> > +
> > +qemu_savevm_state_complete(fb);
> > +DPRINTF("postcopy_start: sending req 3\n");
> > +qemu_savevm_send_reqack(fb, 3);
> > +
> > +qemu_savevm_send_postcopy_ram_run(fb);
> > +
> > +/* <><> end of stuff going into the package */
> > +qsb = qemu_buf_get(fb);
> > +
> > +/* Now send that blob */
> > +if (qsb_get_length(qsb) > MAX_VM_CMD_PACKAGED_SIZE) {
> > +DPRINTF("postcopy_start: Unreasonably large packaged state: %lu\n",
> > +(unsigned long)(qsb_get_length(qsb)));
> > +migrate_set_state(ms, MIG_STATE_POSTCOPY_ACTIVE, MIG_STATE_ERROR);
> > +qemu_mutex_unlock_iothread();
> > +qemu_fclose(fb);
> 
> Close fb above migrate_set_state, and use goto as above.  Or just have
> three labels.

Done, it's a separate label.

> 
> > +return -1;
> > +}
> > +qemu_savevm_send_packaged(ms->file, qsb);
> > +qemu_fclose(fb);
> > +
> > +qemu_mutex_unlock_iothread();
> > +
> > +DPRINTF("postcopy_start not finished sending ack\n");
> > +qemu_savevm_send_reqack(ms->file, 4);
> > +
> > +ret = qemu_file_get_error(ms->file);
> > +if (ret) {

[Qemu-devel] [PATCH v3 19/22] qcow2: Add function for refcount order amendment

2014-11-20 Thread Max Reitz

Add a function qcow2_change_refcount_order() which allows changing the
refcount order of a qcow2 image.

Signed-off-by: Max Reitz 
Reviewed-by: Eric Blake 
---
 block/qcow2-refcount.c | 454 +
 block/qcow2.h  |   4 +
 2 files changed, 458 insertions(+)

diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 60594d8..d619728 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -2477,3 +2477,457 @@ int qcow2_pre_write_overlap_check(BlockDriverState *bs, 
int ign, int64_t offset,
 
 return 0;
 }
+
+/* A pointer to a function of this type is given to walk_over_reftable(). That
+ * function will create refblocks and pass them to a RefblockFinishOp once they
+ * are completed (@refblock). @refblock_empty is set if the refblock is
+ * completely empty.
+ *
+ * Along with the refblock, a corresponding reftable entry is passed, in the
+ * reftable @reftable (which may be reallocated) at @reftable_index.
+ *
+ * @allocated should be set to true if a new cluster has been allocated.
+ */
+typedef int (RefblockFinishOp)(BlockDriverState *bs, uint64_t **reftable,
+   uint64_t reftable_index, uint64_t 
*reftable_size,
+   void *refblock, bool refblock_empty,
+   bool *allocated, Error **errp);
+
+/**
+ * This "operation" for walk_over_reftable() allocates the refblock on disk (if
+ * it is not empty) and inserts its offset into the new reftable. The size of
+ * this new reftable is increased as required.
+ */
+static int alloc_refblock(BlockDriverState *bs, uint64_t **reftable,
+  uint64_t reftable_index, uint64_t *reftable_size,
+  void *refblock, bool refblock_empty, bool *allocated,
+  Error **errp)
+{
+BDRVQcowState *s = bs->opaque;
+int64_t offset;
+
+if (!refblock_empty && reftable_index >= *reftable_size) {
+uint64_t *new_reftable;
+uint64_t new_reftable_size;
+
+new_reftable_size = ROUND_UP(reftable_index + 1,
+ s->cluster_size / sizeof(uint64_t));
+if (new_reftable_size > QCOW_MAX_REFTABLE_SIZE / sizeof(uint64_t)) {
+error_setg(errp,
+   "This operation would make the refcount table grow "
+   "beyond the maximum size supported by QEMU, aborting");
+return -ENOTSUP;
+}
+
+new_reftable = g_try_realloc(*reftable, new_reftable_size *
+sizeof(uint64_t));
+if (!new_reftable) {
+error_setg(errp, "Failed to increase reftable buffer size");
+return -ENOMEM;
+}
+
+memset(new_reftable + *reftable_size, 0,
+   (new_reftable_size - *reftable_size) * sizeof(uint64_t));
+
+*reftable  = new_reftable;
+*reftable_size = new_reftable_size;
+}
+
+if (!refblock_empty && !(*reftable)[reftable_index]) {
+offset = qcow2_alloc_clusters(bs, s->cluster_size);
+if (offset < 0) {
+error_setg_errno(errp, -offset, "Failed to allocate refblock");
+return offset;
+}
+(*reftable)[reftable_index] = offset;
+*allocated = true;
+}
+
+return 0;
+}
+
+/**
+ * This "operation" for walk_over_reftable() writes the refblock to disk at the
+ * offset specified by the new reftable's entry. It does not modify the new
+ * reftable or change any refcounts.
+ */
+static int flush_refblock(BlockDriverState *bs, uint64_t **reftable,
+  uint64_t reftable_index, uint64_t *reftable_size,
+  void *refblock, bool refblock_empty, bool *allocated,
+  Error **errp)
+{
+BDRVQcowState *s = bs->opaque;
+int64_t offset;
+int ret;
+
+if (reftable_index < *reftable_size && (*reftable)[reftable_index]) {
+offset = (*reftable)[reftable_index];
+
+ret = qcow2_pre_write_overlap_check(bs, 0, offset, s->cluster_size);
+if (ret < 0) {
+error_setg_errno(errp, -ret, "Overlap check failed");
+return ret;
+}
+
+ret = bdrv_pwrite(bs->file, offset, refblock, s->cluster_size);
+if (ret < 0) {
+error_setg_errno(errp, -ret, "Failed to write refblock");
+return ret;
+}
+} else {
+assert(refblock_empty);
+}
+
+return 0;
+}
+
+/**
+ * This function walks over the existing reftable and every referenced 
refblock;
+ * if @new_set_refcount is non-NULL, it is called for every refcount entry to
+ * create an equal new entry in the passed @new_refblock. Once that
+ * @new_refblock is completely filled, @operation will be called.
+ *
+ * @status_cb and @cb_opaque are used for the amend operation's status 
callback.
+ * @index is the index of the walk_over_reftable() calls and @total is the 
tot

[Qemu-devel] [PATCH v3 14/22] block: Add opaque value to the amend CB

2014-11-20 Thread Max Reitz

Add an opaque value which is to be passed to the bdrv_amend_options()
status callback.

Signed-off-by: Max Reitz 
Reviewed-by: Eric Blake 
---
 block.c   |  4 ++--
 block/qcow2-cluster.c | 14 --
 block/qcow2.c |  9 +
 block/qcow2.h |  3 ++-
 include/block/block.h |  4 ++--
 include/block/block_int.h |  3 ++-
 qemu-img.c|  5 +++--
 7 files changed, 24 insertions(+), 18 deletions(-)

diff --git a/block.c b/block.c
index c979d51..c34b188 100644
--- a/block.c
+++ b/block.c
@@ -5790,12 +5790,12 @@ void bdrv_add_before_write_notifier(BlockDriverState 
*bs,
 }
 
 int bdrv_amend_options(BlockDriverState *bs, QemuOpts *opts,
-   BlockDriverAmendStatusCB *status_cb)
+   BlockDriverAmendStatusCB *status_cb, void *cb_opaque)
 {
 if (!bs->drv->bdrv_amend_options) {
 return -ENOTSUP;
 }
-return bs->drv->bdrv_amend_options(bs, opts, status_cb);
+return bs->drv->bdrv_amend_options(bs, opts, status_cb, cb_opaque);
 }
 
 /* This function will be called by the bdrv_recurse_is_first_non_filter method
diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index ab43902..2daf334 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -1620,7 +1620,8 @@ fail:
 static int expand_zero_clusters_in_l1(BlockDriverState *bs, uint64_t *l1_table,
   int l1_size, int64_t *visited_l1_entries,
   int64_t l1_entries,
-  BlockDriverAmendStatusCB *status_cb)
+  BlockDriverAmendStatusCB *status_cb,
+  void *cb_opaque)
 {
 BDRVQcowState *s = bs->opaque;
 bool is_active_l1 = (l1_table == s->l1_table);
@@ -1646,7 +1647,7 @@ static int expand_zero_clusters_in_l1(BlockDriverState 
*bs, uint64_t *l1_table,
 /* unallocated */
 (*visited_l1_entries)++;
 if (status_cb) {
-status_cb(bs, *visited_l1_entries, l1_entries);
+status_cb(bs, *visited_l1_entries, l1_entries, cb_opaque);
 }
 continue;
 }
@@ -1768,7 +1769,7 @@ static int expand_zero_clusters_in_l1(BlockDriverState 
*bs, uint64_t *l1_table,
 
 (*visited_l1_entries)++;
 if (status_cb) {
-status_cb(bs, *visited_l1_entries, l1_entries);
+status_cb(bs, *visited_l1_entries, l1_entries, cb_opaque);
 }
 }
 
@@ -1797,7 +1798,8 @@ fail:
  * qcow2 version which doesn't yet support metadata zero clusters.
  */
 int qcow2_expand_zero_clusters(BlockDriverState *bs,
-   BlockDriverAmendStatusCB *status_cb)
+   BlockDriverAmendStatusCB *status_cb,
+   void *cb_opaque)
 {
 BDRVQcowState *s = bs->opaque;
 uint64_t *l1_table = NULL;
@@ -1814,7 +1816,7 @@ int qcow2_expand_zero_clusters(BlockDriverState *bs,
 
 ret = expand_zero_clusters_in_l1(bs, s->l1_table, s->l1_size,
  &visited_l1_entries, l1_entries,
- status_cb);
+ status_cb, cb_opaque);
 if (ret < 0) {
 goto fail;
 }
@@ -1849,7 +1851,7 @@ int qcow2_expand_zero_clusters(BlockDriverState *bs,
 
 ret = expand_zero_clusters_in_l1(bs, l1_table, s->snapshots[i].l1_size,
  &visited_l1_entries, l1_entries,
- status_cb);
+ status_cb, cb_opaque);
 if (ret < 0) {
 goto fail;
 }
diff --git a/block/qcow2.c b/block/qcow2.c
index 58f7bf6..39e6061 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2597,7 +2597,7 @@ static int qcow2_load_vmstate(BlockDriverState *bs, 
uint8_t *buf,
  * have to be removed.
  */
 static int qcow2_downgrade(BlockDriverState *bs, int target_version,
-   BlockDriverAmendStatusCB *status_cb)
+   BlockDriverAmendStatusCB *status_cb, void 
*cb_opaque)
 {
 BDRVQcowState *s = bs->opaque;
 int current_version = s->qcow_version;
@@ -2646,7 +2646,7 @@ static int qcow2_downgrade(BlockDriverState *bs, int 
target_version,
 /* clearing autoclear features is trivial */
 s->autoclear_features = 0;
 
-ret = qcow2_expand_zero_clusters(bs, status_cb);
+ret = qcow2_expand_zero_clusters(bs, status_cb, cb_opaque);
 if (ret < 0) {
 return ret;
 }
@@ -2661,7 +2661,8 @@ static int qcow2_downgrade(BlockDriverState *bs, int 
target_version,
 }
 
 static int qcow2_amend_options(BlockDriverState *bs, QemuOpts *opts,
-   BlockDriverAmendStatusCB *status_cb)
+   BlockDriverAmendStatusCB *status_cb,
+   void *cb_opaque)
 {
 BDRVQcowState *s =

[Qemu-devel] [PATCH v3 13/22] progress: Allow regressing progress

2014-11-20 Thread Max Reitz

Progress may regress; this should be displayed correctly by
qemu_progress_print().

Signed-off-by: Max Reitz 
---
 util/qemu-progress.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/util/qemu-progress.c b/util/qemu-progress.c
index 4ee5cd0..c0fb14d 100644
--- a/util/qemu-progress.c
+++ b/util/qemu-progress.c
@@ -152,6 +152,7 @@ void qemu_progress_print(float delta, int max)
 state.current = current;
 
 if (current > (state.last_print + state.min_skip) ||
+current < (state.last_print - state.min_skip) ||
 (current == 100) || (current == 0)) {
 state.last_print = state.current;
 state.print();
-- 
1.9.3

[Qemu-devel] [PATCH v3 11/22] iotests: Prepare for refcount_width option

2014-11-20 Thread Max Reitz

Some tests do not work well with certain refcount widths (i.e. you
cannot create internal snapshots with refcount_width=1), so make those
widths unsupported.

Furthermore, add another filter to _filter_img_create in common.filter
which filters out the refcount_width value.

This is necessary for test 079, which does actually work with any
refcount width, but invoking qemu-img directly leads to the
refcount_width value being visible in the output; use _make_test_img
instead which will filter it out.

Signed-off-by: Max Reitz 
Reviewed-by: Eric Blake 
---
 tests/qemu-iotests/007   |  3 +++
 tests/qemu-iotests/015   |  2 ++
 tests/qemu-iotests/026   |  7 +++
 tests/qemu-iotests/029   |  2 ++
 tests/qemu-iotests/051   |  3 +++
 tests/qemu-iotests/058   |  2 ++
 tests/qemu-iotests/067   |  2 ++
 tests/qemu-iotests/079   | 10 ++
 tests/qemu-iotests/079.out   | 38 ++
 tests/qemu-iotests/080   |  2 ++
 tests/qemu-iotests/089   |  2 ++
 tests/qemu-iotests/108   |  2 ++
 tests/qemu-iotests/common.filter |  3 ++-
 13 files changed, 41 insertions(+), 37 deletions(-)

diff --git a/tests/qemu-iotests/007 b/tests/qemu-iotests/007
index fe1a743..8d92490 100755
--- a/tests/qemu-iotests/007
+++ b/tests/qemu-iotests/007
@@ -43,6 +43,9 @@ trap "_cleanup; exit \$status" 0 1 2 3 15
 _supported_fmt qcow2
 _supported_proto generic
 _supported_os Linux
+# refcount_width must be at least 4 bits so we can create ten internal 
snapshots
+# (1 bit supports none, 2 bits support two, 4 bits support 14)
+_unsupported_imgopts 'refcount_width=\(1\|2\)[^0-9]'
 
 echo
 echo "creating image"
diff --git a/tests/qemu-iotests/015 b/tests/qemu-iotests/015
index 099d757..040ca22 100755
--- a/tests/qemu-iotests/015
+++ b/tests/qemu-iotests/015
@@ -43,6 +43,8 @@ trap "_cleanup; exit \$status" 0 1 2 3 15
 _supported_fmt qcow2
 _supported_proto generic
 _supported_os Linux
+# Internal snapshots are (currently) impossible with refcount_width=1
+_unsupported_imgopts 'refcount_width=1[^0-9]'
 
 echo
 echo "creating image"
diff --git a/tests/qemu-iotests/026 b/tests/qemu-iotests/026
index df2884b..3b7a07f 100755
--- a/tests/qemu-iotests/026
+++ b/tests/qemu-iotests/026
@@ -46,6 +46,13 @@ _supported_proto file
 _supported_os Linux
 _default_cache_mode "writethrough"
 _supported_cache_modes "writethrough" "none"
+# The refcount table tests expect a certain minimum width for refcount entries
+# (so that the refcount table actually needs to grow); that minimum is 16 bits,
+# being the default refcount entry width.
+# 32 and 64 bits do not work either, however, due to different leaked cluster
+# count on error.
+# Thus, the only remaining option is refcount_width=16.
+_unsupported_imgopts 'refcount_width=\([^1]\|.\([^6]\|$\)\)'
 
 echo "Errors while writing 128 kB"
 echo
diff --git a/tests/qemu-iotests/029 b/tests/qemu-iotests/029
index fa46ace..1cf1bba 100755
--- a/tests/qemu-iotests/029
+++ b/tests/qemu-iotests/029
@@ -44,6 +44,8 @@ trap "_cleanup; exit \$status" 0 1 2 3 15
 _supported_fmt qcow2
 _supported_proto generic
 _supported_os Linux
+# Internal snapshots are (currently) impossible with refcount_width=1
+_unsupported_imgopts 'refcount_width=1[^0-9]'
 
 offset_size=24
 offset_l1_size=36
diff --git a/tests/qemu-iotests/051 b/tests/qemu-iotests/051
index 11c858f..8e7d326 100755
--- a/tests/qemu-iotests/051
+++ b/tests/qemu-iotests/051
@@ -41,6 +41,9 @@ trap "_cleanup; exit \$status" 0 1 2 3 15
 _supported_fmt qcow2
 _supported_proto file
 _supported_os Linux
+# A compat=0.10 image is created in this test which does not support anything
+# other than refcount_width=16
+_unsupported_imgopts 'refcount_width=\([^1]\|.\([^6]\|$\)\)'
 
 function do_run_qemu()
 {
diff --git a/tests/qemu-iotests/058 b/tests/qemu-iotests/058
index 14584cd..3b03d8e 100755
--- a/tests/qemu-iotests/058
+++ b/tests/qemu-iotests/058
@@ -88,6 +88,8 @@ trap "_cleanup; exit \$status" 0 1 2 3 15
 _supported_fmt qcow2
 _supported_proto file
 _require_command QEMU_NBD
+# Internal snapshots are (currently) impossible with refcount_width=1
+_unsupported_imgopts 'refcount_width=1[^0-9]'
 
 echo
 echo "== preparing image =="
diff --git a/tests/qemu-iotests/067 b/tests/qemu-iotests/067
index 29cd6b5..7fd5f65 100755
--- a/tests/qemu-iotests/067
+++ b/tests/qemu-iotests/067
@@ -35,6 +35,8 @@ status=1  # failure is the default!
 _supported_fmt qcow2
 _supported_proto file
 _supported_os Linux
+# Because anything other than 16 would change the output of query-block
+_unsupported_imgopts 'refcount_width=\([^1]\|.\([^6]\|$\)\)'
 
 function do_run_qemu()
 {
diff --git a/tests/qemu-iotests/079 b/tests/qemu-iotests/079
index 6613cfb..ade6efa 100755
--- a/tests/qemu-iotests/079
+++ b/tests/qemu-iotests/079
@@ -42,19 +42,13 @@ _supported_fmt qcow2
 _supported_proto file nfs
 _supported_os Linux
 
-function test_qemu_img()
-{
-echo qemu-img "$@" | _filter_testd

Re: [Qemu-devel] [PATCH v4 33/47] Postcopy: Postcopy startup in migration thread

2014-11-20 Thread Paolo Bonzini



On 20/11/2014 18:12, Dr. David Alan Gilbert wrote:
> Trace added, and also moved as requested - was the request to move
> it just to elimintate the other DPRINTF?

Yes.

>> > Also what is 2/3/4?  Is this just for debugging or is it part of the
>> > protocol?
> Debug; they're very useful for matching the debug streams up, especially
> when the timers on the two hosts are very different.
> (I'm up for suggestions on how to mark the 2/3/4 for debug more clearly,
> especially if it meant that it didn't make the ping (ne reqack) dedicated
> to debug).
> 

No problem, as long as it's clear to the guy matching the code against
the debug output.

Paolo

[Qemu-devel] [PATCH v3 12/22] qcow2: Allow creation with refcount order != 4

2014-11-20 Thread Max Reitz

Add a creation option to qcow2 for setting the refcount order of images
to be created, and respect that option's value.

This breaks some test outputs, fix them.

Signed-off-by: Max Reitz 
Reviewed-by: Eric Blake 
---
 block/qcow2.c  |  20 
 include/block/block_int.h  |   1 +
 tests/qemu-iotests/049.out | 112 ++---
 tests/qemu-iotests/082.out |  41 ++---
 tests/qemu-iotests/085.out |  38 +++
 5 files changed, 130 insertions(+), 82 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 39ef74a..58f7bf6 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2070,6 +2070,17 @@ static int qcow2_create(const char *filename, QemuOpts 
*opts, Error **errp)
 goto finish;
 }
 
+refcount_width = qemu_opt_get_number_del(opts, BLOCK_OPT_REFCOUNT_WIDTH,
+ refcount_width);
+if (refcount_width <= 0 || refcount_width > 64 ||
+!is_power_of_2(refcount_width))
+{
+error_setg(errp, "Refcount width must be a power of two and may not "
+   "exceed 64 bits");
+ret = -EINVAL;
+goto finish;
+}
+
 if (version < 3 && refcount_width != 16) {
 error_setg(errp, "Different refcount widths than 16 bits require "
"compatibility level 1.1 or above (use compat=1.1 or "
@@ -2709,6 +2720,9 @@ static int qcow2_amend_options(BlockDriverState *bs, 
QemuOpts *opts,
 } else if (!strcmp(desc->name, "lazy_refcounts")) {
 lazy_refcounts = qemu_opt_get_bool(opts, "lazy_refcounts",
lazy_refcounts);
+} else if (!strcmp(desc->name, "refcount_width")) {
+error_report("Cannot change refcount entry width");
+return -ENOTSUP;
 } else {
 /* if this assertion fails, this probably means a new option was
  * added without having it covered here */
@@ -2878,6 +2892,12 @@ static QemuOptsList qcow2_create_opts = {
 .help = "Postpone refcount updates",
 .def_value_str = "off"
 },
+{
+.name = BLOCK_OPT_REFCOUNT_WIDTH,
+.type = QEMU_OPT_NUMBER,
+.help = "Width of a reference count entry in bits",
+.def_value_str = "16"
+},
 { /* end of list */ }
 }
 };
diff --git a/include/block/block_int.h b/include/block/block_int.h
index a1c17b9..c34d610 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -56,6 +56,7 @@
 #define BLOCK_OPT_ADAPTER_TYPE  "adapter_type"
 #define BLOCK_OPT_REDUNDANCY"redundancy"
 #define BLOCK_OPT_NOCOW "nocow"
+#define BLOCK_OPT_REFCOUNT_WIDTH"refcount_width"
 
 typedef struct BdrvTrackedRequest {
 BlockDriverState *bs;
diff --git a/tests/qemu-iotests/049.out b/tests/qemu-iotests/049.out
index 09ca0ae..9369c12 100644
--- a/tests/qemu-iotests/049.out
+++ b/tests/qemu-iotests/049.out
@@ -4,90 +4,90 @@ QA output created by 049
 == 1. Traditional size parameter ==
 
 qemu-img create -f qcow2 TEST_DIR/t.qcow2 1024
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 encryption=off 
cluster_size=65536 lazy_refcounts=off 
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 encryption=off 
cluster_size=65536 lazy_refcounts=off refcount_width=16
 
 qemu-img create -f qcow2 TEST_DIR/t.qcow2 1024b
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 encryption=off 
cluster_size=65536 lazy_refcounts=off 
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 encryption=off 
cluster_size=65536 lazy_refcounts=off refcount_width=16
 
 qemu-img create -f qcow2 TEST_DIR/t.qcow2 1k
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 encryption=off 
cluster_size=65536 lazy_refcounts=off 
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 encryption=off 
cluster_size=65536 lazy_refcounts=off refcount_width=16
 
 qemu-img create -f qcow2 TEST_DIR/t.qcow2 1K
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 encryption=off 
cluster_size=65536 lazy_refcounts=off 
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 encryption=off 
cluster_size=65536 lazy_refcounts=off refcount_width=16
 
 qemu-img create -f qcow2 TEST_DIR/t.qcow2 1M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1048576 encryption=off 
cluster_size=65536 lazy_refcounts=off 
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1048576 encryption=off 
cluster_size=65536 lazy_refcounts=off refcount_width=16
 
 qemu-img create -f qcow2 TEST_DIR/t.qcow2 1G
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1073741824 encryption=off 
cluster_size=65536 lazy_refcounts=off 
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1073741824 encryption=off 
cluster_size=65536 lazy_refcounts=off refcount_width=16
 
 qemu-img create -f qcow2 TEST_DIR/t.qcow2 1T
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1099511627776 encryption=off 
cluster_size=65536 lazy_refcounts=off 
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 s

[Qemu-devel] [PATCH v3 21/22] qcow2: Point to amend function in check

2014-11-20 Thread Max Reitz

If a reference count is not representable with the current refcount
order, the image check should point to qemu-img amend for increasing the
refcount order. However, qemu-img amend needs write access to the image
which cannot be provided if the image is marked corrupt; and the image
check will not mark the image consistent unless everything actually is
consistent.

Therefore, if an image is marked corrupt and the image check encounters
a reference count overflow, it cannot be fixed by using qemu-img amend
to increase the refcount order. Instead, one has to use qemu-img convert
to create a completely new copy of the image in this case.

Alternatively, we may want to give the user a way of manually removing
the corrupt flag, maybe through qemu-img amend, but this is not part of
this patch.

Signed-off-by: Max Reitz 
Reviewed-by: Eric Blake 
---
 block/qcow2-refcount.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index d619728..9201d30 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -1360,6 +1360,9 @@ static int inc_refcounts(BlockDriverState *bs,
 if (refcount == s->refcount_max) {
 fprintf(stderr, "ERROR: overflow cluster offset=0x%" PRIx64
 "\n", cluster_offset);
+fprintf(stderr, "Use qemu-img amend to increase the refcount entry 
"
+"width or qemu-img convert to create a clean copy if the "
+"image cannot be opened for writing\n");
 res->corruptions++;
 continue;
 }
-- 
1.9.3

[Qemu-devel] [PATCH v3 08/22] qcow2: More helpers for refcount modification

2014-11-20 Thread Max Reitz

Add helper functions for getting and setting refcounts in a refcount
array for any possible refcount order, and choose the correct one during
refcount initialization.

Signed-off-by: Max Reitz 
---
 block/qcow2-refcount.c | 124 -
 1 file changed, 122 insertions(+), 2 deletions(-)

diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index ff36924..60594d8 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -32,10 +32,49 @@ static int QEMU_WARN_UNUSED_RESULT 
update_refcount(BlockDriverState *bs,
 int64_t offset, int64_t length,
 int addend, enum qcow2_discard_type type);
 
+static uint64_t get_refcount_ro0(const void *refcount_array, uint64_t index);
+static uint64_t get_refcount_ro1(const void *refcount_array, uint64_t index);
+static uint64_t get_refcount_ro2(const void *refcount_array, uint64_t index);
+static uint64_t get_refcount_ro3(const void *refcount_array, uint64_t index);
 static uint64_t get_refcount_ro4(const void *refcount_array, uint64_t index);
+static uint64_t get_refcount_ro5(const void *refcount_array, uint64_t index);
+static uint64_t get_refcount_ro6(const void *refcount_array, uint64_t index);
 
+static void set_refcount_ro0(void *refcount_array, uint64_t index,
+ uint64_t value);
+static void set_refcount_ro1(void *refcount_array, uint64_t index,
+ uint64_t value);
+static void set_refcount_ro2(void *refcount_array, uint64_t index,
+ uint64_t value);
+static void set_refcount_ro3(void *refcount_array, uint64_t index,
+ uint64_t value);
 static void set_refcount_ro4(void *refcount_array, uint64_t index,
  uint64_t value);
+static void set_refcount_ro5(void *refcount_array, uint64_t index,
+ uint64_t value);
+static void set_refcount_ro6(void *refcount_array, uint64_t index,
+ uint64_t value);
+
+
+static Qcow2GetRefcountFunc *const get_refcount_funcs[] = {
+&get_refcount_ro0,
+&get_refcount_ro1,
+&get_refcount_ro2,
+&get_refcount_ro3,
+&get_refcount_ro4,
+&get_refcount_ro5,
+&get_refcount_ro6
+};
+
+static Qcow2SetRefcountFunc *const set_refcount_funcs[] = {
+&set_refcount_ro0,
+&set_refcount_ro1,
+&set_refcount_ro2,
+&set_refcount_ro3,
+&set_refcount_ro4,
+&set_refcount_ro5,
+&set_refcount_ro6
+};
 
 
 /*/
@@ -47,8 +86,10 @@ int qcow2_refcount_init(BlockDriverState *bs)
 unsigned int refcount_table_size2, i;
 int ret;
 
-s->get_refcount = &get_refcount_ro4;
-s->set_refcount = &set_refcount_ro4;
+assert(s->refcount_order >= 0 && s->refcount_order <= 6);
+
+s->get_refcount = get_refcount_funcs[s->refcount_order];
+s->set_refcount = set_refcount_funcs[s->refcount_order];
 
 assert(s->refcount_table_size <= INT_MAX / sizeof(uint64_t));
 refcount_table_size2 = s->refcount_table_size * sizeof(uint64_t);
@@ -80,6 +121,59 @@ void qcow2_refcount_close(BlockDriverState *bs)
 }
 
 
+static uint64_t get_refcount_ro0(const void *refcount_array, uint64_t index)
+{
+return (((const uint8_t *)refcount_array)[index / 8] >> (index % 8)) & 0x1;
+}
+
+static void set_refcount_ro0(void *refcount_array, uint64_t index,
+ uint64_t value)
+{
+assert(!(value >> 1));
+((uint8_t *)refcount_array)[index / 8] &= ~(0x1 << (index % 8));
+((uint8_t *)refcount_array)[index / 8] |= value << (index % 8);
+}
+
+static uint64_t get_refcount_ro1(const void *refcount_array, uint64_t index)
+{
+return (((const uint8_t *)refcount_array)[index / 4] >> (2 * (index % 4)))
+   & 0x3;
+}
+
+static void set_refcount_ro1(void *refcount_array, uint64_t index,
+ uint64_t value)
+{
+assert(!(value >> 2));
+((uint8_t *)refcount_array)[index / 4] &= ~(0x3 << (2 * (index % 4)));
+((uint8_t *)refcount_array)[index / 4] |= value << (2 * (index % 4));
+}
+
+static uint64_t get_refcount_ro2(const void *refcount_array, uint64_t index)
+{
+return (((const uint8_t *)refcount_array)[index / 2] >> (4 * (index % 2)))
+   & 0xf;
+}
+
+static void set_refcount_ro2(void *refcount_array, uint64_t index,
+ uint64_t value)
+{
+assert(!(value >> 4));
+((uint8_t *)refcount_array)[index / 2] &= ~(0xf << (4 * (index % 2)));
+((uint8_t *)refcount_array)[index / 2] |= value << (4 * (index % 2));
+}
+
+static uint64_t get_refcount_ro3(const void *refcount_array, uint64_t index)
+{
+return ((const uint8_t *)refcount_array)[index];
+}
+
+static void set_refcount_ro3(void *refcount_array, uint64_t index,
+ uint64_t value)
+{
+assert(!(value >> 8));
+((uint8_t *)refcount_array)[index] = value;
+}
+
 static uint64_t

[Qemu-devel] [PATCH v3 22/22] iotests: Add test for different refcount widths

2014-11-20 Thread Max Reitz

Add a test for conversion between different refcount widths and errors
specific to certain widths (i.e. snapshots with refcount_width=1).

Signed-off-by: Max Reitz 
---
 tests/qemu-iotests/112 | 278 +
 tests/qemu-iotests/112.out | 143 +++
 tests/qemu-iotests/group   |   1 +
 3 files changed, 422 insertions(+)
 create mode 100755 tests/qemu-iotests/112
 create mode 100644 tests/qemu-iotests/112.out

diff --git a/tests/qemu-iotests/112 b/tests/qemu-iotests/112
new file mode 100755
index 000..b184e34
--- /dev/null
+++ b/tests/qemu-iotests/112
@@ -0,0 +1,278 @@
+#!/bin/bash
+#
+# Test cases for different refcount_widths
+#
+# Copyright (C) 2014 Red Hat, Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see .
+#
+
+# creator
+owner=mre...@redhat.com
+
+seq="$(basename $0)"
+echo "QA output created by $seq"
+
+here="$PWD"
+tmp=/tmp/$$
+status=1   # failure is the default!
+
+_cleanup()
+{
+   _cleanup_test_img
+}
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+# get standard environment, filters and checks
+. ./common.rc
+. ./common.filter
+
+# This tests qcow2-specific low-level functionality
+_supported_fmt qcow2
+_supported_proto file
+_supported_os Linux
+# This test will set refcount_width on its own which would conflict with the
+# manual setting; compat will be overridden as well
+_unsupported_imgopts refcount_width 'compat=0.10'
+
+function print_refcount_width()
+{
+$QEMU_IMG info "$TEST_IMG" | sed -n '/refcount width:/ s/^ *//p'
+}
+
+echo
+echo '=== refcount_width limits ==='
+echo
+
+# Must be positive (non-zero)
+IMGOPTS="$IMGOPTS,refcount_width=0" _make_test_img 64M
+# Must be positive (non-negative)
+IMGOPTS="$IMGOPTS,refcount_width=-1" _make_test_img 64M
+# May not exceed 64
+IMGOPTS="$IMGOPTS,refcount_width=128" _make_test_img 64M
+# Must be a power of two
+IMGOPTS="$IMGOPTS,refcount_width=42" _make_test_img 64M
+
+# 1 is the minimum
+IMGOPTS="$IMGOPTS,refcount_width=1" _make_test_img 64M
+print_refcount_width
+
+# 64 is the maximum
+IMGOPTS="$IMGOPTS,refcount_width=64" _make_test_img 64M
+print_refcount_width
+
+# 16 is the default
+_make_test_img 64M
+print_refcount_width
+
+echo
+echo '=== refcount_width and compat=0.10 ==='
+echo
+
+# Should work
+IMGOPTS="$IMGOPTS,compat=0.10,refcount_width=16" _make_test_img 64M
+print_refcount_width
+
+# Should not work
+IMGOPTS="$IMGOPTS,compat=0.10,refcount_width=1" _make_test_img 64M
+IMGOPTS="$IMGOPTS,compat=0.10,refcount_width=64" _make_test_img 64M
+
+
+echo
+echo '=== Snapshot limit on refcount_width=1 ==='
+echo
+
+IMGOPTS="$IMGOPTS,refcount_width=1" _make_test_img 64M
+print_refcount_width
+
+$QEMU_IO -c 'write 0 512' "$TEST_IMG" | _filter_qemu_io
+
+# Should fail for now; in the future, this might be supported by automatically
+# copying all clusters with overflowing refcount
+$QEMU_IMG snapshot -c foo "$TEST_IMG"
+
+# The new L1 table could/should be leaked
+_check_test_img
+
+echo
+echo '=== Snapshot limit on refcount_width=2 ==='
+echo
+
+IMGOPTS="$IMGOPTS,refcount_width=2" _make_test_img 64M
+print_refcount_width
+
+$QEMU_IO -c 'write 0 512' "$TEST_IMG" | _filter_qemu_io
+
+# Should succeed
+$QEMU_IMG snapshot -c foo "$TEST_IMG"
+$QEMU_IMG snapshot -c bar "$TEST_IMG"
+# Should fail (4th reference)
+$QEMU_IMG snapshot -c baz "$TEST_IMG"
+
+# The new L1 table could/should be leaked
+_check_test_img
+
+echo
+echo '=== Compressed clusters with refcount_width=1 ==='
+echo
+
+IMGOPTS="$IMGOPTS,refcount_width=1" _make_test_img 64M
+print_refcount_width
+
+# Both should fit into a single host cluster; instead of failing to increase 
the
+# refcount of that cluster, qemu should just allocate a new cluster and make
+# this operation succeed
+$QEMU_IO -c 'write -P 0 -c  0  64k' \
+ -c 'write -P 1 -c 64k 64k' \
+ "$TEST_IMG" | _filter_qemu_io
+
+_check_test_img
+
+echo
+echo '=== MSb set in 64 bit refcount ==='
+echo
+
+IMGOPTS="$IMGOPTS,refcount_width=64" _make_test_img 64M
+print_refcount_width
+
+$QEMU_IO -c 'write 0 512' "$TEST_IMG" | _filter_qemu_io
+
+# Set the MSb in the refblock entry of the data cluster
+poke_file "$TEST_IMG" $((0x20028)) "\x80\x00\x00\x00\x00\x00\x00\x00"
+
+# Clear OFLAG_COPIED in the L2 entry of the data cluster
+poke_file "$TEST_IMG" $((0x4)) "\x00\x00\x00\x00\x00\x05\x00\x00"
+
+# And now try to write to that cluster (this would normall

[Qemu-devel] [PATCH v3 07/22] qcow2: Helper function for refcount modification

2014-11-20 Thread Max Reitz

Since refcounts do not always have to be a uint16_t, all refcount blocks
and arrays in memory should not have a specific type (thus they become
pointers to void) and for accessing them, two helper functions are used
(a getter and a setter). Those functions are called indirectly through
function pointers in the BDRVQcowState so they may later be exchanged
for different refcount orders.

With the check and repair functions using this function, the refcount
array they are creating will be in big endian byte order; additionally,
using realloc_refcount_array() makes the size of this refcount array
always cluster-aligned. Both combined allow rebuild_refcount_structure()
to drop the bounce buffer which was used to convert parts of the
refcount array to big endian byte order and store them on disk. Instead,
those parts can now be written directly.

Signed-off-by: Max Reitz 
---
 block/qcow2-refcount.c | 131 ++---
 block/qcow2.h  |   8 +++
 2 files changed, 89 insertions(+), 50 deletions(-)

diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 3cd540c..ff36924 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -32,6 +32,11 @@ static int QEMU_WARN_UNUSED_RESULT 
update_refcount(BlockDriverState *bs,
 int64_t offset, int64_t length,
 int addend, enum qcow2_discard_type type);
 
+static uint64_t get_refcount_ro4(const void *refcount_array, uint64_t index);
+
+static void set_refcount_ro4(void *refcount_array, uint64_t index,
+ uint64_t value);
+
 
 /*/
 /* refcount handling */
@@ -42,6 +47,9 @@ int qcow2_refcount_init(BlockDriverState *bs)
 unsigned int refcount_table_size2, i;
 int ret;
 
+s->get_refcount = &get_refcount_ro4;
+s->set_refcount = &set_refcount_ro4;
+
 assert(s->refcount_table_size <= INT_MAX / sizeof(uint64_t));
 refcount_table_size2 = s->refcount_table_size * sizeof(uint64_t);
 s->refcount_table = g_try_malloc(refcount_table_size2);
@@ -72,6 +80,19 @@ void qcow2_refcount_close(BlockDriverState *bs)
 }
 
 
+static uint64_t get_refcount_ro4(const void *refcount_array, uint64_t index)
+{
+return be16_to_cpu(((const uint16_t *)refcount_array)[index]);
+}
+
+static void set_refcount_ro4(void *refcount_array, uint64_t index,
+ uint64_t value)
+{
+assert(!(value >> 16));
+((uint16_t *)refcount_array)[index] = cpu_to_be16(value);
+}
+
+
 static int load_refcount_block(BlockDriverState *bs,
int64_t refcount_block_offset,
void **refcount_block)
@@ -97,7 +118,7 @@ int64_t qcow2_get_refcount(BlockDriverState *bs, int64_t 
cluster_index)
 uint64_t refcount_table_index, block_index;
 int64_t refcount_block_offset;
 int ret;
-uint16_t *refcount_block;
+void *refcount_block;
 int64_t refcount;
 
 refcount_table_index = cluster_index >> s->refcount_block_bits;
@@ -116,20 +137,24 @@ int64_t qcow2_get_refcount(BlockDriverState *bs, int64_t 
cluster_index)
 }
 
 ret = qcow2_cache_get(bs, s->refcount_block_cache, refcount_block_offset,
-(void**) &refcount_block);
+  &refcount_block);
 if (ret < 0) {
 return ret;
 }
 
 block_index = cluster_index & (s->refcount_block_size - 1);
-refcount = be16_to_cpu(refcount_block[block_index]);
+refcount = s->get_refcount(refcount_block, block_index);
 
-ret = qcow2_cache_put(bs, s->refcount_block_cache,
-(void**) &refcount_block);
+ret = qcow2_cache_put(bs, s->refcount_block_cache, &refcount_block);
 if (ret < 0) {
 return ret;
 }
 
+if (refcount < 0) {
+/* overflow */
+return -ERANGE;
+}
+
 return refcount;
 }
 
@@ -169,7 +194,7 @@ static int in_same_refcount_block(BDRVQcowState *s, 
uint64_t offset_a,
  * Returns 0 on success or -errno in error case
  */
 static int alloc_refcount_block(BlockDriverState *bs,
-int64_t cluster_index, uint16_t **refcount_block)
+int64_t cluster_index, void **refcount_block)
 {
 BDRVQcowState *s = bs->opaque;
 unsigned int refcount_table_index;
@@ -196,7 +221,7 @@ static int alloc_refcount_block(BlockDriverState *bs,
 }
 
  return load_refcount_block(bs, refcount_block_offset,
- (void**) refcount_block);
+refcount_block);
 }
 }
 
@@ -256,7 +281,7 @@ static int alloc_refcount_block(BlockDriverState *bs,
 /* The block describes itself, need to update the cache */
 int block_index = (new_block >> s->cluster_bits) &
 (s->refcount_block_size - 1);
-(*refcount_block)[block_index] = cpu_to_be16(1);
+s->set_refcount(*refcount_block, block_index, 1);
 } else {
 /* Described

[Qemu-devel] [PATCH v3 20/22] qcow2: Invoke refcount order amendment function

2014-11-20 Thread Max Reitz

Make use of qcow2_change_refcount_order() to support changing the
refcount order with qemu-img amend.

Signed-off-by: Max Reitz 
Reviewed-by: Eric Blake 
---
 block/qcow2.c | 44 +++-
 1 file changed, 35 insertions(+), 9 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 23d2d59..aea54df 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2612,13 +2612,7 @@ static int qcow2_downgrade(BlockDriverState *bs, int 
target_version,
 }
 
 if (s->refcount_order != 4) {
-/* we would have to convert the image to a refcount_order == 4 image
- * here; however, since qemu (at the time of writing this) does not
- * support anything different than 4 anyway, there is no point in doing
- * so right now; however, we should error out (if qemu supports this in
- * the future and this code has not been adapted) */
-error_report("qcow2_downgrade: Image refcount orders other than 4 are "
- "currently not supported.");
+error_report("compat=0.10 requires refcount_width=16");
 return -ENOTSUP;
 }
 
@@ -2666,6 +2660,7 @@ typedef enum Qcow2AmendOperation {
  * invocation from an operation change */
 QCOW2_NO_OPERATION = 0,
 
+QCOW2_CHANGING_REFCOUNT_ORDER,
 QCOW2_DOWNGRADING,
 } Qcow2AmendOperation;
 
@@ -2741,6 +2736,7 @@ static int qcow2_amend_options(BlockDriverState *bs, 
QemuOpts *opts,
 const char *compat = NULL;
 uint64_t cluster_size = s->cluster_size;
 bool encrypt;
+int refcount_width = s->refcount_bits;
 int ret;
 QemuOptDesc *desc = opts->list->desc;
 Qcow2AmendHelperCBInfo helper_cb_info;
@@ -2790,8 +2786,16 @@ static int qcow2_amend_options(BlockDriverState *bs, 
QemuOpts *opts,
 lazy_refcounts = qemu_opt_get_bool(opts, "lazy_refcounts",
lazy_refcounts);
 } else if (!strcmp(desc->name, "refcount_width")) {
-error_report("Cannot change refcount entry width");
-return -ENOTSUP;
+refcount_width = qemu_opt_get_number(opts, "refcount_width",
+ refcount_width);
+
+if (refcount_width <= 0 || refcount_width > 64 ||
+!is_power_of_2(refcount_width))
+{
+error_report("Refcount width must be a power of two and may "
+ "not exceed 64 bits");
+return -EINVAL;
+}
 } else {
 /* if this point is reached, this probably means a new option was
  * added without having it covered here */
@@ -2805,6 +2809,7 @@ static int qcow2_amend_options(BlockDriverState *bs, 
QemuOpts *opts,
 .original_status_cb = status_cb,
 .original_cb_opaque = cb_opaque,
 .total_operations = (new_version < old_version)
+  + (s->refcount_bits != refcount_width)
 };
 
 /* Upgrade first (some features may require compat=1.1) */
@@ -2817,6 +2822,27 @@ static int qcow2_amend_options(BlockDriverState *bs, 
QemuOpts *opts,
 }
 }
 
+if (s->refcount_bits != refcount_width) {
+int refcount_order = ffs(refcount_width) - 1;
+Error *local_error = NULL;
+
+if (new_version < 3 && refcount_width != 16) {
+error_report("Different refcount widths than 16 bits require "
+ "compatibility level 1.1 or above (use compat=1.1 or "
+ "greater)");
+return -EINVAL;
+}
+
+helper_cb_info.current_operation = QCOW2_CHANGING_REFCOUNT_ORDER;
+ret = qcow2_change_refcount_order(bs, refcount_order,
+  &qcow2_amend_helper_cb,
+  &helper_cb_info, &local_error);
+if (ret < 0) {
+qerror_report_err(local_error);
+return ret;
+}
+}
+
 if (backing_file || backing_format) {
 ret = qcow2_change_backing_file(bs, backing_file ?: bs->backing_file,
 backing_format ?: bs->backing_format);
-- 
1.9.3

[Qemu-devel] [PATCH v3 02/22] qcow2: Add refcount_width to format-specific info

2014-11-20 Thread Max Reitz

Add the bit width of every refcount entry to the format-specific
information.

In contrast to lazy_refcounts and the corrupt flag, this should be
always emitted, even for compat=0.10 although it does not support any
refcount width other than 16 bits. This is because if a boolean is
optional, one normally assumes it to be false when omitted; but if an
integer is not specified, it is rather difficult to guess its value.

This new field breaks some test outputs, fix them.

Signed-off-by: Max Reitz 
Reviewed-by: Eric Blake 
---
 block/qcow2.c  |  4 +++-
 qapi/block-core.json   |  5 -
 tests/qemu-iotests/060.out |  1 +
 tests/qemu-iotests/065 | 23 +++
 tests/qemu-iotests/067.out |  5 +
 tests/qemu-iotests/082.out |  7 +++
 tests/qemu-iotests/089.out |  2 ++
 7 files changed, 37 insertions(+), 10 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index f57aff9..d70e927 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2475,7 +2475,8 @@ static ImageInfoSpecific 
*qcow2_get_specific_info(BlockDriverState *bs)
 };
 if (s->qcow_version == 2) {
 *spec_info->qcow2 = (ImageInfoSpecificQCow2){
-.compat = g_strdup("0.10"),
+.compat = g_strdup("0.10"),
+.refcount_width = s->refcount_bits,
 };
 } else if (s->qcow_version == 3) {
 *spec_info->qcow2 = (ImageInfoSpecificQCow2){
@@ -2486,6 +2487,7 @@ static ImageInfoSpecific 
*qcow2_get_specific_info(BlockDriverState *bs)
 .corrupt= s->incompatible_features &
   QCOW2_INCOMPAT_CORRUPT,
 .has_corrupt= true,
+.refcount_width = s->refcount_bits,
 };
 }
 
diff --git a/qapi/block-core.json b/qapi/block-core.json
index b7083fb..e3a3cb7 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -41,13 +41,16 @@
 # @corrupt: #optional true if the image has been marked corrupt; only valid for
 #   compat >= 1.1 (since 2.2)
 #
+# @refcount-width: width of a refcount entry in bits (since 2.3)
+#
 # Since: 1.7
 ##
 { 'type': 'ImageInfoSpecificQCow2',
   'data': {
   'compat': 'str',
   '*lazy-refcounts': 'bool',
-  '*corrupt': 'bool'
+  '*corrupt': 'bool',
+  'refcount-width': 'int'
   } }
 
 ##
diff --git a/tests/qemu-iotests/060.out b/tests/qemu-iotests/060.out
index 9419da1..17b3eaf 100644
--- a/tests/qemu-iotests/060.out
+++ b/tests/qemu-iotests/060.out
@@ -19,6 +19,7 @@ cluster_size: 65536
 Format specific information:
 compat: 1.1
 lazy refcounts: false
+refcount width: 16
 corrupt: true
 qemu-io: can't open device TEST_DIR/t.IMGFMT: IMGFMT: Image is corrupt; cannot 
be opened read/write
 read 512/512 bytes at offset 0
diff --git a/tests/qemu-iotests/065 b/tests/qemu-iotests/065
index 8d3a9c9..8539aeb 100755
--- a/tests/qemu-iotests/065
+++ b/tests/qemu-iotests/065
@@ -88,34 +88,41 @@ class TestQMP(TestImageInfoSpecific):
 class TestQCow2(TestQemuImgInfo):
 '''Testing a qcow2 version 2 image'''
 img_options = 'compat=0.10'
-json_compare = { 'compat': '0.10' }
-human_compare = [ 'compat: 0.10' ]
+json_compare = { 'compat': '0.10', 'refcount-width': 16 }
+human_compare = [ 'compat: 0.10', 'refcount width: 16' ]
 
 class TestQCow3NotLazy(TestQemuImgInfo):
 '''Testing a qcow2 version 3 image with lazy refcounts disabled'''
 img_options = 'compat=1.1,lazy_refcounts=off'
-json_compare = { 'compat': '1.1', 'lazy-refcounts': False, 'corrupt': 
False }
-human_compare = [ 'compat: 1.1', 'lazy refcounts: false', 'corrupt: false' 
]
+json_compare = { 'compat': '1.1', 'lazy-refcounts': False,
+ 'refcount-width': 16, 'corrupt': False }
+human_compare = [ 'compat: 1.1', 'lazy refcounts: false',
+  'refcount width: 16', 'corrupt: false' ]
 
 class TestQCow3Lazy(TestQemuImgInfo):
 '''Testing a qcow2 version 3 image with lazy refcounts enabled'''
 img_options = 'compat=1.1,lazy_refcounts=on'
-json_compare = { 'compat': '1.1', 'lazy-refcounts': True, 'corrupt': False 
}
-human_compare = [ 'compat: 1.1', 'lazy refcounts: true', 'corrupt: false' ]
+json_compare = { 'compat': '1.1', 'lazy-refcounts': True,
+ 'refcount-width': 16, 'corrupt': False }
+human_compare = [ 'compat: 1.1', 'lazy refcounts: true',
+  'refcount width: 16', 'corrupt: false' ]
 
 class TestQCow3NotLazyQMP(TestQMP):
 '''Testing a qcow2 version 3 image with lazy refcounts disabled, opening
with lazy refcounts enabled'''
 img_options = 'compat=1.1,lazy_refcounts=off'
 qemu_options = 'lazy-refcounts=on'
-compare = { 'compat': '1.1', 'lazy-refcounts': False, 'corrupt': False }
+compare = { 'compat': '1.1', 'lazy-refcounts': False,
+'refcount-width': 16, 'corrupt': False }
+
 
 class TestQCow3LazyQMP(TestQMP):
 '''Testing a qcow2 version 3 image wi

[Qemu-devel] [PATCH v3 18/22] qcow2: Use intermediate helper CB for amend

2014-11-20 Thread Max Reitz

If there is more than one time-consuming operation to be performed for
qcow2_amend_options(), we need an intermediate CB which coordinates the
progress of the individual operations and passes the result to the
original status callback.

Signed-off-by: Max Reitz 
Reviewed-by: Eric Blake 
---
 block/qcow2.c | 80 ++-
 1 file changed, 79 insertions(+), 1 deletion(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index c5c4e86..23d2d59 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2660,6 +2660,75 @@ static int qcow2_downgrade(BlockDriverState *bs, int 
target_version,
 return 0;
 }
 
+typedef enum Qcow2AmendOperation {
+/* This is the value Qcow2AmendHelperCBInfo::last_operation will be
+ * statically initialized to so that the helper CB can discern the first
+ * invocation from an operation change */
+QCOW2_NO_OPERATION = 0,
+
+QCOW2_DOWNGRADING,
+} Qcow2AmendOperation;
+
+typedef struct Qcow2AmendHelperCBInfo {
+/* The code coordinating the amend operations should only modify
+ * these four fields; the rest will be managed by the CB */
+BlockDriverAmendStatusCB *original_status_cb;
+void *original_cb_opaque;
+
+Qcow2AmendOperation current_operation;
+
+/* Total number of operations to perform (only set once) */
+int total_operations;
+
+/* The following fields are managed by the CB */
+
+/* Number of operations completed */
+int operations_completed;
+
+/* Cumulative offset of all completed operations */
+int64_t offset_completed;
+
+Qcow2AmendOperation last_operation;
+int64_t last_work_size;
+} Qcow2AmendHelperCBInfo;
+
+static void qcow2_amend_helper_cb(BlockDriverState *bs,
+  int64_t operation_offset,
+  int64_t operation_work_size, void *opaque)
+{
+Qcow2AmendHelperCBInfo *info = opaque;
+int64_t current_work_size;
+int64_t projected_work_size;
+
+if (info->current_operation != info->last_operation) {
+if (info->last_operation != QCOW2_NO_OPERATION) {
+info->offset_completed += info->last_work_size;
+info->operations_completed++;
+}
+
+info->last_operation = info->current_operation;
+}
+
+assert(info->total_operations > 0);
+assert(info->operations_completed < info->total_operations);
+
+info->last_work_size = operation_work_size;
+
+current_work_size = info->offset_completed + operation_work_size;
+
+/* current_work_size is the total work size for (operations_completed + 1)
+ * operations (which includes this one), so multiply it by the number of
+ * operations not covered and divide it by the number of operations
+ * covered to get a projection for the operations not covered */
+projected_work_size = current_work_size * (info->total_operations -
+   info->operations_completed - 1)
+/ (info->operations_completed + 1);
+
+info->original_status_cb(bs, info->offset_completed + operation_offset,
+ current_work_size + projected_work_size,
+ info->original_cb_opaque);
+}
+
 static int qcow2_amend_options(BlockDriverState *bs, QemuOpts *opts,
BlockDriverAmendStatusCB *status_cb,
void *cb_opaque)
@@ -2674,6 +2743,7 @@ static int qcow2_amend_options(BlockDriverState *bs, 
QemuOpts *opts,
 bool encrypt;
 int ret;
 QemuOptDesc *desc = opts->list->desc;
+Qcow2AmendHelperCBInfo helper_cb_info;
 
 while (desc && desc->name) {
 if (!qemu_opt_find(opts, desc->name)) {
@@ -2731,6 +2801,12 @@ static int qcow2_amend_options(BlockDriverState *bs, 
QemuOpts *opts,
 desc++;
 }
 
+helper_cb_info = (Qcow2AmendHelperCBInfo){
+.original_status_cb = status_cb,
+.original_cb_opaque = cb_opaque,
+.total_operations = (new_version < old_version)
+};
+
 /* Upgrade first (some features may require compat=1.1) */
 if (new_version > old_version) {
 s->qcow_version = new_version;
@@ -2789,7 +2865,9 @@ static int qcow2_amend_options(BlockDriverState *bs, 
QemuOpts *opts,
 
 /* Downgrade last (so unsupported features can be removed before) */
 if (new_version < old_version) {
-ret = qcow2_downgrade(bs, new_version, status_cb, cb_opaque);
+helper_cb_info.current_operation = QCOW2_DOWNGRADING;
+ret = qcow2_downgrade(bs, new_version, &qcow2_amend_helper_cb,
+  &helper_cb_info);
 if (ret < 0) {
 return ret;
 }
-- 
1.9.3

[Qemu-devel] [PATCH v3 17/22] qcow2: Split upgrade/downgrade paths for amend

2014-11-20 Thread Max Reitz

If the image version should be upgraded, that is the first we should do;
if it should be downgraded, that is the last we should do. So split the
version change block into an upgrade part at the start and a downgrade
part at the end.

Signed-off-by: Max Reitz 
Reviewed-by: Eric Blake 
---
 block/qcow2.c | 31 ---
 1 file changed, 16 insertions(+), 15 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index e8d54ab..c5c4e86 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2731,20 +2731,13 @@ static int qcow2_amend_options(BlockDriverState *bs, 
QemuOpts *opts,
 desc++;
 }
 
-if (new_version != old_version) {
-if (new_version > old_version) {
-/* Upgrade */
-s->qcow_version = new_version;
-ret = qcow2_update_header(bs);
-if (ret < 0) {
-s->qcow_version = old_version;
-return ret;
-}
-} else {
-ret = qcow2_downgrade(bs, new_version, status_cb, cb_opaque);
-if (ret < 0) {
-return ret;
-}
+/* Upgrade first (some features may require compat=1.1) */
+if (new_version > old_version) {
+s->qcow_version = new_version;
+ret = qcow2_update_header(bs);
+if (ret < 0) {
+s->qcow_version = old_version;
+return ret;
 }
 }
 
@@ -2758,7 +2751,7 @@ static int qcow2_amend_options(BlockDriverState *bs, 
QemuOpts *opts,
 
 if (s->use_lazy_refcounts != lazy_refcounts) {
 if (lazy_refcounts) {
-if (s->qcow_version < 3) {
+if (new_version < 3) {
 error_report("Lazy refcounts only supported with compatibility 
"
  "level 1.1 and above (use compat=1.1 or 
greater)");
 return -EINVAL;
@@ -2794,6 +2787,14 @@ static int qcow2_amend_options(BlockDriverState *bs, 
QemuOpts *opts,
 }
 }
 
+/* Downgrade last (so unsupported features can be removed before) */
+if (new_version < old_version) {
+ret = qcow2_downgrade(bs, new_version, status_cb, cb_opaque);
+if (ret < 0) {
+return ret;
+}
+}
+
 return 0;
 }
 
-- 
1.9.3

[Qemu-devel] [PATCH v3 10/22] qcow2: refcount_order parameter for qcow2_create2

2014-11-20 Thread Max Reitz

Add a refcount_order parameter to qcow2_create2(), use that value for
the image header and for calculating the size required for
preallocation.

For now, always pass 4.

This addition requires changes to the calculation of the file size for
the "full" and "falloc" preallocation modes. That in turn is a nice
opportunity to add a command about that calculation not necessarily
being exact (and that being intentional).

Signed-off-by: Max Reitz 
Reviewed-by: Eric Blake 
---
 block/qcow2.c | 46 +++---
 1 file changed, 35 insertions(+), 11 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 18c69fb..39ef74a 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1775,7 +1775,7 @@ static int preallocate(BlockDriverState *bs)
 static int qcow2_create2(const char *filename, int64_t total_size,
  const char *backing_file, const char *backing_format,
  int flags, size_t cluster_size, PreallocMode prealloc,
- QemuOpts *opts, int version,
+ QemuOpts *opts, int version, int refcount_order,
  Error **errp)
 {
 /* Calculate cluster_bits */
@@ -1808,9 +1808,21 @@ static int qcow2_create2(const char *filename, int64_t 
total_size,
 int ret;
 
 if (prealloc == PREALLOC_MODE_FULL || prealloc == PREALLOC_MODE_FALLOC) {
+/* Note: The following calculation does not need to be exact; if it is 
a
+ * bit off, either some bytes will be "leaked" (which is fine) or we
+ * will need to increase the file size by some bytes (which is fine,
+ * too, as long as the bulk is allocated here). Therefore, using
+ * floating point arithmetic is fine. */
 int64_t meta_size = 0;
 uint64_t nreftablee, nrefblocke, nl1e, nl2e;
 int64_t aligned_total_size = align_offset(total_size, cluster_size);
+int refblock_bits, refblock_size;
+/* refcount entry size in bytes */
+double rces = (1 << refcount_order) / 8.;
+
+/* see qcow2_open() */
+refblock_bits = cluster_bits - (refcount_order - 3);
+refblock_size = 1 << refblock_bits;
 
 /* header: 1 cluster */
 meta_size += cluster_size;
@@ -1835,20 +1847,20 @@ static int qcow2_create2(const char *filename, int64_t 
total_size,
  *   c = cluster size
  *   y1 = number of refcount blocks entries
  *   y2 = meta size including everything
+ *   rces = refcount entry size in bytes
  * then,
  *   y1 = (y2 + a)/c
- *   y2 = y1 * sizeof(u16) + y1 * sizeof(u16) * sizeof(u64) / c + m
+ *   y2 = y1 * rces + y1 * rces * sizeof(u64) / c + m
  * we can get y1:
- *   y1 = (a + m) / (c - sizeof(u16) - sizeof(u16) * sizeof(u64) / c)
+ *   y1 = (a + m) / (c - rces - rces * sizeof(u64) / c)
  */
-nrefblocke = (aligned_total_size + meta_size + cluster_size) /
-(cluster_size - sizeof(uint16_t) -
- 1.0 * sizeof(uint16_t) * sizeof(uint64_t) / cluster_size);
-nrefblocke = align_offset(nrefblocke, cluster_size / sizeof(uint16_t));
-meta_size += nrefblocke * sizeof(uint16_t);
+nrefblocke = (aligned_total_size + meta_size + cluster_size)
+   / (cluster_size - rces - rces * sizeof(uint64_t)
+ / cluster_size);
+meta_size += DIV_ROUND_UP(nrefblocke, refblock_size) * cluster_size;
 
 /* total size of refcount tables */
-nreftablee = nrefblocke * sizeof(uint16_t) / cluster_size;
+nreftablee = nrefblocke / refblock_size;
 nreftablee = align_offset(nreftablee, cluster_size / sizeof(uint64_t));
 meta_size += nreftablee * sizeof(uint64_t);
 
@@ -1883,7 +1895,7 @@ static int qcow2_create2(const char *filename, int64_t 
total_size,
 .l1_size= cpu_to_be32(0),
 .refcount_table_offset  = cpu_to_be64(cluster_size),
 .refcount_table_clusters= cpu_to_be32(1),
-.refcount_order = cpu_to_be32(4),
+.refcount_order = cpu_to_be32(refcount_order),
 .header_length  = cpu_to_be32(sizeof(*header)),
 };
 
@@ -2003,6 +2015,7 @@ static int qcow2_create(const char *filename, QemuOpts 
*opts, Error **errp)
 size_t cluster_size = DEFAULT_CLUSTER_SIZE;
 PreallocMode prealloc;
 int version = 3;
+int refcount_width = 16, refcount_order;
 Error *local_err = NULL;
 int ret;
 
@@ -2057,8 +2070,19 @@ static int qcow2_create(const char *filename, QemuOpts 
*opts, Error **errp)
 goto finish;
 }
 
+if (version < 3 && refcount_width != 16) {
+error_setg(errp, "Different refcount widths than 16 bits require "
+   "compatibility level 1.1 or above (use compat=1.1 or "
+   "greater)");
+ret = -EINVAL;
+goto finish;

[Qemu-devel] [PATCH v3 06/22] qcow2: Helper for refcount array reallocation

2014-11-20 Thread Max Reitz

Add a helper function for reallocating a refcount array, independent of
the refcount order. The newly allocated space is zeroed and the function
handles failed reallocations gracefully.

The helper function will always align the buffer size to a cluster
boundary; if storing the refcounts in such an array in big endian byte
order, this makes it possible to write parts of the array directly as
refcount blocks into the image file.

Signed-off-by: Max Reitz 
Reviewed-by: Eric Blake 
---
 block/qcow2-refcount.c | 135 +++--
 1 file changed, 86 insertions(+), 49 deletions(-)

diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 66c78c0..3cd540c 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -1108,6 +1108,68 @@ fail:
 /* refcount checking functions */
 
 
+static size_t refcount_array_byte_size(BDRVQcowState *s, uint64_t entries)
+{
+if (s->refcount_order < 3) {
+/* sub-byte width */
+int shift = 3 - s->refcount_order;
+return (entries + (1 << shift) - 1) >> shift;
+} else if (s->refcount_order == 3) {
+/* byte width */
+return entries;
+} else {
+/* multiple bytes wide */
+
+/* This assertion holds because there is no way we can address more 
than
+ * 2^(64 - 9) clusters at once (with cluster size 512 = 2^9, and 
because
+ * offsets have to be representable in bytes); due to every cluster
+ * corresponding to one refcount entry and because refcount_order has 
to
+ * be below 7, we are far below that limit */
+assert(!(entries >> (64 - (s->refcount_order - 3;
+
+return entries << (s->refcount_order - 3);
+}
+}
+
+/**
+ * Reallocates *array so that it can hold new_size entries. *size must contain
+ * the current number of entries in *array. If the reallocation fails, *array
+ * and *size will not be modified and -errno will be returned. If the
+ * reallocation is successful, *array will be set to the new buffer and *size
+ * will be set to new_size. The size of the reallocated refcount array buffer
+ * will be aligned to a cluster boundary, and the newly allocated area will be
+ * zeroed.
+ */
+static int realloc_refcount_array(BDRVQcowState *s, uint16_t **array,
+  int64_t *size, int64_t new_size)
+{
+/* Round to clusters so the array can be directly written to disk */
+size_t old_byte_size = ROUND_UP(refcount_array_byte_size(s, *size),
+s->cluster_size);
+size_t new_byte_size = ROUND_UP(refcount_array_byte_size(s, new_size),
+s->cluster_size);
+uint16_t *new_ptr;
+
+if (new_byte_size <= old_byte_size) {
+*size = new_size;
+return 0;
+}
+
+assert(new_byte_size > 0);
+
+new_ptr = g_try_realloc(*array, new_byte_size);
+if (!new_ptr) {
+return -ENOMEM;
+}
+
+memset((void *)((uintptr_t)new_ptr + old_byte_size), 0,
+   new_byte_size - old_byte_size);
+
+*array = new_ptr;
+*size  = new_size;
+
+return 0;
+}
 
 /*
  * Increases the refcount for a range of clusters in a given refcount table.
@@ -1124,6 +1186,7 @@ static int inc_refcounts(BlockDriverState *bs,
 {
 BDRVQcowState *s = bs->opaque;
 uint64_t start, last, cluster_offset, k;
+int ret;
 
 if (size <= 0) {
 return 0;
@@ -1135,23 +1198,12 @@ static int inc_refcounts(BlockDriverState *bs,
 cluster_offset += s->cluster_size) {
 k = cluster_offset >> s->cluster_bits;
 if (k >= *refcount_table_size) {
-int64_t old_refcount_table_size = *refcount_table_size;
-uint16_t *new_refcount_table;
-
-*refcount_table_size = k + 1;
-new_refcount_table = g_try_realloc(*refcount_table,
-   *refcount_table_size *
-   sizeof(**refcount_table));
-if (!new_refcount_table) {
-*refcount_table_size = old_refcount_table_size;
+ret = realloc_refcount_array(s, refcount_table,
+ refcount_table_size, k + 1);
+if (ret < 0) {
 res->check_errors++;
-return -ENOMEM;
+return ret;
 }
-*refcount_table = new_refcount_table;
-
-memset(*refcount_table + old_refcount_table_size, 0,
-   (*refcount_table_size - old_refcount_table_size) *
-   sizeof(**refcount_table));
 }
 
 if (++(*refcount_table)[k] == 0) {
@@ -1518,8 +1570,7 @@ static int check_refblocks(BlockDriverState *bs, 
BdrvCheckResult *res,
 fix & BDRV_FIX_ERRORS ? "Repairing" : "ERROR", i);
 
 if (fix & BDRV_FIX_ERRORS) {
-int64_t old_nb_clusters = *nb_clusters;
-uint16_t *new_refcount_table;
+

[Qemu-devel] [PATCH v3 09/22] qcow2: Open images with refcount order != 4

2014-11-20 Thread Max Reitz

No longer refuse to open images with a different refcount entry width
than 16 bits; only reject images with a refcount width larger than 64
bits (which is prohibited by the specification).

Signed-off-by: Max Reitz 
Reviewed-by: Eric Blake 
---
 block/qcow2.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index d70e927..18c69fb 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -677,10 +677,10 @@ static int qcow2_open(BlockDriverState *bs, QDict 
*options, int flags,
 }
 
 /* Check support for various header values */
-if (header.refcount_order != 4) {
-report_unsupported(bs, errp, "%d bit reference counts",
-   1 << header.refcount_order);
-ret = -ENOTSUP;
+if (header.refcount_order > 6) {
+error_setg(errp, "Reference count entry width too large; may not "
+   "exceed 64 bits");
+ret = -EINVAL;
 goto fail;
 }
 s->refcount_order = header.refcount_order;
-- 
1.9.3

[Qemu-devel] [PATCH v3 15/22] qcow2: Use error_report() in qcow2_amend_options()

2014-11-20 Thread Max Reitz

Signed-off-by: Max Reitz 
Reviewed-by: Eric Blake 
---
 block/qcow2.c  | 14 ++
 tests/qemu-iotests/061.out | 14 +++---
 2 files changed, 13 insertions(+), 15 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 39e6061..4e19615 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2691,11 +2691,11 @@ static int qcow2_amend_options(BlockDriverState *bs, 
QemuOpts *opts,
 } else if (!strcmp(compat, "1.1")) {
 new_version = 3;
 } else {
-fprintf(stderr, "Unknown compatibility level %s.\n", compat);
+error_report("Unknown compatibility level %s", compat);
 return -EINVAL;
 }
 } else if (!strcmp(desc->name, "preallocation")) {
-fprintf(stderr, "Cannot change preallocation mode.\n");
+error_report("Cannot change preallocation mode");
 return -ENOTSUP;
 } else if (!strcmp(desc->name, "size")) {
 new_size = qemu_opt_get_size(opts, "size", 0);
@@ -2706,16 +2706,14 @@ static int qcow2_amend_options(BlockDriverState *bs, 
QemuOpts *opts,
 } else if (!strcmp(desc->name, "encryption")) {
 encrypt = qemu_opt_get_bool(opts, "encryption", s->crypt_method);
 if (encrypt != !!s->crypt_method) {
-fprintf(stderr, "Changing the encryption flag is not "
-"supported.\n");
+error_report("Changing the encryption flag is not supported");
 return -ENOTSUP;
 }
 } else if (!strcmp(desc->name, "cluster_size")) {
 cluster_size = qemu_opt_get_size(opts, "cluster_size",
  cluster_size);
 if (cluster_size != s->cluster_size) {
-fprintf(stderr, "Changing the cluster size is not "
-"supported.\n");
+error_report("Changing the cluster size is not supported");
 return -ENOTSUP;
 }
 } else if (!strcmp(desc->name, "lazy_refcounts")) {
@@ -2761,8 +2759,8 @@ static int qcow2_amend_options(BlockDriverState *bs, 
QemuOpts *opts,
 if (s->use_lazy_refcounts != lazy_refcounts) {
 if (lazy_refcounts) {
 if (s->qcow_version < 3) {
-fprintf(stderr, "Lazy refcounts only supported with 
compatibility "
-"level 1.1 and above (use compat=1.1 or greater)\n");
+error_report("Lazy refcounts only supported with compatibility 
"
+ "level 1.1 and above (use compat=1.1 or 
greater)");
 return -EINVAL;
 }
 s->compatible_features |= QCOW2_COMPAT_LAZY_REFCOUNTS;
diff --git a/tests/qemu-iotests/061.out b/tests/qemu-iotests/061.out
index 9045544..2fd92ca 100644
--- a/tests/qemu-iotests/061.out
+++ b/tests/qemu-iotests/061.out
@@ -281,19 +281,19 @@ No errors were found on the image.
 === Testing invalid configurations ===
 
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864 
-Lazy refcounts only supported with compatibility level 1.1 and above (use 
compat=1.1 or greater)
+qemu-img: Lazy refcounts only supported with compatibility level 1.1 and above 
(use compat=1.1 or greater)
 qemu-img: Error while amending options: Invalid argument
-Lazy refcounts only supported with compatibility level 1.1 and above (use 
compat=1.1 or greater)
+qemu-img: Lazy refcounts only supported with compatibility level 1.1 and above 
(use compat=1.1 or greater)
 qemu-img: Error while amending options: Invalid argument
-Unknown compatibility level 0.42.
+qemu-img: Unknown compatibility level 0.42
 qemu-img: Error while amending options: Invalid argument
 qemu-img: Invalid parameter 'foo'
 qemu-img: Invalid options for file format 'qcow2'
-Changing the cluster size is not supported.
+qemu-img: Changing the cluster size is not supported
 qemu-img: Error while amending options: Operation not supported
-Changing the encryption flag is not supported.
+qemu-img: Changing the encryption flag is not supported
 qemu-img: Error while amending options: Operation not supported
-Cannot change preallocation mode.
+qemu-img: Cannot change preallocation mode
 qemu-img: Error while amending options: Operation not supported
 
 === Testing correct handling of unset value ===
@@ -301,7 +301,7 @@ qemu-img: Error while amending options: Operation not 
supported
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864 
 Should work:
 Should not work:
-Changing the cluster size is not supported.
+qemu-img: Changing the cluster size is not supported
 qemu-img: Error while amending options: Operation not supported
 
 === Testing zero expansion on inactive clusters ===
-- 
1.9.3

[Qemu-devel] [PATCH v3 05/22] qcow2: Refcount overflow and qcow2_alloc_bytes()

2014-11-20 Thread Max Reitz

qcow2_alloc_bytes() may reuse a cluster multiple times, in which case
the refcount is increased accordingly. However, if this would lead to an
overflow the function should instead just not reuse this cluster and
allocate a new one.

Signed-off-by: Max Reitz 
Reviewed-by: Eric Blake 
---
 block/qcow2-refcount.c | 32 ++--
 1 file changed, 30 insertions(+), 2 deletions(-)

diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index be4e5fe..66c78c0 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -761,12 +761,13 @@ int qcow2_alloc_clusters_at(BlockDriverState *bs, 
uint64_t offset,
 int64_t qcow2_alloc_bytes(BlockDriverState *bs, int size)
 {
 BDRVQcowState *s = bs->opaque;
-int64_t offset, cluster_offset, new_cluster;
+int64_t offset, cluster_offset, new_cluster, refcount;
 int64_t ret;
 int free_in_cluster;
 
 BLKDBG_EVENT(bs->file, BLKDBG_CLUSTER_ALLOC_BYTES);
 assert(size > 0 && size <= s->cluster_size);
+ redo:
 if (s->free_byte_offset == 0) {
 offset = qcow2_alloc_clusters(bs, s->cluster_size);
 if (offset < 0) {
@@ -774,12 +775,25 @@ int64_t qcow2_alloc_bytes(BlockDriverState *bs, int size)
 }
 s->free_byte_offset = offset;
 }
- redo:
+
 free_in_cluster = s->cluster_size -
 offset_into_cluster(s, s->free_byte_offset);
 if (size <= free_in_cluster) {
 /* enough space in current cluster */
 offset = s->free_byte_offset;
+
+if (offset_into_cluster(s, offset) != 0) {
+/* We will have to increase the refcount of this cluster; if the
+ * maximum has been reached already, this cluster cannot be used */
+refcount = qcow2_get_refcount(bs, offset >> s->cluster_bits);
+if (refcount < 0) {
+return refcount;
+} else if (refcount == s->refcount_max) {
+s->free_byte_offset = 0;
+goto redo;
+}
+}
+
 s->free_byte_offset += size;
 free_in_cluster -= size;
 if (free_in_cluster == 0)
@@ -800,6 +814,20 @@ int64_t qcow2_alloc_bytes(BlockDriverState *bs, int size)
 if ((cluster_offset + s->cluster_size) == new_cluster) {
 /* we are lucky: contiguous data */
 offset = s->free_byte_offset;
+
+/* Same as above: In order to reuse the cluster, the refcount has 
to
+ * be increased; if that will not work, we are not so lucky after
+ * all */
+refcount = qcow2_get_refcount(bs, offset >> s->cluster_bits);
+if (refcount < 0) {
+qcow2_free_clusters(bs, new_cluster, s->cluster_size,
+QCOW2_DISCARD_NEVER);
+return refcount;
+} else if (refcount == s->refcount_max) {
+s->free_byte_offset = offset;
+goto redo;
+}
+
 ret = qcow2_update_cluster_refcount(bs, offset >> s->cluster_bits,
 1, QCOW2_DISCARD_NEVER);
 if (ret < 0) {
-- 
1.9.3

[Qemu-devel] [PATCH v3 04/22] qcow2: Respect error in qcow2_alloc_bytes()

2014-11-20 Thread Max Reitz

qcow2_update_cluster_refcount() may fail, and qcow2_alloc_bytes() should
mind that case.

Signed-off-by: Max Reitz 
Reviewed-by: Eric Blake 
---
 block/qcow2-refcount.c | 32 +---
 1 file changed, 21 insertions(+), 11 deletions(-)

diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 6e06531..be4e5fe 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -761,7 +761,8 @@ int qcow2_alloc_clusters_at(BlockDriverState *bs, uint64_t 
offset,
 int64_t qcow2_alloc_bytes(BlockDriverState *bs, int size)
 {
 BDRVQcowState *s = bs->opaque;
-int64_t offset, cluster_offset;
+int64_t offset, cluster_offset, new_cluster;
+int64_t ret;
 int free_in_cluster;
 
 BLKDBG_EVENT(bs->file, BLKDBG_CLUSTER_ALLOC_BYTES);
@@ -783,23 +784,32 @@ int64_t qcow2_alloc_bytes(BlockDriverState *bs, int size)
 free_in_cluster -= size;
 if (free_in_cluster == 0)
 s->free_byte_offset = 0;
-if (offset_into_cluster(s, offset) != 0)
-qcow2_update_cluster_refcount(bs, offset >> s->cluster_bits, 1,
-  QCOW2_DISCARD_NEVER);
+if (offset_into_cluster(s, offset) != 0) {
+ret = qcow2_update_cluster_refcount(bs, offset >> s->cluster_bits,
+1, QCOW2_DISCARD_NEVER);
+if (ret < 0) {
+return ret;
+}
+}
 } else {
-offset = qcow2_alloc_clusters(bs, s->cluster_size);
-if (offset < 0) {
-return offset;
+new_cluster = qcow2_alloc_clusters(bs, s->cluster_size);
+if (new_cluster < 0) {
+return new_cluster;
 }
 cluster_offset = start_of_cluster(s, s->free_byte_offset);
-if ((cluster_offset + s->cluster_size) == offset) {
+if ((cluster_offset + s->cluster_size) == new_cluster) {
 /* we are lucky: contiguous data */
 offset = s->free_byte_offset;
-qcow2_update_cluster_refcount(bs, offset >> s->cluster_bits, 1,
-  QCOW2_DISCARD_NEVER);
+ret = qcow2_update_cluster_refcount(bs, offset >> s->cluster_bits,
+1, QCOW2_DISCARD_NEVER);
+if (ret < 0) {
+qcow2_free_clusters(bs, new_cluster, s->cluster_size,
+QCOW2_DISCARD_NEVER);
+return ret;
+}
 s->free_byte_offset += size;
 } else {
-s->free_byte_offset = offset;
+s->free_byte_offset = new_cluster;
 goto redo;
 }
 }
-- 
1.9.3

[Qemu-devel] [PATCH v3 16/22] qcow2: Use abort() instead of assert(false)

2014-11-20 Thread Max Reitz

Signed-off-by: Max Reitz 
Reviewed-by: Eric Blake 
---
 block/qcow2.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 4e19615..e8d54ab 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2723,9 +2723,9 @@ static int qcow2_amend_options(BlockDriverState *bs, 
QemuOpts *opts,
 error_report("Cannot change refcount entry width");
 return -ENOTSUP;
 } else {
-/* if this assertion fails, this probably means a new option was
+/* if this point is reached, this probably means a new option was
  * added without having it covered here */
-assert(false);
+abort();
 }
 
 desc++;
-- 
1.9.3

[Qemu-devel] [PATCH v3 00/22] qcow2: Support refcount orders != 4

2014-11-20 Thread Max Reitz

As of version 3, the qcow2 file format supports different widths for
refcount entries, ranging from 1 to 64 bit (only powers of two).
Currently, qemu only supports 16 bit, which is the only width supported
by version 2 (compat=0.10) images.

This series adds support to qemu for all other valid refcount orders.
This is mainly done by adding two function pointers into the
BDRVQcowState structure for reading and writing refcount values
independently of the current refcount entry width; all in-memory
refcount arrays (mostly cached refcount blocks) now are void pointers
and are accessed through these functions alone.

Thanks to previous work of making the qemu code agnostic of e.g. the
number of refcount entries per refcount block, the rest is fairly
trivial. The most complex patch in this series is patch 18 which
implements changing the refcount order through qemu-img amend.

To test different refcount widths, simply invoke the qemu-iotests check
program with -o refcount_width=${your_desired_width}. The final test in
this series adds some tests for operations which do not work with
certain refcount orders and for refcount order amendment.


This series depends on version 4 (or any later version) of my
"chardev: Add -qmp-pretty" series (due to different test output of test
067, which makes changing it here much nicer).


v3 (everything [Eric], except for patch 13):
- Patch 6:
  - s/independently/independent/ in the commit message
  - Extended the commit message to tell why the new helper function
always aligns the size of the refcount array on cluster boundaries
  - Added a comment to realloc_refcount_array() (the new helper
function) which explains what this function does and why it does it
the way it does it
  - Immediately return from realloc_refcount_array() if the size of the
refcount array decreases (should not happen, but doesn't hurt
either) or stays the same (pretty likely); this prevents us from
going into g_try_realloc() with a size of 0, which in turn
guarantees that it will never return a NULL pointer on success
- Patch 7:
  - Extended the commit message to explain why the bounce buffer for
refblocks in realloc_refcount_array() can be omitted and that it
will therefore be omitted
  - Fixed alignment of check_refcounts_l2()'s header
  - Use refcount_array_byte_size() when clearing the refcount array
before recalculating the refcounts
- Patch 8:
  - Use two look-up tables for obtaining get_refcount() and
set_refcount(), respectively, instead of a single switch statement
(it is indeed much shorted and better readable this way)
- Patch 9: s/bit/bits/
- Patch 10:
  - Extended the commit message to note changes to the preallocation
file size calculation
  - Added a comment to that calculation that it does not need to be
exact (this addition is noted in the commit message as well)
- Patch 11: Added a missing comment
- Patch 13: Added
- Patch 19 (prev. 18):
  - get_refcount_functions() has been replaced by two look-up tables
(patch 8)
  - s/MIN/MAX/
  - Dropped two dead new_allocation assignments
  - s/QCOW2_DISCARD_NEVER/QCOW2_DISCARD_OTHER/ in two places
- Patch 22 (prev. 21):
  - Added a test case for the MSb set for 64 bit refcounts
  - Extended the 3-pass amend test case to test whether there are
actually three passes (by grepping in the progress report output)


git-backport-diff against v2:

Key:
[] : patches are identical
[] : number of functional differences between upstream/downstream patch
[down] : patch is downstream-only
The flags [FC] indicate (F)unctional and (C)ontextual differences, respectively

001/22:[] [--] 'qcow2: Add two new fields to BDRVQcowState'
002/22:[] [--] 'qcow2: Add refcount_width to format-specific info'
003/22:[] [--] 'qcow2: Use 64 bits for refcount values'
004/22:[] [--] 'qcow2: Respect error in qcow2_alloc_bytes()'
005/22:[] [--] 'qcow2: Refcount overflow and qcow2_alloc_bytes()'
006/22:[0024] [FC] 'qcow2: Helper for refcount array reallocation'
007/22:[0007] [FC] 'qcow2: Helper function for refcount modification'
008/22:[0068] [FC] 'qcow2: More helpers for refcount modification'
009/22:[0002] [FC] 'qcow2: Open images with refcount order != 4'
010/22:[0005] [FC] 'qcow2: refcount_order parameter for qcow2_create2'
011/22:[0001] [FC] 'iotests: Prepare for refcount_width option'
012/22:[] [--] 'qcow2: Allow creation with refcount order != 4'
013/22:[down] 'progress: Allow regressing progress'
014/22:[] [--] 'block: Add opaque value to the amend CB'
015/22:[] [--] 'qcow2: Use error_report() in qcow2_amend_options()'
016/22:[] [--] 'qcow2: Use abort() instead of assert(false)'
017/22:[] [--] 'qcow2: Split upgrade/downgrade paths for amend'
018/22:[] [--] 'qcow2: Use intermediate helper CB for amend'
019/22:[0013] [FC] 'qcow2: Add function for refcount order amendment'
020/22:[] [--] 'qcow2: Invoke refcount order amendment function'
021/22:[] [--] 'qcow

[Qemu-devel] [PATCH v3 03/22] qcow2: Use 64 bits for refcount values

2014-11-20 Thread Max Reitz

Refcounts may have a width of up to 64 bits, so qemu should use the same
width to represent refcount values internally.

Since for instance qcow2_get_refcount() signals an error by returning a
negative value, refcount values are generally signed to be able to
represent those error values correctly. This limits the maximum refcount
value supported by qemu to INT64_MAX (= 63 bits), as established in
"qcow2: Add two new fields to BDRVQcowState".

This limitation should have no implications in practice for normal valid
images. If the MSb in a 64 bit refcount value is set, we can safely
assume the value to be invalid (because reaching such high refcounts is
impossible due to other limitations of the qcow2 format).

Signed-off-by: Max Reitz 
Reviewed-by: Eric Blake 
---
 block/qcow2-cluster.c  |  9 ++---
 block/qcow2-refcount.c | 37 -
 block/qcow2.h  |  7 ---
 3 files changed, 30 insertions(+), 23 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index df0b2c9..ab43902 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -1640,7 +1640,7 @@ static int expand_zero_clusters_in_l1(BlockDriverState 
*bs, uint64_t *l1_table,
 for (i = 0; i < l1_size; i++) {
 uint64_t l2_offset = l1_table[i] & L1E_OFFSET_MASK;
 bool l2_dirty = false;
-int l2_refcount;
+int64_t l2_refcount;
 
 if (!l2_offset) {
 /* unallocated */
@@ -1696,14 +1696,17 @@ static int expand_zero_clusters_in_l1(BlockDriverState 
*bs, uint64_t *l1_table,
 }
 
 if (l2_refcount > 1) {
+int64_t ret64;
+
 /* For shared L2 tables, set the refcount accordingly (it 
is
  * already 1 and needs to be l2_refcount) */
-ret = qcow2_update_cluster_refcount(bs,
+ret64 = qcow2_update_cluster_refcount(bs,
 offset >> s->cluster_bits, l2_refcount - 1,
 QCOW2_DISCARD_OTHER);
-if (ret < 0) {
+if (ret64 < 0) {
 qcow2_free_clusters(bs, offset, s->cluster_size,
 QCOW2_DISCARD_OTHER);
+ret = ret64;
 goto fail;
 }
 }
diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 6016211..6e06531 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -91,14 +91,14 @@ static int load_refcount_block(BlockDriverState *bs,
  * return value is the refcount of the cluster, negative values are -errno
  * and indicate an error.
  */
-int qcow2_get_refcount(BlockDriverState *bs, int64_t cluster_index)
+int64_t qcow2_get_refcount(BlockDriverState *bs, int64_t cluster_index)
 {
 BDRVQcowState *s = bs->opaque;
 uint64_t refcount_table_index, block_index;
 int64_t refcount_block_offset;
 int ret;
 uint16_t *refcount_block;
-uint16_t refcount;
+int64_t refcount;
 
 refcount_table_index = cluster_index >> s->refcount_block_bits;
 if (refcount_table_index >= s->refcount_table_size)
@@ -556,9 +556,10 @@ static int QEMU_WARN_UNUSED_RESULT 
update_refcount(BlockDriverState *bs,
 for(cluster_offset = start; cluster_offset <= last;
 cluster_offset += s->cluster_size)
 {
-int block_index, refcount;
+int block_index;
 int64_t cluster_index = cluster_offset >> s->cluster_bits;
 int64_t table_index = cluster_index >> s->refcount_block_bits;
+int64_t refcount;
 
 /* Load the refcount block and allocate it if needed */
 if (table_index != old_table_index) {
@@ -634,10 +635,10 @@ fail:
  * If the return value is non-negative, it is the new refcount of the cluster.
  * If it is negative, it is -errno and indicates an error.
  */
-int qcow2_update_cluster_refcount(BlockDriverState *bs,
-  int64_t cluster_index,
-  int addend,
-  enum qcow2_discard_type type)
+int64_t qcow2_update_cluster_refcount(BlockDriverState *bs,
+  int64_t cluster_index,
+  int addend,
+  enum qcow2_discard_type type)
 {
 BDRVQcowState *s = bs->opaque;
 int ret;
@@ -663,7 +664,7 @@ static int64_t alloc_clusters_noref(BlockDriverState *bs, 
uint64_t size)
 {
 BDRVQcowState *s = bs->opaque;
 uint64_t i, nb_clusters;
-int refcount;
+int64_t refcount;
 
 nb_clusters = size_to_clusters(s, size);
 retry:
@@ -722,7 +723,8 @@ int qcow2_alloc_clusters_at(BlockDriverState *bs, uint64_t 
offset,
 BDRVQcowState *s = bs->opaque;
 uint64_t cluster_index;
 uint64_t i;
-int refcount, ret;
+int64_t refcount;
+int ret;
 
 assert(nb_clusters >= 0);
 if (nb_cluster

[Qemu-devel] [PATCH v3 01/22] qcow2: Add two new fields to BDRVQcowState

2014-11-20 Thread Max Reitz

Add two new fields regarding refcount information (the bit width of
every entry and the maximum refcount value) to the BDRVQcowState.

Signed-off-by: Max Reitz 
Reviewed-by: Eric Blake 
---
 block/qcow2-refcount.c | 2 +-
 block/qcow2.c  | 9 +
 block/qcow2.h  | 2 ++
 3 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 9afdb40..6016211 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -584,7 +584,7 @@ static int QEMU_WARN_UNUSED_RESULT 
update_refcount(BlockDriverState *bs,
 
 refcount = be16_to_cpu(refcount_block[block_index]);
 refcount += addend;
-if (refcount < 0 || refcount > 0x) {
+if (refcount < 0 || refcount > s->refcount_max) {
 ret = -EINVAL;
 goto fail;
 }
diff --git a/block/qcow2.c b/block/qcow2.c
index d120494..f57aff9 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -684,6 +684,15 @@ static int qcow2_open(BlockDriverState *bs, QDict 
*options, int flags,
 goto fail;
 }
 s->refcount_order = header.refcount_order;
+s->refcount_bits = 1 << s->refcount_order;
+if (s->refcount_order < 6) {
+s->refcount_max = (UINT64_C(1) << s->refcount_bits) - 1;
+} else {
+/* The above shift would overflow with s->refcount_bits == 64;
+ * furthermore, we do not want to use UINT64_MAX because refcounts will
+ * be passed around in int64_ts (negative values for -errno) */
+s->refcount_max = INT64_MAX;
+}
 
 if (header.crypt_method > QCOW_CRYPT_AES) {
 error_setg(errp, "Unsupported encryption method: %" PRIu32,
diff --git a/block/qcow2.h b/block/qcow2.h
index 6e39a1b..4d8c902 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -258,6 +258,8 @@ typedef struct BDRVQcowState {
 int qcow_version;
 bool use_lazy_refcounts;
 int refcount_order;
+int refcount_bits;
+uint64_t refcount_max;
 
 bool discard_passthrough[QCOW2_DISCARD_MAX];
 
-- 
1.9.3

Re: [Qemu-devel] Fwd: Re: Tunneled Migration with Non-Shared Storage

2014-11-20 Thread Gary R Hook


On 11/20/14 3:54 AM, Dr. David Alan Gilbert wrote:

* Gary R Hook (grhookatw...@gmail.com) wrote:

Ugh, I wish I could teach Thunderbird to understand how to reply to a
newsgroup.

Apologies to Paolo for the direct note.

On 11/19/14 4:19 AM, Paolo Bonzini wrote:



On 19/11/2014 10:35, Dr. David Alan Gilbert wrote:

* Paolo Bonzini (pbonz...@redhat.com) wrote:



On 18/11/2014 21:28, Dr. David Alan Gilbert wrote:

This seems odd, since as far as I know the tunneling code is quite separate
to the migration code; I thought the only thing that the migration
code sees different is the file descriptors it gets past.
(Having said that, again I don't know storage stuff, so if this
is a storage special there may be something there...)


Tunnelled migration uses the old block-migration.c code.  Non-tunnelled
migration uses the NBD server and block/mirror.c.


OK, that explains that.  Is that because the tunneling code can't
deal with tunneling the NBD server connection?


The main problem with
the old code is that uses a possibly unbounded amount of memory in
mig_save_device_dirty and can have huge jitter if any serious workload
is running in the guest.


So that's sending dirty blocks iteratively? Not that I can see
when the allocations get freed; but is the amount allocated there
related to total disk size (as Gary suggested) or to the amount
of dirty blocks?


It should be related to the maximum rate limit (which can be set to
arbitrarily high values, however).


This makes no sense. The code in block_save_iterate() specifically
attempts to control the rate of transfer. But when
qemu_file_get_rate_limit() returns a number like 922337203685372723
(0xCCB) I'm under the impression that no bandwidth
constraints are being imposed at this layer. Why, then, would that
transfer be occurring at 20MB/sec (simple, under-utilized 1 gigE
connection) with no clear bottleneck in CPU or network? What other
relation might exist?


Disk IO on the disk that you're trying to transfer?


Well, non-tunneled runs fast enough (120 MB/s) to saturate the network 
pipe, so it's evident to me that the blocks can come screaming from the 
disk plenty fast. And there's no CPU bottleneck; the VM is really not 
doing much of anything at all. So I'll say no. I shall continue my 
investigation.



The reads are started, then the ones that are ready are sent and the
blocks are freed in flush_blks.  The jitter happens when the guest reads
a lot but only writes a few blocks.  In that case, the bdrv_drain_all in
mig_save_device_dirty can be called relatively often and it can be
expensive because it also waits for all guest-initiated reads to complete.


Pardon my ignorance, but this does not match my observations. What I am
seeing is the process size of the source qemu grow steadily until the
COR completes; during this time the backing file on the destination
system does not change/grow at all, which implies that no blocks are
being transferred. (I have tested this with a 25GB VM disk, and larger;
no network activity occurs during this period.) Once the COR is done and
the in-memory copy ready (marked by a "Completed 100%" message from
blk_mig_save_builked_block()) the transfer begins. At an abysmally slow
rate, I'll add, per the above. Another problem to be investigated.


Odd thought; can you try dropping your migration bandwidth limit
(migrate_set_speed) - try something low, like 10M - does the behaviour
stay the same, or does it start transmitting disk data before it's read
the lot?


Interesting idea. I shall attempt that.


The bulk phase is similar, just with different functions (the reads are
done in mig_save_device_bulk).  With a high rate limit, the total
allocated memory can reach a few gigabytes indeed.


Much, much more than that. It's definitely dependent upon the disk file
size. Tiny VM disks are a nit; big VM disks are a problem.


Well, if as you say it's not starting transmitting for some reason until
it's read the lot then that would make sense.


Right. I'm just saying that I don't think this works the way people 
thinks it works.



Depending on the scenario, a possible disadvantage of NBD migration is
that it can only throttle each disk separately, while the old code will
apply a single limit to all migrations.


How about no throttling at all? And just to be very clear, the goal is
fast (NBD-based) migrations of VMs using non-shared storage over an
encrypted channel. Safest, worst-case scenario. Aside from gaining an
understanding of this code.


There are vague plans to add TLS support for encrypting these streams
internally to qemu; but they're just thoughts at the moment.


:-(

--
Gary R Hook
Senior Kernel Engineer
NIMBOXX, Inc

Re: [Qemu-devel] [PATCH 3/4] sdhci: Support SDHCI devices on PCI

2014-11-20 Thread Kevin O'Connor

On Tue, Nov 18, 2014 at 07:27:24AM +0100, Paolo Bonzini wrote:
> On 18/11/2014 05:26, Kevin O'Connor wrote:
> > --- a/include/hw/pci/pci.h
> > +++ b/include/hw/pci/pci.h
> > @@ -53,6 +53,7 @@
> >  /* QEMU/Bochs VGA (0x1234) */
> >  #define PCI_VENDOR_ID_QEMU   0x1234
> >  #define PCI_DEVICE_ID_QEMU_VGA   0x
> > +#define PCI_DEVICE_ID_SDHCI  0x
> 
> 0x1234 is not a registered PCI id, and it's only used for VGA for
> backwards-compatibility reasons.
> 
> Please use 1b36:0005 instead, and document it in docs/specs/pci-ids.txt,
> or use a real-world PCI vendor/device pair (if you can find one that
> Linux doesn't have quirks for; that could be hard).

Hi Paolo.  Thanks for reviewing.

I know recent Intel chips (eg, baytrail) have a builtin sdhci
controller (eg, 8086:0f16).  However, that has quirks defined in the
Linux driver.  Basic functionality still does seem to work though when
I use those ids in qemu.  The same basic functionality also seems to
work when I use 1b36:0005 as well.

Is there a preference then to use the redhat ids?  Gerd, you seem to
be in charge of the redhat pci ids - are you okay if I us one (should
it be the existing 0005 or add 0006)?

Thanks,
-Kevin

Re: [Qemu-devel] [PATCH v4 0/3] chardev: Add -qmp-pretty

2014-11-20 Thread Kevin Wolf

Am 17.11.2014 um 13:31 hat Max Reitz geschrieben:
> This series does not add new functionality. Adding a QMP monitor with
> prettily formatted JSON output can be done as follows:
> 
> $ qemu -chardev stdio,id=mon0 -mon chardev=mon0,mode=control,pretty=on
> 
> However, this is rather cumbersome, so this series (its first patch)
> adds a shortcut in the form of the new command line option -qmp-pretty.
> 
> Since the argument given to a monitor command line option (such as -qmp)
> is parsed depending on its prefix and probably also depending on the
> current phase of the moon, this is cleaner than trying to add a "switch"
> to -qmp itself (in the form of "-qmp stdio,pretty=on").
> 
> 
> Patch 3 makes uses of the new option in qemu-iotest 067 to greatly
> increase maintainability of its reference output. Patch 2 extends the
> QMP filter for qemu-iotests so it is able to filter out the QMP version
> object in pretty mode.

Thanks, applied to block-next.

Kevin

Re: [Qemu-devel] [BUGFIX][PATCH for 2.2 v3 1/1] -machine vmport=auto: Fix handling of VMWare ioport emulation for xen

2014-11-20 Thread Eric Blake

On 11/20/2014 01:44 AM, Michael S. Tsirkin wrote:
> On Wed, Nov 19, 2014 at 07:38:10PM -0500, Don Slutz wrote:
>> c/s 9b23cfb76b3a5e9eb5cc899eaf2f46bc46d33ba4
>>
>> or
>>
>> c/s b154537ad07598377ebf98252fb7d2aff127983b
>>
>> moved the testing of xen_enabled() from pc_init1() to
>> pc_machine_initfn().
>>
>> xen_enabled() does not return the correct value in
>> pc_machine_initfn().
>>
>> Changed vmport from a bool to an enum.  Added the value "auto" to do
>> the old way.
>>
>> Signed-off-by: Don Slutz 
> 
> This looks fine to me. A couple of minor comments below.
> Also this changes qapi schema file, let's get ack from maintainers -
> my understanding is that just adding a definition there won't
> affect any users, correct?

Correct; adding a definition won't break existing users, whether or not
the new type is currently used by current commands.  However, it's still
worth designing the new enum type to be potentially reusable; the idea
of naming it OnOffAuto makes sense to me.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature

1 2 3 >

1 - 100 of 238 matches

Mail list logo