date:20170428

[Qemu-devel] bytes xfer

2017-04-28 Thread ali saeedi

Hi
what does 'bytes_xfer' and 'xfer_limit' mean?
I am new in qemu.
thanks a lot

[Qemu-devel] [Bug 760976] Re: Nexenta 3.0.1 fails to install

2017-04-28 Thread Launchpad Bug Tracker

[Expired for QEMU because there has been no activity for 60 days.]

** Changed in: qemu
   Status: Incomplete => Expired

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/760976

Title:
  Nexenta 3.0.1 fails to install

Status in QEMU:
  Expired

Bug description:
  The latest git version of qemu (commit
  420b6c317de87890e06225de6e2f8af7bf714df0) fails to boot Nexenta3.0.1.
  I don't know if this is a bug in nextenta, or in QEMU or both.

  You can obtain a bootable image of Nextenta from
  http://www.nexenta.org/releases/nexenta-core-
  platform_3.0.1-b134_x86.iso.zip

  Host: Linux/x86_64 gcc4.5 ./configure --enable-linux-aio --enable-io-
  thread --enable-kvm

  qemu-img create nexenta3.0.1 3G
  qemu -hda nexenta3.0.1 -cdrom nexenta-core-platform3.0.1-b134x86.iso -boot d 
-k en-us -m 256

  Boots to grub OK, but when you hit install you get
  panic[cpu0]/thread=fec226c0: vmem_hash_delete(d4404690, d445abc0, 0):
  bad free.

  You get the same error with or without -enable-kvm

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/760976/+subscriptions

[Qemu-devel] [Bug 1654826] Re: Holding key down using input-linux freezes guest

2017-04-28 Thread D J

I am getting the same issue where I can even hear the sound glitch out.
I'm on Arch Linux.

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1654826

Title:
  Holding key down using input-linux freezes guest

Status in QEMU:
  New

Bug description:
  Qemu release version 2.8.0
  KVM, kernel 4.9.1

  When using the -object input-linux capability in qemu for passthrough
  of input/evdev devices, I found that when a key is held for a few
  seconds or more (such as ctrl key), the guest system freezes until the
  key is released. In some cases, mouse control is also lost following
  one of these "freezes". I also noticed that one of the four cpu cores
  I have the guest pinned to ramps to 100% during these freezes.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1654826/+subscriptions

Re: [Qemu-devel] [PATCH v17 00/24] qcow2: persistent dirty bitmaps

2017-04-28 Thread Vladimir Sementsov-Ogievskiy

This is fixed by not fail on ! can_write() if there are no persistent 
BdrvDirtyBitmap's. But the problem is, what if we have readonly image with 
bitmaps? This should not fail. I'll have to add bool 'changed' field to 
BdrvDirtyBitmap to handle this. Or refactor all bool's to int flags

Best regards, 
Vladimir. 

От: John Snow [js...@redhat.com]
Отправлено: 27 апреля 2017 г. 19:43
Кому: Vladimir Sementsov-Ogievskiy; qemu-bl...@nongnu.org; qemu-devel@nongnu.org
Копия: kw...@redhat.com; f...@redhat.com; arm...@redhat.com; mre...@redhat.com; 
stefa...@redhat.com; pbonz...@redhat.com; Denis Lunev
Тема: Re: [Qemu-devel] [PATCH v17 00/24] qcow2: persistent dirty bitmaps

On 04/26/2017 07:30 AM, Vladimir Sementsov-Ogievskiy wrote:
> Hi all!
>
> There is a new update of qcow2-bitmap series - v17.
>
> web: 
> https://src.openvz.org/users/vsementsov/repos/qemu/browse?at=qcow2-bitmap-v17
> git: https://src.openvz.org/scm/~vsementsov/qemu.git (tag qcow2-bitmap-v17)
>

Hm, what is this branch based on? My patches tools are having trouble
finding common ancestors -- 2d6752d38d8acda6aae674a72b72be05482a58eb I
guess, but that's pre-rc0, and the histories don't quite match up.

I did a rebase myself, but I notice that patch 14/24; "qcow2: add
persistent dirty bitmaps support" causes a large number of iotest
regressions:

Failures: 007 009 010 011 012 013 014 015 017 018 019 020 022 023 024
025 026 028 029 030 031 032 034 035 037 038 039 040 041 042 043 044 046
047 048 050 051 052 053 055 056 058 060 061 062 063 065 066 073 074 080
082 086 089 090 095 097 098 102 104 110 112 114 115 117 121 122 124 129
130 132 133 138 140 141 142 150 152 154 155 156 158 159 170 176

Can you give that a look and if you respin, rebase on top of origin/master?

Thanks,
-John

> v17:
>
> 08: add r-b's by Max and John
> 09: clear unknown autoclear features from BDRVQcow2State before calling
> qcow2_load_autoloading_dirty_bitmaps(), and also do not extra update
> header if it is updated by qcow2_load_autoloading_dirty_bitmaps().
> 11: new patch, splitted out from 16
> 12: rewrite commit message
> 14: changing bdrv_close moved to separate new patch 11
> s/1/1ULL/ ; s/%u/%llu/ for two errors
> 16: s/No/Not/
> add Max's r-b
> 24: new patch
>
>
> v16:
>
> mostly by Kevin's comments:
> 07: just moved here, as preparation for merging refcounts to 08
> 08: move "qcow2-bitmap: refcounts" staff to this patch, to not break 
> qcow2-img check
> drop r-b's
> move necessary supporting static functions to this patch too (to not 
> break compilation)
> fprintf -> error_report
> other small changes with error messages and comments
> code style
> for qcow2-bitmap.c:
>   'include "exec/log.h"' was dropped
>   s/1/(1U << 0) for BME_FLAG_IN_USE
>   add BME_TABLE_ENTRY_FLAG_ALL_ONES to replace magic 1
>   don't check 'cluster_size <= 0' in check_table_entry
> old "[PATCH v15 08/25] block: introduce auto-loading bitmaps" was dropped
> 09: was "[PATCH v15 09/25] qcow2: add .bdrv_load_autoloading_dirty_bitmaps"
> drop r-b's
> some staff was moved to 08
> update_header_sync - error on bdrv_flush fail
> rename disk_sectors_in_bitmap_cluster to sectors_covered_by_bitmap_cluster
>  and adjust comment.
>  so, variable for storing this function result: s/dsc/sbc/
> s/1/BME_TABLE_ENTRY_FLAG_ALL_ONES/
> also, as Qcow2BitmapTable already introduced in 08,
>s/table_offset/table.offset/ and s/table_size/table.size, etc..
> update_ext_header_and_dir_in_place: add comments, add additional
>   update_header_sync, to reduce indeterminacy in case of error.
> call qcow2_load_autoloading_dirty_bitmaps directly from qcow2_open
> no .bdrv_load_autoloading_dirty_bitmaps
> 11: drop bdrv_store_persistent_dirty_bitmaps at all.
> drop r-b's
> 13: rename patch, rewrite commit msg
> drop r-b's
> move bdrv_release_named_dirty_bitmaps in bdrv_close() after 
> drv->bdrv_close()
> Qcow2BitmapTable is already introduced, so no
>   s/table_offset/table.offset/ and s/table_size/table.size, etc.. here
> old 25 with check_constraints_on_bitmap() improvements merged here (Eric)
> like in 09, s/dsc/sbc/ and 
> s/disk_sectors_in_bitmap_cluster/sectors_covered_by_bitmap_cluster/
> no .bdrv_store_persistent_dirty_bitmaps
> call qcow2_store_persistent_dirty_bitmaps directly from qcow2_inactivate
> 15: corresponding part of 25 merged here. Add John's r-b, as 25 was also 
> reviewed by John.
> 16: add John's r-b
>
>
> v15:
> 13,14: add John's r-b
> 15: qcow2_can_store_new_dirty_bitmap:
>   - check max dirty bitmaps and bitmap directory overhead
>   - switch to error_prepend
> rm Max's r-b
> not add John's r-b
> 17-24: add John's r-b
> 25: changed because 15 changed,
> not add John's r-b
>
>
> v14:
>
> 07: use '|=' to update need_update_header
> add John's r-b
> add Max's r-b
> 09: remove unused

Re: [Qemu-devel] RFC: some problems with bdrv_get_block_status

2017-04-28 Thread Denis V. Lunev

On 04/28/2017 09:31 PM, Eric Blake wrote:
> On 04/28/2017 11:17 AM, Denis V. Lunev wrote:
>> Hello, All!
>>
>> Recently we have experienced problems with very slow
>> bdrv_get_block_status call, which is massive called f.e.
>> from mirror_run().
>>
>> The problem was narrowed down to slow lseek64() system
>> call, which can take 1-2-3 seconds.
> I'm guessing you meant one-to-three (the range), not one-two-three
> (three separate digits), and just had an unfortunate abbreviation of
> 'to' turning into the phonetically-similar '2'.
>
from 1 to 3, you are correct


>> root@s158 ~]# strace -f -p 77048 -T  -e lseek
>> Process 77048 attached with 14 threads
>> [pid 77053] lseek(23, 6359613440, SEEK_DATA) = 6360276992 <1.250323>
> That sounds like a bug in your choice of filesystem.  It's been
> mentioned before that lseek has some pathetically poor behavior (I think
> tmpfs was one of the culprits), but I maintain that it is better to
> hammer on the kernel folks to fix the poor behavior than it is to have
> to implement user-space workarounds in every single affected program.
>
> That said, a workaround of being able to request the avoidance of lseek
> has been brought up on this list before.
>
>
>> The problem comes from this branch of the code
>>
>> bdrv_co_get_block_status
>> ...
>> if (bs->file &&
>> (ret & BDRV_BLOCK_DATA) && !(ret & BDRV_BLOCK_ZERO) &&
>> (ret & BDRV_BLOCK_OFFSET_VALID)) {
>> int file_pnum;
>>
>> ret2 = bdrv_co_get_block_status(bs->file, ret >> BDRV_SECTOR_BITS,
>> *pnum, _pnum);
>> if (ret2 >= 0) {
>> /* Ignore errors.  This is just providing extra information, it
>>  * is useful but not necessary.
>>  */
>> if (!file_pnum) {
>> /* !file_pnum indicates an offset at or beyond the EOF;
>> it is
>>  * perfectly valid for the format block driver to point
>> to such
>>  * offsets, so catch it and mark everything as zero */
>> ret |= BDRV_BLOCK_ZERO;
>> } else {
>> /* Limit request to the range reported by the protocol
>> driver */
>> *pnum = file_pnum;
>> ret |= (ret2 & BDRV_BLOCK_ZERO);
>> }
>> }
>> }
> I don't see anything wrong with this code. It's nice to know that we
> have data (qcow2 says we have allocated bytes due to this layer, so
> don't refer to the backing files), but when the underlying file can ALSO
> tell us that the underlying protocol is sparse and we are on a hole,
> then we know that we have BDRV_BLOCK_ZERO condition which can in turn
> enable other optimizations in quite a bit of the stack.  It IS valid to
> return BDRV_BLOCK_DATA | BDRV_BLOCK_ZERO (we don't have to limit
> ourselves to BDRV_BLOCK_DATA), and we should probably do so any time
> that such computation is cheap.
>
> I agree with your analysis that on a poor filesystem where lseek()
> behavior sucks that it is no longer cheap to determine where the holes are.
>
> But the proper workaround for those filesystems should NOT be made by
> undoing this part of bdrv_co_get_block_status(), but should rather be
> fixed in file-posix.c by the addition of a user-controllable knob on
> whether to skip lseek().  In other words, if we're going to work around
> the poor filesystem performance, the workaround should live in
> file-posix.c, not in the generic block/io.c.  Once the workaround is
> enabled, the 'ret2 = bdrv_co_get_block_status()' nested call should be
> lightning fast (because file-posix.c would no longer be using lseek when
> you've requested the workaround).
>
>> Frankly speaking, this optimization should not give much.
>> If upper layer format (QCOW2) says that we have data
>> here, then nowadays in 99.9% we do have the data.
> You are correct that we have BDRV_BLOCK_DATA; but it IS useful to ALSO
> know if that data reads back as all zeros (so we can skip reading it).
> Just because qcow2 reports something as data does NOT rule out whether
> the data still reads as zeros.
>
>> Meanwhile this branch can cause problems. We would
>> need block cleared entirely to get the benefit for most
>> cases in mirror and backup operations.
>>
>> At my opinion it worth to drop this at all.
>>
>> Guys, do you have any opinion?
> Again, my opinion is to not change this part of block/io.c.  Rather,
> work should be expended on individual protocol drivers to quit wasting
> unnecessary time on reporting BDRV_BLOCK_ZERO if the time spent to
> determine that is not worth the effort (as it is when using lseek() on
> inefficient filesystems).  It is always safe for a protocol driver to
> report BDRV_BLOCK_DATA instead of BDRV_BLOCK_ZERO, if asking the
> question is too expensive (treating holes as data may pessimize other
> operations, but that's okay if the penalty for asking is worse than what
> optimizations are able to regain).
>
>> Den
>>
>>

Re: [Qemu-devel] [RFC/BUG] xen-mapcache: buggy invalidate map cache?

2017-04-28 Thread Stefano Stabellini

On Thu, 13 Apr 2017, Herongguang (Stephen) wrote:
> On 2017/4/13 7:51, Stefano Stabellini wrote:
> > On Wed, 12 Apr 2017, Herongguang (Stephen) wrote:
> > > On 2017/4/12 6:32, Stefano Stabellini wrote:
> > > > On Tue, 11 Apr 2017, hrg wrote:
> > > > > On Tue, Apr 11, 2017 at 3:50 AM, Stefano Stabellini
> > > > >  wrote:
> > > > > > On Mon, 10 Apr 2017, Stefano Stabellini wrote:
> > > > > > > On Mon, 10 Apr 2017, hrg wrote:
> > > > > > > > On Sun, Apr 9, 2017 at 11:55 PM, hrg 
> > > > > > > > wrote:
> > > > > > > > > On Sun, Apr 9, 2017 at 11:52 PM, hrg 
> > > > > > > > > wrote:
> > > > > > > > > > Hi,
> > > > > > > > > > 
> > > > > > > > > > In xen_map_cache_unlocked(), map to guest memory maybe in
> > > > > > > > > > entry->next
> > > > > > > > > > instead of first level entry (if map to rom other than guest
> > > > > > > > > > memory
> > > > > > > > > > comes first), while in xen_invalidate_map_cache(), when VM
> > > > > > > > > > ballooned
> > > > > > > > > > out memory, qemu did not invalidate cache entries in linked
> > > > > > > > > > list(entry->next), so when VM balloon back in memory, gfns
> > > > > > > > > > probably
> > > > > > > > > > mapped to different mfns, thus if guest asks device to DMA
> > > > > > > > > > to
> > > > > > > > > > these
> > > > > > > > > > GPA, qemu may DMA to stale MFNs.
> > > > > > > > > > 
> > > > > > > > > > So I think in xen_invalidate_map_cache() linked lists should
> > > > > > > > > > also be
> > > > > > > > > > checked and invalidated.
> > > > > > > > > > 
> > > > > > > > > > What’s your opinion? Is this a bug? Is my analyze correct?
> > > > > > > Yes, you are right. We need to go through the list for each
> > > > > > > element of
> > > > > > > the array in xen_invalidate_map_cache. Can you come up with a
> > > > > > > patch?
> > > > > > I spoke too soon. In the regular case there should be no locked
> > > > > > mappings
> > > > > > when xen_invalidate_map_cache is called (see the DPRINTF warning at
> > > > > > the
> > > > > > beginning of the functions). Without locked mappings, there should
> > > > > > never
> > > > > > be more than one element in each list (see xen_map_cache_unlocked:
> > > > > > entry->lock == true is a necessary condition to append a new entry
> > > > > > to
> > > > > > the list, otherwise it is just remapped).
> > > > > > 
> > > > > > Can you confirm that what you are seeing are locked mappings
> > > > > > when xen_invalidate_map_cache is called? To find out, enable the
> > > > > > DPRINTK
> > > > > > by turning it into a printf or by defininig MAPCACHE_DEBUG.
> > > > > In fact, I think the DPRINTF above is incorrect too. In
> > > > > pci_add_option_rom(), rtl8139 rom is locked mapped in
> > > > > pci_add_option_rom->memory_region_get_ram_ptr (after
> > > > > memory_region_init_ram). So actually I think we should remove the
> > > > > DPRINTF warning as it is normal.
> > > > Let me explain why the DPRINTF warning is there: emulated dma operations
> > > > can involve locked mappings. Once a dma operation completes, the related
> > > > mapping is unlocked and can be safely destroyed. But if we destroy a
> > > > locked mapping in xen_invalidate_map_cache, while a dma is still
> > > > ongoing, QEMU will crash. We cannot handle that case.
> > > > 
> > > > However, the scenario you described is different. It has nothing to do
> > > > with DMA. It looks like pci_add_option_rom calls
> > > > memory_region_get_ram_ptr to map the rtl8139 rom. The mapping is a
> > > > locked mapping and it is never unlocked or destroyed.
> > > > 
> > > > It looks like "ptr" is not used after pci_add_option_rom returns. Does
> > > > the append patch fix the problem you are seeing? For the proper fix, I
> > > > think we probably need some sort of memory_region_unmap wrapper or maybe
> > > > a call to address_space_unmap.
> > > 
> > > Yes, I think so, maybe this is the proper way to fix this.
> > 
> > Would you be up for sending a proper patch and testing it? We cannot call
> > xen_invalidate_map_cache_entry directly from pci.c though, it would need
> > to be one of the other functions like address_space_unmap for example.
> > 
> 
> 
> Yes, I will look into this.

Any updates?


> > > > diff --git a/hw/pci/pci.c b/hw/pci/pci.c
> > > > index e6b08e1..04f98b7 100644
> > > > --- a/hw/pci/pci.c
> > > > +++ b/hw/pci/pci.c
> > > > @@ -2242,6 +2242,7 @@ static void pci_add_option_rom(PCIDevice *pdev,
> > > > bool
> > > > is_default_rom,
> > > >}
> > > >  pci_register_bar(pdev, PCI_ROM_SLOT, 0, >rom);
> > > > +xen_invalidate_map_cache_entry(ptr);
> > > >}
> > > >  static void pci_del_option_rom(PCIDevice *pdev)
>

Re: [Qemu-devel] [PATCH v5 4/4] RFC: shutdown: Expose full ShutdownCause across QMP

2017-04-28 Thread Eric Blake

[adding Dan in cc]

On 04/27/2017 09:13 PM, Eric Blake wrote:
> Since all reset/shutdown requests have been associated with a
> reason, we can expose the full reason rather than a simple bool
> to the guest. Document that any future additions to the enum
> type will still use a 'host-' or 'guest-' prefix.
> 
> Signed-off-by: Eric Blake 
> 

This is RFC because I really want to know whether libvirt and upper
layer management apps will care about the various causes, or if exposing
just a boolean is okay.  We can maintain backwards compatibility if we
release with just a boolean and later decide we want to expose the enum,
but it's easier to start with a design that meets everyone's needs that
it is to have to add stuff later or wish that we hadn't exposed quite so
many enum values.


> +++ b/qapi/event.json
> @@ -10,9 +10,7 @@
>  # Emitted when the virtual machine has shut down, indicating that qemu is
>  # about to exit.
>  #
> -# @guest: If true, the shutdown was triggered by a guest request (such as
> -# a guest-initiated ACPI shutdown request or other hardware-specific action)
> -# rather than a host request (such as sending qemu a SIGINT). (since 2.10)
> +# @reason: What triggered the reset. (since 2.10)
>  #
>  # Note: If the command-line option "-no-shutdown" has been specified, qemu 
> will
>  # not exit, and a STOP event will eventually follow the SHUTDOWN event
> @@ -21,11 +19,11 @@
>  #
>  # Example:
>  #
> -# <- { "event": "SHUTDOWN", "data": { "guest": true },
> +# <- { "event": "SHUTDOWN", "data": { "reason": "guest-shutdown" },
>  #  "timestamp": { "seconds": 1267040730, "microseconds": 682951 } }
>  #
>  ##
> -{ 'event': 'SHUTDOWN', 'data': { 'guest': 'bool' } }
> +{ 'event': 'SHUTDOWN', 'data': { 'reason': 'ShutdownCause' } }
> 


-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature

[Qemu-devel] replay help [was: [PATCH v5 3/4] shutdown: Add source information to SHUTDOWN and RESET]

2017-04-28 Thread Eric Blake

[trimming cc's]

On 04/28/2017 10:01 AM, Markus Armbruster wrote:
> Eric Blake  writes:
> 
>> Libvirt would like to be able to distinguish between a SHUTDOWN
>> event triggered solely by guest request and one triggered by a
>> SIGTERM or other action on the host.  While qemu_kill_report() is
>> already able to tell whether a shutdown was triggered by a host
>> signal (but NOT by a host UI event, such as clicking the X on
>> the window), that information was then lost after being printed
>> to stderr.  The previous patch prepped things to use an enum
>> internally; now it's time to wire it up through all callers, and
>> to extend the SHUTDOWN and RESET events to report the details.
>>

>> The replay driver needs a followup patch if we want to be able to
>> faithfully replay the difference between a host- and guest-initiated
>> shutdown (for now, the replayed event is always attributed to host).
> 
> I'd prefer to get this right from the start, but that requires input
> from replay guys.
> 
> Scandalously, replay/ is not covered by MAINTAINERS.
> scripts/get_maintainers.pl blames it on Pavel Dovgalyuk (cc'ed), and git
> shows recent activity.  Pavel, please post a suitable patch to
> MAINTAINERS, and please help us figure out what to do about replaying
> reset here.

In particular, my questions include:

How sensitive is the enum ReplayEvents in replay-internal.h to
additions? Do we have to worry about cross-version compatibility (where
we can only add new events at the end), or can we reorder things at
will?  I suspect that depends on whether you ever plan to replay a
script recorded by old qemu while running new qemu.

One idea would be to carve out a range of new enums in ReplayEvents,
similar to how you have a range of CHECKPOINT_COUNT enums carved out;
the range would be large enough to encode each cause of shutdown, and
then you just map the range back into a reset request with the right
cause. Another would be to modify the replay script file format: right
now, it records a single byte for a shutdown request, but a single new
enum triggers that we now read a second byte for the cause (where the
second byte is directly the cause).  If replaying old streams matters,
we still have to burn a new enum (so that you can tell the difference
between old stream with no cause occupying one byte, vs. new stream with
second byte giving the cause); if cross-version qemu doesn't matter (ie.
upgrading qemu can invalidate all previously captured replay streams),
then this version would better be served by repurposing the existing
EVENT_SHUTDOWN enum in-place.

I'm probably up to the task of coding this, rather than waiting for a
solution from Pavel; but I would rather have some guidance on preferred
direction, particularly the understanding of how much we care about
cross-version replay (which is admittedly a tougher thing than
cross-version migration), so I don't waste time by starting coding down
a wrong direction.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH v5 2/4] shutdown: Prepare for use of an enum in reset/shutdown_request

2017-04-28 Thread Eric Blake

On 04/28/2017 09:42 AM, Markus Armbruster wrote:
> Eric Blake  writes:
> 
>> We want to track why a guest was shutdown; in particular, being able
>> to tell the difference between a guest request (such as ACPI request)
>> and host request (such as SIGINT) will prove useful to libvirt.
>> Since all requests eventually end up changing shutdown_requested in
>> vl.c, the logical change is to make that value track the reason,
>> rather than its current 0/1 contents.
>>
>> Since command-line options control whether a reset request is turned
>> into a shutdown request instead, the same treatment is given to
>> reset_requested.
>>
>> This patch adds a QAPI enum ShutdownCause that describes reasons
>> that a shutdown can be requested, and changes qemu_system_reset() to
>> pass the reason through, although for now it is not reported.  The
>> next patch will actually wire things up to modify events to report
>> data based on the reason, and to pass the correct enum value in from
>> various call-sites that can trigger a reset/shutdown.  Since QAPI
>> generates enums starting at 0, it's easier if we use a different
>> number as our sentinel that no request has happened yet.  Most of
>> the changes are in vl.c, but xen was using things externally.
>>

>> -static int reset_requested;
>> -static int shutdown_requested, shutdown_signal;
>> +static int reset_requested = -1;
>> +static int shutdown_requested = -1, shutdown_signal;
> 
> Peeking ahead, I see that shutdown_requested and reset_requested take
> ShutdownCause values and -1.  The latter means "no shutdown requested".
> What about adding 'none' to ShutdownCause, with value 0, und use that
> instead of literal -1?  Would avoid the unusual "negative means false,
> non-negative means true".

Works nicely if the enum is internal-use only.  Gets a bit more awkward
if the enum is exposed to the end-user.

The fact that I let QAPI generate the enum in patch 3 is evidence that
I'm leaning towards exposing it to the end user (patch 4); if we want to
keep it internal-only, a better place for the enum might be in sysemu.h
(where we also have the weird '#define VMRESET_SILENT false' '#define
VMRESET_REPORT true' to name a boolean parameter).

> 
> PATCH 4 exposes ShutdownCause in event SHUTDOWN, and 'none' must not
> occur there.  However, if we ever add a query-shutdown to go with this
> event, we will need 'none' there.

So, query-shutdown would basically be: what is the last-reported
shutdown event (normally none, when the guest is still running; but if,
like libvirt, you start qemu -no-shutdown, it can then be the cause of
why we are in a shutdown/stopped state while waiting for final cleanup)?

How important/likely is such an event?  (Hmm, from libvirt's
perspective, events are usually reliable, but can be lost; if we can
restart libvirtd and reconnect to a qemu process that is hanging on to
life only because no one has cleaned it up yet, query-shutdown does seem
like a useful thing for libvirt to have at the time it reconnects to
that qemu process).

We could always include 'none' in the QAPI enum, then document in
'SHUTDOWN' and 'RESET' events that the cause will never be 'none'.  Doc
hacks like that feel a little unclean, but not so horrible as to be
unforgivable.

> 
> I'd be tempted to reshuffle declarations here, because shutdown_signal's
> int is a different one than reset_requested's and shutdown_requested,
> and the latter two's "negative means false, non-negative means true" is
> unusual enough to justify a comment.
...
> 
> Hmm.  In case we stick to literal -1: consider splitting this patch into
> a part that changes @shutdown_requested from zero/non-zero to
> negative/non-negative, and a part that uses ShutdownCause for the
> non-negative values.

You're definitely right that if the enum doesn't have a nice none=0
state, then reshuffling to the magic -1 as no request is awkward enough
to be done alone.

But part of the answer is also dependent on whether we want PATCH 4 or
not (or, as you brought up, the possibility of a query-shutdown command
with even more persistent storage of the last-reported event).

>> @@ -1650,11 +1650,11 @@ static void qemu_kill_report(void)
>>  static int qemu_reset_requested(void)
>>  {
>>  int r = reset_requested;
>> -if (r && replay_checkpoint(CHECKPOINT_RESET_REQUESTED)) {
>> -reset_requested = 0;
>> +if (r >= 0 && replay_checkpoint(CHECKPOINT_RESET_REQUESTED)) {
>> +reset_requested = -1;
>>  return r;
>>  }
>> -return false;
>> +return -1;
> 
> "return false" in a function returning int smells, good riddance.
> 

In one of my earlier drafts of the patch, I even tried to change the
return type from int to bool, but decided that keeping it as int worked
best (if I have to use the -1/cause dichotomy).  But you're also right
that with a 'none' value in the enum, I could directly return ShutdownCause.

>>  }
>>
>>  static int qemu_suspend_requested(void)
>> @@

Re: [Qemu-devel] [PULL 14/34] iotests: Launch qemu-nbd with -e 42

2017-04-28 Thread Eric Blake

On 04/28/2017 03:33 PM, Kevin Wolf wrote:
> From: Max Reitz 
> 
> There is no reason for the qemu-nbd server used for tests not to accept
> an arbitrary number of clients. In fact, test 181 will require it to
> accept two clients at the same time (and thus it fails before this
> patch).

Quick question:

Is this mention of test 181 the same test that occurs earlier in the
pull request as 8/34?  If so, should the series have been reordered for
bisectability?  If not, should the commit message be touched up?

At any rate, bisectability of qemu-iotests is less of a concern than
bisectability of 'make check' success, so I don't think your answer
invalidates the pull request.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH v10 09/17] qcow2: Optimize write zero of unaligned tail cluster

2017-04-28 Thread Eric Blake

On 04/28/2017 03:48 PM, Max Reitz wrote:
> On 27.04.2017 03:46, Eric Blake wrote:
>> We've already improved discards to operate efficiently on the tail
>> of an unaligned qcow2 image; it's time to make a similar improvement
>> to write zeroes.  The special case is only valid at the tail
>> cluster of a file, where we must recognize that any sectors beyond
>> the image end would implicitly read as zero, and therefore should
>> not penalize our logic for widening a partial cluster into writing
>> the whole cluster as zero.
>>
>> Update test 154 to cover the new scenarios, using two images of
>> intentionally differing length.
>>
>> While at it, fix the test to gracefully skip when run as
>> ./check -qcow2 -o compat=0.10 154
>> since the older format lacks zero clusters already required earlier
>> in the test.
>>
>> Signed-off-by: Eric Blake 
>>

>>
>> +echo
>> +echo == unaligned image tail cluster, no allocation needed ==
>> +
>> +CLUSTER_SIZE=1024 TEST_IMG="$TEST_IMG.base" _make_test_img $((size + 1024))
> 
> Any reason for the CLUSTER_SIZE? It passes with 64 kB as well, and I
> actually think that would be a better test because as it is, the image's
> "unaligned" tail is exactly one cluster (so it isn't really unaligned).

Uhhh - copy-and-paste?  You're right, that 1024 is too small for what
I'm actually doing with it.  :(

> 
>> +_make_test_img -b "$TEST_IMG.base" $((size + 2048))
>> +
>> +# With no backing file, write to all or part of unallocated partial cluster
>> +
>> +# Backing file: 128m: ... | 00 --
> 
> Not sure what the "00 --" is supposed to mean. The cluster size is 1 kB,
> so this is just a single cluster; which will be converted from
> unallocated to a zero cluster because the tail reads as zero.
> 
>> +$QEMU_IO -c "write -z $size 512" "$TEST_IMG.base" | _filter_qemu_io

I'm doing a write of 512 bytes, so I was trying to show that in the
cluster starting at 128m, we have a write request for one sector to be
written to 00, while the rest of the sector is left uninitialized. The
obvious intent is that we note that the uninitialized portion reads as
zero, so we can optimize the entire cluster (even though it is a partial
cluster) to be a cluster write-zero instead of a sector-based allocating
write.  But since you've pointed out that my setup is flawed, I'll have
to double-check that I'm actually testing what I thought I was (I know I
hit the right breakpoints in gdb, but its the iotest expected output
that needs to show the difference between pre- and post-patched qemu
with regards to the partial-tail-cluster optimizations).

>> +
>> +# Backing file: 128m: ... | -- 00
> 
> Same here, but now it's the head that reads as zero (and it's already a
> zero cluster so nothing changes here).
> 
>> +$QEMU_IO -c "write -z $size 512" "$TEST_IMG.base" | _filter_qemu_io
> 
> But I suppose you mean $((size + 512))?

Umm. Yeah.  (Hey - you're not the only one that writes patches late at
night).

> 
>> +
>> +# Backing file: 128m: ... | 00 00
> 
> And finally this doesn't change anything either -- but then again this
> is a fully aligned request, so I don't quite see the point in testing
> this here.

At one point during my development, I had a bug where a partial write
request at the head of the cluster behaved differently from a partial
write request at the end (one allocated while the other did not).

Maybe it's best if I do a single write, then inspect the map, then reset
the image, rather than doing all three writes in a row and proving that
the net result of the three writes had no allocation.  In v9, when I was
trying to use unallocated clusters as a shortcut on files with no
backing image, this actually worked (after all three writes, the cluster
would still be unallocated); but since Kevin convinced me that v10 has
to set the zero flag on all three writes, I'm back to the point of
needing to clear the zero flag between writes.

Okay, I'll definitely have to fix it.

>> +# With backing file that reads as zero, write to all or part of entire 
>> cluster
>> +
>> +# Backing file: 128m: ... | 00 00
>> +# Active layer: 128m: ... | 00 00 00 00
>> +$QEMU_IO -c "write -z $size 2048" "$TEST_IMG" | _filter_qemu_io
>> +$QEMU_IMG map --output=json "$TEST_IMG" | _filter_qemu_img_map
>> +
>> +# Backing file: 128m: ... | 00 00
>> +# Active layer: 128m: ... | 00 00 -- --
> 
> How so? Shouldn't this just stay a zero cluster because the rest of the
> cluster already reads as zero?

Same problem as above.  In v9, I was trying to exploit code that left a
cluster unallocated; but that's not viable, so I'll have to reset the
image between steps.

> 
> Also, I'm not quite sure what you are testing by just issuing three
> consecutive write requests without checking the result each time. You
> assume it to be something which you wrote in the comments above each,
> but you can't be sure (yes, if something goes wrong in between, the
> following final map should catch it, but it's not for certain).

Well, it

Re: [Qemu-devel] [PATCH v5 + 2/2] target/aarch64: optimize indirect branches

2017-04-28 Thread Emilio G. Cota

On Fri, Apr 28, 2017 at 15:17:25 -0400, Emilio G. Cota wrote:
> Measurements:
(snip)
> -SPECint06 (test set), x86_64-linux-user. Host: 
> Intel i7-4790K @ 4.00GHz
(snip)
> -   SPECint06 (train set), x86_64-linux-user. Host: 
> Intel i7-4790K @ 4.00GHz
s/x86_64/aarch64/ , obviously.

E.

Re: [Qemu-devel] [PATCH v3] qmp-shell: add persistent command history

2017-04-28 Thread Kashyap Chamarthy

On Thu, Apr 27, 2017 at 06:36:28PM -0400, John Snow wrote:
> Use the existing readline history function we are utilizing
> to provide persistent command history across instances of qmp-shell.
> 
> This assists entering debug commands across sessions that may be
> interrupted by QEMU sessions terminating, where the qmp-shell has
> to be relaunched.
> 
> Signed-off-by: John Snow 
> ---
> 
> v3: Hardcoded limit of 1024 lines of history for now,
> as setting limits in ~/.inputrc does not appear to work (Nir)

Yeah, as-is, this is strict improvement, so it's alright for now.

> v2: Adjusted the errors to whine about non-ENOENT errors, but still
> intercept all errors as non-fatal.
> Save history atexit() to match bash standard behavior
>
> 
>  scripts/qmp/qmp-shell | 21 +
>  1 file changed, 21 insertions(+)

Reviewed-by: Kashyap Chamarthy 
Tested-by: Kashyap Chamarthy 
 
Can confim the, expected, 'atexit' behavior -- only upon termination
does the history file get saved:

$ cat ~/.qmp-shell_history 
query-block
query-named-block-nodes
query-kvm
quit

Tested with:

  $ git describe
  v3.9.0-304-gca7305b

Most of my test procedure is noted in my response to the earlier
revisions:

https://lists.nongnu.org/archive/html/qemu-devel/2017-03/msg00356.html

> diff --git a/scripts/qmp/qmp-shell b/scripts/qmp/qmp-shell
> index eccb88a..6ece6e7 100755
> --- a/scripts/qmp/qmp-shell
> +++ b/scripts/qmp/qmp-shell
> @@ -70,6 +70,9 @@ import json
>  import ast
>  import readline
>  import sys
> +import os
> +import errno
> +import atexit
>  
>  class QMPCompleter(list):
>  def complete(self, text, state):
> @@ -109,6 +112,8 @@ class QMPShell(qmp.QEMUMonitorProtocol):
>  self._pretty = pretty
>  self._transmode = False
>  self._actions = list()
> +self._histfile = os.path.join(os.path.expanduser('~'),
> +  '.qmp-shell_history')

Thanks for changing the name :-)  Now the raw QMP history file
(~/.qmp_history) and 'qmp-shell' history file (~/qmp-shell_history)
won't collide.

>  def __get_address(self, arg):
>  """
> @@ -132,11 +137,27 @@ class QMPShell(qmp.QEMUMonitorProtocol):
>  def __completer_setup(self):
>  self._completer = QMPCompleter()
>  self._fill_completion()
> +readline.set_history_length(1024)
>  readline.set_completer(self._completer.complete)
>  readline.parse_and_bind("tab: complete")
>  # XXX: default delimiters conflict with some command names (eg. 
> query-),
>  # clearing everything as it doesn't seem to matter
>  readline.set_completer_delims('')
> +try:
> +readline.read_history_file(self._histfile)
> +except Exception as e:
> +if isinstance(e, IOError) and e.errno == errno.ENOENT:
> +# File not found. No problem.
> +pass
> +else:
> +print "Failed to read history '%s'; %s" % (self._histfile, e)
> +atexit.register(self.__save_history)
> +
> +def __save_history(self):
> +try:
> +readline.write_history_file(self._histfile)
> +except Exception as e:
> +print "Failed to save history file '%s'; %s" % (self._histfile, 
> e)
>  
>  def __parse_value(self, val):
>  try:
> -- 
> 2.9.3
> 

-- 
/kashyap

Re: [Qemu-devel] RFC: some problems with bdrv_get_block_status

2017-04-28 Thread Denis V. Lunev

On 04/28/2017 09:31 PM, Eric Blake wrote:
> On 04/28/2017 11:17 AM, Denis V. Lunev wrote:
>> Hello, All!
>>
>> Recently we have experienced problems with very slow
>> bdrv_get_block_status call, which is massive called f.e.
>> from mirror_run().
>>
>> The problem was narrowed down to slow lseek64() system
>> call, which can take 1-2-3 seconds.
> I'm guessing you meant one-to-three (the range), not one-two-three
> (three separate digits), and just had an unfortunate abbreviation of
> 'to' turning into the phonetically-similar '2'.
First of all, Eric, thank you very much for the prompt answer.

yes. I have meant 1..3 seconds. Thank you for clarifying that.

>> root@s158 ~]# strace -f -p 77048 -T  -e lseek
>> Process 77048 attached with 14 threads
>> [pid 77053] lseek(23, 6359613440, SEEK_DATA) = 6360276992 <1.250323>
> That sounds like a bug in your choice of filesystem.  It's been
> mentioned before that lseek has some pathetically poor behavior (I think
> tmpfs was one of the culprits), but I maintain that it is better to
> hammer on the kernel folks to fix the poor behavior than it is to have
> to implement user-space workarounds in every single affected program.
>
> That said, a workaround of being able to request the avoidance of lseek
> has been brought up on this list before.
I have found that letter (Feb 2017). The problem there was with ZFS.
I have problems with EXT4. The problem comes not immediately
but after some time under load.

This call will also cost a lot for the case of NFSv4, which supports
SEEK_DATA and SEEK_HOLE, as it will require additional
network exchange. Thus this is not cheap for at least some cases
and thus it is dangerous to follow current way. May be other
filesystems are also affected.

>
>> The problem comes from this branch of the code
>>
>> bdrv_co_get_block_status
>> ...
>> if (bs->file &&
>> (ret & BDRV_BLOCK_DATA) && !(ret & BDRV_BLOCK_ZERO) &&
>> (ret & BDRV_BLOCK_OFFSET_VALID)) {
>> int file_pnum;
>>
>> ret2 = bdrv_co_get_block_status(bs->file, ret >> BDRV_SECTOR_BITS,
>> *pnum, _pnum);
>> if (ret2 >= 0) {
>> /* Ignore errors.  This is just providing extra information, it
>>  * is useful but not necessary.
>>  */
>> if (!file_pnum) {
>> /* !file_pnum indicates an offset at or beyond the EOF;
>> it is
>>  * perfectly valid for the format block driver to point
>> to such
>>  * offsets, so catch it and mark everything as zero */
>> ret |= BDRV_BLOCK_ZERO;
>> } else {
>> /* Limit request to the range reported by the protocol
>> driver */
>> *pnum = file_pnum;
>> ret |= (ret2 & BDRV_BLOCK_ZERO);
>> }
>> }
>> }
> I don't see anything wrong with this code. It's nice to know that we
> have data (qcow2 says we have allocated bytes due to this layer, so
> don't refer to the backing files), but when the underlying file can ALSO
> tell us that the underlying protocol is sparse and we are on a hole,
> then we know that we have BDRV_BLOCK_ZERO condition which can in turn
> enable other optimizations in quite a bit of the stack.  It IS valid to
> return BDRV_BLOCK_DATA | BDRV_BLOCK_ZERO (we don't have to limit
> ourselves to BDRV_BLOCK_DATA), and we should probably do so any time
> that such computation is cheap.
>
> I agree with your analysis that on a poor filesystem where lseek()
> behavior sucks that it is no longer cheap to determine where the holes are.
>
> But the proper workaround for those filesystems should NOT be made by
> undoing this part of bdrv_co_get_block_status(), but should rather be
> fixed in file-posix.c by the addition of a user-controllable knob on
> whether to skip lseek().  In other words, if we're going to work around
> the poor filesystem performance, the workaround should live in
> file-posix.c, not in the generic block/io.c.  Once the workaround is
> enabled, the 'ret2 = bdrv_co_get_block_status()' nested call should be
> lightning fast (because file-posix.c would no longer be using lseek when
> you've requested the workaround).
I understand that it is very good to know, but what benefit we will have
really:
- backup will skip cluster only if the cluster is all zeroes
- mirror skips the cluster only if it will be full zeroes
- the same applies for 'qemu-img convert'

I think we should somehow measure the benefit against the loss.
We have at least 2 (or probably 3) different filesystems which do
not work well with that.

I have seen the question to try disable 'detect-zeroes' and
this could look like a worthy option to put original behavior
under, i.e. we could perform lseek(SEEK_DATA) only if
'detect-zeroes' option is on, which well match the meaning
of that option and will save the flag namespace a bit.

Other option would be to split this API to one rough call which

[Qemu-devel] [PATCH] target/openrisc: Support non-busy idle state using PMR SPR

2017-04-28 Thread Stafford Horne

The OpenRISC architecture has the Power Management Register (PMR)
special purpose register to manage cpu power states.  The interesting
modes are:

 * Doze Mode (DME) - Stop cpu except timer & pic - wake on interrupt
 * Sleep Mode (SME) - Stop cpu and all units - wake on interrupt
 * Suspend Model (SUME) - Stop cpu and all units - wake on reset

The linux kernel will set DME when idle.

This patch implements the PMR SPR and halts the qemu cpu when there is a
change to DME or SME.  This means that openrisc qemu in no longer peggs
a host cpu at 100%.

In order for this to work we need to kick the CPU when timers are
expired.  Update the cpu timer to kick the cpu upon each timer event.

Signed-off-by: Stafford Horne 
---

Changes since last RFC - none

 hw/openrisc/cputimer.c   |  1 +
 target/openrisc/cpu.c|  3 ++-
 target/openrisc/cpu.h| 10 ++
 target/openrisc/interrupt.c  |  2 ++
 target/openrisc/machine.c|  1 +
 target/openrisc/sys_helper.c | 13 +
 6 files changed, 29 insertions(+), 1 deletion(-)

diff --git a/hw/openrisc/cputimer.c b/hw/openrisc/cputimer.c
index a98c799..febc469 100644
--- a/hw/openrisc/cputimer.c
+++ b/hw/openrisc/cputimer.c
@@ -61,6 +61,7 @@ void cpu_openrisc_timer_update(OpenRISCCPU *cpu)
 }
 next = now + (uint64_t)wait * TIMER_PERIOD;
 timer_mod(cpu->env.timer, next);
+qemu_cpu_kick(CPU(cpu));
 }
 
 void cpu_openrisc_count_start(OpenRISCCPU *cpu)
diff --git a/target/openrisc/cpu.c b/target/openrisc/cpu.c
index c9b3f22..1d6330c 100644
--- a/target/openrisc/cpu.c
+++ b/target/openrisc/cpu.c
@@ -51,7 +51,8 @@ static void openrisc_cpu_reset(CPUState *s)
 cpu->env.lock_addr = -1;
 s->exception_index = -1;
 
-cpu->env.upr = UPR_UP | UPR_DMP | UPR_IMP | UPR_PICP | UPR_TTP;
+cpu->env.upr = UPR_UP | UPR_DMP | UPR_IMP | UPR_PICP | UPR_TTP |
+   UPR_PMP;
 cpu->env.dmmucfgr = (DMMUCFGR_NTW & (0 << 2)) | (DMMUCFGR_NTS & (6 << 2));
 cpu->env.immucfgr = (IMMUCFGR_NTW & (0 << 2)) | (IMMUCFGR_NTS & (6 << 2));
 
diff --git a/target/openrisc/cpu.h b/target/openrisc/cpu.h
index 938ccc3..2721432 100644
--- a/target/openrisc/cpu.h
+++ b/target/openrisc/cpu.h
@@ -140,6 +140,15 @@ enum {
 IMMUCFGR_HTR = (1 << 11),
 };
 
+/* Power management register */
+enum {
+PMR_SDF = (15 << 0),
+PMR_DME = (1 << 4),
+PMR_SME = (1 << 5),
+PMR_DCGE = (1 << 6),
+PMR_SUME = (1 << 7),
+};
+
 /* Float point control status register */
 enum {
 FPCSR_FPEE = 1,
@@ -284,6 +293,7 @@ typedef struct CPUOpenRISCState {
 uint32_t immucfgr;/* IMMU configure register */
 uint32_t esr; /* Exception supervisor register */
 uint32_t evbar;   /* Exception vector base address register */
+uint32_t pmr; /* Power Management Register */
 uint32_t fpcsr;   /* Float register */
 float_status fp_status;
 
diff --git a/target/openrisc/interrupt.c b/target/openrisc/interrupt.c
index 2c91fab..3959671 100644
--- a/target/openrisc/interrupt.c
+++ b/target/openrisc/interrupt.c
@@ -60,6 +60,8 @@ void openrisc_cpu_do_interrupt(CPUState *cs)
 env->sr |= SR_SM;
 env->sr &= ~SR_IEE;
 env->sr &= ~SR_TEE;
+env->pmr &= ~PMR_DME;
+env->pmr &= ~PMR_SME;
 env->tlb->cpu_openrisc_map_address_data = _openrisc_get_phys_nommu;
 env->tlb->cpu_openrisc_map_address_code = _openrisc_get_phys_nommu;
 env->lock_addr = -1;
diff --git a/target/openrisc/machine.c b/target/openrisc/machine.c
index a82be62..a20cce7 100644
--- a/target/openrisc/machine.c
+++ b/target/openrisc/machine.c
@@ -138,6 +138,7 @@ static const VMStateDescription vmstate_env = {
 VMSTATE_UINT32(dmmucfgr, CPUOpenRISCState),
 VMSTATE_UINT32(immucfgr, CPUOpenRISCState),
 VMSTATE_UINT32(evbar, CPUOpenRISCState),
+VMSTATE_UINT32(pmr, CPUOpenRISCState),
 VMSTATE_UINT32(esr, CPUOpenRISCState),
 VMSTATE_UINT32(fpcsr, CPUOpenRISCState),
 VMSTATE_UINT64(mac, CPUOpenRISCState),
diff --git a/target/openrisc/sys_helper.c b/target/openrisc/sys_helper.c
index fa3d6a4..44acf0d 100644
--- a/target/openrisc/sys_helper.c
+++ b/target/openrisc/sys_helper.c
@@ -22,6 +22,7 @@
 #include "cpu.h"
 #include "exec/exec-all.h"
 #include "exec/helper-proto.h"
+#include "exception.h"
 
 #define TO_SPR(group, number) (((group) << 11) + (number))
 
@@ -141,6 +142,15 @@ void HELPER(mtspr)(CPUOpenRISCState *env,
 case TO_SPR(5, 2):  /* MACHI */
 env->mac = deposit64(env->mac, 32, 32, rb);
 break;
+case TO_SPR(8, 0):  /* PMR */
+env->pmr = rb;
+if (env->pmr & PMR_DME || env->pmr & PMR_SME) {
+cpu_restore_state(cs, GETPC());
+env->pc += 4;
+cs->halted = 1;
+raise_exception(cpu, EXCP_HLT);
+}
+break;
 case TO_SPR(9, 0):  /* PICMR */
 env->picmr |= rb;
 break;
@@ -287,6 +297,9 @@ target_ulong HELPER(mfspr)(CPUOpenRISCState

Re: [Qemu-devel] RFC: some problems with bdrv_get_block_status

2017-04-28 Thread Denis V. Lunev

On 04/28/2017 10:35 PM, Denis V. Lunev wrote:
> On 04/28/2017 09:31 PM, Eric Blake wrote:
>> On 04/28/2017 11:17 AM, Denis V. Lunev wrote:
>>> Hello, All!
>>>
>>> Recently we have experienced problems with very slow
>>> bdrv_get_block_status call, which is massive called f.e.
>>> from mirror_run().
>>>
>>> The problem was narrowed down to slow lseek64() system
>>> call, which can take 1-2-3 seconds.
>> I'm guessing you meant one-to-three (the range), not one-two-three
>> (three separate digits), and just had an unfortunate abbreviation of
>> 'to' turning into the phonetically-similar '2'.
>>
> from 1 to 3, you are correct
>
>
>>> root@s158 ~]# strace -f -p 77048 -T  -e lseek
>>> Process 77048 attached with 14 threads
>>> [pid 77053] lseek(23, 6359613440, SEEK_DATA) = 6360276992 <1.250323>
>> That sounds like a bug in your choice of filesystem.  It's been
>> mentioned before that lseek has some pathetically poor behavior (I think
>> tmpfs was one of the culprits), but I maintain that it is better to
>> hammer on the kernel folks to fix the poor behavior than it is to have
>> to implement user-space workarounds in every single affected program.
>>
>> That said, a workaround of being able to request the avoidance of lseek
>> has been brought up on this list before.
>>
>>
>>> The problem comes from this branch of the code
>>>
>>> bdrv_co_get_block_status
>>> ...
>>> if (bs->file &&
>>> (ret & BDRV_BLOCK_DATA) && !(ret & BDRV_BLOCK_ZERO) &&
>>> (ret & BDRV_BLOCK_OFFSET_VALID)) {
>>> int file_pnum;
>>>
>>> ret2 = bdrv_co_get_block_status(bs->file, ret >> BDRV_SECTOR_BITS,
>>> *pnum, _pnum);
>>> if (ret2 >= 0) {
>>> /* Ignore errors.  This is just providing extra information, it
>>>  * is useful but not necessary.
>>>  */
>>> if (!file_pnum) {
>>> /* !file_pnum indicates an offset at or beyond the EOF;
>>> it is
>>>  * perfectly valid for the format block driver to point
>>> to such
>>>  * offsets, so catch it and mark everything as zero */
>>> ret |= BDRV_BLOCK_ZERO;
>>> } else {
>>> /* Limit request to the range reported by the protocol
>>> driver */
>>> *pnum = file_pnum;
>>> ret |= (ret2 & BDRV_BLOCK_ZERO);
>>> }
>>> }
>>> }
>> I don't see anything wrong with this code. It's nice to know that we
>> have data (qcow2 says we have allocated bytes due to this layer, so
>> don't refer to the backing files), but when the underlying file can ALSO
>> tell us that the underlying protocol is sparse and we are on a hole,
>> then we know that we have BDRV_BLOCK_ZERO condition which can in turn
>> enable other optimizations in quite a bit of the stack.  It IS valid to
>> return BDRV_BLOCK_DATA | BDRV_BLOCK_ZERO (we don't have to limit
>> ourselves to BDRV_BLOCK_DATA), and we should probably do so any time
>> that such computation is cheap.
>>
>> I agree with your analysis that on a poor filesystem where lseek()
>> behavior sucks that it is no longer cheap to determine where the holes are.
>>
>> But the proper workaround for those filesystems should NOT be made by
>> undoing this part of bdrv_co_get_block_status(), but should rather be
>> fixed in file-posix.c by the addition of a user-controllable knob on
>> whether to skip lseek().  In other words, if we're going to work around
>> the poor filesystem performance, the workaround should live in
>> file-posix.c, not in the generic block/io.c.  Once the workaround is
>> enabled, the 'ret2 = bdrv_co_get_block_status()' nested call should be
>> lightning fast (because file-posix.c would no longer be using lseek when
>> you've requested the workaround).
>>
>>> Frankly speaking, this optimization should not give much.
>>> If upper layer format (QCOW2) says that we have data
>>> here, then nowadays in 99.9% we do have the data.
>> You are correct that we have BDRV_BLOCK_DATA; but it IS useful to ALSO
>> know if that data reads back as all zeros (so we can skip reading it).
>> Just because qcow2 reports something as data does NOT rule out whether
>> the data still reads as zeros.
>>
>>> Meanwhile this branch can cause problems. We would
>>> need block cleared entirely to get the benefit for most
>>> cases in mirror and backup operations.
>>>
>>> At my opinion it worth to drop this at all.
>>>
>>> Guys, do you have any opinion?
>> Again, my opinion is to not change this part of block/io.c.  Rather,
>> work should be expended on individual protocol drivers to quit wasting
>> unnecessary time on reporting BDRV_BLOCK_ZERO if the time spent to
>> determine that is not worth the effort (as it is when using lseek() on
>> inefficient filesystems).  It is always safe for a protocol driver to
>> report BDRV_BLOCK_DATA instead of BDRV_BLOCK_ZERO, if asking the
>> question is too expensive (treating holes as

Re: [Qemu-devel] [PATCH v10 07/17] qemu-io: Switch 'alloc' command to byte-based length

2017-04-28 Thread Max Reitz

On 28.04.2017 22:36, Eric Blake wrote:
> On 04/28/2017 03:09 PM, Max Reitz wrote:
>> On 28.04.2017 21:59, Eric Blake wrote:
>>> On 04/28/2017 02:46 PM, Max Reitz wrote:
 On 27.04.2017 03:46, Eric Blake wrote:
> For the 'alloc' command, accepting an offset in bytes but a length
> in sectors, and reporting output in sectors, is confusing.  Do
> everything in bytes, and adjust the expected output accordingly.
>
> Signed-off-by: Eric Blake 
> Reviewed-by: Philippe Mathieu-Daudé 
>
>>>
>  }
> +if (!QEMU_IS_ALIGNED(bytes, BDRV_SECTOR_SIZE)) {
> +printf("bytes %" PRId64 " is not sector aligned\n",

 This isn't real English. :-)
>>>
>>> But, it's just copy-and-paste from the other instances you just reviewed
>>> in 6/17!  [Translation - if I change this one, I also get to redo that one]
>>
>> No, those are "offset" and "count" -- all singular. "bytes" is plural. ;-)
> 
> Then an obvious solution: s/bytes/count/ in the parameter name :)
> 
> But I still get to redo those, to add the '-' in 'sector-aligned'.

Oh, right! Didn't even notice. Well, in real languages stuff like that
would have to be joined into a single word anyway.

>>> Which of these various alternatives (if any) looks better:
>>>
>>> bytes=511 is not sector-aligned
>>> 511 is not a sector-aligned value for 'bytes'
>>> requested 'bytes' of 511 is not sector-aligned
>>> alignment error: 511 bytes is not sector-aligned
>>> 'bytes' must be sector-aligned: 511
>>> your clever entry here...
>>
>> How about "byte count" instead of "bytes" or "bytes value", if really
>> want to have the exact spelling in there?
>>
>> For your entries above: (1) and (2) work for me (I like (2) a bit
>> better), (3) doesn't sound like real English either, and it should be
>> s/is/are/ in (4) (but it still sounds off with that change). (5) I
>> mostly dislike because I dislike error message of the form "This should
>> be X: $foo", I like "$foo is not X" better.
> 
> Maybe this variation of (3) solves the singular/plural disconnect:
> 
> request of 511 for 'bytes' is not sector-aligned
> 
> which makes it obvious that the "request of 511" (singular) and not the
> parameter name (whether singular 'count' or plural 'bytes') is the
> subject.  But it's a bit wordier than (2).  So it looks like (2) may be
> a winner in all the situations.  But I also think you convinced me to
> rename the command parameter; in my next spin, the help text will read:
> 
> alloc offset [count] -- checks if offset is allocated in the file
> 
> which starts to be non-trivial enough to drop R-b that you were willing
> to give for just an error message wording change.

Well, reviewing the change will be simple enough, so it doesn't really
matter to me. :-)

(I'm still a bit upset why you think that the average qemu-io user
cannot make the connection between "byte count" and a parameter named
"bytes". Because I am the average qemu-io user. Huff!)

Max



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH v10 09/17] qcow2: Optimize write zero of unaligned tail cluster

2017-04-28 Thread Max Reitz

On 27.04.2017 03:46, Eric Blake wrote:
> We've already improved discards to operate efficiently on the tail
> of an unaligned qcow2 image; it's time to make a similar improvement
> to write zeroes.  The special case is only valid at the tail
> cluster of a file, where we must recognize that any sectors beyond
> the image end would implicitly read as zero, and therefore should
> not penalize our logic for widening a partial cluster into writing
> the whole cluster as zero.
> 
> Update test 154 to cover the new scenarios, using two images of
> intentionally differing length.
> 
> While at it, fix the test to gracefully skip when run as
> ./check -qcow2 -o compat=0.10 154
> since the older format lacks zero clusters already required earlier
> in the test.
> 
> Signed-off-by: Eric Blake 
> 
> ---
> v10: rebase to better reporting of preallocated zero clusters
> v9: new patch
> ---
>  block/qcow2.c  |   7 +++
>  tests/qemu-iotests/154 | 112 
> -
>  tests/qemu-iotests/154.out |  83 +
>  3 files changed, 200 insertions(+), 2 deletions(-)
> 
> diff --git a/block/qcow2.c b/block/qcow2.c
> index 5c1573c..8038793 100644
> --- a/block/qcow2.c
> +++ b/block/qcow2.c
> @@ -2449,6 +2449,10 @@ static bool is_zero_sectors(BlockDriverState *bs, 
> int64_t start,
>  BlockDriverState *file;
>  int64_t res;
> 
> +if (start + count > bs->total_sectors) {
> +count = bs->total_sectors - start;
> +}
> +
>  if (!count) {
>  return true;
>  }
> @@ -2467,6 +2471,9 @@ static coroutine_fn int 
> qcow2_co_pwrite_zeroes(BlockDriverState *bs,
>  uint32_t tail = (offset + count) % s->cluster_size;
> 
>  trace_qcow2_pwrite_zeroes_start_req(qemu_coroutine_self(), offset, 
> count);
> +if (offset + count == bs->total_sectors * BDRV_SECTOR_SIZE) {
> +tail = 0;
> +}
> 
>  if (head || tail) {
>  int64_t cl_start = (offset - head) >> BDRV_SECTOR_BITS;
> diff --git a/tests/qemu-iotests/154 b/tests/qemu-iotests/154
> index 7ca7219..005f09f 100755
> --- a/tests/qemu-iotests/154
> +++ b/tests/qemu-iotests/154
> @@ -2,7 +2,7 @@
>  #
>  # qcow2 specific bdrv_pwrite_zeroes tests with backing files (complements 
> 034)
>  #
> -# Copyright (C) 2016 Red Hat, Inc.
> +# Copyright (C) 2016-2017 Red Hat, Inc.
>  #
>  # This program is free software; you can redistribute it and/or modify
>  # it under the terms of the GNU General Public License as published by
> @@ -42,7 +42,10 @@ _supported_proto file
>  _supported_os Linux
> 
>  CLUSTER_SIZE=4k
> -size=128M
> +size=$((128 * 1024 * 1024))
> +
> +# This test requires zero clusters, added in v3 images
> +_unsupported_imgopts compat=0.10
> 
>  echo
>  echo == backing file contains zeros ==
> @@ -299,6 +302,111 @@ $QEMU_IO -c "read -P 0 75k 1k" "$TEST_IMG" | 
> _filter_qemu_io
> 
>  $QEMU_IMG map --output=json "$TEST_IMG" | _filter_qemu_img_map
> 
> +echo
> +echo == unaligned image tail cluster, no allocation needed ==
> +
> +CLUSTER_SIZE=1024 TEST_IMG="$TEST_IMG.base" _make_test_img $((size + 1024))

Any reason for the CLUSTER_SIZE? It passes with 64 kB as well, and I
actually think that would be a better test because as it is, the image's
"unaligned" tail is exactly one cluster (so it isn't really unaligned).

> +_make_test_img -b "$TEST_IMG.base" $((size + 2048))
> +
> +# With no backing file, write to all or part of unallocated partial cluster
> +
> +# Backing file: 128m: ... | 00 --

Not sure what the "00 --" is supposed to mean. The cluster size is 1 kB,
so this is just a single cluster; which will be converted from
unallocated to a zero cluster because the tail reads as zero.

> +$QEMU_IO -c "write -z $size 512" "$TEST_IMG.base" | _filter_qemu_io
> +
> +# Backing file: 128m: ... | -- 00

Same here, but now it's the head that reads as zero (and it's already a
zero cluster so nothing changes here).

> +$QEMU_IO -c "write -z $size 512" "$TEST_IMG.base" | _filter_qemu_io

But I suppose you mean $((size + 512))?

> +
> +# Backing file: 128m: ... | 00 00

And finally this doesn't change anything either -- but then again this
is a fully aligned request, so I don't quite see the point in testing
this here.

> +$QEMU_IO -c "write -z $size 1024" "$TEST_IMG.base" | _filter_qemu_io
> +
> +# No offset should be allocated, although the cluster itself is considered
> +# allocated due to the zero flag
> +$QEMU_IO -c "alloc $size 1024" "$TEST_IMG.base" | _filter_qemu_io
> +$QEMU_IMG map --output=json "$TEST_IMG.base" | _filter_qemu_img_map
> +
> +# With backing file that reads as zero, write to all or part of entire 
> cluster
> +
> +# Backing file: 128m: ... | 00 00
> +# Active layer: 128m: ... | 00 00 00 00
> +$QEMU_IO -c "write -z $size 2048" "$TEST_IMG" | _filter_qemu_io
> +$QEMU_IMG map --output=json "$TEST_IMG" | _filter_qemu_img_map
> +
> +# Backing file: 128m: ... | 00 00
> +# Active layer: 128m: ... | 00 00 -- --

How so? Shouldn't

Re: [Qemu-devel] [PATCH v10 07/17] qemu-io: Switch 'alloc' command to byte-based length

2017-04-28 Thread Eric Blake

On 04/28/2017 03:09 PM, Max Reitz wrote:
> On 28.04.2017 21:59, Eric Blake wrote:
>> On 04/28/2017 02:46 PM, Max Reitz wrote:
>>> On 27.04.2017 03:46, Eric Blake wrote:
 For the 'alloc' command, accepting an offset in bytes but a length
 in sectors, and reporting output in sectors, is confusing.  Do
 everything in bytes, and adjust the expected output accordingly.

 Signed-off-by: Eric Blake 
 Reviewed-by: Philippe Mathieu-Daudé 

>>
  }
 +if (!QEMU_IS_ALIGNED(bytes, BDRV_SECTOR_SIZE)) {
 +printf("bytes %" PRId64 " is not sector aligned\n",
>>>
>>> This isn't real English. :-)
>>
>> But, it's just copy-and-paste from the other instances you just reviewed
>> in 6/17!  [Translation - if I change this one, I also get to redo that one]
> 
> No, those are "offset" and "count" -- all singular. "bytes" is plural. ;-)

Then an obvious solution: s/bytes/count/ in the parameter name :)

But I still get to redo those, to add the '-' in 'sector-aligned'.

> 
>>
>> Which of these various alternatives (if any) looks better:
>>
>> bytes=511 is not sector-aligned
>> 511 is not a sector-aligned value for 'bytes'
>> requested 'bytes' of 511 is not sector-aligned
>> alignment error: 511 bytes is not sector-aligned
>> 'bytes' must be sector-aligned: 511
>> your clever entry here...
> 
> How about "byte count" instead of "bytes" or "bytes value", if really
> want to have the exact spelling in there?
> 
> For your entries above: (1) and (2) work for me (I like (2) a bit
> better), (3) doesn't sound like real English either, and it should be
> s/is/are/ in (4) (but it still sounds off with that change). (5) I
> mostly dislike because I dislike error message of the form "This should
> be X: $foo", I like "$foo is not X" better.

Maybe this variation of (3) solves the singular/plural disconnect:

request of 511 for 'bytes' is not sector-aligned

which makes it obvious that the "request of 511" (singular) and not the
parameter name (whether singular 'count' or plural 'bytes') is the
subject.  But it's a bit wordier than (2).  So it looks like (2) may be
a winner in all the situations.  But I also think you convinced me to
rename the command parameter; in my next spin, the help text will read:

alloc offset [count] -- checks if offset is allocated in the file

which starts to be non-trivial enough to drop R-b that you were willing
to give for just an error message wording change.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature

[Qemu-devel] [PULL 33/34] iotests: fix exclusion option

2017-04-28 Thread Kevin Wolf

From: John Snow 

If you are running out-of-tree, the -x option to exclude
a certain iotest is broken.

Replace porcelain usage of ls with a sturdier awk command.

Reviewed-by: Fam Zheng 
Signed-off-by: John Snow 
Message-id: 20170427205100.9505-3-js...@redhat.com
Reviewed-by: Eric Blake 
Signed-off-by: Max Reitz 
---
 tests/qemu-iotests/common | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/tests/qemu-iotests/common b/tests/qemu-iotests/common
index fa8e69e..f2a7199 100644
--- a/tests/qemu-iotests/common
+++ b/tests/qemu-iotests/common
@@ -86,7 +86,8 @@ s/ .*//p
 elif $xgroup
 then
 # arg after -x
-[ ! -s $tmp.list ] && ls [0-9][0-9][0-9] [0-9][0-9][0-9][0-9] 
>$tmp.list 2>/dev/null
+# Populate $tmp.list with all tests
+awk '/^[0-9]{3,}/ {print $1}' "${source_iotests}/group" > $tmp.list 
2>/dev/null
 group_list=`sed -n <"$source_iotests/group" -e 's/$/ /' -e 
"/^[0-9][0-9][0-9].* $r /"'{
 s/ .*//p
 }'`
-- 
1.8.3.1

[Qemu-devel] [PULL 34/34] progress: Show current progress on SIGINFO

2017-04-28 Thread Kevin Wolf

From: Max Reitz 

Currently we only print progress information on retrieval of SIGUSR1.
Some systems have a dedicated SIGINFO for this, however, so it should be
handled appropriately if it is available.

Buglink: https://bugs.launchpad.net/qemu/+bug/1662468
Signed-off-by: Max Reitz 
Message-id: 20170207235757.2026-1-mre...@redhat.com
Reviewed-by: Stefan Hajnoczi 
Signed-off-by: Max Reitz 
---
 qemu-img.texi| 3 ++-
 util/qemu-progress.c | 3 +++
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/qemu-img.texi b/qemu-img.texi
index 8c573ae..50a2364 100644
--- a/qemu-img.texi
+++ b/qemu-img.texi
@@ -84,7 +84,8 @@ with or without a command shows help and lists the supported 
formats
 @item -p
 display progress bar (compare, convert and rebase commands only).
 If the @var{-p} option is not used for a command that supports it, the
-progress is reported when the process receives a @code{SIGUSR1} signal.
+progress is reported when the process receives a @code{SIGUSR1} or
+@code{SIGINFO} signal.
 @item -q
 Quiet mode - do not print any output (except errors). There's no progress bar
 in case both @var{-q} and @var{-p} options are used.
diff --git a/util/qemu-progress.c b/util/qemu-progress.c
index f745233..3c2223c 100644
--- a/util/qemu-progress.c
+++ b/util/qemu-progress.c
@@ -88,6 +88,9 @@ static void progress_dummy_init(void)
 action.sa_handler = sigusr_print;
 action.sa_flags = 0;
 sigaction(SIGUSR1, , NULL);
+#ifdef SIGINFO
+sigaction(SIGINFO, , NULL);
+#endif
 
 /*
  * SIGUSR1 is SIG_IPI and gets blocked in qemu_init_main_loop(). In the
-- 
1.8.3.1

[Qemu-devel] [PULL 30/34] qemu-img: improve convert_iteration_sectors()

2017-04-28 Thread Kevin Wolf

From: Vladimir Sementsov-Ogievskiy 

Do not do extra call to _get_block_status()

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Message-id: 20170407113404.9351-1-vsement...@virtuozzo.com
Reviewed-by: John Snow 
Signed-off-by: Max Reitz 
---
 qemu-img.c | 32 ++--
 1 file changed, 10 insertions(+), 22 deletions(-)

diff --git a/qemu-img.c b/qemu-img.c
index 9eb8283..9b6e728 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -1554,9 +1554,15 @@ static int convert_iteration_sectors(ImgConvertState *s, 
int64_t sector_num)
 
 if (s->sector_next_status <= sector_num) {
 BlockDriverState *file;
-ret = bdrv_get_block_status(blk_bs(s->src[src_cur]),
-sector_num - src_cur_offset,
-n, , );
+if (s->target_has_backing) {
+ret = bdrv_get_block_status(blk_bs(s->src[src_cur]),
+sector_num - src_cur_offset,
+n, , );
+} else {
+ret = bdrv_get_block_status_above(blk_bs(s->src[src_cur]), NULL,
+  sector_num - src_cur_offset,
+  n, , );
+}
 if (ret < 0) {
 return ret;
 }
@@ -1565,26 +1571,8 @@ static int convert_iteration_sectors(ImgConvertState *s, 
int64_t sector_num)
 s->status = BLK_ZERO;
 } else if (ret & BDRV_BLOCK_DATA) {
 s->status = BLK_DATA;
-} else if (!s->target_has_backing) {
-/* Without a target backing file we must copy over the contents of
- * the backing file as well. */
-/* Check block status of the backing file chain to avoid
- * needlessly reading zeroes and limiting the iteration to the
- * buffer size */
-ret = bdrv_get_block_status_above(blk_bs(s->src[src_cur]), NULL,
-  sector_num - src_cur_offset,
-  n, , );
-if (ret < 0) {
-return ret;
-}
-
-if (ret & BDRV_BLOCK_ZERO) {
-s->status = BLK_ZERO;
-} else {
-s->status = BLK_DATA;
-}
 } else {
-s->status = BLK_BACKING_FILE;
+s->status = s->target_has_backing ? BLK_BACKING_FILE : BLK_DATA;
 }
 
 s->sector_next_status = sector_num + n;
-- 
1.8.3.1

[Qemu-devel] [PULL 31/34] qemu-img: use blk_co_pwrite_zeroes for zero sectors when compressed

2017-04-28 Thread Kevin Wolf

From: Lidong Chen 

When the buffer is zero, blk_co_pwrite_zeroes is more effective than
blk_co_pwritev with BDRV_REQ_WRITE_COMPRESSED. This patch can reduce
the time for converting qcow2 images with lots of zero data.

Signed-off-by: Lidong Chen 
Message-id: 1493261907-18734-1-git-send-email-lidongc...@tencent.com
Signed-off-by: Max Reitz 
---
 qemu-img.c | 44 ++--
 1 file changed, 14 insertions(+), 30 deletions(-)

diff --git a/qemu-img.c b/qemu-img.c
index 9b6e728..c719636 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -1649,6 +1649,8 @@ static int coroutine_fn convert_co_write(ImgConvertState 
*s, int64_t sector_num,
 
 while (nb_sectors > 0) {
 int n = nb_sectors;
+BdrvRequestFlags flags = s->compressed ? BDRV_REQ_WRITE_COMPRESSED : 0;
+
 switch (status) {
 case BLK_BACKING_FILE:
 /* If we have a backing file, leave clusters unallocated that are
@@ -1658,43 +1660,24 @@ static int coroutine_fn 
convert_co_write(ImgConvertState *s, int64_t sector_num,
 break;
 
 case BLK_DATA:
-/* We must always write compressed clusters as a whole, so don't
- * try to find zeroed parts in the buffer. We can only save the
- * write if the buffer is completely zeroed and we're allowed to
- * keep the target sparse. */
-if (s->compressed) {
-if (s->has_zero_init && s->min_sparse &&
-buffer_is_zero(buf, n * BDRV_SECTOR_SIZE))
-{
-assert(!s->target_has_backing);
-break;
-}
-
-iov.iov_base = buf;
-iov.iov_len = n << BDRV_SECTOR_BITS;
-qemu_iovec_init_external(, , 1);
-
-ret = blk_co_pwritev(s->target, sector_num << BDRV_SECTOR_BITS,
- n << BDRV_SECTOR_BITS, ,
- BDRV_REQ_WRITE_COMPRESSED);
-if (ret < 0) {
-return ret;
-}
-break;
-}
-
-/* If there is real non-zero data or we're told to keep the target
- * fully allocated (-S 0), we must write it. Otherwise we can treat
- * it as zero sectors. */
+/* If we're told to keep the target fully allocated (-S 0) or there
+ * is real non-zero data, we must write it. Otherwise we can treat
+ * it as zero sectors.
+ * Compressed clusters need to be written as a whole, so in that
+ * case we can only save the write if the buffer is completely
+ * zeroed. */
 if (!s->min_sparse ||
-is_allocated_sectors_min(buf, n, , s->min_sparse))
+(!s->compressed &&
+ is_allocated_sectors_min(buf, n, , s->min_sparse)) ||
+(s->compressed &&
+ !buffer_is_zero(buf, n * BDRV_SECTOR_SIZE)))
 {
 iov.iov_base = buf;
 iov.iov_len = n << BDRV_SECTOR_BITS;
 qemu_iovec_init_external(, , 1);
 
 ret = blk_co_pwritev(s->target, sector_num << BDRV_SECTOR_BITS,
- n << BDRV_SECTOR_BITS, , 0);
+ n << BDRV_SECTOR_BITS, , flags);
 if (ret < 0) {
 return ret;
 }
@@ -1704,6 +1687,7 @@ static int coroutine_fn convert_co_write(ImgConvertState 
*s, int64_t sector_num,
 
 case BLK_ZERO:
 if (s->has_zero_init) {
+assert(!s->target_has_backing);
 break;
 }
 ret = blk_co_pwrite_zeroes(s->target,
-- 
1.8.3.1

[Qemu-devel] [PULL 24/34] block: Add errp to b{lk,drv}_truncate()

2017-04-28 Thread Kevin Wolf

From: Max Reitz 

For one thing, this allows us to drop the error message generation from
qemu-img.c and blockdev.c and instead have it unified in
bdrv_truncate().

Signed-off-by: Max Reitz 
Message-id: 20170328205129.15138-3-mre...@redhat.com
Reviewed-by: Stefan Hajnoczi 
Signed-off-by: Max Reitz 
---
 block.c| 16 
 block/blkdebug.c   |  2 +-
 block/block-backend.c  |  5 +++--
 block/commit.c |  5 +++--
 block/crypto.c |  2 +-
 block/mirror.c |  2 +-
 block/parallels.c  | 13 -
 block/qcow.c   |  6 +++---
 block/qcow2-refcount.c |  5 -
 block/qcow2.c  | 14 +-
 block/qed.c|  2 +-
 block/raw-format.c |  2 +-
 block/vdi.c|  4 ++--
 block/vhdx-log.c   |  2 +-
 block/vhdx.c   | 10 +++---
 block/vmdk.c   | 13 +++--
 block/vpc.c| 13 +++--
 blockdev.c | 21 +
 include/block/block.h  |  2 +-
 include/sysemu/block-backend.h |  2 +-
 qemu-img.c | 17 -
 qemu-io-cmds.c |  5 +++--
 22 files changed, 73 insertions(+), 90 deletions(-)

diff --git a/block.c b/block.c
index 7b557f3..6a1937e 100644
--- a/block.c
+++ b/block.c
@@ -3307,7 +3307,7 @@ exit:
 /**
  * Truncate file to 'offset' bytes (needed only for file protocols)
  */
-int bdrv_truncate(BdrvChild *child, int64_t offset)
+int bdrv_truncate(BdrvChild *child, int64_t offset, Error **errp)
 {
 BlockDriverState *bs = child->bs;
 BlockDriver *drv = bs->drv;
@@ -3315,12 +3315,18 @@ int bdrv_truncate(BdrvChild *child, int64_t offset)
 
 assert(child->perm & BLK_PERM_RESIZE);
 
-if (!drv)
+if (!drv) {
+error_setg(errp, "No medium inserted");
 return -ENOMEDIUM;
-if (!drv->bdrv_truncate)
+}
+if (!drv->bdrv_truncate) {
+error_setg(errp, "Image format driver does not support resize");
 return -ENOTSUP;
-if (bs->read_only)
+}
+if (bs->read_only) {
+error_setg(errp, "Image is read-only");
 return -EACCES;
+}
 
 ret = drv->bdrv_truncate(bs, offset);
 if (ret == 0) {
@@ -3328,6 +3334,8 @@ int bdrv_truncate(BdrvChild *child, int64_t offset)
 bdrv_dirty_bitmap_truncate(bs);
 bdrv_parent_cb_resize(bs);
 ++bs->write_gen;
+} else {
+error_setg_errno(errp, -ret, "Failed to resize image");
 }
 return ret;
 }
diff --git a/block/blkdebug.c b/block/blkdebug.c
index cc4a146..9bd066e 100644
--- a/block/blkdebug.c
+++ b/block/blkdebug.c
@@ -661,7 +661,7 @@ static int64_t blkdebug_getlength(BlockDriverState *bs)
 
 static int blkdebug_truncate(BlockDriverState *bs, int64_t offset)
 {
-return bdrv_truncate(bs->file, offset);
+return bdrv_truncate(bs->file, offset, NULL);
 }
 
 static void blkdebug_refresh_filename(BlockDriverState *bs, QDict *options)
diff --git a/block/block-backend.c b/block/block-backend.c
index ae3d771..f5bf13e 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -1746,13 +1746,14 @@ int blk_pwrite_compressed(BlockBackend *blk, int64_t 
offset, const void *buf,
BDRV_REQ_WRITE_COMPRESSED);
 }
 
-int blk_truncate(BlockBackend *blk, int64_t offset)
+int blk_truncate(BlockBackend *blk, int64_t offset, Error **errp)
 {
 if (!blk_is_available(blk)) {
+error_setg(errp, "No medium inserted");
 return -ENOMEDIUM;
 }
 
-return bdrv_truncate(blk->root, offset);
+return bdrv_truncate(blk->root, offset, errp);
 }
 
 static void blk_pdiscard_entry(void *opaque)
diff --git a/block/commit.c b/block/commit.c
index 91d2c34..76a0d98 100644
--- a/block/commit.c
+++ b/block/commit.c
@@ -151,7 +151,7 @@ static void coroutine_fn commit_run(void *opaque)
 }
 
 if (base_len < s->common.len) {
-ret = blk_truncate(s->base, s->common.len);
+ret = blk_truncate(s->base, s->common.len, NULL);
 if (ret) {
 goto out;
 }
@@ -511,8 +511,9 @@ int bdrv_commit(BlockDriverState *bs)
  * grow the backing file image if possible.  If not possible,
  * we must return an error */
 if (length > backing_length) {
-ret = blk_truncate(backing, length);
+ret = blk_truncate(backing, length, _err);
 if (ret < 0) {
+error_report_err(local_err);
 goto ro_cleanup;
 }
 }
diff --git a/block/crypto.c b/block/crypto.c
index 34549b2..7ad64b5 100644
--- a/block/crypto.c
+++ b/block/crypto.c
@@ -389,7 +389,7 @@ static int block_crypto_truncate(BlockDriverState *bs, 
int64_t offset)
 
 offset += payload_offset;
 
-return bdrv_truncate(bs->file, offset);
+return bdrv_truncate(bs->file,

[Qemu-devel] [PULL 25/34] block: Add errp to BD.bdrv_truncate()

2017-04-28 Thread Kevin Wolf

From: Max Reitz 

Add an Error parameter to the block drivers' bdrv_truncate() interface.
If a block driver does not set this in case of an error, the generic
bdrv_truncate() implementation will do so.

Where it is obvious, this patch also makes some block drivers set this
value.

Signed-off-by: Max Reitz 
Message-id: 20170328205129.15138-4-mre...@redhat.com
Reviewed-by: Stefan Hajnoczi 
Signed-off-by: Max Reitz 
---
 block.c   |  4 ++--
 block/blkdebug.c  |  4 ++--
 block/crypto.c|  5 +++--
 block/file-posix.c|  2 +-
 block/file-win32.c|  6 +++---
 block/gluster.c   |  3 ++-
 block/iscsi.c |  4 ++--
 block/nfs.c   |  2 +-
 block/qcow2.c |  8 
 block/qed.c   |  2 +-
 block/raw-format.c|  4 ++--
 block/rbd.c   |  2 +-
 block/sheepdog.c  | 14 ++
 include/block/block_int.h |  2 +-
 14 files changed, 31 insertions(+), 31 deletions(-)

diff --git a/block.c b/block.c
index 6a1937e..ff232a2 100644
--- a/block.c
+++ b/block.c
@@ -3328,13 +3328,13 @@ int bdrv_truncate(BdrvChild *child, int64_t offset, 
Error **errp)
 return -EACCES;
 }
 
-ret = drv->bdrv_truncate(bs, offset);
+ret = drv->bdrv_truncate(bs, offset, errp);
 if (ret == 0) {
 ret = refresh_total_sectors(bs, offset >> BDRV_SECTOR_BITS);
 bdrv_dirty_bitmap_truncate(bs);
 bdrv_parent_cb_resize(bs);
 ++bs->write_gen;
-} else {
+} else if (errp && !*errp) {
 error_setg_errno(errp, -ret, "Failed to resize image");
 }
 return ret;
diff --git a/block/blkdebug.c b/block/blkdebug.c
index 9bd066e..d2a7561 100644
--- a/block/blkdebug.c
+++ b/block/blkdebug.c
@@ -659,9 +659,9 @@ static int64_t blkdebug_getlength(BlockDriverState *bs)
 return bdrv_getlength(bs->file->bs);
 }
 
-static int blkdebug_truncate(BlockDriverState *bs, int64_t offset)
+static int blkdebug_truncate(BlockDriverState *bs, int64_t offset, Error 
**errp)
 {
-return bdrv_truncate(bs->file, offset, NULL);
+return bdrv_truncate(bs->file, offset, errp);
 }
 
 static void blkdebug_refresh_filename(BlockDriverState *bs, QDict *options)
diff --git a/block/crypto.c b/block/crypto.c
index 7ad64b5..6828180 100644
--- a/block/crypto.c
+++ b/block/crypto.c
@@ -381,7 +381,8 @@ static int block_crypto_create_generic(QCryptoBlockFormat 
format,
 return ret;
 }
 
-static int block_crypto_truncate(BlockDriverState *bs, int64_t offset)
+static int block_crypto_truncate(BlockDriverState *bs, int64_t offset,
+ Error **errp)
 {
 BlockCrypto *crypto = bs->opaque;
 size_t payload_offset =
@@ -389,7 +390,7 @@ static int block_crypto_truncate(BlockDriverState *bs, 
int64_t offset)
 
 offset += payload_offset;
 
-return bdrv_truncate(bs->file, offset, NULL);
+return bdrv_truncate(bs->file, offset, errp);
 }
 
 static void block_crypto_close(BlockDriverState *bs)
diff --git a/block/file-posix.c b/block/file-posix.c
index ade71db..9c0d701 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -1407,7 +1407,7 @@ static void raw_close(BlockDriverState *bs)
 }
 }
 
-static int raw_truncate(BlockDriverState *bs, int64_t offset)
+static int raw_truncate(BlockDriverState *bs, int64_t offset, Error **errp)
 {
 BDRVRawState *s = bs->opaque;
 struct stat st;
diff --git a/block/file-win32.c b/block/file-win32.c
index e132ba1..7872e00 100644
--- a/block/file-win32.c
+++ b/block/file-win32.c
@@ -460,7 +460,7 @@ static void raw_close(BlockDriverState *bs)
 }
 }
 
-static int raw_truncate(BlockDriverState *bs, int64_t offset)
+static int raw_truncate(BlockDriverState *bs, int64_t offset, Error **errp)
 {
 BDRVRawState *s = bs->opaque;
 LONG low, high;
@@ -475,11 +475,11 @@ static int raw_truncate(BlockDriverState *bs, int64_t 
offset)
  */
 dwPtrLow = SetFilePointer(s->hfile, low, , FILE_BEGIN);
 if (dwPtrLow == INVALID_SET_FILE_POINTER && GetLastError() != NO_ERROR) {
-fprintf(stderr, "SetFilePointer error: %lu\n", GetLastError());
+error_setg_win32(errp, GetLastError(), "SetFilePointer error");
 return -EIO;
 }
 if (SetEndOfFile(s->hfile) == 0) {
-fprintf(stderr, "SetEndOfFile error: %lu\n", GetLastError());
+error_setg_win32(errp, GetLastError(), "SetEndOfFile error");
 return -EIO;
 }
 return 0;
diff --git a/block/gluster.c b/block/gluster.c
index cf29b5f..be45c08 100644
--- a/block/gluster.c
+++ b/block/gluster.c
@@ -1092,7 +1092,8 @@ static coroutine_fn int 
qemu_gluster_co_rw(BlockDriverState *bs,
 return acb.ret;
 }
 
-static int qemu_gluster_truncate(BlockDriverState *bs, int64_t offset)
+static int qemu_gluster_truncate(BlockDriverState *bs, int64_t offset,
+ Error **errp)
 {
 int ret;
 BDRVGlusterState *s

[Qemu-devel] [PULL 27/34] qcow2: Allow discard of final unaligned cluster

2017-04-28 Thread Kevin Wolf

From: Eric Blake 

As mentioned in commit 0c1bd46, we ignored requests to
discard the trailing cluster of an unaligned image.  While
discard is an advisory operation from the guest standpoint,
(and we are therefore free to ignore any request), our
qcow2 implementation exploits the fact that a discarded
cluster reads back as 0.  As long as we discard on cluster
boundaries, we are fine; but that means we could observe
non-zero data leaked at the tail of an unaligned image.

Enhance iotest 66 to cover this case, and fix the implementation
to honor a discard request on the final partial cluster.

Signed-off-by: Eric Blake 
Message-id: 20170407013709.18440-1-ebl...@redhat.com
Signed-off-by: Max Reitz 
---
 block/qcow2.c  |  7 ++-
 tests/qemu-iotests/066 | 12 +++-
 tests/qemu-iotests/066.out | 12 
 3 files changed, 21 insertions(+), 10 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 4ca4cf0..5c1573c 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2515,7 +2515,12 @@ static coroutine_fn int 
qcow2_co_pdiscard(BlockDriverState *bs,
 
 if (!QEMU_IS_ALIGNED(offset | count, s->cluster_size)) {
 assert(count < s->cluster_size);
-return -ENOTSUP;
+/* Ignore partial clusters, except for the special case of the
+ * complete partial cluster at the end of an unaligned file */
+if (!QEMU_IS_ALIGNED(offset, s->cluster_size) ||
+offset + count != bs->total_sectors * BDRV_SECTOR_SIZE) {
+return -ENOTSUP;
+}
 }
 
 qemu_co_mutex_lock(>lock);
diff --git a/tests/qemu-iotests/066 b/tests/qemu-iotests/066
index 364166d..c2116a3 100755
--- a/tests/qemu-iotests/066
+++ b/tests/qemu-iotests/066
@@ -42,16 +42,18 @@ _supported_fmt qcow2
 _supported_proto generic
 _supported_os Linux
 
+# Intentionally create an unaligned image
 IMGOPTS="compat=1.1"
-IMG_SIZE=64M
+IMG_SIZE=$((64 * 1024 * 1024 + 512))
 
 echo
-echo "=== Testing snapshotting an image with zero clusters ==="
+echo "=== Testing cluster discards ==="
 echo
 _make_test_img $IMG_SIZE
-# Write some normal clusters, zero them (creating preallocated zero clusters)
-# and discard those
-$QEMU_IO -c "write 0 256k" -c "write -z 0 256k" -c "discard 0 256k" 
"$TEST_IMG" \
+# Write some normal clusters, zero some of them (creating preallocated
+# zero clusters) and discard everything. Everything should now read as 0.
+$QEMU_IO -c "write 0 256k" -c "write -z 0 256k" -c "write 64M 512" \
+-c "discard 0 $IMG_SIZE" -c "read -P 0 0 $IMG_SIZE" "$TEST_IMG" \
  | _filter_qemu_io
 # Check the image (there shouldn't be any leaks)
 _check_test_img
diff --git a/tests/qemu-iotests/066.out b/tests/qemu-iotests/066.out
index 7bc9a10..7c1f31a 100644
--- a/tests/qemu-iotests/066.out
+++ b/tests/qemu-iotests/066.out
@@ -1,13 +1,17 @@
 QA output created by 066
 
-=== Testing snapshotting an image with zero clusters ===
+=== Testing cluster discards ===
 
-Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67109376
 wrote 262144/262144 bytes at offset 0
 256 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 wrote 262144/262144 bytes at offset 0
 256 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
-discard 262144/262144 bytes at offset 0
-256 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 512/512 bytes at offset 67108864
+512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+discard 67109376/67109376 bytes at offset 0
+64 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 67109376/67109376 bytes at offset 0
+64 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 No errors were found on the image.
 *** done
-- 
1.8.3.1

[Qemu-devel] [PULL 28/34] block: fix obvious coding style mistakes in block_int.h

2017-04-28 Thread Kevin Wolf

From: Klim Kireev 

Signed-off-by: Klim Kireev 
Signed-off-by: Denis V. Lunev 
CC: Kevin Wolf 
CC: Max Reitz 
Message-id: 1491405505-31620-2-git-send-email-...@openvz.org
Reviewed-by: Eric Blake 
Signed-off-by: Max Reitz 
---
 include/block/block_int.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/include/block/block_int.h b/include/block/block_int.h
index dcde90a..8773940 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -252,7 +252,7 @@ struct BlockDriver {
  * Returns 0 for completed check, -errno for internal errors.
  * The check results are stored in result.
  */
-int (*bdrv_check)(BlockDriverState* bs, BdrvCheckResult *result,
+int (*bdrv_check)(BlockDriverState *bs, BdrvCheckResult *result,
 BdrvCheckMode fix);
 
 int (*bdrv_amend_options)(BlockDriverState *bs, QemuOpts *opts,
@@ -454,13 +454,13 @@ struct BdrvChildRole {
 /* Returns a name that is supposedly more useful for human users than the
  * node name for identifying the node in question (in particular, a BB
  * name), or NULL if the parent can't provide a better name. */
-const char* (*get_name)(BdrvChild *child);
+const char *(*get_name)(BdrvChild *child);
 
 /* Returns a malloced string that describes the parent of the child for a
  * human reader. This could be a node-name, BlockBackend name, qdev ID or
  * QOM path of the device owning the BlockBackend, job type and ID etc. The
  * caller is responsible for freeing the memory. */
-char* (*get_parent_desc)(BdrvChild *child);
+char *(*get_parent_desc)(BdrvChild *child);
 
 /*
  * If this pair of functions is implemented, the parent doesn't issue new
-- 
1.8.3.1

[Qemu-devel] [PULL 29/34] block: assert no image modification under BDRV_O_INACTIVE

2017-04-28 Thread Kevin Wolf

From: "Denis V. Lunev" 

As long as BDRV_O_INACTIVE is set, the image file is only opened so we
have a file descriptor for it. We're definitely not supposed to modify
the image, it's still owned by the migration source.

This commit is an addition to 09e0c771 but the assert() is added to
bdrv_truncate().

Signed-off-by: Denis V. Lunev 
CC: Kevin Wolf 
CC: Max Reitz 
Message-id: 1491405505-31620-3-git-send-email-...@openvz.org
Reviewed-by: Eric Blake 
Signed-off-by: Max Reitz 
---
 block.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/block.c b/block.c
index 76bf00f..6c6bb3e 100644
--- a/block.c
+++ b/block.c
@@ -3328,6 +3328,8 @@ int bdrv_truncate(BdrvChild *child, int64_t offset, Error 
**errp)
 return -EACCES;
 }
 
+assert(!(bs->open_flags & BDRV_O_INACTIVE));
+
 ret = drv->bdrv_truncate(bs, offset, errp);
 if (ret == 0) {
 ret = refresh_total_sectors(bs, offset >> BDRV_SECTOR_BITS);
-- 
1.8.3.1

[Qemu-devel] [PULL 26/34] block: Add .bdrv_truncate() error messages

2017-04-28 Thread Kevin Wolf

From: Max Reitz 

Add missing error messages for the block driver implementations of
.bdrv_truncate(); drop the generic one from block.c's bdrv_truncate().

Since one of these changes touches a mis-indented block in
block/file-posix.c, this patch fixes that coding style issue along the
way.

Signed-off-by: Max Reitz 
Message-id: 20170328205129.15138-5-mre...@redhat.com
Reviewed-by: Stefan Hajnoczi 
Signed-off-by: Max Reitz 
---
 block.c|  2 --
 block/file-posix.c | 17 -
 block/gluster.c|  4 +++-
 block/iscsi.c  |  2 ++
 block/nfs.c| 10 +-
 block/qcow2.c  |  2 ++
 block/qed.c|  4 +++-
 block/raw-format.c |  2 ++
 block/rbd.c|  1 +
 9 files changed, 34 insertions(+), 10 deletions(-)

diff --git a/block.c b/block.c
index ff232a2..76bf00f 100644
--- a/block.c
+++ b/block.c
@@ -3334,8 +3334,6 @@ int bdrv_truncate(BdrvChild *child, int64_t offset, Error 
**errp)
 bdrv_dirty_bitmap_truncate(bs);
 bdrv_parent_cb_resize(bs);
 ++bs->write_gen;
-} else if (errp && !*errp) {
-error_setg_errno(errp, -ret, "Failed to resize image");
 }
 return ret;
 }
diff --git a/block/file-posix.c b/block/file-posix.c
index 9c0d701..1941fb6 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -1411,20 +1411,27 @@ static int raw_truncate(BlockDriverState *bs, int64_t 
offset, Error **errp)
 {
 BDRVRawState *s = bs->opaque;
 struct stat st;
+int ret;
 
 if (fstat(s->fd, )) {
-return -errno;
+ret = -errno;
+error_setg_errno(errp, -ret, "Failed to fstat() the file");
+return ret;
 }
 
 if (S_ISREG(st.st_mode)) {
 if (ftruncate(s->fd, offset) < 0) {
-return -errno;
+ret = -errno;
+error_setg_errno(errp, -ret, "Failed to resize the file");
+return ret;
 }
 } else if (S_ISCHR(st.st_mode) || S_ISBLK(st.st_mode)) {
-   if (offset > raw_getlength(bs)) {
-   return -EINVAL;
-   }
+if (offset > raw_getlength(bs)) {
+error_setg(errp, "Cannot grow device files");
+return -EINVAL;
+}
 } else {
+error_setg(errp, "Resizing this file is not supported");
 return -ENOTSUP;
 }
 
diff --git a/block/gluster.c b/block/gluster.c
index be45c08..1d4e2f7 100644
--- a/block/gluster.c
+++ b/block/gluster.c
@@ -1100,7 +1100,9 @@ static int qemu_gluster_truncate(BlockDriverState *bs, 
int64_t offset,
 
 ret = glfs_ftruncate(s->fd, offset);
 if (ret < 0) {
-return -errno;
+ret = -errno;
+error_setg_errno(errp, -ret, "Failed to truncate file");
+return ret;
 }
 
 return 0;
diff --git a/block/iscsi.c b/block/iscsi.c
index 1ef38cf..5daa201 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -2065,6 +2065,7 @@ static int iscsi_truncate(BlockDriverState *bs, int64_t 
offset, Error **errp)
 Error *local_err = NULL;
 
 if (iscsilun->type != TYPE_DISK) {
+error_setg(errp, "Cannot resize non-disk iSCSI devices");
 return -ENOTSUP;
 }
 
@@ -2075,6 +2076,7 @@ static int iscsi_truncate(BlockDriverState *bs, int64_t 
offset, Error **errp)
 }
 
 if (offset > iscsi_getlength(bs)) {
+error_setg(errp, "Cannot grow iSCSI devices");
 return -EINVAL;
 }
 
diff --git a/block/nfs.c b/block/nfs.c
index 5ae665a..76572ae 100644
--- a/block/nfs.c
+++ b/block/nfs.c
@@ -767,7 +767,15 @@ static int64_t 
nfs_get_allocated_file_size(BlockDriverState *bs)
 static int nfs_file_truncate(BlockDriverState *bs, int64_t offset, Error 
**errp)
 {
 NFSClient *client = bs->opaque;
-return nfs_ftruncate(client->context, client->fh, offset);
+int ret;
+
+ret = nfs_ftruncate(client->context, client->fh, offset);
+if (ret < 0) {
+error_setg_errno(errp, -ret, "Failed to truncate file");
+return ret;
+}
+
+return 0;
 }
 
 /* Note that this will not re-establish a connection with the NFS server
diff --git a/block/qcow2.c b/block/qcow2.c
index 6c34798..4ca4cf0 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2551,6 +2551,7 @@ static int qcow2_truncate(BlockDriverState *bs, int64_t 
offset, Error **errp)
 new_l1_size = size_to_l1(s, offset);
 ret = qcow2_grow_l1_table(bs, new_l1_size, true);
 if (ret < 0) {
+error_setg_errno(errp, -ret, "Failed to grow the L1 table");
 return ret;
 }
 
@@ -2559,6 +2560,7 @@ static int qcow2_truncate(BlockDriverState *bs, int64_t 
offset, Error **errp)
 ret = bdrv_pwrite_sync(bs->file, offsetof(QCowHeader, size),
, sizeof(uint64_t));
 if (ret < 0) {
+error_setg_errno(errp, -ret, "Failed to update the image size");
 return ret;
 }
 
diff --git a/block/qed.c b/block/qed.c
index fa2aeee..fd76817 100644
--- a/block/qed.c
+++ b/block/qed.c
@@ -1526,11

[Qemu-devel] [PULL 19/34] block: fix alignment calculations in bdrv_co_do_zero_pwritev

2017-04-28 Thread Kevin Wolf

From: "Denis V. Lunev" 

tail_padding_bytes is calculated wrong. F.e. for
offset = 0
bytes = 2048
align = 512
we will have tail_padding_bytes = 512 which is definitely wrong. The patch
fixes that arithmetics.

Fortunately this problem is harmless, we will have 1 extra allocation and
free thus there is no need to put this into stable. The problem is here
from the very beginning.

Signed-off-by: Denis V. Lunev 
CC: Stefan Hajnoczi 
CC: Fam Zheng 
Reviewed-by: Eric Blake 
Signed-off-by: Kevin Wolf 
---
 block/io.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/io.c b/block/io.c
index 80d6c62..40bd94f 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1444,7 +1444,7 @@ static int coroutine_fn bdrv_co_do_zero_pwritev(BdrvChild 
*child,
 int ret = 0;
 
 head_padding_bytes = offset & (align - 1);
-tail_padding_bytes = align - ((offset + bytes) & (align - 1));
+tail_padding_bytes = (align - (offset + bytes)) & (align - 1);
 
 
 assert(flags & BDRV_REQ_ZERO_WRITE);
-- 
1.8.3.1

[Qemu-devel] [PULL 07/34] qemu-iotests: Filter HMP readline escape characters

2017-04-28 Thread Kevin Wolf

The only thing the escape characters achieve is making the reference
output unreadable and lines that are potentially so long that git
doesn't want to put them into an email any more. Let's filter them out.

Signed-off-by: Kevin Wolf 
Reviewed-by: Eric Blake 
---
 tests/qemu-iotests/028.out   |   2 +-
 tests/qemu-iotests/051   |   3 +-
 tests/qemu-iotests/051.out   | 106 +++
 tests/qemu-iotests/051.pc.out| 132 +++
 tests/qemu-iotests/068   |   4 +-
 tests/qemu-iotests/068.out   |   6 +-
 tests/qemu-iotests/130.out   |   4 +-
 tests/qemu-iotests/142   |   2 +-
 tests/qemu-iotests/142.out   |  10 +--
 tests/qemu-iotests/145   |   3 +-
 tests/qemu-iotests/145.out   |   2 +-
 tests/qemu-iotests/common.filter |   7 +++
 tests/qemu-iotests/common.qemu   |   4 +-
 13 files changed, 147 insertions(+), 138 deletions(-)

diff --git a/tests/qemu-iotests/028.out b/tests/qemu-iotests/028.out
index acd2870..7d54aeb 100644
--- a/tests/qemu-iotests/028.out
+++ b/tests/qemu-iotests/028.out
@@ -469,7 +469,7 @@ No errors were found on the image.
 block-backup
 
 Formatting 'TEST_DIR/t.IMGFMT.copy', fmt=IMGFMT size=4294968832 
backing_file=TEST_DIR/t.IMGFMT.base backing_fmt=IMGFMT
-(qemu) i[K[Din[K[D[Dinf[K[D[D[Dinfo[K[D[D[D[Dinfo 
[K[D[D[D[D[Dinfo b[K[D[D[D[D[D[Dinfo 
bl[K[D[D[D[D[D[D[Dinfo blo[K[D[D[D[D[D[D[D[Dinfo 
bloc[K[D[D[D[D[D[D[D[D[Dinfo 
block[K[D[D[D[D[D[D[D[D[D[Dinfo 
block-[K[D[D[D[D[D[D[D[D[D[D[Dinfo 
block-j[K[D[D[D[D[D[D[D[D[D[D[D[Dinfo 
block-jo[K[D[D[D[D[D[D[D[D[D[D[D[D[Dinfo 
block-job[K[D[D[D[D[D[D[D[D[D[D[D[D[D[Dinfo block-jobs[K
+(qemu) info block-jobs
 No active jobs
 === IO: pattern 195
 read 512/512 bytes at offset 3221194240
diff --git a/tests/qemu-iotests/051 b/tests/qemu-iotests/051
index 630cb7a..7c78a02 100755
--- a/tests/qemu-iotests/051
+++ b/tests/qemu-iotests/051
@@ -60,7 +60,8 @@ function do_run_qemu()
 
 function run_qemu()
 {
-do_run_qemu "$@" 2>&1 | _filter_testdir | _filter_qemu | 
_filter_generated_node_ids
+do_run_qemu "$@" 2>&1 | _filter_testdir | _filter_qemu |
+_filter_generated_node_ids | _filter_hmp
 }
 
 size=128M
diff --git a/tests/qemu-iotests/051.out b/tests/qemu-iotests/051.out
index 7524c62..3b7b040 100644
--- a/tests/qemu-iotests/051.out
+++ b/tests/qemu-iotests/051.out
@@ -58,12 +58,12 @@ QEMU X.Y.Z monitor - type 'help' for more information
 
 Testing: -drive 
file=TEST_DIR/t.qcow2,driver=qcow2,backing.file.filename=TEST_DIR/t.qcow2.orig,if=none,id=drive0
 -nodefaults
 QEMU X.Y.Z monitor - type 'help' for more information
-(qemu) i[K[Din[K[D[Dinf[K[D[D[Dinfo[K[D[D[D[Dinfo 
[K[D[D[D[D[Dinfo b[K[D[D[D[D[D[Dinfo 
bl[K[D[D[D[D[D[D[Dinfo blo[K[D[D[D[D[D[D[D[Dinfo 
bloc[K[D[D[D[D[D[D[D[D[Dinfo block[K
+(qemu) info block
 drive0 (NODE_NAME): TEST_DIR/t.qcow2 (qcow2)
 Removable device: not locked, tray closed
 Cache mode:   writeback
 Backing file: TEST_DIR/t.qcow2.orig (chain depth: 1)
-(qemu) q[K[Dqu[K[D[Dqui[K[D[D[Dquit[K
+(qemu) quit
 
 Testing: -drive 
file=TEST_DIR/t.qcow2,driver=raw,backing.file.filename=TEST_DIR/t.qcow2.orig
 QEMU_PROG: -drive 
file=TEST_DIR/t.qcow2,driver=raw,backing.file.filename=TEST_DIR/t.qcow2.orig: 
Driver doesn't support backing files
@@ -79,11 +79,11 @@ QEMU_PROG: -drive 
file=TEST_DIR/t.qcow2,file.backing.driver=qcow2,file.backing.f
 
 Testing: -drive file=TEST_DIR/t.qcow2,format=qcow2,lazy-refcounts=on
 QEMU X.Y.Z monitor - type 'help' for more information
-(qemu) q[K[Dqu[K[D[Dqui[K[D[D[Dquit[K
+(qemu) quit
 
 Testing: -drive file=TEST_DIR/t.qcow2,format=qcow2,lazy-refcounts=off
 QEMU X.Y.Z monitor - type 'help' for more information
-(qemu) q[K[Dqu[K[D[Dqui[K[D[D[Dquit[K
+(qemu) quit
 
 Testing: -drive file=TEST_DIR/t.qcow2,format=qcow2,lazy-refcounts=
 QEMU_PROG: -drive file=TEST_DIR/t.qcow2,format=qcow2,lazy-refcounts=: 
Parameter 'lazy-refcounts' expects 'on' or 'off'
@@ -103,7 +103,7 @@ QEMU_PROG: -drive 
file=TEST_DIR/t.qcow2,format=qcow2,lazy-refcounts=on: Lazy ref
 
 Testing: -drive file=TEST_DIR/t.qcow2,format=qcow2,lazy-refcounts=off
 QEMU X.Y.Z monitor - type 'help' for more information
-(qemu) q[K[Dqu[K[D[Dqui[K[D[D[Dquit[K
+(qemu) quit
 
 
 === No medium ===
@@ -117,93 +117,93 @@ QEMU X.Y.Z monitor - type 'help' for more information
 
 Testing: -drive file=TEST_DIR/t.qcow2,if=virtio,readonly=on
 QEMU X.Y.Z monitor - type 'help' for more information
-(qemu) q[K[Dqu[K[D[Dqui[K[D[D[Dquit[K
+(qemu) quit
 
 
 === Cache modes ===
 
 Testing: -drive driver=null-co,cache=none
 QEMU X.Y.Z monitor - type 'help' for more information
-(qemu) q[K[Dqu[K[D[Dqui[K[D[D[Dquit[K
+(qemu) quit
 
 Testing: -drive

[Qemu-devel] [PULL 32/34] iotests: clarify help text

2017-04-28 Thread Kevin Wolf

From: John Snow 

Split the help text to highlight the groups of options
a little better, carving out a clear "format" and
"protocols" section.

Signed-off-by: John Snow 
Message-id: 20170427205100.9505-2-js...@redhat.com
Reviewed-by: Eric Blake 
Signed-off-by: Max Reitz 
---
 tests/qemu-iotests/common | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/tests/qemu-iotests/common b/tests/qemu-iotests/common
index 9c6f972..fa8e69e 100644
--- a/tests/qemu-iotests/common
+++ b/tests/qemu-iotests/common
@@ -138,7 +138,7 @@ common options
 -v  verbose
 -d  debug
 
-check options
+image format options
 -rawtest raw (default)
 -bochs  test bochs
 -cloop  test cloop
@@ -150,14 +150,18 @@ check options
 -vpctest vpc
 -vhdx   test vhdx
 -vmdk   test vmdk
+-luks   test luks
+
+image protocol options
 -file   test file (default)
 -rbdtest rbd
 -sheepdog   test sheepdog
 -nbdtest nbd
 -sshtest ssh
 -nfstest nfs
--luks   test luks
 -vxhs   test vxhs
+
+other options
 -xdiff  graphical mode diff
 -nocacheuse O_DIRECT on backing file
 -misalign   misalign memory allocations
-- 
1.8.3.1

[Qemu-devel] [PULL 17/34] iotests: 109: Filter out "len" of failed jobs

2017-04-28 Thread Kevin Wolf

From: Fam Zheng 

Mirror calculates job len from current I/O progress:

s->common.len = s->common.offset +
(cnt + s->sectors_in_flight) * BDRV_SECTOR_SIZE;

The final "len" of a failed mirror job in iotests 109 depends on the
subtle timing of the completion of read and write issued in the first
mirror iteration.  The second iteration may or may not have run when the
I/O error happens, resulting in non-deterministic output of the
BLOCK_JOB_COMPLETED event text.

Similar to what was done in a752e4786, filter out the field to make the
test robust.

Signed-off-by: Fam Zheng 
Reviewed-by: Eric Blake 
Tested-by: Eric Blake 
Signed-off-by: Kevin Wolf 
---
 tests/qemu-iotests/109   |  6 --
 tests/qemu-iotests/109.out   | 20 ++--
 tests/qemu-iotests/common.filter |  6 ++
 3 files changed, 20 insertions(+), 12 deletions(-)

diff --git a/tests/qemu-iotests/109 b/tests/qemu-iotests/109
index 6161633..3b496a3 100755
--- a/tests/qemu-iotests/109
+++ b/tests/qemu-iotests/109
@@ -81,7 +81,8 @@ for fmt in qcow qcow2 qed vdi vmdk vpc; do
 
 # This first test should fail: The image format was probed, we may not
 # write an image header at the start of the image
-run_qemu "$TEST_IMG" "$TEST_IMG.src" "" "BLOCK_JOB_ERROR"
+run_qemu "$TEST_IMG" "$TEST_IMG.src" "" "BLOCK_JOB_ERROR" |
+_filter_block_job_len
 $QEMU_IO -c 'read -P 0 0 64k' "$TEST_IMG" | _filter_qemu_io
 
 
@@ -104,7 +105,8 @@ for sample_img in empty.bochs iotest-dirtylog-10G-4M.vhdx 
parallels-v1 \
 _make_test_img 64M
 bzcat "$SAMPLE_IMG_DIR/$sample_img.bz2" > "$TEST_IMG.src"
 
-run_qemu "$TEST_IMG" "$TEST_IMG.src" "" "BLOCK_JOB_ERROR" | 
_filter_block_job_offset
+run_qemu "$TEST_IMG" "$TEST_IMG.src" "" "BLOCK_JOB_ERROR" |
+_filter_block_job_offset | _filter_block_job_len
 $QEMU_IO -c 'read -P 0 0 64k' "$TEST_IMG" | _filter_qemu_io
 
 run_qemu "$TEST_IMG" "$TEST_IMG.src" "'format': 'raw'," "BLOCK_JOB_READY"
diff --git a/tests/qemu-iotests/109.out b/tests/qemu-iotests/109.out
index 55fe536..dc02f9e 100644
--- a/tests/qemu-iotests/109.out
+++ b/tests/qemu-iotests/109.out
@@ -10,7 +10,7 @@ Automatically detecting the format is dangerous for raw 
images, write operations
 Specify the 'raw' format explicitly to remove the restrictions.
 {"return": {}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": 
"BLOCK_JOB_ERROR", "data": {"device": "src", "operation": "write", "action": 
"report"}}
-{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": 
"BLOCK_JOB_COMPLETED", "data": {"device": "src", "len": 65536, "offset": 0, 
"speed": 0, "type": "mirror", "error": "Operation not permitted"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": 
"BLOCK_JOB_COMPLETED", "data": {"device": "src", "len": LEN, "offset": 0, 
"speed": 0, "type": "mirror", "error": "Operation not permitted"}}
 {"return": []}
 read 65536/65536 bytes at offset 0
 64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
@@ -31,7 +31,7 @@ Automatically detecting the format is dangerous for raw 
images, write operations
 Specify the 'raw' format explicitly to remove the restrictions.
 {"return": {}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": 
"BLOCK_JOB_ERROR", "data": {"device": "src", "operation": "write", "action": 
"report"}}
-{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": 
"BLOCK_JOB_COMPLETED", "data": {"device": "src", "len": 197120, "offset": 512, 
"speed": 0, "type": "mirror", "error": "Operation not permitted"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": 
"BLOCK_JOB_COMPLETED", "data": {"device": "src", "len": LEN, "offset": 512, 
"speed": 0, "type": "mirror", "error": "Operation not permitted"}}
 {"return": []}
 read 65536/65536 bytes at offset 0
 64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
@@ -52,7 +52,7 @@ Automatically detecting the format is dangerous for raw 
images, write operations
 Specify the 'raw' format explicitly to remove the restrictions.
 {"return": {}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": 
"BLOCK_JOB_ERROR", "data": {"device": "src", "operation": "write", "action": 
"report"}}
-{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": 
"BLOCK_JOB_COMPLETED", "data": {"device": "src", "len": 327680, "offset": 
262144, "speed": 0, "type": "mirror", "error": "Operation not permitted"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": 
"BLOCK_JOB_COMPLETED", "data": {"device": "src", "len": LEN, "offset": 262144, 
"speed": 0, "type": "mirror", "error": "Operation not permitted"}}
 {"return": []}
 read 65536/65536 bytes at offset 0
 64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
@@ -73,7 +73,7 @@

[Qemu-devel] [PULL 15/34] Issue a deprecation warning if the user specifies the "-hdachs" option.

2017-04-28 Thread Kevin Wolf

From: Thomas Huth 

If the user needs to specify the disk geometry, the corresponding
parameters of the "-device ide-hd" option should be used instead.
"-hdachs" is considered as deprecated and might be removed soon.

Signed-off-by: Thomas Huth 
Reviewed-by: Eric Blake 
Signed-off-by: Kevin Wolf 
---
 qemu-options.hx | 4 ++--
 vl.c| 2 ++
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/qemu-options.hx b/qemu-options.hx
index 787b9c3..f68829f 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -803,8 +803,8 @@ STEXI
 Force hard disk 0 physical geometry (1 <= @var{c} <= 16383, 1 <=
 @var{h} <= 16, 1 <= @var{s} <= 63) and optionally force the BIOS
 translation mode (@var{t}=none, lba or auto). Usually QEMU can guess
-all those parameters. This option is useful for old MS-DOS disk
-images.
+all those parameters. This option is deprecated, please use
+@code{-device ide-hd,cyls=c,heads=h,secs=s,...} instead.
 ETEXI
 
 DEF("fsdev", HAS_ARG, QEMU_OPTION_fsdev,
diff --git a/vl.c b/vl.c
index f46e070..42d4bce 100644
--- a/vl.c
+++ b/vl.c
@@ -3231,6 +3231,8 @@ int main(int argc, char **argv, char **envp)
 }
 }
 }
+error_report("'-hdachs' is deprecated, please use '-device"
+ " ide-hd,cyls=c,heads=h,secs=s,...' instead");
 break;
 case QEMU_OPTION_numa:
 opts = qemu_opts_parse_noisily(qemu_find_opts("numa"),
-- 
1.8.3.1

[Qemu-devel] [PULL 21/34] qemu-img/convert: Move bs_n > 1 && -B check down

2017-04-28 Thread Kevin Wolf

From: Max Reitz 

It does not make much sense to use a backing image for the target when
you concatenate multiple images (because then there is no correspondence
between the source images' backing files and the target's); but it was
still possible to give one by using -o backing_file=X instead of -B X.

Fix this by moving the check.

(Also, change the error message because -B is not the only way to
 specify the backing file, evidently.)

Signed-off-by: Max Reitz 
Reviewed-by: Eric Blake 
Signed-off-by: Kevin Wolf 
---
 qemu-img.c | 13 +++--
 tests/qemu-iotests/122.out |  4 ++--
 2 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/qemu-img.c b/qemu-img.c
index e0503ae..7044884 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -2089,12 +2089,6 @@ static int img_convert(int argc, char **argv)
 }
 
 
-if (s.src_num > 1 && out_baseimg) {
-error_report("-B makes no sense when concatenating multiple input "
- "images");
-goto fail_getopt;
-}
-
 /* ret is still -EINVAL until here */
 ret = bdrv_parse_cache_mode(src_cache, _flags, _writethrough);
 if (ret < 0) {
@@ -2208,6 +2202,13 @@ static int img_convert(int argc, char **argv)
 }
 s.target_has_backing = (bool) out_baseimg;
 
+if (s.src_num > 1 && out_baseimg) {
+error_report("Having a backing file for the target makes no sense when 
"
+ "concatenating multiple input images");
+ret = -1;
+goto out;
+}
+
 /* Check if compression is supported */
 if (s.compressed) {
 bool encryption =
diff --git a/tests/qemu-iotests/122.out b/tests/qemu-iotests/122.out
index 98814de..9317d80 100644
--- a/tests/qemu-iotests/122.out
+++ b/tests/qemu-iotests/122.out
@@ -61,8 +61,8 @@ read 65536/65536 bytes at offset 4194304
 64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 read 65536/65536 bytes at offset 8388608
 64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
-qemu-img: -B makes no sense when concatenating multiple input images
-qemu-img: -B makes no sense when concatenating multiple input images
+qemu-img: Having a backing file for the target makes no sense when 
concatenating multiple input images
+qemu-img: Having a backing file for the target makes no sense when 
concatenating multiple input images
 
 === Compression with misaligned allocations and image sizes ===
 
-- 
1.8.3.1

[Qemu-devel] [PULL 23/34] block/vhdx: Make vhdx_create() always set errp

2017-04-28 Thread Kevin Wolf

From: Max Reitz 

This patch makes vhdx_create() always set errp in case of an error. It
also adds errp parameters to vhdx_create_bat() and
vhdx_create_new_region_table() so we can pass on the error object
generated by blk_truncate() as of a future commit.

Signed-off-by: Max Reitz 
Reviewed-by: Kevin Wolf 
Message-id: 20170328205129.15138-2-mre...@redhat.com
Reviewed-by: Stefan Hajnoczi 
Signed-off-by: Max Reitz 
---
 block/vhdx.c | 23 +++
 1 file changed, 19 insertions(+), 4 deletions(-)

diff --git a/block/vhdx.c b/block/vhdx.c
index 052a753..d25bcd9 100644
--- a/block/vhdx.c
+++ b/block/vhdx.c
@@ -1586,7 +1586,7 @@ exit:
 static int vhdx_create_bat(BlockBackend *blk, BDRVVHDXState *s,
uint64_t image_size, VHDXImageType type,
bool use_zero_blocks, uint64_t file_offset,
-   uint32_t length)
+   uint32_t length, Error **errp)
 {
 int ret = 0;
 uint64_t data_file_offset;
@@ -1609,14 +1609,19 @@ static int vhdx_create_bat(BlockBackend *blk, 
BDRVVHDXState *s,
  * is the furthest thing we have written yet */
 ret = blk_truncate(blk, data_file_offset);
 if (ret < 0) {
+error_setg_errno(errp, -ret,
+"Failed to resize the underlying file");
 goto exit;
 }
 } else if (type == VHDX_TYPE_FIXED) {
 ret = blk_truncate(blk, data_file_offset + image_size);
 if (ret < 0) {
+error_setg_errno(errp, -ret,
+"Failed to resize the underlying file");
 goto exit;
 }
 } else {
+error_setg(errp, "Unsupported image type");
 ret = -ENOTSUP;
 goto exit;
 }
@@ -1627,6 +1632,7 @@ static int vhdx_create_bat(BlockBackend *blk, 
BDRVVHDXState *s,
 /* for a fixed file, the default BAT entry is not zero */
 s->bat = g_try_malloc0(length);
 if (length && s->bat == NULL) {
+error_setg(errp, "Failed to allocate memory for the BAT");
 ret = -ENOMEM;
 goto exit;
 }
@@ -1646,6 +1652,7 @@ static int vhdx_create_bat(BlockBackend *blk, 
BDRVVHDXState *s,
 }
 ret = blk_pwrite(blk, file_offset, s->bat, length, 0);
 if (ret < 0) {
+error_setg_errno(errp, -ret, "Failed to write the BAT");
 goto exit;
 }
 }
@@ -1671,7 +1678,8 @@ static int vhdx_create_new_region_table(BlockBackend *blk,
 uint32_t log_size,
 bool use_zero_blocks,
 VHDXImageType type,
-uint64_t *metadata_offset)
+uint64_t *metadata_offset,
+Error **errp)
 {
 int ret = 0;
 uint32_t offset = 0;
@@ -1740,7 +1748,7 @@ static int vhdx_create_new_region_table(BlockBackend *blk,
 /* The region table gives us the data we need to create the BAT,
  * so do that now */
 ret = vhdx_create_bat(blk, s, image_size, type, use_zero_blocks,
-  bat_file_offset, bat_length);
+  bat_file_offset, bat_length, errp);
 if (ret < 0) {
 goto exit;
 }
@@ -1749,12 +1757,14 @@ static int vhdx_create_new_region_table(BlockBackend 
*blk,
 ret = blk_pwrite(blk, VHDX_REGION_TABLE_OFFSET, buffer,
  VHDX_HEADER_BLOCK_SIZE, 0);
 if (ret < 0) {
+error_setg_errno(errp, -ret, "Failed to write first region table");
 goto exit;
 }
 
 ret = blk_pwrite(blk, VHDX_REGION_TABLE2_OFFSET, buffer,
  VHDX_HEADER_BLOCK_SIZE, 0);
 if (ret < 0) {
+error_setg_errno(errp, -ret, "Failed to write second region table");
 goto exit;
 }
 
@@ -1825,6 +1835,7 @@ static int vhdx_create(const char *filename, QemuOpts 
*opts, Error **errp)
 ret = -ENOTSUP;
 goto exit;
 } else {
+error_setg(errp, "Invalid subformat '%s'", type);
 ret = -EINVAL;
 goto exit;
 }
@@ -1879,12 +1890,14 @@ static int vhdx_create(const char *filename, QemuOpts 
*opts, Error **errp)
 ret = blk_pwrite(blk, VHDX_FILE_ID_OFFSET, , sizeof(signature),
  0);
 if (ret < 0) {
+error_setg_errno(errp, -ret, "Failed to write file signature");
 goto delete_and_exit;
 }
 if (creator) {
 ret = blk_pwrite(blk, VHDX_FILE_ID_OFFSET + sizeof(signature),
  creator, creator_items * sizeof(gunichar2), 0);
 if (ret < 0) {
+error_setg_errno(errp, -ret, "Failed to write creator field");
 goto delete_and_exit;
 }
 }
@@ -1893,13 +1906,14 @@ static int vhdx_create(const char

[Qemu-devel] [PULL 18/34] block: Do not unref bs->file on error in BD's open

2017-04-28 Thread Kevin Wolf

From: Max Reitz 

The block layer takes care of removing the bs->file child if the block
driver's bdrv_open()/bdrv_file_open() implementation fails. The block
driver therefore does not need to do so, and indeed should not unless it
sets bs->file to NULL afterwards -- because if this is not done, the
bdrv_unref_child() in bdrv_open_inherit() will dereference the freed
memory block at bs->file afterwards, which is not good.

We can now decide whether to add a "bs->file = NULL;" after each of the
offending bdrv_unref_child() invocations, or just drop them altogether.
The latter is simpler, so let's do that.

Cc: qemu-stable 
Signed-off-by: Max Reitz 
Reviewed-by: Eric Blake 
Reviewed-by: Stefan Hajnoczi 
Signed-off-by: Kevin Wolf 
---
 block/blkdebug.c  | 4 +---
 block/blkreplay.c | 3 ---
 block/blkverify.c | 3 ---
 3 files changed, 1 insertion(+), 9 deletions(-)

diff --git a/block/blkdebug.c b/block/blkdebug.c
index 67e8024..cc4a146 100644
--- a/block/blkdebug.c
+++ b/block/blkdebug.c
@@ -389,14 +389,12 @@ static int blkdebug_open(BlockDriverState *bs, QDict 
*options, int flags,
 } else if (align) {
 error_setg(errp, "Invalid alignment");
 ret = -EINVAL;
-goto fail_unref;
+goto out;
 }
 
 ret = 0;
 goto out;
 
-fail_unref:
-bdrv_unref_child(bs, bs->file);
 out:
 if (ret < 0) {
 g_free(s->config_file);
diff --git a/block/blkreplay.c b/block/blkreplay.c
index e110211..6aa5fd4 100755
--- a/block/blkreplay.c
+++ b/block/blkreplay.c
@@ -37,9 +37,6 @@ static int blkreplay_open(BlockDriverState *bs, QDict 
*options, int flags,
 
 ret = 0;
 fail:
-if (ret < 0) {
-bdrv_unref_child(bs, bs->file);
-}
 return ret;
 }
 
diff --git a/block/blkverify.c b/block/blkverify.c
index 9a1e21c..af23281 100644
--- a/block/blkverify.c
+++ b/block/blkverify.c
@@ -142,9 +142,6 @@ static int blkverify_open(BlockDriverState *bs, QDict 
*options, int flags,
 
 ret = 0;
 fail:
-if (ret < 0) {
-bdrv_unref_child(bs, bs->file);
-}
 qemu_opts_del(opts);
 return ret;
 }
-- 
1.8.3.1

[Qemu-devel] [PULL 12/34] qemu_iotests: Remove _readlink()

2017-04-28 Thread Kevin Wolf

It is unused.

Suggested-by: Fam Zheng 
Signed-off-by: Kevin Wolf 
Reviewed-by: Max Reitz 
Reviewed-by: Eric Blake 
Reviewed-by: Fam Zheng 
---
 tests/qemu-iotests/common.config | 18 --
 1 file changed, 18 deletions(-)

diff --git a/tests/qemu-iotests/common.config b/tests/qemu-iotests/common.config
index 5ee00a1..d1b45f5 100644
--- a/tests/qemu-iotests/common.config
+++ b/tests/qemu-iotests/common.config
@@ -217,23 +217,5 @@ fi
 
 export SAMPLE_IMG_DIR
 
-_readlink()
-{
-if [ $# -ne 1 ]; then
-echo "Usage: _readlink filename" 1>&2
-exit 1
-fi
-
-perl -e "\$in=\"$1\";" -e '
-$lnk = readlink($in);
-if ($lnk =~ m!^/.*!) {
-print "$lnk\n";
-}
-else {
-chomp($dir = `dirname $in`);
-print "$dir/$lnk\n";
-}'
-}
-
 # make sure this script returns success
 true
-- 
1.8.3.1

[Qemu-devel] [PULL 20/34] qemu-img/convert: Use @opts for one thing only

2017-04-28 Thread Kevin Wolf

From: Max Reitz 

After storing the creation options for the new image into @opts, we
fetch some things for our own information, like the backing file name,
or whether to use encryption or preallocation.

With the -n parameter, there will not be any creation options; this is
not too bad because this just means that querying a NULL @opts will
always return the default value.

However, we also use @opts for the --object options. Therefore, @opts is
not necessarily NULL if -n was specified; instead, it may contain those
options. In practice, this probably does not cause any problems because
there most likely is no object that supports any of the parameters we
query here, but this is neither something we should rely on nor does
this variable reuse make the code very nice to read.

Therefore, just use a separate variable for the --object options.

Signed-off-by: Max Reitz 
Reviewed-by: Eric Blake 
Signed-off-by: Kevin Wolf 
---
 qemu-img.c | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/qemu-img.c b/qemu-img.c
index b94fc11..e0503ae 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -2049,13 +2049,15 @@ static int img_convert(int argc, char **argv)
 case 'W':
 s.wr_in_order = false;
 break;
-case OPTION_OBJECT:
-opts = qemu_opts_parse_noisily(_object_opts,
-   optarg, true);
-if (!opts) {
+case OPTION_OBJECT: {
+QemuOpts *object_opts;
+object_opts = qemu_opts_parse_noisily(_object_opts,
+  optarg, true);
+if (!object_opts) {
 goto fail_getopt;
 }
 break;
+}
 case OPTION_IMAGE_OPTS:
 image_opts = true;
 break;
-- 
1.8.3.1

[Qemu-devel] [PULL 14/34] iotests: Launch qemu-nbd with -e 42

2017-04-28 Thread Kevin Wolf

From: Max Reitz 

There is no reason for the qemu-nbd server used for tests not to accept
an arbitrary number of clients. In fact, test 181 will require it to
accept two clients at the same time (and thus it fails before this
patch).

This patch updates common.rc to launch qemu-nbd with -e 42 which should
be enough for all of our current and future tests.

Signed-off-by: Max Reitz 
Reviewed-by: Eric Blake 
Signed-off-by: Kevin Wolf 
---
 tests/qemu-iotests/common.rc | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/tests/qemu-iotests/common.rc b/tests/qemu-iotests/common.rc
index 62529ee..9fd3130 100644
--- a/tests/qemu-iotests/common.rc
+++ b/tests/qemu-iotests/common.rc
@@ -171,7 +171,9 @@ _make_test_img()
 
 # Start an NBD server on the image file, which is what we'll be talking to
 if [ $IMGPROTO = "nbd" ]; then
-eval "$QEMU_NBD -v -t -b 127.0.0.1 -p 10810 -f $IMGFMT  $TEST_IMG_FILE 
>/dev/null &"
+# Pass a sufficiently high number to -e that should be enough for all
+# tests
+eval "$QEMU_NBD -v -t -b 127.0.0.1 -p 10810 -f $IMGFMT -e 42  
$TEST_IMG_FILE >/dev/null &"
 sleep 1 # FIXME: qemu-nbd needs to be listening before we continue
 fi
 
-- 
1.8.3.1

[Qemu-devel] [PULL 22/34] qemu-img: Document backing options

2017-04-28 Thread Kevin Wolf

From: Max Reitz 

The create and convert subcommands have shorthands to set the
backing_file and, in the case of create, the backing_fmt options for the
new image. However, they have not been documented so far, which is
remedied by this patch.

Reported-by: Eric Blake 
Signed-off-by: Max Reitz 
Reviewed-by: Eric Blake 
Signed-off-by: Kevin Wolf 
---
 qemu-img-cmds.hx | 8 
 qemu-img.texi| 4 ++--
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/qemu-img-cmds.hx b/qemu-img-cmds.hx
index 8ac7822..bf4ce59 100644
--- a/qemu-img-cmds.hx
+++ b/qemu-img-cmds.hx
@@ -22,9 +22,9 @@ STEXI
 ETEXI
 
 DEF("create", img_create,
-"create [-q] [--object objectdef] [-f fmt] [-o options] filename [size]")
+"create [-q] [--object objectdef] [-f fmt] [-b backing_file] [-F 
backing_fmt] [-o options] filename [size]")
 STEXI
-@item create [--object @var{objectdef}] [-q] [-f @var{fmt}] [-o @var{options}] 
@var{filename} [@var{size}]
+@item create [--object @var{objectdef}] [-q] [-f @var{fmt}] [-b 
@var{backing_file}] [-F @var{backing_fmt}] [-o @var{options}] @var{filename} 
[@var{size}]
 ETEXI
 
 DEF("commit", img_commit,
@@ -40,9 +40,9 @@ STEXI
 ETEXI
 
 DEF("convert", img_convert,
-"convert [--object objectdef] [--image-opts] [-c] [-p] [-q] [-n] [-f fmt] 
[-t cache] [-T src_cache] [-O output_fmt] [-o options] [-s snapshot_id_or_name] 
[-l snapshot_param] [-S sparse_size] [-m num_coroutines] [-W] filename 
[filename2 [...]] output_filename")
+"convert [--object objectdef] [--image-opts] [-c] [-p] [-q] [-n] [-f fmt] 
[-t cache] [-T src_cache] [-O output_fmt] [-B backing_file] [-o options] [-s 
snapshot_id_or_name] [-l snapshot_param] [-S sparse_size] [-m num_coroutines] 
[-W] filename [filename2 [...]] output_filename")
 STEXI
-@item convert [--object @var{objectdef}] [--image-opts] [-c] [-p] [-q] [-n] 
[-f @var{fmt}] [-t @var{cache}] [-T @var{src_cache}] [-O @var{output_fmt}] [-o 
@var{options}] [-s @var{snapshot_id_or_name}] [-l @var{snapshot_param}] [-S 
@var{sparse_size}] [-m @var{num_coroutines}] [-W] @var{filename} 
[@var{filename2} [...]] @var{output_filename}
+@item convert [--object @var{objectdef}] [--image-opts] [-c] [-p] [-q] [-n] 
[-f @var{fmt}] [-t @var{cache}] [-T @var{src_cache}] [-O @var{output_fmt}] [-B 
@var{backing_file}] [-o @var{options}] [-s @var{snapshot_id_or_name}] [-l 
@var{snapshot_param}] [-S @var{sparse_size}] [-m @var{num_coroutines}] [-W] 
@var{filename} [@var{filename2} [...]] @var{output_filename}
 ETEXI
 
 DEF("dd", img_dd,
diff --git a/qemu-img.texi b/qemu-img.texi
index c81db3e..8c573ae 100644
--- a/qemu-img.texi
+++ b/qemu-img.texi
@@ -224,7 +224,7 @@ If @code{-r} is specified, exit codes representing the 
image state refer to the
 state after (the attempt at) repairing it. That is, a successful @code{-r all}
 will yield the exit code 0, independently of the image state before.
 
-@item create [-f @var{fmt}] [-o @var{options}] @var{filename} [@var{size}]
+@item create [-f @var{fmt}] [-b @var{backing_file}] [-F @var{backing_fmt}] [-o 
@var{options}] @var{filename} [@var{size}]
 
 Create the new disk image @var{filename} of size @var{size} and format
 @var{fmt}. Depending on the file format, you can add one or more @var{options}
@@ -302,7 +302,7 @@ Error on reading data
 
 @end table
 
-@item convert [-c] [-p] [-n] [-f @var{fmt}] [-t @var{cache}] [-T 
@var{src_cache}] [-O @var{output_fmt}] [-o @var{options}] [-s 
@var{snapshot_id_or_name}] [-l @var{snapshot_param}] [-m @var{num_coroutines}] 
[-W] [-S @var{sparse_size}] @var{filename} [@var{filename2} [...]] 
@var{output_filename}
+@item convert [-c] [-p] [-n] [-f @var{fmt}] [-t @var{cache}] [-T 
@var{src_cache}] [-O @var{output_fmt}] [-B @var{backing_file}] [-o 
@var{options}] [-s @var{snapshot_id_or_name}] [-l @var{snapshot_param}] [-m 
@var{num_coroutines}] [-W] [-S @var{sparse_size}] @var{filename} 
[@var{filename2} [...]] @var{output_filename}
 
 Convert the disk image @var{filename} or a snapshot 
@var{snapshot_param}(@var{snapshot_id_or_name} is deprecated)
 to disk image @var{output_filename} using format @var{output_fmt}. It can be 
optionally compressed (@code{-c}
-- 
1.8.3.1

[Qemu-devel] [PULL 09/34] block: An empty filename counts as no filename

2017-04-28 Thread Kevin Wolf

From: Max Reitz 

Reproducer:
$ ./qemu-img info ''
qemu-img: ./block.c:1008: bdrv_open_driver: Assertion
`!drv->bdrv_needs_filename || bs->filename[0]' failed.
[1]26105 abort (core dumped)  ./qemu-img info ''

This patch fixes this to be:
$ ./qemu-img info ''
qemu-img: Could not open '': The 'file' block driver requires a file
name

Cc: qemu-stable 
Signed-off-by: Max Reitz 
Reviewed-by: Eric Blake 
Reviewed-by: Fam Zheng 
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Kevin Wolf 
---
 block.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block.c b/block.c
index ceaca44..7b557f3 100644
--- a/block.c
+++ b/block.c
@@ -1204,7 +1204,7 @@ static int bdrv_open_common(BlockDriverState *bs, 
BlockBackend *file,
 filename = qdict_get_try_str(options, "filename");
 }
 
-if (drv->bdrv_needs_filename && !filename) {
+if (drv->bdrv_needs_filename && (!filename || !filename[0])) {
 error_setg(errp, "The '%s' block driver requires a file name",
drv->format_name);
 ret = -EINVAL;
-- 
1.8.3.1

[Qemu-devel] [PULL 13/34] block: Remove NULL check in bdrv_co_flush

2017-04-28 Thread Kevin Wolf

From: Fam Zheng 

Reported by Coverity. We already use bs in bdrv_inc_in_flight before
checking for NULL. It is unnecessary as all callers pass non-NULL bs, so
drop it.

Signed-off-by: Fam Zheng 
Reviewed-by: Max Reitz 
Signed-off-by: Kevin Wolf 
---
 block/io.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/io.c b/block/io.c
index a32fd1d..80d6c62 100644
--- a/block/io.c
+++ b/block/io.c
@@ -2300,7 +2300,7 @@ int coroutine_fn bdrv_co_flush(BlockDriverState *bs)
 
 bdrv_inc_in_flight(bs);
 
-if (!bs || !bdrv_is_inserted(bs) || bdrv_is_read_only(bs) ||
+if (!bdrv_is_inserted(bs) || bdrv_is_read_only(bs) ||
 bdrv_is_sg(bs)) {
 goto early_exit;
 }
-- 
1.8.3.1

[Qemu-devel] [PULL 10/34] iotests/051: Add test for empty filename

2017-04-28 Thread Kevin Wolf

From: Max Reitz 

Signed-off-by: Max Reitz 
Reviewed-by: Eric Blake 
Reviewed-by: Fam Zheng 
Signed-off-by: Kevin Wolf 
---
 tests/qemu-iotests/051| 1 +
 tests/qemu-iotests/051.out| 3 +++
 tests/qemu-iotests/051.pc.out | 3 +++
 3 files changed, 7 insertions(+)

diff --git a/tests/qemu-iotests/051 b/tests/qemu-iotests/051
index 7c78a02..26c29de 100755
--- a/tests/qemu-iotests/051
+++ b/tests/qemu-iotests/051
@@ -232,6 +232,7 @@ echo === Leaving out required options ===
 echo
 
 run_qemu -drive driver=file
+run_qemu -drive driver=file,filename=
 run_qemu -drive driver=nbd
 run_qemu -drive driver=raw
 run_qemu -drive file.driver=file
diff --git a/tests/qemu-iotests/051.out b/tests/qemu-iotests/051.out
index 3b7b040..4d3b1ff 100644
--- a/tests/qemu-iotests/051.out
+++ b/tests/qemu-iotests/051.out
@@ -221,6 +221,9 @@ QEMU X.Y.Z monitor - type 'help' for more information
 Testing: -drive driver=file
 QEMU_PROG: -drive driver=file: The 'file' block driver requires a file name
 
+Testing: -drive driver=file,filename=
+QEMU_PROG: -drive driver=file,filename=: The 'file' block driver requires a 
file name
+
 Testing: -drive driver=nbd
 QEMU_PROG: -drive driver=nbd: NBD server address missing
 
diff --git a/tests/qemu-iotests/051.pc.out b/tests/qemu-iotests/051.pc.out
index 7f9a121..76d7205 100644
--- a/tests/qemu-iotests/051.pc.out
+++ b/tests/qemu-iotests/051.pc.out
@@ -319,6 +319,9 @@ QEMU X.Y.Z monitor - type 'help' for more information
 Testing: -drive driver=file
 QEMU_PROG: -drive driver=file: The 'file' block driver requires a file name
 
+Testing: -drive driver=file,filename=
+QEMU_PROG: -drive driver=file,filename=: The 'file' block driver requires a 
file name
+
 Testing: -drive driver=nbd
 QEMU_PROG: -drive driver=nbd: NBD server address missing
 
-- 
1.8.3.1

[Qemu-devel] [PULL 16/34] iotests: Fix typo in 026

2017-04-28 Thread Kevin Wolf

From: Eric Blake 

s/refcout/refcount/

CC: qemu-triv...@nongnu.org
Signed-off-by: Eric Blake 
Reviewed-by: Laurent Vivier 
Signed-off-by: Kevin Wolf 
---
 tests/qemu-iotests/026 | 2 +-
 tests/qemu-iotests/026.out | 2 +-
 tests/qemu-iotests/026.out.nocache | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/tests/qemu-iotests/026 b/tests/qemu-iotests/026
index f5a7f02..7fadfba 100755
--- a/tests/qemu-iotests/026
+++ b/tests/qemu-iotests/026
@@ -119,7 +119,7 @@ done
 
 
 echo
-echo === Refcout table growth tests ===
+echo === Refcount table growth tests ===
 echo
 CLUSTER_SIZE=512
 
diff --git a/tests/qemu-iotests/026.out b/tests/qemu-iotests/026.out
index 59b8f74..86a50a2 100644
--- a/tests/qemu-iotests/026.out
+++ b/tests/qemu-iotests/026.out
@@ -462,7 +462,7 @@ Event: cluster_alloc; errno: 28; imm: off; once: off; write 
-b
 write failed: No space left on device
 No errors were found on the image.
 
-=== Refcout table growth tests ===
+=== Refcount table growth tests ===
 
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824
 
diff --git a/tests/qemu-iotests/026.out.nocache 
b/tests/qemu-iotests/026.out.nocache
index b4aeebc..ea2e166 100644
--- a/tests/qemu-iotests/026.out.nocache
+++ b/tests/qemu-iotests/026.out.nocache
@@ -470,7 +470,7 @@ Event: cluster_alloc; errno: 28; imm: off; once: off; write 
-b
 write failed: No space left on device
 No errors were found on the image.
 
-=== Refcout table growth tests ===
+=== Refcount table growth tests ===
 
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824 
 
-- 
1.8.3.1

[Qemu-devel] [PULL 00/34] Block layer patches

2017-04-28 Thread Kevin Wolf

The following changes since commit 81b2d5ceb0cfb4cdc2163492e3169ed714b0cda9:

  Merge remote-tracking branch 'remotes/rth/tags/pull-tcg-20170426' into 
staging (2017-04-26 20:50:49 +0100)

are available in the git repository at:


  git://repo.or.cz/qemu/kevin.git tags/for-upstream

for you to fetch changes up to 5fc0fe383fff318b38291dcdf2cf38e329ec232a:

  Merge remote-tracking branch 'mreitz/tags/pull-block-2017-04-28' into 
queue-block (2017-04-28 20:52:17 +0200)



Block layer patches


Denis V. Lunev (2):
  block: fix alignment calculations in bdrv_co_do_zero_pwritev
  block: assert no image modification under BDRV_O_INACTIVE

Eric Blake (2):
  iotests: Fix typo in 026
  qcow2: Allow discard of final unaligned cluster

Fam Zheng (2):
  block: Remove NULL check in bdrv_co_flush
  iotests: 109: Filter out "len" of failed jobs

John Snow (2):
  iotests: clarify help text
  iotests: fix exclusion option

Kevin Wolf (8):
  file-posix: Remove unnecessary includes
  file-win32: Remove unnecessary include
  migration: Call blk_resume_after_migration() for postcopy
  qemu-iotests: Filter HMP readline escape characters
  qemu-iotests: Test postcopy migration
  qemu-iotests: Remove PERL_PROG and BC_PROG
  qemu_iotests: Remove _readlink()
  Merge remote-tracking branch 'mreitz/tags/pull-block-2017-04-28' into 
queue-block

Klim Kireev (1):
  block: fix obvious coding style mistakes in block_int.h

Krzysztof Kozlowski (1):
  block: Constify data passed by pointer to blk_name

Lidong Chen (1):
  qemu-img: use blk_co_pwrite_zeroes for zero sectors when compressed

Max Reitz (13):
  Revert "block/io: Comment out permission assertions"
  block: An empty filename counts as no filename
  iotests/051: Add test for empty filename
  iotests: Launch qemu-nbd with -e 42
  block: Do not unref bs->file on error in BD's open
  qemu-img/convert: Use @opts for one thing only
  qemu-img/convert: Move bs_n > 1 && -B check down
  qemu-img: Document backing options
  block/vhdx: Make vhdx_create() always set errp
  block: Add errp to b{lk,drv}_truncate()
  block: Add errp to BD.bdrv_truncate()
  block: Add .bdrv_truncate() error messages
  progress: Show current progress on SIGINFO

Peter Lieven (1):
  qemu-img: simplify img_convert

Thomas Huth (1):
  Issue a deprecation warning if the user specifies the "-hdachs" option.

Vladimir Sementsov-Ogievskiy (1):
  qemu-img: improve convert_iteration_sectors()

 block.c|  26 +--
 block/blkdebug.c   |   8 +-
 block/blkreplay.c  |   3 -
 block/blkverify.c  |   3 -
 block/block-backend.c  |   7 +-
 block/commit.c |   5 +-
 block/crypto.c |   5 +-
 block/file-posix.c |  21 ++-
 block/file-win32.c |   7 +-
 block/gluster.c|   7 +-
 block/io.c |  16 +-
 block/iscsi.c  |   6 +-
 block/mirror.c |   2 +-
 block/nfs.c|  12 +-
 block/parallels.c  |  13 +-
 block/qcow.c   |   6 +-
 block/qcow2-refcount.c |   5 +-
 block/qcow2.c  |  31 ++--
 block/qed.c|   8 +-
 block/raw-format.c |   6 +-
 block/rbd.c|   3 +-
 block/sheepdog.c   |  14 +-
 block/vdi.c|   4 +-
 block/vhdx-log.c   |   2 +-
 block/vhdx.c   |  25 ++-
 block/vmdk.c   |  13 +-
 block/vpc.c|  13 +-
 blockdev.c |  21 +--
 include/block/block.h  |   2 +-
 include/block/block_int.h  |   8 +-
 include/sysemu/block-backend.h |   4 +-
 migration/savevm.c |   8 +
 qemu-img-cmds.hx   |   8 +-
 qemu-img.c | 313 ++---
 qemu-img.texi  |   7 +-
 qemu-io-cmds.c |   5 +-
 qemu-options.hx|   4 +-
 tests/qemu-iotests/026 |   2 +-
 tests/qemu-iotests/026.out |   2 +-
 tests/qemu-iotests/026.out.nocache |   2 +-
 tests/qemu-iotests/028.out |   2 +-
 tests/qemu-iotests/051 |   4 +-
 tests/qemu-iotests/051.out | 109 ++---
 tests/qemu-iotests/051.pc.out  | 135 
 tests/qemu-iotests/066 |  12 +-
 tests/qemu-iotests/066.out |  12 +-
 tests/qemu-iotests/068 |   4 +-
 tests/qemu-iotests/068.out |   6 +-
 tests/qemu-iotests/109 |   6 +-
 tests/qemu-iotests/109.out |  20

[Qemu-devel] [PULL 08/34] qemu-iotests: Test postcopy migration

2017-04-28 Thread Kevin Wolf

Signed-off-by: Kevin Wolf 
Reviewed-by: Dr. David Alan Gilbert 
---
 tests/qemu-iotests/181 | 119 +
 tests/qemu-iotests/181.out |  38 +++
 tests/qemu-iotests/group   |   1 +
 3 files changed, 158 insertions(+)
 create mode 100755 tests/qemu-iotests/181
 create mode 100644 tests/qemu-iotests/181.out

diff --git a/tests/qemu-iotests/181 b/tests/qemu-iotests/181
new file mode 100755
index 000..e969a2a
--- /dev/null
+++ b/tests/qemu-iotests/181
@@ -0,0 +1,119 @@
+#!/bin/bash
+#
+# Test postcopy live migration with shared storage
+#
+# Copyright (C) 2017 Red Hat, Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see .
+#
+
+# creator
+owner=kw...@redhat.com
+
+seq=`basename $0`
+echo "QA output created by $seq"
+
+here=`pwd`
+status=1   # failure is the default!
+
+MIG_SOCKET="${TEST_DIR}/migrate"
+
+_cleanup()
+{
+rm -f "${MIG_SOCKET}"
+   _cleanup_test_img
+_cleanup_qemu
+}
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+# get standard environment, filters and checks
+. ./common.rc
+. ./common.filter
+. ./common.qemu
+
+_supported_fmt generic
+_supported_proto generic
+_supported_os Linux
+
+size=64M
+_make_test_img $size
+
+echo
+echo === Starting VMs ===
+echo
+
+qemu_comm_method="monitor"
+
+_launch_qemu \
+-drive file="${TEST_IMG}",cache=${CACHEMODE},driver=$IMGFMT,id=disk
+src=$QEMU_HANDLE
+
+_launch_qemu \
+-drive file="${TEST_IMG}",cache=${CACHEMODE},driver=$IMGFMT,id=disk \
+-incoming "unix:${MIG_SOCKET}"
+dest=$QEMU_HANDLE
+
+echo
+echo === Write something on the source ===
+echo
+
+silent=
+_send_qemu_cmd $src 'qemu-io disk "write -P 0x55 0 64k"' "(qemu)"
+_send_qemu_cmd $src "" "ops/sec"
+_send_qemu_cmd $src 'qemu-io disk "read -P 0x55 0 64k"' "(qemu)"
+_send_qemu_cmd $src "" "ops/sec"
+
+echo
+echo === Do postcopy migration to destination ===
+echo
+
+# Slow down migration so much that it definitely won't finish before we can
+# switch to postcopy
+silent=yes
+_send_qemu_cmd $src 'migrate_set_speed 4k' "(qemu)"
+_send_qemu_cmd $src 'migrate_set_capability postcopy-ram on' "(qemu)"
+_send_qemu_cmd $src "migrate -d unix:${MIG_SOCKET}" "(qemu)"
+_send_qemu_cmd $src 'migrate_start_postcopy' "(qemu)"
+
+QEMU_COMM_TIMEOUT=1 qemu_cmd_repeat=10 silent=yes \
+_send_qemu_cmd $src "info migrate" "completed\|failed"
+silent=yes _send_qemu_cmd $src "" "(qemu)"
+
+echo
+echo === Do some I/O on the destination ===
+echo
+
+# It is important that we use the BlockBackend of the guest device here instead
+# of the node name, which would create a new BlockBackend and not test whether
+# the guest has the necessary permissions to access the image now
+silent=
+_send_qemu_cmd $dest 'qemu-io disk "read -P 0x55 0 64k"' "(qemu)"
+_send_qemu_cmd $dest "" "ops/sec"
+_send_qemu_cmd $dest 'qemu-io disk "write -P 0x66 1M 64k"' "(qemu)"
+_send_qemu_cmd $dest "" "ops/sec"
+
+echo
+echo === Shut down and check image ===
+echo
+
+_send_qemu_cmd $src 'quit' ""
+_send_qemu_cmd $dest 'quit' ""
+wait=1 _cleanup_qemu
+
+_check_test_img
+
+# success, all done
+echo "*** done"
+rm -f $seq.full
+status=0
diff --git a/tests/qemu-iotests/181.out b/tests/qemu-iotests/181.out
new file mode 100644
index 000..6534ba2
--- /dev/null
+++ b/tests/qemu-iotests/181.out
@@ -0,0 +1,38 @@
+QA output created by 181
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864
+
+=== Starting VMs ===
+
+
+=== Write something on the source ===
+
+QEMU X.Y.Z monitor - type 'help' for more information
+(qemu) qemu-io disk "write -P 0x55 0 64k"
+wrote 65536/65536 bytes at offset 0
+64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+(qemu) 
+(qemu) qemu-io disk "read -P 0x55 0 64k"
+read 65536/65536 bytes at offset 0
+64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+=== Do postcopy migration to destination ===
+
+
+=== Do some I/O on the destination ===
+
+QEMU X.Y.Z monitor - type 'help' for more information
+(qemu) qemu-io disk "read -P 0x55 0 64k"
+read 65536/65536 bytes at offset 0
+64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+(qemu) 
+(qemu) qemu-io disk "write -P 0x66 1M 64k"
+wrote 65536/65536 bytes at offset 1048576
+64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+=== Shut down and check image ===
+
+(qemu) quit
+(qemu) 
+(qemu) quit
+No errors were found on the image.
+*** done
diff --git

[Qemu-devel] [PULL 11/34] qemu-iotests: Remove PERL_PROG and BC_PROG

2017-04-28 Thread Kevin Wolf

We test for the presence of perl and bc and save their path in the
variables PERL_PROG and BC_PROG, but never actually make use of them.
Remove the checks and assignments so qemu-iotests can run even when
bc isn't installed.

Reported-by: Yash Mankad 
Signed-off-by: Kevin Wolf 
Reviewed-by: Eric Blake 
Reviewed-by: Fam Zheng 
---
 tests/qemu-iotests/common.config | 6 --
 1 file changed, 6 deletions(-)

diff --git a/tests/qemu-iotests/common.config b/tests/qemu-iotests/common.config
index c4b51b3..5ee00a1 100644
--- a/tests/qemu-iotests/common.config
+++ b/tests/qemu-iotests/common.config
@@ -75,18 +75,12 @@ _fatal()
 exit 1
 }
 
-export PERL_PROG="`set_prog_path perl`"
-[ "$PERL_PROG" = "" ] && _fatal "perl not found"
-
 export AWK_PROG="`set_prog_path awk`"
 [ "$AWK_PROG" = "" ] && _fatal "awk not found"
 
 export SED_PROG="`set_prog_path sed`"
 [ "$SED_PROG" = "" ] && _fatal "sed not found"
 
-export BC_PROG="`set_prog_path bc`"
-[ "$BC_PROG" = "" ] && _fatal "bc not found"
-
 export PS_ALL_FLAGS="-ef"
 
 if [ -z "$QEMU_PROG" ]; then
-- 
1.8.3.1

[Qemu-devel] [PULL 01/34] block: Constify data passed by pointer to blk_name

2017-04-28 Thread Kevin Wolf

From: Krzysztof Kozlowski 

blk_name() is not modifying data passed to it through pointer and it
returns also a pointer to const so the argument can be made const for
code safeness.

Signed-off-by: Krzysztof Kozlowski 
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Kevin Wolf 
---
 block/block-backend.c  | 2 +-
 include/sysemu/block-backend.h | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index 7405024..ae3d771 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -420,7 +420,7 @@ void monitor_remove_blk(BlockBackend *blk)
  * Return @blk's name, a non-null string.
  * Returns an empty string iff @blk is not referenced by the monitor.
  */
-const char *blk_name(BlockBackend *blk)
+const char *blk_name(const BlockBackend *blk)
 {
 return blk->name ?: "";
 }
diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h
index 7462228..133e542 100644
--- a/include/sysemu/block-backend.h
+++ b/include/sysemu/block-backend.h
@@ -99,7 +99,7 @@ int blk_get_refcnt(BlockBackend *blk);
 void blk_ref(BlockBackend *blk);
 void blk_unref(BlockBackend *blk);
 void blk_remove_all_bs(void);
-const char *blk_name(BlockBackend *blk);
+const char *blk_name(const BlockBackend *blk);
 BlockBackend *blk_by_name(const char *name);
 BlockBackend *blk_next(BlockBackend *blk);
 bool monitor_add_blk(BlockBackend *blk, const char *name, Error **errp);
-- 
1.8.3.1

[Qemu-devel] [PULL 06/34] migration: Call blk_resume_after_migration() for postcopy

2017-04-28 Thread Kevin Wolf

Commit d35ff5e6 ('block: Ignore guest dev permissions during incoming
migration') added blk_resume_after_migration() to the precopy migration
path, but neglected to add it to the duplicated code that is used for
postcopy migration. This means that the guest device doesn't request the
necessary permissions, which ultimately led to failing assertions.

Add the missing blk_resume_after_migration() to the postcopy path.

Signed-off-by: Kevin Wolf 
Reviewed-by: Eric Blake 
---
 migration/savevm.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/migration/savevm.c b/migration/savevm.c
index 03ae1bd..a00c1ab 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1622,6 +1622,14 @@ static void loadvm_postcopy_handle_run_bh(void *opaque)
 error_report_err(local_err);
 }
 
+/* If we get an error here, just don't restart the VM yet. */
+blk_resume_after_migration(_err);
+if (local_err) {
+error_free(local_err);
+local_err = NULL;
+autostart = false;
+}
+
 trace_loadvm_postcopy_handle_run_cpu_sync();
 cpu_synchronize_all_post_init();
 
-- 
1.8.3.1

[Qemu-devel] [PULL 05/34] qemu-img: simplify img_convert

2017-04-28 Thread Kevin Wolf

From: Peter Lieven 

img_convert has been around before there was an ImgConvertState or
a block backend, but it has never been modified to directly use
these structs. Change this by parsing parameters directly into
the ImgConvertState and directly use BlockBackend where possible.
Furthermore variable initialization has been reworked and sorted.

Signed-off-by: Peter Lieven 
Reviewed-by: Eric Blake 
Signed-off-by: Kevin Wolf 
---
 qemu-img.c | 201 +
 1 file changed, 81 insertions(+), 120 deletions(-)

diff --git a/qemu-img.c b/qemu-img.c
index bbe1574..b94fc11 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -1522,7 +1522,7 @@ typedef struct ImgConvertState {
 int min_sparse;
 size_t cluster_sectors;
 size_t buf_sectors;
-int num_coroutines;
+long num_coroutines;
 int running_coroutines;
 Coroutine *co[MAX_COROUTINES];
 int64_t wait_sector_num[MAX_COROUTINES];
@@ -1916,39 +1916,29 @@ static int convert_do_copy(ImgConvertState *s)
 
 static int img_convert(int argc, char **argv)
 {
-int c, bs_n, bs_i, compress, cluster_sectors, skip_create;
-int64_t ret = 0;
-int progress = 0, flags, src_flags;
-bool writethrough, src_writethrough;
-const char *fmt, *out_fmt, *cache, *src_cache, *out_baseimg, *out_filename;
+int c, bs_i, flags, src_flags = 0;
+const char *fmt = NULL, *out_fmt = "raw", *cache = "unsafe",
+   *src_cache = BDRV_DEFAULT_CACHE, *out_baseimg = NULL,
+   *out_filename, *out_baseimg_param, *snapshot_name = NULL;
 BlockDriver *drv, *proto_drv;
-BlockBackend **blk = NULL, *out_blk = NULL;
-BlockDriverState **bs = NULL, *out_bs = NULL;
-int64_t total_sectors;
-int64_t *bs_sectors = NULL;
-size_t bufsectors = IO_BUF_SIZE / BDRV_SECTOR_SIZE;
 BlockDriverInfo bdi;
-QemuOpts *opts = NULL;
+BlockDriverState *out_bs;
+QemuOpts *opts = NULL, *sn_opts = NULL;
 QemuOptsList *create_opts = NULL;
-const char *out_baseimg_param;
 char *options = NULL;
-const char *snapshot_name = NULL;
-int min_sparse = 8; /* Need at least 4k of zeros for sparse detection */
-bool quiet = false;
 Error *local_err = NULL;
-QemuOpts *sn_opts = NULL;
-ImgConvertState state;
-bool image_opts = false;
-bool wr_in_order = true;
-long num_coroutines = 8;
+bool writethrough, src_writethrough, quiet = false, image_opts = false,
+ skip_create = false, progress = false;
+int64_t ret = -EINVAL;
+
+ImgConvertState s = (ImgConvertState) {
+/* Need at least 4k of zeros for sparse detection */
+.min_sparse = 8,
+.buf_sectors= IO_BUF_SIZE / BDRV_SECTOR_SIZE,
+.wr_in_order= true,
+.num_coroutines = 8,
+};
 
-fmt = NULL;
-out_fmt = "raw";
-cache = "unsafe";
-src_cache = BDRV_DEFAULT_CACHE;
-out_baseimg = NULL;
-compress = 0;
-skip_create = 0;
 for(;;) {
 static const struct option long_options[] = {
 {"help", no_argument, 0, 'h'},
@@ -1981,22 +1971,19 @@ static int img_convert(int argc, char **argv)
 out_baseimg = optarg;
 break;
 case 'c':
-compress = 1;
+s.compressed = true;
 break;
 case 'e':
 error_report("option -e is deprecated, please use \'-o "
   "encryption\' instead!");
-ret = -1;
 goto fail_getopt;
 case '6':
 error_report("option -6 is deprecated, please use \'-o "
   "compat6\' instead!");
-ret = -1;
 goto fail_getopt;
 case 'o':
 if (!is_valid_option_list(optarg)) {
 error_report("Invalid option list: %s", optarg);
-ret = -1;
 goto fail_getopt;
 }
 if (!options) {
@@ -2017,7 +2004,6 @@ static int img_convert(int argc, char **argv)
 if (!sn_opts) {
 error_report("Failed in parsing snapshot param '%s'",
  optarg);
-ret = -1;
 goto fail_getopt;
 }
 } else {
@@ -2031,15 +2017,14 @@ static int img_convert(int argc, char **argv)
 sval = cvtnum(optarg);
 if (sval < 0) {
 error_report("Invalid minimum zero buffer size for sparse 
output specified");
-ret = -1;
 goto fail_getopt;
 }
 
-min_sparse = sval / BDRV_SECTOR_SIZE;
+s.min_sparse = sval / BDRV_SECTOR_SIZE;
 break;
 }
 case 'p':
-progress = 1;
+progress = true;
 break;
 case 't':
 cache = optarg;
@@ -2051,19 +2036,18 @@ static int img_convert(int argc, char

[Qemu-devel] [PULL 02/34] file-posix: Remove unnecessary includes

2017-04-28 Thread Kevin Wolf

Signed-off-by: Kevin Wolf 
---
 block/file-posix.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/block/file-posix.c b/block/file-posix.c
index 0c48968..ade71db 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -25,8 +25,6 @@
 #include "qapi/error.h"
 #include "qemu/cutils.h"
 #include "qemu/error-report.h"
-#include "qemu/timer.h"
-#include "qemu/log.h"
 #include "block/block_int.h"
 #include "qemu/module.h"
 #include "trace.h"
-- 
1.8.3.1

[Qemu-devel] [PULL 04/34] Revert "block/io: Comment out permission assertions"

2017-04-28 Thread Kevin Wolf

From: Max Reitz 

This reverts commit e3e0003a8f6570aba1421ef99a0b383a43371a74.

This commit was necessary for the 2.9 release because we were unable to
fix the underlying issue(s) in time. However, we will be for 2.10.

Signed-off-by: Max Reitz 
Acked-by: Fam Zheng 
Signed-off-by: Kevin Wolf 
---
 block.c|  6 +-
 block/io.c | 12 ++--
 2 files changed, 3 insertions(+), 15 deletions(-)

diff --git a/block.c b/block.c
index 5db266b..ceaca44 100644
--- a/block.c
+++ b/block.c
@@ -3313,11 +3313,7 @@ int bdrv_truncate(BdrvChild *child, int64_t offset)
 BlockDriver *drv = bs->drv;
 int ret;
 
-/* FIXME: Some format block drivers use this function instead of implicitly
- *growing their file by writing beyond its end.
- *See bdrv_aligned_pwritev() for an explanation why we currently
- *cannot assert this permission in that case. */
-// assert(child->perm & BLK_PERM_RESIZE);
+assert(child->perm & BLK_PERM_RESIZE);
 
 if (!drv)
 return -ENOMEDIUM;
diff --git a/block/io.c b/block/io.c
index a7142e0..a32fd1d 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1362,16 +1362,8 @@ static int coroutine_fn bdrv_aligned_pwritev(BdrvChild 
*child,
 assert(!waited || !req->serialising);
 assert(req->overlap_offset <= offset);
 assert(offset + bytes <= req->overlap_offset + req->overlap_bytes);
-/* FIXME: Block migration uses the BlockBackend of the guest device at a
- *point when it has not yet taken write permissions. This will be
- *fixed by a future patch, but for now we have to bypass this
- *assertion for block migration to work. */
-// assert(child->perm & BLK_PERM_WRITE);
-/* FIXME: Because of the above, we also cannot guarantee that all format
- *BDS take the BLK_PERM_RESIZE permission on their file BDS, since
- *they are not obligated to do so if they do not have any parent
- *that has taken the permission to write to them. */
-// assert(end_sector <= bs->total_sectors || child->perm & 
BLK_PERM_RESIZE);
+assert(child->perm & BLK_PERM_WRITE);
+assert(end_sector <= bs->total_sectors || child->perm & BLK_PERM_RESIZE);
 
 ret = notifier_with_return_list_notify(>before_write_notifiers, req);
 
-- 
1.8.3.1

[Qemu-devel] [PULL 03/34] file-win32: Remove unnecessary include

2017-04-28 Thread Kevin Wolf

Signed-off-by: Kevin Wolf 
---
 block/file-win32.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/block/file-win32.c b/block/file-win32.c
index 800fabd..e132ba1 100644
--- a/block/file-win32.c
+++ b/block/file-win32.c
@@ -24,7 +24,6 @@
 #include "qemu/osdep.h"
 #include "qapi/error.h"
 #include "qemu/cutils.h"
-#include "qemu/timer.h"
 #include "block/block_int.h"
 #include "qemu/module.h"
 #include "block/raw-aio.h"
-- 
1.8.3.1

Re: [Qemu-devel] [PATCH v10 01/17] block: Update comments on BDRV_BLOCK_* meanings

2017-04-28 Thread Eric Blake

On 04/26/2017 08:46 PM, Eric Blake wrote:
> We had some conflicting documentation: a nice 8-way table that
> described all possible combinations of DATA, ZERO, and
> OFFSET_VALID, couple with text that implied that OFFSET_VALID
> always meant raw data could be read directly.  As the 8-way
> table is the intended semantics, simplify the rest of the text
> to get rid of the confusion.
> 
> Suggested-by: Max Reitz 
> Signed-off-by: Eric Blake 
> 
> ---

>  /*
>   * Allocation status flags
> - * BDRV_BLOCK_DATA: data is read from a file returned by 
> bdrv_get_block_status.
> + * BDRV_BLOCK_DATA: data is read from a file returned by 
> bdrv_get_block_status

I guess we always return the BDS where the data is read from, even when
we can't actually provide an offset to that data (such as when the
actual data is encrypted or compressed, and therefore has no raw
offset).  But I'm still wondering if this line can be shortened.

>   * BDRV_BLOCK_ZERO: sectors read as zero
> - * BDRV_BLOCK_OFFSET_VALID: sector stored as raw data in a file returned by
> - *  bdrv_get_block_status.

I think when this was originally written, we blindly assumed that offset
pointed into bs->file.  Then with the exposure of the BDS graph, we
changed bdrv_get_block_status() to have a BlockDriverState **file
parameter (offset points into the returned file), since bs->file need
not always be the right BDS.  This old comment is on track...

> + * BDRV_BLOCK_OFFSET_VALID: an associated offset exists for accessing raw 
> data
>   * BDRV_BLOCK_ALLOCATED: the content of the block is determined by this
>   *   layer (as opposed to the backing file)
>   * BDRV_BLOCK_RAW: used internally to indicate that the request
>   * was answered by the raw driver and that one
>   * should look in bs->file directly.
>   *
> - * If BDRV_BLOCK_OFFSET_VALID is set, bits 9-62 represent the offset in
> - * bs->file where sector data can be read from as raw data.

...but this one is stale...

> - *
>   * DATA == 0 && ZERO == 0 means that data is read from backing_hd if present.
>   *
> + * If BDRV_BLOCK_OFFSET_VALID is set, bits 9-62 represent the offset
> + * in bs->file that is allocated for the corresponding raw data;

...and I reused the stale wording.  Oops.

> + * however, whether that offset actually contains data also depends on
> + * BDRV_BLOCK_DATA, as follows:
> + *
>   * DATA ZERO OFFSET_VALID
>   *  ttt   sectors read as zero, bs->file is zero at offset
>   *  tft   sectors read as valid from bs->file at offset

And these references to bs->file are stale too.

Guess I have more to fix on the respin.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH v10 07/17] qemu-io: Switch 'alloc' command to byte-based length

2017-04-28 Thread Max Reitz

On 28.04.2017 21:59, Eric Blake wrote:
> On 04/28/2017 02:46 PM, Max Reitz wrote:
>> On 27.04.2017 03:46, Eric Blake wrote:
>>> For the 'alloc' command, accepting an offset in bytes but a length
>>> in sectors, and reporting output in sectors, is confusing.  Do
>>> everything in bytes, and adjust the expected output accordingly.
>>>
>>> Signed-off-by: Eric Blake 
>>> Reviewed-by: Philippe Mathieu-Daudé 
>>>
> 
>>>  }
>>> +if (!QEMU_IS_ALIGNED(bytes, BDRV_SECTOR_SIZE)) {
>>> +printf("bytes %" PRId64 " is not sector aligned\n",
>>
>> This isn't real English. :-)
> 
> But, it's just copy-and-paste from the other instances you just reviewed
> in 6/17!  [Translation - if I change this one, I also get to redo that one]

No, those are "offset" and "count" -- all singular. "bytes" is plural. ;-)

> 
> Which of these various alternatives (if any) looks better:
> 
> bytes=511 is not sector-aligned
> 511 is not a sector-aligned value for 'bytes'
> requested 'bytes' of 511 is not sector-aligned
> alignment error: 511 bytes is not sector-aligned> 'bytes' must be 
> sector-aligned: 511
> your clever entry here...

How about "byte count" instead of "bytes" or "bytes value", if really
want to have the exact spelling in there?

For your entries above: (1) and (2) work for me (I like (2) a bit
better), (3) doesn't sound like real English either, and it should be
s/is/are/ in (4) (but it still sounds off with that change). (5) I
mostly dislike because I dislike error message of the form "This should
be X: $foo", I like "$foo is not X" better.

>> With that fixed (somehow, you know better than me how to):
> 
> Re-reading my various alternatives, I do think that /sector
> aligned/sector-aligned/ helps no matter what; and that the remaining
> trick is to use quoting or '=' or some other lexical trick to make it
> obvious that 'bytes' is a parameter name whose value 511 is invalid,
> rather than part of the actual error of a value that is not properly
> aligned.

First of all, I think the user is clever enough to figure out that a
description like "byte count" refers to the "bytes" parameter. ;-)

Secondly, this is even more true because this is a debugging tool so we
don't have to worry too much about... Let's say inexperienced users.

Max

signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH v10 08/17] qemu-io: Switch 'map' output to byte-based reporting

2017-04-28 Thread Eric Blake

On 04/28/2017 02:53 PM, Max Reitz wrote:
> On 27.04.2017 03:46, Eric Blake wrote:
>> Mixing byte offset and sector allocation counts is a bit
>> confusing.  Also, reporting n/m sectors, where m decreases
>> according to the remaining size of the file, isn't really
>> adding any useful information.
> 
> Since this map doesn't leave out any range in the image file -- not
> really, no. :-)
> 
>> Update the output to use
>> byte counts, and adjust the affected tests (./check -qcow2 102,
>> ./check -vpc 146).

My commit message is stale if test 179 gets committed first (although,
as we mentioned earlier, I'm trying to split this series in two, which
would delay the introduction of 179...)

I still wonder if an even more concise representation is worth it; the
resulting output lines are still quite long.  We don't make any
guarantees about qemu-io back-compat, so I may still tweak it further
(but if I do, I'll drop R-b)

I should probably also add a blurb to the commit message about how
'qemu-img map' and 'qemu-io map' are two different beasts.

>>
>> Signed-off-by: Eric Blake 
>>
>> ---
>> v10: rebase to updated test 179
>> v9: new patch
>> ---
>>  qemu-io-cmds.c |  5 ++---
>>  tests/qemu-iotests/102.out |  4 ++--
>>  tests/qemu-iotests/146.out | 30 +++---
>>  tests/qemu-iotests/179.out | 22 +++---
>>  4 files changed, 30 insertions(+), 31 deletions(-)
> 
> Reviewed-by: Max Reitz 
> 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH v10 07/17] qemu-io: Switch 'alloc' command to byte-based length

2017-04-28 Thread Eric Blake

On 04/28/2017 02:46 PM, Max Reitz wrote:
> On 27.04.2017 03:46, Eric Blake wrote:
>> For the 'alloc' command, accepting an offset in bytes but a length
>> in sectors, and reporting output in sectors, is confusing.  Do
>> everything in bytes, and adjust the expected output accordingly.
>>
>> Signed-off-by: Eric Blake 
>> Reviewed-by: Philippe Mathieu-Daudé 
>>

>>  }
>> +if (!QEMU_IS_ALIGNED(bytes, BDRV_SECTOR_SIZE)) {
>> +printf("bytes %" PRId64 " is not sector aligned\n",
> 
> This isn't real English. :-)

But, it's just copy-and-paste from the other instances you just reviewed
in 6/17!  [Translation - if I change this one, I also get to redo that one]

Which of these various alternatives (if any) looks better:

bytes=511 is not sector-aligned
511 is not a sector-aligned value for 'bytes'
requested 'bytes' of 511 is not sector-aligned
alignment error: 511 bytes is not sector-aligned
'bytes' must be sector-aligned: 511
your clever entry here...

> 
> With that fixed (somehow, you know better than me how to):

Re-reading my various alternatives, I do think that /sector
aligned/sector-aligned/ helps no matter what; and that the remaining
trick is to use quoting or '=' or some other lexical trick to make it
obvious that 'bytes' is a parameter name whose value 511 is invalid,
rather than part of the actual error of a value that is not properly
aligned.

> 
> Reviewed-by: Max Reitz 

If you state a preference for one of my variants, then the respin will
use that variant consistently and add your R-b.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH v10 08/17] qemu-io: Switch 'map' output to byte-based reporting

2017-04-28 Thread Max Reitz

On 27.04.2017 03:46, Eric Blake wrote:
> Mixing byte offset and sector allocation counts is a bit
> confusing.  Also, reporting n/m sectors, where m decreases
> according to the remaining size of the file, isn't really
> adding any useful information.

Since this map doesn't leave out any range in the image file -- not
really, no. :-)

> Update the output to use
> byte counts, and adjust the affected tests (./check -qcow2 102,
> ./check -vpc 146).
> 
> Signed-off-by: Eric Blake 
> 
> ---
> v10: rebase to updated test 179
> v9: new patch
> ---
>  qemu-io-cmds.c |  5 ++---
>  tests/qemu-iotests/102.out |  4 ++--
>  tests/qemu-iotests/146.out | 30 +++---
>  tests/qemu-iotests/179.out | 22 +++---
>  4 files changed, 30 insertions(+), 31 deletions(-)

Reviewed-by: Max Reitz 



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH v10 05/17] iotests: Add test 179 to cover write zeroes with unmap

2017-04-28 Thread Eric Blake

On 04/28/2017 02:28 PM, Max Reitz wrote:
> On 27.04.2017 03:46, Eric Blake wrote:
>> No tests were covering write zeroes with unmap.  Additionally,
>> I needed to prove that my previous patches for correct status
>> reporting and write zeroes optimizations actually had an impact.
>>
>> The test works for cluster_size between 8k and 2M (for smaller
>> sizes, it fails because our allocation patterns are not contiguous
>> with small clusters).
>>
>> Signed-off-by: Eric Blake 
>>
>> ---
>> v10: drop any changes to v2 files, rewrite test to work with updates
>> earlier in the series, add a blkdebug probe
>> v9: new patch
>> ---
>>  tests/qemu-iotests/179 | 123 
>> +
>>  tests/qemu-iotests/179.out |  90 +
>>  tests/qemu-iotests/group   |   1 +
>>  3 files changed, 214 insertions(+)
>>  create mode 100755 tests/qemu-iotests/179
>>  create mode 100644 tests/qemu-iotests/179.out
> 
> Can I convince you to write this test in a different way? I really have
> a hard time following all of the writes and building a mental image of
> how the qcow2 file should look in the end so I can compare it to the map
> output.
> 
> I'd like it better if you would check the qcow2 file after each group of
> write operations, maybe even creating a new image for every such group.
> But even just having more qemu-img map calls (and changing nothing else)
> would help me a lot.

Will do.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH v10 07/17] qemu-io: Switch 'alloc' command to byte-based length

2017-04-28 Thread Max Reitz

On 27.04.2017 03:46, Eric Blake wrote:
> For the 'alloc' command, accepting an offset in bytes but a length
> in sectors, and reporting output in sectors, is confusing.  Do
> everything in bytes, and adjust the expected output accordingly.
> 
> Signed-off-by: Eric Blake 
> Reviewed-by: Philippe Mathieu-Daudé 
> 
> ---
> v10: rebase to code cleanup
> v9: new patch
> ---
>  qemu-io-cmds.c| 30 ++
>  tests/qemu-iotests/019.out|  8 
>  tests/qemu-iotests/common.pattern |  2 +-
>  3 files changed, 23 insertions(+), 17 deletions(-)
> 
> diff --git a/qemu-io-cmds.c b/qemu-io-cmds.c
> index fabc394..34f6707 100644
> --- a/qemu-io-cmds.c
> +++ b/qemu-io-cmds.c
> @@ -1760,7 +1760,7 @@ out:
>  static int alloc_f(BlockBackend *blk, int argc, char **argv)
>  {
>  BlockDriverState *bs = blk_bs(blk);
> -int64_t offset, sector_num, nb_sectors, remaining;
> +int64_t offset, sector_num, nb_sectors, remaining, bytes;
>  char s1[64];
>  int num, ret;
>  int64_t sum_alloc;
> @@ -1776,18 +1776,24 @@ static int alloc_f(BlockBackend *blk, int argc, char 
> **argv)
>  }
> 
>  if (argc == 3) {
> -nb_sectors = cvtnum(argv[2]);
> -if (nb_sectors < 0) {
> -print_cvtnum_err(nb_sectors, argv[2]);
> +bytes = cvtnum(argv[2]);
> +if (bytes < 0) {
> +print_cvtnum_err(bytes, argv[2]);
>  return 0;
> -} else if (nb_sectors > INT_MAX) {
> -printf("length argument cannot exceed %d, given %s\n",
> -   INT_MAX, argv[2]);
> +} else if (bytes > INT_MAX * BDRV_SECTOR_SIZE) {
> +printf("length argument cannot exceed %llu, given %s\n",
> +   INT_MAX * BDRV_SECTOR_SIZE, argv[2]);
>  return 0;
>  }
>  } else {
> -nb_sectors = 1;
> +bytes = BDRV_SECTOR_SIZE;
>  }
> +if (!QEMU_IS_ALIGNED(bytes, BDRV_SECTOR_SIZE)) {
> +printf("bytes %" PRId64 " is not sector aligned\n",

This isn't real English. :-)

With that fixed (somehow, you know better than me how to):

Reviewed-by: Max Reitz 

> +   bytes);
> +return 0;
> +}
> +nb_sectors = bytes >> BDRV_SECTOR_BITS;
> 
>  remaining = nb_sectors;
>  sum_alloc = 0;



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH v10 06/17] qemu-io: Don't open-code QEMU_IS_ALIGNED

2017-04-28 Thread Max Reitz

On 27.04.2017 03:46, Eric Blake wrote:
> Manual comparison against 0x1ff is not as clean as using our
> alignment macros from osdep.h.
> 
> Suggested-by: Philippe Mathieu-Daudé 
> Signed-off-by: Eric Blake 
> 
> ---
> v10: new patch
> ---
>  qemu-io-cmds.c | 10 +-
>  1 file changed, 5 insertions(+), 5 deletions(-)

Reviewed-by: Max Reitz 



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH v10 05/17] iotests: Add test 179 to cover write zeroes with unmap

2017-04-28 Thread Max Reitz

On 27.04.2017 03:46, Eric Blake wrote:
> No tests were covering write zeroes with unmap.  Additionally,
> I needed to prove that my previous patches for correct status
> reporting and write zeroes optimizations actually had an impact.
> 
> The test works for cluster_size between 8k and 2M (for smaller
> sizes, it fails because our allocation patterns are not contiguous
> with small clusters).
> 
> Signed-off-by: Eric Blake 
> 
> ---
> v10: drop any changes to v2 files, rewrite test to work with updates
> earlier in the series, add a blkdebug probe
> v9: new patch
> ---
>  tests/qemu-iotests/179 | 123 
> +
>  tests/qemu-iotests/179.out |  90 +
>  tests/qemu-iotests/group   |   1 +
>  3 files changed, 214 insertions(+)
>  create mode 100755 tests/qemu-iotests/179
>  create mode 100644 tests/qemu-iotests/179.out

Can I convince you to write this test in a different way? I really have
a hard time following all of the writes and building a mental image of
how the qcow2 file should look in the end so I can compare it to the map
output.

I'd like it better if you would check the qcow2 file after each group of
write operations, maybe even creating a new image for every such group.
But even just having more qemu-img map calls (and changing nothing else)
would help me a lot.

Max

signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH v5 + 1/2] target/aarch64: optimize cross-page direct jumps in softmmu

2017-04-28 Thread Emilio G. Cota

On Fri, Apr 28, 2017 at 15:17:24 -0400, Emilio G. Cota wrote:
> Perf numbers in next commit's log.
> 
> Signed-off-by: Emilio G. Cota 
> ---
>  target/arm/translate-a64.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
> index 24de30d..5b691fc 100644
> --- a/target/arm/translate-a64.c
> +++ b/target/arm/translate-a64.c
> @@ -373,8 +373,7 @@ static inline void gen_goto_tb(DisasContext *s, int n, 
> uint64_t dest)
>  } else if (s->singlestep_enabled) {
>  gen_exception_internal(EXCP_DEBUG);
>  } else {
> -tcg_gen_exit_tb(0);
> -s->is_jmp = DISAS_TB_JUMP;

I'm not sure about removing this line though. Would it be better to leave it?
I can't see how TB_JUMP ends up doing anything in the rest of the file.

Thanks,

E.

[Qemu-devel] [PATCH v5+] TCG cross-tb optimizations

2017-04-28 Thread Emilio G. Cota

While at it, we might want to include these two for aarch64.

They apply on top Richard's tcg-next branch, which includes v5.

Thanks,

Emilio

[Qemu-devel] [PATCH v5 + 2/2] target/aarch64: optimize indirect branches

2017-04-28 Thread Emilio G. Cota

Measurements:

[Baseline performance is that before applying this and the previous commit]

-NBench, aarch64-softmmu. Host: Intel 
i7-4790K @ 4.00GHz

 1.7x 
+-+--+-+
  | 
 |
  |   cross 
 |
 1.6x 
+cross+jr....+-+
  | #++#
 |
  | #  #
 |
 1.5x 
+-+...*..#...+-+
  | *+++*  #
 |
  | *   *  #
 |
 1.4x 
+-+...*...*..#...+-+
  | *   *  #
 |
  | #   *   *  #
 |
 1.3x 
+-++++#...*...*..#...+-+
  |  *++*   #   *   *  #
 |
  |  *  *   #   *   *  #
 |
 1.2x 
+-+*..*...#...*...*..#...+-+
  |  *  *   #   *   *  #
 |
  |  *  *   #   *   *  #
 |
 1.1x 
+-+...+++#..#..*..*...#...*...*..#...+-+
  |   #  *  *   #   *   *  #
 |
  | *  *  #  *  *   #   *   *  #  ###   
+++###  *  *   # |
   1x 
+-++-++-###++-*++*++#++*++*+-+#+++*+++*++#++*++*-+#++*++#++###-++*++*-+#++*+-*+++#+-++-+
  | *###  *  *  #   *  *  #  *  *   #  *++*###  *   *  #  *  *  #  
*   *  #  *  *++#   *  *  #  *  *   # |
  | *   *++#  *  *  #   *  *  #  *  *   #  *  *  #  *   *  #  *  *  #  
*   *  #  *  *  #   *  *  #  *  *   # |
 0.9x 
+-+---*###--###---###----###--*###--###--*###--###---###-----+-+
  ASSIGNMENT BITFIELD   FOURFP EMULATION   HUFFMAN   LU DECOMPOSITIONNEURAL 
NUMERIC SORSTRING SORThmean
  png: http://imgur.com/qO9ubtk
NB. cross here represents the previous commit.

-SPECint06 (test set), x86_64-linux-user. Host: 
Intel i7-4790K @ 4.00GHz

 1.5x 
+-+--+-+
  |   
*  |
  |   
*+++*   jr |
  |   * 
  *  |
 1.4x 
+-+.*...*.++++-+
  |   * 
  *  |   |
  |  ** 
  *  |   |
  |  *   ** 
  ** |
 1.3x 
+-+*...**...**.|.*...+-+
  |   +++*   ** 
  ** | * |
  |  *   *   ** 
  **+++* |
  |  *   *   *   *

[Qemu-devel] [PATCH v5 + 1/2] target/aarch64: optimize cross-page direct jumps in softmmu

2017-04-28 Thread Emilio G. Cota

Perf numbers in next commit's log.

Signed-off-by: Emilio G. Cota 
---
 target/arm/translate-a64.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index 24de30d..5b691fc 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -373,8 +373,7 @@ static inline void gen_goto_tb(DisasContext *s, int n, 
uint64_t dest)
 } else if (s->singlestep_enabled) {
 gen_exception_internal(EXCP_DEBUG);
 } else {
-tcg_gen_exit_tb(0);
-s->is_jmp = DISAS_TB_JUMP;
+tcg_gen_lookup_and_goto_ptr(cpu_pc);
 }
 }
 }
-- 
2.7.4

Re: [Qemu-devel] [PATCH v2] i386: Allow cpuid bit override

2017-04-28 Thread Eduardo Habkost

On Tue, Apr 18, 2017 at 04:19:10PM -0500, Eric Blake wrote:
> On 04/18/2017 04:01 PM, Michael Roth wrote:
> 
> >>> @@ -3706,8 +3707,17 @@ static void x86_cpu_get_bit_prop(Object *obj, 
> >>> Visitor *v, const char *name,
> >>>  X86CPU *cpu = X86_CPU(obj);
> >>>  BitProperty *fp = opaque;
> >>>  uint32_t f = cpu->env.features[fp->w];
> >>> +uint32_t ff = cpu->forced_features[fp->w];
> >>>  bool value = (f & fp->mask) == fp->mask;
> >>> -visit_type_bool(v, name, , errp);
> >>> +bool forced = (ff & fp->mask) == fp->mask;
> >>> +char str[] = "force";
> >>> +char *strval = str;
> >>> +
> >>> +if (forced) {
> >>> +visit_type_str(v, name, , errp);
> >>> +} else {
> >>> +visit_type_bool(v, name, , errp);
> >>
> >> Interesting approach. This means we keep returning a boolean
> >> value on the property getter (which is important to not break
> >> compatibility), but return a string in case the feature was set
> >> to "force".
> >>
> >> We probably should update the 'type' field of the property, as it
> >> won't be a real "bool" property anymore.
> >>
> >> I won't say I love that solution, but it seems to work. I'm CCing
> >> the QAPI maintainers to see what they think about it.
> 
> Use of a formal QAPI alternate type seems reasonable here.
> 
> >>
> >> (For reference: in the v1 thread I have suggested introducing a
> >> new enum type in the QAPI schema, but this would break
> >> compatibility with existing management code that expects the
> >> property to return a boolean value [like the users of
> >> query-cpu-model-expansion]).
> 
> >>> @@ -3724,7 +3735,15 @@ static void x86_cpu_set_bit_prop(Object *obj, 
> >>> Visitor *v, const char *name,
> >>>  return;
> >>>  }
> >>>  
> >>> -visit_type_bool(v, name, , _err);
> >>> +visit_type_str(v, name, , _err);
> >>> +if (!local_err && !strcmp(strval, "force")) {
> >>> +value = true;
> >>> +cpu->forced_features[fp->w] |= fp->mask;
> >>> +} else {
> >>> +local_err = NULL;
> >>> +visit_type_bool(v, name, , _err);
> >>> +}
> >>
> >> I'm not sure this is valid usage of the visitor API. If the
> >> visit_type_str() call succeeds, isn't the visitor allowed to
> >> consume the original value, and return something completely
> >> different when visit_type_bool() is called?
> > 
> 
> Better would be defining the QAPI alternate type:
> 
> { 'enum': 'Force', 'data': ['force']}
> { 'alternate': 'BoolOrForce',
>   'data': { 'b': 'bool', 's': 'Force' } }
> 
> then given a BoolOrForce *state; declaration, you do either:
> 
> state->type = QTYPE_QBOOL;
> state->u.b = true;
> 
> or
> 
> state->type = QTYPE_QSTRING;
> state->u.s = FORCE_FORCE;
> 
> and then call visit_type_BoolOrForce(...,  ...)
> 
> Qcow2OverlapChecks is an example of an alternate in use, if that helps.
> 
> > That was indeed one of the design goals, to the extent that
> > you could even take a stream of data types and actually build
> > up a dictionary based on the ordering visited (e.g. the
> > once-proposed BER visitor for migration and why QAPI uses
> > OrderedDict when generating visitors).
> 
> We document that QMP is order-insensitive, but you are right that
> QAPI-generated code still respects ordering. Some of our testsuite at
> least relies on the stability you get by always visiting things in
> declaration order, even though other dictionary orders should still be
> valid.
> 
> > 
> > I'm not sure how important that is anymore, but even setting
> > that aside that we still have the issue of not handling the
> > data type as declared, and relying on error-handling to probe
> > it, which doesn't necessarily leave the visitor in a recoverable
> > state where we can continue visiting. I think it might be possible
> > to rely on the "alternate" QAPI type to handle this case. Maybe
> > something like this?
> 
> Yes, that was the proposal I outlined above.  Note that your rough-idea
> code:
> 
> > 
> > GenericAlternate *alt = NULL;
> > visit_start_alternate(v, name, , sizeof(*alt), false, _err);
> > if (local_err) {
> > goto out;
> > }
> > if (!alt) {
> > goto out_obj;
> > }
> > switch (alt->type) {
> > case QTYPE_QBOOL:
> > visit_type_bool(v, name, , _err);
> > break;
> > case QTYPE_QSTRING:
> > visit_type_str(v, name, , _err);
> 
> ...
> is a duplicate of what is already generated for
> visit_type_BoolOrForce(), so you don't need to open-code it.
> 
> > 
> > This would only work for setters/qobject-input-visitor ATM though since
> > qobject-output-visitor doesn't support the visit_start_alternate()
> > interface. I'm not sure why it wasn't enabled on the output side, maybe
> > Eric knows if doing so would be feasible for this situation?
> 
> qobject-output-visitor DOES support alternates.  It doesn't need a
> .start_alternate callback, because the generic visitor-core.c code
> handles things just fine for output

Re: [Qemu-devel] [PATCH v2 00/25] qmp: add async command type

2017-04-28 Thread Kevin Wolf

Am 28.04.2017 um 17:55 hat Marc-André Lureau geschrieben:
> On Tue, Apr 25, 2017 at 2:23 PM Kevin Wolf  wrote:
> 
> > Am 24.04.2017 um 21:10 hat Markus Armbruster geschrieben:
> > > With 2.9 out of the way, how can we make progress on this one?
> > >
> > > I can see two ways to get asynchronous QMP commands accepted:
> > >
> > > 1. We break QMP compatibility in QEMU 3.0 and convert all long-running
> > >tasks from "synchronous command + event" to "asynchronous command".
> > >
> > >This is design option 1 quoted below.  *If* we decide to leave
> > >compatibility behind for 3.0, *and* we decide we like the
> > >asynchronous sufficiently better to put in the work, we can do it.
> > >
> > >I guess there's nothing to do here until we decide on breaking
> > >compatibility in 3.0.
> > >
> > > 2. We don't break QMP compatibility, but we add asynchronous commands
> > >anyway, because we decide that's how we want to do "jobs".
> > >
> > >This is design option 3 quoted below.  As I said, I dislike its lack
> > >of orthogonality.  But if asynchronous commands help us get jobs
> > >done, I can bury my dislike.
> >
> > I don't think async commands are attractive at all for doing jobs. I
> 
> It's still a bit obscure to me what we mean by "jobs".

I guess the best definition that we have is: Image streaming, mirroring,
live backup, live commit and future "similar things".

> feel they bring up more questions that they answer, for example, what
> > happens if libvirt crashes and then reconnects? Which monitor connection
> > does get the reply for an async command sent on the now disconnected
> > one?
> >
> 
> The monitor to receive a reply is the one that sent the command (just
> like return today)
> 
> As explained in the cover letter, an async command may cancel the
> ongoing operation on disconnect.

But that's not what you generally want. You don't want to abort your
backup just because libvirt lost its monitor connection, but qemu should
continue to copy the data, and when libvirt reconnects it should be able
to get back control of this background operation and bring it to
successful completion.

> If there is a global state change, a separate event should be
> broadcasted (no change proposed here)

In a way, the existence of a block job is global state today. Not sure
if this is what you mean, though.

> > We already have a model for doing long-running jobs, and as far as I'm
> > aware, it's working and we're not fighting limitations of the design. So
> > what are we even trying to solve here? In the context of jobs, async
> > commands feel like a solution in need of a problem to me.
> 
> See the cover letter for the 2 main reasons for this proposal. If your
> domain API is fine, you don't have to opt-in and you may continue to use
> the current sync model. However, I believe there is benefit in using this
> work to have a more consitent async API.

I think we need a clear understanding of what the potential use cases
are that could make good use a new infrastructure. We don't generally
add infrastructure if we don't have a concrete idea what its users could
be. I only ruled out that the current users of block jobs are a good fit
for it, but there may be other use cases for which it works great.

If commands can opt-in or opt-out of the new model, consistency isn't a
particularly good argument, though.

> > Things may look a bit different in typically quick, but potentially
> > long-running commands. That is, anything that we currently execute
> > synchronously while holding the BQL, but that involves I/O and could
> > therefore take a while (impacting the performance of the VM) or even
> > block indefinitely.
> >
> > The first problem (we're holding the lock too long) can be addressed
> > by making things async just inside qemu and we don't need to expose
> > the change on the QMP level. The second one (blocking indefinitely)
> > requires
> >
> 
> That's what I propose as 1)
> 
> 
> > being async on the QMP level if we want the monitor to be responsive
> > even if we're using an image on an NFS server that went down.
> >
> 
> That's the 2)
> 
> > On the other hand, using the traditional job infrastructure is way
> > over the top if all you want to do is 'query-block', so we need
> > something different for making it async. And if a client
> > disconnects, the 'query-block' result can just be thrown away, it's
> > much simpler than actual jobs.
> 
> I agree a fully-featured job infrastructure is way over the top, and I
> believe I propose a minimal change to make optionnally some QMP
> commands async.

So are commands like 'query-block' (which are typically _not_ considered
long-running) what you're aiming for with your proposal? This is a case
where I think we could consider the use of async QMP commands, but I
didn't have the impression that this kind of commands was your primary
target.

> > So where I can see advantages for a new async command type is

Re: [Qemu-devel] [PATCH v10 04/17] qcow2: Optimize zero_single_l2() to minimize L2 churn

2017-04-28 Thread Eric Blake

On 04/28/2017 01:00 PM, Max Reitz wrote:
> On 27.04.2017 03:46, Eric Blake wrote:
>> Similar to discard_single_l2(), we should try to avoid dirtying
>> the L2 cache when the cluster we are changing already has the
>> right characteristics.
>>

>> -/* Update L2 entries */
>> +/*
>> + * Minimize L2 changes if the cluster already reads back as
>> + * zeroes with correct allocation.
>> + */
>> +if (qcow2_get_cluster_type(old_offset) == QCOW2_CLUSTER_ZERO &&
>> +!(unmap && old_offset & L2E_OFFSET_MASK)) {
>> +continue;
>> +}
>> +
>>  qcow2_cache_entry_mark_dirty(bs, s->l2_table_cache, l2_table);
>> -if (old_offset & QCOW_OFLAG_COMPRESSED || flags & 
>> BDRV_REQ_MAY_UNMAP) {
>> +if (old_offset & QCOW_OFLAG_COMPRESSED || unmap) {
> 
> Works, but I'd like it better to store the cluster type somewhere and
> then use it here instead of checking the COMPRESSED flag.

Indeed, qcow2_get_cluster_type() checks QCOW_OFLAG_COMPRESSED first -
but I like the idea of storing it in a temporary variable now that we
are using the cluster type in more than one place.

> 
> But it works, so it's up to you:
> 
> Reviewed-by: Max Reitz 
> 
>>  l2_table[l2_index + i] = cpu_to_be64(QCOW_OFLAG_ZERO);
>>  qcow2_free_any_clusters(bs, old_offset, 1, 
>> QCOW2_DISCARD_REQUEST);
>>  } else {
>>
> 
> 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH v10 02/17] qcow2: Correctly report status of preallocated zero clusters

2017-04-28 Thread Eric Blake

On 04/28/2017 12:35 PM, Max Reitz wrote:
> On 27.04.2017 03:46, Eric Blake wrote:
>> We were throwing away the preallocation information associated with
>> zero clusters.  But we should be matching the well-defined semantics
>> in bdrv_get_block_status(), where (BDRV_BLOCK_ZERO |
>> BDRV_BLOCK_OFFSET_VALID) informs the user which offset is reserved,
>> while still reminding the user that reading from that offset is
>> likely to read garbage.
>>
>> Making this change lets us see which portions of an image are zero
>> but preallocated, when using qemu-img map --output=json.  The
>> --output=human side intentionally ignores all zero clusters, whether
>> or not they are preallocated.
>>
>> The fact that there is no change to qemu-iotests './check -qcow2'
>> merely means that we aren't yet testing this aspect of qemu-img;
>> a later patch will add a test.
>>
>> Signed-off-by: Eric Blake 
>>
>> ---
>> v10: new patch
>> ---
>>  block/qcow2-cluster.c | 32 +++-
>>  1 file changed, 27 insertions(+), 5 deletions(-)
> 
> I'd propose you split the qcow2 changes off of this series.

Yeah, it's sort of morphed into a well-tested later part and an earlier
part that is still undergoing heavy review.  I don't know how much churn
there would be to rebase things to do the blkdebug changes first (it may
invalidate portions of my test 177; but I could always tag such spots as
FIXME while waiting on the qcow2 part to settle and land).  I'll see
what I can do, as I've now got multiple series queued up, and should at
least get SOMETHING to land to start clearing up the logjam.


>> @@ -335,9 +339,10 @@ static int count_contiguous_clusters_by_type(int 
>> nb_clusters,
>>  int i;
>>
>>  for (i = 0; i < nb_clusters; i++) {
>> -int type = qcow2_get_cluster_type(be64_to_cpu(l2_table[i]));
>> +uint64_t entry = be64_to_cpu(l2_table[i]);
>> +int type = qcow2_get_cluster_type(entry);
>>
>> -if (type != wanted_type) {
>> +if (type != wanted_type || entry & L2E_OFFSET_MASK) {
> 
> This works only for wanted_type \in \{ ZERO, UNALLOCATED \} -- which is
> what the comment you added describes, but this may still warrant a
> renaming of this function (count_contiguous_unallocated_clusters?) and
> probably an assertion that wanted_type is ZERO or UNALLOCATED.

Good idea.


>> @@ -559,9 +564,26 @@ int qcow2_get_cluster_offset(BlockDriverState *bs, 
>> uint64_t offset,
>>  ret = -EIO;
>>  goto fail;
>>  }
>> -c = count_contiguous_clusters_by_type(nb_clusters, 
>> _table[l2_index],
>> -  QCOW2_CLUSTER_ZERO);
>> -*cluster_offset = 0;
>> +/* Distinguish between pure zero clusters and pre-allocated ones */
>> +if (*cluster_offset & L2E_OFFSET_MASK) {
>> +c = count_contiguous_clusters(nb_clusters, s->cluster_size,
>> +  _table[l2_index], 
>> QCOW_OFLAG_ZERO);
> 
> You should probably switch this patch and the next one -- or I just send
> my patches myself and you base your series on top of it...

I tested with your patches in, then rebased with your patches out to see
what broke, and...

> 
> Because before the next patch, this happens:
> 

> $ ./qemu-img map foo.qcow2
> Offset  Length  Mapped to   File
> qemu-img: block/qcow2-cluster.c:319: count_contiguous_clusters:
> Assertion `qcow2_get_cluster_type(first_entry) == QCOW2_CLUSTER_NORMAL'
> failed.

got that same assert. So I realized that your patch was necessary and
rebased it back in, but obviously forgot to re-test at all points in the
series. Yes, your should go first, and I'd be more than happy if you
post yours formally and I rebase my qcow2 portions on top of yours (all
the more reason for me to split this series into two, and get the
blkdebug portions in now while we still hammer on the qcow2 stuff).

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH 2/3] migration: Remove use of old MigrationParams

2017-04-28 Thread Eric Blake

On 04/25/2017 05:30 AM, Juan Quintela wrote:
> We have change in the previous patch to use migration capabilities for
> it.  Notice that we continue using the old command line flags from
> migrate command from the time being.  Remove the set_params method as
> now it is empty.
> 
> Signed-off-by: Juan Quintela 
> ---
>  include/migration/migration.h |  3 +--
>  migration/block.c | 17 ++---
>  migration/colo.c  |  3 ---
>  migration/migration.c |  8 +---
>  migration/savevm.c|  2 --
>  5 files changed, 8 insertions(+), 25 deletions(-)
> 
> diff --git a/include/migration/migration.h b/include/migration/migration.h
> index 618ab0e..2917baa 100644
> --- a/include/migration/migration.h
> +++ b/include/migration/migration.h
> @@ -42,8 +42,7 @@
>  extern int only_migratable;
>  
>  struct MigrationParams {
> -bool blk;
> -bool shared;
> +bool unused; /* C don't allow empty structs */

s/don't/doesn't/

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH] MAINTAINERS: Add qemu-progress to the block layer

2017-04-28 Thread Eric Blake

On 04/28/2017 11:55 AM, Max Reitz wrote:
> util/qemu-progress.c is currently unmaintained. The only user of its
> functionality is qemu-img, so it effectively is part of the block layer.
> 
> Suggested-by: Fam Zheng 
> Signed-off-by: Max Reitz 
> ---
>  MAINTAINERS | 1 +
>  1 file changed, 1 insertion(+)

Reviewed-by: Eric Blake 

> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index a1d2b3a4d3..768b4d94ec 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -1168,6 +1168,7 @@ F: include/block/
>  F: qemu-img*
>  F: qemu-io*
>  F: tests/qemu-iotests/
> +F: util/qemu-progress.c
>  T: git git://repo.or.cz/qemu/kevin.git block
>  
>  Block I/O path
> 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] RFC: some problems with bdrv_get_block_status

2017-04-28 Thread Eric Blake

On 04/28/2017 11:17 AM, Denis V. Lunev wrote:
> Hello, All!
> 
> Recently we have experienced problems with very slow
> bdrv_get_block_status call, which is massive called f.e.
> from mirror_run().
> 
> The problem was narrowed down to slow lseek64() system
> call, which can take 1-2-3 seconds.

I'm guessing you meant one-to-three (the range), not one-two-three
(three separate digits), and just had an unfortunate abbreviation of
'to' turning into the phonetically-similar '2'.

> 
> root@s158 ~]# strace -f -p 77048 -T  -e lseek
> Process 77048 attached with 14 threads
> [pid 77053] lseek(23, 6359613440, SEEK_DATA) = 6360276992 <1.250323>

That sounds like a bug in your choice of filesystem.  It's been
mentioned before that lseek has some pathetically poor behavior (I think
tmpfs was one of the culprits), but I maintain that it is better to
hammer on the kernel folks to fix the poor behavior than it is to have
to implement user-space workarounds in every single affected program.

That said, a workaround of being able to request the avoidance of lseek
has been brought up on this list before.

> The problem comes from this branch of the code
> 
> bdrv_co_get_block_status
> ...
> if (bs->file &&
> (ret & BDRV_BLOCK_DATA) && !(ret & BDRV_BLOCK_ZERO) &&
> (ret & BDRV_BLOCK_OFFSET_VALID)) {
> int file_pnum;
> 
> ret2 = bdrv_co_get_block_status(bs->file, ret >> BDRV_SECTOR_BITS,
> *pnum, _pnum);
> if (ret2 >= 0) {
> /* Ignore errors.  This is just providing extra information, it
>  * is useful but not necessary.
>  */
> if (!file_pnum) {
> /* !file_pnum indicates an offset at or beyond the EOF;
> it is
>  * perfectly valid for the format block driver to point
> to such
>  * offsets, so catch it and mark everything as zero */
> ret |= BDRV_BLOCK_ZERO;
> } else {
> /* Limit request to the range reported by the protocol
> driver */
> *pnum = file_pnum;
> ret |= (ret2 & BDRV_BLOCK_ZERO);
> }
> }
> }

I don't see anything wrong with this code. It's nice to know that we
have data (qcow2 says we have allocated bytes due to this layer, so
don't refer to the backing files), but when the underlying file can ALSO
tell us that the underlying protocol is sparse and we are on a hole,
then we know that we have BDRV_BLOCK_ZERO condition which can in turn
enable other optimizations in quite a bit of the stack.  It IS valid to
return BDRV_BLOCK_DATA | BDRV_BLOCK_ZERO (we don't have to limit
ourselves to BDRV_BLOCK_DATA), and we should probably do so any time
that such computation is cheap.

I agree with your analysis that on a poor filesystem where lseek()
behavior sucks that it is no longer cheap to determine where the holes are.

But the proper workaround for those filesystems should NOT be made by
undoing this part of bdrv_co_get_block_status(), but should rather be
fixed in file-posix.c by the addition of a user-controllable knob on
whether to skip lseek().  In other words, if we're going to work around
the poor filesystem performance, the workaround should live in
file-posix.c, not in the generic block/io.c.  Once the workaround is
enabled, the 'ret2 = bdrv_co_get_block_status()' nested call should be
lightning fast (because file-posix.c would no longer be using lseek when
you've requested the workaround).

> Frankly speaking, this optimization should not give much.
> If upper layer format (QCOW2) says that we have data
> here, then nowadays in 99.9% we do have the data.

You are correct that we have BDRV_BLOCK_DATA; but it IS useful to ALSO
know if that data reads back as all zeros (so we can skip reading it).
Just because qcow2 reports something as data does NOT rule out whether
the data still reads as zeros.

> Meanwhile this branch can cause problems. We would
> need block cleared entirely to get the benefit for most
> cases in mirror and backup operations.
> 
> At my opinion it worth to drop this at all.
> 
> Guys, do you have any opinion?

Again, my opinion is to not change this part of block/io.c.  Rather,
work should be expended on individual protocol drivers to quit wasting
unnecessary time on reporting BDRV_BLOCK_ZERO if the time spent to
determine that is not worth the effort (as it is when using lseek() on
inefficient filesystems).  It is always safe for a protocol driver to
report BDRV_BLOCK_DATA instead of BDRV_BLOCK_ZERO, if asking the
question is too expensive (treating holes as data may pessimize other
operations, but that's okay if the penalty for asking is worse than what
optimizations are able to regain).

> 
> Den
> 
> P.S. The kernel is one based on RedHat 3.10.0-514. The same
>   problem was observed in 3.10.0-327 too.

Red Hat is always two words (I'm allowed to point that out, as they
employ me ;).  But if it

Re: [Qemu-devel] [PATCH v15 20/21] file-posix: Add image locking to perm operations

2017-04-28 Thread Kevin Wolf

Am 28.04.2017 um 17:30 hat Fam Zheng geschrieben:
> On Fri, 04/28 15:45, Kevin Wolf wrote:
> > Am 26.04.2017 um 05:34 hat Fam Zheng geschrieben:
> > > This extends the permission bits of op blocker API to external using
> > > Linux OFD locks.
> > > 
> > > Each permission in @perm and @shared_perm is represented by a locked
> > > byte in the image file.  Requesting a permission in @perm is translated
> > > to a shared lock of the corresponding byte; rejecting to share the same
> > > permission is translated to a shared lock of a separate byte. With that,
> > > we use 2x number of bytes of distinct permission types.
> > > 
> > > virtlockd in libvirt locks the first byte, so we do locking from a
> > > higher offset.
> > > 
> > > Suggested-by: Kevin Wolf 
> > > Signed-off-by: Fam Zheng 
> > 
> > >  BlockDriver bdrv_file = {
> > >  .format_name = "file",
> > >  .protocol_name = "file",
> > > @@ -1977,7 +2234,11 @@ BlockDriver bdrv_file = {
> > >  .bdrv_get_info = raw_get_info,
> > >  .bdrv_get_allocated_file_size
> > >  = raw_get_allocated_file_size,
> > > -
> > > +.bdrv_inactivate = raw_inactivate,
> > > +.bdrv_invalidate_cache = raw_invalidate_cache,
> > > +.bdrv_check_perm = raw_check_perm,
> > > +.bdrv_set_perm   = raw_set_perm,
> > > +.bdrv_abort_perm_update = raw_abort_perm_update,
> > >  .create_opts = _create_opts,
> > >  };
> > 
> > By the way, is it intentional that we apply locking only to bdrv_file,
> > but not to bdrv_host_device or bdrv_host_cdrom?
> 
> Good question.
> 
> Though CDROM is not very interesting, I am not sure about bdrv_host_device:
> 
> 1) On the one hand, a host device can contain a qcow2 image so maybe locking 
> is
> useful.  Another reason to lock is that they share the same QAPI option,
> BlockdevOptionsFile. That last reason is, it should be very easy to add it.
> 
> 2) On the other hand, unlike files, host devices get pretty high chances in
> being accesses by multiple readers/writers, outside of QEMU, such as partition
> detection, mount, fsck, etc.
> 
> What is your opinion?

I would add it, if nothing else to be consistent with regular files. You
don't even need qcow2 on a block device for it to be useful, even for
raw images the guest may not be able to tolerate a second writer (in
this case, share-rw of the qdev device will directly control the locking
mode).

As for 2), I don't think these other users are a problem because we're
only taking shared locks. We'll prevent other users from taking an
exclusive lock, but that's exactly right because it's true that we're
using the image.

Kevin

Re: [Qemu-devel] [PATCH v5 2/4] shutdown: Prepare for use of an enum in reset/shutdown_request

2017-04-28 Thread Dr. David Alan Gilbert

* Eric Blake (ebl...@redhat.com) wrote:
> On 04/28/2017 11:09 AM, Dr. David Alan Gilbert wrote:
> 
> >>> At a higher level, using your tags, I'm not sure where a reset triggered
> >>> by a fault detected by the hypervisor lives - e.g. an x86 triple fault
> >>> where the guest screws up so badly that it just gets reset.  Is
> >>> that a guest-reset or a guest-panic or what - neither case
> >>> was actually asked for by the guest itself.
> >>
> >> Wouldn't that be host-error (qemu detected an error that prevents
> >> further execution of the guest without a reset - and a triple fault
> >> seems to fall into the category of the guest getting itself wedged
> >> rather than actually trying to reset)?  Except patch 3 only used
> >> SHUTDOWN_TYPE_HOST_ERROR in the xen portion of the patch.
> >>
> >> So if any x86 expert has an opinion on where triple-fault handling is
> >> emulated, and what category should be used there, I'm welcome to
> >> tweaking this series.
> > 
> > It's pretty much on the border anyway, I don't think it matters too
> > much; it sounds perfectly reasonable.
> 
> Actually, reading
> https://blogs.msdn.microsoft.com/larryosterman/2005/02/08/faster-syscall-trap-redux/
> makes it sound like the triple-fault = reset is exploited by existing OS
> (dating back to days of targetting 286 machines), so it is bare-metal
> behavior that we have to faithfully emulate as a guest-triggered reset,
> and not something where the guest has wedged itself to the point where
> qemu can no longer execute the guest.

The point is it's both :-)
A lot of x86 reset code tries four or five different ways to invoke
a reset and if all else fails they triple fault.

Dave

> -- 
> Eric Blake, Principal Software Engineer
> Red Hat, Inc.   +1-919-301-3266
> Virtualization:  qemu.org | libvirt.org
> 



--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK

Re: [Qemu-devel] [PATCH v5 2/4] shutdown: Prepare for use of an enum in reset/shutdown_request

2017-04-28 Thread Eric Blake

On 04/28/2017 11:09 AM, Dr. David Alan Gilbert wrote:

>>> At a higher level, using your tags, I'm not sure where a reset triggered
>>> by a fault detected by the hypervisor lives - e.g. an x86 triple fault
>>> where the guest screws up so badly that it just gets reset.  Is
>>> that a guest-reset or a guest-panic or what - neither case
>>> was actually asked for by the guest itself.
>>
>> Wouldn't that be host-error (qemu detected an error that prevents
>> further execution of the guest without a reset - and a triple fault
>> seems to fall into the category of the guest getting itself wedged
>> rather than actually trying to reset)?  Except patch 3 only used
>> SHUTDOWN_TYPE_HOST_ERROR in the xen portion of the patch.
>>
>> So if any x86 expert has an opinion on where triple-fault handling is
>> emulated, and what category should be used there, I'm welcome to
>> tweaking this series.
> 
> It's pretty much on the border anyway, I don't think it matters too
> much; it sounds perfectly reasonable.

Actually, reading
https://blogs.msdn.microsoft.com/larryosterman/2005/02/08/faster-syscall-trap-redux/
makes it sound like the triple-fault = reset is exploited by existing OS
(dating back to days of targetting 286 machines), so it is bare-metal
behavior that we have to faithfully emulate as a guest-triggered reset,
and not something where the guest has wedged itself to the point where
qemu can no longer execute the guest.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH v10 04/17] qcow2: Optimize zero_single_l2() to minimize L2 churn

2017-04-28 Thread Max Reitz

On 27.04.2017 03:46, Eric Blake wrote:
> Similar to discard_single_l2(), we should try to avoid dirtying
> the L2 cache when the cluster we are changing already has the
> right characteristics.
> 
> Note that by the time we get to zero_single_l2(), BDRV_REQ_MAY_UNMAP
> is a requirement to unallocate a cluster (this is because the block
> layer clears that flag if discard.* flags during open requested that
> we never punch holes - see the conversation around commit 170f4b2e,
> https://lists.gnu.org/archive/html/qemu-devel/2016-09/msg07306.html).
> Therefore, this patch can only reuse a zero cluster as-is if either
> unmapping is not requested, or if the zero cluster was not associated
> with an allocation.
> 
> Technically, there are some cases where an unallocated cluster
> already reads as all zeroes (namely, when there is no backing file
> [easy: check bs->backing], or when the backing file also reads as
> zeroes [harder: we can't check bdrv_get_block_status since we are
> already holding the lock]), where the guest would not immediately see
> a difference if we left that cluster unallocated.  But if the user
> did not request unmapping, leaving an unallocated cluster is wrong;
> and even if the user DID request unmapping, keeping a cluster
> unallocated risks a subtle semantic change of guest-visible contents
> if a backing file is later added, and it is not worth auditing
> whether all internal uses such as mirror properly avoid an unmap
> request.  Thus, this patch is intentionally limited to just clusters
> that are already marked as zero.
> 
> Signed-off-by: Eric Blake 
> 
> ---
> v10: new patch, replacing earlier attempt to use unallocated clusters,
> and ditching any optimization of v2 files
> ---
>  block/qcow2-cluster.c | 13 +++--
>  1 file changed, 11 insertions(+), 2 deletions(-)
> 
> diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
> index db3d937..d542894 100644
> --- a/block/qcow2-cluster.c
> +++ b/block/qcow2-cluster.c
> @@ -1614,6 +1614,7 @@ static int zero_single_l2(BlockDriverState *bs, 
> uint64_t offset,
>  int l2_index;
>  int ret;
>  int i;
> +bool unmap = !!(flags & BDRV_REQ_MAY_UNMAP);
> 
>  ret = get_cluster_table(bs, offset, _table, _index);
>  if (ret < 0) {
> @@ -1629,9 +1630,17 @@ static int zero_single_l2(BlockDriverState *bs, 
> uint64_t offset,
> 
>  old_offset = be64_to_cpu(l2_table[l2_index + i]);
> 
> -/* Update L2 entries */
> +/*
> + * Minimize L2 changes if the cluster already reads back as
> + * zeroes with correct allocation.
> + */
> +if (qcow2_get_cluster_type(old_offset) == QCOW2_CLUSTER_ZERO &&
> +!(unmap && old_offset & L2E_OFFSET_MASK)) {
> +continue;
> +}
> +
>  qcow2_cache_entry_mark_dirty(bs, s->l2_table_cache, l2_table);
> -if (old_offset & QCOW_OFLAG_COMPRESSED || flags & 
> BDRV_REQ_MAY_UNMAP) {
> +if (old_offset & QCOW_OFLAG_COMPRESSED || unmap) {

Works, but I'd like it better to store the cluster type somewhere and
then use it here instead of checking the COMPRESSED flag.

But it works, so it's up to you:

Reviewed-by: Max Reitz 

>  l2_table[l2_index + i] = cpu_to_be64(QCOW_OFLAG_ZERO);
>  qcow2_free_any_clusters(bs, old_offset, 1, 
> QCOW2_DISCARD_REQUEST);
>  } else {
> 




signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH v10 03/17] qcow2: Reuse preallocated zero clusters

2017-04-28 Thread Max Reitz

On 27.04.2017 03:46, Eric Blake wrote:
> From: Max Reitz 
> 
> Instead of just freeing preallocated zero clusters and completely
> allocating them from scratch, reuse them.
> 
> We cannot do this in handle_copied(), however, since this is a COW
> operation. Therefore, we have to add the new logic to handle_alloc() and
> simply return the existing offset if it exists. The only catch is that
> we have to convince qcow2_alloc_cluster_link_l2() not to free the old
> clusters (because we have reused them).
> 
> Reported-by: Eric Blake 
> Signed-off-by: Max Reitz 
> Signed-off-by: Eric Blake 
> 
> ---
> v10: new patch. Max hasn't posted the patch directly on list, but
> did mention it here:
> https://lists.gnu.org/archive/html/qemu-devel/2017-04/msg03936.html
> and that he would post it once he has tests. Well, my later patches
> add a test that requires this one :)  The other two patches that
> he mentioned there are also good, but not essential for my series
> (and I didn't want to write tests to expose the behavior difference,
> because it would deprive Max of that fun).

Well, the main reason I didn't send the patches yet is because I was
tired while writing them ("Date: Sat Apr 22 01:17:52 2017 +0200") and I
wanted to take another look before sending them. I guess now's as good a
time as any.

> ---
>  block/qcow2.h |  3 ++
>  block/qcow2-cluster.c | 83 
> +++
>  2 files changed, 60 insertions(+), 26 deletions(-)

[...]

> diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
> index d1063df..db3d937 100644
> --- a/block/qcow2-cluster.c
> +++ b/block/qcow2-cluster.c

[...]

> @@ -1192,31 +1199,53 @@ static int handle_alloc(BlockDriverState *bs, 
> uint64_t guest_offset,
>   * wrong with our code. */
>  assert(nb_clusters > 0);
> 
> +if (!*host_offset && qcow2_get_cluster_type(entry) == QCOW2_CLUSTER_ZERO 
> &&
> +(entry & L2E_OFFSET_MASK) != 0 && (entry & QCOW_OFLAG_COPIED))

*host_offset works with this, too, if
start_of_cluster(s, *host_offset) == (entry & L2E_OFFSET_MASK).

If !(entry & QCOW_OFLAG_COPIED), we should check whether the refcount
maybe is 1 and then set OFLAG_COPIED. But that is something we don't
even do for normal clusters yet, so it's something to fix another day.

> +{
> +/* Try to reuse preallocated zero clusters; contiguous normal 
> clusters
> + * would be fine, too, but count_cow_clusters() above has limited
> + * nb_clusters already to a range of COW clusters */
> +int preallocated_nb_clusters =
> +count_contiguous_clusters(nb_clusters, s->cluster_size, l2_table,

s/l2_table/_table[l2_index]/

> +  QCOW_OFLAG_COPIED);
> +
> +if (preallocated_nb_clusters) {

preallocated_nb_clusters must be at least 1, so an assertion would be
better.

Max

> +nb_clusters = preallocated_nb_clusters;
> +alloc_cluster_offset = entry & L2E_OFFSET_MASK;
> +
> +/* We want to reuse these clusters, so 
> qcow2_alloc_cluster_link_l2()
> + * should not free them. */
> +keep_old_clusters = true;
> +}
> +}
> +
>  qcow2_cache_put(bs, s->l2_table_cache, (void **) _table);
> 
> -/* Allocate, if necessary at a given offset in the image file */
> -alloc_cluster_offset = start_of_cluster(s, *host_offset);
> -ret = do_alloc_cluster_offset(bs, guest_offset, _cluster_offset,
> -  _clusters);
> -if (ret < 0) {
> -goto fail;
> -}
> -
> -/* Can't extend contiguous allocation */
> -if (nb_clusters == 0) {
> -*bytes = 0;
> -return 0;
> -}
> -
> -/* !*host_offset would overwrite the image header and is reserved for "no
> - * host offset preferred". If 0 was a valid host offset, it'd trigger the
> - * following overlap check; do that now to avoid having an invalid value 
> in
> - * *host_offset. */
>  if (!alloc_cluster_offset) {
> -ret = qcow2_pre_write_overlap_check(bs, 0, alloc_cluster_offset,
> -nb_clusters * s->cluster_size);
> -assert(ret < 0);
> -goto fail;
> +/* Allocate, if necessary at a given offset in the image file */
> +alloc_cluster_offset = start_of_cluster(s, *host_offset);
> +ret = do_alloc_cluster_offset(bs, guest_offset, 
> _cluster_offset,
> +  _clusters);
> +if (ret < 0) {
> +goto fail;
> +}
> +
> +/* Can't extend contiguous allocation */
> +if (nb_clusters == 0) {
> +*bytes = 0;
> +return 0;
> +}
> +
> +/* !*host_offset would overwrite the image header and is reserved for
> + * "no host offset preferred". If 0 was a valid host offset, it'd
> + * trigger

Re: [Qemu-devel] [PATCH v5 13/19] tb-hash: improve tb_jmp_cache hash function in user mode

2017-04-28 Thread Emilio G. Cota

On Fri, Apr 28, 2017 at 18:00:57 +0100, Alex Bennée wrote:
> I'll note when I've plotted hit-rates against the cache we don't seem to
> be making a good even use of the cache over time. But I suspect there is
> more that could be done here. 

If you're talking about full-system mode here, then yes, the hit rate
isn't great, which I think is due to frequent flushes of tb_jmp_cache.

For user-mode and for many of these benchmarks we get hit rates in the high
90's. I also tried xxhash and hit rates didn't go up (sometimes even went
one or two percentage points lower), which shows that the hashing function
we have now is good enough, i.e. unless we increase the size of tb_jmp_cache,
there's not a lot to improve here.

I didn't check all benchmarks though, just a few of the SPEC06int ones.

If you have those numbers/plots I'd be interested in seeing them.

Thanks,

E.

Re: [Qemu-devel] [PATCH v10 02/17] qcow2: Correctly report status of preallocated zero clusters

2017-04-28 Thread Max Reitz

On 27.04.2017 03:46, Eric Blake wrote:
> We were throwing away the preallocation information associated with
> zero clusters.  But we should be matching the well-defined semantics
> in bdrv_get_block_status(), where (BDRV_BLOCK_ZERO |
> BDRV_BLOCK_OFFSET_VALID) informs the user which offset is reserved,
> while still reminding the user that reading from that offset is
> likely to read garbage.
> 
> Making this change lets us see which portions of an image are zero
> but preallocated, when using qemu-img map --output=json.  The
> --output=human side intentionally ignores all zero clusters, whether
> or not they are preallocated.
> 
> The fact that there is no change to qemu-iotests './check -qcow2'
> merely means that we aren't yet testing this aspect of qemu-img;
> a later patch will add a test.
> 
> Signed-off-by: Eric Blake 
> 
> ---
> v10: new patch
> ---
>  block/qcow2-cluster.c | 32 +++-
>  1 file changed, 27 insertions(+), 5 deletions(-)

I'd propose you split the qcow2 changes off of this series.

> diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
> index 100398c..d1063df 100644
> --- a/block/qcow2-cluster.c
> +++ b/block/qcow2-cluster.c
> @@ -328,6 +328,10 @@ static int count_contiguous_clusters(int nb_clusters, 
> int cluster_size,
>   return i;
>  }
> 
> +/*
> + * Checks how many consecutive clusters in a given L2 table have the same
> + * cluster type with no corresponding allocation.
> + */
>  static int count_contiguous_clusters_by_type(int nb_clusters,
>   uint64_t *l2_table,
>   int wanted_type)
> @@ -335,9 +339,10 @@ static int count_contiguous_clusters_by_type(int 
> nb_clusters,
>  int i;
> 
>  for (i = 0; i < nb_clusters; i++) {
> -int type = qcow2_get_cluster_type(be64_to_cpu(l2_table[i]));
> +uint64_t entry = be64_to_cpu(l2_table[i]);
> +int type = qcow2_get_cluster_type(entry);
> 
> -if (type != wanted_type) {
> +if (type != wanted_type || entry & L2E_OFFSET_MASK) {

This works only for wanted_type \in \{ ZERO, UNALLOCATED \} -- which is
what the comment you added describes, but this may still warrant a
renaming of this function (count_contiguous_unallocated_clusters?) and
probably an assertion that wanted_type is ZERO or UNALLOCATED.

>  break;
>  }
>  }
> @@ -559,9 +564,26 @@ int qcow2_get_cluster_offset(BlockDriverState *bs, 
> uint64_t offset,
>  ret = -EIO;
>  goto fail;
>  }
> -c = count_contiguous_clusters_by_type(nb_clusters, 
> _table[l2_index],
> -  QCOW2_CLUSTER_ZERO);
> -*cluster_offset = 0;
> +/* Distinguish between pure zero clusters and pre-allocated ones */
> +if (*cluster_offset & L2E_OFFSET_MASK) {
> +c = count_contiguous_clusters(nb_clusters, s->cluster_size,
> +  _table[l2_index], 
> QCOW_OFLAG_ZERO);

You should probably switch this patch and the next one -- or I just send
my patches myself and you base your series on top of it...

Because before the next patch, this happens:

$ ./qemu-img create -f qcow2 foo.qcow2 64M
Formatting 'foo.qcow2', fmt=qcow2 size=67108864 encryption=off
cluster_size=65536 lazy_refcounts=off refcount_bits=16
$ ./qemu-io -c 'write 0 64M' -c 'write -z 0 64M' foo.qcow2
wrote 67108864/67108864 bytes at offset 0
64 MiB, 1 ops; 0.1846 sec (346.590 MiB/sec and 5.4155 ops/sec)
wrote 67108864/67108864 bytes at offset 0
64 MiB, 1 ops; 0.0004 sec (127.551 GiB/sec and 2040.8163 ops/sec)
$ ./qemu-img map foo.qcow2
Offset  Length  Mapped to   File
qemu-img: block/qcow2-cluster.c:319: count_contiguous_clusters:
Assertion `qcow2_get_cluster_type(first_entry) == QCOW2_CLUSTER_NORMAL'
failed.
[1]4970 abort (core dumped)  ./qemu-img map foo.qcow2

Max

> +*cluster_offset &= L2E_OFFSET_MASK;
> +if (offset_into_cluster(s, *cluster_offset)) {
> +qcow2_signal_corruption(bs, true, -1, -1,
> +"Preallocated zero cluster offset %#"
> +PRIx64 " unaligned (L2 offset: %#"
> +PRIx64 ", L2 index: %#x)",
> +*cluster_offset, l2_offset, 
> l2_index);
> +ret = -EIO;
> +goto fail;
> +}
> +} else {
> +c = count_contiguous_clusters_by_type(nb_clusters,
> +  _table[l2_index],
> +  QCOW2_CLUSTER_ZERO);
> +*cluster_offset = 0;
> +}
>  break;
>  case QCOW2_CLUSTER_UNALLOCATED:
>  /* how many empty clusters ? */
> 




signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [Qemu-ppc] [PATCH 1/4] hw/ppc: setting spapr_drc_detach_cb in spapr_dr_connector_new

2017-04-28 Thread Daniel Henrique Barboza




On 04/26/2017 07:09 PM, Michael Roth wrote:

Quoting Daniel Henrique Barboza (2017-04-26 16:31:21)

The idea of moving the detach callback functions to the constructor
of the dr_connector is to set them statically at init time, avoiding
any post-load hooks to restore it (after a migration, for example).

Summary of changes:

- hw/ppc/spapr_drc.c and include/hw/ppc/spapr_drc.h:
 *  spapr_dr_connector_new() now has an additional parameter,
spapr_drc_detach_cb *detach_cb
 *  'spapr_drc_detach_cb *detach_cb' parameter was removed of
the detach function pointer in sPAPRDRConnectorClass

- hw/ppc/spapr_pci.c:
 * the callback 'spapr_phb_remove_pci_device_cb' is now passed
as a parameter in 'spapr_dr_connector_new' instead of 'drck->detach()'

- hw/ppc/spapr.c:
 * 'spapr_create_lmb_dr_connectors' now passes the callback
'spapr_lmb_release' to 'spapr_dr_connector_new' instead of 'drck-detach()'
 * 'spapr_init_cpus' now passes the callback 'spapr_core_release'
to 'spapr_dr_connector_new' instead of 'drck-detach()'
 * moved the callback functions up in the code so they can be referenced
by 'spapr_create_lmb_dr_connectors' and 'spapr_init_cpus'

Signed-off-by: Daniel Henrique Barboza 

Patch looks good, but we need a follow-up afterward that removes the
need for the void *opaque argument as I mentioned in my last email.
Otherwise we'd still need to restore the opaque argument in post-load
which isn't really viable.


I'm on it!  Thanks!



---
  hw/ppc/spapr.c | 71 +++---
  hw/ppc/spapr_drc.c | 17 ++-
  hw/ppc/spapr_pci.c |  5 ++--
  include/hw/ppc/spapr_drc.h |  4 +--
  4 files changed, 49 insertions(+), 48 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 80d12d0..bc11757 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1887,6 +1887,29 @@ static void spapr_drc_reset(void *opaque)
  }
  }

+typedef struct sPAPRDIMMState {
+uint32_t nr_lmbs;
+} sPAPRDIMMState;
+
+static void spapr_lmb_release(DeviceState *dev, void *opaque)
+{
+sPAPRDIMMState *ds = (sPAPRDIMMState *)opaque;
+HotplugHandler *hotplug_ctrl;
+
+if (--ds->nr_lmbs) {
+return;
+}
+
+g_free(ds);
+
+/*
+ * Now that all the LMBs have been removed by the guest, call the
+ * pc-dimm unplug handler to cleanup up the pc-dimm device.
+ */
+hotplug_ctrl = qdev_get_hotplug_handler(dev);
+hotplug_handler_unplug(hotplug_ctrl, dev, _abort);
+}
+
  static void spapr_create_lmb_dr_connectors(sPAPRMachineState *spapr)
  {
  MachineState *machine = MACHINE(spapr);
@@ -1900,7 +1923,7 @@ static void 
spapr_create_lmb_dr_connectors(sPAPRMachineState *spapr)

  addr = i * lmb_size + spapr->hotplug_memory.base;
  drc = spapr_dr_connector_new(OBJECT(spapr), 
SPAPR_DR_CONNECTOR_TYPE_LMB,
- addr/lmb_size);
+ (addr / lmb_size), spapr_lmb_release);
  qemu_register_reset(spapr_drc_reset, drc);
  }
  }
@@ -1956,6 +1979,14 @@ static CPUArchId *spapr_find_cpu_slot(MachineState *ms, 
uint32_t id, int *idx)
  return >possible_cpus->cpus[index];
  }

+static void spapr_core_release(DeviceState *dev, void *opaque)
+{
+HotplugHandler *hotplug_ctrl;
+
+hotplug_ctrl = qdev_get_hotplug_handler(dev);
+hotplug_handler_unplug(hotplug_ctrl, dev, _abort);
+}
+
  static void spapr_init_cpus(sPAPRMachineState *spapr)
  {
  MachineState *machine = MACHINE(spapr);
@@ -1998,7 +2029,8 @@ static void spapr_init_cpus(sPAPRMachineState *spapr)
  sPAPRDRConnector *drc =
  spapr_dr_connector_new(OBJECT(spapr),
 SPAPR_DR_CONNECTOR_TYPE_CPU,
-   (core_id / smp_threads) * smt);
+   (core_id / smp_threads) * smt,
+   spapr_core_release);

  qemu_register_reset(spapr_drc_reset, drc);
  }
@@ -2596,29 +2628,6 @@ out:
  error_propagate(errp, local_err);
  }

-typedef struct sPAPRDIMMState {
-uint32_t nr_lmbs;
-} sPAPRDIMMState;
-
-static void spapr_lmb_release(DeviceState *dev, void *opaque)
-{
-sPAPRDIMMState *ds = (sPAPRDIMMState *)opaque;
-HotplugHandler *hotplug_ctrl;
-
-if (--ds->nr_lmbs) {
-return;
-}
-
-g_free(ds);
-
-/*
- * Now that all the LMBs have been removed by the guest, call the
- * pc-dimm unplug handler to cleanup up the pc-dimm device.
- */
-hotplug_ctrl = qdev_get_hotplug_handler(dev);
-hotplug_handler_unplug(hotplug_ctrl, dev, _abort);
-}
-
  static void spapr_del_lmbs(DeviceState *dev, uint64_t addr_start, uint64_t 
size,
 Error **errp)
  {
@@ -2636,7 +2645,7 @@ static void spapr_del_lmbs(DeviceState *dev, uint64_t 
addr_start, uint64_t size,
  g_assert(drc);

  drck =

Re: [Qemu-devel] [PATCH 31/41] migration: Move postcopy stuff to postcopy-ram.c

2017-04-28 Thread Dr. David Alan Gilbert

* Juan Quintela (quint...@redhat.com) wrote:
> Yes, we don't have a good place to put that stuff.
> 
> Signed-off-by: Juan Quintela 

Reviewed-by: Dr. David Alan Gilbert 

> ---
>  include/migration/migration.h | 25 -
>  migration/migration.c | 18 --
>  migration/postcopy-ram.c  | 18 ++
>  migration/postcopy-ram.h  | 26 ++
>  4 files changed, 44 insertions(+), 43 deletions(-)
> 
> diff --git a/include/migration/migration.h b/include/migration/migration.h
> index 9063c32..5c27fe7 100644
> --- a/include/migration/migration.h
> +++ b/include/migration/migration.h
> @@ -54,28 +54,6 @@ enum mig_rp_message_type {
>  
>  typedef QLIST_HEAD(, LoadStateEntry) LoadStateEntry_Head;
>  
> -/* The current postcopy state is read/set by postcopy_state_get/set
> - * which update it atomically.
> - * The state is updated as postcopy messages are received, and
> - * in general only one thread should be writing to the state at any one
> - * time, initially the main thread and then the listen thread;
> - * Corner cases are where either thread finishes early and/or errors.
> - * The state is checked as messages are received to ensure that
> - * the source is sending us messages in the correct order.
> - * The state is also used by the RAM reception code to know if it
> - * has to place pages atomically, and the cleanup code at the end of
> - * the main thread to know if it has to delay cleanup until the end
> - * of postcopy.
> - */
> -typedef enum {
> -POSTCOPY_INCOMING_NONE = 0,  /* Initial state - no postcopy */
> -POSTCOPY_INCOMING_ADVISE,
> -POSTCOPY_INCOMING_DISCARD,
> -POSTCOPY_INCOMING_LISTENING,
> -POSTCOPY_INCOMING_RUNNING,
> -POSTCOPY_INCOMING_END
> -} PostcopyState;
> -
>  /* State for the incoming migration */
>  struct MigrationIncomingState {
>  QEMUFile *from_src_file;
> @@ -253,7 +231,4 @@ void savevm_skip_configuration(void);
>  int global_state_store(void);
>  void global_state_store_running(void);
>  
> -PostcopyState postcopy_state_get(void);
> -/* Set the state and return the old state */
> -PostcopyState postcopy_state_set(PostcopyState new_state);
>  #endif
> diff --git a/migration/migration.c b/migration/migration.c
> index ff32086..d2f3c5e 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -85,13 +85,6 @@ static NotifierList migration_state_notifiers =
>  
>  static bool deferred_incoming;
>  
> -/*
> - * Current state of incoming postcopy; note this is not part of
> - * MigrationIncomingState since it's state is used during cleanup
> - * at the end as MIS is being freed.
> - */
> -static PostcopyState incoming_postcopy_state;
> -
>  /* When we add fault tolerance, we could have several
> migrations at once.  For now we don't need to add
> dynamic creation of migration */
> @@ -2085,14 +2078,3 @@ void migrate_fd_connect(MigrationState *s)
>  s->migration_thread_running = true;
>  }
>  
> -PostcopyState  postcopy_state_get(void)
> -{
> -return atomic_mb_read(_postcopy_state);
> -}
> -
> -/* Set the state and return the old state */
> -PostcopyState postcopy_state_set(PostcopyState new_state)
> -{
> -return atomic_xchg(_postcopy_state, new_state);
> -}
> -
> diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
> index 3359224..cae5d48 100644
> --- a/migration/postcopy-ram.c
> +++ b/migration/postcopy-ram.c
> @@ -787,3 +787,21 @@ void postcopy_discard_send_finish(MigrationState *ms, 
> PostcopyDiscardState *pds)
>  
>  g_free(pds);
>  }
> +
> +/*
> + * Current state of incoming postcopy; note this is not part of
> + * MigrationIncomingState since it's state is used during cleanup
> + * at the end as MIS is being freed.
> + */
> +static PostcopyState incoming_postcopy_state;
> +
> +PostcopyState  postcopy_state_get(void)
> +{
> +return atomic_mb_read(_postcopy_state);
> +}
> +
> +/* Set the state and return the old state */
> +PostcopyState postcopy_state_set(PostcopyState new_state)
> +{
> +return atomic_xchg(_postcopy_state, new_state);
> +}
> diff --git a/migration/postcopy-ram.h b/migration/postcopy-ram.h
> index 4c25f03..52d51e8 100644
> --- a/migration/postcopy-ram.h
> +++ b/migration/postcopy-ram.h
> @@ -81,6 +81,28 @@ int postcopy_place_page(MigrationIncomingState *mis, void 
> *host, void *from,
>  int postcopy_place_page_zero(MigrationIncomingState *mis, void *host,
>   size_t pagesize);
>  
> +/* The current postcopy state is read/set by postcopy_state_get/set
> + * which update it atomically.
> + * The state is updated as postcopy messages are received, and
> + * in general only one thread should be writing to the state at any one
> + * time, initially the main thread and then the listen thread;
> + * Corner cases are where either thread finishes early and/or errors.
> + * The state is checked as messages are received to ensure that
> + *

Re: [Qemu-devel] [PATCH v10 01/17] block: Update comments on BDRV_BLOCK_* meanings

2017-04-28 Thread Max Reitz

On 27.04.2017 03:46, Eric Blake wrote:
> We had some conflicting documentation: a nice 8-way table that
> described all possible combinations of DATA, ZERO, and
> OFFSET_VALID, couple with text that implied that OFFSET_VALID
> always meant raw data could be read directly.  As the 8-way
> table is the intended semantics, simplify the rest of the text
> to get rid of the confusion.
> 
> Suggested-by: Max Reitz 
> Signed-off-by: Eric Blake 
> 
> ---
> v10: new patch
> ---
>  include/block/block.h | 13 +++--
>  1 file changed, 7 insertions(+), 6 deletions(-)

Reviewed-by: Max Reitz 

Thanks!



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH v5 14/19] target/alpha: Use tcg_gen_goto_ptr

2017-04-28 Thread Alex Bennée


Richard Henderson  writes:

> Signed-off-by: Richard Henderson 
> ---
>  target/alpha/translate.c | 49 
> +++-
>  1 file changed, 36 insertions(+), 13 deletions(-)
>
> diff --git a/target/alpha/translate.c b/target/alpha/translate.c
> index df5d695..c1a5fbf 100644
> --- a/target/alpha/translate.c
> +++ b/target/alpha/translate.c
> @@ -89,6 +89,10 @@ typedef enum {
> updated the PC for the next instruction to be executed.  */
>  EXIT_PC_STALE,
>
> +/* Similarly, but force TB exit without chaining.  */
> +EXIT_PC_UPDATED_FORCE,
> +EXIT_PC_STALE_FORCE,
> +
>  /* We are ending the TB with a noreturn function call, e.g. longjmp.
> No following code will be executed.  */
>  EXIT_NORETURN,
> @@ -455,11 +459,16 @@ static bool in_superpage(DisasContext *ctx, int64_t 
> addr)
>  #endif
>  }
>
> +static bool use_exit_tb(DisasContext *ctx)
> +{
> +/* Suppress any optimization in the case of single-steping and IO.  */
> +return ((ctx->tb->cflags & CF_LAST_IO)
> +|| ctx->singlestep_enabled || singlestep);
> +}
> +
>  static bool use_goto_tb(DisasContext *ctx, uint64_t dest)
>  {
> -/* Suppress goto_tb in the case of single-steping and IO.  */
> -if ((ctx->tb->cflags & CF_LAST_IO)
> -|| ctx->singlestep_enabled || singlestep) {
> +if (use_exit_tb(ctx)) {
>  return false;
>  }
>  #ifndef CONFIG_USER_ONLY
> @@ -1257,14 +1266,14 @@ static ExitStatus gen_call_pal(DisasContext *ctx, int 
> palcode)
> need the page permissions check.  We'll see the existence of
> the page when we create the TB, and we'll flush all TBs if
> we change the PAL base register.  */
> -if (!ctx->singlestep_enabled && !(ctx->tb->cflags & CF_LAST_IO)) {
> +if (use_exit_tb(ctx)) {
> +tcg_gen_movi_i64(cpu_pc, entry);
> +return EXIT_PC_UPDATED;
> +} else {
>  tcg_gen_goto_tb(0);
>  tcg_gen_movi_i64(cpu_pc, entry);
>  tcg_gen_exit_tb((uintptr_t)ctx->tb);
>  return EXIT_GOTO_TB;
> -} else {
> -tcg_gen_movi_i64(cpu_pc, entry);
> -return EXIT_PC_UPDATED;
>  }
>  }
>  #endif
> @@ -1323,7 +1332,7 @@ static ExitStatus gen_mfpr(DisasContext *ctx, TCGv va, 
> int regno)
>  gen_io_start();
>  helper(va);
>  gen_io_end();
> -return EXIT_PC_STALE;
> +return EXIT_PC_STALE_FORCE;
>  } else {
>  helper(va);
>  }
> @@ -1374,7 +1383,7 @@ static ExitStatus gen_mtpr(DisasContext *ctx, TCGv vb, 
> int regno)
>  case 252:
>  /* HALT */
>  gen_helper_halt(vb);
> -return EXIT_PC_STALE;
> +return EXIT_PC_STALE_FORCE;
>
>  case 251:
>  /* ALARM */
> @@ -1388,7 +1397,7 @@ static ExitStatus gen_mtpr(DisasContext *ctx, TCGv vb, 
> int regno)
> that ended with a CALL_PAL.  Since the base register usually only
> changes during boot, flushing everything works well.  */
>  gen_helper_tb_flush(cpu_env);
> -return EXIT_PC_STALE;
> +return EXIT_PC_STALE_FORCE;
>
>  case 32 ... 39:
>  /* Accessing the "non-shadow" general registers.  */
> @@ -2373,7 +2382,7 @@ static ExitStatus translate_one(DisasContext *ctx, 
> uint32_t insn)
>  gen_io_start();
>  gen_helper_load_pcc(va, cpu_env);
>  gen_io_end();
> -ret = EXIT_PC_STALE;
> +ret = EXIT_PC_STALE_FORCE;
>  } else {
>  gen_helper_load_pcc(va, cpu_env);
>  }
> @@ -2990,18 +2999,32 @@ void gen_intermediate_code(CPUAlphaState *env, struct 
> TranslationBlock *tb)
>  case EXIT_GOTO_TB:
>  case EXIT_NORETURN:
>  break;
> +
>  case EXIT_PC_STALE:
>  tcg_gen_movi_i64(cpu_pc, ctx.pc);
> -/* FALLTHRU */
> +goto do_exit_pc_updated;
> +case EXIT_PC_STALE_FORCE:
> +tcg_gen_movi_i64(cpu_pc, ctx.pc);
> +goto do_exit_pc_updated_force;
> +
>  case EXIT_PC_UPDATED:
> +do_exit_pc_updated:
> +if (!use_exit_tb()) {
> +tcg_gen_lookup_and_goto_ptr(cpu_pc);
> +break;
> +}
> +/* FALLTHRU */
> +case EXIT_PC_UPDATED_FORCE:
> +do_exit_pc_updated_force:
>  if (ctx.singlestep_enabled) {
>  gen_excp_1(EXCP_DEBUG, 0);
>  } else {
>  tcg_gen_exit_tb(0);
>  }
>  break;

Hmm this is ugly. You can make it a bit cleaner I think:

case EXIT_PC_STALE:
tcg_gen_movi_i64(cpu_pc, ctx.pc);
/* FALLTHRU */
case EXIT_PC_UPDATED:
if (!use_exit_tb()) {
tcg_gen_lookup_and_goto_ptr(cpu_pc);
break;
}
goto do_exit_pc_updated_force;

case EXIT_PC_STALE_FORCE:

Re: [Qemu-devel] [PATCH v5 11/19] target/i386: optimize cross-page direct jumps in softmmu

2017-04-28 Thread Alex Bennée


Richard Henderson  writes:

> From: "Emilio G. Cota" 
>
> Instead of unconditionally exiting to the exec loop, use the
> gen_jr helper to jump to the target if it is valid.
>
> Perf impact: see next commit's log.
>
> Reviewed-by: Richard Henderson 
> Signed-off-by: Emilio G. Cota 
> Message-Id: <1493263764-18657-10-git-send-email-c...@braap.org>
> Signed-off-by: Richard Henderson 
> ---
>  target/i386/translate.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/target/i386/translate.c b/target/i386/translate.c
> index f0e48dc..ea113fe 100644
> --- a/target/i386/translate.c
> +++ b/target/i386/translate.c
> @@ -2154,9 +2154,9 @@ static inline void gen_goto_tb(DisasContext *s, int 
> tb_num, target_ulong eip)
>  gen_jmp_im(eip);
>  tcg_gen_exit_tb((uintptr_t)s->tb + tb_num);
>  } else {
> -/* jump to another page: currently not optimized */
> +/* jump to another page */
>  gen_jmp_im(eip);
> -gen_eob(s);
> +gen_jr(s, cpu_tmp0);


I had to look up what was going on with cpu_tmp0 there. Is there a
particular reason i386 has these global temps with implied setting
rules? It does seem somewhat hacky.

Given cmp_tmp0 seems to be a heavily used across i386 I guess it keeps
to the style of the translator :-/


>  }
>  }


--
Alex Bennée

Re: [Qemu-devel] [PATCH v5 13/19] tb-hash: improve tb_jmp_cache hash function in user mode

2017-04-28 Thread Alex Bennée


Richard Henderson  writes:

> From: "Emilio G. Cota" 
>
> Optimizations to cross-page chaining and indirect branches make
> performance more sensitive to the hit rate of tb_jmp_cache.
> The constraint of reserving some bits for the page number
> lowers the achievable quality of the hashing function.
>
> However, user-mode does not have this requirement. Thus,
> with this change we use for user-mode a hashing function that
> is both faster and of better quality than the previous one.
>
> Measurements:
>
> Note: baseline (i.e. speedup == 1x) is QEMU v2.9.0.
>
> -   SPECint06 (test set), x86_64-linux-user. Host: 
> Intel i7-6700K @ 4.00GHz
>
>  2.2x 
> +-+--+-+
>   |   
>|
>   | jr
>|
>2x +jr+multhash
> ++...+-+
>   |jr+hash
>   |$$$ |
>   |   
>   |$+$ |
>   |   
>  ### $ |
>  1.8x 
> +-+..#|#.$...+-+
>   |  
> ++#+# $ |
>   |   
> |# # $ |
>  1.6x 
> +-+***.#.$++$$$..+-+
>   | $$$  
> *+* # $ |$+$|
>   |   ++$$$   ### $  
> * * # $  +++|$ $|
>   | ++###+$   # # $  
> * * # $   ###   ## $|
>  1.4x 
> +-+...***+#.$.***.#.$..*.*.#.$...#+#$$.*++*|#.$..+-+
>   | *+* # $ * * # $  
> * * # $   # # $ *  *+# $|
>   | * * # $   + * * # $  
> * * # $ *** # $ *  * # $   ###$$|
>  1.2x 
> +-+...*.*.#.$.***##$$.*.*.#.$..*.*.#.$.*.*.#.$.*..*.#.$.***+#+$..+-+
>   | * * # $ *+* # $ * * # $   +++
> * * # $ ++###$$ * * # $ *  * # $ * * # $|
>   |***##$$  * * # $ * * # $ * * # $ ***##$$  ++###   
> * * # $ *** #+$ * * # $ *  * # $ * * # $|
>   |*+*+#+$ ***##$$$ * * # $ * * # $ * * # $ *+* # $ ++$$ ***+#   
> * * # $ * * # $ * * # $ *  * # $ * * # $|
>1x 
> +-++-*+*+#+$+*+*+#-+$+*+*-#+$+*+*+#+$+*+*+#+$+*-*+#+$+***++#+$+*+*+#$$+*+*+#+$+*+*+#+$+*+*-#+$+*+-*+#+$+*+*+#+$-++-+
>   |* * # $ * * #  $ * * # $ * * # $ * * # $ * * # $ * *  # $ * * # $ 
> * * # $ * * # $ * * # $ *  * # $ * * # $|
>   |* * # $ * * #  $ * * # $ * * # $ * * # $ * * # $ * *  # $ * * # $ 
> * * # $ * * # $ * * # $ *  * # $ * * # $|
>  0.8x 
> +-+--***##$$-***##$$$-***##$$-***##$$-***##$$-***##$$-***###$$-***##$$-***##$$-***##$$-***##$$-##$$-***##$$--+-+
>  astar   bzip2  gcc   gobmk h264ref   hmmlibquantum  mcf 
> omnetpperlbench   sjengxalancbmk   hmean
>   png: http://imgur.com/4UXTrEc
>
> Here I also tried the hash function suggested by Paolo ("multhash"):
>
>   return ((uint64_t) (pc * 2654435761) >> 32) & (TB_JMP_CACHE_SIZE - 1);
>
> As you can see it is just as good as the other new function ("hash"),
> which is what I ended up going with.
>
> -  SPECint06 (train set), x86_64-linux-user. Host: 
> Intel i7-6700K @ 4.00GHz
>
>  2.6x 
> +-+--+-+
>   |   
>|
>   | jr
>### |
>  2.4x 
> +jr+hash...#.#...+-+
>   |

[Qemu-devel] [PATCH] MAINTAINERS: Add qemu-progress to the block layer

2017-04-28 Thread Max Reitz

util/qemu-progress.c is currently unmaintained. The only user of its
functionality is qemu-img, so it effectively is part of the block layer.

Suggested-by: Fam Zheng 
Signed-off-by: Max Reitz 
---
 MAINTAINERS | 1 +
 1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index a1d2b3a4d3..768b4d94ec 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1168,6 +1168,7 @@ F: include/block/
 F: qemu-img*
 F: qemu-io*
 F: tests/qemu-iotests/
+F: util/qemu-progress.c
 T: git git://repo.or.cz/qemu/kevin.git block
 
 Block I/O path
-- 
2.12.2

Re: [Qemu-devel] [PATCH v5 12/19] target/i386: optimize indirect branches

2017-04-28 Thread Alex Bennée


Richard Henderson  writes:

> From: "Emilio G. Cota" 
>
> Speed up indirect branches by jumping to the target if it is valid.
>
> Softmmu measurements (see later commit for user-mode numbers):
>
> Note: baseline (i.e. speedup == 1x) is QEMU v2.9.0.
>
> -  SPECint06 (test set), x86_64-softmmu (Ubuntu 16.04 guest). 
> Host: Intel i7-4790K @ 4.00GHz
>
>  2.4x 
> +-+--+-+
>   |   
>|
>   |   cross   
>|
>  2.2x 
> +cross+jr..+++...+-+
>   |   
> |  |
>   |   
> +++ |  |
>2x 
> +-+..|..|+-+
>   |   
>  |  |  |
>   |   
>  |  |  |
>  1.8x 
> +-+..|...+-+
>   |   
>  |# |# |
>   |   
> |# |
>  1.6x 
> +-+*.|*.|#...+-+
>   |   
>* |* |# |
>   |   
>* |* |# |
>  1.4x 
> +-+...+++..*.|*.|#...+-+
>   |  ++   
>    * |*++# +++ |
>   |+++|  |
>   #++# *++*  #  +++ |  |
>  1.2x 
> +-+..###.+++|..|.....#.*..*..#...|.###...+-+
>   |+++   #    #  ***###  
> *++*  # *  *  ##++#  |#  +++#++#|
>   |### +++  *++* #  *++*  #  ++#  #  *|* |# +++  
> *  *  # *  *  #  ***  #  *| *|#    #|
>1x 
> +-++-*++*++#++***###++*++*+#++*+-*++#+++#++***++#+-*+*++#-+##++*++*-+#+*++*-+#++*+*++#++*-+*+#++*++*++#-++-+
>   |*  *  #  * *  #  *  * #  *  *  # *  *  #  * *  #  *|* |#  *++* #  
> *  *  # *  *  #  * *  #  *  * #  *  *  #|
>   |*  *  #  * *  #  *  * #  *  *  # *  *  #  * *  #  *+*++#  *  * #  
> *  *  # *  *  #  * *  #  *  * #  *  *  #|
>  0.8x 
> +-+--###--***###--##--###-###--***###--***###--##--###-###--***###--##--###--+-+
>  astar   bzip2  gcc   gobmk h264ref   hmmlibquantum  mcf 
> omnetpperlbench   sjengxalancbmk   hmean
>   png: http://imgur.com/DU36YFU
>
> NB. 'cross' represents the previous commit.
>
> Reviewed-by: Richard Henderson 
> Signed-off-by: Emilio G. Cota 
> Message-Id: <1493263764-18657-11-git-send-email-c...@braap.org>
> Signed-off-by: Richard Henderson 

Reviewed-by: Alex Bennée 

> ---
>  target/i386/translate.c | 14 --
>  1 file changed, 8 insertions(+), 6 deletions(-)
>
> diff --git a/target/i386/translate.c b/target/i386/translate.c
> index ea113fe..674ec96 100644
> --- a/target/i386/translate.c
> +++ b/target/i386/translate.c
> @@ -4996,7 +4996,7 @@ static target_ulong disas_insn(CPUX86State *env, 
> DisasContext *s,
>  gen_push_v(s, cpu_T1);
>  gen_op_jmp_v(cpu_T0);
>  gen_bnd_jmp(s);
> -gen_eob(s);
> +gen_jr(s, cpu_T0);
>  break;
>  case 3: /* lcall Ev */
>  gen_op_ld_v(s, ot, cpu_T1, cpu_A0);
> @@ -5014,7 +5014,8 @@ static target_ulong disas_insn(CPUX86State *env, 
> DisasContext *s,
>tcg_const_i32(dflag - 1),
>tcg_const_i32(s->pc - s->cs_base));
>  }
> -gen_eob(s);
> +

Re: [Qemu-devel] [PATCH 3/3] migration: Remove old MigrationParams

2017-04-28 Thread Dr. David Alan Gilbert

* Juan Quintela (quint...@redhat.com) wrote:
> Not used anymore after moving block migration to use capabilities.
> 
> Signed-off-by: Juan Quintela 

Reviewed-by: Dr. David Alan Gilbert 

> ---
>  include/migration/migration.h | 10 ++
>  include/migration/vmstate.h   |  1 -
>  include/qemu/typedefs.h   |  1 -
>  include/sysemu/sysemu.h   |  3 +--
>  migration/colo.c  |  2 +-
>  migration/migration.c |  8 +++-
>  migration/savevm.c| 16 +++-
>  7 files changed, 10 insertions(+), 31 deletions(-)
> 
> diff --git a/include/migration/migration.h b/include/migration/migration.h
> index 2917baa..3495162 100644
> --- a/include/migration/migration.h
> +++ b/include/migration/migration.h
> @@ -41,10 +41,6 @@
>  /* for vl.c */
>  extern int only_migratable;
>  
> -struct MigrationParams {
> -bool unused; /* C don't allow empty structs */
> -};
> -
>  /* Messages sent on the return path from destination to source */
>  enum mig_rp_message_type {
>  MIG_RP_MSG_INVALID = 0,  /* Must be 0 */
> @@ -134,12 +130,10 @@ struct MigrationState
>  QEMUBH *cleanup_bh;
>  QEMUFile *to_dst_file;
>  
> -/* New style params from 'migrate-set-parameters' */
> +/* params from 'migrate-set-parameters' */
>  MigrationParameters parameters;
>  
>  int state;
> -/* Old style params from 'migrate' command */
> -MigrationParams params;
>  
>  /* State related to return path */
>  struct {
> @@ -229,7 +223,7 @@ void migrate_fd_connect(MigrationState *s);
>  
>  void add_migration_state_change_notifier(Notifier *notify);
>  void remove_migration_state_change_notifier(Notifier *notify);
> -MigrationState *migrate_init(const MigrationParams *params);
> +MigrationState *migrate_init(void);
>  bool migration_is_blocked(Error **errp);
>  bool migration_in_setup(MigrationState *);
>  bool migration_is_idle(void);
> diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
> index 9452dec..4396d7e 100644
> --- a/include/migration/vmstate.h
> +++ b/include/migration/vmstate.h
> @@ -37,7 +37,6 @@ typedef int LoadStateHandler(QEMUFile *f, void *opaque, int 
> version_id);
>  
>  typedef struct SaveVMHandlers {
>  /* This runs inside the iothread lock.  */
> -void (*set_params)(const MigrationParams *params, void * opaque);
>  SaveStateHandler *save_state;
>  
>  void (*cleanup)(void *opaque);
> diff --git a/include/qemu/typedefs.h b/include/qemu/typedefs.h
> index f08d327..a388243 100644
> --- a/include/qemu/typedefs.h
> +++ b/include/qemu/typedefs.h
> @@ -49,7 +49,6 @@ typedef struct MemoryRegion MemoryRegion;
>  typedef struct MemoryRegionCache MemoryRegionCache;
>  typedef struct MemoryRegionSection MemoryRegionSection;
>  typedef struct MigrationIncomingState MigrationIncomingState;
> -typedef struct MigrationParams MigrationParams;
>  typedef struct MigrationState MigrationState;
>  typedef struct Monitor Monitor;
>  typedef struct MonitorDef MonitorDef;
> diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
> index 058d5eb..3340202 100644
> --- a/include/sysemu/sysemu.h
> +++ b/include/sysemu/sysemu.h
> @@ -102,8 +102,7 @@ enum qemu_vm_cmd {
>  #define MAX_VM_CMD_PACKAGED_SIZE (1ul << 24)
>  
>  bool qemu_savevm_state_blocked(Error **errp);
> -void qemu_savevm_state_begin(QEMUFile *f,
> - const MigrationParams *params);
> +void qemu_savevm_state_begin(QEMUFile *f);
>  void qemu_savevm_state_header(QEMUFile *f);
>  int qemu_savevm_state_iterate(QEMUFile *f, bool postcopy);
>  void qemu_savevm_state_cleanup(void);
> diff --git a/migration/colo.c b/migration/colo.c
> index 5c6c2f0..75e8807 100644
> --- a/migration/colo.c
> +++ b/migration/colo.c
> @@ -333,7 +333,7 @@ static int colo_do_checkpoint_transaction(MigrationState 
> *s,
>  }
>  
>  qemu_savevm_state_header(fb);
> -qemu_savevm_state_begin(fb, >params);
> +qemu_savevm_state_begin(fb);
>  qemu_mutex_lock_iothread();
>  qemu_savevm_state_complete_precopy(fb, false);
>  qemu_mutex_unlock_iothread();
> diff --git a/migration/migration.c b/migration/migration.c
> index 9b96f1a..f094079 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -1102,7 +1102,7 @@ bool migration_is_idle(void)
>  return false;
>  }
>  
> -MigrationState *migrate_init(const MigrationParams *params)
> +MigrationState *migrate_init(void)
>  {
>  MigrationState *s = migrate_get_current();
>  
> @@ -1116,7 +1116,6 @@ MigrationState *migrate_init(const MigrationParams 
> *params)
>  s->cleanup_bh = 0;
>  s->to_dst_file = NULL;
>  s->state = MIGRATION_STATUS_NONE;
> -s->params = *params;
>  s->rp_state.from_dst_file = NULL;
>  s->rp_state.error = false;
>  s->mbps = 0.0;
> @@ -1215,7 +1214,6 @@ void qmp_migrate(const char *uri, bool has_blk, bool 
> blk,
>  {
>  Error *local_err = NULL;
>  MigrationState *s =

[Qemu-devel] [Qemu-devel RFC v3 5/5] msf2: Add Emcraft's Smartfusion2 SOM kit.

2017-04-28 Thread Subbaraya Sundeep

Emulated Emcraft's Smartfusion2 System On Module starter
kit.

Signed-off-by: Subbaraya Sundeep 
---
 hw/arm/Makefile.objs |  1 +
 hw/arm/msf2_som.c| 68 
 2 files changed, 69 insertions(+)
 create mode 100644 hw/arm/msf2_som.c

diff --git a/hw/arm/Makefile.objs b/hw/arm/Makefile.objs
index cce2759..d0b7093 100644
--- a/hw/arm/Makefile.objs
+++ b/hw/arm/Makefile.objs
@@ -5,6 +5,7 @@ obj-y += omap_sx1.o palm.o realview.o spitz.o stellaris.o 
msf2_soc.o
 obj-y += tosa.o versatilepb.o vexpress.o virt.o xilinx_zynq.o z2.o
 obj-$(CONFIG_ACPI) += virt-acpi-build.o
 obj-y += netduino2.o
+obj-y += msf2_som.o
 obj-y += sysbus-fdt.o
 
 obj-y += armv7m.o exynos4210.o pxa2xx.o pxa2xx_gpio.o pxa2xx_pic.o
diff --git a/hw/arm/msf2_som.c b/hw/arm/msf2_som.c
new file mode 100644
index 000..c41edd5
--- /dev/null
+++ b/hw/arm/msf2_som.c
@@ -0,0 +1,68 @@
+/*
+ * SmartFusion2 SOM starter kit(from Emcraft) emulation.
+ *
+ * Copyright (c) 2017 Subbaraya Sundeep 
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to 
deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING 
FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include "qemu/osdep.h"
+#include "qapi/error.h"
+#include "hw/boards.h"
+#include "hw/arm/msf2_soc.h"
+#include "hw/arm/arm.h"
+
+static void msf2_init(MachineState *machine)
+{
+DeviceState *dev;
+DeviceState *spi_flash;
+MSF2State *soc;
+DriveInfo *dinfo = drive_get_next(IF_MTD);
+qemu_irq cs_line;
+SSIBus *spi_bus;
+
+dev = qdev_create(NULL, TYPE_MSF2_SOC);
+qdev_prop_set_string(dev, "cpu-model", "cortex-m3");
+object_property_set_bool(OBJECT(dev), true, "realized", _fatal);
+
+soc = MSF2_SOC(dev);
+
+/* Attach SPI flash to SPI0 controller */
+spi_bus = (SSIBus *)qdev_get_child_bus(dev, "spi0");
+spi_flash = ssi_create_slave_no_init(spi_bus, "s25sl12801");
+qdev_prop_set_uint8(spi_flash, "spansion-cr2nv", 1);
+if (dinfo) {
+qdev_prop_set_drive(spi_flash, "drive", blk_by_legacy_dinfo(dinfo),
+_fatal);
+}
+qdev_init_nofail(spi_flash);
+cs_line = qdev_get_gpio_in_named(spi_flash, SSI_GPIO_CS, 0);
+sysbus_connect_irq(SYS_BUS_DEVICE(>spi[0]), 1, cs_line);
+
+armv7m_load_kernel(ARM_CPU(first_cpu), machine->kernel_filename,
+   ENVM_SIZE);
+}
+
+static void msf2_machine_init(MachineClass *mc)
+{
+mc->desc = "SmartFusion2 SOM kit from Emcraft";
+mc->init = msf2_init;
+}
+
+DEFINE_MACHINE("smartfusion2-som", msf2_machine_init)
-- 
2.5.0

[Qemu-devel] [Qemu-devel RFC v3 4/5] msf2: Add Smartfusion2 SoC.

2017-04-28 Thread Subbaraya Sundeep

Smartfusion2 SoC has hardened Microcontroller subsystem
and flash based FPGA fabric. This patch adds support for
Microcontroller subsystem in the SoC.

Signed-off-by: Subbaraya Sundeep 
---
 default-configs/arm-softmmu.mak |   1 +
 hw/arm/Makefile.objs|   2 +-
 hw/arm/msf2_soc.c   | 194 
 include/hw/arm/msf2_soc.h   |  62 +
 4 files changed, 258 insertions(+), 1 deletion(-)
 create mode 100644 hw/arm/msf2_soc.c
 create mode 100644 include/hw/arm/msf2_soc.h

diff --git a/default-configs/arm-softmmu.mak b/default-configs/arm-softmmu.mak
index 78d7af0..7062512 100644
--- a/default-configs/arm-softmmu.mak
+++ b/default-configs/arm-softmmu.mak
@@ -122,3 +122,4 @@ CONFIG_ACPI=y
 CONFIG_SMBIOS=y
 CONFIG_ASPEED_SOC=y
 CONFIG_GPIO_KEY=y
+CONFIG_MSF2=y
diff --git a/hw/arm/Makefile.objs b/hw/arm/Makefile.objs
index 4c5c4ee..cce2759 100644
--- a/hw/arm/Makefile.objs
+++ b/hw/arm/Makefile.objs
@@ -1,7 +1,7 @@
 obj-y += boot.o collie.o exynos4_boards.o gumstix.o highbank.o
 obj-$(CONFIG_DIGIC) += digic_boards.o
 obj-y += integratorcp.o mainstone.o musicpal.o nseries.o
-obj-y += omap_sx1.o palm.o realview.o spitz.o stellaris.o
+obj-y += omap_sx1.o palm.o realview.o spitz.o stellaris.o msf2_soc.o
 obj-y += tosa.o versatilepb.o vexpress.o virt.o xilinx_zynq.o z2.o
 obj-$(CONFIG_ACPI) += virt-acpi-build.o
 obj-y += netduino2.o
diff --git a/hw/arm/msf2_soc.c b/hw/arm/msf2_soc.c
new file mode 100644
index 000..a470872
--- /dev/null
+++ b/hw/arm/msf2_soc.c
@@ -0,0 +1,194 @@
+/*
+ * SmartFusion2 SoC emulation.
+ *
+ * Copyright (c) 2017 Subbaraya Sundeep 
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to 
deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING 
FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include "qemu/osdep.h"
+#include "qapi/error.h"
+#include "qemu-common.h"
+#include "hw/arm/arm.h"
+#include "exec/address-spaces.h"
+#include "hw/char/serial.h"
+#include "hw/boards.h"
+#include "sysemu/block-backend.h"
+#include "hw/arm/msf2_soc.h"
+
+#define MSF2_TIMER_BASE   0x40004000
+#define MSF2_SYSREG_BASE  0x40038000
+
+#define MSF2_TIMER_IRQ0   14
+#define MSF2_TIMER_IRQ1   15
+
+static const uint32_t spi_addr[MSF2_NUM_SPIS] = { 0x40001000 , 0x40011000 };
+static const uint32_t uart_addr[MSF2_NUM_UARTS] = { 0x4000 , 0x4001 };
+
+static const int spi_irq[MSF2_NUM_SPIS] = { 2, 3 };
+static const int uart_irq[MSF2_NUM_UARTS] = { 10, 11 };
+
+static void msf2_soc_initfn(Object *obj)
+{
+MSF2State *s = MSF2_SOC(obj);
+int i;
+
+object_initialize(>armv7m, sizeof(s->armv7m), TYPE_ARMV7M);
+qdev_set_parent_bus(DEVICE(>armv7m), sysbus_get_default());
+
+object_initialize(>sysreg, sizeof(s->sysreg), TYPE_MSF2_SYSREG);
+qdev_set_parent_bus(DEVICE(>sysreg), sysbus_get_default());
+
+object_initialize(>timer, sizeof(s->timer), TYPE_MSF2_TIMER);
+qdev_set_parent_bus(DEVICE(>timer), sysbus_get_default());
+
+for (i = 0; i < MSF2_NUM_SPIS; i++) {
+object_initialize(>spi[i], sizeof(s->spi[i]),
+  TYPE_MSF2_SPI);
+qdev_set_parent_bus(DEVICE(>spi[i]), sysbus_get_default());
+}
+}
+
+static void msf2_soc_realize(DeviceState *dev_soc, Error **errp)
+{
+MSF2State *s = MSF2_SOC(dev_soc);
+DeviceState *dev, *armv7m;
+SysBusDevice *busdev;
+Error *err = NULL;
+int i;
+
+MemoryRegion *system_memory = get_system_memory();
+MemoryRegion *nvm = g_new(MemoryRegion, 1);
+MemoryRegion *nvm_alias = g_new(MemoryRegion, 1);
+MemoryRegion *sram = g_new(MemoryRegion, 1);
+MemoryRegion *ddr = g_new(MemoryRegion, 1);
+
+memory_region_init_ram(nvm, NULL, "MSF2.envm", ENVM_SIZE,
+   _fatal);
+memory_region_init_alias(nvm_alias, NULL, "MSF2.flash.alias",
+ nvm, 0, ENVM_SIZE);
+vmstate_register_ram_global(nvm);
+
+memory_region_set_readonly(nvm, true);
+

Re: [Qemu-devel] [PATCH 2/3] migration: Remove use of old MigrationParams

2017-04-28 Thread Dr. David Alan Gilbert

* Juan Quintela (quint...@redhat.com) wrote:
> We have change in the previous patch to use migration capabilities for
> it.  Notice that we continue using the old command line flags from
> migrate command from the time being.  Remove the set_params method as
> now it is empty.
> 
> Signed-off-by: Juan Quintela 
> ---
>  include/migration/migration.h |  3 +--
>  migration/block.c | 17 ++---
>  migration/colo.c  |  3 ---
>  migration/migration.c |  8 +---
>  migration/savevm.c|  2 --
>  5 files changed, 8 insertions(+), 25 deletions(-)
> 
> diff --git a/migration/colo.c b/migration/colo.c
> index c19eb3f..5c6c2f0 100644
> --- a/migration/colo.c
> +++ b/migration/colo.c
> @@ -332,9 +332,6 @@ static int colo_do_checkpoint_transaction(MigrationState 
> *s,
>  goto out;
>  }
>  
> -/* Disable block migration */
> -s->params.blk = 0;
> -s->params.shared = 0;

Hmm you don't seem to have replaced this with anything.
I think that's a behavioural change; the trick COLO did (I'm not sure if this
is still the way it works) is that they initiate the first migration
with block migration enabled so that the two hosts (with non-shared storage)
get sync'd storage, and then at the completion of that first migration
they then switch into the checkpointing mode where they're only
doing updates - that's why it gets switched off at this point
prior to the 1st checkpoint.

Dave

>  qemu_savevm_state_header(fb);
>  qemu_savevm_state_begin(fb, >params);
>  qemu_mutex_lock_iothread();
> diff --git a/migration/migration.c b/migration/migration.c
> index 775b24c..9b96f1a 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -787,6 +787,10 @@ void 
> qmp_migrate_set_capabilities(MigrationCapabilityStatusList *params,
>  s->enabled_capabilities[cap->value->capability] = cap->value->state;
>  }
>  
> +if (s->enabled_capabilities[MIGRATION_CAPABILITY_BLOCK_SHARED]) {
> +s->enabled_capabilities[MIGRATION_CAPABILITY_BLOCK_ENABLED] = true;
> +}
> +
>  if (migrate_postcopy_ram()) {
>  if (migrate_use_compression()) {
>  /* The decompression threads asynchronously write into RAM
> @@ -1214,9 +1218,6 @@ void qmp_migrate(const char *uri, bool has_blk, bool 
> blk,
>  MigrationParams params;
>  const char *p;
>  
> -params.blk = has_blk && blk;
> -params.shared = has_inc && inc;
> -
>  if (migration_is_setup_or_active(s->state) ||
>  s->state == MIGRATION_STATUS_CANCELLING ||
>  s->state == MIGRATION_STATUS_COLO) {
> @@ -1239,6 +1240,7 @@ void qmp_migrate(const char *uri, bool has_blk, bool 
> blk,
>  }
>  
>  if (has_inc && inc) {
> +migrate_set_block_enabled(s);
>  migrate_set_block_shared(s);
>  }
>  
> diff --git a/migration/savevm.c b/migration/savevm.c
> index 0c01988..102b11d 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -1233,8 +1233,6 @@ static int qemu_savevm_state(QEMUFile *f, Error **errp)
>  {
>  int ret;
>  MigrationParams params = {
> -.blk = 0,
> -.shared = 0
>  };
>  MigrationState *ms = migrate_init();
>  MigrationStatus status;
> -- 
> 2.9.3
> 
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK

[Qemu-devel] [Qemu-devel RFC v3 3/5] msf2: Add Smartfusion2 SPI controller

2017-04-28 Thread Subbaraya Sundeep

Modelled Microsemi's Smartfusion2 SPI controller.

Signed-off-by: Subbaraya Sundeep 
---
Hi Peter and Alistair,

I created two SPI controllers as per SoC spec 
in hw/arm/msf2_soc.c. I am assuming there has to be two
busses spi0 and spi1 one for each controller. In board file
(hw/arm/msf2_som.c) attached SPI flash to SPI0 controller.
I am not able to understand(from hw/ssi/xilinx_spips.c)
how to create two busses in hw/ssi/msf2_spi.c. 
Please help me here. Below is the output of info qtree:

(qemu) info qtree
bus: main-system-bus
  type System
  dev: msf2-soc, id ""
cpu-model = "cortex-m3"
  dev: msf2-spi, id ""
gpio-out "sysbus-irq" 2
mmio 40011000/0040
bus: spi0
  type SSI
  dev: msf2-spi, id ""
gpio-out "sysbus-irq" 2
mmio 40001000/0040
bus: spi0
  type SSI
  dev: s25sl12801, id ""
gpio-in "ssi-gpio-cs" 1
nonvolatile-cfg = 36863 (0x8fff)
spansion-cr1nv = 0 (0x0)
spansion-cr2nv = 1 (0x1)
spansion-cr3nv = 2 (0x2)
spansion-cr4nv = 16 (0x10)
drive = "mtd0"

Thanks,
Sundeep

 hw/ssi/Makefile.objs  |   1 +
 hw/ssi/msf2_spi.c | 373 ++
 include/hw/ssi/msf2_spi.h | 102 +
 3 files changed, 476 insertions(+)
 create mode 100644 hw/ssi/msf2_spi.c
 create mode 100644 include/hw/ssi/msf2_spi.h

diff --git a/hw/ssi/Makefile.objs b/hw/ssi/Makefile.objs
index 487add2..86445d7 100644
--- a/hw/ssi/Makefile.objs
+++ b/hw/ssi/Makefile.objs
@@ -4,6 +4,7 @@ common-obj-$(CONFIG_XILINX_SPI) += xilinx_spi.o
 common-obj-$(CONFIG_XILINX_SPIPS) += xilinx_spips.o
 common-obj-$(CONFIG_ASPEED_SOC) += aspeed_smc.o
 common-obj-$(CONFIG_STM32F2XX_SPI) += stm32f2xx_spi.o
+common-obj-$(CONFIG_MSF2) += msf2_spi.o
 
 obj-$(CONFIG_OMAP) += omap_spi.o
 obj-$(CONFIG_IMX) += imx_spi.o
diff --git a/hw/ssi/msf2_spi.c b/hw/ssi/msf2_spi.c
new file mode 100644
index 000..e7ffa21
--- /dev/null
+++ b/hw/ssi/msf2_spi.c
@@ -0,0 +1,373 @@
+/*
+ * SPI controller model of Microsemi SmartFusion2.
+ *
+ * Copyright (C) 2017 Subbaraya Sundeep 
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to 
deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING 
FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include "qemu/osdep.h"
+#include "hw/ssi/msf2_spi.h"
+#include "sysemu/sysemu.h"
+#include "qemu/log.h"
+
+#ifndef MSF2_SPI_ERR_DEBUG
+#define MSF2_SPI_ERR_DEBUG   0
+#endif
+
+#define DB_PRINT_L(lvl, fmt, args...) do { \
+if (MSF2_SPI_ERR_DEBUG >= lvl) { \
+qemu_log("%s: " fmt, __func__, ## args); \
+} \
+} while (0);
+
+#define DB_PRINT(fmt, args...) DB_PRINT_L(1, fmt, ## args)
+
+static void txfifo_reset(MSF2SpiState *s)
+{
+fifo32_reset(>tx_fifo);
+
+s->regs[R_SPI_STATUS] &= ~S_TXFIFOFUL;
+s->regs[R_SPI_STATUS] |= S_TXFIFOEMP;
+}
+
+static void rxfifo_reset(MSF2SpiState *s)
+{
+fifo32_reset(>rx_fifo);
+
+s->regs[R_SPI_STATUS] &= ~S_RXFIFOFUL;
+s->regs[R_SPI_STATUS] |= S_RXFIFOEMP;
+}
+
+static void set_fifodepth(MSF2SpiState *s)
+{
+int size = s->regs[R_SPI_DFSIZE] & FRAMESZ_MASK;
+
+if (0 <= size && size <= 8) {
+s->fifo_depth = 32;
+}
+if (9 <= size && size <= 16) {
+s->fifo_depth = 16;
+}
+if (17 <= size && size <= 32) {
+s->fifo_depth = 8;
+}
+}
+
+static void msf2_spi_do_reset(MSF2SpiState *s)
+{
+memset(s->regs, 0, sizeof s->regs);
+s->regs[R_SPI_CONTROL] = 0x8102;
+s->regs[R_SPI_DFSIZE] = 0x4;
+s->regs[R_SPI_STATUS] = 0x2440;
+s->regs[R_SPI_CLKGEN] = 0x7;
+s->regs[R_SPI_STAT8] = 0x7;
+s->regs[R_SPI_RIS] = 0x0;
+
+s->fifo_depth = 4;
+s->frame_count = 1;
+s->enabled = false;
+
+rxfifo_reset(s);
+txfifo_reset(s);
+}
+
+static void update_mis(MSF2SpiState *s)
+{
+uint32_t reg = s->regs[R_SPI_CONTROL];
+uint32_t tmp;
+
+/*
+ * form the Control register interrupt enable bits
+ * same as RIS, MIS and Interrupt clear

[Qemu-devel] [Qemu-devel RFC v3 2/5] msf2: Microsemi Smartfusion2 System Register block.

2017-04-28 Thread Subbaraya Sundeep

Added Sytem register block of Smartfusion2.
This block has PLL registers which are accessed by guest.

Signed-off-by: Subbaraya Sundeep 
---
 hw/misc/Makefile.objs |   1 +
 hw/misc/msf2_sysreg.c | 127 ++
 include/hw/misc/msf2_sysreg.h |  77 +
 3 files changed, 205 insertions(+)
 create mode 100644 hw/misc/msf2_sysreg.c
 create mode 100644 include/hw/misc/msf2_sysreg.h

diff --git a/hw/misc/Makefile.objs b/hw/misc/Makefile.objs
index c8b4893..aee53df 100644
--- a/hw/misc/Makefile.objs
+++ b/hw/misc/Makefile.objs
@@ -56,3 +56,4 @@ obj-$(CONFIG_EDU) += edu.o
 obj-$(CONFIG_HYPERV_TESTDEV) += hyperv_testdev.o
 obj-$(CONFIG_AUX) += auxbus.o
 obj-$(CONFIG_ASPEED_SOC) += aspeed_scu.o aspeed_sdmc.o
+obj-$(CONFIG_MSF2) += msf2_sysreg.o
diff --git a/hw/misc/msf2_sysreg.c b/hw/misc/msf2_sysreg.c
new file mode 100644
index 000..6386953
--- /dev/null
+++ b/hw/misc/msf2_sysreg.c
@@ -0,0 +1,127 @@
+/*
+ * System Register block model of Microsemi SmartFusion2.
+ *
+ * Copyright (c) 2017 Subbaraya Sundeep 
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, see .
+ */
+
+#include "qemu/osdep.h"
+#include "hw/misc/msf2_sysreg.h"
+#include "sysemu/sysemu.h"
+#include "qemu/log.h"
+
+#ifndef MSF2_SYSREG_ERR_DEBUG
+#define MSF2_SYSREG_ERR_DEBUG  0
+#endif
+
+#define DB_PRINT_L(lvl, fmt, args...) do { \
+if (MSF2_SYSREG_ERR_DEBUG >= lvl) { \
+qemu_log("%s: " fmt, __func__, ## args); \
+} \
+} while (0);
+
+#define DB_PRINT(fmt, args...) DB_PRINT_L(1, fmt, ## args)
+
+static void msf2_sysreg_reset(DeviceState *d)
+{
+MSF2SysregState *s = MSF2_SYSREG(d);
+
+DB_PRINT("RESET\n");
+
+s->regs[MSSDDR_PLL_STATUS_LOW_CR] = 0x02420041;
+s->regs[MSSDDR_FACC1_CR] = 0x0A482124;
+s->regs[MSSDDR_PLL_STATUS] = 0x3;
+}
+
+static uint64_t msf2_sysreg_read(void *opaque, hwaddr offset,
+unsigned size)
+{
+MSF2SysregState *s = opaque;
+offset /= 4;
+uint32_t ret = 0;
+
+if (offset < ARRAY_SIZE(s->regs)) {
+ret = s->regs[offset];
+DB_PRINT("addr: %08" HWADDR_PRIx " data: %08" PRIx32 "\n",
+offset * 4, ret);
+} else {
+DB_PRINT("addr: %08" HWADDR_PRIx " not valid\n", offset * 4);
+}
+
+return ret;
+}
+
+static void msf2_sysreg_write(void *opaque, hwaddr offset,
+  uint64_t val, unsigned size)
+{
+MSF2SysregState *s = (MSF2SysregState *)opaque;
+offset /= 4;
+
+DB_PRINT("addr: %08" HWADDR_PRIx " data: %08" PRIx64 "\n", offset * 4, 
val);
+
+switch (offset) {
+case MSSDDR_PLL_STATUS:
+break;
+
+default:
+if (offset < ARRAY_SIZE(s->regs)) {
+s->regs[offset] = val;
+}
+break;
+}
+}
+
+static const MemoryRegionOps sysreg_ops = {
+.read = msf2_sysreg_read,
+.write = msf2_sysreg_write,
+.endianness = DEVICE_NATIVE_ENDIAN,
+};
+
+static void msf2_sysreg_init(Object *obj)
+{
+MSF2SysregState *s = MSF2_SYSREG(obj);
+
+memory_region_init_io(>iomem, obj, _ops, s, TYPE_MSF2_SYSREG,
+  MSF2_SYSREG_MMIO_SIZE);
+sysbus_init_mmio(SYS_BUS_DEVICE(obj), >iomem);
+}
+
+static const VMStateDescription vmstate_msf2_sysreg = {
+.name = TYPE_MSF2_SYSREG,
+.version_id = 1,
+.minimum_version_id = 1,
+.fields = (VMStateField[]) {
+VMSTATE_UINT32_ARRAY(regs, MSF2SysregState, MSF2_SYSREG_NUM_REGS),
+VMSTATE_END_OF_LIST()
+}
+};
+
+static void msf2_sysreg_class_init(ObjectClass *klass, void *data)
+{
+DeviceClass *dc = DEVICE_CLASS(klass);
+
+dc->vmsd = _msf2_sysreg;
+dc->reset = msf2_sysreg_reset;
+}
+
+static const TypeInfo msf2_sysreg_info = {
+.name  = TYPE_MSF2_SYSREG,
+.parent = TYPE_SYS_BUS_DEVICE,
+.class_init = msf2_sysreg_class_init,
+.instance_size  = sizeof(MSF2SysregState),
+.instance_init = msf2_sysreg_init,
+};
+
+static void msf2_sysreg_register_types(void)
+{
+type_register_static(_sysreg_info);
+}
+
+type_init(msf2_sysreg_register_types)
diff --git a/include/hw/misc/msf2_sysreg.h b/include/hw/misc/msf2_sysreg.h
new file mode 100644
index 000..03c5773
--- /dev/null
+++ b/include/hw/misc/msf2_sysreg.h
@@ -0,0 +1,77 @@
+/*
+ * Microsemi SmartFusion2 SYSREG
+ *
+ * Copyright (c) 2017 Subbaraya Sundeep 
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to 
deal
+ * in the Software without restriction, including without limitation the rights

[Qemu-devel] [Qemu-devel RFC v3 0/5] Add support for Smartfusion2 SoC

2017-04-28 Thread Subbaraya Sundeep

Hi Qemu-devel,

I am trying to add Smartfusion2 SoC.
SoC is from Microsemi and System on Module(SOM)
board is from Emcraft systems. Smartfusion2 has hardened
Microcontroller(Cortex-M3)based Sub System and FPGA fabric.
At the moment only system timer, sysreg and SPI
controller are modelled.

Testing:
./arm-softmmu/qemu-system-arm -M smartfusion2-som -serial mon:stdio \
-kernel u-boot.bin -display none -drive file=spi.bin,if=mtd,format=raw

U-boot is from Emcraft with modified SPI driver not to use PDMA.
Linux is 4.5 linux with Smartfusion2 SoC dts and clocksource 
driver added by myself @
https://github.com/Subbaraya-Sundeep/linux.git

Baremetal elfs from Microsemi Softconsole IDE are also working.

Changes from v2:
Added SoC file and board file.
Changes from v1:
Added SPI controller.

Thanks,
Sundeep


Subbaraya Sundeep (5):
  msf2: Add Smartfusion2 System timer
  msf2: Microsemi Smartfusion2 System Register block.
  msf2: Add Smartfusion2 SPI controller
  msf2: Add Smartfusion2 SoC.
  msf2: Add Emcraft's Smartfusion2 SOM kit.

 default-configs/arm-softmmu.mak |   1 +
 hw/arm/Makefile.objs|   3 +-
 hw/arm/msf2_soc.c   | 194 +
 hw/arm/msf2_som.c   |  68 
 hw/misc/Makefile.objs   |   1 +
 hw/misc/msf2_sysreg.c   | 127 ++
 hw/ssi/Makefile.objs|   1 +
 hw/ssi/msf2_spi.c   | 373 
 hw/timer/Makefile.objs  |   1 +
 hw/timer/msf2_timer.c   | 250 +++
 include/hw/arm/msf2_soc.h   |  62 +++
 include/hw/misc/msf2_sysreg.h   |  77 +
 include/hw/ssi/msf2_spi.h   | 102 +++
 include/hw/timer/msf2_timer.h   |  82 +
 14 files changed, 1341 insertions(+), 1 deletion(-)
 create mode 100644 hw/arm/msf2_soc.c
 create mode 100644 hw/arm/msf2_som.c
 create mode 100644 hw/misc/msf2_sysreg.c
 create mode 100644 hw/ssi/msf2_spi.c
 create mode 100644 hw/timer/msf2_timer.c
 create mode 100644 include/hw/arm/msf2_soc.h
 create mode 100644 include/hw/misc/msf2_sysreg.h
 create mode 100644 include/hw/ssi/msf2_spi.h
 create mode 100644 include/hw/timer/msf2_timer.h

-- 
2.5.0

[Qemu-devel] [Qemu-devel RFC v3 1/5] msf2: Add Smartfusion2 System timer

2017-04-28 Thread Subbaraya Sundeep

Modelled System Timer in Microsemi's Smartfusion2 Soc.
Timer has two 32bit down counters and two interrupts.

Signed-off-by: Subbaraya Sundeep 
---
 hw/timer/Makefile.objs|   1 +
 hw/timer/msf2_timer.c | 250 ++
 include/hw/timer/msf2_timer.h |  82 ++
 3 files changed, 333 insertions(+)
 create mode 100644 hw/timer/msf2_timer.c
 create mode 100644 include/hw/timer/msf2_timer.h

diff --git a/hw/timer/Makefile.objs b/hw/timer/Makefile.objs
index dd6f27e..0bdf1e1 100644
--- a/hw/timer/Makefile.objs
+++ b/hw/timer/Makefile.objs
@@ -41,3 +41,4 @@ common-obj-$(CONFIG_STM32F2XX_TIMER) += stm32f2xx_timer.o
 common-obj-$(CONFIG_ASPEED_SOC) += aspeed_timer.o
 
 common-obj-$(CONFIG_SUN4V_RTC) += sun4v-rtc.o
+common-obj-$(CONFIG_MSF2) += msf2_timer.o
diff --git a/hw/timer/msf2_timer.c b/hw/timer/msf2_timer.c
new file mode 100644
index 000..d1dbde5
--- /dev/null
+++ b/hw/timer/msf2_timer.c
@@ -0,0 +1,250 @@
+/*
+ * Timer block model of Microsemi SmartFusion2.
+ *
+ * Copyright (c) 2017 Subbaraya Sundeep .
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to 
deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING 
FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include "qemu/osdep.h"
+#include "hw/timer/msf2_timer.h"
+#include "hw/sysbus.h"
+#include "qemu/log.h"
+#include "qemu/main-loop.h"
+
+#ifndef MSF2_TIMER_ERR_DEBUG
+#define MSF2_TIMER_ERR_DEBUG  0
+#endif
+
+#define DB_PRINT_L(lvl, fmt, args...) do { \
+if (MSF2_TIMER_ERR_DEBUG >= lvl) { \
+qemu_log("%s: " fmt, __func__, ## args); \
+} \
+} while (0);
+
+#define DB_PRINT(fmt, args...) DB_PRINT_L(1, fmt, ## args)
+
+static void timer_update_irq(struct Msf2Timer *st)
+{
+bool isr, ier;
+
+isr = !!(st->regs[R_TIM_RIS] & TIMER_RIS_ACK);
+ier = !!(st->regs[R_TIM_CTRL] & TIMER_CTRL_INTR);
+
+qemu_set_irq(st->irq, (ier && isr));
+}
+
+static void timer_update(struct Msf2Timer *st)
+{
+uint64_t count;
+
+DB_PRINT("timer=%d\n", st->nr);
+
+if (!(st->regs[R_TIM_CTRL] & TIMER_CTRL_ENBL)) {
+ptimer_stop(st->ptimer);
+return;
+}
+
+count = st->regs[R_TIM_LOADVAL];
+ptimer_set_limit(st->ptimer, count, 1);
+ptimer_run(st->ptimer, 1);
+}
+
+static uint64_t
+timer_read(void *opaque, hwaddr addr, unsigned int size)
+{
+MSF2TimerState *t = opaque;
+struct Msf2Timer *st;
+uint32_t r = 0;
+unsigned int timer = 0;
+int isr;
+int ier;
+
+addr >>= 2;
+/*
+ * Two independent timers has same base address.
+ * Based on addr passed figure out which timer is being used.
+ */
+if (addr >= R_TIM1_MAX) {
+timer = 1;
+addr -= R_TIM1_MAX;
+}
+
+st = >timers[timer];
+
+switch (addr) {
+case R_TIM_VAL:
+r = ptimer_get_count(st->ptimer);
+DB_PRINT("msf2_timer t=%d read counter=%x\n", timer, r);
+break;
+
+case R_TIM_MIS:
+isr = !!(st->regs[R_TIM_RIS] & TIMER_RIS_ACK);
+ier = !!(st->regs[R_TIM_CTRL] & TIMER_CTRL_INTR);
+r = ier && isr;
+break;
+
+default:
+if (addr < ARRAY_SIZE(st->regs)) {
+r = st->regs[addr];
+}
+break;
+}
+DB_PRINT("timer=%d %lu=%x\n", timer, addr * 4, r);
+return r;
+}
+
+static void
+timer_write(void *opaque, hwaddr addr,
+uint64_t val64, unsigned int size)
+{
+MSF2TimerState *t = opaque;
+struct Msf2Timer *st;
+unsigned int timer = 0;
+uint32_t value = val64;
+
+addr >>= 2;
+/*
+ * Two independent timers has same base address.
+ * Based on addr passed figure out which timer is being used.
+ */
+if (addr >= R_TIM1_MAX) {
+timer = 1;
+addr -= R_TIM1_MAX;
+}
+
+st = >timers[timer];
+
+DB_PRINT("addr=%lu val=%x (timer=%d)\n", addr * 4, value, timer);
+
+switch (addr) {
+case R_TIM_CTRL:
+st->regs[R_TIM_CTRL] = value;
+timer_update(st);
+break;
+
+case

Re: [Qemu-devel] [PATCH v5 10/19] target/i386: introduce gen_jr helper to generate lookup_and_goto_ptr

2017-04-28 Thread Alex Bennée


Richard Henderson  writes:

> From: "Emilio G. Cota" 
>
> This helper will be used by subsequent changes.
>
> Signed-off-by: Emilio G. Cota 
> Message-Id: <1493263764-18657-9-git-send-email-c...@braap.org>
> Signed-off-by: Richard Henderson 

Reviewed-by: Alex Bennée 

> ---
>  target/i386/translate.c | 25 -
>  1 file changed, 24 insertions(+), 1 deletion(-)
>
> diff --git a/target/i386/translate.c b/target/i386/translate.c
> index 1d1372f..f0e48dc 100644
> --- a/target/i386/translate.c
> +++ b/target/i386/translate.c
> @@ -141,6 +141,7 @@ typedef struct DisasContext {
>  } DisasContext;
>
>  static void gen_eob(DisasContext *s);
> +static void gen_jr(DisasContext *s, TCGv dest);
>  static void gen_jmp(DisasContext *s, target_ulong eip);
>  static void gen_jmp_tb(DisasContext *s, target_ulong eip, int tb_num);
>  static void gen_op(DisasContext *s1, int op, TCGMemOp ot, int d);
> @@ -2509,7 +2510,8 @@ static void gen_bnd_jmp(DisasContext *s)
> If INHIBIT, set HF_INHIBIT_IRQ_MASK if it isn't already set.
> If RECHECK_TF, emit a rechecking helper for #DB, ignoring the state of
> S->TF.  This is used by the syscall/sysret insns.  */
> -static void gen_eob_worker(DisasContext *s, bool inhibit, bool recheck_tf)
> +static void
> +do_gen_eob_worker(DisasContext *s, bool inhibit, bool recheck_tf, TCGv jr)
>  {
>  gen_update_cc_op(s);
>
> @@ -2530,12 +2532,27 @@ static void gen_eob_worker(DisasContext *s, bool 
> inhibit, bool recheck_tf)
>  tcg_gen_exit_tb(0);
>  } else if (s->tf) {
>  gen_helper_single_step(cpu_env);
> +} else if (!TCGV_IS_UNUSED(jr)) {
> +TCGv vaddr = tcg_temp_new();
> +
> +tcg_gen_add_tl(vaddr, jr, cpu_seg_base[R_CS]);
> +tcg_gen_lookup_and_goto_ptr(vaddr);
> +tcg_temp_free(vaddr);
>  } else {
>  tcg_gen_exit_tb(0);
>  }
>  s->is_jmp = DISAS_TB_JUMP;
>  }
>
> +static inline void
> +gen_eob_worker(DisasContext *s, bool inhibit, bool recheck_tf)
> +{
> +TCGv unused;
> +
> +TCGV_UNUSED(unused);
> +do_gen_eob_worker(s, inhibit, recheck_tf, unused);
> +}
> +
>  /* End of block.
> If INHIBIT, set HF_INHIBIT_IRQ_MASK if it isn't already set.  */
>  static void gen_eob_inhibit_irq(DisasContext *s, bool inhibit)
> @@ -2549,6 +2566,12 @@ static void gen_eob(DisasContext *s)
>  gen_eob_worker(s, false, false);
>  }
>
> +/* Jump to register */
> +static void gen_jr(DisasContext *s, TCGv dest)
> +{
> +do_gen_eob_worker(s, false, false, dest);
> +}
> +
>  /* generate a jump to eip. No segment change must happen before as a
> direct call to the next block may occur */
>  static void gen_jmp_tb(DisasContext *s, target_ulong eip, int tb_num)


--
Alex Bennée

Re: [Qemu-devel] [PATCH] progress: Show current progress on SIGINFO

2017-04-28 Thread Max Reitz

On 08.02.2017 00:57, Max Reitz wrote:
> Currently we only print progress information on retrieval of SIGUSR1.
> Some systems have a dedicated SIGINFO for this, however, so it should be
> handled appropriately if it is available.
> 
> Buglink: https://bugs.launchpad.net/qemu/+bug/1662468
> Signed-off-by: Max Reitz 
> ---
> Can anyone test this on a BSD system? O:-)
> ---
>  util/qemu-progress.c | 3 +++
>  qemu-img.texi| 3 ++-
>  2 files changed, 5 insertions(+), 1 deletion(-)

Applied to my block branch.

(https://github.com/XanClic/qemu/commits/block)

Max



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH 0/2] iotests: trivial fixups

2017-04-28 Thread Max Reitz

On 27.04.2017 22:50, John Snow wrote:
> Excised from an old patchset from 2015, just a couple of touchups.
> 
> John Snow (2):
>   iotests: clarify help text
>   iotests: fix exclusion option
> 
>  tests/qemu-iotests/common | 11 ---
>  1 file changed, 8 insertions(+), 3 deletions(-)

Thanks, applied to my block tree:

https://github.com/XanClic/qemu/commits/block

Max



signature.asc
Description: OpenPGP digital signature

1 2 3 >

1 - 100 of 278 matches

Mail list logo