Re: [Qemu-devel] [PATCH] tci: Add implementation for INDEX_op_ld16u_i64

2019-04-10 Thread Thomas Huth
On 10/04/2019 21.48, Stefan Weil wrote:
> This fixes "make check-tcg" on a Debian x86_64 host.
> 
> Signed-off-by: Stefan Weil 
> ---
>  tcg/tci.c | 15 +++
>  1 file changed, 15 insertions(+)
> 
> diff --git a/tcg/tci.c b/tcg/tci.c
> index 33edca1903..a6208653e8 100644
> --- a/tcg/tci.c
> +++ b/tcg/tci.c
> @@ -127,6 +127,12 @@ static void tci_write_reg8(tcg_target_ulong *regs, 
> TCGReg index, uint8_t value)
>  tci_write_reg(regs, index, value);
>  }
>  
> +static void
> +tci_write_reg16(tcg_target_ulong *regs, TCGReg index, uint16_t value)
> +{
> +tci_write_reg(regs, index, value);
> +}
> +
>  static void
>  tci_write_reg32(tcg_target_ulong *regs, TCGReg index, uint32_t value)
>  {
> @@ -585,6 +591,8 @@ uintptr_t tcg_qemu_tb_exec(CPUArchState *env, uint8_t 
> *tb_ptr)
>  tci_write_reg8(regs, t0, *(uint8_t *)(t1 + t2));
>  break;
>  case INDEX_op_ld8s_i32:
> +TODO();
> +break;
>  case INDEX_op_ld16u_i32:
>  TODO();
>  break;
> @@ -854,7 +862,14 @@ uintptr_t tcg_qemu_tb_exec(CPUArchState *env, uint8_t 
> *tb_ptr)
>  tci_write_reg8(regs, t0, *(uint8_t *)(t1 + t2));
>  break;
>  case INDEX_op_ld8s_i64:
> +TODO();
> +break;
>  case INDEX_op_ld16u_i64:
> +t0 = *tb_ptr++;
> +t1 = tci_read_r(regs, _ptr);
> +t2 = tci_read_s32(_ptr);
> +tci_write_reg16(regs, t0, *(uint16_t *)(t1 + t2));
> +break;
>  case INDEX_op_ld16s_i64:
>  TODO();
>  break;
> 

Works for me, too:

 https://gitlab.com/huth/qemu/-/jobs/194622472

Tested-by: Thomas Huth 



Re: [Qemu-devel] QEMU Logo question

2019-04-10 Thread Thomas Huth
On 10/04/2019 19.51, Barajas, Felipe wrote:
> Hi,
> 
> I am trying to understand who in the QEMU community can give me permission to 
> use the QEMU logo in a whitepaper.
> I am an application engineer at Intel and I have prepared some benchmarks 
> using QEMU as a solution accelerated by Intel hardware (CPUs, SSDs, etc)
> I would like to use the QEMU logo in this white paper.
> 
> Do you know who in the community would have authority to give me such 
> permission ?

The logo has been designed by Benoît Canet, see:

 http://lists.gnu.org/archive/html/qemu-devel/2012-02/msg02865.html

It is licensed under the terms of CC BY 3.0 :

 http://creativecommons.org/licenses/by/3.0/

 HTH,
  Thomas



Re: [Qemu-devel] [PATCH for-4.0] hw/i386/pc: Fix crash when hot-plugging nvdimm on older machine types

2019-04-10 Thread Thomas Huth
 Hi,

On 11/04/2019 03.56, Wei Yang wrote:
> On Mon, Apr 08, 2019 at 09:29:11PM +, Wei Yang wrote:

 Thomas,

 Thanks for pointing this out, while I have some different idea on how to 
 fix
 this.

 The reason of the core dump is errp already been set in
 hotplug_handler_pre_plug(), and this function check acpi hotplug 
 capability.
 The order of this check is correct, while we should  return when errp is 
 set
 in hotplug_handler_pre_plug().

 I got a fix like this, which I have tested and looks good to me.


 diff --git a/hw/i386/pc.c b/hw/i386/pc.c
 index 6077d27361..b11f3b15c1 100644
 --- a/hw/i386/pc.c
 +++ b/hw/i386/pc.c
 @@ -2091,6 +2091,9 @@ static void pc_memory_pre_plug(HotplugHandler 
 *hotplug_dev, DeviceState *dev,
  }
  
  hotplug_handler_pre_plug(pcms->acpi_dev, dev, errp);
 +if (*errp) {
 +return;
 +}
>>>
>>> Not sure, but I think you can not rely on the fact that the caller set
>>> *errp = NULL already... that's why it is more common to use a local_err
>>> variable and error_propagate() for such cases (which is what I did in my
>>> patch).
>>>
>>
>> Ok, that's fine for me.
>>
>>> Also, why don't you want the "nvdimm is not enabled: missing 'nvdimm' in
>>> '-M'" check to be done first?
>>>
>>
>> Because this function pc_memory_pre_plug() will be called not only when
>> nvdimm is hot-plugged but also dimm is hot-plugged. And
>> hotplug_handler_pre_plug() here is to check the acpi(if it has) hot-plug
>> capability.
>>
>> So the check in pc_memory_pre_plug() is from generic to specific: 
>>1. Do we have capability to hot-plug?
>>2. If the device is nvdimm, do we enabled nvdimm?
>>
> 
> Thomas
> 
> Do you think this is a reasonable explanation?

Fine for me, I don't mind either way as long as the crash is fixed. Feel
free to send a patch that restores the desired order.

 Thomas



Re: [Qemu-devel] [PATCH for-4.0-maybe] device_tree: Fix integer overflowing in load_device_tree()

2019-04-10 Thread Markus Armbruster
Philippe Mathieu-Daudé  writes:

> On 4/10/19 8:34 AM, Alistair Francis wrote:
>> On Tue, Apr 9, 2019 at 10:59 PM Markus Armbruster  wrote:
>>> Philippe Mathieu-Daudé  writes:
 On 4/10/19 7:28 AM, Markus Armbruster wrote:
> Philippe Mathieu-Daudé  writes:
[...]
>> BTW how did you figure that out?
>
> Downstream handling of upstream commit da885fe1ee8 led me to the
> function.  I spotted dt_size = get_image_size(filename_path).
> Experience has taught me to check the left hand side's type.  Bad.  Then
> I saw how dt_size gets increased.  Worse.

 So you genuinely neglected to mention Kurtis Miller then :)
>>>
>>> Explanation, not excuse: the only occurence of the name in my downstream
>>> reading was a two-liner BZ comment, which I totally missed in my haste
>>> to give the fix a chance to make 4.0.  I certainly didn't mean to
>>> deprive him of credit!
>
> My English teacher explained me neglected sounds like a
> reprimand/reproach (as in negligence), sorry I didn't mean to be rude
> here, I wanted to say something like "Peter remarked you did gave
> credits to the first one who reported this issue, but since you figured
> this bug via your own way, the omission was not intentional then."

Don't worry, my social interactions teacher taught me to assume good
intent ;)

[...]



Re: [Qemu-devel] [PATCH v4 1/6] vfio-ccw: make it safe to access channel programs

2019-04-10 Thread Eric Farman




On 4/9/19 7:34 PM, Halil Pasic wrote:

On Mon, 8 Apr 2019 19:07:47 +0200
Cornelia Huck  wrote:


On Mon, 8 Apr 2019 13:02:12 -0400
Farhan Ali  wrote:


On 03/01/2019 04:38 AM, Cornelia Huck wrote:

When we get a solicited interrupt, the start function may have
been cleared by a csch, but we still have a channel program
structure allocated. Make it safe to call the cp accessors in
any case, so we can call them unconditionally.

While at it, also make sure that functions called from other parts
of the code return gracefully if the channel program structure
has not been initialized (even though that is a bug in the caller).

Reviewed-by: Eric Farman
Signed-off-by: Cornelia Huck
---


Hi Connie,

My series of fixes for vfio-ccw depends on this patch as I would like to
call cp_free unconditionally :) (I had developed my code on top of your
patches).

Could we pick this patch up as well when/if you pick up my patch series?
I am in the process of sending out a v2.

Regarding this patch we could merge it as a stand alone patch, separate
from this series. And also the patch LGTM

Reviewed-by: Farhan Ali 


Actually, I wanted to ask how people felt about merging this whole
series for the next release :) It would be one thing less on my plate...



Sorry I was not able to spend any significant amount of time on this
lately.

Gave the combined set (this + Farhans fio-ccw fixes for kernel
stacktraces v2) it a bit of smoke testing after some minor adjustments
to make it compile:

--- a/drivers/s390/cio/vfio_ccw_ops.c
+++ b/drivers/s390/cio/vfio_ccw_ops.c
@@ -13,6 +13,7 @@
  #include 
  #include 
  #include 
+#include 
  
  #include "vfio_ccw_private.h"





Hrm...  Taking today's master, and the two series you mention (slight 
adjustment to apply patch 3 of Connie's series, because part of it was 
split out a few weeks ago), and I don't encounter this.  Tried switching 
between SLUB/SLAB, but still compiles fine.



I'm just running fio on a pass-through DASD and on some virto-blk disks
in parallel. My QEMU is today's vfio-ccw-caps from your repo.

I see stuff like this:
qemu-git: vfio-ccw: wirte I/O region failed with errno=16[1811/7332/0 iops] 
[eta 26m:34s]


Without knowing what the I/O was that failed, this is a guessing game. 
But I encountered something similar just now running fio.


qemu:
2019-04-11T02:06:09.524838Z qemu-system-s390x: vfio-ccw: wirte I/O 
region failed with errno=16


guest:
[  422.931458] dasd-eckd 0.0.ca8d: An error occurred in the DASD device 
driver, reason=14 730bbe9a
[  553.741554] dasd-eckd 0.0.ca8e: An error occurred in the DASD device 
driver, reason=14 e59b81da
[  554.761552] dasd-eckd 0.0.ca8d: An error occurred in the DASD device 
driver, reason=14 cdf4fb4e
[  554.921518] dasd-eckd 0.0.ca8b: An error occurred in the DASD device 
driver, reason=14 68775082
[  555.271556] dasd-eckd 0.0.ca8d: ERP cdf4fb4e has run out of 
retries and failed

[  555.271786] dasd(eckd): I/O status report for device 0.0.ca8d:
   dasd(eckd): in req: cdf4fb4e CC:00 FC:00 AC:00 
SC:00 DS:00 CS:00 RC:-16

   dasd(eckd): device 0.0.ca8d: Failing CCW:   (null)
   dasd(eckd): SORRY - NO VALID SENSE AVAILABLE
[  555.272214] dasd(eckd): Related CP in req: cdf4fb4e
   dasd(eckd): CCW 6434c30f: 0340  DAT:
   dasd(eckd): CCW 7a65f7e0: 0800 70E5B700 DAT:
[  555.272508] dasd(eckd):..


From the associated I/O, I think this is fixed by a series I am nearly 
ready to send for review.  I'll try again with those fixes on top of the 
two series here, and report back.



[Thread 0x3ff75890910 (LWP 43803) exited]/7932KB/0KB /s] [1930/7932/0 iops] 
[eta 26m:33s]
[Thread 0x3ff6b7b7910 (LWP 43800) exited]/8030KB/0KB /s] [2031/8030/0 iops] 
[eta 26m:32s]
dasd-eckd 0.0.1234: An error occurred in the DASD device driver, reason=14 
caa27abe
INFO: task kworker/u6:1:26 blocked for more than 122 seconds.ps] [eta 
23m:26s]eta 23m:25s]
   Not tainted 5.1.0-rc3-00217-g6ab18dc #598
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kworker/u6:1D026  2 0x
Workqueue: writeback wb_workfn (flush-94:0)
Call Trace:
([<00ae23f2>] __schedule+0x4fa/0xc98)
  [<00ae2bda>] schedule+0x4a/0xb0
  [<001b30ec>] io_schedule+0x2c/0x50
  [<0071cc9c>] blk_mq_get_tag+0x1bc/0x310
  [<0071571c>] blk_mq_get_request+0x1a4/0x4a8
  [<00719d38>] blk_mq_make_request+0x100/0x728
  [<0070aa0a>] generic_make_request+0x26a/0x478
  [<0070ac76>] submit_bio+0x5e/0x178
  [<004cfa2c>] ext4_io_submit+0x74/0x88
  [<004cfd32>] ext4_bio_write_page+0x2d2/0x4c8
  [<004aa5b4>] mpage_submit_page+0x74/0xa8
  [<004aa676>] mpage_process_page_bufs+0x8e/0x1b8
  [<004aa9bc>] mpage_prepare_extent_to_map+0x21c/0x390
  [<004b063c>] ext4_writepages+0x4bc/0x11a0
  

Re: [Qemu-devel] [PATCH for-4.0] hw/i386/pc: Fix crash when hot-plugging nvdimm on older machine types

2019-04-10 Thread Wei Yang
On Mon, Apr 08, 2019 at 09:29:11PM +, Wei Yang wrote:
>>> 
>>> Thomas,
>>> 
>>> Thanks for pointing this out, while I have some different idea on how to fix
>>> this.
>>> 
>>> The reason of the core dump is errp already been set in
>>> hotplug_handler_pre_plug(), and this function check acpi hotplug capability.
>>> The order of this check is correct, while we should  return when errp is set
>>> in hotplug_handler_pre_plug().
>>> 
>>> I got a fix like this, which I have tested and looks good to me.
>>> 
>>> 
>>> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
>>> index 6077d27361..b11f3b15c1 100644
>>> --- a/hw/i386/pc.c
>>> +++ b/hw/i386/pc.c
>>> @@ -2091,6 +2091,9 @@ static void pc_memory_pre_plug(HotplugHandler 
>>> *hotplug_dev, DeviceState *dev,
>>>  }
>>>  
>>>  hotplug_handler_pre_plug(pcms->acpi_dev, dev, errp);
>>> +if (*errp) {
>>> +return;
>>> +}
>>
>>Not sure, but I think you can not rely on the fact that the caller set
>>*errp = NULL already... that's why it is more common to use a local_err
>>variable and error_propagate() for such cases (which is what I did in my
>>patch).
>>
>
>Ok, that's fine for me.
>
>>Also, why don't you want the "nvdimm is not enabled: missing 'nvdimm' in
>>'-M'" check to be done first?
>>
>
>Because this function pc_memory_pre_plug() will be called not only when
>nvdimm is hot-plugged but also dimm is hot-plugged. And
>hotplug_handler_pre_plug() here is to check the acpi(if it has) hot-plug
>capability.
>
>So the check in pc_memory_pre_plug() is from generic to specific: 
>1. Do we have capability to hot-plug?
>2. If the device is nvdimm, do we enabled nvdimm?
>

Thomas

Do you think this is a reasonable explanation?

-- 
Wei Yang
Help you, Help me



Re: [Qemu-devel] [PATCH for-4.1] q35: acpi: do not create dummy MCFG table

2019-04-10 Thread Wei Yang
On Wed, Apr 10, 2019 at 05:01:50PM +0200, Igor Mammedov wrote:
>On Wed, 10 Apr 2019 22:27:56 +0800
>Wei Yang  wrote:
>
>[...]
>> >@@ -2411,19 +2410,7 @@ build_mcfg_q35(GArray *table_data, BIOSLinker 
>> >*linker, AcpiMcfgInfo *info)
>> > mcfg->allocation[0].start_bus_number = 0;
>> > mcfg->allocation[0].end_bus_number = PCIE_MMCFG_BUS(info->mcfg_size - 
>> > 1);
>> > 
>> >-/* MCFG is used for ECAM which can be enabled or disabled by guest.  
>> 
>> I want to cnfirm what is "enabled or disabled by guest" here.
>
>Firmware theoretically during PCI initialization may disable ECAM support
>and that's when we do no need MCFG. In practice that's not happening
>(SeaBIOS or UEFI) but we in case there is out there a firmware that does
>disable ECAM we do not generate MCFG.
>
>Note:
>ACPI tables generated twice, 1st when QEMU starts and the second time
>when firmware accesses fwcfg to read blobs for the 1st time.
>The later happens after PCI subsystem was initialized by firmware.
>At that time we know if ECAM was enabled or not.
>

That's much clear, thanks :-)

So this is the guest BIOS instead of guest kernel who may disable/enable it.

>> If we don't reserve mcfg and "guest" enable mcfg during running, the ACPI
>> table size changed. But the destination still has the original table size,
>> since destination "guest" keep sleep during this period.
>> 
>> Now the migration would face table size difference
>
>with commit a1666142db we do not care as all the tables created on
>source will be migrated to destination as is overwriting whatever blobs
>destination created on startup.
>
>> and break migration?
>nope,
>
>to help you figure out why it works
>look at what following git commits did:
>  git log c8d6f66ae7..a1666142db
>and pay attention to 'used_length'
>

To be honest, this is what I feel confused in your previous reply.

First I want to confirm both fields in RAMBlock affects the migration:

* used_length
* max_length

Both of them should be the same on both source/destination, otherwise the
migration would fail.

Then I thought the migration would be broken if source/destination has
different knowledge about acpi table size. Because this will introduce
different value of used_length, even we have resizable MemoryRegion.

The 1st time ACPI generation flow:

acpi_add_rom_blob
rom_add_blob
rom_set_mr
memory_region_init_resizable_ram
qemu_ram_alloc_resizable
new_block->used_length = size
new_block->max_length = max_size

The 2nd time ACPI generation flow:

acpi_ram_update
memory_regioin_ram_resize
qemu_ram_resize
block->used_length = new_size

The max_length is always the same, while used_length would be changed to the
actual table_blob size.

In case source/destination has different knowledge about acpi table size, the
table_blob size(even after aligned) could be different.

This is why I thought there is still some chance to break migration after
resizable MemoryRegion.

Do I miss something?

-- 
Wei Yang
Help you, Help me



Re: [Qemu-devel] [PATCH] qemu-img: fix .hx and .texi disparity

2019-04-10 Thread John Snow



On 4/10/19 2:22 PM, Eric Blake wrote:
> On 4/10/19 12:37 PM, John Snow wrote:
>>
>>
>> On 4/10/19 1:39 AM, Markus Armbruster wrote:
>>> John Snow  writes:
>>>
 It turns out that having options listed in three places continues to be
 a bad idea. I'm still toying with the idea of an improved infrastructure
 here, but in the meantime, another bandaid.

> 
>>>
>>> Nice to have in 4.0.
>>>
>>> Reviewed-by: Markus Armbruster 
>>>
>>
>> If it can go in for 4.0, who stages it? Kevin?
> 
> At this point, we've missed -rc3; the fix on its own is not a
> release-blocker, but if something else comes up that requires -rc4 for
> an actual release blocker, we can sort out someone sending a pull
> request for this one at the same time. (I'm aware of a couple of iotest
> fixups in the same boat... Guess it's time to start updating the wiki to
> point to email threads where we document 'wish list' patches)
> 

OK. I also have... a pretty major bitmaps documentation rewrite which I
probably won't have ready in time for a theoretical rc4 either, so I
guess I'll need to include it for 4.0.1.

I didn't realize how badly out of date the bitmaps.rst file had gotten.
I'll keep hammering away at it.

--js



Re: [Qemu-devel] [RFC-PATCH] Introducing virtio-example device.

2019-04-10 Thread Michael S. Tsirkin
On Wed, Apr 10, 2019 at 08:15:32PM +0100, Stefan Hajnoczi wrote:
> On Wed, Apr 10, 2019 at 4:45 PM Yoni Bettan  wrote:
> > On 4/9/19 4:17 PM, Stefan Hajnoczi wrote:
> > > On Mon, Apr 01, 2019 at 02:18:43PM +0300, Yoni Bettan wrote:
> > The final purpose is to have:
> >
> > 1. device specification
> >
> > 2. device implementation
> >
> > 3. device driver
> >
> > 4. blog
> >
> > maybe I should have written it at the beginning, this is not the entire
> > project but it is its start.
> 
> The way I'd design VIRTIO devices without prior knowledge is:
> 
> 1. Learn the VIRTIO device model.  Understand the concepts in VIRTIO.
> <-- this is hard today, there's not much good documentation

Best doc is still probably Rusty's whitepaper. It only covers 0.X
spec so somewhat outdated but it does explain the concepts I think.

> 2. Design an initial version of the device spec.  Mostly configuration
> layout, virtqueues, and request structs.  Not much text is necessary
> at this point, but it's critical for thinking through features before
> implementation.
> 
> 3. Implement guest driver and device emulation.
> 
> 4. Iterate on spec and implementation until it's functionally complete.
> 
> 5. Submit the spec to the VIRTIO Technical Committee.
> 
> 6. Submit driver and device emulation patches.  They can be merged
> when the spec is approved/close to approved.
> 
> Are you jumping to #3?  This is likely to lead to poor quality
> implementations and specs because the fundamental VIRTIO concepts
> weren't understood.
> 
> If the point is to educate others and/or do it "the right way", then I
> would really avoid hacking around without first doing the other steps.
> 
> Stefan



Re: [Qemu-devel] [PATCH] hostmem-file: warn when memory-backend-file, share=on and in incoming migration

2019-04-10 Thread Catherine Ho
Hi Dr. David

On Thu, 11 Apr 2019 at 00:57, Dr. David Alan Gilbert 
wrote:

> * Catherine Ho (catherine.h...@gmail.com) wrote:
> > Hi Dr. David
> >
> > On Wed, 10 Apr 2019 at 22:59, Dr. David Alan Gilbert <
> dgilb...@redhat.com>
> > wrote:
> >
> > > * Catherine Ho (catherine.h...@gmail.com) wrote:
> > > > Hi Igor
> > > >
> > > >
> > > > On Mon, 8 Apr 2019 at 18:35, Igor Mammedov 
> wrote:
> > > >
> > > > > On Sun,  7 Apr 2019 22:19:05 -0400
> > > > > Catherine Ho  wrote:
> > > > >
> > > > > > Currently it is not forbidden to use "-object
> > > > > memory-backend-file,share=on"
> > > > > > and together with "-incoming". But after incoming migration is
> > > finished,
> > > > > > the memory-backend-file will be definitely written if share=on.
> So
> > > the
> > > > > > memory-backend-file can only be used once, but failed in the 2nd
> time
> > > > > > incoming.
> > > > > >
> > > > > > Thus it gives a warning and the users can run the qemu if they
> really
> > > > > > want to do it.
> > > > >
> > > > > Shouldn't we add a migration blocker in such a case instead of
> warning
> > > > > and letting qemu run wild?
> > > > >
> > > >
> > > > IMO, it doesn't need to block this. With share=on and -incoming, the
> user
> > > > can
> > > > still save the device memory state into memory-backend file again if
> > > > ignore-shared
> > > > capability is on.
> > > >
> > > > If we block this, the user can't use the ignore-shared capability in
> > > > incoming
> > > > migration.
> > >
> > > -incomign with share=on is a perfectly normal thing to do - it just
> > > depends who you are sharing the file with and the lifetime of that
> > > shared file.
> > >
> > > For example; if you're just running a qemu with vhost-user then you
> > > use share=on - however wyou typically select the backend file as
> > > a new file from /dev/shm - it's not a file that you previously migrated
> > > to.
> > >
> > Thanks,
> > but using a new file from /dev/shm means kernel will start from
> > start_kernel or early? Is it different from the x-ignore-shared case?
> > If we remove the share=on in incoming migration, all the writting
> > of ram will not be flush into the memory backend file. Thus we
> > can use the base memory backend file for ever.
> > e.g.
> > 1) save the vm like a snapshot, current ram state is "kernel
> > has been started, systemd has been started"
> > 2) restore it with -incoming and *no* share=on flag
> > 3) restore it with -incoming and *no* share=on again...
> > In contrary, if we use share=on, the base backend file will
> > be written at once after 1st time incoming.
> >
> > So, IMO, no "share=on" is the proper usage of incoming migration
> > when ignore-shared is on.
> > Please correct me if sth is wrong, thanks:)
>
> OK, I see what you're trying to do - you mean for the 'snapshotting'
> case;  but that's not the only use.  Another use is for being able to
> do a very quick upgrade of the running qemu to a new qemu binary
> version; and in that case you want to be able to write to the shared
> file so that you can repeatedly do the quick migrate.
>
> Dave
>
> Ah, that quick upgrade example makes sense to me. Thanks for the
explanation.

B.R.
Catherine


Re: [Qemu-devel] Questions about acpi interrupt link device's ‘_PRS' field

2019-04-10 Thread Li Qiang
Paolo Bonzini  于2019年4月10日周三 下午11:55写道:

> On 10/04/19 16:33, Li Qiang wrote:
> > Hi all,
> >
> >
> >
> > I see the link device ‘_PRS’  uses irq line 5, 10, 11 in
> > ‘build_link_dev’ function.
> >
> > But I never see the 5 lines uses in the guest, just uses 10 and 11.
> >
> > Why this happen?  Maybe related with the guest?
>
> Because the MADT table tells the guest to only use lines 10 and 11.  The
> BIOS configures the chipset that way.
>
>
Hi Paolo,

I read the MADT spec, and found that it may related with 'Entry Type 2 :
Interrupt Source Override'.
However, in build_madt function, I found following code when fill interrupt
source overide.

#define ACPI_BUILD_PCI_IRQS ((1<<5) | (1<<9) | (1<<10) | (1<<11))
if (!(ACPI_BUILD_PCI_IRQS & (1 << i))) {
/* No need for a INT source override structure. */
continue;
}

Seems the madt doesn't play a role?

Also in the seabios code, I see the pci interrupt linking device is routed
hard-coded by pci_irqs.
So this means the seabios doesn't use the madt/dsdt table to configure PCI
interrupt routing?

Thanks,
Li Qiang

const u8 pci_irqs[4] = {
10, 10, 11, 11
};

static void piix_isa_bridge_setup(struct pci_device *pci, void *arg)
{
int i, irq;
u8 elcr[2];

elcr[0] = 0x00;
elcr[1] = 0x00;
for (i = 0; i < 4; i++) {
irq = pci_irqs[i];
/* set to trigger level */
elcr[irq >> 3] |= (1 << (irq & 7));
/* activate irq remapping in PIIX */
pci_config_writeb(pci->bdf, 0x60 + i, irq);
}
outb(elcr[0], PIIX_PORT_ELCR1);
outb(elcr[1], PIIX_PORT_ELCR2);
dprintf(1, "PIIX3/PIIX4 init: elcr=%02x %02x\n", elcr[0], elcr[1]);
}

Paolo
>
>


[Qemu-devel] [PATCH for 4.1 v3 0/6] RISC-V: Allow specifying CPU ISA via command line

2019-04-10 Thread Alistair Francis
This patch series adds a generic RISC-V CPU that can be generated at run
time based on the ISA string specified to QEMU via the -cpu argument. This
is supported on the virt and spike boards allowing users to specify the

RISC-V extensions as well as the ISA version.
As part of the conversion we have deprecated the version specifi Spike
machines.

v3:
 - Ensure a minimal length so we don't run off the end of the string.
 - Don't parse the rv32/rv64 in the riscv_generate_cpu_init() loop
v2:
 - Keep the any CPU for linux-user

Alistair Francis (6):
  linux-user/riscv: Add the CPU type as a comment
  target/riscv: Fall back to generating a RISC-V CPU
  target/riscv: Create settable CPU properties
  riscv: virt: Allow specifying a CPU via commandline
  target/riscv: Remove the generic no MMU CPUs
  riscv: Add a generic spike machine

 hw/riscv/spike.c  | 106 +++-
 hw/riscv/virt.c   |   3 +-
 linux-user/riscv/target_elf.h |   1 +
 target/riscv/cpu.c| 147 +-
 target/riscv/cpu.h|  12 ++-
 5 files changed, 262 insertions(+), 7 deletions(-)

-- 
2.21.0



Re: [Qemu-devel] [PATCH] target/riscv: Remove unused include of riscv_htif.h for virt board

2019-04-10 Thread Alistair Francis
On Wed, Apr 10, 2019 at 9:30 AM Jonathan Behrens  wrote:
>
> Unless I'm missing something, the virt board doesn't support HTIF and
> should not be including this header.

Thanks for the patch!

You aren't missing anything, it can be removed.

>
> Jonathan

Can you send a v2 without the uncertainty in the commit message? Can
you also remove your name from the message? For such a small patch the
commit title by itself will be enough.

>
> Signed-off-by: Jonathan Behrens 

With an updated commit message:

Reviewed-by: Alistair Francis 

Alistair

> ---
>  hw/riscv/virt.c | 1 -
>  1 file changed, 1 deletion(-)
>
> diff --git a/hw/riscv/virt.c b/hw/riscv/virt.c
> index fc4c6b306e..3526463034 100644
> --- a/hw/riscv/virt.c
> +++ b/hw/riscv/virt.c
> @@ -29,7 +29,6 @@
>  #include "hw/sysbus.h"
>  #include "hw/char/serial.h"
>  #include "target/riscv/cpu.h"
> -#include "hw/riscv/riscv_htif.h"
>  #include "hw/riscv/riscv_hart.h"
>  #include "hw/riscv/sifive_plic.h"
>  #include "hw/riscv/sifive_clint.h"
> --
> 2.20.1



[Qemu-devel] [PATCH for 4.1 v3 2/6] target/riscv: Fall back to generating a RISC-V CPU

2019-04-10 Thread Alistair Francis
If a user specifies a CPU that we don't understand then we want to fall
back to a CPU generated from the ISA string.

At the moment the generated CPU is assumed to be a privledge spec
version 1.10 CPU with an MMU. This can be changed in the future.

Signed-off-by: Alistair Francis 
---
v3:
 - Ensure a minimal length so we don't run off the end of the string.
 - Don't parse the rv32/rv64 in the loop
 target/riscv/cpu.c | 101 -
 target/riscv/cpu.h |   2 +
 2 files changed, 102 insertions(+), 1 deletion(-)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index d61bce6d55..27be9e412a 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -19,6 +19,7 @@
 
 #include "qemu/osdep.h"
 #include "qemu/log.h"
+#include "qemu/error-report.h"
 #include "cpu.h"
 #include "exec/exec-all.h"
 #include "qapi/error.h"
@@ -103,6 +104,99 @@ static void set_resetvec(CPURISCVState *env, int resetvec)
 #endif
 }
 
+static void riscv_generate_cpu_init(Object *obj)
+{
+RISCVCPU *cpu = RISCV_CPU(obj);
+CPURISCVState *env = >env;
+RISCVCPUClass *mcc = RISCV_CPU_GET_CLASS(cpu);
+const char *riscv_cpu = mcc->isa_str;
+target_ulong target_misa = 0;
+target_ulong rvxlen = 0;
+int i;
+bool valid = false;
+
+/*
+ * We need at least 5 charecters for the string to be valid. Check that
+ * now so we can be lazier later.
+ */
+if (strlen(riscv_cpu) < 5) {
+error_report("'%s' does not appear to be a valid RISC-V ISA string",
+ riscv_cpu);
+exit(1);
+}
+
+if (riscv_cpu[0] == 'r' && riscv_cpu[1] == 'v') {
+/* Starts with "rv" */
+if (riscv_cpu[2] == '3' && riscv_cpu[3] == '2') {
+valid = true;
+rvxlen = RV32;
+}
+if (riscv_cpu[2] == '6' && riscv_cpu[3] == '4') {
+valid = true;
+rvxlen = RV64;
+}
+}
+
+if (!valid) {
+error_report("'%s' does not appear to be a valid RISC-V CPU",
+ riscv_cpu);
+exit(1);
+}
+
+for (i = 4; i < strlen(riscv_cpu); i++) {
+switch (riscv_cpu[i]) {
+case 'i':
+if (target_misa & RVE) {
+error_report("I and E extensions are incompatible");
+exit(1);
+}
+target_misa |= RVI;
+continue;
+case 'e':
+if (target_misa & RVI) {
+error_report("I and E extensions are incompatible");
+exit(1);
+}
+target_misa |= RVE;
+continue;
+case 'g':
+target_misa |= RVI | RVM | RVA | RVF | RVD;
+continue;
+case 'm':
+target_misa |= RVM;
+continue;
+case 'a':
+target_misa |= RVA;
+continue;
+case 'f':
+target_misa |= RVF;
+continue;
+case 'd':
+target_misa |= RVD;
+continue;
+case 'c':
+target_misa |= RVC;
+continue;
+case 's':
+target_misa |= RVS;
+continue;
+case 'u':
+target_misa |= RVU;
+continue;
+default:
+warn_report("QEMU does not support the %c extension",
+riscv_cpu[i]);
+continue;
+}
+}
+
+set_misa(env, rvxlen | target_misa);
+set_versions(env, USER_VERSION_2_02_0, PRIV_VERSION_1_10_0);
+set_resetvec(env, DEFAULT_RSTVEC);
+set_feature(env, RISCV_FEATURE_MMU);
+set_feature(env, RISCV_FEATURE_PMP);
+}
+
 static void riscv_any_cpu_init(Object *obj)
 {
 CPURISCVState *env = _CPU(obj)->env;
@@ -178,6 +272,7 @@ static void rv64imacu_nommu_cpu_init(Object *obj)
 static ObjectClass *riscv_cpu_class_by_name(const char *cpu_model)
 {
 ObjectClass *oc;
+RISCVCPUClass *mcc;
 char *typename;
 char **cpuname;
 
@@ -188,7 +283,10 @@ static ObjectClass *riscv_cpu_class_by_name(const char 
*cpu_model)
 g_free(typename);
 if (!oc || !object_class_dynamic_cast(oc, TYPE_RISCV_CPU) ||
 object_class_is_abstract(oc)) {
-return NULL;
+/* No CPU found, try the generic CPU and pass in the ISA string */
+oc = object_class_by_name(TYPE_RISCV_CPU_GEN);
+mcc = RISCV_CPU_CLASS(oc);
+mcc->isa_str = g_strdup(cpu_model);
 }
 return oc;
 }
@@ -440,6 +538,7 @@ static const TypeInfo riscv_cpu_type_infos[] = {
 .class_init = riscv_cpu_class_init,
 },
 DEFINE_CPU(TYPE_RISCV_CPU_ANY,  riscv_any_cpu_init),
+DEFINE_CPU(TYPE_RISCV_CPU_GEN,  riscv_generate_cpu_init),
 #if defined(TARGET_RISCV32)
 DEFINE_CPU(TYPE_RISCV_CPU_RV32GCSU_V1_09_1, rv32gcsu_priv1_09_1_cpu_init),
 DEFINE_CPU(TYPE_RISCV_CPU_RV32GCSU_V1_10_0, rv32gcsu_priv1_10_0_cpu_init),
diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index 20bce8742e..453108a855 100644
--- 

[Qemu-devel] [PATCH for 4.1 v3 4/6] riscv: virt: Allow specifying a CPU via commandline

2019-04-10 Thread Alistair Francis
Signed-off-by: Alistair Francis 
---
 hw/riscv/virt.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/hw/riscv/virt.c b/hw/riscv/virt.c
index fc4c6b306e..5b25f028ad 100644
--- a/hw/riscv/virt.c
+++ b/hw/riscv/virt.c
@@ -400,7 +400,7 @@ static void riscv_virt_board_init(MachineState *machine)
 /* Initialize SOC */
 object_initialize_child(OBJECT(machine), "soc", >soc, sizeof(s->soc),
 TYPE_RISCV_HART_ARRAY, _abort, NULL);
-object_property_set_str(OBJECT(>soc), VIRT_CPU, "cpu-type",
+object_property_set_str(OBJECT(>soc), machine->cpu_type, "cpu-type",
 _abort);
 object_property_set_int(OBJECT(>soc), smp_cpus, "num-harts",
 _abort);
@@ -526,6 +526,7 @@ static void riscv_virt_board_machine_init(MachineClass *mc)
 mc->desc = "RISC-V VirtIO Board (Privileged ISA v1.10)";
 mc->init = riscv_virt_board_init;
 mc->max_cpus = 8; /* hardcoded limit in BBL */
+mc->default_cpu_type = VIRT_CPU;
 }
 
 DEFINE_MACHINE("virt", riscv_virt_board_machine_init)
-- 
2.21.0



[Qemu-devel] [PATCH for 4.1 v3 1/6] linux-user/riscv: Add the CPU type as a comment

2019-04-10 Thread Alistair Francis
Signed-off-by: Alistair Francis 
---
 linux-user/riscv/target_elf.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/linux-user/riscv/target_elf.h b/linux-user/riscv/target_elf.h
index a6716a6aac..9dd65652ee 100644
--- a/linux-user/riscv/target_elf.h
+++ b/linux-user/riscv/target_elf.h
@@ -9,6 +9,7 @@
 #define RISCV_TARGET_ELF_H
 static inline const char *cpu_get_model(uint32_t eflags)
 {
+/* TYPE_RISCV_CPU_ANY */
 return "any";
 }
 #endif
-- 
2.21.0



[Qemu-devel] [PATCH for 4.1 v3 6/6] riscv: Add a generic spike machine

2019-04-10 Thread Alistair Francis
Add a generic spike machine (not tied to a version) and deprecate the
spike mahines that are tied to a specific version. As we can now specify
the CPU via the command line we no londer need specific versions of the
spike machines.

Signed-off-by: Alistair Francis 
---
 hw/riscv/spike.c | 106 ++-
 1 file changed, 105 insertions(+), 1 deletion(-)

diff --git a/hw/riscv/spike.c b/hw/riscv/spike.c
index 2a000a5800..9d3f7cec4d 100644
--- a/hw/riscv/spike.c
+++ b/hw/riscv/spike.c
@@ -39,6 +39,7 @@
 #include "chardev/char.h"
 #include "sysemu/arch_init.h"
 #include "sysemu/device_tree.h"
+#include "sysemu/qtest.h"
 #include "exec/address-spaces.h"
 #include "elf.h"
 
@@ -160,7 +161,89 @@ static void create_fdt(SpikeState *s, const struct 
MemmapEntry *memmap,
 qemu_fdt_add_subnode(fdt, "/chosen");
 qemu_fdt_setprop_string(fdt, "/chosen", "bootargs", cmdline);
 }
- }
+}
+
+static void spike_board_init(MachineState *machine)
+{
+const struct MemmapEntry *memmap = spike_memmap;
+
+SpikeState *s = g_new0(SpikeState, 1);
+MemoryRegion *system_memory = get_system_memory();
+MemoryRegion *main_mem = g_new(MemoryRegion, 1);
+MemoryRegion *mask_rom = g_new(MemoryRegion, 1);
+int i;
+
+/* Initialize SOC */
+object_initialize_child(OBJECT(machine), "soc", >soc, sizeof(s->soc),
+TYPE_RISCV_HART_ARRAY, _abort, NULL);
+object_property_set_str(OBJECT(>soc), machine->cpu_type, "cpu-type",
+_abort);
+object_property_set_int(OBJECT(>soc), smp_cpus, "num-harts",
+_abort);
+object_property_set_bool(OBJECT(>soc), true, "realized",
+_abort);
+
+/* register system main memory (actual RAM) */
+memory_region_init_ram(main_mem, NULL, "riscv.spike.ram",
+   machine->ram_size, _fatal);
+memory_region_add_subregion(system_memory, memmap[SPIKE_DRAM].base,
+main_mem);
+
+/* create device tree */
+create_fdt(s, memmap, machine->ram_size, machine->kernel_cmdline);
+
+/* boot rom */
+memory_region_init_rom(mask_rom, NULL, "riscv.spike.mrom",
+   memmap[SPIKE_MROM].size, _fatal);
+memory_region_add_subregion(system_memory, memmap[SPIKE_MROM].base,
+mask_rom);
+
+if (machine->kernel_filename) {
+load_kernel(machine->kernel_filename);
+}
+
+/* reset vector */
+uint32_t reset_vec[8] = {
+0x0297,  /* 1:  auipc  t0, %pcrel_hi(dtb) */
+0x02028593,  /* addi   a1, t0, %pcrel_lo(1b) */
+0xf1402573,  /* csrr   a0, mhartid  */
+#if defined(TARGET_RISCV32)
+0x0182a283,  /* lw t0, 24(t0) */
+#elif defined(TARGET_RISCV64)
+0x0182b283,  /* ld t0, 24(t0) */
+#endif
+0x00028067,  /* jr t0 */
+0x,
+memmap[SPIKE_DRAM].base, /* start: .dword DRAM_BASE */
+0x,
+ /* dtb: */
+};
+
+/* copy in the reset vector in little_endian byte order */
+for (i = 0; i < sizeof(reset_vec) >> 2; i++) {
+reset_vec[i] = cpu_to_le32(reset_vec[i]);
+}
+rom_add_blob_fixed_as("mrom.reset", reset_vec, sizeof(reset_vec),
+  memmap[SPIKE_MROM].base, _space_memory);
+
+/* copy in the device tree */
+if (fdt_pack(s->fdt) || fdt_totalsize(s->fdt) >
+memmap[SPIKE_MROM].size - sizeof(reset_vec)) {
+error_report("not enough space to store device-tree");
+exit(1);
+}
+qemu_fdt_dumpdtb(s->fdt, fdt_totalsize(s->fdt));
+rom_add_blob_fixed_as("mrom.fdt", s->fdt, fdt_totalsize(s->fdt),
+  memmap[SPIKE_MROM].base + sizeof(reset_vec),
+  _space_memory);
+
+/* initialize HTIF using symbols found in load_kernel */
+htif_mm_init(system_memory, mask_rom, >soc.harts[0].env, serial_hd(0));
+
+/* Core Local Interruptor (timer and IPI) */
+sifive_clint_create(memmap[SPIKE_CLINT].base, memmap[SPIKE_CLINT].size,
+smp_cpus, SIFIVE_SIP_BASE, SIFIVE_TIMECMP_BASE, SIFIVE_TIME_BASE);
+}
 
 static void spike_v1_10_0_board_init(MachineState *machine)
 {
@@ -172,6 +255,12 @@ static void spike_v1_10_0_board_init(MachineState *machine)
 MemoryRegion *mask_rom = g_new(MemoryRegion, 1);
 int i;
 
+if (!qtest_enabled()) {
+info_report("The Spike v1.10.0 machine has been depreceated. "
+"Please use the deneric spike machine and specify the ISA "
+"versions using -cpu.");
+}
+
 /* Initialize SOC */
 object_initialize_child(OBJECT(machine), "soc", >soc, sizeof(s->soc),
 TYPE_RISCV_HART_ARRAY, _abort, NULL);
@@ -254,6 +343,12 @@ static void 

[Qemu-devel] [PATCH for 4.1 v3 5/6] target/riscv: Remove the generic no MMU CPUs

2019-04-10 Thread Alistair Francis
These can now be specified via the command line so we no longer need
these.

Signed-off-by: Alistair Francis 
---
 target/riscv/cpu.c | 2 --
 target/riscv/cpu.h | 2 --
 2 files changed, 4 deletions(-)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index c792bacd24..9ba77a1983 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -586,13 +586,11 @@ static const TypeInfo riscv_cpu_type_infos[] = {
 #if defined(TARGET_RISCV32)
 DEFINE_CPU(TYPE_RISCV_CPU_RV32GCSU_V1_09_1, rv32gcsu_priv1_09_1_cpu_init),
 DEFINE_CPU(TYPE_RISCV_CPU_RV32GCSU_V1_10_0, rv32gcsu_priv1_10_0_cpu_init),
-DEFINE_CPU(TYPE_RISCV_CPU_RV32IMACU_NOMMU,  rv32imacu_nommu_cpu_init),
 DEFINE_CPU(TYPE_RISCV_CPU_SIFIVE_E31,   rv32imacu_nommu_cpu_init),
 DEFINE_CPU(TYPE_RISCV_CPU_SIFIVE_U34,   rv32gcsu_priv1_10_0_cpu_init)
 #elif defined(TARGET_RISCV64)
 DEFINE_CPU(TYPE_RISCV_CPU_RV64GCSU_V1_09_1, rv64gcsu_priv1_09_1_cpu_init),
 DEFINE_CPU(TYPE_RISCV_CPU_RV64GCSU_V1_10_0, rv64gcsu_priv1_10_0_cpu_init),
-DEFINE_CPU(TYPE_RISCV_CPU_RV64IMACU_NOMMU,  rv64imacu_nommu_cpu_init),
 DEFINE_CPU(TYPE_RISCV_CPU_SIFIVE_E51,   rv64imacu_nommu_cpu_init),
 DEFINE_CPU(TYPE_RISCV_CPU_SIFIVE_U54,   rv64gcsu_priv1_10_0_cpu_init)
 #endif
diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index bc877d8107..6806f602b5 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -51,10 +51,8 @@
 #define TYPE_RISCV_CPU_GEN  RISCV_CPU_TYPE_NAME("rv*")
 #define TYPE_RISCV_CPU_RV32GCSU_V1_09_1 RISCV_CPU_TYPE_NAME("rv32gcsu-v1.9.1")
 #define TYPE_RISCV_CPU_RV32GCSU_V1_10_0 RISCV_CPU_TYPE_NAME("rv32gcsu-v1.10.0")
-#define TYPE_RISCV_CPU_RV32IMACU_NOMMU  RISCV_CPU_TYPE_NAME("rv32imacu-nommu")
 #define TYPE_RISCV_CPU_RV64GCSU_V1_09_1 RISCV_CPU_TYPE_NAME("rv64gcsu-v1.9.1")
 #define TYPE_RISCV_CPU_RV64GCSU_V1_10_0 RISCV_CPU_TYPE_NAME("rv64gcsu-v1.10.0")
-#define TYPE_RISCV_CPU_RV64IMACU_NOMMU  RISCV_CPU_TYPE_NAME("rv64imacu-nommu")
 #define TYPE_RISCV_CPU_SIFIVE_E31   RISCV_CPU_TYPE_NAME("sifive-e31")
 #define TYPE_RISCV_CPU_SIFIVE_E51   RISCV_CPU_TYPE_NAME("sifive-e51")
 #define TYPE_RISCV_CPU_SIFIVE_U34   RISCV_CPU_TYPE_NAME("sifive-u34")
-- 
2.21.0



[Qemu-devel] [PATCH for 4.1 v3 3/6] target/riscv: Create settable CPU properties

2019-04-10 Thread Alistair Francis
Signed-off-by: Alistair Francis 
---
 target/riscv/cpu.c | 52 ++
 target/riscv/cpu.h |  8 +++
 2 files changed, 56 insertions(+), 4 deletions(-)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index 27be9e412a..c792bacd24 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -23,6 +23,7 @@
 #include "cpu.h"
 #include "exec/exec-all.h"
 #include "qapi/error.h"
+#include "hw/qdev-properties.h"
 #include "migration/vmstate.h"
 
 /* RISC-V CPU definitions */
@@ -191,12 +192,9 @@ static void riscv_generate_cpu_init(Object *obj)
 }
 
 set_misa(env, rvxlen | target_misa);
-set_versions(env, USER_VERSION_2_02_0, PRIV_VERSION_1_10_0);
-set_resetvec(env, DEFAULT_RSTVEC);
-set_feature(env, RISCV_FEATURE_MMU);
-set_feature(env, RISCV_FEATURE_PMP);
 }
 
+
 static void riscv_any_cpu_init(Object *obj)
 {
 CPURISCVState *env = _CPU(obj)->env;
@@ -394,7 +392,11 @@ static void riscv_cpu_disas_set_info(CPUState *s, 
disassemble_info *info)
 static void riscv_cpu_realize(DeviceState *dev, Error **errp)
 {
 CPUState *cs = CPU(dev);
+RISCVCPU *cpu = RISCV_CPU(dev);
+CPURISCVState *env = >env;
 RISCVCPUClass *mcc = RISCV_CPU_GET_CLASS(dev);
+int priv_version = PRIV_VERSION_1_10_0;
+int user_version = USER_VERSION_2_02_0;
 Error *local_err = NULL;
 
 cpu_exec_realizefn(cs, _err);
@@ -403,6 +405,39 @@ static void riscv_cpu_realize(DeviceState *dev, Error 
**errp)
 return;
 }
 
+if (cpu->cfg.priv_spec) {
+if (!g_strcmp0(cpu->cfg.priv_spec, "v1.10.0")) {
+priv_version = PRIV_VERSION_1_10_0;
+} else if (!g_strcmp0(cpu->cfg.priv_spec, "v1.9.1")) {
+priv_version = PRIV_VERSION_1_09_1;
+} else {
+error_report("Unsupported privilege spec version '%s'",
+ cpu->cfg.priv_spec);
+exit(1);
+}
+}
+
+if (cpu->cfg.user_spec) {
+if (!g_strcmp0(cpu->cfg.user_spec, "v2.02.0")) {
+user_version = USER_VERSION_2_02_0;
+} else {
+error_report("Unsupported user spec version '%s'",
+ cpu->cfg.user_spec);
+exit(1);
+}
+}
+
+set_versions(env, user_version, priv_version);
+set_resetvec(env, DEFAULT_RSTVEC);
+
+if (cpu->cfg.mmu) {
+set_feature(env, RISCV_FEATURE_MMU);
+}
+
+if (cpu->cfg.pmp) {
+set_feature(env, RISCV_FEATURE_PMP);
+}
+
 riscv_cpu_register_gdb_regs_for_features(cs);
 
 qemu_init_vcpu(cs);
@@ -424,6 +459,14 @@ static const VMStateDescription vmstate_riscv_cpu = {
 .unmigratable = 1,
 };
 
+static Property riscv_cpu_properties[] = {
+DEFINE_PROP_STRING("priv_spec", RISCVCPU, cfg.priv_spec),
+DEFINE_PROP_STRING("user_spec", RISCVCPU, cfg.user_spec),
+DEFINE_PROP_BOOL("mmu", RISCVCPU, cfg.mmu, true),
+DEFINE_PROP_BOOL("pmp", RISCVCPU, cfg.pmp, true),
+DEFINE_PROP_END_OF_LIST(),
+};
+
 static void riscv_cpu_class_init(ObjectClass *c, void *data)
 {
 RISCVCPUClass *mcc = RISCV_CPU_CLASS(c);
@@ -464,6 +507,7 @@ static void riscv_cpu_class_init(ObjectClass *c, void *data)
 #endif
 /* For now, mark unmigratable: */
 cc->vmsd = _riscv_cpu;
+dc->props = riscv_cpu_properties;
 }
 
 char *riscv_isa_string(RISCVCPU *cpu)
diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index 453108a855..bc877d8107 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -226,6 +226,14 @@ typedef struct RISCVCPU {
 CPUState parent_obj;
 /*< public >*/
 CPURISCVState env;
+
+/* Configuration Settings */
+struct {
+char *priv_spec;
+char *user_spec;
+bool mmu;
+bool pmp;
+} cfg;
 } RISCVCPU;
 
 static inline RISCVCPU *riscv_env_get_cpu(CPURISCVState *env)
-- 
2.21.0



Re: [Qemu-devel] [PATCH 0/3] vhost-scsi: Support live migration

2019-04-10 Thread Liran Alon



> On 9 Apr 2019, at 17:29, Stefan Hajnoczi  wrote:
> 
> On Thu, Mar 21, 2019 at 09:55:42AM +0200, Nir Weiner wrote:
>> Originally migration was not possible with vhost-scsi because
>> as part of migration, the source host target SCSI device state
>> needs to be saved and loaded into the destination host target SCSI
>> device. This cannot be done by QEMU.
>> 
>> As this can be handled either by external orchestrator or by having
>> shared-storage (i.e. iSCSI), there is no reason to limit the
>> orchestrator from being able to explictly specify it wish to enable
>> migration even when VM have a vhost-scsi device.
>> 
>> Liran Alon (1):
>>  vhost-scsi: Allow user to enable migration
>> 
>> Nir Weiner (2):
>>  vhost-scsi: The vhost device should be stopped when the VM is not
>>running
>>  vhost-scsi: Add vmstate descriptor
>> 
>> hw/scsi/vhost-scsi.c  | 57 ++-
>> include/hw/virtio/vhost-scsi-common.h |  1 +
>> 2 files changed, 48 insertions(+), 10 deletions(-)
> 
> What happens when migration is attempted while there is in-flight I/O in
> the vring?
> 
> Stefan

What do you define as “in-flight I/O”? I think there is a need to separate the 
discussion here to multiple cases:

1) The I/O request was written to vring but guest have not yet written to 
doorbell:
This is the simplest case. No state is lost as the vring is migrated from 
source host to dest host as part of guest memory migration.
Also, the vring properties (E.g. Guest base address) is transferred from source 
host to dest host as part of vhost-scsi VMState.
(See patch 2/3 of series which adds VMSTATE_VIRTIO_DEVICE to vhost-scsi VMState 
which will cause virtio_save() to save vring on migration stream).

2) The I/O request was written to vring and the guest have written to doorbell:
The I/O request was submitted and processed by the vhost-scsi backend. 
Therefore, the I/O request was sent to the remote TGT server.
If the the TGT server has performed the write and returned ACK but the ACK was 
not received by guest, then guest is expected to initiate the write again.
(Similar to what happens for a momentary TGT server downtime / network 
downtime). So this isn’t suppose to be an issue.

An interesting case is what happens if there is an in-flight I/O write request 
(Request A) performed by source host that was sent already over the network to 
the TGT server.
But until it reaches the TGT server, the VM running on dest performs another 
I/O write request (Request B) to the same sector which results in Request B 
being handled by TGT server before Request A. In this case, it is possible that 
TGT server will actually overwrite data written by Request B with older request 
Request A that arrived to TGT later.
(This will be the result in case of using an iSCSI TGT).
But, there are various techniques that TGT server can implement to avoid this. 
For example, fencing-out requests from any older iSCSI connection than the most 
recent one.

In general, vhost-scsi migration shouldn’t be different than vhost-net or 
vhost-vsock migration.
The only thing that may be problematic in case of vhost-scsi is because the 
source host target SCSI device state may need to be saved and restored on dest 
host target SCSI device.
However, when using a shared-storage via iSCSI, this concern is irrelevant. 
Therefore, this patch series attempts to allow the admin to remove the 
migration-blocker limitation from vhost-scsi if it is using such a setup. As 
described in cover-letter.

-Liran











Re: [Qemu-devel] [PATCH] sd: Fix out-of-bounds assertions

2019-04-10 Thread Lidong Chen

Hi,

Thank you all for the reviews and comments! Since the fixes in sd.c have 
gone through the review, I can fix the issue in util/main-loop.c 
(mentioned in the reviews of Peter and Liam) in a separate patch.


Thanks,

Lidong

On 4/9/2019 3:39 AM, Liam Merwick wrote:

On 09/04/2019 06:51, Markus Armbruster wrote:

Lidong Chen  writes:


Due to an off-by-one error, the assert statements allow an
out-of-bounds array access.

Signed-off-by: Lidong Chen 



Reviewed-by: Liam Merwick 



---
  hw/sd/sd.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/hw/sd/sd.c b/hw/sd/sd.c
index aaab15f..818f86c 100644
--- a/hw/sd/sd.c
+++ b/hw/sd/sd.c
@@ -144,7 +144,7 @@ static const char *sd_state_name(enum 
SDCardStates state)

  if (state == sd_inactive_state) {
  return "inactive";
  }
-    assert(state <= ARRAY_SIZE(state_name));
+    assert(state < ARRAY_SIZE(state_name));
  return state_name[state];
  }
  @@ -165,7 +165,7 @@ static const char 
*sd_response_name(sd_rsp_type_t rsp)

  if (rsp == sd_r1b) {
  rsp = sd_r1;
  }
-    assert(rsp <= ARRAY_SIZE(response_name));
+    assert(rsp < ARRAY_SIZE(response_name));
  return response_name[rsp];
  }

This is the second fix for this bug pattern in a fortnight. Where's
one, there are more:



As Lidong mentioned, an internal static analysis tool (Parfait) was 
used to catch these. Not every arch/board is compiled but I had 
eyeballed the code of most interest to me and they seemed fine (e.g. 
for array accesses, the subsequent loops used a less-than check)


However, this WIN32 code in util/main-loop.c seems wrong.

425 g_assert(n_poll_fds <= ARRAY_SIZE(poll_fds));
426
427 for (i = 0; i < w->num; i++) {
428 poll_fds[n_poll_fds + i].fd = (DWORD_PTR)w->events[i];
429 poll_fds[n_poll_fds + i].events = G_IO_IN;
430 }

Seems like this should be:

g_assert(n_poll_fds + w->num <= ARRAY_SIZE(poll_fds));

Otherwise, the rest seem fine.

Regards,

Liam




$ git-grep '<= ARRAY_SIZE'
hw/intc/arm_gicv3_cpuif.c:    assert(aprmax <= 
ARRAY_SIZE(cs->ich_apr[0]));
hw/intc/arm_gicv3_cpuif.c:    assert(aprmax <= 
ARRAY_SIZE(cs->ich_apr[0]));
hw/net/stellaris_enet.c:    if (s->tx_fifo_len + 4 <= 
ARRAY_SIZE(s->tx_fifo)) {

hw/sd/pxa2xx_mmci.c:    && s->tx_len <= ARRAY_SIZE(s->tx_fifo)
hw/sd/pxa2xx_mmci.c:    && s->rx_len <= ARRAY_SIZE(s->rx_fifo)
hw/sd/pxa2xx_mmci.c:    && s->resp_len <= ARRAY_SIZE(s->resp_fifo);
hw/sd/sd.c:    assert(state <= ARRAY_SIZE(state_name));
hw/sd/sd.c:    assert(rsp <= ARRAY_SIZE(response_name));
hw/usb/hcd-xhci.c:    assert(n <= ARRAY_SIZE(tmp));
target/mips/op_helper.c:    if (base_reglist > 0 && base_reglist <= 
ARRAY_SIZE (multiple_regs)) {
target/mips/op_helper.c:    if (base_reglist > 0 && base_reglist <= 
ARRAY_SIZE (multiple_regs)) {
target/mips/op_helper.c:    if (base_reglist > 0 && base_reglist <= 
ARRAY_SIZE (multiple_regs)) {
target/mips/op_helper.c:    if (base_reglist > 0 && base_reglist <= 
ARRAY_SIZE (multiple_regs)) {

target/ppc/kvm.c:   <= ARRAY_SIZE(hw_debug_points));
target/ppc/kvm.c:   <= ARRAY_SIZE(hw_debug_points));
target/ppc/kvm.c:    assert((nb_hw_breakpoint + nb_hw_watchpoint) <= 
ARRAY_SIZE(dbg->arch.bp));

tcg/tcg.c:    tcg_debug_assert(pi <= ARRAY_SIZE(op->args));
util/main-loop.c:    g_assert(n_poll_fds <= ARRAY_SIZE(poll_fds));
util/module.c:    assert(n_dirs <= ARRAY_SIZE(dirs));

Lidong Chen, would you like to have a look at these?

Cc'ing maintainers to help with further investigation.







Re: [Qemu-devel] [ANNOUNCE] QEMU 4.0.0-rc3 is now available

2019-04-10 Thread Andrew Randrianasulu
Please also include this patch in next -rc or final, it fixes 32-bit 
compilation:

https://patchew.org/QEMU/20190402073018.17747-1-kraxel%40redhat.com/
([Qemu-devel] [PATCH] curses: fix wchar_t printf warning)

without it I get 

ui/curses.c: In function 'get_ucs':
ui/curses.c:456:25: error: format '%x' expects argument of type 'unsigned int', 
but argument 3 has type 'wchar_t {aka long int}' [-Werror=format=]
 fprintf(stderr, "Could not convert 0x%02x from WCHAR_T to UCS-2: %s\n",
 ^
cc1: all warnings being treated as errors
make: *** [ui/curses.o] Error 1
make: *** Waiting for unfinished jobs



Re: [Qemu-devel] [PATCH] hw/char/escc: Lower irq when transmit buffer is filled

2019-04-10 Thread Stephen Checkoway



On Apr 10, 2019, at 16:01, Philippe Mathieu-Daudé  wrote:

> On 3/6/19 12:01 PM, Paolo Bonzini wrote:
>> On 05/03/19 06:10, Stephen Checkoway wrote:
>>> The SCC/ESCC will briefly stop asserting an interrupt when the
>>> transmit FIFO is filled.
>>> 
>>> This code doesn't model the transmit FIFO/shift register so the
>>> pending transmit interrupt is never deasserted which means that an
>>> edge-triggered interrupt controller will never see the low-to-high
>>> transition it needs to raise another interrupt. The practical
>>> consequence of this is that guest firmware with an interrupt service
>>> routine for the ESCC that does not send all of the data it has
>>> immediately will stop sending data if the following sequence of
>>> events occurs:
>>> 1. Disable processor interrupts
>>> 2. Write a character to the ESCC
>>> 3. Add additional characters to a buffer which is drained by the ISR
>>> 4. Enable processor interrupts
>>> 
>>> In this case, the first character will be sent, the interrupt will
>>> fire and the ISR will output the second character. Since the pending
>>> transmit interrupt remains asserted, no additional interrupts will
>>> ever fire.
>>> 
>>> This fixes that situation by explicitly lowering the IRQ when a
>>> character is written to the buffer.
>>> 
>>> Signed-off-by: Stephen Checkoway 
>> 
>> Looks good but I would like Mark to give his ack as well.
>> 
>> Mark, could you also add hw/char/escc.c to both SPARC and Mac sections
>> of MAINTAINERS?
> 
> Cc'ing Artyom who also made some changes in this file, and Laurent.
> 
> 
> Stephen, which particular chipset are you using?

I'm away from the hardware, but my notes say Z80C30. I've included the image 
from the board if you want to try to decipher the rest of the part number. My 
best guess is it's a Z85C3008VEC. If it's important, I can ask someone who is a 
few thousand km closer to the hardware than I currently am to take a look.



> This file models various types. I had a talk with Mark or Laurent at
> last KVM forum about this device. IIRC, while the sun4m machines use a
> real chipset, the MacIO embeds an slighly modified IP core.
> 
> I can't find what you describe in the Z85C30 doc, however in the ESCC
> datasheet referenced in escc.c (which happens to be a 85MiB pdf!!!) I found:

(This chip is ridiculously configurable, I honestly found it much easier to 
write a driver by reverse engineering existing firmware than by reading the 
part sheet.)

> 
>  Transmit Interrupts and Transmit Buffer Empty Bit on the NMOS/CMOS
> 
>  The TxIP is reset either by writing data to the transmit buffer or
>  by issuing the Reset Tx Int command in WR0.
> 
> I understand the NMOS/CMOS desc. matches the Z85C30 (no FIFO used).

It has been a little while since I was last looking at this, but my 
recollection was the SCC has a 1-byte transmit buffer and the ESCC has a small 
(4 maybe?) byte transmit buffer. But in either case writing data to the 
transmit buffer should clear the interrupt.

> So your description and patch makes sens.
> What worries me is the controller could have other pending IRQs to
> deliver and you are clearing them. Shouldn't we only clear the
> INTR_TXINT bit, and call escc_update_irq() which should lower the IRQ if
> no bits are pending?
> 
> Maybe as:
> 
>s->wregs[W_INTR] &= ~INTR_TXINT;
>escc_update_irq(s);

Ah, I think I see. In this case, escc_update_irq() will lower the IRQ if no 
other interrupts are pending and the set_txint() will raise it again. 
Otherwise, it'll remain raised. I can give this a shot and see how it goes. (I 
think this should be R_INTR instead of W_INTR, but I'll double check once I 
have an opportunity to dig into this again.)

Thanks for looking at this!

Steve


-- 
Stephen Checkoway







[Qemu-devel] [PATCH v2 6/6] iotests: Test qemu-img convert --salvage

2019-04-10 Thread Max Reitz
This test converts a simple image to another, but blkdebug injects
block_status and read faults at some offsets.  The resulting image
should be the same as the input image, except that sectors that could
not be read have to be 0.

Signed-off-by: Max Reitz 
---
 tests/qemu-iotests/249 | 163 +
 tests/qemu-iotests/249.out |  43 ++
 tests/qemu-iotests/group   |   1 +
 3 files changed, 207 insertions(+)
 create mode 100755 tests/qemu-iotests/249
 create mode 100644 tests/qemu-iotests/249.out

diff --git a/tests/qemu-iotests/249 b/tests/qemu-iotests/249
new file mode 100755
index 00..d3883d75f3
--- /dev/null
+++ b/tests/qemu-iotests/249
@@ -0,0 +1,163 @@
+#!/bin/bash
+#
+# Test qemu-img convert --salvage
+#
+# Copyright (C) 2018 Red Hat, Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see .
+#
+
+# creator
+owner=mre...@redhat.com
+
+seq=$(basename $0)
+echo "QA output created by $seq"
+
+here=$PWD
+status=1   # failure is the default!
+
+_cleanup()
+{
+_cleanup_test_img
+}
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+# get standard environment, filters and checks
+. ./common.rc
+. ./common.filter
+. ./common.qemu
+
+_supported_fmt generic
+_supported_proto file
+_supported_os Linux
+
+
+TEST_IMG="$TEST_IMG.orig" _make_test_img 64M
+
+$QEMU_IO -c 'write -P 42 0 64M' "$TEST_IMG.orig" | _filter_qemu_io
+
+
+sector_size=512
+
+# Offsets on which to fail block-status.  Keep in ascending order so
+# the indexing done by _filter_offsets will appear in ascending order
+# in the output as well.
+status_fail_offsets="$((16 * 1024 * 1024 + 8192))
+ $((33 * 1024 * 1024 + 512))"
+
+# Offsets on which to fail reads.  Keep in ascending order for the
+# same reason.
+# Element 1 is shared with $status_fail_offsets on purpose.
+# Elements 2 and later test what happens when a continuous range of
+# sectors is inaccessible.
+read_fail_offsets="$((32 * 1024 * 1024 - 65536))
+   $((33 * 1024 * 1024 + 512))
+   $(seq $((34 * 1024 * 1024)) $sector_size \
+ $((34 * 1024 * 1024 + 4096 - $sector_size)))"
+
+
+# blkdebug must be above the format layer so it can intercept all
+# block-status events
+source_img="json:{'driver': 'blkdebug',
+  'image': {
+  'driver': '$IMGFMT',
+  'file': {
+  'driver': 'file',
+  'filename': '$TEST_IMG.orig'
+  }
+  },
+  'inject-error': ["
+
+for ofs in $status_fail_offsets
+do
+source_img+="{ 'event': 'none',
+   'iotype': 'block-status',
+   'errno': 5,
+   'sector': $((ofs / sector_size)) },"
+done
+
+for ofs in $read_fail_offsets
+do
+source_img+="{ 'event': 'none',
+   'iotype': 'read',
+   'errno': 5,
+   'sector': $((ofs / sector_size)) },"
+done
+
+# Remove the trailing comma and terminate @inject-error and json:{}
+source_img="${source_img%,} ] }"
+
+
+echo
+
+
+_filter_offsets() {
+filters=
+
+index=0
+for ofs in $2
+do
+filters+=" -e s/$(printf "$1" $ofs)/status_fail_offset_$index/"
+index=$((index + 1))
+done
+
+index=0
+for ofs in $3
+do
+filters+=" -e s/$(printf "$1" $ofs)/read_fail_offset_$index/"
+index=$((index + 1))
+done
+
+sed $filters
+}
+
+# While determining the number of allocated sectors in the input
+# image, we should see one block status warning per element of
+# $status_fail_offsets.
+#
+# Then, the image is read.  Since the block status is queried in
+# basically the same way, the same warnings as in the previous step
+# should reappear.  Interleaved with those we should see a read
+# warning per element of $read_fail_offsets.
+# Note that $read_fail_offsets and $status_fail_offsets share an
+# element (read_fail_offset_1 == status_fail_offset_1), so
+# "status_fail_offset_1" in the output is the same as
+# "read_fail_offset_1".
+$QEMU_IMG convert --salvage "$source_img" "$TEST_IMG" 2>&1 \
+| _filter_offsets '%i' "$status_fail_offsets" "$read_fail_offsets"
+
+echo
+
+# The offsets where the block status could not be determined should
+# have been treated as containing data and thus should be correct in
+# the output 

[Qemu-devel] [PATCH v2 3/6] blkdebug: Add @iotype error option

2019-04-10 Thread Max Reitz
This new error option allows users of blkdebug to inject errors only on
certain kinds of I/O operations.  Users usually want to make a very
specific operation fail, not just any; but right now they simply hope
that the event that triggers the error injection is followed up with
that very operation.  That may not be true, however, because the block
layer is changing (including blkdebug, which may increase the number of
types of I/O operations on which to inject errors).

The new option's default has been chosen to keep backwards
compatibility.

Note that similar to the internal representation, we could choose to
expose this option as a list of I/O types.  But there is no practical
use for this, because as described above, users usually know exactly
which kind of operation they want to make fail, so there is no need to
specify multiple I/O types at once.  In addition, exposing this option
as a list would require non-trivial changes to qemu_opts_absorb_qdict().

Signed-off-by: Max Reitz 
---
 qapi/block-core.json | 26 +++
 block/blkdebug.c | 50 
 2 files changed, 67 insertions(+), 9 deletions(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index 7ccbfff9d0..5141e91172 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -3235,6 +3235,26 @@
 'l1_shrink_write_table', 'l1_shrink_free_l2_clusters',
 'cor_write'] }
 
+##
+# @BlkdebugIOType:
+#
+# Kinds of I/O that blkdebug can inject errors in.
+#
+# @read: .bdrv_co_preadv()
+#
+# @write: .bdrv_co_pwritev()
+#
+# @write-zeroes: .bdrv_co_pwrite_zeroes()
+#
+# @discard: .bdrv_co_pdiscard()
+#
+# @flush: .bdrv_co_flush_to_disk()
+#
+# Since: 4.1
+##
+{ 'enum': 'BlkdebugIOType',
+  'data': [ 'read', 'write', 'write-zeroes', 'discard', 'flush' ] }
+
 ##
 # @BlkdebugInjectErrorOptions:
 #
@@ -3245,6 +3265,11 @@
 # @state:   the state identifier blkdebug needs to be in to
 #   actually trigger the event; defaults to "any"
 #
+# @iotype:  the type of I/O operations on which this error should
+#   be injected; defaults to "all read, write,
+#   write-zeroes, discard, and flush operations"
+#   (since: 4.1)
+#
 # @errno:   error identifier (errno) to be returned; defaults to
 #   EIO
 #
@@ -3262,6 +3287,7 @@
 { 'struct': 'BlkdebugInjectErrorOptions',
   'data': { 'event': 'BlkdebugEvent',
 '*state': 'int',
+'*iotype': 'BlkdebugIOType',
 '*errno': 'int',
 '*sector': 'int',
 '*once': 'bool',
diff --git a/block/blkdebug.c b/block/blkdebug.c
index efd9441625..ca84b12e32 100644
--- a/block/blkdebug.c
+++ b/block/blkdebug.c
@@ -75,6 +75,7 @@ typedef struct BlkdebugRule {
 int state;
 union {
 struct {
+uint64_t iotype_mask;
 int error;
 int immediately;
 int once;
@@ -91,6 +92,9 @@ typedef struct BlkdebugRule {
 QSIMPLEQ_ENTRY(BlkdebugRule) active_next;
 } BlkdebugRule;
 
+QEMU_BUILD_BUG_MSG(BLKDEBUGIO_TYPE__MAX > 64,
+   "BlkdebugIOType mask does not fit into an uint64_t");
+
 static QemuOptsList inject_error_opts = {
 .name = "inject-error",
 .head = QTAILQ_HEAD_INITIALIZER(inject_error_opts.head),
@@ -103,6 +107,10 @@ static QemuOptsList inject_error_opts = {
 .name = "state",
 .type = QEMU_OPT_NUMBER,
 },
+{
+.name = "iotype",
+.type = QEMU_OPT_STRING,
+},
 {
 .name = "errno",
 .type = QEMU_OPT_NUMBER,
@@ -162,6 +170,8 @@ static int add_rule(void *opaque, QemuOpts *opts, Error 
**errp)
 int event;
 struct BlkdebugRule *rule;
 int64_t sector;
+BlkdebugIOType iotype;
+Error *local_error = NULL;
 
 /* Find the right event for the rule */
 event_name = qemu_opt_get(opts, "event");
@@ -192,6 +202,26 @@ static int add_rule(void *opaque, QemuOpts *opts, Error 
**errp)
 sector = qemu_opt_get_number(opts, "sector", -1);
 rule->options.inject.offset =
 sector == -1 ? -1 : sector * BDRV_SECTOR_SIZE;
+
+iotype = qapi_enum_parse(_lookup,
+ qemu_opt_get(opts, "iotype"),
+ BLKDEBUGIO_TYPE__MAX, _error);
+if (local_error) {
+error_propagate(errp, local_error);
+return -1;
+}
+if (iotype != BLKDEBUGIO_TYPE__MAX) {
+rule->options.inject.iotype_mask = (1ull << iotype);
+} else {
+/* Apply the default */
+rule->options.inject.iotype_mask =
+(1ull << BLKDEBUGIO_TYPE_READ)
+| (1ull << BLKDEBUGIO_TYPE_WRITE)
+| (1ull << BLKDEBUGIO_TYPE_WRITE_ZEROES)
+| (1ull << BLKDEBUGIO_TYPE_DISCARD)
+| (1ull << BLKDEBUGIO_TYPE_FLUSH);
+}
+
 break;
 
 case 

[Qemu-devel] [PATCH v2 1/6] qemu-img: Move quiet into ImgConvertState

2019-04-10 Thread Max Reitz
Move img_convert()'s quiet flag into the ImgConvertState so it is
accessible by nested functions.  -q dictates that it suppresses anything
but errors, so if those functions want to emit warnings, they need to
query this flag first.  (There currently are no such warnings, but there
will be as of the next patch.)

Signed-off-by: Max Reitz 
Reviewed-by: Eric Blake 
---
 qemu-img.c | 13 +++--
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/qemu-img.c b/qemu-img.c
index aa6f81f1ea..c53666aa41 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -1569,6 +1569,7 @@ typedef struct ImgConvertState {
 int64_t target_backing_sectors; /* negative if unknown */
 bool wr_in_order;
 bool copy_range;
+bool quiet;
 int min_sparse;
 int alignment;
 size_t cluster_sectors;
@@ -2005,7 +2006,7 @@ static int img_convert(int argc, char **argv)
 QDict *open_opts = NULL;
 char *options = NULL;
 Error *local_err = NULL;
-bool writethrough, src_writethrough, quiet = false, image_opts = false,
+bool writethrough, src_writethrough, image_opts = false,
  skip_create = false, progress = false, tgt_image_opts = false;
 int64_t ret = -EINVAL;
 bool force_share = false;
@@ -2113,7 +2114,7 @@ static int img_convert(int argc, char **argv)
 src_cache = optarg;
 break;
 case 'q':
-quiet = true;
+s.quiet = true;
 break;
 case 'n':
 skip_create = true;
@@ -2202,7 +2203,7 @@ static int img_convert(int argc, char **argv)
 }
 
 /* Initialize before goto out */
-if (quiet) {
+if (s.quiet) {
 progress = false;
 }
 qemu_progress_init(progress, 1.0);
@@ -2213,7 +2214,7 @@ static int img_convert(int argc, char **argv)
 
 for (bs_i = 0; bs_i < s.src_num; bs_i++) {
 s.src[bs_i] = img_open(image_opts, argv[optind + bs_i],
-   fmt, src_flags, src_writethrough, quiet,
+   fmt, src_flags, src_writethrough, s.quiet,
force_share);
 if (!s.src[bs_i]) {
 ret = -1;
@@ -2376,7 +2377,7 @@ static int img_convert(int argc, char **argv)
 
 if (skip_create) {
 s.target = img_open(tgt_image_opts, out_filename, out_fmt,
-flags, writethrough, quiet, false);
+flags, writethrough, s.quiet, false);
 } else {
 /* TODO ultimately we should allow --target-image-opts
  * to be used even when -n is not given.
@@ -2384,7 +2385,7 @@ static int img_convert(int argc, char **argv)
  * to allow filenames in option syntax
  */
 s.target = img_open_file(out_filename, open_opts, out_fmt,
- flags, writethrough, quiet, false);
+ flags, writethrough, s.quiet, false);
 open_opts = NULL; /* blk_new_open will have freed it */
 }
 if (!s.target) {
-- 
2.20.1




[Qemu-devel] [PATCH v2 4/6] blkdebug: Add "none" event

2019-04-10 Thread Max Reitz
Together with @iotypes and @sector, this can be used to trap e.g. the
first read or write access to a certain sector without having to know
what happens internally in the block layer, i.e. which "real" events
happen right before such an access.

Signed-off-by: Max Reitz 
---
 qapi/block-core.json | 4 +++-
 block/blkdebug.c | 2 ++
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index 5141e91172..717b13f7f5 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -3215,6 +3215,8 @@
 #
 # @cor_write: a write due to copy-on-read (since 2.11)
 #
+# @none: triggers once at creation of the blkdebug node (since 4.1)
+#
 # Since: 2.9
 ##
 { 'enum': 'BlkdebugEvent', 'prefix': 'BLKDBG',
@@ -3233,7 +3235,7 @@
 'pwritev_rmw_tail', 'pwritev_rmw_after_tail', 'pwritev',
 'pwritev_zero', 'pwritev_done', 'empty_image_prepare',
 'l1_shrink_write_table', 'l1_shrink_free_l2_clusters',
-'cor_write'] }
+'cor_write', 'none' ] }
 
 ##
 # @BlkdebugIOType:
diff --git a/block/blkdebug.c b/block/blkdebug.c
index ca84b12e32..69c608be6f 100644
--- a/block/blkdebug.c
+++ b/block/blkdebug.c
@@ -491,6 +491,8 @@ static int blkdebug_open(BlockDriverState *bs, QDict 
*options, int flags,
 goto out;
 }
 
+bdrv_debug_event(bs, BLKDBG_NONE);
+
 ret = 0;
 out:
 if (ret < 0) {
-- 
2.20.1




[Qemu-devel] [PATCH v2 5/6] blkdebug: Inject errors on .bdrv_co_block_status()

2019-04-10 Thread Max Reitz
Signed-off-by: Max Reitz 
Reviewed-by: Eric Blake 
---
 qapi/block-core.json | 5 -
 block/blkdebug.c | 8 
 2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index 717b13f7f5..2aa675a192 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -3252,10 +3252,13 @@
 #
 # @flush: .bdrv_co_flush_to_disk()
 #
+# @block-status: .bdrv_co_block_status()
+#
 # Since: 4.1
 ##
 { 'enum': 'BlkdebugIOType',
-  'data': [ 'read', 'write', 'write-zeroes', 'discard', 'flush' ] }
+  'data': [ 'read', 'write', 'write-zeroes', 'discard', 'flush',
+'block-status' ] }
 
 ##
 # @BlkdebugInjectErrorOptions:
diff --git a/block/blkdebug.c b/block/blkdebug.c
index 69c608be6f..df9c8b1673 100644
--- a/block/blkdebug.c
+++ b/block/blkdebug.c
@@ -670,7 +670,15 @@ static int coroutine_fn 
blkdebug_co_block_status(BlockDriverState *bs,
  int64_t *map,
  BlockDriverState **file)
 {
+int err;
+
 assert(QEMU_IS_ALIGNED(offset | bytes, bs->bl.request_alignment));
+
+err = rule_check(bs, offset, bytes, BLKDEBUGIO_TYPE_BLOCK_STATUS);
+if (err) {
+return err;
+}
+
 return bdrv_co_block_status_from_file(bs, want_zero, offset, bytes,
   pnum, map, file);
 }
-- 
2.20.1




[Qemu-devel] [PATCH v2 2/6] qemu-img: Add salvaging mode to convert

2019-04-10 Thread Max Reitz
This adds a salvaging mode (--salvage) to qemu-img convert which ignores
read errors and treats the respective areas as containing only zeroes.
This can be used for instance to at least partially recover the data
from terminally corrupted qcow2 images.

Signed-off-by: Max Reitz 
---
 qemu-img.c   | 85 
 qemu-img-cmds.hx |  4 +--
 qemu-img.texi|  5 +++
 3 files changed, 71 insertions(+), 23 deletions(-)

diff --git a/qemu-img.c b/qemu-img.c
index c53666aa41..c2216e67a6 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -66,6 +66,7 @@ enum {
 OPTION_SIZE = 264,
 OPTION_PREALLOCATION = 265,
 OPTION_SHRINK = 266,
+OPTION_SALVAGE = 267,
 };
 
 typedef enum OutputFormat {
@@ -1569,6 +1570,7 @@ typedef struct ImgConvertState {
 int64_t target_backing_sectors; /* negative if unknown */
 bool wr_in_order;
 bool copy_range;
+bool salvage;
 bool quiet;
 int min_sparse;
 int alignment;
@@ -1616,25 +1618,44 @@ static int convert_iteration_sectors(ImgConvertState 
*s, int64_t sector_num)
 }
 
 if (s->sector_next_status <= sector_num) {
-int64_t count = n * BDRV_SECTOR_SIZE;
+uint64_t offset = (sector_num - src_cur_offset) * BDRV_SECTOR_SIZE;
+int64_t count;
 
-if (s->target_has_backing) {
+do {
+count = n * BDRV_SECTOR_SIZE;
+
+if (s->target_has_backing) {
+ret = bdrv_block_status(blk_bs(s->src[src_cur]), offset,
+count, , NULL, NULL);
+} else {
+ret = bdrv_block_status_above(blk_bs(s->src[src_cur]), NULL,
+  offset, count, , NULL,
+  NULL);
+}
+
+if (ret < 0) {
+if (s->salvage) {
+if (n == 1) {
+if (!s->quiet) {
+warn_report("error while reading block status at "
+"offset %" PRIu64 ": %s", offset,
+strerror(-ret));
+}
+/* Just try to read the data, then */
+ret = BDRV_BLOCK_DATA;
+count = BDRV_SECTOR_SIZE;
+} else {
+/* Retry on a shorter range */
+n = DIV_ROUND_UP(n, 4);
+}
+} else {
+error_report("error while reading block status at offset "
+ "%" PRIu64 ": %s", offset, strerror(-ret));
+return ret;
+}
+}
+} while (ret < 0);
 
-ret = bdrv_block_status(blk_bs(s->src[src_cur]),
-(sector_num - src_cur_offset) *
-BDRV_SECTOR_SIZE,
-count, , NULL, NULL);
-} else {
-ret = bdrv_block_status_above(blk_bs(s->src[src_cur]), NULL,
-  (sector_num - src_cur_offset) *
-  BDRV_SECTOR_SIZE,
-  count, , NULL, NULL);
-}
-if (ret < 0) {
-error_report("error while reading block status of sector %" PRId64
- ": %s", sector_num, strerror(-ret));
-return ret;
-}
 n = DIV_ROUND_UP(count, BDRV_SECTOR_SIZE);
 
 if (ret & BDRV_BLOCK_ZERO) {
@@ -1671,6 +1692,7 @@ static int convert_iteration_sectors(ImgConvertState *s, 
int64_t sector_num)
 static int coroutine_fn convert_co_read(ImgConvertState *s, int64_t sector_num,
 int nb_sectors, uint8_t *buf)
 {
+uint64_t single_read_until = 0;
 int n, ret;
 QEMUIOVector qiov;
 
@@ -1679,6 +1701,7 @@ static int coroutine_fn convert_co_read(ImgConvertState 
*s, int64_t sector_num,
 BlockBackend *blk;
 int src_cur;
 int64_t bs_sectors, src_cur_offset;
+uint64_t offset;
 
 /* In the case of compression with multiple source files, we can get a
  * nb_sectors that spreads into the next part. So we must be able to
@@ -1687,14 +1710,30 @@ static int coroutine_fn convert_co_read(ImgConvertState 
*s, int64_t sector_num,
 blk = s->src[src_cur];
 bs_sectors = s->src_sectors[src_cur];
 
+offset = (sector_num - src_cur_offset) << BDRV_SECTOR_BITS;
+
 n = MIN(nb_sectors, bs_sectors - (sector_num - src_cur_offset));
+if (single_read_until > offset) {
+n = 1;
+}
 qemu_iovec_init_buf(, buf, n << BDRV_SECTOR_BITS);
 
-ret = blk_co_preadv(
-blk, (sector_num - src_cur_offset) << BDRV_SECTOR_BITS,
-n << BDRV_SECTOR_BITS, , 0);
+ret = 

[Qemu-devel] [PATCH v2 0/6] qemu-img: Add salvaging mode to convert

2019-04-10 Thread Max Reitz
Hi,

This series adds a --salvage option to qemu-img convert.  With this,
qemu-img will not abort when it encounters an I/O error.  Instead, it
tries to narrow it down and will treat the affected sectors as being
completely 0 (and print a warning).

Testing this is not so easy, because while real I/O errors during read
operations should be treated as described above, errors encountered
during bdrv_block_status() should just be ignored and the affected
sectors should be considered allocated.  But blkdebug does not yet have
a way to intercept this, and:

(1) Just adding a new block-status event would be silly, because I don't
want an event, I want it to fail on a certain kind of operation, on
a certain sector range, independently of any events, so why can't we
just do that?  See patch 4.

(2) If we just make blkdebug intercept .bdrv_co_block_status() like all
other kinds of operations, at least iotest 041 fails, which does
exactly that silly thing: It uses the read_aio event to wait for any
read.  But it turns out that there may be a bdrv_*block_status()
call in between, so suddenly the wrong operation yields an error.
As I said, the real fault here is that it does not really make sense
to pray that the operation you want to fail is the one that is
immediately executed after some event that you hope will trigger
that operation.
See patch 3.

So patch 3 allows blkdebug users to select which kind of I/O operation
they actually want to make fail, and patch 4 allows them to not use any
event, but to have a rule active all the time.

Together, we can then enable error injection for block-status in patch 5
and make use of event=none iotype=block-status in patch 6.


v2:
- Patch 2:
  - 2058c2ad261de7f58fae01d63d3d0efa484caf2a has added an error message
when message when *block_status() fails.  Continue using this
message (with the problematic offset instead of the sector index),
and adjust my new messages to fit its style.
  - Context conflict due to c075a0af22442e88226a6cef22da3fafccfb5d7e
- Patch 3: %s/4\.0/4.1/
- Patch 4: %s/4\.0/4.1/
- Patch 6: Amendments necessitated by the changes to the warnings added
   in patch 2


git-backport-diff against v2:

Key:
[] : patches are identical
[] : number of functional differences between upstream/downstream patch
[down] : patch is downstream-only
The flags [FC] indicate (F)unctional and (C)ontextual differences, respectively

001/6:[] [--] 'qemu-img: Move quiet into ImgConvertState'
002/6:[0017] [FC] 'qemu-img: Add salvaging mode to convert'
003/6:[0004] [FC] 'blkdebug: Add @iotype error option'
004/6:[0002] [FC] 'blkdebug: Add "none" event'
005/6:[] [-C] 'blkdebug: Inject errors on .bdrv_co_block_status()'
006/6:[0035] [FC] 'iotests: Test qemu-img convert --salvage'


Max Reitz (6):
  qemu-img: Move quiet into ImgConvertState
  qemu-img: Add salvaging mode to convert
  blkdebug: Add @iotype error option
  blkdebug: Add "none" event
  blkdebug: Inject errors on .bdrv_co_block_status()
  iotests: Test qemu-img convert --salvage

 qapi/block-core.json   |  33 +++-
 block/blkdebug.c   |  60 --
 qemu-img.c |  98 --
 qemu-img-cmds.hx   |   4 +-
 qemu-img.texi  |   5 ++
 tests/qemu-iotests/249 | 163 +
 tests/qemu-iotests/249.out |  43 ++
 tests/qemu-iotests/group   |   1 +
 8 files changed, 368 insertions(+), 39 deletions(-)
 create mode 100755 tests/qemu-iotests/249
 create mode 100644 tests/qemu-iotests/249.out

-- 
2.20.1




Re: [Qemu-devel] [PATCH 4/7] iotests: Update 241 to expose backing layer fragmentation

2019-04-10 Thread Eric Blake
On 4/8/19 8:51 AM, Vladimir Sementsov-Ogievskiy wrote:
> 03.04.2019 6:05, Eric Blake wrote:
>> Previous commits have mentioned that our NBD server still sends
>> unaligned fragments when an active layer with large advertised minimum
>> block size is backed by another layer with a smaller block
>> size. Expand the test to actually cover this scenario, by using qcow2
>> encryption (which forces 512-byte alignment) with an unaligned raw
>> backing file.
>>
>> The test passes, but only because the client side works around the
>> server's non-compliance; if you repeat the test manually with tracing
>> turned on, you will see the server sending a status for 1000 bytes
>> data then 1048 bytes hole, which is not aligned. But reverting commit
>> 737d3f5244 shows that it is indeed the client working around the bug
>> in the server.
>>
>> Signed-off-by: Eric Blake 
> 
> Oops, 241 fails for me:
> 
> -WARNING: Image format was not specified for 
> '/home/eblake/qemu/tests/qemu-iotests/scratch/t.raw' and probing guessed raw.
> +WARNING: Image format was not specified for 
> '/work/src/qemu/eric/tests/qemu-iotests/scratch/t.raw' and probing guessed 
> raw.
> 
> We forget to filter output :(

The filter bug is pre-existing; separate patch for that is now posted.

> 
> Tested-by: Vladimir Sementsov-Ogievskiy 

I saw this email before -rc3, but thought I had only introduced the
problem in this patch which I omitted from my rc3 pull request. Oh well,
the actual break in iotests came earlier; if there's an -rc4 for other
reasons, we'll get the test fixed then.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature


[Qemu-devel] [PATCH for-4.0?] iotest: Fix 241 to run in generic directory

2019-04-10 Thread Eric Blake
Filter the qemu-nbd server output to get rid of a direct reference
to my build directory.

Fixes: e9dce9cb
Reported-by: Max Reitz 
Signed-off-by: Eric Blake 
---

Not worth -rc4 on its own, but if something else pops up that requires
another spin, I plan on a pull request for this one. Otherwise it
slips to 4.1, and 4.0 just has a broken iotest.

 tests/qemu-iotests/241 | 4 +++-
 tests/qemu-iotests/241.out | 6 +++---
 2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/tests/qemu-iotests/241 b/tests/qemu-iotests/241
index 4b196857387..017a736aaba 100755
--- a/tests/qemu-iotests/241
+++ b/tests/qemu-iotests/241
@@ -28,6 +28,7 @@ nbd_unix_socket=$TEST_DIR/test_qemu_nbd_socket
 _cleanup()
 {
 _cleanup_test_img
+rm -f "$TEST_DIR/server.log"
 nbd_server_stop
 }
 trap "_cleanup; exit \$status" 0 1 2 3 15
@@ -69,12 +70,13 @@ echo

 # Intentionally omit '-f' to force image probing, which in turn forces
 # sector alignment, here at the server.
-nbd_server_start_unix_socket "$TEST_IMG_FILE"
+nbd_server_start_unix_socket "$TEST_IMG_FILE" 2> "$TEST_DIR/server.log"

 $QEMU_NBD_PROG --list -k $nbd_unix_socket | grep '\(size\|min\)'
 $QEMU_IMG map -f raw --output=json "$TEST_IMG" | _filter_qemu_img_map
 $QEMU_IO -f raw -c map "$TEST_IMG"
 nbd_server_stop
+cat "$TEST_DIR/server.log" | _filter_testdir

 echo
 echo "=== Exporting unaligned raw image, forced client sector alignment ==="
diff --git a/tests/qemu-iotests/241.out b/tests/qemu-iotests/241.out
index f481074a02e..75f9f465e52 100644
--- a/tests/qemu-iotests/241.out
+++ b/tests/qemu-iotests/241.out
@@ -10,13 +10,13 @@ QA output created by 241

 === Exporting unaligned raw image, forced server sector alignment ===

-WARNING: Image format was not specified for 
'/home/eblake/qemu/tests/qemu-iotests/scratch/t.raw' and probing guessed raw.
- Automatically detecting the format is dangerous for raw images, write 
operations on block 0 will be restricted.
- Specify the 'raw' format explicitly to remove the restrictions.
   size:  1024
   min block: 512
 [{ "start": 0, "length": 1024, "depth": 0, "zero": false, "data": true, 
"offset": OFFSET}]
 1 KiB (0x400) bytes allocated at offset 0 bytes (0x0)
+WARNING: Image format was not specified for 'TEST_DIR/t.raw' and probing 
guessed raw.
+ Automatically detecting the format is dangerous for raw images, write 
operations on block 0 will be restricted.
+ Specify the 'raw' format explicitly to remove the restrictions.

 === Exporting unaligned raw image, forced client sector alignment ===

-- 
2.20.1




[Qemu-devel] [PATCH v4 08/11] iotests: Add filter commit test cases

2019-04-10 Thread Max Reitz
This patch adds some tests on how commit copes with filter nodes.

Signed-off-by: Max Reitz 
---
 tests/qemu-iotests/040 | 130 +
 tests/qemu-iotests/040.out |   4 +-
 2 files changed, 132 insertions(+), 2 deletions(-)

diff --git a/tests/qemu-iotests/040 b/tests/qemu-iotests/040
index b81133a474..dc3fe57fbd 100755
--- a/tests/qemu-iotests/040
+++ b/tests/qemu-iotests/040
@@ -394,5 +394,135 @@ class TestReopenOverlay(ImageCommitTestCase):
 def test_reopen_overlay(self):
 self.run_commit_test(self.img1, self.img0)
 
+class TestCommitWithFilters(iotests.QMPTestCase):
+img0 = os.path.join(iotests.test_dir, '0.img')
+img1 = os.path.join(iotests.test_dir, '1.img')
+img2 = os.path.join(iotests.test_dir, '2.img')
+img3 = os.path.join(iotests.test_dir, '3.img')
+
+def setUp(self):
+qemu_img('create', '-f', iotests.imgfmt, self.img0, '1M')
+qemu_img('create', '-f', iotests.imgfmt, self.img1, '1M')
+qemu_img('create', '-f', iotests.imgfmt, self.img2, '1M')
+qemu_img('create', '-f', iotests.imgfmt, self.img3, '1M')
+
+self.vm = iotests.VM()
+self.vm.launch()
+result = self.vm.qmp('object-add', qom_type='throttle-group', id='tg')
+self.assert_qmp(result, 'return', {})
+
+result = self.vm.qmp('blockdev-add', **{
+'node-name': 'top-filter',
+'driver': 'throttle',
+'throttle-group': 'tg',
+'file': {
+'node-name': 'cow-3',
+'driver': iotests.imgfmt,
+'file': {
+'driver': 'file',
+'filename': self.img3
+},
+'backing': {
+'node-name': 'cow-2',
+'driver': iotests.imgfmt,
+'file': {
+'driver': 'file',
+'filename': self.img2
+},
+'backing': {
+'node-name': 'cow-1',
+'driver': iotests.imgfmt,
+'file': {
+'driver': 'file',
+'filename': self.img1
+},
+'backing': {
+'node-name': 'bottom-filter',
+'driver': 'throttle',
+'throttle-group': 'tg',
+'file': {
+'node-name': 'cow-0',
+'driver': iotests.imgfmt,
+'file': {
+'driver': 'file',
+'filename': self.img0
+}
+}
+}
+}
+}
+}
+})
+self.assert_qmp(result, 'return', {})
+
+def tearDown(self):
+self.vm.shutdown()
+os.remove(self.img3)
+os.remove(self.img2)
+os.remove(self.img1)
+os.remove(self.img0)
+
+# Filters make for funny filenames, so we cannot just use
+# self.imgX for the block-commit parameters
+def get_filename(self, node):
+return self.vm.node_info(node)['image']['filename']
+
+def test_filterless_commit(self):
+self.assert_no_active_block_jobs()
+result = self.vm.qmp('block-commit',
+ job_id='commit',
+ device='top-filter',
+ top=self.get_filename('cow-2'),
+ base=self.get_filename('cow-1'))
+self.assert_qmp(result, 'return', {})
+self.wait_until_completed(drive='commit')
+
+def test_commit_through_filter(self):
+self.assert_no_active_block_jobs()
+result = self.vm.qmp('block-commit',
+ job_id='commit',
+ device='top-filter',
+ top=self.get_filename('cow-1'),
+ base=self.get_filename('cow-0'))
+# Cannot commit through explicitly added filters (yet,
+# although in the future we probably want to make users use
+# blockdev-copy for this)
+self.assert_qmp(result, 'error/class', 'GenericError')
+self.assert_qmp(result, 'error/desc', 'Cannot commit through explicit 
filter nodes')
+
+def test_filtered_active_commit_with_filter(self):
+self.assert_no_active_block_jobs()
+result = self.vm.qmp('block-commit',
+ job_id='commit',
+ device='top-filter',
+ base=self.get_filename('cow-2'))
+# Not 

[Qemu-devel] [PATCH v4 04/11] block: Inline bdrv_co_block_status_from_*()

2019-04-10 Thread Max Reitz
With bdrv_filtered_rw_bs(), we can easily handle this default filter
behavior in bdrv_co_block_status().

blkdebug wants to have an additional assertion, so it keeps its own
implementation, except bdrv_co_block_status_from_file() needs to be
inlined there.

Suggested-by: Eric Blake 
Signed-off-by: Max Reitz 
---
 include/block/block_int.h | 22 -
 block/blkdebug.c  |  7 --
 block/blklogwrites.c  |  1 -
 block/commit.c|  1 -
 block/copy-on-read.c  |  2 --
 block/io.c| 51 +--
 block/mirror.c|  1 -
 block/throttle.c  |  1 -
 8 files changed, 22 insertions(+), 64 deletions(-)

diff --git a/include/block/block_int.h b/include/block/block_int.h
index d0309e6307..76c7c0a111 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -1187,28 +1187,6 @@ void bdrv_format_default_perms(BlockDriverState *bs, 
BdrvChild *c,
uint64_t perm, uint64_t shared,
uint64_t *nperm, uint64_t *nshared);
 
-/*
- * Default implementation for drivers to pass bdrv_co_block_status() to
- * their file.
- */
-int coroutine_fn bdrv_co_block_status_from_file(BlockDriverState *bs,
-bool want_zero,
-int64_t offset,
-int64_t bytes,
-int64_t *pnum,
-int64_t *map,
-BlockDriverState **file);
-/*
- * Default implementation for drivers to pass bdrv_co_block_status() to
- * their backing file.
- */
-int coroutine_fn bdrv_co_block_status_from_backing(BlockDriverState *bs,
-   bool want_zero,
-   int64_t offset,
-   int64_t bytes,
-   int64_t *pnum,
-   int64_t *map,
-   BlockDriverState **file);
 const char *bdrv_get_parent_name(const BlockDriverState *bs);
 void blk_dev_change_media_cb(BlockBackend *blk, bool load, Error **errp);
 bool blk_dev_has_removable_media(BlockBackend *blk);
diff --git a/block/blkdebug.c b/block/blkdebug.c
index efd9441625..7950ae729c 100644
--- a/block/blkdebug.c
+++ b/block/blkdebug.c
@@ -637,8 +637,11 @@ static int coroutine_fn 
blkdebug_co_block_status(BlockDriverState *bs,
  BlockDriverState **file)
 {
 assert(QEMU_IS_ALIGNED(offset | bytes, bs->bl.request_alignment));
-return bdrv_co_block_status_from_file(bs, want_zero, offset, bytes,
-  pnum, map, file);
+assert(bs->file && bs->file->bs);
+*pnum = bytes;
+*map = offset;
+*file = bs->file->bs;
+return BDRV_BLOCK_RAW | BDRV_BLOCK_OFFSET_VALID;
 }
 
 static void blkdebug_close(BlockDriverState *bs)
diff --git a/block/blklogwrites.c b/block/blklogwrites.c
index eb2b4901a5..1eb4a5c613 100644
--- a/block/blklogwrites.c
+++ b/block/blklogwrites.c
@@ -518,7 +518,6 @@ static BlockDriver bdrv_blk_log_writes = {
 .bdrv_co_pwrite_zeroes  = blk_log_writes_co_pwrite_zeroes,
 .bdrv_co_flush_to_disk  = blk_log_writes_co_flush_to_disk,
 .bdrv_co_pdiscard   = blk_log_writes_co_pdiscard,
-.bdrv_co_block_status   = bdrv_co_block_status_from_file,
 
 .is_filter  = true,
 .strong_runtime_opts= blk_log_writes_strong_runtime_opts,
diff --git a/block/commit.c b/block/commit.c
index 252007fd57..c366ee9655 100644
--- a/block/commit.c
+++ b/block/commit.c
@@ -254,7 +254,6 @@ static void bdrv_commit_top_child_perm(BlockDriverState 
*bs, BdrvChild *c,
 static BlockDriver bdrv_commit_top = {
 .format_name= "commit_top",
 .bdrv_co_preadv = bdrv_commit_top_preadv,
-.bdrv_co_block_status   = bdrv_co_block_status_from_backing,
 .bdrv_refresh_filename  = bdrv_commit_top_refresh_filename,
 .bdrv_child_perm= bdrv_commit_top_child_perm,
 
diff --git a/block/copy-on-read.c b/block/copy-on-read.c
index 53972b1da3..fe9260163c 100644
--- a/block/copy-on-read.c
+++ b/block/copy-on-read.c
@@ -150,8 +150,6 @@ static BlockDriver bdrv_copy_on_read = {
 .bdrv_eject = cor_eject,
 .bdrv_lock_medium   = cor_lock_medium,
 
-.bdrv_co_block_status   = bdrv_co_block_status_from_file,
-
 .bdrv_recurse_is_first_non_filter   = cor_recurse_is_first_non_filter,
 
 .has_variable_length= true,
diff --git a/block/io.c b/block/io.c
index 5c33ecc080..8d124bae5c 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1993,36 +1993,6 @@ typedef struct BdrvCoBlockStatusData 

[Qemu-devel] [PATCH v4 02/11] block: Filtered children access functions

2019-04-10 Thread Max Reitz
What bs->file and bs->backing mean depends on the node.  For filter
nodes, both signify a node that will eventually receive all R/W
accesses.  For format nodes, bs->file contains metadata and data, and
bs->backing will not receive writes -- instead, writes are COWed to
bs->file.  Usually.

In any case, it is not trivial to guess what a child means exactly with
our currently limited form of expression.  It is better to introduce
some functions that actually guarantee a meaning:

- bdrv_filtered_cow_child() will return the child that receives requests
  filtered through COW.  That is, reads may or may not be forwarded
  (depending on the overlay's allocation status), but writes never go to
  this child.

- bdrv_filtered_rw_child() will return the child that receives requests
  filtered through some very plain process.  Reads and writes issued to
  the parent will go to the child as well (although timing, etc. may be
  modified).

- All drivers but quorum (but quorum is pretty opaque to the general
  block layer anyway) always only have one of these children: All read
  requests must be served from the filtered_rw_child (if it exists), so
  if there was a filtered_cow_child in addition, it would not receive
  any requests at all.
  (The closest here is mirror, where all requests are passed on to the
  source, but with write-blocking, write requests are "COWed" to the
  target.  But that just means that the target is a special child that
  cannot be introspected by the generic block layer functions, and that
  source is a filtered_rw_child.)
  Therefore, we can also add bdrv_filtered_child() which returns that
  one child (or NULL, if there is no filtered child).

Also, many places in the current block layer should be skipping filters
(all filters or just the ones added implicitly, it depends) when going
through a block node chain.  They do not do that currently, but this
patch makes them.

One example for this is qemu-img map, which should skip filters and only
look at the COW elements in the graph.  The change to iotest 204's
reference output shows how using blkdebug on top of a COW node used to
make qemu-img map disregard the rest of the backing chain, but with this
patch, the allocation in the base image is reported correctly.

Furthermore, a note should be made that sometimes we do want to access
bs->backing directly.  This is whenever the operation in question is not
about accessing the COW child, but the "backing" child, be it COW or
not.  This is the case in functions such as bdrv_open_backing_file() or
whenever we have to deal with the special behavior of @backing as a
blockdev option, which is that it does not default to null like all
other child references do.

Finally, the query functions (query-block and query-named-block-nodes)
are modified to return any filtered child under "backing", not just
bs->backing or COW children.  This is so that filters do not interrupt
the reported backing chain.  This changes the output of iotest 184, as
the throttled node now appears as a backing child.

Signed-off-by: Max Reitz 
---
 qapi/block-core.json   |   4 +
 include/block/block.h  |   1 +
 include/block/block_int.h  |  40 +--
 block.c| 210 +++--
 block/backup.c |   8 +-
 block/block-backend.c  |  16 ++-
 block/commit.c |  33 +++---
 block/io.c |  45 ---
 block/mirror.c |  21 ++--
 block/qapi.c   |  30 +++--
 block/stream.c |  13 +-
 blockdev.c |  88 +++---
 migration/block-dirty-bitmap.c |   4 +-
 nbd/server.c   |   6 +-
 qemu-img.c |  29 ++---
 tests/qemu-iotests/184.out |   7 +-
 tests/qemu-iotests/204.out |   1 +
 17 files changed, 411 insertions(+), 145 deletions(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index 7ccbfff9d0..dbd9286e4a 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -2502,6 +2502,10 @@
 # On successful completion the image file is updated to drop the backing file
 # and the BLOCK_JOB_COMPLETED event is emitted.
 #
+# In case @device is a filter node, block-stream modifies the first non-filter
+# overlay node below it to point to base's backing node (or NULL if @base was
+# not specified) instead of modifying @device itself.
+#
 # @job-id: identifier for the newly-created block job. If
 #  omitted, the device name will be used. (Since 2.7)
 #
diff --git a/include/block/block.h b/include/block/block.h
index c7a26199aa..2005664f14 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -467,6 +467,7 @@ BlockDriverState *bdrv_lookup_bs(const char *device,
  const char *node_name,
  Error **errp);
 bool bdrv_chain_contains(BlockDriverState *top, BlockDriverState *base);
+bool bdrv_legacy_chain_contains(BlockDriverState *top, 

[Qemu-devel] [PATCH v4 11/11] iotests: Test committing to overridden backing

2019-04-10 Thread Max Reitz
Signed-off-by: Max Reitz 
---
 tests/qemu-iotests/040 | 61 ++
 tests/qemu-iotests/040.out |  4 +--
 2 files changed, 63 insertions(+), 2 deletions(-)

diff --git a/tests/qemu-iotests/040 b/tests/qemu-iotests/040
index dc3fe57fbd..155c1d8c23 100755
--- a/tests/qemu-iotests/040
+++ b/tests/qemu-iotests/040
@@ -524,5 +524,66 @@ class TestCommitWithFilters(iotests.QMPTestCase):
 self.assert_qmp(result, 'error/desc',
 'Top image file %s not found' % cow3_name)
 
+class TestCommitWithOverriddenBacking(iotests.QMPTestCase):
+img_base_a = os.path.join(iotests.test_dir, 'base_a.img')
+img_base_b = os.path.join(iotests.test_dir, 'base_b.img')
+img_top = os.path.join(iotests.test_dir, 'top.img')
+
+def setUp(self):
+qemu_img('create', '-f', iotests.imgfmt, self.img_base_a, '1M')
+qemu_img('create', '-f', iotests.imgfmt, self.img_base_b, '1M')
+qemu_img('create', '-f', iotests.imgfmt, '-b', self.img_base_a, \
+ self.img_top)
+
+self.vm = iotests.VM()
+self.vm.launch()
+
+# Use base_b instead of base_a as the backing of top
+result = self.vm.qmp('blockdev-add', **{
+'node-name': 'top',
+'driver': iotests.imgfmt,
+'file': {
+'driver': 'file',
+'filename': self.img_top
+},
+'backing': {
+'node-name': 'base',
+'driver': iotests.imgfmt,
+'file': {
+'driver': 'file',
+'filename': self.img_base_b
+}
+}
+})
+self.assert_qmp(result, 'return', {})
+
+def tearDown(self):
+self.vm.shutdown()
+os.remove(self.img_top)
+os.remove(self.img_base_a)
+os.remove(self.img_base_b)
+
+def test_commit_to_a(self):
+# Try committing to base_a (which should fail, as top's
+# backing image is base_b instead)
+result = self.vm.qmp('block-commit',
+ job_id='commit',
+ device='top',
+ base=self.img_base_a)
+self.assert_qmp(result, 'error/class', 'GenericError')
+
+def test_commit_to_b(self):
+# Try committing to base_b (which should work, since that is
+# actually top's backing image)
+result = self.vm.qmp('block-commit',
+ job_id='commit',
+ device='top',
+ base=self.img_base_b)
+self.assert_qmp(result, 'return', {})
+
+self.vm.event_wait('BLOCK_JOB_READY')
+self.vm.qmp('block-job-complete', device='commit')
+self.vm.event_wait('BLOCK_JOB_COMPLETED')
+
 if __name__ == '__main__':
 iotests.main(supported_fmts=['qcow2', 'qed'])
diff --git a/tests/qemu-iotests/040.out b/tests/qemu-iotests/040.out
index 220a5fa82c..bbcbc202a0 100644
--- a/tests/qemu-iotests/040.out
+++ b/tests/qemu-iotests/040.out
@@ -1,5 +1,5 @@
-...
+.
 --
-Ran 47 tests
+Ran 49 tests
 
 OK
-- 
2.20.1




[Qemu-devel] [PATCH v4 10/11] iotests: Add test for commit in sub directory

2019-04-10 Thread Max Reitz
Add a test for committing an overlay in a sub directory to one of the
images in its backing chain, using both relative and absolute filenames.

Signed-off-by: Max Reitz 
---
 tests/qemu-iotests/020 | 36 
 tests/qemu-iotests/020.out | 10 ++
 2 files changed, 46 insertions(+)

diff --git a/tests/qemu-iotests/020 b/tests/qemu-iotests/020
index 71fa753b4e..cfcbc0cf45 100755
--- a/tests/qemu-iotests/020
+++ b/tests/qemu-iotests/020
@@ -31,6 +31,11 @@ _cleanup()
_cleanup_test_img
 rm -f "$TEST_IMG.base"
 rm -f "$TEST_IMG.orig"
+
+rm -f "$TEST_DIR/subdir/t.$IMGFMT.base"
+rm -f "$TEST_DIR/subdir/t.$IMGFMT.mid"
+rm -f "$TEST_DIR/subdir/t.$IMGFMT"
+rmdir "$TEST_DIR/subdir" &> /dev/null
 }
 trap "_cleanup; exit \$status" 0 1 2 3 15
 
@@ -134,6 +139,37 @@ $QEMU_IO -c 'writev 0 64k' "$TEST_IMG" | _filter_qemu_io
 $QEMU_IMG commit "$TEST_IMG"
 _cleanup
 
+
+echo
+echo 'Testing commit in sub-directory with relative filenames'
+echo
+
+pushd "$TEST_DIR" > /dev/null
+
+mkdir subdir
+
+TEST_IMG="subdir/t.$IMGFMT.base" _make_test_img 1M
+TEST_IMG="subdir/t.$IMGFMT.mid" _make_test_img -b "t.$IMGFMT.base"
+TEST_IMG="subdir/t.$IMGFMT" _make_test_img -b "t.$IMGFMT.mid"
+
+# Should work
+$QEMU_IMG commit -b "t.$IMGFMT.mid" "subdir/t.$IMGFMT"
+
+# Might theoretically work, but does not in practice (we have to
+# decide between this and the above; and since we always represent
+# backing file names as relative to the overlay, we go for the above)
+$QEMU_IMG commit -b "subdir/t.$IMGFMT.mid" "subdir/t.$IMGFMT" 2>&1 | \
+_filter_imgfmt
+
+# This should work as well
+$QEMU_IMG commit -b "$TEST_DIR/subdir/t.$IMGFMT.mid" "subdir/t.$IMGFMT"
+
+popd > /dev/null
+
+# Now let's try with just absolute filenames
+$QEMU_IMG commit -b "$TEST_DIR/subdir/t.$IMGFMT.mid" \
+"$TEST_DIR/subdir/t.$IMGFMT"
+
 # success, all done
 echo "*** done"
 rm -f $seq.full
diff --git a/tests/qemu-iotests/020.out b/tests/qemu-iotests/020.out
index 4b722b2dd0..228c37dded 100644
--- a/tests/qemu-iotests/020.out
+++ b/tests/qemu-iotests/020.out
@@ -1094,4 +1094,14 @@ Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1048576 
backing_file=json:{'driv
 wrote 65536/65536 bytes at offset 0
 64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 qemu-img: Block job failed: No space left on device
+
+Testing commit in sub-directory with relative filenames
+
+Formatting 'subdir/t.IMGFMT.base', fmt=IMGFMT size=1048576
+Formatting 'subdir/t.IMGFMT.mid', fmt=IMGFMT size=1048576 
backing_file=t.IMGFMT.base
+Formatting 'subdir/t.IMGFMT', fmt=IMGFMT size=1048576 backing_file=t.IMGFMT.mid
+Image committed.
+qemu-img: Did not find 'subdir/t.IMGFMT.mid' in the backing chain of 
'subdir/t.IMGFMT'
+Image committed.
+Image committed.
 *** done
-- 
2.20.1




[Qemu-devel] [PATCH v4 09/11] iotests: Add filter mirror test cases

2019-04-10 Thread Max Reitz
This patch adds some test cases how mirroring relates to filters.  One
of them tests what happens when you mirror off a filtered COW node, two
others use the mirror filter node as basically our only example of an
implicitly created filter node so far (besides the commit filter).

Signed-off-by: Max Reitz 
---
 tests/qemu-iotests/041 | 146 -
 tests/qemu-iotests/041.out |   4 +-
 2 files changed, 147 insertions(+), 3 deletions(-)

diff --git a/tests/qemu-iotests/041 b/tests/qemu-iotests/041
index 0c1432f189..c2b5299f62 100755
--- a/tests/qemu-iotests/041
+++ b/tests/qemu-iotests/041
@@ -20,8 +20,9 @@
 
 import time
 import os
+import json
 import iotests
-from iotests import qemu_img, qemu_io
+from iotests import qemu_img, qemu_img_pipe, qemu_io
 
 backing_img = os.path.join(iotests.test_dir, 'backing.img')
 target_backing_img = os.path.join(iotests.test_dir, 'target-backing.img')
@@ -1191,5 +1192,148 @@ class TestReplaces(iotests.QMPTestCase):
 os.remove(test_img)
 os.remove(target_img)
 
+# Tests for mirror with filters (and how the mirror filter behaves, as
+# an example for an implicit filter)
+class TestFilters(iotests.QMPTestCase):
+def setUp(self):
+qemu_img('create', '-f', iotests.imgfmt, backing_img, '1M')
+qemu_img('create', '-f', iotests.imgfmt, '-b', backing_img, test_img)
+qemu_img('create', '-f', iotests.imgfmt, '-b', backing_img, target_img)
+
+qemu_io('-c', 'write -P 1 0 512k', backing_img)
+qemu_io('-c', 'write -P 2 512k 512k', test_img)
+
+self.vm = iotests.VM()
+self.vm.launch()
+
+result = self.vm.qmp('blockdev-add', **{
+'node-name': 'target',
+'driver': iotests.imgfmt,
+'file': {
+'driver': 'file',
+'filename': target_img
+},
+'backing': None
+})
+self.assert_qmp(result, 'return', {})
+
+self.filterless_chain = {
+'node-name': 'source',
+'driver': iotests.imgfmt,
+'file': {
+'driver': 'file',
+'filename': test_img
+},
+'backing': {
+'node-name': 'backing',
+'driver': iotests.imgfmt,
+'file': {
+'driver': 'file',
+'filename': backing_img
+}
+}
+}
+
+def tearDown(self):
+self.vm.shutdown()
+
+os.remove(test_img)
+os.remove(target_img)
+os.remove(backing_img)
+
+def test_cor(self):
+result = self.vm.qmp('blockdev-add', **{
+'node-name': 'filter',
+'driver': 'copy-on-read',
+'file': self.filterless_chain
+})
+self.assert_qmp(result, 'return', {})
+
+result = self.vm.qmp('blockdev-mirror',
+ job_id='mirror',
+ device='filter',
+ target='target',
+ sync='top')
+self.assert_qmp(result, 'return', {})
+
+self.complete_and_wait('mirror')
+
+self.vm.qmp('blockdev-del', node_name='target')
+
+target_map = qemu_img_pipe('map', '--output=json', target_img)
+target_map = json.loads(target_map)
+
+assert target_map[0]['start'] == 0
+assert target_map[0]['length'] == 512 * 1024
+assert target_map[0]['depth'] == 1
+
+assert target_map[1]['start'] == 512 * 1024
+assert target_map[1]['length'] == 512 * 1024
+assert target_map[1]['depth'] == 0
+
+def test_implicit_mirror_filter(self):
+result = self.vm.qmp('blockdev-add', **self.filterless_chain)
+self.assert_qmp(result, 'return', {})
+
+# We need this so we can query from above the mirror node
+result = self.vm.qmp('device_add',
+ driver='virtio-blk',
+ id='virtio',
+ bus='pci.0',
+ drive='source')
+self.assert_qmp(result, 'return', {})
+
+result = self.vm.qmp('blockdev-mirror',
+ job_id='mirror',
+ device='source',
+ target='target',
+ sync='top')
+self.assert_qmp(result, 'return', {})
+
+# The mirror filter is now an implicit node, so it should be
+# invisible when querying the backing chain
+device_info = self.vm.qmp('query-block')['return'][0]
+assert device_info['qdev'] == 

[Qemu-devel] [PATCH v4 06/11] iotests: Add tests for mirror @replaces loops

2019-04-10 Thread Max Reitz
This adds two tests for cases where our old check_to_replace_node()
function failed to detect that executing this job with these parameters
would result in a cyclic graph.

Signed-off-by: Max Reitz 
Reviewed-by: Eric Blake 
---
 tests/qemu-iotests/041 | 124 +
 tests/qemu-iotests/041.out |   4 +-
 2 files changed, 126 insertions(+), 2 deletions(-)

diff --git a/tests/qemu-iotests/041 b/tests/qemu-iotests/041
index 26bf1701eb..0c1432f189 100755
--- a/tests/qemu-iotests/041
+++ b/tests/qemu-iotests/041
@@ -1067,5 +1067,129 @@ class TestOrphanedSource(iotests.QMPTestCase):
  target='dest-ro')
 self.assert_qmp(result, 'error/class', 'GenericError')
 
+# Various tests for the @replaces option (independent of quorum)
+class TestReplaces(iotests.QMPTestCase):
+def setUp(self):
+self.vm = iotests.VM()
+self.vm.launch()
+
+def tearDown(self):
+self.vm.shutdown()
+
+def test_drive_mirror_loop(self):
+qemu_img('create', '-f', iotests.imgfmt, test_img, '1M')
+
+result = self.vm.qmp('object-add', qom_type='throttle-group', id='tg')
+self.assert_qmp(result, 'return', {})
+
+result = self.vm.qmp('blockdev-add', **{
+'node-name': 'source',
+'driver': 'throttle',
+'throttle-group': 'tg',
+'file': {
+'node-name': 'filtered',
+'driver': iotests.imgfmt,
+'file': {
+'driver': 'file',
+'filename': test_img
+}
+}
+})
+self.assert_qmp(result, 'return', {})
+
+# Mirror from @source to @target in sync=none, so that @source
+# will be @target's backing file; but replace @filtered.
+# Then, @target's backing file will be @source, whose backing
+# file is now @target instead of @filtered.  That is a loop.
+# (But apart from the loop, replacing @filtered instead of
+# @source is fine, because both are just filtered versions of
+# each other.)
+result = self.vm.qmp('drive-mirror',
+ job_id='mirror',
+ device='source',
+ target=target_img,
+ format=iotests.imgfmt,
+ node_name='target',
+ sync='none',
+ replaces='filtered')
+if 'error' in result:
+# This is the correct result
+self.assert_qmp(result, 'error/class', 'GenericError')
+else:
+# This is wrong, but let's run it to the bitter conclusion
+self.complete_and_wait(drive='mirror')
+# Fail for good measure, although qemu should have crashed
+# anyway
+self.fail('Loop creation was successful')
+
+os.remove(test_img)
+try:
+os.remove(target_img)
+except OSError:
+pass
+
+def test_blockdev_mirror_loop(self):
+qemu_img('create', '-f', iotests.imgfmt, test_img, '1M')
+qemu_img('create', '-f', iotests.imgfmt, target_img, '1M')
+
+result = self.vm.qmp('object-add', qom_type='throttle-group', id='tg')
+self.assert_qmp(result, 'return', {})
+
+result = self.vm.qmp('blockdev-add', **{
+'node-name': 'source',
+'driver': 'throttle',
+'throttle-group': 'tg',
+'file': {
+'node-name': 'middle',
+'driver': 'throttle',
+'throttle-group': 'tg',
+'file': {
+'node-name': 'bottom',
+'driver': iotests.imgfmt,
+'file': {
+'driver': 'file',
+'filename': test_img
+}
+}
+}
+})
+self.assert_qmp(result, 'return', {})
+
+result = self.vm.qmp('blockdev-add', **{
+'node-name': 'target',
+'driver': iotests.imgfmt,
+'file': {
+'driver': 'file',
+'filename': target_img
+},
+'backing': 'middle'
+})
+
+# Mirror from @source to @target.  With blockdev-mirror, the
+# current (old) backing file is retained (which is @middle).
+# By replacing @bottom, @middle's file will be @target, whose
+# backing file is @middle again.  That is a loop.
+# (But apart from the loop, replacing @bottom instead of
+# @source is fine, because both are just 

[Qemu-devel] [PATCH v4 07/11] block: Leave BDS.backing_file constant

2019-04-10 Thread Max Reitz
Parts of the block layer treat BDS.backing_file as if it were whatever
the image header says (i.e., if it is a relative path, it is relative to
the overlay), other parts treat it like a cache for
bs->backing->bs->filename (relative paths are relative to the CWD).
Considering bs->backing->bs->filename exists, let us make it mean the
former.

Among other things, this now allows the user to specify a base when
using qemu-img to commit an image file in a directory that is not the
CWD (assuming, everything uses relative filenames).

Before this patch:

$ ./qemu-img create -f qcow2 foo/bot.qcow2 1M
$ ./qemu-img create -f qcow2 -b bot.qcow2 foo/mid.qcow2
$ ./qemu-img create -f qcow2 -b mid.qcow2 foo/top.qcow2
$ ./qemu-img commit -b mid.qcow2 foo/top.qcow2
qemu-img: Did not find 'mid.qcow2' in the backing chain of 'foo/top.qcow2'
$ ./qemu-img commit -b foo/mid.qcow2 foo/top.qcow2
qemu-img: Did not find 'foo/mid.qcow2' in the backing chain of 'foo/top.qcow2'
$ ./qemu-img commit -b $PWD/foo/mid.qcow2 foo/top.qcow2
qemu-img: Did not find '[...]/foo/mid.qcow2' in the backing chain of 
'foo/top.qcow2'

After this patch:

$ ./qemu-img commit -b mid.qcow2 foo/top.qcow2
Image committed.
$ ./qemu-img commit -b foo/mid.qcow2 foo/top.qcow2
qemu-img: Did not find 'foo/mid.qcow2' in the backing chain of 'foo/top.qcow2'
$ ./qemu-img commit -b $PWD/foo/mid.qcow2 foo/top.qcow2
Image committed.

With this change, bdrv_find_backing_image() must look at whether the
user has overridden a BDS's backing file.  If so, it can no longer use
bs->backing_file, but must instead compare the given filename against
the backing node's filename directly.

Note that this changes the QAPI output for a node's backing_file.  We
had very inconsistent output there (sometimes what the image header
said, sometimes the actual filename of the backing image).  This
inconsistent output was effectively useless, so we have to decide one
way or the other.  Considering that bs->backing_file usually at runtime
contained the path to the image relative to qemu's CWD (or absolute),
this patch changes QAPI's backing_file to always report the
bs->backing->bs->filename from now on.  If you want to receive the image
header information, you have to refer to full-backing-filename.

This necessitates a change to iotest 228.  The interesting information
it really wanted is the image header, and it can get that now, but it
has to use full-backing-filename instead of backing_file.  Because of
this patch's changes to bs->backing_file's behavior, we also need some
reference output changes.

Along with the changes to bs->backing_file, stop updating
BDS.backing_format in bdrv_backing_attach() as well.  This necessitates
a change to the reference output of iotest 191.

iotest 245 changes in behavior: With the backing node no longer
overriding the parent node's backing_file string, you can now omit the
@backing option when reopening a node with neither a default nor a
current backing file even if it used to have a backing node at some
point.

Signed-off-by: Max Reitz 
---
 include/block/block_int.h  | 19 ++-
 block.c| 35 ---
 block/qapi.c   |  7 ---
 qemu-img.c | 12 ++--
 tests/qemu-iotests/191.out |  1 -
 tests/qemu-iotests/228 |  6 +++---
 tests/qemu-iotests/228.out |  6 +++---
 tests/qemu-iotests/245 |  4 +++-
 8 files changed, 65 insertions(+), 25 deletions(-)

diff --git a/include/block/block_int.h b/include/block/block_int.h
index 76c7c0a111..69524bc712 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -760,11 +760,20 @@ struct BlockDriverState {
 bool walking_aio_notifiers; /* to make removal during iteration safe */
 
 char filename[PATH_MAX];
-char backing_file[PATH_MAX]; /* if non zero, the image is a diff of
-this file image */
-/* The backing filename indicated by the image header; if we ever
- * open this file, then this is replaced by the resulting BDS's
- * filename (i.e. after a bdrv_refresh_filename() run). */
+/*
+ * If not empty, this image is a diff in relation to backing_file.
+ * Note that this is the name given in the image header and
+ * therefore may or may not be equal to .backing->bs->filename.
+ * If this field contains a relative path, it is to be resolved
+ * relatively to the overlay's location.
+ */
+char backing_file[PATH_MAX];
+/*
+ * The backing filename indicated by the image header.  Contrary
+ * to backing_file, if we ever open this file, auto_backing_file
+ * is replaced by the resulting BDS's filename (i.e. after a
+ * bdrv_refresh_filename() run).
+ */
 char auto_backing_file[PATH_MAX];
 char backing_format[16]; /* if non-zero and backing_file exists */
 
diff --git a/block.c b/block.c
index 820244f52e..a4c2dda039 100644
--- a/block.c
+++ b/block.c
@@ -78,6 +78,8 @@ static 

[Qemu-devel] [PATCH v4 05/11] block: Fix check_to_replace_node()

2019-04-10 Thread Max Reitz
Currently, check_to_replace_node() only allows mirror to replace a node
in the chain of the source node, and only if it is the first non-filter
node below the source.  Well, technically, the idea is that you can
exactly replace a quorum child by mirroring from quorum.

This has (probably) two reasons:
(1) We do not want to create loops.
(2) @replaces and @device should have exactly the same content so
replacing them does not cause visible data to change.

This has two issues:
(1) It is overly restrictive.  It is completely fine for @replaces to be
a filter.
(2) It is not restrictive enough.  You can create loops with this as
follows:

$ qemu-img create -f qcow2 /tmp/source.qcow2 64M
$ qemu-system-x86_64 -qmp stdio
{"execute": "qmp_capabilities"}
{"execute": "object-add",
 "arguments": {"qom-type": "throttle-group", "id": "tg0"}}
{"execute": "blockdev-add",
 "arguments": {
 "node-name": "source",
 "driver": "throttle",
 "throttle-group": "tg0",
 "file": {
 "node-name": "filtered",
 "driver": "qcow2",
 "file": {
 "driver": "file",
 "filename": "/tmp/source.qcow2"
 } } } }
{"execute": "drive-mirror",
 "arguments": {
 "job-id": "mirror",
 "device": "source",
 "target": "/tmp/target.qcow2",
 "format": "qcow2",
 "node-name": "target",
 "sync" :"none",
 "replaces": "filtered"
 } }
{"execute": "block-job-complete", "arguments": {"device": "mirror"}}

And qemu crashes because of a stack overflow due to the loop being
created (target's backing file is source, so when it replaces filtered,
it points to itself through source).

(blockdev-mirror can be broken similarly.)

So let us make the checks for the two conditions above explicit, which
makes the whole function exactly as restrictive as it needs to be.

Signed-off-by: Max Reitz 
---
 include/block/block.h |  1 +
 block.c   | 83 +++
 blockdev.c| 34 --
 3 files changed, 110 insertions(+), 8 deletions(-)

diff --git a/include/block/block.h b/include/block/block.h
index 2005664f14..2878198892 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -402,6 +402,7 @@ bool bdrv_is_first_non_filter(BlockDriverState *candidate);
 
 /* check if a named node can be replaced when doing drive-mirror */
 BlockDriverState *check_to_replace_node(BlockDriverState *parent_bs,
+BlockDriverState *backing_bs,
 const char *node_name, Error **errp);
 
 /* async block I/O */
diff --git a/block.c b/block.c
index 89cb6de4c3..820244f52e 100644
--- a/block.c
+++ b/block.c
@@ -5916,7 +5916,59 @@ bool bdrv_is_first_non_filter(BlockDriverState 
*candidate)
 return false;
 }
 
+static bool is_child_of(BlockDriverState *child, BlockDriverState *parent)
+{
+BdrvChild *c;
+
+if (!parent) {
+return false;
+}
+
+QLIST_FOREACH(c, >children, next) {
+if (c->bs == child || is_child_of(child, c->bs)) {
+return true;
+}
+}
+
+return false;
+}
+
+/*
+ * Return true if there are only filters in [@top, @base).  Note that
+ * this may include quorum (which bdrv_chain_contains() cannot
+ * handle).
+ */
+static bool is_filtered_child(BlockDriverState *top, BlockDriverState *base)
+{
+BdrvChild *c;
+
+if (!top) {
+return false;
+}
+
+if (top == base) {
+return true;
+}
+
+if (!top->drv->is_filter) {
+return false;
+}
+
+QLIST_FOREACH(c, >children, next) {
+if (is_filtered_child(c->bs, base)) {
+return true;
+}
+}
+
+return false;
+}
+
+/*
+ * @parent_bs is mirror's source BDS, @backing_bs is the BDS which
+ * will be attached to the target when mirror completes.
+ */
 BlockDriverState *check_to_replace_node(BlockDriverState *parent_bs,
+BlockDriverState *backing_bs,
 const char *node_name, Error **errp)
 {
 BlockDriverState *to_replace_bs = bdrv_find_node(node_name);
@@ -5935,13 +5987,32 @@ BlockDriverState 
*check_to_replace_node(BlockDriverState *parent_bs,
 goto out;
 }
 
-/* We don't want arbitrary node of the BDS chain to be replaced only the 
top
- * most non filter in order to prevent data corruption.
- * Another benefit is that this tests exclude backing files which are
- * blocked by the backing blockers.
+/*
+ * If to_replace_bs is (recursively) a child of backing_bs,
+ * replacing it may create a loop.  We cannot allow that.
  */
-if (!bdrv_recurse_is_first_non_filter(parent_bs, to_replace_bs)) {
-error_setg(errp, "Only top most non filter can be replaced");
+if (to_replace_bs == backing_bs || is_child_of(to_replace_bs, backing_bs)) 
{
+error_setg(errp, "Replacing this node would result in a loop");
+to_replace_bs 

[Qemu-devel] [PATCH v4 03/11] block: Storage child access function

2019-04-10 Thread Max Reitz
For completeness' sake, add a function for accessing a node's storage
child, too.  For filters, this is their filtered child; for non-filters,
this is bs->file.

Some places are deliberately left unconverted:
- BDS opening/closing functions where bs->file is handled specially
  (which is basically wrong, but at least simplifies probing)
- bdrv_co_block_status_from_file(), because its name implies that it
  points to ->file
- bdrv_snapshot_goto() in one places unrefs bs->file.  Such a
  modification is not covered by this patch and is therefore just
  safeguarded by an additional assert(), but otherwise kept as-is.

Signed-off-by: Max Reitz 
---
 include/block/block_int.h |  6 +
 block.c   | 53 ---
 block/io.c| 22 +++-
 block/qapi.c  |  7 +++---
 block/snapshot.c  | 40 -
 5 files changed, 81 insertions(+), 47 deletions(-)

diff --git a/include/block/block_int.h b/include/block/block_int.h
index b22b1164f8..d0309e6307 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -1243,6 +1243,7 @@ int refresh_total_sectors(BlockDriverState *bs, int64_t 
hint);
 BdrvChild *bdrv_filtered_cow_child(BlockDriverState *bs);
 BdrvChild *bdrv_filtered_rw_child(BlockDriverState *bs);
 BdrvChild *bdrv_filtered_child(BlockDriverState *bs);
+BdrvChild *bdrv_storage_child(BlockDriverState *bs);
 BlockDriverState *bdrv_skip_implicit_filters(BlockDriverState *bs);
 BlockDriverState *bdrv_skip_rw_filters(BlockDriverState *bs);
 BlockDriverState *bdrv_backing_chain_next(BlockDriverState *bs);
@@ -1267,4 +1268,9 @@ static inline BlockDriverState 
*bdrv_filtered_bs(BlockDriverState *bs)
 return child_bs(bdrv_filtered_child(bs));
 }
 
+static inline BlockDriverState *bdrv_storage_bs(BlockDriverState *bs)
+{
+return child_bs(bdrv_storage_child(bs));
+}
+
 #endif /* BLOCK_INT_H */
diff --git a/block.c b/block.c
index e8f6febda0..89cb6de4c3 100644
--- a/block.c
+++ b/block.c
@@ -4404,15 +4404,21 @@ exit:
 int64_t bdrv_get_allocated_file_size(BlockDriverState *bs)
 {
 BlockDriver *drv = bs->drv;
+BlockDriverState *storage_bs;
+
 if (!drv) {
 return -ENOMEDIUM;
 }
+
 if (drv->bdrv_get_allocated_file_size) {
 return drv->bdrv_get_allocated_file_size(bs);
 }
-if (bs->file) {
-return bdrv_get_allocated_file_size(bs->file->bs);
+
+storage_bs = bdrv_storage_bs(bs);
+if (storage_bs) {
+return bdrv_get_allocated_file_size(storage_bs);
 }
+
 return -ENOTSUP;
 }
 
@@ -4982,7 +4988,7 @@ int bdrv_debug_breakpoint(BlockDriverState *bs, const 
char *event,
   const char *tag)
 {
 while (bs && bs->drv && !bs->drv->bdrv_debug_breakpoint) {
-bs = bs->file ? bs->file->bs : NULL;
+bs = bdrv_storage_bs(bs);
 }
 
 if (bs && bs->drv && bs->drv->bdrv_debug_breakpoint) {
@@ -4995,7 +5001,7 @@ int bdrv_debug_breakpoint(BlockDriverState *bs, const 
char *event,
 int bdrv_debug_remove_breakpoint(BlockDriverState *bs, const char *tag)
 {
 while (bs && bs->drv && !bs->drv->bdrv_debug_remove_breakpoint) {
-bs = bs->file ? bs->file->bs : NULL;
+bs = bdrv_storage_bs(bs);
 }
 
 if (bs && bs->drv && bs->drv->bdrv_debug_remove_breakpoint) {
@@ -5008,7 +5014,7 @@ int bdrv_debug_remove_breakpoint(BlockDriverState *bs, 
const char *tag)
 int bdrv_debug_resume(BlockDriverState *bs, const char *tag)
 {
 while (bs && (!bs->drv || !bs->drv->bdrv_debug_resume)) {
-bs = bs->file ? bs->file->bs : NULL;
+bs = bdrv_storage_bs(bs);
 }
 
 if (bs && bs->drv && bs->drv->bdrv_debug_resume) {
@@ -5021,7 +5027,7 @@ int bdrv_debug_resume(BlockDriverState *bs, const char 
*tag)
 bool bdrv_debug_is_suspended(BlockDriverState *bs, const char *tag)
 {
 while (bs && bs->drv && !bs->drv->bdrv_debug_is_suspended) {
-bs = bs->file ? bs->file->bs : NULL;
+bs = bdrv_storage_bs(bs);
 }
 
 if (bs && bs->drv && bs->drv->bdrv_debug_is_suspended) {
@@ -6142,14 +6148,23 @@ void bdrv_refresh_filename(BlockDriverState *bs)
 bs->exact_filename[0] = '\0';
 
 drv->bdrv_refresh_filename(bs);
-} else if (bs->file) {
-/* Try to reconstruct valid information from the underlying file */
+} else if (bdrv_storage_child(bs)) {
+/*
+ * Try to reconstruct valid information from the underlying
+ * file -- this only works for format nodes (filter nodes
+ * cannot be probed and as such must be selected by the user
+ * either through an options dict, or through a special
+ * filename which the filter driver must construct in its
+ * .bdrv_refresh_filename() implementation).
+ */
+BlockDriverState *storage_bs = bdrv_storage_bs(bs);
 
 bs->exact_filename[0] = '\0';
 
 /*
  * We can use the underlying file's filename if:
  * - it 

[Qemu-devel] [PATCH v4 01/11] block: Mark commit and mirror as filter drivers

2019-04-10 Thread Max Reitz
The commit and mirror block nodes are filters, so they should be marked
as such.  (Strictly speaking, BDS.is_filter's documentation states that
a filter's child must be bs->file.  The following patch will relax this
restriction, however.)

Signed-off-by: Max Reitz 
Reviewed-by: Alberto Garcia 
Reviewed-by: Eric Blake 
---
 block/commit.c | 2 ++
 block/mirror.c | 2 ++
 2 files changed, 4 insertions(+)

diff --git a/block/commit.c b/block/commit.c
index ba60fef58a..02eab34925 100644
--- a/block/commit.c
+++ b/block/commit.c
@@ -257,6 +257,8 @@ static BlockDriver bdrv_commit_top = {
 .bdrv_co_block_status   = bdrv_co_block_status_from_backing,
 .bdrv_refresh_filename  = bdrv_commit_top_refresh_filename,
 .bdrv_child_perm= bdrv_commit_top_child_perm,
+
+.is_filter  = true,
 };
 
 void commit_start(const char *job_id, BlockDriverState *bs,
diff --git a/block/mirror.c b/block/mirror.c
index ff15cfb197..8b2404051f 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -1489,6 +1489,8 @@ static BlockDriver bdrv_mirror_top = {
 .bdrv_co_block_status   = bdrv_co_block_status_from_backing,
 .bdrv_refresh_filename  = bdrv_mirror_top_refresh_filename,
 .bdrv_child_perm= bdrv_mirror_top_child_perm,
+
+.is_filter  = true,
 };
 
 static void mirror_start_job(const char *job_id, BlockDriverState *bs,
-- 
2.20.1




[Qemu-devel] [PATCH v4 00/11] block: Deal with filters

2019-04-10 Thread Max Reitz
Note: This is technically the first part of my active mirror followup.
But just very technically.  I noticed that that followup started to
consist of two parts, namely (A) fix filtery things in the block layer,
and (B) fix active mirror.  So I decided to split it.  This is part A.
Part B is “mirror: Mainly coroutine refinements”.


When we introduced filters, we did it a bit casually.  Sure, we talked a
lot about them before, but that was mostly discussion about where
implicit filters should be added to the graph (note that we currently
only have two implicit filters, those being mirror and commit).  But in
the end, we really just designated some drivers filters (Quorum,
blkdebug, etc.) and added some specifically (throttle, COR), without
really looking through the block layer to see where issues might occur.

It turns out vast areas of the block layer just don’t know about filters
and cannot really handle them.  Many cases will work in practice, in
others, well, too bad, you cannot use some feature because some part
deep inside the block layer looks at your filters and thinks they are
format nodes.

This series sets out to correct a bit of that.  I lost my head many
times and I’m sure this series is incomplete in many ways, but it really
doesn’t do any good if it sits on my disk any longer, it needs to go out
now.

The most important patches of this series are patches 2 and 3.  These
introduce functions to encapsulate bs->backing and bs->file accesses.
Because sometimes, bs->backing means COW, sometimes it means filtered
node.  And sometimes, bs->file means metadata storage, and sometimes it
means filtered node.  With this functions, it’s always clear what the
caller wants, and it will always get what it wants.

Besides that, patch 2 introduces functions to skip filters which may be
used by parts of the block layer that just don’t care about them.


Secondly, the restraints put on mirror’s @replaces parameter are
revisited and fixed.


Thirdly, BDS.backing_file is changed to be constant.  I don’t quite know
why we modify it whenever we change a BDS’s backing file, but that’s
definitely not quite right.  This fixes things like being able to
perform a commit on a file (using relative filenames) in a directory
that’s not qemu’s CWD.


Finally, a number of tests are added.


There are probably many things that are worthy of discussion, of which
only some come to my head, e.g.:

- In which cases do we want to skip filters, in which cases do we want
  to skip implicit filters?
  My approach was to basically never skip explicitly added filters,
  except when it’s about finding a file in some tree (e.g. in a backing
  chain).  Maybe there are cases where you think we should skip even
  explicitly added filters.

- I made interesting decisions like “When you mirror from a node, we
  should indeed mirror from that node, but when replacing it, we should
  skip leave all implicit filters on top intact.”  You may disagree with
  that.
  (My reasoning here is that users aren’t supposed to know about
  implicit filters, and therefore, they should not intend to remove
  them.  Also, mirror accepts only root nodes as the source, so you
  cannot really specify the node below the implicit filters.  But you
  can use @replaces to drop the implicit filters, if you know they are
  there.)

- bdrv_query_bds_stats() is changed: “parent” now means storage,
  “backing” means COW.  This is what makes sense, although it breaks
  compatibility; but only for filters that use bs->backing for the
  filtered child (i.e. mirror top and commit top).  The alternatives
  would be:
  - Leave everything as it is.  But this means that whenever you add
another filter (throttle or COR), the backing chain is still broken
because they use bs->file for their filtered child.  So this is not
really an option.
  - Present all filtered children under “backing”.  We would need to
present them under “parent” as well, though, if they are referenced
as bs->file, otherwise this too would break compatibility and would
not be any better.
This seems rather broken because we may present the same node twice
(once as “parent”, once as “backing”).
Well, or we decide to break compatibility here, too, but to me it
seems wrong to present filtered nodes under “backing” but not under
“parent”.

  So I went for the solution that makes the most sense to me.


v4:
- Dropped patch 2 because it’s in master
- (New) patch 2:
  - Fixed the description of BDS.is_filter (requested by Kevin); I
didn’t do that before patch 1 because the description is kind of
wrong today already (Quorum does not have a bs->file but has been
marked a filter from the start).  It was only kind of wrong
because the description just claims that some callbacks get
automatically passed to bs->file, not that the filter must have
bs->file present.  But then patch 1 is correct without adjusting the
description, and we only need to do so 

Re: [Qemu-devel] [PATCH] hw/char/escc: Lower irq when transmit buffer is filled

2019-04-10 Thread Philippe Mathieu-Daudé
On 3/6/19 12:01 PM, Paolo Bonzini wrote:
> On 05/03/19 06:10, Stephen Checkoway wrote:
>> The SCC/ESCC will briefly stop asserting an interrupt when the
>> transmit FIFO is filled.
>>
>> This code doesn't model the transmit FIFO/shift register so the
>> pending transmit interrupt is never deasserted which means that an
>> edge-triggered interrupt controller will never see the low-to-high
>> transition it needs to raise another interrupt. The practical
>> consequence of this is that guest firmware with an interrupt service
>> routine for the ESCC that does not send all of the data it has
>> immediately will stop sending data if the following sequence of
>> events occurs:
>> 1. Disable processor interrupts
>> 2. Write a character to the ESCC
>> 3. Add additional characters to a buffer which is drained by the ISR
>> 4. Enable processor interrupts
>>
>> In this case, the first character will be sent, the interrupt will
>> fire and the ISR will output the second character. Since the pending
>> transmit interrupt remains asserted, no additional interrupts will
>> ever fire.
>>
>> This fixes that situation by explicitly lowering the IRQ when a
>> character is written to the buffer.
>>
>> Signed-off-by: Stephen Checkoway 
> 
> Looks good but I would like Mark to give his ack as well.
> 
> Mark, could you also add hw/char/escc.c to both SPARC and Mac sections
> of MAINTAINERS?

Cc'ing Artyom who also made some changes in this file, and Laurent.


Stephen, which particular chipset are you using?

This file models various types. I had a talk with Mark or Laurent at
last KVM forum about this device. IIRC, while the sun4m machines use a
real chipset, the MacIO embeds an slighly modified IP core.

I can't find what you describe in the Z85C30 doc, however in the ESCC
datasheet referenced in escc.c (which happens to be a 85MiB pdf!!!) I found:

  Transmit Interrupts and Transmit Buffer Empty Bit on the NMOS/CMOS

  The TxIP is reset either by writing data to the transmit buffer or
  by issuing the Reset Tx Int command in WR0.

I understand the NMOS/CMOS desc. matches the Z85C30 (no FIFO used).

So your description and patch makes sens.
What worries me is the controller could have other pending IRQs to
deliver and you are clearing them. Shouldn't we only clear the
INTR_TXINT bit, and call escc_update_irq() which should lower the IRQ if
no bits are pending?

Maybe as:

s->wregs[W_INTR] &= ~INTR_TXINT;
escc_update_irq(s);

Thanks,

Phil.

> Thanks,
> 
> Paolo
> 
>> ---
>>  hw/char/escc.c | 1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/hw/char/escc.c b/hw/char/escc.c
>> index 628f5f81f7..bea55ad8da 100644
>> --- a/hw/char/escc.c
>> +++ b/hw/char/escc.c
>> @@ -509,6 +509,7 @@ static void escc_mem_write(void *opaque, hwaddr addr,
>>  break;
>>  case SERIAL_DATA:
>>  trace_escc_mem_writeb_data(CHN_C(s), val);
>> +qemu_irq_lower(s->irq);
>>  s->tx = val;
>>  if (s->wregs[W_TXCTRL2] & TXCTRL2_TXEN) { // tx enabled
>>  if (qemu_chr_fe_backend_connected(>chr)) {
>>
> 
> 



[Qemu-devel] [PATCH] tci: Add implementation for INDEX_op_ld16u_i64

2019-04-10 Thread Stefan Weil
This fixes "make check-tcg" on a Debian x86_64 host.

Signed-off-by: Stefan Weil 
---
 tcg/tci.c | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/tcg/tci.c b/tcg/tci.c
index 33edca1903..a6208653e8 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -127,6 +127,12 @@ static void tci_write_reg8(tcg_target_ulong *regs, TCGReg 
index, uint8_t value)
 tci_write_reg(regs, index, value);
 }
 
+static void
+tci_write_reg16(tcg_target_ulong *regs, TCGReg index, uint16_t value)
+{
+tci_write_reg(regs, index, value);
+}
+
 static void
 tci_write_reg32(tcg_target_ulong *regs, TCGReg index, uint32_t value)
 {
@@ -585,6 +591,8 @@ uintptr_t tcg_qemu_tb_exec(CPUArchState *env, uint8_t 
*tb_ptr)
 tci_write_reg8(regs, t0, *(uint8_t *)(t1 + t2));
 break;
 case INDEX_op_ld8s_i32:
+TODO();
+break;
 case INDEX_op_ld16u_i32:
 TODO();
 break;
@@ -854,7 +862,14 @@ uintptr_t tcg_qemu_tb_exec(CPUArchState *env, uint8_t 
*tb_ptr)
 tci_write_reg8(regs, t0, *(uint8_t *)(t1 + t2));
 break;
 case INDEX_op_ld8s_i64:
+TODO();
+break;
 case INDEX_op_ld16u_i64:
+t0 = *tb_ptr++;
+t1 = tci_read_r(regs, _ptr);
+t2 = tci_read_s32(_ptr);
+tci_write_reg16(regs, t0, *(uint16_t *)(t1 + t2));
+break;
 case INDEX_op_ld16s_i64:
 TODO();
 break;
-- 
2.20.1




[Qemu-devel] [ANNOUNCE] QEMU 4.0.0-rc3 is now available

2019-04-10 Thread Michael Roth
Hello,

On behalf of the QEMU Team, I'd like to announce the availability of the
fourth release candidate for the QEMU 4.0 release.  This release is meant
for testing purposes and should not be used in a production environment.

  http://download.qemu-project.org/qemu-4.0.0-rc3.tar.xz
  http://download.qemu-project.org/qemu-4.0.0-rc3.tar.xz.sig

A note from the maintainer:

  If no release-critical bugs are reported with this release
  candidate, then we plan to release with no further changes
  on the 16th April. If we need to roll an rc4 then we'll do
  that on the 16th, with final 4.0.0 release on the 23rd.

You can help improve the quality of the QEMU 4.0 release by testing this
release and reporting bugs on Launchpad:

  https://bugs.launchpad.net/qemu/

The release plan, as well a documented known issues for release
candidates, are available at:

  http://wiki.qemu.org/Planning/4.0

Please add entries to the ChangeLog for the 4.0 release below:

  http://wiki.qemu.org/ChangeLog/4.0

Thank you to everyone involved!

Changes since rc2:

532cc6da74: Update version for v4.0.0-rc3 release (Peter Maydell)
065e6298a7: device_tree: Fix integer overflowing in load_device_tree() (Markus 
Armbruster)
f151f8aca5: migration/ram.c: Fix use-after-free in multifd_recv_unfill_packet() 
(Peter Maydell)
3e20c81ed8: tests: Make check-block a phony target (Markus Armbruster)
ae909496e9: hw/i386/pc: Fix crash when hot-plugging nvdimm on older machine 
types (Thomas Huth)
77b1757090: include/qemu/bswap.h: Use __builtin_memcpy() in accessor functions 
(Peter Maydell)
1cab464136: roms: Allow passing configure options to the EDK2 build tools 
(Philippe Mathieu-Daudé)
d912e795e0: roms: Rename the EFIROM variable to avoid clashing with iPXE 
(Philippe Mathieu-Daudé)
8cb2ca3d74: target/i386: Generate #UD for LOCK on a register increment (Peter 
Maydell)
5cf0d326a0: spapr_pci: Fix extended config space accesses (Greg Kurz)
1c685a9026: pci: Allow PCI bus subtypes to support extended config space 
accesses (Greg Kurz)
e53f88df77: nbd/client: Fix error message for server with unusable sizing (Eric 
Blake)
099fbcd65c: nbd/server: Don't fail NBD_OPT_INFO for byte-aligned sources (Eric 
Blake)
6e280648d2: nbd/server: Trace client noncompliance on unaligned requests (Eric 
Blake)
2178a569be: nbd/server: Fix blockstatus trace (Eric Blake)
ab63817119: hmp: Fix drive_add ... format=help crash (Markus Armbruster)
71ba74f67e: linux-user: rename gettid() to sys_gettid() to avoid clash with 
glibc (Daniel P. Berrangé)
184943d827: linux-user: assume __NR_gettid always exists (Daniel P. Berrangé)
3f48686fac: block: Forward 'discard' to temporary overlay (Kevin Wolf)
c19f2b711e: test qgraph.c: Fix segs due to out of scope default (Dr. David Alan 
Gilbert)
c098aac7dc: tests/libqos: fix usage of bool in pci-spapr.c (Jafar Abdi)
08f7ad1b00: tests/libqos: fix usage of bool in pci-pc.c (Jafar Abdi)
d013283a46: migration: Fix migrate_set_parameter (Juan Quintela)
c6e5bafb6f: migration/ram.c: Fix codes conflict about bitmap_mutex (Zhang Chen)
79bcac250f: riscv: plic: Log guest errors (Alistair Francis)
0feb4a7129: riscv: plic: Fix incorrect irq calculation (Alistair Francis)
2bcd05cf24: xen-block: scale sector based quantities correctly (Paul Durrant)
15f084505a: xen-block: only advertize discard to the frontend when it is 
enabled... (Paul Durrant)
7357b22159: hw/s390x/3270-ccw: avoid taking address of fields in packed struct 
(Daniel P. Berrangé)
5d45a33292: hw/s390x/ipl: avoid taking address of fields in packed struct 
(Daniel P. Berrangé)
bea0279b72: hw/s390/css: avoid taking address members in packed structs (Daniel 
P. Berrangé)
e1d0b37261: hw/vfio/ccw: avoid taking address members in packed structs (Daniel 
P. Berrangé)
81fb1e646e: intel_iommu: Drop extended root field (Peter Xu)
2811af3b49: intel_iommu: Fix root_scalable migration breakage (Peter Xu)
20f86a75a7: virtio-net: Fix typo in comment (Yuval Shaia)
75c5626c88: intel_iommu: Correct caching-mode error message (Alex Williamson)
22132828d1: acpi: verify file entries in bios_linker_loader_add_pointer() (Liam 
Merwick)




Re: [Qemu-devel] [RFC-PATCH] Introducing virtio-example device.

2019-04-10 Thread Stefan Hajnoczi
On Wed, Apr 10, 2019 at 4:45 PM Yoni Bettan  wrote:
> On 4/9/19 4:17 PM, Stefan Hajnoczi wrote:
> > On Mon, Apr 01, 2019 at 02:18:43PM +0300, Yoni Bettan wrote:
> The final purpose is to have:
>
> 1. device specification
>
> 2. device implementation
>
> 3. device driver
>
> 4. blog
>
> maybe I should have written it at the beginning, this is not the entire
> project but it is its start.

The way I'd design VIRTIO devices without prior knowledge is:

1. Learn the VIRTIO device model.  Understand the concepts in VIRTIO.
<-- this is hard today, there's not much good documentation

2. Design an initial version of the device spec.  Mostly configuration
layout, virtqueues, and request structs.  Not much text is necessary
at this point, but it's critical for thinking through features before
implementation.

3. Implement guest driver and device emulation.

4. Iterate on spec and implementation until it's functionally complete.

5. Submit the spec to the VIRTIO Technical Committee.

6. Submit driver and device emulation patches.  They can be merged
when the spec is approved/close to approved.

Are you jumping to #3?  This is likely to lead to poor quality
implementations and specs because the fundamental VIRTIO concepts
weren't understood.

If the point is to educate others and/or do it "the right way", then I
would really avoid hacking around without first doing the other steps.

Stefan



Re: [Qemu-devel] [RFC-PATCH] Introducing virtio-example device.

2019-04-10 Thread Stefan Hajnoczi
On Wed, Apr 10, 2019 at 4:45 PM Yoni Bettan  wrote:
> On 4/9/19 4:17 PM, Stefan Hajnoczi wrote:
> > On Mon, Apr 01, 2019 at 02:18:43PM +0300, Yoni Bettan wrote:
> > There are multiple problems with the code, but the larger issue is that
> > this example device is just helping people shoot themselves in the foot
> > more easily.
>
>
> If you can point me to those problem I will be glad so I can update the
> code and understand those problems you are talking about.

Please see Eduardo's reply.  I didn't review much since he already
pointed out many things.

One thing he didn't mention:
+elem = virtqueue_pop(vq, sizeof(VirtQueueElement));

The return value can be NULL.  Spurious notifications could happen so
the code shouldn't crash when this returns NULL.

I apologize for the critical replies.  What you're doing is valuable.
I think explaining the VIRTIO device model and the order in which
things are done will lead to higher quality devices so I'm making a
lot of noise about it :).

Stefan



Re: [Qemu-devel] [PATCH] configure: Automatically fall back to TCI on non-release architectures

2019-04-10 Thread Stefan Weil
On 10.04.19 10:22, Thomas Huth wrote:
> Additionally, I think it should be possible to compile with the
> x86_64-linux-user target and then to run "make check-tcg" ... however,
> that currently crashes with:
> 
> TODO qemu/tcg/tci.c:859: tcg_qemu_tb_exec()
> qemu/tcg/tci.c:859: tcg fatal error
> qemu: uncaught target signal 11 (Segmentation fault) - core dumped

That's easy to fix. I implemented only those TCG opcodes which really
were executed during testing. This test triggers an assertion because it
needs INDEX_op_ld16u_i64 which was missing up to now.

I'll send a patch which adds support for INDEX_op_ld16u_i64.

Thank you for your tests.

Stefan




Re: [Qemu-devel] [PATCH for-4.1 v2] qemu-img: Saner printing of large file sizes

2019-04-10 Thread Max Reitz
On 01.04.19 16:57, Eric Blake wrote:
> Disk sizes close to INT64_MAX cause overflow, for some pretty
> ridiculous output:
> 
>   $ ./nbdkit -U - memory size=$((2**63 - 512)) --run 'qemu-img info $nbd'
>   image: nbd+unix://?socket=/tmp/nbdkitHSAzNz/socket
>   file format: raw
>   virtual size: -8388607T (9223372036854775296 bytes)
>   disk size: unavailable
> 
> But there's no reason to have two separate implementations of integer
> to human-readable abbreviation, where one has overflow and stops at
> 'T', while the other avoids overflow and goes all the way to 'E'. With
> this patch, the output now claims 8EiB instead of -8388607T, which
> really is the correct rounding of largest file size supported by qemu
> (we could go 511 bytes larger if we used byte-accurate sizing instead
> of rounding up to the next sector boundary, but that wouldn't change
> the human-readable result).
> 
> Quite a few iotests need updates to expected output to match.
> 
> Reported-by: Richard W.M. Jones 
> Signed-off-by: Eric Blake 
> Tested-by: Richard W.M. Jones 
> Reviewed-by: Alberto Garcia 
> Reviewed-by: Vladimir Sementsov-Ogievskiy 
> ---
> 
> v2 - actually update iotests to match; no change to block/ code so R-b added

There are more iotests this breaks.  First, there is 059 for vmdk, which
looks just like the rest.

But for -m32, it gets a bit more difficult.  Every size above 999 GB
(1000 GB gets rounded to 1 TB, which is 2^31 * 512) gets printed as
"inf [unit]":

$ ./qemu-img create -f qcow2 /tmp/foo.qcow2 1T
$ ./qemu-img info /tmp/foo.qcow2
[...]
virtual size: inf TiB (1099511627776 bytes)
[...]

$ ./qemu-img create -f qcow2 /tmp/foo.qcow2 1P
$ ./qemu-img info /tmp/foo.qcow2
[...]
virtual size: inf PiB (1125899906842624 bytes)
[...]

This breaks the iotests that test the maximum disk size of more exotic
formats.  I can see failures for vdi, vhdx, and parallels.

But regardless of the iotests, we shouldn’t show the size as infinite
just because of -m32.

Max



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [PATCH for-4.0?] hmp: delvm: use hmp_handle_error

2019-04-10 Thread Cole Robinson
On 4/10/19 2:27 PM, Eric Blake wrote:
> On 4/10/19 1:03 PM, Cole Robinson wrote:
>> This gives us the consistent 'Error:' prefix added in 66363e9a43f,
>> which helps users like libvirt who still need to scrape hmp error
>> messages to detect failure.
>>
>> Signed-off-by: Cole Robinson 
>> ---
>>  hmp.c | 7 ---
>>  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> Not enough to drive -rc4 on its own, but worth adding to our wishlist of
> potential easy patches if we do have a release blocker surface.
> 
> Reviewed-by: Eric Blake 
> 
>>
>> diff --git a/hmp.c b/hmp.c
>> index 8eec768088..74a4bfc1f9 100644
>> --- a/hmp.c
>> +++ b/hmp.c
>> @@ -1481,10 +1481,11 @@ void hmp_delvm(Monitor *mon, const QDict *qdict)
>>  const char *name = qdict_get_str(qdict, "name");
>>  
>>  if (bdrv_all_delete_snapshot(name, , ) < 0) {
>> -error_reportf_err(err,
>> -  "Error while deleting snapshot on device '%s': ",
>> -  bdrv_get_device_name(bs));
>> +error_prepend(,
>> +  "Error while deleting snapshot on device '%s': ",
> 
> Do we want to reword the message (maybe s/Error while //) to avoid the
> word "Error" twice in the same line?
> 

I'm fine with that, and I checked it won't affect libvirt's error
scraping in this area

Thanks,
Cole



[Qemu-devel] QEMU Logo question

2019-04-10 Thread Barajas, Felipe
Hi,

I am trying to understand who in the QEMU community can give me permission to 
use the QEMU logo in a whitepaper.
I am an application engineer at Intel and I have prepared some benchmarks using 
QEMU as a solution accelerated by Intel hardware (CPUs, SSDs, etc)
I would like to use the QEMU logo in this white paper.

Do you know who in the community would have authority to give me such 
permission ?

Thank you

Felipe Barajas




Re: [Qemu-devel] [PATCH for-4.0?] hmp: delvm: use hmp_handle_error

2019-04-10 Thread Eric Blake
On 4/10/19 1:03 PM, Cole Robinson wrote:
> This gives us the consistent 'Error:' prefix added in 66363e9a43f,
> which helps users like libvirt who still need to scrape hmp error
> messages to detect failure.
> 
> Signed-off-by: Cole Robinson 
> ---
>  hmp.c | 7 ---
>  1 file changed, 4 insertions(+), 3 deletions(-)

Not enough to drive -rc4 on its own, but worth adding to our wishlist of
potential easy patches if we do have a release blocker surface.

Reviewed-by: Eric Blake 

> 
> diff --git a/hmp.c b/hmp.c
> index 8eec768088..74a4bfc1f9 100644
> --- a/hmp.c
> +++ b/hmp.c
> @@ -1481,10 +1481,11 @@ void hmp_delvm(Monitor *mon, const QDict *qdict)
>  const char *name = qdict_get_str(qdict, "name");
>  
>  if (bdrv_all_delete_snapshot(name, , ) < 0) {
> -error_reportf_err(err,
> -  "Error while deleting snapshot on device '%s': ",
> -  bdrv_get_device_name(bs));
> +error_prepend(,
> +  "Error while deleting snapshot on device '%s': ",

Do we want to reword the message (maybe s/Error while //) to avoid the
word "Error" twice in the same line?

> +  bdrv_get_device_name(bs));
>  }
> +hmp_handle_error(mon, );
>  }
>  
>  void hmp_info_snapshots(Monitor *mon, const QDict *qdict)
> 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org



Re: [Qemu-devel] [PATCH v2] target/i386: kvm: add VMX migration blocker

2019-04-10 Thread Cole Robinson
On 11/20/18 6:44 AM, Dr. David Alan Gilbert wrote:
> * Paolo Bonzini (pbonz...@redhat.com) wrote:
>> Nested VMX does not support live migration yet.  Add a blocker
>> until that is worked out.
>>
>> Nested SVM only does not support it, but unfortunately it is
>> enabled by default for -cpu host so we cannot really disable it.
>>
>> Signed-off-by: Paolo Bonzini 
> 
> So I'm OK with this, but it does need a release note warning whenever it
> goes in, because it'll surprise those who've already enabled nesting
> but don't use it on all their VMs.
> 

We are hitting this in Fedora 30. Now that nested VMX is enabled by
default at the kernel level, and virt-manager/boxes will use the
equivalent of -cpu host by default, libvirt managedsave (migrate to
file) and virt-manager snapshots (savevm) are rejected for default
created VMs on intel. That's quite unfortunate.

Any ideas on how to resolve this?

FYI it's the root of this bug though there's libvirt issues mixed in:
https://bugzilla.redhat.com/show_bug.cgi?id=1697997

Thanks,
Cole



Re: [Qemu-devel] [PATCH] qemu-img: fix .hx and .texi disparity

2019-04-10 Thread Eric Blake
On 4/10/19 12:37 PM, John Snow wrote:
> 
> 
> On 4/10/19 1:39 AM, Markus Armbruster wrote:
>> John Snow  writes:
>>
>>> It turns out that having options listed in three places continues to be
>>> a bad idea. I'm still toying with the idea of an improved infrastructure
>>> here, but in the meantime, another bandaid.
>>>

>>
>> Nice to have in 4.0.
>>
>> Reviewed-by: Markus Armbruster 
>>
> 
> If it can go in for 4.0, who stages it? Kevin?

At this point, we've missed -rc3; the fix on its own is not a
release-blocker, but if something else comes up that requires -rc4 for
an actual release blocker, we can sort out someone sending a pull
request for this one at the same time. (I'm aware of a couple of iotest
fixups in the same boat... Guess it's time to start updating the wiki to
point to email threads where we document 'wish list' patches)

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [PATCH for-4.1] roms: Correct the EDK2_BASETOOLS_OPTFLAGS variable description

2019-04-10 Thread Laszlo Ersek
On 04/10/19 17:10, Philippe Mathieu-Daudé wrote:
> On 4/10/19 4:58 PM, Laszlo Ersek wrote:
>> On 04/10/19 06:57, Philippe Mathieu-Daudé wrote:
>>> In commit 1cab464136b4 we incorrectly described the
>>> EDK2_BASETOOLS_OPTFLAGS can pass CPPFLAGS and CFLAGS
>>> options to the EDK2 build tools, but it only expands
>>> the CFLAGS (not to the CPPFLAGS).
>>> Update the description to be more accurate.
>>>
>>> Reported-by: Laszlo Ersek 
>>> Signed-off-by: Philippe Mathieu-Daudé 
>>> ---
>>>  roms/Makefile | 4 ++--
>>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/roms/Makefile b/roms/Makefile
>>> index 1ff78b63bb3..d7fd6025e7c 100644
>>> --- a/roms/Makefile
>>> +++ b/roms/Makefile
>>> @@ -120,8 +120,8 @@ build-efi-roms: build-pxe-roms
>>> $(patsubst %,bin-i386-efi/%.efidrv,$(pxerom_targets)) \
>>> $(patsubst %,bin-x86_64-efi/%.efidrv,$(pxerom_targets))
>>>  
>>> -# Build scripts can pass compiler/linker flags to the EDK2 build tools
>>> -# via the EDK2_BASETOOLS_OPTFLAGS (CPPFLAGS and CFLAGS) and
>>> +# Build scripts can pass compiler/linker flags to the EDK2
>>> +# build  tools via the EDK2_BASETOOLS_OPTFLAGS (CFLAGS) and
>>>  # EDK2_BASETOOLS_LDFLAGS (LDFLAGS) environment variables.
>>>  #
>>>  # Example:
>>>
>>
>> Reviewed-by: Laszlo Ersek 
>>
>> Please noone submit a pull request with this patch (for 4.1) until my
>> edk2 bundling series is merged; as this one will likely introduce
>> another contextual conflict, and I'd *really* like to avoid another
>> rebase on my end, due to such an issue.
> 
> Thanks for the review. I'll rebase it on top of your 'bundle edk2
> platform firmware with QEMU' v4 and resend (later).

Thank you!

> Feel free to include it on top of your pullreq.

... I think I'll keep that pullreq as minimal as it can be. :)

Thanks
Laszlo

> 
> Regards,
> 
> Phil.
> 




[Qemu-devel] [PATCH] hmp: delvm: use hmp_handle_error

2019-04-10 Thread Cole Robinson
This gives us the consistent 'Error:' prefix added in 66363e9a43f,
which helps users like libvirt who still need to scrape hmp error
messages to detect failure.

Signed-off-by: Cole Robinson 
---
 hmp.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/hmp.c b/hmp.c
index 8eec768088..74a4bfc1f9 100644
--- a/hmp.c
+++ b/hmp.c
@@ -1481,10 +1481,11 @@ void hmp_delvm(Monitor *mon, const QDict *qdict)
 const char *name = qdict_get_str(qdict, "name");
 
 if (bdrv_all_delete_snapshot(name, , ) < 0) {
-error_reportf_err(err,
-  "Error while deleting snapshot on device '%s': ",
-  bdrv_get_device_name(bs));
+error_prepend(,
+  "Error while deleting snapshot on device '%s': ",
+  bdrv_get_device_name(bs));
 }
+hmp_handle_error(mon, );
 }
 
 void hmp_info_snapshots(Monitor *mon, const QDict *qdict)
-- 
2.21.0




Re: [Qemu-devel] [Qemu-block] [PULL 07/14] iotests: Add 241 to test NBD on unaligned images

2019-04-10 Thread Eric Blake
On 4/10/19 12:45 PM, Max Reitz wrote:
> On 01.04.19 16:08, Eric Blake wrote:
>> Add a test for the NBD client workaround in the previous patch.  It's
>> not really feasible for an iotest to assume a specific tracing engine,
>> so we can't really probe trace_nbd_parse_blockstatus_compliance to see
>> if the server was fixed vs. whether the client just worked around the
>> server (other than by rearranging order between code patches and this
>> test). But having a successful exchange sure beats the previous state
>> of an error message. Since format probing can change alignment, we can
>> use that as an easy way to test several configurations.
>>
>> Not tested yet, but worth adding to this test in future patches: an
>> NBD server that can advertise a non-sector-aligned size (such as
>> nbdkit) causes qemu as the NBD client to misbehave when it rounds the
>> size up and accesses beyond the advertised size. Qemu as NBD server
>> never advertises a non-sector-aligned size (since bdrv_getlength()
>> currently rounds up to sector boundaries); until qemu can act as such
>> a server, testing that flaw will have to rely on external binaries.
>>

>> +++ b/tests/qemu-iotests/241.out

>> +=== Exporting unaligned raw image, forced server sector alignment ===
>> +
>> +WARNING: Image format was not specified for 
>> '/home/eblake/qemu/tests/qemu-iotests/scratch/t.raw' and probing guessed raw.
>> + Automatically detecting the format is dangerous for raw images, 
>> write operations on block 0 will be restricted.
>> + Specify the 'raw' format explicitly to remove the restrictions.
> 
> Is there a patch for this already?

Oh shoot. I saw the complaint about the hard-coded path needing a
filter, and still failed to fix it in time for 4.0-rc3. Not a driver for
-rc4 on its own, but if we have something else pop up that mandates
-rc4, I'll get that fix in.

Will post shortly...

> 
> Also, I planned to reply to the original patch – but where is it?  The
> latest version I can see on the list is v3, and that only has seven
> lines in the reference output.  Well, the difference would explain why
> Vladimir would give a Tested-by to a test that clearly will not run on
> his machine.

Yeah, this line in the commit message is evidence of my botching things:

> [eblake: add forced-512 alignment, and nbdkit reproducer comment]

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [PATCH RFC 2/3] pxtool: Add new qemu-img command info generation tool

2019-04-10 Thread John Snow



On 4/10/19 1:54 AM, Markus Armbruster wrote:
> John Snow  writes:
> 
>> Presently we use hxtool and a .hx format to generate a few things like
>> the qemu_img subcommand dispatch table, the qemu_img help() display output,
>> and a help output in qemu-img.texi.
>>
>> Unfortunately, this means that this information is duplicated in at least
>> three places:
>>
>> (1) in qemu-img-cmds.hx as a human readable string
>> (2) in qemu-img-cmds.hx as a texi string
>> (3) in qemu-img.texi again, as a texi string
>>
>> We can do a little better, though: all of these sources can be generated
>> from a single json file. Add a new small tool ("pxtool") that can do this.
>>
>> This tool can at least handle generating (1) and (2) from the same source
>> without needing to reduplicate that information. Deduplicating (3) happens
>> in the next patch.
>>
>> Notes:
>>  - The json format was chosen to be very "stupid", and the command line
>>documentation is being kept one-per-line to make future diffs easier
>>to read.
>>  - It's easy enough to generate the human-readable output from the texi
>>output by removing '@var{foo}' with 'foo'.
>>  - The qemu-img command dispatch always calls img_cmdname, so we don't
>>need to store this information separately, either.
>>  - The need for DEF() macros could be removed as well, but I left it in
>>to create a minimally disruptive patch.
>> Signed-off-by: John Snow 
> 
> We got just five .hx:
> 
> qemu-img.cmds.hxkilled off by this patch
> qemu-options.hx CLI QAPIfication should kill this off
> hw/audio/pl041.hx   misnamed, not actually food for hxtool
> hmp-commands.hx no exit strategy
> hmp-commands-info.hxfor these two, yet
> 
> CLI QAPIfication got stuck in the back-burner, and as long as that's the
> case, it's not an alternative to your patches.
> 

Good to know. I figure that at least while we wait on something more
comprehensive there's no real short term harm in tidying up.

Something I'd really like to do is define a python/json-esque
configuration file that allows you to specify some "common options" that
are shared between all of the sub-commands, and then define each command
in terms of both which common options it possesses, and then any local
command-specific options it has.

(Why the common options design? So that argument names and command
options can be encouraged to be identical and identically documented
between all subcommands that use them.)

Then, from the configuration file, generate all of the necessary C
parsers (I have a protoype for this, it's not difficult to do in at
least the basic case) that can return boxed command arguments, and then
also generate the help strings from that metadata.

I suspect that work *IS* something that might brush up against / should
use (or extend) QAPI code.

Then, I'd like to find a way to split qemu-img.texi into sub-command
files and find a way to source the same "informative text" for both:

(1) The texi output, as per usual, and
(2) qemu-img subcommand --help

such that --help, when passed as an argument to the subcommand, prints
out help only relevant to the subcommand instead, e.g.

`qemu-img check --help`

might print the "common options" section only as it relates to commands
actually used by the check command, then prints the check section of the
texi as formatted for terminals.

This way, we can have manpages, html pages, and interactive help text
all derived from the same semi-automated sources, always up to date and
much easier to read/discover/parse by human eyeballs.

That's what I'd like to accomplish, ultimately.

For now, I think this RFC set is not harmful and won't bother anybody.
It's definitely not worse than what we have now, fragility of removing
@var{} tokens and all.

--js




Re: [Qemu-devel] [PATCH v2 6/8] migration: Add multifd-compress parameter

2019-04-10 Thread Dr. David Alan Gilbert
* Juan Quintela (quint...@redhat.com) wrote:
> Signed-off-by: Juan Quintela 
> 
> ---
> Rename it to NONE
> ---
>  hmp.c| 17 +
>  hw/core/qdev-properties.c| 11 +++
>  include/hw/qdev-properties.h |  1 +
>  migration/migration.c| 16 
>  qapi/common.json | 13 +
>  qapi/migration.json  | 19 +++
>  tests/migration-test.c   | 13 ++---
>  7 files changed, 83 insertions(+), 7 deletions(-)
> 
> diff --git a/hmp.c b/hmp.c
> index 8eec768088..02fbe27757 100644
> --- a/hmp.c
> +++ b/hmp.c
> @@ -435,6 +435,9 @@ void hmp_info_migrate_parameters(Monitor *mon, const 
> QDict *qdict)
>  monitor_printf(mon, "%s: %u\n",
>  MigrationParameter_str(MIGRATION_PARAMETER_MULTIFD_CHANNELS),
>  params->multifd_channels);
> +monitor_printf(mon, "%s: %s\n",
> +MigrationParameter_str(MIGRATION_PARAMETER_MULTIFD_COMPRESS),
> +MultifdCompress_str(params->multifd_compress));
>  monitor_printf(mon, "%s: %" PRIu64 "\n",
>  MigrationParameter_str(MIGRATION_PARAMETER_XBZRLE_CACHE_SIZE),
>  params->xbzrle_cache_size);
> @@ -1737,6 +1740,7 @@ void hmp_migrate_set_parameter(Monitor *mon, const 
> QDict *qdict)
>  MigrateSetParameters *p = g_new0(MigrateSetParameters, 1);
>  uint64_t valuebw = 0;
>  uint64_t cache_size;
> +int compress_type;
>  Error *err = NULL;
>  int val, ret;
>  
> @@ -1822,6 +1826,19 @@ void hmp_migrate_set_parameter(Monitor *mon, const 
> QDict *qdict)
>  p->has_multifd_channels = true;
>  visit_type_int(v, param, >multifd_channels, );
>  break;
> +case MIGRATION_PARAMETER_MULTIFD_COMPRESS:
> +p->has_multifd_compress = true;
> +visit_type_enum(v, param, _type,
> +_lookup, );
> +if (err) {
> +break;
> +}
> +if (compress_type < 0 || compress_type > MULTIFD_COMPRESS__MAX) {

I still think that needs to be >= rather than > ?

Dave

> +error_setg(, "Invalid multifd_compress option %s", valuestr);
> +break;
> +}
> +p->multifd_compress = compress_type;
> +break;
>  case MIGRATION_PARAMETER_XBZRLE_CACHE_SIZE:
>  p->has_xbzrle_cache_size = true;
>  visit_type_size(v, param, _size, );
> diff --git a/hw/core/qdev-properties.c b/hw/core/qdev-properties.c
> index 5da1439a8b..7c8e71532f 100644
> --- a/hw/core/qdev-properties.c
> +++ b/hw/core/qdev-properties.c
> @@ -645,6 +645,17 @@ const PropertyInfo qdev_prop_fdc_drive_type = {
>  .set_default_value = set_default_value_enum,
>  };
>  
> +/* --- MultifdCompress --- */
> +
> +const PropertyInfo qdev_prop_multifd_compress = {
> +.name = "MultifdCompress",
> +.description = "multifd_compress values",
> +.enum_table = _lookup,
> +.get = get_enum,
> +.set = set_enum,
> +.set_default_value = set_default_value_enum,
> +};
> +
>  /* --- pci address --- */
>  
>  /*
> diff --git a/include/hw/qdev-properties.h b/include/hw/qdev-properties.h
> index b6758c852e..ac452d8f2c 100644
> --- a/include/hw/qdev-properties.h
> +++ b/include/hw/qdev-properties.h
> @@ -23,6 +23,7 @@ extern const PropertyInfo qdev_prop_tpm;
>  extern const PropertyInfo qdev_prop_ptr;
>  extern const PropertyInfo qdev_prop_macaddr;
>  extern const PropertyInfo qdev_prop_on_off_auto;
> +extern const PropertyInfo qdev_prop_multifd_compress;
>  extern const PropertyInfo qdev_prop_losttickpolicy;
>  extern const PropertyInfo qdev_prop_blockdev_on_error;
>  extern const PropertyInfo qdev_prop_bios_chs_trans;
> diff --git a/migration/migration.c b/migration/migration.c
> index 609e0df5d0..d6f8ef342a 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -82,6 +82,7 @@
>  /* The delay time (in ms) between two COLO checkpoints */
>  #define DEFAULT_MIGRATE_X_CHECKPOINT_DELAY (200 * 100)
>  #define DEFAULT_MIGRATE_MULTIFD_CHANNELS 2
> +#define DEFAULT_MIGRATE_MULTIFD_COMPRESS MULTIFD_COMPRESS_NONE
>  
>  /* Background transfer rate for postcopy, 0 means unlimited, note
>   * that page requests can still exceed this limit.
> @@ -769,6 +770,8 @@ MigrationParameters *qmp_query_migrate_parameters(Error 
> **errp)
>  params->block_incremental = s->parameters.block_incremental;
>  params->has_multifd_channels = true;
>  params->multifd_channels = s->parameters.multifd_channels;
> +params->has_multifd_compress = true;
> +params->multifd_compress = s->parameters.multifd_compress;
>  params->has_xbzrle_cache_size = true;
>  params->xbzrle_cache_size = s->parameters.xbzrle_cache_size;
>  params->has_max_postcopy_bandwidth = true;
> @@ -1268,6 +1271,9 @@ static void 
> migrate_params_test_apply(MigrateSetParameters *params,
>  if (params->has_multifd_channels) {
>  dest->multifd_channels = params->multifd_channels;
>  }
> +if 

Re: [Qemu-devel] [PATCH] roms: List and describe the Makefile 'clean' rule

2019-04-10 Thread Laszlo Ersek
On 04/10/19 07:36, Philippe Mathieu-Daudé wrote:
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
>  roms/Makefile | 10 --
>  1 file changed, 8 insertions(+), 2 deletions(-)
> 
> diff --git a/roms/Makefile b/roms/Makefile
> index 1ff78b63bb3..f55c4a2d3bb 100644
> --- a/roms/Makefile
> +++ b/roms/Makefile
> @@ -61,6 +61,8 @@ default:
>   @echo "  skiboot-- update skiboot.lid"
>   @echo "  u-boot.e500-- update u-boot.e500"
>   @echo "  u-boot.sam460  -- update u-boot.sam460"
> + @echo "  clean  -- delete the files generated by the previous" \
> +   "build targets"
>  
>  bios: build-seabios-config-seabios-128k build-seabios-config-seabios-256k
>   cp seabios/builds/seabios-128k/bios.bin ../pc-bios/bios.bin

This hunk makes sense in general, but it will conflict with my

  [Qemu-devel] [PATCH for-4.1 v4 07/12]
  roms: build edk2 firmware binaries and variable store templates

Please let's delay this hunk until after my pull...

> @@ -102,7 +104,7 @@ pxe-rom-%: build-pxe-roms
>  
>  efirom: $(patsubst %,efi-rom-%,$(pxerom_variants))
>  
> -efi-rom-%: build-pxe-roms build-efi-roms $(EDK2_EFIROM)
> +efi-rom-%: build-pxe-roms build-efi-roms edk2-basetools
>   $(EDK2_EFIROM) -f "0x$(VID)" -i "0x$(DID)" -l 0x02 \
>   -b ipxe/src/bin/$(VID)$(DID).rom \
>   -ec ipxe/src/bin-i386-efi/$(VID)$(DID).efidrv \

This hunk doesn't belong in this patch at all -- it's from my patch

  [Qemu-devel] [PATCH for-4.1 v4 06/12]
  roms/Makefile: replace the $(EDK2_EFIROM) target with "edk2-basetools"

(The hunk also conflicts with the purpose stated in $SUBJECT.)

> @@ -131,7 +133,7 @@ build-efi-roms: build-pxe-roms
>  #EDK2_BASETOOLS_LDFLAGS='...' \
>  #efirom
>  #
> -$(EDK2_EFIROM):
> +edk2-basetools:
>   $(MAKE) -C edk2/BaseTools \
>   EXTRA_OPTFLAGS='$(EDK2_BASETOOLS_OPTFLAGS)' \
>   EXTRA_LDFLAGS='$(EDK2_BASETOOLS_LDFLAGS)'

Ditto.

> @@ -156,6 +158,9 @@ skiboot:
>   $(MAKE) -C skiboot CROSS=$(powerpc64_cross_prefix)
>   cp skiboot/skiboot.lid ../pc-bios/skiboot.lid
>  
> +efi: edk2-basetools
> + $(MAKE) -f Makefile.edk2
> +
>  clean:
>   rm -rf seabios/.config seabios/out seabios/builds
>   $(MAKE) -C sgabios clean

And this seems to belong to:

  [Qemu-devel] [PATCH for-4.1 v4 07/12]
  roms: build edk2 firmware binaries and variable store templates

> @@ -166,3 +171,4 @@ clean:
>   rm -rf u-boot/build.e500
>   $(MAKE) -C u-boot-sam460ex distclean
>   $(MAKE) -C skiboot clean
> + $(MAKE) -f Makefile.edk2 clean
> 

Ditto.

So, for qemu-trivial's sake:

Nacked-by: Laszlo Ersek 

Thanks
Laszlo



Re: [Qemu-devel] [Qemu-block] [PULL 07/14] iotests: Add 241 to test NBD on unaligned images

2019-04-10 Thread Max Reitz
On 01.04.19 16:08, Eric Blake wrote:
> Add a test for the NBD client workaround in the previous patch.  It's
> not really feasible for an iotest to assume a specific tracing engine,
> so we can't really probe trace_nbd_parse_blockstatus_compliance to see
> if the server was fixed vs. whether the client just worked around the
> server (other than by rearranging order between code patches and this
> test). But having a successful exchange sure beats the previous state
> of an error message. Since format probing can change alignment, we can
> use that as an easy way to test several configurations.
> 
> Not tested yet, but worth adding to this test in future patches: an
> NBD server that can advertise a non-sector-aligned size (such as
> nbdkit) causes qemu as the NBD client to misbehave when it rounds the
> size up and accesses beyond the advertised size. Qemu as NBD server
> never advertises a non-sector-aligned size (since bdrv_getlength()
> currently rounds up to sector boundaries); until qemu can act as such
> a server, testing that flaw will have to rely on external binaries.
> 
> Signed-off-by: Eric Blake 
> Message-Id: <20190329042750.14704-2-ebl...@redhat.com>
> Tested-by: Vladimir Sementsov-Ogievskiy 
> Reviewed-by: Vladimir Sementsov-Ogievskiy 
> [eblake: add forced-512 alignment, and nbdkit reproducer comment]
> ---
>  tests/qemu-iotests/241 | 100 +
>  tests/qemu-iotests/241.out |  26 ++
>  tests/qemu-iotests/group   |   1 +
>  3 files changed, 127 insertions(+)
>  create mode 100755 tests/qemu-iotests/241
>  create mode 100644 tests/qemu-iotests/241.out
> 
> diff --git a/tests/qemu-iotests/241 b/tests/qemu-iotests/241
> new file mode 100755
> index 000..4b196857387
> --- /dev/null
> +++ b/tests/qemu-iotests/241
> @@ -0,0 +1,100 @@
> +#!/bin/bash
> +#
> +# Test qemu-nbd vs. unaligned images
> +#
> +# Copyright (C) 2018-2019 Red Hat, Inc.
> +#
> +# This program is free software; you can redistribute it and/or modify
> +# it under the terms of the GNU General Public License as published by
> +# the Free Software Foundation; either version 2 of the License, or
> +# (at your option) any later version.
> +#
> +# This program is distributed in the hope that it will be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +# GNU General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with this program.  If not, see .
> +#
> +
> +seq="$(basename $0)"
> +echo "QA output created by $seq"
> +
> +status=1 # failure is the default!
> +
> +nbd_unix_socket=$TEST_DIR/test_qemu_nbd_socket
> +
> +_cleanup()
> +{
> +_cleanup_test_img
> +nbd_server_stop
> +}
> +trap "_cleanup; exit \$status" 0 1 2 3 15
> +
> +# get standard environment, filters and checks
> +. ./common.rc
> +. ./common.filter
> +. ./common.nbd
> +
> +_supported_fmt raw
> +_supported_proto nbd
> +_supported_os Linux
> +_require_command QEMU_NBD
> +
> +# can't use _make_test_img, because qemu-img rounds image size up,
> +# and because we want to use Unix socket rather than TCP port. Likewise,
> +# we have to redirect TEST_IMG to our server.
> +# This tests that we can deal with the hole at the end of an unaligned
> +# raw file (either because the server doesn't advertise alignment too
> +# large, or because the client ignores the server's noncompliance - even
> +# though we can't actually wire iotests into checking trace messages).
> +printf %01000d 0 > "$TEST_IMG_FILE"
> +TEST_IMG="nbd:unix:$nbd_unix_socket"
> +
> +echo
> +echo "=== Exporting unaligned raw image, natural alignment ==="
> +echo
> +
> +nbd_server_start_unix_socket -f $IMGFMT "$TEST_IMG_FILE"
> +
> +$QEMU_NBD_PROG --list -k $nbd_unix_socket | grep '\(size\|min\)'
> +$QEMU_IMG map -f raw --output=json "$TEST_IMG" | _filter_qemu_img_map
> +$QEMU_IO -f raw -c map "$TEST_IMG"
> +nbd_server_stop
> +
> +echo
> +echo "=== Exporting unaligned raw image, forced server sector alignment ==="
> +echo
> +
> +# Intentionally omit '-f' to force image probing, which in turn forces
> +# sector alignment, here at the server.
> +nbd_server_start_unix_socket "$TEST_IMG_FILE"
> +
> +$QEMU_NBD_PROG --list -k $nbd_unix_socket | grep '\(size\|min\)'
> +$QEMU_IMG map -f raw --output=json "$TEST_IMG" | _filter_qemu_img_map
> +$QEMU_IO -f raw -c map "$TEST_IMG"
> +nbd_server_stop
> +
> +echo
> +echo "=== Exporting unaligned raw image, forced client sector alignment ==="
> +echo
> +
> +# Now force sector alignment at the client.
> +nbd_server_start_unix_socket -f $IMGFMT "$TEST_IMG_FILE"
> +
> +$QEMU_NBD_PROG --list -k $nbd_unix_socket | grep '\(size\|min\)'
> +$QEMU_IMG map --output=json "$TEST_IMG" | _filter_qemu_img_map
> +$QEMU_IO -c map "$TEST_IMG"
> +nbd_server_stop
> +
> +# Not tested yet: we also want to ensure that qemu as NBD client does
> +# not access 

Re: [Qemu-devel] [PATCH] qemu-img: fix .hx and .texi disparity

2019-04-10 Thread John Snow



On 4/10/19 1:39 AM, Markus Armbruster wrote:
> John Snow  writes:
> 
>> It turns out that having options listed in three places continues to be
>> a bad idea. I'm still toying with the idea of an improved infrastructure
>> here, but in the meantime, another bandaid.
>>
>> There are three locations:
>> (1) .hx file, formatted as texi
>> (2) .hx file, formatted as human readable.
>> (3) .texi file, as section headers, formatted as texi.
>>
>> You can compare the two summaries within the .hx file like so:
>>
>> Human-readable command summaries:
>> `./qemu-img --help | grep 'Command syntax' -A14`
>> Detokenized texi command summaries:
>> `grep "@item" qemu-img-cmds.hx | sed -E 's|@var\{([^\}]*?)\}|\1|g'`
>>
>> You can compare the two separate texi summaries like so:
>>
>> Texi command summaries:
>> `grep "@item" qemu-img-cmds.hx"`
>> Texi command headers:
>> grep -E "@item.*@var" qemu-img.texi | tail -14
>>
>> Signed-off-by: John Snow 
> 
> Extra points for explaining how you found the issues in detail
> sufficient for reproducing.
> 
> Nice to have in 4.0.
> 
> Reviewed-by: Markus Armbruster 
> 

If it can go in for 4.0, who stages it? Kevin?

--js



Re: [Qemu-devel] [PATCH RESEND v2 2/2] cpus: move hvf_cpu_synchronize* calls to cpu_synchronize* functions

2019-04-10 Thread Paolo Bonzini
On 10/04/19 16:02, Roman Bolshakov wrote:
> I've applied, built and tested both sequentially. Applying and running
> with patch 1/2 alone doesn't result in the behavior I mentioned. I also
> tried to apply only the first hunk that moves hvf_cpu_synchronize_state
> into cpu_synchronize_state and it also causes the issue with BIOS.
> 
> Honestly, I don't know if the tests run qemu with hvf accel. AFAIK
> kvm-unit-tests can be used to test an accel itself.

No, tests do not cover HVF.

Paolo



Re: [Qemu-devel] [PATCH v2 13/13] qemu-iotests: Test the x-blockdev-reopen QMP command

2019-04-10 Thread Max Reitz
On 06.03.19 19:11, Alberto Garcia wrote:
> This patch adds several tests for the x-blockdev-reopen QMP command.
> 
> Signed-off-by: Alberto Garcia 
> ---
>  tests/qemu-iotests/243 | 991 
> +
>  tests/qemu-iotests/243.out |   5 +
>  tests/qemu-iotests/group   |   1 +
>  3 files changed, 997 insertions(+)
>  create mode 100644 tests/qemu-iotests/243
>  create mode 100644 tests/qemu-iotests/243.out
> 
> diff --git a/tests/qemu-iotests/243 b/tests/qemu-iotests/243
> new file mode 100644
> index 00..35b30092ca
> --- /dev/null
> +++ b/tests/qemu-iotests/243
> @@ -0,0 +1,991 @@

[...]

> +# If an image has a backing file then the 'backing' option must be
> +# passed on reopen. We don't allow leaving the option out in this
> +# case because it's unclear what the correct semantics would be.
> +def test_missing_backing_options_1(self):

[...]

> +# hd0 has no backing file: we can omit the 'backing' option
> +self.reopen(opts)

[...]

> +# Detach hd2 from hd0.
> +self.reopen(opts, {'backing': None})
> +self.reopen(opts, {}, "backing is missing for 'hd0'")

I don’t understand the second test.  hd0 has no default backing file and
it currently has no backing child attached to it.  Why would this call
fail now?

I’m asking because this no longer throws an error after “block: Leave
BDS.backing_file constant” of my “block: Deal with filters” series.

Max



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [PATCH] hostmem-file: warn when memory-backend-file, share=on and in incoming migration

2019-04-10 Thread Dr. David Alan Gilbert
* Catherine Ho (catherine.h...@gmail.com) wrote:
> Hi Dr. David
> 
> On Wed, 10 Apr 2019 at 22:59, Dr. David Alan Gilbert 
> wrote:
> 
> > * Catherine Ho (catherine.h...@gmail.com) wrote:
> > > Hi Igor
> > >
> > >
> > > On Mon, 8 Apr 2019 at 18:35, Igor Mammedov  wrote:
> > >
> > > > On Sun,  7 Apr 2019 22:19:05 -0400
> > > > Catherine Ho  wrote:
> > > >
> > > > > Currently it is not forbidden to use "-object
> > > > memory-backend-file,share=on"
> > > > > and together with "-incoming". But after incoming migration is
> > finished,
> > > > > the memory-backend-file will be definitely written if share=on. So
> > the
> > > > > memory-backend-file can only be used once, but failed in the 2nd time
> > > > > incoming.
> > > > >
> > > > > Thus it gives a warning and the users can run the qemu if they really
> > > > > want to do it.
> > > >
> > > > Shouldn't we add a migration blocker in such a case instead of warning
> > > > and letting qemu run wild?
> > > >
> > >
> > > IMO, it doesn't need to block this. With share=on and -incoming, the user
> > > can
> > > still save the device memory state into memory-backend file again if
> > > ignore-shared
> > > capability is on.
> > >
> > > If we block this, the user can't use the ignore-shared capability in
> > > incoming
> > > migration.
> >
> > -incomign with share=on is a perfectly normal thing to do - it just
> > depends who you are sharing the file with and the lifetime of that
> > shared file.
> >
> > For example; if you're just running a qemu with vhost-user then you
> > use share=on - however wyou typically select the backend file as
> > a new file from /dev/shm - it's not a file that you previously migrated
> > to.
> >
> Thanks,
> but using a new file from /dev/shm means kernel will start from
> start_kernel or early? Is it different from the x-ignore-shared case?
> If we remove the share=on in incoming migration, all the writting
> of ram will not be flush into the memory backend file. Thus we
> can use the base memory backend file for ever.
> e.g.
> 1) save the vm like a snapshot, current ram state is "kernel
> has been started, systemd has been started"
> 2) restore it with -incoming and *no* share=on flag
> 3) restore it with -incoming and *no* share=on again...
> In contrary, if we use share=on, the base backend file will
> be written at once after 1st time incoming.
> 
> So, IMO, no "share=on" is the proper usage of incoming migration
> when ignore-shared is on.
> Please correct me if sth is wrong, thanks:)

OK, I see what you're trying to do - you mean for the 'snapshotting'
case;  but that's not the only use.  Another use is for being able to
do a very quick upgrade of the running qemu to a new qemu binary
version; and in that case you want to be able to write to the shared
file so that you can repeatedly do the quick migrate.

Dave

> 
> B.R.
> Catherine
> 
> 
> > QEMU has no way to know about the provenance of the shared file it's
> > been given, so we can't really warn people about it.
> >
> > Dave
> >
> > > B.R.
> > > Catherine
> > >
> > > >
> > > >
> > > > > Signed-off-by: Catherine Ho 
> > > > > ---
> > > > >  backends/hostmem-file.c | 11 +++
> > > > >  1 file changed, 11 insertions(+)
> > > > >
> > > > > diff --git a/backends/hostmem-file.c b/backends/hostmem-file.c
> > > > > index 37ac6445d2..59429ee0b4 100644
> > > > > --- a/backends/hostmem-file.c
> > > > > +++ b/backends/hostmem-file.c
> > > > > @@ -16,6 +16,7 @@
> > > > >  #include "sysemu/hostmem.h"
> > > > >  #include "sysemu/sysemu.h"
> > > > >  #include "qom/object_interfaces.h"
> > > > > +#include "migration/migration.h"
> > > > >
> > > > >  /* hostmem-file.c */
> > > > >  /**
> > > > > @@ -79,6 +80,16 @@ file_backend_memory_alloc(HostMemoryBackend
> > *backend,
> > > > Error **errp)
> > > > >  }
> > > > >  }
> > > > >
> > > > > +/*
> > > > > + * In ignore shared case, if share=on for host memory backend
> > file,
> > > > > + * the ram might be written after incoming process is finished.
> > Thus
> > > > > + * the memory backend can't be reused for 2nd/3rd... incoming
> > > > > + */
> > > > > +if (backend->share && migrate_ignore_shared()
> > > > > +   && runstate_check(RUN_STATE_INMIGRATE))
> > > > > +warn_report("share=on for memory backend file might be "
> > > > > +"conflicted with incoming in ignore shared
> > > > case");
> > > > > +
> > > > >  backend->force_prealloc = mem_prealloc;
> > > > >  name = host_memory_backend_get_name(backend);
> > > > >  memory_region_init_ram_from_file(>mr, OBJECT(backend),
> > > >
> > > >
> > --
> > Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
> >
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK



Re: [Qemu-devel] [PATCH for-4.0?] iotests: Let 245 pass on tmpfs

2019-04-10 Thread Peter Maydell
On Wed, 10 Apr 2019 at 17:40, Max Reitz  wrote:
>
> On 10.04.19 18:39, Eric Blake wrote:
> > On 4/10/19 11:29 AM, Max Reitz wrote:
> >> tmpfs does not support O_DIRECT.  Detect this case, and skip flipping
> >> @direct if the filesystem does not support it.
> >>
> >> Fixes: bf3e50f6239090e63a8ffaaec971671e66d88e07
> >> Signed-off-by: Max Reitz 
> >> ---
> >>  tests/qemu-iotests/245 | 10 --
> >>  1 file changed, 8 insertions(+), 2 deletions(-)
> >
> > Are you trying to get this in 4.0-rc3? (As a test, it has no bearing on
> > the actual binaries; fewer testsuite failures are nice if we squeeze it
> > in, but at the same time it is not enough to delay the release if it
> > does not get fixed until 4.1).
> >
> > Reviewed-by: Eric Blake 
>
> If there are other patches for rc3, we could take this as well.  But
> there is no point in making a pull request just for this.

rc3 has been tagged already, I'm afraid. rc4 will be only
if something release-critical crops up.

thanks
-- PMM



Re: [Qemu-devel] [PATCH v5 2/5] virtio-pmem: Add virtio pmem driver

2019-04-10 Thread Cornelia Huck
On Wed, 10 Apr 2019 12:46:12 -0400
"Michael S. Tsirkin"  wrote:

> On Wed, Apr 10, 2019 at 04:31:39PM +0200, Cornelia Huck wrote:
> > On Wed, 10 Apr 2019 10:03:01 -0400 (EDT)
> > Pankaj Gupta  wrote:
> >   
> > > > 
> > > > On Wed, Apr 10, 2019 at 09:38:22AM +0530, Pankaj Gupta wrote:
> > > > > This patch adds virtio-pmem driver for KVM guest.
> > > > > 
> > > > > Guest reads the persistent memory range information from
> > > > > Qemu over VIRTIO and registers it on nvdimm_bus. It also
> > > > > creates a nd_region object with the persistent memory
> > > > > range information so that existing 'nvdimm/pmem' driver
> > > > > can reserve this into system memory map. This way
> > > > > 'virtio-pmem' driver uses existing functionality of pmem
> > > > > driver to register persistent memory compatible for DAX
> > > > > capable filesystems.
> > > > > 
> > > > > This also provides function to perform guest flush over
> > > > > VIRTIO from 'pmem' driver when userspace performs flush
> > > > > on DAX memory range.
> > > > > 
> > > > > Signed-off-by: Pankaj Gupta   
> >   
> > > > > diff --git a/include/uapi/linux/virtio_ids.h
> > > > > b/include/uapi/linux/virtio_ids.h
> > > > > index 6d5c3b2d4f4d..346389565ac1 100644
> > > > > --- a/include/uapi/linux/virtio_ids.h
> > > > > +++ b/include/uapi/linux/virtio_ids.h
> > > > > @@ -43,5 +43,6 @@
> > > > >  #define VIRTIO_ID_INPUT18 /* virtio input */
> > > > >  #define VIRTIO_ID_VSOCK19 /* virtio vsock transport */
> > > > >  #define VIRTIO_ID_CRYPTO   20 /* virtio crypto */
> > > > > +#define VIRTIO_ID_PMEM 25 /* virtio pmem */
> > > > >  
> > > > >  #endif /* _LINUX_VIRTIO_IDS_H */
> > > > 
> > > > Didn't Paolo point out someone is using 25 for audio?
> > > 
> > > 
> > > Sorry! I did not notice this.
> > >   
> > > > 
> > > > Please, reserve an ID with the Virtio TC before using it.
> > > 
> > > I thought I requested[1] ID 25.
> > > 
> > > [1] https://github.com/oasis-tcs/virtio-spec/issues/38
> > > [2] https://lists.oasis-open.org/archives/virtio-dev/201805/threads.html
> > > 
> > > In that case I will send request for next available ID i.e 26?  
> > 
> > Have the folks using this for audio sent a reservation request as well?
> > If not, I'd say the first requester should get the id...  
> 
> No but I think they ship their's already. No one ships pmem
> so it's less pain for everyone if we just skip 25.

Ugh. Ok, then we should change pmem...

> 
> > (And yes, we need to be quicker with voting on/applying id
> > reservations :/)  
> 
> We can't vote on what was not proposed for a vote.
> At the moment that responsibility is with the commented
> once discussion on the comment has taken place.
> 
> I think what's missing is a description of the process
> in the README. Want to write it up and post it?

I can add that to my TODO list, can't promise speedy resolution, though.



Re: [Qemu-devel] [Qemu-block] [RFC PATCH] aio: Add a knob to always poll if there are in-flight requests

2019-04-10 Thread Sergio Lopez

Stefan Hajnoczi  writes:

> On Tue, Apr 02, 2019 at 02:19:08PM +0200, Sergio Lopez wrote:
>> The polling mode in aio_poll is able to trim down ~20us on the average
>> request latency, but it needs manual fine tuning to adjust it to the
>> characteristics of the storage.
>> 
>> Here we add a new knob to the IOThread object, "poll-inflight". When
>> this knob is enabled, aio_poll will always use polling if there are
>> in-flight requests, ignoring the rest of poll-* parameters. If there
>> aren't any in-flight requests, the usual polling rules apply, which is
>> useful given that the default poll-max-ns value of 32us is usually
>> enough to catch a new request in the VQ when the Guest is putting
>> pressure on us.
>> 
>> To keep track of the number of in-flight requests, AioContext has a
>> new counter which is increased/decreased by thread-pool.c and
>> linux-aio.c on request submission/completion.
>> 
>> With poll-inflight, users willing to spend more Host CPU resources in
>> exchange for a lower latency just need to enable a single knob.
>> 
>> This is just an initial version of this feature and I'm just sharing
>> it to get some early feedback. As such, managing this property through
>> QAPI is not yet implemented.
>> 
>> Signed-off-by: Sergio Lopez 
>> ---
>>  block/linux-aio.c |  7 +++
>>  include/block/aio.h   |  9 -
>>  include/sysemu/iothread.h |  1 +
>>  iothread.c| 33 +
>>  util/aio-posix.c  | 32 +++-
>>  util/thread-pool.c|  3 +++
>>  6 files changed, 83 insertions(+), 2 deletions(-)
>
> Hi Sergio,
> More polling modes are useful for benchmarking and performance analysis.
> From this perspective I think poll-inflight is worthwhile.
>
> Like most performance optimizations, the effectiveness of this new
> polling mode depends on the workload.  It could waste CPU, especially on
> a queue depth 1 workload with a slow disk.
>
> Do you think better self-tuning is possible?  Then users don't need to
> set tunables like this one.

Probably only if we aim for some more complex, which will have its own
inherent costs. We could take inspiration from Linux's io_poll hybrid
mode, which maintains per-device statistics to calculate the average
latency, to take a nap for half that time and free the CPU a bit.

Of course, our case is significantly harder. The kernel only deals with
the HW, and only a few devices do have support for io_poll. In our case,
the IOThread may be shared among various devices with radically
different backends, which may also have a wide range of latencies
(depending on the underlying storage, file format, cache mode...). But
perhaps we can try to be clever and calculate the standard deviation
of the collected data to (in)validate the stats.

There are also some implementation challenges, as deciding where to
store those stats and designing an interface for aio_poll to access
that information, preferably in a lockless fashion.

If we can figure those out, we should be able to iterate over all the
BDSs sharing the AioContext, using the average latency (if valid),
combined with a timestamp from when the first in-flight request was
issued, to calculate a deadline and, with it, decide if we should either
take a nap using ppoll() with a timeout calculated to be wake up early
enough to catch the completion while polling, or just enter polling mode
for a while.

Perhaps it'd be worth doing a simple PoC outside QEMU, using the
vhost-user-blk example server to avoid the block layer complexity and
evaluate the raw benefits with different kinds of backends and
workloads.

Thanks,
Sergio.


signature.asc
Description: PGP signature


Re: [Qemu-devel] [PATCH v5 2/5] virtio-pmem: Add virtio pmem driver

2019-04-10 Thread Michael S. Tsirkin
On Wed, Apr 10, 2019 at 04:31:39PM +0200, Cornelia Huck wrote:
> On Wed, 10 Apr 2019 10:03:01 -0400 (EDT)
> Pankaj Gupta  wrote:
> 
> > > 
> > > On Wed, Apr 10, 2019 at 09:38:22AM +0530, Pankaj Gupta wrote:  
> > > > This patch adds virtio-pmem driver for KVM guest.
> > > > 
> > > > Guest reads the persistent memory range information from
> > > > Qemu over VIRTIO and registers it on nvdimm_bus. It also
> > > > creates a nd_region object with the persistent memory
> > > > range information so that existing 'nvdimm/pmem' driver
> > > > can reserve this into system memory map. This way
> > > > 'virtio-pmem' driver uses existing functionality of pmem
> > > > driver to register persistent memory compatible for DAX
> > > > capable filesystems.
> > > > 
> > > > This also provides function to perform guest flush over
> > > > VIRTIO from 'pmem' driver when userspace performs flush
> > > > on DAX memory range.
> > > > 
> > > > Signed-off-by: Pankaj Gupta 
> 
> > > > diff --git a/include/uapi/linux/virtio_ids.h
> > > > b/include/uapi/linux/virtio_ids.h
> > > > index 6d5c3b2d4f4d..346389565ac1 100644
> > > > --- a/include/uapi/linux/virtio_ids.h
> > > > +++ b/include/uapi/linux/virtio_ids.h
> > > > @@ -43,5 +43,6 @@
> > > >  #define VIRTIO_ID_INPUT18 /* virtio input */
> > > >  #define VIRTIO_ID_VSOCK19 /* virtio vsock transport */
> > > >  #define VIRTIO_ID_CRYPTO   20 /* virtio crypto */
> > > > +#define VIRTIO_ID_PMEM 25 /* virtio pmem */
> > > >  
> > > >  #endif /* _LINUX_VIRTIO_IDS_H */  
> > > 
> > > Didn't Paolo point out someone is using 25 for audio?  
> > 
> > 
> > Sorry! I did not notice this.
> > 
> > > 
> > > Please, reserve an ID with the Virtio TC before using it.  
> > 
> > I thought I requested[1] ID 25.
> > 
> > [1] https://github.com/oasis-tcs/virtio-spec/issues/38
> > [2] https://lists.oasis-open.org/archives/virtio-dev/201805/threads.html
> > 
> > In that case I will send request for next available ID i.e 26?
> 
> Have the folks using this for audio sent a reservation request as well?
> If not, I'd say the first requester should get the id...

No but I think they ship their's already. No one ships pmem
so it's less pain for everyone if we just skip 25.

> (And yes, we need to be quicker with voting on/applying id
> reservations :/)

We can't vote on what was not proposed for a vote.
At the moment that responsibility is with the commented
once discussion on the comment has taken place.

I think what's missing is a description of the process
in the README. Want to write it up and post it?

-- 
MST



Re: [Qemu-devel] [PATCH for-4.0?] iotests: Let 245 pass on tmpfs

2019-04-10 Thread Max Reitz
On 10.04.19 18:39, Eric Blake wrote:
> On 4/10/19 11:29 AM, Max Reitz wrote:
>> tmpfs does not support O_DIRECT.  Detect this case, and skip flipping
>> @direct if the filesystem does not support it.
>>
>> Fixes: bf3e50f6239090e63a8ffaaec971671e66d88e07
>> Signed-off-by: Max Reitz 
>> ---
>>  tests/qemu-iotests/245 | 10 --
>>  1 file changed, 8 insertions(+), 2 deletions(-)
> 
> Are you trying to get this in 4.0-rc3? (As a test, it has no bearing on
> the actual binaries; fewer testsuite failures are nice if we squeeze it
> in, but at the same time it is not enough to delay the release if it
> does not get fixed until 4.1).
> 
> Reviewed-by: Eric Blake 

If there are other patches for rc3, we could take this as well.  But
there is no point in making a pull request just for this.

Max



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [PATCH for-4.0?] iotests: Let 245 pass on tmpfs

2019-04-10 Thread Eric Blake
On 4/10/19 11:29 AM, Max Reitz wrote:
> tmpfs does not support O_DIRECT.  Detect this case, and skip flipping
> @direct if the filesystem does not support it.
> 
> Fixes: bf3e50f6239090e63a8ffaaec971671e66d88e07
> Signed-off-by: Max Reitz 
> ---
>  tests/qemu-iotests/245 | 10 --
>  1 file changed, 8 insertions(+), 2 deletions(-)

Are you trying to get this in 4.0-rc3? (As a test, it has no bearing on
the actual binaries; fewer testsuite failures are nice if we squeeze it
in, but at the same time it is not enough to delay the release if it
does not get fixed until 4.1).

Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature


[Qemu-devel] [PATCH] target/riscv: Remove unused include of riscv_htif.h for virt board

2019-04-10 Thread Jonathan Behrens
Unless I'm missing something, the virt board doesn't support HTIF and
should not be including this header.

Jonathan

Signed-off-by: Jonathan Behrens 
---
 hw/riscv/virt.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/hw/riscv/virt.c b/hw/riscv/virt.c
index fc4c6b306e..3526463034 100644
--- a/hw/riscv/virt.c
+++ b/hw/riscv/virt.c
@@ -29,7 +29,6 @@
 #include "hw/sysbus.h"
 #include "hw/char/serial.h"
 #include "target/riscv/cpu.h"
-#include "hw/riscv/riscv_htif.h"
 #include "hw/riscv/riscv_hart.h"
 #include "hw/riscv/sifive_plic.h"
 #include "hw/riscv/sifive_clint.h"
-- 
2.20.1


[Qemu-devel] [PATCH] iotests: Let 245 pass on tmpfs

2019-04-10 Thread Max Reitz
tmpfs does not support O_DIRECT.  Detect this case, and skip flipping
@direct if the filesystem does not support it.

Fixes: bf3e50f6239090e63a8ffaaec971671e66d88e07
Signed-off-by: Max Reitz 
---
 tests/qemu-iotests/245 | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/tests/qemu-iotests/245 b/tests/qemu-iotests/245
index 7891a210c1..a04c6235c1 100644
--- a/tests/qemu-iotests/245
+++ b/tests/qemu-iotests/245
@@ -209,6 +209,12 @@ class TestBlockdevReopen(iotests.QMPTestCase):
 
 # Reopen an image several times changing some of its options
 def test_reopen(self):
+# Check whether the filesystem supports O_DIRECT
+if 'O_DIRECT' in qemu_io('-f', 'raw', '-t', 'none', '-c', 'quit', 
hd_path[0]):
+supports_direct = False
+else:
+supports_direct = True
+
 # Open the hd1 image passing all backing options
 opts = hd_opts(1)
 opts['backing'] = hd_opts(0)
@@ -231,9 +237,9 @@ class TestBlockdevReopen(iotests.QMPTestCase):
 self.assert_qmp(self.get_node('hd1'), 'cache/writeback', True)
 self.assert_qmp(self.get_node('hd1'), 'cache/direct', False)
 self.assert_qmp(self.get_node('hd1'), 'cache/no-flush', False)
-self.reopen(opts, {'cache': { 'direct': True, 'no-flush': True }})
+self.reopen(opts, {'cache': { 'direct': supports_direct, 'no-flush': 
True }})
 self.assert_qmp(self.get_node('hd1'), 'cache/writeback', True)
-self.assert_qmp(self.get_node('hd1'), 'cache/direct', True)
+self.assert_qmp(self.get_node('hd1'), 'cache/direct', supports_direct)
 self.assert_qmp(self.get_node('hd1'), 'cache/no-flush', True)
 
 # Reopen again with the original options
-- 
2.20.1




Re: [Qemu-devel] [PATCH 2/3] xen-bus: allow AioContext to be specified for each event channel

2019-04-10 Thread Paul Durrant
> -Original Message-
> From: Anthony PERARD [mailto:anthony.per...@citrix.com]
> Sent: 10 April 2019 16:23
> To: Paul Durrant 
> Cc: qemu-devel@nongnu.org; qemu-bl...@nongnu.org; 
> xen-de...@lists.xenproject.org; Stefano Stabellini
> ; Stefan Hajnoczi ; Kevin Wolf 
> ; Max
> Reitz 
> Subject: Re: [PATCH 2/3] xen-bus: allow AioContext to be specified for each 
> event channel
> 
> On Wed, Apr 10, 2019 at 04:20:05PM +0100, Paul Durrant wrote:
> > > -Original Message-
> > > From: Anthony PERARD [mailto:anthony.per...@citrix.com]
> > > Sent: 10 April 2019 13:57
> > > To: Paul Durrant 
> > > Cc: qemu-devel@nongnu.org; qemu-bl...@nongnu.org; 
> > > xen-de...@lists.xenproject.org; Stefano
> Stabellini
> > > ; Stefan Hajnoczi ; Kevin 
> > > Wolf ;
> Max
> > > Reitz 
> > > Subject: Re: [PATCH 2/3] xen-bus: allow AioContext to be specified for 
> > > each event channel
> > >
> > > On Mon, Apr 08, 2019 at 04:16:16PM +0100, Paul Durrant wrote:
> > > > This patch adds an AioContext parameter to 
> > > > xen_device_bind_event_channel()
> > > > and then uses aio_set_fd_handler() to set the callback rather than
> > > > qemu_set_fd_handler().
> > > >
> > > > Signed-off-by: Paul Durrant 
> > > > ---
> > > > @@ -943,6 +944,7 @@ static void xen_device_event(void *opaque)
> > > >  }
> > > >
> > > >  XenEventChannel *xen_device_bind_event_channel(XenDevice *xendev,
> > > > +   AioContext *ctx,
> > > > unsigned int port,
> > > > XenEventHandler handler,
> > > > void *opaque, Error 
> > > > **errp)
> > > > @@ -968,8 +970,9 @@ XenEventChannel 
> > > > *xen_device_bind_event_channel(XenDevice *xendev,
> > > >  channel->handler = handler;
> > > >  channel->opaque = opaque;
> > > >
> > > > -qemu_set_fd_handler(xenevtchn_fd(channel->xeh), xen_device_event, 
> > > > NULL,
> > > > -channel);
> > > > +channel->ctx = ctx;
> > > > +aio_set_fd_handler(channel->ctx, xenevtchn_fd(channel->xeh), false,
> > > > +   xen_device_event, NULL, NULL, channel);
> > >
> > > I wonder if the `'is_external' parameter of aio_set_fd_handler shoud be
> > > `true' here, instead. That flag seems to be used when making a snapshot
> > > of a blockdev, for example.
> > >
> > > That was introduced by:
> > > dca21ef23ba48f6f1428c59f295a857e5dc203c8^..c07bc2c1658fffeee08eb46402b2f66d55b07586
> > >
> > > What do you think?
> >
> > Interesting. I admit I was merely transcribing what qemu_set_fd_handler() 
> > passes without really
> looking into the values. Looking at the arguments that virtio-blk passes to 
> aio_set_event_notifier()
> though, and what 'is_external' means, it would appear that setting it to true 
> is probably the right
> thing to do. Do you want me to send a v2 of the series or can you fix it up?
> 
> I'll fix that up.

Thanks,

  Paul

> 
> Reviewed-by: Anthony PERARD 
> 
> Thanks,
> 
> --
> Anthony PERARD



Re: [Qemu-devel] Questions about acpi interrupt link device's ‘_PRS' field

2019-04-10 Thread Paolo Bonzini
On 10/04/19 16:33, Li Qiang wrote:
> Hi all,
> 
>  
> 
> I see the link device ‘_PRS’  uses irq line 5, 10, 11 in
> ‘build_link_dev’ function.
> 
> But I never see the 5 lines uses in the guest, just uses 10 and 11.
> 
> Why this happen?  Maybe related with the guest?

Because the MADT table tells the guest to only use lines 10 and 11.  The
BIOS configures the chipset that way.

Paolo




Re: [Qemu-devel] [PATCH] xen-block: support feature-large-sector-size

2019-04-10 Thread Paul Durrant
> -Original Message-
> From: Anthony PERARD [mailto:anthony.per...@citrix.com]
> Sent: 10 April 2019 16:52
> To: Paul Durrant 
> Cc: qemu-devel@nongnu.org; xen-de...@lists.xenproject.org; 
> qemu-bl...@nongnu.org; Stefano Stabellini
> ; Stefan Hajnoczi ; Kevin Wolf 
> ; Max
> Reitz 
> Subject: Re: [PATCH] xen-block: support feature-large-sector-size
> 
> On Tue, Apr 09, 2019 at 05:40:38PM +0100, Paul Durrant wrote:
> > A recent Xen commit [1] clarified the semantics of sector based quantities
> > used in the blkif protocol such that it is now safe to create a xen-block
> > device with a logical_block_size != 512, as long as the device only
> > connects to a frontend advertizing 'feature-large-block-size'.
> >
> > This patch modifies xen-block accordingly. It also uses a stack variable
> > for the BlockBackend in xen_block_realize() to avoid repeated dereferencing
> > of the BlockConf pointer, and changes the parameters of
> > xen_block_dataplane_create() so that the BlockBackend pointer and sector
> > size are passed expicitly rather than implicitly via the BlockConf.
> >
> > These modifications have been tested against a recent Windows PV XENVBD
> > driver [2] using a xen-disk device with a 4kB logical block size.
> >
> > [1] 
> > http://xenbits.xen.org/gitweb/?p=xen.git;a=commit;h=67e1c050e36b2c9900cca83618e56189effbad98
> > [2] https://winpvdrvbuild.xenproject.org:8080/job/XENVBD-master/126
> >
> > Signed-off-by: Paul Durrant 
> > ---
> > diff --git a/hw/block/xen-block.c b/hw/block/xen-block.c
> > index ef635be4c2..05e890ad78 100644
> > --- a/hw/block/xen-block.c
> > +++ b/hw/block/xen-block.c
> > @@ -51,11 +51,25 @@ static void xen_block_connect(XenDevice *xendev, Error 
> > **errp)
> [...]
> > +if (xen_device_frontend_scanf(xendev, "feature-large-sector-size", 
> > "%u",
> > +  _large_sector_size) != 1) {
> > +feature_large_sector_size = 0;
> > +}
> > +
> > +if (feature_large_sector_size != 1 &&
> > +conf->logical_block_size != XEN_BLKIF_SECTOR_SIZE) {
> > +error_setg(errp, "logical_block_size != %u not supported",
> 
> Maybe add "by frontend" to the error message?

Yes, I'm fine with that addition.

> 
> > +   XEN_BLKIF_SECTOR_SIZE);
> > +return;
> > +}
> > +
> 
> With the question answered:
> Reviewed-by: Anthony PERARD 
> 

Thanks,

  Paul

> Thanks,
> 
> --
> Anthony PERARD



Re: [Qemu-devel] [PATCH] xen-block: support feature-large-sector-size

2019-04-10 Thread Anthony PERARD
On Tue, Apr 09, 2019 at 05:40:38PM +0100, Paul Durrant wrote:
> A recent Xen commit [1] clarified the semantics of sector based quantities
> used in the blkif protocol such that it is now safe to create a xen-block
> device with a logical_block_size != 512, as long as the device only
> connects to a frontend advertizing 'feature-large-block-size'.
> 
> This patch modifies xen-block accordingly. It also uses a stack variable
> for the BlockBackend in xen_block_realize() to avoid repeated dereferencing
> of the BlockConf pointer, and changes the parameters of
> xen_block_dataplane_create() so that the BlockBackend pointer and sector
> size are passed expicitly rather than implicitly via the BlockConf.
> 
> These modifications have been tested against a recent Windows PV XENVBD
> driver [2] using a xen-disk device with a 4kB logical block size.
> 
> [1] 
> http://xenbits.xen.org/gitweb/?p=xen.git;a=commit;h=67e1c050e36b2c9900cca83618e56189effbad98
> [2] https://winpvdrvbuild.xenproject.org:8080/job/XENVBD-master/126
> 
> Signed-off-by: Paul Durrant 
> ---
> diff --git a/hw/block/xen-block.c b/hw/block/xen-block.c
> index ef635be4c2..05e890ad78 100644
> --- a/hw/block/xen-block.c
> +++ b/hw/block/xen-block.c
> @@ -51,11 +51,25 @@ static void xen_block_connect(XenDevice *xendev, Error 
> **errp)
[...]
> +if (xen_device_frontend_scanf(xendev, "feature-large-sector-size", "%u",
> +  _large_sector_size) != 1) {
> +feature_large_sector_size = 0;
> +}
> +
> +if (feature_large_sector_size != 1 &&
> +conf->logical_block_size != XEN_BLKIF_SECTOR_SIZE) {
> +error_setg(errp, "logical_block_size != %u not supported",

Maybe add "by frontend" to the error message?

> +   XEN_BLKIF_SECTOR_SIZE);
> +return;
> +}
> +

With the question answered:
Reviewed-by: Anthony PERARD 

Thanks,

-- 
Anthony PERARD



[Qemu-devel] [PATCH for-4.0-maybe] device_tree: Fix integer overflowing in load_device_tree()

2019-04-10 Thread Philippe Mathieu-Daudé
On 4/10/19 8:34 AM, Alistair Francis wrote:
> On Tue, Apr 9, 2019 at 10:59 PM Markus Armbruster  wrote:
>> Philippe Mathieu-Daudé  writes:
>>> On 4/10/19 7:28 AM, Markus Armbruster wrote:
 Philippe Mathieu-Daudé  writes:
> On 4/9/19 7:40 PM, Markus Armbruster wrote:
>> If the value of get_image_size() exceeds INT_MAX / 2 - 1, the
>> computation of @dt_size overflows to a negative number, which then
>> gets converted to a very large size_t for g_malloc0() and
>> load_image_size().  In the (fortunately improbable) case g_malloc0()
>> succeeds and load_image_size() survives, we'd assign the negative
>> number to *sizep.  What that would do to the callers I can't say, but
>> it's unlikely to be good.
>>
>> Fix by rejecting images whose size would overflow.
>>
>> Signed-off-by: Markus Armbruster 
>> ---
>>  device_tree.c | 4 
>>  1 file changed, 4 insertions(+)
>>
>> diff --git a/device_tree.c b/device_tree.c
>> index 296278e12a..f8b46b3c73 100644
>> --- a/device_tree.c
>> +++ b/device_tree.c
>> @@ -84,6 +84,10 @@ void *load_device_tree(const char *filename_path, int 
>> *sizep)
>>   filename_path);
>>  goto fail;
>>  }
>> +if (dt_size > INT_MAX / 2 - 1) {
>
> We should avoid magic number duplication.
> That said, this patch looks safe.
>
> Reviewed-by: Philippe Mathieu-Daudé 

 Thanks!

> BTW how did you figure that out?

 Downstream handling of upstream commit da885fe1ee8 led me to the
 function.  I spotted dt_size = get_image_size(filename_path).
 Experience has taught me to check the left hand side's type.  Bad.  Then
 I saw how dt_size gets increased.  Worse.
>>>
>>> So you genuinely neglected to mention Kurtis Miller then :)
>>
>> Explanation, not excuse: the only occurence of the name in my downstream
>> reading was a two-liner BZ comment, which I totally missed in my haste
>> to give the fix a chance to make 4.0.  I certainly didn't mean to
>> deprive him of credit!

My English teacher explained me neglected sounds like a
reprimand/reproach (as in negligence), sorry I didn't mean to be rude
here, I wanted to say something like "Peter remarked you did gave
credits to the first one who reported this issue, but since you figured
this bug via your own way, the omission was not intentional then."

> No worries. I have sent the pull request and it includes the reported by.
> 
> Alistair
> 
>>
>> [...]
>>



Re: [Qemu-devel] [RFC-PATCH] Introducing virtio-example device.

2019-04-10 Thread Yoni Bettan



On 4/9/19 4:17 PM, Stefan Hajnoczi wrote:

On Mon, Apr 01, 2019 at 02:18:43PM +0300, Yoni Bettan wrote:

The main goal is to add an example device to Qemu to be used as template or
guideline for contributors when they wish to create a new virtio device.

Another reason for this device is to document "the right way" to write
a new virtio device in Qemu.

This device is a simple device and its functionality is to increase its input
by 1.

The device driver is located at:
https://github.com/ybettan/QemuDeviceDrivers.git

In addition I am writing a blog to give a logical overview of the virtio
protocol and a step-by-step guide to write a new virtio device.
This blog can be found at https://howtovms.wordpress.com.

scripts/checkpatch.pl have some errors do to "//" (old style one line comment),
those lines contains FIXMEs for the next version and will be removed.

Signed-off-by: Yoni Bettan 



Hi Stefan and thank you for your review.


Where is the specification for this device?  The lack of specification
is already not "the right way".




Another step in this process is to write a specification for this 
device. Since I am learning the virtio protocol while implementing this 
example device it was easier for me to start with the device and from 
there write its specification but take into consideration that the 
specification will be written soon.





There are multiple problems with the code, but the larger issue is that
this example device is just helping people shoot themselves in the foot
more easily.



If you can point me to those problem I will be glad so I can update the 
code and understand those problems you are talking about.





The difficulty with VIRTIO is not how to implement devices/drivers, it's
that people don't read the specification and therefore do not understand
the device model properly.  The spec is dry and missing information in
some places.  I think more accessible documentation, like your blog, can
help here.



As I said, the blog will be updated with explanations on each step of 
the communication between the device and the driver and the reason it is 
not there yet is because I preferred getting some reviews, make the code 
better and only then documenting "the write way" to not mislead peoples.





If you would like to educate people about VIRTIO, then explaining the
device model, lifecycle, how to change a device in a
backwards-compatible way, virtqueue semantics, etc are the topics that
deserve attention.

For someone who has learnt these topics, implementing the device/driver
is not hard.  For someone who doesn't understand these topics, no
example device will help.  At best they will copy-paste together
something that sort of works but has issues.



For me, reading the specification and even reading some code examples 
over the internet didn't made me understand it until I saw the related 
code so I agree with you it is not enough yet but all the other parts 
are on there way.



The final purpose is to have:

1. device specification

2. device implementation

3. device driver

4. blog

maybe I should have written it at the beginning, this is not the entire 
project but it is its start.





Stefan



Thanks again for your review.




Re: [Qemu-devel] [PATCH v5 2/5] virtio-pmem: Add virtio pmem driver

2019-04-10 Thread Pankaj Gupta


> > 
> > > This patch adds virtio-pmem driver for KVM guest.
> > > 
> > > Guest reads the persistent memory range information from
> > > Qemu over VIRTIO and registers it on nvdimm_bus. It also
> > > creates a nd_region object with the persistent memory
> > > range information so that existing 'nvdimm/pmem' driver
> > > can reserve this into system memory map. This way
> > > 'virtio-pmem' driver uses existing functionality of pmem
> > > driver to register persistent memory compatible for DAX
> > > capable filesystems.
> > > 
> > > This also provides function to perform guest flush over
> > > VIRTIO from 'pmem' driver when userspace performs flush
> > > on DAX memory range.
> > > 
> > > Signed-off-by: Pankaj Gupta 
> > > ---
> > >  drivers/nvdimm/virtio_pmem.c |  88 ++
> > >  drivers/virtio/Kconfig   |  10 +++
> > >  drivers/virtio/Makefile  |   1 +
> > >  drivers/virtio/pmem.c| 124 +++
> > >  include/linux/virtio_pmem.h  |  60 +++
> > >  include/uapi/linux/virtio_ids.h  |   1 +
> > >  include/uapi/linux/virtio_pmem.h |  10 +++
> > >  7 files changed, 294 insertions(+)
> > >  create mode 100644 drivers/nvdimm/virtio_pmem.c
> > >  create mode 100644 drivers/virtio/pmem.c
> > >  create mode 100644 include/linux/virtio_pmem.h
> > >  create mode 100644 include/uapi/linux/virtio_pmem.h
> > > 
> > (...)
> > > diff --git a/drivers/virtio/pmem.c b/drivers/virtio/pmem.c
> > > new file mode 100644
> > > index ..cc9de9589d56
> > > --- /dev/null
> > > +++ b/drivers/virtio/pmem.c
> > > @@ -0,0 +1,124 @@
> > > +// SPDX-License-Identifier: GPL-2.0
> > > +/*
> > > + * virtio_pmem.c: Virtio pmem Driver
> > > + *
> > > + * Discovers persistent memory range information
> > > + * from host and registers the virtual pmem device
> > > + * with libnvdimm core.
> > > + */
> > > +#include 
> > > +#include <../../drivers/nvdimm/nd.h>
> > > +
> > > +static struct virtio_device_id id_table[] = {
> > > + { VIRTIO_ID_PMEM, VIRTIO_DEV_ANY_ID },
> > > + { 0 },
> > > +};
> > > +
> > > + /* Initialize virt queue */
> > > +static int init_vq(struct virtio_pmem *vpmem)
> > 
> > IMHO, you don't gain much by splitting off this function...
> 
> It make sense to have all the vq-init-related stuff in one function, so
> here pmem_lock and req_list are used only for the vq.

Yes.

> Saying that - maybe it would be better to have the 3 in one struct.
> 
> > 
> > > +{
> > > + struct virtqueue *vq;
> > > +
> > > + /* single vq */
> > > + vpmem->req_vq = vq = virtio_find_single_vq(vpmem->vdev,
> > > + host_ack, "flush_queue");
> > > + if (IS_ERR(vq))
> > > + return PTR_ERR(vq);
> > 
> > I'm personally not a fan of chained assignments... I think I'd just
> > drop the 'vq' variable and operate on vpmem->req_vq directly.
> 
> +1

Will drop extra vq.

> 
> > 
> > > +
> > > + spin_lock_init(>pmem_lock);
> > > + INIT_LIST_HEAD(>req_list);
> > > +
> > > + return 0;
> > > +};
> > > +
> > > +static int virtio_pmem_probe(struct virtio_device *vdev)
> > > +{
> > > + int err = 0;
> > > + struct resource res;
> > > + struct virtio_pmem *vpmem;
> > > + struct nvdimm_bus *nvdimm_bus;
> > > + struct nd_region_desc ndr_desc = {};
> > > + int nid = dev_to_node(>dev);
> > > + struct nd_region *nd_region;
> > > +
> > > + if (!vdev->config->get) {
> > > + dev_err(>dev, "%s failure: config disabled\n",
> > 
> > Maybe s/config disabled/config access disabled/ ? That seems to be the
> > more common message.
> > 
> > > + __func__);
> > > + return -EINVAL;
> > > + }
> > > +
> > > + vdev->priv = vpmem = devm_kzalloc(>dev, sizeof(*vpmem),
> > > + GFP_KERNEL);
> > 
> > Here, the vpmem variable makes sense for convenience, but I'm again not
> > a fan of the chaining :)
> 
> +1

here as well.

> 
> > 
> > > + if (!vpmem) {
> > > + err = -ENOMEM;
> > > + goto out_err;
> > > + }
> > > +
> > > + vpmem->vdev = vdev;
> > > + err = init_vq(vpmem);
> > > + if (err)
> > > + goto out_err;
> > > +
> > > + virtio_cread(vpmem->vdev, struct virtio_pmem_config,
> > > + start, >start);
> > > + virtio_cread(vpmem->vdev, struct virtio_pmem_config,
> > > + size, >size);
> > > +
> > > + res.start = vpmem->start;
> > > + res.end   = vpmem->start + vpmem->size-1;
> > > + vpmem->nd_desc.provider_name = "virtio-pmem";
> > > + vpmem->nd_desc.module = THIS_MODULE;
> > > +
> > > + vpmem->nvdimm_bus = nvdimm_bus = nvdimm_bus_register(>dev,
> > > + >nd_desc);
> > 
> > And here :)
> > 
> > > + if (!nvdimm_bus)
> > > + goto out_vq;
> > > +
> > > + dev_set_drvdata(>dev, nvdimm_bus);
> > > +
> > > + ndr_desc.res = 
> > > + ndr_desc.numa_node = nid;
> > > + ndr_desc.flush = virtio_pmem_flush;
> > > + set_bit(ND_REGION_PAGEMAP, _desc.flags);
> > > + set_bit(ND_REGION_ASYNC, _desc.flags);
> > > + nd_region = 

Re: [Qemu-devel] [PATCH v5 2/5] virtio-pmem: Add virtio pmem driver

2019-04-10 Thread Pankaj Gupta


> 
> > This patch adds virtio-pmem driver for KVM guest.
> > 
> > Guest reads the persistent memory range information from
> > Qemu over VIRTIO and registers it on nvdimm_bus. It also
> > creates a nd_region object with the persistent memory
> > range information so that existing 'nvdimm/pmem' driver
> > can reserve this into system memory map. This way
> > 'virtio-pmem' driver uses existing functionality of pmem
> > driver to register persistent memory compatible for DAX
> > capable filesystems.
> > 
> > This also provides function to perform guest flush over
> > VIRTIO from 'pmem' driver when userspace performs flush
> > on DAX memory range.
> > 
> > Signed-off-by: Pankaj Gupta 
> > ---
> >  drivers/nvdimm/virtio_pmem.c |  88 ++
> >  drivers/virtio/Kconfig   |  10 +++
> >  drivers/virtio/Makefile  |   1 +
> >  drivers/virtio/pmem.c| 124 +++
> >  include/linux/virtio_pmem.h  |  60 +++
> >  include/uapi/linux/virtio_ids.h  |   1 +
> >  include/uapi/linux/virtio_pmem.h |  10 +++
> >  7 files changed, 294 insertions(+)
> >  create mode 100644 drivers/nvdimm/virtio_pmem.c
> >  create mode 100644 drivers/virtio/pmem.c
> >  create mode 100644 include/linux/virtio_pmem.h
> >  create mode 100644 include/uapi/linux/virtio_pmem.h
> > 
> (...)
> > diff --git a/drivers/virtio/pmem.c b/drivers/virtio/pmem.c
> > new file mode 100644
> > index ..cc9de9589d56
> > --- /dev/null
> > +++ b/drivers/virtio/pmem.c
> > @@ -0,0 +1,124 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/*
> > + * virtio_pmem.c: Virtio pmem Driver
> > + *
> > + * Discovers persistent memory range information
> > + * from host and registers the virtual pmem device
> > + * with libnvdimm core.
> > + */
> > +#include 
> > +#include <../../drivers/nvdimm/nd.h>
> > +
> > +static struct virtio_device_id id_table[] = {
> > +   { VIRTIO_ID_PMEM, VIRTIO_DEV_ANY_ID },
> > +   { 0 },
> > +};
> > +
> > + /* Initialize virt queue */
> > +static int init_vq(struct virtio_pmem *vpmem)
> 
> IMHO, you don't gain much by splitting off this function...

o.k. I kept this for better code structure.

> 
> > +{
> > +   struct virtqueue *vq;
> > +
> > +   /* single vq */
> > +   vpmem->req_vq = vq = virtio_find_single_vq(vpmem->vdev,
> > +   host_ack, "flush_queue");
> > +   if (IS_ERR(vq))
> > +   return PTR_ERR(vq);
> 
> I'm personally not a fan of chained assignments... I think I'd just
> drop the 'vq' variable and operate on vpmem->req_vq directly.

Sure.

> 
> > +
> > +   spin_lock_init(>pmem_lock);
> > +   INIT_LIST_HEAD(>req_list);
> > +
> > +   return 0;
> > +};
> > +
> > +static int virtio_pmem_probe(struct virtio_device *vdev)
> > +{
> > +   int err = 0;
> > +   struct resource res;
> > +   struct virtio_pmem *vpmem;
> > +   struct nvdimm_bus *nvdimm_bus;
> > +   struct nd_region_desc ndr_desc = {};
> > +   int nid = dev_to_node(>dev);
> > +   struct nd_region *nd_region;
> > +
> > +   if (!vdev->config->get) {
> > +   dev_err(>dev, "%s failure: config disabled\n",
> 
> Maybe s/config disabled/config access disabled/ ? That seems to be the
> more common message.

This is better.

> 
> > +   __func__);
> > +   return -EINVAL;
> > +   }
> > +
> > +   vdev->priv = vpmem = devm_kzalloc(>dev, sizeof(*vpmem),
> > +   GFP_KERNEL);
> 
> Here, the vpmem variable makes sense for convenience, but I'm again not
> a fan of the chaining :)

Sure will change :)

> 
> > +   if (!vpmem) {
> > +   err = -ENOMEM;
> > +   goto out_err;
> > +   }
> > +
> > +   vpmem->vdev = vdev;
> > +   err = init_vq(vpmem);
> > +   if (err)
> > +   goto out_err;
> > +
> > +   virtio_cread(vpmem->vdev, struct virtio_pmem_config,
> > +   start, >start);
> > +   virtio_cread(vpmem->vdev, struct virtio_pmem_config,
> > +   size, >size);
> > +
> > +   res.start = vpmem->start;
> > +   res.end   = vpmem->start + vpmem->size-1;
> > +   vpmem->nd_desc.provider_name = "virtio-pmem";
> > +   vpmem->nd_desc.module = THIS_MODULE;
> > +
> > +   vpmem->nvdimm_bus = nvdimm_bus = nvdimm_bus_register(>dev,
> > +   >nd_desc);
> 
> And here :)

Sure.

> 
> > +   if (!nvdimm_bus)
> > +   goto out_vq;
> > +
> > +   dev_set_drvdata(>dev, nvdimm_bus);
> > +
> > +   ndr_desc.res = 
> > +   ndr_desc.numa_node = nid;
> > +   ndr_desc.flush = virtio_pmem_flush;
> > +   set_bit(ND_REGION_PAGEMAP, _desc.flags);
> > +   set_bit(ND_REGION_ASYNC, _desc.flags);
> > +   nd_region = nvdimm_pmem_region_create(nvdimm_bus, _desc);
> > +   nd_region->provider_data =  dev_to_virtio
> > +   (nd_region->dev.parent->parent);
> 
> Isn't it clear that this parent chain will always end up at >dev?

Yes, It will resolve to >dev.

> Maybe simply set ->provider_data to vdev directly? (Does it need to
> grab a 

Re: [Qemu-devel] [PATCH] hostmem-file: warn when memory-backend-file, share=on and in incoming migration

2019-04-10 Thread Catherine Ho
Hi Dr. David

On Wed, 10 Apr 2019 at 22:59, Dr. David Alan Gilbert 
wrote:

> * Catherine Ho (catherine.h...@gmail.com) wrote:
> > Hi Igor
> >
> >
> > On Mon, 8 Apr 2019 at 18:35, Igor Mammedov  wrote:
> >
> > > On Sun,  7 Apr 2019 22:19:05 -0400
> > > Catherine Ho  wrote:
> > >
> > > > Currently it is not forbidden to use "-object
> > > memory-backend-file,share=on"
> > > > and together with "-incoming". But after incoming migration is
> finished,
> > > > the memory-backend-file will be definitely written if share=on. So
> the
> > > > memory-backend-file can only be used once, but failed in the 2nd time
> > > > incoming.
> > > >
> > > > Thus it gives a warning and the users can run the qemu if they really
> > > > want to do it.
> > >
> > > Shouldn't we add a migration blocker in such a case instead of warning
> > > and letting qemu run wild?
> > >
> >
> > IMO, it doesn't need to block this. With share=on and -incoming, the user
> > can
> > still save the device memory state into memory-backend file again if
> > ignore-shared
> > capability is on.
> >
> > If we block this, the user can't use the ignore-shared capability in
> > incoming
> > migration.
>
> -incomign with share=on is a perfectly normal thing to do - it just
> depends who you are sharing the file with and the lifetime of that
> shared file.
>
> For example; if you're just running a qemu with vhost-user then you
> use share=on - however wyou typically select the backend file as
> a new file from /dev/shm - it's not a file that you previously migrated
> to.
>
Thanks,
but using a new file from /dev/shm means kernel will start from
start_kernel or early? Is it different from the x-ignore-shared case?
If we remove the share=on in incoming migration, all the writting
of ram will not be flush into the memory backend file. Thus we
can use the base memory backend file for ever.
e.g.
1) save the vm like a snapshot, current ram state is "kernel
has been started, systemd has been started"
2) restore it with -incoming and *no* share=on flag
3) restore it with -incoming and *no* share=on again...
In contrary, if we use share=on, the base backend file will
be written at once after 1st time incoming.

So, IMO, no "share=on" is the proper usage of incoming migration
when ignore-shared is on.
Please correct me if sth is wrong, thanks:)

B.R.
Catherine


> QEMU has no way to know about the provenance of the shared file it's
> been given, so we can't really warn people about it.
>
> Dave
>
> > B.R.
> > Catherine
> >
> > >
> > >
> > > > Signed-off-by: Catherine Ho 
> > > > ---
> > > >  backends/hostmem-file.c | 11 +++
> > > >  1 file changed, 11 insertions(+)
> > > >
> > > > diff --git a/backends/hostmem-file.c b/backends/hostmem-file.c
> > > > index 37ac6445d2..59429ee0b4 100644
> > > > --- a/backends/hostmem-file.c
> > > > +++ b/backends/hostmem-file.c
> > > > @@ -16,6 +16,7 @@
> > > >  #include "sysemu/hostmem.h"
> > > >  #include "sysemu/sysemu.h"
> > > >  #include "qom/object_interfaces.h"
> > > > +#include "migration/migration.h"
> > > >
> > > >  /* hostmem-file.c */
> > > >  /**
> > > > @@ -79,6 +80,16 @@ file_backend_memory_alloc(HostMemoryBackend
> *backend,
> > > Error **errp)
> > > >  }
> > > >  }
> > > >
> > > > +/*
> > > > + * In ignore shared case, if share=on for host memory backend
> file,
> > > > + * the ram might be written after incoming process is finished.
> Thus
> > > > + * the memory backend can't be reused for 2nd/3rd... incoming
> > > > + */
> > > > +if (backend->share && migrate_ignore_shared()
> > > > +   && runstate_check(RUN_STATE_INMIGRATE))
> > > > +warn_report("share=on for memory backend file might be "
> > > > +"conflicted with incoming in ignore shared
> > > case");
> > > > +
> > > >  backend->force_prealloc = mem_prealloc;
> > > >  name = host_memory_backend_get_name(backend);
> > > >  memory_region_init_ram_from_file(>mr, OBJECT(backend),
> > >
> > >
> --
> Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
>


Re: [Qemu-devel] [PATCH for-4.1] commit: Make base read-only if there is an early failure

2019-04-10 Thread Eric Blake
On 4/10/19 10:24 AM, Alberto Garcia wrote:
> You can reproduce this by passing an invalid filter-node-name (like
> "1234") to block-commit. In this case the base image is put in
> read-write mode but is never reset back to read-only.
> 

Is it worth iotest coverage?

> Signed-off-by: Alberto Garcia 
> ---
>  block/commit.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/block/commit.c b/block/commit.c
> index ba60fef58a..698eda1dfe 100644
> --- a/block/commit.c
> +++ b/block/commit.c
> @@ -384,6 +384,9 @@ fail:
>  if (s->top) {
>  blk_unref(s->top);
>  }
> +if (s->base_read_only) {
> +bdrv_reopen_set_read_only(base, true, NULL);
> +}
>  job_early_fail(>common.job);
>  /* commit_top_bs has to be replaced after deleting the block job,
>   * otherwise this would fail because of lack of permissions. */
> 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [PATCH 2/3] xen-bus: allow AioContext to be specified for each event channel

2019-04-10 Thread Paul Durrant
> -Original Message-
> From: Anthony PERARD [mailto:anthony.per...@citrix.com]
> Sent: 10 April 2019 13:57
> To: Paul Durrant 
> Cc: qemu-devel@nongnu.org; qemu-bl...@nongnu.org; 
> xen-de...@lists.xenproject.org; Stefano Stabellini
> ; Stefan Hajnoczi ; Kevin Wolf 
> ; Max
> Reitz 
> Subject: Re: [PATCH 2/3] xen-bus: allow AioContext to be specified for each 
> event channel
> 
> On Mon, Apr 08, 2019 at 04:16:16PM +0100, Paul Durrant wrote:
> > This patch adds an AioContext parameter to xen_device_bind_event_channel()
> > and then uses aio_set_fd_handler() to set the callback rather than
> > qemu_set_fd_handler().
> >
> > Signed-off-by: Paul Durrant 
> > ---
> > @@ -943,6 +944,7 @@ static void xen_device_event(void *opaque)
> >  }
> >
> >  XenEventChannel *xen_device_bind_event_channel(XenDevice *xendev,
> > +   AioContext *ctx,
> > unsigned int port,
> > XenEventHandler handler,
> > void *opaque, Error **errp)
> > @@ -968,8 +970,9 @@ XenEventChannel 
> > *xen_device_bind_event_channel(XenDevice *xendev,
> >  channel->handler = handler;
> >  channel->opaque = opaque;
> >
> > -qemu_set_fd_handler(xenevtchn_fd(channel->xeh), xen_device_event, NULL,
> > -channel);
> > +channel->ctx = ctx;
> > +aio_set_fd_handler(channel->ctx, xenevtchn_fd(channel->xeh), false,
> > +   xen_device_event, NULL, NULL, channel);
> 
> I wonder if the `'is_external' parameter of aio_set_fd_handler shoud be
> `true' here, instead. That flag seems to be used when making a snapshot
> of a blockdev, for example.
> 
> That was introduced by:
> dca21ef23ba48f6f1428c59f295a857e5dc203c8^..c07bc2c1658fffeee08eb46402b2f66d55b07586
> 
> What do you think?

Interesting. I admit I was merely transcribing what qemu_set_fd_handler() 
passes without really looking into the values. Looking at the arguments that 
virtio-blk passes to aio_set_event_notifier() though, and what 'is_external' 
means, it would appear that setting it to true is probably the right thing to 
do. Do you want me to send a v2 of the series or can you fix it up?

  Cheers,

Paul

> 
> 
> --
> Anthony PERARD



Re: [Qemu-devel] [PATCH 0/2] qemu-img convert: ignore read errors

2019-04-10 Thread Max Reitz
On 09.04.19 16:56, Andrey Shinkevich wrote:
> The 'qemu-img convert' new command option 'force read' with the key '-R'
> allows converting a damaged image to get all the available information
> in case of the read errors. The program reports read errors and continue
> the image conversion. The users should keep in their minds that the
> resulting image is inconsistent.

I’ve already sent a series “Add salvaging mode to convert” back in December.

Max



signature.asc
Description: OpenPGP digital signature


[Qemu-devel] [PATCH for-4.1] commit: Make base read-only if there is an early failure

2019-04-10 Thread Alberto Garcia
You can reproduce this by passing an invalid filter-node-name (like
"1234") to block-commit. In this case the base image is put in
read-write mode but is never reset back to read-only.

Signed-off-by: Alberto Garcia 
---
 block/commit.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/block/commit.c b/block/commit.c
index ba60fef58a..698eda1dfe 100644
--- a/block/commit.c
+++ b/block/commit.c
@@ -384,6 +384,9 @@ fail:
 if (s->top) {
 blk_unref(s->top);
 }
+if (s->base_read_only) {
+bdrv_reopen_set_read_only(base, true, NULL);
+}
 job_early_fail(>common.job);
 /* commit_top_bs has to be replaced after deleting the block job,
  * otherwise this would fail because of lack of permissions. */
-- 
2.11.0




Re: [Qemu-devel] [PATCH 2/3] xen-bus: allow AioContext to be specified for each event channel

2019-04-10 Thread Anthony PERARD
On Wed, Apr 10, 2019 at 04:20:05PM +0100, Paul Durrant wrote:
> > -Original Message-
> > From: Anthony PERARD [mailto:anthony.per...@citrix.com]
> > Sent: 10 April 2019 13:57
> > To: Paul Durrant 
> > Cc: qemu-devel@nongnu.org; qemu-bl...@nongnu.org; 
> > xen-de...@lists.xenproject.org; Stefano Stabellini
> > ; Stefan Hajnoczi ; Kevin Wolf 
> > ; Max
> > Reitz 
> > Subject: Re: [PATCH 2/3] xen-bus: allow AioContext to be specified for each 
> > event channel
> > 
> > On Mon, Apr 08, 2019 at 04:16:16PM +0100, Paul Durrant wrote:
> > > This patch adds an AioContext parameter to xen_device_bind_event_channel()
> > > and then uses aio_set_fd_handler() to set the callback rather than
> > > qemu_set_fd_handler().
> > >
> > > Signed-off-by: Paul Durrant 
> > > ---
> > > @@ -943,6 +944,7 @@ static void xen_device_event(void *opaque)
> > >  }
> > >
> > >  XenEventChannel *xen_device_bind_event_channel(XenDevice *xendev,
> > > +   AioContext *ctx,
> > > unsigned int port,
> > > XenEventHandler handler,
> > > void *opaque, Error 
> > > **errp)
> > > @@ -968,8 +970,9 @@ XenEventChannel 
> > > *xen_device_bind_event_channel(XenDevice *xendev,
> > >  channel->handler = handler;
> > >  channel->opaque = opaque;
> > >
> > > -qemu_set_fd_handler(xenevtchn_fd(channel->xeh), xen_device_event, 
> > > NULL,
> > > -channel);
> > > +channel->ctx = ctx;
> > > +aio_set_fd_handler(channel->ctx, xenevtchn_fd(channel->xeh), false,
> > > +   xen_device_event, NULL, NULL, channel);
> > 
> > I wonder if the `'is_external' parameter of aio_set_fd_handler shoud be
> > `true' here, instead. That flag seems to be used when making a snapshot
> > of a blockdev, for example.
> > 
> > That was introduced by:
> > dca21ef23ba48f6f1428c59f295a857e5dc203c8^..c07bc2c1658fffeee08eb46402b2f66d55b07586
> > 
> > What do you think?
> 
> Interesting. I admit I was merely transcribing what qemu_set_fd_handler() 
> passes without really looking into the values. Looking at the arguments that 
> virtio-blk passes to aio_set_event_notifier() though, and what 'is_external' 
> means, it would appear that setting it to true is probably the right thing to 
> do. Do you want me to send a v2 of the series or can you fix it up?

I'll fix that up.

Reviewed-by: Anthony PERARD 

Thanks,

-- 
Anthony PERARD



Re: [Qemu-devel] [PATCH v2 for-4.0?] aio-posix: ensure poll mode is left when aio_notify is called

2019-04-10 Thread Stefan Hajnoczi
On Tue, Apr 09, 2019 at 02:28:23PM +0200, Paolo Bonzini wrote:
> With aio=thread, adaptive polling makes latency worse rather than
> better, because it delays the execution of the ThreadPool's
> completion bottom half.
> 
> event_notifier_poll() does run while polling, detecting that
> a bottom half was scheduled by a worker thread, but because
> ctx->notifier is explicitly ignored in run_poll_handlers_once(),
> scheduling the BH does not count as making progress and
> run_poll_handlers() keeps running.  Fix this by recomputing
> the deadline after *timeout could have changed.
> 
> With this change, ThreadPool still cannot participate in polling
> but at least it does not suffer from extra latency.
> 
> Reported-by: Sergio Lopez 
> Cc: Stefan Hajnoczi 
> Cc: Kevin Wolf 
> Cc: qemu-bl...@nongnu.org
> Signed-off-by: Paolo Bonzini 
> Message-Id: <1553692145-86728-1-git-send-email-pbonz...@redhat.com>
> Signed-off-by: Paolo Bonzini 
> ---
> v1->v2: use qemu_soonest_timeout to handle timeout == -1
>  util/aio-posix.c | 12 
>  1 file changed, 8 insertions(+), 4 deletions(-)

Thanks, applied to my block-next tree:
https://github.com/stefanha/qemu/commits/block-next

Stefan


signature.asc
Description: PGP signature


Re: [Qemu-devel] [PATCH for-4.1] roms: Correct the EDK2_BASETOOLS_OPTFLAGS variable description

2019-04-10 Thread Philippe Mathieu-Daudé
On 4/10/19 4:58 PM, Laszlo Ersek wrote:
> On 04/10/19 06:57, Philippe Mathieu-Daudé wrote:
>> In commit 1cab464136b4 we incorrectly described the
>> EDK2_BASETOOLS_OPTFLAGS can pass CPPFLAGS and CFLAGS
>> options to the EDK2 build tools, but it only expands
>> the CFLAGS (not to the CPPFLAGS).
>> Update the description to be more accurate.
>>
>> Reported-by: Laszlo Ersek 
>> Signed-off-by: Philippe Mathieu-Daudé 
>> ---
>>  roms/Makefile | 4 ++--
>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/roms/Makefile b/roms/Makefile
>> index 1ff78b63bb3..d7fd6025e7c 100644
>> --- a/roms/Makefile
>> +++ b/roms/Makefile
>> @@ -120,8 +120,8 @@ build-efi-roms: build-pxe-roms
>>  $(patsubst %,bin-i386-efi/%.efidrv,$(pxerom_targets)) \
>>  $(patsubst %,bin-x86_64-efi/%.efidrv,$(pxerom_targets))
>>  
>> -# Build scripts can pass compiler/linker flags to the EDK2 build tools
>> -# via the EDK2_BASETOOLS_OPTFLAGS (CPPFLAGS and CFLAGS) and
>> +# Build scripts can pass compiler/linker flags to the EDK2
>> +# build  tools via the EDK2_BASETOOLS_OPTFLAGS (CFLAGS) and
>>  # EDK2_BASETOOLS_LDFLAGS (LDFLAGS) environment variables.
>>  #
>>  # Example:
>>
> 
> Reviewed-by: Laszlo Ersek 
> 
> Please noone submit a pull request with this patch (for 4.1) until my
> edk2 bundling series is merged; as this one will likely introduce
> another contextual conflict, and I'd *really* like to avoid another
> rebase on my end, due to such an issue.

Thanks for the review. I'll rebase it on top of your 'bundle edk2
platform firmware with QEMU' v4 and resend (later). Feel free to include
it on top of your pullreq.

Regards,

Phil.



Re: [Qemu-devel] [PATCH for-4.1] q35: acpi: do not create dummy MCFG table

2019-04-10 Thread Igor Mammedov
On Wed, 10 Apr 2019 22:27:56 +0800
Wei Yang  wrote:

[...]
> >@@ -2411,19 +2410,7 @@ build_mcfg_q35(GArray *table_data, BIOSLinker 
> >*linker, AcpiMcfgInfo *info)
> > mcfg->allocation[0].start_bus_number = 0;
> > mcfg->allocation[0].end_bus_number = PCIE_MMCFG_BUS(info->mcfg_size - 
> > 1);
> > 
> >-/* MCFG is used for ECAM which can be enabled or disabled by guest.  
> 
> I want to cnfirm what is "enabled or disabled by guest" here.

Firmware theoretically during PCI initialization may disable ECAM support
and that's when we do no need MCFG. In practice that's not happening
(SeaBIOS or UEFI) but we in case there is out there a firmware that does
disable ECAM we do not generate MCFG.

Note:
ACPI tables generated twice, 1st when QEMU starts and the second time
when firmware accesses fwcfg to read blobs for the 1st time.
The later happens after PCI subsystem was initialized by firmware.
At that time we know if ECAM was enabled or not.

> If we don't reserve mcfg and "guest" enable mcfg during running, the ACPI
> table size changed. But the destination still has the original table size,
> since destination "guest" keep sleep during this period.
> 
> Now the migration would face table size difference

with commit a1666142db we do not care as all the tables created on
source will be migrated to destination as is overwriting whatever blobs
destination created on startup.

> and break migration?
nope,

to help you figure out why it works
look at what following git commits did:
  git log c8d6f66ae7..a1666142db
and pay attention to 'used_length'

> 
> >- * To avoid table size changes (which create migration issues),
> >- * always create the table even if there are no allocations,
> >- * but set the signature to a reserved value in this case.
> >- * ACPI spec requires OSPMs to ignore such tables.
> >- */
> >-if (info->mcfg_base == PCIE_BASE_ADDR_UNMAPPED) {
> >-/* Reserved signature: ignored by OSPM */
> >-sig = "QEMU";
> >-} else {
> >-sig = "MCFG";
> >-}
> >-build_header(linker, table_data, (void *)mcfg, sig, len, 1, NULL, NULL);
> >+build_header(linker, table_data, (void *)mcfg, "MCFG", len, 1, NULL, 
> >NULL);
> > }
> > 
> > /*
> >@@ -2592,6 +2579,9 @@ static bool acpi_get_mcfg(AcpiMcfgInfo *mcfg)
> > }
> > mcfg->mcfg_base = qnum_get_uint(qobject_to(QNum, o));
> > qobject_unref(o);
> >+if (mcfg->mcfg_base == PCIE_BASE_ADDR_UNMAPPED) {
> >+return false;
> >+}
> > 
> > o = object_property_get_qobject(pci_host, PCIE_HOST_MCFG_SIZE, NULL);
> > assert(o);
> >-- 
> >2.7.4  
> 




Re: [Qemu-devel] [PATCH 3/3] xen-bus / xen-block: add support for event channel polling

2019-04-10 Thread Anthony PERARD
On Mon, Apr 08, 2019 at 04:16:17PM +0100, Paul Durrant wrote:
> This patch introduces a poll callback for event channel fd-s and uses
> this to invoke the channel callback function.
> 
> To properly support polling, it is necessary for the event channel callback
> function to return a boolean saying whether it has done any useful work or
> not. Thus xen_block_dataplane_event() is modified to directly invoke
> xen_block_handle_requests() and the latter only returns true if it actually
> processes any requests. This also means that the call to qemu_bh_schedule()
> is moved into xen_block_complete_aio(), which is more intuitive since the
> only reason for doing a deferred poll of the shared ring should be because
> there were previously insufficient resources to fully complete a previous
> poll.
> 
> Signed-off-by: Paul Durrant 
> ---

Reviewed-by: Anthony PERARD 

Thanks,

-- 
Anthony PERARD



Re: [Qemu-devel] [PATCH] hostmem-file: warn when memory-backend-file, share=on and in incoming migration

2019-04-10 Thread Dr. David Alan Gilbert
* Catherine Ho (catherine.h...@gmail.com) wrote:
> Hi Igor
> 
> 
> On Mon, 8 Apr 2019 at 18:35, Igor Mammedov  wrote:
> 
> > On Sun,  7 Apr 2019 22:19:05 -0400
> > Catherine Ho  wrote:
> >
> > > Currently it is not forbidden to use "-object
> > memory-backend-file,share=on"
> > > and together with "-incoming". But after incoming migration is finished,
> > > the memory-backend-file will be definitely written if share=on. So the
> > > memory-backend-file can only be used once, but failed in the 2nd time
> > > incoming.
> > >
> > > Thus it gives a warning and the users can run the qemu if they really
> > > want to do it.
> >
> > Shouldn't we add a migration blocker in such a case instead of warning
> > and letting qemu run wild?
> >
> 
> IMO, it doesn't need to block this. With share=on and -incoming, the user
> can
> still save the device memory state into memory-backend file again if
> ignore-shared
> capability is on.
> 
> If we block this, the user can't use the ignore-shared capability in
> incoming
> migration.

-incomign with share=on is a perfectly normal thing to do - it just
depends who you are sharing the file with and the lifetime of that
shared file.

For example; if you're just running a qemu with vhost-user then you
use share=on - however wyou typically select the backend file as
a new file from /dev/shm - it's not a file that you previously migrated
to.

QEMU has no way to know about the provenance of the shared file it's
been given, so we can't really warn people about it.

Dave

> B.R.
> Catherine
> 
> >
> >
> > > Signed-off-by: Catherine Ho 
> > > ---
> > >  backends/hostmem-file.c | 11 +++
> > >  1 file changed, 11 insertions(+)
> > >
> > > diff --git a/backends/hostmem-file.c b/backends/hostmem-file.c
> > > index 37ac6445d2..59429ee0b4 100644
> > > --- a/backends/hostmem-file.c
> > > +++ b/backends/hostmem-file.c
> > > @@ -16,6 +16,7 @@
> > >  #include "sysemu/hostmem.h"
> > >  #include "sysemu/sysemu.h"
> > >  #include "qom/object_interfaces.h"
> > > +#include "migration/migration.h"
> > >
> > >  /* hostmem-file.c */
> > >  /**
> > > @@ -79,6 +80,16 @@ file_backend_memory_alloc(HostMemoryBackend *backend,
> > Error **errp)
> > >  }
> > >  }
> > >
> > > +/*
> > > + * In ignore shared case, if share=on for host memory backend file,
> > > + * the ram might be written after incoming process is finished. Thus
> > > + * the memory backend can't be reused for 2nd/3rd... incoming
> > > + */
> > > +if (backend->share && migrate_ignore_shared()
> > > +   && runstate_check(RUN_STATE_INMIGRATE))
> > > +warn_report("share=on for memory backend file might be "
> > > +"conflicted with incoming in ignore shared
> > case");
> > > +
> > >  backend->force_prealloc = mem_prealloc;
> > >  name = host_memory_backend_get_name(backend);
> > >  memory_region_init_ram_from_file(>mr, OBJECT(backend),
> >
> >
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK



Re: [Qemu-devel] [PATCH for-4.1] roms: Correct the EDK2_BASETOOLS_OPTFLAGS variable description

2019-04-10 Thread Laszlo Ersek
On 04/10/19 06:57, Philippe Mathieu-Daudé wrote:
> In commit 1cab464136b4 we incorrectly described the
> EDK2_BASETOOLS_OPTFLAGS can pass CPPFLAGS and CFLAGS
> options to the EDK2 build tools, but it only expands
> the CFLAGS (not to the CPPFLAGS).
> Update the description to be more accurate.
> 
> Reported-by: Laszlo Ersek 
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
>  roms/Makefile | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/roms/Makefile b/roms/Makefile
> index 1ff78b63bb3..d7fd6025e7c 100644
> --- a/roms/Makefile
> +++ b/roms/Makefile
> @@ -120,8 +120,8 @@ build-efi-roms: build-pxe-roms
>   $(patsubst %,bin-i386-efi/%.efidrv,$(pxerom_targets)) \
>   $(patsubst %,bin-x86_64-efi/%.efidrv,$(pxerom_targets))
>  
> -# Build scripts can pass compiler/linker flags to the EDK2 build tools
> -# via the EDK2_BASETOOLS_OPTFLAGS (CPPFLAGS and CFLAGS) and
> +# Build scripts can pass compiler/linker flags to the EDK2
> +# build  tools via the EDK2_BASETOOLS_OPTFLAGS (CFLAGS) and
>  # EDK2_BASETOOLS_LDFLAGS (LDFLAGS) environment variables.
>  #
>  # Example:
> 

Reviewed-by: Laszlo Ersek 

Please noone submit a pull request with this patch (for 4.1) until my
edk2 bundling series is merged; as this one will likely introduce
another contextual conflict, and I'd *really* like to avoid another
rebase on my end, due to such an issue.

Thanks!
Laszlo



Re: [Qemu-devel] [PATCH v2 for-4.0?] aio-posix: ensure poll mode is left when aio_notify is called

2019-04-10 Thread Stefan Hajnoczi
On Tue, Apr 09, 2019 at 02:28:23PM +0200, Paolo Bonzini wrote:

Why is this 4.0 material?  It's not a 4.0 regression and tweaking the
event loop is risky.  I suggest waiting for 4.1.

> With aio=thread, adaptive polling makes latency worse rather than
> better, because it delays the execution of the ThreadPool's
> completion bottom half.
> 
> event_notifier_poll() does run while polling, detecting that
> a bottom half was scheduled by a worker thread, but because
> ctx->notifier is explicitly ignored in run_poll_handlers_once(),
> scheduling the BH does not count as making progress and
> run_poll_handlers() keeps running.  Fix this by recomputing
> the deadline after *timeout could have changed.
> 
> With this change, ThreadPool still cannot participate in polling
> but at least it does not suffer from extra latency.
> 
> Reported-by: Sergio Lopez 
> Cc: Stefan Hajnoczi 
> Cc: Kevin Wolf 
> Cc: qemu-bl...@nongnu.org
> Signed-off-by: Paolo Bonzini 
> Message-Id: <1553692145-86728-1-git-send-email-pbonz...@redhat.com>
> Signed-off-by: Paolo Bonzini 
> ---
> v1->v2: use qemu_soonest_timeout to handle timeout == -1
>  util/aio-posix.c | 12 
>  1 file changed, 8 insertions(+), 4 deletions(-)

Reviewed-by: Stefan Hajnoczi 


signature.asc
Description: PGP signature


Re: [Qemu-devel] [PATCH for-4.0 v2 2/2] roms: Allow the EDK2_EFIROM variable to be overridden

2019-04-10 Thread Laszlo Ersek
On 04/10/19 08:25, Olaf Hering wrote:
> Am Mon, 8 Apr 2019 13:05:07 +0200
> schrieb Laszlo Ersek :
>
>> Then build scripts could be updated to invoke:
>>
>>   make -C roms \
>> EDK2_BASETOOLS_OPTFLAGS='...' \
>> EDK2_BASETOOLS_LDFLAGS='...' \
>> efirom
>
> The question remains: 'But why?'.

Because it lets you pass "-fno-pie" & friends.

> Why can edk2 not be built with '-fno-pie' right away?

Those flags are not universal across gcc/binutils versions.

Some gcc/binutils versions don't enable PIE to begin with.

Some gcc/binutils versions take different PIE-disablement flags
(considering both the compiler and the linker) from other gcc/binutils
versions.

For example, please refer to the following *iPXE* commit:

> commit 7c395b0e21806b946fe944a27fc273407f357ea1
> Author: Michael Brown 
> Date:   Wed Jun 14 12:33:16 2017 +0100
>
> [build] Use -no-pie on newer versions of gcc
>
> Some distributions patch gcc to generate position independent
> executables by default.  We currently include a workaround to check
> for this and to add -fno-PIE -nopie to CFLAGS if required.
>
> Newer patched versions of gcc require -fno-PIE -no-pie instead.  Check
> for both variants.
>
> Reported-by: Nathan Rennie-Waldock 
> Originally-fixed-by: Markos Chandras 
> Signed-off-by: Michael Brown 

(I remember this commit because the logic it had added actually failed
on a system that I used once for building iPXE -- it was an x86_64
system with both a native gcc, and a *different version*
x86_64-to-x86_64 *cross* gcc. The selection logic determined the flags
for the one compiler toolchain, but then passed the flags to the other
compiler toolchain. The build broke. I narrowed it down to the above
commit, and then shrugged it off; it wasn't important enough to spend
more time on it.)

I also refer you to the following *edk2* commit:

> commit 11d0cd23dd1bc15a6e6a1598250ea2e0c4c36e9a
> Author: Ard Biesheuvel 
> Date:   Mon Jun 18 10:23:49 2018 +0200
>
> BaseTools/tools_def IA32: drop -no-pie linker option for GCC49
>
> As reported by Liming, GCC 4.9.2 does not support the -no-pie
> linker option that we added to the GCC49 and GCC5 toolchain
> profiles in commit c25d3905523a ("BaseTools/tools_def IA32:
> disable PIE code generation explicitly") to work around issues
> with recent distro toolchains that enable PIE code generation
> by default.
>
> So rollback the changes for GCC49 but preserve them for GCC5
>
> Contributed-under: TianoCore Contribution Agreement 1.1
> Signed-off-by: Ard Biesheuvel 
> Acked-by: Laszlo Ersek 

The above facility will let you stuff the options into the edk2
BaseTools build, in your downstream qemu.spec file, that match your
downstream gcc/binutils version.

Most build hosts will need no flags.

> Did noone approach the edk2 developers yet that their buildsystem is
> broken in that regard?

Well, have you?

I don't remember anyone else reporting such an issue yet, on edk2-devel
or elsewhere, with building BaseTools. I certainly remember
collaborating with Gary Lin from SUSE on various stuff, but not this.

You are welcome to file a bug at .
Please pick "EDK2" for "Product", "Code" for "Component", and
"BaseTools" for "Package".

... Please don't try to force a "zero build-interface changes" policy on
upstream, just so you can avoid a small tweak in a downstream build
script when you rebase. Not even the gcc/binutils command lines conform
to that. We all hope downstream rebases to be painless, and upstreams do
strive to help with that, but the downstream rebase experience is never
*entirely* painless (or automated).

Thanks,
Laszlo



Re: [Qemu-devel] [Qemu-block] [PATCH] block/io.c: fix for the allocation failure

2019-04-10 Thread Stefan Hajnoczi
On Mon, Apr 08, 2019 at 12:14:49PM +0200, Kevin Wolf wrote:
> Am 08.04.2019 um 12:04 hat Kevin Wolf geschrieben:
> > Am 08.04.2019 um 11:44 hat Andrey Shinkevich geschrieben:
> > > 
> > > 
> > > On 06/04/2019 01:50, John Snow wrote:
> > > > 
> > > > 
> > > > On 4/5/19 10:24 AM, Andrey Shinkevich wrote:
> > > >> On a file system used by the customer, fallocate() returns an error
> > > >> if the block is not properly aligned. So, bdrv_co_pwrite_zeroes()
> > > >> fails. We can handle that case the same way as it is done for the
> > > >> unsupported cases, namely, call to bdrv_driver_pwritev() that writes
> > > >> zeroes to an image for the unaligned chunk of the block.
> > > >>
> > > >> Suggested-by: Denis V. Lunev 
> > > >> Signed-off-by: Andrey Shinkevich 
> > > >> ---
> > > >>   block/io.c | 2 +-
> > > >>   1 file changed, 1 insertion(+), 1 deletion(-)
> > > >>
> > > >> diff --git a/block/io.c b/block/io.c
> > > >> index dfc153b..0412a51 100644
> > > >> --- a/block/io.c
> > > >> +++ b/block/io.c
> > > >> @@ -1516,7 +1516,7 @@ static int coroutine_fn 
> > > >> bdrv_co_do_pwrite_zeroes(BlockDriverState *bs,
> > > >>   assert(!bs->supported_zero_flags);
> > > >>   }
> > > >>   
> > > >> -if (ret == -ENOTSUP && !(flags & BDRV_REQ_NO_FALLBACK)) {
> > > >> +if (ret < 0 && !(flags & BDRV_REQ_NO_FALLBACK)) {
> > > >>   /* Fall back to bounce buffer if write zeroes is 
> > > >> unsupported */
> > > >>   BdrvRequestFlags write_flags = flags & 
> > > >> ~BDRV_REQ_ZERO_WRITE;
> > > >>   
> > > >>
> > > > 
> > > > I suppose that if fallocate fails for any reason and we're allowing
> > > > fallback, we're either going to succeed ... or fail again very soon
> > > > thereafter.
> > > > 
> > > > Are there any cases where it is vital to not ignore the first fallocate
> > > > failure? I'm a little wary of ignoring the return code from
> > > > bdrv_co_pwrite_zeroes, but I am assuming that if there is a "real"
> > > > failure here that the following bounce writes will also fail "safely."
> > > > 
> > > > I'm not completely confident, but I have no tangible objections:
> > > > Reviewed-by: John Snow 
> > > > 
> > > 
> > > Thank you for your review, John!
> > > 
> > > Let me clarify the circumstances and quote the bug report:
> > > "Customer had Win-2012 VM with 50GB system disk which was later resized 
> > > to 256GB without resizing the partition inside VM.
> > > Now, while trying to resize to 50G, the following error will appear
> > > 'Failed to reduce the number of L2 tables: Invalid argument'
> > > It was found that it is possible to shrink the disk to 128G and any size 
> > > above that number, but size below 128G will bring the mentioned error."
> > > 
> > > The fallocate() returns no error on that file system if the offset and
> > > the (offset + bytes) parameters of the bdrv_co_do_pwrite_zeroes() both
> > > are aligned to 4K.
> > 
> > What is the return value you get from this file system?
> > 
> > Maybe turning that into ENOTSUP in file-posix would be less invasive.
> > Just falling back for any error gives me the vague feeling that it could
> > cause problems sooner or later.
> 
> Also, which file system is this? This seems to be a file system bug.
> fallocate() isn't documented to possibly have alignment restrictions for
> FALLOC_FL_ZERO_RANGE (if this is the operation you're talking about).
> FALLOC_FL_PUNCH_HOLE even explicitly mentions the behaviour for partial
> blocks, so there is no doubt that operations for partial blocks are
> considered valid. Operations that may impose restrictions are explicitly
> documented as such.
> 
> So in any case, this would only be a workaround for a buggy file system.
> The real bug needs to be fixed there.

I agree regarding the root cause of the bug, but the fallback behavior
is reasonable IMO.

Andrey: If you update the patch with a more specific errno I'll queue
that patch instead.

Stefan


signature.asc
Description: PGP signature


Re: [Qemu-devel] [PATCH v5 2/5] virtio-pmem: Add virtio pmem driver

2019-04-10 Thread Yuval Shaia
> +
> +static int virtio_pmem_probe(struct virtio_device *vdev)
> +{
> + int err = 0;
> + struct resource res;
> + struct virtio_pmem *vpmem;
> + struct nvdimm_bus *nvdimm_bus;
> + struct nd_region_desc ndr_desc = {};
> + int nid = dev_to_node(>dev);
> + struct nd_region *nd_region;
> +
> + if (!vdev->config->get) {
> + dev_err(>dev, "%s failure: config disabled\n",
> + __func__);
> + return -EINVAL;
> + }
> +
> + vdev->priv = vpmem = devm_kzalloc(>dev, sizeof(*vpmem),
> + GFP_KERNEL);
> + if (!vpmem) {
> + err = -ENOMEM;
> + goto out_err;
> + }
> +
> + vpmem->vdev = vdev;
> + err = init_vq(vpmem);
> + if (err)
> + goto out_err;
> +
> + virtio_cread(vpmem->vdev, struct virtio_pmem_config,
> + start, >start);
> + virtio_cread(vpmem->vdev, struct virtio_pmem_config,
> + size, >size);
> +
> + res.start = vpmem->start;
> + res.end   = vpmem->start + vpmem->size-1;
> + vpmem->nd_desc.provider_name = "virtio-pmem";
> + vpmem->nd_desc.module = THIS_MODULE;
> +
> + vpmem->nvdimm_bus = nvdimm_bus = nvdimm_bus_register(>dev,
> + >nd_desc);
> + if (!nvdimm_bus)
> + goto out_vq;
> +
> + dev_set_drvdata(>dev, nvdimm_bus);
> +
> + ndr_desc.res = 
> + ndr_desc.numa_node = nid;
> + ndr_desc.flush = virtio_pmem_flush;
> + set_bit(ND_REGION_PAGEMAP, _desc.flags);
> + set_bit(ND_REGION_ASYNC, _desc.flags);
> + nd_region = nvdimm_pmem_region_create(nvdimm_bus, _desc);
> + nd_region->provider_data =  dev_to_virtio

Pleas delete the extra space.

> + (nd_region->dev.parent->parent);
> +
> + if (!nd_region)
> + goto out_nd;
> +
> + return 0;
> +out_nd:
> + err = -ENXIO;
> + nvdimm_bus_unregister(nvdimm_bus);
> +out_vq:
> + vdev->config->del_vqs(vdev);
> +out_err:
> + dev_err(>dev, "failed to register virtio pmem memory\n");
> + return err;
> +}



Re: [Qemu-devel] [PATCH v5 2/5] virtio-pmem: Add virtio pmem driver

2019-04-10 Thread Yuval Shaia
On Wed, Apr 10, 2019 at 02:24:26PM +0200, Cornelia Huck wrote:
> On Wed, 10 Apr 2019 09:38:22 +0530
> Pankaj Gupta  wrote:
> 
> > This patch adds virtio-pmem driver for KVM guest.
> > 
> > Guest reads the persistent memory range information from
> > Qemu over VIRTIO and registers it on nvdimm_bus. It also
> > creates a nd_region object with the persistent memory
> > range information so that existing 'nvdimm/pmem' driver
> > can reserve this into system memory map. This way
> > 'virtio-pmem' driver uses existing functionality of pmem
> > driver to register persistent memory compatible for DAX
> > capable filesystems.
> > 
> > This also provides function to perform guest flush over
> > VIRTIO from 'pmem' driver when userspace performs flush
> > on DAX memory range.
> > 
> > Signed-off-by: Pankaj Gupta 
> > ---
> >  drivers/nvdimm/virtio_pmem.c |  88 ++
> >  drivers/virtio/Kconfig   |  10 +++
> >  drivers/virtio/Makefile  |   1 +
> >  drivers/virtio/pmem.c| 124 +++
> >  include/linux/virtio_pmem.h  |  60 +++
> >  include/uapi/linux/virtio_ids.h  |   1 +
> >  include/uapi/linux/virtio_pmem.h |  10 +++
> >  7 files changed, 294 insertions(+)
> >  create mode 100644 drivers/nvdimm/virtio_pmem.c
> >  create mode 100644 drivers/virtio/pmem.c
> >  create mode 100644 include/linux/virtio_pmem.h
> >  create mode 100644 include/uapi/linux/virtio_pmem.h
> > 
> (...)
> > diff --git a/drivers/virtio/pmem.c b/drivers/virtio/pmem.c
> > new file mode 100644
> > index ..cc9de9589d56
> > --- /dev/null
> > +++ b/drivers/virtio/pmem.c
> > @@ -0,0 +1,124 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/*
> > + * virtio_pmem.c: Virtio pmem Driver
> > + *
> > + * Discovers persistent memory range information
> > + * from host and registers the virtual pmem device
> > + * with libnvdimm core.
> > + */
> > +#include 
> > +#include <../../drivers/nvdimm/nd.h>
> > +
> > +static struct virtio_device_id id_table[] = {
> > +   { VIRTIO_ID_PMEM, VIRTIO_DEV_ANY_ID },
> > +   { 0 },
> > +};
> > +
> > + /* Initialize virt queue */
> > +static int init_vq(struct virtio_pmem *vpmem)
> 
> IMHO, you don't gain much by splitting off this function...

It make sense to have all the vq-init-related stuff in one function, so
here pmem_lock and req_list are used only for the vq.
Saying that - maybe it would be better to have the 3 in one struct.

> 
> > +{
> > +   struct virtqueue *vq;
> > +
> > +   /* single vq */
> > +   vpmem->req_vq = vq = virtio_find_single_vq(vpmem->vdev,
> > +   host_ack, "flush_queue");
> > +   if (IS_ERR(vq))
> > +   return PTR_ERR(vq);
> 
> I'm personally not a fan of chained assignments... I think I'd just
> drop the 'vq' variable and operate on vpmem->req_vq directly.

+1

> 
> > +
> > +   spin_lock_init(>pmem_lock);
> > +   INIT_LIST_HEAD(>req_list);
> > +
> > +   return 0;
> > +};
> > +
> > +static int virtio_pmem_probe(struct virtio_device *vdev)
> > +{
> > +   int err = 0;
> > +   struct resource res;
> > +   struct virtio_pmem *vpmem;
> > +   struct nvdimm_bus *nvdimm_bus;
> > +   struct nd_region_desc ndr_desc = {};
> > +   int nid = dev_to_node(>dev);
> > +   struct nd_region *nd_region;
> > +
> > +   if (!vdev->config->get) {
> > +   dev_err(>dev, "%s failure: config disabled\n",
> 
> Maybe s/config disabled/config access disabled/ ? That seems to be the
> more common message.
> 
> > +   __func__);
> > +   return -EINVAL;
> > +   }
> > +
> > +   vdev->priv = vpmem = devm_kzalloc(>dev, sizeof(*vpmem),
> > +   GFP_KERNEL);
> 
> Here, the vpmem variable makes sense for convenience, but I'm again not
> a fan of the chaining :)

+1

> 
> > +   if (!vpmem) {
> > +   err = -ENOMEM;
> > +   goto out_err;
> > +   }
> > +
> > +   vpmem->vdev = vdev;
> > +   err = init_vq(vpmem);
> > +   if (err)
> > +   goto out_err;
> > +
> > +   virtio_cread(vpmem->vdev, struct virtio_pmem_config,
> > +   start, >start);
> > +   virtio_cread(vpmem->vdev, struct virtio_pmem_config,
> > +   size, >size);
> > +
> > +   res.start = vpmem->start;
> > +   res.end   = vpmem->start + vpmem->size-1;
> > +   vpmem->nd_desc.provider_name = "virtio-pmem";
> > +   vpmem->nd_desc.module = THIS_MODULE;
> > +
> > +   vpmem->nvdimm_bus = nvdimm_bus = nvdimm_bus_register(>dev,
> > +   >nd_desc);
> 
> And here :)
> 
> > +   if (!nvdimm_bus)
> > +   goto out_vq;
> > +
> > +   dev_set_drvdata(>dev, nvdimm_bus);
> > +
> > +   ndr_desc.res = 
> > +   ndr_desc.numa_node = nid;
> > +   ndr_desc.flush = virtio_pmem_flush;
> > +   set_bit(ND_REGION_PAGEMAP, _desc.flags);
> > +   set_bit(ND_REGION_ASYNC, _desc.flags);
> > +   nd_region = nvdimm_pmem_region_create(nvdimm_bus, _desc);
> > +   nd_region->provider_data =  dev_to_virtio
> > +  

  1   2   >